Group Membership Verification with Privacy: Sparse or Dense?
Group membership verification checks if a biometric trait corresponds to one member of a group without revealing the identity of that member. Recent contributions provide privacy for group membership protocols through the joint use of two mechanisms: quantizing templates into discrete embeddings, and aggregating several templates into one group representation.
However, this scheme has one drawback: the data structure representing the group has a limited size and cannot recognize noisy queries when many templates are aggregated. Moreover, the sparsity of the embeddings seemingly plays a crucial role in the verification performance.
This paper proposes a mathematical model for group membership verification that reveals the impact of sparsity on security, compactness, and verification performance. This model bridges the gap towards a Bloom filter robust to noisy queries. It shows that a dense solution is more competitive unless the queries are almost noiseless.
Group membership verification is a procedure checking whether an item or an individual is a member of a group. If membership is positively established, then access to some resources (a building, a file, …) is granted; otherwise the access is refused. This paper focuses on privacy-preserving group membership verification procedures where members must be distinguished from non-members, but where the members of a group should not be distinguished from one another.
To this aim, a few recent contributions have proposed to rely on the aggregation and the embedding of several distinctive templates into a unique and compact high dimensional feature representing the members of a group [3, 4]. It has been demonstrated that this allows a good assessment of the membership property at test time. It has also been shown that this provides privacy and security. Privacy is enforced because it is impossible to infer from the aggregated feature which original distinctive template matches the one used to probe the system. Security is preserved since nothing meaningful leaks from embedded data [8, 9].
The schemes of [3] and [4], however, face severe limitations. Basically, it seems impossible to create features representing groups having many members. In this case, the probability to identify true positives vanishes and the false negative rate grows accordingly. Furthermore, the robustness of the matching procedure fades and becomes unable to absorb even the smallest amount of noise that inherently differentiates the enrolled template of one member from the template captured at query time for this same member. In contrast, features representing only a few group members are robust to noise and cause almost no false negatives. A detailed analysis of [3] and [4] suggests that these limitations originate from the sparsity level of the features representing group members.
This paper investigates the impact of the sparsity level of the high dimensional features representing group members on the quality of (true positive) matches and on their robustness to noise. It shows it is possible to trade compactness and sparsity for better security or better verification performance.
Sect. II first considers the aggregation of discrete random sequences and models this compromise with information-theoretic tools. Sect. III applies this viewpoint to binary random sequences and shows that the impact of the noise on the query depends on the sparsity of the sequences. Sect. IV bridges the gap between the templates, i.e. real d-dimensional vectors, and the discrete sequences considered in the previous sections. Sect. V gathers the experimental results for a group membership verification based on faces.
II Discrete Sequences
This section considers the problem of creating a representation r of a group of N sequences x_1, …, x_N, whose use is to test whether a query sequence is a noisy version of one of these original sequences. This test is done at query time, when the original sequences are no longer available and all that remains is the representation r.
The sequences are elements of A^n, where A is a finite alphabet of cardinality q, say A = {0, 1, …, q−1}. Each sequence follows a statistical model giving a central role to the symbol '0'. The symbols of the sequences are independent and identically distributed with P(X = a) = p for a ≠ 0, and P(X = 0) = 1 − (q−1)p. Sparsity means that the probability p is small; density means that p is close to 1/q, so that X is uniformly distributed over A.
II-A Structure of the group representation
We impose the following conditions on the aggregation computing the group representation r:
r is a discrete sequence of the same length n,
Symbol r(j) only depends on the symbols x_1(j), …, x_N(j),
The same aggregation is made index-wise: with abuse of notation, r(j) = f(x_1(j), …, x_N(j)) for all j,
f does not depend on any ordering of the set {x_1(j), …, x_N(j)}.
These requirements are well known in traitor tracing and group testing, as they usually model the collusion attack or the test results over groups. Here, they simplify the analysis, reducing the problem to a single-letter formulation where the index j is dropped, involving the symbols X_1, …, X_N and R.
These conditions motivate a 2-stage construction. The first stage computes the type (a.k.a. histogram or tally) T of the symbols X_1, …, X_N. The number of possible type values equals binom(N+q−1, q−1), which might be too big. The second stage applies a surjective function f mapping the set of types onto a much smaller set S.
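A minimal sketch of this 2-stage construction (function and variable names are ours, and the example surjection is purely illustrative):

```python
from collections import Counter

Q = 2  # alphabet cardinality (an assumption for this sketch)

def type_of(symbols, q=Q):
    """Stage 1: the type (histogram/tally) of the member symbols at one index."""
    counts = Counter(symbols)
    return tuple(counts.get(a, 0) for a in range(q))

def aggregate(sequences, surjection, q=Q):
    """Index-wise aggregation: the surjection is applied to each index's type.
    The result only depends on the multiset of symbols, never on their order."""
    n = len(sequences[0])
    return [surjection(type_of([s[j] for s in sequences], q)) for j in range(n)]

# Illustrative surjection for the binary case: keep only the count of '1's.
def count_ones(t):
    return t[1]
```

By construction, permuting the enrolled sequences leaves the representation unchanged, which matches the ordering-independence requirement above.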
II-B Noisy query
At enrollment time, the system receives the N sequences x_1, …, x_N, aggregates them into the compact representation r, and then forgets the sequences. At query time, the system receives a new sequence y conforming with one of the following hypotheses:
H1: y is a noisy version of one of the enrolled sequences; without loss of generality, of x_1.
H0: y is a new sequence sharing the same statistical model but independent of x_1, …, x_N.
We model the source of noise (due to different acquisition conditions) by a discrete communication channel. It is defined by the transition probabilities W(b|a) = P(Y(j) = b | X(j) = a) for a, b in A. We impose some symmetry w.r.t. the symbol '0', so that the channel treats all non-zero symbols alike.
At query time, the system computes a score s(y, r) and compares it to a threshold τ: hypothesis H1 is deemed true if s(y, r) > τ. This test leads to two probabilities of error:
P_fp is the probability of false positive: P_fp = P(s(Y, R) > τ | H0).
P_fn is the probability of false negative: P_fn = P(s(Y, R) ≤ τ | H1).
The emphasis on N and n is natural. It is expected that: i) the more sequences are aggregated, the less reliable the test is; ii) the longer the sequences are, the more reliable the test is.
II-C Figures of merit
This section presents three information-theoretic quantities (expressed in nats) measuring the performances of the scheme. The first two depend on the statistical model of X (especially p) and the aggregation mechanism. The last one moreover depends on the channel.
The compactness of the group representation is measured by the entropy H(R). It roughly means that the number of typical sequences r scales exponentially as e^{n H(R)}, so that r can theoretically be compressed to the rate of H(R) nats per symbol.
We consider an insider aiming at disclosing one of the enrolled sequences. Observing the group representation r, the insider's uncertainty is measured by the equivocation H(X_1, …, X_N | R). This means that the insider does not know which of the typical sequences the enrolled sequences are.
In our application, the requirement of utmost importance is to have a very small probability of false positive. We are interested in an asymptotical setup where n → ∞. This motivates the use of the false positive error exponent E_fp = −lim_{n→∞} (1/n) ln P_fp as a figure of merit.
If E_fp > 0, it means that P_fp vanishes exponentially as n becomes larger. The theory of hypothesis testing shows that E_fp is upper bounded by the mutual information I(Y; R), where Y is a symbol of the query sequence, i.e. a noisy version of X_1. It means that the necessary length for achieving the requirement P_fp ≤ ε is n ≈ −ln(ε)/I(Y; R).
II-D Noiseless setup
The bigger H(R) and the equivocation, the better the performance in terms of verifiability and security. Yet, they cannot both be big at the same time. The noiseless case, when the channel introduces no error and Y = X_1, simply illustrates the trade-off:
since R is a deterministic function of (X_1, …, X_N), H(R) + H(X_1, …, X_N | R) = N·H(X) (1), so that compactness and security trade off against each other for a given p. For a given N, H(X) is maximised by the dense solution: H(X) ≤ ln q, with equality for p = 1/q.
III Binary alphabet
This section explores the binary case where A = {0, 1} and P(X = 1) = p. We first set the surjection f as the identity function, s.t. R = T. Then, the impact of the surjection is investigated.
III-A Working with types
In the binary case, there are N + 1 type values. They can be uniquely labelled by the number of symbols '1' in (X_1, …, X_N), i.e. T = X_1 + … + X_N.
In the noiseless case, after some rewriting, I(Y; T) = h_b(p) − H(X_1 | T), with h_b(p) the entropy of a Bernoulli r.v. of parameter p. If p = 1/2 and N is large, this mutual information scales as 1/(2N).
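The elided displays follow from standard identities; here is our hedged reconstruction for the type T = X_1 + … + X_N of i.i.d. Bernoulli(p) symbols, with the identity surjection and a noiseless channel:

```latex
I(X_1;T) \;=\; H(T) - H(T \mid X_1)
         \;=\; H\!\left(\mathrm{Bin}(N,p)\right) - H\!\left(\mathrm{Bin}(N-1,p)\right),
\qquad
h_b(p) \;=\; -p\ln p - (1-p)\ln(1-p).
```

The second equality holds because, given X_1, the type is X_1 plus a Bin(N−1, p) variable. For p = 1/2, the Gaussian approximation of both binomial entropies gives I(X_1; T) ≈ (1/2) ln(N/(N−1)) ≈ 1/(2N).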
This is not the maximum of this quantity. For large N, the best option is to set p proportional to 1/N, which yields a larger mutual information. This was proven in the totally different application of traitor tracing [6, Prop. 3.8].
This section outlines two setups: the dense setup where p = 1/2, and the sparse setup where p goes to 0 when more sequences are packed into the group representation. Both setups share the asymptotical property that the mutual information scales as 1/N for large N. According to (3), we can pack a big number of sequences into one group representation provided that their length n scales proportionally to N.
The figure of merit for compactness for types is just H(T), where T follows a binomial distribution B(N, p). In the dense setup (p = 1/2), the binomial distribution is approximated by a Gaussian distribution, providing an entropy that grows like (ln N)/2. In the sparse setup (Np constant), the binomial distribution is approximated by a Poisson distribution of constant mean, whose entropy is bounded. This shows that the types are not compact in the dense setup, whereas their entropy remains approximately constant in the sparse setup.
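In our notation, the two approximations can be made explicit; the following display is our reconstruction, the Poisson entropy series being the exact expression given in the SIAM Review solution cited below:

```latex
\text{dense } (p = \tfrac12):\quad
H(T) \;\approx\; \tfrac12 \ln\!\big(2\pi e\, N p(1-p)\big) \;=\; \Theta(\ln N),
\\[4pt]
\text{sparse } (Np \to \lambda):\quad
H(T) \;\approx\; \lambda(1-\ln\lambda) \;+\; e^{-\lambda}\sum_{k\ge 0}\frac{\lambda^{k}\ln k!}{k!} \;=\; O(1).
```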
Thanks to (5), we only need to calculate H(T). In the dense setup, the equivocation per enrolled sequence converges to ln 2 as N increases: merging into a single representation protects an individual sequence. If sparse, the equivocation per enrolled sequence converges to zero as N increases, contrary to the dense setup. It might be more insightful to see that the ratio of uncertainties before and after observing T, i.e. H(X_1, …, X_N | T)/H(X_1, …, X_N), converges to 1 in both cases. Merging does provide some security, but sparsity is more detrimental.
III-B Adding a surjection
The motivation of the surjection f onto a smaller set S is to bound H(R) as N grows. The Markov chain Y – T – f(T) imposes that I(Y; f(T)) ≤ I(Y; T). The surjection thus provokes a loss in verification performance, as depicted in Fig. 1.
App. A shows that for S = {0, 1}, this loss is minimized by a threshold surjection: f(T) = 1 if T ≥ τ_s, and 0 otherwise,
where τ_s is a threshold depending on p and N. In the dense setup, the threshold lies around N/2 and the surjection corresponds to a majority vote collusion in traitor tracing (a threshold model in group testing), whose mutual information still scales as 1/N by [6, Prop. 3.4].
In the sparse setup, τ_s = 1, which corresponds to an 'All-1' attack in traitor tracing (the perfect model in group testing). Then the best option is to set p ≈ ln(2)/N, and by [6, Prop. 3.3] the mutual information scales as (ln 2)²/N.
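The two resulting surjections can be sketched as follows (our own minimal formulation on the binary type t, the count of '1's among the N members):

```python
def majority_vote(t, N):
    """Dense setup: f(T) = 1 iff a strict majority of the N members carry '1'."""
    return 1 if 2 * t > N else 0

def all_one(t, N):
    """Sparse setup ('All-1' in traitor tracing, the perfect model in group
    testing): f(T) = 1 iff at least one member carries '1'."""
    return 1 if t >= 1 else 0
```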
From (3), the necessary length is n ≈ −N ln(P_fp)/(ln 2)².
The main property, a mutual information scaling as 1/N, still holds, but the surjection lowers the constant, both in the dense and in the sparse setup. The sparse setup is still the best option w.r.t. I(Y; R).
III-C Relationship with the Bloom filter
A Bloom filter is a well-known data structure designed for set membership, embedding the N items to be enrolled into an n-bit array thanks to k hash functions. Its probability of false negative is exactly 0, whereas the probability of false positive is not null. The number of hash functions minimizing it is k = (n/N) ln 2. Then, the necessary length to meet a required false positive level ε is n = −N ln(ε)/(ln 2)².
These numbers show the connection with our scheme (14). At the enrollment phase, the k hash functions indeed associate to each item a binary sequence indicating which bits of the filter have to be set. This sequence is indeed sparse, with p ≈ k/n = ln(2)/N. The necessary length is the same. Indeed, the enrollment phase of a Bloom filter is nothing more than the 'All-1' surjection.
The only difference resides in the statistical model. There are at most k symbols '1' in each such sequence whereas, in our model, their number follows a binomial distribution B(n, p). Yet, asymptotically, by some concentration phenomenon, the two models get similar. This explains why we end up with similar optimal parameters. Yet, the Bloom filter only works when the query object is exactly one enrolled item, whereas the next section shows that our scheme is robust to noise.
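For concreteness, a minimal Bloom filter sketch (our own illustrative implementation; deriving the k bit positions from SHA-256 is a choice of ours, not the paper's):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions set bits of an n-bit array.
    Enrolling an item is exactly the 'All-1' aggregation of a k-sparse
    binary sequence, hence the connection drawn in the text."""

    def __init__(self, n_bits, k):
        self.n, self.k = n_bits, k
        self.bits = bytearray(n_bits)

    def _indices(self, item):
        # Derive k pseudo-random bit positions from SHA-256 (our choice).
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.n

    def add(self, item):
        for j in self._indices(item):
            self.bits[j] = 1

    def __contains__(self, item):
        # No false negatives; false positives occur with small probability.
        return all(self.bits[j] for j in self._indices(item))
```

With k = (n/N) ln 2, the false positive rate is about 2^{-k}, matching the necessary length stated above.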
IV Real vectors
This section deals with real vectors: the N vectors to be enrolled and the query vector, all with unit norm. An embedding mechanism makes the connection with the previous section. This study models the embedding as a probabilistic function.
IV-A Binary embedding
For instance, for a unit-norm template vector x, a popular embedding sets the j-th symbol to 1 if w_j⊤x > τ_e, and to 0 otherwise,
where the w_j are i.i.d. Gaussian random vectors. This in turn gives i.i.d. Bernoulli symbols with P(X(j) = 1) = p, where p is set by the threshold τ_e.
At query time, the embedding mechanism uses the same random vectors w_j but a different threshold τ_q.
Under H1, suppose that the correlation between the query vector and the enrolled vector x equals ρ. This correlation defines the channel with the error rates P(Y(j) = 0 | X(j) = 1) and P(Y(j) = 1 | X(j) = 0). These error rates have closed-form expressions involving the orthant probability of a bivariate Gaussian distribution of correlation ρ.
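A numerical sketch of this embedding for the particular choice τ_e = τ_q = 0 (our simplification), for which the bit-flip probability is the classic arccos(ρ)/π of sign random projections:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W, tau):
    """Binary embedding: one bit per random direction w_j, set to 1 iff the
    projection exceeds the threshold tau (tau = 0 gives sign projections)."""
    return (W @ x > tau).astype(np.uint8)

d, n = 64, 4096
W = rng.standard_normal((n, d))          # shared random vectors w_j

x = rng.standard_normal(d)
x /= np.linalg.norm(x)                   # enrolled unit-norm template
z = rng.standard_normal(d)
z -= (z @ x) * x
z /= np.linalg.norm(z)                   # unit noise orthogonal to x
rho = 0.9
y = rho * x + np.sqrt(1 - rho**2) * z    # unit-norm query with correlation rho

flip_rate = np.mean(embed(x, W, 0.0) != embed(y, W, 0.0))
# For tau = 0 the flip probability is arccos(rho) / pi.
```

This is only the zero-threshold special case; non-zero (τ_e, τ_q) shift p away from 1/2 and make the two error rates asymmetric.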
IV-B Induced channel
For this embedding, the parameters (τ_e, τ_q) for the vectors define the setup (p and the channel) for the sequences. It is a priori difficult to find the best tuning. For a fixed τ_e, one error rate decreases with τ_q while the other increases. App. B reveals that I(Y; R) is sensitive to the noise, especially with the 'All-1' surjection of the sparse solution. Fig. 2 shows indeed that the dense solution is more robust, unless ρ is very close to 1. Here, we enforce a surjection (identity, All-1, or majority vote) and make a grid search to find the optimal (τ_e, τ_q) for a given ρ. It happens that these thresholds are better set to 0, i.e. the dense solution, for the identity and the majority vote. As for the 'All-1' surjection, we observe that the best τ_e makes the embedding sparse and that τ_q is slightly bigger than τ_e to lower the error rates. Yet, this sparse solution is not as good as the dense solution unless ρ is close to 1, i.e. the query vector is very close to the enrolled vector.
This observation holds only for the embedding function (15). Hashing functions less prone to error may exist.
V Experimental work
We evaluate our scheme with face recognition. Face images come from the LFW, CFP, and FEI databases. For each dataset, the enrolled individuals are partitioned into random groups. There are the same numbers of positive and negative (impostor) queries.
Labeled Faces in the Wild These are pictures of celebrities in all sorts of viewpoints, taken in an uncontrolled environment. We use pre-aligned LFW images. The enrollment set consists of the individuals with at least two images in the LFW database. One random template of each individual is enrolled in the system. Some other individuals are randomly picked in the database to play the role of impostors.
Celebrities in Frontal-Profile These are frontal and profile views of celebrities taken in an uncontrolled environment. Only frontal images are enrolled in the system. The impostor set is a random selection of other individuals.
Faculdade de Engenharia Industrial The FEI database contains images in frontal view, taken in a controlled environment. We use pre-aligned images. Each subject has two frontal images (one with a neutral expression and the other with a smiling facial expression). The database is created by randomly sampling the individuals to be enrolled and the impostors.
V-a Experimental Setup
Face descriptors are obtained from a pre-trained network based on the VGG-Face architecture, followed by PCA. FEI corresponds to the scenario of employees entering a building with face recognition, whereas CFP is more difficult, and LFW even more difficult. To equalize the difficulty, we apply a dimension reduction (Probabilistic Principal Component Analysis) with a different output dimension for FEI, CFP, and LFW. The parameters of PPCA are learned on a different set of images, not on the enrolled templates and queries. The vectors are also normalized. With such post-processing, the average correlation between positive pairs equals 0.83 (FEI), 0.78 (CFP), and 0.68 (LFW). Despite the dimension reduction, the hardest dataset is LFW and the easiest is FEI.
In one simulation run, the enrollment phase makes random groups with the same number of members. A user claims she/he belongs to a given group. This claim is true under hypothesis H1 and false under hypothesis H0 (i.e. the user is an impostor). Her/his template is quantized into a sequence and sent to the system, which compares it to the group representation. This is done for all impostors and all queries of enrolled people. One Monte-Carlo simulation is composed of many runs. The figure of merit is the probability of false negative measured at a fixed low probability of false positive.
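To make the pipeline concrete, here is a self-contained toy Monte-Carlo (all parameters, names, the dense sign embedding, and the Hamming-similarity score are our own illustrative choices, not the paper's exact protocol):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(m=16, d=32, n=512, rho=0.9, runs=20):
    """One toy experiment: m unit-norm templates per group, dense sign
    embedding into n bits, majority-vote aggregation, Hamming-similarity
    score. Returns average scores of positive and impostor queries."""
    W = rng.standard_normal((n, d))
    pos, neg = [], []
    for _ in range(runs):
        X = rng.standard_normal((m, d))
        X /= np.linalg.norm(X, axis=1, keepdims=True)   # enrolled templates
        bits = X @ W.T > 0                              # their embeddings
        r = bits.sum(axis=0) * 2 > m                    # majority-vote repr.
        z = rng.standard_normal(d)
        z -= (z @ X[0]) * X[0]
        z /= np.linalg.norm(z)
        y = rho * X[0] + np.sqrt(1 - rho**2) * z        # positive query
        pos.append(np.mean((W @ y > 0) == r))
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                          # impostor query
        neg.append(np.mean((W @ u > 0) == r))
    return float(np.mean(pos)), float(np.mean(neg))
```

Positive queries score markedly above impostors, which is what the threshold test exploits.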
V-B Exp. #1: Comparison to the baselines
Our scheme is compared to the following baselines:
EoA-SP and AoE-SP  (signal processing approach)
EoA-ML and AoE-ML  (machine learning approach)
The drawback of these baselines is that the length of the data structure is bounded. Here, it is set to its maximum value, i.e. the dimension of the templates.
Our scheme allows more freedom: setting n larger than the template dimension produces a much bigger representation. It is not surprising that our scheme is better than the baselines. Fig. 3 validates our motivation to get rid of the drawback of the baselines with limited length, to achieve better verification performance. These results are obtained with the dense solution. Indeed, despite all our efforts, we could not achieve better results with the sparse solution. This confirms the lesson learnt from Fig. 2: the dense solution outperforms the sparse solution when the average correlation between positive pairs is not close to 1.
The improvement is also better as the size of the groups increases. We explain this by the use of the types, i.e. R = T. Equation (9) shows that H(T) increases with N for the dense solution, compensating for the aggregation of more templates.
V-C Exp. #2: Reducing the size of the group representation
There are two ways of reducing the size of the group representation. The first is to decrease n; the second is to lower H(R) thanks to a surjection. Sect. III-B presented optimal surjections onto S = {0, 1}. We found experimentally good surjections onto sets of intermediate cardinality.
This is done according to the following heuristic. Starting from S equal to the set of types, we iteratively decrease the size of S by one. This amounts to merging two symbols of S. By brute force, we analyse all the pairs of symbols, measuring the loss in mutual information induced by their merging. By merging the best pair, we decrease the number of symbols in S by one. This process is iterated until the targeted size of S is achieved. This heuristic is not optimal, but it is tractable. Fig. 4 compares these two means. Employing a coarser surjection is slightly better in terms of verification performance.
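The heuristic above can be sketched as follows; we represent the query symbol and the type by a joint pmf matrix, a formulation of ours (the paper works with the actual induced channel):

```python
import numpy as np

def mutual_info(P):
    """Mutual information (in nats) of a joint pmf matrix P[y, t]."""
    py = P.sum(axis=1, keepdims=True)
    pt = P.sum(axis=0, keepdims=True)
    nz = P > 0
    return float(np.sum(P[nz] * np.log(P[nz] / (py @ pt)[nz])))

def greedy_merge(P, target_size):
    """Iteratively merge the pair of representation symbols (columns of the
    joint pmf) whose fusion loses the least mutual information."""
    P = P.copy()
    groups = [[t] for t in range(P.shape[1])]
    while P.shape[1] > target_size:
        best = None
        for a in range(P.shape[1]):
            for b in range(a + 1, P.shape[1]):
                Q = np.delete(P, b, axis=1)
                Q[:, a] = P[:, a] + P[:, b]       # merge symbols a and b
                mi = mutual_info(Q)
                if best is None or mi > best[0]:  # keep the smallest loss
                    best = (mi, a, b, Q)
        _, a, b, P = best
        groups[a] += groups.pop(b)
    return P, groups
```

By the data processing inequality, every merge can only decrease the mutual information; two symbols with proportional columns can be merged for free.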
V-D Unexpected results
We have argued that FEI < CFP < LFW in terms of difficulty, due to the opposite ordering of the datasets' typical correlation between positive pairs. Eq. (19) shows that a lower correlation produces higher error rates, whence a lower mutual information. In Fig. 3, the experimental results contradict this intuition.
This may be explained by the Signal-to-Noise Ratio at the template level. We define it from the average correlation for positive pairs and the variance of this correlation for negative pairs. If a negative query is uniformly distributed over the hypersphere, then its correlation with an enrolled template is approximately distributed as a centered Gaussian whose variance is the inverse of the template dimension.
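A small numerical check of the variance claim; the exact SNR formula below (squared mean positive correlation over the negative-pair variance 1/d) is our assumption, since the text leaves the expression implicit:

```python
import numpy as np

rng = np.random.default_rng(1)

def snr(rho_pos, d):
    """Template-level SNR (our assumed form): squared mean positive
    correlation divided by the variance 1/d of negative correlations."""
    return rho_pos**2 * d

# Empirical check: negative correlations have variance close to 1/d.
d = 256
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                                # enrolled template
negs = rng.standard_normal((5000, d))
negs /= np.linalg.norm(negs, axis=1, keepdims=True)   # uniform on the sphere
var_emp = float(np.var(negs @ x))
```

Under this definition, a dataset with a lower positive correlation can still enjoy a higher SNR if its templates live in a lower dimension.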
Yet, the template dimension has no impact on p, on the induced channel, and hence on the mutual information. We suppose that its impact is tangible on the entropy of the template vectors. Sect. II assumes that the enrolled sequences are statistically independent. This assumption is not granted with the embedding of Sect. IV. Yet, a bigger dimension favors the independence (or at least the decorrelation) between real template vectors.
Our theoretical study justifies that the dense setup is more interesting in terms of verification performance and security level, unless we are operating in the high-SNR regime where the positive queries are very well correlated with the enrolled templates. This statement holds for any embedding, yet some embeddings are certainly more suited than others depending on the dimension, the group size, and the geometrical relationship among positive pairs.
This work is supported by the project CHIST-ERA ID_IOT 20CH21 167534.

Let us first explain how the mutual information I(Y; f(T)) is computed, from the distribution of the type T and the transition probabilities of the channel.
-A Surjection to a binary set
We assume here the noiseless setup, allowing us to write I(Y; f(T)) as I(X_1; f(T)). Inspired by traitor tracing, we consider a probabilistic surjection parametrized by a vector θ, where θ_t is the probability that f maps type t to '1'. The gradient of I(X_1; f(T)) w.r.t. θ is obtained after some lengthy calculus.
It is not possible to cancel this gradient. The optimal θ thus lies on the boundary of the hypercube [0, 1]^{N+1}, which makes the surjection deterministic. The signs of the partial derivatives impose θ_t = 0 for the smallest values of t and θ_t = 1 for the largest ones; since each sign switches only once as p ranges from 0 to 1, a given threshold surjection is optimal only over an interval of p.
For N odd and p = 1/2, the majority vote (θ_t = 0 if t < N/2, and 1 otherwise) is optimal by symmetry.
The 'All-1' surjection, θ_0 = 0 and θ_t = 1 for t ≥ 1, is optimal for the smallest values of p.
-B Impact of the channel
Suppose that the channel depends on a noise parameter, and consider the derivative of I(Y; f(T)) w.r.t. this parameter. Taking this derivative (23) around the noiseless channel, some terms involve the logarithm of vanishing transition probabilities.
We only express the first terms to outline that if a transition probability tends to 0 while the corresponding output probabilities are not null, then this derivative diverges. A small deviation from the noiseless case then has a major detrimental impact on I(Y; f(T)). That situation happens for sure when working with types: consider the null type, obtained when all the members carry the symbol '0'.
One can prove that the surjection can mitigate this effect when it merges the problematic types with others of non-vanishing probability. This happens with the majority vote of the dense setup, but unfortunately not with the 'All-1' surjection of the sparse setup.
- (2015) Practical and optimal LSH for angular distance. In Advances in Neural Information Processing Systems (NIPS).
- (1988) Solution to problem 87-6*: the entropy of a Poisson distribution. SIAM Review 30 (2), pp. 314–317.
- (2019) Aggregation and embedding for group membership verification. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing.
- (2019) Privacy preserving group membership verification and identification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
- (2008) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. In Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition.
- (2015) Search problems in cryptography: from fingerprinting to lattice sieving. Ph.D. thesis, Eindhoven University of Technology.
- (2015) Deep face recognition. In Proceedings of the British Machine Vision Conference.
- (2017) Privacy preserving identification using sparse approximation with ambiguization. In Proceedings of the IEEE International Workshop on Information Forensics and Security.
- (2018) Privacy-preserving outsourced media search using secure sparse ternary codes. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing.
- (2016) Frontal to profile face verification in the wild. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision.
- (1959) Probability of error for optimal codes in a Gaussian channel. Bell System Technical Journal 38, pp. 611–656.
- (2010) A new ranking method for principal components analysis and its application to face image analysis. Image and Vision Computing 28 (6), pp. 902–913.
- (1999) Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61 (3), pp. 611–622.