
# Capacities and Capacity-Achieving Decoders for Various Fingerprinting Games

Thijs Laarhoven[^1]

[^1]: T. Laarhoven is with the Department of Mathematics and Computer Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands. E-mail: mail@thijs.com.

July 30, 2019
###### Abstract

Combining an information-theoretic approach to fingerprinting with a more constructive, statistical approach, we derive new results on the fingerprinting capacities for various informed settings, as well as new log-likelihood decoders with provable code lengths that asymptotically match these capacities. The simple decoder built against the interleaving attack is further shown to achieve the simple capacity for unknown attacks, and is argued to be an improved version of the recently proposed decoder of Oosterwijk et al. With this new universal decoder, cut-offs on the bias distribution function can finally be dismissed.

Besides the application of these results to fingerprinting, a direct consequence of our results to group testing is that (i) a simple decoder asymptotically requires a factor $\log_2(e) \approx 1.44$ more tests to find all defectives than a joint decoder, and (ii) the simple decoder presented in this paper provably achieves this bound.

## 1 Introduction

To protect copyrighted content against unauthorized redistribution, distributors may embed watermarks or fingerprints in the content, uniquely linking copies to individual users. Then, if an illegal copy of the content is found, the distributor can extract the watermark from the copy and compare it to the database of watermarks, to determine which user was responsible.

To circumvent this protection, pirates may try to form a coalition of several colluders, each owning a differently watermarked copy of the content, and perform a collusion attack. By comparing their different versions of the content, they will detect differences in their copies which must be part of the watermark. They can then create a new pirate copy, where the resulting watermark matches the watermark of different pirates in different segments of the content, making it hard for the distributor to find the responsible users. Fortunately, under the assumption that if the pirates don't detect any differences in some segment (because they all received the same version) they output this common symbol (known in the literature as the Boneh-Shaw marking assumption [5]), it is still possible to find all colluders using suitable fingerprinting codes.

### 1.1 Model

The above fingerprinting game is often modeled as the following two-person game between the distributor and the coalition of pirates. The set of colluders $C$ is assumed to be a random subset of size $c$ from the complete set of $n$ users $\mathcal{U}$, and the identities of these colluders are unknown to the distributor. The aim of the game for the distributor is ultimately to discover the identities of the colluders, while the colluders want to stay hidden. The game consists of the following three phases: (i) the distributor uses an encoder to generate the fingerprints; (ii) the colluders employ a collusion channel to generate the pirate output, and (iii) the distributor uses a decoder to map the pirate output to a set of accused users.

##### Encoder

First, the distributor generates a fingerprinting code $\mathcal{X}$, consisting of $n$ binary code words $\vec{x}_j$ of length $\ell$.[^2] The $i$th entry of code word $\vec{x}_j$ indicates which version of the content is assigned to user $j$ in the $i$th segment. The parameter $\ell$ is referred to as the code length, and the distributor would like $\ell$ to be as small as possible.

[^2]: More generally, $\mathcal{X}$ is a code with $n$ code words of length $\ell$ over an alphabet of size $q$, but in this paper we restrict our attention to the binary case $q = 2$.

A common restriction on the encoding process is to assume that $\mathcal{X}$ is created by first generating a probability vector $\vec{p} \in (0,1)^\ell$ by choosing each entry $p_i$ independently from a certain distribution function $F$, and then generating the entries of $\mathcal{X}$ independently according to $P(x_{j,i} = 1) = p_i$. This guarantees that watermarks of different users are independent, and that watermarks in different positions are independent. Schemes that satisfy this assumption are sometimes called bias-based schemes, and the encoders discussed in this paper also belong to this category.

##### Collusion channel

After generating $\mathcal{X}$, the entries are used to select and embed watermarks in the content, and the content is sent out to all users. The colluders then get together, compare their copies, and use a certain collusion channel or pirate attack to select the pirate output $\vec{y} \in \{0,1\}^\ell$. If the pirate attack behaves symmetrically both in the colluders and in the positions, then the collusion channel can be modeled by a vector $\vec{\theta} \in [0,1]^{c+1}$, consisting of entries $\theta_z$ indicating the probability of outputting a $1$ when the pirates received $z$ ones and $c - z$ zeroes. Some common attacks are described in Section 2.3.

##### Decoder

Finally, after the pirate output has been generated and distributed, we assume that the distributor intercepts it and applies a decoding algorithm to the pirate output $\vec{y}$, the code $\mathcal{X}$ and the (secret) bias vector $\vec{p}$ to compute a set $C' \subseteq \mathcal{U}$ of accused users. This is commonly done by assigning scores to users, and accusing those users whose score exceeds some predefined threshold $\eta$. The distributor wins the game if $C'$ is non-empty and contains only colluders (i.e. $\emptyset \neq C' \subseteq C$) and loses if this is not the case, which could be because an innocent user is falsely accused (a false positive error), or because no guilty users are accused (a false negative error). We often write $\varepsilon_1$ and $\varepsilon_2$ for upper bounds on the false positive and false negative probabilities respectively.

### 1.2 Related work

Work on the above bias-based fingerprinting game started in 2003, when Tardos proved that any fingerprinting scheme must satisfy $\ell = \Omega(c^2 \ln n)$, and that a bias-based scheme is able to achieve this optimal scaling in $c$ and $n$ [38]. He proved the latter by providing a simple and explicit construction with a code length of $\ell = 100 c^2 \ln(n/\varepsilon_1)$, which is known in the literature as the Tardos scheme.

##### Improved constructions

Later work on the constructive side of fingerprinting focused on improving upon Tardos' result by sharpening the bounds [3, 35], optimizing the distribution functions [27], improving the score function [36], tightening the bounds again with this improved score function [18, 22, 28, 34, 36, 37], optimizing the score function [29], and again tightening the bounds with this optimized score function [16, 30] to finally end up with a sufficient asymptotic code length of $\ell \sim 2 c^2 \ln n$ for large $c$. This construction can be extended to larger alphabets, in which case the code length decreases further as the alphabet size grows. Other work on practical constructions focused on joint decoders, which are computationally more involved but may work with shorter codes [24, 25, 31], and side-informed fingerprinting games [7, 10, 21, 29], where estimating the collusion channel was considered to get an improved performance.

Recently Abbe and Zheng [1] showed that, in the context of fingerprinting [24], if the set of allowed collusion channels satisfies a certain one-sidedness condition, then a decoder that achieves capacity against the information-theoretic worst-case attack is a universal decoder, achieving capacity against arbitrary attacks. The main drawback of using this result is that the worst-case attack is hard to compute, but it does offer more insight into why e.g. Oosterwijk et al. [30] obtained a universal decoder by considering the decoder against the 'interleaving attack', which is known to be the asymptotic worst-case attack.

##### Fingerprinting capacities

At the same time, work was also done on establishing bounds on the fingerprinting capacity $C$, which translate into lower bounds on the required asymptotic code length through $\ell \gtrsim \log_2(n)/C$. For the binary case, Huang and Moulin [11, 12, 13, 14, 25] and Amiri and Tardos [2] independently derived exact asymptotics for the fingerprinting capacity for arbitrary attacks as $C \sim 1/(2c^2 \ln 2)$, corresponding to a minimum code length of $\ell \sim 2c^2 \ln n$. Huang and Moulin [14] further showed that to achieve this bound, an encoder should use the arcsine distribution $F^*$ for generating biases $p_i$:

$$F^*(p) = \frac{2}{\pi} \arcsin\sqrt{p}. \qquad (1)$$

These capacity results were later generalized to the $q$-ary setting [4, 15], showing how the asymptotically optimal $q$-ary code length decreases with the alphabet size $q$.

##### Dynamic fingerprinting

There has also been some interest in a variant of the above fingerprinting game where several rounds of the two-player game between the distributor and the coalition are played sequentially. This allows the distributor to adjust the encoding and decoding steps of the next rounds to the knowledge obtained from previous rounds. Many of the bias-based constructions can also be used effectively in this dynamic setting [17, 20, 21] with equivalent asymptotics for the required code length, but allowing the distributor to trace all colluders even if the collusion channel is not symmetric in the colluders, and leading to significantly smaller first order terms than in the ‘static’ setting. These bias-based dynamic schemes may even be able to compete with the celebrated scheme of Fiat and Tassa [9].

##### Group testing

Finally, a different area of research closely related to fingerprinting is that of group testing, where the set of users corresponds to a set of items, the set of colluders corresponds to a subset of defective items, and where the aim of the distributor is to find all defective items by performing group tests. This game corresponds to a special case of the fingerprinting game, where the pirate attack is fixed in advance (and possibly known to the distributor) to (a variant of) the 'all-$1$ attack'. In this game it is significantly easier to find all pirates/defectives: it is known that a joint decoder asymptotically requires only $\ell \sim c \log_2 n \approx 1.44\, c \ln n$ tests [33], while simple decoders exist requiring as few as $\ell \sim e\, c \ln n \approx 2.72\, c \ln n$ tests [6]. Recent work has shown that applying results from fingerprinting to group testing may lead to improved results compared to what is known in the group testing literature [19, 23].

### 1.3 Contributions

In this work we first extend the work of Huang and Moulin [14] by deriving explicit asymptotics for the simple and joint capacities of various fingerprinting games with different amounts of side-information. Table 1 summarizes tight lower bounds on the code length constant for various informed settings, obtained via the capacities. These asymptotics can be seen as our 'targets' for the second part of this paper, which describes decoders with provable bounds on $\varepsilon_1$ and $\varepsilon_2$ that asymptotically achieve these capacities. In fact, if the collusion channel that the decoder was built against matches the attack used by the pirates, then the proof that the resulting simple decoders achieve capacity is remarkably simple, and it applies to arbitrary collusion channels $\vec{\theta}$.

##### Capacity-achieving simple decoding without cut-offs

Similar to Oosterwijk et al. [29, 30], who studied the decoder built against the interleaving attack because that attack is in a sense optimal, we then turn our attention to the simple decoder designed against the interleaving attack, and argue that it is an improved version of Oosterwijk et al.’s universal decoder. To provide a sneak preview of this result, the new score function is the following:

$$g(x,y,p) = \begin{cases} \ln\left(1 + \dfrac{p}{c(1-p)}\right) & \text{if } x = y = 0, \\[4pt] \ln\left(1 - \dfrac{1}{c}\right) & \text{if } x \neq y, \\[4pt] \ln\left(1 + \dfrac{1-p}{cp}\right) & \text{if } x = y = 1. \end{cases} \qquad (2)$$

This decoder is shown to achieve the uninformed simple capacity, and we argue that with this decoder (i) the Gaussian assumption always holds (and convergence to the normal distribution is much faster), and (ii) cut-offs on the bias distribution function are no longer needed.
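For concreteness, this score function is straightforward to implement directly from (2); the sketch below is a plain-Python rendering (the function name and signature are ours, not from the paper).

```python
import math

def interleaving_score(x, y, p, c):
    """Log-likelihood score g(x, y, p) tailored against the interleaving
    attack, for a coalition of size c and a bias p in (0, 1)."""
    if x == 0 and y == 0:
        return math.log(1 + p / (c * (1 - p)))
    if x != y:
        return math.log(1 - 1 / c)
    return math.log(1 + (1 - p) / (c * p))  # x == y == 1

# Matching symbols are rewarded, mismatches penalized:
c, p = 10, 0.3
assert interleaving_score(1, 1, p, c) > 0 > interleaving_score(0, 1, p, c)
```

One can check that under the innocent-user model (symbol and pirate output independent, each with marginal bias $p$) the expected value of $e^{g}$ equals $1$ exactly, which is the normalization property of log-likelihood scores exploited later in Section 4.1.1.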

##### Joint log-likelihood decoders

Since it is not hard to extend the definition of the simple decoder to joint decoding, we also present and analyze joint log-likelihood decoders. Analyzing these joint decoders turns out to be somewhat harder due to the 'mixed tuples' (tuples containing both guilty and innocent users), but we give some motivation why these decoders seem to work well. We also conjecture that the joint decoder tailored against the interleaving attack achieves the joint uninformed capacity, but proving this result is left for future work.

##### Applications to group testing

Since the all-$1$ attack in fingerprinting is equivalent to a problem known in the literature as group testing [21, 23], some of our results can also be applied to this area. In fact, we derive two new results in the area of group testing: (i) any simple-decoder group testing algorithm requires at least $\ell \sim c \ln(n)/\ln^2 2 \approx 2.08\, c \ln n$ group tests to find $c$ defective items hidden among $n$ items, and (ii) the decoder discussed in Section 4.1 provably achieves this optimal scaling in $c$ and $n$. This decoder was previously considered in [23], but no provable bounds on the (asymptotic) code lengths were given there.

### 1.4 Outline

The outline of the paper is as follows. Section 2 first describes the various different models we consider in this paper, and provides a roadmap for Sections 3 and 4. Section 3 discusses capacity results for each of these models, while Section 4 discusses decoders which aim to match the lower bounds on $\ell$ obtained in Section 3. Finally, in Section 5 we conclude with a brief discussion of the most important results and remaining open problems.

## 2 Different models

Let us first describe how the results in Sections 3 and 4 are structured according to different assumptions, leading to different models. Besides the general assumptions on the model discussed in the introduction, we further make a distinction between models based on (1) the computational complexity of the decoder, (2) the information about the collusion channel known to the distributor, and (3) the collusion channel used by the pirates. These are discussed in Sections 2.1, 2.2 and 2.3 respectively.

### 2.1 Decoding complexity

Commonly two types of decoders are considered, which use different amounts of information to decide whether a user should be accused or not.

1. Simple decoding: To quote Moulin [25, Section 4.3]: “The receiver makes an innocent/guilty decision on each user independently of the other users, and there lies the simplicity but also the suboptimality of this decoder.” In other words, the decision to accuse user $j$ depends only on the $j$th code word of $\mathcal{X}$, and not on other code words from $\mathcal{X}$.

2. Joint decoding: In this case, the decoder is allowed to base the decision whether to accuse a user on the entire code . Such decoders may be able to obtain smaller code lengths than possible with the best simple decoders.

Using more information generally causes the time complexity of the decoding step to go up, so usually there is a trade-off between a shorter code length and a faster decoding algorithm.

### 2.2 Side-informed distributors

We consider three different scenarios with respect to the knowledge of the distributor about the collusion channel $\vec{\theta}$. Depending on the application, different scenarios may apply.

1. Fully informed: Even before the code $\mathcal{X}$ is generated, the distributor already knows exactly what the pirate attack will be. This information can thus be used to optimize both the encoding and decoding phases. This scenario applies to various group testing models, and may apply to dynamic traitor tracing, where after several rounds the distributor may have estimated the pirate strategy.

2. Partially informed: The tracer does not know in advance what collusion channel will be used, so the encoding is aimed at arbitrary attacks. However, after obtaining the pirate output $\vec{y}$, the distributor does learn more about $\vec{\theta}$ before running an accusation algorithm, e.g. by estimating the attack based on the available data. So the encoding is uninformed, but we assume that the decoder is informed and knows $\vec{\theta}$. Since the asymptotically optimal bias distribution function in uninformed fingerprinting is known to be the arcsine distribution $F^*$, we will assume that $F^*$ is used for generating biases. This scenario is similar to EM decoding [7, 10].

3. Uninformed: In this case, both the encoding and decoding phases are assumed to be done without prior knowledge about , so also the decoder should be designed to work against arbitrary attacks. This is the most commonly studied fingerprinting game.

For simplicity of the analysis, in the partially informed setting we assume that the estimation of the collusion channel is precise, so that $\vec{\theta}$ is known exactly to the decoder. This assumption may not be realistic, but at least we can then obtain explicit expressions for the capacities, and get an idea of how much estimating the strategy may help in reducing the code length. This also allows us to derive explicit lower bounds on $\ell$: even if somehow the attack can be estimated correctly, the corresponding capacities tell us that we will still need at least a certain number of symbols to find the pirates.

### 2.3 Common collusion channels

As mentioned in the introduction, we assume that collusion channels satisfy the marking assumption, which means that $\theta_0 = 0$ and $\theta_c = 1$. For the remaining values $\theta_1, \dots, \theta_{c-1}$ the pirates are free to choose how often they want to output a $1$ when they receive $z$ ones. Some commonly considered attacks are listed below.

1. Interleaving attack: The coalition randomly selects one of its members and outputs his symbol. This corresponds to $\theta_z = z/c$. This attack is known to be asymptotically optimal (from the point of view of the colluders) in the uninformed max-min fingerprinting game [14].

2. All-$1$ attack: The pirates output a $1$ whenever they can, i.e., whenever they have at least one $1$. This translates to $\theta_z = 1$ for all $z \geq 1$. This attack is of particular interest due to its relation with group testing.

3. Majority voting: The colluders output the most common symbol among their received symbols. This means that $\theta_z = 1$ if $z > c/2$ and $\theta_z = 0$ if $z < c/2$.

4. Minority voting: The traitors output the symbol which they received the least often (but received at least once). For $0 < z < c$, this corresponds to $\theta_z = 1$ if $z < c/2$ and $\theta_z = 0$ if $z > c/2$.

5. Coin-flip attack: If the pirates receive both symbols, they flip a fair coin to decide which symbol to output. So for $0 < z < c$, this corresponds to $\theta_z = \frac{1}{2}$.

For even $c$, defining $\theta_{c/2}$ in a consistent way for majority and minority voting is not straightforward. For simplicity, in the analysis of these two attacks we will therefore assume that $c$ is odd. Note that in the uninformed setting, we do not distinguish between different collusion channels; the encoder and decoder should then work against arbitrary attacks.
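As a small illustrative sketch (the function names are ours), the five channels above can be written down directly as vectors $(\theta_0, \dots, \theta_c)$, for odd $c$ where relevant; the marking assumption fixes the endpoints in every case.

```python
def interleaving(c):
    # theta_z = z / c: output a randomly chosen colluder's symbol
    return [z / c for z in range(c + 1)]

def all_one(c):
    # output a 1 whenever at least one 1 was received
    return [float(z >= 1) for z in range(c + 1)]

def majority(c):
    # c odd: output the most common received symbol
    return [float(2 * z > c) for z in range(c + 1)]

def minority(c):
    # c odd: output the least common symbol received at least once;
    # the marking assumption forces theta_0 = 0 and theta_c = 1
    return [float(z == c or 0 < 2 * z < c) for z in range(c + 1)]

def coin_flip(c):
    # flip a fair coin whenever both symbols were received
    return [0.0] + [0.5] * (c - 1) + [1.0]
```

All five vectors satisfy $\theta_0 = 0$ and $\theta_c = 1$, so they are valid collusion channels under the marking assumption.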

The upcoming two sections about capacities (Section 3) and decoders (Section 4) are structured according to the above classification, where first the decoding complexity is chosen, then the side-information is fixed, and finally different attacks are considered. For instance, to find the joint capacity in the fully informed game one has to go to Section 3.2.1, while the new simple uninformed decoder can be found in Section 4.1.3.

## 3 Capacities

In this section we establish lower bounds on the code length of any valid decoder, by inspecting the information-theoretic capacities of the various fingerprinting games. We will use some common definitions from information theory, such as the binary entropy function $h(p) = -p \log_2(p) - (1-p)\log_2(1-p)$, the relative entropy or Kullback–Leibler divergence $d(p \| q) = p \log_2\!\left(\frac{p}{q}\right) + (1-p)\log_2\!\left(\frac{1-p}{1-q}\right)$, and the mutual information $I(X; Y)$. The results in this section build further upon previous work on this topic by Huang and Moulin [14].

### 3.1 Simple capacities

For simple decoders, we assume that the decision whether to accuse user $j$ is based solely on $\vec{x}_j$, $\vec{y}$ and $\vec{p}$. Focusing on a single position, and denoting the random variables corresponding to a colluder's symbol, the pirate output, and the bias in this position by $X_1$, $Y$ and $P$, the interesting quantity to look at [11] is the mutual information $I(X_1; Y \mid P = p)$. This quantity depends on the pirate strategy $\vec{\theta}$ and on the bias $p$. To study this mutual information we will use the following equality [14, Equation (61)],

$$I(X_1; Y \mid P = p) = p\, d(a_1 \| a) + (1-p)\, d(a_0 \| a), \qquad (3)$$

where $a$, $a_0$ and $a_1$ are defined as

$$\begin{aligned}
a &= \sum_{z=0}^{c} \binom{c}{z} p^z (1-p)^{c-z}\, \theta_z, && \text{(4)} \\
a_0 &= \sum_{z=0}^{c-1} \binom{c-1}{z} p^z (1-p)^{c-z-1}\, \theta_z, && \text{(5)} \\
a_1 &= \sum_{z=1}^{c} \binom{c-1}{z-1} p^{z-1} (1-p)^{c-z}\, \theta_z. && \text{(6)}
\end{aligned}$$

Note that given $\vec{\theta}$ and $p$, the above formulas allow us to compute the associated mutual information explicitly.
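To make this concrete, the sketch below (our own helper code, not from the paper) evaluates (3)–(6) numerically for a given channel $\vec{\theta}$ and bias $p$, with the divergence $d(\cdot\|\cdot)$ measured in bits.

```python
import math
from math import comb

def d(p, q):
    """Binary Kullback-Leibler divergence d(p||q) in bits (0 log 0 := 0)."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            total += a * math.log2(a / b)
    return total

def mutual_information(theta, p):
    """I(X1; Y | P = p) via (3), with a, a0, a1 as in (4)-(6)."""
    c = len(theta) - 1
    a = sum(comb(c, z) * p**z * (1 - p)**(c - z) * theta[z]
            for z in range(c + 1))
    a0 = sum(comb(c - 1, z) * p**z * (1 - p)**(c - z - 1) * theta[z]
             for z in range(c))
    a1 = sum(comb(c - 1, z - 1) * p**(z - 1) * (1 - p)**(c - z) * theta[z]
             for z in range(1, c + 1))
    return p * d(a1, a) + (1 - p) * d(a0, a)
```

For the interleaving attack $\theta_z = z/c$ one can verify that $a = p$, and maximizing this function over $p$ numerically recovers the behavior described in Theorem 1 as $c$ grows.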

#### 3.1.1 Fully informed

In the fully informed setting we are free to choose $\vec{p}$ to maximize the capacity, given a collusion channel $\vec{\theta}$. When the attack is known to the distributor in advance, there is no reason to use different values of $p_i$; the distributor should always use the single value of $p$ that maximizes the mutual information payoff $I(X_1; Y \mid P = p)$. Given an attack strategy $\vec{\theta}$, the capacity we are interested in is thus

$$C^s(\vec{\theta}) = \max_p I(X_1; Y \mid P = p). \qquad (7)$$

For general attacks, finding the optimal value of $p$ analytically can be hard, but for certain specific attacks we can investigate the resulting expressions individually to find the optimal values of $p$ that maximize the mutual information. This leads to the following results for the five attacks listed in Section 2.3. Proofs will appear in the full version.

###### Theorem 1.

The simple informed capacities and the corresponding optimal values of $p$ for the five attacks of Section 2.3 are:

$$\begin{aligned}
C^s(\vec{\theta}_{\text{int}}) &\sim \frac{1}{2c^2 \ln 2}, & p^s_{\text{int}} &= \frac{1}{2}, && \text{(S1)} \\
C^s(\vec{\theta}_{\text{all1}}) &\sim \frac{\ln 2}{c}, & p^s_{\text{all1}} &\sim \frac{\ln 2}{c}, && \text{(S2)} \\
C^s(\vec{\theta}_{\text{maj}}) &\sim \frac{1}{\pi c \ln 2}, & p^s_{\text{maj}} &= \frac{1}{2}, && \text{(S3)} \\
C^s(\vec{\theta}_{\text{min}}) &\sim \frac{\ln 2}{c}, & p^s_{\text{min}} &\sim \frac{\ln 2}{c}, && \text{(S4)} \\
C^s(\vec{\theta}_{\text{coin}}) &\sim \frac{\ln 2}{4c}, & p^s_{\text{coin}} &\sim \frac{\ln 2}{2c}. && \text{(S5)}
\end{aligned}$$

Since fully informed protection against the all-$1$ attack is equivalent to noiseless group testing, and since the code length scales in terms of the capacity as $\ell \sim \log_2(n)/C$, we immediately get the following corollary.

###### Corollary 1.

Any simple group testing algorithm for $c$ defectives and $n$ total items requires an asymptotic number of group tests of at least

$$\ell \sim \frac{c \ln n}{\ln^2 2} \approx 2.08\, c \ln n. \qquad (8)$$

Note that this seems to contradict earlier results of [19], which suggested that under a certain Gaussian assumption, fewer tests would suffice. This apparent contradiction is caused by the fact that the Gaussian assumption in [19] is not correct in the regime of small $p$, for which those results were derived. In fact, the distributions considered in that paper roughly behave like binomial distributions over $\ell$ trials with a success probability of the order $1/c$, which converge to Poisson distributions. Numerical inspection shows that the relevant distribution tails are indeed not very Gaussian and do not decay fast enough. Rigorous analysis of the scores in [19] shows that a somewhat larger asymptotic code length is sufficient, which is well above the lower bound of Corollary 1. Details can be found in the full version.

#### 3.1.2 Partially informed

If the encoder is uninformed, then the best he can do against arbitrary attacks (for large $c$) is to generate biases using the arcsine distribution $F^*$. So instead of computing the mutual information in one point $p$, we now average over different values of $p$, where $P$ follows the arcsine distribution. So the capacity we are interested in is given by

$$C^s(\vec{\theta}) = \mathbb{E}_p\, I(X_1; Y \mid P = p) = \int_0^1 \frac{I(X_1; Y \mid P = p)}{\pi\sqrt{p(1-p)}}\, \mathrm{d}p. \qquad (9)$$

The resulting integrals are hard to evaluate analytically, even for large $c$, although for some collusion channels we can use Pinsker's inequality (similar to the proof of [14, Theorem 7]) to bound the capacity. And indeed, if we look at the numerics in Figure 1, it seems that the partially informed capacity usually scales as $c^{-3/2}$. As a consequence, even if the attack can be estimated exactly, a code length of the order $\ell \propto c^{3/2} \ln n$ is still required to get a scheme that works. Note that for the interleaving attack, the capacity scales as $c^{-2}$.

#### 3.1.3 Uninformed

For the uninformed fingerprinting game, where both the encoder and decoder are built to work against arbitrary attacks, we are interested in the following max-min game:

$$C^s = \max_{F} \min_{\vec{\theta}} \mathbb{E}_p\, I(X_1; Y \mid P = p). \qquad (10)$$

Huang and Moulin [14, 15] previously solved this uninformed game for asymptotically large coalition sizes $c$ as follows.

###### Proposition 1.

[15, Theorem 3] The simple uninformed capacity is given by

$$C^s \sim \frac{1}{2c^2 \ln 2}, \qquad (11)$$

and the optimizing encoder and collusion channel achieving this bound for large $c$ are the arcsine distribution $F^*$ and the interleaving attack $\vec{\theta}_{\text{int}}$.

Note that while for the interleaving attack the capacity is the same (up to order terms) for each of the three side-informed cases, for the four other attacks the capacity gradually increases from the order $c^{-2}$ to $c^{-3/2}$ to $c^{-1}$ as the distributor becomes more and more informed.

### 3.2 Joint capacities

If the computational complexity of the decoder is not an issue, joint decoding may be an option. In that case, the relevant quantity to examine is the mutual information between the symbols of all colluders and the pirate output $Y$, given $P = p$: $\frac{1}{c} I(Z; Y \mid P = p)$ [14]. Note that $Y$ only depends on the colluders' symbols through the number of ones $Z$ received by the coalition, which is why it suffices to consider $Z$ here. To compute the joint capacities, we use the following convenient explicit formula [14, Equation (59)]:

$$\frac{1}{c} I(Z; Y \mid P = p) = \frac{1}{c}\left[h(a) - a_h\right], \qquad (12)$$

where $h$ is the binary entropy function, and $a_h$ is defined as

$$a_h = \sum_{z=0}^{c} \binom{c}{z} p^z (1-p)^{c-z}\, h(\theta_z). \qquad (13)$$
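Analogously to the simple case, (12)–(13) are straightforward to evaluate numerically; the helper below (ours, not from the paper) returns $\frac{1}{c} I(Z; Y \mid P = p)$.

```python
import math
from math import comb

def h(p):
    """Binary entropy function in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def joint_rate(theta, p):
    """(1/c) I(Z; Y | P = p) via (12), with a_h as in (13)."""
    c = len(theta) - 1
    binom = [comb(c, z) * p**z * (1 - p)**(c - z) for z in range(c + 1)]
    a = sum(b * t for b, t in zip(binom, theta))
    a_h = sum(b * h(t) for b, t in zip(binom, theta))
    return (h(a) - a_h) / c
```

For any deterministic channel $a_h = 0$, and choosing $p$ such that $a = \frac{1}{2}$ gives $h(a) = 1$, which already confirms the value $\frac{1}{c}$ appearing in Section 3.2.1 for the deterministic attacks.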

#### 3.2.1 Fully informed

In the fully informed setting, the capacity is again obtained by considering the mutual information and maximizing it as a function of $p$:

$$C^j(\vec{\theta}) = \frac{1}{c} \max_p I(Z; Y \mid P = p). \qquad (14)$$

Computing this is very easy for the all-$1$ attack, the majority voting attack and the minority voting attack, since one can easily prove that the joint capacity is equal to $\frac{1}{c}$ whenever the collusion channel is deterministic, i.e. when $\theta_z \in \{0, 1\}$ for all $z$. Since the capacity for the interleaving attack was already known, the only non-trivial case is the coin-flip attack. A proof of the following theorem can be found in the full version.

###### Theorem 2.

The joint informed capacities and the corresponding optimal values of $p$ for the five attacks of Section 2.3 are:

$$\begin{aligned}
C^j(\vec{\theta}_{\text{int}}) &\sim \frac{1}{2c^2 \ln 2}, & p^j_{\text{int}} &= \frac{1}{2}, && \text{(J1)} \\
C^j(\vec{\theta}_{\text{all1}}) &= \frac{1}{c}, & p^j_{\text{all1}} &\sim \frac{\ln 2}{c}, && \text{(J2)} \\
C^j(\vec{\theta}_{\text{maj}}) &= \frac{1}{c}, & p^j_{\text{maj}} &= \frac{1}{2}, && \text{(J3)} \\
C^j(\vec{\theta}_{\text{min}}) &= \frac{1}{c}, & p^j_{\text{min}} &= \frac{1}{2}, && \text{(J4)} \\
C^j(\vec{\theta}_{\text{coin}}) &\sim \frac{\log_2(5/4)}{c}, & p^j_{\text{coin}} &\sim \frac{\ln(5/3)}{c}. && \text{(J5)}
\end{aligned}$$

Recall that there is a one-to-one correspondence between the all-$1$ attack and group testing, so the result above establishes firm bounds on the asymptotic number of group tests required by any probabilistic group testing algorithm. This result was already known, and was first derived by Sebő [33, Theorem 2].

#### 3.2.2 Partially informed

For the partially informed capacity we again average the mutual information over values of $p$ drawn at random from the arcsine distribution $F^*$. Thus the capacity is given by

$$C^j(\vec{\theta}) = \frac{1}{c}\, \mathbb{E}_p\, I(Z; Y \mid P = p). \qquad (15)$$

Exact results are again hard to obtain, but we can at least compute the capacities numerically to see how they behave. Figure 2 shows the capacities for the five attacks of Section 2.3. Although the capacities are higher for joint decoding than for simple decoding, the joint capacities of all attacks but the interleaving attack also scale as $c^{-3/2}$.

#### 3.2.3 Uninformed

Finally, if we are working with joint decoders which are supposed to work against arbitrary attacks, then we are interested in the following max-min mutual information game:

$$C^j = \max_{F} \min_{\vec{\theta}} \mathbb{E}_p\, \frac{1}{c} I(Z; Y \mid P = p). \qquad (16)$$

This joint capacity game was previously solved by Huang and Moulin [14], who showed that also in the joint game, the interleaving attack and the arcsine distribution together form a saddle-point solution to the uninformed fingerprinting game.

###### Proposition 2.

[14, Theorem 6, Corollary 7] The joint uninformed capacity is given by

$$C^j \sim \frac{1}{2c^2 \ln 2}, \qquad (17)$$

and the optimizing encoder and collusion channel achieving this bound for large $c$ are the arcsine distribution $F^*$ and the interleaving attack $\vec{\theta}_{\text{int}}$.

## 4 Decoders

After deriving “targets” for our decoders in the previous section, this section discusses decoders that aim to match these bounds. We will follow the score-based framework introduced by Tardos [38], which was later generalized to joint decoders by Moulin [25]. For simple decoding, this means that a user $j$ receives a score of the form

$$S_j = \sum_{i=1}^{\ell} S_{j,i} = \sum_{i=1}^{\ell} g(X_{j,i}, y_i, p_i), \qquad (18)$$

where $g$ is called the score function. User $j$ is then accused if $S_j > \eta$ for some threshold $\eta$.
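In code, the simple score-based accusation step of (18) amounts to nothing more than the following loop (a sketch; the score function `g` and threshold `eta` are inputs, and the names are ours).

```python
def accuse_simple(X, y, p, g, eta):
    """Accuse user j iff S_j = sum_i g(X[j][i], y[i], p[i]) > eta."""
    accused = []
    for j, codeword in enumerate(X):
        S_j = sum(g(x, y_i, p_i) for x, y_i, p_i in zip(codeword, y, p))
        if S_j > eta:
            accused.append(j)
    return accused
```

All simple decoders discussed in Section 4.1, as well as their group testing specialization, fit this template; only the choice of $g$ and $\eta$ changes.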

For joint decoding, scores are assigned to tuples $T = \{j_1, \dots, j_c\}$ of $c$ distinct users according to

$$S_T = \sum_{i=1}^{\ell} S_{T,i} = \sum_{i=1}^{\ell} g(X_{j_1,i}, \ldots, X_{j_c,i}, y_i, p_i). \qquad (19)$$

In this case, a tuple of users $T$ is accused if the joint tuple score $S_T$ exceeds some other threshold $\tilde{\eta}$. Note that this accusation algorithm is not exactly well-defined, since it is possible that a user appears both in a tuple that is accused and in a tuple that is not accused. For the analysis we will assume that the scheme is only successful if the single tuple consisting of all colluders has a score exceeding $\tilde{\eta}$ and no other tuples have a score exceeding $\tilde{\eta}$, in which case all users in the guilty tuple are accused.

### 4.1 Simple decoders

Several different score functions for the simple decoder setting were considered before, but in this work we will restrict our attention to the following log-likelihood scores, which perform well and turn out to be easy to analyze:

$$g(x, y, p) = \ln\left(\frac{P_g(x, y \mid p)}{P_i(x, y \mid p)}\right). \qquad (20)$$

Here $P_g(x, y \mid p)$ corresponds to the probability of seeing the pair $(x, y)$ when user $j$ is guilty, and $P_i(x, y \mid p)$ corresponds to the same probability under the assumption that $j$ is innocent. Using this score function $g$, the complete score of a user is the logarithm of a Neyman–Pearson score over the entire code word:

$$S_j = \sum_{i=1}^{\ell} \ln\left(\frac{P_g(x_{j,i}, y_i \mid p_i)}{P_i(x_{j,i}, y_i \mid p_i)}\right) = \ln\left(\frac{P_g(\vec{x}_j, \vec{y} \mid \vec{p})}{P_i(\vec{x}_j, \vec{y} \mid \vec{p})}\right). \qquad (21)$$

Such Neyman-Pearson scores are known to be optimally discriminative to decide whether to accuse a user or not. Log-likelihood scores were previously considered in the context of fingerprinting in e.g. [24, 32].

#### 4.1.1 Fully informed

For the central theorem below, we will make use of the following function $M$, which is closely related to the moment generating functions of the scores in one position for innocent and guilty users. This function is defined as

$$M(t) = \sum_{x,y} P_i(x, y \mid p)^{1-t}\, P_g(x, y \mid p)^{t}, \qquad (22)$$

and it satisfies $M(0) = 1$ and $M(1) = 1$.
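Concretely, $M(t)$ can be computed directly from the four pair probabilities; the snippet below (our own sketch, with names of our choosing) makes the two normalization properties just mentioned easy to check numerically.

```python
def M(t, P_i, P_g):
    """M(t) = sum over (x,y) of P_i(x,y|p)^(1-t) * P_g(x,y|p)^t,
    with the two distributions given as dicts over the pairs (x, y)."""
    return sum(P_i[xy] ** (1 - t) * P_g[xy] ** t for xy in P_i)
```

At $t = 0$ the sum reduces to $\sum P_i = 1$ and at $t = 1$ to $\sum P_g = 1$, while for $0 < t < 1$ Cauchy–Schwarz gives $M(t) < 1$ whenever $P_i \neq P_g$; this strict dip below $1$ is exactly what drives the exponential error bounds in Theorem 3.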

###### Theorem 3.

Let $c$ and $\vec{\theta}$ be fixed and known to the distributor. Let $\gamma = \ln(1/\varepsilon_2)/\ln(n/\varepsilon_1)$, and let the threshold $\eta$ and code length $\ell$ be defined as

$$\eta = \ln\left(\frac{n}{\varepsilon_1}\right), \qquad \ell = \frac{\sqrt{\gamma}\,(1 + \sqrt{\gamma})}{-\ln M(1 - \sqrt{\gamma})}\, \ln\left(\frac{n}{\varepsilon_1}\right). \qquad (23)$$

Then with probability at least $1 - \varepsilon_1$ no innocent users are accused (regardless of which collusion channel was used), and with probability at least $1 - \varepsilon_2$ a colluder is caught (if the collusion channel is indeed $\vec{\theta}$).

###### Proof.

For innocent users $j$, we would like to prove that $P_i(S_j > \eta) \leq \varepsilon_1/n$, where $S_j$ is the user's total score over all positions. If this can be proved, then by the union bound it follows that with probability at least $1 - \varepsilon_1$ no innocent users are accused. Using the Markov inequality for $e^{\alpha S_j}$ with $\alpha > 0$ and optimizing over $\alpha$, we see that the optimum lies close to $\alpha = 1$. For simplicity we choose $\alpha = 1$ which, combined with the given value of $\eta$, leads to the following bound:

$$\begin{aligned}
P_i(S_j > \eta) &= \min_{\alpha>0} P_i\!\left(e^{\alpha S_j} > e^{\alpha\eta}\right) \leq \min_{\alpha>0} \frac{\mathbb{E}_i\!\left(e^{\alpha S_j}\right)}{e^{\alpha\eta}} && \text{(24)} \\
&= \min_{\alpha>0} \frac{\prod_{i=1}^{\ell} \mathbb{E}_i\!\left(e^{\alpha S_{j,i}}\right)}{e^{\alpha\eta}} = \min_{\alpha>0} \frac{M(\alpha)^{\ell}}{(n/\varepsilon_1)^{\alpha}} \leq \frac{M(1)^{\ell}}{n/\varepsilon_1} = \frac{\varepsilon_1}{n}. && \text{(25)}
\end{aligned}$$

For guilty users, we would like to prove that for an arbitrary guilty user $j$, we have $P_g(S_j < \eta) \leq \varepsilon_2$. Again using Markov's inequality (but now with a more sophisticated exponent $\beta$) we get

$$\begin{aligned}
P_g(S_j < \eta) &\leq \min_{\beta>0} \frac{\mathbb{E}_g\!\left(e^{-\beta S_j}\right)}{e^{-\beta\eta}} = \min_{\beta>0} \frac{\prod_{i=1}^{\ell} \mathbb{E}_g\!\left(e^{-\beta S_{j,i}}\right)}{e^{-\beta\eta}} && \text{(26)} \\
&= \min_{\beta>0} \frac{M(1-\beta)^{\ell}}{e^{-\beta\eta}} \leq \frac{M(1-\sqrt{\gamma})^{\ell}}{e^{-\sqrt{\gamma}\eta}} = \varepsilon_2, && \text{(27)}
\end{aligned}$$

where the last equality follows from the definitions of $\gamma$, $\eta$ and $\ell$ of (23). ∎

Compared to previous papers analyzing provable bounds on the error probabilities, the proof of Theorem 3 is remarkably short and simple. The only problem is that the given expression for $\ell$ is not very informative as to how $\ell$ scales for large $c$ and $n$. The following corollary answers this question, by showing how $\ell$ behaves for small $\gamma$.

###### Corollary 2.

If $\gamma \to 0$ then $\ell$ achieves the optimal asymptotic scaling (achieves capacity) for arbitrary $p$ and $\vec{\theta}$:

$$\ell = \frac{\log_2 n}{I(X_1; Y \mid P = p)}\left[1 + O(\sqrt{\gamma})\right]. \qquad (28)$$
###### Proof.

First, let us study the behavior of $M(1 - \sqrt{\gamma})$ for small $\gamma$, by computing the first order Taylor expansion of $M(1 - \sqrt{\gamma})$ around $\gamma = 0$:

$$\begin{aligned}
M(1 - \sqrt{\gamma}) &= \sum_{x,y} P_g(x, y \mid p) \exp\left(-\sqrt{\gamma} \ln\frac{P_g(x, y \mid p)}{P_i(x, y \mid p)}\right) && \text{(29)} \\
&\overset{(a)}{=} \sum_{x,y} P_g(x, y \mid p)\left(1 - \sqrt{\gamma} \ln\frac{P_g(x, y \mid p)}{P_i(x, y \mid p)} + O(\gamma)\right) && \text{(30)} \\
&= 1 - \sqrt{\gamma} \sum_{x,y} P_g(x, y \mid p) \ln\frac{P_g(x, y \mid p)}{P_i(x, y \mid p)} + O(\gamma) && \text{(31)} \\
&= 1 - \sqrt{\gamma}\, I(X_1; Y \mid P = p) \ln 2 + O(\gamma). && \text{(32)}
\end{aligned}$$

Here $(a)$ follows from the fact that if $P_g(x, y \mid p) \to 0$, the factor in front of the exponentiation would already cause this term to vanish, while if $P_g(x, y \mid p)$ is bounded away from $0$, then so is $P_i(x, y \mid p)$, and thus the ratio $P_g(x, y \mid p)/P_i(x, y \mid p)$ is bounded and does not depend on $\gamma$. Substituting the above result in the original expression for $\ell$ we thus get the result of (28):

$$\begin{aligned}
\ell &= \frac{\sqrt{\gamma}\,(1 + \sqrt{\gamma})}{-\ln M(1 - \sqrt{\gamma})}\, \ln\left(\frac{n}{\varepsilon_1}\right) && \text{(33)} \\
&= \frac{\sqrt{\gamma}\,(1 + \sqrt{\gamma})}{\sqrt{\gamma}\, I(X_1; Y \mid P = p) \ln 2 + O(\gamma)}\, \ln\left(\frac{n}{\varepsilon_1}\right) && \text{(34)} \\
&= \frac{\log_2 n}{I(X_1; Y \mid P = p)}\left[1 + O(\sqrt{\gamma})\right]. && \text{(35)}
\end{aligned}$$

Since the capacities tell us that any valid scheme requires $\ell \gtrsim \log_2(n)/I(X_1; Y \mid P = p)$, it follows that $\ell$ asymptotically achieves capacity. ∎

Since this construction is asymptotically optimal regardless of $\vec{\theta}$, in the fully informed setting we can now simply optimize $p$ (using Theorem 1) to get the following results.

###### Corollary 3.

Using the values for $p$ of Theorem 1, the asymptotics for $\ell$ for the five attacks of Section 2.3 are:

$$\ell(\theta_{\mathrm{int}})=2c^2\ln(n)\left[1+O(\sqrt{\gamma})\right], \tag{36}$$
$$\ell(\theta_{\mathrm{all1}})=\frac{c}{\ln^2(2)}\ln(n)\left[1+O(\sqrt{\gamma})\right], \tag{37}$$
$$\ell(\theta_{\mathrm{maj}})=\pi c\ln(n)\left[1+O(\sqrt{\gamma})\right], \tag{38}$$
$$\ell(\theta_{\mathrm{min}})=\frac{c}{\ln^2(2)}\ln(n)\left[1+O(\sqrt{\gamma})\right], \tag{39}$$
$$\ell(\theta_{\mathrm{coin}})=\frac{4c}{\ln^2(2)}\ln(n)\left[1+O(\sqrt{\gamma})\right]. \tag{40}$$

Since the all-$1$ attack is equivalent to group testing, we mention this result separately, together with a more explicit expression for the score function $g$.

###### Corollary 4.

Let $\theta = \theta_{\mathrm{all1}}$ and let $p$ be fixed. Then the log-likelihood score function is given by333To be precise: $p = 1 - 2^{-1/c} \approx \ln(2)/c$, and for convenience we have scaled $g$ by a factor $c/\ln(2)$.

$$g(x,y)=\begin{cases}+1 & (x,y)=(0,0)\\ -1+O(1/c) & (x,y)=(0,1)\\ -\infty & (x,y)=(1,0)\\ +c & (x,y)=(1,1)\end{cases} \tag{41}$$

Using this score function in combination with the parameters $\eta$ and $\ell$ of Theorem 3, we obtain a simple group testing algorithm with an asymptotic number of group tests of

$$\ell\sim\frac{c\ln n}{\ln^2(2)}\approx 2.08\,c\ln n, \tag{42}$$

thus achieving the simple group testing capacity.
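As a concrete illustration of Corollary 4, the sketch below simulates noiseless group testing and scores items with the unscaled log-likelihood scores. The parameters ($n = 500$, $c = 5$, a code length a few times the asymptotic bound) and the rank-based accusation rule are our own illustrative choices, not prescriptions from the paper.

```python
import math, random

random.seed(1)

# Illustrative parameters (our own choices).
n, c = 500, 5                        # number of items and of defectives
p = 1 - 2 ** (-1 / c)                # bias such that P(y = 0) = (1-p)^c = 1/2
ell = int(4 * c * math.log(n) / math.log(2) ** 2)   # a few times ~2.08 c ln n

defectives = set(random.sample(range(n), c))

# Unscaled log-likelihood scores g(x,y) = ln(Pg(x,y)/Pi(x,y)) for the all-1
# attack (noiseless group testing); rescaling by c/ln(2) gives (41).
Py0 = (1 - p) ** c                   # P(y = 0)
Py0_g = (1 - p) ** (c - 1)           # P(y = 0 | a given defective not in pool)
g = {(0, 0): math.log(Py0_g / Py0),
     (0, 1): math.log((1 - Py0_g) / (1 - Py0)),
     (1, 0): -math.inf,              # a defective in the pool forces y = 1
     (1, 1): math.log(1 / (1 - Py0))}

scores = [0.0] * n
for _ in range(ell):
    pool = [1 if random.random() < p else 0 for _ in range(n)]
    y = 1 if any(pool[j] for j in defectives) else 0
    for j in range(n):
        scores[j] += g[(pool[j], y)]

# Simple decoder with a rank-based accusation rule: accuse the c top scores.
accused = set(sorted(range(n), key=lambda j: -scores[j])[:c])
assert accused == defectives
```

With these parameters the defectives are recovered with high probability: any non-defective almost surely appears in some negative test and picks up a score of $-\infty$, while the defectives' scores drift upwards.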

#### 4.1.2 Partially informed

Since the score functions from Section 4.1.1 achieve capacity for each value of $p$, using these score functions we also trivially achieve the partially informed capacity when the arcsine distribution is used. Estimates of these capacities, and thus the resulting code lengths, can be found in Section 3.1.2.

#### 4.1.3 Uninformed

We now arrive at what is arguably one of the most important results of this paper. Just like Oosterwijk et al. [29], who specifically studied the score function $h$ tailored against the interleaving attack, we now also take a closer look at the log-likelihood score function designed against the interleaving attack. 444Considering the interleaving attack for designing a universal decoder is further motivated by the results of Abbe and Zheng [1, 24], who showed that under certain conditions, the decoder designed against the worst-case attack is a universal capacity-achieving decoder. The interleaving attack is theoretically not the worst-case attack for finite $c$, but since it is known to be the asymptotic worst-case attack, the difference between the worst-case attack and the interleaving attack vanishes for large $c$. Working out the details, this score function is of the form:

$$g(x,y,p)=\begin{cases}\ln\left(1+\frac{p}{c(1-p)}\right) & x=y=0\\ \ln\left(1-\frac{1}{c}\right) & x\neq y\\ \ln\left(1+\frac{1-p}{cp}\right) & x=y=1\end{cases} \tag{43}$$

The first thing to note here is that if we denote Oosterwijk et al.'s [30] score function by $h$, then $g$ satisfies

$$g(x,y,p)=\ln\left(1+\frac{h(x,y,p)}{c}\right). \tag{44}$$

If $h(x,y,p) = o(c)$, then by Taylor expanding the logarithm around $1$ we see that $g(x,y,p) \approx h(x,y,p)/c$. Since scaling a score function by a constant does not affect its performance, this implies that $g$ and $h$ are then equivalent. Since for Oosterwijk et al.'s score function one generally needs to use cut-offs on $p$ that guarantee that $h(x,y,p) = o(c)$ (cf. [16]), and since the decoder of Oosterwijk et al. is known to asymptotically achieve the uninformed capacity, we immediately get the following result.

###### Proposition 3.

The score function $g$ of (43) asymptotically achieves the uninformed simple capacity when the same cut-offs on $p$ as those in [16] are used.

So optimizing the decoder so that it is resistant against the interleaving attack again leads to a decoder that is resistant against arbitrary attacks.
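The equivalence of $g$ and $h$ for moderate biases, and their decoupling near the edges of the bias range, can be checked numerically. The following sketch is our own illustration; the parameter choices $c = 1000$, $p = 0.3$ and $p = 1/c^2$ are arbitrary.

```python
import math

# Sketch (our own illustration): for p away from 0 and 1 the log-likelihood
# score g = ln(1 + h/c) is just a rescaled h, while for tiny p the two
# decouple: h explodes as 1/p but g only grows logarithmically.
def h(x, y, p):                       # Oosterwijk et al.'s score, cf. (56)
    if x == y == 0:
        return p / (1 - p)
    if x == y == 1:
        return (1 - p) / p
    return -1.0                       # x != y

def g(x, y, p, c):                    # log-likelihood score, cf. (43)/(44)
    return math.log(1 + h(x, y, p) / c)

c = 1000
for xy in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    # First-order Taylor expansion: g ~ h/c whenever |h| << c.
    assert abs(g(*xy, 0.3, c) - h(*xy, 0.3) / c) < 1e-4

p_small = 1 / c ** 2                  # inside the usual cut-off region
assert h(1, 1, p_small) > c           # h(1,1) ~ 1/p is huge compared to c
assert g(1, 1, p_small, c) < 2 * math.log(c)   # g grows only like ln(1/(cp))
```

The last two assertions illustrate why the cut-offs matter for $h$ but not for $g$: near $p = 0$ the two scores are no longer proportional, and only $h$ blows up.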

##### Cutting off the cut-offs

Although Proposition 3 is already a nice result, we can do even better. We can prove a stronger statement, which shows one of the reasons why the log-likelihood decoder is probably more practical than the decoder of Oosterwijk et al.

###### Theorem 4.

The score function of (43) achieves the uninformed simple capacity when no cut-offs are used.

###### Proof sketch.

First note that in the limit of large $c$, the cut-offs of Ibrahimi et al. converge to $0$. So for large $c$, the difference between not using cut-offs and using cut-offs is negligible, as long as the contribution of the tails of the bias distribution near $p = 0$ and $p = 1$ to the distribution of user scores is negligible. Since with this score function the scores only grow logarithmically near the edges, all moments of both innocent and guilty user scores are finite (arbitrary powers of logarithms always lose against the density of the arcsine distribution and the decreasing width of the interval between $0$ and the cut-off), and the tails of the score distributions indeed decay exponentially. So also without cut-offs this score function asymptotically achieves the uninformed simple capacity. ∎

Note that the same result does not apply to the score function of Oosterwijk et al. [30], for which the tails of the score distributions are not Gaussian enough to omit the use of cut-offs. The main difference is that for small $p$, the score function of [29] scales as $h \sim 1/p$ (which explodes when $p$ is really small), while the log-likelihood decoder then only scales as $g \sim \ln(1/(cp))$, which is much smaller.

Let us now mention a third way to obtain a capacity-achieving uninformed simple decoder, which is again very similar to the two decoders above. To construct this decoder, we use a Bayesian approximation of the empirical mutual information decoder proposed by Moulin [25], and again plug in the asymptotic worst-case attack, the interleaving attack.

###### Theorem 5.

Using Bayesian inference with an a priori probability of guilt of $c/n$, the empirical mutual information decoder tailored against the interleaving attack can be approximated with the following score function:

$$m(x,y,p)=\begin{cases}\ln\left(1+\frac{p}{n(1-p)}\right) & x=y=0\\ \ln\left(1-\frac{1}{n}\right) & x\neq y\\ \ln\left(1+\frac{1-p}{np}\right) & x=y=1\end{cases} \tag{45}$$
###### Proof sketch.

For now, let $p$ be fixed. The empirical mutual information decoder assigns a score to a user $j$ using

$$S_j=\sum_{i=1}^{\ell}\ln\left(\frac{\hat{P}(x_{j,i},y_i)}{\hat{P}(x_{j,i})\hat{P}(y_i)}\right), \tag{46}$$

where $\hat{P}$ denotes the empirical estimate of $P$ based on the data $\vec{x}_j$, $\vec{y}$, $\vec{p}$. For large $\ell$, these estimates will converge to the real probabilities, so we can approximate $S_j$ by

$$S_j\approx\sum_{i=1}^{\ell}\ln\left(\frac{P(x_{j,i},y_i)}{P(x_{j,i})P(y_i)}\right)=\sum_{i=1}^{\ell}m(x_{j,i},y_i,p_i). \tag{47}$$

Here $P(x_{j,i})$ and $P(y_i)$ can be easily computed, but for computing $P(x_{j,i},y_i)$ we need to know whether user $j$ is guilty or not. Using Bayesian inference, we can write

$$P(x,y)=P_g(x,y)P(j\in C)+P_i(x,y)P(j\notin C). \tag{48}$$

Assuming an a priori probability of guilt of $P(j \in C) = c/n$, we can work out the details to obtain

$$m(x,y,p)=\ln\left(1+\frac{c}{n}\left[\frac{P_g(x,y)}{P_i(x,y)}-1\right]\right). \tag{49}$$

Filling in the corresponding probabilities for the interleaving attack, we end up with the score function of (45). ∎

For values of $p$ with $h(x,y,p) = o(n)$, this decoder is again equivalent to both the log-likelihood score function $g$ and Oosterwijk et al.'s score function $h$.

### 4.2 Joint decoders

For the joint decoding setting, scores are assigned to tuples of users, and again higher scores correspond to a higher probability of being accused. The most natural step from the simple log-likelihood decoders to joint decoders seems to be to use the following joint score function:

$$g(x_1,\ldots,x_c,y,p)=\ln\left(\frac{P_{gc}(x_1,\ldots,x_c,y|p)}{P_{ic}(x_1,\ldots,x_c,y|p)}\right). \tag{50}$$

Here $P_{gc}$ is the probability under the assumption that all $c$ users in this tuple are guilty, while for $P_{ic}$ we assume that all $c$ users in the tuple are innocent. Note that under the assumption that the attack is colluder-symmetric, the score function only depends on $z = x_1 + \cdots + x_c$:

$$g(x_1,\ldots,x_c,y,p)=g(z,y,p)=\ln\left(\frac{P_{gc}(z,y|p)}{P_{ic}(z,y|p)}\right). \tag{51}$$

#### 4.2.1 Fully informed

To analyze the joint decoder, we again make use of the moment generating function for the score assigned to tuples of innocent users. This function is now defined by

$$M(t)=\sum_{z,y}P_{ic}(z,y|p)^{1-t}\,P_{gc}(z,y|p)^{t}, \tag{52}$$

and it satisfies $M(0) = 1$ and $M(1) = 1$. Using similar techniques as in Section 4.1.1, we obtain the following result.

###### Theorem 6.

Let $c$ and $p$ be fixed and known to the distributor. Let $\gamma > 0$, and let the threshold $\eta$ and code length $\ell$ be defined as

$$\eta=\ln\left(\frac{n^c}{\varepsilon_1}\right),\qquad\ell=\frac{\sqrt{\gamma}(1+\sqrt{\gamma})}{-\ln M(1-\sqrt{\gamma})}\ln\left(\frac{n^c}{\varepsilon_1}\right). \tag{53}$$

Then with probability at least $1-\varepsilon_1$ all all-innocent tuples are not accused, and with probability at least $1-\varepsilon_2$ the single all-guilty tuple is accused.

###### Proof sketch.

The proof is very similar to the proof of Theorem 3. Instead of $n$ innocent users and $c$ guilty users we now have fewer than $n^c$ all-innocent tuples and just one all-guilty tuple, which changes some of the numbers in $\eta$, $\ell$, and $\varepsilon_2$. We again apply the Markov inequality with $\alpha = 1$ for innocent tuples and $\beta = \sqrt{\gamma}$ for guilty tuples, to obtain the stated bounds $\varepsilon_1$ and $\varepsilon_2$. ∎

Note that Theorem 6 does not prove that we can actually find the set of colluders with high probability, since mixed tuples consisting of both innocent and guilty users also exist, and these may or may not have a score exceeding $\eta$. The theorem does prove that with high probability we can find a set of $c$ users for which (i) all tuples not containing any of these users have a score below $\eta$, and (ii) the tuple containing exactly these users has a score above $\eta$. Regardless of what the scores for mixed tuples are, with probability at least $1 - \varepsilon_1 - \varepsilon_2$ such a set exists and contains at least one colluder. Furthermore, if this set is unique, then with high probability it is exactly the set of colluders. But there is no guarantee that it is unique without additional proofs. This is left for future work.

To further motivate why using this joint decoder may be the right choice, the following proposition shows that at least the scaling of the resulting code lengths is optimal. Note that the extra factor $c$ that we get from $\ln(n^c) = c\ln(n)$ can be combined with the mutual information $I(Z;Y|P=p)$ to obtain the factor $\frac{1}{c}I(Z;Y|P=p)$ in (54), which corresponds to the joint capacity.

###### Proposition 4.

If $\gamma \to 0$, then the code length $\ell$ of Theorem 6 scales as

$$\ell=\frac{\log_2 n}{\frac{1}{c}I(Z;Y|P=p)}\left[1+O(\sqrt{\gamma})\right], \tag{54}$$

thus asymptotically achieving the optimal code length (up to first order terms) for arbitrary values of $p$.

Since the asymptotic code length is optimal regardless of $p$, these asymptotics are also optimal when $p$ is optimized to maximize the mutual information in the fully informed setting.

Finally, although it is hard to estimate the scores of mixed tuples with this decoder, just like in [31] we expect that the joint decoder score for a tuple is roughly equal to the sum of the individual simple decoder scores. So a tuple consisting of $k$ colluders and $c - k$ innocent users is expected to have a score roughly a factor $c/k$ smaller than the expected score for the all-guilty tuple. So after computing the scores for all tuples of size $c$, we can get rough estimates of how many guilty users are contained in each tuple, and for instance try to find the set of $c$ users that best matches these estimates. There are several options for post-processing that may improve the accuracy of using this joint decoder, which are left for future work.

#### 4.2.2 Partially informed

As mentioned in Proposition 4, the code length is asymptotically optimal regardless of $p$, so the code length in the partially informed setting is also asymptotically optimal. Asymptotics on $\ell$ can thus be obtained by combining Proposition 4 with the results of Section 3.2.2.

#### 4.2.3 Uninformed

Note that if the above joint decoder turns out to work well, then we can again plug in the interleaving attack to get something that might just work well against arbitrary attacks. While we cannot prove that this joint decoder is optimal, we can already see what the score function would be, and conjecture that it works against arbitrary attacks.

###### Conjecture 1.

The joint log-likelihood decoder against the interleaving attack, with the score function defined by

$$g(z,y,p)=\begin{cases}\ln\left(1-\frac{z}{c}\right)-\ln(1-p) & (y=0)\\ \ln\left(\frac{z}{c}\right)-\ln(p) & (y=1)\end{cases} \tag{55}$$

works against arbitrary attacks and asymptotically achieves the joint capacity of the uninformed fingerprinting game.

A further study of this universal joint decoder is left as an open problem.
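To make the conjectured decoder concrete, here is a small simulation sketch (entirely our own illustration, with arbitrary parameters) of the joint score of (55) under the interleaving attack: the all-guilty tuple's total score drifts upwards at rate roughly $I(Z;Y)$ nats per segment, while an all-innocent tuple's score drifts down.

```python
import math, random

random.seed(0)

def joint_score(z, y, p, c):
    """Conjectured joint score (55) for a c-tuple with z ones, outcome y."""
    if y == 0:
        return math.log(1 - z / c) - math.log(1 - p) if z < c else -math.inf
    return math.log(z / c) - math.log(p) if z > 0 else -math.inf

# Illustrative parameters (ours): c colluders, bias p, ell segments.
c, p, ell = 5, 0.4, 2000
guilty_total = innocent_total = 0.0
for _ in range(ell):
    colluders = [1 if random.random() < p else 0 for _ in range(c)]
    bystanders = [1 if random.random() < p else 0 for _ in range(c)]
    y = random.choice(colluders)   # interleaving: copy a random colluder
    guilty_total += joint_score(sum(colluders), y, p, c)
    innocent_total += joint_score(sum(bystanders), y, p, c)

# The all-guilty tuple's score grows linearly in ell, while the all-innocent
# tuple's score drifts down (typically to -infinity, since an all-zero
# innocent tuple paired with y = 1 is impossible under the guilty hypothesis).
assert guilty_total > 0 > innocent_total
```

Note that the all-guilty tuple never receives a score of $-\infty$ here: under the interleaving attack the output symbol is always held by at least one colluder.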

## 5 Discussion

Let us now briefly discuss the results from Sections 3 and 4, their consequences, and some directions for future work.

##### Informed simple decoding

For the setting of simple decoders, we derived explicit asymptotics on the informed capacities for various attacks, which often scale as $\Theta(1/c)$. We further showed that log-likelihood scores provably match these bounds for large $n$ and $c$, regardless of $p$ and $\theta$. Because these decoders are optimal for any value of $p$, they are also optimal in the partially informed setting, where different values of $p$ are used. If the encoder uses the arcsine distribution to generate biases, we showed that these capacities generally seem to scale as $\Theta(c^{-3/2})$, which is roughly 'halfway' between the fully informed and uninformed capacities.

##### Uninformed simple decoding

Although log-likelihood decoders have already been studied before in the context of fingerprinting, the main drawback was always that to use these decoders, you would either have to fill in (and know) the exact pirate strategy, or compute the worst-case attack explicitly. So if you are in the simple uninformed setting where you don't know the pirate strategy and where the worst-case attack is not given by a nice closed-form expression [14, Fig. 4b], how can you construct such decoders for large $c$? The trick seems to be to just fill in the asymptotic worst-case attack, which Huang and Moulin showed is the interleaving attack [14], and which is much simpler to analyze. After previously suggesting this idea to Oosterwijk et al., we now used the same trick here to obtain two other capacity-achieving score functions using two different methods (but each time filling in the interleaving attack). So in total we now have three different methods to obtain (closed-form) capacity-achieving decoders in the uninformed setting:

• Using Lagrange-multipliers, Oosterwijk et al. [29] obtained:

$$h(x,y,p)=\begin{cases}+\frac{p}{1-p} & x=y=0\\ -1 & x\neq y\\ +\frac{1-p}{p} & x=y=1\end{cases} \tag{56}$$
• Using Neyman-Pearson-based log-likelihood scores, we obtained:

$$g(x,y,p)=\begin{cases}\ln\left(1+\frac{p}{c(1-p)}\right) & x=y=0\\ \ln\left(1-\frac{1}{c}\right) & x\neq y\\ \ln\left(1+\frac{1-p}{cp}\right) & x=y=1\end{cases} \tag{57}$$
• Using a Bayesian approximation of the empirical mutual information decoder of Moulin [25], we obtained:

$$m(x,y,p)=\begin{cases}\ln\left(1+\frac{p}{n(1-p)}\right) & x=y=0\\ \ln\left(1-\frac{1}{n}\right) & x\neq y\\ \ln\left(1+\frac{1-p}{np}\right) & x=y=1\end{cases} \tag{58}$$

For fixed $0 < p < 1$ and large $c$ and $n$, these score functions are equivalent up to a scaling factor:

$$h(x,y,p)\sim c\cdot g(x,y,p)\sim n\cdot m(x,y,p), \tag{59}$$

and therefore all three are asymptotically optimal. So there may be many different roads that lead to Rome, but they all seem to have one thing in common: to build a universal decoder that works against arbitrary attacks, one should build a decoder that works against the asymptotic worst-case pirate attack, the interleaving attack. And if it does work against this attack, then it probably works against any other attack as well.
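The three-way equivalence (59) can be spot-checked numerically; the following sketch uses illustrative values of $c$, $n$ and $p$ of our own choosing:

```python
import math

# Spot-check of (59): for fixed p in (0,1), h ~ c*g ~ n*m.
def h(x, y, p):                            # cf. (56)
    return {(0, 0): p / (1 - p), (1, 1): (1 - p) / p}.get((x, y), -1.0)

c, n, p = 1000, 10 ** 6, 0.25
for xy in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    cg = c * math.log(1 + h(*xy, p) / c)   # c * g(x,y,p), cf. (57)
    nm = n * math.log(1 + h(*xy, p) / n)   # n * m(x,y,p), cf. (58)
    assert abs(cg - h(*xy, p)) < 0.01      # error ~ h^2 / (2c)
    assert abs(nm - h(*xy, p)) < 0.01      # error ~ h^2 / (2n)
```

The approximation error shrinks as $c$ and $n$ grow, matching the first-order Taylor argument used in the text.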

##### Joint decoding

Although deriving the joint informed capacities is much easier than deriving the simple informed capacities, actually building decoders that provably match these bounds is a different matter. We conjectured that the same log-likelihood scores achieve capacity when a suitable accusation algorithm is used, and we conjectured that the log-likelihood score built against the interleaving attack achieves the uninformed joint capacity, but we cannot prove any of these statements beyond reasonable doubt. For now this is left as an open problem.

##### Group testing

Since the all-$1$ attack is equivalent to group testing, some of the results we obtained also apply to group testing. The joint capacity was already known [33], but to the best of our knowledge both the simple capacity (Corollary 1) and a simple decoder matching this simple capacity (Corollary 4) were not yet known before. Attempts have been made to build efficient simple decoders with a code length not much longer than the joint capacity [6], but these do not match the simple capacity. Future work will include computing the capacities and building decoders for various noisy group testing models, where the marking assumption may not apply.

##### Dynamic fingerprinting

Although this paper focused on applications to the 'static' fingerprinting game, the construction of [20] can trivially be applied to the decoders in this paper as well to build efficient dynamic fingerprinting schemes. Although the asymptotics for the code length in this dynamic construction are the same, (i) the order terms are significantly smaller in the dynamic game, (ii) one does not need the assumption that the pirate strategy is colluder-symmetric, and (iii) one does not necessarily need to know (a good estimate of) $c$ in advance [20, Section V]. An important open problem remains to determine the dynamic uninformed fingerprinting capacity, which may prove or disprove that the construction of [20] is optimal.

##### Further generalizations

While this paper already aims to provide a rather complete set of guidelines on what to do in the various different fingerprinting games (with different amounts of side-information, and different computational assumptions on the decoder), there are some further generalizations that were not considered here due to lack of space. We mention two in particular:

• Larger alphabets: In this work we focused on the binary case of $q = 2$ different symbols, but it may be advantageous to work with larger alphabet sizes $q$, since the code length decreases roughly linearly with $q$. For the results about decoders we did not really use that we were working with a binary alphabet, so it seems a straightforward exercise to prove that the $q$-ary versions of the log-likelihood decoders also achieve capacity. A harder problem seems to be to actually compute these capacities in the various informed settings, since the maximization problem then transforms from a one-dimensional optimization problem to a $(q-1)$-dimensional optimization problem.

• Tuple decoding: As in [31], we can consider a setting in between the simple and joint decoding settings, where decisions to accuse are made based on looking at tuples of users of size at most $t$. Tuple decoding may offer a trade-off between the high complexity, low code length of a joint decoder and the low complexity, higher code length of a simple decoder, and so it may be useful to know how the capacities scale in the intermediate region $1 < t < c$.

## 6 Acknowledgments

The author is very grateful to Pierre Moulin for his insightful comments and suggestions during the author’s visit to Urbana-Champaign that inspired work on this paper. The author would also like to thank Teddy Furon for pointing out the connection between decoders designed against the interleaving attack and the results of Abbe and Zheng [1], and for finding some mistakes in a preliminary version of this manuscript. Finally, the author thanks Jeroen Doumen, Jan-Jaap Oosterwijk, Boris Škorić, and Benne de Weger for valuable discussions and comments.

## References

• [1] E. Abbe and L. Zheng. Linear universal decoding for compound channels. IEEE Transactions on Information Theory, 56(12):5999–6013, 2010.
• [2] E. Amiri and G. Tardos. High rate fingerprinting codes and the fingerprinting capacity. In 20th ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 336–345, 2009.
• [3] O. Blayer and T. Tassa. Improved versions of Tardos’ fingerprinting scheme. Designs, Codes and Cryptography, 48(1):79–103, 2008.
• [4] D. Boesten and B. Škorić. Asymptotic fingerprinting capacity for non-binary alphabets. In 13th Conference on Information Hiding (IH), pages 1–13, 2011.
• [5] D. Boneh and J. Shaw. Collusion-secure fingerprinting for digital data. IEEE Transactions on Information Theory, 44(5):1897–1905, 1998.
• [6] C.-L. Chan, S. Jaggi, V. Saligrama, and S. Agnihotri. Non-adaptive group testing: explicit bounds and novel algorithms. In IEEE International Symposium on Information Theory (ISIT), pages 1837–1841, 2012.
• [7] A. Charpentier, F. Xie, C. Fontaine, and T. Furon. Expectation maximization decoding of Tardos probabilistic fingerprinting code. In SPIE Proceedings, volume 7254, 2009.
• [8] T. M. Cover and J. A. Thomas. Elements of Information Theory (2nd Edition). Wiley Press, 2006.
• [9] A. Fiat and T. Tassa. Dynamic traitor tracing. Journal of Cryptology, 14(3):354–371, 2001.
• [10] T. Furon and L. Pérez-Freire. EM decoding of Tardos traitor tracing codes. In ACM Symposium on Multimedia and Security (MM&Sec), pages 99–106, 2009.
• [11] Y.-W. Huang and P. Moulin. Capacity-achieving fingerprint decoding. In IEEE Workshop on Information Forensics and Security (WIFS), pages 51–55, 2009.
• [12] Y.-W. Huang and P. Moulin. Saddle-point solution of the fingerprinting capacity game under the marking assumption. In IEEE International Symposium on Information Theory (ISIT), pages 2256–2260, 2009.
• [13] Y.-W. Huang and P. Moulin. Maximin optimality of the arcsine fingerprinting distribution and the interleaving attack for large coalitions. In IEEE Workshop on Information Forensics and Security (WIFS), pages 1–6, 2010.