Capacities and Capacity-Achieving Decoders
for Various Fingerprinting Games
Abstract
Combining an information-theoretic approach to fingerprinting with a more constructive, statistical approach, we derive new results on the fingerprinting capacities for various informed settings, as well as new log-likelihood decoders with provable code lengths that asymptotically match these capacities. The simple decoder built against the interleaving attack is further shown to achieve the simple capacity for unknown attacks, and is argued to be an improved version of the recently proposed decoder of Oosterwijk et al. With this new universal decoder, cutoffs on the bias distribution function can finally be dismissed.
Besides the application of these results to fingerprinting, a direct consequence of our results for group testing is that (i) a simple decoder asymptotically requires a factor 1/ln 2 ≈ 1.44 more tests to find defectives than a joint decoder, and (ii) the simple decoder presented in this paper provably achieves this bound.
1 Introduction
To protect copyrighted content against unauthorized redistribution, distributors may embed watermarks or fingerprints in the content, uniquely linking copies to individual users. Then, if an illegal copy of the content is found, the distributor can extract the watermark from the copy and compare it to the database of watermarks, to determine which user was responsible.
To combat this solution, pirates may try to form a coalition of several colluders, each owning a differently watermarked copy of the content, and perform a collusion attack. By comparing their different versions of the content, they will detect differences in their copies which must be part of the watermark. They can then create a new pirate copy, where the resulting watermark matches the watermark of different pirates in different segments of the content, making it hard for the distributor to find the responsible users. Fortunately, under the assumption that in segments where the pirates do not detect any differences (because they all received the same symbol) they output this symbol unchanged (known in the literature as the Boneh–Shaw marking assumption [5]), it is still possible to find all colluders using suitable fingerprinting codes.
1.1 Model
The above fingerprinting game is often modeled as the following two-person game between the distributor and the coalition of c pirates. The set of colluders C is assumed to be a random subset of size c from the complete set of n users, and the identities of these colluders are unknown to the distributor. The aim of the game for the distributor is ultimately to discover the identities of the colluders, while the colluders want to stay hidden. The game consists of the following three phases: (i) the distributor uses an encoder to generate the fingerprints; (ii) the colluders employ a collusion channel to generate the pirate output, and (iii) the distributor uses a decoder to map the pirate output to a set of accused users.
Encoder
First, the distributor generates a fingerprinting code X, consisting of n code words x_1, …, x_n from {0,1}^ℓ. (More generally X is a code with code words of length ℓ from an alphabet of size q, but in this paper we restrict our attention to the binary case q = 2.) The i-th entry of code word x_j indicates which version of the content is assigned to user j in the i-th segment. The parameter ℓ is referred to as the code length, and the distributor would like ℓ to be as small as possible.
A common restriction on the encoding process is to assume that X is created by first generating a bias vector p ∈ (0,1)^ℓ, choosing each entry p_i independently from a certain distribution function F, and then generating the entries of X independently according to P(x_{j,i} = 1) = p_i. This guarantees that watermarks of different users are independent, and that watermarks in different positions are independent. Schemes that satisfy this assumption are sometimes called bias-based schemes, and the encoders discussed in this paper also belong to this category.
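The bias-based encoding step above can be sketched in a few lines of Python. The function name and the choice of a uniform bias distribution are ours, purely for illustration; any distribution function F can be plugged in via `sample_bias`:

```python
import random

def generate_biasbased_code(n, ell, sample_bias):
    """Bias-based encoding: draw a bias p[i] for every position i from the
    encoder's distribution F (via sample_bias), then give every user an
    independent Bernoulli(p[i]) symbol in position i."""
    p = [sample_bias() for _ in range(ell)]
    X = [[1 if random.random() < p[i] else 0 for i in range(ell)]
         for _ in range(n)]
    return p, X

# Toy example with a uniform bias distribution (a stand-in for F):
random.seed(42)
p, X = generate_biasbased_code(n=100, ell=50, sample_bias=random.random)
```

Users' code words are the rows of X; by construction, symbols are independent across users and across positions, as required for a bias-based scheme.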
Collusion channel
After generating X, the entries are used to select and embed watermarks in the content, and the content is sent out to all users. The colluders then get together, compare their copies, and use a certain collusion channel or pirate attack to select the pirate output y ∈ {0,1}^ℓ. If the pirate attack behaves symmetrically both in the colluders and in the positions, then the collusion channel can be modeled by a vector θ ∈ [0,1]^{c+1}, consisting of entries θ_z indicating the probability of outputting a 1 when the pirates received z ones and c − z zeroes. Some common attacks are described in Section 2.3.
Decoder
Finally, after the pirate output has been generated and distributed, we assume that the distributor intercepts it and applies a decoding algorithm to the pirate output y, the code X and the (secret) bias vector p to compute a set A of accused users. This is commonly done by assigning scores to users, and accusing those users whose score exceeds some predefined threshold Z. The distributor wins the game if A is non-empty and contains only colluders (i.e. ∅ ≠ A ⊆ C) and loses if this is not the case, which could be because an innocent user is falsely accused (a false positive error), or because no guilty users are accused (a false negative error). We often write ε₁ and ε₂ for upper bounds on the false positive and false negative probabilities respectively.
1.2 Related work
Work on the above bias-based fingerprinting game started in 2003, when Tardos proved that any fingerprinting scheme must satisfy ℓ = Ω(c² log(n/ε₁)), and that a bias-based scheme is able to achieve this optimal scaling [38]. He proved the latter by providing a simple and explicit construction with a code length of ℓ = 100c² ln(n/ε₁), which is known in the literature as the Tardos scheme.
Improved constructions
Later work on the constructive side of fingerprinting focused on improving upon Tardos' result by sharpening the bounds [3, 35], optimizing the distribution functions [27], improving the score function [36], tightening the bounds again with this improved score function [18, 22, 28, 34, 36, 37], optimizing the score function [29], and again tightening the bounds with this optimized score function [16, 30], to finally end up with a sufficient asymptotic code length of ℓ ∼ 2c² ln n for large c. This construction can be extended to larger alphabets of size q, in which case the code length scales as ℓ ∼ 2c² ln(n)/(q − 1). Other work on practical constructions focused on joint decoders, which are computationally more involved but may work with shorter codes [24, 25, 31], and side-informed fingerprinting games [7, 10, 21, 29], where estimating the collusion channel was considered to get an improved performance.
Recently Abbe and Zheng [1] showed that, in the context of fingerprinting [24], if the set of allowed collusion channels satisfies a certain one-sidedness condition, then a decoder that achieves capacity against the information-theoretic worst-case attack is a universal decoder achieving capacity against arbitrary attacks. The main drawback of using this result is that the worst-case attack is hard to compute, but it does offer more insight into why e.g. Oosterwijk et al. [30] obtained a universal decoder by considering the decoder against the 'interleaving attack', which is known to be the asymptotic worst-case attack.
Fingerprinting capacities
At the same time, work was also done on establishing bounds on the fingerprinting capacity C, which translate into lower bounds on the required asymptotic code length through ℓ ≥ log₂(n)/C for large n. For the binary case, Huang and Moulin [11, 12, 13, 14, 25] and Amiri and Tardos [2] independently derived exact asymptotics for the fingerprinting capacity for arbitrary attacks as C ∼ 1/(2c² ln 2), corresponding to a minimum code length of ℓ ∼ 2c² ln n. Huang and Moulin [14] further showed that to achieve this bound, an encoder should use the arcsine distribution F* for generating biases p:
(1)  f*(p) = 1 / (π √(p(1 − p))),  for p ∈ (0, 1).
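The arcsine distribution is easy to sample from, since its CDF F*(p) = (2/π) arcsin(√p) has a closed-form inverse; a minimal sketch:

```python
import math, random

def sample_arcsine():
    """Draw a bias p with density f*(p) = 1/(pi*sqrt(p(1-p))) on (0,1),
    by inverse transform sampling: p = sin(pi*u/2)^2 for uniform u."""
    u = random.random()
    return math.sin(math.pi * u / 2) ** 2

random.seed(1)
biases = [sample_arcsine() for _ in range(100000)]
mean = sum(biases) / len(biases)  # the distribution is symmetric around 1/2
```

Note how the sampled biases concentrate near 0 and 1, the regions that are most informative for distinguishing colluders from innocent users.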
These capacity results were later generalized to the q-ary setting [4, 15], showing that a q-ary code length of ℓ ∼ 2c² ln(n)/(q − 1) is asymptotically optimal.
Dynamic fingerprinting
There has also been some interest in a variant of the above fingerprinting game where several rounds of the two-player game between the distributor and the coalition are played sequentially. This allows the distributor to adjust the encoding and decoding steps of the next rounds to the knowledge obtained from previous rounds. Many of the bias-based constructions can also be used effectively in this dynamic setting [17, 20, 21] with equivalent asymptotics for the required code length, but allowing the distributor to trace all colluders even if the collusion channel is not symmetric in the colluders, and leading to significantly smaller first order terms than in the 'static' setting. These bias-based dynamic schemes may even be able to compete with the celebrated scheme of Fiat and Tassa [9].
Group testing
Finally, a different area of research closely related to fingerprinting is that of group testing, where the set of users corresponds to a set of n items, the set of colluders corresponds to a subset of c defective items, and where the aim of the distributor is to find all defective items by performing group tests. This game corresponds to a special case of the fingerprinting game, where the pirate attack is fixed in advance (and possibly known to the distributor) to (a variant of) the 'all-1 attack'. In this game it is significantly easier to find all pirates/defectives; it is known that a joint decoder asymptotically requires only c log₂ n tests [33], while efficient simple decoders are known that come within a small constant factor of this bound [6]. Recent work has shown that applying results from fingerprinting to group testing may lead to improved results compared to what is known in the group testing literature [19, 23].
1.3 Contributions
In this work we first extend the work of Huang and Moulin [14] by deriving explicit asymptotics for the simple and joint capacities of various fingerprinting games with different amounts of side-information. Table 1 summarizes tight lower bounds on the code length constants for various informed settings, obtained via the capacities. These asymptotics can be seen as our 'targets' for the second part of this paper, which describes decoders with provable bounds on the code lengths and error probabilities that asymptotically achieve these capacities. In fact, if the collusion channel that the decoder was built against matches the attack used by the pirates, then the proof that the resulting simple decoders achieve capacity is remarkably simple and holds for arbitrary attacks.
Table 1: Lower bounds on the code length constants for the five attacks of Section 2.3, for simple and joint decoding with a fully or partially informed distributor.

                        Fully informed        Partially informed
                        Simple    Joint       Simple    Joint
  Interleaving attack
  All-1 attack
  Majority voting
  Minority voting
  Coin-flip attack
Capacity-achieving simple decoding without cutoffs
Similarly to Oosterwijk et al. [29, 30], who studied the decoder built against the interleaving attack because that attack is in a sense optimal, we then turn our attention to the simple decoder designed against the interleaving attack, and argue that it is an improved version of Oosterwijk et al.'s universal decoder. To provide a sneak preview of this result, the new score function is the following:
(2)  h(x, y, p) = ln(1 + (1 − p)/(cp)) if x = y = 1;  h(x, y, p) = ln(1 + p/(c(1 − p))) if x = y = 0;  h(x, y, p) = ln(1 − 1/c) if x ≠ y.
This decoder is shown to achieve the uninformed simple capacity, and we argue that with this decoder (i) the Gaussian assumption always holds (and convergence to the normal distribution is much faster), and (ii) no cutoffs on the bias distribution function are needed anymore.
Joint loglikelihood decoders
Since it is not hard to extend the definition of the simple decoder to joint decoding, we also present and analyze joint log-likelihood decoders. Analyzing these joint decoders turns out to be somewhat harder due to the 'mixed tuples' containing both innocent and guilty users, but we give some motivation as to why these decoders seem to work well. We also conjecture that the joint decoder tailored against the interleaving attack achieves the joint uninformed capacity, but proving this result is left for future work.
Applications to group testing
Since the all-1 attack in fingerprinting is equivalent to a problem known in the literature as group testing [21, 23], some of our results can also be applied to this area. In fact, we derive two new results in the area of group testing: (i) any simple-decoder group testing algorithm requires at least (1 + o(1)) c log₂(n)/ln 2 group tests to find c defective items hidden among n items, and (ii) the decoder discussed in Section 4.1 provably achieves this optimal scaling in n. This decoder was previously considered in [23], but no provable bounds on the (asymptotic) code lengths were given there.
1.4 Outline
The outline of the paper is as follows. Section 2 first describes the various different models we consider in this paper, and provides a roadmap for Sections 3 and 4. Section 3 discusses capacity results for each of these models, while Section 4 discusses decoders which aim to match the lower bounds on obtained in Section 3. Finally, in Section 5 we conclude with a brief discussion of the most important results and remaining open problems.
2 Different models
Let us first describe how the results in Sections 3 and 4 are structured according to different assumptions, leading to different models. Besides the general assumptions on the model discussed in the introduction, we further make a distinction between models based on (1) the computational complexity of the decoder, (2) the information about the collusion channel θ known to the distributor, and (3) the collusion channel θ used by the pirates. These are discussed in Sections 2.1, 2.2 and 2.3 respectively.
2.1 Decoding complexity
Commonly two types of decoders are considered, which use different amounts of information to decide whether a user should be accused or not.

Simple decoding: To quote Moulin [25, Section 4.3]: “The receiver makes an innocent/guilty decision on each user independently of the other users, and there lies the simplicity but also the suboptimality of this decoder.” In other words, the decision to accuse user j depends only on the j-th code word x_j, and not on other code words from X.

Joint decoding: In this case, the decoder is allowed to base the decision whether to accuse a user on the entire code X. Such decoders may be able to obtain smaller code lengths than possible with the best simple decoders.
Using more information generally causes the time complexity of the decoding step to go up, so usually there is a tradeoff between a shorter code length and a faster decoding algorithm.
2.2 Sideinformed distributors
We consider three different scenarios with respect to the knowledge of the distributor about the collusion channel θ. Depending on the application, different scenarios may apply.

Fully informed: Even before X is generated, the distributor already knows exactly what the pirate attack θ will be. This information can thus be used to optimize both the encoding and decoding phases. This scenario applies to various group testing models, and may apply to dynamic traitor tracing, where after several rounds the distributor may have estimated the pirate strategy.

Partially informed: The tracer does not know in advance what collusion channel will be used, so the encoding is aimed at arbitrary attacks. However, after obtaining the pirate output y, the distributor does learn more about θ before running an accusation algorithm, e.g. by estimating the attack based on the available data. So the encoding is uninformed, but we assume that the decoder is informed and knows θ. Since the asymptotically optimal bias distribution function in fingerprinting is known to be the arcsine distribution F*, we will assume that F* is used for generating biases. This scenario is similar to EM decoding [7, 10].

Uninformed: In this case, both the encoding and decoding phases are assumed to be done without prior knowledge about , so also the decoder should be designed to work against arbitrary attacks. This is the most commonly studied fingerprinting game.
For simplicity of the analysis, in the partially informed setting we assume that the estimation of the collusion channel is precise, so that θ is known exactly to the decoder. This assumption may not be realistic, but at least we can then obtain explicit expressions for the capacities, and get an idea of how much estimating the strategy may help in reducing the code length. This also allows us to derive explicit lower bounds on ℓ: even if somehow the attack can be estimated correctly, the corresponding capacities tell us that we will still need at least a certain number of symbols to find the pirates.
2.3 Common collusion channels
As mentioned in the introduction, we assume that collusion channels satisfy the marking assumption, which means that θ₀ = 0 and θ_c = 1. For the remaining values 0 < z < c the pirates are free to choose how often they want to output a 1 when they receive z ones. Some commonly considered attacks are listed below.

Interleaving attack: The coalition randomly selects one of its members and outputs his symbol. This corresponds to θ_z = z/c. This attack is known to be asymptotically optimal (from the point of view of the colluders) in the uninformed max-min fingerprinting game [14].

All-1 attack: The pirates output a 1 whenever they can, i.e., whenever they have at least one 1. This translates to θ_z = 1 for all z ≥ 1 (and θ₀ = 0). This attack is of particular interest due to its relation with group testing.

Majority voting: The colluders output the most common symbol among their received symbols. This means that θ_z = 0 for z < c/2 and θ_z = 1 for z > c/2.

Minority voting: The traitors output the symbol which they received the least often (but received at least once). For 0 < z < c, this corresponds to θ_z = 1 if z < c/2 and θ_z = 0 if z > c/2.

Coin-flip attack: If the pirates receive both symbols, they flip a fair coin to decide which symbol to output. So for 0 < z < c, this corresponds to θ_z = 1/2.
For even c, defining θ_{c/2} in a consistent way for majority and minority voting is not straightforward. For simplicity, in the analysis of these two attacks we will therefore assume that c is odd. Note that in the uninformed setting, we do not distinguish between different collusion channels; the encoder and decoder should then work against arbitrary attacks.
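For concreteness, the five channels above can be written down explicitly as vectors (θ_0, …, θ_c); the function below is our own illustrative encoding of Section 2.3, with c odd for the two voting attacks:

```python
def theta(attack, c):
    """Return the collusion channel [theta_0, ..., theta_c] for a coalition
    of size c (take c odd for majority/minority voting)."""
    if attack == "interleaving":
        t = [z / c for z in range(c + 1)]
    elif attack == "all1":
        t = [0.0] + [1.0] * c
    elif attack == "majority":
        t = [0.0 if z < c / 2 else 1.0 for z in range(c + 1)]
    elif attack == "minority":
        t = [1.0 if z < c / 2 else 0.0 for z in range(c + 1)]
        t[0], t[c] = 0.0, 1.0          # the marking assumption forces these
    elif attack == "coinflip":
        t = [0.5] * (c + 1)
        t[0], t[c] = 0.0, 1.0
    return t

# Every channel must satisfy the marking assumption theta_0 = 0, theta_c = 1:
channels = {a: theta(a, 5) for a in
            ["interleaving", "all1", "majority", "minority", "coinflip"]}
```

Sampling the pirate output in position i then amounts to drawing a 1 with probability θ_z, where z is the number of ones the coalition received there.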
2.4 Roadmap
The upcoming two sections about capacities (Section 3) and decoders (Section 4) are structured according to the above classification, where first the decoding complexity is chosen, then the sideinformation is fixed, and finally different attacks are considered. For instance, to find the joint capacity in the fully informed game one has to go to Section 3.2.1, while the new simple uninformed decoder can be found in Section 4.1.3.
3 Capacities
In this section we establish lower bounds on the code length ℓ of any valid decoder, by inspecting the information-theoretic capacities of the various fingerprinting games. We will use some common definitions from information theory, such as the binary entropy function h(·), the relative entropy or Kullback–Leibler divergence d(·‖·), and the mutual information I(X; Y). The results in this section build further upon previous work on this topic by Huang and Moulin [14].
3.1 Simple capacities
For simple decoders, we assume that the decision whether to accuse user j is based solely on x_j, y and p. Focusing on a single position, and denoting the random variables corresponding to a colluder's symbol, the pirate output, and the bias in this position by X₁, Y and P, the interesting quantity to look at [11] is the mutual information I(X₁; Y | P = p). This quantity depends on the pirate strategy θ and on the bias p. To study this mutual information we will use the following equality [14, Equation (61)],
(3) 
where the auxiliary quantities appearing in (3) are defined as
(4)  
(5)  
(6) 
Note that given p and θ, the above formulas allow us to compute the associated mutual information explicitly.
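As a sanity check, I(X₁; Y | P = p) can also be computed directly from the joint distribution of one colluder's symbol and the pirate output; given X₁ = x, the other c − 1 colluder symbols are Binomial(c − 1, p). A small sketch (function names are ours):

```python
from math import comb, log2

def mutual_information(p, t):
    """I(X1; Y | P = p) in bits, for one colluder symbol X1 and pirate
    output Y, when c pirates use collusion channel t = [theta_0..theta_c]
    and every colluder symbol is Bernoulli(p)."""
    c = len(t) - 1
    py1 = sum(comb(c, z) * p**z * (1 - p)**(c - z) * t[z] for z in range(c + 1))
    I = 0.0
    for x in (0, 1):
        px = p if x == 1 else 1 - p
        # P(Y = 1 | X1 = x): k of the remaining c-1 pirates hold a 1.
        q = sum(comb(c - 1, k) * p**k * (1 - p)**(c - 1 - k) * t[k + x]
                for k in range(c))
        for pyx, py in ((q, py1), (1 - q, 1 - py1)):
            if pyx > 0:
                I += px * pyx * log2(pyx / py)
    return I

# Sanity check: under the interleaving attack with c = 3 pirates,
# the information per position is small but positive.
c = 3
interleaving = [z / c for z in range(c + 1)]
val = mutual_information(0.5, interleaving)
```

For c = 3, p = 1/2 and the interleaving attack this evaluates to about 0.082 bits per position.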
3.1.1 Fully informed
In the fully informed setting we are free to choose p to maximize the capacity, given a collusion channel θ. When the attack is known to the distributor in advance, there is no reason to use different values of p; the distributor should always use the single value of p that maximizes the mutual information payoff I(X₁; Y | P = p). Given an attack strategy θ, the capacity we are interested in is thus
(7)  C(θ) = max_{p ∈ (0,1)} I(X₁; Y | P = p).
For general attacks, finding the optimal value of p analytically can be hard, but for certain specific attacks we can investigate the resulting expressions individually to find the optimal values of p that maximize the mutual information. This leads to the following results for the five attacks listed in Section 2.3. Proofs will appear in the full version.
Theorem 1.
The simple informed capacities and the corresponding optimal values of p for the five attacks of Section 2.3 are:
(S1)  
(S2)  
(S3)  
(S4)  
(S5) 
Since fully informed protection against the all-1 attack is equivalent to noiseless group testing, and since the code length scales in terms of the capacity as ℓ ∼ log₂(n)/C, we immediately get the following corollary.
Corollary 1.
Any simple group testing algorithm for c defectives and n total items requires an asymptotic number of group tests of at least
(8)  ℓ ≥ (1 + o(1)) · c log₂(n) / ln 2 ≈ 1.44 c log₂(n).
Note that this seems to contradict earlier results of [19], which suggested that under a certain Gaussian assumption, fewer tests would suffice. This apparent contradiction is caused by the fact that the Gaussian assumption in [19] is not accurate in the regime for which those results were derived. In fact, the distributions considered in that paper roughly behave like binomial distributions over many trials with a small probability of success, which converge to Poisson distributions. Numerical inspection shows that the relevant distribution tails are indeed not very Gaussian and do not decay fast enough. A rigorous analysis of the scores in [19] shows that the code length that is provably sufficient lies well above the lower bound of Corollary 1. Details can be found in the full version.
3.1.2 Partially informed
If the encoder is uninformed, then the best he can do against arbitrary attacks (for large c) is to generate biases using the arcsine distribution F*. So instead of computing the mutual information in one point p, we now average over different values of p, where P follows the arcsine distribution. So the capacity we are interested in is given by
(9)  C(θ) = ∫₀¹ I(X₁; Y | P = p) f*(p) dp.
The resulting integrals are hard to evaluate analytically, even for large c, although for some collusion channels we can use Pinsker's inequality (similar to the proof of [14, Theorem 7]) to obtain bounds on the capacity. And indeed, if we look at the numerics in Figure 1, the partially informed capacities of most attacks decay noticeably faster with c than the corresponding fully informed capacities. As a consequence, even if the attack can be estimated exactly, a code length superlinear in c is still required to get a scheme that works. Note that for the interleaving attack, the capacity scales as 1/(2c² ln 2).
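The averaging over the arcsine distribution is easy to do numerically: after the substitution p = sin²(πu/2), the integral against f*(p) becomes an integral of a bounded function over u ∈ (0, 1), which a midpoint rule handles well. A sketch (the coalition size and grid size are illustrative):

```python
from math import comb, log2, sin, pi

def mutual_information(p, t):
    """I(X1; Y | P = p) in bits for collusion channel t = [theta_0..theta_c]."""
    c = len(t) - 1
    py1 = sum(comb(c, z) * p**z * (1 - p)**(c - z) * t[z] for z in range(c + 1))
    I = 0.0
    for x in (0, 1):
        px = p if x == 1 else 1 - p
        q = sum(comb(c - 1, k) * p**k * (1 - p)**(c - 1 - k) * t[k + x]
                for k in range(c))
        for pyx, py in ((q, py1), (1 - q, 1 - py1)):
            if pyx > 0:
                I += px * pyx * log2(pyx / py)
    return I

def partially_informed_capacity(t, m=2000):
    """Average I(X1; Y | P = p) over the arcsine distribution, using the
    substitution p = sin(pi*u/2)^2 so that u is uniform on (0, 1)."""
    return sum(mutual_information(sin(pi * (j + 0.5) / (2 * m))**2, t)
               for j in range(m)) / m

c = 3
coinflip = [0.0] + [0.5] * (c - 1) + [1.0]
cap = partially_informed_capacity(coinflip)
```

The same routine, applied to the five channels of Section 2.3 for growing c, reproduces the qualitative behavior shown in Figure 1.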
3.1.3 Uninformed
For the uninformed fingerprinting game, where both the encoder and decoder are built to work against arbitrary attacks, we are interested in the following max-min game:
(10)  C = max_F min_θ ∫₀¹ I(X₁; Y | P = p) f(p) dp.
Huang and Moulin [14, 15] previously solved this uninformed game for asymptotically large coalition sizes as follows.
Proposition 1.
[15, Theorem 3] The simple uninformed capacity is given by
(11)  C = (1 + o(1)) / (2c² ln 2),
and the optimizing encoder and collusion channel achieving this bound for large c are the arcsine distribution F* and the interleaving attack θ_z = z/c.
Note that while for the interleaving attack the capacity is the same (up to lower order terms) for each of the three side-informed cases, for the four other attacks the capacity gradually increases as the distributor becomes more and more informed.
3.2 Joint capacities
If the computational complexity of the decoder is not an issue, joint decoding may be an option. In that case, the relevant quantity to examine is the mutual information between the symbols of all colluders, denoted by X_C, and the pirate output Y, given P = p, normalized by the coalition size: I(X_C; Y | P = p)/c [14]. Note that the pirate output only depends on X_C through the tally Z of ones received by the coalition, so I(X_C; Y | P = p) = I(Z; Y | P = p). To compute the joint capacities, we use the following convenient explicit formula [14, Equation (59)]:
(12)  I(X_C; Y | P = p) = h(θ̄) − Σ_{z=0}^{c} (c choose z) p^z (1 − p)^{c−z} h(θ_z),
where h(·) is the binary entropy function, and θ̄ is defined as
(13)  θ̄ = Σ_{z=0}^{c} (c choose z) p^z (1 − p)^{c−z} θ_z.
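Since the pirate output depends on the coalition's symbols only through the tally Z ~ Binomial(c, p), the joint information is the binary entropy of the averaged channel minus the averaged binary entropy of the channel; a small sketch:

```python
from math import comb, log2

def h2(q):
    """Binary entropy function in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

def joint_information(p, t):
    """I(X_1..X_c; Y | P = p) = h(theta_bar) - E[h(theta_Z)], where
    Z ~ Binomial(c, p) is the number of ones the coalition receives."""
    c = len(t) - 1
    w = [comb(c, z) * p**z * (1 - p)**(c - z) for z in range(c + 1)]
    tbar = sum(wz * tz for wz, tz in zip(w, t))
    return h2(tbar) - sum(wz * h2(tz) for wz, tz in zip(w, t))

# For a deterministic channel (all theta_z in {0,1}) the conditional
# entropy term vanishes, so the information is just h(theta_bar):
c = 5
all1 = [0.0] + [1.0] * c
val = joint_information(0.5, all1)
```

This deterministic-channel observation is exactly what makes the fully informed joint capacities of the next subsection easy to compute for the voting attacks and the all-1 attack.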
3.2.1 Fully informed
In the fully informed setting, the capacity is again obtained by considering the mutual information and maximizing it as a function of p:
(14)  C(θ) = max_{p ∈ (0,1)} (1/c) · I(X_C; Y | P = p).
Computing this is very easy for the all-1 attack, the majority voting attack and the minority voting attack, since one can easily prove that the joint capacity is equal to 1/c whenever the collusion channel is deterministic, i.e. when θ_z ∈ {0, 1} for all z. Since the capacity for the interleaving attack was already known, the only nontrivial case is the coin-flip attack. A proof of the following theorem can be found in the full version.
Theorem 2.
The joint informed capacities and the corresponding optimal values of p for the five attacks of Section 2.3 are:
(J1)  
(J2)  
(J3)  
(J4)  
(J5) 
Recall that there is a one-to-one correspondence between the all-1 attack and group testing, so the result above establishes firm bounds on the asymptotic number of group tests required by any probabilistic group testing algorithm. This result was already known, and was first derived by Sebő [33, Theorem 2].
3.2.2 Partially informed
For the partially informed capacity we again average the mutual information over biases p drawn at random from the arcsine distribution F*. Thus the capacity is given by
(15)  C(θ) = (1/c) ∫₀¹ I(X_C; Y | P = p) f*(p) dp.
Exact results are again hard to obtain, but we can at least compute the capacities numerically to see how they behave. Figure 2 shows the capacities of the five attacks of Section 2.3. Although the capacities are higher for joint decoding than for simple decoding, the joint capacities of all attacks but the interleaving attack exhibit the same asymptotic scaling as their simple counterparts from Section 3.1.2.
3.2.3 Uninformed
Finally, if we are working with joint decoders which are supposed to work against arbitrary attacks, then we are interested in the following max-min mutual information game:
(16)  C = max_F min_θ (1/c) ∫₀¹ I(X_C; Y | P = p) f(p) dp.
This joint capacity game was previously solved by Huang and Moulin [14], who showed that also in the joint game, the interleaving attack and the arcsine distribution together form a saddle-point solution to the uninformed fingerprinting game.
Proposition 2.
[14, Theorem 6, Corollary 7] The joint uninformed capacity is given by
(17)  C = (1 + o(1)) / (2c² ln 2),
and the optimizing encoder and collusion channel achieving this bound for large c are the arcsine distribution F* and the interleaving attack θ_z = z/c.
4 Decoders
After deriving “targets” for our decoders in the previous section, this section discusses decoders that aim to match these bounds. We will follow the score-based framework introduced by Tardos [38], which was later generalized to joint decoders by Moulin [25]. For simple decoding, this means that a user j receives a score of the form
(18)  S_j = Σ_{i=1}^{ℓ} g(x_{j,i}, y_i, p_i),
where g is called the score function. User j is then accused if S_j > Z for some threshold Z.
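In code, the simple accusation of (18) is just a thresholded sum of per-position scores; the toy score function below is purely illustrative:

```python
def accuse_simple(X, y, p, g, Z):
    """Score-based simple decoding: user j's score is the sum of the
    per-position scores g(x, y, p), and j is accused when it exceeds Z."""
    accused = []
    for j, xj in enumerate(X):
        score = sum(g(x, yi, pi) for x, yi, pi in zip(xj, y, p))
        if score > Z:
            accused.append(j)
    return accused

# Toy example with a hypothetical score g that simply rewards matches:
g = lambda x, y, p: 1.0 if x == y else -1.0
X = [[1, 1, 0, 1], [0, 0, 1, 0]]
y = [1, 1, 0, 1]
p = [0.5] * 4
accused = accuse_simple(X, y, p, g, Z=2.0)  # user 0 scores 4, user 1 scores -4
```

The decoders of this section differ only in the choice of the score function g and the threshold Z.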
For joint decoding, scores are assigned to tuples T of distinct users according to
(19) 
In this case, a tuple of users is accused if the joint tuple score exceeds some other threshold Z'. Note that this accusation algorithm is not exactly well-defined, since it is possible that a user appears both in a tuple that is accused and in a tuple that is not accused. For the analysis we will assume that the scheme is only successful if the single tuple consisting of all colluders has a score exceeding Z' and no other tuples have a score exceeding Z', in which case all users in the guilty tuple are accused.
4.1 Simple decoders
Several different score functions for the simple decoder setting were considered before, but in this work we will restrict our attention to the following log-likelihood scores, which perform well and turn out to be easy to analyze:
(20)  g(x, y, p) = ln( p₁(x, y) / p₂(x, y) ).
Here p₁(x, y) corresponds to the probability of seeing the pair (x, y) when user j is guilty, and p₂(x, y) corresponds to the same probability under the assumption that j is innocent. Using this score function g, the complete score S_j of a user is the logarithm of a Neyman–Pearson score over the entire code word:
(21)  S_j = ln( P(x_j, y | user j guilty) / P(x_j, y | user j innocent) ).
Such Neyman–Pearson scores are known to be optimally discriminative for deciding whether to accuse a user or not. Log-likelihood scores were previously considered in the context of fingerprinting in e.g. [24, 32].
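Given a collusion channel θ and a bias p, the log-likelihood score of (20) can be built mechanically from the two hypotheses: for a guilty user, (x, y) follows the joint colluder distribution, while for an innocent user, x and y are independent. A sketch (the helper below is our own; the P(x) factors cancel in the ratio, and the score can be −∞ for deterministic channels):

```python
from math import comb, log

def loglikelihood_score(t, p):
    """Return g(x, y, p) = ln(p1(x,y)/p2(x,y)) for collusion channel
    t = [theta_0..theta_c]: p1 is the joint distribution of (colluder
    symbol, pirate output), p2 the one for an innocent user."""
    c = len(t) - 1
    py1 = sum(comb(c, z) * p**z * (1 - p)**(c - z) * t[z] for z in range(c + 1))
    def g(x, y, _p=None):
        # P(Y = 1 | X1 = x, guilty): k of the other c-1 pirates hold a 1.
        q = sum(comb(c - 1, k) * p**k * (1 - p)**(c - 1 - k) * t[k + x]
                for k in range(c))
        p_y_given_x = q if y == 1 else 1 - q
        p_y = py1 if y == 1 else 1 - py1
        return log(p_y_given_x / p_y)   # the P(x) factors cancel out
    return g

# For the interleaving attack this reproduces ln(1 + (1-p)/(c*p)) at x = y = 1:
c, p = 4, 0.3
g = loglikelihood_score([z / c for z in range(c + 1)], p)
```

Note how this construction specializes, for the interleaving attack, to exactly the score function previewed in (2).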
4.1.1 Fully informed
For the central theorem below, we will make use of the following function, which is closely related to the moment generating functions of the scores in one position for innocent and guilty users. This function is defined as
(22) 
and it satisfies and .
Theorem 3.
Let c and θ be fixed and known to the distributor. Let ε₁, ε₂ ∈ (0, 1), and let the threshold Z and code length ℓ be defined as
(23) 
Then with probability at least 1 − ε₁ no innocent users are accused (regardless of which collusion channel was used), and with probability at least 1 − ε₂ a colluder is caught (if the collusion channel is indeed θ).
Proof.
For innocent users j, we would like to prove that P(S_j > Z) ≤ ε₁/n, where S_j is the user's total score over all positions; it then follows that with probability at least 1 − ε₁ no innocent users are accused. Using the Markov inequality for exp(αS_j) with α > 0 and optimizing over α, we see that the optimum lies close to α = 1. For simplicity we choose α = 1 which, combined with the given value of Z, leads to the following bound:
(24)  
(25) 
For guilty users, we would like to prove that for an arbitrary guilty user j, we have P(S_j ≤ Z) ≤ ε₂. Again using Markov's inequality (but now with a more sophisticated choice of the exponent) we get
(26)  
(27) 
where the last equality follows from the definitions of Z and ℓ of (23). ∎
Compared to previous papers analyzing provable bounds on the error probabilities, the proof of Theorem 3 is remarkably short and simple. The only problem is that the given expression for ℓ is not very informative as to how ℓ scales for large c. The following corollary answers this question, by studying the behavior of the function of (22) for small arguments.
Corollary 2.
Under a mild condition on the collusion channel, ℓ achieves the optimal asymptotic scaling (achieves capacity) for arbitrary θ:
(28) 
Proof.
First, let us study the behavior of the function of (22) for small arguments, by computing its first order Taylor expansion around zero:
(29)  
(30)  
(31)  
(32) 
Here follows from the fact that if , the factor in front of the exponentiation would already cause this term to be , while if , then also and thus the ratio is bounded and does not depend on . Substituting the above result in the original equation for we thus get the result of (28):
(33)  
(34)  
(35) 
Since the capacities tell us that no smaller ℓ is asymptotically possible, it follows that this choice of ℓ asymptotically achieves capacity. ∎
Since this construction is asymptotically optimal regardless of θ, in the fully informed setting we can now simply optimize p (using Theorem 1) to get the following results.
Corollary 3.
Since the all-1 attack is equivalent to group testing, we mention this result separately, together with a more explicit expression for the score function.
Corollary 4.
Let θ be the all-1 attack and let p ∈ (0, 1) be fixed. Then the log-likelihood score function is given by (to be precise, for convenience we have scaled the score function by a constant factor, which does not affect its performance)
(41) 
Using this score function in combination with the parameters Z and ℓ of Theorem 3, we obtain a simple group testing algorithm with an asymptotic number of group tests of
(42)  ℓ = (1 + o(1)) · c log₂(n) / ln 2,
thus achieving the simple group testing capacity.
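For the all-1 attack the two hypotheses behind the log-likelihood score have simple closed forms: a defective item with x = 1 forces a positive test, while for an innocent item the test is positive with probability 1 − (1 − p)^c. A hedged sketch of the resulting score (our own normalization, without the constant scaling mentioned above):

```python
from math import log

def gt_score(x, y, p, c):
    """Log-likelihood score for noiseless group testing (the all-1 attack):
    g = ln P(y | x, guilty) - ln P(y | innocent).  An item with x = 1 in a
    negative pool is certainly non-defective, giving score -infinity."""
    py1 = 1 - (1 - p)**c                     # P(test positive), innocent item
    py1_x = 1.0 if x == 1 else 1 - (1 - p)**(c - 1)
    p_num = py1_x if y == 1 else 1 - py1_x
    p_den = py1 if y == 1 else 1 - py1
    if p_num == 0:
        return float("-inf")                 # x = 1 but the test was negative
    return log(p_num / p_den)

c, p = 10, 0.07
s_match = gt_score(1, 1, p, c)               # defective-looking: positive score
s_clash = gt_score(1, 0, p, c)               # impossible for a defective item
```

The hard exclusion at score −∞ is what makes this decoder so effective: a single negative test clears every item it contains.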
4.1.2 Partially informed
Since the score functions from Section 4.1.1 achieve capacity for each value of p, using these score functions we also trivially achieve the partially informed capacity when the arcsine distribution F* is used. Estimates of these capacities, and thus the resulting code lengths, can be found in Section 3.1.2.
4.1.3 Uninformed
We now arrive at what is arguably one of the most important results of this paper. Just like Oosterwijk et al. [29], who specifically studied the score function tailored against the interleaving attack, we now also take a closer look at the log-likelihood score function designed against the interleaving attack. (Considering the interleaving attack for designing a universal decoder is further motivated by the results of Abbe and Zheng [1, 24], who showed that under certain conditions, the worst-case attack decoder is a universal capacity-achieving decoder. The interleaving attack is theoretically not the worst-case attack for finite c, but since it is known to be the asymptotic worst-case attack, the difference between the worst-case attack and the interleaving attack vanishes for large c.) Working out the details, this score function is of the form:
(43)  h(x, y, p) = ln(1 + (1 − p)/(cp)) if x = y = 1;  h(x, y, p) = ln(1 + p/(c(1 − p))) if x = y = 0;  h(x, y, p) = ln(1 − 1/c) if x ≠ y.
The first thing to note here is that if we denote Oosterwijk et al.'s [30] score function by g, then the score function h of (43) satisfies
(44)  h(x, y, p) = ln(1 + g(x, y, p)/c).
If c is large, then by Taylor expanding the logarithm around 1 we see that h(x, y, p) ≈ g(x, y, p)/c. Since scaling a score function by a constant does not affect its performance, this implies that h and g are then equivalent. Since for Oosterwijk et al.'s score function one generally needs to use cutoffs on the bias distribution that keep p away from 0 and 1 (cf. [16]), and since the decoder of Oosterwijk et al. is known to asymptotically achieve the uninformed capacity, we immediately get the following result.
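The relation between the two decoders is easy to verify numerically, assuming Oosterwijk et al.'s score takes the form g(1,1,p) = (1−p)/p, g(0,0,p) = p/(1−p) and g(x,y,p) = −1 for x ≠ y:

```python
from math import log

def g_oosterwijk(x, y, p):
    """Assumed form of Oosterwijk et al.'s interleaving-based score."""
    if x == 1 and y == 1:
        return (1 - p) / p
    if x == 0 and y == 0:
        return p / (1 - p)
    return -1.0

def h_loglik(x, y, p, c):
    """Log-likelihood score against the interleaving attack, written as
    ln(1 + g(x, y, p)/c) with g as above."""
    return log(1 + g_oosterwijk(x, y, p) / c)

# For moderate p and large c, h is close to g/c (first-order Taylor of ln),
# while for small p the log tames the 1/p explosion of g:
c, p = 100, 0.4
diff = abs(h_loglik(1, 1, p, c) - g_oosterwijk(1, 1, p) / c)
```

The difference is of second order in g/c, which is why the two decoders behave identically for large c while their tail behavior for extreme biases differs markedly.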
Proposition 3.
So optimizing the decoder against the interleaving attack again leads to a decoder that is resistant against arbitrary attacks.
Cutting off the cutoffs
Although Proposition 3 is already a nice result, we can do even better and prove a stronger statement, which shows one of the reasons why the log-likelihood decoder is likely more practical than the decoder of Oosterwijk et al.
Theorem 4.
The score function of (43) achieves the uninformed simple capacity when no cutoffs are used.
Proof sketch.
First note that in the limit of large , the cutoffs of Ibrahimi et al. converge to . So for large , the difference between using and not using cutoffs is negligible, as long as the contribution of the tails of near or to the distribution of user scores is negligible. With this score function , all moments of both innocent and guilty user scores are finite (arbitrary powers of logarithms always lose against the of the arcsine distribution and the decreasing width of the interval between and the cutoff), so the tails indeed decay exponentially. Hence this score function asymptotically achieves the uninformed simple capacity even without cutoffs. ∎
Note that the same result does not apply to the score function of Oosterwijk et al. [30], for which the tails of the score distributions are not Gaussian enough to justify omitting the cutoffs. The main difference is that for small , the score function of [29] scales as (which explodes when is very small), while the log-likelihood decoder only scales as , which is much smaller.
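To illustrate the difference in small-p behaviour, the expressions below are rough proxies for the two growth rates described above (an inverse-power blow-up versus a logarithm); the exact exponents are our illustrative assumption, not taken from the text.

```python
import math

# Proxy growth rates near p = 0: a score that blows up polynomially in 1/p,
# versus a log-likelihood-type score growing only as log(1/p).
for p in [1e-2, 1e-4, 1e-6]:
    assert math.log(1 / p) < 1 / p   # logarithmic growth is far milder
```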
All roads lead to Rome
Let us now mention a third way to obtain a capacity-achieving uninformed simple decoder, which is again very similar to the two decoders above. To construct this decoder, we use a Bayesian approximation of the empirical mutual information decoder proposed by Moulin [25], and again plug in the asymptotic worst-case attack: the interleaving attack.
Theorem 5.
Using Bayesian inference with an a priori probability of guilt of , the empirical mutual information decoder tailored against the interleaving attack can be approximated with the following score function:
(45) 
Proof sketch.
For now, let be fixed. The empirical mutual information decoder assigns a score to a user using
(46) 
where denotes the empirical estimate of based on the data , , . For large , these estimates will converge to the real probabilities, so we can approximate by
(47) 
Here and can be easily computed, but for computing we need to know whether user is guilty or not. Using Bayesian inference, we can write
(48) 
Assuming an a priori probability of guilt of , we can work out the details to obtain
(49) 
Filling in the corresponding probabilities for the interleaving attack, we end up with the score function of (45). ∎
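The Bayesian step in this proof is of the standard posterior-update form. The prior and likelihood values below are hypothetical numbers chosen only to show the computation; pi plays the role of an a priori probability of guilt.

```python
# Posterior probability of guilt after one observation, via Bayes' rule.
# pi, lik_g, lik_i are made-up values for illustration only.
pi = 5 / 1000                      # hypothetical prior probability of guilt
lik_g, lik_i = 0.7, 0.5            # hypothetical P(data | guilty), P(data | innocent)
post = pi * lik_g / (pi * lik_g + (1 - pi) * lik_i)
assert post > pi                   # evidence favouring guilt raises the posterior
```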
For values of with , this decoder is again equivalent to both the log-likelihood score function and Oosterwijk et al.'s score function .
4.2 Joint decoders
For the joint decoding setting, scores are assigned to tuples of users, and again higher scores correspond to a higher probability of being accused. The most natural step from the simple log-likelihood decoders to joint decoders seems to be to use the following joint score function:
(50) 
Here is under the assumption that in this tuple all users are guilty, while for we assume that all users are innocent. Note that under the assumption that the attack is colluder-symmetric, the score function only depends on :
(51) 
4.2.1 Fully informed
To analyze the joint decoder, we again make use of the moment generating function for the score assigned to tuples of innocent users. This function is now defined by
(52) 
and it satisfies and . Using techniques similar to those in Section 4.1.1, we obtain the following result.
Theorem 6.
Let and be fixed and known to the distributor. Let , and let the threshold and code length be defined as
(53) 
Then with probability at least all all-innocent tuples are not accused, and with probability at least the single all-guilty tuple is accused.
Proof sketch.
The proof is very similar to that of Theorem 3. Instead of innocent and guilty users we now have all-innocent tuples and a single all-guilty tuple, which changes some of the numbers in , and . We again apply the Markov inequality with for innocent tuples and for guilty tuples, to obtain the given expressions for and . ∎
Note that Theorem 6 does not prove that we can actually find the set of colluders with high probability, since mixed tuples consisting of both innocent and guilty users also exist, and these may or may not have a score exceeding . It does prove that with high probability we can find a set of users such that (i) all tuples not containing these users have a score below , and (ii) the tuple containing exactly these users has a score above . Regardless of what the scores for mixed tuples are, with probability at least such a set exists and contains at least one colluder. Furthermore, if this set is unique, then with high probability it is exactly the set of colluders. But without additional proofs there is no guarantee that it is unique; this is left for future work.
To further motivate why using this joint decoder may be the right choice, the following proposition shows that at least the scaling of the resulting code lengths is optimal. Note that the extra that we get from can be combined with the mutual information to obtain , which corresponds to the joint capacity.
Proposition 4.
If then the code length of Theorem 6 scales as
(54) 
thus asymptotically achieving the optimal code length (up to first-order terms) for arbitrary values of .
Since the asymptotic code length is optimal regardless of , these asymptotics are also optimal when is optimized to maximize the mutual information in the fully informed setting.
Finally, although it is hard to estimate the scores of mixed tuples with this decoder, just like in [31] we expect that the joint decoder score for a tuple is roughly equal to the sum of the individual simple decoder scores. So a tuple of users consisting of colluders and innocent users is expected to have a score roughly a factor smaller than the expected score for the all-guilty tuple. After computing the scores for all tuples of size , we can thus obtain rough estimates of how many guilty users each tuple contains, and for instance try to find the set of users that best matches these estimates. Several options for post-processing may further improve the accuracy of this joint decoder; these are left for future work.
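The heuristic above can be illustrated with toy numbers. Assuming, as in the [31]-style reasoning, that a tuple's joint score is close to the sum of its members' simple scores, the ratio of a tuple's score to the all-guilty tuple's score roughly reveals how many colluders it contains. All scores and the user set below are made up for illustration.

```python
from itertools import combinations

# Hypothetical simple-decoder scores: users a, b, c are the colluders.
simple = {'a': 5.0, 'b': 4.8, 'c': 5.1, 'x': 0.1, 'y': -0.2}
guilty = {'a', 'b', 'c'}
c = len(guilty)
all_guilty_score = sum(simple[u] for u in guilty)
for tup in combinations(simple, c):
    tuple_score = sum(simple[u] for u in tup)      # sum-of-simple-scores heuristic
    k_est = round(c * tuple_score / all_guilty_score)
    assert k_est == len(set(tup) & guilty)         # estimate matches the true count
```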
4.2.2 Partially informed
4.2.3 Uninformed
Note that if the above joint decoder indeed works well, then we can again plug in the interleaving attack to obtain a decoder that may also work well against arbitrary attacks. While we cannot prove that this joint decoder is optimal, we can already see what the score function would be, and we conjecture that it works against arbitrary attacks.
Conjecture 1.
The joint log-likelihood decoder against the interleaving attack, with the score function defined by
(55) 
works against arbitrary attacks and asymptotically achieves the joint capacity of the uninformed fingerprinting game.
A further study of this universal joint decoder is left as an open problem.
5 Discussion
Let us now briefly discuss the results from Sections 3 and 4, their consequences, and some directions for future work.
Informed simple decoding
For the setting of simple decoders, we derived explicit asymptotics for the informed capacities under various attacks, which often scale as . We further showed that log-likelihood scores provably match these bounds for large , regardless of and . Because these decoders are optimal for any value of , they are also optimal in the partially informed setting, where different values of are used. If the encoder uses the arcsine distribution to generate biases, we showed that these capacities generally seem to scale as , which is roughly 'halfway' between the fully informed and uninformed capacities.
Uninformed simple decoding
Although log-likelihood decoders have been studied before in the context of fingerprinting, the main drawback was always that to use these decoders, one would either have to fill in (and thus know) the exact pirate strategy, or compute the worst-case attack explicitly. So if you are in the simple uninformed setting, where you do not know the pirate strategy and where the worst-case attack is not given by a nice closed-form expression [14, Fig. 4b], how can you construct such decoders for large ? The trick seems to be to simply fill in the asymptotic worst-case attack, which Huang and Moulin [14] showed is the interleaving attack, and which is much easier to analyze. After previously suggesting this idea to Oosterwijk et al., we used the same trick here to obtain two other capacity-achieving score functions via two different methods (but each time filling in the interleaving attack). So in total we now have three different methods to obtain (closed-form) capacity-achieving decoders in the uninformed setting:

Using Lagrange multipliers, Oosterwijk et al. [29] obtained:
(56) 
Using Neyman–Pearson-based log-likelihood scores, we obtained:
(57) 
Using a Bayesian approximation of the empirical mutual information decoder of Moulin [25], we obtained:
(58)
For and large , these score functions are equivalent up to a scaling factor:
(59) 
and therefore all three are asymptotically optimal. So there may be many different roads leading to Rome, but they all seem to have one thing in common: to build a universal decoder that works against arbitrary attacks, one should build a decoder that works against the asymptotic worst-case pirate attack, the interleaving attack. And if it works against this attack, then it probably works against any other attack as well.
Joint decoding
Although deriving the joint informed capacities is much easier than deriving the simple informed capacities, actually building decoders that provably match these bounds is a different matter. We conjectured that the same log-likelihood scores achieve capacity when a suitable accusation algorithm is used, and that the log-likelihood score built against the interleaving attack achieves the uninformed joint capacity, but we cannot yet prove either statement. For now this is left as an open problem.
Group testing
Since the all attack is equivalent to group testing, some of the results obtained here also apply to group testing. The joint capacity was already known [33], but to the best of our knowledge both the simple capacity (Corollary 1) and a simple decoder matching this capacity (Corollary 4) were not known before. Attempts have been made to build efficient simple decoders with a code length not much larger than the joint capacity [6], but these do not attain the simple capacity. Future work includes computing the capacities and building decoders for various noisy group testing models, where the marking assumption may not apply.
Dynamic fingerprinting
Although this paper focused on applications to the ‘static’ fingerprinting game, the construction of [20] can trivially be applied to the decoders in this paper as well to build efficient dynamic fingerprinting schemes. Although the asymptotics for the code length in this dynamic construction are the same, (i) the order terms are significantly smaller in the dynamic game, (ii) one does not need the assumption that the pirate strategy is colludersymmetric, and (iii) one does not necessarily need to know (a good estimate of) in advance [20, Section V]. An important open problem remains to determine the dynamic uninformed fingerprinting capacity, which may prove or disprove that the construction of [20] is optimal.
Further generalizations
While this paper aims to provide a rather complete set of guidelines on what to do in the various fingerprinting games (with different amounts of side-information, and different computational assumptions on the decoder), there are some further generalizations that were not considered here due to lack of space. We mention two in particular:

Larger alphabets: In this work we focused on the binary case of different symbols, but it may be advantageous to work with larger alphabet sizes , since the code length decreases linearly with . For the results about decoders we did not really use that we were working with a binary alphabet, so it seems a straightforward exercise to prove that the non-binary versions of the log-likelihood decoders also achieve capacity. A harder problem is to actually compute these capacities in the various informed settings, since the maximization problem then transforms from a one-dimensional optimization problem into a higher-dimensional one.

Tuple decoding: As in [31], we can consider a setting in between the simple and joint decoding settings, where decisions to accuse are based on tuples of users of size at most . Tuple decoding may offer a trade-off between the high complexity but low code length of a joint decoder and the low complexity but higher code length of a simple decoder, so it may be useful to know how the capacities scale in the regime .
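The complexity side of this trade-off is easy to quantify: the number of tuple scores to compute grows as the binomial coefficient of the number of users n over the tuple size s. The values of n and c below are illustrative.

```python
from math import comb

n, c = 1000, 5
# s = 1 corresponds to simple decoding, s = c to full joint decoding.
counts = [comb(n, s) for s in range(1, c + 1)]
assert counts[0] == n              # simple decoding: one score per user
assert counts[-1] > 8 * 10**12     # joint decoding: ~8.25e12 tuple scores
```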
6 Acknowledgments
The author is very grateful to Pierre Moulin for his insightful comments and suggestions during the author's visit to Urbana-Champaign that inspired work on this paper. The author would also like to thank Teddy Furon for pointing out the connection between decoders designed against the interleaving attack and the results of Abbe and Zheng [1], and for finding some mistakes in a preliminary version of this manuscript. Finally, the author thanks Jeroen Doumen, Jan-Jaap Oosterwijk, Boris Škorić, and Benne de Weger for valuable discussions and comments.
References
 [1] E. Abbe and L. Zheng. Linear universal decoding for compound channels. IEEE Transactions on Information Theory, 56(12):5999–6013, 2010.
 [2] E. Amiri and G. Tardos. High rate fingerprinting codes and the fingerprinting capacity. In 20th ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 336–345, 2009.
 [3] O. Blayer and T. Tassa. Improved versions of Tardos’ fingerprinting scheme. Designs, Codes and Cryptography, 48(1):79–103, 2008.
 [4] D. Boesten and B. Škorić. Asymptotic fingerprinting capacity for nonbinary alphabets. In 13th Conference on Information Hiding (IH), pages 1–13, 2011.
 [5] D. Boneh and J. Shaw. Collusion-secure fingerprinting for digital data. IEEE Transactions on Information Theory, 44(5):1897–1905, 1998.
 [6] C.L. Chan, S. Jaggi, V. Saligrama, and S. Agnihotri. Non-adaptive group testing: explicit bounds and novel algorithms. In IEEE International Symposium on Information Theory (ISIT), pages 1837–1841, 2012.
 [7] A. Charpentier, F. Xie, C. Fontaine, and T. Furon. Expectation maximization decoding of Tardos probabilistic fingerprinting code. In SPIE Proceedings, volume 7254, 2009.
 [8] T. M. Cover and J. A. Thomas. Elements of Information Theory (2nd Edition). Wiley Press, 2006.
 [9] A. Fiat and T. Tassa. Dynamic traitor tracing. Journal of Cryptology, 14(3):354–371, 2001.
 [10] T. Furon and L. Pérez-Freire. EM decoding of Tardos traitor tracing codes. In ACM Symposium on Multimedia and Security (MM&Sec), pages 99–106, 2009.
 [11] Y.W. Huang and P. Moulin. Capacity-achieving fingerprint decoding. In IEEE Workshop on Information Forensics and Security (WIFS), pages 51–55, 2009.
 [12] Y.W. Huang and P. Moulin. Saddle-point solution of the fingerprinting capacity game under the marking assumption. In IEEE International Symposium on Information Theory (ISIT), pages 2256–2260, 2009.
 [13] Y.W. Huang and P. Moulin. Maximin optimality of the arcsine fingerprinting distribution and the interleaving attack for large coalitions. In IEEE Workshop on Information Forensics and Security (WIFS), pages 1–6, 2010.