Optimal Compression for Two-Field Entries in Fixed-Width Memories
Abstract
Data compression is a well-studied (and well-solved) problem in the setup of long coding blocks. But important emerging applications need to compress data to memory words of small fixed widths. This new setup is the subject of this paper. In the problem we consider we have two sources with known discrete distributions, and we wish to find codes that maximize the success probability that the two source outputs are represented in bits or less. A good practical use for this problem is a table with two-field entries that is stored in a memory of a fixed width . Such tables of very large sizes drive the core functionalities of network switches and routers. After defining the problem formally, we solve it optimally with an efficient code-design algorithm. We also solve the problem in the more constrained case where a single code is used in both fields (to save space for storing code dictionaries). For both code-design problems we find decompositions that yield efficient dynamic-programming algorithms. With an empirical study we show the success probabilities of the optimal codes for different distributions and memory widths. In particular, the study demonstrates the superiority of the new codes over existing compression algorithms.
I Introduction
In the best-known data-compression problem, a discrete distribution on source elements is given, and one wishes to find a representation for the source elements with minimum expected length. This problem was solved by the well-known Huffman coding [2]. Huffman coding reduces the problem to a very efficient recursive algorithm on a tree, which solves it optimally. Indeed, Huffman coding has found use in numerous communication and storage applications. Minimizing the expected length of the coded sequence emitted by the source translates to optimally low transmission or storage costs in those systems. However, there is an important setup in which minimizing the expected length does not translate to optimal improvement in system performance. This setup is fixed-width memory. Information is stored in a fixed-width memory in words of bits each. A word is designated to store an entry of fields, each emitted by a source. In this case the prime objective is to fit the representations of the sources into bits, and not to minimize the expected length for each source.
The fixed-width memory setup is extremely useful in real applications. The most immediate use of it is in networking applications, wherein switches and routers rely on fast access to entries of very large tables [3]. The fast access requires simple word addressing, and the large table size requires good compression. In addition to this principal application, data-centric computing applications can similarly benefit from efficient compression of large data sets in fixed-width memories. In this paper we consider the compression of data entries with fields into memory words of bits, where is a parameter. This special case of two fields is motivated by switch and router tables in which two-field tables are most common. Throughout the paper we assume that the element chosen for an entry field is drawn i.i.d. from a known distribution for that field, and that elements of distinct fields are independent of each other.
Our operational model for the compression assumes that success in fitting an entry to bits translates to fast data access, while failure results in a much slower access because a secondary slower memory is accessed to fit the entry. Therefore, we want to maximize the number of entries we succeed in fitting, but do allow some (small) fraction of failures. Correspondingly, the performance measure we consider throughout the paper is : the probability that the encoding of the two-field entry fits in bits. That is, we seek a uniquely decodable encoding function that on some entries is allowed to declare encoding failure, but does so with the lowest possible probability for the source distributions. We emphasize that we do not allow decoding errors, and all entries that succeed encoding are recovered perfectly at read time.
Maximizing is trivial when we encode the two fields jointly by a single code: we simply take the highest probabilities in the product distribution of the two fields, and represent each of these pairs with a codeword of length . However, this solution has prohibitive cost because it requires a dictionary with size quadratic in the support of the individual distributions. For example, if each field takes one of values, we will need a dictionary of order in the trivial solution. To avoid the space blowup, we restrict the code to use dictionaries of sizes linear in the distribution supports. The way to guarantee linear dictionary sizes is by composing the entry code from two codes for the individual fields. Our task is to find two codes for the two fields, such that when the two codewords are placed together in the width word, the entry fields can be decoded unambiguously. The design objective for the codes is to maximize : the probability that the above encoding of the entry has length of at most bits. We first show by example that Huffman coding is not optimal for this design objective. Consider a data entry with two fields, such that the value of the first field is one of possible source elements , and the value of the second field is one of elements . The distributions on these elements are given in the two left columns of Table I.(A)(B); the values of the two fields are drawn independently. We encode the two fields using two codes specified in the right columns of Table I.(A)(B), respectively. The codewords of , are concatenated, and this encoding needs to be stored in the memory word. Table I.(C) enumerates the possible entries (combinations of values for the two fields), and for each gives its probability, its encoding, and the encoding length. The rightmost column of Table I.(C) indicates whether the encoding succeeds to fit in bits (), or not (). The success probability of the pair of codes is the sum of all probabilities in rows marked with . This amounts to



It can be checked that the codes give better than the success probability of the respective Huffman codes, which equals 0.78.
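The Huffman figure of 0.78 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the first-field distribution (0.4, 0.3, 0.16, 0.08, 0.06), the second-field distribution (0.5, 0.3, 0.2) and a width of 4 bits, values that are inferred from the entries of Table II rather than quoted from Table I. The function names are illustrative.

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    """Return the Huffman codeword length of every element."""
    tie = count()  # tie-breaker so the heap never has to compare lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        for i in a + b:
            lengths[i] += 1  # every leaf under the merged node moves one level deeper
        heapq.heappush(heap, (p1 + p2, next(tie), a + b))
    return lengths

def success_probability(p, q, len1, len2, width):
    """Probability that the two concatenated codewords fit in `width` bits."""
    return sum(pi * qj
               for pi, li in zip(p, len1)
               for qj, lj in zip(q, len2)
               if li + lj <= width)

# Assumed inputs (inferred from Table II, see the note above).
p = [0.4, 0.3, 0.16, 0.08, 0.06]
q = [0.5, 0.3, 0.2]
width = 4

h1, h2 = huffman_lengths(p), huffman_lengths(q)
huffman_success = success_probability(p, q, h1, h2, width)
print(h1, h2, round(huffman_success, 3))
```

Under these assumptions the Huffman lengths are (1, 2, 3, 4, 4) and (1, 2, 2), and the pairs that fit in 4 bits have total probability 0.78, matching the value quoted above.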
In the sequel we design code pairs , which we call entry coding schemes. We refer to an entry coding scheme as optimal if it maximizes among all entry coding schemes in its class. In the paper we find optimal coding schemes for two classes with different restrictions on their implementation cost: one is given in Section III and one in Section IV.
Finding codes that give optimal cannot be done by known compression algorithms. Also, brute-force search for an optimal code has complexity that grows exponentially with the number of source elements. Even if we consider only monotone codes, those which assign codewords of lengths nonincreasing with the element probabilities, we can show that the number of such codes grows asymptotically as at least , where . An exact count of monotone codes is provided in the Appendix. The infeasibility of code search motivates the study in this paper toward devising efficient polynomial-time algorithms for finding optimal fixed-width codes.
For the case of entries with fields, this paper solves the problem of optimal fixed-width compression. We present an efficient algorithm that takes any pair of element distributions and an integer , and outputs an entry encoder with optimal . The encoder has the unique-decodability property, and requires a dictionary whose size equals the sum of the element counts for the two fields. We also solve the problem for the more restrictive setup in which the same code needs to be used for both fields. This setup is motivated by systems with a more restricted memory budget, which cannot afford storing two different dictionaries for the two fields. For both setups finding optimal codes is formulated as efficient dynamic-programming algorithms building on clever decompositions of the problems to smaller problems.
Fig. 1 shows an illustration of a plausible realization of fixed-width compression in a networking application. First, (a) illustrates the write operation where the entry is encoded and is written to the fast SRAM memory if the encoded entry has a length of at most bits. Otherwise, it is stored in the slow DRAM memory. (b) shows the read operation where encoded data is first searched in the fast SRAM memory. If it is not found, the slow DRAM memory is also accessed after the entry index is translated.
Paper organization
In Section II, we present the formal model for the problem of compression for fixed-width memories. Setting up the problem requires some definitions extending classical single-source compression to sources modeling the fields in the entry. Another important novelty of Section II is the notion of an encoder that is allowed to fail on some of the input pairs. In Section III we derive an algorithm for finding an optimal coding scheme for fixed-width compression with fields. There are two main ingredients to that algorithm: 1) an algorithm finding an optimal prefix code for the first field given the code for the second field, and 2) a code for the second field that is optimal for any code used in the first field. In the second ingredient it turns out that the code need not be a prefix code, but it does need to satisfy a property which we call padding invariance. Next, in Section IV we study the special case where the two fields of an entry need to use the same code. The restriction to use one code renders the algorithm of Section III useless for this case. Hence we develop a completely different decomposition of the problem that can efficiently optimize a single code jointly for both fields. In Section V we present empirical results showing the success probabilities of the optimal codes for different values of the memory width . The results show a significant advantage of the new codes over existing Huffman codes. In addition, they compare the performance of the schemes using two codes (Section III) to those that use a single code (Section IV). Finally, Section VI summarizes our results and discusses directions for future work.
Relevant prior work
There have been several extensions of the problem solved by Huffman coding that received attention in the literature [4, 5, 6, 7, 8]. These are not directly related to fixed-width memories, but they use clever algorithmic techniques that may be found useful for this problem too. The notion of fixed-width representations is captured by the work of Tunstall on variable-to-fixed coding [9], but without the feature of multi-field entries. The problem of fixed-width encoding is also related to the problem of compression with low probability of buffer overflow [10, 11]. However, to the best of our knowledge the prior work only covers a single distribution and an asymptotic number of encoding instances (in our problem the number of encoding instances is a fixed ). The most directly related to the results of this paper is our prior work [12] considering a combinatorial version of this problem, where instead of source distributions we are given an instance with element pairs. This combinatorial version is less realistic for applications, because a new code needs to be designed after every change in the memory occupancy.
II Model and Problem Formulation
II-A Definitions
We start by providing some definitions, upon which we next cast the formal problem formulation. Throughout the paper we assume that data placed in the memory come from known distributions, as now defined.
Definition 1 (Element Distribution).
An element distribution = is characterized by an (ordered) set of elements with their corresponding positive appearance probabilities. An element of is drawn randomly and independently according to the distribution , i.e., , with and .
Throughout the paper we assume that the elements are ordered in nonincreasing order of their corresponding probabilities . Given an element distribution, a fixed-to-variable code , which we call code for short, is a set of binary codewords (of different lengths in general), with a mapping from to called an encoder, and a mapping from to called a decoder.
In this paper we are mostly interested in coding data entries composed of element pairs, so we extend Definition 1 to distributions of two-element data entries. For short, we refer to a data entry simply as an entry.
Definition 2 (Entry Distribution).
An entry distribution = is a juxtaposition of two element distributions. Each field in an entry is drawn randomly and independently according to its corresponding distribution in , i.e., and , with . The numbers of possible elements in the first and second field of an entry are and , respectively.
Example 1.
The entry distribution of the two fields illustrated in Table I can be specified as .
Having defined the distributions governing the creation of entries, we can proceed to definitions addressing the coding of such entries. In all our coding schemes, entries are encoded to bit strings of fixed length (possibly with some padding), where is a parameter equal to the width of the memory in use. An entry coding scheme is defined through its encoding function.
Definition 3 (Entry Encoding Function).
An entry encoding function is a mapping , where is the set of binary vectors of length , and is a special symbol denoting encoding failure.
The inputs to the encoding function are two elements, one for field taken from , and one for field taken from . In successful encoding, the encoder outputs a binary vector of length to be stored in memory, and this vector is obtained uniquely for this input entry. The encoder is allowed to fail, in which case the output is the special symbol . When the encoder fails, we assume that an alternative representation is found for the entry, and stored in a different memory not bound to the width constraint. This alternative representation is outside the scope of this work, but because we know it results in higher cost (in space and/or in time), our objective here is to minimize the failure probability of the encoding function. Before we formally define the failure probability, we give a definition of the decoding function matching the encoding function of Definition 3.
Definition 4 (Entry Decoding Function).
An entry decoding function is a mapping , such that if , then . The decoding function is limited to use space to store the mapping.
The definition of the decoding function is straightforward: it is the inverse mapping of the encoding function when encoding succeeds. The constraint of using space for dictionaries is crucial: combining the two fields of the entry to one product distribution on elements is both practically infeasible and theoretically uninteresting. With encoding and decoding in place, we turn to define the important measure of encoding success probability.
Definition 5 (Encoding Success Probability).
Given an entry distribution and an entry encoding function , define the encoding success probability as
(1) 
where the probability is calculated over all pairs of from according to their corresponding probabilities .
Concluding this section, we define an entry coding scheme as a pair of entry encoding function and matching entry decoding function , meeting the respective definitions above. Our principal objective in this paper is to find an entry coding scheme that maximizes the encoding success probability given the entry distribution and the output length .
III Optimal Entry Coding
In this section we develop a general entry coding scheme that yields optimal encoding success probability under some reasonable assumptions on the entry encoding function.
III-A Uniquely decodable entry encoding
The requirement from the decoding function to invert the encoding function for all successful encodings implies that the encoding function must be uniquely decodable, that is, an encoder-output bit vector represents a unique entry . Constraining the decoding function to using only space implies using two codes: for and for , because joint encoding of in general requires space for decoding. We then associate an entry encoding function with the pair of codes . The entry decoding function must first “parse” the vector to two parts: one used to decode and one to decode . An effective way to allow this parsing is by selecting the code for to be a prefix code, defined next.
Definition 6 (Prefix Code).
For a set of elements , a code is called a prefix code if in its codeword set no codeword is a prefix (start) of any other codeword.
Thanks to using a prefix code for , the decoder can read the length vector from left to right and unequivocally find the last bit representing , and in turn the remaining bits representing . Hence in the remainder of the section we restrict the entry encoding functions to have a constituent prefix encoder for their first field. This restriction of the encoder to be prefix is reminiscent of the classical result in information theory by Kraft-McMillan [13] that prefix codes are sufficient for optimal fixed-to-variable uniquely decodable compression. In our case we cannot prove that encoding functions with a prefix code for are always sufficient for optimality, but we also could not find better than prefix encoding functions that attain the space constraint at the decoder. To be formally correct, we call an entry coding scheme optimal in this section if it has the maximum success probability among coding schemes whose first code is prefix.
For unique decodability in the second field also, we use a special code property we call padding invariance.
Definition 7 (Padding-Invariant Code).
A code is called a padding-invariant code if after eliminating the trailing (last) zero bits from its codewords (possibly including the empty codeword of length 0), distinct binary vectors are obtained.
Example 2.
Consider the following codes: with a set of codewords , with and with . The code is a prefix code since none of its codewords is a prefix of another codeword. It is also a padding-invariant code since after eliminating the trailing zeros, the four distinct codewords (the empty codeword), and are obtained. While is also a padding-invariant code (the codewords are distinct), it is not a prefix code since 1 is a prefix of 110 and is a prefix of 1 and 110. The code is neither a prefix code (1 is a prefix of 10) nor a padding-invariant code (the codeword 1 is obtained after eliminating the trailing zero bits of 1 and 10).
We now show how a prefix code and a padding-invariant code are combined to obtain a uniquely-decodable entry encoding function. The following is not yet an explicit construction, but rather a general construction method we later specialize to obtain an optimal construction.
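The two properties of Definitions 6 and 7 are easy to test mechanically. The sketch below is illustrative only: the three codeword sets are assumptions chosen to behave as Example 2 describes (one code with both properties, one that is only padding invariant, one with neither), and the empty string stands for the empty codeword.

```python
def is_prefix_code(codewords):
    """True if the codewords are distinct and none is a prefix of another."""
    words = list(codewords)
    if len(set(words)) != len(words):
        return False
    return not any(i != j and words[j].startswith(words[i])
                   for i in range(len(words))
                   for j in range(len(words)))

def is_padding_invariant(codewords):
    """True if the codewords stay distinct after dropping trailing zero bits."""
    stripped = [w.rstrip("0") for w in codewords]
    return len(set(stripped)) == len(stripped)

# Assumed example sets (not quoted from the paper).
code_a = ["0", "10", "110", "111"]  # prefix and padding invariant
code_b = ["", "1", "110"]           # padding invariant, not prefix
code_c = ["1", "10"]                # neither: both strip to "1"

for name, code in [("a", code_a), ("b", code_b), ("c", code_c)]:
    print(name, is_prefix_code(code), is_padding_invariant(code))
```

Stripping the trailing zeros of `code_a` leaves the empty word, 1, 11 and 111, which are distinct, while `code_c` collapses to the single word 1.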
Construction 0.
Let be a prefix code and be a padding-invariant code. The vector output by the entry encoding function is defined as follows. The entry encoding function first maps to a word of and to a word of . Then the vector is formed by concatenating the codeword of to the right of ’s codeword, and padding the resulting binary vector with zeros on the right to get bits total. If the two codewords exceed bits before padding, the entry encoding function returns .
We show that because is a prefix code while is a padding-invariant code (not necessarily prefix), the entry encoding function of Construction 0 is uniquely decodable. It is easy to see also that padding invariance of the second code is necessary to guarantee that.
Property 1.
Let be an entry encoding function specified by Construction 0. Let be two entries composed of elements , . For representing concatenation, let and be the two length vectors output by the entry encoding function . If , then necessarily , i.e., .
To show the correctness of this property we explain why the decoding of a coded entry is unique.
Proof.
We first explain that there is a single element whose encoding is a prefix of . Assume the contrary, and let be two (distinct) elements such that and are both prefixes of . It then follows that either is a prefix of or is a prefix of . Since and is a prefix code, both options are impossible, and we must have that there is indeed a single element whose encoding is a prefix of . Thus necessarily and . By eliminating the first identical bits that stand for from , we are left with the encoding of the second field, possibly including zero trailing bits. By the properties of the padding-invariant code , it has at most a single element that maps to the bits remaining after eliminating any number of the trailing zero bits. We identify this codeword as , and deduce the only possible element of this field that has this codeword. ∎
To decode an encoded entry composed of the encodings , , possibly with some padded zero bits, a decoder can simply identify the single element whose encoding is a prefix of the encoded entry. As is a prefix code, there cannot be more than one such element. The decoder can then identify the beginning of the second part of the encoded entry, which encodes a unique codeword of with some trailing zero bits. This is necessarily from which can be derived.
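The encoding of Construction 0 and the decoding procedure just described can be sketched as follows. This is a minimal illustration under assumptions: codes are represented as Python dictionaries from element names to bit strings, `None` plays the role of the failure symbol, and the element names, codewords and the width of 4 bits are hypothetical.

```python
def encode_entry(x, y, code1, code2, width):
    """Concatenate the two codewords and zero-pad to `width` bits.

    Returns None (encoding failure) if an element has no codeword or the
    concatenation is longer than `width` bits.
    """
    if x not in code1 or y not in code2:
        return None
    word = code1[x] + code2[y]
    if len(word) > width:
        return None
    return word.ljust(width, "0")

def decode_entry(bits, code1, code2):
    """Invert encode_entry for a successfully encoded entry."""
    # code1 is a prefix code, so exactly one of its codewords starts `bits`.
    x = next(e for e, w in code1.items() if bits.startswith(w))
    # The remainder is a second-field codeword followed by padding zeros;
    # padding invariance makes the match after stripping zeros unique.
    rest = bits[len(code1[x]):].rstrip("0")
    y = next(e for e, w in code2.items() if w.rstrip("0") == rest)
    return x, y

# Hypothetical codes: a prefix code and a padding-invariant code.
code1 = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}
code2 = {"x": "", "y": "1", "z": "01"}
width = 4

failures = [(x, y) for x in code1 for y in code2
            if encode_entry(x, y, code1, code2, width) is None]
print(encode_entry("a", "z", code1, code2, width), failures)
```

With these assumed codes the entry (a, z) is stored as 0001, and only the two entries whose concatenated codewords need 5 bits are reported as failures.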
Toward finding an entry coding scheme with maximal encoding-success probability, we would now like to find codes and for Construction 0 that are optimal given an entry distribution and the output length . Since the direct joint computation of optimal and codes is difficult, we take a more indirect approach to achieve optimality: 1) we first derive an efficient algorithm to find an optimal prefix code for given a code for , and then 2) we find a (padding-invariant) code for that is universally optimal for any code used for . This way we reduce the joint optimization of , (hard) to a conditional optimization of given (easier). We prove that this conditionally optimal is also unconditionally optimal. To do so, we show that the code assumed in the conditioning is optimal for any choice of . These two steps are detailed in the next two subsections, and later used together to establish an optimal coding scheme.
III-B An optimal prefix code for conditioned on a code for
Our objective in this subsection is to find a prefix code that maximizes the encoding success probability given a code for the second field. We show constructively that this task can be achieved with an efficient algorithm.
We denote for and . Finding an optimal coding scheme when is easy. In that case we could allocate bits to and bits to and apply two independent fixedlength codes with a total of bits and obtain a success probability . Thus we focus on the interesting case where .
Given a code for , we show a polynomial-time algorithm that finds an optimal conditional prefix code for . This code will have an encoding function maximizing the probability given , when is restricted to be prefix. Then, we also say that with its corresponding decoding function is an optimal conditional coding scheme.
To build the code we assign codewords to elements of , where . Clearly, all such codewords have lengths of at most bits. For a (single) binary string , let denote the length in bits of . Since for every element which is assigned a codeword the code satisfies , it holds that must be an integer multiple of . We define the weight of a codeword of length as the number of units of in , denoted by . The elements of not represented by are said to have length and codeword weight zero. A prefix code exists with prescribed codeword lengths if they satisfy Kraft’s inequality [14]. In our terminology, this means that the sum of weights of the codewords of need be at most .
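The integer form of Kraft's inequality used here can be made concrete. The sketch below assumes that a codeword of length l in a memory of width w has weight 2^(w-l), so that the weights must sum to at most 2^w; `None` marks an element without a codeword. The canonical assignment of codewords is one standard way to realize prescribed lengths and is not necessarily the construction the authors use; it assumes every length is at least 1.

```python
def codeword_weight(length, width):
    """Number of units of 2**-width contained in 2**-length."""
    return 2 ** (width - length)

def satisfies_kraft(lengths, width):
    """Kraft's inequality in integer form; None means 'no codeword' (weight 0)."""
    total = sum(codeword_weight(l, width) for l in lengths if l is not None)
    return total <= 2 ** width

def canonical_prefix_code(lengths):
    """Assign prefix-free codewords to lengths (all >= 1) that satisfy Kraft."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    words, code, prev = [None] * len(lengths), 0, 0
    for i in order:
        code <<= lengths[i] - prev          # extend to the new length
        words[i] = format(code, "b").zfill(lengths[i])
        code += 1                            # next unused codeword
        prev = lengths[i]
    return words

print(satisfies_kraft([2, 2, 2, 3, 3], 4))      # weights 4+4+4+2+2 = 16
print(canonical_prefix_code([2, 2, 2, 3, 3]))
```

For the lengths (2, 2, 2, 3, 3) and width 4 the weights sum to exactly 16, so a prefix code exists, and the canonical assignment returns 00, 01, 10, 110, 111.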
Definition 8.
Consider entries composed of an element from the first (highest probability) elements of (for ), and an arbitrary element of . For and , we denote by the maximal sum of probabilities of such entries that can be encoded successfully by a prefix code whose sum of weights for the first codewords is at most . Formally,
(2) 
where is the indicator function. Note that depends on the conditioned , but we keep this dependence implicit to simplify notation.
The following theorem relates the maximal success probability of a conditional coding scheme and the function .
Theorem 1.
The maximal success probability of a conditional coding scheme is given by
Proof.
To satisfy Kraft’s inequality we should limit the sum of weights to . In addition, the success probability of the coding scheme with an encoding function is calculated based on entries with any number of the elements of . ∎
We next show how to compute efficiently for all , in particular for , that yield the optimal conditional . To do that, we use the following recursive formula for . First note the boundary cases for and for (this means an invalid code). We can now present the formula of that calculates its values for based on the values of the function for and .
Lemma 2.
The function satisfies for
(3) 
Proof.
The optimal code that attains either assigns a codeword to or does not. The two arguments of the outer max function in (2) are the respective success probabilities for these two choices. In the former case we consider all possible lengths of the codeword of . A codeword length of reduces the available sum of weights for the first elements by . In addition, an entry contributes to the success probability the value if its encoding width (given ) is at most . In the latter case the element does not contribute to the success probability and has no weight, hence in this case. ∎
Finally, the pseudocode of the dynamic-programming algorithm that finds the optimal conditional code based on the above recursive formula is given in Algorithm 1. It iteratively calculates the values of . It also uses a vector to represent the codeword lengths for the first at most elements of in a solution achieving .
Time Complexity: By the above description, there are iterations and in each values are calculated, each by considering sums of elements. It follows that the time complexity of the algorithm is , which is polynomial in the size of the input. The last equality follows from the fact that in a nontrivial instance where some entries fail encoding.
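The recursion of Lemma 2 can be sketched in a few lines. This is an illustrative implementation rather than a transcription of Algorithm 1: the function and variable names are assumptions, codeword lengths for the first field are taken from 1 up to the width, and the test inputs (distributions, second-field lengths 0, 1, 2 and width 4) are inferred from Table II.

```python
def optimal_first_code_lengths(p, q, len2, width):
    """Dynamic program over (number of elements considered, weight budget).

    p, q  : probabilities of the first- and second-field elements
    len2  : codeword lengths of the given second-field code
    Returns (success probability, lengths), where lengths[i] is the codeword
    length chosen for first-field element i, or None if it gets no codeword.
    """
    budget = 2 ** width
    # gain[l]: probability that the second field fits beside a length-l codeword
    gain = [sum(qj for qj, lj in zip(q, len2) if l + lj <= width)
            for l in range(width + 1)]
    # best[n] = (value, lengths) using the elements seen so far and weight <= n
    best = [(0.0, []) for _ in range(budget + 1)]
    for pk in p:
        new = []
        for n in range(budget + 1):
            # Option 1: the current element is not assigned a codeword.
            value, lengths = best[n][0], best[n][1] + [None]
            # Option 2: try every codeword length whose weight fits the budget.
            for l in range(width, 0, -1):
                w = 2 ** (width - l)
                if w <= n:
                    cand = best[n - w][0] + pk * gain[l]
                    if cand > value + 1e-12:
                        value, lengths = cand, best[n - w][1] + [l]
            new.append((value, lengths))
        best = new
    return best[budget]

# Assumed inputs (inferred from Table II).
p = [0.4, 0.3, 0.16, 0.08, 0.06]
q = [0.5, 0.3, 0.2]
value, lengths = optimal_first_code_lengths(p, q, len2=[0, 1, 2], width=4)
print(round(value, 3), lengths)
```

On these inputs the sketch returns the lengths (2, 2, 2, 3, 3) with success probability 0.972, which agrees with the last entry of Table II. Storing a full length vector per table cell keeps the code short; a back-pointer table would use less memory.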
III-C A universally optimal code for
We now develop the second component required to complete an unconditionally optimal entry coding scheme: a code for the elements that is optimal for any code used for the elements.
To that end we now define the natural notion of a monotone coding scheme, in which higher-probability elements are assigned codeword lengths shorter or equal to lower-probability ones.
Definition 9 (Monotone Coding Scheme).
A coding scheme with an encoding function of an entry distribution is called monotone if it satisfies
(i) for , implies that for two elements .
(ii) for , the elements that are not assigned a codeword in are the last elements in (if any).
It is an intuitive fact that without loss of generality an optimal entry coding scheme can be assumed monotone. We prove this in the following.
Lemma 3.
For any entry distribution , and any memory width , there exists a monotone optimal coding scheme.
Proof.
We show how to build an optimal monotone coding scheme based on any optimal coding scheme with an encoding function . For consider two arbitrary indices that satisfy . Then necessarily . If codewords are assigned to the two elements and , we can replace by a new code obtained by permuting the two codewords of . With this change, an entry with is encoded successfully after the change if the corresponding entry with was encoded successfully before the change. Then, we deduce that such a change cannot decrease , and the result follows. In addition, if some elements are not assigned codewords, the same argument shows that these elements should be the ones with the smallest probabilities. ∎
The optimality of the code to be specified for will be established by showing it attains the following upper bound on success probability.
Proposition 4.
Given any code , the encoding success probability of any entry encoding function is bounded from above as follows.
(5) 
where is the highest index of an element that is assigned a codeword by .
Proof.
First by the monotonicity property proved in Lemma 3, assigns codewords to elements with indices , for some . Hence for indices greater than the success probability is identically zero. Given an element with a codeword of length , at most elements of can be successfully encoded with it in an entry. So the inner sum in (5) is the maximal success probability given . Summing over all and multiplying by gives the upper bound. ∎
It turns out that there exists a padding-invariant code that attains the upper bound of Proposition 4 with equality for any code . For the ordered set , let the encoder of map to the shortest binary representation of for , and to for . The binary representation is put in the codeword from left to right, least-significant bit (LSB) first. Then we have the following.
Proposition 5.
Given any code , the encoding success probability of is
(6) 
where is the highest index of an element that is assigned a codeword by .
Proof.
If the codeword of uses bits, there are bits left vacant for . The mapping specified for allows encoding successfully the first elements of , which gives the stated success probability. ∎
In particular, when the encoding of has length , the single element is encoded successfully by the empty codeword . Other examples are the two codewords (, 1) when , and the four codewords (, 1, 01, 11) when . It is clear that the code is padding invariant, because its codewords are minimal-length binary representations of integers. Now we are ready to specify the optimal entry coding scheme in the following.
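The second-field code described above, and the success probability it yields for a given first-field code, can be sketched as follows. The function names are illustrative, elements are indexed from 0 in the code, and the numeric inputs are the same assumed values inferred from Table II.

```python
def universal_second_code(n2):
    """Codeword of element j (0-based): shortest binary form of j, LSB first.

    Element 0 gets the empty codeword.
    """
    words = []
    for j in range(n2):
        bits = ""
        while j:
            bits += str(j & 1)  # emit the least-significant bit first
            j >>= 1
        words.append(bits)
    return words

def success_with_universal_code(p, q, len1, width):
    """Success probability when the second field uses universal_second_code."""
    total = 0.0
    for pi, li in zip(p, len1):
        if li is None or li > width:
            continue  # no codeword, or the first field alone does not fit
        # Exactly 2**(width - li) codewords have at most width - li bits.
        fit = min(len(q), 2 ** (width - li))
        total += pi * sum(q[:fit])
    return total

code = universal_second_code(4)
p = [0.4, 0.3, 0.16, 0.08, 0.06]
q = [0.5, 0.3, 0.2]
prob = success_with_universal_code(p, q, [2, 2, 2, 3, 3], width=4)
print(code, round(prob, 3))
```

Every nonempty codeword produced this way ends with a 1 (the most significant bit of the index), so stripping trailing zeros leaves it unchanged, which is why the code is padding invariant.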
Construction 1.
Theorem 6.
For any entry distribution and a memory width , Construction 1 gives an optimal entry coding scheme, that is, a coding scheme that maximizes the success probability among all uniquely-decodable coding schemes with a prefix code in the first field.
From Theorem 6 we can readily obtain an efficient algorithm finding an optimal twofield entry encoding, which is given in Algorithm 2. Optimality is proved up to the assumption that the first code is a prefix code. It is not clear how one can obtain better codes than those with a prefix , while keeping the dictionary size .
For the special case when is , the recursive formula for the calculation of the function can be simplified as follows.
Lemma 7.
When is , the function satisfies for
(7) 
It is easily seen that (7) is obtained from (3) by replacing the indicator function with the partial sum that accommodates all the elements that have a short enough representation to fit alongside the element.
The following example illustrates Construction 1 on the entry distribution from the Introduction.
Example 3.
Consider the entry distribution from Table I with . The width parameter is . For the ordered set , we select the code by mapping to and (for ) to the shortest binary representation of . Then, and . The code is a padding-invariant code. To get the prefix code we apply Algorithm 1 on the code . We recursively calculate the values of and for , . In particular, for each value of the values are calculated based on the previous value of . The values are listed in Table II. Each column describes a different value of . (Whenever values of and are not shown, a specific value of does not improve the probability achieved for a smaller value of in the same column.) The value of implies a restriction on the values of the codeword lengths. If the lengths of the codewords are described by a set , they must satisfy , i.e. .
We first explain the values for , considering the contribution to the success probability of data entries with as the first element. This happens w.p. . For we must have , i.e. the element a is assigned a codeword of length 4. Then, there is a single pair that can be encoded successfully and is given by is . Likewise, for , we can have a codeword of length 3 for and the two pairs can be encoded within bits, such that . If we cannot further decrease the codeword length and improve the success probability. For we can have a codeword length of 2 bits, as described by . This enables encoding successfully the three pairs with a success probability of 0.4 as given by . The values for larger values of are calculated in a similar manner based on the recursive formulas. The optimal codeword lengths for are given by . This enables to encode successfully all pairs besides , achieving as given by .
Finally, by applying Construction ‣ IIIA on , we obtain the entry encoding function .
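Table II: success probability and codeword lengths in each cell. Rows are indexed by the Kraft weight of the listed lengths in units of 2^-4, columns by the number of elements; "-" marks a value that is not shown.

Weight  1 element  2 elements  3 elements     4 elements       5 elements
0       0 (∞)      0 (∞,∞)     0 (∞,∞,∞)      0 (∞,∞,∞,∞)      0 (∞,∞,∞,∞,∞)
1       0.2 (4)    0.2 (4,∞)   0.2 (4,∞,∞)    0.2 (4,∞,∞,∞)    0.2 (4,∞,∞,∞,∞)
2       0.32 (3)   0.35 (4,4)  0.35 (4,4,∞)   0.35 (4,4,∞,∞)   0.35 (4,4,∞,∞,∞)
3       -          0.47 (3,4)  0.47 (3,4,∞)   0.47 (3,4,∞,∞)   0.47 (3,4,∞,∞,∞)
4       0.4 (2)    0.56 (3,3)  0.56 (3,3,∞)   0.56 (3,3,∞,∞)   0.56 (3,3,∞,∞,∞)
5       -          -           0.64 (3,3,4)   0.64 (3,3,4,∞)   0.64 (3,3,4,∞,∞)
6       -          0.64 (2,3)  0.688 (3,3,3)  0.688 (3,3,3,∞)  0.688 (3,3,3,∞,∞)
7       -          -           0.72 (2,3,4)   0.728 (3,3,3,4)  0.728 (3,3,3,4,∞)
8       -          0.7 (2,2)   0.768 (2,3,3)  0.768 (2,3,3,∞)  0.768 (2,3,3,∞,∞)
9       -          -           0.78 (2,2,4)   0.808 (2,3,3,4)  0.808 (2,3,3,4,∞)
10      -          -           0.828 (2,2,3)  0.832 (2,3,3,3)  0.838 (2,3,3,4,4)
11      -          -           -              0.868 (2,2,3,4)  0.868 (2,2,3,4,∞)
12      -          -           0.86 (2,2,2)   0.892 (2,2,3,3)  0.898 (2,2,3,4,4)
13      -          -           -              0.9 (2,2,2,4)    0.922 (2,2,3,3,4)
14      -          -           -              0.924 (2,2,2,3)  0.94 (2,2,3,3,3)
15      -          -           -              -                0.954 (2,2,2,3,4)
16      -          -           -              0.94 (2,2,2,2)   0.972 (2,2,2,3,3)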
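The length tuples listed in Table II can be sanity-checked against Kraft's inequality. The following Python sketch is only illustrative: the names `MAX_LEN`, `kraft_weight` and `is_prefix_feasible` are hypothetical, the value 4 is an assumption read off the longest finite length in the table, and an element without a codeword is modeled as having infinite length (contributing zero weight).

```python
import math

# Assumption: 4 is the longest finite codeword length appearing in Table II.
MAX_LEN = 4


def kraft_weight(lengths, max_len=MAX_LEN):
    """Return the sum of 2^(max_len - l) over the given codeword lengths.

    A length of math.inf models an element with no codeword and adds 0.
    """
    return sum(0 if l == math.inf else 2 ** (max_len - l) for l in lengths)


def is_prefix_feasible(lengths, max_len=MAX_LEN):
    """Kraft's inequality: a prefix code with these lengths exists iff
    the weight does not exceed 2^max_len."""
    return kraft_weight(lengths, max_len) <= 2 ** max_len


# Two tuples taken from the table, plus one that violates the inequality.
print(kraft_weight((2, 2, 2, 3, 3)))        # weight of the last tuple in the table
print(kraft_weight((2, 3, 3, 4, 4)))
print(is_prefix_feasible((1, 2, 2, 2)))     # too many short codewords
```

Under these assumptions, every tuple in the table has a weight of at most 16, so each one corresponds to a valid prefix code.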
IV Optimal Entry Encoding with the Same Code for Both Fields
In this section we move to study the problem of entry coding schemes for the special case where we require that both fields use the same code. It is commonly the case that both fields have the same element distribution, and then using one code instead of two can cut the dictionary storage by half. In practice this can offer a significant cost saving. Throughout this section we thus assume that the fields have the same distribution, but the results can be readily extended to the case where the distributions are different and we still want a single code with optimal success probability. Formally, in this section our problem is to efficiently design a single code that offers optimal encoding success probability in a width memory.
In the special case of a single distribution we have an element distribution = , and the entry distribution is . Now the space constraint for the decoder (to hold the dictionary) is to be of size at most . This means that we need to find one code for that will be used in both fields. To be able to parse the two fields of the entry, needs to be a prefix code.
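To illustrate why the prefix property is what allows the two fields to be parsed, here is a minimal Python sketch. The code dictionary `EXAMPLE_CODE`, the function names, and the use of zero padding up to the memory width are all assumptions made for illustration; they are not taken from the paper.

```python
# Hypothetical prefix code over four elements (no codeword is a prefix of another).
EXAMPLE_CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode_entry(code, first, second, width):
    """Concatenate the two codewords and pad with zeros to `width` bits.

    Returns None when the two codewords do not fit in the word.
    """
    bits = code[first] + code[second]
    if len(bits) > width:
        return None
    return bits.ljust(width, "0")


def decode_entry(code, word):
    """Parse two fields from `word` by scanning for codewords left to right.

    Because the code is a prefix code, the first match at each position is
    the only possible one. Bits after the second field are padding.
    """
    inverse = {cw: sym for sym, cw in code.items()}
    fields, start = [], 0
    for end in range(1, len(word) + 1):
        if word[start:end] in inverse:
            fields.append(inverse[word[start:end]])
            start = end
            if len(fields) == 2:
                break
    return tuple(fields)


word = encode_entry(EXAMPLE_CODE, "b", "a", 4)
print(word, decode_entry(EXAMPLE_CODE, word))
print(encode_entry(EXAMPLE_CODE, "c", "b", 4))  # five bits do not fit in four
```

The same dictionary serves both fields, which is the storage saving that motivates this section.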
IV-A Observations on the problem
Before moving to solve the problem, it will be instructive to first understand the root difficulty in restricting both fields to use the same code. If we try to extend the dynamic-programming solution of Section III to the single-distribution, single-code case, we get the following maximization problem
(8) 
where we adapted the expression from (2) to the case of a single distribution and a single code. But now trying to extend the recursive expression for in (3) gives
(9) 
which cannot be used by the algorithm because the indicator function now depends on lengths of codewords that were not assigned yet, namely for the elements . So even though we now only have a single code to design, this task is considerably more challenging than the conditional optimization of Section III-B. At this point the only apparent route to solving (8) is to exhaustively try all length assignments to that satisfy Kraft’s inequality, and to enumerate the arguments of the max function in (8) directly. But this would be intractable.
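The exhaustive route can be sketched in Python as follows, to make its cost concrete. This is only an illustration under stated assumptions: the two fields are taken to be independent draws from the same distribution, every element receives a codeword of length between 1 and the width, and the function names and the three-element distribution are hypothetical.

```python
from fractions import Fraction
from itertools import product


def success_probability(probs, lengths, width):
    """Probability that two independent draws fit together in `width` bits."""
    return sum(
        p * q
        for p, lp in zip(probs, lengths)
        for q, lq in zip(probs, lengths)
        if lp + lq <= width
    )


def best_single_code_lengths(probs, width):
    """Brute-force search over all length assignments satisfying Kraft's inequality.

    The loop visits width**len(probs) candidates, which is why this approach
    does not scale. Returns (0.0, None) if no assignment is feasible.
    """
    best_prob, best_lengths = 0.0, None
    for lengths in product(range(1, width + 1), repeat=len(probs)):
        # Exact arithmetic avoids rounding issues in the Kraft sum.
        if sum(Fraction(1, 2 ** l) for l in lengths) > 1:
            continue
        prob = success_probability(probs, lengths, width)
        if prob > best_prob:
            best_prob, best_lengths = prob, lengths
    return best_prob, best_lengths


# Made-up distribution over three elements, memory width of 3 bits.
print(best_single_code_lengths([0.5, 0.3, 0.2], 3))
```

Even for a few dozen elements the number of candidates is astronomically large, which motivates the decomposition developed next.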
IV-B Efficient algorithm for optimal entry encoding
In the remainder of the section we show an algorithm that offers an efficient way around the above-mentioned difficulty to assign codeword lengths to elements. We present this efficient algorithm formally, but first note its main idea.
The main idea: we showed in Section IV-A that it is not possible to maximize the single-code success probability for elements given the optimal codeword lengths for elements. So it does not work to successively add elements to the solution while maintaining optimality as an invariant. Fortunately, it does work to successively add codeword lengths to the solution while maintaining optimality as an invariant. The subtle part is that the lengths need to be added in a carefully chosen sequence, which, in particular, is not the linear sequence or its reverse-ordered counterpart. We show that if the codeword lengths are added in the order of the sequence
(10) 
(for even ; we assume that is even for convenience, but all the results extend to odd ), then for any subsequence we can maximize the success probability given the optimal codeword lengths taken from the subsequence that is one shorter. For example, when our algorithm will first find an optimal code using only codeword length ; based on this optimum it will find an optimal code with lengths and , and then continue to add the codeword lengths in that order.
We now turn to a more formal treatment of the algorithm. We first define the function holding the optimal success probabilities for subproblems of the problem instance. The following Definition 10 is the adaptation of Definition 8 to the sequence of codeword lengths applicable in the single-code case.
Definition 10.
Consider assignments of finite codeword lengths to the consecutive elements from , where the lengths are assigned from the values taken from the subsequence of (10) that ends with . For we denote by the maximal success probability for such an assignment whose sum of weights for these codewords is at most . Formally,
(11) 
The following two theorems are the key drivers of the efficient dynamic-programming algorithm that finds the optimal code.
Theorem 8.
Let for some integer . For the length subsequent to in the sequence (10), we have the following
(12) 
where we define
(13) 
Proof.
Given maximal values for all values of and with lengths up to in the sequence, the maximal value when is also allowed is obtained by assigning length to between and elements in the range . By the monotonicity of , which is higher than all previous lengths must be assigned to the highest indices in the range . Thus for each the success probability is the value of for the corresponding range of elements with the residual weight . In particular, the elements assigned length do not add to the success probability, because plus any length in the subsequence up to exceeds . In the extreme case when (all elements assigned length ), the definition (13) when appearing in the righthand side of (12) gives a valid assignment with success probability if in the lefthand side is sufficiently large. ∎