Optimal Compression for Two-Field Entries in Fixed-Width Memories

Ori Rottenstreich and Yuval Cassuto
Abstract

Data compression is a well-studied (and well-solved) problem in the setup of long coding blocks. But important emerging applications need to compress data to memory words of small fixed widths. This new setup is the subject of this paper. In the problem we consider we have two sources with known discrete distributions, and we wish to find codes that maximize the success probability, namely the probability that the two source outputs are represented in L bits or less. A good practical use for this problem is a table with two-field entries that is stored in a memory of a fixed width L. Such tables of very large sizes drive the core functionalities of network switches and routers. After defining the problem formally, we solve it optimally with an efficient code-design algorithm. We also solve the problem in the more constrained case where a single code is used in both fields (to save space for storing code dictionaries). For both code-design problems we find decompositions that yield efficient dynamic-programming algorithms. With an empirical study we show the success probabilities of the optimal codes for different distributions and memory widths. In particular, the study demonstrates the superiority of the new codes over existing compression algorithms.

Data compression, fixed-width memories, table compression, Huffman coding, network switches and routers.
footnotetext: This paper solves a problem first formulated in the paper “Compression for Fixed-Width Memories” presented at the IEEE International Symposium on Information Theory (ISIT) 2013, Istanbul, Turkey [1]. O. Rottenstreich is with Princeton University, Princeton NJ, USA (e-mail: orir@princeton.edu). Y. Cassuto is with the Viterbi Department of Electrical Engineering, Technion – Israel Institute of Technology, Haifa, Israel (e-mail: ycassuto@ee.technion.ac.il). This work was supported in part by the Israel Science Foundation (ISF), and by the Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI).

I Introduction

In the best-known data-compression problem, a discrete distribution on source elements is given, and one wishes to find a representation for the source elements with minimum expected length. This problem was solved by the well-known Huffman coding [2]. Huffman coding reduces the problem to a very efficient recursive algorithm on a tree, which solves it optimally. Indeed, Huffman coding has found use in numerous communication and storage applications. Minimizing the expected length of the coded sequence emitted by the source translates to optimally low transmission or storage costs in those systems. However, there is an important setup in which minimizing the expected length does not translate to optimal improvement in system performance. This setup is the fixed-width memory. Information is stored in a fixed-width memory in words of L bits each. A word is designated to store an entry composed of several fields, each emitted by a source. In this case the prime objective is to fit the representations of the sources into the L available bits, and not to minimize the expected length for each source.

The fixed-width memory setup is extremely useful in real applications. The most immediate use of it is in networking applications, wherein switches and routers rely on fast access to entries of very large tables [3]. The fast access requires simple word addressing, and the large table size requires good compression. In addition to this principal application, data-centric computing applications can similarly benefit from efficient compression of large data sets in fixed-width memories. In this paper we consider the compression of data entries with two fields into memory words of L bits, where L is a parameter. The special case of two fields is motivated by switch and router tables, in which two-field tables are the most common. Throughout the paper we assume that the element chosen for an entry field is drawn i.i.d. from a known distribution for that field, and that elements of distinct fields are independent of each other.

Our operational model for the compression assumes that success to fit an entry into L bits translates to fast data access, while failure results in a much slower access, because a secondary slower memory is accessed to store the entry. Therefore, we want to maximize the number of entries we succeed to fit, but do allow some (small) fraction of failures. Correspondingly, the performance measure we consider throughout the paper is the encoding success probability: the probability that the encoding of the two-field entry fits in L bits. That is, we seek a uniquely decodable encoding function that on some entries is allowed to declare encoding failure, but does so with the lowest possible probability for the given source distributions. We emphasize that we do not allow decoding errors, and all entries that succeed encoding are recovered perfectly at read time.

Maximizing the success probability is trivial when we encode the two fields jointly by a single code: we simply take the 2^L highest-probability pairs in the product distribution of the two fields, and represent each of these pairs with a codeword of length L. However, this solution has prohibitive cost because it requires a dictionary with size quadratic in the support of the individual distributions. For example, if each field takes one of n values, we will need a dictionary of order n^2 in the trivial solution. To avoid the space blow-up, we restrict the code to use dictionaries of sizes linear in the distribution supports. The way to guarantee linear dictionary sizes is by composing the entry code from two codes for the individual fields. Our task is to find two codes for the two fields, such that when the two codewords are placed together in the width-L word, the entry fields can be decoded unambiguously. The design objective for the codes is to maximize the success probability: the probability that the above encoding of the entry has length of at most L bits. We first show by example that Huffman coding is not optimal for this design objective. Consider a data entry with two fields, such that the value of the first field is one of five possible source elements x_1, …, x_5, and the value of the second field is one of three elements y_1, y_2, y_3. The distributions on these elements are given in the two left columns of Table I.(A)-(B); the values of the two fields are drawn independently. We encode the two fields using the two codes σ_X and σ_Y specified in the right columns of Table I.(A)-(B), respectively. The codewords of σ_X and σ_Y are concatenated, and this encoding needs to be stored in the memory word of width L = 4. Table I.(C) enumerates the possible entries (combinations of values for the two fields), and for each gives its probability, its encoding, and the encoding length. The rightmost column of Table I.(C) indicates whether the encoding succeeds to fit in L = 4 bits (✓) or not (✗). The success probability of the pair of codes is the sum of all probabilities in rows marked with ✓. This amounts to 0.93.

Element   Prob.   Codeword
x_1       0.4     00
x_2       0.3     01
x_3       0.16    10
x_4       0.08    110
x_5       0.06    111
(A) First field with code σ_X
Element   Prob.   Codeword
y_1       0.5     0
y_2       0.3     10
y_3       0.2     11
(B) Second field with code σ_Y
Entry         Prob.    Encoding    Width   Fits in L = 4 bits?
(x_1, y_1)    0.20     00 0        3       ✓
(x_1, y_2)    0.12     00 10       4       ✓
(x_1, y_3)    0.08     00 11       4       ✓
(x_2, y_1)    0.15     01 0        3       ✓
(x_2, y_2)    0.09     01 10       4       ✓
(x_2, y_3)    0.06     01 11       4       ✓
(x_3, y_1)    0.08     10 0        3       ✓
(x_3, y_2)    0.048    10 10       4       ✓
(x_3, y_3)    0.032    10 11       4       ✓
(x_4, y_1)    0.04     110 0       4       ✓
(x_4, y_2)    0.024    110 10      5       ✗
(x_4, y_3)    0.016    110 11      5       ✗
(x_5, y_1)    0.03     111 0       4       ✓
(x_5, y_2)    0.018    111 10      5       ✗
(x_5, y_3)    0.012    111 11      5       ✗
(C) Possible entries encoded by the entry encoding function
TABLE I: Example of coding for two fields. I(A) and I(B) list the element distributions of the two fields, and the two codes σ_X and σ_Y chosen for them. I(C) lists all combinations of values from the two fields, their encoding using the concatenation of the two codewords, and the encoding lengths. Spaces are shown for convenience; in the actual memory the codewords are concatenated without spaces, with trailing zeros appended if needed.

It can be checked that these codes give a success probability of 0.93, better than the success probability of the respective Huffman codes, which equals 0.78.
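Because the success probability depends only on the codeword lengths and the memory width, it is straightforward to evaluate for any candidate pair of codes. The following Python sketch (ours, not part of the paper; function and variable names are illustrative) reproduces the two numbers above from the codeword lengths of Table I and from the Huffman lengths.

```python
# Success probability of a pair of codes, computed from their codeword lengths only.
from itertools import product

def success_probability(p, len_x, q, len_y, L):
    """Probability that the two concatenated codewords fit in L bits."""
    return sum(pi * qj
               for (pi, lx), (qj, ly) in product(zip(p, len_x), zip(q, len_y))
               if lx + ly <= L)

p, q = [0.4, 0.3, 0.16, 0.08, 0.06], [0.5, 0.3, 0.2]
print(success_probability(p, [2, 2, 2, 3, 3], q, [1, 2, 2], L=4))  # Table I codes: ~0.93
print(success_probability(p, [1, 2, 3, 4, 4], q, [1, 2, 2], L=4))  # Huffman lengths: ~0.78
```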

In the sequel we design such code pairs, which we call entry coding schemes. We refer to an entry coding scheme as optimal if it maximizes the encoding success probability among all entry coding schemes in its class. In the paper we find optimal coding schemes for two classes with different restrictions on their implementation cost: one is given in Section III and one in Section IV.

Finding codes that give an optimal success probability cannot be done by known compression algorithms. Also, a brute-force search for an optimal code has complexity that grows exponentially with the number of source elements. Even if we consider only monotone codes (those which assign codeword lengths that are non-increasing in the element probabilities), the number of such codes still grows exponentially in the number of source elements; an exact count of monotone codes is provided in the Appendix. The infeasibility of code search motivates the study in this paper toward devising efficient polynomial-time algorithms for finding optimal fixed-width codes.

For the case of entries with two fields, this paper solves the problem of optimal fixed-width compression. We present an efficient algorithm that takes any pair of element distributions and an integer width L, and outputs an entry encoder with optimal success probability. The encoder has the unique-decodability property, and requires a dictionary whose size equals the sum of the element counts of the two fields. We also solve the problem for the more restrictive setup in which the same code needs to be used for both fields. This setup is motivated by systems with a more restricted memory budget, which cannot afford storing two different dictionaries for the two fields. For both setups, the search for optimal codes is formulated as an efficient dynamic-programming algorithm that builds on a decomposition of the problem into smaller sub-problems.

Fig. 1 shows an illustration of a plausible realization of fixed-width compression in a networking application. First, (a) illustrates the write operation: the entry is encoded and is written to the fast SRAM memory if the encoded entry has a length of at most L bits; otherwise, it is stored in the slow DRAM memory. Then, (b) shows the read operation: the encoded data is first searched in the fast SRAM memory, and if it is not found, the slow DRAM memory is accessed after the entry index is translated.

(a) Write
(b) Read
Fig. 1: A plausible realization of a memory system with fixed-width compression. It includes a fast fixed-width memory (e.g., SRAM) with words of L bits, and an additional slower memory (e.g., DRAM). Entries that can be encoded within L bits are stored in the fast memory, while other entries are stored in the slow memory without the width limit. The encoding and decoding are performed based on the designed code pair.

Paper organization

In Section II, we present the formal model for the problem of compression for fixed-width memories. Setting up the problem requires some definitions extending classical single-source compression to sources modeling the fields in the entry. Another important novelty of Section II is the notion of an encoder that is allowed to fail on some of the input pairs. In Section III we derive an algorithm for finding an optimal coding scheme for fixed-width compression with two fields. There are two main ingredients to that algorithm: 1) an algorithm finding an optimal prefix code for the first field given the code for the second field, and 2) a code for the second field that is optimal for any code used in the first field. In the second ingredient it turns out that the code need not be a prefix code, but it does need to satisfy a property which we call padding invariance. Next, in Section IV we study the special case where the two fields of an entry need to use the same code. The restriction to use one code renders the algorithm of Section III useless for this case. Hence we develop a completely different decomposition of the problem that can efficiently optimize a single code jointly for both fields. In Section V we present empirical results showing the success probabilities of the optimal codes for different values of the memory width L. The results show a significant advantage of the new codes over existing Huffman codes. In addition, they compare the performance of the schemes using two codes (Section III) to those that use a single code (Section IV). Finally, Section VI summarizes our results and discusses directions for future work.

Relevant prior work

Several extensions of the problem solved by Huffman coding have received attention in the literature [4, 5, 6, 7, 8]. These are not directly related to fixed-width memories, but they use clever algorithmic techniques that may prove useful for this problem too. The notion of fixed-width representations is captured by the work of Tunstall on variable-to-fixed coding [9], but without the feature of multi-field entries. The problem of fixed-width encoding is also related to the problem of compression with low probability of buffer overflow [10, 11]. However, to the best of our knowledge the prior work only covers a single distribution and an asymptotic number of encoding instances (in our problem the number of encoding instances is fixed: the two fields of an entry). Most directly related to the results of this paper is our prior work [12], which considers a combinatorial version of this problem where, instead of source distributions, we are given an instance with a set of element pairs. This combinatorial version is less realistic for applications, because a new code needs to be designed after every change in the memory occupancy.

II Model and Problem Formulation

II-A Definitions

We start by providing some definitions, upon which we next cast the formal problem formulation. Throughout the paper we assume that data placed in the memory come from known distributions, as now defined.

Definition 1 (Element Distribution).

An element distribution (A, P) is characterized by an (ordered) set of elements A = {a_1, …, a_n} with their corresponding positive appearance probabilities P = (p_1, …, p_n). An element of A is drawn randomly and independently according to the distribution P, i.e., Pr(a_i) = p_i, with p_i > 0 and p_1 + … + p_n = 1.

Throughout the paper we assume that the elements are ordered in non-increasing order of their corresponding probabilities, i.e., p_1 ≥ p_2 ≥ … ≥ p_n. Given an element distribution, a fixed-to-variable code, which we call a code for short, is a set of binary codewords (of different lengths in general), with a mapping from the elements to the codewords called an encoder, and a mapping from the codewords back to the elements called a decoder.

In this paper we are mostly interested in coding data entries composed of element pairs, so we extend Definition 1 to distributions of two-element data entries. For short, we refer to a data entry simply as an entry.

Definition 2 (Entry Distribution).

An entry distribution is a juxtaposition of two element distributions (X, P) and (Y, Q). Each field in an entry is drawn randomly and independently according to its corresponding distribution, i.e., Pr(x_i) = p_i for the first field and Pr(y_j) = q_j for the second field. The numbers of possible elements in the first and second field of an entry are denoted n_X and n_Y, respectively.

Example 1.

The entry distribution of the two fields illustrated in Table I is given by the element sets X = {x_1, …, x_5} and Y = {y_1, y_2, y_3} with probabilities P = (0.4, 0.3, 0.16, 0.08, 0.06) and Q = (0.5, 0.3, 0.2), so that n_X = 5 and n_Y = 3.

Having defined the distributions governing the creation of entries, we can proceed to definitions addressing the coding of such entries. In all our coding schemes, entries are encoded to bit strings of fixed length L (possibly with some padding), where L is a parameter equal to the width of the memory in use. An entry coding scheme is defined through its encoding function.

Definition 3 (Entry Encoding Function).

An entry encoding function is a mapping E: X × Y → {0,1}^L ∪ {⊥}, where {0,1}^L is the set of binary vectors of length L, and ⊥ is a special symbol denoting encoding failure.

The input to the encoding function is a pair of elements: one for the first field taken from X, and one for the second field taken from Y. In a successful encoding, the encoder outputs a binary vector of length L to be stored in memory, and this vector is obtained uniquely for this input entry. The encoder is allowed to fail, in which case the output is the special symbol ⊥. When the encoder fails, we assume that an alternative representation is found for the entry, and stored in a different memory not bound to the width-L constraint. This alternative representation is outside the scope of this work, but because we know it results in higher cost (in space and/or in time), our objective here is to minimize the failure probability of the encoding function. Before we formally define the failure probability, we give a definition of the decoding function matching the encoding function of Definition 3.

Definition 4 (Entry Decoding Function).

An entry decoding function is a mapping D: {0,1}^L → X × Y, such that if E(x_i, y_j) ≠ ⊥, then D(E(x_i, y_j)) = (x_i, y_j). The decoding function is limited to use O(n_X + n_Y) space to store the mapping.

The definition of the decoding function is straightforward: it is the inverse mapping of the encoding function whenever encoding succeeds. The constraint of using O(n_X + n_Y) space for dictionaries is crucial: combining the two fields of the entry into one product distribution on n_X · n_Y elements is both practically infeasible and theoretically uninteresting. With encoding and decoding in place, we turn to define the important measure of encoding success probability.

Definition 5 (Encoding Success Probability).

Given an entry distribution and an entry encoding function E, define the encoding success probability as

P_suc = Pr( E(x, y) ≠ ⊥ ) = Σ_{(i,j): E(x_i, y_j) ≠ ⊥} p_i · q_j,    (1)

where the probability is calculated over all pairs (x_i, y_j) from X × Y according to their corresponding probabilities p_i · q_j.

Concluding this section, we define an entry coding scheme as a pair of an entry encoding function and a matching entry decoding function, meeting the respective definitions above. Our principal objective in this paper is to find an entry coding scheme that maximizes the encoding success probability given the entry distribution and the output length L.

III Optimal Entry Coding

In this section we develop a general entry coding scheme that yields optimal encoding success probability under some reasonable assumptions on the entry encoding function.

III-A Uniquely decodable entry encoding

The requirement that the decoding function invert the encoding function for all successful encodings implies that the encoding function must be uniquely decodable, that is, an encoder-output bit vector represents a unique entry (x_i, y_j). Constraining the decoding function to use only O(n_X + n_Y) space implies using two codes: σ_X for the first field and σ_Y for the second field, because joint encoding of the pair in general requires O(n_X · n_Y) space for decoding. We then associate an entry encoding function with the pair of codes (σ_X, σ_Y). The entry decoding function must first “parse” the length-L vector into two parts: one used to decode the first field and one to decode the second field. An effective way to allow this parsing is by selecting the code for the first field to be a prefix code, defined next.

Definition 6 (Prefix Code).

For a set of elements, a code is called a prefix code if in its codeword set no codeword is a prefix (start) of any other codeword.

Thanks to using a prefix code for the first field, the decoder can read the length-L vector from left to right and unequivocally find the last bit representing the first field, and in turn the remaining bits representing the second field. Hence in the remainder of the section we restrict the entry encoding functions to have a constituent prefix encoder for their first field. This restriction of the encoder to be prefix is reminiscent of the classical result in information theory by Kraft and McMillan [13] that prefix codes are sufficient for optimal fixed-to-variable uniquely decodable compression. In our case we cannot prove that encoding functions with a prefix code for the first field are always sufficient for optimality, but we also could not find encoding functions better than prefix ones that attain the space constraint at the decoder. To be formally correct, we call an entry coding scheme optimal in this section if it has the maximum success probability among coding schemes whose first code is a prefix code.

For unique decodability in the second field also, we use a special code property we call padding invariance.

Definition 7 (Padding-Invariant Code).

A code is called a padding-invariant code if, after eliminating the trailing (last) zero bits from each of its codewords, distinct binary vectors are obtained (possibly including the empty vector of length 0).

Example 2.

Consider the following codes: C_1 with the set of codewords {00, 01, 10, 11}, C_2 with {ε, 1, 110}, where ε denotes the empty codeword, and C_3 with {1, 10}. The code C_1 is a prefix code since none of its codewords is a prefix of another codeword. It is also a padding-invariant code, since after eliminating the trailing zeros the four distinct vectors ε (the empty vector), 01, 1 and 11 are obtained. While C_2 is also a padding-invariant code (the stripped codewords ε, 1 and 11 are distinct), it is not a prefix code, since 1 is a prefix of 110 and ε is a prefix of both 1 and 110. The code C_3 is neither a prefix code (1 is a prefix of 10) nor a padding-invariant code (the codeword 1 is obtained after eliminating the trailing zero bits of both 1 and 10).
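Both properties are easy to test mechanically. A small Python sketch (ours; the three codeword sets below are illustrative and mirror the three cases of the example):

```python
# Executable checks of the prefix property (Definition 6) and padding invariance (Definition 7).
def is_prefix_code(codewords):
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

def is_padding_invariant(codewords):
    return len({c.rstrip("0") for c in codewords}) == len(codewords)

for code in (["00", "01", "10", "11"], ["", "1", "110"], ["1", "10"]):
    print(code, is_prefix_code(code), is_padding_invariant(code))
# ['00', '01', '10', '11'] True True
# ['', '1', '110'] False True    (the empty codeword is a prefix of every other codeword)
# ['1', '10'] False False        (both 1 and 10 reduce to 1 after stripping trailing zeros)
```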

We now show how a prefix code and a padding-invariant code are combined to obtain a uniquely-decodable entry encoding function. The following is not yet an explicit construction, but rather a general construction method we later specialize to obtain an optimal construction.

Construction 0.

Let σ_X be a prefix code for the first field and σ_Y be a padding-invariant code for the second field. The vector output by the entry encoding function is defined as follows. The entry encoding function first maps x_i to a codeword of σ_X and y_j to a codeword of σ_Y. Then the output vector is formed by concatenating the codeword of y_j to the right of the codeword of x_i, and padding the resulting binary vector with zeros on the right to get L bits in total. If the two codewords together exceed L bits before padding, the entry encoding function returns ⊥.

We show that because σ_X is a prefix code while σ_Y is a padding-invariant code (not necessarily prefix), the entry encoding function of Construction III-A is uniquely decodable. It is easy to see also that padding invariance of the second code is necessary to guarantee that.

Property 1.

Let E be an entry encoding function specified by Construction III-A. Let (x_i, y_j) and (x_k, y_l) be two entries whose encodings succeed, and let u and v be the two length-L vectors output by the entry encoding function E for them. If u = v, then necessarily (x_i, y_j) = (x_k, y_l), i.e., x_i = x_k and y_j = y_l.

To show the correctness of this property we explain why the decoding of a coded entry is unique.

Proof.

We first explain why there is a single element of X whose encoding is a prefix of u. Assume the contrary, and let x_i ≠ x_k be two (distinct) elements such that σ_X(x_i) and σ_X(x_k) are both prefixes of u. It then follows that either σ_X(x_i) is a prefix of σ_X(x_k) or σ_X(x_k) is a prefix of σ_X(x_i). Since x_i ≠ x_k and σ_X is a prefix code, both options are impossible, and we must have that there is indeed a single element whose encoding is a prefix of u. Thus necessarily x_i = x_k. By eliminating the first identical bits that stand for this element from u = v, we are left with the encoding of the second field, possibly followed by trailing zero bits. By the properties of the padding-invariant code σ_Y, at most a single codeword of σ_Y maps to the bits remaining after eliminating any number of the trailing zero bits. We identify this codeword, and deduce the only possible element of the second field that has this codeword, so y_j = y_l. ∎

To decode an encoded entry composed of the encodings of the two fields, possibly with some padded zero bits, a decoder can simply identify the single element of X whose encoding is a prefix of the encoded entry. As σ_X is a prefix code, there cannot be more than one such element. The decoder can then identify the beginning of the second part of the encoded entry, which consists of a unique codeword of σ_Y followed by some trailing zero bits. This codeword is necessarily the encoding of the second field, from which the second element can be derived.
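For concreteness, the following Python sketch (ours) instantiates the construction with the two codes of Table I, which satisfy the required properties, and with width L = 4; the element names and dictionaries are illustrative.

```python
# A minimal sketch of Construction III-A using the codes of Table I.
SIGMA_X = {"x1": "00", "x2": "01", "x3": "10", "x4": "110", "x5": "111"}  # prefix code
SIGMA_Y = {"y1": "0", "y2": "10", "y3": "11"}                             # padding-invariant code
L = 4

def encode(x, y):
    word = SIGMA_X[x] + SIGMA_Y[y]
    if len(word) > L:
        return None                   # encoding failure: the entry goes to the slow memory
    return word.ljust(L, "0")         # pad with zeros on the right to exactly L bits

def decode(word):
    # The first field is parsed unambiguously because SIGMA_X is a prefix code.
    x = next(e for e, c in SIGMA_X.items() if word.startswith(c))
    rest = word[len(SIGMA_X[x]):].rstrip("0")
    # Padding invariance of SIGMA_Y: stripped codewords are distinct, so the match is unique.
    y = next(e for e, c in SIGMA_Y.items() if c.rstrip("0") == rest)
    return x, y

assert decode(encode("x4", "y1")) == ("x4", "y1")   # "1100" decodes back correctly
assert encode("x4", "y3") is None                    # needs 5 bits, exceeds L = 4
```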

Toward finding an entry coding scheme with maximal encoding-success probability, we would now like to find codes σ_X and σ_Y for Construction III-A that are optimal given an entry distribution and the output length L. Since the direct joint computation of optimal σ_X and σ_Y codes is difficult, we take a more indirect approach to achieve optimality: 1) we first derive an efficient algorithm to find an optimal prefix code σ_X given a code σ_Y, and then 2) we find a (padding-invariant) code σ_Y that is universally optimal for any code used for the first field. This way we reduce the joint optimization of σ_X, σ_Y (hard) to a conditional optimization of σ_X given σ_Y (easier). We prove that this conditionally optimal pair is also unconditionally optimal; to do so, we show that the code σ_Y assumed in the conditioning is optimal for any choice of σ_X. These two steps are detailed in the next two sub-sections, and later used together to establish an optimal coding scheme.

input : Entry distribution, memory width L, code σ_Y
output : Conditionally optimal prefix code σ_X
initialization:
foreach w in {0, 1, …, 2^L} do f(0, w) ← 0 ;
foreach i in {1, …, n_X} do f(i, 0) ← 0 ;
intermediate solutions:
for i = 1, …, n_X do
   for w = 1, …, 2^L do
      compute f(i, w) by the recursion in (3)
      if outer max attained by first argument then
         Λ(i, w) ← Λ(i-1, w - 2^(L-ℓ)) with ℓ appended, where ℓ is the length that attains the inner max
      else
         Λ(i, w) ← Λ(i-1, w)
      end if
   end for
end for
Calculate prefix code σ_X based on the codeword lengths given by Λ(n_X, 2^L)
output results:
Encoding success probability f(n_X, 2^L)
Optimal conditional prefix code σ_X
Algorithm 1 Optimal Conditional Prefix Code

III-B An optimal prefix code for the first field conditioned on a code for the second field

Our objective in this sub-section is to find a prefix code that maximizes the encoding success probability given a code for the second field. We show constructively that this task can be achieved with an efficient algorithm.

We denote by ⌈log2 n_X⌉ and ⌈log2 n_Y⌉ the numbers of bits needed for fixed-length encoding of the two fields. Finding an optimal coding scheme when ⌈log2 n_X⌉ + ⌈log2 n_Y⌉ ≤ L is easy: in that case we could allocate ⌈log2 n_X⌉ bits to the first field and ⌈log2 n_Y⌉ bits to the second field, apply two independent fixed-length codes with a total of at most L bits, and obtain a success probability of 1. Thus we focus on the interesting case where ⌈log2 n_X⌉ + ⌈log2 n_Y⌉ > L.

Given a code σ_Y for the second field, we show a polynomial-time algorithm that finds an optimal conditional prefix code σ_X for the first field. This code will have an encoding function maximizing the success probability given σ_Y, when σ_X is restricted to be a prefix code. We then also say that this encoding function with its corresponding decoding function is an optimal conditional coding scheme.

To build the code σ_X we assign codewords to the first m elements of X, where m ≤ n_X. Clearly, all such codewords have lengths of at most L bits. For a (single) binary string s, let |s| denote the length of s in bits. Since for every element x_i which is assigned a codeword the code satisfies |σ_X(x_i)| ≤ L, it holds that 2^(-|σ_X(x_i)|) must be an integer multiple of 2^(-L). We define the weight of a codeword of length ℓ as the number of units of 2^(-L) in 2^(-ℓ), i.e., 2^(L-ℓ). The elements of X not represented by σ_X are said to have infinite length and codeword weight zero. A prefix code exists with prescribed codeword lengths if and only if they satisfy Kraft's inequality [14]. In our terminology, this means that the sum of the weights of the codewords of σ_X need be at most 2^L.
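As a tiny illustration (ours), here is the weight bookkeeping for one length assignment; the lengths are those that will be found in Example 3 below.

```python
# Weight of a length-l codeword is 2^(L-l); a prefix code with the prescribed lengths
# exists iff the weights sum to at most 2^L (Kraft's inequality).
L = 4
lengths = [2, 2, 2, 3, 3]
weights = [2 ** (L - l) for l in lengths]
print(weights, sum(weights) <= 2 ** L)   # [4, 4, 4, 2, 2] True
```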

Definition 8.

Consider entries composed of an element from the first i (highest-probability) elements of X (for 1 ≤ i ≤ n_X), and an arbitrary element of Y. For 0 ≤ w ≤ 2^L, we denote by f(i, w) the maximal sum of probabilities of such entries that can be encoded successfully by a prefix code σ_X whose sum of weights for its first i codewords is at most w. Formally,

f(i, w) = max over (ℓ_1, …, ℓ_i) with Σ_{k: ℓ_k ≤ L} 2^(L-ℓ_k) ≤ w of  Σ_{k=1}^{i} p_k · Σ_{j=1}^{n_Y} q_j · 1{ℓ_k + |σ_Y(y_j)| ≤ L},    (2)

where ℓ_k in {1, …, L, ∞} is the codeword length assigned to x_k (ℓ_k = ∞ meaning that x_k is not assigned a codeword), |σ_Y(y_j)| is the length of the codeword of y_j, and 1{·} is the indicator function. Note that f(i, w) depends on the conditioned code σ_Y, but we keep this dependence implicit to simplify notation.

The following theorem relates the maximal success probability of a conditional coding scheme to the function f.

Theorem 1.

The maximal success probability of a conditional coding scheme is given by f(n_X, 2^L).

Proof.

To satisfy Kraft's inequality we should limit the sum of weights to 2^L. In addition, the success probability of the coding scheme is calculated over entries whose first field can be any of the n_X elements of X. ∎

We next show how to compute f(i, w) efficiently for all i and w, and in particular for i = n_X and w = 2^L, which yield the optimal conditional code. To do that, we use the following recursive formula for f. First note the boundary cases f(0, w) = 0 for all w ≥ 0, and f(i, w) = -∞ for w < 0 (a negative residual weight means an invalid code). We can now present the formula of f(i, w) that calculates its values based on the values of the function for i - 1 and residual weights at most w.

Lemma 2.

The function f satisfies, for 1 ≤ i ≤ n_X and 0 ≤ w ≤ 2^L,

f(i, w) = max(  max_{ℓ in {1,…,L}: 2^(L-ℓ) ≤ w} [ f(i-1, w - 2^(L-ℓ)) + p_i · Σ_{j=1}^{n_Y} q_j · 1{ℓ + |σ_Y(y_j)| ≤ L} ] ,  f(i-1, w) ).    (3)
Proof.

The optimal code that attains f(i, w) either assigns a codeword to element x_i or does not. The two arguments of the outer max function in (3) are the respective success probabilities for these two choices. In the former case we consider all possible lengths ℓ of the codeword of x_i. A codeword length of ℓ reduces the available sum of weights for the first i - 1 elements by 2^(L-ℓ). In addition, an entry (x_i, y_j) contributes to the success probability the value p_i · q_j if its encoding width ℓ + |σ_Y(y_j)| is at most L. In the latter case the element x_i does not contribute to the success probability and has no weight, hence f(i, w) = f(i-1, w) in this case. ∎

Finally, the pseudocode of the dynamic-programming algorithm that finds the optimal conditional code based on the above recursive formula is given in Algorithm 1. It iteratively calculates the values of f(i, w). It also uses a vector Λ(i, w) to represent the codeword lengths of the (at most i) first elements of X in a solution achieving f(i, w).

Time Complexity: By the above description, there are n_X iterations and in each one 2^L + 1 values of f are calculated, each by maximizing over at most L candidate lengths whose contribution is a partial sum of at most n_Y element probabilities. It follows that the time complexity of the algorithm is O(n_X · 2^L · L · n_Y), which is polynomial in the size of the input, since 2^L = O(n_X · n_Y) in a non-trivial instance where some entries fail encoding.
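The recursion lends itself directly to a bottom-up implementation. The sketch below (ours; it follows the recursion rather than the exact pseudocode of Algorithm 1, and the names are illustrative) computes the optimal conditional success probability and recovers one optimal set of first-field codeword lengths by backtracking, with 0 marking an element left without a codeword.

```python
# Dynamic program for the optimal conditional prefix code (recursion (3)).
def optimal_conditional_code(p, q, ylen, L):
    W = 2 ** L                                   # Kraft budget, in units of 2^(-L)
    # gain[l] = probability mass of second-field elements that fit next to a length-l codeword
    gain = [sum(qj for qj, lj in zip(q, ylen) if l + lj <= L) for l in range(L + 1)]
    f = [[0.0] * (W + 1) for _ in range(len(p) + 1)]
    choice = [[0] * (W + 1) for _ in range(len(p) + 1)]
    for i, pi in enumerate(p, start=1):
        for w in range(W + 1):
            best, arg = f[i - 1][w], 0           # option: leave element i without a codeword
            for l in range(1, L + 1):
                wt = 2 ** (L - l)
                if wt <= w and f[i - 1][w - wt] + pi * gain[l] > best:
                    best, arg = f[i - 1][w - wt] + pi * gain[l], l
            f[i][w], choice[i][w] = best, arg
    lengths, w = [], W                           # trace back one optimal length assignment
    for i in range(len(p), 0, -1):
        l = choice[i][w]
        lengths.append(l)
        w -= 2 ** (L - l) if l else 0
    return f[len(p)][W], lengths[::-1]

# Example 3 below: second field uses the universal code of Section III-C (lengths 0, 1, 2).
print(optimal_conditional_code([0.4, 0.3, 0.16, 0.08, 0.06], [0.5, 0.3, 0.2], [0, 1, 2], 4))
# -> (0.972, [2, 2, 2, 3, 3]) up to floating-point rounding
```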

III-C A universally optimal code for the second field

We now develop the second component required to complete an unconditionally optimal entry coding scheme: a code for the second-field elements that is optimal for any code used for the first-field elements.

To that end we now define the natural notion of a monotone coding scheme, in which higher-probability elements are assigned codeword lengths shorter than or equal to those of lower-probability ones.

Definition 9 (Monotone Coding Scheme).

A coding scheme with an encoding function for an entry distribution is called monotone if it satisfies:

(i) within each field, a higher-probability element is never assigned a longer codeword than a lower-probability element, i.e., p_i > p_k implies |σ_X(x_i)| ≤ |σ_X(x_k)| for any two first-field elements that are assigned codewords, and similarly for the second field;

(ii) within each field, the elements that are not assigned a codeword are the last (lowest-probability) elements in the ordered element set (if any).

It is an intuitive fact that without loss of generality an optimal entry coding scheme can be assumed monotone. We prove this in the following.

Lemma 3.

For any entry distribution and any memory width L, there exists a monotone optimal coding scheme.

Proof.

We show how to build an optimal monotone coding scheme based on any optimal coding scheme. Consider the first field (the same argument applies to the second field), and consider two arbitrary indices i < k, so that p_i ≥ p_k. Suppose codewords are assigned to the two elements x_i and x_k with the longer codeword given to x_i; we can then replace σ_X by a new code obtained by permuting the two codewords of x_i and x_k. With this change, an entry containing x_i is encoded successfully after the change whenever the corresponding entry containing x_k was encoded successfully before the change, and vice versa. We deduce that such a change cannot decrease the success probability, and the result follows. In addition, if some elements are not assigned codewords, the same argument shows that these elements should be the ones with the smallest probabilities. ∎

The optimality of the code to be specified for the second field will be established by showing that it attains the following upper bound on the success probability.

Proposition 4.

Given any code σ_X for the first field with codeword lengths ℓ_i = |σ_X(x_i)|, the encoding success probability of any entry encoding function is bounded from above as follows:

P_suc ≤ Σ_{i=1}^{m} p_i · Σ_{j=1}^{min(n_Y, 2^(L-ℓ_i))} q_j,    (5)

where m is the highest index of an element of X that is assigned a codeword by σ_X.

Proof.

First, by the monotonicity property proved in Lemma 3, σ_X assigns codewords to the elements with indices 1, …, m, for some m ≤ n_X. Hence for indices greater than m the success probability is identically zero. Given an element x_i with a codeword of length ℓ_i, at most 2^(L-ℓ_i) elements of Y can be successfully encoded with it in an entry, and by monotonicity we may take them to be the highest-probability ones. So the inner sum in (5) is the maximal success probability of the second field given x_i. Summing over all i ≤ m and multiplying by p_i gives the upper bound. ∎
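As an aside (ours), the bound of Proposition 4 is simple to evaluate for any given assignment of first-field codeword lengths; with the lengths (2, 2, 2, 3, 3) found later in Example 3 it already evaluates to 0.972, and the code specified next attains it with equality.

```python
# Evaluating the upper bound (5); elements without a codeword are simply omitted from xlen.
def success_upper_bound(p, q, xlen, L):
    return sum(pi * sum(q[:2 ** (L - l)]) for pi, l in zip(p, xlen))

print(success_upper_bound([0.4, 0.3, 0.16, 0.08, 0.06], [0.5, 0.3, 0.2], [2, 2, 2, 3, 3], 4))
# -> 0.972 (up to floating point)
```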

It turns out that there exists a padding-invariant code, denoted σ_Y*, that attains the upper bound of Proposition 4 with equality for any code σ_X. For the ordered set Y = {y_1, …, y_{n_Y}}, let the encoder of σ_Y* map y_j to the shortest binary representation of j - 1 for 2 ≤ j ≤ n_Y, and to the empty codeword for j = 1. The binary representation is put in the codeword from left to right, least-significant bit (LSB) first. Then we have the following.

Proposition 5.

Given any code σ_X for the first field with codeword lengths ℓ_i = |σ_X(x_i)|, the encoding success probability of the pair (σ_X, σ_Y*) is

P_suc = Σ_{i=1}^{m} p_i · Σ_{j=1}^{min(n_Y, 2^(L-ℓ_i))} q_j,    (6)

where m is the highest index of an element of X that is assigned a codeword by σ_X.

Proof.

If the codeword of x_i uses ℓ_i bits, there are L - ℓ_i bits left vacant for the second field. The mapping specified for σ_Y* allows encoding successfully the first min(n_Y, 2^(L-ℓ_i)) elements of Y, which gives the stated success probability. ∎

In particular, when the encoding of the first field has length L, the single element y_1 is encoded successfully by the empty codeword ε. Other examples are the two codewords (ε, 1) when one bit is left vacant, and the four codewords (ε, 1, 01, 11) when two bits are left vacant. It is clear that the code σ_Y* is padding invariant, because its codewords are LSB-first minimal-length binary representations of integers, so every nonempty codeword ends with a 1. Now we are ready to specify the optimal entry coding scheme in the following.
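A minimal sketch (ours) of this code: the element with 0-based index j receives the LSB-first shortest binary representation of j, with the empty codeword for j = 0.

```python
# Generating the universally optimal second-field code of Section III-C.
def universal_second_field_code(n):
    return ["" if j == 0 else bin(j)[2:][::-1] for j in range(n)]   # LSB first, so no trailing zeros

print(universal_second_field_code(4))   # ['', '1', '01', '11']
```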

Construction 1.

Given an entry distribution and a memory width L, let σ_X be the prefix code obtained by applying Algorithm 1 conditioned on the code σ_Y*. Then the entry encoding function is defined by applying Construction III-A on the pair (σ_X, σ_Y*).

input : Entry distribution, memory width L
output : Optimal coding scheme
calculation:
Calculate the padding-invariant code σ_Y* by assigning the elements of Y the codewords that correspond to the LSB-first shortest binary representations of 0, 1, …, n_Y - 1
Calculate the prefix code σ_X by Algorithm 1 given the code σ_Y*
output results:
An optimal coding scheme with the encoding function of Construction III-A on (σ_X, σ_Y*)
Algorithm 2
Theorem 6.

For any entry distribution and memory width L, Construction 1 gives an optimal entry coding scheme, that is, a coding scheme that maximizes the success probability among all uniquely-decodable coding schemes with a prefix code in the first field.

From Theorem 6 we readily obtain an efficient algorithm finding an optimal two-field entry encoding, which is given in Algorithm 2. Optimality is proved up to the assumption that the first code is a prefix code. It is not clear how one could obtain better codes than those with a prefix σ_X while keeping the dictionary size O(n_X + n_Y).
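Combining the two ingredients gives a compact sketch of Algorithm 2 (ours, reusing the two functions defined in the sketches above, with the same assumptions on names and inputs):

```python
# Optimal two-field entry coding: universal second-field code + conditional DP for the first field.
def optimal_two_field_scheme(p, q, L):
    ylen = [len(c) for c in universal_second_field_code(len(q))]
    return optimal_conditional_code(p, q, ylen, L)

print(optimal_two_field_scheme([0.4, 0.3, 0.16, 0.08, 0.06], [0.5, 0.3, 0.2], 4))
# -> (0.972, [2, 2, 2, 3, 3]), matching Example 3 and Table II
```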

For the special case where the conditioned code is σ_Y*, the recursive formula for the calculation of the function f can be simplified as follows.

Lemma 7.

When σ_Y is σ_Y*, the function f satisfies, for 1 ≤ i ≤ n_X and 0 ≤ w ≤ 2^L,

f(i, w) = max(  max_{ℓ in {1,…,L}: 2^(L-ℓ) ≤ w} [ f(i-1, w - 2^(L-ℓ)) + p_i · Σ_{j=1}^{min(n_Y, 2^(L-ℓ))} q_j ] ,  f(i-1, w) ).    (7)

It is easily seen that (7) is obtained from (3) by replacing the sum with the indicator function by the partial sum of the first min(n_Y, 2^(L-ℓ)) probabilities, which accommodates exactly the second-field elements whose representation is short enough to fit alongside the first-field element.

The following example illustrates Construction 1 on the entry distribution from the Introduction.

Example 3.

Consider the entry distribution from Table I with n_X = 5 and n_Y = 3. The width parameter is L = 4. For the ordered set Y = {y_1, y_2, y_3}, we select the code σ_Y* by mapping y_1 to the empty codeword ε and y_j (for j = 2, 3) to the LSB-first shortest binary representation of j - 1. Then, σ_Y*(y_2) = 1 and σ_Y*(y_3) = 01. The code σ_Y* is a padding-invariant code. To get the prefix code σ_X we apply Algorithm 1 on the code σ_Y*. We recursively calculate the values of f(i, w) and Λ(i, w) for 1 ≤ i ≤ 5 and 0 ≤ w ≤ 2^L = 16. In particular, for each value of i the values are calculated based on the values for i - 1. The values are listed in Table II. Each column describes a different value of i. (Whenever values of f(i, w) and Λ(i, w) are not shown, a specific value of w does not improve the probability achieved for a smaller value of w in the same column.) The value of w implies a restriction on the values of the codeword lengths: if the lengths of the codewords are described by a multiset Λ, they must satisfy Σ_{ℓ in Λ} 2^(L-ℓ) ≤ w, i.e., Σ_{ℓ in Λ} 2^(-ℓ) ≤ w · 2^(-L).

We first explain the values for i = 1, considering the contribution to the success probability of data entries with x_1 as the first element. This happens w.p. p_1 = 0.4. For w = 1 we must have Λ(1, 1) = (4), i.e., the element x_1 is assigned a codeword of length 4. Then, there is a single pair that can be encoded successfully, namely (x_1, y_1), and f(1, 1) = 0.4 · 0.5 = 0.2. Likewise, for w = 2, we can have a codeword of length 3 for x_1 and the two pairs (x_1, y_1), (x_1, y_2) can be encoded within L = 4 bits, such that f(1, 2) = 0.4 · 0.8 = 0.32. For w = 3 we cannot further decrease the codeword length and improve the success probability. For w = 4 we can have a codeword length of 2 bits, as described by Λ(1, 4) = (2). This enables encoding successfully the three pairs (x_1, y_1), (x_1, y_2), (x_1, y_3) with a success probability of 0.4, as given by f(1, 4) = 0.4. The values for larger values of i are calculated in a similar manner based on the recursive formulas. The optimal codeword lengths for σ_X are given by Λ(5, 16) = (2, 2, 2, 3, 3). This enables encoding successfully all pairs besides (x_4, y_3) and (x_5, y_3), achieving P_suc = 0.972 as given by f(5, 16).

Finally, by applying Construction III-A on (σ_X, σ_Y*), we obtain the entry encoding function.

        i = 1         i = 2          i = 3             i = 4               i = 5
w = 0   0 (-)         0 (-,-)        0 (-,-,-)         0 (-,-,-,-)         0 (-,-,-,-,-)
w = 1   0.2 (4)       0.2 (4,-)      0.2 (4,-,-)       0.2 (4,-,-,-)       0.2 (4,-,-,-,-)
w = 2   0.32 (3)      0.35 (4,4)     0.35 (4,4,-)      0.35 (4,4,-,-)      0.35 (4,4,-,-,-)
w = 3                 0.47 (3,4)     0.47 (3,4,-)      0.47 (3,4,-,-)      0.47 (3,4,-,-,-)
w = 4   0.4 (2)       0.56 (3,3)     0.56 (3,3,-)      0.56 (3,3,-,-)      0.56 (3,3,-,-,-)
w = 5                                0.64 (3,3,4)      0.64 (3,3,4,-)      0.64 (3,3,4,-,-)
w = 6                 0.64 (2,3)     0.688 (3,3,3)     0.688 (3,3,3,-)     0.688 (3,3,3,-,-)
w = 7                                0.72 (2,3,4)      0.728 (3,3,3,4)     0.728 (3,3,3,4,-)
w = 8                 0.7 (2,2)      0.768 (2,3,3)     0.768 (2,3,3,-)     0.768 (2,3,3,-,-)
w = 9                                0.78 (2,2,4)      0.808 (2,3,3,4)     0.808 (2,3,3,4,-)
w = 10                               0.828 (2,2,3)     0.832 (2,3,3,3)     0.838 (2,3,3,4,4)
w = 11                                                 0.868 (2,2,3,4)     0.868 (2,2,3,4,-)
w = 12                               0.86 (2,2,2)      0.892 (2,2,3,3)     0.898 (2,2,3,4,4)
w = 13                                                 0.9 (2,2,2,4)       0.922 (2,2,3,3,4)
w = 14                                                 0.924 (2,2,2,3)     0.94 (2,2,3,3,3)
w = 15                                                                     0.954 (2,2,2,3,4)
w = 16                                                 0.94 (2,2,2,2)      0.972 (2,2,2,3,3)
TABLE II: The values of f(i, w) and, in parentheses, the codeword lengths Λ(i, w), for 1 ≤ i ≤ 5 and 0 ≤ w ≤ 2^L = 16, for Example 3 with L = 4. Cells are omitted where a larger w does not improve over a smaller w in the same column. The optimal success probability is given by f(5, 16) = 0.972 and the codeword lengths for σ_X by Λ(5, 16) = (2, 2, 2, 3, 3).

IV Optimal Entry Encoding with the Same Code for Both Fields

In this section we move on to study the problem of entry coding schemes for the special case where we require that both fields use the same code. It is commonly the case that both fields have the same element distribution, and then using one code instead of two can cut the dictionary storage by half. In practice this can offer a significant cost saving. Throughout this section we thus assume that the two fields have the same distribution, but the results can be readily extended to the case where the distributions are different and we still want a single code with optimal success probability. Formally, in this section our problem is to efficiently design a single code that offers optimal encoding success probability in a width-L memory.

In the special case of a single distribution we have an element distribution (A, P) with n elements a_1, …, a_n, and the entry distribution is the juxtaposition of this distribution with itself. Now the space constraint for the decoder (to hold the dictionary) is to be of size at most O(n). This means that we need to find one code σ for A that will be used in both fields. To be able to parse the two fields of the entry, σ needs to be a prefix code.

IV-A Observations on the problem

Before moving to solve the problem, it will be instructive to first understand the root difficulty in restricting both fields to use the same code. If we try to extend the dynamic-programming solution of Section III to the single-distribution, single-code case, we get the following maximization problem

maximize  Σ_{k=1}^{n} Σ_{j=1}^{n} p_k · p_j · 1{ℓ_k + ℓ_j ≤ L}   subject to  Σ_{k: ℓ_k ≤ L} 2^(L-ℓ_k) ≤ 2^L,    (8)

where ℓ_k in {1, …, L, ∞} is the codeword length assigned to element a_k, and where we adapted the expression from (2) to the case of a single distribution and a single code. But now trying to extend the recursive expression for f(i, w) in (3) gives

f(i, w) = max(  max_{ℓ: 2^(L-ℓ) ≤ w} [ f(i-1, w - 2^(L-ℓ)) + p_i · Σ_{j=1}^{n} p_j · 1{ℓ + ℓ_j ≤ L} ] ,  f(i-1, w) ),    (9)

which cannot be used by the algorithm because the indicator function now depends on lengths of codewords that were not assigned yet, namely ℓ_j for the elements a_{i+1}, …, a_n. So even though we now only have a single code to design, this task is considerably more challenging than the conditional optimization of Section III-B. At this point the only apparent route to solve (8) is by trying exhaustively all length assignments satisfying Kraft's inequality, and evaluating the maximized expression in (8) for each of them directly. But this would be intractable.
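The exhaustive baseline just mentioned is easy to state in code. The sketch below (ours) enumerates only monotone length assignments and, for brevity, does not consider leaving elements without a codeword; it is feasible only for small alphabets, which is exactly the motivation for the decomposition developed next.

```python
# Brute-force baseline for the single-code problem: try every monotone (non-decreasing)
# length assignment satisfying Kraft's inequality and keep the best success probability.
from itertools import combinations_with_replacement

def best_single_code(p, L):
    best_prob, best_lens = 0.0, None
    for lens in combinations_with_replacement(range(1, L + 1), len(p)):
        if sum(2 ** (L - l) for l in lens) > 2 ** L:      # Kraft's inequality
            continue
        prob = sum(pi * pj
                   for pi, li in zip(p, lens)
                   for pj, lj in zip(p, lens) if li + lj <= L)
        if prob > best_prob:
            best_prob, best_lens = prob, lens
    return best_prob, best_lens

print(best_single_code([0.4, 0.3, 0.16, 0.08, 0.06], 4))
# -> roughly 0.74 with lengths (2, 2, 2, 3, 3), versus 0.972 when two codes are allowed (Example 3)
```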

IV-B Efficient algorithm for optimal entry encoding

In the remainder of the section we show an algorithm that offers an efficient way around the above-mentioned difficulty in assigning codeword lengths to elements. We present this efficient algorithm formally, but first note its main idea.
The main idea: we showed in Section IV-A that it is not possible to maximize the single-code success probability for the first i elements given only the optimal codeword lengths for the first i - 1 elements. So it does not work to successively add elements to the solution while maintaining optimality as an invariant. But fortunately, it turns out that it does work to successively add codeword lengths to the solution while maintaining optimality as an invariant. The subtle part is that the lengths need to be added in a carefully thought-of sequence, which in particular is not the linear sequence 1, 2, …, L or its reverse-ordered counterpart. We show that if the codeword lengths are added in the order of the sequence

(10)

(for even L; for convenience we assume that L is even, but all the results extend to odd L), then for any sub-sequence of (10) we can maximize the success probability given the optimal codeword lengths taken from the sub-sequence that is one element shorter. For example, our algorithm will first find an optimal code that uses only the first codeword length of the sequence; based on this optimum it will find an optimal code that uses the first two lengths, and it then continues to add the remaining codeword lengths in the order of the sequence.

We now turn to a more formal treatment of the algorithm. We first define the function holding the optimal success probabilities for sub-problems of the problem instance. The following Definition 10 is the adaptation of Definition 8 to the sequence of codeword lengths applicable in the single-code case.

Definition 10.

Consider assignments of finite codeword lengths to the consecutive elements from , where the lengths are assigned from the values taken from the sub-sequence of (10) that ends with . For we denote by the maximal success probability for such an assignment whose sum of weights for these codewords is at most . Formally,

(11)

The following two theorems are the key drivers of the efficient dynamic-programming algorithm finding the optimal code.

Theorem 8.

Let for some integer . For the length subsequent to in the sequence (10) , we have the following

(12)

where we define

(13)
Proof.

Given the maximal values for all numbers of elements and all weight budgets with lengths taken from the sequence up to the current point, the maximal value when the next length is also allowed is obtained by assigning that length to some number of additional elements. By monotonicity, this new length, which is higher than all lengths used so far, must be assigned to the highest indices of the range of elements under consideration. Thus, for each possible number of elements receiving the new length, the success probability equals the previously computed value for the remaining elements with the residual weight. In particular, the elements assigned the new length do not add to the success probability, because the new length plus any length in the sub-sequence up to it exceeds L. In the extreme case when all considered elements are assigned the new length, the definition in (13), when appearing in the right-hand side of (12), gives a valid assignment provided the weight budget on the left-hand side is sufficiently large. ∎

input : Element distribution, memory width L
output : Prefix code with optimal entry-encoding success probability
initialization:
foreach  do  for all indices and ;
foreach  do  for all indices and ;
codewords of length :
foreach ,  do
   for  do