Fixed-to-Variable Length Distribution Matching

Rana Ali Amjad and Georg Böcherer
Institute for Communications Engineering
Technische Universität München, Germany

This work was supported by the German Ministry of Education and Research in the framework of an Alexander von Humboldt Professorship.

Fixed-to-variable length (f2v) matchers are used to reversibly transform an input sequence of independent and uniformly distributed bits into an output sequence of bits that are (approximately) independent and distributed according to a target distribution. The degree of approximation is measured by the informational divergence between the output distribution and the target distribution. An algorithm is developed that efficiently finds optimal f2v codes. It is shown that by encoding the input bits blockwise, the informational divergence per bit approaches zero as the block length approaches infinity. A relation to data compression by Tunstall coding is established.

I Introduction

Distribution matching considers the problem of mapping uniformly distributed bits to symbols that are approximately distributed according to a target distribution. In contrast to the simulation of random processes [1] or the exact generation of distributions [2], distribution matching requires that the original bit sequence can be recovered from the generated symbol sequence. We measure the degree of approximation by the normalized informational divergence (I-divergence), which is an appropriate measure when a matcher is used to achieve the capacity of noisy and noiseless channels [3, Sec. 3.4.3 & Chap. 6]. A related line of work is [4], [3, Chap. 3], where it is shown that variable-to-fixed length (v2f) matching is optimally done by geometric Huffman coding and where the relation to fixed-to-variable length (f2v) source encoders is discussed. In the present work, we consider binary distribution matching by prefix-free f2v codes.

I-A Rooted Trees With Probabilities

(Figure: a rooted binary tree with nodes labeled 1–7, drawn twice: (a) tree probabilities defined by a leaf distribution; (b) tree probabilities defined by branching distributions.)
Fig. 1: A rooted tree with probabilities. The set of branching nodes is and the set of leaf nodes is . In (a), a uniform leaf distribution is chosen, i.e., for each , . The leaf distribution determines the node probabilities and branching distributions. In (b), identical branching distributions are chosen, i.e., for each , , where and . The branching distributions determine the resulting node probabilities, which we denote by . Since the tree is complete, and defines a leaf distribution.

We use the framework of rooted trees with probabilities [5],[6]. Let be the set of all binary trees with leaves and consider some tree . Index all nodes by the numbers where is the root. Note that there are at least nodes in the tree, with equality if the tree is complete. A tree is complete if any right-infinite binary sequence starts with a path from the root to a leaf. Let be the set of leaf nodes and let be the set of branching nodes. Probabilities can be assigned to the tree by defining a distribution over the paths through the tree. For each , denote by the probability that a path is chosen that passes through node . Since each path ends at a different leaf node, defines a leaf distribution, i.e., . For each branching node , denote by the branching distribution, i.e., the probabilities of branch and branch after passing through node . The probabilities on the tree are completely defined either by defining the branching distributions or by defining the leaf distribution . See Fig. 1 for an example.
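As a concrete illustration, the following Python sketch computes the leaf probabilities induced by using one and the same branching distribution at every branching node. The variable names are ours, not the paper's, since the original symbols are not recoverable from this copy.

```python
def leaf_probs(paths, p0, p1):
    """Leaf probabilities induced by using the branching distribution
    (p0, p1) at every branching node; `paths` are the root-to-leaf
    bit strings of the tree."""
    return {c: p0 ** c.count("0") * p1 ** c.count("1") for c in paths}

# For a complete tree, every right-infinite bit sequence starts with
# exactly one of the paths, so the induced leaf probabilities sum to one.
probs = leaf_probs(["00", "01", "10", "11"], 0.7, 0.3)
assert abs(sum(probs.values()) - 1.0) < 1e-12
```

For a non-complete tree (e.g., only the paths 00 and 01) the induced leaf probabilities sum to less than one.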

I-B v2f Source Encoding and f2v Distribution Matching

Consider a binary distribution with , , , and a binary tree with leaves. Let , , be the node probabilities that result from having all branching distributions equal to , i.e. for each . See Fig. 1(b) for an example. Let be a uniform leaf distribution, i.e., for each , see Fig. 1(a) for an example. We use the tree as a v2f source code for a discrete memoryless source (DMS) . To guarantee lossless compression, the tree for a v2f source encoder has to be complete. Consequently, defines a leaf distribution, i.e., . We denote the set of complete binary trees with leaves by . Each code word consists of bits and the resulting entropy rate at the encoder output is


where is the entropy of the leaf distribution defined by and where is defined accordingly. From (1), we conclude that the objective is to solve


The solution is known to be attained by Tunstall coding [7]. The tree in Fig. 1 is a Tunstall code for , and and the corresponding v2f source encoder is


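For concreteness, Tunstall coding for a binary DMS can be sketched in a few lines of Python: grow a complete tree by repeatedly splitting the most probable leaf until the desired number of leaves is reached. The function below is our illustration, not code from the paper.

```python
import heapq

def tunstall(p0, p1, n):
    """Build a Tunstall code with n source words for a binary DMS with
    letter probabilities (p0, p1): repeatedly split the most probable
    leaf; returns the root-to-leaf paths of the resulting complete tree."""
    heap = [(-1.0, "")]                    # (negated leaf probability, path)
    while len(heap) < n:
        neg_p, path = heapq.heappop(heap)  # most probable leaf
        heapq.heappush(heap, (neg_p * p0, path + "0"))
        heapq.heappush(heap, (neg_p * p1, path + "1"))
    return sorted(path for _, path in heap)

# Example: for a skewed source the most probable strings become source words.
assert tunstall(0.7, 0.3, 4) == ["000", "001", "01", "1"]
```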
The dual problem is f2v distribution matching. is now a binary target distribution and we generate the codewords defined by the paths through a (not necessarily complete) binary tree uniformly according to . For example, the f2v distribution matcher defined by the tree in Fig. 1 is


Denote by the path lengths and let be a random variable that is uniformly distributed over the path lengths according to . We want the I-divergence per output bit of and to be small, i.e., we want to solve


In contrast to (2), the minimization is now over the set of all (not necessarily complete) binary trees with leaves. Note that although for a non-complete tree we have , the problem (5) is well-defined, since there is always a complete tree with leaves and . The sum in (5) is over the support of , which is . Solving (5) is the problem that we consider in this work.
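The objective in (5) is easy to evaluate for a given code tree. The sketch below (with assumed notation) takes the list of root-to-leaf codewords, gives each the uniform probability 1/n, and scores it against the product of the target branch probabilities.

```python
from math import log2

def idiv_per_bit(paths, p0, p1):
    """I-divergence per output bit of an f2v matcher: the input is uniform
    over the n codewords, and the binary target assigns each codeword
    the probability p0^(#zeros) * p1^(#ones)."""
    n = len(paths)
    div = sum((1 / n) * log2((1 / n) / (p0 ** c.count("0") * p1 ** c.count("1")))
              for c in paths)
    avg_len = sum(len(c) for c in paths) / n   # expected codeword length
    return div / avg_len

# A uniform target is matched perfectly by a full balanced tree:
assert idiv_per_bit(["00", "01", "10", "11"], 0.5, 0.5) == 0.0
```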

I-C Outline

In Sec. II and Sec. III, we restrict attention to complete trees. We show that Tunstall coding applied to minimizes and that iteratively applying Tunstall coding to weighted versions of minimizes . In Sec. IV, we derive conditions for the optimality of complete trees and show that the I-divergence per bit can be made arbitrarily small by letting the block length approach infinity. Finally, in Sec. V, we illustrate by an example that source decoders are sub-optimal distribution matchers and that, vice versa, distribution dematchers are sub-optimal source encoders.

II Minimizing I-Divergence

Let be the set of real numbers. For a finite set , we say that is a weighted distribution if for each , . We allow for . The I-divergence of a distribution and a weighted distribution is


where denotes the support of . The reason why we need this generalization of the notion of distributions and I-divergence will become clear in the next section.

Proposition 1.

Let be a weighted binary target distribution, and let


be an optimal complete tree. Then we find that

  • An optimal complete tree can be constructed by applying Tunstall coding to .

  • If and , then also minimizes among all possibly non-complete binary trees , i.e., the optimal tree is complete.


Part i. We write


and hence


Consider now an arbitrary complete tree . Since the tree is complete, there exist (at least) two leaves that are siblings, say and . Denote by the corresponding branching node. The contribution of these two leaves to the objective function on the right-hand side of (9) can be written as


Now consider the tree that results from removing the nodes and . The new set of leaf nodes is and the new set of branching nodes is . Also defines a weighted leaf distribution on . The same procedure can be applied repeatedly by defining , until consists only of the root node. We use this idea to re-write the objective function of the right-hand side of (9) as follows.


Since is a constant independent of the tree , we have


The right-hand side of (12) is clearly maximized by the complete tree with the branching nodes with the greatest weighted probabilities. According to [8, p. 47], this is exactly the tree that is constructed when Tunstall coding is applied to the weighted distribution .

Part ii. We now consider and . Assume we have constructed a non-complete binary tree. Because of non-completeness, we can remove a branch from the tree. Without loss of generality, assume that this branch is labeled by a zero. Denote by the leaves on the subtree of the branch. Denote the tree after removing the branch by . Now,


where the inequality follows because by assumption . Thus, for the new tree , the objective function (II) is bounded as


In summary, under the assumption and , the objective function (II) that we want to maximize does not decrease when removing branches, which shows that there is an optimal complete tree. This proves statement ii. of the proposition. ∎

III Minimizing I-Divergence Per Bit

The following two propositions relate the problem of minimizing the I-divergence per bit to the problem of minimizing the un-normalized I-divergence.

Let be some set of binary trees with leaves and define

Proposition 2.

We have


where is the weighted distribution induced by .


By (15), for any tree , we have


We write the left-hand side of (18) as


Consider the path through the tree that ends at leaf . Denote by and the number of times the labels and occur, respectively. The length of the path can be expressed as . The term can now be written as


Using (20) and (19) in (18) shows that for any binary tree we have


which is the statement of the proposition. ∎

Proposition 3.



Then the optimal complete tree


is constructed by applying Tunstall coding to .


The proposition is a consequence of Prop. 2 and Prop. 1.i. ∎

III-A Iterative Algorithm

By Prop. 3, if we know the I-divergence , then we can find by Tunstall coding. However, is not known a priori. We solve this problem by iteratively applying Tunstall coding to , where is an estimate of , and by updating our estimate after each iteration. This procedure is stated in Alg. III-A.

  • Algorithm 1.
    1. Construct a complete tree with leaves by applying Tunstall coding to the weighted distribution defined by the current estimate of the minimum I-divergence per bit.
    2. Assign to the estimate the I-divergence per bit achieved by the tree constructed in step 1. Repeat from step 1. until the estimate no longer changes.
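A possible implementation in Python is sketched below. This is our reconstruction, not the paper's code: following the Lagrangian argument in the proofs, we take the weighted distribution to be the binary target with each entry scaled by 2**d, where d is the current estimate of the minimum I-divergence per bit, and we stop as soon as the estimate no longer decreases.

```python
import heapq
from math import log2

def tunstall(w0, w1, n):
    """Grow a complete binary tree with n leaves by repeatedly splitting
    the leaf with the greatest weighted probability (branch weights w0, w1)."""
    heap = [(-1.0, "")]                      # (negated weight, path)
    while len(heap) < n:
        neg_w, path = heapq.heappop(heap)    # heaviest leaf
        heapq.heappush(heap, (neg_w * w0, path + "0"))
        heapq.heappush(heap, (neg_w * w1, path + "1"))
    return sorted(path for _, path in heap)

def idiv_per_bit(paths, p0, p1):
    """I-divergence per output bit: the input is uniform over the n
    codewords, and the target assigns p0^(#zeros) * p1^(#ones) to each."""
    n = len(paths)
    div = sum((1 / n) * log2((1 / n) / (p0 ** c.count("0") * p1 ** c.count("1")))
              for c in paths)
    return div / (sum(len(c) for c in paths) / n)

def alg1(p0, p1, n, tol=1e-12):
    """Iterate: Tunstall coding on the weighted distribution
    (p0 * 2**d, p1 * 2**d), where d is the current estimate of the
    minimum I-divergence per bit; stop when d no longer decreases."""
    d = idiv_per_bit(tunstall(p0, p1, n), p0, p1)
    while True:
        paths = tunstall(p0 * 2 ** d, p1 * 2 ** d, n)
        d_new = idiv_per_bit(paths, p0, p1)
        if d_new >= d - tol:
            return paths, d_new
        d = d_new
```

Since the estimate is strictly decreasing until termination and there are only finitely many complete trees with n leaves, the loop terminates, mirroring the argument in the proof of Prop. 4.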

Proposition 4.

Alg. III-A finds as defined in Prop. 3 in finitely many steps.


The proof is similar to the proof of [3, Prop. 4.1].

We first show that is strictly monotonically decreasing. Let be the value that is assigned to in step 1. of the th iteration and denote by the value that is assigned to in step 2. of the th iteration. Suppose that the algorithm does not terminate in the th iteration. We have


By step 2, we have


and since by our assumption the algorithm does not terminate in the th iteration, we have


Now assume the algorithm has terminated, and let be the tree after termination. Because of the assignments in steps 1. and 2., the terminating condition implies that for any tree , we have


Consequently, we have


We conclude that after termination, is equal to the optimal tuple in Prop. 3.

Finally, we have shown that is strictly monotonically decreasing so that for all . But there is only a finite number of complete binary trees with leaves. Thus, the algorithm terminates after finitely many steps. ∎

IV Optimality of Complete Trees

Complete trees are not optimal in general: consider and . For , Tunstall coding constructs the (unique) complete binary tree with leaves, independent of which target vector we pass to it. The path lengths are . The I-divergence per bit achieved by this tree is


Now, we could instead use a non-complete tree with the paths and . In this case, the I-divergence per bit is


In summary, for the considered example, using a complete tree is sub-optimal. In the following, we derive simple conditions on the target vector that guarantee that the optimal tree is complete.
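The numeric values of this example are not recoverable from this copy, but the effect is easy to reproduce with an assumed, strongly skewed target: for two leaves, the non-complete tree with paths 00 and 01 beats the unique complete tree with paths 0 and 1.

```python
from math import log2

def idiv_per_bit(paths, p0, p1):
    """I-divergence per output bit for a uniform input over the codewords."""
    n = len(paths)
    div = sum((1 / n) * log2((1 / n) / (p0 ** c.count("0") * p1 ** c.count("1")))
              for c in paths)
    return div / (sum(len(c) for c in paths) / n)

# Assumed target (not necessarily the paper's example), skewed toward 0:
p0, p1 = 0.9, 0.1
complete = idiv_per_bit(["0", "1"], p0, p1)        # unique complete tree, 2 leaves
noncomplete = idiv_per_bit(["00", "01"], p0, p1)   # root has a single child
assert noncomplete < complete
```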

IV-A Sufficient Conditions for Optimality

Proposition 5.

Let be a distribution. If , then the optimal tree is complete for any and it is constructed by Alg. III-A.


According to Prop. 1.ii, the tree that minimizes is complete if the entries of the weighted distribution are both less than or equal to one. Without loss of generality, assume that . Thus, we only need to check this condition for . We have


We calculate the value of that is achieved by the (unique) complete tree with leaves, namely


For each , this is achieved by the complete tree with all path lengths equal to . Substituting the right-hand side of (32) for in (31), we obtain


which is the condition stated in the proposition. ∎

IV-B Asymptotic Achievability for Complete Trees

Proposition 6.

Denote by the complete tree with leaves that is constructed by applying Alg. III-A to a target distribution . Then we have


and in particular, the I-divergence per bit approaches zero as .


The expected length can be bounded by the converse of the Coding Theorem for DMS [8, p. 45] as


Thus, we have


The tree that minimizes the right-hand side is found by applying Tunstall coding to . Without loss of generality, assume that . According to the Tunstall Lemma [8, p. 47], the induced leaf probability of a tree constructed by Tunstall coding is lower bounded as


We can therefore bound the I-divergence as


We can now bound the I-divergence per bit as


This proves the proposition. ∎
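Prop. 6 can be illustrated numerically. The sketch below is self-contained (it includes the Tunstall and divergence helpers and an assumed form of the iterative algorithm, with the weighted distribution taken as the target scaled by 2**d) and checks that the I-divergence per bit shrinks as the number of leaves grows.

```python
import heapq
from math import log2

def tunstall(w0, w1, n):
    """Complete binary tree with n leaves: repeatedly split the leaf
    with the greatest weighted probability (branch weights w0, w1)."""
    heap = [(-1.0, "")]
    while len(heap) < n:
        neg_w, path = heapq.heappop(heap)
        heapq.heappush(heap, (neg_w * w0, path + "0"))
        heapq.heappush(heap, (neg_w * w1, path + "1"))
    return sorted(path for _, path in heap)

def idiv_per_bit(paths, p0, p1):
    """I-divergence per output bit for a uniform input over the codewords."""
    n = len(paths)
    div = sum((1 / n) * log2((1 / n) / (p0 ** c.count("0") * p1 ** c.count("1")))
              for c in paths)
    return div / (sum(len(c) for c in paths) / n)

def alg1(p0, p1, n, tol=1e-12):
    """Iterative minimization of the I-divergence per bit (assumed form)."""
    d = idiv_per_bit(tunstall(p0, p1, n), p0, p1)
    while True:
        paths = tunstall(p0 * 2 ** d, p1 * 2 ** d, n)
        d_new = idiv_per_bit(paths, p0, p1)
        if d_new >= d - tol:
            return paths, d_new
        d = d_new

# Assumed target (0.8, 0.2): the per-bit divergence shrinks with the tree size.
d_small = alg1(0.8, 0.2, 2)[1]
d_large = alg1(0.8, 0.2, 64)[1]
assert d_large < d_small
```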

IV-C Optimality of Complete Trees for Large Enough

Proposition 7.

For any target distribution with and , there is an such that for all , the tree that minimizes


is complete.


Without loss of generality, assume that . By Prop. 6, we have . Thus, there exists an such that


Thus, for all , both entries of are smaller than . The proposition now follows by Prop. 2 and Prop. 1.ii. ∎

                                 Tunstall on      Alg. III-A on
v2f source encoder:
  redundancy                     0.038503         0.04176
f2v distribution matcher:
  I-divergence per bit           0.039206         0.037695

TABLE I: Comparison of v2f source coding and f2v distribution matching.

V Source Coding Versus Distribution Matching

An ideal source encoder transforms the output of a DMS into a sequence of bits that are independent and uniformly distributed. Conversely, applying the corresponding decoder to a sequence of uniformly distributed bits generates a sequence of symbols that are iid according to . This suggests designing an f2v distribution matcher by first calculating the optimal v2f source encoder. The inverse mapping is f2v and can be used as a distribution matcher.

We illustrate by an example that this approach is sub-optimal in general. Consider the DMS with . We calculate the optimal binary v2f source encoder with blocklength by applying Tunstall coding to . The resulting encoder is displayed in the 1st column of Table I. Using the source decoder as a distribution matcher results in an I-divergence per bit of bits. Next, we use Alg. III-A to calculate the optimal f2v matcher for . The resulting mapping is displayed in the 2nd column of Table I. The achieved I-divergence per bit is bits, which is smaller than the value obtained by using the source decoder.

In general, the decoder of an optimal v2f source encoder is a sub-optimal f2v distribution matcher, and the dematcher of an optimal f2v distribution matcher is a sub-optimal v2f source encoder.


  • [1] Y. Steinberg and S. Verdú, “Simulation of random processes and rate-distortion theory,” IEEE Trans. Inf. Theory, vol. 42, no. 1, pp. 63–86, 1996.
  • [2] D. Knuth and A. Yao, The Complexity of Nonuniform Random Number Generation.  New York: Academic Press, 1976, pp. 357–428.
  • [3] G. Böcherer, “Capacity-achieving probabilistic shaping for noisy and noiseless channels,” Ph.D. dissertation, RWTH Aachen University, 2012.
  • [4] G. Böcherer and R. Mathar, “Matching dyadic distributions to channels,” in Proc. Data Compression Conf., 2011, pp. 23–32.
  • [5] R. A. Rueppel and J. L. Massey, “Leaf-average node-sum interchanges in rooted trees with applications,” in Communications and Cryptography: Two Sides of One Tapestry, R. E. Blahut, D. J. Costello Jr., U. Maurer, and T. Mittelholzer, Eds.  Kluwer Academic Publishers, 1994.
  • [6] G. Böcherer, “Rooted trees with probabilities revisited,” Feb. 2013.
  • [7] B. Tunstall, “Synthesis of noiseless compression codes,” Ph.D. dissertation, 1967.
  • [8] J. L. Massey, “Applied digital information theory I,” lecture notes, ETH Zurich.