# Modified Rice-Golomb Code for Predictive Coding of Integers with Real-valued Predictions

###### Abstract

Rice-Golomb codes are widely used in practice to encode integer-valued prediction residuals. However, in lossless coding of audio, image, and video, especially those involving linear predictors, the predictions are from the real domain. In this paper, we have modified and extended the Rice-Golomb code so that it can operate at fractional precision to efficiently exploit the real-valued predictions. Coding at arbitrarily small precision allows the residuals to be modeled with the Laplace distribution instead of its discrete counterpart, namely the two-sided geometric distribution (TSGD). Unlike the Rice-Golomb code, which maps equally probable opposite-signed residuals to different integers, the proposed coding scheme is symmetric in the sense that, at arbitrarily small precision, it assigns codewords of equal length to equally probable residual intervals. The symmetry of both the Laplace distribution and the code facilitates the analysis of the proposed coding scheme to determine the average code-length and the optimal value of the associated coding parameter. Experimental results demonstrate that the proposed scheme, by making efficient use of real-valued predictions, achieves better compression than the conventional scheme.

## I Introduction

Prediction, which involves estimating the outcome of a data source given some past observations, is an effective tool for data compression. Consider the sequential encoding of an integer-valued source $x_1, x_2, \ldots$ over a finite alphabet on a symbol-by-symbol basis. In predictive coding, given the previously encoded data $x_1, \ldots, x_{n-1}$, the value of $x_n$ is predicted as $\hat{x}_n$, and the residual $\epsilon_n = x_n - \hat{x}_n$ is encoded using an entropy code such as a Huffman code or a Rice-Golomb code. Since the previously encoded data are already available at the decoder, it can make the same prediction and reconstruct $x_n$. The beneficial effect of prediction is that it decorrelates the data in the sense that the entropy of the residuals is significantly lower than the entropy of the original sequence.

Although nonlinear predictions place no restriction on the predictor and thus can achieve better performance, linear predictions, which restrict the predictor to be a linear function of the previously encoded data, are much simpler to construct and analyze. Therefore, linear predictors are widely used in practice. In order-$P$ linear predictive coding, after having observed the past data sequence $x_1, x_2, \ldots, x_{n-1}$, the value of $x_n$ is predicted as a linear combination of the previous $P$ values as

\[ \hat{x}_n = \sum_{i=1}^{P} a_i\, x_{n-i}. \tag{1} \]

where $a_i$, $i = 1, 2, \ldots, P$, are real-valued predictor coefficients that need to be optimized. The most common measure of the performance of a predictor is the mean squared error (MSE). Therefore, the coefficients $a_i$ that minimize the MSE, i.e., $E\big[(x_n - \hat{x}_n)^2\big]$, are considered optimal. In practice, however, the $a_i$ are learnt by minimizing the MSE over a window of size $W$ such that

\[ \{a_i\} = \operatorname*{arg\,min}_{\{a_i\}} \sum_{n=j-W+1}^{j} \Big( x_n - \sum_{i=1}^{P} a_i\, x_{n-i} \Big)^{2}. \tag{2} \]

Note that even if the source values $x_n$ are integers, the predictor coefficients $a_i$ are drawn from the real domain, leading to real-valued predictions and residuals.
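To make this concrete, the following sketch applies a fixed order-2 linear predictor to a short integer sequence; the coefficients and data are hypothetical, chosen only to show that integer inputs yield real-valued predictions and residuals.

```python
# A minimal order-2 linear predictor sketch; coefficients and data are
# illustrative, not fitted by any windowed MSE minimization.
def predict(history, coeffs):
    """Linear combination of the most recent len(coeffs) samples."""
    return sum(a * x for a, x in zip(coeffs, reversed(history)))

x = [10, 12, 13, 15, 16, 18]   # integer-valued source
a = [1.4, -0.5]                # real-valued predictor coefficients
for n in range(2, len(x)):
    x_hat = predict(x[:n], a)  # real-valued prediction
    eps = x[n] - x_hat         # real-valued residual
    print(n, x_hat, round(eps, 3))
```

For instance, at the third sample the prediction is 1.4·12 − 0.5·10 = 11.8, so the residual 13 − 11.8 = 1.2 is real-valued even though the source is an integer.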

Conventional predictive coding techniques round the real-valued prediction $\hat{x}$ (the subscript $n$ has been dropped for notational convenience) to the nearest integer $[\hat{x}]$ and then encode the integer residual

\[ \epsilon = x - [\hat{x}]. \tag{3} \]

Given that $[\hat{x}]$ is also available at the decoder, we can restrict the number of possible values of $\epsilon$ by taking into account the fact that $x$ can only take values from the alphabet (see [1]). Encoding $\epsilon$ using an entropy coder now requires knowledge of the probability distribution of the residuals. Therefore, in sequential symbol-by-symbol coding, the probabilities of the possible residual values also need to be estimated adaptively. However, for a large alphabet there might not be a sufficient amount of data in practice to estimate these probabilities reliably. Hence, in practical applications, a parametric representation of the probability distribution of the residuals is often preferred [1, 2].

It has been observed that the distributions of the real-valued prediction residuals in audio [3], image [4], and video [5, 6] coding are highly peaked at zero and can be closely approximated by the Laplace distribution. A Laplace distribution, which sharply peaks at zero, is defined by the following probability density function (pdf):

\[ f(x) = \frac{1}{2b}\, e^{-|x|/b}. \tag{4} \]

Here $b$ is a scale parameter which controls the two-sided decay rate. Since conventional predictive coding schemes encode the integer-valued residuals, they model the distribution of $\epsilon$ using a 'discrete analog' of the Laplace distribution. A discrete analog of a continuous distribution with pdf $f(x)$ has been proposed in [7] as a discrete distribution supported on the set of integers having the probability mass function (pmf)

\[ P(X = k) = \frac{f(k)}{\sum_{j=-\infty}^{\infty} f(j)}, \qquad k \in \mathbb{Z}. \tag{5} \]

It follows from (4) and (5) that the pmf of the discrete analog of the Laplace distribution takes the form,

\[ P(X = k) = \frac{1-\theta}{1+\theta}\, \theta^{|k|}, \qquad \theta = e^{-1/b}, \tag{6} \]

which is known as the two-sided geometric distribution (TSGD) in the literature [2].
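The relation between the continuous and discrete models can be checked numerically. The snippet below is a sketch that assumes the Laplace pdf f(x) = e^(−|x|/b)/(2b) and the TSGD parameterization θ = e^(−1/b); the scale value b is an arbitrary illustrative choice.

```python
import math

b = 1.3                                                # illustrative scale
laplace = lambda x: math.exp(-abs(x) / b) / (2 * b)    # Laplace pdf

# Discrete analog: normalize the pdf over the integers (truncated tails).
K = 200
Z = sum(laplace(j) for j in range(-K, K + 1))
pmf = {k: laplace(k) / Z for k in range(-K, K + 1)}

# Two-sided geometric form with theta = exp(-1/b).
theta = math.exp(-1 / b)
tsgd = lambda k: (1 - theta) / (1 + theta) * theta ** abs(k)

# The discrete analog of the Laplace pdf matches the TSGD.
for k in range(-5, 6):
    assert abs(pmf[k] - tsgd(k)) < 1e-12
```

The constant 1/(2b) cancels in the normalization, which is why only the decay rate θ survives in the discrete form.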

Popular prefix coding schemes use Golomb codes [8] to exploit the exponential decay in the pmf of integer residuals. However, Golomb codes are optimal [9] for the one-sided geometric distribution (OSGD) of the form

\[ P(X = k) = (1-\theta)\, \theta^{k}, \qquad k = 0, 1, 2, \ldots \tag{7} \]

Given a positive integer parameter $m$, the Golomb code of a non-negative integer $n$ has two parts: the quotient $\lfloor n/m \rfloor$ in unary representation and the remainder of that division, $n \bmod m$, in minimal binary representation. The unary representation of a non-negative integer $q$ consists of $q$ '1's followed by a '0'. The minimal binary representation of a non-negative integer $r$ from the alphabet $\{0, 1, \ldots, m-1\}$ uses $\lfloor \log_2 m \rfloor$ bits when $r < 2^{\lceil \log_2 m \rceil} - m$, or $\lceil \log_2 m \rceil$ bits otherwise. For a given $\theta$, the optimal value of the parameter $m$ is given by [9]

\[ m = \left\lceil \frac{\log(1+\theta)}{\log(1/\theta)} \right\rceil. \tag{8} \]
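The unary-plus-minimal-binary construction described above can be sketched as follows; this is a plain Golomb encoder for non-negative integers, written from the description in the text rather than taken from the paper's algorithms.

```python
import math

def minimal_binary(r, m):
    """Minimal binary code for r in {0, ..., m-1}: the first
    2**ceil(log2 m) - m values get the shorter length."""
    if m == 1:
        return ""                      # a single symbol needs no bits
    c = math.ceil(math.log2(m))
    cutoff = (1 << c) - m              # values below this use c-1 bits
    if r < cutoff:
        return format(r, "b").zfill(c - 1)
    return format(r + cutoff, "b").zfill(c)

def golomb_encode(n, m):
    """Quotient in unary ('1'*q followed by '0'), remainder in minimal binary."""
    q, r = divmod(n, m)
    return "1" * q + "0" + minimal_binary(r, m)
```

For example, `golomb_encode(9, 4)` yields `"11001"`: quotient 2 in unary (`"110"`) followed by remainder 1 in two bits (`"01"`).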

Since Golomb codes are defined for non-negative integers only, popular Golomb-based codecs map the integer residual $\epsilon$ into a unique non-negative integer $M(\epsilon)$ prior to encoding by the following overlap-and-interleave scheme, originally proposed by Rice in [10]:

\[ M(\epsilon) = \begin{cases} 2\epsilon, & \epsilon \ge 0, \\ -2\epsilon - 1, & \epsilon < 0. \end{cases} \tag{9} \]
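A minimal sketch of this overlap-and-interleave mapping, assuming its standard form (non-negative residuals to even integers, negative residuals to odd integers):

```python
def rice_map(eps):
    """Interleave signed residuals 0, -1, 1, -2, 2, ... as 0, 1, 2, 3, 4, ..."""
    return 2 * eps if eps >= 0 else -2 * eps - 1

def rice_unmap(m):
    """Invert the mapping: even values are non-negative, odd values negative."""
    return m // 2 if m % 2 == 0 else -(m + 1) // 2

print([rice_map(e) for e in (0, -1, 1, -2, 2)])   # [0, 1, 2, 3, 4]
```

Note that the mapping is bijective but asymmetric: a residual of −1 receives a larger index (1 < 2) than its equally probable counterpart +1 receives, which is the asymmetry the paper addresses.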

Although Golomb codes are optimal for geometrically distributed non-negative integers, the above-mentioned scheme with Rice mapping is not optimal for the TSGD (see [2]). Moreover, when the predictions are constrained to integers, a real-valued bias is typically present in the prediction residuals, which introduces a shift parameter $d$ in the TSGD model in addition to the parameter $\theta$ (see [1] and [11]). This off-centred TSGD is modeled by the following pmf [2]:

\[ P(X = \epsilon) = C(\theta, d)\, \theta^{|\epsilon + d|}, \tag{10} \]

where $C(\theta, d)$ is a normalizing constant.

A complete characterization of the optimal prefix codes for the off-centred TSGD has been presented in [2]. It divides the two-dimensional parameter space into four types of regions and associates a different code construction with each type. However, the article admits that the two-dimensionality of the parameter space adds significant complexity to the characterization and analysis of the codes. Moreover, although these codes are optimal for the off-centred TSGD, they do not preclude further improvement of the compression gain if the residuals could be handled in the real domain, minimizing the loss due to rounding.

In this paper, we extend and modify the Rice-Golomb code so that it can handle real-valued predictions at an arbitrary precision. More specifically, the contribution of the paper can be summarized as follows. Firstly, we generalize the Rice mapping (9) so that it can operate at an arbitrary precision. We then present the complete encoding and decoding algorithms based on the generalized Rice mapping. With the generalized Rice mapping, although the encoding is similar to Rice-Golomb encoding, the decoding is slightly more involved. One of the salient features of the proposed coding scheme is that it is symmetric, i.e., when operating at the finest precision, it assigns codewords of equal length to equally probable residual intervals. Secondly, assuming that the real-valued residuals are Laplace distributed, we analyze the proposed coding scheme and determine a closed-form expression for the average code-length. Thirdly, we determine the relationship between the scale parameter and the optimal value of the coding parameter when the code operates at the finest precision. The analysis of the proposed scheme to determine the average code-length and the optimal value of the associated coding parameter is greatly facilitated by the symmetry of both the Laplace distribution and the code. Although a relationship between the two parameters is not readily available for codes operating at other precisions, we demonstrate analytically and experimentally that a sub-optimal strategy incurs negligible redundancy at sufficiently small precision.

The organization of the rest of the paper is as follows. The modified Rice-Golomb code is presented along with a novel generalized Rice mapping and the implementation details of the proposed coding scheme in Section II. In Section III, the proposed scheme is then analyzed to determine its average code-length and the relationship between the scale parameter and the coding parameter at different fractional precisions. Finally, experimental results are presented to demonstrate the efficacy of the proposed coding scheme in Section IV.

## II Modified Rice-Golomb code at fractional precision

In conventional predictive coding of integers, the real-valued predictions are first mapped to their nearest integers, and then the integer-valued residuals are encoded with an entropy code. In order to extend the Rice-Golomb code to exploit the real-valued prediction at any arbitrary precision, let the scheme operate at precision , where and are positive integers and . Therefore, prior to residual encoding, the prediction is rounded to , which is the integer multiple of nearest to . The standard Rice-Golomb code is then an instance of this extended code with .

Although can take any integer multiple value of , using the fact that the unknown is an integer from , the decoder can deduce that the residual will take values, one integer apart, of the form

(11) |

###### Example 1.

When and , the possible residual values are integer multiples of , i.e., . Let the prediction be , which must be rounded to the nearest multiple of , i.e., to . Once the rounded prediction is fixed to , the possible residual values are , which are of the form .
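The rounding step can be sketched as follows. Here the precision is taken to be Δ = 1/t for a positive integer t, and the numbers are hypothetical rather than those of Example 1; the point is that once the rounded prediction is fixed, the admissible residuals lie exactly one integer apart.

```python
from fractions import Fraction

def round_to_precision(x_hat, delta):
    """Round x_hat to the nearest integer multiple of the precision delta."""
    return round(Fraction(x_hat) / delta) * delta

delta = Fraction(1, 4)                  # assumed precision 1/t with t = 4
x_hat = Fraction(31, 10)                # hypothetical prediction 3.1
p = round_to_precision(x_hat, delta)    # nearest multiple of 1/4
# Since the source value is an integer, consecutive residuals x - p
# differ by exactly 1.
residuals = [x - p for x in range(1, 5)]
```

Exact rational arithmetic (`fractions.Fraction`) is used in the sketch so that the rounding and the residual grid are free of floating-point artifacts.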

Now these integer-apart discrete residual values need to be mapped to unique non-negative integers so that they can be encoded using Golomb codes. For an efficient implementation of the code, the mapping also needs to be easy to compute.

### II-A Residual mapping

According to the Laplace distribution, small-valued residuals have higher probabilities than large-valued residuals. Since Golomb codes assign shorter codewords to small-valued non-negative integers, small-valued discrete residuals should be mapped to small-valued non-negative integers. In the above example, should be mapped to , and should be mapped to , and so on. More generally, if then . Thus, should be mapped according to the function

(12) |

Similarly, if then and consequently, should be mapped according to the function,

(13) |

The mappings (12) and (13) require an explicit formula for the computation of associated with the residual . It can be shown that

(14) |

Obviously, and . Therefore, it follows from (11) and (14) that

(15) |

and

(16) |

When these values of and are substituted in (12) and (13), both the mappings converge to the following,

(17) |

Indeed, this mapping is a generalization of the Rice mapping (9). When , the residual is integer-valued and thus . In this case, the mapping (17) reduces to the Rice mapping (9). In the other extreme case of , the prediction is not rounded at all. For this asymptotic case of , we will denote the mapping by .

### II-B Encoding and Decoding

Having defined the mapping at precision , the encoding operation is similar to Rice-Golomb coding; the decoding operation, however, is slightly more involved due to the use of the floor function in the residual mapping (17).

Encoding: Given the parameter value , compute and as follows

(18) | |||

(19) |

Then encode in unary and in minimal binary. The pseudocode of the encoding algorithm is given in Algorithm 1.

Decoding: Given , the decoder can compute . Recovering from , however, is not straightforward. In Rice-Golomb coding, by checking whether is even or odd, the decoder can decide which of the constituent functions in (9) was used: if is even, the first constituent function was used and ; otherwise, the second constituent function was used and . Unlike the Rice mapping (9), where the value of the first constituent function is always even and that of the second always odd, the value of both constituent functions in the residual mapping (17) can be even or odd. However, in conjunction with , as explained below, it is possible to deduce which of the constituent functions was used.

Consider the first constituent function of the residual mapping (17).

(20) | |||||

It follows from (20) that the value of the first constituent function is even (odd) if is even (odd). Now consider the second constituent function of (17)

(21) | |||||

It follows from (21) that the value of the second constituent function is odd (even) if is even (odd). These relationships between and are summarized in Table I.

|  | Even | Odd |
| --- | --- | --- |
| Even | first constituent function | second constituent function |
| Odd | second constituent function | first constituent function |

These relationships can now be used to decide which of the constituent functions in the mapping (17) was used: if both and are even or both are odd, then the first constituent function was used; otherwise, the second constituent function was used. Having decided on the constituent function, the decoding now follows from (20) and (21) as

(22) |

The pseudocode of the decoding algorithm is given in Algorithm 2.
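For the integer-precision special case, where the mapping reduces to (9) and parity alone identifies the constituent function, a full encode/decode round trip can be sketched. The bit layout follows the unary-plus-minimal-binary construction described above, but this is an illustrative reimplementation, not the paper's Algorithms 1 and 2.

```python
import math

def encode(eps, m):
    """Rice-map the residual, then Golomb-code it with parameter m."""
    n = 2 * eps if eps >= 0 else -2 * eps - 1     # overlap-and-interleave
    q, r = divmod(n, m)
    c = math.ceil(math.log2(m)) if m > 1 else 0
    cutoff = (1 << c) - m
    if m == 1:
        tail = ""
    elif r < cutoff:
        tail = format(r, "b").zfill(c - 1)        # shorter minimal-binary code
    else:
        tail = format(r + cutoff, "b").zfill(c)
    return "1" * q + "0" + tail

def decode(bits, m):
    """Recover the signed residual from one codeword."""
    q = bits.index("0")                           # unary part: count of '1's
    pos = q + 1
    c = math.ceil(math.log2(m)) if m > 1 else 0
    cutoff = (1 << c) - m
    if m == 1:
        r = 0
    else:
        r = int(bits[pos:pos + c - 1], 2) if c > 1 else 0
        if r >= cutoff:                           # codeword used the longer form
            r = int(bits[pos:pos + c], 2) - cutoff
    n = q * m + r
    return n // 2 if n % 2 == 0 else -(n + 1) // 2  # parity picks the sign

for eps in range(-8, 9):
    assert decode(encode(eps, 3), 3) == eps
```

At fractional precision the sign recovery would additionally consult the parity relationships of Table I, since parity of the mapped value alone no longer identifies the constituent function.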

## III Analysis

In Section III-A, the association of non-negative integers with different residual intervals by the mapping is determined, which aids in computing the average code-length of the proposed coding scheme in Section III-B.

### III-A Association of non-negative integers to residual intervals

When operating at precision , a real-valued prediction , where and , is rounded to

(23) |

Then for a given , the residual takes the form

(24) | |||||

Another relevant quantity is , which is even if either or and odd if . Now depending on and , there can be four cases.

Case 1: Both and are even

In this case, for some and either or . If then from Table I it follows that

(25) | |||||

Therefore, the non-negative integer associated with the residual is . For example, when and , the non-negative integer associated with the interval is .

On the other hand, if then from Table I it follows that

(26) | |||||

Therefore, the non-negative integer associated with the residual is where . For example, when and , we only have satisfying the condition . Therefore, must be associated with the interval .

Case 2: is even and is odd

In this case, for some and . Now from Table I it follows that

(27) | |||||

Therefore, the non-negative integer associated with the residual is where . For example, when and , we have and such that . Thus must be associated with the interval .

Case 3: is odd and is even

In this case, for some and either or . If then from Table I it follows that

(28) | |||||

Therefore, the non-negative integer associated with the residual is . For example, when and , the non-negative integer associated with the interval must be .

On the other hand, if then from Table I it follows that

(29) | |||||

Therefore, the non-negative integer associated with the residual is where . For example, when and , we only have such that . Therefore, must be associated with the interval .

Case 4: Both and are odd

In this case, for some and . Now from Table I it follows that

(30) | |||||

Therefore, the non-negative integer associated with the residual is where . For example, when and , we have and satisfying , and thus must be associated with the interval .

The assignment of non-negative integers by the mapping to different intervals of the residual for different values of is depicted in Fig. 1.

### III-B Average code-length

It follows from Fig. 1 that when , which corresponds to the Rice mapping (9), the assignment of non-negative integers to residual intervals is asymmetric, as equally probable, opposite-signed intervals are mapped to different integers. This asymmetry results in the assignment of codewords of different lengths to equally probable residual intervals. However, the asymmetry reduces at finer precision, and the assignment becomes symmetric in the asymptotic case of .

The analysis of the modified Rice-Golomb code becomes simpler for the asymptotic case of due to the symmetric assignment of non-negative integers to the residual intervals. It can be observed from Fig. 1 that the assignment corresponding to is the same as the assignment corresponding to but with a left shift of . When is encoded using a Golomb code with parameter , this left shift is also reflected in the association of code-lengths with different residual intervals. For a given , let the length of the code associated with in the modified Rice-Golomb code at precision be . In the asymptotic case of , the code-length will be denoted by . Now let us consider two cases, depending on whether the coding parameter is a power of two or not.

#### III-B1 The case

When , the minimal binary representation of always takes bits. In the asymptotic case of , for any residual such that , we get . As the unary representation of requires bits, the length of the code associated with is . Now it follows from Fig. 2 that the code-length associated with the left-shifted intervals and is .
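This code-length bookkeeping can be checked numerically for the power-of-two case: assuming the Golomb parameter is m = 2^k, the minimal-binary part always spends k bits and the unary part ⌊n/m⌋ + 1 bits for a mapped value n. The helper below is a sketch under that assumption.

```python
def golomb_length(n, m):
    """Codeword length in bits for a mapped value n, with m = 2**k."""
    assert m > 0 and m & (m - 1) == 0, "m must be a power of two"
    k = m.bit_length() - 1           # minimal binary always takes k bits
    return n // m + 1 + k            # unary quotient + terminator + k bits

# With m = 4 (k = 2), the length grows by one bit every m consecutive values.
lengths = [golomb_length(n, 4) for n in range(8)]
print(lengths)                       # [3, 3, 3, 3, 4, 4, 4, 4]
```

The stepwise growth of the length, one extra bit per block of m mapped values, is what ties each code-length to a contiguous residual interval in the analysis.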

This observation can be used to determine the average code-length of the modified Rice-Golomb code. Given , let the average code-length achieved with the modified Rice-Golomb code, in encoding residuals that are Laplace distributed with parameter , be . Then we have the following theorem.

###### Theorem 2.

If , then

##### Proof

#### III-B2 The case

First we determine the code-lengths associated with different intervals for the asymptotic case of . Let . When , the minimal binary representation of takes bits if or bits if . Thus the minimum possible code-length in this scheme is which results when and . Since , this minimum code-length is associated with the interval . Beyond the point , we have and the minimal binary representation of requires bits. However, the value of remains for . Therefore, for . Now, when although becomes whose unary representation requires two bits, the value of becomes for . Hence, for . Since, when , we have , for , we get . This calculation is generalized in the following lemma.

###### Lemma 3.

Given for any integer , if then the code-length associated with is .

##### Proof

See Appendix A-A.

Now, for an arbitrary , the minimum code-length is associated with the left-shifted interval , and the code-length is associated with the left-shifted intervals and . Using this association of code-lengths with different intervals, as depicted in Fig. 3, we can determine the average code-length .

###### Theorem 4.

If for any integer then