Lossless Intra Coding in HEVC with Integer-to-Integer DST
Abstract
It is desirable to support efficient lossless coding within video coding standards, which are primarily designed for lossy coding, with as little modification as possible. A simple approach is to skip transform and quantization and directly entropy code the prediction residual, but this is inefficient for compression. A more efficient and popular approach is to process the residual block with DPCM prior to entropy coding. This paper explores an alternative approach based on processing the residual block with integer-to-integer (i2i) transforms. I2i transforms map integers to integers; however, unlike the integer transforms used in HEVC for lossy coding, they do not increase the dynamic range at the output and can therefore be used in lossless coding. We use both an i2i DCT from the literature and a novel i2i approximation of the DST. Experiments with the HEVC reference software show competitive results.
Fatih Kamisli 
Middle East Technical University 
Ankara, Turkey 
Index Terms— Image coding, Video Coding, Discrete cosine transforms, Lossless coding, HEVC
1 Introduction
Both the emerging high efficiency video coding (HEVC) standard [1] and the widely deployed H.264/AVC standard [2] support lossy and lossless compression. Lossy compression in these standards is achieved with a block-based approach. First, a block of pixels is predicted using previously coded pixels. Next, the prediction error block is computed and transformed with a block-based transform, and the transform coefficients are quantized and entropy coded.
(This research was supported by Grant 113E516 of Tübitak.)
It is desirable to support efficient lossless compression using the lossy coding architecture with as little modification as possible, so that encoders/decoders can also support lossless compression without any significant complexity increase. The simplest approach is to just skip the transform and quantization steps and directly entropy code the block of prediction errors. This approach is indeed used in HEVC version 1 [1]. While this is a simple and low-complexity approach, it is inefficient, and a large number of alternative approaches have been proposed. A popular method is to retain the standard block-based prediction and process the residual block with differential pulse code modulation (DPCM) prior to entropy coding [3, 4].
This paper explores an alternative approach for lossless intra coding based on integer-to-integer (i2i) transforms. Integer-to-integer transforms map integers to integers. However, unlike the integer transforms used in HEVC for lossy coding, they do not increase the dynamic range at the output and can therefore be easily employed in lossless coding. While many papers employ i2i approximations of the discrete cosine transform (DCT) in lossless image compression [5], we are not aware of any work that explores i2i transforms for lossless compression of prediction residuals in the video coding standards H.264/AVC or HEVC. This paper uses an i2i DCT from the literature in lossless intra coding and presents improved coding results over the standard. In addition, this paper derives a novel i2i approximation of the odd type-3 discrete sine transform (DST), which is the first in the literature to the best of our knowledge, and this i2i transform improves coding gains further.
The remainder of the paper is organized as follows. In Section 2, a brief overview of related previous research on lossless video compression is provided. Section 3 discusses i2i DCTs and also obtains a novel i2i approximation of the DST. Section 4 presents experimental results by using these i2i transforms within HEVC for lossless intra coding. Finally, Section 5 concludes the paper.
2 Previous Research
2.1 Methods based on residual DPCM
Methods based on residual differential pulse code modulation (RDPCM) first perform the default block-based prediction and then process the prediction error block further with a DPCM method, i.e. a pixel-by-pixel prediction method. Many methods based on residual DPCM have been proposed in the literature [3, 4].
One of the earliest such methods was proposed in [3] for lossless intra coding in H.264/AVC. Here, the block-based spatial prediction is performed first, and then a simple pixel-by-pixel differencing operation is applied to the residual pixels in the horizontal and vertical intra prediction modes only. In horizontal mode, the left neighbor of each residual pixel is subtracted from it, and the result is the RDPCM pixel of the block. Similar differencing is performed along the vertical direction in the vertical intra mode. Note that the residuals of other angular modes are not processed in [3].
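The horizontal and vertical differencing described above can be sketched as follows. This is a minimal illustration only, not the code of [3] or of any reference software; the function names and the mode strings are hypothetical.

```python
def rdpcm_forward(residual, mode):
    """Difference each residual pixel with its left (horizontal) or upper (vertical) neighbor."""
    rows, cols = len(residual), len(residual[0])
    out = [row[:] for row in residual]
    for r in range(rows):
        for c in range(cols):
            if mode == "horizontal" and c > 0:
                out[r][c] = residual[r][c] - residual[r][c - 1]
            elif mode == "vertical" and r > 0:
                out[r][c] = residual[r][c] - residual[r - 1][c]
    return out


def rdpcm_inverse(coded, mode):
    """Undo the differencing by accumulating along the same direction."""
    rows, cols = len(coded), len(coded[0])
    out = [row[:] for row in coded]
    for r in range(rows):
        for c in range(cols):
            if mode == "horizontal" and c > 0:
                out[r][c] = coded[r][c] + out[r][c - 1]
            elif mode == "vertical" and r > 0:
                out[r][c] = coded[r][c] + out[r - 1][c]
    return out


# Illustrative 4x4 residual block (made-up values).
block = [[3, 4, 4, 5], [2, 2, 3, 3], [-1, 0, 0, 1], [7, 7, 8, 8]]
coded_h = rdpcm_forward(block, "horizontal")
```

For smooth rows such as those in `block`, the differenced values cluster around zero, which is what makes them cheaper to entropy code.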
2.2 Methods based on modified entropy coding
In lossy coding, transform coefficients of prediction residuals are entropy coded, while in lossless coding, the prediction residuals themselves are entropy coded. Considering the difference between the statistics of quantized transform coefficients and those of prediction residuals, several modifications in entropy coding were proposed for lossless coding. HEVC version 2 includes reversing the scan order of coefficients, using a dedicated context model for the significance map, and other tools [6, 7].
3 Integer-to-integer (i2i) transforms
Integer-to-integer (i2i) transforms map integer inputs to integer outputs and are invertible. However, unlike the integer transforms in HEVC [8], which also map integers to integers, they do not increase the dynamic range at the output. Therefore they can be easily used in lossless compression.
One possible method to obtain i2i transforms is to decompose a transform into a cascade of plane rotations, and then approximate each rotation with a lifting structure, which can easily map integers to integers.
3.1 Plane rotations and the lifting scheme
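A plane rotation can be represented with the 2x2 matrix below, and it can be decomposed into three lifting steps or into two lifting steps and two scaling factors [5], as shown in Equation (1) and in Figure 1.
$$
\begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}
= \begin{bmatrix} 1 & p \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ u & 1 \end{bmatrix} \begin{bmatrix} 1 & p \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} K_1 & 0 \\ 0 & K_2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ u & 1 \end{bmatrix} \begin{bmatrix} 1 & p \\ 0 & 1 \end{bmatrix} \tag{1}
$$
In these equations, the lifting factors are related to the plane rotation parameters as follows: for the form with three lifting steps, $p = (\cos\alpha - 1)/\sin\alpha$ and $u = \sin\alpha$; for the form with two lifting steps and two scaling factors, $p = -\tan\alpha$, $u = \sin\alpha\cos\alpha$, $K_1 = \cos\alpha$ and $K_2 = 1/\cos\alpha$.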
Each lifting step can be inverted with another lifting step because
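$$
\begin{bmatrix} 1 & p \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & -p \\ 0 & 1 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & 0 \\ u & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ -u & 1 \end{bmatrix} \tag{2}
$$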
In other words, each lifting step is inverted by subtracting out what was added in the forward lifting step. Notice that each lifting step is still invertible even if the product of the input sample with the floating-point lifting factor $p$ or $u$ is rounded to an integer, as long as the same rounding operation is applied in both the forward and inverse lifting steps. This implies that each lifting step can map integers to integers and is easily invertible.
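The invertibility argument can be checked numerically with a short sketch. It assumes the three-lifting-step factorization with $p = (\cos\alpha - 1)/\sin\alpha$ and $u = \sin\alpha$; the function names, the rounding rule and the example angle are illustrative choices, not taken from [5].

```python
import math


def rnd(value):
    """Round to nearest integer; any fixed rule works if forward and inverse share it."""
    return math.floor(value + 0.5)


def rotate_i2i(x, y, alpha):
    """Integer-to-integer approximation of a plane rotation with three lifting steps."""
    p = (math.cos(alpha) - 1.0) / math.sin(alpha)
    u = math.sin(alpha)
    x = x + rnd(p * y)  # first lifting step
    y = y + rnd(u * x)  # second lifting step
    x = x + rnd(p * y)  # third lifting step
    return x, y


def rotate_i2i_inverse(x, y, alpha):
    """Undo the lifting steps in reverse order by subtracting what was added."""
    p = (math.cos(alpha) - 1.0) / math.sin(alpha)
    u = math.sin(alpha)
    x = x - rnd(p * y)
    y = y - rnd(u * x)
    x = x - rnd(p * y)
    return x, y


ALPHA = 3 * math.pi / 8  # example angle, chosen only for illustration
```

The outputs stay integers and remain close to the exact rotation, while the inverse recovers the inputs exactly despite the rounding.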
Notice that each lifting step in the above factorization requires floating-point multiplications. To avoid floating-point multiplications, the lifting factors $p$ and $u$ can be approximated with rationals of the form $k/2^m$ ($k$ and $m$ are integers), which can be implemented with only addition and bit-shift operations. Note that the bit-shift operation implicitly includes a rounding operation, which is necessary for mapping integers to integers, as discussed above. $k$ and $m$ can be chosen depending on the desired accuracy of the plane rotation approximation and the desired level of computational complexity.
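A lifting step with a dyadic factor $k/2^m$ can be sketched as below. The helper names are hypothetical, and adding the offset $2^{m-1}$ before the shift (rounding to nearest rather than plain truncation) is an assumption of this sketch rather than a detail stated in the text.

```python
def mul_3_8(x):
    """round(3/8 * x) with shifts and adds only, using 3x = (x << 1) + x."""
    return (((x << 1) + x) + 4) >> 3


def dyadic_lift(x, y, k, m):
    """Lifting step y <- y + round(k / 2**m * x); requires m >= 1."""
    return x, y + ((k * x + (1 << (m - 1))) >> m)


def dyadic_unlift(x, y, k, m):
    """Inverse lifting step: subtract exactly the same rounded term."""
    return x, y - ((k * x + (1 << (m - 1))) >> m)
```

Python's `>>` on negative integers is an arithmetic (flooring) shift, so the same expression is evaluated identically in the forward and inverse steps, which is all that invertibility requires.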
3.2 I2i DCT
A significant amount of work on i2i transforms has been devoted to developing i2i approximations of the discrete cosine transform (DCT). Although there are other methods, the most popular one, due to its lower computational complexity, is to utilize the factorization of the DCT into plane rotations and butterfly structures [9, 10, 5].
Two well-known factorizations of the DCT into plane rotations and butterflies are Chen's and Loeffler's factorizations [9, 10]. Loeffler's 4-point DCT factorization is shown in Figure 2. It contains three butterflies and one plane rotation. Note that the output samples in Figure 2 need to be scaled to obtain an orthogonal DCT.
The butterfly structures shown in Figure 2 map integers to integers because the output samples are the sum and difference of the inputs. They are also easily inverted: applying the same butterfly again and dividing the output samples by 2 recovers the inputs.
The plane rotation in Figure 2 can be decomposed into three lifting steps or into two lifting steps and two scaling factors, as discussed in Section 3.1, to obtain an integer-to-integer mapping. Using two lifting steps per plane rotation reduces the complexity. The two scaling factors can be combined with other scaling factors (if present) at the output, creating a scaled i2i DCT. The scaling factors at the output can be absorbed into the quantization stage in lossy coding. In lossless coding, all scaling factors can be omitted. However, care is needed when omitting scaling factors, since the dynamic range of some output samples may become too high. For example, in Figure 2, the DC output sample becomes the sum of all input samples when scaling factors are omitted, whereas it may be preferable that it is the average of all input samples, which can improve the entropy coding performance. Hence, in lossless coding, the butterflies of Figure 2 are replaced with lifting steps as shown in Figure 3 [5].
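To illustrate why a lifting-based butterfly keeps the dynamic range down, the sketch below uses the classic two-point S-transform, which outputs a (floored) average and a difference instead of a sum and a difference. This is an assumed, generic construction; the exact lifting structure of Figure 3 is not reproduced here.

```python
def s_forward(a, b):
    """Two lifting steps: difference first, then the floored average of a and b."""
    d = a - b
    s = b + (d >> 1)  # equals floor((a + b) / 2)
    return s, d


def s_inverse(s, d):
    """Undo the two lifting steps in reverse order."""
    b = s - (d >> 1)
    a = d + b
    return a, b
```

With a plain butterfly the first output would be `a + b`, which needs one more bit than the inputs; here the first output always lies between `min(a, b)` and `max(a, b)`.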
3.3 I2i DST
The odd type-3 DST has been shown to provide improved coding performance compared to the DCT for lossy coding of intra prediction residuals [11]. To the best of our knowledge, an i2i approximation of the DST has not appeared in the literature. To develop an i2i DST, we first approximate the DST with a cascade of plane rotations and then approximate these rotations with lifting steps.
3.3.1 Approximation of DST with plane rotations
To the best of our knowledge an exact factorization of the DST into butterflies and/or plane rotations does not exist in the literature. We obtain an approximation of the DST by cascading multiple plane rotations using a modified version of the algorithm presented in [12].
Chen's algorithm [12] can be briefly summarized as follows. A transform consisting of cascaded plane rotations is formed iteratively, where in each iteration a pair of signal nodes is selected and the parameter (the rotation angle) of the plane rotation is computed. The pair of nodes and the rotation angle in each iteration are determined from the autocorrelation matrix of the input signal so that the coding gain of the transform after this plane rotation is maximized. In each iteration, the best rotation angle is determined from a closed-form expression for each possible pair of nodes, and then the pair with the best coding gain is selected. The autocorrelation matrix is updated before moving to the next iteration. The iterations stop when a desired level of coding gain is achieved or a desired number of rotations has been used.
The autocorrelation of the input signal, which is the intra prediction residual signal in this paper, has been obtained in previous research by modeling the image signal with a first-order Markov process with correlation coefficient $\rho$ [13, 14]. As $\rho \to 1$, the optimal transform for this autocorrelation is the odd type-3 DST [13, 11]. We use a fixed value of $\rho$ in the autocorrelation expression and utilize a modified version of Chen's algorithm to obtain the cascade of plane rotations shown in Figure 4, which approximates the 4-point DST with 4 rotations.
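The iterative procedure can be illustrated with a Jacobi-style greedy sketch. It is not the algorithm of [12] or the modified version used in this paper. Three things are assumptions of the sketch: the residual autocorrelation model $1 + \rho^{|i-j|} - \rho^{i} - \rho^{j}$, the value $\rho = 0.95$, and the use of the angle that zeroes the selected off-diagonal entry as the closed-form choice. For an orthogonal rotation, this angle minimizes the product of the two affected variances and therefore maximizes the coding gain for that pair.

```python
import math


def residual_autocorrelation(n, rho):
    """Assumed model: covariance of x(i) - x(0) for a unit-variance first-order Markov x."""
    return [[1 + rho ** abs(i - j) - rho ** i - rho ** j
             for j in range(1, n + 1)] for i in range(1, n + 1)]


def coding_gain(R):
    """Arithmetic mean over geometric mean of the diagonal (variance) entries."""
    diag = [R[k][k] for k in range(len(R))]
    arithmetic = sum(diag) / len(diag)
    geometric = math.exp(sum(math.log(d) for d in diag) / len(diag))
    return arithmetic / geometric


def apply_rotation(R, i, j, theta):
    """Return G R G^T, where G rotates coordinates i and j by theta."""
    n = len(R)
    c, s = math.cos(theta), math.sin(theta)
    G = [[float(a == b) for b in range(n)] for a in range(n)]
    G[i][i], G[i][j], G[j][i], G[j][j] = c, -s, s, c
    GR = [[sum(G[a][k] * R[k][b] for k in range(n)) for b in range(n)] for a in range(n)]
    return [[sum(GR[a][k] * G[b][k] for k in range(n)) for b in range(n)] for a in range(n)]


def greedy_rotations(R, num_rotations):
    """Pick, per iteration, the pair whose decorrelation raises the coding gain most."""
    steps, gains = [], [coding_gain(R)]
    n = len(R)
    pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
    for _ in range(num_rotations):
        # The gain improvement of a pair grows with its squared normalized correlation.
        i, j = max(pairs, key=lambda ab: R[ab[0]][ab[1]] ** 2 / (R[ab[0]][ab[0]] * R[ab[1]][ab[1]]))
        theta = 0.5 * math.atan2(-2.0 * R[i][j], R[i][i] - R[j][j])  # zeroes R[i][j]
        R = apply_rotation(R, i, j, theta)  # update the autocorrelation matrix
        steps.append((i, j, theta))
        gains.append(coding_gain(R))
    return steps, gains


steps, gains = greedy_rotations(residual_autocorrelation(4, 0.95), 4)
```

`steps` holds the selected node pairs and angles, and `gains` records the coding gain before the first rotation and after each of the four rotations.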
3.3.2 Scaled i2i DST approximation
Each plane rotation in Figure 4 can be decomposed into two lifting steps and two scaling factors, as discussed in Section 3.1. Since we are interested in lossless coding, the two scaling factors are omitted and only lifting steps are used to represent each plane rotation, giving the lifting implementation of a scaled DST shown in Figure 5. Note that omitting the scaling factors does not change the biorthogonal coding gain given in [5], but the rotation may become non-orthogonal. Finally, the lifting factors $p$ and $u$ are quantized to 3 bits for easy implementation of the multiplications with addition and bit-shift operations. Table 1 lists the obtained lifting factors.
Table 1: Lifting factors of the i2i DST approximation in Figure 5.
5/8  4/8  3/8  2/8  7/8  3/8  5/8  4/8
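Quantizing a lifting factor to 3 bits amounts to rounding it to the nearest multiple of 1/8. The sketch below assumes the two-lifting-step factorization with $p = -\tan\alpha$ and $u = \sin\alpha\cos\alpha$. The angle $\pi/8$ is a hypothetical example and is not one of the rotation angles of Figure 4.

```python
import math


def quantize(value, bits=3):
    """Return the integer k such that k / 2**bits is the nearest dyadic approximation."""
    return math.floor(value * (1 << bits) + 0.5)


def quantized_lifting_pair(alpha, bits=3):
    """Numerators (k_p, k_u) of the quantized lifting factors for rotation angle alpha."""
    p = -math.tan(alpha)
    u = math.sin(alpha) * math.cos(alpha)
    return quantize(p, bits), quantize(u, bits)


k_p, k_u = quantized_lifting_pair(math.pi / 8)  # lifting factors k_p / 8 and k_u / 8
```

For this example angle the factors become $-3/8$ and $3/8$, approximating $-0.414$ and $0.354$.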

3.4 I2i DCT and DST within HEVC
This paper uses the i2i DCT approximation shown in Figure 3 and the i2i DST approximation shown in Figure 5 to explore i2i transforms in lossless coding within HEVC. The lifting factors $p$ and $u$ are quantized to 3 bits for easy implementation with addition and bit-shift operations. The lifting factors in the i2i DCT (Figure 3) are taken from [5]. The lifting factors in the i2i DST approximation (Figure 5) are given in Table 1.
The obtained i2i transforms are applied first along the horizontal and then along the vertical direction to obtain the i2i 2D DCT and the i2i 2D DST. These i2i 2D transforms are used in lossless compression to transform intra prediction residuals of luma and chroma pictures in 4x4 transform units (TUs) only. The transform coefficients are fed directly to the entropy coder without quantization. In larger TUs, the default HEVC processing is used. Notice that in lossless coding, the encoder chooses 4x4 TUs much more frequently than other TUs. Exploring i2i transforms in larger TUs is left for future work.
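Because the rounding inside the lifting steps is nonlinear, the 2D inverse has to undo the vertical pass before the horizontal pass. The sketch below shows this ordering with a toy 4-point i2i transform built from simple lifting steps. The toy transform is a placeholder and is neither the i2i DCT of Figure 3 nor the i2i DST of Figure 5, and all function names are hypothetical.

```python
def toy_forward(v):
    """Placeholder 4-point i2i transform made of lifting steps (not Figure 3 or 5)."""
    a, b, c, d = v
    d -= a
    a += d >> 1
    c -= b
    b += c >> 1
    b -= a
    a += b >> 1
    return [a, b, c, d]


def toy_inverse(v):
    """Undo the lifting steps of toy_forward in reverse order."""
    a, b, c, d = v
    a -= b >> 1
    b += a
    b -= c >> 1
    c += b
    a -= d >> 1
    d += a
    return [a, b, c, d]


def transform_2d(block, forward_1d):
    """Apply the 1D transform to every row, then to every column."""
    rows = [forward_1d(row) for row in block]
    cols = [forward_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def inverse_2d(coeffs, inverse_1d):
    """Invert the columns first, then the rows, mirroring the forward order."""
    cols = [inverse_1d(list(col)) for col in zip(*coeffs)]
    rows = [list(row) for row in zip(*cols)]
    return [inverse_1d(row) for row in rows]


# Illustrative 4x4 residual block (made-up values).
residual = [[3, -1, 4, 1], [-5, 9, 2, -6], [5, 3, -5, 8], [9, -7, 9, 3]]
coeffs = transform_2d(residual, toy_forward)
```

The coefficients in `coeffs` are integers and would go to the entropy coder as they are; `inverse_2d(coeffs, toy_inverse)` returns the original residual exactly.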
4 Experimental Results
The i2i transforms are implemented in the HEVC version 2 Range Extensions (RExt) reference software (HM-15.0+RExt-8.1) [15] to provide experimental results for the i2i transform approach. The following systems are derived from the reference software and compared in terms of lossless compression performance and complexity:
- HEVCv1
- HEVCv2
- i2iDCT
- i2iDCT+RDPCM
- i2iDST
- i2iDST+RDPCM
The HEVCv1 system represents HEVC version 1, which just skips transform and quantization for lossless coding, as discussed in Section 1. The HEVCv2 system represents HEVC version 2, and includes all available RExt tools, such as RDPCM, reversing the scan order, a dedicated context model for the significance map and other tools [6, 7] as discussed in Sections 2.1 and 2.2.
The remaining four systems employ the i2i transforms discussed in Section 3.4. The i2i transforms are used in intra-coded blocks in 4x4 transform units (TUs) only. In larger TUs, the default processing of HEVC version 2 is used.
In the i2iDCT and i2iDST systems, the RDPCM system of the reference software is disabled in 4x4 TUs, and is replaced with the 4x4 i2i 2D DCT and DST transform, respectively. In intra coding, these i2i transforms are applied to the residual TU of all intra prediction modes.
In the i2iDCT+RDPCM and i2iDST+RDPCM systems, the i2i transform and RDPCM methods are combined in intra coding. In other words, in intra coding of 4x4 TUs, the RDPCM method of HEVC version 2 is used if the intra prediction mode is horizontal or vertical, and the i2i 2D DCT or DST transform is used for other intra prediction modes.
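The selection logic of the four i2i systems can be summarized with a small dispatch sketch. The function name and the returned labels are hypothetical; the mode indices 10 and 26 are the HEVC intra mode numbers of the pure horizontal and pure vertical directions.

```python
HORIZONTAL_MODE = 10  # HEVC intra prediction mode index for pure horizontal
VERTICAL_MODE = 26    # HEVC intra prediction mode index for pure vertical


def residual_tool(system, tu_size, intra_mode):
    """Return which residual processing a system applies to an intra-coded TU."""
    if tu_size != 4:
        return "hevcv2_default"  # larger TUs keep the default HEVC version 2 processing
    base, _, suffix = system.partition("+")
    if suffix == "RDPCM" and intra_mode in (HORIZONTAL_MODE, VERTICAL_MODE):
        return "rdpcm"
    return {"i2iDCT": "i2i_dct_2d", "i2iDST": "i2i_dst_2d"}[base]
```

For example, `residual_tool("i2iDST+RDPCM", 4, 10)` selects RDPCM, while the same system with an angular mode such as 18 selects the i2i 2D DST.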
For the experimental results, the common test conditions in [16] are followed, except that only the first 32 frames are coded due to our limited computational resources. All results are shown in Table 2, which includes the average percentage (%) bitrate reduction and the encoding/decoding time of all systems with respect to the HEVCv1 system for the All-Intra-Main settings.
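Table 2: Average bitrate reduction (%) and encoding/decoding time of all systems with respect to the HEVCv1 system (All-Intra-Main).
          HEVCv2  i2iDCT  i2iDCT+RDPCM  i2iDST  i2iDST+RDPCM
Class A   7.2     9.9     10.9          11.1    11.6
Class B   4.7     4.3     5.0           5.4     5.8
Class C   5.4     3.8     5.1           5.3     6.1
Class D   7.6     6.4     8.1           7.6     8.7
Class E   8.2     8.3     9.7           9.0     10.2
Average   6.4     6.3     7.5           7.5     8.3
Enc. T.
Dec. T.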
The HEVCv2, i2iDCT, i2iDCT+RDPCM, i2iDST and i2iDST+RDPCM systems achieve 6.4, 6.3, 7.5, 7.5 and 8.3 percent overall average bitrate reduction over HEVCv1, respectively. For Class A, which includes the sequences with the largest resolution (2560x1600), systems employing i2i transforms achieve significantly larger bitrate reductions than the HEVCv2 system. For the other classes, systems employing i2i transforms are typically slightly better than HEVCv2.
Notice also that if the systems employing i2i transforms are compared with each other, the systems employing the DST achieve on average around 1% larger bitrate reduction than the systems employing the DCT. This result is similar to the results comparing the DCT and DST in lossy intra coding [1].
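The DST versus DCT comparison can be recomputed from the "Average" row of Table 2. The snippet below only restates values from the table; the variable names are illustrative.

```python
# Overall average bitrate reduction (%) over HEVCv1, copied from Table 2.
average_reduction = {
    "HEVCv2": 6.4,
    "i2iDCT": 6.3,
    "i2iDCT+RDPCM": 7.5,
    "i2iDST": 7.5,
    "i2iDST+RDPCM": 8.3,
}

# Gap between DST-based and DCT-based systems, without and with RDPCM.
gap_plain = average_reduction["i2iDST"] - average_reduction["i2iDCT"]
gap_rdpcm = average_reduction["i2iDST+RDPCM"] - average_reduction["i2iDCT+RDPCM"]
mean_gap = (gap_plain + gap_rdpcm) / 2
```

The two gaps are about 1.2 and 0.8 percentage points, giving a mean of about 1.0.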
5 Conclusions
This paper explored an alternative approach for lossless coding based on integer-to-integer (i2i) transforms within HEVC. I2i transforms map integers to integers without increasing the dynamic range at the output and were used in this paper to transform intra prediction residuals of luma and chroma pictures in 4x4 transform units (TUs) only. An i2i DCT from the literature and a novel i2i approximation of the DST were explored. Experimental results showed improved performance with respect to other major methods, such as RDPCM, in terms of both compression performance and complexity.
References
 [1] G.J. Sullivan, J. Ohm, Woo-Jin Han, and T. Wiegand, “Overview of the High Efficiency Video Coding (HEVC) standard,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 22, no. 12, pp. 1649–1668, Dec 2012.
 [2] T. Wiegand, G.J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 13, no. 7, pp. 560–576, July 2003.
 [3] Yung-Lyul Lee, Ki-Hun Han, and G.J. Sullivan, “Improved lossless intra coding for H.264/MPEG-4 AVC,” Image Processing, IEEE Transactions on, vol. 15, no. 9, pp. 2610–2615, Sept 2006.
 [4] Sung-Wook Hong, Jae Hee Kwak, and Yung-Lyul Lee, “Cross residual transform for lossless intra-coding for HEVC,” Signal Processing: Image Communication, vol. 28, no. 10, pp. 1335–1341, 2013.
 [5] Jie Liang and T.D. Tran, “Fast multiplierless approximations of the DCT with the lifting scheme,” Signal Processing, IEEE Transactions on, vol. 49, no. 12, pp. 3032–3044, Dec 2001.
 [6] D. Flynn, D. Marpe, M. Naccari, Tung Nguyen, C. Rosewarne, K. Sharman, J. Sole, and Jizheng Xu, “Overview of the range extensions for the HEVC standard: Tools, profiles, and performance,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 26, no. 1, pp. 4–19, Jan 2016.
 [7] J. Sole, R. Joshi, and M. Karczewicz, “RCE2 test B.1: Residue rotation and significance map context,” JCTVC-N0044, Vienna, Austria, pp. 1–10, July 2013.
 [8] M. Budagavi, A. Fuldseth, G. Bjontegaard, V. Sze, and M. Sadafale, “Core transform design in the High Efficiency Video Coding (HEVC) standard,” Selected Topics in Signal Processing, IEEE Journal of, vol. 7, no. 6, pp. 1029–1041, Dec 2013.
 [9] Wen-Hsiung Chen, C. Smith, and S. Fralick, “A fast computational algorithm for the discrete cosine transform,” Communications, IEEE Transactions on, vol. 25, no. 9, pp. 1004–1009, Sep 1977.
 [10] C. Loeffler, A. Ligtenberg, and George S. Moschytz, “Practical fast 1-D DCT algorithms with 11 multiplications,” in Acoustics, Speech, and Signal Processing, 1989. ICASSP-89., 1989 International Conference on, May 1989, pp. 988–991 vol. 2.
 [11] Chuohao Yeo, Yih Han Tan, Zhengguo Li, and S. Rahardja, “Mode-dependent transforms for coding directional intra prediction residuals,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 22, no. 4, pp. 545–554, April 2012.
 [12] Haoming Chen and Bing Zeng, “New transforms tightly bounded by DCT and KLT,” Signal Processing Letters, IEEE, vol. 19, no. 6, pp. 344–347, June 2012.
 [13] Chuohao Yeo, Yih Han Tan, Zhengguo Li, and S. Rahardja, “Mode-dependent fast separable KLT for block-based intra coding,” in Circuits and Systems (ISCAS), 2011 IEEE International Symposium on, May 2011, pp. 621–624.
 [14] F. Kamisli, “Block-based spatial prediction and transforms based on 2D Markov processes for image and video compression,” Image Processing, IEEE Transactions on, vol. 24, no. 4, pp. 1247–1260, April 2015.
 [15] “HM reference software (HM-15.0+RExt-8.1),” https://hevc.hhi.fraunhofer.de/trac/hevc/browser/tags/HM15.0+RExt8.1, Accessed: 2016-01-01.
 [16] Frank Bossen, “Common test conditions and software reference configurations,” Joint Collaborative Team on Video Coding (JCT-VC), JCTVC-F900, 2011.