# Efficient Computation of the 8-point DCT via Summation by Parts

This paper introduces a new fast algorithm for the 8-point discrete cosine transform (DCT) based on the summation-by-parts formula. The proposed method converts the DCT matrix into an alternative transformation matrix that can be decomposed into sparse matrices of low multiplicative complexity. The method is capable of both scaled and exact DCT computation, and its associated fast algorithm achieves the theoretical minimal multiplicative complexity for the 8-point DCT. Depending on the nature of the input signal, simplifications can be introduced and the overall complexity of the proposed algorithm can be further reduced. Several types of input signal are analyzed: arbitrary, null mean, accumulated, and null mean/accumulated signals. The proposed tool has potential applications in harmonic detection, image enhancement, and feature extraction, where the input signal DC level is discarded and/or the signal is required to be integrated.

Keywords

DCT, Fast Algorithms, Image Processing

## 1 Introduction

Discrete transforms play a central role in signal processing. Noteworthy methods include trigonometric transforms, such as the discrete Fourier transform (DFT) [Oppenheim2009], discrete Hartley transform (DHT) [Oppenheim2009], discrete cosine transform (DCT) [Britanak2007], and discrete sine transform (DST) [Britanak2007], as well as the Haar and Walsh-Hadamard transforms [Ahmed1975]. Among these methods, the DCT has been applied in several practical contexts, such as noise reduction [Gupta2012], watermarking [An2009], image/video compression [Britanak2007], and harmonic detection [Britanak2007]. In fact, for signals modeled as a stationary first-order Markov (Markov-1) random process, the DCT is the asymptotic case of the optimal Karhunen–Loève transform in terms of data decorrelation [Britanak2007]. This approximation holds when the correlation coefficient of the underlying stochastic process tends to unity, which is the case for many real signals, especially images [Britanak2007]. Moreover, the recent growth in image/video processing demand for consumer electronics [Chien2013] and big data manipulation [Wigan2013] underscores the need for fast and efficient DCT computation [Ji2002].

As a consequence, the 8-point DCT is adopted in several image and video coding schemes [Bhaskaran1995], such as JPEG [Wallace1992], MPEG-1 [Wallace1992], H.264 [Wiegand2003], HEVC [Pourazad2012], AVS China [Rao2014, p. 61], and VP-10 [Rao2014, p. 165]. Aiming at minimizing the computational cost of DCT evaluation, a number of fast algorithms for the 8-point DCT have been proposed, including Chen’s DCT algorithm [Wen-HsiengChen2003], Lee’s method [Lee1984], Loeffler’s algorithm [Loeffler1989], the Feig-Winograd DCT factorization [Feig1992], and the Arai DCT [Arai1988].

Multiplications, as required by the DCT and other discrete-time transforms, can be implemented via long sequences of additions, bit-shifting operations, and sign changes [Hamming1989]. Thus, algorithms that require multiplications often have higher computational costs [Blahut2010]. Therefore, the above-mentioned methods were developed in order to reduce the overall number of multiplications [Britanak2007]. The Arai DCT is particularly useful because it furnishes a scaled version of the DCT spectrum. In some applications, such as harmonic detection [Limin2007, Zheng2010] and JPEG-like image compression [Wallace1992, Bhaskaran1995], the scaled DCT is often sufficient, because in these contexts only the relative values of the spectrum are needed. Therefore, part of the cost of computing the DCT can be avoided [Britanak2007].
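To illustrate how a multiplication by a fixed constant can be replaced by shifts, additions, and sign changes, the short Python sketch below rewrites two integer products. The constants 13 and 7 and the function names `times_13` and `times_7` are hypothetical examples chosen for illustration; they are not DCT coefficients.

```python
# Hypothetical helpers illustrating shift-and-add multiplication by constants.

def times_13(x: int) -> int:
    # 13 = 8 + 4 + 1, so 13*x = (x << 3) + (x << 2) + x: two shifts, two additions
    return (x << 3) + (x << 2) + x


def times_7(x: int) -> int:
    # 7 = 8 - 1, so 7*x = (x << 3) - x: one shift and one subtraction
    # (a subtraction is a sign change followed by an addition)
    return (x << 3) - x


# Python integers shift arithmetically, so negative inputs also work.
for value in (-5, 0, 3, 100):
    assert times_13(value) == 13 * value
    assert times_7(value) == 7 * value
```

Writing 7 as 8 − 1 instead of 4 + 2 + 1 saves one adder; choosing such signed-digit decompositions is one common way to lower the cost of constant multipliers.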

Among the fundamental mathematical tools, we single out the summation-by-parts technique [Graham1989, p. 54], which is the discrete-time counterpart of the well-known integration-by-parts method [Apostol1981, p. 144]. Although it has been applied in several contexts, such as approximating second derivatives in computational physics [Mattssona2004], approximating the linear advection-diffusion equation in computational fluid dynamics [Mattsson2003], and rapidly computing slowly converging series in electromagnetic problems [Mosig2002], it has been largely overlooked by the signal processing community. Early attempts to employ it as a numerical analysis tool are due to Boudreaux-Bartels and collaborators, in the context of DFT computation [Boudreaux-Bartels1987] and of error evaluation in Fourier coefficient calculations [Boudreaux-Bartels1989].

The aim of this paper is to propose a new fast algorithm for the 8-point DCT based on the summation-by-parts formula for periodic signals [Graham1989, cintra2012soma]. The introduced method is designed to achieve the theoretical minimal multiplicative complexity for exact DCT computation [Duhamel1987, heideman1988multiplicative]. Moreover, to further minimize computational costs, the proposed algorithm is also designed to provide a scaled version of the DCT spectrum [Britanak2007]. The proposed algorithm finds application in important problems such as feature detection, where the DC level may not be relevant [Jain1989, Wang2002]. It can also be applied to scenarios where the input signal is natively accumulated (integrated) [Oppenheim2009, p. 19]. This situation occurs in face recognition problems, where usual algorithms require the data to be integrated [Elboher2012, Viola2001].

This paper is organized as follows. In Section 2, we furnish the mathematical background for the summation-by-parts technique and the DCT. Using matrix formalism, we detail the proposed algorithm for the DCT in Section 3. In Section 4, the introduced method is assessed in terms of its computational complexity and compared with competing algorithms. Section 5 brings final comments and remarks.

## 2 Mathematical Background

### 2.1 Summation-by-parts

The summation-by-parts technique is the discrete-time equivalent of the integration-by-parts method [Graham1989]. Let $u[n]$ and $v[n]$ be two discrete-time signals. Summation by parts prescribes that [Graham1989, Boudreaux-Bartels1987]:

$$\sum_{n=a}^{b-1} u[n]\,\Delta v[n] = u[b]\,v[b] - u[a]\,v[a] - \sum_{n=a}^{b-1} v[n+1]\,\Delta u[n],$$

where $\Delta$ denotes the forward difference operator given by $\Delta u[n] = u[n+1] - u[n]$ [Graham1989]. The above expression simplifies under a mild additional assumption. Assuming that the considered signals are periodic with period $N$, it was established in [cintra2012soma] that:

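$$\sum_{n=0}^{N-1} u[n]\,v[n] = v[0]\sum_{n=0}^{N-1} u[n] - \sum_{n=0}^{N-1} \left( \sum_{m=0}^{n} u[m] \right) \Delta v[n]. \tag{1}$$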

The periodicity assumption is not restrictive: discrete-time Fourier analysis often assumes that the input signals are periodic [Oppenheim2009, Britanak2007, Gonzalez2001]. In particular, the DCT can be obtained as the solution to the harmonic oscillation problem [Britanak2007].
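The periodic summation-by-parts identity, in the form $\sum_n u[n]\,v[n] = v[0]\sum_n u[n] - \sum_n \big(\sum_{m \le n} u[m]\big)\,\Delta v[n]$ with $n = 0, \ldots, N-1$, can be checked numerically. The Python sketch below uses a hypothetical helper, `periodic_sbp_sides`, that evaluates both sides for integer sequences. Periodicity of $v$ enters only through the last forward difference, $\Delta v[N-1] = v[0] - v[N-1]$.

```python
import random
from itertools import accumulate

# Hypothetical helper: a numerical check of the periodic summation-by-parts identity.


def periodic_sbp_sides(u, v):
    """Return (left side, right side) of the identity for length-N sequences.

    v is treated as N-periodic, so v[N] is replaced by v[0].
    """
    N = len(u)
    u_acc = list(accumulate(u))      # u_acc[n] = u[0] + ... + u[n]
    v_ext = list(v) + [v[0]]         # periodic extension: v[N] = v[0]
    lhs = sum(u[n] * v[n] for n in range(N))
    rhs = v[0] * sum(u) - sum(u_acc[n] * (v_ext[n + 1] - v_ext[n]) for n in range(N))
    return lhs, rhs


random.seed(1)
u = [random.randint(-9, 9) for _ in range(8)]
v = [random.randint(-9, 9) for _ in range(8)]
lhs, rhs = periodic_sbp_sides(u, v)
assert lhs == rhs  # integer inputs, so the two sides match exactly
```

With integer inputs the two sides agree exactly, so any mismatch would point to a wrong formula rather than rounding.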

The left-hand side of (1) can be interpreted as a discrete-time transformation. Let $x[n]$, $n = 0, 1, \ldots, N-1$, be the input signal to be transformed and let $\phi_k[n]$ be a given discrete transformation kernel for the $k$th transform-domain component. Then:

$$X[k] = \sum_{n=0}^{N-1} x[n]\,\phi_k[n], \quad k = 0, 1, \ldots, N-1, \tag{2}$$

where $X[k]$ is the transformed output signal. Table 1 summarizes common transformation kernels. Applying (1) to (2) yields the following expression for the transform-domain components:

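$$X[k] = \phi_k[0]\,\tilde{x}[N-1] - \sum_{n=0}^{N-1} \tilde{x}[n]\,\Delta\phi_k[n], \quad k = 0, 1, \ldots, N-1, \tag{3}$$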

where $\tilde{x}[n] = \sum_{m=0}^{n} x[m]$, for $n = 0, 1, \ldots, N-1$, and $\phi_k[N] = \phi_k[0]$ by periodicity. Comparing (2) with (3), we notice that the original transform expression has been rewritten into an alternative form in which both the input data and the kernel function are processed. Notice that $\tilde{x}[n]$ is the output of an accumulator system for input signal $x[n]$ [Oppenheim2009], whereas $\Delta\phi_k[n]$ is the output of a forward difference system for input signal $\phi_k[n]$ [Oppenheim2009]. Although the forward difference system is not causal, this poses no difficulty for the above formalism, because $\phi_k[n]$ is not a random real-time sequence but a deterministic sequence whose values are known a priori [Hamming1989, p. 7].
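The rewritten transform can be sketched directly: accumulate the input, take forward differences of the kernel, and add the boundary term $\phi_k[0]\,\tilde{x}[N-1]$. The Python sketch below compares this route with the direct sum $X[k] = \sum_n x[n]\,\phi_k[n]$. It uses the DFT kernel because that kernel is exactly $N$-periodic in $n$, as the derivation assumes. The helper names `dft_kernel`, `transform_direct`, and `transform_sbp` are hypothetical, and the sketch is a plain $O(N^2)$ check, not a fast algorithm.

```python
import cmath
from itertools import accumulate

# Hypothetical helpers: direct transform versus the summation-by-parts form.


def dft_kernel(k, n, N):
    """DFT kernel exp(-j 2 pi k n / N), which is exactly N-periodic in n."""
    return cmath.exp(-2j * cmath.pi * k * n / N)


def transform_direct(x, kernel):
    """Direct evaluation: X[k] = sum_n x[n] * phi_k[n]."""
    N = len(x)
    return [sum(x[n] * kernel(k, n, N) for n in range(N)) for k in range(N)]


def transform_sbp(x, kernel):
    """Summation-by-parts evaluation: accumulate x, difference the kernel."""
    N = len(x)
    x_acc = list(accumulate(x))  # accumulator output; x_acc[N-1] is the sum of x
    X = []
    for k in range(N):
        # forward difference of the kernel; kernel(k, N, N) equals kernel(k, 0, N) up to rounding
        d_phi = [kernel(k, n + 1, N) - kernel(k, n, N) for n in range(N)]
        boundary = kernel(k, 0, N) * x_acc[N - 1]
        X.append(boundary - sum(x_acc[n] * d_phi[n] for n in range(N)))
    return X


x = [1.0, 4.0, -2.0, 3.0, 0.5, -1.0, 2.0, 7.0]
direct = transform_direct(x, dft_kernel)
via_sbp = transform_sbp(x, dft_kernel)
assert all(abs(a - b) < 1e-9 for a, b in zip(direct, via_sbp))
```

Because $\Delta\phi_k[n]$ depends only on the kernel, it could be tabulated once. Only the accumulator acts on the data.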

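Table 1: Common transformation kernels

| Transform | Kernel $\phi_k[n]$ |
|---|---|
| DFT [Oppenheim2009] | $e^{-j2\pi kn/N}$ |
| DHT [Oppenheim2009] | $\cos(2\pi kn/N) + \sin(2\pi kn/N)$ |
| DCT [Britanak2007] | $\sqrt{2/N}\,\alpha_k\cos\left(\frac{\pi(2n+1)k}{2N}\right)$ |
| DST [Britanak2007] | |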

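Moreover, if $x[n]$ has null mean, then the following expression holds true:

$$\tilde{x}[N-1] = \sum_{n=0}^{N-1} x[n] = 0.$$

For trigonometric transforms, the above condition implies $X[0] = 0$ (null DC value). In addition, both the first term of (3) and the $n = N-1$ summand vanish, because they are proportional to $\tilde{x}[N-1]$. Therefore, (3) can be simplified and written as: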

$$X[k] = -\sum_{n=0}^{N-2} \tilde{x}[n]\,\Delta\phi_k[n], \quad k = 1, 2, \ldots, N-1. \tag{4}$$

The above summation ranges from $n = 0$ to $n = N-2$, and only the components $k = 1, 2, \ldots, N-1$ remain to be computed. This means that the transformation matrix linked to (4) has dimension $(N-1) \times (N-1)$, in contrast with the original transformation matrix, which has size $N \times N$. Thus, summation by parts reduces the dimension of the transform computation. As a consequence, the computational cost of the associated algorithms is expected to be reduced.
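The dimension reduction can be sketched numerically for the DCT kernel, $\sqrt{2/N}\,\alpha_k\cos\big(\pi(2n+1)k/(2N)\big)$. The Python sketch below builds an $(N-1) \times (N-1)$ matrix whose $(k-1, n)$ entry is $-\Delta\phi_k[n]$ and applies it to the first $N-1$ accumulated samples of a null-mean input. Since $n+1 \le N-1$ throughout, only kernel values $\phi_k[0], \ldots, \phi_k[N-1]$ are used. The helper names and the test vector are hypothetical, chosen for illustration.

```python
import math
from itertools import accumulate

# Hypothetical helpers: the reduced (N-1) x (N-1) transform for null-mean inputs.


def dct_kernel(k, n, N):
    """Orthonormal DCT kernel sqrt(2/N) * alpha_k * cos(pi (2n+1) k / (2N))."""
    alpha = 1 / math.sqrt(2) if k == 0 else 1.0
    return math.sqrt(2 / N) * alpha * math.cos(math.pi * (2 * n + 1) * k / (2 * N))


def reduced_matrix(N, kernel):
    """Row k-1, column n holds -(phi_k[n+1] - phi_k[n]), for k = 1..N-1, n = 0..N-2."""
    return [[-(kernel(k, n + 1, N) - kernel(k, n, N)) for n in range(N - 1)]
            for k in range(1, N)]


def transform_null_mean(x, kernel):
    """X[1..N-1] of a null-mean input from its first N-1 accumulated samples."""
    N = len(x)
    x_acc = list(accumulate(x))[: N - 1]   # x_acc[N-1] = 0 is dropped
    M = reduced_matrix(N, kernel)
    return [sum(row[n] * x_acc[n] for n in range(N - 1)) for row in M]


x = [3.0, -1.0, 2.0, -4.0, 0.0, 1.0, -2.0, 1.0]   # sums to zero
N = len(x)
direct = [sum(x[n] * dct_kernel(k, n, N) for n in range(N)) for k in range(N)]
reduced = transform_null_mean(x, dct_kernel)
assert abs(direct[0]) < 1e-12                       # null DC value
assert all(abs(a - b) < 1e-9 for a, b in zip(direct[1:], reduced))
```

For $N = 8$ the reduced matrix is $7 \times 7$. This dense form only checks the identity; it does not factor the matrix into sparse stages.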

Figure 1 depicts the overall diagram for the transform computation based on the summation-by-parts formula when $x[n]$ is assumed to be an arbitrary signal. Notice that, if $N$ is a power of two, both the DC removal block and the accumulation system are multiplierless operations: the accumulator requires only additions, and the division by $N$ needed to compute the mean reduces to a bit shift.
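The following Python sketch gives one possible reading of this pipeline for an arbitrary input and the DCT kernel of (5). It is not claimed to reproduce the authors' diagram exactly, and the function names are hypothetical. The mean is removed, the resulting null-mean signal is accumulated, the reduced summation-by-parts transform gives $X[1], \ldots, X[N-1]$, and $X[0]$ is taken from the plain sum of the input. Removing the DC does not change $X[k]$ for $k \ge 1$, because each DCT kernel row with $k \ge 1$ sums to zero over $n$.

```python
import math
from itertools import accumulate

# Hypothetical sketch of a DC-removal + accumulation + reduced-transform pipeline.


def dct_kernel(k, n, N):
    """Orthonormal DCT kernel sqrt(2/N) * alpha_k * cos(pi (2n+1) k / (2N))."""
    alpha = 1 / math.sqrt(2) if k == 0 else 1.0
    return math.sqrt(2 / N) * alpha * math.cos(math.pi * (2 * n + 1) * k / (2 * N))


def dct_via_sbp(x):
    """DCT of an arbitrary input via DC removal, accumulation, and the reduced form."""
    N = len(x)
    mean = sum(x) / N                        # for N = 8 in fixed point, a 3-bit right shift
    x0 = [v - mean for v in x]               # DC removal: x0 has null mean
    x0_acc = list(accumulate(x0))[: N - 1]   # accumulator; the last value (zero) is dropped
    X = [sum(x) / math.sqrt(N)]              # X[0] = sqrt(2/N) * (1/sqrt(2)) * sum(x)
    for k in range(1, N):
        # rows k >= 1 of the DCT kernel sum to zero, so removing the DC leaves X[k] unchanged
        X.append(-sum((dct_kernel(k, n + 1, N) - dct_kernel(k, n, N)) * x0_acc[n]
                      for n in range(N - 1)))
    return X


x = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
N = len(x)
direct = [sum(x[n] * dct_kernel(k, n, N) for n in range(N)) for k in range(N)]
assert all(abs(a - b) < 1e-9 for a, b in zip(direct, dct_via_sbp(x)))
```

Floating point is used here for readability. A hardware version would keep the mean computation and the accumulator in fixed point.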

### 2.2 Discrete Cosine Transform

The DCT is a linear transformation that maps an $N$-point discrete-time signal $x[n]$ into another $N$-point discrete-time signal $X[k]$ according to the following relationship [Loeffler1989]:

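$$X[k] = \sqrt{\frac{2}{N}}\,\alpha_k \sum_{n=0}^{N-1} x[n] \cos\left(\frac{\pi(2n+1)k}{2N}\right), \tag{5}$$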

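where $k = 0, 1, \ldots, N-1$, $\alpha_0 = 1/\sqrt{2}$, and $\alpha_k = 1$, for $k = 1, 2, \ldots, N-1$. The above expression can be written compactly in matrix form. Indeed, arranging $x[n]$ and $X[k]$ as column vectors $\mathbf{x} = \begin{bmatrix} x[0] & x[1] & \cdots & x[N-1] \end{bmatrix}^\top$ and $\mathbf{X} = \begin{bmatrix} X[0] & X[1] & \cdots & X[N-1] \end{bmatrix}^\top$, we have that: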

$$\mathbf{X} = \mathbf{C}_N \cdot \mathbf{x}, \tag{6}$$

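where $\mathbf{C}_N$ is the DCT matrix, whose $(k,n)$-entry is given by $\sqrt{2/N}\,\alpha_k\cos\left(\frac{\pi(2n+1)k}{2N}\right)$, for $k, n = 0, 1, \ldots, N-1$. For $N = 8$, we have the following transformation matrix: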