Efficient Computation of the 8-point DCT via Summation by Parts

D. F. G. Coelho, R. J. Cintra, V. S. Dimitrov

D. F. G. Coelho is with the Department of Electrical and Computer Engineering, University of Calgary, Calgary, Canada. R. J. Cintra is with the Signal Processing Group, Departamento de Estatística, Universidade Federal de Pernambuco and the Department of Electrical and Computer Engineering, University of Calgary, Calgary, Canada. E-mail: rjdsc@de.ufpe.br. V. S. Dimitrov is with the Department of Electrical and Computer Engineering, University of Calgary, Calgary, Canada.

This paper introduces a new fast algorithm for the 8-point discrete cosine transform (DCT) based on the summation-by-parts formula. The proposed method converts the DCT matrix into an alternative transformation matrix that can be decomposed into sparse matrices of low multiplicative complexity. The method is capable of scaled and exact DCT computation, and its associated fast algorithm achieves the theoretical minimal multiplicative complexity for the 8-point DCT. Depending on the nature of the input signal, simplifications can be introduced and the overall complexity of the proposed algorithm can be further reduced. Several types of input signal are analyzed: arbitrary, null mean, accumulated, and null mean/accumulated signals. The proposed tool has potential application in harmonic detection, image enhancement, and feature extraction, where the input signal DC level is discarded and/or the signal is required to be integrated.


DCT, Fast Algorithms, Image Processing

1 Introduction

Discrete transforms play a central role in signal processing. Noteworthy methods include trigonometric transforms, such as the discrete Fourier transform (DFT) [Oppenheim2009], discrete Hartley transform (DHT) [Oppenheim2009], discrete cosine transform (DCT) [Britanak2007], and discrete sine transform (DST) [Britanak2007], as well as the Haar and Walsh-Hadamard transforms [Ahmed1975]. Among these methods, the DCT has been applied in several practical contexts: noise reduction [Gupta2012], watermarking methods [An2009], image/video compression techniques [Britanak2007], and harmonic detection [Britanak2007], to cite a few. In fact, when processing signals modeled as a stationary Markov-1 type random process, the DCT behaves as the asymptotic case of the optimal Karhunen–Loève transform in terms of data decorrelation [Britanak2007]. This approximation holds true when the correlation coefficient of the related stochastic process tends to unity, which is the case for many real signals, especially images [Britanak2007]. Moreover, the recent increase in image/video processing demand for consumer electronics [Chien2013] and big data manipulation [Wigan2013] emphasizes the necessity for fast and efficient DCT computation [Ji2002].

As a consequence, the 8-point DCT is adopted in several image and video coding schemes [Bhaskaran1995], such as JPEG [Wallace1992], MPEG-1 [Wallace1992], H.264 [Wiegand2003], HEVC [Pourazad2012], AVS China [Rao2014, p. 61], and VP-10 [Rao2014, p. 165]. Aiming at minimizing the computational cost of the DCT evaluation, a number of fast algorithms for the 8-point DCT have been proposed, including Chen's DCT algorithm [Wen-HsiengChen2003], the Lee method [Lee1984], the Loeffler algorithm [Loeffler1989], the Feig-Winograd DCT factorization [Feig1992], and the Arai DCT [Arai1988].

Multiplication operations as required by the DCT and other discrete-time transforms can be implemented via long sequences of additions, bit-shifting operations, and sign changes [Hamming1989]. Thus, algorithms that require multiplications often have higher computational costs [Blahut2010]. Therefore, the above-mentioned methods were developed in order to reduce the overall number of multiplications [Britanak2007]. The Arai DCT is particularly useful because it furnishes a scaled version of the DCT spectrum. In some applications such as harmonic detection [Limin2007, Zheng2010] and JPEG-like image compression [Wallace1992, Bhaskaran1995], the scaled DCT is often a sufficient tool. This is because in these contexts only the relative value of the spectrum is necessary. Therefore, part of the cost of computing the DCT can be avoided [Britanak2007].

Among the fundamental mathematical tools, we single out the summation-by-parts technique [Graham1989, p. 54], which is the discrete-time counterpart of the well-known integration-by-parts method [Apostol1981, p. 144]. Although applied in several contexts such as computational physics for approximate second derivatives [Mattssona2004], approximations of the linear advection-diffusion equation in computational fluid dynamics [Mattsson2003], and rapid calculation of slowly converging series in electromagnetic problems [Mosig2002], it has been largely overlooked by the signal processing community. Early attempts to employ it as a numerical analysis tool are due to Boudreaux-Bartels and collaborators in the context of the DFT computation [Boudreaux-Bartels1987] and the evaluation of errors in Fourier coefficient calculations [Boudreaux-Bartels1989].

The aim of this paper is to propose a new fast algorithm for the 8-point DCT computation based on the summation-by-parts formula for periodic signals [Graham1989, cintra2012soma]. The introduced method aims to achieve the theoretical minimal multiplicative complexity for the exact DCT computation [Duhamel1987, heideman1988multiplicative]. Moreover, to further minimize computational costs, the proposed algorithm is also designed to provide a scaled version of the DCT spectrum [Britanak2007]. The proposed algorithm finds application in some important problems, such as feature detection, where the DC level may not be relevant [Jain1989, Wang2002]. Also, it can be applied to scenarios where the input signal is natively accumulated (integrated) [Oppenheim2009, p. 19]. This situation occurs in face recognition problems, where usual algorithms require data to be integrated [Elboher2012, Viola2001].

This paper is organized as follows. In Section 2, we furnish the mathematical background for the summation-by-parts technique and the DCT. Considering matrix formalism, we then detail the proposed algorithm for the DCT. Next, the introduced method is assessed in terms of its computational complexity and comparisons with competing algorithms are shown. The last section brings final comments and remarks.

2 Mathematical Background

2.1 Summation-by-parts

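The summation-by-parts technique is the discrete-time equivalent of the integration-by-parts method [Graham1989]. Let $f[n]$ and $g[n]$ be two discrete-time signals. The summation-by-parts formula prescribes that [Graham1989, Boudreaux-Bartels1987]:

$$\sum_{n=0}^{N-1} f[n]\, g[n] = f[N] \sum_{n=0}^{N-1} g[n] - \sum_{n=0}^{N-1} \Delta f[n] \sum_{m=0}^{n} g[m],$$

where $\Delta$ denotes the forward difference operator given by $\Delta f[n] = f[n+1] - f[n]$ [Graham1989]. The above expression can be simplified with the assumption of the following additional weak conditions. Admitting that the considered signals are periodic with period $N$, so that $f[N] = f[0]$, it was established in [cintra2012soma] that:

$$\sum_{n=0}^{N-1} f[n]\, g[n] = f[0] \sum_{n=0}^{N-1} g[n] - \sum_{n=0}^{N-1} \Delta f[n] \sum_{m=0}^{n} g[m]. \tag{1}$$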

The above condition is not too restrictive. Indeed, discrete-time Fourier analysis often assumes that the input signals are periodic [Oppenheim2009, Britanak2007, Gonzalez2001]. In particular, the DCT can be obtained as the solution to the harmonic oscillation problem [Britanak2007].
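The summation-by-parts identity can be checked numerically. The following is only a minimal sketch: the function name `sbp_sum`, the test signals, and the form of the identity used (boundary term $f[N]\sum_n g[n]$ minus the sum of $\Delta f[n]$ times the running sum of $g$) are assumptions made for illustration, not the authors' implementation.

```python
import math

def sbp_sum(f, g, N, periodic=False):
    """Evaluate sum_n f[n] g[n] through summation by parts.

    f is a callable so that f(N) is defined; g is a list of length N.
    With periodic=True the boundary term uses f(0) instead of f(N),
    which is only valid when f has period N (assumption of the sketch).
    """
    running = 0.0      # running sum G[n] = g[0] + ... + g[n]
    correction = 0.0   # sum over n of (f[n+1] - f[n]) * G[n]
    for n in range(N):
        running += g[n]
        correction += (f(n + 1) - f(n)) * running
    boundary = f(0) if periodic else f(N)
    return boundary * running - correction

N = 8
g = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]           # arbitrary test data
f_generic = lambda n: 0.5 * n * n - n                      # not periodic
f_periodic = lambda n: math.cos(2 * math.pi * 3 * n / N)   # period N

def direct(f):
    """Plain evaluation of sum_n f[n] g[n] for comparison."""
    return sum(f(n) * g[n] for n in range(N))

print(abs(sbp_sum(f_generic, g, N) - direct(f_generic)) < 1e-9)
print(abs(sbp_sum(f_periodic, g, N, periodic=True) - direct(f_periodic)) < 1e-9)
```

Both comparisons agree to rounding error; the periodic shortcut would fail for `f_generic`, which is why the periodicity assumption matters.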

Expression (1) can be interpreted as a discrete-time transformation. Let $x[n]$ be the input signal to be transformed and $\ker[k,n]$ be a given discrete transformation kernel for the $k$th transform-domain component. Therefore, we have that:

$$X[k] = \sum_{n=0}^{N-1} x[n]\, \ker[k,n], \quad k = 0, 1, \ldots, N-1, \tag{2}$$

where $X[k]$ is the transformed output signal. Table 1 summarizes common transformation kernels. Therefore, applying (1) into (2) yields the following expression for the transform-domain components:

$$X[k] = \ker[k,0]\, y[N-1] - \sum_{n=0}^{N-1} \Delta\ker[k,n]\, y[n],$$

where $y[n] = \sum_{m=0}^{n} x[m]$, for $n = 0, 1, \ldots, N-1$, and the forward difference acts on the index $n$. Comparing (2) with the expression above, we notice that the original transform expression was rewritten into an alternative form where both the input data and the kernel function were processed. Notice that $y[n]$ is the output of an accumulator system for input signal $x[n]$ [Oppenheim2009], whereas $\Delta\ker[k,n]$ derives from a forward difference system for input signal $\ker[k,n]$ [Oppenheim2009]. Although the forward difference system is not causal, this fact poses no difficulty to the above formalism. This is because $\ker[k,n]$ is not a random real-time sequence but a deterministic sequence whose values are known a priori [Hamming1989, p. 7].
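The accumulate-then-difference reading of the transform can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, and the kernel $\cos(2\pi k n / N)$ (the real part of the DFT kernel) was chosen only because it is periodic in $n$ with period $N$, which the boundary term $\ker[k,0]\,y[N-1]$ relies on.

```python
import math

N = 8

def cos_kernel(k, n):
    """Assumed test kernel: periodic in n with period N."""
    return math.cos(2 * math.pi * k * n / N)

def accumulate(x):
    """Accumulator system: y[n] = x[0] + ... + x[n]."""
    y, running = [], 0.0
    for value in x:
        running += value
        y.append(running)
    return y

def transform_direct(x, ker):
    """Direct evaluation of X[k] = sum_n x[n] ker[k, n]."""
    return [sum(x[n] * ker(k, n) for n in range(N)) for k in range(N)]

def transform_sbp(x, ker):
    """Same transform computed from the accumulated input and the
    forward-differenced kernel (valid for a kernel with period N in n)."""
    y = accumulate(x)
    out = []
    for k in range(N):
        boundary = ker(k, 0) * y[N - 1]
        correction = sum((ker(k, n + 1) - ker(k, n)) * y[n] for n in range(N))
        out.append(boundary - correction)
    return out

x = [2.0, -1.0, 0.5, 3.0, -2.5, 1.0, 4.0, -0.5]
X_direct = transform_direct(x, cos_kernel)
X_sbp = transform_sbp(x, cos_kernel)
print(max(abs(a - b) for a, b in zip(X_direct, X_sbp)) < 1e-9)
```

Because the differenced kernel depends only on $k$ and $n$, it can be tabulated once, which is consistent with the remark that the kernel values are known a priori.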

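Transform             Kernel $\ker[k,n]$ (scaling factors omitted)
DFT [Oppenheim2009]   $e^{-j 2\pi k n / N}$
DHT [Oppenheim2009]   $\operatorname{cas}(2\pi k n / N) = \cos(2\pi k n / N) + \sin(2\pi k n / N)$
DCT [Britanak2007]    $\cos\left(\frac{(2n+1) k \pi}{2N}\right)$
DST [Britanak2007]    $\sin\left(\frac{(2n+1)(k+1) \pi}{2N}\right)$
Table 1: Common discrete transform kernels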

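Moreover, if $x[n]$ possesses null mean, then the following expression holds true:

$$\sum_{n=0}^{N-1} x[n] = 0.$$

For trigonometric transforms, the above condition implies $X[0] = 0$ (null DC value). Since the accumulated input $y[n] = \sum_{m=0}^{n} x[m]$ then satisfies $y[N-1] = 0$, the summation-by-parts expression for the transform-domain components can be simplified and written as:

$$X[k] = -\sum_{n=0}^{N-2} \Delta\ker[k,n]\, y[n], \quad k = 1, 2, \ldots, N-1.$$

The above summation ranges from $0$ to $N-2$. This means that the transformation matrix linked to the simplified expression has dimension $(N-1) \times (N-1)$. This fact contrasts with the original transformation matrix, which has size $N \times N$. Thus, the summation-by-parts effected a dimension reduction of the transform computation. As a consequence, the computational cost of the associated algorithms is expected to be reduced.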
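The dimension reduction for null-mean inputs can be illustrated with a short sketch. It assumes the unnormalized DCT kernel $\cos((2n+1)k\pi/(2N))$ with $N = 8$ and uses hypothetical helper names; it builds the $(N-1) \times (N-1)$ matrix of negated kernel differences and applies it to the accumulated input, comparing against the direct sum.

```python
import math

N = 8

def dct_kernel(k, n):
    """Unnormalized DCT kernel (scaling factors left out on purpose)."""
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))

def reduced_matrix():
    """(N-1) x (N-1) matrix: row k-1, column n holds
    -(ker[k, n+1] - ker[k, n]) for k = 1..N-1 and n = 0..N-2."""
    return [[-(dct_kernel(k, n + 1) - dct_kernel(k, n)) for n in range(N - 1)]
            for k in range(1, N)]

def dct_null_mean(x):
    """Components X[1..N-1] of a null-mean input via the reduced matrix."""
    y, running = [], 0.0
    for value in x[:-1]:        # y[N-1] is zero for null-mean input, so skip it
        running += value
        y.append(running)
    T = reduced_matrix()
    return [sum(T[r][n] * y[n] for n in range(N - 1)) for r in range(N - 1)]

x = [1.0, 2.0, 3.0, 4.0, -4.0, -3.0, -2.0, -1.0]   # sums to zero
reference = [sum(x[n] * dct_kernel(k, n) for n in range(N)) for k in range(1, N)]
reduced = dct_null_mean(x)
print(max(abs(a - b) for a, b in zip(reference, reduced)) < 1e-9)
```

Only seven accumulated samples and a 7-by-7 matrix are involved, against eight samples and an 8-by-8 matrix in the direct form.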

Figure 1 depicts the overall diagram for the transform computation based on the summation-by-parts formula, when $x[n]$ is assumed to be an arbitrary signal. Notice that, if $N$ is a power of two, both the DC removal block and the accumulation system are multiplierless operations.

Figure 1: Block diagram of the proposed architecture.
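One possible multiplierless realization of the DC removal and accumulation blocks for $N = 8$ is sketched below. It is an assumption of this sketch, not taken from the paper, that the block works on integers and outputs $8\,x[n] - \sum_m x[m]$, that is, the mean-removed signal scaled by $N$, so that the result stays exact; a design that divides the sum by $N$ with a right shift is equally conceivable. The function names are hypothetical.

```python
def remove_dc_scaled(x):
    """Return 8*x[n] - sum(x) for an 8-sample integer block.

    Only additions, subtractions, and a left shift by 3 are used.
    The output equals 8 * (x[n] - mean) and therefore sums to zero.
    """
    if len(x) != 8:
        raise ValueError("this sketch assumes blocks of 8 samples")
    total = sum(x)
    return [(value << 3) - total for value in x]

def accumulate(z):
    """Accumulator system built from additions only."""
    y, running = [], 0
    for value in z:
        running += value
        y.append(running)
    return y

x = [10, 12, 9, 11, 10, 13, 8, 7]
z = remove_dc_scaled(x)
y = accumulate(z)
print(z)   # [0, 16, -8, 8, 0, 24, -16, -24]
print(y)   # [0, 16, 8, 16, 16, 40, 24, 0]
```

The last accumulated sample is zero, which is the property that lets the transform stage drop one row and one column.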

2.2 Discrete Cosine Transform

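The DCT is a linear transformation that maps an $N$-point discrete-time signal $x[n]$ into another $N$-point discrete-time signal $X[k]$ according to the following relationship [Loeffler1989]:

$$X[k] = \sqrt{\frac{2}{N}}\, \alpha_k \sum_{n=0}^{N-1} x[n] \cos\left(\frac{(2n+1) k \pi}{2N}\right),$$

where $k = 0, 1, \ldots, N-1$, $\alpha_0 = 1/\sqrt{2}$, and $\alpha_k = 1$, for $k \neq 0$. The above expression can be given a compact format by means of matrix representation. Indeed, considering signals $x[n]$ and $X[k]$ in column vector format as $\mathbf{x} = \begin{bmatrix} x[0] & x[1] & \cdots & x[N-1] \end{bmatrix}^\top$ and $\mathbf{X} = \begin{bmatrix} X[0] & X[1] & \cdots & X[N-1] \end{bmatrix}^\top$, we have that:

$$\mathbf{X} = \mathbf{C}_N \cdot \mathbf{x},$$

where $\mathbf{C}_N$ is the DCT matrix, whose $(k,n)$-entry is given by $\sqrt{2/N}\, \alpha_k \cos\left(\frac{(2n+1) k \pi}{2N}\right)$. For $N = 8$, we have the following transformation matrix:

$$\mathbf{C}_8 = \frac{1}{2} \begin{bmatrix}
c_4 & c_4 & c_4 & c_4 & c_4 & c_4 & c_4 & c_4 \\
c_1 & c_3 & c_5 & c_7 & -c_7 & -c_5 & -c_3 & -c_1 \\
c_2 & c_6 & -c_6 & -c_2 & -c_2 & -c_6 & c_6 & c_2 \\
c_3 & -c_7 & -c_1 & -c_5 & c_5 & c_1 & c_7 & -c_3 \\
c_4 & -c_4 & -c_4 & c_4 & c_4 & -c_4 & -c_4 & c_4 \\
c_5 & -c_1 & c_7 & c_3 & -c_3 & -c_7 & c_1 & -c_5 \\
c_6 & -c_2 & c_2 & -c_6 & -c_6 & c_2 & -c_2 & c_6 \\
c_7 & -c_5 & c_3 & -c_1 & c_1 & -c_3 & c_5 & -c_7
\end{bmatrix},$$

where $c_i = \cos(i\pi/16)$.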
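The 8-point DCT matrix can also be generated and sanity-checked numerically. The sketch below assumes the orthonormal DCT-II convention, with scaling $\sqrt{2/N}$ and $\alpha_0 = 1/\sqrt{2}$; the paper's exact normalization may differ, and the function names are hypothetical.

```python
import math

def dct_matrix(N=8):
    """DCT matrix under the assumed orthonormal convention:
    entry (k, n) = sqrt(2/N) * alpha_k * cos((2n+1) k pi / (2N))."""
    C = []
    for k in range(N):
        alpha = 1 / math.sqrt(2) if k == 0 else 1.0
        C.append([math.sqrt(2 / N) * alpha
                  * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                  for n in range(N)])
    return C

def gram(C):
    """Compute C times its transpose; the identity indicates orthogonality."""
    N = len(C)
    return [[sum(C[i][n] * C[j][n] for n in range(N)) for j in range(N)]
            for i in range(N)]

C8 = dct_matrix(8)
G = gram(C8)
deviation = max(abs(G[i][j] - (1.0 if i == j else 0.0))
                for i in range(8) for j in range(8))
print(deviation < 1e-12)
```

Under this convention every entry of the first row equals $\cos(\pi/4)/2$, and the rows are mutually orthogonal with unit norm.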
