Fundamental Limits of Video Coding: A Closedform Characterization of Rate Distortion Region from First Principles
Abstract
Classical motioncompensated video coding methods have been standardized by MPEG over the years and video codecs have become integral parts of media entertainment applications. Despite the ubiquitous use of video coding techniques, it is interesting to note that a closed form ratedistortion characterization for video coding is not available in the literature. In this paper, we develop a simple, yet, fundamental characterization of ratedistortion region in video coding based on informationtheoretic first principles. The concept of conditional motion estimation is used to derive the closedform expression for ratedistortion region without losing its generality. Conditional motion estimation offers an elegant means to analyze the ratedistortion tradeoffs and demonstrates the viability of achieving the bounds derived. The concept involves classifying image regions into active and inactive based on the amount of motion activity. By appropriately modeling the residuals corresponding to active and inactive regions, a closed form expression for ratedistortion function is derived in terms of motion activity and spatiotemporal correlation that commonly exist in video content. Experiments on real video clips using H.264 codec are presented to demonstrate the practicality and validity of the proposed ratedistortion analysis.
I Introduction
How much compression is possible on a video clip? The analysis presented in this paper answers this question by deriving ratedistortion bounds for video coding from first principles. A classical video coding system consists of three main components [11](Chapter 8): (1) video analysis which includes motion estimation and compensation, (2) quantization, and (3) binary encoding as depicted in Fig. 1. The effectiveness of a video coding technique depends on the rate distortion tradeoffs that it offers. Often, an application such as a video streaming service might dictate the limit on maximum acceptable distortion. This limit, in turn, places a bound on the minimum amount of rate required to encode video. On the other hand, a communication technology may put a limit on the maximum data transfer rate that the system can handle. This limit, in turn, places a bound on the resulting video quality. Therefore, there is a need to investigate the tradeoffs between the bitrate (R) required to encode video and the resulting distortion (D) in the reconstructed video. Ratedistortion (RD) analysis deals with lossy video coding and it establishes a relationship between the two parameters by means of a Ratedistortion function [6, 10, 1]. Since RD analysis is based on informationtheoretic concepts, it places absolute bounds on achievable rates and thus derives its significance.
Ia Motivation
Motion estimation and compensation process is one strategy to remove the temporal correlation that naturally exists in video. This is accomplished in practical video coding standards such as MPEG4 and H.264 by means of blockbased motion estimation methods. While video coding is an integral part in every multimedia application that exists today, it is interesting to note that a closed form ratedistortion characterization for video coding is not available in the literature. Ratedistortion optimization (RDO) [32, 35, 33], which is based on Lagrangian formulation comes close to the RD analysis presented in this paper. While RDO is mathematically elegant and has been widely and effectively used in many practical coders including H.264, it doesn’t lend itself to closed form characterization of RD tradeoffs. The proposed RD analysis, on the other hand, is based on informationtheoretic concepts and thus establishes absolute bounds on RD tradeoffs. These bounds are based on measurable and quantifiable parameters such as motion activity and correlation in video. Hence, they are of significant practical value.
IB Main Contributions
In this paper, we develop a theoretical basis to investigate the RD tradeoffs in video coding. We characterize the RD region using the concept of conditional motion estimation. In this concept, motion estimation and compensation are performed only for active regions selected based on a motion activity criterion [3, 4]. Based on this concept, we devise a strategy to balance the bitbudget (rate) against the video quality. We derive a closedform relationship for video coding from the first principles and validate this relationship by experimenting with several video clips. The main contributions of this paper are outlined below.

First, a framework for conditional motion estimation and compensation that facilitates the derivation of RD formulation, is presented. In this framework, each frame is divided into blocks of standard size. The blocks are classified into active and inactive categories based on the magnitude of intensity change between consecutive frames. The residuals corresponding to active blocks are represented by displaced frame differences (DFDs). The DFDs are quantized and then encoded along with motion vectors. The residuals corresponding to inactive blocks are represented by frame difference (FDs). FDs are quantized and then encoded without motion vectors. We demonstrate that this process offers an excellent means to analyze the RD tradeoffs in video coding.

Second, the RD function associated with motion estimation and compensation is derived from first principles. Using conditional motion estimation doesn’t reduce the generality of the RD function derived. On the other hand, conditional motion estimation provides a means to derive the RD function. If motion estimation is perfect, then DFDs corresponding to active blocks will be uncorrelated [11] and can be modeled as white Gaussian process. While perfect estimation of motion is not practical, it is meaningful and leads to the derivation of an upper bound for RD tradeoff. On the other hand, the residuals corresponding to inactive pixels, i.e., FDs, exhibit spatiotemporal correlation and hence are modeled using GaussMarkov process. For simplification purposes, we use firstorder GaussMarkov model parametrized by the correlation coefficient , where the subscript indicates inactive blocks. Modeling of FDs using GaussMarkov process leads to the derivation of a lowerbound for RD tradeoff. Different videos exhibit different levels of motion activity, which we quantified using the parameter that varies between 0 to 1. We derive a closed form expression for the RD region in terms of and . This closed form expression is applicable to all classical motion compensated video coding schemes that exist today and is given by:
(1) where () and () are the variance and distortion pairs associated with the uncorrelated and correlated residual streams resulting from the motion estimation and compensation process. Details of the derivation are given in Section IV.

Third, the proposed RD formulation is validated by experiments on several video clips. In particular, we used H.264 codec to compress and reconstruct videos with different levels of motion activity. Results demonstrating the validity of the proposed model are presented on two specific video clips: one with low motion activity and one with high motion activity.
IC Organization
This paper is organized as follows. Section II presents a brief literature survey in RD theory and its applications. It also discusses the related work in conditional motion estimation. Section III explains the conditional motion estimation concept. The RD analysis for motion estimation based video encoding is discussed in section IV. Section V outlines experimental results, discussions, summary, and conclusions.
Ii Related Work
There is an extensive literature on RD theory and RD optimization methods for video encoding. Below, we provide a brief summary of research progress applicable to the issues discussed in this paper.
Iia RateDistortion Analysis
The foundation for RD theory was first formulated by Shannon in [6]. A survey of early contributions to RD theory are presented in [7]. RD analysis for scalable video coding is presented in [30, 31]. Closed form expressions have been derived in the literature for different types of sources, not necessarily for video. For example, the function for an independent and identically distributed (i.i.d.) Gaussian source is given by [10] (Ch. 13),
(2) 
where is the variance of the source. If the source samples are correlated as in the case of a typical video, this formulation is not applicable. A RD function for any source that can be modeled as an order GaussMarkov process is derived in [19].
IiB RateDistortion Analysis for Video Coding
One way to improve the efficiency of video coding is by exploiting the spatial and temporal correlations that exist in the video through the use of vector coding. In practice, vector coding is implemented using blockbased methods. Deriving an function corresponding to blockbased video coding is much more complex compared to that of scalar coding [34, 25]. While video encoders always make use of block based strategies, they are not easily amenable for RD analysis.
An alternate method for deriving the RD relation for video encoding is by modeling the process of motion estimation and compensation. One of the most widely used and effective rate distortion optimization (RDO) technique in video coder control is Lagrangian formulation [35, 33, 34], which is discussed below.
IiC RateDistortion Optimization
The Lagrangian formulation of the rate distortion optimization problem is given by,
(3) 
where the Lagrangian ratedistortion functional, , is minimized for a particular value of the Lagrangian multiplier [32, 35, 33].
In practical coders such as H.264, RDO is used in both motion estimation to find a motion vector and the subsequent mode decision to decide a suitable mode to encode the residual data. While the applicability of RDO formulation is demonstrated by practical coders, it doesn’t lend itself to a closed form characterization of the RD region which is important for theoretical analysis. This is the focus of our research work.
IiD Applications Beyond Coding
Applications of RD analysis extend beyond coding. A study of the RD function and its applicability in designing a practical communication system for video sources with bounded performance is discussed in [8]. Numerous other applications of RD theory beyond coding, communications and signal processing are extensively discussed in [9]. In many applications, a basic RD problem is formulated and solved using techniques such as Lagrangian. In pattern classification [2], for example, features belonging to different classes are assumed as outputs of a source and an equivalent data compression problem is designed. The R(D) function for such a data compression problem explains the tradeoffs between the number of features selected and the resulting error in classification.
IiE Paradigm Shift in Video Encoding
Conventional video coding techniques perform motion estimation on the sender side. Motion in video is represented using different methods including pixelbased representation, regionbased representation, block based representation, and meshbased representation etc [11]. Motion is estimated using a criterion such as DFD or Bayesian. A tutorial on estimating two dimensional motion is presented in [12].
The complexity of an encoder increases as the complexity of the motion estimation method increases. An encoder of this kind is not suitable to be used in resource constrained application such as wireless sensor networks. A better way of encoding in resource constrained situations is distributed source coding which is built on Slepian  Wolf coding, [13], WynerZiv coding [14] and channel coding principles. One of the coding techniques built on distributed source coding principles, known as PRISM, is described in [15]. The principles of distributed source coding [13] are extended to lossycompression in [16]. The RD analysis for WynerZiv video coding has been proposed in [17].
Ratedistortion in distributed systems has applicability in video surveillance networks. A ratedistortion function for distributed source (video) coding with correlated memoryless Gaussian sources in which L sources are assumed to provide partial side information at the decoder side to construct the source is proposed in [18].
Iii Conditional Motion Estimation
In this section, we briefly outline the process of conditional motion estimation. Conditional motion estimation process begins with subdivision of the image frames into blocks of equal size, and then, classification of these blocks into active or inactive classes. Activity is determined based on the difference in intensities corresponding to two consecutive frames. Then, blockbased motion estimation is performed for only active blocks.
Iiia Active and Inactive Blocks
The classification of the blocks into active and inactive blocks is based on two thresholds, one at pixel level and one at block level [3]. If a value in the difference image is greater than the threshold , then that pixel is classified as an active pixel, otherwise, it is classified as an inactive pixel [20]. The two thresholds need to be chosen adaptively based on the spatial and temporal correlations present in the video and the desirable level of RD tradeoff. In our previous work, we developed an online training strategy to adaptively select the thresholds using Bayesian criterion in [4, 21]. Fig. 2 depicts a block in a sample difference image and the active and inactive pixels within the block. The number of active pixels in every block is counted, and if this count is greater than , it is classified as an active block, else, it is classified as an inactive block.
The selection of the two thresholds and is an important task in conditional motion estimation process. The selection of is crucial since it directly decides if a pixel is active or not. It is known that in a frame there exits a correlation between intensities of adjacent groups of pixels.
The selection of also significantly impacts the performance of the proposed method. For example, if is increased, then the number of active blocks decreases, resulting in low bit rate and high distortion. On the other hand, if is decreased, then the number of active blocks increases, resulting in high bit rate and low distortion. In order to demonstrate this, an image is selected from a video sequence and the difference image (i.e., the difference between the current frame and previous frame) is found. The active blocks found in this frame using the difference image are displayed for two values of in Fig. 3. It can be observed that when is small, the number of active blocks is large and viceversa. In our approach, is kept constant for all the blocks in order to simplify the analysis.
IiiB Difference Image
Let be the anchor frame and be the target frame. If D() represents a difference image, then,
(4) 
where is a vector representing pixel locations. Every pixel in is compared to its corresponding and classified as an active pixel if it is greater or inactive if it is lesser. The number of active pixels in a block are then counted and if the count is greater than , then that block is classified as an active block else it is classified as an inactive block. Once the active blocks in a target frame are determined, the next step is to perform blockbased motion estimation for all those active blocks. We assume that the anchor frame is already available at the decoding side and encode the target frame using conditional motion estimation.
IiiC Blockbased Motion Estimation
Blockbased motion estimation is a motion compensated video technique used in various video coding standards including [22], MPEGX [23]. A blockbased motion estimation technique uses a blockmatching algorithm to test each block in the anchor frame with every block in the target frame to find the block that matches the most. The matching criteria is usually the mean square difference between the blocks compared. In our work, a fast blockmatching search algorithm called diamond search algorithm [24] is used to estimate motion vectors for all blocks in the anchor frame. A motion vector represents the displacements of a block along x and y directions.
Let be a motion vector whose parameters and represent the horizontal and vertical displacements that a block in an anchor frame undergoes to reach its position in the target frame. If every block is identified by the first pixel in it, then the set of motion vectors for the frame can be represented as [11]. In conditional motion estimation, we find motion vectors for active blocks only. The inactive blocks are assumed not to have moved and are represented with zero motion vectors. Once the motion vectors are found, the displaced frame difference, , which is the difference between the target frame and the motion compensated anchor frame, is generated. This can be written as,
(5) 
where is the motion compensated frame constructed from and motion vectors set . The displaced frame difference, thus evaluated, is scalar quantized to obtain .
Iv RateDistortion Analysis
A video encoder based on conditional motion estimation effectively transforms the video into an alternate representation consisting of three different outputs: (1) motion vectors, (2) quantized DFDs corresponding to active blocks, and (3) quantized FDs corresponding to inactive blocks. In many practical video coding, transformations such as Discrete Wavelet Transform (DWT) and Discrete Cosine Transform (DCT) are employed. They do not impact the theoretical RD analysis because of their orthogonality and energypreserving properties. Hence, they are not taken into account in our analysis. Similarly, while quantization strategies can be taken into account to further fine tune the RD bounds, they are not considered here to keep the RD analysis in its simplest and most fundamental form.
In order to develop RD analysis for the video coding, one needs to first analyze the RD tradeoffs offered by these three components individually and later combine them in a meaningful way. In the following subsections, we consider each of these sources, and investigate the corresponding RD tradeoffs.
Iva Motion Activity and Motion Vectors
Motion activity is a measure of activity level in the video. A sports video clip, for example, will have high activity whereas a television news clip will have low activity. In clock based encoding methods, motion activity could be measured in terms of proportion of active blocks (say, ) in the video.
Motion estimation and compensation process is an effective strategy to reduce or remove the temporal correlation that usually exists in video. This process transforms the video stream into uncorrelated data stream consisting of motion vectors and residuals. Uncorrelated data stream, in turn, can be encoded using scalar coding as opposed to correlated data stream which requires vector coding.
The process of motion estimation and compensation leads to a representation that includes MVs and the corresponding DFDs. The bit rate associated with this representation needs to take into account the bit rate required for MVs and the DFDs. There is no distortion associated with MVs. However, quantization of DFDs leads to distortion.
A common model for representing the motion field is a Gibbs/Markov field [26, 11]. This model is defined by a neighborhood structure called clique. Let represent the set of cliques; then the probability density function corresponding to Gibbs/Markov field is defined as:
(6) 
where Z is a normalization factor. The function , known as the potential function, measures the difference between pixels in the same clique:
(7) 
The bit rate allocation for motion vectors, is given by,
(8) 
where is the number of bits for each motion vector and and are the dimensions of the block.
Each active block is represented by DFDs, and only one motion vector. Thus, the bitrate required to represent motion vectors (MVs) is far less compared to the bitrate needed to represent FDs and DFDs. As the block size gets larger, the cost of encoding motion vectors becomes smaller.
Further, the number of motion vectors can be directly computed from the number of active blocks. This relationship between the MVs and active blocks allows us to directly analyze and assess the RD tradeoffs associated with MVs.
It is worth mentioning that motion vectors are also correlated. In our previous work, we developed a motion estimation method that takes into account the correlation among motion vectors [36]. In the present analysis, however, the correlation among motion vectors is not considered because it is not as fundamental as the correlation that exists at pixel level.
IvB Ratedistortion Analysis for Displaced Frame Differences
The DFD image is represented by and it is computed using (5). This image consists of residuals obtained after compensating each block in the anchor frame for its motion and subtracting the motion compensated anchor frame from the target frame.
Accurate computation of MVs results in a DFD image consisting of pixels that follow independent and identically distributed (i.i.d.) samples of Gaussian source [11] which can be encoded using scalar coding techniques. In practice, however, the residuals obtained after motion compensation are still correlated. Hence, transforms such as DCT and DWT are applied to the residuals to remove the correlation that still exists after motion compensation. For establishing RD bounds, however, modelling DFDs as i.i.d. Gaussian source is most appropriate.
The RD relationship for i.i.d. Gaussian source is given by [10](Ch. 13),
(9) 
where is the source variance, is the distortion resulting from encoding the active pixels.
IvC Ratedistortion analysis for Frame Differences
There exists a strong spatial correlation among pixels in video. In image processing literature, realworld images are best modeled using Gaussian process [27], and GaussMarkov process is an appropriate model for a correlated Gaussian source. For simplification, we model FD samples using the first order GaussMarkov process. Several experiments have been carried out to test the suitability of GaussMarkov process for modeling FDs. The close match between the PDF of GaussMarkov source and that of the DFDs shown in Fig. 4, validates this assumption.
The relationship between the bit rate incurred in encoding a firstorder GaussMarkov source and the resulting distortion in its reconstruction is given by [25],
(10) 
where is the correlation that exists between adjacent samples and is the variance of the source samples.
IvD Overall Ratedistortion Analysis and Characterization of Rate Region
Based on the analysis presented so far, the considerations for RD tradeoffs in video encoding can be summarized as follows:

Motion Vectors: Motion vectors are computed only for active blocks. The bitrate required to encode motion vectors can be computed directly from the number of active blocks. Motion vectors have an indirect impact on the resulting distortion. Accurate estimation of motion vectors will significantly reduce the temporal correlation between consecutive video frames at the expense of computational complexity.

Frame Differences: Inactive blocks are represented by FDs only. Motion vectors are not needed for inactive blocks.

Displaced Frame Differences: Active blocks are represented by MVs and DFDs. Scalar coding is sufficient to encode DFDs. Further, even after accounting for motion vectors, active blocks require lesser bitrate compared to inactive blocks.
The overall rate and distortion for video coding scheme can now be expressed as,
As the block size gets larger, the bits allocated for a motion vector gets smaller compared to the bits allocated for DFDs and FDs. Hence, we can express the RD relationship as follows:
(12) 
This closed form expression indicates that the video encoding rate is primarily a function of motion activity and the statistics of the residual data after motion estimation and compensation. Video encoding methods need to be accurate in estimating motion vectors and in exploiting the correlation that exists after motion compensation in order to provide better ratedistortion tradeoffs.
IvE Theoretical Results
The above RD analysis reveals that RD tradeoffs in a video source depend on primarily two aspects: motion activity and spatiotemporal correlation which are outlined below. They will be followed up in the next section within the experimental analysis.

Motion Activity: Fig. 5 shows the variations in RD curve as a function of . For this experiment, the model parameters are set to the following values: = 10, = 100, and = 0.5. The lower curve in in Fig. 5 represents the scenario when the image consists of inactive regions only. This happens for video clips with slow motion. The upper curve in Fig. 5 represents the scenario when the image consists of only active regions. This happens for video clips with fast motion.

Spatiotemporal Correlation: RD tradeoffs largely depend on spatiotemporal correlation that naturally exists in video sequences. Let the parameter represent the spatiotemporal correlation that exists in the video. Fig. 6 shows the variations in the RD curve as a function of the correlation coefficient (). For this experiment, the model parameters are set to the following values: = 50, and = 100. In this figure, the lower curve represents the scenario with high correlation and the upper curve represents the scenario with less correlation. The plots suggests large correlation leads to large RD tradeoff. The plots also illustrate that uncorrelated data is difficult to compress.
V Experiments, Results, and Discussion
The proposed RD analysis has been validated through simulations in MATLAB and using H.264 codec. The derived closed form bounds for RD bounds are compared with the experimental results obtained from H.264 codec on a wide variety of video clips.
Va Experimental Set up
The experimental set up consists of H.264 video encoder stimulated on the JM software, which is the official reference software for the H.264/1449610 AVC profiles. Several video clips of varying resolution (for example, 240 342 and 486 720) in YUV 4:2:0 format with frame rate of 30 frames/sec were encoded into H.264 format and then decoded back to the original file format at different bit rates. The encoder speed ranges from 2.5 Mbits/sec to 30 Mbit/sec. After running the encoder, RD statistics for every encoded frame and cumulative results are collected. Encoder output size in bytes and MSE for Y frame are recorded for each bit rate. Bit rate (R) for each video is calculated as follows:
(13) 
The results shown are based on the experiments on two specific video clips with different levels of motion activity: (1) A Table Tennis video clip with low motion activity and (2) a Football video clip with high motion activity, each with several number of frames.
VB Experiments with H.264 Encoder
Figs (a)a shows a target frame from a Table Tennis video clip. The theoretical and experimental RD results for this video clip are plotted in Figure (c)c. The model parameters for this video clip are set to the following values: = 20, = 50, = 0.59 and = 0.05. The plot shown in Magenta color corresponds to the RD result obtained from the H.264 codec.
VC Discussion
The results shown in Fig. (c)c and Fig. (d)d demonstrate the following important aspects of the proposed RD model.

The top and bottom plots in each figure provide informationtheoretic bounds derived based on the PDFs associated with DFDs, and FDs. The first plot in the middle of each figure (in Magenta) represents the RD results obtained from the H.264 codec. The second plot in the middle of each figure represents the expected RD results based on the proposed model. The closeness of the theoretical and practical RD plots demonstrate the validity of the proposed RD analysis.

The region between the two theoretical RD curves can be characterized as the RD region for classical motionestimation based video coding techniques. This RD region is dependent on spatial correlation described by () and motion activity ().

While large spatial correlation makes the RD curve go down, large motion activity makes the RD curve go up. This is demonstrated by the experimental plots corresponding to the two video clips. The second (football) video clip has more motion activity compared to the first video clip, resulting in higher rate as well as larger distortion compared to the first video clip.
VD Summary and Conclusions
In this paper, we characterized the RD region in video coding using the concept of conditional motion estimation. Through a practical implementation, we demonstrated the validity of the proposed RD analysis. Our work can be extended in the following ways.

While a typical FD image follows a firstorder GaussMarkov process, it is possible that higherorder GaussMarkov process can be used to model FD image. This may lead to tighter bounds for the RD region.

Informationtheoretic RD analysis doesn’t take into account the implementation overheads such as blockbased representation, and quantization. Thus, the analysis presented in this paper can be extended and made practical by taking the overheads associated with implementation.
References
 [1] T. Berger, Rate distortion theory, Englewood Cliffs, NJ, Prentice Hall, 1971.
 [2] J. Pearl, An application of ratedistortion theory to pattern recognition and classification, pattern Recognition, vol. 8, no. 1, pp. 1122, Jan 1976.
 [3] G. B. Rath and A. Makur, Subblock matching based conditional motion estimation with automatic threshold selection for video compression, IEEE Transactions on circuits and systems for video tech., vol. 13, no. 9, pp. 914924, Sep 2003.
 [4] R. Yarlagadda, Rate Distortion analysis for adaptive threshold based conditional motion estimation schemes, MS Thesis Wichita State University, May 2005.
 [5] C. Shannon and W. Weaver, The mathematical theory of communication, University of Illinois, 1949.
 [6] C. Shannon, Coding theorems for a discrete source with fidelity criterion, Information and Decision Process, McGrawHill, Robert E. Machol, 1960.
 [7] H. C. Andrews, Bibilography on rate distortion theory, IEEE Transactions on Information Theory, vol. IT17, pp. 198199, March 1971.
 [8] L. D. Davisson, RateDistortion theory and application, Proceedings of the IEEE, vol. 60, no. 7, pp. 800808, July 1972.
 [9] R. G. Gallager, Information theory and reliable communication, New York: Wiley, 1968.
 [10] T. M. Cover and J. A. Thomas, Elements of information theory, Wiley Series in Telecommunications, 2004.
 [11] Y. Wang, J. Ostermann, and Y. Q. Zhang, Video processing and communication, Prentice Hall, Signal processing series, Alan V. Oppenheim, Series Editor, 2002.
 [12] C. Stiller and J. Konard, Estimating motion in image sequences, IEEE Signal Processing Magazine, vol. 16, no. 4, pp.7091, July 1999.
 [13] D. Slepian and J. Wolf, Noiseless coding of correlated information sources, IEEE Transactions on Information Theory, vol. IT19, no. 7, pp. 471480, July 1973.
 [14] A. Wyner, Recent results in the Shannon theory, IEEE Transactions on Information Theory, vol. IT20, no. 1, pp. 29, January 1974.
 [15] A. M. Rohit Puri and K.Ramchandran, PRISM:A Video Coding Paradigm with Motion Estimation at the Decoder, IEEE transactions on Image Processing, vol. 16, no. 10, pp. 24362448, October 2007.
 [16] A. Wyner and J. ziv, The rate distortion function for source with side information at the decoder, IEEE Transactions on Information Theory, vol. 22, no. 2, pp. 110, January 1976.
 [17] Z. Li, L. Liu, and E. J. Delp , Rate distortion analysis of motion side estimation in wynerziv video coding, IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 98113, January 2007.
 [18] Y. Oohama, Ratedistortion theory for Gaussian multiterminal source coding systems with several side informations at the decoder, IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 25772593, July 2005.
 [19] S. Ghoneimy and S. F. Bahgat, Ratedistortion function for nth order gaussianmarkov process, Circuits Systems Signal Process, vol. 12, no. 4,pp. 567578, 1993.
 [20] G. B. Rath and A. Makur, Iterative least squares and compression based estimations for a 4parameter linear global motion model and global motion compensation, IEEE Transactions on Circuits and Systems for Video Tech, vol. 9, pp. 10751099, Oct 1999.
 [21] S. Payyavula, Automatic threshold selection using bayesian decision model for block based conditional motion estimation, MS Thesis Wichita State Univeresity, May 2004.
 [22] CCITT, Recommendation H.261: Video Codec for audiovisual services at p x 64 kbits/s, COM XVR 37E,1989.
 [23] ISO/IEC JTC1 IS 111722(MPEG1),Information Technology Coding of Moving pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbit/s, 1993.
 [24] S. Zhu and K. K. Ma, A new diamond search algorithm for fast blockmatching motion estimation, IEEE transactions on Image Processing, vol. 9, no. 2, pp. 287290, 2000.
 [25] B. J. Bunin, Rate distortion functions for Gaussian Markov process, The Bell System tecnical journal, pp. 30593075, November 1969.
 [26] S. German and D. Geman Stochastic relaxation, Gibbs Distributions, and the Bayesian restoration of images, IEEE Trasactions on Pattern Analysis and Machine Intelligence, Vol. 6, pp. 721241, Nov. 1984.
 [27] A. K. Jain Fundamentals of Digital Image Processing, PrenticeHall, Englewood Cliffs, ISBN 0133361659, 1989.
 [28] H. F. Ates and Y. Altunbasak, RateDistortion and Complexity Optimized Motion Estimation for H.264 Video Coding, IEEE Circuits Systems and Video Technology, Vol. 18, No. 2, pp. 159171, 2008.
 [29] C. Liu and I. Bouazizi and M. Gabbouj, Advanced Rate Adaptation for Unicast Streaming of Acalable Video, Proc. IEEE ICC, 2010.
 [30] Cook, G.W. and PradesNebot and J. and Yuxin Liu and Delp, E.J. , Ratedistortion analysis of motioncompensated rate scalable video, IEEE Trans. Image Processing, vol.15, no.8, pp. 21702190, 2006.
 [31] R. Zhang and M. Comer, Ratedistortion analysis for spatially scalable video coding, IEEE Trans. Image Processing, vol.19, no.11, pp. 29472957, 2010.
 [32] A. Ortega and K. Ramachandran, Ratedistortion methods for image and video compression, IEEE Signal Processing Magazine, Vol. 15, No.6, pp.2350, 1998.
 [33] G. Sullivan and T. Wiegand, Ratedistortion optimization for video compression, IEEE Signal Processing Magazine, Vol. 15, No.6, pp. 7490, 1998.
 [34] P. A. Chou and T. Lookabaugh and R. M. Gray, Entropyconstrained vector quantization, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, pp. 3142, Jan. 1989.
 [35] Y. Shoham and A. Gersho, Efficient bit allocation for an arbitrary set of quantizers, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 36, No. 9, pp. 14451453, 1988.
 [36] K. Namuduri, Motion estimation using spatiotemporal contextual information, IEEE Trans. Circuits Syst. Video Tech, Vol 14, No. 8, pp. 11111115, 2004.