Efficient Macroblock Coding-Mode Decision for H.264/AVC Video Coding

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part
without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include
the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of
the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or
republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All
rights reserved.
Copyright © Mitsubishi Electric Research Laboratories, Inc., 2004
201 Broadway, Cambridge, Massachusetts 02139
Publication History:
1. First printing, TR-2004-79, 07 2004
PCS-2004 DRAFT
Xin et al.




Efficient Macroblock Coding-mode Decision for H.264/AVC Video Coding

Jun Xin, Anthony Vetro, and Huifang Sun
(jxin, avetro, hsun@merl.com)

Abstract
In this paper, we propose to use transform-domain processing in the macroblock coding-
mode decision of H.264/AVC such that an optimal mode decision is achieved with significantly
reduced computational complexity. Specifically, we achieve the computational savings by
calculating the distortion and the residual error in the transform-domain. We show that the
distortion calculation can be performed efficiently in the transform-domain such that the inverse
transform and reconstruction of the pixels can be omitted. We calculate the residual error in the
transform-domain by taking advantage of the fact that the transform of several intra prediction
signals (DC, Horizontal, and Vertical) can be very efficiently calculated, and their transform
coefficients have few non-zero entries.
I. Introduction
Major international video coding standards, including H.264/AVC [1], are based on a
basic hybrid-coding framework that uses motion compensated prediction to remove temporal
correlations and transforms to remove spatial correlations.
The basic encoding process of such a standard video encoder is shown in Figure 1. Each
frame of an input video is divided into macroblocks. Each macroblock is subject to
transform/quantization and entropy coding. The output of the transform/quantization is subject
to inverse quantization and inverse transform. Motion estimation is performed, and a coding-mode
decision is made considering the content of a pixel buffer. The coding-mode decision selects an
optimal coding-mode. The resulting prediction is then subtracted from the input signal to
produce an error signal. The prediction is also added to the output of the inverse
quantization/transform, and the sum is stored in the pixel buffer.
The macroblock can be encoded as an intra-macroblock, which uses information from
just the current frame. Alternatively, the macroblock can be encoded as an inter-macroblock,
which is predicted using motion vectors that are estimated through motion estimation from the
current and previous frames. There are various ways to perform intra-prediction and inter-
prediction.
In general, each frame of video is divided into macroblocks, where each macroblock
consists of a plurality of smaller-sized blocks. The macroblock is the basic unit of encoding,
while the blocks typically correspond to the dimension of the transform. For instance, both
MPEG-2 and H.264/AVC specify 16x16 macroblocks. However, the block size in MPEG-2 is
8x8, corresponding to 8x8 DCT and Inverse DCT operations, while the block size in H.264/AVC
is 4x4 corresponding to the H.264/AVC 4x4 transform (HT) and inverse transform operations.
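
The relationship between the 16x16 macroblock and its transform blocks can be illustrated with a short sketch. The Python below is illustrative only: the helper name `split_into_blocks` is hypothetical, the macroblock is modeled as a plain list of rows, and the blocks are returned in simple raster order, which is an assumption and not necessarily the order a real codec uses.

```python
def split_into_blocks(macroblock, block_size):
    """Split a square macroblock (list of rows) into block_size x block_size
    tiles, returned in raster order (an assumption made for this sketch)."""
    size = len(macroblock)
    if size % block_size != 0:
        raise ValueError("block_size must divide the macroblock dimension")
    blocks = []
    for top in range(0, size, block_size):
        for left in range(0, size, block_size):
            # Take block_size rows, then block_size columns from each row.
            block = [row[left:left + block_size]
                     for row in macroblock[top:top + block_size]]
            blocks.append(block)
    return blocks


# A synthetic 16x16 macroblock whose sample at (row, col) is row * 16 + col.
macroblock = [[r * 16 + c for c in range(16)] for r in range(16)]

blocks_h264 = split_into_blocks(macroblock, 4)   # 4x4 blocks, as in H.264/AVC
blocks_mpeg2 = split_into_blocks(macroblock, 8)  # 8x8 blocks, as in MPEG-2
print(len(blocks_h264), len(blocks_mpeg2))       # 16 and 4 blocks respectively
```

The same macroblock therefore yields sixteen transform blocks under the 4x4 transform and four under the 8x8 DCT.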
We use the notion of macroblock partition to refer to the group of pixels in a macroblock
that share a common prediction. The dimensions of a macroblock, block and macroblock
partition are not necessarily equal. The allowable set of macroblock partitions typically varies from
one coding scheme to another.
H.264/AVC defines a wide variety of allowable macroblock partitions. For instance, a
16x16 macroblock may have a mix of 8x8, 4x4, 4x8 and 8x4 macroblock partitions within a
single macroblock. Prediction can then be performed independently for each macroblock
partition, but the coding is still based on a 4x4 block.
The encoder selects the coding-modes for the macroblock, including the best macroblock
partition and mode of prediction for each macroblock partition, such that the video coding
performance is optimized. The selection process is conventionally referred to as macroblock
coding-mode decision.
In H.264/AVC, there are many available modes for coding a macroblock. The available
coding-modes for a macroblock in an I-slice include: intra_4x4 prediction and intra_16x16
prediction for luma samples, and intra_8x8 prediction for chroma samples.
In the intra_4x4 prediction, each 4x4 macroblock partition can be coded using one of the
nine prediction modes defined by the H.264/AVC standard. In the intra_16x16 and intra_8x8
predictions, each 16x16 or 8x8 macroblock partition can be coded using one of the four defined
prediction modes. For a macroblock in a P-slice or B-slice, in addition to the coding-modes
available for I-slices, many more coding-modes are available using various combinations of
macroblock partitions and reference frames. Every coding-mode provides a different rate-distortion
(RD) trade-off.

Figure 1. A standard video encoder based on hybrid DCT/MC. (Block labels: Transform/Quantization, Inverse quantization/Inverse transform, Pixel buffers, Prediction, Entropy coding, Motion estimation, Mode decision.)

Typically, the rate-distortion optimization uses a Lagrange multiplier to make the
macroblock mode decision [2][3]. The rate-distortion optimization evaluates the Lagrange cost
for each candidate coding-mode for a macroblock and selects the mode that yields a minimum
Lagrange cost. The process for determining the Lagrange cost needs to be performed many times
because there are a large number of available modes for coding a macroblock according to the
H.264/AVC standard. Therefore, the computation of the rate-distortion optimized coding-mode
decision is very intensive. Consequently, there exists a need to perform efficient rate-distortion
optimized macroblock mode decision in H.264/AVC video coding.
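
To give a rough sense of how often the Lagrange cost must be evaluated, the sketch below counts the per-partition evaluations for the I-slice coding-modes listed earlier. It is a simplified illustration under stated assumptions: each candidate is evaluated exactly once and independently, a single prediction mode is shared by both chroma components, and the names `INTRA_CANDIDATES` and `count_cost_evaluations` are hypothetical. An encoder that evaluates luma and chroma modes jointly would perform more evaluations than this.

```python
# Candidate intra coding options for one macroblock in an I-slice, using the
# partition sizes and mode counts stated in the text.
# Each entry: (partitions per macroblock, prediction modes per partition).
INTRA_CANDIDATES = {
    "intra_4x4 (luma)": (16, 9),    # sixteen 4x4 partitions, nine modes each
    "intra_16x16 (luma)": (1, 4),   # one 16x16 partition, four modes
    "intra_8x8 (chroma)": (1, 4),   # assumption: one mode shared by both chroma planes
}


def count_cost_evaluations(candidates):
    """Number of Lagrange cost evaluations if every prediction mode of every
    partition is evaluated exactly once (a simplifying assumption)."""
    return sum(partitions * modes for partitions, modes in candidates.values())


total = count_cost_evaluations(INTRA_CANDIDATES)
print(total)  # 16*9 + 1*4 + 1*4 = 152
```

Even under these simplifying assumptions, a single intra macroblock requires well over a hundred cost evaluations, and P-slices and B-slices add further candidates.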
In this paper, we provide an efficient method for determining the Lagrange cost, which
leads to an efficient, rate-distortion optimized macroblock mode decision. We will first give a
brief review of the conventional rate-distortion optimized macroblock mode decision, and then
present our proposed approach. We are currently working on simulations, and we will provide
the results in later versions of the paper.
II. Rate-distortion Optimized Macroblock Mode Decision
If there are $N$ candidate modes for coding a macroblock, then the Lagrange cost of the $n$-th
candidate mode, $J_n$, is the sum of the Lagrange costs of its associated macroblock partitions:
$$J_n = \sum_{i=1}^{P_n} J_{n,i}, \qquad n = 1, 2, \ldots, N \tag{1}$$
where $P_n$ is the number of macroblock partitions of the $n$-th candidate mode. A
macroblock partition can be of different size depending on the prediction mode. For example, the
partition size is 4x4 for the intra_4x4 prediction, and 16x16 for the intra_16x16 prediction.
If the number of candidate coding-modes for the $i$-th partition of the $n$-th candidate mode is $K_{n,i}$,
then the cost of this macroblock partition is
$$J_{n,i} = \min_{k=1,2,\ldots,K_{n,i}} \left( J_{n,i,k} \right) = \min_{k=1,2,\ldots,K_{n,i}} \left( D_{n,i,k} + \lambda R_{n,i,k} \right) \tag{2}$$
where $R$ and $D$ are respectively the rate and distortion, and $\lambda$ is the Lagrange multiplier.
The Lagrange multiplier controls the rate-distortion tradeoff of the macroblock coding, and may
be derived from a quantization parameter. The above equation states that the Lagrange cost of
the $i$-th partition of the $n$-th candidate mode, $J_{n,i}$, is selected to be the minimum of the $K_{n,i}$ costs that are
yielded by the candidate coding-modes for this partition. Therefore, the optimal coding-mode of
this partition is the one that yields $J_{n,i}$.
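
The effect of the Lagrange multiplier on the per-partition choice can be sketched as follows. The sketch makes two assumptions that are not taken from this paper: the multiplier is derived from the quantization parameter (QP) as 0.85 * 2^((QP - 12) / 3), a relation commonly used in H.264/AVC reference encoders, and the (distortion, rate) pairs are invented for illustration. The function names `lagrange_multiplier` and `best_partition_mode` are hypothetical.

```python
def lagrange_multiplier(qp):
    """Assumed QP-to-lambda mapping: 0.85 * 2^((QP - 12) / 3).
    This relation is an assumption of the sketch, not given in the text."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def best_partition_mode(candidates, lam):
    """Per-partition minimization of D + lam * R.

    candidates: list of (distortion, rate) pairs, one per candidate
    coding-mode k (0-based here). Returns (k, cost) of the minimum."""
    costs = [d + lam * r for d, r in candidates]
    k = min(range(len(costs)), key=costs.__getitem__)
    return k, costs[k]


# Hypothetical (distortion, rate in bits) pairs for three candidate modes:
# low distortion / high rate, a middle option, high distortion / low rate.
candidates = [(100.0, 40), (300.0, 20), (900.0, 5)]

for qp in (20, 26, 32):
    lam = lagrange_multiplier(qp)
    k, cost = best_partition_mode(candidates, lam)
    print(f"QP={qp} lambda={lam:.2f} selected k={k}")
```

With these invented numbers, the selected candidate moves from the low-distortion mode at QP 20 to the low-rate mode at QP 32, which is the rate-distortion tradeoff that the multiplier controls.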
The optimal coding-mode for the macroblock is selected to be the candidate mode that
yields the minimum cost, i.e.,
$$J^{*} = \min_{n=1,2,\ldots,N} J_n \tag{3}$$
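
The complete decision of Eqs. (1) to (3) can be summarized in a minimal sketch: the best cost of each partition is found, these costs are summed per candidate mode, and the candidate mode with the smallest sum is selected. The function names, mode names and (distortion, rate) values below are hypothetical, and the sketch assumes that distortion and rate are already known for every candidate; it does not include the transform-domain calculations proposed in this paper.

```python
def partition_cost(candidates, lam):
    """Eq. (2): minimum of D + lam * R over a partition's candidate coding-modes."""
    return min(d + lam * r for d, r in candidates)


def mode_cost(partitions, lam):
    """Eq. (1): sum of the best costs of all partitions of one candidate mode."""
    return sum(partition_cost(candidates, lam) for candidates in partitions)


def decide_macroblock_mode(modes, lam):
    """Eq. (3): return (name, cost) of the candidate mode with minimum cost."""
    costs = {name: mode_cost(partitions, lam) for name, partitions in modes.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


# Hypothetical data: modes[name][i] lists the (distortion, rate) pairs of the
# candidate coding-modes of partition i. Real modes have more partitions
# (e.g. sixteen 4x4 partitions); two are used here to keep the example short.
modes = {
    "mode_A": [[(50.0, 30), (80.0, 10)], [(40.0, 25), (90.0, 8)]],
    "mode_B": [[(200.0, 12), (260.0, 6)]],
}

best_mode, best_cost = decide_macroblock_mode(modes, lam=2.0)
print(best_mode, best_cost)  # mode_A 190.0
```

With the multiplier set to 2.0 the two-partition mode wins with a cost of 190.0; raising it to 10.0 makes rate more expensive and the single-partition mode is selected instead.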