A video sequence consists of a number of pictures, usually called frames. Subsequent frames are very similar and thus contain a lot of redundancy. The goal of motion compensation is to remove this redundancy and achieve better compression ratios.

A first approach would be to simply subtract a reference frame from a given frame.
The difference is then called *residual* and usually contains less energy (or information) than the original frame.
The residual can be encoded at a lower bit-rate with the same quality.
The decoder reconstructs the original frame by adding the residual to the reference frame again.
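This scheme can be sketched in a few lines of Python (the frame contents below are made up for illustration):

```python
import numpy as np

# Residual coding sketch: the encoder transmits the difference between
# the current frame and a reference frame; the decoder reconstructs the
# current frame by adding the residual back to the reference.
reference = np.array([[10, 12], [14, 16]], dtype=np.int16)
current   = np.array([[11, 12], [15, 16]], dtype=np.int16)

residual = current - reference        # encoded instead of `current`
reconstructed = reference + residual  # decoder side

assert np.array_equal(reconstructed, current)
# The residual typically has far less energy than the frame itself:
assert np.abs(residual).sum() < np.abs(current).sum()
```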

A more sophisticated approach is to approximate the motion of the whole scene and the objects of a video sequence. The motion is described by some parameters that have to be encoded in the bit-stream. The pixels of the predicted frame are approximated by appropriately translated pixels of the reference frame. This gives much better residuals than a simple subtraction. However, the bit-rate occupied by the parameters of the motion model must not grow too large.

Usually, the frames are processed in groups.
One frame (usually the first) is encoded without motion compensation just as a normal image.
This frame is called *I-frame* (intra-coded frame, MPEG terminology) or *I-picture*.
The other frames are called *P-frames* or *P-pictures* and are predicted from the I-frame or P-frame that immediately precedes them (temporally).
Prediction schemes are described by patterns such as IPPPP, meaning that a group consists of one I-frame followed by four P-frames.

Frames can also be predicted from future frames.
The future frames have to be encoded before the predicted frames, of course.
Thus, the encoding order does not necessarily match the real frame order.
Such frames are usually predicted from two directions, i.e. from the I- or P-frames that immediately precede and follow the predicted frame.
These bidirectionally predicted frames are called *B-frames*.
A coding scheme could, for instance, be IBBPBBPBBPBB.
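The resulting reordering can be sketched in Python; the rule below (encode each anchor frame before the B-frames that display before it) is illustrative and not taken from any particular standard:

```python
# Reordering sketch: a B-frame can only be encoded after the anchor
# (I- or P-frame) that follows it in display order.
display_order = list("IBBPBBPBBPBB")

encoding_order = []
pending_b = []
for frame_type in display_order:
    if frame_type == "B":
        pending_b.append(frame_type)       # defer until the next anchor
    else:
        encoding_order.append(frame_type)  # anchor frame goes first
        encoding_order.extend(pending_b)   # then the deferred B-frames
        pending_b.clear()
encoding_order.extend(pending_b)           # B-frames still waiting for the next group's anchor

print("".join(encoding_order))  # IPBBPBBPBBBB
```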


In **global motion compensation** (GMC), a single motion model (e.g. a translation or zoom) describes the motion of the entire frame. Its main advantages are:

- It models precisely the major part of motion usually found in video sequences with just a few parameters. The share in bit-rate of these parameters is negligible.
- It does not partition the frames. This avoids artifacts at partition borders.
- A straight line (in the time direction) of pixels with equal spatial positions in the frame corresponds to a continuously moving point in the real scene. Other MC schemes introduce discontinuities in the time direction.
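The simplest global motion model, a single translation for the whole frame, can be sketched as follows (illustrative Python; `np.roll`'s wrap-around stands in for proper border handling):

```python
import numpy as np

# Global motion compensation sketch, restricted to pure translation:
# one motion vector (dy, dx) is the only model parameter and warps the
# whole reference frame.
def gmc_predict(reference, dy, dx):
    return np.roll(reference, shift=(dy, dx), axis=(0, 1))

reference = np.arange(16, dtype=np.int16).reshape(4, 4)
current = np.roll(reference, shift=(1, 0), axis=(0, 1))  # scene panned by one pixel

predicted = gmc_predict(reference, 1, 0)
residual = current - predicted
assert not residual.any()  # pure global motion is predicted perfectly
```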

In **block motion compensation** (BMC), the frames are partitioned into blocks of pixels (e.g. macroblocks of 16×16 pixels in MPEG).
Each block is predicted from a block of equal size in the reference frame.
The blocks are not transformed in any way apart from being shifted to the position of the predicted block.
This shift is represented by a *motion vector*.
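A minimal Python sketch of this block-copy prediction (block size, frame contents, and motion vectors are made up; real codecs also handle vectors that point outside the frame):

```python
import numpy as np

# Block motion compensation sketch: each block of the predicted frame is
# copied from an equally sized block in the reference frame, displaced
# by that block's motion vector (dy, dx).
BLOCK = 4

def bmc_predict(reference, motion_vectors):
    h, w = reference.shape
    predicted = np.empty_like(reference)
    for by in range(0, h, BLOCK):
        for bx in range(0, w, BLOCK):
            dy, dx = motion_vectors[by // BLOCK][bx // BLOCK]
            predicted[by:by + BLOCK, bx:bx + BLOCK] = \
                reference[by + dy:by + dy + BLOCK, bx + dx:bx + dx + BLOCK]
    return predicted

reference = np.arange(64, dtype=np.int16).reshape(8, 8)
mvs = [[(0, 0), (1, 0)],   # top-right block taken one row lower
       [(0, 0), (0, 0)]]
predicted = bmc_predict(reference, mvs)
assert np.array_equal(predicted[0:4, 4:8], reference[1:5, 4:8])
```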

The motion vectors are the parameters of this motion model and have to be encoded into the bit-stream. As the motion vectors are not independent (e.g. if two neighbouring blocks belong to the same moving object), they are usually encoded differentially to save bit-rate. This means that only the difference between a motion vector and the neighbouring motion vector(s) encoded before it is encoded. An entropy coder can exploit the resulting statistical distribution of these differences, which cluster around the zero vector.
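Differential coding can be sketched as follows; predicting each vector from the previously coded one is a simplification (real codecs typically use, e.g., a median of several neighbours):

```python
# Differential motion-vector coding sketch: each vector is encoded as the
# difference from the previously encoded vector.
motion_vectors = [(5, 3), (5, 3), (6, 3), (0, 0)]  # made-up example vectors

prev = (0, 0)
differences = []
for mv in motion_vectors:
    differences.append((mv[0] - prev[0], mv[1] - prev[1]))
    prev = mv

# The differences cluster around the zero vector, which an entropy coder
# can exploit:
print(differences)  # [(5, 3), (0, 0), (1, 0), (-6, -3)]
```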

It is possible to shift blocks by non-integer vectors, which is called *sub-pixel precision*.
This is done by interpolating the pixel's values.
Usually, the precision of the motion vectors is increased by one bit: *half-pixel precision*.
Of course, the computational expense of sub-pixel precision is much higher, since the interpolation functions can be quite costly.
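Half-pixel precision can be sketched with bilinear interpolation, one common choice of interpolation function (the pixel values are made up):

```python
import numpy as np

# Half-pixel sampling sketch: a sample at a half-pixel position is the
# average of its integer-position neighbours (bilinear interpolation).
def half_pel(reference, y2, x2):
    """Sample `reference` at position (y2/2, x2/2)."""
    y0, x0 = y2 // 2, x2 // 2
    y1 = y0 + (y2 % 2)
    x1 = x0 + (x2 % 2)
    # Average of the (up to four distinct) surrounding integer pixels.
    return (reference[y0, x0] + reference[y0, x1] +
            reference[y1, x0] + reference[y1, x1]) / 4.0

ref = np.array([[0.0, 4.0],
                [8.0, 12.0]])
assert half_pel(ref, 0, 0) == 0.0  # integer position: the pixel itself
assert half_pel(ref, 0, 1) == 2.0  # halfway between 0 and 4
assert half_pel(ref, 1, 1) == 6.0  # centre of the four pixels
```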

The big disadvantage of block motion compensation is the discontinuities introduced at block borders (blocking artifacts). They have the form of sharp horizontal and vertical edges which, firstly, are highly visible to the human eye and, secondly, produce ringing effects (large coefficients in high-frequency sub-bands) in the frequency transform used for transform coding of the residual frames.

**Motion estimation** (block motion estimation, BME; overlapped block motion estimation, OBME) is the process of finding optimal or near-optimal motion vectors which minimise the overall prediction error.
The prediction error of a block is defined as the mean squared error (MSE) between predicted and actual pixel values over all pixels of the block.

To find optimal motion vectors, one basically has to calculate the block prediction error for each motion vector within a certain search range and pick the one with the smallest error (full search). A faster, sub-optimal method is to use a coarse search grid for a first approximation and to refine the grid in the surroundings of this approximation in further steps. The most common representative of this method is the 3-step search, which uses search grids of 3×3 motion vectors and 3 refinement steps to cover an overall search range of 15×15 pixels.
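Full search can be sketched as follows (illustrative Python with a small ±2 search range; the criterion is the MSE over the block):

```python
import numpy as np

# Full-search block motion estimation sketch: try every candidate motion
# vector within ±search_range and keep the one with the smallest MSE.
def full_search(current_block, reference, by, bx, search_range=2):
    B = current_block.shape[0]
    h, w = reference.shape
    best_mv, best_mse = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + B > h or x + B > w:
                continue  # candidate block would leave the frame
            candidate = reference[y:y + B, x:x + B]
            mse = np.mean((current_block - candidate) ** 2)
            if mse < best_mse:
                best_mv, best_mse = (dy, dx), mse
    return best_mv, best_mse

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(16, 16)).astype(float)
current_block = reference[6:10, 5:9]  # the block at (4, 4), moved by (2, 1)

mv, mse = full_search(current_block, reference, by=4, bx=4)
assert mse == 0.0  # an exact match exists within the search range
```

A 3-step search would evaluate only a coarse 3×3 grid of candidates per step instead of exhausting the whole range.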

For OBME, the pixel-wise prediction errors of a block and its overlapping neighbouring blocks have to be weighted and summed according to the window function before being squared. Since some neighbouring motion vectors are not yet known while motion vectors are successively found and refined, the corresponding prediction errors can be ignored (not added) as a sub-optimal solution.
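The windowed error measure for a single block, with all neighbouring contributions ignored, can be sketched as follows (the window function and error values are purely illustrative):

```python
import numpy as np

# OBME cost sketch: each pixel's prediction error is weighted by the
# window function before being squared and summed.  Contributions of
# neighbouring blocks whose motion vectors are not yet known are left
# out, as described above.
B = 4
ramp = np.minimum(np.arange(1, B + 1), np.arange(B, 0, -1)) / (B / 2 + 0.5)
window = np.outer(ramp, ramp)  # separable, peaks at the block centre

current   = np.arange(B * B, dtype=float).reshape(B, B)
predicted = current + 1.0      # constant pixel-wise prediction error of -1

weighted_cost = np.sum((window * (current - predicted)) ** 2)
```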

The major disadvantages of OBMC are increased computational complexity of OBME, and the fact that prediction errors and, thus, also the optimal motion vectors depend on neighbouring blocks/motion vectors. Therefore, there is no algorithm with polynomial computational complexity that guarantees optimal motion vectors. However, there are near-optimal iterative and non-iterative methods with acceptable computational complexity.