Four-stage model of data compression

Almost all data compression systems can be viewed as comprising four successive stages of data processing arranged as a processing pipeline (though some stages will often be combined with a neighboring stage, performed "off-line," or otherwise made rudimentary).

The four stages are

(A) Preliminary pre-processing steps.

(B) Organization by context.

(D) Length-reducing code.

The ubiquitous compression pipeline (A-B-C-D) is what is of interest.

With (A) we mean various pre-processing steps that may be appropriate before the final compression engine.

Lossy compression often follows the same pattern as lossless, but with one or more quantization steps somewhere in (A). Sometimes clever designers may defer `loss' until suggested by statistics detected in (C); an example of this would be modern zerotree image coding.
(B) Organization by context often means data reordering, for which a simple but good example is JPEG's "Zigzag" ordering. The purpose of this step is to improve the estimates found by the next step.
(C) A probability estimate (or its heuristic equivalent) is formed for each token to be encoded. Often the estimation formula will depend on context found by (B) with separate 'bins' of state variables maintained for each conditioned class.
(D) Finally, based on its estimated probability, each compressed file token is represented as bits in the compressed file. Ideally, a 12.5%-probable token should be encoded with three bits, but details become complicated.

Table of contents

1 Further discussion

1.1 Preliminary pre-processing steps
1.2 Organization by context
1.3 Probability estimation
1.4 Entropy coding

2 Further examples

Further discussion

Preliminary pre-processing steps

Preliminary compression steps is a catch-all including such ideas as compaction transforms (e.g. subband image coding), hierarchal decompositions, template matching, string searching, and miscellaneous tricks like the Burrows-Wheeler transform. We aren't concerned with this stage here, except to note that many such preprocessing steps fall nicely within the scope of "Organization by Context." In lossy systems, any loss is usually introduced in (A), so remaining steps are the same for both lossy and lossless.

Organization by context

Organization by context often means data reordering, for which a simple example is JPEG's `Zigzag' ordering -- a simple device to sequence a vector's elements into approximate order of expected energy (or bit rate requirement). (One important but little-remarked reason why more modern zerotree-type image compressors outperform JPEG is that they bunch similar statistics across very large image areas, while JPEG just does a 64-pel block.)

More generally, context organization includes separation and concatenation to move data with similar context, such as adjacent pixel values, to adjacent locations in an encoding stream, or to provide a similar context character to a context-conditioned probability estimation machine.

Probability estimation

A probability estimate (or its heuristic equivalent) is formed for each token to be encoded; the estimates depend on data reordering or bin assignment in step (B). The probability estimate takes the form of a vector of k probabilities, summing to one, for k possible decision outcomes labelled 0, 1, .... k-1. Often k=2 and the outcome tokens are simply bits.

These probabilities can be reconstructed by the decoder, so what is encoded into the compressed file is just the outcome token. Instead of using {0, 1} as the outcome token set when k=2, it is often convenient to use {MPS, LPS} -- the More and Less Probable Symbol.

Eventually this section should discuss some topics in probability estimation:

Statistical Coding: Conditional probabilities
Statistical Coding: The `Overtraining' problem
Statistical Coding: Distributed estimation
Axioms for Probability estimation: efficacy, ergodicity, insensitivity
Static Probability estimation: offline, semi-adaptive
Stationary Probability estimation: Bayesian
Window-based Probability estimation: e.g. Lempel-Ziv
Decaying-average Probability estimation: e.g. QM-coder

Entropy coding

The encoded events are converted into the bits of the compressed data file. Let us first separate these systems into two types. K is relatively large. Huffman coding is popular and useful, especially when symbol probabilities were already estimated in an earlier "offline" effort. Adaptive Huffman coding is also in use.

K = 2. Golomb coding and its variants like Langdon coding are simple, effective and popular; these produce output bits only when LPS is encountered. Arithmetic coding is also effective, as are their close cousins, quasi-arithmetic codes and FSM codes. An advantage of arithmetic and related codes over Golomb coding is the automatic interleaving of compressed data from multiple contexts, though chips implementing interleaved Golomb-type codes have recently been introduced by Ricoh Co., Ltd.

Interemdiate values like K=3 might seem to pose a problem. In fact, practical systems may be slightly more complicated than indicated here. Also, in some systems, e.g. LZ text coding, stages (A)-(B) of the compression system may be designed to get good compression even with trivial (C)-(D).

Further examples

Instead of a Huffman code, a simple often applicable coding device is MTF (Move To Front) followed by a simple length-reducing code such as the so-called Fibonacci code. That might be viewed as MTF is stage (B), the simple "Fibo code" as stage (D), with stage (C) disappearing entirely, if the Fibo code fits the model statistics well.

See also: Data compression