
In statistics, one often considers a family of probability distributions for a random variable *X*, parameterized by a scalar- or vector-valued parameter, which we call θ. Often *X* is itself a vector whose components are scalar-valued random variables, frequently independent. A quantity *T*(*X*) that depends on the (observable) random variable *X* but **not** on the (unobservable) parameter θ is called a **statistic**. Sir Ronald Fisher tried to make precise the intuitive idea that a statistic may capture all of the information in *X* that is relevant to the estimation of θ. A statistic that does so is called a **sufficient statistic**. The precise definition is this:

- A statistic *T*(*X*) is **sufficient for θ** precisely if the conditional probability distribution of the data *X* given the statistic *T*(*X*) does not depend on θ.

Examples:

- If *X*_{1}, ..., *X*_{n} are independent Bernoulli-distributed random variables with expected value *p*, then the sum *X*_{1} + ... + *X*_{n} is a sufficient statistic for *p*.
- If *X*_{1}, ..., *X*_{n} are independent and uniformly distributed on the interval [0, θ], then max(*X*_{1}, ..., *X*_{n}) is sufficient for θ.
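
The Bernoulli example can be checked directly against the definition. The following is a worked sketch, in which the symbols *x*_{i} (the observed 0/1 values), *t* (their sum) and the shorthand *T* = *X*_{1} + ... + *X*_{n} are notation introduced here for illustration, and 0 < *p* < 1 is assumed. The numerator is the probability of one particular sequence with *t* ones, and the denominator is the binomial probability that *T* = *t*:

```latex
\[
P(X_1 = x_1, \ldots, X_n = x_n \mid T = t)
  = \frac{p^{t}(1-p)^{n-t}}{\binom{n}{t}\, p^{t}(1-p)^{n-t}}
  = \frac{1}{\binom{n}{t}},
\qquad x_i \in \{0,1\},\quad \sum_{i=1}^{n} x_i = t .
\]
```

The factors involving *p* cancel, so given *T* = *t* every arrangement of the *t* ones is equally likely, whatever the value of *p*.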

Since the conditional distribution of *X* given *T*(*X*) does not depend on θ, neither does the conditional *expected value* of *g*(*X*) given *T*(*X*), where *g* is any (sufficiently well-behaved) function. Consequently that conditional expected value is actually a *statistic*, and so is available for use in estimation. If *g*(*X*) is any kind of estimator of θ, then typically the conditional expectation of *g*(*X*) given *T*(*X*) is a better estimator of θ; one way of making that statement precise is called the **Rao-Blackwell theorem**. Sometimes one can easily construct a very crude estimator *g*(*X*), and then evaluate that conditional expected value to get an estimator that is in various senses optimal.
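
To make the idea concrete, here is a minimal Python sketch for the Bernoulli case, computed by exact enumeration rather than simulation. It is an illustration under stated assumptions, not a general implementation: the function names (`outcome_prob`, `rao_blackwellize`, `mean_squared_error`), the choice *n* = 4, the crude estimator *g*(*X*) = *X*_{1}, and the use of mean squared error as the measure of quality are all choices made here for the example.

```python
from fractions import Fraction
from itertools import product


def outcome_prob(x, p):
    """Probability of one 0/1 outcome vector x under independent Bernoulli(p)."""
    t = sum(x)
    return p**t * (1 - p) ** (len(x) - t)


def rao_blackwellize(g, n, p):
    """Return {t: E[g(X) | T = t]} for T = X_1 + ... + X_n, by enumeration.

    Hypothetical helper for this example; assumes 0 < p < 1 so that every
    value of T has positive probability.
    """
    num, den = {}, {}
    for x in product((0, 1), repeat=n):
        t = sum(x)
        w = outcome_prob(x, p)
        num[t] = num.get(t, 0) + w * g(x)
        den[t] = den.get(t, 0) + w
    return {t: num[t] / den[t] for t in den}


def mean_squared_error(estimator, n, p):
    """Exact E[(estimator(X) - p)^2] over all 2^n outcomes."""
    return sum(
        outcome_prob(x, p) * (estimator(x) - p) ** 2
        for x in product((0, 1), repeat=n)
    )


n = 4


def crude(x):
    """Very crude estimator of p: use only the first observation."""
    return x[0]


# The conditional expectation is computed under two different values of p.
cond_a = rao_blackwellize(crude, n, Fraction(1, 3))
cond_b = rao_blackwellize(crude, n, Fraction(3, 4))


def improved(x):
    """Conditional expectation of the crude estimator given the sum."""
    return cond_a[sum(x)]


p = Fraction(1, 3)
mse_crude = mean_squared_error(crude, n, p)
mse_improved = mean_squared_error(improved, n, p)
print(cond_a == cond_b, mse_crude, mse_improved)
```

In this sketch the two tables `cond_a` and `cond_b` coincide, which reflects the fact that the conditional expectation does not depend on *p* and is therefore a statistic; here it works out to the sample mean *t*/*n*. Its mean squared error is smaller than that of the crude estimator by a factor of *n*.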