Likelihood principle

The likelihood principle is a principle of inference which asserts that all of the information in a sample is contained in the likelihood function.

(A likelihood function is a conditional probability distribution considered as a function of its second argument, holding the first fixed. For example, consider a model which gives the probability of observables $X$ as a function of a parameter $\theta$. Then for a specific value $x$ of $X$, the function

$$L(\theta \mid x) = P(X = x \mid \theta)$$

is a likelihood function for $\theta$.)
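A short sketch can show the two readings of the same formula. It assumes a binomial model for $k$ heads in $n$ coin tosses, with hypothetical data $k = 7$, $n = 10$ (illustrative choices, not part of the definition). Summed over the data with $\theta$ fixed, the probabilities add up to 1. Integrated over $\theta$ with the data fixed, the area is generally not 1, so a likelihood function is not a probability distribution for $\theta$. For this model the exact area is $1/(n+1) = 1/11$.

```python
from math import comb


def binomial_pmf(k: int, n: int, theta: float) -> float:
    """P(X = k | theta): probability of k heads in n independent tosses."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def likelihood(theta: float, k: int = 7, n: int = 10) -> float:
    """L(theta | x): the same formula, read as a function of theta with the data fixed.

    The defaults k = 7, n = 10 are hypothetical illustration data.
    """
    return binomial_pmf(k, n, theta)


# As a function of the data (theta fixed), the probabilities sum to 1 ...
total_over_data = sum(binomial_pmf(k, 10, 0.3) for k in range(11))

# ... but as a function of theta (data fixed), the area under the curve need not be 1.
# Midpoint-rule approximation of the integral of L(theta | x) over theta in [0, 1].
m = 10_000
area_over_theta = sum(likelihood((i + 0.5) / m) for i in range(m)) / m

print(round(total_over_data, 12))  # 1.0
print(round(area_over_theta, 4))   # 0.0909, i.e. about 1/11
```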

A widely used application of the likelihood principle is the method of maximum likelihood.
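Maximum likelihood picks the parameter value at which the likelihood function is largest. As a minimal sketch, assuming a binomial model and hypothetical data of 7 heads in 10 tosses, a simple grid search recovers the closed-form binomial estimate $k/n$:

```python
from math import comb


def likelihood(theta: float, k: int = 7, n: int = 10) -> float:
    """Binomial likelihood L(theta | k heads in n tosses); k and n are illustrative."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


# Grid search over theta in (0, 1): keep the value with the largest likelihood.
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=likelihood)

print(theta_hat)  # 0.7, matching the closed form k / n = 7 / 10
```

A grid search is used only because it makes "maximize the likelihood" visible; for this model the maximum is available in closed form, and in general a numerical optimizer would replace the grid.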

The likelihood principle is not universally accepted. Some widely used methods of conventional (frequentist) statistics, for example significance tests, are not consistent with the likelihood principle. Let us briefly consider some of the arguments for and against the likelihood principle.

Arguments in favor of the likelihood principle

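From a Bayesian point of view, the likelihood principle is a direct consequence of Bayes' theorem. An observation $A$ enters the formula

$$P(\theta \mid A) = \frac{P(A \mid \theta)\,P(\theta)}{\int P(A \mid \theta')\,P(\theta')\,d\theta'}$$

only through the likelihood function, $L(\theta \mid A) = P(A \mid \theta)$. In general, observations come into play through the likelihood function, and only through the likelihood function; no other mechanism is needed. In particular, two observations whose likelihood functions are proportional lead to the same posterior distribution, since the constant of proportionality cancels between the numerator and the denominator.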
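The cancellation can be checked with a grid approximation to Bayes' theorem. This is a sketch under stated assumptions: a uniform prior on $\theta$, a 200-point grid standing in for the integral, and two hypothetical likelihoods that share the shape $\theta^7(1-\theta)^3$ but carry different constant factors ($\binom{10}{7} = 120$ and an arbitrary 42).

```python
from math import comb


def grid_posterior(likelihood, prior, grid):
    """Normalized posterior P(theta | A) on a grid, via Bayes' theorem."""
    unnormalized = [likelihood(t) * prior(t) for t in grid]
    evidence = sum(unnormalized)  # discrete stand-in for the integral in the denominator
    return [u / evidence for u in unnormalized]


def uniform_prior(theta: float) -> float:
    return 1.0


def lik_a(theta: float) -> float:
    return comb(10, 7) * theta**7 * (1 - theta) ** 3


def lik_b(theta: float) -> float:
    # Same shape as lik_a, different (arbitrary) constant factor.
    return 42.0 * theta**7 * (1 - theta) ** 3


grid = [(i + 0.5) / 200 for i in range(200)]
post_a = grid_posterior(lik_a, uniform_prior, grid)
post_b = grid_posterior(lik_b, uniform_prior, grid)

# Proportional likelihoods give the same posterior: the constant cancels on normalization.
print(max(abs(a - b) for a, b in zip(post_a, post_b)) < 1e-12)  # True
```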

Arguments against the likelihood principle

The likelihood principle implies that any event that did not happen has no effect on an inference. (If an unrealized event did affect an inference, there would be some information not contained in the likelihood function.) However, unrealized events do play a role in some common statistical methods. For example, the result of a significance test depends on the probability of a result as extreme as, or more extreme than, the observation, and that probability sums over outcomes that were never observed. Thus, to the extent that such methods are accepted, the likelihood principle is denied.
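To see how a significance test draws on unrealized outcomes, the sketch below computes a one-sided p-value for the null hypothesis $\theta = 1/2$ (against $\theta > 1/2$, where $\theta$ is the probability of heads) from the same data, 7 heads and 3 tails, under two sampling plans: tossing exactly 10 times, and a hypothetical plan of tossing until the third tail appears. Both plans give likelihoods proportional to $\theta^7(1-\theta)^3$, yet the p-values differ because each sums over a different set of outcomes that did not occur. The second plan, the function names and the choice of test are illustrative assumptions.

```python
from math import comb


def pvalue_fixed_n(heads: int, n: int) -> float:
    """P(at least `heads` heads in n fair tosses), with n fixed in advance."""
    return sum(comb(n, h) for h in range(heads, n + 1)) / 2**n


def pvalue_until_tails(heads: int, tails: int) -> float:
    """P(at least `heads` heads before the `tails`-th tail), for a fair coin.

    Hypothetical plan: keep tossing until the `tails`-th tail appears.
    P(exactly h heads before it) = C(h + tails - 1, h) / 2**(h + tails).
    """
    below = sum(comb(h + tails - 1, h) / 2 ** (h + tails) for h in range(heads))
    return 1.0 - below


print(pvalue_fixed_n(7, 10))     # 0.171875
print(pvalue_until_tails(7, 3))  # 0.08984375
```

At the 10% level only the second plan rejects $\theta = 1/2$, even though the likelihood functions are proportional.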

The likelihood principle also yields some apparently paradoxical results. A commonly cited example is the optional stopping problem. Suppose I tell you that I tossed a coin 10 times and observed 7 heads. You might make some inference about the probability of heads. Suppose now I tell you that I tossed the coin until I observed 7 heads, and that this took 10 tosses. Will you now make a different inference?

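The likelihood function is the same in both cases: it is proportional to

$$\theta^{7}(1-\theta)^{3},$$

where $\theta$ is the probability of heads. (The first design gives the binomial likelihood $\binom{10}{7}\theta^{7}(1-\theta)^{3}$ and the second the negative binomial likelihood $\binom{9}{6}\theta^{7}(1-\theta)^{3}$, which differ only by a constant factor.)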
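The proportionality can be checked numerically. This sketch (the function names are hypothetical) evaluates the likelihood for the fixed-10-tosses design and for the toss-until-7-heads design at a few values of $\theta$; their ratio stays at the constant $\binom{10}{7}/\binom{9}{6} = 120/84 = 10/7$.

```python
from math import comb


def lik_fixed_n(theta: float, heads: int = 7, n: int = 10) -> float:
    """Binomial: exactly n tosses, `heads` of them heads."""
    return comb(n, heads) * theta**heads * (1 - theta) ** (n - heads)


def lik_until_heads(theta: float, heads: int = 7, n: int = 10) -> float:
    """Negative binomial: toss until the `heads`-th head, which arrives on toss n."""
    # The last toss must be a head; the other heads - 1 heads fall among the first n - 1 tosses.
    return comb(n - 1, heads - 1) * theta**heads * (1 - theta) ** (n - heads)


ratios = [lik_fixed_n(t) / lik_until_heads(t) for t in (0.1, 0.3, 0.5, 0.7, 0.9)]
print([round(r, 6) for r in ratios])  # the same constant, 10/7, at every theta
```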

According to the likelihood principle, the inference should be the same in either case. But there seems to be something fishy: because the stopping rule does not change the likelihood function, it would seem possible to argue to a foregone conclusion simply by tossing the coin until the data happen to look favorable. Apparently paradoxical results of this kind are regarded by critics as evidence against the likelihood principle.
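A small simulation can make the "foregone conclusion" worry concrete. It assumes a fair coin and a hypothetical experimenter who runs a nominal 5% two-sided z-test (normal approximation to the binomial) after every toss from the 10th onward and stops at the first rejection. The seed, the number of runs and the thresholds are arbitrary choices, and `first_rejection` is a name introduced here for illustration.

```python
import random
from math import sqrt


def first_rejection(max_tosses, rng, min_tosses=10, z_crit=1.96):
    """Toss a fair coin, testing theta = 1/2 after every toss.

    Returns the toss count at the first nominal 5% rejection, or None if the
    test never rejects within max_tosses tosses.
    """
    heads = 0
    for n in range(1, max_tosses + 1):
        if rng.random() < 0.5:
            heads += 1
        if n >= min_tosses:
            z = (heads - n / 2) / sqrt(n / 4)  # normal approximation to the binomial
            if abs(z) > z_crit:
                return n
    return None


rng = random.Random(0)
times = [first_rejection(1000, rng) for _ in range(1000)]


def rejected_by(n):
    """Fraction of simulated experimenters who have rejected by toss n."""
    return sum(t is not None and t <= n for t in times) / len(times)


for n in (10, 100, 1000):
    print(n, rejected_by(n))
```

Although every simulated coin is fair, the fraction of runs ending in "significance" grows with the number of tosses allowed and ends well above the nominal 5%. Frequentist sequential designs guard against this by adjusting the rejection threshold to the stopping rule, which is exactly the kind of dependence on the stopping rule that the likelihood principle rules out.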