Maximum likelihood
In statistics, the method of maximum likelihood, pioneered by geneticist and statistician Sir Ronald A. Fisher, is a method of point estimation that uses as an estimate of an unobservable population parameter the member of the parameter space that maximizes the likelihood function.
For the moment, let p denote the unobservable population parameter to be estimated, and let X denote the random variable observed (which in general will not be scalar-valued, but will often be a vector of probabilistically independent scalar-valued random variables). The probability of an observed outcome X = x (this is case-sensitive notation!), or the value at (lowercase) x of the probability density function of the random variable (capital) X, regarded as a function of p with x held fixed, is the likelihood function

L(p) = P(X = x ; p).
For example, in a large population of voters, the proportion p who will vote "yes" is unobservable, and is to be estimated based on a political opinion poll. A sample of n voters is chosen randomly, and it is observed that x of those n voters will vote "yes". Then the likelihood function is

L(p) = C(n, x) p^x (1 - p)^(n - x),

where C(n, x) is the binomial coefficient.
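As a minimal numerical illustration of this binomial likelihood, the sketch below maximizes L(p) over a grid of candidate values; the sample size n = 100 and "yes" count x = 62 are made-up example values, not from the text.

```python
from math import comb

n, x = 100, 62          # hypothetical poll: 62 of 100 sampled voters say "yes"

def likelihood(p):
    """Binomial likelihood L(p) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Grid search over candidate values of p in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=likelihood)
print(p_hat)            # the grid maximizer coincides with x/n = 0.62
```

The grid search is deliberately naive; its only purpose is to confirm that the maximizer agrees with the closed-form answer x/n derived next.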
The value of p that maximizes L(p) is the maximum-likelihood estimate of p. By finding the root of the first derivative, one obtains x/n as the maximum-likelihood estimate. In this case, as in many other cases, it is much easier to take the logarithm of the likelihood function before finding the root of the derivative:

log L(p) = log C(n, x) + x log p + (n - x) log(1 - p),

whose derivative

(d/dp) log L(p) = x/p - (n - x)/(1 - p)

vanishes at p = x/n.
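The root of that derivative can also be located numerically. The sketch below applies plain bisection to the score function x/p - (n - x)/(1 - p), reusing the same hypothetical values n = 100, x = 62.

```python
from math import log

n, x = 100, 62

def loglik(p):
    """Log-likelihood up to the constant term log C(n, x)."""
    return x * log(p) + (n - x) * log(1 - p)

def score(p):
    """Derivative of the log-likelihood: x/p - (n - x)/(1 - p)."""
    return x / p - (n - x) / (1 - p)

# The score is positive below x/n and negative above it, so bisect.
lo, hi = 1e-9, 1 - 1e-9
for _ in range(60):
    mid = (lo + hi) / 2
    if score(mid) > 0:
        lo = mid
    else:
        hi = mid
p_root = (lo + hi) / 2
print(p_root)           # agrees with x/n = 0.62 to floating-point precision
```

Working on the log scale avoids the numerical underflow that the raw likelihood's p**x factor would cause for large n.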
Taking the logarithm of the likelihood is so common that the term log-likelihood is commonplace among statisticians. The log-likelihood is closely related to information entropy.
If we replace the lowercase x with capital X, then we have not the observed value in a particular case, but rather a random variable, which, like all random variables, has a probability distribution. The value (lowercase) x/n observed in a particular case is an estimate; the random variable (capital) X/n is an estimator. The statistician may take the nature of the probability distribution of the estimator as an indication of how good the estimator is; in particular, it is desirable that the probability that the estimator is far from the parameter p be small. Maximum-likelihood estimators are sometimes better than unbiased estimators. They also have a property called "functional invariance" that unbiased estimators lack: for any function f, the maximum-likelihood estimator of f(p) is f(T), where T is the maximum-likelihood estimator of p.
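Functional invariance can be checked numerically for a monotone transformation. The sketch below uses the odds f(p) = p / (1 - p) as a hypothetical choice of f (it is not from the text): maximizing the reparameterized likelihood over q = f(p) lands on the same answer as transforming the maximizer of L(p).

```python
from math import comb

n, x = 100, 62          # same hypothetical poll as before

def likelihood(p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# MLE of p by grid search, then transform: f(p_hat).
grid_p = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid_p, key=likelihood)
odds_of_mle = p_hat / (1 - p_hat)

# MLE of the odds directly: rewrite the likelihood in terms of q,
# using the inverse map p = q / (1 + q), and maximize over q.
grid_q = [p / (1 - p) for p in grid_p]   # the same points on the odds scale
q_hat = max(grid_q, key=lambda q: likelihood(q / (1 + q)))

print(odds_of_mle, q_hat)   # both equal f(p_hat)
```

Because the two grids are the same points on different scales, the two maximizations pick out the same point, which is exactly what functional invariance asserts.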
However, the bias of maximum-likelihood estimators can be substantial. Consider a case where n tickets numbered from 1 through n are placed in a box and one is selected at random, giving a value X. If n is unknown, then the maximum-likelihood estimator of n is X, even though the expectation of X is only (n + 1)/2; we can only be certain that n is at least X, and is probably more.
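A short simulation makes the bias concrete; the true ticket count n = 100 and the number of replications are arbitrary choices for illustration.

```python
import random

random.seed(0)
n = 100                               # true number of tickets (unknown to the estimator)
trials = 100_000

# Each draw X is uniform on 1..n, and X itself is the maximum-likelihood
# estimate of n from a single draw.
draws = [random.randint(1, n) for _ in range(trials)]
mean_mle = sum(draws) / trials
print(mean_mle)                       # near (n + 1)/2 = 50.5, far below n = 100
```

On average the estimator falls short of n by nearly a factor of two, which is the substantial bias the paragraph describes.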