
Binomial distribution

The binomial distribution is a discrete probability distribution that describes the number of successes in a sequence of n independent experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment.

A typical example is the following: suppose 5% of a large population are HIV-positive and you pick 500 people at random. How likely is it that you get 30 or more HIV-positives? The number of HIV-positives you pick is a random variable X which follows a binomial distribution with n = 500 and p = .05. We are interested in the probability Pr[X ≥ 30].
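
The example can be explored by simulation before any formula is used. The following is a minimal sketch, not a definitive implementation: it repeats the 500 independent success/failure experiments many times and reports how often 30 or more successes occur. The function names, the number of repetitions and the fixed seed are illustrative assumptions.

```python
import random


def simulate_successes(n, p, rng):
    """Run n independent Bernoulli experiments and count the successes."""
    return sum(1 for _ in range(n) if rng.random() < p)


def estimate_tail(n, p, threshold, repetitions, seed=0):
    """Estimate Pr[X >= threshold] as the fraction of simulated samples reaching it."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    hits = sum(
        1
        for _ in range(repetitions)
        if simulate_successes(n, p, rng) >= threshold
    )
    return hits / repetitions


# n = 500 and p = .05 come from the example; 2000 repetitions is an assumption.
estimate = estimate_tail(n=500, p=0.05, threshold=30, repetitions=2000)
print(f"estimated Pr[X >= 30] = {estimate:.3f}")
```

The estimate fluctuates with the seed and becomes more stable as the number of repetitions grows.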

In general, if the random variable X follows the binomial distribution with parameters n and p, we write X ~ B(n, p). The probability of getting exactly k successes is given by

Pr[X = k] = C(n, k) p^k (1-p)^(n-k)     for k = 0, 1, 2, ..., n
Here, C(n, k) denotes the binomial coefficient of n and k, whence the name of the distribution. The formula can be understood as follows: we want k successes (p^k) and n-k failures ((1-p)^(n-k)). However, the k successes can occur anywhere among the n trials, and there are C(n, k) different ways of distributing k successes in a sequence of n trials.
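
The formula translates almost directly into code. The sketch below uses Python's math.comb for the binomial coefficient C(n, k); the names binomial_pmf and binomial_tail are illustrative, and the tail probability for the example with n = 500 and p = .05 is obtained by summing the probabilities of 30, 31, ..., 500 successes.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Pr[X = k] for X ~ B(n, p): C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_tail(k_min, n, p):
    """Pr[X >= k_min], obtained by summing the probabilities of k_min, ..., n."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))


# Small check by hand: 2 successes in 3 fair trials, C(3, 2) / 8 = 0.375.
print(binomial_pmf(2, 3, 0.5))

# The example with n = 500 and p = .05: at least 30 successes.
print(binomial_tail(30, 500, 0.05))
```

This direct evaluation is adequate for moderate n. For very large n the binomial coefficient no longer fits into a floating-point number, and working with logarithms would be the usual alternative.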

If X ~ B(n, p), then the expected value of X is

E[X] = np
and the variance is
Var(X) = np(1-p).
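
Both formulas can be checked numerically by computing the expected value and the variance directly from the probabilities Pr[X = k]. This is only a sanity check under illustrative values of n and p, not a proof; the function names are assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Pr[X = k] for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mean_and_variance(n, p):
    """Compute E[X] and Var(X) directly from the probabilities Pr[X = k]."""
    probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(probabilities))
    variance = sum((k - mean) ** 2 * q for k, q in enumerate(probabilities))
    return mean, variance


n, p = 20, 0.3  # illustrative values
mean, variance = mean_and_variance(n, p)
print(mean, n * p)                # both should be close to np
print(variance, n * p * (1 - p))  # both should be close to np(1-p)
```
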
For 0 < p < 1, the most likely value or mode of X is given by the largest integer less than or equal to (n+1)p; if m = (n+1)p is itself an integer, then m-1 and m are both modes.
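
The rule for the mode can be compared with a direct search for the most probable values. The sketch below assumes 0 < p < 1 and passes p as a fractions.Fraction so that the test "is (n+1)p an integer" and the comparison of probabilities are exact; the function names are illustrative.

```python
from fractions import Fraction
from math import comb, floor


def binomial_pmf(k, n, p):
    """Pr[X = k] for X ~ B(n, p); exact when p is a Fraction."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def modes_by_rule(n, p):
    """Modes according to the (n+1)p rule; assumes 0 < p < 1 and p a Fraction."""
    m = (n + 1) * p
    if m.denominator == 1:  # (n+1)p is an integer: two modes
        return [int(m) - 1, int(m)]
    return [floor(m)]


def modes_by_search(n, p):
    """All k whose probability Pr[X = k] is maximal, found by direct search."""
    probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]
    top = max(probabilities)
    return [k for k, q in enumerate(probabilities) if q == top]


# (n+1)p = 5 is an integer here, so two modes are expected.
print(modes_by_rule(9, Fraction(1, 2)), modes_by_search(9, Fraction(1, 2)))

# (n+1)p = 3.3 is not an integer here, so a single mode is expected.
print(modes_by_rule(10, Fraction(3, 10)), modes_by_search(10, Fraction(3, 10)))
```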

If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables, then X + Y is again a binomial variable; its distribution is B(n+m, p).
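
This closure property can be checked numerically: for independent X and Y, the probability that X + Y = k is the sum over i of Pr[X = i] Pr[Y = k - i], and the result should agree with the probabilities of B(n+m, p). The values of n, m and p below are illustrative, as are the function names.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Pr[X = k] for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def pmf_of_sum(k, n, m, p):
    """Pr[X + Y = k] for independent X ~ B(n, p) and Y ~ B(m, p), by convolution."""
    return sum(
        binomial_pmf(i, n, p) * binomial_pmf(k - i, m, p)
        for i in range(max(0, k - m), min(n, k) + 1)
    )


n, m, p = 4, 6, 0.3  # illustrative values
largest_gap = max(
    abs(pmf_of_sum(k, n, m, p) - binomial_pmf(k, n + m, p))
    for k in range(n + m + 1)
)
print(largest_gap)  # expected to be at the level of floating-point rounding
```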

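Two other important distributions arise as approximations of binomial distributions: the normal distribution with mean np and variance np(1-p), when n is large and p is not too close to 0 or 1, and the Poisson distribution with parameter np, when n is large and p is small.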
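
In place of pictures, a small numerical comparison gives an impression of how close such approximations are. The sketch below takes the normal distribution with mean np and variance np(1-p) and the Poisson distribution with parameter np as the approximating distributions. The normal estimate uses the probability of the interval from k - 1/2 to k + 1/2, a continuity correction chosen here as an assumption. All function names and the values of n, p and k are illustrative.

```python
from math import comb, erf, exp, factorial, sqrt


def binomial_pmf(k, n, p):
    """Pr[X = k] for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of k under a Poisson distribution with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)


def normal_cdf(x, mean, variance):
    """Cumulative distribution function of the normal distribution."""
    return 0.5 * (1 + erf((x - mean) / sqrt(2 * variance)))


def normal_estimate(k, n, p):
    """Approximate Pr[X = k] by the normal probability of [k - 1/2, k + 1/2]."""
    mean, variance = n * p, n * p * (1 - p)
    return normal_cdf(k + 0.5, mean, variance) - normal_cdf(k - 0.5, mean, variance)


n, p = 500, 0.05  # illustrative values
for k in (15, 20, 25, 30, 35):
    # Columns: k, binomial probability, normal estimate, Poisson estimate.
    print(k, binomial_pmf(k, n, p), normal_estimate(k, n, p), poisson_pmf(k, n * p))
```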

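The formula for Bézier curves is closely related to the binomial distribution: its Bernstein basis polynomials C(n, k) t^k (1-t)^(n-k) have the same form as the binomial probabilities, with the curve parameter t in place of p.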