The Zipf-Mandelbrot law
(also known as the Pareto
is a power-law distribution
on ranked data, named after the Harvard linguistics professor
George Kingsley Zipf (1902
who observed a regularity in word frequencies in texts, and the mathematician Benoit Mandelbrot
(born November 20
), who generalized it.
When the words of an arbitrary corpus of text are ranked by their
frequency, the frequencies generally follow a power-law distribution, known
as Zipf's law.
If one plots the frequency rank of words contained in a large
corpus of text data versus the number of occurrences or actual
frequencies, one obtains a power-law distribution,
with exponent close to one (but see Gelbukh and Sidorov 2001).
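
As a rough illustration of the rank-frequency relationship described above, the following Python sketch ranks word counts and estimates the power-law exponent as the negated slope of a least-squares line through log(count) against log(rank). The function names rank_frequency and estimate_exponent are hypothetical, and the corpus is synthetic (the word at rank k is given about 1000/k occurrences), so this is a sketch under those assumptions rather than a measurement on real text. A log-log regression is only one simple way to estimate the exponent.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def estimate_exponent(counts):
    """Estimate the power-law exponent as the negated least-squares slope
    of log(count) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic corpus (an assumption for illustration): the word at rank k
# occurs round(1000 / k) times, which mimics Zipf's law with exponent 1.
tokens = []
for k in range(1, 51):
    tokens.extend([f"w{k}"] * round(1000 / k))

counts = rank_frequency(tokens)
exponent = estimate_exponent(counts)
print(f"estimated exponent: {exponent:.3f}")
```

On real text, tokens would come from splitting the corpus into words, and the estimate may depend on the language and on how many ranks are included in the fit.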