Covariance matrix
In statistics, the covariance matrix generalizes the concept of variance from one to n dimensions, or in other words, from scalar-valued random variables to vector-valued random variables (tuples of scalar random variables). If X is a scalar-valued random variable with expected value μ, then its variance is

    var(X) = E[(X - μ)^{2}].

If X is an n-by-1 column vector-valued random variable whose expected value is an n-by-1 column vector μ, then its variance is the n-by-n nonnegative-definite matrix

    Σ = var(X) = E[(X - μ)(X - μ)^{T}].
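As a rough illustration of the definition Σ = E[(X - μ)(X - μ)^{T}], the following Python sketch computes the matrix for a random vector that takes each of a few listed values with equal probability. The data and the helper names (mean_vector, covariance_matrix) are hypothetical and chosen only for this example; the expectation is a plain average over the listed outcomes, not an unbiased sample estimate.

```python
def mean_vector(samples):
    """Componentwise mean of a list of equal-length vectors."""
    n = len(samples)
    dim = len(samples[0])
    return [sum(x[i] for x in samples) / n for i in range(dim)]


def covariance_matrix(samples):
    """Sigma[i][j] = E[(X_i - mu_i)(X_j - mu_j)], each outcome equally likely."""
    n = len(samples)
    mu = mean_vector(samples)
    dim = len(mu)
    return [
        [sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in samples) / n
         for j in range(dim)]
        for i in range(dim)
    ]


# Hypothetical data: four equally likely outcomes of a 2-dimensional X.
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 9.0]]
sigma = covariance_matrix(samples)
print(sigma)  # [[1.25, 2.875], [2.875, 6.6875]]
```

The diagonal entries 1.25 and 6.6875 are the variances of the two components, and the off-diagonal entry 2.875 is their covariance.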
The entries in this matrix are the covariances between the n different scalar components of X. Since the covariance between a scalar-valued random variable and itself is its variance, it follows that, in particular, the entries on the diagonal of this matrix are the variances of the scalar components of X. The fact that the diagonal entries are variances may appear to depend on which coordinate system is chosen for the space in which the random vector X resides. However, it is true generally that if u is any unit vector, then the variance of the projection of X on u is u^{T}Σu. (This point is expanded upon somewhat at [1]. It is a consequence of an identity that appears below.)
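The projection formula can also be checked directly from the definition. The short derivation below is a sketch that assumes Σ = E[(X - μ)(X - μ)^{T}] and takes the projection of X on u to be the scalar u^{T}X, whose expected value is u^{T}μ.

```latex
\begin{aligned}
\operatorname{var}(u^{T}X)
  &= \operatorname{E}\!\left[\left(u^{T}X - u^{T}\mu\right)^{2}\right] \\
  &= \operatorname{E}\!\left[u^{T}(X-\mu)(X-\mu)^{T}u\right] \\
  &= u^{T}\,\operatorname{E}\!\left[(X-\mu)(X-\mu)^{T}\right]u
   = u^{T}\Sigma\,u .
\end{aligned}
```

Taking u to be a coordinate unit vector recovers the corresponding diagonal entry of Σ, so the diagonal case is one instance of a statement that holds in every direction.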
Nomenclatures differ. Some statisticians, following the probabilist William Feller, call this the variance of the random vector X, because it is the natural generalization to higher dimensions of the one-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector X.
With scalar-valued random variables X, we have the identity

    var(aX) = a^{2} var(X)

if a is constant, i.e., not random. If X is an n-by-1 column vector-valued random variable and A is an m-by-n constant (i.e., nonrandom) matrix, then AX is an m-by-1 column vector-valued random variable, whose variance must therefore be an m-by-m matrix. It is

    var(AX) = A var(X) A^{T}.
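The identity var(AX) = A var(X) A^{T} can be checked numerically. The Python sketch below does so for a hypothetical 3-by-2 matrix A and a random vector X that takes four listed values with equal probability; all helper names are illustrative, and the expectation is again a plain average over the listed outcomes.

```python
def covariance_matrix(samples):
    """Sigma[i][j] = E[(X_i - mu_i)(X_j - mu_j)], each outcome equally likely."""
    n = len(samples)
    dim = len(samples[0])
    mu = [sum(x[i] for x in samples) / n for i in range(dim)]
    return [
        [sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in samples) / n
         for j in range(dim)]
        for i in range(dim)
    ]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def apply(a, x):
    """Matrix-vector product a x."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in a]


# Hypothetical data: X is 2-by-1, A is 3-by-2, so AX is 3-by-1.
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 9.0]]
A = [[1.0, 1.0], [1.0, -1.0], [2.0, 0.0]]

lhs = covariance_matrix([apply(A, x) for x in samples])            # var(AX)
rhs = matmul(matmul(A, covariance_matrix(samples)), transpose(A))  # A var(X) A^T

# The two 3-by-3 matrices agree up to floating-point rounding.
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
```

Choosing A to be a single row u^{T} gives the projection formula u^{T}Σu as a special case.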
Although simple, the covariance matrix is a useful tool in many different areas. From it a transformation matrix can be derived that completely decorrelates the data or, from a different point of view, gives an optimal basis for representing the data in a compact way. This is called principal components analysis (PCA) in statistics and the Karhunen-Loève transform (KL transform) in image processing.
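The decorrelating transformation can be sketched for the two-dimensional case. The Python example below is restricted to 2-by-2 covariance matrices and uses a closed-form rotation angle in place of a general eigenvalue solver, which is an assumption made to keep it short; the data and function names are hypothetical. The rows of the transformation matrix R are orthonormal eigenvectors of Σ, so the transformed data R(X - μ) has a diagonal covariance matrix.

```python
import math


def mean_vector(samples):
    """Componentwise mean of a list of equal-length vectors."""
    n = len(samples)
    return [sum(x[i] for x in samples) / n for i in range(len(samples[0]))]


def covariance_matrix(samples):
    """Sigma[i][j] = E[(X_i - mu_i)(X_j - mu_j)], each outcome equally likely."""
    n = len(samples)
    mu = mean_vector(samples)
    dim = len(mu)
    return [
        [sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in samples) / n
         for j in range(dim)]
        for i in range(dim)
    ]


def principal_axes_2d(sigma):
    """Orthonormal eigenvectors of a symmetric 2-by-2 matrix, as rows.

    The first row is the direction of largest variance.
    """
    a, b, c = sigma[0][0], sigma[0][1], sigma[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - c)  # angle that zeroes the off-diagonal
    return [[math.cos(theta), math.sin(theta)],
            [-math.sin(theta), math.cos(theta)]]


# Hypothetical data: four equally likely outcomes of a 2-dimensional X.
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 9.0]]
mu = mean_vector(samples)
sigma = covariance_matrix(samples)
R = principal_axes_2d(sigma)

# Transform each outcome: y = R (x - mu).
decorrelated = [
    [sum(R[i][k] * (x[k] - mu[k]) for k in range(2)) for i in range(2)]
    for x in samples
]

# Off-diagonal entries of new_sigma vanish up to rounding; the diagonal holds
# the variances along the two principal axes.
new_sigma = covariance_matrix(decorrelated)
```

Keeping only the first coordinate of each transformed point gives the compact representation mentioned above: it retains the direction along which the data varies most.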