The self-organising map (SOM) is a method for unsupervised learning based on a grid of artificial neurons whose weights are adapted to match input vectors in a training set. It was first described by Teuvo Kohonen, and so is sometimes referred to as a Kohonen map.
The SOM algorithm is fed with feature vectors, which can be of any dimension. In most applications, however, the number of dimensions will be high.
Output maps can likewise have any number of dimensions (1-dimensional, 2-dimensional, and so on), but 2D and 3D maps are the most popular, because SOMs are mainly used for dimensionality reduction rather than expansion.
The algorithm is most easily explained in terms of a set of artificial neurons, each with its own physical location on the output map, that take part in a winner-take-all process (a competitive network). The node whose weight vector is closest to the input vector is declared the winner, and its weights are adjusted to bring them closer to the input vector.
Each node has a set of neighbours. When a node wins a competition, its neighbours' weights are also changed, though by less: the further a neighbour is from the winner, the smaller its weight change.
This process is then repeated for each input vector, over and over, for a number (usually large) of cycles. Different inputs produce different winners.
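The competition and neighbourhood update described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function name `train_som`, the Gaussian neighbourhood function, the linearly decaying learning rate and radius, and all default parameter values are assumptions chosen for the example.

```python
import math
import random

def train_som(data, rows, cols, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a 2D map on `data`, a list of equal-length lists of floats."""
    rng = random.Random(seed)
    dim = len(data[0])
    if radius0 is None:
        radius0 = max(rows, cols) / 2.0
    # One weight vector per node, keyed by the node's (row, col) map position.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        # Assumed schedule: step size and neighbourhood radius shrink linearly.
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(radius0 * frac, 1e-3)
        # Present every input vector once per cycle, in random order.
        for x in rng.sample(data, len(data)):
            # Competition: the winner has the weight vector closest to x.
            winner = min(weights, key=lambda n: math.dist(weights[n], x))
            for node, w in weights.items():
                # Distance to the winner is measured on the map grid,
                # not in the input space.
                d2 = (node[0] - winner[0]) ** 2 + (node[1] - winner[1]) ** 2
                # Gaussian neighbourhood: 1 at the winner, smaller further away.
                h = math.exp(-d2 / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

# Hypothetical toy data: two loose groups of 2-dimensional vectors.
toy_data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
toy_map = train_som(toy_data, rows=3, cols=3, epochs=20)
```

Because the update factor `lr * h` is largest at the winner and falls off with grid distance, nearby nodes end up with similar weight vectors, which is what gives the trained map its topological ordering.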
The network winds up associating output nodes with groups or patterns in the input data set. If these patterns can be named, the names can be attached to the associated nodes in the trained net.
Like most artificial neural networks, the SOM has two modes of operation:
- During the training process a map is built: the neural network organises itself using a competitive process. The network must be given a large number of input vectors, representing as closely as possible the kind of vectors expected during the second phase (if any). If too few input vectors are available, all of them must be presented several times.
- During the mapping process a new input vector can quickly be given a location on the map; it is thereby automatically classified or categorised. There will be a single winning neuron: the one whose weight vector lies closest to the input vector. (This can be determined simply by calculating the Euclidean distance between the input vector and each weight vector.)
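The mapping step amounts to a nearest-neighbour search over the weight vectors. The following minimal sketch assumes the trained map is stored as a dictionary from grid position to weight vector; the function name `best_matching_unit`, the example weights and the node labels are all hypothetical.

```python
import math

def best_matching_unit(weights, x):
    """Return the map position of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))

# A hypothetical, already-trained 2x2 map over 2-dimensional inputs.
weights = {
    (0, 0): [0.1, 0.1],
    (0, 1): [0.1, 0.9],
    (1, 0): [0.9, 0.1],
    (1, 1): [0.9, 0.9],
}
# Names attached to some nodes after training (hypothetical labels).
labels = {(0, 0): "low/low", (1, 1): "high/high"}

winner = best_matching_unit(weights, [0.8, 0.95])
print(winner, labels.get(winner, "unlabelled"))  # prints: (1, 1) high/high
```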
A newer version of the self-organising map is called the generative topographic map (GTM). The GTM was first presented in 1996 in a paper by Bishop, Svensén, and Williams. The GTM is a probabilistic version of the SOM that is provably convergent and requires neither a shrinking neighbourhood nor a decreasing step size. The GTM is a generative model of data: the training data is assumed to arise by first probabilistically picking a point in a low-dimensional space, then mapping that point to the observed high-dimensional input space (via a smooth function), and finally adding noise in the high-dimensional input space. The parameters of the low-dimensional probability distribution, the smooth map, and the noise in the high-dimensional input space are all learned from the training set by the Expectation-Maximization (EM) algorithm.
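The three generative steps can be illustrated by sampling from such a model. The sketch below only draws data from fixed parameters and does not perform EM fitting. The function name `sample_gtm`, the uniform choice over a finite latent grid, the Gaussian basis functions used for the smooth map, and the isotropic Gaussian noise are assumptions made for the example.

```python
import math
import random

def sample_gtm(n, latent_grid, centres, W, width=1.0, noise_std=0.1, seed=0):
    """Draw n data-space points from a GTM-style generative model.

    latent_grid: list of latent points (tuples) to choose from
    centres:     list of basis-function centres in latent space
    W:           weight matrix as a list of rows, one row per data dimension
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # 1. Pick a point in the low-dimensional space (assumed uniform here).
        z = rng.choice(latent_grid)
        # 2. Smooth map: a linear combination of Gaussian basis functions.
        phi = [math.exp(-math.dist(z, c) ** 2 / (2.0 * width ** 2))
               for c in centres]
        y = [sum(w * p for w, p in zip(row, phi)) for row in W]
        # 3. Add isotropic Gaussian noise in the high-dimensional space.
        samples.append([v + rng.gauss(0.0, noise_std) for v in y])
    return samples

# Hypothetical 1D latent space mapped into 3D data space.
grid = [(0.0,), (0.5,), (1.0,)]
centres = [(0.0,), (1.0,)]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
points = sample_gtm(5, grid, centres, W)
```

In training, EM would adjust `W` and the noise level so that data sampled this way resembles the training set.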