Kohonen Overview

The basic units are neurons, and they are organized into two layers: the input layer and the output layer (also called the output map). All input neurons are connected to all output neurons, and these connections have strengths, or weights, associated with them. During training, each output unit competes with all the others to “win” each record.

Input data is presented to the input layer, and the values are propagated to the output layer. The output neuron with the strongest response is said to be the winner and is the answer for that input.
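Winner selection can be sketched in a few lines of Python. This is a minimal sketch, not the product's implementation. It assumes that the "strongest response" is measured as the smallest Euclidean distance between a record and a unit's weight vector, which is a common convention for self-organizing maps. The names `euclidean`, `find_winner`, and `grid`, and the weight values, are hypothetical.

```python
import math

def euclidean(a, b):
    """Distance between an input record and one unit's weight vector."""
    return math.sqrt(sum((x - w) ** 2 for x, w in zip(a, b)))

def find_winner(record, weights):
    """Return (row, col) of the output unit closest to record.

    weights is a 2-D grid (list of rows) of weight vectors, one weight per
    input neuron. Assumption of this sketch: the closest unit is treated as
    the one with the strongest response.
    """
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = euclidean(record, w)
            if d < best_dist:
                best, best_dist = (r, c), d
    return best

# A hypothetical 2x2 output map for records with k = 2 input fields.
grid = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
print(find_winner([0.9, 0.2], grid))  # (1, 0)
```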

Initially, all weights are random. When a unit wins a record, its weights (along with those of other nearby units, collectively referred to as a neighborhood) are adjusted to better match the pattern of predictor values for that record. All input records are shown, and weights are updated accordingly. This process is repeated many times until the changes become very small. As training proceeds, the weights on the grid units are adjusted so that they form a two-dimensional “map” of the clusters (hence the term self-organizing map).
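The training cycle described above (random starting weights, a winner for each record, neighborhood updates, repetition until the changes are small) can be sketched as follows. The function `train_som` and its parameters (`epochs`, `lr0`, `tol`, `seed`) are hypothetical. Euclidean matching, a Gaussian neighborhood over grid distance, and a learning rate and radius that shrink linearly are common textbook choices assumed for the sketch, not necessarily what any particular product uses.

```python
import math
import random

def train_som(records, rows, cols, epochs=50, lr0=0.5, tol=1e-4, seed=0):
    """Minimal Kohonen (self-organizing map) training loop.

    Sketch assumptions: winners are chosen by squared Euclidean distance,
    the neighborhood is a Gaussian over grid distance, and both the
    learning rate and the neighborhood radius shrink linearly per epoch.
    """
    rng = random.Random(seed)
    k = len(records[0])
    # Initially, all weights are random.
    weights = [[[rng.random() for _ in range(k)] for _ in range(cols)]
               for _ in range(rows)]
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # step size decays toward 0
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks
        order = list(records)
        rng.shuffle(order)                       # show every record once per pass
        max_change = 0.0
        for x in order:
            # The winner is the unit whose weights are closest to the record.
            br, bc = min(
                ((r, c) for r in range(rows) for c in range(cols)),
                key=lambda rc: sum((xi - wi) ** 2
                                   for xi, wi in zip(x, weights[rc[0]][rc[1]])),
            )
            # Move the winner and its grid neighbors toward the record;
            # units farther from the winner on the grid move less.
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2 * radius ** 2))
                    w = weights[r][c]
                    for i in range(k):
                        delta = lr * h * (x[i] - w[i])
                        w[i] += delta
                        max_change = max(max_change, abs(delta))
        if max_change < tol:  # stop once the updates become very small
            break
    return weights

# Usage: two well-separated groups of 2-D records on a 3x3 map.
data = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.85], [0.95, 0.9]]
som = train_som(data, rows=3, cols=3)
```

Each update moves a weight vector a fraction `lr * h` of the way toward the record, so the winner moves most and distant units barely move; shrinking the radius over time lets the map first organize globally and then fine-tune locally.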

The output map is a two-dimensional grid of neurons, with no connections between the units.

When the network is fully trained, records that are similar should be close together on the output map, whereas records that are vastly different will be far apart.

As noted previously, a Kohonen network is a form of unsupervised learning and thus does not use a target field. Instead of trying to predict an outcome, Kohonen nets try to uncover patterns in the set of input fields. Usually, a Kohonen net will end up with a few units that summarize many observations (strong units) and several units that don’t really correspond to any of the observations (weak units). The strong units (and sometimes other units adjacent to them in the grid) represent probable cluster centers.
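One way to see strong and weak units is to count how many records each unit wins. The sketch below assumes Euclidean matching and uses a hand-picked 2x2 weight grid in place of a trained network. The function `strong_and_weak_units` and its `min_count` cutoff are hypothetical, since the text does not say how many wins make a unit "strong".

```python
import math
from collections import Counter

def winner(record, weights):
    """(row, col) of the unit whose weight vector is closest to record."""
    cells = [(r, c) for r in range(len(weights)) for c in range(len(weights[0]))]
    return min(cells, key=lambda rc: math.dist(record, weights[rc[0]][rc[1]]))

def strong_and_weak_units(records, weights, min_count):
    """Split units by how many records they win.

    min_count is a hypothetical cutoff: units winning at least that many
    records are reported as strong (candidate cluster centers), and units
    winning no records are reported as weak.
    """
    wins = Counter(winner(x, weights) for x in records)
    cells = [(r, c) for r in range(len(weights)) for c in range(len(weights[0]))]
    strong = [rc for rc in cells if wins[rc] >= min_count]
    weak = [rc for rc in cells if wins[rc] == 0]
    return wins, strong, weak

# Hand-picked weights for a 2x2 map, standing in for a trained network.
grid = [
    [[0.1, 0.1], [0.5, 0.5]],
    [[0.5, 0.0], [0.9, 0.9]],
]
records = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.1],
           [0.9, 1.0], [0.8, 0.9], [0.55, 0.05]]
wins, strong, weak = strong_and_weak_units(records, grid, min_count=2)
print(strong)  # [(0, 0), (1, 1)]
print(weak)    # [(0, 1)]
```

Here unit (0, 0) wins three records and unit (1, 1) wins two, so both are candidate cluster centers, while unit (0, 1) wins nothing.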

Another use of Kohonen networks is dimension reduction. The spatial characteristic of the two-dimensional grid provides a mapping from the k original predictors to two derived features (the row and column coordinates of each record's winning unit) that approximately preserve the similarity relationships in the original predictors. In some cases, this can give you the same kind of benefit as factor analysis or PCA.
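The projection from k predictors to two derived features can be sketched as replacing each record with the grid position of its winning unit. This minimal sketch assumes Euclidean matching; the function `som_coordinates` and the hand-picked 2x3 weight grid over k = 3 predictors are hypothetical stand-ins for a trained network.

```python
import math

def som_coordinates(records, weights):
    """Map each k-dimensional record to two derived features: the
    (row, col) position of its winning unit on the output map."""
    cells = [(r, c) for r in range(len(weights)) for c in range(len(weights[0]))]
    return [min(cells, key=lambda rc: math.dist(x, weights[rc[0]][rc[1]]))
            for x in records]

# Hand-picked 2x3 map over k = 3 predictors, standing in for a trained network.
grid = [
    [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 1.0], [0.5, 1.0, 1.0], [1.0, 1.0, 1.0]],
]
records = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.1, 0.9, 1.0]]
print(som_coordinates(records, grid))  # [(0, 2), (0, 2), (1, 0)]
```

The two similar records land on the same cell, and the dissimilar one lands on the opposite side of the map. Unlike PCA scores, these derived features are discrete grid positions.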
