The Relationship Between Cluster Analysis, Latent Class Analysis and Self-Organizing Maps – The Data Story Guide

There are four main types of algorithms in use for cluster-based segmentation:

Hierarchical cluster analysis
K-Means Cluster Analysis
Latent class analysis
Self-organizing maps

Overall evaluation of the algorithms

Where the goal is to form segments, latent class analysis is almost always preferable to any of the other algorithms. Indeed, the other algorithms should generally be regarded as "plan B" algorithms, only used when latent class analysis cannot be used. This is because latent class analysis has important strengths relative to the other algorithms, whereas the other algorithms have no substantive advantages over latent class analysis. However, as ultimately segmentation is part art and part science, it is often the case that the other algorithms can lead to useful and even superior solutions to those obtained from latent class analysis, so the best approach is to use latent class analysis if in a rush but to consider multiple different segmentation where time permits.

The strengths of latent class analysis

It is able to accommodate lots of different types of data. For example, it can be used to create segments using combinations of categorical, numeric, and other more exotic types of data, whereas most programs developed for the other algorithms can only accommodate numeric variables.
It can deal with missing data in a sensible way, allocating people into segments based on their available data, whereas the standard implementations of the other algorithms only work with no missing data (technically, latent class analysis makes a missing at random assumption, whereas other methods make a less plausible assumption that any data is missing completely at random).
It can accommodate weights. Implementations of the other algorithms generally ignore weights (and, there are no standard weighted versions of any of these other algorithms).
It is a theoretically superior model. That is, latent class models are built upon many decades of statistical theory. By contrast, the other algorithms are all one-off algorithms which have no strong theoretical support. This is explained in a little more detail below.
Latent class algorithms can be modified to incorporate lots of varied phenomena (e.g., predictor variables, complex sampling, response biases), all of which are not readily addressed with the alternative algorithms.

The advantages of the other algorithms

As discussed below, k-means cluster analysis can be viewed as a variant of latent class analysis. Its only advantage over latent class analysis is that it is much faster to compute which means that with a huge database k-means can be preferable.

Hierarchical cluster analysis can produce a dendrogram (i.e., a graph which shows the order in which segments are grouped together).

Self-organizing maps create clusters that are ordered on a two-dimensional "map" and, where a large number of clusters are created, this can be beneficial from a presentation perspective.

While each of these advantages can be relevant in some circumstances they are, by and large, irrelevant in most segmentation studies which is why latent class is, in general, superior.

The relationship between the algorithms

Each of latent class analysis, k-means cluster analysis, and self-organizing map algorithms have an almost identical structure:

Step 1: Initialization. Observations are assigned to a pre-determined number of clusters. Most commonly this is done randomly (either by randomly assigning observations to clusters or by randomly generating parameters). However, it can involve assigning respondents to pre-existing groups (e.g., by age) or using another algorithm to form the initial clusters (e.g., hierarchical cluster analysis to form some initial clusters prior to running k-means cluster analysis).

In the case of self-organizing maps, each cluster is assigned a location on a grid (e.g., if there are 12 clusters, the clusters may be assigned positions in a grid with three rows and four columns).

Step 2: Initial cluster description. A statistical summary is prepared of each cluster. With k-means and self-organizing maps, this involves computing the mean value for each variable in each cluster. Latent class analysis also typically involves computation of the means, occasionally measures of variation (e.g., the standard deviation) as well as the sizes of the clusters.

Step 3: Computing the distance between each observation and each cluster. A measure of the distance between each observation and each cluster is computed. With latent class analysis, a probability of cluster membership is computed; this probability takes into account both the distance from each observation from each cluster and the size of the cluster.

Step 4: Revising the cluster descriptions. Using the result of Step 3 the cluster descriptions are updated. This occurs in slightly different ways for each of the algorithms:

Latent class analysis: A weighted analysis is undertaken for each cluster, computing the cluster description with the probability of cluster membership as the weight and computing the size of each cluster as the average of the probabilities.
Cluster analysis: The mean for each cluster on each variable is computed as the average values of the variables for the observations that are most similar to the cluster's current description. Note that this is basically the same process as with latent class analysis, except that the "weights" are all 1s and 0s (i.e., where a 1 indicates closest to a cluster).
Self-organizing maps: The mean for each cluster on each variable is computed as the average values of the variables for the observations that are most similar to the cluster's current description and observations in clusters that are located nearby on the grid.

Step 5: Iteration to convergence'. Steps 3 and 4 are referred to jointly as an iteration. They are repeated in a continuous cycle until the descriptions stabilize (which is referred to as convergence). In the case of cluster analysis, this may take only a few iterations. In the case of latent class analysis and self-organizing maps, it can take hundreds or thousands of observations.

The latent class model is, in general, best from a theoretical perspective as the ways that it differs from the other two algorithms are, from a theoretical perspective, desirable:

When allocating observations to clusters it is necessarily the case that there will often be some uncertainty (e.g., if an observation has equally close to two clusters). This is taken into account by only the latent class algorithm.
It can be proved using probability theory that if allocating respondents to clusters it is appropriate to take into account the size of the clusters. For example, if a respondent is equally similar to two clusters, but one cluster is ten times the size of the other cluster, the respondent should be more likely to be allocated to the larger cluster (due to Bayes' Theorem). Only the latent class approach complies with this principle.

Note that these two advantages relate to the earlier discussion of the theoretical advantages of latent class analysis. As discussed above, there are other more pragmatic benefits to using latent class analysis.

Comparing hierarchical cluster analysis with the other algorithms

Hierarchical cluster analysis forms clusters as follows:

Step 1: Each observation is assigned to a cluster (i.e., if there are 100 observations then there are 100 clusters).

Step 2: The distance is computed between all pairs of clusters.

Step 3: The two most similar clusters are merged.

Step 4: Repeat steps 2 and 3 until only a single cluster remains.

Numerous hierarchical cluster analysis algorithms have been developed and they differ in terms of how they conducted step 2 (e.g., do they compute the distance between the two most similar observations in a cluster, the two least similar, the average, etc.).

A strength of hierarchical cluster analysis is that it always ensures that the most similar observations are in the same clusters. However, this strength also causes its great weakness: often there are situations where a cluster should be split and reallocated to other clusters, but because no such step exists in hierarchical cluster analysis this never occurs. The net effect of this is that commonly clusters created by hierarchical cluster analysis are less homogeneous than clusters formed by the other algorithms.

The only situation where a hierarchical cluster analysis algorithm will better explain the variation in data is when there has been a problem with the other algorithms, such as with the random selection of initial clusters being poor. Historically this strength was a sufficient reason to consider using hierarchical cluster analysis, either as a standalone algorithm or as a stage before k-means cluster analysis. However, most k-means cluster analysis, latent class, and self-organizing map programs can now compute lots of different segmentations, each using different start-points, making hierarchical cluster analysis a generally inferior method, except where there is an interest in the dendrogram (which is a tree showing the history of the merging of the different clusters).