The Basic Mechanics of Principal Components Analysis – The Data Story Guide

This page explains how principal components analysis can be computed. The algorithm described below is not the one used in standard programs. It is presented here because the commonly used algorithms can only be explained using mathematical concepts from linear algebra.

Computing the first component

As discussed on the main Principal Components Analysis page, PCA analyzes a Correlation Matrix and infers components that are consistent with the observed correlations.

Each component is created as a weighted sum of the existing variables. PCA starts by trying to find the single component which best explains the observed correlations between the variables.

Consider the following three variables:

v1	v2	v3
1	1	1
2	3	5
3	2	2
4	5	3
5	4	4

The correlation matrix of the three variables is:

	v1	v2	v3
v1	1.0	.8	.4
v2	.8	1.0	.6
v3	.4	.6	1.0
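
The correlation matrix above can be reproduced from the raw data. The following is a minimal sketch, assuming NumPy is available; the variable names `data` and `corr` are chosen here for illustration.

```python
import numpy as np

# The five observations of v1, v2 and v3 from the table above (one row per case).
data = np.array([
    [1, 1, 1],
    [2, 3, 5],
    [3, 2, 2],
    [4, 5, 3],
    [5, 4, 4],
], dtype=float)

# rowvar=False tells NumPy that each column, not each row, is a variable.
corr = np.corrcoef(data, rowvar=False)
print(np.round(corr, 1))
```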

Note that there are moderate-to-strong correlations between all of the variables. Thus, any underlying component must be correlated with all the variables. A first guess then is that our new component could simply be the sum of each of the existing variables:

\(Component = 1.0 \times v1 + 1.0 \times v2 + 1.0 \times v3\)

The resulting component matrix, which shows the correlation between each of the variables and the computed component, is then:

	Component
v1	.856
v2	.934
v3	.778
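
As a check on the component matrix above, the sketch below builds the equally weighted component and correlates it with each variable. It assumes NumPy; the helper name `component_correlations` is hypothetical and introduced only for this illustration.

```python
import numpy as np

data = np.array([
    [1, 1, 1],
    [2, 3, 5],
    [3, 2, 2],
    [4, 5, 3],
    [5, 4, 4],
], dtype=float)

def component_correlations(data, weights):
    """Correlate each column of `data` with the weighted sum of its columns."""
    component = data @ np.asarray(weights, dtype=float)
    return np.array([
        np.corrcoef(data[:, j], component)[0, 1]
        for j in range(data.shape[1])
    ])

# First guess: every variable gets a weight of 1.0.
equal_weight_correlations = component_correlations(data, [1.0, 1.0, 1.0])
print(np.round(equal_weight_correlations, 3))
```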

These correlations are all very high, so our estimated component is a pretty good one. However, it can be improved. Looking again at the correlation matrix, reproduced below, we can see that our original guess of giving equal weights to the different variables was a touch naïve. Note that v2 has the highest average correlation with the other variables. Thus, if we instead give a higher weight to v2 when estimating our component, we will likely end up with a component that is marginally more strongly correlated with the variables overall. Similarly, v3 has the lowest average correlation, and so by the same argument it should be given a lower weight.

	v1	v2	v3
v1	1.0	.8	.4
v2	.8	1.0	.6
v3	.4	.6	1.0

Using trial and error, we can find that the optimal formula for computing the component is approximately:

\(Component = 1.0 \times v1 + 1.086 \times v2 + 0.866 \times v3\)
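
Trial and error needs a criterion for comparing candidate weights. The sketch below assumes the usual one, the sum of the squared correlations between the component and the variables, and replaces the search with the linear-algebra shortcut of taking the leading eigenvector of the correlation matrix. The helper name `sum_squared_loadings` is hypothetical. Because the three variables in this example have identical standard deviations, weights for the standardized variables and for the raw variables differ only by a constant factor.

```python
import numpy as np

corr = np.array([
    [1.0, 0.8, 0.4],
    [0.8, 1.0, 0.6],
    [0.4, 0.6, 1.0],
])

def sum_squared_loadings(weights):
    """Sum of squared correlations between the variables and their weighted sum.

    Assumes the weights are applied to standardized variables whose
    correlation matrix is `corr`.
    """
    w = np.asarray(weights, dtype=float)
    loadings = corr @ w / np.sqrt(w @ corr @ w)
    return float(loadings @ loadings)

# eigh returns eigenvalues in ascending order, so the last column of
# `eigenvectors` belongs to the largest eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
leading = eigenvectors[:, -1]

# Rescale so that v1 has a weight of 1, as in the formula above.
relative_weights = leading / leading[0]
print(np.round(relative_weights, 3))

# The criterion is higher for these weights than for equal weights.
print(sum_squared_loadings([1.0, 1.0, 1.0]), sum_squared_loadings(relative_weights))
```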

Note that we have not multiplied v1 by anything other than 1. This is because the weights on the other variables are expressed relative to v1 having a weight of 1. If we were to give v1 a weight other than 1, we would have to multiply each of the other weights by the same number. For example, the following weights are the ones generated by SPSS (and shown in the Component Score Coefficient Matrix), and you can see that their relativities are the same:

\(Component = 0.393 \times v1 + 0.426 \times v2 + 0.340 \times v3\)
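
The point that only the relative sizes of the weights matter can be checked numerically. In this sketch the scaling factor of 2.5 is arbitrary and the helper name `component_correlations` is hypothetical. Multiplying every weight by the same positive number leaves the correlations between the component and the variables unchanged.

```python
import numpy as np

data = np.array([
    [1, 1, 1],
    [2, 3, 5],
    [3, 2, 2],
    [4, 5, 3],
    [5, 4, 4],
], dtype=float)

def component_correlations(data, weights):
    """Correlate each column of `data` with the weighted sum of its columns."""
    component = data @ np.asarray(weights, dtype=float)
    return np.array([
        np.corrcoef(data[:, j], component)[0, 1]
        for j in range(data.shape[1])
    ])

base_weights = np.array([1.0, 1.086, 0.866])
scaled_weights = 2.5 * base_weights  # arbitrary positive rescaling

same = np.allclose(
    component_correlations(data, base_weights),
    component_correlations(data, scaled_weights),
)
print(same)  # True
```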

Computing the remaining components

The next component is computed as follows:

Regression is used to predict each variable from the component that has just been computed.
The residuals of the regression model are then computed.
The correlation matrix is computed using the residuals.
The same basic process as described above is performed to create a second component.
These steps are then repeated until the number of components is equal to the number of variables.
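
The steps above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure any particular program uses: the function name `components_by_deflation` is hypothetical, the trial-and-error search is replaced by taking the leading eigenvector, and the residuals are summarized by their covariance matrix without rescaling them back to unit variance (for the first pass, on standardized variables, this is the correlation matrix).

```python
import numpy as np

data = np.array([
    [1, 1, 1],
    [2, 3, 5],
    [3, 2, 2],
    [4, 5, 3],
    [5, 4, 4],
], dtype=float)

def components_by_deflation(data):
    """Extract components one at a time, regressing each one out in turn."""
    # Standardize so that the first covariance matrix is the correlation matrix.
    residuals = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    weights, variances = [], []
    for _ in range(data.shape[1]):
        cov = np.cov(residuals, rowvar=False)
        # Assumption: use the leading eigenvector instead of trial and error.
        eigenvalues, eigenvectors = np.linalg.eigh(cov)
        w = eigenvectors[:, -1]
        scores = residuals @ w
        # Least-squares slope of each variable on the component scores.
        # No intercept is needed because everything is already centered.
        slopes = residuals.T @ scores / (scores @ scores)
        # Keep only the part of each variable the component does not predict.
        residuals = residuals - np.outer(scores, slopes)
        weights.append(w)
        variances.append(eigenvalues[-1])
    return np.array(weights), np.array(variances)

weights, variances = components_by_deflation(data)
print(np.round(variances, 3))  # variance explained by each component
```

In this sketch the variances explained by the successive components match the eigenvalues of the original correlation matrix, which is what the linear-algebra algorithms compute directly.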

Rotation

Typically, Varimax Rotation is performed to aid interpretation.