Diagonalization involves reordering the rows and columns of tables and visualizations so that a diagonal pattern appears in the data, which facilitates interpretation. There are two main types of diagonal patterns: segmentation and hierarchy. Diagonalization can be performed manually by rearranging rows and columns so that all the small numbers appear in a corner of a table. It can also be performed automatically using a number of algorithms.
Diagonalization makes patterns easier to see
Diagonalization is most easily understood through a worked example. First, look at the heatmap of correlations shown below. Each number is the correlation between viewing of two programs. For example, viewing of each program is perfectly correlated with itself (a correlation of 1), viewing of Star Trek Discovery is highly correlated with viewing of Picard, and there is a weak negative correlation between viewing of The Mandalorian and Little Fires Everywhere.
The same data is shown again below, but with the rows and columns rearranged so that the patterns are clearer. The shows now in the first five rows are relatively highly correlated with each other. If you are familiar with these shows, you will recognize that they are all science fiction. There is also a second segment of shows, in the last five rows and columns. Viewing of these programs is more highly correlated within the segment than with viewing of the science fiction shows. The shows in this second segment are also similar to each other in content, all being adult-oriented dramas.
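The rearrangement in this example applies one permutation to both the rows and the columns of the correlation matrix, so every show stays paired with itself on the diagonal. A minimal Python sketch with NumPy, assuming a small invented correlation matrix (the labels "A" to "F" and all values are hypothetical, not the data behind the heatmaps) and a hand-chosen order:

```python
import numpy as np

# Hypothetical, invented correlations for six shows (NOT the heatmap data).
labels = ["A", "B", "C", "D", "E", "F"]
corr = np.array([
    [1.0, 0.1, 0.7, 0.0, 0.6, 0.1],
    [0.1, 1.0, 0.0, 0.8, 0.1, 0.7],
    [0.7, 0.0, 1.0, 0.1, 0.8, 0.0],
    [0.0, 0.8, 0.1, 1.0, 0.0, 0.6],
    [0.6, 0.1, 0.8, 0.0, 1.0, 0.1],
    [0.1, 0.7, 0.0, 0.6, 0.1, 1.0],
])

def reorder(matrix, names, order):
    """Apply the same permutation to the rows and columns of a square table."""
    idx = np.asarray(order)
    return matrix[np.ix_(idx, idx)], [names[i] for i in idx]

# Hand-chosen order: A, C, E first, then B, D, F.
new_corr, new_labels = reorder(corr, labels, [0, 2, 4, 1, 3, 5])
print(new_labels)  # ['A', 'C', 'E', 'B', 'D', 'F']
print(new_corr)    # high values now sit in two blocks along the diagonal
```

Only the order of rows and columns changes; the values themselves are untouched, which is why the reordered table is still a faithful view of the same data.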
Hierarchy and segmentation
A diagonalized table will typically show one of two patterns: segmentation or hierarchy.
Figures adapted from Marbeau, Y. (1998). Communication of research results. The ESOMAR Handbook of market research and opinion research. C. McDonald and P. Vangelder. Amsterdam, ESOMAR. 29: 519-552.
Diagonalize by rearranging rows and columns so that all the small values appear in one or two corners
The basic way to diagonalize a table is to keep rearranging rows and columns until all the small values appear in one or two corners of the table. A few comments on this:
- This is much easier to do with software that lets you drag and drop rows and columns to rearrange them while viewing the table (e.g., Q or Displayr).
- It does not matter which corner you have the small values in. Any corner is fine.
- Whether the small numbers end up in one corner or two is determined by the pattern in the data; it is not a decision made when performing diagonalization. If all the small numbers can be placed in one corner, the pattern is one of hierarchy. If the small numbers appear in two corners, diagonally opposite each other, the pattern is one of segmentation.
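The one-corner versus two-corner rule can be sketched as a rough check on a table that has already been diagonalized. This Python sketch is a hypothetical heuristic, not a standard method: it splits the table into four quadrants and treats a quadrant as a "small" corner when its mean is below half the overall mean (the `ratio=0.5` threshold and the function names are assumptions chosen for illustration).

```python
import numpy as np

def corner_means(table):
    """Mean of each quadrant of a table (rows and columns split in half)."""
    t = np.asarray(table, dtype=float)
    if t.shape[0] < 2 or t.shape[1] < 2:
        raise ValueError("need at least a 2 x 2 table")
    r, c = t.shape[0] // 2, t.shape[1] // 2
    return {
        "top-left": t[:r, :c].mean(),
        "top-right": t[:r, c:].mean(),
        "bottom-left": t[r:, :c].mean(),
        "bottom-right": t[r:, c:].mean(),
    }

# Pairs of diagonally opposite corners, with names in sorted order.
OPPOSITE = {("bottom-left", "top-right"), ("bottom-right", "top-left")}

def describe_pattern(table, ratio=0.5):
    """Hypothetical heuristic: a corner is 'small' if its mean is below
    ratio * overall mean; then apply the one-corner / two-corner rule."""
    t = np.asarray(table, dtype=float)
    small = sorted(k for k, m in corner_means(t).items() if m < ratio * t.mean())
    if len(small) == 1:
        return "hierarchy"
    if len(small) == 2 and tuple(small) in OPPOSITE:
        return "segmentation"
    return "unclear"

# Invented example tables, already diagonalized.
hierarchy_like = [[9, 8, 7, 6], [8, 7, 6, 2], [7, 6, 2, 1], [6, 2, 1, 0]]
segment_like = [[9, 8, 1, 0], [8, 9, 0, 1], [1, 0, 9, 8], [0, 1, 8, 9]]
print(describe_pattern(hierarchy_like))  # hierarchy
print(describe_pattern(segment_like))    # segmentation
```

Splitting at the halfway point is crude: real segments are rarely equal in size, so in practice the eye (or a threshold tuned to the data) remains the judge.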
Using algorithms to automate diagonalization
Two ways of automatically diagonalizing are to:
- Sort the rows and columns by their coordinates on the horizontal (first) dimension of a correspondence analysis of the table.
- Sort using z-statistics, as follows:
  - Calculate the z-statistic for each cell of the table based on standard statistical tests <tk>.
  - Move the cell with the highest z-statistic to the top-left corner of the table.
  - Sort the rows by their z-statistics in the first column, and the columns by their z-statistics in the first row, from highest to lowest.
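The correspondence analysis approach can be sketched in Python with NumPy. This sketch assumes the table holds non-negative values such as counts, computes the first dimension from a singular value decomposition of the standardized residuals, and sorts rows and columns by their standard coordinates on that dimension. The function names are illustrative, and a dedicated correspondence analysis package would offer more (further dimensions, supplementary points, plots).

```python
import numpy as np

def ca_first_dimension(table):
    """Row and column standard coordinates on the first CA dimension."""
    n = np.asarray(table, dtype=float)
    p = n / n.sum()                      # correspondence matrix
    r = p.sum(axis=1)                    # row masses
    c = p.sum(axis=0)                    # column masses
    expected = np.outer(r, c)
    s = (p - expected) / np.sqrt(expected)   # standardized residuals
    u, _, vt = np.linalg.svd(s, full_matrices=False)
    row_coords = u[:, 0] / np.sqrt(r)
    col_coords = vt[0, :] / np.sqrt(c)
    return row_coords, col_coords

def diagonalize_by_ca(table):
    """Sort rows and columns by their first-dimension coordinates."""
    row_coords, col_coords = ca_first_dimension(table)
    row_order = np.argsort(row_coords)
    col_order = np.argsort(col_coords)
    t = np.asarray(table)
    return t[np.ix_(row_order, col_order)], row_order, col_order

# Invented, scrambled table: rows 0 and 2 go with columns 0 and 2.
table = [[10, 1, 8, 1], [1, 9, 1, 10], [9, 2, 10, 1], [1, 10, 2, 8]]
out, rows, cols = diagonalize_by_ca(table)
print(out)
```

The sign of a singular vector is arbitrary, so the result may come out mirrored; that is harmless, since any corner is fine. Rows and columns share the same sign convention because they come from the same singular triplet, so associated rows and columns still line up.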
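The z-statistic approach can be sketched in the same way. The text does not say which test to use, so this sketch assumes adjusted standardized residuals from a chi-square test of independence; that choice, and the function names, are assumptions for illustration. Sorting the rows by the z-statistics in the column of the highest cell, and the columns by those in its row, moves that cell to the top-left corner (ignoring ties) and sorts the first column and first row in one step.

```python
import numpy as np

def adjusted_residuals(table):
    """Assumed test: adjusted standardized residuals (chi-square test)."""
    o = np.asarray(table, dtype=float)
    n = o.sum()
    row = o.sum(axis=1, keepdims=True)   # shape (rows, 1)
    col = o.sum(axis=0, keepdims=True)   # shape (1, cols)
    e = row @ col / n                    # expected counts under independence
    return (o - e) / np.sqrt(e * (1 - row / n) * (1 - col / n))

def diagonalize_by_z(table):
    z = adjusted_residuals(table)
    # Cell with the highest z-statistic.
    i, j = np.unravel_index(np.argmax(z), z.shape)
    # Rows by z in column j, columns by z in row i, highest first.
    row_order = np.argsort(-z[:, j], kind="stable")
    col_order = np.argsort(-z[i, :], kind="stable")
    t = np.asarray(table)
    return t[np.ix_(row_order, col_order)], row_order, col_order

# Invented, scrambled table: rows 0 and 2 go with columns 0 and 2.
table = [[10, 1, 8, 1], [1, 9, 1, 10], [9, 2, 10, 1], [1, 10, 2, 8]]
out, rows, cols = diagonalize_by_z(table)
print(out)
```

Because only one row and one column drive the sort, this works best when the strongest cell belongs to a clear segment; rows unrelated to that cell are ordered only by how strongly they avoid it.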