Interpreting Grids of Binary Variables – The Data Story Guide

This article explains how to interpret summary grids of binary variables. It covers how to read the summary table, the underlying data, nets, and the interpretation of statistical tests.

Reading grids of binary variables

The table below shows brand imagery for different cola brands. For example, we can see that 6% of people regard Coke as Feminine, 2% regard Coke as Health-conscious, etc.

The underlying data

The table above is a summary of 63 binary variables, which are shown in the table below. The first variable, Feminine, Coke records whether people thought Coke was a feminine brand (1) or not (0). Six percent (6%) of the data in the Feminine, Coke column of the table below are 1s, and consequently, 6% is shown for Coke and Feminine in the table above.
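As a minimal sketch of how a single cell in the summary table is derived, the percentage is just the share of 1s in the corresponding binary variable. The responses and the function name `percent_selected` below are made up for illustration and are not the data behind the table.

```python
# Hypothetical responses for one binary variable ("Feminine, Coke"):
# 1 = the respondent associated the attribute with the brand, 0 = did not.
feminine_coke = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]


def percent_selected(values):
    """Return the percentage of 1s in a list of 0/1 responses."""
    return 100 * sum(values) / len(values)


pct = percent_selected(feminine_coke)
print(f"{pct:.0f}% of respondents selected this cell")
```

Repeating this calculation for each of the binary variables gives one percentage per cell of the grid.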

If you scroll to the right in the table below, you will see that it has a looped pattern, with the six brands and None of these appearing for each of the 9 brand personality attributes (Feminine, Health-conscious, Innocent, etc.). It is this consistent looped structure that allows the table to be efficiently displayed as a grid. Alternatively, the data could be represented as multiple response data, with a single column summarizing all 63 of the variables.
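The looped structure can be sketched as a simple index mapping from a flat list of variables to a grid. The example below uses only a subset of the brands and attributes named in this article, and the helper names `grid_position` and `to_grid` are hypothetical.

```python
# A subset of the labels, for illustration only.
attributes = ["Feminine", "Health-conscious", "Innocent"]
brands = ["Coke", "Diet Coke", "Coke Zero", "None of these"]

# Flat variable names in looped order: every brand for the first attribute,
# then every brand for the second attribute, and so on.
flat_names = [f"{a}, {b}" for a in attributes for b in brands]


def grid_position(k, n_brands):
    """Map a flat column index to (attribute index, brand index)."""
    return divmod(k, n_brands)


def to_grid(flat_values, n_attributes, n_brands):
    """Arrange flat values as a grid with rows = brands, columns = attributes."""
    return [
        [flat_values[a * n_brands + b] for a in range(n_attributes)]
        for b in range(n_brands)
    ]


grid = to_grid(flat_names, len(attributes), len(brands))
for row in grid:
    print(row)
```

Because every attribute repeats the same brands in the same order, the position of a variable in the flat list is enough to locate its cell in the grid.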

Nets on grids

The summary table is reproduced below. While it is a summary of 63 variables, this table shows 80 cells. The summary of the 63 variables is shown in the sub-table that excludes the NET row and column. The NET row and column are derived from the 63 variables.

The bottom row of the table shows the net for each column. This is not the total of the numbers above it. Rather, it is the proportion of people who selected at least one of the row categories for that column. In the Feminine column, for example, it is the proportion of people who indicated that at least one of the row categories is Feminine. The net for each column is 100% because everybody has selected either a brand or the None of these option.
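The difference between a net and a total can be seen in a small sketch. The respondent-level data below are hypothetical, as are the function names; the point is only that a net counts each person once, however many categories they selected.

```python
# Hypothetical respondent-level data for one attribute (Feminine).
# Each inner list is one respondent: [Coke, Diet Coke, None of these].
feminine = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]


def column_percents(rows):
    """Percentage of 1s in each category."""
    n = len(rows)
    return [100 * sum(col) / n for col in zip(*rows)]


def net_percent(rows):
    """Percentage of respondents with a 1 in at least one category."""
    return 100 * sum(1 for r in rows if any(r)) / len(rows)


percents = column_percents(feminine)
print("Category percentages:", percents)
print("Sum of percentages:", sum(percents))
print("NET:", net_percent(feminine))
```

In this made-up data the category percentages sum to more than 100% because one respondent selected two brands, while the net is exactly 100% because every respondent selected something.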

The final column also shows a net, but in this case, only the final value is 100%. Looking at the Coke row, the interpretation of the NET value is that 98% of people chose Coke for at least one of the brand personality attributes shown in the table.

The table below adds NET COKE and NET PEPSI as rows. These are not totals of the numbers above. Rather, they show the percentage of people who selected any of the corresponding rows. For example, 72% for Feminine and NET COKE means that 72% of people selected one or more of Coke, Diet Coke, and Coke Zero as being Feminine.
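A net over a subset of rows, such as NET COKE, works the same way but only looks at the chosen categories. The sketch below uses hypothetical respondent data, so it will not reproduce the 72% in the table; `sub_net_percent` is an illustrative name.

```python
# Hypothetical respondent-level data for the Feminine attribute.
respondents = [
    {"Coke": 0, "Diet Coke": 1, "Coke Zero": 1, "None of these": 0},
    {"Coke": 0, "Diet Coke": 0, "Coke Zero": 0, "None of these": 1},
    {"Coke": 1, "Diet Coke": 1, "Coke Zero": 0, "None of these": 0},
    {"Coke": 0, "Diet Coke": 1, "Coke Zero": 0, "None of these": 0},
]

# The categories that make up the net.
NET_COKE = ["Coke", "Diet Coke", "Coke Zero"]


def sub_net_percent(data, categories):
    """Percentage of respondents who selected one or more of the categories."""
    selected = sum(1 for r in data if any(r[c] for c in categories))
    return 100 * selected / len(data)


print("NET COKE:", sub_net_percent(respondents, NET_COKE))
```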

Statistical tests

Consider the table below, which shows attributes associated with different tech brands. The first column, Easy to use, reveals that Google has a marginally worse score (58%) than Apple (59%). Yet, the significance test shows the score for Google is significantly high and the score for Apple is significantly low. This looks like a mistake, but it is not.

With simpler summary tables, statistical tests compare whether a number is higher or lower than the other numbers in the table (see Summary Tables in Survey Analysis). However, such tests are often not useful with grids, as our interest is typically in understanding whether there is a relationship between the row and column categories, rather than whether a specific value is above average.

The chart below plots the first two rows of the table above. What jumps out from this chart is that on all but one attribute, Apple has higher scores than Google. Apple's average score is 57% whereas Google's is 39%. Viewed in this context, it becomes clear that the 58% Easy to use score is actually a good score for Google. When we look at the chart, we see that it is the second-highest score for Google, and is only marginally behind its best score of 59% for Innovative. Similarly, we can see that the 59% score for Apple is its third-lowest score.

The example of Google versus Apple emphasizes that we need to take row effects into account when interpreting a binary grid. (These are also known as brand effects in brand association tables. A brand association table is a table comparing brands by attributes, like all the tables in this article.) The table is reproduced again below. Note the Good customer service column. While the numbers in this column vary from 4% to 51%, none are marked as significant. This is because all the variation in these numbers is explainable by the row effects. That is, once the differences between the rows (brands) are factored in, there is no difference between the brands in terms of customer service.

Just as row effects need to be factored into analyses, so do column effects. Some columns have generally lower, or higher, scores, and this needs to be taken into account when interpreting the results. We can see this by comparing the High quality and Low prices columns. A score of 40% for Google on High quality is significantly low, whereas 17% for Low prices is not significantly low. This is because the average High quality score is higher than the average Low prices score. Once this is factored in, Google's score on High quality is disappointing, but its score on Low prices is on par with the average.
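A rough way to see row and column effects is to compare each cell with the average of its row and the average of its column. The sketch below uses hypothetical brands and percentages, not the figures from the table, and simple averages rather than the model used by the significance tests.

```python
# Hypothetical percentages; rows are brands, columns are attributes.
attributes = ["Easy to use", "High quality", "Low prices"]
table = {
    "Brand A": [60, 70, 20],
    "Brand B": [55, 40, 15],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Row effect: how high a brand scores in general.
row_means = {brand: mean(scores) for brand, scores in table.items()}

# Column effect: how high an attribute scores in general.
col_means = {
    attr: mean([scores[j] for scores in table.values()])
    for j, attr in enumerate(attributes)
}

for brand, scores in table.items():
    for attr, score in zip(attributes, scores):
        print(
            f"{brand}, {attr}: {score}% "
            f"(vs row average {score - row_means[brand]:+.1f}, "
            f"vs column average {score - col_means[attr]:+.1f})"
        )
```

A cell can be above its row average and below its column average at the same time, which is why both effects need to be considered together.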

The significance tests are computed as follows:

A log-linear model is used to estimate the row and column effects.
The model is used to predict the expected score for each cell in the table under the assumption that the score in the cell of the table is entirely explained by the row and column effects. The Expected % scores are shown in the table below. We can see, for example, that Google's expected score for Easy to use is 46%.
A statistical test (a score test) then compares the observed result (58%) with the expected result (46%).
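The steps above can be sketched in a few lines, under some loudly stated assumptions. The sketch assumes the simplest log-linear model, with row and column main effects only, whose fitted values have the closed form row total × column total ÷ grand total. It then uses a one-sample z statistic with the variance implied by the expected proportion, which is the score test for a single binomial proportion and treats the expected value as fixed. The model and test actually used to produce the tables may differ in detail, and the sample size of 300 is hypothetical.

```python
import math


def expected_table(observed):
    """Fitted values of a main-effects-only log-linear model.

    For this model the fitted value of each cell is
    row total * column total / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def score_z(observed_p, expected_p, n):
    """z statistic comparing an observed proportion with an expected one.

    The standard error uses the expected proportion (a simplification that
    ignores the uncertainty in the expected value itself).
    """
    se = math.sqrt(expected_p * (1 - expected_p) / n)
    return (observed_p - expected_p) / se


# Hypothetical grid of proportions (rows = brands, columns = attributes).
# It is exactly multiplicative, so row and column effects explain every cell.
observed = [
    [0.60, 0.30],
    [0.40, 0.20],
]
expected = expected_table(observed)
print("Expected:", expected)

# Observed and expected figures quoted in the text; n = 300 is an assumption.
z = score_z(0.58, 0.46, n=300)
print(f"z = {z:.2f}")
```

In the hypothetical grid the expected values match the observed values, so no cell would be flagged even though the numbers vary a lot, which mirrors the Good customer service column discussed earlier. For the quoted 58% versus 46%, the z statistic under the assumed sample size is well above 1.96, consistent with the score being marked as significantly high.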