This article describes how to interpret summary grids of binary variables. It describes how to read the summary table, the underlying data, nets, and interpretation of statistical tests.

## Reading grids of binary variables

The table below shows brand imagery for different cola brands. For example, we can see that 6% of people regard *Coke* as *Feminine*, 2% regard *Coke* as *Health-conscious*, etc.

## The underlying data

The table above is a summary of 63 binary variables, which are shown in the table below. The first variable, *Feminine, Coke*, records whether people thought Coke was a feminine brand (1) or not (0). Six percent (6%) of the values in the *Feminine, Coke* column of the table below are 1s, and consequently 6% is shown for *Coke* and *Feminine* in the table above.

If you scroll to the right in the table below, you will see that it has a *looped* pattern, with the six brands and *None of these* appearing for each of the 9 brand personality attributes (*Feminine*, *Health-conscious*, *Innocent*, etc.). It is this consistent looped structure that allows the table to be efficiently displayed as a grid. Alternatively, the data could be represented as multiple response data, with a single column summarizing all 63 of the variables.
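As a minimal sketch of how a grid summarizes looped binary variables, the following Python uses a small, entirely hypothetical data set (two attributes, three response options, four respondents; none of these numbers come from the tables in this article). Each cell of the grid is simply the percentage of 1s in the corresponding variable.

```python
# Hypothetical respondent-level data: one row per respondent, one 0/1 value
# per (attribute, brand) variable. The names and values are invented for
# illustration only.
attributes = ["Feminine", "Health-conscious"]
brands = ["Coke", "Pepsi", "None of these"]

# Looped column order: every brand for the first attribute, then every
# brand for the next attribute, and so on.
variables = [(attribute, brand) for attribute in attributes for brand in brands]

responses = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0],
]


def percent_selected(rows, col):
    """Percentage of respondents with a 1 in the given column."""
    return 100.0 * sum(row[col] for row in rows) / len(rows)


# Fold the flat list of variables into a grid: brands as rows,
# attributes as columns.
grid = {brand: {} for brand in brands}
for col, (attribute, brand) in enumerate(variables):
    grid[brand][attribute] = percent_selected(responses, col)

for brand in brands:
    print(brand, grid[brand])
```

Because the variables follow a regular loop, the position of each variable is enough to determine which row and column of the grid it belongs to.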

## Nets on grids

The summary table is reproduced below. While it is a summary of 63 variables, this table shows 80 cells. The summary of the 63 variables is shown in the sub-table that excludes the *NET* row and column. The *NET* row and column are derived from the 63 variables.

The bottom row of the table shows the *net*. This is not the total of the numbers above it. Rather, in the *Feminine* column, for example, it is the proportion of people who indicated that at least one of the row categories is *Feminine*. The net for each column is 100% because everybody has selected either a brand or the *None of these* option.

The final column also shows a net, but in this case only the final one is 100%. Looking at the *Coke* row, the interpretation of the *NET* value is that 98% of people chose *Coke* for at least one of the brand personality attributes shown in the table.

The table below adds *NET COKE* and *NET PEPSI* as rows. These are not totals of the numbers above. Rather, they show the percentage of people who selected at least one of the corresponding rows. For example, 72% for *Feminine* and *NET COKE* means that 72% of people selected one or more of *Coke*, *Diet Coke*, and *Coke Zero* as being *Feminine*.
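The difference between a net and a total can be illustrated with a short Python sketch. The respondent data here are hypothetical and deliberately tiny; the helper names `percent` and `net` are made up for this example.

```python
# Hypothetical 0/1 answers for a single attribute (Feminine):
# one row per respondent, one column per brand. Values are invented.
brands = ["Coke", "Diet Coke", "Coke Zero", "Pepsi", "None of these"]
feminine = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0],
]


def percent(flags):
    """Percentage of true values in a list of 0/1 or boolean flags."""
    return 100.0 * sum(flags) / len(flags)


def net(rows, columns):
    """Percentage of respondents with a 1 in at least one of the columns."""
    return percent([any(row[c] for c in columns) for row in rows])


coke_family = [brands.index(b) for b in ("Coke", "Diet Coke", "Coke Zero")]

individual = [percent([row[c] for row in feminine]) for c in coke_family]
net_coke = net(feminine, coke_family)
net_all = net(feminine, range(len(brands)))

print(individual)       # [40.0, 60.0, 40.0]
print(sum(individual))  # 140.0: a total double-counts people who chose several brands
print(net_coke)         # 60.0: the net counts each person at most once
print(net_all)          # 100.0: everybody chose a brand or "None of these"
```

A net can never exceed 100%, and it is never smaller than the largest of the percentages it combines.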

## Statistical tests

Consider the table below, which shows attributes associated with different tech brands. The first column, *Easy to use*, reveals that *Google* has a marginally worse score (58%) than *Apple* (59%). Yet the significance test shows that the score for *Google* is significantly high and the score for *Apple* is significantly low. This looks like a mistake, but it is not.

With simpler summary tables, statistical tests compare whether a number is higher or lower than the other numbers in the table (see Summary Tables in Survey Analysis). However, such tests are often not useful with grids, as typically our interest is more in understanding whether there is a relationship between the row and column categories, rather than if a specific value is above average or not.

The chart below plots the first two rows of the table above. What jumps out from this chart is that on all but one attribute, Apple has higher scores than Google. Apple's average score is 57% whereas Google's is 39%. Viewed in this context, it becomes clear that the 58% *Easy to use* score is actually a good score for Google. When we look at the chart, we see that it is the second-highest score for Google, and is only marginally behind its best score of 59% for *Innovative*. Similarly, we can see that the 59% score for Apple is its third-lowest score.

The example of *Google* versus *Apple* emphasizes that we need to take *row effects* into account when interpreting a binary grid. (These are also known as *brand effects* in *brand association tables*, which is a technical term for a table comparing brands by attributes, like all the tables in this article.) The table is reproduced again below. Note the *Good customer service* column. While the numbers in this column vary from 4% to 51%, none are marked as significant. This is because all the variation in these numbers is explainable by the row effects. That is, once the differences between the rows (brands) are factored in, there is no difference between the brands in terms of customer service.

Just as row effects need to be factored into analyses, so do column effects. Some columns have generally lower, or higher, scores than others, and this needs to be taken into account when interpreting the results. We can see this by comparing the *High quality* and *Low prices* columns. A score of 40% for *Google* on *High quality* is significantly low, whereas its score of 17% for *Low prices* is not significantly low. This is because the average *High quality* score is higher than the average *Low prices* score. Once this is factored in, Google's score on *High quality* is disappointing, but its score on *Low prices* is on par with the average.

The significance tests are computed as follows:

- A *log-linear* model is used to compute the row and column effects.
- The model is used to predict the expected score for each cell in the table, under the assumption that the score in the cell is entirely explained by the row and column effects. The *Expected %* scores are shown in the table below. We can see, for example, that *Google's* expected score for *Easy to use* is 46%.
- Statistical tests (a *score test*) then compare the observed result (58%) with the expected result.
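The general idea behind these steps can be sketched in Python. This is a simplified illustration rather than the procedure the software actually uses: it assumes the simplest two-way log-linear model (row effect plus column effect on the log scale), treats the cells as counts in a contingency table, and uses a Pearson residual as a rough stand-in for the score test. The brands and counts are hypothetical and are not taken from the tables in this article.

```python
import math

# Hypothetical counts of respondents (out of 100) who associated each
# brand with each attribute. All values are invented for illustration.
brands = ["Brand A", "Brand B"]
attributes = ["Easy to use", "Innovative", "High quality"]
observed = [
    [59, 70, 60],  # Brand A: high scores overall
    [58, 30, 20],  # Brand B: low scores overall
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Assumed model: log(expected) = overall + row effect + column effect.
# For a two-way table its fitted values have the closed form
# row_total * col_total / grand_total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson residuals: a rough stand-in for the score test, not the test itself.
residuals = [
    [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

for brand, obs_row, exp_row, res_row in zip(brands, observed, expected, residuals):
    for attribute, o, e, z in zip(attributes, obs_row, exp_row, res_row):
        print(f"{brand:8} {attribute:13} observed={o:3d} "
              f"expected={e:5.1f} residual={z:+.2f}")
```

In this made-up table, Brand B's *Easy to use* count (58) is slightly below Brand A's (59), yet it sits well above Brand B's expected value, while Brand A's sits below its own. That is the same pattern as the *Google* versus *Apple* example: a cell is judged against what the row and column effects predict, not against the raw values of other cells.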
