When multiple categories are combined on a table, the result is referred to as a NET. This article defines a NET and describes how to use and interpret NETs with a single variable and with sets of variables.
Nets of single variables
NETs, also written as netts or nets, are the number or proportion of people who chose one or more of a set of categories. The table below, a summary table of a categorical variable (see Overview of Summary Tables), shows two nets:
- The net of people in any of the first three age categories is 35%. Here, it is simply the sum of the three numbers above it (as discussed below, this is not always the case).
- The net of people in any of the categories is 100%. This is simply the total of all the numbers above it, excluding the NET 18 to 34 row.
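To make the single-variable case concrete, here is a minimal Python sketch using made-up age data (the `ages` list and the `net` helper are hypothetical, invented for illustration). Because each respondent falls into exactly one category, the count for a net is the same as the sum of the counts for its categories.

```python
from collections import Counter

# Hypothetical single-variable data: each respondent has exactly one age category.
ages = ["18-24", "25-29", "30-34", "35-49", "50-64",
        "65+", "18-24", "35-49", "50-64", "30-34"]

def net(responses, categories):
    """Proportion of respondents who fall into ANY of the given categories."""
    hits = sum(1 for r in responses if r in categories)
    return hits / len(responses)

counts = Counter(ages)
first_three = {"18-24", "25-29", "30-34"}

# Categories are mutually exclusive, so the net count equals the sum of the counts.
net_count = sum(1 for r in ages if r in first_three)
sum_of_counts = sum(counts[c] for c in first_three)

print(net(ages, first_three))  # → 0.5 (the 18 to 34 net)
print(net(ages, set(ages)))    # → 1.0 (overall net across every category)
```

With a single variable the two calculations always agree; the sections below show where they diverge.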
Nets of multiple binary variables
The table below shows a summary table from a multiple response question, where the underlying data is seven binary variables. As discussed in Overview of Summary Tables, the categories in such summary tables are not mutually exclusive, so the percentages add up to 154% rather than 100%. This highlights a key distinction between summary tables of a single variable and those of multiple binary variables. In the summary table of a single variable, the overall net (at the bottom) can readily be thought of as the same thing as a total. With multiple response data, however, the NET is a different thing from the total and, in general, a more useful one.
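The table below has three additional nets added to it: Net Coke, Net Pepsi, and Net Cola. Again, these nets are not totals. For example, the three Coke brands sum to 94%, but Net Coke is 72%, so some people must be consuming two or more of the Coke brands weekly. The 22-point difference is not exactly the proportion of such people: the sum counts a person who consumes two Coke brands one extra time and a person who consumes all three two extra times. We can thus deduce that between 11% and 22% of people in the data consume two or more of the Coke brands weekly.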
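The relationship between the sum of the brand percentages and the net can be checked on a tiny hypothetical dataset (three binary Coke-brand variables invented for illustration, not the data behind the table). The sketch counts total mentions and the number of people with at least one mention; the gap between them equals the number of people with exactly two brands plus twice the number with all three.

```python
from collections import Counter

# Hypothetical binary data: one row per respondent, one column per Coke brand.
# 1 = consumes the brand weekly, 0 = does not.
coke = [
    (1, 0, 0),
    (1, 1, 0),
    (1, 1, 1),
    (0, 0, 0),
    (0, 1, 0),
]

n = len(coke)
total_mentions = sum(sum(row) for row in coke)   # what summing the brand percentages counts
net_count = sum(1 for row in coke if any(row))   # people choosing at least one brand
brands_per_person = Counter(sum(row) for row in coke)

# A person with k brands is counted k times in the sum but once in the net,
# so the gap is (number with exactly 2) + 2 * (number with exactly 3).
gap = total_mentions - net_count
two_or_more = brands_per_person[2] + brands_per_person[3]

print(f"Sum of percentages: {total_mentions / n:.0%}")
print(f"Net Coke:           {net_count / n:.0%}")
print(f"Two or more brands: {two_or_more / n:.0%}")
```

Here the gap is 3 people but only 2 people consume two or more brands, because the one person with all three brands is counted twice in the gap.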
Unlike summaries of single variables, multiple response data is not guaranteed to have an overall net of 100%. For example, the table below summarizes the six variables representing the six brands but leaves out the None of these variable. With this data, the overall NET is 82%, which is the proportion of people who mentioned one or more of the brands.
Nets and missing values
In the example of multiple response data above, the percentages of the individual categories add up to more than the overall net. In the table below, it's the other way around. The overall net shows the seemingly impossible result of 0%. How can this be?
The table above shows the percentages, as well as the counts (the number of people who selected each brand) and the sample size (the number of people with valid data for each of the variables). The percentages are computed as Count / Sample Size.
In the table below, the percentages are computed in the same way, and the reason that the NET is 0% is that it is computed as Count / Sample Size = 0 / 7 = 0%. The important thing to appreciate when a net looks wrong is that it indicates a data integrity problem of some kind.
The table below shows the raw data used to construct the summary table above. It contains many NaNs (NaN stands for Not a Number), which mark missing values. The Sample Size is 7 because only 7 people have complete data, that is, a non-missing value for every brand variable, and none of these 7 people selected any of the brands.
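The mechanism can be reproduced on a small made-up dataset. This sketch assumes, as described above, that each brand's percentage uses the respondents with valid data for that brand, while the net uses only respondents with complete data. Missing values are represented with Python's `float("nan")`, and the helper names `brand_summary` and `net_summary` are hypothetical.

```python
import math

NaN = float("nan")

# Hypothetical raw data: one row per respondent, one column per brand.
# 1 = selected, 0 = not selected, NaN = missing.
brands = ["Brand A", "Brand B", "Brand C"]
data = [
    [0, 0, 0],
    [0, 0, 0],
    [NaN, 1, 0],
    [1, NaN, NaN],
    [0, NaN, 1],
]

def is_missing(x):
    return isinstance(x, float) and math.isnan(x)

def brand_summary(data, j):
    """Count, sample size, and percentage for one brand, using its own valid cases."""
    valid = [row[j] for row in data if not is_missing(row[j])]
    count = sum(valid)
    return count, len(valid), count / len(valid)

def net_summary(data):
    """Count, sample size, and percentage for the overall net, using complete cases only."""
    complete = [row for row in data if not any(is_missing(x) for x in row)]
    count = sum(1 for row in complete if any(x == 1 for x in row))
    return count, len(complete), count / len(complete)

for j, name in enumerate(brands):
    print(name, brand_summary(data, j))
print("NET", net_summary(data))
```

Every brand has a non-zero percentage, yet the net is 0 / 2 = 0%, because the only two complete respondents selected nothing.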
Some researchers, when they first encounter this problem, think that the solution is to change how sample size is defined. A common proposal is that the sample size should be defined as people who have any data rather than people who have complete data. When implemented, such proposals never work and just create a different set of problems. This is because the core problem is the data, not the way it has been analyzed.
A good strategy, when faced with problematic data, is to question the data generation mechanism (i.e., what caused the weird data to come about?). In this case, a bit of investigation revealed that people had missing data because they were only asked about brands that they had not already mentioned in the survey. Consequently, each NaN actually represents a brand that the respondent was familiar with, and the correct fix is to replace every NaN with a 1 and then recompute the summary table.
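On a small made-up dataset, the fix amounts to a recode followed by a recomputation. This minimal sketch assumes NaN means "not asked because already mentioned" and therefore fills it with 1; the `recode_missing` helper is hypothetical. After the recode every respondent has complete data, so every percentage uses the full sample.

```python
import math

NaN = float("nan")

# Hypothetical raw data: one row per respondent, one column per brand.
# NaN = the respondent was not asked about the brand.
data = [
    [0, 0, 0],
    [0, 0, 0],
    [NaN, 1, 0],
    [1, NaN, NaN],
    [0, NaN, 1],
]

def recode_missing(data, fill=1):
    """Return a copy of the data with every NaN replaced by `fill`."""
    return [[fill if isinstance(x, float) and math.isnan(x) else x for x in row]
            for row in data]

recoded = recode_missing(data)  # NaN -> 1: an unasked brand was already known
n = len(recoded)                # everyone now has complete data

brand_pcts = [sum(row[j] for row in recoded) / n for j in range(len(recoded[0]))]
net = sum(1 for row in recoded if any(row)) / n

print(brand_pcts, net)  # → [0.4, 0.6, 0.4] 0.6
```

The choice of fill value is the substantive decision here; it comes from understanding how the data was collected, not from the summary table.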
The table of the recoded data is shown below. Note that:
- All the numbers have changed. For example, Optus was 15% in the table above; it is now 90%. This emphasizes a key point: the problem wasn't with the net. Rather, the problem with the data became obvious because the net did not make sense.
- The overall net is now a much more sensible value of 99%.
- The 7 people who had not heard of any of the brands are still evident in the overall net: 725 - 7 = 718 people are in the net, and 718 / 725 = 99.0%, which is displayed as 99% after rounding.
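The rounding argument can be checked directly. This short sketch uses the figures quoted above (725 respondents, 7 who had not heard of any brand) and also lists every net count that would display as 99%. That shows the 7 people are consistent with the rounded net, although a displayed 99% on its own only narrows the figure to a range.

```python
n = 725                 # respondents in the recoded data
not_aware = 7           # people who had not heard of any brand
in_net = n - not_aware  # people in the overall net

displayed = round(100 * in_net / n)
print(in_net, displayed)  # → 718 99

# Every net count that would also be displayed as 99% after rounding.
consistent = [k for k in range(n + 1) if round(100 * k / n) == 99]
print(n - consistent[-1], "to", n - consistent[0], "people outside the net")
```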
Nets of numeric data (sums)
With numeric data, a "net" is, in reality, a sum of the values, as shown in the total below. Note:
- When there is no missing data, the sum is just the sum of the other averages.
- If there are missing values, the sum will not be "sensible". This is a sign of a data integrity problem that can often be fixed by recoding the data (e.g., replacing missing values with 0).
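A small Python sketch can show why missing values break the sum. It assumes, for illustration only, that each variable's average uses the respondents with data for that variable, while the sum is the average of each respondent's total over respondents with complete data (mirroring how the net's sample size works for binary data). The data and the helper names `column_means`, `mean_of_totals`, and `fill_missing` are hypothetical.

```python
import math
from statistics import mean

NaN = float("nan")

def column_means(data):
    """Average of each variable, each using its own non-missing values."""
    return [mean([row[j] for row in data if not math.isnan(row[j])])
            for j in range(len(data[0]))]

def mean_of_totals(data):
    """Average of each respondent's total, over respondents with complete data."""
    return mean([sum(row) for row in data if not any(math.isnan(x) for x in row)])

def fill_missing(data, fill=0):
    """Return a copy of the data with every NaN replaced by `fill`."""
    return [[fill if math.isnan(x) else x for x in row] for row in data]

# Hypothetical numeric data: rows are respondents, columns are amounts.
complete = [[10, 5, 0], [20, 0, 5], [0, 15, 10]]
with_missing = [[10, 5, 0], [20, NaN, 5], [0, 15, NaN]]

for label, data in [("complete", complete),
                    ("with missing", with_missing),
                    ("missing -> 0", fill_missing(with_missing))]:
    print(f"{label:>12}: sum of averages = {sum(column_means(data)):.2f}, "
          f"sum = {mean_of_totals(data):.2f}")
```

With complete data the two figures agree; with missing values they diverge (22.50 versus 15.00), and they agree again once missing values are recoded to 0.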