The structure of the variable set is made up of its variables' data type and its set type.
An individual variable can be categorized as having a type, such as Text, Binary, Nominal, etc. See Overview of Data Types.
There are numerous possible set types, but the most common can be categorized into four groups:
- Variables sets containing a single variable.
- One-dimensional variable sets
- Two-dimensional variable sets
- Variable sets with structural dependencies
Variable sets containing a single variable
This is a trivial case, where a variable set is indistinguishable from a variable. For example, an age variable can be regarded as a variable set containing one variable.
One-Dimensional Variable Sets
A one-dimensional variable set contains multiple= variables of the same type, where the variables are either:
- Ordered by one dimension
Common variants include:
- Binary - Multi Variable Set
- Nominal - Multi Variable Set
- Ordinal - Multi Variable Set
- Numeric - Multi Variable Set
Two-Dimensional Variable Sets
These are a generalization of one-dimensional variable sets, except that the variables have a two-dimensional structure. The most common variants are:
Variable sets containing structural dependencies between the variables
In the two previous set types, the primary goal of the set type was to organize data in a way to facilitates analysis. There are also some variable set types where it can be impossible to analyze the variables independently, as there is a special relationship between them. Common examples are rankings, max-multi coded data, and experiments
Consider a question presenting 10 sneakers and asking people to rank them from most to least preferred. Typically with such data, one variable will be created for each sneaker, and the number will indicate the order with which it was chosen. Let's say that the third variable shows Air Jordan and it has an average ranking of 2.1. What does this mean? It means nothing without knowing the other shoe brands. To correctly analyze this data requires all the data to be analyzed jointly.
Consider the data that comes from a question that asks Which of these brands have you drunk in the past 7 days? Coke, Pepsi, Dr Pepper, None of these. Although the most convenient way of storing this data is as four binary variables, it is possible to instead record this data with one variable showing the first brand chosen, the second showing the second brand shown, etc. When data is in this format, some or all of the variables cannot be analyzed on their own.
Often a survey may show one group of people one concept, and a second group a second concept, and then ask something like "How likely would you be to buy the concept you saw?" (purchase intent). The purchase intent data is meaningless without knowing which person saw which concept, meaning the two variables have a relationship between them and need to be analyzed jointly.
Please sign in to leave a comment.