How data should be analyzed depends on its type. Several ways of categorizing data types have been developed, each with implications for how data is analyzed:
- Numeric versus categorical data
- Psychometric measurement scales
- Modern data types
The concept of variable data types is applicable to all data but is primarily of interest when data is stored in variables.
Numeric vs categorical
The simplest distinction in data is whether it is categorical (e.g., sex) or numeric (e.g., height). Where data is categorical, it is most commonly analyzed by calculating counts and percentages. Where data is numeric, it is commonly analyzed using means, sums, and standard deviations.
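The sketch below shows the two approaches using only the Python standard library. The variable names and values are hypothetical. `statistics.stdev` computes the sample (n - 1) standard deviation, which is one common choice rather than the only one.

```python
from collections import Counter
import statistics

# Hypothetical example data: one categorical and one numeric variable.
sex = ["Female", "Male", "Female", "Female", "Male"]
height_cm = [162.0, 178.0, 170.0, 158.0, 182.0]

def counts_and_percentages(values):
    """Summarize a categorical variable as {category: (count, percentage)}."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, 100 * n / total) for category, n in counts.items()}

def numeric_summary(values):
    """Summarize a numeric variable with its mean, sum, and standard deviation."""
    return {
        "mean": statistics.mean(values),
        "sum": sum(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
    }

print(counts_and_percentages(sex))   # {'Female': (3, 60.0), 'Male': (2, 40.0)}
print(numeric_summary(height_cm))
```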
Psychometric measurement scales
Traditionally, psychometrics, the quantitative branch of psychology, splits both numeric and categorical data into two sub-categories each. The resulting four types are referred to as measurement scales:
- Nominal, which represents unordered categories. Nominal data is traditionally reported using proportions (e.g., 38% of people are male).
- Ordinal, which represents ordered categories. Ordinal data can be reported in the same way as nominal data. For example, if we have asked people to give a rating of 1, 2, 3, 4, or 5, we can report the proportion of people that gave a rating of 3. However, additional results can be computed for ordinal data that are inappropriate for nominal data. For example:
- Medians.
- Ranges. For example, all results were between 2 and 5.
- Proportions of ranges of values (e.g., 38% of people gave a rating of 3 or more out of five).
- Interval data, for which it is meaningful to compare the differences between values. For example, the difference between an IQ of 120 and an IQ of 100 is twice the difference between an IQ of 130 and an IQ of 120. All calculations that can be conducted for ordinal data are also meaningful for interval data. However, many additional types of calculations are appropriate for interval data but not for ordinal or nominal data, such as:
- Means
- Standard deviations
- Linear regressions
- Ratio data, for which meaningful conclusions can be obtained by dividing one number by another. For example, somebody who has $1,000 has twice as much money as somebody who has $500. All the calculations that can be performed for nominal, ordinal, and interval data can also be meaningfully performed for ratio data.
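The cumulative nature of the four scales can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a standard API: the `Scale` enum, the `summarize` function, its `threshold` parameter, and the `max_to_min_ratio` statistic are all hypothetical names chosen for this example, and linear regression is omitted to keep it short. For ordinal data the sketch uses `statistics.median_low`, which always returns an observed value, because `statistics.median` would average the two middle categories when the count is even.

```python
from collections import Counter
from enum import IntEnum
import statistics

class Scale(IntEnum):
    """Measurement scales, ordered so each level permits everything below it."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

def summarize(values, scale, threshold=None):
    """Return only the statistics that are meaningful for `scale` (hypothetical helper).

    `threshold` (hypothetical) drives the ordinal "proportion at or above" statistic.
    """
    n = len(values)
    # Nominal and above: proportion of responses in each category.
    result = {"proportions": {v: c / n for v, c in Counter(values).items()}}
    if scale >= Scale.ORDINAL:
        # median_low returns an actual data value, so no categories are averaged.
        result["median"] = statistics.median_low(values)
        result["range"] = (min(values), max(values))
        if threshold is not None:
            result["prop_at_or_above"] = sum(v >= threshold for v in values) / n
    if scale >= Scale.INTERVAL:
        result["mean"] = statistics.mean(values)
        result["sd"] = statistics.stdev(values)
    if scale >= Scale.RATIO:
        # Dividing values is only meaningful on a ratio scale; assumes min(values) > 0.
        result["max_to_min_ratio"] = max(values) / min(values)
    return result

ratings = [1, 3, 3, 4, 5, 2, 5, 3]  # hypothetical 1-5 ratings (ordinal)
print(summarize(ratings, Scale.ORDINAL, threshold=3))

money = [500, 1000, 750]            # hypothetical dollar amounts (ratio)
print(summarize(money, Scale.RATIO)["max_to_min_ratio"])  # 2.0
```

Passing `Scale.ORDINAL` for the ratings means no mean or standard deviation is returned, even though Python could compute them; the scale, not the storage type, decides what is reported.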
For more information, see the separate articles on Nominal Data, Ordinal Data, Interval Data, and Ratio Data, as well as Stevens, S. S. (1959). Measurement. In C. W. Churchman (Ed.), Measurement: Definitions and Theories (pp. 18-36). New York: Wiley. Reprinted in G. M. Maranell (Ed.) (1974), Scaling: A Sourcebook for Behavioral Scientists (pp. 22-41). Chicago: Aldine.
Modern data types
In modern data analysis, some additional types have become widely recognized:
- Binary Data
- Text Data
- Date/Time
- Integer Data
- Count Data
- Duration Data
- Nominal-Ordinal Data
- Banded Data
Data type for variables and in general
The idea of data type is most useful when thinking about variables and their generalizations (e.g., data frames, matrices).
It applies in principle to other contexts, but in practice it is of little use there, because in most other contexts the data has ratio-scale properties. For example:
- When a proportion is calculated from a nominal variable (e.g., 28% of people like the color blue), this proportion is itself ratio data, even though the underlying variable is nominal.
- When a median is calculated for an ordinal variable, the median itself is ratio data.
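The first point can be illustrated with a short Python sketch. The survey data and the `proportion` helper are hypothetical. A colour cannot be averaged or divided, but the proportion computed from it is an ordinary number with a true zero, so ratios and averages of proportions are meaningful.

```python
def proportion(values, category):
    """Share of responses equal to `category` (hypothetical helper): nominal in, number out."""
    return sum(v == category for v in values) / len(values)

wave_a = ["blue"] * 2 + ["red"] * 8  # hypothetical survey wave A
wave_b = ["blue"] * 4 + ["red"] * 6  # hypothetical survey wave B

p_a = proportion(wave_a, "blue")  # 0.2
p_b = proportion(wave_b, "blue")  # 0.4

# The colours themselves support only counting, but the proportions support division and averaging.
print(p_b / p_a)          # how many times more common "blue" is in wave B
print((p_a + p_b) / 2)    # average share of "blue" across the two waves
```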