Data transformation is the process of changing the data in some way. More formally, a transformation involves creating a new variable or set of variables from an existing variable or set of variables.
Objectives of transformation
Data transformation is undertaken with the following objectives:
- Making it easier to see patterns in the data (e.g., Log transformations and Principal Components Analysis).
- Making it easier to communicate patterns in the data (e.g., the Net Promoter Score).
- Addressing violations of the assumptions of statistical tests (e.g., Ranks, Log transformations).
- Improving the validity of regression models (e.g., Basis Functions).
- Reducing the amount of data (e.g., Principal Components Analysis).
Standard transformations of a categorical variable
A categorical variable can be transformed in one of two ways:
- It can be turned into a numeric variable by defining rules for the numeric interpretation of each category. For example:
  - Replacing the category 18 to 24 with 21 and 25 to 29 with 27 (a type of Recoding known as Midpoint Recoding).
  - Computing the Net Promoter Score.
- Its categories can be combined. Most commonly, small categories are merged into larger ones. For example:
  - When a question asks for reasons for a particular behavior, any reasons selected by only a small number of respondents can be classified as Other.
  - Variables that collect data on Rating Scales may be converted to Binary Variables to simplify further analysis.
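The categorical transformations above can be sketched in Python. This is a minimal illustration under stated assumptions, not how any particular product implements them. The age-band labels, the `min_count` threshold, the choice of a "top two box" rule on a 1 to 5 scale for the binary conversion, and all function names are hypothetical. The Net Promoter Score uses its usual definition on a 0 to 10 scale: the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6).

```python
from collections import Counter

# Hypothetical age bands mapped to the midpoints of their ranges.
AGE_MIDPOINTS = {"18 to 24": 21, "25 to 29": 27}


def midpoint_recode(responses, midpoints):
    """Replace each category label with the numeric midpoint of its range."""
    return [midpoints[r] for r in responses]


def net_promoter_score(ratings):
    """NPS from 0-10 ratings: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


def merge_small_categories(responses, min_count, other="Other"):
    """Relabel categories chosen fewer than min_count times as `other`."""
    counts = Counter(responses)
    return [r if counts[r] >= min_count else other for r in responses]


def top_two_box(ratings, scale_max=5):
    """Binary variable (assumed rule): 1 if the rating is in the top two scale points, else 0."""
    return [1 if r >= scale_max - 1 else 0 for r in ratings]


print(midpoint_recode(["18 to 24", "25 to 29", "18 to 24"], AGE_MIDPOINTS))  # [21, 27, 21]
print(net_promoter_score([10, 9, 8, 7, 6, 3, 10, 9, 5, 10]))  # 20.0
print(merge_small_categories(["Price", "Price", "Quality", "Quality", "Smell"], 2))
# ['Price', 'Price', 'Quality', 'Quality', 'Other']
print(top_two_box([5, 4, 3, 2, 1]))  # [1, 1, 0, 0, 0]
```

In the NPS example, five of the ten ratings are promoters and three are detractors, so the score is 50 minus 30, or 20.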
Standard transformations of numeric variables
Univariate
- Ranks
- Log transformations
- Trimming
- Winsorizing
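Below is a rough Python sketch of these univariate transformations, assuming plain lists of numbers. The conventions are illustrative choices, and other definitions are also in use (for example, deleting trimmed cases rather than blanking them):

- Tied values share the average rank.
- The log is the natural logarithm, with an optional offset so that zeros can be transformed.
- Trimming and Winsorizing both cut at the (k+1)-th smallest and largest values, where k = int(prop * n) and `prop` is a hypothetical tail proportion below 0.5.

```python
import math


def ranks(values):
    """Rank from 1 (smallest) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def log_transform(values, offset=0.0):
    """Natural log; a positive offset lets zeros be transformed."""
    return [math.log(v + offset) for v in values]


def _cutoffs(values, prop):
    """Cut-offs at the (k+1)-th smallest and largest values, k = int(prop * n)."""
    k = int(prop * len(values))
    s = sorted(values)
    return s[k], s[len(s) - 1 - k]


def trim(values, prop):
    """Trimming: blank out (None) values beyond the cut-offs."""
    low, high = _cutoffs(values, prop)
    return [v if low <= v <= high else None for v in values]


def winsorize(values, prop):
    """Winsorizing: pull values beyond the cut-offs back to the cut-offs."""
    low, high = _cutoffs(values, prop)
    return [min(max(v, low), high) for v in values]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(ranks([10, 20, 20, 5]))  # [2.0, 3.5, 3.5, 1.0]
print(trim(data, 0.1))         # [None, 2, 3, 4, 5, 6, 7, 8, 9, None]
print(winsorize(data, 0.1))    # [2, 2, 3, 4, 5, 6, 7, 8, 9, 9]
```

The difference between the two outlier treatments is visible in the last two lines. Trimming removes the extreme value 100 (and 1). Winsorizing keeps every case but replaces the extremes with the nearest retained values.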
Multivariate
- Principal Components Analysis
- Cluster Analysis
- t-SNE
- Basis functions, such as:
  - Dummy Variables
  - Polynomials
  - Orthogonal polynomials
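The basis functions listed above can be sketched in Python as follows. This is a minimal illustration with hypothetical function names, built on these assumptions:

- The dummy variables drop the first level (in sorted order) as the reference category.
- The orthogonal polynomials are built by Gram-Schmidt orthonormalization of the raw powers of x against a constant column. This is one standard construction, not necessarily the one any given tool uses.

```python
import math


def dummy_variables(categories):
    """One 0/1 column per level, omitting the first (sorted) level as the reference."""
    levels = sorted(set(categories))
    return {lvl: [1 if c == lvl else 0 for c in categories] for lvl in levels[1:]}


def polynomial_basis(x, degree):
    """Raw polynomial columns x, x^2, ..., x^degree."""
    return [[v ** d for v in x] for d in range(1, degree + 1)]


def orthogonal_polynomials(x, degree):
    """Gram-Schmidt the raw powers against a constant column and each other."""
    n = len(x)
    basis = [[1.0 / math.sqrt(n)] * n]  # unit-length constant column
    result = []
    for col in polynomial_basis(x, degree):
        v = [float(c) for c in col]
        # Remove the components along every earlier basis column.
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        u = [vi / norm for vi in v]  # scale to unit length
        basis.append(u)
        result.append(u)
    return result


print(dummy_variables(["red", "green", "blue", "green"]))
# {'green': [0, 1, 0, 1], 'red': [1, 0, 0, 0]}
print(polynomial_basis([1, 2, 3], 2))  # [[1, 2, 3], [1, 4, 9]]
linear, quadratic = orthogonal_polynomials([1, 2, 3, 4, 5], 2)
print(sum(a * b for a, b in zip(linear, quadratic)))  # approximately 0
```

For x = 1 to 5, the linear column is proportional to (-2, -1, 0, 1, 2) and the quadratic column to (2, -1, -2, -1, 2). Because both columns are centered and orthogonal, they are uncorrelated. This avoids the strong collinearity that raw terms such as x and x^2 often have in a regression.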
Next
See more ways of transforming data in the Data Cleaning and Tidying section.