In data analysis, a variable is a column in a table of raw data. Variables can either be present in the original data or be derived from it. Variables require metadata in order to be interpreted. The word "variable" also has some other meanings.
Original versus derived variables
The table of raw data below shows four variables, each represented as a column. The first three variables (Work status, Occupation, and Age) contain data collected from people answering questions in a survey.
The fourth variable, 20-24 yrs, has been created from the Age variable. It contains a value of 1 for each case aged 20-24 and 0 otherwise. A variable created from one or more of the original variables is variously known as a:
- Constructed variable
- Computed variable
- Created variable
- Derived variable
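As a minimal sketch of how a variable like 20-24 yrs might be derived, the Python below stores a small raw data table as a dictionary of columns and builds a 0/1 indicator from Age. The data values and the helper name `derive_indicator` are hypothetical and for illustration only; they are not the contents of the table above. It also assumes Age is recorded as text age bands, consistent with the first three variables holding text descriptions.

```python
# Hypothetical raw data (illustrative only): each key is a variable (column),
# and position i in every list belongs to the same case (row).
raw_data = {
    "Work status": ["Fulltime worker", "Not working", "Parttime worker", "Fulltime worker"],
    "Occupation": ["Professional", "Student", "Clerical", "Manager"],
    "Age": ["30-34 yrs", "20-24 yrs", "20-24 yrs", "18-19 yrs"],
}


def derive_indicator(column, target):
    """Hypothetical helper: return 1 where a case's value equals target, else 0."""
    return [1 if value == target else 0 for value in column]


# Construct the derived "20-24 yrs" variable from the original Age variable.
raw_data["20-24 yrs"] = derive_indicator(raw_data["Age"], "20-24 yrs")

print(raw_data["20-24 yrs"])  # [0, 1, 1, 0]
```

The original Age column is left untouched; the derived variable is simply added as a new column alongside it.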
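Deriving additional variables is a routine part of data analysis. It is done to save time: once a derived variable has been created, it can be reused in later analyses instead of being recreated each time. This is discussed in more detail in How to Clean and Tidy Data.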
Values versus labels
In the table below, the first three variables contain text descriptions (e.g., Fulltime worker). These are usually referred to as labels.
The same data can be shown using numbers or other symbols, which are usually called values. For example, the table below shows the same variables but with values rather than labels.
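Converting between values and labels requires metadata that records which label each value stands for; without it, a value such as 3 cannot be interpreted. The Python sketch below assumes that metadata is a simple dictionary mapping values to labels. The specific codes, labels, and helper names (`to_labels`, `to_values`) are hypothetical and chosen for illustration.

```python
# Hypothetical value-label metadata (illustrative only): for each variable,
# a mapping from the stored value (a number) to its human-readable label.
value_labels = {
    "Work status": {1: "Fulltime worker", 2: "Parttime worker", 3: "Not working"},
}

# The same variable stored as values rather than labels.
work_status_values = [1, 3, 2, 1]


def to_labels(values, labels):
    """Replace each stored value with its label from the metadata."""
    return [labels[v] for v in values]


def to_values(label_list, labels):
    """Replace each label with its value by inverting the value-to-label mapping.

    Assumes every label is unique, so the inversion is unambiguous.
    """
    inverse = {label: value for value, label in labels.items()}
    return [inverse[label] for label in label_list]


labels = to_labels(work_status_values, value_labels["Work status"])
print(labels)  # ['Fulltime worker', 'Not working', 'Parttime worker', 'Fulltime worker']
print(to_values(labels, value_labels["Work status"]))  # [1, 3, 2, 1]
```

Storing values plus a single label mapping keeps the data compact, and the labels can be regenerated whenever a readable version of the table is needed.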
Other meanings of variable
The term "variable" has some other meanings. It can also refer to:
- A variable in a computer program
- A variable in maths (e.g., a random variable)
- A concept