Data analysis is most straightforward when there is a single rectangular table of data. It can be substantially more difficult if, instead, there are:
- Multiple summary tables
- Separated tables
- Data for a single variable is contained in multiple tables.
Nevertheless, there are instances where multiple are appropriate.
Multiple summary tables
Sometimes people think they have raw data, when instead they have multiple summary tables, like the ones below. This misunderstanding is most common when people have Excel files. It is not possible to extract raw data from these tables, which means that you cannot create your own analyses.
In the example below, there are two tables. Such data is better analyzed using data analysis software if it is stored in a single table.
Data for a single variable, contained in multiple tables
In the example below, the Person table contains a variable called Age. However, this variable cannot be analyzed on its own, with the metadata required to correctly interpret it being located in a separate table called Age. The root cause of the problem here is that the data is stored in a relational database, and should instead be stored in a single data file that is able to store the metadata (see Overview of Data File Formats).
Situations when multiple tables are appropriate
In the previous section, the problem is that the variable itself cannot be analyzed without data contained in other tables. There are, however, situations where it is appropriate for information to be stored across multiple tables. The basic principle is that it is appropriate to have multiple tables where each table can be regarded as a data set in its own right.
The most common example of this is where:
- One data set represents customers, with each row/case being a customer.
- A second database represents purchases, with each row/case representing a transaction.
- There is an ID variable in each of the files, which makes it easy to work out which transactions belong to which customers.
A variant of this common example in survey analysis is replacing transactions with purchase occasions.
For example, where one table or data file contains data about households, and a separate table or data file contains information about all the purchases of a household.