Spreadsheets, such as Excel and Google Sheets, can be used to create a flat data file. Key things to ensure are that:
- The data should be structured as a single rectangular table.
- The data is set up as described in Characteristics of a Good Data File.
By way of summary, this summarizes what a good file may look like:
Whereas this is a file that has been set up poorly:
The overall structure should be rectangular
The data should follow the structure of a flat data file, with rows representing cases and columns representing variables.
Row 1 should contain variable names or labels, where the first character should be a letter. If you are performing your analysis in R or SPSS, you should use a single word and avoid any funny symbols. If using Q or Displayr for your analysis, you can use whatever wording you like. A common mistake is to put some other kind of information into the first row (e..g, the source of the data).
Row 2 should contain the data for the first case. A common mistake is to put metadata into row 2 (e.g., variable labels).
Each subsequent row should contain the data for one case.
Put the data in the first sheet, starting in A1. Create the file in the first worksheet (tab), with nothing in the remaining sheets. Make sure the data starts at cell A1, with no blank cells or titles to the left or above the data.
Leave out any metadata that doesn't fit into the rules above. Presumably, because spreadsheets are flexible, people sometimes add a whole lot of additional metadata (e.g., an extra row at the top or another tab showing question wordings). Don't do this. If the data is set up as described here, there's a good chance of problems.
Avoid using any special formatting. For example:
- Makes sure no cells in the table have been merged.
- Delete any rows and columns below and to the right of the data. It is common that these will contain spaces and things that cause difficulty when reading data, so delete them even if you cannot see anything.
- Be consistent. For example, if you are representing a male as a 1, do it consistently. Do not do it sometimes as 1, sometimes as m, sometimes as males, and other times as Male. This is easiest to do using Excel’s data validation tools or the equivalent tools in Google Sheets.
- Don’t use comments.
- Don’t use highlights, cell, or font formatting to convey meaning (there is no straightforward way of converting this to analyzable data).
Set up the data in accordance with the characteristic of a good data file
Detailed descriptions of how to set up different types of data are described in Characteristics of a Good Data File.
Comments
0 comments
Please sign in to leave a comment.