The SPSS data file (.sav) was originally developed as the file format for the statistics program SPSS, now IBM SPSS Statistics. Today, it is the most widely used format for storing survey data, and it can be created and read by most advanced data analysis software.
Overview of the file format
SPSS data files, often called "SAV" files after their .sav extension, are binary files. The key feature of the format is its rich metadata, particularly the metadata stored for each variable, which includes:
- A variable name, e.g., Attitude.
- A variable label, e.g., Attitude may have the label "How strongly do you agree with the statement ‘Data Science is Cool’?"
- A set of value labels. For example, for Gender, a 1 may mean Male and a 2 may mean Female.
- One variable may be flagged as a weight and another as a filter.
- Related variables may be grouped into Multiple Response Sets.
- Certain values may be flagged as Missing Values.
- The measurement level of each variable is stored as Nominal, Ordinal, or Scale.
- The format used to display date variables may be stored.
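The per-variable metadata listed above can be pictured as a small record attached to each column. The Python sketch below is a hypothetical model, not the layout of the .sav format itself or the API of any reader library: the `Variable` class, the `weighted_frequencies` function, and the example Gender codes (including 9 for Refused) are assumptions for illustration. It shows how value labels, missing values, and a weight combine when a frequency table is produced. For simplicity it handles only discrete missing codes, although SPSS also allows a range of values to be flagged as missing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Variable:
    """Hypothetical model of the metadata an SPSS data file stores per variable."""
    name: str
    label: str = ""
    value_labels: dict[float, str] = field(default_factory=dict)
    missing_values: set[float] = field(default_factory=set)  # discrete codes only
    measure: str = "nominal"  # "nominal", "ordinal" or "scale"


def weighted_frequencies(var: Variable, values: list[float],
                         weights: list[float]) -> dict[str, float]:
    """Sum the weights per value label, skipping values flagged as missing."""
    totals: dict[str, float] = {}
    for value, weight in zip(values, weights):
        if value in var.missing_values:
            continue  # missing codes are excluded from the table
        # Fall back to the raw code when a value has no label.
        label = var.value_labels.get(value, str(value))
        totals[label] = totals.get(label, 0.0) + weight
    return totals


gender = Variable(
    name="Gender",
    label="What is your gender?",
    value_labels={1: "Male", 2: "Female", 9: "Refused"},
    missing_values={9},
)
counts = weighted_frequencies(gender, [1, 2, 2, 9, 1], [1.0, 0.5, 1.5, 2.0, 1.0])
print(counts)  # {'Male': 2.0, 'Female': 2.0}
```

Because the labels and missing codes travel with the data, analysis software can show "Male" rather than 1 and drop refusals automatically, without the analyst re-entering that information.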
Strengths of the file format
The richness of its metadata makes the SPSS data file a good format for storing survey data. This, combined with the format being more than 50 years old, has made it by far the most widely used file format for survey-based data analysis (e.g., psychology, marketing, sociology, politics, market research, social research, polling).
Weaknesses of the file format
- The format has evolved over its 50+ years of existence, which can cause compatibility issues, particularly with text variables.
- Poor support for very long variable and value labels.
- The format is poorly suited to very large data files, because the whole file generally has to be read into memory before it can be analyzed.
- Limited metadata for sets of variables: multiple response sets are supported, but grids are not.
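The kind of grouping a multiple response set provides can be illustrated with a minimal Python sketch of one type of set, a multiple dichotomy set, in which each option is a separate 0/1 variable and the set records which variables belong together. The `MultipleDichotomySet` class, the `tally` function, and the brand data are hypothetical; the set name begins with `$` because SPSS requires multiple response set names to start with that character. A grid question (for example, rating several brands on several attributes) has no equivalent set type, so its variables have to be stored without this kind of grouping metadata.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MultipleDichotomySet:
    """Hypothetical model: variables in the set count as selected when equal to counted_value."""
    name: str
    label: str
    variables: list[str]
    counted_value: int = 1


def tally(mr_set: MultipleDichotomySet, rows: list[dict[str, int]]) -> dict[str, float]:
    """Percentage of respondents selecting each option; totals can exceed 100%."""
    n = len(rows)
    if n == 0:
        return {var: 0.0 for var in mr_set.variables}
    return {
        var: 100.0 * sum(1 for row in rows if row.get(var) == mr_set.counted_value) / n
        for var in mr_set.variables
    }


brands = MultipleDichotomySet(
    name="$brands",
    label="Which brands have you bought?",
    variables=["Coke", "Pepsi", "Fanta"],
)
responses = [
    {"Coke": 1, "Pepsi": 0, "Fanta": 1},
    {"Coke": 1, "Pepsi": 1, "Fanta": 0},
    {"Coke": 0, "Pepsi": 0, "Fanta": 1},
    {"Coke": 1, "Pepsi": 0, "Fanta": 0},
]
print(tally(brands, responses))  # {'Coke': 75.0, 'Pepsi': 25.0, 'Fanta': 50.0}
```

The percentages sum to more than 100% because each respondent can select several brands, which is why software needs to know that these variables form one question rather than three unrelated ones.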