Data files can be created in multiple file formats. Some formats are much better than others. It can sometimes be challenging to obtain the best file format from another person or organization, so it is often useful to be quite structured about the process rather than just requesting any data file.
Desirable file formats (most to least desirable)
Some data files make data analysis relatively easy. Others can add days, weeks, or longer to the time taken to analyze the data, and increase the risk that the data is analyzed incorrectly.
Novice data analysts often just ask for "a data file". The result is that they often end up with a poor data file. To use an analogy, it's a bit like engaging a builder to build a house. If you just say "give me a house" the outcome will be inferior to that obtained if you use an architect and supervise the process carefully.
The file formats below are listed from best to worst. That is, all else being equal, a QPack is highly desirable and an Excel file is undesirable.
- QPack (.QPack): This is a proprietary format for Q and Displayr.
- QDat File (.QDat): This is a proprietary format for Q and Displayr.
- SPSS Data File (.SAV): Despite the name, this is a widely available file format, which can be created and read by all the leading software packages.
- Triple S Data File (.SSS): This only exists for some data collection platforms.
- SPSS Dimensions/Data Collection File: This is a proprietary data file format that is difficult to obtain. It can only be reliably read by other Dimensions/Data Collection/Unicom Business Intelligence software, although Q and Displayr can sometimes read it.
- CSV File
- Excel File
The importance of file format depends largely on the amount of metadata. For survey data, metadata is extremely important, and it's advisable to get the best file format that you can. In many other applications, such as when analyzing sales data and transaction data, simpler file formats such as CSV and Excel files are often suitable.
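To illustrate why metadata matters for survey data, here is a minimal Python sketch using only the standard library. The question name `q1`, the codes, and the labels are hypothetical and chosen purely for illustration. It writes coded survey answers to a CSV file held in memory and reads them back, showing that only the raw codes survive while the value labels do not.

```python
import csv
import io

# Hypothetical survey question: answers are stored as numeric codes,
# and the meaning of each code lives in separate metadata (value labels).
value_labels = {1: "Dissatisfied", 2: "Neutral", 3: "Satisfied"}
responses = [{"id": 1, "q1": 3}, {"id": 2, "q1": 1}, {"id": 3, "q1": 2}]

# Write the responses to an in-memory CSV file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "q1"])
writer.writeheader()
writer.writerows(responses)

# Read the CSV back, as an analyst receiving the file would.
buffer.seek(0)
rows_from_csv = list(csv.DictReader(buffer))

# Each value comes back as a plain string; the labels are nowhere in the file.
print(rows_from_csv[0])
```

The CSV has no place to store `value_labels`, so whoever receives the file has to obtain the meaning of each code from some other source, such as a questionnaire or a separate data dictionary.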
Difficulties when obtaining a data file from another organization
Obtaining a data file from another person or organization is not always as simple as it may at first appear. The people who are asked to provide the data file:
- May not know how to create a good data file, and so inadvertently provide a poor-quality data file.
- May not want to go to the effort of creating a good data file, so instead do something that is easy for them and extremely problematic for the data analyst.
Process for obtaining the best possible data file
How to resolve these challenges depends on the circumstances. Some tips:
- Work out which of the file formats listed above your data analysis software can read.
- Ask whoever is providing the data which of these formats they support.
- Request the best of the file formats that are available. In general, the higher on the list, the better. That is, an Excel file is basically the worst of the options; you really don't want this if there's any chance of something better, particularly if you are analyzing survey data.
- Import the data file into your analysis software and carefully check the resulting data set. See How to Check a Data Set.
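The first three tips amount to picking the highest-ranked format that both sides support. A minimal Python sketch of that selection is shown here. The function name, the format labels, and the example lists of supported formats are assumptions made for illustration; only the ranking order comes from this article.

```python
# Formats in this article's order, most desirable first.
FORMAT_RANKING = [
    ".QPack",
    ".QDat",
    ".SAV",
    ".SSS",
    "SPSS Dimensions/Data Collection",
    "CSV",
    "Excel",
]


def best_available_format(readable_by_software, offered_by_provider):
    """Return the highest-ranked format both sides support, or None."""
    readable = set(readable_by_software)
    offered = set(offered_by_provider)
    for fmt in FORMAT_RANKING:
        if fmt in readable and fmt in offered:
            return fmt
    return None


# Hypothetical example: what your software reads vs. what the provider offers.
readable = [".SAV", ".SSS", "CSV", "Excel"]
offered = [".SAV", "CSV", "Excel"]
choice = best_available_format(readable, offered)
print(choice)
```

In this hypothetical case the request would be for an SPSS data file rather than a CSV or Excel file. The final tip still applies: whichever format is chosen, the imported data set needs to be checked.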