It's just about always beneficial to take the time to create an analysis plan prior to actually looking at your data and, better yet, prior to collecting any data. This article discusses:
- When to create an analysis plan?
- What's in an analysis plan?
- Simple example
When to create an analysis plan?
The best practice is to create an analysis plan prior to writing the questionnaire, as the process of writing an analysis plan is often helpful in working out which questions need to be asked.
If the data has already been collected and there is no analysis plan, it is still a good idea to create one, as thinking systematically about what to do, and in what order, tends to make the whole process much easier.
It is difficult to exaggerate the usefulness of creating an analysis plan. It is unfortunately the rule rather than the exception that when people skip this stage and "jump into" the analysis, they end up generating lots of uninteresting analyses. The act of going back to the goals of the study and working out how to achieve them in an analysis plan is a wonderful way of concentrating the mind.
Of course, even organized researchers are human and take a peek at key results before finalizing their analysis plan.
What's in an analysis plan?
The level of detail in an analysis plan depends on who is going to perform the work. If the person who is creating the analysis plan will also be implementing the plan, the plan may just be a few bullet points. If the analysis is being done by somebody else, a much more detailed analysis plan is the norm, and most companies have standard forms (typically in Excel) specifying the desired structure of the analysis plan. These analysis plans are often called table specs.
The first part of an analysis plan dictates any checking that is required. For example:
- Check the data consist of the correct number of completed responses.
- Check age, gender, and geography match the required quotas (e.g., are similar to Census projections).
For more information, see How to Check a Data Set.
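If the checks are to be automated, they take only a few lines of code. The Python sketch below is illustrative only. It assumes each respondent is stored as a dictionary keyed by variable name (for example, "gender"), that quota targets are given as proportions, and that a 5-percentage-point tolerance is acceptable. The function names and the tolerance are hypothetical.

```python
from collections import Counter

def check_sample_size(responses, required_n):
    """True if the number of completed responses matches the target."""
    return len(responses) == required_n

def check_quotas(responses, variable, targets, tolerance=0.05):
    """Compare observed proportions of `variable` with target proportions.

    targets: {category: target proportion}, e.g. from Census projections.
    tolerance: hypothetical allowed gap (0.05 = 5 percentage points).
    Returns {category: (observed, target, within_tolerance)}.
    """
    counts = Counter(r[variable] for r in responses)
    n = len(responses)
    report = {}
    for category, target in targets.items():
        observed = counts.get(category, 0) / n
        report[category] = (observed, target, abs(observed - target) <= tolerance)
    return report
```

For example, `check_quotas(responses, "gender", {"Female": 0.51, "Male": 0.49})` reports, for each category, the observed proportion, the target, and whether the gap is within the tolerance.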
The plan then specifies any data cleaning. For example:
- Delete speeders (respondents who completed the questionnaire in less than 2 minutes).
- Delete flat-liners (respondents who gave the same answer to all the TV shows in Q15).
For more information, see How to Clean and Tidy Data.
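Cleaning rules like these translate directly into filters. This is a minimal Python sketch, assuming a hypothetical record layout in which `duration_sec` holds the completion time in seconds and `q15` holds the list of ratings given to the TV shows in Q15.

```python
SPEEDER_THRESHOLD_SEC = 120  # "less than 2 minutes", as in the plan

def is_speeder(respondent, threshold=SPEEDER_THRESHOLD_SEC):
    """True if the questionnaire was completed faster than the threshold."""
    return respondent["duration_sec"] < threshold

def is_flat_liner(respondent, grid_key="q15"):
    """True if the respondent gave the same answer to every item in the grid."""
    answers = respondent[grid_key]
    return len(answers) > 1 and len(set(answers)) == 1

def clean(responses):
    """Keep only respondents who are neither speeders nor flat-liners."""
    return [r for r in responses if not is_speeder(r) and not is_flat_liner(r)]
```

The `len(answers) > 1` guard is a design choice: a grid with a single item cannot meaningfully reveal flat-lining.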
Analysis plans typically specify how to weight the data to correct skews that over- or under-represent key subgroups (e.g., "Weight by age, gender, and region to 2020 Census data").
For more information, see How to Weight a Survey.
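One common way to weight to several targets at once is raking (iterative proportional fitting), which repeatedly rescales the weights to match each variable's targets in turn. The sketch below is one possible approach, not the only way to weight. It uses plain Python, assumes each variable's target proportions sum to 1, and assumes every target category is present in the sample.

```python
def rake(responses, targets, max_iter=100, tol=1e-6):
    """Raking (iterative proportional fitting) weights.

    responses: list of dicts, e.g. {"gender": "Female", "region": "North"}.
    targets: {variable: {category: target proportion}}.
    Returns one weight per respondent; the weights sum to the sample size.
    """
    n = len(responses)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for variable, target in targets.items():
            # Current weighted total in each category of this variable.
            totals = {}
            for r, w in zip(responses, weights):
                totals[r[variable]] = totals.get(r[variable], 0.0) + w
            # Rescale so each category's weighted total matches its target.
            for i, r in enumerate(responses):
                cat = r[variable]
                new_w = weights[i] * target[cat] * n / totals[cat]
                max_change = max(max_change, abs(new_w - weights[i]))
                weights[i] = new_w
        if max_change < tol:  # weights have stopped changing
            break
    return weights
```

Dedicated survey software usually also handles issues such as trimming extreme weights, which this sketch ignores.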
The data preparation section of an analysis plan will typically indicate what text data needs to be coded. If somebody else is doing the analysis, it will also include detail on categories to be merged and new variables to be created. For example:
- Stack the data.
- Code the text data in Q9 and Q10 (multiple response coding).
- Calculate Top 2 Box scores (Definitely buy + Probably buy) for Q11 and Q14.
- Recode income as numeric.
- Have three versions of Q15:
  - Separate categorical variables.
  - Numeric - Multi.
For more information, see How to Clean and Tidy Data.
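Two of the preparation steps above, Top 2 Box scores and recoding income as numeric, can be sketched in Python as follows. The response labels, income bands, and midpoint values are all hypothetical; in particular, the value assigned to the open-ended top band is an assumption the analysis plan would need to specify.

```python
# Hypothetical scale labels for Q11 and Q14 (purchase intent).
TOP_2_BOX = {"Definitely buy", "Probably buy"}

def top_2_box_score(answers):
    """Percentage of answers in the top two scale points."""
    return 100.0 * sum(a in TOP_2_BOX for a in answers) / len(answers)

# Hypothetical income bands mapped to approximate numeric midpoints.
INCOME_MIDPOINTS = {
    "Under $25,000": 12_500,
    "$25,000 to $49,999": 37_500,
    "$50,000 to $99,999": 75_000,
    "$100,000 or more": 125_000,  # open-ended band: value is an assumption
}

def recode_income(label):
    """Numeric income for a band label; None for bands with no midpoint
    (e.g., "Prefer not to say"), so they are treated as missing."""
    return INCOME_MIDPOINTS.get(label)
```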
Analysis plans for advanced analyses typically list the techniques to be used, the data to be analyzed, and any specific output requirements. For example:
- Create a correlation matrix of the Numeric - Multi variables of Q15.
- Perform a correspondence analysis of a square table on the correlation matrix.
- Use Hierarchical Bayes to analyze the MaxDiff data in Q16, creating new variables showing the utility of each alternative.
- Deep dive.
For more information, see Advanced Analysis.
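Of the advanced analyses listed, the correlation matrix is the simplest to illustrate. This Python sketch computes the Pearson correlation between every pair of variables, assuming each Numeric - Multi variable of Q15 is supplied as a list of numbers (one per respondent, with no missing values). The names are hypothetical, and a variable with no variation would cause a division by zero.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """columns: {variable name: list of values, one per respondent}.
    Returns a nested dict: matrix[a][b] is the correlation of a with b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```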
Planned tables and visualizations
The main part of an analysis plan typically consists of the tables that need to be created, commonly referred to as the table spec. You usually start with a plan of the tables you want to create and then, as you view them, get ideas for further analyses.
Key planned analyses
In a well-designed study, there is a clear set of objectives and hypotheses, and the questions are designed to test them. The most important part of an analysis plan is the list of tables needed to address the objectives and test the hypotheses. For example:
- Calculate the percentage of people who say they will definitely buy, and compare it with our threshold of 25%.
- Calculate % Definitely will buy for Q14 (purchase intent).
- Calculate Top 2 Box % for Q14.
- Compare Top Box and Top 2 Box scores for Q11 and Q14.
- Calculate the correlation of Q14 with Q12 and Q13.
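Several of these planned analyses reduce to simple proportions. The sketch below computes Top Box and Top 2 Box percentages and compares the proportion who say they will definitely buy with a 25% threshold using a one-sample z-test. The choice of test is an assumption (it relies on the normal approximation, which is reasonable for large samples), and the response labels are hypothetical.

```python
from math import erf, sqrt

TOP_BOX = {"Definitely buy"}
TOP_2_BOX = {"Definitely buy", "Probably buy"}

def pct_in(answers, categories):
    """Percentage of answers falling in `categories`."""
    return 100.0 * sum(a in categories for a in answers) / len(answers)

def box_scores(answers):
    """Top Box and Top 2 Box percentages for one question."""
    return {"Top Box %": pct_in(answers, TOP_BOX),
            "Top 2 Box %": pct_in(answers, TOP_2_BOX)}

def compare_to_threshold(answers, threshold=0.25):
    """One-sample z-test of the Top Box proportion against `threshold`.

    Returns (observed proportion, z statistic, two-sided p-value).
    """
    n = len(answers)
    p_hat = sum(a in TOP_BOX for a in answers) / n
    se = sqrt(threshold * (1 - threshold) / n)  # standard error under the null
    z = (p_hat - threshold) / se
    # Two-sided p-value from the standard normal CDF, written with erf.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_hat, z, p_value
```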
If somebody other than you is going to create the tables, it can be useful to provide more detail, specifying:
- Rows: any rows to be merged, any nets, and any specific summary statistics to be calculated.
- Columns: any columns to be created, and any nets.
- Statistics (e.g., "Show counts and column percentages, with the mean at the bottom").
- Statistical testing (e.g., "Compare each column with Coca-Cola").
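A spec such as "show column percentages and compare each column with Coca-Cola" can be sketched as follows. This Python example assumes the columns are mutually exclusive groups of respondents, so that an independent two-proportion z-test is appropriate; other designs need other tests. The function names and the second brand are hypothetical.

```python
from collections import Counter
from math import sqrt

def crosstab_column_pct(rows, cols):
    """Column percentages for a crosstab of `rows` by `cols`.

    rows, cols: parallel lists of category labels, one pair per respondent.
    Returns ({column: {row: percentage}}, {column: base}).
    """
    col_n = Counter(cols)
    cell_n = Counter(zip(rows, cols))
    row_cats = sorted(set(rows))
    table = {c: {r: 100.0 * cell_n[(r, c)] / col_n[c] for r in row_cats}
             for c in col_n}
    return table, dict(col_n)

def z_vs_reference(table, bases, row, reference):
    """Two-proportion z statistic comparing each column's percentage for
    `row` with the `reference` column (assumes independent groups)."""
    p_ref, n_ref = table[reference][row] / 100, bases[reference]
    out = {}
    for col, pcts in table.items():
        if col == reference:
            continue
        p, n = pcts[row] / 100, bases[col]
        pooled = (p * n + p_ref * n_ref) / (n + n_ref)
        se = sqrt(pooled * (1 - pooled) * (1 / n + 1 / n_ref))
        out[col] = (p - p_ref) / se
    return out
```

If both columns are 0% or 100% for the row, the standard error is zero and the test is undefined; a production version would need to handle that case.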
Where the required tables are complicated, it is useful to create a dummy table: a mock-up of the entire table, with blank cells where the calculated numbers will go.
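A dummy table can be as simple as a text shell with the labels filled in. This hypothetical Python helper renders one, using "__" as the placeholder for each number still to be calculated; the column width is an arbitrary choice.

```python
def dummy_table(row_labels, col_labels, blank="__", col_width=12):
    """Return a plain-text table shell: labels filled in, numbers left blank.

    row_labels must be non-empty; col_width is a hypothetical default.
    """
    label_width = max(len(r) for r in row_labels) + 2
    header = " " * label_width + "".join(f"{c:>{col_width}}" for c in col_labels)
    body = [f"{r:<{label_width}}" + "".join(f"{blank:>{col_width}}" for _ in col_labels)
            for r in row_labels]
    return "\n".join([header] + body)
```

For example, `print(dummy_table(["Definitely buy", "Probably buy", "NET: Top 2 Box"], ["Total", "Coca-Cola"]))` prints a header row followed by one row per label, each with two blank cells.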
It's almost always useful to specify additional exploratory tables to look through in case they reveal interesting results that weren't hypothesized. Standard exploratory tables are:
- Summary tables of all the questions in the survey.
- Crosstabs of the key questions (e.g., purchase intent) by every other question in the survey.
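Because these exploratory tables follow a fixed pattern, they are easy to generate in bulk. This Python sketch assumes the data is held as a dictionary mapping each question to a list of answers (one per respondent, in the same order); the layout and function names are hypothetical.

```python
from collections import Counter

def summary_tables(data):
    """Frequency counts for every question in the survey."""
    return {q: Counter(answers) for q, answers in data.items()}

def crosstabs_by_key(data, key):
    """Crosstab of `key` (e.g., purchase intent) by every other question.

    Returns {other question: Counter of (key answer, other answer) pairs}.
    """
    return {
        q: Counter(zip(data[key], answers))
        for q, answers in data.items()
        if q != key
    }
```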
Visualizations are occasionally included in the analysis plan, although in survey research visualization is more commonly part of reporting than of the analysis process.