Understanding Conversion Rates Over Time – The Data Story Guide

The simplest way of calculating conversion rates - dividing one number by another - is often wrong. This article presents:

A simple example of a naive conversion calculation
A better conversion rate estimate - the Kaplan-Meier Estimate
Conversion curves for exploring conversion over time
Comments

A simple example of a naive conversion calculation

The table below shows some hypothetical data for a sales rep. The sales rep had ten meetings with clients across five days. Four meetings led to purchases.

The problem to solve is: what's our best estimate of the conversion rate if we calculate the rate on the 5th of January?

The naive way to calculate conversion is as the percentage of meetings that led to a purchase. In the example above, that is 4/10 = 40%.

A better conversion rate estimate - the Kaplan-Meier Estimate

Based on the data provided, a better estimate of conversion is 54%. This better method is known as the Kaplan-Meier Estimate.

The naive estimate ignores the role of time. It ignores the fact that the conversion rate is being calculated on the 5th of January, and it implicitly assumes that the six meetings that have not yet led to purchases will never lead to purchases.

The table below adds some extra columns that help to get a better understanding of the data. The date of the calculation is shown in the Today column. The last two columns show the number of days between Today and the Meeting Date and between the Purchase Date and the Meeting Date.

The table below summarizes the information in the last two columns. The first row shows us the information about people that took 0 days to convert. We can see that:

One meeting converted (# Converted). This is the second meeting from the 4th of January.
Ten meetings had a chance to convert (# Available). That is, it was possible for all 10 to convert in 0 days.
% Conversion for meetings with Days to Convert of 0 is then # Converted / # Available = 1/10 = 10%.

The next row is a little less obvious:

As the table above shows, one meeting did convert after one day, so # Converted is 1.
# Available is now 7. While there are ten meetings in the data set in total:
- One of the meetings converted after 0 days, so it is excluded
- The last two meetings were only observed on the 5th of January, so we have no data that tells us whether they did or did not convert after one day, so they are also excluded.
% Conversion = 1/7 = 14.29%
As 10% converted on day 0, 10% fewer are available, so % Available is 90%.
The percentage to convert over days 0 and 1 (% Cumulative) is then 22.86%, as:
- 14.29% of those available converted on day 1
- But, only 90% were available, so the real percentage that convert on day 1 is 90%*14.29%
- 10% had already converted on day 0, so we need to add that, leading to 10% + 90% * 14.29% = 22.86%
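
The day 0 and day 1 arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation described in the text; the variable names are illustrative and not part of any library.

```python
# Counts for day 0 and day 1, as described in the text.
converted_day0, available_day0 = 1, 10
converted_day1, available_day1 = 1, 7

rate_day0 = converted_day0 / available_day0   # share converting on day 0
rate_day1 = converted_day1 / available_day1   # share of those available converting on day 1
still_available = 1 - rate_day0               # % Available going into day 1

# % Cumulative: day 0 conversions plus day 1 conversions among those still available.
cumulative_day1 = rate_day0 + still_available * rate_day1

print(f"{rate_day1:.2%}")        # 14.29%
print(f"{cumulative_day1:.2%}")  # 22.86%
```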

The logic for day 1 is repeated for day 2. As no meeting in the data converted after more than two days, the % Cumulative number for the last row of the table is our estimate of the true conversion rate, and it is 53.71% ≈ 54%.
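
The row-by-row logic can be sketched as a small Python function that works through the summary table one day at a time. The function name `cumulative_conversion` is made up for this illustration. The day 2 counts (2 converted out of 5 available) are not stated in the text; they are inferred here as the counts that reproduce the 53.71% result.

```python
def cumulative_conversion(rows):
    """Return % Cumulative after each day.

    rows: list of (converted, available) tuples, one per day, in day order.
    """
    not_converted = 1.0  # share that has still not converted
    curve = []
    for converted, available in rows:
        # Of those still unconverted, the fraction converted/available converts today.
        not_converted *= 1 - converted / available
        curve.append(1 - not_converted)
    return curve


# (# Converted, # Available) for days 0, 1 and 2.
# The day 2 row is an assumption inferred from the 53.71% figure.
rows = [(1, 10), (1, 7), (2, 5)]

curve = cumulative_conversion(rows)
print([f"{c:.2%}" for c in curve])  # ['10.00%', '22.86%', '53.71%']
```

Multiplying the shares that do not convert each day and subtracting the product from 1 gives the same result as adding up each day's conversions weighted by % Available, for example 1 - 90% * (1 - 14.29%) = 22.86% for day 1.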

Conversion curves for exploring conversion over time

While the table above can be used to produce a single estimate of conversion, it also shows how conversion builds up over time. Plotting % Cumulative against Days to Convert gives a conversion curve, and we can read values off such a curve. For example, even though the table above stops at two days, the curve continues, telling us that the conversion rate remains at 54% from day 2 through to day 4 based on the available data.
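
Reading a value off the curve amounts to looking up a step function: conversion stays flat until the next day on which a purchase is observed. A minimal sketch, assuming the curve is stored as (day, % Cumulative) pairs taken from the table and that observation ends at day 4; `steps`, `last_observed_day` and `conversion_at` are illustrative names only.

```python
from bisect import bisect_right

# (day, cumulative conversion) at each day where the curve steps up.
steps = [(0, 0.10), (1, 0.2286), (2, 0.5371)]
last_observed_day = 4  # per the text, the curve runs through day 4


def conversion_at(day):
    """Read the cumulative conversion off the step curve for a given day."""
    if day < 0 or day > last_observed_day:
        raise ValueError("day is outside the observed range")
    days = [d for d, _ in steps]
    # Index of the most recent step at or before `day`.
    index = bisect_right(days, day) - 1
    return steps[index][1]


print(conversion_at(1))  # 0.2286
print(conversion_at(3))  # 0.5371 (flat after day 2)
```

The lookup refuses to extrapolate beyond the last observed day, since the data say nothing about conversion after that point.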

The vertical lines on the conversion curve indicate reductions in the sample size (# Available). The smaller the sample size, the less precise the results. So, while we have no data indicating that conversion increases after day 2, the estimate rests on progressively less data and we should be cautious about it.

Comments

A few comments:

In real-world applications, there is much more data, and you typically compute the Kaplan-Meier Estimate using specialist software.
The naive conversion rate is not always wrong. In the example above, if the calculations are performed on the 7th of January, rather than the 5th, and no additional purchases occur before then, the Kaplan-Meier Estimate is the same as the naive estimate of 40%.
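
The effect of the calculation date can be sketched by working from one record per meeting instead of the summary table. The per-meeting records below are hypothetical: the original table is not reproduced here, so they are constructed only to match the counts in the text (in particular, the observation lengths of 2, 3 and 4 days for three of the unconverted meetings are an assumption). The function name `kaplan_meier_conversion` is likewise illustrative.

```python
from collections import Counter


def kaplan_meier_conversion(records):
    """Estimate final conversion from per-meeting records.

    records: list of (days, converted). For converted meetings, days is the
    number of days to purchase; otherwise it is the days observed so far.
    """
    conversions = Counter(days for days, converted in records if converted)
    not_converted = 1.0
    for day in sorted(conversions):
        # A meeting is available on `day` if it was followed for at least that long.
        available = sum(1 for days, _ in records if days >= day)
        not_converted *= 1 - conversions[day] / available
    return 1 - not_converted


purchases = [(0, True), (1, True), (2, True), (2, True)]

# Hypothetical days observed for the six unconverted meetings as of the 5th.
observed_on_5th = [0, 0, 1, 2, 3, 4]
on_5th = purchases + [(d, False) for d in observed_on_5th]

# Two days later, with no new purchases, each has been observed two days longer.
on_7th = purchases + [(d + 2, False) for d in observed_on_5th]

print(f"{kaplan_meier_conversion(on_5th):.2%}")  # 53.71%
print(f"{kaplan_meier_conversion(on_7th):.2%}")  # 40.00%
```

By the 7th every meeting has been observed for at least two days, so # Available is 10, 9 and 8 on days 0, 1 and 2, and the estimate collapses to the naive 4/10.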
The naive conversion calculation is most problematic when comparing conversion rates over time. For example, if you compare conversion from a time period just finished with an earlier time period, you often draw the wrong conclusions, because recent meetings have had less time to convert and so the naive estimate is lower the more recent the time period.
When comparing conversion rates at different points in time, the easiest approach is to choose a relevant number of days to convert and compare the proportion of people who convert within, say, 30 days. This is done by comparing conversion curves.
The example above uses a very small data set to make the calculations simple. A sample of 10 meetings is too small to be relied upon. When using very small samples, more advanced approaches to calculating conversion curves (e.g., using Cox regression) become beneficial.