The Wrong Way To Analyze Experiments

One of the biggest mistakes I see Growth teams make when analyzing experiments is focusing too much on percentage gains. I’ve seen it time and time again: Growth teams pat themselves on the back for a 30% increase in this metric or a 15% increase in that metric. In this post, I’ll dissect the problem with percentage gains and explain why Growth teams should avoid using them to assess the overall impact of an experiment.

The Problems

The first problem with reporting percentage gains is that every experiment has an audience bias. By audience bias, I mean that the characteristics of the users in an experiment (demographics, engagement levels, etc.) can differ significantly from one experiment to another. For instance, let’s say I run an experiment where I send users a notification when someone likes their post. That experiment is naturally going to have an audience biased heavily towards active users who are posting content, because you need to be posting content in order to be eligible to receive a notification about someone liking your post. In that hypothetical experiment, I might see only a small increase (e.g., 1%) in daily active users (DAUs) because many people in the audience might already be DAUs. However, if I run another experiment to send an email to re-engage dormant users, I might see a 150% increase in DAUs because so few of the users in the experiment would become a DAU organically.

The second issue with percentage gains is that they obscure the true business impact. Going back to the two examples above, while the notification experiment had a 1% gain and the email experiment had a 150% gain, we actually have no way of telling which experiment was more impactful for the business. The critical piece of data that is missing is the baseline that the percentage gain applies to. The 1% gain could be on a population with 10 million DAUs, which means the experiment netted us 100,000 incremental DAUs. Conversely, the 150% gain might be on a population of 10 million dormant users of which only 20,000 were coming back organically. A 150% gain on that baseline means 50,000 users came back in total, so the experiment netted us only 30,000 incremental DAUs.
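To make the arithmetic concrete, here is a minimal sketch in Python that converts a percentage gain into absolute incremental users. The function name `incremental_users` is a hypothetical helper, not something from an actual analysis pipeline, and the inputs are the hypothetical baselines from the two examples: 10 million existing DAUs for the notification experiment and 20,000 organically returning users for the email experiment.

```python
def incremental_users(baseline_users: float, pct_gain: float) -> float:
    """Absolute users added by a percentage gain over the control baseline.

    baseline_users: how many users would have been active without the
                    experiment (the control group's count).
    pct_gain:       the reported lift, in percent (150 means +150%).
    """
    return baseline_users * pct_gain / 100.0


# Notification experiment: 1% gain on a baseline of 10 million DAUs.
notification = incremental_users(10_000_000, 1)

# Email experiment: 150% gain on a baseline of 20,000 organic returners.
email = incremental_users(20_000, 150)

print(f"notification: {notification:,.0f} incremental DAUs")  # 100,000
print(f"email:        {email:,.0f} incremental DAUs")         # 30,000
```

Under these assumptions, the experiment with the unimpressive 1% gain adds more than three times as many DAUs as the one with the 150% gain.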

An Example

When I first started on the Growth team at Shopkick, we made the mistake of looking exclusively at percentage gains. During the first several months, the Growth team shipped a lot of experiments that had great percentage gains (a 70% increase in signups coming from Facebook posts, a 50% increase in in-store visits, etc.). About five months after forming the Growth team, we realized we weren’t actually seeing our topline metrics move. Going back over the data, we realized that a lot of those big percentage gains we posted in earlier experiments weren’t actually impacting the bottom line. For instance, the experiment that delivered a 70% increase in signups was in reality adding only 20 incremental signups a day. Looking at percentage gains had been blinding us, and we needed to focus on absolute numbers if we wanted to gauge the true impact.

Not All Bad

So, are percentage gains completely worthless? While I just spent most of the post railing against percentage gains, there are cases where they are helpful. In particular, they can be used to gauge the effectiveness of an experiment by comparing the percentage increase in one metric relative to another. For instance, if you send 10% more email to dormant users and that results in a 20% gain in DAUs, that’s a good indicator that users really like the email. The reverse (say, 20% more email for only a 10% gain in DAUs) might indicate the email isn’t that great and you’re just making up for it in volume. Just don’t use those percentage gains when reporting the experiment’s impact.
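One rough way to express that comparison is as a ratio of the two percentage gains. The sketch below is only an illustration: `gain_ratio` is a hypothetical name, the 10%/20% figures come from the example above, and the reversed 20%/10% case is an assumed counterexample rather than real data.

```python
def gain_ratio(metric_gain_pct: float, volume_gain_pct: float) -> float:
    """Percentage gain in the outcome metric per percentage gain in volume.

    A ratio above 1 suggests the outcome is growing faster than the volume
    sent; a ratio below 1 suggests the gain is mostly bought with volume.
    """
    if volume_gain_pct == 0:
        raise ValueError("volume gain must be non-zero")
    return metric_gain_pct / volume_gain_pct


# 10% more email, 20% more DAUs: users seem to like the email.
liked = gain_ratio(metric_gain_pct=20, volume_gain_pct=10)

# Assumed reverse case: 20% more email, only 10% more DAUs.
padded = gain_ratio(metric_gain_pct=10, volume_gain_pct=20)

print(liked, padded)  # 2.0 0.5
```

The ratio is a diagnostic for the quality of the email itself, not a measure of how much the experiment moved the business.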

Wrap Up

About a year and a half ago, at Pinterest we shifted to using absolute numbers on the Growth team when reporting results. It has helped us compare and measure the true business impact of experiments that range across many different surface areas of the product and many different segments of users. Absolute numbers have also helped us more concretely measure a team’s output, by summing up the absolute impact of all the experiments they shipped that quarter. In Growth, the number 1 priority is driving impact, and if you’re not using absolute numbers to measure your output, then you can’t be sure you’re doing that.
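Summing absolute impact per team is straightforward once every experiment reports an incremental count in the same unit. The sketch below assumes a simple list of (team, experiment, incremental DAUs) records. The team names, experiment names, and numbers are illustrative placeholders, not results from Pinterest or any real dashboard.

```python
from collections import defaultdict

# (team, experiment, incremental DAUs). All values are illustrative placeholders.
shipped_this_quarter = [
    ("notifications", "like notification", 100_000),
    ("email", "dormant user re-engagement", 30_000),
    ("email", "send-time tweak", 5_000),
]


def impact_by_team(results):
    """Sum the absolute incremental DAUs of each team's shipped experiments."""
    totals = defaultdict(int)
    for team, _experiment, incremental_daus in results:
        totals[team] += incremental_daus
    return dict(totals)


totals = impact_by_team(shipped_this_quarter)
for team, total in sorted(totals.items()):
    print(f"{team}: {total:,} incremental DAUs")
```

This kind of roll-up only works with absolute numbers: percentage gains measured on different audiences cannot be added together in any meaningful way.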

John Egan

Growth Engineer @Character.AI
