Why You Should Be A/B Testing Your Infrastructure

The benefits of using a data-driven approach to product development are widely known. Most companies understand the benefits of running an A/B experiment when adding a new feature or redesigning a page. While engineers and product managers have embraced a data-driven approach to product development, few think to apply it to backend development. We’ve applied A/B testing to major infrastructural changes at Pinterest and have found it extremely helpful in validating that those changes have no negative user-facing impact.

Bugs are simply unavoidable when it comes to developing complex software. It’s often hard to prove you’ve covered all possible edge cases, all possible error cases and all possible performance issues. However, when replacing or re-architecting an existing system, you have the unique opportunity to prove that the new system is at least as good as the one it’s replacing. For rapidly growing companies like Pinterest, the need to re-architect or replace one component of our infrastructure arises relatively frequently. We rely heavily on logging, monitoring and unit tests to ensure we’re creating quality code. However, we also run A/B experiments whenever possible as a final step of validation to ensure there’s no unintended impact on Pinners. The way we run the experiment is pretty simple: half of Pinners are sent down the old code path and hit the old system and the other half use the new system. We then monitor the results to make sure there’s no impact across all our key metrics for Pinners in the treatment group. Here are the results of three such experiments.
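
Before turning to those results, the split itself can be sketched in a few lines. This is a minimal illustration, not Pinterest’s actual experiment framework: the function names, the experiment name and the choice of SHA-256 hashing are all assumptions made for the example. Hashing the user ID together with the experiment name keeps each Pinner in the same group on every request without having to store the assignment anywhere.

```python
import hashlib


def assign_group(user_id: str, experiment: str, treatment_fraction: float = 0.5) -> str:
    """Deterministically place a user in 'treatment' or 'control'."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "treatment" if bucket < treatment_fraction else "control"


def fetch_user(user_id: str) -> str:
    """Route a request down the old or new code path (both paths are stubs here)."""
    if assign_group(user_id, "new_user_backend") == "treatment":
        return f"new system handled {user_id}"
    return f"old system handled {user_id}"
```

Because the experiment name is part of the hash key, a Pinner who lands in the treatment group for one experiment is not automatically in the treatment group for the next one.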

2013: A new web framework

Our commitment to A/B testing infrastructural changes was forged in early 2013 when we rewrote our web framework. Our legacy code had grown increasingly unwieldy over time, and its functionality was beginning to diverge from that of our mobile apps because it ran through completely independent code paths. So we built a new web framework (code-named Denzel) that was modular and composable and consumed the same API as our mobile clients. At the same time we redesigned the look and feel of the website.

When it came time to launch, we debated extensively whether we should run an experiment at all, since we were fully organizationally committed to the change and hadn’t yet run many experiments on changes of this magnitude. But when we ran the experiment on a small fraction of our traffic, we discovered not only major bugs in some clients we hadn’t fully tested but also that some minor features we hadn’t ported over to the new framework were in fact driving significant user engagement. We reinstated these features and fixed the bugs before fully rolling out the new website, which gave Pinners a better experience and allowed us to understand our product better at the same time.

This first trial by fire helped us establish a broad culture of experimentation and data-driven decision-making, as well as learn to break down big changes into individually testable components.

2014: Pyapns

We rely on an open-source library called pyapns for sending push notifications via Apple’s servers. The project was written several years ago and wasn’t well maintained. Based on our data and what we’d heard from other companies, we had concerns about its reliability. We decided to test out using a different library called PyAPNs, which seemed better written and better maintained. We set up an A/B experiment, monitored the results and found that there was a 1 percent decrease in our visitors with PyAPNs. We did some digging and couldn’t determine the cause for the drop, so we eventually decided to roll back and stick with pyapns.

Figure 1: Experiment results for replacing pyapns

2015: User Service

We’ve slowly been moving towards a more service-oriented architecture. Recently we extracted a lot of our code for managing users and encapsulated it into our new UserService. We took an iterative approach to building the service, extracting one piece of functionality at a time. With such a major refactor of how we handle all user-related data, we wanted to ensure nothing broke. We set up an experiment for each major piece of functionality that was extracted, for a total of three experiments. Each experiment completed successfully, showing no drop in any metrics. The results have given us strong confidence that this new UserService is at parity with the previous code.

We’ve had a lot of success with A/B testing our infrastructure. It’s helped us identify when changes have caused a serious negative impact that we probably wouldn’t have noticed otherwise. When the experiments go well, they also give us the confidence that a new system is performing as expected. If you’re not A/B testing your infrastructure changes, you really should be.

By John Egan, a growth engineer at Pinterest, and Andrea Burbank, a data scientist at Pinterest

Acknowledgements: Dan Feng, Josh Inkenbrandt, Nadine Harik, Vy Phan, John Egan and Andrea Burbank for helping run the experiments covered in this post.

Originally published on the Pinterest Engineering Blog

Long-term Impact of Badging

When it comes to growth, one potential pitfall is over-optimizing for short-term wins. Growth teams operate at a pretty fast pace, and our team is no exception. We’re always running dozens of experiments at any given time, and once we find something that works, we ship it and move on to the next experiment. However, sometimes it’s important to take a step back and validate that a new tweak or feature really delivers long-term sustainable growth and isn’t just a short-term win that users will get tired of after prolonged exposure. In this post I’ll cover how we optimize for long-term sustainable growth.

Last year, we started including a badge number with all our push notifications. For many people, when an app has a badge number on it, the impulse to open and clear it is irresistible.

Figure 1: Example of badging on the Pinterest iOS app

When we launched badges, we ran an A/B experiment, as we do with any change, and the initial results were fantastic. Badging showed a 7 percent lift in daily active users (DAUs) and a significant lift in other key engagement metrics such as Pin close-ups, repins and Pin click-throughs. With such fantastic results, we quickly shipped the experiment. However, we had a nagging question about the long-term effectiveness of badging. Is badging effective long-term, or does user fatigue eventually set in and make users immune to it?

To answer this question we created a 1 percent holdout group. A holdout group is an A/B experiment where you ship a feature to 99 percent of Pinners (users) and keep 1 percent from seeing the feature in order to measure the long-term impact. We will typically run a holdout group whenever we have questions about the long-term impact or effectiveness of a particular feature.

We ran a holdout experiment for a little over one year. What we found was that the initial lift of 7 percent in DAUs settled on a long-term baseline of a 2.5 percent lift in DAUs after a couple months (see mobile views in Figure 2). Then last fall we launched a new feature, Pinterest News, a digest of recent activity of the Pinners you follow. As part of News, we would also badge Pinners when there were new News items. As a result, News helped increase the long-term lift of badging from 2.5 percent to 4 percent.

Figure 2: Lift of the badging group over the holdout group for key engagement metrics
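
Because the shipped group is 99 times larger than the holdout, lift has to be computed on rates rather than raw counts. The snippet below is a minimal sketch of that calculation; the function name and the daily counts are hypothetical, chosen only so the result lands near the 2.5 percent baseline mentioned above.

```python
def relative_lift(treated_active: int, treated_total: int,
                  holdout_active: int, holdout_total: int) -> float:
    """Relative lift of the treated group's active rate over the holdout's."""
    treated_rate = treated_active / treated_total
    holdout_rate = holdout_active / holdout_total
    return (treated_rate - holdout_rate) / holdout_rate


# Hypothetical counts for one day: 99% of users see badges, 1% are held out.
lift = relative_lift(treated_active=202_950, treated_total=990_000,
                     holdout_active=2_000, holdout_total=10_000)
print(f"DAU lift: {lift:.1%}")
```

Tracking this number day by day is what reveals whether an initial lift decays as users get used to the feature.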

We also found that badging was effective at increasing engagement levels. We classify Pinners into core, casual, marginal, etc., and we found badging had a statistically significant impact on attracting those who would have fallen in the marginal or dormant bucket to instead become core or casual users. This finding was compelling since it proved that badging is effective at improving long-term retention.

Figure 3: The badging group drove more Pinners into higher engagement buckets when compared to the holdout group

Holdout groups have been an effective way for us to ensure we’re building for long-term growth. We also have holdout groups for features like ads, user education, etc. In general, holdout groups should be used anytime there is a question about the long-term impact of a feature. In the case of badging, the holdout group allowed us to understand how Pinners responded to badging over a prolonged period of time, which will help inform our notification strategy going forward.

This article was also published to the Pinterest Engineering Blog

The 27 Metrics in Pinterest’s Internal Growth Dashboard

One question I often get asked by people starting out on growth is “what metrics should be in my growth dashboard?”. I’ve written before about what metrics we value at Pinterest. In this post however, I’ll give people a peek behind the scenes and share what our internal growth dashboard looks like.

We have organized our dashboard to reflect our user growth model. We start with our top line growth metric of MAUs. Then we follow the user lifecycle funnel, starting with acquisition metrics, followed by activation, engagement, and finally resurrection.

Pinterest Growth Model



1. Current progress to goal: Current number of MAUs & how much progress we’ve made towards our quarterly MAU goal.

2. MAU Forecast: Forecast of the number of MAUs we could expect to have extrapolated from our growth rate at the same time during previous years. We include this metric to help us anticipate the effect of seasonality on our growth numbers.

3. MAUs by app

4. MAUs by gender

5. MAUs by country: Tracking total number of MAUs in every single country would obviously be overwhelming to view on a chart, so instead we bucket countries together. The buckets we use are USA, Tier 1, Tier 2, Tier 3, and Rest of World. The tiers are based on size of Internet population, Internet ad spending, etc.

6. MAU Accounting: The MAU accounting helps us see what factors are contributing the most to our MAU growth. Specifically we split out total number of signups, resurrections, existing users churning out and new users churning out.

Growth MAU Accounting



7. Total signups

8. Signups by app

9. Signups by referrer

Signups by referrer

10, 11, 12. Invites Sent, Unique Invite Senders, and Invite Signups



13. Overall Activation Rate: 1d7s is a term we use to refer to users who come back 1 or more times in the week following signup. We measure overall activation rate as 1d7s/signups, or in other words, the percentage of new signups that visit Pinterest again in the week following signup.

14. Activation by app: This is the same metric of 1d7s/signups split out by platform. We’ve seen that different platforms can actually have pretty dramatic differences in activation rates.

15. Activation by referrer source

16. Activation by gender

Activation rate by gender

17. Overall signups to 1rc7: This metric is similar to signups to 1d7, except it measures the percentage of new signups that repinned a pin or clicked on a pin in the week following signup. We use this metric as a leading indicator of how well we are activating users into the highly engaged user buckets.

18. 1rc7s by app

19. Signups to 1rc7 ratio by app

20. Signups to engagement funnel by app: This metric tracks the percentage of new signups that are still doing key actions during a one-week time window of 28-35 days after signup. Specifically, we track 35 days after a user signs up, what percentage of them are still an MAU, WARC (weekly active repinner or clicker), WAC (weekly active clicker), or WAR (weekly active repinner).

21. Signup engagement funnel by gender

22. Signups to WAU 35 days after signup: This is one of our key activation metrics. We track the total percentage of users who are still a WAU one month after signup. Specifically, we look to see what percentage of signups were active between 28-35 days after signup.



23. *AU ratios: We track the ratio of DAUs to MAUs, WAUs to MAUs, and DAUs to WAUs. The ratio between *AUs is a popular metric to gauge how engaged users are with your app.

*AU ratios

24. Email Summary by type: Table of total number of emails sent, opened, & clicked-through split out by email type.

25. Push notification summary by type: Table of total number of push notifications sent & opened split out by type and by platform (iOS & Android).



26. Resurrections by platform: Total number of users that were dormant for 28+ days, but then came back to Pinterest, split out by which platform they came back on.

27. Resurrections by referrer

Retention rates by referrer
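
A few of the metrics above reduce to short calculations. The sketch below covers the 1d7 activation rate (metric 13), the *AU ratios (metric 23) and the resurrection test (metric 26). It is not our production pipeline: the function names and the in-memory data shapes are assumptions for illustration, and it assumes “the week following signup” means days 1 through 7 after the signup day, so a visit on the signup day itself does not count.

```python
from datetime import date, timedelta


def activation_rate_1d7(signups, visits):
    """Share of new signups who come back on at least one of the 7 days after signup.

    signups: dict of user_id -> signup date
    visits:  dict of user_id -> set of dates the user visited
    """
    if not signups:
        return 0.0
    activated = 0
    for user_id, signup_day in signups.items():
        # Days 1..7 after signup; a visit on the signup day itself is not counted.
        window = {signup_day + timedelta(days=n) for n in range(1, 8)}
        if visits.get(user_id, set()) & window:
            activated += 1
    return activated / len(signups)


def au_ratios(dau, wau, mau):
    """The three *AU ratios from metric 23."""
    return {"DAU/MAU": dau / mau, "WAU/MAU": wau / mau, "DAU/WAU": dau / wau}


def is_resurrection(previous_visit, current_visit, dormant_days=28):
    """True if the user returned after being dormant for 28+ days (metric 26)."""
    return (current_visit - previous_visit).days >= dormant_days
```

Splitting any of these by app, gender or referrer is then a matter of filtering the inputs to one segment before calling the same function.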

To wrap up, you can see we put a big emphasis on activation (the process of getting a new user to convert to a MAU). This is because we consider activation critical to long-term sustainable growth. Strong activation rates are necessary if you want to be able to scale a service to hundreds of millions of users. We also put an emphasis on segmentation by gender, country, referrer, etc., to more deeply understand how different segments of users interact with Pinterest and see which segments are underperforming. If you have any questions, feel free to ping me on Twitter or drop me a line.

Growth Hacker TV Episode

Here is my Growth Hacker TV episode. Be sure to view their other great interviews at http://www.growthhacker.tv


Data Driven Growth

Here is the video of a talk I gave recently at the Weapons of Mass Distribution conference run by 500 Startups. I spoke about how you can use data to help drive your growth strategy and figure out which areas of growth you should be focusing on.

Hiring Growth Engineers Is Not Impossible

I read a post recently about how it is impossible to hire growth engineers. I’ve been lucky enough to have the opportunity to work on the growth engineering teams at two successful companies over the past few years. In that time I’ve learned that just like with any engineering role, hiring growth engineers is hard, but it is not impossible. Here’s how:

1. Cultivate Growth Engineers

Whenever any engineer joins a new organization, there is a learning curve and they need to learn & adjust to numerous things. They need to learn the company’s culture, its technology stack, its coding practices, etc. Growth engineering is a relatively new discipline, so the best strategy is to look for engineers who have the qualities that make up a good growth engineer and help cultivate them into successful growth engineers. Teach them the importance of quickly shipping features and learning fast, teach them what MVP means, teach them to analyze the results of A/B experiments.

2. Look for Full Stack Engineers

Growth teams by nature touch the entire technology stack of a product: backend infrastructure, web frontend, Android, and iPhone. However, growth teams are also often extremely resource constrained and have a keen sense of urgency. Experienced engineers that can develop on multiple parts of the stack and quickly learn the parts they don’t know when necessary are critical to a growth team’s success. They give the growth team a huge amount of flexibility to move engineers around when projects/priorities change and also help keep the growth team from getting blocked.

3. Product Sense

Engineers, believe it or not, can have a keen product sense. This is evidenced by the numerous Product Managers I’ve met over the years who had a CS background. Finding engineers who have a passion for being involved in product is important because they will be the engineers that are engaged by the mission and challenges of growth. They will also be the engineers passionate enough to really think about the user’s experience and mindset as they build product features, so they can create a polished experience on their first try.

4. Find Creative Problem Solvers

Oftentimes in growth you have to find creative solutions around certain technical limitations. In my mind this is really where the term “growth hacker” comes into play. Whether it is finding a way to track invites on iOS (which has no attribution tracking for its app store) or figuring out how to generate recommendations for people to send invites to from a user’s contact list, creativity plays a large role in the success or failure of growth features. There is no substitute for creative problem solving; you either have it or you don’t.

Hiring growth engineers can be hard, but it is not impossible. Many of the best growth engineers I know came from standard engineering backgrounds, but they had the right mix of skills and were able to grow into the role. If you think you have these skills and are interested in learning how to grow a company to millions of users, Pinterest is hiring growth engineers.

The Growth Hacker’s Guide to Push Notifications

Push notifications (aka PNSes) can be a powerful re-engagement tool for mobile apps to communicate to their users. In this post, I’ll cover everything a growth hacker needs to know about push notifications on iOS and Android.


One of the biggest differences between the two platforms is the permissions model for push notifications. On iOS, push notifications are opt-in. Users see the all-too-familiar dialog “AppName would like to send you push notifications.” On Android, however, push notifications are opt-out: users have to explicitly go into their settings and turn them off. This difference in permission model ends up having big implications for the effectiveness of PNSes on the different platforms. On Android generally 95%+ of users receive PNSes, while on iOS it is usually less than 50%.


Badging is the term used for the little red circle displayed on app icons in iOS. Badging has traditionally been one of the most powerful ways to use push notifications to re-engage users. One reason for its effectiveness is that many users can’t stand having unread notifications and will open the app to clear the badge number, giving the app a chance to re-engage the user. However, the primary reason badging was so effective is that prior to iOS7, apps that asked for the push notification permission would get a push token even if the user denied the push notification permission. The app could then use this push token to display a badge number and try to re-engage the 50%+ of iOS users that do not receive push notifications. Unfortunately, starting in iOS7 this is no longer the case. The good news, however, is that you can still continue to badge every user that first downloaded the app on iOS6. Another thing worth noting is that badging is no longer exclusive to iOS. Several Android manufacturers have started to add proprietary APIs to support badging on Android.

badge icon

Local Notifications

Local notifications are a way of getting around the permissions issue on iOS. Local notifications look and act like PNSes, but as the name implies, are scheduled by the app to appear at a certain time and date, rather than being pushed to the app at any time by the server. The major advantage of local notifications is that they are not tied to the push notification permission on iOS. This means even if the user does not give the app permission to show push notifications, the app can still show local notifications.
Edit (June 2, 2014): Starting in iOS8 local notifications are no longer a work-around for notification permissions

Geofenced Notifications

Geofenced notifications are when an app monitors the user’s location and shows the user a notification when they are near a place where the app would be useful. A classic example is shopping apps like Shopkick or RetailMeNot, which show notifications when users are at the mall. It can be tricky to get geofencing right; I’ve written before about some of the common pitfalls of geofencing. However, geofencing is still a very powerful tool, especially for location-based apps. Notifying the user when they are at a place where the app has content not only reminds the user to use the app in a situation where the app can add value, but also allows the app to deliver hyper-relevant notifications.

Picture PNSes

Standard push notifications can be difficult to work with since all you have to grab the user’s attention is about 100 characters of text. One thing unique to Android is the ability to include an image that is displayed alongside the push notification. Android Picture Notifications make PNSes richer and more importantly, more engaging.

Emoji characters

While iOS doesn’t support picture notifications, it does have some support to spice up the text in push notifications. On iOS it is possible to include emoji characters in the message to try and make the message stand out and grab the user’s attention.

emoji in push notification
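
To make this concrete, here is a minimal sketch of building an iOS push payload that carries both an emoji and a badge number. The `aps`, `alert` and `badge` keys are the standard APNs payload keys; the message text and badge count are made-up values, and actually delivering the payload requires a connection to Apple’s servers that is not shown here.

```python
import json

# "aps", "alert", and "badge" are standard APNs payload keys; the message
# text and badge count are made-up values for illustration.
payload = {
    "aps": {
        "alert": "\U0001F4CC 5 new Pins picked for you",
        "badge": 5,
    }
}

# ensure_ascii=False keeps the emoji as a real UTF-8 character instead of
# a \u escape sequence, which also keeps the payload smaller.
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
print(len(body), "bytes")
```

Payload size is limited, so it is worth checking the encoded length before sending, especially when the message mixes emoji with longer text.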

If you think there is something about push notifications that I missed, feel free to contact me at [email protected]

Why You Should Be Using ROI To Run Your Growth Team

The very first growth team I was on was run very democratically. Every month we would brainstorm new projects, everyone would vote on the ones they found most promising, and then we would execute on everyone’s top picks. After a few months, the team had delivered several projects that had beaten control, but we still weren’t seeing any visible impact on our bottom line metrics. To find out why, we took another pass through our experiment data and we realized that several of the experiments we had run never had a chance at impacting bottom line numbers. The reason was several of the features we developed were so far down the funnel, they would never reach enough users to move our numbers.

An example of such a project was our Facebook app. The company was a mobile-only company, so when people came across us on the web we had to get them to make the jump to mobile and download our app. In order to better convert traffic from Facebook Timeline posts, we built an interactive Facebook app that would give users an idea of what the mobile app was about rather than sending them to a static landing page that instructed them to download the app. The Facebook app performed well: it increased our conversion rate from click to signup from 3% to 6.5%. The problem was, we only received 300 clicks/day from Facebook Timeline posts. This project earned us a whopping 11 incremental users/day. Even if we were able to 20x our Facebook traffic, the project would not have had a meaningful impact.
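
The arithmetic behind that number is worth spelling out, since it is the whole argument. The variable names below are just for illustration; the figures are the ones from the paragraph above.

```python
clicks_per_day = 300
old_conversion = 0.03    # click-to-signup rate with the static landing page
new_conversion = 0.065   # click-to-signup rate with the Facebook app

incremental_signups = clicks_per_day * (new_conversion - old_conversion)
print(round(incremental_signups, 1))        # 10.5, i.e. roughly 11 users/day

# Even with 20x the traffic, the gain stays small in absolute terms.
print(round(20 * incremental_signups))      # 210 users/day
```

More than doubling a conversion rate sounds impressive, but multiplied by a small top of funnel it barely registers.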

We realized the problem was with how we were selecting projects. Often the projects people found more exciting, such as building new features, would win over more boring projects such as optimizing the copy on invite messages. We resolved to fix the issue by evaluating all growth projects in terms of the ROI on our bottom line metric of retained MAUs (a user that is still a MAU more than 30 days after signup).

To illustrate how we used ROI, let’s go through an example. Say you think you can increase SEO traffic by 20%. Currently you get 10MM hits from SEO a month, so a 20% increase will net an additional 2MM hits/month. Normally 5% of your SEO traffic signs up for your site, so you can expect to net an additional 100K signups. Typically 50% of signups from SEO convert to a retained MAU, so you can calculate that this project will likely net you an additional 50K retained-MAUs/month. Now let’s say this project will require an investment of 20 man-hours to complete. Calculating the return/investment gives us 2,500 retained-MAUs/man-hour.
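
The same funnel can be written as a small function so that every project idea is scored the same way. This is only a sketch: the function and parameter names are my own, and the inputs are the estimates from the SEO example above.

```python
def retained_maus_per_hour(monthly_traffic, traffic_lift, signup_rate,
                           retention_rate, hours):
    """Estimated retained MAUs gained per month, per man-hour invested."""
    extra_traffic = monthly_traffic * traffic_lift        # 2MM extra hits
    extra_signups = extra_traffic * signup_rate           # 100K extra signups
    extra_retained_maus = extra_signups * retention_rate  # 50K retained MAUs
    return extra_retained_maus / hours


seo_roi = retained_maus_per_hour(10_000_000, 0.20, 0.05, 0.50, 20)
print(round(seo_roi))  # 2500
```

Scoring every candidate project with the same function makes it easy to sort a brainstormed list by expected return instead of by how exciting each idea sounds.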

Picking projects through the lens of ROI helped the team become much more effective with our time and resources by not wasting efforts on projects that sounded good in theory, but in reality would have no impact. I now strongly advise any growth team I talk to, to use ROI if they aren’t already, and to first understand the impact their various project ideas will have, instead of executing on them blindly.

The Secret to Effective Geofencing

You’ve probably heard of geofencing, but in case you haven’t, geofenced notifications are alerts that users receive on their smartphone when they are near a particular location. They are a powerful activation & re-engagement tool for location based apps because they communicate information to a user at a time that is contextually relevant. Over the past year, geofenced notifications have become popular with many shopping apps such as RetailMeNot, RedLaser, etc. In this post I’ll cover some of the basic problems you might encounter with developing geofencing.

1. Inaccurate Location Data
The dirty little secret of most location data providers is that their data just isn’t that accurate. Typically for geofencing you will want to geofence a small radius (ex: 150 meters) around the point of interest. The way these data providers get latitude & longitude information is through a method called address interpolation. The problem is that the latitudes and longitudes from this method can often be off by hundreds of meters. This means when a user is standing directly at the point of interest, there is a good chance they will not fall into the geofencing radius! To overcome this problem you either have to manually fix the location data (like eBay did) or use user data to automatically correct the locations.

2. Battery Drain on Android
Android has built-in support for geofencing, but unfortunately it has severe battery drain issues on certain versions & devices. To work around this issue, many developers have taken to developing their own proprietary geofencing framework based on polling GPS. However, a lot of attention needs to be given to tweaking the framework to ensure users receive notifications in a timely manner, but without significantly draining their battery. Some ways to decrease battery drain are to poll less frequently, or instruct the device to only use wifi data to get a location fix rather than turning on the GPS.

3. iOS’s hard limit on the number of geofenced regions
Thankfully, iOS has a geofencing framework that doesn’t seem to drain users’ batteries. Unfortunately, iOS sets a pretty low limit on the number of regions you can monitor. This means to geofence effectively, you need to update which regions you geofence as a user moves around. Luckily, iOS has another framework that will wake up your app when the user has moved a significant distance (on the order of 5-10 miles). You can use this framework to wake up and update the regions your app is geofencing.
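
Two of the ideas above can be sketched in platform-neutral code: checking whether a user falls inside a geofence radius, and picking which regions to monitor when the platform caps how many you can register. This is an illustrative sketch, not iOS or Android API code. The function names and coordinates are hypothetical, distances use the standard haversine formula, and `max_regions` is left as a parameter because the exact platform limit is not stated here.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(user, center, radius_m=150):
    """True if the user's (lat, lon) is within radius_m of the fence center."""
    return haversine_m(user[0], user[1], center[0], center[1]) <= radius_m


def nearest_regions(user, regions, max_regions):
    """Pick the max_regions closest (name, lat, lon) regions to monitor."""
    by_distance = sorted(regions, key=lambda r: haversine_m(user[0], user[1], r[1], r[2]))
    return by_distance[:max_regions]


# A user standing at the real store entrance, with a listed location that is
# off by 0.003 degrees of latitude (about 330 meters), misses a 150 m fence.
store_actual = (37.0, -122.0)
store_listed = (37.003, -122.0)
print(inside_geofence(store_actual, store_listed))
```

On a device, `nearest_regions` would be re-run each time the app is woken by a significant location change, replacing the currently monitored regions with the new closest set.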

These are just a few of the problems that might be encountered when developing geofencing, but geofencing is such a powerful engagement tool that it is well worth the effort.

Hacking Mobile Invites Using Census Data

Getting existing users to invite their friends is a basic, but effective growth strategy. Many of the social networks such as Facebook/LinkedIn drive tremendous growth by pushing new users to invite their friends by importing their email address book. Mobile apps have tried to take the same approach, but the problem is, most don’t do a very good job of maximizing the number of invites. In this post I’ll show a little hack on how you can use demographic data compiled by the US Government to maximize the number of invites.

Suggest Friends to Invite

One of the key mistakes many mobile invite flows make is that they just show an alphabetical list of all your contacts. The problem with this UI is it can take several seconds per friend to search for their contact and add them to the list of people being invited. The alphabetical UI makes inviting more than a couple friends a chore.

The secret to improving this is simply figuring out everyone the user would want to invite and then just putting those people at the top of the list. By showing a list of suggested friends you get two benefits over an alphabetical list (assuming, of course, that you can deliver quality suggestions):

  1. Adding friends from the suggested section is extremely fast
  2. You can remind the user to invite people that they may have otherwise forgotten

Friend Recommendations


Pulling suggestions from thin air.

There are two components that go into suggestions. One is figuring out whom the user would be willing to invite and the other is figuring out who would be interested in your app. The second part can be especially challenging if all you have to go on is just the contact’s name and phone number. So how do you figure out if someone is interested in your app just based on their name? Well, it helps if you have a broad target demographic (ex: females between the ages of 20 to 45). This is because you can often infer not only someone’s gender, but also their approximate age just from their name. As fans of Freakonomics already know, the popularity of certain names tends to rise and fall over the years.

So, how do you actually go about figuring out someone’s age from their name? Well the Social Security Administration publicizes on their website the top 1000 baby names for each year from the past century. By crunching this data you can figure out for each name, the probability that someone with that name falls in your target demographic and use that information to help generate suggestions.

Popularity of female names by year
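
Here is a minimal sketch of the name-crunching step. The birth counts below are invented placeholders, not real Social Security Administration figures, and the function names are my own. The sketch also ignores mortality and anyone whose name falls outside the published lists, so treat the output as a rough ranking signal rather than a true probability.

```python
def prob_in_age_range(births_by_year, current_year, min_age, max_age):
    """Fraction of people with a given name whose age falls in [min_age, max_age]."""
    total = sum(births_by_year.values())
    if total == 0:
        return 0.0
    in_range = sum(count for year, count in births_by_year.items()
                   if min_age <= current_year - year <= max_age)
    return in_range / total


def rank_contacts(contacts, name_stats, current_year, min_age, max_age):
    """Sort contact names so the likeliest target-demographic matches come first."""
    def score(contact):
        parts = contact.split()
        first = parts[0].lower() if parts else ""
        return prob_in_age_range(name_stats.get(first, {}), current_year, min_age, max_age)
    return sorted(contacts, key=score, reverse=True)


# Hypothetical birth counts per year for two female names.
name_stats = {
    "jennifer": {1950: 100, 1975: 5000, 1985: 4000, 2010: 500},
    "mildred": {1920: 3000, 1930: 2000, 1980: 10},
}
suggested = rank_contacts(["Mildred Smith", "Jennifer Lee", "Zed Unknown"],
                          name_stats, current_year=2014, min_age=20, max_age=45)
print(suggested)
```

In a real invite flow this score would be only one input, combined with signals about whom the user is actually close to, before choosing who goes at the top of the list.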