Are you using A/B tests to improve your product or your comms?
A step-by-step framework to run an A/B test the right way — from hypothesis to final statistical analysis.
→ If you need help with your A/B tests, write to me at ezequiel@bildungdata.com
In my last post I talked about a framework to iterate on the communications we send our users using A/B tests.
But how do you actually run an A/B test the right way?
Let's walk through the steps with an example. It applies to Product teams testing new features and Marketing teams testing communications.
1. Define the hypothesis
"The new Onboarding flow is going to increase the conversion rate from New User to Signup."
2. Define the Control and the Variant
- Variant A (the new Onboarding flow): 20% of new users.
- Control (the current Onboarding flow): 80% of new users.
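One way to implement the 20/80 split is to hash each user ID into a bucket, so the same user always lands in the same branch. This is only a sketch under my own assumptions: the function name, the experiment label "new-onboarding", and the use of string user IDs are all hypothetical, not something a specific tool requires.

```python
import hashlib


def assign_branch(user_id: str, experiment: str = "new-onboarding",
                  variant_percent: int = 20) -> str:
    """Deterministically assign a user to 'variant_a' or 'control'.

    Hypothetical helper: hashes the experiment label plus the user ID
    into a bucket from 0 to 99 and sends the first `variant_percent`
    buckets to Variant A.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "variant_a" if bucket < variant_percent else "control"


# The same user always gets the same branch.
print(assign_branch("user-123") == assign_branch("user-123"))  # True
```

Including the experiment label in the hash means a user's bucket in this test is unrelated to their bucket in any other test.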
3. Calculate the minimum sample size
For this, I can use a sample-size calculator (for example: evanmiller.org/ab-testing/sample-size.html). The required inputs are:
- Current CVR from New User to Signup. Let's say it's 45%. That means that of all new users who download my app / land on my website, only 45% complete the signup.
- Minimum Detectable Effect (MDE): the smallest difference between branches that my experiment will detect with probability 1−β. Standard is 5%.
- Whether to analyze the difference between branches in Absolute or Relative terms: standard is Relative. With a 45% baseline, a 5% Relative effect means detecting a move to 47.25%, while a 5% Absolute effect means detecting a move to 50%.
- 1−β (1-Beta), also called statistical power: the probability of detecting the Minimum Detectable Effect when it really exists. Standard is 80%.
- α (Alpha): the probability of declaring the test successful when, in fact, there is no difference between the branches (a false positive). Standard is 5%.
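To show how the inputs above fit together, here is a minimal sketch of a sample-size calculation. It uses the textbook normal-approximation formula for comparing two proportions with a two-sided test and equal-sized branches. That is an assumption on my part: an online calculator may use a slightly different variant of the formula, so expect small differences. The function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_branch(baseline_cvr: float, mde: float,
                           relative: bool = True,
                           alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate minimum users per branch for a two-sided test."""
    p1 = baseline_cvr
    # Turn the MDE into an absolute difference in conversion rate.
    delta = baseline_cvr * mde if relative else mde
    p2 = p1 + delta
    if not 0 < p2 < 1:
        raise ValueError("baseline plus effect must stay between 0 and 1")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)          # power = 1 - beta
    p_bar = (p1 + p2) / 2

    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / delta ** 2)


# Inputs from the example: 45% baseline, 5% relative MDE, 80% power, 5% alpha.
print(sample_size_per_branch(0.45, 0.05))
# A larger MDE needs fewer users per branch.
print(sample_size_per_branch(0.45, 0.07))
```

Because the formula assumes equal-sized branches, with an uneven split the smaller branch is the one that determines how long the test runs.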
With these inputs I can get the minimum sample size per branch. Once I have the number, I ask myself:
With my current flow of new users, how long would it take to finish the test? For example, if the minimum sample size per branch is 10,000 users and my app/site has 500 new users per day (400 to the Control and 100 to Variant A), it would take 100 days to hit the required users in Variant A, the smaller branch.
This is where theory hits reality: to run the A/B test in a reasonable time frame (say 15 days), I'll have to revisit some of the parameters I set before. For example, I can raise the Minimum Detectable Effect to 7%, which lowers the required sample size, or bump the share of users going to Variant A up to 30%, which fills that branch faster.
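The duration arithmetic is easy to script, which helps when trying different splits. A minimal sketch, assuming daily traffic is constant and that the test ends when the smaller branch reaches the minimum sample size; the function name and parameters are hypothetical.

```python
import math


def days_to_finish(min_sample_per_branch: int, new_users_per_day: int,
                   variant_percent: int) -> int:
    """Days until the smaller branch reaches the minimum sample size."""
    # The branch with the smaller share of traffic is the bottleneck.
    smallest_percent = min(variant_percent, 100 - variant_percent)
    return math.ceil(min_sample_per_branch * 100
                     / (new_users_per_day * smallest_percent))


# 10,000 users per branch, 500 new users per day, 20% to Variant A.
print(days_to_finish(10_000, 500, 20))  # 100
# Same test with 30% of new users going to Variant A.
print(days_to_finish(10_000, 500, 30))  # 67
```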
4. Run the A/B test
5. End the test once both branches hit the minimum sample size
I'll need to look at:
- Number of new users who entered Variant A and Control.
- Number of users from each branch who signed up.
I plug the data into my statistical significance calculator (for example: evanmiller.org/ab-testing/chi-squared.html) and analyze. The possible scenarios are:
- Scenario A: Variant A has better conversion than the control with 95% (1-α) confidence. The hypothesis is confirmed and the new onboarding is adopted.
- Scenario B: the Control has better conversion than Variant A with 95% (1-α) confidence. The hypothesis is rejected and the new onboarding is not adopted.
- Scenario C: there is no statistically significant difference between the branches, so there's no reason to change the current onboarding. The hypothesis is not confirmed and the new onboarding is not adopted.
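The three scenarios above can be sketched in code. This version runs Pearson's chi-squared test on the 2x2 table of signups versus non-signups, without a continuity correction, so its p-value may differ slightly from a given online calculator. The function name is hypothetical, the counts in the usage example are made up for illustration, and the sketch assumes both branches have at least some signups and some non-signups.

```python
import math


def ab_test_decision(control_users: int, control_signups: int,
                     variant_users: int, variant_signups: int,
                     alpha: float = 0.05) -> tuple[str, float]:
    """Return the scenario ('A', 'B' or 'C') and the p-value."""
    observed = [
        [control_signups, control_users - control_signups],
        [variant_signups, variant_users - variant_signups],
    ]
    total = control_users + variant_users
    row_totals = [control_users, variant_users]
    total_signups = control_signups + variant_signups
    col_totals = [total_signups, total - total_signups]

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    if p_value >= alpha:
        return "C", p_value  # no statistically significant difference
    if variant_signups / variant_users > control_signups / control_users:
        return "A", p_value  # Variant A converts better
    return "B", p_value      # Control converts better


# Made-up counts: Control at 45.0% CVR, Variant A at 47.5% CVR.
scenario, p_value = ab_test_decision(40_000, 18_000, 10_000, 4_750)
print(scenario)  # A
```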
Are you starting to run A/B tests and have questions? Write to me and let's talk.