Ever tested a landing page, implemented it, and watched your metrics move in the wrong direction?
It happens to most of us, despite our best efforts.
Why?
Sometimes it’s random chance. Sometimes it’s caused by changes in the marketplace.
But sometimes it’s something else entirely: statistical error.
I’ve collected six statistical errors that CROs, and marketers who dabble in CRO, often make. Let’s take a look, so we can try to avoid them.
1. Not Understanding Statistical Significance
There was a time, not so long ago, when marketers would run split tests without even considering statistical significance.
Thankfully, now that most split tests are run using tools that automatically measure it, this outcome is far less common.
I wish I could say the same for actually understanding statistical significance.
I know, I know, we’re getting nerdy here.
Look, I don’t expect everybody to have a graduate-level understanding of confidence intervals and p-values, but there’s a bare minimum of understanding we need to have here in order to put these results to use.
Let me explain why.
First off, it’s tempting to stop the test before you reach that lauded 95 percent confidence level if you don’t understand why it’s there. You might think, “Hey, 90 percent confident is pretty confident, why am I wasting money running the other landing page when it’s not converting as well?”
Here’s why.
When your result reaches 95 percent statistical confidence, it doesn’t mean there’s a 95 percent chance that you picked a winner. That’s a tempting interpretation, but what it really means is that there’s only a 5 percent chance you’d see a result at least this extreme if the two versions actually performed the same in the long run.
And what that means is that if none of your split tests made a difference at all, 1 in 20 would still get a “95 percent confident” result!
Let me put that the other way round:
If you run 10 tests and only get one statistically significant result, it’s pretty much 50/50 that that result is a fluke.
See how bad this gets if you settle for, say, 80 percent confidence? You could get 1 positive result out of 5 by sheer chance, and likely pick a lot of losers.
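If you want to see that math in action, here’s a minimal sketch in plain Python (the 4 percent conversion rate, visitor counts, and trial count are made-up numbers): it simulates A/A tests, where both versions are truly identical, and counts how often they still come out “significant” at 95 and 80 percent confidence.

```python
import random
import math

def fake_ab_test(rate=0.04, visitors=5000):
    """Run one A/A test: both versions share the same true conversion rate."""
    conv_a = sum(random.random() < rate for _ in range(visitors))
    conv_b = sum(random.random() < rate for _ in range(visitors))
    p_a, p_b = conv_a / visitors, conv_b / visitors
    pooled = (conv_a + conv_b) / (2 * visitors)
    se = math.sqrt(2 * pooled * (1 - pooled) / visitors)
    return abs(p_a - p_b) / se if se else 0.0  # two-proportion z statistic

trials = 2000
z_scores = [fake_ab_test() for _ in range(trials)]

# Two-sided critical values: 1.96 ~ 95% confidence, 1.28 ~ 80% confidence
for label, cutoff in [("95%", 1.96), ("80%", 1.28)]:
    false_positives = sum(z > cutoff for z in z_scores) / trials
    print(f"'Significant' at {label} confidence: {false_positives:.1%} of A/A tests")
```

Run it a few times and the 95 percent line should hover around 5 percent and the 80 percent line around 20 percent, which is exactly the 1-in-20 and 1-in-5 fluke rate described above.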
And that brings us to the second reason why it’s important to understand statistical significance: just because you ran an experiment before doesn’t mean the result is written in stone.
Sometimes a statistically significant result is a fluke, even if you’re hitting that 95 percent level.
Take note of how many of your results are positive. If you’re getting 1 positive result for every 10 tests, odds are good that half of your positive results are flukes.
Be ready to test again if you have reason to doubt the previous result.
2. Getting Fooled By Bots
Spam bots and ghost referrer spam can trigger the JavaScript in your Google Analytics or split-testing software, which means the results of your tests can be botched and your site’s overall conversion and bounce rates can be thrown out of whack.
Bots capable of triggering JavaScript can make up a third of traffic on large sites, and can even make up the majority of traffic on smaller sites.
How can you rule out bots as an influence on your split tests?
Start by scanning your referral traffic for unfamiliar sources.
Then look for reviews of those unfamiliar sources to see whether they’re trustworthy.
You should also check your page reports for hits to pages that don’t exist, and look for fake events triggered in your analytics.
Some bots show up as direct traffic, making it impossible to see where they’re coming from. To spot these, look for networks and browsers with suspicious bounce rates, time on site, and so on compared to the rest of your traffic.
When running split tests in particular, it’s important to verify that bots aren’t skewing results by depressing conversions for one version more than the other. Check for consistency across browsers and networks between versions of your split test.
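If you export session-level data, you can do this comparison in a few lines. Here’s a rough sketch with pandas; the file name and every column name (browser, network, bounced, session_seconds, test_version, converted) are assumptions you’d swap for whatever your own export uses.

```python
import pandas as pd

# Hypothetical session-level analytics export; column names are assumptions.
sessions = pd.read_csv("sessions.csv")

# Per-browser/network profile: segments with near-100% bounce rates and
# near-zero time on site are candidates for bot traffic.
profile = sessions.groupby(["browser", "network"]).agg(
    visits=("bounced", "size"),
    bounce_rate=("bounced", "mean"),
    avg_seconds=("session_seconds", "mean"),
)
print(profile.sort_values("bounce_rate", ascending=False).head(10))

# Consistency check: conversion rate per test version within each browser.
# A version that only "loses" in one suspicious browser is a red flag.
by_version = sessions.groupby(["browser", "test_version"])["converted"].mean()
print(by_version.unstack("test_version"))
```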
In Google Analytics, make sure that “Exclude all hits from known bots and spiders” is checked under “View Settings” in the “Admin” section.
You can filter out ghost bot traffic by setting up a hostname filter in Google Analytics. Since ghost bot traffic doesn’t show up in your server logs, you can filter your traffic to only display hostnames that are actually used to host your site.
Similarly, you can filter out spam visits by creating custom segments to remove the suspicious referrers and browsers.
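If you’d rather do this filtering outside the Google Analytics interface, the same ideas translate directly to exported data. A quick sketch, again with assumed column names and placeholder hostname and referrer lists:

```python
import pandas as pd

sessions = pd.read_csv("sessions.csv")  # hypothetical export, as above

# Ghost bot traffic never hits your server, so it often reports hostnames
# you don't actually use. Keep only hostnames that really serve your site.
valid_hostnames = {"www.example.com", "example.com"}  # placeholder values
clean = sessions[sessions["hostname"].isin(valid_hostnames)]

# Drop known spam referrers, the same way you would with a custom segment.
spam_referrers = {"free-seo-traffic.example", "spammy-buttons.example"}  # placeholders
clean = clean[~clean["referrer"].isin(spam_referrers)]

print(f"Kept {len(clean)} of {len(sessions)} sessions after filtering")
```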
There’s no guaranteed method of removing all bot traffic from your reports, but these methods can make a huge difference. One final step worth taking is to view your split-test data for trusted referrers only and check for any inconsistencies there.
3. Thinking Correlation Is Causation
I think we’ve all heard this one by now, but in case you haven’t, correlation means two things tend to occur together, while causation means one thing causes another.
Causation leads to correlation, but correlation doesn’t always mean one thing causes another. There may be no cause and effect relationship, or you might have cause and effect backwards.
Even though most of us are in on this by now, it’s easy to relegate this understanding to our intellectual brain and then go on ignoring it in practice, usually without even realizing it.
So, repeat after me: “correlation is not causation.”
There are two important takeaways here:
- Sometimes correlation is causation.
- We will never know unless we test.
There is no shame at all in analyzing your existing data, looking for bumps in conversions, and identifying actions you took that could have caused the bump.
What’s important is what comes next: figuring out a way to test that inference.
I get it. We live in the real world, and we can’t run everything through an A/B test, so I’m not saying you should try to.
What you can always do, though, is develop a plan of action and some metrics to measure. Where split tests aren’t possible, you can stagger your actions, apply them in different places, and see if the results follow a predictable pattern.
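As one rough illustration of that staggered approach: if a change rolls out to different regions (or stores, or audiences) on different dates, you can compare conversion rates before and after each region’s own rollout and see whether the lift repeats. A sketch, with invented file and column names:

```python
import pandas as pd

# Hypothetical daily data: one row per region per day, plus the date the
# change went live in that region. All names here are assumptions.
daily = pd.read_csv("daily_conversions.csv", parse_dates=["date", "rollout_date"])
daily["period"] = (daily["date"] >= daily["rollout_date"]).map(
    {False: "before", True: "after"}
)

lift = (
    daily.groupby(["region", "period"])["conversion_rate"].mean()
    .unstack("period")
    .assign(lift=lambda d: d["after"] - d["before"])
)

# If a similar lift shows up in each region only after its own rollout date,
# the causal story becomes much more believable.
print(lift)
```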
4. Confusing Statistical & Practical Significance
Was the result of your last A/B test 99.999 percent confident? That’s great, but if the result was a 0.1 percent increase in your conversion rate, does it really matter?
That example’s extreme, since it would take a huge number of views to get that level of confidence on such a small improvement, but it does get to the heart of a common confusion.
Weak statistical confidence doesn’t mean that your versions aren’t that different from each other; it might just mean you haven’t had many views to test. Strong statistical confidence doesn’t mean that one version is far better than the other; it could just mean you ran the test for a long time.
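To put rough numbers on that tradeoff, here’s a sketch using statsmodels’ power calculations (the 4 percent baseline conversion rate is a made-up example): it estimates how many visitors per version you’d need to reliably detect a given lift at 95 percent confidence and 80 percent power.

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline = 0.04  # assumed baseline conversion rate
power_calc = NormalIndPower()

for lift in (0.001, 0.01, 0.02):  # absolute lifts of 0.1, 1, and 2 points
    effect = proportion_effectsize(baseline + lift, baseline)
    n = power_calc.solve_power(effect_size=effect, alpha=0.05, power=0.8,
                               alternative="two-sided")
    print(f"An absolute lift of {lift:.1%} needs roughly {n:,.0f} visitors per version")
```

The exact figures depend on your baseline, but the pattern is the point: a tenth-of-a-point improvement needs orders of magnitude more traffic than a multi-point one.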
In the section on statistical significance, I strongly urged against accepting results with weak statistical significance. Here I want to make sure you don’t interpret this the wrong way. I’m not saying you should keep running every test until you get a statistically significant result.
Quite the opposite.
Small companies with limited traffic should focus on making big changes that result in big effects on conversion rate. That means focusing on changes that you can test quickly.
If your versions are so similar to each other that you can’t get a statistically significant result in a relatively short period of time, you may not be testing big enough.
I don’t hear it as often as I used to, but it used to be common “knowledge” that you were supposed to test “only one thing at a time.” This is generally bad advice when interpreted literally: CRO should be about testing cohesive options against each other, not making random alterations with no theoretical justification.
Consider a medical analogy. It’s true that in a pharmaceutical experiment, you wouldn’t want to test one medication against a different medication plus exercise, since you wouldn’t be able to tell if changes were caused by the medication, the exercise, or both. That’s supposed to be the justification for not testing more than one thing at a time.
But by analogy, I would argue that your entire landing page is the medication. Tweaking individual site elements would be analogous to tweaking atoms in the compound of your medication with no medical justification. Your landing pages should be constructed to test one overall idea.
5. Disregarding Traffic Source
Don’t expect results from one traffic source to carry over to another.
I’ve seen this happen all too often.
If a landing page is designed for AdWords, don’t expect to get the same results if you switch to Facebook.
Don’t test a new homepage with paid traffic and then expect your organic traffic to behave similarly. This is a great way to throw away money fast.
Always test a landing page for the traffic source it will be receiving. There’s not much else to say here; I just see this faulty thinking often.
6. Ignoring Micro-conversions (Or Macro-conversions)
I’ve seen people test new layouts and throw up their hands when they didn’t see any change in conversions, not realizing that there was a strong and important impact on how far visitors made it through the funnel.
I’ve also seen people test landing pages that boosted click-through rates to the cart, only to realize that their sales went down after updating the page.
It’s important not to let your key metrics blind you to other things that are going on.
I’m an advocate of measuring both micro- and macro-conversions. We need to see the sales funnel as a cohesive system, not as a series of individual steps.
Sometimes boosting a click-through rate here will cause a decrease in sales there, because the change failed to alleviate a consumer concern.
Other times, it may remove a bottleneck without improving overall sales, revealing where other bottlenecks in the funnel might be.
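As a small illustration of watching both at once, here’s a sketch that computes a micro-conversion (click-through to cart) and a macro-conversion (purchase) side by side for each test version; the file and column names are invented for the example.

```python
import pandas as pd

# Hypothetical visitor-level export: one row per visitor, with 0/1 flags
# for each funnel step. Column names are assumptions for the example.
funnel_data = pd.read_csv("funnel.csv")  # columns: test_version, clicked_to_cart, purchased

funnel = funnel_data.groupby("test_version").agg(
    visitors=("clicked_to_cart", "size"),
    cart_rate=("clicked_to_cart", "mean"),   # micro-conversion
    purchase_rate=("purchased", "mean"),     # macro-conversion
)

# A version can win on cart_rate and still lose on purchase_rate --
# which is exactly the trap described above.
print(funnel)
```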
Don’t fall into this trap.
More Conversion Rate Optimization Resources: