Day #18 : The Bonferroni Correction – 365 DoA
By now we are masters of statistics… right? Okay, not really, but we are getting there. So far we’ve covered two types of errors, type 1 which you can read about here, and type 2 which you can read about here. Armed with this new knowledge we can break into a way to correct for type 1 errors that come about from multiple comparisons. Sound confusing? Well, not for long, let’s break it down and talk Bonferroni.*
As we previously mentioned, type 1 errors are our “dog alarm” errors, or false positives. If we’ve made a type 1 error, we found significance in our observed value when there really wasn’t any. We discussed one possible way this could happen: loosening our threshold for what we consider significant. However, just like any good error, there is more than one way to make it!
When we are computing statistics, we sometimes make what are called multiple comparisons. I’ll give an example first. Say we perform an experiment testing memory, so we give our subjects five different memory tests. We are now making multiple comparisons, because we are testing one thing (memory) by five different methods. When we collect this data, we have five different ways for something to come out significant. This raises our chances of ending up with a false positive, because we’ve increased the chance that at least one of our tests will have a fluke result (i.e., the subjects did better on one test purely by chance).
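If you want to convince yourself of this, here is a rough simulation sketch in Python. It assumes a few things that are not part of the memory example itself: that there is really no effect at all, that the five tests are independent of each other, and that we call a result significant when its p-value is below the usual 0.05. Under those assumptions a p-value is just a random number between 0 and 1, so we can fake the whole experiment. The function name and the number of trials are made up for illustration.

```python
import random


def simulate_false_positive_rate(num_tests, alpha=0.05, trials=100_000, seed=0):
    """Estimate how often at least one of `num_tests` null tests looks significant.

    Assumes independent tests with no real effect, so each p-value is
    uniformly distributed between 0 and 1.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    experiments_with_false_positive = 0
    for _ in range(trials):
        # One simulated experiment: draw a p-value for every test and
        # check whether any of them falls below the threshold.
        if any(rng.random() < alpha for _ in range(num_tests)):
            experiments_with_false_positive += 1
    return experiments_with_false_positive / trials


one_test = simulate_false_positive_rate(num_tests=1)
five_tests = simulate_false_positive_rate(num_tests=5)
print(f"1 test:  {one_test:.3f}")
print(f"5 tests: {five_tests:.3f}")
```

With one test the false positive rate should land near 5%, and with five tests it should come out several times higher, even though nothing real is going on in either case.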
This led to the creation of the Bonferroni correction method. It’s a simple way to control for multiple comparisons, but it does have its limitations. As with almost all the things we’ve covered, this method is named after the person who came up with it, Carlo Emilio Bonferroni.
In normal hypothesis testing we usually set our significance level, α, at 5%. That is to say, if there is really no effect, we accept a 5% chance of getting a false positive. This gives us a threshold of 0.05 for our p-values, but when we do multiple comparisons we should adjust this, and that is where Bonferroni’s method comes in. To see why, go back to our memory testing example. We have 5 tests all checking memory, so if the tests are independent and there is really no effect, the chance that at least one of them comes out significant is actually
P(at least one significant result) = 1-(1-0.05)^5 = 0.2262190625
This says that running 5 comparisons, each with a threshold of 0.05, gives us a 0.226 or 22.6% chance of making at least one type 1 error (this is often called the family-wise error rate). This is just for five comparisons. If we instead had ten tests, this number would jump to 0.401 or a 40.1% chance of making a type 1 error. At twenty tests, it would be 0.642 or a 64.2% chance of making a type 1 error. So you can see that as we increase the number of tests we do, unless we adjust our threshold, we are more and more likely to make a type 1 error. Fear not fellow stats person, Bonferroni came to the rescue with the Bonferroni-corrected threshold. This is easy to find too, it is just:
Bonferroni-corrected threshold = α/n
where α is the original significance level (in our example we used 0.05) and n is the number of tests being performed (again, in our example we had 5 tests). So if we plug in our values of α = 0.05 and n = 5, we see that in order to correct for our multiple comparisons, each test needs a p-value below 0.01 to count as significant. On the extreme end, if we have 20 tests, the threshold would now be 0.0025.
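The arithmetic above is easy to check for yourself. Here is a small Python sketch that computes the chance of at least one type 1 error for 5, 10, and 20 tests, then the α/n value for each, and finally applies the n = 5 cutoff to a list of p-values. The function names and the five p-values are invented purely for illustration (they are not real data), and the error-rate formula assumes the tests are independent.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(alpha, num_tests):
    """Per-test cutoff after the Bonferroni correction: alpha / n."""
    return alpha / num_tests


alpha = 0.05

for n in (5, 10, 20):
    cutoff = bonferroni_threshold(alpha, n)
    uncorrected = familywise_error_rate(alpha, n)
    corrected = familywise_error_rate(cutoff, n)
    print(f"n={n:2d}  cutoff={cutoff:.4f}  "
          f"uncorrected={uncorrected:.3f}  corrected={corrected:.3f}")

# Hypothetical p-values from five memory tests (made up for this example).
p_values = [0.003, 0.012, 0.04, 0.2, 0.65]
cutoff = bonferroni_threshold(alpha, len(p_values))

# A test only counts as significant if its p-value is below alpha / n.
significant = [p < cutoff for p in p_values]
print(significant)
```

Notice that three of the made-up p-values are below 0.05, but only one of them survives the corrected cutoff of 0.01. That is the trade-off with Bonferroni: fewer false positives, but it also gets harder to call anything significant.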
For those of you who are more experienced, you may already see the limitation: it is possible to have hundreds if not thousands of comparisons, and that would make the corrected threshold incredibly low, so low that it becomes almost impossible to find significance. Thankfully there are other methods that have their own strengths and weaknesses, like the Tukey method, or the Holm-Bonferroni method. Maybe we can break into some of those in another post. I’m not sure what we’re talking about tomorrow, to be honest. That’s it for now though!
Until next time, don’t stop learning!
*As usual, I make no claim to the accuracy of this information; some of it might be wrong. I’m learning, which is why I’m writing these posts, and if you’re reading this then I am assuming you are trying to learn too. My plea to you is this: if you see something that is not correct, or if you want to expand on something, do it. Let’s learn together!!