Significance in statistics
That feeling when your p-value is lower than your alpha, aww yeah! But what does it really mean? It’s one thing to say there is significance and on the surface it means the two things are different “enough” to be considered two things, but I think there’s a simpler way to explain it. So today we’re going to talk about what significance actually means in the practical sense. Maybe it’s super obvious, but it never hurts to state it anyway.
Statistics is hard; if it wasn’t, everyone would understand it. In school we’re taught how to work the tools we’re given (the t-test, ANOVA, the F-test, things like that), but it’s rarely explained in enough detail why those tools are the way they are. Why do they work? Why can’t we use something completely different? Why are there assumptions we need to make to use the tools? And if you’re anything like me, the learning starts and stops at the why. I was a mediocre calculus student, but scored highest in my class in differential equations (which is basically the jeopardy version of calculus, seriously). I realized my success in that class was because the teacher took the time to go over the why. Ever since, I’ve been hunting down the why in everything I learn, so this is my attempt to explain the why in statistics. I’m still learning, but the hope is that this helps you learn too!
We’ve covered a lot in the “in statistics” series. In particular I’ve explained confidence intervals and why they help us determine if we have significance or not. Does the value you’re testing against fall within the confidence interval? If not, it’s significant. Yet I realized the other day that we never really defined significance, which is an important “why” when we do statistics. Why is something significant and what does that mean?
The obvious answer is that the two things we’re looking at are different. That’s the truth of it, but there is some nuance here that we can talk about. I learn best with examples, so let’s just start with an example and explain what we really have when something is NOT significant. Two car companies release brand new super fast cars that everyone wants. Both companies say theirs is the best and people are divided based on brand favoritism, but not you! You want to see if there is a significant difference between the two, so you go to the dealerships and test drive several of the same type for both brands.
You’ve decided you want something super fast, so you measure the 0-60 mph time (miles per hour for those of you who are not American) for 5 cars of each brand to make sure that the dealer isn’t being sneaky and lending you a special version. You collect your data, do a t-test and come to a conclusion. Before we determine what that conclusion is, let’s state the null hypothesis and our significance level (since it’s a good habit to do). Our null hypothesis is the “boring” hypothesis, which is that there is no difference in the 0-60 mph times between brands. That means our alternate hypothesis is that there IS a difference. Since we’re doing this for fun (because we all agree that statistics is fun!) we set our alpha to 0.05, which again means we accept a 5% chance of making a type 1 error, that is, of declaring a difference when there really isn’t one.
So we determine the mean and standard deviation for both sets of data (our two samples, which are the two brands), plug the data into our software since we’re lazy and don’t feel like looking at a table, and find no significance! *BUMMM BUMMM BUMMMMMMMMMMM!!!* That means that we couldn’t detect a difference in the 0-60 mph times between the two cars. Sure, but obviously there are two different brands, so what gives?
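If you’d like to see that step spelled out, here is a minimal sketch of the calculation. The 0-60 times are made up for illustration, and I’m assuming the classic equal-variance (pooled) t-test; your software may default to a different flavor. Because plain Python has no t distribution built in, the sketch compares the t statistic against the table value for 8 degrees of freedom instead of printing a p-value.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical 0-60 mph times in seconds, 5 cars per brand (made-up numbers).
brand_a = [3.1, 3.4, 3.2, 3.5, 3.3]
brand_b = [3.2, 3.5, 3.3, 3.6, 3.4]


def pooled_t_statistic(x, y):
    """Student's two-sample t statistic, assuming both groups share one variance."""
    nx, ny = len(x), len(y)
    # Weighted average of the two sample variances.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    # Standard error of the difference between the two sample means.
    standard_error = sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / standard_error


# Two-sided critical value for alpha = 0.05 with 5 + 5 - 2 = 8 degrees of
# freedom, read from a standard t table.
T_CRITICAL = 2.306

t_stat = pooled_t_statistic(brand_a, brand_b)
significant = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.2f}, significant: {significant}")
```

With these invented numbers the t statistic is nowhere near the cutoff, so we fail to reject the null hypothesis, which is the “no significance” result from the story.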
When we do not have significance it means our groups are artificial, or rather that we separated our samples (the two brands), but it’s not a “real” separation; they have the same property that we were testing. It’s like separating cars and trucks, then determining if one sample has more doors than the other. If there isn’t a difference (which I would assume to be true since most in either sample have either 2 or 4 doors), then the separation we’ve drawn is arbitrary. We could’ve separated our population (gas powered vehicles as a whole) by color and compared doors for all it matters.
That’s where the subtlety comes in: we’ve drawn a split in our data that isn’t adequately justified from a statistics standpoint. That means that, as far as this test can tell, they are all the same. It wouldn’t matter how we throw our brand labels onto the cars; our 0-60 mph time tests wouldn’t be able to tell them apart. However, that doesn’t mean that is true for every measure, it just means that in this case it doesn’t work. Put another way, the split isn’t “real” just for the value we’re testing, but it could be a valid way to separate the cars for a different metric. Let’s look at another example using our two brands.
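The idea of throwing the brand labels onto the cars any which way can be tried out directly. What follows is a rough sketch of an exact permutation test, which is a different technique from the t-test used in the story: it tries every possible way of labeling 5 of the 10 cars as one brand and counts how often the gap between the group means is at least as big as the one we actually saw. The times are invented for illustration.

```python
from itertools import combinations
from statistics import mean

# Hypothetical 0-60 mph times in seconds (made-up numbers for illustration).
brand_a = [3.1, 3.4, 3.2, 3.5, 3.3]
brand_b = [3.2, 3.5, 3.3, 3.6, 3.4]


def permutation_p_value(x, y):
    """Share of all relabelings whose mean gap is at least the observed gap."""
    pooled = x + y
    observed_gap = abs(mean(x) - mean(y))
    total = 0
    at_least_as_extreme = 0
    # Every way of choosing which cars get the first brand's label.
    for picked in combinations(range(len(pooled)), len(x)):
        chosen = set(picked)
        group_1 = [pooled[i] for i in chosen]
        group_2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        gap = abs(mean(group_1) - mean(group_2))
        total += 1
        # Small tolerance so floating-point noise doesn't drop exact ties.
        if gap >= observed_gap - 1e-9:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


p_value = permutation_p_value(brand_a, brand_b)
print(f"permutation p-value = {p_value:.3f}")
```

If a large share of the random relabelings produce a gap as big as the real one, the brand labels aren’t telling us anything about 0-60 times, which is exactly what an “artificial” split means here.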
Maybe instead of 0-60 times we want to look at the weight of the two brands, and we find that one of them is significantly lighter (p < 0.05) than the other. Suddenly our two groups are different and we can say that the way we divided the two groups (by brand) is a valid way to separate our data. Basically, how you separate your data matters, and that separation is only valid for that one measurement. In this case our separation by brand was valid for weight, but it was an artificial separation for our 0-60 mph time.
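To make the “same split, different measurement” point concrete, here is a small sketch that runs the same equal-variance t-test on two metrics for the same two groups of cars. All of the numbers (times and weights) are invented, and the 2.306 cutoff is the two-sided table value for alpha = 0.05 with 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

# Made-up measurements for 5 cars per brand; none of these numbers are real.
measurements = {
    "0-60 mph time (s)": ([3.1, 3.4, 3.2, 3.5, 3.3], [3.2, 3.5, 3.3, 3.6, 3.4]),
    "weight (kg)": ([1540, 1555, 1548, 1562, 1550], [1610, 1598, 1605, 1620, 1612]),
}

T_CRITICAL = 2.306  # two-sided, alpha = 0.05, 8 degrees of freedom (t table)


def pooled_t_statistic(x, y):
    """Student's two-sample t statistic, assuming both groups share one variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))


# Same brand split every time; only the thing being measured changes.
split_is_real = {}
for metric, (brand_a, brand_b) in measurements.items():
    t_stat = pooled_t_statistic(brand_a, brand_b)
    split_is_real[metric] = abs(t_stat) > T_CRITICAL
    print(f"{metric}: t = {t_stat:.2f}, significant: {split_is_real[metric]}")
```

With these numbers the brand split comes out significant for weight and not for 0-60 time, even though the cars and the labels never changed.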
In a practical sense, this might look like a new heart medication that doesn’t significantly help people with heart problems. So our “split” between groups is artificial. It’s “artificial” because we can’t tell the two groups apart unless we know which is taking the medication; we can’t look at the data and say anything of value. However, say the drug unexpectedly helps people with high blood sugar, so the researchers run another test between two groups, one with the medication and one without, and find a significant lowering in blood sugar. Suddenly our split (those taking the medication vs. those not) is valid.
So while this all may be obvious to some, it’s always good to go over it and cover the details. Because that’s the thing about doing statistics: you can’t just look at the result, you need to look at the details. In the world there are a lot of people doing bad statistics. Knowing even the basics, even if you aren’t taking a statistics course or won’t ever use it for your work, can help you separate out the good statistics from the bad.