Defining parametric tests in statistics
We’ve been throwing around the term a lot in this series. I’ve been saying “in parametric statistics this” and “in parametric statistics that,” but I kept putting off giving a definition. It’s not because it’s hard to understand; it’s just that by the time you’re actually doing statistics, you usually already know whether you’re using a parametric test. But because we try to make no assumptions in this series, we’re going to put this to bed once and for all. Today we’re talking about parametric statistics!
If I sat you in front of a car and said that you needed to fix something, you might be able to do it, or you might not. If I told you what you were replacing and how, you could probably do it yourself (with some work, of course). But if I only told you there was a problem with the car and that you needed to fix it, and you didn’t understand how the car works, it would be almost impossible for you to make the needed repairs. I feel the same way about statistics: if you don’t understand how statistics works, how could you ever hope to apply the proper tools to the task? How would you know you were doing it right, or that the output you got was correct? That’s the point of the “in statistics” series: to help people understand the why behind the tools, because chances are you’re going to one day find yourself solving problems that have no predefined solution to verify you did them correctly.
Since we started talking about statistics, I’ve been saying that we make several assumptions in order to apply the tools we use. All tests make some sort of assumptions, even non-parametric tests, which we will define some other time. Parametric tests make a couple of basic assumptions about the underlying population you’ve sampled, but before we get into all that, let’s step back and define parametric statistics to begin with.
When we do parametric testing, what we are really doing is building a model of our population (that would be all the data that exists, of which we’ve taken a small sample) using certain parameters. So remember: parametric, as in parameter. Parametric tests hold the assumption that we can use a set of parameters to estimate our population from the sample that we collected. Because we are defining our population using these parameters, we need to know in advance that the population we’re trying to model matches the assumptions of the model-building tools we’re using.
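To make this concrete, here is a minimal sketch of the idea using numpy. The sample is simulated (the numbers are made up for illustration); the point is that two parameters, a mean and a standard deviation, stand in for the entire population when we assume it’s normal:

```python
import numpy as np

# Simulated sample, standing in for data we collected from some population.
rng = np.random.default_rng(42)
sample = rng.normal(loc=170.0, scale=10.0, size=50)

# The "parameters" in parametric statistics: we model the whole population
# with just a mean and a standard deviation estimated from the sample.
mean_est = sample.mean()
sd_est = sample.std(ddof=1)  # ddof=1 gives the sample standard deviation
print(mean_est, sd_est)
```

Everything a parametric test concludes about the population flows from estimates like these, which is exactly why the population needs to match the assumed shape.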
Some (non-exhaustive) examples of parametric tests are the t-test, the ANOVA, and the Tukey test (remember, t-test is definitely not short for Tukey test; ask me why I’m reminding you of this). At the heart of it, they all make assumptions about the population your data was sampled from. The big assumptions are:
- Your observations are independent and identically distributed (IID)
- The population from which you’re sampling is normal (or can be approximated as normal)
- If two or more groups are being sampled, they have equal (or close to equal) variance
That third assumption isn’t ALWAYS necessary; in fact, you can “shut it off” in certain tests, like the t-test or ANOVA, by telling the software not to assume equal variances.
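In scipy, for example, “shutting off” the equal-variance assumption on a t-test is a single flag, which switches from Student’s t-test to Welch’s t-test. A small sketch with simulated groups (the data here are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=30)
group_b = rng.normal(loc=5.0, scale=3.0, size=50)  # much larger variance

# Standard (Student's) t-test assumes equal variances...
student = stats.ttest_ind(group_a, group_b)
# ...while equal_var=False runs Welch's t-test, dropping that assumption.
welch = stats.ttest_ind(group_a, group_b, equal_var=False)
print(student.pvalue, welch.pvalue)
```

When the variances really are unequal, the Welch version is the safer default, which is why some software packages have made it the default behavior.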
Now that we’ve listed the assumptions, let me reiterate the why, as in why we need these assumptions. All the tools we use in parametric statistics are basically doing the same thing: trying to construct a model of the population the data was sampled from without having the entire population’s data (which would be difficult, if not impossible, to get). And because the normal distribution seems to pop up everywhere (thanks in part to the central limit theorem), we’ve built tools that make our lives easier when working with data that come from these distributions.
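You can watch the central limit theorem do this work with a few lines of numpy. Here we draw from a distribution that is clearly not normal (a flat uniform), yet the means of repeated samples land where the theorem predicts; the simulation parameters are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# 10,000 samples of size 30 from a decidedly non-normal (uniform) distribution.
raw = rng.uniform(0.0, 1.0, size=(10_000, 30))

# The means of those samples pile up into an approximately normal bell shape,
# centered at 0.5 with standard deviation sqrt(1/12)/sqrt(30) ~= 0.053.
sample_means = raw.mean(axis=1)
print(sample_means.mean(), sample_means.std())
```

This is exactly why the normal distribution “seems to always pop up”: averages of almost anything start to look normal once the sample size is large enough.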
Non-parametric tests, which we will (hopefully) go into more detail about in a future post, do not make assumptions about the population the data came from. That isn’t to say they have no assumptions at all, just that the assumptions non-parametric tests make are far more lenient than the parametric ones.
The next obvious question is: if non-parametric tests don’t have strict assumptions, why not just use them all the time? The short answer is that it’s complicated. The slightly longer answer is that parametric tests are better at detecting an effect when one really exists; they are more sensitive, if you will. All statistical models are wrong, some are just more wrong than others, which is why it’s important to have a large enough sample size for your data. Parametric tests, because of their somewhat rigid assumptions, have more “statistical power” behind them.
You could in theory run both a parametric and a non-parametric test on data that meet the parametric assumptions, and your answers should be very close, but not exact. That slight difference is why we use parametric tests whenever possible. Plus, there are a lot of built-in tools that make life easier (especially now with the computer age in full swing). And, as odd as this sounds, parametric tests can even do well with skewed or non-normal data when we can lean on the central limit theorem, meaning we have enough samples that the sample mean is approximately normal.
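Here is a sketch of that side-by-side comparison, using a t-test as the parametric option and a Mann-Whitney U test as the non-parametric one; the two simulated groups (sizes, means, and the shift between them are made-up values for illustration) meet the parametric assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(loc=0.0, scale=1.0, size=50)
b = rng.normal(loc=1.0, scale=1.0, size=50)  # a real shift between groups

# Parametric: two-sample t-test. Non-parametric: Mann-Whitney U test.
p_t = stats.ttest_ind(a, b).pvalue
p_mw = stats.mannwhitneyu(a, b).pvalue
print(p_t, p_mw)  # typically close, but not identical
```

When the assumptions hold, the two p-values usually tell the same story, with the parametric one squeezing a bit more power out of the same data.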
The main takeaway is that we prefer parametric tests primarily because of their statistical power, so we apply them whenever we can. It’s why we always check that the data are normal before running tests instead of reaching straight for the non-parametric ones. While we did a pretty good job of sketching what a non-parametric test is, we’ll go into more detail (again, hopefully) in a future post. Hopefully this clears up what exactly a parametric test is and why we have so many assumptions.
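That “check the data are normal first” step can itself be a statistical test. One common option (among several; this is just one way to do it) is the Shapiro-Wilk test from scipy, sketched here on simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.normal(loc=10.0, scale=2.0, size=100)  # made-up sample

# Shapiro-Wilk tests the null hypothesis that the sample came from a
# normal distribution. A large p-value means no evidence against
# normality, so a parametric test is a reasonable choice.
stat, p = stats.shapiro(data)
print(stat, p)
```

If the p-value were small, that would be a signal to consider a transformation or a non-parametric alternative instead.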