Day #21 : Defining Parametric Statistics – 365 DoA
Well my lovely readers, we’ve made it to the three-week mark, 5.7% of the way through! Okay, maybe that doesn’t seem like a big deal written like that, but hey, it’s progress. Last post we had our independence day, or rather we defined what it means to have independent events vs. dependent events. We also said that independence is an important assumption in parametric statistics, but then we realized we never defined what parametric statistics even is. Oops. So let’s stop dragging our feet and talk parametric statistics!*
Statistical analysis is an interesting subject because we can make a model very complex, or we can simplify it to the point that the result doesn’t match what we actually see. Sure, that could be said for a lot of things. Why bring this up when we should be defining parametric statistics? Simply because parametric statistics comes with assumptions, and what’s odd is that even when those assumptions don’t strictly hold, the analysis can sometimes still tell us something useful about the dataset. Which, to me, is fascinating.
Still we’re here for parametric statistics, so let’s do it.
Back in about 1925, the term parametric statistics was mentioned in a work by smart guy Ronald Fisher. His work, “Statistical Methods for Research Workers,” laid much of the foundation for modern statistics, so if you’re not a fan, you can blame Ronald. Parametric statistics specifically is a branch of statistics where we assume that our sample data come from a population that can be modeled by a probability distribution with a fixed set of parameters.
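To make "a distribution with a fixed set of parameters" a little more concrete, here is a minimal sketch using only Python's standard library. The population values (a mean of 50 and a standard deviation of 10) are made up for illustration and do not come from any earlier post. The point is just that a normal population is fully described by two numbers, and that the numbers we compute from a sample are estimates of them.

```python
import random
import statistics

# Hypothetical population: a normal distribution described by two parameters.
POP_MEAN = 50.0  # parameter (assumed for illustration)
POP_SD = 10.0    # parameter (assumed for illustration)


def draw_sample(n, seed=0):
    """Draw n values from the assumed normal population."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]


sample = draw_sample(1000)

# Statistics: computed from the sample, used to estimate the parameters.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

print(f"population mean (parameter): {POP_MEAN}, sample mean (statistic): {sample_mean:.2f}")
print(f"population sd (parameter): {POP_SD}, sample sd (statistic): {sample_sd:.2f}")
```

The sample values should land close to the population values but will not match them exactly, which is the whole parameter vs. statistic distinction in miniature.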
In parametric statistics, something like the population mean is a parameter, while the sample mean is a statistic. That’s an important distinction, since we talked about sample means a few posts back when we covered the central limit theorem. When we say parametric statistics, we are talking about the parameters that define the distribution our data come from. If a fixed number of parameters is enough to describe that distribution, we can use parametric statistical tests. This is why parametric statistics starts with assumptions: they are what allow us to simplify the analysis. We’ve covered these assumptions before, but let’s go over them one more time. If we are going to use parametric statistics, we typically assume:
- A normal distribution, our data should have a normal (or at LEAST symmetric) distribution
- Similar variance, our data from multiple groups is assumed to have the same variance.
- Independence, we assume that our data are independent, which we defined last post!
The first two we can check with some simple plots (a histogram for the distribution, side-by-side box plots for the variance). Independence is a little harder. That’s not to say we cannot test for independence; we can, and we saw an example of how to do that when we talked independence. The problem is that with very complex data, checking this assumption from the data alone is difficult, so we rely on the how. As in: how were the data collected, what kind of data are we looking at, and how do we know that each data point is independent of every other data point?
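Plots are the usual way to eyeball the first two assumptions, but a couple of quick numbers can go alongside them. The sketch below is one rough way to do that with the standard library: sample skewness as a symmetry check (near 0 for a symmetric sample) and the ratio of the two sample variances (near 1 when the spreads are similar). The function names and the two groups of measurements are hypothetical, and neither number is a formal test.

```python
import statistics


def sample_skewness(data):
    """Rough symmetry check: close to 0 for a symmetric sample."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # population-style sd keeps the formula simple
    return sum(((x - mean) / sd) ** 3 for x in data) / n


def variance_ratio(group_a, group_b):
    """Larger sample variance divided by the smaller; close to 1 means similar spread."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical measurements, made up for illustration.
group_a = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
group_b = [5.4, 5.6, 5.2, 5.7, 5.5, 5.3, 5.8, 5.5]

print(f"skewness of group A: {sample_skewness(group_a):.2f}")
print(f"skewness of group B: {sample_skewness(group_b):.2f}")
print(f"variance ratio: {variance_ratio(group_a, group_b):.2f}")
```

Independence is deliberately missing here, since, as noted above, that one usually comes down to how the data were collected.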
Okay, we’ve defined parametric statistics, sort of. This all may seem like a bit much, so let’s simplify it slightly. When we use parametric tests (i.e., tests that rely on the assumptions above), we are very often just comparing means. Let’s look at an example of what we mean when we say that.
The humble (or not so humble) t-test! A t-test is used to determine whether there is a significant difference between the means of two groups, where the groups may differ in some feature (e.g., a movement vs. no-movement condition when analysing EEG data). It is a parametric test, so it relies on the assumptions above.
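As a rough sketch of what "testing the means" looks like, here is the classic equal-variance (pooled) two-sample t statistic written out with the standard library. The function name and the two sets of values are hypothetical, made up to stand in for a movement vs. no-movement comparison. This only computes the t statistic and its degrees of freedom; turning that into a p-value needs the t distribution, which the standard library does not provide.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic assuming both groups share the same variance."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

    df = n_a + n_b - 2  # degrees of freedom
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Hypothetical values for two conditions, made up for illustration.
movement = [12.1, 11.4, 13.0, 12.6, 11.9, 12.8]
no_movement = [10.2, 10.9, 9.8, 10.5, 11.1, 10.0]

t_stat, df = pooled_t_statistic(movement, no_movement)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

The further t sits from 0, the harder it is to believe the two groups share the same mean. If SciPy is available, `scipy.stats.ttest_ind(movement, no_movement, equal_var=True)` is one way to get a p-value to go with it.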
If we look at our coin flipping example, we saw that our dataset had a normal distribution and an unknown variance. A t-test works well when we are looking at data and we aren’t sure whether they came from the same population (i.e., comparing one condition to another to determine if there is a difference). In our coin example we are comparing our dataset to what an idealized coin would produce, to see whether our data are consistent with that distribution (i.e., we cannot tell a difference between our data and the ideal). If we wanted to determine whether our data fit into one of several different and distinct groups, there are other tools for that, and we can talk about those later.
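The coin comparison can be sketched as a one-sample t statistic: we compare the mean of our data to the mean an idealized coin would give. The numbers below are hypothetical (heads counted in each of 12 rounds of 10 flips, with an ideal mean of 5 heads per round) and are not the data from the earlier coin flipping post.

```python
import math
import statistics


def one_sample_t(data, expected_mean):
    """t statistic for comparing a sample mean against a fixed expected mean."""
    n = len(data)
    std_err = statistics.stdev(data) / math.sqrt(n)  # variance is estimated from the sample
    return (statistics.fmean(data) - expected_mean) / std_err, n - 1


# Hypothetical data: heads counted in each of 12 rounds of 10 flips.
heads_per_round = [5, 6, 4, 5, 7, 5, 3, 6, 5, 4, 6, 5]
IDEAL_MEAN = 5.0  # a fair coin should average 5 heads per 10 flips

t_stat, df = one_sample_t(heads_per_round, IDEAL_MEAN)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

A t statistic near 0 means we cannot tell our coin apart from the idealized one, at least not with this much data.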
Hopefully that clears some of this up. If not, don’t worry! Tomorrow we will cover nonparametric statistics and compare the two so we can see the difference. That SHOULD clear up any lingering questions about what parametric statistical tests are and what nonparametric statistical tests are. If that doesn’t help, well, we still have 94.3% of 365 Days of Academia left, so we can go over it more for sure.
Until next time, don’t stop learning!
*As usual, I make no claim to the accuracy of this information; some of it might be wrong. I’m learning, which is why I’m writing these posts, and if you’re reading this then I am assuming you are trying to learn too. My plea to you is this: if you see something that is not correct, or if you want to expand on something, do it. Let’s learn together!!