Day 14: Significance, Part 2
Well, here we are, two weeks into 365DoA. I was excited until I realized that puts us only 3.8356% of the way done. If you remember from the last post, we’ve started our significance talk: what does it mean for a value to be significant, and how do we find out? Today is the day I finally break: we’re going to have to do some math. Despite my best efforts, I don’t think we can finish the significance discussion without it and still manage to make sense. With that, let’s just dive in.*
There are a few different ways we can determine significance. However, since the methods rely on different maths, today we are going to focus on something called the T-test. In this example we are going to have two different populations. If you recall from the last post (I know I keep linking to previous posts, but it’s important to make sure you are up to speed), we said that if we have an experiment set up to determine whether a coin is biased (when we flip it, does it favor heads or tails?), our null hypothesis is going to be that the coin is NOT biased. That is to say, if we flip the coin a bunch of times, there is no large difference between the number of heads and tails. The big question we left off on was how do we define “large difference,” because, if you remember, I said you can flip a coin hundreds of times and it won’t come out exactly 50% heads and 50% tails. Instead it trends that way, but because of other variables like the way you flip it, wind, etc., you won’t typically see an exact 50/50 split.
So we flip-test our coin (the one we are not sure is biased or not), let’s say 10,000 times, and tally each result as heads or tails. After a long time (10,000 flips is a lot), at first glance it looks like we have, let’s say, 156 more heads than tails. Does that mean we have a coin that is biased? Well, there are two ways we can determine this:
- We assume the probability density function for the coin is normally distributed and attempt to see where our result falls on the curve.
- We have a second coin, one we know is not biased, perform the same experiment with it, and compare the results.
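To get a feel for the flip-and-tally experiment, here is a minimal Python sketch that simulates a coin with a pseudo-random number generator. The function name flip_experiment, the p_heads parameter, and the seed value are my own assumptions for illustration; a simulated coin is only a stand-in for a real one.

```python
import random

def flip_experiment(n_flips, p_heads=0.5, seed=None):
    """Simulate n_flips coin flips and return (heads, tails) counts.

    p_heads=0.5 models a fair coin; this is an assumption of the sketch.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    heads = sum(1 for _ in range(n_flips) if rng.random() < p_heads)
    return heads, n_flips - heads

heads, tails = flip_experiment(10_000, seed=14)
print("heads:", heads, "tails:", tails, "difference:", heads - tails)
```

Running it with a few different seeds gives a rough sense of how far from an even split a fair coin can wander over 10,000 flips.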
In the first solution, we would be using parametric statistics. We can do this when we assume that our probability density is normally distributed and that our trials are independent (i.e., I don’t flip the coin a special way if I get a heads vs. a tails). Because we did NOT treat the coin specially depending on the result, we can make both of these assumptions and see where our value falls on the curve.
By now you’re probably scratching your head about the probability density and where that came from. Well, that is where the math comes in, sorry I tried. So we say that a coin flip is going to be either heads or tails, and if it is fair we should see 50% of each. Written differently:
p(heads) = 0.5 = p(tails)
Where p is probability so p(heads) is the probability of heads and p(tails) the probability of getting a tails. I’m explicitly explaining this because we will be using that notation quite a bit. We can rewrite this equation if you would prefer to sum to 1 (100%):
p(heads)+p(tails) = 1
This should make intuitive sense because, assuming the coin doesn’t land on its side, the probabilities of heads and tails should sum to 1 (which again means 100%) no matter how many heads or tails you get (i.e., flip a coin 10 times and get all tails, and the observed p(tails) = 1 and p(heads) = 0, so you sum to 1 even in this extreme). This gets far more complex when you start doing more than one or two trials, though. For example, say we flip a coin 5 times. What are the possible outcomes?
Well, without writing them all out, we have 2 × 2 × 2 × 2 × 2 = 2^5 = 32 possibilities. Which is why I didn’t write them all out. Now how did I calculate that? Well, you have either heads or tails as an outcome (2) and we flip the coin five times, so we have 2^5 possible outcomes. Don’t believe me? Well, write them out, don’t worry I can wait…
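If writing out 32 sequences by hand sounds tedious, a short Python sketch can do the counting. The helper name all_outcomes is just something I made up for this example; itertools.product is from the standard library.

```python
from itertools import product

def all_outcomes(n_flips):
    """Return every possible sequence of heads (H) and tails (T)."""
    return ["".join(seq) for seq in product("HT", repeat=n_flips)]

outcomes = all_outcomes(5)
print(len(outcomes))  # 32, the same as 2**5
print(outcomes[:4])   # ['HHHHH', 'HHHHT', 'HHHTH', 'HHHTT']
```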
I promise we are getting to the probability curve so just stick with me here.
Fine, you don’t have to beg, I’ll write it all out. This will be helpful anyway, but to make things more screen friendly let’s limit the coin flips to 3. That would give us 2^3 = 8 possibilities (where H is heads and T is tails), which are:
HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
That is all you can flip. No really, that might be easier for you to write out anyway, so check my work, I insist. Now, with all this written out, let’s talk about how this leads us toward a normal distribution. Hopefully we can all agree that if the coin is fair, each of these outcomes is equally likely to happen. Because we have eight of them, the likelihood of getting the first outcome is 1/8 or 0.125, and the same goes for the other seven. Here is where it gets interesting: what are the chances you flip only heads?
p(3 heads) = 1/8 = 0.125
This should be intuitive because if we look at the list of all the possible outcomes for the three trials there is only one instance where we get all heads. Now, say we only care about getting 2 heads, what is the probability that for three flips we will get exactly 2 heads?
p(2 heads) = 3/8 = 0.375
Why is it different if each of the choices is just as likely? If we look at our list of all possible outcomes, we see exactly three choices out of the eight that give us two heads. So the probability is 3/8 or 37.5%. You’ll notice if we flip the question and ask what are the odds we flip exactly 2 tails the answer is the same.
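The counting argument above can be checked with a small Python sketch: list every outcome, keep the ones with exactly k heads, and divide. The function name prob_exact_heads is hypothetical, and the sketch assumes a fair coin so that every sequence is equally likely.

```python
from fractions import Fraction
from itertools import product

def prob_exact_heads(k, n):
    """Probability of exactly k heads in n fair flips, by brute-force counting."""
    outcomes = list(product("HT", repeat=n))  # all 2**n sequences
    matching = [o for o in outcomes if o.count("H") == k]
    return Fraction(len(matching), len(outcomes))

print(prob_exact_heads(3, 3))         # 1/8
print(prob_exact_heads(2, 3))         # 3/8
print(float(prob_exact_heads(2, 3)))  # 0.375
```

Asking for exactly 2 tails is the same as asking for exactly 1 head in three flips, and prob_exact_heads(1, 3) also comes out to 3/8.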
Okay, so we went over the basics, but what if we have more than three tosses (which we can also call trials)? Say we have n trials. We do this a lot in math when we want to describe something generally; in this case n = any whole number you want greater than zero.
An aside, technically you can set n = 0, but this is what we call the trivial case (or solution) and if you ever hear that term chances are the answer to the trivial case is going to be zero, which it is in this case. In other words the probability of getting heads or tails without flipping a coin is zero.
We can also generalize our other variable and say that k is the number of heads or tails we are interested in. In our previous case we asked what the chances of getting two heads are, so in that case k = 2 heads. However, by generalizing it we can set it to anything. Notice that in this case, if we set k = 0 heads, we do not have the trivial solution, because asking for k = 0 heads is the same as asking for k = n tails, which will give us a non-zero value. We just saw this in the above example with three coin flips.
So now we have the following:
p(number of heads, number of tosses) = p(k, n)
So the probability of getting k heads/tails (since we use this value for either question) in n coin flips would now read:
p(k, n) = (1/2)^n C(k,n)
You may have noticed that I snuck in a variable I didn’t talk about, well two if you want to get technical. Let’s tackle them one at a time. First, the (1/2)^n, which we read as (1/2) to the power of n; remember that in this case n is the number of trials (or coin flips) we have, so (1/2)^n is the probability of any one specific sequence of n fair flips. C(k,n) is the number of combinations of k (the number of heads or tails) we can have in n trials. In our above example, C(k,n) = C(2 heads, 3 trials) = 3, or three different combinations that gave us 2 heads for 3 coin flips.
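Here is a minimal Python sketch of that formula for a fair coin. The function name p mirrors the notation in this post. One thing to watch: the standard library’s math.comb takes its arguments as comb(n, k), the reverse of the C(k,n) order used here.

```python
from math import comb

def p(k, n):
    """Probability of exactly k heads in n fair flips: (1/2)^n * C(k, n)."""
    # math.comb expects (n, k), the opposite order of the post's C(k, n)
    return 0.5 ** n * comb(n, k)

print(p(3, 3))  # 0.125
print(p(2, 3))  # 0.375
print(sum(p(k, 3) for k in range(4)))  # 1.0, all the cases together
```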
The term combinations is important here: combinations are the ways we can arrange the outcomes WITHOUT taking order into consideration. For example, if we only care that we have two heads in our three trials, we can write it HHT, HTH, or THH. Since order doesn’t matter, we say that these are all valid solutions.
However, if order matters, you are talking about permutations. An easy way to explain permutations is to say that a combination lock is really a permutation lock. This is because order matters when you are using the lock. If 1, 2, 3 opens the lock, then 3, 2, 1 or any other ordering won’t open it, even though you are using the same numbers. Only 1, 2, 3 will.
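The difference is easy to see with Python’s itertools, shown here as a rough sketch. The lock digits and the helper name two_heads_sequences are just for illustration.

```python
from itertools import combinations, permutations

digits = [1, 2, 3]

# Order matters: every ordering of the three digits is a different lock code.
print(len(list(permutations(digits))))     # 6 orderings, only one opens the lock

# Order does not matter: there is only one way to pick all three digits.
print(len(list(combinations(digits, 3))))  # 1

def two_heads_sequences(n_flips=3, n_heads=2):
    """Pick which flips are heads; each choice of positions is one combination."""
    result = []
    for positions in combinations(range(n_flips), n_heads):
        result.append("".join("H" if i in positions else "T" for i in range(n_flips)))
    return result

print(two_heads_sequences())  # ['HHT', 'HTH', 'THH']
```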
This formula comes from something called Pascal’s triangle, and I found a lovely gif that explains this fairly well.
If you look, you have n rows of disks, this means that:
- there are a total of 1+2+…+n disks
- every yellow disc corresponds to a unique pair of blue discs, and vice versa
- this means that the number of discs equals the number of pairs, which is C(2, n+1) = 1/2*n(n+1), so 1+2+…+n = 1/2*n(n+1).
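Both ideas can be checked numerically with a short Python sketch. The helper name pascal_row is my own; it builds a row of Pascal’s triangle by adding neighboring entries, and the second part checks the disc-counting identity with math.comb (which, as an aside, takes its arguments in (n, k) order).

```python
from math import comb

def pascal_row(n):
    """Return row n of Pascal's triangle (row 0 is [1])."""
    row = [1]
    for _ in range(n):
        # each new entry is the sum of the two entries above it
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row

print(pascal_row(3))  # [1, 3, 3, 1]: ways to get 0, 1, 2, 3 heads in 3 flips

# Disc counting: 1 + 2 + ... + n equals the number of pairs from n + 1 items.
n = 6
print(sum(range(1, n + 1)), comb(n + 1, 2), n * (n + 1) // 2)  # 21 21 21
```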
Yeah, it’s a little complex, and writing out the math in a more intuitive (or frankly standard) way is difficult on the blog. I think the biggest takeaway from all of this is that we can mathematically describe a binomial distribution (which you can loosely think of as the discrete version of the normal distribution, since it looks more and more like it as the number of flips grows) using the tools we’ve just covered.
This post is getting a little long for one day. I didn’t want to do this, but I think we will end here, and next time I will show you how to determine if we have a value that is statistically significant (in this case that would mean we have a biased coin) or if we were just unlucky and our coin is fair. Once we do that, we can talk nonparametric statistics using the same example. Luckily, I feel like that topic will go a little faster now that we’ve covered most of the basics.
Until next time, don’t stop learning!
*Reminder, I make no claim to the accuracy of this information, some of it might be wrong. I’m learning, which is why I’m here writing all this. If you’re reading this then you are probably trying to learn too. If you see something that is not correct, or if you want to expand on something, please do it. Let’s learn together!!