Day 25: The p-value
Now it seems like we are getting somewhere. Last post we covered the z-score. If you haven’t read it already, it might be good to familiarize yourself with it, since today we are going to talk about the p-value and how it differs from the z-score. That said, let’s dive in and look at the value in the p-value.*
Quick review if you didn’t feel like clicking the link (no worries, I know that feeling). The z-score is just a measure of how far our value is from the mean, counted in standard deviations. We said that the standard deviation isn’t a fixed value, but one that depends on the spread of our data (how spread out our values are). We also said that the concept was important because the number of standard deviations between an observed value and the mean tells us how surprising that value would be if nothing but randomness were at work.
For example, when dealing with normally distributed data, we know that about 68.3% of the data lies within one standard deviation of the mean (either plus or minus), and if we go out to two standard deviations from the mean we find about 95.4% of our data, etc. This tells us something very important: if our observed value is more than two standard deviations away from the mean, then a value that extreme would show up less than 5% of the time from the randomness of the system alone. That makes randomness an unlikely (though not impossible) explanation, which is what we mean when we call the value significant.
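The coverage percentages above can be checked with a few lines of Python. This is just a sketch using the standard library; the helper names `normal_cdf` and `fraction_within` are my own and not from any particular package.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal variable Z (mean 0, standard deviation 1)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def fraction_within(k):
    """Fraction of normally distributed data within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

for k in (1, 2, 3):
    inside = fraction_within(k)
    # Whatever is not inside the band sits in the two tails.
    print(f"within {k} SD: {inside:.2%}, outside: {1 - inside:.2%}")
```

The "outside" number is the interesting one: it is how often pure randomness would land a value that far from the mean.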
However, just like there are different units to measure with, there are also different ways we can express significance. The z-score is (in my opinion) very intuitive, but it has a sibling, and that is the p-value. The p-value is like the z-score in that they both tell us how significant our observed value is. However, instead of counting standard deviations away from the mean, the p-value uses a probability, which we can read as a percentage.
This may make you think that p-value stands for percent value. This isn’t the case, although it may help to remember it that way. P-value actually stands for probability value. We covered what the standard cutoff is for calling a p-value significant, and that was p = 0.05. We also talked about cases where that cutoff isn’t used, like particle physics, or where it doesn’t hold and needs to be adjusted using the Bonferroni correction (or some other correction; so far we’ve only talked about Bonferroni).
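As a quick reminder of how the Bonferroni adjustment works, here is a minimal sketch: the cutoff is divided by the number of tests being run. The function names and the p-values in the example are made up for illustration.

```python
def bonferroni_threshold(alpha, num_tests):
    """Per-test significance cutoff after a Bonferroni correction."""
    return alpha / num_tests

def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which p-values stay significant once the cutoff is corrected."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]

# Hypothetical p-values from four separate tests (not real data).
example_p_values = [0.001, 0.02, 0.04, 0.30]
print(bonferroni_threshold(0.05, len(example_p_values)))
print(significant_after_bonferroni(example_p_values))
```

Three of the four made-up values would pass an uncorrected 0.05 cutoff, but only the smallest one survives the corrected cutoff.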
Using our z-score from before, we can calculate the p-value. There are several different lookup tables for this, or even functions (such as the normcdf function, which we will cover some other time). There is an actual formula to calculate the p-value directly from your data, however really what you are doing is calculating the z-score along the way anyway. Hence why we said that the z-score and p-value are related.
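To make the z-score to p-value step concrete, here is a rough sketch that does by hand what a lookup table or a normal CDF function would do. It assumes normally distributed data and a two-sided test (values far above or far below the mean both count as extreme). The function names and the example numbers are my own, chosen only for illustration.

```python
import math

def z_score(value, mean, std_dev):
    """How many standard deviations `value` sits from `mean`."""
    return (value - mean) / std_dev

def two_sided_p_value(z):
    """Probability of landing at least |z| standard deviations from the mean,
    in either direction, for normally distributed data."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical numbers: an observation of 130 when the mean is 100 and the SD is 15.
z = z_score(130, 100, 15)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, p = {p:.4f}")

# The usual 0.05 cutoff corresponds to roughly 1.96 standard deviations.
print(f"p at z = 1.96: {two_sided_p_value(1.96):.4f}")
```

Bigger z-scores give smaller p-values, which is the whole relationship in one line: the further out you are, the less likely randomness alone put you there.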
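One last point before we wrap up. A p-value of p = 0.05 really means that, if nothing but randomness were going on, there would be a 5% chance of seeing a value at least as extreme as the one you observed. That is subtly different from saying there is a 5% chance that your result was caused by randomness, even though people often describe it that way. In other words, when used properly (heavy emphasis on properly) with a cutoff of 0.05, you have a 5% chance of making a type 1 error when there is no real effect. This is why p-values are more useful when talking about your results: p = 0.05 is easier to grasp than trying to explain that your data point is about 2 standard deviations away.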
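The type 1 error claim can be checked with a small simulation. This is a sketch under stated assumptions: every "experiment" is pure noise drawn from a standard normal distribution, and a result is called significant when it lands more than 1.96 standard deviations from the mean (the two-sided equivalent of p < 0.05). The function name and defaults are my own.

```python
import random

def false_positive_rate(num_experiments=100_000, z_cutoff=1.96, seed=0):
    """Fraction of pure-noise experiments that still look 'significant'."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    false_alarms = sum(
        1 for _ in range(num_experiments)
        if abs(rng.gauss(0.0, 1.0)) > z_cutoff
    )
    return false_alarms / num_experiments

print(f"false positive rate: {false_positive_rate():.3f}")
```

Since nothing real is going on in any of these experiments, every "significant" result is a type 1 error, and the rate should come out close to 5%.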
Well that’s all we have time for today. Next up we may be talking about statistical distributions since we are on a heavy statistics kick, who knows! In any case, it feels like we did a pretty good job covering what a p-value is and how it works.
Until next time, don’t stop learning!
*Don’t forget, I make no claim to the accuracy of this information; some of it might be wrong. I’m learning, which is why I’m writing these posts and if you’re reading this then I am assuming you are trying to learn too. My plea to you is this, if you see something that is not correct, or if you want to expand on something, do it. Let’s learn together!!