Day 250: Maximum Likelihood Estimation
If we are going to talk about expectation maximization (now that I’m done complaining for a bit), we first have to introduce the idea of maximum likelihood. It is easy to introduce, yet it is a very powerful tool for estimating the state of something. It does take understanding a little bit of statistics, but trust me: if I can understand it, so can you.
The normal distribution is everywhere in nature, and this is a good thing! As engineers we love the normal distribution because it is easy to work with. It describes so many different things that we even have ways to take distributions that are not normal and transform them into normal ones (that is just one example; there are other approaches too). So why are we talking about the normal distribution? Because maximum likelihood relies on something very simple about it. Take a look at the image below.
This is the humble normal (or Gaussian) distribution. I’ve gone ahead and labeled the mean here (since it isn’t clearly labeled otherwise). The probability density is highest at the mean; out near the tails, say at 3σ or -3σ, outcomes are very unlikely. Think of it this way: if I flip a fair coin 1000 times, I should get roughly 50% heads and 50% tails, and that 50/50 result corresponds to the mean.
Now what if we flipped the coin 1000 times and found ~75% heads and ~25% tails? That is very unlikely, and the result would sit far out in the tails of the distribution (out past 3σ). Here is a gif that shows this happening in real life.
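If you don’t have a coin handy, you can convince yourself with a quick simulation. This is just a sketch of the experiment described above: repeat the 1000-flip trial many times and look at how the fraction of heads spreads out around 50%.

```python
import numpy as np

# Simulate 10,000 experiments of 1,000 fair coin flips each.
rng = np.random.default_rng(0)
flips = rng.integers(0, 2, size=(10_000, 1_000))  # 1 = heads, 0 = tails
heads_frac = flips.mean(axis=1)  # fraction of heads in each experiment

print(f"mean fraction of heads: {heads_frac.mean():.3f}")  # clusters near 0.50
print(f"spread of that fraction: {heads_frac.std():.3f}")  # quite small
# A 75%-heads run is many standard deviations from the mean,
# so we essentially never observe one:
print((heads_frac >= 0.75).sum())
```

The fractions pile up tightly around 0.5 (the mean of the distribution), and not a single one of the 10,000 runs lands anywhere near 75% heads.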
Okay, so hopefully we are now convinced that the most likely value under the normal distribution is the mean. This is the idea behind maximum likelihood. Seriously, for the normal distribution it is just a fancy way of calculating the mean.
Maybe that’s a lot of buildup for nothing, but the idea is this: if we know that our process follows a normal distribution, we sum the values we get from the process and divide by the total number of observations. In other words, we compute the mean. If you prefer seeing the math, it looks like this:

λ̂ = (Σᵢ₌₁ⁿ xᵢ) / n = x̄

Which in words just says that our maximum likelihood estimate lambda hat (λ̂) equals (and we are taking this out of order) the sum (the Σ symbol) from i = 1 to n (where n is the total number of observations) of xᵢ (where x is our observations and xᵢ is the i-th observation), all divided by the total number of observations (n), and that is just the sample mean (x̄).
So now that we have that introduced, we can discuss expectation maximization, which is very similar, but still very different. Let’s just say it’s complicated, but I’ll do my best to break it down.