Maximum likelihood estimation
In statistics, maximum likelihood estimation is a method of estimating the
parameters of a probability distribution by maximizing a likelihood function,
so that under the assumed statistical model the observed data is most
probable.
For example, when a normal distribution is fitted to a data set, maximum
likelihood estimation finds the values of μ and σ that give the curve that
best fits the data. The goal of maximum likelihood is to find the parameter
values for which the distribution maximises the probability of observing the
data.
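The normal-distribution case can be sketched in a few lines of Python. This is
a minimal illustration, not a definitive implementation: the data values are
made up, and the function names (normal_pdf, likelihood, fit_normal_mle) are
hypothetical. It relies on the standard closed-form result that, for a normal
model, the likelihood is maximised by the sample mean and by the standard
deviation computed with a divisor of n rather than n - 1.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    """Joint density of independent observations under a normal model."""
    return math.prod(normal_pdf(x, mu, sigma) for x in data)

def fit_normal_mle(data):
    """Closed-form maximum likelihood estimates of mu and sigma."""
    n = len(data)
    mu_hat = sum(data) / n
    # Divisor is n (not n - 1): this is the maximum likelihood estimate.
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
    return mu_hat, sigma_hat

# Hypothetical data set, invented for illustration only.
data = [4.8, 5.1, 5.3, 4.9, 5.6, 5.0, 4.7, 5.2]

mu_hat, sigma_hat = fit_normal_mle(data)
best = likelihood(data, mu_hat, sigma_hat)

# Moving either parameter away from its estimate lowers the likelihood.
shifted_mu = likelihood(data, mu_hat + 0.1, sigma_hat)
shifted_sigma = likelihood(data, mu_hat, sigma_hat * 1.5)

print("mu_hat =", mu_hat, "sigma_hat =", sigma_hat)
print("likelihood at estimates:", best)
print("likelihood with shifted mu:", shifted_mu)
print("likelihood with larger sigma:", shifted_sigma)
```

The two shifted evaluations are only a spot check: they show that nearby
parameter values fit the data less well, which is what "best fits" means here.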
The concept of likelihood
If the probability of an event X dependent on model parameters p is written
P(X | p), then we would talk about the likelihood L(p | X),
that is, the likelihood of the parameters given the data.
The aim of maximum likelihood estimation is to find the parameter value(s)
that make the observed data most likely.
Probability: knowing parameters -> prediction of outcome
Likelihood: observation of data -> estimation of parameters
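The two directions can be illustrated with the same formula read in two ways.
The sketch below assumes a binomial model of coin flips, which is not part of
the original text; the numbers (10 flips, 7 heads) and the name binomial_pmf
are chosen purely for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n = 10

# Probability: the parameter is known (p = 0.5) and the outcome k varies.
p_known = 0.5
probabilities = [binomial_pmf(k, n, p_known) for k in range(n + 1)]

# Likelihood: the data are observed (7 heads) and the parameter p varies.
k_observed = 7
grid = [i / 100 for i in range(1, 100)]  # candidate values 0.01 .. 0.99
likelihoods = [binomial_pmf(k_observed, n, p) for p in grid]

# Grid search for the parameter value that makes the data most likely.
p_best = grid[likelihoods.index(max(likelihoods))]

print("sum of probabilities over all outcomes:", sum(probabilities))
print("value of p with the highest likelihood:", p_best)
```

The probabilities over all possible outcomes add up to one, whereas the
likelihood values over candidate parameters need not; the likelihood is only
used to compare parameter values with one another.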
Log-likelihood
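The main reason for working with the log-likelihood rather than the
likelihood itself is computational rather than theoretical.
If you multiply lots of very small numbers together (say all less than 0.0001)
then you will very quickly end up with a number that is too small to be
represented by any calculator or computer as different from zero.
This situation will often occur in calculating likelihoods, when we are often
multiplying the probabilities of lots of rare but independent events together
to calculate the joint probability.
Taking logarithms turns the product into a sum, and because the logarithm is
an increasing function, the parameter values that maximise the log-likelihood
are the same ones that maximise the likelihood.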
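The underflow problem is easy to reproduce. The sketch below uses 100
hypothetical events, each with probability 0.00001; the values are invented
for illustration. It assumes standard double-precision floating point, as used
by Python's float type.

```python
import math

# Hypothetical probabilities of 100 rare, independent events.
probs = [1e-5] * 100

# Direct product: the true value is 1e-500, far below the smallest
# positive double-precision number, so the result underflows to zero.
product = 1.0
for p in probs:
    product *= p

# Sum of logarithms: the same quantity on the log scale stays representable.
log_likelihood = sum(math.log(p) for p in probs)

print("product of probabilities:", product)
print("sum of log-probabilities:", log_likelihood)
```

Once the product has collapsed to zero, every candidate parameter value looks
equally bad, so nothing can be maximised. The sum of logarithms keeps the
information needed to compare candidates.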