Statistics: using data to infer a probabilistic model (a distribution with an unknown parameter)
Parameter estimation: given that we know the form of the model, find the parameter of that model from the data
Maximum Likelihood Estimate (MLE):
Likelihood of X = k: the p.m.f. \Pr\{X = k\} \triangleq p_{X;\theta}(k), with unknown parameter \theta and known data k.
MLE of \theta: \hat{\theta}_{ML}(k) \in \arg \max_{\theta} p_{X;\theta}(k), i.e., find the value of \theta that maximizes the likelihood p_{X;\theta}(k) (the value of \theta under which the data we observed is most probable).
WARNING: we treat p_{X;\theta}(k) as a function of \theta (the parameter), not of k (the observed data).
Example: For X \sim \text{Binomial}(100, p), the likelihood of observing X = k is

p_{X;p}(k) = \binom{100}{k} p^k (1 - p)^{100-k}
Setting \frac{d}{dp} p_{X;p}(k) = 0 gives p = \frac{k}{100} (one can verify \frac{d^2}{dp^2} p_{X;p}(k) < 0 there, so it is a maximum). Therefore, the maximum likelihood estimate is:
\hat{p}_{ML}(k) = \frac{k}{100}
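To see where this comes from, maximize the log-likelihood instead (the logarithm is monotone, so the maximizer is the same); a short derivation:

\log p_{X;p}(k) = \log \binom{100}{k} + k \log p + (100 - k) \log(1 - p)

\frac{d}{dp} \log p_{X;p}(k) = \frac{k}{p} - \frac{100 - k}{1 - p} = 0 \implies p = \frac{k}{100}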
Maximum Likelihood Estimate with i.i.d. data
Say we have n i.i.d. random variables X_{1:n} := X_1, X_2, ..., X_n; then the joint p.m.f. factorizes as

p_{X_{1:n};\theta}(k_{1:n}) = \prod_{i=1}^{n} p_{X;\theta}(k_i)
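Since the logarithm turns this product into a sum (and preserves the maximizer), it is usually easier to work with the log-likelihood:

\hat{\theta}_{ML}(k_{1:n}) \in \arg \max_{\theta} \sum_{i=1}^{n} \log p_{X;\theta}(k_i)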
Example: we want to estimate the parameter \lambda over 30 days, where each day's observation X_i is drawn i.i.d. from \text{Poisson}(\lambda). We observe X_1 = k_1, X_2 = k_2, ..., X_{30} = k_{30}.
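A short derivation using the log-likelihood of the i.i.d. data:

\ell(\lambda) = \sum_{i=1}^{30} \log \left( e^{-\lambda} \frac{\lambda^{k_i}}{k_i!} \right) = -30\lambda + \left( \sum_{i=1}^{30} k_i \right) \log \lambda - \sum_{i=1}^{30} \log(k_i!)

\frac{d\ell}{d\lambda} = -30 + \frac{1}{\lambda} \sum_{i=1}^{30} k_i = 0 \implies \hat{\lambda}_{ML}(k_{1:30}) = \frac{1}{30} \sum_{i=1}^{30} k_i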
This is intuitive: the estimate is the sample mean of the data, and the mean of \text{Poisson}(\lambda) is \lambda.
Normal Maximum Likelihood
Let X_{1:n} be i.i.d. with each X_i \sim \text{Normal}(\mu, \sigma^2). We find \hat{\sigma}_{ML}(t_{1:n}) using the log-likelihood, assuming \mu is known.
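Carrying this out (with the p.d.f. in place of the p.m.f., since X_i is continuous):

\ell(\sigma) = \sum_{i=1}^{n} \log \left( \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(t_i - \mu)^2}{2\sigma^2}} \right) = -n \log \sigma - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (t_i - \mu)^2 - \frac{n}{2} \log(2\pi)

\frac{d\ell}{d\sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} (t_i - \mu)^2 = 0 \implies \hat{\sigma}_{ML}(t_{1:n}) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (t_i - \mu)^2}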
For \text{Binomial}(n, p) with n fixed, we know that given data k, \hat{p}_{ML}(k) = \frac{k}{n}. But we can treat the data itself as random: substituting the random variable X for k, \hat{p}_{ML}(X) = \frac{X}{n} is itself a random variable (an estimator).
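For example, we can take its expectation; using E[X] = np for X \sim \text{Binomial}(n, p), this estimator is unbiased:

E[\hat{p}_{ML}(X)] = E\left[ \frac{X}{n} \right] = \frac{np}{n} = p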