# Lecture 004

## Review

Frequentist/classical approach: MLE, confidence interval. Bayesian approach: MAP, MMSE.

Say we observe samples $t_1, t_2, ..., t_n$ of i.i.d. $X_i \sim \text{Uniform}(a, 1)$, where $a$ is unknown.

### Maximum Likelihood

We maximize the likelihood below. To make $\frac{1}{(1 - a)^n}$ large we want $a$ as close to $1^-$ as possible, but not so large that $g(a)$ falls into the $0$ case.

\begin{align*}
f_{X_{1:n};a}(t_{1:n}) =& \begin{cases} \frac{1}{(1 - a)^n} & \text{if } a \leq t_1, t_2, ..., t_n \leq 1\\ 0 & \text{otherwise} \end{cases}\\
g(a) =& \begin{cases} \frac{1}{(1 - a)^n} & \text{if } a \leq \min(t_1, t_2, ..., t_n)\\ 0 & \text{otherwise} \end{cases}\\
\hat{a}_{ML} = \arg \max_a g(a) =& \min(t_1, t_2, ..., t_n)
\end{align*}
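A quick simulation sketch of this result (the true value $a = 0.3$, the seed, and the sample sizes are illustrative choices, not from the lecture): the MLE $\min(t_1, ..., t_n)$ approaches the true $a$ from above as $n$ grows.

```python
import random

# Draw n samples from Uniform(a, 1) with a known "true" a, then check
# that the MLE min(t_1, ..., t_n) approaches a from above.
random.seed(0)
true_a = 0.3
for n in (10, 100, 10_000):
    samples = [random.uniform(true_a, 1.0) for _ in range(n)]
    a_mle = min(samples)          # the MLE is the smallest observation
    print(n, round(a_mle, 4))     # tends to true_a = 0.3 as n grows
```

The MLE is always an overestimate here ($\min(t_i) \geq a$ surely), which is what motivates the one-sided confidence interval below.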

### Confidence Interval

It makes sense to consider an interval of the form $I = \left[X_{\min} - \epsilon, X_{\min}\right]$, where $X_{\min} = \min(X_1, ..., X_n)$, since $a \leq X_{\min}$ always holds.

Note that the confidence interval is a frequentist construction, so the unknown parameter $a$ is a deterministic quantity (and $\epsilon$ is a constant we choose). But the interval is constructed from data, so the interval itself is random, i.e., the two endpoints of the interval (both functions of $X_{\min}$) are random variables.

\begin{align*}
Pr\{a \in \left[\min(X_1, ..., X_n) - \epsilon, \min(X_1, ..., X_n)\right]\} \geq& 0.95\\
Pr\{a \geq \min(X_1, ..., X_n) - \epsilon\} \geq& 0.95 \tag{$a \leq \min(X_1, ..., X_n)$ always holds}\\
Pr\{\min(X_1, ..., X_n) \leq a + \epsilon\} \geq& 0.95\\
Pr\{\min(X_1, ..., X_n) > a + \epsilon\} \leq& 0.05\\
\left(\frac{1 - (a + \epsilon)}{1 - a}\right)^n \leq& 0.05 \tag{each $X_i > a + \epsilon$ independently with probability $\frac{1 - a - \epsilon}{1 - a}$}\\
\epsilon \geq& (1 - a)(1 - 0.05^{1/n}) \tag{but we need a bound independent of $a$}\\
\epsilon \geq& 1 - 0.05^{1/n} \geq (1 - a)(1 - 0.05^{1/n}) \tag{this is sufficient since $0 \leq a \leq 1$}\\
\end{align*}

So the interval is $I = \left[X_{\min} - (1 - 0.05^{1/n}), X_{\min}\right]$.
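This interval can be sanity-checked by Monte Carlo (the true $a = 0.5$, $n = 20$, seed, and trial count are illustrative choices): the fraction of trials where $I$ contains $a$ should be at least $0.95$, for any $a$.

```python
import random

# Monte Carlo coverage check of I = [X_min - eps, X_min]
# with eps = 1 - 0.05**(1/n).
random.seed(1)
n, trials = 20, 20_000
eps = 1 - 0.05 ** (1 / n)
true_a = 0.5                      # try any a in [0, 1)
hits = 0
for _ in range(trials):
    x_min = min(random.uniform(true_a, 1.0) for _ in range(n))
    if x_min - eps <= true_a:     # a <= x_min always holds, so only
        hits += 1                 # the left endpoint needs checking
print(hits / trials)              # empirical coverage; should be >= 0.95
```

Because $\epsilon$ was chosen to work for the worst case $a = 0$, the empirical coverage for interior values of $a$ is typically well above $0.95$.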

### MAP

Now we view the parameter $a$ as a random variable $A$ rather than deterministic, assuming the p.d.f. of $A$ is $f_A(a) = \frac{20e^{-20a}}{1-e^{-20}}$ for $0 \leq a \leq 1$. Conditioned on $A = a$, we observe data $X_i \sim \text{Uniform}(a, 1)$ for $i \in \{1, ..., n\}$.

We maximize the posterior as follows:

\begin{align*}
f_{A | X_{1:n}=t_{1:n}}(a) =& \frac{f_{X_{1:n}|A=a}(t_{1:n})f_A(a)}{f_{X_{1:n}}(t_{1:n})}\\
=& \begin{cases} \frac{\frac{1}{(1-a)^n}\frac{20e^{-20a}}{1-e^{-20}}}{f_{X_{1:n}}(t_{1:n})} & \text{if } 0 \leq a \leq \min(t_1, ..., t_n)\\ 0 & \text{otherwise}\\ \end{cases}\\
\arg\max_a f_{A | X_{1:n}=t_{1:n}}(a) =& \arg\max_{0 \leq a \leq \min(t_1, ..., t_n)} \frac{e^{-20a}}{(1-a)^n} \tag{the constraint $0 \leq a \leq \min(t_1, ..., t_n)$ is important here}\\
\end{align*}
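The constrained maximization can be sketched numerically by grid search (the true $a = 0.4$, $n = 30$, seed, and grid resolution are illustrative choices):

```python
import math
import random

# Grid-search the (unnormalized) posterior e^{-20a} / (1 - a)^n over
# the feasible interval [0, min(t_i)].
random.seed(2)
true_a, n = 0.4, 30
t = [random.uniform(true_a, 1.0) for _ in range(n)]
t_min = min(t)

def log_posterior(a):
    # log of e^{-20a} / (1 - a)^n, dropping normalizing constants
    return -20 * a - n * math.log(1 - a)

grid = [i * t_min / 100_000 for i in range(100_001)]
a_map = max(grid, key=log_posterior)
# With n = 30 > 20 the objective is increasing on [0, t_min], so the
# constraint binds and the MAP estimate coincides with the MLE min(t_i);
# for small n the prior, which favors small a, can instead make the
# left endpoint a = 0 the maximizer.
print(round(a_map, 4), round(t_min, 4))
```

Taking the log turns the product into a sum and keeps the numbers well-conditioned; the maximizer is unchanged because $\log$ is monotone.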

## Minimum Mean Square Error (MMSE) Estimation

Notice that for a MAP estimate such as $\hat{T}_{MAP} = \frac{x/t^2 + \mu/\sigma^2}{1/t^2 + 1/\sigma^2}$, we see $Pr\{\text{Error}\} = Pr\{\hat{T}_{MAP} \neq T\} = 1$. We are certain the estimate is wrong because $T$ is continuous: a point estimate equals a continuous random variable with probability $0$. We need a better way to capture error.

### Mean Square Error (MSE)

The mean square error of an estimator $\hat{T}(X)$ is:

$E[(\hat{T}(X) - T)^2]$

Note that $X, T$ are both random variables.

### Minimum mean square error (MMSE) estimator

Given a parameter $T$ and data $X = k$:

Theorem: $\hat{T}_{MMSE}(X=k) = E[T | X = k] = \int_{-\infty}^\infty t f_{T | X = k}(t) dt$

Instead of finding the maximum of the posterior (as in MAP), we take the expectation of the posterior.

Minimize mean squared error: find $\hat{T}(\cdot)$ to minimize $E[(\hat{T}(X) - T)^2]$.

\begin{align*}
\hat{T}_{MMSE}(X=x) =& E[T | X = x]\\
=& \int_{-\infty}^\infty t f_{T | X = x}(t) dt\\
=& \int_{-\infty}^\infty t \frac{f_{T, X}(t, x)}{f_X(x)} dt\\
=& \int_{-\infty}^\infty t \frac{f_{T, X}(t, x)}{\int_{-\infty}^\infty f_{T, X}(s, x) ds} dt\\
=& \int_{-\infty}^\infty t \frac{f_{X|T=t}(x)f_T(t)}{\int_{-\infty}^\infty f_{X|T=s}(x)f_T(s) ds} dt\\
\end{align*}

(Note the dummy variable $s$ in the denominator, to avoid clashing with the outer $t$.) When the data $X$ is discrete, the conditional density $f_{X|T=t}(x)$ is replaced by the conditional p.m.f. $Pr\{X = x | T = t\}$:

\begin{align*}
\hat{T}_{MMSE}(X=x) =& \int_{-\infty}^\infty t \frac{Pr\{X = x | T = t\} f_T(t)}{Pr\{X = x\}} dt\\
=& \int_{-\infty}^\infty t \frac{Pr\{X = x | T = t\} f_T(t)}{\int_{-\infty}^\infty Pr\{X = x | T = s\} f_T(s) ds} dt\\
\end{align*}
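A worked instance of the discrete-data formula (an illustration, not from the lecture): take prior $T \sim \text{Uniform}(0, 1)$ and $X | T = t \sim \text{Binomial}(n, t)$. The posterior is $\text{Beta}(x+1, n-x+1)$, whose mean is $\frac{x+1}{n+2}$, and numerical integration of the formula matches this closed form.

```python
import math

# Prior f_T(t) = 1 on [0, 1]; likelihood Pr{X = x | T = t} is Binomial(n, t).
# Illustrative choice: n = 10 observations, x = 7 successes.
n, x = 10, 7

def likelihood(t):
    return math.comb(n, x) * t**x * (1 - t)**(n - x)

# Midpoint-rule integration of t * L(t) f_T(t) / (integral of L(t) f_T(t));
# f_T(t) = 1 here, so it drops out of both integrals.
m = 200_000
ts = [(i + 0.5) / m for i in range(m)]
num = sum(t * likelihood(t) for t in ts) / m
den = sum(likelihood(t) for t in ts) / m
t_mmse = num / den
print(round(t_mmse, 4), round((x + 1) / (n + 2), 4))   # both 0.6667
```

The binomial coefficient cancels between numerator and denominator, which is the general pattern: any factor of the likelihood that does not depend on $t$ drops out of the posterior mean.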

MAP vs. MMSE: MAP minimizes $Pr\{\text{Error}\}$, which is only a meaningful criterion when the parameter takes values in a finite (or discrete) set; for a continuous parameter, $Pr\{\text{Error}\} = 1$ no matter the estimate. MMSE is suitable for a parameter in an infinite (continuous) set.

Often we can group an infinite set of parameter values into discrete categories. In that case, we can treat the parameter as finite and apply MAP.

### Proving MMSE

We want to show $\hat{T}_{MMSE}(k) = E[T | X = k]$

First restrict attention to constant estimators: let $\hat{T}(X) = c$ for a fixed value $c$. Then

\begin{align*}
&\arg \min_c E[(c-T)^2]\\
=& \arg \min_c E\left[((c-E[T])+(E[T]-T))^2\right]\\
=& \arg \min_c (c-E[T])^2 + E[(E[T] - T)^2] \tag{the cross term vanishes since $E[E[T]-T] = 0$}\\
=& \arg \min_c (c-E[T])^2 + Var(T)\\
=& \arg \min_c (c-E[T])^2\\
=& E[T]\\
\end{align*}
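The constant-estimator argument can be checked numerically (the distribution $T \sim \text{Exp}(2)$, seed, sample size, and grid are illustrative choices): the empirical mean squared error $c \mapsto \text{mean}((c - T)^2)$ is minimized at the sample mean of $T$.

```python
import random

# Draw samples of T and scan constants c: the empirical MSE
# mean((c - T)^2) is an exact parabola in c, minimized at the sample mean.
random.seed(3)
ts = [random.expovariate(2.0) for _ in range(10_000)]   # E[T] = 0.5
mean_t = sum(ts) / len(ts)

def mse(c):
    return sum((c - t) ** 2 for t in ts) / len(ts)

grid = [i / 100 for i in range(201)]       # candidate constants c in [0, 2]
best_c = min(grid, key=mse)
print(round(best_c, 2), round(mean_t, 3))  # best_c is the grid point nearest E[T]
```

The empirical MSE equals $(c - \bar{t})^2 + \widehat{Var}(T)$, mirroring the bias-variance split in the derivation above, so the grid minimizer is always the grid point closest to the sample mean.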

However, in reality $\hat{T}_{MMSE}(X)$ is a random variable because the data $X$ is random. Conditioning on $X = k$ fixes the estimate to a deterministic value, and running the same argument under the conditional distribution of $T$ given $X = k$ yields

$\arg \min_{\hat{T}(k)} E[(\hat{T}(k)-T)^2 | X = k] = E[T | X = k]$
