# Lecture 004

## Review

Frequentist/classical approach: MLE, confidence interval. Bayesian approach: MAP, MMSE.

Say we observe samples $t_1, t_2, ..., t_n$ of i.i.d. $X_i \sim \text{Uniform}(a, 1)$, where $a$ is unknown.

### Maximum Likelihood

We maximize the likelihood below. To make $\frac{1}{(1 - a)^n}$ large we want $a$ as close to $1^-$ as possible, but not so large that $g(a)$ falls into the $0$ case.

\begin{align*}
f_{X_{1:n};a}(t_{1:n}) =& \begin{cases} \frac{1}{(1 - a)^n} & \text{if } a \leq t_1, t_2, ..., t_n \leq 1\\ 0 & \text{otherwise} \end{cases}\\
g(a) =& \begin{cases} \frac{1}{(1 - a)^n} & \text{if } a \leq \min(t_1, t_2, ..., t_n)\\ 0 & \text{otherwise} \end{cases}\\
\hat{a}_{ML} = \arg \max_a g(a) =& \min(t_1, t_2, ..., t_n)
\end{align*}
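A quick simulation sketch of this result (the true value $a = 0.3$, the seed, and the sample sizes are illustrative choices, not from the lecture): the MLE $\min(t_1, ..., t_n)$ approaches the true $a$ from above as $n$ grows.

```python
import random

# Draw n samples from Uniform(a, 1) with a known "true" a, then check
# that the MLE min(t_1, ..., t_n) approaches a from above.
random.seed(0)
true_a = 0.3
for n in (10, 100, 10_000):
    samples = [random.uniform(true_a, 1.0) for _ in range(n)]
    a_mle = min(samples)          # the MLE is the smallest observation
    print(n, round(a_mle, 4))     # tends to true_a = 0.3 as n grows
```

The MLE is always an overestimate here ($\min(t_i) \geq a$ surely), which is what motivates the one-sided confidence interval below.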

### Confidence Interval

It makes sense to consider an interval of the form $I = \left[X_{\min} - \epsilon, X_{\min}\right]$, where $X_{\min} = \min(X_1, ..., X_n)$, since $a \leq X_{\min}$ always holds.

Note that the confidence interval is a frequentist construction, so the unknown parameter $a$ is a deterministic quantity (and $\epsilon$ is a constant we choose). But the interval is constructed from data, so the interval itself is random, i.e., the two endpoints of the interval (both functions of $X_{\min}$) are random variables.

\begin{align*}
Pr\{a \in \left[\min(X_1, ..., X_n) - \epsilon, \min(X_1, ..., X_n)\right]\} \geq& 0.95\\
Pr\{a \geq \min(X_1, ..., X_n) - \epsilon\} \geq& 0.95 \tag{$a \leq \min(X_1, ..., X_n)$ always holds}\\
Pr\{\min(X_1, ..., X_n) \leq a + \epsilon\} \geq& 0.95\\
Pr\{\min(X_1, ..., X_n) > a + \epsilon\} \leq& 0.05\\
\left(\frac{1 - (a + \epsilon)}{1 - a}\right)^n \leq& 0.05 \tag{each $X_i > a + \epsilon$ independently with probability $\frac{1 - a - \epsilon}{1 - a}$}\\
\epsilon \geq& (1 - a)(1 - 0.05^{1/n}) \tag{but we need a bound independent of $a$}\\
\epsilon \geq& 1 - 0.05^{1/n} \geq (1 - a)(1 - 0.05^{1/n}) \tag{this is sufficient since $0 \leq a \leq 1$}\\
\end{align*}

So the interval is $I = \left[X_{\min} - (1 - 0.05^{1/n}), X_{\min}\right]$.
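This interval can be sanity-checked by Monte Carlo (the true $a = 0.5$, $n = 20$, seed, and trial count are illustrative choices): the fraction of trials where $I$ contains $a$ should be at least $0.95$, for any $a$.

```python
import random

# Monte Carlo coverage check of I = [X_min - eps, X_min]
# with eps = 1 - 0.05**(1/n).
random.seed(1)
n, trials = 20, 20_000
eps = 1 - 0.05 ** (1 / n)
true_a = 0.5                      # try any a in [0, 1)
hits = 0
for _ in range(trials):
    x_min = min(random.uniform(true_a, 1.0) for _ in range(n))
    if x_min - eps <= true_a:     # a <= x_min always holds, so only
        hits += 1                 # the left endpoint needs checking
print(hits / trials)              # empirical coverage; should be >= 0.95
```

Because $\epsilon$ was chosen to work for the worst case $a = 0$, the empirical coverage for interior values of $a$ is typically well above $0.95$.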

### MAP

Now we view the parameter $a$ as a random variable $A$ rather than deterministic, assuming the p.d.f. of $A$ is $f_A(a) = \frac{20e^{-20a}}{1-e^{-20}}$ for $0 \leq a \leq 1$. Conditioned on $A = a$, we observe data $X_i \sim \text{Uniform}(a, 1)$ for $i \in \{1, ..., n\}$.

We maximize the posterior as follows:

\begin{align*}
f_{A | X_{1:n}=t_{1:n}}(a) =& \frac{f_{X_{1:n}|A=a}(t_{1:n})f_A(a)}{f_{X_{1:n}}(t_{1:n})}\\
=& \begin{cases} \frac{\frac{1}{(1-a)^n}\frac{20e^{-20a}}{1-e^{-20}}}{f_{X_{1:n}}(t_{1:n})} & \text{if } 0 \leq a \leq \min(t_1, ..., t_n)\\ 0 & \text{otherwise}\\ \end{cases}\\
\arg\max_a f_{A | X_{1:n}=t_{1:n}}(a) =& \arg\max_{0 \leq a \leq \min(t_1, ..., t_n)} \frac{e^{-20a}}{(1-a)^n} \tag{the constraint $0 \leq a \leq \min(t_1, ..., t_n)$ is important here}\\
\end{align*}
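The constrained maximization can be sketched numerically by grid search (the true $a = 0.4$, $n = 30$, seed, and grid resolution are illustrative choices):

```python
import math
import random

# Grid-search the (unnormalized) posterior e^{-20a} / (1 - a)^n over
# the feasible interval [0, min(t_i)].
random.seed(2)
true_a, n = 0.4, 30
t = [random.uniform(true_a, 1.0) for _ in range(n)]
t_min = min(t)

def log_posterior(a):
    # log of e^{-20a} / (1 - a)^n, dropping normalizing constants
    return -20 * a - n * math.log(1 - a)

grid = [i * t_min / 100_000 for i in range(100_001)]
a_map = max(grid, key=log_posterior)
# With n = 30 > 20 the objective is increasing on [0, t_min], so the
# constraint binds and the MAP estimate coincides with the MLE min(t_i);
# for small n the prior, which favors small a, can instead make the
# left endpoint a = 0 the maximizer.
print(round(a_map, 4), round(t_min, 4))
```

Taking the log turns the product into a sum and keeps the numbers well-conditioned; the maximizer is unchanged because $\log$ is monotone.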

## Minimum Mean Square Error (MMSE) Estimation

Notice that for a MAP estimate such as $\hat{T}_{MAP} = \frac{x/t^2 + \mu/\sigma^2}{1/t^2 + 1/\sigma^2}$, we see $Pr\{\text{Error}\} = Pr\{\hat{T}_{MAP} \neq T\} = 1$. We are certain the estimate is wrong because $T$ is continuous: a point estimate equals a continuous random variable with probability $0$. We need a better way to capture error.

### Mean Square Error (MSE)

The mean square error of an estimator $\hat{T}(X)$ is:

$E[(\hat{T}(X) - T)^2]$

Note that $X, T$ are both random variables.

### Minimum mean square error (MMSE) estimator

Given a parameter $T$ and data $X = k$:

Theorem: $\hat{T}_{MMSE}(X=k) = E[T | X = k] = \int_{-\infty}^\infty t f_{T | X = k}(t) dt$

Instead of finding the maximum of the posterior (as in MAP), we take the expectation of the posterior.

Minimize mean squared error: find $\hat{T}(\cdot)$ to minimize $E[(\hat{T}(X) - T)^2]$.

\begin{align*}
\hat{T}_{MMSE}(X=x) =& E[T | X = x]\\
=& \int_{-\infty}^\infty t f_{T | X = x}(t) dt\\
=& \int_{-\infty}^\infty t \frac{f_{T, X}(t, x)}{f_X(x)} dt\\
=& \int_{-\infty}^\infty t \frac{f_{T, X}(t, x)}{\int_{-\infty}^\infty f_{T, X}(s, x) ds} dt\\
=& \int_{-\infty}^\infty t \frac{f_{X|T=t}(x)f_T(t)}{\int_{-\infty}^\infty f_{X|T=s}(x)f_T(s) ds} dt\\
\end{align*}

(Note the dummy variable $s$ in the denominator, to avoid clashing with the outer $t$.) When the data $X$ is discrete, the conditional density $f_{X|T=t}(x)$ is replaced by the conditional p.m.f. $Pr\{X = x | T = t\}$:

\begin{align*}
\hat{T}_{MMSE}(X=x) =& \int_{-\infty}^\infty t \frac{Pr\{X = x | T = t\} f_T(t)}{Pr\{X = x\}} dt\\
=& \int_{-\infty}^\infty t \frac{Pr\{X = x | T = t\} f_T(t)}{\int_{-\infty}^\infty Pr\{X = x | T = s\} f_T(s) ds} dt\\
\end{align*}
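A worked instance of the discrete-data formula (an illustration, not from the lecture): take prior $T \sim \text{Uniform}(0, 1)$ and $X | T = t \sim \text{Binomial}(n, t)$. The posterior is $\text{Beta}(x+1, n-x+1)$, whose mean is $\frac{x+1}{n+2}$, and numerical integration of the formula matches this closed form.

```python
import math

# Prior f_T(t) = 1 on [0, 1]; likelihood Pr{X = x | T = t} is Binomial(n, t).
# Illustrative choice: n = 10 observations, x = 7 successes.
n, x = 10, 7

def likelihood(t):
    return math.comb(n, x) * t**x * (1 - t)**(n - x)

# Midpoint-rule integration of t * L(t) f_T(t) / (integral of L(t) f_T(t));
# f_T(t) = 1 here, so it drops out of both integrals.
m = 200_000
ts = [(i + 0.5) / m for i in range(m)]
num = sum(t * likelihood(t) for t in ts) / m
den = sum(likelihood(t) for t in ts) / m
t_mmse = num / den
print(round(t_mmse, 4), round((x + 1) / (n + 2), 4))   # both 0.6667
```

The binomial coefficient cancels between numerator and denominator, which is the general pattern: any factor of the likelihood that does not depend on $t$ drops out of the posterior mean.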

MAP vs. MMSE: MAP minimizes $Pr\{\text{Error}\}$, which is only a meaningful criterion when the parameter takes values in a finite (or discrete) set; for a continuous parameter, $Pr\{\text{Error}\} = 1$ no matter the estimate. MMSE is suitable for a parameter in an infinite (continuous) set.

Often we can group an infinite set of parameter values into discrete categories. In that case, we can treat the parameter as finite and apply MAP.

### Proving MMSE

We want to show $\hat{T}_{MMSE}(k) = E[T | X = k]$

First restrict attention to constant estimators: let $\hat{T}(X) = c$ for a fixed value $c$. Then

\begin{align*}
&\arg \min_c E[(c-T)^2]\\
=& \arg \min_c E\left[((c-E[T])+(E[T]-T))^2\right]\\
=& \arg \min_c (c-E[T])^2 + E[(E[T] - T)^2] \tag{the cross term vanishes since $E[E[T]-T] = 0$}\\
=& \arg \min_c (c-E[T])^2 + Var(T)\\
=& \arg \min_c (c-E[T])^2\\
=& E[T]\\
\end{align*}
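The constant-estimator argument can be checked numerically (the distribution $T \sim \text{Exp}(2)$, seed, sample size, and grid are illustrative choices): the empirical mean squared error $c \mapsto \text{mean}((c - T)^2)$ is minimized at the sample mean of $T$.

```python
import random

# Draw samples of T and scan constants c: the empirical MSE
# mean((c - T)^2) is an exact parabola in c, minimized at the sample mean.
random.seed(3)
ts = [random.expovariate(2.0) for _ in range(10_000)]   # E[T] = 0.5
mean_t = sum(ts) / len(ts)

def mse(c):
    return sum((c - t) ** 2 for t in ts) / len(ts)

grid = [i / 100 for i in range(201)]       # candidate constants c in [0, 2]
best_c = min(grid, key=mse)
print(round(best_c, 2), round(mean_t, 3))  # best_c is the grid point nearest E[T]
```

The empirical MSE equals $(c - \bar{t})^2 + \widehat{Var}(T)$, mirroring the bias-variance split in the derivation above, so the grid minimizer is always the grid point closest to the sample mean.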

However, in reality $\hat{T}_{MMSE}(X)$ is a random variable because the data $X$ is random. Conditioning on $X = k$ fixes the estimate to a deterministic value, and running the same argument under the conditional distribution of $T$ given $X = k$ yields

$\arg \min_{\hat{T}(k)} E[(\hat{T}(k)-T)^2 | X = k] = E[T | X = k]$
