# Lecture 002

## Point Estimation

We want to estimate a parameter of a distribution from data and bound how far the estimate can deviate from it. Say we have $n$ i.i.d. random variables $X_1, X_2, ..., X_n$ and let $X = X_1 + X_2 + ... + X_n$, where

$$X_i = \begin{cases} 1 & \text{if person } i \text{ votes for Trump}\\ 0 & \text{otherwise} \end{cases}$$

$Z = \frac{X}{n}$ is the maximum likelihood estimate of the parameter $p$ of $X \sim \text{Binomial}(n, p)$ ($\hat{p}_{ML} = \frac{X}{n} = Z$).

We can see this is a good estimate because $E[Z] = p$ (it is unbiased) and $Var(Z) = \frac{p(1 - p)}{n} \to 0$ as $n \to \infty$.
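A quick simulation makes these two facts concrete. This is a minimal sketch, assuming a made-up true value $p = 0.4$; `simulate_estimator` is a hypothetical helper, not part of any library. It draws $X$ as a count of $n$ Bernoulli($p$) votes many times and compares the empirical mean and variance of $Z = X/n$ with $p$ and $\frac{p(1 - p)}{n}$.

```python
import random

def simulate_estimator(p, n, trials=5000, seed=0):
    """Simulate Z = X / n with X ~ Binomial(n, p); return the sample mean and variance of Z."""
    rng = random.Random(seed)
    zs = []
    for _ in range(trials):
        # X counts the voters (out of n i.i.d. Bernoulli(p) ones) who answer 1
        x = sum(1 for _ in range(n) if rng.random() < p)
        zs.append(x / n)
    mean = sum(zs) / trials
    var = sum((z - mean) ** 2 for z in zs) / (trials - 1)
    return mean, var

p = 0.4  # made-up "true" parameter, unknown in practice
for n in (10, 100):
    mean, var = simulate_estimator(p, n)
    print(f"n={n}: mean of Z={mean:.4f}, Var(Z)={var:.5f}, p(1-p)/n={p * (1 - p) / n:.5f}")
```

The empirical variance should drop by roughly a factor of 10 when $n$ goes from 10 to 100, matching $\frac{p(1 - p)}{n}$.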

We want to know how much our estimate deviates from the true parameter:

1. We can find a deviation $\delta$ that is exceeded only with probability less than $5\%$, i.e. $Pr\{|Z - p| > \delta\} < 5\%$.
2. We can say // TODO

Note that $X$ is a random variable computed from our collected data, $p$ is the fixed but unknown true parameter of the distribution, $\delta$ is what we want to find, and $n$ is the number of data points we collect.

### Using Chernoff Bound

\begin{align*} Pr\{|\frac{X}{n} - p| > \delta\} =& Pr\{|X - np| > n \delta\}\\ =& Pr\{|X - E[X]| > n \delta\}\\ \leq& 2e^{-\frac{2(n\delta)^2}{n}} \tag{by Chernoff (Hoeffding form) for a sum of $n$ i.i.d. Bernoulli}\\ =& 2e^{-2n\delta^2}\\ \end{align*}

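Now, we want $2e^{-2n\delta^2} < 0.05$, which implies $\delta > \sqrt{\frac{-\ln 0.025}{2n}} \approx \sqrt{\frac{1.84}{n}}$. So with probability at least $95\%$, the estimate $Z$ is within roughly $\sqrt{\frac{1.84}{n}}$ of $p$.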
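The Chernoff half-width can be computed directly. A minimal sketch; `chernoff_delta` and its `confidence` parameter are hypothetical names, and the function simply solves $2e^{-2n\delta^2} = 1 - \text{confidence}$ for $\delta$.

```python
import math

def chernoff_delta(n, confidence=0.95):
    """Half-width delta such that 2 * exp(-2 * n * delta**2) = 1 - confidence."""
    failure = 1 - confidence  # 0.05 for a 95% guarantee
    return math.sqrt(-math.log(failure / 2) / (2 * n))

print(chernoff_delta(1000))  # about 0.043
```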

### Using Chebyshev's Inequality

We bound $Pr\{|\frac{X}{n} - p| > \delta\}$ using $Var(X) = np(1 - p)$:

\begin{align*} &Pr\{|\frac{X}{n} - p| > \delta\}\\ =& Pr\{|X - np| > n \delta\}\\ \leq& \frac{Var(X)}{n^2\delta^2} \tag{by Chebyshev}\\ =& \frac{p(1 - p)}{n\delta^2}\\ \leq& \frac{1}{4} \cdot \frac{1}{n\delta^2} \tag{by $p(1 - p) \leq \frac{1}{4}$}\\ \end{align*}

Note that we can't use the Chebyshev bound $\frac{p(1 - p)}{n\delta^2}$ directly because, as discussed above, $p$ is the true parameter of the distribution, which we don't have access to. But we can bound $p(1 - p) \leq \frac{1}{4}$ for every $p$, as in the last step.

Now, we want $\frac{1}{4n\delta^2} < 0.05$ and this implies that $\delta > \sqrt{\frac{5}{n}}$

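This bound is looser than the Chernoff one ($\sqrt{5} \approx 2.24$ versus $\sqrt{1.84} \approx 1.36$, times $\frac{1}{\sqrt{n}}$), but it is still good since it shrinks at the same rate $\Theta(\frac{1}{\sqrt{n}})$.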
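To compare the two bounds numerically, the sketch below evaluates both half-widths and inverts them to get the number of samples needed for a target $\delta$. The helper names are hypothetical, and the target $\delta = 0.03$ is an arbitrary example value.

```python
import math

def chernoff_half_width(n):
    # solves 2 * exp(-2 * n * delta^2) = 0.05 for delta
    return math.sqrt(-math.log(0.025) / (2 * n))

def chebyshev_half_width(n):
    # solves 1 / (4 * n * delta^2) = 0.05 for delta, using p(1 - p) <= 1/4
    return math.sqrt(5 / n)

def samples_needed(delta):
    """Smallest n for which each bound guarantees Pr{|Z - p| > delta} <= 5%."""
    chernoff_n = math.ceil(-math.log(0.025) / (2 * delta ** 2))
    chebyshev_n = math.ceil(5 / delta ** 2)
    return chernoff_n, chebyshev_n

for n in (100, 10_000):
    # both shrink like 1/sqrt(n); their ratio stays sqrt(5 / 1.84), roughly 1.65
    print(n, round(chernoff_half_width(n), 4), round(chebyshev_half_width(n), 4))

print(samples_needed(0.03))  # (2050, 5556)
```

The ratio between the two half-widths does not depend on $n$, which is why both bounds share the $\Theta(\frac{1}{\sqrt{n}})$ rate even though Chebyshev needs roughly 2.7 times as many samples.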

## Interval Estimation

Confidence interval: an interval computed from the data that contains the true parameter with probability at least some given level (e.g. $95\%$).

### Shoe Shopping

#### Assuming Noise Centered

Assumptions:

1. $E[N_i] = 0$
2. $Var(N_i) = \sigma^2$

We take $n$ measurements of the same ground-truth value $\theta$ (we treat this mean as a parameter of the distribution), where the noise terms $N_i$ are i.i.d. random variables with $E[N_i] = 0$ and $Var(N_i) = \sigma^2$. Let $X_i = \theta + N_i$ be the measurements; observe that $E[X_i] = \theta$ and $Var(X_i) = \sigma^2$.

We know that if $\bar{X} = \frac{X_1 + X_2 +...+ X_n}{n}$, then

\begin{align*} E[\bar{X}] =& E[X_i] = \theta\\ Var(\bar{X}) =& Var(\frac{X_1 + X_2 +...+ X_n}{n})\\ =& \frac{1}{n^2}Var(X_1 + X_2 +...+ X_n) \\ =& \frac{n}{n^2}Var(X_i) \tag{the $X_i$ are independent}\\ =& \frac{\sigma^2}{n}\\ \end{align*}

We want an interval for the parameter $\theta$, computed from our data sample $X_{1:n}$, of the form

\begin{align*} \mathcal{I}(X_{1:n}) =& \left[\bar{X} - \alpha \sqrt{Var(\bar{X})}, \bar{X} + \alpha \sqrt{Var(\bar{X})}\right]\\ =& \left[\bar{X} - \alpha \frac{\sigma}{\sqrt{n}}, \bar{X} + \alpha \frac{\sigma}{\sqrt{n}}\right]\\ \end{align*}

Notice that the only random quantity in this interval is $\bar{X}$; $\sigma$, $n$ and $\alpha$ are fixed numbers. So the interval is random, while $\theta$ is not.

\begin{align*} Pr\left\{\theta \in \left[\bar{X} - \alpha \frac{\sigma}{\sqrt{n}}, \bar{X} + \alpha \frac{\sigma}{\sqrt{n}}\right]\right\} \geq& 95\%\\ \iff Pr\left\{|\bar{X} - \theta| > \alpha \frac{\sigma}{\sqrt{n}}\right\} \leq& 5\%\\ Pr\left\{|\bar{X} - \theta| > \alpha \frac{\sigma}{\sqrt{n}}\right\} \leq& \frac{Var(\bar{X})}{\alpha^2 \sigma^2 / n} \tag{by Chebyshev}\\ =& \frac{1}{\alpha^2}\\ \frac{1}{\alpha^2} \leq 5\% \iff \alpha \geq& \sqrt{20}\\ \end{align*}

We want the true mean $\theta$ to lie in an interval centered at $\bar{X}$ (the mean we measured) whose half-width is $\alpha$ times $\frac{\sigma}{\sqrt{n}}$ (the standard deviation of the average of $n$ i.i.d. measurements). The difference between point and interval estimation here seems to be that the allowed deviation is measured in units of this standard deviation instead of as a fixed number $\delta$.

Therefore, our $95\%$ interval is $\left[\bar{X} - \sqrt{20} \cdot \frac{\sigma}{\sqrt{n}}, \bar{X} + \sqrt{20} \cdot \frac{\sigma}{\sqrt{n}}\right]$
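The Chebyshev interval only needs $\bar{X}$, $\sigma$ and $n$. A minimal sketch, assuming the noise is Uniform$(-1, 1)$ (mean $0$, variance $\frac{1}{3}$, chosen only because it is neither normal nor binomial) and a made-up true value $\theta = 27$; `chebyshev_interval` and `chebyshev_coverage` are hypothetical helpers. The simulation estimates how often the interval actually contains $\theta$.

```python
import math
import random

def chebyshev_interval(samples, sigma, confidence=0.95):
    """[xbar - alpha*sigma/sqrt(n), xbar + alpha*sigma/sqrt(n)] with alpha = 1/sqrt(1 - confidence)."""
    n = len(samples)
    xbar = sum(samples) / n
    alpha = 1 / math.sqrt(1 - confidence)  # sqrt(20) for 95%
    half = alpha * sigma / math.sqrt(n)
    return xbar - half, xbar + half

def chebyshev_coverage(theta=27.0, n=5, trials=10000, seed=1):
    # Assumed noise: N_i ~ Uniform(-1, 1), so E[N_i] = 0 and Var(N_i) = 1/3
    rng = random.Random(seed)
    sigma = math.sqrt(1 / 3)
    hits = 0
    for _ in range(trials):
        xs = [theta + rng.uniform(-1, 1) for _ in range(n)]
        lo, hi = chebyshev_interval(xs, sigma)
        hits += lo <= theta <= hi
    return hits / trials

print(chebyshev_coverage())
```

The estimated coverage is typically well above $95\%$, since Chebyshev's inequality holds for every noise distribution with this variance and is therefore conservative for any particular one.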

Note that we can't use the Chernoff bound here because:

1. $\bar{X}$ is not a scaled binomial: we only know $E[N_i] = 0$ and $Var(N_i) = \sigma^2$, and have no other information about the noise;
2. the distribution of $\bar{X}$ depends on the distribution of the $X_i$ and is therefore unknown.

#### Assuming Normal Noise

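If the noise is normal, $N_i \sim \text{Normal}(0, \sigma^2)$, then $\bar{X} \sim \text{Normal}(\theta, \frac{\sigma^2}{n})$, so $\frac{\bar{X} - \theta}{\sigma / \sqrt{n}}$ is standard normal with CDF $\Phi$:

\begin{align*} Pr\{|\bar{X} - \theta| > \alpha \frac{\sigma}{\sqrt{n}}\} <& 0.05\\ Pr\{\frac{|\bar{X} - \theta|}{\sigma / \sqrt{n}} > \alpha \} <& 0.05\\ 2 - 2 \Phi(\alpha) <& 0.05\\ \alpha >& 1.96\\ \end{align*}

This is much smaller than the Chebyshev value $\sqrt{20} \approx 4.47$, so the interval is much tighter.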
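Python's standard library can invert $\Phi$ instead of a normal table. This sketch uses `statistics.NormalDist` (available since Python 3.8); `normal_interval` is a hypothetical helper that assumes the noise really is normal with known $\sigma$.

```python
from statistics import NormalDist

std_normal = NormalDist()              # Normal(0, 1), its CDF is Phi
alpha = std_normal.inv_cdf(0.975)      # solves Phi(alpha) = 0.975
print(round(alpha, 2))                 # 1.96
print(2 - 2 * std_normal.cdf(alpha))   # two-sided tail, approximately 0.05

def normal_interval(samples, sigma, confidence=0.95):
    """[xbar - a*sigma/sqrt(n), xbar + a*sigma/sqrt(n)] with a = Phi^{-1}(1 - (1 - confidence)/2)."""
    n = len(samples)
    xbar = sum(samples) / n
    a = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = a * sigma / n ** 0.5
    return xbar - half, xbar + half
```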

#### Assuming Nothing

We define the sample variance as follows (where $\bar{X} = \frac{1}{n} \sum_{i = 1}^n X_i$):

$$S^2 = \frac{1}{n - 1} \sum_{i = 1}^n (X_i - \bar{X})^2$$

Notice that this sample variance is measured around the sample mean $\bar{X}$ and divides by $n - 1$ instead of $n$ (Bessel's correction, which makes $E[S^2] = \sigma^2$).
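A small check of the $n - 1$ convention, using made-up measurements. `sample_variance` is a hypothetical helper; `statistics.variance` and `statistics.pvariance` are the standard-library versions that divide by $n - 1$ and $n$ respectively.

```python
import statistics

def sample_variance(xs):
    """Sample variance S^2 around the sample mean, dividing by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [26.8, 27.1, 27.4]  # made-up measurements
print(sample_variance(data))       # divides by n - 1
print(statistics.variance(data))   # standard library, also divides by n - 1
print(statistics.pvariance(data))  # divides by n instead
```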

Since $\sigma$ is now unknown, we replace it with $S$, and the interval becomes:

$$\mathcal{I}(X_{1:n}) = \left[\bar{X} - \alpha \frac{S}{\sqrt{n}}, \bar{X} + \alpha \frac{S}{\sqrt{n}}\right]$$

with coverage probability

$$Pr\left\{\theta \in \left[\bar{X} - \alpha \frac{S}{\sqrt{n}}, \bar{X} + \alpha \frac{S}{\sqrt{n}}\right]\right\} = Pr\left\{\left| \frac{\bar{X} - \theta}{S / \sqrt{n}} \right| \leq \alpha\right\}$$
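Once $\alpha$ is known, computing the interval is mechanical. A minimal sketch; `studentized_interval` is a hypothetical helper, the measurements are made up, and the value $4.303$ used below is the standard t-table entry for $v = 2$ degrees of freedom at $0.975$.

```python
import math
import statistics

def studentized_interval(xs, alpha):
    """[xbar - alpha*S/sqrt(n), xbar + alpha*S/sqrt(n)], with S the (n - 1) sample standard deviation."""
    n = len(xs)
    xbar = statistics.fmean(xs)
    s = statistics.stdev(xs)  # square root of S^2, which divides by n - 1
    half = alpha * s / math.sqrt(n)
    return xbar - half, xbar + half

# made-up measurements; 4.303 is the 0.975 quantile of t with 2 degrees of freedom
print(studentized_interval([26.8, 27.1, 27.4], alpha=4.303))
```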

The distribution of $T = \frac{\bar{X} - \theta}{S / \sqrt{n}}$ is complicated. Even when $X_i \sim \text{Normal}(\theta, \sigma^2)$, $T$ is not normal: for small $n$ it deviates a lot from the normal distribution, with heavier tails.

*[Figure: Student's t-distribution compared with the normal distribution]*

Student's t-distribution: when $X_i \sim \text{Normal}(\theta, \sigma^2)$, $T = \frac{\bar{X} - \theta}{S / \sqrt{n}}$ follows Student's t-distribution with $n - 1$ degrees of freedom.
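A simulation shows why $\alpha = 1.96$ is not enough when $S$ replaces $\sigma$ and $n$ is small. This is a sketch assuming normal data with $\theta = 0$ and $\sigma = 1$ (the distribution of $T$ does not depend on these); `t_statistic` and `tail_fraction` are hypothetical helpers that estimate $Pr\{|T| > 1.96\}$.

```python
import math
import random
import statistics

def t_statistic(xs, theta):
    """T = (xbar - theta) / (S / sqrt(n)) with the (n - 1) sample standard deviation S."""
    n = len(xs)
    return (statistics.fmean(xs) - theta) / (statistics.stdev(xs) / math.sqrt(n))

def tail_fraction(n, cutoff=1.96, trials=10000, seed=2):
    """Fraction of simulated |T| above cutoff when X_i ~ Normal(theta, sigma^2)."""
    rng = random.Random(seed)
    theta, sigma = 0.0, 1.0  # arbitrary: T's distribution does not depend on them
    count = 0
    for _ in range(trials):
        xs = [rng.gauss(theta, sigma) for _ in range(n)]
        count += abs(t_statistic(xs, theta)) > cutoff
    return count / trials

print(tail_fraction(3))   # far above 0.05 for n = 3
print(tail_fraction(50))  # close to 0.05
```

For $n = 3$ the estimated tail probability should be close to $0.19$ rather than $0.05$, while for $n = 50$ it is only slightly above $0.05$.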

The density of $T$ has a complicated closed form, but we only need the $\alpha$ with $Pr\{|T| > \alpha\} < 0.05$. Since $T$ is symmetric around $0$:

\begin{align*} Pr\{|T| > \alpha\} <& 0.05\\ 2 - 2 Pr\{T \leq \alpha\} <& 0.05\\ 1 - Pr\{T \leq \alpha\} <& 0.025\\ F_T(\alpha) = Pr\{T \leq \alpha\} >& 0.975\\ \end{align*}

We obtain $\alpha$ from a t-table:

1. Given $n = 3$, the number of degrees of freedom is $v = n - 1 = 2$.
2. We want $\alpha$ such that $F_T(\alpha) > 0.975$.
3. Looking up $(v = 2, F_T^{-1}(0.975))$ in the table gives $\alpha \approx 4.303$, much larger than the normal value $1.96$ because $n$ is small.
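Instead of a printed table, $F_T^{-1}(0.975)$ can be computed numerically. This is a minimal sketch, not a library routine: it integrates the t density with Simpson's rule and inverts the CDF by bisection, and `t_pdf`, `t_cdf` and `t_quantile` are hypothetical names. In practice, `scipy.stats.t.ppf(0.975, df)` returns the same quantile. For $v = 2$ this gives about $4.303$, noticeably larger than the normal quantile $1.96$.

```python
import math

def t_pdf(x, v):
    """Density of Student's t-distribution with v degrees of freedom."""
    log_c = math.lgamma((v + 1) / 2) - math.lgamma(v / 2) - 0.5 * math.log(v * math.pi)
    return math.exp(log_c) * (1 + x * x / v) ** (-(v + 1) / 2)

def t_cdf(t, v, steps=2000):
    """F_T(t) for t >= 0: 1/2 plus Simpson's rule for the pdf integrated over [0, t]."""
    h = t / steps
    total = t_pdf(0, v) + t_pdf(t, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 + total * h / 3

def t_quantile(p, v):
    """alpha with F_T(alpha) = p (for p > 0.5), found by bisection."""
    hi = 1.0
    while t_cdf(hi, v) < p:  # grow the bracket until it contains the quantile
        hi *= 2
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, v) < p:
            lo = mid
        else:
            hi = mid
    return hi

print(round(t_quantile(0.975, 2), 3))  # 4.303
```

Increasing `v` makes the quantile approach the normal value $1.96$, consistent with $T$ approaching the normal distribution as $n$ grows.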

### Uniform Distribution

Given $X_1, X_2, ..., X_n$ i.i.d. uniform on $[a, 1]$, where $0 \leq a < 1$ is a parameter we don't know, let $X_{\min} = \min_i X_i$. We look for an interval $[X_{\min} - \epsilon, X_{\min}]$ that contains $a$ with probability at least $0.95$:

\begin{align*} Pr\{a \in [X_{\min} - \epsilon, X_{\min}]\} \geq& 0.95\\ Pr\{X_{\min} - \epsilon \leq a \leq X_{\min}\} \geq& 0.95\\ Pr\{X_{\min} - \epsilon \leq a\} \geq& 0.95 \tag{$a \leq X_{\min}$ always holds}\\ Pr\{X_{\min} > a + \epsilon\} \leq& 0.05\\ Pr\{X_1, X_2, ..., X_n > a + \epsilon\} \leq& 0.05\\ \left(\frac{1 - a - \epsilon}{1 - a}\right)^n \leq& 0.05 \tag{independence}\\ \epsilon \geq& (1 - a) - (1 - a) \cdot 0.05^{1/n} = (1 - a)(1 - 0.05^{1/n})\\ \epsilon \geq& 1 \cdot (1 - 0.05^{1/n}) \tag{sufficient, since $1 - a \leq 1$}\\ \end{align*}

// TODO: full
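The $\epsilon$ derived for the uniform example can be checked by simulation. A minimal sketch, assuming $n = 10$ and two made-up values of $a$; `uniform_epsilon` and `uniform_coverage` are hypothetical helpers. When $a = 0$ the bound $1 - a \leq 1$ is tight, so coverage should be close to $95\%$; for larger $a$ the interval is conservative.

```python
import random

def uniform_epsilon(n, confidence=0.95):
    """epsilon = 1 - (1 - confidence)^(1/n), the bound that does not depend on a."""
    return 1 - (1 - confidence) ** (1 / n)

def uniform_coverage(a, n=10, trials=20000, seed=3):
    """Fraction of runs in which a lies in [X_min - epsilon, X_min] for X_i ~ Uniform[a, 1]."""
    rng = random.Random(seed)
    eps = uniform_epsilon(n)
    hits = 0
    for _ in range(trials):
        x_min = min(rng.uniform(a, 1) for _ in range(n))
        hits += x_min - eps <= a <= x_min
    return hits / trials

print(uniform_coverage(0.0))  # close to 0.95: the bound 1 - a <= 1 is tight here
print(uniform_coverage(0.5))  # higher: the interval is conservative when a > 0
```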