Lecture 002

Point Estimation

We want to find a confidence interval for a parameter of a distribution, given data. Say we have many i.i.d. random variables X_1, X_2, ..., X_n and let X = X_1 + X_2 + ... + X_n, where

X_i = \begin{cases} 1 & \text{if person } i \text{ votes for Trump}\\ 0 & \text{otherwise}\\ \end{cases}

Then X \sim \text{Binomial}(n, p), and Z = \frac{X}{n} is the maximum likelihood estimate for the parameter p (\hat{p}_{ML} = \frac{X}{n} = Z).

We can see this is a good estimate because E[Z] = p and \lim_{n \to \infty} Var(Z) = \lim_{n \to \infty} \frac{p(1 - p)}{n} = 0.
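A minimal simulation sketch of this estimator (the values p = 0.42 and n = 10000 below are made up for illustration):

```python
import numpy as np

# Hypothetical poll: each respondent votes for the candidate with (unknown)
# probability p; we estimate p by Z = X / n.
rng = np.random.default_rng(0)
p_true, n = 0.42, 10_000                  # assumed values for this sketch

votes = rng.binomial(1, p_true, size=n)   # X_1, ..., X_n
X = votes.sum()                           # X = X_1 + ... + X_n
p_hat = X / n                             # maximum likelihood estimate Z

print(f"p_hat = {p_hat:.4f} (true p = {p_true})")
print(f"Var(Z) = p(1-p)/n = {p_true * (1 - p_true) / n:.2e}")  # shrinks as n grows
```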

We want to know how much our estimate deviates from the true parameter:

  1. we can say "the deviation exceeds \delta with probability less than 5\%" (Pr\{|Z - p| > \delta\} < 5\%)
  2. we can say // TODO

Note that X is a random variable built from our collected data, p is the fixed (but unknown to us) value of the true distribution parameter, \delta is what we want to find, and n is how many data points we collect.

Using Chernoff Bound

\begin{align*} Pr\{|\frac{X}{n} - p| > \delta\} =& Pr\{|X - np| > n \delta\}\\ =& Pr\{|X - E[X]| > n \delta\}\\ \leq& 2e^{-\frac{2(n\delta)^2}{n}} = 2e^{-2n\delta^2}\tag{by Chernoff for i.i.d. binomial}\\ \end{align*}

Now, we want 2e^{-2n\delta^2} < 0.05 and this implies that \delta > \sqrt{\frac{-\ln 0.025}{2n}} \approx \sqrt{\frac{1.84}{n}}
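A quick sketch of how this Chernoff-based \delta shrinks with n (the sample sizes below are arbitrary):

```python
import numpy as np

# delta = sqrt(-ln(0.025) / (2n)) guarantees Pr{|X/n - p| > delta} < 5%.
for n in (100, 1_000, 10_000):
    delta = np.sqrt(-np.log(0.025) / (2 * n))   # approximately sqrt(1.84 / n)
    print(f"n = {n:6d}: delta > {delta:.4f}")
```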

Using Chebyshev's Inequality

We calculate Pr\{|\frac{X}{n} - p| > \delta\}

\begin{align*} &Pr\{|\frac{X}{n} - p| > \delta\}\\ =& Pr\{|X - np| > n \delta\}\\ \leq& \frac{Var(X)}{n^2\delta^2} \tag{by Chebyshev}\\ =& \frac{p(1 - p)}{n\delta^2}\\ \leq& \frac{1}{4} \cdot \frac{1}{n\delta^2} \tag{by $p(1 - p) \leq \frac{1}{4}$}\\ \end{align*}

Note that we can't directly use the bound \frac{p(1 - p)}{n\delta^2} because, as discussed above, p is the true parameter of the distribution that we don't have access to. But we can bound p(1 - p) \leq \frac{1}{4}.

Now, we want \frac{1}{4n\delta^2} < 0.05 and this implies that \delta > \sqrt{\frac{5}{n}}

The bound is still good since it shrinks at rate \Theta(\frac{1}{\sqrt{n}}).
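For comparison, a small sketch evaluating both deviation bounds side by side (arbitrary sample sizes):

```python
import numpy as np

# Chernoff:  delta > sqrt(1.84 / n)    Chebyshev: delta > sqrt(5 / n)
for n in (100, 1_000, 10_000):
    chernoff = np.sqrt(-np.log(0.025) / (2 * n))
    chebyshev = np.sqrt(5 / n)
    print(f"n = {n:6d}: Chernoff {chernoff:.4f}  vs  Chebyshev {chebyshev:.4f}")
# Both shrink like 1/sqrt(n); Chernoff gives the smaller (tighter) delta.
```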

Interval Estimation

Confidence Interval: an interval, computed from the data, that contains the true parameter with probability greater than some value.

Shoe Shopping

Assuming Centered Noise

Assumptions:

  1. E[N_i] = 0
  2. Var(N_i) = \sigma^2

We take n measurements of the same ground truth value \theta (we view the mean as a parameter of a random distribution), where the noises N_i are i.i.d. random variables such that E[N_i] = 0, Var(N_i) = \sigma^2. Let X_i = \theta + N_i be the measurements; observe E[X_i] = \theta, Var(X_i) = \sigma^2.

We know that if \bar{X} = \frac{X_1 + X_2 +...+ X_n}{n}, then

\begin{align*} E[\bar{X}] =& E[X_i] = \theta\\ Var(\bar{X}) =& Var(\frac{X_1 + X_2 +...+ X_n}{n})\\ =& \frac{1}{n^2}Var(X_1 + X_2 +...+ X_n) \\ =& \frac{n}{n^2}Var(X_i) \tag{variance of a sum of independent r.v.s}\\ =& \frac{\sigma^2}{n}\\ \end{align*}
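A simulation sketch of these two facts (theta, sigma, and the normal noise below are assumptions made only for the simulation; the derivation needs just E[N_i] = 0 and Var(N_i) = \sigma^2):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n, trials = 5.0, 2.0, 50, 100_000   # assumed values for this sketch

noise = rng.normal(0, sigma, size=(trials, n))    # zero-mean noise N_i
X = theta + noise                                 # X_i = theta + N_i
X_bar = X.mean(axis=1)                            # one sample mean per trial

print(f"mean of X_bar = {X_bar.mean():.4f} (theta = {theta})")
print(f"var of X_bar  = {X_bar.var():.4f} (sigma^2/n = {sigma**2 / n:.4f})")
```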

We want a bound for the parameter \theta according to our data sample X_{1:n} in the form

\begin{align*} \mathcal{I}(X_{1:n}) =& \left[\bar{X} - \alpha \sqrt{Var(\bar{X})}, \bar{X} + \alpha \sqrt{Var(\bar{X})}\right]\\ =& \left[\bar{X} - \alpha \frac{\sigma}{\sqrt{n}}, \bar{X} + \alpha \frac{\sigma}{\sqrt{n}}\right]\\ \end{align*}

Notice the only random variable in the equation is \bar{X} (its variance and expectation are not random).

\begin{align*} Pr\left\{\theta \in \left[\bar{X} - \alpha \frac{\sigma}{\sqrt{n}}, \bar{X} + \alpha \frac{\sigma}{\sqrt{n}}\right]\right\} \geq& 95\%\\ Pr\left\{|\bar{X} - \theta | > \alpha \frac{\sigma}{\sqrt{n}}\right\} <& 5\%\\ Pr\left\{|\bar{X} - \theta | > \alpha \frac{\sigma}{\sqrt{n}}\right\} \leq& \frac{Var(\bar{X})}{\alpha^2 \sigma^2 / n} = \frac{1}{\alpha^2}\tag{by Chebyshev}\\ \frac{1}{\alpha^2} \leq& 0.05\\ \alpha \geq& \sqrt{20}\\ \end{align*}

We want the true mean \theta to lie in the interval centered at \bar{X} (the mean we measured) with half-width \alpha times \frac{\sigma}{\sqrt{n}} (the standard deviation of the average of n i.i.d. measurements). The difference between point and interval estimates seems to be whether we want the deviation to be less than some multiple of the true standard deviation instead of a fixed number.

Therefore, our 95\% interval is \left[\bar{X} - \sqrt{20} \cdot \frac{\sigma}{\sqrt{n}}, \bar{X} + \sqrt{20} \cdot \frac{\sigma}{\sqrt{n}}\right]
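A sketch of computing this distribution-free interval, assuming \sigma is known (the measurements below are made up):

```python
import numpy as np

sigma = 2.0                                               # assumed known noise std. dev.
x = np.array([4.8, 5.3, 5.1, 4.6, 5.4, 5.0, 4.9, 5.2])    # made-up measurements X_i
n, x_bar = len(x), x.mean()

half_width = np.sqrt(20) * sigma / np.sqrt(n)             # alpha = sqrt(20) from Chebyshev
print(f"95% Chebyshev interval: [{x_bar - half_width:.3f}, {x_bar + half_width:.3f}]")
```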

Note that we can't use the Chernoff bound here because:

  1. the distribution of \bar{X} is not binomial, since we only know E[N_i] = 0, Var(N_i) = \sigma^2 and nothing else about the noise
  2. the p.m.f. of \bar{X} depends on the distribution of the X_i and is therefore unknown

Assuming Normal Noise

Since the noise is normal, \frac{\bar{X} - \theta}{\sigma / \sqrt{n}} \sim \text{Normal}(0, 1), so

\begin{align*} Pr\left\{|\bar{X} - \theta| > \alpha \frac{\sigma}{\sqrt{n}}\right\} <& 0.05\\ Pr\left\{\frac{|\bar{X} - \theta|}{\sigma / \sqrt{n}} > \alpha \right\} <& 0.05\\ 2 - 2 \Phi(\alpha) <& 0.05\\ \alpha >& 1.96\\ \end{align*}
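Under the normality assumption, the same sketch with \alpha = \Phi^{-1}(0.975) \approx 1.96 gives a much narrower interval (same made-up data and known \sigma as above):

```python
import numpy as np
from scipy.stats import norm

sigma = 2.0
x = np.array([4.8, 5.3, 5.1, 4.6, 5.4, 5.0, 4.9, 5.2])    # made-up measurements X_i
n, x_bar = len(x), x.mean()

alpha = norm.ppf(0.975)                                   # approximately 1.96
half_width = alpha * sigma / np.sqrt(n)
print(f"95% normal interval: [{x_bar - half_width:.3f}, {x_bar + half_width:.3f}]")
```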

Assuming Nothing

We define the sample variance as follows (where \bar{X} = \frac{1}{n} \sum_{i = 1}^n X_i):

S^2 = \frac{1}{n - 1} \sum_{i = 1}^n (X_i - \bar{X})^2

Notice this sample variance is computed from the sample mean, but normalized by n - 1 instead of n.
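A quick check that this n - 1 correction matches numpy's ddof=1 convention (made-up data):

```python
import numpy as np

x = np.array([4.8, 5.3, 5.1, 4.6, 5.4, 5.0, 4.9, 5.2])    # made-up measurements
x_bar = x.mean()

S2_formula = ((x - x_bar) ** 2).sum() / (len(x) - 1)       # definition above
S2_numpy = x.var(ddof=1)                                   # ddof=1 divides by n - 1
print(S2_formula, S2_numpy)                                # the two agree
```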

The interval becomes:

\mathcal{I}(X_{1:n}) = \left[\bar{X} - \alpha \frac{S}{\sqrt{n}}, \bar{X} + \alpha \frac{S}{\sqrt{n}}\right]
Pr\left\{\theta \in \left[\bar{X} - \alpha \frac{S}{\sqrt{n}}, \bar{X} + \alpha \frac{S}{\sqrt{n}}\right]\right\} = Pr\left\{\left| \frac{\bar{X} - \theta}{S / \sqrt{n}} \right| \leq \alpha\right\}

The distribution of T = \frac{\bar{X} - \theta}{S / \sqrt{n}} is complicated. Even in the case of X_i \sim \text{Normal}(\theta, \sigma^2), the distribution is not normal.

Figure: Student's t distribution compared with Normal. For small n, it deviates a lot from Normal.

Student's t-distribution: when X_i \sim \text{Normal}(\theta, \sigma^2), then T = \frac{\bar{X} - \theta}{S / \sqrt{n}} follows Student's t-distribution with n - 1 degrees of freedom.

There is a complicated closed form for the density of T, but we are only interested in calculating \alpha such that Pr\{|T| > \alpha\} < 0.05:

\begin{align*} Pr\{|T| > \alpha\} <& 0.05\\ 2 - 2 Pr\{T \leq \alpha\} <& 0.05 \tag{by symmetry of $T$}\\ 1 - Pr\{T \leq \alpha\} <& 0.025\\ F_T(\alpha) = Pr\{T \leq \alpha\} \geq& 0.975\\ \end{align*}

We obtain \alpha using a table:

Table: Student's t distribution with cumulative probability.
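Instead of the table, a sketch using scipy's t quantile function for the same lookup (same made-up data as above, \sigma now unknown):

```python
import numpy as np
from scipy.stats import t

x = np.array([4.8, 5.3, 5.1, 4.6, 5.4, 5.0, 4.9, 5.2])    # made-up measurements
n, x_bar, S = len(x), x.mean(), x.std(ddof=1)

alpha = t.ppf(0.975, df=n - 1)            # F_T(alpha) = 0.975, n - 1 degrees of freedom
half_width = alpha * S / np.sqrt(n)
print(f"alpha = {alpha:.3f}")
print(f"95% t interval: [{x_bar - half_width:.3f}, {x_bar + half_width:.3f}]")
```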

Given X_1, X_2, ..., X_n distributed uniformly in [a, 1], where a < 1 is a parameter we don't know, we want a 95\% interval of the form [X_{\min} - \epsilon, X_{\min}]:

\begin{align*} Pr\{a \in [X_{\min} - \epsilon, X_{\min}]\} \geq& 0.95\\ Pr\{X_{\min} - \epsilon \leq a \leq X_{\min}\} \geq& 0.95\\ Pr\{X_{\min} - \epsilon \leq a\} \geq& 0.95 \tag{$a \leq X_{\min}$ always holds}\\ Pr\{X_{\min} > a + \epsilon\} \leq& 0.05\\ Pr\{X_1, X_2, ..., X_n > a + \epsilon\} \leq& 0.05\\ \left(\frac{1 - a - \epsilon}{1 - a}\right)^n \leq& 0.05\\ \epsilon \geq& (1 - a) - (1 - a)\,0.05^{1/n}\\ \epsilon \geq& 1 \cdot (1 - 0.05^{1/n}) \tag{bound $1 - a \leq 1$}\\ \end{align*}

// TODO: full
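A simulation sketch of this interval (a = 0.3 and n = 40 are assumed only for illustration; the bound \epsilon = 1 - 0.05^{1/n} also uses a \geq 0):

```python
import numpy as np

rng = np.random.default_rng(2)
a_true, n = 0.3, 40                       # assumed values for this sketch

x = rng.uniform(a_true, 1, size=n)        # X_1, ..., X_n uniform on [a, 1]
x_min = x.min()
eps = 1 - 0.05 ** (1 / n)                 # a-free choice of epsilon derived above

print(f"95% interval for a: [{x_min - eps:.4f}, {x_min:.4f}] (true a = {a_true})")
```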
