# Lecture 013

## Parameter Estimation

We want to estimate the probability $p$ of heads for a coin by sampling. We take $n$ samples $X_1, \ldots, X_n$ and assume they are i.i.d. Together, we have:

$$X = \sum_{i = 1}^n X_i$$

Therefore, we want to find $\delta$ such that:

\begin{align*} &Pr\{p \in [\frac{X}{n} - \delta, \frac{X}{n} + \delta]\} \geq 0.95\\ \iff& Pr\{|\frac{X}{n} - p| > \delta\} \leq 0.05\\ \iff& Pr\{|X - np| > n\delta\} \leq 0.05 \tag{multiply both sides by $n$}\\ \iff& Pr\{|X - E[X]| > n\delta\} \leq 0.05 \tag{by $X \sim \text{Binomial}(n, p) \implies E[X] = np$}\\ \impliedby& 2e^{-\frac{2(n\delta)^2}{n}} \leq 0.05 \tag{by Pretty Chernoff Bound; a sufficient condition, hence $\impliedby$}\\ \iff& \delta \geq \sqrt{\frac{-\ln(0.025)}{2n}} \approx \sqrt{\frac{1.84}{n}}\\ \end{align*}

Since the deviation $n\delta \in \Theta(n)$ for any fixed $\delta$, it makes sense to use the Pretty Chernoff Bound for a sum of i.i.d. indicator variables. Also notice that $\delta$ shrinks as $\frac{1}{\sqrt{n}}$.
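The $\frac{1}{\sqrt{n}}$ decay is easy to see numerically (a quick sketch; the sample sizes below are arbitrary choices):

```python
import math

# delta = sqrt(1.84 / n): quadrupling n halves the interval's half-width.
for n in (1000, 4000, 16000):
    delta = math.sqrt(1.84 / n)
    print(f"n = {n:5d}  delta = {delta:.4f}")
```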

Taking $n = 1000$ samples, for example, gives $\delta = \sqrt{\frac{1.84}{1000}} \approx 0.043$, so we conclude $[\frac{X}{n} - 0.043, \frac{X}{n} + 0.043]$ forms a 95\% confidence interval for the true $p$.
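As a sanity check, the interval can be tested by simulation (a sketch; the hidden bias `p`, the sample size, and the trial count are arbitrary choices):

```python
import random

random.seed(42)  # for reproducibility

n = 1000                   # coin flips per experiment
delta = (1.84 / n) ** 0.5  # half-width from the Chernoff bound, ~0.043
p = 0.3                    # true (hidden) coin bias -- arbitrary choice
trials = 2000

covered = 0
for _ in range(trials):
    x = sum(random.random() < p for _ in range(n))  # X = number of heads
    if abs(x / n - p) <= delta:                     # p inside [X/n - d, X/n + d]
        covered += 1

print(f"empirical coverage: {covered / trials:.3f}")
```

Because the Chernoff bound is loose, the empirical coverage typically comes out well above 0.95.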

Of course there are many issues that come up in statistical parameter estimation. For example, it is not obvious how to get "independent", equally weighted samples.

## Balls and Bins

We randomly distribute $n$ balls into $n$ bins. Assuming $n$ is sufficiently large, we want to show that, with high probability, no bin will have more than $\frac{3\ln n}{\ln \ln n} - 1 \in O(\frac{\ln n}{\ln \ln n})$ balls.

• Sufficiently Large: $n \rightarrow \infty$

• With High Probability: $Pr \geq 1 - \frac{1}{n}$ for sufficiently large $n$

Note that we chose $k = \frac{3\ln n}{\ln \ln n} - 1$ to simplify calculation. This choice works because $k$ grows more slowly than $\ln n$: for sufficiently large $n$,

$$k = \frac{3\ln n}{\ln \ln n} - 1 \sim \frac{3\ln n}{\ln \ln n} < \frac{\ln n}{10000} < \ln n$$

We define $B_j$, the total number of balls in bin $j$:

\begin{align*} B_j \sim& \text{Binomial}(n, \frac{1}{n})\\ B_j =& \sum_{i = 1}^n X_i \tag{where $X_i = \begin{cases}1 & \text{if the $i$th ball lands in bin $j$}\\0&\text{otherwise}\end{cases}$}\\ \end{align*}

Want to show:

\begin{align*} &Pr\{\forall j,\ B_j < 1 + k\} \geq 1 - \frac{1}{n} \tag{where $k = \frac{3\ln n}{\ln \ln n} - 1$}\\ \iff& Pr\{\exists j,\ B_j \geq 1 + k\} \leq \frac{1}{n}\\ \iff& Pr\{B_1 \geq 1 + k \cup B_2 \geq 1 + k \cup ... \cup B_n \geq 1 + k\} \leq \frac{1}{n}\\ \impliedby& Pr\{B_1 \geq 1 + k\} + Pr\{B_2 \geq 1 + k\} + ... + Pr\{B_n \geq 1 + k\} \leq \frac{1}{n} \tag{Union Bound}\\ \impliedby& Pr\{B_j \geq 1 + k\} \leq \frac{1}{n^2} \tag{$\forall j$, since there are $n$ terms}\\ \impliedby& \frac{e^k}{(1 + k)^{1 + k}} \leq \frac{1}{n^2} \tag{Ugly Chernoff Bound with $E[B_j] = 1$}\\ \iff& k - (1 + k)\ln(1 + k) \leq -2\ln n \tag{$\ln$ both sides}\\ \iff& \frac{3\ln n}{\ln \ln n} - 1 - \frac{3 \ln n}{\ln \ln n} \cdot \ln \left(\frac{3 \ln n}{\ln \ln n}\right) \leq -2 \ln n \tag{substitute $k = \frac{3\ln n}{\ln \ln n} - 1$, so $1 + k = \frac{3\ln n}{\ln \ln n}$}\\ \iff& \frac{3}{\ln \ln n} - \frac{1}{\ln n} - \frac{3 \ln 3}{\ln \ln n} - 3 + \frac{3 \ln \ln \ln n}{\ln \ln n} \leq -2 \tag{divide by $\ln n$; $\ln \frac{3\ln n}{\ln \ln n} = \ln 3 + \ln \ln n - \ln \ln \ln n$}\\ \iff& -3 + o(1) \leq -2 \tag{every other term is $o(1)$ as $n \rightarrow \infty$}\\ \end{align*}

Every remaining term is $o(1)$: each tends to $0$ as $n \rightarrow \infty$, so the left-hand side tends to $-3 < -2$. Hence the inequality holds for all sufficiently large $n$, which completes the proof.
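The bound can be checked by simulating one round of the process (a sketch; the choice of $n$ and the seed are arbitrary):

```python
import math
import random

random.seed(7)  # for reproducibility

# Throw n balls into n bins uniformly at random and record the maximum load.
n = 10_000
bins = [0] * n
for _ in range(n):
    bins[random.randrange(n)] += 1

# The bound derived above: no bin should reach 1 + k = 3*ln(n)/ln(ln(n)) balls.
k = 3 * math.log(n) / math.log(math.log(n)) - 1
print(f"max load = {max(bins)}, bound 1 + k = {k + 1:.2f}")
```

In practice the maximum load tends to sit near $\frac{\ln n}{\ln \ln n}$; the constant $3$ is what buys the $1 - \frac{1}{n}$ guarantee.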
