Lecture 003

Expectation

Expectation of Random Variable

Expectation of Random Variable: E[X] = \sum_i i \cdot P_X(i)

Expectation converts a random variable (a function over outcomes) into a single constant.

For a continuous random variable X with density f_X:

\begin{align*} E[X] &= \int_{i = -\infty}^\infty i f_X(i) di \end{align*}

For a nonnegative random variable X, we can also sum the tail:

\begin{align*} E[X] &= \int_{i = 0}^\infty \overline{F_X}(i) di = \int_{i = 0}^\infty Pr\{X > i\} di\\ E[X^2] &= 2\int_{i = 0}^\infty i \overline{F_X}(i) di = 2\int_{i = 0}^\infty i Pr\{X > i\} di \end{align*}
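
A quick numerical sanity check of these formulas (a sketch, assuming a standard Python environment with NumPy; the Exponential distribution and the integration grid are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 2.0                                   # X ~ Exponential(rate), E[X] = 1/rate
samples = rng.exponential(scale=1.0 / rate, size=200_000)

# Direct estimate of E[X] from samples.
mean_direct = samples.mean()

# Tail-sum estimate: E[X] = int_0^inf Pr{X > i} di, integrated on a grid.
ts = np.linspace(0.0, 10.0, 2001)
tail = np.array([(samples > t).mean() for t in ts])   # empirical Pr{X > t}
mean_tail = np.trapz(tail, ts)

# Second moment: E[X^2] = 2 * int_0^inf i * Pr{X > i} di (true value 2/rate^2).
second_moment_tail = 2.0 * np.trapz(ts * tail, ts)

print(mean_direct, mean_tail)      # both close to 0.5
print(second_moment_tail)          # close to 2 / rate**2 = 0.5
```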

Linear Expectation (Sum)

For any random variables X, Y (independence is not required):

E[X + Y] = E[X] + E[Y]

Linearity of expectation (with indicator random variables), summing the tail of the c.d.f., summing over the p.d.f., and conditioning are 4 common ways to start computing an expectation.

Linear Expectation (Product)

For random variables X, Y with X \perp Y:

E[X \cdot Y] = E[X] \cdot E[Y]

Proof:

\begin{align*} &E[X \cdot Y]\\ =& \int \int xy f_{X,Y}(x, y) dx dy\\ =& \int \int xy f_X(x) \cdot f_Y(y) dx dy \tag{by independence}\\ =& \left(\int x f_X(x) dx\right) \cdot \left(\int y f_Y(y) dy\right)\\ =& E[X] \cdot E[Y]\\ \end{align*}

Bernoulli

Let X \sim \text{Bernoulli}(p), then E[X] = p

Geometric

Let X \sim \text{Geometric}(p), then

\begin{align*} E[X] &= \sum_{i = 1}^\infty i \cdot (1-p)^{i-1}p\\ &= p\sum_{i = 1}^\infty i \cdot (1-p)^{i-1}\\ &= p(1 + 2(1-p)^1 + 3(1-p)^2 + ...)\\ &= \frac{p}{(1-(1-p))^2} \tag{since $\sum_{i=1}^\infty i x^{i-1} = \frac{1}{(1-x)^2}$ for $|x| < 1$}\\ &= \frac{p}{p^2}\\ &= \frac{1}{p}\\ \end{align*}
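
A minimal simulation check of E[X] = \frac{1}{p} (a sketch, assuming NumPy; note that NumPy's geometric counts the number of trials up to and including the first success, matching the convention here):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3

# Number of flips up to and including the first head, support {1, 2, 3, ...}.
samples = rng.geometric(p, size=500_000)

print(samples.mean(), 1 / p)   # both close to 1/p ≈ 3.33
```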

By Conditioning:

Let X = \text{number of flips to get a head}

Lemma: the random variable [X | X > 1] has the same distribution as 1 + X, written [X | X > 1] =^d [1+X]. Proof below.

\begin{align*} E[X] &= E[X|\text{first flip is head}] \cdot p + E[X|\text{first flip is tail}] \cdot (1-p)\\ &= 1 \cdot p + E[1+X] \cdot (1-p) \tag{by Lemma}\\ &= p + (1 + E[X])(1-p)\\ &= p + (1-p) + E[X](1-p)\\ &= 1 + E[X](1-p)\\ E[X] \cdot p &= 1 \tag{assume $E[X] < \infty$}\\ E[X] &= \frac{1}{p}\\ \end{align*}
Corollary (Memorylessness of the Geometric): for any s = 1, 2, 3, ..., [X | X > s] =^d [s + X]. Equivalently:

\begin{align*} Pr\{X = t | X > s\} &= Pr\{s + X = t\} = Pr\{X = t - s\}\\ Pr\{X = t + s | X > s\} &= Pr\{X = (t + s) - s\} = Pr\{X = t\}\\ \end{align*}

Proof (of the Lemma): Let X \sim \text{Geometric}(p), Y = [X | \text{1st flip is a tail}] = [X | X > 1]

Now, Y is a different random variable with its own distribution. The range of Y (2, 3, 4, ...) is not the same as the range of X (1, 2, 3, 4, ...). Think of Y as X restricted (and renormalized) to the values i = 2, 3, 4, ...

We claim Y =^d 1 + X by showing (\forall i = 2, 3, 4, ...)\, Pr\{Y = i\} = Pr\{1 + X = i\}. We only need i = 2, 3, 4, ... because both Y and 1 + X take values in \{2, 3, 4, ...\}.

Left hand side:

\begin{align*} &Pr\{X = i | X > 1\} \tag{where $i$ can only be $2, 3, 4, 5, ...$}\\ =& \frac{Pr\{X = i \cap X > 1\}}{Pr\{X > 1\}}\\ =& \frac{Pr\{X = i\}}{Pr\{X > 1\}}\\ =& \frac{(1 - p)^{i - 1}p}{1 - p}\\ =& (1 - p)^{i - 2}p\\ \end{align*}

Right hand side:

\begin{align*} & Pr\{1 + X = i\} \tag{where $i$ can only be $2, 3, 4, 5, ...$}\\ =& Pr\{X = i - 1\}\\ =& (1 - p)^{i - 2}p\\ \end{align*}

Corollary: For X \sim \text{Geometric}(p), E[X^2 | X > 1] = E[Y^2] = E[(1 + X)^2]
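
A simulation sketch of these memorylessness corollaries (assuming NumPy; p and s are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p, s = 0.25, 3
x = rng.geometric(p, size=1_000_000)   # X ~ Geometric(p), support {1, 2, ...}

# Conditional sample: values of X restricted to the event {X > s}.
cond = x[x > s]

# [X | X > s] =^d s + X, so conditional moments match moments of s + X.
print(cond.mean(), s + 1 / p)                      # both ≈ s + E[X]
print((cond ** 2).mean(), ((s + x) ** 2).mean())   # E[X^2 | X > s] ≈ E[(s + X)^2]

# Pr{X = t + s | X > s} ≈ Pr{X = t} for a few values of t.
for t in (1, 2, 3):
    print(t, (cond == t + s).mean(), (x == t).mean())
```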

Poisson

Let X \sim \text{Poisson}(\lambda), then

\begin{align*} E[X] &= \sum_{i = 0}^\infty i \frac{e^{-\lambda}\lambda^i}{i!}\\ &= e^{-\lambda} \cdot \lambda \cdot \sum_{i = 1}^\infty \frac{\lambda^{i - 1}}{(i - 1)!}\\ &= e^{-\lambda} \cdot \lambda \cdot (1 + \frac{\lambda^1}{1!} + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + ...)\\ &= e^{-\lambda} \cdot \lambda \cdot e^{\lambda}\\ &= \lambda\\ \end{align*}
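
A small numerical check of this series (a sketch in plain Python; \lambda = 4.2 is an arbitrary choice, and the terms Pr\{X = i\} are built up iteratively to avoid overflow):

```python
import math

lam = 4.2

# Partial sums of E[X] = sum_{i>=1} i * e^{-lam} * lam^i / i! converge to lam.
total = 0.0
term = math.exp(-lam)      # term = e^{-lam} * lam^0 / 0! = Pr{X = 0}
for i in range(1, 100):
    term *= lam / i        # now term = e^{-lam} * lam^i / i! = Pr{X = i}
    total += i * term

print(total, lam)          # both ≈ 4.2
```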

Binomial

Let X \sim \text{Binomial}(n, p), then E[X] = \sum_{i = 0}^n i {n \choose i} p^i (1 - p)^{n - i}

Calculate using Linear Expectation:

define X_i = \text{value of } i \text{-th coin flip} = \begin{cases} 1 & \text{if head}\\ 0 & \text{if tail}\\ \end{cases}

\begin{align*} X &= X_1 + X_2 + ... + X_n\\ E[X] &= E[X_1] + E[X_2] + ... + E[X_n]\\ &= p + p + ... + p\\ &= np\\ \end{align*}
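
A simulation sketch of the indicator decomposition (assuming NumPy; n and p are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 0.3

# Each row is one experiment: n Bernoulli(p) indicators X_1, ..., X_n.
flips = rng.random((100_000, n)) < p   # boolean matrix, True = head
X = flips.sum(axis=1)                  # X = X_1 + ... + X_n for each experiment

print(X.mean(), n * p)                 # both ≈ 6.0
```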

Expectation of Function of Random Variable

Expectation of Function of Random Variable: E[g(X)] = \sum_i g(i) \cdot P_X(i)

Estimating Integrals

We have the following two sampling-based approximations, where x_1, ..., x_N are i.i.d. samples drawn from the density p:

\begin{align*} &\int f(x) p(x) dx = E[f(X)] \approx \frac{1}{N} \sum_{i = 1}^N f(x_i)\\ \implies& \int f(x) dx = \int \frac{f(x)}{p(x)}p(x) dx \approx \frac{1}{N} \sum_{i = 1}^N \frac{f(x_i)}{p(x_i)} \end{align*}

This can be better understood when f(x) = x: in that case, \int x dx \approx \frac{1}{N}\sum_{i = 1}^N \frac{x_i}{p(x_i)}.

This leads us to the Monte Carlo estimator, where w(x_i) is a weighting function (usually w(x_i) = \frac{1}{p(x_i)}):

\int f(x) dx \approx \frac{1}{N} \sum_{i = 1}^N f(x_i) w(x_i)
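
A minimal sketch of the Monte Carlo estimator (assuming NumPy; the target integral \int_0^\pi \sin(x) dx = 2 and the uniform sampling density p(x) = \frac{1}{\pi} are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Estimate int_0^pi sin(x) dx (true value 2) by sampling x_i ~ Uniform(0, pi),
# whose density is p(x) = 1/pi, and weighting each sample by w(x_i) = 1/p(x_i).
xs = rng.uniform(0.0, np.pi, size=N)
weights = np.pi                              # w(x_i) = 1 / p(x_i) = pi
estimate = np.mean(np.sin(xs) * weights)

print(estimate)                              # ≈ 2.0
```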

Expectation of Product of Random Variable

Expectation of Product of Random Variable: If X \perp Y, then E[XY] = E[X] \cdot E[Y]

Proof:

\begin{align*} E[XY] &= \sum_x \sum_y xy P_{X, Y}(x, y)\\ &= \sum_x \sum_y xy P_{X}(x)P_{Y}(y) \tag{independence}\\ &= \sum_x x \cdot P_{X}(x) \sum_y y \cdot P_{Y}(y)\\ &= E[X] \cdot E[Y] \end{align*}

Corollary: If X \perp Y, then E[g(X)f(Y)] = E[g(X)] \cdot E[f(Y)]. However, the converse is not true: E[XY] = E[X] \cdot E[Y] does not imply X \perp Y (e.g. X uniform on \{-1, 0, 1\} and Y = X^2 satisfy E[XY] = E[X] \cdot E[Y] = 0 but are clearly dependent).
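
A simulation sketch of both directions (assuming NumPy; the particular distributions are arbitrary): for independent X, Y the product rule holds, while the counterexample above satisfies the product rule without independence.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500_000

# Independent case: E[XY] matches E[X] * E[Y] up to sampling noise.
X = rng.normal(1.0, 2.0, size=N)
Y = rng.exponential(3.0, size=N)             # drawn independently of X
print((X * Y).mean(), X.mean() * Y.mean())

# Converse fails: X uniform on {-1, 0, 1} and Y = X^2 are dependent,
# yet E[XY] = E[X^3] = 0 = E[X] * E[Y].
X = rng.choice([-1, 0, 1], size=N)
Y = X ** 2
print((X * Y).mean(), X.mean() * Y.mean())   # both ≈ 0
```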

Linearity of Expectation

Linearity of Expectation: E[X + Y] = E[X] + E[Y]

Proof:

\begin{align*} E[X + Y] &= \sum_x \sum_y (x + y) P_{X, Y}(x, y)\\ &= \sum_x \sum_y x P_{X, Y}(x, y) + \sum_x \sum_y y P_{X, Y}(x, y)\\ &= \sum_x x \sum_y P_{X, Y}(x, y) + \sum_y y \sum_x P_{X, Y}(x, y)\\ &= \sum_x x P_X(x) + \sum_y y P_Y(y)\\ &= E[X] + E[Y]\\ \end{align*}
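
The proof only uses the joint p.m.f., so independence is never needed. A tiny exact check with a dependent pair (a sketch; the joint p.m.f. below is a made-up example):

```python
# A small joint p.m.f. P_{X,Y}(x, y) with strongly dependent X and Y.
pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

E_sum = sum((x + y) * pr for (x, y), pr in pmf.items())   # E[X + Y]
E_X = sum(x * pr for (x, y), pr in pmf.items())           # E[X] via marginal
E_Y = sum(y * pr for (x, y), pr in pmf.items())           # E[Y] via marginal

print(E_sum, E_X + E_Y)   # equal (1.0 and 1.0) even though X, Y are dependent
```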

Practice on Linear Expectation

Say in Arknights you have n different characters and you want to collect them all. Each roll gives a uniformly random character, so each specific character appears with probability \frac{1}{n}. What is the expected number of rolls to collect all n characters?

Let X = \text{number of rolls to collect all } n \text{ characters}. Let X_i = \text{number of rolls to get the } i \text{-th new character, starting from } i - 1 \text{ distinct characters}. Then X = X_1 + X_2 + ... + X_n where X_i \sim \text{Geometric}(\frac{n - i + 1}{n})

\begin{align*} E[X] &= E[X_1] + E[X_2] + ... + E[X_n]\\ &= \frac{n}{n} + \frac{n}{n-1} + \frac{n}{n - 2} + ... + n\\ &= n(\frac{1}{n} + \frac{1}{n-1} + \frac{1}{n - 2} + ... + 1)\\ &= n \cdot H_n\\ &\approx n \ln(n)\\ \end{align*}
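
A simulation sketch of this coupon-collector answer (assuming NumPy; n = 50 and the number of trials are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                   # number of distinct characters

def rolls_to_collect_all(n):
    """Roll uniformly among n characters until every character has appeared."""
    seen = set()
    rolls = 0
    while len(seen) < n:
        seen.add(int(rng.integers(n)))
        rolls += 1
    return rolls

trials = [rolls_to_collect_all(n) for _ in range(2_000)]
harmonic = sum(1 / k for k in range(1, n + 1))

print(np.mean(trials), n * harmonic, n * np.log(n))   # sim ≈ n*H_n ≈ 225, n*ln(n) ≈ 196
```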

Conditional Expectation

Conditional p.m.f.: P_{X|A}(x) = Pr\{X = x | A\} where A is an event.

Conditional Expectation: E[X | A] = \sum_i i \cdot P_{X|A}(i)

Computing Expectation via Conditioning: E[X] = E[X|A] \cdot Pr\{A\} + E[X|\bar{A}] \cdot Pr\{\bar{A}\}

Proof:

\begin{align*} E[X] &= \sum_x x \cdot Pr\{X = x\}\\ &= \sum_x x(Pr\{X = x | A\} \cdot Pr\{A\} + Pr\{X = x | \bar{A}\} \cdot Pr\{\bar{A}\})\\ &= Pr\{A\} (\sum_x x \cdot Pr\{X = x | A\}) + Pr\{\bar{A}\} (\sum_x x \cdot Pr\{X = x | \bar{A}\})\\ &= Pr\{A\}E[X | A] + Pr\{\bar{A}\}E[X | \bar{A}]\\ \end{align*}
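
A quick simulation check of computing expectation via conditioning (a sketch, assuming NumPy; the die roll and the event A = \{X \text{ is even}\} are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# X = a fair six-sided die roll, A = {X is even}.
X = rng.integers(1, 7, size=600_000)
A = (X % 2 == 0)

lhs = X.mean()                                              # E[X] ≈ 3.5
rhs = X[A].mean() * A.mean() + X[~A].mean() * (~A).mean()   # conditioned pieces

print(lhs, rhs)                                             # both ≈ 3.5
```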

Simpson's Paradox

We have two treatments for kidney stones; their effectiveness results are summarized below.

[Figure: Simpson's Paradox — treatment success rates broken down by stone size; table omitted]

Facts:

  1. Treatment A is better for both cases: even if the doctor does not know whether the patient has small or large stones, the patient is still more likely to be healed by Treatment A.
  2. But in our sample, more patients with large stones received Treatment A, which drags down Treatment A's aggregate success rate (large stones are harder to treat).
  3. More patients with small stones received Treatment B, which pushes up Treatment B's aggregate success rate (small stones are easier to treat).
  4. However, if a patient ended up taking Treatment A, then that patient is more likely to have large stones and therefore less success; if a patient ended up taking Treatment B, then that patient is more likely to have small stones and therefore more success.
  5. Treatment CAUSES the patient to heal, but the statistics alone do not indicate causation.

Tricks

Some general methods for solving these problems:

  1. Conditioning
  2. Linear Expectation (Variance, Transform)
  3. Summing Expectation
  4. Summing Tail
  5. Bayes Law
  6. Z-Transform and Laplace Transform
  7. Memoryless
  8. Integrate the p.d.f. to get the c.d.f.; differentiate the c.d.f. to get the p.d.f.
