Lecture 004

Higher Moments

k-th Moment of X: E[X^k] = \sum_i i^k \cdot Pr\{X = i\}

k-th Central Moment of X: E[(X - E[X])^k] = \sum_i (i - E[X])^k \cdot Pr\{X = i\}
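
These definitions translate directly into code. A minimal Python sketch (the PMF below is a made-up example, not from the lecture):

```python
# k-th raw and central moments of a discrete random variable, straight
# from the definitions above. The PMF here is a hypothetical example.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}  # Pr{X = i}

def raw_moment(pmf, k):
    # E[X^k] = sum_i i^k * Pr{X = i}
    return sum(i**k * p for i, p in pmf.items())

def central_moment(pmf, k):
    # E[(X - E[X])^k] = sum_i (i - E[X])^k * Pr{X = i}
    mean = raw_moment(pmf, 1)
    return sum((i - mean)**k * p for i, p in pmf.items())

print(raw_moment(pmf, 2))      # E[X^2] = 4.9
print(central_moment(pmf, 2))  # Var(X) = 0.49
```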

Geometric Higher Moment

Let X \sim \text{Geometric}(p). What is E[X^2]?

\begin{align*} E[X^2] &= E[X^2 | X = 1]p + E[X^2 | X > 1](1-p)\\ E[X^2] &= p + E[(1 + X)^2](1 - p) \tag{memoryless}\\ E[X^2] &= p + E[1 + 2X + X^2](1 - p)\\ E[X^2] &= p + (1 + 2E[X] + E[X^2])(1 - p)\\ E[X^2] &= 1 + 2E[X](1-p) + E[X^2](1-p)\\ E[X^2] &= 1 + 2\frac{1}{p}(1-p) + E[X^2](1-p) \tag{substitute $E[X] = \frac{1}{p}$}\\ pE[X^2] &= 1 + 2\frac{1}{p}(1-p)\\ E[X^2] &= \frac{1}{p} + 2\frac{(1-p)}{p^2}\\ E[X^2] &= \frac{2-p}{p^2}\\ \end{align*}

The same conditioning trick extends to higher moments, e.g. E[X^3] = 1^3 \cdot p + E[(1+X)^3] \cdot (1 - p)
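
A quick Monte Carlo sanity check of the closed form (a sketch assuming numpy; p = 0.3 is an arbitrary choice, and numpy's geometric sampler counts trials up to and including the first success, matching the convention here):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
# Number of trials until the first success, i.e. Geometric(p) on {1, 2, ...}
samples = rng.geometric(p, size=1_000_000)

print(np.mean(samples**2))  # empirical E[X^2]
print((2 - p) / p**2)       # closed form: (2 - 0.3)/0.09 ~ 18.89
```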

Variance

Variance:

Var(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2

Think of X as a 1-d field: for each value x_i, we calculate (x_i - E[X])^2 and ask for its expectation (E[X] is just the average of the field)

Proof:

\begin{align*} &E[(X - E[X])^2]\\ =& E[X^2 - 2XE[X] + E[X]^2]\\ =& E[X^2] - E[2XE[X]] + E[E[X]^2] \tag{by linearity of expectation}\\ =& E[X^2] - 2E[X]E[X] + E[X]^2 \tag{since $E[X]$ is a constant}\\ =& E[X^2] - E[X]^2\\ \end{align*}

Because Var(X) = E[X^2] - E[X]^2 \geq 0, we always have E[X^2] \geq E[X]^2.
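
Both forms are easy to check numerically; a sketch with an arbitrary made-up PMF:

```python
# Verify Var(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2 on an example PMF.
pmf = {0: 0.5, 1: 0.3, 4: 0.2}  # hypothetical Pr{X = i}

mean = sum(i * p for i, p in pmf.items())                       # E[X]
var_central = sum((i - mean)**2 * p for i, p in pmf.items())    # E[(X - E[X])^2]
var_identity = sum(i**2 * p for i, p in pmf.items()) - mean**2  # E[X^2] - E[X]^2

print(var_central, var_identity)  # both 2.29
assert abs(var_central - var_identity) < 1e-12
```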

Variance Calculation

  1. Var(X + C) = Var(X)
  2. Var(CX) = C^2Var(X)
  3. Var(-X) = Var(X)

Example: Let X_1, X_2 be independent and both distributed like X. Observe Var(X_1 + X_2) = 2Var(X) \neq Var(2X) = 4Var(X) (rolling a die twice and summing versus rolling once and doubling the result)
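
The die example, simulated (a sketch assuming numpy and a fair six-sided die):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
die1 = rng.integers(1, 7, size=n)  # first roll
die2 = rng.integers(1, 7, size=n)  # independent second roll

print(np.var(die1 + die2))  # ~ 2 Var(X) = 2 * 35/12 ~ 5.83
print(np.var(2 * die1))     # ~ 4 Var(X) = 4 * 35/12 ~ 11.67
```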

Variance of Bernoulli

Let X \sim \text{Bernoulli}(p). What is Var(X)?

\begin{align*} Var(X) &= E[(X-p)^2]\\ &= E[(X - p)^2 | X = 1]p + E[(X - p)^2 | X = 0](1-p)\\ &= (1 - E[X])^2p + (0 - E[X])^2(1 - p)\\ &= (1 - p)^2p + (0 - p)^2(1 - p)\\ &= p(1-p)\\ \end{align*}

where we know E[X] = p for Bernoulli.

You can't condition variance the way you condition expectation: Var(X) \neq Var(X | X = 1)p + Var(X | X = 0)(1 - p) = 0 + 0 = 0. Also, if X is a mixture that takes Y's value with probability 0.6 and Z's value with probability 0.4, written

X = 0.6Y + 0.4Z

then it is wrong to say X^2 = (0.6Y + 0.4Z)^2. Instead, X^2 is the same mixture of the squares:

X^2 = 0.6Y^2 + 0.4Z^2
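
The mixture rule is easy to check numerically; a sketch where Y and Z are hypothetical choices (a fair six-sided and a fair three-sided die):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
y = rng.integers(1, 7, size=n)  # Y: fair six-sided die (hypothetical)
z = rng.integers(1, 4, size=n)  # Z: fair three-sided die (hypothetical)
pick_y = rng.random(n) < 0.6    # mixture coin: take Y's value with probability 0.6
x = np.where(pick_y, y, z)

print(np.mean(x**2))                              # empirical E[X^2] ~ 10.97
print(0.6 * np.mean(y**2) + 0.4 * np.mean(z**2))  # 0.6 E[Y^2] + 0.4 E[Z^2]: matches
print((0.6 * np.mean(y) + 0.4 * np.mean(z))**2)   # squaring the mixed mean: 8.41, wrong
```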

Other potential definitions of variance:

  1. Var(X) = E[X - E[X]] = E[X] - E[X] = 0: bad
  2. Var(X) = E[|X - E[X]|]: good, but without nice properties
  3. Var(X) = E[e^{X - E[X]}]: number too big, without nice properties
  4. Var(X) = E[(X - E[X])^3]: can be negative (the cube keeps the sign), without nice properties
  5. \sigma_X = \sqrt{E[(X - E[X])^2]}: standard deviation (Var(X) = \sigma_X^2)

Variance of Binomial

Let X \sim \text{Binomial}(n, p) = X_1 + X_2 + ... + X_n for X_i = \begin{cases} 1 & \text{if success}\\ 0 & \text{otherwise} \end{cases}

\begin{align*} &Var(X)\\ =&Var(\text{Binomial}(n, p))\\ =&Var(X_1 + X_2 + ... + X_n)\\ =&Var(X_1) + Var(X_2) + ... + Var(X_n) \tag{by independence}\\ =&nVar(X_i) \tag{identically distributed}\\ =&np(1-p) \tag{by variance of Bernoulli}\\ \end{align*}

We can now compute E[X^2] = Var(X) + E[X]^2 = np(1-p) + (np)^2
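
A simulation check of both the variance and the second moment (a sketch assuming numpy; n = 20, p = 0.3 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 0.3
samples = rng.binomial(n, p, size=1_000_000)

print(np.var(samples))      # ~ np(1-p) = 4.2
print(np.mean(samples**2))  # ~ np(1-p) + (np)^2 = 40.2
```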

Variance of Geometric

Var(\text{Geometric}(p)) = \frac{1 - p}{p^2}
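
This follows from the second moment computed earlier:

\begin{align*} Var(\text{Geometric}(p)) &= E[X^2] - E[X]^2\\ &= \frac{2-p}{p^2} - \frac{1}{p^2}\\ &= \frac{1-p}{p^2}\\ \end{align*}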

Variance of Poisson

Let X \sim \text{Poisson}(\lambda). What is Var(X)?

\begin{align*} &Var(X)\\ =& E[(X - E[X])^2]\\ =& E[X^2] - E[X]^2\\ =& (\sum_{x = 0}^\infty x^2 Pr\{X = x\}) - \lambda^2\\ =& (\sum_{x = 0}^\infty x^2 \frac{\lambda^xe^{-\lambda}}{x!}) - \lambda^2\\ =& (\sum_{x = 1}^\infty x \frac{\lambda^xe^{-\lambda}}{(x-1)!}) - \lambda^2\\ =& (\lambda e^{-\lambda}\sum_{x = 1}^\infty x \frac{\lambda^{x-1}}{(x-1)!}) - \lambda^2\\ =& (\lambda e^{-\lambda}\sum_{x = 1}^\infty (1+(x-1)) \frac{\lambda^{x-1}}{(x-1)!}) - \lambda^2\\ =& ((\lambda e^{-\lambda}\sum_{x = 1}^\infty (x-1) \frac{\lambda^{x-1}}{(x-1)!}) + (\lambda e^{-\lambda}\sum_{x = 1}^\infty \frac{\lambda^{x-1}}{(x-1)!})) - \lambda^2\\ =& \lambda e^{-\lambda}(\sum_{x = 1}^\infty (x-1) \frac{\lambda^{x-1}}{(x-1)!}) + \lambda - \lambda^2 \tag{same as expectation of Poisson}\\ =& \lambda^2 e^{-\lambda}(\sum_{x = 2}^\infty \frac{\lambda^{x-2}}{(x-2)!}) + \lambda - \lambda^2\\ =& \lambda^2 e^{-\lambda}e^{\lambda} + \lambda - \lambda^2\\ =& \lambda^2 + \lambda - \lambda^2\\ =& \lambda\\ \end{align*}
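A quick numerical check (a sketch assuming numpy; \lambda = 4 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
lam = 4.0
samples = rng.poisson(lam, size=1_000_000)

print(np.mean(samples))  # ~ lambda = 4.0
print(np.var(samples))   # ~ lambda = 4.0: mean and variance coincide for Poisson
```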

Measuring in Different Scales

Observe that the variance changes when the same quantity is measured on a different scale.

\left[X = \begin{cases} 3 & \text{with probability } \frac{1}{3}\\ 2 & \text{with probability } \frac{1}{3}\\ 1 & \text{with probability } \frac{1}{3}\\ \end{cases}\right] \implies Var(X) = \frac{2}{3}
\left[Y = \begin{cases} 30 & \text{with probability } \frac{1}{3}\\ 20 & \text{with probability } \frac{1}{3}\\ 10 & \text{with probability } \frac{1}{3}\\ \end{cases}\right] \implies Var(Y) = \frac{200}{3}

Attention: when doing math involving random variables (especially when dealing with variance), adding two independent copies of X is not the same as doubling one copy: X_1 + X_2 \neq 2X

Squared Coefficient of Variation: a scale-invariant version of variance

C_X^2 = \frac{Var(X)}{E[X]^2}

Observe C_X^2 = C_Y^2 = \frac{1}{6} in the example above
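
Computing the squared coefficient of variation for both variables above confirms the scale invariance (a sketch; the helper scv is a hypothetical name):

```python
# C^2 = Var(X) / E[X]^2 for a uniform PMF over the given values.
def scv(values):
    mean = sum(values) / len(values)                        # E[X]
    var = sum((v - mean)**2 for v in values) / len(values)  # Var(X)
    return var / mean**2

print(scv([1, 2, 3]))     # 1/6 ~ 0.1667
print(scv([10, 20, 30]))  # also 1/6: scaling by 10 changes nothing
```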

Linearity of Variance

Linearity of Variance: If X \perp Y, Var(X + Y) = Var(X) + Var(Y)

Proof:

\begin{align*} &Var(X + Y)\\ =& E[(X + Y)^2] - E[X+Y]^2\\ =& E[X^2 + 2XY + Y^2] - (E[X] + E[Y])^2\\ =& E[X^2] + 2E[XY] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2\\ =& E[X^2] + 2E[X]E[Y] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2\tag{$E[XY] = E[X]E[Y]$ by $X \perp Y$}\\ =& (E[X^2] + E[Y^2]) - (E[X]^2 + E[Y]^2) \tag{the $2E[X]E[Y]$ terms cancel}\\ =& Var(X) + Var(Y)\\ \end{align*}

Note that the implication does not go backward: Var(X + Y) = Var(X) + Var(Y) does not imply X \perp Y
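
A simulation sketch of both directions (assuming numpy; the dependent counterexample Z = 7 - X is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
x = rng.integers(1, 7, size=n)  # fair die roll
y = rng.poisson(4.0, size=n)    # independent Poisson(4)

print(np.var(x + y))  # ~ Var(X) + Var(Y) = 35/12 + 4 ~ 6.92

# With dependence the rule fails: Z = 7 - X is fully determined by X.
z = 7 - x
print(np.var(x + z))  # 0, not Var(X) + Var(Z) = 2 * 35/12 ~ 5.83
```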

Table of Distributions

Summary of the distributions covered so far:

| Distribution | E[X] | Var(X) |
|---|---|---|
| Bernoulli(p) | p | p(1-p) |
| Binomial(n, p) | np | np(1-p) |
| Geometric(p) | 1/p | (1-p)/p^2 |
| Poisson(\lambda) | \lambda | \lambda |

Skew

Skew: Skew(X) = E[(X - E[X])^3]

Positive Skew: long right tail
Negative Skew: long left tail

Linearity of Skew:

X \perp Y \implies Skew(X + Y) = Skew(X) + Skew(Y)
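
A simulation check of this additivity, using Bernoulli variables whose third central moment has the closed form p(1-p)(1-2p) (a standard identity, stated here only for the check; the parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
p, q = 0.2, 0.3
x = (rng.random(n) < p).astype(float)  # Bernoulli(0.2)
y = (rng.random(n) < q).astype(float)  # independent Bernoulli(0.3)

def skew(s):
    # Skew(X) = E[(X - E[X])^3], the third central moment as defined above
    return np.mean((s - s.mean())**3)

print(skew(x + y))        # ~ Skew(X) + Skew(Y)
print(skew(x) + skew(y))  # matches
print(p*(1-p)*(1-2*p) + q*(1-q)*(1-2*q))  # exact: 0.096 + 0.084 = 0.18
```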
