Lecture 004

Higher Moments

k-th Moment of X: E[X^k] = \sum_i i^k \cdot Pr\{X = i\}

k-th Central Moment of X: E[(X - E[X])^k] = \sum_i (i - E[X])^k \cdot Pr\{X = i\}
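
These definitions translate directly into code. A minimal Python sketch (the PMF below is a made-up example, not from the lecture):

```python
# k-th raw and central moments of a discrete random variable, straight
# from the definitions above. The PMF here is a hypothetical example.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}  # Pr{X = i}

def raw_moment(pmf, k):
    # E[X^k] = sum_i i^k * Pr{X = i}
    return sum(i**k * p for i, p in pmf.items())

def central_moment(pmf, k):
    # E[(X - E[X])^k] = sum_i (i - E[X])^k * Pr{X = i}
    mean = raw_moment(pmf, 1)
    return sum((i - mean)**k * p for i, p in pmf.items())

print(raw_moment(pmf, 2))      # E[X^2] = 4.9
print(central_moment(pmf, 2))  # Var(X) = 0.49
```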

Geometric Higher Moment

Let X \sim \text{Geometric}(p). What is E[X^2]?

\begin{align*} E[X^2] &= E[X^2 | X = 1]p + E[X^2 | X > 1](1-p)\\ E[X^2] &= p + E[(1 + X)^2](1 - p) \tag{memoryless}\\ E[X^2] &= p + E[1 + 2X + X^2](1 - p)\\ E[X^2] &= p + (1 + 2E[X] + E[X^2])(1 - p)\\ E[X^2] &= 1 + 2E[X](1-p) + E[X^2](1-p)\\ E[X^2] &= 1 + 2\frac{1}{p}(1-p) + E[X^2](1-p) \tag{substitute $E[X] = \frac{1}{p}$}\\ pE[X^2] &= 1 + 2\frac{1}{p}(1-p)\\ E[X^2] &= \frac{1}{p} + 2\frac{(1-p)}{p^2}\\ E[X^2] &= \frac{2-p}{p^2}\\ \end{align*}

The same conditioning trick extends to higher moments, e.g. E[X^3] = 1^3 \cdot p + E[(1+X)^3] \cdot (1 - p)
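
A quick Monte Carlo sanity check of the closed form (a sketch assuming numpy; p = 0.3 is an arbitrary choice, and numpy's geometric sampler counts trials up to and including the first success, matching the convention here):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
# Number of trials until the first success, i.e. Geometric(p) on {1, 2, ...}
samples = rng.geometric(p, size=1_000_000)

print(np.mean(samples**2))  # empirical E[X^2]
print((2 - p) / p**2)       # closed form: (2 - 0.3)/0.09 ~ 18.89
```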

Variance

Variance:

Var(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2

Think of X as a 1-d field: for each value x_i, we calculate (x_i - E[X])^2 and ask for its expectation (E[X] is just the average of the field)

Proof:

\begin{align*} &E[(X - E[X])^2]\\ =& E[X^2 - 2XE[X] + E[X]^2]\\ =& E[X^2] - E[2XE[X]] + E[E[X]^2] \tag{by linearity of expectation}\\ =& E[X^2] - 2E[X]E[X] + E[X]^2 \tag{since $E[X]$ is a constant}\\ =& E[X^2] - E[X]^2\\ \end{align*}

Because Var(X) = E[X^2] - E[X]^2 \geq 0, we always have E[X^2] \geq E[X]^2.
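
Both forms are easy to check numerically; a sketch with an arbitrary made-up PMF:

```python
# Verify Var(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2 on an example PMF.
pmf = {0: 0.5, 1: 0.3, 4: 0.2}  # hypothetical Pr{X = i}

mean = sum(i * p for i, p in pmf.items())                       # E[X]
var_central = sum((i - mean)**2 * p for i, p in pmf.items())    # E[(X - E[X])^2]
var_identity = sum(i**2 * p for i, p in pmf.items()) - mean**2  # E[X^2] - E[X]^2

print(var_central, var_identity)  # both 2.29
assert abs(var_central - var_identity) < 1e-12
```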

Variance Calculation

  1. Var(X + C) = Var(X)
  2. Var(CX) = C^2Var(X)
  3. Var(-X) = Var(X)

Example: Let X_1, X_2 be independent and both distributed like X. Observe Var(X_1 + X_2) = 2Var(X) \neq Var(2X) = 4Var(X) (rolling a die twice and summing versus rolling once and doubling the result)
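
The die example, simulated (a sketch assuming numpy and a fair six-sided die):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
die1 = rng.integers(1, 7, size=n)  # first roll
die2 = rng.integers(1, 7, size=n)  # independent second roll

print(np.var(die1 + die2))  # ~ 2 Var(X) = 2 * 35/12 ~ 5.83
print(np.var(2 * die1))     # ~ 4 Var(X) = 4 * 35/12 ~ 11.67
```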

Variance of Bernoulli

Let X \sim \text{Bernoulli}(p). What is Var(X)?

\begin{align*} Var(X) &= E[(X-p)^2]\\ &= E[(X - p)^2 | X = 1]p + E[(X - p)^2 | X = 0](1-p)\\ &= (1 - E[X])^2p + (0 - E[X])^2(1 - p)\\ &= (1 - p)^2p + (0 - p)^2(1 - p)\\ &= p(1-p)\\ \end{align*}

where we know E[X] = p for Bernoulli.

You can't condition variance the way you condition expectation: Var(X) \neq Var(X | X = 1)p + Var(X | X = 0)(1 - p) = 0 + 0 = 0. Also, if X is a mixture that takes Y's value with probability 0.6 and Z's value with probability 0.4, written

X = 0.6Y + 0.4Z

then it is wrong to say X^2 = (0.6Y + 0.4Z)^2. Instead, X^2 is the same mixture of the squares:

X^2 = 0.6Y^2 + 0.4Z^2
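
The mixture rule is easy to check numerically; a sketch where Y and Z are hypothetical choices (a fair six-sided and a fair three-sided die):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
y = rng.integers(1, 7, size=n)  # Y: fair six-sided die (hypothetical)
z = rng.integers(1, 4, size=n)  # Z: fair three-sided die (hypothetical)
pick_y = rng.random(n) < 0.6    # mixture coin: take Y's value with probability 0.6
x = np.where(pick_y, y, z)

print(np.mean(x**2))                              # empirical E[X^2] ~ 10.97
print(0.6 * np.mean(y**2) + 0.4 * np.mean(z**2))  # 0.6 E[Y^2] + 0.4 E[Z^2]: matches
print((0.6 * np.mean(y) + 0.4 * np.mean(z))**2)   # squaring the mixed mean: 8.41, wrong
```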

Other potential definitions of variance:

  1. Var(X) = E[X - E[X]] = E[X] - E[X] = 0: bad
  2. Var(X) = E[|X - E[X]|]: good, but without nice properties
  3. Var(X) = E[e^{X - E[X]}]: number too big, without nice properties
  4. Var(X) = E[(X - E[X])^3]: can be negative (the cube keeps the sign), without nice properties
  5. \sigma_X = \sqrt{E[(X - E[X])^2]}: standard deviation (Var(X) = \sigma_X^2)

Variance of Binomial

Let X \sim \text{Binomial}(n, p) = X_1 + X_2 + ... + X_n for X_i = \begin{cases} 1 & \text{if success}\\ 0 & \text{otherwise} \end{cases}

\begin{align*} &Var(X)\\ =&Var(\text{Binomial}(n, p))\\ =&Var(X_1 + X_2 + ... + X_n)\\ =&Var(X_1) + Var(X_2) + ... + Var(X_n) \tag{by independence}\\ =&nVar(X_i) \tag{identically distributed}\\ =&np(1-p) \tag{by variance of Bernoulli}\\ \end{align*}

We can now compute E[X^2] = Var(X) + E[X]^2 = np(1-p) + (np)^2
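
A simulation check of both the variance and the second moment (a sketch assuming numpy; n = 20, p = 0.3 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 0.3
samples = rng.binomial(n, p, size=1_000_000)

print(np.var(samples))      # ~ np(1-p) = 4.2
print(np.mean(samples**2))  # ~ np(1-p) + (np)^2 = 40.2
```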

Variance of Geometric

Var(\text{Geometric}(p)) = \frac{1 - p}{p^2}
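
This follows from the second moment computed earlier:

\begin{align*} Var(\text{Geometric}(p)) &= E[X^2] - E[X]^2\\ &= \frac{2-p}{p^2} - \frac{1}{p^2}\\ &= \frac{1-p}{p^2}\\ \end{align*}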

Variance of Poisson

Let X \sim \text{Poisson}(\lambda). What is Var(X)?

\begin{align*} &Var(X)\\ =& E[(X - E[X])^2]\\ =& E[X^2] - E[X]^2\\ =& (\sum_{x = 0}^\infty x^2 Pr\{X = x\}) - \lambda^2\\ =& (\sum_{x = 0}^\infty x^2 \frac{\lambda^xe^{-\lambda}}{x!}) - \lambda^2\\ =& (\sum_{x = 1}^\infty x \frac{\lambda^xe^{-\lambda}}{(x-1)!}) - \lambda^2\\ =& (\lambda e^{-\lambda}\sum_{x = 1}^\infty x \frac{\lambda^{x-1}}{(x-1)!}) - \lambda^2\\ =& (\lambda e^{-\lambda}\sum_{x = 1}^\infty (1+(x-1)) \frac{\lambda^{x-1}}{(x-1)!}) - \lambda^2\\ =& ((\lambda e^{-\lambda}\sum_{x = 1}^\infty (x-1) \frac{\lambda^{x-1}}{(x-1)!}) + (\lambda e^{-\lambda}\sum_{x = 1}^\infty \frac{\lambda^{x-1}}{(x-1)!})) - \lambda^2\\ =& \lambda e^{-\lambda}(\sum_{x = 1}^\infty (x-1) \frac{\lambda^{x-1}}{(x-1)!}) + \lambda - \lambda^2 \tag{same as expectation of Poisson}\\ =& \lambda^2 e^{-\lambda}(\sum_{x = 2}^\infty \frac{\lambda^{x-2}}{(x-2)!}) + \lambda - \lambda^2\\ =& \lambda^2 e^{-\lambda}e^{\lambda} + \lambda - \lambda^2\\ =& \lambda^2 + \lambda - \lambda^2\\ =& \lambda\\ \end{align*}
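A quick numerical check (a sketch assuming numpy; \lambda = 4 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
lam = 4.0
samples = rng.poisson(lam, size=1_000_000)

print(np.mean(samples))  # ~ lambda = 4.0
print(np.var(samples))   # ~ lambda = 4.0: mean and variance coincide for Poisson
```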

Measuring in Different Scales

Observe that the variance changes when the same quantity is measured on a different scale.

\left[X = \begin{cases} 3 & \text{with probability } \frac{1}{3}\\ 2 & \text{with probability } \frac{1}{3}\\ 1 & \text{with probability } \frac{1}{3}\\ \end{cases}\right] \implies Var(X) = \frac{2}{3}
\left[Y = \begin{cases} 30 & \text{with probability } \frac{1}{3}\\ 20 & \text{with probability } \frac{1}{3}\\ 10 & \text{with probability } \frac{1}{3}\\ \end{cases}\right] \implies Var(Y) = \frac{200}{3}

Attention: when doing math involving random variables (especially when dealing with variance), adding two independent copies of X is not the same as doubling one copy: X_1 + X_2 \neq 2X

Squared Coefficient of Variation: a scale-invariant version of variance

C_X^2 = \frac{Var(X)}{E[X]^2}

Observe C_X^2 = C_Y^2 = \frac{1}{6} in the example above
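
Computing the squared coefficient of variation for both variables above confirms the scale invariance (a sketch; the helper scv is a hypothetical name):

```python
# C^2 = Var(X) / E[X]^2 for a uniform PMF over the given values.
def scv(values):
    mean = sum(values) / len(values)                        # E[X]
    var = sum((v - mean)**2 for v in values) / len(values)  # Var(X)
    return var / mean**2

print(scv([1, 2, 3]))     # 1/6 ~ 0.1667
print(scv([10, 20, 30]))  # also 1/6: scaling by 10 changes nothing
```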

Linearity of Variance

Linearity of Variance: If X \perp Y, Var(X + Y) = Var(X) + Var(Y)

Proof:

\begin{align*} &Var(X + Y)\\ =& E[(X + Y)^2] - E[X+Y]^2\\ =& E[X^2 + 2XY + Y^2] - (E[X] + E[Y])^2\\ =& E[X^2] + 2E[XY] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2\\ =& E[X^2] + 2E[X]E[Y] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2\tag{$E[XY] = E[X]E[Y]$ by $X \perp Y$}\\ =& (E[X^2] + E[Y^2]) - (E[X]^2 + E[Y]^2) \tag{the $2E[X]E[Y]$ terms cancel}\\ =& Var(X) + Var(Y)\\ \end{align*}

Note that the implication does not go backward: Var(X + Y) = Var(X) + Var(Y) does not imply X \perp Y
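
A simulation sketch of both directions (assuming numpy; the dependent counterexample Z = 7 - X is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
x = rng.integers(1, 7, size=n)  # fair die roll
y = rng.poisson(4.0, size=n)    # independent Poisson(4)

print(np.var(x + y))  # ~ Var(X) + Var(Y) = 35/12 + 4 ~ 6.92

# With dependence the rule fails: Z = 7 - X is fully determined by X.
z = 7 - x
print(np.var(x + z))  # 0, not Var(X) + Var(Z) = 2 * 35/12 ~ 5.83
```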

Table of Distributions

Summary of the distributions covered so far:

| Distribution | E[X] | Var(X) |
|---|---|---|
| Bernoulli(p) | p | p(1-p) |
| Binomial(n, p) | np | np(1-p) |
| Geometric(p) | 1/p | (1-p)/p^2 |
| Poisson(\lambda) | \lambda | \lambda |

Skew

Skew: Skew(X) = E[(X - E[X])^3]

Positive Skew: long right tail
Negative Skew: long left tail

Linearity of Skew:

X \perp Y \implies Skew(X + Y) = Skew(X) + Skew(Y)
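
A simulation check of this additivity, using Bernoulli variables whose third central moment has the closed form p(1-p)(1-2p) (a standard identity, stated here only for the check; the parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
p, q = 0.2, 0.3
x = (rng.random(n) < p).astype(float)  # Bernoulli(0.2)
y = (rng.random(n) < q).astype(float)  # independent Bernoulli(0.3)

def skew(s):
    # Skew(X) = E[(X - E[X])^3], the third central moment as defined above
    return np.mean((s - s.mean())**3)

print(skew(x + y))        # ~ Skew(X) + Skew(Y)
print(skew(x) + skew(y))  # matches
print(p*(1-p)*(1-2*p) + q*(1-q)*(1-2*q))  # exact: 0.096 + 0.084 = 0.18
```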
