Lecture 004
Higher Moments
k-th Moment of X : E[X^k] = \sum_i i^k \cdot Pr\{X = i\}
k-th Central Moment of X : E[(X - E[X])^k] = \sum_i (i - E[X])^k \cdot Pr\{X = i\}
2nd central moment is variance
3rd central moment is skew
4th central moment is kurtosis: like variance, except it weights outliers more heavily
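These moments are mechanical to compute for a finite pmf. A minimal Python sketch (the fair-die pmf is my own example, not from the lecture):

```python
# k-th moment and k-th central moment of a discrete pmf given as {value: probability}.
def moment(pmf, k):
    return sum(i**k * p for i, p in pmf.items())

def central_moment(pmf, k):
    mu = moment(pmf, 1)
    return sum((i - mu)**k * p for i, p in pmf.items())

die = {i: 1/6 for i in range(1, 7)}   # fair six-sided die
print(moment(die, 2))            # E[X^2] = 91/6 ≈ 15.17
print(central_moment(die, 2))    # Var(X) = 35/12 ≈ 2.92
```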
Geometric Higher Moment
Let X \sim \text{Geometric}(p) , what is E[X^2] ?
\begin{align*}
E[X^2] &= E[X^2 | X = 1]p + E[X^2 | X > 1](1-p)\\
E[X^2] &= p + E[(1 + X)^2](1 - p) \tag{memoryless}\\
E[X^2] &= p + E[1 + 2X + X^2](1 - p)\\
E[X^2] &= p + (1 + 2E[X] + E[X^2])(1 - p)\\
E[X^2] &= 1 + 2E[X](1-p) + E[X^2](1-p)\\
E[X^2] &= 1 + 2\frac{1}{p}(1-p) + E[X^2](1-p) \tag{substitute $E[X] = \frac{1}{p}$}\\
pE[X^2] &= 1 + 2\frac{1}{p}(1-p)\\
E[X^2] &= \frac{1}{p} + 2\frac{(1-p)}{p^2}\\
E[X^2] &= \frac{2-p}{p^2}\\
\end{align*}
The same conditioning trick sets up the third moment: E[X^3] = 1^3 \cdot p + E[(1+X)^3] \cdot (1 - p)
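A quick Monte Carlo sanity check of E[X^2] = \frac{2-p}{p^2} (a sketch using numpy; p = 0.3 and the sample size are arbitrary choices):

```python
# Monte Carlo check of E[X^2] = (2 - p)/p^2 for X ~ Geometric(p).
# numpy's geometric counts trials up to and including the first success, matching E[X] = 1/p.
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
x = rng.geometric(p, size=1_000_000).astype(np.float64)
print((x**2).mean())    # empirical second moment
print((2 - p) / p**2)   # exact value ≈ 18.89
```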
Variance
Variance:
Var(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2
Think of X as a 1-d field: for each value x_i , we calculate (x_i - E[X])^2 and ask for its expectation ( E[X] is just the average of the field)
Proof:
\begin{align*}
&E[(X - E[X])^2]\\
=& E[X^2 - 2XE[X] + E[X]^2]\\
=& E[X^2] - E[2XE[X]] + E[E[X]^2] \tag{by linearity of expectation}\\
=& E[X^2] - 2E[X]E[X] + E[X]^2 \tag{$E[X]$ is a constant}\\
=& E[X^2] - E[X]^2\\
\end{align*}
Because Var(X) = E[X^2] - E[X]^2 \geq 0 , the inequality E[X^2] \geq E[X]^2 always holds.
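A quick numeric confirmation that the two forms of the variance agree (the fair-die pmf is again my own example):

```python
# The two forms of variance agree: E[(X - E[X])^2] == E[X^2] - E[X]^2 (fair die).
die = {i: 1/6 for i in range(1, 7)}
mu = sum(i * p for i, p in die.items())
lhs = sum((i - mu)**2 * p for i, p in die.items())
rhs = sum(i**2 * p for i, p in die.items()) - mu**2
print(lhs, rhs)  # both 35/12 ≈ 2.9167
```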
Variance Calculation
Var(X + C) = Var(X)
Var(CX) = C^2Var(X)
Var(-X) = Var(X)
Example: Let X_1 , X_2 be independent and both distributed like X . Observe Var(X_1 + X_2) = 2Var(X) \neq Var(2X) = 4Var(X) (rolling a die twice versus rolling once and doubling the result; see the sketch below)
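A simulation of the die example (a sketch assuming numpy; the sample size and seed are arbitrary):

```python
# Rolling a die twice (X1 + X2) vs rolling once and doubling (2X):
# same mean, but the variances differ by a factor of two.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x1 = rng.integers(1, 7, size=n)
x2 = rng.integers(1, 7, size=n)
x = rng.integers(1, 7, size=n)
print(np.var(x1 + x2))  # ≈ 2 Var(X) = 35/6 ≈ 5.83
print(np.var(2 * x))    # ≈ 4 Var(X) = 35/3 ≈ 11.67
```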
Variance of Bernoulli
Let X \sim \text{Bernoulli}(p) . What is Var(X) ?
\begin{align*}
Var(X) &= E[(X-p)^2]\\
&= E[(X - p)^2 | X = 1]p + E[(X - p)^2 | X = 0](1-p)\\
&= (1 - E[X])^2p + (0 - E[X])^2(1 - p)\\
&= (1 - p)^2p + (0 - p)^2(1 - p)\\
&= p(1-p)\\
\end{align*}
where we know E[X] = p for Bernoulli.
You can't condition on variance the way you condition on expectation: Var(X) \neq Var(X | X = 1)p + Var(X | X = 0)(1 - p) = 0 + 0 = 0
Also, if X follows Y with probability 0.6 and Z with probability 0.4 (a mixture), then it is wrong to say X^2 = (0.6Y + 0.4Z)^2 . Instead, you should condition: E[X^2] = 0.6E[Y^2] + 0.4E[Z^2]
Other potential definitions of variance:
Var(X) = E[X - E[X]] = E[X] - E[X] = 0 : bad
Var(X) = E[|X - E[X]|] : good, but without nice properties
Var(X) = E[e^{X - E[X]}] : number too big, without nice properties
Var(X) = E[(X - E[X])^3] : can be negative or zero even when X is spread out, without nice properties
Var(X) = \sqrt{E[(X - E[X])^2]} : standard deviation \sigma_X (so Var(X) = \sigma_X^2 )
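To see the "without nice properties" point concretely, a small sketch (the two-fair-coins example is my own) showing that mean absolute deviation, unlike variance, is not additive over independent variables:

```python
# Mean absolute deviation E[|X - E[X]|], unlike variance, is not additive
# over independent variables. Exact computation for two independent fair coins:
from itertools import product

def mad(pmf):
    mu = sum(i * p for i, p in pmf.items())
    return sum(abs(i - mu) * p for i, p in pmf.items())

coin = {0: 0.5, 1: 0.5}
two = {}
for (a, pa), (b, pb) in product(coin.items(), coin.items()):
    two[a + b] = two.get(a + b, 0) + pa * pb   # pmf of the independent sum

print(mad(coin))  # 0.5
print(mad(two))   # 0.5, not 0.5 + 0.5 = 1.0
```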
Variance of Binomial
Let X \sim \text{Binomial}(n, p) , i.e. X = X_1 + X_2 + ... + X_n for X_i = \begin{cases}
1 & \text{if success}\\
0 & \text{otherwise}
\end{cases}
\begin{align*}
&Var(X)\\
=&Var(\text{Binomial}(n, p))\\
=&Var(X_1 + X_2 + ... + X_n)\\
=&Var(X_1) + Var(X_2) + ... + Var(X_n) \tag{by independence}\\
=&nVar(X_i) \tag{identically distributed}\\
=&np(1-p) \tag{by variance of Bernoulli}\\
\end{align*}
We now can compute E[X^2] = Var(X) + E[X]^2 = np(1-p) + (np)^2
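A Monte Carlo check of both formulas (a sketch; n = 20 and p = 0.25 are arbitrary choices):

```python
# Empirical check of Var(Binomial(n, p)) = np(1-p) and E[X^2] = np(1-p) + (np)^2.
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 0.25
x = rng.binomial(n, p, size=1_000_000)
print(x.var(), n * p * (1 - p))                # both ≈ 3.75
print((x**2).mean(), n*p*(1 - p) + (n*p)**2)   # both ≈ 28.75
```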
Variance of Geometric
Var(\text{Geometric}(p)) = E[X^2] - E[X]^2 = \frac{2-p}{p^2} - \frac{1}{p^2} = \frac{1 - p}{p^2} , using E[X^2] = \frac{2-p}{p^2} from above.
Variance of Poisson
Let X \sim \text{Poisson}(\lambda) , what is Var(X) ?
\begin{align*}
&Var(X)\\
=& E[(X - E[X])^2]\\
=& E[X^2] - E[X]^2\\
=& (\sum_{x = 0}^\infty x^2 Pr\{X = x\}) - \lambda^2\\
=& (\sum_{x = 0}^\infty x^2 \frac{\lambda^xe^{-\lambda}}{x!}) - \lambda^2\\
=& (\sum_{x = 1}^\infty x \frac{\lambda^xe^{-\lambda}}{(x-1)!}) - \lambda^2\\
=& (\lambda e^{-\lambda}\sum_{x = 1}^\infty x \frac{\lambda^{x-1}}{(x-1)!}) - \lambda^2\\
=& (\lambda e^{-\lambda}\sum_{x = 1}^\infty (1+(x-1)) \frac{\lambda^{x-1}}{(x-1)!}) - \lambda^2\\
=& ((\lambda e^{-\lambda}\sum_{x = 1}^\infty (x-1) \frac{\lambda^{x-1}}{(x-1)!}) + (\lambda e^{-\lambda}\sum_{x = 1}^\infty \frac{\lambda^{x-1}}{(x-1)!})) - \lambda^2\\
=& \lambda e^{-\lambda}(\sum_{x = 1}^\infty (x-1) \frac{\lambda^{x-1}}{(x-1)!}) + \lambda - \lambda^2 \tag{same as expectation of Poisson}\\
=& \lambda^2 e^{-\lambda}(\sum_{x = 2}^\infty \frac{\lambda^{x-2}}{(x-2)!}) + \lambda - \lambda^2\\
=& \lambda^2 e^{-\lambda}e^{\lambda} + \lambda - \lambda^2\\
=& \lambda^2 + \lambda - \lambda^2\\
=& \lambda\\
\end{align*}
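A quick empirical check (a sketch; \lambda = 4 is an arbitrary choice):

```python
# Empirical check that the mean and variance of a Poisson are both lambda.
import numpy as np

rng = np.random.default_rng(3)
lam = 4.0
x = rng.poisson(lam, size=1_000_000)
print(x.mean(), x.var())  # both ≈ 4.0
```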
Measure in Different Scale
Observe that the variance changes when the same quantity is measured on a different scale.
\left[X = \begin{cases}
3 & \text{with probability } \frac{1}{3}\\
2 & \text{with probability } \frac{1}{3}\\
1 & \text{with probability } \frac{1}{3}\\
\end{cases}\right] \implies Var(X) = \frac{2}{3}
\left[Y = \begin{cases}
30 & \text{with probability } \frac{1}{3}\\
20 & \text{with probability } \frac{1}{3}\\
10 & \text{with probability } \frac{1}{3}\\
\end{cases}\right] \implies Var(Y) = \frac{200}{3}
Attention: when doing math involving random variables (especially when dealing with variance), the sum of two independent copies is not the same as doubling: X_1 + X_2 \neq 2X
Squared Coefficient of Variation: a scale-invariant version of variance
C_X^2 = \frac{Var(X)}{E[X]^2}
Observe C_X^2 = C_Y^2 = \frac{1}{6} in the example above (see the sketch below)
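A small sketch verifying the scale invariance on the example above:

```python
# The squared coefficient of variation is unchanged when X is scaled by 10.
def scv(pmf):
    mu = sum(i * p for i, p in pmf.items())
    var = sum((i - mu)**2 * p for i, p in pmf.items())
    return var / mu**2

X = {1: 1/3, 2: 1/3, 3: 1/3}
Y = {10: 1/3, 20: 1/3, 30: 1/3}
print(scv(X), scv(Y))  # both 1/6 ≈ 0.1667
```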
Linearity of Variance
Linearity of Variance: If X \perp Y , Var(X + Y) = Var(X) + Var(Y)
Proof:
\begin{align*}
&Var(X + Y)\\
=& E[(X + Y)^2] - E[X+Y]^2\\
=& E[X^2 + 2XY + Y^2] - (E[X] + E[Y])^2\\
=& E[X^2] + 2E[XY] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2\\
=& E[X^2] + 2E[X]E[Y] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2\tag{by $X \perp Y$}\\
=& (E[X^2] + E[Y^2]) - (E[X]^2 + E[Y]^2) \tag{cancel $2E[X]E[Y]$}\\
=& Var(X) + Var(Y)\\
\end{align*}
Note that the implication does not go backward: Var(X + Y) = Var(X) + Var(Y) does not imply that X and Y are independent.
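One concrete counterexample (my own, not from the lecture): X uniform on \{-1, 0, 1\} and Y = X^2 are dependent, yet their variances still add.

```python
# Dependent variables whose variances still add: X uniform on {-1, 0, 1}, Y = X^2.
def var(pmf):
    mu = sum(i * p for i, p in pmf.items())
    return sum((i - mu)**2 * p for i, p in pmf.items())

X = {-1: 1/3, 0: 1/3, 1: 1/3}
Y = {0: 1/3, 1: 2/3}                  # distribution of X^2
S = {}
for x, p in X.items():
    s = x + x**2                      # Y = X^2 is a deterministic function of X
    S[s] = S.get(s, 0) + p

print(var(S))           # 8/9 ≈ 0.889
print(var(X) + var(Y))  # 2/3 + 2/9 = 8/9: equal, despite dependence
```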
Table of Distributions
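Distribution | E[X] | Var(X)
\text{Bernoulli}(p) | p | p(1-p)
\text{Binomial}(n, p) | np | np(1-p)
\text{Geometric}(p) | \frac{1}{p} | \frac{1-p}{p^2}
\text{Poisson}(\lambda) | \lambda | \lambda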
Skew
Skew: Skew(X) = E[(X - E[X])^3]
Positive Skew: long right tail
Negative Skew: long left tail
Linearity of Skew:
X \perp Y \implies Skew(X + Y) = Skew(X) + Skew(Y)
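An exact check of skew additivity for two independent Bernoulli variables (a sketch; the parameters 0.2 and 0.7 are arbitrary):

```python
# Exact check: Skew(X + Y) = Skew(X) + Skew(Y) for independent Bernoullis.
from itertools import product

def skew(pmf):
    mu = sum(i * p for i, p in pmf.items())
    return sum((i - mu)**3 * p for i, p in pmf.items())

def bern(p):
    return {0: 1 - p, 1: p}

X, Y = bern(0.2), bern(0.7)
S = {}
for (a, pa), (b, pb) in product(X.items(), Y.items()):
    S[a + b] = S.get(a + b, 0) + pa * pb   # pmf of the independent sum

print(skew(S))            # ≈ 0.012
print(skew(X) + skew(Y))  # 0.096 + (-0.084) = 0.012
```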