Lemma: the conditional random variable [X | X > 1] has the same distribution as 1 + X, i.e., [X | X > 1] =^d [1 + X]. Proof below.
\begin{align*}
E[X] &= E[X|\text{first flip is head}] \cdot p + E[X|\text{first flip is tail}] \cdot (1-p)\\
E[X] &= 1 \cdot p + E[1+X] \cdot (1-p) \tag{by Lemma}\\
E[X] &= p + (1 + E[X])(1-p)\\
E[X] &= p + (1-p) + E[X](1-p)\\
E[X] &= 1 + E[X](1-p)\\
E[X] \cdot p &= 1 \tag{subtract $E[X](1-p)$ from both sides}\\
E[X] &= \frac{1}{p} \tag{$p > 0$}\\
\end{align*}
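As a sanity check on E[X] = \frac{1}{p} (a minimal simulation sketch; geometric_sample and the parameter choices are mine, not from the notes):

```python
import random

def geometric_sample(p: float) -> int:
    """Flip a p-biased coin until the first head; return the number of flips."""
    flips = 1
    while random.random() >= p:  # this flip was a tail (probability 1 - p)
        flips += 1
    return flips

p = 0.25
trials = 100_000
estimate = sum(geometric_sample(p) for _ in range(trials)) / trials
print(f"Monte Carlo E[X] ~= {estimate:.3f}; theory 1/p = {1 / p:.3f}")
```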
Corollary: memorylessness of the Geometric distribution
\begin{align*}
[X | X > s] \overset{d}{=}& [s + X]\\
Pr\{X = t | X > s\} =& Pr\{s + X = t\}\\
Pr\{X = t | X > s\} =& Pr\{X = t - s\}\\
Pr\{X = t + s | X > s\} =& Pr\{X = t - s + s\} \tag{substitute $t \to t + s$}\\
Pr\{X = t + s | X > s\} =& Pr\{X = t\}\\
\end{align*}
Proof: Let X \sim \text{Geometric}(p) and Y = [X | \text{1st flip is a tail}] = [X | X > 1].
Now, Y is a different random variable with its own distribution. The range of Y (2, 3, 4, ...) is not the same as the range of X (1, 2, 3, 4, ...). Think of Y as X truncated to the values i = 2, 3, 4, ....
We claim Y =^d 1 + X by showing (\forall i = 2, 3, 4, ...) Pr\{Y = i\} = Pr\{1 + X = i\}. We only check i = 2, 3, 4, ... because both Y and 1 + X take values in exactly that range.
Left hand side:
\begin{align*}
&Pr\{X = i | X > 1\} \tag{where $i$ can only be $2, 3, 4, 5, ...$}\\
=& \frac{Pr\{X = i \cap X > 1\}}{Pr\{X > 1\}}\\
=& \frac{Pr\{X = i\}}{Pr\{X > 1\}}\\
=& \frac{(1 - p)^{i - 1}p}{1 - p}\\
=& (1 - p)^{i - 2}p\\
\end{align*}
Right hand side:
\begin{align*}
& Pr\{1 + X = i\} \tag{where $i$ can only be $2, 3, 4, 5, ...$}\\
=& Pr\{X = i - 1\}\\
=& (1 - p)^{i - 2}p\\
\end{align*}
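A quick empirical check of memorylessness (again a sketch; the helper geometric_sample and the values of p, s, t are arbitrary choices of mine):

```python
import random

def geometric_sample(p: float) -> int:
    """Number of p-biased coin flips up to and including the first head."""
    flips = 1
    while random.random() >= p:
        flips += 1
    return flips

p, s, t = 0.3, 2, 3
samples = [geometric_sample(p) for _ in range(200_000)]

past_s = [x for x in samples if x > s]          # condition on X > s
lhs = sum(1 for x in past_s if x == t + s) / len(past_s)
rhs = sum(1 for x in samples if x == t) / len(samples)
print(f"Pr{{X = t + s | X > s}} ~= {lhs:.4f}; Pr{{X = t}} ~= {rhs:.4f}")
```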
Corollary: For X \sim \text{Geometric}(p), E[X^2 | X > 1] = E[Y^2] = E[(1 + X)^2]
Corollary: If X \perp Y, then E[g(X)f(Y)] = E[g(X)] \cdot E[f(Y)]. However, the converse does not hold: E[XY] = E[X] \cdot E[Y] does not imply independence.
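The standard counterexample for the converse (a small sketch; the distribution is the usual textbook choice, not from these notes): take X uniform on \{-1, 0, 1\} and Y = X^2, so E[XY] = E[X]E[Y] holds while X and Y are dependent.

```python
# X uniform on {-1, 0, 1}, Y = X^2: the expectation factors, yet X and Y
# are clearly dependent, so E[XY] = E[X]E[Y] does not imply independence.
xs = [-1, 0, 1]
E_X = sum(xs) / 3                      # 0
E_Y = sum(x * x for x in xs) / 3       # 2/3
E_XY = sum(x * x * x for x in xs) / 3  # 0
print(E_XY == E_X * E_Y)               # True: both sides are 0
# Dependence: Pr{Y = 0 | X = 0} = 1, while Pr{Y = 0} = 1/3.
```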
Linearity of Expectation
Linearity of Expectation: E[X + Y] = E[X] + E[Y] for any random variables X and Y; independence is not required (note that the proof below never uses it).
Proof:
\begin{align*}
E[X + Y] &= \sum_x \sum_y (x + y) P_{X, Y}(x, y)\\
&= \sum_x \sum_y x P_{X, Y}(x, y) + \sum_x \sum_y y P_{X, Y}(x, y)\\
&= \sum_x x \sum_y P_{X, Y}(x, y) + \sum_y y \sum_x P_{X, Y}(x, y)\\
&= \sum_x x P_X(x) + \sum_y y P_Y(y)\\
&= E[X] + E[Y]\\
\end{align*}
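To see concretely that the proof never needs independence (a minimal simulation sketch; the dependent pair Y = 2X is my own choice):

```python
import random

# Y = 2X is maximally dependent on X, yet E[X + Y] = E[X] + E[Y] still holds.
n = 100_000
xs = [random.random() for _ in range(n)]   # X ~ Uniform(0, 1), E[X] = 1/2
ys = [2 * x for x in xs]                   # Y = 2X, E[Y] = 1

def mean(v):
    return sum(v) / len(v)

print(mean([x + y for x, y in zip(xs, ys)]))  # ~1.5
print(mean(xs) + mean(ys))                    # ~1.5
```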
Practice on Linearity of Expectation
Say in Arknights you have n different characters and you want to collect them all. Each roll yields each particular character with uniform probability \frac{1}{n}, independently of past rolls. What is the expected number of rolls to collect all n characters?
Let X = the number of rolls to collect all n characters.
Let X_i = the number of rolls to get the i-th new character, once i - 1 distinct characters have been collected.
Then X = X_1 + X_2 + ... + X_n where X_i \sim \text{Geometric}(\frac{n - i + 1}{n}), since each roll hits one of the n - i + 1 missing characters with probability \frac{n - i + 1}{n}. By linearity of expectation,
\begin{align*}
E[X] &= \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{k=1}^{n} \frac{1}{k} = n H_n \approx n \ln n
\end{align*}
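A quick Monte Carlo check of this formula (a minimal sketch; rolls_to_collect_all and the parameters n, trials are my own choices):

```python
import random

def rolls_to_collect_all(n: int) -> int:
    """Roll a uniform n-sided gacha until all n characters have appeared."""
    seen, rolls = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        rolls += 1
    return rolls

n, trials = 20, 10_000
simulated = sum(rolls_to_collect_all(n) for _ in range(trials)) / trials
n_harmonic = n * sum(1 / k for k in range(1, n + 1))  # n * H_n
print(f"simulated ~= {simulated:.1f}; n * H_n = {n_harmonic:.1f}")
```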
Simpson's Paradox
We have two treatments for kidney stones; their effectiveness results are below (the classic data from Charig et al., 1986):
Small stones: Treatment A 81/87 healed (93%); Treatment B 234/270 healed (87%)
Large stones: Treatment A 192/263 healed (73%); Treatment B 55/80 healed (69%)
Overall: Treatment A 273/350 healed (78%); Treatment B 289/350 healed (83%)
Facts:
Treatment A is better in general: it is better for both cases, so even if the doctor does not know whether the patient has small or large stones, a given patient is still more likely to be healed by Treatment A.
But in our sample, more patients with large stones went to Treatment A, which drags down Treatment A's aggregate success rate (large stones are harder to treat),
and more patients with small stones went to Treatment B, which pushes up Treatment B's aggregate success rate (small stones are easier to treat).
However, if a patient ended up taking Treatment A, that patient is more likely to have large stones and therefore less success; if a patient ended up taking Treatment B, that patient is more likely to have small stones and therefore more success.
Treatment CAUSES the patient to heal, but the aggregate statistics do not indicate causation: stone size confounds the comparison.
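Recomputing the rates from the table above makes the reversal explicit (a small sketch; the data-dictionary layout is mine):

```python
# Success / total counts from the classic kidney-stone table above.
data = {
    ("A", "small"): (81, 87),   ("A", "large"): (192, 263),
    ("B", "small"): (234, 270), ("B", "large"): (55, 80),
}

for size in ("small", "large"):
    a_s, a_n = data[("A", size)]
    b_s, b_n = data[("B", size)]
    print(f"{size} stones: A {a_s / a_n:.0%} vs. B {b_s / b_n:.0%}")  # A wins both

# Pooling reverses the comparison: B looks better in aggregate.
for t in ("A", "B"):
    s = sum(data[(t, size)][0] for size in ("small", "large"))
    n = sum(data[(t, size)][1] for size in ("small", "large"))
    print(f"overall {t}: {s}/{n} = {s / n:.0%}")
```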
Tricks
There are generally some methods for solving problems:
Conditioning
Linearity of Expectation (variance, transforms)
Summing Expectation
Summing the Tail (see the identity after this list)
Bayes' Rule
Z-Transform and Laplace Transform
Memorylessness
Integrate the p.d.f. to get the c.d.f.; differentiate the c.d.f. to get the p.d.f.
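For the tail-summing trick, the key identity (standard for any nonnegative integer-valued X) is
\begin{align*}
E[X] &= \sum_{t=1}^{\infty} Pr\{X \geq t\} = \sum_{t=0}^{\infty} Pr\{X > t\}
\end{align*}
For example, for X \sim \text{Geometric}(p) we have Pr\{X > t\} = (1 - p)^t, so E[X] = \sum_{t=0}^{\infty} (1 - p)^t = \frac{1}{p}, matching the result above.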