Continuous Random Variable: uncountable set of possible values.
probability density function: a non-negative function f_X such that Pr\{a \leq X \leq b\} = \int_a^b f_X(x) dx
Note: because X is continuous, f_X(x) dx \simeq Pr\{x \leq X \leq x + dx\} and Pr\{a \leq X \leq b\} = Pr\{a < X < b\} both hold (any single point has probability zero)
cumulative distribution function: F_X(x) = Pr\{X \leq x\} = \int_{-\infty}^x f_X(t) dt
Note: the cdf is closely tied to the pdf: f_X(x) = \frac{d}{dx}\int_{-\infty}^x f_X(t) dt = \frac{d}{dx} F_X(x) by the Fundamental Theorem of Calculus
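A quick numeric sanity check of this relationship (a minimal sketch, using Exp(2) as an arbitrary example distribution; numpy is assumed):

```python
import numpy as np

lam = 2.0                      # rate of the example Exp(2) distribution
x = np.linspace(0.01, 5, 500)

F = 1 - np.exp(-lam * x)       # cdf of Exp(lam)
f = lam * np.exp(-lam * x)     # pdf of Exp(lam)

dF = np.gradient(F, x)         # numerical d/dx of the cdf
print(np.max(np.abs(dF - f)))  # small: finite-difference error only
```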
tail: \bar{F}_X(x) = Pr\{X > x\} = 1 - F_X(x)
For X \sim \text{U}(a, b): f_X(x) = \frac{1}{b - a} for x \in [a, b] (0 otherwise), F_X(x) = \frac{x - a}{b - a} for x \in [a, b], and E[X] = \frac{a + b}{2}
Canonical Random Variable: since \text{Uniform}[0, 1) is used so often, we denote it \xi \sim \text{U}(0, 1)
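One reason \xi is canonical: other continuous distributions can be generated from it by inverting their cdfs (the inverse-transform trick). A minimal sketch; the helper names uniform_ab and exponential are mine, not from the notes:

```python
import random
import math

def uniform_ab(a: float, b: float) -> float:
    """Sample U(a, b) by rescaling the canonical xi ~ U(0, 1)."""
    xi = random.random()
    return a + (b - a) * xi

def exponential(lam: float) -> float:
    """Sample Exp(lam) by inverting its cdf F(x) = 1 - e^{-lam x}."""
    xi = random.random()
    return -math.log(1 - xi) / lam

samples = [exponential(2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))  # near E[X] = 1/lambda = 0.5
```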
Exponential: the probability density function drops off exponentially. X \sim \text{Exp}(\lambda) where \lambda > 0 is the rate; \lambda is the reciprocal of the expectation, E[X] = \frac{1}{\lambda}. The density is f_X(x) = \lambda e^{-\lambda x} for x \geq 0, with tail \bar{F}_X(x) = e^{-\lambda x}.
Both f_X(x), \bar{F}_X(x) drop off by a constant factor e^{-\lambda} with each unit increase of x.
The minimum of two independent exponentials is again exponential: \min(\text{Exp}(a), \text{Exp}(b)) \sim \text{Exp}(a + b) (proved via the c.d.f.: Pr\{\min > x\} = e^{-ax} \cdot e^{-bx} = e^{-(a+b)x}). The maximum is the time until the first of the two occurs plus the remaining wait until the other occurs; by memorylessness that remaining wait starts over, so E[\max] = \frac{1}{a+b} + \frac{a}{a+b} \cdot \frac{1}{b} + \frac{b}{a+b} \cdot \frac{1}{a}.
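A Monte Carlo sketch of both claims (the rates a = 1, b = 2 are arbitrary illustrative choices):

```python
import random

a, b, n = 1.0, 2.0, 200_000
mins, maxs = [], []
for _ in range(n):
    x = random.expovariate(a)   # Exp(a)
    y = random.expovariate(b)   # Exp(b)
    mins.append(min(x, y))
    maxs.append(max(x, y))

print(sum(mins) / n)  # near 1/(a+b) = 1/3, i.e. min ~ Exp(a+b)
expected_max = 1/(a+b) + (a/(a+b)) * (1/b) + (b/(a+b)) * (1/a)
print(sum(maxs) / n, expected_max)  # both near 7/6
```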
Bad Example: A person travels from point A to point B, a distance of d. The speed S \sim \text{U}(\alpha, \beta) is chosen at the start. Let T = \frac{d}{S} denote the travel time. What is E[T]?
IDEA 1: E[T] = E[\frac{d}{S}] = \frac{d}{E[S]}. WRONG: although S is uniform, T = \frac{d}{S} is not, because f(s) = \frac{1}{s} is nonlinear: plugging E[S] into \frac{d}{s} does not split T into equal-probability halves. (In fact, E[\frac{d}{S}] = E[d] \cdot E[\frac{1}{S}] = d \cdot E[\frac{1}{S}] since constants are trivially independent, and E[\frac{1}{S}] \neq \frac{1}{E[S]}.)
IDEA 2: E[T] = \text{avg}(\frac{d}{\alpha}, \frac{d}{\beta}). WRONG: for the same reason; T is not uniform on [\frac{d}{\beta}, \frac{d}{\alpha}], so averaging the endpoints of its range does not give its mean.
IDEA 3: E[T] = \int_\alpha^\beta \frac{d}{s} \cdot f_S(s) ds = \frac{d}{\beta - \alpha} \ln \frac{\beta}{\alpha}. CORRECT: by definition (see the sketch below).
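A Monte Carlo sketch comparing the three ideas (d = 100, \alpha = 10, \beta = 30 are arbitrary illustrative values):

```python
import random
import math

d, alpha, beta = 100.0, 10.0, 30.0
n = 200_000

times = [d / random.uniform(alpha, beta) for _ in range(n)]

print(sum(times) / n)                               # true E[T] by simulation, ~5.49
print(d / ((alpha + beta) / 2))                     # IDEA 1: d / E[S] = 5.0 (too small)
print((d / alpha + d / beta) / 2)                   # IDEA 2: endpoint average ~6.67 (too big)
print(d / (beta - alpha) * math.log(beta / alpha))  # IDEA 3: exact, ~5.49
```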
There are differences from discrete R.V.s:
Law of Total Probability for Continuous R.V.s: for event A, Pr\{A\} = \int_{-\infty}^\infty Pr\{A | X = x\} \cdot f_X(x) dx
f_X(x \cap A) represents the density for X = x and A happening together.
we see Pr\{A | X = x\} = \frac{Pr\{A \cap \{X = x\}\}}{Pr\{X = x\}} = \frac{f_X(x \cap A)}{f_X(x)} makes sense: both the numerator and the denominator are zero as probabilities, but their ratio is well-defined as a ratio of densities.
Bayes Law: for continuous random variable X and event A, the conditional density function of X given A is: f_{X | A}(x) = \frac{Pr\{A | X = x\} \cdot f_X(x)}{Pr\{A\}}
Observe \int_x f_{X | A}(x) dx = 1
Conditional Expectation: E[X | A] = \int_x x \cdot f_{X | A}(x) dx
Example: what is the expected bias P \sim \text{U}(0, 1) (probability of heads) of an unknown biased coin, given an experiment in which we got 10 heads out of 10 flips?
WRONG: our intuition gives E[P | A] = 1. Although p = 1 is the most likely bias (f_{P | A}(x) is maximized at x = 1), we are asked for the expectation, not the maximizer of the density.
CORRECT: Let A denote the event. Then E[P | A] = \int_0^1 p \cdot f_{P | A}(p) dp = \frac{11}{12} (derivation below).
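Filling in the computation behind \frac{11}{12}: by the Bayes Law above, with Pr\{A | P = p\} = p^{10} and f_P(p) = 1,

f_{P | A}(p) = \frac{p^{10} \cdot 1}{\int_0^1 q^{10} dq} = 11 p^{10}, \quad E[P | A] = \int_0^1 p \cdot 11 p^{10} dp = \frac{11}{12}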
Joint Distribution: for continuous random variables X, Y, the joint density f_{X, Y}(x, y) satisfies Pr\{a \leq X \leq b, c \leq Y \leq d\} = \int_a^b \int_c^d f_{X, Y}(x, y) dy dx
where \int_{-\infty}^\infty \int_{-\infty}^\infty f_{X, Y}(x, y) dx dy = 1
Marginal densities: f_X(x) = \int_{-\infty}^\infty f_{X, Y}(x, y) dy and f_Y(y) = \int_{-\infty}^\infty f_{X, Y}(x, y) dx
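An illustrative example (mine, not from the notes): take f_{X, Y}(x, y) = x + y on [0, 1]^2. Then f_X(x) = \int_0^1 (x + y) dy = x + \frac{1}{2}, and indeed \int_0^1 (x + \frac{1}{2}) dx = 1 as a marginal density must.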
Note that f_X(x), f_Y(y) are densities, not probabilities: f_X(x) dx approximates a probability, so a density value only becomes a probability after integrating over an interval. Be careful! For discrete variables, the heights in the p.m.f. graph are probabilities and sum to one. For continuous variables, the height of the p.d.f. graph is a density: the area under it integrates to one, but the heights themselves are not probabilities and can exceed 1.
Independence: X \perp Y \iff f_{X, Y}(x, y) = f_X(x) \cdot f_Y(y)
Functional Expectation: E[g(X, Y)] = \int_{-\infty}^\infty \int_{-\infty}^\infty g(x, y) \cdot f_{X, Y}(x, y) dx dy
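For example (an illustration, not from the notes): if X \perp Y with X, Y \sim \text{U}(0, 1), then E[XY] = \int_0^1 \int_0^1 xy \cdot 1 dx dy = \frac{1}{4} = E[X] \cdot E[Y], consistent with the independence factorization above.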
Bayes Law for Multiple Random Variables: f_{X | Y = y}(x) = \frac{f_{X, Y}(x, y)}{f_Y(y)} = \frac{f_{Y | X = x}(y) \cdot f_X(x)}{f_Y(y)}
Corollary:
E[X | Y = y] = \int_x x \cdot f_{X | Y = y}(x) dx
E[X] = \int_y E[X | Y = y] \cdot f_Y(y) dy
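An illustrative check of this corollary (my example, not from the notes): let Y \sim \text{U}(0, 1) and X | Y = y \sim \text{U}(0, y), so E[X | Y = y] = \frac{y}{2}. Then E[X] = \int_0^1 \frac{y}{2} \cdot 1 dy = \frac{1}{4}.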