Lecture 018

Existence of Limiting Distribution

Situations when the limiting distribution does not exist

\mathbb{P} = \begin{bmatrix} 1 & 0\\ 0 & 1\\ \end{bmatrix}

A chain of two states, each looping to itself with probability 1. If we try to solve \vec{\pi}\mathbb{P} = \vec{\pi}, there are infinitely many solutions for \vec{\pi} because \mathbb{P} is the identity. Therefore there are infinitely many stationary distributions. The limiting distribution does not exist because the rows of \lim_{n\to\infty}\mathbb{P}^n are not all the same (the limit depends on the starting state).

The period is 1. It is aperiodic.

\mathbb{P} = \begin{bmatrix} 0 & 1\\ 1 & 0\\ \end{bmatrix}

A chain of two states that alternate with probability 1. The stationary distribution is (\frac{1}{2}, \frac{1}{2}), found by solving:

\begin{cases} \pi_0 = \pi_1 \cdot 1\\ \pi_1 = \pi_0 \cdot 1\\ \pi_0 + \pi_1 = 1\\ \end{cases}

but the limiting distribution does not exist (\pi_{j}=\lim_{n\to\infty} (\mathbb{P}^n)_{jj} does not exist since the entry alternates between 0 and 1, although \pi_{j}=\lim_{n\to\infty} (\mathbb{P}^{2n})_{jj} does exist). The stationary distribution still represents the fraction of time spent in each state.

The period is 2. It is periodic.

\mathbb{P} = \begin{bmatrix} 0 & 1\\ 0 & 1\\ \end{bmatrix}

State 0 has an undefined period (it can never return to itself) and the chain is not irreducible, but the limiting distribution exists: every row of \mathbb{P}^n is (0, 1) for n \geq 1.
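
A quick numerical check of the three examples above (a sketch using numpy; the cutoff n = 50 is arbitrary): the limiting matrix exists exactly when \mathbb{P}^n stops changing for large n and all of its rows coincide.

  import numpy as np

  examples = {
      "two self-loops (identity)":     np.array([[1.0, 0.0], [0.0, 1.0]]),
      "two states alternating":        np.array([[0.0, 1.0], [1.0, 0.0]]),
      "one state feeding into a sink": np.array([[0.0, 1.0], [0.0, 1.0]]),
  }

  n = 50  # "large" n; plenty for these tiny chains
  for name, P in examples.items():
      Pn = np.linalg.matrix_power(P, n)
      Pn1 = np.linalg.matrix_power(P, n + 1)
      converged = np.allclose(Pn, Pn1)    # P^n stops changing
      same_rows = np.allclose(Pn, Pn[0])  # all rows identical
      print(f"{name}: limiting matrix exists = {converged and same_rows}")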

Period of Markov Chain

Period of State: the period of state j is \gcd(\{n \in \mathbb{Z}^+ | (\mathbb{P}^n)_{jj} > 0\}).

Note that some states have an undefined period. An example is a state that transitions directly into a sink and can never return to itself, so the set above is empty and the gcd is undefined.

The set represents all number of steps such that state j can loop back to state j (including paths that pass through j multiple times).

Chicken McNugget Theorem (Euclidean Number Property): if \gcd(\{i_1, i_2, ..., i_k\}) = 1, then (\exists n_0)(\forall n \in \mathbb{Z}^+)(n \geq n_0 \implies n = a_1i_1 + a_2i_2 + ... + a_ki_k) where the coefficients a_i \in \mathbb{N}. For example, with i_1 = 3 and i_2 = 5, every n \geq 8 can be written this way.
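
A small brute-force sketch of the theorem (the step set {3, 5} with gcd 1 is just an illustrative choice): it lists the non-representable numbers, past which every n is a non-negative integer combination.

  from functools import reduce
  from math import gcd

  def representable(n, steps):
      # can n be written as a non-negative integer combination of the steps?
      if n == 0:
          return True
      return any(n >= s and representable(n - s, steps) for s in steps)

  steps = [3, 5]
  assert reduce(gcd, steps) == 1
  bad = [n for n in range(1, 200) if not representable(n, steps)]
  print(bad)  # [1, 2, 4, 7], so n_0 = 8 works for the step set {3, 5}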

Aperiodic: a state is aperiodic if its period is 1. A chain is aperiodic if all states are aperiodic.

Aperiodicity just means that there is a j-to-j path of length k for every sufficiently large k \geq n_0 (apply the Chicken McNugget theorem to the return lengths). If instead there were arbitrarily large k with no j-to-j path of length k, then (\mathbb{P}^k)_{jj} = 0 infinitely often, and it would not make sense to talk about a positive limiting probability of going from j to j.

Theorem: for an irreducible finite-state DTMC, if a state has period d, then all states have period d. In fact, all states can be divided into d residue classes where some states are visited at time 0 \mod d, some 1 \mod d, ..., some d - 1 \mod d.
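A sketch for computing periods (it assumes the transition matrix is given as a numpy array, and the cutoff max_n = 50 is a heuristic rather than part of the definition): the period of state j is estimated as the gcd of the observed return lengths.

  import numpy as np
  from math import gcd

  def periods(P, max_n=50):
      """gcd of {n <= max_n : (P^n)_jj > 0} for each state j (None if the set is empty)."""
      m = P.shape[0]
      d = [0] * m
      Pn = np.eye(m)
      for n in range(1, max_n + 1):
          Pn = Pn @ P
          for j in range(m):
              if Pn[j, j] > 0:
                  d[j] = gcd(d[j], n)  # gcd(0, n) == n, so the first return initializes d[j]
      return [x if x > 0 else None for x in d]

  print(periods(np.array([[0.0, 1.0], [1.0, 0.0]])))  # alternating chain: [2, 2]
  print(periods(np.array([[0.0, 1.0], [0.0, 1.0]])))  # sink chain: [None, 1]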

Accessible: state j is accessible from state i if there is a path from i to j: (\exists n \in \mathbb{Z}^+)((\mathbb{P}^n)_{ij} > 0)

Communicate: states i and j communicate if j is accessible from i and i is accessible from j.

Irreducible: a Markov chain is irreducible if all pairs of states communicate.

The existence of a limiting distribution does not require irreducibility (see the sink example above), but irreducibility (together with aperiodicity) is enough to guarantee existence. Note that irreducibility is stronger than being connected!

Condition when Limiting Distribution Exists

Ergodic: a finite-state DTMC is ergodic if it is aperiodic and irreducible. For an infinite-state DTMC, ergodicity requires additional properties (positive recurrence).

Theorem: given an aperiodic and irreducible finite-state \mathbb{P}, then

  1. \lim_{n \to \infty} \mathbb{P}^n = L
  2. all rows are the same, sum to 1, equal to \vec{\pi}
  3. all entries are positive

Note that the theorem does not go backwards: the limit can exist (e.g. the sink example above) without the chain being aperiodic and irreducible.
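
A numerical illustration of the forward direction (the 3-state matrix below is made up for illustration): \mathbb{P}^n approaches a matrix whose identical, positive rows equal the stationary distribution \vec{\pi}.

  import numpy as np

  # an aperiodic, irreducible 3-state chain (numbers made up for illustration)
  P = np.array([[0.5, 0.3, 0.2],
                [0.1, 0.6, 0.3],
                [0.4, 0.4, 0.2]])

  # stationary distribution: left eigenvector of P for eigenvalue 1, normalized to sum to 1
  eigvals, eigvecs = np.linalg.eig(P.T)
  pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
  pi = pi / pi.sum()

  L = np.linalg.matrix_power(P, 100)  # "limit" in practice
  print(pi)
  print(L)                            # every row is (approximately) pi
  assert np.allclose(L, pi) and (L > 0).all()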

Proving the Theorem

All Rows are the Same

We extract the j-th column of \mathbb{P}^n and show that, as n \to \infty, it becomes a column whose entries are all the same constant.

\forall j,\ \mathbb{P}^n \vec{e} := \mathbb{P}^n \begin{bmatrix} 0\\ ...\\ 1_j\\ ...\\ 0\\ \end{bmatrix} \to \begin{bmatrix} c\\ ...\\ c\\ \end{bmatrix} \text{ as } n \to \infty

Intuition: We know \mathbb{P}^n \vec{e} = \mathbb{P}(\mathbb{P}(\mathbb{P}(...\mathbb{P}(\vec{e})))). Each multiplication by \mathbb{P} replaces every entry with a weighted average of the current entries (each row of \mathbb{P} sums to 1), so the entries of the vector become more and more similar to each other.

Let M_n be the maximum element of \mathbb{P}^n\vec{e}. Let m_n be the minimum element of \mathbb{P}^n\vec{e}. Let s be the smallest element of the original matrix \mathbb{P}. We want to show:

M_n - m_n \leq (1 - 2s)(M_{n - 1} - m_{n - 1})

The inequality above tells us that the gap between M_n and m_n shrinks geometrically as n gets big (as long as s > 0).

Upper Bound for Biggest Element

We prove an upper bound for M_n

To bound the largest element of \mathbb{P}^n\vec{e} = \mathbb{P}(\mathbb{P}^{n - 1}\vec{e}): each element is a weighted average of the entries of \mathbb{P}^{n - 1}\vec{e}, and the weight placed on the smallest entry m_{n - 1} is at least s (every entry of \mathbb{P} is at least s). In the worst case, weight s lands on m_{n - 1} and the remaining weight 1 - s lands on M_{n - 1}, so M_n \leq (1 - s)M_{n - 1} + sm_{n - 1}.

Lower Bound for Smallest Element

We prove a lower bound for m_n

Similarly, m_n \geq (1 - s)m_{n - 1} + sM_{n - 1}. Subtracting the two bounds gives M_n - m_n \leq (1 - 2s)(M_{n - 1} - m_{n - 1}).
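
A sanity check of the contraction argument (a sketch; the all-positive 3-state matrix and the choice j = 0 are arbitrary): the gap M_n - m_n shrinks by at least a factor of (1 - 2s) per step.

  import numpy as np

  P = np.array([[0.5, 0.3, 0.2],   # all entries positive, so s > 0
                [0.1, 0.6, 0.3],
                [0.4, 0.4, 0.2]])
  s = P.min()
  v = np.array([1.0, 0.0, 0.0])    # the standard basis vector for j = 0

  prev_gap = v.max() - v.min()
  for n in range(1, 6):
      v = P @ v                    # one more multiplication by P
      gap = v.max() - v.min()
      print(f"n={n}: M_n - m_n = {gap:.6f}, bound = {(1 - 2 * s) * prev_gap:.6f}")
      assert gap <= (1 - 2 * s) * prev_gap + 1e-12
      prev_gap = gap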

We have now proved the inequality, but s might be 0 (\mathbb{P} may contain zero entries), in which case the bound says nothing. We provide a fix by adding another claim.

We want to show:

\mathbb{P} \text{ aperiodic and irreducible} \implies (\exists n_0)(\forall n \geq n_0)(\mathbb{P}^n \text{ has no zero elements})

If this holds, write n = n_0\lfloor \frac{n}{n_0} \rfloor + r with 0 \leq r < n_0, so that \mathbb{P}^n = \mathbb{P}^r \cdot (\mathbb{P}^{n_0})^{\lfloor n/n_0 \rfloor}. The matrix \mathbb{P}^{n_0} has no zero elements, so the shrinking argument above applies to it (its smallest element s is strictly positive) and (\mathbb{P}^{n_0})^k \to L. Multiplying L (whose rows are all \vec{\pi}) on the left by the stochastic matrix \mathbb{P}^r leaves it unchanged, so \mathbb{P}^n \to L as well.

All Entries are Positive

Theorem: for \mathbb{P} irreducible and aperiodic, (\exists n_0)(\forall n \geq n_0)(\mathbb{P}^n \text{ has all positive elements})

// TODO: proof

All Rows Sum to One

// TODO: proof

Mean Time Between Visits to a State

Between Visits: let m_{ij} denote the expected number of time steps needed to first reach state j starting from state i.

Theorem: given an irreducible finite-state DTMC, (\forall i, j)(E[T_{ij}] = m_{ij} \text{ is finite}). Therefore, all states of an irreducible finite-state DTMC are positive recurrent.

This can be argued by modeling the walk from i to j as repeated attempts to follow some fixed simple (loop-free) path from i to j: each attempt either follows the whole path (with probability bounded away from 0) or deviates at some state k, from which a new attempt toward j starts. The number of attempts is bounded by a Geometric random variable, and each attempt lasts at most (number of states) steps, so E[T_{ij}] is finite.

Theorem: For an irreducible finite-state DTMC (periodic or aperiodic; irreducibility guarantees a unique stationary distribution)

\pi_j^{\text{stationary or limiting}} = \frac{1}{m_{jj}} = \frac{1}{E[T_{jj}]} > 0

Observe that:

\begin{align*} m_{ij} =& 1 + \sum_{k \neq j} \mathbb{P}_{ik}m_{kj}\\ m_{jj} =& 1 + \sum_{k \neq j} \mathbb{P}_{jk}m_{kj}\\ \end{align*}

E[T_{i\to j}] = 1 + \sum_{k} Pr\{i \to k\} E[T_{k \to j}], where E[T_{k \to j}] is taken to be 0 when k = j (hence the sum over k \neq j above). We add one to represent the first transition out of i, and then condition on which out-neighbor k the chain actually goes to.

Note the similarity between conditioned transition probability // TODO

\begin{align*} \begin{bmatrix} m_{0, 0} & m_{0, 1} & m_{0, 2}\\ m_{1, 0} & m_{1, 1} & m_{1, 2}\\ m_{2, 0} & m_{2, 1} & m_{2, 2}\\ \end{bmatrix} =& \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\\ \end{bmatrix} + \begin{bmatrix} p_{00} & p_{01} & p_{02}\\ p_{10} & p_{11} & p_{12}\\ p_{20} & p_{21} & p_{22}\\ \end{bmatrix}\cdot \begin{bmatrix} 0 & m_{0, 1} & m_{0, 2}\\ m_{1, 0} & 0 & m_{1, 2}\\ m_{2, 0} & m_{2, 1} & 0\\ \end{bmatrix}\\ \begin{bmatrix} m_{0, 0} & 0 & 0\\ 0 & m_{1, 1} & 0\\ 0 & 0 & m_{2, 2}\\ \end{bmatrix} + \begin{bmatrix} 0 & m_{0, 1} & m_{0, 2}\\ m_{1, 0} & 0 & m_{1, 2}\\ m_{2, 0} & m_{2, 1} & 0\\ \end{bmatrix} =& \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\\ \end{bmatrix} + \begin{bmatrix} p_{00} & p_{01} & p_{02}\\ p_{10} & p_{11} & p_{12}\\ p_{20} & p_{21} & p_{22}\\ \end{bmatrix}\cdot \begin{bmatrix} 0 & m_{0, 1} & m_{0, 2}\\ m_{1, 0} & 0 & m_{1, 2}\\ m_{2, 0} & m_{2, 1} & 0\\ \end{bmatrix} \\ \left( \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ \end{bmatrix} - \begin{bmatrix} p_{00} & p_{01} & p_{02}\\ p_{10} & p_{11} & p_{12}\\ p_{20} & p_{21} & p_{22}\\ \end{bmatrix}\right)\cdot \begin{bmatrix} 0 & m_{0, 1} & m_{0, 2}\\ m_{1, 0} & 0 & m_{1, 2}\\ m_{2, 0} & m_{2, 1} & 0\\ \end{bmatrix} =& \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\\ \end{bmatrix} - \begin{bmatrix} m_{0, 0} & 0 & 0\\ 0 & m_{1, 1} & 0\\ 0 & 0 & m_{2, 2}\\ \end{bmatrix} \\ \left( \vec{\pi} \cdot \mathbb{I} - \vec{\pi} \cdot \begin{bmatrix} p_{00} & p_{01} & p_{02}\\ p_{10} & p_{11} & p_{12}\\ p_{20} & p_{21} & p_{22}\\ \end{bmatrix} \right) \cdot \begin{bmatrix} 0 & m_{0, 1} & m_{0, 2}\\ m_{1, 0} & 0 & m_{1, 2}\\ m_{2, 0} & m_{2, 1} & 0\\ \end{bmatrix} =& \vec{\pi} \cdot \left( \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\\ \end{bmatrix} - \begin{bmatrix} m_{0, 0} & 0 & 0\\ 0 & m_{1, 1} & 0\\ 0 & 0 & m_{2, 2}\\ \end{bmatrix} \right) \tag{multiply both sides on the left by the stationary distribution $\vec{\pi}$} \\ \vec{0} \cdot \begin{bmatrix} 0 & m_{0, 1} & m_{0, 2}\\ m_{1, 0} & 0 & m_{1, 2}\\ m_{2, 0} & m_{2, 1} & 0\\ \end{bmatrix} =& \vec{\pi} \cdot \left( \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\\ \end{bmatrix} - \begin{bmatrix} m_{0, 0} & 0 & 0\\ 0 & m_{1, 1} & 0\\ 0 & 0 & m_{2, 2}\\ \end{bmatrix} \right) \tag{by property of stationary distribution $\vec{\pi} \mathbb{P} = \vec{\pi}$} \\ \langle{1, 1, 1, ..., 1}\rangle =& \langle{\pi_0m_{0, 0}, \pi_1m_{1, 1}, ..., \pi_{m-1}m_{m-1, m-1}}\rangle \tag{by the entries of $\vec{\pi}$ summing to one}\\ (\forall i)(\pi_i =& \frac{1}{m_{ii}} > 0) \tag{by $m_{ii}$ being finite and non-zero}\\ \end{align*}
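
A sketch that solves the equations m_{ij} = 1 + \sum_{k \neq j} \mathbb{P}_{ik} m_{kj} column by column as a linear system and checks \pi_j = 1/m_{jj} (the 3-state matrix is the same illustrative example as before, not from the lecture):

  import numpy as np

  P = np.array([[0.5, 0.3, 0.2],
                [0.1, 0.6, 0.3],
                [0.4, 0.4, 0.2]])
  num_states = P.shape[0]

  # stationary distribution via the left eigenvector for eigenvalue 1
  eigvals, eigvecs = np.linalg.eig(P.T)
  pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
  pi = pi / pi.sum()

  M = np.zeros((num_states, num_states))  # M[i, j] = m_{ij}
  for j in range(num_states):
      # unknowns x_i = m_{ij}; equations x_i - sum_{k != j} P_{ik} x_k = 1
      A = np.eye(num_states) - P
      A[:, j] += P[:, j]                   # drop the k = j term (zero out column j of P)
      M[:, j] = np.linalg.solve(A, np.ones(num_states))

  print(np.diag(M))          # mean return times m_{jj}
  print(1 / np.diag(M), pi)  # the two should match
  assert np.allclose(1 / np.diag(M), pi)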

Long-run Time Averages

Random Walk (Sample Path): one infinitely long path

Long-run Time-Average Fraction of Time: for an irreducible DTMC, the long-run fraction of time a random walk on the DTMC spends in state j is p_j below, where N_j(t) denotes the number of visits to state j in the first t steps; compare it with the limiting probability \pi_j^{\text{limiting}}:

\begin{align*} p_j =& \lim_{t \to \infty} \frac{N_j(t)}{t}\\ \pi_j^{\text{limiting}} =& \lim_{n \to \infty} (\mathbb{P}^n)_{ij}\\ \end{align*}

We need irreducibility so that p_j does not depend on the initial state.

Note that p_j is an average over a single path (a time average, which requires only irreducibility), whereas \pi_j^{\text{limiting}} is an average over many paths (an ensemble average, which requires more than irreducibility, e.g. aperiodicity).

Theorem: For an irreducible finite-state DTMC, the long-run fraction of time a single walk spends in state j is, with probability 1:

p_j =^{\text{w.p. } 1} \frac{1}{m_{jj}} = \pi_{j}^{\text{stationary}}
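
A simulation sketch of this theorem (the 3-state chain, the horizon T, and the seed are illustrative choices): the empirical fraction of time spent in each state approaches the stationary distribution regardless of the starting state.

  import numpy as np

  P = np.array([[0.5, 0.3, 0.2],
                [0.1, 0.6, 0.3],
                [0.4, 0.4, 0.2]])
  rng = np.random.default_rng(0)

  T = 200_000
  visits = np.zeros(3)
  state = 0                    # p_j should not depend on this choice (irreducible)
  for _ in range(T):
      visits[state] += 1
      state = rng.choice(3, p=P[state])
  print("time-average p_j:", visits / T)

  eigvals, eigvecs = np.linalg.eig(P.T)
  pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
  print("stationary pi_j :", pi / pi.sum())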

Proving the Theorem

Strong Law of Large Numbers

Strong Law of Large Numbers: for i.i.d. X_1, X_2, ... with X_i \sim X and E[X] < \infty, let S_n = \sum_{i = 1}^n X_i; then

Pr\left\lbrace\lim_{n \to \infty} \frac{S_n}{n} = \lim_{n \to \infty} \frac{1}{n}\sum_{i = 1}^n X_i = E[X]\right\rbrace = 1

Good Sample Path: X_1, X_2, X_3, X_4, ... = 1, 0, 1, 0, ...

Bad Sample Path: X_1, X_2, X_3, X_4, ... = 0, 0, 0, 0, ...

We want to say that, informally, \lim_{n \to \infty}\frac{\text{number of bad sample paths}}{\text{number of total sample paths}} = 0, i.e. the bad paths have probability 0. This is not obvious because there are uncountably infinitely many bad sample paths.

Renewal Theorem

Renewal Process: we renew a car every X_i years. Define N(t) = \text{number of renewals by time } t

Let X be a random variable representing the number of steps between visits to the "renewal" state. (E[X] is therefore the mean time between renewals.)

Renewal Theorem: when E[X] is finite, we have

Pr\left\lbrace\lim_{t \to \infty} \frac{N(t)}{t} = \frac{1}{E[X]}\right\rbrace = 1

Proof: let S_n = \sum_{i = 1}^{n} X_i be the time of the n-th renewal. Then S_{N(t)} \leq t \leq S_{N(t)+1} (t falls between the most recent renewal and the next one, and might not land exactly on a renewal).

\begin{align*} S_{N(t)} \leq& t \leq S_{N(t)+1}\\ \frac{S_{N(t)}}{N(t)} \leq& \frac{t}{N(t)} \leq \frac{S_{N(t)+1}}{N(t)+1} \cdot \frac{N(t)+1}{N(t)} \tag{divide by $N(t)$}\\ E[X] \leq& \lim_{t \to \infty}\frac{t}{N(t)} \leq E[X] \cdot 1 \tag{by Strong Law of Large Numbers, w.p. $1$, since $N(t) \to \infty$}\\ \lim_{t \to \infty}\frac{t}{N(t)} =& E[X]\\ \end{align*}
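
Before moving on, a simulation sketch of the renewal theorem (Geometric inter-renewal times, the horizon t, and the seed are illustrative assumptions): N(t)/t approaches 1/E[X].

  import numpy as np

  rng = np.random.default_rng(1)
  t = 1_000_000
  p = 0.25                         # X_i ~ Geometric(p), so E[X] = 1/p = 4

  clock, renewals = 0, 0
  while True:
      clock += rng.geometric(p)    # time of the next renewal
      if clock > t:
          break
      renewals += 1

  print("N(t)/t :", renewals / t)  # ~ 0.25
  print("1/E[X] :", p)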

Putting Them Together

Theorem: For a DTMC where the mean time between visits m_{jj} < \infty (e.g., any irreducible finite-state DTMC), the following holds:

\begin{align*} &p_j\\ =& \lim_{t \to \infty} \frac{N_j(t)}{t} \tag{where $N_j(t)$ is the number of visits to state $j$ by time $t$}\\ \approx& \lim_{t \to \infty} \frac{N_j(t)}{\sum_{i = 1}^{N_j(t)}X_i} \tag{where $X_i \sim X$ is the number of steps between the $i$-th and $(i+1)$-th visits to state $j$}\\ =& \frac{1}{E[X]} \tag{by the Renewal Theorem, with probability $1$, since $E[X] = m_{jj}$ is finite}\\ =& \frac{1}{m_{jj}} \tag{by definition}\\ =& \pi_j^{\text{stationary}} \tag{by "Mean Time Between Visits to a State"}\\ \end{align*}

Not Aperiodic or Not Irreducible

Not Aperiodic

Theorem: For any periodic, irreducible, finite-state DTMC, the limiting distribution does not exist.

Theorem: For any periodic, irreducible, finite-state DTMC, the stationary distribution exists and is unique.

Not Irreducible

For a connected, reducible chain in which every state can reach the single sink, the limiting distribution exists (all of the probability mass ends up at the sink). For a disconnected chain, or a connected chain with more than one sink, the limiting distribution does not exist because the limit depends on the starting state.

For reducible finite-state chain, at least one stationary distribution always exists.

From Stationary Equation to Balanced Equations and Time-Reversibility

Balanced Equations

Rate of Transition

The rate of transition from state i to state j is:

\pi_i\mathbb{P}_{ij}

This is, in the long run, the fraction of all transitions that go from i to j. Observe that if there is a sink, any transition taken before reaching the sink shrinks to zero percent of the total transitions.

The rate of transition out of state i (including returning right back to state i) is:

\sum_{j}\pi_i\mathbb{P}_{ij}

Balanced Equations Formula

Balanced Equation: equating the rate of transitions out of state i (i \to j for j \neq i) with the rate of transitions into state i (j \to i for j \neq i).

\begin{align*} (\forall i)(\sum_{j \neq i}\pi_i \mathbb{P}_{ij} =& \sum_{j \neq i} \pi_j \mathbb{P}_{ji})\\ \sum_i \pi_i = 1\\ \end{align*}

The balanced equations are equivalent to the stationary equations for any DTMC.

Note that this can also be applied to a set of states: the rate of transitions leaving the set equals the rate of transitions entering it (think of a cut between the two sides of a bipartite-like chain).

Proof:

\begin{align*} \pi_i =& \sum_j \pi_j \mathbb{P}_{ji} \tag{stationary equation}\\ \pi_i =& \pi_i \cdot 1 = \pi_i \sum_{j}\mathbb{P}_{ij} = \sum_{j}\pi_i \mathbb{P}_{ij} \tag{sum to one}\\ \sum_j \pi_j \mathbb{P}_{ji} =& \sum_{j}\pi_i \mathbb{P}_{ij} \tag{combine above}\\ \sum_j \pi_j \mathbb{P}_{ji} - \pi_i\mathbb{P}_{ii} =& \sum_{j}\pi_i \mathbb{P}_{ij} - \pi_i\mathbb{P}_{ii} \tag{remove self loop}\\ \sum_{j \neq i} \pi_j \mathbb{P}_{ji} =& \sum_{j \neq i}\pi_i \mathbb{P}_{ij}\\ \end{align*}

Time-Reversibility Equations

Time-Reversibility Equations: equating rate of transition between i \to j and j \to i.

\begin{align*} (\forall i, j)(\pi_i \mathbb{P}_{ij} =& \pi_j \mathbb{P}_{ji})\\ \sum_i \pi_i = 1\\ \end{align*}

These equations apply to every pair of states i, j. If there are m states, then there are \frac{m(m-1)}{2} + 1 equations to solve (one per unordered pair, plus the normalization).

Be Careful when Using These:

Theorem: if a distribution \vec{x} (with \sum_i x_i = 1) satisfies the time-reversibility equations x_i\mathbb{P}_{ij} = x_j\mathbb{P}_{ji} for all i, j, then \vec{x} is a stationary distribution and the chain is called time-reversible. The equations may have no solution, however: not every chain is time-reversible, and in that case we fall back to the stationary (or balanced) equations.

Note that this theorem does not require the state space to be finite.

Proof:

\begin{align*} (\forall i, j)x_i\mathbb{P}_{ij} =& x_j\mathbb{P}_{ji}\\ \implies \sum_i x_i \mathbb{P}_{ij} =& \sum_i x_j\mathbb{P}_{ji}\\ \implies \sum_i x_i \mathbb{P}_{ij} =& x_j \sum_i \mathbb{P}_{ji}\\ \implies \sum_i x_i \mathbb{P}_{ij} =& x_j \cdot 1 \tag{this is stationary equation}\\ \end{align*}

If, along any sample path, a transition i \to j cannot happen twice without the reverse transition j \to i happening in between, then the chain is time-reversible; the implication does not go backward. It is usually hard to determine whether a chain is time-reversible just by looking at it, so you should try to solve the time-reversibility equations regardless.
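
A sketch on a small birth-death chain (the 4-state matrix is illustrative): solving the time-reversibility equations forward along the chain produces a distribution, and by the theorem above it must be the stationary distribution.

  import numpy as np

  # birth-death chain on {0, 1, 2, 3}: only i -> i-1, i -> i, i -> i+1 transitions
  P = np.array([[0.5, 0.5, 0.0, 0.0],
                [0.2, 0.5, 0.3, 0.0],
                [0.0, 0.4, 0.4, 0.2],
                [0.0, 0.0, 0.6, 0.4]])

  # time-reversibility: x_i P_{i,i+1} = x_{i+1} P_{i+1,i}
  x = np.ones(4)
  for i in range(3):
      x[i + 1] = x[i] * P[i, i + 1] / P[i + 1, i]
  x = x / x.sum()

  print(x)
  assert np.allclose(x @ P, x)                            # x is indeed stationary
  assert np.allclose(x[:, None] * P, (x[:, None] * P).T)  # detailed balance holds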

Theorem: for a random walk on any undirected, connected (hence irreducible) graph with symmetric edge weights w_{ij} = w_{ji}, the chain is time-reversible and the unique stationary distribution is:

\pi_i = \frac{\sum_k w_{ik}}{\sum_i \sum_k w_{ik}} = \frac{\text{out weight for }i}{2 \cdot \text{total weights}}
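
A sketch of this theorem (the symmetric weight matrix is an arbitrary example): build the walk with \mathbb{P}_{ij} = w_{ij} / \sum_k w_{ik} and compare its stationary distribution with out weight / (2 \cdot total weights).

  import numpy as np

  # symmetric edge weights w_ij = w_ji on a connected undirected graph (illustrative)
  W = np.array([[0.0, 2.0, 1.0],
                [2.0, 0.0, 3.0],
                [1.0, 3.0, 0.0]])
  assert np.allclose(W, W.T)

  P = W / W.sum(axis=1, keepdims=True)  # P_ij = w_ij / (out weight of i)
  pi_formula = W.sum(axis=1) / W.sum()  # out weight of i / (2 * total weights)

  eigvals, eigvecs = np.linalg.eig(P.T)
  pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
  pi = pi / pi.sum()

  print(pi_formula, pi)
  assert np.allclose(pi_formula, pi)
  assert np.allclose(pi_formula[:, None] * P, (pi_formula[:, None] * P).T)  # time-reversible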

Page Rank

// TODO

Summary

Ergodic Finite-State: the limiting distribution exists, every row of \lim_{n\to\infty}\mathbb{P}^n equals the unique stationary distribution \vec{\pi}, and \pi_j = p_j = \frac{1}{m_{jj}} > 0.

Irreducible but Periodic with period d: the limiting distribution does not exist, but the unique stationary distribution exists and \pi_j = p_j = \frac{1}{m_{jj}}.

Aperiodic but reducible: at least one stationary distribution exists (possibly not unique); the limiting distribution may or may not exist (see "Not Irreducible" above).

Theorem: Every finite-state DTMC has at least one stationary distribution. Every irreducible finite-state DTMC has a unique stationary distribution.

Theorem: for any DTMC with a unique stationary distribution and finite m_{jj}, the long-run fraction of time satisfies p_j = \pi_j (regardless of periodicity).

When the limiting distribution does not exist:

- We can have exactly one stationary distribution: two states that alternate with probability 1 (irreducible but periodic).
- We can have infinitely many stationary distributions: two states, each with a self-loop of probability 1.
