# Lecture 018

## Existence of Limiting Distribution

Situations in which the limiting distribution does not exist:

\mathbb{P} = \mathbb{P}^n = \begin{bmatrix} 1 & 0\\ 0 & 1\\ \end{bmatrix}

A chain of two states, each looping to itself with probability $1$. If we try to solve $\vec{\pi}\mathbb{P} = \vec{\pi}$, there are infinitely many solutions for $\vec{\pi}$ because $\mathbb{P}$ is the identity. Therefore there are infinitely many stationary distributions. The limiting distribution does not exist because the rows of the limiting matrix are not all the same.

The period is $1$. It is aperiodic.

\mathbb{P} = \begin{bmatrix} 0 & 1\\ 1 & 0\\ \end{bmatrix}

A chain of two states that alternate with probability $1$. The stationary distribution is $(\frac{1}{2}, \frac{1}{2})$, found by solving:

\begin{cases} \pi_0 = \pi_1 \cdot 1\\ \pi_1 = \pi_0 \cdot 1\\ \pi_0 + \pi_1 = 1\\ \end{cases}

but the limiting distribution does not exist ($\pi_{j}=\lim_{n\to\infty} (\mathbb{P}^n)_{jj}$ does not exist, although $\lim_{n\to\infty} (\mathbb{P}^{2n})_{jj}$ does). The stationary distribution represents the fraction of time spent in each state.

The period is $2$. It is periodic.

\mathbb{P} = \mathbb{P}^n = \begin{bmatrix} 0 & 1\\ 0 & 1\\ \end{bmatrix}

The period of state $0$ is undefined (it never returns to itself), and the chain is not irreducible, but the limiting distribution exists: $(0, 1)$.
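The three examples can be checked numerically; a minimal sketch using NumPy (the matrix names are mine, not standard):

```python
import numpy as np

identity = np.array([[1.0, 0.0], [0.0, 1.0]])   # two self-loops
swap     = np.array([[0.0, 1.0], [1.0, 0.0]])   # states alternate, period 2
absorb   = np.array([[0.0, 1.0], [0.0, 1.0]])   # state 1 is a sink

def power(P, n):
    """Compute P^n by repeated squaring."""
    return np.linalg.matrix_power(P, n)

# Identity: P^n = I for every n, so every distribution is stationary
# and the rows of the "limit" never agree.
assert np.allclose(power(identity, 100), np.eye(2))

# Swap: P^n alternates between I and the swap matrix, so lim P^n
# does not exist, but the even subsequence P^(2n) converges (to I).
assert np.allclose(power(swap, 100), np.eye(2))
assert np.allclose(power(swap, 101), swap)

# Sink: not irreducible, yet P^n is constant with identical rows (0, 1),
# so the limiting distribution exists.
assert np.allclose(power(absorb, 100), absorb)
```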

## Period of Markov Chain

Period of State: the period of state $j$ is $\gcd(\{n \in \mathbb{Z}^+ | (\mathbb{P}^n)_{jj} > 0\})$.

Note that some states have an undefined period: the set above can be empty (an example would be a state that goes directly to a sink and never returns).

The set contains all numbers of steps in which state $j$ can return to state $j$ (including paths that pass through $j$ multiple times).

Chicken McNugget Theorem (Euclidean Number Property): if $\gcd(\{i_1, i_2, ..., i_k\}) = 1$, then $(\exists n_0)(\forall n \in \mathbb{Z}^+)(n \geq n_0 \implies n = a_1i_1 + a_2i_2 + ... + a_ki_k)$ where the coefficients $a_i \in \mathbb{N}$.

Aperiodic: a state is aperiodic if its period is $1$. A chain is aperiodic if all states are aperiodic.

Aperiodicity just means that there is a $j$-to-$j$ path of length $k$ for every sufficiently large $k \geq n_0$. If there is no path of length $k$, then it does not make sense to talk about the limiting probability from $j$ to $j$.

Theorem: for an irreducible finite-state DTMC, if a state has period $d$, then all states have period $d$. In fact, all states can be divided into $d$ residue classes, where some states are visited at times $0 \bmod d$, some at $1 \bmod d$, ..., and some at $d - 1 \bmod d$.

Accessible: state $j$ is accessible from state $i$ if there is a path from $i$ to $j$: $(\exists n \in \mathbb{Z}^+)((\mathbb{P}^n)_{ij} > 0)$

Communicate: states $i, j$ communicate if $j$ is accessible from $i$ and $i$ is accessible from $j$.

Irreducible: a Markov Chain in which all states communicate.

A limiting distribution does not necessarily require irreducibility, but irreducibility (together with aperiodicity) will be enough to guarantee its existence. Irreducible is stronger than connected!
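The period definition can be computed directly from its formula; a small sketch (the finite `horizon` is a heuristic cutoff of mine, exact for these tiny chains):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, j, horizon=64):
    """gcd of all return times n <= horizon with (P^n)_{jj} > 0.
    Returns None when the state never returns (undefined period)."""
    returns = []
    Q = np.eye(len(P))
    for n in range(1, horizon + 1):
        Q = Q @ P
        if Q[j, j] > 0:
            returns.append(n)
    return reduce(gcd, returns) if returns else None

swap = np.array([[0.0, 1.0], [1.0, 0.0]])    # alternating chain
absorb = np.array([[0.0, 1.0], [0.0, 1.0]])  # state 1 is a sink

assert period(swap, 0) == 2        # return times 2, 4, 6, ... -> gcd 2
assert period(absorb, 0) is None   # state 0 never returns: undefined
assert period(absorb, 1) == 1      # the sink loops back every step
```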

## Condition when Limiting Distribution Exists

Ergodic: a finite-state DTMC is ergodic if it is aperiodic and irreducible. For infinite-state DTMC, ergodicity requires more properties.

Theorem: given an aperiodic and irreducible $\mathbb{P}$, then

1. $\lim_{n \to \infty} \mathbb{P}^n = L$
2. all rows are the same, sum to $1$, equal to $\vec{\pi}$
3. all entries are positive

Note that the theorem does not go backwards.
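The three conclusions can be observed numerically; a sketch with a made-up aperiodic, irreducible 3-state chain:

```python
import numpy as np

# An aperiodic, irreducible chain (the entries are illustrative only).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

L = np.linalg.matrix_power(P, 200)   # approximates lim P^n

# 1. + 2. all rows converge to the same vector pi, summing to 1
pi = L[0]
assert np.allclose(L, np.tile(pi, (3, 1)))
assert np.isclose(pi.sum(), 1.0)

# that common row is the stationary distribution: pi P = pi
assert np.allclose(pi @ P, pi)

# 3. all entries of the limit are strictly positive
assert (L > 0).all()
```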

### Proving the Theorem

#### All Rows are the Same

We extract the $j$-th column of $\mathbb{P}^n$ and show that it becomes a column whose entries are all the same constant.

\forall j, \; \mathbb{P}^n \vec{e} = \mathbb{P}^n \begin{bmatrix} 0\\ ...\\ 1_j\\ ...\\ 0\\ \end{bmatrix} = \begin{bmatrix} c\\ ...\\ c\\ \end{bmatrix}

Intuition: We know $\mathbb{P}^n \vec{e} = \mathbb{P}(\mathbb{P}(\mathbb{P}(...\mathbb{P}(\vec{e}))))$. Each multiplication replaces every entry by a weighted average of the current entries, so the entries of the vector become more and more similar.

Let $M_n$ be the maximum element of $\mathbb{P}^n\vec{e}$. Let $m_n$ be the minimum element of $\mathbb{P}^n\vec{e}$. Let $s$ be the smallest element of the original matrix $\mathbb{P}$. We want to show:

M_n - m_n \leq (1 - 2s)(M_{n - 1} - m_{n - 1})

The inequality above tells us that the gap between $M_n$ and $m_n$ shrinks as $n$ gets big.
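The contraction can be watched numerically; a sketch iterating $\vec{v} \mapsto \mathbb{P}\vec{v}$ on a made-up chain with all-positive entries:

```python
import numpy as np

# a chain with strictly positive entries, so s > 0 (values are illustrative)
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
s = P.min()                      # smallest element of P

v = np.array([0.0, 1.0, 0.0])    # e_j: extracts column j of P^n
for _ in range(30):
    gap_before = v.max() - v.min()   # M_{n-1} - m_{n-1}
    v = P @ v
    gap_after = v.max() - v.min()    # M_n - m_n
    # the spread shrinks by at least a factor (1 - 2s) each step
    assert gap_after <= (1 - 2 * s) * gap_before + 1e-12
```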

##### Upper Bound for Biggest Element

We prove an upper bound for $M_n$

Each element of $\mathbb{P}^n\vec{e} = \mathbb{P}(\mathbb{P}^{n - 1}\vec{e})$ is a weighted average of the entries of $\mathbb{P}^{n - 1}\vec{e}$. The largest possible value puts as much weight as possible, $1 - s$, on $M_{n - 1}$ and the remaining weight $s$ on $m_{n - 1}$, so $M_n \leq (1 - s)M_{n - 1} + s\,m_{n - 1}$.

##### Lower Bound for Smallest Element

We prove a lower bound for $m_n$

Similarly, $m_n \geq (1 - s)m_{n - 1} + sM_{n - 1}$

We have now proved the inequality, but $s$ might be $0$. We provide a fix by adding another claim

We want to show:

\mathbb{P} \text{ aperiodic and irreducible} \implies (\exists n_0)(\forall n \geq n_0)(\mathbb{P}^n \text{ has no zero elements})

If this holds, write $n = n_0 \lfloor \frac{n}{n_0} \rfloor + r$ with $0 \leq r < n_0$; then $\mathbb{P}^n = \mathbb{P}^r \cdot (\mathbb{P}^{n_0})^{\lfloor \frac{n}{n_0} \rfloor}$, and the contraction argument above applies to $\mathbb{P}^{n_0}$, whose smallest element is strictly positive.

#### All Entries are Positive

Theorem: for $\mathbb{P}$ irreducible and aperiodic, $(\exists n_0)(\forall n \geq n_0)(\mathbb{P}^n \text{ has all positive elements})$

// TODO: proof

## Mean Time Between Visits to a State

Between Visits: let $m_{ij}$ denote the expected number of time steps needed to first reach state $j$ starting from state $i$.

Theorem: given an irreducible finite-state DTMC, $(\forall i, j)(E[T_{ij}] = m_{ij} \text{ is finite})$. Therefore, irreducible finite-state DTMCs are all positive recurrent.

This can be argued by modeling the walk from $i$ to $j$ as a sequence of attempts: each attempt either follows a simple (loop-free) path to $j$ or deviates and restarts from some other state $k$. The number of attempts is bounded by a Geometric distribution, and the length of each simple path is bounded by the number of states.

Theorem: For an irreducible finite-state DTMC (periodic or aperiodic; irreducibility already guarantees a unique stationary distribution)

\pi_j^{\text{stationary or limiting}} = \frac{1}{m_{jj}} = \frac{1}{E[T_{jj}]} > 0

Observe that:

\begin{align*} m_{ij} =& 1 + \sum_{k \neq j} \mathbb{P}_{ik}m_{kj}\\ m_{jj} =& 1 + \sum_{k \neq j} \mathbb{P}_{jk}m_{kj}\\ \end{align*}

$E[T_{i\to j}] = 1 + \sum_{k \neq j} \Pr\{i \to k\}\, E[T_{k \to j}]$, where $k$ ranges over the out-neighbors of $i$. We add one for the transition out of $i$, and then condition on which out-neighbor $i$ actually goes to.

Note the similarity between conditioned transition probability // TODO

\begin{align*} \begin{bmatrix} m_{00} & m_{01} & m_{02}\\ m_{10} & m_{11} & m_{12}\\ m_{20} & m_{21} & m_{22}\\ \end{bmatrix} =& \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\\ \end{bmatrix} + \begin{bmatrix} p_{00} & p_{01} & p_{02}\\ p_{10} & p_{11} & p_{12}\\ p_{20} & p_{21} & p_{22}\\ \end{bmatrix}\cdot \begin{bmatrix} 0 & m_{01} & m_{02}\\ m_{10} & 0 & m_{12}\\ m_{20} & m_{21} & 0\\ \end{bmatrix}\\ \begin{bmatrix} m_{00} & 0 & 0\\ 0 & m_{11} & 0\\ 0 & 0 & m_{22}\\ \end{bmatrix} + \begin{bmatrix} 0 & m_{01} & m_{02}\\ m_{10} & 0 & m_{12}\\ m_{20} & m_{21} & 0\\ \end{bmatrix} =& \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\\ \end{bmatrix} + \begin{bmatrix} p_{00} & p_{01} & p_{02}\\ p_{10} & p_{11} & p_{12}\\ p_{20} & p_{21} & p_{22}\\ \end{bmatrix}\cdot \begin{bmatrix} 0 & m_{01} & m_{02}\\ m_{10} & 0 & m_{12}\\ m_{20} & m_{21} & 0\\ \end{bmatrix} \\ \left( \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ \end{bmatrix} - \begin{bmatrix} p_{00} & p_{01} & p_{02}\\ p_{10} & p_{11} & p_{12}\\ p_{20} & p_{21} & p_{22}\\ \end{bmatrix}\right)\cdot \begin{bmatrix} 0 & m_{01} & m_{02}\\ m_{10} & 0 & m_{12}\\ m_{20} & m_{21} & 0\\ \end{bmatrix} =& \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\\ \end{bmatrix} - \begin{bmatrix} m_{00} & 0 & 0\\ 0 & m_{11} & 0\\ 0 & 0 & m_{22}\\ \end{bmatrix} \\ \left( \vec{\pi} \cdot \mathbb{I} - \vec{\pi} \cdot \begin{bmatrix} p_{00} & p_{01} & p_{02}\\ p_{10} & p_{11} & p_{12}\\ p_{20} & p_{21} & p_{22}\\ \end{bmatrix} \right) \cdot \begin{bmatrix} 0 & m_{01} & m_{02}\\ m_{10} & 0 & m_{12}\\ m_{20} & m_{21} & 0\\ \end{bmatrix} =& \vec{\pi} \cdot \left( \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\\ \end{bmatrix} - \begin{bmatrix} m_{00} & 0 & 0\\ 0 & m_{11} & 0\\ 0 & 0 & m_{22}\\ \end{bmatrix} \right) \tag{left-multiplying by the stationary distribution $\vec{\pi}$} \\ \vec{0} \cdot \begin{bmatrix} 0 & m_{01} & m_{02}\\ m_{10} & 0 & m_{12}\\ m_{20} & m_{21} & 0\\ \end{bmatrix} =& \vec{\pi} \cdot \left( \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\\ \end{bmatrix} - \begin{bmatrix} m_{00} & 0 & 0\\ 0 & m_{11} & 0\\ 0 & 0 & m_{22}\\ \end{bmatrix} \right) \tag{by property of stationary distribution $\vec{\pi} \mathbb{P} = \vec{\pi}$} \\ \langle{1, 1, ..., 1}\rangle =& \langle{\pi_0 m_{00}, \pi_1 m_{11}, ..., \pi_{m-1} m_{m-1, m-1}}\rangle \tag{by the stationary distribution summing to one}\\ (\forall i)(\pi_i =& \frac{1}{m_{ii}} > 0) \tag{by $m_{ii}$ finite and non-zero}\\ \end{align*}
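The system $m_{ij} = 1 + \sum_{k \neq j} \mathbb{P}_{ik} m_{kj}$ can be solved directly and checked against $\pi_j = 1/m_{jj}$; a sketch with a made-up 3-state chain:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],    # illustrative irreducible chain
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
n = len(P)

def mean_hitting_times(P, j):
    """Solve m_{ij} = 1 + sum_{k != j} P_{ik} m_{kj} for every start i."""
    Pt = P.copy()
    Pt[:, j] = 0.0                # drop terms with k = j from the sum
    return np.linalg.solve(np.eye(n) - Pt, np.ones(n))

# stationary distribution: solve pi P = pi together with sum(pi) = 1
A = np.vstack([P.T - np.eye(n), np.ones(n)])
pi, *_ = np.linalg.lstsq(A, np.concatenate([np.zeros(n), [1.0]]), rcond=None)

for j in range(n):
    m_jj = mean_hitting_times(P, j)[j]   # mean return time to j
    assert np.isclose(pi[j], 1.0 / m_jj)
```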

## Long-run Time Averages

Random Walk (Sample Path): one infinitely long path

Long-run Time-Average Fraction of Time: for an irreducible DTMC, the fraction of time a random walk on the DTMC spends in state $j$ is (where $N_j(t)$ denotes the number of times we visit state $j$ in the first $t$ steps):

\begin{align*} p_j =& \lim_{t \to \infty} \frac{N_j(t)}{t}\\ \pi_j^{\text{limiting}} =& \lim_{n \to \infty} (\mathbb{P}^n)_{ij}\\ \end{align*}

We need irreducibility so that $p_j$ does not depend on the initial state.

Note $p_j$ is an average over a single path (a time average, requiring only irreducibility) whereas $\pi_j^{\text{limiting}}$ is an average over many paths (an ensemble average, requiring more than irreducibility).

Theorem: For an irreducible finite-state DTMC, the long-run fraction of time a single sample path spends in state $j$ is (with probability $1$):

p_j =^{\text{w.p. } 1} \frac{1}{m_{jj}} = \pi_{j}^{\text{stationary}}
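As a sanity check, a short simulation (chain values are illustrative) comparing the time-average fractions on one sample path with the stationary distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

# one long sample path
t, state = 100_000, 0
visits = np.zeros(3)
for _ in range(t):
    visits[state] += 1
    state = rng.choice(3, p=P[state])

p = visits / t                    # long-run time-average fractions N_j(t)/t

# stationary distribution: solve pi P = pi with sum(pi) = 1
A = np.vstack([P.T - np.eye(3), np.ones(3)])
pi, *_ = np.linalg.lstsq(A, np.array([0, 0, 0, 1.0]), rcond=None)

assert np.allclose(p, pi, atol=0.02)   # agree up to sampling noise
```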

### Proving the Theorem

#### Strong Law of Large Numbers

Strong Law of Large Numbers: for i.i.d. $X_1, X_2, ...$ with $X_i \sim X$ and $E[X] < \infty$, let $S_n = \sum_{i = 1}^n X_i$; then

Pr\left\lbrace\lim_{n \to \infty} \frac{S_n}{n} = \lim_{n \to \infty} \frac{1}{n}\sum_{i = 1}^n X_i = E[X]\right\rbrace = 1

Good Sample Path: $X_1, X_2, X_3, X_4, ... = 1, 0, 1, 0, ...$

Bad Sample Path: $X_1, X_2, X_3, X_4, ... = 0, 0, 0, 0, ...$

We say that the bad sample paths occur with probability $0$. This is not obvious because there are uncountably infinitely many bad sample paths.

#### Renewal Theorem

Renewal Process: we replace (renew) a car after every $X_i$ years. Define $N(t) = \text{number of renewals by time } t$

Let $X$ be a random variable representing the number of steps between visits to the "renewal" state. ($E[X]$ is therefore the mean time between renewals.)

Renewal Theorem: when $E[X]$ is finite, we have

Pr\left\lbrace\lim_{t \to \infty} \frac{N(t)}{t} = \frac{1}{E[X]}\right\rbrace = 1

Proof: let $S_n = \sum_{i = 1}^{n} X_i$ be the time of the $n$-th renewal. Then $S_{N(t)} \leq t \leq S_{N(t)+1}$ ($t$ falls between two consecutive renewals).

\begin{align*} S_{N(t)} \leq& \; t \leq S_{N(t)+1}\\ \frac{S_{N(t)}}{N(t)} \leq& \; \frac{t}{N(t)} \leq \frac{S_{N(t)+1}}{N(t)+1} \cdot \frac{N(t)+1}{N(t)}\\ E[X] \leq& \; \lim_{t \to \infty}\frac{t}{N(t)} \leq E[X] \cdot 1 \tag{by Strong Law of Large Numbers, w.p. $1$}\\ \lim_{t \to \infty}\frac{t}{N(t)} =& \; E[X]\\ \end{align*}
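The renewal theorem is easy to watch in simulation; a sketch with a geometric car lifetime (the distribution and its mean are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# i.i.d. lifetimes X_i with E[X] = 4: replace the car after X_i years
E_X = 4.0
lifetimes = rng.geometric(1 / E_X, size=100_000)

arrival = np.cumsum(lifetimes)    # S_n: time of the n-th renewal
t = 50_000
N_t = np.searchsorted(arrival, t, side='right')  # N(t): renewals by time t

# N(t)/t -> 1/E[X] with probability 1
assert abs(N_t / t - 1 / E_X) < 0.01
```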

### Putting Them Together

Theorem: For a DTMC where the mean time between visits $m_{jj} < \infty$ (e.g., any irreducible finite-state DTMC), the following holds:

\begin{align*} &p_j\\ =& \lim_{t \to \infty} \frac{N_j(t)}{t} \tag{where $N_j(t)$ is the number of visit to state $j$ by time $t$}\\ \approx& \lim_{t \to \infty} \frac{N_j(t)}{\sum_{i = 1}^{N_j(t)}X_i} \tag{where $X_i \sim X$ is how many steps between visit state $j$ after $i$-th visit}\\ =& \frac{1}{E[X]} \tag{by Renewal Theorem with probability $1$, since $E[X] = m_{jj}$ is finite}\\ =& \frac{1}{m_{jj}} \tag{by definition}\\ =& \pi_j^{\text{stationary}} \tag{by "Mean Time Between Visits to a State"}\\ \end{align*}

## Not Aperiodic or Not Irreducible

### Not Aperiodic

Theorem: For any periodic, irreducible, finite-state DTMC, the limiting distribution does not exist.

• But $\lim_{n \to \infty} (\mathbb{P}^{nd})_{ij}$ exists for a chain with a state that has period $d$.

Theorem: For any periodic, irreducible, finite-state DTMC, the stationary distribution exists and is unique.

• the stationary distribution represents the long-run time-average fraction of time spent in each state

### Not Irreducible

For a connected but reducible chain with only one sink, the limiting distribution exists. For a disconnected chain, or a connected chain with more than one sink, the limiting distribution does not exist.

For a reducible finite-state chain, at least one stationary distribution always exists.

## From Stationary Equation to Balanced Equations and Time-Reversibility

### Balanced Equations

#### Rate of Transition

The rate of transition from state $i$ to state $j$ is:

\pi_i\mathbb{P}_{ij}

This is essentially the long-run fraction that $i \to j$ transitions make up of all transitions. Observe that if there is a sink, any transition taken before reaching the sink shrinks to zero percent of all transitions.

The rate of transition out of state $i$ (including returning right back to state $i$) is:

\sum_{j}\pi_i\mathbb{P}_{ij}

#### Balanced Equations Formula

Balanced Equations: equating the rate of transitions $i \to ?$ with the rate of transitions $? \to i$.

\begin{align*} (\forall i)(\sum_{j \neq i}\pi_i \mathbb{P}_{ij} =& \sum_{j \neq i} \pi_j \mathbb{P}_{ji})\\ \sum_i \pi_i = 1\\ \end{align*}

The balanced equations are equivalent to the stationary equations for any DTMC.

Note that balance can also be applied to a set of states: equate the rate of transitions leaving the set with the rate entering it (think of a bipartite cut).

Proof:

\begin{align*} \pi_i =& \sum_j \pi_j \mathbb{P}_{ji} \tag{stationary equation}\\ \pi_i =& \pi_i \cdot 1 = \pi_i \sum_{j}\mathbb{P}_{ij} = \sum_{j}\pi_i \mathbb{P}_{ij} \tag{sum to one}\\ \sum_j \pi_j \mathbb{P}_{ji} =& \sum_{j}\pi_i \mathbb{P}_{ij} \tag{combine above}\\ \sum_j \pi_j \mathbb{P}_{ji} - \pi_i\mathbb{P}_{ii} =& \sum_{j}\pi_i \mathbb{P}_{ij} - \pi_i\mathbb{P}_{ii} \tag{remove self loop}\\ \sum_{j \neq i} \pi_j \mathbb{P}_{ji} =& \sum_{j \neq i}\pi_i \mathbb{P}_{ij}\\ \end{align*}
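The equivalence can be checked numerically: solve the stationary equations for a made-up chain, then verify the balance equations hold state by state:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],    # illustrative irreducible chain
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

# stationary distribution: solve pi P = pi with sum(pi) = 1
A = np.vstack([P.T - np.eye(3), np.ones(3)])
pi, *_ = np.linalg.lstsq(A, np.array([0, 0, 0, 1.0]), rcond=None)

# balance: rate out of i (excluding self-loops) equals rate into i
for i in range(3):
    rate_out = sum(pi[i] * P[i, j] for j in range(3) if j != i)
    rate_in  = sum(pi[j] * P[j, i] for j in range(3) if j != i)
    assert np.isclose(rate_out, rate_in)
```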

### Time-Reversibility Equations

Time-Reversibility Equations: equating rate of transition between $i \to j$ and $j \to i$.

\begin{align*} (\forall i, j)(\pi_i \mathbb{P}_{ij} =& \pi_j \mathbb{P}_{ji})\\ \sum_i \pi_i = 1\\ \end{align*}

These equations apply to every pair of states $i, j$. If there are $m$ states, then there are $\frac{m(m-1)}{2} + 1$ equations to solve.

Careful when using:

• This equation is NOT equivalent to stationary equation.

• This equation may not hold even in an aperiodic, irreducible DTMC. An example would be a one-way transition: if $\mathbb{P}_{ij} > 0$ but $\mathbb{P}_{ji} = 0$, the equation forces $\pi_i = 0$, which is impossible in an irreducible chain.

Theorem:

• If we find a $\vec{\pi}$ that satisfies the time-reversibility equations, then $\vec{\pi}$ also satisfies the stationary equations, and the chain is time-reversible.

• If we cannot find a $\vec{\pi}$ that satisfies the time-reversibility equations, then that implies nothing.

Note that this theorem does not require the state space to be finite.

Proof:

\begin{align*} (\forall i, j)x_i\mathbb{P}_{ij} =& x_j\mathbb{P}_{ji}\\ \implies \sum_i x_i \mathbb{P}_{ij} =& \sum_i x_j\mathbb{P}_{ji}\\ \implies \sum_i x_i \mathbb{P}_{ij} =& x_j \sum_i \mathbb{P}_{ji}\\ \implies \sum_i x_i \mathbb{P}_{ij} =& x_j \cdot 1 \tag{this is stationary equation}\\ \end{align*}

If no transition can occur twice without the reverse transition occurring in between, the chain is time-reversible; but the implication does not go backward. It is usually hard to determine whether a chain is time-reversible just by looking at it, so try to solve the equations regardless.
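Solving the time-reversibility equations is often mechanical; a sketch for a small birth-death chain (a standard example of a time-reversible chain; the values $p, q$ are made up):

```python
import numpy as np

# Birth-death chain on {0,1,2,3}: step right w.p. p, left w.p. q = 1 - p,
# sticking at the boundaries.
p, q = 0.3, 0.7
P = np.array([[q, p, 0, 0],
              [q, 0, p, 0],
              [0, q, 0, p],
              [0, 0, q, p]])

# solve pi_i P_{i,i+1} = pi_{i+1} P_{i+1,i} along the chain, then normalize
x = np.ones(4)
for i in range(3):
    x[i + 1] = x[i] * P[i, i + 1] / P[i + 1, i]
pi = x / x.sum()

# detailed balance holds for every pair of states ...
for i in range(4):
    for j in range(4):
        assert np.isclose(pi[i] * P[i, j], pi[j] * P[j, i])
# ... and therefore pi is also stationary
assert np.allclose(pi @ P, pi)
```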

Theorem: for any undirected, connected (hence irreducible) graph with symmetric weights $w_{ij} = w_{ji}$, the random walk that moves from $i$ to $j$ with probability proportional to $w_{ij}$ is time-reversible, and the unique stationary distribution is:

\pi_i = \frac{\sum_k w_{ik}}{\sum_i \sum_k w_{ik}} = \frac{\text{out weight for }i}{2 \cdot \text{total weights}}

// TODO
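The weighted-graph formula can be verified directly; a sketch on a small made-up undirected graph:

```python
import numpy as np

# symmetric weights on an undirected graph: triangle 0-1-2 plus edge 2-3
W = np.array([[0, 2, 1, 0],
              [2, 0, 3, 0],
              [1, 3, 0, 4],
              [0, 0, 4, 0]], dtype=float)
assert np.allclose(W, W.T)             # w_ij = w_ji

P = W / W.sum(axis=1, keepdims=True)   # walk proportional to edge weights
pi = W.sum(axis=1) / W.sum()           # out-weight of i / (2 * total weight)

assert np.isclose(pi.sum(), 1.0)
assert np.allclose(pi @ P, pi)         # pi is stationary
# time-reversibility: pi_i P_ij = w_ij / (2 * total weight) = pi_j P_ji
assert np.allclose(pi[:, None] * P, (pi[:, None] * P).T)
```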

## Summary

Ergodic Finite-State:

• limiting distribution exists

• $(\forall j)(\pi_j > 0)$ (a side effect of the matrix proof)

• $\pi_j = \frac{1}{m_{jj}}$

• the stationary distribution exists, is unique, and equals the limiting distribution

• $p_j = \frac{1}{m_{jj}}$ with probability $1$

• $\frac{1}{m_{jj}} = \pi_j^{\text{limiting}} = \pi_j^{\text{stationary}} =^{\text{w.p. } 1} p_j > 0$

Irreducible but Periodic with period $d$:

• the limiting distribution does not exist (because the probability depends on the exact time step)

• the stationary distribution exists and is unique; the subsequence limit satisfies $\lim_{n \to \infty} (\mathbb{P}^{nd})_{jj} = d \cdot \pi_j^{\text{stationary}}$, and $\pi_j^{\text{stationary}} = \frac{1}{m_{jj}}$

• $m_{jj}$ is finite

• $p_j = \frac{1}{m_{jj}}$ with probability $1$

• $0 < \frac{1}{m_{jj}} = \pi_j^{\text{stationary}} =^{\text{w.p. } 1} p_j$

Aperiodic but reducible:

• at least one stationary distribution exists

Theorem: All finite-state DTMC has at least one stationary distribution. All irreducible finite-state DTMC has unique stationary distribution.

Theorem: for any DTMC that has a unique stationary distribution and finite $m_{jj}$, the long-run fraction of time satisfies $p_j = \pi_j$ (regardless of periodicity).

When the limiting distribution does not exist:

• We can have multiple stationary distributions: see the identity-chain example above
