Lecture 020

Queueing Theory

Queueing Theory Basics

Types of queues (how many queues feed how many servers):

router (one-to-one)
bank queue (one-to-many)
supermarket (many-to-many)
data center (different jobs require different numbers of servers)

Goals of Queueing Theory:

Predicting system performance
- probability that delay exceeds the Service Level Agreement (SLA)
- number of jobs in the queue
- server utilization
Capacity provisioning
- resources needed to achieve a performance goal (ensure stability, meet the SLA)
Finding system designs that improve performance
- e.g., a scheduling policy that reduces delay

Terminology

Buffer: limited or unlimited temporary storage for waiting jobs (the queue area)

Service Order:

FCFS: first come, first served
SJF: smallest job first

A First-Come-First-Served (FCFS) System with One Server

Think of job size as the actual CPU time a job needs, normalized by the server's speed. $I$ is the distribution of interarrival times, with arrival rate $\lambda = 1/E[I]$, and $S$ is the job-size distribution: each job's size $S_i \sim S$ is a separate draw from it, with service rate $\mu = 1/E[S]$. For a stable single-queue system we always assume $\lambda < \mu$.

We calculate the expected number of jobs in the system at time $t$, where $A(t)$ and $C(t)$ are the numbers of arrivals and completions by time $t$:

$\begin{align*} N(t) =& A(t) - C(t) \tag{by definition} \\ E[N(t)] =& E[A(t)] - E[C(t)]\\ =& \lambda t - E[C(t)]\\ \geq& \lambda t - \mu t \tag{since $E[C(t)] \leq \mu t$: the server may sometimes be idle}\\ =& t(\lambda - \mu)\\ \lim_{t \to \infty} E[N(t)] =& \infty \tag{when $\lambda > \mu$}\\ \end{align*}$

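In this class, we assume stability: $\lambda < \mu$. Otherwise $E[N(t)]$ grows with $t$ and is unbounded. Assuming stability, a $D/D/1$ queue (deterministic interarrival and service times) has $T_Q = 0$ and $T = S$, where $T_Q$ is a job's time in queue and $T$ its total time in the system: each job finishes before the next one arrives.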
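A minimal sketch of a single-server FCFS queue, assuming interarrival times and job sizes are given as plain lists (the function name `fcfs_single_server` is hypothetical). Each job starts at the later of its arrival time and the previous job's finish time, which is enough to check the $D/D/1$ claim and to watch delays grow without bound when $\lambda > \mu$.

```python
from typing import List, Tuple


def fcfs_single_server(interarrivals: List[float],
                       sizes: List[float]) -> List[Tuple[float, float]]:
    """Return (T_Q, T) for each job in a single-server FCFS queue.

    interarrivals[i] is the gap before job i arrives (interarrivals[0] is
    measured from time 0); sizes[i] is job i's size.
    """
    results = []
    arrival = 0.0
    prev_finish = 0.0
    for gap, size in zip(interarrivals, sizes):
        arrival += gap
        start = max(arrival, prev_finish)  # wait while the server is still busy
        prev_finish = start + size
        results.append((start - arrival, prev_finish - arrival))
    return results


# D/D/1 with lambda = 1/2 < mu = 1: no job ever waits, so T_Q = 0 and T = S.
print(fcfs_single_server([2.0] * 5, [1.0] * 5))
# → [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]

# D/D/1 with lambda = 1 > mu = 1/2: job n waits n time units, so delay grows with t.
print([t_q for t_q, _ in fcfs_single_server([1.0] * 5, [2.0] * 5)])
# → [0.0, 1.0, 2.0, 3.0, 4.0]
```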

Kendall Notation

Kendall Notation: a shorthand for describing a single queue.

In Kendall notation, we assume all interarrival and service times are independent: interarrival times are i.i.d. draws from $I$, and service times are i.i.d. draws from $S$.

$[\text{Interarrival Time}]/[\text{Service Requirement}]/[\text{Number of Servers}] = I/S/k$

Example: $M/M/k$ means $I \sim \text{Exp}(\lambda)$, $S \sim \text{Exp}(\mu)$, and there are $k$ servers. $M$ (Markovian) denotes an exponential distribution, $D$ a deterministic one, and $G$ a general distribution.

An optional fourth slot may denote the scheduling policy or the buffer capacity, but there is no consensus on which.
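As a hedged illustration of the notation, the sketch below splits an `I/S/k` string into its slots and maps the letters `M` and `D` to samplers with a given rate. The helper names are hypothetical, any fourth slot is ignored, the server count must be a number, and `G` raises an error because a general distribution has to be supplied explicitly.

```python
import random
from typing import Callable, Tuple


def parse_kendall(spec: str) -> Tuple[str, str, int]:
    """Split an 'I/S/k' string such as 'M/M/3' into its first three slots."""
    parts = spec.split("/")
    if len(parts) < 3:
        raise ValueError(f"expected at least I/S/k, got {spec!r}")
    return parts[0], parts[1], int(parts[2])  # any fourth slot is ignored


def make_sampler(letter: str, rate: float,
                 rng: random.Random) -> Callable[[], float]:
    """Return a sampler with mean 1/rate for the Kendall letter M or D."""
    if letter == "M":  # Markovian: exponential with the given rate
        return lambda: rng.expovariate(rate)
    if letter == "D":  # Deterministic: always exactly 1/rate
        return lambda: 1.0 / rate
    raise ValueError(f"no default sampler for {letter!r}; supply one for G")


print(parse_kendall("M/D/2"))                          # ('M', 'D', 2)
print(make_sampler("D", 4.0, random.Random(0))())      # 0.25
```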

Throughput

Throughput: long-run rate of job completions over time.

$X = \lim_{t \to \infty} \frac{C(t)}{t} \text{ jobs/sec}$

Throughput for M/G/k System

For a stable system with one queue and $k$ servers, $X = \lambda$ (stability requires $\lambda < k\mu$). For an unstable system, all $k$ servers are always busy in the long run, so $X = k\mu$.
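A rough simulation sketch, assuming Poisson arrivals and exponential job sizes (an M/M/k queue, a special case of M/G/k) and hypothetical function names, estimates $X = C(t)/t$ and compares it with $\min(\lambda, k\mu)$. Each arriving job takes whichever server frees up first, which is FCFS with $k$ servers.

```python
import heapq
import random


def simulate_mmk_throughput(lam: float, mu: float, k: int,
                            horizon: float, seed: int = 0) -> float:
    """Estimate X = C(t)/t at t = horizon for an FCFS M/M/k queue."""
    rng = random.Random(seed)
    free_at = [0.0] * k   # min-heap: time at which each server next becomes free
    t = 0.0
    completions = 0
    while True:
        t += rng.expovariate(lam)          # next Poisson arrival
        if t > horizon:
            break
        earliest = heapq.heappop(free_at)  # FCFS: take the first server to free up
        finish = max(t, earliest) + rng.expovariate(mu)
        heapq.heappush(free_at, finish)
        if finish <= horizon:
            completions += 1
    return completions / horizon


def throughput_formula(lam: float, mu: float, k: int) -> float:
    """X = lambda when stable (lambda < k * mu), otherwise k * mu."""
    return min(lam, k * mu)


for lam in (2.0, 5.0):  # stable and unstable cases with k = 3, mu = 1
    print(lam, simulate_mmk_throughput(lam, 1.0, 3, 20_000.0),
          throughput_formula(lam, 1.0, 3))
```

With $\lambda = 2$ the estimate should sit near $\lambda$; with $\lambda = 5$ it should saturate near $k\mu = 3$.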

Throughput for Network of Queues

Consider a network in which:

There are $k$ queues
Each queue $i$ receives arrivals from outside the system at rate $r_i$
A job leaving queue $i$ exits the system with probability $p_{i, out}$
A job leaving queue $i$ moves to queue $j$ (possibly $j = i$, a self-loop) with probability $p_{i, j}$
The total arrival rate into queue $i$ is therefore

$\lambda_i = r_i + \sum_j \lambda_j p_{j, i}$

Assuming the system is stable, its throughput is $X = \sum_{i = 1}^k r_i$, since every job that enters eventually leaves. To ensure that none of the queues inside the system blows up, we need:

$(\forall i)(\lambda_i < \mu_i)$
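The traffic equations $\lambda_i = r_i + \sum_j \lambda_j p_{j, i}$ can be solved numerically. This sketch uses plain fixed-point iteration (a swapped-in choice; a linear solver would also work), assumes an open network in which every job eventually leaves, and indexes queues from 0. The function names and the two-queue example are hypothetical.

```python
from typing import List, Sequence


def arrival_rates(r: Sequence[float], P: Sequence[Sequence[float]],
                  tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Solve lambda_i = r_i + sum_j lambda_j * P[j][i] by fixed-point iteration.

    P[j][i] is the probability that a job leaving queue j goes to queue i;
    a row may sum to less than 1, the remainder being p_{j,out}.
    """
    k = len(r)
    lam = list(r)
    for _ in range(max_iter):
        new = [r[i] + sum(lam[j] * P[j][i] for j in range(k)) for i in range(k)]
        if max(abs(a - b) for a, b in zip(new, lam)) < tol:
            return new
        lam = new
    raise RuntimeError("traffic equations did not converge")


def is_stable(lam: Sequence[float], mu: Sequence[float]) -> bool:
    """The network is stable only if lambda_i < mu_i for every queue i."""
    return all(lam_i < mu_i for lam_i, mu_i in zip(lam, mu))


# Hypothetical network: outside jobs enter queue 0 at rate 1, queue 0 always
# routes to queue 1, and queue 1 sends half its jobs back to queue 0.
lam = arrival_rates([1.0, 0.0], [[0.0, 1.0], [0.5, 0.0]])
print(lam)                          # approximately [2.0, 2.0]
print(is_stable(lam, [3.0, 2.5]))   # True
print(is_stable(lam, [3.0, 1.5]))   # False: queue 1 would blow up
```

As a consistency check, the outflow from queue 1 is $\lambda_1 \cdot p_{1, out} = 2 \cdot 0.5 = 1 = \sum_i r_i$, matching $X$.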

Other Throughput

Throughput for Deterministic Routing: if every job passes through server $i$ exactly $c$ times, then $X = \lambda$ but $X_i = c\lambda$.

Throughput for a Finite Buffer: assume an arriving job is dropped if the buffer is full. Then

$\begin{align*} X =& \lambda \cdot \Pr\{\text{job is not dropped}\}\\ =& \lambda \cdot (1 - \Pr\{\text{buffer is full when the job arrives}\})\\ <& \lambda \tag{when drops occur with positive probability}\\ \end{align*}$
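A minimal simulation sketch of the finite-buffer case, assuming exponential interarrival and service times, a single FCFS server, and a hypothetical `capacity` that counts every job in the system, including the one in service. Arrivals that find the system full are dropped, so the estimated $X$ should come out below $\lambda$.

```python
import random
from collections import deque


def finite_buffer_throughput(lam: float, mu: float, capacity: int,
                             horizon: float, seed: int = 0) -> float:
    """Estimate X = C(t)/t for one FCFS server holding at most `capacity` jobs."""
    rng = random.Random(seed)
    in_system = deque()   # finish times of jobs still in the system, in FCFS order
    t = 0.0
    completions = 0
    while True:
        t += rng.expovariate(lam)               # next arrival
        if t > horizon:
            break
        while in_system and in_system[0] <= t:  # these jobs have already left
            in_system.popleft()
        if len(in_system) >= capacity:
            continue                            # system full: the job is dropped
        start = in_system[-1] if in_system else t  # queue behind the last job
        finish = start + rng.expovariate(mu)
        in_system.append(finish)
        if finish <= horizon:
            completions += 1
    return completions / horizon


print(finite_buffer_throughput(1.0, 1.0, 2, 100_000.0))  # below lambda = 1
```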

Utilization

Utilization (load): the long-run fraction of time that a device is busy, where $B(t)$ is the total time the device is busy during $[0, t]$. For a single server ($k = 1$):

$\begin{align*} \rho = \lim_{t \to \infty} \frac{B(t)}{t} &\overset{\text{Little's Law, stable system}}{=} \frac{\lambda}{\mu} = \lambda E[S] \end{align*}$

For $k$ homogeneous servers, the utilization of each server is

$\begin{align*} \rho &\overset{\text{Little's Law}}{=} \frac{\lambda}{k\mu} = \frac{\lambda E[S]}{k} \end{align*}$
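To see $\rho = \lambda E[S]$ numerically, the sketch below measures $B(t)/t$ for one FCFS server with Poisson arrivals. The job-size distribution is an assumption chosen to be non-exponential (uniform on $[0.5, 1.5]$, so $E[S] = 1$), and `busy_fraction` is a hypothetical helper name.

```python
import random
from typing import Callable


def busy_fraction(lam: float, size_sampler: Callable[[random.Random], float],
                  horizon: float, seed: int = 0) -> float:
    """Estimate rho = B(t)/t for one FCFS server fed by Poisson arrivals."""
    rng = random.Random(seed)
    t = 0.0
    server_free_at = 0.0
    busy = 0.0
    while True:
        t += rng.expovariate(lam)
        if t > horizon:
            break
        start = max(t, server_free_at)
        finish = start + size_sampler(rng)
        # Services never overlap on one server; count only the part inside [0, horizon].
        busy += max(0.0, min(finish, horizon) - start)
        server_free_at = finish
    return busy / horizon


rho_hat = busy_fraction(0.5, lambda rng: rng.uniform(0.5, 1.5), 100_000.0)
print(rho_hat)  # should be close to lambda * E[S] = 0.5
```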

Little's Law

Purpose:

to relate the system's mean quantities to each other
a mean divided by a mean gives a rate, e.g. $\lambda = E[N] / E[T]$

We can represent the system as a DTMC or CTMC in which each state is the number of jobs in the system. This works directly only for memoryless distributions; other distributions can be approximated by combining several memoryless phases. Other methods exist as well, such as the "tagged job" method.

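Modeling the System as a Markov Chain: the chain (and the system) is ergodic if it is

irreducible: every number of jobs can be reached from every other
aperiodic: there is no periodicity in the times at which state $0$ is visited; this is automatic when time is continuous, as we typically assume
positive recurrent: the mean time to return to each state is finite, i.e., the system is stable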
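For an M/M/1 queue, the CTMC whose state is the number of jobs is a birth-death chain: arrivals move $n \to n+1$ at rate $\lambda$ and completions move $n \to n-1$ at rate $\mu$. This minimal sketch assumes we truncate the chain at a hypothetical cap `n_max` so the stationary distribution fits in a list, solves the balance equations $\lambda \pi_n = \mu \pi_{n+1}$, and reports $E[N]$.

```python
from typing import List


def mm1_stationary(lam: float, mu: float, n_max: int = 500) -> List[float]:
    """Stationary distribution of the M/M/1 chain truncated at n_max jobs.

    Only meaningful when lam < mu (positive recurrence); otherwise the
    truncation hides the fact that the untruncated chain blows up.
    """
    weights = [1.0]
    for _ in range(n_max):
        # Balance across the cut between n and n + 1: lam * pi[n] = mu * pi[n + 1].
        weights.append(weights[-1] * lam / mu)
    total = sum(weights)
    return [w / total for w in weights]


pi = mm1_stationary(0.5, 1.0)
mean_jobs = sum(n * p for n, p in enumerate(pi))
print(round(mean_jobs, 6))  # 1.0, matching rho / (1 - rho) with rho = 0.5
```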

Little's Law: for any ergodic system, we have

$E[N] = \lambda E[T]$

Intuition: for a stable system, the mean time between completions equals the mean time between arrivals, $E[I] = \frac{1}{\lambda}$. Therefore $E[T] = E[N] \cdot E[\text{Time between completions}] = \frac{E[N]}{\lambda}$.

Proof: We draw each job's time in the system $T$ on a timeline and compute the area under this picture up to time $t$.

Each job is a horizontal rectangle whose length is the job's total time in the system, $T = T_Q + S$. The x-axis is time and the y-axis is the number of jobs currently in the system, so the height of the stack at time $s$ is $N(s)$.

$\begin{align*} \sum_{i = 1}^{C(t)} T_i \leq& \text{Area} &&\leq \sum_{i = 1}^{A(t)} T_i\\ \sum_{i = 1}^{C(t)} T_i \leq& \int_0^t N(s)\, ds &&\leq \sum_{i = 1}^{A(t)} T_i\\ \frac{\sum_{i = 1}^{C(t)} T_i}{C(t)} \cdot \frac{C(t)}{t} \leq& \frac{\int_0^t N(s)\, ds}{t} &&\leq \frac{\sum_{i = 1}^{A(t)} T_i}{A(t)} \cdot \frac{A(t)}{t}\\ \lim_{t \to \infty} \frac{\sum_{i = 1}^{C(t)} T_i}{C(t)} \cdot \frac{C(t)}{t} \leq& \lim_{t \to \infty} \frac{\int_0^t N(s)\, ds}{t} &&\leq \lim_{t \to \infty} \frac{\sum_{i = 1}^{A(t)} T_i}{A(t)} \cdot \frac{A(t)}{t}\\ \lim_{t \to \infty} \frac{\sum_{i = 1}^{C(t)} T_i}{C(t)} \cdot X \leq& \bar{N}^{\text{Time Average}} &&\leq \lim_{t \to \infty} \frac{\sum_{i = 1}^{A(t)} T_i}{A(t)} \cdot \lambda \tag{by definition, assuming the limits exist}\\ \bar{T}^{\text{Time Average}} \cdot X \leq& \bar{N}^{\text{Time Average}} &&\leq \bar{T}^{\text{Time Average}} \cdot \lambda \tag{both averages over jobs converge to the same limit}\\ \bar{T}^{\text{Time Average}} \cdot \lambda \leq& \bar{N}^{\text{Time Average}} &&\leq \bar{T}^{\text{Time Average}} \cdot \lambda \tag{$X = \lambda$ for a stable system}\\ \bar{N}^{\text{Time Average}} =& \lambda \bar{T}^{\text{Time Average}}\\ E[N] =& \lambda E[T] \tag{time averages equal ensemble averages w.p. 1 for ergodic systems}\\ \end{align*}$
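The argument can be checked on one simulated sample path. This sketch assumes an FCFS M/M/1 queue (the helper names are hypothetical); it computes the time average of $N(s)$ by sweeping arrival and departure events, and compares it with $\lambda$ (estimated as $A(t)/t$) times the mean $T$ of the jobs completed by the horizon.

```python
import random
from typing import List, Tuple


def mm1_sample_path(lam: float, mu: float, horizon: float,
                    seed: int = 0) -> List[Tuple[float, float]]:
    """(arrival, departure) times of FCFS M/M/1 jobs arriving in [0, horizon]."""
    rng = random.Random(seed)
    jobs = []
    t, prev_departure = 0.0, 0.0
    while True:
        t += rng.expovariate(lam)
        if t > horizon:
            return jobs
        prev_departure = max(t, prev_departure) + rng.expovariate(mu)
        jobs.append((t, prev_departure))


def littles_law_check(jobs: List[Tuple[float, float]],
                      horizon: float) -> Tuple[float, float]:
    """Return (time-average N, lambda_hat * mean T) measured on one sample path."""
    # Time average of N(s): sweep arrivals (+1) and departures (-1) in time order.
    events = sorted([(a, 1) for a, _ in jobs] +
                    [(d, -1) for _, d in jobs if d <= horizon])
    area, n, last = 0.0, 0, 0.0
    for when, delta in events:
        area += n * (when - last)
        n, last = n + delta, when
    area += n * (horizon - last)
    # Right-hand side: A(t)/t times the mean response time of completed jobs.
    done = [d - a for a, d in jobs if d <= horizon]
    lam_hat = len(jobs) / horizon
    return area / horizon, lam_hat * sum(done) / len(done)


n_avg, lam_t = littles_law_check(mm1_sample_path(0.5, 1.0, 100_000.0), 100_000.0)
print(n_avg, lam_t)  # both should be close to E[N] = 1 when rho = 0.5
```

The two numbers differ only through the jobs still in the system at the horizon, mirroring the gap between the lower and upper bounds in the proof.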

Note that the law does not require FCFS order: it holds under any scheduling policy, and for any ergodic system or any part of one.

Corollaries of Little's Law

Little's Law for Time in Queue: for any system where $\lambda = X$ and all the relevant limits exist:

$\bar{N}_Q^{\text{Time Average}} = \lambda \cdot \bar{T}_Q^{\text{Time Average}}$

The proof is the same, except that when summing up the area we leave out the portion of each rectangle during which the job is in service. A job's queueing time may break into several segments, since it can wait in one queue, be served, and then wait in another queue.

Utilization Law: within an ergodic network of queues,

$\begin{align*} \rho_i =& \Pr\{\text{facility } i \text{ busy}\}\\ =& E[N_{\text{job in service at } i}] \tag{only one job is served at a time}\\ =& X_i \cdot E[T_{\text{service time}}] \tag{Little's Law with the server as the system}\\ =& \lambda_i \cdot E[T_{\text{service time}}] \tag{$X_i = \lambda_i$ by ergodicity}\\ =& \lambda_i \cdot E[S_{i}] \tag{time spent in the server is the service time}\\ =& \frac{\lambda_i}{\mu_i}\\ \end{align*}$
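A small numeric check of the Utilization Law, with hypothetical numbers: if queue $i$ receives a total arrival rate $\lambda_i = 2$ jobs/sec and its mean job size is $E[S_i] = 0.4$ sec (so $\mu_i = 2.5$ jobs/sec), then

$\begin{align*} \rho_i = \lambda_i \cdot E[S_i] = 2 \cdot 0.4 = 0.8 = \frac{2}{2.5} = \frac{\lambda_i}{\mu_i} \end{align*}$

so server $i$ is busy 80% of the time, and the stability condition $\lambda_i < \mu_i$ holds.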

Little's Law for Red Jobs: for an ergodic system, where "red" is any class of jobs,

$E[N_{red}] = \lambda_{red} \cdot E[T_{red}]$

The proof is the same except we now only sum up jobs that are red.
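A sketch of the red-jobs version, assuming an FCFS M/M/1 queue in which each arrival is independently marked red with a hypothetical probability `p_red`; only the red jobs' rectangles contribute to the area. The function name is also hypothetical.

```python
import random
from typing import Tuple


def red_littles_law(lam: float, mu: float, p_red: float, horizon: float,
                    seed: int = 0) -> Tuple[float, float]:
    """Return (time-average N_red, lambda_red * mean T_red) for an FCFS M/M/1 queue.

    The server ignores colour, so red jobs are just a labelled subset of all jobs.
    """
    rng = random.Random(seed)
    t, prev_departure = 0.0, 0.0
    red_area, red_arrivals, red_times = 0.0, 0, []
    while True:
        t += rng.expovariate(lam)
        if t > horizon:
            break
        prev_departure = max(t, prev_departure) + rng.expovariate(mu)
        if rng.random() < p_red:
            red_arrivals += 1
            # The red job's rectangle, clipped to [0, horizon], adds to the area under N_red.
            red_area += min(prev_departure, horizon) - t
            if prev_departure <= horizon:
                red_times.append(prev_departure - t)
    lam_red = red_arrivals / horizon
    return red_area / horizon, lam_red * sum(red_times) / len(red_times)


n_red, rhs = red_littles_law(0.5, 1.0, 0.3, 100_000.0)
print(n_red, rhs)  # both should be close to 0.15 jobs/sec * 2 sec = 0.3
```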
