Types of queues: how many queues for how many servers
router (one-to-one)
bank queue (one-to-many)
super market (many-to-many)
data center (different jobs require different number of servers)
Goal of Queueing Theory:
Predicting system performance
Capacity provisioning
Finding system designs that improve performance
Buffer: limited or unlimited temporary storage (queue area)
Service Order:
FCFS: first come, first served
SJF: shortest job first
Think of job size as the actual CPU time needed by a job, normalized by the server's capability. Note that I is the distribution of interarrival times, and each job size S_i \sim S is an i.i.d. draw from the job-size distribution (not a population distribution). We always assume \lambda < \mu for a stable single-queue system.
We calculate the expected number of jobs in the system at time t, E[N(t)]:
In our class, we assume stability: \lambda < \mu. Otherwise E[N(t)] grows with t and is therefore unbounded. Assuming stability, a D/D/1 queue has T_Q = 0 and T = S.
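The D/D/1 claim can be checked directly. Below is a minimal sketch (the rates and job counts are made up for illustration): with deterministic interarrivals 1/\lambda and deterministic service 1/\mu, stability gives T_Q = 0 for every job, while \lambda > \mu makes the waits grow without bound.

```python
def dd1_waits(lam, mu, n_jobs):
    """Return the time-in-queue T_Q of each job in a D/D/1 queue."""
    interarrival, service = 1.0 / lam, 1.0 / mu
    waits = []
    free_at = 0.0                      # time the single server next becomes free
    for i in range(n_jobs):
        arrival = i * interarrival
        start = max(arrival, free_at)  # wait if the server is still busy
        waits.append(start - arrival)  # T_Q for this job
        free_at = start + service
    return waits

stable = dd1_waits(lam=0.8, mu=1.0, n_jobs=100)
print(max(stable))                     # 0.0 -> T_Q = 0, so T = S

unstable = dd1_waits(lam=1.25, mu=1.0, n_jobs=100)
print(unstable[-1] > unstable[0])      # True: waits keep growing
```

The recursion `start = max(arrival, free_at)` is the only queueing logic needed for a single FCFS server.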
Kendall Notation: a representation of a single queue.
In Kendall notation, we assume the distributions are all independent, and all values (service times and interarrival times) are drawn i.i.d. from their respective distributions.
Example: M/M/k means I \sim \text{Exp}(\lambda), S \sim \text{Exp}(\mu), and there are k servers. G denotes a general (arbitrary) distribution.
Note that the fourth slot can mean either the scheduling policy or the buffer capacity, but there is no consensus.
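A sketch of what the M/M/1 notation encodes: i.i.d. draws I \sim \text{Exp}(\lambda) and S \sim \text{Exp}(\mu), fed into the Lindley recursion W_{n+1} = \max(0, W_n + S_n - I_{n+1}) for successive waiting times. The parameters and seed here are arbitrary choices for illustration.

```python
import random

def mm1_mean_wait(lam, mu, n_jobs, seed=0):
    """Estimate the mean time in queue of an M/M/1 queue via the Lindley recursion."""
    rng = random.Random(seed)
    w = total = 0.0
    for _ in range(n_jobs):
        total += w
        s = rng.expovariate(mu)    # job size S ~ Exp(mu), i.i.d.
        i = rng.expovariate(lam)   # next interarrival I ~ Exp(lam), i.i.d.
        w = max(0.0, w + s - i)    # next job's wait
    return total / n_jobs

est = mm1_mean_wait(lam=0.5, mu=1.0, n_jobs=200_000)
print(round(est, 2))
```

For these parameters the estimate should land near the classical M/M/1 value \lambda / (\mu(\mu - \lambda)) = 1.0 (stated here without derivation).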
Throughput: long-run rate of job completions over time.
In a stable system with one queue and k servers, X = \lambda < k\mu; in an unstable system, X = k\mu.
Imagine a system:
We have k queues
Each queue i receives input from outside the system at rate r_i
After service at queue i, a job leaves the system with probability p_{i, \text{out}}
After service at queue i, a job is routed to queue j (possibly including the self-loop j = i) with probability p_{i, j}
Therefore, we denote the total arrival rate into queue i as \lambda_i = r_i + \sum_{j = 1}^k \lambda_j p_{j, i}
Assuming a stable system, we have throughput X = \sum_{i = 1}^k r_i. But we need to ensure that none of the queues inside the system blows up. We need, for every queue i: \lambda_i < \mu_i
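The traffic equations \lambda_i = r_i + \sum_j \lambda_j p_{j,i} can be solved by fixed-point iteration. This is a sketch on a made-up 2-queue example: queue 0 feeds queue 1 with probability 0.5 (otherwise the job exits), and queue 1 always exits.

```python
def total_arrival_rates(r, p, iters=1000):
    """Solve lambda_i = r_i + sum_j lambda_j * p[j][i] by fixed-point iteration.

    r[i]: outside arrival rate into queue i; p[j][i]: routing probability j -> i.
    """
    lam = list(r)
    for _ in range(iters):
        lam = [r[i] + sum(lam[j] * p[j][i] for j in range(len(r)))
               for i in range(len(r))]
    return lam

r = [1.0, 0.0]
p = [[0.0, 0.5],   # queue 0 -> queue 1 with prob 0.5, exits otherwise
     [0.0, 0.0]]   # queue 1 always exits
lam = total_arrival_rates(r, p)
print(lam)   # [1.0, 0.5]
```

Here X = \sum_i r_i = 1.0, and stability requires \mu_0 > 1.0 and \mu_1 > 0.5. (The iteration converges when the routing matrix eventually sends jobs out of the system; a direct linear solve also works.)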
Throughput for Deterministic Routing: X = \lambda, but X_i = c\lambda where c is the number of times a job goes through server i.
Throughput for Finite Buffer: assume jobs are dropped if the buffer is full. Then X = \lambda (1 - p_{\text{drop}}), where p_{\text{drop}} is the long-run fraction of arrivals that are dropped.
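A finite buffer keeps the system stable even when \lambda > \mu, because excess work is dropped. The event-driven sketch below (parameters, buffer size K, and seed are made up) simulates a single server that holds at most K jobs and checks that the measured throughput matches \lambda(1 - p_{\text{drop}}).

```python
import random

def mm1k_throughput(lam, mu, K, horizon, seed=1):
    """Simulate a single exponential server holding at most K jobs.

    Returns (throughput X, fraction of arrivals dropped)."""
    rng = random.Random(seed)
    t, n = 0.0, 0
    next_arr = rng.expovariate(lam)
    next_dep = float('inf')
    arrivals = drops = completions = 0
    while t < horizon:
        if next_arr <= next_dep:               # next event: arrival
            t = next_arr
            arrivals += 1
            if n == K:
                drops += 1                     # buffer full: job is lost
            else:
                n += 1
                if n == 1:                     # server was idle: start service
                    next_dep = t + rng.expovariate(mu)
            next_arr = t + rng.expovariate(lam)
        else:                                  # next event: departure
            t = next_dep
            n -= 1
            completions += 1
            next_dep = t + rng.expovariate(mu) if n > 0 else float('inf')
    return completions / t, drops / arrivals

X, p_drop = mm1k_throughput(lam=2.0, mu=1.0, K=5, horizon=100_000)
print(round(X, 2), round(2.0 * (1 - p_drop), 2))
```

With \lambda = 2 > \mu = 1, X comes out well below \lambda: in the long run, completions can only match the accepted arrivals.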
Utilization (load): the fraction of time that the device is busy (always assuming one device, k = 1): \rho = \lim_{t \to \infty} \frac{B(t)}{t}, where B(t) is the total seconds of busy time from the start until time t.
Purpose: to calculate the system load as a ratio of means: \rho = \frac{\lambda}{\mu} = \frac{E[S]}{E[I]} (mean service time divided by mean interarrival time).
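A quick numerical sanity check (a sketch with arbitrary parameters and seed) that the busy fraction B(t)/t approaches \lambda/\mu for a stable single server: every served job contributes its size to B(t), so busy time over elapsed time estimates \rho.

```python
import random

def busy_fraction(lam, mu, n_jobs, seed=3):
    """Estimate B(t)/t for a stable FCFS single server."""
    rng = random.Random(seed)
    t = free_at = busy = 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)     # arrival time of the next job
        s = rng.expovariate(mu)       # its size
        start = max(t, free_at)       # wait for the server if busy
        free_at = start + s
        busy += s                     # each job adds s seconds to B(t)
    return busy / free_at             # B(t)/t measured at the last departure

rho = busy_fraction(lam=0.6, mu=1.0, n_jobs=100_000)
print(round(rho, 2))   # close to lam/mu = 0.6
```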
We can represent the system as a DTMC or CTMC where each state denotes the number of jobs in the system (for memoryless distributions only; if not, we can always approximate with a mixture of memoryless distributions). There are other methods, like the "tagged job" method.
Modeling System as Markov Chain: for ergodic system:
irreducible: every number of jobs in the system can be reached
aperiodic: no periodicity to the times when state 0 is visited. Typically we imagine time is continuous.
positive recurrent: system is stable
Little's Law: for any ergodic system, we have E[N] = \lambda E[T]
Intuition: for a stable system, E[\text{Time between completions}] = E[I] = \frac{1}{\lambda}, and therefore E[T] = E[N] \cdot E[\text{Time between completions}] = \frac{E[N]}{\lambda}.
Proof: we arrange the response time T of each job along a timeline and calculate the area under N(t) up to time t.
Notice the law does not require FCFS order and is independent of the scheduling policy. It holds for any system, and for any part of the system, as long as it is ergodic.
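The area argument in the proof can be checked numerically. This is a sketch with made-up parameters: over a horizon that starts and ends with an empty system, the area under N(t) equals the sum of all response times, so time-average N and \hat{\lambda} \cdot \overline{T} agree exactly on the sample path, not just in expectation.

```python
import random

def little_check(lam, mu, n_jobs, seed=2):
    """Simulate an FCFS single server; return (time-average N, lam_hat * mean T)."""
    rng = random.Random(seed)
    t = free_at = 0.0
    arrivals, departures = [], []
    for _ in range(n_jobs):
        t += rng.expovariate(lam)          # Poisson arrivals
        arrivals.append(t)
        start = max(t, free_at)            # wait if the server is busy
        free_at = start + rng.expovariate(mu)
        departures.append(free_at)
    horizon = departures[-1]               # system is empty after the last departure
    area = sum(d - a for a, d in zip(arrivals, departures))  # area under N(t)
    mean_T = area / n_jobs
    lam_hat = n_jobs / horizon             # empirical arrival rate (= X here)
    return area / horizon, lam_hat * mean_T

avg_N, rhs = little_check(lam=0.5, mu=1.0, n_jobs=10_000)
print(abs(avg_N - rhs) < 1e-9)   # True: Little's Law holds on the sample path
```

Note the identity is exact here because both sides are the same area divided by the same horizon; randomness only affects what that area is.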
Little's Law for Time in Queue: given any system where \lambda = X and all the relevant limits exist: E[N_Q] = \lambda E[T_Q]
The proof is the same, except when summing up the region we leave out the portions when the job is in service (the remaining area may break into segments, since a job can wait in the queue, then be in service, then wait at other queues...).
Utilization Law: within an ergodic network of queues, \rho_i = \lambda_i E[S_i] = \frac{\lambda_i}{\mu_i} for each device i
Little's Law for Red Jobs: for an ergodic system, E[N_{\text{red}}] = \lambda_{\text{red}} E[T_{\text{red}}]
The proof is the same, except we now sum only over jobs that are red.