queueing.html

COP 4610: Operating Systems & Concurrent Programming

Queueing Theory & Other Analysis Techniques

/TD>

The objective of these notes is to make you aware of the existence of analytic techniques for predicting OS performance, to give you a flavor of how they work, and to give some examples of simple results. We are limited in how deep we can go without making assumptions about background in probability theory. The only actual result we expect everyone to remember and be able to apply is Little's Result. There is a very nice "picture proof" of this result accessible via a link further below, which we go over in class. I hope you will read and follow it. You may even see it on an examination.

The ".jpg" figures in these notes are reproduced for the use of students in this course from William Stallings' textbook and on-line notes on Queueing Analysis ( ftp://shell.shore.net/members/w/s/ws/Support/QueuingAnalysis.pdf). You can read that document if you would like to see all the words that go with the figures.

There are some other nice notes, due to G.A. Vignaux, at http://www.mcs.vuw.ac.nz/~vignaux/docs/mm1.html. These include a very good explanation of statistical equilibrium, Markov processes, and Little's Result. I strongly recommend looking at them, as they are better than what Stallings wrote on the subject.

OS Performance Analysis Techniques

Build the new/modified system, instrument it, and test it
Try to extrapolate from performance of the most similar existing systems
Build a mathematical model, and analyze it (Queueing Theory)
Build a software simulation model, instrument it, and run it

Stallings' textbook may give the impression that there is no interesting mathematical theory underlying operating systems. That is not true. The textbook simply plays down the theory, to broaden the potential readership.

There are several areas with well developed theory, some bits of which pop up here and there in Stallings text. These examples include:

deadlock and its detection, based on graph theory
deterministic scheduling theory, based on sets, relations, and some calculus
performance analysis, based on a branch of probability theory, called queueing theory
distributed algorithms, their correctness and performance

Queueing theory, originally developed by industrial engineers for optimizing the flow of work in businesses, has also been very useful for modeling the flow of work in computers and networks. Without much background in probability, we will not be able to do much with queueing theory in this course. However, there is one simple and very useful result that does not require much knowledge of probability. That is called Little's Result. We will look at queueing theory at a very superficial level, and in particular at one simple result from queueing theory, known as Little's Result.

Queueing theory is mentioned on pp 414-416 in the text.

More complex systems can be studied using simulations. Simulation is discussed on pp 417-419 in text.

Projected Versus Actual Response Time

Projection can be inaccurate. The figure compares actual response time of a system against a projection made from actual data up to a load of 0.5. The projection is based on fitting a 3rd-order polynomial to the known data.

Queueing Theory

application of probability theory
analyzes queueing (waiting and service) behavior
provides some good analytic results, for simple systems
also provides a mathematical basis for simulation results

Single Server Queue

The most basic queuing model is the single-server queue.

Single Server Queue

maximum input rate = 1 / average service time:

Queues become very large (grow exponentially) as utilization (ρ) approaches 1.

Assumptions:

population: usually assumed infinite, so arrival rate is not reduced by loss of population
queue size: usually assumed unbounded, so no items lost from system

Multiserver Queue

Theoretical maximum utilization (N * ρ) is N * 100%, so theoretical maximum input rate is N / average service time.

Multiple Single-Server Queues

Queueing System Notation (1)

Note that "r" above (taken from texbook) is not residence time, but the number of resident work items.

Also, the terms "traffic intensity" and "utilization" are generally interpreted as meaning the same thing.

Queueing System Notation (2)

Queueing Theory Questions

Inputs
- arrival rate
- service time
Outputs
- items waiting
- waiting time
- items in residence
- residence time

How to outputs relate to inputs? e.g., how their averages relate?

X/Y/N Classification of Queueing Models

X = interarrival time distribution
Y = service time distribution
N = number of servers

G = general independent arrivals/service times
M = negative exponential distribution (from Markov)
D = deterministic arrivals or fixed length service

e.g., M/M/1 refers to single-server model with Poisson arrival rate and exponential service times

Poisson Distribution: Probablity Density

The Poisson distribution is for a discrete-valued random variable, such as the number of arrivals that occur within a fixed lengh (e.g., unit) time interval.

(Inverse) Exponential Distribution: Formulae

Here, 1 / λ is the mean value.

Exponential Distribution: Probability Distribution Function

Exponential Distribution: Probability Density Function

It can be shown that the exponential distribution is the only distribution that has the "memoryless" property. That is, the probability that the time until the next arrival at the (any) current time is independent of when the last preceding arrival occurred.

Basic M/G/1 (General) Queueing Relationships

When standard deviation = mean, the service time is exponential, so we have M/M/1.

When standard deviation is zero, the service time is constant, so we have M/D/1.

Little's Result/Law

At statistical equilibrium,
the mean number of jobs/requests in a system encountered by new arrival
= the mean number in the system left behind on departure.
Inventory = Throughput * Flow_Time

(e.g., jobs = jobs/second * seconds)
N = r W, where:
- N = mean number in the system
- r = arrival rate (= throughput, at equilibrium)
- W = mean time in system (= response time)

This is a very valuable observation that has a broad range of applications and is worth pondering. I has been stated in many ways.

John Bentley, in his Programming Pearls (a very good book, which I hope to finish reading some day!) gives an example:

``For instance, if you're in line waiting to get into a popular nightspot, you might figure out how long you'll have to wait by standing there for a while and trying to estimate the rate at which people are entering. With Little's Law, though, you could reason, `This place holds about 60 people, and the average Joe will be in there about 3 hours, so we're entering at the rate of about 20 people an hour. The line has 20 people in it, so that means we'll wait about an hour.''

Bentley also cites the following, attributing them to Peter Denning:

``The average number of objects in a queue is the product of the entry rate and the average holding time.''

``I have 150 cases of wine in my basement and I consume (and purchase) 25 cases per year. How long do I hold each case? Little's Law tells me to divide 150 cases by 25 cases/year, which gives 6 years per case.''

``The response-time formula for a multi-user system can be proved using Little's Law and flow balance. Assume n users of average think time z are connected to an arbitrary system with response time r. Each user cycles between thinking and waiting-for-response, so the total number of jobs in the meta-system (consisting of users and the computer system) is fixed at n. If you cut the path from the system's output to the users, you see a meta-system with average load n, average response time z+r, and throughput x (measured in jobs per time unit). Little's Law says n = x*(z+r), and solving for r gives r = n/x - z.''

A Good Explanation

A very nice explanation of Little's Law is given at http://www.mcs.vuw.ac.nz/~vignaux/docs/mm1.html.

Please read it. It is better than anything Stallings wrote on the subject, or these notes. I have only provided a link to it, rather than embedding it here, out of respect for the author's copyright.

Mean Queue Size for Single-Server Queue

The figure shows the relationship of average queue size to utilization, for several different distributions. The coefficient of variation is std deviation of service time divided by the mean service time, and is a measure of variability in the distribution.

The graph for mean residence time looks very similar. This makes sense, if we figure the longer the queue the longer the residence time.

M/G/1 Service Times

M/M/1 Service Times

M/D/1 Service Times

Example of a Network of Queues

Traffic Partitioning

Traffic Merging

Simple Tandem Queue (Sequencing)

Jackson's Theorem

Network of m nodes with independent exponential service.

Outside arrivals have a Poisson rate (i.e., exponential inter-arrival times).

Once served at a node, an item goes immediately to one of the other nodes with a fixed probability, or out of the system.

Says a network of queues, each node may be analyzed separately from the others using the M/M/1 or M/M/N model, and the results combined using ordinary statistical methods.

Example Queuing Theory Application: DB Server

LAN with 100 PC's and DB server
average query service time is 0.6 seconds
std. deviation = mean
peak query rate over the LAN is 20 queries per minute
Questions
- what is the average response time?
- what percentage growth can be accommodated before response time exceeds 1.5 second?
- if utilization increases 20% will response time increase by more or less than 20%?

ρ = λ T_s
= (20 arrivals/minute)(0.6 seconds/transmission)/(60 sec/min)
= 0.2

avg response time = T_r
= T_s/(1 - ρ)
= 0.6/(1 - 0.2) = 0.75 seconds

© 2002, 2005 T. P. Baker & Florida State University. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without written permission. (Last updated by $Author: cop4610 $ on $Date: 2002/09/26 20:06:06 $.)