COMPUTER AND NETWORK
SYSTEM ADMINISTRATION
Summer 1999 - Lesson 11
Performance Analysis
Reference: Chapter 7 of ESA and Chapter 11 of EWNTSA.
A. Introduction
1. When performance is bad, user complaints come in the form of:
"Why is the system so sloooooow?"
or
"My job is taking forever to run!"
Users will report slow keyboard response or long compilation times.
Ideally, you as the administrator notice these problems first,
before the user complaints start pouring in.
2. Where to start? What to monitor?
Performance is determined by the efficiency of the
four main resources that a system offers:
- CPU
- Memory
- Disk
- Network
3. These resources are all interrelated:
- NFS traffic depends on network bandwidth as well
as disk bandwidth
- Disk bandwidth depends on memory if disk caching
is in place
4. What is good performance?
- The system administrator must distinguish between
poor performance caused by system malfunctioning
and that caused by heavy usage
- Times of heavy usage are good times to analyze the
system and see where bottlenecks are
- This will help you determine where to put scarce
funds
- Long-term analysis, although harder to implement, gives you
better data.
B. CPU monitoring
1. time: time a command
- several system commands will time a job
- /usr/bin/time, /usr/5bin/time (Solaris), shell's built-in "time"
Example (/usr/bin/time):
% /usr/bin/time find / -name csh.1 -print
/usr/share/man.xi.orig/man1/csh.1
real 3.2
user 0.4
sys 1.9
real: wall clock time
user: user CPU time
sys: system CPU time
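The three figures can also be collected from a script. The sketch below is a minimal Python illustration, assuming a Unix system with Python 3; `time_command` is a hypothetical helper name. It measures wall clock time around the child and takes user and system CPU time from `getrusage(RUSAGE_CHILDREN)`, which only counts children that have already been waited for.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv as a child process and return (real, user, sys) in seconds,
    the same three figures /usr/bin/time reports."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=True)          # waits for the child to finish
    real = time.monotonic() - start           # wall clock time
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime   # user CPU time of the child
    sys_ = after.ru_stime - before.ru_stime   # system CPU time of the child
    return real, user, sys_


if sys.platform == "win32":
    raise SystemExit("getrusage is only available on Unix")
```

For example, `time_command(["find", "/", "-name", "csh.1", "-print"])` times the same job as the session above.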
Example (csh built-in time):
% time find / -name csh.1 -print
/usr/share/man.xi.orig/man1/csh.1
0.39u 1.64s 0:02.56 79.2%
0.39u: user CPU time
1.64s: system CPU time
0:02.56: wall clock time
79.2%: percentage of time spent on CPU ((u+s)/w)
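The percentage is simple arithmetic on the other three fields. This Python sketch (function names are hypothetical) parses a csh time line and recomputes it; it handles only the m:ss.ss wall clock form. Applied to the line above it gives about 79.3%, slightly different from the 79.2% csh printed, presumably because the displayed times are rounded.

```python
def parse_csh_time(line):
    """Split a csh time line such as '0.39u 1.64s 0:02.56 79.2%' into
    (user, system, wall) seconds. Only the m:ss.ss wall clock form is handled."""
    u, s, w = line.split()[:3]
    minutes, seconds = w.split(":")
    return float(u.rstrip("u")), float(s.rstrip("s")), 60 * int(minutes) + float(seconds)


def cpu_percent(user, system, wall):
    """Share of elapsed (wall clock) time spent on the CPU: (u + s) / w * 100."""
    if wall <= 0:
        raise ValueError("wall clock time must be positive")
    return 100.0 * (user + system) / wall
```

A value well below 100% for a CPU-bound job suggests it spent much of its time waiting, for example on disk I/O or for a turn on a busy CPU.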
3. uptime: report current time, amount of time system has been up,
number of users, load average
Example:
% uptime
3:03pm up 1 day(s), 1:20, 14 users, load average: 0.20, 0.09, 0.08
- Load average is a rough measure of demand for the CPU
- Reports the average number of active processes (processes in the
run queue) over the last 1, 5 and 15 minutes
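The three numbers are not plain averages. On BSD-derived and Linux kernels they are exponentially damped moving averages, updated from a run-queue sample roughly every 5 seconds; other kernels may differ, so treat the 5-second interval as an assumption. The Python sketch below (hypothetical function names) shows why the 1-minute figure reacts fastest: after one minute of a single CPU-bound process it reaches about 0.63, while the 5- and 15-minute figures have barely moved.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between run-queue samples (assumed, as on BSD/Linux)


def update_load(load, runnable, minutes):
    """One sampling step of an exponentially damped moving average.
    load: previous average; runnable: processes in the run queue now;
    minutes: averaging window (1, 5 or 15)."""
    decay = math.exp(-SAMPLE_INTERVAL / (60.0 * minutes))
    return load * decay + runnable * (1.0 - decay)


def simulate(samples):
    """Feed run-queue lengths through the 1, 5 and 15 minute averages,
    starting from an idle system, and return the final triple."""
    loads = [0.0, 0.0, 0.0]
    for n in samples:
        loads = [update_load(l, n, m) for l, m in zip(loads, (1, 5, 15))]
    return tuple(loads)
```

`simulate([1] * 12)` models one minute (12 samples) with one runnable process.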
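4. rup: show host status (uptime and load average) of remote machines,
either by broadcasting a query on the local network or by querying
the named hosts directly (each remote host must run the rstatd RPC service).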
Example:
% rup xi upsilon sed nu linuxfs1 linuxfs2
xi up 1 day, 1:31, load average: 0.13, 0.26, 0.19
upsilon up 63 days, 23:31, load average: 0.00, 0.02, 0.02
sed up 1 day, 21:47, load average: 0.00, 0.00, 0.00
nu up 37 days, 21:39, load average: 0.11, 0.09, 0.00
linuxfs1 up 1 day, 23:15, load average: 0.00, 0.06, 0.09
linuxfs2 up 14 days, 17:35, load average: 0.03, 0.01, 0.00
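When watching many hosts, it helps to sort rup output by load. The Python sketch below is an illustration only (function names and the regular expression are assumptions fitted to the output format shown above). It extracts the three load averages per host and lists hosts from busiest to least busy.

```python
import re

# Matches lines like "xi  up 1 day, 1:31, load average: 0.13, 0.26, 0.19"
LOAD_RE = re.compile(
    r"^(\S+)\s+up\s.*load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_rup(text):
    """Return {host: (load1, load5, load15)} from rup output."""
    loads = {}
    for line in text.splitlines():
        m = LOAD_RE.match(line.strip())
        if m:
            loads[m.group(1)] = tuple(float(x) for x in m.group(2, 3, 4))
    return loads


def busiest(loads, window=0):
    """Host names sorted by load, highest first; window 0/1/2 = 1/5/15 minutes."""
    return sorted(loads, key=lambda h: loads[h][window], reverse=True)
```

With the output above, `busiest(parse_rup(output))[0]` is `"xi"`.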
5. ps: report process status
- has many options - read man page for specifics
Example: (Solaris)
% ps -ef
UID PID PPID C STIME TTY TIME CMD
root 0 0 0 Jul 03 ? 0:00 sched
root 1 0 0 Jul 03 ? 0:04 /etc/init -r
root 2 0 0 Jul 03 ? 0:00 pageout
root 3 0 0 Jul 03 ? 3:36 fsflush
root 449 1 0 Jul 03 ? 0:00 /usr/lib/saf/sac -t 300
root 224 1 0 Jul 03 ? 1:09 /usr/lib/autofs/automountd
root 136 1 0 Jul 03 ? 0:27 /usr/sbin/rpcbind
healy 7763 7712 1 14:18:43 pts/7 0:12 emacs signal.c
koshy 3279 3276 0 Jul 03 pts/13 0:00 -reg-csh
root 17242 243 0 19:20:12 ? 0:00 /usr/samba/bin/smbd -D
UID: login name
PID: process id
PPID: process ID of the parent process
C: current scheduler value
STIME: start time
TTY: associated terminal
TIME: accumulated CPU time
CMD: command
- ps is often used in pipes
Example:
% ps -ef | grep httpd
nobody 9538 299 0 15:29:23 ? 0:00 /usr/local/etc/httpd/httpd
nobody 9302 299 0 15:22:59 ? 0:00 /usr/local/etc/httpd/httpd
nobody 9557 299 0 15:31:29 ? 0:00 /usr/local/etc/httpd/httpd
nobody 9540 299 0 15:29:24 ? 0:00 /usr/local/etc/httpd/httpd
nobody 9112 299 0 15:17:34 ? 0:00 /usr/local/etc/httpd/httpd
nobody 9304 299 0 15:23:22 ? 0:00 /usr/local/etc/httpd/httpd
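One catch with `ps -ef | grep httpd` is that the grep process itself often shows up in the listing, because its command line also contains "httpd". The Python sketch below (hypothetical names, assuming the Solaris `ps -ef` column layout above) parses the output and matches on the program name in the CMD field rather than the whole line. STIME is either `hh:mm:ss` or a date such as `Jul 03`, which is two whitespace-separated tokens.

```python
import os


def parse_ps_ef(text):
    """Parse Solaris-style `ps -ef` output (header line first) into dicts."""
    procs = []
    for line in text.splitlines()[1:]:            # skip the header line
        f = line.split()
        if len(f) < 5:
            continue
        i = 5 if ":" in f[4] else 6               # date-style STIME takes 2 tokens
        if len(f) < i + 3:
            continue
        procs.append({"uid": f[0], "pid": int(f[1]), "ppid": int(f[2]),
                      "stime": " ".join(f[4:i]), "tty": f[i],
                      "time": f[i + 1], "cmd": " ".join(f[i + 2:])})
    return procs


def matching(procs, name):
    """Processes whose program name (basename of the first CMD word) equals
    `name`; a 'grep httpd' process therefore does not match 'httpd'."""
    return [p for p in procs if os.path.basename(p["cmd"].split()[0]) == name]
```

Counting `len(matching(parse_ps_ef(output), "httpd"))` gives the number of server processes without the grep line.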
6. top: display and update information about the top cpu processes
- Excellent tool for overall view of system
- Combines output of several commands (uptime, ps, vmstat)
Example:
% top
last pid: 9649; load averages: 0.03, 0.05, 0.12 15:36:19
113 processes: 112 sleeping, 1 on cpu
CPU states: 97.6% idle, 1.0% user, 1.4% kernel, 0.0% iowait, 0.0% swap
Memory: 152M real, 48M free, 54M swap, 721M free swap
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
9649 barnash 31 0 1864K 1472K cpu 0:00 1.26% 0.99% top
5356 sheff 33 0 2064K 1384K sleep 8:19 0.33% 0.39% xsysstats
9114 nobody 33 0 1968K 1472K sleep 0:00 0.11% 0.16% httpd
9585 nobody 35 0 1968K 1448K sleep 0:00 0.11% 0.12% httpd
9304 nobody 33 0 1984K 1480K sleep 0:00 0.01% 0.10% httpd
PID: process id
USERNAME: name of the process's owner
PRI: current priority of the process
NICE: nice amount (in the range -20 to 20)
SIZE: total size of the process (text, data and stack; kilobytes)
RES: current amount of resident memory (kilobytes)
STATE: current state (sleep, wait, run, idl, zomb, stop)
TIME: number of system and user cpu seconds the process has used
WCPU: weighted percentage of cpu time
CPU: raw percentage of cpu time
COMMAND: name of the command
- From within top, you can control behavior of processes with renice and kill
renice:
- Change nice number (requested execution priority)
- Syntax: r new-nice-number pid
- nice range either -20 to 20 or 0 to 39
- The lower the nice number the higher the priority
- Only superuser can lower nice number
kill: terminate process
- Syntax: k [-signal] pid
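The same two actions are available to scripts through the underlying system calls. This Python sketch (Unix only; the wrapper names are hypothetical) mirrors top's `r` and `k` commands. Note that Linux caps the nice value at 19, and an unprivileged user who tries to lower a nice value gets a `PermissionError`.

```python
import os
import signal


def renice(pid, new_nice):
    """Set the nice value of `pid`, like top's `r new-nice-number pid`,
    and return the value the kernel now reports."""
    os.setpriority(os.PRIO_PROCESS, pid, new_nice)
    return os.getpriority(os.PRIO_PROCESS, pid)


def kill(pid, sig=signal.SIGTERM):
    """Send `sig` to `pid`, like top's `k [-signal] pid`; SIGTERM is the
    default, as with kill(1)."""
    os.kill(pid, sig)
```

Sending SIGTERM first gives the process a chance to clean up; SIGKILL (`signal.SIGKILL`) cannot be caught and should be a last resort.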
7. Task Manager - Windows NT
- CTRL-ALT-DEL and choose Task Manager
or CTRL-SHIFT-ESC
or right click on taskbar and choose Task Manager
- Applications: shows which applications are active
- Status should be "Running"
- If status is "Not Responding" you can use "End Task" to kill it
- Double click on application or click "Switch To" to bring it to front
- Click "New Task" to start new application
- Right click on application to bring up menu of options
- Processes: shows which processes are active (similar to top)
- Each application has one or more processes, but not every
process belongs to an application.
- Can end a process by choosing "End Process"; be sure you
know what the process is before ending it
- Right click on process to set priority
- Can reorder listing by clicking on column headings
(in ascending or descending numeric order)
- Performance: CPU and memory utilization
- Graphical representation of CPU utilization
- Minimize Task Manager for CPU utilization graphic on Task Bar
8. QuickSlice - WinNT Server Resource Kit
- Nice graphical tool for analyzing cpu utilization
C. Memory performance analysis
1. Buying more memory is generally the cheapest way to
improve performance (UNIX and NT)
2. Active processes often require more memory in total than is
physically available, so the system moves memory between RAM and disk:
- paging: involves moving sections of a process's memory to disk
- page fault: occurs when a process needs a page of memory that is not
resident and must be read in from disk
- swapping: writing an entire process to disk, freeing all of its memory
3. When do we have problems?
- Preventative swapping is normal
- A ps -aux usually shows many swapped-out processes
STAT column - W as second letter means swapped out
- Paging is also part of normal operations
> A new process must have new pages brought into memory
> Also must page in when it references non-recently used section
of memory
- Page faults always cause a performance degradation
- Usually, the pagedaemon quickly fixes the problem by
getting rid of unneeded pages and loading the needed ones
- When the pagedaemon cannot keep up, desperation swapping begins
- What types of processes are likely to be swapped out by
desperation swapping?
> Answer: ones that sleep: editors, shells, generally interactive
processes
> Keyboard response time goes to pot since a keystroke requires
a disk access (and the disk is probably heavily loaded at this
time)
4. How to diagnose performance problems
1. tools - BSD: vmstat
S5: sar
Solaris: mpstat
WinNT: Task Manager & Performance Monitor
2. These tools report:
page-ins
page-outs
swap-ins
swap-outs
3. Page-ins
- Most UNIX systems use 'demand paging'
- When a process is started only the memory
maps for the process are loaded in physical
memory
- The first access to each page not yet in memory causes a
page fault, and that page is brought in 'on demand'
- The alternative is 'pre-paging'
- Thus page-ins are normal
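A small simulation shows why page-ins are normal up to a point. The Python sketch below is a toy model, not how any particular kernel works: it counts page-ins for a sequence of page references, evicting the oldest page (FIFO, a simplifying assumption) when physical frames run out. With enough frames, each page is read in exactly once; with too few, the same pages are read in again and again.

```python
from collections import deque


def demand_page_ins(references, frames):
    """Count page-ins for a sequence of page references under demand paging.

    A page is read in only when it is first touched (a page fault). When all
    `frames` physical frames are full, the page loaded earliest is evicted
    (FIFO replacement, a simplifying assumption; real kernels approximate LRU).
    """
    resident = deque()          # pages currently in physical memory, oldest first
    page_ins = 0
    for page in references:
        if page in resident:    # already resident: no fault, no disk access
            continue
        page_ins += 1           # page fault: bring the page in on demand
        if len(resident) == frames:
            resident.popleft()  # evict the oldest page to make room
        resident.append(page)
    return page_ins
```

For the reference string `[1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]`, 10 frames give 5 page-ins (one per distinct page) while 3 frames give 9.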
4. Swap-ins
- A new process acts like a swap-in
5. Page-outs
- Page-outs are the first indicator that your memory may be
inadequate
- Some page-out activity is normal
- Does the frequency of page-outs dramatically
increase whenever system performance is sluggish?
- Acceptable rate is O/S and hardware dependent
- To know what rate is acceptable, you need to establish baselines of
activity (via short- or long-term performance measurements)
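Once a baseline exists, a simple statistical test can flag unusual page-out rates. The Python sketch below is one possible approach, not a standard method: it summarises page-out rates recorded while the system behaved normally and flags a new rate more than k standard deviations above the mean. The function names, the sample data and the choice k = 3 are all assumptions to adjust for your own systems.

```python
from statistics import mean, stdev


def baseline(po_samples):
    """Summarise page-out rates (e.g. vmstat's po column) recorded while the
    system behaved normally: returns (mean, standard deviation).
    Needs at least two samples."""
    return mean(po_samples), stdev(po_samples)


def is_unusual(po_rate, base, k=3.0):
    """Flag a page-out rate more than k standard deviations above the
    baseline mean. k=3 is an arbitrary starting point."""
    avg, sd = base
    return po_rate > avg + k * sd
```

For a made-up baseline of `[0, 2, 0, 4, 2, 0, 2]`, a rate of 56 is flagged and a rate of 4 is not.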
6. Swap-outs
- A heavy rate of swap-outs signifies a problem
7. Example (BSD):
% vmstat -S
procs memory page disk faults cpu
r b w avm fre si so pi po fr de sr d0 d1 d2 d3 in sy cs us sy id
0 0 0 0 3028 4 1 1 2 1 0 0 2 2 0 0 0 82 177 89 33 9
- procs
Number of processes:
r - runnable (not waiting for I/O or sleeping)
b - blocked for resources (i/o, paging, etc.)
w - runnable or short sleeper (< 20 secs) but
swapped
- Any number > 0 in the w column indicates what?
> Answer: desperation swapping
- memory
avm - number of active virtual Kbytes (used in last 20 secs)
fre - size of the free list in Kbytes
- page
Report information about swapping, page faults, and paging
activity
Reported in units per second (averaged over last 5 seconds)
si - procs swap-ins
so - procs swap-outs (not due to idle)
pi - kilobytes per second paged in
po - kilobytes per second paged out
fr - kilobytes freed per second
de - anticipated short term memory shortfall in Kbytes
sr - pages scanned by clock algorithm, per-second
- disk
Report number of disk operations per second.
- faults
Report trap/interrupt rate averages per second over last 5 seconds
in - (non clock) device interrupts per second
sy - system calls per second
cs - CPU context switch rate (switches/sec)
- cpu
Give a breakdown of percentage usage of CPU time.
us - user time for normal and low priority processes
sy - system time
id - CPU idle
- We are most concerned with swap-outs and page-outs
procs memory page disk faults cpu
r b w avm fre si so pi po fr de sr d0 d1 d2 d3 in sy cs us sy id
0 0 0 0 2508 20 0 0 0 0 0 0 13 0 0 0 226 216 350 7 6 87
0 0 0 0 2280 0 0 16 0 0 0 0 3 0 0 0 258 361 343 5 8 87
0 0 0 0 2104 21 0 124 56 184 0 111 5 0 0 0 545 667 563 14 16 70
0 0 0 0 2120 0 0 36 12 60 0 37 0 0 0 0 338 387 345 3 5 92
0 0 0 0 2076 0 0 12 0 28 0 23 1 0 0 0 263 271 370 3 4 92
0 1 0 0 2048 5 0 0 0 44 16 33 1 0 0 0 320 473 497 6 9 85
8 1 0 0 2116 10 0 0 0 100 0 56 23 0 0 0 514 377 898 14 14 72
0 0 0 0 2084 5 0 24 16 148 0 67 6 0 0 0 350 424 529 9 10 81
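Scanning long vmstat runs by eye is tedious. The Python sketch below (hypothetical names) parses data rows in the 22-column BSD `vmstat -S` layout shown above and reports the warning signs from these notes: a nonzero w column, swap-outs and page-outs. The second "sy" column (CPU system time) is renamed `cpu_sy` so the names stay unique. Applied to the eight sample rows above, it flags only the page-outs in rows 3, 4 and 8.

```python
# Column names for the 22-column BSD `vmstat -S` layout shown above.
COLUMNS = ("r b w avm fre si so pi po fr de sr d0 d1 d2 d3 "
           "in sy cs us cpu_sy id").split()


def parse_vmstat(text):
    """Turn vmstat data rows into dicts; header lines are skipped because
    they are not all digits or have a different number of fields."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def warnings(rows):
    """Yield (row number, reason) for swapped runnable processes (w),
    swap-outs (so) and page-outs (po)."""
    for n, row in enumerate(rows, 1):
        if row["w"] > 0:
            yield n, "w > 0: desperation swapping"
        if row["so"] > 0:
            yield n, f"so = {row['so']}: swap-outs"
        if row["po"] > 0:
            yield n, f"po = {row['po']}: page-outs"
```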
8. Example using System V sar tool (Solaris):
% sar -g 5
SunOS xi 5.5.1 Generic_103640-03 sun4u 07/05/97
20:15:43 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
20:15:48 0.00 0.00 0.00 0.00 0.00
- pgout/s: page-out requests per second
- ppgout/s: pages paged out per second
- pgfree/s: pages per second placed on the free list (reclaimed)
- pgscan/s: pages scanned per second in order
to find candidates to reclaim
- %ufs_ipf: percentage of UFS inodes taken off the free list that still
had reusable pages attached (those pages are flushed and cannot be reclaimed)
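The scan rate (pgscan/s here, sr in vmstat) comes from the page daemon's clock-style sweep over memory. The Python sketch below is a simplified one-handed "second chance" clock and only a model: real page daemons are more elaborate (Solaris uses a two-handed variant), and all names here are hypothetical. It shows why a high scan rate signals memory pressure: when most pages have been used recently, the daemon must examine many pages to reclaim a few.

```python
def clock_scan(referenced, hand, needed):
    """Second-chance ("clock") page scan, a simplified model of a page daemon.

    referenced: list of reference bits, one per page frame (modified in place)
    hand:       index where the clock hand currently points
    needed:     number of frames to reclaim
    Returns (reclaimed frame indices, pages scanned, new hand position).
    """
    n = len(referenced)
    reclaimed = []
    scanned = 0
    # Two full sweeps always suffice: the first one clears every reference bit.
    while len(reclaimed) < needed and scanned < 2 * n:
        if hand not in reclaimed:
            if referenced[hand]:
                referenced[hand] = False   # recently used: give it a second chance
            else:
                reclaimed.append(hand)     # unused since the last sweep: reclaim it
        scanned += 1
        hand = (hand + 1) % n
    return reclaimed, scanned, hand
```

With half the pages idle, reclaiming 2 frames from `[True, False, True, False, False]` scans 4 pages; with every page recently referenced, reclaiming just 1 frame from `[True] * 5` scans 6.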
9. Example using NT's Task Manager
- Choose the "Processes" tab
- From "View" menu, choose "Select columns..."
- Can choose from a variety of choices including Page Faults,
Virtual Memory Size, etc.
- "Performance" tab has graphical depiction of memory usage
and other statistics
- How many handles, threads and processes exist
- Total physical memory, how much is free and how much used for cache
- Commit Charge shows how much memory is committed to application
and system programs, along with the limit and the peak
- Kernel Memory shows how much memory the kernel uses, split into
paged and nonpaged
10. wmem freeware utility for NT: shows RAM and paging information
http://www.winsite.com/info/pc/winnt/dskutil/wmem.zip