COMPUTER AND NETWORK
SYSTEM ADMINISTRATION
Summer 1996 - Lesson 23
More Performance Analysis
A. Straining memory
1. vmstat -S on mu (SunOS)
procs memory page disk faults cpu
r b w avm fre si so pi po fr de sr d0 d1 d2 d3 in sy cs us sy id
3 1 0 0 4912 0 0 144 0 0 0 0 10 0 0 0 370 363 147 31 61 8
3 1 0 0 4912 0 0 176 0 0 0 0 0 0 0 0 424 392 163 30 70 0
3 1 0 0 5556 5 0 188 0 0 0 0 0 0 0 0 446 428 172 38 62 0
3 1 0 0 5556 0 0 200 0 0 0 0 0 0 0 0 509 439 177 33 67 0
5 0 0 0 5576 0 0 196 0 0 0 0 0 0 0 0 495 427 179 39 61 0
5 0 0 0 5576 0 0 172 0 0 0 0 0 0 0 0 423 411 164 47 53 0
5 0 0 0 5576 0 0 144 0 0 0 0 1 0 0 0 397 400 148 39 52 9
1 1 0 0 3720 0 0 144 0 0 0 0 0 0 0 0 377 398 163 43 56 1
1 1 0 0 3720 0 0 152 0 28 0 46 0 0 0 0 372 402 165 43 57 0
3 0 0 0 1632 0 0 144 4 148 0 270 1 0 0 0 361 425 175 46 54 0
3 0 0 0 1632 0 0 200 0 316 0 548 0 0 0 0 455 381 182 34 66 0
3 1 0 0 736 4 0 220 196 468 0 420 9 0 0 0 452 383 167 51 49 0
3 1 0 0 736 0 0 228 224 416 0 278 3 0 0 0 495 355 170 23 51 26
3 1 0 0 736 0 0 256 280 428 0 197 4 0 0 0 562 337 188 27 65 9
3 1 0 0 480 0 0 288 336 500 0 186 7 0 0 0 637 324 193 26 74 0
3 1 0 0 480 0 0 256 300 484 0 161 10 0 0 0 589 275 176 18 55 28
3 0 0 0 404 6 0 248 396 564 0 141 6 0 0 0 565 325 184 37 63 0
3 0 0 0 404 0 0 240 352 456 0 89 3 0 0 0 539 314 184 14 51 35
3 0 0 0 404 0 0 204 224 288 0 56 1 0 0 0 478 319 187 11 30 58
0 1 0 0 6896 5 0 128 140 180 0 35 0 0 0 0 312 237 124 5 11 84
0 1 0 0 6896 0 0 156 88 112 0 22 2 0 0 0 362 242 141 14 40 46
0 1 0 0 9032 5 0 192 52 68 0 13 0 0 0 0 444 231 156 6 34 59
0 1 0 0 9032 0 0 204 32 40 0 8 0 0 0 0 463 220 160 7 35 58
0 1 0 0 9032 0 0 212 16 24 0 4 0 0 0 0 477 201 154 8 37 55
0 1 0 0 8968 0 0 204 8 12 0 2 1 0 0 0 474 192 149 9 33 58
0 1 0 0 8968 0 0 164 0 4 0 0 0 0 0 0 388 217 152 17 25 58
1 0 0 0 8472 1 0 160 0 0 0 0 0 0 0 0 384 222 155 14 28 58
1 0 0 0 8472 0 0 156 0 0 16 0 3 0 0 0 402 291 175 16 53 30
1 0 0 0 8472 0 0 136 0 0 16 0 2 0 0 0 379 332 177 23 42 35
> mu has 32 Mbytes of RAM
> lotsfree appears to be 4M bytes, (at most 1/8 of memory)
- threshold for pagedaemon to go to work
> desfree is 2M bytes, (at most 1/16 of memory)
- desired amount of free memory
> minfree is 1M, (at most 1/2 of desfree)
- involuntary swapping begins below this threshold
> note how the clock scan rate ("sr") climbs once "fre" drops below lotsfree
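The thresholds above can be checked against the "fre" column with a short sketch. It assumes that "fre" is reported in Kbytes and that the 4M/2M/1M values and their caps (1/8 of memory, 1/16 of memory, 1/2 of desfree) apply as stated above; the function names are made up for illustration and are not part of any SunOS tool.

```python
def paging_thresholds(ram_kb, lotsfree_kb=4096, desfree_kb=2048, minfree_kb=1024):
    """Return (lotsfree, desfree, minfree) in Kbytes, applying the caps from the notes."""
    lotsfree = min(lotsfree_kb, ram_kb // 8)    # at most 1/8 of memory
    desfree = min(desfree_kb, ram_kb // 16)     # at most 1/16 of memory
    minfree = min(minfree_kb, desfree // 2)     # at most 1/2 of desfree
    return lotsfree, desfree, minfree


def memory_state(fre_kb, thresholds):
    """Classify one vmstat 'fre' sample against the three thresholds."""
    lotsfree, desfree, minfree = thresholds
    if fre_kb < minfree:
        return "below minfree: involuntary swapping"
    if fre_kb < desfree:
        return "below desfree"
    if fre_kb < lotsfree:
        return "below lotsfree: pagedaemon scanning"
    return "above lotsfree"


mu = paging_thresholds(32 * 1024)          # mu has 32 Mbytes of RAM
for fre in (4912, 3720, 1632, 736, 404):   # samples taken from the fre column above
    print(fre, memory_state(fre, mu))
```

In the trace above, "sr" first becomes nonzero on the 3720 sample, which is the first one this sketch places below lotsfree.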
B. swap space
1. the disk area used for paging and swapping
- SunOS: typically the b partition (/dev/sd0b)
2. many rules of thumb
- look at swap utilization
pstat -s (BSD-based)
top (Linux)
- example from mu:
7624k allocated + 1608k reserved = 9232k used
23488k available
- example from xi:
10852k allocated + 2516k reserved = 13368k used
53156k available
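To compare hosts, the two examples can be reduced to a percentage of swap in use. This is a minimal sketch that assumes total swap is "used" plus "available" and that the text follows the layout of the two examples above; the function name is hypothetical.

```python
import re


def swap_utilization(pstat_text):
    """Percent of swap in use, taking total swap as used + available."""
    used = int(re.search(r"(\d+)k used", pstat_text).group(1))
    avail = int(re.search(r"(\d+)k available", pstat_text).group(1))
    return 100.0 * used / (used + avail)


mu_text = "7624k allocated + 1608k reserved = 9232k used, 23488k available"
xi_text = "10852k allocated + 2516k reserved = 13368k used, 53156k available"

mu_pct = swap_utilization(mu_text)
xi_pct = swap_utilization(xi_text)
print(f"mu: {mu_pct:.1f}%")   # mu: 28.2%
print(f"xi: {xi_pct:.1f}%")   # xi: 20.1%
```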
3. adding swap
- spreading swap over several local disks should improve
performance
- adding swap used to require repartitioning the disk
- now can use a file within a filesystem
mkfile
> creates a file suitable for swap
(padded with zeroes)
/usr/etc/swapon
> adds file to swap area
> don't add until file system is mounted
> can't 'unswapon'
C. Disk performance
1. want to optimize 3 factors
- per-process disk throughput
- aggregate disk throughput
- disk storage efficiency
2. per-process disk throughput
> the speed at which a single process can read or
write to a disk
> simple to measure - "cp" a large file
> MAX throughput: 2M /sec
> local disk to disk: 32M / 35 sec = 914K/sec
> this is good since the "cp" process only got about
half of the CPU time
> network copy: 125K /sec
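The arithmetic behind the local copy figure is sketched below. The 914K/sec result only comes out if 1M is taken as 1000K, so the sketch shows both conventions; the variable names are illustrative only.

```python
def throughput_kps(kbytes, seconds):
    """Average throughput in Kbytes/sec for a timed copy."""
    return kbytes / seconds


decimal_rate = throughput_kps(32 * 1000, 35)   # 1M = 1000K, matches the 914K/sec above
binary_rate = throughput_kps(32 * 1024, 35)    # 1M = 1024K gives a slightly higher figure
fraction_of_max = decimal_rate / 2000          # against the quoted 2M/sec maximum

print(f"{decimal_rate:.0f}K/sec (decimal), {binary_rate:.0f}K/sec (binary)")
print(f"{fraction_of_max:.0%} of maximum")
```

The copy reaches a little under half of the quoted maximum, which fits the remark that "cp" got only about half of the CPU time.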
3. aggregate disk throughput
> difficult to measure
> depends on job mix
4. disk storage efficiency
> minimize wasted space
> performance and space optimization are usually
in an inverse relationship
> decreasing the block size improves space utilization
but decreases performance
5. best recommendation
> spread disk load around
> most workstations these days have /, swap, and /usr
on a local disk and everything else on NFS mounted
partitions
> which drives (or NFS servers) are the most loaded?
> place the handful of most common binaries on local
/usr partition
6. iostat
- example: iostat -d 5
sd0 sd1 sd3
bps tps msps bps tps msps bps tps msps
4 1 0.0 17 2 0.0 1 0 0.0
79 11 0.0 23 3 0.0 2 0 0.0
59 8 0.0 72 10 0.0 2 0 0.0
0 0 0.0 10 2 0.0 0 0 0.0
12 2 0.0 0 0 0.0 87 11 0.0
0 0 0.0 0 0 0.0 477 60 0.0
2 0 0.0 11 2 0.0 452 57 0.0
13 2 0.0 11 1 0.0 464 58 0.0
13 2 0.0 60 9 0.0 479 61 0.0
0 0 0.0 3 0 0.0 458 57 0.0
0 0 0.0 3 0 0.0 521 65 0.0
0 0 0.0 20 3 0.0 465 58 0.0
7 1 0.0 12 2 0.0 501 63 0.0
4 1 0.0 23 4 0.0 471 59 0.0
0 0 0.0 2 0 0.0 511 64 0.0
0 0 0.0 19 3 0.0 137 17 0.0
0 0 0.0 16 2 0.0 0 0 0.0
0 0 0.0 14 2 0.0 0 0 0.0
18 3 0.0 31 5 0.0 4 1 0.0
> bps - average Kbytes/sec during last interval
> msps - msec / seek, unreliable due to controller
specificity, ignore it
> tps - average number of transfers/sec during previous
interval (above reflects 8K size)
- take averages over long period of use and study peaks
- move files among disks and servers to equalize load
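Averages and peaks per disk can be pulled out of the sample above with a short sketch. It assumes three columns (bps, tps, msps) per disk in the order shown; the helper names are made up for illustration and do not belong to iostat.

```python
SAMPLE = """\
4 1 0.0 17 2 0.0 1 0 0.0
79 11 0.0 23 3 0.0 2 0 0.0
59 8 0.0 72 10 0.0 2 0 0.0
0 0 0.0 10 2 0.0 0 0 0.0
12 2 0.0 0 0 0.0 87 11 0.0
0 0 0.0 0 0 0.0 477 60 0.0
2 0 0.0 11 2 0.0 452 57 0.0
13 2 0.0 11 1 0.0 464 58 0.0
13 2 0.0 60 9 0.0 479 61 0.0
0 0 0.0 3 0 0.0 458 57 0.0
0 0 0.0 3 0 0.0 521 65 0.0
0 0 0.0 20 3 0.0 465 58 0.0
7 1 0.0 12 2 0.0 501 63 0.0
4 1 0.0 23 4 0.0 471 59 0.0
0 0 0.0 2 0 0.0 511 64 0.0
0 0 0.0 19 3 0.0 137 17 0.0
0 0 0.0 16 2 0.0 0 0 0.0
0 0 0.0 14 2 0.0 0 0 0.0
18 3 0.0 31 5 0.0 4 1 0.0
"""
DISKS = ["sd0", "sd1", "sd3"]


def summarize(text, disks):
    """Return {disk: (average bps, peak bps)} from iostat -d data rows."""
    totals = {d: 0 for d in disks}
    peaks = {d: 0 for d in disks}
    rows = 0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 * len(disks):
            continue                      # skip anything that is not a data row
        rows += 1
        for i, disk in enumerate(disks):
            bps = int(fields[3 * i])      # first of the three columns per disk
            totals[disk] += bps
            peaks[disk] = max(peaks[disk], bps)
    return {d: (totals[d] / rows, peaks[d]) for d in disks}


summary = summarize(SAMPLE, DISKS)
busiest = max(summary, key=lambda d: summary[d][0])
for disk, (avg, peak) in summary.items():
    print(f"{disk}: average {avg:.1f} K/sec, peak {peak} K/sec")
print("busiest:", busiest)

# Average transfer size for one busy sd3 interval: bps / tps
transfer_kb = 477 / 60
```

Dividing bps by tps for a busy interval gives a value close to 8, which is the 8K transfer size mentioned above.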
7. large vs. small partition sizes
- example: 9 gbyte disk
- UNIX limits a disk to 7 usable partitions
> a-h, with c taken (the whole disk)
- arguments for LARGE partitions:
> fewer mounts and unmounts
> fewer file systems to manage (fewer quota files, etc.)
> less wasted space
- arguments for SMALL partitions
> easier to back up if size is less than
media size (2.0 G, for example)
> file system overflow affects fewer users
> file system corruption affects fewer users
> (SunOS) format command works!
8. How full to let file systems become
- if too full then the block allocation routine
slows down trying to find contiguous blocks
- eventually fails and moves to other cylinder groups
- Leffler states that as file system free space approaches
zero the file system throughput approaches 1/2 of normal
- the free-space reserve guards against this; it is set automatically
when the file system is created
- (BSD:) default reserve is 10%, but it doesn't stop root!
D. Network performance (and integrity)
1. netstat -i
input output
packets errs packets errs colls
---------------------------------------
37135998 8 27664258 72 1753344
- a large number of input errors means that there may be
some faulty hardware on the local net
> should be under 0.025% of input packets
- a large number of output errors may indicate that the
local machine's transceiver, ethernet controller, or
AUI cable is faulty
> should be under 0.025% (above is 0.0003%)
- collisions are normal on an ethernet, may be as high as 10%
> if consistently higher than 10% then consider bridging
or subnetting
> above is 6.3%
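The percentages quoted for the sample can be reproduced as follows. This is a sketch only: the collision rate is taken relative to output packets, which is what yields the 6.3% figure above, and the variable names are illustrative.

```python
def pct(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


in_packets, in_errs = 37135998, 8
out_packets, out_errs, colls = 27664258, 72, 1753344

in_err_pct = pct(in_errs, in_packets)
out_err_pct = pct(out_errs, out_packets)
coll_pct = pct(colls, out_packets)

print(f"input errors:  {in_err_pct:.5f}%  (limit 0.025%)")
print(f"output errors: {out_err_pct:.5f}%  (limit 0.025%)")
print(f"collisions:    {coll_pct:.1f}%  (limit 10%)")
```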
2. nfsstat -c
- can also be used to test for network corruption of packets
Example from nu:
Client rpc:
calls badcalls retrans badxid timeout wait newcred timers
1966207 21774 0 549 21756 0 0 8344
- retrans field indicates number of packets this host had to
retransmit as an RPC client
- badxid field indicates that the client received a reply for which
there is no outstanding call
- if retrans is over 5% of total calls then suspect trouble
> above example is about 1%, counting the 21756 timeouts against
1966207 calls (the retrans column itself reads 0)
- what kind of trouble?
> if badxid and retrans are roughly equal then some
NFS server is having trouble keeping up with NFS load
> if retrans is high and badxid is low then network itself
is the problem (high load or data corruption)
- nfsstat lets you zero the counters at any time with:
nfsstat -z
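The rules of thumb for the client RPC counters can be written down as a small decision function. This is a sketch under stated assumptions: "roughly equal" is taken to mean that badxid is at least half of retrans, a cutoff chosen here for illustration and not given in the notes, and the function name is hypothetical. Because the retrans column in the example reads 0, the usage line feeds in the timeout count instead, which is also an assumption.

```python
def diagnose_rpc(calls, retrans, badxid, trouble_pct=5.0, similar_ratio=0.5):
    """Return (retransmission percentage, verdict) for client RPC counters."""
    if calls == 0:
        return 0.0, "no calls recorded"
    pct = 100.0 * retrans / calls
    if pct <= trouble_pct:
        return pct, "ok"
    # badxid close to retrans: a reply arrived for most retried calls,
    # so some server is answering slowly.
    if badxid >= similar_ratio * retrans:
        return pct, "server overloaded"
    # Many retransmissions but few stray replies: the network itself is suspect.
    return pct, "network problem"


# Counters from nu, using the timeout column (21756) as the retry count
rate, verdict = diagnose_rpc(calls=1966207, retrans=21756, badxid=549)
print(f"{rate:.1f}% -> {verdict}")
```

With the figures from nu the rate stays well under the 5% threshold, so neither trouble case is reached.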
3. tracking network problems
- right way, buy a LAN analyzer
> timing reflections (with a Time Domain Reflectometer - TDR)
tells you how far up the wire a reflection occurs
(within inches)
- wrong way
> methodically dissect the network
4. other ways to assess network load
- netstat shows active connections ("netstat -a")
- use "spray" / "tcpspray" to load network and see weak spots