CIS 5406 - Lecture Notes # 22 - Performance Analysis



                          COMPUTER AND NETWORK
                         SYSTEM  ADMINISTRATION
                         Summer 1995 - Lesson 22

                          Performance Analysis


A. Introduction

   1. Performance is affected by the efficiency of the
      four main resources that a system offers:

      - CPU speed
      - memory speed and amount
      - disk bandwidth
      - network bandwidth

   2. These resources are interrelated:

      - NFS traffic depends on network bandwidth as well
        as disk bandwidth

      - disk bandwidth depends on memory if disk caching
        is in place

   3. What is good performance?

      - to the user it is usually keyboard response time

      - it may be execution time: 'How long does it take to
        run my job?'

      - the system administrator must distinguish between
        poor performance caused by system malfunctioning
        and that caused by heavy usage

      - times of heavy usage are good times to analyze the
        system and see where bottlenecks are

      - this will help you determine where to put scarce
        funds

   4. Run time

      - several commands will time a job:

      - /usr/bin/time, /usr/5bin/time (Solaris), or the shell's
        built-in "time", for example:

        time find ...

        user    system  wall     (U+S)/W  avg     avg        page
        CPU     CPU     time              shared  unshared   faults
                                          mem     data
        -------------------------------------------------------------
        0.200u  4.930s  0:10.73  47.8%    155k    150k       81pf
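      The (U+S)/W column is CPU time divided by elapsed time:
      (0.200 + 4.930) / 10.73 = 0.478. The job was therefore not
      running on the CPU for more than half of its elapsed time. Here is
      a minimal Python sketch that recomputes the column from a line of
      this form. The regular expression is an assumption based on the
      single sample above, and it handles only m:ss wall times.

```python
import re

# Assumed layout, taken from the sample line above:
#   <user>u <system>s <minutes>:<seconds> ...
TIME_LINE = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<min>\d+):(?P<sec>[\d.]+)"
)

def cpu_share(line: str) -> float:
    """Recompute (U+S)/W, as a percentage, from user, system and wall time."""
    m = TIME_LINE.search(line)
    if m is None:
        raise ValueError(f"unrecognized time output: {line!r}")
    user, system = float(m["user"]), float(m["sys"])
    wall = int(m["min"]) * 60 + float(m["sec"])   # m:ss.ss only, no hours
    return 100.0 * (user + system) / wall

print(round(cpu_share("0.200u 4.930s 0:10.73 47.8% 155k 150k 81pf"), 1))  # 47.8
```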

   5. Interactive response time

      - UNIX makes every effort to prioritize interactive response
        time on the local machine

      - it does this to prevent keyboard buffers from overflowing

      - network delays are the primary culprit on a single-user workstation

      - on a large multi-user server, CPU sharing is the primary culprit

B. Memory performance analysis

   1. buying more memory is generally the cheapest way to 
      improve performance (especially now :)

   2. on a busy system, the active processes together often require
      more physical memory than is available

   3. to make memory available the kernel begins to copy pages
      of 'unneeded' memory to disk

   4. the kernel may also copy the memory image of an entire process
      to disk; this is called swapping

   5. review the basic UNIX memory management algorithm

      - early versions of UNIX (prior to 3BSD) were based on swapping only
      - processes were swapped out in their entirety (except for shared
        text)
      - beginning with 3BSD, demand paging was implemented
      - neither the working set model nor prepaging was incorporated
      - whenever it fills memory from a file, though, UNIX tries to read
        any pages of the file adjacent to the faulted page (called
        fill-on-demand clustering)
      - the algorithm was developed using extensive simulation
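      The following toy Python model shows why clustering helps a
      process that reads a file sequentially. It makes two assumptions
      that are not in the notes: the "adjacent" pages are the aligned
      group of `cluster` pages containing the faulting page, and memory
      is large enough that nothing is ever evicted.

```python
from typing import Iterable

def count_faults(refs: Iterable[int], cluster: int = 1) -> int:
    """Count page faults for a page reference string under demand paging.

    On a fault, the aligned group of `cluster` adjacent pages containing the
    faulting page is brought in at once; cluster=1 is pure demand paging.
    No page is ever evicted in this model.
    """
    resident = set()
    faults = 0
    for page in refs:
        if page not in resident:
            faults += 1
            start = (page // cluster) * cluster
            resident.update(range(start, start + cluster))
    return faults

# A process reading 16 consecutive pages of a file:
print(count_faults(range(16), cluster=1))  # 16
print(count_faults(range(16), cluster=4))  # 4
```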
 
   6. The pagedaemon
 
      - the paging algorithm was implemented partly in the kernel and
        partly in a new process called the "pagedaemon" - process 2
      - process 0 is the swapper, and process 1 is init
      - the kernel memory is fixed (not paged)
      - a data structure called the core map is also fixed
      - the core map contains information about the contents of each
        page frame in memory
      - a page frame may contain a page (text, data, stack, page table) or
        may be on the free list
      - the pagedaemon's main job is to periodically check and see if the
        free list is getting short, and if so, it frees up some pages
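      Below is a hypothetical Python model of the core map and of the
      pagedaemon's periodic check. The field names are illustrative and
      are not the actual 4.3BSD kernel structure. The free list is found
      by scanning rather than kept as a linked list. The threshold is
      called lotsfree, as in the next item.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Frame:
    """Hypothetical core map entry describing one page frame."""
    owner_pid: Optional[int] = None  # None: the frame is on the free list
    kind: str = "free"               # "text", "data", "stack", "page table" or "free"
    referenced: bool = False         # reference bit consulted by page replacement
    dirty: bool = False              # modified since loaded; write before reuse

def free_frames(core_map: List[Frame]) -> List[int]:
    """Indices of the page frames currently on the free list."""
    return [i for i, f in enumerate(core_map) if f.owner_pid is None]

def free_list_short(core_map: List[Frame], lotsfree: int) -> bool:
    """The pagedaemon's periodic check: are fewer than `lotsfree` frames free?"""
    return len(free_frames(core_map)) < lotsfree

# Eight frames, seven of them holding data pages of (hypothetical) process 42:
core_map = [Frame(owner_pid=42, kind="data") for _ in range(7)] + [Frame()]
print(free_frames(core_map))                  # [7]
print(free_list_short(core_map, lotsfree=2))  # True
```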

   7. The 4.3BSD UNIX page replacement algorithm

      - the pagedaemon also executes the page replacement algorithm
      - every 250 msec the pagedaemon wakes up and checks whether the
        number of page frames on the free list is >= lotsfree (typically
        1/4 of memory)
      - if there are enough free page frames the daemon goes back to sleep
      - if not then the daemon starts looking for pages to eject
      - a global algorithm is used - that is, a page from any process
        is a candidate for ejection
      - the page replacement algorithm used is the two-handed
        clock algorithm
      - this algorithm approximates global LRU
      - the hands of the clock sweep across all page frames
      - the first hand CLEARS a page's reference bit (in the core map)
      - if the page is referenced again afterwards, its reference bit is SET
      - the second hand checks to see if the reference bit is set and
        if it is not, then the page is a candidate for ejection
      - it is written to disk (if dirty), and the page frame is placed on
        free list
      - its contents are not erased, so the page can be reclaimed if it is
        needed again before the frame is overwritten
      - if hands are close together then only heavily used pages are
        spared
      - if hands are far apart and memory is large then practically every
        page is spared and the PFF goes way up! 
      - UNIX keeps the hands 2 Mbytes apart (if memory is divided into 1 K
        clusters and pages are scanned at about 200 pages per second, the
        time between the first and second hand is about 10 seconds)
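      The next block is a toy Python simulation of the two-handed clock
      rule above, followed by the hand-spacing arithmetic. It rests on
      three assumptions: there is one reference bit per frame, held in a
      list; page references are injected at chosen steps through a
      hypothetical `touched` schedule; and the second hand starts only
      once the first hand is `spread` frames ahead. The sketch does not
      model writing dirty pages or reclaiming freed pages.

```python
from typing import Dict, Iterable, List, Optional

def two_handed_clock(ref_bits: List[bool], spread: int, steps: int,
                     touched: Optional[Dict[int, Iterable[int]]] = None) -> List[int]:
    """Toy two-handed clock sweep over page frames.

    ref_bits: one reference bit per page frame (modified in place).
    spread:   distance in frames between the first and the second hand.
    steps:    number of frames the first hand advances.
    touched:  hypothetical schedule {step: frames referenced at that step}.
    Returns the frames placed on the free list, in the order freed.
    """
    n = len(ref_bits)
    touched = touched or {}
    freed: List[int] = []
    for step in range(steps):
        for frame in touched.get(step, ()):
            ref_bits[frame] = True            # a process used this page
        ref_bits[step % n] = False            # first hand clears the bit
        back = step - spread                  # second hand trails behind
        if back >= 0:
            frame = back % n
            if not ref_bits[frame] and frame not in freed:
                freed.append(frame)           # unused since the first hand passed
    return freed

# 8 frames, hands 2 frames apart; frame 2 is referenced at step 3, after the
# first hand cleared it (step 2) but before the second hand arrives (step 4).
print(two_handed_clock([True] * 8, spread=2, steps=10, touched={3: [2]}))
# [0, 1, 3, 4, 5, 6, 7]

# Hand spacing from the notes: 2 Mbytes of 1 Kbyte clusters at ~200 per second.
spread_clusters = (2 * 1024 * 1024) // 1024   # 2048 clusters between the hands
print(spread_clusters / 200)                   # 10.24 seconds, "about 10 seconds"
```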

   8. swapper
  
      - the swapper moves out processes that have been idle for more
        than 20 seconds (preventative swapping - normal housekeeping)
      - if the pagedaemon cannot keep the free list at lotsfree, and the
        number of Kbytes of free memory falls below minfree, then
        the swapper kicks in (desperation swapping)

      - the swapper chooses a process to swap out based on two criteria:
        > longest sleep time
        > if none are sleeping, then resident memory size
          (the swapper takes the 4 largest processes, then picks the one
           that has been resident longest)

      - when a process is swapped out, everything goes - even the user 
        structure and the page tables

      - swapping is much more expensive than paging so a highly loaded
        system - that invokes swapping frequently - does not perform well

      - UNIX (BSD) attempts to prevent swapping by making lotsfree large,
        frequently one-fourth of memory
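      The victim-selection rule above can be written as a hypothetical
      Python sketch. The Proc fields stand in for the information the
      kernel keeps per process and are assumptions. When two processes
      tie, the first one in the list wins, because that is what max()
      returns.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Proc:
    pid: int
    sleeping: bool
    sleep_time: float     # seconds asleep (only meaningful if sleeping)
    rss_kb: int           # resident memory size in Kbytes
    resident_time: float  # seconds since the process was last swapped in

def choose_swap_victim(procs: List[Proc]) -> Optional[Proc]:
    """Pick a process to swap out using the two criteria in the notes."""
    sleepers = [p for p in procs if p.sleeping]
    if sleepers:
        # Criterion 1: the process that has been asleep the longest.
        return max(sleepers, key=lambda p: p.sleep_time)
    # Criterion 2: among the 4 largest processes, the one resident longest.
    largest = sorted(procs, key=lambda p: p.rss_kb, reverse=True)[:4]
    if not largest:
        return None
    return max(largest, key=lambda p: p.resident_time)

busy = [Proc(10, False, 0, 500, 30), Proc(11, False, 0, 800, 5),
        Proc(12, False, 0, 300, 90), Proc(13, False, 0, 700, 60),
        Proc(14, False, 0, 100, 500)]
print(choose_swap_victim(busy).pid)  # 12: pid 14 is not among the 4 largest
```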


   9. When do we have problems?

      - preventative swapping is normal
      - a ps -aux usually shows many swapped-out processes
      - paging is also part of normal operations

        > a new process must have new pages brought into memory
        > a process must also page in when it references a section of
          memory it has not used recently

      - page faults always cause a performance degradation

      - usually, the pagedaemon quickly fixes the problem by
        getting rid of unneeded pages and loading the needed
        ones

      - when the pagedaemon fails, desperation swapping begins
  
      - what types of processes are likely to be swapped out by
        desperation swapping?

        > ans: ones that sleep: editors, shells, generally interactive
               processes

        > keyboard response time goes to pot since a keystroke requires
          a disk access (and the disk is probably heavily loaded at this
          time)

  10. How to diagnose

      1. tools - BSD: vmstat
                 S5:  sar

      2. these tools report:

         page-ins
         page-outs
         swap-ins
         swap-outs

      3. page-ins

         - most UNIX systems use 'demand paging'
         - when a process is started only the memory
           maps for the process are loaded in physical
           memory
         - the first access to each page that is not yet in
           memory causes a page fault, and the page is brought
           in 'on demand'
         - the alternative is 'pre-paging'
         - thus page-ins are normal

      4. swap-ins
         - starting a new process is counted like a swap-in
         - so swap-in counts are not very useful as an indicator

      5. page-outs

         - this is a first indicator that your memory is
           inadequate
         - some page-out activity is normal
         - does the frequency of page-outs dramatically
           increase whenever system performance is sluggish?
         - the acceptable rate is O/S and hardware dependent
         - in order to know you need to establish baselines of
           activity
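      One simple way to turn a baseline into a test is sketched below
      in Python: flag a page-out rate that is more than k standard
      deviations above the baseline mean. Both the rule and k = 3 are
      arbitrary illustrative choices, not a standard. The sample baseline
      reuses the po column of the vmstat run at the end of this section.
      A real baseline should cover many days and several load levels.

```python
from statistics import mean, stdev
from typing import Sequence

def po_is_unusual(baseline: Sequence[float], current: float, k: float = 3.0) -> bool:
    """True if `current` page-outs/sec exceed the baseline mean by more than
    k sample standard deviations (an illustrative rule of thumb)."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    return current > mean(baseline) + k * stdev(baseline)

# po values (Kbytes/sec) from the vmstat sample in these notes:
baseline = [0, 0, 56, 12, 0, 0, 0, 16]
print(po_is_unusual(baseline, 40))   # False (threshold is about 68.9)
print(po_is_unusual(baseline, 200))  # True
```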

      6. swap-outs 

         - example vmstat -S

procs   memory              page                disk       faults     cpu
r b w avm   fre   si  so pi  po  fr  de  sr  d0 d1 d2 d3  in  sy  cs us sy id
0 0 0   0  3028    4   1  1   2   1   0   0   2  2  0  0  0  82 177  89 33 9


        - procs 

          Number of processes:
     
             r  - runnable (not waiting for I/O or sleeping)

             b  - blocked for resources (i/o, paging, etc.)

             w  - runnable or short sleeper (<  20  secs)  but
                  swapped

        - any number but 0 in the w column indicates what?

          > ans: desperation swapping
    
        -  memory 

             avm - number of active virtual Kbytes (used in last 20 secs)

             fre - size of the free list in Kbytes 

               > when this gets close to lotsfree, 2M on xi, then page-outs
                 begin

        - page 

          Report information about swapping, page faults, and paging
             activity

          Reported in units per second (averaged over last 5 seconds)

             si - processes swapped in
             so - processes swapped out (not due to idleness)
             pi - kilobytes per second paged in
             po - kilobytes per second paged out
             fr - kilobytes freed per second
             de - anticipated short-term memory shortfall in Kbytes
             sr - pages scanned by the clock algorithm per second

        - disk

          Report number of disk operations per second.

        - faults 

          Report trap/interrupt rate averages per second  over
          last 5 seconds.
          
             in - (non clock) device interrupts per second
             sy - system calls per second
             cs - CPU context switch rate (switches/sec)

        - cpu  

          Give a breakdown of percentage usage of CPU time.
       
             us - user time for normal and low priority processes
             sy - system time
             id - CPU idle

        - we are most concerned with swap-outs and page-outs

procs   memory              page               disk       faults     cpu
r b w avm   fre  si so  pi  po  fr  de  sr d0 d1 d2 d3  in  sy  cs us sy id

0 0 0   0  2508  20  0   0   0   0   0   0 13  0  0  0 226 216 350  7  6 87
0 0 0   0  2280   0  0  16   0   0   0   0  3  0  0  0 258 361 343  5  8 87
0 0 0   0  2104  21  0 124  56 184   0 111  5  0  0  0 545 667 563 14 16 70
0 0 0   0  2120   0  0  36  12  60   0  37  0  0  0  0 338 387 345  3  5 92
0 0 0   0  2076   0  0  12   0  28   0  23  1  0  0  0 263 271 370  3  4 92
0 1 0   0  2048   5  0   0   0  44  16  33  1  0  0  0 320 473 497  6  9 85
8 1 0   0  2116  10  0   0   0 100   0  56 23  0  0  0 514 377 898 14 14 72
0 0 0   0  2084   5  0  24  16 148   0  67  6  0  0  0 350 424 529  9 10 81
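      Below is a minimal Python sketch that parses rows in the vmstat -S
      layout above and applies the checks from these notes: a non-zero w
      or so column (desperation swapping), page-outs, and fre at or
      below lotsfree. It assumes exactly the 22 columns shown, and other
      vmstat versions print different columns. The value
      lotsfree_kb = 2048 mirrors the 2M quoted for xi.

```python
from typing import Dict, List

# Column names from the "vmstat -S" header above. The second "sy" (CPU system
# time) is renamed "cpu_sy" so it does not clash with "sy" (system calls).
COLUMNS = ["r", "b", "w", "avm", "fre", "si", "so", "pi", "po", "fr", "de",
           "sr", "d0", "d1", "d2", "d3", "in", "sy", "cs", "us", "cpu_sy", "id"]

def parse_vmstat(lines: List[str]) -> List[Dict[str, int]]:
    """Turn vmstat data rows into dicts; headers and odd rows are skipped."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows

def diagnose(row: Dict[str, int], lotsfree_kb: int) -> List[str]:
    """Apply the rules of thumb from these notes to one vmstat row."""
    notes = []
    if row["w"] > 0 or row["so"] > 0:
        notes.append("desperation swapping")
    if row["po"] > 0:
        notes.append("page-outs")
    if row["fre"] <= lotsfree_kb:
        notes.append("free list at or below lotsfree")
    return notes

sample = """\
r b w avm   fre  si so  pi  po  fr  de  sr d0 d1 d2 d3  in  sy  cs us sy id
0 0 0   0  2508  20  0   0   0   0   0   0 13  0  0  0 226 216 350  7  6 87
0 0 0   0  2104  21  0 124  56 184   0 111  5  0  0  0 545 667 563 14 16 70
0 1 0   0  2048   5  0   0   0  44  16  33  1  0  0  0 320 473 497  6  9 85
"""
for row in parse_vmstat(sample.splitlines()):
    print(row["fre"], diagnose(row, lotsfree_kb=2048))
# 2508 []
# 2104 ['page-outs']
# 2048 ['free list at or below lotsfree']
```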