COMPUTER AND NETWORK
SYSTEM ADMINISTRATION
Summer 1996 - Lesson 12
Adding Disks, File Systems
A. The UNIX file system
1. file systems
- file systems reside on mass storage media such as disks
> can also reside on other media (e.g., RAM)
- each disk is divided into one or more subdivisions called
partitions
- each partition may contain only one file system
- the file system abstraction:
array of bytes ---> array of logical blocks
> translates the user view of a file as an array of bytes
to the underlying structure of an array of logical blocks
plus an offset
> the file system reads and writes to logical blocks
> can't address anything smaller
- a logical block consists of one or more physical blocks
- the device driver:
logical blocks ---> physical blocks
> maps logical blocks to physical blocks on the disk
- the disk controller:
physical blocks ---> cylinder+head+sector
> a physical block consists of one or more contiguous sectors
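The three translation layers above can be sketched as plain arithmetic. This is only an illustration: the 8192-byte logical block, the 512-byte physical block (assumed to be exactly one sector), and the geometry (9 heads, 36 sectors per track) are assumptions chosen for the example, and all function names are hypothetical.

```python
from typing import NamedTuple


class DiskGeometry(NamedTuple):
    """Hypothetical disk geometry used only for this example."""
    heads: int               # tracks per cylinder
    sectors_per_track: int


def byte_to_logical(offset, block_size=8192):
    """File system layer: byte offset -> (logical block, offset in block)."""
    return divmod(offset, block_size)


def logical_to_physical(lbn, block_size=8192, phys_size=512):
    """Device driver layer: physical blocks backing one logical block."""
    per_logical = block_size // phys_size
    first = lbn * per_logical
    return range(first, first + per_logical)


def physical_to_chs(pbn, geom):
    """Controller layer: physical block -> (cylinder, head, sector).

    Assumes one sector per physical block and sectors numbered from 0.
    """
    cylinder, rest = divmod(pbn, geom.heads * geom.sectors_per_track)
    head, sector = divmod(rest, geom.sectors_per_track)
    return cylinder, head, sector


geom = DiskGeometry(heads=9, sectors_per_track=36)
lbn, within = byte_to_logical(20000)       # (2, 3616)
blocks = logical_to_physical(lbn)          # physical blocks 32..47
print(lbn, within, physical_to_chs(blocks[0], geom))
```

Real drivers also add the partition's starting offset before handing the block number to the controller; that step is left out here.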
2. directories
- a directory is allocated in units called chunks
- a chunk consists of a series of directory entries
- each entry contains:
a. the i-node number of the file,
b. the size of the directory entry,
c. the length of the filename, and
d. the file name
- fields (b) and (c) are used merely for keeping track of
space in the chunk itself
- the important fields are the i-node to name mapping
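A minimal sketch of how the four fields could be laid out inside one chunk. The field widths (4-byte i-node number, 2-byte entry size, 2-byte name length), the big-endian byte order, the 4-byte alignment, and the 512-byte chunk size are assumptions made for the example; the on-disk format of a given system is defined by its own header files.

```python
import struct

# i-node number, entry size, name length (assumed widths, big-endian)
HEADER = struct.Struct(">IHH")
CHUNK_SIZE = 512  # assumed chunk size in bytes


def pack_chunk(entries):
    """Pack (inode, name) pairs into one chunk.

    The last entry's size field is stretched to cover the unused
    space, which is how the chunk keeps track of its free room.
    """
    out = bytearray()
    for i, (ino, name) in enumerate(entries):
        raw = name.encode("ascii")
        # header + name + terminating NUL, rounded up to 4 bytes
        reclen = (HEADER.size + len(raw) + 1 + 3) & ~3
        if len(out) + reclen > CHUNK_SIZE:
            raise ValueError("entries do not fit in one chunk")
        if i == len(entries) - 1:
            reclen = CHUNK_SIZE - len(out)
        out += HEADER.pack(ino, reclen, len(raw)) + raw
        out += b"\0" * (reclen - HEADER.size - len(raw))
    return bytes(out)


def parse_chunk(chunk):
    """Walk a chunk and return the (inode, name) mapping it holds."""
    entries, pos = [], 0
    while pos < len(chunk):
        ino, reclen, namlen = HEADER.unpack_from(chunk, pos)
        if reclen == 0:          # corrupt or empty chunk: stop
            break
        if ino != 0:             # i-node 0 marks an unused entry
            start = pos + HEADER.size
            entries.append((ino, chunk[start:start + namlen].decode("ascii")))
        pos += reclen            # field (b) tells us where the next entry is
    return entries


chunk = pack_chunk([(2, "."), (2, ".."), (117, "passwd")])
print(parse_chunk(chunk))
```

Note that the parser never needs a separate "end of list" marker: the entry sizes always add up to the chunk size.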
3. i-nodes
- index nodes or i-nodes contain information about one
particular file
- the number of i-nodes is fixed and determined when the
file system is created (try "df -i")
- the fields are dependent on type of file that the i-node
references
> socket, directory, regular file
- 128 bytes long (SunOS 4.x)
> 8 per 1K; if the average file size is 1K, what percentage
of the file system is taken up by i-nodes? (11%)
> using UNIX defaults the i-nodes typically take up 3-5%
of the file system
- see /usr/include/ufs/inode.h for full structure
- fields are:
+ type of file and access mode (drwxr-xr-x)
+ file's owner (uid)
+ group-access identifier (group)
+ number of references to the file (hard links)
+ time of last access and modification
+ size of file in bytes
+ direct pointers (12 of them, covering 48 Kbytes if 4K blocks are used)
+ indirect pointers (points to a block of direct pointers)
+ double-indirect pointers
+ triple-indirect pointers
- notably missing is the file name!
> a file may have many names
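The 11% figure and the 48 Kbyte figure above can both be checked with a few lines of arithmetic. In this sketch the 4-byte size of a block pointer is an assumption, and the function names are made up for the example.

```python
INODE_SIZE = 128    # bytes, the SunOS 4.x figure from the notes
POINTER_SIZE = 4    # bytes per block pointer (assumption)
NDIRECT = 12        # direct pointers per i-node


def inode_overhead(bytes_per_inode, inode_size=INODE_SIZE):
    """Fraction of the disk used by i-nodes when there is one
    i-node for every bytes_per_inode bytes of file data."""
    return inode_size / (bytes_per_inode + inode_size)


def pointer_reach(block_size):
    """Bytes of file data addressable through each pointer level."""
    per_block = block_size // POINTER_SIZE   # pointers in one block
    return {
        "direct": NDIRECT * block_size,
        "indirect": per_block * block_size,
        "double-indirect": per_block ** 2 * block_size,
        "triple-indirect": per_block ** 3 * block_size,
    }


print(f"{inode_overhead(1024):.0%}")          # one i-node per 1K file
print(pointer_reach(4096)["direct"] // 1024, "Kbytes via direct pointers")
```

With 4K blocks the single indirect block alone adds another 4 Mbytes, which is why only fairly large files ever touch the double-indirect level.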
4. file system layout
- bootblock (not really part of the file system)
- superblock(s): contains (BSD-style superblock)
+ total number of blocks in fs
+ number of i-node blocks in fs
+ total number of data blocks in fs
+ number of cylinder groups
+ size of basic blocks in fs
+ pointers to cylinder group blocks can be calculated
> from cylinder group size, offset into cylinder group
> in fact, given an i-node number one can calculate which
cylinder group it belongs to
+ logical block size
+ lots more!
+ see /usr/include/ufs/fs.h for full structure
> superblock is replicated to protect against catastrophic loss
> at least one in each cylinder group
- cylinder group blocks
+ number of cylinders in this cg
+ number of i-node blocks in this cg
+ number of data blocks in this cg
+ free block map
- i-nodes
+ allocated within each cylinder group
- data blocks
+ allocated within each cylinder group
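Because every cylinder group holds the same number of i-nodes, locating an i-node is a single division. A minimal sketch, where the i-nodes-per-group value is a made-up example number (the real one is stored in the superblock):

```python
def inode_to_cg(ino, inodes_per_group):
    """Return (cylinder group, index within that group) for an i-node number."""
    return divmod(ino, inodes_per_group)


# Hypothetical file system with 2048 i-nodes per cylinder group
cg, index = inode_to_cg(5000, 2048)
print(cg, index)   # 2 904
```

The same fixed layout is what lets the file system find the cylinder group blocks by calculation instead of storing pointers to them.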
B. BSD Fast File System
1. Increased block size
- block size was increased to 4096 bytes or more (a power of two) vs. the
old 1024-byte blocks
> however, a uniformly large block size would waste space since many
UNIX files are small
2. Fragments
> try to get the best of both worlds
> large block size and little wasted space
- write the file in complete blocks except for the last remainder, which is
written into one or more fragments
- a block may be broken into 2, 4, or 8 fragments
> each fragment is addressable
- note that a fragment may not span blocks
- to force the allocation routine to limit the number of fragments,
only direct blocks may refer to fragments (the first 48K of a file
with 4K blocks)
- indirect blocks must be full blocks
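The block-plus-fragment rule can be sketched as follows. The defaults (8192-byte blocks, 1024-byte fragments, 12 direct pointers) are assumptions taken from the Sun figures elsewhere in these notes, and the function is an illustration, not the kernel's allocator.

```python
def file_layout(size, block_size=8192, frag_size=1024, ndirect=12):
    """Return (full blocks, trailing fragments) needed for a file of `size` bytes."""
    full, tail = divmod(size, block_size)
    if tail and full >= ndirect:
        # The tail would hang off an indirect block, so it must be a full block
        return full + 1, 0
    frags = -(-tail // frag_size)            # ceiling division
    if frags == block_size // frag_size:
        # A tail that needs every fragment is simply one more full block
        return full + 1, 0
    return full, frags


for size in (500, 20000, 200000):
    print(size, file_layout(size))
```

A 500-byte file therefore costs one 1K fragment instead of a whole 8K block, which is the space saving fragments were introduced for.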
3. Cylinder groups
- increase locality of reference
- locate i-nodes close to their associated data blocks
- keep a copy of the superblock in each cylinder group
4. global allocation strategy
- localize inodes - keep inodes for files in a directory in the same
cylinder group
- when you create a subdirectory, it is placed in a cylinder group
that has the most available free inodes
- localize data blocks for each file
> place all data blocks for a single file in the same
cylinder group
- put in rotationally optimal positions
> may be thwarted by zone sectoring as mentioned in book
- keep cylinder groups from getting full
> if a file exceeds the size covered by its direct pointers,
move to a new cylinder group
> SunOS 4.x i-node has 12 direct pointers (at 8K each)
> move every 1 Mbyte thereafter
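The "spill to a new cylinder group" rule can be expressed as a function of the byte offset within a file. This sketch assumes 12 direct pointers at 8K each and takes 1 Mbyte to mean 2^20 bytes; it only counts how many times the allocator would have moved on, not which group it would pick.

```python
def allocation_section(offset, block_size=8192, ndirect=12, stride=1 << 20):
    """Number of cylinder-group switches made before reaching `offset`."""
    direct_bytes = ndirect * block_size      # 96 Kbytes with the defaults
    if offset < direct_bytes:
        return 0                             # still in the i-node's own group
    return 1 + (offset - direct_bytes) // stride


for off in (0, 96 * 1024, 96 * 1024 + (1 << 20)):
    print(off, allocation_section(off))
```

Spreading a large file this way keeps any single cylinder group from filling up, at the cost of one long seek per megabyte.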
5. local allocation strategy
- when the global allocator requests a block the local allocator
services the request
- allocate the requested block if it is available
- otherwise use the next available block that is rotationally
closest to the requested block (in the same cylinder but
perhaps a different platter)
- if none is available then use a block within the same cylinder
group
- if cg has < 10% free space then find another cylinder group
with a free block
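The fallback order above can be sketched with free blocks modelled as (cylinder group, cylinder, rotational position) tuples. This is a simplified illustration: the number of rotational positions is an arbitrary assumption, the 10% free-space threshold is not modelled (the sketch moves to another group only when the current one has no free block at all), and ties are broken by picking the lowest tuple.

```python
def local_allocate(requested, free, rot_positions=8):
    """Pick a free block for `requested`, following the fallback order."""
    cg, cyl, pos = requested
    # 1. The requested block itself
    if requested in free:
        return requested
    # 2. Rotationally closest free block in the same cylinder
    same_cyl = [b for b in free if b[0] == cg and b[1] == cyl]
    if same_cyl:
        return min(same_cyl, key=lambda b: (b[2] - pos) % rot_positions)
    # 3. Any free block in the same cylinder group
    same_cg = [b for b in free if b[0] == cg]
    if same_cg:
        return min(same_cg)
    # 4. A free block in some other cylinder group
    return min(free) if free else None


free = {(0, 5, 3), (0, 5, 6), (0, 7, 1), (1, 0, 0)}
print(local_allocate((0, 5, 4), free))   # same cylinder, next position ahead
print(local_allocate((1, 3, 0), free))   # same group, different cylinder
```

"Rotationally closest" is measured forward in the direction of rotation, since a block just behind the head costs almost a full revolution to reach.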
6. File system parameterization (parameters for newfs command)
- block-size
> default on Sun is 8192
- the number of cylinders per cylinder group in a
file system
> The default is 16.
- the fragment size of the file system in bytes
> The default is 1024.
- bytes/inode.
> This specifies the density of inodes in the file
system.
> The default is to create an inode for
each 2048 bytes of data space.
> If fewer inodes are desired, a larger number should
be given.
> To create more inodes, a smaller number should be
given.
- reserved free space
> the percentage of space reserved from normal
users; the minimum free space threshold. The
default is 10%.
- optimization (space or time)
> The file system can be instructed either to minimize the time
spent allocating blocks or to minimize the space fragmentation
on the disk.
> If the minimum free space threshold (as specified by the -m
option) is less than 10%, the default is to optimize for space.
> If the minimum free space threshold is greater than or equal
to 10%, the default is to optimize for time.
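A small sketch of how these parameters interact. The i-node count is only an approximation (a real newfs rounds per cylinder group), and the function names are hypothetical.

```python
def default_optimization(minfree_percent):
    """Default optimization chosen for a given reserved-free-space setting."""
    return "space" if minfree_percent < 10 else "time"


def approx_inode_count(fs_bytes, bytes_per_inode=2048):
    """Rough number of i-nodes created for a file system of fs_bytes."""
    return fs_bytes // bytes_per_inode


def user_visible_bytes(fs_bytes, minfree_percent=10):
    """Space normal users can fill before hitting the reserve."""
    return fs_bytes * (100 - minfree_percent) // 100


size = 100 * 2**20                      # a hypothetical 100 Mbyte partition
print(approx_inode_count(size))         # 51200 with the default density
print(approx_inode_count(size, 8192))   # fewer i-nodes: 12800
print(default_optimization(5), default_optimization(10))
```

Raising bytes/inode is a common choice for partitions that hold a small number of large files, since unused i-nodes are space that can never be reclaimed without rebuilding the file system.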
7. Performance
- multiply the number of bytes per track times revolutions per second (RPS)
to get an upper bound on disk bandwidth (18,432 x 60 = 1.1 Mbyte per second)
- write test is "time cp /vmunix /var/tmp"
- example: vmunix = 1,698,406 bytes, timed at 3.8 sec = 446,948 bytes per
second (or 40% of bandwidth)
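The arithmetic in this example can be reproduced directly. The numbers are the ones quoted above; the function names are made up for the sketch.

```python
def peak_bandwidth(bytes_per_track, revs_per_second):
    """Upper bound: one full track transferred per revolution."""
    return bytes_per_track * revs_per_second


def achieved_fraction(nbytes, seconds, peak):
    """Measured throughput as a fraction of the peak."""
    return (nbytes / seconds) / peak


peak = peak_bandwidth(18432, 60)                 # 1,105,920 bytes/sec
rate = 1698406 / 3.8                             # measured copy rate
print(peak, int(rate), f"{achieved_fraction(1698406, 3.8, peak):.0%}")
```

The same calculation works for any disk once its bytes per track and rotation speed are known; 60 revolutions per second corresponds to a 3600 RPM drive.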
8. Some questions
- The original Berkeley fast file system mentions that
someday soft links may be able to span machines.
Has this day come?
answer: yes, NFS
- Why aren't hard links permitted to span file systems?
answer:
a hard link is a name-inode pair, and an inode number is unique only
within a single file system; a symbolic link is a name-name pair and
can follow the directory graph to any file system
- Why is the per cylinder information placed at varying offsets from
the beginning of each cylinder group?
answer: if the information were
kept at the beginning of every cylinder group, all of it (including
the superblock copies) would be on the same platter and could be wiped
out by a single hardware failure. The info is offset about one
additional track in each successive cylinder group so that it spirals
down into the disk
- Why did doubling the block size in the old UNIX file system from 512
to 1024 bytes MORE than double the file system performance?
answer:
The performance doubled because each disk transfer accessed twice as
much data. An additional speedup occurred because many files no longer
needed indirect blocks.
- What are the trade-offs between dedicating more memory to the buffer
cache versus dedicating it to the virtual memory system?
answer:
dedicating it to the buffer cache improves the hit rate of disk access,
so I/O throughput improves; dedicating it to the virtual memory system
means that more text pages will be in memory, which reduces the paging
load and frees I/O bandwidth for other disk requests; in addition, the
system is more responsive. Perhaps: more buffers for a file server,
and more VM for a compute server.
NOTE: Some more modern UNIXes combine the disk buffer cache and
virtual memory, thus eliminating this schism (AIX, for example)
- Why do you think that inodes are allocated statically at the time
of file system creation rather than at the time of file creation?
answer: to make access to i-nodes fast
answer: to make fsck operate in less than a century; fsck scans
the directory structure and then scans the inodes which comprise
3-5% of the disk space (best to have them in known locations)
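The difference between a name-inode pair and a name-name pair, discussed in the hard link question above, can be observed from user space. A minimal sketch, assuming a UNIX-like system whose temporary directory supports both link types; the file names are arbitrary.

```python
import os
import tempfile


def link_demo():
    """Create a file, a hard link, and a symbolic link; report what they share."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "file")
        hard = os.path.join(d, "hard")
        soft = os.path.join(d, "soft")
        with open(target, "w") as f:
            f.write("data")
        os.link(target, hard)        # second name for the same i-node
        os.symlink(target, soft)     # new i-node whose content is a path
        st = os.stat(target)
        return {
            "hard_same_inode": os.stat(hard).st_ino == st.st_ino,
            "link_count": st.st_nlink,
            "soft_own_inode": os.lstat(soft).st_ino != st.st_ino,
            "soft_stores_name": os.readlink(soft) == target,
        }


print(link_demo())
```

Since the hard link stores only an i-node number, it has no way to say which file system that number belongs to; the symbolic link stores a path, which is resolved from scratch each time it is followed.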
C. Disk installation
1. connection
- look at boot display (dmesg)
- dkinfo ("dkinfo sd0" on SunOS 4.x)
- fdisk (Linux)
- df
2. creating device files
- see if devices exist: ls -lg /dev/*sd3*
- create devices: MAKEDEV sd3 (run from within /dev)
3. Low-level formatting, if needed
4. labelling and partitioning
- Use "format" command
5. creating UNIX file system
- newfs /dev/rsd3?
6. verify integrity of file system
- fsck /dev/rsd3?
7. set up automatic mounting
- add to /etc/fstab
- type "mount -a"
or, to mount just the single file system,
"mount /new_mountpoint_name"
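A sketch of building the /etc/fstab line for the new disk. The six-field layout and the "4.2" type name follow the SunOS 4.x convention as an assumption; other systems use different type names and sometimes different fields. The partition letter "g" is a made-up example. Note that fstab names the block device (/dev/sd3...), while newfs and fsck above are given the raw device (/dev/rsd3...).

```python
def fstab_entry(device, mount_point, fstype="4.2", options="rw",
                dump_freq=1, fsck_pass=2):
    """Format one fstab line: device, mount point, type, options, freq, pass."""
    return f"{device} {mount_point} {fstype} {options} {dump_freq} {fsck_pass}"


# Hypothetical partition letter; use whatever was set up with "format"
print(fstab_entry("/dev/sd3g", "/new_mountpoint_name"))
```

The mount point directory must exist before "mount -a" is run, and a pass number above 1 lets fsck check this file system after the root file system at boot.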