COMPUTER AND NETWORK SYSTEM ADMINISTRATION
Summer 1996 - Lesson 12
Adding Disks, File Systems

A. The UNIX file system

  1. file systems
     - file systems reside on mass storage media such as disks
       > can reside on other media (RAM)
     - each disk is divided into one or more subdivisions called partitions
     - each partition may contain only one file system
     - the file system abstraction:
           array of bytes ---> array of logical blocks
       > translates the user view of a file as an array of bytes to the
         underlying structure of an array of logical blocks plus an offset
       > the file system reads and writes logical blocks
       > can't address anything smaller
     - a logical block consists of one or more physical blocks
     - the device driver:
           logical blocks ---> physical blocks
       > maps logical blocks to physical blocks on the disk
     - the disk controller:
           physical blocks ---> cylinder+head+sector
       > a physical block consists of one or more contiguous sectors

  2. directories
     - a directory is allocated in units called chunks
     - a chunk consists of a series of directory entries
     - each entry contains:
       a. the i-node number of the file,
       b. the size of the directory entry,
       c. the length of the file name, and
       d. the file name
     - fields (b) and (c) are used merely for keeping track of space in
       the chunk itself
     - the important fields are (a) and (d), which together give the
       i-node to name mapping

  3. i-nodes
     - index nodes or i-nodes contain information about one particular file
     - the number of i-nodes is fixed and determined when the file system
       is created (try "df -i")
     - the fields depend on the type of file that the i-node references
       > socket, directory, regular file
     - 128 bytes long (SunOS 4.x)
       > 8 fit in 1K; if the average file size is 1K (one i-node per 1K
         of data), what percentage of the file system is taken up with
         i-nodes? (11%, i.e. 128 / (1024 + 128))
       > using UNIX defaults the i-nodes typically take up 3-5% of the
         file system
     - see /usr/include/ufs/inode.h for full structure
     - fields are:
       + type of file and access mode (drwxr-xr-x)
       + file's owner (uid)
       + group-access identifier (group)
       + number of references to the file (hard links)
       + times of last access and modification
       + size of file in bytes
       + direct pointers (12, covering 48 Kbytes if 4K blocks are used)
       + indirect pointers (each points to a block of direct pointers)
       + double-indirect pointers
       + triple-indirect pointers
     - notably missing is the file name!
       > a file may have many names

  4. file system layout
     - bootblock (not really part of the file system)
     - superblock(s): contains (BSD-style superblock)
       + total number of blocks in fs
       + number of i-node blocks in fs
       + total number of data blocks in fs
       + number of cylinder groups
       + size of basic blocks in fs
       + pointers to cylinder group blocks can be calculated
         > from cylinder group size, offset into cylinder group
         > in fact, given an i-node number one can calculate which
           cylinder group it belongs to
       + logical block size
       + lots more!
       + see /usr/include/ufs/fs.h for full structure
       > superblock is replicated to protect against catastrophic loss
       > at least one copy in each cylinder group
     - cylinder group blocks
       + number of cylinders in this cg
       + number of inode blocks in this cg
       + number of data blocks in this cg
       + free block map
     - i-nodes
       + allocated within each cylinder group
     - data blocks
       + allocated within each cylinder group

B. BSD Fast File System

  1. Increased block size
     - block size was increased to multiples of 4096 bytes vs. the old
       1024-byte blocks
       > however, a uniformly large block size would waste space since
         many UNIX files are small

  2. Fragments
       > try to get the best of both worlds
       > large block size and little wasted space
     - write the file in complete blocks except for the last remainder,
       which is written into a fragment
     - fragments may break a block into 2, 4, or 8 pieces
       > each fragment is addressable
     - note that a fragment may not span blocks
     - to force the allocation routine to limit the number of fragments,
       only direct blocks may refer to fragments (first 48K of a file)
     - indirect blocks must be full blocks

  3. Cylinder groups
     - increase locality of reference
     - locate i-nodes close to their associated data blocks
     - keep a copy of the superblock in each cylinder group

  4. global allocation strategy
     - localize inodes
     - keep inodes for files in a directory in the same cylinder group
     - subdirectory entries are placed in a cylinder group that has the
       most available free inodes
       > when you create a subdirectory
     - localize data blocks for each file
       > place all data blocks for a single file in the same cylinder group
     - put in rotationally optimal positions
       > may be thwarted by zone sectoring as mentioned in the book
     - keep cylinder groups from getting full
       > if a file exceeds the size covered by the direct pointers, move
         to a new cylinder group
       > SunOS 4.x i-node has 12 direct pointers (at 8K each)
       > move every 1 Mbyte thereafter

  5. local allocation strategy
     - when the global allocator requests a block, the local allocator
       services the request
     - allocate the requested block if it is available
     - otherwise use the next available block that is rotationally
       closest to the requested block (in the same cylinder but perhaps
       a different platter)
     - if none is available then use a block within the same cylinder group
     - if the cg has < 10% free space then find another cylinder group
       with a free block

  6. File system parameterization (parameters for newfs command)
     - block-size
       > default on Sun is 8192
     - the number of cylinders per cylinder group in a file system
       > The default is 16.
     - the fragment size of the file system in bytes
       > The default is 1024.
     - bytes/inode
       > This specifies the density of inodes in the file system.
       > The default is to create an inode for each 2048 bytes of data
         space.
       > If fewer inodes are desired, a larger number should be used;
       > to create more inodes, a smaller number should be given.
     - reserved free space
       > the percentage of space reserved from normal users; the minimum
         free space threshold. The default is 10%.
     - optimization (space or time)
       > The file system can either be instructed to try to minimize the
         time spent allocating blocks
       > or to try to minimize the space fragmentation on the disk.
       > If the minimum free space threshold (as specified by the -m
         option) is less than 10%, the default is to optimize for space;
       > if the minimum free space threshold is greater than or equal to
         10%, the default is to optimize for time.

  7. Performance
     - multiply the number of bytes per track times RPS (revolutions per
       second) to get an upper bound on disk bandwidth
       (18,432 x 60 = 1.1 Mbyte per second)
     - write test is "time cp /vmunix /var/tmp"
     - example: vmunix = 1,698,406 bytes, timed at 3.8 sec
       = 446,948 BPS (or 40% of bandwidth)

  8. Some questions
     - The original Berkeley fast file system mentions that someday soft
       links may be able to span machines. Has this day come?
       answer: yes, NFS
     - Why aren't hard links permitted to span file systems?
       answer: a hard link is a name-inode pair and an inode is only
       unique within a file system; a symbolic link is a name-name pair
       and can follow the directory graph to any file system
     - Why is the per cylinder information placed at varying offsets from
       the beginning of each cylinder group?
       answer: if the information were kept at the beginning of the
       cylinder group, all of the info such as superblock copies would be
       on the same platter and could be wiped out by a single hardware
       failure. The info is offset about one additional track into each
       cylinder group so that the info spirals down into the disk
     - Why did doubling the block size in the old UNIX file system from
       512 to 1024 bytes MORE than double the file system performance?
       answer: The performance doubled because each disk transfer
       accessed twice as much data. An additional speedup occurred
       because many files no longer needed indirect blocks.
     - What are the trade-offs between dedicating more memory to the
       buffer cache versus dedicating it to the virtual memory system?
       answer: dedicating it to the buffer cache improves the hit rate of
       disk accesses, so I/O throughput improves; dedicating it to the
       virtual memory system means that more text pages will be in
       memory, which reduces the paging load, which allows the I/O
       bandwidth to be used for other disk requests; in addition, the
       system is more responsive; more buffers for a file server, and
       more VM for a compute server?
       NOTE: Some more modern UNIXes combine the disk buffer cache and
       virtual memory, thus eliminating this schism (AIX, for example)
     - Why do you think that inodes are allocated statically at the time
       of file system creation rather than at the time of file creation?
       answer: to make access to i-nodes fast
       answer: to make fsck operate in less than a century; fsck scans
       the directory structure and then scans the inodes, which comprise
       3-5% of the disk space (best to have them in known locations)

C. Disk installation

  1. connection
     - look at boot display (dmesg)
     - dkinfo ("dkinfo sd0" on SunOS 4.x)
     - fdisk (Linux)
     - df

  2. creating device files
     - see if devices exist: ls -lg /dev/*sd3*
     - create devices: MAKEDEV sd3

  3. Low-level formatting, if needed

  4. labelling and partitioning
     - Use "format" command

  5. creating UNIX file system
     - newfs /dev/rsd3?

  6. verify integrity of file system
     - fsck /dev/rsd3?

  7. set up automatic mounting
     - add to /etc/fstab
     - type
           mount -a
       or, to just mount the single file system,
           mount /new_mountpoint_name
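
Worked example (byte offsets, logical blocks, and i-node pointers): the translation described in section A.1, and the direct/indirect pointer fields listed in section A.3, can be sketched roughly as follows. This is an illustration only, not the kernel's code. The 4K block size and the 12 direct pointers come from the notes; the 4-byte pointer size and the function names are assumptions made for the example.

```python
BLOCK_SIZE = 4096   # logical block size in bytes (the 4K case from the notes)
NDIRECT = 12        # direct pointers per i-node, as stated in the notes
PTR_SIZE = 4        # ASSUMPTION: 4-byte block pointers; not stated in the notes


def locate(byte_offset, block_size=BLOCK_SIZE):
    """Translate a byte offset in a file to (logical block, offset in block)."""
    return divmod(byte_offset, block_size)


def pointer_level(logical_block, block_size=BLOCK_SIZE):
    """Name the kind of i-node pointer that reaches a given logical block."""
    ptrs_per_block = block_size // PTR_SIZE
    if logical_block < NDIRECT:
        return "direct"
    logical_block -= NDIRECT
    if logical_block < ptrs_per_block:
        return "indirect"
    logical_block -= ptrs_per_block
    if logical_block < ptrs_per_block ** 2:
        return "double-indirect"
    logical_block -= ptrs_per_block ** 2
    if logical_block < ptrs_per_block ** 3:
        return "triple-indirect"
    raise ValueError("offset beyond what triple-indirect pointers can reach")


# Bytes covered by the direct pointers alone: 12 x 4K = 48 Kbytes.
direct_bytes = NDIRECT * BLOCK_SIZE

block, offset = locate(5000)
print(block, offset, pointer_level(block), direct_bytes)
```

Byte 5000 falls in logical block 1 at offset 904, which a direct pointer reaches; the first byte past 48K is the first one that needs the indirect block.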
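
Worked example (fragments and i-node overhead): the fragment scheme in section B.2 and the 11% figure in section A.3 can be checked with a little arithmetic. The sketch below is a simplification under stated assumptions: it uses the Sun newfs defaults from the notes (8192-byte blocks, 1024-byte fragments, 128-byte i-nodes), it ignores the rule that only direct blocks may refer to fragments, and the function names are invented for the example.

```python
def block_layout(file_size, block_size=8192, frag_size=1024):
    """Return (full blocks, fragments) needed to hold file_size bytes.

    Simplification: the tail of the file always goes into fragments,
    regardless of whether it would be reached through a direct pointer.
    """
    full_blocks, remainder = divmod(file_size, block_size)
    frags = -(-remainder // frag_size)          # ceiling division
    if frags == block_size // frag_size:        # tail fills a whole block
        return full_blocks + 1, 0
    return full_blocks, frags


def wasted_bytes(file_size, block_size=8192, frag_size=1024):
    """Bytes allocated beyond the file's actual size."""
    full_blocks, frags = block_layout(file_size, block_size, frag_size)
    return full_blocks * block_size + frags * frag_size - file_size


def inode_overhead(bytes_per_inode, inode_size=128):
    """Fraction of the space taken by i-nodes, given one i-node per
    bytes_per_inode bytes of data space."""
    return inode_size / (bytes_per_inode + inode_size)


# A 10,000-byte file: one full block plus two 1K fragments.
print(block_layout(10000), wasted_bytes(10000))
# The same file with no fragments (fragment size = block size).
print(block_layout(10000, 8192, 8192), wasted_bytes(10000, 8192, 8192))
# One 128-byte i-node per 1K of data: about 11%.
print(round(100 * inode_overhead(1024)))
```

With fragments the 10,000-byte file wastes 240 bytes; with whole 8K blocks only, it wastes 6,384 bytes, which is the space cost the fragment scheme is meant to avoid.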
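
Worked example (disk bandwidth): the performance figures in section B.7 can be reproduced as follows. The numbers are the ones given in the notes; the function and variable names are made up for the example, and the calculation is only the rough upper-bound estimate the notes describe.

```python
def peak_bandwidth(bytes_per_track, revs_per_second):
    """Upper bound on transfer rate: one full track per revolution."""
    return bytes_per_track * revs_per_second


def measured_throughput(nbytes, seconds):
    """Observed transfer rate in bytes per second."""
    return nbytes / seconds


peak = peak_bandwidth(18_432, 60)            # 1,105,920 bytes/sec
rate = measured_throughput(1_698_406, 3.8)   # copy of /vmunix, timed
fraction = rate / peak

print(peak, int(rate), round(100 * fraction))
```

The copy achieves about 40% of the theoretical bound, matching the figure in the notes.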