Summer 1996 - Lesson 07
The Network File System - NFS

A. Introduction
   - What was life like before NFS?
   - built on top of:
        UDP - User Datagram Protocol (unreliable delivery)
        XDR - eXternal Data Representation (machine-independent data format)
        RPC - Remote Procedure Call
   1. NFS is both a set of specifications and an implementation
   2. the protocol specifications are independent of architecture and
      operating system
   3. two protocols - the mount protocol and the NFS protocol
      - the mount protocol establishes the initial link between client and
        server machines
      - the NFS protocol provides a set of RPCs for remote file operations:
        > searching a directory
        > reading a set of directory entries
        > manipulating links and directories
        > accessing file attributes
        > reading and writing files
        > notably missing are open() and close()
        > there is no equivalent to the UNIX file table on the server side
        > each request must provide a full set of arguments, including a
          unique file identifier and an offset
   4. problems
      - performance (even with UDP)
        > modified data may be cached locally on the client
        > when the client's cache is flushed to the server, the data must
          be written to the server's disk before results are returned to
          the client
        > the benefits of server caching are lost
      - semantics: UNIX semantics (without NFS) versus session semantics
        (a la the Andrew File System)
        > UNIX semantics (without NFS)
          + writes to an open file are visible immediately to other users
            who have the file open at the same time
          + the file is viewed as a single resource
        > session semantics (a la the Andrew File System)
          + writes to an open file are not visible to others having it
            open at the same time
          + once a file is closed, the changes are visible only in
            sessions opened later
        > NFS claims to implement UNIX semantics
          + there are two client caches: file blocks and file attributes
          + cached attributes are validated with the server on an open()
          + biod implements read-ahead and delayed-write techniques
          + newly created files may not be visible to other sites for up
            to 30 seconds
          + it is indeterminate whether writes to a file will be seen
            immediately by other clients that have the file open for
            reading
        > example:
          - touch file on xi
          - ls on delta
          - rm file on xi
          - ls on delta
      - if a single NFS stat() request hangs, it can hang UNIX commands
        like "df"!
      - "magic cookies" (random numbers) are used to short-cut future
        validations; the server gives one to the client, and the client
        can use it to re-connect whenever the server comes back up after
        a crash --> can be spoofed <--
        Note that "stale cookies" (yuck) can make a client hang (solution:
        remount the file system on the client so it gets a new, fresh
        cookie).

B. Server
   1. mountd - Sun's UNIX implementation of the mount protocol
      - SunOS 4.x reads /etc/exports
      - uses "exportfs" to have mountd reload the table ("exportfs -a")
      - example: xi:/etc/exports

           /          -ro,access=lpdaemon:lpdaemon2,root=mu
           /usr       -ro,access=lpdaemon:lpdaemon2,root=mu
           /real/cs25 -access=lpdaemon:lpdaemon2:majorslab,root=mu:nu:tau
           /real/cs26 -access=lpdaemon:lpdaemon2:majorslab,root=mu:nu
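      - a minimal sketch of pushing and verifying the table (hypothetical
        session; xi is the server from the example above):

           # vi /etc/exports       edit the export table
           # exportfs -a           (re)export everything in /etc/exports
           # exportfs              with no arguments, list what is
                                   currently exported
           # showmount -e xi       ask xi's mountd what it exports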
      - SunOS 5.x reads /etc/dfs/dfstab
      - uses "share" to have mountd reload the table (see Table 17.4,
        p. 371)
      - example: export:/etc/dfs/dfstab

           share -F nfs -o ro,root=nu:mu /
           share -F nfs -o ro,root=nu:mu /usr
           share -F nfs -o rw=lpdaemon:lpdaemon2:majorslab,root=nu:mu: /real/cs13
           share -F nfs -o rw=lpdaemon:lpdaemon2:dad,root=nu:mu: /real/cs14
           share -F nfs -o rw=lpdaemon:lpdaemon2:,root=nu:mu: /real/cs15
           share -F nfs -o rw=lpdaemon:lpdaemon2,root=nu:mu:beta:chi\
                 :epsilon:kill:rho:sigma:socket:exec:sync /real/cs16

      - Linux (Slackware, at least) uses /etc/exports and a "kill -HUP" to
        mountd. Linux (apparently) provides "NFS multiplying" -- NFS
        serving of an NFS-mounted file system.
      - Tables 17.1, 17.2, and 17.3 give further implementation specifics.
   2. nfsd - handles requests for NFS file service
      - very small; basically turns around and calls the kernel
      - system tuning (see Table 17.5, page 372)
        - Nemeth says 10 nfsd's on a dedicated file server
        - Loukides (the performance tuning book) says leave it at 4
        - he says the kernel inode table and file table sizes are more
          important (an NFS server has more open files)

C. Client side
   1. an extended "mount" command accepts "host:path" syntax for NFS
      file systems
      - /etc/fstab in SunOS 4.x
      - example:

           /dev/sd0a  /     4.2  rw  1  1
           /dev/sd0g  /usr  4.2  rw  1  2

        -- where are the remote file systems? they are handled by the
           automounter (see below)
      - /etc/vfstab in SunOS 5.x
      - example:

           #device            device              mount    FS     fsck  mount
           #to mount          to fsck             point    type   pass  boot
           #-----------------------------------------------------------------
           /proc              -                   /proc    proc   -     no
           fd                 -                   /dev/fd  fd     -     no
           swap               -                   /tmp     tmpfs  -     yes
           /dev/dsk/c0t3d0s0  /dev/rdsk/c0t3d0s0  /        ufs    1     no
           /dev/dsk/c0t3d0s6  /dev/rdsk/c0t3d0s6  /usr     ufs    2     no
           /dev/dsk/c0t3d0s5  /dev/rdsk/c0t3d0s5  /opt     ufs    5     yes
           /dev/dsk/c0t3d0s1  -                   -        swap   -     no

      - type "mount" to see the currently mounted file systems
      - example:

           /dev/sd0a on / type 4.2 (rw)
           /dev/sd0g on /usr type 4.2 (rw)
           mount:/real/cs4 on /tmp_mnt/home/cs4 type nfs (rw,suid,hard,intr)
           mount:/real/cs5 on /tmp_mnt/home/cs5 type nfs (rw,nosuid,hard,intr)
           access:/real/cs23 on /tmp_mnt/home/cs23 type nfs (rw,nosuid,hard,intr)

   2. NFS service is provided in the kernel - transparent to the user
   3. biod - provides read-ahead and write-behind caching
      - another tuning issue

D. Administering NFS
   1. the user must have an account on the file server or access rights
      can't be checked (can default to user "nobody")
   2. for example: Majors lab DOS users must have accounts on xi, sed,
      mount, export, access, pi, upsilon
   3. in CompSci, the artificial shells setup keeps them from logging into
      the file servers and running up the load
   4. must keep UIDs and GIDs consistent across machines
   5. don't mount outside of the local net
   6. write performance is an issue
      - consider dedicated non-volatile NFS cache cards
      - or spread the load over more small user disks

E. auto-mounting
   1. Sun's "automount" daemon (used on the CompSci network)
      - nice to keep one NIS automount map instead of ~50 /etc/fstab files
      - operation (using the CompSci mappings):
        - the automounter appears to the kernel to be an NFS server
        - automount uses its maps to locate a real NFS file server
        - it then mounts the file system in a temporary location
        - and creates a symbolic link to the temporary location
        - if the file system is not accessed within an appropriate
          interval (five minutes by default), the daemon unmounts the
          file system and removes the symbolic link
        - if the indicated directory has not already been created, the
          daemon creates it, and then removes it upon exiting
        - this is different from a regular mount, for which the mount
          point must already exist
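      - a sketch of what a user sees (hypothetical session; the paths come
        from the CompSci maps below, and the output is abbreviated):

           % ls /home/cs4            first reference triggers the mount
           % mount | grep cs4
           mount:/real/cs4 on /tmp_mnt/home/cs4 type nfs (rw,suid,hard,intr)
           % ls -ld /home/cs4        the automounter hands back a symlink
           lrwxrwxrwx ... /home/cs4 -> /tmp_mnt/home/cs4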
      - example (somewhat convoluted) configuration maps:
        - auto.master (available via NIS; "ypcat -k auto.master")

           /home  auto.home                     # an indirect map, all
                                                # rooted at "/home"
           /-     auto.direct                   # "/-" means a direct map
           /net   -hosts  -rw,nosuid,hard,intr  # "-hosts" means use the NIS
                                                # "hosts.byname" map to look
                                                # up the hostname; will mount
                                                # any permissible NFS server
                                                # on "/net/..."

        - auto.direct ("ypcat -k auto.direct")

           Path             mount() options        actual location
           ----             ---------------        ---------------
           /nu0             -rw,nosuid,hard,intr   sync:/real/nu0
           /nu1             -rw,suid,hard,intr     sync:/real/nu1
           /nu2             -rw,suid,hard,intr     sync:/real/nu2
           /var/spool/mail  -rw,nosuid,hard,intr   nu:/usr/spool/realmail

        - auto.home ("ypcat -k auto.home")

           Path  mount() options        actual location
           ----  ---------------        ---------------
           s5    -rw,nosuid,hard,intr   psi:/s5
           s6    -rw,nosuid,hard,intr   psi:/s6
           cs4   -rw,suid,hard,intr     mount:/real/cs4
           cs5   -rw,nosuid,hard,intr   mount:/real/cs5
           cs6   -rw,nosuid,hard,intr   mount:/real/cs6
           cs7   -rw,nosuid,hard,intr   mount:/real/cs7
           cs8   -rw,nosuid,hard,intr   mount:/real/cs8
           cs9   -rw,suid,hard,intr     mount:/real/cs9
           cs10  -rw,nosuid,hard,intr   mount:/real/cs10
           cs11  -rw,suid,hard,intr     mount:/real/cs11
           .
           .
           .
           cs38  -rw,nosuid,hard,intr   pi:/real/cs38

   2. "amd" - public domain automounter from Jan-Simon Pendry's doctoral
      thesis (used at SCRI)
      - new features; more flexible
      - irritating features of the Sun implementation were improved:
        > amd does not hang if a remote file system goes down
        > amd attempts to mount a replacement file system if and when one
          becomes available
      - amd unmounts automatically (via "keep-alives")
      - interesting list of mount types (Table 17.7, page 380)
      - non-blocking operation
      - amd maps can be just as convoluted!

F. Security
   - Don't export to hosts on which non-trusted users have root access;
     if you don't control root on the machine, then don't export the
     file system to it.
   - Block NFS UDP traffic at your router, if possible.

G. tuning NFS
   - "nfsstat -c" shows the client side:

     ---------------------------------------------------------------------
     Client rpc:
     calls      badcalls   retrans    badxid     timeout    wait       newcred    timers
     3175986    1991       0          1232       1991       0          0          5330

     Client nfs:
     calls      badcalls   nclget     nclsleep
     3173970    0          3173970    0
     getattr    setattr    root       lookup      readlink    read
     192650  6% 49      0% 0       0% 831671  26% 2059211 64% 78054   2%
     write      create     remove     rename      link        symlink
     140     0% 124     0% 50      0% 3        0% 7        0% 0       0%
     mkdir      rmdir      readdir    fsstat
     0       0% 0       0% 940     0% 11071   0%

     > what does the client spend most of its time doing?
       - reading links and looking up information about files
       - the percentage of writes is low (no need for an NFS server
         card?)
       - it times out occasionally but isn't having to retransmit
       - badxid: received a reply for which there is no outstanding call
       - timeout: a call timed out
       - badxid and timeout are roughly equal here, but they are only
         0.0006 of all calls
       - if timeouts or retransmissions were high, say > 5%, we would
         want to know why
       - if badxid ~= timeout, the server is too slow (and is dropping
         packets)
       - if badxid << timeout, go get your network analyzer, because
         packets are getting lost on the net due to some other hardware
         problem

   - tuning with the mount command:

        rsize=n     Set the read buffer size to n bytes.
        wsize=n     Set the write buffer size to n bytes.
        timeo=n     Set the NFS timeout to n tenths of a second.
        retrans=n   Set the number of NFS retransmissions to n.
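   - for example, a hand mount of one of the file systems above with
     explicit tuning values (hypothetical invocation; the buffer sizes,
     timeout, and retransmission count are illustrative, not
     recommendations):

        # mount -o rw,hard,intr,rsize=8192,wsize=8192,timeo=15,retrans=5 \
              mount:/real/cs4 /mnt

     larger rsize/wsize values cut the number of RPCs needed for
     sequential I/O, and a longer timeo reduces spurious retransmissions
     when the server is slow (the badxid ~= timeout case above)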
   - "nfsstat -s" shows the server side:

     ---------------------------------------------------------------------
     Server rpc:
     calls      badcalls   nullrecv   badlen     xdrcall
     82414467   0          0          0          0

     Server nfs:
     calls      badcalls
     82414467   264
     null        getattr      setattr     root        lookup
     82760    0% 36039746 43% 217061   0% 0        0% 27784077 33%
     readlink    read
     287401   0% 6382386   7%
     wrcache     write        create      remove      rename
     0        0% 2130913   2% 397712   0% 184138   0% 31848    0%
     link        symlink      mkdir       rmdir       readdir      fsstat
     10468    0% 1062      0% 4461     0% 4616     0% 8807761  10% 48057    0%

     > what does the server spend most of its time doing?
       - getting attributes and performing lookups (for "ls -l"?)
       - it's a good thing that attributes are cached on the client side
         (using biod)

H. Beyond NFS
   o AFS - the Andrew File System, from CMU and Transarc Corp.
     - much better authentication (Kerberos)
     - an 8-inch-high stack of installation books!
     - adds a new file system type to the kernel
     - addresses more than just file system semantics: also user
       authentication, etc.
     - a large local client-side disk cache improves performance
   o DFS - the Distributed File System from OSF
     - "successor" to AFS; AFS-like
     - beginning to show up in most vendors' UNIX implementations
     - a major part of DCE (the Distributed Computing Environment)