FSU

At the system level: Building blocks

I want to take some time to talk about the fundamental toolset on which most of the programs that system administrators work with are built.

The most important of these are the system calls. When we run strace to see exactly what a process is doing, we are watching this fundamental interaction between a program and its requests to the operating system, usually for access to resources controlled by the operating system.

Instructions on how to find your current set of Linux system calls.

Some thoughts on proliferating file descriptor mediation are here.

Building blocks for Unix tools

A Unix system call is a direct request to the kernel regarding a system resource. It might be a request for a file descriptor to manipulate a file, it might be a request to write to a file descriptor, or any of hundreds of possible operations.

These are exactly the tools that every Unix program is built upon.

File descriptor and file descriptor operations

In some sense, the mainstay operations are those on the file system.

Unlike many other resources, which are just artifacts of the operating system and disappear at each reboot, a change to a file system generally has some permanence. [Of course it is possible, and even common, to create ``RAM'' disk filesystems: they are quite fast, and for items that are meant to be temporary they are quite acceptable. (You might do this when setting up MailScanner, for instance, in /var/spool/incoming.)]

Important file descriptor calls

A file descriptor is an int. It provides stateful access to an I/O resource such as a file on a filesystem, a pseudo-terminal, or a socket for a TCP session.

open()    -- create a new file descriptor to access a file
close()   -- deallocate a file descriptor
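
A minimal sketch of the open()/close() life cycle, written in Python because its os module functions are thin wrappers around the system calls of the same name. The scratch file name is an assumption made for this illustration only.

```python
import errno
import os
import tempfile

# Hypothetical scratch file, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# open(2): ask the kernel for a descriptor; O_CREAT creates the file.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
fd_is_int = isinstance(fd, int)    # a descriptor really is just an int

os.write(fd, b"hello\n")           # write(2) through the descriptor
os.close(fd)                       # close(2): give the descriptor back

# Once closed, the number no longer refers to anything.
try:
    os.write(fd, b"again\n")
    closed_error = None
except OSError as err:
    closed_error = err.errno       # expected to be EBADF
```

Running this under strace should show the corresponding open (or openat), write, and close requests.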

Important file descriptor calls


dup()     -- duplicate a file descriptor
dup2()    -- improved way to duplicate a file descriptor -- you can
          -- choose the new file descriptor number
sendmsg() -- it is possible for a process to pass file descriptors to
recvmsg() -- another process using sendmsg() and recvmsg() over Unix
          -- sockets; the new file descriptors are treated as if they had
          -- been created by dup()
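
The sketch below illustrates what duplication means in practice: the duplicates share one file offset with the original. The scratch file and the way the dup2() target number is chosen are assumptions made for this illustration.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "dup.txt")
fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)

copy = os.dup(fd)          # dup(2): the kernel picks the lowest free number

# dup2(2): the caller picks the number. To find a number that is certainly
# unused, borrow one from dup() and release it again.
target = os.dup(fd)
os.close(target)
os.dup2(fd, target)

os.write(fd, b"abc")       # moves the shared offset to 3
os.write(copy, b"def")     # continues at 3 rather than overwriting at 0

os.lseek(target, 0, os.SEEK_SET)   # rewinding one rewinds all three
data = os.read(fd, 6)

for d in (fd, copy, target):
    os.close(d)
```

Shells rely on the same mechanism for redirection, typically by calling dup2() onto descriptors 0, 1, and 2.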

Important file descriptor calls


fchmod()  -- change the permissions of a file associated with a file 
          -- descriptor
fchown()  -- change the ownership of a file associated with a file
          -- descriptor
fchdir()  -- change the working directory for a process via fd

Important file descriptor calls


fcntl()   -- miscellaneous manipulation of file descriptors: dup(), set 
          -- close on exec(), set to non-blocking, set to asynchronous 
          -- mode, locks, signals 
ioctl()   -- manipulate the underlying ``device'' parameters for a file
          -- descriptor
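
As a hedged illustration of two common fcntl() uses, the sketch below switches a pipe's read end to non-blocking mode and sets the close-on-exec flag. It uses Python's fcntl module, which passes these commands through to fcntl(2).

```python
import fcntl
import os

read_end, write_end = os.pipe()

# F_GETFL / F_SETFL: file status flags, here adding O_NONBLOCK.
flags = fcntl.fcntl(read_end, fcntl.F_GETFL)
fcntl.fcntl(read_end, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# A read on the empty pipe now fails at once instead of blocking.
try:
    os.read(read_end, 1)
    would_block = False
except BlockingIOError:
    would_block = True

# F_GETFD / F_SETFD: descriptor flags, here close-on-exec.
# (Python already sets this flag on descriptors it creates, so the call
# below is only a demonstration of the request itself.)
fd_flags = fcntl.fcntl(write_end, fcntl.F_GETFD)
fcntl.fcntl(write_end, fcntl.F_SETFD, fd_flags | fcntl.FD_CLOEXEC)
cloexec_set = bool(fcntl.fcntl(write_end, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)

os.close(read_end)
os.close(write_end)
```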

Important file descriptor calls


flock()   -- lock a file associated with a file descriptor
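
A small sketch of flock() contention, assuming a local Linux filesystem. Two separate open() calls on the same (hypothetical) scratch file stand in for two competing processes, since flock() locks belong to the open file rather than to the process.

```python
import fcntl
import os
import tempfile

# Hypothetical scratch file, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "lockfile")
holder = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
rival = os.open(path, os.O_RDWR)      # an independent open of the same file

fcntl.flock(holder, fcntl.LOCK_EX)    # take an exclusive lock

# LOCK_NB turns "wait for the lock" into "fail immediately".
try:
    fcntl.flock(rival, fcntl.LOCK_EX | fcntl.LOCK_NB)
    contended = False
except BlockingIOError:
    contended = True

fcntl.flock(holder, fcntl.LOCK_UN)    # release
fcntl.flock(rival, fcntl.LOCK_EX | fcntl.LOCK_NB)   # now succeeds
acquired_after_release = True

os.close(rival)                       # closing also drops the lock
os.close(holder)
```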

Important file descriptor calls


pipe()    -- create a one-way association between two file 
          -- descriptors so that output from
          -- one goes to the input of the other
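
A minimal sketch of a pipe used within a single process; in real use the two ends would normally be split between a parent and a child after fork().

```python
import os

read_end, write_end = os.pipe()   # pipe(2) returns two descriptors

os.write(write_end, b"ping")      # goes in at the write end ...
first = os.read(read_end, 4)      # ... and comes out at the read end

# Closing the last write end is how the reader learns there is no more data:
# read() then returns zero bytes (end of file).
os.close(write_end)
eof = os.read(read_end, 4)
os.close(read_end)
```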

Important file descriptor calls


select()  -- multiplex on pending i/o to or from a set of 
poll()    -- file descriptors
epoll     -- Linux-specific family (epoll_create(), epoll_ctl(),
          -- epoll_wait()) serving the same purpose
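
The sketch below asks select() and poll() about the same pipe, before and after data is written, with a zero timeout so that neither call blocks. It uses Python's select module as a stand-in for the underlying calls.

```python
import os
import select

read_end, write_end = os.pipe()

# Nothing written yet: select(2) reports no readable descriptors.
ready_before, _, _ = select.select([read_end], [], [], 0)

os.write(write_end, b"x")

# Now the read end is reported as ready.
ready_after, _, _ = select.select([read_end], [], [], 0)

# poll(2) answers the same question with a registration-style interface.
poller = select.poll()
poller.register(read_end, select.POLLIN)
events = poller.poll(0)           # list of (fd, event mask) pairs

os.close(read_end)
os.close(write_end)
```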

Important file descriptor calls


read()     -- take data from a file descriptor
readv()    -- take data from a file descriptor
recv()     -- take data from a file descriptor
recvfrom() -- take data from a file descriptor
recvmsg()  -- take data from a file descriptor
write()    -- send data to a file descriptor
writev()   -- send data to a file descriptor
send()     -- send data to a file descriptor
sendto()   -- send data to a file descriptor
sendmsg()  -- send data to a file descriptor
fsync()    -- forces a flush for a file descriptor
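
To make the direction of each call concrete, here is a hedged sketch using a connected pair of Unix sockets and a (hypothetical) scratch file. It also shows that the socket-specific calls and the generic read()/write() can be mixed on the same descriptor.

```python
import os
import socket
import tempfile

left, right = socket.socketpair()         # two connected Unix sockets

left.sendall(b"via send")                 # send(2) family: data goes in
from_recv = right.recv(64)                # recv(2) family: data comes out

os.write(right.fileno(), b"via write")    # plain write(2) works on sockets too
from_read = os.read(left.fileno(), 64)    # and so does plain read(2)

left.close()
right.close()

# writev(2) sends several buffers with one request; fsync(2) then forces
# the file's data out to the underlying storage.
path = os.path.join(tempfile.mkdtemp(), "vectored.txt")   # hypothetical name
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
written = os.writev(fd, [b"head-", b"body"])
os.fsync(fd)
os.close(fd)
```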

Important file descriptor calls


readdir() -- raw read of directory entry from a file descriptor

Important file descriptor calls


fstat()   -- return information about a file associated with a fd: inode,
          -- perms, hard links, uid, gid, size, modtimes
fstatfs() -- return the mount information for the filesystem that the file
          -- descriptor is associated with
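
A short sketch of fstat() together with fchmod(), both acting on an open descriptor rather than a pathname. The scratch file is an assumption made for this illustration.

```python
import os
import stat
import tempfile

# Hypothetical scratch file, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "info.txt")
fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o640)
os.write(fd, b"12345")

os.fchmod(fd, 0o600)              # fchmod(2): change perms via the descriptor

info = os.fstat(fd)               # fstat(2): metadata via the descriptor
size = info.st_size               # bytes in the file
mode = stat.S_IMODE(info.st_mode) # permission bits only
links = info.st_nlink             # number of hard links
inode = info.st_ino

os.close(fd)
```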

Important filesystem operations

In addition to the indirect means of file descriptors, Unix also offers a number of functions that operate directly on files by pathname.

access()  -- returns a value indicating if a file is accessible
chmod()   -- changes the permissions on a file in a filesystem
chown()   -- changes the ownership of a file in a filesystem

Important filesystem operations


link()    -- create a hard link to a file
symlink() -- create a soft link to a file 
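
The difference between the two kinds of link is easiest to see when the original name is removed. The sketch below assumes a filesystem that permits hard links; all file names are hypothetical.

```python
import os
import stat
import tempfile

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "original")
hard = os.path.join(workdir, "hard")
soft = os.path.join(workdir, "soft")

with open(original, "w") as fh:
    fh.write("data")

os.link(original, hard)       # link(2): a second name for the same inode
os.symlink(original, soft)    # symlink(2): a small file holding a pathname

same_inode = os.stat(hard).st_ino == os.stat(original).st_ino
link_count = os.stat(original).st_nlink
soft_is_link = stat.S_ISLNK(os.lstat(soft).st_mode)   # lstat: do not follow

os.unlink(original)           # remove the original name

hard_survives = os.path.exists(hard)      # the inode still has one name
soft_dangles = not os.path.exists(soft)   # the stored pathname is now stale
```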

Important filesystem operations


mkdir()   -- create a new directory
rmdir()   -- remove a directory

Important filesystem operations


stat()    -- return information about a file associated with a pathname: inode,
          -- perms, hard links, uid, gid, size, modtimes
statfs()  -- return the mount information for the filesystem that the 
          -- pathname is associated with

Signals

alarm       -- set an alarm clock for a SIGALRM to be sent to a process 
            -- time measured in seconds
setitimer   -- set an alarm clock in fractions of a second to deliver one of
            -- SIGALRM, SIGVTALRM, SIGPROF (getitimer reads the timer back)
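
A hedged sketch of a sub-second timer using Python's signal module, which exposes setitimer(2) directly. The 50 ms interval and the two-second safety deadline are arbitrary choices, and the code assumes it runs in the main thread.

```python
import signal
import time

fired = []

def on_alarm(signum, frame):
    # Record which signal arrived; runs when SIGALRM is delivered.
    fired.append(signum)

previous = signal.signal(signal.SIGALRM, on_alarm)

# ITIMER_REAL counts wall-clock time and delivers SIGALRM on expiry.
signal.setitimer(signal.ITIMER_REAL, 0.05)

# Wait for the handler, with a deadline so the sketch cannot hang.
deadline = time.monotonic() + 2.0
while not fired and time.monotonic() < deadline:
    time.sleep(0.01)

signal.setitimer(signal.ITIMER_REAL, 0)   # disarm the timer
signal.signal(signal.SIGALRM,
              previous if previous is not None else signal.SIG_DFL)
```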

Signals

kill        -- send an arbitrary signal to an arbitrary process
killpg      -- send an arbitrary signal to all processes in a process group

Signals

sigaction   -- interpose a signal handler (can include special ``default'' or 
            -- ``ignore'' handlers)
sigprocmask -- change the list of blocked signals
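
The sketch below shows blocking in action: a signal sent while blocked stays pending and is only handled once it is unblocked. It assumes a single-threaded process; Python exposes pthread_sigmask(), the per-thread counterpart of sigprocmask(), and signal.signal() in place of a raw sigaction() call.

```python
import os
import signal

seen = []
old_handler = signal.signal(signal.SIGUSR1,
                            lambda signum, frame: seen.append(signum))

# Block SIGUSR1, then send it to ourselves with kill(2).
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
os.kill(os.getpid(), signal.SIGUSR1)

pending = signal.SIGUSR1 in signal.sigpending()   # queued, not delivered
handled_while_blocked = list(seen)                # snapshot: still empty

# Unblocking lets the pending signal through to the handler.
signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})
handled_after_unblock = list(seen)

signal.signal(signal.SIGUSR1,
              old_handler if old_handler is not None else signal.SIG_DFL)
```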

Signals

wait        -- wait for any child process to change state (such as exiting)
waitpid     -- wait for a child process to change state (any child or a
            -- specific one; can be blocking or non-blocking)

Modifying the current process's state

chdir       -- change the working directory for a process to dirname
chroot      -- change the root directory for a process

Modifying the current process's state


execve      -- execute another binary in this current process
fork        -- create a new child process running the same binary
clone       -- allows the child to share execution context (unlike fork(2))
exit        -- terminate the current process
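
The classic fork/exec/wait pattern can be sketched as follows. The child program (a Python one-liner that exits with status 7) is an arbitrary choice for this illustration.

```python
import os
import sys

pid = os.fork()                    # fork(2): returns 0 in the child

if pid == 0:
    # Child: replace this process image with another program.
    try:
        os.execve(sys.executable,
                  [sys.executable, "-c", "import sys; sys.exit(7)"],
                  dict(os.environ))
    finally:
        os._exit(127)              # reached only if execve() failed

# Parent: waitpid(2) blocks until that specific child changes state.
reaped_pid, status = os.waitpid(pid, 0)
exited_normally = os.WIFEXITED(status)
exit_code = os.WEXITSTATUS(status)
```

Until the parent calls wait() or waitpid(), an exited child remains in the process table as a zombie.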

Modifying the current process's state


getdtablesize  -- report how many file descriptors this process can have
               -- active simultaneously

Modifying the current process's state

getgid      -- return the group id of this process
getuid      -- return the user id of this process
getpgid     -- return the process group id of a given process
getpgrp     -- return the process group id of this process

Modifying the current process's state

getpid      -- return the process id of this process
getppid     -- return parent process id of this process
getrlimit   -- gets a resource limit on this process (core size, cpu time, 
            -- data size, stack size, and others)
getrusage   -- find amount of resource usage by this process
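
A brief sketch of querying the process's own identity, limits, and usage. RLIMIT_NOFILE (the open file descriptor limit) is picked here only as an example resource.

```python
import os
import resource

pid = os.getpid()                  # getpid(2)
parent = os.getppid()              # getppid(2)
group = os.getpgrp()               # getpgrp(2)

# getrlimit(2) returns a (soft, hard) pair for the chosen resource.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# getrusage(2): resources consumed so far by this process.
usage = resource.getrusage(resource.RUSAGE_SELF)
cpu_seconds = usage.ru_utime + usage.ru_stime
peak_rss = usage.ru_maxrss         # kilobytes on Linux

print(f"pid={pid} ppid={parent} pgrp={group}")
print(f"open files: soft={soft} hard={hard}")
print(f"cpu={cpu_seconds:.3f}s peak_rss={peak_rss}")
```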

Modifying the current process's state

nice()          -- change the calling process's priority
setpriority()   -- arbitrarily change any process's (or group or user) priority
getpriority()   -- get any process's priorities

Communications and Networking

socket      -- create a file descriptor (can be either network or local)

bind        -- bind a file descriptor to an address, such as a TCP port
listen      -- specify willingness for some number of connections to be 
            -- queued waiting on accept()
accept      -- block on a listening file descriptor until there is a new
            -- connection, then return a new file descriptor for it

connect     -- actively connect to a listen()ing socket

setsockopt  -- set options on a given socket associated with fd, such as out-of-band
            -- data, keep-alive information, congestion notification, final timeout,
            -- and so forth (see man tcp(7)); also allows user credentials to be
            -- passed with each invocation of recvmsg() on a Unix socket
getsockopt  -- retrieve information about options enabled for a given connection from fd;
            -- also allows user credentials to be retrieved for a given Unix socket

getpeername -- retrieve information about other side of a connection from fd
getsockname -- retrieve information about this side of a connection from fd
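
The calls above fit together as in the following sketch, which runs both ends of a TCP connection over the loopback interface inside one process. Port 0 asks the kernel to choose a free port; the keep-alive option is an arbitrary example for setsockopt().

```python
import socket

# Passive side: socket, bind, listen.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # port 0: let the kernel pick
server.listen(1)                        # allow one queued connection
port = server.getsockname()[1]          # find out which port we got

# Active side: socket, connect. The kernel completes the handshake and
# queues the connection until the server calls accept().
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))

conn, peer = server.accept()            # new descriptor for this connection

client.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
keepalive = client.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)

# Each side's peer name is the other side's own name.
names_match = conn.getpeername() == client.getsockname()

client.sendall(b"hi")
data = conn.recv(16)

for s in (conn, client, server):
    s.close()
```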

Others

brk         -- allocate memory for the data segment for the 
            -- current process

gethostname  -- gets a ``canonical hostname'' for the machine
sethostname  -- sets a ``canonical hostname'' for the machine

gettimeofday -- gets the time of day for the running kernel
settimeofday -- sets the time of day for the running kernel

mount        -- attaches a filesystem to a directory and makes it available
sync         -- flushes all filesystem buffers, forcing changed blocks to
             -- ``drives'' and updates superblocks
futex        -- raw locking (lets a process block waiting on a change 
                to a specific memory location)
sysinfo      -- provides direct access from the kernel to:
                    load average
                    total ram for system
                    available ram
                    amount of shared memory existing
                    amount of memory used by buffers
                    total swap space
                    swap space available
                    number of processes currently in proctable

SYS V IPC

msgctl       -- SYS V messaging control (uid, gid, perms, size)
msgget       -- SYS V message queue creation/access
msgrcv       -- receive a SYS V message
msgsnd       -- send a SYS V message

shmat        -- attach memory location to SYS V shared memory segment
shmctl       -- SYS V shared memory control (uid, gid, perms, size, etc)
shmget       -- SYS V shared memory creation/access
shmdt        -- detach from SYS V shared memory segment

Linux-specific, inotify

One of the more interesting developments in the last few years is the idea of improving our ability to detect changes in a filesystem. The most recent incarnation of this idea is the inotify system. It can monitor either files or entire directories for events (but it is not recursive — you would have to monitor each directory separately). Interestingly enough, you can use select to monitor the monitoring... ;-)

The interface consists of three new system calls, inotify_init(2), inotify_add_watch(2), and inotify_rm_watch(2), and also re-uses our old friends read(2) and close(2).

Here's an example program using inotify: inotify_test.c.

Linux-specific, signalfd

Another interesting development in the last few years is the idea of replacing the old asynchronous signal system with queued, synchronously delivered signals.

This has been implemented with the signalfd system. Like inotify, it sets up a queue that can be read with read(2), but instead of file system events, we pick up signal events.

The interface consists of just one new system call, signalfd(2), and also re-uses our old friends read(2) and close(2).

Here's an example program using signalfd: signalfd-test.c.

Linux-specific, memory over the life of a process

A program to demonstrate stages of memory formation in a Linux process with no data segment and two mmap(2) anonymous memory mappings.

A program to demonstrate stages of memory formation in a more typical Linux process with a data segment and two mmap(2) anonymous memory mappings.