COP4610: Operating Systems & Concurrent
Programming
Coding Standards & Practices
These notes expand on the coding standards and practices
outlined in the Study Guide.
Please adhere to these for any code you write in this course,
unless given specific instructions to the contrary.
Code must be indented, and corresponding syntactic elements aligned,
according to a consistent set of rules that reflect the nesting of syntactic
structures and promote readability. You should also follow a consistent
convention regarding the uses of upper- vs. lower-case letters, and underscores,
in identifiers for macros, functions, types, parameters, variables, etc.
If you are updating a provided file,
you should maintain the conventions established by the author of the file.
For new files, choose an appropriate convention. For example, you may follow the
conventions of the Linux kernel
(https://www.kernel.org/doc/Documentation/CodingStyle), or Dr. R.C.
Lacher's coding style used in prerequisite courses
(http://www.cs.fsu.edu/~lacher/courses/DOCS/codestandards.html).
Comments should be used to enhance understanding, by providing
information that cannot be easily extracted from the code alone.
Specifically, the following forms of comments are
required:
- Every source file should have a block comment at the beginning,
containing at least the name of the file, date created, date last
updated, author(s), and a brief description of the file contents
(including how they relate to the larger application or system to which
they belong). A copyright and licensing statement may be used as well,
typically the last item in the header documentation.
- For each global data structure, at the point where the corresponding
struct or typedef first appears, an explanation of the abstraction it
implements (e.g., a linear null-terminated linked list, a circular doubly
linked list, a hash table with re-hashing, etc.). This often includes
"invariant" properties of the data structure, such as null-termination,
which must be preserved by every piece of code that operates on it. For
concurrent programs, this includes the mechanisms or conventions that are
used to ensure mutual exclusion and prevent deadlock. Write these
before you write the functions that implement algorithms on the
structure.
- For each function, at the point where the function prototype first
appears, the following:
- A short explanation of what the function does, and how the parameters
affect that, if it is not obvious from their names (and what is obvious to
you might not be so obvious to others, or to yourself a few months or years later).
- Any assumptions the function makes about the values of its parameters, beyond
those conveyed by the types and modes of the parameters, and about
global variables or files upon which it depends for effect. These
are often called the "preconditions" for calling the function.
- Guarantees it provides about the value returned from the function,
and changes it makes to global variables and files, if the assumptions
above are satisfied. These are often called the "postconditions" for
the function.
- If the function can fail, the convention on how failure is
reported.
Write these comments before you write the function implementation,
in the header file, and update them as necessary after you have completed
the implementation.
Do not clutter your code with line-by-line comments that simply
restate in English what the code already expresses. Reserve local comments
for situations where the code is doing something that is not obvious.
Do write the comments as you go, and keep your comments up-to-date.
Misleading out-of-date comments are worse than no comments.
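As an illustration of the required comment forms, here is a minimal sketch for a hypothetical singly linked list of integers. The names struct int_node and int_list_push, and the choice of -1 as the failure code, are assumptions made for this example only; they do not come from any course-provided code.

```c
#include <stdlib.h>

/*
 * struct int_node (hypothetical): one node of a linear, singly linked
 * list of integers.
 *
 * Invariants:
 *   - the list is null-terminated: the last node has next == NULL;
 *   - an empty list is represented by a NULL head pointer;
 *   - every node was allocated with malloc() and is owned by the list.
 * Concurrency: no locking is done here; callers must serialize all
 * access to a given list.
 */
struct int_node {
    int value;
    struct int_node *next;
};

/*
 * int_list_push - insert a value at the front of a list.
 *
 * Preconditions:  head != NULL, and *head is NULL or points to the first
 *                 node of a list that satisfies the invariants above.
 * Postconditions: on success, *head points to a new node holding value,
 *                 followed by the previous contents of the list.
 * Failure:        returns -1 if memory cannot be allocated, leaving the
 *                 list unchanged; returns 0 on success.
 */
int int_list_push(struct int_node **head, int value);

int int_list_push(struct int_node **head, int value)
{
    struct int_node *node = malloc(sizeof *node);

    if (node == NULL)
        return -1;
    node->value = value;
    node->next = *head;
    *head = node;
    return 0;
}
```

In a real project the structure comment and the prototype comment would live in the header file, and only the function body in the implementation file. Note that the body needs no line-by-line comments: the prototype comment already says everything a caller must know.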
Debugging, trace, and error message output is a necessity, but it should
never be mixed into the same output file or stream as the normal correct
output of a program. In particular, a program that fails should not corrupt
any file as a side-effect of error messages, nor should debugging/trace
output change the effect of a program on the files that it normally is
expected to produce (thereby causing tests to fail).
- Error messages should normally be sent to the standard error stream
stderr, or to a special log file (e.g., see the
syslog() facility in Linux), not to
stdout.
- Debugging, trace, or other forms of logging output should be
controllable, as to the level of verbosity (or total silence), via
environment variable and/or command-line parameter.
- Debugging code should generally be designed into a program, and
retained for maintenance. (Removing debugging code for delivery is a frequent
cause of other errors, and re-inserting debugging code during bug-fixing
is another source of errors, as well as a waste of time.)
If overhead is of concern, conditional
compilation directives
(#ifdef DEBUG ...) should be used. Comments should never be used to
disable any code, debugging or otherwise.
- Use standard error-reporting and logging mechanisms, like
perror(),
strerror(), and
syslog(), where appropriate.
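The points above about controllable trace output can be sketched as follows. This is only one possible arrangement, not a required design: the environment variable name MYPROG_TRACE, the functions trace_init() and trace(), and the TRACE macro are all hypothetical names invented for this example. The macro uses the double-parenthesis idiom so that it stays within C90, which has no variadic macros.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Current verbosity: 0 = silent; larger values produce more output. */
static int trace_level = 0;

/*
 * trace_init - set the verbosity from the environment.
 * MYPROG_TRACE is a hypothetical variable name chosen for this sketch.
 * An unset, non-numeric, or non-positive value leaves tracing silent;
 * values above 9 are treated as 9.
 */
void trace_init(void)
{
    const char *s = getenv("MYPROG_TRACE");

    trace_level = 0;
    if (s != NULL) {
        long v = strtol(s, NULL, 10);

        if (v > 0)
            trace_level = (v > 9) ? 9 : (int) v;
    }
}

/*
 * trace - print a printf-style message on stderr (never stdout) if
 * level does not exceed the current verbosity.
 * Returns 1 if the message was printed, 0 if it was suppressed.
 */
int trace(int level, const char *fmt, ...)
{
    va_list ap;

    if (level > trace_level)
        return 0;
    fputs("trace: ", stderr);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    return 1;
}

/*
 * If overhead is a concern, the calls compile away unless DEBUG is
 * defined.  Usage: TRACE((2, "x = %d", x));  (note the double parentheses)
 */
#ifdef DEBUG
#define TRACE(args) ((void) trace args)
#else
#define TRACE(args) ((void) 0)
#endif
```

With this arrangement the debugging code stays in the source permanently. A program would call trace_init() once at start-up, and a user would select the verbosity at run time, for example by setting MYPROG_TRACE to 2 in the environment, without any effect on what the program writes to stdout.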
For this course, program files must be in a form that can be compiled, read, and printed
under the Unix operating system.
- The single character LF (Ctrl-J) alone is used to indicate the end of a
line, and the file should end with such a newline character.
- The code should not contain any tabs, nulls, or other nonprintable
(formatting) characters, or any blanks at the ends of lines.
- The character encoding should be plain ASCII or UTF-8. Avoid 16-bit
encodings, such as UTF-16.
- No line should contain more than 80 characters.
Take care that you do not use a Windows/DOS editor to edit program
files. Windows/DOS uses two characters (^M^J) to indicate an end of line.
The extra character (^M) can prevent your program from compiling under
Unix. Take care not to process code with a word processing editor or e-mail
tool that inserts blanks, tabs, or other "whitespace" characters at the
ends of lines. Do not try to send source code in e-mail using a
Windows-based mail agent; they are known to insert line breaks in long
lines. In C-language macro definitions, adding extra whitespace at the end
of a line can cause compilation errors. Likewise, breaking a line can cause
syntax errors. The instructor has no recent analogous experience with
Macintosh systems, but common sense dictates that there are likely to be
similar pitfalls. To avoid such problems, you should do all of your editing
of program code for this course on a Unix/Linux system, using either the
emacs or
vi editor. You may upload and download C/C++ source files to your
personal system for backup, but you should probably not try to modify them
there unless you are very savvy about avoiding the above kinds of
problems.
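The file-format rules above are mechanical enough to check with a small program. The following is a rough sketch of such a checker, not a course-provided tool; the function name count_format_problems and the constant MAX_LINE_LEN are made up for this example. It counts bytes rather than characters, so a line containing multi-byte UTF-8 characters may be reported as too long even though it displays in 80 columns.

```c
#include <stdio.h>

#define MAX_LINE_LEN 80

/*
 * count_format_problems - scan a text stream for format violations.
 *
 * Counts one problem for each of the following:
 *   - a carriage return, tab, NUL, or other control byte (except LF);
 *   - a line whose last byte before the LF is a blank;
 *   - a line longer than MAX_LINE_LEN bytes, not counting the LF;
 *   - a non-empty file whose last line does not end with LF.
 *
 * Preconditions:  fp is a valid stream open for reading.
 * Postconditions: the stream has been read to end-of-file.
 * Failure:        returns -1 on a read error; otherwise returns the
 *                 number of problems found (0 for a clean file).
 */
long count_format_problems(FILE *fp)
{
    long problems = 0;
    long len = 0;       /* bytes seen so far on the current line */
    int prev = '\n';    /* previous byte; '\n' means "at line start" */
    int c = 0;

    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            if (prev == ' ')
                problems++;     /* blank at end of line */
            if (len > MAX_LINE_LEN)
                problems++;     /* line too long */
            len = 0;
        } else {
            if (c < 0x20 || c == 0x7f)
                problems++;     /* CR, tab, NUL, or other control byte */
            len++;
        }
        prev = c;
    }
    if (ferror(fp))
        return -1;
    if (prev != '\n')
        problems++;             /* last line lacks a newline */
    return problems;
}
```

A file edited under Windows/DOS would show one problem per line, since every line carries an extra ^M before the LF.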
C source code files should be divided into two types:
- Header files, whose names end with the suffix ".h". These may include
the following:
- documentation for the file
- #include, macro/symbol definitions, and conditional
compilation directives
- constant, type, and structure definitions
- function prototypes
Always protect header files from being read more than once, using the convention
#ifndef _FILENAME_H
#define _FILENAME_H
...
#endif
Always use angle brackets for include files:
#include <myfile.h> // OK - location of file is unspecified
#include "myfile.h" // NOT OK - location of file is hard coded (relative)
#include "/directory/myfile.h" // NOT OK - location of file is hard coded
                               // (absolute)
The reason: angle brackets allow the included file to be moved
without editing the files in which it is included. Quotes
force an edit of the #include directive whenever the relative locations of the
includee and includor are changed or the absolute path of the includee is
changed. It is much better to resolve these issues in the build record
(makefile), e.g., with the compiler's -I option.
- Implementation files, whose names end with the suffix ".c". These may
include the following, in this order:
- file header documentation
- #include directives
- constants
- locally used function prototypes (if needed)
- function implementations
Regardless of how an assignment is submitted, your instructor will
specify a file naming convention that will allow your submitted work to be
easily identified, among different assignments that you and other students
submit for the course. It is essential that you follow the file naming
convention for the assignment, or else your work may not be graded. For
example, if the assignment says you are to name a file "prog1.c" and you
name it "program1.c" it may not be graded.
The following are some rules that I have found lead to more robust code.
This is not exactly a matter of style, but more a matter of sound programming
practice. Read about additional rules in the notes
on secure coding.
- Always check the results of all functions that can fail and return an
error code, and handle the failure case in a safe way. Examples include
malloc(), which returns
NULL upon failure, and
fork(), which returns the value -1 upon failure.
- Explicitly initialize all variables, including all components of
structures.
- Always check for possible array/buffer overflows, and handle
violations in a safe way.
- Make no assumptions about the length and syntactic structure of
inputs.
- Make no assumptions about the validity of command-line arguments to
programs.
- Beware of dependencies on
environment variables, including system calls whose effect can
be modified by environment variables, which are implicit parameters to
the program. For example, avoid calls to
system(), and whenever using
execve() verify both the security of the executable file and the
environment variable values that are passed to it.
- Make no assumptions about the length (in bytes) of any data type. Use
strlen() and
sizeof() where appropriate, but with care not to confuse pointers
with objects pointed to.
- Take care to avoid the possibility of
free() being called more than once on the same object.
- Take extreme care with pointer type conversions, including uses of
void * (which is required by many operating system API calls),
to ensure that the pointer actually points to a valid value of the target
type.
- Compile with warnings turned on, and pay attention to the warnings.
In general, enable the
gcc warning options including "
-Wall -Wextra -pedantic". There should be no warnings, with the
exception of some specific cases allowed by the assignment (e.g., use of
gcc-specific extensions for uses of macros from the Linux kernel header
list.h).
- Whenever a function makes assumptions about its parameters, document
them, especially where the function does not (or cannot) check its
parameters for validity.
- Beware of the potential effects of signals, which can be generated
for and delivered to a program from outside at any time.
- Write error checking and recovery code in a layered systematic way,
checking for errors "outside in", and recovering "inside out" (unwinding
initializations and recovering resources). You may use
goto (only) to implement a set of nested error recovery actions,
similar to exception handlers in other languages, as practiced in the
Linux kernel code.
- Program command-line parameter and environment variable errors
should be caught at start-up.
- Failures in module initialization code should be caught within the
module, and generally cause program termination.
- Error recovery code should ensure that any resources not local to
the process (e.g., objects in the filesystem namespace) are recovered
and restored to a valid state.
- No error condition or failure should be entirely ignored. I have
found it helpful to recognize three classes of errors, which need to be
treated differently:
- Fatal errors, from which no safe recovery is possible. These
require termination of the program. After cleaning up any persistent
objects (e.g., files) to a valid state, call
exit() with an appropriate exit status value (positive) that
indicates failure to the parent process. Depending on the nature of the
failure, it may also be appropriate to issue a message to a system log
file or the standard error stream, e.g., through a call to
perror().
- Failure of a function for which there is a convention regarding
return values that covers failure cases. In this case, the return value
of the function should be the appropriate failure code. The model is
analogous to C-library and system calls, which generally return 0 upon
success, and some other value if they fail.
- Errors from which local recovery is possible in a way
that allows correct continuation of the rest of the
program. The error should still be logged, but execution
may proceed, at least up to some predetermined point where
further progress becomes impossible. An example of such
errors would be error messages produced by a compiler for
syntax errors. One would expect the compiler to continue
execution through the end of the parsing phase, but not
produce executable code. Another example would be an HTTP
Web server, which aborts service of a request if the URL
is ill-formed, logs the failure, and returns to a state
where it is ready for the next request.
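Several of the rules above (checking every call that can fail, reporting through perror(), returning a failure code, and unwinding "inside out" with goto) can be seen together in one small function. The following is a minimal sketch under stated assumptions: the function copy_file(), its label names, and the buffer size are invented for this example, and removing the partial output file is just one reasonable recovery policy. It does not guard against src and dst naming the same file.

```c
#include <stdio.h>
#include <stdlib.h>

#define COPY_BUF_SIZE 4096

/*
 * copy_file - copy the contents of the file named src to the file
 * named dst (a hypothetical helper written for this example).
 *
 * Preconditions:  src and dst are non-NULL, null-terminated path names
 *                 that refer to different files.
 * Postconditions: returns 0 if dst is a complete copy of src.
 * Failure:        returns -1 after reporting the cause on stderr; a
 *                 partially written dst is removed.
 */
int copy_file(const char *src, const char *dst)
{
    FILE *in = NULL;
    FILE *out = NULL;
    char *buf = NULL;
    size_t n = 0;
    int status = -1;            /* failure until proven otherwise */

    /* Acquire resources "outside in", checking each step. */
    in = fopen(src, "rb");
    if (in == NULL) {
        perror(src);
        goto fail_in;
    }
    out = fopen(dst, "wb");
    if (out == NULL) {
        perror(dst);
        goto fail_out;
    }
    buf = malloc(COPY_BUF_SIZE);
    if (buf == NULL) {
        perror("malloc");
        goto fail_buf;
    }
    while ((n = fread(buf, 1, COPY_BUF_SIZE, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            perror(dst);
            goto fail_copy;
        }
    }
    if (ferror(in)) {
        perror(src);
        goto fail_copy;
    }
    status = 0;

    /* Release "inside out": the newest resource is released first. */
fail_copy:
    free(buf);
fail_buf:
    if (fclose(out) != 0) {     /* buffered data may fail to flush */
        perror(dst);
        status = -1;
    }
    if (status != 0 && remove(dst) != 0)
        perror(dst);            /* could not remove the partial file */
fail_out:
    if (fclose(in) != 0)
        perror(src);            /* logged only; all data already read */
fail_in:
    return status;
}
```

The success path deliberately falls through the same labels as the failure paths, so each resource is released in exactly one place. This corresponds to the second class of errors described above: the function reports failure through its return value and leaves the decision about termination to its caller.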
As explained in the Study Guide,
learning to write portable code is one of the objectives of this course.
Portability is generally achieved through adherence to widely supported
standards, and avoiding dependence on implementation-specific features of
the execution platform, compiler, libraries, and operating system.
Several specific rules are given in the Study Guide for this course.
In addition, please consider the following principles whenever you code.
Be conservative in your choice of standards. Many people are using
old versions of operating systems, and old compilers, that probably are
not completely up-to-date with the most recent standards. Even the most
recent release of
gcc (at the time of this writing) did not completely support the
most recent C language standard (C99), and many Linux systems are running
older versions of
gcc. For example, at the time of this writing, the version of gcc
on the program servers was behind the version on the linprog servers.
So, a person concerned with portability, even across Linux
systems, may want to avoid writing code that depends on new features
introduced by C99. You can generally control which version of the
language a compiler checks for; for example, the
gcc option
-std=c90 specifies the C90 standard.
The same applies to libraries. The Unix/POSIX operating system
service library function interfaces are even implemented by Microsoft's
Windows operating
systems. When compiling, pay attention to correct use of appropriate
feature-test macro definitions (e.g.,
#define _XOPEN_SOURCE) to ensure that standard-compliant versions of
header files are used.
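As a small illustration of a feature-test macro, the sketch below requests the interfaces of one specific revision of the standard. The value 700 for _XOPEN_SOURCE selects the Single UNIX Specification, Version 4 (POSIX.1-2008); whether that revision is the right one for a given assignment is an assumption of this example, and the function names posix_version() and duplicate() are hypothetical.

```c
/* Must appear before the first #include to have any effect. */
#define _XOPEN_SOURCE 700

#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * posix_version - return the POSIX revision advertised by <unistd.h>
 * (e.g., 200809L), or 0 if the headers do not define _POSIX_VERSION.
 */
long posix_version(void)
{
#ifdef _POSIX_VERSION
    return (long) _POSIX_VERSION;
#else
    return 0L;
#endif
}

/*
 * duplicate - return a malloc()ed copy of the string s, or NULL if
 * memory cannot be allocated.  The caller must free() the result.
 *
 * strdup() is specified by POSIX, not by C90 or C99.  With a strict
 * option such as -std=c99, its prototype is visible only because of
 * the feature-test macro defined above.
 */
char *duplicate(const char *s)
{
    return strdup(s);
}
```

Without the macro, a strict -std= option would typically hide the declaration of strdup(), and the compiler would warn about an implicit declaration. That warning is a useful signal that the code depends on something outside the language standard.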
Beware that the POSIX and Open Group standards, like programming
language standards, go through revisions and a given implementation
may not support the latest standard. Be careful
about man-pages. They are generally specific to one OS version, and may
mislead you with respect to what behavior is supported by POSIX. The Open
Group has harmonized its Unix Standards to be consistent with the POSIX
standard, and you can obtain access to the official Unix/POSIX man-pages
from The Open Group's website for free, by signing up. Generally avoid
usage that is specified as having "implementation-defined" behavior.
For shell script portability, stick to the syntax of the standard
sh shell, which is a subset of that supported by the
bash shell, and begin the file with an indication of which shell
should execute it, i.e.,
#!/bin/sh.
For makefiles and scripts used by other utilities such
as awk, stick to the portable POSIX syntax, or at least
verify that they work on both Linux and SunOS.