Chapter 3

see Concepts Introduced in Chapter 3

We don't have time to go into a lot of detail. Assembly used to be an entire class by itself. We will only go over a subset of the MIPS assembly language.

see Instructions for a Machine

You have programmed in a high-level language (e.g. C++). The machine cannot execute high-level language statements directly. A machine can only execute machine instructions, which are simply a sequence of bits. Assembly instructions are just a human readable form of machine instructions. To command computer hardware, you have to know the language of the machine. The words in this language are called machine instructions and the vocabulary is the instruction set.

see Figure 3.21: A translation hierarchy

We will discuss this again later. The main point to see here is that there is an automatic process of translating a high-level language program to instructions that the machine can execute.

see Goals When Designing an Instruction Set

  1. The instruction set affects the complexity of the hardware since the hardware must perform the operations specified in the instructions. Machine languages in general are pretty similar because there are a set of basic operations that all machines must provide. Simple instructions results in simpler hardware.
  2. The compiler is affected since it must determine how to translate the high-level language program into assembly (machine) instructions. Fewer and regular instructions results in a simpler compiler.
  3. We will see later that the set of instructions can affect both performance and cost.

see Memory

The address of an instruction or a variable in memory indicates the offset from the beginning of memory to the first byte of that instruction or variable. Having a single memory for both data and instructions is often referred to as a von Neumann machine. We will discuss this more later. When we discuss the compilation process in more detail, we will see how instructions and data are laid out in memory. We will also see later how addresses of instructions and data can be formulated in machine instructions.

see Registers

Access to registers is much faster than main memory since registers are much closer to the portion of the processor that manipulates the data. But we will see there are only a limited number due to how instructions are encoded.

see Instruction Set Design Principles

see Who Programs in Assembly Language?

Much fewer people program in assembly as compared to those that program in high-level languages.

see Why Learn Assembly Language?

If it is unlikely that you will program in assembly language, then why learn it? To understand instruction sets, one must understand the assembly language that represents the instructions. It is useful to understand how high-level language constructs are actually implemented since it means you will be more likely to use them efficiently and correctly.

see MIPS Introduction

MIPS actually used to stand for Machine without Interlocked Pipeline Stages. The later MIPS had interlocked pipeline stages. You will learn what that means later. Hennessy led the effort to create a RISC machine (evolved into the MIPS) at Stanford. Patterson led the effort to create a RISC machine (evolved into the SPARC) at Berkeley. The development of these machines had a tremendous impact on the field of computer architecture.

see General Classes of MIPS Assembly Instructions

There are 4 general classes. Sometimes arithmetic and logical instructions are grouped together.

see MIPS Instruction Operands

  1. We will see later why the size of constants are limited in instructions and how larger constants are constructed.
  2. We will see that $0 is hardwired to zero to reduce the number of instructions needed. $0-$31 contain integer values. $f0-$f31 contain floating-point values. Having only 32 integer registers and 32 floating-point registers supports the design principle of smaller is faster. A large number of registers would increase the clock cycle time since the signals would have to travel farther.
  3. The amount of memory is limited by the size of a word that can contain an address. Note that later we will see how memory is actually limited.

see General Form of a MIPS Assembly Language Program

Unlike most high-level languages, assembly languages only allow one declaration and one instruction per line. I am leaving out some details here, but this is enough for writing simple programs in MIPS assembly.

see General Form of a MIPS Instruction

Each instruction has to be on a separate line. Labels and comments are optional.

see General Forms of a MIPS Arithmetic or Logical Instruction

There are two general forms, one with all register operands and one with a constant as one of the sources. Note that we use the same form for a shift instruction when the shift amount is a constant or a register. Note that the destination register is the first register specified. In fact, the order of the operands corresponds to the order you would see in an assignment statement. Requiring every arithmetic/logical instruction to have 3 operands keeps the hardware simple and supports the principle simplicity favors regularity. Note that the u in addu and subu indicate that the add and subtract operations will not cause exceptions if the operation overflows (magnitude of the value is too large).

see General Form of a MIPS Data Transfer Instruction

Note that lw and sw are transfering 4 bytes, while lb and sb are transfering only 1 byte. The address is simply the sum of the value of the register plus the offset. We will discuss the range of values that the offset can take later. Note that in the sw and sb instructions the destination is not the leftmost operand.

see Example of Using Arithmetic/Logical and Data Transfer Instructions

Note that some registers are used to hold values loaded from memory. Registers can also be used to hold temporary results. Note the reuse of $6 to hold the temporary result since the value loaded in $6 isn't used any more. The lw instruction is used instead of lb since the four variables are integers and not characters.

see Another Example Using Arithmetic/Logical and Data Transfer Instructions

a[2] is located at an offset of 8 from the beginning of a since each element is 4 bytes and the first element is a[0]. The sll instruction is used to determine the ith offset in the array a. Note that a[i] is not accessed directly in one instruction. The address must be calculated first. Note << 2 == * 4. We will discuss this more later. Compilers use shifts instead of multiplies when possible since shifts can be performed faster. Note arrays and structs are rarely allocated to registers, while scalars are often associated with registers.

see Memory Access Issues

The x's in the figure means 1's or 0's and the 0's mean that the specified bit must be clear. For instance, a word has to be aligned on a 4 byte boundary and must thus be an integer multiple of 4. The reason for alignment requirements is that it results in simpler hardware and a slightly faster cycle time when these special checks do not have to be handled. The computer relies on the compiler to ensure that memory references are properly aligned.

see Memory Access Issues (cont.)

Big and Little Endian got its name from Gulliver Travels about the big and little end of an egg. It really doesn't matter in terms of performance. Byte ordering is only an issue when transferring data between machines.

see General Form of MIPS Transfer of Control Instructions

slt and slti are used to make comparions. They set a register to 0 or 1 and these can be tested by the beq and bne instructions. The example bne instruction compares $5 to $0. Remember that $0 is hardwired to always have the value of zero. The j instruction is an unconditional transfer of control to a specified label. It is used to implement gotos and other control constructs. The jr instruction is similar, but the target address is in a register. The jr instruction is called an indirect jump and is often used to implement return and switch statements. The jal instruction is called a jump and link instruction, which is used to implement a function call. It sets the $31 register to the current address + 4 and makes an unconditional transfer of control to the specified address. $31 contains the return address or the address of the instruction after the jal instruction. So now you can see that the jr $31 instruction is used to implement returns.

see Translating an If-Then-Else Statement into MIPS Assembly Instructions

The slt instruction checks if i < j and the beq comparison with $0 checks if that condition is not true, which means the branch jumps when i >= j. Remember that $0 is hardwired to always have the value zero. Note the use of the j unconditional jump to jump over the else portion. With just the slt, beq, and bne, one can construct all six different relational operations (==, !=, <, <=, >=, >).

see Translating a For Statement into MIPS Assembly Instructions

Note two different ways of setting a register to zero. Calculating the address of a[i] takes two instructions in this example. When $5 is >= 100, the branch will not jump.

see Instruction Formats

Note each instruction format is 32 bits (the size of a word). Having all instructions the same length simplifies knowing where to fetch the next instruction. Having all instructions use the same format simplifies decoding the instructions. Having only 3 different instruction formats with all formats being the same length supports the design principle: Good design demands good compromises. With 6 bits for the opcode, we can have 2^6 or 64 different instructions. The funct field allows different opcodes when a R-format instruction is used, which always has a regular opcode of 0. Each register is encoded in 5 bits so we can reference registers $0-$31. The address/immediate field allows values to range from -2^15..2^15-1.

see Figure 3.37: The MIPS instruction set covered so far.

This table shows the different formats used for each type of instruction.

see Examples of Encoding Instructions

This slide shows three examples of instruction encoding, one for each instruction format. The funct field distinguishes different types of R-format instructions. The shamt field is only used for shift instructions. Registers (rs, rt, rd) are encoded by their number. The address/immediate and target address fields are encoded as binary values directly. The jump address is stored as a word address instead of a byte address since instructions are aligned on a word boundary. Note 26 bits in the target address allows at most 2**26 total instructions (over 67 million) in a program, which is typically enough.

see Encoding the Branch Displacement

A branch displacement is used since branch targets are within a function and are typically pretty close. We will see why we use a displacement from the instruction after the branch instead of the branch instruction itself when we cover Chapter 5. If the target was the next instruction (the add), then the branch offset would be 0. Note that the branch offset is in words, not bytes, since all instructions are 4 bytes in length.

see lui Instruction

Global addresses require 32 bits on the MIPS. We can't address them directly in a single instruction, so we have to construct them using more than one instruction. We are limited to constants containing 16 bits. So large constants have to be constructed as well. Note that we cannot left shift a constant directly. Having the lui instruction saves one instruction (loading a constant into a register to be left shifted). Why use ori instead of addi? The ori is a logical operation, so the constant won't be sign-extended.

see Figure 3.17: Illustration of the five MIPS addressing modes

  1. Immediate addressing: Just means that the value is within the instruction itself. It is used for arithmetic operations.
  2. Register addressing: Just means that the value is in the specified register.
  3. Base addressing: This is called base displacement. It means that the value of a register is added to a sign-extended displacement. It is used for load and store instructions.
  4. PC-relative addressing: This is used for branches. The address is just a displacement that can be positive or negative. If negative, then it is a backward branch. The displacement is in instructions (words), not bytes, to allow a greater branching distance. The displacement starts from PC+4 since PC has already been incremented by that point.
  5. Pseudodirect addressing: This is for calls and jumps. The ":" means concatenation. It just means that the high portion of the PC is unchanged. The most significant 4 bits of the PC are unchanged. The least significant 2 bits of the PC are zero since instructions are word aligned. That leaves the direct address as 26 bits.

see Figure 3.7: The stored-program concept

So we now know that instructions can be viewed as data. This data can be loaded into memory and the computer can be instructed to start executing the instructions at a specified location in memory. They can be read in and stored in memory, which simplifies the memory hardware since it can be used for both data and code. Later, these instructions can be executed. This is also known as a von Neumann machine. Note that the main point is that both machine code to be executed and data for multiple programs can be simultaneously stored in memory and this allows the computer to easily switch between applications.

see Figure 3.21: A translation hierarchy.

Note that there is a step missing called the C Preprocessor. Actually when you compile, you are running more than just the compiler. Sometimes a compiler will produce object code directly. An assembler translates the symbolic assembly instructions to machine instructions. The linker allows separate compilation. It resolves external addresses and sets up the executable.

see Figure 3.22: The MIPS memory allocation for program and data

Actually, this is the layout of the virtual memory for a process. Instructions go in the text area. Global and static variables go in the static data area. Local variables typically go on the stack. Dynamically allocated space goes on the heap (dynamic data) area. Only the stack and the heap change in size, though there is a maximum for each.

see Figure 3.12: Illustration of the stack allocation (a) before, (b) during, and (c) after the procedure call.

The stack grows from high addresses to low addresses. The area for allocating all of this information for a procedure is called an activation record. The activation record is allocated (registers adjusted) when the procedure is entered and deallocated (registers adjusted) when the procedure is exited. Some registers have to be saved on the stack on entry to the procedure and restored from the stack on exit from the procedure in case the caller needs those registers to retained across the function call. Local variables not allocated to registers are placed on the stack. The run-time stack is used for activation records in C since the language supports recursion (multiple instantiations of the same function). Use of the stack reduces the space accessed and is why you can't retain the values of local (nonstatic) variables between invocations in C.

see Figure 3.13: MIPS register convention.

Each machine has a calling sequence convention. The reason is due to separate compilation. For instance, the run-time library routines abide by certain conventions, such as which arguments are passed in which registers or which registers will be preserved across calls. The compiler or assembly writer must know these conventions so the calling conventions won't be violated. Typically all compiled routines abide by the same conventions.

see Fallacies and Pitfalls

  1. Fallacy: More powerful instructions mean higher performance. This was the whole start of the CISC/RISC controversy. CISC stands for Complex Instruction Set Computer. RISC stands for Reduced Instruction Set Computer. CISC lost. There are lots of reasons, which include better pipelining and caches. We will discuss this in more detail in Chapters 6 and 7.
  2. Fallacy: Write in assembly language to obtain the highest performance. Optimizing compilers can outperform assembly writers now. They are also a lot quicker.
  3. Pitfall: Forgetting that sequential word addresses in machines with byte addressing do not differ by one. Almost all machines are byte addressable to support efficient character processing. It is confusing in C or C++ with pointer arithmetic.
  4. Pitfall: Using a pointer to an automatic variable outside its defining procedure. I have done this. If you return the address of a variable from a function, then that variable better be static.