Chapter 3
see Concepts Introduced in Chapter 3
We don't have time to go into a lot of detail.
Assembly used to be an entire class by itself.
We will only go over a subset of the MIPS assembly language.
see Instructions for a Machine
You have programmed in a high-level language (e.g. C++).
The machine cannot execute high-level language statements directly.
A machine can only execute machine instructions, which are simply
a sequence of bits.
Assembly instructions are just a human readable form of machine
instructions.
To command computer hardware, you have to know the language of
the machine.
The words in this language are called machine instructions and the
vocabulary is the instruction set.
see Figure 3.21: A translation hierarchy
We will discuss this again later.
The main point to see here is that there is an automatic process
of translating a high-level language program to instructions that
the machine can execute.
see Goals When Designing an Instruction Set
-
The instruction set affects the complexity of the hardware since the
hardware must perform the operations specified in the instructions.
Machine languages in general are pretty similar because there
are a set of basic operations that all machines must provide.
Simple instructions results in simpler hardware.
-
The compiler is affected since it must determine how to translate
the high-level language program into assembly (machine) instructions.
Fewer and regular instructions results in a simpler compiler.
-
We will see later that the set of instructions can affect both
performance and cost.
see Memory
The address of an instruction or a variable in memory indicates
the offset from the beginning of memory to the first byte of that
instruction or variable.
Having a single memory for both data and instructions is often
referred to as a von Neumann machine.
We will discuss this more later.
When we discuss the compilation process in more detail, we will
see how instructions and data are laid out in memory.
We will also see later how addresses of instructions and data can
be formulated in machine instructions.
see Registers
Access to registers is much faster than main memory since registers
are much closer to the portion of the processor that manipulates
the data.
But we will see there are only a limited number due to how
instructions are encoded.
see Instruction Set Design Principles
-
Principle 1: Simplicity favors regularity.
We will see that this simplifies both the hardware (decoding of instructions,
pipelining) and the compiler (fewer special cases).
-
Principle 2: Smaller is faster.
The operands of arithmetic instructions must be in registers
(or perhaps one constant).
Registers on the MIPS contain 32 bits and there are 32 registers.
Using a much larger number of registers would increase the cycle
time.
(It would also be difficult to encode in an instruction set.)
In general, less distance means less time (propagation delay).
-
Principle 3: Good design demands good compromises.
Having all instructions the same length and a reasonable size
means having to construct addresses and some constants in two
instructions.
-
Principle 4: Make the common case fast.
Simpler instructions are more common than the need for complex operations.
So make simple instructions fast and accomplish complex operations as a
series of simple instructions.
see Who Programs in Assembly Language?
Much fewer people program in assembly as compared to those that
program in high-level languages.
see Why Learn Assembly Language?
If it is unlikely that you will program in assembly language, then
why learn it?
To understand instruction sets, one must understand the assembly
language that represents the instructions.
It is useful to understand how high-level language constructs
are actually implemented since it means you will be more likely
to use them efficiently and correctly.
see MIPS Introduction
MIPS actually used to stand for Machine without Interlocked Pipeline
Stages. The later MIPS had interlocked pipeline stages. You will
learn what that means later.
Hennessy led the effort to create a RISC machine (evolved into the MIPS)
at Stanford.
Patterson led the effort to create a RISC machine (evolved into the SPARC)
at Berkeley.
The development of these machines had a tremendous impact on the field
of computer architecture.
see General Classes of MIPS Assembly Instructions
There are 4 general classes.
Sometimes arithmetic and logical instructions are grouped together.
see MIPS Instruction Operands
-
We will see later why the size of constants are limited in instructions
and how larger constants are constructed.
-
We will see that $0 is hardwired to zero to reduce the number of instructions
needed.
$0-$31 contain integer values.
$f0-$f31 contain floating-point values.
Having only 32 integer registers and 32 floating-point registers supports
the design principle of smaller is faster.
A large number of registers would increase the clock cycle time since
the signals would have to travel farther.
-
The amount of memory is limited by the size of a word that can contain
an address.
Note that later we will see how memory is actually limited.
see General Form of a MIPS Assembly Language Program
Unlike most high-level languages, assembly languages only allow
one declaration and one instruction per line.
I am leaving out some details here, but this is enough for writing
simple programs in MIPS assembly.
see General Form of a MIPS Instruction
Each instruction has to be on a separate line.
Labels and comments are optional.
see General Forms of a MIPS Arithmetic or Logical Instruction
There are two general forms, one with all register operands and one
with a constant as one of the sources.
Note that we use the same form for a shift instruction when the
shift amount is a constant or a register.
Note that the destination register is the first register specified.
In fact, the order of the operands corresponds to the order you
would see in an assignment statement.
Requiring every arithmetic/logical instruction to have 3 operands
keeps the hardware simple and supports the principle simplicity
favors regularity.
Note that the u
in
addu
and subu
indicate that the add and subtract operations will not cause exceptions if
the operation overflows (magnitude of the value is too large).
see General Form of a MIPS Data Transfer Instruction
Note that lw
and
sw
are transfering 4 bytes, while
lb
and sb
are
transfering only 1 byte.
The address is simply the sum of the value of the register plus
the offset.
We will discuss the range of values that the offset can take
later.
Note that in the sw
and
sb
instructions the
destination is not the leftmost operand.
see Example of Using Arithmetic/Logical and Data Transfer Instructions
Note that some registers are used to hold values loaded from memory.
Registers can also be used to hold temporary results.
Note the reuse of $6
to hold the temporary
result since the value loaded in $6
isn't
used any more.
The lw
instruction is used instead of
lb
since the four
variables are integers and not characters.
see Another Example Using Arithmetic/Logical and Data Transfer Instructions
a[2]
is located at an offset of 8 from the
beginning of a
since each element is 4 bytes
and the first element is a[0]
.
The sll
instruction is used to determine the
i
th offset in
the array a
.
Note that a[i]
is not accessed directly in
one instruction.
The address must be calculated first.
Note << 2 == * 4.
We will discuss this more later.
Compilers use shifts instead of multiplies when possible since shifts can
be performed faster.
Note arrays and structs are rarely allocated to registers, while scalars
are often associated with registers.
see Memory Access Issues
The x's in the figure means 1's or 0's and the 0's mean that the
specified bit must be clear.
For instance, a word has to be aligned on a 4 byte boundary and
must thus be an integer multiple of 4.
The reason for alignment requirements is that it results in simpler
hardware and a slightly faster cycle time when these special checks
do not have to be handled.
The computer relies on the compiler to ensure that memory references are
properly aligned.
see Memory Access Issues (cont.)
Big and Little Endian got its name from Gulliver Travels about the big and
little end of an egg.
It really doesn't matter in terms of performance.
Byte ordering is only an issue when transferring data between machines.
see General Form of MIPS Transfer of Control Instructions
slt
and slti
are
used to make comparions.
They set a register to 0 or 1 and these can be tested by the
beq
and bne
instructions.
The example bne
instruction compares
$5
to $0
.
Remember that $0
is hardwired to always have
the value of zero.
The j
instruction is an unconditional
transfer of control to a specified label.
It is used to implement goto
s and other
control constructs.
The jr
instruction is similar, but the target
address is in a register.
The jr
instruction is called an indirect jump
and is often used to implement return and switch statements.
The jal
instruction is called a jump and link
instruction, which is used to implement a function call.
It sets the $31
register to the current
address + 4 and makes an unconditional transfer of control to the specified
address.
$31
contains the return address or the
address of the instruction after the jal
instruction.
So now you can see that the jr $31
instruction
is used to implement returns.
see Translating an If-Then-Else Statement into MIPS Assembly Instructions
The slt
instruction checks if
i
< j
and the
beq
comparison with
$0
checks if that condition is not true, which
means the branch jumps when
i
>= j
.
Remember that $0
is hardwired to always have
the value zero.
Note the use of the j
unconditional jump to
jump over the else portion.
With just the slt
,
beq
, and bne
,
one can construct all six different relational operations
(==, !=, <, <=, >=, >).
see Translating a For Statement into MIPS Assembly Instructions
Note two different ways of setting a register to zero.
Calculating the address of a[i]
takes two instructions in this
example.
When $5
is >= 100, the branch will not jump.
see Instruction Formats
Note each instruction format is 32 bits (the size of a word).
Having all instructions the same length simplifies knowing where to
fetch the next instruction.
Having all instructions use the same format simplifies decoding the
instructions.
Having only 3 different instruction formats with all formats being
the same length supports the design principle: Good design demands
good compromises.
With 6 bits for the opcode, we can have 2^6 or 64 different
instructions.
The funct field allows different opcodes when a R-format
instruction is used, which always has a regular opcode of 0.
Each register is encoded in 5 bits so we can reference registers
$0
-$31
.
The address/immediate field allows values to range from
-2^15..2^15-1.
see Figure 3.37: The MIPS instruction set covered so far.
This table shows the different formats used for each type of instruction.
see Examples of Encoding Instructions
This slide shows three examples of instruction encoding, one for each
instruction format.
The funct field distinguishes different types of R-format
instructions.
The shamt field is only used for shift instructions.
Registers (rs, rt, rd) are encoded by their
number.
The address/immediate and target address fields are
encoded as binary values directly.
The jump address is stored as a word address instead of a byte address
since instructions are aligned on a word boundary.
Note 26 bits in the target address allows at most 2**26 total instructions
(over 67 million) in a program, which is typically enough.
see Encoding the Branch Displacement
A branch displacement is used since branch targets are within a function
and are typically pretty close.
We will see why we use a displacement from the instruction after the
branch instead of the branch instruction itself when we cover Chapter 5.
If the target was the next instruction (the add), then the branch offset
would be 0.
Note that the branch offset is in words, not bytes, since all instructions
are 4 bytes in length.
see lui Instruction
Global addresses require 32 bits on the MIPS.
We can't address them directly in a single instruction, so we
have to construct them using more than one instruction.
We are limited to constants containing 16 bits.
So large constants have to be constructed as well.
Note that we cannot left shift a constant directly.
Having the lui
instruction saves one
instruction (loading a constant into a register to be left shifted).
Why use ori
instead of
addi
?
The ori
is a logical operation, so the
constant won't be sign-extended.
see Figure 3.17: Illustration of the five MIPS addressing modes
-
Immediate addressing: Just means that the value is within the instruction
itself. It is used for arithmetic operations.
-
Register addressing: Just means that the value is in the specified register.
-
Base addressing: This is called base displacement.
It means that the value of a register is added to a sign-extended
displacement.
It is used for load and store instructions.
-
PC-relative addressing: This is used for branches.
The address is just a displacement that can be positive or negative.
If negative, then it is a backward branch.
The displacement is in instructions (words), not bytes, to allow a greater
branching distance.
The displacement starts from PC+4 since PC has already been incremented by
that point.
-
Pseudodirect addressing: This is for calls and jumps.
The ":" means concatenation.
It just means that the high portion of the PC is unchanged.
The most significant 4 bits of the PC are unchanged.
The least significant 2 bits of the PC are zero since instructions are word
aligned.
That leaves the direct address as 26 bits.
see Figure 3.7: The stored-program concept
So we now know that instructions can be viewed as data.
This data can be loaded into memory and the computer can be
instructed to start executing the instructions at a specified location
in memory.
They can be read in and stored in memory, which simplifies the memory
hardware since it can be used for both data and code.
Later, these instructions can be executed.
This is also known as a von Neumann machine.
Note that the main point is that both machine code to be executed and
data for multiple programs can be simultaneously stored in memory and
this allows the computer to easily switch between applications.
see Figure 3.21: A translation hierarchy.
Note that there is a step missing called the C Preprocessor.
Actually when you compile, you are running more than just the compiler.
Sometimes a compiler will produce object code directly.
An assembler translates the symbolic assembly instructions to machine
instructions.
The linker allows separate compilation.
It resolves external addresses and sets up the executable.
see Figure 3.22: The MIPS memory allocation for program and data
Actually, this is the layout of the virtual memory for a process.
Instructions go in the text area.
Global and static variables go in the static data area.
Local variables typically go on the stack.
Dynamically allocated space goes on the heap (dynamic data) area.
Only the stack and the heap change in size, though there is a maximum
for each.
see Figure 3.12: Illustration of the stack allocation (a) before,
(b) during, and (c) after the procedure call.
The stack grows from high addresses to low addresses.
The area for allocating all of this information for a procedure is
called an activation record.
The activation record is allocated (registers adjusted) when the procedure
is entered and deallocated (registers adjusted) when the procedure is
exited.
Some registers have to be saved on the stack on entry to the procedure
and restored from the stack on exit from the procedure in case the caller
needs those registers to retained across the function call.
Local variables not allocated to registers are placed on the stack.
The run-time stack is used for activation records in C since the language
supports recursion (multiple instantiations of the same function).
Use of the stack reduces the space accessed and is why you can't retain the
values of local (nonstatic) variables between invocations in C.
see Figure 3.13: MIPS register convention.
Each machine has a calling sequence convention.
The reason is due to separate compilation.
For instance, the run-time library routines abide by certain conventions,
such as which arguments are passed in which registers or which registers
will be preserved across calls.
The compiler or assembly writer must know these conventions so the calling
conventions won't be violated.
Typically all compiled routines abide by the same conventions.
see Fallacies and Pitfalls
-
Fallacy: More powerful instructions mean higher performance.
This was the whole start of the CISC/RISC controversy.
CISC stands for Complex Instruction Set Computer.
RISC stands for Reduced Instruction Set Computer.
CISC lost.
There are lots of reasons, which include better pipelining and caches.
We will discuss this in more detail in Chapters 6 and 7.
-
Fallacy: Write in assembly language to obtain the highest
performance.
Optimizing compilers can outperform assembly writers now.
They are also a lot quicker.
-
Pitfall: Forgetting that sequential word addresses in machines with
byte addressing do not differ by one.
Almost all machines are byte addressable to support efficient character
processing.
It is confusing in C or C++ with pointer arithmetic.
-
Pitfall: Using a pointer to an automatic variable outside its defining
procedure.
I have done this.
If you return the address of a variable from a function, then
that variable better be static.