Chapter 6: Enhancing Performance with Pipelining
1: Concepts Introduced in Chapter 6
2: Figure 6.1: The laundry analogy for pipelining.
3: Pipelining
4: Speedup from Pipelining
5: Figure 6.2: Total time for eight instructions calculated from the time for each component.
6: Figure 6.3: Single-cycle, nonpipelined execution in top vs. pipelined execution in bottom.
7: MIPS Designed for Pipelining
8: Pipeline Terms
9: Structural Hazards
10: Control Hazards
11: Figure 6.4: Pipeline showing stalling on every conditional branch as solution to control hazards.
12: Figure 6.5: Predicting that branches are not taken as a solution to the control hazard.
13: Branch Prediction by the Compiler
14: Figure 6.6: Pipeline delayed branch as solution to control hazard.
15: Data Hazards
16: Figure 6.8: Graphical representation of forwarding.
17: Figure 6.9: We need a stall even with forwarding when an R-format instruction following a load tries to use the data.
18: Figure 6.9 Shown in a Traditional Pipeline Diagram
19: An Example Pipeline Diagram
20: Instruction Scheduling
21: Figure 6.10: The single-cycle datapath from Chapter 5 (similar to Figure 5.17 on page 358).
22: Figure 6.11: Instructions being executed in the single-cycle datapath in Figure 6.10, assuming pipelined execution.
23: Figure 6.12: The pipelined version of the datapath in Figure 6.10.
24: Figure 6.13: IF and ID: first and second pipe stages of an instruction, with the active portions of the datapath in Figure 6.12 highlighted.
25: Figure 6.14: EX: the third pipe stage of a load instruction, highlighting the portions of the datapath in Figure 6.12 used in this pipe stage.
26: Figure 6.15: MEM and WB: the fourth and fifth pipe stages of a load instruction, highlighting the portions of the datapath in Figure 6.12 used in this pipe stage.
27: Figure 6.16: EX: the third pipe stage of a store instruction.
28: Figure 6.17: MEM and WB: the fourth and fifth pipe stages of a store instruction.
29: Figure 6.18: The corrected pipelined datapath to properly handle the load instruction.
30: Figure 6.19: The portion of the datapath in Figure 6.18 that is used in all five stages of a load instruction.
31: Figure 6.20: Multiple-clock-cycle pipeline diagram of two instructions.
32: Figure 6.21: Traditional multiple-clock-cycle pipeline diagram of two instructions in Figure 6.20.
33: Figure 6.22: Single-cycle pipeline diagrams for clock cycles 1 (top diagram) and 2 (bottom diagram).
34: Figure 6.23: Single-cycle pipeline diagrams for clock cycles 3 (top diagram) and 4 (bottom diagram).
35: Figure 6.24: Single-cycle pipeline diagrams for clock cycles 5 (top diagram) and 6 (bottom diagram).
36: Figure 6.28: The values of the control lines are the same as in Figure 5.20 on page 361, but they have been shuffled into three groups corresponding to the last three pipeline stages.
37: Figure 6.29: The control lines for the final three stages.
38: Figure 6.30: The pipelined datapath of Figure 6.19, with the control signals connected to the control portions of the pipelined register.
39: Figure 6.31: Clock cycles 1 and 2.
40: Figure 6.32: Clock cycles 3 and 4.
41: Figure 6.33: Clock cycles 5 and 6.
42: Figure 6.34: Clock cycles 7 and 8.
43: Figure 6.35: Clock cycle 9.
44: Figure 6.36: Pipelined dependencies in a five-instruction sequence using simplified datapaths to show the dependencies.
45: Figure 6.37: The dependencies between the pipeline registers move forward in time, so ...
46: Figure 6.38: On the top are the ALU and pipeline registers before adding forwarding. On the bottom, the multiplexors have been expanded to add the forwarding paths, and we show the forwarding unit.
47: Figure 6.39: The control values for the forwarding multiplexors in Figure 6.38.
48: Forwarding Logic
49: Figure 6.40: The datapath modified to resolve hazards via forwarding.
50: Figure 6.41: Clock cycles 3 and 4 of the instruction sequence in the example on page 485.
51: Figure 6.42: Clock cycles 5 and 6 of the instruction sequence in the example on page 485.
52: Figure 6.43: A close-up of the datapath in Figure 6.38 on page 482 shows a 2:1 multiplexor, which has been added to select the signed immediate as an ALU input.
53: Figure 6.44: A pipelined sequence of instructions.
54: Hazard Logic
55: Figure 6.45: The way stalls are really inserted into the pipeline.
56: Figure 6.46: Pipelined control overview, showing the two multiplexors for forwarding, the hazard detection unit, and the forwarding unit.
57: Figure 6.47: Clock cycles 2 and 3 of the instruction sequence in the example on page 485 with a load replacing sub.
58: Figure 6.48: Clock cycles 4 and 5 of the instruction sequence in the example on page 485 with a load replacing sub.
59: Figure 6.49: Clock cycles 6 and 7 of the instruction sequence in the example on page 485 with a load replacing sub.
60: Figure 6.50: The impact of the pipeline on the branch instruction.
61: Figure 6.51: Datapath for branch, including hardware to flush the instruction that follows the branch.
62: Figure 6.52: The ID stage of clock cycle 3 determines that a branch must be taken, so it selects 72 as the next PC address and zeros the instruction fetched for the next clock cycle.
63: Branch Preceded by a Comparison
64: 1-bit Branch Prediction Buffer
65: 1-bit Branch Prediction Buffer Example
66: Figure 6.53: The states in a 2-bit prediction scheme.
67: Figure 6.54: Scheduling the branch delay slot.
68: Handling Exceptions
69: Figure 6.55: The datapath with controls to handle exceptions.
70: Figure 6.56: The result of an exception due to arithmetic overflow in the add instruction.
71: Precise Exceptions
72: Faster Scalar Processors
73: Superpipeline Example
74: Figure 6.57: Superscalar pipeline in operation.
75: Figure 6.58: A superscalar datapath.
76: The Scheduled Code As It Would Look on a Superscalar MIPS
77: Loop Unrolling
78: Example on Page 513 after Loop Unrolling
79: Example on Page 513 after Adjusting Offsets
80: Example on Page 513 after Renaming Registers
81: Example on Page 513 after Avoiding Load Hazards
82: The Unrolled and Scheduled Code of Figure 6.59 As It Would Look on a Superscalar MIPS
83: The Three Primary Units of a Dynamically Scheduled Pipeline
84: Figure 6.65: The final datapath and control for this chapter.
85: Fallacies and Pitfalls
86: Figure 6.64: The depth of pipelining versus the speedup obtained.
87: Figure 6.66: The performance consequences of simple (single-cycle) datapath and multicycle datapath from Chapter 5 and the pipelined execution model in Chapter 6.
88: Figure 6.67: The basic relationship between the datapaths in Figure 6.66.