Table of Contents for Chapter 6: Enhancing Performance with Pipelining

Chapter 6: Enhancing Performance with Pipelining

1: Concepts Introduced in Chapter 6

2: Figure 6.1: The laundry analogy for pipelining.

3: Pipelining

4: Speedup from Pipelining

5: Figure 6.2: Total time for eight instructions calculated from the time for each component.

6: Figure 6.3: Single-cycle, nonpipelined execution at top vs. pipelined execution at bottom.

7: MIPS Designed for Pipelining

8: Pipeline Terms

9: Structural Hazards

10: Control Hazards

11: Figure 6.4: Pipeline showing stalling on every conditional branch as a solution to control hazards.

12: Figure 6.5: Predicting that branches are not taken as a solution to the control hazard.

13: Branch Prediction by the Compiler

14: Figure 6.6: Pipeline delayed branch as a solution to the control hazard.

15: Data Hazards

16: Figure 6.8: Graphical representation of forwarding.

17: Figure 6.9: We need a stall even with forwarding when an R-format instruction following a load tries to use the data.

18: Figure 6.9 Shown in a Traditional Pipeline Diagram

19: An Example Pipeline Diagram

20: Instruction Scheduling

21: Figure 6.10: The single-cycle datapath from Chapter 5 (similar to Figure 5.17 on page 358).

22: Figure 6.11: Instructions being executed in the single-cycle datapath in Figure 6.10, assuming pipelined execution.

23: Figure 6.12: The pipelined version of the datapath in Figure 6.10.

24: Figure 6.13: IF and ID: first and second pipe stages of an instruction, with the active portions of the datapath in Figure 6.12 highlighted.

25: Figure 6.14: EX: the third pipe stage of a load instruction, highlighting the portions of the datapath in Figure 6.12 used in this pipe stage.

26: Figure 6.15: MEM and WB: the fourth and fifth pipe stages of a load instruction, highlighting the portions of the datapath in Figure 6.12 used in these pipe stages.

27: Figure 6.16: EX: the third pipe stage of a store instruction.

28: Figure 6.17: MEM and WB: the fourth and fifth pipe stages of a store instruction.

29: Figure 6.18: The corrected pipelined datapath to properly handle the load instruction.

30: Figure 6.19: The portion of the datapath in Figure 6.18 that is used in all five stages of a load instruction.

31: Figure 6.20: Multiple-clock-cycle pipeline diagram of two instructions.

32: Figure 6.21: Traditional multiple-clock-cycle pipeline diagram of two instructions in Figure 6.20.

33: Figure 6.22: Single-cycle pipeline diagrams for clock cycles 1 (top diagram) and 2 (bottom diagram).

34: Figure 6.23: Single-cycle pipeline diagrams for clock cycles 3 (top diagram) and 4 (bottom diagram).

35: Figure 6.24: Single-cycle pipeline diagrams for clock cycles 5 (top diagram) and 6 (bottom diagram).

36: Figure 6.28: The values of the control lines are the same as in Figure 5.20 on page 361, but they have been shuffled into three groups corresponding to the last three pipeline stages.

37: Figure 6.29: The control lines for the final three stages.

38: Figure 6.30: The pipelined datapath of Figure 6.19, with the control signals connected to the control portions of the pipeline registers.

39: Figure 6.31: Clock cycles 1 and 2.

40: Figure 6.32: Clock cycles 3 and 4.

41: Figure 6.33: Clock cycles 5 and 6.

42: Figure 6.34: Clock cycles 7 and 8.

43: Figure 6.35: Clock cycle 9.

44: Figure 6.36: Pipelined dependencies in a five-instruction sequence using simplified datapaths to show the dependencies.

45: Figure 6.37: The dependencies between the pipeline registers move forward in time, so ...

46: Figure 6.38: On the top are the ALU and pipeline registers before adding forwarding. On the bottom, the multiplexors have been expanded to add the forwarding paths, and we show the forwarding unit.

47: Figure 6.39: The control values for the forwarding multiplexors in Figure 6.38.

48: Forwarding Logic

49: Figure 6.40: The datapath modified to resolve hazards via forwarding.

50: Figure 6.41: Clock cycles 3 and 4 of the instruction sequence in the example on page 485.

51: Figure 6.42: Clock cycles 5 and 6 of the instruction sequence in the example on page 485.

52: Figure 6.43: A close-up of the datapath in Figure 6.38 on page 482 shows a 2:1 multiplexor, which has been added to select the signed immediate as an ALU input.

53: Figure 6.44: A pipelined sequence of instructions.

54: Hazard Logic

55: Figure 6.45: The way stalls are really inserted into the pipeline.

56: Figure 6.46: Pipelined control overview, showing the two multiplexors for forwarding, the hazard detection unit, and the forwarding unit.

57: Figure 6.47: Clock cycles 2 and 3 of the instruction sequence in the example on page 485 with a load replacing sub.

58: Figure 6.48: Clock cycles 4 and 5 of the instruction sequence in the example on page 485 with a load replacing sub.

59: Figure 6.49: Clock cycles 6 and 7 of the instruction sequence in the example on page 485 with a load replacing sub.

60: Figure 6.50: The impact of the pipeline on the branch instruction.

61: Figure 6.51: Datapath for branch, including hardware to flush the instruction that follows the branch.

62: Figure 6.52: The ID stage of clock cycle 3 determines that a branch must be taken, so it selects 72 as the next PC address and zeros the instruction fetched for the next clock cycle.

63: Branch Preceded by a Comparison

64: 1-bit Branch Prediction Buffer

65: 1-bit Branch Prediction Buffer Example

66: Figure 6.53: The states in a 2-bit prediction scheme.

67: Figure 6.54: Scheduling the branch delay slot.

68: Handling Exceptions

69: Figure 6.55: The datapath with controls to handle exceptions.

70: Figure 6.56: The result of an exception due to arithmetic overflow in the add instruction.

71: Precise Exceptions

72: Faster Scalar Processors

73: Superpipeline Example

74: Figure 6.57: Superscalar pipeline in operation.

75: Figure 6.58: A superscalar datapath.

76: The Scheduled Code As It Would Look on a Superscalar MIPS

77: Loop Unrolling

78: Example on Page 513 after Loop Unrolling

79: Example on Page 513 after Adjusting Offsets

80: Example on Page 513 after Renaming Registers

81: Example on Page 513 after Avoiding Load Hazards

82: The Unrolled and Scheduled Code of Figure 6.59 As It Would Look on a Superscalar MIPS.

83: The Three Primary Units of a Dynamically Scheduled Pipeline.

84: Figure 6.65: The final datapath and control for this chapter.

85: Fallacies and Pitfalls

86: Figure 6.64: The depth of pipelining versus the speedup obtained.

87: Figure 6.66: The performance consequences of the simple (single-cycle) datapath and the multicycle datapath from Chapter 5, and of the pipelined execution model in Chapter 6.

88: Figure 6.67: The basic relationship between the datapaths in Figure 6.66.
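The conditions behind the "Forwarding Logic" and "Hazard Logic" entries above (and the load-use stall of Figure 6.9) can be sketched as below. This is a minimal illustration, not the chapter's hardware description: the names `ExMem`, `MemWb`, `forward_select`, and `load_use_stall` are hypothetical, and the mux encoding ("00" = register file, "10" = EX/MEM result, "01" = MEM/WB result) is assumed to follow the forwarding-mux control values of Figure 6.39.

```python
from dataclasses import dataclass


# Hypothetical views of the pipeline-register fields the forwarding unit reads.
@dataclass
class ExMem:
    reg_write: bool  # EX/MEM.RegWrite
    rd: int          # EX/MEM.RegisterRd


@dataclass
class MemWb:
    reg_write: bool  # MEM/WB.RegWrite
    rd: int          # MEM/WB.RegisterRd


def forward_select(src: int, ex_mem: ExMem, mem_wb: MemWb) -> str:
    """Mux control for one ALU operand whose source register is `src`
    (ID/EX.RegisterRs for ForwardA, ID/EX.RegisterRt for ForwardB)."""
    # EX hazard: the previous instruction writes `src`; its ALU result is in EX/MEM.
    # Register $0 is never forwarded because it always reads as zero.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "10"
    # MEM hazard: the instruction two ahead writes `src`. Checking it only after
    # the EX hazard ensures the most recent result wins.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "01"
    # No hazard: use the operand read from the register file.
    return "00"


def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    """Stall when a load in EX targets a register read by the instruction in ID;
    forwarding alone cannot deliver the loaded value in time."""
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


# sub $2,$1,$3 followed by and $12,$2,$5: sub's result is in EX/MEM.
print(forward_select(2, ExMem(True, 2), MemWb(False, 0)))
# or $13,$6,$2 two instructions after sub: sub's result is in MEM/WB.
print(forward_select(2, ExMem(True, 12), MemWb(True, 2)))
# lw $2,20($1) followed by and $4,$2,$5 must stall one cycle.
print(load_use_stall(True, 2, 2, 5))
```

Checking the EX hazard first mirrors the priority rule that the newest value of a register must be forwarded when both pipeline registers hold a pending write to it.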