The MIPS R4000 had an eight-stage pipeline as shown here in abbreviated form:

### Pipeline Speedup (10 points)

Pipelined execution without hazards is shown in this timing diagram:

```
                    Cycle ->
Instruction         0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
0: add  $1, $2, $3  IF IS RF EX DF DS TC WB
1: add  $4, $5, $6     IF IS RF EX DF DS TC WB
2: add  $7, $8, $9        IF IS RF EX DF DS TC WB
3: add $10,$11,$12           IF IS RF EX DF DS TC WB
4: add $13,$14,$15              IF IS RF EX DF DS TC WB
5: add $16,$17,$18                 IF IS RF EX DF DS TC WB
6: add $19,$20,$21                    IF IS RF EX DF DS TC WB
7: add $22,$23,$24                       IF IS RF EX DF DS TC WB
8: add $25,$26,$27                          IF IS RF EX DF DS TC WB
```
1. What is the ideal pipeline speedup for this processor?
2. What are three assumptions of the ideal pipeline speedup?
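Reading the timing diagram as a cycle count makes the speedup concrete. The sketch below is a minimal illustration; the function names are hypothetical, and it assumes every stage takes exactly one clock cycle, that an unpipelined machine would spend the same eight cycles on each instruction, and that nothing ever stalls.

```python
def pipelined_cycles(n: int, k: int) -> int:
    """Cycles for n instructions on an ideal k-stage pipeline: k cycles for
    the first instruction to drain through, then one more per instruction."""
    return k + n - 1


def unpipelined_cycles(n: int, k: int) -> int:
    """Cycles if each instruction runs all k stages before the next starts."""
    return n * k


def speedup(n: int, k: int) -> float:
    """Ratio of unpipelined to pipelined execution time (same clock assumed)."""
    return unpipelined_cycles(n, k) / pipelined_cycles(n, k)


if __name__ == "__main__":
    print(pipelined_cycles(9, 8))  # the diagram's 9 instructions use cycles 0-15
    print(speedup(9, 8))
    for n in (100, 10_000, 1_000_000):
        print(n, round(speedup(n, 8), 4))
```

For the nine instructions in the diagram the ratio is well below 8, because the 7 cycles spent filling the pipeline are a large share of the 16 total; the ratio only approaches the stage count as the number of instructions grows.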

### Data Hazards (45 points)

For each data hazard that can happen between R-type instructions with this MIPS R4000 organization:

1. Show a sequence of MIPS instructions that has that hazard and no others.
2. Explain what forwarding would be necessary to avoid it (e.g. in cycle 2, forward from the ALU result pipeline register between EX and DF to the ALU input in EX).
3. Show the necessary forwarding path(s) for just this hazard on the abbreviated datapath.

Finally, draw a version of the abbreviated datapath with all of the R-type instruction forwarding paths.
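One way to check your hazard list is to compute, for a producer/consumer pair a given number of instructions apart, which pipeline register (if any) would hold the needed value when the consumer reaches EX. The sketch below uses a hypothetical helper `forwarding_source` and rests on assumptions that are not stated in the assignment: an R-type result is produced at the end of EX and consumed at the start of EX, and the register file is written in the first half of WB and read in the second half of RF. It only covers a single producer and a single source operand; cases where several earlier instructions write the same register (the nearest one must win) are left out.

```python
from typing import Optional

# Stage order of the MIPS R4000 pipeline, as in the timing diagram above.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]
EX = STAGES.index("EX")  # assumed: result produced at end of EX, needed at start of EX
RF = STAGES.index("RF")  # register file read
WB = STAGES.index("WB")  # register file write


def forwarding_source(distance: int) -> Optional[str]:
    """Return the pipeline register a consumer `distance` instructions after
    its producer must forward from (into the ALU input in EX), or None if the
    register file already supplies the value.

    Assumes a write in the first half of WB and a read in the second half of
    RF, so a same-cycle write/read needs no forwarding.
    """
    if distance < 1:
        raise ValueError("the consumer must follow the producer")
    # Cycles are counted from the producer's IF (cycle 0).
    if distance + RF >= WB:
        return None  # the value is in the register file when RF reads it
    # In the consumer's EX cycle the producer occupies stage c, and its result
    # sits in the pipeline register between stages c-1 and c.
    c = distance + EX
    return f"{STAGES[c - 1]}/{STAGES[c]}"


if __name__ == "__main__":
    for d in range(1, 6):
        print(d, forwarding_source(d))
```

Each non-None distance corresponds to one candidate hazard; if your reading of the abbreviated datapath differs (for example, no split-cycle register file), adjust the `distance + RF >= WB` test accordingly.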

### Control Hazards (45 points)

Consider this sequence of instructions, implementing y = |x|+1:

```
if (x<=0)
    x = -x;
y = x + 1;
```

If x is in register \$1 and y in register \$2, this could correspond to the following assembly code:

```
        bgtz $1, endif
        sub  $1, $0, $1
endif:  addi $2, $1, 1
```

Assume the branch target can only be resolved at the end of the EX stage, and the processor always predicts not taken.

1. Show a pipeline timing diagram for the MIPS R4000 when \$1 is 5.
2. Show a second pipeline timing diagram for when \$1 is -5.
3. What is the branch penalty?
4. If 20% of the instructions in a program are branches, and 40% of the branches are taken, what is the expected average CPI?
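The penalty and CPI arithmetic can be written out as a short sketch. The function names are hypothetical; it assumes the branch is fetched in cycle 0, the correct target can be fetched in the cycle right after the branch resolves, there is no branch delay slot, and the base CPI without branches is 1. With predict-not-taken, only taken branches pay the penalty.

```python
# Stage order of the MIPS R4000 pipeline, as in the timing diagram above.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def branch_penalty(resolve_stage: str = "EX") -> int:
    """Cycles lost on a mispredicted branch resolved at the end of `resolve_stage`.

    The branch resolves at the end of cycle STAGES.index(resolve_stage), so every
    instruction fetched after the branch and before that point is squashed
    (assumes no delay slot).
    """
    return STAGES.index(resolve_stage)


def average_cpi(branch_frac: float, taken_frac: float, penalty: int,
                base_cpi: float = 1.0) -> float:
    """Average CPI under predict-not-taken: only taken branches stall."""
    return base_cpi + branch_frac * taken_frac * penalty


if __name__ == "__main__":
    p = branch_penalty("EX")
    print(f"penalty = {p} cycles, CPI = {average_cpi(0.20, 0.40, p):.2f}")
```

Check the penalty this reports against the number of squashed instructions in your diagram for \$1 = 5 before relying on it.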

### Extra Credit (25 points)

iAPX 86, 88 (Intel Advanced Processor Architecture 8086/8088) was the predecessor of the modern Intel x86 architectures. It is a two-operand machine (op dest, src), supporting source/destination operand combinations of register/register, register/memory, memory/register, immediate/register, and immediate/memory. Consider the following code segment and instruction set reference table. Assume the initial values of ARRAY[100] and ARRAY[200] are 128 and 2048, respectively.

```
       MOV   AX, ARRAY[100]
       MOV   CX, 4
       MUL   CX
       MOV   ARRAY[100], AX
AGAIN: MOV   AX, ARRAY[200]
       SUB   AX, 256
       MOV   ARRAY[200], AX
       MOV   CX, AX
       MOV   AX, ARRAY[100]
       SUB   CX, AX
       JCXZ  AGAIN
```
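| Instruction | Operands | Clock Cycles |
|---|---|---|
| MOV dest, src | reg, reg | 2 |
| | reg, imm | 4 |
| | reg, mem | 12 |
| | mem, reg | 13 |
| SUB dest, src | reg, reg | 3 |
| | reg, imm | 4 |
| | reg, mem | 13 |
| | mem, reg | 24 |
| | mem, imm | 25 |
| MUL src (AX is dest) | reg | 118 |
| JCXZ (jump if CX==0) | label | 18 |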

Compute the CPI and the expected execution time of this code segment on a 5 MHz 8086.
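The cycle count can be checked mechanically. The sketch below is a tiny, hypothetical Python model of just this code segment: it tracks register and memory values (so it can tell whether JCXZ branches back to AGAIN) and adds up the clock counts from the table. Assumptions not in the assignment: ARRAY[100] and ARRAY[200] are modeled as 16-bit words keyed by their offset, JCXZ costs the table's 18 cycles whether or not it jumps, and there are no extra wait states or instruction-fetch stalls.

```python
# Cycle counts from the instruction table above, keyed by (mnemonic, "dest,src" kinds).
CYCLES = {
    ("MOV", "reg,reg"): 2, ("MOV", "reg,imm"): 4,
    ("MOV", "reg,mem"): 12, ("MOV", "mem,reg"): 13,
    ("SUB", "reg,reg"): 3, ("SUB", "reg,imm"): 4, ("SUB", "reg,mem"): 13,
    ("SUB", "mem,reg"): 24, ("SUB", "mem,imm"): 25,
    ("MUL", "reg"): 118,
    ("JCXZ", "label"): 18,  # assumed to cost 18 whether or not the jump is taken
}

# Hypothetical encoding of the code segment: (label, mnemonic, kinds, operands).
# Memory operands are offsets into ARRAY, e.g. 100 stands for ARRAY[100].
PROGRAM = [
    (None, "MOV", "reg,mem", ("AX", 100)),
    (None, "MOV", "reg,imm", ("CX", 4)),
    (None, "MUL", "reg", ("CX",)),
    (None, "MOV", "mem,reg", (100, "AX")),
    ("AGAIN", "MOV", "reg,mem", ("AX", 200)),
    (None, "SUB", "reg,imm", ("AX", 256)),
    (None, "MOV", "mem,reg", (200, "AX")),
    (None, "MOV", "reg,reg", ("CX", "AX")),
    (None, "MOV", "reg,mem", ("AX", 100)),
    (None, "SUB", "reg,reg", ("CX", "AX")),
    (None, "JCXZ", "label", ("AGAIN",)),
]


def run(program, memory, clock_hz):
    """Execute `program`, returning (instructions, cycles, CPI, seconds)."""
    memory = dict(memory)  # don't modify the caller's initial values
    regs = {"AX": 0, "CX": 0, "DX": 0}
    labels = {ins[0]: i for i, ins in enumerate(program) if ins[0]}

    def read(kind, x):
        return regs[x] if kind == "reg" else memory[x] if kind == "mem" else x

    def write(kind, x, value):
        (regs if kind == "reg" else memory)[x] = value & 0xFFFF  # 16-bit words

    pc = count = cycles = 0
    while pc < len(program):
        _, op, kinds, args = program[pc]
        cycles += CYCLES[(op, kinds)]
        count += 1
        pc += 1
        if op in ("MOV", "SUB"):
            dst_kind, src_kind = kinds.split(",")
            dst, src = args
            value = read(src_kind, src)
            if op == "SUB":
                value = read(dst_kind, dst) - value
            write(dst_kind, dst, value)
        elif op == "MUL":
            # A 16-bit MUL on the 8086 leaves the 32-bit product in DX:AX.
            product = regs["AX"] * regs[args[0]]
            regs["AX"], regs["DX"] = product & 0xFFFF, product >> 16
        elif op == "JCXZ" and regs["CX"] == 0:
            pc = labels[args[0]]
    return count, cycles, cycles / count, cycles / clock_hz


if __name__ == "__main__":
    n, c, cpi, t = run(PROGRAM, {100: 128, 200: 2048}, 5_000_000)
    print(f"{n} instructions, {c} cycles, CPI = {cpi:.2f}, time = {t * 1e6:.1f} us")
```

Because JCXZ jumps only when CX is zero, whether AGAIN repeats depends on the value that SUB CX, AX leaves in CX; tracing that value by hand is a good cross-check on the model's instruction count.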

### Submitting

Follow the class git instructions to submit. You can do your work electronically (e.g., use LaTeX for written content and equations, and something like Inkscape or Adobe Illustrator for drawings), on paper that you scan or photograph, or on a tablet. Submit your work in the hw4 directory, then commit, tag, and push your final submission before the deadline. Be sure to edit the hw4/readme.txt file to tell us which files contain the answers to which problems (especially if there is more than one file) and what tools you used.