Pipelining

Consider this sequence of MIPS instructions

ADD   $s1, $s1, $s2
SUBI  $s3, $s1, #1
BNEZ  $s3, Offset

Timing

Using the standard 5-stage MIPS pipeline, show the pipeline timing for these 3 instructions. Draw arrows between any stages where forwarding occurs. Circle the stage when the branch outcome is known.

Instructions ↓Cycles →
ADD $s1, $s1, $s2 IFIDEXDMWB
SUBI $s3, $s1, #1
BNEZ $s3, Offset

Pipeline changes

Draw the a dataflow diagram for this architecture, with the changes necessary to support a new "branch on not equal" that compares with an immediate value:

ADD   $s1, $s1, $s2
BNEQI $s1, #1, Offset

Do not make any changes that would increase the clock cycle time. Remember that the register fetch happens in the second half of the clock cycle, so there is not time to do an arithmetic operation with a register in the same stage as the register fetch.

New timing

Show the pipeline timing using your new pipeline. Draw arrows between any stages where forwarding occurs. Circle the stage when the branch outcome is known

Instructions ↓Cycles →
ADD $s1, $s1, $s2 IFIDEXDMWB
BNEQI $s1, #1, Offset

Encoding

This new instruction requires two immediate values (the immediate used for comparison, and the branch offset). Assume we support this by introducing a new MIPS instruction encoding format as shown below. The opcode takes 6 bits, the register takes 5 bits, leaving 21 to split between the two immediates. How would you decide how many bits to allow for each?

R-Type opcode(6) Rs(5) Rt(5) Rd(5) shift(5) func(6)
I-Type opcode(6) Rs(5) Rt(5) immediate(16)
J-Type opcode(6) address(26)
New opcode(6) Rs(5) immediate(??) offset(??)

Branch Prediction

Tired of dealing with all of the issues predicting conditional jumps, you decide to make JR, the unconditional jump to an address in a register, the only means for performing conditional flow instruction in your new MIPS-like ISA. You figure that JR, in combination with a conditional move instruction MOVZ, can do anything the other types of branches can. "MOVZ R1, R2, R3" is equivalent to this C code, "if (R3 == 0) R1 = R2", but as a pure ALU operation without branching or control hazards.

Code Conversion

Give an instruction sequence to accomplish the following, but replacing BNEZ with some combination of MOVZ, JR and possibly other operations, assuming additional registers are available for your use:

          BNEZ  R1, Loop
NonLoop:  ...

CPI using BNEZ

You measure the standard MIPS pipeline (diagram above) for an application typical of your expected users. From that you get the following statistics: 15% of total instructions are branches, 65% of all branches are taken. What is the expected CPI using BNEZ and a predict not taken strategy?

MOVZ/JR compared to BNEZ

Accounting for the change in instruction count and the fact that branches are always taken on your new architecture, what is the expected speedup (or slowdown) of the architecture with MOVZ/JR as compared to the architecture with BNEZ?

Discussion

Spoiler alert: the answer to the previous question gives a slowdown for using MOVZ/JR. Why?