Consider this sequence of MIPS instructions:
ADD $s1, $s1, $s2
SUBI $s3, $s1, #1
BNEZ $s3, Offset
Using the standard 5-stage MIPS pipeline, show the pipeline timing for these 3 instructions. Draw arrows between any stages where forwarding occurs. Circle the stage when the branch outcome is known.
|Instructions ↓||Cycles →|
Draw a dataflow diagram for this architecture, with the changes necessary to support a new "branch on not equal" instruction that compares a register with an immediate value:
ADD $s1, $s1, $s2
BNEQI $s1, #1, Offset
Do not make any changes that would increase the clock cycle time. Remember that the register fetch happens in the second half of the clock cycle, so there is not enough time to do an arithmetic operation on a register in the same stage as the register fetch.
Show the pipeline timing using your new pipeline. Draw arrows between any stages where forwarding occurs. Circle the stage when the branch outcome is known.
|Instructions ↓||Cycles →|
This new instruction requires two immediate values (the immediate used for the comparison, and the branch offset). Assume we support this by introducing a new MIPS instruction encoding format as shown below. The opcode takes 6 bits and the register takes 5 bits, leaving 21 bits to split between the two immediates. How would you decide how many bits to allow for each?
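To make the trade-off concrete, the following minimal Python sketch tabulates the value ranges that a few candidate splits of the 21 bits would allow. It assumes, purely for illustration, that both fields are signed two's-complement values and that the branch offset is counted in instructions; the function names `signed_range` and `split_ranges` are hypothetical helpers, not part of any MIPS specification.

```python
def signed_range(bits):
    """Return (min, max) representable in a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def split_ranges(compare_bits, total_bits=21):
    """Value ranges for one hypothetical split of the immediate bits.

    Assumption: both fields are signed, and the offset counts instructions.
    """
    offset_bits = total_bits - compare_bits
    return {
        "compare_immediate": signed_range(compare_bits),
        "branch_offset": signed_range(offset_bits),
    }


# Compare a few candidate splits; every bit given to one field halves the
# reach of the other.
for compare_bits in (5, 8, 11):
    print(compare_bits, split_ranges(compare_bits))
```

A table like this only shows what each split can encode. Choosing among them still requires data on which comparison constants and branch distances actually occur in typical programs.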
Tired of dealing with all of the issues in predicting conditional jumps, you decide to make JR, the unconditional jump to an address in a register, the only means of performing conditional control flow in your new MIPS-like ISA. You figure that JR, in combination with a conditional move instruction, MOVZ, can do anything the other types of branches can. "MOVZ R1, R2, R3" is equivalent to the C code "if (R3 == 0) R1 = R2", but it executes as a pure ALU operation without branching or control hazards.
Give an instruction sequence to accomplish the following, but replacing BNEZ with some combination of JR and possibly other operations, assuming additional registers are available for your use:
BNEZ R1, Loop
NonLoop: ...
You measure the standard MIPS pipeline (diagram above) for an application typical of your expected users. From that you get the following statistics: 15% of total instructions are branches, and 65% of all branches are taken. What is the expected CPI using BNEZ and a predict-not-taken strategy?
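One way to set up this kind of calculation is sketched below in Python. It assumes an ideal base CPI of 1 and that, under predict-not-taken, only taken branches incur a stall; the stall length depends on the stage in which your pipeline resolves the branch, so it is left as a parameter. The function name `expected_cpi` and the numbers in the usage line are illustrative assumptions, not the values from this question.

```python
def expected_cpi(branch_fraction, taken_fraction, taken_penalty, base_cpi=1.0):
    """Expected CPI under a predict-not-taken strategy.

    Assumption: not-taken branches cost nothing extra, and every taken
    branch costs `taken_penalty` stall cycles.
    """
    return base_cpi + branch_fraction * taken_fraction * taken_penalty


# Illustrative inputs only: 20% branches, half of them taken, 1-cycle penalty.
print(expected_cpi(0.20, 0.50, 1))
```

Substituting the measured branch statistics and the penalty implied by your pipeline diagram gives the figure the question asks for.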
JR compared to
Accounting for the change in instruction count and the fact that branches are always taken on your new architecture, what is the expected speedup (or slowdown) of the architecture with JR as compared to the architecture with BNEZ?
Spoiler alert: the answer to the previous question gives a slowdown for using