03. Multi Cycle

Single Cycle	Multi-Cycle
Each instruction is executed in a single clock cycle	Instructions are divided into multiple smaller stages, taking one cycle each
The hardware ‘knows’ what instruction we have at the beginning of the Execute stage	The hardware does not ‘know’ what instruction it has until the end of the Execute stage
The processor has a fixed set of hardware that performs all the necessary steps for instruction execution within one cycle	The hardware is designed to handle different stages of execution seperately
Some parts of the hardware may remain idle during certain instructions, leading to reduced overall performance	Instructions that require fewer cycles finish faster, while complex instructions will take more cycles to complete
Duplicate hardware	No duplicate hardware
No extra registers	Extra registers required to store results between cycles
CPI is always one	CPI depends on the types of instructions

There are three to five cycles for any given command (other than floating point), similar to Single Cycle:

Fetch
Decode
Execute
Memory
Register Write

Each stage is performed in one clock cycle.

Fetch¶

This instruction is identical for all command types.
The address from the PC is passed to the Memory File. The Memory File reads the data at that address, and is stored in a temporary Instruction Register, to be used later as the instruction to execute. The PC is incremented by 4.

Decode¶

The values from the specified registers (RS and RT) are passed into temporary registers (called A and B) to make sure the data persists across the next cycle. The Control sets the control lines according to the opcode.

Just in case the command is a Branch command, we calculate what the destination address is. The bottom 16 bits are sign extended to 32 bits and shifted left two bits. The ALU adds the PC to this value and stores the result in the ALUout register

Execute¶

The ALU uses the values in registers A and B as it’s operands. Depending on the instruction, one of four things happen:

R-Type instructions
- The ALU performs an operation on A and B, storing the result in ALU-Result
Memory instructions (sw/lw)
- ALU calculates effective memory address by adding the offset to RS
Branch instructions
- ALU subtracts B from A to see if they’re equal, and sets the zero flag accordingly
Jump instructions
- The PC is set to the 4 leading bits of the PC (after incrementing it by 4) and the 28 bits of the shifted address

All instructions other than Load, Store, and R-Types complete here.

Memory Access¶

R-Type
- The result calculated in the Execute cycle is waiting in the ALUout register. This value is written to register RD.
Store word/byte
- The address computed in the Execute cycle is passed from ALUout to the Write Address line into the Memory File, and the contents of RT are passed to the Write Data line of the Memory File. The Control unit sets the MemWrite line to 1.
Load word/byte
- The address computed in the Execute cycle is passed from ALUout to the Read Address line into the Memory File. The Control unit sets the MemRead line to 1. The result is stored in the Memory Data register to persist across the next cycle.

Register Write¶

This stage only happens with load commands. The data in the Memory Data register is sent to the Register File and written to RT.