The files in this directory implement a 5-stage pipeline.  Each stage
is separated by a "FIFO" called BX, where X is replaced with the
initial of the stage preceding the FIFO.  So the BF FIFO follows the
fetch stage.  In reality, these are single registers with a presence
bit, but they are wrapped in a FIFO interface.

The stages are:

  * Fetch stage (takes 32-bit vectors from IMEM and writes them to BF)
  * Decode stage (takes from BF, looks up register values and writes
    to BD)
  * Execute stage (takes from BD, performs ALU and branch instrs,
    writes to BE)
  * Memory stage (takes from BE, reads/writes DMEM, writes to BM)
  * Write-back stage (takes from BM, updates register file)

Instructions in this ISA have arguments which are register names where
the argument values can be found.  The decode stage looks up the value
associated with a register and replaces the name with the found value.
This is only legal if no instruction later in the pipeline is planning
to write a value to the register.  If such an instruction exists, the
decode stage either has to wait for the register file to be updated,
or it has to bypass the value from a later stage where the value to be
written has been calculated.

There are three types of pipeline which are implemented in this
directory:

1) Stall

   * The decode stage probes the instructions in the BD, BE, and BM
     FIFOs and stalls if any instruction there is planning to write to
     a register which is read by the instruction at the head of the
     BF.

   * Files: FiveStageCPUStall.bsv, FindFIFO2.bsv, CPUTest.bsv

2) Bypass from the buffers

   * The decode stage stalls on instructions in the BD, and potentially
     stalls based on instructions in the BE and BM, but can also bypass
     values from the BE and BM if the computed value exists.

   * Most instructions cannot be bypassed from the BD, which is why
     this design only stalls based on that FIFO.  However, the
     instruction LoadC (which was added for convenience) has a value
     which is ready to be taken as early as the BD.
     This design stalls until the LoadC reaches the BE FIFO.  One could
     write a version which bypasses from the BD, of course.

   * Files: FiveStageCPUBypass.bsv, FindFIFO2.bsv, FindFIFOM2.bsv,
     CPUTestBypass.bsv

3) Bypass from the buffers and from the execute stage (pre-FIFO)

   * This is just like #2, except that the decode stage can bypass
     values that are being computed simultaneously by the execute
     stage.  The previous version needed to wait until the execute
     stage's computation was registered in the BE before being able to
     bypass.

   * This also has the effect of fixing the "LoadC problem".  A LoadC
     at the head of the BD FIFO can be bypassed via the execute stage
     (which does nothing to the LoadC anyway).  In general, the decode
     stage must not stall based on the head of the BD FIFO (since that
     is covered by bypassing from the execute stage).  Since the BD
     FIFO is a one-place FIFO, we can remove stalling considerations
     from it altogether.

   * Files: FiveStageCPUBypassPreFIFO.bsv, FindFIFOM2.bsv,
     CPUTestBypassPreFIFO.bsv

There are also two versions of the stalling pipeline, which differ
based on the data types in the buffers.  The original version,
FiveStageCPUStall, uses the same template for all the buffers.  This
is wasteful, because an Add instruction will never appear in Add form
after the execute stage.  By using a special template for each buffer,
we can reduce the width of the FIFOs and also significantly reduce the
stall/bypass logic on those FIFOs.  The file FiveStageCPUStallV2 is an
example of how one might write special types for each buffer.

The data types in V2 were also created with extensibility in mind:

  * It should now be easy to extend the ISA without unnecessarily
    adding more logic/state to the stages.  This is done by creating a
    template which is almost like an a la carte order form of the
    stages -- the template says what the instruction should do at each
    stage, and as each stage completes, it can throw away the part of
    the template related to that stage.
  * The execute stage was also written with extensibility in mind.
    The execute stage could have been written as an ALU stage followed
    by a branch stage.  The execute stage has been set up in this form
    without actually inserting a buffer.  So it should be easy to
    extend the ISA with a "Jump If Add Is Not Zero" instruction, or
    any other combination of an ALU operation with a branch operation.

The instruction set started as purely register-addressed instructions:
Add, Jz, Load, Store.  To make writing the testbench easier, a LoadC
instruction was added, which loads a register with a hardcoded value.
(Previously, values could only come in from DMEM.)  Also, the CPU was
extended with a start method, which starts the fetching of
instructions; the ISA was extended with a Halt instruction, which
stops fetching; and the CPU was extended with a done method, which
signals when no more instructions are in the pipeline.

The testbench files for each pipeline have very detailed ASCII-art
comments which show the expected movement of instructions through the
pipeline, indicate when stalling and bypassing occur, and point out
some of the hazards of implementing bypassing.

=========================================================

Exercises:

1. FiveStageCPUQ1.bsv -> FiveStageCPUQ1sol.bsv
   Have students add the LoadPC instruction to the ISA.

2. FiveStageCPUQ2.bsv -> FiveStageCPUQ2sol.bsv
   Fill in the stall/bypass functions to make the initial bypass CPU
   examples work.

3. FiveStageCPUQ3.bsv -> FiveStageCPUQ3sol.bsv
   Add the ability to forward before enqueuing.

=============================================================

Potential exercises for this lab:

1) Have the students add the "start" and "done" methods and Halt
   instruction.

2) Have the students add the LoadC instruction.

3) Have the students add new instructions: ShiftL, Sub, LoadPC, etc.
   (There is already a sample program in the testbenches which uses
   ShiftL and LoadPC, but these have not been tested.)

4) Have students write StallV2 from Stall.

5) Have students extend StallV2 to BypassV2 and BypassPreFIFOV2 as was
   done with the Stall.

6) Have students write the Bypass or BypassPreFIFO from the Stall.
   (This could be done by having them fill in the blanks.)

7) The BypassPreFIFO version required little change to the CPU.  It
   simply involved changing the FIFOs.  The comments at the top of
   FiveStageCPUBypassPreFIFO indicate two other ways of changing the
   design.  Perhaps students could be asked to write those versions.

8) Have students bypass from the BD in Bypass (currently, the design
   only stalls).

9) Have students add a buffer between the ALU and branch parts of the
   execute stage, to make a 6-stage pipeline.

Methodology exercises:

1) Have students run the examples and verify in a waveform viewer that
   the pipeline moves as indicated in the ASCII-art (watch the firing
   of rules, watch the inputs and outputs to the buffers).

2) Have students explore adding debug $display statements to see how
   they can best probe stalling and the bypassing of values.

3) Have students explore rule conflicts and rule urgency.  (Why did
   the execute stage need to be broken into separate rules?  Why does
   the deleted instruction after the branch appear to "stall"?)
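
When probing stalling and bypassing (methodology exercise 2), it may
help to have a small reference model of the decode-stage decision to
compare the $display output against.  The Python sketch below is not
taken from the .bsv sources; it is a behavioral model of the three
pipeline types as described in this file.  The names Entry, resolve,
decode_stall, decode_bypass and decode_bypass_prefifo are hypothetical.
It assumes each buffer holds at most one instruction, that an entry
records a destination register plus a value once one has been
computed, and that a result which is not yet available (for example a
Load before the memory stage) forces a stall.

```python
from typing import Dict, NamedTuple, Optional, Sequence


class Entry(NamedTuple):
    """Hypothetical buffer entry (not a type from the .bsv files)."""
    dest: Optional[str]   # register this instruction will write, if any
    value: Optional[int]  # result, or None if not computed yet


def resolve(reg: str,
            regfile: Dict[str, int],
            in_flight: Sequence[Optional[Entry]],
            may_bypass: Sequence[bool]) -> Optional[int]:
    """Return the operand value for `reg`, or None to signal a stall.

    `in_flight` is ordered from the youngest instruction to the oldest,
    so the first matching writer is the one whose value must be used.
    """
    for entry, can_bypass in zip(in_flight, may_bypass):
        if entry is None or entry.dest != reg:
            continue  # empty slot, or it writes some other register
        if can_bypass and entry.value is not None:
            return entry.value  # bypass the computed value
        return None  # pending write that cannot be bypassed: stall
    return regfile[reg]  # no pending write: register file is current


def decode_stall(reg, regfile, bd, be, bm):
    # Type 1: any pending write in BD, BE or BM stalls the decode stage.
    return resolve(reg, regfile, [bd, be, bm], [False, False, False])


def decode_bypass(reg, regfile, bd, be, bm):
    # Type 2: stall on BD; bypass from BE and BM when a value exists.
    return resolve(reg, regfile, [bd, be, bm], [False, True, True])


def decode_bypass_prefifo(reg, regfile, ex_out, be, bm):
    # Type 3: `ex_out` models what the execute stage is producing this
    # cycle from the head of BD, so BD itself needs no stall check.
    return resolve(reg, regfile, [ex_out, be, bm], [True, True, True])


if __name__ == "__main__":
    regs = {"r1": 10, "r2": 20}
    loadc = Entry(dest="r1", value=7)  # LoadC: value known early
    print(decode_bypass("r1", regs, loadc, None, None))
    print(decode_bypass_prefifo("r1", regs, loadc, None, None))
```

In this model the same LoadC entry stalls the type 2 decode stage while
it sits in the BD, but is bypassed by the type 3 decode stage, which
matches the "LoadC problem" described above.  Scanning from youngest to
oldest is a design choice of the sketch: if both BE and BM hold writes
to the same register, the BE value is the one that program order
requires.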