EEC 116 - Final Project

Multi-Processor Data Sorting Engine

This project consists of the design and layout of a chip which contains an array of 9 processors that sorts a stream of 9 unsigned 8-bit numbers at very high throughputs. The chip uses a novel algorithm that can be viewed as a scalable physically-distributed multi-processor bubble sort where data flows once through 9 2-element sorting processors rather than one processor making ~9 passes through the data set as would happen in a common software implementation. If a single-issue RISC processor required 7 cycles to perform a single comparison and swap (load, load, subtract, branch, store, store, incr_counter), to maintain the same performance as a 2.0 GHz array of these processors, the RISC processor would need to run at 126 GHz!

All work must be done by groups composed of 2 members each. Projects by 1 person are not possible due to requirements for project classes.

Single-processor architecture (proc.mag)

Processor array architecture (core.mag)

Chip (chip.mag)

Top-level test environment (top.mag)

Other requirements

Basic operation

  1. Set reset=1 and clock the chip several cycles to reset important circuitry (should include all registers).
  2. Up to 9 data values are entered into the chip.

  3. The chip is clocked enough cycles (approximately 9) while inserting the maximum value (01111111) to sort the data and to output the sorted data stream

Functional testing

    Verify the functionality of the processors by testing the following input sequences (all values shown in decimal base10)
  1. Valid outputs should be: 1, 2, 3, 127,...
    [reset chip]
    chip_in_valid=1, chip_in_data=3  
    chip_in_valid=1, chip_in_data=1  
    chip_in_valid=1, chip_in_data=2  
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=0     # used to exercise critical path
    chip_in_valid=1, chip_in_data=1     # used to exercise critical path
    chip_in_valid=1, chip_in_data=0     # used to exercise critical path
    [Add a few extra clocks if needed to flush results and see them clearly]

  2. Valid outputs should be: 124, 125, 126, 127,...
    [reset chip]
    chip_in_valid=1, chip_in_data=125  
    chip_in_valid=1, chip_in_data=126  
    chip_in_valid=1, chip_in_data=124  
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    [Add a few extra clocks if needed to flush results and see them clearly]

  3. Valid outputs should be: 0, 2, 4, 66, 67, 68, 124, 125, 126, 127,...
    [reset chip]
    chip_in_valid=1, chip_in_data=125  
    chip_in_valid=0, chip_in_data=32    # no valid input
    chip_in_valid=1, chip_in_data=68  
    chip_in_valid=0, chip_in_data=16    # no valid input
    chip_in_valid=1, chip_in_data=124  
    chip_in_valid=0, chip_in_data=0     # no valid input
    chip_in_valid=1, chip_in_data=0  
    chip_in_valid=1, chip_in_data=4  
    chip_in_valid=1, chip_in_data=126  
    chip_in_valid=1, chip_in_data=67  
    chip_in_valid=1, chip_in_data=2  
    chip_in_valid=1, chip_in_data=66  
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    chip_in_valid=1, chip_in_data=127   # used to flush out pipeline
    [Add a few extra clocks if needed to flush results and see them clearly]

    Change inputs on the falling edge of clock in your irsim top-level test file.

Measuring the maximum clock rate (minimum clock period or longest logic path delay)

Points

Miscellaneous



Updates:
2010/11/19         Written, functional test missing
2010/11/30         Final version
2010/12/02         Minor logic fixes in proc so sorting works correctly with a data bubble
2010/12/03         Added comments on submitting your work