Notes:
Submit: (1) all code you wrote (no generated or provided files) including verilog hardware, verilog testing, matlab, etc. (2) other requested items such as diagrams etc.
Upload a single pdf to https://canvas.ucdavis.edu/.
Place all of your answers and code into a single pdf file with all problems and material in order.
Add titles to pages and file names so it is clear to which problem they belong. For example, Problem 1, prob1.v, prob1.vt,...
Diagrams. If a problem requires a diagram, include details such as datapath, memory, control, I/O, pipeline stages, word widths in bits, etc. There must be enough detail so that the exact functional operation of the block can be determined by someone with a reasonable knowledge of what simple blocks do. A satisfactory diagram may sometimes require multiple pages of paper taped together into a single large sheet.
Verilog. If a problem requires a verilog design, turn in paper copies of both hardware and test verilog code.
1) a table printed by your verilog testbench module listing all inputs and corresponding outputs,
2) a simvision waveform plot which shows (labeled and highlighted) corresponding inputs and outputs, or
3) verilog test code which compares a) your hardware circuit and b) a simple reference circuit (using high-level functions such as "+"), with no third circuit. Include two copy & paste sections of text from your simulation's output (one for pass, and one for fail where you purposely make a very small change to either your designed hardware circuit or your reference circuit to force the comparison to fail) that look something like this:
input=0101, out_hw=11110000, out_ref=11110000, ok
...
input=0101, out_hw=11110000, out_ref=11110001, Error!
...
For 1 and 3, the output must be copied & pasted directly from the simulator's output without any modifications.
In all cases, show how you verified the correctness of your simulation's outputs.
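The hardware-versus-reference comparison can be illustrated with a small sketch. It is written in Python purely to show the compare-and-print logic; the submission itself requires verilog test code. The function names `adder_hw`, `adder_ref`, and `compare`, the 4-bit width, and the choice of an adder as the example circuit are all assumptions made for illustration and are not part of the assignment.

```python
WIDTH = 4          # assumed operand width, for illustration only
OUT_W = WIDTH + 1  # result width including the carry out

def adder_ref(a, b):
    """Reference model: uses the high-level '+' operator."""
    return (a + b) & ((1 << OUT_W) - 1)

def adder_hw(a, b, inject_fault=False):
    """Stand-in for the designed circuit: a bit-level ripple-carry adder."""
    carry = 0
    result = 0
    for i in range(WIDTH):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry
        carry = (x & y) | (x & carry) | (y & carry)
        result |= s << i
    result |= carry << WIDTH
    if inject_fault:
        result ^= 1  # deliberately flip the LSB to force a mismatch
    return result

def compare(a, b, inject_fault=False):
    """Return one report line in the style of the sample output."""
    out_hw = adder_hw(a, b, inject_fault)
    out_ref = adder_ref(a, b)
    status = "ok" if out_hw == out_ref else "Error!"
    return (f"input={a:0{WIDTH}b},{b:0{WIDTH}b}, "
            f"out_hw={out_hw:0{OUT_W}b}, out_ref={out_ref:0{OUT_W}b}, {status}")

if __name__ == "__main__":
    # Exhaustive pass run over all input pairs.
    for a in range(1 << WIDTH):
        for b in range(1 << WIDTH):
            print(compare(a, b))
```

Calling `compare(a, b, inject_fault=True)` plays the role of the purposely broken run: the same inputs then produce a line ending in `Error!`.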
Synthesis. If a problem requires synthesis, turn in paper copies of the following. Print in a way that results are easy to understand but conserves paper (multiple files per page, 8 or 9 point font, multiple columns). Delete sections of many repeated lines with a few copies of the line plus the comment: <many lines removed>.
The "always @(*)" verilog construct may be used.
Run all compiles with "medium" effort. Do not modify the synthesis script except for functional purposes (e.g., to specify source file names).
Functionality. For each design problem, you must write by hand 1) whether the design is fully functional, and 2) the failing sections if any exist.
Point deductions/additions. TotalProbPts is the sum of all points possible.
inA       inB       outExp  outMantissa     I Certify Correct
--------  --------  ------  --------------  -----------------
10101100  00110101  110010  01100110100101  Y
00000101  10110101  101010  01010101010101  Y
01010100  11101010  010100  11010101100101  no   // this indicates I recognize there is an error here
Clarity. For full credit, your submission must be easily readable, understandable, and well commented.
Total: 270 points
1. [70 pts] This problem requires the design of a block which calculates the complex number e^(jθ) for a given θ, every cycle. It would be very useful as a very-high-precision complex numerically-controlled oscillator. The latency may be as many cycles as needed.
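As a point of reference for checking such a block, the sketch below evaluates e^(jθ) = cos θ + j sin θ in Python and quantizes both parts to fixed point. The text here does not specify number formats, so the phase encoding (an unsigned `PHASE_BITS`-bit word spanning one full turn), the output width `OUT_BITS`, and the function name `cexp_ref` are all assumptions chosen for illustration.

```python
import math

PHASE_BITS = 16  # assumed: phase word covers [0, 2*pi) in 2**PHASE_BITS steps
OUT_BITS = 16    # assumed: signed two's-complement output width

def cexp_ref(phase_word):
    """Return (real, imag) of e^(j*theta) as signed fixed-point integers."""
    theta = 2.0 * math.pi * phase_word / (1 << PHASE_BITS)
    scale = (1 << (OUT_BITS - 1)) - 1  # largest positive value, so +1.0 does not overflow
    real = round(scale * math.cos(theta))
    imag = round(scale * math.sin(theta))
    return real, imag
```

A model like this is only a floating-point reference; how closely a hardware design must match it depends on the precision requirements of the problem.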
2. [200 pts] This problem involves the design, implementation, synthesis, and testing of a custom hardware Sum of the Absolute Difference (SAD) computational unit and post-processor. The primary specifications are as follows:
The reference group of pixels to be matched is 4 × 4 pixels
The search area image is 16 pixels × 16 pixels
Both the reference group pixels and the search image pixels are 8-bit unsigned grayscale integers
The complete processor has the following I/O ports in addition to clock:
Because synthesis times will be long if all memories are synthesized, do not synthesize the search area pixel memory. Do this by locating the search area pixel memory outside the module which will be synthesized--in other words, your top level processor will contain two things: the search area pixel memory and the rest of your entire processor.
Register all data immediately after entering and before leaving the module which is synthesized.
Execution occurs in the following stages:
assert in_valid high and load one reference group pixel per cycle (16 cycles)
assert in_valid high and load one search area pixel per cycle (256 cycles)
wait a few arbitrary clock cycles (test with ~5)
assert go (1 cycle)
when the three out_min* values are ready, they appear on their respective output ports and out_valid is asserted high (1 cycle)
Other requirements:
calculate SADs in any order
use as many memories as you like, but only ones with one read and one write port
use only one CPA for each adder or subtractor, and build large adders using 4:2 and 3:2 carry-save adders plus one CPA.
the three out_min* values must be output approximately 14–16 clock cycles after go is asserted.
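The carry-save requirement can be illustrated with a small bit-level model. The sketch below, in Python rather than verilog, reduces several operands with 3:2 carry-save adders and then performs a single carry-propagate addition at the end. The function names `csa32` and `sum_carry_save` are hypothetical, and the serial reduction order shown is only one possibility; it says nothing about how a pipelined design should arrange its adder tree.

```python
def csa32(a, b, c):
    """3:2 carry-save adder: three operands in, (sum, carry) out, no carry propagation."""
    s = a ^ b ^ c                             # bitwise sum
    cy = ((a & b) | (a & c) | (b & c)) << 1   # bitwise majority, weighted by 2
    return s, cy

def sum_carry_save(operands):
    """Add non-negative integers using 3:2 CSAs followed by one CPA."""
    s, cy = 0, 0
    for x in operands:
        s, cy = csa32(s, cy, x)   # fold each operand into the redundant (sum, carry) pair
    return s + cy                 # the single carry-propagate add
```

The key property is that `s + cy` always equals the sum of the three inputs to `csa32`, so the slow carry chain is paid for only once, in the final addition.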
Build the testing environment as follows:
your testbench must generate at least two test cases: 1) random reference and search area pixel data; 2) some type of recognizable image whose location in the data is clear by inspection (for example, a smiley face or QR-code-type square within a square) for the reference pixel data; the search area is arbitrary pixels with imperfect smiley faces appearing in several places.
print all input pixel data, all 169 SAD calculations, and all outputs to a *.m file for analysis in matlab. The three out_min_* outputs must be printed by code that prints the values only when out_valid is high.
write a bit-accurate matlab model of your hardware that is short, extremely clear, and calculates the SAD differently than the way your hardware works.
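The computation being modeled can be sketched as follows. This is a Python illustration, not the matlab model the assignment asks for, and it makes assumptions the text does not state: the function names `sad_map` and `three_smallest` are hypothetical, the search positions are taken to be the 13 × 13 = 169 placements where the 4 × 4 group fits entirely inside the 16 × 16 area, and the three minima are reported as (SAD, row, column) tuples because the exact meaning of the out_min* ports is not given here.

```python
REF_N = 4      # reference group is 4 x 4 pixels
SEARCH_N = 16  # search area is 16 x 16 pixels

def sad_map(ref, search):
    """Return {(row, col): SAD} for every placement of ref inside search."""
    n_pos = SEARCH_N - REF_N + 1  # 13 placements per dimension, 169 in total
    sads = {}
    for r in range(n_pos):
        for c in range(n_pos):
            total = 0
            for i in range(REF_N):
                for j in range(REF_N):
                    total += abs(search[r + i][c + j] - ref[i][j])
            sads[(r, c)] = total
    return sads

def three_smallest(sads):
    """Return the three smallest SADs as (sad, row, col); ties break by position (assumed)."""
    ranked = sorted((v, r, c) for (r, c), v in sads.items())
    return ranked[:3]
```

With 8-bit unsigned pixels, each SAD is at most 16 × 255 = 4080, so it fits in 12 bits.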
Submit the following.
a) [10 pts] A bulleted list and a few sentences describing your design, including such features as: the number and size of memories, the number of multipliers and adders, the exact number of cycles to complete the 169 SAD calculations, the number of pipeline stages, the order of the pieces, the order in which the 169 SADs are calculated, etc.
b) [15 pts] Detailed pipelined block diagram with all functional details.
c) [10 pts] A simvision waveform showing the entire overall operation.
d) [10 pts] List the number of cycles from asserting go to the three out_min* outputs being ready.
e) [40 pts] Images of test case #2 showing the reference and search grayscale pixels along with the locations of the three out_min* outputs. Your data should be chosen so the best 3 matches can be easily seen. You may find the provided matlab script drawbar.m helpful.
f) [40 pts] Proof that your hardware and your matlab model match exactly for the two prescribed test cases (or a clear explanation how they differ). State whether the outputs match the matlab reference for both cases.
g) [75 pts] Synthesize your design at the following three cycle time values and report the 1) achieved cycle time (and corresponding clock frequency) and 2) area for each:
(1) a very long cycle time, e.g., 1 ms = 1 kHz, to find the minimum area;
(2) a very short cycle time, e.g., 0.1 ns = 10 GHz, to find the minimum cycle time;
(3) a synthesis run with the cycle time set to the minimum cycle time achieved in the synthesis run for case (2), multiplied by 1.5.
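The conversions between cycle time and clock frequency used above are simple reciprocals. A short Python sketch follows; the helper names `freq_hz` and `relaxed_cycle_time` and the example value `t_min_ns` are placeholders for illustration, not measured results.

```python
def freq_hz(cycle_time_s):
    """Clock frequency in Hz for a cycle time given in seconds."""
    return 1.0 / cycle_time_s

def relaxed_cycle_time(t_min, factor=1.5):
    """Cycle-time target for the third run: minimum achieved cycle time times 1.5."""
    return t_min * factor

t_min_ns = 2.0  # placeholder only; substitute the achieved value from the second run
print(freq_hz(1e-3))                 # frequency for a 1 ms cycle time
print(freq_hz(0.1e-9))               # frequency for a 0.1 ns cycle time
print(relaxed_cycle_time(t_min_ns))  # target for the third run, in ns
```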
No points are possible for (g) unless the design is fully functional and without synthesis errors or serious warnings.
Hint: See the Synthesis handout for details on the achievable cycle time and reading synthesis timing reports.
Hint: See the "matlab: tips for 281" web page for suggestions on addressing memories in matlab.
Hint: See the "verilog: example code" web page for example code in case it is helpful.
2024/02/29 Posted
2024/03/12 Update regarding memory ports
2024/03/14 Fixed "min" instead of "max", removed requirement for order of SAD calculations
2024/03/18 Added clock and reset inputs to list