Work individually, but I strongly recommend working with someone in the class nearby so you can help each other when you get stuck, with consideration to the Course Collaboration Policy. Please send me email if something isn't clear and I will update the assignment. Changes are logged at the bottom of this page.
Notes:
When due, two copies of 1) all hardware and testing code you wrote, and 2) all other requested files, must be submitted via:
A paper copy which is graded [instructions], and
An electronic copy used only as a backup uploaded to Canvas (under "Assignments") in a tar or zip file.
Label directories or files so it is clear to which problem they belong. For example, prob1.v, prob1.vt,...
Diagrams. If a problem requires a diagram, include details such as datapath, memory, control, I/O, pipeline stages, word widths in bits, etc. There must be enough detail so that the exact functional operation of the block can be determined by someone with your diagram and explanation, and a reasonable knowledge of what simple blocks do. A satisfactory diagram may require multiple pages of paper taped together into a single large sheet.
Verilog. If a problem requires a verilog design, turn in paper copies of both hardware and test verilog code.
1) a table printed by your verilog testbench module listing all inputs and corresponding outputs,
2) a simvision waveform plot which shows (labeled and highlighted) corresponding inputs and outputs, or
3) verilog test code which compares the designed circuit and a simple reference circuit (using high-level functions such as "+"), and two copy & paste sections of text from your simulation's output (one for pass, and one for fail where you purposely make a very small change to your designed hardware circuit or your reference circuit, to force the comparison to fail) that look something like this:
Error: input=0101, out_module=11110000, out_ref=11110001
For 1 and 3, the output must be copied & pasted directly from the simulator's output without any modifications.
In all cases, show how you verified the correctness of your simulation's outputs.
Synthesis. If a problem requires synthesis, turn in paper copies of the following. Print in a way that results are easy to understand but conserves paper (multiple files per page, 8 or 9 point font, multiple columns). Delete sections of many repeated lines with a few copies of the line plus the comment: <many lines removed> .
Run all compiles with "medium" effort unless told otherwise. Do not modify the synthesis script except for functional purposes (e.g., to change or add source file names).
Functionality. For each design problem, you must write by hand 1) whether the design is fully functional, and 2) the failing sections if any exist.
Point deductions/additions. TotalProbPts is the sum of all points possible.
inA       inB       outExp  outMantissa     Correct?
--------  --------  ------  --------------  --------
10101100  00110101  110010  01100110100101  Y
00000101  10110101  101010  01010101010101  Y
01010100  11101010  010100  11010101100101  no
Clarity. For full credit, your submission must be easily readable, understandable, and well commented.
b) [10 pts] Write matlab code which repeatedly calls lpfirstats(H) and finds the smallest area filter. There is no need to write a sophisticated optimization algorithm, just something reasonable that does more than simple coefficient scaling. For example, make small perturbations to the frequency and amplitude values that remez() uses, such as using 0.01 and other small values instead of 0.00 in the stopband.
It may be helpful to use the following matlab code. Remember that matlab vectors start at index=1 so H(1) is the magnitude at frequency=0.
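coeffs1 = remez(numtaps-1, freqs, amps);   % design a filter with numtaps coefficients
coeffs2 = coeffs1*scale;                   % scale before quantizing
coeffs = round(coeffs2);                   % quantize to integer coefficients
[H,W] = freqz(coeffs);                     % frequency response of the quantized filter
H_norm = abs(H) ./ abs(H(1));              % normalize magnitude to the value at frequency=0
[ripple, minpass, maxstoplo, maxstophi] = lpfirstats(H_norm);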
Assume area is: Total_num_partial_products + 2*Num_filter_taps
As an example from a previous year of the difference between a good optimization and a weaker one, these are class results for ten students for a different filter than the one assigned here:
109 area, 31 taps, 47 PPs
109 area, 31 taps, 47 PPs
113 area, 33 taps, 47 PPs
114 area, 33 taps, 48 PPs
123 area, 33 taps, 57 PPs
128 area, 34 taps, 60 PPs
182 area, 55 taps, 72 PPs
221 area, 59 taps, 103 PPs
221 area, 59 taps, 103 PPs
250 area, 61 taps, 128 PPs
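As a quick sanity check of the area formula, the sketch below recomputes the area for each of the class results listed above. It is written in Python rather than matlab only for illustration; the function name area_estimate and the list name CLASS_RESULTS are made up here, and the partial-product counts are simply copied from the list (how to count partial products is not shown).

```python
def area_estimate(num_partial_products: int, num_taps: int) -> int:
    """Area as defined in the assignment: Total_num_partial_products + 2*Num_filter_taps."""
    return num_partial_products + 2 * num_taps

# (reported area, taps, partial products), copied from the class results above
CLASS_RESULTS = [
    (109, 31, 47), (109, 31, 47), (113, 33, 47), (114, 33, 48), (123, 33, 57),
    (128, 34, 60), (182, 55, 72), (221, 59, 103), (221, 59, 103), (250, 61, 128),
]

for reported, taps, pps in CLASS_RESULTS:
    computed = area_estimate(pps, taps)
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{taps} taps, {pps} PPs -> area {computed} ({status})")
```

Note that each extra tap costs 2 area units and each extra partial product costs 1, so removing one tap is worth as much as removing two partial products.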
i) [5 pts] Filter coefficients
ii) [10 pts] The number of taps, number of required partial products, area estimate, and the attained values for the four filter criteria in dB.
iii) [10 pts] A plot made by plot_one_lpfir.m, which you must update to show the correct filter specifications.
iv) [5 pts] A stem() plot of the filter's coefficients.
i) [5 pts] Filter coefficients in a matlab-readable vector in a file coeff.m (e.g., c = [-5 2 14 ...]).
[150 pts] Design a block which calculates the complex radix-2 DIF FFT butterfly. Use your complex exponential generator from hwk/proj 3, problem 3, design 2 to generate W_{N} values.
X = A + B
Y = (A - B) * W
The latency may be as many cycles as needed; however, the W_{N} block and the multipliers must be the only logic inside their own pipeline stages.
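The butterfly equations above can be sketched in floating point as a reference for spot checks. This is only an illustrative Python sketch, not the course's difff.m comparison. It assumes the common forward-FFT twiddle convention W_N^k = exp(-j*2*pi*k/N) and N = 4096 (one angle per 12-bit wn_exp value); both are assumptions, and your hwk/proj 3 generator may use a different mapping. The function names are made up here.

```python
import cmath

def twiddle(k: int, n: int) -> complex:
    """W_N^k = exp(-j*2*pi*k/N). The sign convention is an assumption."""
    return cmath.exp(-2j * cmath.pi * k / n)

def dif_butterfly(a: complex, b: complex, w: complex):
    """Radix-2 DIF butterfly: X = A + B, Y = (A - B) * W."""
    return a + b, (a - b) * w

N = 4096  # hypothetical: 2**12 angles for a 12-bit wn_exp

# Example: wn_exp = 1024 is a quarter turn, so W is approximately -j
x, y = dif_butterfly(3 + 4j, 1 - 2j, twiddle(1024, N))
print(x, y)
```

With W close to -j, Y is (A - B) rotated by -90 degrees, which makes this case easy to check by hand.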
The block's I/O signals are described below. Recall that since there is no decimal point in the hardware, you may think of the inputs as being in any x.x format you like. Having done that, the decimal point of the output will be fixed and you will need to take that into consideration when comparing in matlab.
Signal   Direction  Description
------   ---------  -----------
clk      input
wn_exp   input      12-bit fixed-point unsigned. This is the input into the W_{N} generator.
A        input      16-bit fixed-point signed 2's complement complex (a_r, a_i)
B        input      16-bit fixed-point signed 2's complement complex (b_r, b_i)
X        output     16-bit fixed-point signed 2's complement complex (x_r, x_i)
Y        output     16-bit fixed-point signed 2's complement complex (y_r, y_i)
With outputs scaled with maximum precision but also so they never overflow, underflow, or saturate.
Appropriately pipelined so corresponding X and Y outputs are output at the same time.
Use 16-bit × 17-bit multipliers.
Use +, -, and * for arithmetic operations.
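To see why the outputs need scaling, it can help to work out how many bits the intermediate sums and differences of 16-bit signed inputs need. The Python sketch below does that arithmetic for the real or imaginary part alone; the helper name signed_bits_needed is made up for illustration, and it says nothing about how you should scale or round in your own design.

```python
def signed_bits_needed(lo: int, hi: int) -> int:
    """Smallest 2's complement width that can hold every integer in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits

IN_MIN, IN_MAX = -(1 << 15), (1 << 15) - 1  # 16-bit signed range: -32768..32767

# Real (or imaginary) part of A + B and of A - B, taken over all possible inputs
sum_bits = signed_bits_needed(IN_MIN + IN_MIN, IN_MAX + IN_MAX)
diff_bits = signed_bits_needed(IN_MIN - IN_MAX, IN_MAX - IN_MIN)
print(sum_bits, diff_bits)
```

Both the sum and the difference need one bit more than the inputs, which is consistent with feeding a 17-bit difference into a 16-bit × 17-bit multiplier.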
Generate test cases in your verilog testbench:
1) A minimum of 20 hand-picked extreme case inputs (e.g., max pos and max neg inputs)
2) A minimum of 1000 random inputs using $random (which returns a 32-bit number each time it is called). Use $random(seed) once at the beginning of your test to set the random number generator's seed to some arbitrary value so tests can be repeated for debugging.
Output both the a) inputs and b) verilog output to a plain text matlab-readable *.m file. For example, a file such as:
wn_exp(0+1) = 0; a_r(1) = -643; a_i(1) = 0; ... % index 1 = angle 0, unfortunately, since matlab can not have index = 0
wn_exp(1+1) = 1; a_r(2) = 123; a_i(2) = -6; ...
wn_exp(2+1) = 2; a_r(3) = 000; a_i(3) = -243; ...
where values can be printed out and then re-scaled in matlab however it is most convenient.
Another possible format is:
wn_exp(0+1) = 0; a(0+1) = -643 + j * 0; ... % index 1 = angle 0, unfortunately, since matlab can not have index = 0
wn_exp(1+1) = 1; a(1+1) = 123 + j * 6; ...
wn_exp(2+1) = 2; a(2+1) = 0 + j * 12; ...
Suggestion: print integers in verilog. Use "signed" reg's only for the printf statement.
Compare a) verilog output and b) matlab calculation of the butterfly equations using difff.m in matlab. Do not scale the matlab equations from how they are written above, but you may scale your verilog output by any power of 2, which is the same as selecting the location of the decimal point.
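The idea of scaling by a power of 2 and then measuring the error can be sketched as follows. This Python sketch is not difff.m and may not match what it computes. It assumes the reported figure is 10*log10(energy of the difference / energy of the reference), and the sample values and the choice of 4 fractional bits are made up for illustration.

```python
import math

def scale_pow2(values, frac_bits):
    """Treat integer hardware outputs as fixed-point with frac_bits fractional bits."""
    return [v / (1 << frac_bits) for v in values]

def error_energy_db(reference, measured):
    """10*log10(sum|ref - meas|^2 / sum|ref|^2); an assumed definition of the error metric."""
    e_diff = sum(abs(r - m) ** 2 for r, m in zip(reference, measured))
    e_ref = sum(abs(r) ** 2 for r in reference)
    if e_diff == 0:
        return float("-inf")  # outputs match exactly
    return 10 * math.log10(e_diff / e_ref)

# Hypothetical data: integer outputs printed by the testbench, and a floating-point reference
hw_out = [complex(16, -8), complex(-33, 5)]
ref = [complex(1.0, -0.5), complex(-2.0, 0.25)]

scaled = scale_pow2(hw_out, 4)        # place the binary point 4 bits from the right
err_db = error_energy_db(ref, scaled)
print(f"error energy: {err_db:.2f} dB")
```

Choosing a different frac_bits only moves the binary point; the hardware integers themselves are unchanged.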
Submit the following. When submitting the verilog file of your lookup tables, print only the first ~15 lines and the last ~15 lines and insert the comment "<Many lines removed>" for lines you deleted.
a) [30 pts] Detailed pipelined block diagram with all functional details.
b) [60 pts] Accuracy points for smallest error compared to matlab:
60 pts: within 1 bit,
50 pts: within 2 bits,
30 pts: within 3 bits
Write the Energy_diff/Energy_data0 value in dB in your report and also submit the four plots and printout produced by difff.m.
c) [60 pts] Synthesize your design at the following 3 cycle time values and report the 1) achieved cycle time (clock frequency) and 2) area for each:
(1) a very long cycle time, e.g., 1 ms = 1 kHz, to find the minimum area;
(2) a very short cycle time, e.g., 0.1 ns = 10 GHz, to find the minimum cycle time;
(3) the cycle time achieved in the synthesis run for case (2) multiplied by 1.5.
No points are possible for (b) or (c) unless the design is fully functional and without synthesis errors or serious warnings. See the Synthesis handout for details on the achievable cycle time and reading synthesis timing reports.
Updates:
2018/03/12 Posted
2018/03/14 Problem 2 posted
2018/03/19 Minor clarifications for problem 2