EEC 281 - Homework/Project #4
Winter 2013
Work individually, but I strongly recommend working with someone in
the class nearby so you can help each other when you get stuck, within
the course collaboration policy. Please send me an email
if something isn't clear and I will update the assignment. Changes are
logged at the bottom of this page.
Problem 1 - Complex exponential generator 3 x [20 + 0-50 pts]
This problem requires the design of a block that calculates
the complex number e^{jθ} for a given θ, producing one result
every cycle. Such a block would be useful as a high-precision complex
numerically-controlled oscillator. The latency may be as many cycles as needed.
The block's I/O signals are:
- in_theta input
12-bit fixed-point unsigned where:
0000_0000_0000 = 0.000 radians, and
1111_1111_1111 = 2π*(4095/4096) radians
- out_real, out_imag outputs
each are 16-bit fixed-point 2's complement.
The test procedure is as follows:
- Generate all 2^12 possible theta inputs [0, 2π) in verilog
($random returns a 32-bit number, but use only 12 of those bits each
time you call the function).
- Calculate the e^{jθ} output for each theta with your
verilog design
- Output both the a) theta input and b) verilog output to a
matlab-readable *.m file
- Compare a) verilog output and b) exp(j*theta)
using difff.m in matlab.
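As a rough illustration of the fixed-point formats above, here is a minimal Python sketch of a reference model for one in_theta code. The assignment asks for the comparison to be done in matlab; this sketch only shows the arithmetic. The output scaling (SCALE, with +1.0 mapped to 32767) and the function names are assumptions, not part of the assignment.

```python
import math

N_THETA_BITS = 12                      # width of in_theta (from the problem statement)
OUT_BITS = 16                          # width of out_real / out_imag
SCALE = (1 << (OUT_BITS - 1)) - 1      # ASSUMPTION: +1.0 maps to 32767

def theta_radians(code):
    """Convert a 12-bit unsigned in_theta code to radians in [0, 2*pi)."""
    return 2.0 * math.pi * code / (1 << N_THETA_BITS)

def quantize(x):
    """Scale, round with floor(x + 0.5), and saturate to 16-bit 2's complement."""
    q = math.floor(x * SCALE + 0.5)
    return max(-(1 << (OUT_BITS - 1)), min(SCALE, q))

def reference(code):
    """Return (out_real, out_imag) as integers for one in_theta code."""
    th = theta_radians(code)
    return quantize(math.cos(th)), quantize(math.sin(th))
```

With this assumed scaling, code 0 gives (32767, 0) and code 1024 (π/2) gives (0, 32767).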
Design and write verilog for the block three ways:
- With "full" lookup table(s) (for an area reference point)
- With lookup tables that cover no more than π/4, or one-eighth
of the total 0-2π range.
- Same as design #2 but where outputs for odd theta inputs are
linearly-interpolated between samples from the lookup table(s) and
the system has a throughput of one calculation for every two clock
cycles. The goal is smaller area.
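The ideas behind designs #2 and #3 can be sketched in Python as follows. This is only one possible arrangement, not a required architecture: the table layout (513 entries including the π/4 endpoint), the scaling constant SCALE, and the function names are all assumptions. In a real design #3 the table would only need to hold the even-indexed samples; the sketch reuses the full folded lookup for brevity.

```python
import math

STEP = 2.0 * math.pi / 4096        # one in_theta LSB in radians
SCALE = 32767                      # ASSUMPTION: +1.0 maps to 32767

# Tables cover 0..pi/4 inclusive (513 entries each); the endpoint is kept so
# the mirrored octants can index 512 - r without a special case.
COS_LUT = [round(SCALE * math.cos(i * STEP)) for i in range(513)]
SIN_LUT = [round(SCALE * math.sin(i * STEP)) for i in range(513)]

def folded(code):
    """Design #2 idea: rebuild any of the 4096 outputs from the pi/4 tables."""
    octant, r = code >> 9, code & 0x1FF
    m = 512 - r                                   # mirrored index
    c, s = COS_LUT, SIN_LUT
    return [
        ( c[r],  s[r]), ( s[m],  c[m]),           # octants 0, 1
        (-s[r],  c[r]), (-c[m],  s[m]),           # octants 2, 3
        (-c[r], -s[r]), (-s[m], -c[m]),           # octants 4, 5
        ( s[r], -c[r]), ( c[m], -s[m]),           # octants 6, 7
    ][octant]

def interpolated(code):
    """Design #3 idea: odd codes are the average of the two even neighbors."""
    if code % 2 == 0:
        return folded(code)
    lo = folded(code - 1)
    hi = folded((code + 1) % 4096)                # code 4095 wraps around to code 0
    return ((lo[0] + hi[0]) >> 1, (lo[1] + hi[1]) >> 1)
```

The averaging step uses an arithmetic right shift, so it truncates toward negative infinity; whether to add a rounding bit there is a design choice that affects the accuracy score.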
For each of your three designs, submit (a) through (d) below.
When submitting the verilog file of your large lookup table, print only the
first ~25 lines and the last ~25 lines and insert the comment "<Many lines
removed>" for lines you deleted.
- a) [10 pts] Detailed block diagram with all functional details.
- b) [10 pts] Higher-level pipelined block diagram.
- c) [0-25 pts] (Accuracy points for smallest error compared to matlab,
in comparison to other working designs in the class.)
Write the Energy_diff/Energy_data0 value in dB in
your report and also submit the four plots produced by difff.m.
- d) [0-25 pts] (Area points for smaller area for design #3 in relation to
other designs in the class.) The design must be fully functional, with working
and tested matlab-accurate plots and clean synthesis (no errors or
serious warnings). There is no minimum cycle time requirement.
Synthesize your design at a
few different cycle time values and report the area for each.
Include at least one case for a very long cycle time (approximating the
minimum possible area) and at least one cycle time where the area begins
to get significantly larger
(e.g., approx 25%).
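For part c), the kind of error figure being reported can be sketched as below. This is not a copy of difff.m, whose internals are not shown here; it assumes Energy_diff is the energy of the difference signal and Energy_data0 is the energy of the reference signal, and the function name is made up for illustration.

```python
import math

def error_db(test, ref):
    """10*log10(sum|test - ref|^2 / sum|ref|^2) for two complex sequences.

    ASSUMPTION: this mirrors the Energy_diff/Energy_data0 ratio, with `ref`
    playing the role of data0.
    """
    e_diff = sum(abs(t - r) ** 2 for t, r in zip(test, ref))
    e_ref = sum(abs(r) ** 2 for r in ref)
    if e_diff == 0:
        return float("-inf")           # bit-exact match
    return 10.0 * math.log10(e_diff / e_ref)
```

A more negative value means the verilog output is closer to exp(j*theta).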
Problem 2 - CDMA Transmitter [100 + 0-75 pts]
This problem consists of the design, implementation, and synthesis
of a CDMA transmit path.
Register all inputs before using them and register all outputs
before they exit the block.
Use a single 100 MHz (10 ns) clock for all circuits.
Here are the key block signals:
- input reset
Synchronous reset, asserted high.
- input data[6:0]
Data input, one bit for each code or "user."
This specific design supports only seven users.
- output out[8:0]
2's complement output waveform, 100 MSamples/sec
(10 ns cycle time).
You may add other signals as needed, within reason, to communicate
with interfacing circuits (it may be helpful to have
some sort of handshaking for reading input data).
         +----------+     +---------+     +---------+     +---------+
         |          |     |         |     |         |     |         |
  reset  |  code    |     |         |     |         |     |         |
  -------|  geners  |  4  | upsamp2 |  ?  | upsamp2 |"all"|  round  | out
  data   |  add29   |--/--|  low-   |--/--|  low-   |--/--|  satur  |--/-->
  ---/---|  control |     |  pass   |     |  pass   |     |         |  9
     7   |          |     |  filter |     |  filter |     |         |
         |          |  ^  |  round  |  ^  |         |  ^  |         |
         +----------+  |  +---------+  |  +---------+  |  +---------+
                       |               |               |
                       |               |               |
               25 MSamples/sec         |        100 MSamples/sec
                 sig25[3:0]       sig50[--:0]     sig100[--:0]
Baseband Transmitter
This project uses orthogonal length-8 Walsh codes
to modulate data for seven users.
The accompanying
cdma.m matlab file generates the Walsh codes, modulates
data onto the codes, and assembles the transmit symbols.
Your system should perform the exact same operations bit-for-bit.
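As a rough picture of what the baseband transmitter computes, here is a minimal Python sketch of length-8 Walsh spreading for seven users. It is not a substitute for cdma.m: which seven of the eight codes are used, their order, and the bit mapping (1 to +1, 0 to -1) are assumptions here, so check them against cdma.m before relying on any of it.

```python
def walsh(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

CODES = walsh(8)[1:]        # ASSUMPTION: skip the all-ones row, keep 7 codes

def transmit_symbol(data_bits):
    """Spread 7 data bits (0/1) into 8 chips, each in [-7, +7]."""
    assert len(data_bits) == 7
    symbols = [1 if b else -1 for b in data_bits]   # ASSUMPTION: 1 -> +1, 0 -> -1
    return [sum(s * code[chip] for s, code in zip(symbols, CODES))
            for chip in range(8)]

def despread(chips, user):
    """Correlate with one user's code to recover that user's bit."""
    corr = sum(c * w for c, w in zip(chips, CODES[user]))
    return 1 if corr > 0 else 0
```

Each chip is a sum of seven ±1 terms, which is why sig25 fits in four bits as a 2's complement value between -7 and +7.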
Use the 29-input adder you designed for homework 1.
Disable unused inputs by setting half of them to 1 and the other half to 0.
Tie these unused inputs off in your synthesized verilog module (not your
external testing module) and DC should optimize away some circuits for you.
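Why tying half of the unused inputs to 1 and half to 0 disables them can be seen with a small Python sketch. It assumes that add29 treats each input bit as +1 (for 1) or -1 (for 0) and sums all 29; that behavior and the names below are assumptions, so adapt the sketch to whatever your homework 1 adder actually does.

```python
def add29_pm1(bits):
    """ASSUMED add29 behavior: sum of 29 inputs where 1 -> +1 and 0 -> -1."""
    assert len(bits) == 29
    return 2 * sum(bits) - 29

# 22 unused inputs: 11 tied to 1 and 11 tied to 0, so they cancel.
TIE_OFF = [1] * 11 + [0] * 11

def chip_sum(live_bits):
    """Sum for the 7 live inputs with the remaining 22 inputs tied off."""
    assert len(live_bits) == 7
    return add29_pm1(list(live_bits) + TIE_OFF)
```

Under this assumption the tied-off inputs contribute exactly zero, and the result stays within [-7, +7].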
Upsampling and Filtering
Upsample the sig25 signal by 4 times and filter out images produced
by the upsampling, by doing two stages of upsampling by 2.
For the first upsampling and low-pass filtering, round the output by
adding a half LSB and truncating.
Choose the point where you round by setting
sig25 to a constant +7 or -7, and making sure sig50 has 12
"good" data bits (one sign-extension bit or zero repeated-MSBs).
Also add enough sign extension
bits so that sig50 can never overflow under any circumstances.
Set sig25 to +7 or -7 with verilog commands
force and release in the testbench
rather than temporarily changing your hardware.
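The "add a half LSB and truncate" rounding can be sketched in Python on plain integers. The function name and the parameter `drop` (the number of low bits discarded) are made up for illustration; how many bits to drop depends on your filter scaling.

```python
import math

def round_half_lsb(x, drop):
    """Add half an LSB of the kept word, then truncate `drop` low bits.

    `x` is a 2's complement integer and `drop` must be at least 1.
    Python's >> on negative integers is an arithmetic shift (it floors),
    which matches truncation of a 2's complement bus.
    """
    return (x + (1 << (drop - 1))) >> drop
```

This is the integer form of floor(signal + 0.5) applied to x / 2^drop, so ties round toward positive infinity for both positive and negative values.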
For the second upsampling and low-pass filtering, keep the output full width.
Build each block by using the merged Nyquist filter + 2x upsampler
we discussed in class.
Make sure you design your filters so the output can fully represent
a worst-case input, which is going to have two versions as discussed
in class.
The cdma.m matlab file performs upsampling
by 2x, shows spectral results, and may be helpful as a starting point.
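One common way to merge a filter with a 2x upsampler is a polyphase split, sketched below in Python. This may or may not match the exact structure discussed in class. The coefficient list H is a hypothetical half-band example chosen only for illustration, and reading the "two versions" of the worst case as the two output phases is also an assumption.

```python
def upsample2_direct(x, h):
    """Zero-stuff by 2, then run the FIR filter h (reference behavior)."""
    stuffed = []
    for v in x:
        stuffed += [v, 0]
    return [sum(h[k] * stuffed[n - k]
                for k in range(len(h)) if 0 <= n - k < len(stuffed))
            for n in range(len(stuffed))]

def upsample2_merged(x, h):
    """Polyphase form: even outputs use h[0::2], odd outputs use h[1::2].

    No work is spent multiplying taps by the stuffed zeros.
    """
    out = []
    for n in range(len(x)):
        for phase in (0, 1):
            taps = h[phase::2]
            out.append(sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0))
    return out

def worst_case_per_phase(h, peak):
    """Largest possible |output| for each phase when |input| <= peak."""
    return tuple(peak * sum(abs(t) for t in h[phase::2]) for phase in (0, 1))

H = [-1, 0, 9, 16, 9, 0, -1]     # HYPOTHETICAL coefficients, not the required filter
```

For this example filter and a peak input of 7, the even phase can reach 140 and the odd phase 112, so the output word must cover the larger of the two.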
Implement each filter by either:
1) "+ (in * 5)" for each filter tap,
2) "+ (in << 4)" for each partial product (probably smaller), or
3) using rows of 4:2 adders (use provided wide 4:2 submodules) plus one CPA
(may be smallest).
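The relationship between option 1 (one multiply per tap) and option 2 (one shifted copy per partial product) can be sketched in Python. The helper names are made up for illustration, and the plain binary decomposition used here is only one choice; other recodings may need fewer partial products.

```python
def shifts_for(coeff):
    """Left-shift amounts whose partial products sum to x * coeff (coeff >= 0)."""
    return [s for s in range(coeff.bit_length()) if (coeff >> s) & 1]

def tap_shift_add(x, coeff):
    """Compute x * coeff using only shifts and adds (sign handled separately)."""
    sign = -1 if coeff < 0 else 1
    return sign * sum(x << s for s in shifts_for(abs(coeff)))
```

For example, a tap of 5 becomes (in << 2) + (in << 0), so each nonzero bit of a coefficient costs one adder input in option 2 or option 3.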
The output must have
less than 3 dB attenuation from 0 to 0.20π, and
at least 22 dB attenuation from 0.3π to π,
according to
freqz(conv(upsample(coeffs1,2), coeffs2));
where coeffs1 is the coefficients of the filter after the first
2x upsampling and coeffs2 is for the filter after the second
2x upsampling.
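A quick check of this specification can be sketched in plain Python; the freqz plot in matlab is what the assignment asks for, and this is only a cross-check. The sketch assumes attenuation is measured relative to the DC gain and that the combined filter has nonzero DC gain. The function names and the number of grid points are arbitrary choices.

```python
import cmath
import math

def upsample(c, n):
    """Insert n-1 zeros after every sample, like matlab's upsample."""
    out = []
    for v in c:
        out += [v] + [0] * (n - 1)
    return out

def conv(a, b):
    """Linear convolution of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def gain_db(h, w):
    """Gain at frequency w (radians/sample) relative to DC (ASSUMED reference)."""
    mag = abs(sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h)))
    dc = abs(sum(h))
    return 20.0 * math.log10(mag / dc) if mag > 0 else float("-inf")

def meets_spec(coeffs1, coeffs2, points=512):
    """Return (passband_ok, stopband_ok) for the combined two-stage filter."""
    h = conv(upsample(coeffs1, 2), coeffs2)
    grid = [math.pi * i / points for i in range(points + 1)]
    pass_ok = all(gain_db(h, w) > -3.0 for w in grid if w <= 0.20 * math.pi)
    stop_ok = all(gain_db(h, w) <= -22.0 for w in grid if w >= 0.30 * math.pi)
    return pass_ok, stop_ok
```

As a sanity check, a trivial pair of one-tap filters passes the passband test but fails the stopband test, since it does not attenuate anything.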
Rounding, Saturation, and Output
Design an "add 1/2 LSB and truncate" rounding circuit with 1/2 LSB added
in the second filter.
Follow this by a saturator which saturates the rounded signal to the
maximum range out can represent [-256, +255]. Scale the
signal so that with sig25 held at a constant +7 or -7,
out is not saturated and all of its
bits are used.
This scaling is accomplished solely by matching
the filter output bus and round/sat input bus differently--it
involves no hardware.
Using all of the output's bits means out[MSB] is
not equal to out[MSB-1] which is the same as saying
-256 <= out <= -129 or
+128 <= out <= +255.
This implies that if the value of the filter's output is doubled
from what it was with sig25 = +/-7 (whatever the values are
for your implementation), out would saturate.
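The saturation range and the "all bits used" condition can be written out as a short Python sketch. The function names and the `drop` parameter are made up for illustration; the number of bits dropped depends on how you line up the filter output bus with the round/sat input bus.

```python
OUT_MIN, OUT_MAX = -256, 255          # range of the 9-bit 2's complement output

def saturate(x):
    """Clamp a rounded value to what out[8:0] can represent."""
    return max(OUT_MIN, min(OUT_MAX, x))

def round_sat(x, drop):
    """Add 1/2 LSB, truncate `drop` low bits (drop >= 1), then saturate."""
    return saturate((x + (1 << (drop - 1))) >> drop)

def uses_all_bits(out):
    """True when out[MSB] != out[MSB-1] for a 9-bit value."""
    return ((out >> 8) & 1) != ((out >> 7) & 1)
```

Any value that uses all bits has magnitude of at least 128, so doubling it lands outside [-256, +255] and the saturator clips it.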
Synthesis
Make the following changes to your *.scr DC compile script.
(Let me know right away if you notice anything strange about your
synthesis timing.)
- clk_period = 7500 (let me know if this is difficult to meet)
- clock_skew = 50
- input_setup = 3000
- output_delay = 400
Do not modify the synthesis script except for functional purposes
(e.g., to change or add source file names). There are many
knobs to enhance synthesis results but that's not the focus of
this homework. If you do improve the script, please let me know
and I'll add it to the base script.
Run your compile with "medium" effort.
Design goals
The overall goal of the project is to design a working system with low area
(low power). In order of importance, you should:
- Design components and get verilog working
- Write bit-accurate matlab
- Get synthesis working
- Meet clock cycle time without timing violations (negative slack)
- Minimize area (meaningless without meeting timing)
Designing, Testing, and Grading [100 pts + 0-75 pts]
For parts below regarding designing and drawing...
Include pipeline stages and word widths in bits.
There must be enough detail so that the exact functional
operation of the block can be determined by someone with a
reasonable knowledge of what simple blocks do (e.g., "block generates
walsh code #2", or "add29 adder") and your diagram and explanation.
Include details of the datapath and control.
For parts below regarding bit-accurate models and testing...
Keep bit-accurate models as simple as possible to make debug easier
(i.e., don't write your verilog first and then force matlab to
be the same). Use floor(signal+0.5) for rounding.
For full credit, compare at least 25 symbols.
You may test within a block-level simulation or within a larger simulation.
It may be easiest to generate random inputs in verilog, write out
input and output from verilog into a file in a format matlab can
easily read (such as, in(1)=5; in(2)=-2; ... on different lines),
and do the comparison inside matlab.
You may have to neglect samples near the beginning and end
of your simulations to get 100% matching; this is ok.
- a) [10 pts] Design and draw a block diagram of the
baseband transmitter including details of the code generators,
adder, and control.
- [15 pts]
Write verilog and clearly show that it
matches the bit-accurate matlab.
- b) [10 pts] Design and draw a block diagram of the
upsampling and filter circuits.
- [10 pts]
Write verilog and a bit-accurate matlab model and clearly
show that they match.
- [10 pts]
Show the freqz(conv(upsample(coeffs1,2), coeffs2)); plot
described above with axis ([0 1 -30 5]); to clearly see the
important range.
- [10 pts]
Show a verilog waveform of the input and output for the filter alone
using the following input:
- maximum magnitude negative impulse input
- c) [10 pts] Design and draw a block diagram of the
rounding and saturation circuits.
- d) [25 pts] Synthesize the entire system.
Turn in listings given below.
- [0-75 pts] Smaller area receives more points.
Must be fully functional, with working and tested bit-accurate
matlab, and clean synthesis (no errors or serious warnings).
Turn in paper copies of the following. Print in a way that is
clear and easy to understand but conserves paper (multiple files per
page, 8 or 9 point font, multiple columns).
Edit sections with many repeating lines so they have only a few lines
and replace with the comment: <many lines removed> .
- dc_compile (or equivalent)
- *.area file
- *.log file; use something like dc_shell -f dc_compile | tee prac.log
as shown on the DC tutorial page.
Edit and reduce "Beginning Delay Optimization Phase"
and "Beginning Area-Recovery Phase" sections.
- *.pow file; summary only
- *.tim file; first (longest) path only
- source verilog files
- test verilog files
- source matlab files
Possibly-helpful suggestions
- Work on a block at a time, especially when doing bit-accurate
testing. You'll go crazy trying to debug a downstream block if
an upstream one is causing the trouble.
- For bit-accurate testing, you can generate input data from
either matlab or verilog. I think it's a little easier to
generate input data in verilog and then print both input and
output to a matlab-readable file and test and
compare in matlab. You may find it handy to declare variables as
signed in verilog and print them using $fwrite
so both positive and negative numbers print correctly.
- Use the matlab function
difff.m
to easily compare two signals.
- The files tb.vt and cdma.v
may give you a helpful start with the testbench.
- When debugging filters, start with an impulse-response input. This
likely won't be a [0, 0, 0, 0, 1, 0, 0, 0, 0] that you normally think
of as an impulse response, but the "1" should be the largest
input possible which may be a "-8", "+4", or "+7". It will be
easier to see what is going on if you use a power-of-2 input. This is
where a detailed block diagram is going to save you a headache. Think
carefully about what you expect on the output.
- As a rule, register all inputs into your top level to give those
signals a full clock cycle to work inside your block before they
have to be registered again. To be extra conservative, I gave you
a small 0.50ns setup window, but the down side is that those paths
will likely show up at the top of your timing reports. So go ahead
and change input_setup to 6000 in your dc_compile file.
I've made the change to the version on the web: input_setup = 6000.
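The impulse-response suggestion above can be tried out in a bit-accurate model before looking at verilog waveforms. The Python sketch below uses a hypothetical coefficient list H and a power-of-2 impulse of -8; neither is prescribed by the assignment, and the point is only that the expected output is the coefficient list scaled by the impulse value.

```python
def fir(x, h):
    """Plain FIR filter: full linear convolution of input x with taps h."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
            for n in range(len(x) + len(h) - 1)]

H = [-1, 0, 9, 16, 9, 0, -1]               # HYPOTHETICAL coefficients
IMPULSE = [0, 0, 0, 0, -8, 0, 0, 0, 0]     # largest-magnitude power-of-2 input

response = fir(IMPULSE, H)
```

Here `response` is four zeros, then each coefficient multiplied by -8, then four zeros, which makes it easy to spot a misplaced tap or a missing sign extension in the hardware output.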
Updates:
2013/03/07 Posted.