EEC 116 - Final Project
This project requires the design and layout of a chip which contains
a high-speed digital low-pass filter. Filters are one of the most common
blocks found in digital signal processors, which are increasingly popular
in many electronic systems.
All work must be done by groups composed of 2 members each. Groups of
1 person are not possible due to requirements for project classes.
Digital filter
The filter is a 5-tap or 5-coefficient finite impulse response (FIR)
filter and has a saturator at its output. It processes one
sample every clock cycle, enabling very high data throughput. The plot
below shows the magnitude frequency response of an example 7-tap filter
(with a phase plot below it). The values of the coefficients determine
the specifications and type of the filter (e.g. low-pass, high-pass, etc.).
Filter architecture
The filter consists of three major components: 1) multipliers, 2) adders, and
3) registers made up of flip-flops. Use the multiplier from your homework as a
starting point and revise it at least once to improve its area and speed.
The adders must be built with a simple ripple-carry adder structure made up of
a chain of full adders.
The multipliers must be built using the structure shown in Figure 11-30
in the textbook. You may use Full Adders or Half Adders for the "HA"
blocks.
The filter is followed by a saturator which saturates or clips the output
to be no greater than a certain level.
Saturation is a common method to reduce the magnitude and word-width of
signals and in some sense is complementary to rounding.
The 5 coefficients of the filter and the saturation level are programmable.
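A behavioral reference model can be handy when checking simulation results. The sketch below is a minimal Python model of a 5-tap FIR filter, assuming unsigned integer samples and coefficients; it ignores pipeline latency and the saturator. The function name `fir_filter` is illustrative and not part of the assignment.

```python
def fir_filter(samples, coeffs):
    """Return y[n] = sum over k of coeffs[k] * x[n-k], with x[n] = 0 for n < 0."""
    taps = [0] * len(coeffs)          # delay line, newest sample first
    out = []
    for x in samples:
        taps = [x] + taps[:-1]        # shift the new sample into the delay line
        out.append(sum(c * t for c, t in zip(coeffs, taps)))
    return out


# A single 1 followed by zeros reproduces the coefficients in order.
impulse = [1, 0, 0, 0, 0, 0, 0]
print(fir_filter(impulse, [1, 2, 3, 2, 1]))
```

The real hardware will show the same values delayed by however many register stages the design contains, so only the sequence of values should be compared, not the cycle on which they appear.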
Saturator architecture
The purpose of the saturator is to clamp or saturate the filter's
output to a configurable maximum value--so it fits into a limited-width
output word. A table describing its operation is below. You may build
the saturator using any logic you like.
The notation below is Verilog-compatible. For example, abc[15:0] means
the 16 signals from abc[15] to abc[0]. Underscore characters are
inserted for readability only; 00_1111 is the same as the six digits
001111. Curly braces concatenate signals; {000, abc[5]} is a 4-bit
signal.
config_sat[1:0]   sat_in[17:0]                       | sat_out[15:0]
-----------------------------------------------------+----------------------
                  sat_out saturated to 16 bits       |
      00          sat_in ≤ 00_1111_1111_1111_1111    | sat_in[15:0]
      00          sat_in > 00_1111_1111_1111_1111    | 1111_1111_1111_1111
                                                     |
                  sat_out saturated to 14 bits       |
      01          sat_in ≤ 00_0011_1111_1111_1111    | {00, sat_in[13:0]}
      01          sat_in > 00_0011_1111_1111_1111    | 0011_1111_1111_1111
                                                     |
                  sat_out saturated to 12 bits       |
      10          sat_in ≤ 00_0000_1111_1111_1111    | {0000, sat_in[11:0]}
      10          sat_in > 00_0000_1111_1111_1111    | 0000_1111_1111_1111
                                                     |
                  sat_out saturated to 10 bits       |
      11          sat_in ≤ 00_0000_0011_1111_1111    | {000000, sat_in[9:0]}
      11          sat_in > 00_0000_0011_1111_1111    | 0000_0011_1111_1111
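The table can be summarized as "clamp to the largest value that fits in 16, 14, 12, or 10 bits". The following is a minimal Python sketch of that behavior, assuming sat_in is an unsigned 18-bit value; the function name `saturate` is illustrative only.

```python
def saturate(sat_in, config_sat):
    """Clamp an unsigned 18-bit value to 16, 14, 12, or 10 bits per config_sat."""
    width = {0b00: 16, 0b01: 14, 0b10: 12, 0b11: 10}[config_sat]
    limit = (1 << width) - 1          # all ones in the selected width
    sat_in &= 0x3FFFF                 # keep only sat_in[17:0]
    return min(sat_in, limit)         # upper output bits are zero by construction


# Print one value below and one above the 10-bit limit as 16-bit binary strings.
for value in (1023, 1024):
    print(format(saturate(value, 0b11), "016b"))
```

A model like this can be chained after a filter model to produce expected sat_out values for comparison with irsim output.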
Chip structure
The diagram below shows the hierarchy of the core, clock driver, power rings,
and I/O pads.
The diagram below shows the hierarchy of the chip and the test
environment in top.mag.
Chip pads
Pads. Design three types of pads:
- Power/ground--Use three VddCore and three GndCore pads
for internal circuit power. Plus, add one VddIo and one
GndIo for every four output pads, and one of each for every
six input pads.
- Input into chip--include two inverter buffers.
- Output out of chip--include sufficient buffering to drive
the output pad with a fanout no larger than 8 at any point in the
inverter chain. The NMOS and PMOS of the final inverter must be
separated by at least 100 λ, and must be almost entirely
surrounded by appropriate-type guard rings.
Assume pad and load capacitance is 10 pF. This is approximately
equivalent to the input capacitance of 2000
near-minimum-sized
inverters (10 λ PMOS, 5 λ NMOS). Copy the cell ninepF.mag to your directory and attach
it to every output pad in a top level test cell (called top.mag),
not in the same cell as your chip. It's large, but will make
irsim timing simulations much more accurate.
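As a rough starting point for the output pad driver calculation, the sketch below finds the smallest number of inverter stages that keeps the per-stage fanout at or below 8 for a 10 pF load. It assumes the chain starts with a near-minimum inverter whose input capacitance is 5 fF (10 pF divided by 2000), and it ignores wire and self-loading capacitance; the starting inverter size and the function name `min_stages` are assumptions made for illustration.

```python
def min_stages(c_load, c_in, max_fanout):
    """Return (N, per-stage fanout) for the fewest stages with fanout <= max_fanout."""
    ratio = c_load / c_in             # total capacitance gain the chain must provide
    n = 1
    while max_fanout ** n < ratio:    # add stages until equal fanouts fit the limit
        n += 1
    return n, ratio ** (1.0 / n)


stages, fanout = min_stages(c_load=10e-12, c_in=5e-15, max_fanout=8)
print(stages, round(fanout, 2))
```

Whether the resulting stage count inverts the signal, and whether the input-side buffers are counted, depends on the rest of the pad design and is not addressed here.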
Place the pads in a single ring around the edge of your chip.
Power input and output buffers with VddIo and GndIo.
The I/O circuits are in a reasonably small space between the pads and
power rings.
Design your pads with a 50 µm x 50 µm bonding area
(paint with paint glass in magic), a 60 µm
x 60 µm pad composed of all 6 metal layers, and with an
80 µm pitch (center-to-center distance).
Power rings and power grid
Use m5 and m6 for all
except for minimum-length runs necessary to cross other m5 and/or m6
structures.
Power rings. Design four power rings as part of each pad cell:
VddIo, GndIo, VddCore, and GndCore.
Each ring must be 30 µm wide and made of at least four levels
of metal. They must be located at least 10 µm away from the 60
µm x 60 µm pads.
Power grid. Route wide VddCore and GndCore wires
(major ones at least 10 µm wide) across the chip connecting
to the appropriate power ring on both ends and to your active
circuits in the middle. You need to build the power grid only in
the regions that feed your circuits.
Clock tree
Connect the clock inputs of all flip-flop groups together in a single
m5 and m6 network
except for minimum-length runs necessary to cross other m5 and/or m6
structures.
Connecting a few FFs in an area with a lower level of metal is OK.
Use wires at least 8 λ wide for sections which
drive many clock loads. Design a clock driver chain of inverters placed
in a single location (not spread around the chip). The chain must begin
with a 10 λ/5 λ inverter driven by the clk input pad, must drive the
clock wire network without an inversion from the clk signal, and must
keep the fanout of every internal buffer no larger than 6. Estimate wire
capacitances of the network using these wire capacitances. Assume a
10 λ/5 λ inverter has an input gate cap of 5 fF.
Place the clock tree buffers inside chip.mag but outside core.mag.
Place the entire clock distribution tree (wires) inside core.mag.
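The clock driver constraints (first stage with 5 fF input capacitance, no net inversion, fanout no larger than 6) can be explored with a short calculation. The sketch below assumes equal fanout per stage and takes the total clock load as an input; the 2000 fF load in the example is a made-up placeholder, not a value from this assignment, and the function name `clock_chain` is illustrative.

```python
def clock_chain(c_load_fF, c_in_fF=5.0, max_fanout=6.0):
    """Return the input capacitance (fF) of each inverter in a non-inverting chain."""
    ratio = c_load_fF / c_in_fF
    n = 2                              # an even stage count gives no net inversion
    while max_fanout ** n < ratio:
        n += 2                         # add stages in pairs to stay non-inverting
    f = ratio ** (1.0 / n)             # equal fanout per stage
    return [c_in_fF * f ** i for i in range(n)]


# Placeholder load: replace 2000.0 with your own estimate of wire plus gate capacitance.
for i, cap in enumerate(clock_chain(2000.0), start=1):
    print(f"stage {i}: input cap {cap:.1f} fF")
```

Each stage's transistor widths can then be scaled from the 10 λ/5 λ inverter in proportion to its input capacitance.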
Other requirements
- Use only static circuits except where explicitly permitted or
required. Transmission-gate logic may be used for only the
following circuits: XOR, XNOR, full adders, muxes.
- Do not optimize circuits (dynamic, ratioed, etc.),
architectures (pipelining, iterative datapaths, faster
arithmetic algorithms, etc.), or other aspects beyond VLSI
design techniques.
- Make all Vdd and Gnd wires (e.g. the main power wires
in cells) at least 8 λ wide.
- All transistors must be minimum length unless specified
otherwise.
- See the Special Nanometer-scale rules for EEC 116 document and follow
its rules for all layout.
- Build enable-able flip-flops by connecting a mux to a regular
flip-flop's input (if en=0, D=FF output; if en=1, D=input).
- Use the "safest" flip-flop design from the previous handout.
Functional testing
Run the following test sequences in irsim on core.mag.
Some irsim command files that may be helpful:
1clk.cmd,
test1.cmd.
The underscore character "_" is used only to separate digits into more
easily-readable groups and does not affect the data values.
- coefficients=[1 2 3 2 1], config_sat=11
[configure appropriately]
in_data=0000_0000 # flush out
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0001 # input=1
in_data=0000_0000 # input=0
in_data=0000_0000 # input=0
in_data=0000_0000 # input=0
in_data=0000_0000 # input=0
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0010 # input=2
in_data=0000_0000 # input=0
in_data=0000_0000 # input=0
in_data=0000_0000 # input=0
in_data=0000_0000 # input=0
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=1000_0000 # input=128
in_data=0000_0000 # input=0
in_data=0000_0000 # input=0
in_data=0000_0000 # input=0
in_data=0000_0000 # input=0
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0001 # input=1
in_data=0000_0001 # input=1
in_data=0000_0001 # input=1
in_data=0000_0001 # input=1
in_data=0000_0001 # input=1
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=1111_1111 # input=255
in_data=1111_1111 # input=255
in_data=1111_1111 # input=255
in_data=1111_1111 # input=255
in_data=1111_1111 # input=255
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
- coefficients=[4 63 250 63 4], config_sat=00
[configure appropriately]
in_data=0000_0000 # flush out
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=...._.... # input=1
in_data=...._.... # input=11
in_data=...._.... # input=116
in_data=...._.... # input=255
in_data=...._.... # input=255
in_data=...._.... # input=255
in_data=...._.... # input=255
in_data=...._.... # input=255
in_data=...._.... # input=58
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
in_data=0000_0000
Make an irsim waveform with every I/O signal name and print
it out so that all data values are visible in hexadecimal.
Change chip inputs on the falling edge of clk.
Measuring the maximum speed
Measure the maximum speed of your design as follows:
- Get your chip functional with the functional test above using
a long stepsize (for example, "stepsize 100").
- Repeat the simulation with varying stepsizes using the following
algorithm:
- If the test runs without a "pending events" message, use
a stepsize halfway between the current value and the value for
the last failing (with pending events) run.
- If the test runs with a "pending events" message, use
a stepsize halfway between the current value and the value for
the last successful (no pending events) run.
- Iterate until the difference in stepsize between simulations with
and without "pending events" messages is 0.01 (nsec).
Remember that the minimum stepsize without pending events is the
same as the minimum cycle time.
The irsim clock cycle time is twice the stepsize (one
stepsize is while clock is high, one while it is low).
                 _____       _____
clock      _____|     |_____|
during
test            |     |     |
             -->|     |     |<-- irsim cycle time
             -->|     |<-- stepsize (minimum cycle time when no pending events)
                 __    __    __
clock      _____|  |__|  |__|  |__
at max
frequency
For example, if the shortest stepsize without pending events is 0.9 ns,
the irsim simulation has a clock cycle time of 1.8 ns.
The circuit can run at a maximum frequency of 1/(0.9 ns) ≈ 1.1 GHz.
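The stepsize search described above is a bisection. The sketch below shows the idea in Python, assuming a hypothetical callable `passes(stepsize)` that runs the irsim test at the given stepsize and returns True when no "pending events" message appears; here it is replaced by a stand-in so the example runs on its own. The initial failing bound of 0 ns is also an assumption.

```python
def find_min_stepsize(passes, good=100.0, bad=0.0, tol=0.01):
    """Bisect between a passing stepsize (good) and a failing one (bad), in ns."""
    while good - bad > tol:
        mid = (good + bad) / 2.0
        if passes(mid):
            good = mid                 # no pending events: try a shorter step
        else:
            bad = mid                  # pending events: back off toward good
    return good


def fake_passes(stepsize_ns):
    """Stand-in for an irsim run; pretends the design works at 0.9 ns or slower."""
    return stepsize_ns >= 0.9


step = find_min_stepsize(fake_passes)
print(f"min stepsize ~ {step:.2f} ns, max frequency ~ {1.0 / step:.2f} GHz")
```

In practice each call to `passes` corresponds to one manual or scripted irsim run, so the loop also gives an estimate of how many runs are needed (about 14 when starting from 100 ns).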
Points
Total: 350 pts + 0-150 performance pts
Fill out and submit this
Summary Sheet.
All *.mag files and all files needed for testing your design must be
uploaded to SmartSite. Layout and simulations will be verified.
a) [10+10 pts] Full adder: transistor schematic with widths, stick diagram
b) [10+10 pts] Clock tree driver: calculations, transistor
schematic with widths.
c) [10+10 pts] Output pad driver: calculations, transistor
schematic with widths.
d) [20 pts] Magic plot of layout for "core.mag"
e) [20 pts] Magic plot of layout for entire chip
f) [10 pts] Floorplan drawing showing the location of each block in the
diagram above.
g) [100 pts] core.mag, test sequence #1: printout of the IRSIM test sequence
described above showing inputs and outputs. Values must be
readable.
Use print/file menu in irsim.
h) [100 pts] core.mag, test sequence #2: printout of the IRSIM test sequence
described above showing inputs and outputs. Values must be
readable.
Use print/file menu in irsim.
i) [20 pts] Minimum measured cycle time with no pending events.
j) [20 pts] Write a half page (minimum) of original text not copied from
project assignment giving an overview of the chip. Include
short descriptions of at least 5 key tradeoffs you examined
(floorplanning, transistor sizing, circuit,... ) and state why
you chose what you did.
Performance points.
Area, delay, and A x D are critical parameters in VLSI design. Since
we do not know ahead of time how small the circuit can be, grading will
necessarily have to be based on results of other groups in the class.
If a group achieves a poor result but clearly made a strong effort, the
instructor may increase their points beyond the formula's recommendation.
These points are assigned in proportion
to how each group's results compare to the class overall. For example,
the group with the largest active circuit area will receive no points
for the area category, the group with the smallest area will receive 25,
and a group with an area halfway between the two will receive 13 points.
No points are possible for designs that
are not fully functional, or that
extract with any warnings
(except extraction warnings when extracting top.mag and chip.mag
from shorting VddIo with VddCore via label, and GndIo with GndCore,
on the pads).
The following points are for "core.mag":
- [25 pts] lower area compared to class
- [25 pts] lower delay compared to class
- [100 pts] lower area x delay compared to class
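The proportional scoring described above amounts to linear interpolation between the best and worst results in the class. The sketch below is one possible reading, not the official grading formula: it assumes lower values are better, that the worst result is strictly larger than the best, and that halves round up (which matches the 13-point example for a halfway result out of 25). The function name `performance_points` is illustrative.

```python
import math


def performance_points(value, best, worst, max_points):
    """Linearly scale points: worst result gets 0, best result gets max_points."""
    if worst <= best:
        raise ValueError("worst must be larger than best")
    frac = (worst - value) / (worst - best)
    return int(math.floor(max_points * frac + 0.5))   # round half up (assumption)


# Made-up areas: best 50, worst 100, and one group exactly halfway between.
for area in (50, 75, 100):
    print(area, performance_points(area, best=50, worst=100, max_points=25))
```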
Additional tasks for the group of 3
- Additional functional requirements TBD.
Place all circuits outside core.mag.
- Design and place corner pads but do not count these as some of
the required pads.
- Design and place 11 arrays of bypass capacitors (using same basic
cell): 1 small array inside core.mag, and 5 medium and 5 large
arrays outside core.mag .
- Write a 1.5 page (minimum) description rather than 0.5 page.
Misc.
- You may invert internal signals however you wish but the
operation of the system must be as described.
- Reference MATLAB files
- What if my partner does not do their part?
1) No matter what, do an excellent job on your part, 2) kindly
tell your partner what you have finished and how much time you
are spending (to encourage them to step up), and 3) if your
partner still does not do their part, do as excellent work on
your part as you can, make a clear line between your work and
theirs, include a short description in your report saying what
each of you did, and I will see if there is anything I can do
(but ONLY in the most drastic cases).
Updates:
2010/11/19 Written