Asynchronous Array of Simple Processors (AsAP) project
"You know you have achieved perfection in design, not when you have nothing
more to add, but when you have nothing more to take away."
- Antoine de Saint-Exupéry
Members of the VCL are currently focused on the circuits, functional
units, architecture, interconnection network, algorithms, and applications
for a high-performance and energy-efficient processing system targeting
computationally-demanding multi-task DSP system applications.
The single-chip processing system comprises a large number of
fine-grain, asynchronously operating programmable processors connected
by a reconfigurable 2-dimensional mesh network.
We have designed a 0.18 μm CMOS chip that was fabricated during
the summer of 2005. Early testing in the fall of 2005 has shown it is
fully functional!
We believe it is the highest-clock-rate fabricated processor designed
at any university.
A 13 mm x 13 mm chip utilizing the exact same design in 90 nm
CMOS would contain more than 1000 processors and be capable of more than
1 Tera-op/sec peak performance.
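The processor-count estimate can be checked with rough arithmetic. The sketch below assumes ideal area scaling with the square of the feature size (an assumption; real designs rarely scale this cleanly) and uses the 0.66 mm² per-processor area reported in the first-generation results on this page. It also assumes one operation per processor per cycle to estimate the clock rate that 1 Tera-op/sec would require. The function names are illustrative only.

```python
def scaled_area_mm2(area_mm2, old_node_nm, new_node_nm):
    """Ideal scaling: area shrinks with the square of the feature size."""
    return area_mm2 * (new_node_nm / old_node_nm) ** 2


def processors_on_die(die_side_mm, processor_area_mm2):
    """Upper bound on processor count, ignoring pads and other overhead."""
    return int(die_side_mm ** 2 / processor_area_mm2)


# 0.66 mm^2 per processor in 0.18 um (180 nm), scaled to 90 nm.
area_90nm = scaled_area_mm2(0.66, old_node_nm=180, new_node_nm=90)
count = processors_on_die(13, area_90nm)

# Assumption: each processor completes one operation per clock cycle.
required_clock_hz = 1e12 / count

print(f"scaled processor area: {area_90nm:.3f} mm^2")
print(f"processors on a 13 mm x 13 mm die: {count}")
print(f"clock needed for 1 Tera-op/sec: {required_clock_hz / 1e9:.2f} GHz")
```

Under these assumptions the die holds slightly more than 1000 processors, and each would need to run at just under 1 GHz to reach 1 Tera-op/sec.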
Details of the chip were presented at/in: ISSCC, ISVLSI, HotChips, ICCD,
IEEE MICRO, EURASIP, IEEE TVLSI, IEEE JSSC, ISCAS, and Symp on VLSI Circuits.
A complete list of publications can be found
here.
Key features of the AsAP processor
Several key features of the
AsAP processor enable its high performance, high energy efficiency, and
efficient use of silicon area. These features include:
- a chip multi-processor architecture to achieve high
performance;
- small memories and a simple architecture in each processor
to provide high energy efficiency;
- a globally asynchronous locally synchronous (GALS) clocking style
to simplify clock design, provide easy scaling into future deep submicron
technologies, and increase energy efficiency; and
- nearest-neighbor communication to avoid long global wires, which
makes the design well suited to future fabrication technologies.
Architecture (First Generation)
The first generation AsAP processor contains 36 identical processors with
independent clock domains. Each processor is a reduced complexity
programmable DSP with small memories, which can dramatically increase
system area efficiency and energy efficiency. Each processor can
receive data from any two neighbors and send data to any of its four
neighbors. A block diagram of the AsAP processor is shown below.
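The nearest-neighbor topology can be illustrated with a small sketch. It models the 6x6 array as (row, column) coordinates, lists each processor's mesh neighbors, and enumerates the ways a processor could pick two of them as inputs. The coordinate scheme and the helper name `mesh_neighbors` are assumptions made for illustration, not part of the AsAP design or tools. In this model, edge and corner processors have fewer than four on-chip neighbors.

```python
from itertools import combinations

ROWS, COLS = 6, 6  # first-generation array size


def mesh_neighbors(row, col, rows=ROWS, cols=COLS):
    """Return the on-chip mesh neighbors of the processor at (row, col)."""
    candidates = {
        "north": (row - 1, col),
        "south": (row + 1, col),
        "west": (row, col - 1),
        "east": (row, col + 1),
    }
    return {
        direction: (r, c)
        for direction, (r, c) in candidates.items()
        if 0 <= r < rows and 0 <= c < cols
    }


corner = mesh_neighbors(0, 0)
interior = mesh_neighbors(2, 3)

# Each processor receives from two neighbors; list the possible input pairs.
interior_input_pairs = list(combinations(sorted(interior), 2))

print("corner neighbors:", corner)
print("interior neighbors:", interior)
print("input pairs for an interior processor:", len(interior_input_pairs))
```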
AsAP 1 chip (36 processors)
Below is a photomicrograph of our single-chip 6x6 AsAP processor array.
Chip design
We used a number of CAD tools from Cadence and Synopsys
for our chip design.
This page contains an overview of our
CAD tool flow
including progress and issues.
Here are some topics and issues
we considered before the tape out.
Test board
The AsAP test board is the custom-designed printed circuit board shown
on the right and is designed to work with a commercial Memec FPGA board
shown on the left.
Applications
Several DSP tasks and applications, such as an FFT, a JPEG core encoder, and
an 802.11a/802.11g wireless transmitter, have been mapped onto the AsAP
processor. An 802.11a/802.11g implementation using 22 processors is
shown below. It consumes 407 mW at 300 MHz and achieves 30% of the full
54 Mb/s data rate. These results represent around 10 times higher
performance and 35x - 75x lower energy dissipation than an 8-way VLIW
TI C62x (according to one implementation reported at ICC02).
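The transmitter figures above can be turned into derived quantities with simple arithmetic. The sketch below uses only the numbers quoted in this section (22 processors, 407 mW, 30% of 54 Mb/s); the derived values are illustrative calculations, not measurements reported by the project.

```python
power_w = 0.407            # total power of the 22-processor transmitter
processors = 22
full_rate_bps = 54e6       # 802.11a/802.11g peak data rate
achieved_fraction = 0.30   # fraction of the full rate reached at 300 MHz

throughput_bps = full_rate_bps * achieved_fraction
energy_per_bit_j = power_w / throughput_bps
power_per_processor_w = power_w / processors

print(f"throughput: {throughput_bps / 1e6:.1f} Mb/s")
print(f"energy per bit: {energy_per_bit_j * 1e9:.1f} nJ/bit")
print(f"average power per processor: {power_per_processor_w * 1e3:.1f} mW")
```

The average power per processor comes out well below the 32 mW application figure quoted for 475 MHz operation, which is consistent with the lower 300 MHz clock used here.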
Results (First Generation)
The AsAP processor operates at 475 MHz, and each processor dissipates
32 mW while executing applications, 84 mW while 100% active,
and 144 mW worst-case at 1.8 V. Most of AsAP's area (66%) is devoted to
the core, which represents high area utilization.
Each processor occupies 0.66 mm²,
which is more than 20 times smaller than
traditional processors such as ARM. The AsAP processor also
achieves more than 5 times higher performance density and energy
efficiency than other processors, as shown below.
AsAP 2 chip (167 processors)
Key features
- 164 programmable processors
- Configurable Fast Fourier Transform (FFT) processor
- Configurable Viterbi decoder processor
- Configurable video motion estimation processor
- Three 16 KB shared memories
- Circuit-switched long-distance-capable inter-processor network
- Per-processor dynamic voltage scaling
- Per-processor dynamic clock frequency scaling
- All processors and shared memories clocked by fully-independent
clock oscillators
Below is the die micrograph of the single-chip 167-processor AsAP 2 array
processor.
Key data
| Block | Parameter | Value |
|---|---|---|
| Overall Chip | CMOS Technology | 65 nm ST Microelectronics low-leakage |
| | Transistors | 55 million |
| | Area | 39.4 mm² |
| Single Programmable Processor Tile | Transistors | 325,000 |
| | Area | 0.17 mm² |
| | Max clock frequency | 1.2 GHz @ 1.3 V |
| | Power (100% active) | 47 mW @ 1.06 GHz, 1.2 V |
| | | 3.4 mW @ 260 MHz, 0.75 V (equivalent to 1.0 Tera-op/sec @ 6.5 Watts) |
| | | 608 μW @ 66 MHz, 0.675 V |
| Fast Fourier Transform (FFT) Accelerator | Area | 1.01 mm² |
| | Max clock frequency | 866 MHz @ 1.3 V |
| Viterbi Decoder Accelerator | Area | 0.17 mm² |
| | Max clock frequency | 894 MHz @ 1.3 V |
| Video Motion Estimation Accelerator | Area | 0.67 mm² |
| | Max clock frequency | 938 MHz @ 1.3 V |
| (3) 16 KB Shared Memories | Area | 0.34 mm² |
| | Max clock frequency | 1.3 GHz @ 1.3 V |
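The three power figures for a single processor tile show what per-processor voltage and frequency scaling buys. The sketch below computes energy per clock cycle at each operating point listed in the key data above, then estimates the power needed to sustain 1.0 Tera-op/sec at the 260 MHz, 0.75 V point. The number of operations per cycle is not stated on this page; the value of 2 used here is an assumption chosen because it reproduces the quoted 6.5 W figure. The function names are illustrative.

```python
# (clock frequency in Hz, supply voltage in V, power in W) at 100% activity
operating_points = [
    (1.06e9, 1.2, 47e-3),
    (260e6, 0.75, 3.4e-3),
    (66e6, 0.675, 608e-6),
]


def energy_per_cycle_pj(freq_hz, power_w):
    """Energy dissipated per clock cycle, in picojoules."""
    return power_w / freq_hz * 1e12


def power_for_throughput_w(target_ops_per_s, freq_hz, power_w, ops_per_cycle):
    """Total power of enough identical processors to hit a target throughput."""
    processors_needed = target_ops_per_s / (freq_hz * ops_per_cycle)
    return processors_needed * power_w


energies_pj = [energy_per_cycle_pj(f, p) for f, _, p in operating_points]
for (f, v, _), e in zip(operating_points, energies_pj):
    print(f"{f / 1e6:7.0f} MHz @ {v} V: {e:.1f} pJ/cycle")

# Assumption: 2 operations per processor per cycle (not stated in the source).
tera_op_power_w = power_for_throughput_w(1e12, 260e6, 3.4e-3, ops_per_cycle=2)
print(f"power for 1 Tera-op/sec at 260 MHz: {tera_op_power_w:.2f} W")
```

Energy per cycle drops by several times between the 1.2 V and 0.75 V operating points, which is the motivation for scaling the supply voltage along with the clock frequency.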
Development boards
Work has begun on two
development
boards: one for high-speed AsAP array emulation on an FPGA, and the
other to host our planned CMOS chip.
Key features for both boards include:
- On-board D/A converter(s)
- On-board A/D converter(s)
- Simple interface to a workstation for programming/configuration
- Simple interface to a workstation for data in and data out (may utilize
the same interface using on-board memory for buffering)
- FPGA-only board: Sufficient on-board RAM for future non-AsAP projects
- FPGA-only board: Sufficient CLBs/slices for at least 9 AsAP processors,
ideal goal is 19, 20, or 22 (802.11a transmitter)
Information on the AsAP version 1 development board can be found at: http://www.ece.ucdavis.edu/vcl/asap/asap_v1/asap_ver1.shtml.
Measurements and Characterization
Here is our checklist of things to measure and
characterize in the AsAP1 and AsAP2 chips.
Acknowledgments
This material is based upon work supported by Intel Corporation,
UC MICRO,
the National Science Foundation under Grant No. 0430090
and CAREER grant No. 0546907,
and
a UCD Faculty Research Grant.
Any opinions, findings and conclusions or recommendations expressed in
this material are those of the author(s) and do not necessarily reflect
the views of the National Science Foundation (NSF).
VCL | ECE Dept. | UC Davis
Last update: August 22, 2011