CHIP DESIGN METHODOLOGIES OR DESIGN METHODS

#### Processor Design Approaches

- Full custom\*\*
- Standard cell\*
- Gate array
- FPGA
- Programmable special-purpose
- Programmable general-purpose Reconfigurable/System-on-chip

\* Design domains of EEC 116

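Moving up the list (toward full custom): higher performance, lower energy (power), lower per-part cost.

Moving down the list (toward programmable processors): lower design time, lower one-time cost.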

# VLSI Design Technologies

#### VLSI

- Originally meant "Very Large Scale Integration", that is, a large number of transistors per chip
- Now generally means "semiconductor chip"
- Fabrication technologies are characterized by their minimum feature length (the length of a transistor's gate)
- Some typical state-of-the-art fabrication technologies in late 2019:
  - 14 nm: mature production for logic chips
  - 5 nm: "Industry-leading 5 nm CMOS technology features, for the first time, full-fledged EUV, and high mobility channel finFETs, offering ~1.84x logic density, 15% speed gain or 30% power reduction over 7 nm. This true 5 nm technology successfully passed qualification with high yield, and targets for mass production in 1H 2020." (IEDM, December 2019)

# Full Custom

- All transistors and interconnect drawn by hand
- Full control over sizing and layout



## Full Custom

- Multiplier chip
  - Multiplier
  - I/O pads
  - Clock generator
  - Control logic
  - Buffers



# Standard Cell

- Constant-height cells
- Regular "pin" locations
- The regular layout makes it much easier for CAD tools to place and route cells automatically





# Standard Cell

- Routing channels are needed only in older technologies (they are unnecessary in modern processes with many levels of interconnect)





# Standard Cell

- Wireless LAN chip
- Ten major standard-cell digital blocks, plus one analog block in the upper right corner
- Many embedded memory arrays
- Horizontal power grid stripes



# Combination Standard Cell and Full Custom

- Dense, regular full-custom blocks
- Random logic implemented with standard cells and automatic place and route




#### Typical Standard Cell, Gate Array, or FPGA Design Flow

- HDL (Verilog) source code is synthesized to generate a *gate netlist* made up of elements from the Standard Cell library
- The same HDL design may be synthesized to various libraries; for example:
  - Standard cell (NAND, NOR, Flip-Flop, etc.)
  - FPGA library (CLBs, LUTs, etc.)
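The idea of mapping one design onto a particular cell library can be sketched in a few lines of Python. This is only a toy illustration, not how a real synthesis tool works: the netlist format, the `map_to_nand2` function, and the NAND2-only "library" are all assumptions made up for this example. It rewrites a small generic netlist using only 2-input NAND cells and checks by exhaustive simulation that the function is unchanged.

```python
from itertools import product

# Hypothetical netlist format: (output net, gate type, input nets),
# listed in topological order. Implements y = (a AND b) OR (NOT c).
GENERIC_NETLIST = [
    ("n1", "AND", ("a", "b")),
    ("n2", "NOT", ("c",)),
    ("y", "OR", ("n1", "n2")),
]

OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NOT": lambda a: 1 - a,
    "NAND": lambda a, b: 1 - (a & b),
}


def map_to_nand2(netlist):
    """Rewrite AND/OR/NOT gates using only 2-input NAND cells."""
    mapped = []
    counter = 0

    def fresh():
        # Assumes no existing net name starts with "_t".
        nonlocal counter
        counter += 1
        return f"_t{counter}"

    for out, gate, ins in netlist:
        if gate == "NOT":
            # NOT a = NAND(a, a)
            mapped.append((out, "NAND", (ins[0], ins[0])))
        elif gate == "AND":
            # a AND b = NOT(NAND(a, b))
            t = fresh()
            mapped.append((t, "NAND", (ins[0], ins[1])))
            mapped.append((out, "NAND", (t, t)))
        elif gate == "OR":
            # a OR b = NAND(NOT a, NOT b)
            ta, tb = fresh(), fresh()
            mapped.append((ta, "NAND", (ins[0], ins[0])))
            mapped.append((tb, "NAND", (ins[1], ins[1])))
            mapped.append((out, "NAND", (ta, tb)))
        else:
            raise ValueError(f"unsupported gate: {gate}")
    return mapped


def simulate(netlist, inputs, output="y"):
    """Evaluate the netlist for one input assignment and return one net."""
    values = dict(inputs)
    for out, gate, ins in netlist:
        values[out] = OPS[gate](*(values[name] for name in ins))
    return values[output]


def equivalent(netlist_a, netlist_b, input_names):
    """Compare two netlists over every input combination."""
    for bits in product((0, 1), repeat=len(input_names)):
        stimulus = dict(zip(input_names, bits))
        if simulate(netlist_a, stimulus) != simulate(netlist_b, stimulus):
            return False
    return True


nand_netlist = map_to_nand2(GENERIC_NETLIST)
```

The same `GENERIC_NETLIST` could be handed to a different mapping function (for example one targeting LUTs), which is the point of the bullet above: the source design stays the same and only the target library changes.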



# Simplified diagram of Standard Cell design flow after synthesis



#### Layout synthesized from Verilog and a Standard Cell library, and then "Placed & Routed"



    module multiplier (
        input  in1,
        input  in2,
        output out
    );
        // Continuous assignment; ports are 1 bit wide, as on the slide
        assign out = in1 * in2;
    endmodule



### Gate Array

- Polysilicon and diffusion are the same for all designs
- Metal layers customized for particular chips



### Gate Array

- Polysilicon and diffusion are the same for all designs
- 0.125 µm example



#### Gate Array — Sea-of-gates



# Field Programmable Gate Array (FPGA)

- Metal layers are now programmable with SRAM, instead of being hardwired during manufacture as in a gate array
- Cells contain general programmable logic and registers
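As a rough illustration of "general programmable logic", the sketch below models a look-up table (LUT), the kind of element commonly found in FPGA cells. It is a simplified behavioral model, not a description of any specific FPGA; the `Lut` class and `program` helper are hypothetical names chosen for this example. The configuration bits play the role of the SRAM contents, and the inputs simply select one of them.

```python
from itertools import product


class Lut:
    """k-input look-up table holding 2**k configuration bits."""

    def __init__(self, k, config_bits):
        if len(config_bits) != 2 ** k:
            raise ValueError("a k-input LUT needs exactly 2**k bits")
        self.k = k
        self.config_bits = list(config_bits)

    def evaluate(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError("wrong number of inputs")
        index = 0
        for bit in inputs:  # the first input is the most significant bit
            index = (index << 1) | (bit & 1)
        return self.config_bits[index]


def program(k, func):
    """Build the configuration bits for any k-input Boolean function."""
    bits = [func(*combo) & 1 for combo in product((0, 1), repeat=k)]
    return Lut(k, bits)


# The same structure becomes a different gate just by changing its bits.
xor2 = program(2, lambda a, b: a ^ b)
and2 = program(2, lambda a, b: a & b)
majority3 = program(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
```

Nothing about the `Lut` structure changes between `xor2` and `and2`; only the stored bits differ, which is why the chip can be reconfigured after manufacture.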



# Field Programmable Gate Array (FPGA)

- Chips are now "designed" with software
- The user pays a share of the up-front chip design costs that depends on the approach:
  - All: full custom, standard cell
  - Half: gate array
  - Shared: FPGA
- The user writes code (e.g., Verilog), compiles it, and downloads it into the chip
- This flexibility comes at a great cost, however: as a very approximate comparison, an FPGA design is over 10x slower, less energy efficient, and larger in area than an equivalent standard-cell design
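The trade-off between up-front cost and per-part cost can be made concrete with a small amortization sketch. All dollar figures below are made-up placeholders chosen only to show the shape of the calculation; they are not from the slides or from any vendor. The function names are likewise hypothetical.

```python
def cost_per_part(one_time_cost, unit_cost, volume):
    """Total cost per chip when the one-time cost is spread over `volume` parts."""
    return one_time_cost / volume + unit_cost


def break_even_volume(one_time_a, unit_a, one_time_b, unit_b):
    """Volume at which option A (higher one-time cost, lower unit cost)
    costs the same per part as option B."""
    return (one_time_a - one_time_b) / (unit_b - unit_a)


# Placeholder numbers, for illustration only.
STD_CELL_ONE_TIME, STD_CELL_UNIT = 2_000_000.0, 5.0
FPGA_ONE_TIME, FPGA_UNIT = 50_000.0, 50.0

crossover = break_even_volume(
    STD_CELL_ONE_TIME, STD_CELL_UNIT, FPGA_ONE_TIME, FPGA_UNIT
)
low_volume = 1_000
high_volume = 1_000_000
```

With these placeholder values, the FPGA is cheaper per part at `low_volume` and the standard-cell chip is cheaper at `high_volume`; real crossover points depend entirely on actual one-time and unit costs.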

### **Programmable Processor**

- Intel 8086
- First released 1978
- 33 mm<sup>2</sup>
- 3.2 µm
- 4–12 MHz
- 29,000 transistors



# 4.80 GHz General-Purpose Processor

- Intel Core i9, code-named Coffee Lake [i9-8950HK]
- 14 nm CMOS
- 6 cores (12 threads)
- 2.90 GHz base frequency
- 4.60 GHz standard turbo frequency
- 4.80 GHz maximum turbo frequency—possible only if the CPU is below 53 °C
- 12 MB on-die cache
- 45 Watts TDP (Thermal Design Power)



#### Massive General-Purpose Server Processor

- Itanium Poulson
- 32 nm
- 3.1 Billion Transistors
- 18.2 mm x 29.9 mm = 544 mm<sup>2</sup>
- 8 multi-threaded cores
- 54 MB total on-die cache
- 170 Watts TDP
- [ISSCC 2011]
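The die-area figure above can be checked directly, and dividing transistor count by area gives a rough density comparison with the 8086 listed earlier. This is a back-of-the-envelope sketch using only the numbers quoted on these slides; the function names are chosen for this example.

```python
def die_area_mm2(width_mm, height_mm):
    """Area of a rectangular die."""
    return width_mm * height_mm


def transistor_density(transistors, area_mm2):
    """Average transistors per square millimeter."""
    return transistors / area_mm2


# Itanium Poulson, figures from the slide
poulson_area = die_area_mm2(18.2, 29.9)
poulson_density = transistor_density(3.1e9, poulson_area)

# Intel 8086, figures from the earlier slide
i8086_density = transistor_density(29_000, 33)

# How many times denser the 32 nm chip is than the 3.2 µm chip
density_ratio = poulson_density / i8086_density
```

The product of the two side lengths rounds to the 544 mm² quoted above, and the density ratio comes out in the thousands, which reflects both the smaller feature size and the large share of dense cache on the newer chip.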



# Programmable DSP Processor

- TI C64X
- 600 MHz, 0.13 µm, 718 mW @ 1.2 V
- 8-way VLIW core
- 2-level memory system
- 64 million transistors



#### Massive Special-Purpose Processor

- Nvidia V100
- TSMC 12 nm FinFET
- 21.1 Billion Transistors
- 815 mm<sup>2</sup>
  - Approximately 37.9 mm x 21.5 mm
  - At the reticle limit
- 1.45 GHz
- 80 streaming multiprocessors
- 300 Watts TDP
- HBM2 memory interface: 1.75 GHz, 4096-bit bus, 900 GB/s
- [HotChips 2017]
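The quoted memory bandwidth and die area follow from the other numbers on the slide. The sketch below assumes that "1.75 GHz" means an effective rate of 1.75 Gb/s per bus pin; that interpretation is an assumption, not something the slide states.

```python
def bus_bandwidth_gbytes_per_s(bus_width_bits, gbits_per_s_per_pin):
    """Peak bandwidth of a parallel bus in GB/s (8 bits per byte)."""
    return bus_width_bits * gbits_per_s_per_pin / 8


# Assumption: 1.75 GHz is the effective per-pin data rate in Gb/s.
hbm2_bandwidth = bus_bandwidth_gbytes_per_s(4096, 1.75)

# Approximate die dimensions from the slide
v100_area_mm2 = 37.9 * 21.5
```

Under that assumption the peak bandwidth is 896 GB/s, which the slide rounds to 900 GB/s, and the approximate dimensions multiply out to within 1 mm² of the quoted 815 mm².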





#### COLOSSUS GC2

#### The world's most complex processor chip with 23.6 billion transistors



#### Heterogeneous Programmable Platforms

Example: Xilinx Virtex-II Pro

- FPGA fabric
- Embedded memories
- Hardwired multipliers
- High-speed I/O


#### Design at a Crossroad: System-on-a-Chip



- Often used in embedded applications where cost, performance, and energy are big issues!
- DSP and control
- Mixed-mode
- Combines programmable and application-specific modules
- Software plays crucial role

#### A System-on-a-Chip Example: High-Definition TV Chip



#### The World's Largest Chip: Cerebras Wafer-Scale Engine

- 46,225 mm<sup>2</sup> chip
  - 8.5" × 8.5"
  - Built from a 12" wafer
  - 56x larger than the biggest GPU ever made (815 mm<sup>2</sup> and 21.1 billion transistors)
- 1.2 Trillion transistors
- 15 kW!
- 400,000 cores
- Fabbed by TSMC, 98%-99% of wafer area is usable
- 18 GB on-chip SRAM
- 100 Pb/s interconnect (100,000 Tb/s = 12,500 TB/s)
- Approximately \$200M startup capital as of Aug 2019
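Several of the figures above can be cross-checked against each other with simple arithmetic. The sketch below uses only numbers quoted on these slides and assumes a square die; the variable names are chosen for this example.

```python
import math

MM_PER_INCH = 25.4

wse_area_mm2 = 46_225
gpu_area_mm2 = 815  # Nvidia V100, from the earlier slide

# Side length, assuming the die is square
wse_side_mm = math.sqrt(wse_area_mm2)
wse_side_inches = wse_side_mm / MM_PER_INCH

# Size and transistor-count ratios relative to the V100
area_ratio = wse_area_mm2 / gpu_area_mm2
transistor_ratio = 1.2e12 / 21.1e9

# Interconnect bandwidth: petabits per second to terabytes per second
interconnect_pbits_per_s = 100
interconnect_tbits_per_s = interconnect_pbits_per_s * 1_000
interconnect_tbytes_per_s = interconnect_tbits_per_s / 8
```

The side length comes out to 215 mm, or roughly 8.5 inches, and both the area ratio and the transistor ratio land near the quoted 56x, which suggests the transistor density is close to that of the V100.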

EEC 116, B. Baas



https://www.cerebras.net/

https://www.zdnet.com/article/cerebras-has-as-a-three-year-lead-on-competition-with-its-giant-chip/