



# **VLSI** Arithmetic

Lecture 6



## Prof. Vojin G. Oklobdzija University of California

http://www.ece.ucdavis.edu/acsel







## Review

### Lecture 5





# Prefix Adders and Parallel Prefix Adders





#### ADDITION: TWO-STEP PROCESS

- 1. Obtain carries (carry at i depends on  $j \leq i$ )
  - non-trivial to do fast
- 2. Compute sum bits (local function)
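
The two steps above can be sketched in a few lines of Python. This is only an illustration: the function name `add_two_step` and the bit-list representation are assumptions made for the example, and the carries are obtained here by the slow serial recurrence rather than by any fast scheme.

```python
def add_two_step(x: int, y: int, n: int, c_in: int = 0) -> tuple[int, int]:
    """Add two n-bit numbers: first obtain all carries, then the sum bits."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]

    # Step 1: carries. The carry into position i+1 depends on all bits j <= i.
    c = [c_in]
    for i in range(n):
        g = xb[i] & yb[i]          # generate
        p = xb[i] ^ yb[i]          # propagate
        c.append(g | (p & c[i]))

    # Step 2: sum bits, a purely local function of x_i, y_i and c_i.
    s = sum((xb[i] ^ yb[i] ^ c[i]) << i for i in range(n))
    return s, c[n]


print(add_two_step(0b1011, 0b0110, 4))   # sum bits and carry-out of 11 + 6
```

Step 2 needs no communication between bit positions, which is why the rest of the lecture concentrates on Step 1.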



Figure 2.2: Steps in addition.

#### from: Ercegovac-Lang





## **Prefix Adders**

The following recurrence operation is defined:

$$(g, p) \circ (g', p') = (g + p g',\; p p')$$

such that:

$$(G_i, P_i) = \begin{cases} (g_0, p_0) & i = 0 \\ (g_i, p_i) \circ (G_{i-1}, P_{i-1}) & 1 \leq i \leq n \end{cases}$$

$$c_{i+1} = G_i \quad \text{for } i = 0, 1, \dots, n$$

$$c_1 = g_0 + p_0 c_{in}, \qquad (g_{-1}, p_{-1}) = (c_{in}, c_{in})$$

This operation is associative, but not commutative. It can also span a range of bits (overlapping and adjacent).
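
A minimal Python sketch of the operator and the carry recurrence follows. The names `o` and `carries` are illustrative, bits are indexed 0 to n-1, and propagate is taken as the XOR of the operand bits (an assumption; the slide does not fix the definition). The carry-in is folded in as position -1 with $(g_{-1}, p_{-1}) = (c_{in}, c_{in})$, as stated above.

```python
from itertools import product


def o(left, right):
    """(g, p) o (g', p') = (g + p g', p p'); `left` is the more significant span."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def carries(x, y, n, c_in=0):
    """Return [c_1, ..., c_n] via (G_i, P_i) = (g_i, p_i) o (G_{i-1}, P_{i-1})."""
    acc = (c_in, c_in)                      # (g_{-1}, p_{-1})
    out = []
    for i in range(n):
        g = (x >> i) & (y >> i) & 1
        p = ((x >> i) ^ (y >> i)) & 1
        acc = o((g, p), acc)
        out.append(acc[0])                  # c_{i+1} = G_i
    return out


# Exhaustive check of the two algebraic claims over all Boolean (g, p) pairs.
pairs = list(product((0, 1), repeat=2))
associative = all(o(o(a, b), c) == o(a, o(b, c))
                  for a in pairs for b in pairs for c in pairs)
commutative = all(o(a, b) == o(b, a) for a in pairs for b in pairs)
print(associative, commutative)
```

Associativity is what allows the spans to be regrouped into a tree; the lack of commutativity means the more significant span must always stay on the left.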



Oklobdzija 2004



### Parallel Prefix Adders: S. Knowles 1999

operation '•':

$$\left(\frac{g}{k}\right)_{i} \bullet \left(\frac{g}{k}\right)_{j} = \left(\frac{g_{i} + \overline{k}_{i} \cdot g_{j}}{k_{i} \cdot \overline{k}_{j}}\right)$$

$$\left(\frac{g}{k}\right)_{h\ldots j} \bullet \left(\frac{g}{k}\right)_{j\ldots k} = \left(\frac{g}{k}\right)_{h\ldots i} \bullet \left(\frac{g}{k}\right)_{i\ldots k}$$

operation is associative: h>i≥j≥k

$$\left(\frac{g}{k}\right)_{h\ldots j} \bullet \left(\frac{g}{k}\right)_{i\ldots k} = \left(\frac{g}{k}\right)_{h\ldots k}$$

operation is idempotent: h>i≥j≥k

$$\left(\frac{c_{i+1}}{k_i \cdot k_{i-1} \cdot k_{i-2} \cdots k_0}\right) = \left(\frac{g}{k}\right)_i \bullet \left(\frac{g}{k}\right)_{i-1} \bullet \left(\frac{g}{k}\right)_{i-2} \bullet \cdots \bullet \left(\frac{g}{k}\right)_0$$

This produces the carry $c_{i+1}$ for $c_{in} = 0$.
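
The idempotency property (two overlapping spans combine into the span that covers both) can be checked by brute force. The sketch below uses the generate/propagate form $(g, p) \circ (g', p') = (g + p g', p p')$ instead of the generate/kill notation on this slide; that substitution, and the helper names `op`, `span` and `overlap_ok`, are assumptions made for the example.

```python
from functools import reduce
from itertools import product


def op(hi, lo):
    """Prefix operator; `hi` is the more significant span."""
    g, p = hi
    g2, p2 = lo
    return (g | (p & g2), p & p2)


def span(gp, hi, lo):
    """Group (G, P) over bits hi..lo (hi >= lo), folded from the MSB side."""
    return reduce(op, (gp[i] for i in range(hi, lo - 1, -1)))


def overlap_ok(n=5):
    """Check span(h..j) o span(i..k) == span(h..k) for all h > i >= j >= k."""
    bit_states = [(0, 0), (0, 1), (1, 0)]   # kill, propagate, generate
    for gp in product(bit_states, repeat=n):
        for h in range(n):
            for i in range(h):               # h > i
                for j in range(i + 1):       # i >= j
                    for k in range(j + 1):   # j >= k
                        if op(span(gp, h, j), span(gp, i, k)) != span(gp, h, k):
                            return False
    return True


print(overlap_ok())
```

Because the bits shared by both spans may be counted twice without changing the result, a graph is free to reuse an existing overlapping span instead of building an exactly adjacent one.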





#### Prefix adders



Figure 2.17: Composition of spans in computing (g, a) signals.

#### from: Ercegovac-Lang






















## Kogge-Stone Adder







## **Brent-Kung Adder**

Figure: 16-bit Brent-Kung adder graph with inputs $x_{15} \dots x_0$, levels 1 to 6, and outputs $s_{15} \dots s_0$.





## Hybrid BK-KS Adder







### Pyramid Adder:

M. Lehman, "A Comparative Study of Propagation Speed-up Circuits in Binary Arithmetic Units", IFIP Congress, Munich, Germany, 1962.







## Parallel Prefix Adders: Ladner-Fischer



Figure 1: 32b Ladner-Fischer graph [16,8,4,2,1]

Exploits associativity, but not idempotency. Produces minimal logical depth.





### Parallel Prefix Adders: Ladner-Fischer (16,8,4,2,1)



Two wires at each level. Uniform fan-in of two. Large fan-out (of 16, i.e. n/2). Large capacitive loading combined with long wires (in the last stages).
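
A minimal sketch of this minimal-depth structure, assuming n is a power of two, $c_{in} = 0$ and XOR propagate. The function name `ladner_fischer` and the way fan-out is counted (number of nodes at a level that read the same lateral source) are choices made for this illustration.

```python
from collections import Counter


def op(hi, lo):
    """(g, p) o (g', p') = (g + p g', p p'); `hi` is the more significant span."""
    g, p = hi
    g2, p2 = lo
    return (g | (p & g2), p & p2)


def ladner_fischer(x, y, n):
    """Return (carries c_1..c_n, max lateral fan-out per level, last level first)."""
    gp = [((x >> i) & (y >> i) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(n)]
    fanout, span = [], 1
    while span < n:
        uses, nxt = Counter(), list(gp)
        for i in range(n):
            if i & span:                       # upper half of each 2*span block
                src = (i // span) * span - 1   # top bit of the lower half
                nxt[i] = op(gp[i], gp[src])
                uses[src] += 1
        gp = nxt
        fanout.append(max(uses.values()))
        span *= 2
    return [g for g, _ in gp], fanout[::-1]


print(ladner_fischer(0, 0, 32)[1])   # lateral fan-out per level for 32 bits
```

The single source node at the last level drives the entire upper half of the adder, which is the n/2 fan-out noted above.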






## Parallel Prefix Adders: Kogge-Stone



Exploits idempotency to limit the fan-out to 1. Dramatic increase in wires. The wire span remains the same as in Ladner-Fischer.

Buffers needed in both cases: K-S, L-F

Figure 3: 32b Kogge-Stone graph [1,1,1,1,1]
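
A minimal sketch of the Kogge-Stone recurrence, assuming $c_{in} = 0$ and XOR propagate; the function name `kogge_stone` and the cell count it returns are illustrative additions. At every level each bit position combines with the position `dist` places below it, so every node has at most one lateral reader.

```python
def op(hi, lo):
    """(g, p) o (g', p') = (g + p g', p p'); `hi` is the more significant span."""
    g, p = hi
    g2, p2 = lo
    return (g | (p & g2), p & p2)


def kogge_stone(x, y, n):
    """Return (carries c_1..c_n, number of levels, number of prefix cells)."""
    gp = [((x >> i) & (y >> i) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(n)]
    dist, levels, cells = 1, 0, 0
    while dist < n:
        # Every position i >= dist gets its own cell reading position i - dist.
        gp = [op(gp[i], gp[i - dist]) if i >= dist else gp[i] for i in range(n)]
        cells += n - dist
        levels += 1
        dist *= 2
    return [g for g, _ in gp], levels, cells


carry_bits, levels, cells = kogge_stone(0xBB6D, 0x19B6, 16)
print(levels, cells)
```

The cell count grows roughly as n log2(n), which is one way to see the increase in wiring mentioned above.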





## Parallel Prefix Adders: Brent-Kung

- Sets the fan-out to one
- Avoids explosion of wires (as in K-S)
- Makes no sense in CMOS:
  - fan-out = 1 limit is arbitrary and extreme
  - much of the capacitive load is due to wire (anyway)
- It is more efficient to insert buffers in L-F than to use B-K scheme
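
For comparison, a minimal sketch of a Brent-Kung style prefix computation, written as an up-sweep followed by a down-sweep. It assumes n is a power of two, $c_{in} = 0$ and XOR propagate; the function name `brent_kung` and the returned cell count are illustrative.

```python
def op(hi, lo):
    """(g, p) o (g', p') = (g + p g', p p'); `hi` is the more significant span."""
    g, p = hi
    g2, p2 = lo
    return (g | (p & g2), p & p2)


def brent_kung(x, y, n):
    """Return (carries c_1..c_n, number of prefix cells); n is a power of two."""
    gp = [((x >> i) & (y >> i) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(n)]
    cells, d = 0, 1
    # Up-sweep: the top bit of each 2d-bit block absorbs the block's lower half.
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            gp[i] = op(gp[i], gp[i - d])
            cells += 1
        d *= 2
    # Down-sweep: distribute the finished prefixes to the remaining positions.
    d //= 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            gp[i] = op(gp[i], gp[i - d])
            cells += 1
        d //= 2
    return [g for g, _ in gp], cells


print(brent_kung(0xBB6D, 0x19B6, 16)[1])   # prefix cells used for 16 bits
```

The cell count is 2n - 2 - log2(n), far fewer than in Kogge-Stone, at the price of roughly twice as many levels.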





## **Two Parallel Prefix Adder Structures**

#### Kogge-Stone



*log(bits)* carry stages
Extra wiring

#### Han-Carlson



*log(bits)* + 1 carry stages
Reduced Wiring and Gates





## Parallel Prefix Adders: Han-Carlson

- Is a hybrid synthesis of L-F and K-S
- Trades increase in logic depth for a reduction in fan-out:
  - effectively a higher-radix variant of K-S.
  - others do it similarly by serializing the prefix computation at the higher fan-out nodes.
- Others similarly trade logical depth for a reduction of fan-out and wire.








The following rules are used:

- Lateral wires at the j<sup>th</sup> level span 2<sup>j</sup> bits.
- Lateral fan-out at the j<sup>th</sup> level is a power of 2, up to 2<sup>j</sup>.
- Lateral fan-out at the j<sup>th</sup> level cannot exceed that at the (j+1)<sup>th</sup> level.



• The number of minimal depth graphs of this type is given in:

| operand width (bits)    | number of basic minimum-depth graphs    |
|-------------------------|-----------------------------------------|
| 4                       | 2                                       |
| 8                       | 5                                       |
| 16                      | 14                                      |
| 32                      | 42                                      |
| 64                      | 132                                     |
| 128                     | 429                                     |
| 256                     | 1430                                    |

At 4 bits there are only K-S and L-F; beyond that there are several new possibilities.
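
The counts in the table can be reproduced by enumerating every fan-out vector allowed by the three rules. The sketch below assumes levels are numbered from 0 (so level j has span 2<sup>j</sup>) and writes each graph in the bracket notation used on these slides, last level first; the function name `knowles_graphs` is illustrative.

```python
def knowles_graphs(n_bits):
    """List all minimum-depth fan-out vectors for an n_bits-wide adder."""
    m = n_bits.bit_length() - 1          # number of levels, log2(n_bits)
    graphs = []

    def extend(prefix):                  # prefix = fan-outs of levels 0..j-1
        j = len(prefix)
        if j == m:
            graphs.append(prefix[::-1])  # report last level first
            return
        f = 1
        while f <= 2 ** j:               # a power of 2, at most 2^j
            if not prefix or f >= prefix[-1]:   # level j-1 must not exceed level j
                extend(prefix + [f])
            f *= 2

    extend([])
    return graphs


for width in (4, 8, 16, 32, 64, 128, 256):
    print(width, len(knowles_graphs(width)))
```

For 32 bits the list contains [16,8,4,2,1] (Ladner-Fischer), [1,1,1,1,1] (Kogge-Stone) and [4,4,2,2,1]. The resulting counts 2, 5, 14, 42, ... are the Catalan numbers.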





Figure 5: 32b graph [4,4,2,2,1]

Knowles 1999: example of a new 32-bit adder [4,4,2,2,1]





Knowles 1999








| Graph                  | Structure    | Buffering   | Delay (ref invs) | Length (µm) | Transverse wire flux, by level | Transverse wire flux, total |
|------------------------|--------------|-------------|------------------|-------------|--------------------------------|-----------------------------|
| Ladner-Fischer (fig 2) | [16,8,4,2,1] | [2,1,1,0,0] | 13.7             | 38          | [1,2,2,2,2]                    | 9                           |
| -                      | [16,4,2,2,1] | [2,1,1,0,0] | 13.2             | 38          | [1,4,4,2,2]                    | 13                          |
| -                      | [16,2,2,2,1] | [2,1,1,0,0] | 13.0             | 41          | [1,8,4,2,2]                    | 17                          |
| (fig 6)                | [4,4,2,2,1]  | [1,1,0,0,0] | 13.2             | 35          | [4,4,4,2,2]                    | 16                          |
| -                      | [4,4,2,2,1]  | [1,1,1,0,0] | 12.7             | 39          | [4,4,4,2,2]                    | 16                          |
| -                      | [2,2,2,1,1]  | [1,1,1,0,0] | 12.1             | 46          | [8,8,4,4,2]                    | 26                          |
| Kogge-Stone            | [1,1,1,1,1]  | [1,1,0,0,0] | 12.1             | 63          | [16,16,8,4,2]                  | 42                          |
| Kogge-Stone            | [1,1,1,1,1]  | [1,1,1,0,0] | 11.8             | 63          | [16,16,8,4,2]                  | 42                          |

- Delay is given in terms of FO4 inverter delay, worst case (the nominal case is 40-50% faster)
- K-S is the fastest
- K-S adders are wire limited (requiring 80% more area)
- The difference is less than 15% between examined schemes
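
The percentages quoted above can be related to the table with a little arithmetic. The sketch below copies the delay and length columns as they appear in the table. Treating the "Length" column as the basis of the 80% area figure is an assumption, as is the choice of the shortest row (35) as the reference.

```python
# (structure, buffering) -> (delay in reference inverters, length), from the table
rows = {
    ("[16,8,4,2,1]", "[2,1,1,0,0]"): (13.7, 38),
    ("[16,4,2,2,1]", "[2,1,1,0,0]"): (13.2, 38),
    ("[16,2,2,2,1]", "[2,1,1,0,0]"): (13.0, 41),
    ("[4,4,2,2,1]", "[1,1,0,0,0]"): (13.2, 35),
    ("[4,4,2,2,1]", "[1,1,1,0,0]"): (12.7, 39),
    ("[2,2,2,1,1]", "[1,1,1,0,0]"): (12.1, 46),
    ("[1,1,1,1,1]", "[1,1,0,0,0]"): (12.1, 63),
    ("[1,1,1,1,1]", "[1,1,1,0,0]"): (11.8, 63),
}

delays = [d for d, _ in rows.values()]
lengths = [length for _, length in rows.values()]

# Spread between the slowest and the fastest scheme, relative to the slowest.
delay_spread = (max(delays) - min(delays)) / max(delays)

# Longest layout (Kogge-Stone) relative to the shortest one in the table.
length_ratio = max(lengths) / min(lengths)

print(f"delay spread: {delay_spread:.1%}, length ratio: {length_ratio:.2f}")
```

Under these assumptions the delay spread comes out just under 14%, and the Kogge-Stone length is 1.8 times that of the shortest graph, which is consistent with the "80% more" figure.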







Figure 7: 16b hybrid graph

### Conclusion

- Irregular, hybrid schemes are possible
- The speed-up of 15% is achieved at the cost of large wiring, hence area and power
- Circuits close in speed to K-S are available at significantly lower wiring cost





## Possibilities for Further Research

- The logical depth is important (Knowles was right)
- The fan-out is less important than fan-in (Knowles was wrong):
  - It is possible to examine a variety of topologies with restricted and varied fan-in.
- Driving strength and Logical Effort rules were overlooked, or at least neglected:
  - It is possible to create a number of topologies taking LE rules into account.
  - It is further possible to combine the rules with a compound domino implementation, taking advantage of the two different rules governing "dynamic" and "static".
- It is still possible to produce a better adder!





# Other Types of Adders





J. Sklansky, "Conditional-Sum Addition Logic", IRE Transactions on Electronic Computers, EC-9, p.226-231, 1960.







Figure 2.21: (a) Obtaining conditional outputs. (b) Combined conditional adder.

from: Ercegovac-Lang





| i                 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|-------------------|----|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|
| $x_i$             | 1  | 0  | 1  | 1  | 1  | 0  | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| $y_i$             | 0  | 0  | 0  | 1  | 1  | 0  | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 |
| $s_i$ (final)     | 1  | 1  | 0  | 1  | 0  | 1  | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |
| $c_{i+1}$ (final) | 0  |    |    |    |    |    |   |   |   |   |   |   |   |   |   |   |

Between the operand rows and the final result, the figure lists for each time interval and each assumed carry (0 or 1) the conditional sum S and carry C of every block; the block width doubles from one interval to the next.

Fig. 1-Example of a conditional-sum addition.
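
The doubling scheme behind this example can be sketched as follows. The operands are the $x_i$ and $y_i$ rows of the figure; the function name `conditional_sum_add` and the dictionary layout are illustrative assumptions, not Sklansky's notation, and n is assumed to be a power of two.

```python
def conditional_sum_add(x, y, n):
    """Conditional-sum addition; returns (sum, carry_out) for carry-in 0."""
    # blocks[b][c] = (sum bits, carry out) of block b, assuming carry-in c.
    blocks = []
    for i in range(n):
        a, b = (x >> i) & 1, (y >> i) & 1
        blocks.append({c: (a ^ b ^ c, (a & b) | ((a ^ b) & c)) for c in (0, 1)})

    width = 1
    while len(blocks) > 1:
        merged = []
        for lo, hi in zip(blocks[0::2], blocks[1::2]):
            entry = {}
            for c in (0, 1):
                s_lo, c_mid = lo[c]
                s_hi, c_out = hi[c_mid]      # lower carry selects the upper result
                entry[c] = (s_lo | (s_hi << width), c_out)
            merged.append(entry)
        blocks = merged
        width *= 2                           # block width doubles each step
    return blocks[0][0]                      # the true carry-in is 0


x = 0b1011101101101101    # x_15 .. x_0 as read from the figure
y = 0b0001100110110110    # y_15 .. y_0 as read from the figure
s, c_out = conditional_sum_add(x, y, 16)
print(format(s, "016b"), c_out)
```

Every step is a pure selection between two precomputed alternatives, so a 16-bit addition needs only four doubling steps after the initial per-bit stage.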







Figure 2.23: Doubling the number of bits of the conditional sum.

from: Ercegovac-Lang







Figure 2.24: 16-bit conditional-sum adder (m = 4).



Computer Arithmetic

from: Ercegovac-Lang









# Carry-Select Adder

O. J. Bedrij, "Carry-Select Adder", IRE Transactions on Electronic Computers, June 1962, p.340-34





## Carry-Select Sum Adder



Figure 2.22: Carry-select adder.




from: Ercegovac-Lang



## **Carry-Select Adder**

Addition under assumption of  $C_{in}=0$  and  $C_{in}=1$ .
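
A minimal behavioural sketch of this idea: each block computes both candidate results up front, and the real incoming carry only drives a multiplexer. The function name `carry_select_add`, the fixed block size of 4 and the use of Python's integer addition for the block adders are assumptions made for the illustration.

```python
def carry_select_add(x, y, n, block=4, c_in=0):
    """Carry-select addition of two n-bit numbers; returns (sum, carry_out)."""
    assert n % block == 0, "this sketch assumes equal-sized blocks"
    mask = (1 << block) - 1
    result, carry = 0, c_in
    for lo in range(0, n, block):
        a, b = (x >> lo) & mask, (y >> lo) & mask
        # Both candidates can be computed without waiting for the carry.
        t0 = a + b            # assumes block carry-in = 0
        t1 = a + b + 1        # assumes block carry-in = 1
        t = t1 if carry else t0   # multiplexer driven by the real carry
        result |= (t & mask) << lo
        carry = t >> block
    return result, carry


print(carry_select_add(0xBB6D, 0x19B6, 16))
```

In hardware the two block additions run in parallel across all blocks, so the critical path is one block addition plus a chain of multiplexers.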



Fig. 11: 16-bit Carry-Select Adder





## Carry Select Adder: combining two 32-b VBAs in select mode







## **Carry-Select Adder**



Fig. 1-25-bit adder group.