# A 4.5 NS 96B CMOS ADDER DESIGN

Ajay Naini, David Bearden and William Anderson

Microprocessor and Memory Technologies Group, Motorola Inc.
6501 William Cannon Drive West, Austin, Texas 78735-8598

## ABSTRACT

A new approach to the design of high-speed adders using the carry look-ahead principle is presented. The approach introduces a new organization for the carry-chain, with minimal area impact, which minimizes latency while maintaining modularity. Application of these techniques to a 96-bit adder, implemented in a CMOS process with 1.0 $\mu$m design rules, shows a critical path delay of 4.5 ns.

## INTRODUCTION

Carry look-ahead (CLA) addition has become one of the more popular techniques for implementing fast adders because of the resulting speed, area, and modularity<sup>1</sup>. However, as the word size of the adder increases, the conventional carry-chain delay easily limits cycle time. In an attempt to decrease carry-chain delay, new circuit techniques have been applied to CLA addition<sup>2</sup>. This paper discusses a new organization for the carry-chain to minimize latency. As an example, a 96-bit adder design is discussed.

The recursive method of CLA addition is well known. In general, fan-in limits carry look-ahead to groups of four bits. Because of this, multi-level look-ahead structures are used for larger words. Fig. 1a shows the structure of a 64-bit conventional CLA carry-chain with a group size of four, and the CLA equations are shown in Fig. 1b. Each carry-block generates three local carry-out terms from the P,G terms and the carry-in. Fig. 1c shows a circuit implementation of a 4-bit group PG-block and carry-block. There are three levels of delay to the most significant carry-out term. The PG-block generates the 4-bit group P,G terms. The critical path for the adder, which is highlighted in Fig.
1a, is:

$$A,B - G_0 - G_{3:0} - G_{15:0} - C_{16} - C_{32} - C_{48} - C_{52} - C_{56} - C_{60} - C_{61} - C_{62} - C_{63} - S_{63}$$

## MODIFIED CLA STRUCTURE

Fig. 2a shows the modified CLA structure. The PG-block here generates intermediate group P terms $(P_{2:0}, P_{1:0})$ and group G terms $(G_{2:0}, G_{1:0})$, in addition to the 4-bit group P,G terms. A Manchester chain structure is used in the generation of the group G terms to reduce the transistor count. The carry-block generates three local carry-out terms $(C_1, C_2, C_3)$. The associated circuit structure for the PG-block and carry-block is shown in Fig. 2b. Because the intermediate group P,G terms are available, the delay to any carry-out term is only one level. The critical path for the modified CLA structure, which is highlighted in Fig. 2a, is:

$$A,B - G_0 - G_{3:0} - G_{15:0} - G_{47:0} - C_{48} - C_{60} - C_{63} - S_{63}$$

Table 1 details the timing associated with each of the above critical paths. These timing numbers are based upon a CMOS process with 1.0 $\mu$m design rules ($L_{eff}$ = 0.8 $\mu$m) at 5.0 V and 25 °C. As can be seen, the 64-bit modified CLA addition is faster than the conventional 64-bit addition. The reason for the faster speed is the availability of the intermediate group P,G terms at every PG-block, which reduces the delay of the most significant carry-out term of every carry-block to a single level.

The PG-block of the modified CLA design has 41 transistors, as opposed to 19 transistors for the conventional CLA design. However, this has minimal impact on area because the additional transistors arise in the intermediate bits, which are not the limiting factor in determining the bit cell size.

## CARRY INPUT REQUIREMENTS

In the critical path for the conventional 64-bit CLA adder, the carry-in, $C_0$, is needed in the generation of $C_{16}$ (Fig. 1a). In the modified 64-bit CLA adder, the carry-in is needed in the generation of $C_{48}$ (Fig. 2a).
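As a rough functional illustration of where the carry-in enters, the following Python sketch evaluates the recursion of Fig. 1b at the bit level. The function names (`bit_pg`, `group_pg`, `carry_out`) are hypothetical, and the loop models only the logic of the equations, not the multi-level block structure or the circuit timing of either adder.

```python
def bit_pg(a, b, width):
    """Per-bit propagate (A xor B) and generate (A and B), LSB first."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    return p, g


def group_pg(p, g, hi, lo):
    """Group terms G_{hi:lo} and P_{hi:lo}; neither depends on the carry-in."""
    G, P = 0, 1
    for i in range(lo, hi + 1):
        G = g[i] | (p[i] & G)   # G_{i:lo} = G_i + P_i . G_{i-1:lo}
        P = P & p[i]            # P_{i:lo} = P_i . P_{i-1:lo}
    return G, P


def carry_out(p, g, hi, c0):
    """C_{hi+1} = G_{hi:0} + C_0 . P_{hi:0}; the only step that uses C_0."""
    G, P = group_pg(p, g, hi, 0)
    return G | (P & c0)


# Illustrative operands (an assumption): the low 48 bits all propagate.
a, b = (1 << 48) - 1, 0
p, g = bit_pg(a, b, 64)
print(carry_out(p, g, 15, 1), carry_out(p, g, 47, 1))  # prints "1 1"
```

In this model the group terms are formed before $C_0$ is looked at. The carry-in is consumed only in the final step, which corresponds to $C_{16}$ in the conventional structure and to $C_{48}$ in the modified one.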
This means the carry-in to the conventional and modified adders must be valid by 1.7 ns and 2.9 ns, respectively (refer to Table 1 for timing numbers). This tolerance of a late carry-in in the modified adder may be exploited in the construction of larger adders.

## A 96-BIT ADDER USING MODIFIED CLA

The 96-bit adder shown in Fig. 3 consists of a high-order 64-bit modified adder and a low-order 32-bit conventional adder. The low-order 32-bit adder has its carry-out, $C_{32}$, valid by 2.35 ns (refer to Table 1 for timing numbers). Since this carry-out delay falls within the carry-in requirement (2.9 ns) of the 64-bit adder, the 32-bit adder timing does not affect the critical path. The low-order 32-bit addition is therefore effectively transparent to the operation of the high-order 64-bit addition. This enables the 96-bit addition to be performed in 4.5 ns, which is the same time as the modified 64-bit addition and less than the time of the 64-bit conventional CLA addition.

## REFERENCES

- [1] S. Waser and M. Flynn, *Introduction to Arithmetic for Digital Systems Designers*, Holt, Rinehart, and Winston, New York, 1982, pp. 83-88.
- [2] I. Hwang and A. Fisher, "A 3.1 ns 32b CMOS Adder in Multiple Output Domino Logic," *ISSCC Digest of Technical Papers*, Feb. 1988, pp. 140-141.

Figure 1a.
Conventional CLA Structure

$$
\begin{array}{l}
P = A \oplus B \ \text{(propagate)} \\
G = A \cdot B \ \text{(generate)} \\
C_0 \ \text{(carry-in)} \\
C_1 = G_0 + C_0 \cdot P_0 \\
C_2 = G_1 + C_1 \cdot P_1 = G_1 + G_0 \cdot P_1 + C_0 \cdot P_0 \cdot P_1 \\
C_3 = G_2 + C_2 \cdot P_2 = G_2 + G_1 \cdot P_2 + G_0 \cdot P_1 \cdot P_2 + C_0 \cdot P_0 \cdot P_1 \cdot P_2 \\
C_4 = G_3 + C_3 \cdot P_3 = G_3 + G_2 \cdot P_3 + G_1 \cdot P_2 \cdot P_3 + G_0 \cdot P_1 \cdot P_2 \cdot P_3 + C_0 \cdot P_0 \cdot P_1 \cdot P_2 \cdot P_3 \\
\phantom{C_4} = G_{3:0} + C_0 \cdot P_{3:0} \\
C_8 = G_{7:4} + C_4 \cdot P_{7:4} = G_{7:4} + G_{3:0} \cdot P_{7:4} + C_0 \cdot P_{3:0} \cdot P_{7:4} \\
\cdots \\
C_{16} = G_{15:0} + C_0 \cdot P_{15:0} \\
C_{64} = G_{63:0} + C_0 \cdot P_{63:0} \\
S_n = P_n \oplus C_n
\end{array}
$$

Figure 1b. Carry-Lookahead Equations

Figure 1c. Circuit Implementation of Conventional 4b Group PG and Carry Blocks

Figure 2a. Modified CLA Structure

Figure 2b. Circuit Implementation of Modified 4b Group PG and Carry Blocks

| Conventional CLA adder (Figure 1a): critical path | Delay (ns) | Modified CLA adder (Figure 2a): critical path | Delay (ns) |
|---|---|---|---|
| A,B - G<sub>0</sub> | 0.4 | A,B - G<sub>0</sub> | 0.4 |
| G<sub>0</sub> - G<sub>3:0</sub> | 0.65 | G<sub>0</sub> - G<sub>3:0</sub> | 0.9 |
| G<sub>3:0</sub> - G<sub>15:0</sub> | 0.65 | G<sub>3:0</sub> - G<sub>15:0</sub> | 0.9 |
| G<sub>15:0</sub> - C<sub>16</sub> | 0.3 | G<sub>15:0</sub> - G<sub>47:0</sub> | 0.7 |
| C<sub>16</sub> - C<sub>32</sub> | 0.35 | G<sub>47:0</sub> - C<sub>48</sub> | 0.3 |
| C<sub>32</sub> - C<sub>48</sub> | 0.35 | C<sub>48</sub> - C<sub>60</sub> | 0.35 |
| C<sub>48</sub> - C<sub>52</sub> | 0.35 | C<sub>60</sub> - C<sub>63</sub> | 0.35 |
| C<sub>52</sub> - C<sub>56</sub> | 0.35 | C<sub>63</sub> - S<sub>63</sub> | 0.6 |
| C<sub>56</sub> - C<sub>60</sub> | 0.35 | | |
| C<sub>60</sub> - C<sub>61</sub> | 0.35 | | |
| C<sub>61</sub> - C<sub>62</sub> | 0.35 | | |
| C<sub>62</sub> - C<sub>63</sub> | 0.35 | | |
| C<sub>63</sub> - S<sub>63</sub> | 0.6 | | |
| TOTAL (sum of the stage delays) | 5.75 | TOTAL | 4.5 |

Table 1. Timing Characteristics for 1.0 $\mu$m CMOS process with VDD = 5 V and 25 °C
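The timing figures quoted in the text can be cross-checked by summing stage delays along each critical path. The Python sketch below is only an illustration. The stage names and the helper functions `start_of` and `end_of` are hypothetical. The delay values are an assumed reading of Table 1, chosen so that they reproduce the 1.7 ns, 2.9 ns, 2.35 ns and 4.5 ns figures stated in the paper. The sketch also assumes that the low-order 32-bit conventional adder follows the first five stages of the 64-bit conventional path.

```python
# Stage delays in picoseconds (1 ns = 1000 ps); integers keep the sums exact.
# Values are an assumed reading of Table 1, not independent measurements.
CONVENTIONAL = [
    ("A,B-G0", 400), ("G0-G3:0", 650), ("G3:0-G15:0", 650),
    ("G15:0-C16", 300), ("C16-C32", 350), ("C32-C48", 350),
    ("C48-C52", 350), ("C52-C56", 350), ("C56-C60", 350),
    ("C60-C61", 350), ("C61-C62", 350), ("C62-C63", 350),
    ("C63-S63", 600),
]
MODIFIED = [
    ("A,B-G0", 400), ("G0-G3:0", 900), ("G3:0-G15:0", 900),
    ("G15:0-G47:0", 700), ("G47:0-C48", 300), ("C48-C60", 350),
    ("C60-C63", 350), ("C63-S63", 600),
]


def start_of(path, stage):
    """Time (ps) by which the inputs of `stage` must be valid."""
    elapsed = 0
    for name, delay in path:
        if name == stage:
            return elapsed
        elapsed += delay
    raise KeyError(stage)


def end_of(path, stage):
    """Time (ps) at which the output of `stage` is valid."""
    return start_of(path, stage) + dict(path)[stage]


deadline_conv = start_of(CONVENTIONAL, "G15:0-C16")  # C0 is consumed at C16
deadline_mod = start_of(MODIFIED, "G47:0-C48")       # C0 is consumed at C48
c32_ready = end_of(CONVENTIONAL, "C16-C32")          # low-order carry-out
total_mod = sum(delay for _, delay in MODIFIED)
hidden = c32_ready <= deadline_mod                   # 32-bit adder off the critical path?

print(deadline_conv, deadline_mod, c32_ready, total_mod, hidden)
# prints "1700 2900 2350 4500 True"
```

Under these assumptions the low-order carry-out arrives 550 ps before the modified adder needs its carry-in, which is why the 96-bit addition completes in the same 4.5 ns as the 64-bit modified addition.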