# A 4.4-ns CMOS 54X54-b Multiplier Using Pass-transistor Multiplexer

Norio Ohkubo, Makoto Suzuki, \*Toshinobu Shinbo, Toshiaki Yamanaka, \*Akihiro Shimizu, \*\*Katsuro Sasaki, and Yoshinobu Nakagome

Central Research Laboratory, Hitachi Ltd., 1-280 Higashi-Koigakubo, Kokubunji, Tokyo 185, Japan.

\*Hitachi VLSI Engineering Corporation, Kodaira, Tokyo 187, Japan. \*\*R&D Division, Hitachi America Ltd., Brisbane, CA 94005-1819

## Abstract

A 54 X 54-b multiplier using pass-transistor multiplexers has been fabricated in 0.25-µm CMOS technology. To enhance speed, a new 4-2 compressor and a carry look-ahead adder (CLA), both built from pass-transistor multiplexers, have been developed. The new circuits have a speed advantage over conventional CMOS circuits because the high logic functionality of pass-transistor multiplexers minimizes the number of critical-path gate stages. The active size of the 54 X 54-b multiplier is 3.77 mm X 3.41 mm. The multiplication time is 4.4 ns with a 2.5-V power supply.

## Introduction

Enhancing floating-point performance is indispensable for current high-performance microprocessors. In particular, high-speed multiplication is becoming one of the key operations in RISCs, DSPs, graphics accelerators, and similar processors because of increasing demand from multimedia applications. Recent high-end microprocessors call for an operating frequency of 200 MHz or more, and a multiplier will be required to operate in one clock cycle. However, no 54 X 54-b multiplier with a delay time of less than 5 ns has yet been reported [1][2].

This paper describes a 54 X 54-b multiplier macro developed for the mantissa multiplication of two double-precision numbers as outlined in the IEEE standard. To reduce the multiplication time, a new 4-2 compressor and a carry look-ahead adder (CLA) featuring pass-transistor multiplexers have been developed. The new circuits gain a speed advantage over conventional CMOS circuits because the number of critical-path gate stages is minimized due to the high logic functionality of pass-transistor multiplexers. The 54 X 54-b multiplier was fabricated by triple-metal 0.25-µm CMOS technology.

## Architecture

The block diagram of the 54 X 54-b multiplier is shown in Fig. 1. We used Booth's algorithm, Wallace's tree, and a conditional carry-selection (CCS) adder [3]. Booth's algorithm halves the number of partial products. Wallace's tree sums the partial products without propagating the carry. The CCS adder then adds the summed results with high-speed carry propagation.
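The halving of partial products can be illustrated with a behavioral sketch of radix-4 (modified) Booth recoding. The text above does not state which Booth variant or operand encoding is used, so the radix-4 form, the unsigned treatment of the operands, and the function names `booth_radix4_digits` and `booth_multiply` are assumptions made for illustration only.

```python
def booth_radix4_digits(y, n_bits):
    """Recode an unsigned n_bits-wide multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2} and selects one partial product.
    Assumption: y is unsigned, so one extra digit covers the top bit.
    """
    digits = []
    prev = 0  # implicit bit to the right of the least significant bit
    for i in range(0, n_bits + 1, 2):
        b0 = (y >> i) & 1        # low bit of the current pair
        b1 = (y >> (i + 1)) & 1  # high bit of the current pair
        digits.append(prev + b0 - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, n_bits):
    """Sum the Booth-selected partial products (behavioral, no carry-save)."""
    digits = booth_radix4_digits(y, n_bits)
    return sum(d * (x << (2 * i)) for i, d in enumerate(digits))
```

Under these assumptions a 54-bit multiplier operand is recoded into 28 digits, roughly half of the 54 single-bit rows that a direct array of AND terms would produce. Each digit requires only a shift, a negation, or a zero of the multiplicand, which is what makes the partial-product generators simple.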

Wallace's tree uses the 4-2 compressor, which has five inputs and three outputs. The carry-out (Co) is connected to the carry-in (Ci) of the next 4-2 compressor, as shown in Fig. 1. Because the carry-out (Co) does not depend on the carry-in (Ci), the 4-2 compressor can add four partial products without propagating the carry to the higher bit. By using the 4-2 compressor, only four addition stages are needed for Wallace's tree, as shown in Fig. 1.
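The two properties claimed for the 4-2 compressor, correct summation of five inputs and a carry-out that is independent of the carry-in, can be checked exhaustively with a small behavioral model. The sketch below uses the two-full-adder construction; the helper names and the row-counting model in `tree_stages` (three leftover rows go through a full adder, one or two leftover rows pass through) are assumptions for illustration and do not reproduce the exact tree of Fig. 5.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) of three 1-bit inputs."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compressor_4_2(x1, x2, x3, x4, ci):
    """4-2 compressor from two full adders: five inputs, three outputs."""
    s1, co = full_adder(x1, x2, x3)  # co is formed before ci is used
    s, c = full_adder(s1, x4, ci)
    return s, c, co


def check_compressor():
    """Exhaustively verify the arithmetic identity and Co independence."""
    for x1, x2, x3, x4, ci in product((0, 1), repeat=5):
        s, c, co = compressor_4_2(x1, x2, x3, x4, ci)
        assert x1 + x2 + x3 + x4 + ci == s + 2 * (c + co)
        assert co == compressor_4_2(x1, x2, x3, x4, 1 - ci)[2]
    return True


def tree_stages(rows):
    """Count reduction stages until only two rows remain (assumed model)."""
    stages = 0
    while rows > 2:
        groups, left = divmod(rows, 4)
        rows = 2 * groups + (2 if left == 3 else left)
        stages += 1
    return stages
```

With this model, `tree_stages(27)` and `tree_stages(28)` both give four stages, consistent with the four addition stages stated above, while 54 unrecoded rows would need five.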

## Circuit and Layout Design

The 4-2 compressor circuits using pass-transistor multiplexers are shown in Fig. 2. Since the pass-transistor multiplexer circuit, shown in Fig. 3, has high logic functionality, a full adder can be constructed from three pass-transistor multiplexers. The 4-2 compressor is constructed from two full adders, and there are four critical-path gate stages, as shown in Fig. 2(a). This circuit is faster than the conventional CMOS circuit. For further speed improvement, we used an improved 4-2 compressor. The number of critical-path gate stages for this circuit is reduced to three by exploiting parallelism, as shown in Fig. 2(b). The simulated delay comparison for these 4-2 compressor circuits is shown in Fig. 4. The proposed circuit reduces the propagation delay time by 18% relative to the full-adder-based circuit. The construction of Wallace's tree is shown in Fig. 5. By using the 4-2 compressor, the construction of Wallace's tree can be simplified.
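A logic-level sketch can show how three 2:1 multiplexers form a full adder and how parallelism shortens the compressor to three multiplexer stages. The gate arrangement below is a commonly used one and is an assumption; the actual transistor-level circuits are those of Figs. 2 and 3, which are not reproduced here. Complemented inputs are assumed to be available, and the names `mux`, `full_adder_mux`, and `compressor_4_2_parallel` are hypothetical.

```python
from itertools import product


def mux(sel, d0, d1):
    """2:1 multiplexer: pass d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def full_adder_mux(a, b, ci):
    """Full adder from three multiplexers (assumed arrangement)."""
    p = mux(a, b, 1 - b)    # a XOR b
    s = mux(p, ci, 1 - ci)  # p XOR ci
    co = mux(p, a, ci)      # a when a == b, else ci
    return s, co


def compressor_4_2_parallel(x1, x2, x3, x4, ci):
    """4-2 compressor with three multiplexer stages on the longest path."""
    p1 = mux(x1, x2, 1 - x2)  # stage 1: x1 XOR x2
    p2 = mux(x3, x4, 1 - x4)  # stage 1, in parallel: x3 XOR x4
    co = mux(p1, x1, x3)      # stage 2: majority of x1, x2, x3
    p = mux(p1, p2, 1 - p2)   # stage 2: XOR of all four inputs
    s = mux(p, ci, 1 - ci)    # stage 3: sum
    c = mux(p, x4, ci)        # stage 3: carry
    return s, c, co


def check_parallel_compressor():
    """Exhaustively verify both circuits against integer addition."""
    for a, b, ci in product((0, 1), repeat=3):
        s, co = full_adder_mux(a, b, ci)
        assert a + b + ci == s + 2 * co
    for x1, x2, x3, x4, ci in product((0, 1), repeat=5):
        s, c, co = compressor_4_2_parallel(x1, x2, x3, x4, ci)
        assert x1 + x2 + x3 + x4 + ci == s + 2 * (c + co)
    return True
```

In the cascaded form, the second full adder must wait for the first adder's sum, giving four stages in series. Computing the two input XORs side by side removes one stage from the longest path in this sketch.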

The carry look-ahead adder (CLA) in the final adder also uses pass-transistor multiplexers, as shown in Fig. 6. We have already reported a new look-ahead carry scheme called conditional carry-selection (CCS) [3]. The 4-bit CLA is constructed from three multiplexers and is faster than the conventional pass-transistor-based design because it avoids series-connected pass transistors in the carry propagation path. To apply this scheme to the final 108-bit adder, the 4-bit CLA is modified to an 8-bit CLA, as shown in Fig. 6. The new 8-bit CLA achieves four critical-path gate stages by exploiting parallelism. It reduces the 108-bit addition time to 1.52 ns.
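The idea of selecting a precomputed carry with a multiplexer can be illustrated by a generic carry-select model. This is not the CCS circuit of [3] or Fig. 6. It is a behavioral sketch that assumes 8-bit blocks, computes each block for both possible carry-in values, and picks one result once the incoming carry is known. The function names are hypothetical.

```python
def add_block(a, b, cin, width):
    """Add two width-bit values with carry-in; return (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, n_bits=108, block=8):
    """Behavioral carry-select addition over fixed-size blocks."""
    result, carry = 0, 0
    for lo in range(0, n_bits, block):
        width = min(block, n_bits - lo)  # last block may be narrower
        mask = (1 << width) - 1
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        # Both candidates are independent of the incoming carry.
        s0, c0 = add_block(a_blk, b_blk, 0, width)
        s1, c1 = add_block(a_blk, b_blk, 1, width)
        # A multiplexer controlled by the incoming carry picks one.
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << lo
    return result, carry
```

In hardware, all block candidates would be evaluated concurrently, so the carry path consists only of the selection multiplexers. The sequential loop here models the logic, not the timing.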

## Fabrication

The chip was fabricated by triple-metal 0.25-µm CMOS technology. Table 1 shows the process technology. The 1st metal is tungsten, and the 2nd and 3rd metals are aluminum. The chip operates from a supply voltage of 2.5 V. Figure 7 shows a micrograph of the chip. A total of 100,200 transistors are integrated in an active area of 3.77 mm X 3.41 mm.

## Evaluation

The simulated multiplication time of the 54 X 54-b multiplier is shown in Fig. 8. The multiplication time is 4.4 ns with a 2.5 V power supply. It shows excellent characteristics at such a low voltage because of the pass-transistor-multiplexer-based design, in which both NMOS and PMOS are turned on. The characteristics of this 54 X 54-b multiplier test chip are summarized in Table 2. The measured waveforms of Wallace's tree are shown in Fig. 9. The measured delay was almost the same as the simulated value.

Figure 10 shows the multiplication time plotted against the device dimensions. The multiplication time with the full-adder-based circuit is estimated to be 5.1 ns. Therefore, this multiplier achieves a 14% improvement in multiplication time due to the use of the new circuits with pass-transistor multiplexers.
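The 14% figure follows directly from the two multiplication times quoted above, taking the estimated full-adder-based time as the reference:

$$
\frac{5.1\,\text{ns} - 4.4\,\text{ns}}{5.1\,\text{ns}} = \frac{0.7}{5.1} \approx 0.137 \approx 14\%
$$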

## Conclusions

A new 4-2 compressor and CLA using pass-transistor multiplexers have been developed to shorten multiplication time. The multiplication time of the 54 X 54-b multiplier is reduced by 14% due to the reduction of the critical-path gate stages by using pass-transistor multiplexers. A 4.4-ns multiplication time was achieved with 0.25-µm CMOS technology.

## Acknowledgments

The authors wish to thank Dr. K. Shimohigashi, Dr. T. Nishimukai, Dr. E. Takeda, and Dr. T. Nagano for their useful discussions, and K. Ueda and K. Takasugi for their assistance and support. The authors are also greatly indebted to T. Nishida, A. Fukami, N. Ohki, and H. Ishida for their assistance with the process technology and device fabrication.

## References

- [1] G. Goto et al., "A 54 X 54-b regularly structured tree multiplier," *IEEE J. Solid-State Circuits*, vol. 27, pp. 1229-1236, September 1992.
- [2] J. Mori et al., "A 10-ns 54 X 54-b parallel structured full array multiplier with 0.5-μm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 26, pp. 600-606, April 1991.
- [3] M. Suzuki et al., "A 1.5ns 32b CMOS ALU in double pass-transistor logic," in 1993 ISSCC Dig. Tech. Papers, pp. 90-91, February 1993.



Fig. 1. Block diagram of the 54 X 54-b multiplier using pass-transistor multiplexer.



Fig. 2. 4-2 compressor circuits using pass-transistor multiplexer:
(a) full-adder-based construction (b) proposed construction.



Fig. 4. Simulated comparison of 4-2 compressor circuits:
(a) full-adder-based construction (b) proposed construction.



P: 8X4 Partial product generators
C: 4-2 Compressor
H: Half adder
F: Full adder

Fig. 5. Construction of Wallace's tree.



Fig. 3. Pass-transistor multiplexer circuit.

Table 1. Process technology

| Parameter             | Value                      |
|-----------------------|----------------------------|
| Technology            | 0.25-µm CMOS, triple metal |
| Gate length           | 0.25 µm                    |
| Gate oxide            | 6.5 nm                     |
| 1st Metal Width/Space | 0.5 µm / 0.4 µm            |
| 2nd Metal Width/Space | 0.5 µm / 0.4 µm            |
| 3rd Metal Width/Space | 0.7 µm / 0.6 µm            |




Fig. 6. 8-bit CLA using pass-transistor multiplexer.



Fig. 7. Micrograph of the 54X54-b multiplier.

Table 2. Characteristics of the 54X54-b multiplier

| Item                | Value              |
|---------------------|--------------------|
| Organization        | 54X54-b multiplier |
| Multiplication time | 4.4 ns             |
| Active Area         | 3.77 mm X 3.41 mm  |
| Transistors         | 100,200            |



Fig. 8. Multiplication time of 54X54-b multiplier.



Fig. 9. Measured waveforms of Wallace's tree.



Fig. 10. 54X54-b multiplication time versus device dimension.