# SURVEY AND EVALUATION OF LOW-POWER FULL-ADDER CELLS

Ahmed Sayed and Hussain Al-Asaad, Department of Electrical & Computer Engineering, University of California, Davis, CA, U.S.A.

#### ABSTRACT

In this paper, we survey various designs of low-power full-adder cells, ranging from conventional CMOS to highly inventive XOR-based designs. We further describe simulation experiments that compare the surveyed full-adder cells. The experiments simulate all combinations of input transitions and thereby determine the delay and power consumption of the various full-adder cells. The simulation results highlight the strengths and weaknesses of the various full-adder cell designs.

**Keywords**: Full-adder cell design, low-power circuits, power and delay estimation, VLSI implementations.

## **1** INTRODUCTION

Low-power circuit design has long been a challenge, and it is now one of the most important goals of today's CMOS designs. Signal processing is one of the most power-hungry applications, and adders are the main building blocks of signal-processing circuits. Saving power in adders can therefore reduce power consumption significantly at the chip level.

Low power can be achieved at four different levels of the design process: the architectural, circuit, device, and layout levels. Power consumption in CMOS digital circuits [1] is divided into three major components as follows:

$$P_{tot} = P_{dynamic} + P_{static} + P_{short\text{-}circuit}$$

where the dynamic component $P_{dynamic}$ is also denoted $P_d$.

The dynamic component is the power consumed when the circuit switches from one state to another. To estimate the worst-case (maximum) power consumption, we need to exhaustively switch the circuit between all of its states. Added to this is the power used to charge and discharge the load capacitance. The load capacitance is identical for all cells and simulations in this paper, so it plays no role in the relative comparison of power consumption. In an actual layout, the load capacitance consists of the routing capacitance and the fanout capacitance; layout effects and their minimization are not considered here. As the characteristic dimension of a technology shrinks, the routing load increasingly dominates the total loading on gates. This is very noticeable in submicron technologies and low-fanout designs.

The static component is due to reverse-bias leakage between the diffusion regions and the substrate. In conventional CMOS, there is no direct path from Vdd to GND in the steady state, so there is no DC current path and, apart from this leakage, the static power consumption is zero.

The short-circuit component is due to the direct path from Vdd to GND while a gate is switching. For example, the finite slope of an inverter's input causes a current spike between Vdd and GND while the pull-up and pull-down transistors are both conducting, resulting in short-circuit power dissipation. The slower the rise and fall times, the larger the current spike and consequently the power dissipated.

Several issues had to be resolved in our simulations. First, because the outputs of some adders do not swing fully rail to rail, they would cause short-circuit power dissipation in the next stage of a larger design. Hence, we included loads in our simulation, two minimum-sized buffer inverters on the sum and carry-out outputs, to account for the power dissipated in the larger design; comparing power consumption without these loads would be unfair. Second, each input is driven by two minimum-sized inverters. This gives the input signal drivers a finite output impedance, which accounts for the loading presented by the input impedances of the different inputs of the different adder circuits. Third, some of the Complementary Pass Logic (CPL) based adders have feedback paths to their inputs, which degrade the full swing of the input signals and cause power dissipation. A driver with finite output impedance captures that effect, whereas the ideal HSPICE voltage sources, which have zero output impedance, would hide the associated power consumption and delay.

The basic dynamic power consumption of a conventional CMOS digital circuit is given by:

$$P_d = \alpha \cdot f \cdot V_{dd}^2 \cdot C_{load}$$

- α: the activity factor, which represents the switching activity of the cell on a probabilistic/statistical basis. It is the same for all simulations of all circuits, so it does not affect the relative power comparison.
- *f*: the switching frequency of the input signals, taken as the maximum input frequency.
- V<sub>dd</sub>: the positive supply voltage.
- C<sub>load</sub>: the load capacitance on the output node, which is the same for all circuits.
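To make the scaling behavior of the dynamic power expression concrete, the following minimal Python sketch evaluates $P_d = \alpha \cdot f \cdot V_{dd}^2 \cdot C_{load}$ at the three input frequencies and the 1.65 V supply used in Section 4. The helper name `dynamic_power` is illustrative, and the activity factor (0.5) and load capacitance (10 fF) are hypothetical values chosen only for illustration; they are not taken from our simulations.

```python
def dynamic_power(alpha: float, f_hz: float, vdd_v: float, c_load_f: float) -> float:
    """Illustrative helper: dynamic CMOS power P_d = alpha * f * Vdd^2 * C_load, in watts."""
    return alpha * f_hz * vdd_v ** 2 * c_load_f


# Hypothetical illustration values (assumptions, not results from this study).
ALPHA = 0.5        # activity factor
C_LOAD = 10e-15    # load capacitance: 10 fF
VDD = 1.65         # supply voltage in volts, as used in Section 4

# P_d grows linearly with the input switching frequency.
for f in (10e6, 25e6, 50e6):
    p_nw = dynamic_power(ALPHA, f, VDD, C_LOAD) * 1e9
    print(f"f = {f / 1e6:.0f} MHz -> P_d = {p_nw:.1f} nW")

# P_d grows quadratically with Vdd: compare a hypothetical 1.2 V supply with 1.65 V.
ratio = dynamic_power(ALPHA, 10e6, 1.2, C_LOAD) / dynamic_power(ALPHA, 10e6, VDD, C_LOAD)
print(f"P_d(1.2 V) / P_d(1.65 V) = {ratio:.3f}")
```

Because V<sub>dd</sub> enters squared, lowering the supply from 1.65 V to 1.2 V reduces dynamic power to about 0.53 of its original value regardless of α, *f*, and C<sub>load</sub>, which is why supply-voltage reduction at the device level is so effective.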

At the device level, reducing the positive supply voltage V<sub>dd</sub>, and reducing the threshold voltage accordingly, would reduce the power consumption significantly. At the layout level, several techniques can be used, including smaller transistors, smaller poly and diffusion areas, and shorter metal lines for connections between devices. These mainly reduce the loading, i.e., the parasitic capacitances in different parts of the device and circuit. At the circuit level, a different logic style for implementing the required function, such as CPL instead of traditional CMOS, can reduce area and consequently power. At the architectural level, an algorithm that requires fewer or smaller gates can be used to reduce the overall power consumption.

## **2** THE CELLS USED

In our study, we used ten recently published full-adder cells, ranging from conventional CMOS to highly inventive XOR-based adders. The adders are shown in Figure 1, Figure 2, and Figure 3.

In our simulations, the transistors of the adders were not sized in any way. As a result, Adder10 failed some of the measurements, especially with the selected loads. It failed so often that it could not be included in our comparison; Adder10 clearly needs transistor sizing to guarantee correct functionality.

We believe that a fair comparison of power and delay requires leaving all transistors unsized, even at the cost of overlooking some failures. Adder10 is therefore excluded from all further results and analysis.

Our simulations show glitches on some signals, which are inevitable in any combinational circuit. Removing these glitches usually involves a trade-off: on one hand, glitches can cause extra power consumption in the output buffers or following stages; on the other hand, adding gates to remove the hazards or glitches, where they are removable, can itself increase power consumption. The right choice therefore depends on the system requirements and design constraints. The system may be area-, timing-, or power-limited, and it may or may not tolerate glitches, depending on the application.

Regarding transistor count, Table 1 shows the number of transistors (T) in each full-adder cell. It is worth noting that the smaller the area, the better the utilization of silicon real estate. This is especially true if the design relies on many instances of the base cell.
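Using transistor count as a first-order proxy for area (an assumption: actual area also depends on sizing and layout, which are not considered here), the counts in Table 1 can be ranked with a few lines of Python. The dictionary simply transcribes Table 1, and the helper name `rank_by_transistor_count` is illustrative.

```python
# Transistor counts (T) transcribed from Table 1; Adder10 is excluded from the study.
TRANSISTOR_COUNT = {1: 12, 2: 14, 3: 16, 4: 20, 5: 12, 6: 10, 7: 10, 8: 16, 9: 15}


def rank_by_transistor_count(counts: dict[int, int]) -> list[int]:
    """Illustrative helper: adder numbers from fewest to most transistors, ties broken by adder number."""
    return sorted(counts, key=lambda adder: (counts[adder], adder))


print(rank_by_transistor_count(TRANSISTOR_COUNT))  # [6, 7, 1, 5, 2, 9, 3, 8, 4]
```

Adders 6 and 7 are the smallest at 10 transistors each, and Adder4 is the largest at 20.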

## **3** SIMULATION SETUP

In our simulation setup, we use five buffers in addition to each adder cell, as shown in Figure 4. The buffer delay and power consumption are indicators of the adders' driving capability as well as of their input impedance. We assume that the buffers add a common offset to the power consumption and delay of all adders, which is a reasonable general approximation. Delay and power consumption are measured from the primary inputs, before the input buffers, to the outputs after the output buffers. For relative comparison, including the buffers' delay and power consumption reveals system-level properties of the different adders that would not show up if the adders were simulated without these buffers, or if the delay and power measurements did not take the buffers into account.

Table 1 Transistor count for various full-adder cells.

| Adder # | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---------|---|---|---|---|---|---|---|---|---|
| T | 12 | 14 | 16 | 20 | 12 | 10 | 10 | 16 | 15 |

Figure 1 Full-adder cells: 1 to 4.

Figure 2 Full-adder cells: 5 to 8.

Figure 3 Full-adder cells: 9 to 10.

To guarantee the same activity factor for all cells and to find the worst-case power consumption and delay, we simulated every unique transition from each input combination to every other combination (8 × 7 = 56 transitions). This identifies the specific input transition that causes the worst-case delay or the worst-case power consumption. The following input sequence, in which each value is the decimal encoding of the three input bits, was applied to the full-adder cells:

<0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 1, 2, 1, 3, 1, 4, 1, 5, 1, 6, 1, 7, 2, 3, 2, 4, 2, 5, 2, 6, 2, 7, 3, 4, 3, 5, 3, 6, 3, 7, 4, 5, 4, 6, 4, 7, 5, 6, 5, 7, 6, 7, 0>
The previous input sequence leads to the following corresponding sum and carry-out (cout) outputs:

Sum: <0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0>

cout: <0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0>

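The output sequences above contain 16 rising and 16 falling transitions on the sum output, and likewise 16 rising and 16 falling transitions on the carry-out (cout). This follows because each output is 1 for exactly four of the eight input combinations, so 4 × 4 = 16 of the 56 transitions raise it and another 16 lower it.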
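The stimulus and its expected outputs can be regenerated and checked with a short Python sketch. The generator below is one compact way (not necessarily how the stimulus was originally produced) to emit the input sequence listed above: for each value i it alternates i with every larger value j, and then the sequence returns to 0. The sketch verifies that all 56 ordered transitions occur exactly once, computes the ideal full-adder sum (parity) and carry-out (majority) for every input, and counts the rising and falling edges on each output. All helper names are illustrative.

```python
def input_sequence() -> list[int]:
    """Illustrative helper: input vectors 0..7 (decimal encoding of the three input bits)."""
    seq = []
    for i in range(8):
        for j in range(i + 1, 8):
            seq += [i, j]  # alternate i with each larger j
    seq.append(0)  # close the sequence with the 7 -> 0 transition
    return seq


def full_adder(v: int) -> tuple[int, int]:
    """Ideal (sum, cout); both are symmetric functions, so bit order does not matter."""
    ones = bin(v).count("1")
    return ones % 2, int(ones >= 2)


def count_edges(wave: list[int]) -> tuple[int, int]:
    """Return (rising, falling) edge counts of a 0/1 waveform."""
    pairs = list(zip(wave, wave[1:]))
    return pairs.count((0, 1)), pairs.count((1, 0))


seq = input_sequence()
transitions = list(zip(seq, seq[1:]))
all_pairs = {(a, b) for a in range(8) for b in range(8) if a != b}
assert len(transitions) == len(all_pairs) == 56
assert set(transitions) == all_pairs  # with equal lengths, each transition occurs exactly once

sum_wave = [full_adder(v)[0] for v in seq]
cout_wave = [full_adder(v)[1] for v in seq]
print("sum  (rise, fall):", count_edges(sum_wave))   # (16, 16)
print("cout (rise, fall):", count_edges(cout_wave))  # (16, 16)
```

Since any stimulus that exercises every ordered transition needs at least 56 transitions, this 57-vector sequence is as short as possible: it traverses each edge of the complete directed graph on the eight input states exactly once.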

Because different inputs have different delay paths, and because the inputs have finite rise and fall times, glitches may occur that are specific to each circuit and to the layout of each delay path. In addition, one or more inputs of a circuit may present a larger input impedance than its other inputs. We did not take glitches into account in the delay measurements, but they inevitably affect the power consumption measurements. Since the input rise and fall times are a cause of both glitches and short-circuit power consumption, power measurements also differ for different input slopes.

## **4** SIMULATION AND COMPARISON

### 4.1 Power

The input combinations described in Section 3 were used for both the power and the delay measurements. All of the following simulations were based on process-skew (deviation) analysis and modeling at a temperature of 25 °C and a supply voltage Vdd of 1.65 V, using a 0.09 µm process technology. All power and delay measurements were obtained from HSPICE simulations.

We first consider the power consumption of the different adders at different rise and fall times, and then at different frequencies. The frequencies were not chosen for any specific reason, and, as expected, the effect of frequency on power consumption and delay is well behaved. We compare the full-adder cells under the following scenarios:

- Frequency = 10 MHz and rise/fall time = 3 ns.
- Frequency = 10 MHz and rise/fall time = 1 ns.
- Frequency = 25 MHz and rise/fall time = 1 ns.
- Frequency = 50 MHz and rise/fall time = 1 ns.

Figure 5 shows the maximum power consumption of the adders over the input combinations and scenarios outlined above. Adders 1 and 7 exhibit a lower maximum power consumption than the rest of the adders. In addition, across the different frequencies and rise and fall times, the maximum power consumption of Adder1 seems to be dominated by other factors that are not highly sensitive to frequency or to rise and fall times.

Looking at the average power consumption in Figure 5, we notice that Adder5 has the worst average power consumption, while adders 1, 3, 4, 8, and 9 show a balanced average power consumption over the different frequencies and rise and fall times. This can be explained as follows: Figure 5 shows that several adders have a high maximum but a low average power consumption, which indicates that these adders are not balanced across input combinations; their power consumption depends strongly on the input combination. Balancing these adders would require transistor sizing, which in turn would increase their maximum power consumption but balance their average power consumption.

### 4.2 Delay

We compare the full-adder cells under the same scenarios described in Section 4.1, in terms of the average and maximum delay from the primary inputs to the final outputs, including the buffer delays.





Figure 5 Maximum and average power consumption in the various full-adder cells.

Figure 6 shows that adders 2, 5, and 6 exhibit the worst delay. How the delay varies with frequency depends strongly on how the slowest path varies with frequency; this is very noticeable for Adder7. This dependence can be reduced via transistor sizing.

The average delays of the different adders are also shown in Figure 6. Most adders exhibit similar behavior, with Adder7 being the best. Again, a large difference between the average and maximum delay means that an adder has many different delay paths, one of which is the worst and pushes the maximum well above most of the other paths.

Figure 6 Maximum and average delay in the various full-adder cells.

### 4.3 Power^2\*Delay Criteria

Most standard-cell libraries are designed with a specific optimization goal. One of the best-known optimization criteria is "power^2\*delay". In this section, we compare this metric across all adders to gain more insight into the advantages of the different adders.
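The metric itself is simple to compute. The minimal Python sketch below assumes, since the exact reduction procedure is not spelled out here, that power^2\*delay is evaluated per input transition and then reduced to a maximum and an average. The (power, delay) pairs are hypothetical placeholders, not measurements from Figures 5 to 7, and the helper name `power2_delay` is illustrative.

```python
def power2_delay(power_w: float, delay_s: float) -> float:
    """Illustrative helper: power^2 * delay figure of merit in W^2*s (lower is better)."""
    return power_w ** 2 * delay_s


# Hypothetical (power in W, delay in s) pairs for three input transitions of one adder.
measurements = [(2.0e-6, 150e-12), (1.5e-6, 200e-12), (3.0e-6, 120e-12)]

# Assumption: evaluate the metric per transition, then take the maximum and the average.
metric = [power2_delay(p, d) for p, d in measurements]
max_metric = max(metric)
avg_metric = sum(metric) / len(metric)
print(f"max = {max_metric:.3e} W^2*s, average = {avg_metric:.3e} W^2*s")
```

Because power enters squared, halving an adder's power improves the metric by a factor of four, whereas halving its delay improves it only by a factor of two, so this criterion favors low-power cells.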





Figure 7 Maximum and average "Power^2\*delay" in the various full-adder cells.

In Figure 7, we notice that adders 1 and 7 show better maximum power^2\*delay behavior than most other adders. The average power^2\*delay values also reveal a strong effect of the different delay paths within the adders and of the sensitivity of the adders' power consumption to the input combinations. It is worth noting that if these delay paths were balanced, the adders' performance would be much better.

## **5** SUMMARY

In this paper, we have surveyed various full-adder cell designs from recently published research. We have described a fair simulation experiment that compares the full-adder cells in terms of transistor count, power, and delay.

Interesting areas for future research include studying the effects of temperature, supply voltage, process corner, and transistor sizing on delay and power consumption.

#### REFERENCES

- [1] N. Weste and K. Eshraghian, *Principles of CMOS VLSI Design, A Systems Perspective*, Second Edition, Addison-Wesley Pub., 1994.
- [2] A. Sayed and M. Bayoumi, "A new low power building block cell for adders", Proc. *Midwest Symposium on Circuits and Systems*, 1997, pp. 818-822.
- [3] S.-C. Fang, J.-M. Wang, and W. S. Feng, "A new design for three-input XOR function on transistor level", *IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications*, Vol. 43, pp. 343-348, April 1996.
- [4] E. Abu-Shama *et al.*, "An efficient low power basic cell for adders", Proc. *Midwest Symposium on Circuits and Systems*, 1995, pp. 306-309.
- [5] E. Abu-Shama and M. Bayoumi, "A new cell for low power adders", Proc. *International Symposium on Circuits and Systems*, 1996, pp. 49-52.
- [6] H.A. Mahmoud and M. Bayoumi, "A 10-transistor low-power high-speed full adder cell", Proc. *International Symposium on Circuits and Systems*, 1999, pp. 43-46.
- [7] L. Junming *et al.*, "A novel 10-transistor low-power high-speed full adder cell", Proc. *International Conference on Solid-State and Integrated Circuit Technology*, 2001, pp. 1155-1158.
- [8] A. Fayed and M. Bayoumi, "A low power 10-transistor full adder cell for embedded architectures", Proc. *International Symposium on Circuits and Systems*, 2001, pp. 226-229.
- [9] A. Shams and M. Bayoumi, "A novel high-performance CMOS 1-Bit full-adder cell", *IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing*, Vol. 47, pp. 478-481, May 2000.
- [10] A. Shams and M. Bayoumi, "A new full adder cell for low-power applications", Proc. *Great Lakes Symposium on VLSI*, 1998, pp. 45-49.
- [11] R. Shalem, E. John, and L.K. John, "A novel low power energy recovery full adder cell", Proc. *Great Lakes Symposium on VLSI*, 1999, pp. 380-383.