# A New Method for Power Estimation and Optimization of Combinational Circuits

Ahmed Sammy Aldeen, Intel Corporation, Folsom, CA.

Hussain Al-Asaad, University of California, Davis, CA.

Abstract — One of the challenges of low-power methodologies for digital systems is reducing power consumption without compromising performance. In this paper, we propose a new method for estimating dynamic power consumption in combinational circuits. The method enables us to optimize the power consumption of typical combinational circuits.

*Index Terms* — **Power estimation, power optimization, low-power design, combinational circuits.**

## I. INTRODUCTION

To address power estimation and optimization, we revisit the basic CMOS power consumption equations. There are three major sources of power consumption in a digital CMOS circuit, summarized in the following equation:

$$P_{total} = p_t(C_L V V_{dd} f_{clk}) + I_{sc} V_{dd} + I_{leakage} V_{dd}$$
(1)

The first term in Equation (1) represents the switching component of power, where $C_L$ is the effective switched load capacitance, $f_{clk}$ is the clock frequency, and $p_t$ is the probability that a power-consuming transition occurs. In most cases, the voltage swing V is the same as the supply voltage $V_{dd}$.

The second term in Equation (1) is caused by the direct-path short-circuit current $I_{sc}$, which arises when both the NMOS and PMOS transistor networks are simultaneously on, conducting current from the supply $V_{dd}$ to ground. The third term is the leakage power, a factor that is growing in importance as deep-submicron technologies develop. Its main cause is the leakage current $I_{leakage}$, which can arise from substrate injection, gate leakage, and sub-threshold effects. $I_{leakage}$ is primarily determined by the CMOS fabrication process technology.

All nodes of a circuit contribute to its total power consumption, so Equation (1) should be applied to every node individually. This is where the notion of transition density helps [1].

The dominant term in a well-designed CMOS circuit is the switching component. The low-power design goal thus becomes the task of minimizing $p_t(C_L V V_{dd} f_{clk})$ while retaining the required functionality and identifying the cost of such minimizations in terms of area and/or performance.
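As a quick numerical illustration of Equation (1), the sketch below evaluates the three terms for a single node. The function name `total_power` and every operating-point value are hypothetical, chosen only to show the arithmetic; none of them come from the experiments reported in this paper.

```python
def total_power(p_t, c_load, v_swing, v_dd, f_clk, i_sc, i_leak):
    """Evaluate the three terms of Equation (1); all results are in watts."""
    switching = p_t * c_load * v_swing * v_dd * f_clk   # p_t (C_L V V_dd f_clk)
    short_circuit = i_sc * v_dd                         # I_sc V_dd
    leakage = i_leak * v_dd                             # I_leakage V_dd
    return switching, short_circuit, leakage


# Hypothetical operating point: 10 fF load, rail-to-rail 1 V swing, 1 GHz clock.
sw, sc, lk = total_power(p_t=0.5, c_load=10e-15, v_swing=1.0, v_dd=1.0,
                         f_clk=1e9, i_sc=0.2e-6, i_leak=0.1e-6)
total = sw + sc + lk
print(f"switching={sw:.3e} W  short-circuit={sc:.3e} W  leakage={lk:.3e} W")
print(f"switching share of total: {sw / total:.1%}")
```

With numbers of this kind the switching term dominates, which is the situation the optimization goal above assumes.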

## II. BASIC CALCULATIONS

In this paper, we assume Strict Sense Stationary (SSS) mean-ergodic 0-1 processes to model the variety of logic waveforms that may be applied at the primary inputs of a digital circuit [1]. We further assume that the processes at the circuit primary inputs are mutually independent. A digital circuit can be thought of as a nonlinear but time-invariant system that operates on its input waveforms to produce its internal waveforms and outputs.

If x(t) is Strict Sense Stationary [1] and mean ergodic, then the probability that x(t) is at the logic-high level equals its time average over the interval (-T/2, T/2] as T grows without bound:

$$P(x) = E[x(t)] = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x(t)\, dt$$
(2)

A measure of switching activity, called the "Transition Density" [1] and denoted D(x), is defined as the average switching rate at a circuit node. Its definition, and the way it is propagated through a circuit from the primary inputs to the outputs, are outlined below.

We consider estimating the average power of a CMOS gate. If the gate has an output capacitance C to ground and its output is a simple clock signal of frequency f, then the average power dissipated is $\frac{1}{2} C V_{dd}^2 f$, where $V_{dd}$ is the supply voltage. In general, a node in a logic circuit may not carry a periodic signal, so the power is computed as follows. If $n_x(T)$ is the number of transitions of a signal x(t) in the time interval (-T/2, T/2], then the average power is:

$$\begin{aligned}
P_{av} &= \lim_{T \to \infty} \frac{V_{dd}\, C V_{dd}\, n_x(T)}{2T} \\
&= \frac{1}{2} C V_{dd}^2 \left\{ \lim_{T \to \infty} \frac{n_x(T)}{T} \right\} \\
&= \frac{1}{2} C V_{dd}^2 D(x)
\end{aligned}$$
(3)

If x(t) is Strict Sense Stationary [1] and mean ergodic, then P(x) is as stated in Equation (2), and

$$D(x) = \lim_{T \to \infty} n_x(T)/T$$
(4)
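Equations (2) to (4) can be approximated over a finite observation window. The sketch below is one way to do this for a piecewise-constant 0/1 waveform; the helper names `waveform_stats` and `average_power`, the event-list representation, and the example values are assumptions made for illustration. A finite window only approximates the limits as T grows.

```python
def waveform_stats(times, levels, t_start, t_end):
    """Estimate P(x) and D(x) for a piecewise-constant 0/1 waveform.

    levels[i] is the value taken at times[i] and held until times[i + 1]
    (or until t_end for the last entry). Assumes times[0] == t_start.
    """
    duration = t_end - t_start
    high_time = 0.0
    transitions = 0
    for i, (t, level) in enumerate(zip(times, levels)):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        if level == 1:
            high_time += t_next - t          # finite-window version of Equation (2)
        if i > 0 and level != levels[i - 1]:
            transitions += 1                 # contributes to n_x(T)
    return high_time / duration, transitions / duration


def average_power(capacitance, v_dd, density):
    """Equation (3): P_av = 1/2 * C * V_dd^2 * D(x)."""
    return 0.5 * capacitance * v_dd ** 2 * density


# Hypothetical 10 ns window with four transitions.
times = [0e-9, 2e-9, 3e-9, 7e-9, 8e-9]
levels = [0, 1, 0, 1, 0]
p_x, d_x = waveform_stats(times, levels, 0e-9, 10e-9)
print(f"P(x) ~ {p_x:.2f}, D(x) ~ {d_x:.3e} transitions/s")
print(f"P_av ~ {average_power(10e-15, 1.0, d_x):.3e} W")
```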

If the probabilities and transition densities of the primary inputs are given, then they can be propagated throughout the circuit to all internal nodes and outputs.

We begin by recalling the definition of the Boolean difference. If y is a Boolean function that depends on x, then the Boolean difference of y with respect to x is defined as:

$$\partial y/\partial x = y |_{x=1} \oplus y |_{x=0} = y(x) \oplus y(x')$$

If $y = f(x_1, x_2, ..., x_n)$ and the inputs are mutually independent, then the transition density at y is [1]:

$$D(y) = \sum_{i=1}^{n} P(\partial y / \partial x_{i}) D(x_{i})$$
(5)
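For small gates, Equation (5) can be checked by brute-force enumeration of the truth table. The following is a minimal sketch under the independence assumption stated above; the function names are hypothetical, and the enumeration is exponential in the number of inputs, so it is meant for checking formulas rather than for large functions.

```python
from itertools import product


def boolean_difference_probability(func, n_inputs, index, probs):
    """P(dy/dx_index): probability that toggling input `index` toggles y."""
    others = [i for i in range(n_inputs) if i != index]
    total = 0.0
    for bits in product((0, 1), repeat=len(others)):
        assignment = [0] * n_inputs
        weight = 1.0
        for i, b in zip(others, bits):
            assignment[i] = b
            weight *= probs[i] if b else 1.0 - probs[i]
        assignment[index] = 1
        y1 = bool(func(*assignment))
        assignment[index] = 0
        y0 = bool(func(*assignment))
        if y1 != y0:                 # Boolean difference is 1 for this assignment
            total += weight
    return total


def transition_density(func, probs, densities):
    """Equation (5): D(y) = sum_i P(dy/dx_i) * D(x_i)."""
    n = len(probs)
    return sum(boolean_difference_probability(func, n, i, probs) * densities[i]
               for i in range(n))


# For XOR the Boolean difference is always 1, so the input densities simply add.
print(transition_density(lambda a, b: a ^ b, [0.5, 0.5], [2.0, 3.0]))
```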

By Shannon's expansion:

$$y = x_1 f_{x_1} + x_1' f_{x_1'}$$
(6)

Then, since the two terms are mutually exclusive and the inputs are independent,

$$P(y) = P(x_1 f_{x_1}) + P(x_1' f_{x_1'}) = P(x_1) P(f_{x_1}) + P(x_1') P(f_{x_1'})$$
(7)
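Applying Equation (7) recursively, one input at a time, gives the signal probability of any function of independent inputs. The sketch below illustrates this; `signal_probability` is a hypothetical helper, and a practical tool would use BDDs or per-gate closed forms rather than full recursion.

```python
def signal_probability(func, probs, fixed=()):
    """P(y) by recursive Shannon expansion, assuming independent inputs.

    `fixed` holds the values already assigned to x_1 .. x_k; the next input
    is expanded as P(x) * P(f | x=1) + (1 - P(x)) * P(f | x=0).
    """
    k = len(fixed)
    if k == len(probs):                       # all inputs assigned: y is constant
        return 1.0 if func(*fixed) else 0.0
    p = probs[k]
    return (p * signal_probability(func, probs, fixed + (1,))
            + (1.0 - p) * signal_probability(func, probs, fixed + (0,)))


def majority(a, b, c):
    return (a + b + c) >= 2


print(signal_probability(majority, [0.5, 0.5, 0.5]))
print(signal_probability(lambda a, b: a or b, [0.3, 0.4]))
```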

To illustrate, consider the example of a 3-input AND gate with independent inputs and output $y = x_1 x_2 x_3$. The probability and transition density at the output y can be calculated as follows:

$$P(y) = P(x_1) P(x_2) P(x_3)$$
(8)

$$\begin{aligned}
D(y) &= P(\partial y / \partial x_1) D(x_1) + P(\partial y / \partial x_2) D(x_2) + P(\partial y / \partial x_3) D(x_3) \\
&= P(x_2 x_3) D(x_1) + P(x_1 x_3) D(x_2) + P(x_1 x_2) D(x_3) \\
&= P(x_2) P(x_3) D(x_1) + P(x_1) P(x_3) D(x_2) + P(x_1) P(x_2) D(x_3)
\end{aligned}$$
(9)

Thus, given the probabilities and transition densities of the primary inputs, we can calculate both P and D at each gate output in the circuit. Binary Decision Diagrams could be used for this propagation. Alternatively, functions can be derived for each type of gate that exists in the circuit or library, for each number of inputs. We follow the latter approach in our research.
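The per-gate approach can be sketched as a small table of closed-form functions applied in topological order. The code below is an illustrative assumption rather than the script used in this work: the gate formulas follow from Equations (5) and (7), the netlist format and names are hypothetical, and every gate input is treated as independent, which ignores reconvergent fanout.

```python
from math import prod


def and_gate(p, d):
    # P(y) = prod P(x_i); P(dy/dx_i) = product of the other inputs' probabilities.
    return prod(p), sum(prod(p[:i] + p[i + 1:]) * d[i] for i in range(len(p)))


def or_gate(p, d):
    # P(y) = 1 - prod (1 - P(x_i)); P(dy/dx_i) = all other inputs are low.
    q = [1.0 - pi for pi in p]
    return 1.0 - prod(q), sum(prod(q[:i] + q[i + 1:]) * d[i] for i in range(len(p)))


def nand_gate(p, d):
    # Complementing the output flips P(y) but leaves the density unchanged.
    p_and, d_and = and_gate(p, d)
    return 1.0 - p_and, d_and


def inv_gate(p, d):
    return 1.0 - p[0], d[0]


GATES = {"AND": and_gate, "OR": or_gate, "NAND": nand_gate, "INV": inv_gate}


def propagate(netlist, prob, dens):
    """netlist: list of (output, gate_type, [inputs]) in topological order."""
    prob, dens = dict(prob), dict(dens)
    for out, gate, ins in netlist:
        p = [prob[name] for name in ins]
        d = [dens[name] for name in ins]
        prob[out], dens[out] = GATES[gate](p, d)
    return prob, dens


# Hypothetical three-gate circuit; inputs use probability and density 0.5.
netlist = [("n1", "NAND", ["a", "b"]), ("n2", "OR", ["n1", "c"]), ("y", "INV", ["n2"])]
inputs = {"a": 0.5, "b": 0.5, "c": 0.5}
prob, dens = propagate(netlist, inputs, inputs)
for node in ("n1", "n2", "y"):
    print(node, prob[node], dens[node])
```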

## III. EXPERIMENTS AND NEW APPROACH

All simulation experiments were performed on gate-level netlists in Design Compiler 2004.12-SP1 [8]. We used the default input static probability and toggle rate of 0.5, and a custom wire load model (CWLM) [8] for the power calculations.

We wrote our own script that uses the methodology described above to estimate the dynamic power consumption of combinational circuits. The script is not as accurate as Design Compiler because of the assumptions made. To minimize the squared error between the estimated data and the experimental data, we propose to use a best-fit model. This approach speeds up power estimation, captures higher-level correlations among the variables considered, and absorbs variables that may affect the outcome of the experiments but are not directly controllable. For example, the best-fit model for the NAND gates of the ISCAS-85 [7] C2670 circuit is shown in Figure 1. Models for other gate types can be developed similarly. Dynamic power results from Design Compiler were obtained at the slow corner of a 90 nm technology, at 100 °C and 1 V.



Figure 1. NAND gates dynamic power best fitted curve.

The benefit of the best-fit model is that it can be reused to estimate the dynamic power of other circuits in the same technology. This implies a new methodology in which the best-fit model is found from one or two experiments in the targeted technology.
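The fitting step can be illustrated with an ordinary least-squares fit. The functional form of the best-fit model is not specified here, so the sketch below assumes the simplest case, a straight line mapping the script's estimate to the reference power. The names `fit_line` and `predict` and all data values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept minimizing sum((y - (a*x + b))^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    slope, intercept = model
    return [slope * xi + intercept for xi in x]


# Made-up calibration data: script estimates vs. reference tool results (uW).
estimated = [1.0, 2.0, 3.0, 4.0, 5.0]
reference = [1.3, 2.1, 3.4, 4.2, 5.5]
model = fit_line(estimated, reference)

# Reuse the fitted model on estimates from another circuit in the same technology.
print(model)
print(predict(model, [1.5, 2.5]))
```

A separate model would be fitted per gate type, matching the per-gate curves described above.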

The flowchart in Figure 2 shows the proposed methodology for predicting the dynamic power consumption of any gate in any circuit for a specific technology.

## IV. MODEL ADEQUACY AND ACCURACY

When we compare two sets of data, one measured and one produced by a model, we first compare their means and variances. Figures 3 and 4 compare the means and variances of the best-fit models with those of the Design Compiler runs.

Furthermore, we investigated the correlation coefficients for four types of gates. The very high correlation coefficients in Table I indicate the adequacy of the best-fit model.



Figure 2. Methodology for dynamic power estimation.



Figure 3. Mean comparison.



Figure 4. Variance comparison.

TABLE I

Correlation coefficients for the four gate types.

| Gate | Correlation |
|------|-------------|
| NAND | 0.972486942 |
| INV  | 0.997509972 |
| OR   | 0.977294443 |
| AND  | 0.95985257  |
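The comparisons behind Figures 3 and 4 and Table I reduce to three statistics per gate type. A minimal sketch of how they could be computed is given below; the data values are made up and the `pearson` helper is hypothetical, written out by hand so the formula is visible.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Made-up per-node powers (uW): reference tool vs. best-fit model.
measured = [1.3, 2.1, 3.4, 4.2, 5.5]
modelled = [1.2, 2.3, 3.3, 4.3, 5.4]

print("means:      ", mean(measured), mean(modelled))
print("variances:  ", variance(measured), variance(modelled))
print("correlation:", pearson(measured, modelled))
```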

Residuals were investigated to reveal any inadequacy in the models used. Figure 5 shows no dependency between the actual simulation results and the errors produced by the model calculations. This indicates that the proposed models adequately represent the simulation results.



Figure 5. NAND gates residuals.

We also investigated the coefficient of determination, $r^2$, whose high values indicate a strong relationship between the independent and dependent variables. We then used the F statistic to determine whether these results occurred by chance. Both were computed with Microsoft Excel's built-in functions.

The F values, the corresponding Fdist values, and $r^2$ for each gate type are shown in Table II. The F values are large, and the Fdist values, which give the probability of obtaining an F value this high by chance, are extremely small. It is therefore extremely unlikely that the fitted relationships occurred by chance, which supports the adequacy of the models.
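For readers who want to reproduce these statistics outside a spreadsheet, the sketch below computes $r^2$ and the regression F statistic with the standard formulas. It assumes a least-squares fit with an intercept. The number of samples and predictors behind Table II is not stated, so the values used here are purely illustrative, and the tail probability reported as Fdist is not computed.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def f_statistic(r2, n_samples, n_predictors):
    """Regression F statistic and its (model, residual) degrees of freedom."""
    df_model = n_predictors
    df_resid = n_samples - n_predictors - 1
    return (r2 / df_model) / ((1.0 - r2) / df_resid), df_model, df_resid


# Made-up data: reference values and fitted-model predictions.
actual = [1.3, 2.1, 3.4, 4.2, 5.5]
predicted = [1.2, 2.3, 3.3, 4.3, 5.4]
r2 = r_squared(actual, predicted)
print(r2, f_statistic(r2, n_samples=len(actual), n_predictors=1))
```

The larger F is for the given degrees of freedom, the smaller the probability that the fit arose by chance.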

We used the proposed methodology to estimate the dynamic power consumption in the ISCAS-85 [7] C7552 circuit. We chose three nodes for each gate type and obtained the correlation coefficients shown in Table III.

TABLE II

Statistics for different gate types.

| Gate | $r^2$    | F        | Fdist    |
|------|----------|----------|----------|
| NAND | 9.46E-01 | 3.31E+01 | 6.60E-10 |
| INV  | 9.95E-01 | 3.60E+02 | 1.31E-18 |
| OR   | 9.55E-01 | 3.62E+01 | 1.77E-09 |
| AND  | 9.21E-01 | 2.22E+01 | 6.88E-08 |

TABLE III

C7552 nodes correlation coefficients.

| Gate | Correlation Coefficient |
|------|-------------------------|
| NAND | 0.99904762              |
| OR   | 0.94790048              |
| AND  | 0.99148311              |

## V. LOW POWER CIRCUIT OPTIMIZATIONS

After establishing model representations of the power consumption data for different nodes in the ISCAS-85 circuits, we investigated the highest power-consuming nodes of each circuit in view of Equation (3). Several experiments were conducted, and these led to a basic set of transformations that resulted in significant power savings. This set of basic transformations is a direct application of De Morgan's Laws. The transformations take advantage of the permissible functions at a circuit node to reduce capacitance [2, 3, 4, 5, 6]. One transformation from this set is illustrated in Figure 6. Another example is merging similar gates into a single gate of the same type with more inputs.



Figure 6. Bubbled AND to NOR power transformation.
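Both kinds of transformation preserve the logic function, which can be confirmed by exhaustive truth-table comparison. The sketch below checks the De Morgan identity behind Figure 6 (an AND gate with inverted inputs equals a NOR gate) and the merging of two cascaded 2-input AND gates into one 3-input AND gate. The function names are hypothetical, and the check covers only functional equivalence, not the resulting capacitance or power.

```python
from itertools import product


def equivalent(f, g, n_inputs):
    """True if f and g agree on every input combination."""
    return all(bool(f(*bits)) == bool(g(*bits))
               for bits in product((0, 1), repeat=n_inputs))


def bubbled_and(a, b):
    return (not a) and (not b)       # AND gate with inverted ("bubbled") inputs


def nor(a, b):
    return not (a or b)


def cascaded_and(a, b, c):
    return (a and b) and c           # two 2-input AND gates in series


def and3(a, b, c):
    return a and b and c             # single 3-input AND gate


print(equivalent(bubbled_and, nor, 2))
print(equivalent(cascaded_and, and3, 3))
```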

These transformations are applied only to non-critical paths, so there is no performance penalty. Applying them to different ISCAS-85 [7] circuits resulted in a significant reduction in power consumption, as illustrated in Figure 7.

## VI. CONCLUSIONS

We developed and qualified fast best-fit models for power consumption estimation. We used these models to optimize the circuit nodes that consume high power and applied the new methods to several ISCAS-85 [7] circuits.





## REFERENCES

- [1] F. Najm, "Transition density, a stochastic measure of activity in digital circuits", *Proc. Design Automation Conference*, 1991, pp. 644-649.
- [2] R. Panda and F. Najm, "Technology decomposition for low-power synthesis", *Proc. Custom Integrated Circuits Conference*, 1995, pp. 627-630.
- [3] S. Muroga, Y. Kambayashi, H. C. Lai and J. N. Culliney, "The transduction method - design of logic networks based on permissible functions", *IEEE Transactions on Computers*, Vol. 38, No. 10, pp. 1404-1424, Oct. 1989.
- [4] I. Bahar and F. Somenzi, "Boolean techniques for low power driven re-synthesis", *Proc. International Conference on Computer-Aided Design*, Nov. 1995, pp. 428-432.
- [5] B. Rohfleisch, A. Kolbel and B. Wurth, "Reducing power dissipation after technology mapping by structural transformations", *Proc. Design Automation Conference*, June 1996, pp. 789-794.
- [6] Q. Wang and S. Vrudhula, "Multi-level logic optimization for low power using local logic transformations", *Proc. International Conference on Computer-Aided Design*, 1996, pp. 270-277.
- [7] F. Brglez and H. Fujiwara, "A neutral netlist of 10 combinational benchmark circuits and a target translator in Fortran", *Proc. International Symposium on Circuits and Systems*, 1985, pp. 695-698.
- [8] Design Compiler and Prime Power user guides from www.synopsys.com.