# Towards a High-Level Power Estimation Capability

Mahadevamurty Nemani, Student Member, IEEE, and Farid N. Najm, Member, IEEE

Abstract—We will present a power estimation technique for digital integrated circuits that operates at the register transfer level (RTL). Such a high-level power estimation capability is required in order to provide early warning of any power problems before the circuit-level design has been specified. With such early warning, the designer can explore design trade-offs at a higher level of abstraction than previously possible, reducing design time and cost. Our estimator is based on the use of entropy as a measure of the average activity to be expected in the final implementation of a circuit, given only its Boolean functional description. This technique has been implemented and tested on a variety of circuits. The empirical results to be presented are very promising and demonstrate the feasibility and utility of this approach.

#### I. INTRODUCTION

THE HIGH device count and operating frequency of modern integrated circuits has led to unacceptably high levels of chip power consumption. Modern microprocessors have power consumption specifications that can easily exceed 30 W. Due to limited battery life, high power consumption is a major problem in the design of portable or mobile electronics. Even in line-powered equipment, such high power levels require expensive packages and heat-sinks. Thus, there is a need for CAD tools to help with the power management problem.

In order to avoid costly redesign steps, power estimation tools are required that can assess the power dissipation *early* in the design process, before the final circuit-level design has been specified. This allows the designer to explore design trade-offs at a higher level of abstraction than was previously possible, reducing design time and cost. While several approaches have been proposed for gate-level power estimation (see [1] for a recent survey), there has been little work on power estimation for general logic circuits at higher levels of abstraction, such as when the circuit is represented only by Boolean equations.

We propose that a way of providing this capability is to make use of the concept of *computational work*, based on the use of *entropy* from information theory. This concept was introduced in the early 1970's, as researchers were looking for a measure of the area complexity of a computational process (computer program). It was felt that, by somehow measuring the computational work being performed, one should be able to predict the *area cost* of an implementation. While this sounds reasonable, it turned out to be very difficult to quantify

Manuscript received March 29, 1996. This work was supported in part by Intel Corporation, Santa Clara, CA. This paper was recommended by Guest Editors M. Pedram and M. Fujita.

The authors are with the ECE Department and Coordinated Science Laboratory, University of Illinois, Urbana, IL 61801 USA.

Publisher Item Identifier S 0278-0070(96)04898-1.

computational work. In 1972, Hellerman [2] proposed the use of *entropy* as a measure of computational work. Entropy will be discussed at length in the next section.

These efforts were mostly unsuccessful [3] for a general computational process, but were reasonably successful [4]–[8] in the limited context of a combinational logic circuit implementing a Boolean function. Thus, it seems plausible to apply these concepts to perform power estimation of a combinational circuit at a point in the design process where only the Boolean functionality of the circuit, but not its gatelevel implementation, is known. The circuit representation at this level of abstraction is usually called a (structural) register transfer level (RTL) description. In this description, the circuit is described in terms of well defined flip-flops or latches and other combinational logic blocks, described only by Boolean functions.

In this paper, we will present a technique for estimating the average switching frequency inside a combinational circuit, given only its input/output Boolean functional description. This represents a first step toward a high-level power estimation capability. The technique is based on properties of the entropy function and a few simplifying assumptions and approximations whose validity will be demonstrated with empirical results. This paper is an extended version of [18].

The remaining sections are organized as follows. In Section II, we give a brief review of the concept of entropy and its application to logic circuits. Our model for average activity in terms of input/output entropy is given in Section III, and is then verified in Section IV against empirical data. Section V provides a discussion of area estimation from entropy and presents improved bounds for area prediction. Finally, conclusions are presented in Section VI.

#### II. ENTROPY IN LOGIC CIRCUITS

Entropy is a characterization of a random variable or a random process. It is used in information theory [9] as a measure of information-carrying capacity. If x is a random Boolean variable with probability p of being high, i.e.,  $\mathcal{P}\{x=1\}=p$ , then the entropy of x is defined as

$$H(x) = p \log_2 \frac{1}{p} + (1 - p) \log_2 \frac{1}{(1 - p)}$$
 (1)

where  $\log_2$  is the logarithm to base 2. A plot of H(x) is shown in Fig. 1.

The function H(x) has a maximum value of one at p=0.5. Intuitively, if a signal has p=0.5 then it can make the maximum number of transitions and can carry the most information. In general, if a discrete variable can take n



Fig. 1. The entropy of a Boolean variable.

different values then its entropy is

$$H(x) = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i}$$
 (2)

where  $p_i$  is the probability that x takes the ith value  $x_i$ .

Thus, every Boolean variable (or vector) has an associated entropy function, whose value is determined by the probability value assigned to the variable (or vector). Let Y = f(X) be a Boolean function where X is a Boolean vector with n b and Y is a Boolean vector with m b, i.e.,  $f(\cdot)$  can be implemented by an n-input m-output logic circuit. Then X can take  $2^n$  values and the *input entropy* of  $f(\cdot)$  is

$$H(X) = \sum_{i=1}^{2^n} p_i \log_2 \frac{1}{p_i}.$$
 (3)

And Y can take  $2^m$  values and the *output entropy* of  $f(\cdot)$  is

$$H(Y) = \sum_{i=1}^{2^m} p_i \log_2 \frac{1}{p_i}.$$
 (4)

With Y=f(X), it can be shown (see [9], p. 43) that  $H(Y) \leq H(X)$ , so that the entropy at the output of a combinational circuit is always less than at its input.

Previously, the entropy associated with a Boolean function has been used to predict the *silicon area* required to implement that function, without knowing its gate-level implementation. Given input probabilities of 0.5, the *output entropy* of a Boolean function has been used to predict the area of its average minimized implementation, according to

$$\mathcal{A} \propto \frac{2^n}{n} H(Y). \tag{5}$$

This was shown to be theoretically valid in the limit (as  $n \to \infty$ ) [5]. For small circuits  $(n \le 10)$ , it was empirically observed [8] that  $2^n H(Y)$  provides a good measure of area. We will show in Section V that this model breaks down over a range of realistic input counts from 25 to 200. In that section, we will also give two entropy-based new bounds on the area that perform better than the above.

#### III. POWER ESTIMATION

We restrict ourselves to the common static fully complementary CMOS technology. Consider a combinational logic circuit composed of N logic gates whose gate output nodes are denoted  $x_i$ ,  $i=1,2,\cdots,N$ . If  $D(x_i)$  is the transition density [10] of node  $x_i$  (average number of logic transitions per second), then the average power consumed by the circuit is

$$P_{\text{avg}} = \frac{1}{2} V_{dd}^2 \sum_{i=1}^{N} C_i D(x_i)$$
 (6)

where  $C_i$  is the total capacitance at node i. This expression accounts only for the capacitive charging/discharging component of power, and not for the so-called short-circuit power which is known to be only around 10% of the total power in well-designed circuits. The transition density is a measure of circuit switching *activity*. We will be using the terms "density" and "activity" interchangeably.

Given that the internal details of the logic circuit are not known in a high level representation, then a few approximations seem inevitable for high-level power estimation. The impact and utility of these approximations will be demonstrated through empirical results in Section IV. We start with

$$P_{\text{avg}} \propto \sum_{i=1}^{N} C_i D(x_i) \approx \mathcal{D} \sum_{i=1}^{N} C_i$$
 (7)

where  $\mathcal{D}$  is the average node transition density, defined by

$$\mathcal{D} = \frac{1}{N} \sum_{i=1}^{N} D(x_i) \tag{8}$$

so that

$$P_{\text{avg}} \propto \mathcal{A} \times \mathcal{D}$$
 (9)

where  $\mathcal{A}$  is an estimate of the *circuit area* that is representative of the capacitance  $\sum_{i=1}^{N} C_i$ .

It also seems inevitable that both  $\mathcal{D}$  and  $\mathcal{A}$  will be estimated only from knowledge of the input/output behavior. The main result of this paper will be a relationship between the average density  $\mathcal{D}$  and entropy. We will also give (in Section V) two new bounds on the area that perform better than existing methods. We will assume throughout that we are dealing with a combinational circuit block that is part of a synchronous sequential circuit, as shown in Fig. 2. If both area and average density are successfully related to entropy, then a viable highlevel power estimation methodology would be as follows:

- run a structural RTL simulation of the sequential circuit to measure the input/output entropies of the combinational block;
- 2) from the input/output entropies, estimate  $\mathcal{D}$ ,  $\mathcal{A}$ , and  $P_{\text{avg}}$  for the combinational block;
- 3) combine with latch and clock power to get the total average power.

In the two subsections below, we discuss the estimation of entropy from an RTL simulation trace (step 1), and the estimation of average density from entropy (part of step 2). Step 3 is easy, given the clock frequency and the results of steps 1 and 2.



Fig. 2. A general synchronous sequential circuit.

# A. Entropy from RTL Trace

The entropy  $H(x_i)$  of a single input or output node  $x_i$  can be easily computed from the definition (1) once the node probability has been found. The probabilities of the primary inputs of the sequential machine (nodes  $u_i$  in Fig. 2) are assumed to be provided by the user, or can otherwise be easily extracted from an RTL trace by counting the proportion of ones. The other inputs of the combinational block are latch outputs, whose probabilities can be obtained as in [13] and [14], or from an RTL trace using Monte Carlo analysis as proposed in [11]. The same analysis also yields the probabilities of the latch inputs and of the other outputs of the combinational block.

For area estimation, it has been found [8] that the entropy of the output Boolean *vector* plays a key role. If  $X = [x_1, x_2, \cdots, x_n]$  is a Boolean vector (say, the *next state* vector or *present state* vector), then it is obviously too expensive to estimate its entropy from the definition (3). Instead, one can efficiently estimate an upper bound on the entropy based on the relation

$$H(X) \le \sum_{i=1}^{n} H(x_i) \tag{10}$$

where equality occurs when the signals  $x_i$  are independent [9]. Thus, if the bits in a Boolean vector are not too correlated, one can make the approximation

$$H(X) \approx \sum_{i=1}^{n} H(x_i). \tag{11}$$

We refer to this as a *zeroth-order* approximation and denote it by  $H^0(X) = \sum_{i=1}^n H(x_i)$ . Other higher-order approximations are possible, as will be shown in Section V.

### B. Average Activity from Entropy

Consider one of the *present state* bit signals, and let p be its signal probability (average fraction of clock cycles in which it is high). If the signal values in two consecutive cycles are assumed independent, then its average activity per clock cycle is 2p(1-p) transitions per cycle (the transition density is



Fig. 3. The relation between activity and entropy.

 $2p(1-p)/T_c$  transitions per second, where  $T_c$  is the clock period [1]). It so happens that the plot of 4p(1-p) is very close to that of the entropy function, as shown in Fig. 3.

Thus, it makes sense to use entropy as a measure of activity, so that if  $\mathcal{H}$  is the average value of  $H(x_i)$  over all nodes  $x_i$  in the circuit, then (with some approximation)

$$P_{\rm avg} \propto \mathcal{A} \times \mathcal{H}.$$
 (12)

In the following, we will see how the average internal node entropy of a combinational circuit can be computed from its input/output entropy. We will start the analysis by considering the variation of the internal entropy in a completely specified gate-level implementation of the circuit. From this, we will carefully abstract away (using a series of approximations) all aspects of circuit structure to end up with a model that depends only on the circuit input/output properties.

A combinational circuit can always be *levelized* so that its gates are tagged with *level* values that represent their distance from the primary inputs. Thus every gate whose inputs are all primary inputs is said to have level one. Every other gate whose inputs are either outputs of level one gates or are primary inputs is said to have level two, etc. The levelization algorithm [12] has linear time complexity and is standard in most logic/timing simulation systems.

The largest level number K used in levelizing a circuit is called the circuit depth. Since level numbers are gate attributes, not node attributes, it will be helpful to define the notion of a circuit cross-section, as follows. A node which is the output of some gate g is said to be generated at the level of g. A primary input node is said to be generated at the level of g. A primary output node is said to be used at the level of g. A primary output node is said to be used at level K+1. Thus, every circuit node is generated at some unique level and used at possibly several other levels. For every  $i=0,1,2,\cdots,K$ , define the set of nodes in cross-section i,  $S_i$  as the set of all circuit nodes that are generated at levels less than or equal to i and used at levels greater than i.

Definition of  $\hat{H}(i)$ : Based on the notion of cross-section, we define H(i) to be the sum of node entropies in the set  $S_i$ , called the *cumulative entropy at cross-section* i, or simply, the

entropy at cross-section i. Thus H(K) is the sum of entropies of the primary output nodes (next state vector + output vector, in the case of a sequential circuit) denoted by  $H_o$ . Likewise, H(0) is the sum of entropies of the primary input nodes (present state vector + primary inputs vector, in the case of a sequential circuit), denoted by  $H_i$ .

Let W(i) be the number of nodes in cross-section i, i.e., the number of elements in  $\mathcal{S}_i$ . This we will call the circuit width at that cross-section. Thus, W(K) is the number of primary output nodes, which we denote by m (the output width). And W(0) is the number of primary input nodes, denoted by n (the input width). Define  $H_{\text{avg}}(i)$  as

$$H_{\text{avg}}(i) = \frac{H(i)}{W(i)} \tag{13}$$

so that  $H_{\mathrm{avg}}(i)$  is the average entropy per node at cross-section i.

Given only a high-level specification, the width W(i) at internal cross-sections of the circuit is unknown. Consider the linear width model

$$\hat{W}(i) = m + (n - m)\left(1 - \frac{i}{K}\right). \tag{14}$$

We are not actually assuming that the circuit width is linear like this, but will only use the model (14) as a scaling factor as we look for a reasonable entropy model, as we shall see. Note that  $\hat{W}(0) = n = W(0)$  and  $\hat{W}(K) = m = W(K)$ . Based on this, we now define the quantity  $\hat{H}(i)$  as follows

$$\hat{H}(i) = H_{\text{avg}}(i)\hat{W}(i) \tag{15}$$

so that  $\hat{H}(i)$  is the entropy at cross-section i corresponding to a linear width model. Note that  $\hat{H}(0) = H_i$  and  $\hat{H}(K) = H_o$ .

Quadratic Model for  $\hat{H}(i)$ : We have empirically observed that  $\hat{H}(i)$  varies quadratically with depth. This is shown in Fig. 4(a)–(c) where we show, for some example circuits, the actual variation of  $\hat{H}(i)$  with depth and compare it with the quadratic model for  $\hat{H}(i)$ , namely  $H^q(i)$ , which can be written in terms of  $H_i$  and  $H_o$  as follows

$$H^{q}(i) = H_{o} + (H_{i} - H_{o}) \left(1 - \frac{i}{K}\right)^{2}.$$
 (16)

The plots show that there is a good agreement between the quadratic model  $H^q(i)$  and the actual variation of  $\hat{H}(i)$ . The deviation from the quadratic model observed at the input side is due to the fact that the actual cross-sectional entropy falls faster than the quadratic model in this region. The deviation in the middle is due to the fact that the actual cross-sectional entropy increases due to recombination or reconvergent fanout while the cross-sectional entropy as predicted by the quadratic model continues to decrease. Toward the output side,  $\hat{H}$  seems to fall linearly. This implies that the average entropy per node in a cross-section  $H_{\rm avg}(i)$  is approximately constant toward the output end.

To further verify the quadratic model, the root mean square (RMS) error between the quadratic model and the actual variation was measured for all the ISCAS'85 and ISCAS'89 benchmark circuits for input probabilities ranging from 0.1 to







Fig. 4. (a) Comparison of  $H^q$  with  $\hat{H}$  for S713. (b) Comparison of  $H^q$  with  $\hat{H}$  for C5315. (c) Comparison of  $H^q$  with  $\hat{H}$  for C2670.

0.9. A histogram depicting the errors obtained is shown in Fig. 5. The histogram indicates that  $H^q$  is a good approximation of  $\hat{H}$ . In the cases where the RMS error was large, it was



Fig. 5. Root-mean-square error between  $H^q$  and  $\hat{H}$ .

observed that the quadratic model overbounds the actual data. This means that the quadratic approximation is conservative and would err on the side of higher activity.

Average Entropy Model: Having empirically demonstrated the fact that  $\hat{H}$  decreases quadratically with circuit depth, we now derive a model for computing the average node entropy from the input and output entropies as follows. By the definition of average entropy per node,  $\mathcal{H}$ , we have

$$\mathcal{H} = \frac{1}{N} \sum_{i=1}^{K} \sum_{G_i} H(x_i)$$
(17)

where N is the number of logic gates in the circuit, and  $\mathcal{G}_i$  is the set of nodes  $x_j$  that are outputs of gates at level i. We define  $\mathcal{G}_0$  to be equal to the empty set. Let  $N_s$  be defined as

$$N_s = \sum_{i=0}^{K} W(i). {18}$$

It then follows from the definitions of  $N_s$  and H(i) that  $N_s \geq N$  and  $H(i) \geq \sum_{G_i} H(x_j)$ . Thus

$$\frac{1}{N_s} \sum_{i=0}^{K} H(j) = \frac{\sum_{i=1}^{K} \sum_{\mathcal{G}_i} H(x_j) + \sum_{j=0}^{K} \delta H(j)}{N + \sum_{i=0}^{K} \delta N(j)}$$
(19)

where  $\delta H(j)$  is the sum of entropies of those nodes in  $\mathcal{S}_j$  but not in  $\mathcal{G}_j$ , and  $\delta N(j)$  is the *number* of such nodes. Now let  $\delta H = \sum_{j=0}^K \delta H(j)$  and  $\delta N = \sum_{j=0}^K \delta N(j)$ . If  $\delta N$  is small relative to N, we have the approximation

$$\frac{1}{N_s} \sum_{j=0}^K H(j)$$

$$\approx \frac{1}{N} \sum_{i=1}^K \sum_{\mathcal{G}_i} H(x_j)$$

$$+ \frac{1}{N} \left[ \delta H \left( 1 - \frac{\delta N}{N} \right) - \sum_{i=1}^K \sum_{\mathcal{G}_i} H(x_j) \frac{\delta N}{N} \right] \quad (20)$$

which is the result of using a Taylor series expansion of  $1/(1+\delta N/N)$  and dropping the high-order terms. For large N, we can further approximate the above expression and write



Fig. 6. Histogram of the ratio  $\frac{k_1}{k_2}$ .

the equation as

$$\frac{1}{N_s} \sum_{j=0}^{K} H(j) \approx \frac{1}{N} \sum_{i=1}^{K} \sum_{G_i} H(x_j).$$
 (21)

When the condition  $\delta N \ll N$  does not hold, the Taylor approximation can not be used. In spite of this, (21) can be derived from (19) by rewriting (19) as follows

$$\frac{1}{N_s} \sum_{j=0}^K H(j) = \frac{\sum_{i=1}^K \sum_{\mathcal{G}_i} H(x_j) \left[ 1 + \frac{\delta H}{\sum_{i=1}^K \sum_{\mathcal{G}_i} H(x_j)} \right]}{N \left[ 1 + \frac{\delta N}{N} \right]}$$

Now let  $k_1=1+\frac{\delta H}{\sum_{i=1}^K\sum_{\mathcal{G}_i}H(x_j)}$  and  $k_2=1+\frac{\delta N}{N}.$  It must be observed that if

$$\frac{\delta H}{\delta N} \approx \frac{\sum_{i=1}^{K} \sum_{\mathcal{G}_i} H(x_j)}{N}$$

then  $\frac{k_1}{k_2} \approx 1$ . The correlation between  $k_1$  and  $k_2$  was measured for ISCAS'85 and ISCAS'89 circuits for input probabilities ranging from 0.1 to 0.9 and the results are shown in Fig. 6. It can be seen from this figure that  $\frac{k_1}{k_2} \approx 1$  and hence, when  $\delta N$  is large, it is reasonable to assume that  $\frac{k_1}{k_2} \approx 1$ . It then follows that for large  $\delta N$  (19) can be approximated by (21).

The empirical results presented in the next section indicate that the above assumptions are quite reasonable for large circuits. From (17) and (21), it thus follows that

$$\mathcal{H} \approx \frac{1}{N_s} \sum_{j=0}^{K} H(j). \tag{22}$$

The right-hand side of the above equation can be further approximated as follows

$$\frac{1}{N_s} \sum_{j=0}^{K} H(j) = \frac{\sum_{j=0}^{K} H_{\text{avg}}(j) W(j)}{\sum_{j=0}^{K} W(j)}$$

$$\approx \frac{\sum_{j=0}^{K} H_{\text{avg}}(j) \hat{W}(j)}{\sum_{j=0}^{K} \hat{W}(j)}$$

$$= \frac{1}{N_l} \sum_{j=0}^{K} \hat{H}(j) \tag{23}$$

where  $N_l$  is the total number of nodes corresponding to a linear width model, defined as

$$N_l = \sum_{j=0}^{K} \hat{W}(j).$$
 (24)

The errors between  $\frac{1}{N_s}\sum_{j=0}^K H(j)$  and  $\frac{1}{N_l}\sum_{j=0}^K \hat{H}(j)$  were measured for the ISCAS'85 and ISCAS'89 circuits for input probabilities ranging from 0.1 to 0.9. The results obtained are summarized in the histogram shown in Fig. 7.

It can be seen from the histogram that the error due to the above approximation is quite small. This implies that the above approximation is quite good. Thus, we have reduced the problem of computing  $\mathcal{H}$  for the actual circuit to computing the average of  $\hat{H}(j)$  for the linear width model  $\hat{W}(j)$ . But we have already demonstrated that  $\hat{H}(j)$  can be approximated by the quadratic model  $H^q(j)$ . This implies that

$$\frac{1}{N_l} \sum_{j=0}^{K} \hat{H}(j) \approx \frac{1}{N_l} \sum_{j=0}^{K} H^q(j). \tag{25}$$

Thus it follows from (22), (23), and (25), that

$$\mathcal{H} \approx \frac{1}{N_l} \sum_{i=0}^{K} H^q(j). \tag{26}$$

Now, if we denote the average width of the linear model by  $\hat{\mathcal{W}}$ , so that

$$\hat{\mathcal{W}} = \frac{1}{K+1} \sum_{j=0}^{K} \hat{W}(j) = (n+m)/2$$
 (27)

then, from (24), (26), and (27), it follows that

$$\mathcal{H}\hat{\mathcal{W}}(K+1) \approx \sum_{j=0}^{K} H^{q}(j) = (K+1)H_{o}$$

$$+ (H_{i} - H_{o}) \sum_{j=0}^{K} \left(1 - \frac{j}{K}\right)^{2}$$

$$= (K+1)H_{o} + (H_{i} - H_{o}) \frac{1}{K^{2}} \sum_{j=0}^{K} (K-j)^{2}$$

$$= (K+1)H_{o} + (H_{i} - H_{o}) \frac{1}{K^{2}} \sum_{k=1}^{K} k^{2}$$

$$= (K+1)H_{o} + (H_{i} - H_{o}) \frac{K(K+1)(2K+1)}{6K^{2}}.$$
(28)

From which it follows that

$$\mathcal{H}\hat{\mathcal{W}} = \frac{4K - 1}{6K}H_o + \frac{2K + 1}{6K}H_i \approx \frac{2}{3}H_o + \frac{1}{3}H_i \qquad (29)$$

where the last approximation is based on the fact that one is negligible compared to 2K and 4K, given that the circuit depth K can be large for large circuits. This leads to

$$\mathcal{H}\hat{\mathcal{W}} \approx \frac{H_i + 2H_o}{3} \tag{30}$$



Fig. 7. Error between  $\frac{1}{N_s} \sum_{j=0}^K H(j)$  and  $\frac{1}{N_l} \sum_{j=0}^K \hat{H}(j)$ .

which does not depend on circuit depth (a must, so as to be applicable to a high level representation). Using (27), we finally arrive at our main result

$$\mathcal{H} \approx \frac{2/3}{n+m} (H_i + 2H_o) \tag{31}$$

which depends only on the input and output entropies and on the input and output node counts, all of which are obtainable from a high level representation.

In spite of the approximations made above, we have found that the resulting simple expression for  $\mathcal{H}$ , (31), works quite well for a broad range of circuits. The empirical results presented in the next section will be based on this expression.

## IV. EMPIRICAL RESULTS

As a first step toward a high-level power estimation capability, we have implemented a technique for estimating the average node activity of a combinational circuit, based on the average entropy measure (31). To use this technique, we estimate  $H_i$  and  $H_o$  from their definitions and then use (31). We tested the technique on isolated combinational circuit blocks whose input probabilities are user-specified. Normally, these input probabilities would be obtained from an examination of the behavior of the sequential circuit as in the established techniques [13], [14] or [11]. The input probabilities are enough to compute  $H_i$ , but  $H_o$  depends on the output probabilities. These can be computed using BDD's as explained in [10], but this can be memory intensive. Instead, we compute them using a Monte Carlo approach [15].

In order to assess the accuracy of the technique, we need an accurate measure of average node activity, obtained from a gate-level view of the same circuit. This can be obtained by first finding the transition density at every node and then averaging the results. Accurate transition density values were obtained by simulation in two ways, depending on the timing model chosen.

 Using a zero-delay timing model: In this case, one is interested only in the final steady state node values in a clock cycle, and any additional toggles due to



Fig. 8. Comparison with zero-delay simulation results.

unequal delay paths are ignored. Also, the density is easily obtained from the signal probability [1] according to  $D(x) = 2p(1-p)/T_c$ , and the probabilities were obtained using the technique in [15].

2) Using a general-delay timing model: In this case, the delays are obtained from a gate library and an event driven simulation is performed as in [16].

The delay model did not enter into the derivation of (31), as is probably to be expected in a high-level model. Therefore, in order to check the impact of the approximations made in the derivation, it is important to check the accuracy of (31) against the zero-delay simulation results.

The results of testing against the zero-delay analysis results are shown in Fig. 8 for 56 different circuits, with sizes ranging from 100 to about 22 000 gates, and with input probabilities ranging from 0.1 to 0.9. These circuits include all the ISCAS'85 and ISCAS'89 circuits. As shown in Fig. 3, the average entropy should correlate well with twice the average activity per clock cycle. Thus the "activity from simulation" shown on the horizontal axis in Fig. 8 is actually normalized to give the average value of 4p(1-p) for each circuit. The agreement is quite good, with an error of less than 0.09, with 90% confidence. We consider this to be strong indication that the technique is feasible and constitutes a reasonable approach to high-level power estimation. The approach is also very fast. Our implementation, which includes reading the circuit, estimating the output entropy, and evaluating (31), requires only 14 CPU s for a 20 000 gate circuit (on a SUN sparc-10).

The effect of capacitance is not included in the data shown Fig. 8 (only activity values were measured, and not activity  $\times$  capacitance). Therefore, before moving to the case of the general delay timing model, we tested the impact of the approximation (7) which is equivalent to an independence assumption between the node capacitance and node density distributions. To do this, we checked if the average entropy correlates with the power per unit area, according to  $\mathcal{D} \propto P/A$ . We used gate count as a measure of area, and estimated power using a zero-delay timing model, accounting for fanout capacitance. Average entropy is compared to power per unit area in units of  $\mu W/MHz/gate$  in Fig. 9. The results shown



Fig. 9. Comparison with zero-delay power/area simulation results.



Fig. 10. Comparison with general-delay power/area simulation results.

are for the same circuits used in Fig. 8, but only for an input probability value of 0.5. The results show slightly more spread than Fig. 8, due to the effect of the node capacitance distribution.

Finally, we measured the power under a general timing model. The power in some circuits increases appreciably due to multiple transitions/cycle. We compare the average activity measured from entropy to the power/area, in  $\mu$ W/MHz/gate, as shown in Fig. 10. For one circuit (ISCAS'85:c6288), the deviation was very large, as shown by the point at the far right in the figure. Hence, more work is needed to predict situations like this. Furthermore, the comparisons for the other circuits are not as good as before and show increased spread, as shown in Fig. 11.

## V. AREA MEASUREMENT

In the previous sections, we presented an approach for measuring the average activity of a circuit using entropy. In order to estimate the average power consumed by a circuit, we also need an estimate of the total area (capacitance) consumed by a circuit [see (7)]. Thus, one has to come up with techniques

| CIRCUIT | INPUTS | #GATES | k        |
|---------|--------|--------|----------|
| s400    | 25     | 164    | 2.92e-7  |
| s713    | 54     | 393    | 1.12e-15 |
| c2670   | 157    | 1193   | 1.78e-46 |
| c5315   | 178    | 2307   | 7.85e-53 |



Fig. 11. Comparison with general-delay power/area simulation results.

to measure the area consumed by a circuit from its high level description.

The importance of entropy in the computation of area complexity of a Boolean function has been known for some time. However, most of the work focuses on characterization of the area consumed by single output functions. In [8], Cheng *et al.* proposed a model for the area complexity of a multioutput Boolean function given by the following expression

$$\mathcal{A} = (1 - d)k2^n H. \tag{32}$$

Here  $\mathcal{A}$  is a measure of the area complexity of the circuit in terms of the literal count, d is the fraction of don't cares in the function specification, n is the number of inputs, H is the entropy of the output vector and k is some proportionality constant. In this paper, we have used gate count as a measure of area consumed by a circuit. In [17], Muller showed that the type of gates used affects the area complexity by a scaling constant, asymptotically. We assume that this fact holds true for all n.

According to the model of Cheng et al. [8], the area complexity increases exponentially in the number of inputs. This implies that circuits with input counts of the order of a few hundreds, which is quite common, would have an exponentially large area requirement. But this does not seem to be the case as can be seen from Table I, where the constant k is shown to vary widely as the number of inputs changes. The variation in k is so large that the model (32) is neither a good area estimator, nor a good area upper bound.

The fact that circuits with large numbers of inputs and outputs are typically designed to have reasonable (at least,

not exponentially large) area suggests that the model (32) is not applicable to practical circuits, and needs to be improved.

As a first step toward improving the area model (32), we have developed an efficient scheme for estimating a more accurate estimate of the entropy of a Boolean vector. This estimate takes the form of an upper bound on the entropy that is is more accurate than the zeroth-order approximation (11), because it does not completely ignore the bit correlations. Based on this, we then verified that two entropy-based simple bounds on the area seem to work well in practice well into the range of input counts where (32) breaks down. Both these topics are discussed below.

## A. Entropy Bound Computation

To compute a more accurate upper bound on the vector entropy, we will take into account pair-wise correlations between the bits. Before getting into the details of the computation, a definition of conditional entropy is in order. Let  $Y_1 \in \mathcal{Y}_1$  and  $Y_2 \in \mathcal{Y}_2$  be scalar random variables with joint probability density function given by  $p(y_1, y_2)$  and a conditional density function given by  $p(y_1 \mid y_2)$ . Then the conditional entropy [9] of  $Y_1$  given  $Y_2$ , denoted by  $H(Y_1 \mid Y_2)$ , is defined as

$$H(Y_1 \mid Y_2) = -\sum_{y_2 \in \mathcal{Y}_2} \sum_{y_1 \in \mathcal{Y}_1} p(y_1, y_2) \log_2 p(y_1 \mid y_2). \quad (33)$$

Intuitively, conditional entropy measures the additional information required to encode  $Y_1$  having specified (encoded)  $Y_2$ .

Now, we discuss the calculation of the upper bound on H(Y), where  $Y=[y_1,y_2,\cdots,y_n]$  is a Boolean output vector. It can be shown (refer to [9]) that

$$H(Y) = H(y_1) + H(y_2 \mid y_1) + H(y_3 \mid y_2, y_1) + \dots + H(y_n \mid y_{n-1}, \dots, y_1)$$
(34)

where the above equation is true for all orderings of the bits of vector Y. Also

$$H(y_k \mid y_m, y_{m-1}, \dots, y_1)$$
  
 $\leq H(y_k \mid y_i), \quad j = 1, 2, \dots, m.$  (35)

Using the above facts one can write an upper bound on H(Y) as

$$H(Y) \le H(y_1) + H(y_2 \mid y_1) + H(y_3 \mid y_2) + \dots + H(y_n \mid y_{n-1}).$$
 (36)

Since the expansion of H(Y) in terms of conditional entropies is independent of the ordering of  $y_i$  in Y, we get that the

| - Check       |        |         |         |         |          |  |  |
|---------------|--------|---------|---------|---------|----------|--|--|
| CIRCUIT       | INPUTS | OUTPUTS | ENTROPY | ENTROPY | % IMPROV |  |  |
| Name          |        |         | FIRST   | ZERO    |          |  |  |
| c880          | 60     | 26      | 16.47   | 17.48   | 5.76     |  |  |
| c1908         | 33     | 25      | 24.24   | 24.63   | 1.93     |  |  |
| <b>c267</b> 0 | 157    | 64      | 36.75   | 52.34   | 29.77    |  |  |
| c3540         | 50     | 22      | 17.92   | 19.42   | 7.70     |  |  |
| c5315         | 178    | 123     | 76.64   | 104.05  | 26.34    |  |  |
| c6288         | 32     | 32      | 31.03   | 31.17   | 0.43     |  |  |
| c7552         | 208    | 107     | 83.01   | 99.47   | 16.54    |  |  |
| s382          | 24     | 27      | 15.69   | 19.97   | 21.39    |  |  |
| s510          | 25     | 13      | 9.02    | 10.32   | 12.70    |  |  |
| s526          | 24     | 27      | 20.52   | 22.84   | 10.16    |  |  |
| s713          | 54     | 42      | 19.65   | 26.23   | 25.08    |  |  |
| s953          | 22     | 29      | 6.84    | 10.05   | 31.94    |  |  |
| s1196         | 31     | 31      | 15.83   | 18.23   | 13.20    |  |  |
| s5378         | 214    | 156     | 76.06   | 95.10   | 20.00    |  |  |
| s9234.1       | 247    | 250     | 186.72  | 225.91  | 17.35    |  |  |
| Average       |        |         |         |         | 16.02    |  |  |

TABLE II
COMPARISON OF FIRST- AND ZERO-ORDER BOUNDS ON ENTROPY (ISCAS'85 AND ISCAS'89 CIRCUITS)

above inequality must hold for all orderings of the components of vector Y. Let  $\mathcal Z$  be the set of all possible orderings of  $y_i$ ,  $i=1,2,\cdots,n$  in Y. Further, let  $H^1_{\mathrm{opt}}(Y)$  be defined as

$$H_{\text{opt}}^{1}(Y) = \min_{\mathcal{F}} H(Y). \tag{37}$$

Then, it follows that

$$H_{\text{opt}}^{1}(Y) \le H(y_1) + H(y_2 \mid y_1) + H(y_3 \mid y_2) + \dots + H(y_n \mid y_{n-1}).$$
 (38)

Also,

$$H(Y) \le H_{\text{opt}}^1(Y). \tag{39}$$

Note that if  $y_i$ ,  $i = 1, 2, \dots, n$  are not independent, then the above bound is strictly smaller than the sum of the individual entropies, i.e.,

$$H(Y) \le H_{\text{opt}}^1(Y) < H^0(Y) \tag{40}$$

where

$$H^{0}(Y) = \sum_{i=1}^{n} H(y_{i}). \tag{41}$$

Thus,  $H^1_{\mathrm{opt}}(Y)$  is called the optimal first-order bound on entropy. Since it is computationally expensive to find the optimal ordering, we will resort to a heuristic that orders the variables based on their *correlation coefficients*. Given this ordering, we will refer to the resulting entropy upper bound (36) as a first order bound, denoted  $H^1(Y)$ .

The following briefly describes how the heuristic works. First, we compute all the correlation coefficients  $\rho(y_i, y_j), i \neq j$ , and set  $\rho(y_i, y_i) = 0$ , as we are not interested in these.

To compute these coefficients, it suffices to find the joint probability of every two bits, i.e., the probability that  $y_iy_j=1$ . These probabilities can be found at the same time that signal probabilities are computed, using Monte Carlo analysis [11]. We then pick the element of the resulting correlation matrix that has the largest absolute value. The row and column indices of this element provide us the first two elements in our ordering, i.e.,  $y_1^o$  and  $y_2^o$ . We then set the correlation coefficients  $\rho(y_i, y_1^o) = 0$ ,  $i = 1, 2, \cdots, n$  as we do not want to pick these pairs. Now, we pick out the element that has the largest correlation coefficient (in absolute value) with  $y_2^o$ . This is the third element in our ordering. Then the correlation coefficients  $\rho(y_i, y_2^o)$ ,  $i = 1, 2, \cdots, n$  are set to zero. This procedure is repeated until all elements are exhausted. The algorithm has  $\mathcal{O}(n^2)$  time and space complexity.

In Table II, we perform a comparative study between the first-order bound on entropy  $H^1(Y)$  computed using the above heuristic, and the zeroth-order bound  $H^0(Y)$  on a few circuits from ISCAS'85 and ISCAS'89 benchmarks. It can be seen that, on the average,  $H^1(Y)$  offers a 16% improvement over  $H^0(Y)$ . In the next section, we use this bound to obtain empirical upper and lower bounds on the area complexity of a combinational circuit.

## B. Area Bounds

It is well known that (see [5]) as the number of inputs tends to infinity, the area complexity of a circuit grows exponentially. However, as our data has shown, this asymptotic bound is quite loose for practical circuits. Thus, one has to come up with better models to predict the area of a circuit. Here, we present a lower bound and an upper bound, which

TABLE III
COMPARISON OF LOWER AND UPPER BOUNDS ON AREA WITH ACTUAL AREA REQUIREMENTS (ISCAS'85 AND ISCAS'89 CIRCUITS)

| CIRCUIT       | INPUTS | AREA   | LOWER | UPPER  |
|---------------|--------|--------|-------|--------|
| Name          | #      | #GATES | BOUND | BOUND  |
| s208.1        | 18     | 104    | 65    | 405    |
| s298          | 17     | 119    | 85    | 515    |
| s953          | 22     | 395    | 57    | 409    |
| s820          | 23     | 289    | 36    | 267    |
| s444          | 24     | 181    | 111   | 841    |
| s382          | 24     | 158    | 108   | 823    |
| s344          | 24     | 160    | 107   | 876    |
| s526          | 24     | 193    | 141   | 1075   |
| <b>s4</b> 00  | 25     | 164    | 120   | 935    |
| <b>s</b> 510  | 25     | 211    | 81    | 631    |
| s1196         | 31     | 529    | 132   | 1169   |
| c6288         | 32     | 2416   | 265   | 2394   |
| c1908         | 33     | 880    | 211   | 2429   |
| s420.1        | 34     | 218    | 151   | 1412   |
| c432          | 36     | 160    | 53    | 509    |
| c499          | 41     | 202    | 326   | 3386   |
| c1355         | 41     | 546    | 325   | 3383   |
| c3540         | 50     | 1669   | 211   | 2432   |
| s713          | 54     | 393    | 244   | 2919   |
| c880          | 60     | 383    | 222   | 2800   |
| s838.1        | 66     | 446    | 478   | 6328   |
| <b>c267</b> 0 | 157    | 1193   | 1049  | 20228  |
| c5315         | 178    | 2307   | 2429  | 49191  |
| c7552         | 208    | 3512   | 2948  | 63133  |
| s5378         | 214    | 2779   | 2785  | 60492  |
| s9234.1       | 247    | 5844   | 7705  | 176458 |

were arrived at empirically, that seem to describe the area behavior of the ISCAS'85 and ISCAS'89 benchmark circuits, much better than (32).

We begin by assuming that the area of a circuit depends only on the output entropy (when the inputs are assumed to be independent and equi-probable with probability 0.5) and the number of inputs in a circuit in the following way

$$\mathcal{A} \propto H(Y)f(n) \tag{42}$$

where H(Y) is the entropy of the output vector and f(n) is some function of the number of inputs n. This model is along the same lines of previous area models (see [2], [4], [5], [8]). In our paper, we have characterized the function f(n) empirically, to obtain functions  $f_{\text{low}}(n)$  and  $f_{\text{upp}}(n)$  such that

$$k_1 H(Y) f_{\text{low}}(n) < \mathcal{A} < k_2 H(Y) f_{\text{upp}}(n). \tag{43}$$

We have found that the following bounds work well

$$0.4H^{1}(Y)\left(\frac{n}{\log_{10} n}\right) \le \mathcal{A} \le 2H^{1}(Y)(n\log_{10} n)$$
 (44)

where we have used  $H^1(Y)$  instead of H(Y), because H(Y) is too expensive to compute. These bounds suggest that the circuit area grows sublinearly for medium sized circuits. These bounds are compared with the actual area (gate count) consumed by the circuits in Table III. Of course, these bounds are not as close to each other as one would like, and we do not suggest that they be used *directly* for area estimation. Nevertheless, these bounds are valuable because they are valid over a wide range of circuits and input node counts. In this, they represent a significant improvement over the previous model (32) which completely breaks down when the input count is around 200.

#### VI. CONCLUSION

There is a need for high-level power estimation, and the RTL level seems the reasonable place to start. We proposed to use computational work based on entropy as a high-level measure of power. Preliminary investigation shows that entropy is a viable measure of circuit activity, but needs improvement to

account for general delay and capacitance distribution. We also presented an algorithm to estimate the entropy of a vector which was used in obtaining improved bounds on the area complexity of combinational circuits.

#### REFERENCES

- [1] F. Najm, "A survey of power estimation techniques in VLSI circuits," *IEEE Trans. VLSI Syst.*, vol. 2, pp. 446–455, Dec. 1994.
- [2] L. Hellerman, "A measure of computational work," *IEEE Trans. Comput.*, vol. C–21, pp. 439–446, May 1972.
- [3] K. Mase, "Comments on 'A measure of computation work' and 'Logical network cost and entropy," *IEEE Trans. Comput.*, vol. C-27, pp. 94-95, Jan. 1978.
- [4] R. W. Cook and M. J. Flynn, "Logical network cost and entropy," *IEEE Trans. Comput.*, vol. C-22, pp. 823–826, Sept. 1973.
  [5] N. Pippenger, "Information theory and the complexity of Boolean
- [5] N. Pippenger, "Information theory and the complexity of Boolean functions," *Mathematical Systems Theory*. New York: Springer-Verlag, vol. 10, 1977, pp. 129–167.
  [6] J. A. Dussault, "A testability measure," in *Proc. IEEE Digital Semicon*-
- [6] J. A. Dussault, "A testability measure," in *Proc. IEEE Digital Semiconductor Test Symp.*, Cherry Hill, NJ, Oct.–Nov. 1978, pp. 113–116.
   [7] V. D. Agrawal, "Information theory in digital testing—A new approach
- [7] V. D. Agrawal, "Information theory in digital testing—A new approach to functional test pattern generation," *IEEE Int. Conf. Circuits Comput.*, Port Chester, NY, Oct. 1–3, 1980, pp. 928–931.
- [8] K.-T. Cheng and V. Agrawal, "An entropy measure for the complexity of multi-output Boolean functions," in *Proc. 27th ACM/IEEE Design Automation Conf.*, 1990, pp. 302–305.
- [9] T. M. Cover and J. A. Thomas, *Elements of Information Theory*. New York: Wiley, 1991.
- [10] F. Najm, "Transition density: A new measure of activity in digital circuits," *IEEE Trans. Computer-Aided Design*, vol. 12, pp. 310–323, Feb. 1993.
- [11] F. Najm, S. Goel, and I. N. Hajj, "Power estimation in sequential circuits," in Proc. ACM/IEEE Design Automation Conf., 1995.
- [12] S. Even, Graph Algorithms. Rockville, MD: Computer Science, 1979.
- [13] J. Monteiro and S. Devadas, "A methodology for efficient estimation of switching activity in sequential logic circuits," in *Proc. ACM/IEEE 31st Design Automation Conf.*, 1994, pp. 12–17.
  [14] C.-Y. Tsui, M. Pedram, and A. M. Despain, "Exact and approximate
- [14] C.-Y. Tsui, M. Pedram, and A. M. Despain, "Exact and approximate methods for calculating signal and transition probabilities in FSM's," in *Proc. ACM/IEEE 31st Design Automation Conf.*, 1994, pp. 18–23.
- [15] F. Najm, "Statistical estimation of the signal probability in VLSI circuits," Coord. Sci. Lab., Urbana, IL, Tech. Rep. #UILU-ENG-93-2211, Apr. 1993.
- [16] M. Xakellis and F. Najm, "Statistical estimation of the switching activity in digital circuits," in *Proc. 31st ACM/IEEE Design Automation Conf.*, 1994, pp. 728–733.

- [17] D. E. Muller, "Complexity in electronic switching circuits," IRE Trans. Electronic Comput., vol. 5, pp. 15–19, Mar. 1956.
- [18] F. Najm, "Toward a high-level power estimation capability," in *Proc. ACM/IEEE Int. Symp. Low-Power Design*, 1995, pp. 87–92.



Mahadevamurty Nemani (S'93) received the B.Tech degree from the Indian Institute of Technology, Madras, India, in 1991, and the M.S. degree from the University of Illinois, Urbana, in 1993, both in mechanical engineering. He is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, University of Illinois.

In 1995, he was a summer intern with Intel Corporation, Santa Clara, CA. His research interests include power estimation, low-power design,

synthesis for low-power, VLSI for signal processing applications, and systems theory.



Farid N. Najm (S'85–M'89) received the B.E. degree (with distinction) in electrical engineering from the American University of Beirut, Lebanon, in 1983, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Illinois, Urbana, in 1986 and 1989, respectively.

He was with General Swedish Electric Company in Västerås, Sweden, in 1982, and was a Teaching Assistant with the American University of Beirut in 1983, and later was an Electronics Engineer with this institution from 1983 to 1984. He held a visiting

position with the University of Warwick, England, in 1984. While at the University of Illinois, 1985 to 1989, he was a Research Assistant with the Coordinated Science Laboratory, and worked for a year with the VLSI Design Laboratory, Texas Instruments, Inc., Dallas, TX. In July 1989, he joined Texas Instruments as a member of Technical Staff with the Semiconductor Process and Design Center. In August 1992, he became an Assistant Professor with the Electrical and Computer Engineering Department, University of Illinois. His research interests are in the area of CAD tool development for VLSI circuits, including power estimation, low-power design, reliability prediction, synthesis of low-power and reliable VLSI, timing analysis, test generation, and circuit and timing simulation.

Dr. Najm received the IEEE Transactions on Computer-Aided Design Best Paper Award in 1992, and the NSF Research Initiation Award in 1993.