# EEC 216 Winter 2008 Problem Set #2

Rajeevan Amirtharajah, Dept. of Electrical and Computer Engineering, University of California, Davis

January 29, 2008

**Reading:** This set of readings focuses on architectural-level power estimation and looks at two pairs of papers by two different sets of researchers. Landman's work focuses on empirical methods for power estimation, while Nemani has explored analytical approaches based on information-theoretic estimates of complexity and switching activity. Readings are available on the course web page.

1. P.E. Landman and J.M. Rabaey, "Architectural Power Analysis: The Dual Bit Type Method" [1]
2. P.E. Landman and J.M. Rabaey, "Activity-Sensitive Architectural Power Analysis" [2]
3. M. Nemani and F. N. Najm, "Towards a High-Level Power Estimation Capability" [3]
4. M. Nemani and F. N. Najm, "High-Level Area and Power Estimation for VLSI Circuits" [4]

## 1 Short-Circuit Power

**Simulation:** This problem requires extensive use of HSPICE. For information on running HSPICE on the UCD ECE department network, follow this URL:

http://www.ece.ucdavis.edu/cad/hspice/index.html.

If you want to use another version of Spice (e.g. PSpice, Berkeley Spice, Spectre), you must get permission from the instructor first.

**Device Models:** This problem relies on freeware models from the Predictive Technology Model Group [5, 6]. Download the model file 45nm\_MGHiK.sp from the course web site and include it in your Spice deck. You should already have the models from completing Problem Set 1.

**Problem 1.1** (15 points) **Average Switching Power:** We've seen in lecture that short circuit current can be modeled as an additional load capacitor in parallel with the actual output load of a CMOS gate, and so the short circuit current contribution to total power can be lumped with the physical dynamic power. A standard load for analyzing CMOS logic is a "Fanout of 4" (FO4) configuration, where the gate being characterized drives four copies of a minimum sized inverter. This is intended to model an average datapath load when the gate is used in a design. Simulate and measure the average current for a minimum sized (P/N ratio 2:1 or as you determined in Problem Set 1) inverter driving an FO4 load for one charge and discharge cycle. To simulate its operation more accurately, drive it with another copy of the inverter rather than with a voltage source. This will "shape" the input to the inverter under test. Use the following voltage source to drive your inverter under test:

```
Vin0 in0 gnd pulse (vdd, 0V, 990ps, 10ps, 10ps, 990ps, 2.0ns)
```

Be sure to isolate the power supply of the inverter under test from the supplies of the driver and the loads so that only the power of interest is measured (download the file macros.sp for some useful inverter circuits). What is the average total current and average total power?
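If the supply current waveform is exported from the simulator as time/current samples, the average current and average power can be checked outside of Spice. The following is a minimal Python sketch under that assumption; the function names and the sample values in the usage note are hypothetical and are not part of the course files.

```python
def time_average(times, values):
    """Time-average of a sampled waveform using trapezoidal integration."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching (time, value) samples")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        area += 0.5 * (values[i] + values[i - 1]) * dt
    return area / (times[-1] - times[0])


def average_power(vdd, times, supply_current):
    """Average power drawn from a constant supply: VDD times average current."""
    return vdd * time_average(times, supply_current)
```

For example, `average_power(1.0, [0.0, 1e-9, 2e-9], [0.0, 2e-6, 0.0])` averages a single triangular current pulse over a 2 ns window. The averaging window should cover exactly one charge and discharge cycle of the inverter under test.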

**Problem 1.2** (10 points) **Modeling Short Circuit Current.** In the last problem set we examined average total power for a CMOS ring oscillator consisting of inverters. In this problem, we will explore the accuracy of the triangular current waveform approximation.

Download and simulate the HSPICE deck ps2.sp and the file of Spice macros macros.sp from the course web page. Modify the ps2.sp file to do the following. (1) Measure the total average power. (2) Measure the peak current drawn from $V_{DD}$, $I_{peak}$, and the duration of the current pulse, $t_{sc}$. (3) Compute the short circuit power based on the formula shown in lecture. What percentage of the total power is due to short circuit current? In class, we also learned a formula relating $t_{sc}$ to the threshold voltages $V_T$ of the devices and the 10%-90% rise/fall times $t_r$ of the inputs. The .op card causes the operating point data to be output during the simulation. (4) Using the $V_T$ data from the simulation for the transistors in the Device Under Test inverter, compute $t_{sc}$ from the input rise time. How does this compare to the $t_{sc}$ measured directly from the simulation?
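The lecture formulas are not reproduced in this handout. As a rough sketch, the Python below uses the common textbook form of the triangular approximation: each input edge draws a triangular current pulse of height $I_{peak}$ and base $t_{sc}$, and $t_{sc}$ is the time a linear input ramp spends between $V_{Tn}$ and $V_{DD} - |V_{Tp}|$. These expressions, the function names, and the numbers in the usage note are assumptions; check them against the lecture notes before relying on them.

```python
def t_sc_estimate(vdd, vtn, vtp_mag, t_rise_10_90):
    """Estimate the short-circuit conduction time for a linear input ramp.

    Both devices conduct while the input is between VTn and VDD - |VTp|.
    The 10%-90% rise time is scaled by 1/0.8 to approximate the full swing.
    """
    t_full_swing = t_rise_10_90 / 0.8
    return (vdd - vtn - vtp_mag) / vdd * t_full_swing


def short_circuit_power(vdd, i_peak, t_sc, freq):
    """Average short-circuit power from the triangular current approximation."""
    energy_per_edge = 0.5 * vdd * i_peak * t_sc  # triangle area times VDD
    return 2.0 * energy_per_edge * freq          # one rising and one falling edge per cycle


def short_circuit_percentage(p_sc, p_total):
    """Share of total power attributed to short-circuit current, in percent."""
    return 100.0 * p_sc / p_total
```

As a usage example with made-up values, `short_circuit_power(1.0, 10e-6, 20e-12, 500e6)` gives the short-circuit power for a 10 µA peak, a 20 ps pulse, and a 500 MHz switching rate at a 1 V supply. Dividing by the simulated total power with `short_circuit_percentage` gives the percentage column of Table 1.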

**Problem 1.3** (15 points) **Short Circuit Power vs. Gate Loading**. Short circuit power is strongly dependent on the capacitive load driven by a logic gate. (1) Modify the ps2.sp file to simulate the range of fanouts listed in the table. Write down the average power, peak current, and measured short circuit power under each loading condition and compute the percentage of total power contributed by the short circuit current. (2) Explain any trends that you see in the data. Turn in your modified Spice deck along with a composite plot of the peak currents for each load condition and fill in Table 1.

**Problem 1.4** (5 points) **Short Circuit Current and Input Risetime.** The stimulus waveform used to measure the short circuit current above has very different rise and fall times from the stimulus waveform in the first part of this problem. Comparing the measurements for Problems 1.1 and 1.2, do you think short circuit power is a significant issue for typical output loads in this 45 nm CMOS technology?

| Fanout | Total Power ($\mu$W) | $I_{peak}$ ($\mu$A) | Estimated SC Power ($\mu$W) | Percentage SC Power |
|--------|-----------------------|---------------------------------|-------------------------------|---------------------|
| 0      |                       |                                 |                               |                     |
| 2      |                       |                                 |                               |                     |
| 4      |                       |                                 |                               |                     |
| 6      |                       |                                 |                               |                     |
| 8      |                       |                                 |                               |                     |
| 10     |                       |                                 |                               |                     |
| 12     |                       |                                 |                               |                     |

Table 1: Short circuit power versus fanout.

**Problem 1.5** (10 points) **Leakage Current:** Although we haven't discussed the mechanisms of leakage current in detail yet, it is quite easy to measure in simulation. Simulate a single inverter with a P/N width ratio of 2:1 at both steady output states. (1) Measure the current flowing through the inverter and fill in Table 2 below. (2) Are the currents different for the two states? If they are different, why?

| Vout | $I_{VDD}$ |
|------|-----------|
| High |           |
| Low  |           |

Table 2: CMOS Inverter Leakage Current.

## 2 Energy-Delay Product and Other Metrics

**Problem 2.1** (5 points) **Delay:** Suppose a new MOSFET has been invented, where instead of the drain-source current varying quadratically with  $V_{GS} - V_T$ , it varies as:

$$I_{DS} = \frac{\mu C_{ox}}{2} \frac{W}{L} (V_{GS} - V_T)^{2.5} \tag{1}$$

where channel length modulation has been ignored. Using the fundamental equation for delay in CMOS,  $\Delta t = \frac{C\Delta V}{I_{DS}}$ , and assuming a voltage swing of  $V_{DD}$ , write the expression for propagation delay  $t_{pd}$  assuming the power law dependence for the drain current above.

**Problem 2.2** (3 points) **Energy-Delay:** Given that the energy stored on a circuit node is  $\frac{CV_{DD}^2}{2}$ , using the equation derived above for  $t_{pd}$  write an expression for the energy-delay product.

**Problem 2.3** (12 points) **Optimal Supply Voltage:** Find the optimal supply voltage (that is, the voltage which minimizes energy-delay product) for the energy-delay expression derived in the preceding part. How does this compare to the optimal  $V_{DD}$  for the classical quadratic drain current in saturation?
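An analytical answer can be sanity-checked numerically by sweeping the supply voltage. The Python sketch below assumes the delay model $t_{pd} = C V_{DD} / \left(K (V_{DD} - V_T)^{\alpha}\right)$ with all device constants lumped into a hypothetical factor $K$, and the energy $C V_{DD}^2 / 2$. The exponent $\alpha$, the threshold voltage, and the sweep range are parameters chosen for illustration.

```python
def energy_delay_product(vdd, vt, alpha, c=1.0, k=1.0):
    """Energy-delay product for a drain current proportional to (VDD - VT)**alpha."""
    t_pd = c * vdd / (k * (vdd - vt) ** alpha)  # delay = C * dV / I with dV = VDD
    energy = 0.5 * c * vdd ** 2                 # energy stored on the node
    return energy * t_pd


def sweep_minimum(vt, alpha, v_max, step=1e-3):
    """Return the swept VDD in (vt, v_max] with the lowest energy-delay product."""
    best_v, best_metric = None, float("inf")
    n_steps = int(round((v_max - vt) / step))
    for i in range(1, n_steps + 1):
        vdd = vt + i * step
        metric = energy_delay_product(vdd, vt, alpha)
        if metric < best_metric:
            best_v, best_metric = vdd, metric
    return best_v


# Classical quadratic case (alpha = 2): the sweep should land near 3 * VT.
v_opt = sweep_minimum(vt=0.3, alpha=2.0, v_max=3.0)
```

If the returned voltage equals `v_max`, the metric is still decreasing at the edge of the sweep, which means there is no interior minimum within the range that was searched.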

**Problem 2.4** (10 points) **Energy-Delay Squared:** Suppose we want to optimize a new metric: the energy-delay squared product (ED2P). Find the  $V_{DD}$  which minimizes ED2P given the saturation drain current equation above.

## 3 Activity Factors and Transition Probabilities

**Problem 3.1** (5 points) **And-Or-Invert Gates:** And-Or-Invert (AOI) gates are often included in standard cell libraries to reduce the area of synthesized combinational logic because they implement common logic functions with fewer devices. Draw a static CMOS gate which implements the AOI function  $Y = \overline{A \cdot (B + C + D)}$ .

**Problem 3.2** (10 points) **Activity Factors for And-Or-Invert Gates:** Write down the truth table for the AOI gate for Y. Determine the transition activity $\alpha_{0\to 1}$ of the output assuming the inputs are independent and uniformly distributed.
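For independent, uniformly distributed inputs, every row of the truth table is equally likely, so the probability that the output is 1 is the fraction of rows that evaluate to 1. The sketch below assumes the usual static CMOS relation $\alpha_{0\to 1} = p_0 \, p_1$ and illustrates it on a 2-input NOR gate rather than on the AOI gate itself; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def output_one_probability(func, n_inputs):
    """Fraction of truth-table rows for which func evaluates to 1."""
    rows = product((0, 1), repeat=n_inputs)
    ones = sum(1 for bits in rows if func(*bits))
    return Fraction(ones, 2 ** n_inputs)


def activity_0_to_1(func, n_inputs):
    """Probability that the output is 0 in one cycle and 1 in the next."""
    p1 = output_one_probability(func, n_inputs)
    return (1 - p1) * p1


def nor2(a, b):
    """2-input NOR, used here only as a worked example."""
    return int(not (a or b))


nor2_activity = activity_0_to_1(nor2, 2)  # Fraction(3, 16)
```

The same two functions accept any Boolean function written as a Python callable, so a 4-input gate can be evaluated by passing its logic expression and `n_inputs=4`.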

**Problem 3.3** (25 points) **And-Or-Invert Gates versus Basic Gates:** (1) Draw the simplest possible implementation of the logic function Y using 2-input basic gates (NOR, OR, NAND, AND, XOR). (2) Assuming the inputs are independent and their probabilities of equaling 1 are 0.5, derive the activity factors for the outputs and any internal nodes of the 2-input gate implementation based on the transition probability formulas from Lecture 3. Assume the self-loading capacitance of a gate is the same as its input capacitance (i.e., both are equal to some unit capacitance $C_u$), and all gates are equivalent in terms of capacitance. (3) How much more efficient is the AOI implementation than the 2-input gate implementation in terms of *effective* capacitance (write your answer in units of $C_u$)?

## 4 Activity of Full Adders

**Problem 4.1** (45 points) **Full Adder Implementation:** Many digital arithmetic functions require addition. A Full Adder is a combinational logic block that takes two summand inputs (A and B) and a carry input  $C_i$  to produce a sum output  $S_o$  and a carry output  $C_o$ . The logic equations for the outputs are:

$$S_o = A \oplus B \oplus C_i \tag{2}$$

$$C_o = AB + BC_i + AC_i. \tag{3}$$

Implement these equations using only two-input *noninverting* gates (XOR, AND, OR). Assuming the three inputs are independent and uniformly distributed, compute the transition probabilities of the output nodes and any internal nodes.

**Problem 4.2** (10 points) **Input Reordering:** For the sum output  $(S_o)$  circuit, recompute the transition probabilities when the inputs are independent but the probabilities of each signal equaling 1 are now:  $p_A = 0.5$ ,  $p_B = 0.2$ , and  $p_{C_i} = 0.1$ . If possible, rearrange the inputs to minimize switching activity on internal nodes.
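When the input probabilities are not 0.5, the signal probability at each gate output has to be propagated gate by gate. The Python sketch below uses the standard expressions for 2-input gates with independent inputs and again assumes $\alpha_{0\to 1} = p_0 \, p_1$; the helper names are hypothetical. These expressions do not hold when the two inputs of a gate are correlated, for example through reconvergent fanout.

```python
def p_and(pa, pb):
    """Probability that a 2-input AND output is 1 (independent inputs)."""
    return pa * pb


def p_or(pa, pb):
    """Probability that a 2-input OR output is 1 (independent inputs)."""
    return 1.0 - (1.0 - pa) * (1.0 - pb)


def p_xor(pa, pb):
    """Probability that a 2-input XOR output is 1 (independent inputs)."""
    return pa * (1.0 - pb) + pb * (1.0 - pa)


def activity(p_one):
    """0-to-1 transition probability of a node whose probability of being 1 is p_one."""
    return (1.0 - p_one) * p_one


# Example: internal node of an XOR chain whose first gate sees p = 0.5 and p = 0.2.
p_internal = p_xor(0.5, 0.2)
alpha_internal = activity(p_internal)
```

Evaluating `activity` on the internal node for each possible pairing of inputs at the first gate shows which input ordering gives the lowest internal switching activity.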

| Metal Layer | W                     | H                     | t                     | $c_{pp}$ | $c_{fringe}$ | $\% c_{fringe}$ |
|-------------|-----------------------|-----------------------|-----------------------|----------|--------------|-----------------|
| M1          | $0.072~\mu\mathrm{m}$ | $0.072~\mu\mathrm{m}$ | $0.320~\mu\mathrm{m}$ |          |              |                 |
| M2          | $0.072~\mu\mathrm{m}$ | $0.090~\mu\mathrm{m}$ | $0.690~\mu\mathrm{m}$ |          |              |                 |

Table 3: Metal capacitances (per unit length).

**Problem 4.3** (10 points) **Glitching:** Consider the carry output ($C_o$) circuit from Problem 4.1. Suppose each noninverting gate (XOR, AND, OR) has a delay of 2 units. Assuming the inputs only change simultaneously, draw a timing diagram which creates a glitch on $C_o$. Hint: think about $C_o$ being high at time t = 0. How long is the glitch duration in terms of unit delays?

**Problem 4.4** (10 points) **Delay Balancing:** Suppose inverting gates (NAND, NOR) have a delay of 1 unit because they don't require an inverter on their outputs. Reimplement the logic function for carry out  $C_o$  to balance the delays and eliminate the output glitch.

## 5 Power Estimation for Memories

**Problem 5.1** (15 points) **Estimating Wire Capacitance:** In this problem we will explore optimizing the aspect ratio of a single memory array in terms of the wire capacitance. There are generally two kinds of wires in an array, *wordlines* which run horizontally and activate a single row, and *bitlines* which run vertically and carry data. Often these are implemented in two different metal layers. Table 3 lists the dimensions for the lower two levels of metal. The relative dielectric coefficient  $\epsilon_r$  for SiO<sub>2</sub> is 3.9.  $\epsilon_0$  for free space is  $8.854 \times 10^{-12}$  F/m. Given these constants and the dimensions listed in the table, fill in the parallel plate capacitance, fringing capacitance, and percentage of fringing capacitance of the total using the formulas given in lecture.
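The lecture formulas for fringing capacitance are not reproduced here, so the sketch below covers only the parallel plate term, $c_{pp} = \epsilon_r \epsilon_0 W / t$ per unit length, plus the percentage calculation. It assumes that $t$ in Table 3 is the dielectric thickness under the wire; the function names and the 1 µm example dimensions are hypothetical.

```python
EPS_0 = 8.854e-12    # permittivity of free space, F/m
EPS_R_SIO2 = 3.9     # relative dielectric coefficient of SiO2


def parallel_plate_cap_per_length(width_m, dielectric_thickness_m, eps_r=EPS_R_SIO2):
    """Parallel plate capacitance per unit wire length, in F/m."""
    return eps_r * EPS_0 * width_m / dielectric_thickness_m


def fringe_percentage(c_pp, c_fringe):
    """Fringing capacitance as a percentage of the total wire capacitance."""
    return 100.0 * c_fringe / (c_pp + c_fringe)


# Hypothetical wire: 1 um wide over 1 um of SiO2.
c_pp_example = parallel_plate_cap_per_length(1e-6, 1e-6)
```

All dimensions are in meters, so the values in Table 3 need to be converted from micrometers before they are passed in.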

**Problem 5.2** (15 points) **Optimizing Memory Array Aspect Ratio:** Figure 1 shows a schematic of a memory array where the total number of bits is $2^N$. The *N* address bits $A_i$ can be divided into *k* row address bits and N - k column address bits. Suppose the per unit length capacitance of the bitlines is $C_u$ and the per unit length capacitance of the wordlines is related to $C_u$ by some factor $\gamma$. The total wiring capacitance of the memory array is proportional to the number of bits in a row and the number of bits in a column, times the appropriate per unit length capacitances. (1) Write a formula for this wiring capacitance in terms of $C_u$, $\gamma$, N, and k. (2) Find the optimum partitioning (which minimizes wiring capacitance) between row address bits and column address bits (i.e. memory array aspect ratio) in terms of these parameters.

**Problem 5.3** (5 points) **An Example:** Using the optimum formula derived in Problem 5.2 and the capacitances estimated in Problem 5.1, solve for the optimal aspect ratio assuming wordlines run in M2 and bitlines in M1.



Figure 1: Memory array schematic.

## References

- [1] P. Landman and J. Rabaey, "Architectural power analysis: The dual bit type method," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 3, no. 2, pp. 173–187, Jun. 1995.
- [2] —, "Activity-sensitive architectural power analysis," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 15, no. 6, pp. 571–87, Jun. 1996.
- [3] M. Nemani and F. N. Najm, "Towards a high-level power estimation capability," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 15, no. 6, pp. 588–98, Jun. 1996.
- [4] —, "High-level area and power estimation for VLSI circuits," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 18, no. 6, pp. 697–713, Jun. 1999.
- [5] Nanoscale Integration and Modeling (NIMO) Group, Arizona State University. (2006, December) Predictive Technology Model (PTM). [Online]. Available: http://www.eas.asu.edu/~ptm/
- [6] W. Zhao and Y. Cao, "New generation of predictive technology model for sub-45 nm early design exploration," *IEEE Trans. Electron Devices*, vol. 53, no. 11, pp. 2816–23, November 2006.