Problem 1

1.1 Figure 1 displays the single inverter switching cycle and the corresponding supply current (6 points). The current shape matches intuition in that current is drawn from the supply while the output charges to $V_{DD}$. The other positive current spikes correspond to capacitive coupling on the output (the small blips can be seen on the output node) pushing charge into the supply. Note that the current is not particularly triangular in shape, so there is a component corresponding to the dynamic power and a component corresponding to short circuit current (3 points). The average current is given by a measure card as 869 nA, which at $V_{DD} = 1.0$ V corresponds to an average power of 869 nW (6 points). This current is roughly an order of magnitude higher than the leakage currents measured below (Table 2), indicating that leakage may not be the dominant issue for a continuously switching gate at this process node. However, at any given time large numbers of devices which aren’t switching will be leaking, and there could be long durations where no nodes are switching, so leakage could become a chip-wide issue even if it is small at the individual gate level.
Figure 1: Single charge-discharge cycle and corresponding power supply current for an inverter driving an FO4 load.

The spice deck which generated the plot in Figure 1 and the measured currents is:

* EEC 216 W08 Problem Set 2 Number 1
* File: ps2.sp
* Author: Raj Amirtharajah (ramirtha@ece.ucdavis.edu)
* Date: 01/28/08
**
** Problem Set 2
**
** Problem 1.1: Average Switching Power
** Last edited: Feb 11 21:34 2008 (ramirtha)
**---------------------------------------------------------------

.include 'macros.sp'
.include '45nm_MGHiK.sp'
.param lambda=24nm vdd=1.0V
.options accurate post

Table 1: Short circuit power versus fanout.

Note that the given stimulus in the spice deck has very long input rise times to the device under test, which will exaggerate the short-circuit power contribution. The total average power measured in this problem is 1.81 µW. The peak current for this simulation is $I_{\text{peak}} = 49.9\,\mu\text{A}$ and the short circuit current duration is approximately $t_{sc} = 1200$ ps (see Figure 2). Using the following formula and plugging in the appropriate values ($\alpha = 1$ for the continuously switching output of the inverter):

\[
P_{sc} = t_{sc} V_{DD} I_{\text{peak}} \alpha f = (1200\text{ps})(1.0\text{V})(49.9\mu\text{A})(1)(25\text{MHz}) \tag{1}
\]

which yields \( P_{sc} = 1.497\mu\text{W} \) (3 points). This number is less than the average measured total power, but is a very large fraction of the total, so this approximation to the short circuit current may be poor (2 points). The \( V_T \) reported by HSPICE is -231 mV for the PMOS and 278 mV for the NMOS and the output rise and fall times are 383.6 ps and 384.2 ps, respectively. The input rise and fall times are 1.6 ns each (2 points). Plugging these times into the following equation,

\[
t_{sc} \approx \frac{V_{DD} - V_{Tn} + V_{Tp}}{V_{DD}} t_r \tag{2}
\]

yields 982 ps when the full 0-to-\( V_{DD} \) input ramp of the stimulus (\( t_r = 2.0 \) ns) is used, about 18% smaller than the measured \( t_{sc} \). Plugging the revised \( t_{sc} \) into the short circuit power formula (Equation 1) gives \( 1.23\,\mu\text{W} \). This may still be an unreasonably high proportion of the total power (67.96%) (3 points). However, since the input slopes are highly exaggerated, this number is nevertheless plausible.
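The arithmetic behind Equations 1 and 2 can be checked with a short script. This is only a sketch: the numbers are the HSPICE values quoted in the text, and the 2.0 ns ramp time is an assumption read off the `pulse` source in the deck, chosen because it reproduces the 982 ps figure. The function names are hypothetical.

```python
# Values quoted in the text (HSPICE measurements reported in this solution).
VDD = 1.0          # supply voltage, V
I_PEAK = 49.9e-6   # peak supply current, A
FREQ = 25e6        # switching frequency, Hz (40 ns stimulus period)
ALPHA = 1.0        # activity factor for a continuously switching output
P_TOTAL = 1.81e-6  # measured total average power, W
V_TN = 0.278       # NMOS threshold, V
V_TP = -0.231      # PMOS threshold, V (negative by convention)
T_RAMP = 2.0e-9    # ASSUMPTION: full 0-to-VDD input ramp from the pulse source, s


def short_circuit_power(t_sc, vdd=VDD, i_peak=I_PEAK, alpha=ALPHA, freq=FREQ):
    """Equation 1: P_sc = t_sc * VDD * I_peak * alpha * f."""
    return t_sc * vdd * i_peak * alpha * freq


def short_circuit_duration(t_r, vdd=VDD, v_tn=V_TN, v_tp=V_TP):
    """Equation 2: t_sc ~ (VDD - V_Tn + V_Tp) / VDD * t_r."""
    return (vdd - v_tn + v_tp) / vdd * t_r


p_measured = short_circuit_power(1200e-12)  # uses the measured t_sc
t_sc_est = short_circuit_duration(T_RAMP)   # Equation 2 estimate
p_estimated = short_circuit_power(t_sc_est)

print(f"P_sc with measured t_sc: {p_measured * 1e6:.3f} uW")
print(f"t_sc from Equation 2:    {t_sc_est * 1e12:.0f} ps")
print(f"P_sc from Equation 2:    {p_estimated * 1e6:.3f} uW")
print(f"Fraction of total power: {100 * p_estimated / P_TOTAL:.1f} %")
```

The script carries full precision, so its fraction of total power comes out slightly below the 67.96% in the text, which divides the rounded 1.23 µW by 1.81 µW.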

Table 1 summarizes the data which indicate that as gate loading increases, the fraction of power due to short circuit current increases for these long input rise times (5 points). Since the output rise time increases for the fixed width driver as the load increases, short circuit current flows for a longer period of time, resulting in an increase in the total power (in addition to the increased dynamic power due to the larger loads). The fraction corresponding to short circuit current does not change much for these examples. Note that the approximations for short circuit current appear to yield more reasonable results for the long rise times used in this problem; however, the scaling of short circuit power with load capacitance is not really observed in the data for this example (5 points).

Figure 2 shows the peak current scaling with the increasing inverter fanout, as would be expected since a larger capacitance must be charged. Note that the peak in current occurs later and later as the fanout increases, showing that it takes longer for both devices to turn fully on (5 points).

1.4 Given that the estimates for short circuit power based on the really fast rise times (below 70 ps) seem to exaggerate the short circuit current component, it is likely that the classical models for short circuit current don’t really hold. In other words, most of the peak current is used to charge and discharge the load capacitance and very little is wasted as short circuit current unless the input risetimes are very long (5 points).

Figure 2: Peak current scaling with increasing fanout.

1.5 Figure 3 plots the leakage current for an inverter with P/N ratio of 2:1 at both high and low output states. The currents are summarized in Table 2 (6 points). The currents are not the same, with the output high current (PMOS on, NMOS off) being about 2.5 times bigger (95.1 nA versus 37.5 nA). There are two competing effects in determining leakage current: threshold voltage and device width. In this case, the P device width is sufficiently bigger to produce increased leakage in the output high state (3 points). Note that in 45 nm technology, despite the inclusion of a high-K gate dielectric, the gate leakage current is actually dominant as shown in the figure. For the output high state, the drain-source current through the PMOS device is determined by KCL:

\[
I_{DS,PMOS} = I_{G,PMOS} + I_{G,NMOS} + I_{DS,NMOS}
\]

\[
-95.1 \approx -77.7 + (-11.0) + (-9.22)
\]

\[
= -97.9
\]

where all currents are in nA. A similar calculation also holds for the current in the output low state, in which case the NMOS drain-source current is much less than the NMOS gate leakage.

One can also measure the currents for an inverter with the PMOS device at the same width as the NMOS device. In this case, the difference in leakage current between the two states should be primarily due to the difference in threshold voltages and gate leakage for the two types of device. The leakage in the low output state corresponds to 15.7 nA, indicating that the PMOS width is lower by a factor of two, but the PMOS leakage is still higher than the corresponding NMOS leakage in the output high state. This indicates that the PMOS threshold is lower, confirmed by printing the operating point from the simulation. Note that this is approximately a factor of two less than the leakage reported in Table 2, which is consistent with the 2X sizing chosen for the PMOS device in this circuit.

<table>
<thead>
<tr>
<th>$V_{out}$</th>
<th>$I_{VDD}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>High</td>
<td>95.1 nA</td>
</tr>
<tr>
<td>Low</td>
<td>37.5 nA</td>
</tr>
</tbody>
</table>

Table 2: CMOS Inverter Leakage Current.

(a) Gate and source-drain leakage for output high.

(b) Gate and source-drain leakage for output low.

Figure 3: Static (leakage) current for an inverter at both output states.

The spice deck for this problem follows:

* EEC 216 W08 Problem Set 2 Number 1
* File: ps2.sp
* Author: Raj Amirtharajah (ramirtha@ece.ucdavis.edu)
* Date: 01/28/08
**
** Problem Set 2
**
** Problem 1: Short Circuit Power
** Last edited: Feb 12 22:07 2008 (ramirtha)
**---------------------------------------------------------------------

.include 'macros.sp'
.include '45nm_MGHiK.sp'
.param lambda=24nm vdd=1.0V

.options accurate post kcltest
.temp 27

.tran 1ps 80.0ns

.op
.global vdd gnd vcc
.probe i(Xdut.vps) i(Xdut.vns)

* Power Supplies
Vvdd vdd gnd dc=vdd
Vvcc vcc gnd dc=vdd

* Stimulus
Vin0 in0 gnd pulse (0V, vdd, 18.0ns, 2.0ns, 2.0ns, 18.0ns, 40.0ns)

* Inverter Macro for Short Circuit Current Measurement
* ----------------------------------------------------
.macro invSC in out
Vps vps vdd dc=0V
Vpg in inp dc=0V
Xp0 vps inp out pfet Wi='2*5*lambda'
Xn0 vns inn out nfet Wi='5*lambda'
Vns gnd vns dc=0V
Vng in inn dc=0V
.eom
* Inverters
* 
.macro inv in out
Xp0 vdd in out pFet Wi='2*5*lambda'
Xn0 gnd in out nFet Wi='5*lambda'
.eom

.macro invT in out
Xp0 vcc in out pFetT Wi='2*5*lambda'
Xn0 gnd in out nFetT Wi='5*lambda'
.eom

* Short Circuit Current Test
* 
.ic out=vdd
Xdut in0 out invSC
* Xld0 out flt0 invT M=4

.macro invSCmin in out
Vps vps vdd dc=0V
Xp0 vps in out pFet Wi='5*lambda'
Xn0 vns in out nFet Wi='5*lambda'
Vns gnd vns dc=0V
.eom

* FO4 Leakage Power Test
* 
.XdutL0 vdd outL0 invSC
.XdutH0 gnd outH0 invSC

.XdutL1 vdd outL1 invSCmin
.XdutH1 gnd outH1 invSCmin

* Measure Cards
* 
.measure tran iavg AVG i(Vvdd)
.measure tran ipeak MIN i(Vvdd)
.measure tran itr trig v(in0) val='0.1*vdd' rise=2
+ targ v(in0) val='0.9*vdd' rise=2
.measure tran itf trig v(in0) val='0.9*vdd' fall=1
+ targ v(in0) val='0.1*vdd' fall=1
.measure tran otr trig v(out) val='0.1*vdd' rise=1
+ targ v(out) val='0.9*vdd' rise=1
.measure tran otf trig v(out) val='0.9*vdd' fall=2
+ targ v(out) val='0.1*vdd' fall=2
* .alter
* FO4 Dynamic Power Test
* ----------------------
* Xbuf in0 a0  invT
* Xdut a0  out  invSC
* Xld0  out  flt0  invT  M=4

* .alter
* Vin0 in0 gnd  pwl  (0ps vdd, 1.0ns vdd, 1.01ns 'vdd/2')

* .alter
* Xld0  out  flt0  invT  M=4
* Vin0 in0 gnd  pulse  (0V, vdd, 18.0ns, 2.0ns, 2.0ns, 18.0ns, 40.0ns)

.alter
Xld0  out  flt0  invT  M=2

.alter
Xld0  out  flt0  invT  M=4

.alter
Xld0  out  flt0  invT  M=6

.alter
Xld0  out  flt0  invT  M=8

.alter
Xld0  out  flt0  invT  M=10

.alter
Xld0  out  flt0  invT  M=12

.end

Problem 2

2.1 The propagation delay is determined by plugging in the equation for drain current assuming the voltage swing and the gate drive are equal to the power supply voltage $V_{DD}$ (5 points):

$$t_{pd} = \frac{CV_{DD}}{I_{DS}} = \frac{CV_{DD}}{\frac{\mu C_{ox} W}{L} (V_{DD} - V_T)^{2.5}}. \tag{4}$$

2.2 Energy delay product is the product of the expression derived above and the energy expression for a capacitor $C$ charged up to $V_{DD}$ (3 points):

$$EDP = E \cdot t_{pd} = \frac{C^2 V_{DD}^3}{\frac{\mu C_{ox} W}{L} (V_{DD} - V_T)^{2.5}}. \tag{5}$$

2.3 To solve this, take the derivative of the energy-delay product expression with respect to $V_{DD}$, set it equal to 0, and solve for $V_{DD}$ (6 points):

$$\frac{\partial EDP}{\partial V_{DD}} = \frac{C^2 L}{\mu C_{ox} W} \cdot \frac{3V_{DD}^2 (V_{DD} - V_T)^{2.5} - 2.5\,V_{DD}^3 (V_{DD} - V_T)^{1.5}}{(V_{DD} - V_T)^{5}} \tag{6}$$

For the derivative to equal 0, the numerator must be set to 0:

$$3V_{DD}^2 (V_{DD} - V_T)^{2.5} - 2.5\,V_{DD}^3 (V_{DD} - V_T)^{1.5} = 0$$

$$3V_{DD}^2 (V_{DD} - V_T)^{2.5} = 2.5\,V_{DD}^3 (V_{DD} - V_T)^{1.5}$$

$$V_{DD} - V_T = \frac{2.5}{3} V_{DD}$$

$$V_{DD} = 6V_T \tag{7}$$

The optimal $V_{DD}$ for classical MOS is $3V_T$ (2 points). For the 2.5 power-law device, however, the minimum EDP voltage is $6V_T$, because the numerator of the energy-delay product scales faster than the denominator, but not as fast as the classical MOS case. If the device obeyed a cube-law behavior, then the optimal $V_{DD}$ is 0V (this is left as an exercise for the reader). This example indicates that metrics which are relevant at some regimes of MOSFET operation may not be useful in others (4 points).
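The stationary points quoted for the classical and 2.5 power-law devices can be cross-checked numerically. The sketch below drops the constant prefactor and works only with the voltage-dependent part of the EDP; the threshold value and the function names are hypothetical, chosen for illustration.

```python
def edp_shape(vdd, vt, n):
    """Voltage-dependent part of the EDP: VDD^3 / (VDD - VT)^n."""
    return vdd ** 3 / (vdd - vt) ** n


def stationary_vdd(vt, n):
    """Closed form from 3 * (VDD - VT) = n * VDD, valid for n < 3."""
    return 3.0 * vt / (3.0 - n)


def numeric_minimum(vt, n, steps=50_000, vmax_over_vt=20.0):
    """Brute-force grid search over VT < VDD <= vmax_over_vt * VT."""
    best_v, best_f = None, float("inf")
    for i in range(1, steps + 1):
        v = vt + i * (vmax_over_vt - 1.0) * vt / steps
        f = edp_shape(v, vt, n)
        if f < best_f:
            best_v, best_f = v, f
    return best_v


VT = 0.3  # hypothetical threshold voltage in volts, for illustration only
for n in (2.0, 2.5):
    closed = stationary_vdd(VT, n) / VT
    numeric = numeric_minimum(VT, n) / VT
    print(f"n = {n}: closed form {closed:.3f} VT, grid search {numeric:.3f} VT")
```

For the square-law exponent the grid search should land near $3V_T$, and for the 2.5 power law near $6V_T$, matching Equation 7.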

2.4 Energy delay squared product is the product of the square of the delay expression derived above and the energy expression for a capacitor $C$ charged up to $V_{DD}$ (3 points):

$$ED^2P = E \cdot t_{pd}^2 = \frac{C^3 V_{DD}^4}{\left(\frac{\mu C_{ox} W}{L}\right)^2 (V_{DD} - V_T)^5}. \tag{8}$$

$$\frac{\partial ED^2P}{\partial V_{DD}} = \frac{C^3 L^2}{(\mu C_{ox} W)^2} \cdot \frac{4V_{DD}^3 (V_{DD} - V_T)^5 - 5V_{DD}^4 (V_{DD} - V_T)^4}{(V_{DD} - V_T)^{10}} \tag{9}$$

For the derivative to equal 0, the numerator must be set to 0:

\[4V_{DD}^3 (V_{DD} - V_T)^5 - 5V_{DD}^4 (V_{DD} - V_T)^4 = 0\]
\[4V_{DD}^3 (V_{DD} - V_T)^5 = 5V_{DD}^4 (V_{DD} - V_T)^4\]
\[V_{DD} - V_T = \frac{5}{4} V_{DD}\]
\[V_{DD} = -4V_T\]

This yields a negative supply voltage, a result inconsistent with the one obtained before. The only plausible solution is to set \(V_{DD}\) to 0V. In this example, since delay\(^2\) increases faster than energy decreases, using delay-squared as a metric implies using a zero volt power supply (7 points).

Problem 3

3.1 Two possible pulldown and pullup networks are shown in Figures 4 and 5 (5 points). Exchanging the order of the series connections also creates valid pullup networks.

Given the logic equation for $Y$, computing the transition activity factor $\alpha_{0\rightarrow1}$ is easy once the truth table for output $Y$ is determined. This is shown in Table 3. Using the equation from lecture, the transition probability is:

$$\alpha_{0\rightarrow1} = \frac{N_0 \cdot N_1}{2^N \cdot 2^N}$$

where $N_0$ and $N_1$ are the numbers of 0’s and 1’s in the output column of the truth table, respectively. Plugging in the numbers from the tables into the formula yields (8 points):
Table 3: Truth table for function $Y = A \cdot (B + C + D)$.

<table>
<thead>
<tr>
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
<th>Y</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Figure 6: Y-output two input gate implementation.

$$\alpha_{0\rightarrow 1}(Y) = \frac{7}{16} \times \frac{9}{16} = \frac{63}{256}$$
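The truth-table counting can be reproduced by enumerating all input combinations. This is a minimal sketch that assumes equally likely, independent inputs and takes the function as written in the Table 3 caption; the helper names are hypothetical. Complementing the output only swaps $N_0$ and $N_1$, so the product is unchanged.

```python
from fractions import Fraction
from itertools import product


def transition_activity(func, num_inputs):
    """alpha_{0->1} = (N0 / 2^N) * (N1 / 2^N) for uniformly random inputs."""
    outputs = [func(*bits) for bits in product((0, 1), repeat=num_inputs)]
    n1 = sum(outputs)            # rows where the output is 1
    n0 = len(outputs) - n1       # rows where the output is 0
    total = 2 ** num_inputs
    return Fraction(n0, total) * Fraction(n1, total)


def y_func(a, b, c, d):
    """Y = A * (B + C + D), as written in the Table 3 caption."""
    return a & (b | c | d)


alpha_y = transition_activity(y_func, 4)
alpha_y_inverted = transition_activity(lambda a, b, c, d: 1 - y_func(a, b, c, d), 4)

print(alpha_y, float(alpha_y))
print(alpha_y == alpha_y_inverted)
```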

3.3 Figure 6 shows a small implementation using only two-input gates. Other implementations are possible but require the use of inverters (5 points).

The logic gates don’t contain any reconvergent nodes, so we can ignore conditional probabilities when computing the transition activity factors. Using the formulas from the table, we can compute the transition activity factors assuming input probabilities of 0.5. For the Y logic equation (5 points):

$$P_{0\rightarrow 1}(Z) = (1 - p_D)(1 - p_C)[1 - (1 - p_D)(1 - p_C)]$$

$$= (1 - 0.5)(1 - 0.5)[1 - 0.5 \times 0.5]$$

$$= \frac{3}{16} = 0.1875$$

$$p_Z = \frac{N_1}{4} = \frac{3}{4} = 0.75$$
\[ P_{0\rightarrow 1}(X) = (1 - p_B)(1 - p_Z)[1 - (1 - p_B)(1 - p_Z)] \]
\[ = (1 - 0.5)(1 - 0.75)[1 - 0.5 \cdot 0.25] \]
\[ = \frac{7}{64} = 0.1094 \]
\[ p_X = [1 - (1 - p_B)(1 - p_Z)] = 1 - \frac{1}{8} = 0.875 \]
\[ P_{0\rightarrow 1}(Y) = p_Ap_X(1 - p_Ap_X) \]
\[ = 0.5 \cdot 0.875(1 - 0.5 \cdot 0.875) \]
\[ = \frac{63}{256} = 0.2461 \]
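The gate-by-gate propagation above can be sketched in a few lines. The gate assignments ($Z = C + D$, $X = B + Z$, $Y = A \cdot X$) are an assumption read off the form of the equations rather than from Figure 6, and the helper names are hypothetical.

```python
def p_or(p_a, p_b):
    """Probability that a 2-input OR output is 1, for independent inputs."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)


def p_and(p_a, p_b):
    """Probability that a 2-input AND output is 1, for independent inputs."""
    return p_a * p_b


def alpha(p_one):
    """Transition activity alpha_{0->1} = Pr(0) * Pr(1)."""
    return (1.0 - p_one) * p_one


P_IN = 0.5  # every primary input is 1 with probability 0.5

p_z = p_or(P_IN, P_IN)   # ASSUMED: Z = C + D
p_x = p_or(P_IN, p_z)    # ASSUMED: X = B + Z
p_y = p_and(P_IN, p_x)   # ASSUMED: Y = A * X

for name, p in (("Z", p_z), ("X", p_x), ("Y", p_y)):
    print(f"{name}: p = {p:.4f}, alpha = {alpha(p):.4f}")
```

Because no signal fans out and reconverges, each gate can be treated with independent input probabilities, which is why the simple product formulas suffice here.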

From the transition activity factors calculated earlier, the effective capacitance for the Y AOI gate is \( C_{sw}(Y) = \alpha_{0\rightarrow 1}C_u = \frac{63}{256}C_u \), or about 0.2461\( C_u \). This is based on the self-loading or output capacitance of each gate equaling \( C_u \). The two-input gate implementation has to include the activity and the capacitances of the intermediate nodes as well as the output. These are summarized below:

\[ C_{sw}(Y) = \alpha_{0\rightarrow 1}(Z)(2C_u) + \alpha_{0\rightarrow 1}(X)(2C_u) + \alpha_{0\rightarrow 1}(Y)C_u \]
\[ = 0.8399C_u \tag{13} \]

The AOI gate implementation for the Y equation has only approximately 30% of the switched capacitance of the two input gate implementation. AOI gates are generally quite efficient for implementing miscellaneous logic equations, hence they are often included in standard cell libraries designed for synthesis (10 points).
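The comparison between the two implementations can be checked with exact fractions. This sketch takes the activity factors computed above as given and weights the intermediate nodes by $2C_u$, as in Equation 13; the variable names are hypothetical.

```python
from fractions import Fraction

# Activity factors from the derivation above.
alpha_z = Fraction(3, 16)
alpha_x = Fraction(7, 64)
alpha_y = Fraction(63, 256)

# Switched capacitance in units of C_u.
c_two_input = 2 * alpha_z + 2 * alpha_x + alpha_y  # intermediate nodes see 2*C_u
c_aoi = alpha_y                                    # single complex gate output

ratio = c_aoi / c_two_input

print(f"Two-input gates: {float(c_two_input):.4f} C_u")
print(f"AOI gate:        {float(c_aoi):.4f} C_u")
print(f"AOI / two-input: {100 * float(ratio):.1f} %")
```

The exact sum is $\frac{215}{256}C_u$; the 0.8399 in Equation 13 follows from adding the individually rounded terms.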
Figure 7: Two input gate implementation of sum function.

Table 4: Transition probability table for full adder implementation.

<table>
<thead>
<tr>
<th>Nodes</th>
<th>$\alpha_{0\rightarrow 1}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>$\frac{1}{2} \times \frac{1}{2} = 0.25$</td>
</tr>
<tr>
<td>S</td>
<td>$\frac{4}{8} \times \frac{4}{8} = 0.25$</td>
</tr>
<tr>
<td>Y0</td>
<td>$\frac{3}{16} = 0.1875$</td>
</tr>
<tr>
<td>Y1</td>
<td>$\frac{1}{16} = 0.0625$</td>
</tr>
<tr>
<td>Y2</td>
<td>$\frac{5}{16} = 0.3125$</td>
</tr>
<tr>
<td>Y3</td>
<td>$\frac{3}{8} \times \frac{5}{8} = 0.2344$</td>
</tr>
<tr>
<td>Co</td>
<td>$\frac{1}{2} \times \frac{1}{2} = 0.25$</td>
</tr>
</tbody>
</table>

Problem 4

4.1 Figures 7 and 8 show two input gate implementations of the full adder sum and carry outputs (10 points).

Note that the circuits involve reconvergent fanouts, since inputs connect to multiple gates whose outputs are combined further down the datapath. This implies that conditional probabilities are required to compute the activity factors. However, because the gates are relatively simple, the easiest way to compute the transition probabilities is to use the truth table approach based on the inputs to the entire gate ($A$, $B$, and $C_i$) (35 points).

4.2 The probability that the internal node $X$ is 0, is determined by the probabilities that the inputs $A$ and $B$ from Figure 7 are the same: $\Pr(X = 0) = \Pr(A = 0) \cdot \Pr(B = 0) + \Pr(A = 1) \cdot \Pr(B = 1) = 0.5 \cdot 0.8 + 0.5 \cdot 0.2 = 0.5$. The activity factor is $\alpha_{0\rightarrow 1}(X) = \Pr(X = 0) \cdot \Pr(X = 1) = 0.25$. The output node probability $\Pr(S_o = 1)$ can be computed in a similar manner by enumerating the cases when the sum output is high: $\Pr(S_o = 1) = \Pr(A = 0) \cdot \Pr(B = 0) \cdot \Pr(C_i = 1) + \Pr(A = 1) \cdot \Pr(B = 0) \cdot \Pr(C_i = 0) + \Pr(A = 0) \cdot \Pr(B = 1) \cdot \Pr(C_i = 0) + \Pr(A = 1) \cdot \Pr(B = 1) \cdot \Pr(C_i = 1) = 0.5$. The activity factor $\alpha_{0\rightarrow 1}$ for $S_o$ is 0.25 (5 points).

Activity factor is typically minimized when the most active input (transition probability closest to $\frac{1}{2}$) is moved as far down the logic path as possible. If this were done, $\Pr(X = 0) = \Pr(C_i = 0) \cdot \Pr(B = 0) + \Pr(C_i = 1) \cdot \Pr(B = 1) = 0.9 \cdot 0.8 + 0.1 \cdot 0.2 = 0.74$. The activity factor is now $\alpha_{0\rightarrow 1}(X) = \Pr(X = 0) \cdot \Pr(X = 1) = 0.1924$, so the input reordering reduces the internal node activity factor by about 23% (5 points). Figure 9 shows the rearranged circuit.
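The effect of the input reordering can be reproduced numerically. The sketch below uses the input probabilities implied by the numbers in 4.2 ($\Pr(A=1)=0.5$, $\Pr(B=1)=0.2$, $\Pr(C_i=1)=0.1$) and assumes independent inputs; the function names are hypothetical.

```python
from itertools import product

P_A, P_B, P_CI = 0.5, 0.2, 0.1  # Pr(input = 1), read off the numbers in 4.2


def xor_zero_probability(p_a, p_b):
    """Pr(X = 0) for X = A xor B: the two independent inputs agree."""
    return (1 - p_a) * (1 - p_b) + p_a * p_b


def activity(p_zero):
    """alpha_{0->1} = Pr(node = 0) * Pr(node = 1)."""
    return p_zero * (1 - p_zero)


def prob_sum_high(p_a, p_b, p_ci):
    """Pr(S = 1): add up the probability of every odd-parity input pattern."""
    total = 0.0
    for a, b, ci in product((0, 1), repeat=3):
        if a ^ b ^ ci:
            total += ((p_a if a else 1 - p_a)
                      * (p_b if b else 1 - p_b)
                      * (p_ci if ci else 1 - p_ci))
    return total


alpha_original = activity(xor_zero_probability(P_A, P_B))    # X = A xor B
alpha_reordered = activity(xor_zero_probability(P_CI, P_B))  # X = Ci xor B
reduction = 1 - alpha_reordered / alpha_original
p_sum = prob_sum_high(P_A, P_B, P_CI)

print(f"alpha(X), original:  {alpha_original:.4f}")
print(f"alpha(X), reordered: {alpha_reordered:.4f}")
print(f"reduction:           {100 * reduction:.1f} %")
print(f"Pr(S = 1):           {p_sum:.2f}")
```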

4.3 The unbalanced nature of the logic tree for computing the carry output makes it intuitive that there might be some glitching on the output signal. The glitch is generated by causing the fast path (through Y2) and the slow path (through Y3) to be exercised by simultaneous transitions on two inputs. For the schematic drawn earlier (Figure 8), this is done by transitions on the B and carry inputs. The timing diagram showing a glitch duration of 2 units is shown in Figure 10.

Figure 8: Two input gate implementation of carry function.

Figure 9: Two input gate implementation of sum function with inputs reordered to reduce internal node switching activity.

4.4 Using DeMorgan’s laws, it is possible to reimplement the carry logic using inverting gates. Figure 11 shows the delay balanced implementation. The three NAND gates have a delay of 1 unit so they balance the 2 unit delay of the AND gate and prevent glitching on the output (10 points).

Figure 11: Two input gate implementation of carry function.

<table>
<thead>
<tr>
<th>Metal Layer</th>
<th>W</th>
<th>H</th>
<th>t</th>
<th>\( c_{pp} \)</th>
<th>\( c_{fringe} \)</th>
<th>% \( c_{fringe} \)</th>
</tr>
</thead>
<tbody>
<tr>
<td>M1</td>
<td>0.072 \( \mu \)m</td>
<td>0.072 \( \mu \)m</td>
<td>0.320 \( \mu \)m</td>
<td>\( 7.77 \times 10^{-18} \) F/\( \mu \)m</td>
<td>\( 9.46 \times 10^{-17} \) F/\( \mu \)m</td>
<td>92.42 %</td>
</tr>
<tr>
<td>M2</td>
<td>0.072 \( \mu \)m</td>
<td>0.090 \( \mu \)m</td>
<td>0.690 \( \mu \)m</td>
<td>\( 3.60 \times 10^{-18} \) F/\( \mu \)m</td>
<td>\( 7.76 \times 10^{-17} \) F/\( \mu \)m</td>
<td>95.57 %</td>
</tr>
</tbody>
</table>

Table 5: Metal capacitances (per unit length).

Problem 5

5.1 Using the formulas from lecture and the constants given in the problem statement, it should be easy to fill in the capacitances for M1 and M2 in Table 5. It is more convenient to use a value of \( 8.854 \times 10^{-18} \) F/\( \mu \)m for the permittivity of free space \( \varepsilon_0 \).

However, a corrected formula different from the one given in lecture is used to fill in the rest of Table 5. In both cases, it is clear that for minimum width wires in this process, the fringing field capacitance dominates the total. For long wires, a larger than minimum width is used so the fringing field capacitance is less dominant, but still a significant fraction of the total (15 points).
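The parallel-plate column and the fringe percentages in Table 5 can be cross-checked with the sketch below. Two things are assumptions, since the problem statement's constants and the lecture formulas are not reproduced here: a relative permittivity of 3.9, and the use of the column labeled $t$ as the dielectric spacing in $c_{pp} = \varepsilon_r \varepsilon_0 W / t$. Both were chosen because they reproduce the tabulated $c_{pp}$ values. The fringe capacitances are copied from the table, not recomputed.

```python
EPS0 = 8.854e-18  # permittivity of free space, F/um
EPS_R = 3.9       # ASSUMPTION: relative permittivity that reproduces the c_pp column

# W and t in um, c_fringe in F/um, copied from Table 5.
LAYERS = {
    "M1": {"W": 0.072, "t": 0.320, "c_fringe": 9.46e-17},
    "M2": {"W": 0.072, "t": 0.690, "c_fringe": 7.76e-17},
}


def parallel_plate(width, spacing):
    """ASSUMED form: c_pp = eps_r * eps_0 * W / t, per unit wire length."""
    return EPS_R * EPS0 * width / spacing


def fringe_percent(c_pp, c_fringe):
    """Share of the total per-unit-length capacitance due to fringing fields."""
    return 100.0 * c_fringe / (c_pp + c_fringe)


results = {}
for name, layer in LAYERS.items():
    c_pp = parallel_plate(layer["W"], layer["t"])
    results[name] = (c_pp, fringe_percent(c_pp, layer["c_fringe"]))
    print(f"{name}: c_pp = {c_pp:.3e} F/um, fringe = {results[name][1]:.2f} %")
```

The percentages agree with the table to within rounding of the tabulated capacitances.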

5.2 The goal of this problem is to explore the tradeoff between memory width (number of columns) and memory height (number of rows) in terms of the capacitance of the bit and word lines. The total capacitance is determined by the number of stored bits \( 2^N \), but the aspect ratio has an impact on power consumption and delay. In terms of power, the bitlines tend to have a much higher activity factor since they are typically precharged every cycle, whereas only one of the \( 2^k \) wordlines is activated per memory cycle, for an activity factor of \( \frac{1}{2^k} \) where \( k \) is the number of row address bits.

\( k \) bits are used for the row address and \( N - k \) are used for the column address. A bitline runs vertically and so must span all of the rows, so its length is proportional to the number of rows $2^k$. Similarly, the wordline running horizontally must span all of the columns so its length is proportional to the number of columns $2^{N-k}$. The wiring capacitance is proportional to the sum of the two components (6 points):

$$ C_{TOT} = C_{bit} + C_{word} = 2^k C_u + 2^{N-k} \gamma C_u \tag{14} $$

To minimize $C_{TOT}$ depending on the partitioning between row and column address bits, take the derivative with respect to $k$ and set it equal to 0:

$$ \frac{dC_{TOT}}{dk} = C_u \ln(2)[2^k - 2^{N-k} \gamma] = 0 $$

$$ 2^{2k} - 2^N \gamma = 0 $$

$$ 2k = N + \log_2(\gamma) $$

$$ k = \frac{N}{2} + \frac{1}{2} \log_2(\gamma) \tag{15} $$

This makes intuitive sense: the optimum partitioning of the bits is as a square array ($k = \frac{N}{2}$) plus a correction factor depending on the ratio of the capacitances between bit and word lines. Note that if $\gamma$ is less than 1, the optimum uses less than $\frac{N}{2}$ bits for the row address (9 points).

5.3 $\gamma$ is the ratio of the wordline capacitance per unit length to the bitline capacitance per unit length. Using the numbers from the top of Table 5, we compute $\gamma = 0.7933$. Plugging into the equation for the optimal number of row address bits yields:

$$ k = \frac{N}{2} + \frac{\log_2(0.7933)}{2} = \frac{N}{2} - 0.17 \approx \frac{N}{2} \tag{16} $$

So for the example of this problem, the optimum aspect ratio is approximately square, with equal numbers of row and column address bits (5 points).
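The optimum from Equation 15 can be compared against a brute-force search over integer row-address widths. This is a sketch only: the address width of 16 bits is a hypothetical example, not a value from the problem, and the function names are illustrative.

```python
import math

GAMMA = 0.7933  # wordline-to-bitline capacitance ratio from 5.3
N_BITS = 16     # HYPOTHETICAL total address width, for illustration only


def optimal_row_bits(n_bits, gamma):
    """Equation 15: k = N/2 + (1/2) * log2(gamma)."""
    return n_bits / 2 + 0.5 * math.log2(gamma)


def total_capacitance(k, n_bits, gamma):
    """Equation 14 in units of C_u: 2^k + gamma * 2^(N - k)."""
    return 2 ** k + gamma * 2 ** (n_bits - k)


k_star = optimal_row_bits(N_BITS, GAMMA)
k_best = min(range(N_BITS + 1),
             key=lambda k: total_capacitance(k, N_BITS, GAMMA))

print(f"continuous optimum: k = {k_star:.3f}")
print(f"best integer k:     k = {k_best}")
```

Since the correction term is only about $-0.17$ bits, the best integer choice for an even $N$ stays at $k = \frac{N}{2}$, which is the square array described above.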