# A Comparison of CMOS Circuit Techniques: Differential Cascode Voltage Switch Logic Versus Conventional Logic

KAN M. CHU AND DAVID L. PULFREY, MEMBER, IEEE

Abstract - Differential cascode voltage switch (DCVS) logic is a CMOS circuit technique which has potential advantages over conventional NAND/NOR logic in terms of circuit delay, layout density, power dissipation, and logic flexibility. In this paper a detailed comparison of DCVS logic and conventional logic is carried out by simulating, with SPICE, the performance of full adders designed using the different circuit techniques. Specifically, comparisons are made between a static full CMOS design and two different implementations of static DCVS circuits, and, in the dynamic case, between two conventional NORA implementations and DCVS forms of both NORA and DOMINO logic. The parameters compared are: input gate capacitance, number of transistors required, propagation delay time, and average power dissipation. In the static case, DCVS appears to be superior to full CMOS with regard to input capacitance and device count but inferior with regard to power dissipation. The speeds of the two technologies are similar. In the dynamic case, DCVS can be faster than more conventional CMOS dynamic logic, but only at the expense of increased device count and power dissipation.

# I. INTRODUCTION

Differential cascode voltage switch (DCVS) logic is a recently proposed CMOS circuit technique which is claimed to have advantages over traditional NAND/NOR circuit techniques in terms of circuit delay, power dissipation, layout area, and logic flexibility [1]. DCVS also has an inherent self-testing property which can provide coverage of both stuck-at and dynamic faults [2]. A further attraction of DCVS circuits is the fact that they can be readily designed using straightforward procedures based on Karnaugh maps (K-maps) and tabular methods [3].

All these worthwhile features would appear to make DCVS logic a very promising CMOS circuit technique. To investigate this possibility, we have compared DCVS logic and more conventional CMOS logic forms using the full-adder circuit as a test vehicle. The full adder is suited to this purpose as it is a common, yet reasonably complex, building block in digital circuits. The comparison reported here uses SPICE simulations to assess the performance parameters of area, input loading, speed, and power dissipation. Area is represented by the number of transistors needed to implement the adder and loading is quantified in terms of the input gate capacitance. Speed is assessed by simulating the worst-case propagation time. Power dissipation is computed at the maximum frequency of operation of each circuit.

IEEE Log Number 8714872.

Fig. 1. Block diagram of a DCVS circuit. The load circuitry is connected to nodes Q and Q'.

# II. CIRCUIT TECHNIQUES FOR DCVS LOGIC

The basic DCVS circuit comprises two parts: a binary decision tree and a load (see Fig. 1). The tree is specified such that:

- 1) when the input vector  $x = (x_1, \dots, x_n)$  is the true vector of the switching function Q(x), then the output Q is disconnected from node G and the node Q' is connected to G; and
- 2) when  $x = (x_1, \dots, x_n)$  is the false vector of Q(x), then the reverse holds.

There are two trees required to implement a full adder, one to perform the sum and one to perform the carry function (see Fig. 2). These circuits, which were designed using the K-map procedure described in [3], are used as the tree circuits for all the DCVS circuit forms examined in this paper. The various DCVS forms differ in their load circuitry, as is now described.
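As a purely behavioral illustration of the two tree conditions above (not a transistor-level model), the following Python sketch treats each tree as a pair of complementary pull-down networks and checks, for all eight input vectors of the full adder, that exactly one of Q and Q' is connected to node G. The function names and the boolean encoding of the node connections are assumptions made for this sketch; the sum and carry functions are the standard full-adder functions realized by the trees of Fig. 2.

```python
from itertools import product

def full_adder_sum(a, b, c):
    """Sum bit: exclusive-OR of the three inputs."""
    return a ^ b ^ c

def full_adder_carry(a, b, c):
    """Carry-out bit: AB + BC + CA (majority function)."""
    return (a & b) | (b & c) | (c & a)

def dcvs_tree(func, inputs):
    """Return (q_to_g, qbar_to_g): which output node the tree ties to node G.

    Condition 1: for a true vector, Q is disconnected and Q' is connected.
    Condition 2: for a false vector, the reverse holds.
    """
    is_true_vector = bool(func(*inputs))
    return (not is_true_vector, is_true_vector)

def check_tree(func):
    """Confirm that exactly one node is connected to G for every input vector."""
    return all(
        q_to_g != qbar_to_g
        for q_to_g, qbar_to_g in (
            dcvs_tree(func, inputs) for inputs in product((0, 1), repeat=3)
        )
    )

def adder_is_consistent():
    """Confirm that the two functions together perform binary addition."""
    return all(
        full_adder_sum(a, b, c) + 2 * full_adder_carry(a, b, c) == a + b + c
        for a, b, c in product((0, 1), repeat=3)
    )

if __name__ == "__main__":
    print(check_tree(full_adder_sum), check_tree(full_adder_carry))
    print(adder_is_consistent())
```

The complementary behavior is what allows the simple cross-coupled load described next to resolve the outputs without ambiguity.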

The load for a static DCVS circuit is the simple latch shown in Fig. 3. Depending on the differential inputs, either node Q or Q' is pulled down by the DCVS tree network. Regenerative action sets the PMOS latch to static outputs Q and Q' of $V_{DD}$ and ground or vice versa. The logic trees do not pass any direct current after the latch sets.

Manuscript received August 29, 1986; revised January 26, 1987. This work was supported by the Natural Sciences and Engineering Research Council of Canada.

The authors are with the Electrical Engineering Department, University of British Columbia, Vancouver, B.C. V6T 1W5, Canada.

Fig. 2. The DCVS trees for a full adder. (a) The circuit providing the sum, $S(A, B, C) = A \oplus B \oplus C$. (b) The circuit yielding the carry, $C_o(A, B, C) = AB + BC + CA$.

Fig. 3. The load for a static DCVS circuit.

A variation of this static DCVS circuit is the differential split-level (DSL) logic circuit [4] shown in Fig. 4. Two n-transistors T3 and T4, with their gates connected to a reference voltage $V_{\text{REF}}$, are added to reduce the logic swing at nodes Q and Q'. If $V_{\text{REF}}$ is set to $V_{DD}/2 + V_{th}$, where $V_{th}$ is the threshold voltage of the n device, then the nodes Q and Q' are clamped at $V_{DD}/2$. Suppose node Q is pulled down from 2.5 V (i.e., assume $V_{DD} = 5$ V) to a low level. T1 switches from its low-current state to its high-current drive state very quickly, because T4 is initially OFF. The voltage on node f' goes up to 5 V because T1 is fully ON. Node Q' rises until it reaches 2.5 V, at which point T3 enters the cutoff mode. DSL circuits would be expected to be about two times faster than standard DCVS circuits because the logic swings at Q and Q' are only half the rail-to-rail voltage difference, which should halve the charge that must be moved in the circuit.
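The two first-order estimates in the DSL description can be sketched numerically. The snippet below evaluates the ideal reference voltage $V_{DD}/2 + V_{th}$ and the ratio of charge moved for a half-rail swing versus a full-rail swing. The threshold value of 0.8 V is the figure quoted later in the paper for the Northern Telecom process; the node capacitance is a hypothetical value chosen only for illustration, and the body effect is deliberately ignored here.

```python
def ideal_vref(vdd, vth):
    """Ideal V_REF = V_DD/2 + V_th that clamps Q and Q' at V_DD/2 (no body effect)."""
    return vdd / 2.0 + vth

def switched_charge(capacitance_f, swing_v):
    """Charge moved when a node of the given capacitance swings by swing_v (Q = C * dV)."""
    return capacitance_f * swing_v

VDD = 5.0          # supply voltage assumed in the text
VTH = 0.8          # n-device threshold quoted for the 3-um process
C_NODE = 100e-15   # hypothetical node capacitance, for illustration only

vref = ideal_vref(VDD, VTH)
charge_ratio = switched_charge(C_NODE, VDD / 2.0) / switched_charge(C_NODE, VDD)

if __name__ == "__main__":
    print(f"ideal V_REF = {vref:.2f} V")
    print(f"half-swing / full-swing charge = {charge_ratio:.2f}")
```

This is an idealized estimate only; the value of $V_{\text{REF}}$ actually needed in simulation is higher because of the body effect in T3 and T4.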

Turning now to dynamic operation of DCVS circuits, consider first the DOMINO [5] configuration of Fig. 5 [1]. Nodes Q and Q' are precharged to high during the precharge phase ( $\phi = 0$ ) and either node Q (node f) or Q'(f') discharges to low during the evaluation phase ( $\phi = 1$ ). Transistor T1 (or T2) is a high impedance p transistor which serves as the feedback device to maintain the high logic level at node Q' (or Q), where charges may be lost due to charge sharing [6].
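The precharge and evaluation phases just described can be summarized with a small behavioral sketch. It models only the logic levels on nodes Q and Q' (1 for high, 0 for low), assumes the tree convention of Section II in which a true input vector connects Q' to ground, and assumes the feedback p transistor holds the undischarged node high. The function names are hypothetical.

```python
def carry(a, b, c):
    """Example switching function: full-adder carry, AB + BC + CA."""
    return (a & b) | (b & c) | (c & a)

def domino_cycle(func, inputs):
    """Return the (Q, Q') node levels after precharge and after evaluation.

    Precharge (phi = 0): both nodes are charged high.
    Evaluation (phi = 1): the tree discharges exactly one node; the other
    is assumed to be held high by its feedback p transistor.
    """
    precharged = (1, 1)
    if func(*inputs):
        evaluated = (1, 0)   # true vector: Q' is connected to ground
    else:
        evaluated = (0, 1)   # false vector: Q is connected to ground
    return precharged, evaluated

if __name__ == "__main__":
    for vector in [(0, 0, 1), (1, 1, 0)]:
        print(vector, domino_cycle(carry, vector))
```

Because each node can only make a high-to-low transition during evaluation, the inverted outputs can only rise, which is the property a DOMINO chain relies on.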

For dynamic operation of pipelined architectures, NORA (NO RACE) techniques [7] are suitable for implementing logical functions. In its original form the NORA structure consists of n- and p-logic gates to enhance logic flexibility. The p-logic gates usually cause long delay times and consume large areas. Using DCVS logic in the NORA technique will eliminate p-logic gates because of the inherent availability of complementary signals. The general structure of a DCVS NORA pipelined section consisting of only one dynamic gate is shown in Fig. 6. This type of circuit technique is suitable for use in a heavily pipelined logic design, as in the case, for example, of a newly developed $8 \times 8$ pipelined multiplier [8].

Fig. 4. The load for a static DSL circuit.

Fig. 5. The load and circuit arrangement for a DCVS DOMINO circuit.

Fig. 6. The load and circuit arrangement for a DCVS NORA pipelined section.

Fig. 7. Circuits for a static CMOS full adder.

As Fig. 6 indicates, the load circuitry is symmetrical, and thus, for analysis purposes, only one side of it need be considered. During the evaluation phase ($\phi = 1$), node Q is either floating or discharged depending on the inputs. The output register acts as a clocked inverter, and the output can be either high or low. During the precharge phase ($\phi = 0$), the ground path of the register is blocked. If the output resulting from the previous evaluation is high, then the output continues to be high regardless of the voltage of Q. If the output is low (i.e., node Q has never been discharged) and transistor T1 is ON, then the output continues to be low because no charges can be added through T2. Thus for a $\phi$ section of a pipeline, the output changes freely when $\phi$ is high and is latched at the falling edge of $\phi$.
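The latching behavior of a $\phi$ section can be captured in a few lines of behavioral Python. This is a logic-level sketch under stated assumptions, not a circuit model: the class name and its interface are hypothetical, node Q is represented as 1 (floating high after precharge) or 0 (discharged), and the output register is modeled as an inverter that is transparent only while $\phi = 1$.

```python
class NoraPhiSection:
    """Logic-level model of one side of a DCVS NORA phi section."""

    def __init__(self, initial_output=0):
        self.output = initial_output

    def step(self, phi, q_node):
        """Advance one clock phase and return the register output.

        phi == 1 (evaluation): the register acts as a clocked inverter,
        so the output follows the complement of node Q.
        phi == 0 (precharge): the output holds its previous value,
        whatever the level of node Q.
        """
        if phi == 1:
            self.output = 0 if q_node else 1
        return self.output

if __name__ == "__main__":
    section = NoraPhiSection()
    # (phi, Q) pairs: evaluate with Q discharged, precharge, evaluate with Q high, precharge
    for phi, q_node in [(1, 0), (0, 1), (1, 1), (0, 1)]:
        print(phi, q_node, section.step(phi, q_node))
```

In this model the value seen at the output during precharge is always the one established at the end of the preceding evaluation phase, which is the latching action described above.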

# III. CONVENTIONAL CMOS CIRCUIT TECHNIQUES

To provide a basis for comparison of the DCVS circuits described in Section II, conventional CMOS designs operating under static and dynamic conditions need to be considered.

The circuit used here for a static CMOS full adder is shown in Fig. 7. Two subcircuits are identified, one to generate the sum signal and one to generate the carry out signal. The three-way EXCLUSIVE-OR gate in the sum circuit has the highest stack level and largest parasitic capacitance, and thus determines the worst-case delay time of the adder. This circuit is relatively fast compared to other possible static full CMOS implementations because the complemented outputs are obtained through only one gate delay from the complementary inputs.

Two versions of conventional approaches to dynamic CMOS full-adder design were studied. One, a conventional NORA adder with serial n- and p-logic blocks, is shown in Fig. 8. The other circuit, a modified NORA adder [9], is shown in Fig. 9. It contains a special three-way XOR gate to generate the sum signal.

# IV. COMPARISON OF THE FULL ADDERS

To compare the performance of the various forms of full adders, each of the circuits described in Sections II and III was simulated using SPICE. The conventional CMOS circuits simulated were those shown in Figs. 7–9. Four different DCVS circuits, two static and two dynamic, were generated by connecting the full-adder tree of Fig. 2 to the load circuits shown in Figs. 3–6. The DCVS circuits were laid out on a Metheus $\lambda$700 workstation in accordance with design rules for the single-metal 3-$\mu$m CMOS process of Northern Telecom, Ottawa, Canada [10]. The areas occupied by the DCVS circuits were about $2.2 \times 10^{-4}$ and $3.5 \times 10^{-4}$ cm$^2$ for the static and dynamic versions, respectively. The results of SPICE simulations from the schematics of all the circuits are summarized in Table I.

Fig. 8. Circuit for a conventional NORA full adder.

Fig. 9. Circuit for a modified NORA full adder (from [9]).

TABLE I. Comparison of Simulation Results for Different Types of Full Adders

| Circuit technique | Input gate capacitance (fF) | Output load capacitance (fF) | No. of p-devices / no. of n-devices | Worst-case delay time (ns) | Average power dissipation at max. frequency (mW) | Normalized power-delay product |
|-------------------|-----------------------------|------------------------------|-------------------------------------|----------------------------|--------------------------------------------------|--------------------------------|
| Static full CMOS  | 155                         | 155                          | 15/15                               | 20                         | 0.58                                             | 1.00                           |
| Static DCVS       | 85                          | 85                           | 4/18                                | 22                         | 1.11                                             | 2.11                           |
| Static DSL        | 85                          | 85                           | 4/22                                | 14                         | 1.35                                             | 1.63                           |
| NORA              | 110                         | 220                          | 12/10                               | 18                         | 0.83                                             | 1.29                           |
| Modified NORA     | 45                          | 90                           | 8/20                                | 10                         | 1.24                                             | 1.06                           |
| DCVS NORA         | 85                          | 170                          | 12/28                               | 10                         | 1.55                                             | 1.34                           |
| DCVS DOMINO       | 85                          | 170                          | 12/24                               | 9                          | 1.75                                             | 1.36                           |

The input gate capacitance gives a measure of the input loading of the circuit. This parameter is, for the case of transistors of fixed length (3 $\mu$m in this case), determined by the number of transistors and their widths. A general guideline used in the first iteration of a design was to size the transistors in a tree network such that the equivalent conductance of any single discharging path was the same as the conductance of a minimum-size ($W = 3$ $\mu$m in our case) n transistor. For example, a path with four serially connected transistors requires each transistor contained in that path to be 12 $\mu$m (= 4 × 3 $\mu$m) wide. Consider the right half of the network in Fig. 2(b): path A'BC' contains three transistors (a stack level of three), so each transistor in this path should be 9 $\mu$m wide. If the width of transistor C' is 9 $\mu$m, then the width of B' in path B'C' can be estimated as 4.5 $\mu$m, since 1/4.5 + 1/9 = 1/3. Similar principles can be applied to the sizing of transistors in the charging or discharging paths in other circuits. The final form of a design was arrived at by making adjustments, usually small, to the widths of the transistors on the basis of minimizing the circuit delay time as predicted by SPICE simulations.
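The first-iteration sizing guideline amounts to simple series-conductance arithmetic, which the following sketch reproduces. It assumes that the conductance of a transistor is proportional to its width at fixed channel length, so that widths in series combine like conductances in series; the function names are hypothetical, and the sketch covers only the initial estimate, not the SPICE-based fine tuning.

```python
MIN_WIDTH_UM = 3.0  # minimum transistor width in this process, in micrometers

def series_width(stack_level, min_width=MIN_WIDTH_UM):
    """Width of each of `stack_level` equal series transistors so that the
    whole path has the conductance of one minimum-size device."""
    return stack_level * min_width

def remaining_width(fixed_widths, min_width=MIN_WIDTH_UM):
    """Width of one more series transistor that, together with transistors of
    the given fixed widths, yields the conductance of a minimum-size device.

    Uses 1/W_path = sum(1/W_i), i.e. conductance proportional to width.
    """
    residual = 1.0 / min_width - sum(1.0 / w for w in fixed_widths)
    if residual <= 0:
        raise ValueError("fixed transistors are already too narrow for this path")
    return 1.0 / residual

if __name__ == "__main__":
    print(series_width(4))           # four-transistor path
    print(series_width(3))           # path A'BC' in Fig. 2(b)
    print(remaining_width([9.0]))    # B' in path B'C', with C' fixed at 9 um
```

Sharing the 9-$\mu$m transistor C' between two paths is what allows B' to be narrower than the 6 $\mu$m that an isolated two-transistor path would call for.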

The worst-case delay times quoted in Table I refer to situations where the input signals are such that the circuit operation is likely to be slowest. For example, in the conventional NORA circuit of Fig. 8, the speed performance will be poorest when A = B = HI and C = LO. In this case, during the evaluation phase ( $\phi = 1$ ), node F needs to be pulled down, in order to turn on the p-channel transistor through which node H, via transistor C, is connected to the output stage to render the sum signal LO.

Power dissipation was computed using the procedure described by Kang [11]. The figures quoted in Table I refer to average power dissipation at the maximum frequency of operation of each circuit, i.e., as determined by the worst-case delay times. The power-delay product, normalized to the static full CMOS case, is also shown in Table I.
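The normalization of the power-delay product can be reproduced directly from the delay and power columns of Table I. The sketch below recomputes the last column; the dictionary layout and function name are assumptions of this example, while the numbers are those listed in the table. The recomputed values agree with the published column to within about 0.01, the small residue presumably reflecting rounding of the tabulated delay and power figures.

```python
# (worst-case delay in ns, average power at maximum frequency in mW), from Table I
TABLE_I = {
    "Static full CMOS": (20, 0.58),
    "Static DCVS": (22, 1.11),
    "Static DSL": (14, 1.35),
    "NORA": (18, 0.83),
    "Modified NORA": (10, 1.24),
    "DCVS NORA": (10, 1.55),
    "DCVS DOMINO": (9, 1.75),
}

def normalized_pdp(table, reference="Static full CMOS"):
    """Power-delay product of each circuit divided by that of the reference."""
    ref_delay, ref_power = table[reference]
    ref_pdp = ref_delay * ref_power
    return {
        name: (delay * power) / ref_pdp
        for name, (delay, power) in table.items()
    }

if __name__ == "__main__":
    for name, value in normalized_pdp(TABLE_I).items():
        print(f"{name:18s} {value:.2f}")
```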

The output load capacitances used in the simulations are meant to represent typical load conditions. A fan-out of two was used for the dynamic designs as these circuits are buffered and would be expected to be able to drive larger loads than the static gates.
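The relation between the two capacitance columns of Table I can be checked with a one-line model in which the output load is the fan-out multiplied by the input gate capacitance of an identical stage. That the static entries correspond to a fan-out of one is an inference from the tabulated values rather than a statement in the text, and the function name is hypothetical.

```python
def output_load_ff(input_cap_ff, fan_out):
    """Load capacitance (fF) presented by `fan_out` identical following stages."""
    return fan_out * input_cap_ff

# Input gate capacitances (fF) from Table I
DYNAMIC_INPUT_CAP = {"NORA": 110, "Modified NORA": 45, "DCVS NORA": 85, "DCVS DOMINO": 85}
STATIC_INPUT_CAP = {"Static full CMOS": 155, "Static DCVS": 85, "Static DSL": 85}

if __name__ == "__main__":
    for name, cap in DYNAMIC_INPUT_CAP.items():
        print(name, output_load_ff(cap, fan_out=2))
    for name, cap in STATIC_INPUT_CAP.items():
        print(name, output_load_ff(cap, fan_out=1))  # fan-out of one is inferred
```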

# V. DISCUSSION

Considering, first, the static designs, it appears that the DSL technique yields a significantly faster circuit than do the other two techniques. This is to be expected due to the need for logic swings which are only one-half of the rail-to-rail value. The significant differences between the other two static designs are the increased power dissipation and the reduced device count and input gate capacitance of the static DCVS circuit. The number of devices is smaller because the DCVS implementation uses only p-channel transistors, as opposed to both p- and n-channel devices, as pull-ups in the load and buffer circuitry. Since the inputs of a DCVS circuit drive only n-channel tree devices, its input gate capacitance loading is typically a factor of 2 or 3 smaller than that of conventional CMOS circuits, which require complementary n- and p-channel devices to be driven.

The static DCVS circuit consumes more power than the conventional static CMOS circuit because the charging and discharging times of nodes Q and Q' in Fig. 3 depend on the turn-on and turn-off paths within the DCVS tree and these are, generally, not symmetrical. An asymmetry in the rise and fall times of the potential at nodes Q and Q' will prolong the period of current flow through the latch during the transient state, thus increasing the power dissipation.

The apparent attractiveness of the static DSL circuit with regard to speed is negated somewhat by three possible problems which may arise when using this technique. First, with reference to Fig. 4, if node Q' is at 2.5 V, then T2 is partially ON and it is possible to destroy the low logic level that would otherwise have appeared on node f. Although reducing the size of the p device alleviates this problem, it decreases the output drive capability and results in longer delay. Thus a trade-off should be considered when the sizes of T1 and T2 are chosen. The second problem is due to the body effect existing in T3 and T4. Although the threshold voltage $V_{th}$ is equal to 0.8 V in the Northern Telecom 3-$\mu$m CMOS process [10], SPICE simulations show that it is necessary to set $V_{\text{REF}}$ equal to 4.2 V in order to clamp either of the nodes Q or Q' to 2.5 V. Also, the clamped logic swing is sensitive to the stack level of the DCVS tree for a fixed $V_{\text{REF}}$. The third problem is that this circuit exhibits static power dissipation: there is a direct current path to ground through transistors T1 and T3 when Q' is low or through T2 and T4 when Q is low.

Turning now to the dynamic circuits, all the designs have a similar power-delay product. The DCVS circuits appear to have a speed advantage, but this is achieved at the expense of an increased device count. The conventional NORA circuit, Fig. 8, is characterized by a large input gate capacitance due to the wide transistors in the p-logic block, and a slow speed due to the use of two levels of gate delay and because half of the logic is performed by p transistors. Considerable improvement in these two areas is achieved by the modified NORA adder of Fig. 9. This circuit has less than half the input gate capacitance of the serial NORA adder (45 fF versus 110 fF) and is nearly twice as fast (10 ns versus 18 ns). The disadvantage of this circuit is that accidental discharge due to races is possible under certain conditions. For example, if A = 0, B = 1, and C = 1, the gate of T14 (or source of T15) and the gate of T15 (or source of T14) are pulled down. If the drain nodes of T7 and T10 do not pull down at similar rates so that a voltage difference of more than one threshold is developed across the gate nodes of T14 and T15, the drain node of T13 discharges accidentally. To avoid this requires careful sizing of the transistors along the discharging paths so that the conductance to ground and the capacitive load associated with each of the pull-down paths are equal. Tight process control and detailed simulation through circuit extraction are needed if this circuit is to be successfully implemented.

The DCVS NORA (Fig. 6) adder has smaller input gate capacitance and delay time than the conventional NORA adder, although the area consumed is larger. The large area stems from the symmetrical buffer circuits used to provide complementary outputs. The DCVS NORA circuit is as fast as the modified NORA adder, but only at the expense of a higher device count and increased input gate capacitance. However, the DCVS version of NORA is superior to the modified conventional version, in terms of circuit flexibility, due to its provision of complementary outputs, and reliability, due to the fact that accidental discharge cannot occur. The DCVS DOMINO (Fig. 5) adder is similar to the DCVS NORA adder in all the parameters evaluated in this comparison. It is the only kind of full-adder circuit which can be included in a DOMINO chain without causing race problems.

# VI. CONCLUSIONS

The main conclusion to be drawn from this work is that DCVS logic offers opportunities for realizing faster circuits than are possible with conventional forms of CMOS logic, but this speed advantage is often gained at the expense of circuit area and active power consumption.

The fastest static logic technique investigated was the differential split-level (DSL) version of DCVS logic. The worst-case delay time for this implementation was 14 ns, while that of a conventional CMOS circuit was 20 ns. However, DSL may have some problems in terms of static power dissipation, security of charge storage, and sensitivity of the logic swing to the number of input signals.

In dynamic operation, DCVS versions of NORA and DOMINO circuits appear to be a few nanoseconds faster (9-10 ns versus 10-18 ns) than their conventional counterparts. Further, DCVS logic may overcome the problem of accidental discharge, which appears to be a concern with one of the conventional NORA techniques evaluated in this study.

# REFERENCES

- [1] L. G. Heller, W. R. Griffin, J. W. Davis, and N. G. Thoma, "Cascode voltage switch logic: A differential CMOS logic family," in *ISSCC Dig. Tech. Papers*, 1984, pp. 16–17.
- [2] R. K. Montoye, "Testing scheme for differential cascode voltage switch circuits," *IBM Tech. Disc. Bull.*, vol. 27, pp. 6148–6152, 1985.
- [3] K. M. Chu and D. L. Pulfrey, "Design procedures for differential cascode voltage switch circuits," *IEEE J. Solid-State Circuits*, vol. SC-21, pp. 1082–1087, Dec. 1986.
- [4] L. C. Pfennings, W. G. J. Mol, J. J. Bastiaens, and J. M. F. Van Dijk, "Differential split-level CMOS logic for subnanosecond speeds," in *ISSCC Dig. Tech. Papers*, 1985, pp. 212–213; also *IEEE J. Solid-State Circuits*, vol. SC-20, pp. 1050–1055, Oct. 1985.
- [5] R. H. Krambeck, C. M. Lee, and H. Law, "High-speed compact circuits with CMOS," *IEEE J. Solid-State Circuits*, vol. SC-17, pp. 614–619, June 1982.
- [6] L. G. Heller, "Stabilizing cascode voltage switch logic," *IBM Tech. Disc. Bull.*, vol. 27, p. 6015, 1985.
- [7] N. F. Goncalves and H. J. De Man, "NORA: A racefree dynamic CMOS technique for pipelined logic structure," *IEEE J. Solid-State Circuits*, vol. SC-18, pp. 261–266, June 1983.
- [8] K. M. Chu, "Cascode voltage switch logic circuits," M.A.Sc. thesis, Univ. of British Columbia, Vancouver, Canada, 1986.
- [9] A. H. C. Park, "CMOS LSI design of a high-throughput digital filter," M.Sc. thesis, Mass. Inst. of Technol., Cambridge, ch. 4, 1984.
- [10] G. Puukila, "Canadian Microelectronics Corporation guide for designers using the Northern Telecom CMOS3 Process," Canadian Microelectronics Corp., Kingston, Ont., Rep. IC 85-6, 1985.
- [11] S. M. Kang, "Accurate simulation of power dissipation in VLSI circuits," *IEEE J. Solid-State Circuits*, vol. SC-21, pp. 889–891, Oct. 1986.



Kan M. Chu was born in Hong Kong on May 16, 1962. He received the B.Eng. (Honors E.E.) degree from McGill University, Montreal, Canada, in 1984 and the M.A.Sc. (E.E.) degree from the University of British Columbia, Vancouver, B.C., Canada, in 1986. He plans to work towards a Ph.D. degree in electrical engineering.

His research interests are in CMOS integrated-circuit design and semiconductor device modeling.



David L. Pulfrey (M'73) is a Professor in the Electrical Engineering Department at the University of British Columbia, Vancouver, B.C., Canada. His research interests are in the fields of semiconductor device physics and integrated-circuit design. He has worked on the topics of electrical breakdown in thin dielectrics, the preparation and properties of plasma-anodized thin-oxide films, and the analysis and fabrication of solar cell structures suited to large-area terrestrial applications. His present work at the University of British Columbia is in the areas of high-gain polysilicon emitter transistors, the characterization and applications of MIS tunnel junctions, and the algorithmic generation of IC macrocells.