# Half-Swing Clocking Scheme for 75% Power Saving in Clocking Circuitry

Hirotsugu Kojima, Member, IEEE, Satoshi Tanaka, Member, IEEE, and Katsuro Sasaki, Member, IEEE

Abstract—We propose a half-swing clocking scheme that reduces the power consumption of clocking circuitry by as much as 75%, because all clock signal swings are reduced to half of the LSI supply voltage. The new clocking scheme causes only a small speed degradation, because the random logic circuits in the critical path are still supplied by the full supply voltage. We also propose a clock driver that supplies the half-swing clocks and generates half $V_{\rm DD}$ by itself. Measurements on a test chip fabricated in a 0.5 $\mu$m CMOS process confirmed that the half-swing clocking scheme provides a 67% power saving in the clocking circuitry (ideally 75%), and circuit simulation showed that the degradation in speed was only 0.5 ns. The key to the proposed clocking scheme is that the voltage swing is reduced only for the clocking circuitry and is retained for all other circuits in the chip. This results in significant power reduction with minimal speed degradation.

## I. INTRODUCTION

Reducing power consumption without sacrificing processing speed is a critical factor in LSI design, especially for hand-held devices. In CMOS circuits, dynamic power consumption is proportional to the transition frequency, the capacitance, and the square of the supply voltage. Consequently, reducing the supply voltage provides significant power savings at the expense of speed. This technique relies on high-performance architectures to achieve the specified speed, and is quite effective for ASIC's [1]. In general-purpose processors, however, it is more difficult to employ high-performance architectures, because the architecture is already a part of the specifications. It is therefore very important to reduce power consumption without reducing supply voltage or sacrificing performance.

Several power reduction techniques in random logic and clocking have been reported [2]–[7]. Such techniques can be applied to all random logic circuits, but they do not save much power. The clocking circuitry generally consumes a large portion of the total power in digital LSI's. Fig. 1 shows that the clocking circuitry in an adaptive equalizer consumes 33% of the total power [8]. In a microprocessor, 18% of the total chip power is consumed by clocking [9]. This is because the clock frequency is typically several times higher than that of other signals, such as data and control.

Manuscript received October 5, 1994; revised December 5, 1994.

H. Kojima and K. Sasaki are with Hitachi America, Ltd., R&D, San Jose, CA 95134 USA.

S. Tanaka is with the Central Research Laboratory, Hitachi Ltd., Tokyo 185 Japan.

IEEE Log Number 9409857.

Fig. 1. Power consumption analysis.

Our proposal is a new clocking scheme in which all the clock signal swings are reduced to half of the LSI supply voltage. This technology allows us to reduce power consumption of clocking circuitry by as much as 75%. The speed degradation caused by the proposed clocking scheme is quite small because the random logic circuits in the critical path are still supplied by the full supply voltage. The key to the proposed clocking scheme is the concept that the voltage swing is reduced only for clocking circuitry, but is retained for all other circuits in the chip. This results in significant power reduction with minimal speed degradation.
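As a rough illustration of what this could mean at chip level, the sketch below combines the clocking power fractions quoted above (33% for the adaptive equalizer [8], 18% for the microprocessor [9]) with the ideal 75% clocking saving. The function name `chip_power_saving` is introduced here for illustration only, and the calculation assumes that all non-clock power is unchanged.

```python
def chip_power_saving(clock_fraction: float, clock_saving: float) -> float:
    """Fraction of total chip power saved when only the clocking power shrinks.

    clock_fraction: share of total chip power consumed by clocking (0..1).
    clock_saving:   fractional reduction of the clocking power itself (0..1).
    Assumption: power outside the clocking circuitry is unaffected.
    """
    if not (0.0 <= clock_fraction <= 1.0 and 0.0 <= clock_saving <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    return clock_fraction * clock_saving


IDEAL_CLOCK_SAVING = 0.75  # ideal saving stated in the text

for name, fraction in [("adaptive equalizer [8]", 0.33), ("microprocessor [9]", 0.18)]:
    saving = chip_power_saving(fraction, IDEAL_CLOCK_SAVING)
    print(f"{name}: roughly {saving:.0%} of total chip power")
```

Under these assumptions the ideal saving would amount to about a quarter of the total power in the adaptive equalizer and about 13.5% in the microprocessor.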

## II. HALF-SWING CLOCKING SCHEME

Fig. 2 shows the proposed half-swing clocking scheme compared with a conventional scheme. In Fig. 2(a), a conventional latch is gated by two full-swing clocks. To decrease the clocking power, the voltage swing of the clock is reduced to half $V_{\rm DD}$ (where $V_{\rm DD}$ represents the LSI supply voltage). The proposed scheme, as shown in Fig. 2(b), uses two separate clock signals for NMOS and PMOS transistors, respectively. The clock for the NMOS transistors swings from zero to half $V_{\rm DD}$, and the clock for the PMOS transistors swings from $V_{\rm DD}$ to half $V_{\rm DD}$. The power consumed by the clocking circuitry is decreased to 25% of that of conventional clocking circuitry.
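The 25% figure follows from the dynamic power relation stated in the Introduction. A short sketch, assuming the total clock load capacitance $C$ and the clock frequency $f$ are unchanged and that each clock driver effectively operates across a $V_{\rm DD}/2$ swing (the symbols $P_{\rm conv}$ and $P_{\rm half}$ are introduced here for illustration):

$$P_{\rm conv} \propto f\,C\,V_{\rm DD}^{2}, \qquad P_{\rm half} \propto f\,C\left(\frac{V_{\rm DD}}{2}\right)^{2} = \frac{1}{4}\,f\,C\,V_{\rm DD}^{2}, \qquad \frac{P_{\rm half}}{P_{\rm conv}} = \frac{1}{4}.$$

A ratio of 1/4 corresponds to the ideal 75% saving; any increase in clock load capacitance relative to the conventional latch raises the ratio above this value.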

We propose to use two stacked inverters to generate the half-swing clock signals. Fig. 3 shows the proposed clock driver circuit, which supplies the two half-swing clock signals described above and generates a half $V_{\rm DD}$ voltage by itself. Here, $C_1$ and $C_2$ represent PMOS loads on drivers, and $C_3$ and $C_4$ represent NMOS loads on the other drivers. $C_A$ and $C_B$ are additional capacitors which can be fabricated on-chip or connected externally. The intermediate voltage at the node $H\text{-}V_{\rm dd}$ is given by the following equations:

$$V_{H\text{-}V_{\rm dd}} = \frac{C_1 + C_A}{C_1 + C_4 + C_A + C_B} V_{\rm DD} \quad \text{when CLK is 'low,'}$$

$$V_{H\text{-}V_{\rm dd}} = \frac{C_2 + C_A}{C_2 + C_3 + C_A + C_B} V_{\rm DD} \quad \text{when CLK is 'high.'}$$



Fig. 2. Concept of half-swing clocking scheme. (a) Conventional clocking scheme. (b) Half-swing clocking scheme.



Fig. 3. Proposed clock driver.

The $H\text{-}V_{\mathrm{dd}}$ node is stabilized at $V_{\mathrm{DD}}/2$ when $C_A$ and $C_B$ are equal and large enough that $C_1$ through $C_4$ are negligible by comparison. If $C_1$ through $C_4$ are made equal, $C_A$ and $C_B$ are not needed. In actual LSI's, however, $C_A$ and $C_B$ should be large enough to compensate for production fluctuation of $C_1$ through $C_4$.
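The role of $C_A$ and $C_B$ can be checked numerically with the capacitive-divider expressions for the intermediate voltage. The sketch below is a minimal illustration, not the authors' design procedure: the function name `h_vdd_voltage` and all capacitance values are hypothetical, chosen only to show how equal, large $C_A$ and $C_B$ pull the node toward $V_{\rm DD}/2$ when the driver loads are mismatched.

```python
def h_vdd_voltage(c_num: float, c_other: float, c_a: float, c_b: float, vdd: float) -> float:
    """Capacitive-divider estimate of the H-Vdd node voltage.

    c_num / c_other: the driver load that appears in the numerator and the
    other driver load (C1 and C4 when CLK is low, C2 and C3 when CLK is high).
    c_a, c_b: the additional capacitors C_A and C_B.
    All capacitances must share one unit; the result is in the unit of vdd.
    """
    total = c_num + c_other + c_a + c_b
    if total <= 0:
        raise ValueError("total capacitance must be positive")
    return (c_num + c_a) / total * vdd


VDD = 3.3  # volts, the supply used for the observed waveforms

# Hypothetical mismatched loads (pF): 1.2 in the numerator, 0.8 for the other load.
without_caps = h_vdd_voltage(1.2, 0.8, 0.0, 0.0, VDD)
with_caps = h_vdd_voltage(1.2, 0.8, 20.0, 20.0, VDD)

print(f"no C_A/C_B:         {without_caps:.3f} V")
print(f"C_A = C_B = 20 pF:  {with_caps:.3f} V (target {VDD / 2:.3f} V)")
```

With these illustrative values the node sits near 1.98 V without the extra capacitors and within about 20 mV of the 1.65 V target with them.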

Fig. 4 summarizes the proposed idea using a simple model of digital circuits. Most digital LSI's can be separated into three parts: latches, random logic circuits between the latches, and a clock driver tree which provides clock signals to the latches. The chip performance is determined by the critical path delay from one latch to another. The half-swing clocking scheme causes two delay increases: one in the clock drivers, and one in a latch driven by a half-swing clock. The delay increase from the clock source to the latch does not degrade the performance, because no clock driver is located on the critical path. The performance degradation is caused by the delay increase in the latch that is located on the critical path and is driven by the half-swing clock. Since the clocking scheme never degrades the speed of the other random logic circuits on the critical path, the speed degradation is minimal.

## III. CHARACTERISTICS

The two important characteristics of our proposed clocking scheme are evidenced through circuit simulation: the delay increase in a half-swing clock driver and the delay increase in a latch driven by a half-swing clock.

Fig. 5 shows the load capacitance dependency of the propagation delay in proposed and conventional clock drivers.



Fig. 4. Summary of tradeoff with half-swing clocking scheme.



Fig. 5. Simulated load capacitance dependency of the propagation delay in clock drivers.

The results were obtained through circuit simulation using 0.5 $\mu$m CMOS FET models. The propagation delay of the half-swing clock driver is approximately twice that of the conventional driver. However, the delay itself does not affect the performance, which is determined by the critical path delay. Since clock skew is proportional to the delay, deskewing techniques are important when using the half-swing clocking scheme.

The propagation delay of a latch is the interval from the time when the clock arrives at the latch to the time when the data is output from the latch. Fig. 6 shows the simulated propagation delay of a latch driven by conventional and half-swing clocks as a function of the load capacitance. The delay increase caused by the proposed half-swing clocking is at most 0.5 ns. Note that the increase is independent of the load capacitance. The propagation delay of the latch is the sum of two delays: the delay caused when the clock-gated transistor drives the last-stage inverter, and the delay caused when the last-stage inverter drives the load capacitance of the latch. The former delay increases when the clock-gated transistor is driven by a half-swing clock, but the amount of the increase is independent of the load capacitance of the latch; i.e., the clocking scheme does not affect the latter delay.

Fig. 6. Simulated load capacitance dependency of the propagation delay of latch.

Fig. 4 shows that the critical path delay of an LSI is the sum of the propagation delay of the latch and the random logic. As demonstrated in the simulation, the propagation delay of the latch increases by 0.5 ns. The propagation delay in the random logic is the same as the conventional one, because the random logic circuits are supplied by full  $V_{\rm DD}$ . Thus, the critical path delay increases by 0.5 ns by employing the half-swing clocking scheme. The speed degradation of 0.5 ns is acceptable for most LSI's.
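To put the 0.5 ns figure in perspective, the following sketch applies the critical path model described above (latch delay plus random logic delay) to a hypothetical path. The latch and logic delays are illustrative assumptions, not values from the simulation; only the 0.5 ns latch penalty is taken from the text, and setup time and clock skew are ignored.

```python
HALF_SWING_LATCH_PENALTY_NS = 0.5  # simulated latch delay increase reported in the text


def critical_path_delay_ns(latch_delay_ns: float, logic_delay_ns: float,
                           half_swing: bool = False) -> float:
    """Latch-to-latch delay: latch propagation delay plus random logic delay.

    Only the latch term changes under half-swing clocking, because the random
    logic is still supplied by the full supply voltage.
    """
    penalty = HALF_SWING_LATCH_PENALTY_NS if half_swing else 0.0
    return latch_delay_ns + penalty + logic_delay_ns


# Hypothetical path: 1.0 ns latch delay and 24.0 ns of random logic.
conventional = critical_path_delay_ns(1.0, 24.0)
proposed = critical_path_delay_ns(1.0, 24.0, half_swing=True)
slowdown = (proposed - conventional) / conventional

print(f"conventional: {conventional} ns, half-swing: {proposed} ns")
print(f"relative slowdown: {slowdown:.1%}")
```

For this assumed 25 ns path the penalty is about 2% of the cycle; the relative cost grows as the critical path gets shorter.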

## IV. EXPERIMENTAL RESULTS

We fabricated a test chip that consisted of two sixteen-stage shift registers: one employing the half-swing clocking scheme and the other employing the conventional clocking scheme. Fig. 7 shows observed waveforms of the four half-swing clocks, which are true and bar clock signals driving PMOS and NMOS. The clock signals driving PMOS swing between 1.65 V and 3.3 V, and the clock signals driving NMOS swing between ground and 1.65 V. The waveforms were observed only at low frequencies, because the pin drivers were designed to be small enough not to affect the internal capacitor balance while the probes had large (10 pF) capacitive loads.

Fig. 8 shows the measured power consumption of the test circuits. The random logic circuits in the proposed and conventional shift registers consume the same amount of power. The results confirmed that the proposed clock driver saves 67% of the power of the conventional one over a wide clock frequency range of 1 MHz to 40 MHz. The ratio between the power consumed by the proposed and conventional clocking circuitry was 33% instead of the ideal ratio of 25%. This is because the NMOS transistors that load the clock driver in the proposed latch are twice as large as those in the conventional latch, in order to equalize the PMOS and NMOS loads on each line of the half-swing clocks. The ideal power saving could be achieved if the latch were designed to have the same capacitive load as the conventional latch.

Fig. 7. Observed waveform of half-swing clock signals.

Fig. 8. Measured power consumption. (a) $V_{\rm DD} = 3.3$ V. (b) $V_{\rm DD} = 2.0$ V.
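The gap between the measured 33% and the ideal 25% can be illustrated with a simple load model. The sketch below is an assumption-laden example, not the authors' analysis: it supposes that in the conventional latch the PMOS clock load is twice the NMOS clock load (a ratio chosen here purely for illustration), and that the proposed latch doubles the NMOS load to match the PMOS load. The function name `clock_power_ratio` is hypothetical.

```python
def clock_power_ratio(conventional_load: float, half_swing_load: float,
                      swing_fraction: float = 0.5) -> float:
    """Clocking power of the half-swing scheme relative to the conventional one.

    Assumes power scales with load capacitance and with the square of the
    clock swing, at the same clock frequency.
    """
    if conventional_load <= 0:
        raise ValueError("conventional load must be positive")
    return (half_swing_load / conventional_load) * swing_fraction ** 2


# Assumed (illustrative) conventional latch: NMOS load 1 unit, PMOS load 2 units.
nmos, pmos = 1.0, 2.0
conventional_load = nmos + pmos
# Proposed latch: NMOS load doubled so that it equals the PMOS load.
proposed_load = 2.0 * nmos + pmos

print(f"ideal ratio (same load):     {clock_power_ratio(conventional_load, conventional_load):.0%}")
print(f"ratio with doubled NMOS load: {clock_power_ratio(conventional_load, proposed_load):.0%}")
```

Under this assumed 2:1 load split the ratio comes to one third, which is in line with the measured value, but the actual transistor sizes of the test chip are not given in the text.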

The intermediate voltage, $V_{H\text{-}V_{\rm dd}}$, was measured at various supply voltages and frequencies, as shown in Fig. 9. The results confirmed that the intermediate voltage was successfully stabilized at $V_{\rm DD}$ values from 5 V down to 1.5 V and at clock frequencies of 1 to 40 MHz.

## V. CONCLUSION

We have proposed a half-swing clocking scheme and a clock driver which supplies the half-swing clocks and generates half $V_{\rm DD}$ by itself. Measurements of a test chip fabricated in a 0.5 $\mu$m CMOS process confirmed that the proposed clock driver successfully generates half-swing clock signals with stable, self-generated half $V_{\rm DD}$. The half-swing clocking scheme saves 67% of the power in the clocking circuitry of the test chip (ideally 75%), with only 0.5 ns degradation in speed for a 0.5 $\mu$m CMOS device.

The proposed half-swing clocking scheme can achieve significant power savings for most CMOS digital LSI's, including microprocessors, DSP's, and other custom chips, with little degradation in speed.

Fig. 9. Measured $H\text{-}V_{\mathrm{dd}}$ voltage stability.

## ACKNOWLEDGMENT

The authors would like to thank Mr. A. Masumura, Mr. Y. Ishigami, Mr. H. Misawa, Mr. H. Akama, Mr. S. Katoh, Mr. M. Otsuka, and Mr. T. Akazawa for their great cooperation on layout design, and Mr. K. Nitta, Mr. D. Gorny, Dr. M. Hiraki, Dr. Y. Hatano, and Dr. M. Hotta for their helpful discussions.

## REFERENCES

- [1] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," *IEEE J. Solid-State Circuits*, vol. 27, no. 4, pp. 473–484, Apr. 1992.
- [2] K. Yano et al., "A 3.8 ns CMOS 16 × 16 multiplier using complementary pass-transistor logic," *IEEE J. Solid-State Circuits*, vol. 25, no. 2, pp. 388–395, Apr. 1990.
- [3] M. Suzuki et al., "A 1.5 ns 32 bit CMOS ALU in double pass-transistor logic," in *1993 IEEE Int. Solid-State Circuit Conf. Dig. Tech. Papers*, Feb. 1993, pp. 90–91.
- [4] F. S. Lai and W. Hwang, "Differential cascode voltage switch with pass gate logic tree for high performance digital systems," in *1993 Int. Symp. VLSI Technol.*, June 1993, pp. 358–362.
- [5] L. G. Heller, W. R. Griffin, J. W. Davis, and N. G. Thoma, "Cascode voltage switch logic: A differential CMOS logic family," in *1984 IEEE Int. Solid-State Circuit Conf. Dig. Tech. Papers*, Feb. 1984, pp. 16–17.
- [6] A. Parameswar, H. Hara, and T. Sakurai, "A high speed, low power, swing restored pass-transistor logic based multiply and accumulate circuit for multimedia applications," in *1994 IEEE Custom Integrated Circuit Conf.*, May 1994, pp. 278–281.
- [7] E. De Man and M. Schobinger, "Power dissipation in the clock system of highly pipelined ULSI CMOS circuits," in *Proc. 1994 Int. Workshop Low Power Design*, Apr. 1994, pp. 133–138.
- [8] H. Kojima, S. Tanaka, Y. Okada, T. Hikage, F. Nakazawa, H. Matsushige, H. Miyasaka, and S. Hanamura, "A multi-cycle operational signal processing core for an adaptive equalizer," in *VLSI Signal Process. VI*, Oct. 1993, pp. 150–158.
- [9] R. Bechade, R. Flaker, B. Kauffmann, A. Kenyon, C. London, S. Mahin, K. Nguyen, D. Pham, A. Roberts, S. Ventrone, and T. Voreyn, "A 32 b 66 MHz 1.8 W microprocessor," in *1994 IEEE Int. Solid-State Circuit Conf. Dig. Tech. Papers*, Feb. 1994, pp. 208–209.

**Hirotsugu Kojima** (M'94), for a photograph and biography, see this issue, p. 402.



**Satoshi Tanaka** (S'84–M'85) was born in Kyoto, Japan. He received the B.E. and M.E. degrees in electrical engineering from Waseda University, Tokyo, Japan, in 1983 and 1985, respectively.

In 1985 he joined the Central Research Laboratory, Hitachi Ltd., Tokyo, Japan, where he has been engaged in the research and development of circuit design for mixed analog/digital LSI's for VCR, low-voltage high-frequency analog IC's for paging receiver systems, digital signal processors for magnetic recording systems, low-voltage logic circuits, and GaAs MMIC's for mobile communication systems.

Mr. Tanaka is a member of the Institute of Electronics, Information, and Communication Engineers of Japan.



**Katsuro Sasaki** (M'88) received the B.S. degree in electrical engineering and the M.S. degree in electronic engineering from the University of Tokyo, Tokyo, Japan, in 1976 and 1978, respectively.

In 1978 he joined the Semiconductor Division, Hitachi Ltd., Tokyo, Japan, where he was involved in the design and development of a 16-kb low-power CMOS SRAM and 16-kb, 64-kb, and 1-Mb high-speed CMOS SRAM's. From 1985 to 1986 he worked on polysilicon TFT's at Massachusetts Institute of Technology, Cambridge. Since 1987 he has worked on the research and development of circuits and devices for submicrometer CMOS SRAM's in the Central Research Laboratory, Hitachi Ltd., Tokyo, Japan. In 1993 he moved to the Semiconductor Research Laboratory, Research and Development Division, Hitachi America, Ltd., San Jose, CA, as the Manager.

Mr. Sasaki is a member of the IEEE Electron Devices Society and the Institute of Electronics, Information, and Communication Engineers of Japan.