# Analog Timing Recovery for a Noise-Predictive DFE\*

John P. Keane, Dept. of ECE, University of California, Davis, CA 95616, USA, jpkeane@ece.ucdavis.edu

Michael Q. Le, Broadcom Corporation, 16215 Alton Parkway, Irvine, CA 92619, USA, mle@broadcom.com

Paul J. Hurst, Dept. of ECE, University of California, Davis, CA 95616, USA, hurst@ece.ucdavis.edu

## Abstract

A timing recovery architecture and its CMOS implementation are described for a noise-predictive decision-feedback equalizer (NPDFE). The $0.5\mu m$ CMOS prototype includes timing recovery and the NPDFE and operates at 160Mbps. The timing recovery blocks dissipate 27mW from 3.3V, occupy 0.2 $mm^2$, and achieve an rms jitter of 50 ps, which is 0.8% of a bit period.

## 1. Introduction

A CMOS noise-predictive decision-feedback equalizer (NPDFE) that outperforms a conventional DFE was recently described [1]. The key advantage of the NPDFE is that it cancels intersymbol interference (ISI) without noise enhancement. While the prototype in [1] showed the advantage of the NPDFE, it did not include timing recovery.

This paper describes a CMOS implementation of timing recovery for the NPDFE in [1]. This prototype includes timing recovery circuits and the NPDFE. It is targeted at the magnetic recording channel but could be used in many communication applications.

## 2. NPDFE Background

The NPDFE consists of the unshaded blocks in Fig. 1: an analog forward equalizer B(z), a decision-feedback equalizer (DFE) C(z), and an analog noise predictive equalizer A(z) [1]. (In the figures, thin lines represent analog signals, and thick lines represent digital signals.) A, B and C are adaptive finite impulse response (FIR) filters of order 3, 3 and 5, respectively. The equalizer outputs are summed to generate the analog voltage y[k] that is sliced to produce the binary decision  $\hat{a}[k] = \pm 1$ .

Figure 1. Block diagram of the complete NPDFE-based read channel.

In a conventional DFE-based read channel consisting of only the FIR equalizer B(z) and the DFE C(z), B(z) has a high-pass characteristic that boosts the high frequencies of the input signal to eliminate precursor ISI, but that boost also amplifies the input noise. This noise enhancement reduces the signal-to-noise ratio (SNR) at the slicer input, which increases the bit-error rate. With the addition of the noise predictive equalizer A(z), the transfer function for the noise from the input to the slicer is all-pass. Therefore, the noise boost is eliminated, which can improve the SNR at the slicer input by about 2dB for the magnetic recording channel [1].
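The noise-enhancement argument can be checked numerically. For white noise, the output noise power of a linear filter equals the input noise power times the sum of the squared impulse-response samples.

The sketch below compares two filters:
- a hypothetical three-tap high-pass equalizer, with taps chosen only for illustration and not taken from the prototype;
- a first-order all-pass filter.

The all-pass filter stands in for the NPDFE's noise path only in the sense that both have unit magnitude at every frequency.

```python
def noise_power_gain(h):
    """White-noise power gain of a filter with impulse response h: sum of h[n]**2."""
    return sum(c * c for c in h)


def allpass_impulse_response(a, n):
    """First n samples of the all-pass H(z) = (a + z^-1) / (1 + a z^-1), |a| < 1."""
    h = []
    x_prev = y_prev = 0.0
    for k in range(n):
        x = 1.0 if k == 0 else 0.0          # unit impulse input
        y = a * x + x_prev - a * y_prev     # difference equation of H(z)
        h.append(y)
        x_prev, y_prev = x, y
    return h


# Hypothetical high-pass FIR equalizer: gain 0.4 at dc, 1.6 at Nyquist.
b_taps = [-0.3, 1.0, -0.3]
print(noise_power_gain(b_taps))                              # about 1.18: noise enhanced
print(noise_power_gain(allpass_impulse_response(0.5, 200)))  # about 1.0: no enhancement
```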

Each coefficient in A(z) is also a coefficient in B(z). Therefore, only three adaptive loops are needed to determine the coefficients in A(z) and B(z). When the FIR equalizer transfer function B(z) has maximum phase, as is the case when B(z) is adapted to remove precursor ISI only, the feedback loop that contains the noise predictive equalizer A(z) is stable [1].

## 3. Timing Recovery Architecture

The shaded blocks in Fig. 1 form the timing recovery subsystem. A recovered bit-rate clock controls the input sampler. An estimate z[k] of the timing error in the recovered clock is fed through a first-order loop filter. The filter output adjusts the clock phase by varying the current in the current-controlled oscillator (ICO).

The timing error estimate z[k] is based on the product of a sample error e and the signal slope [2]. Intuitively, the slope is used because, to first order, the error in a sampled value caused by a sampling-time error is proportional to the slope of the signal at that point. To simplify the implementation, a quantized version of the slope, $\widehat{\mathrm{slope}}$, is often used, so that the timing error estimate z is given by

$$z = e \cdot \widehat{\mathrm{slope}}. \tag{1}$$
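A first-order expansion shows why (1) measures timing error. For this sketch only, assume:
- a smooth signal x(t);
- a small sampling offset τ from the ideal instant kT;
- an error defined as the sampled value minus its ideal value.

$$
\begin{aligned}
x(kT+\tau) &\approx x(kT) + \tau\,\dot{x}(kT), \\
e &= x(kT+\tau) - x(kT) \approx \tau\,\dot{x}(kT), \\
z = e\cdot\widehat{\mathrm{slope}} &\approx \tau\,\lvert\dot{x}(kT)\rvert \quad \text{if } \widehat{\mathrm{slope}} = \operatorname{sgn}\dot{x}(kT).
\end{aligned}
$$

The sign of z therefore follows the sign of τ whether the signal is rising or falling. Quantizing the slope discards only the magnitude $\lvert\dot{x}\rvert$, which scales the effective loop gain.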

<sup>\*</sup>Research supported by UC MICRO grant 00-094, which was sponsored by Broadcom, Intel, Metalink, and Texas Instruments.



Figure 2. Signal and slope quantization in acquisition mode.

The calculation of both the slope of the signal and the error in the sampled value depends on whether the loop is acquiring initial phase and frequency lock or is performing steady-state tracking.

### 3.1. Acquisition mode

Initial acquisition of the correct timing phase and frequency is made during a training sequence. In the magnetic recording channel, this sequence is generated by repeatedly recording $\{+1, +1, -1, -1\}$, which produces a sinewave with period 4T (where T is the bit period) when read. Acquisition is achieved by adjusting the ICO phase so that every other sample of x(t) is at a zero crossing. The timing error estimate is calculated from the samples x[k] of x(t). Near a zero crossing, the sample error e[k] is the sample x[k], since the ideal sample value is 0. The sign of the slope of x(t) near the zero crossing is given by the sign of the difference between the following and previous samples of x(t), i.e., by the sign of x[k+1] - x[k-1]. In this case, since x(t) is a sinusoid with period 4T, $x[k+1] \approx -x[k-1]$, so the quantized slope estimate is given by

$$\widehat{\mathrm{slope}}[k] = -\hat{x}[k-1] \tag{2}$$

where  $\hat{x}[k-1]$  is a quantized version of x[k-1]. Thus, from (1), the timing error estimate is [3]

$$z[k] = -x[k]\,\hat{x}[k-1] \tag{3}$$
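The step from the sign of x[k+1] - x[k-1] to (2) can be verified for an idealized, noise-free preamble. Assume $x[k] = x(kT) = A\sin(\pi k/2 + \phi)$ with $A > 0$ and an arbitrary phase $\phi$. Advancing by two samples (2T, half a period) adds π to the argument and flips the sign:

$$
x[k+1] = A\sin\!\left(\tfrac{\pi}{2}(k-1) + \phi + \pi\right) = -x[k-1]
\;\;\Rightarrow\;\;
\operatorname{sgn}\bigl(x[k+1]-x[k-1]\bigr) = -\operatorname{sgn}\bigl(x[k-1]\bigr).
$$

This is (2) with $\hat{x}[k-1]$ taken as the sign of x[k-1]. Substituting it into (1) with e[k] = x[k] gives (3).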

However, rather than use $\hat{x}[k] = \pm 1$ as in [3], we use $\hat{x}[k] \in \{-1, 0, +1\}$ by quantizing x[k] to 3 levels using comparators with thresholds set to approximately $\pm 50\%$ of the full-scale voltage. This signal and slope quantization is illustrated in Fig. 2, where $\widehat{\mathrm{slope}}[1] = -\hat{x}[0] = -1$ and $\widehat{\mathrm{slope}}[2] = -\hat{x}[1] = 0$. Quantizing to 3 rather than 2 levels avoids adjusting the clock phase when x[k-1] is small (that is, when x[k] is far from a zero crossing). This yields a smaller maximum initial phase offset than the scheme in [3] and therefore a shorter maximum acquisition time. At a phase offset close to 0.5T, the repeating sample sequence $x \approx \left\{ +\frac{1}{\sqrt{2}}, +\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}} \right\}$ is possible. In this case $z \approx \left\{ -\frac{1}{\sqrt{2}}, +\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, +\frac{1}{\sqrt{2}} \right\}$, so successive corrections cancel and the timing loop can oscillate about this offset (hangup). Such oscillation is avoided by preventing the timing loop from updating in two consecutive bit periods.
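The acquisition estimator can be simulated in a few lines. The sketch below makes these assumptions:
- an ideal, noise-free 4T preamble x(t) = sin(πt/(2T)), sampled at (k + τ)T with τ in bit periods;
- the ±0.5 comparator thresholds described above;
- a literal reading of the rule that the loop may not update in two consecutive bit periods, where an "update" is taken to mean a nonzero slope estimate.

The function names are hypothetical. For |τ| < 1/3, every other estimate equals sin(πτ/2), whose sign tracks τ. At τ = 0.5, the ungated estimates alternate in sign and sum to zero, while the gated ones do not.

```python
import math


def quantize3(v, threshold=0.5):
    """Three-level quantizer: +1, 0 or -1, with decision thresholds at +/- threshold."""
    if v > threshold:
        return 1
    if v < -threshold:
        return -1
    return 0


def acquisition_errors(tau, n, threshold=0.5, gate=True):
    """Estimates z[k] = -x[k] * xq[k-1] (equation (3)) for k = 0 .. n-1.

    x(t) = sin(pi * t / (2T)) is sampled at (k + tau) T. With gate=True,
    a nonzero slope estimate in one bit period forces a zero in the next.
    """
    def x(k):
        return math.sin(math.pi * (k + tau) / 2)

    z = []
    updated_last = False
    for k in range(n):
        slope = -quantize3(x(k - 1), threshold)  # equation (2)
        if gate and updated_last:
            slope = 0                            # no two consecutive updates
        z.append(x(k) * slope)                   # equation (1) with e[k] = x[k]
        updated_last = slope != 0
    return z


for tau in (0.2, 0.5):
    print(tau, sum(acquisition_errors(tau, 8, gate=False)), sum(acquisition_errors(tau, 8)))
```

The gating breaks the symmetry at the hangup point: only every other alternating estimate is applied, so their sum no longer cancels and the loop is pushed away from 0.5T.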



Figure 3. Timing error estimation and loop filter. Signals with arrows in the middle are currents.

### 3.2. Steady-state mode

In steady state, the input x(t) consists of recorded data with large amounts of ISI, which makes extraction of timing information from x(t) difficult. Therefore, decision-directed timing recovery is used in steady state. Signals at the slicer, where equalization by the NPDFE has removed the ISI, are used to estimate the timing error. A popular timing recovery scheme [2] uses the slicer input y[k] and the decisions $\hat{a}$ to calculate the timing error estimate. The difference between decisions is used to estimate the signal slope:

$$\widehat{\mathrm{slope}}[k-1] = \frac{\hat{a}[k] - \hat{a}[k-2]}{2} \tag{4}$$

Hence, the steady-state timing error estimate, from (1), is given by:

$$z[k] = e[k-1]\,\frac{\hat{a}[k] - \hat{a}[k-2]}{2} \tag{5}$$

where  $e[k-1] = y[k-1] - \hat{a}[k-1]$  is the decision-slicer error.
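A direct transcription of (4) and (5) is sketched below. It assumes binary decisions $\hat{a}[k] \in \{-1, +1\}$ produced by a sign slicer with threshold 0; the helper names are hypothetical. Because (4) looks ahead one decision, z[k] refers to sample k-1, and no estimate is available for the first two samples.

```python
def slice_binary(y):
    """Binary decisions a_hat[k] = +1 if y[k] >= 0 else -1."""
    return [1 if v >= 0 else -1 for v in y]


def steady_state_errors(y, a_hat):
    """Timing-error estimates of equation (5) for k = 2 .. len(y) - 1.

    z[k] = e[k-1] * (a_hat[k] - a_hat[k-2]) / 2, with e[k-1] = y[k-1] - a_hat[k-1].
    """
    z = []
    for k in range(2, len(y)):
        e = y[k - 1] - a_hat[k - 1]               # slicer error at sample k-1
        slope = (a_hat[k] - a_hat[k - 2]) / 2     # equation (4): -1, 0 or +1
        z.append(e * slope)
    return z


y = [1.0, 0.9, -1.1, -1.0, 1.2]
print(steady_state_errors(y, slice_binary(y)))  # approximately [0.1, 0.1, 0.0]
```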

## 4. Implementation

### 4.1. Timing Error Estimator and Loop Filter

A simplified block diagram of the timing-error estimator and loop filter is shown in Fig. 3. While signals are shown as single-ended here for simplicity, all analog circuits are fully differential.

The FIR forward equalizer B(z) uses sample-and-hold amplifiers (SHAs) that hold present and past samples of the input x[k], and the noise predictor A(z) holds past values of the slicer input y[k]. These held samples are used to generate the timing error estimate z[k], so no additional SHAs are needed.

In steady-state mode, the error current e[k-1] is generated by applying y[k-1] and  $\hat{a}[k-1]$  to a differencing transconductance (Gm) cell. In acquisition mode, the voltage x[k] and zero are input to the Gm cell to generate the error current. This current is multiplied by the slope estimate according to (3) or (5) using switches to generate z[k]. This slope estimate is generated from quantized signals ( $\hat{a}$  or  $\hat{x}$ ) and takes one of 3 values  $\{+1, 0, -1\}$ .

Figure 4. Multiplier and loop filter integrator.

Current z[k] is integrated onto a capacitor. The integrator output voltage $v_c$ is converted to a current by the transconductance cell $\beta$. The resulting current is added to $\alpha z[k]$ to form the loop filter output $I_{ICO}$. The integral path allows the loop to track a frequency offset between the local ICO and the read data while maintaining zero phase offset in steady state.

In steady state, the loop filter coefficients used were $\alpha = 0.2$ and $\beta = 40\,\mu$A/V. The differential integration capacitance was 20pF, and $G_m = 0.1$ mA/V. The ICO gain was $K_{ICO} = 140$ kHz/$\mu$A.
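To give a feel for these values, the following behavioral sketch advances an idealized loop filter by one bit period. The sketch:
- assumes a 160 Mbps bit period;
- treats all differential quantities as single numbers;
- models the integrator as the 20 pF differential capacitance charged by z[k] for half of each bit period, reflecting the 50% duty cycle of the switch signals described for Fig. 4;
- ignores settling, leakage and saturation.

The function name `loop_filter_step` is hypothetical.

```python
T = 1 / 160e6          # bit period at 160 Mbps (s)
GM = 0.1e-3            # error transconductance Gm (A/V)
ALPHA = 0.2            # direct-path current gain
BETA = 40e-6           # integrator-output transconductance (A/V)
C_INT = 20e-12         # differential integration capacitance (F)
K_ICO = 140e3 / 1e-6   # ICO gain: 140 kHz/uA expressed in Hz/A
DUTY = 0.5             # assumed fraction of T during which z is integrated


def loop_filter_step(e, slope, vc):
    """One bit period of an idealized proportional-plus-integral loop filter.

    Returns (incremental ICO current in A, integrator voltage in V, frequency step in Hz).
    """
    z = GM * e * slope                  # timing-error current z[k]
    vc = vc + z * DUTY * T / C_INT      # ideal integrator on C_INT
    i_ico = ALPHA * z + BETA * vc       # direct path plus integral path
    return i_ico, vc, K_ICO * i_ico


# A 10 mV error with a positive slope estimate, starting from vc = 0:
print(loop_filter_step(e=0.01, slope=1, vc=0.0))  # about (0.206 uA, 0.156 mV, 28.9 kHz)
```

A single update is dominated by the direct path (about 28 kHz, versus under 1 kHz from the integrator). The integrator instead accumulates over many updates to cancel a frequency offset.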

The multiplication and integration in Fig. 3 are realized together by the fully differential structure shown in Fig. 4. The error e is represented by a differential current

$$e = I_{ep} - I_{en} \tag{6}$$

The switches in Fig. 4 are controlled by the slope estimate $\widehat{\mathrm{slope}}$. For a positive slope estimate, up = 1 and dn = 0, so the error current is integrated onto the capacitors. For a negative slope estimate, up = 0 and dn = 1, and the negative of this current is integrated. For a slope estimate of zero, all switches remain open and no integration is performed. When all switches in Fig. 4 are open, all current sources are switched to a replica circuit (not shown) so that the current-source transistors remain in saturation. The signals up and dn have a duty cycle of 50% to avoid integrating glitches and to allow time for the error e to settle before it is multiplied by the slope.

The 4 cascode devices with gates connected to  $V_{CM}$  buffer the outputs from clock feedthrough due to the switches. The integration capacitors ( $C_{int} = 40 \text{pF}$ ) are implemented by PMOS devices whose drain, source and body are connected to the positive supply.

The common-mode (CM) voltage at the output of the integrator is set to  $V_{CM}$  by a CM feedback circuit (not shown) that controls the current  $I_{CM}$ . However, there is no CM feedback when the slope is zero because the current sources are not connected to the outputs in this case. In practice, this is acceptable since the amount of CM voltage drift due to leakage currents is small, even during a long period of successive zero slopes.

Figure 5. (a) Three-stage ring oscillator. (b) Replica bias circuit.

Any dc offset that can be referred to the input of the loop filter will result in a steady-state phase error in the recovered clock. One potential source of such offset is the offset of the Gm cell generating the error current e. Another potential source is mismatch between the two current sources generating the common-mode feedback current $I_{CM}$. When $\widehat{\mathrm{slope}} = 1$, these offsets are integrated onto the capacitors. However, when $\widehat{\mathrm{slope}} = -1$, the complement of such offsets is integrated. For binary data, a rising transition must be followed by a falling transition, so the positive and negative slope estimates balance. For (4), the running sum of the slope estimates telescopes and never exceeds 2 in magnitude. Hence, the net integrated offset does not accumulate and averages to zero.
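The cancellation can be checked numerically for the steady-state slope estimates of (4). Summing $(\hat{a}[k] - \hat{a}[k-2])/2$ over k telescopes to $(\hat{a}[n-1] + \hat{a}[n-2] - \hat{a}[1] - \hat{a}[0])/2$. A constant offset multiplied by the slope estimates therefore never integrates to more than twice the offset, however long the record.

The sketch below assumes random binary decisions and a unit offset; the function names are hypothetical. It also shows that two successive estimates can share a sign when both straddle the same transition. The balancing is between rising and falling transitions.

```python
import random


def slope_estimates(a_hat):
    """Slope estimates of equation (4), (a_hat[k] - a_hat[k-2]) / 2, as ints in {-1, 0, +1}."""
    # Integer division is exact here because the difference is -2, 0 or +2.
    return [(a_hat[k] - a_hat[k - 2]) // 2 for k in range(2, len(a_hat))]


def integrated_offset(a_hat, offset):
    """Final and worst-case running total of offset * slope estimate."""
    total = worst = 0.0
    for s in slope_estimates(a_hat):
        total += offset * s
        worst = max(worst, abs(total))
    return total, worst


rng = random.Random(1)
a = [rng.choice((-1, 1)) for _ in range(100_000)]
total, worst = integrated_offset(a, offset=1.0)
print(total, worst)                     # worst never exceeds 2.0
print(slope_estimates([-1, -1, 1, 1]))  # [1, 1]: both straddle one rising edge
```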

The slope multiplication and scaling by the factor $\alpha$ for the direct path of the loop filter in Fig. 3 are combined in a manner similar to that described for the integrator in Fig. 4. Therefore, the effect of offsets in this path is reduced in the same way as in Fig. 4.

### 4.2. Current-Controlled Oscillator

The current-controlled oscillator is shown in Fig. 5(a). It is a 3-stage ring oscillator whose frequency is controlled by the tail current $I_{ICO}$ in each stage. The replica bias circuit in Fig. 5(b) generates the bias voltage $V_B$ that makes the drain currents of both conducting PMOS load devices in each inverter equal when all the tail current is steered to one side. This biasing ensures a large output swing over a wide range of operating frequencies. When $v_{op} = v_{on}$, the PMOS devices controlled by $V_B$ operate in the triode region. The other PMOS load devices are diode-connected. The conductances of the triode and diode-connected transistors have opposite voltage coefficients. Therefore, the two load impedances in each inverter are fairly well matched over the output swing [5], which helps reduce the supply-noise coupling that can cause jitter.



Figure 6. Die photograph.

Capacitors built from the 3 metal layers provide a linear differential capacitive load and set the oscillation frequency range to include the expected operating rate of the NPDFE. Common-mode (CM) oscillation is avoided by using a cascoded tail current source for high CM rejection. The measured operating range of the ICO is 40MHz to 240MHz.
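Two quick calculations relate the ICO numbers to the rest of the design. Both assume that the ICO runs at the bit rate (160 MHz, inside the measured 40 MHz to 240 MHz range).
- The first uses the textbook relation f = 1/(2N t_d) for an N-stage ring with per-stage delay t_d.
- The second assumes that the 140 kHz/µA gain quoted in Section 4.1 holds across the ±5% lock range listed in Table 1.

The function names are hypothetical.

```python
F_BIT = 160e6           # bit rate and assumed ICO frequency (Hz)
K_ICO = 140e3 / 1e-6    # ICO gain: 140 kHz/uA expressed in Hz/A
N_STAGES = 3            # three-stage ring of Fig. 5(a)


def stage_delay(f_osc, n_stages=N_STAGES):
    """Per-stage delay of an n-stage ring oscillator, from f = 1 / (2 * n * t_d)."""
    return 1.0 / (2 * n_stages * f_osc)


def control_current_span(f_center, fractional_range, k_ico=K_ICO):
    """Control-current change for a +/- fractional frequency range (linear ICO assumed)."""
    return fractional_range * f_center / k_ico


print(stage_delay(F_BIT))                 # about 1.04e-09 s per stage
print(control_current_span(F_BIT, 0.05))  # about 5.7e-05 A, i.e. roughly +/-57 uA
```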

## 5. Measured Data

The $0.5\mu m$ CMOS prototype contains the blocks in Fig. 1. A die photograph is shown in Fig. 6. Measurements were taken using a signal that models a Lorentzian read channel with $PW_{50} = 2.5T$. The signal was loaded into an arbitrary waveform generator, and bandlimited white noise was added to form the test input signal. Fig. 7 plots the measured bit-error rate versus input SNR at 160Mbps, with and without timing recovery enabled. Enabling timing recovery costs 0.2-0.4dB of SNR compared with operation without timing recovery (i.e., with the input sampler driven by an external clock synchronized to the test data). The measured cycle-to-cycle jitter at 160Mbps is 50ps (rms), or 0.8% of a bit period, and 180ps (peak). A jitter histogram is shown in Fig. 8. Performance is summarized in Table 1. The timing recovery circuits consume only about 13% of the area and power, while providing jitter low enough to allow near-optimum performance of the NPDFE.

Table 1. Measured performance at 3.3V and 25°C.

| Process                        | $0.5 \mu m$ SPTM CMOS |
|--------------------------------|-----------------------|
| Active Area (Timing Rec. only) | $1.5mm^2 (0.2mm^2)$   |
| Data Rate = $1/T$              | 160Mbps               |
| Power (Timing Rec. only)       | 213mW (27mW)          |
| Frequency Lock Range           | $\pm 5\%$             |
| Maximum Acquisition Time       | 100T                  |
| Steady-State Jitter            | 50ps(rms), 180ps(pk)  |
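The derived figures quoted in Sections 5 and 6 follow directly from Table 1 and the ICO range. The short script below recomputes them, assuming a bit period of 1/160 MHz = 6.25 ns.

```python
BIT_PERIOD_PS = 1e12 / 160e6   # 6250 ps at 160 Mbps

checks = {
    "rms jitter, % of T": 100 * 50 / BIT_PERIOD_PS,    # 0.8
    "peak jitter, % of T": 100 * 180 / BIT_PERIOD_PS,  # 2.88
    "timing area share, %": 100 * 0.2 / 1.5,           # about 13.3
    "timing power share, %": 100 * 27 / 213,           # about 12.7
    "ICO headroom, max/160 MHz": 240 / 160,            # 1.5, i.e. about 50% faster
}
for name, value in checks.items():
    print(f"{name}: {value:.3g}")
```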



Figure 7. Measured bit-error rate vs. input SNR with (solid) and without (dashed) timing recovery at 160Mbps.



Figure 8. Measured jitter histogram at 160Mbps and 16dB input SNR.

## 6. Conclusion

A timing recovery scheme for an NPDFE has been demonstrated. The addition of on-chip timing recovery and clock generation increased the maximum data rate to 160 Mbps, compared to 100 Mbps for the prototype in [1], where all clock generation was performed off-chip. The maximum speed is limited by the NPDFE circuits and not by the timing recovery loop, which is capable of operating about 50% faster, based on simulations and the measured ICO oscillation range.

- [1] M.Q. Le, P.J. Hurst, and J.P. Keane, "An adaptive analog noise-predictive decision-feedback equalizer," *JSSC*, pp. 105-113, Feb. 2002.
- [2] S.U.H. Qureshi, "Timing Recovery for Equalized Partial-Response Systems," *IEEE Trans. Comm.*, vol. COM-24, pp. 1326-1331, Dec. 1976.
- [3] W.L. Abbott and J.M. Cioffi, "Timing recovery for adaptive decision feedback equalization of the magnetic storage channel," *GLOBECOM*, vol. 3, pp. 1794-1799, 1990.
- [4] J.W.M. Bergmans, *Digital Baseband Transmission and Recording*, Kluwer Academic Publishers, pp. 555-557, 1996.
- [5] J.G. Maneatis, "Low-jitter process-independent DLL and PLL based on self-biased techniques," *JSSC*, pp. 1723-1732, Nov. 1996.