# Comparative Analysis of Robustness of Spin Transfer Torque based Look up Tables under Process Variations

| Ragh Kuttappa                  | Houman Homayoun         |
|--------------------------------|-------------------------|
| School of Engineering          | Electrical and Computer |
| San Francisco State University | Engineering             |
| San Francisco, CA, USA         | George Mason University |
| ragh@mail.sfsu.edu,            | Fairfax, VA, USA        |
|                                | hhomayou@gmu.edu        |

Abstract-Spin Transfer Torque (STT) switching realized using a Magnetic Tunnel Junction (MTJ) device has shown great potential for low power and non-volatile storage. A prime application of MTJs is in building non-volatile Look Up Tables (LUT) used in reconfigurable logic. Such LUTs use a hybrid integration of CMOS transistors and MTJ devices. This paper discusses the reliability of STT based LUTs under transistor and MTJ variations in nano-scale. The sources of process variations include both the CMOS device related variations and the MTJ variations. A key part of the STT based LUTs is the sense amplifier needed for reading out the MTJ state. We compare the voltage and current based sensing schemes in terms of the power, performance, and reliability metrics. Based on our simulation results in a 16nm CMOS, for the same total device area, the voltage mode sensing scheme offers 75% lower failure rates under threshold voltage (Vth) variations, 4.9X higher tolerance to MTJ resistance variations, 19% less delay, and 64% lower active power compared to the current sensing scheme.

### Keywords - Look Up Table (LUT), Magnetic Tunnel Junction (MTJ), Process Variations, Sense Amplifier, Spin Transfer Torque (STT).

# I. INTRODUCTION

Spin Transfer Torque (STT) refers to a switching mechanism that changes the magnetic state of a Magnetic Tunnel Junction (MTJ) device [1]. The MTJ is composed of fixed and free magnetic layers isolated by a thin insulator (Fig. 1) [2]. The parallel and anti-parallel magnetic states of the two layers, representing binary states, are sensed through the resulting low and high resistance across the two terminals of the MTJ [1,2]. The current passed through the MTJ for sensing its resistance (i.e. read current) has to be less than the current needed for changing its state (i.e. write or critical current) [1,2].

Due to its non-volatile nature and CMOS compatibility, STT-based memory (STT-RAM) has shown great promise in addressing the leakage barrier for SRAM. While high write power remains a major obstacle for STT-RAM [3], the application of STT-based memory in reconfigurable logic, as in Field Programmable Gate Arrays (FPGA), seems more promising due to the low frequency of reconfiguration. Reconfigurable logic relies on implementing logic in small Look-Up Tables (LUT). STT-based LUTs are realized by using MTJs as storage elements and using CMOS for the interface circuitry needed for read and write operations [4,5,6,7]. The CMOS interface includes a decoder/mux for selecting a unique MTJ for read/write and a sense amplifier for sensing the resistance of the selected MTJ in the read mode [4,5,6,7].

This research is sponsored by the Defense Advanced Research Project Agency of the USA.

Hassan Salmani, Electrical and Computer Engineering, Howard University, Washington, DC, USA, hassan.salmani@howard.edu

Hamid Mahmoodi, School of Engineering, San Francisco State University, San Francisco, CA, USA, mahmoodi@sfsu.edu

Scaling of the CMOS technology to ever smaller dimensions has posed serious reliability challenges to designs. The main cause is increasing process variations (both spatial and temporal) affecting transistor characteristics, especially the threshold voltage ($V_{th}$). Such variations include both inter- and intra-die variations. Some causes of variations, such as Random Dopant Fluctuations (RDF), exhibit uncorrelated variations from one device to another and hence fall into the intra-die category, whereas other sources, such as oxide-thickness variations, tend to exhibit correlations among adjacent devices and hence fall more into the inter-die category. In addition to transistor variations, MTJs exhibit variations in their geometrical parameters such as insulator thickness [8]. Such variations result in variations in the resistance of an MTJ [9].

In this paper, we analyze the impact of CMOS/MTJ process variations on reliability of the STT-based LUTs. We present a comparative analysis of voltage vs. current mode sensing schemes in such LUTs. The contributions of this paper are as follows:

- Comparative reliability analysis of voltage vs. current mode sensing schemes in STT-based LUTs considering both CMOS and MTJ variations
- Statistical transistor sizing of the STT-LUT designs for fair comparison under same area

The remainder of the paper is organized as follows. Section II introduces the voltage and current mode sensing schemes for STT-LUTs. Section III presents the modeling of process variations and statistical sizing of the designs. The results of the process variation analysis and comparisons are discussed in Section IV. Section V concludes the paper.

# II. SENSING SCHEMES FOR STT-LUT

An n-input LUT contains $2^n$ storage elements that are accessed via a decoder/MUX. However, in STT-LUTs, since the storage elements are MTJs that exhibit high and low resistance states, there is also a need for a sense amplifier stage that compares the resistance of the selected MTJ with a reference resistance and produces a full-swing logic one or zero depending on whether the MTJ is in the high or low resistance state (Fig. 2) [4,5,6,7]. For high read performance and enhanced noise margin, a greater difference between the low and high resistances of the MTJ is desired.



Fig. 1: Programmable MTJ: (a) Parallel (low resistance) and (b) antiparallel (high resistance) states



Fig. 2: STT-based LUT, a hybrid MTJ/CMOS design

This resistance differential is quantified by the Tunnel Magneto Resistance (TMR), defined as:

$$TMR = \frac{R_{AP} - R_P}{R_P} \tag{1}$$

where $R_P$ and $R_{AP}$ are the resistances of the MTJ in the parallel and anti-parallel states, respectively. TMR is a technology parameter that depends on the MTJ geometries and materials.

To translate the  $R_P$  and  $R_{AP}$  into a binary full swing voltage signal in the read mode, a sense amplifier is used to compare the resistance of the selected MTJ against a reference resistor (Fig. 2). Ideally, the value of the reference resistor should be the average of  $R_P$  and  $R_{AP}$ . In the read mode, the selected MTJ and the reference resistors are voltage biased and their currents are passed to the sense amplifier stage. The sense amplifier can be designed to either directly amplify the current differential (i.e. current mode sensing) or a current-to-voltage conversion stage may precede a voltage mode sense amplifier. Since the write paths remain identical, we will only discuss the read paths and compare the read performance of the two styles.
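As a small worked example of Eq. (1) and of the ideal reference resistance described above, the following sketch evaluates both for the nominal MTJ resistances quoted later in Section III (10 K$\Omega$ and 4 K$\Omega$). The function names `tmr` and `ideal_reference` are illustrative only and do not come from the paper.

```python
def tmr(r_ap, r_p):
    """Tunnel magnetoresistance ratio per Eq. (1): (R_AP - R_P) / R_P."""
    if r_p <= 0:
        raise ValueError("r_p must be positive")
    return (r_ap - r_p) / r_p


def ideal_reference(r_ap, r_p):
    """Ideal reference resistance: the average of R_P and R_AP."""
    return (r_ap + r_p) / 2.0


# Nominal values from Section III of the paper, in ohms.
R_AP0 = 10e3
R_P0 = 4e3

print(tmr(R_AP0, R_P0))              # 1.5, i.e. a TMR of 150%
print(ideal_reference(R_AP0, R_P0))  # 7000.0 ohms
```

With these values, the reference sits 3 K$\Omega$ away from either MTJ state, which is the ideal sensing margin on each side.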

## A. VOLTAGE SENSING MODE STT-LUT

Fig. 3 shows the schematic of a Voltage Sensing Mode (VSM) 2-input (4-bit) STT-LUT [6]. This is a dynamic circuit that operates in a precharge (CLK=0) and evaluate (CLK=1) fashion. The MTJ selection is performed via a pass-transistor decoder/mux (selection tree). To balance the transistor paths of the MTJs and the reference resistor ($R_{REF}$), similar transistors are inserted above the reference resistor. When CLK goes high, the current provided by the dynamic current source is divided between the selected MTJ and the reference resistor, resulting in a current differential that is drained from the nodes DEC and REF. This current differential is converted to a low-swing voltage differential on the nodes DEC and REF by the current-to-voltage converter circuit, which is composed of the two cross-coupled PMOSes. This voltage differential is then amplified by a voltage-mode sense amplifier to produce full-swing differential outputs (Z and Z').



## B. CURRENT SENSING MODE STT-LUT

Fig. 4 shows the schematic of a Current Sensing Mode (CSM) 2-input (4-bit) STT-LUT [7]. The design is similar to the VSM version except that the current differential is directly applied to a current-mode sense amplifier, and hence the current-to-voltage converter circuit is eliminated. This is also a dynamic design. When the clock is high, the sense amplifier is biased in a metastable state by shorting its outputs. The outputs approach a voltage of about $V_{dd}/2$ in this case, and this voltage is also applied as bias to the MTJ and reference resistors. When the clock switches to low, the cross-coupled inverters in the sense amplifier switch to one of the stable states, and the direction of this switching is determined by the current differential between the MTJ and the reference resistor. Since the outputs are shorted while the sense amplifier is biased in the metastable condition (i.e. when CLK is high), a considerable amount of static short-circuit power is dissipated in the sense amplifier. To reduce this short-circuit power, the CLK duty cycle (high duration) should be reduced. In this research, CLK has a duty cycle of 50% to maintain uniformity in the analysis of the two schemes.

# III. PROCESS VARIATION MODELING AND ANALYSIS

CMOS process variations have various causes that affect transistor performance. The effect of most causes of variations can be captured as  $V_{th}$  variation. Some sources of variations such as RDF are random (uncorrelated) in nature, whereas some others such as oxide thickness variations are correlated. We divide the variations into two groups of inter and intra-die variations. The uncorrelated and random causes belong to the intra-die category and the correlated ones to the inter-die category. We model the  $V_{th}$  variation of a transistor by adding a DC voltage source in series with the gate terminal, with a parameterized voltage level that represents the total  $V_{th}$  shift for a transistor.

This modeling allows us to do both inter and intra-die  $V_{th}$  variation analysis. The intra-die variation considered in this study is RDF due to its prominence in scaled bulk CMOS transistors. The  $V_{th}$  shift by RDF is inversely related to the square root of the device area (W×L) as follows:



$$\sigma_{Vt} = \left[\frac{qT_{ox}}{\varepsilon_{ox}}\sqrt{\frac{N_{a}W_{d}}{3L_{min}W_{min}}}\right] \times \sqrt{\frac{L_{min}W_{min}}{LW}} = \sigma_{Vt0} \times \sqrt{\frac{L_{min}W_{min}}{LW}} \tag{2}$$

where all the technology parameters are lumped into $\sigma_{Vt0}$, which represents the standard deviation of the $V_{th}$ variation of a minimum-sized transistor with dimensions $L_{min}$ and $W_{min}$. $L$ and $W$ are the channel length and width of the given transistor.
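The area scaling in Eq. (2) can be illustrated with a short sketch. It uses only the lumped form $\sigma_{Vt0}\sqrt{L_{min}W_{min}/(LW)}$; the function name `sigma_vth` and the normalized dimensions in the example are assumptions for illustration, while the 30 mV value for $\sigma_{Vt0}$ is the one used in the paper's intra-die simulations.

```python
import math


def sigma_vth(sigma_vt0_mv, w, l, w_min, l_min):
    """Standard deviation of the RDF-induced Vth shift per Eq. (2).

    sigma_vt0_mv is the sigma of a minimum-sized device (in mV);
    w, l, w_min, l_min may be in any unit as long as it is consistent.
    """
    if min(w, l, w_min, l_min) <= 0:
        raise ValueError("all dimensions must be positive")
    return sigma_vt0_mv * math.sqrt((l_min * w_min) / (l * w))


# Dimensions normalized to the minimum device (hypothetical sizing).
SIGMA_VT0_MV = 30.0
for width_multiple in (1, 2, 4):
    s = sigma_vth(SIGMA_VT0_MV, w=width_multiple, l=1, w_min=1, l_min=1)
    print(f"W = {width_multiple} x W_min -> sigma_Vt = {s:.2f} mV")
```

Quadrupling the width at minimum length halves the standard deviation (30 mV to 15 mV), which is why upsizing is an effective but area-hungry remedy for mismatch.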

Sense amplifier circuits utilize differential pair transistors to do analog voltage or current comparison and hence are more sensitive to intra-die V<sub>th</sub> variations that cause mismatch among neighboring transistors such as those in a differential pair. Given that bigger transistors exhibit less intra-die  $V_{th}$  variations (Eq. (2)), it is expected that by increasing transistor sizes (W) in the LUT designs, the delay variation and failure probability should be reduced. Hence, a fair comparison between the two LUT designs should be made under same total transistor (active) area. Moreover, for a given total area constraint, it is not optimal to uniformly allocate area to all transistors, given that the V<sub>th</sub> variation of some might have more influence than that of others on the overall failure probability. To address this problem more formally, we define the delay to  $V_{th}$  sensitivity metric for a given transistor,  $M_i$ , in a circuit as:

$$Sensitivity_{i} = \frac{|T_{p0} - \max(T_{pi}, T_{p'i})|}{dV_{ti}} \tag{3}$$

where $dV_{ti}$ is the $V_{th}$ variation applied to the transistor, $T_{p0}$ is the nominal delay of the design, and $T_{pi}$ and $T_{p'i}$ represent the delays to the OUT and OUT' outputs of the LUT design after applying the $V_{th}$ variation to the given transistor, $M_i$.

Tables 1 and 2 summarize the sensitivity measurements in descending order for the transistors of both LUT designs, obtained by SPICE simulations in a predictive 16nm CMOS technology [10]. A transistor with higher sensitivity is given more area (W) than one with lower sensitivity. From the $V_{th}$ sensitivity results, it is observed that the sense amplifier transistors are the most sensitive ones and need to be given the largest portion of the area.
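The sensitivity metric of Eq. (3) and the idea of giving more area to more sensitive transistors can be sketched as follows. The paper does not state the exact allocation rule, so the proportional split in `allocate_area` is an assumption made purely for illustration, as are the function names, the area budget, and the minimum area per device. The three sensitivity values in the example are taken from Table 1.

```python
def delay_sensitivity(tp0_ps, tp_out_ps, tp_outbar_ps, dvt_mv):
    """Delay-to-Vth sensitivity per Eq. (3), in ps/mV."""
    if dvt_mv <= 0:
        raise ValueError("dvt_mv must be positive")
    return abs(tp0_ps - max(tp_out_ps, tp_outbar_ps)) / dvt_mv


def allocate_area(sensitivities, total_area, min_area):
    """Split total_area across transistors.

    ASSUMPTION: every device first receives min_area, and the remaining
    budget is shared in proportion to sensitivity. The paper only says
    that higher sensitivity gets more area, not how much more.
    """
    spare = total_area - len(sensitivities) * min_area
    if spare < 0:
        raise ValueError("total_area is too small for the minimum sizes")
    total_s = sum(sensitivities.values())
    if total_s == 0:
        return {name: total_area / len(sensitivities) for name in sensitivities}
    return {
        name: min_area + spare * s / total_s
        for name, s in sensitivities.items()
    }


# Sensitivities (ps/mV) for three VSM transistors from Table 1.
table1_subset = {"MN1": 5.94, "MP5": 1.63, "MN13": 0.23}

# Hypothetical budget in units of one minimum-sized device.
areas = allocate_area(table1_subset, total_area=9.0, min_area=1.0)
for name, area in areas.items():
    print(f"{name}: {area:.2f} x minimum area")
```

Under this rule the ranking of areas follows the ranking of sensitivities, and the allocated areas always sum to the budget, so two designs sized this way can be compared at equal total active area.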

Besides transistors, MTJs also exhibit variations in their geometries [8]. Such variations result in variations in critical write current as well as high and low state resistances during the read mode [9]. Since we are concerned about read failures for the LUTs, we model the MTJ variations as variations in high and low state resistances ($R_{AP}$ and $R_P$). To determine the tolerable resistance variation margin, we measure how low $R_{AP}$ and how high $R_P$ can go before a read failure occurs. The lowest $R_{AP}$ ($R_{APmin}$) and the highest $R_P$ ($R_{Pmax}$) for which the read passes are measured and used to define the Resistance Variation Margin (RVM) as:

$$RVM = \min\left\{\left(R_{AP0} - R_{APmin}\right), \left(R_{Pmax} - R_{P0}\right)\right\} \tag{4}$$

where $R_{AP0}$ and $R_{P0}$ are the nominal values of $R_{AP}$ and $R_P$, respectively. Ideally, RVM can be as high as $(R_{AP0} - R_{P0})/2$. Hence, we define the %RVM as:

$$\% RVM = \frac{2 \times RVM}{R_{AP0} - R_{P0}} \times 100 \tag{5}$$

The values of  $R_{AP0}$  and  $R_{P0}$  are chosen to be 10 K $\Omega$  and 4 K $\Omega$  in 16nm node based on the MTJ scaling trends [11].
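Eqs. (4) and (5) can be checked with a short calculation. The nominal resistances are those given above; the pass/fail limits `R_AP_MIN` and `R_P_MAX` are hypothetical values chosen only so that the resulting RVM equals the 2.5 K$\Omega$ reported for the VSM style in Table 3. Function names are illustrative.

```python
def rvm(r_ap0, r_ap_min, r_p0, r_p_max):
    """Resistance Variation Margin per Eq. (4), same unit as the inputs."""
    return min(r_ap0 - r_ap_min, r_p_max - r_p0)


def pct_rvm(rvm_value, r_ap0, r_p0):
    """%RVM per Eq. (5): RVM relative to the ideal margin (R_AP0 - R_P0)/2."""
    return 2.0 * rvm_value / (r_ap0 - r_p0) * 100.0


# Nominal values from the paper, in kilo-ohms.
R_AP0 = 10.0
R_P0 = 4.0

# HYPOTHETICAL read pass/fail limits, in kilo-ohms.
R_AP_MIN = 7.5
R_P_MAX = 6.8

margin = rvm(R_AP0, R_AP_MIN, R_P0, R_P_MAX)
print(margin)                                    # 2.5
print(round(pct_rvm(margin, R_AP0, R_P0), 1))    # 83.3
```

Because RVM takes the smaller of the two margins, the tighter side (here the $R_{AP}$ side) determines the result, and a 2.5 K$\Omega$ margin corresponds to roughly 83% of the ideal 3 K$\Omega$.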

TABLE 1: Vth SENSITIVITY RANKINGS FOR VSM STT-LUT

| RANKING | TRANSISTOR NAME   | SENSITIVITY (pS/mV) |
|---------|-------------------|---------------------|
| 1       | MN1,MN5           | 5.94                |
| 2       | MN8,MN9,MN10,MN11 | 5.18                |
| 3       | MN14,MN15         | 5.16                |
| 4       | MN6,MN7           | 5.01                |
| 5       | MN16,MN17         | 3.49                |
| 6       | MP5,MP6           | 1.63                |
| 7       | MN2,MN4           | 1.47                |
| 8       | MN13              | 0.23                |
| 9       | MP1,MP2           | 0.22                |
| 10      | MP0               | 0.05                |
| 11      | MN12              | 0.004               |
| 12      | MP6,MP7           | 0.002               |

TABLE 2: V<sub>th</sub> SENSITIVITY RANKINGS FOR CSM STT-LUT

| RANKING | TRANSISTOR NAME  | SENSITIVITY (pS/mV) |
|---------|------------------|---------------------|
| 1       | MN1,MN2          | 4.20                |
| 1       | MN2              | 4.20                |
| 2       | MN3              | 1.93                |
| 3       | MN5,MN6          | 1.11                |
| 3       | MN6              | 1.11                |
| 4       | MN11             | 0.94                |
| 5       | MN12             | 0.86                |
| 6       | MN13             | 0.11                |
| 7       | MN7,MN8,MN9,MN10 | 0.003               |
| 8       | MP0,MP1          | 0.001               |

# IV. RESULTS AND DISCUSSIONS

The MTJs in both LUTs are programmed so that 50% of the MTJs are in the $R_P$ state and the rest in the $R_{AP}$ state. Simulations are performed to apply all input combinations and measure read delay, power, and failure rates. Fig. 5 shows typical simulation waveforms of the LUTs at the nominal process corner. Fig. 6 shows the output waveform plots obtained by Monte Carlo simulations of intra-die $V_{th}$ variation. These waveforms clearly show delay variations and failures caused by $V_{th}$ variations. From these waveforms, delay distribution plots are obtained for both designs (Fig. 7). These plots are obtained for both STT-LUTs optimally designed according to the $V_{th}$ sensitivity method for the same total active (transistor) areas of 0.02856 $\mu m^2$ and 0.04284 $\mu m^2$ (50% larger). The failure cases are put into a single bin shown on the far right side of the histograms. The VSM exhibits 61% to 75% lower failure rates than the CSM style. With a 50% upsizing of the total active area, the failure rate of the VSM style goes down by 43% and that of the CSM style goes down by only 13%. The enhanced reliability of the VSM is attributed to its two-stage signal amplification, first by the cross-coupled PMOSes in the current-to-voltage converter and then by the sense amplifier (Fig. 3).



Fig. 5: Read waveforms showing clock and output signals for (a) VSM STT-LUT and (b) CSM STT-LUT

Due to the differential nature of the LUT circuits, both designs exhibit good tolerance to inter-die $V_{th}$ variations, as such variations do not cause mismatch among the transistors in the same circuit. With 30 mV of $\sigma_{Vt}$ for inter-die $V_{th}$ variations, neither design shows any failures. We increased the inter-die $\sigma_{Vt}$ to 60 mV to see some failures, and observed that the VSM style shows lower failure rates for inter-die variations as well (Fig. 8).

Table 3 summarizes the numerical results of the LUTs for the two active areas. The VSM style exhibits a 19% reduction in delay and a 38% to 64% reduction in active power. The CSM style, however, exhibits 74% to 80% less standby power. The leakage difference arises because in the CSM style the sense amplifier is stacked on top of the selection tree, which provides an additional stacking effect that reduces leakage in the sense amplifier circuit. In the VSM style, by contrast, the sense amplifier has its own separate connections to the supply lines, offering less stacking effect.

The VSM style also shows more tolerance to MTJ variation as measured by the RVM and %RVM metrics defined by Eq. (4) and Eq. (5). The VSM style shows 4.2X to 4.9X higher RVM as compared to the CSM style.
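The percentages quoted in this section follow directly from the entries of Table 3. The sketch below recomputes them; the data values are copied from Table 3, while the helper name `pct_reduction` and the dictionary layout are illustrative choices.

```python
def pct_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return (baseline - improved) / baseline * 100.0


# Table 3 values as (VSM, CSM) pairs for the two total active areas.
table3 = {
    "0.02856 um^2": {
        "delay_ps": (214.6, 265.0),
        "active_uW": (0.904, 1.468),
        "standby_nW": (82.46, 16.62),
        "intra_die_fail_pct": (27.3, 70.3),
        "pct_rvm": (83, 17),
    },
    "0.04284 um^2": {
        "delay_ps": (196.5, 242.6),
        "active_uW": (1.015, 2.815),
        "standby_nW": (97.21, 25.29),
        "intra_die_fail_pct": (15.5, 61.4),
        "pct_rvm": (83, 20),
    },
}

for area, m in table3.items():
    vsm_d, csm_d = m["delay_ps"]
    vsm_p, csm_p = m["active_uW"]
    vsm_s, csm_s = m["standby_nW"]
    vsm_f, csm_f = m["intra_die_fail_pct"]
    vsm_r, csm_r = m["pct_rvm"]
    print(area)
    print(f"  VSM delay reduction:        {pct_reduction(csm_d, vsm_d):.1f}%")
    print(f"  VSM active power reduction: {pct_reduction(csm_p, vsm_p):.1f}%")
    print(f"  CSM standby power reduction:{pct_reduction(vsm_s, csm_s):.1f}%")
    print(f"  VSM failure rate reduction: {pct_reduction(csm_f, vsm_f):.1f}%")
    print(f"  VSM/CSM %RVM ratio:         {vsm_r / csm_r:.2f}X")
```

For the smaller area this gives about 19.0%, 38.4%, 79.8%, 61.2%, and 4.88X; for the larger area about 19.0%, 63.9%, 74.0%, 74.8%, and 4.15X, matching the ranges stated in the text.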

Fig. 9 shows delay distributions and failure rates obtained under intra-die MTJ resistance variations with a standard deviation ($\sigma_R$) of 0.5 K$\Omega$. As expected, the VSM style shows lower failure rates (55% to 100% lower).
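To build intuition for why a larger RVM helps under resistance variation, the following back-of-the-envelope sketch treats the resistance of a single MTJ as Gaussian and asks how often it drifts past a one-sided margin. This is a deliberately simplified model and not the paper's simulation method: it ignores reference resistor and transistor variations as well as delay-related failures, and it does not reproduce the failure rates of Fig. 9. The function name is hypothetical; the margins are the RVM values from Table 3 and $\sigma_R$ is the value used above.

```python
import math


def one_sided_tail_probability(margin, sigma):
    """P(X > margin) for a zero-mean Gaussian X with standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.erfc(margin / (sigma * math.sqrt(2.0)))


SIGMA_R_KOHM = 0.5  # from the paper's MTJ variation experiment

# RVM values from Table 3 for the smaller area, in kilo-ohms.
for style, rvm_kohm in (("VSM", 2.5), ("CSM", 0.5)):
    p = one_sided_tail_probability(rvm_kohm, SIGMA_R_KOHM)
    print(f"{style}: margin = {rvm_kohm / SIGMA_R_KOHM:.0f} sigma, "
          f"tail probability = {p:.3g}")
```

In this toy model the CSM margin is only one standard deviation wide (tail probability of roughly 0.16), whereas the VSM margin spans five standard deviations, which is consistent in direction with the lower VSM failure rates reported.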

# V. CONCLUSION

Reliability assessment of various circuit design styles is an important consideration in nano-scale CMOS/MTJ hybrid technologies. This paper performed a comparative reliability analysis of STT-based LUTs under transistor and MTJ variations and determined that the VSM style shows superior reliability compared to the CSM style under the same design area. This reliability enhancement holds against not only transistor variations but also MTJ variations. The improved reliability of the VSM style also comes with lower propagation delay and active power consumption.

# REFERENCES

- [1] C. Augustine et al., IEEE Sensors Journal, pp. 756-766, 2012.
- [2] N. Nishimura et al., Journal of Applied Physics, pp. 5246-5249, 2002.
- [3] M. Rasquinha et al., ACM/IEEE Int. Symp. on LPE, pp. 389-394, 2010.
- [4] S. Paul et al., IEEE Trans. on Nanotechnology, 2011.
- [5] F. Ren et al., IEEE Trans. on ED, 57(5):1023-1028, 2010.
- [6] D. Suzuki et al., Symposium on VLSI Circuits, pp. 80-81, 2009.
- [7] W. Zhao et al., ACM Trans. on Embedded Computing Systems, 2009.
- [8] X. Wang et al., IEEE Trans. on Magnetics, p. 2038, Apr. 2009.
- [9] Y. Zhang et al., IEEE Trans. on Magnetics, p. 2038, Apr. 2009.
- [10] PTM: Predictive Technology Model, http://www.eas.asu.edu/~ptm
- [11] K. Chun et al., IEEE Journal of SSC, pp. 698-610, Feb. 2013.



Fig. 6: Read output waveforms from Monte Carlo $V_{th}$ variation simulation for (a) VSM STT-LUT and (b) CSM STT-LUT



Fig. 7: Delay distributions under intra-die RDF $V_{th}$ variation ($\sigma_{Vt0}$=30 mV) for LUTs designed for same total active area of (a) 0.02856 $\mu m^2$ (b) 0.04284 $\mu m^2$



Fig. 8: Delay distributions under inter-die $V_{th}$ variation ($\sigma_{Vt}$=60 mV) for LUTs designed for same total active area of (a) 0.02856 $\mu m^2$ (b) 0.04284 $\mu m^2$

TABLE 3: SIMULATION RESULTS IN 16NM CMOS AT CLOCK FREQUENCY=0.5 GHZ,  $V_{DD}$ =0.7V, T=110°C

| METRIC | VSM (Area=0.02856 $\mu m^2$) | CSM (Area=0.02856 $\mu m^2$) | VSM (Area=0.04284 $\mu m^2$) | CSM (Area=0.04284 $\mu m^2$) |
|---|---|---|---|---|
| Nominal delay (ps) | 214.6 | 265.0 | 196.5 | 242.6 |
| Active power (µW) | 0.904 | 1.468 | 1.015 | 2.815 |
| Standby power (nW) | 82.46 | 16.62 | 97.21 | 25.29 |
| %Failure rate under intra-die $V_{th}$ variation ($\sigma_{Vt0}$=30 mV) | 27.3 | 70.3 | 15.5 | 61.4 |
| %Failure rate under inter-die $V_{th}$ variation ($\sigma_{Vt}$=30 mV) | 0 | 0 | 0 | 0 |
| %Failure rate under inter-die $V_{th}$ variation ($\sigma_{Vt}$=60 mV) | 7.9 | 8.7 | 0 | 7.4 |
| Resistance Variation Margin (RVM) (K$\Omega$) | 2.5 | 0.5 | 2.5 | 0.6 |
| %RVM | 83 | 17 | 83 | 20 |



Fig. 9: Delay distributions under intra-die MTJ resistance variation ( $\sigma_R$ =0.5 K $\Omega$ ) for LUTs designed for same total active area of (a) 0.02856  $\mu$ m<sup>2</sup> (b) 0.04284  $\mu$ m<sup>2</sup>