# Double Stage Domino Technique: Low-Power High-Speed Noise-tolerant Domino Circuit for Wide Fan-In Gates

R Ravikumar

Department of Micro and Nano Electronics, VIT University, Vellore, India ravi10ee052@hotmail.com

Abstract—This paper proposes a new domino circuit technique that achieves high noise immunity and low power consumption without degrading performance in wide fan-in dynamic gates. The proposed double stage domino technique divides the domino circuit into two stages: a standard footed domino containing the pull-down network, and a second standard footed domino containing one pull-down transistor and one keeper transistor. A simple current mirror couples stage one to stage two. The wide fan-in gates are designed in 90-nm gpdk technology. Simulation of a 64-input OR gate shows a 78% power reduction and a 3.25-fold increase in Unity Noise Gain (UNG) compared with the standard domino circuit, with both simulated at the same delay and process corner. The proposed technique achieves a Figure of Merit (FOM) of 13.13.

Keywords-Domino logic, wide fan-in, leakage tolerance, noise immunity.

# I. INTRODUCTION

Domino logic is one of the most widely used dynamic logic styles for achieving high performance. Although it has several advantages over static logic styles [1], its greater sensitivity to noise makes it more vulnerable than static logic families. As technology advances, scaling device dimensions to save area, reducing the supply voltage to lower power consumption, and scaling the threshold voltage to preserve performance together increase the subthreshold leakage current exponentially. In wide fan-in dynamic gates [2] especially, the major concerns are reducing leakage current and improving noise immunity.

In this paper, a new double stage domino (DSD) circuit technique is proposed for wide fan-in dynamic gates. Compared with the standard footless domino circuit, the proposed circuit has better performance, higher noise immunity, and lower power consumption. The paper proceeds as follows: Section II reviews the literature, Section III explains the proposed circuit technique, and Section IV presents the simulation results of the proposed double stage domino circuit, obtained with Cadence Virtuoso 6.1 in 90-nm gpdk technology, and compares them with other standard domino circuits.

# II. LITERATURE REVIEW

In the conventional standard footless domino circuit [1], a pMOS keeper transistor is used to prevent the dynamic node from discharging due to leakage current and to counter charge sharing with the pull-down network during the evaluation phase. The keeper also improves noise immunity. The keeper ratio K is given by

$$K = \frac{\mu_p \left(\frac{W}{L}\right)_{\text{Keeper transistor}}}{\mu_n \left(\frac{W}{L}\right)_{\text{Evaluation network}}} \tag{1}$$

where $\mu_p$ and $\mu_n$ are the hole and electron mobilities, respectively, and W and L are the width and length of the transistor. Upsizing the keeper transistor to achieve appreciable noise immunity increases the current contention between the pMOS keeper and the pull-down evaluation network, which raises both power consumption and delay. The problem is more pronounced in wide fan-in dynamic gates, where a large number of parallel pull-down paths connect the dynamic node to GND: more leakage paths are available and the total leakage current is higher.
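As a minimal sketch of Eq. (1), the keeper ratio can be evaluated for a few keeper sizes. The mobility values below are illustrative assumptions (only their ratio matters) and the function name `keeper_ratio` is hypothetical; the minimum-size evaluation transistor with W = 1.4 L<sub>min</sub> follows the sizing used later in the paper.

```python
def keeper_ratio(mu_p, mu_n, wl_keeper, wl_eval):
    """Keeper ratio K of Eq. (1): (mu_p * (W/L)_keeper) / (mu_n * (W/L)_eval)."""
    return (mu_p * wl_keeper) / (mu_n * wl_eval)


# Assumed mobilities in arbitrary units (mu_n / mu_p = 2.5 is an illustrative
# choice, not a value taken from the paper or the gpdk 90-nm models).
MU_P = 1.0
MU_N = 2.5

# Minimum-size pull-down transistor: W = 1.4 * L_min, so W/L = 1.4.
WL_EVAL = 1.4

# Sweep a few hypothetical keeper W/L values.
for wl_keeper in (0.35, 1.4, 3.5):
    k = keeper_ratio(MU_P, MU_N, wl_keeper, WL_EVAL)
    print(f"(W/L)_keeper = {wl_keeper:4.2f} -> K = {k:.2f}")
```

Under these assumed mobilities, a larger keeper W/L raises K in direct proportion, which is the upsizing knob the text associates with higher contention.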

Several techniques have been proposed to address this trade-off among noise immunity, leakage power consumption, and the number of parallel paths in the pull-down network. They either change the circuit that controls the keeper gate voltage, or change the circuit topology by introducing a footer transistor or redesigning the pull-down network. The conditional-keeper domino [3], controlled keeper by current-comparison domino [4], high-speed domino [5], and leakage-current-replica-keeper domino [6] fall into the former category, while the current-comparison-based domino [7], diode-footed domino [8], and diode-partitioned domino [9] fall into the latter.



Fig. 1. Standard Footless Domino (SFLD) [1].

#### III. PROPOSED CIRCUIT DESIGN

In wide fan-in OR gates, the large dynamic-node capacitance degrades speed, and the large number of parallel paths increases leakage and thereby reduces noise immunity. Standard domino circuits use an upsized keeper transistor to improve noise robustness at the expense of higher power consumption and increased delay due to large contention. The proposed double stage domino technique [Fig.4] addresses these problems.

The proposed idea is illustrated in [Fig.3]. Compared with the conventional standard footless domino (SFLD) [Fig.1], the proposed DSD circuit [Fig.4] has five additional transistors. The proposed circuit can be viewed as two stages. The first stage is a standard footed domino without the keeper pMOS transistor and the inverter, in which the evaluation transistor is grounded via the transistor $M_{Mirror1}$. It includes the pre-charge transistor $M_{Pre1}$, the evaluation network of pull-down transistors, and the evaluation transistor $M_{Eva1}$. Unlike standard domino circuits, the dynamic node 'A' is separated from the output inverter. The second stage is also a standard footed domino; it includes the pre-charge transistor $M_{Pre2}$, a single pull-down transistor $M_{Mirror2}$, and the evaluation transistor $M_{Eva2}$. The dynamic node 'B' is directly connected to the output inverter, and the charge at node 'B' is indirectly controlled by the charge at dynamic node 'A' via a simple current mirror.

The first stage prepares the input signal for the pull-down transistor $M_{Mirror2}$ in the second stage. During the evaluation phase, dynamic power is consumed in both stages. Dynamic power consumption depends on parameters such as capacitance, supply voltage, voltage swing, and the switching current at the switching node, at constant frequency and temperature. The first stage, with n inputs and an evaluation transistor footed via $M_{Mirror1}$, has no contention and a reduced voltage swing, from VDD-V<sub>THN</sub> to GND; the second stage, with only one pull-down transistor and a keeper, has minimal contention current and a rail-to-rail voltage swing.

During evaluation, the current in the pull-down evaluation network establishes a voltage drop across the mirror transistor M<sub>Mirror1</sub>. If all inputs to the pull-down network are low, only leakage current flows through the pull-down network and M<sub>Mirror1</sub>, and this voltage is very small; it may not be enough to properly drive the current mirror formed by M<sub>Mirror1</sub> and M<sub>Mirror2</sub>. Even if, in the worst case, this leakage current is mirrored to the second stage, the pMOS keeper transistor M<sub>Kpr</sub> in the second stage compensates for it. On the other hand, if at least one input is high, a conducting path exists between dynamic node 'A' and ground. The current through M<sub>Mirror1</sub> is then sufficient to establish a voltage across it that drives the current mirror, turning on M<sub>Mirror2</sub> in the second stage, which in turn discharges dynamic node 'B' and changes the voltage at OUT, the output of the inverter.

The mirror transistor M<sub>Mirror1</sub>, placed between M<sub>Eva1</sub> and GND, reduces the leakage current when all inputs to the pull-down network are held low, and also reduces the sub-threshold leakage through the stacking effect [10]. The voltage drop across M<sub>Mirror1</sub> caused by the leakage current has three effects on the pull-down evaluation network transistors: it establishes a negative gate-to-source voltage; it raises the source voltage, which strengthens the body effect and increases the threshold voltage (almost two times); and it decreases the drain-to-source voltage, which reduces drain-induced barrier lowering (DIBL) leakage. Together, these effects decrease the sub-threshold leakage and the leakage power of the proposed DSD circuit.
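The three leakage-reduction effects described above can be illustrated with a textbook-style first-order sub-threshold current expression. The sketch below is not the paper's simulation model: every parameter value (`i0`, `v_th0`, `n`, `gamma_lin`, `eta`) is an assumed placeholder, the body effect is linearized, and the function names are hypothetical. Only the thermal voltage is tied to the paper, through its 110 °C simulation temperature.

```python
import math


def subthreshold_current(v_gs, v_ds, v_sb, i0=1e-7, v_th0=0.3, n=1.5,
                         gamma_lin=0.2, eta=0.1, v_t=0.033):
    """First-order sub-threshold current of an nMOS transistor.

    All default parameters are illustrative assumptions:
      i0        pre-factor current (A)
      v_th0     zero-bias threshold voltage (V)
      n         sub-threshold slope factor
      gamma_lin linearized body-effect coefficient (V/V)
      eta       DIBL coefficient (V/V)
      v_t       thermal voltage kT/q, about 33 mV at 110 degC
    """
    # Body effect raises the threshold; DIBL lowers it.
    v_th = v_th0 + gamma_lin * v_sb - eta * v_ds
    return (i0 * math.exp((v_gs - v_th) / (n * v_t))
            * (1.0 - math.exp(-v_ds / v_t)))


def pull_down_leakage(v_x, vdd=1.0):
    """Leakage of one OFF pull-down transistor whose source sits at v_x.

    v_x models the voltage drop across M_Mirror1: the gate is at 0 V, the
    drain is assumed to be at vdd, and the body is tied to ground.
    """
    return subthreshold_current(v_gs=-v_x, v_ds=vdd - v_x, v_sb=v_x)


for v_x in (0.0, 0.05, 0.10):
    print(f"source at {v_x:.2f} V -> leakage {pull_down_leakage(v_x):.3e} A")
```

In this simplified model a small positive source voltage reduces the leakage through all three mechanisms at once: negative gate-to-source voltage, a higher threshold from the body effect, and a lower drain-to-source voltage.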



Fig. 2. Current-Comparison-Based-Domino Circuit [7].



Fig. 3. Proposed Double Stage Domino (DSD) concept.



Fig. 4. Wide dynamic fan-in OR gate implementation by utilizing Double Stage Domino (DSD) technique.

In the proposed technique, dynamic node 'A' and dynamic node 'B' are isolated from each other by the simple current mirror. Even if noise on the input signals disturbs the charge at dynamic node 'A' and causes it to discharge, the discharging current may not be sufficient to drive the current mirror transistors $M_{Mirror1}$ and $M_{Mirror2}$. For high-amplitude noise or noise of appreciable duration, however, the discharging current drives the current mirror and pulls down the charge at dynamic node 'B', leading to logic failure. To avoid this failure, the keeper transistor $M_{Kpr}$ compensates the discharging current in the second stage, which maintains the noise robustness and improves the noise immunity of the circuit. Upsizing the mirror transistor $M_{Mirror2}$ increases the speed of operation without degrading the noise immunity up to a certain level. The mirror ratio M of the simple current mirror is the ratio of the size (W/L) of $M_{Mirror2}$ to the size (W/L) of $M_{Mirror1}$:

$$M = \frac{\left(\frac{W}{L}\right)_{Mirror2}}{\left(\frac{W}{L}\right)_{Mirror1}} \tag{2}$$

Upsizing $M_{Mirror2}$, and thus increasing M, yields a larger mirrored current, which in turn increases the speed of operation at the expense of noise immunity.
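For illustration, Eq. (2) can be evaluated with the mirror widths listed in TABLE I. The sketch assumes that the two mirror-width columns of TABLE I correspond to $M_{Mirror1}$ (1.4 L<sub>min</sub>) and $M_{Mirror2}$ (9.8 L<sub>min</sub>) in that order, and that both devices use the minimum channel length; neither assumption is stated explicitly in the table. The function name is hypothetical.

```python
def mirror_ratio(w_mirror2, l_mirror2, w_mirror1, l_mirror1):
    """Mirror ratio M of Eq. (2): (W/L)_Mirror2 / (W/L)_Mirror1."""
    return (w_mirror2 / l_mirror2) / (w_mirror1 / l_mirror1)


L_MIN = 90e-9  # minimum channel length, 90 nm

# Widths taken from TABLE I (assumed column order: Mirror1, then Mirror2).
W_MIRROR1 = 1.4 * L_MIN
W_MIRROR2 = 9.8 * L_MIN

# Both channel lengths are assumed to be L_MIN.
m = mirror_ratio(W_MIRROR2, L_MIN, W_MIRROR1, L_MIN)
print(f"M = {m:.1f}")
```

Under these assumptions the mirror ratio is about 7 for every fan-in in TABLE I, since the two mirror widths do not change from row to row.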

#### A. Pre-charging Phase

In this phase, the input signals IN1, IN2, ..., INn and the clock voltage CLK are at the low level. The transistors $M_{Pre1}$ and $M_{Pre2}$ are ON and charge dynamic node 'A' and dynamic node 'B' to VDD, respectively. The transistors $M_{Eva1}$, $M_{Eva2}$, $M_{Mirror1}$, and $M_{Mirror2}$ are all OFF, and $M_{Kpr}$ is ON. Therefore, the output inverter sets the OUT voltage to the low level.

#### B. Evaluation Phase

During this phase, the clock voltage CLK is at the high level, the transistors $M_{Pre1}$ and $M_{Pre2}$ are OFF, and the transistors $M_{Eva1}$ and $M_{Eva2}$ are ON. The keeper transistor $M_{Kpr}$ is ON or OFF depending on whether the input signal voltages are at the low or the high level. If all input signal voltages are low, the mirror transistors $M_{Mirror1}$ and $M_{Mirror2}$ conduct only leakage current, which is mirrored to the second stage; the keeper transistor $M_{Kpr}$ in the second stage, which is still ON, compensates for this mirrored leakage current. The output voltage is therefore held low by the output inverter. On the other hand, if at least one conduction path exists, that is, if at least one input signal voltage is high, dynamic node 'A' is pulled down; the pull-down current flows through $M_{Mirror1}$ and establishes a non-zero gate-to-source voltage across the saturated mirror transistor. The current mirror mirrors the pull-down current, turning ON $M_{Mirror2}$, which discharges dynamic node 'B'; the keeper transistor $M_{Kpr}$ turns OFF and the output inverter sets the OUT voltage high.
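The two phases can be summarized with an idealized logic-level model. This is only a behavioural abstraction under stated assumptions: node voltages are reduced to 0/1, leakage is taken to be fully compensated by the keeper, and the function name `dsd_or_gate` is hypothetical. It does not capture timing, noise, or the analog behaviour of the current mirror.

```python
def dsd_or_gate(clk, inputs):
    """Idealized logic-level model of the DSD wide fan-in OR gate.

    clk    : 0 for the pre-charge phase, 1 for the evaluation phase
    inputs : iterable of 0/1 values for IN1..INn
    Returns the logic values of dynamic nodes A and B and of OUT.
    """
    if clk == 0:
        # Pre-charge: M_Pre1 and M_Pre2 charge nodes A and B high.
        node_a, node_b = 1, 1
    else:
        # Evaluation: node A discharges if any pull-down path conducts.
        node_a = 0 if any(inputs) else 1
        # The current mirror discharges node B only when real pull-down
        # current flows; mirrored leakage is assumed to be absorbed by
        # the keeper M_Kpr, so B otherwise stays high.
        node_b = node_a
    # The output inverter drives OUT from node B.
    return {"A": node_a, "B": node_b, "OUT": 1 - node_b}


n_inputs = 64
all_low = [0] * n_inputs
one_high = [0] * (n_inputs - 1) + [1]

print("pre-charge      :", dsd_or_gate(0, all_low))
print("eval, all low   :", dsd_or_gate(1, all_low))
print("eval, one high  :", dsd_or_gate(1, one_high))
```

The single-high-input case corresponds to the worst-case delay condition used in the simulations, since only one discharge path is available.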

## IV. SIMULATION RESULTS AND COMPARISONS

The proposed double stage domino wide fan-in dynamic OR gates with 8, 16, 32, and 64 inputs were simulated in gpdk 90-nm technology at a temperature of 110 °C, with a supply voltage of 1 V and a clock frequency of 1 GHz. A 5 fF capacitor is connected at the output for worst-case measurement under high fan-out, heavy load conditions. The simulated waveform of the 64-input double stage domino OR gate is shown in [Fig.5].

## A. Noise Margin Metric

The unity noise gain (UNG) is used as the noise-margin metric in this work. UNG is defined as the input noise amplitude that produces the same noise amplitude at the output [4]. It can be written as

$$\text{Unity Noise Gain} = \{V_{IN} : V_{NOISE} = V_{OUTPUT}\} \tag{3}$$

Identical noise pulses with a duration of 30 ps are applied to all inputs of the pull-down network, and the noise amplitude at the inverter output, OUT in [Fig.4], is observed for different input noise amplitudes. Pulse noise is used to emulate cross-talk noise at the inputs. The effectiveness of noise is determined by its amplitude and duration; in this work, the input noise level is varied by changing the amplitude. [Fig.6] shows the UNG calculation waveform for the 64-input DSD wide fan-in OR gate. The UNG is measured as 0.91 V; input noise above 0.91 V produces an amplified output noise.
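One way to extract the UNG of Eq. (3) from an amplitude sweep is to locate the point where the output noise amplitude first equals the input noise amplitude. The sketch below uses linear interpolation between sweep points; the sweep data are made-up placeholders, not simulation results from this work, and the function name is hypothetical.

```python
def unity_noise_gain(v_in, v_out):
    """Input noise amplitude at which output amplitude equals input amplitude.

    v_in  : increasing list of input noise amplitudes (V)
    v_out : corresponding output noise amplitudes (V)
    Uses linear interpolation between sweep points and returns None if the
    output never reaches the input amplitude within the sweep.
    """
    for i in range(1, len(v_in)):
        d0 = v_out[i - 1] - v_in[i - 1]  # gain margin at the previous point
        d1 = v_out[i] - v_in[i]          # gain margin at the current point
        if d0 == 0:
            return v_in[i - 1]
        if d0 < 0 <= d1:
            frac = -d0 / (d1 - d0)
            return v_in[i - 1] + frac * (v_in[i] - v_in[i - 1])
    return None


# Hypothetical sweep (placeholder numbers chosen only for illustration).
sweep_in = [0.80, 0.85, 0.90, 0.95]
sweep_out = [0.20, 0.50, 0.88, 0.99]

ung = unity_noise_gain(sweep_in, sweep_out)
print(f"UNG is approximately {ung:.3f} V")
```

A finer amplitude step around the crossing would reduce the interpolation error; the step size here is arbitrary.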

## B. Figure of Merit (FOM)

To compare the proposed technique with other standard techniques, the figure of merit (FOM) [7] is used in this work. It can be written as

$$\text{Figure of Merit} = \frac{UNG_{norm}}{t_{p-norm}^{2} \times P_{tot-norm} \times A_{norm} \times \sigma_{Delay-norm}} \tag{4}$$

where $UNG_{norm}$ is the unity noise gain, $A_{norm}$ is the total area of the circuit, $\sigma_{Delay-norm}$ is the standard deviation of the delay, and $t_{p-norm}$ is the worst-case propagation delay; each parameter is normalized to the corresponding value of the standard footless domino (SFLD) wide fan-in dynamic OR gate. The term $P_{tot-norm}$ is the normalized average total power, comprising short-circuit, switching, and leakage power. The product $P_{avg} \times t_p^2$ is the Energy Delay Product (EDP), which according to [1] is the most critical parameter.
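As a quick check of Eq. (4), the FOM can be recomputed from the normalized values reported in TABLE III. The table does not list $\sigma_{Delay-norm}$, so the sketch assumes it equals 1 for all three designs; that is an assumption of this example, not a statement from the paper. The function name is hypothetical.

```python
def figure_of_merit(ung_norm, tp_norm, p_norm, a_norm, sigma_norm=1.0):
    """FOM of Eq. (4); sigma_norm defaults to 1.0 (assumed, not reported)."""
    return ung_norm / (tp_norm ** 2 * p_norm * a_norm * sigma_norm)


# Normalized values copied from TABLE III (64-input OR gate, same delay).
designs = {
    "SFLD": dict(ung_norm=1.00, tp_norm=1.0, p_norm=1.00, a_norm=1.000),
    "CCD":  dict(ung_norm=1.92, tp_norm=1.0, p_norm=0.54, a_norm=1.085),
    "DSD":  dict(ung_norm=3.25, tp_norm=1.0, p_norm=0.22, a_norm=1.125),
}

fom = {name: figure_of_merit(**values) for name, values in designs.items()}
for name, value in fom.items():
    print(f"{name}: FOM = {value:.2f}")
```

With the delay-variation term set to 1, the computed values agree with the FOM row of TABLE III after rounding (about 3.3 for CCD and 13.13 for DSD).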

#### C. Transistor Sizing

The 64-input wide fan-in dynamic OR gates constructed in standard footless domino [Fig.1], current-comparison-based domino [Fig.2], and the proposed double stage domino [Fig.4] are simulated and compared under the same process corner, temperature, and delay. The length and width of the pull-up transistors (CCD) or pull-down transistors (SFLD and DSD) in the evaluation network are set to the minimum values, where $W_{min}$ = 1.4 L<sub>min</sub> and L<sub>min</sub> = 90 nm. The pMOS-to-nMOS width ratio of the CCD inverter is set to two, and all other transistors are sized as in [7, TABLE I]. In SFLD, the keeper ratio K (0.1 to 1 in Eq. (1)) is varied to measure UNG and delay at different data points, and the pre-charge transistor is upsized to achieve the desired delay. In CCD, the pre-charge transistor Mpre and the mirror transistor M2 are upsized when necessary to achieve the desired delay. In the proposed DSD, the sizes of the transistors $M_{pre1}$, $M_{pre2}$, $M_{Mirror2}$, $M_{Eva1}$ and $M_{Eva2}$ are varied and optimized to provide higher noise immunity and lower delay. The inverter sizing ratio is chosen to place the switching threshold at VDD/2. There is a trade-off among power, size, delay, and noise margin: power consumption can be reduced by decreasing transistor sizes at the expense of delay and noise margin. The sizes of all transistors for the proposed DSD are listed in [TABLE I].

The proposed DSD wide fan-in dynamic OR gates with 8, 16, 32, and 64 inputs are simulated at 110 °C with a 5 fF output load; the Unity Noise Gain (UNG), worst-case delay, and power consumption are tabulated in [TABLE II].



Fig. 5. Simulated waveform of 64-input wide fan-in OR gate in DSD.

For the SFLD, CCD, and the proposed DSD, the delay is calculated for the worst-case scenario, in which only one parallel path is available for discharge during the evaluation phase; that is, only one of the pull-down (pull-up in the case of CCD) transistors conducts. The worst-case delay and the power consumption are measured under this condition. The Figure of Merit (FOM) for the 64-input wide fan-in OR gate implemented in SFLD, CCD, and DSD at the same delay is tabulated in [TABLE III]. For comparison, the power, delay, and UNG are all normalized to the values of the Standard Footless Domino (SFLD). The results show that the proposed DSD circuit consumes 78% and 60% less power than the SFLD and CCD structures, respectively.
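The power savings quoted above follow directly from the power row of TABLE III. A minimal sketch of that arithmetic, with a hypothetical helper name, is:

```python
def percent_reduction(baseline, value):
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return 100.0 * (1.0 - value / baseline)


# Average power in microwatts, copied from TABLE III (64-input OR gate).
POWER_UW = {"SFLD": 23.6, "CCD": 12.85, "DSD": 5.24}

for reference in ("SFLD", "CCD"):
    saving = percent_reduction(POWER_UW[reference], POWER_UW["DSD"])
    print(f"DSD vs {reference}: {saving:.1f}% less power")
```

The computed savings are about 77.8% relative to SFLD and about 59.2% relative to CCD, which the text rounds to 78% and 60%.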

The UNG is increased by factors of 3.25 and 1.7 compared with SFLD and CCD, respectively. The normalized power and delay of the proposed circuit under different process corners at 110 °C are shown in [Fig.7]. The delay and power consumption of the 8-, 16-, 32-, and 64-input wide fan-in OR gates, shown in [Fig.8], increase with the number of inputs.



Fig. 6. Input and output noise waveform of 64-input wide OR gate implemented in DSD.

| Fan-in (delay, ps) | W<sub>Pre1</sub> | W<sub>Pre2</sub> | W<sub>Mirror1</sub> | W<sub>Mirror2</sub> | W<sub>Eva1</sub> | W<sub>Eva2</sub> | Inverter (W<sub>p</sub>/W<sub>n</sub>) | W<sub>Kpr</sub> | W<sub>pull-down n/w</sub> |
|--------------------|------------------|------------------|---------------------|---------------------|------------------|------------------|----------------------------------------|-----------------|---------------------------|
| 8 (215)            | 4.2 L<sub>min</sub> | 2.8 L<sub>min</sub> | 1.4 L<sub>min</sub> | 9.8 L<sub>min</sub> | 9.8 L<sub>min</sub> | 8.4 L<sub>min</sub> | 3.6 L<sub>min</sub> / 1.4 L<sub>min</sub> | 1.4 L<sub>min</sub> | 1.4 L<sub>min</sub> |
| 16 (215)           | 2.8 L<sub>min</sub> | 2.8 L<sub>min</sub> | 1.4 L<sub>min</sub> | 9.8 L<sub>min</sub> | 8.4 L<sub>min</sub> | 8.4 L<sub>min</sub> | 3.6 L<sub>min</sub> / 1.4 L<sub>min</sub> | 1.4 L<sub>min</sub> | 1.4 L<sub>min</sub> |
| 32 (225)           | 1.4 L<sub>min</sub> | 1.4 L<sub>min</sub> | 1.4 L<sub>min</sub> | 9.8 L<sub>min</sub> | 8.4 L<sub>min</sub> | 8.4 L<sub>min</sub> | 3.6 L<sub>min</sub> / 1.4 L<sub>min</sub> | 1.4 L<sub>min</sub> | 1.4 L<sub>min</sub> |
| 64 (275)           | 1.4 L<sub>min</sub> | 1.4 L<sub>min</sub> | 1.4 L<sub>min</sub> | 9.8 L<sub>min</sub> | 7 L<sub>min</sub> | 9.8 L<sub>min</sub> | 3.6 L<sub>min</sub> / 1.4 L<sub>min</sub> | 1.4 L<sub>min</sub> | 1.4 L<sub>min</sub> |

TABLE I. Sizes Of All The Transistors For Double Stage Domino DSD Technique.

TABLE II. Proposed DSD 8, 16, 32, 64 Inputs Wide OR Gates UNG, Delay and Power Consumption.

| Fan-in | Unity Noise Gain<br>(UNG) | Delay (ps) | Power<br>(µW) |
|--------|---------------------------|------------|---------------|
| 8      | 0.97                      | 215        | 3.78          |
| 16     | 0.96                      | 215        | 3.9           |
| 32     | 0.93                      | 225        | 4.28          |
| 64     | 0.91                      | 275        | 5.24          |

TABLE III. Figure Of Merit (FOM) Comparison Of 64 Inputs Wide Fan-in OR Gates at 110 °C While Delay Maintained Same.

|                                   | Standard<br>Footless<br>Domino<br>(SFLD) | Current<br>Comparison<br>Based Domino<br>(CCD) | Double Stage<br>Domino<br>(DSD) |
|-----------------------------------|------------------------------------------|------------------------------------------------|---------------------------------|
| No. of Transistors                | 68                                       | 73                                             | 73                              |
| Area $(W_{min} X L_{min}) (fm^2)$ | 864                                      | 938                                            | 972                             |
| Normalized Area                   | 1                                        | 1.085                                          | 1.125                           |
| Power (µW)                        | 23.6                                     | 12.85                                          | 5.24                            |
| Normalized Power                  | 1                                        | 0.54                                           | 0.22                            |
| Delay (ps)                        | 275                                      | 275                                            | 275                             |
| Normalized Delay                  | 1                                        | 1                                              | 1                               |
| UNG                               | 0.27                                     | 0.54                                           | 0.91                            |
| Normalized UNG                    | 1                                        | 1.92                                           | 3.25                            |
| FOM                               | 1                                        | 3.3                                            | 13.13                           |



Fig. 7. Normalized power and delay versus various process corners.

Fig. 8. Power and delay versus number of inputs in wide OR gate.

## V. CONCLUSION

The main goal of this paper is to achieve high noise tolerance with low power consumption in domino circuits without degrading performance, especially in wide fan-in gates. The proposed circuit fulfills this goal: it achieves 3.25 times the UNG and consumes 78% less power compared with the Standard Footless Domino (SFLD). Moreover, the proposed circuit has a FOM of 13.13, which makes it more suitable than its standard domino counterparts for implementing Boolean logic functions with low power consumption and better performance.

#### ACKNOWLEDGMENT

My sincere thanks to Prof. Dr. Sri Adibhatlasridevi, Department of Micro and Nano Electronics, VIT University for the Digital IC design classes.

#### REFERENCES

- [1] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2003.
- [2] L. Wang, R. Krishnamurthy, K. Soumyanath, and N. Shanbhag, "An energy-efficient leakage-tolerant dynamic circuit technique," in Proc. Int. ASIC/SoC Conf., 2000, pp. 221–225.
- [3] A. Alvandpour, R. Krishnamurthy, K. Soumyanath, and S. Y. Borkar, "A sub-130-nm conditional-keeper technique," IEEE J. Solid-State Circuits, vol. 37, no. 5, pp. 633–638, May 2002.
- [4] A. Peiravi and M. Asyaei, "Robust low leakage controlled keeper by current-comparison domino for wide fan-in gates," Integration, VLSI J., vol. 45, no. 1, pp. 22–32, 2012.
- [5] M. H. Anis, M. W. Allam, and M. I. Elmasry, "Energy-efficient noise-tolerant dynamic styles for scaled-down CMOS and MTCMOS technologies," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 2, pp. 71–78, Apr. 2002.
- [6] Y. Lih, N. Tzartzanis, and W. W. Walker, "A leakage current replica keeper for dynamic circuits," IEEE J. Solid-State Circuits, vol. 42, no. 1, pp. 48–55, Jan. 2007.
- [7] A. Peiravi and M. Asyaei, "Current-comparison-based domino: New low-leakage high-speed domino circuit for wide fan-in gates," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 5, May 2013.
- [8] H. Mahmoodi and K. Roy, "Diode-footed domino: A leakage-tolerant high fan-in dynamic circuit design style," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 3, pp. 495–503, Mar. 2004.
- [9] H. Suzuki, C. H. Kim, and K. Roy, "Fast tag comparator using diode partitioned domino for 64-bit microprocessors," IEEE Trans. Circuits Syst., vol. 54, no. 2, pp. 322–328, Feb. 2007.
- [10] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," Proc. IEEE, vol. 91, no. 2, pp. 305–327, Feb. 2003.