[IEEE 2012 International Conference on Computers and Devices for Communication (CODEC) - Kolkata, India (2012.12.17-2012.12.19)] 2012 5th International Conference on Computers and



A Novel Delay Minimization Technique for Low Leakage Wide Fan-In Domino Logic Gates

Akanksha Chouhan
VLSI Design Lab, ABV-Indian Institute of Information Technology and Management, Morena Link Road, Gwalior-474010, Madhya Pradesh, India

Vikas Mahor
VLSI Design Lab, ABV-Indian Institute of Information Technology and Management, Morena Link Road, Gwalior-474010, Madhya Pradesh, India

Manisha Pattanaik
VLSI Design Lab, ABV-Indian Institute of Information Technology and Management, Morena Link Road, Gwalior-474010, Madhya Pradesh, India

Abstract—With the scaling of technology, the magnitude of leakage current has become a major concern, as it reduces the robustness of the circuit and wastes power. Most leakage reduction methods increase the delay of the circuit. In this paper a delay minimization block is proposed. This block is incorporated into a domino gate that uses high-threshold transistors for leakage reduction, and it reduces the delay of such high-threshold domino gates. This facilitates the placement of high-threshold domino gates in the critical or near-critical paths of a design. A delay reduction of about 10% is achieved, without any penalty on the power-delay product, when a wide fan-in domino gate has both leakage and delay reduction features, compared with a wide fan-in domino gate that has only a leakage reduction mechanism. Simulations at 500 MHz in a 90 nm process show that leakage is reduced by 50% in the proposed design compared with the conventional wide fan-in domino gate.

Keywords-Delay minimization, leakage reduction, wide fan-in domino logic

I. INTRODUCTION

Domino logic circuits are leakage-sensitive circuits. Leakage in a MOS device depends on its threshold voltage. With the scaling of technology, the supply voltage is scaled down, and the threshold voltage (Vth) must be scaled down along with it. Otherwise, a substantial degradation in clock speed occurs, as is evident from the following equation [1]:
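A widely used model for this dependence is the alpha-power law, in which gate delay scales as $t_d \propto C_L V_{DD}/(V_{DD}-V_{th})^{\alpha}$ with $1 < \alpha \le 2$. The exact form used in [1] may differ; the sketch below assumes this model, drops the constant $C_L$ factor, and uses an illustrative α = 1.3 and voltages that are not taken from this paper.

```python
def relative_delay(vdd: float, vth: float, alpha: float = 1.3) -> float:
    """Gate delay up to a constant factor (C_L omitted), alpha-power law.

    alpha = 1.3 is an illustrative velocity-saturation exponent,
    not a value from the paper.
    """
    if vdd <= vth:
        raise ValueError("V_DD must exceed V_th for this model to apply")
    return vdd / (vdd - vth) ** alpha


if __name__ == "__main__":
    base = relative_delay(1.2, 0.4)
    # Supply scaled from 1.2 V to 1.0 V with V_th held at 0.4 V.
    slow = relative_delay(1.0, 0.4)
    # Same supply scaling, but V_th also scaled down to 0.33 V.
    scaled = relative_delay(1.0, 0.33)
    print(f"V_th fixed : {slow / base:.2f}x the original delay")
    print(f"V_th scaled: {scaled / base:.2f}x the original delay")
```

With these assumed numbers, holding Vth fixed makes the gate about 21% slower, whereas scaling Vth as well keeps the penalty near 5%.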

As the threshold voltage is scaled down, the subthreshold leakage current increases according to the following equation [2]:

$$I_{sub} = I_0 \, \exp\left(\frac{V_{GS} - V_{th} - \gamma V_{SB} + \eta V_{DS}}{n V_T}\right) \left(1 - e^{-V_{DS}/V_T}\right)$$

where $I_0 = \mu_0 C_{ox} (W/L) V_T^2 e^{1.8}$, $V_{th}$ is the threshold voltage, $V_T = kT/q$ is the thermal voltage, $C_{ox}$ is the gate oxide capacitance, $\mu_0$ is the zero-bias mobility, $n$ is the subthreshold swing coefficient, $\gamma$ is the linearized body-effect coefficient and $\eta$ is the DIBL coefficient. Thus the threshold voltage should be low for high performance but high for low leakage.
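Assuming the standard BSIM-style subthreshold model consistent with the symbols defined above, $I_{sub} = I_0 \exp\left(\frac{V_{GS} - V_{th} - \gamma V_{SB} + \eta V_{DS}}{n V_T}\right)\left(1 - e^{-V_{DS}/V_T}\right)$, the sketch below shows how strongly the OFF current depends on $V_{th}$ and on reverse body bias ($V_{SB} > 0$), the mechanism used for A1-AN in Fig. 2. The prefactor `i0` and the values of n, γ and η are illustrative assumptions, not parameters of the 90 nm process used in this paper.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_k: float = 300.0) -> float:
    """V_T = kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def subthreshold_current(vgs, vds, vsb, vth, i0,
                         n=1.5, gamma=0.2, eta=0.08, temp_k=300.0):
    """Subthreshold drain current (A) from the exponential model.

    i0 lumps mu0 * Cox * (W/L) * V_T^2 * e^1.8. The defaults for n, gamma
    and eta are illustrative assumptions, not values from the paper.
    """
    vt = thermal_voltage(temp_k)
    exponent = (vgs - vth - gamma * vsb + eta * vds) / (n * vt)
    return i0 * math.exp(exponent) * (1.0 - math.exp(-vds / vt))


if __name__ == "__main__":
    i0 = 1e-6  # hypothetical prefactor in amperes
    low = subthreshold_current(0.0, 1.0, 0.0, vth=0.3, i0=i0)
    high = subthreshold_current(0.0, 1.0, 0.0, vth=0.4, i0=i0)
    rbb = subthreshold_current(0.0, 1.0, 0.3, vth=0.4, i0=i0)  # 0.3 V reverse body bias
    print(f"OFF current, Vth = 0.3 V      : {low:.3e} A")
    print(f"OFF current, Vth = 0.4 V      : {high:.3e} A ({low / high:.1f}x lower)")
    print(f"Vth = 0.4 V plus 0.3 V RBB    : {rbb:.3e} A")
```

With these assumed values, raising Vth from 0.3 V to 0.4 V lowers the OFF current by roughly 13x, since the ratio is $e^{0.1/(nV_T)}$ with $nV_T \approx 38.8$ mV; reverse body bias lowers it further through the $\gamma V_{SB}$ term.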

In domino logic, a high noise margin is required for the gates to remain robust. When high-threshold transistors are used, a high noise margin is achievable because of the low leakage, but high-threshold transistors increase the delay, so a performance penalty must be paid. When low-threshold transistors are used, the high leakage (especially in wide fan-in domino circuits) reduces the noise margin. There are two ways to improve the noise margin: compensate for the leakage current, or reduce the leakage current.

Compensation is done by using keepers: the degradation of the dynamic node voltage due to leakage current is compensated to improve the noise margin. Different keeper architectures include the conditional keeper [3], the leakage current replica (LCR) keeper [4] and the rate-sensing keeper [5][6]. Using a keeper improves the noise margin at the cost of area, delay and power, and it also increases the leakage current. Because only the effect of the leakage current is compensated, the leakage power is still wasted and extra power is consumed in the keeper block.

Another way to improve the noise margin is to reduce leakage rather than compensate for it. All leakage reduction methods use high-threshold transistors in some way. For example, the MTCMOS technique [7][8] reduces leakage by inserting a high-Vth transistor in series with low-Vth circuitry. MTCMOS reduces only standby leakage power, at the cost of increased area and delay. The dual-threshold CMOS technique [8][9] uses low-threshold transistors in performance-critical blocks to meet the target clock frequency and high-threshold transistors in blocks with delay slack to minimize overall leakage power. Dual threshold reduces leakage power in both standby and active modes without delay and area overhead. However, high-threshold transistors still cannot be used in logic blocks that are part of critical paths, so leakage must be tolerated in these paths. The proposed technique facilitates the placement of high-threshold transistors in critical and near-critical paths with a minimum delay penalty.
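The dual-threshold idea can be sketched as a greedy per-block assignment under stated assumptions: each block's low-Vth delay and leakage are known, switching a block to high Vth multiplies its delay by a hypothetical factor (1 + `delay_penalty`) and its leakage by a hypothetical `leakage_ratio`, and a block may switch only if it still fits in the clock period. This is a toy illustration of the principle, not the optimization procedure of [9].

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    delay_ps: float    # delay with low-Vth devices
    leakage_ua: float  # leakage with low-Vth devices


def assign_dual_vth(blocks, clock_ps, delay_penalty=0.25, leakage_ratio=0.1):
    """Return ({name: 'high' | 'low'}, total leakage in uA).

    A block becomes high-Vth only if its slowed-down delay still meets the
    clock period. delay_penalty and leakage_ratio are hypothetical factors.
    """
    assignment, total_leak = {}, 0.0
    for b in blocks:
        if b.delay_ps * (1.0 + delay_penalty) <= clock_ps:
            assignment[b.name] = "high"  # has slack: trade speed for leakage
            total_leak += b.leakage_ua * leakage_ratio
        else:
            assignment[b.name] = "low"   # critical: keep fast, leaky devices
            total_leak += b.leakage_ua
    return assignment, total_leak


if __name__ == "__main__":
    blocks = [Block("critical", 1900.0, 5.0), Block("slack", 1200.0, 5.0)]
    print(assign_dual_vth(blocks, clock_ps=2000.0))  # 2000 ps = 500 MHz
```

With a 2000 ps period, the 1900 ps block stays low-Vth while the 1200 ps block moves to high Vth, so the critical block's leakage is left untouched, which is the limitation described above.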

In the design process, once a design is complete, designers typically downsize transistors in the paths close to the critical paths to save active power. Downsizing increases delay, so the near-critical paths may end up becoming critical. Another design technique is to replace transistors in near-critical paths with high-Vth transistors. These high-Vth transistors make the near-critical paths slower, and again they may become critical [10]. Process variations further contribute to the spread in delay. The spread in delay of near-critical paths with downsized or high-threshold transistors eventually increases the number of critical paths. As the number of critical paths in a processor increases, variability increases and the probability of achieving the target frequency, and hence the target performance, drops [11]. It is therefore good to keep a sufficient difference between the delays of near-critical and critical paths and thereby keep the number of critical paths in check. From the above discussion, the proposed delay minimization block has two uses. First, it can be used in high-threshold near-critical paths to reduce their delay and thus prevent them from becoming critical. Second, high-threshold transistors can be used in critical paths to reduce leakage, with the delay minimization block used alongside them to undo as much as possible of the delay penalty caused by leakage reduction.
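Why more critical paths lower the chance of meeting the target frequency can be shown with a simple probabilistic sketch. [11] analyzes this effect with its own models; the code below only assumes independent Gaussian path delays with a common σ (real paths are correlated), and all delay numbers are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_meets_target(path_means_ps, sigma_ps, target_ps):
    """P(every path finishes within target_ps), assuming independent
    Gaussian path delays with a common standard deviation."""
    p = 1.0
    for mu in path_means_ps:
        p *= normal_cdf((target_ps - mu) / sigma_ps)
    return p


if __name__ == "__main__":
    target, sigma = 2000.0, 50.0  # 500 MHz clock, hypothetical spread
    for n_critical in (1, 10, 100):
        # n_critical paths close to the target plus 1000 paths with ample margin
        paths = [1950.0] * n_critical + [1700.0] * 1000
        p = prob_meets_target(paths, sigma, target)
        print(f"{n_critical:4d} critical paths -> P(meet target) = {p:.3f}")
```

Paths with ample margin barely affect the product, while every path close to the target multiplies the yield by a factor well below one, which is why keeping near-critical paths from becoming critical matters.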

Fig. 1 shows a wide fan-in AND-OR gate. Wide AND-OR structures form part of the read paths of register files and L1 caches. Because of their wide fan-in, the leakage current they contribute is substantial, so low leakage must be achieved in these blocks with minimum delay penalty. The proposed technique tries to achieve this objective. Section II describes the proposed technique, simulation results are compiled in Section III, and Section IV concludes the paper.

II. PROPOSED TECHNIQUE

Fig. 1 is a wide fan-in AND-OR domino gate with 32 branches in the pull-down network (PDN). When the inputs are low, these branches contribute a large leakage current. The proposed design reduces the leakage in the wide fan-in PDN and minimizes the delay penalty that results from this leakage reduction.
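The logic function of the 32-branch gate, and the number of branches that can leak onto the dynamic node, can be captured in a short behavioral sketch. The function names are hypothetical; a branch is modeled as a series pair A_i, B_i, and branches with B_i = 1 and A_i = 0 (the condition used for the leakage measurement in Section III) are counted as the leakage paths, since only one OFF transistor separates the dynamic node from ground in them.

```python
def and_or_domino(a, b):
    """Return (Y, OUT) after evaluation for one set of inputs.

    a: word-line inputs A1..AN, b: bit-cell inputs B1..BN (0/1 lists).
    Y is precharged high and discharges if any branch has A_i = B_i = 1;
    the static inverter then gives OUT = OR_i(A_i AND B_i).
    """
    if len(a) != len(b):
        raise ValueError("A and B must have the same width")
    pdn_on = any(ai and bi for ai, bi in zip(a, b))
    y = 0 if pdn_on else 1
    return y, 1 - y


def leaky_branches(a, b):
    """Count OFF branches with B_i = 1 and A_i = 0 (single OFF transistor)."""
    return sum(1 for ai, bi in zip(a, b) if bi and not ai)


if __name__ == "__main__":
    n = 32
    # Worst-case delay vector: only branch 1 conducts.
    print(and_or_domino([1] + [0] * (n - 1), [1] + [0] * (n - 1)))
    # Leakage vector: all bit cells high, all word lines low.
    print(and_or_domino([0] * n, [1] * n), leaky_branches([0] * n, [1] * n))
```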

Figure 1. Wide fan-in AND-OR domino gate

A. Principle

To reduce leakage, transistors A1 to AN are made high-threshold transistors. This reduces leakage but compromises performance by increasing the delay. The delay penalty is minimized by adding an extra delay minimization block at the dynamic node, as shown in Fig. 2.

Delay is measured when the PDN is ON, because when the PDN is OFF the dynamic node voltage remains at the same value.

Delay reduction is achieved by adding a supplementary NMOS transistor that works alongside the PDN to discharge the dynamic node. The block is designed so that it turns ON only in the evaluation phase, and only when the PDN is ON; it remains OFF in the precharge phase and when the PDN is OFF in the evaluation phase. When it is ON together with an ON branch of the PDN, it helps that branch pull the dynamic node from high to low. Two paths discharging the same node together take less time than one path alone. Only one branch of the PDN is ON at a time, because A1-AN correspond to the word-line inputs of a register file.
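The benefit of a second discharge path can be estimated with a constant-current approximation, t = C·ΔV/I, for the fall of the dynamic node from VDD to VDD/2. The capacitance and currents below are hypothetical, and real discharge currents are not constant. The optional `v_on` parameter models a supplementary path that helps only after the node has fallen to a given voltage; `v_on = VDD` is the ideal case in which both paths conduct from the start.

```python
def discharge_time_ps(c_ff, vdd, i_pdn_ua, i_m1_ua=0.0, v_on=None):
    """Time (ps) for the dynamic node to fall from VDD to VDD/2.

    Constant-current approximation t = C * dV / I. The supplementary
    transistor adds i_m1_ua only once the node is below v_on.
    """
    v_half = vdd / 2.0
    if v_on is None:
        v_on = vdd                      # ideal: both paths from the start
    v_on = max(min(v_on, vdd), v_half)  # keep v_on within [VDD/2, VDD]
    # fF * V / uA = 1e-15 C / 1e-6 A = 1e-9 s = 1000 ps
    t_pdn_only = c_ff * (vdd - v_on) / i_pdn_ua * 1e3
    t_both = c_ff * (v_on - v_half) / (i_pdn_ua + i_m1_ua) * 1e3
    return t_pdn_only + t_both


if __name__ == "__main__":
    c, vdd = 20.0, 1.0  # hypothetical 20 fF bitline, 1 V supply
    print(discharge_time_ps(c, vdd, 50.0))                  # PDN branch alone
    print(discharge_time_ps(c, vdd, 50.0, 30.0))            # ideal two paths
    print(discharge_time_ps(c, vdd, 50.0, 30.0, v_on=0.7))  # helper starts late
```

With these assumed values the single branch needs 200 ps, two ideal paths need 125 ps, and a helper that only starts once the node reaches 0.7 V gives about 170 ps.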

Figure 2. Wide fan-in AND-OR domino gate with A1-AN transistors reverse body biased for leakage reduction, along with the proposed delay minimization conceptual block

B. Operation

The operation of the proposed technique, depicted in Fig. 3, can be divided into two phases: the precharge phase and the evaluation phase.

Precharge phase: In this phase the clock is low, turning the precharge transistor ON, and dynamic node Y is charged to the supply voltage. The mclk signal is the same as the clock signal, so it is also low during precharge and no current flows through the CMOS pass transistor logic (PTL). As a result, node X is not charged and transistor M1 is OFF.

Figure 3. Wide fan-in AND-OR domino gate with the proposed delay minimization circuitry



During precharge the CMOS PTL is also OFF, because node Y is at a high logic level. Since M1 must be OFF during precharge, node X must be kept free of charge in this interval. Node X is cleared by transistor M2, whose gate is driven by mclk_bar, the complement of mclk; mclk_bar is high during precharge and turns M2 ON.

Evaluation phase: In this phase mclk is held high, and the PDN can be either ON or OFF depending on the inputs. If the PDN is OFF and leakage is sufficiently low owing to the high-threshold transistors, node Y stays very close to the supply voltage. This high voltage at node Y keeps the CMOS PTL OFF, and consequently transistor M1 is also OFF. When the PDN is ON, node Y starts discharging through the PDN. Since mclk is high, current can flow through the CMOS PTL once the PTL turns ON. As Y discharges slightly, the VGS of PMOS transistor M3 soon becomes more negative than its threshold voltage and M3 turns ON. At this time mclk_bar is low, so M2 is OFF. Node X therefore rises to a high voltage, which turns ON transistor M1, and M1 aids the discharge of node Y. Thus an extra NMOS discharges node Y along with the PDN, which increases the discharge rate of node Y and decreases the delay. The proposed technique exploits the interval in which the dynamic node voltage drops from VDD − |Vtp| to VDD/2 to decrease the delay, where VDD − |Vtp| is the voltage at which the CMOS PTL turns ON.

C. Design Methodology
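Before sizing the transistors, it helps to pin down the switching conditions the design must satisfy: M1 OFF in precharge and when the PDN is OFF, and ON once node Y falls below VDD − |Vtp| in evaluation. The switch-level sketch below encodes those conditions. It assumes (this is not stated in Fig. 3) that M3's source sees mclk and its gate sees node Y, ignores M4, and uses an illustrative |Vtp| = 0.3 V.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockState:
    m2_on: bool   # clean-up NMOS on node X (gate driven by mclk_bar)
    ptl_on: bool  # CMOS pass gate conducting (PMOS M3 gated by node Y)
    x_high: bool  # node X pulled high through the pass gate
    m1_on: bool   # supplementary pull-down on node Y (gate driven by X)


def block_state(mclk: int, v_y: float, vdd: float = 1.0,
                vtp_abs: float = 0.3) -> BlockState:
    """Switch-level state of the delay minimization block.

    Assumptions (not from the paper): M3's source sees mclk and its gate
    sees node Y, M4 is ignored, and |Vtp| = 0.3 V is illustrative.
    """
    mclk_bar = 1 - mclk
    m2_on = mclk_bar == 1
    # M3 conducts once V_GS = V_Y - V_DD < -|Vtp|; with mclk low there is no
    # source voltage to pass, so the pass gate is treated as OFF in precharge.
    ptl_on = mclk == 1 and v_y < vdd - vtp_abs
    x_high = ptl_on and not m2_on
    return BlockState(m2_on=m2_on, ptl_on=ptl_on, x_high=x_high, m1_on=x_high)


if __name__ == "__main__":
    cases = {
        "precharge":         (0, 1.00),
        "evaluate, PDN OFF": (1, 0.98),  # Y held near V_DD by low leakage
        "evaluate, PDN ON":  (1, 0.60),  # Y already below V_DD - |Vtp|
    }
    for label, (mclk, v_y) in cases.items():
        print(f"{label:18s} -> {block_state(mclk, v_y)}")
```

The second case shows why low PDN leakage matters here: if leakage pulled Y below VDD − |Vtp| during an OFF evaluation, the model would turn M1 ON and falsely discharge the node.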

Transistor M1 is sized larger than the transistors in the PDN to achieve a sufficiently high discharge rate. It should be sized judiciously, because too large a size increases the leakage through it. Transistor M2 should be of minimum size: its task is to clear node X when no substantial charge is arriving there, so it does not need a high current-carrying capacity and its area should be small. Transistors M3 and M4 are sized large for good current-carrying capacity and hence faster charging of node X. The faster node X goes high, the faster M1 turns ON and starts discharging the dynamic node.
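The M1 speed-versus-leakage trade-off can be illustrated with a first-order sweep that picks the smallest candidate width meeting a delay target. Everything here is a hypothetical assumption rather than data from this design: M1's ON and OFF currents are taken as proportional to its width, the PDN branch current is constant, and M1 helps only once node Y has fallen below VDD − |Vtp|.

```python
def size_m1(widths_um, target_ps, c_ff=20.0, vdd=1.0, vtp_abs=0.3,
            i_pdn_ua=50.0, ion_per_um=300.0, ioff_per_um=0.05):
    """Smallest M1 width meeting target_ps, as (width_um, time_ps, leak_ua).

    First-order assumptions: M1 ON current = ion_per_um * W and OFF current
    = ioff_per_um * W (uA/um); constant-current discharge from VDD to VDD/2.
    Returns None if no candidate width meets the target.
    """
    v_on, v_half = vdd - vtp_abs, vdd / 2.0
    # PDN alone discharges Y from VDD to VDD - |Vtp| (fF*V/uA -> x1000 ps).
    t_pdn_only = c_ff * (vdd - v_on) / i_pdn_ua * 1e3
    for w in sorted(widths_um):
        i_total = i_pdn_ua + ion_per_um * w
        t = t_pdn_only + c_ff * (v_on - v_half) / i_total * 1e3
        if t <= target_ps:
            return w, t, ioff_per_um * w  # smallest width => least leakage
    return None


if __name__ == "__main__":
    for target in (175.0, 160.0, 140.0):
        print(target, size_m1([0.1, 0.2, 0.4], target))
```

Because the PDN alone must still pull Y down to VDD − |Vtp|, widening M1 shows diminishing returns, and a tight enough target cannot be met by any candidate width while leakage keeps growing with W.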

So far the mclk signal has been assumed to be the same as the clk signal, but some power advantage can be gained by making the ON duration of mclk shorter than the ON duration of the clock pulse, while the period of clk and mclk remains the same. How much the ON duration of mclk can be shortened depends on the time required to discharge the dynamic node: it should not be lower than the discharge time of dynamic node Y.
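The two bounds on the mclk pulse width stated above (at least the discharge time of node Y, at most the ON time of clk, with the same period) can be written as a small check. The 50% clk duty cycle and the 250 ps discharge time in the example are assumptions for illustration.

```python
def mclk_on_window_ps(freq_mhz, t_discharge_ps, clk_duty=0.5):
    """Allowed ON (high) time range for mclk, in ps, as (min, max).

    mclk shares the clk period; its high time must be at least the
    dynamic-node discharge time and at most clk's own high time.
    clk_duty = 0.5 is an assumption.
    """
    period_ps = 1e6 / freq_mhz
    clk_high_ps = clk_duty * period_ps
    if t_discharge_ps > clk_high_ps:
        raise ValueError("node Y cannot discharge within the evaluation phase")
    return t_discharge_ps, clk_high_ps


if __name__ == "__main__":
    # 500 MHz clock and a hypothetical 250 ps discharge time.
    print(mclk_on_window_ps(500.0, 250.0))
```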

III. SIMULATION RESULTS

Simulations have been performed at 500 MHz in a 90 nm, 1 V CMOS process. Four wide AND-OR designs have been simulated: A) a conventional wide AND-OR domino gate, B) a wide AND-OR domino gate with A1-AN transistors reverse body biased for leakage reduction, C) a wide AND-OR domino gate with A1-AN transistors reverse body biased for leakage reduction along with the proposed delay minimization block, and D) a wide AND-OR domino gate with a rate-sensing keeper (RSK).

For these four techniques, the low-to-high propagation delay of the domino gate (τPLH), delay, active power, power-delay product (PDP), leakage current and unity gain DC noise (UGDN) have been compared. Inputs A1...AN are typically the word-line inputs of a register file or memory, inputs B1...BN connect to the bit cells, and dynamic node Y is the bitline. The worst-case delay was measured with B1 = 1 V, A1 = 0 → 1, B2...BN = 0 V and A2...AN = 0 V. The leakage current was measured with B1...BN = 1 V and A1...AN = 0.2 V. UGDN has been calculated using the method in [3]: it is the DC noise level on the inputs of the precharged gate that generates an equal level of noise at the output, and it quantifies DC noise robustness. Another way of quantifying noise immunity is ANTE [1].
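The UGDN definition above can be turned into a small post-processing step on a DC sweep: apply a rising noise level to the inputs, record the output noise at each step, and find where the output noise first catches up with the input noise. This generic linear-interpolation sketch is an assumption and not necessarily the exact procedure of [3]; the sweep data in the example is synthetic.

```python
def ugdn(v_in, v_out):
    """Input noise level at which output noise first equals input noise.

    v_in: increasing DC input noise levels (V); v_out: output noise (V) at
    each level. The trivial point v_out = v_in = 0 is skipped by requiring
    a sign change from below (v_out < v_in) to at/above (v_out >= v_in).
    Returns None if no crossing occurs within the sweep.
    """
    if len(v_in) != len(v_out) or len(v_in) < 2:
        raise ValueError("need matching sweeps with at least two points")
    d = [vo - vi for vi, vo in zip(v_in, v_out)]
    for i in range(1, len(d)):
        if d[i - 1] < 0.0 <= d[i]:
            frac = -d[i - 1] / (d[i] - d[i - 1])  # linear interpolation
            return v_in[i - 1] + frac * (v_in[i] - v_in[i - 1])
    return None


if __name__ == "__main__":
    sweep_in = [0.0, 0.1, 0.2, 0.3, 0.4]
    sweep_out = [0.0, 0.0, 0.05, 0.4, 0.9]  # synthetic output-noise curve
    print(f"UGDN = {ugdn(sweep_in, sweep_out):.3f} V")
```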

Figure 4. Wide fan-in AND-OR domino gate with RSK [5]

From Table I, the delay of the wide AND-OR domino gate with A1-AN transistors reverse body biased along with the proposed delay minimization block is 11.91% lower than that of the wide AND-OR domino gate with A1-AN transistors reverse body biased only. The table also shows that the advantage is greater in terms of the low-to-high propagation delay (τPLH) at the output node, i.e. the high-to-low propagation delay at the dynamic node, but this larger advantage is partly averaged out in the calculation of delay. The delay minimization block at the dynamic node increases the evaluation speed when the PDN is ON, i.e. it reduces the high-to-low propagation time at the dynamic node. τPLH is reduced by 12.5% in the wide AND-OR domino gate with A1-AN transistors reverse body biased along with the proposed delay minimization block, compared with the gate with A1-AN transistors reverse body biased only.

TABLE I. LOW-TO-HIGH PROPAGATION DELAY, DELAY, ACTIVE POWER, PDP, LEAKAGE CURRENT AND UGDN COMPARISON OF VARIOUS TECHNIQUES

Metric               | Conv. domino | Domino with leakage reduction | Domino with leakage and delay reduction (proposed) | Domino with RSK [5]
τPLH (ps)            | 183          | 259                           | 227                                                | 244
Delay (ps)           | 162.5        | 214                           | 188.5                                              | 197.5
Active power (µW)    | 10.7         | 9.7                           | 11                                                 | 11.65
PDP (ps·µW)          | 1738         | 2075                          | 2073                                               | 2300
Leakage current (µA) | 5.04         | 0.5                           | 2.39                                               | 5.78
UGDN (V)             | 0.24         | 0.33                          | 0.315                                              | 0.3

Fig. 5 shows the waveforms of the charging and discharging of the dynamic node during the precharge and evaluation phases when the PDN is ON, for all four techniques. The red box encloses the evaluation phase during one of the cycles. Within the red box the waveforms have different slopes during the high-to-low transition: the steeper the slope, the smaller the delay. The conventional domino gate has the least delay, and the second-best value is obtained for the proposed technique.

The power-delay product values remain close to each other for the design with only the leakage reduction mechanism and the design with both leakage reduction and delay minimization.

Leakage current is reduced by more than 50% in the wide AND-OR domino gate with both leakage and delay reduction features, compared with the wide AND-OR domino gate without either feature. Leakage current is higher in the circuit using both delay and leakage reduction than in the circuit using only the leakage reduction mechanism; transistor M1 is responsible for the extra leakage in the former case.
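The percentages quoted in this section can be re-derived from Table I. The short script below is only a consistency check on the tabulated numbers; the dictionary keys are hypothetical labels for the four designs. With the rounded table entries, the delay and leakage reductions come out at about 11.9% and 52.6%, and the PDP column is reproduced from delay × active power. The τPLH reduction evaluates to about 12.4%, slightly below the 12.5% quoted earlier, which may simply reflect rounding of the tabulated τPLH values.

```python
# Table I values: (tau_PLH ps, delay ps, active power uW, leakage uA)
TABLE_I = {
    "conventional": (183.0, 162.5, 10.7, 5.04),
    "leakage_only": (259.0, 214.0, 9.7, 0.5),
    "proposed":     (227.0, 188.5, 11.0, 2.39),
    "rsk":          (244.0, 197.5, 11.65, 5.78),
}


def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction from 'before' to 'after'."""
    return 100.0 * (before - after) / before


def pdp(delay_ps: float, power_uw: float) -> float:
    """Power-delay product in ps*uW (uses the delay column, not tau_PLH)."""
    return delay_ps * power_uw


if __name__ == "__main__":
    lo = TABLE_I["leakage_only"]
    prop = TABLE_I["proposed"]
    conv = TABLE_I["conventional"]
    print(f"tau_PLH reduction vs leakage-only: {pct_reduction(lo[0], prop[0]):.2f}%")
    print(f"delay reduction vs leakage-only:   {pct_reduction(lo[1], prop[1]):.2f}%")
    print(f"leakage reduction vs conventional: {pct_reduction(conv[3], prop[3]):.2f}%")
    for name, (_, d, p, _) in TABLE_I.items():
        print(f"PDP {name:>12}: {pdp(d, p):.1f} ps*uW")
```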

UGDN has been calculated for all four designs with the same transistor sizing in the PDN. The best UGDN is obtained for the wide AND-OR domino gate with A1-AN transistors reverse body biased, since it has the lowest leakage. The second-best UGDN is obtained for the proposed wide AND-OR domino gate with A1-AN transistors reverse body biased along with the delay minimization block.

The RSK technique (Fig. 4) shows more delay in order to achieve a UGDN comparable to that of the proposed technique. The proposed technique outperforms the RSK technique in performance at a comparable UGDN, and it also has a lower leakage current.

The proposed technique can be used in critical or near-critical paths, which traditionally employ low-threshold transistors, to save leakage power with a reduced performance penalty, as it has lower delay than the wide AND-OR domino gate with A1-AN transistors reverse body biased, i.e. where A1-AN are high-threshold transistors.

IV. CONCLUSION

In this paper a delay minimization technique is proposed that can facilitate the use of high-threshold transistors in critical paths, where low-threshold transistors have traditionally been used. The improvement in performance is not obtained at the cost of power-delay product. About 10% reduction in delay is achieved when a wide AND-OR domino gate has both leakage and delay reduction features, compared with a wide AND-OR domino gate with only a leakage reduction mechanism. Leakage is reduced by 50% in the proposed design compared with the conventional wide fan-in domino gate. The proposed technique has a satisfactory UGDN, better than that of the conventional wide AND-OR domino gate and of the wide AND-OR domino gate with RSK. Future work can be directed towards reducing the leakage of the supplementary transistor in the proposed technique without incurring a performance penalty.

Figure 5. Waveforms of all four simulated methods, showing the charging and discharging of the dynamic node

REFERENCES

[1] J. Rabaey, “Low Power Design Essentials”, Springer, ch. 2, p. 2, 2009.

[2] M. Anis and M. Elmasry, “Multi-Threshold CMOS Digital Circuits: Managing Leakage Power”, Kluwer, ch. 2, p. 11, 2003.

[3] A. Alvandpour, R. K. Krishnamurthy, K. Soumyanath and S. Y. Borkar, “A sub-130nm conditional keeper technique,” IEEE J. Solid-State Circuits, vol. 37, no. 5, pp. 633–638, May 2002.

[4] Y. Lih, N. Tzartanis and W. W. Walker, “A leakage current replica keeper for dynamic circuits,” IEEE J. Solid-State Circuits, vol. 42, pp. 48–55, Jan. 2007.

[5] R. G. D. Jeyasingh, N. Bhat and B. Amrutur, “Adaptive keeper design for dynamic logic circuits using rate sensing technique,” IEEE Transactions on VLSI Systems, vol. 19, no. 2, Feb. 2011.

[6] R. G. D. Jeyasingh and N. Bhat, “A low-power, process invariant keeper design for high speed dynamic logic circuits,” in Proc. ISCAS, pp. 1668–1671, 2008.

[7] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu and J. Yamada, “1-V power supply high-speed digital circuit technology with multi-threshold voltage CMOS,” IEEE J. Solid-State Circuits, vol. 30, pp. 847–854, Aug. 1995.

[8] K. Roy, S. Mukhopadhyay and H. Mahmoodi-Meimand, “Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits,” Proceedings of the IEEE, vol. 91, no. 2, pp. 305–327, Feb. 2003.

[9] L. Wei, Z. Chen, K. Roy, M. C. Johnson, Y. Ye and V. K. De, “Design and optimization of dual-threshold circuits for low-voltage low-power application,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 7, no. 1, pp. 16–24, March 1999.

[10] S. Borkar, “Designing reliable systems from unreliable components: the challenges of transistor variability and degradation,” IEEE Micro, vol. 25, no. 6, pp. 10–16, Nov.-Dec. 2005.

[11] O. S. Unsal, J. W. Tschanz, K. Bowman, V. De, X. Vera, A. Gonzalez and O. Ergin, “Impact of parameter variations on circuits and microarchitecture,” IEEE Micro, vol. 26, no. 6, pp. 30–39, Nov.-Dec. 2006.

