4
XL -RXIVREXMSREP 'SRJIVIRGI SR 'SQTYXIVW ERH (IZMGIW JSV 'SQQYRMGEXMSR '3()' 1): -))) A Novel Process Variation Tolerant Wide Fan-In Dynamic OR Gate with Reduced Contention VikasMahor VLSI Design Lab, ABV-Indian Institute of Information Technology and Management, Morena Link Road, Gwalior- 474010, Madhya Pradesh, India [email protected] AkankshaChouhan VLSI Design Lab, ABV-Indian Institute of Information Technology and Management, Morena Link Road, Gwalior- 474010, Madhya Pradesh, India [email protected] ManishaPattanaik VLSI Design Lab, ABV-Indian Institute of Information Technology and Management, Morena Link Road, Gwalior- 474010, Madhya Pradesh, India [email protected] Abstract—Register file structures in modern microprocessors usually employ wide fan-in dynamic CMOS OR gates. Weak keepers have been traditionally used to resolve the low noise margin problem of dynamic CMOS design. Scaling trends and process variation issues in CMOS design have reduced the effectiveness of this weak PMOS keeper. On the other hand large sized PMOS keeper used in wide fan-in dynamic OR gate results in contention between the pull down network (PDN) and the keeper. As a consequence of contention there is an unnecessary increase in power dissipation and loss in performance. In this paper a process variation tolerant wide fan-in dynamic OR gate with a new keeper design is proposed which is capable of reducing the contention between the keeper and PDN and hence capable of reducing the power dissipation and delay. Simulation results at 50nm shows that the power dissipation and delay have been reduced by 40% and 35% respectively as compared to the wide fan-in dynamic OR gate with conventional keeper under different levels of process variation. Keywords-Dynamic CMOS logic; Noise immunity; Process variation; Keeper I. INTRODUCTION Wide fan-in dynamic OR gate forms an important structure in the critical path of modern high speed microprocessors [1].But due to the aggressive scaling trends inCMOS design the effects of process variation dominates. Process variation [2], [3] leads to variationin leakage current of gates located in differentregions of a die. In such a situation to maintain appropriate level of noise margin for wide fan-in ORgate a large sized PMOS keeper is used butthislarge size keeper results in large contention between pulldown network (PDN) and the keeper. This contention results in an unnecessary increase inpower dissipation and delay.An effort has been made in this work to make the design process variationtolerant along with the reduction inthe contention resulting in low power dissipation and less delay. Thenovelty of the work is that the proposed dynamic circuit is insensitive to process variation along with drastic reduction in contention current. In this section firstly the importance of wide fan-in dynamic OR gate is explained and then the issues with the wide fan-in dynamic OR gate are discussed. Finally in this section the prior technique is discussed which is used for making the wide fan- in OR gate as process variation tolerant design. A. Importance of wide fan-in OR gate in Register file architecture The high performance ARM Cortex™-A series Processors are used as core processors in almost all the smart devices being used today such as iphone, ipad, and mobile phones [4].In this processor, two register files are deployed in the data path, which are boxed for emphasis. The register files are used almost in each clock cycle, as in order to execute each instruction data should either be read from or written to the register file. Therefore register files forms an important module in high speed microprocessors. Figure. 1(a) Block diagram of a simplified register file and (b) read port implemented using 4 x1 multiplexer (MUX) [1] Fig. 1(a) shows theblock diagram of such a register file consisting of static RAM register, a read and a write port [1]. These ports are implemented using multiplexer and demultiplexer whose implementation is shown in Fig. 1(b). It illustrates a simple 4x1 multiplexer with 4 input lines.Note that actual ARM CORTEX microprocessor wouldbe consisting of 16 or 32 bit register file and hence would need 16 or 32 bit

[IEEE 2012 International Conference on Computers and Devices for Communication (CODEC) - Kolkata, India (2012.12.17-2012.12.19)] 2012 5th International Conference on Computers and

  • Upload
    manisha

  • View
    217

  • Download
    2

Embed Size (px)

Citation preview

Page 1: [IEEE 2012 International Conference on Computers and Devices for Communication (CODEC) - Kolkata, India (2012.12.17-2012.12.19)] 2012 5th International Conference on Computers and

������������������������������XL�-RXIVREXMSREP�'SRJIVIRGI�SR�'SQTYXIVW�ERH�(IZMGIW�JSV�'SQQYRMGEXMSR��'3()' �������������������������������������������������������1):

���������������������������������������������������谠�����-)))

A Novel Process Variation Tolerant Wide Fan-In Dynamic OR Gate with Reduced Contention

VikasMahor VLSI Design Lab,

ABV-Indian Institute of Information Technology and Management,

Morena Link Road, Gwalior- 474010, Madhya Pradesh, India

[email protected]

AkankshaChouhan VLSI Design Lab,

ABV-Indian Institute of Information Technology and Management,

Morena Link Road, Gwalior- 474010, Madhya Pradesh, India

[email protected]

ManishaPattanaik VLSI Design Lab,

ABV-Indian Institute of Information Technology and Management,

Morena Link Road, Gwalior- 474010, Madhya Pradesh, India

[email protected]

Abstract—Register file structures in modern microprocessors usually employ wide fan-in dynamic CMOS OR gates. Weak keepers have been traditionally used to resolve the low noise margin problem of dynamic CMOS design. Scaling trends and process variation issues in CMOS design have reduced the effectiveness of this weak PMOS keeper. On the other hand large sized PMOS keeper used in wide fan-in dynamic OR gate results in contention between the pull down network (PDN) and the keeper. As a consequence of contention there is an unnecessary increase in power dissipation and loss in performance. In this paper a process variation tolerant wide fan-in dynamic OR gate with a new keeper design is proposed which is capable of reducing the contention between the keeper and PDN and hence capable of reducing the power dissipation and delay. Simulation results at 50nm shows that the power dissipation and delay have been reduced by 40% and 35% respectively as compared to the wide fan-in dynamic OR gate with conventional keeper under different levels of process variation.

Keywords-Dynamic CMOS logic; Noise immunity; Process

variation; Keeper

I. INTRODUCTION Wide fan-in dynamic OR gate forms an important structure

in the critical path of modern high speed microprocessors [1].But due to the aggressive scaling trends inCMOS design the effects of process variation dominates. Process variation [2], [3] leads to variationin leakage current of gates located in differentregions of a die. In such a situation to maintain appropriate level of noise margin for wide fan-in ORgate a large sized PMOS keeper is used butthislarge size keeper results in large contention between pulldown network (PDN) and the keeper. This contention results in an unnecessary increase inpower dissipation and delay.An effort has been made in this work to make the design process variationtolerant along with the reduction inthe contention resulting in low power dissipation and less delay.

Thenovelty of the work is that the proposed dynamic circuit is insensitive to process variation along with drastic reduction in contention current.

In this section firstly the importance of wide fan-in dynamic OR gate is explained and then the issues with the wide fan-in

dynamic OR gate are discussed. Finally in this section the prior technique is discussed which is used for making the wide fan-in OR gate as process variation tolerant design.

A. Importance of wide fan-in OR gate in Register file architecture The high performance ARM Cortex™-A series Processors

are used as core processors in almost all the smart devices being used today such as iphone, ipad, and mobile phones [4].In this processor, two register files are deployed in the data path, which are boxed for emphasis. The register files are used almost in each clock cycle, as in order to execute each instruction data should either be read from or written to the register file. Therefore register files forms an important module in high speed microprocessors.

Figure. 1(a) Block diagram of a simplified register file and (b) read port implemented using 4 x1 multiplexer (MUX) [1]

Fig. 1(a) shows theblock diagram of such a register file

consisting of static RAM register, a read and a write port [1]. These ports are implemented using multiplexer and demultiplexer whose implementation is shown in Fig. 1(b). It illustrates a simple 4x1 multiplexer with 4 input lines.Note that actual ARM CORTEX microprocessor wouldbe consisting of 16 or 32 bit register file and hence would need 16 or 32 bit

Page 2: [IEEE 2012 International Conference on Computers and Devices for Communication (CODEC) - Kolkata, India (2012.12.17-2012.12.19)] 2012 5th International Conference on Computers and

������������������������������XL�-RXIVREXMSREP�'SRJIVIRGI�SR�'SQTYXIVW�ERH�(IZMGIW�JSV�'SQQYRMGEXMSR��'3()' �������������������������������������������������������1):

���������������������������������������������������谠�����-)))

input OR gate. Therefore,wide fan-in OR gate forms an important structure in such a high speed microprocessor. But designing a highly robust wide fan-in dynamic OR gate is a difficult task in sub 100nm regime [6]. Process variation and high contention current are two major factors which make designing the wide OR gate a challenging task as will be discussed in the next section.

B. Process variation and contention current issues in wide fan-in OR gate

Variations in the parameter of a device due to limitations in precision of fabrication process are referredto as process variation [2]. Process variation can be of two types- systematic process variation and random process variation [3]. Random variation affects the device individually and has no correlation with the nearby devices whereas in systematic process variation the nearby devices are affected in same manner and there is a high similarity in the affected parameters of adjacent devices. In the proposed design we are considering only the systematic process variation.

Systematic process variation has a pronounced effect on the wide fan-in dynamic OR gate. In wide fan-in dynamic OR gate the systematic process variation results in variation of the threshold voltage of NMOS transistors in the pull down network. In turn this threshold voltage variation results in variation of the leakage current through the pull down network. Hence it becomes very difficult to maintain a particular value of noise margin [6].One solution to this problem is that a large size PMOS keeper can be employed at the dynamic node which can compensate for any variation in the leakage current by the pull down network. But this results in large contention between the pull down network and the keeper [1].

Such a contention results in unnecessary increase in delay and static power dissipation.

C. A process variation tolerant technique for wide fan-in OR gate

In the literature various keeper designs have been proposed to addresses the process variation issue [1], [5].Aneffectivevariation tolerant keeper architecture is proposed in [1]. This technique has been used in the proposed design to achieve process variation tolerance.

This design achieves process variation tolerance and contention current has been reduced as compared to conventional keeper design since a large size keeper is not used here. But still there is some amount of contention current flowing through the keeper when one of the inputs in the pull down network becomes high. To illustrate this contention current a 16-input process variation tolerant dynamic OR gate has been simulated in worst case delay condition i.e. with one of the input as logic ‘1’. This simulation is shown in Fig. 2 for 50nm technology and at 1.6 GHz of frequency. The rest of simulation setup has been described in section III.

The high level clock signal shows the evaluation phase while a low level clock signal shows the precharge phase. First waveform shows the contention current flowing through the keeper M1. Simulation result clearly reveals that the contention current is of the order of 16 A during the initial period of

evaluation phase. This contention current unnecessarily increases static power dissipation of the design.

Figure. 2. Simulation result of the process variation tolerant technique [1]showing the contention current through keeper and the clock signal.

II. PROPOSED TECHNIQUE

The proposed design aims at reducing the unwanted contention current illustrated in Fig. 5 for wide fan-in OR gate, along with achieving process variation tolerance. In this section design, analysis and operation of theproposed technique is described.The proposed process variation tolerant wide fan-in OR gate with reduced contention is shown in Fig. 3.

Figure. 3.Proposed keeper design for wide fan-in OR gate.

A. Design and Analysis: In the conventional keeper design the keeper was operated

by sensing the output node. In the proposed design the keeper operates by sensing the dynamic node and the clock. Instead of using a single keeper this design uses a process variation coupled keeper M3 and a fixed keeper M1. Variation coupled keeper is responsible for compensating the variable leakage current due to process variation, hence achieving process variation tolerance while fixed keeper maintains a particular level of noise margin.The process variation sensor makes use

Page 3: [IEEE 2012 International Conference on Computers and Devices for Communication (CODEC) - Kolkata, India (2012.12.17-2012.12.19)] 2012 5th International Conference on Computers and

������������������������������XL�-RXIVREXMSREP�'SRJIVIRGI�SR�'SQTYXIVW�ERH�(IZMGIW�JSV�'SQQYRMGEXMSR��'3()' �������������������������������������������������������1):

���������������������������������������������������谠�����-)))

of DIBL effect to sense the process variation by operating the NMOS M2 in its sub threshold region. Whenever there is a change in threshold voltage of the M2 due to process variation then as a result of DIBL effect there is a corresponding change in voltage at the drain terminal VDIBL. This voltage at the drain terminal in turn controls the variation coupled keeper.

The basic principle used here to reduce the contention current is to keep the keeper OFF in the duration when there are chances of contention between pull down network and keeper. Such an operation is achieved by delaying the clock by a sufficient duration and operating the keeper with this delayed clock. This ensures that there are no chances of contention current flowing through the keeper. This delay is obtained by a buffer shown in the proposed design.The delayed clockobtained from the buffer is supplied to the gates of PMOS M5 and NMOS M6.

From Fig. 2the approximate duration of contention current can be easily obtained. To avoid contention issue the buffer transistors are designed such that the buffer has a delay less than or equal to this duration. An important design constraint is that the delay of the buffer should not exceed this duration, since on exceeding this limit the keeper will remain OFF when needed during the evaluation phase and will degrade the noise margin.

B. Operation of the Proposed Design: Similar to conventional dynamic design the proposed design

has two phases of operation pre-charge phase when the CLOCK signal is low and evaluation phase when the clock signal is high. In the pre-charge phase transistor M5 is ON and M6 is OFF

ensuring that the keeper M1 is OFF during the pre-charge phase when it is not needed. During the evaluation phase M5 remains ON and becomes OFF only after a delay obtained from the buffer so that keeper remains OFF during the evaluation phase. Now if any one of the input becomes high then the dynamic node slowly discharges and the keeper M1 remains OFF during this period ensuring no contention current. On complete discharge of the dynamic node M7 becomes OFF

keeping keeper OFF in the rest of evaluation phase. In another scenario when all the inputs are high in the evaluation phase, the M7 remains ON while M6 becomes ON after some delay from the buffer. This operation switches ON the keeper M1 immediately for the rest of the evaluation phase when it is needed. Transistor M2, M3 and M4 is dedicated towards making the design process variation tolerant. The design and operation of such a process variation tolerant design is explained in [1]. Such an operation of proposed design is oriented towards

achieving process variation tolerance and reduced contention as illustrated by simulation results in section III.

III. SIMULATION RESULTS To study the relative performance of the proposed design in

comparison with the prior keeper designs a 16-input wide fan-in OR gate is implemented using the proposed technique. For the comparison purpose two 16-input wide fan-in OR gate are also designed with the conventional keeper technique and the

process variation tolerant technique [1] having same transistor sizes and specifications. Only the effect of systematic process variation has been considered in this work and hence the threshold voltage has been varied in the range from Vth0 -3 to Vth0 +3 for both nMOS and pMOS. Where Vth0and are the mean value and standard deviation of the threshold voltage respectively.The bias circuitry of variation coupled keeper M2 used in proposed design (Fig.2) and process variation tolerant design [1] has been designed with IREF = 5 A and VBIAS = 0.2 V. The simulation has been done with the help of Tanner SpiceV13.0 simulator using BPTM 90nm technology with supply voltage of 1V at room temperature.The 16- input dynamic OR gate is implemented for an ARM Cortex-A9 microprocessor as explained in section I, therefore the operating frequency of the implemented design should match with the application. ARM Cortex-A series processors has a maximum operating frequency of 1.6 GHz [4] and hence the operating frequency of implemented 16-input dynamic OR gate is 1.6 GHz.The sizing of the transistors have been done such that the Unity Gain DC Noise (UGDN) [1] of 0.2 V is maintained under the process variation of = 5% of Vth0.

To illustrate the reduction in contention current the circuit has been simulated in the worst case delay conditionwhen one of the input in the PDN is logic ‘1’. Under this condition the current through the keeper has been measured and illustrated in Fig. 4.

On comparing the graphs of Fig. 2 with Fig. 4 it can be observed that the contention current has been reduced from 16

A to 3 A. This comparison shows that there is a drastic amount of reduction in contention current by almost 4.5 times as compared to process variation tolerant design [1].

Fig. 5 shows that the proposed design has the lowest power dissipation with process variation as compared to the conventional keeper design and the process variation tolerant design.

Figure. 4. Simulation result of the proposed technique showing the reduced contention current through keeper.

For comparison, power dissipation of all the power sources has been taken into consideration. As seen in Fig. 4 the contention current has been reduced which has a direct effect on reduction of power dissipation in the proposed design. Fig.

Page 4: [IEEE 2012 International Conference on Computers and Devices for Communication (CODEC) - Kolkata, India (2012.12.17-2012.12.19)] 2012 5th International Conference on Computers and

������������������������������XL�-RXIVREXMSREP�'SRJIVIRGI�SR�'SQTYXIVW�ERH�(IZMGIW�JSV�'SQQYRMGEXMSR��'3()' �������������������������������������������������������1):

���������������������������������������������������谠�����-)))

6 shows that the proposed design has the lowest delay with process variation as compared to the conventional keeper design and the process variation tolerant design [1].Such a reduction in delay is due to reduction in contention current.

Figure.5.Power dissipation comparison of the Conventional keeper, Process variation tolerant design and Proposed design at 50nm.

Figure.6.Delay comparison of the Conventional keeper, Process variation tolerant design and Proposed design at 50nm.

To further illustrate the process variation tolerance in the

proposed design monte carlo simulations have been done on 500 samples of threshold voltages to obtain the delay distribution as shown in Fig. 7. It is clear from monte carlosimulations that that the proposed design has lowest delay and lowest delay deviation as compared to conventional keeper technique and process variation tolerant technique in [1].

Low delay variation in the process variation tolerant design [1] and the proposed design shows that both the designs are tolerant to process variation.

(a) (b)

(c)

Figure.7.Delay distribution of the Conventional keeper (a), Process variation tolerant design (b) and Proposed design (c) obtained from monte carlo

simulation.

IV. CONCLUSION Wide fan-in dynamic OR gates implemented in high speed

microprocessors such as ARM Cortex microprocessors has two major issues with it- process variation and contention current. The proposed design has achieved process variation tolerance along with reduction in contention current which has been illustrated in the simulation results. Proposed design has almost 25 % and 35 % reduction in delay as compared to process variation tolerant design [1] and conventional keeper design respectively for entire process variation range. Similarly the proposed design has 33 % and 40% reduction in power dissipation as compared to the process variation tolerant design [1] and conventional keeper design respectively. Furthermore the Monte Carlosimulation shows that proposed design has lowest delay and the lowest delay deviation. Hence the proposed technique can be the most effective technique to achieve process variation tolerance and reducing the contention issue for wide fan-in dynamic OR gate.

REFERENCES

[1] H. F. Dadgour and K. Banerjee “A Novel Variation-Tolerant Keeper Architecture for High-Performance Low-Power Wide Fan-In Dynamic OR Gates” IEEE transaction on VLSI systems, vol.18, NO. 11, pp. 1567 - 1577 , Nov 2010.

[2] K.J.Kuhn, M.D.Giles,D.Becher, P.Kolar, A.Kornfeld, R.Kotlyar, A.Maheshwari,S.Mudanai, "Process Technology Variation," Electron Devices, IEEE Transactions on , vol.58, no.8, pp.2197-2208, Aug. 2011

[3] S Borkar, "Designing reliable systems from unreliable components: the challenges of transistor variability and degradation," Micro, IEEE , vol.25, no.6, pp. 10- 16, Nov.-Dec. 2005

[4] J.Koppanalil, G.Yeung, D.O'Driscoll,S.Householder,C. Hawkins, "A 1.6 GHz dual-core ARM Cortex A9 implementation on a low power high-K metal gate 32nm process," VLSI Design, Automation and Test (VLSI-DAT), 2011 International Symposium on , vol., no., pp.1-4, 25-28 April 2011

[5] RakeshGnana David Jeyasingh, NavakantaBhat, and BharadwajAmrutur, “Adaptive Keeper Design for Dynamic Logic Circuits Using Rate Sensing Technique,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 2, pp. 295-204,Feb. 2011.

[6] ManishaPattanaik, Fazal Rahim ,Muddala V D L Varaprasad, “Improvement of Noise Tolerance Analysis in Deep-submicron Dynamic CMOS logic circuits”, IEEE International Conference of Electronic Devices Systems, Page(s): 48 – 53, Year: 2010.