
5 Gb/s 2:1 fully-integrated full-rate multiplexer with on-chip clock generation circuit in 0.18-μm CMOS

Si Shi • Zhigong Wang • Changchun Zhang • Peng Miao • Lu Tang

Received: 16 August 2010 / Revised: 4 March 2012 / Accepted: 28 March 2012 / Published online: 4 May 2012

© Springer Science+Business Media, LLC 2012

Abstract A 5 Gb/s 2:1 full-rate multiplexer (MUX) has been designed and fabricated in the SMIC 0.18-μm CMOS process. A clock generation circuit (CGC) is also integrated to provide the MUX with both 2.5- and 5-GHz clock signals. The CGC is realized as a clock and data recovery (CDR) loop with an embedded divide-by-2 frequency divider; the two required clocks are taken after and before the divider, respectively. In addition, the phase relation between data and clock is ensured automatically by the CDR feedback loop and careful layout. The chip area is 812 × 675 μm, including pads. From a single 1.8-V supply, the total power consumption is 162 mW, with an input sensitivity of <25 mV and a single-ended output swing above 300 mV. Owing to the full-rate architecture, the pulse width distortion (PWD) of the multiplexed data is removed. The measured results also show that the circuit works reliably at any input data rate between 2.46 and 2.9 Gb/s without external components, a reference clock, or manual phase alignment between data and clock.

Keywords Multiplexer · Clock generation circuit · Clock and data recovery · Phase/frequency detector · Ring voltage-controlled oscillator · Pulse width distortion

1 Introduction

MUXs, also called serializers, are critical blocks in both long-haul optical communication and short-distance I/O transmission. By serializing multiple low-speed data streams into one or a few high-speed streams, they allow the high-speed link to be fully exploited, saving cost, space, and so on.

There are two categories of MUXs: half-rate [1–3] and full-rate [4, 5]. In a half-rate MUX, the highest required clock frequency is only half the baud rate of the multiplexed data. It offers a simple architecture, low complexity, and low power consumption, but it suffers from a notorious inherent shortcoming: the PWD cannot be removed. The PWD results mainly from the duty-cycle distortion of the half-rate clock and from the imperfect phase relation between clock and data around the last selector of the half-rate MUX.

A full-rate MUX consists of a half-rate MUX followed by a data decision circuit. The decision circuit is triggered by a full-rate clock, i.e., a clock whose frequency equals the baud rate of the multiplexed data, and it regenerates the PWD-affected data coming out of the half-rate MUX. Because the decision circuit is single-edge-triggered, its output is independent of the duty cycle of the trigger clock, so the PWD is removed and a clean output eye diagram is obtained. However, the extra high-speed decision circuit and the associated full-rate clock buffers increase both power consumption and complexity.
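To make the duty-cycle argument concrete, the following Python sketch compares the bit widths produced by an idealized half-rate selector, which switches on both edges of a 2.5-GHz clock, with those produced by an ideal rising-edge-only retiming flip-flop clocked at 5 GHz. It assumes a 1010 data pattern so that every bit boundary is a transition; the 45 % duty cycle and the helper names are hypothetical, and the clock/data phase-mismatch contribution to PWD is ignored.

```python
def selector_edges(t_clk_ps, duty, n_cycles):
    """Transition times of an ideal half-rate selector (uses both clock edges)."""
    edges = []
    for k in range(n_cycles):
        edges.append(k * t_clk_ps)                    # rising clock edge
        edges.append(k * t_clk_ps + duty * t_clk_ps)  # falling clock edge
    return edges


def retimed_edges(t_clk_ps, n_cycles):
    """Transition times after an ideal rising-edge-only flip-flop (1010 data)."""
    return [k * t_clk_ps for k in range(n_cycles)]


def bit_widths(edges):
    """Width of each bit = spacing between adjacent transitions."""
    return [b - a for a, b in zip(edges, edges[1:])]


T_BIT_PS = 200.0   # 5 Gb/s output bit period
DUTY = 0.45        # hypothetical duty cycle of the clock

half = bit_widths(selector_edges(2 * T_BIT_PS, DUTY, 4))  # 2.5-GHz clock
full = bit_widths(retimed_edges(T_BIT_PS, 8))             # 5-GHz clock
print("half-rate widths (ps):", [round(w, 1) for w in half])
print("full-rate widths (ps):", [round(w, 1) for w in full])
```

With a 45 % duty cycle the half-rate output alternates between 180-ps and 220-ps bits (40 ps of PWD), whereas the retimed bits are all 200 ps wide because only one clock edge is used.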

An ordinary MUX [1, 2] usually contains only data-related blocks, apart from some simple frequency dividers. The highest-frequency clock is provided externally, and the lower-frequency clocks required by the earlier stages of a multi-stage MUX are derived from it by a chain of frequency dividers. In addition, the initial phase alignment between clock and data is ensured by techniques outside the MUX IC, for example FIFO access or manual adjustment.

S. Shi · Z. Wang · C. Zhang · P. Miao · L. Tang (✉)

Institute of RF- & OE-ICs, Southeast University,

Nanjing 210096, China

e-mail: [email protected]

Z. Wang

e-mail: [email protected]


Analog Integr Circ Sig Process (2012) 72:469–480

DOI 10.1007/s10470-012-9859-y


Figure 1 shows the block diagram of a conventional 2:1 full-rate MUX.

Although some on-chip techniques [3, 5] have been presented to alleviate the clock generation problems, for example a clock multiplication unit (CMU) that lowers the required external reference clock frequency, or a delay-locked loop that adjusts the phase relation automatically, they cannot eliminate these problems. In particular, where no reference clock is available at all, new techniques must be devised for a MUX or a transmitter.

This paper presents a 5 Gb/s full-rate 2:1 MUX in the SMIC 0.18-μm CMOS process, with an integrated CGC that provides the MUX with the required clock signals. The MUX IC works properly as soon as the supply is applied; it is fully integrated and fully automatic, so no external components, reference clock, or manual adjustment are needed.

The circuit works well not only with two ideal data inputs but also when the two inputs have jitter or some skew. The CGC is in fact part of a clock and data recovery (CDR) circuit, which suppresses input jitter, so the jitter/phase noise of the output clock is not much affected by the input jitter. When the two inputs are skewed, the 2:1 full-rate MUX still multiplexes them correctly, because the data decision circuits in the MUX have sufficient sampling margin. Nevertheless, the best sampling point is at the middle of the data pulse.

The paper is organized as follows. Section 2 describes the overall architecture of the proposed MUX with its integrated CGC. The circuit design is detailed in Sect. 3. Simulation and measurement results are given in Sects. 4 and 5, respectively, and Sect. 6 concludes the paper.

2 Architecture description

The block diagram of the proposed 2:1 MUX with an integrated CGC is shown in Fig. 2. The MUX IC mainly comprises two parts: a full-rate 2:1 MUX and a CGC. Owing to the advantages mentioned above, the full-rate MUX is chosen instead of the half-rate one. However, in addition to the 2.5-GHz clock for the half-rate MUX, a 5-GHz clock is required for the data decision unit. Both clocks, with the correct phase relation, are provided by the CGC.

The CGC is realized as a full-rate CDR loop with an embedded divide-by-2 frequency divider; the two required clocks, 2.5 and 5 GHz, are taken after and before the divider, respectively. A full-rate CDR is selected instead of the more popular sub-rate one [6, 7]. A CDR is usually paired with a demultiplexer (DeMUX) in a receiver, where the input data is the fastest signal in the whole circuit and the recovered lower-frequency clock matches the reduced clock frequency required by the DeMUX. Moreover, the sub-rate phase detector (PD) of a sub-rate CDR can take over part of the DeMUX function. In our case, however, the CDR is paired with a MUX, whose input data is slower and which needs a higher-frequency clock, so a full-rate CDR is advantageous.

Fig. 1 Block diagram of a conventional 2:1 full-rate MUX

Fig. 2 Block diagram of the proposed 2:1 full-rate MUX integrated with a CGC

Fig. 3 Block diagram of the PFD


In fact, as shown in Fig. 2, the CGC is similar to a simple CMU, except that its reference signal is random data instead of a periodic clock. The 2.5-GHz VCO of a traditional full-rate CDR is replaced by a 5-GHz VCO followed by a divide-by-2 frequency divider, which generates both the 2.5- and 5-GHz clocks.

As for the phase relation between clock and data, the alignment of the two 2.5-Gb/s input data streams with the 2.5-GHz recovered clock is maintained automatically and dynamically by the CDR loop under process–voltage–temperature (PVT) variations. The precondition is that the skew between the two data streams is not too large, which is easily met because the MUX input data, unlike its output data, has a lower rate and travels a shorter distance. Building on this initial alignment, the phase relations in the later stages of the MUX are ensured by precise delay control, layout, and post-layout simulation. The data decision circuitry must be designed carefully because of its high operating speed.

As shown in Fig. 2, two 2.5-Gb/s data streams with small skew are applied, and the multiplexed 5-Gb/s data and a 5-GHz clock are produced. The clock output is used to test the CDR loop or for source-synchronous communication accompanying the multiplexed data.

3 Circuit design

CMOS current-mode logic (CML), i.e., source-coupled logic (SCL), is used throughout the circuit because of its small internal voltage swing, reduced timing jitter and crosstalk, and good common-mode rejection. Both input and output buffers are included to make the IC complete.

3.1 Clock generation circuit (CGC)

As shown in Fig. 2, the CGC loop comprises a Pottbacker PFD [8], a V/I (voltage-to-current) converter, a loop filter, a 5-GHz ring voltage-controlled oscillator (VCO), and a 5-GHz divider. Because of the integrating behavior of the V/I converter and loop filter, the loop is type II, so its DC gain is infinite and the static phase error (skew) between the input data (Din0) and the extracted clock approaches zero. In other words, the sampling clock approaches the optimum position, i.e., the middle of the data unit interval (UI).

Fig. 4 Circuit diagram of the VCO cell

Fig. 5 Block diagram of the digital frequency divider
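The effect of the integral path can be illustrated with a minimal discrete-time model of a generic second-order bang-bang loop. In this sketch, which is an assumption rather than a model of the actual circuit in Fig. 6, the proportional gain `kp` plays the role of the filter resistor, the integrator state plays the role of the large capacitor, and the data carry a constant frequency offset relative to the free-running clock. All values are hypothetical and normalized to unit intervals (UI) per bit.

```python
def simulate_bb_loop(freq_offset, kp, ki, n_bits, init_err=0.2):
    """Hypothetical discrete-time model of a second-order bang-bang loop.

    All phases are in UI. freq_offset is how far the data phase advances
    relative to the free-running clock each bit.
    """
    err = init_err   # data phase minus sampling-clock phase (UI)
    integ = 0.0      # integral-path state (role of the large capacitor)
    errs, integs = [], []
    for _ in range(n_bits):
        pd = 1 if err >= 0 else -1       # bang-bang PD: only the sign is used
        integ += ki * pd                 # integral path accumulates decisions
        # The clock phase moves by the integral plus proportional contributions
        err += freq_offset - integ - kp * pd
        errs.append(err)
        integs.append(integ)
    return errs, integs


FO, KP, KI = 1e-3, 1e-2, 1e-4   # hypothetical normalized values
errs, integs = simulate_bb_loop(FO, KP, KI, 20_000)
tail_err, tail_int = errs[-5000:], integs[-5000:]
print("mean integral-path value:", sum(tail_int) / len(tail_int))
print("peak |phase error| in UI:", max(abs(e) for e in tail_err))
```

The integral path settles at the frequency offset, so the phase error does not accumulate but only dithers in a narrow band of roughly ±kp around zero, which is the bang-bang counterpart of the zero static phase error of a type II loop. The model updates once per bit and ignores transition density and loop latency.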

First, the PFD compares the input data with the I/Q (in-phase/quadrature) clocks from the frequency divider and generates two voltage signals, which reflect the frequency and phase differences between data and clock, respectively. Second, these voltage signals are converted into a current by the V/I converter. Third, the loop filter generates the control signal that tunes the oscillator. Finally, the 5-GHz clock is divided by two and fed back to the PFD.

Compared with linear PDs, bang–bang PDs have unique advantages [7, 9]: robust operation, easy adaptation to multi-phase sampling structures, operation at the highest speed at which the process can make a working flip-flop, and so on. Therefore, a Pottbacker-type bang–bang PD is employed in the CGC.

The Pottbacker PFD is used to extend the capture range of the loop. As a trade-off among power consumption, chip area, operating frequency range, I/Q phase accuracy, phase noise, and so on, an inductorless ring VCO is selected. Ring VCOs are also popular in bang–bang loops because their tuning delay is smaller than that of LC VCOs [7, 9], and their higher phase noise can be suppressed by the large bandwidth of the bang–bang loop.

Fig. 6 Circuit diagram of the V/I converter and loop filter

Fig. 7 (a) Block diagram and (b) timing diagram of the full-rate MUX

Fig. 8 Circuit diagram of the latch

Fig. 9 Circuit diagram of the SEL

Compared with a loop using only a PD, a loop with a PFD has a markedly larger pull-in range and a shorter acquisition time. Figure 3 shows the diagram of the PFD. It is composed of a PD, a QPD identical to the PD, and a frequency detector (FD). The PD is in fact a double-edge-triggered flip-flop (DETFF) consisting of two latches and a selector. The FD differs from the PD mainly in that its selector is a modified version of the one in the PD, which can produce a ternary output. Triggered on both edges of the input data, the PD and QPD sample the I and Q clocks, respectively, to produce two I/Q beat notes, from which the FD derives the frequency-difference signal.

The FD output remains in its third state once frequency acquisition is complete. The PD then takes over control of the loop, and the tracking stage begins.
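The following Python sketch is an idealized behavioral model of this kind of rotational frequency detection, not a transistor-level description of the Pottbacker circuit. It assumes ideal I/Q clocks at the 2.5-GHz divider output, jitter-free random NRZ data at 2.5 Gb/s, and a PD/QPD that simply record the sign of the I and Q clocks at each data transition. The ternary FD emits ±1 whenever the I beat note changes state, with the sign taken from the Q beat note, and 0 (the third state) otherwise; `beat_notes` and `ternary_fd` are hypothetical names.

```python
import math
import random


def beat_notes(delta_f_ghz, n_bits, bit_period_ps=400.0, phi0=0.3, seed=1):
    """Sample ideal I/Q clocks at the transitions of random NRZ data.

    delta_f_ghz is the clock frequency minus the data rate. Relative to the
    data edges, the clock phase rotates by 2*pi*delta_f*T_b per bit, so the
    I and Q samples trace out slow beat notes.
    """
    rng = random.Random(seed)
    step = 2 * math.pi * delta_f_ghz * bit_period_ps * 1e-3  # GHz * ps = 1e-3
    samples = []
    for k in range(n_bits):
        if rng.random() < 0.5:   # no data transition, so no new sample
            continue
        theta = phi0 + k * step
        i_s = 1 if math.cos(theta) >= 0 else -1   # PD output (I beat note)
        q_s = 1 if math.sin(theta) >= 0 else -1   # QPD output (Q beat note)
        samples.append((i_s, q_s))
    return samples


def ternary_fd(samples):
    """Emit +1/-1 when the I beat note toggles, 0 (third state) otherwise."""
    out = []
    prev_i = samples[0][0]
    for i_s, q_s in samples:
        out.append(prev_i * q_s if i_s != prev_i else 0)
        prev_i = i_s
    return out


for df in (+0.02, -0.02, 0.0):   # hypothetical frequency errors in GHz
    fd = ternary_fd(beat_notes(df, 5000))
    print(f"delta_f = {df * 1e3:+.0f} MHz -> net FD output {sum(fd)}")
```

The sign of the accumulated FD output follows the sign of the frequency error, while with zero frequency error the I beat note never toggles, so the FD stays in its third state and leaves control to the PD.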

Figure 4 shows the circuit diagram of the 5-GHz three-stage VCO. The current-folding technique [10] is adopted to relieve the conflict between voltage headroom and VCO sensitivity. Each delay cell of the oscillator is an SCL differential circuit with a cross-coupled transistor pair. The cross-coupled pair exhibits a negative resistance whose value is controlled by the bias current, so the oscillation frequency changes when the control voltage changes the bias current. The cross-coupled pair also sharpens the edges of the oscillation waveform and thus improves the phase noise of the VCO.

PMOS transistors are used as loads instead of resistors, because resistor values are hard to control precisely during fabrication. As shown in Fig. 4, two current sources are added to each VCO delay cell to prevent the oscillation from stopping and to improve the linearity of the VCO tuning characteristic.
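As a first-order check on the delay-cell design, an N-stage ring oscillates at f = 1/(2·N·t_d), where t_d is the delay per stage. The Python sketch below applies this textbook relation, which ignores waveform shape and loading details, to the three-stage 5-GHz VCO and to the measured tuning limits reported in Sect. 5; the helper names are hypothetical.

```python
def ring_frequency(stage_delay_s, n_stages):
    """First-order oscillation frequency of an N-stage ring: 1 / (2 N t_d)."""
    return 1.0 / (2 * n_stages * stage_delay_s)


def required_stage_delay(freq_hz, n_stages):
    """Per-stage delay t_d needed to oscillate at freq_hz."""
    return 1.0 / (2 * n_stages * freq_hz)


N_STAGES = 3
td = required_stage_delay(5e9, N_STAGES)
print(f"5.00 GHz -> {td * 1e12:.1f} ps per stage")
# Measured tuning limits from Sect. 5, expressed as stage delays
for f in (4.92e9, 5.8e9):
    print(f"{f / 1e9:.2f} GHz -> {required_stage_delay(f, N_STAGES) * 1e12:.1f} ps per stage")
```

A 5-GHz three-stage ring therefore needs roughly 33 ps per stage, and the measured 4.92–5.8 GHz range corresponds to tuning the stage delay between roughly 34 and 29 ps through the bias current.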

Three types of dividers are widely used in IC design: the flip-flop-based digital divider, the analog regenerative divider, and the injection-locked divider. Although the latter two can operate at higher frequencies, their operating range is much narrower. Moreover, unlike an even-stage ring structure, which inherently generates I/Q clocks, the latter two dividers can usually produce I/Q clocks only after some modifications, especially in their LC versions. Therefore, the digital divider shown in Fig. 5 is selected: a wide operating range and I/Q outputs are highly desirable here, given the wide pull-in range of the loop, PVT variations, and the requirements of the PFD shown in Fig. 3.
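How a flip-flop divider provides I/Q outputs can be seen in a latch-level behavioral sketch. It assumes the usual master–slave structure with the inverted slave output fed back to the master input; the exact wiring of Fig. 5 is not reproduced, and `divide_by_two` is a hypothetical helper. Each list element represents one 100-ps half period of the 5-GHz clock.

```python
def divide_by_two(n_half_periods):
    """Latch-level model of a flip-flop divide-by-2 with inverted feedback.

    The master latch is transparent while CLK is high and follows NOT(slave);
    the slave latch is transparent while CLK is low and follows the master.
    """
    master = slave = 0
    i_clk, q_clk = [], []
    for k in range(n_half_periods):
        if k % 2 == 0:          # CLK high: master tracks NOT(slave)
            master = 1 - slave
        else:                   # CLK low: slave tracks master
            slave = master
        i_clk.append(master)    # I output
        q_clk.append(slave)     # Q output
    return i_clk, q_clk


i_clk, q_clk = divide_by_two(8)
print("I:", i_clk)   # prints I: [1, 1, 0, 0, 1, 1, 0, 0]
print("Q:", q_clk)   # prints Q: [0, 1, 1, 0, 0, 1, 1, 0]
```

Both outputs repeat every four half periods (400 ps, i.e., 2.5 GHz), and the slave output lags the master by one 100-ps half period, i.e., 90° of the 2.5-GHz clock, which is the I/Q pair the PFD needs.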

The implementation of the V/I converter and loop filter is shown in Fig. 6. The V/I converter is in fact an adder with a single-ended high-impedance output. The phase-error signal from the PD and the frequency-error signal from the FD are applied to its two input ports, and its output current drives the loop filter, which generates the VCO tuning signal. The gate lengths of the transistors in the V/I circuit are increased to minimize channel-length modulation effects.

The second-order loop filter is entirely passive, consisting of only a resistor and two MOS capacitors. The inherently low phase offset of the type II loop based on the V/I circuit permits this simple filter, which consumes less power and provides better loop stability, because it introduces no higher-order poles, unlike a filter built around an op-amp [11]. Because the MIM capacitors have a low capacitance density of <1 fF/μm², MOS capacitors with about 8 fF/μm² are used to reduce the chip area, especially for the large capacitor C shown in Fig. 6. Since the process provides no dedicated MOS capacitors, MOS transistors with drain and source shorted are used instead.
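For orientation, the Python sketch below evaluates the conventional second-order passive filter, R in series with a large capacitor C1 and shunted by a small capacitor C2, which is assumed here to match the topology of Fig. 6, and compares the area of C1 at the two capacitance densities quoted above. The component values and helper names are hypothetical, not the paper's.

```python
import math


def filter_zero_pole(r_ohm, c1_f, c2_f):
    """Zero and pole frequencies (Hz) of R + C1 in series, shunted by C2.

    Z(s) = (1 + s*R*C1) / (s*(C1 + C2)*(1 + s*R*C1*C2/(C1 + C2)))
    """
    f_zero = 1.0 / (2 * math.pi * r_ohm * c1_f)
    f_pole = (c1_f + c2_f) / (2 * math.pi * r_ohm * c1_f * c2_f)
    return f_zero, f_pole


def cap_area_um2(c_farad, density_ff_per_um2):
    """Area in square micrometres for a capacitance at a given density."""
    return c_farad * 1e15 / density_ff_per_um2


R, C1, C2 = 1e3, 10e-12, 1e-12   # hypothetical: 1 kOhm, 10 pF, 1 pF
fz, fp = filter_zero_pole(R, C1, C2)
print(f"zero {fz / 1e6:.1f} MHz, pole {fp / 1e6:.1f} MHz")
print(f"C1 area: MIM (1 fF/um^2) {cap_area_um2(C1, 1.0):.0f} um^2, "
      f"MOS (8 fF/um^2) {cap_area_um2(C1, 8.0):.0f} um^2")
```

With these example values the stabilizing zero sits near 16 MHz and the extra pole near 175 MHz, and the 10-pF capacitor shrinks from about 10 000 μm² with MIM capacitors to about 1 250 μm² with MOS capacitors.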

3.2 Multiplexer circuit

As mentioned above and shown in Fig. 7(a), the MUX consists of a 2:1 half-rate MUX and a data decision circuit based on a D flip-flop. The 2:1 half-rate MUX is composed of five latches, a selector, and a clock delay buffer. As the figure shows, the first two latches in each channel regenerate and retime the two input data streams. An additional latch in the second channel delays that channel by half a clock cycle relative to the first. The clock is delayed by one quarter of the 2.5-GHz clock period (100 ps) before the selector, which gives the selector the optimum phase margin for proper operation.

Fig. 10 Circuit diagram of the output buffer
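The bit ordering and the timing numbers above can be summarized in a short behavioral sketch. It models only the logical function of the 2:1 MUX, assuming that channel D0 is emitted in the first half of each 400-ps input bit and D1 in the second; `mux_2to1` is a hypothetical name and no latch or selector timing is simulated.

```python
def mux_2to1(d0, d1):
    """Behavioral 2:1 multiplexing: alternate bits from the two channels."""
    if len(d0) != len(d1):
        raise ValueError("channels must have equal length")
    out = []
    for a, b in zip(d0, d1):
        out.extend((a, b))   # D0 bit first, then the half-cycle-delayed D1 bit
    return out


IN_RATE_GBPS = 2.5
t_in_ps = 1000.0 / IN_RATE_GBPS     # input bit = 2.5-GHz clock period
t_out_ps = t_in_ps / 2              # output bit period at 5 Gb/s
sel_clock_delay_ps = t_in_ps / 4    # quarter of the 2.5-GHz period

print(mux_2to1([1, 0, 1, 1], [0, 0, 1, 0]))   # prints [1, 0, 0, 0, 1, 1, 1, 0]
print(t_in_ps, t_out_ps, sel_clock_delay_ps)   # prints 400.0 200.0 100.0
```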

Figures 8 and 9 show the circuit diagrams of the latch and the SEL, respectively. The latch consists of two parts: a reading differential pair on the left and a holding differential pair on the right. Traditionally, latches are divided into two categories, dynamic and static, according to the relative transistor sizes of the two differential pairs. When the holding pair is smaller than the reading pair by some margin, the latch is called a dynamic latch; otherwise, it is a static latch. It is generally believed that a dynamic latch can operate at a higher frequency, but it has a narrower operating band, and its internal logic waveforms are not as good as those of a static latch, which increases the BER in a practical, noisy environment. To obtain better internal waveforms, the holding pair is enlarged rather than reduced in this work: the gate-width ratio of the reading pair to the holding pair is set to 3:4 through extensive simulation-based optimization [12].

Fig. 11 Simulation waveforms

The data decision circuit is similar to the divider shown in Fig. 5, but without the feedback loop. In addition, the latches and selectors used throughout the IC have the same topology as those shown in Figs. 8 and 9, although different transistor sizes are used in different blocks according to their operating requirements.

3.3 Output buffer

Output buffers are added to both the data and clock outputs so that the circuit can drive test instruments with a 50-Ω input impedance and can thus be evaluated. As shown in Fig. 10, the output buffer is composed of three CML buffer stages. The same CML buffer structure is also used for the data input buffers.

Both impedance matching and power consumption must be considered when selecting the load resistors of the last stage. With 50-Ω load resistors no reflection occurs, but too much current is needed to generate a sufficient output swing, e.g., 400 mV. If large load resistors are used to save power, serious reflections occur and the signal quality degrades. As a trade-off between impedance matching and power consumption, a pair of 100-Ω load resistors is adopted in the last stage of the output buffer.
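The trade-off can be quantified with a simple first-order model. The Python sketch below assumes that the full tail current of the last CML stage is steered into one output, where the on-chip load resistor appears in parallel with the 50-Ω instrument input (DC coupling), and computes the tail current needed for the 400-mV example swing together with the back-termination reflection coefficient. These are estimates under stated assumptions with hypothetical helper names, not the paper's design values.

```python
def parallel(r1, r2):
    """Resistance of r1 in parallel with r2."""
    return r1 * r2 / (r1 + r2)


def tail_current_for_swing(swing_v, r_load, r_term=50.0):
    """Tail current giving the target single-ended swing (assumed topology)."""
    return swing_v / parallel(r_load, r_term)


def reflection_coefficient(r_load, z0=50.0):
    """Reflection coefficient of the source (back) termination on a z0 line."""
    return (r_load - z0) / (r_load + z0)


for r in (50.0, 100.0):
    i_ma = tail_current_for_swing(0.4, r) * 1e3
    print(f"R_L = {r:.0f} ohm: I_tail = {i_ma:.1f} mA, "
          f"Gamma = {reflection_coefficient(r):.2f}")
```

Under these assumptions, moving from 50-Ω to 100-Ω loads cuts the tail current from 16 mA to 12 mA, at the cost of a back-termination reflection coefficient of about 0.33.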

4 Simulation analysis

Figure 11 shows the simulated waveforms of the CGC loop for f_ini < 5 GHz and f_ini > 5 GHz, where f_ini is the initial frequency of the VCO. In both plots, the top waveform (PDoutput) is the QPD output, the second waveform (FDoutput) is the FD output, and the bottom one is the VCO tuning voltage. During the acquisition stage the FD is active, while during the tracking stage the FD stays in its third state and only the PD operates, as discussed above.

The output eye diagrams of the extracted clock and the multiplexed data are shown in Fig. 12(a) and (b), respectively. Figure 12(c) verifies the function of the multiplexer.

Figure 13(a) and (b) show the multiplexed output when two non-aligned input data streams with a skew of 150 ps (3/8 T_b, where T_b = 400 ps is the input bit period) are applied. The simulation with a −150-ps skew is shown in Fig. 13(c) and (d).

Fig. 12 Simulation results: (a) clock signal, (b) data eye diagram, (c) function verification

Figure 14 shows the multiplexed output waveforms and eye diagram under another non-ideal condition, in which the first input data stream has pulse width distortion. Because the degree of distortion is difficult to control with pseudo-random data, a periodic pulse signal with duty-cycle distortion is used as the first input data stream in the simulation.

Simulation shows that the circuit can tolerate more than ±37.5 % delay skew and more than 10 % input duty-cycle distortion.

5 Measurement results

The chip was designed and fabricated in SMIC 0.18-μm CMOS technology. As shown in Fig. 15, the whole IC, including pads, occupies an area of 812 × 675 μm.

For testing, the chip was bonded to a printed circuit board (PCB) with gold bonding wires, as shown in Fig. 16. Coupled microstrip lines with 100-Ω differential impedance are used in the PCB layout, and acute and right-angle bends are avoided to prevent signal reflections.

An Advantest D3186 pulse pattern generator, an Agilent 86100A Infiniium DCA wide-bandwidth oscilloscope, and an Agilent E4440A spectrum analyzer were used.

The measured VCO tuning characteristic is shown in Fig. 17. The VCO achieves a tuning range of more than 800 MHz, from 4.92 to 5.8 GHz. However, because of process variations, the measured center frequency is about 400 MHz higher than the post-layout simulation result. The divider output frequency, about 2.46–2.9 GHz, is half that of the VCO.

Figure 18 shows the measured waveform, phase noise, and spectrum of the 5-GHz recovered clock in response to a 2.5-Gb/s pseudo-random bit sequence of length 2^31 − 1, with a 1.8-V supply. The recovered clock has an RMS jitter of 2.5 ps and a phase noise of −109.6 dBc/Hz at 100-kHz offset (−115.8 dBc/Hz at 1-MHz offset).

Fig. 13 Simulation results of the multiplexer with non-aligned input data: (a), (b) output data when two input data streams with 150-ps skew are applied; (c), (d) the corresponding results with −150-ps skew


Figure 19 shows the measured 5-Gb/s multiplexed single-ended eye diagram, with an RMS jitter of 5.6 ps, when two 2.5-Gb/s data streams are applied to the input ports.

The pull-in range of the CGC loop is limited by the available operating range of the VCO, approximately 4.92 to 5.8 GHz. Hence the loop works properly at any input data rate between 2.46 and 2.9 Gb/s; the output clock frequency then lies between 4.92 and 5.8 GHz, and the MUX output bit rate between 4.92 and 5.8 Gb/s. Two examples are shown in Figs. 19 and 20.
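The mapping from the VCO range to the usable data rates, and the size of the measured jitter relative to the bit period, can be checked with a few lines of arithmetic. The sketch below only restates the measured numbers; the helper names are hypothetical.

```python
def operating_range(vco_min_ghz, vco_max_ghz):
    """Input-rate and output-rate ranges implied by the VCO tuning range.

    The divided-by-2 clock must match the input data rate, and the MUX output
    bit rate equals the full-rate (VCO) clock frequency.
    """
    input_gbps = (vco_min_ghz / 2, vco_max_ghz / 2)
    output_gbps = (vco_min_ghz, vco_max_ghz)
    return input_gbps, output_gbps


def jitter_ui(rms_jitter_ps, bit_rate_gbps):
    """RMS jitter as a fraction of one bit period (unit interval)."""
    return rms_jitter_ps * bit_rate_gbps / 1000.0


inp, out = operating_range(4.92, 5.8)   # measured VCO limits in GHz
print("input data rate (Gb/s):", inp)
print("MUX output bit rate (Gb/s):", out)
print("5.6 ps RMS at 5 Gb/s:", round(jitter_ui(5.6, 5.0), 3), "UI")
```

The 5.6-ps RMS jitter of the multiplexed data thus corresponds to about 0.028 UI at 5 Gb/s.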

The overall performance of the proposed 5-Gb/s full-rate MUX with an integrated CGC is summarized in Table 1.

Fig. 14 (a) Input data with 10 % pulse width distortion and (b) the resulting output data eye diagram of the multiplexer

Fig. 15 Chip photograph of the whole IC (812 μm × 675 μm; labeled regions include C, VCO, DIV, V/I, PFD, CE, and MUX)

Fig. 16 PCB for testing

Fig. 17 Measured VCO tuning characteristic curve

6 Conclusions

A 5-Gb/s full-rate 2:1 MUX with an on-chip CGC has been designed and fabricated in the SMIC 0.18-μm CMOS process. The IC occupies 812 × 675 μm. From a 1.8-V supply it consumes about 162 mW of DC power, with an input sensitivity of <25 mV and a single-ended output swing above 300 mV.

The MUX works reliably without an external reference clock or manual phase alignment between data and clock. Moreover, the CGC is highly integrated and has a wide pull-in range of 800 MHz, so neither external adjustment nor external components, such as capacitors, are required. The loop operates properly at any input data rate between 2.46 and 2.9 Gb/s and, with 2.5-Gb/s input data, achieves a phase noise of −115.8 dBc/Hz at 1-MHz offset.

Fig. 18 Measured (a) waveform, (b) phase noise, and (c) spectrum of the 5-GHz recovered clock

Fig. 19 Measured 5-Gb/s multiplexed eye diagram when two 2.5-Gb/s data streams were applied

Fig. 20 Measured eye diagrams of the recovered clock and data when one 2.9-Gb/s data stream was applied to both input ports simultaneously

Table 1 Measured performance summary of the proposed MUX

Technology: 0.18-μm CMOS
Target operation speed: 5 Gb/s
Available operation range (output): 4.92–5.8 Gb/s
Supply voltage: 1.8 V
Input sensitivity: <25 mV
Power consumption (a): 162 mW
Output amplitude: >300 mV
Die area (b): 812 × 675 μm
Extracted clock (5 GHz), phase noise: −115.8 dBc/Hz @ 1 MHz
Extracted clock (5 GHz), spectrum power: −2.21 dBm
MUXed data (5 Gb/s), amplitude: >300 mV
MUXed data (5 Gb/s), RMS jitter: 5.6 ps

(a) Including input/output buffers. (b) Including pads.

Acknowledgments Project supported by the National Natural Science Foundation of China (No. 61106024), the Specialized Research Fund for the Doctoral Program of Higher Education, China (No. 20090092120012), the Natural Science Foundation of Jiangsu Province, China (No. BK2010411), and the Research Start-Up Fund of Nanjing University of Posts and Telecommunications (No. NY211016).

References

1. Tang, X., Wang, X. J., Zhang, S. Y., et al. (2008). A 2-Gb/s 16:1 multiplexer in 0.18-μm CMOS. IEEE ICMMT, 2, 868–870.

2. Akira, T., Masato, U., Ikuo, F., et al. (2001). 0.18-μm CMOS 10-Gb/s multiplexer/demultiplexer ICs using current mode logic with tolerance to threshold voltage fluctuation. IEEE Journal of Solid-State Circuits, 36(6), 988–996.

3. Hai, T., Shaeffer, D. K., Min, X., et al. (2003). 40–43-Gb/s OC-768 16:1 MUX/CMU chipset with SFI compliance. IEEE Journal of Solid-State Circuits, 38(12), 2169–2180.

4. Nakasha, Y., Suzuki, T., Kano, H., et al. (2002). A 43-Gb/s full-rate-clock 4:1 multiplexer in InP-based HEMT technology. IEEE Journal of Solid-State Circuits, 37(12), 1703–1709.

5. Cong, H. I., Logan, S. M., Loinaz, M. J., et al. (2001). A 10-Gb/s 16:1 multiplexer and 10-GHz clock synthesizer in 0.25-μm CMOS SiGe BiCMOS. IEEE Journal of Solid-State Circuits, 36(12), 1946–1953.

6. Seedher, A., & Sobelman, G. E. (2003). Fractional rate phase detectors for clock and data recovery. IEEE SOC Conference, 2003, pp. 313–316.

7. Greshishchev, Y. M., & Schvan, P. (2000). A fully integrated SiGe receiver IC for 10-Gb/s data rate. IEEE Journal of Solid-State Circuits, 35(12), 1949–1957.

8. Pottbacker, A., & Langmann, U. (1994). An 8 GHz silicon bipolar clock-recovery and data-regenerator IC. IEEE Journal of Solid-State Circuits, 29(12), 1572–1576.

9. Walker, R. C. (2003). Designing bang–bang PLLs for clock and data recovery in serial data transmission systems. In B. Razavi (Ed.), Phase-locking in high-performance systems: From devices to architectures. New York: IEEE Press.

10. Razavi, B. (2003). Design of integrated circuits for optical communications. New York: McGraw-Hill.

11. Gutierrez, G., & Shyang, K. (1998). Unaided 2.5 Gb/s silicon bipolar clock and data recovery IC. IEEE Radio Frequency Integrated Circuits Symposium, pp. 173–176.

12. Zhang, C. C., Wang, Z. G., Shi, S., et al. (2009). A 20-Gb/s 1:2 demultiplexer in 0.18-μm CMOS. Journal of Semiconductors, 30(5), 055007.

Si Shi received the B.S. degree in Information Engineering from Southeast University in 2008. She is working toward the M.S. degree in the Institute of RF- & OE-ICs (IROI), School of Information and Engineering, Southeast University. Her research interests include the design of high-speed multiplexers, clock and data recovery circuits, and demultiplexers.

Zhigong Wang received the M.S. degree in radio engineering from Nanjing Institute of Technology (now Southeast University), Nanjing, China, in 1981, and the Ph.D. degree from Ruhr-University Bochum, Germany, in 1990. From 1990 to 1997, he was with the Fraunhofer Institute for Applied Solid State Physics, Freiburg, Germany, where he focused on high-speed GaAs ICs for optical-fiber data transmission and MMICs. Since October 1997, he has been a full professor, doctoral supervisor, and superintendent of the Institute of RF- & OE-ICs (IROI), Southeast University, Nanjing, China. He is a Senior Member of the IEEE. He is the author or co-author of 17 books and more than 300 SCI/EI-cited papers, and the inventor of 17 patents. His recent work involves IC design for optical-fiber transmission systems at data rates up to 100 Gb/s, for RF wireless, microwave, and millimeter-wave applications, and micro-electronic systems for biomedical applications.

Changchun Zhang received the B.S. and M.S. degrees from Guilin University of Electronic Technology (GUET), Guilin, China, in 2003 and 2006, respectively, and the Ph.D. degree from Southeast University, Nanjing, China, in 2010. His work focuses on SERDES IC design and PLL and DLL design.


Peng Miao received the Ph.D. degree in Circuits and Systems from Southeast University, Nanjing, China, in 2007. He is now an associate professor in the Institute of RF- & OE-ICs (IROI), School of Information and Engineering, Southeast University. His work focuses on high-speed ICs for optical-fiber data transmission.

Lu Tang received the B.S. degree in Information Engineering and the Ph.D. degree in Circuits and Systems from Southeast University, Nanjing, China, in 2002 and 2008, respectively. He is now a Lecturer in the Institute of RF- & OE-ICs (IROI), School of Information and Engineering, Southeast University. His work focuses on RF front-end circuit and mixed-signal circuit design.
