Dds Cordic

7/27/2019 Dds Cordic

1/10

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 1, JANUARY 2007 151

A 380 MHz Direct Digital Synthesizer/Mixer WithHybrid CORDIC Architecture in 0.25 m CMOS

Davide De Caro, Member, IEEE, Nicola Petra, Student Member, IEEE, andAntonio Giuseppe Maria Strollo, Senior Member, IEEE

AbstractThe paper describes the implementation of a380 MHz, 13 bit, direct digital synthesizer/mixer IC in 0.25 mCMOS technology. The circuit employs an innovative archi-tecture which divides the rotation operation required inthe quadrature synthesizer/mixers, in three rotations. The firsttwo rotations are implemented by using a CORDIC datapathcompletely realized in carry-save arithmetic. The directions ofthe CORDIC rotations are computed in parallel by using a littlelookup table, for the first rotation, and a multiply by constant andaddition circuit for the second rotation. The final (third) rotationis multiplier-based, in order to reduce the circuit latency andincrease the circuit performances.

The CORDIC datapath is implemented with a novel approachboth at the algorithmic level and at the transistor level. At the al-gorithmic level the combined employ of sign-extension prevention,overflow prevention and a novel rounding scheme are presented.At the transistor level a design style that jointly uses full-CMOSand DPL to improve the circuit latency is described.

The overall circuit performances are very interesting. The syn-thesizer/mixer IC, realized in a 0.25 m CMOS technology, has anarea occupation of 0.22 mm and dissipates 152 mW at 380 MHzwith a supply voltage of 2.5 V.

Index TermsAngle rotator, carry save arithmetic, CORDIC al-gorithm, digital downconverter, direct digital frequency synthe-sizer, digital mixer, digital synthesizer, digital tuner, digital upcon-verter, mixer, modulator, overflow prevention, quadrature modu-

lator, rounding.

I. INTRODUCTION

IN RECENT years, there has been a growing trend in the

communication technologies to shift from analog toward

digital techniques. The use of digital techniques, in fact, over-

comes many analog hardware limitations (like high sensitivity

to process and temperature variations, difficult portability as

the VLSI technology scales down, etc.). Moreover, the pro-

grammability offered by digital techniques provides flexibility

that is especially important in the context of rapidly evolving

communication standards.

Owing to advances in CMOS circuit performances, digital

techniques are nowadays able of handling intermediate fre-

quency (IF) or even low radio frequency (RF) tasks. One of the

most basic building-block in this context is the direct digital

Manuscript received April 3, 2006; revised June 30, 2006. Chip fabricationwas supported by MOSIS Research Educational Program.

The authors are with the Department of Electronics and TelecommunicationEngineering, University of Napoli Federico II, Napoli 80125, Italy (e-mail:[email protected]).

Digital Object Identifier 10.1109/JSSC.2006.886527

Fig. 1. Synthesizer/mixer nonoptimal architectures: a) DDFS-based architec-ture; b) CORDIC-based architecture.

frequency synthesizer/mixer (DDFSM), which is in ubiqui-

tous use for many communication subsystems such as tuners,

derotators, up and down frequency converters (see [8], [35]).

In addition, the quadrature mixer is the front-end of various

modulation/demodulation schemes, such as binary phase shift

keying (BPSK), quadrature phase shift keying (QPSK), and

quadrature amplitude modulation (QAM) (see[35]).

The inputs of a DDFSM are two signals and , and

a frequency control word . The outputs of the system arecomputed according to the following equations:

(1)

where

(2)

The equations(1)and(2) correspond to a complex multipli-

cation between an input vector in the complex plane, with coor-

dinates , and an unitary modulus vector

.

A first implementation for the DDFSM includes two distinctfunctional units[1]; seeFig. 1(a). The first one is a direct dig-

ital frequency synthesizer (DDFS)[2],[3]that generates the se-

quences and . The second one is

a complex multiplier, which uses four real multipliers, one adder

and one subtractor to generate the outputs and

according to(1).This implementation is generally nonoptimal

[4],[8]. The DDFS is in fact a cumbersome circuit itself. More-

over, the complex multiplier does not exploit the property that

the modulus of one of the inputs is unitary.

A second possible implementation[5],[6]employs a simple

overflowing accumulator that generates the angle and a ro-

tator using the CORDIC algorithm [7] to implement (1); see

0018-9200/$20.00 2007 IEEE


2/10

152 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 1, JANUARY 2007

Fig. 1(b). Unfortunately, the CORDIC algorithm in its standard

implementation is inherently slow, using many cascade compu-

tation stages.

The recent approaches[8][11]overcome the limitations of

the simple architectures ofFig. 1by implementing the synthe-

sizer/mixer as the cascade of two stages: a coarse angle rota-

tion followed by a fine rotation stage. In[8][10]both thecoarse rotation and thefine rotation employ a multiplier-based

architecture, while the approach of[11]uses a CORDIC archi-

tecture for the coarse rotation and a multiplier-based fine rota-

tion. The IC implementations[8],[10]are very effective, with

high speed operation and reduced hardware complexity. Until

now, no IC implementation exists of the mixed approach of[11].

This paper [12] introduces a novel combined approach,

named Hybrid CORDIC, to realize a synthesizer/mixer. This

approach splits the rotation required in the synthesizer/mixer

circuit in three rotations. A first rotation is performed by em-

ploying a CORDIC datapath in which the rotation direction are

computed in parallel, by employing a lookup table. The second

rotation is also CORDIC-based, with rotations directionscomputed in parallel analytically. The final (third rotation) is

multiplier-based.

The parallel evaluation of the rotations directions allows an

efficient use of the carry-save arithmetic in the CORDIC data-

path of the first two rotation blocks, without requiring additional

carry-propagate adders (as in[19],[20]) or the introduction of

additional CORDIC sub-rotations (as in [21]). Thefinal mul-

tiplier-based rotation allows to reduce the overall number of

pipelining levels and the circuit complexity as well.

At the transistor level, a novel approach, which combines

full-CMOS and double-pass-transistor logic (DPL)[30]design

styles, is presented to implement the CORDIC datapath.The paper is organized as follows.Section IIintroduces the

top level implementation of the synthesizer/mixer. Section III

discusses the algorithmic aspects of the novel Hybrid CORDIC

architecture. Section IV highlights the main advantages

of the Hybrid CORDIC architecture in comparison to the

state-of-the-art architectures. The carry-save implementation

of the CORDIC stages is discussed in Section V, while the

mixed CMOS-DPL design style is presented inSection VI. In

Section VIIthe prototype IC, realized in 0.25 m CMOS tech-

nology, is presented, and the experimental results are compared

to the state-of-the-art implementations.

II. SYNTHESIZER/MIXERBASICARCHITECTURE

The top-level architecture of the designed DDFSM IC is

shown inFig. 2.The circuit is sized in order to exhibit a 90 dBc

spurious free dynamic range (SFDR). The input word-length

is 12 bit while the output word-length is 13 bit. The 32 bit

phase accumulator generates the rotation angle ,

represented with a binary fractional value in [0, 1]. The rotation

angle is truncated to 16 bit, introducing output spurs that are

below the 90 dBc SFDR constraint. The heart of the circuitis the Hybrid CORDIC rotator block. This block is able to

Fig. 2. Top-level architecture of the designed DDFSM IC.

is given by

1 .

perform a rotation by an angle represented with a

binary fractional value in [0, 1]:

(3)

The least significant bit of has a weight that will be indicated

in the following as .

The other minor subsystems in Fig. 2 (1s complementer,

swappers and 2s complementers controlled by control logic)

employ the symmetries of the sine and cosine functions [8], [10]

to perform the complete rotation in the full interval. It is

worth to highlight that introducing of a phase shift in

the rotator block, it is possible to completely eliminate the error

due to the employ of a simple 1s complementer to evaluate the

angle (see[2],[3],[16]).

III. HYBRIDCORDICROTATORALGORITHM

The architecture of the Hybrid Cordic rotator is shown in

Fig. 3.The circuit rotates its input vector by the angle

. The rotation is performed in three steps. Thefirst

two steps are performed with a CORDIC datapath, while the

final step is realized by using two multipliers.

A. First Rotation

In thefirst step, the angle is divided in two sub-words

, where

(4)

(5)

and is the complement of .

The goal of thefirst stage is to perform a rotation by an angle

close to . To that purpose, the first rotation block

uses the CORDIC algorithm, described by the following equa-

tions:

(6)


3/10

DE CAROet al.: A 3 80 MHz DIRE CT DIGITA L S YNT HES IZ ER/MIXER WITH HYBR ID CORD IC ARC HITEC TURE IN 0 .2 5 m CMO S 15 3

Fig. 3. Hybrid CORDIC rotator architecture.

where is equal to . The algorithm starts with

, and . To simplify hardware im-

plementation, only four CORDIC sub-rotations are performed

in(6), resulting in a rotation by an angle .

From the CORDIC algorithm properties, it can be easily shown

that the absolute value of the residual angle

is upper bounded by . Therefore, by choosingfour rotations in the first stage, we have about the same max-

imum absolute value for both and [see(5)].

The direction of the first rotation in (6) is fixed

since . The directions of the remaining rotations

depend only on . These directions, therefore, can

be precomputed, by using (6), and stored in the lookup table

shown inFig. 3. The lookup table is very small, having three

address bits . The residual angle , similarly

to values, depends only on , , . Also , there-

fore, can be stored in the lookup table.

Finally, let us note that the four CORDIC sub-rotations(6)

amplify the modulus of the input vector by a factor

(7)

The amplification is inconsequential in many applications

[4][6],[11]and is not compensated in the proposed approach.

B. Second Rotation

In order to complete their operation, the second and third

stages of the Hybrid CORDIC architecture rotate the vector

(the output of thefirst stage) by an angle

(8)

The angle is computed by using the multiplier and the

adder shown inFig. 3. The multiplier is needed to calcu-

late from its scaled representation; see(5).Since, as we have

observed before, the absolute values of and are both

lower than , the absolute value of is lower than . By

representing with 11 bits, we have

(9)

A phase quantization error in the range is in-

troduced in(9). This results in an maximum error at the ,

outputs of the DDFSM equal to . This

value is much lower than the weight of the less significant bit atthe outputs of the DDFSM .

The angle is then split as the sum of two sub-angles

, where

(10)

(11)

Thesecond rotationblock is aimed to perform the rotation bythe angle , whereas the rotation by the angle is assigned to

the final rotationblock.

In the second rotation we employ a CORDIC algorithm

without the computation. The rotation directions are

obtained directly by the bits of as follows:

for (12)

The corresponding CORDIC equations are

(13)

where is the output of the first rotation

stage; seeFig. 3.

The actual rotation angle obtained with(13)is not exactly the

required angle but is instead an angle , given by

(14)

From(10),(12)the angle can be written as

(15)

As a consequence, thesecond rotationblock introduces a phase

error, :

(16)

With simple manipulations, it is possible to show that is

upper bounded by

(17)

The phase error of thesecond rotationintroduces an error on

each component of the DDFSM output. From(17), is muchlower than the weight of the output LSB .


4/10


TABLE I

PERFORMANCES OF THEPROPOSEDARCHITECTURE

Like thefirst rotation block, also the CORDIC rotations(13)

amplify the modulus of the input vector, by a factor

(18)

Therefore, the total amplification factor is

(19)

C. Final (Third) Rotation

Thefinal rotationblock inFig. 3implements the rotation by

. The operation to be performed by this block can be written

as

(20)

This final rotation could also be computed by using the

CORDIC algorithm. However, as observed in [17] and [18],

when the rotation angle is small a complex multiplier is able to

reduce the latency and improve the performances.

In our case, the absolute value of is lower than . There-

fore, we can approximate sine and cosine functions in (20)as

(21)

In this way, the final rotation is realized without the need of

lookup tables to store sine and cosine values.

The approximation (21) introducesan error on the DDFSM

outputs and . It can be easily shown that this error

component is upper bounded by .

As shown inFig. 3, we have introduced two rounders in the

final rotation stage, to reduce the wordlength of multipliers

input. The two rounders introduce an additional error at the

output. We have .

An analytical derivation of the joined effect of all algorithmic

and quantization errors is not easy. We performed bit-level sim-

ulations, by operating the DDFSM in two modes. In DDS mode

and so that the circuit generates two quadra-

ture sinewave outputs. In SSB mode a sinusoidal input is applied

to the DDFSM, that operates as a digital upconverter with image

rejection. Table I summarizes the performances of the developed

architecture.

IV. COMPARISON WITHSTATE-OF-THE-ARTAPPROACHES

The main advantage of the proposed Hybrid CORDIC archi-

tecture is the parallel computation of the rotations directionsand . This computation is performed with a small lookup table,

a multiplier by constant and an adder. Therefore, simple and ef-

fective carry-save[31]implementation for the datapaths can be

used, avoiding the speed penalties due to carry propagation[5].

Previously proposed carry-save CORDIC architectures re-

quire a datapath, and also additional carry-propagate adders

to determine rotations directions [19], [20]. Other techniques do

not include carry-propagate adders, but require the introductionof extra rotations[21].

Thefirst two CORDIC rotation blocks in our architecture re-

semble the algorithms proposed in[22].However, in the parti-

tioned Hybrid CORDIC algorithm of[22]the partitioning and

the handling of the rotation angle would require a huge lookup

table for its implementation. On the other hand, the mixed Hy-

brid CORDIC algorithm, also proposed in[22],does not parti-

tion the rotation angle. Therefore, its implementation requires

in thefirst stage either a full datapath or a lookup table ad-

dressed by all the bits of the rotation angle.

The solution of[11]uses two rotation stages. The first one is

a CORDIC rotator, while the second one is multiplier-based (as

originally proposed in[17]). The CORDIC rotator of[11]usesa number of stages comparable to the overall stages used in the

first and second block of our architecture. The use in[11]of a

single CORDIC rotator, however, requires a lookup table much

larger than the one used in our architecture.

The recently proposed DDFSM implementations [8][10]

use an architecture composed by two multiplier-based rotation

stages. These architectures require a total of 8 small-width

multipliers. The experimental results shown in the following

demonstrate that the Hybrid CORDIC architecture is more

effective, especially in terms of power and area occupation.

V. HYBRID CORDICIMPLEMENTATION

The most critical subsystem in the Hybrid CORDIC architec-

ture ofFig. 3are the CORDIC stages. In fact, the lookup table

is very small and can be effectively be synthesized as random

logic. The multiplier requires only the sum of few partial

products that can easily be merged with the adder needed to

compute in a single summation tree.

The final rotation of the Hybrid CORDIC architecture of

Fig. 3 uses multiply-accumulate circuits also realized with a

single summation tree. The sign-extension prevention tech-

nique[23] has been used to realize the subtraction needed to

compute .

A. Carry-Save Implementation of the Cordic StagesAn innovative architecture has been devised to implement the

first and second CORDIC rotation stages. The basic equation to

implement the CORDIC stages is

(22)

where is the direction of the CORDIC sub-rotation, while

represents the order of the sub-rotation. The(22)implements

the computation in (6), (13). The computation can be easily

obtained by swapping and in(22)and changing the sign

of .

Since, in our architecture, the CORDIC rotations directions

are efficiently evaluated in parallel, the implementation wasperformed by using carry-save arithmetic. Rewriting (22)


5/10


Fig. 4. Detailed implementation of thefirstand second rotationblocks with carry-save arithmetic. The datapath is built by one wiring block and six CORDICsub-rotations driven by the directions , .

Fig. 5. Optimized bit-level implementation in carry-save arithmetic of theelementary stage(23); is the order of theelementary stage.

in carry-save [31] form, we obtain the main equation to be

implemented:

(23)

Fig. 4 shows the detailed carry-save datapath of the seven

CORDIC stages needed in the architecture ofFig. 3.

The , inputs of the circuit ofFig. 4, are in twos com-plement representation. Thefirst two blocks in Fig. 4(labeled

wiring) implement the first CORDIC sub-rotation with a fixed

direction ( , seeSection III). These blocks are also in

charge of the conversion from twos complement to carry-save

representation and therefore can be realized by simple wiring

and complementations, without additional logic.

The six remaining CORDIC sub-rotations are implemented

by using the elementary stages inFig. 4. Each elementary stage

implements(23). The wordlength of the , signals inside the

datapath ofFig. 4is increased by 2 LSBs (in order to reduce the

overall error introduced by the CORDIC elaboration) and by 1

MSB (to avoid overflow).

The twofinal vector merging adders (VMAs), inFig. 4,con-vert the result to twos complement representation. Rounding is

also performed in the VMAs to provide thefinal , sig-

nals with a wordlength of 13 bits.

Fig. 5 shows the terms to be added to implement (23) atthe bit

level. In thisfigure, is the binary value associated to (

if and for ).Fig. 5highlights the use

of both the sign-extension prevention of[23]and the overflow

prevention of[19]. Both techniques allow to reduce the circuit

complexity with respect to simpler carry-save approaches [6].

In order to implement the two subtractions of(23)the bits ofand are XORed with . Moreover, a twos complement

constant (the bit equal to s in the column of weight LSB) is

also added.

The rounding constant has been computed in order to min-

imize the rounding error. For all elementary stages, but the one

marked with a star in Fig. 4, the rounding error is minimized

when if and if . Therefore,

the sum of the twos complement constant and is equal to

LSB. For the elementary stage indicated with a star inFig. 4,

the optimal rounding constant is zero.

B. Elementary Stage Implementation

Fig. 6shows that the terms ofFig. 5can be summed with asingle row of 4-2 compressors[24]. Besides these blocks, the


6/10


Fig. 6. Implementation of the

-th orderelementary stageby using 4-2 compressors and half-adders. For theelementary stagemarked with a star inFig. 4,

and . For the otherelementary stages and .

Fig. 7. Implementation of

(a) and

(b) half-adder circuits.

circuit requires half adders ( and for the compression

of the MSBs) and XOR gates (for conditional complementing).

The circuits are the traditional half adders which compute

. The circuits, instead, compute . These

blocks allow the summation of the sign-extension prevention

constant (seeFig. 5) without requiring additional hardware.

The circuit is well known[34]. An effective implemen-

tation in CMOS logic (derived from the 28T full-adder [34]) is

shown in Fig. 7(a). The circuit is described by the following

equations:

(24)

and is implemented as shown inFig. 7(b).

It is interesting to observe, inFig. 6, that the employ of the

sign-extension prevention allows the use of a couple of half

adder circuits in place of a single 4-2 compressor, to compute the

most significant bits. The most efficient realizations of the 4-2

compressor[25][28]requires about 60 MOS transistors, while

the couple of half adder circuits, realized as shown inFig. 7re-

quire only a total of 28 transistors. The sign-extension preven-

tion technique is, therefore, able to provide a very low circuit

complexity. The number of 4-2 compressors decreases with the

increase of the order of the stage and, in our approach, this re-sults in a substantial gain in area.

The timing performances of the elementary stage shown in

Fig. 6are limited by two critical paths.

Thefirst timing critical circuit, shown inFig. 8(a), is com-

posed by a 4-2 compressor with two inputs conditionally com-

plemented. The best available implementations of the 4-2 com-

pressor[27], [28]provide a delay of three XOR gates, and in-

clude a total of four XOR gates plus two multiplexers. There-

fore, a straightforward implementation of the circuit ofFig. 8(a)

would require a maximum delay of four XOR gates.

An optimized implementation of thisfirst timing critical cir-

cuit can be obtained by embedding the two XORgates driven by

in the 4-2 compressor. This is not straightforward, since (due

to redundancy of the carry-save arithmetic) different Boolean

equation sets exist which provide the same arithmetic function

of a 4-2 compressor. We have found that an optimal solution can

be obtained starting from the Boolean equations set of the 4-2compressor introduced by Ghoshet al.[29], and embedding the

XORgates in the circuit, as shown in the following equations:

(25)

Fig. 8(b) shows the gate level implementation of (25). Our

circuit exhibits only three XOR

gates on the critical path,highlighting an evident advantage in terms of speed with

respect to the implementation of Fig. 8(a). Moreover, the

circuit of Fig. 8(b), requiring a total of five XOR gates plus

two multiplexers, results in one less XOR gate with respect to

the implementation of Fig. 8(a) using the state-of-the-art 4-2

compressor of Hsiao et al.[28].

Let us now consider the second timing critical circuit of

Fig. 9(a), corresponding to the overflow prevention logic, on

the left-hand side ofFig. 6. A straightforward implementation

of the circuit would present four gates on the critical path (by

assuming the delay of an half adder comparable to the delay of

an XOR gate). An optimized implementation, with a delay of

three XORgates can be obtained by exploiting the redundancyof carry-save arithmetic. In fact, the two half adders surrounded


7/10


Fig. 8. Optimal implementation of thefirst timing critical block inFig. 6.(a) Logical function. (b) Detailed implementation with simple gates.

Fig. 9. Optimal implementation of the second timing critical block inFig. 6. (a) Logical function. (b) Detailed implementation with simple gates.

by the dashed line in Fig. 9(a)are described by the following

Boolean equations:

(26)

where . By exploiting the redundancy of the

carry-save arithmetic, we can rewrite the Boolean equations of

this block, preserving its arithmetic function, as follows:

(27)

Proceeding in a similar way for the second column of half

adders inFig. 9(a), with some manipulations, we obtain for the

whole circuit ofFig. 9(a)the following equivalent equations:

(28)

where ; .

The resulting circuit is shown inFig. 9(b),where the critical

path from all inputs to all outputs is composed of three gates

(twoXOR and one multiplexer or two XOR and one NANDgate).

The worst delay from to all outputs is two gates (one XOR

and one multiplexer). Since the input arrives with a delay

of one gate [seeFig. 6andFig. 7(b)], this path results again in

a total delay of three gates.

VI. MIXEDCMOS-DPL IMPLEMENTATION

In order to simplify IC design, the DDFSM has been im-plemented by using a standard cell approach, with automatic

place and routing. To optimize performances special purpose

cells were designed to implement the timing critical circuits of

Fig. 8(b)andFig. 9(b). These circuits, being composed mainly

by XORgates and multiplexers, are well suited for a pass tran-sistor logic implementation. Having high speed operation as

our main target, we employed the double-pass-transistor (DPL)

logic style[30], as shown inFig. 10.

DPL is a double-rail logic. In the developed cells, each

input is converted from single to dual rail by using a couple

of inverters. In this way passgate inputs, that are not suited

for the timing models used by the timing analysis tools, are

also avoided. The inverters 15 in the circuit ofFig. 10(b)and

the inverters 16 in Fig. 10(c) increase the circuit speed by

limiting the maximum number of series transistors. Moreover,

the inverters 12 inFig. 10(b) make the propagation delay of

the Carry output independent from the capacitive load on the

Sumoutput. A similar consideration applies to the inverters 12and 34 inFig. 10(c).In this way the developed DPL circuits

are fully compatible with the other full-CMOS standard cells

of the library.

It is worth noting that not all gates have to be dual rail.

The gates which drive the outputs, both in Fig. 10(b) and in

Fig. 10(c), can be single rail. Also the XOR gate which drives

the single rail multiplexer that calculates inFig. 10(b)can

be implemented in a single rail style.

The number of transistors, the propagation delay and

the power dissipation obtained by employing the proposed

DPL-CMOS design style are reported inTable II. For compar-

ison, the same table reports also the performances achievableby using a standard cell library with only full-CMOS logic,


8/10


Fig. 10. Transistor level implementation of the special cells ofFig. 8(b)andFig. 9(b).(a) Basic gates implementation. (b) DPL implementation of the circuit ofFig. 8(b).(c) DPL implementation of the circuit ofFig. 9(b).

TABLE IISIMULATEDPERFORMANCES OFDIFFERENTCORDIC STAGESCONFIGURATIONS BYEMPLOYINGPROPOSEDDPL-CMOS AND FULL-CMOS STYLES

TABLE III

EXPERIMENTAL PERFORMANCES OF THESYNTHESIZER/MIXER

without special cells. All stages have been designed for a

0.25 m, 2.5 V technology. The analysis of Table II reveals

that proposed design style allows about a 35% reduction of

the propagation delay by providing about the same number oftransistors and power dissipation of the full-CMOS realization.

VII. EXPERIMENTAL RESULTS

The DDFSM with the optimized carry-save CORDIC archi-

tecture and the mixed CMOS-DPL design style has been fabri-

cated on a test chip (see Fig. 11) ina 2.5 V, 0.25 m CMOS tech-

nology. The DDFSM has been synthesized from a VHDL de-scription, and has been automatically placed and routed by using

a commercial tool. The DDFSM accepts a 32 bit frequency con-

trol word, resulting in a frequency resolution of about 0.088 Hz.

Table III summarizes the main characteristics of the circuit.

Fig. 12reports the experimental digital output spectrum of the

DDFSM when operated in DDS mode ( , ),

showing an SFDR larger than 93 dBc.

Besides the DDFSMthe chip includes a built in self test struc-

ture (SA Mixer) and two programmable ring oscillators (RO

Fast and RO Slow) to make the measurement of DDFSM max-

imum clock frequency and power dissipation easier. Also in-

cluded in the chip it is a DDFS which can provide

inputs to the synthesizer/mixer to test the single and double side-band modulation functionality of the circuit.


9/10


TABLE IV

COMPARISON WITHRECENTLYPROPOSEDDESIGNS

Fig. 11. Test chip realized in CMOS 0.25 m technology. The chip includesour optimized synthesizer/mixer (Synth/Mixer) a DDFS, two ring oscillators(RO Fast andRO Slow) andthe built-in self-testlogic(SA Mixer)for easy circuittesting.

Fig. 12. Output spectrum of the DDFSM in DDSmode ( 0 , );

MHz.

Table IIIalso reports the experimental performances of the

developed DDFSM. The circuit exhibits a very low power

dissipation (0.40 mW/MHz) with a maximum clock frequency

slightly lower than 400MHz.

The experimental performances of the proposed circuit

are compared in Table IV with the performances of the best

DDFSMs available in literature based on two stage multiplier

architecture and implemented with the same 0.25 m tech-

nology. It can be observed that the developed architecture

allows a more than three-fold reduction of power dissipation,

with also a substantial reduction in the silicon area with respectto[8].The circuit in[10], while able to reach a SFDR of 100

dBc, requires about a 2.32 times larger area with respect to our

implementation.Table IVshows, moreover, that our circuit is

able to work correctly up to 385 MHz, whereas the best result

obtained in literature was 330 MHz.

VIII. CONCLUSION

The paper has described in detail the implementation of

an high performances synthesizer/mixer IC which exploits

improvements at the algorithmic, architectural and transistorlevels.

In the novel synthesizer/mixer architecture, the rotation

operation has been split in three rotations. The first two rota-

tions use a CORDIC datapath completely realized in carry-save

arithmetic. This is possible since the directions of the CORDIC

rotations are computed in parallel by using a little lookup table

in thefirst rotation and a fast multiply by constant and addition

circuit in the second rotation. The final (third) rotation is mul-

tiplier-based, in order to reduce the circuit latency and increase

the circuit performances.

The CORDIC datapath has been realized in carry-save arith-

metic. In this datapath the combined employ of sign extensionprevention, overflow prevention and a novel rounding scheme

have been presented. At the transistor level a design style that

jointly uses full CMOS and DPL to improve the circuit latency

has also been described.

The realized synthesizer/mixer IC shows very good perfor-

mances in terms of power dissipation, area and maximum clock

frequency.

REFERENCES

[1] L.K. Tanand H. Samueli, A 200 MHzdirectdigital synthesizer/mixerin 0.8 m CMOS, IEEE J. Solid-State Circuits, vol. 30, no. 3, pp.193200, Mar. 1995.

[2] B. G. Goldberg, Direct Digital Frequency Synthesis Demystified.

Eagle Rock, VA: LLH Technology Publishing, 1999.[3] J. Vankka and K. Halonen, Direct Digital Synthesizers: Theory, Design

and Applications. Norwell, MA: Kluwer Academic, 2001.

[4] S. Nahm, K. Han, and W. Sung, A CORDIC-based digital quadra-ture mixer: comparison with a ROM-based architecture, in Proc. IEEE

ISCAS, 1998, pp. 385388.[5] G. Gielis, R. Van de Plassche, and J. Van Valburg,A 540 MHz 10-b

polar to Cartesian converter,IEEE J. Solid-State Circuits, vol. 26, no.11, pp. 16451650, Nov. 1991.

[6] Y. Ahn, S. Nahm, and W. Sung,VLSI design of a CORDIC-basedderotator,in Proc. IEEE ISCAS, 1998, pp. 449452.

[7] J. E. Volder,The CORDIC trigonometric computing technique,IRETrans. Electron. Comput., vol. EC-8, no. 3, pp. 330334, Sep. 1959.

[8] A. Torosyan, D. Fu, and A. N. Wilson,A 300 MHz direct digital syn-thesizer/mixer in 0.25

m CMOS, IEEE J. Solid-State Circuits, vol.38, no. 6, pp. 875887, Jun. 2003.

[9] D.Fu and A.N. Wilson, A high-speed processor for digital sine/cosinegeneration and angle rotation, in Proc. 42nd Asilomar Conf. Signal,Systems and Computers, 1998, vol. 1, pp. 177181.


10/10


[10] Y. Song and B. Kim,A quadrature digital synthesizer/mixer architec-ture using fine/coarse coordinate rotation to achieve 14-b input, 15-boutput, and 100-dBc SFDR,IEEE J. Solid-State Circuits, vol. 39, no.11, pp. 18531861, Nov. 2004.

[11] F. Curticapean and J. Niittylahti,An improved digital quadrature fre-quency down-converter architecture,in 35th Asilomar Conf. Signals,Systems and Computers, Nov. 2001, pp. 13181321.

[12] D. De Caro, N. Petra, and A. G. M. Strollo,A 380 MHz, 150 mW

direct digital synthesizer/mixer in 0.25

m CMOS, in IEEE ISSCCDig. Tech. Papers, 2006, pp. 258259.[13] H. T. Nicholas and H. Samueli,An analysis of the output spectrum of

direct digital frequency synthesizers in the presence of phase accumu-lator truncation, in Proc. 41st Annu. Frequency Control Symp., May1987, pp. 495502.

[14] A. Torosyan and A. N. Willson, Jr.,Analysis of the output spectrumfor direct digital frequency synthesizers in the presence of phase trun-cation andfinite arithmetic precision,in Proc. 21th Symp. Image andSignal Processing and Analysis, 2001, pp. 458463.

[15] F. Curticapean and J. Niittylahti,Exact analysis of spurious signals indirectdigital frequency synthesizers due to phase truncation,Electron.

Lett., vol. 39, no. 6, pp. 499 501, Mar. 2003.[16] J. Vankka,Methods of mappingfrom phase to sine amplitude in direct

digital synthesis,IEEE Trans. Ultrason. Ferroelectr. Freq. Contr., vol.44, no. 2, pp. 526534, Mar. 1997.

[17] H. M. Ahmed,Signal processing algorithms and architectures,Ph.D.

dissertation, Dept. Electr. Eng., Stanford Univ., Stanford, CA, Dec.1981.

[18] , Efficient elementary function generation with multipliers, inProc. 19th Symp. Computer Arithmetic, Sep. 1989, pp. 5259.

[19] T. G. Noll, Carry-save arithmetic for high speed digital signal pro-cessing,in Proc. IEEE ISCAS, 1990, pp. 982986.

[20] N. Takagi, T. Asada, and S. Yajima, Redundant CORDIC methodswith a costant scale factor for sine and cosine computation, IEEETrans. Comput., vol. 40, no. 9, pp. 989995, Sep. 1991.

[21] T. B. Juang, S. F. Hsiao, and M. Y. Tsai, Para-CORDIC: parallelCORDIC rotation algorithm, IEEE Trans. Circuits Syst. I: Reg. Pa-

pers, vol. 51, no. 8, pp. 15151524, Aug. 2004.[22] S. Wang, V. Piuri, and E. E. Swartzlander,Hybrid CORDIC algo-

rithms, IEEE Trans. Comput., vol. 46, no. 11, pp. 12021207, Nov.1997.

[23] J. F. Ardekani,M 2 N booth encoded multiplier generator using op-timized wallace trees, IEEE Trans. Very Large Scale Integr. (VLSI)Syst., vol. 1, no. 2, pp. 120125, Jun. 1993.

[24] M. Nagamatsu, S. Tanaka, J. Mori, K. Hirano, T. Noguchi, and K.Hatanaka, A 15-ns 32 2 32-b CMOS multiplier with an improvedparallel structure, IEEE J. Solid-State Circuits, vol. 25, no. 2, pp.494497, Apr. 1990.

[25] J. Mori, M. Nagamatsu, M. Hirano,S. Tanaka,M. Noda, Y. Toyoshima,K. Hashimoto, H. Hayashida, and K. Maeguchi, A 10-ns 54 2 54-bparallel structured full array multiplier with 0.5-

m CMOS tech-

nology,IEEE J. Solid-State Circuits, vol. 26, no. 4, pp. 600606, Apr.1991.

[26] G. Goto, T. Sato, M. Nakajima, and T. Sukemura,A 5 42 54-b regu-larly structured tree multiplier, IEEE J. Solid-State Circuits, vol. 27,no. 9, pp. 12291236, Sep. 1992.

[27] N. Ohkubo, M. Suzuki, T. Shinbo, T. Yamanaka, A. Shimizu, K.Sasaki, and Y. Nakagome,A 4.4-ns CMOS 54 2 54-b multiplier usingpass-transistor multiplexer, IEEE J. Solid-State Circuits, vol. 30, no.

3, pp. 251257, Mar. 1995.[28] S. F. Hsiao, M. R. Jiang, and J. S. Yeh, Design of high speed low-

power 3-2 counter and 4-2 compressor for fast multipliers,Electron.Lett., vol. 34, no. 4, pp. 341 342, Feb. 1998.

[29] D. Ghosh, S. K. Nandy, and K. Parthasarathy,TWTXBB: a low la-tency, high throughput multiplier architecture using a new 4-2 com-pressor,in Proc. 7th Int. Conf. VLSI Design, Jan. 1994, pp. 7782.

[30] M. Suzuki, N. Ohkubo, T. Shinbo, T. Yamanaka, A. Shimizu, K.Sasaki, and Y. Nakagome, A 1.5-ns 32-b CMOS ALU in doublepass-transistor logic,IEEE J. Solid-State Circuits, vol. 28, no. 11, pp.11451151, Nov. 1993.

[31] B. Parhami, Computer Arithmetic: Algorithms and Hardware De-signs. Oxford, U.K.: Oxford Univ. Press, 1999.

[32] Y. H. Hu,The quantization effects of the CORDIC algorithm,IEEETrans. Signal Process., vol. 40, no. 4, pp. 834 844, Apr. 1992.

[33] S. Y. Park and N. I. Cho,Fixed-point error analysis of CORDIC pro-cessor based on the variance propagation formula, IEEE Trans. Cir-cuits Syst. I: Reg. Papers, vol. 51, no. 3, pp. 573584, Mar. 2004.

[34] N. H. E. Weste and K. Eshragian, Principles of CMOS VLSI Design .

Reading, MA: Addison-Wesley, Jan. 1993, 0201533766.[35] J. Proakis, Digital Communications, 4th ed. New York: McGraw-Hill, Aug. 2000.

Davide De Caro (M05) was born in Naples, Italy,on February 9, 1973. He received the M.S. degree inelectronic engineering with honors in 1999, and thePh.D. degree in electronic engineering and computerscience in 2003, both from the University of Naples

Federico II, Italy.He has worked in the area of digital integrated

VLSI circuit design for the last eight years. SinceMarch 2003, he has been a Researcher at the De-

partment of Electronics and TelecommunicationEngineering of the University of Naples, Italy, where

he is working on high-performance flip-flops (including both low-power and

high-speed structures), VLSI implementation of arithmetic circuits (squarers,fixed-width multipliers, ReedSolomon decoders, Galois-field multipliers),direct digital frequency synthesizers and digital mixers.

Dr. De Caro is author or coauthor of more than 30 technical papers in interna-tionaljournals and refereed international conferences. He acted as a reviewerforIEEE TRANSACTIONS ONCIRCUITS ANDSYSTEMSI and IEEE TRANSACTIONS

ON VLSI SYSTEMS.

Nicola Petra (S05) was born in 1974 in Naples,Italy. He received the M.S. degree in electronicengineering with honors in 2002 from the Universityof Naples Federico II. He is presently workingtowards the Ph.D. degree at the Department ofElectronics Engineering of the University of Naples

Federico II.His research interests include design of digitalVLSI circuits for telecommunications and high-per-formance arithmetic circuits.

Antonio Giuseppe Maria Strollo (M05SM06)was born in 1963. He received the Laurea degree(with honors) in electronic engineering in 1988,

and the Ph.D. degree in electronic engineering andcomputer science, in 1992, both from the Universityof NaplesFederico II, Italy.

From 1990 to 1998, he was a full time Researcherat the Department of Electronic Engineering of theUniversity of Naples. In November 1998, he was

appointed Associate Professor at the Universityof Naples Federico II. Since November 2002,

he has been Full Professor at the same University. Currently, he is the head

of the Department of Electronic and Telecommunication Engineering of theUniversity of Naples Federico II. His initial research activities covered thearea of bipolar devices modelling and power electronics. His current researchinterests include design and analysis of VLSI digital circuits. In particular, he isworking on: advanced architectures for direct-digital frequency synthesis and

for digital mixers, high-performance arithmetic circuits and high performanceand low-powerflip-flops. He has authored or co-authored more than 100 paperson international journals and refereed conferences.

Documents

Dds Cordic