Dds Cordic

Embed Size (px)

Citation preview

  • 7/27/2019 Dds Cordic

    1/10

    IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 1, JANUARY 2007 151

    A 380 MHz Direct Digital Synthesizer/Mixer WithHybrid CORDIC Architecture in 0.25 m CMOS

    Davide De Caro, Member, IEEE, Nicola Petra, Student Member, IEEE, andAntonio Giuseppe Maria Strollo, Senior Member, IEEE

    AbstractThe paper describes the implementation of a380 MHz, 13 bit, direct digital synthesizer/mixer IC in 0.25 mCMOS technology. The circuit employs an innovative archi-tecture which divides the rotation operation required inthe quadrature synthesizer/mixers, in three rotations. The firsttwo rotations are implemented by using a CORDIC datapathcompletely realized in carry-save arithmetic. The directions ofthe CORDIC rotations are computed in parallel by using a littlelookup table, for the first rotation, and a multiply by constant andaddition circuit for the second rotation. The final (third) rotationis multiplier-based, in order to reduce the circuit latency andincrease the circuit performances.

    The CORDIC datapath is implemented with a novel approachboth at the algorithmic level and at the transistor level. At the al-gorithmic level the combined employ of sign-extension prevention,overflow prevention and a novel rounding scheme are presented.At the transistor level a design style that jointly uses full-CMOSand DPL to improve the circuit latency is described.

    The overall circuit performances are very interesting. The syn-thesizer/mixer IC, realized in a 0.25 m CMOS technology, has anarea occupation of 0.22 mm and dissipates 152 mW at 380 MHzwith a supply voltage of 2.5 V.

    Index TermsAngle rotator, carry save arithmetic, CORDIC al-gorithm, digital downconverter, direct digital frequency synthe-sizer, digital mixer, digital synthesizer, digital tuner, digital upcon-verter, mixer, modulator, overflow prevention, quadrature modu-

    lator, rounding.

    I. INTRODUCTION

    IN RECENT years, there has been a growing trend in the

    communication technologies to shift from analog toward

    digital techniques. The use of digital techniques, in fact, over-

    comes many analog hardware limitations (like high sensitivity

    to process and temperature variations, difficult portability as

    the VLSI technology scales down, etc.). Moreover, the pro-

    grammability offered by digital techniques provides flexibility

    that is especially important in the context of rapidly evolving

    communication standards.

    Owing to advances in CMOS circuit performances, digital

    techniques are nowadays able of handling intermediate fre-

    quency (IF) or even low radio frequency (RF) tasks. One of the

    most basic building-block in this context is the direct digital

    Manuscript received April 3, 2006; revised June 30, 2006. Chip fabricationwas supported by MOSIS Research Educational Program.

    The authors are with the Department of Electronics and TelecommunicationEngineering, University of Napoli Federico II, Napoli 80125, Italy (e-mail:[email protected]).

    Digital Object Identifier 10.1109/JSSC.2006.886527

    Fig. 1. Synthesizer/mixer nonoptimal architectures: a) DDFS-based architec-ture; b) CORDIC-based architecture.

    frequency synthesizer/mixer (DDFSM), which is in ubiqui-

    tous use for many communication subsystems such as tuners,

    derotators, up and down frequency converters (see [8], [35]).

    In addition, the quadrature mixer is the front-end of various

    modulation/demodulation schemes, such as binary phase shift

    keying (BPSK), quadrature phase shift keying (QPSK), and

    quadrature amplitude modulation (QAM) (see[35]).

    The inputs of a DDFSM are two signals and , and

    a frequency control word . The outputs of the system arecomputed according to the following equations:

    (1)

    where

    (2)

    The equations(1)and(2) correspond to a complex multipli-

    cation between an input vector in the complex plane, with coor-

    dinates , and an unitary modulus vector

    .

    A first implementation for the DDFSM includes two distinctfunctional units[1]; seeFig. 1(a). The first one is a direct dig-

    ital frequency synthesizer (DDFS)[2],[3]that generates the se-

    quences and . The second one is

    a complex multiplier, which uses four real multipliers, one adder

    and one subtractor to generate the outputs and

    according to(1).This implementation is generally nonoptimal

    [4],[8]. The DDFS is in fact a cumbersome circuit itself. More-

    over, the complex multiplier does not exploit the property that

    the modulus of one of the inputs is unitary.

    A second possible implementation[5],[6]employs a simple

    overflowing accumulator that generates the angle and a ro-

    tator using the CORDIC algorithm [7] to implement (1); see

    0018-9200/$20.00 2007 IEEE

  • 7/27/2019 Dds Cordic

    2/10

    152 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 1, JANUARY 2007

    Fig. 1(b). Unfortunately, the CORDIC algorithm in its standard

    implementation is inherently slow, using many cascade compu-

    tation stages.

    The recent approaches[8][11]overcome the limitations of

    the simple architectures ofFig. 1by implementing the synthe-

    sizer/mixer as the cascade of two stages: a coarse angle rota-

    tion followed by a fine rotation stage. In[8][10]both thecoarse rotation and thefine rotation employ a multiplier-based

    architecture, while the approach of[11]uses a CORDIC archi-

    tecture for the coarse rotation and a multiplier-based fine rota-

    tion. The IC implementations[8],[10]are very effective, with

    high speed operation and reduced hardware complexity. Until

    now, no IC implementation exists of the mixed approach of[11].

    This paper [12] introduces a novel combined approach,

    named Hybrid CORDIC, to realize a synthesizer/mixer. This

    approach splits the rotation required in the synthesizer/mixer

    circuit in three rotations. A first rotation is performed by em-

    ploying a CORDIC datapath in which the rotation direction are

    computed in parallel, by employing a lookup table. The second

    rotation is also CORDIC-based, with rotations directionscomputed in parallel analytically. The final (third rotation) is

    multiplier-based.

    The parallel evaluation of the rotations directions allows an

    efficient use of the carry-save arithmetic in the CORDIC data-

    path of the first two rotation blocks, without requiring additional

    carry-propagate adders (as in[19],[20]) or the introduction of

    additional CORDIC sub-rotations (as in [21]). Thefinal mul-

    tiplier-based rotation allows to reduce the overall number of

    pipelining levels and the circuit complexity as well.

    At the transistor level, a novel approach, which combines

    full-CMOS and double-pass-transistor logic (DPL)[30]design

    styles, is presented to implement the CORDIC datapath.The paper is organized as follows.Section IIintroduces the

    top level implementation of the synthesizer/mixer. Section III

    discusses the algorithmic aspects of the novel Hybrid CORDIC

    architecture. Section IV highlights the main advantages

    of the Hybrid CORDIC architecture in comparison to the

    state-of-the-art architectures. The carry-save implementation

    of the CORDIC stages is discussed in Section V, while the

    mixed CMOS-DPL design style is presented inSection VI. In

    Section VIIthe prototype IC, realized in 0.25 m CMOS tech-

    nology, is presented, and the experimental results are compared

    to the state-of-the-art implementations.

    II. SYNTHESIZER/MIXERBASICARCHITECTURE

    The top-level architecture of the designed DDFSM IC is

    shown inFig. 2.The circuit is sized in order to exhibit a 90 dBc

    spurious free dynamic range (SFDR). The input word-length

    is 12 bit while the output word-length is 13 bit. The 32 bit

    phase accumulator generates the rotation angle ,

    represented with a binary fractional value in [0, 1]. The rotation

    angle is truncated to 16 bit, introducing output spurs that are

    below the 90 dBc SFDR constraint. The heart of the circuitis the Hybrid CORDIC rotator block. This block is able to

    Fig. 2. Top-level architecture of the designed DDFSM IC.

    is given by

    1 .

    perform a rotation by an angle represented with a

    binary fractional value in [0, 1]:

    (3)

    The least significant bit of has a weight that will be indicated

    in the following as .

    The other minor subsystems in Fig. 2 (1s complementer,

    swappers and 2s complementers controlled by control logic)

    employ the symmetries of the sine and cosine functions [8], [10]

    to perform the complete rotation in the full interval. It is

    worth to highlight that introducing of a phase shift in

    the rotator block, it is possible to completely eliminate the error

    due to the employ of a simple 1s complementer to evaluate the

    angle (see[2],[3],[16]).

    III. HYBRIDCORDICROTATORALGORITHM

    The architecture of the Hybrid Cordic rotator is shown in

    Fig. 3.The circuit rotates its input vector by the angle

    . The rotation is performed in three steps. Thefirst

    two steps are performed with a CORDIC datapath, while the

    final step is realized by using two multipliers.

    A. First Rotation

    In thefirst step, the angle is divided in two sub-words

    , where

    (4)

    (5)

    and is the complement of .

    The goal of thefirst stage is to perform a rotation by an angle

    close to . To that purpose, the first rotation block

    uses the CORDIC algorithm, described by the following equa-

    tions:

    (6)

  • 7/27/2019 Dds Cordic

    3/10

    DE CAROet al.: A 3 80 MHz DIRE CT DIGITA L S YNT HES IZ ER/MIXER WITH HYBR ID CORD IC ARC HITEC TURE IN 0 .2 5 m CMO S 15 3

    Fig. 3. Hybrid CORDIC rotator architecture.

    where is equal to . The algorithm starts with

    , and . To simplify hardware im-

    plementation, only four CORDIC sub-rotations are performed

    in(6), resulting in a rotation by an angle .

    From the CORDIC algorithm properties, it can be easily shown

    that the absolute value of the residual angle

    is upper bounded by . Therefore, by choosingfour rotations in the first stage, we have about the same max-

    imum absolute value for both and [see(5)].

    The direction of the first rotation in (6) is fixed

    since . The directions of the remaining rotations

    depend only on . These directions, therefore, can

    be precomputed, by using (6), and stored in the lookup table

    shown inFig. 3. The lookup table is very small, having three

    address bits . The residual angle , similarly

    to values, depends only on , , . Also , there-

    fore, can be stored in the lookup table.

    Finally, let us note that the four CORDIC sub-rotations(6)

    amplify the modulus of the input vector by a factor

    (7)

    The amplification is inconsequential in many applications

    [4][6],[11]and is not compensated in the proposed approach.

    B. Second Rotation

    In order to complete their operation, the second and third

    stages of the Hybrid CORDIC architecture rotate the vector

    (the output of thefirst stage) by an angle

    (8)

    The angle is computed by using the multiplier and the

    adder shown inFig. 3. The multiplier is needed to calcu-

    late from its scaled representation; see(5).Since, as we have

    observed before, the absolute values of and are both

    lower than , the absolute value of is lower than . By

    representing with 11 bits, we have

    (9)

    A phase quantization error in the range is in-

    troduced in(9). This results in an maximum error at the ,

    outputs of the DDFSM equal to . This

    value is much lower than the weight of the less significant bit atthe outputs of the DDFSM .

    The angle is then split as the sum of two sub-angles

    , where

    (10)

    (11)

    Thesecond rotationblock is aimed to perform the rotation bythe angle , whereas the rotation by the angle is assigned to

    the final rotationblock.

    In the second rotation we employ a CORDIC algorithm

    without the computation. The rotation directions are

    obtained directly by the bits of as follows:

    for (12)

    The corresponding CORDIC equations are

    (13)

    where is the output of the first rotation

    stage; seeFig. 3.

    The actual rotation angle obtained with(13)is not exactly the

    required angle but is instead an angle , given by

    (14)

    From(10),(12)the angle can be written as

    (15)

    As a consequence, thesecond rotationblock introduces a phase

    error, :

    (16)

    With simple manipulations, it is possible to show that is

    upper bounded by

    (17)

    The phase error of thesecond rotationintroduces an error on

    each component of the DDFSM output. From(17), is muchlower than the weight of the output LSB .

  • 7/27/2019 Dds Cordic

    4/10

    154 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 1, JANUARY 2007

    TABLE I

    PERFORMANCES OF THEPROPOSEDARCHITECTURE

    Like thefirst rotation block, also the CORDIC rotations(13)

    amplify the modulus of the input vector, by a factor

    (18)

    Therefore, the total amplification factor is

    (19)

    C. Final (Third) Rotation

    Thefinal rotationblock inFig. 3implements the rotation by

    . The operation to be performed by this block can be written

    as

    (20)

    This final rotation could also be computed by using the

    CORDIC algorithm. However, as observed in [17] and [18],

    when the rotation angle is small a complex multiplier is able to

    reduce the latency and improve the performances.

    In our case, the absolute value of is lower than . There-

    fore, we can approximate sine and cosine functions in (20)as

    (21)

    In this way, the final rotation is realized without the need of

    lookup tables to store sine and cosine values.

    The approximation (21) introducesan error on the DDFSM

    outputs and . It can be easily shown that this error

    component is upper bounded by .

    As shown inFig. 3, we have introduced two rounders in the

    final rotation stage, to reduce the wordlength of multipliers

    input. The two rounders introduce an additional error at the

    output. We have .

    An analytical derivation of the joined effect of all algorithmic

    and quantization errors is not easy. We performed bit-level sim-

    ulations, by operating the DDFSM in two modes. In DDS mode

    and so that the circuit generates two quadra-

    ture sinewave outputs. In SSB mode a sinusoidal input is applied

    to the DDFSM, that operates as a digital upconverter with image

    rejection. Table I summarizes the performances of the developed

    architecture.

    IV. COMPARISON WITHSTATE-OF-THE-ARTAPPROACHES

    The main advantage of the proposed Hybrid CORDIC archi-

    tecture is the parallel computation of the rotations directionsand . This computation is performed with a small lookup table,

    a multiplier by constant and an adder. Therefore, simple and ef-

    fective carry-save[31]implementation for the datapaths can be

    used, avoiding the speed penalties due to carry propagation[5].

    Previously proposed carry-save CORDIC architectures re-

    quire a datapath, and also additional carry-propagate adders

    to determine rotations directions [19], [20]. Other techniques do

    not include carry-propagate adders, but require the introductionof extra rotations[21].

    Thefirst two CORDIC rotation blocks in our architecture re-

    semble the algorithms proposed in[22].However, in the parti-

    tioned Hybrid CORDIC algorithm of[22]the partitioning and

    the handling of the rotation angle would require a huge lookup

    table for its implementation. On the other hand, the mixed Hy-

    brid CORDIC algorithm, also proposed in[22],does not parti-

    tion the rotation angle. Therefore, its implementation requires

    in thefirst stage either a full datapath or a lookup table ad-

    dressed by all the bits of the rotation angle.

    The solution of[11]uses two rotation stages. The first one is

    a CORDIC rotator, while the second one is multiplier-based (as

    originally proposed in[17]). The CORDIC rotator of[11]usesa number of stages comparable to the overall stages used in the

    first and second block of our architecture. The use in[11]of a

    single CORDIC rotator, however, requires a lookup table much

    larger than the one used in our architecture.

    The recently proposed DDFSM implementations [8][10]

    use an architecture composed by two multiplier-based rotation

    stages. These architectures require a total of 8 small-width

    multipliers. The experimental results shown in the following

    demonstrate that the Hybrid CORDIC architecture is more

    effective, especially in terms of power and area occupation.

    V. HYBRID CORDICIMPLEMENTATION

    The most critical subsystem in the Hybrid CORDIC architec-

    ture ofFig. 3are the CORDIC stages. In fact, the lookup table

    is very small and can be effectively be synthesized as random

    logic. The multiplier requires only the sum of few partial

    products that can easily be merged with the adder needed to

    compute in a single summation tree.

    The final rotation of the Hybrid CORDIC architecture of

    Fig. 3 uses multiply-accumulate circuits also realized with a

    single summation tree. The sign-extension prevention tech-

    nique[23] has been used to realize the subtraction needed to

    compute .

    A. Carry-Save Implementation of the Cordic StagesAn innovative architecture has been devised to implement the

    first and second CORDIC rotation stages. The basic equation to

    implement the CORDIC stages is

    (22)

    where is the direction of the CORDIC sub-rotation, while

    represents the order of the sub-rotation. The(22)implements

    the computation in (6), (13). The computation can be easily

    obtained by swapping and in(22)and changing the sign

    of .

    Since, in our architecture, the CORDIC rotations directions

    are efficiently evaluated in parallel, the implementation wasperformed by using carry-save arithmetic. Rewriting (22)

  • 7/27/2019 Dds Cordic

    5/10

    DE CAROet al.: A 3 80 MHz DIRE CT DIGITA L S YNT HES IZ ER/MIXER WITH HYBR ID CORD IC ARC HITEC TURE IN 0 .2 5 m CMO S 15 5

    Fig. 4. Detailed implementation of thefirstand second rotationblocks with carry-save arithmetic. The datapath is built by one wiring block and six CORDICsub-rotations driven by the directions , .

    Fig. 5. Optimized bit-level implementation in carry-save arithmetic of theelementary stage(23); is the order of theelementary stage.

    in carry-save [31] form, we obtain the main equation to be

    implemented:

    (23)

    Fig. 4 shows the detailed carry-save datapath of the seven

    CORDIC stages needed in the architecture ofFig. 3.

    The , inputs of the circuit ofFig. 4, are in twos com-plement representation. Thefirst two blocks in Fig. 4(labeled

    wiring) implement the first CORDIC sub-rotation with a fixed

    direction ( , seeSection III). These blocks are also in

    charge of the conversion from twos complement to carry-save

    representation and therefore can be realized by simple wiring

    and complementations, without additional logic.

    The six remaining CORDIC sub-rotations are implemented

    by using the elementary stages inFig. 4. Each elementary stage

    implements(23). The wordlength of the , signals inside the

    datapath ofFig. 4is increased by 2 LSBs (in order to reduce the

    overall error introduced by the CORDIC elaboration) and by 1

    MSB (to avoid overflow).

    The twofinal vector merging adders (VMAs), inFig. 4,con-vert the result to twos complement representation. Rounding is

    also performed in the VMAs to provide thefinal , sig-

    nals with a wordlength of 13 bits.

    Fig. 5 shows the terms to be added to implement (23) atthe bit

    level. In thisfigure, is the binary value associated to (

    if and for ).Fig. 5highlights the use

    of both the sign-extension prevention of[23]and the overflow

    prevention of[19]. Both techniques allow to reduce the circuit

    complexity with respect to simpler carry-save approaches [6].

    In order to implement the two subtractions of(23)the bits ofand are XORed with . Moreover, a twos complement

    constant (the bit equal to s in the column of weight LSB) is

    also added.

    The rounding constant has been computed in order to min-

    imize the rounding error. For all elementary stages, but the one

    marked with a star in Fig. 4, the rounding error is minimized

    when if and if . Therefore,

    the sum of the twos complement constant and is equal to

    LSB. For the elementary stage indicated with a star inFig. 4,

    the optimal rounding constant is zero.

    B. Elementary Stage Implementation

    Fig. 6shows that the terms ofFig. 5can be summed with asingle row of 4-2 compressors[24]. Besides these blocks, the

  • 7/27/2019 Dds Cordic

    6/10

    156 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 1, JANUARY 2007

    Fig. 6. Implementation of the

    -th orderelementary stageby using 4-2 compressors and half-adders. For theelementary stagemarked with a star inFig. 4,

    and . For the otherelementary stages and .

    Fig. 7. Implementation of

    (a) and

    (b) half-adder circuits.

    circuit requires half adders ( and for the compression

    of the MSBs) and XOR gates (for conditional complementing).

    The circuits are the traditional half adders which compute

    . The circuits, instead, compute . These

    blocks allow the summation of the sign-extension prevention

    constant (seeFig. 5) without requiring additional hardware.

    The circuit is well known[34]. An effective implemen-

    tation in CMOS logic (derived from the 28T full-adder [34]) is

    shown in Fig. 7(a). The circuit is described by the following

    equations:

    (24)

    and is implemented as shown inFig. 7(b).

    It is interesting to observe, inFig. 6, that the employ of the

    sign-extension prevention allows the use of a couple of half

    adder circuits in place of a single 4-2 compressor, to compute the

    most significant bits. The most efficient realizations of the 4-2

    compressor[25][28]requires about 60 MOS transistors, while

    the couple of half adder circuits, realized as shown inFig. 7re-

    quire only a total of 28 transistors. The sign-extension preven-

    tion technique is, therefore, able to provide a very low circuit

    complexity. The number of 4-2 compressors decreases with the

    increase of the order of the stage and, in our approach, this re-sults in a substantial gain in area.

    The timing performances of the elementary stage shown in

    Fig. 6are limited by two critical paths.

    Thefirst timing critical circuit, shown inFig. 8(a), is com-

    posed by a 4-2 compressor with two inputs conditionally com-

    plemented. The best available implementations of the 4-2 com-

    pressor[27], [28]provide a delay of three XOR gates, and in-

    clude a total of four XOR gates plus two multiplexers. There-

    fore, a straightforward implementation of the circuit ofFig. 8(a)

    would require a maximum delay of four XOR gates.

    An optimized implementation of thisfirst timing critical cir-

    cuit can be obtained by embedding the two XORgates driven by

    in the 4-2 compressor. This is not straightforward, since (due

    to redundancy of the carry-save arithmetic) different Boolean

    equation sets exist which provide the same arithmetic function

    of a 4-2 compressor. We have found that an optimal solution can

    be obtained starting from the Boolean equations set of the 4-2compressor introduced by Ghoshet al.[29], and embedding the

    XORgates in the circuit, as shown in the following equations:

    (25)

    Fig. 8(b) shows the gate level implementation of (25). Our

    circuit exhibits only three XOR

    gates on the critical path,highlighting an evident advantage in terms of speed with

    respect to the implementation of Fig. 8(a). Moreover, the

    circuit of Fig. 8(b), requiring a total of five XOR gates plus

    two multiplexers, results in one less XOR gate with respect to

    the implementation of Fig. 8(a) using the state-of-the-art 4-2

    compressor of Hsiao et al.[28].

    Let us now consider the second timing critical circuit of

    Fig. 9(a), corresponding to the overflow prevention logic, on

    the left-hand side ofFig. 6. A straightforward implementation

    of the circuit would present four gates on the critical path (by

    assuming the delay of an half adder comparable to the delay of

    an XOR gate). An optimized implementation, with a delay of

    three XORgates can be obtained by exploiting the redundancyof carry-save arithmetic. In fact, the two half adders surrounded

  • 7/27/2019 Dds Cordic

    7/10

    DE CAROet al.: A 3 80 MHz DIRE CT DIGITA L S YNT HES IZ ER/MIXER WITH HYBR ID CORD IC ARC HITEC TURE IN 0 .2 5 m CMO S 15 7

    Fig. 8. Optimal implementation of thefirst timing critical block inFig. 6.(a) Logical function. (b) Detailed implementation with simple gates.

    Fig. 9. Optimal implementation of the second timing critical block inFig. 6. (a) Logical function. (b) Detailed implementation with simple gates.

    by the dashed line in Fig. 9(a)are described by the following

    Boolean equations:

    (26)

    where . By exploiting the redundancy of the

    carry-save arithmetic, we can rewrite the Boolean equations of

    this block, preserving its arithmetic function, as follows:

    (27)

    Proceeding in a similar way for the second column of half

    adders inFig. 9(a), with some manipulations, we obtain for the

    whole circuit ofFig. 9(a)the following equivalent equations:

    (28)

    where ; .

    The resulting circuit is shown inFig. 9(b),where the critical

    path from all inputs to all outputs is composed of three gates

    (twoXOR and one multiplexer or two XOR and one NANDgate).

    The worst delay from to all outputs is two gates (one XOR

    and one multiplexer). Since the input arrives with a delay

    of one gate [seeFig. 6andFig. 7(b)], this path results again in

    a total delay of three gates.

    VI. MIXEDCMOS-DPL IMPLEMENTATION

    In order to simplify IC design, the DDFSM has been im-plemented by using a standard cell approach, with automatic

    place and routing. To optimize performances special purpose

    cells were designed to implement the timing critical circuits of

    Fig. 8(b)andFig. 9(b). These circuits, being composed mainly

    by XORgates and multiplexers, are well suited for a pass tran-sistor logic implementation. Having high speed operation as

    our main target, we employed the double-pass-transistor (DPL)

    logic style[30], as shown inFig. 10.

    DPL is a double-rail logic. In the developed cells, each

    input is converted from single to dual rail by using a couple

    of inverters. In this way passgate inputs, that are not suited

    for the timing models used by the timing analysis tools, are

    also avoided. The inverters 15 in the circuit ofFig. 10(b)and

    the inverters 16 in Fig. 10(c) increase the circuit speed by

    limiting the maximum number of series transistors. Moreover,

    the inverters 12 inFig. 10(b) make the propagation delay of

    the Carry output independent from the capacitive load on the

    Sumoutput. A similar consideration applies to the inverters 12and 34 inFig. 10(c).In this way the developed DPL circuits

    are fully compatible with the other full-CMOS standard cells

    of the library.

    It is worth noting that not all gates have to be dual rail.

    The gates which drive the outputs, both in Fig. 10(b) and in

    Fig. 10(c), can be single rail. Also the XOR gate which drives

    the single rail multiplexer that calculates inFig. 10(b)can

    be implemented in a single rail style.

    The number of transistors, the propagation delay and

    the power dissipation obtained by employing the proposed

    DPL-CMOS design style are reported inTable II. For compar-

    ison, the same table reports also the performances achievableby using a standard cell library with only full-CMOS logic,

  • 7/27/2019 Dds Cordic

    8/10

    158 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 1, JANUARY 2007

    Fig. 10. Transistor level implementation of the special cells ofFig. 8(b)andFig. 9(b).(a) Basic gates implementation. (b) DPL implementation of the circuit ofFig. 8(b).(c) DPL implementation of the circuit ofFig. 9(b).

    TABLE IISIMULATEDPERFORMANCES OFDIFFERENTCORDIC STAGESCONFIGURATIONS BYEMPLOYINGPROPOSEDDPL-CMOS AND FULL-CMOS STYLES

    TABLE III

    EXPERIMENTAL PERFORMANCES OF THESYNTHESIZER/MIXER

    without special cells. All stages have been designed for a

    0.25 m, 2.5 V technology. The analysis of Table II reveals

    that proposed design style allows about a 35% reduction of

    the propagation delay by providing about the same number oftransistors and power dissipation of the full-CMOS realization.

    VII. EXPERIMENTAL RESULTS

    The DDFSM with the optimized carry-save CORDIC archi-

    tecture and the mixed CMOS-DPL design style has been fabri-

    cated on a test chip (see Fig. 11) ina 2.5 V, 0.25 m CMOS tech-

    nology. The DDFSM has been synthesized from a VHDL de-scription, and has been automatically placed and routed by using

    a commercial tool. The DDFSM accepts a 32 bit frequency con-

    trol word, resulting in a frequency resolution of about 0.088 Hz.

    Table III summarizes the main characteristics of the circuit.

    Fig. 12reports the experimental digital output spectrum of the

    DDFSM when operated in DDS mode ( , ),

    showing an SFDR larger than 93 dBc.

    Besides the DDFSMthe chip includes a built in self test struc-

    ture (SA Mixer) and two programmable ring oscillators (RO

    Fast and RO Slow) to make the measurement of DDFSM max-

    imum clock frequency and power dissipation easier. Also in-

    cluded in the chip it is a DDFS which can provide

    inputs to the synthesizer/mixer to test the single and double side-band modulation functionality of the circuit.

  • 7/27/2019 Dds Cordic

    9/10

    DE CAROet al.: A 3 80 MHz DIRE CT DIGITA L S YNT HES IZ ER/MIXER WITH HYBR ID CORD IC ARC HITEC TURE IN 0 .2 5 m CMO S 15 9

    TABLE IV

    COMPARISON WITHRECENTLYPROPOSEDDESIGNS

    Fig. 11. Test chip realized in CMOS 0.25 m technology. The chip includesour optimized synthesizer/mixer (Synth/Mixer) a DDFS, two ring oscillators(RO Fast andRO Slow) andthe built-in self-testlogic(SA Mixer)for easy circuittesting.

    Fig. 12. Output spectrum of the DDFSM in DDSmode ( 0 , );

    MHz.

    Table IIIalso reports the experimental performances of the

    developed DDFSM. The circuit exhibits a very low power

    dissipation (0.40 mW/MHz) with a maximum clock frequency

    slightly lower than 400MHz.

    The experimental performances of the proposed circuit

    are compared in Table IV with the performances of the best

    DDFSMs available in literature based on two stage multiplier

    architecture and implemented with the same 0.25 m tech-

    nology. It can be observed that the developed architecture

    allows a more than three-fold reduction of power dissipation,

    with also a substantial reduction in the silicon area with respectto[8].The circuit in[10], while able to reach a SFDR of 100

    dBc, requires about a 2.32 times larger area with respect to our

    implementation.Table IVshows, moreover, that our circuit is

    able to work correctly up to 385 MHz, whereas the best result

    obtained in literature was 330 MHz.

    VIII. CONCLUSION

    The paper has described in detail the implementation of

    an high performances synthesizer/mixer IC which exploits

    improvements at the algorithmic, architectural and transistorlevels.

    In the novel synthesizer/mixer architecture, the rotation

    operation has been split in three rotations. The first two rota-

    tions use a CORDIC datapath completely realized in carry-save

    arithmetic. This is possible since the directions of the CORDIC

    rotations are computed in parallel by using a little lookup table

    in thefirst rotation and a fast multiply by constant and addition

    circuit in the second rotation. The final (third) rotation is mul-

    tiplier-based, in order to reduce the circuit latency and increase

    the circuit performances.

    The CORDIC datapath has been realized in carry-save arith-

    metic. In this datapath the combined employ of sign extensionprevention, overflow prevention and a novel rounding scheme

    have been presented. At the transistor level a design style that

    jointly uses full CMOS and DPL to improve the circuit latency

    has also been described.

    The realized synthesizer/mixer IC shows very good perfor-

    mances in terms of power dissipation, area and maximum clock

    frequency.

    REFERENCES

    [1] L.K. Tanand H. Samueli, A 200 MHzdirectdigital synthesizer/mixerin 0.8 m CMOS, IEEE J. Solid-State Circuits, vol. 30, no. 3, pp.193200, Mar. 1995.

    [2] B. G. Goldberg, Direct Digital Frequency Synthesis Demystified.

    Eagle Rock, VA: LLH Technology Publishing, 1999.[3] J. Vankka and K. Halonen, Direct Digital Synthesizers: Theory, Design

    and Applications. Norwell, MA: Kluwer Academic, 2001.

    [4] S. Nahm, K. Han, and W. Sung, A CORDIC-based digital quadra-ture mixer: comparison with a ROM-based architecture, in Proc. IEEE

    ISCAS, 1998, pp. 385388.[5] G. Gielis, R. Van de Plassche, and J. Van Valburg,A 540 MHz 10-b

    polar to Cartesian converter,IEEE J. Solid-State Circuits, vol. 26, no.11, pp. 16451650, Nov. 1991.

    [6] Y. Ahn, S. Nahm, and W. Sung,VLSI design of a CORDIC-basedderotator,in Proc. IEEE ISCAS, 1998, pp. 449452.

    [7] J. E. Volder,The CORDIC trigonometric computing technique,IRETrans. Electron. Comput., vol. EC-8, no. 3, pp. 330334, Sep. 1959.

    [8] A. Torosyan, D. Fu, and A. N. Wilson,A 300 MHz direct digital syn-thesizer/mixer in 0.25

    m CMOS, IEEE J. Solid-State Circuits, vol.38, no. 6, pp. 875887, Jun. 2003.

    [9] D.Fu and A.N. Wilson, A high-speed processor for digital sine/cosinegeneration and angle rotation, in Proc. 42nd Asilomar Conf. Signal,Systems and Computers, 1998, vol. 1, pp. 177181.

  • 7/27/2019 Dds Cordic

    10/10

    160 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 1, JANUARY 2007

    [10] Y. Song and B. Kim,A quadrature digital synthesizer/mixer architec-ture using fine/coarse coordinate rotation to achieve 14-b input, 15-boutput, and 100-dBc SFDR,IEEE J. Solid-State Circuits, vol. 39, no.11, pp. 18531861, Nov. 2004.

    [11] F. Curticapean and J. Niittylahti,An improved digital quadrature fre-quency down-converter architecture,in 35th Asilomar Conf. Signals,Systems and Computers, Nov. 2001, pp. 13181321.

    [12] D. De Caro, N. Petra, and A. G. M. Strollo,A 380 MHz, 150 mW

    direct digital synthesizer/mixer in 0.25

    m CMOS, in IEEE ISSCCDig. Tech. Papers, 2006, pp. 258259.[13] H. T. Nicholas and H. Samueli,An analysis of the output spectrum of

    direct digital frequency synthesizers in the presence of phase accumu-lator truncation, in Proc. 41st Annu. Frequency Control Symp., May1987, pp. 495502.

    [14] A. Torosyan and A. N. Willson, Jr.,Analysis of the output spectrumfor direct digital frequency synthesizers in the presence of phase trun-cation andfinite arithmetic precision,in Proc. 21th Symp. Image andSignal Processing and Analysis, 2001, pp. 458463.

    [15] F. Curticapean and J. Niittylahti,Exact analysis of spurious signals indirectdigital frequency synthesizers due to phase truncation,Electron.

    Lett., vol. 39, no. 6, pp. 499 501, Mar. 2003.[16] J. Vankka,Methods of mappingfrom phase to sine amplitude in direct

    digital synthesis,IEEE Trans. Ultrason. Ferroelectr. Freq. Contr., vol.44, no. 2, pp. 526534, Mar. 1997.

    [17] H. M. Ahmed,Signal processing algorithms and architectures,Ph.D.

    dissertation, Dept. Electr. Eng., Stanford Univ., Stanford, CA, Dec.1981.

    [18] , Efficient elementary function generation with multipliers, inProc. 19th Symp. Computer Arithmetic, Sep. 1989, pp. 5259.

    [19] T. G. Noll, Carry-save arithmetic for high speed digital signal pro-cessing,in Proc. IEEE ISCAS, 1990, pp. 982986.

    [20] N. Takagi, T. Asada, and S. Yajima, Redundant CORDIC methodswith a costant scale factor for sine and cosine computation, IEEETrans. Comput., vol. 40, no. 9, pp. 989995, Sep. 1991.

    [21] T. B. Juang, S. F. Hsiao, and M. Y. Tsai, Para-CORDIC: parallelCORDIC rotation algorithm, IEEE Trans. Circuits Syst. I: Reg. Pa-

    pers, vol. 51, no. 8, pp. 15151524, Aug. 2004.[22] S. Wang, V. Piuri, and E. E. Swartzlander,Hybrid CORDIC algo-

    rithms, IEEE Trans. Comput., vol. 46, no. 11, pp. 12021207, Nov.1997.

    [23] J. F. Ardekani,M 2 N booth encoded multiplier generator using op-timized wallace trees, IEEE Trans. Very Large Scale Integr. (VLSI)Syst., vol. 1, no. 2, pp. 120125, Jun. 1993.

    [24] M. Nagamatsu, S. Tanaka, J. Mori, K. Hirano, T. Noguchi, and K.Hatanaka, A 15-ns 32 2 32-b CMOS multiplier with an improvedparallel structure, IEEE J. Solid-State Circuits, vol. 25, no. 2, pp.494497, Apr. 1990.

    [25] J. Mori, M. Nagamatsu, M. Hirano,S. Tanaka,M. Noda, Y. Toyoshima,K. Hashimoto, H. Hayashida, and K. Maeguchi, A 10-ns 54 2 54-bparallel structured full array multiplier with 0.5-

    m CMOS tech-

    nology,IEEE J. Solid-State Circuits, vol. 26, no. 4, pp. 600606, Apr.1991.

    [26] G. Goto, T. Sato, M. Nakajima, and T. Sukemura,A 5 42 54-b regu-larly structured tree multiplier, IEEE J. Solid-State Circuits, vol. 27,no. 9, pp. 12291236, Sep. 1992.

    [27] N. Ohkubo, M. Suzuki, T. Shinbo, T. Yamanaka, A. Shimizu, K.Sasaki, and Y. Nakagome,A 4.4-ns CMOS 54 2 54-b multiplier usingpass-transistor multiplexer, IEEE J. Solid-State Circuits, vol. 30, no.

    3, pp. 251257, Mar. 1995.[28] S. F. Hsiao, M. R. Jiang, and J. S. Yeh, Design of high speed low-

    power 3-2 counter and 4-2 compressor for fast multipliers,Electron.Lett., vol. 34, no. 4, pp. 341 342, Feb. 1998.

    [29] D. Ghosh, S. K. Nandy, and K. Parthasarathy,TWTXBB: a low la-tency, high throughput multiplier architecture using a new 4-2 com-pressor,in Proc. 7th Int. Conf. VLSI Design, Jan. 1994, pp. 7782.

    [30] M. Suzuki, N. Ohkubo, T. Shinbo, T. Yamanaka, A. Shimizu, K.Sasaki, and Y. Nakagome, A 1.5-ns 32-b CMOS ALU in doublepass-transistor logic,IEEE J. Solid-State Circuits, vol. 28, no. 11, pp.11451151, Nov. 1993.

    [31] B. Parhami, Computer Arithmetic: Algorithms and Hardware De-signs. Oxford, U.K.: Oxford Univ. Press, 1999.

    [32] Y. H. Hu,The quantization effects of the CORDIC algorithm,IEEETrans. Signal Process., vol. 40, no. 4, pp. 834 844, Apr. 1992.

    [33] S. Y. Park and N. I. Cho,Fixed-point error analysis of CORDIC pro-cessor based on the variance propagation formula, IEEE Trans. Cir-cuits Syst. I: Reg. Papers, vol. 51, no. 3, pp. 573584, Mar. 2004.

    [34] N. H. E. Weste and K. Eshragian, Principles of CMOS VLSI Design .

    Reading, MA: Addison-Wesley, Jan. 1993, 0201533766.[35] J. Proakis, Digital Communications, 4th ed. New York: McGraw-Hill, Aug. 2000.

    Davide De Caro (M05) was born in Naples, Italy,on February 9, 1973. He received the M.S. degree inelectronic engineering with honors in 1999, and thePh.D. degree in electronic engineering and computerscience in 2003, both from the University of Naples

    Federico II, Italy.He has worked in the area of digital integrated

    VLSI circuit design for the last eight years. SinceMarch 2003, he has been a Researcher at the De-

    partment of Electronics and TelecommunicationEngineering of the University of Naples, Italy, where

    he is working on high-performance flip-flops (including both low-power and

    high-speed structures), VLSI implementation of arithmetic circuits (squarers,fixed-width multipliers, ReedSolomon decoders, Galois-field multipliers),direct digital frequency synthesizers and digital mixers.

    Dr. De Caro is author or coauthor of more than 30 technical papers in interna-tionaljournals and refereed international conferences. He acted as a reviewerforIEEE TRANSACTIONS ONCIRCUITS ANDSYSTEMSI and IEEE TRANSACTIONS

    ON VLSI SYSTEMS.

    Nicola Petra (S05) was born in 1974 in Naples,Italy. He received the M.S. degree in electronicengineering with honors in 2002 from the Universityof Naples Federico II. He is presently workingtowards the Ph.D. degree at the Department ofElectronics Engineering of the University of Naples

    Federico II.His research interests include design of digitalVLSI circuits for telecommunications and high-per-formance arithmetic circuits.

    Antonio Giuseppe Maria Strollo (M05SM06)was born in 1963. He received the Laurea degree(with honors) in electronic engineering in 1988,

    and the Ph.D. degree in electronic engineering andcomputer science, in 1992, both from the Universityof NaplesFederico II, Italy.

    From 1990 to 1998, he was a full time Researcherat the Department of Electronic Engineering of theUniversity of Naples. In November 1998, he was

    appointed Associate Professor at the Universityof Naples Federico II. Since November 2002,

    he has been Full Professor at the same University. Currently, he is the head

    of the Department of Electronic and Telecommunication Engineering of theUniversity of Naples Federico II. His initial research activities covered thearea of bipolar devices modelling and power electronics. His current researchinterests include design and analysis of VLSI digital circuits. In particular, he isworking on: advanced architectures for direct-digital frequency synthesis and

    for digital mixers, high-performance arithmetic circuits and high performanceand low-powerflip-flops. He has authored or co-authored more than 100 paperson international journals and refereed conferences.