
Institutionen för systemteknik
Department of Electrical Engineering

Master's thesis (Examensarbete)

Utilizing FPGAs for data acquisition at high data rates

Master's thesis in Electronics carried out at Tekniska högskolan i Linköping (Linköping Institute of Technology)

by

Mats Carlsson

LITH-ISY-EX--09/4298--SE

Linköping 2009

Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden
Linköpings tekniska högskola, Linköpings universitet, 581 83 Linköping


Supervisor: Rashad Ramzan, isy, Linköpings universitet

Examiner: Christer Svensson, isy, Linköpings universitet

Linköping, 27 March, 2009


Division, Department: Division of Automatic Control, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden

Date: 2009-03-27

Language: English

Report category: Examensarbete (Master's thesis)

URL for electronic version: http://www.control.isy.liu.se

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-17820

ISBN: (none)

ISRN: LITH-ISY-EX--09/4298--SE

Title of series, numbering; ISSN: (none)

Title: Utilizing FPGAs for data acquisition at high data rates (Swedish title: Användning av FPGAer för insamling av höghastighetsdata)

Author: Mats Carlsson

Abstract

The aim of this thesis was to configure an FPGA with high speed ports to capture data from a prototype 4 bit Σ∆ analogue-to-digital converter sending data at a rate of 2.4 Gbps in each of four channels, and to develop a protocol for transferring the data to a PC for analysis. The data arriving in the four channels should be sorted into 4 bit words, with one bit taken successively from each of the channels. A requirement on the data transfer was that the data in the four channels should arrive synchronously at the FPGA. A Virtex-5 FPGA on an LX110T platform was used, with RocketIO™ GTP transceivers tightly integrated with the FPGA logic. Since the actual DUT (Device Under Test) was not available during the work, the transceivers of the FPGA were used for both sending and receiving data. The transmission was shown to be successful for both eight and ten bit data widths. At this stage, a small skew between the data in the four channels was observed. This was solved by storing the data in separate memories, one for each channel, which made it possible to form the 4 bit words later in the PC (MatLab). The memories were two-port FIFOs into which data are written at 240 MHz (10 bit data width) or 300 MHz (8 bit data width) and read out at 50 MHz.

Keywords: FPGA, Virtex-5, RocketIO GTP, Xilinx, Transceiver, Verilog, VHDL


Sammanfattning (Swedish summary, translated)

The aim of the thesis work was to configure an FPGA with high speed ports so that data from a prototype 4 bit Σ∆ analogue-to-digital converter can be collected at a rate of 2.4 Gbps in each of four channels, and to develop a protocol for transferring these data from the FPGA to a PC for analysis. The collected data are to be sorted into 4 bit words, with one bit taken successively from each of the channels. A requirement on the data transfer was that the data in the four channels should arrive synchronously at the FPGA. A Virtex-5 FPGA on an LX110T platform was used, with RocketIO GTP transceivers tightly integrated with the FPGA logic. Since the equipment to be tested was not available while the work was carried out, the transceivers of the FPGA were used both to send and to receive data. Transfer of data with both 8 and 10 bit data widths was achieved successfully. However, the data in the four channels turned out not to arrive synchronously at the receiver. This problem was solved by storing the information in separate memories, one for each channel, transferring the data from the memories to the PC and sorting them there into 4 bit words with the help of MatLab. Two-port FIFOs were used as memories; data are written into them at a rate of 240 MHz (10 bit data width) or 300 MHz (8 bit data width) and read out at a rate of 50 MHz.


Acknowledgments

I would like to thank Christer Svensson for giving me the opportunity to do this interesting thesis work and for guiding me in writing the report. I thank my supervisor Rashad Ramzan for all the help and support during my work with the thesis, Anton Blad for help in the lab and my mother Gudrun Alm Carlsson for proofreading the manuscript. Finally, I want to thank my fiancée Kajsa Tibell for her support and for letting me work late in the evenings, and my son Darrell for giving me opportunities to rest and forget about the work.


Contents

1 Introduction
  1.1 The task
  1.2 Structure of the GTP transceiver
  1.3 Tests used to understand the functioning of the transceiver
  1.4 Communication between the FPGA and the PC - development of a protocol
  1.5 Abbreviations
2 Background
  2.1 Σ∆-converter
    2.1.1 Basic principles of a Σ∆-converter
    2.1.2 The Σ∆-chip
  2.2 Xilinx Virtex-5 LX110T FPGA
  2.3 Virtex-5 FPGA RocketIO GTP transceiver
    2.3.1 GTP Transmitter (TX)
    2.3.2 GTP Receiver (RX)
    2.3.3 Shared PMA PLL
    2.3.4 Clock domains
  2.4 Development board
    2.4.1 Superclock module
    2.4.2 Serial interface (RS232)
  2.5 ISE
3 Implementations
  3.1 The New Project Wizard
  3.2 The RocketIO GTP Transceiver Wizard
    3.2.1 Generated files
    3.2.2 Clock Connections
  3.3 Loopbacks
  3.4 Near-End PCS Loopback
    3.4.1 PRBS
    3.4.2 Own produced data
  3.5 Near-End PMA Loopback
    3.5.1 Comma detect
  3.6 Far-End PMA Loopback
  3.7 Structure of each project
  3.8 Sending and receiving data
    3.8.1 Sending and receiving data over one link
    3.8.2 Sending and receiving data over two links
    3.8.3 Sending and receiving data over four links
  3.9 Protocol between PC and FPGA
    3.9.1 Prerequisite for creating the protocols
    3.9.2 Communication between PC and FPGA
    3.9.3 Protocol for eight bit data over one line
    3.9.4 Protocol for ten bit data over one line
    3.9.5 Protocol for eight bit data over 4 lines
    3.9.6 Protocol for ten bit data over 4 lines
  3.10 MatLab
  3.11 Channel bonding
  3.12 Two dual tiles
4 Results
  4.1 Design of the transceivers
  4.2 Design of the FIFO registers
  4.3 MatLab
  4.4 Design of the protocol
  4.5 Complete solution
5 Discussion
6 Future work
7 Conclusion
8 References


Chapter 1

Introduction

1.1 The task

The main purpose of this thesis is to make it possible for data from a Σ∆ analog-to-digital converter (in the following called Σ∆-converter), sent over four channels at 2.4 Gbps, to be received by an FPGA (Field Programmable Gate Array) and then transmitted to a PC (Personal Computer) for analysis. The data in the four channels form 4 bit words read in the vertical direction, see Figure 1.1. These 4 bit words are the ones to be analyzed in the PC.

A subproject was to see whether ten bit words could be received and fed into the FPGA logic. This would increase the flexibility in how the data can be received (see Background). The chip with the Σ∆-converter was not available while the thesis work was being carried out. Instead, the transmitter of the transceiver in the FPGA was used for sending data.

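[Figure: four channels at 2.4 Gbps enter the Virtex 5, which passes the data on to the PC. Example data, one row per channel:]

1 0 0 1 0 1 0 1 1 1 1
1 1 1 1 0 0 1 1 1 1 1
1 0 1 1 1 1 0 0 0 1 0
0 0 0 1 0 1 0 1 0 1 1

Figure 1.1. Block diagram illustrating the task of this work.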

At this point the idea was to receive the data synchronously in the four receivers of the dual tiles, combine the data into four bit words in the FPGA logic and build a protocol to transfer these four bit words to the PC for analysis.
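The intended word forming can be sketched in Python. This is a minimal illustration under stated assumptions, not the FPGA logic of the thesis: the function name `form_words` is hypothetical, and channel 0 (the top row of Figure 1.1) is taken as the most significant bit.

```python
def form_words(channels):
    """Combine equally long per-channel bit lists into words, one bit per channel.

    Word k is built from sample k of every channel; channel 0 is assumed to be
    the most significant bit.
    """
    lengths = {len(ch) for ch in channels}
    if len(lengths) != 1:
        raise ValueError("all channels must hold the same number of samples")
    n_samples = lengths.pop()
    return ["".join(str(ch[k]) for ch in channels) for k in range(n_samples)]


# The four rows of Figure 1.1, one list per channel.
channels = [
    [1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1],
]
words = form_words(channels)
print(words[:6])  # ['1110', '0100', '0110', '1111', '0010', '1011']
```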


A Virtex-5 FPGA on the LX110T platform was available. This platform contains RocketIO GTP (Giga Transceiver Peripheral) transceivers closely integrated with the FPGA logic.

1.2 Structure of the GTP transceiver

Figure 1.2 shows a block diagram of the transceiver, which contains one transmitter and one receiver. Data are transmitted and received serially. In the receiver, the data are first parallelized in the SIPO (Serial In, Parallel Out) block before being transferred to the FPGA logic. Data sent from within the FPGA (from the PRBS (Pseudo Random Bit Sequence) generator or the FPGA logic) arrive in parallel at the PISO (Parallel In, Serial Out) block, where they are serialized before being sent. In this project, serial data are transmitted and received at a rate of 2.4 Gbps. The parallel clocks inside the transceiver run at 240 MHz (sending and receiving 10 bit data) or 300 MHz (sending and receiving 8 bit data). Most of the FPGA logic, driven by a separate clock oscillator on the board (see Background), runs at 50 MHz.
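The parallel clock rates quoted above follow from the serial line rate by simple division. This assumes that every serial bit is a data bit, so that one parallel word of the chosen width is transferred per clock cycle. A small Python check (the function name is hypothetical):

```python
LINE_RATE_BPS = 2.4e9  # serial line rate per channel


def parallel_clock_hz(line_rate_bps, data_width_bits):
    """Parallel clock that moves one data_width_bits word per cycle at line_rate_bps."""
    return line_rate_bps / data_width_bits


for width in (10, 8):
    mhz = parallel_clock_hz(LINE_RATE_BPS, width) / 1e6
    print(f"{width} bit data width: {mhz:.0f} MHz")
# 10 bit data width: 240 MHz
# 8 bit data width: 300 MHz
```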

Figure 1.2. Block diagram of the structure of the transceiver [6]. The three loopbacks used in this work (near-end PCS loopback, near-end PMA loopback and far-end PMA loopback) are indicated (see section 1.3).

Clocking architecture

Each transceiver is served by a PLL (Phase Locked Loop) fed by a reference clock. The PLL provides both the transmitter and the receiver with two clocks, one serial and one parallel. It also provides a second transceiver with serial and parallel clocks, which explains why it is called a shared PLL (shared between the two transceivers forming a dual tile). The parallel clocks produced by the PLL are called XCLK in the block diagram (see Figure 1.2). The clocks in the ’PCS parallel clock’ section and the ’FPGA parallel’ section (the user clocks) of the receiver and transmitter are sourced either by the XCLK in the PMA parallel section or by an independent reference clock (REFCLKOUT).

1.3 Tests used to understand the functioning of the transceiver

To understand how the transceiver works, three of the four loopbacks suggested by Xilinx were used. The first loopback, the ’Near-end PCS (Physical Coding Sublayer) loopback’, tests the parallel data section. In this loopback, the XCLK at the transmitter side (TXOUTCLK) was synchronized with RXUSRCLK, since the XCLK at the receiver side (RXRECCLK) is not necessarily in phase with TXOUTCLK, see figure 1.3.

[Figure: signals TXOUTCLK, DATA, RXRECCLK and RXUSRCLK]

Figure 1.3. The figure shows how the synchronization works.

The second loopback, the ’Near-end PMA (Physical Medium Attachment) loopback’, also tests the section where the data is serialized. The third loopback, the ’Far-end PMA loopback’, includes the whole transceiver, sending and receiving the data at 2.4 Gbps. The loopbacks are indicated in the block diagram of the transceiver in figure 1.2.

1.4 Communication between the FPGA and the PC - development of a protocol

Received data should be sent to the PC for analysis using the serial interface RS232 (Recommended Standard 232). MatLab was chosen for the communication on the PC side. The MatLab program should communicate with the FPGA via a protocol that describes how data into and out of the FPGA are handled. This communication requires a UART (Universal Asynchronous Receiver/Transmitter), which converts received serial data to parallel data and, in the other direction, converts parallel data to serial data for transmission. Since there was no UART available on the FPGA board, a UART first had to be implemented in the FPGA.

The protocol was developed in several steps. The first protocol was developed to find out how the communication between the FPGA and the PC worked via the UART. The PC sends commands to the protocol, which responds by sending one or two bytes back to the PC (for more information, see 3.9).

At this stage, it had been noticed that there was a skew between the lines carrying the data (see 3.8.3, skew registration). Because of this skew, the original idea of how the four bit words should be formed and transferred to the PC could not work. For instance, if the data in the third line in Figure 1.1 arrive one sample earlier than in the other lines, the correct four bit word ’1011’ will instead be ’1001’.
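The effect of a one-sample skew on the word forming can be reproduced in a few lines of Python. The data are the rows of Figure 1.1. Shifting the third channel one position to the left models that channel arriving one sample early. The function name is hypothetical.

```python
def form_words(channels):
    """Build one word per sample index, taking one bit from each channel."""
    n_samples = min(len(ch) for ch in channels)
    return ["".join(str(ch[k]) for ch in channels) for k in range(n_samples)]


channels = [
    [1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1],
]

# Third channel (index 2) one sample early: its sample k+1 is seen at time k.
skewed = [channels[0], channels[1], channels[2][1:], channels[3]]

print(form_words(channels)[5])  # 1011 (correct word)
print(form_words(skewed)[5])    # 1001 (third bit taken from the wrong sample)
```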

Because serial communication over RS232 transfers data at a very low speed (11.52 kHz), a memory is needed so that no information is lost. To solve the skew problem mentioned above, FIFO (First In First Out) memories were created for each of the channels. Because the protocol inside the FPGA is clocked by a 50 MHz oscillator and the data from the receivers arrive at 240 or 300 MHz, two-port FIFOs were needed. A two-port FIFO has independent write and read clocks, so that information can be written into the FIFO at one rate and read from it at another. This makes it possible to read out from the FIFO at only 50 MHz without losing information, provided that no more data are captured than the FIFO can hold. Such FIFOs were generated using ISE (Integrated Software Environment), a software tool provided by Xilinx. The data from each memory are transferred to the PC, where the bits from the memories are combined into the correct four bit words (see 4.2).
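A behavioural Python model can make the role of the two-port FIFO concrete. Samples are written at the receiver rate and read at the 50 MHz protocol rate. Writing stops when the FIFO is full, so nothing is overwritten. This is a sketch under assumptions, not the ISE-generated FIFO core. The class and function names, the depth and the integer "credit" scheme used to approximate the 240:50 clock ratio are all hypothetical.

```python
from collections import deque


class DualClockFifo:
    """Behavioural model of a two-port FIFO of fixed depth.

    write() stands for one active edge of the write clock and read() for one
    edge of the independent read clock.
    """

    def __init__(self, depth):
        self.depth = depth
        self._buf = deque()

    @property
    def full(self):
        return len(self._buf) >= self.depth

    @property
    def empty(self):
        return not self._buf

    def write(self, word):
        if self.full:
            return False  # refuse the write; stored data is never overwritten
        self._buf.append(word)
        return True

    def read(self):
        return None if self.empty else self._buf.popleft()


def capture(samples, depth, f_write_mhz=240, f_read_mhz=50):
    """Write samples at f_write while reading at f_read; stop when the FIFO is full.

    Returns the words read out (in order) and the number of samples captured.
    """
    fifo = DualClockFifo(depth)
    out, captured, credit = [], 0, 0
    for sample in samples:
        if not fifo.write(sample):
            break  # full: end the capture instead of losing data
        captured += 1
        credit += f_read_mhz
        while credit >= f_write_mhz:  # f_read read edges per f_write write edges
            credit -= f_write_mhz
            word = fifo.read()
            if word is not None:
                out.append(word)
    while not fifo.empty:  # drain the rest with the read clock after capture stops
        out.append(fifo.read())
    return out, captured


words, n = capture(list(range(1000)), depth=64)
print(n, words == list(range(n)))
```

Because reading continues during the capture, slightly more samples than the FIFO depth can be captured before the FIFO is full.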

To start with, a protocol was created that can transfer eight bit data over one link (received from one receiver of the dual tile) to the PC. A similar protocol was then created for the transfer of ten bit data words. The main difference from the protocol for eight bit data is that this protocol has to send each ten bit word to the PC split across two bytes. It was successfully shown that a ten bit word received from the dual tile remains ten bits when fed into the FPGA logic.
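Sending a ten bit word over a byte-oriented serial link requires splitting it into two bytes and joining them again on the PC. The byte layout below is an assumption for illustration: the two most significant bits go in the first byte and the remaining eight in the second. The layout used by the thesis protocol is not given here, and the function names are hypothetical.

```python
def split_10bit(word):
    """Split a 10 bit word into (first_byte, second_byte) for an 8 bit serial link.

    Assumed layout: bits 9..8 in the low bits of the first byte,
    bits 7..0 in the second byte.
    """
    if not 0 <= word < 1 << 10:
        raise ValueError("not a 10 bit value")
    return (word >> 8) & 0x03, word & 0xFF


def join_10bit(first, second):
    """Reassemble the 10 bit word on the PC side."""
    return ((first & 0x03) << 8) | (second & 0xFF)


print(split_10bit(0b1011001110))  # (2, 206)
```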

The task could finally be completed with four receivers combined with four two-port FIFOs and the associated protocol. Once the task had been completed, it was possible to see that no disturbances existed between the links. Furthermore, the skew that had been noticed could now be measured and was found to be up to three samples.

A test was also made to eliminate the skew between the channels using channel bonding. This was done with a counter producing the data. The test was successful for its purpose, and channel bonding could later be useful if the skew is caused by start-up problems. It cannot be used with the present Σ∆-chip, which is still to arrive, since this chip cannot generate suitable test signals for measuring the skew and making channel bonding work. The idea is that this could be possible with a new Σ∆-chip that includes a counter generating data for the four channels, so that the four receivers can be synchronized with channel bonding. After this, the Σ∆-converter is connected to the receivers instead of the counter.

1.5 Abbreviations

ADC Analogue to Digital Converter

ASIC Application Specific Integrated Circuit

CDR Clock Data Recovery

CML Current Mode Logic

CMOS Complementary Metal Oxide Semiconductor

DAC Digital to Analogue Converter

DIP Dual In-line Package

FIFO First In First Out

GTP Giga Transceiver Peripheral

HDL Hardware Description Language

ISE Integrated Software Environment

LED Light Emitting Diode

LVCMOS Low Voltage Complementary Metal Oxide Semiconductor

LVDS Low Voltage Differential Signal

OSR Over Sampling Ratio

PC Personal Computer

PCS Physical Coding Sublayer

PLL Phase Locked Loop

PMA Physical Medium Attachment

PRBS Pseudo Random Bit Sequences

RS232 Recommended Standard 232

SMA Sub Miniature version A

UART Universal Asynchronous Receiver/Transmitter

UCF User Constraint File

USB Universal Serial Bus

VHDL VHSIC (Very High Speed Integrated Circuit) Hardware Description Language


Chapter 2

Background

This chapter is structured as follows. The principles of a Σ∆-converter and the Σ∆-chip are described in section 2.1. The Virtex-5 FPGA and the RocketIO GTP used in the project are described in Sections 2.2 and 2.3. The associated development board is described in Section 2.4, followed by a short description of the ISE software package in Section 2.5. Sources of information are the references [1-9].

2.1 ∑∆-converter

2.1.1 Basic principles of a ∑∆-converter

[Figure: block diagram with the analog input, a summing node, the ADC and the digital output, and a DAC in the feedback path; inset power spectra A, B and C with frequency markers fs/2 and M fs/2]

Figure 2.1. Block diagram of a Σ∆-ADC (Analogue-to-Digital Converter). The inset diagrams A, B and C show the power of the signal (represented by a single frequency) and the noise distribution using A: the Nyquist sampling rate, fs, B: oversampling with sampling rate Mfs, and C: oversampling with a feedback loop.

The Nyquist theorem states that when analogue signals are digitized, the sampling frequency (the Nyquist rate, fs) must be at least twice the highest frequency (fs/2) in the analogue signal if no information in the signal is to be lost. The Σ∆-converter uses oversampling, which means that it samples the analogue signal at a much higher frequency (Mfs) than the Nyquist theorem requires (see figure 2.1).

As a result of this oversampling, the quantization noise is spread over a wider frequency range (up to Mfs/2), so that the noise level within the signal frequency band is reduced (see diagrams A and B in figure 2.1). The feedback loop with the DAC (Digital to Analogue Converter) acts as a low pass filter for the signal and a high pass filter for the noise. In this way, the noise is pushed towards higher frequencies and the noise in the signal frequency band is reduced further (see diagram C in figure 2.1). By feeding the digital output through a low pass filter, the SNR (Signal to Noise Ratio) is drastically improved compared to case A (figure 2.1). Finally, the low pass filtered signal is downsampled to get back to the Nyquist rate (the process of low pass filtering and downsampling is called decimation).
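The principle can be illustrated with a first-order, single-bit Σ∆ modulator in Python. This is a deliberate simplification, since the converter on the chip uses a 4 bit quantizer. The function names are hypothetical, and a plain block average stands in for a proper decimation filter. For a constant input, the average of the ±1 output levels over a block closely tracks the input, which is the low pass filtering and downsampling step described above.

```python
def sigma_delta_1st_order(samples):
    """First-order sigma-delta modulator with a 1 bit quantizer.

    Inputs are assumed to lie in [-1, 1]; the 1 bit DAC in the feedback
    path outputs +1 for a 1 and -1 for a 0.
    """
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback          # integrate input minus fed-back DAC level
        bit = 1 if integrator >= 0 else 0   # 1 bit quantizer
        feedback = 1.0 if bit else -1.0     # 1 bit DAC
        bits.append(bit)
    return bits


def decimate(bits, m):
    """Average blocks of m output levels (crude low pass filter), one value per block."""
    levels = [1.0 if b else -1.0 for b in bits]
    return [sum(levels[i:i + m]) / m for i in range(0, len(levels) - m + 1, m)]


bits = sigma_delta_1st_order([0.25] * 4000)  # oversampled DC input
print(decimate(bits, 400))  # every block average is close to 0.25
```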

2.1.2 The ∑∆-chip

[Figure: merged wideband test chip. An LNA and mixer (LO+/LO-, 0.5-6 GHz) feed I and Q paths, each with a low pass filter, a Gm stage and a 4 bit ΣΔ ADC clocked at 2.4 GHz by a clocking circuit. XOR gates select what is sent on the 2.4 Gbps outputs, which drive the FPGA over 50 Ω lines. IF band: 0-10 MHz.]

Figure 2.2: Block diagram of the Σ∆-chip with two parallel 4 bit Σ∆-converters. The chip contains XOR gates that make it possible to choose whether the CLK signals, the data from the Σ∆-converters or the PRBS (Pseudo Random Bit Sequence) should be sent. The data are captured by an FPGA, low pass filtered and downsampled [4].

The aim of the Σ∆-chip in figure 2.2, which underlies the task in this thesis, is to receive signals at a bandwidth of 20 MHz. By using a sampling frequency of 2.4 GHz, an oversampling ratio (OSR) of

$$\mathrm{OSR} = \frac{2400\ \mathrm{MHz}}{2 \cdot 10\ \mathrm{MHz}} = 120$$

is achieved. This means that a quantizer accuracy of 4 bits is "converted" to an accuracy of 13.5 bits:

$$N_{\mathrm{eff}} = 4 + \frac{1}{2}\log_2\!\left(\frac{3\,\mathrm{OSR}^3}{\pi^2}\right) \approx 13.5$$

A problem is that the data rate from the quantizer is very high, in this case 2.4 Gwords/s with 4 bit words. Very fast digital logic is therefore needed to capture this signal. The objective of this thesis is to use and configure an FPGA with high speed ports to capture these data and transfer them to a PC for analysis. The analysis aims to test the Σ∆-receiver system and to investigate possible algorithms for correcting errors in, for example, the DAC (Digital-to-Analogue Converter) in the system [1].

2.2 Xilinx Virtex-5 LX110T FPGA

General description of FPGA

An FPGA is a semiconductor device containing ’logic blocks’ with interconnects, which can be programmed by the customer to create the logical functions needed for the purpose. This is why it is called ’field programmable’. FPGAs are usually slower and draw more power than ASICs (Application Specific Integrated Circuits), which are designed for a particular application. Their advantage is flexibility, since they can be reprogrammed to suit different designs. In modern devices, the logic blocks and interconnects of traditional FPGAs are combined with embedded blocks such as memories, microprocessors and related peripherals.

Xilinx Virtex-5 LXT FPGA

The FPGAs in the Xilinx Virtex-5 family are specially designed for high speed applications, which is why this family is used in this work. Members of the family include individual features to suit different applications. The Virtex-5 LXT, for example, contains RocketIO GTP transceivers tightly integrated with the FPGA logic. The LX110T version used in this work contains 16 GTPs (8 dual tiles).

The Virtex-5 family was introduced in 2006. It is faster than its predecessor, the Virtex-4. One reason for this is that advances in CMOS (Complementary Metal Oxide Semiconductor) technology allowed the gate length to be decreased from 90 nm to 65 nm, resulting in shorter switching times. A drawback of the thinner oxide layer is that it is associated with increased leakage currents. The 65 nm technology is therefore only used in those parts of the logic that are critical to speed performance, and the 90 nm technology is retained for those that are not [2].


2.3 Virtex-5 FPGA RocketIO GTP transceiver

The GTP transceivers are organized as dual tiles. In the dual tile configuration, two transceivers share important functions. Each transceiver contains a transmitter (TX) and a receiver (RX). Among the shared functions are the generation of a high-speed serial clock and the resets. The dual tile configuration allows the TX and RX of both transceivers to share a PLL, see Figure 2.3. The PLL reduces the jitter, and the shared configuration reduces the size and power consumption of the device.

[Figure: the differential reference clock MGTREFCLKP/MGTREFCLKN passes through an IBUFDS buffer to form CLKIN, which drives the shared PMA PLL serving the TX and RX of GTP0 and GTP1]

Figure 2.3: Organization of the dual tile with the two transceivers GTP0 and GTP1, each containing a transmitter (TX) and a receiver (RX). From the SMA (Sub Miniature version A) contacts (see 2.4), a differential clock signal passes through the IBUFDS buffer to form the common clock signal, CLKIN, used as the reference clock for the PLL.

Correct clocking and reset behavior are critical for any GTP transceiver design. Using a high-quality crystal oscillator as reference clock is therefore essential for good performance. The reference clock feeding one dual tile can also drive neighboring dual tiles. However, to keep the jitter within acceptable margins, no more than three dual tiles above and three dual tiles below the sourcing tile may be used. The number of dual tiles sourced by a common reference clock must therefore not exceed seven (see Figure 2.4). Xilinx recommends an external clock as reference clock, since a clock from inside the FPGA may, depending on the design, cause increased jitter.
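The sourcing rule can be expressed as a small Python function. Numbering the dual tiles 0 to 7 in physical order along the device is an assumption, since Xilinx uses its own location-based names. The function name is hypothetical.

```python
def sourced_tiles(source_tile, n_tiles=8, max_distance=3):
    """Dual tiles that may be clocked from the reference clock entering source_tile.

    At most max_distance tiles above and below the source are allowed, so at
    most 2 * max_distance + 1 = 7 tiles share one reference clock.
    """
    if not 0 <= source_tile < n_tiles:
        raise ValueError("source tile out of range")
    first = max(0, source_tile - max_distance)
    last = min(n_tiles - 1, source_tile + max_distance)
    return list(range(first, last + 1))


print(sourced_tiles(3))  # [0, 1, 2, 3, 4, 5, 6]
print(sourced_tiles(0))  # [0, 1, 2, 3]
```

With eight dual tiles, no single reference clock input can reach all of them under this rule.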


Figure 2.4: A reference clock (CLKIN) can feed up to seven dual tiles without creating unacceptable jitter [6].

The GTP transceiver consists of the PCS and PMA blocks in the transmitter (TX) and receiver (RX) parts of the dual tiles. The TX and RX internal data paths are 8 or 10 bits wide. The components of these blocks are explained below.

2.3.1 GTP Transmitter (TX)

The GTP transmitter includes several blocks, with options to choose different paths for the data. This section describes the blocks of interest to this project.


Figure 2.5: Block diagram of the transmitter [6]

TX Driver

The TX Driver is a high speed output buffer that transforms single-ended signals into differential signals. It also includes ’Differential control’ and ’Configurable termination impedance’ to achieve the highest signal quality in every situation. The ’Configurable termination impedance’ is not used in this project. The ’Differential control’ sets the amplitude of the differential swing that the signal needs to reach the receiver.

Pre-emphasis

Pre-emphasis control is used to improve the signal. During transmission between the transmitter and the receiver, high frequencies are attenuated more than lower frequencies. To compensate for this, pre-emphasis can be used, which decreases the amplitude of the low frequency components relative to the high frequency ones. The Σ∆-chip is equipped with this option, so it will also be used in this configuration. If the chip did not have this option, the problem could be solved using the RX EQ block (see 2.3.2).

PMA PLL Driver

This driver provides the transmitter with a high quality, low jitter clock signal. For more details about the PLL, see 2.3.3.

PISO

The PISO (Parallel In Serial Out) block transforms the signal from parallel inside the transmitter to serial when it is sent. This is done because the signal is less affected by external disturbances in serial than in parallel mode.
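The following is a behavioural Python sketch of the PISO and its counterpart, the SIPO in the receiver. Sending the least significant bit first is an assumption made for the example, and the function names are hypothetical. `sipo()` only recovers the original words when it starts on a word boundary. Finding that boundary is the job of the comma detection in the receiver.

```python
def piso(words, width):
    """Parallel in, serial out: emit the bits of each word, LSB first."""
    return [(w >> i) & 1 for w in words for i in range(width)]


def sipo(bits, width):
    """Serial in, parallel out: regroup a bit stream into width-bit words."""
    if len(bits) % width:
        raise ValueError("bit count is not a multiple of the word width")
    return [sum(bit << i for i, bit in enumerate(bits[k:k + width]))
            for k in range(0, len(bits), width)]


serial = piso([0x3FA, 0x005], width=10)
print(serial[:10])                               # [0, 1, 0, 1, 1, 1, 1, 1, 1, 1]
print(sipo(serial, width=10) == [0x3FA, 0x005])  # True
```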


Polarity

The TX polarity control can be used when the hardware would otherwise cause problems in sending the signal, for example when the two lines of a differential pair have been swapped. It then inverts the polarity of the signal.

Phase Adjust FIFO And Oversampling

Some circuit is needed between the PMACLK and TXUSRCLK clock domains (see figure 2.8) to resolve phase differences. Xilinx provides two ways of doing this: the ’TX-buffer’ or the ’Phase Alignment circuit’. The TX-buffer is easy to use and is required when using oversampling, but it gives no benefit in reducing skew between GTP transceivers. If low latency is critical, the TX-buffer must be bypassed. The phase alignment circuit requires extra logic and imposes stricter clocking requirements; for example, TXOUTCLK (see figure 2.8, XCLK) cannot be used. If more than one dual tile is in use and they have the same line rate, the phase alignment circuit can reduce the skew between them. Oversampling is not used in this project.

PRBS Generate

PRBS stands for ’Pseudo Random Bit Sequence’: a sequence of bits that looks random but is in fact generated by an algorithm and repeats with a given period. This makes it possible to send the sequence and receive it at another location, knowing in advance what data will be received. PRBS sequences are widely used in industry to check the integrity of data links. Three standard patterns are available. In this project, the 2^23 − 1 standard is used. The transmitter contains one PRBS generator (PRBS Generate).
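A PRBS generator is typically a linear feedback shift register (LFSR). The Python sketch below uses taps at positions 23 and 18, the tap pair of the x^23 + x^18 + 1 polynomial commonly used for PRBS-23. The bit ordering, seed and any output inversion of the GTP's built-in generator are not modelled, so its exact output may differ. The function name is hypothetical.

```python
from collections import deque


def prbs(taps, n_bits, seed=None):
    """Fibonacci LFSR producing s[n] = s[n - a] XOR s[n - b] for taps (a, b), a > b.

    taps=(23, 18) gives a PRBS-23 sequence (period 2**23 - 1); taps=(7, 6)
    gives a PRBS-7 sequence (period 127). The seed must not be all zeros.
    """
    a, b = taps
    seed = [1] * a if seed is None else list(seed)
    if len(seed) != a or not any(seed):
        raise ValueError("seed must be a non-zero list of length a")
    state = deque(seed, maxlen=a)  # state[i] holds s[n - a + i]
    out = []
    for _ in range(n_bits):
        bit = state[0] ^ state[a - b]  # s[n - a] XOR s[n - b]
        out.append(bit)
        state.append(bit)  # maxlen discards the oldest bit
    return out


prbs23 = prbs((23, 18), 1000)  # first 1000 bits of a 2**23 - 1 pattern
```

The short PRBS-7 variant is convenient for checking the generator: its output repeats after 127 bits and contains 64 ones per period.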

FPGA logic

This block represents the Virtex-5 FPGA, where the logic is written.

2.3.2 GTP Receiver (RX)

The blocks contained in the GTP receiver and used in this work are described below and illustrated in figure 2.6.

Figure 2.6: Block diagram of the receiver [6]


RX EQ

The equalizer uses a separate receive buffer to capture the high frequencies, amplify them and add them to the original signal. It serves the same purpose as pre-emphasis in the transmitter: to compensate for high frequencies being attenuated more than lower frequencies during transmission. The EQ block also contains the CML (Current Mode Logic) input of the receiver. Its termination impedance can be adjusted to match the impedance of the transmission line carrying the incoming signal, to avoid reflections in the system. The incoming differential transmission lines are both internally terminated with adjustable resistors (50/75 ohm, 100/150 ohm differential), and either DC or AC coupling can be chosen. DC coupling is preferable when the differential signals and common mode voltages of the connected devices match each other. If they do not, AC coupling is usually used, normally achieved by putting a capacitor in the signal path. To make DC coupling work, the LVDS (Low Voltage Differential Signal) driver must see a 100 ohm termination. Internal AC coupling is an option, used when DC coupling is chosen but the RX termination is set to GND (Ground) (non-standard termination voltage).

RX CDR

When the data is received, its embedded clock has to be recovered. The CDR (Clock and Data Recovery) block does this. It takes the divided clock from the PLL, PLL_RXDIVSEL_OUT (see figure 2.7), and adjusts it to the phase and frequency of the incoming data. For this to succeed, the line rate of the incoming data may differ by at most 1000 ppm (parts per million) from the line rate of the receiver. The incoming data stream must also contain a sufficient number of transitions. Clock recovery contributes to reducing jitter in the incoming data.
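The 1000 ppm limit can be made concrete with some arithmetic. The sketch below compares an incoming line rate with the receiver's own line rate. The helper names are hypothetical, and the model ignores the transition-density requirement, which it cannot see. At 2.4 Gbps, the line rate used later in this work, 1000 ppm corresponds to 2.4 Mbps.

```python
def ppm_offset(rate: float, nominal: float) -> float:
    """Frequency offset of `rate` relative to `nominal`, in parts per million."""
    return (rate - nominal) / nominal * 1e6


def within_cdr_range(incoming_bps: float, local_bps: float, limit_ppm: float = 1000.0) -> bool:
    """True if the rate difference is inside the CDR tracking range quoted above.

    Only the frequency condition is modelled; enough data transitions are also required.
    """
    return abs(ppm_offset(incoming_bps, local_bps)) <= limit_ppm


line_rate = 2.4e9
print(line_rate * 1000e-6)                                # allowed deviation in bit/s, about 2.4e6
print(within_cdr_range(line_rate + 2.0e6, line_rate))     # about 833 ppm -> True
print(within_cdr_range(line_rate + 3.0e6, line_rate))     # 1250 ppm -> False
```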

PMA PLL Driver

This driver provides the receiver with a high-quality, low-jitter clock signal. For more details about the PLL, see 2.3.3.

SIPO

The SIPO (Serial In Parallel Out) block converts the received serial data to parallel data inside the receiver.

PRBS Check

When the PRBS pattern is used to test a link, the receiver side must have a checker (PRBS Check) that verifies that the data sent from the transmitter is received correctly. The chip containing the Σ∆-converter also includes a PRBS generator that sends this pattern.

Polarity

If polarity inversion is used on the transmitter side, the data has to be inverted once more in the receiver to restore its original form.


Comma detection

In the PCS section of the transceiver (see figures 2.5 and 2.6), data is parallel. In the transmitter, the parallel words go through the PISO (Parallel In Serial Out) block and are sent serially, because serial data is less affected by disturbances outside the FPGA. The data arrives at the receiver in serial form and is converted back to parallel inside the receiver (because parallel processing is faster). This is done in the PMA section, see figure 2.6. The data is then parallel, but the receiver has no way of knowing where the words start or end. To solve this, the transmitter can send a predefined pattern at regular intervals. This pattern is called a ’comma’, and the logic block that searches for it is called the comma detection block, see figure 2.6. When the comma is found, the receiver knows where a word of the chosen length starts.

RX Elastic Buffer

On the receiver side, either the phase alignment circuit or the RX elastic buffer is used to resolve phase differences between the PMACLK and RXUSRCLK domains, see figure 2.8. The RX elastic buffer can also correct frequency differences using clock correction. However, clock correction does not work with raw, unformatted data such as the output of the Σ∆-converter. When clock correction is not used, the transmitter and receiver should preferably use the same frequency source to avoid problems with the RX elastic buffer. Alternatively, frequency differences can be resolved with the phase alignment circuit, by letting the recovered clock (RXRECCLK) source RXUSRCLK. The RX elastic buffer can be used with 8 or 10 bit words, whereas the phase alignment circuit requires 10 bit words. The RX elastic buffer also offers channel bonding, which synchronizes data between different lanes.
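Simple arithmetic shows why a common frequency source matters. Suppose the write side (recovered clock) and the read side (RXUSRCLK) of the buffer differ by some number of ppm, and no clock correction is applied. The fill level then drifts by one word every 1/(f · Δppm · 10^-6) seconds. The sketch below computes how long it takes to use up a given margin of words. The margin is a hypothetical parameter, not the actual depth of the GTP RX elastic buffer, and the 100 ppm offset in the example is also assumed.

```python
def seconds_until_buffer_slip(word_rate_hz: float, offset_ppm: float, margin_words: int) -> float:
    """Time until an elastic buffer without clock correction over- or underflows.

    word_rate_hz : parallel (word) clock rate, e.g. RXUSRCLK
    offset_ppm   : frequency difference between the write and read clocks
    margin_words : free words between the nominal fill level and the buffer limit
                   (hypothetical value; the real depth is given in the transceiver user guide)
    """
    if offset_ppm == 0:
        return float("inf")          # identical clocks never drift apart
    drift_words_per_s = word_rate_hz * abs(offset_ppm) * 1e-6
    return margin_words / drift_words_per_s


# 240 MHz word clock (10 bit words at 2.4 Gbps), assumed 100 ppm offset, 16 words of margin
print(seconds_until_buffer_slip(240e6, 100, 16))   # roughly 0.67 ms
```

Even a small offset therefore causes a slip within milliseconds. This is why the same reference clock, or the recovered clock, is needed when clock correction cannot be used.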

FPGA logic

This block represents the Virtex-5 FPGA fabric, where the user logic is implemented and where the information in the received data is stored and analyzed.

2.3.3 Shared PMA PLL

Figure 2.7: The function of the shared PMA PLL [6]


Each dual tile has one PMA PLL, shared by its two transceivers, which is driven by a high-quality clock, CLKIN, as seen in figure 2.3. The PLL produces the parallel and serial clocks for both transmitters and receivers in the dual tile. The parallel clocks are used in the PCS section of the transmitter. The frequency of the PLL clock is calculated as in equation 2.1.

$$\text{PLLclock} = \frac{\text{PLL\_DIVSEL\_FB}}{\text{PLL\_DIVSEL\_REF}} \cdot \text{CLKIN} \tag{2.1}$$

For the receiver, the PLL clock rate is divided by PLL_RXDIVSEL_OUT_0 for dual tile number zero (GTP0) and by PLL_RXDIVSEL_OUT_1 for dual tile number one (GTP1). The divider can take the values 1, 2 or 4. The parallel clock rates are obtained by dividing by W. W is 4 for an eight bit internal data width and 5 for a ten bit data width. This gives the following equation.

$$\text{RX\_Parallel\_Clock} = \frac{\text{PLLclock}}{\text{PLL\_RXDIVSEL\_OUT\_0/1} \cdot W} \tag{2.2}$$

The RX serial rate is obtained by multiplying by two instead of dividing by W. The data is clocked on both edges of the serial clock, which explains the multiplication by two. The result is therefore the serial line rate in bits per second.

$$\text{RX\_Serial\_Clock} = \frac{\text{PLLclock}}{\text{PLL\_RXDIVSEL\_OUT\_0/1}} \cdot 2 \tag{2.3}$$
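Equations 2.1-2.3 can be checked numerically with the sketch below. It uses exact fractions to avoid rounding. The divider values PLL_DIVSEL_FB = 5 and PLL_DIVSEL_REF = 1 are an assumption. They are chosen so that a 240 MHz reference (mentioned in section 3.2) gives the 1.2 GHz PLL clock also mentioned there, but the thesis does not state them. The function names are hypothetical.

```python
from fractions import Fraction

# W from the text: 4 for an 8 bit internal data width, 5 for 10 bit.
W_FOR_WIDTH = {8: 4, 10: 5}


def pll_clock(clkin_hz: int, divsel_fb: int, divsel_ref: int) -> Fraction:
    """Equation 2.1: PLLclock = PLL_DIVSEL_FB / PLL_DIVSEL_REF * CLKIN."""
    return Fraction(divsel_fb, divsel_ref) * clkin_hz


def rx_clocks(pll_hz, rxdivsel_out: int, data_width: int):
    """Equations 2.2 and 2.3: (RX parallel clock, RX serial rate) for one receiver."""
    if rxdivsel_out not in (1, 2, 4):
        raise ValueError("PLL_RXDIVSEL_OUT must be 1, 2 or 4")
    if data_width not in W_FOR_WIDTH:
        raise ValueError("internal data width must be 8 or 10 bits")
    pll = Fraction(pll_hz)
    parallel = pll / (rxdivsel_out * W_FOR_WIDTH[data_width])
    serial = pll / rxdivsel_out * 2   # data on both edges of the serial clock
    return parallel, serial


# Assumed example: 240 MHz CLKIN with PLL_DIVSEL_FB = 5, PLL_DIVSEL_REF = 1.
pll = pll_clock(240_000_000, 5, 1)
parallel, serial = rx_clocks(pll, 1, 10)
print(int(pll), int(parallel), int(serial))   # 1200000000 240000000 2400000000
```

The ratio between the serial rate and the parallel clock is 2W, i.e. 8 or 10. This equals the number of bits per parallel word.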

The transmitter section uses the same PLL clock as the RX section (equation 2.1). In both transmitters, the PLL clock rate is first divided by PLL_TXDIVSEL_COMM_OUT, which can take the values 1, 2 and 4. This rate is then divided by PLL_TXDIVSEL_OUT_0 for dual tile number zero and by PLL_TXDIVSEL_OUT_1 for dual tile number one, in both cases with the possible values 1, 2 and 4. The parallel clock rate is then obtained by dividing by W, which gives equation 2.4.

$$\text{TX\_Parallel\_Clock} = \frac{\text{PLLclock}}{\text{PLL\_TXDIVSEL\_COMM\_OUT} \cdot \text{PLL\_TXDIVSEL\_OUT\_0/1} \cdot W} \tag{2.4}$$

The serial rate is obtained from equation 2.4 by replacing the division by W with a multiplication by two, which gives equation 2.5.

$$\text{TX\_Serial\_Clock} = \frac{\text{PLLclock} \cdot 2}{\text{PLL\_TXDIVSEL\_COMM\_OUT} \cdot \text{PLL\_TXDIVSEL\_OUT\_0/1}} \tag{2.5}$$


2.3.4 Clock domains

The transceiver is divided into separate clock domains for both the transmitter and the receiver (see figure 2.8). There are four clock domains, which are described below.

Figure 2.8: The clock domains [6]

1. ’Serial clock’, where a serial clock (TX/RX) generated by the PLL is running.

2. ’PMA parallel clock’ where a parallel clock generated from the PLL is run-ning. This clock is called XCLK in figure 2.8.

3. ’PCS parallel clock’, where the user clock RXUSRCLK/TXUSRCLK is running. It is generated inside the FPGA and sourced by XCLK.

4. ’FPGA parallel clock’, where user clock two, RXUSRCLK2/TXUSRCLK2, is running. It is generated inside the FPGA and sourced by XCLK.


Note

In the RX section, the frequency of XCLK in the PMA parallel clock domain must be sufficiently close to the RXUSRCLK rate in the PCS parallel clock domain. All phase differences between the two clock domains must be resolved, on both the TX and RX sides.

2.4 Development board

There are many ways to configure the transceiver. To help the designer explore suitable configurations, Xilinx provides a development board, the Virtex-5 ML523 [7] (in the following also called the test board). The test board used in this work is shown in figure 2.9. The components of the platform are marked with numbers 1-23. Those used in this work are described below.

Figure 2.9: The development board [7]

• 1) The power switch (on/off) of the FPGA board

• 3) The JTAG port. When connected to the ’USB Cable Pod’, it makes it possible to download the design into the FPGA so that the specified circuit is created.


• 4,6) Clock pair with differential clock signals produced by the Superclockmodule, which is the source for the reference clock, CLKIN, at the PLL (seefigure 2.7).

• 12) Socket for the 50MHz oscillator that feeds the logic inside the FPGA

• 15) 16 LEDs (Light Emitting Diodes) available to the user

• 16) 16 switches available to the user

• 17) four buttons available to the user

• 19) Reference clock, CLKIN, for the PLL (see figure 2.7)

• 20) Each marked rectangle contains the differential SMA (SubMiniature version A) connectors of one tile.

• 21) The connector for the RS-232 connection, for more explanation see 2.4.2

• 22) The Superclock module, for more explanation see 2.4.1

2.4.1 Superclock module

The reference clock for the dual tiles used in this work is produced by the ’SuperClock module’ on the board. It generates a low-noise clock from 49 MHz to 640 MHz. This clock is the CLKIN of the dual tiles, and two oscillators are available to configure it: XTAL0 at 19 MHz and XTAL1 at 25 MHz. The frequency of the selected oscillator is multiplied by the feedback value ’M’ (possible values 18, 22, 24, 25, 32 and 40). It is then divided by the divider value ’N’ (possible values 1, 2, 3, 4, 5, 8 and 10). The frequency of the reference clock is obtained from the formula below.

$$\text{CLKIN} = \text{XTAL} \cdot \frac{M}{N} \tag{2.6}$$
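Finding an M/N setting for a desired CLKIN can be done by enumerating equation 2.6 over the values listed above. The sketch below returns every exact match inside the stated 49-640 MHz range. It uses the oscillator frequencies as given in the text; if these are rounded, exact matching should be replaced by a tolerance. The DIP switch bit patterns of figures 2.10-2.12 are not modelled, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import List, Tuple

# Values as listed in the text (oscillator frequencies in MHz).
XTAL_MHZ = {"XTAL0": 19, "XTAL1": 25}
M_VALUES = (18, 22, 24, 25, 32, 40)
N_VALUES = (1, 2, 3, 4, 5, 8, 10)
OUTPUT_RANGE_MHZ = (49, 640)   # range stated for the SuperClock module


def clkin_mhz(xtal_mhz: int, m: int, n: int) -> Fraction:
    """Equation 2.6: CLKIN = XTAL * M / N (exact, in MHz)."""
    return Fraction(xtal_mhz * m, n)


def settings_for(target_mhz) -> List[Tuple[str, int, int]]:
    """All (oscillator, M, N) combinations that give exactly target_mhz."""
    target = Fraction(target_mhz)
    low, high = OUTPUT_RANGE_MHZ
    if not low <= target <= high:
        return []                      # outside what the module can generate
    return [
        (name, m, n)
        for (name, xtal), m, n in product(XTAL_MHZ.items(), M_VALUES, N_VALUES)
        if clkin_mhz(xtal, m, n) == target
    ]


print(settings_for(300))   # [('XTAL1', 24, 2)]
```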

Figure 2.10: The combination of SEL0 and SEL1 selects which oscillator is used [7].


Figure 2.11: The combination of M0, M1 and M2 determines the value of the multiplier [7].

Figure 2.12: The combination of N0, N1 and N2 determines the value of the divider [7].

Using the tables in figures 2.10-2.12, the red DIP (Dual In-line Package) switch in figure 2.9 can be set to obtain the CLKIN frequency needed by the PLL.

When the CLKIN signal is connected, it is available at the 100 ohm SMA differential clock pairs clk0, clk1 and clk2. Depending on the number of dual tiles engaged, one, two or three of these are used to feed the SMA reference clock inputs. Each of these eight reference clock inputs can feed up to seven dual tiles, see section 2.3.

2.4.2 Serial interface (RS232)

The RS232 (Recommended Standard 232) protocol [3] is a standard commonly used for transmissions between two units at speeds of up to 38.4 kbps over cables up to 30 meters long. The standard describes how the data is sent. There is always a start bit followed by seven or eight data bits (in this project eight data bits are used). These are followed by an optional parity bit and one or two stop bits. The start bit is always zero and the stop bit is always one. With even parity, the parity bit is chosen so that the total number of ones in the data bits plus the parity bit is even. If the parity bit does not have the expected value when it arrives at the receiver, the transmission has been distorted. There has to be a UART on both sides of the connection so that the data can be serialized and deserialized properly.
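The frame format can be illustrated with the sketch below. It builds one frame with a start bit, eight data bits, even parity and one stop bit, then detects a single flipped bit with the parity check. Sending the least significant data bit first is the usual UART convention and is assumed here. The function names are hypothetical; in this work the UART runs in the FPGA and on the PC side, not in Python.

```python
from typing import List


def uart_frame(byte: int, parity: str = "even", stop_bits: int = 1) -> List[int]:
    """Bits of one asynchronous serial frame: start, 8 data bits (LSB first), parity, stop."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in eight bits")
    data = [(byte >> i) & 1 for i in range(8)]      # least significant bit is sent first
    frame = [0] + data                                # start bit is always 0
    if parity == "even":
        frame.append(sum(data) % 2)                   # makes the total number of ones even
    elif parity != "none":
        raise ValueError("only 'even' and 'none' are modelled here")
    return frame + [1] * stop_bits                    # stop bit(s) are always 1


def even_parity_ok(frame: List[int]) -> bool:
    """Check an 11-bit frame (start, 8 data, even parity, 1 stop)."""
    data_and_parity = frame[1:10]
    return frame[0] == 0 and frame[-1] == 1 and sum(data_and_parity) % 2 == 0


frame = uart_frame(0x41)           # ASCII 'A'
print(frame)                       # [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
corrupted = frame.copy()
corrupted[3] ^= 1                  # flip one data bit
print(even_parity_ok(frame), even_parity_ok(corrupted))   # True False
```

A single bit error is always detected. Two flipped bits cancel each other out in the parity check, so parity cannot detect them.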

2.5 ISE

Xilinx offers a software package, ISE (version 10.1 is used in this work), which provides all the tools needed to implement the desired logic design in the FPGA. It contains built-in tools such as memory generators, and wizards such as the ’New Project Wizard’ and the ’RocketIO GTP Transceiver Wizard’ that help the user manage the configuration of the transceivers. It supports the HDLs (Hardware Description Languages) Verilog and VHDL.


Chapter 3

Implementations

Xilinx provides a set of wizards, with forms to be filled in, to help the user create a core for a given application. The wizards used are described in 3.1 (The New Project Wizard) and 3.2 (The RocketIO Transceiver Wizard). The loopbacks, their testing and the data used in the tests (PRBS and own produced data) are described in 3.3-3.6. The structure of each project is described in 3.7. Section 3.8 covers tests of sending and receiving data over up to 4 links, together with skew registration. In 3.9, a protocol for the communication between the FPGA and the PC is developed. Section 3.10 gives a short description of the MatLab program used for the serial communication between the FPGA and the PC. Section 3.11 describes the use of channel bonding to synchronize the signals through the channels. The chapter closes with 3.12, which tests the use of two instead of eight dual tiles.

MatLab is used for the communication between the PC and the FPGA, as illustrated in figure 3.1.

Figure 3.1. MatLab is used for communication between PC and FPGA (block diagram: a PC running MatLab, connected to the FPGA through serial communication)

3.1 The New Project Wizard

To start configuring a transceiver, a project has to be created in ISE. This project must have at least one source file (here a Verilog module), the top module, in which the transceiver is instantiated. The project is created by choosing ’New Project’ under File in the menu, which opens the ’New Project Wizard’. In the form, a project name, a location (where the project will be stored) and the source type of the top level must be chosen. Here, HDL is used, see figure 3.19.

The next important step is to specify the FPGA family, the device within the family, its package and its speed grade. The speed grade is chosen between -1 and -3, where -3 represents the fastest devices (the smallest gate delays). The chosen speed grade must match the actual FPGA, so that the delays assumed by the tools are not shorter than those of the device. Here, -1 was chosen. ModelSim was chosen as simulator (although simulation was never used), and Verilog as the language, see figure 3.20. Two pages follow: the first offers the option to create a new source, the second to add an existing source. (Both can also be done after the project has been created, under ’Source, New Source/Add Copy of Source’.) The following describes how a new source is created from the ’New Project Wizard’. After ’New Source’ has been selected, a new wizard, the ’New Source Wizard’, appears. It first asks which type of source should be created, here a Verilog module named ’Top’, see figure 3.21. After this selection, the inputs and outputs and the widths of these buses can be filled in. This is preferably done later, when the transceiver has been instantiated in the top module, by writing the inputs and outputs with their sizes at the head of the module code. Before that stage it is impossible to know all the inputs and outputs that are needed and their sizes. After the new source has been created, the wizard asks where it should be used: Implementation, Simulation, None or All. Here ’All’ was chosen.

If a module source is to be added to the project, it is important to use the alternative ’Add Copy of Source’ instead of ’Add a Source’. With ’Add Copy of Source’, the file is copied into the project, so changes to its code do not affect the original file. With ’Add a Source’, the original source file is used, and changes to it affect all other projects using the same file. If the file is renamed, the other projects can no longer use it.

Once the project has been created, the transceiver core has to be generated. Under ’Source’ in the menu, ’New Source’ is chosen, and then ’IP CORE Generator and Architecture Wizard’ is selected in the ’New Source Wizard’, see figure 3.21. Under the folder ’FPGA Features and Design’, ’IO Interfaces’ and finally ’RocketIO GTP Wizard v1.8’ are selected. The form of the ’RocketIO GTP Transceiver Wizard’ then opens, and the selections are made as described in the next section.


3.2 The RocketIO GTP Transceiver Wizard

In this wizard the user determines the design of the transceiver: the number of dual tiles that will be created, which clock will be used to source the user clocks (USRCLKs), etc. This section explains how the settings shared by all the projects are made.

In the wizard the user chooses how many GTP dual tiles will be created and which differential clock or clocks will be used as reference clock (under ’REFCLK source’) for these tiles. Up to seven GTP dual tiles can use the same reference clock, and this option was chosen in all projects using fewer than eight dual tiles.

The ’Internal Data Width’ to be sent and received has to be decided, and the transmission line rate has to be set. ’Target Line Rate’ was set to 2.4 Gbps. The frequency of the ’Reference Clock’ was set to the recommended choice for the actual internal data width. For some configurations a predefined protocol can be chosen, but with the Σ∆-converter as the source this option could not be used. An own standard had to be created. As a consequence, all projects set up in the wizard use the ’Start from scratch’ option in the ’Protocol Template’ area instead of a well-known predefined protocol. The ’Silicon Version’ was set to ’PRODUCTIONS’. This indicates production silicon, i.e., a device that has been fully tested by the manufacturer.

Both the transmitter and the receiver are instantiated. Under ’TX settings’ and ’RX settings’, the line rate is set to 2.4 Gbps and the data path to ten or eight bits. ’Encoding’ can then be chosen, but the Σ∆-converter does not support encoding, so the alternative ’None’ is selected. For GTP1 the ’Protocol Template’ was set to ’Use GTP0 Settings’. In the line rate section, the RX side, the TX side or both can be turned off for each dual tile. In most of this work the configuration was set as above. If only the transmitter or only the receiver is used, the other is set to ’No TX’ or ’No RX’. In most tests performed after the loopbacks had been checked, only one dual tile in the transceiver sends or receives data (GTP0). In this case, it can be favorable to disconnect the second dual tile (GTP1) by setting its ’Protocol Template’ to ’Start from scratch’ and its ’Line Rate’ to ’No TX’ and ’No RX’, see figure 3.22.

The receiver user clock (RXUSRCLK) was synchronized with the transmitteruser clock (TXUSRCLK) by using TXOUTCLK as a source for both the trans-mitter and receiver user clocks, see 1.3. In the wizard this is done by setting’TXUSRCLK Source’ and ’RXUSRCLK Source’ to ’TXOUTCLK’.

It also has to be decided whether to use the TX/RX elastic buffer or the phase alignment circuit to minimize skew between the ’PMA Parallel Clock’ and ’PCS Parallel Clock’ domains, see figure 2.8. From the beginning it was not clear whether the receiver could deliver 10 bits to the FPGA logic. Since the phase alignment circuit requires a 10 bit data width, it was decided to use the TX/RX elastic buffer. As described in the background, the transmitter and receiver should then have the same oscillator source (the same reference clock). This ensures that no frequency differences appear between XCLK (RXRECCLK) and RXUSRCLK, since ’clock correction’ cannot be used in the RX elastic buffer when unformatted data is received. The 1.2 GHz clock in the serial part of the transmitter cannot be used to drive the clock of the Σ∆-converter (see figure 2.2), so it cannot remove the frequency differences between XCLK and RXUSRCLK. However, this problem could possibly be solved by other means, e.g., by using two synchronized signal generators [4]. One would produce the 1.2 GHz clock for the Σ∆-converter, and the other the 240 MHz clock for the dual tile, replacing the reference clock provided by the SuperClock module (see 2.4.1). Since the Σ∆-chip was not available during this thesis work, no tests of such a solution could be carried out. It was therefore decided to configure the receiver on the assumption that the frequency differences between XCLK and RXUSRCLK could be avoided in one way or another. If the suggested configuration does not work, phase alignment (i.e., using RXRECCLK to drive RXUSRCLK) should be investigated as an option. As a consequence of the above discussion, the TX/RX elastic buffer was used, selected by choosing ’Enable TX Buffer (default)’ in the wizard.

To be able to reset either the receiver part or the transmitter part of the transceiver, the RXRESET and TXRESET options were chosen in the wizard.

In every project, the ’Main driver differential swing’ and the ’Preemphasis level’ were set to be controlled manually so that they could easily be changed in the project. The ’Preemphasis boost’ was enabled, giving a 10 percent increase in swing. The remaining settings differ between designs and are described for each specific case.

3.2.1 Generated files

When the ’RocketIO GTP Transceiver Wizard’ has created the core for the transceiver, a set of Verilog files is generated. These are modules describing the chosen tiles and their settings, and a top-level module for these tiles that forms the interface to the user. This interface has to be connected to the application logic in the code to turn the core into a fully working transceiver.

In addition, a UCF (User Constraints File) is created in which all settings are stated. This UCF file has to be included in the projects. It is divided into two parts, one called the attribute file and one called the example file. The example file contains some information (the location of the dual tile and the clock instance) that has to be copied into the attribute file to make it work. It is also useful to go through the UCF settings before using them in the design. For example, if more than one dual tile is created, the search path has to be changed, because the UCF file only describes the search path for one of the dual tiles. The UCF file is not perfect, so it is recommended to go through it and check that everything is as expected.

All signals that come to the FPGA or go out from it must be set as a NET inthe UCF file. Both transmitter and receiver should be programmed to the samesignaling standard for proper operation. Here the TXN, TXP and RXN, RXPpins were set to the ’LVCMOS12’ IO standard.

3.2.2 Clock Connections

To instantiate the transceivers, they have to be created in the ’RocketIO GTP Transceiver Wizard’ (3.2) as required. The wizard sets the reference clock frequency that the dual tile needs to work properly. As described in the background (Chapter 2), physical settings have to be made on the ’DIP switch’ on the board to obtain this frequency. Cables must also be connected from the ’super clock module’ to the ports feeding the reference clock (’REFCLK source’), see 2.4. Finally, three more operations must be done in the code to make the clocks work.

1) The reference clock delivers a differential signal (MGTREFCLKP and MGTREFCLKN) to the dual tile. The dual tile requires a single-ended clock, which is obtained by instantiating an IBUFDS, as shown in figure 3.2.

Figure 3.2. Connection of the differential reference clock [6]

2) As described in the ’RocketIO GTP Transceiver Wizard’ (3.2), TXOUTCLK drives TXUSRCLK and RXUSRCLK. Since this has been set in the wizard, TXOUTCLK is available in the interface. It has to be connected to a BUFG (global clock buffer) to create TXUSRCLK and RXUSRCLK, see figure 3.3.

3) TXUSRCLK has to be connected to TXUSRCLK2, and in the same way RXUSRCLK to RXUSRCLK2. TXOUTCLK now feeds the user clocks as shown in figure 3.3.


Figure 3.3. An example of the connection of TXOUTCLK driving the TXUSRCLKand TXUSRCLK2 with 8 or 10 bit data width. The RXUSRCLK and RXUSRCLK2have to be connected in the same way [6]

In all projects, the peak-to-peak strength of the transmitted signal (the differential swing) has to be set. It can be set to one of eight values between 0 and 1100 mV. In this work TXDIFFCTRL was assigned ’100’, which represents a differential swing of 800 mV. This should work properly without excessive power consumption.

During transmission between the transmitter and the receiver, the high frequencies are attenuated more than the lower frequencies. To compensate for this, pre-emphasis can be used, which decreases the amplitude of the low-frequency signal components. The pre-emphasis port TXPREEMPHASIS was assigned ’100’, which means that the pre-emphasis is 18.5 percent of the chosen differential swing. This value can be chosen between 3 and 52 percent in eight steps [6]. Too high a pre-emphasis, however, can distort the signals, so the chosen 18.5 percent was regarded as a fair middle course.

3.3 Loopbacks

Figure 3.4. The four possible loopbacks [6]


To understand how the transceiver works, the first step was to perform the loopbacks suggested by Xilinx. Xilinx suggests four loopbacks: the Near-End PCS, the Near-End PMA, the Far-End PMA and the Far-End PCS loopbacks [6]. In this project, the first three were tested. Once the third one, the Far-End PMA loopback, has passed, the whole transceiver, including both the transmitter and the receiver sections, has been tested. Further loopbacks are therefore not needed to configure the transceiver.

To perform these loopbacks, the transmitter and the receiver were created as described in 3.2 above. A top module was then created in which the two instances are instantiated and the ports are connected as the designer finds best. When own produced data was sent, the module producing it was instantiated here as well. In all the projects it is also important to connect RESETDONE, a signal that is set after the bit file has been loaded into the FPGA. This signal indicates that all resets necessary for the FPGA to work have been completed. The PLL lock signal indicates that the PLL has been able to lock. If it has not, the reference clock may be running at a rate other than the prescribed one, or something else may be wrong in the transceiver. These two signals are preferably connected to two different LEDs (Light Emitting Diodes) so that the user can see that both are lit. If one of them is not lit, the transceiver is unlikely to work as desired.

3.4 Near-End PCS Loopback

Figure 3.5. Near End PCS loopback [6]

The first loopback tested was the Near-End PCS Loopback. This loopback tests the digital part of the transceiver and never involves the parallel-to-serial and serial-to-parallel sections in the PMA block [6]. It was tested both with PRBS data and with own produced data. In addition to the settings in the ’RocketIO GTP Transceiver Wizard’ (3.2), PRBS settings were made. The Near-End PCS Loopback is shown in figure 3.5.


3.4.1 PRBS

Initially, one transceiver was created that could send and receive PRBS patterns. The internal data path then has to be set to 10 bits, and both the PRBS transmission control and the PRBS detector are chosen in the ’RocketIO GTP Transceiver Wizard’. The wizard also allows the threshold for the PRBS error to be chosen, i.e., the number of errors that has to be reached before an error is indicated. In this project, the value 255 has to be reached before the PRBS error is indicated. This value ensures that the PRBS error indication is not triggered by transient startup problems.
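The sketch below illustrates how a receiver can check a PRBS stream without being told where the pattern starts. It uses a self-synchronizing checker, which predicts each bit from the previously received bits, and it counts mismatches against a threshold such as the 255 used here. This is a generic model, not the GTP's internal checker. The hardware may, for example, count errors per word rather than per bit. PRBS-7 is used for brevity; PRBS-23 corresponds to order 23, tap 18. The function names are hypothetical.

```python
from collections import deque
from itertools import islice
from typing import Iterable, Iterator, Tuple


def prbs_bits(order: int, tap: int, seed: int = 1) -> Iterator[int]:
    """PRBS source: Fibonacci LFSR, s[n] = s[n - order] XOR s[n - tap]."""
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("the LFSR seed must be non-zero")
    while True:
        new_bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        state = ((state << 1) | new_bit) & mask
        yield new_bit


def check_prbs(bits: Iterable[int], order: int, tap: int, threshold: int = 255) -> Tuple[int, bool]:
    """Self-synchronizing checker: predict each bit from the last `order` received bits.

    Returns (number of mismatches, error flag); the flag is raised once the
    mismatch count reaches `threshold`.
    """
    history: deque = deque(maxlen=order)
    errors = 0
    for bit in bits:
        if len(history) == order:             # enough history to predict the next bit
            expected = history[-order] ^ history[-tap]
            if bit != expected:
                errors += 1
        history.append(bit)
    return errors, errors >= threshold


stream = list(islice(prbs_bits(7, 6), 300))
print(check_prbs(stream, 7, 6))               # (0, False): clean link
stream[50] ^= 1                               # a single bit error on the link
print(check_prbs(stream, 7, 6))               # (3, False)
```

One flipped bit is counted three times. The bit is wrong itself, and it later enters the prediction at the tap and order positions. A threshold of 255 therefore corresponds to fewer than 255 actual bit errors in this model.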

When the settings in the wizard have been made and the PRBS generator is used, the TXENPRBSTST port has to be assigned the value that enables the chosen PRBS pattern. It is also important to enable the pattern checker, RXENPRBSTST, for the same pattern. When this was done, TXDATA was assigned zeros so that the PRBS data would not be disturbed.

When a PRBS error occurs, the RXPRBSERROR port is triggered. This port is connected to an LED, which lights up when RXPRBSERROR goes high. RXPRBSERROR has to be reset before the LED turns off, even if no more errors are detected. There are three ways to do this. GTPRESET resets the whole dual tile transceiver, RXRESET resets the whole receiver part of the transceiver, and PRBSCNTRESET resets the PRBS counter. A fourth port, RXCDRRESET, has the same effect, but it is not used in this loopback (or anywhere in the project).

The next step was to check that own produced data came through the PCS loopback channel. In this case, the wizard was configured as in 3.2, and an own communication module was built as described below.


3.4.2 Own produced data

Figure 3.6. Communication block (the communication module drives TXDATA0 of dual tile0 on TXUSRCLK2 and reads RXDATA0 on RXUSRCLK2 through the Near-End PCS loopback; example values in the figure: TXDATA0=00000111, RXDATA0=00000011)

This module (illustrated in figure 3.6) includes a counter connected to the output of the communication module and to the TXDATA port of the transmitter. The counter output was clocked out to the transmitter with the transmitter's own clock, TXUSRCLK2, so that the two were synchronized. The input signal from the receiver (RXDATA) was clocked in with the receiver's clock, RXUSRCLK2.

Because this communication module was built in the FPGA logic and was not part of the transceiver, it needed its own reset function. The module was also given an enable function for the counters. When enable was on, both counters started.


The idea was that if the counter at the output sends a one, a one should also appear on the receiver side. Since the data stream is produced by a counter, the next value is easy to predict. A copy of the counter can therefore be used to check the data. The counters start at the same time, but the data has to travel from the transmitter to the receiver port (RXDATA). The checking counter will therefore most probably not have the same value as the incoming data at the receiver port. For this reason, the checking counter is synchronized with the incoming data when the first data appears at the receiver port. After that, the incoming data at the RXDATA port equals the value of the checking counter, provided the link has no distortions. If the counter and the incoming data were equal, an LED indicating correct data was switched on. Otherwise another LED, indicating incorrect data, was lit on the test board (see 2.4).

The counter checking the incoming data is resynchronized once every cycle of the producing counter, i.e., every time the incoming data is a one. Suppose noise at the RXDATA port causes the receiver-side counter to synchronize before the first sent data arrives. This is not a problem, because the counter is synchronized again when the proper value one arrives.
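The checking scheme described above can be modelled in a few lines. The sketch below is a minimal software version, assuming 8 bit counter words. A local counter advances once per received word and is forced to one whenever the received word equals one. The result list plays the role of the correct/incorrect LEDs. The function name is hypothetical.

```python
from typing import List, Optional, Sequence


def check_counter_stream(words: Sequence[int], width: int = 8) -> List[Optional[bool]]:
    """Compare received counter words with a local checking counter.

    The local counter is (re)synchronized every time the received word equals 1.
    Entries are None before the first synchronization, otherwise True (word as
    expected) or False (error).
    """
    mask = (1 << width) - 1
    expected: Optional[int] = None
    status: List[Optional[bool]] = []
    for word in words:
        if expected is not None:
            expected = (expected + 1) & mask   # checking counter advances every word
        if word == 1:
            expected = 1                       # resynchronize on every received one
        status.append(None if expected is None else word == expected)
    return status


# The received stream starts in the middle of a counter period (latency through the link).
received = list(range(250, 256)) + list(range(0, 10))
print(check_counter_stream(received))
received[12] = 99                              # one corrupted word after synchronization
print(check_counter_stream(received).count(False))   # 1
```

A spurious one caused by noise only produces errors until the real one arrives. For example, [1, 77, 78, 0, 1, 2, 3] gives False for 77, 78 and 0, and True again from the second one onwards.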

Test

A switch was implemented to help search for errors in the configuration, connections, etc. When the switch was high, the loopback under test was selected. When it was low, the loopback was set to normal mode. Since nothing connected the TXN, TXP drivers to the RXN, RXP pins, nothing could come back in normal mode. Suppose own produced data was used and the LED signalled that everything was in order. After the switch was pulled down, the LEDs should then indicate an error. Likewise, if RXPRBSERROR (connected to an LED) was not lit when the switch was on, it should light up when the switch was pulled down.

3.5 Near-End PMA Loopback

Figure 3.7. Near End PMA loopback [6]


The second loopback tested also includes the analogue part of the transceiver, so the parallel-to-serial and serial-to-parallel sections are involved. For PRBS data sent through the channels, this made no difference compared to 3.4. A new project was not even needed; it was enough to switch the loopback to Near-End PMA Loopback mode. Sending own produced data, however, required some changes, which are described below. The loopback is shown in figure 3.7.

3.5.1 Comma detect

When a project uses the comma detect function, the settings in the ’RocketIO GTP Transceiver Wizard’ are as in 3.2, plus some new settings. Under ’RX Comma Alignment’, ’Use Comma Detection’ is chosen. The checkbox ’Combine Plus/Minus Commas (double-length comma)’ is also checked, because a double comma is safer. The ’Plus comma’ and ’Minus comma’ fields already contain suggested commas. These predefined commas are used together with 8B/10B encoding, which is the most common way to send data. With this encoding, the data to be sent goes through an encoding block that maps each of the 256 eight bit words to a ten bit code. Some ten bit combinations are left over that do not represent any word, and some of these are chosen as commas. Since encoding was not used in this work, the commas have to be changed to words that the counter produces. The positive comma and the negative comma must also be the inverse of each other, which makes the choice easy: ’1111111111’ for the ’Plus Comma’ and ’0000000000’ for the ’Minus Comma’.

When the comma is defined as a ten bit word, problems can arise when an eight bit counter is used. This is solved with the ’Comma Mask’, which works as a requirement specification: a one in the mask means that the corresponding bit must match the comma, and a zero means that the bit is a ’do not care’ bit. If the ’Comma Mask’ is set to ’1111111111’, all ten bits have to match for the comma to be accepted. With eight bit words, the ’Comma Mask’ is therefore set to ’0011111111’.
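The masked comparison can be illustrated with a minimal Python sketch. It assumes that a received ten bit word is accepted as a comma when every bit selected by the mask (mask bit = 1) equals the corresponding comma bit; the function name `matches_comma` is hypothetical and only models the behaviour, not the transceiver's internal logic.

```python
PLUS_COMMA = 0b1111111111   # 'Plus Comma' used in this work
MINUS_COMMA = 0b0000000000  # 'Minus Comma' used in this work
COMMA_MASK = 0b0011111111   # only the eight least significant bits are compared


def matches_comma(word: int, comma: int, mask: int = COMMA_MASK) -> bool:
    """Return True if all bits selected by the mask agree with the comma.

    Bits where the mask is 0 are 'do not care' bits.
    """
    return ((word ^ comma) & mask) == 0


# The two upper bits are ignored, so eight bit words can still match:
print(matches_comma(0b0011111111, PLUS_COMMA))   # True
print(matches_comma(0b1100000000, MINUS_COMMA))  # True
print(matches_comma(0b0011111110, PLUS_COMMA))   # False
```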

If a single comma is used, noise in the channel is more likely to trigger a false comma than if a double comma is used. Even when the double comma is chosen, it has to be specified in the wizard that both a positive and a negative comma will be used; otherwise, the comma detection will look for two positive commas or two negative commas in a row. The boxes ’ENPCOMMAALIGN’ and ’ENMCOMMAALIGN’ are therefore checked.

The following ’Optional Ports’ were chosen: RXCOMMADET, which goes high every time a comma is detected and then remains low until the next comma is detected; RXBYTEREALIGN, which indicates that the byte alignment in the serial data stream has changed due to comma detection; and RXBYTEISALIGNED, which indicates that the byte alignment is correct after comma detection, see figure 3.23.

3.6 Far-End PMA Loopback

Figure 3.8. Far End PMA loopback [6]

The third loopback test includes a second dual tile. It is important that the second channel of the dual tile is not used, because it is not reliable in this mode. The first dual tile is set to normal operation (no loopback), and the second dual tile is set to Far-End PMA loopback. The entire dual tile is then tested in full application mode. Figure 3.8 shows the loop in the second tile. The core was created as in 3.2 and 3.5.1 for own produced data, and as in 3.2 and 3.4.1 when PRBS patterns were used. At this stage, the control method described under the heading ’Test’ in 3.4 can no longer be used, because all loopbacks are in use. Instead, the TXINHIBIT port is chosen in the ’RocketIO GTP Transceiver Wizard’. When this port, which is connected to a switch, is driven high, the transmitter stops sending data and sends differential zeros instead. This port later proved helpful in many of the projects.

3.7 Structure of each project

After all the loopbacks had been successfully carried out, the following projects were designed so that only GTP0 was used in each dual tile. GTP1 was still defined in the ’RocketIO GTP Transceiver Wizard’ but left disconnected to ensure that no crosstalk would appear. Later in the project, using GTP1 together with GTP0 was tested, see 3.12.

3.8 Sending and receiving data

After the loopbacks had been tested, it had to be verified that the communication was configured correctly, first over one link. The next question was whether implementing this over two links would cause any problems, i.e., whether there would be interference between the links or other unknown problems. The last step was to create a transmission over four links to test how the dual tiles would receive this data. At first, these links had their own testing modules. Later on, protocols were implemented, which made new ways of testing the data available.

3.8.1 Sending and receiving data over one link

The core for the two dual tiles was created as in 3.2 and, when PRBS data was sent and received, as in 3.4.1. With own produced data, the core was created as in 3.2 and 3.5.1 and used the logic block described in 3.4.2.

3.8.2 Sending and receiving data over two links

The core for the four dual tiles was created in the same way as for two dual tiles in 3.8.1, with the difference that there were two more dual tiles and that the logic block described in 3.4.2 was duplicated. Thus, two counter modules transmitted data and checked that it came through the channels without disturbance.

3.8.3 Sending and receiving data over four links

Eight dual tiles were used: numbers zero, four, five and six as receivers, and numbers one, two, three and seven as transmitters. The project was created as in 3.8.2 with one difference: the reference clock could not be shared by all eight dual tiles, so dual tile number zero was sourced by another GTP clock running at the same rate.

The test board has 16 LEDs. Showing the important status signals, i.e., that the PLLs are locked and that each dual tile has completed all the resets needed to assert RESETDONE, on separate LEDs would occupy all of them. Instead, two modules were created: one that checked that all PLLs were locked, and one that checked that all dual tiles had asserted RESETDONE. The reset inputs GTPRESET, TXRESET and RXRESET were also simplified. These resets are available on every dual tile; they were kept, but tied together, so that one button resets all transceivers in the eight dual tiles through GTPRESET, and one button each drives TXRESET and RXRESET. As a result, these resets needed only three buttons instead of 24.
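The two status modules and the shared reset buttons can be modelled behaviourally as below. This is a Python sketch, not the HDL used in the project; the function names are hypothetical, and it assumes that each dual tile exposes one PLL-lock flag, one RESETDONE flag and the three reset inputs named above.

```python
from __future__ import annotations

NUM_DUAL_TILES = 8


def all_plls_locked(pll_locked: list[bool]) -> bool:
    """Status LED 1: lit only when every dual tile reports PLL lock."""
    return len(pll_locked) == NUM_DUAL_TILES and all(pll_locked)


def all_resets_done(resetdone: list[bool]) -> bool:
    """Status LED 2: lit only when every dual tile has asserted RESETDONE."""
    return len(resetdone) == NUM_DUAL_TILES and all(resetdone)


def fan_out_resets(gtpreset_button: bool, txreset_button: bool,
                   rxreset_button: bool) -> list[dict[str, bool]]:
    """Drive the same three reset inputs on every dual tile from three buttons."""
    return [
        {"GTPRESET": gtpreset_button, "TXRESET": txreset_button,
         "RXRESET": rxreset_button}
        for _ in range(NUM_DUAL_TILES)
    ]


resets = fan_out_resets(False, True, False)
print(sum(len(tile) for tile in resets))  # 24 reset inputs driven by 3 buttons
```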


Figure 3.9. Block diagram of the setup for skew testing (communication module, transmitters (TX) and receivers (RX) in tiles 0 to 7, clocks tile6_TXUSRCLK2 and tile6_RXUSRCLK2, skew registration).

At first, the communication module illustrated in figure 3.6 was used for each dual tile pair (TX and RX). This worked well, and the next step was to see whether the data came through the channels synchronously. In this case, only one communication block was used, and its output signal was connected to all four transmitters so that they were fed by a single source. The receivers were clocked with TILE6_RXUSRCLK2, but the user clocks of all the other tiles were also tested. If the counter value on the receiver side had been equal to TILE6_RXDATA, TILE5_RXDATA, TILE4_RXDATA and TILE0_RXDATA at the same time, the data would have been received synchronously. This was not the case; occasionally, up to three of them were synchronous.
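Because every lane carried the same eight bit counter, the skew of a lane can be read from the difference between its counter value and that of a reference lane in the same clock cycle. The Python sketch below assumes that the counter increments by one per word and that all RXDATA buses are sampled on one clock, as with TILE6_RXUSRCLK2 above; the function names and the sample values are hypothetical.

```python
from __future__ import annotations


def lane_skew(ref_word: int, lane_word: int, width: int = 8) -> int:
    """Skew of a lane in words relative to the reference lane.

    Positive: the lane lags (shows an older counter value).
    Negative: the lane leads. Counter wrap-around is handled.
    """
    modulus = 1 << width
    diff = (ref_word - lane_word) % modulus
    return diff - modulus if diff >= modulus // 2 else diff


def skew_report(sample: dict[str, int],
                reference: str = "TILE6_RXDATA") -> dict[str, int]:
    """Skew of every lane, from one simultaneous sample of all RXDATA buses."""
    ref = sample[reference]
    return {name: lane_skew(ref, word) for name, word in sample.items()}


# Hypothetical sample taken in one TILE6_RXUSRCLK2 cycle.
sample = {"TILE6_RXDATA": 0x23, "TILE5_RXDATA": 0x23,
          "TILE4_RXDATA": 0x21, "TILE0_RXDATA": 0x24}
print(skew_report(sample))
# {'TILE6_RXDATA': 0, 'TILE5_RXDATA': 0, 'TILE4_RXDATA': 2, 'TILE0_RXDATA': -1}
```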

Because of this skew, the original idea of how to store and transfer the data to the PC (see 1.1) could not be used. Instead, the storage problem was solved as described in 3.9 (see also 4.1 for more details about how this could work with the Σ∆-chip).


3.9 Protocol between PC and FPGA

The protocol was developed in five steps. The first step was to make sure that the PC and the FPGA could communicate; this protocol is described in section 3.9.2. After this protocol had been completed successfully, a new protocol was created that sends information about the received data. Such a protocol was made for both eight bit data, described in section 3.9.3, and ten bit data, described in section 3.9.4. Using these protocols, it was possible to check that the data came through the channel correctly. After the ten bit protocol was finished, a protocol for four lines and eight bit data was established (section 3.9.5). Finally, a protocol for four lines and ten bit data was made (section 3.9.6). All the protocols are presented in the form of state machines.

3.9.1 Prerequisites for creating the protocols

To create the serial interface between the FPGA and the PC, an asynchronous UART was implemented [4]. The interface of the UART, illustrated in figure 3.10, is described below.

• we - write enable, is set when data is sent to the transmit buffer

• re - read enable, is set when the FPGA reads data from the receive buffer

• full - full = 1 indicates that received data is present in the receive buffer

• empty - empty = 0 indicates that the UART is full and further data has to wait until space is available in the UART

• din - represents the port sending data from the protocol to the UART

• dout - represents the port sending data from the UART to the protocol

Figure 3.10. Block diagram between Protocol and UART (signals we, re, full, empty, din and dout).


The first protocol used only this interface. The other protocols needed memories, since they had to store data collected from the receiver and then transfer it to the PC for analysis. FIFOs were used for this purpose. The FIFO has to be a dual-port FIFO with two independent ports using different clocks for writing and reading. The data has to be written at the same rate as it arrives at the RXDATA port, so the write clock must run at the same rate as RXUSRCLK2. When the data is later read from the FIFO, this is done at the same speed as the rest of the FPGA logic, in our case 50 MHz.

In the ’New Source Wizard’, the ’Memories and Storage Elements’ category is chosen and, under it, ’FIFOs’. From here, ’Fifo Generator v4.3’ is selected, see figure 3.24. When the FIFO generator has started, the FIFO can be designed. ’Independent clocks (RD_CLK, WR_CLK)’ was chosen for the reasons given above, see figure 3.25. There are several ways to read from the FIFO. In this case, the FIFO only needs to be read from the first word to the last, so ’Read Mode’ was set to standard. ’Write width’ and ’Read width’ were set to eight or ten bits depending on the chosen internal data width. ’Write depth’ was set to 512 for eight bit internal data and to 1024 for ten bit internal data.

A new interface to the FIFOs had to be added, as described below.

• fifo_we - write enable, is set when data is written into the FIFO

• fifo_re - read enable, is set when data will be read from the FIFO

• fifo_full - fifo_full= 1 indicates that FIFO is full

• fifo_empty - fifo_empty = 1 indicates that FIFO is empty

• fifo_din - represents the port writing incoming data (from the receiver) into the FIFO

• fifo_dout - represents the port reading data from FIFO to the PC

The complete block diagram including a FIFO is shown in figure 3.11.

Figure 3.11. Block diagram between Protocol, UART and one FIFO (UART signals we, re, full, empty, din and dout; FIFO signals fifo_we, fifo_re, fifo_full, fifo_empty, fifo_din and fifo_dout).
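The ordering and flag behaviour of this interface can be summarized with a small behavioural model. The Python class below is a hypothetical sketch: it models a FIFO of configurable depth (512 words, as used for eight bit data) with fifo_full and fifo_empty flags, but not the separate write and read clock domains of the generated core.

```python
from __future__ import annotations

from collections import deque
from typing import Optional


class FifoModel:
    """Behavioural model of the ports fifo_we/fifo_re and fifo_full/fifo_empty."""

    def __init__(self, depth: int = 512) -> None:
        self.depth = depth
        self._words = deque()

    @property
    def fifo_full(self) -> bool:
        return len(self._words) >= self.depth

    @property
    def fifo_empty(self) -> bool:
        return not self._words

    def write(self, fifo_din: int) -> bool:
        """One write-clock cycle with fifo_we = 1; returns False if the word is dropped."""
        if self.fifo_full:
            return False
        self._words.append(fifo_din)
        return True

    def read(self) -> Optional[int]:
        """One read-clock cycle with fifo_re = 1; returns None if the FIFO is empty."""
        if self.fifo_empty:
            return None
        return self._words.popleft()


fifo = FifoModel(depth=4)
accepted = [fifo.write(word) for word in range(6)]
print(accepted)                          # [True, True, True, True, False, False]
print(fifo.fifo_full)                    # True
print([fifo.read() for _ in range(5)])   # [0, 1, 2, 3, None]
```

In this model, a write attempted while fifo_full = 1 is dropped, and a read while fifo_empty = 1 returns None.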

It is important that the read enable and write enable signals are turned off directly after they have been used; otherwise, the result will not be as expected. In the following descriptions of the protocols, the handling of we, re, din, dout, fifo_we, fifo_re, fifo_din and fifo_dout is not treated; only the data paths are described.

3.9.2 Communication between PC and FPGA

Figure 3.12. State machine for communication between PC and FPGA (states s0 to s6; transitions depend on bit [7] of the command, on whether the address is valid, and on the UART flags full and empty).

The state machine shown in figure 3.12 is described below.

The PC starts by sending eight bits to the FPGA. When the UART has received this byte (full = 1), it is evaluated by the protocol in state s0. The seven least significant bits are stored in a memory called address. The most significant bit tells what the PC wants to do. If this bit is zero, the PC wants to send information to the FPGA. In state s1, a separate state machine checks that the address is one of the predefined addresses. When there is space in the UART (empty = 1), the protocol sends a status byte to the PC telling whether the address is valid. If the address is valid, the protocol waits for the data that is to be stored. When the UART has received this data, it is stored at the given address in state s3. State s6 only sets ’re’ to zero.

If, instead, the most significant bit is one, the PC wants to read data from the FPGA. In state s2, the validity of the address is checked in the same way as in s1. After the status has been sent to the UART, ’we’ (write enable) has to be set to zero again, which is done in state s4. In state s5, the data stored at the valid address is sent to the PC when full = 1 at the UART.

If the address is not valid (valid address = 0), a code is sent telling that the address sent to the FPGA was incorrect, and the session is finished.
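The exchange in figure 3.12 can be simulated byte by byte as below. This Python sketch uses hypothetical values for the status codes (the text does not give them) and represents the predefined addresses as keys of a dictionary. It models the data path only, not the we/re handshaking.

```python
from __future__ import annotations

STATUS_VALID = 0x01    # hypothetical code: address accepted
STATUS_INVALID = 0xFF  # hypothetical code: address not predefined


def handle_session(rx: list[int], registers: dict[int, int]) -> list[int]:
    """Process one PC session and return the bytes the FPGA sends back.

    rx holds the bytes received by the UART in order: the command byte,
    followed by one data byte when the PC writes.
    """
    command = rx[0]                  # s0: command byte from the UART
    address = command & 0x7F         # seven least significant bits
    pc_reads = bool(command & 0x80)  # most significant bit: 1 = PC reads
    if address not in registers:     # s1/s2: address check
        return [STATUS_INVALID]      # session finished
    if pc_reads:
        return [STATUS_VALID, registers[address]]  # status, then data (s5)
    registers[address] = rx[1] & 0xFF              # s3: store the data byte
    return [STATUS_VALID]


regs = {0x05: 0x00}
print(handle_session([0x05, 0xAB], regs))  # write: [1]
print(handle_session([0x85], regs))        # read:  [1, 171]
print(handle_session([0x10], regs))        # unknown address: [255]
```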

3.9.3 Protocol for eight bit data over one line

Figure 3.13. State machine illustrating the protocol used for one link with 8 bit data (states s0 to s6; transitions depend on bit [7] of the command and on the flags full, empty, fifo_full and fifo_empty).

The state machine shown in figure 3.13 is described below.

The first state, s0, checks whether there is data in the UART (full = 1). If so, this data is stored in the eight bit memory computerCommand, which describes what the PC wants to do. If the most significant bit is one, the PC wants the FIFO to store data from the receiver port, RXDATA. The state then changes from s1 to s2, where the FIFO stores the data from this port until it is full (fifo_full = 1). A register is then set to a code indicating that the FIFO is full, and this code is sent through the UART in state s3. When the UART can accept the data (empty = 1), it is sent and the next state is s6. In this state, all write and read enables are set to zero, and the state machine returns to s0.

If, instead, the most significant bit is zero, the PC wants to read from the FIFO, and the state changes from s1 to s4. In state s4, a word is read from the FIFO if possible (fifo_empty = 0). The word is then sent in state s5 when the UART allows it (empty = 1), after which the protocol returns to state s4 to get more data from the FIFO. When the FIFO is empty (fifo_empty = 1), an eight bit register containing a code is sent in state s3 when empty = 1. After the code has been sent in s3, the protocol goes to state s6 and back to state s0.
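The data path of this protocol (capture until fifo_full, then read-out until fifo_empty followed by a status code) can be sketched in Python as below. FIFO_FULL_CODE and FIFO_EMPTY_CODE are hypothetical placeholders, the FIFO is modelled as a deque, and the UART flow control in states s3 and s5 is not modelled.

```python
from __future__ import annotations

from collections import deque
from typing import Iterator

FIFO_FULL_CODE = 0xF0   # hypothetical 'FIFO full' code
FIFO_EMPTY_CODE = 0x0F  # hypothetical 'FIFO empty' code


def run_command(computer_command: int, rxdata: Iterator[int],
                fifo: deque, depth: int = 512) -> list[int]:
    """Return the bytes sent to the PC for one command (data path only)."""
    tx = []
    if computer_command & 0x80:          # s1 -> s2: capture RXDATA
        while len(fifo) < depth:         # until fifo_full = 1
            fifo.append(next(rxdata) & 0xFF)
        tx.append(FIFO_FULL_CODE)        # s3: report 'FIFO full'
    else:                                # s1 -> s4: read out
        while fifo:                      # s4 while fifo_empty = 0
            tx.append(fifo.popleft())    # s5: send one word
        tx.append(FIFO_EMPTY_CODE)       # s3: report 'FIFO empty'
    return tx                            # s6 clears the enables; back to s0


fifo = deque()
rx = iter(range(100))                        # stand-in for the RXDATA port
print(run_command(0x80, rx, fifo, depth=4))  # [240]
print(run_command(0x00, rx, fifo, depth=4))  # [0, 1, 2, 3, 15]
```

Since a data word may take the same value as a status code, a PC-side reader of this sketch should rely on the known FIFO depth rather than scanning for the code.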

3.9.4 Protocol for ten bit data over one line

Figure 3.14. State machine illustrating the protocol used for one link with 10 bit data (states s0 to s8; transitions depend on bit [7] of the command and on the flags full, empty, fifo_full and fifo_empty).

The state machine shown in figure 3.14 is described below.

Because the data exchanged between the FPGA and the PC is eight bits wide, while the words from the FIFO are ten bits wide, each word has to be split. The first byte sent to the PC contains the lowest five bits of the FIFO word, and the next byte contains the upper five bits. This resulted in the following state machine.
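The split and the reassembly on the PC side can be expressed as below. The Python sketch assumes that each five bit half is placed in the least significant bits of its byte, with the upper three bits set to zero; the text does not state where in the byte the bits are placed. The function names are hypothetical.

```python
from __future__ import annotations


def split_10bit(word: int) -> tuple[int, int]:
    """FPGA side: low five bits in the first byte, upper five bits in the second."""
    word &= 0x3FF
    return word & 0x1F, (word >> 5) & 0x1F


def join_10bit(first: int, second: int) -> int:
    """PC side: rebuild the ten bit word from the two received bytes."""
    return ((second & 0x1F) << 5) | (first & 0x1F)


print(split_10bit(0b1011001110))  # (14, 22)
```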

The first state, s0, checks whether there is data in the receive buffer of the UART (full = 1). The word is saved, and in the next state, s1, it is evaluated. If the most significant bit is one, data is stored in s2 until the FIFO is full (fifo_full = 1). Then an eight bit code is sent to the PC in s3 (when empty = 1), telling that no more writing to the FIFO is possible. All read and write enables for both the UART and the FIFO are reset in s6.

If, instead, the most significant bit in s1 is zero, the state changes from s1 to s4, where the FIFO is read until it is empty (fifo_empty = 1). When the FIFO has been emptied, the state machine goes to s3 to send the code telling the PC that there will be no more reading from the FIFO; s3 works in the same way as when data is stored. If the FIFO is not empty, the state goes from s4 to s5, where the lower five bits of the FIFO word are written to the eight bit word going to the PC when empty = 1. The next state is s7, in which nothing is done. This state is needed because otherwise the data in the memory would be overwritten in the next state before the UART has had the chance to send the first five bits. In the next clock cycle, state s8 is reached, where the upper five bits are sent to the PC when the UART allows it (empty = 1). The first word from the FIFO has now been sent, and a new word is sent, if possible, starting from state s4.

3.9.5 Protocol for eight bit data over 4 lines

Figure 3.15. This state machine describes how the protocol treats the ’write to’ and ’read from’ procedures for the FIFOs (command bits [7:5] = 100 for writing; 000, 001, 010 and 011 for reading). Each part is described in more detail in figures 3.16-3.18.


Figure 3.16. This part of the state machine describes how the FIFOs are loaded with data and checked for being full. The green arrows describe an example of the process when only one FIFO is full after state s2, and the blue and red arrows show examples of when two and three FIFOs are full after s2. If all FIFOs are full in some state, the next state is s3, seen in figure 3.17.


Figure 3.17. This part of the state machine describes an example of how data are read from the FIFOs for eight bit data.

The state machine shown in figures 3.16 and 3.17 is described below; see also figure 3.15 for an overview of the state machine.

Tiles number zero, four, five and six are receivers, as described in 3.8.3. This protocol is made for 8 bit words and starts in s0. If there is data in the UART (full = 1), this data is stored in an eight bit memory called computerCommand. In the next state, s1, computerCommand is evaluated. If the upper three bits are ’100’, data will be stored in the FIFOs. The next state is then s2, where data from the receivers are stored in the four FIFOs. If all FIFOs become full at the same time (fifo_full0 = 1, fifo_full4 = 1, fifo_full5 = 1, fifo_full6 = 1), an eight bit word is sent to the PC in state s3 (when empty = 1), telling that the FIFOs are full and nothing more can be stored. If the FIFOs become full at different times, a tree structure of states takes care of this, as can be seen in figure 3.16.
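The tree of states in figure 3.16 enumerates every combination of FIFOs that may already be full. In software, the same behaviour can be expressed more compactly by tracking the set of full FIFOs, as in the Python sketch below. This is an alternative formulation for illustration, not the state machine used in the thesis; the function name `step` is hypothetical.

```python
from __future__ import annotations

RX_FIFOS = frozenset({0, 4, 5, 6})  # FIFOs of the receiving dual tiles


def step(full_so_far: frozenset[int], fifo_full: dict[int, bool]):
    """One clock cycle of the 'write to FIFOs' part.

    Returns the updated set of full FIFOs, the write enables for the next
    cycle, and whether all FIFOs are full (the move to state s3).
    """
    full = full_so_far | {n for n, flag in fifo_full.items() if flag}
    write_enable = {n: n not in full for n in sorted(RX_FIFOS)}
    return full, write_enable, full == RX_FIFOS


full = frozenset()
full, we, done = step(full, {0: False, 4: True, 5: False, 6: False})
print(sorted(full), we, done)  # [4] {0: True, 4: False, 5: True, 6: True} False
full, we, done = step(full, {0: True, 4: True, 5: True, 6: True})
print(done)                    # True
```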

If data is to be read from one of the FIFOs, this is indicated by the upper three bits of the word, which are evaluated in state s1. The binary code for FIFO number zero is ’000’, for number four ’001’, for number five ’010’ and for number six ’011’. If the three upper bits are ’000’, the next state is s4, where FIFO0 is read if it contains data (fifo_empty0 = 0). If it is empty (fifo_empty0 = 1), a code describing this is stored in a memory and then sent in s3 when the UART allows it (empty = 1). The next state is s6, where all read and write enables for both the UART and the FIFOs are reset, and the state machine then returns to s0. The same procedure is repeated for FIFO4, FIFO5 and FIFO6, with the difference that s4 is replaced by s7 for FIFO4, s8 for FIFO5 and s9 for FIFO6.

If, instead, the FIFO contains data (fifo_empty0 = 0, fifo_empty4 = 0, fifo_empty5 = 0 or fifo_empty6 = 0), the word is stored and sent in the next state, s5, when the UART allows it (empty = 1). After state s5, the corresponding state s4, s7, s8 or s9 is reached again, and the next word can be read from that FIFO until it is empty. If the upper three bits of computerCommand in s1 match none of these codes, the next state is s6, followed by s0.
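The decoding in state s1 maps the upper three bits of computerCommand to the next state. A minimal Python sketch using the codes and state names given above (the function name is hypothetical):

```python
NEXT_STATE = {
    0b100: "s2",  # write receiver data into all four FIFOs
    0b000: "s4",  # read FIFO0
    0b001: "s7",  # read FIFO4
    0b010: "s8",  # read FIFO5
    0b011: "s9",  # read FIFO6
}


def next_state_from_s1(computer_command: int) -> str:
    """Decode the upper three bits of computerCommand; unknown codes go to s6."""
    code = (computer_command >> 5) & 0b111
    return NEXT_STATE.get(code, "s6")


print(next_state_from_s1(0b10000000))  # s2
print(next_state_from_s1(0b00100000))  # s7
print(next_state_from_s1(0b11100000))  # s6
```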


3.9.6 Protocol for ten bit data over 4 lines

The ’write in’ part of the state machine is the same as for eight bit data, illustrated in figure 3.16. The ’read from’ part is slightly different for ten bit data and is shown in figure 3.18. The overview of the state machine is the same as for eight bit data (see figure 3.15).

Figure 3.18. ’Read out’ part of the state machine for ten bit data over four lines.

If the UART gives the signal full = 0, there is no data to read from it, and the protocol stays in the start state s0. When full changes to 1, ’re’ is set to 1 so that the data can be read from the UART, and the ’out’ port of the UART, which contains the code describing what the user wants, is saved into a memory called Computercommand. The next state is s1. In s1, ’re’ is set to 0 and the upper three bits of the code are analyzed. If these are ’000’, the next state is s4; if they are ’001’, s7; if they are ’010’, s8; and, finally, if they are ’011’, s9. In all these states, data are read from the FIFOs. If no data is stored in the memories, a code is sent back to the PC telling that the FIFOs are empty.

First, the FIFOs must be loaded with data. This starts in s2, where the write enable signal ’fifo_write’ of all FIFOs is set to 1 so that data from the receivers can be written into them. The protocol stays in this state until one or more FIFOs are full (fifo_full = 1 for FIFO 0, 4, 5 or 6). As can be seen in figure 3.16, the write enable signal of a FIFO is switched off in the state where that FIFO becomes full, and the state machine then goes back to the start to wait for a new command.

If, for example, Computercommand is ’001’, FIFO4 is read as follows. If fifo_empty4 = 0, fifo_read4 is set to 1 and the read enable signals of all other FIFOs are set to zero, so that they do not lose information while FIFO4 is emptied. The ten bit word is stored in a ten bit memory, ’forfifo10bit4’, and the next state is s28. If, instead, the FIFO is empty (fifo_empty4 = 1), all read enables are set to zero and ’fifo_din’, an eight bit memory holding the data for the UART’s din port, is set to the code ’127’ (binary), which tells that the FIFO is empty. State s3 is then reached, where this information is sent to the PC and its user when empty = 1, i.e., when the UART can accept data. In state s28, all read enable signals are turned off. Because the RS232 link cannot send more than 8 bits at a time (see section 2.4.2), the eight least significant bits of ’forfifo10bit4’ are copied into ’fifo_din’, and a four bit flag is set to record which state comes next.

In state s5, all ten bit data for the four FIFOs is sent, first the lower eight bits and then the upper two bits. A flag with several bits is therefore needed to describe all possible next states; in this example, the flag is set to ’0011’. In state s5, the read enables of all FIFOs are set to 0, as are ’re’ and ’we’ for the UART, as long as the UART is full (empty = 0). When the UART can accept data, ’fifo_din’ is written to the ’din’ port while the UART write enable is set to one. The next state is chosen according to the flag code; in this example, it is s29. In s29, the read enables of the FIFOs are set to zero so that the FIFOs are not left open and lose information, and ’re’ and ’we’ are also set to zero, since no more reading from or writing to the UART is done in this state. The flag is now set to ’0010’, which tells s5 to go back to s7 next time. Nothing more is done in this state, and it may seem unnecessary. However, as in the state machine in figure 3.14, it is needed to make sure that information is not lost: otherwise, the UART would pick up the two upper bits instead of the eight lower bits. The state thus gives the UART time to send its information. The next state is s30. Here, all read enables of the FIFOs are set to zero to avoid loss of information, and the upper two bits of ’forfifo10bit4’ are copied into the eight bit memory fifo_din. The next state is s5, where the same procedure as before takes place, but now the flag tells s5 to go back to s7. This continues until the FIFO is empty and state s3 is reached. In s3, the information that the FIFO is empty is sent when the UART is free (empty = 1), and state s6 then makes sure that the read and write enables of all FIFOs, as well as ’re’ and ’we’ for the UART, are set to zero before the start state s0 is reached. Only one example has been described above; the state descriptions for the other FIFOs follow the same principle.
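The eight plus two bit split used here differs from the five plus five split in 3.9.4. A Python sketch of the split and of the reassembly on the PC, assuming that the two upper bits are placed in the two least significant bits of the second byte (the text only says that they are set in fifo_din); the function names are hypothetical.

```python
from __future__ import annotations


def split_10bit_8_2(word: int) -> tuple[int, int]:
    """FPGA side: forfifo10bit[7:0] first, then forfifo10bit[9:8]."""
    word &= 0x3FF
    return word & 0xFF, (word >> 8) & 0x3


def join_10bit_8_2(low: int, high: int) -> int:
    """PC side: rebuild the ten bit word from the two received bytes."""
    return ((high & 0x3) << 8) | (low & 0xFF)


print(split_10bit_8_2(0x2AB))  # (171, 2)
```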

3.10 MatLab

The baud rate is set in MatLab, and the same baud rate also has to be set in the UART. In all of the protocols, the baud rate was set to 115200 bps. The data width was set to eight bits. The output buffer was set to one byte, and the input buffer was sized to suit the amount of data coming from the FPGA.

The MatLab program was written so that it sends a code telling what is to be done and then waits for data. The contents of the incoming vector were displayed in MatLab so that they could be evaluated. For the first protocol (3.9.2), a slightly modified version of the code was used in which MatLab, after data has been received, sends a new byte with data to the FPGA, see figure 3.12.
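For readers without MatLab, the same request-and-collect exchange can be sketched in Python. The function below works with any object that has write() and read(n) methods, for example a port opened with the pyserial package; the function name and the way the reply length is chosen are assumptions, not the thesis's MatLab code.

```python
def request_data(port, command: int, expected_bytes: int) -> bytes:
    """Send one command byte and collect up to expected_bytes of reply.

    port: any object with write(bytes) and read(n) -> bytes, such as a
    pyserial Serial object configured for 115200 bps and eight data bits.
    """
    port.write(bytes([command & 0xFF]))
    reply = bytearray()
    while len(reply) < expected_bytes:
        chunk = port.read(expected_bytes - len(reply))
        if not chunk:  # read timed out: stop instead of waiting forever
            break
        reply.extend(chunk)
    return bytes(reply)
```

With pyserial, the port could be opened as `serial.Serial("COM1", baudrate=115200, bytesize=serial.EIGHTBITS, timeout=1)`, where the port name is a placeholder. As a rough estimate, assuming 8N1 framing (ten bits on the line per byte), reading 1024 ten bit words as 2048 bytes takes about 2048 × 10 / 115200 ≈ 0.18 s.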

3.11 Channel bonding

Since the skew between the signals on the four lines was only a few samples and the TX/RX buffers were already in use, the prerequisites for implementing channel bonding, to get the data synchronously at the receivers, were in place. With channel bonding, one channel is chosen as the master, and the others, called slaves, have to follow the master. The data arriving on the different channels are stored in the RX elastic buffers until all of them have arrived (within the time limit); they are then transferred to the receiver simultaneously.

As discussed above, GTP1 was not used at this point. It was, however, not explicitly disconnected: it was defined in the ’RocketIO GTP Transceiver Wizard’ and set to use the same settings as GTP0. Therefore, when GTP0 was set to use channel bonding, GTP1 was also set to use it. With channel bonding, this turned out to be critical. One channel was set to be the master, which waits until all slaves have arrived, and since data from the GTP1 channels never arrived, channel bonding did not work. When GTP1 was disconnected by setting ’Line Rate’ = No TX under ’TX Settings’ and ’Line Rate’ = No RX under ’RX Settings’, instead of 2.4 Gbps as in figure 3.22, channel bonding worked properly.

When channel bonding is used, some additional settings have to be made in the ’RocketIO GTP Transceiver Wizard’, which otherwise follows 3.8.3. In the channel bonding part of the wizard, the box ’Use Channel Bonding’ is checked and the ’Sequence Length’ was chosen to be one. The maximum skew (’Sequence 1 Max Skew’) is set to 7, which does not cause any problem for the RX elastic buffer since the skew between the lines is small, see figure 3.26. Then the ’Sequence’ has to be set; it was chosen as the eight bit word ’10000001’. At first, it was set to the same word as the comma, but this was later abandoned since it made it difficult for the receiver to find the comma. With this configuration, a core capable of channel bonding had been created.
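What the RX elastic buffers achieve can be illustrated with a simplified model: each lane is shifted so that its channel bonding sequence lines up with the master's, and a lane that never delivers the sequence (as with the unused GTP1 channels) makes bonding fail. The Python sketch below uses the sequence ’10000001’ and the maximum skew of 7 from the wizard settings; the lane names and the function name are hypothetical, and the real buffers work on clock-cycle timing rather than list indices.

```python
from __future__ import annotations

CB_SEQUENCE = 0b10000001  # 'Sequence' chosen in the wizard
MAX_SKEW = 7              # 'Sequence 1 Max Skew'


def bond_channels(lanes: dict[str, list[int]], master: str) -> dict[str, list[int]]:
    """Align every lane to the master using its channel bonding sequence."""
    def seq_position(name: str) -> int:
        if CB_SEQUENCE not in lanes[name]:
            raise RuntimeError(f"lane {name} never delivered the bonding sequence")
        return lanes[name].index(CB_SEQUENCE)

    master_pos = seq_position(master)
    aligned = {}
    for name, words in lanes.items():
        pos = seq_position(name)
        if abs(pos - master_pos) > MAX_SKEW:
            raise RuntimeError(f"lane {name}: skew {pos - master_pos} exceeds {MAX_SKEW}")
        aligned[name] = words[pos:]  # drop words that arrived before the sequence
    return aligned


lanes = {"GTP0_A": [5, 6, 0x81, 1, 2],
         "GTP0_B": [0x81, 1, 2],
         "GTP0_C": [9, 0x81, 1, 2]}
print(bond_channels(lanes, master="GTP0_A"))
# {'GTP0_A': [129, 1, 2], 'GTP0_B': [129, 1, 2], 'GTP0_C': [129, 1, 2]}
```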

3.12 Two dual tiles

After channel bonding had been introduced, a test was made to receive data over four lines using only two dual tiles, with both GTP0 and GTP1 active. The test was successful.

Figure 3.19. New Project Wizard


Figure 3.20. Settings of ’Device Properties’ in the New Project Wizard

Figure 3.21. New Project Wizard


Figure 3.22. Settings in the RocketIO GTP Wizard

Figure 3.23. Comma detection settings in the RocketIO GTP Wizard


Figure 3.24. Navigation path for generating FIFOs

Figure 3.25. FIFO settings in the ’Fifo Generator’


Figure 3.26. Channel bonding settings


Chapter 4

Results

4.1 Design of the transceivers

The final configuration of the transceivers, which has been shown to function as desired, is expressed in the RocketIO GTP Transceiver Wizard as summarized below for ten bit data paths.

Ten bit data path including PRBS pattern

Select Tiles and Reference Clocks

• 1) GTP_DUAL_X0Y3 (MGT1120/1), REFCLK source ’CLK_Y3’, and GTP_DUAL_X0Y4 (MGT1120/1), REFCLK source ’CLK_Y3’

Line Rate and Protocol Template

• 2) Internal Data Width ’10’

• 3) Silicon Version ’PRODUCTION’

• 4) Reference Clock ’240’ MHz

• 5) Use REFCLKOUT port to make it easy to change configuration.

• 6) Protocol Template ’Start from Scratch’

GTP0

• 7) TX-Settings: Line Rate ’2.4’ Gbps, Encoding ’None’, Data Path bits ’10’

• 8) RX- Settings: Line Rate ’2.4’ Gbps, Encoding ’None’, Data Path bits ’10’


GTP1

• 9) Protocol template ’Use GTP0 settings’

RX PCS/PMA Alignment

• 10) Use RX Buffer, Use TX Buffer

GTP0

• 11) TXUSRCLK Source ’TXOUTCLK’ , RXUSRCLK Source ’TXOUTCLK’

• 12) Optional Ports ’RXRESET’, ’TXRESET’, ’TXENPRBSTST’

Preemphasis and Differential Swing

• 13) Preemphasis Use TXPREEMPHASIS port, Use TXDIFFCTRL port

• 14) RX Termination 2/3 VTTRX , Disable internal AC coupling

• 15) Use PRBS Detector, PRBS Error Threshold 255

Comma detection

• 16) Use Comma Detection

Optional Ports

• 17) ENPCOMMAALIGN (enable positive comma detection)

• 18) ENMCOMMAALIGN (enable negative comma detection)

• 19) RXBYTEISALIGNED

• 20) RXCOMMADET

Based on these configurations and the work described in Chapter 3 (Implementations), the following configuration is suggested for use with the Σ∆-chip.

Ten bit data path including PRBS pattern

Select Tiles and Reference Clocks

• 1) GTP_DUAL_X0Y3 (MGT1120/1), REFCLK source ’CLK_Y3’, and GTP_DUAL_X0Y4 (MGT1120/1), REFCLK source ’CLK_Y3’

Line Rate and Protocol Template


• 2) Internal Data Width ’10’

• 3) Silicon Version ’PRODUCTION’

• 4) Reference Clock ’240’ MHz

• 5) Use REFCLKOUT port to make it easy to change configuration.

• 6) Protocol Template ’Start from Scratch’

GTP0

• 7) TX-Settings: Line Rate ’2.4’ Gbps, Encoding ’None’, Data Path bits ’10’

• 8) RX- Settings: Line Rate ’2.4’ Gbps, Encoding ’None’, Data Path bits ’10’

GTP1

• 9) Protocol template ’Use GTP0 settings’

RX PCS/PMA Alignment

• 10) Use RX Buffer

GTP0

• 11) TXUSRCLK Source ’TXOUTCLK’, RXUSRCLK Source ’TXOUTCLK’

• 12) Optional Ports ’RXRESET’


Preemphasis and Differential Swing

• 13) RX Termination 2/3 VTTRX , enable internal AC coupling

• 14) Use PRBS Detector, PRBS Error Threshold 255

Comment on comma detection

As can be seen in the lists above, comma detection, which was used in almost every test, is not included in the final design for the Σ∆-chip. As described in the Background, comma detection is used to keep track of the start and end of the words in the incoming data stream. Without comma detection, the data remain the same but may be divided into words at different positions. This was tested in an experiment using own produced data, and the result turned out as expected. An example of a result from this test is shown in figure 4.1.

Upper row (with comma detection): 00000001 00000010 00000011
Lower row (without comma detection): 00000100 00001000 00001100

Figure 4.1: Results of an experiment with and without comma detection, using an eight-bit counter. In the upper row, comma detection is used and the words are correctly divided. In the lower row, comma detection is not used and division of the words has started two bits earlier than it should (correct division shown by the red lines).
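The effect in figure 4.1 can be reproduced in a few lines: an eight-bit counter is serialized into one bit stream, which is then cut into words at two different start positions. This is an illustrative sketch, not the test setup used in this work; the helper names `words_to_bits` and `split_words` are hypothetical, and most-significant-bit-first serialization is assumed, so whether the misalignment shows up as a shift "earlier" or "later" depends on the actual bit order on the line.

```python
def words_to_bits(words, width=8):
    """Serialize words into one bit string, most significant bit first."""
    return "".join(format(w, f"0{width}b") for w in words)


def split_words(bits, offset, width=8):
    """Cut a bit string into width-bit words, starting at bit position offset."""
    usable = bits[offset:]
    count = len(usable) // width          # drop an incomplete final word
    return [int(usable[i * width:(i + 1) * width], 2) for i in range(count)]


stream = words_to_bits([1, 2, 3, 4])      # eight-bit counter values
print(split_words(stream, 0))             # [1, 2, 3, 4]  boundaries correct
print(split_words(stream, 2))             # [4, 8, 12]    boundaries shifted by two bits
```

The shifted words 4, 8 and 12 (00000100, 00001000, 00001100) are the lower row of figure 4.1: the bits are all there, only the word boundary is wrong, which is exactly what comma alignment corrects by locking the boundary to a known pattern.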

As illustrated in figure 4.1, without comma detection the words sent from a counter may be formed in an unexpected way. The Σ∆-chip, on the other hand, produces unsorted data, so dividing the data stream into the correct words is not critical. Instead, the four-bit words are formed in the PC from the parallel data on the four lines, one bit from each line at a time. What matters is then that the start of the data stream on these lines is captured correctly.

Figure 4.2: Illustration of how the four-bit words should be formed in the PC.
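The word formation in figure 4.2 amounts to reading the four lines column by column. A minimal sketch, assuming the four lines are already time-aligned and available as strings of '0'/'1' characters, and (an assumption, not stated in the source) that the first line supplies the most significant bit; `form_four_bit_words` is a hypothetical name, and in this work the step belongs in the MatLab program.

```python
def form_four_bit_words(lines):
    """Combine four time-aligned bit strings into 4-bit words.

    Word i takes bit i from each line; lines[0] is assumed to be the MSB.
    """
    if len(lines) != 4:
        raise ValueError("expected exactly four lines")
    n = min(len(line) for line in lines)    # ignore any unmatched tail
    return ["".join(line[i] for line in lines) for i in range(n)]


words = form_four_bit_words(["1100", "1010", "1001", "1111"])
print(words)  # ['1111', '1001', '0101', '0011']
```

This only gives the right words if the lines are aligned; with a skew between them, the columns mix bits from different time instants.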


Tests done in this work showed that the data arrived with a small skew between the lines. Since the data are stored in FIFOs before being transferred to the PC, this skew can be tolerated without losing the correct transfer of the data (see Comment on storage of data in 4.2 for an explanation).

4.2 Design of the FIFO registers

The final FIFO configuration is listed below. The write clock (WR_CLK) is later set to 240 MHz and the read clock (RD_CLK) to 50 MHz (if the built-in FIFO is used, these frequencies must also be entered in the FIFO Generator).

Memory Read/Write clock domains

• Independent clocks RD_CLK,WR_CLK

Memory type

• block RAM

Read Mode

• Standard FIFO

Data Ports Parameters

• Write width 10, Write Depth 4194304, Read width 10

Data Ports Parameters

• Reset pin, Use dout reset, full flag reset value 1, Use Reset Value 0

Data Initialization

• No programmable full threshold, No programmable empty threshold


Comment on storage of data

(Figure panels: ’Measure the skew with PRBS’ on the four FIFOs, and ’Sort the data from the Σ∆ with knowledge of the skew’.)

Figure 4.3: Measurement of the skew using a PRBS pattern (illustrated in a simplified manner) from the Σ∆-converter on the four lines in the receivers. The data in the FIFOs are sorted to correct for the skew, as illustrated.

Tests using self-generated data have shown that there is a skew between the data received in the four FIFOs. As a consequence, the first bit of every word cannot be used directly to form the 4-bit words as intended. This is illustrated in figure 4.3, where the first 4-bit word is ’1110’ instead of the correct ’1111’. What is interesting about this skew is that it arises when the transceivers are started and remains stable as long as they are not turned off. The measured skew can therefore be used to determine how the data should be combined to form the correct words. With self-generated data the skew is easy to see, but with the Σ∆-converter it has to be measured in another way. The Σ∆-chip (figure 1.2) can send a PRBS pattern on all four lines. If this is done, a MatLab program could be written to measure the skew. It should then be possible to switch from sending PRBS to sending unsorted data from the Σ∆-converter while keeping the transceiver settings. The unsorted data could then be sorted correctly into 4-bit words using the skew registered in the PRBS test (see figure 4.3). This procedure must be repeated every time the transceivers have been turned off.
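The two-step procedure above (measure the skew once with PRBS, then reuse it for the Σ∆ data) can be sketched as follows; the thesis proposes MatLab for this, and the same steps carry over directly. Assumptions, all hypothetical: each line has been read from its FIFO as a list of 0/1 bits, the skew is at most `max_offset` bits, and the PRBS is modelled by the PRBS-7 recurrence a[m] = a[m-1] xor a[m-7] (whether this matches the Σ∆-chip's generator bit for bit is not stated in the source). Because a PRBS does not repeat within its period, only the true offset gives a perfect match.

```python
def prbs7(n):
    """First n bits of a PRBS-7 sequence, a[m] = a[m-1] xor a[m-7] (period 127)."""
    bits = [1] * 7                         # any non-zero seed gives the full period
    while len(bits) < n:
        bits.append(bits[-1] ^ bits[-7])
    return bits[:n]


def estimate_offset(ref, other, max_offset):
    """Return s in [-max_offset, max_offset] maximizing matches other[i] == ref[i + s]."""
    best_s, best_score = 0, -1.0
    for s in range(-max_offset, max_offset + 1):
        idx = [i for i in range(len(other)) if 0 <= i + s < len(ref)]
        if not idx:
            continue
        score = sum(other[i] == ref[i + s] for i in idx) / len(idx)
        if score > best_score:
            best_s, best_score = s, score
    return best_s


def measure_skew(lines, max_offset=16):
    """Offset of each line relative to line 0, measured on PRBS captures."""
    return [estimate_offset(lines[0], line, max_offset) for line in lines]


def align(lines, offsets):
    """Drop leading bits so that index 0 of every line is the same time instant."""
    top = max(offsets)
    trimmed = [line[top - s:] for line, s in zip(lines, offsets)]
    n = min(len(t) for t in trimmed)
    return [t[:n] for t in trimmed]


# Simulated PRBS capture: line k started capturing delays[k] bits late.
reference = prbs7(400)
delays = [0, 1, 3, 2]
captures = [reference[d:d + 200] for d in delays]
offsets = measure_skew(captures)           # [0, 1, 3, 2]
aligned = align(captures, offsets)         # four identical, time-aligned lines
```

Once `offsets` is known, the same `align(lines, offsets)` call is applied to the Σ∆ data captured afterwards (without turning the transceivers off), and the aligned lines can then be combined column by column into 4-bit words.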

4.3 MatLab

The MatLab program described in section 3.10 should be used.

4.4 Design of the protocol

The protocol used is the ten-bit protocol described in section 3.9.6.


4.5 Complete solution

(Block diagram: on the Virtex5 LT110X, DUAL_TILE3 and DUAL_TILE4, each with receivers RX0 and RX1 and a 1.2 GHz clock, deliver RXDATA3, RXDATA31, RXDATA4 and RXDATA41 to four FIFOs (fifo3, fifo31, fifo4 and fifo41, each with din, dout, we, re, full and empty signals) written at 240 MHz. The Protocol block reads the FIFOs at 50 MHz and drives the UART (TX/RX, 11520 Hz), which connects to MatLab on the PC.)

Figure 4.4: Block diagram of the complete configuration including UART, protocol, FIFOs and two dual tiles.

Two dual tiles, configured as described above, receive data from the four lines of the Σ∆-converter at 2.4 Gbps. The 1.2 GHz clock shown in figure 4.4 is used on both its rising and falling edges, so that data are clocked in at the desired 2.4 Gbps. Within the receivers, the incoming serial data stream is converted to a parallel stream of ten-bit words clocked at 240 MHz. Data from the receivers are written into four dual-port FIFO memories at this rate and read from the FIFOs at 50 MHz. The UART sends the data to the PC at 11520 bps. This solution, using two dual tiles, gives a system with low power consumption and leaves six dual tiles free for other tasks if needed.
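The clock and data-rate figures in the paragraph above follow from a few divisions. A small sketch of that arithmetic, with hypothetical function names; the capture window additionally assumes (not stated in the source) that each FIFO is written on every 240 MHz cycle and not read during capture, and it uses the write depth listed in 4.2.

```python
def line_rate_bps(serial_clock_hz):
    """Double data rate: one bit on the rising and one on the falling edge."""
    return 2 * serial_clock_hz


def parallel_clock_hz(line_rate, data_width_bits):
    """Rate at which data_width_bits-wide words leave the deserializer."""
    return line_rate / data_width_bits


def capture_window_s(fifo_depth_words, write_clock_hz):
    """Time until a FIFO written every clock cycle (and not read) is full."""
    return fifo_depth_words / write_clock_hz


rate = line_rate_bps(1.2e9)                    # 2.4e9 bit/s per line
usr_clk = parallel_clock_hz(rate, 10)          # 240e6 Hz for ten-bit words
total = 4 * rate                               # 9.6e9 bit/s over the four lines
window = capture_window_s(4194304, usr_clk)    # about 17.5 ms per capture
```

The same functions show why the data must be captured first and sent to the PC afterwards: the UART at 11520 bps is many orders of magnitude slower than the 240 MHz write side.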


Chapter 5

Discussion

Arrangements of the dual tiles

At the start of the project (after testing the loopbacks), it was not clear from the designer's point of view whether GTP1 could be used at the same time as GTP0 without causing disturbances between the two GTPs in a dual tile. To avoid this potential problem, only GTP0 in each dual tile was used. The intention was to minimize the sources of error until the functioning of the transceivers was better understood. Once data had been successfully sent over one line, the idea was to use both GTP0 and GTP1. However, data then had to be sent over four lines at the same time, and to prevent crosstalk between the lines from disturbing the communication, the use of both GTP0 and GTP1 was postponed. The disadvantage was that this created large projects with many parameters to keep track of. At the time this was effective, but once the design had been shown to work for 8-bit transmission, a test receiving the data with only two dual tiles should have been implemented, which would have saved a lot of time and trouble in the following work. Moreover, because the intention was to use GTP1 once the transmission was assured, it was not removed, which made the projects even bigger, with an unused GTP1 connected in each dual tile.

Channel bonding

From the beginning, the idea was to achieve synchronous reception of the data on the four lines. It was later shown that there was a skew between them, but it was also realized that this should not cause a problem if the data were stored in separate memories and subsequently transferred to the PC, where the desired four-bit words could be formed.

Channel bonding was implemented as a suggestion for how to eliminate the skew between the lines if the skew is caused by start-up problems. The Σ∆-chip would then need a counter, or some other source of a known data pattern, because the receiver has to find this pattern, and the Σ∆-converter sending unsorted data does not provide such patterns. The counter would then have to be removed after synchronous reception had been achieved, and the Σ∆-converter connected instead, hopefully with no skew appearing thereafter. This solution has several drawbacks. It cannot be used with the present chip, and if a counter were implemented on a new chip, it would be problematic to disable channel bonding when the converter replaces the counter. The channel bonding logic does not stop searching for the pattern, and it is not unlikely that the Σ∆-converter would form this pattern by chance, causing the channel bonding to stop the transmission and wait for the slaves. When these do not arrive, synchronous reception ceases. Because of this problem, channel bonding, even if it worked successfully with a counter on the chip, is not shown as a result. It was tested with the aim of giving some inspiration for how to solve this problem in the future.
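To put "not unlikely" into rough numbers: if the Σ∆ output is modelled as independent, equiprobable bits (an assumption; real modulator output is not ideal random data) and the channel bonding logic checks for a k-bit sequence at every bit position, a chance match is expected about once every 2^k bits. The pattern lengths below are hypothetical, since the source does not state the length of the channel bonding sequence.

```python
def expected_false_matches_per_s(pattern_bits, line_rate_bps, positions_per_bit=1.0):
    """Expected chance occurrences per second of a fixed pattern in random data.

    positions_per_bit is 1.0 if every bit position is checked, or
    1 / word_width if the pattern is only checked at word boundaries.
    """
    return line_rate_bps * positions_per_bit * 2.0 ** (-pattern_bits)


for k in (10, 20, 40):
    hits = expected_false_matches_per_s(k, 2.4e9)
    print(f"{k:2d}-bit pattern: {hits:.3g} chance matches per second per line")
```

Under this model a 10-bit sequence would appear about 2.3 million times per second per line at 2.4 Gbps, and even a 40-bit sequence roughly every 7.6 minutes, which supports the conclusion that channel bonding cannot simply be left running on unsorted converter data.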

Use of self-generated data

The Σ∆-chip includes a PRBS generator so that the configured receiver can be verified. It is therefore required that the receiver is designed so that it can receive a PRBS pattern of the same kind as the one produced by the generator on the Σ∆-chip. The chip does not include a counter, so the use of a counter in the project can be questioned. However, the counter proved useful while designing the configuration. When designing the interface between the PC (MatLab) and the FPGA, it would have been hard to see whether the random-like PRBS pattern was received correctly at the PC. The counter also made it easier to see and estimate the skew between the lines. Self-generated data also help the designer to gain a thorough understanding of the blocks in the receiver. It was, for instance, easy to see how the words may be formed wrongly when comma detection is not used (see section 4.1 under ’Comment on comma detection’). With the PRBS pattern, this would have been impractical. The problem could possibly be solved with a PRBS checker written in MatLab and an algorithm to check whether there was skew between the lines. This would, however, have required extra effort, and the understanding of the transceiver would not have been as deep as when, as here, self-generated data were used. Because of these advantages, self-generated data were used and their features are explained in this work.
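The PRBS checker mentioned above can be self-synchronizing, so that no reference sequence or alignment is needed: every received bit is predicted from earlier received bits using the generator recurrence. A minimal sketch (the thesis would use MatLab), assuming the PRBS-7 recurrence a[m] = a[m-1] xor a[m-7]; the Σ∆-chip's actual polynomial and bit order are assumptions, and the function names are hypothetical. A single bit error is counted three times, because the wrong bit also enters two later predictions.

```python
def prbs7(n):
    """First n bits of a PRBS-7 sequence, a[m] = a[m-1] xor a[m-7]."""
    bits = [1] * 7
    while len(bits) < n:
        bits.append(bits[-1] ^ bits[-7])
    return bits[:n]


def count_prbs7_errors(bits):
    """Self-synchronizing check: count bits that break the PRBS-7 recurrence."""
    return sum(1 for m in range(7, len(bits))
               if bits[m] != bits[m - 1] ^ bits[m - 7])


rx = prbs7(1000)
clean = count_prbs7_errors(rx)      # 0: every bit obeys the recurrence
rx[500] ^= 1                        # inject a single bit error
errors = count_prbs7_errors(rx)     # 3: checks at m = 500, 501 and 507 fail
```

Because the check only uses the received bits themselves, it works on a capture that starts anywhere in the sequence, which suits data read from a FIFO after an arbitrary start-up.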

State machine

Throughout this work, simplicity has been the aim. Many Verilog modules have been used, and the designs in the configurations were kept small so that they were easy to understand and errors could be found as early as possible. This has not been the case with the protocol, however. Implementing it as a single state machine may be questioned: it has been hard to represent in a figure that gives an overview, and it has also been difficult to find bugs in this large design.


Eight-bit configuration

Two parallel development tracks are described, one constructing a system for an eight-bit solution and one for a ten-bit solution. Both systems work well. Using eight-bit data made it easy to see the skew between the lines, since whole words could be sent at the same time (through the RS232, see 2.42), but this solution is not presented in the results chapter because it has only been used in tests and preparations for the final ten-bit system. However, the information in the chapter on the implementations can be used to produce a well-working system for eight-bit transmission.


Chapter 6

Future work

Investigating the cause of skew

According to the user guide [6], the PMACLKs of the different dual tiles have an arbitrary phase difference, which can be reduced by using the Phase alignment circuit. The skew seen between the four lines in the receiver could thus have been caused by phase differences in the transmitters. It would therefore be interesting to test whether the skew disappears when the Phase alignment circuit is used.

Once the Σ∆-chip is available, a simpler and more relevant way to test whether the transmitters caused the skew is the PRBS measurement described in "Comment on storage of data" in 4.2. Instead of measuring the size of the skew, the test could be used to see whether there is any skew at all. If there is none, it could be concluded that the transmitters on the LT110X chip caused the skew. In that case, the original idea of forming the 4-bit words in hardware could be implemented, which would result in a faster system.

Clock sources

In this work, TXOUTCLK, the parallel clock produced in the PLL for the transmitter, is used to source the user clocks in the receiver. This was done to achieve synchronization, which was important when receiving data in one of the loopbacks. The design was then retained throughout the whole project; it was convenient since transmitters were always used in the tests made in this work, and it causes no problems. A disadvantage is that an unnecessary amount of logic is occupied. It would be possible instead to use the reference clock to feed the user clocks and disconnect the transmitters. REFCLKOUT would then be used instead of TXOUTCLK as the source for the receiver user clocks (RXUSRCLK), and the TX line rate should be set to ’No TX’ in the GTP Transceiver Wizard.


Different oscillator sources for transmitter and receiver

In this thesis work, the TX buffer was used on the assumption that the same oscillator source could be used for both transmitter and receiver. Tests are planned at the department using two synchronized signal generators, one producing the 2.4 GHz clock for the Σ∆-converter and one producing the 240 MHz clock for the dual tiles. If this is not successful, it could be interesting to resolve the phase difference between the recovered clock RXRECCLK and RXUSRCLK by using the Phase alignment circuit, and to feed RXUSRCLK with RXRECCLK to avoid frequency differences between the clock domains.

Extension of MatLab program

The MatLab program has to be extended so that it can handle the skew measurement using the PRBS pattern sent on the four lines. It must also be able to use the result of the skew measurement to form the correct four-bit words.


Chapter 7

Conclusion

The thesis work has succeeded in its purpose of configuring four receivers so that they can receive data at 2.4 Gbps. The aim of the work was also to transfer the received data to a PC to enable analysis. This was achieved by creating dual-port FIFOs that store the 10-bit words from the receivers at 240 MHz. The data can then be read from the FIFOs at 50 MHz. By means of a protocol, the data could then be transferred to the computer through a UART at 11520 bps (using RS232). In MatLab, the user can turn on storage of the data from the four lines into the FIFOs and then read the incoming data. The initial idea was that the first bit in each word from the four receivers would form a four-bit word directly in the FPGA, the next bit in each word would form the next four-bit word, and so on. Because of the observed skew, this will instead be done in the PC after skew analysis.

Since the Σ∆-chip was not available, the transmitters on the development board were used to configure the receivers. To make the situation as similar as possible to that with the Σ∆-chip, the receivers were configured under the same conditions, e.g. the same PRBS pattern. This means that the existing configuration can be used directly when the chip arrives, and the PRBS pattern can be sent to test the link and measure the skew, see 4.2 (Comment on storage of data).

The data can be decimated (filtered and downsampled) in the PC, but that task was not included in this work. Improvements can still be made to the design of the receivers when the actual Σ∆-chip is ready for use, but the main task of this project has been successfully completed.


Chapter 8

References

1. Christer Svensson (personal communication).

2. Kevin Morris. Virtex-5 is alive. The high end gets higher. FPGA and Structured ASIC Journal, May 16, 2006. www.fpgajournal.com

3. http://www.olopolo.com/kurser/dcom/rs232.html

4. Rashad Ramzan (personal communication).

5. Xilinx, http://www.xilinx.com/support/documentation/datasheets/ds100.pdf. Virtex-5 Family Overview, version 4.4, September 2008.

6. Xilinx, http://www.xilinx.com/support/documentation/userguides/UG196.pdf. Virtex-5 RocketIO GTP Transceiver User Guide, version 1.6, February 2008.

7. Xilinx, http://www.xilinx.com/support/documentation/userguides/UG225.pdf. ML52x User Guide. Virtex-5 RocketIO Characterization Platform, version 1.0, March 2007.

8. Xilinx, http://www.xilinx.com/support/documentation/userguides/UG188.pdf. Virtex-5 FPGA RocketIO GTP Transceiver Wizard, version 1.9.1, June 2008.

9. Xilinx, http://www.xilinx.com/support/documentation/applicationnote.pdf. Transmitting DDR data between LVDS and RocketIO CML devices, XAPP756, version 1.0, November 2004 (Author: Martin Kellerman).
