05417289

  • Upload
    gzb012

  • View
    218

  • Download
    0

Embed Size (px)

Citation preview

  • 7/27/2019 05417289

    1/6

    Performance Evaluation of Galois Field Arithmetic

    Operators for Optimizing Reed Solomon Codec

    Petrus MursantoFaculty of Computer Science University of IndonesiaKampus UI, Depok 16424 Phone: +62-21-7863419

    Email: [email protected]

    Abstract-A series of experiments has been conducted to show

    that efficiency improvement in Galois Field (GF) operators does

    not directly correspond to the system performance at application

    level. The experiments were motivated by so many research works

    focusing on performance improvement of GF operators.

    Numerous variants of operators were formed based on various

    combination of operation types (multiplication, division, inverse,

    square), representation basis (Polynomial, Normal, Dual), and

    processing types (serial, parallel). Each of the variants has the

    most efficient form in either time (fastest) or space (smallestoccupied area) when implemented in FPGA chips. In fact, GF

    operators are not utilized individually, rather integrated one to

    the others in implementing algorithms, mostly in error correction

    codes and cryptography applications. The experiments based on

    the implementation of Reed Solomon Encoder and Decoder

    RS(15,11) 4-bit using VHDL by means of two synthesis tools the

    Xilinx ISE 8.2i and the Altium ProChip Designer concludes that

    application performance mainly depends on the composition and

    distribution of the operators as well as their interaction and

    interconnection within the system architecture.

    Keyword: Galois Field, Reed Solomon, VHDL, FPGA.

    I. INTRODUCTION

    Galois Field (GF) arithmetic plays an important role in

    modern communication system, particularly in two important

    aspects of information exchange, i.e. security and data

    correctness. GF is utilized in cryptography algorithm [1][2] and

    error correction codes (ECC) [3][4]. Performance of

    applications in these two fields is determined by the efficiency

    of GF arithmetic operators involved in the system [5]. There

    has been found in the literatures, research efforts in improving

    GF operators efficiency, i.e. multiplication [6], division [7]

    and inversion [8]. In fact, GF operators are not performing theirfunctions individually and independently, rather they are parts

    of a functional integration at the system level. Is operatorefficiency beneficial to the application level performance?

    This paper reports an experimental result of implementing

    Reed Solomon Encoder and Decoder (Codec) RS(15,11) 4-bitbased on six variants of GF operator. The purpose of the

    experiment is to obtain an RS Codec configuration whose

    throughput is the most optimum. The RS Codec was

    implemented using VHDL by means of two synthesis tools: the

    Xilinx ISE 8.2i and the Altium ProChip Designer.

    II. PREVIOUS RESEARCH

    Similar to the ordinary algebra, GF algebra has a number of

    arithmetic operations, such as: addition, subtraction,

    multiplication, division, inversion, square and square root.

    Variants of GF arithmetic operators are characterized by:1.operation types: multiplication, division, inversion, square or

    square root.

    2.representation basis: standard/polynomial (PB), normal (NB)or dual (DB)

    3.processing types: serial or parallelIn digital circuit, GF addition and subtraction are simply

    implemented by exclusive-OR logic operation. The advance of

    digital technology has shifted performance metric from running

    time of software algorithm [9] to VLSI complexity, i.e. the

    number of components and their total delay [6].

    A. Performance Improvement of GF Operators

    The first circuit structure of GF arithmetic was proposed by

    Berlekamp in 1982, i.e. polynomial and dual based

    multiplication [10]. Normal based multiplier was introducedfirstly by Massey-Omura in 1986 [11], which is knownafterward as MO multiplier. In 1988, Mastrovito proposed a

    more modular multiplier with higher regularity of the structure

    that suits systolic cells in VLSI [12]. However, speed, size and

    modularity of Mastrovito's multiplier depend much on the

    irreducible polynomialP(x) used to generate the field elements.

    By selecting the right P(x), parallel multiplication has at most

    2m2-1 gates and occupies 55% of the space required for

    implementing Bartee and Schneider's algorithm [9]. In 1991,

    Mastrovito's dissertation reported an experimental investigation

    on multiplication using more than one representation basis

    [13]. It was concluded that PB multiplier is the most versatile

    form for the most arithmetic computational problems GF(2m) in

    VLSI. In addition, PB solution also posses conversion cost that

    can compensate the efficiency gained by the other

    representation basis [14] and occupies a half space of the one

    required by MO multiplier. Mapping problem for inter-basis

    conversion is the concern of Wu et al. [15] who introduced anefficient conversion method from PB to NB specifically for

    squaring. Furthermore, Sunar et al. proposed conversion

    matrixes for any form of generator polynomial [16].

  • 7/27/2019 05417289

    2/6

    Several improvements of multiplication algorithm were also

    reported by Afanasyev [17] and similarly by Hasan et al. [18]that proposed a modification of the architecture by defining the

    irreducible polynomial as all-one polynomial (AOP). By

    applying the AOP, Hasan claimed the complexity of

    multiplication decreases by 50%. Meanwhile, Lee-Lim also

    reported performance improvement by applying circular dual

    basis (CDB) [19]. Lee's method is very efficient for trinomial

    with composite GF((2n)m) where m is primary relative overn,orgcd(m,n) = 1. However, defining certain form of irreducible

    polynomial is considered as limitation, inflexible and low

    reusability [20].

    A comprehensive study on GF arithmetic was reported by

    Paar's dissertation [21], in which he proposed a decomposition

    algorithm from GF(2k) to GF((2

    n)

    m) where k = n.m, called

    composite field. In addition, Paar also explored inversion after

    the first algorithm introduced by Itoh-Tsujii in 1988 [22].

    Further Paar's research reported composite field multiplicationand inversion in GF(2

    8) [23]. Composite field implementation

    in FGPA showed component saving by 25% and acceleration

    by 10% [24]. The composite field inversion requires 29% of

    AND and XOR gates compared to the standard one. Rudra [25]

    and Jutla [26] also developed a method for linear

    transformation of GF binary elements to composite fieldrepresentation.

    Combination of serial and parallel processes were reported

    by Choi et al. [27] who introduced hybrid multiplier by

    forming irreducible polynomial xm

    + xn

    + 1 where n m/2.This hybrid multiplier has flexible structure to compromise

    space and time complexity and is proven having less

    complexity than Wu and Hasans multiplier [15]. Several other

    methods were proposed to support hardware implementation,

    such as Huang-Wu [28] that has systolic array architecture

    approach to ease the testing process.

    B. RS Codec Implementation

    RS Codec algorithm has been discussed extensively in term

    of software algorithm running times. There are two references

    mentioning hardware implementation of RS Codec, they are

    [29] and [30]. However, there is no analytical discussion from

    those references, but merely on experience sharing with very

    specific characteristics. Kamran Saleh managed to develop

    RS(15,11) 4-bit that has processing speed 692 Mbps [29].

    Shayan et al. developed RS(16,12) 5-bit for the Advanced

    Train Control System (ATCS) radio data link [30]. The RSdevice has total delay (encode and decode) 1.32ms in 2 MHz

    system.

    The available RS Codec applications are merely commercial

    IP Core with specific interface and features, such as [31], [32]

    and [33]. Component compositions, operator types and

    algorithms involved are kept secret due to its proprietary.

    III. MOTIVATION

    Previous research focused on efficiency improvement of GF

    operators to obtain better performance in term of speed or

    occupied space when implemented in digital circuits. However,

    literatures on GF-based implementation at system level are

    merely experience sharing with specific features without any

    analysis on the consequences of GF operator variants involved

    in the system. Based on the available literatures, theexperiments were designed to answer the following research

    questions:

    Can performance improvement gained at operator level beobtained linearly at the application level?

    How can GF-based circuit optimization be achieved atapplication level?

    Will an application take benefit by employing all bestvariants of GF operators?

    IV. METODOLOGY

    A set of experiments was designed to examine whether thebest variants of GF operator can be combined to construct the

    most efficient application. In other words, what configuration

    of GF operator variants can produce an application with the

    greatest throughput? Arithmetic operator types were taken assuggested by the algorithm of encoder and decoder for

    RS(15,11) 4-bit. The operators were implemented as the most

    efficient variant from each combination of two parameters:

    representation basis (PB, NB or DB) and processing structure

    (parallel or serial). For further reference in the next discussion,

    we use six variants of operators as shown in Table I.

    TABLE I

    GFOPERATORVARIANTS

    Varian Structure Basis

    1 ParallelPolynomial

    2 Serial

    3 ParallelNormal

    4 Serial

    5 ParallelDual

    6 Serial

    The RS Codec is implemented into six versions, each of

    which was constructed based on the variants in Table I. They

    were implemented using structural VHDL and synthesized

    with Xilinx ISE 8.2i and Altium ProChip Designer. The

    synthesis process results in maximum combinational path ortotal delay that defines the maximum frequency possiblysupplied to the system. Hence, the system throughput can be

    calculated based on data capacity proceeded per time unit,

    expressed in Mega Byte per second (MBps). Throughput is

    then used as an indicator of the system performance. An

    optimal configuration is defined as the one having biggest

    throughput among the six versions of the system.

  • 7/27/2019 05417289

    3/6

    V. GF OPERATORARCHITECTURES

    This section briefly discusses the six variants of GF operator,

    particularly the ones widely used in encoding and decoding

    process, i.e. the multiplication. Variant 1 of multiplier

    implements Mastrovito's circuit in [12]. Multiplication is

    proceeding partially in variant 2 by the circuit shown in Fig.1.

    Variant 3 and 4 are realized based on the NB multiplier in [13].

    Multiplier variant 5 and 6 are implemented using parallel andserial structure in [13].

    Fig. 1. Serial PB MultiplierGF(2m).

    The implementation of six multiplier variants results in

    delays shown in Table II. It can be seen that the parallel

    multiplier in dual basis has the biggest combination delay. Thebest variant is the one having the smallest delay, i.e.

    polynomial based parallel multiplier or variant 1.

    TABLE II

    DELAY OF GF(24)MULTIPLIER IN NS

    Structure Tools PB NB DB

    ParallelXilinx 12,69 13,49 17,61

    Altium 11,37 11,94 14,25

    SerialXilinx 15,96 15,18 15,31

    Altium 15,24 16,26 15,14

    VI. RS ENCODER

    The Reed Solomon encoder circuit is shown in Fig.2. Based

    on the same structure, six versions of RS Encoder were

    implemented, each of which employs multiplier from thevariants in Table I and their performance were compared each

    other. Addition in encoding process can easily be implementedusing XOR, bit-serially as well as in parallel. Therefore,

    performance is totally affected by the delay of each multiplier

    type in the encoder.

    The RS(15,11) 4-bit encodes the information of 11 symbols

    @4-bit or in total 44 bits. Number of cycle required for the

    encoding process is 1 cycle initialization plus 11 cycles for

    parallel multiplication. The serial PB and NB require 1

    (initialization) + 11 (1 init + 4 bit) = 56 cycles. Specific for

    serial DB, 1 (initialization) + 11 4 bit = 45 cycles. Synthesis

    result reports a minimum period as shown in Table III. As

    consequences the throughput obtained by each version of RSEncoder are shown in Table IV.

    Fig. 2. Block of RS encoding process.

    TABLE III

    MINIMUM PERIOD OF RS ENCODER IN NS

    Tool PB NB DB

    Xilinx 2,50 3,80 2,76

    Altium 2,11 3,69 2,55

    TABLE IV

    RS ENCODERTHROUGHPUT IN MBPS

    Structure Tools PB NB DB

    ParallelXilinx 36,00 33,95 26,00

    Altium 40,30 38,39 32,20

    SerialXilinx 39,30 25,90 44,25

    Altium 46,52 26,60 48,00

    It is shown that RS Encoder has the biggest throughput by

    using multiplier variant 6, i.e. dual based serial structure. In

    fact Table II shows that the fastest multiplier is performed by

    variant 1, i.e. polynomial based parallel multiplier.

    VII. RS DECODER

    The main function of decoding process is to detect the

    existence of error and make correction of it (if any). The

    configuration of RS(15,11) allows the maximum number of

    error (t) exists in 2 symbols. The decoding process has main

    process as shown in Fig.3.

    Fig. 3. RS Decoder.

  • 7/27/2019 05417289

    4/6

    The implementation and evaluation results of eachprocessing blocks are discussed in the following sessions.

    A. Syndrome Calculator

    Syndrome Calculator is to detect the existence of error in the

    transmitted codewords. If the values of syndromes are all zero,

    then r(x) is the codewords c(x) or in other words there is noerror. Error produces a set of syndromes with specific pattern.

    Number of syndrome to be calculated is 2t, hence in RS(15,11)

    there will be four syndromes (Si). Syndromes are calculated by

    serial entry i

    for1 i 4 to the received polynomial r(x).Since the data is transmitted per symbol, hence the whole data

    should be received first before computing Si in parallel. The

    delay and throughput of syndrome calculator for the threerepresentation basis are shown in Table V.

    TABLE V

    DELAY AND THROUGHPUT OF FULLY PARALLEL SYNDROME CALCULATOR

    Tools Unit PB NB DB

    XilinxDelay (ns) 15,29 16,67 15,14

    Thrghpt (MBps) 32,69 29,99 32,20

    AltiumDelay (ns) 13,70 14,75 12,25

    Thrghpt (MBps) 36,50 33,90 40,80

    Syndrome computation can also be performed sequentially

    as the symbols are transmitted. The result of performance

    measurement is shown in Table VI.

    TABLE VITHROUGHPUT SYMBOL-SERIAL SYNDROME CALCULATOR (MBPS)

    Structure Tool PB NB DB

    Parallel

    Xilinx 36,94 34,74 26,62

    Altium 41,23 39,26 32,89

    SerialXilinx 40,00 26,35 45,26

    Altium 47,37 27,10 49,10

    B. Key Equation Solver

    Key Equation Solver (KES) is the most complex module in

    RS decoding process. KES is implemented accordingBerlekamp-Massey algorithm in [34]. The experiment on the

    six variants of operator within KES circuits results in

    performance figure shown in Table VII.

    TABLE VIIKESTHROUGHPUT IN MBPS

    Structure Tool PB NB DB

    ParallelXilinx 147,75 138,95 106,49

    Altium 164.90 160,00 131,58

    SerialXilinx 150,00 98,80 169,71

    Altium 177,60 101,57 184,20

    C. Error Locator & Magnitude Polynomial

    Error locator and magnitude are calculated by the circuit

    described in [34]. Throughput is calculated with max error of 4

    and represented by four coefficients of the error locator

    polynomial = 4 4 = 16 bit. Table VIII shows the module

    performance with six variants of multiplier and square.

    TABLE VIII

    ERRORLOCATOR & MAGNITUDE POLYNOMIAL

    Structure Tool PB NB DB

    Parallel Xilinx 157,60 148,20 113,58Altium 175,90 174,00 140.35

    SerialXilinx 160,00 105,40 181,02

    Altium 189,50 108,34 196,50

    D. Error Corrector

    Error correction is conducted by summing (XOR) Error

    Magnitude Polynomial with the received data. Delay for the

    three representation basis is the delay of XOR gates. In Xilinx:

    delay = 2,9 ns and throughput = 2,586 GBps, whereas in

    Altium: delay = 2,7 ns and throughput = 2,778 GBps.

    E. Optimal RS Decoder

    Practically the overall RS Decoder performance requires auniform representation basis for the whole symbols. It has been

    learnt that the speed of inter-module sequential processes is

    defined by the Syndrome Calculator. The fastest error locatorin DB is slowed down by the Syndrome Calculator

    performance. In summary, Table IX to Table XII show the best

    throughput of the module from each representation basis.

    TABLE IXTHE BEST SERIAL SYNDROME CALCULATOR

    Tool Architecture Thrghpt (MBps) clk

    Xilinx

    Serial PB 40,00 2,5 75

    Parallel NB 34,74 13,49 16Serial DB 45,26 2,76 60

    Altium

    Serial PB 47,37 2,11 75

    Parallel NB 39,26 15,18 16

    Serial DB 49,10 2,55 60

    To combine the four modules into one integrated system,

    cycle rate must conform to the slowest component. Hence, the

    best configurations from each representation basis are shown in

    Table XIII. It is shown that both synthesis tools produce the

    best RS Decoder which has serial operation and GF element

    representation in dual basis. In fact, Table II shows that thebest multiplier is in parallel PB.

    TABLE XTHE BEST KEY EQUATION SOLVER

    Tool Architecture Thrghpt (MBps) clk

    Xilinx

    Serial PB 150,00 2,5 20

    Parallel NB 138,95 13,49 4

    Serial DB 169,49 2,76 16

    Altium

    Serial PB 177,60 2,11 20

    Parallel NB 157,00 11,94 4

    Serial DB 184,20 2,55 16

  • 7/27/2019 05417289

    5/6

    TABLE XITHE BEST ERRORLOCATOR & MAGNITUDE POLYNOMIAL

    Tool Architecture Thrghpt (MBps) clk

    Xilinx

    Serial PB 160 2,5 5

    Parallel NB 148,20 13,49 1

    Serial DB 181,02 2,76 4

    Altium

    Serial PB 189,5 2,11 5

    Parallel NB 174 11,94 1

    Serial DB 196,50 2,55 4

    TABLE XIITHE BEST ERRORCORRECTOR

    Tool Basis Thrghpt (MBps) clk

    Xilinx

    PB 2586 2,9 1

    NB 2586 2,9 1

    DB 2586 2,9 1

    Altium

    PB 2778 2,7 1

    NB 2778 2,7 1

    DB 2778 2,7 1

    TABLE XIII

    OPTIMAL DECODER

    Tool Basis /clk #cycle (ns) Thrghpt(MBps)

    Xilinx PB 2,5 75+20+5+1 252,5 29,70

    NB 13,49 16+4+1+1 296,87 25,26

    DB 2,76 60+16+4+1 223,72 33,52

    Altium PB 2,11 75+20+5+1 213,2 35,18

    NB 11,94 16+4+1+1 262,68 28,55

    DB 2,55 60+16+4+1 206,15 36,38

    VIII. OPTIMIZATION BY THE SYNTHESIS TOOL

    Experiment result shows that employing the best operators

    does not automatically produce the most efficient applications.This phenomenon is shown consistently by Xilinx and Altium

    synthesis tools. Simple and common logic saying that parallel

    process is faster than serial does not hold. This is caused by the

    fact that the synthesis tool does some improvement or limitedoptimization to the VHDL structural design.

    Multiplication is obviously the dominant process in RS

    Codec. Examination on the results shows that multiplication

    with fixed bit operands is optimized by removing unnecessary

    components in the system. For specific configuration of RS

    Codes,P(x) andg(x) are fixed during the system lifecycle. g(x)

    is one of the operand performing as a(x) in Fig.1. Fixed values

    ofpi and gi cause the content of blockEi that was originallyshown as in Fig.4a can be simplified becoming the circuits

    shown in Fig.4b-c-d-e. All combination of ai and pi results in

    internal blockEi without any AND gate. By omitting AND

    gates, Xilinx saves significantly the delay so that the minimum

    period is 2,5 ns. Hence, processing 4 bit requires 10 ns, which

    is smaller than parallel multiplication delay 12,69 ns.

    Fig. 4. Internal component ofEi in PB Serial Multiplier.

    It is examined that optimization was not applied to parallel

    multiplications although they also have constant operands. It is

    due to the fact that the tools optimize constant bit only,

    whereas parallel multiplier signal is implemented as an m-bitbus.

    DB multiplier is superior over the PB and NB variants in

    term of fully serial delivery of the product. With constant

    values ofP(x) and A(x), DB serial multiplier delivers theproduct bit by bit as the operand bi enters from MSB to LSB.

    Therefore, variant 6 saves 1 cycle compared to variant 2 and 4

    that deliver the product after the whole cycle for m-bit

    completes. For that reason, variant 6 based systems can

    proceed addition serially as the product is delivered from index

    m-1 to 0.

    With constantP(x) = x4

    + x + 1, optimization steps applied

    to serial DB multiplication is shown by Fig.5. In general this

    finding supports several statements in the literature that

    multiplication with constant operands can be more efficient

    [10]. It was examined as well that optimization does not applyto serial NB multiplier. In serial NB multiplication, the values

    of both operands change dynamically during the lifecycle of

    the system due to the rotation of internal registers.

    IX. CONCLUSION

    This paper reports that an optimal performance of RS Codec

    does not always require the best variants or the most efficient

    GF operators. Combining all operators, each of which is themost efficient variant, is not a simple mechanistic conversion

    process. Obtaining synergic efficiency at system level requirescareful considerations on several factors such as: the operator

    composition and distribution, interaction between them and

    types of internal process within the system. In addition, when

    implemented in FPGA, efficiency improvement is also

    contributed by the synthesis tool that optimizes serial operators

    whose operands are constant. This explains why theexperimental results show the superiority of serial based

    system over parallel ones.

  • 7/27/2019 05417289

    6/6

    Fig. 5. DB Serial Multiplier a) original b) optimized with constantP(x).

    It is interesting to examine further the consistency of this

    optimization in other GF based applications, such as

    cryptography and other configurations of RS Codec.

    Explorative experiments are required for real symbol width, m= 8 bit such as the one used by NASA RS(255,223) 8-bit [35]

    and Rijndael AES with cipher-key 8-bit [36]. Hypothetical

    prediction suggests that higher performance ratio would beobtained by serial variants over the parallel ones. It is because

    of additional combinational path in parallel operators that

    results in bigger delays. Meanwhile, serial operators requireonly several additional cycles with constant minimum periods.

    ACKNOWLEDGMENT

    This material is based upon work supported by EuropeanUnion Project VN/Asia-Link/001 (79754). The author would

    like to thank Prof. R.G. Spallek and Mr. Jrg Schneider for

    providing tools and facilities in the Professur fr VLSI-

    Entwurfssysteme, Diagnostik und Architektur, Technische

    Universitt Dresden, Germany.

    REFERENCES

    [1] vanTilborg, H. (1988) An Introduction to Cryptology, Kluwer Academic

    Publishers.[2] Schneier, B. (1993) Applied Chryptography, Wiley & Sons.

    [3] B.Wicker, S. (1995) Error Control Systems for Digital Communication

    and Storage, Prentice Hall, Upper Saddle River, New Jersey 07458.[4] Sweeney, P. (2002) Error Control Coding from Theory to Practice, John

    Wiley & Sons, Inc., England.

    [5] Lin, S. and Costello, D. J. (1983) Error Control Coding, Prentice Hall,Englewood Cliffs, New Jersey.

    [6] Wang, C., Truong, T., Shao, H., Deutsch, L., Omura, J., and Reed, I.

    August 1985 IEEE Transactions on Computers 34(8), 709717.[7] Hasan, M., Wang, M., and Bhargava, V. August 1992 IEEE Transactions

    on Computers 41(8), 972980.

    [8] Zhou, T., Wu, X., Bai, G., and Chen, H. 4 July 2002 Electronics Letters38(14), 706707.

    [9] Bartee, T. and Schneider, D. March 1963 Information and Computers 6,

    79 98.

    [10] Berlekamp, E. November 1982 IEEE Transactions on InformationTheory 28, 869 874.

    [11] Massey, J. and Omura, J. Computational method and apparatus for finite

    field arithmetic Technical report US Patent No. 4,587,627 to OMNET

    Association Sunnyvale CA, Washington, D.C. (1986) Patent and

    Trademark Office.[12] Mastrovito, E. D. VLSI designs for computations over finite fields

    GF(2m) Masters thesis Linkping Studies in Science and Technology

    Linkping, Sweden December 1988 Thesis No: 159.[13] Mastrovito, E. VLSI Architectures for Computations in Galois Fields

    PhD thesis Department of Electrical Engineering, Linkping University,

    Sweden (1991).[14] Gollman, D. Algorithmenentwurf in der kryptographie Habil. University

    of Karlsruhe (1990) Preprint.

    [15] Wu, H., Hasan, M. A., F.Blake, I., and Gao, S. 23 August 2001.[16] Sunar, B., Savas, E., and Koc, C . K. November 2003 IEEE

    Transactions on Computers 52(11), 13911398.[17] Afanasyev, V. (1990) In Proceedings of II International Workshop on

    Algebraic and Combinatorial Coding Theory Leningrad, USSA: pp. 68.

    [18] Hasan, M., Wang, M., and Bhargava, V. October 1993 IEEE

    Transactions on Computers 42(10), 1278 1280.[19] Lee, C.-H. and Lim, J.-I. A new aspect of dual basis for efficient field

    arithmetic Technical report Samsung Advaced Institute of Technology

    (1998).[20] Fenn, T. S., Benaissa, M., and Taylor, D. March 1996 IEEE Transactions

    on Computers 45(3), 319 327.

    [21] Paar, C. Efficient VLSI Architectures for Bit-Parallel Computation inGalois Fields PhD thesis Institute for Experimental Mathematics,

    University of Essen Essen, Germany June 1994.[22] Itoh, T. and Tsujii, S. (1988) Information and Computing 78, 171177.

    [23] Paar, C. and Rosner, M. April 1997 IEEE Symposium on FPGA-Based

    Custom Computing Machines (FCCM) pp. 219225.

    [24] Mursanto, P. 29-30 January 2007 In National Conference on ComputerScience & Information Technology Depok, Fasilkom UI.

    [25] Rudra, A., Dubey, P. K., Jutla, C. S., Kumar, V., Rao, J., and Rohatgi, P.(2001) volume 2162, of Proceedings of the Third International Workshop

    on Cryptographic Hardware and Embedded Systems London, UK:

    Springer-Verlag. pp. 171184.[26] Jutla, C., Kumar, V., and Rudra, A. On the circuit complexity of

    isomorphic Galois Field transformations Technical Report RC22652

    (W0211-243) IBM Research Division November 2002.[27] Choi, Y., Chang, K.-Y., Hong, D., and Cho, H. July 8 2004 Electronics

    Letters 40(Issue 14), 852 853.

    [28] Huang, C.-T. and Wu, C.-W. September 2000 IEEE Transactions onCircuit and Systems 48(9), 909918.

    [29] Saleh, K. Hardware architectures for Reed-Solomon decoders Technical

    report Amirkabir University of Technology, Iran January 2003.

    [30] Shayan, Y., Ngoc, T., and Bhargava, V. November 1990 IEEE Transc. on

    Vehicular Technology 39(4), 400409.[31] ASICSws Reed Solomon encoder IP core, http://www.asics.ws/doc/rse

    brief.pdf Nov 2006.

    [32] Xilinx, I. Reed-Solomon solutions with Spartan-ii FPGAs Technical

    report Xilinx, Inc. Feb. 2000 White Paper.[33] Reed Solomon coding Technical report VOCAL Technologies Ltd. 90A

    John Muir Drive, Buffalo - New York 14228 (2006).

    [34] Sarwate, D. and Shanbhag, N. October 2001 IEEE Transactions on VLSISystems 9(5), 641655.

    [35] Sklar, B. (2003) Digital Communications Fundamentals and

    Applications, Prentice Hall, Inc., New Jersey 2nd ed. edition.[36] Daemen, J. and Rijmen, V. March 2001 Dr.Dobbs Journal 26, 137139.