Upload
gzb012
View
218
Download
0
Embed Size (px)
Citation preview
7/27/2019 05417289
1/6
Performance Evaluation of Galois Field Arithmetic
Operators for Optimizing Reed Solomon Codec
Petrus MursantoFaculty of Computer Science University of IndonesiaKampus UI, Depok 16424 Phone: +62-21-7863419
Email: [email protected]
Abstract-A series of experiments has been conducted to show
that efficiency improvement in Galois Field (GF) operators does
not directly correspond to the system performance at application
level. The experiments were motivated by so many research works
focusing on performance improvement of GF operators.
Numerous variants of operators were formed based on various
combination of operation types (multiplication, division, inverse,
square), representation basis (Polynomial, Normal, Dual), and
processing types (serial, parallel). Each of the variants has the
most efficient form in either time (fastest) or space (smallestoccupied area) when implemented in FPGA chips. In fact, GF
operators are not utilized individually, rather integrated one to
the others in implementing algorithms, mostly in error correction
codes and cryptography applications. The experiments based on
the implementation of Reed Solomon Encoder and Decoder
RS(15,11) 4-bit using VHDL by means of two synthesis tools the
Xilinx ISE 8.2i and the Altium ProChip Designer concludes that
application performance mainly depends on the composition and
distribution of the operators as well as their interaction and
interconnection within the system architecture.
Keyword: Galois Field, Reed Solomon, VHDL, FPGA.
I. INTRODUCTION
Galois Field (GF) arithmetic plays an important role in
modern communication system, particularly in two important
aspects of information exchange, i.e. security and data
correctness. GF is utilized in cryptography algorithm [1][2] and
error correction codes (ECC) [3][4]. Performance of
applications in these two fields is determined by the efficiency
of GF arithmetic operators involved in the system [5]. There
has been found in the literatures, research efforts in improving
GF operators efficiency, i.e. multiplication [6], division [7]
and inversion [8]. In fact, GF operators are not performing theirfunctions individually and independently, rather they are parts
of a functional integration at the system level. Is operatorefficiency beneficial to the application level performance?
This paper reports an experimental result of implementing
Reed Solomon Encoder and Decoder (Codec) RS(15,11) 4-bitbased on six variants of GF operator. The purpose of the
experiment is to obtain an RS Codec configuration whose
throughput is the most optimum. The RS Codec was
implemented using VHDL by means of two synthesis tools: the
Xilinx ISE 8.2i and the Altium ProChip Designer.
II. PREVIOUS RESEARCH
Similar to the ordinary algebra, GF algebra has a number of
arithmetic operations, such as: addition, subtraction,
multiplication, division, inversion, square and square root.
Variants of GF arithmetic operators are characterized by:1.operation types: multiplication, division, inversion, square or
square root.
2.representation basis: standard/polynomial (PB), normal (NB)or dual (DB)
3.processing types: serial or parallelIn digital circuit, GF addition and subtraction are simply
implemented by exclusive-OR logic operation. The advance of
digital technology has shifted performance metric from running
time of software algorithm [9] to VLSI complexity, i.e. the
number of components and their total delay [6].
A. Performance Improvement of GF Operators
The first circuit structure of GF arithmetic was proposed by
Berlekamp in 1982, i.e. polynomial and dual based
multiplication [10]. Normal based multiplier was introducedfirstly by Massey-Omura in 1986 [11], which is knownafterward as MO multiplier. In 1988, Mastrovito proposed a
more modular multiplier with higher regularity of the structure
that suits systolic cells in VLSI [12]. However, speed, size and
modularity of Mastrovito's multiplier depend much on the
irreducible polynomialP(x) used to generate the field elements.
By selecting the right P(x), parallel multiplication has at most
2m2-1 gates and occupies 55% of the space required for
implementing Bartee and Schneider's algorithm [9]. In 1991,
Mastrovito's dissertation reported an experimental investigation
on multiplication using more than one representation basis
[13]. It was concluded that PB multiplier is the most versatile
form for the most arithmetic computational problems GF(2m) in
VLSI. In addition, PB solution also posses conversion cost that
can compensate the efficiency gained by the other
representation basis [14] and occupies a half space of the one
required by MO multiplier. Mapping problem for inter-basis
conversion is the concern of Wu et al. [15] who introduced anefficient conversion method from PB to NB specifically for
squaring. Furthermore, Sunar et al. proposed conversion
matrixes for any form of generator polynomial [16].
7/27/2019 05417289
2/6
Several improvements of multiplication algorithm were also
reported by Afanasyev [17] and similarly by Hasan et al. [18]that proposed a modification of the architecture by defining the
irreducible polynomial as all-one polynomial (AOP). By
applying the AOP, Hasan claimed the complexity of
multiplication decreases by 50%. Meanwhile, Lee-Lim also
reported performance improvement by applying circular dual
basis (CDB) [19]. Lee's method is very efficient for trinomial
with composite GF((2n)m) where m is primary relative overn,orgcd(m,n) = 1. However, defining certain form of irreducible
polynomial is considered as limitation, inflexible and low
reusability [20].
A comprehensive study on GF arithmetic was reported by
Paar's dissertation [21], in which he proposed a decomposition
algorithm from GF(2k) to GF((2
n)
m) where k = n.m, called
composite field. In addition, Paar also explored inversion after
the first algorithm introduced by Itoh-Tsujii in 1988 [22].
Further Paar's research reported composite field multiplicationand inversion in GF(2
8) [23]. Composite field implementation
in FGPA showed component saving by 25% and acceleration
by 10% [24]. The composite field inversion requires 29% of
AND and XOR gates compared to the standard one. Rudra [25]
and Jutla [26] also developed a method for linear
transformation of GF binary elements to composite fieldrepresentation.
Combination of serial and parallel processes were reported
by Choi et al. [27] who introduced hybrid multiplier by
forming irreducible polynomial xm
+ xn
+ 1 where n m/2.This hybrid multiplier has flexible structure to compromise
space and time complexity and is proven having less
complexity than Wu and Hasans multiplier [15]. Several other
methods were proposed to support hardware implementation,
such as Huang-Wu [28] that has systolic array architecture
approach to ease the testing process.
B. RS Codec Implementation
RS Codec algorithm has been discussed extensively in term
of software algorithm running times. There are two references
mentioning hardware implementation of RS Codec, they are
[29] and [30]. However, there is no analytical discussion from
those references, but merely on experience sharing with very
specific characteristics. Kamran Saleh managed to develop
RS(15,11) 4-bit that has processing speed 692 Mbps [29].
Shayan et al. developed RS(16,12) 5-bit for the Advanced
Train Control System (ATCS) radio data link [30]. The RSdevice has total delay (encode and decode) 1.32ms in 2 MHz
system.
The available RS Codec applications are merely commercial
IP Core with specific interface and features, such as [31], [32]
and [33]. Component compositions, operator types and
algorithms involved are kept secret due to its proprietary.
III. MOTIVATION
Previous research focused on efficiency improvement of GF
operators to obtain better performance in term of speed or
occupied space when implemented in digital circuits. However,
literatures on GF-based implementation at system level are
merely experience sharing with specific features without any
analysis on the consequences of GF operator variants involved
in the system. Based on the available literatures, theexperiments were designed to answer the following research
questions:
Can performance improvement gained at operator level beobtained linearly at the application level?
How can GF-based circuit optimization be achieved atapplication level?
Will an application take benefit by employing all bestvariants of GF operators?
IV. METODOLOGY
A set of experiments was designed to examine whether thebest variants of GF operator can be combined to construct the
most efficient application. In other words, what configuration
of GF operator variants can produce an application with the
greatest throughput? Arithmetic operator types were taken assuggested by the algorithm of encoder and decoder for
RS(15,11) 4-bit. The operators were implemented as the most
efficient variant from each combination of two parameters:
representation basis (PB, NB or DB) and processing structure
(parallel or serial). For further reference in the next discussion,
we use six variants of operators as shown in Table I.
TABLE I
GFOPERATORVARIANTS
Varian Structure Basis
1 ParallelPolynomial
2 Serial
3 ParallelNormal
4 Serial
5 ParallelDual
6 Serial
The RS Codec is implemented into six versions, each of
which was constructed based on the variants in Table I. They
were implemented using structural VHDL and synthesized
with Xilinx ISE 8.2i and Altium ProChip Designer. The
synthesis process results in maximum combinational path ortotal delay that defines the maximum frequency possiblysupplied to the system. Hence, the system throughput can be
calculated based on data capacity proceeded per time unit,
expressed in Mega Byte per second (MBps). Throughput is
then used as an indicator of the system performance. An
optimal configuration is defined as the one having biggest
throughput among the six versions of the system.
7/27/2019 05417289
3/6
V. GF OPERATORARCHITECTURES
This section briefly discusses the six variants of GF operator,
particularly the ones widely used in encoding and decoding
process, i.e. the multiplication. Variant 1 of multiplier
implements Mastrovito's circuit in [12]. Multiplication is
proceeding partially in variant 2 by the circuit shown in Fig.1.
Variant 3 and 4 are realized based on the NB multiplier in [13].
Multiplier variant 5 and 6 are implemented using parallel andserial structure in [13].
Fig. 1. Serial PB MultiplierGF(2m).
The implementation of six multiplier variants results in
delays shown in Table II. It can be seen that the parallel
multiplier in dual basis has the biggest combination delay. Thebest variant is the one having the smallest delay, i.e.
polynomial based parallel multiplier or variant 1.
TABLE II
DELAY OF GF(24)MULTIPLIER IN NS
Structure Tools PB NB DB
ParallelXilinx 12,69 13,49 17,61
Altium 11,37 11,94 14,25
SerialXilinx 15,96 15,18 15,31
Altium 15,24 16,26 15,14
VI. RS ENCODER
The Reed Solomon encoder circuit is shown in Fig.2. Based
on the same structure, six versions of RS Encoder were
implemented, each of which employs multiplier from thevariants in Table I and their performance were compared each
other. Addition in encoding process can easily be implementedusing XOR, bit-serially as well as in parallel. Therefore,
performance is totally affected by the delay of each multiplier
type in the encoder.
The RS(15,11) 4-bit encodes the information of 11 symbols
@4-bit or in total 44 bits. Number of cycle required for the
encoding process is 1 cycle initialization plus 11 cycles for
parallel multiplication. The serial PB and NB require 1
(initialization) + 11 (1 init + 4 bit) = 56 cycles. Specific for
serial DB, 1 (initialization) + 11 4 bit = 45 cycles. Synthesis
result reports a minimum period as shown in Table III. As
consequences the throughput obtained by each version of RSEncoder are shown in Table IV.
Fig. 2. Block of RS encoding process.
TABLE III
MINIMUM PERIOD OF RS ENCODER IN NS
Tool PB NB DB
Xilinx 2,50 3,80 2,76
Altium 2,11 3,69 2,55
TABLE IV
RS ENCODERTHROUGHPUT IN MBPS
Structure Tools PB NB DB
ParallelXilinx 36,00 33,95 26,00
Altium 40,30 38,39 32,20
SerialXilinx 39,30 25,90 44,25
Altium 46,52 26,60 48,00
It is shown that RS Encoder has the biggest throughput by
using multiplier variant 6, i.e. dual based serial structure. In
fact Table II shows that the fastest multiplier is performed by
variant 1, i.e. polynomial based parallel multiplier.
VII. RS DECODER
The main function of decoding process is to detect the
existence of error and make correction of it (if any). The
configuration of RS(15,11) allows the maximum number of
error (t) exists in 2 symbols. The decoding process has main
process as shown in Fig.3.
Fig. 3. RS Decoder.
7/27/2019 05417289
4/6
The implementation and evaluation results of eachprocessing blocks are discussed in the following sessions.
A. Syndrome Calculator
Syndrome Calculator is to detect the existence of error in the
transmitted codewords. If the values of syndromes are all zero,
then r(x) is the codewords c(x) or in other words there is noerror. Error produces a set of syndromes with specific pattern.
Number of syndrome to be calculated is 2t, hence in RS(15,11)
there will be four syndromes (Si). Syndromes are calculated by
serial entry i
for1 i 4 to the received polynomial r(x).Since the data is transmitted per symbol, hence the whole data
should be received first before computing Si in parallel. The
delay and throughput of syndrome calculator for the threerepresentation basis are shown in Table V.
TABLE V
DELAY AND THROUGHPUT OF FULLY PARALLEL SYNDROME CALCULATOR
Tools Unit PB NB DB
XilinxDelay (ns) 15,29 16,67 15,14
Thrghpt (MBps) 32,69 29,99 32,20
AltiumDelay (ns) 13,70 14,75 12,25
Thrghpt (MBps) 36,50 33,90 40,80
Syndrome computation can also be performed sequentially
as the symbols are transmitted. The result of performance
measurement is shown in Table VI.
TABLE VITHROUGHPUT SYMBOL-SERIAL SYNDROME CALCULATOR (MBPS)
Structure Tool PB NB DB
Parallel
Xilinx 36,94 34,74 26,62
Altium 41,23 39,26 32,89
SerialXilinx 40,00 26,35 45,26
Altium 47,37 27,10 49,10
B. Key Equation Solver
Key Equation Solver (KES) is the most complex module in
RS decoding process. KES is implemented accordingBerlekamp-Massey algorithm in [34]. The experiment on the
six variants of operator within KES circuits results in
performance figure shown in Table VII.
TABLE VIIKESTHROUGHPUT IN MBPS
Structure Tool PB NB DB
ParallelXilinx 147,75 138,95 106,49
Altium 164.90 160,00 131,58
SerialXilinx 150,00 98,80 169,71
Altium 177,60 101,57 184,20
C. Error Locator & Magnitude Polynomial
Error locator and magnitude are calculated by the circuit
described in [34]. Throughput is calculated with max error of 4
and represented by four coefficients of the error locator
polynomial = 4 4 = 16 bit. Table VIII shows the module
performance with six variants of multiplier and square.
TABLE VIII
ERRORLOCATOR & MAGNITUDE POLYNOMIAL
Structure Tool PB NB DB
Parallel Xilinx 157,60 148,20 113,58Altium 175,90 174,00 140.35
SerialXilinx 160,00 105,40 181,02
Altium 189,50 108,34 196,50
D. Error Corrector
Error correction is conducted by summing (XOR) Error
Magnitude Polynomial with the received data. Delay for the
three representation basis is the delay of XOR gates. In Xilinx:
delay = 2,9 ns and throughput = 2,586 GBps, whereas in
Altium: delay = 2,7 ns and throughput = 2,778 GBps.
E. Optimal RS Decoder
Practically the overall RS Decoder performance requires auniform representation basis for the whole symbols. It has been
learnt that the speed of inter-module sequential processes is
defined by the Syndrome Calculator. The fastest error locatorin DB is slowed down by the Syndrome Calculator
performance. In summary, Table IX to Table XII show the best
throughput of the module from each representation basis.
TABLE IXTHE BEST SERIAL SYNDROME CALCULATOR
Tool Architecture Thrghpt (MBps) clk
Xilinx
Serial PB 40,00 2,5 75
Parallel NB 34,74 13,49 16Serial DB 45,26 2,76 60
Altium
Serial PB 47,37 2,11 75
Parallel NB 39,26 15,18 16
Serial DB 49,10 2,55 60
To combine the four modules into one integrated system,
cycle rate must conform to the slowest component. Hence, the
best configurations from each representation basis are shown in
Table XIII. It is shown that both synthesis tools produce the
best RS Decoder which has serial operation and GF element
representation in dual basis. In fact, Table II shows that thebest multiplier is in parallel PB.
TABLE XTHE BEST KEY EQUATION SOLVER
Tool Architecture Thrghpt (MBps) clk
Xilinx
Serial PB 150,00 2,5 20
Parallel NB 138,95 13,49 4
Serial DB 169,49 2,76 16
Altium
Serial PB 177,60 2,11 20
Parallel NB 157,00 11,94 4
Serial DB 184,20 2,55 16
7/27/2019 05417289
5/6
TABLE XITHE BEST ERRORLOCATOR & MAGNITUDE POLYNOMIAL
Tool Architecture Thrghpt (MBps) clk
Xilinx
Serial PB 160 2,5 5
Parallel NB 148,20 13,49 1
Serial DB 181,02 2,76 4
Altium
Serial PB 189,5 2,11 5
Parallel NB 174 11,94 1
Serial DB 196,50 2,55 4
TABLE XIITHE BEST ERRORCORRECTOR
Tool Basis Thrghpt (MBps) clk
Xilinx
PB 2586 2,9 1
NB 2586 2,9 1
DB 2586 2,9 1
Altium
PB 2778 2,7 1
NB 2778 2,7 1
DB 2778 2,7 1
TABLE XIII
OPTIMAL DECODER
Tool Basis /clk #cycle (ns) Thrghpt(MBps)
Xilinx PB 2,5 75+20+5+1 252,5 29,70
NB 13,49 16+4+1+1 296,87 25,26
DB 2,76 60+16+4+1 223,72 33,52
Altium PB 2,11 75+20+5+1 213,2 35,18
NB 11,94 16+4+1+1 262,68 28,55
DB 2,55 60+16+4+1 206,15 36,38
VIII. OPTIMIZATION BY THE SYNTHESIS TOOL
Experiment result shows that employing the best operators
does not automatically produce the most efficient applications.This phenomenon is shown consistently by Xilinx and Altium
synthesis tools. Simple and common logic saying that parallel
process is faster than serial does not hold. This is caused by the
fact that the synthesis tool does some improvement or limitedoptimization to the VHDL structural design.
Multiplication is obviously the dominant process in RS
Codec. Examination on the results shows that multiplication
with fixed bit operands is optimized by removing unnecessary
components in the system. For specific configuration of RS
Codes,P(x) andg(x) are fixed during the system lifecycle. g(x)
is one of the operand performing as a(x) in Fig.1. Fixed values
ofpi and gi cause the content of blockEi that was originallyshown as in Fig.4a can be simplified becoming the circuits
shown in Fig.4b-c-d-e. All combination of ai and pi results in
internal blockEi without any AND gate. By omitting AND
gates, Xilinx saves significantly the delay so that the minimum
period is 2,5 ns. Hence, processing 4 bit requires 10 ns, which
is smaller than parallel multiplication delay 12,69 ns.
Fig. 4. Internal component ofEi in PB Serial Multiplier.
It is examined that optimization was not applied to parallel
multiplications although they also have constant operands. It is
due to the fact that the tools optimize constant bit only,
whereas parallel multiplier signal is implemented as an m-bitbus.
DB multiplier is superior over the PB and NB variants in
term of fully serial delivery of the product. With constant
values ofP(x) and A(x), DB serial multiplier delivers theproduct bit by bit as the operand bi enters from MSB to LSB.
Therefore, variant 6 saves 1 cycle compared to variant 2 and 4
that deliver the product after the whole cycle for m-bit
completes. For that reason, variant 6 based systems can
proceed addition serially as the product is delivered from index
m-1 to 0.
With constantP(x) = x4
+ x + 1, optimization steps applied
to serial DB multiplication is shown by Fig.5. In general this
finding supports several statements in the literature that
multiplication with constant operands can be more efficient
[10]. It was examined as well that optimization does not applyto serial NB multiplier. In serial NB multiplication, the values
of both operands change dynamically during the lifecycle of
the system due to the rotation of internal registers.
IX. CONCLUSION
This paper reports that an optimal performance of RS Codec
does not always require the best variants or the most efficient
GF operators. Combining all operators, each of which is themost efficient variant, is not a simple mechanistic conversion
process. Obtaining synergic efficiency at system level requirescareful considerations on several factors such as: the operator
composition and distribution, interaction between them and
types of internal process within the system. In addition, when
implemented in FPGA, efficiency improvement is also
contributed by the synthesis tool that optimizes serial operators
whose operands are constant. This explains why theexperimental results show the superiority of serial based
system over parallel ones.
7/27/2019 05417289
6/6
Fig. 5. DB Serial Multiplier a) original b) optimized with constantP(x).
It is interesting to examine further the consistency of this
optimization in other GF based applications, such as
cryptography and other configurations of RS Codec.
Explorative experiments are required for real symbol width, m= 8 bit such as the one used by NASA RS(255,223) 8-bit [35]
and Rijndael AES with cipher-key 8-bit [36]. Hypothetical
prediction suggests that higher performance ratio would beobtained by serial variants over the parallel ones. It is because
of additional combinational path in parallel operators that
results in bigger delays. Meanwhile, serial operators requireonly several additional cycles with constant minimum periods.
ACKNOWLEDGMENT
This material is based upon work supported by EuropeanUnion Project VN/Asia-Link/001 (79754). The author would
like to thank Prof. R.G. Spallek and Mr. Jrg Schneider for
providing tools and facilities in the Professur fr VLSI-
Entwurfssysteme, Diagnostik und Architektur, Technische
Universitt Dresden, Germany.
REFERENCES
[1] vanTilborg, H. (1988) An Introduction to Cryptology, Kluwer Academic
Publishers.[2] Schneier, B. (1993) Applied Chryptography, Wiley & Sons.
[3] B.Wicker, S. (1995) Error Control Systems for Digital Communication
and Storage, Prentice Hall, Upper Saddle River, New Jersey 07458.[4] Sweeney, P. (2002) Error Control Coding from Theory to Practice, John
Wiley & Sons, Inc., England.
[5] Lin, S. and Costello, D. J. (1983) Error Control Coding, Prentice Hall,Englewood Cliffs, New Jersey.
[6] Wang, C., Truong, T., Shao, H., Deutsch, L., Omura, J., and Reed, I.
August 1985 IEEE Transactions on Computers 34(8), 709717.[7] Hasan, M., Wang, M., and Bhargava, V. August 1992 IEEE Transactions
on Computers 41(8), 972980.
[8] Zhou, T., Wu, X., Bai, G., and Chen, H. 4 July 2002 Electronics Letters38(14), 706707.
[9] Bartee, T. and Schneider, D. March 1963 Information and Computers 6,
79 98.
[10] Berlekamp, E. November 1982 IEEE Transactions on InformationTheory 28, 869 874.
[11] Massey, J. and Omura, J. Computational method and apparatus for finite
field arithmetic Technical report US Patent No. 4,587,627 to OMNET
Association Sunnyvale CA, Washington, D.C. (1986) Patent and
Trademark Office.[12] Mastrovito, E. D. VLSI designs for computations over finite fields
GF(2m) Masters thesis Linkping Studies in Science and Technology
Linkping, Sweden December 1988 Thesis No: 159.[13] Mastrovito, E. VLSI Architectures for Computations in Galois Fields
PhD thesis Department of Electrical Engineering, Linkping University,
Sweden (1991).[14] Gollman, D. Algorithmenentwurf in der kryptographie Habil. University
of Karlsruhe (1990) Preprint.
[15] Wu, H., Hasan, M. A., F.Blake, I., and Gao, S. 23 August 2001.[16] Sunar, B., Savas, E., and Koc, C . K. November 2003 IEEE
Transactions on Computers 52(11), 13911398.[17] Afanasyev, V. (1990) In Proceedings of II International Workshop on
Algebraic and Combinatorial Coding Theory Leningrad, USSA: pp. 68.
[18] Hasan, M., Wang, M., and Bhargava, V. October 1993 IEEE
Transactions on Computers 42(10), 1278 1280.[19] Lee, C.-H. and Lim, J.-I. A new aspect of dual basis for efficient field
arithmetic Technical report Samsung Advaced Institute of Technology
(1998).[20] Fenn, T. S., Benaissa, M., and Taylor, D. March 1996 IEEE Transactions
on Computers 45(3), 319 327.
[21] Paar, C. Efficient VLSI Architectures for Bit-Parallel Computation inGalois Fields PhD thesis Institute for Experimental Mathematics,
University of Essen Essen, Germany June 1994.[22] Itoh, T. and Tsujii, S. (1988) Information and Computing 78, 171177.
[23] Paar, C. and Rosner, M. April 1997 IEEE Symposium on FPGA-Based
Custom Computing Machines (FCCM) pp. 219225.
[24] Mursanto, P. 29-30 January 2007 In National Conference on ComputerScience & Information Technology Depok, Fasilkom UI.
[25] Rudra, A., Dubey, P. K., Jutla, C. S., Kumar, V., Rao, J., and Rohatgi, P.(2001) volume 2162, of Proceedings of the Third International Workshop
on Cryptographic Hardware and Embedded Systems London, UK:
Springer-Verlag. pp. 171184.[26] Jutla, C., Kumar, V., and Rudra, A. On the circuit complexity of
isomorphic Galois Field transformations Technical Report RC22652
(W0211-243) IBM Research Division November 2002.[27] Choi, Y., Chang, K.-Y., Hong, D., and Cho, H. July 8 2004 Electronics
Letters 40(Issue 14), 852 853.
[28] Huang, C.-T. and Wu, C.-W. September 2000 IEEE Transactions onCircuit and Systems 48(9), 909918.
[29] Saleh, K. Hardware architectures for Reed-Solomon decoders Technical
report Amirkabir University of Technology, Iran January 2003.
[30] Shayan, Y., Ngoc, T., and Bhargava, V. November 1990 IEEE Transc. on
Vehicular Technology 39(4), 400409.[31] ASICSws Reed Solomon encoder IP core, http://www.asics.ws/doc/rse
brief.pdf Nov 2006.
[32] Xilinx, I. Reed-Solomon solutions with Spartan-ii FPGAs Technical
report Xilinx, Inc. Feb. 2000 White Paper.[33] Reed Solomon coding Technical report VOCAL Technologies Ltd. 90A
John Muir Drive, Buffalo - New York 14228 (2006).
[34] Sarwate, D. and Shanbhag, N. October 2001 IEEE Transactions on VLSISystems 9(5), 641655.
[35] Sklar, B. (2003) Digital Communications Fundamentals and
Applications, Prentice Hall, Inc., New Jersey 2nd ed. edition.[36] Daemen, J. and Rijmen, V. March 2001 Dr.Dobbs Journal 26, 137139.