



International Symposium on Information Theory and its Applications, ISITA2008, Auckland, New Zealand, 7-10 December 2008

Gradient Descent Bit Flipping Algorithms for Decoding LDPC Codes

Tadashi Wadayama†, Keisuke Nakamura†, Masayuki Yagita†, Yuuki Funahashi‡, Shogo Usami‡, Ichi Takumi†

†: Nagoya Institute of Technology, E-mail: [email protected]

‡: Meijo University

Abstract

A novel class of bit-flipping (BF) algorithms for decoding low-density parity-check (LDPC) codes is presented. The proposed algorithms, which are called gradient descent bit flipping (GDBF) algorithms, can be regarded as simplified gradient descent algorithms. Based on the gradient descent formulation, the proposed algorithms are naturally derived from a simple non-linear objective function.

1. Introduction

Bit-flipping (BF) algorithms for decoding low-density parity-check (LDPC) codes [1] have been investigated extensively, and many variants of BF algorithms, such as weighted BF (WBF) [2], modified weighted BF (MWBF) [3], and other variants [4, 5, 6], have been proposed. The first BF algorithm was developed by Gallager [1]. In a decoding process of Gallager's algorithm, some unreliable bits (in a binary quantized received word) corresponding to unsatisfied parity checks are flipped in each iteration. The successors of Gallager's BF algorithm inherit its basic strategy. Although the bit error rate (BER) performance of BF algorithms is, in general, inferior to that of the sum-product algorithm or the min-sum algorithm, BF algorithms enable us to design a much simpler decoder, which is easier to implement. Thus, bridging the performance gap between BF decoding and BP decoding is an important technical challenge.

In the present paper, a novel class of BF algorithms for decoding LDPC codes is presented. The proposed algorithms, which are called gradient descent bit flipping (GDBF) algorithms, can be regarded as bit-flipping gradient descent algorithms. The proposed algorithms

This work was supported by a Grant-in-Aid for Scientific Research on Priority Areas (Deepening and Expansion of Statistical Informatics) 180790091 and by a research grant from the Storage Research Consortium (SRC). A part of this work has been posted to arXiv: arXiv:0711.0261.

are naturally derived from a simple gradient descent formulation. The behavior of the proposed algorithm can be explained from the viewpoint of the optimization of a non-linear objective function.

2. Preliminaries

2.1. Notation

Let H be a binary m × n parity check matrix, where n > m ≥ 1. The binary linear code C is defined by C ≜ {c ∈ F_2^n : Hc = 0}, where F_2 denotes the binary Galois field. In the present paper, a vector is assumed to be a column vector. For convenience, we introduce the bipolar code C̃ corresponding to C as follows: C̃ ≜ {(1 − 2c_1, 1 − 2c_2, . . . , 1 − 2c_n) : c ∈ C}. Namely, C̃, which is a subset of {+1, −1}^n, is obtained from C by using binary (0, 1) to bipolar (+1, −1) conversion.

The binary-input AWGN channel is assumed in the paper, which is defined by y = c + z (c ∈ C̃). The vector z = (z_1, . . . , z_n) is a white Gaussian noise vector, where z_j (j ∈ [1, n]) is an i.i.d. Gaussian random variable with zero mean and variance σ^2. The notation [a, b] denotes the set of consecutive integers from a to b.

Let N(i) and M(j) (i ∈ [1, m], j ∈ [1, n]) be N(i) ≜ {j ∈ [1, n] : h_ij = 1} and M(j) ≜ {i ∈ [1, m] : h_ij = 1}, where h_ij is the ij-element of the parity check matrix H. Using this notation, we can write the parity condition as ∏_{j∈N(i)} x_j = 1 (∀i ∈ [1, m]), which is equivalent to (x_1, . . . , x_n) ∈ C̃. The value ∏_{j∈N(i)} x_j ∈ {+1, −1} is called the i-th bipolar syndrome of x.
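As a quick illustration of this notation, the bipolar syndromes of a word x can be computed directly from the index sets N(i). The small parity check matrix below is a toy example of ours, not one from the paper:

```python
# Bipolar syndrome computation for a toy 3x6 parity check matrix (illustrative).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
m, n = len(H), len(H[0])

# N(i): positions j with h_ij = 1.
N = [[j for j in range(n) if H[i][j] == 1] for i in range(m)]

def bipolar_syndromes(x):
    """Return the m bipolar syndromes prod_{j in N(i)} x_j, each +1 or -1."""
    out = []
    for i in range(m):
        s = 1
        for j in N[i]:
            s *= x[j]
        out.append(s)
    return out

x = [+1, -1, -1, +1, +1, +1]   # a bipolar word (not a codeword of H)
print(bipolar_syndromes(x))    # -> [-1, 1, -1]; all +1 iff x is in the bipolar code
```

All-+1 syndromes correspond exactly to the parity condition ∏_{j∈N(i)} x_j = 1 for all i.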

2.2. Brief review on known BF algorithms

A number of variants of BF algorithms have been developed. We can classify the BF algorithms into two classes: single bit flipping (single BF) algorithms and multiple bit flipping (multi BF) algorithms. In the decoding process of a single BF algorithm, only one


bit is flipped according to its bit flipping rule. On the other hand, the multi BF algorithm allows multiple bit flips per iteration in a decoding process. In general, although the multi BF algorithm shows faster convergence than the single BF algorithm, the multi BF algorithm suffers from oscillation behavior of the decoder state, which is not easy to control. The framework of the single BF algorithms is summarized as follows:

Single BF algorithm

Step 1 For j ∈ [1, n], let x_j := sign(y_j). Let x ≜ (x_1, x_2, . . . , x_n).

Step 2 If the parity equation ∏_{j∈N(i)} x_j = +1 holds for all i ∈ [1, m], output x, and then exit.

Step 3 Find the flip position given by

ℓ := arg min_{k∈[1,n]} Δ_k(x), (1)

and then flip the symbol: x_ℓ := −x_ℓ. The function Δ_k(x) is called an inversion function.

Step 4 If the number of iterations is under the maximum number of iterations L_max, return to Step 2; otherwise output x and exit.

The function sign(·) is defined by

sign(x) ≜ +1 (x ≥ 0), −1 (x < 0). (2)

In a decoding process of the single BF algorithm, hard decision decoding for a given y is first performed, and x is initialized to the hard decision result. The minimum of the inversion function Δ_k(x) for k ∈ [1, n] is then found¹. An inversion function Δ_k(x) can be seen as a measure of the invalidness of the bit assignment on x_k. The bit x_ℓ, where ℓ gives the smallest value of the inversion function, is then flipped.
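The single BF framework above can be sketched as a generic loop in which the inversion function Δ_k is a parameter. This is an illustrative sketch (the helper names and the tiny length-3 code are ours, not the paper's); ties in Step 3 are broken by the smallest index:

```python
def sign(v):
    """sign() as in (2): +1 for v >= 0, -1 for v < 0."""
    return 1 if v >= 0 else -1

def prod(values):
    p = 1
    for v in values:
        p *= v
    return p

def single_bf_decode(y, N, delta, l_max=100):
    """Generic single BF decoder: N lists the sets N(i), delta(k, x) is
    the inversion function, and l_max caps the number of iterations."""
    n = len(y)
    x = [sign(v) for v in y]                      # Step 1: hard decision
    for _ in range(l_max):                        # Step 4: iteration cap
        if all(prod(x[j] for j in N_i) == +1 for N_i in N):
            return x                              # Step 2: all parities hold
        k = min(range(n), key=lambda j: delta(j, x))
        x[k] = -x[k]                              # Step 3: flip one bit
    return x

# Toy usage: length-3 repetition-style code with checks x1 = x2 and x2 = x3,
# and a simple syndrome-sum inversion function (for illustration only).
N = [[0, 1], [1, 2]]

def delta(k, x):
    return sum(prod(x[j] for j in N_i) for N_i in N if k in N_i)

print(single_bf_decode([0.9, -0.2, 1.1], N, delta))   # -> [1, 1, 1]
```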

The inversion function of WBF [2] is defined by

Δ_k^(WBF)(x) ≜ ∑_{i∈M(k)} β_i ∏_{j∈N(i)} x_j. (3)

The values β_i (i ∈ [1, m]) are the reliabilities of the bipolar syndromes, defined by β_i ≜ min_{j∈N(i)} |y_j|. In this case, the inversion function Δ_k^(WBF)(x) gives the measure of invalidness of the symbol assignment on x_k, which is given by the sum of the weighted bipolar syndromes.

¹ When Δ_k(x) is an integer-valued function, we need a tie-break rule to resolve a tie.

The inversion function of MWBF [3] has a form similar to that of WBF, but it contains a term corresponding to the received symbol. The inversion function of MWBF is given by

Δ(MWBF )k (x)

�= α|yk| +

∑i∈M(k)

βi

∏j∈N(i)

xj , (4)

where the parameter α is a positive real number.
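Under the definitions (3) and (4), both inversion functions are short computations over the index sets N(i) and M(j). The following sketch uses a toy parity check matrix and received vector of our own choosing:

```python
# Toy parity check matrix (illustrative, not from the paper).
H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]
m, n = len(H), len(H[0])
N = [[j for j in range(n) if H[i][j]] for i in range(m)]   # N(i)
M = [[i for i in range(m) if H[i][j]] for j in range(n)]   # M(j)

def prod(values):
    p = 1
    for v in values:
        p *= v
    return p

def wbf_delta(k, x, y):
    """WBF inversion function (3): sum of reliability-weighted bipolar syndromes."""
    total = 0.0
    for i in M[k]:
        beta = min(abs(y[j]) for j in N[i])     # beta_i = min_{j in N(i)} |y_j|
        total += beta * prod(x[j] for j in N[i])
    return total

def mwbf_delta(k, x, y, alpha=0.2):
    """MWBF inversion function (4): WBF plus the received-symbol term alpha*|y_k|."""
    return alpha * abs(y[k]) + wbf_delta(k, x, y)

y = [0.8, -0.1, 0.5, 0.9, 0.3, 0.7]             # received word (toy values)
x = [1 if v >= 0 else -1 for v in y]            # hard decisions
print(min(range(n), key=lambda k: wbf_delta(k, x, y)))   # -> 1, the least reliable bit
```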

3. Gradient descent formulation

3.1. Objective function

It seems natural to regard the dynamics of a BF algorithm as a minimization process of a hidden objective function. This observation leads to a gradient descent formulation of BF algorithms.

The maximum likelihood (ML) decoding problem for the binary AWGN channel is equivalent to the problem of finding a (bipolar) codeword in C̃ which gives the largest correlation to a given received word y. Namely, the MLD rule can be written as x̂ = arg max_{x∈C̃} ∑_{j=1}^n x_j y_j.

Based on this correlation decoding rule, we here define the following objective function:

f(x) ≜ ∑_{j=1}^n x_j y_j + ∑_{i=1}^m ∏_{j∈N(i)} x_j. (5)

The first term of the objective function corresponds to the correlation between a bipolar codeword and the received word, which should be maximized. The second term is the sum of the bipolar syndromes of x. If and only if x ∈ C̃, the second term attains its maximum value ∑_{i=1}^m ∏_{j∈N(i)} x_j = m. Thus, this term can be considered as a penalty term, which forces x to be a valid codeword. Note that this objective function is a non-linear function and has many local maxima. These local maxima become a major source of sub-optimality of the GDBF algorithm presented later.
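The objective function (5) is cheap to evaluate, which is what makes it usable as a convergence monitor later in the paper. A minimal sketch with a toy parity check matrix (our own, for illustration):

```python
# Toy parity check matrix (illustrative, not from the paper).
H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]
m, n = len(H), len(H[0])
N = [[j for j in range(n) if H[i][j]] for i in range(m)]

def objective(x, y):
    """f(x) of (5): correlation term plus the sum of bipolar syndromes."""
    correlation = sum(xj * yj for xj, yj in zip(x, y))
    penalty = 0
    for N_i in N:
        s = 1
        for j in N_i:
            s *= x[j]
        penalty += s                 # each syndrome is +1 or -1
    return correlation + penalty

# The penalty term attains its maximum m exactly on (bipolar) codewords:
y = [0.5] * n
print(objective([+1] * n, y))              # codeword: 3.0 + m = 6.0
print(objective([-1, 1, 1, 1, 1, 1], y))   # one flipped bit: 2.0 + (-1) = 1.0
```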

3.2. Gradient descent BF algorithm

For the numerical optimization problem for a differentiable function such as (5), the gradient descent method [8] is a natural choice for a first attempt. The partial derivative of f(x) with respect to the variable x_k (k ∈ [1, n]) can be immediately derived from the definition of f(x):

∂f(x)/∂x_k = y_k + ∑_{i∈M(k)} ∏_{j∈N(i)\k} x_j. (6)


Let us consider the product of x_k and the partial derivative of f(x) with respect to x_k, namely

x_k ∂f(x)/∂x_k = x_k y_k + ∑_{i∈M(k)} ∏_{j∈N(i)} x_j. (7)

For a small real number s, we have the first-order approximation:

f(x_1, . . . , x_k + s, . . . , x_n) ≃ f(x) + s ∂f(x)/∂x_k. (8)

When ∂f(x)/∂x_k > 0, we need to choose s > 0 in order to have

f(x_1, . . . , x_k + s, . . . , x_n) > f(x). (9)

On the other hand, if ∂f(x)/∂x_k < 0 holds, we should choose s < 0 to obtain the inequality (9). Therefore, if x_k ∂f(x)/∂x_k < 0, then flipping the k-th symbol (x_k := −x_k) may increase the objective function value².

One reasonable way to find a flipping position is to choose the position at which the absolute value of the partial derivative is largest. This flipping rule is closely related to the steepest descent algorithm based on the ℓ1-norm (also known as the coordinate descent algorithm) [8]. According to this observation, we have the rule to choose the flipping position. The single BF algorithm based on the inversion function

Δ_k^(GD)(x) ≜ x_k y_k + ∑_{i∈M(k)} ∏_{j∈N(i)} x_j (10)

is called the gradient descent BF (GDBF) algorithm. Thus, the decoding process of the GDBF algorithm can be seen as a minimization process of −f(x) (which can be considered as the energy of the system) based on the bit-flipping gradient descent method.
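Equation (10) can be implemented directly; each evaluation needs only the bits of the checks adjacent to position k. A sketch with an illustrative toy code (not from the paper):

```python
# Toy parity check matrix (illustrative).
H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]
m, n = len(H), len(H[0])
N = [[j for j in range(n) if H[i][j]] for i in range(m)]
M = [[i for i in range(m) if H[i][j]] for j in range(n)]

def prod(values):
    p = 1
    for v in values:
        p *= v
    return p

def gd_delta(k, x, y):
    """GDBF inversion function (10): x_k*y_k plus the adjacent bipolar syndromes."""
    return x[k] * y[k] + sum(prod(x[j] for j in N[i]) for i in M[k])

# One GDBF iteration on a hard-decision word with a single bad bit:
y = [0.8, -0.1, 0.5, 0.9, 0.3, 0.7]
x = [1 if v >= 0 else -1 for v in y]
k = min(range(n), key=lambda j: gd_delta(j, x, y))
x[k] = -x[k]
print(x)   # -> [1, 1, 1, 1, 1, 1]; the flip repairs the unreliable bit
```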

It is interesting to see that the combination of the objective function f̃(x) defined by

f̃(x) ≜ α ∑_{j=1}^n x_j |y_j| + ∑_{i=1}^m β_i ∏_{j∈N(i)} x_j (11)

and the gradient descent argument presented above gives the inversion functions of the conventional algorithms, namely the WBF algorithm (3) and the MWBF algorithm (4). However, the objective function (11) looks less meaningful compared with the objective function (5). In other words, the inversion function Δ_k^(GD)(x) defined in (10) has a more natural interpretation than those of the conventional algorithms, Δ_k^(WBF)(x) in (3) and Δ_k^(MWBF)(x) in (4). Actually, the new inversion function Δ_k^(GD)(x) is not only natural but also effective in terms of bit error performance and convergence speed.

² There is a possibility that the objective function value may decrease because the step size is fixed (a single flip).

3.3. Multi GDBF algorithm

A decoding process of the GDBF algorithm can be regarded as a maximization process of the objective function (5) in a gradient ascent manner. Thus, we can utilize the objective function value in order to observe the convergence behavior. For example, it is possible to monitor the value of the objective function at each iteration. In the first several iterations, the value increases as the number of iterations increases. However, the value eventually ceases to increase when the search point arrives at the point in {+1, −1}^n nearest to the local maximum of the objective function. We can easily detect such convergence to a local maximum by observing the value of the objective function.

Both the BF algorithms reviewed in the previous section and the GDBF algorithm flip only one bit in each iteration. In terms of numerical optimization, in these algorithms, a search point moves towards a local maximum with a very small step (i.e., a 1-bit flip) in order to avoid oscillation around the local maximum (see Fig. 1 (A)). However, the small step size leads to slower convergence to a local maximum. In general, compared with the min-sum algorithm, BF algorithms (single flip per iteration) require a larger number of iterations to achieve the same bit error probability.

The multi bit flipping algorithm is expected to have a faster convergence speed than the single bit flipping algorithm because of its larger step size. If the search point is close to a local maximum, a fixed large step is not suitable for finding the (near) local maximum point; it leads to oscillation behavior of a multi-bit flipping BF algorithm (Fig. 1 (B)). We need to adjust the step size dynamically from a large step size to a small step size in an optimization process (Fig. 1 (C)).

The objective function is a useful guideline for adjusting the step size (i.e., the number of flipped bits). The multi GDBF algorithm is a GDBF algorithm including the multi-bit flipping idea. In the following, we assume the inversion function Δ_k^(GD)(x) defined by (10) (the inversion function for the GDBF algorithm).

The flow of the multi GDBF algorithm is almost the same as that of the previously presented GDBF algorithm. When it is necessary to clearly distinguish the two decoding algorithms, the GDBF algorithm presented in the previous subsection is referred to as the single GDBF algorithm.

In order to define the multi GDBF algorithm, we



(A) converging but slow, (B) not converging but fast,(C) converging and fast

Figure 1: Convergence behavior

need to introduce new parameters θ and μ. The parameter θ is a negative real number, which is called the inversion threshold. The binary (0 or 1) variable μ, which is called the mode flag, is set to 0 at the beginning of the decoding process. Step 3 of the BF algorithm should be replaced with the following multi-bit flipping procedure.

Step 3 Evaluate the value of the objective function, and let f1 := f(x). If μ = 0, then execute Sub-step 3-1 (multi-bit mode); else execute Sub-step 3-2 (single-bit mode).

3-1 Flip all the bits satisfying Δ_k^(GD)(x) < θ (k ∈ [1, n]). Evaluate the value of the objective function again, and let f2 := f(x). If f1 > f2 holds, then let μ := 1.

3-2 Flip a single bit at the j-th position, where j ≜ arg min_{k∈[1,n]} Δ_k^(GD)(x).

Usually, at the beginning of a decoding process, the objective function value increases as the number of iterations increases in the multi-bit mode, namely, f1 < f2 holds for the first few iterations. When the search point eventually arrives at a point satisfying f1 > f2, the bit flipping mode is changed from the multi-bit mode (μ = 0) to the single-bit mode (μ = 1). This mode change amounts to an adjustment of the step size, which helps the search point converge to a local maximum when it is located close to that maximum.
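The mode-switching logic of the replaced Step 3 can be sketched as follows. The names and the toy code are illustrative; objective implements (5), gd_delta implements (10), and all flips in the multi-bit mode are decided on the current x before any bit changes:

```python
# Toy parity check matrix (illustrative).
H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]
m, n = len(H), len(H[0])
N = [[j for j in range(n) if H[i][j]] for i in range(m)]
M = [[i for i in range(m) if H[i][j]] for j in range(n)]

def prod(values):
    p = 1
    for v in values:
        p *= v
    return p

def objective(x, y):
    """f(x) of (5)."""
    return sum(a * b for a, b in zip(x, y)) + sum(prod(x[j] for j in N_i) for N_i in N)

def gd_delta(k, x, y):
    """GDBF inversion function (10)."""
    return x[k] * y[k] + sum(prod(x[j] for j in N[i]) for i in M[k])

def multi_gdbf_step(x, y, mu, theta):
    """One Step-3 pass: multi-bit mode if mu == 0, single-bit mode otherwise.
    Modifies x in place and returns the (possibly updated) mode flag mu."""
    f1 = objective(x, y)
    if mu == 0:                                  # Sub-step 3-1: multi-bit mode
        flips = [k for k in range(n) if gd_delta(k, x, y) < theta]
        for k in flips:                          # flip all below-threshold bits at once
            x[k] = -x[k]
        f2 = objective(x, y)
        if f1 > f2:                              # overshoot: switch to single-bit mode
            mu = 1
    else:                                        # Sub-step 3-2: single-bit mode
        j = min(range(n), key=lambda k: gd_delta(k, x, y))
        x[j] = -x[j]
    return mu

# Usage: starting from a hard-decision word with errors, in multi-bit mode.
y = [0.8, -0.1, 0.5, 0.9, 0.3, 0.7]
x = [1 if v >= 0 else -1 for v in y]
mu = 0
mu = multi_gdbf_step(x, y, mu, theta=-0.6)
mu = multi_gdbf_step(x, y, mu, theta=-0.6)
print(x)   # -> [1, 1, 1, 1, 1, 1]
```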

4. Behavior of the GDBF algorithms

In this section, the behavior and decoding performance of the (single and multi) GDBF algorithms obtained from computer simulations are presented.

Figure 2 presents objective function values (5) as a function of the number of iterations in the single and multi GDBF processes. Throughout the present paper, a regular LDPC code with n = 1008, m = 504 (called PEGReg504x1008 in [9]) is assumed. The column weight of the code is 3. In both cases (single and multi), we tested the same noise instance, and both algorithms output the correct codeword (i.e., successful decoding).

In the case of the single GDBF algorithm, the objective function value gradually increases as the number of iterations grows in the first 50-60 iterations. After the slope, the increase of the objective function value eventually stops, and a flat part that corresponds to a local maximum appears. In the flat part of the curve, oscillation behavior of the objective function value can be seen. Due to the constraint that a search point x must lie in {+1, −1}^n, a GDBF process cannot find a true local maximum point (the point where the gradient of the objective function becomes the zero vector) of the objective function. Thus, a search point moves around the local maximum point. This movement causes the oscillation behavior observed in a single GDBF process. The curve corresponding to the multi GDBF algorithm shows much faster convergence compared with the single GDBF algorithm. It takes only 15 iterations for the search point to come very close to the local maximum point.

[Figure: objective function value (1250-1550) versus number of iterations (0-100) for the multi GD-BF and single GD-BF algorithms, SNR = 4 dB.]

Figure 2: Objective function values in GDBF processes as a function of the number of iterations

Figure 3 presents the bit error curves of single and


multi GDBF algorithms (L_max = 100, θ = −0.6). As references, the curves for the WBF algorithm (L_max = 100), the MWBF algorithm (L_max = 100, α = 0.2), and the normalized min-sum algorithm (L_max = 5, scale factor 0.8) are included as well. The parameter L_max denotes the maximum number of iterations for each algorithm. We can see that the GDBF algorithms perform much better than the WBF and MWBF algorithms. For example, at BER = 10^−6, the multi GDBF algorithm offers a gain of approximately 1.6 dB compared with the MWBF algorithm. Compared with the single GDBF algorithm, the multi GDBF algorithm has a steeper slope in its error curve. Unfortunately, there is still a large performance gap between the error curves of the normalized min-sum algorithm and the GDBF algorithms. The GDBF algorithm fails to decode when a search point is attracted to an undesirable local maximum of the objective function. This large performance gap suggests the existence of some local maxima relatively close to a bipolar codeword, which degrade the BER performance.

[Figure: bit error rate (10^−7 to 10^0) versus SNR [dB] (2-8) for WBF (L_max = 100), MWBF (L_max = 100), single GD-BF (L_max = 100), multi GD-BF (L_max = 100), and normalized min-sum (L_max = 5).]

Figure 3: Bit error rate of GDBF algorithms: regular LDPC code (PEGReg504x1008 [9])

5. Escape from a local maximum

5.1. GDBF algorithm with escape process

Since the weight of the final position of a search point is so small, a small perturbation of a captured search point appears to be helpful for the search point to escape from an undesirable local maximum. We can expect that such a perturbation process improves the BER performance of BF algorithms.

One of the simplest ways to add a perturbation to a trapped search point is to forcibly switch the flip mode from the single-bit mode to the multi-bit mode with an appropriate threshold when the search point arrives at a non-codeword local maximum. This additional process is called the escape process. In general, the escape process reduces the objective function value, i.e., the search point moves downwards in the energy landscape. After the escape process, the search point again begins to climb a hill, which may be different from the one where it was trapped.

We here modify the multi GDBF algorithm by incorporating two thresholds: θ1 and θ2. The threshold θ1 is the threshold constant used in the multi-bit mode at the beginning of the decoding process. After several iterations, the multi-bit mode is changed to the single-bit mode, and the search point may eventually arrive at a non-codeword local maximum. In such a case, the decoder changes its mode to the multi-bit mode (i.e., μ = 0) with threshold θ2. Thus, the threshold θ2 can be regarded as the threshold for downward movement. Although θ2 can be a constant value, in terms of the BER performance, it is advantageous to choose it randomly³. In other words, θ2 can be a random variable. After the downward move (just one iteration), the decoder changes the threshold to θ1 again. The above process continues until the parity check condition holds or the number of iterations reaches L_max. Figure 4 illustrates the idea of the escape process.

Figure 4: Idea of escape process

5.2. Simulation results

Figure 5 shows the BER curve of such a decoding algorithm (labeled 'multi GDBF with escape'). In this simulation, we used the parameters θ1 = −0.7 and θ2 = 1.7 + α, where α is a Gaussian random number with mean zero and variance 0.01. These parameters have been obtained by an ad hoc optimization at SNR = 4 dB.

³ This fact was observed in several experiments.


We can see that the BER curve of multi GDBF with escape (with L_max = 300) is much steeper than that of the naive multi GDBF algorithm. At BER = 10^−5, multi GDBF with escape achieves a gain of almost 1.5 dB compared with the naive multi GDBF algorithm. The average number of iterations of multi GDBF with escape is approximately 25.6 at SNR = 4 dB. This result implies that the perturbation can actually help some trapped search points converge to the desirable local maximum corresponding to the transmitted codeword. It is an interesting open problem to optimize the flipping schedule so as to narrow the gap between the min-sum BER curve and the GDBF BER curve.

[Figure: bit error rate (10^−6 to 10^0) versus SNR [dB] (1-7) for multi GD-BF with escape (L_max = 300), multi GD-BF (L_max = 100), normalized min-sum (L_max = 5 and L_max = 100), and WBF (L_max = 100).]

Figure 5: Bit error rate of the GDBF algorithm with the escape process

6. Conclusion

This paper presents a class of BF algorithms based on the gradient descent algorithm. GDBF algorithms can be regarded as a maximization process of the objective function using a bit-flipping gradient descent method (i.e., bit-flipping dynamics which minimize the energy −f(x)). The gradient descent formulation naturally introduces an energy landscape on the state-space of the BF decoder. The viewpoint obtained by this formulation brings us a new way to understand the convergence behavior of BF algorithms. Furthermore, this viewpoint is also useful for designing improved decoding algorithms, such as the multi GDBF algorithm and the GDBF algorithm with an escape process from an undesired local maximum. The GDBF algorithm with the escape process performs very well compared with known BF algorithms. One lesson we have learned from this result is that fine control of the flipping schedule is indispensable to improve the decoding performance of BF algorithms.

Acknowledgement

The present study was supported in part by the Ministry of Education, Science, Sports, and Culture of Japan through a Grant-in-Aid for Scientific Research on Priority Areas (Deepening and Expansion of Statistical Informatics) 180790091 and by a research grant from the Storage Research Consortium (SRC).

References

[1] R. G. Gallager, Low-Density Parity-Check Codes, Research Monograph Series, Cambridge, MA: MIT Press, 1963.

[2] Y. Kou, S. Lin, and M. P. C. Fossorier, "Low-density parity-check codes based on finite geometries: a rediscovery and new results," IEEE Trans. Inform. Theory, vol. 47, pp. 2711-2736, Nov. 2001.

[3] J. Zhang and M. P. C. Fossorier, "A modified weighted bit-flipping decoding of low-density parity-check codes," IEEE Communications Letters, vol. 8, pp. 165-167, Mar. 2004.

[4] M. Jiang, C. Zhao, Z. Shi, and Y. Chen, "An improvement on the modified weighted bit flipping decoding algorithm for LDPC codes," IEEE Communications Letters, vol. 9, no. 9, pp. 814-816, 2005.

[5] F. Guo and L. Hanzo, "Reliability ratio based weighted bit-flipping decoding for low-density parity-check codes," Electronics Letters, vol. 40, no. 21, pp. 1356-1358, 2004.

[6] C. H. Lee and W. Wolf, "Implementation-efficient reliability ratio based weighted bit-flipping decoding for LDPC codes," Electronics Letters, vol. 41, no. 13, pp. 755-757, 2005.

[7] A. Nouh and A. H. Banihashemi, "Bootstrap decoding of low-density parity-check codes," IEEE Communications Letters, vol. 6, no. 9, pp. 391-393, 2002.

[8] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.

[9] D. J. C. MacKay, "Encyclopedia of sparse graph codes," online: http://www.inference.phy.cam.ac.uk/mackay/