Prediction of soil liquefaction using genetic programming

American Society of Civil Engineers, Egypt Section (ASCE-EGS); Egyptian Society of Irrigation Engineers

III Middle East Regional Conference on Civil Engineering Technology & III International Symposium on Environmental Hydrology 2002


Ezzat A. Fattah1, Hossam E.A. Ali2, Ahmed M. Ebid3

ABSTRACT

In most geotechnical problems, it is very difficult to predict soil and structural behavior accurately because of the large variation in soil parameters and the assumptions of numerical solutions. Recently, however, many geotechnical problems have been solved using Artificial Intelligence (AI) techniques, either by presenting new solutions or by improving existing ones. Genetic Programming (GP) is one of the most recently developed AI techniques and is based on the Genetic Algorithm (GA) technique. In this research, the GP technique is used to develop prediction criteria for liquefaction in cohesionless soils from collected historical records. The liquefaction formula is developed using special software written by the authors in Visual C++. The accuracy of the developed formula is also compared with that of earlier prediction methods.

Keywords: Soil Liquefaction, Earthquake Engineering, GA, GP, and AI

INTRODUCTION

Liquefaction is a disastrous phenomenon that occurs in saturated soils and is usually triggered by seismic or dynamic shaking. It occurs because the soil cannot quickly dissipate the pore water pressure that builds up under sudden loading. The sudden increase in pore water pressure, together with the dynamically induced stresses, can bring the soil structure to an unstable condition (Kramer 1996). For a long time, liquefaction was considered exclusively an earthquake-related phenomenon. Research over the last three decades has shown that it can be triggered under other conditions as well and that it eventually leads to significant soil densification. The nature of the soil disturbance and of the densification process are crucial factors in liquefaction hazard evaluation. Recognizing that soil liquefaction can cause catastrophic damage, many past efforts have aimed to understand this complex phenomenon. Many liquefaction assessment criteria have been established using empirical, conventional and correlation techniques, such as those proposed by Seed and DeAlba (1986), Mitchell and Tseng (1990), Stark and Olson (1995) and Shibata and Teparaksa (1988), among others. The common feature of many of these approaches is the use of data obtained from in-situ tests such as the Cone Penetration Test (CPT). This is because CPT results, like those of other in-situ tests, are influenced by many soil and site variables such as soil density, soil structure, cementation, stress state and stress history (Robertson and Campanella 1985).

1 Prof. of soil mechanics, Ain Shams University, Cairo, Egypt
2 Teacher of soil mechanics, Ain Shams University, Cairo, Egypt
3 Graduate student, Ain Shams University, Cairo, Egypt



In this research, a promising technique called Genetic Programming (GP), which is based on the Genetic Algorithm (GA), is used to develop a formula that predicts the liquefaction potential of sandy soils. The historical records used contain CPT results, the mean diameter of the soil particles, the seismic shear stress ratio, the earthquake magnitude and liquefaction observations.

GENETIC ALGORITHM (GA)

The Genetic Algorithm (GA) is an Artificial Intelligence (AI) technique based on simulating the natural reproduction process, following Darwin's well-known rule "the fittest survive". Darwin's theory of natural selection assumes that within any population there are always some differences between the members. These differences make some members better suited to the surrounding conditions than others; accordingly, they have better chances to survive and to produce a next generation with enhanced properties. Generation after generation, most of the population acquires these suitable properties, while the unsuitable members eventually die out. In other words, natural selection increases the fitness of the population during reproduction, which means that the population develops to suit the surrounding conditions. In natural reproduction, a certain sequence of DNA characters represents the properties of a member; each character is called a "gene", and every set of genes is called a "chromosome" (Michalewicz, 1992). The biological reproduction process was first simulated mathematically by John Holland in 1975, with genes and chromosomes replaced by parameters and solutions respectively, and the surrounding conditions represented by a fitness function. Hence, according to Darwin's rule, the population develops during reproduction to suit the fitness function (Holland 1975). The most important advantage of the GA technique is its generality and its applicability to a very wide range of engineering problems.
This is because the GA technique does not depend on the type of data. Encoding the problem parameters in genetic form is the first and most important step in a GA solution. The standard GA procedure consists of four main steps, as depicted in Fig. 1. First, a random population of solutions is generated and encoded in genetic form. Second, the fitness of each solution is evaluated using a certain fitness function. Third, the solutions are ranked according to their fitness and the unsuitable (least qualified) solutions are destroyed. Finally, new solutions are produced by applying the crossover operator to the surviving solutions, keeping the population size constant. A mutation operator may also be applied, and the cycle then starts again by evaluating the fitness of the new solutions, and so on until the accuracy of the solution is acceptable (Michalewicz 1992).

GENETIC PROGRAMMING (GP)

GP is one of the most recently developed knowledge-based techniques and is the next development of the GA; it can be defined as a multivariable interpolation procedure. The basic concept of GP is to find the surface in hyper-space that best fits a given set of points. To use GA for this purpose, the following steps are carried out (Koza, 1994).




Figure 1: Flow chart for GA procedure
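The four-step GA cycle of Fig. 1 can be illustrated with a minimal Python sketch. It is not the authors' Visual C++ software: bit-string chromosomes, single-point crossover, a fixed number of survivors and the toy fitness function (the count of 1-bits) are all assumptions chosen only for illustration.

```python
import random
from typing import Callable, List, Optional

Chromosome = List[int]


def run_ga(fitness: Callable[[Chromosome], float], n_genes: int,
           pop_size: int = 30, n_survivors: int = 10,
           mutation_rate: float = 0.02, max_generations: int = 200,
           target: Optional[float] = None, seed: int = 0) -> Chromosome:
    """Minimal GA sketch; higher fitness is better (illustrative only)."""
    rng = random.Random(seed)
    # Step 1: generate a random population encoded in genetic (binary) form.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(max_generations):
        # Step 2: evaluate the fitness of each solution and rank the solutions.
        population.sort(key=fitness, reverse=True)
        # Stop once the best solution is accurate enough.
        if target is not None and fitness(population[0]) >= target:
            break
        # Step 3: destroy the least qualified solutions.
        survivors = population[:n_survivors]
        # Step 4: refill the population by crossover, then apply mutation.
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_genes)          # single-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)


# Toy usage: maximize the number of 1-bits in a 20-gene chromosome.
best = run_ga(fitness=sum, n_genes=20, target=20)
print(best, sum(best))
```

When fitness is measured by an error to be minimized, such as the SSE defined next, the ranking order is simply reversed so that the lowest-error solutions survive.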

First, the fitness evaluation method (the function to be optimized) has to be determined. The fitness of a surface is represented herein by the sum of squared errors (SSE), which has to be minimized to obtain the best fitting surface. The SSE is calculated by

$$\mathrm{SSE} = \sum_{i} \left[ \text{GP prediction}_i - \text{target output}_i \right]^2 \qquad (1)$$
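Eq. (1) translates directly into code. The Python sketch below assumes that predictions and targets are given as equal-length sequences; the function name `sse` is illustrative.

```python
from typing import Sequence


def sse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Sum of squared errors, Eq. (1); a smaller SSE means a fitter surface."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))


print(sse([0.9, 0.2, 0.6], [1.0, 0.0, 1.0]))  # approximately 0.21
```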

Next, the most important step in GP is carried out: encoding the chromosomes (i.e., determining the number of genes in each chromosome and the arrangement of the genes on the chromosome) so that they represent a formula in genetic form. In doing so, the following points have to be considered:

1. Any set of points in a certain domain of hyper-space can be represented by many surfaces with different accuracies, depending on the complexity of these surfaces.

2. Any complicated equation can be constructed from certain basic functions (operators) such as =, +, -, x, /, sin, cos, etc.

3. The simplest case is to use only the five basic operators (=, +, -, x, /) to construct a polynomial equation.

4. The basic operators have two inputs and one output, except the operator (=), which has one input and one output.



Therefore, to create a formula in genetic form, a binary tree structure is constructed using the five basic operators mentioned above (i.e., =, +, -, x, /). This tree structure is represented graphically in Fig. 2. Using these operators, any polynomial can be represented in tree form; the more complex the formula, the more tree levels are needed to represent it. An example of representing a formula in tree form is shown in Fig. 3.

Figure 2: The five basic operators in GP

Figure 3: Mathematical and genetic representation of binary tree

As shown in Fig. 3, each chromosome consists of two parts: an operators part and a variables part. The operators part represents the whole tree except level 0 and consists of $2^{L} - 1$ genes, where $L$ is the number of tree levels. The variables part represents only level 0 of the tree and consists of $2^{L}$ genes. Therefore, the total number of genes in every chromosome is $2^{L+1} - 1$.
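To make the gene counts concrete, the Python sketch below decodes and evaluates such a chromosome. It assumes a hypothetical layout in which the operator genes are stored level by level from the root (so the children of gene i are genes 2i+1 and 2i+2), each leaf gene holds either a variable name or a numeric constant, "=" passes its first input through, and division by zero returns 1.0; the ordering actually used in Fig. 3 may differ.

```python
import operator
from typing import Callable, Dict, List, Tuple, Union

Gene = Union[str, float]


def protected_div(a: float, b: float) -> float:
    # Assumption: guard against division by zero (not specified in the text).
    return a / b if b != 0 else 1.0


# The five basic operators of Fig. 2; "=" is unary and ignores its second input.
OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "=": lambda a, b: a,
    "+": operator.add,
    "-": operator.sub,
    "x": operator.mul,
    "/": protected_div,
}


def gene_counts(levels: int) -> Tuple[int, int, int]:
    """(operator genes, variable genes, total genes) for a tree of `levels` levels."""
    n_ops = 2 ** levels - 1                 # every level except level 0
    n_vars = 2 ** levels                    # level 0 only
    return n_ops, n_vars, n_ops + n_vars    # total = 2 ** (levels + 1) - 1


def evaluate(chromosome: List[Gene], levels: int, inputs: Dict[str, float]) -> float:
    """Evaluate a chromosome laid out as [operators part | variables part]."""
    n_ops, _, total = gene_counts(levels)
    if len(chromosome) != total:
        raise ValueError(f"expected {total} genes, got {len(chromosome)}")

    def node_value(i: int) -> float:
        gene = chromosome[i]
        if i >= n_ops:                      # leaf: variable name or numeric constant
            return inputs[gene] if isinstance(gene, str) else float(gene)
        return OPERATORS[str(gene)](node_value(2 * i + 1), node_value(2 * i + 2))

    return node_value(0)


# Two-level example encoding (M x D50) / (CPT x 2)
chrom = ["/", "x", "x", "M", "D50", "CPT", 2.0]
print(gene_counts(2))                                        # (3, 4, 7)
print(evaluate(chrom, 2, {"M": 7.8, "D50": 0.17, "CPT": 1.47}))
```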



Once the chromosome encoding has been defined, the procedures for applying the genetic operations (crossover and mutation) have to be established. Mutation is a very simple operation that replaces some randomly selected genes with a random operator (in the operators part) or a random variable (in the variables part). By contrast, the crossover procedure is not that simple, because the components of the two parts of the chromosome must not be mixed during crossover. Moreover, the new chromosomes generated by crossover must inherit some features from their parents, which means that they cannot be generated randomly. Two ways of applying crossover were therefore considered. The first, suggested by Riccardo (1996), is called two-point crossover. In this technique, crossover is applied to the two parts of the chromosome independently: a certain number of genes from the operators part of one parent is swapped with the corresponding genes of the other parent, and the same operation is applied to the variables part, as shown in Fig. 4.

Figure 4: Two-point crossover method
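The two-point crossover of Fig. 4 and the mutation operator can be sketched in Python as follows, assuming a list layout of [operators part | variables part] with `n_ops` operator genes. The swapped segment is drawn at random inside each part separately, so operator and variable genes are never mixed; the helper names and the variable list in the usage are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, Union

Gene = Union[str, float]
OPERATOR_SET = ["=", "+", "-", "x", "/"]


def _swap_random_segment(a: List[Gene], b: List[Gene], start: int, end: int,
                         rng: random.Random) -> None:
    """Swap a random slice a[i:j] <-> b[i:j] with start <= i < j <= end (in place)."""
    i, j = sorted(rng.sample(range(start, end + 1), 2))
    a[i:j], b[i:j] = b[i:j], a[i:j]


def two_point_crossover(p1: Sequence[Gene], p2: Sequence[Gene], n_ops: int,
                        rng: random.Random) -> Tuple[List[Gene], List[Gene]]:
    """Apply a segment swap to the operators part and to the variables part independently."""
    c1, c2 = list(p1), list(p2)
    _swap_random_segment(c1, c2, 0, n_ops, rng)          # operators part
    _swap_random_segment(c1, c2, n_ops, len(c1), rng)    # variables part
    return c1, c2


def mutate(chromosome: Sequence[Gene], n_ops: int, variables: Sequence[Gene],
           rate: float, rng: random.Random) -> List[Gene]:
    """Replace each gene with probability `rate` by a random gene of the same part."""
    out = list(chromosome)
    for k in range(len(out)):
        if rng.random() < rate:
            out[k] = rng.choice(OPERATOR_SET if k < n_ops else variables)
    return out


rng = random.Random(1)
parent_a = ["/", "x", "x", "M", "D50", "CPT", 2.0]
parent_b = ["/", "+", "+", "M", "D50", "M", "CPT"]
child_a, child_b = two_point_crossover(parent_a, parent_b, n_ops=3, rng=rng)
print(child_a, child_b)
```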

The second way of applying crossover was proposed by the authors. In this technique, a new chromosome is generated by selecting each of its genes at random from the corresponding genes of the surviving chromosomes. In other words, the first gene of the new chromosome is selected randomly from the first genes of all surviving parent chromosomes, and so on for the remaining genes. This process is depicted in Fig. 5 for three parents and one child.

Figure 5: Random selection crossover method
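The random selection crossover of Fig. 5 is even simpler to express: gene k of each child is copied from gene k of a randomly chosen survivor, which automatically keeps operator genes in the operators part. The Python sketch below is illustrative; the function name and the example parents are assumptions.

```python
import random
from typing import List, Sequence, Union

Gene = Union[str, float]


def random_selection_crossover(survivors: Sequence[Sequence[Gene]], n_children: int,
                               rng: random.Random) -> List[List[Gene]]:
    """Build each child gene by gene, taking gene k from a randomly chosen survivor."""
    length = len(survivors[0])
    if any(len(s) != length for s in survivors):
        raise ValueError("all survivors must have the same number of genes")
    return [[rng.choice(survivors)[k] for k in range(length)]
            for _ in range(n_children)]


# Three parents and one child, as in Fig. 5.
parents = [
    ["/", "x", "x", "M", "D50", "CPT", 2.0],
    ["/", "+", "+", "M", "D50", "M", "CPT"],
    ["+", "x", "-", "CPT", "M", "D50", 1.0],
]
child = random_selection_crossover(parents, n_children=1, rng=random.Random(0))[0]
print(child)
```

Because the survivors are simply passed in as a list, any number of them can be used, whereas pairwise two-point crossover ties the number of survivors to the number of parent pairs.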



In the random selection crossover technique, the number of survivors from one generation to the next can be chosen freely, whereas in the two-point crossover technique the number of survivors must be half of the population. For this reason, random selection crossover was chosen to carry out the crossover operation in the developed software. In practice, the fastest convergence occurs when the number of survivors is 30-40% of the population. With fewer survivors, the solution may be trapped in a local minimum; on the other hand, if the number of survivors exceeds 50% of the population, convergence becomes very slow.

After the previous three steps, the GA can be applied to the first, randomly created generation. Generation after generation, the fitness increases (i.e., the SSE decreases), and after a certain number of generations it settles at a certain value (the minimum SSE). At this stage, the corresponding chromosomes represent the best fitting surface for this number of tree levels (i.e., for this degree of complexity). If the accuracy of this surface is not sufficient, a larger number of tree levels must be used (Riccardo, 1996).

PREDICTION OF SOIL LIQUEFACTION USING GP

As a multivariable interpolation procedure, GP has a wide range of applications in the geotechnical field. Various empirical formulas (based on observations or experimental results) can be enhanced using GP, and correlations between site investigation tests and soil parameters can be expressed as explicit equations instead of relying on experience or engineering judgment. Liquefaction of sand is a good example for applying GP in the geotechnical field, as there is no formula based on mathematical derivation that predicts sand liquefaction. There are, however, many observations of this phenomenon after earthquakes, including records of the soil parameters of the affected zones, and many empirical formulas have been developed to relate soil parameters and the main earthquake characteristics to the liquefaction potential. To explain the GP procedure used in this research, a simple example of soil liquefaction is presented; for simplicity, only five observations (records) are interpolated and a tree of only two levels is used. The observed data are the earthquake magnitude (M), the mean diameter of the sand particles (D50) and the tip resistance from the cone penetration test (CPT). The data of the five observations are summarized in Table 1.

Table 1: Sample of available liquefaction case histories

M      D50 (mm)   CPT (MPa)   State
7.5    0.33       3.14        Liquefaction
6.4    0.40       11.8        No liquefaction
7.8    0.17       1.47        Liquefaction
5.9    0.10       5.7         No liquefaction
7.1    0.26       10.0        Liquefaction



Assume that the value of the function equals 1 in case of liquefaction and 0 in case of no liquefaction. The fitness (sum of squared errors) of each equation can then be calculated from

$$\mathrm{SSE} = \sum_{i} \left[ f(M_i, D_{50,i}, \mathrm{CPT}_i) - y_i \right]^2 \qquad (2)$$

where $y_i = 1.0$ for liquefaction and $y_i = 0.0$ for no liquefaction. Applying the GP procedure to the five available cases is summarized in Fig. 6, where the cycles of generating random formulas, calculating their fitness, choosing the survivors and applying the crossover operator to generate the next generation are presented graphically until the solution settles on the best fitting formula with the minimum SSE.

Figure 6: Procedure of using GP to predict liquefaction
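The cycle of Fig. 6 can be outlined in a self-contained Python sketch that runs the steps described above on the five records of Table 1: random two-level chromosomes, fitness by Eq. (2), about 35% survivors, random selection crossover and mutation. The leaf set (M, D50, CPT and the constants 1 and 2), the protected division, the population size and the mutation rate are assumptions for illustration, and the formula found depends on the random seed, so it need not coincide with the one reported in Fig. 6.

```python
import random
from typing import Dict, List, Tuple, Union

Gene = Union[str, float]

# Table 1: (M, D50 [mm], CPT [MPa], target); target 1.0 = liquefaction, 0.0 = none
RECORDS = [
    (7.5, 0.33, 3.14, 1.0),
    (6.4, 0.40, 11.8, 0.0),
    (7.8, 0.17, 1.47, 1.0),
    (5.9, 0.10, 5.7, 0.0),
    (7.1, 0.26, 10.0, 1.0),
]
OPS = ["=", "+", "-", "x", "/"]
LEAVES: List[Gene] = ["M", "D50", "CPT", 1.0, 2.0]   # assumed terminal set


def apply_op(op: str, a: float, b: float) -> float:
    if op == "=":
        return a                      # unary: passes its first input through
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "x":
        return a * b
    return a / b if b != 0 else 1.0   # protected division (assumption)


def evaluate(chrom: List[Gene], n_ops: int, x: Dict[str, float]) -> float:
    def node(i: int) -> float:
        g = chrom[i]
        if i >= n_ops:                # leaf gene: variable name or constant
            return x[g] if isinstance(g, str) else float(g)
        return apply_op(str(g), node(2 * i + 1), node(2 * i + 2))
    return node(0)


def fitness_sse(chrom: List[Gene], n_ops: int) -> float:
    """Eq. (2): squared error against 1.0 (liquefaction) or 0.0 (no liquefaction)."""
    return sum((evaluate(chrom, n_ops, {"M": m, "D50": d, "CPT": q}) - y) ** 2
               for m, d, q, y in RECORDS)


def run_gp(levels: int = 2, pop_size: int = 60, survivor_frac: float = 0.35,
           mutation_rate: float = 0.05, generations: int = 100,
           seed: int = 0) -> Tuple[List[Gene], float]:
    rng = random.Random(seed)
    n_ops, n_leaves = 2 ** levels - 1, 2 ** levels
    n_genes = n_ops + n_leaves
    pop = [[rng.choice(OPS) for _ in range(n_ops)] +
           [rng.choice(LEAVES) for _ in range(n_leaves)] for _ in range(pop_size)]
    n_surv = max(2, round(survivor_frac * pop_size))
    for _ in range(generations):
        pop.sort(key=lambda c: fitness_sse(c, n_ops))      # lowest SSE first
        survivors = pop[:n_surv]
        children = []
        while n_surv + len(children) < pop_size:
            # Random selection crossover: gene k comes from a random survivor.
            child = [rng.choice(survivors)[k] for k in range(n_genes)]
            for k in range(n_genes):                        # mutation
                if rng.random() < mutation_rate:
                    child[k] = rng.choice(OPS if k < n_ops else LEAVES)
            children.append(child)
        pop = survivors + children
    best = min(pop, key=lambda c: fitness_sse(c, n_ops))
    return best, fitness_sse(best, n_ops)


best, err = run_gp()
print(best, round(err, 4))
```

Because the survivors are carried over unchanged, the best SSE can never increase from one generation to the next, which mirrors the settling behavior described for Fig. 6.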



From Fig. 6, the best fitting surface for these five observations is:

$$P = \frac{M \cdot D_{50}}{2 \cdot \mathrm{CPT}} \qquad (3)$$

where P is the probability of liquefaction. As shown in Table 2, the minimum value of P taken to trigger liquefaction is 0.4. Therefore, if P is equal to or larger than 0.4, liquefaction is likely to occur, and if P is less than 0.4, no liquefaction is expected. The evaluation of the accuracy of the formula is summarized in Table 2.

Table 2: Accuracy evaluation of the developed formula

M      D50 (mm)   CPT (MPa)   P      Prediction        Observation
7.5    0.33       3.14        0.40   Liquefaction      Liquefaction
6.4    0.40       11.80       0.10   No liquefaction   No liquefaction
7.8    0.17       1.47        0.45   Liquefaction      Liquefaction
5.9    0.10       5.70        0.05   No liquefaction   No liquefaction
7.1    0.26       10.0        0.10   No liquefaction   Liquefaction
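The thresholding rule and the accuracy count can be checked with a short Python sketch. It applies the P >= 0.4 rule to the P values as printed in Table 2; recomputing Eq. (3) from the rounded inputs of Table 1 gives P = 0.394 for the first record, so the printed values are used to reproduce the reported 80%. The function names are illustrative.

```python
from typing import Sequence


def liquefaction_potential(m: float, d50: float, cpt: float) -> float:
    """Eq. (3): P = M * D50 / (2 * CPT), with D50 in mm and CPT in MPa."""
    return m * d50 / (2.0 * cpt)


def predicts_liquefaction(p: float, threshold: float = 0.4) -> bool:
    """Liquefaction is predicted when P >= threshold."""
    return p >= threshold


def accuracy_percent(p_values: Sequence[float], observed: Sequence[bool],
                     threshold: float = 0.4) -> float:
    """Percentage of records whose prediction matches the observation."""
    hits = sum(predicts_liquefaction(p, threshold) == obs
               for p, obs in zip(p_values, observed))
    return 100.0 * hits / len(observed)


p_table2 = [0.40, 0.10, 0.45, 0.05, 0.10]        # P as printed in Table 2
observed = [True, False, True, False, True]      # True = liquefaction observed
print(accuracy_percent(p_table2, observed))      # 80.0
print(round(liquefaction_potential(7.8, 0.17, 1.47), 2))   # 0.45
```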

From Table 2, only one of the five observations is misclassified; accordingly, the accuracy of this formula can be considered 80%. Higher prediction accuracy requires a tree with more levels.

RESULTS

To predict the liquefaction potential more accurately, three trials were carried out using the developed software as demonstrated above. The first trial used two tree levels, the second three levels, and the last four levels. All trials used the same data, which were compiled by Olson (1995) and contain 174 records. The results of applying the GP procedure to the 174 records are summarized in Table 3, which gives the best fitting formula for each trial together with its SSE and prediction accuracy.

Table 3: The best estimated function for each trial and its fitness

No. of levels   Best estimated function      SSE     Prediction accuracy (%)

2               P = (M + D50) / (M + CPT)    25.21   84

3 P =2 50.SSR + +

+M D

M CPT 23.23 85

4 P = M D

CPTSSR D

MCPT

+

++ −

3

1

50

50

. 21.14 88



In Table 3, M is the earthquake magnitude, D50 is the mean diameter of the soil particles in mm, CPT is the cone tip resistance from the Cone Penetration Test in MPa, SSR is the seismic shear stress ratio at the site, and P is the probability of liquefaction, which is less than 0.50 in case of no liquefaction and greater than or equal to 0.50 in case of liquefaction. Furthermore, to determine the range of parameters that yields satisfactory prediction accuracy, the range of the mean particle diameter was divided into four zones and the prediction accuracy of the formula was determined for each zone, as shown in Fig. 7. Fig. 7 shows that most of the misclassifications occurred in very fine soils (D50 < 0.1 mm), which could be classified as silty sand. Therefore, it is recommended to use this formula for clean medium to fine sand (D50 > 0.1 mm).
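The per-zone accuracy check behind Fig. 7 amounts to grouping records by D50 and counting correct predictions in each group. The 174 records are not reproduced here, so this Python sketch uses the five cases of Table 2 purely as an illustration; the handling of values that fall exactly on a zone boundary is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

ZONES = ("D50 < 0.1 mm", "0.1 <= D50 < 0.2 mm", "0.2 <= D50 < 0.3 mm", "D50 >= 0.3 mm")


def zone_of(d50: float) -> str:
    """Assign a record to one of the four D50 zones (boundary handling assumed)."""
    if d50 < 0.1:
        return ZONES[0]
    if d50 < 0.2:
        return ZONES[1]
    if d50 < 0.3:
        return ZONES[2]
    return ZONES[3]


def accuracy_by_zone(records: Iterable[Tuple[float, bool, bool]]) -> Dict[str, float]:
    """records: (D50 in mm, predicted liquefaction, observed liquefaction)."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for d50, predicted, observed in records:
        zone = zone_of(d50)
        totals[zone] += 1
        hits[zone] += int(predicted == observed)
    return {z: 100.0 * hits[z] / totals[z] for z in ZONES if totals[z] > 0}


# Illustration with the five cases of Table 2 (not the 174-record data set).
table2 = [(0.33, True, True), (0.40, False, False), (0.17, True, True),
          (0.10, False, False), (0.26, False, True)]
print(accuracy_by_zone(table2))
```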

[Figure 7 plot: accuracy (%) versus number of tree levels (2, 3 and 4) for four D50 zones: D50 < 0.1 mm, 0.1 < D50 < 0.2 mm, 0.2 < D50 < 0.3 mm and D50 > 0.3 mm]

Figure 7: Relation between accuracy of formula and mean diameter of soil particles

COMPARISON WITH EARLIER PREDICTION METHODS

To evaluate the accuracy of the developed formula, the liquefaction potential of the 174 records was also predicted using the best-known prediction approaches: Seed and DeAlba (1986), Mitchell and Tseng (1990), Stark and Olson (1995) and Shibata and Teparaksa (1988). The comparison results are summarized in Table 4 and shown in Fig. 8.

Table 4: Comparison between the accuracy of GP formula and earlier prediction methods

Prediction method              Total no. of predicted cases   No. of misclassifications   Prediction accuracy (%)
Seed and DeAlba (1986)         174                            51                          71
Shibata and Teparaksa (1988)   174                            22                          88
Mitchell and Tseng (1990)      174                            35                          80
Stark and Olson (1995)         174                            21                          88
GP formula (2002)              174                            21                          88



[Figure 8 plot: prediction accuracy (%) for each prediction method: Seed & DeAlba 1986, Shibata & Teparaksa 1988, Mitchell & Tseng 1990, Stark & Olson 1995, GP 2002]

Figure 8: Comparison between the accuracy of GP formula and earlier prediction methods

CONCLUDING REMARKS

In this research, GP, a new and promising knowledge-based approach, was used to develop a liquefaction potential assessment criterion. The conclusions of this research can be summarized in the following points:

1. From the above results, the best fitting formula to predict liquefaction is:

P = M D

CPTSSR D

MCPT

+

++ −

3

1

50

50

.

where M is the earthquake magnitude, D50 is the mean diameter of the soil particles in mm, CPT is the cone tip resistance from the Cone Penetration Test in MPa, SSR is the seismic shear stress ratio at the site, and P is the probability of liquefaction, which is less than 0.50 in case of no liquefaction and greater than or equal to 0.50 in case of liquefaction.

2. The new formula provided by GP predicts liquefaction with an accuracy of about 88-90%. It is therefore more accurate than empirical relations such as Seed and DeAlba (1986) and Mitchell and Tseng (1990), and it shares the same range of prediction accuracy as Stark and Olson (1995) and Shibata and Teparaksa (1988).

3. For best accuracy, it is recommended to use this formula for clean medium to fine sand (D50 > 0.1 mm).



REFERENCES

1. Holland, J. (1975). "Adaptation in Natural and Artificial Systems," University of Michigan Press, Ann Arbor, MI.
2. Koza, J. R. (1994). "Genetic Programming II," MIT Press, Cambridge, MA.
3. Kramer, S. L. (1996). "Geotechnical Earthquake Engineering," Prentice-Hall, Inc.
4. Lade, P. V. (1992). "Static Instability and Liquefaction of Loose Fine Sandy Slopes," Journal of Geotechnical Engineering, Vol. 118, No. 1, pp. 51-71.
5. Michalewicz, Z. (1992). "Genetic Algorithms + Data Structures = Evolution Programs," Springer-Verlag, Berlin, Heidelberg, New York.
6. Mitchell, J. K. and Tseng, D. (1990). "Assessment of Liquefaction Potential by Cone Penetration Resistance," Proceedings, H. Bolton Seed Memorial Symposium, Berkeley, California, Vol. 2, pp. 335-350.
7. Riccardo, P. (1996). "Introduction to Evolutionary Computation," Collection of Lectures, School of Computer Science, University of Birmingham, UK.
8. Robertson, P. K. and Campanella, R. G. (1985). "Liquefaction Potential of Sands Using the CPT," Journal of Geotechnical Engineering, ASCE, Vol. 111, No. 3, pp. 384-403.
9. Seed, H. B. and De Alba, P. (1986). "Use of SPT and CPT Tests for Evaluating the Liquefaction Resistance of Soils," Proceedings, In Situ '86, ASCE, pp. 281-302.
10. Stark, T. D. and Olson, S. (1995). "Liquefaction Resistance Using CPT and Field Case Histories," Journal of Geotechnical Engineering, Vol. 121, No. 12, pp. 856-869.