HIC04 153 Sedim Bhatt Sol

    6th International Conference on Hydroinformatics - Liong, Phoon & Babovic (eds)

    2004 World Scientific Publishing Company, ISBN 981-238-787-0

    A DATA MINING APPROACH TO MODELLING SEDIMENT TRANSPORT

    B. BHATTACHARYA 1

    R.K. PRICE 2

    D.P. SOLOMATINE 3

    1,2,3 Department of Hydroinformatics and Knowledge Management, UNESCO-IHE Institute for Water Education, P.O. Box 3015, 2601 DA Delft, The Netherlands. Email: {bha, rkp, sol}@ihe.nl.

    Even though numerous models for predicting sediment transport rates are available, their dependability is often questionable. Data mining (DM), which is particularly useful in modelling processes for which knowledge of the physics is limited, is presented as a tool complementary to the modelling of sediment transport. This paper reports on the use of DM methods such as artificial neural networks and model trees in modelling bed-load and total-load transport using measured data. The predictive accuracy of these models is compared with that of some well-known existing models. It is concluded that the DM models are able to learn the complex transport process from the available data.

    INTRODUCTION

    A reasonable estimate of sediment transport rates in alluvial streams is important in the context of a number of water management issues. Even though extensive research over the last fifty years has produced a plethora of bed-load, suspended-load and total-load transport models, the predictive accuracy of these models has barely increased. The adequacy of these models has been reviewed by ASCE [1], Gomez and Church [2], Yalin [3], Van Rijn [4]-[6], etc. Sediment transport is an immensely complex process and the expression of the transport process through a deterministic mathematical framework may not be possible in the foreseeable future.

    In parallel with research into sediment transport has been the emergence of new modelling paradigms such as data mining (DM). This has opened up new opportunities for modelling processes about which either the level of available knowledge is too limited to put the relevant information in a mathematical framework or too little data is available for calibrating an appropriate model. DM is presently being utilised in almost all branches of science as an alternative and a complement to the more traditional physically based modelling. The use of artificial neural networks (ANN) remains at the forefront of this complementary modelling practice. The recent successful applications of DM methods to water engineering problems (e.g. ASCE [7]) suggest that DM is a suitable candidate for modelling sediment transport. This paper documents the development of bed-load and total-load transport models using an ANN and another promising modelling method called M5 model trees.

    DATA MINING MODELLING APPROACH

    In the DM approach a non-linear parametric function approximator is used and the coefficients of the function decomposition are obtained from input-output data pairs, a specified topology and systematic learning rules. Once trained, the DM model becomes a parametric description of the function being approximated. The goal of learning from examples is to find the general rule that created the specific examples, and this is achieved by trying out different model topologies and related parameters. Out of several possible methods for function approximation we considered ANN and model trees (which are almost unknown in the water sector) as the modelling methods.

    An introduction to ANN can be found in Haykin [8]. Bed-load transport is a nonlinear, multivariate process and the variables may have unknown or partially known interrelationships. The success with which ANNs have been used to model the behaviour of non-linear systems in a wide range of areas such as hydrology, hydrodynamics, water quality and water system control (e.g. ASCE [7]) indicates that this approach could also be useful in sediment transport. Very limited research on using DM in sediment transport has been reported. Nagy [9] used an ANN to estimate the natural sediment discharge in rivers in terms of sediment concentration. Jain [10] used an ANN for setting up a sediment rating curve. Namin and Lin [11] made an attempt to predict sediment transport for morphological assessment.

    Model trees (MT) use an automatic splitting of the input domain for assigning locally accurate multivariate linear regression models for each sub-area (Quinlan [12]). During training the acquired information is used to generate a tree structure that consists of decision nodes that contain an attribute name and branches to other decision trees, one for each value of the attribute, and leaf or answer nodes with a linear model. Thus an MT is a combination of piecewise linear models, each of which is suitable for a particular domain of the input space. The algorithm of an MT breaks up the input space of the training data into a number of sub-areas represented by nodes or decision points in order to assign linear models suitable for each sub-area of the input space. The continuous splitting often results in too complex a tree that needs to be pruned (reduced) to a simpler tree to improve the generalisation capability. Finally, the value predicted by the model at the appropriate leaf is adjusted by the smoothing operation to reflect the predicted values at the nodes along the path from the root to that leaf. The use of MT in the water sector has been introduced only recently (Solomatine and Dulal [13]).
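    The piecewise linear idea behind a model tree can be illustrated with a deliberately small sketch: one split of the input domain followed by a least-squares linear model in each of the two leaves. This is not the M5 algorithm (it has no pruning and no smoothing, and it picks the split by summed squared error rather than by M5's own splitting criterion). The function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def fit_one_split_tree(X, y, min_leaf=5):
    """Try every (feature, threshold) pair and keep the split whose two
    leaf-level linear models give the smallest summed squared error."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= thr
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            c_left = fit_linear(X[left], y[left])
            c_right = fit_linear(X[~left], y[~left])
            sse = (np.sum((predict_linear(c_left, X[left]) - y[left]) ** 2)
                   + np.sum((predict_linear(c_right, X[~left]) - y[~left]) ** 2))
            if best is None or sse < best[0]:
                best = (sse, j, thr, c_left, c_right)
    if best is None:
        raise ValueError("not enough data for a split")
    return best[1:]

def predict_one_split_tree(tree, X):
    j, thr, c_left, c_right = tree
    left = X[:, j] <= thr
    out = np.empty(len(X))
    out[left] = predict_linear(c_left, X[left])
    out[~left] = predict_linear(c_right, X[~left])
    return out

# Synthetic piecewise linear target (hypothetical data, not from the paper).
x = np.linspace(-1.0, 1.0, 40)
X = x.reshape(-1, 1)
y = np.where(x < 0.0, 1.0 + 2.0 * x, 3.0 - x)

tree = fit_one_split_tree(X, y)
pred = predict_one_split_tree(tree, X)
global_pred = predict_linear(fit_linear(X, y), X)
```

    On this toy target the two leaf models reproduce the data, whereas a single global linear model cannot, which is the motivation for splitting the input space.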

    MODELLING

    Selection of input-output parameters

    It is believed that bed-load transport depends critically on the parameters $D_*$ and $T$ (for details see e.g. Van Rijn [4], Yalin [3], etc.), which are defined by:

    $$D_* = D_{50}\left[\frac{(s-1)\,g}{\nu^{2}}\right]^{1/3} \qquad (1)$$

    where $D_{50}$ = representative particle diameter, $g$ = acceleration due to gravity, $\nu$ = kinematic viscosity of water, $s$ = specific density parameter (= density of sediment particle / density of water), and

    $$T = \frac{(\theta)^{2} - (\theta_{cr})^{2}}{(\theta_{cr})^{2}} \qquad (2)$$

    where $\theta$ = mobility parameter related to grain roughness, and $\theta_{cr}$ = Shields critical shear stress.
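    As a rough illustration of how the two input parameters could be evaluated, the sketch below codes Eq. (1) directly and Eq. (2) in its squared-difference form. The default values s = 2.65, g = 9.81 m/s2 and a kinematic viscosity of 1.0e-6 m2/s are assumptions for quartz sand in water at about 20 degrees C, not values quoted in the paper, and the function and argument names are hypothetical.

```python
def particle_parameter(d50, s=2.65, g=9.81, nu=1.0e-6):
    """D* = D50 * ((s - 1) * g / nu**2) ** (1/3), with d50 in m and nu in m2/s."""
    return d50 * ((s - 1.0) * g / nu ** 2) ** (1.0 / 3.0)

def transport_stage(mobility, mobility_cr):
    """T = (mobility**2 - mobility_cr**2) / mobility_cr**2 (squared form of Eq. 2)."""
    return (mobility ** 2 - mobility_cr ** 2) / mobility_cr ** 2

# Hypothetical inputs: 1 mm sand, and a mobility value twice the critical one.
d_star = particle_parameter(0.001)
t_stage = transport_stage(2.0, 1.0)
```

    For the 1 mm example the particle parameter comes out at about 25, and T is zero exactly at the threshold of motion and grows as the mobility exceeds its critical value.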

    It follows that $D_*$ and $T$ can be chosen as the input parameters to a DM model for bed-load transport. These parameters embody the most important insights, namely that bed-load transport is related to (i) the mobility parameter related to the grain roughness and (ii) the Shields criterion of incipient motion. Because the parameter $T$ is computed using the Shields diagram, the use of these parameters as inputs implicitly embeds the Shields diagram in the DM model. The output of the model can be expressed as the dimensionless transport rate ($\Phi_b$) defined by:

    $$\Phi_b = \frac{q_b}{\sqrt{(s-1)\,g\,D_{50}^{3}}} \qquad (3)$$

    where $q_b$ = bed-load transport rate (by volume of dry weight of sediments).
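    A short sketch of the non-dimensionalisation in Eq. (3), and of its inverse for converting a model output back to a transport rate per unit width, is given below. It assumes the denominator is the square root of (s - 1) g D50^3, which is what makes the ratio dimensionless when the transport rate is in m2/s; the default s and g and the function names are assumptions.

```python
import math

def dimensionless_rate(q, d50, s=2.65, g=9.81):
    """Phi = q / sqrt((s - 1) * g * d50**3), with q in m2/s and d50 in m."""
    return q / math.sqrt((s - 1.0) * g * d50 ** 3)

def dimensional_rate(phi, d50, s=2.65, g=9.81):
    """Inverse of dimensionless_rate: returns q in m2/s."""
    return phi * math.sqrt((s - 1.0) * g * d50 ** 3)

# Hypothetical example: q_b = 1.0e-5 m2/s over 1 mm sand.
phi_b = dimensionless_rate(1.0e-5, 0.001)
q_back = dimensional_rate(phi_b, 0.001)
```

    The same pair of functions applies unchanged to a total-load rate, since only the numerator differs.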

    It is assumed that the total-load transport can be expressed with sufficient accuracy by the parameters $D_*$, $T$ and $h/D_{50}$, where $h$ stands for the depth of flow (for details see Yalin [3]). Accordingly, the parameters $D_*$, $T$ and $h/D_{50}$ can be selected as the input parameters to a DM model. The output of the model is chosen as the dimensionless total transport rate ($\Phi_t$) defined by:

    $$\Phi_t = \frac{q_t}{\sqrt{(s-1)\,g\,D_{50}^{3}}} \qquad (4)$$

    where $q_t$ = total-load transport rate (by volume of dry weight of sediments).

    Bed-load transport model

    Gomez and Church [2] have summarized several datasets of bed-load transport from laboratory and field measurements. From these datasets only the data characterizing equilibrium bed-load transport have been considered.

    Observing a non-uniform distribution of data, we adopted a natural log transformation of the input and output variables. In this way the distributions of the transformed variables were closer to normal. In order to model a wide range of transport rate characteristics we used two-thirds of each dataset as training data and the remainder as testing data. To maintain statistical homogeneity between the training and testing sets, one possibility would be a random choice of testing data in an iterative loop with a defined objective function, but this makes the results harder to reproduce. Instead we chose two consecutive data points for training followed by one data point for testing, and this resulted in training and testing sets with a similar statistical distribution. 280 and 127 data points were used for training and testing respectively.
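    The deterministic split described above (two consecutive points for training, then one for testing) and the log transformation can be sketched as follows. The function names are hypothetical; in the paper the split is applied within each dataset, which is why the reported counts need not equal an exact 2:1 split of the pooled data.

```python
import math

def split_two_train_one_test(records):
    """Send every third record to the test set and the other two to training."""
    train, test = [], []
    for i, rec in enumerate(records):
        (test if i % 3 == 2 else train).append(rec)
    return train, test

def log_transform(values):
    """Natural-log transform; all values must be strictly positive."""
    return [math.log(v) for v in values]

# Hypothetical example with nine ordered records.
train, test = split_two_train_one_test(list(range(1, 10)))
logged = log_transform([1.0, math.e])
```

    Because the split depends only on record order, repeating it on the same ordered data always yields the same training and testing sets.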

    A multi-layered perceptron network trained by the back-propagation algorithm, with a hyperbolic tangent function for the hidden layer, a linear transfer function for the output layer and 8 hidden nodes (found by optimisation), was used. For the model tree experiment, a model tree was built using the tool WEKA (Witten and Frank [14]) with the M5 algorithm.
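    A network of the kind described (one tanh hidden layer with 8 nodes, a linear output node, training by back-propagation) can be sketched in a few lines of NumPy. This is a minimal illustration under assumed settings: plain batch gradient descent on the mean squared error, an assumed learning rate and epoch count, and synthetic inputs standing in for the log-transformed variables. It is not the authors' implementation or software.

```python
import numpy as np

def train_mlp(X, y, n_hidden=8, lr=0.05, epochs=2000, seed=0):
    """Train a tanh-hidden, linear-output network by batch back-propagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
    b2 = np.zeros(1)
    target = y.reshape(-1, 1)
    history = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # hidden layer activations
        out = H @ W2 + b2                  # linear output layer
        err = out - target
        history.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / len(X)         # d(MSE)/d(out)
        g_W2 = H.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_H = (g_out @ W2.T) * (1.0 - H ** 2)   # back through tanh
        g_W1 = X.T @ g_H
        g_b1 = g_H.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), history

def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()

# Synthetic stand-in data (hypothetical): two inputs, one smooth target.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (200, 2))
y = 0.9 * X[:, 1] - 0.35 * X[:, 0] + 0.2 * X[:, 0] * X[:, 1]

params, history = train_mlp(X, y)
pred = predict_mlp(params, X)
```

    In practice the number of hidden nodes would be chosen by comparing errors on held-out data, which is presumably what "found by optimisation" refers to.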

    Total-load transport model

    Brownlie [15] has summarized several datasets of total transport rates and the corresponding variables from 55 flume and 24 field datasets (for a detailed description see Brownlie [15]). Datasets marked by Brownlie as incorrect or not verified were removed. Further, data points with specific gravity of bed material outside the range of 2.57-2.7, Froude number > 0.9, $D_{50}$ > 32 mm, $h/D_{50}$ > 50000 or $T$ > 50 were also removed. This resulted in a set of 4187 data points with depth $h$ in the range 0.02-14.5 m, velocity $u$ in the range 0.1-2.7 m/s, $D_{50}$ in the range 0.011-32 mm and slope $I$ in the range 0.000003-0.0158. The width/depth ratio was between 1 and 579.
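    The screening rules listed above translate into a simple record filter. The sketch below assumes each data point is a dictionary with the hypothetical keys shown; neither the key names nor the two example records come from Brownlie's compilation.

```python
def keep_record(rec):
    """Return True if a data point passes all the screening rules in the text."""
    return (
        not rec.get("flagged", False)              # marked incorrect / not verified
        and 2.57 <= rec["specific_gravity"] <= 2.7
        and rec["froude"] <= 0.9
        and rec["d50_mm"] <= 32.0
        and rec["h_over_d50"] <= 50000.0
        and rec["T"] <= 50.0
    )

# Two hypothetical records: the second fails the Froude number rule.
records = [
    {"specific_gravity": 2.65, "froude": 0.4, "d50_mm": 0.5,
     "h_over_d50": 2000.0, "T": 5.0},
    {"specific_gravity": 2.65, "froude": 1.2, "d50_mm": 0.5,
     "h_over_d50": 2000.0, "T": 5.0},
]
kept = [rec for rec in records if keep_record(rec)]
```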

    A natural log transformation of the input and output variables was adopted to bring the data distribution closer to normal. A methodology similar to that described for the bed-load transport model was adopted to maintain homogeneity of statistical distribution in the training and testing sets. 2814 and 1355 data points were used for training and testing respectively. An ANN, similar in structure to that of the bed-load transport model, was chosen with 3 hidden nodes (found by optimisation).

    RESULTS AND DISCUSSIONS

    Bed-load transport

    Table 1. Comparison of errors of the bed-load transport models on the testing data (RMSE = root mean square error; $D_r$ = computed transport rate / measured transport rate)

    The first MT generated was very complex, with 25 linear models at the leaf nodes. It was accurate in training but suffered from overfitting and had to be pruned in order to ensure a good generalisation capacity. Pruning is continued for as long as the predictive accuracy does not drop substantially. With pruning, the simplest model, possessing three linear equations, was found. Two of them correspond to the incipient motion conditions (not shown) and the third equation, which describes the bed-load transport process for most situations, is given by:

    $$\ln(\Phi_b) = -2.63 - 0.353\,\ln(D_*) + 0.893\,\ln(T) \qquad (5)$$

    where ln stands for the natural logarithm. Equation (5) can be rewritten as:

    $$\Phi_b = 0.072078\,\frac{T^{0.893}}{D_*^{0.353}} \qquad (6)$$
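    The step from Eq. (5) to Eq. (6) is just exponentiation, and it can be checked numerically. The sketch assumes Eq. (5) reads ln(Phi_b) = -2.63 - 0.353 ln(D*) + 0.893 ln(T); these signs are inferred from the coefficient 0.072078, which equals exp(-2.63), and from D* appearing in the denominator of Eq. (6). The function names and the test point are hypothetical.

```python
import math

def phi_b_log_form(d_star, t):
    """Leaf model in log space, then transformed back (assumed form of Eq. 5)."""
    return math.exp(-2.63 - 0.353 * math.log(d_star) + 0.893 * math.log(t))

def phi_b_power_form(d_star, t):
    """Power-law form of the same leaf model (Eq. 6)."""
    return 0.072078 * t ** 0.893 / d_star ** 0.353

coefficient = math.exp(-2.63)     # should reproduce the 0.072078 of Eq. (6)
a = phi_b_log_form(10.0, 5.0)     # hypothetical D* = 10, T = 5
b = phi_b_power_form(10.0, 5.0)
```

    Note that this leaf model covers only the range the tree assigns to it; the two leaves for incipient motion are not given in the paper and are not represented here.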

    Eq. (6) has some similarity to the Van Rijn model [5] defined below:

    $$\Phi_b = \frac{\alpha_1\,T^{\alpha_2}}{D_*^{0.3}} \qquad (7)$$

    where $\alpha_1$ = 0.053 and $\alpha_2$ = 2.1 for $T$

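    Table 2. Comparison of performances of the different total-load transport models on the test dataset (r = correlation coefficient)

    | Data type | No. of data | Data ranges      | Error term                      | Engelund-Hansen | Van Rijn | ANN  | MT   |
    |-----------|-------------|------------------|---------------------------------|-----------------|----------|------|------|
    | Flume     | 1029        | u: 0.1-2.4 m/s   | RMSE (10^-5 m2/s)               | 7.9             | 6.0      | 11.1 | 6.5  |
    |           |             | h: 0.02-1.1 m    | r                               | 0.94            | 0.85     | 0.96 | 0.96 |
    |           |             | D50: 0.011-29 mm | % of data with Dr within 0.5-2  | 55              | 43       | 56   | 52   |
    | Field     | 326         | u: 0.1-2.7 m/s   | RMSE (10^-5 m2/s)               | 151.9           | 90.6     | 68.4 | 60.0 |
    |           |             | h: 0.02-14.5 m   | r                               | 0.57            | 0.52     | 0.61 | 0.72 |
    |           |             | D50: 0.09-32 mm  | % of data with Dr within 0.5-2  | 61              | 54       | 67   | 69   |
    | Total     | 1355        | u: 0.1-2.7 m/s   | RMSE (10^-5 m2/s)               | 74.8            | 44.7     | 34.9 | 30.0 |
    |           |             | h: 0.02-14.5 m   | r                               | 0.57            | 0.54     | 0.63 | 0.74 |
    |           |             | D50: 0.011-32 mm | % of data with Dr within 0.5-2  | 56              | 46       | 59   | 56   |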

    seen that the ANN and MT models perform very well and are to some extent better than Bagnold's model, which was previously found to be the best among the existing models (on this dataset). Fig. 1 shows a scatter plot of the computed vs the measured bed-load transport rates (field measurements from the test dataset) along with the lines of 2 times and 0.5 times the measured transport rates. The scatter for the ANN model (not shown) was almost the same as for the MT model. Both were far better than the Van Rijn model and were comparable with Bagnold's model.
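    The three error measures used in the comparisons (RMSE, correlation coefficient r, and the percentage of points whose discrepancy ratio Dr = computed/measured lies within 0.5-2) can be computed as in the sketch below. The function names and the four-point example are hypothetical and are not data from the paper.

```python
import math

def rmse(computed, measured):
    """Root mean square error between computed and measured values."""
    n = len(measured)
    return math.sqrt(sum((c - m) ** 2 for c, m in zip(computed, measured)) / n)

def correlation(computed, measured):
    """Pearson correlation coefficient r."""
    n = len(measured)
    mc = sum(computed) / n
    mm = sum(measured) / n
    cov = sum((c - mc) * (m - mm) for c, m in zip(computed, measured))
    var_c = sum((c - mc) ** 2 for c in computed)
    var_m = sum((m - mm) ** 2 for m in measured)
    return cov / math.sqrt(var_c * var_m)

def percent_within_factor_two(computed, measured):
    """Percentage of points with 0.5 <= computed/measured <= 2."""
    hits = sum(1 for c, m in zip(computed, measured) if 0.5 <= c / m <= 2.0)
    return 100.0 * hits / len(measured)

measured = [1.0, 2.0, 4.0, 8.0]
computed = [1.5, 1.0, 4.0, 20.0]      # discrepancy ratios: 1.5, 0.5, 1.0, 2.5

err = rmse(computed, measured)
r = correlation(computed, measured)
pct = percent_within_factor_two(computed, measured)
```

    The factor-of-two band corresponds to the two lines drawn on the scatter plots: a point between them counts towards the percentage.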

    Total-load transport

    As above, the simplest model tree, with only two linear models, was found for total-load transport. One of them corresponds to very low values of $T$ (not shown) and the other equation, which describes the total transport process for most situations, is given by:

    $$\ln(\Phi_t) = -2.63 - 1.21\,\ln(D_*) + 1.61\,\ln(T) + 0.486\,\ln(h/D_{50}) \qquad (8)$$

    Fig. 2. Scatter plots of measured and computed total-load transport rates by (a) Engelund-Hansen [18]; (b) Van Rijn [6]; (c) Model tree (Equation (9)). In each panel the horizontal axis is the measured flux (m2/s) and the vertical axis the computed flux (m2/s), with tick marks at 1.E-05 and 1.E-03.

    where ln stands for the natural logarithm. Equation (8) can be rewritten as:

    $$\Phi_t = 0.072078\,\frac{T^{1.61}}{D_*^{1.21}}\left(\frac{h}{D_{50}}\right)^{0.486} \qquad (9)$$
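    For practical use, Eq. (9) can be combined with the definition of the dimensionless total transport rate to return a rate in m2/s. The sketch below assumes the denominator of Eq. (4) is the square root of (s - 1) g D50^3 and uses assumed defaults s = 2.65 and g = 9.81 m/s2; the function names and the example inputs are hypothetical.

```python
import math

def phi_t_model_tree(d_star, t, h_over_d50):
    """Power-law leaf model of Eq. (9) for the dimensionless total-load rate."""
    return 0.072078 * t ** 1.61 / d_star ** 1.21 * h_over_d50 ** 0.486

def total_load_rate(d_star, t, h, d50, s=2.65, g=9.81):
    """Total-load transport rate q_t in m2/s (h and d50 in m)."""
    phi_t = phi_t_model_tree(d_star, t, h / d50)
    return phi_t * math.sqrt((s - 1.0) * g * d50 ** 3)

# Hypothetical flow: D* = 12, T = 8, depth 1.5 m over 0.5 mm sand.
phi_t = phi_t_model_tree(12.0, 8.0, 1.5 / 0.0005)
q_t = total_load_rate(12.0, 8.0, 1.5, 0.0005)
log_check = math.exp(-2.63 - 1.21 * math.log(12.0) + 1.61 * math.log(8.0)
                     + 0.486 * math.log(1.5 / 0.0005))
```

    The expression applies only to the leaf that the tree uses for most situations; the second leaf, for very low values of T, is not given in the paper.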

    Comparison of results

    There is no universally acceptable transport model for total-load. We have chosen the Engelund-Hansen [18] and Van Rijn [6] models for a comparison because they are the models most widely used. Table 2 shows that the ANN and MT models performed much better than the models of Engelund-Hansen and Van Rijn on the test dataset. Fig. 2 shows a scatter plot of the computed vs the measured transport rates (field measurements from the test dataset) along with the lines of 2 times and 0.5 times the measured transport rates. It can be observed that for the MT model most of the testing data were within these two lines.

    CONCLUSIONS

    In this research ANN and MT models for predicting bed-load and total transport rates were developed using several published flume and field datasets. The main conclusions of the study are listed below:

    • The ANN and MT models for bed-load transport performed better than the models of Bagnold, Parker et al. and Van Rijn. The RMSE was smaller and the percentage of testing data with $D_r$ within 0.5-2 was larger. It is noteworthy that this dataset was used by Gomez and Church [2] for a comparison of several bed-load models, and based on that study it can be concluded that the DM models performed better than most well-known bed-load transport models.

    • The ANN and MT models for total-load transport performed better than the models of Engelund-Hansen and Van Rijn. The RMSE was low and the percentage of testing data with $D_r$ within 0.5-2 was much larger.

    • The performance of the MT models (both bed-load and total-load) was better than that of the ANN models. Moreover, the convenient mathematical expression of the MT model makes it easier to use in practical situations.

    • Even with extensive research on sediment transport during the past decades, the predictive accuracy of the available models has not improved much. It is thought that expressing the transport process through a deterministic mathematical framework is difficult and may not be possible. In this regard the data mining approach to modelling sediment transport seems to have high potential.

    REFERENCES

    [1] ASCE Task Committee, Sediment transportation mechanics: Sediment discharge formulas, J. of Hydraulic Division, ASCE, 97(HY4), (1971), pp. 523-567.

    [2] Gomez, B., and Church, M., An assessment of bed load sediment transport formulae for gravel bed rivers, Water Resources Research, 25(6), (1989), pp. 1161-1186.

    [3] Yalin, M.S., Mechanics of sediment transport, Pergamon Press, Oxford, (1977).

    [4] Van Rijn, L.C., Principles of sediment transport in rivers, estuaries and coastal areas, Aqua Publications, Amsterdam, The Netherlands, (1993).

    [5] Van Rijn, L.C., Sediment transport Part I: bed load transport, ASCE, J. of Hydraulic Div., 110(10), (1984), pp. 1431-1455.

    [6] Van Rijn, L.C., Sediment transport Part II: suspended load transport, ASCE, J. of Hydraulic Div., 110(11), (1984), pp. 1613-1641.

    [7] ASCE Task Committee on Application of artificial neural networks in hydrology, Artificial neural network in hydrology, ASCE, J. of Hydrologic Engg., 5(2), (2000), pp. 115-137.

    [8] Haykin, S., Neural networks: a comprehensive foundation, Prentice Hall, (1999).

    [9] Nagy, H.M., Watanabe, B., and Hirano, M., Prediction of sediment load concentration in rivers using artificial neural network model, ASCE, J. of Hydraulic Engg., 128(6), (2002), pp. 588-595.

    [10] Jain, S.K., Development of integrated sediment rating curves using ANNs, ASCE, J. of Hydraulic Engg., 127(1), (2001), pp. 30-37.

    [11] Namin, M.M., and Lin, B., A 2D vertical hydrodynamic and morphological model based on AANs, Proc. of the 5th Hydroinformatics conf, Cardiff, UK, (2002).

    [12] Quinlan, J.R., Learning with continuous classes, Proc. of Australian Joint Conf. on Artificial Intelligence, World Scientific, Singapore, (1992), pp. 343-348.

    [13] Solomatine, D.P., and Dulal, K.N., Model tree as an alternative to neural network in rainfall-runoff modelling, Hydrological Sciences J., 48(3), (2003), pp. 399-412.

    [14] Witten, I.H., and Frank, E., Data mining: practical machine learning tools and techniques with java implementations, Morgan Kaufmann Pub., (2000).

    [15] Brownlie, W.R., Compilation of alluvial channel data: laboratory and field, W.M. Keck Laboratory of Hydraulics and Water Resources, Div. of Engg. and Applied Science, California Institute of Technology, Pasadena, USA, (1981).

    [16] Bagnold, R.A., An empirical correlation of bedload transport rates in flumes and natural rivers, in: The physics of sediment transport by wind and water, ASCE, (1988), pp. 323-345.

    [17] Parker, G., Klingeman, P.C., and McLean, D.G., Bedload and size distribution in paved gravel-bed streams, ASCE, J. of Hydraulic Div., 108(HY4), (1982), pp. 544-571.

    [18] Engelund, F., and Hansen, E., A monograph on sediment transport in alluvial streams, Teknisk Forlag, Copenhagen, Denmark, (1967).