

Journal of Hydrology 519 (2014) 1895–1907

Contents lists available at ScienceDirect

Journal of Hydrology

journal homepage: www.elsevier.com/locate/jhydrol

Modelling of dissolved oxygen in the Danube River using artificial neural networks and Monte Carlo Simulation uncertainty analysis

Davor Antanasijević a,⇑, Viktor Pocajt b, Aleksandra Perić-Grujić b, Mirjana Ristić b

a University of Belgrade, Innovation Center of the Faculty of Technology and Metallurgy, Karnegijeva 4, 11120 Belgrade, Serbia
b University of Belgrade, Faculty of Technology and Metallurgy, Karnegijeva 4, 11120 Belgrade, Serbia

⇑ Corresponding author. Tel.: +381 11 3303 642; fax: +381 11 3370 387. E-mail address: [email protected] (D. Antanasijević).

http://dx.doi.org/10.1016/j.jhydrol.2014.10.009
0022-1694/© 2014 Elsevier B.V. All rights reserved.

Article info

Article history:
Received 14 June 2014
Received in revised form 27 August 2014
Accepted 1 October 2014
Available online 13 October 2014
This manuscript was handled by Andras Bardossy, Editor-in-Chief, with the assistance of Fi-John Chang, Associate Editor

Keywords:
DO
GRNN
MCS
VIF
Genetic algorithm
Correlation analysis

Summary

This paper describes the training, validation, testing and uncertainty analysis of general regression neural network (GRNN) models for the forecasting of dissolved oxygen (DO) in the Danube River. The main objectives of this work were to determine the optimum data normalization and input selection techniques, to determine the relative importance of uncertainty in different input variables, and to analyze the uncertainty of model results using the Monte Carlo Simulation (MCS) technique. Min–max, median, z-score, sigmoid and tanh were validated as normalization techniques, whilst the variance inflation factor, correlation analysis and genetic algorithm were tested as input selection techniques. As inputs, the GRNN models used 19 water quality variables, measured in the river water each month at 17 different sites over a period of 9 years. The best results were obtained using min–max normalized data and the input selection based on the correlation between DO and dependent variables, which provided the most accurate GRNN model in combination with the smallest number of inputs: Temperature, pH, HCO₃⁻, SO₄²⁻, NO₃-N, Hardness, Na, Cl⁻, Conductivity and Alkalinity. The results show that the correlation coefficient between measured and predicted DO values is 0.85. The inputs with the greatest effect on the GRNN model (arranged in descending order) were T, pH, HCO₃⁻, SO₄²⁻ and NO₃-N. Of all inputs, variability of temperature had the greatest influence on the variability of DO content in the river body, with the DO decreasing at a rate similar to the theoretical DO decreasing rate relating to temperature. The uncertainty analysis of the model results demonstrates that the GRNN can effectively forecast the DO content, since the distribution of model results is very similar to the corresponding distribution of real data.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Programs that monitor water quality help to understand various processes that have an impact on the overall quality of water and provide necessary information for the management of water resources in general. The quality of a water body is usually described by sets of physical, chemical and biological variables that are mutually interrelated (Khalil et al., 2010). River waters have been contaminated as a result of discharges of wastewater containing degradable organics, nutrients, domestic effluent, and agricultural waste (Dimitrovska et al., 2012). All of the aforementioned contaminants directly or indirectly negatively affect key river quality parameters such as dissolved oxygen (DO) content, temperature, pH, conductivity, transparency, viscosity and total dissolved solids. Among them, the DO is the most severely affected, since the diffusion of oxygen into the river body (re-aeration) is an inherently slow process. This in turn puts additional strain on the other very important contributor to DO, namely the generation of oxygen by photosynthetic aquatic plants (Araoye, 2009). Furthermore, the above-mentioned water contamination, among other parameters (e.g. the amount of light, species and abundance of plants), also influences the factors which control the rate of photosynthesis, which makes the quantification of DO content in rivers one of the primary concerns for water resource managers (Wen et al., 2013).

Water quality models, as a basis for water pollution control, are commonly used to predict trends in water quality based on current water conditions, including pollutant concentrations (Najah et al., 2011). The major issue in the application of water quality models, such as IWA River Quality Model No. 1 (Reichert et al., 2001), QUAL2K (Chapra and Pellettier, 2003) and WASP6 (Wool et al., 2006), is the requirement for more information regarding the river system than is often available (Mannina and Viviani, 2010). A constant need for less complex models for DO forecasting led to the application of artificial neural networks (ANN) in this field (Chang et al., 2013; Chen and Chang, 2009). The advantage of ANNs over deterministic models is that they require less data and they are well suited for forecasting (Kisi et al., 2012). In addition, the ANN approach does not require a complex and explicit description of the underlying process in a mathematical form (Nayak et al., 2005). The design of ANNs originated from a desire to emulate human learning, which led to the application of massively parallel, distributed processing and computing techniques inspired by biological neuron processing. ANNs have proved to be highly effective for modeling non-linear problems, with application to diverse large-scale problems (Banerjee et al., 2011).

Successful application of ANN models for the forecasting of DO is associated with several challenges, the key issues being proper data normalization and the selection of the model inputs that have the most significant impact on model performance. Employing a large number of inputs to an ANN model usually increases the network size, resulting in a decrease in processing speed and a reduction in the efficiency of the network (Arhami et al., 2013), and may ultimately result in a model that is not suitable as a practical forecasting tool. One of the important subjects in ANN modeling studies is the analysis of uncertainty and the influence of input data uncertainty on the model results. The term uncertainty refers to lack of knowledge or information on the models, parameters, constants, input data, and beliefs/concepts. Information on the total model uncertainty, for models which support decision-making, is essential and it is as important as the modeling results themselves (Borrego et al., 2008). The Monte Carlo Simulation (MCS) technique is a widely used method for the analysis of uncertainty in hydrological modeling and it allows the quantification of the model output uncertainty resulting from uncertain model parameters, input data or model structure (Shrestha et al., 2009).

In recent years, considerable progress has been made in the development of ANN models for the forecasting of DO. Some examples of the application of ANNs for the modeling of DO at a single location include models developed for the Melen River, Turkey (Samandar, 2010), Bow River, Canada (He et al., 2011), Foundation Creek in Colorado, USA (Ay and Kisi, 2012), the Danube River in Bezdan, North Serbia (Antanasijević et al., 2013a) and the Upper Klamath River in Oregon, USA (Heddam, 2014). In those papers, the authors tested a variety of ANN architectures (feed-forward, recurrent, radial basis and general regression neural network), applied for various periods of time, as well as using different data representations (for details please see Appendix Table A1). In contrast, the application of ANNs for the modeling of DO across multiple sites was limited only to the use of the multilayer perceptron (for examples and details see Appendix Table A2).

In this paper, we propose an integrated ANN model, based on the general regression neural network (GRNN) architecture, for the forecasting of DO across multiple sites; the model is in this instance applied to all monitoring stations located on the Danube River, covering its 588-km course through the territory of Serbia. Different methods for data normalization and input selection were tested in order to enhance the performance of the model and to reduce the number of inputs needed for DO forecasting. The performance of the created ANN models was analyzed using multiple statistical metrics. Finally, the impact of input data uncertainty on the model output was assessed and the uncertainty of the results was analyzed using the Monte Carlo Simulation (MCS) technique.

2. Materials and methods

2.1. Study area and water quality data

The Danube is the longest river on the Balkan Peninsula and the second longest river in Europe, after the Volga. It is an international waterway that connects Germany, as well as other Central European and Balkan countries, with the Black Sea. The Danube flows for 2857 km and passes through or touches the borders of ten countries: Germany, Austria, Slovakia, Hungary, Croatia, Serbia, Bulgaria, Romania, Ukraine, and Moldova (ICPDR, 2014). Around 10% of its basin is located in Serbia and on its 588-km course the quality of river water is monitored at 17 separate monitoring stations (Fig. 1).

The dataset used in this study has been generated through continuous monitoring of the water quality of the Danube River in the territory of the Republic of Serbia. The water quality was monitored regularly (monthly or semi-monthly) at 17 different sites over a period of 9 years (2002–2010) and the data were obtained from the Serbian Agency for Environmental Protection (SEPA, 2013). The availability of data, number of data patterns (input vectors) per year and number of data patterns per site for the studied period are presented in Table 1. There were between 131 and 252 available patterns per year, while the number of available patterns per site was between 53 and 128.

All water samples collected during the study period were analyzed for a large number of different water quality parameters, from which 19 were selected as inputs for the model (Table 2). In total, the dataset contained 1512 data patterns with 20 water quality parameters, which provided more than 30,000 individual data points. The basic statistics of the selected input/output parameters are presented in Table 2.

2.2. ANN architecture

An artificial neural network, which employs the model structure of a biological neural network, is a very powerful computational technique for modeling complex non-linear relationships, particularly in situations where the explicit form of the relationship between the variables involved is unknown (Singh et al., 2009). The basic and the most commonly used ANN architecture consists of an input layer, a series of hidden layers and an output layer. Each of these layers consists of a number of interconnected neurons (processing units). In this study, the ANN architecture known as general regression neural network (GRNN) was used, since it has proved to be an effective alternative to the basic feed-forward ANNs (Heddam, 2014). The GRNN is based on non-linear regression theory and is a universal approximator for smooth functions. It consists of four layers, which are presented in Fig. 2.

GRNN is a one-pass supervised learning network, which means that the weights (Wij and WS1) between neurons in different layers are determined by the values of variables, where Wij are weights defined by the ith input variable and the jth training pattern, while WS1 is equal to the values of the output variable. In this architecture, the number of neurons in the input and output layer corresponds to the number of input and output variables, while the number of pattern neurons is equal to the number of data patterns. The number of neurons in the summation layer can be expressed as No + 1, where No is the number of output neurons. In this case, the pattern neurons are connected to two neurons in the summation layer, since the model has only one output.

Being a supervised network, the GRNN basically measures the distance (Dj) of the training patterns in N-dimensional space, where N is the number of inputs, and estimates the output accordingly (Hanna et al., 2007). The calculated Dj, e.g. Euclidean distance (1), is then processed using an exponential activation function (2).

$$D_j = \sqrt{\sum_{i=1}^{n} (w_{ij} - x_i)^2} \qquad (1)$$

$$f(D_j) = \exp\left(-\frac{D_j}{2\sigma^2}\right) \qquad (2)$$


Fig. 1. Danube course through Serbia with the locations of monitoring stations.

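Table 1. Data availability for the studied period (the number of data patterns per year and site).

| Monitoring station | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | Patterns per site |
|---|---|---|---|---|---|---|---|---|---|---|
| Bezdan | 11 | 12 | 10 | 21 | 22 | 19 | 12 | 11 | 9 | 127 |
| Apatin | 12 | 10 | 10 | – | 11 | 10 | – | 8 | 11 | 72 |
| Bogojevo | 12 | 10 | 9 | 8 | 20 | 18 | 11 | 11 | 8 | 107 |
| Backa palanka | – | – | – | 8 | 9 | 8 | 10 | 11 | 7 | 53 |
| Novi Sad | 11 | 12 | 7 | 21 | 23 | 19 | 11 | 12 | 12 | 128 |
| Slankamen | 10 | 10 | 10 | – | 11 | 10 | 8 | 6 | 6 | 71 |
| Centa | 10 | 10 | 10 | – | 11 | 10 | 7 | 6 | – | 64 |
| Zemun | 11 | 12 | 9 | 7 | 15 | 19 | 13 | 7 | 10 | 103 |
| Pancevo | 10 | 10 | 9 | 9 | 10 | 19 | 10 | 11 | 11 | 99 |
| Vinca | – | – | – | 14 | 14 | 17 | 14 | 11 | 10 | 80 |
| Smederevo | 8 | 10 | 9 | 11 | 16 | 17 | 15 | 11 | 11 | 108 |
| Banatska Palanka | 11 | 11 | 11 | 12 | 18 | 19 | 10 | 10 | 10 | 112 |
| Veliko Gradište | 7 | 9 | 10 | 8 | 10 | 12 | 6 | 3 | 1 | 66 |
| Dobra | 7 | 11 | 12 | 8 | 9 | 10 | 5 | 4 | 2 | 68 |
| Tekija | 7 | 11 | 11 | 8 | 13 | 19 | 9 | 10 | 8 | 96 |
| Brza Palanka | 7 | 9 | 11 | 8 | 9 | 10 | 6 | 3 | 4 | 67 |
| Radujevac | 7 | 9 | 8 | 6 | 14 | 16 | 9 | 11 | 11 | 91 |
| Patterns per year | 141 | 156 | 146 | 149 | 235 | 252 | 156 | 146 | 131 | 1512 |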


One of the summation neurons (S1) computes the sum of the weighted outputs of the pattern layer (3), while the second neuron (S2), also called the division neuron, calculates the sum of the un-weighted outputs of the pattern neurons (4).

$$S_1 = \sum_{j=1}^{k} y_j \cdot f(D_j) \qquad (3)$$

$$S_2 = \sum_{j=1}^{k} f(D_j) \qquad (4)$$
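The chain from Eqs. (1)–(4) to the final estimate, in which the output layer divides S1 by S2, can be illustrated with a minimal sketch. It assumes a single smoothing factor σ shared by all inputs and uses the activation exactly as Eq. (2) reads in this text; the function name `grnn_predict` and the two stored patterns are hypothetical and are not taken from the study.

```python
import math


def grnn_predict(x, train_x, train_y, sigma):
    """Estimate the output for one input vector x from stored training patterns."""
    s1 = 0.0  # weighted sum of pattern-neuron outputs, Eq. (3)
    s2 = 0.0  # un-weighted sum of pattern-neuron outputs, Eq. (4)
    for w_j, y_j in zip(train_x, train_y):
        # Eq. (1): Euclidean distance between the stored pattern and x
        d_j = math.sqrt(sum((w - xi) ** 2 for w, xi in zip(w_j, x)))
        # Eq. (2): exponential activation controlled by the smoothing factor
        f_j = math.exp(-d_j / (2.0 * sigma ** 2))
        s1 += y_j * f_j
        s2 += f_j
    # Output layer: weighted average of the stored outputs
    return s1 / s2


# Hypothetical, already normalized training patterns and DO values (mg/l)
train_x = [[0.0, 0.0], [1.0, 1.0]]
train_y = [8.0, 12.0]
midpoint_estimate = grnn_predict([0.5, 0.5], train_x, train_y, sigma=0.5)
```

Because the query point is equidistant from both stored patterns, both activations are equal and the estimate reduces to the plain average of the two stored outputs. With a very small σ and a query far from every pattern, all activations can underflow to zero, so a practical implementation would need to guard the final division.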

Finally, the output layer, which performs the estimated weighted average, divides the output of the S1 neuron by the output of the S2 neuron to yield the desired estimate (Heddam, 2014). More details on the GRNN theory and training can be found elsewhere (Specht, 1991). The parameter that defines the GRNN's predicting performance is called the smoothing factor (σ), and it represents the width of the calculated Gaussian curve for each probability density function. The smoothing factor is the only "unknown" parameter in the GRNN algorithm and needs to be determined within the network training process. An optimal value of σ can be selected either manually or determined using an iterative or genetic algorithm. In general, σ is greater than zero, although its optimal value is often very close to zero. Nevertheless, if the determined or manually set σ differs from the optimal value, the resulting model will over- or under-fit the data (a σ that is too small leads to over-training), which results in reduced generalization performance. In the present study, to prevent network overtraining a genetic algorithm (GA) was used, since it provides an optimal overall smoothing factor, as well as the individual smoothing factors (ISFs) for each input. ISFs can be used for the analysis of the significance of inputs, as has been done by Antanasijević et al. (2013b). More details on the GA and its application for GRNN training can be found elsewhere (Chen and Chang, 2009; Kalogirou, 2003; Kim and Kim, 2008).

Table 2. Descriptive statistics of the input and output data.

| Variable | Unit | Minimum (Imin) | Maximum (Imax) | Mean (Imean) | St. dev. (Isd) | Median (Imedian) |
|---|---|---|---|---|---|---|
| Input | | | | | | |
| Temperature (T) | °C | −2.0 | 28.8 | 13.9 | 7.7 | 13.7 |
| pH | – | 6.9 | 8.8 | 8.0 | 0.3 | 8.0 |
| Conductivity | µS/cm | 241.0 | 684.0 | 405.0 | 59.0 | 397.0 |
| CO₂ | mg/l | 0.0 | 21.1 | 2.5 | 2.4 | 2.0 |
| HCO₃⁻ | mg/l | 115.3 | 466.7 | 197.5 | 28.3 | 197.0 |
| Alkalinity | mg/l | 94.5 | 302.0 | 163.1 | 21.1 | 162.1 |
| Chemical oxygen demand (COD) | mg/l | 1.1 | 10.5 | 4.3 | 1.2 | 4.1 |
| Total suspended solids (TSS) | mg/l | 0.0 | 275.0 | 23.1 | 23.8 | 17.0 |
| NO₃-N | mg/l | 0.0 | 12.9 | 1.5 | 0.8 | 1.4 |
| Cl⁻ | mg/l | 5.9 | 54.4 | 20.3 | 5.6 | 19.3 |
| SO₄²⁻ | mg/l | 4.0 | 111.0 | 36.1 | 10.6 | 35.0 |
| PO₄³⁻ | mg/l | 0.0 | 1.5 | 0.1 | 0.1 | 0.1 |
| P | mg/l | 0.0 | 1.5 | 0.1 | 0.1 | 0.1 |
| Ca | mg/l | 2.7 | 580.0 | 56.0 | 16.0 | 56.0 |
| Mg | mg/l | 2.9 | 44.0 | 14.5 | 4.0 | 14.0 |
| Hardness | mg/l | 18.8 | 334.0 | 198.0 | 30.3 | 198.0 |
| Na | mg/l | 1.1 | 55.5 | 15.2 | 4.6 | 14.5 |
| K | mg/l | 0.2 | 22.0 | 2.3 | 1.0 | 2.1 |
| Biological oxygen demand (BOD) | mg/l | 0.5 | 7.9 | 2.5 | 1.1 | 2.4 |
| Output | | | | | | |
| Dissolved oxygen (DO) | mg/l | 3.7 | 16.9 | 10.0 | 2.1 | 10.0 |

2.3. Data normalization and input selection techniques

Fig. 2. A schematic representation of GRNN architecture, where n is the number of inputs, k is the number of data patterns, xi is the input data pattern, yi is the measured value of output variable, Ii is the number of input neurons, Pj is the number of hidden neurons, Wij, Ws1, Ws2 are the network weights, Dj is the distance measure, f(Dj) is the activation function, S1 and S2 are the signals from summation neurons, y(X) is the network output.

Data normalization is a common step in the development process of both water quality and ANN models. Neural network training can be made more efficient by performing normalization, which transforms inputs into a more usable form for the network to utilize (Jayalakshmi and Santhakumaran, 2011). In this study, three commonly used normalization techniques, min–max, median and z-score, were tested (5)–(7), along with two techniques in which z-score values were additionally transformed using the sigmoid (8) and the hyperbolic-tangent (tanh) (9) function:

• Min–Max: $$I' = \frac{I}{I_{max} - I_{min}} \qquad (5)$$

• Median: $$I' = \frac{I}{I_{median}} \qquad (6)$$

• Z-Score: $$I' = \frac{I - I_{mean}}{I_{sd}} \qquad (7)$$

• Sigmoid: $$I' = \frac{1}{1 + e^{-\frac{I - I_{mean}}{I_{sd}}}} \qquad (8)$$

• Tanh: $$I' = \tanh\left(\frac{I - I_{mean}}{I_{sd}}\right) \qquad (9)$$

where I is the original value, I′ is the normalized value, while other symbols are defined in Table 2.

There are two types of statistical approaches for the selection of model inputs (Maier et al., 2010):

• Model free techniques that use dependence measures, such as correlation analysis, to obtain a set of mutually uncorrelated inputs, as well as a set of inputs which are significantly correlated with the model output.
• Model based techniques that use statistical measures, e.g. individual smoothing factors (ISFs) determined through the ANN training in conjunction with a genetic algorithm, to obtain the best combination scenarios of input variables.

In this study, the first two approaches applied for the selection of input variables, that is the variance inflation factor (VIF) statistic and correlation analysis, are model free techniques, whilst the third, ISF based technique, is model based.

VIF is commonly employed to screen for multicollinearity. Each explanatory variable is regressed against the other remaining explanatory variables, and the VIF is calculated according to Eq. (10), where R² is the regression model coefficient of determination (Kroll and Song, 2013). Inputs with the highest VIF values, i.e. the most highly correlated inputs, were sequentially removed from the dataset, which was then followed by the creation of a new model without the removed inputs.

$$VIF = \frac{1}{1 - R^2} \qquad (10)$$

Fig. 3. Performance metric values for the GRNN models created using different data normalization techniques.

Table 3. VIF analysis and input combinations used for the models.

| Input | VIF | Input combination |
|---|---|---|
| Alkalinity | 10.13 | Removed from dataset in VIF18 model |
| HCO₃⁻ | 8.91 | Additionally removed from dataset in VIF17 model |
| Hardness | 4.14 | Additionally removed from dataset in VIF16 model |
| Conductivity | 3.17 | Additionally removed from dataset in VIF15 model |
| Na | 2.88 | Additionally removed from dataset in VIF14 model |
| PO₄³⁻ | 2.55 | Additionally removed from dataset in VIF13 model |
| P | 2.43 | Additionally removed from dataset in VIF12 model |
| Cl⁻ | 2.22 | Additionally removed from dataset in VIF11 model |
| T | 2.13 | Additionally removed from dataset in VIF10 model |
| Mg | 2.11 | |
| COD | 1.73 | |
| SO₄²⁻ | 1.70 | |
| NO₃-N | 1.54 | |
| CO₂ | 1.48 | |
| Ca | 1.34 | |
| BOD | 1.34 | |
| TSS | 1.33 | |
| pH | 1.24 | |
| K | 1.17 | |
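The VIF screening of Eq. (10) can be sketched as follows: each input is regressed on all the others by ordinary least squares, and the resulting R² is converted to a VIF. This is an illustrative sketch rather than the authors' procedure; the helper names and the three synthetic columns `a`, `b` and `c` are hypothetical, with `b` constructed as a near copy of `a` and `c` orthogonal to both.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(target, predictors):
    """R^2 of a least-squares fit of target on the predictor columns (with intercept)."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_t = sum(target) / len(target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Eq. (10) for every column: regress it on all remaining columns."""
    result = {}
    for name, values in columns.items():
        others = [v for n, v in columns.items() if n != name]
        result[name] = 1.0 / (1.0 - r_squared(values, others))
    return result


# Synthetic columns (not study data): b is almost a copy of a, c is unrelated
columns = {
    "a": [-3.0, -1.0, 1.0, 3.0, -3.0, -1.0, 1.0, 3.0],
    "b": [-2.9, -0.9, 1.1, 3.1, -3.1, -1.1, 0.9, 2.9],
    "c": [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0],
}
vif_values = vif(columns)
first_to_remove = max(vif_values, key=vif_values.get)
```

In a sequential scheme like the one described above, the input with the largest VIF would be dropped, the VIFs recomputed or the model retrained on the reduced dataset, and the step repeated.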

The correlation analyses (CA) were in this case focused only on the determination of the importance of each of the inputs for the DO content in the river body. Again, the inputs were sequentially removed from the dataset, starting with the input that had the lowest correlation with the DO content, and a new model was created with the reduced dataset. Finally, the third approach implied a sequential elimination of inputs with the smallest ISF values, which were calculated using a genetic algorithm (GA) in the process of GRNN training, as it is presented in Section 3.2.
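The CA-based elimination order can be sketched by ranking inputs on the strength of their Pearson correlation with DO. Ranking by absolute value is an assumption here, although it agrees with the removal order annotated in Table 4, where CO₂ (−0.19) is removed after K (0.10) and before Mg (0.20). The function names and the five-sample series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def removal_order(inputs, do_values):
    """Input names sorted from the weakest to the strongest |r| with DO."""
    strength = {name: abs(pearson(vals, do_values)) for name, vals in inputs.items()}
    return sorted(strength, key=strength.get)


# Hypothetical samples (not study data)
do_values = [12.0, 11.0, 10.0, 9.0, 8.0]
inputs = {
    "T": [2.0, 8.0, 14.0, 20.0, 26.0],
    "pH": [8.2, 8.0, 8.1, 7.9, 7.8],
    "TSS": [20.0, 25.0, 18.0, 25.0, 20.0],
}
order = removal_order(inputs, do_values)
```

The first name in `order` is the first candidate for removal; after each removal a new model would be trained on the reduced dataset and its performance compared with the previous one.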

2.4. Uncertainty analysis

Input data uncertainty is related to the accuracy and representativeness of the input data used for predictions (Arhami et al., 2013). The uncertainty in model predictions due to uncertainties in input parameters is a function of the magnitudes and shapes of the probability density functions (PDFs) of the uncertainties in individual inputs (Hanna et al., 1998). Monte Carlo Simulation (MCS) involves the repeated generation of random parameters from their probability distributions, and then computing the statistics of the output. Since many PDFs are suitable for Monte Carlo analysis (e.g., normal, log-normal, uniform, Poisson, Weibull, etc.), the Kolmogorov–Smirnov test is often used to test the null hypothesis that two independent samples are not different according to their distribution characteristics, with a 5% significance level (Dehghani et al., 2014). Maximum and minimum limits on each variable were also adopted to prevent unrealistic selection of extreme values. The PDFs assumption and MCS re-sampling were both performed using Statistica 10 trial (StatSoft. Inc., 2010).
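The re-sampling loop described above can be sketched in a few lines. This is a sketch under stated assumptions, not the Statistica workflow used in the study: a normal PDF is assumed for temperature, with the mean, standard deviation and limits taken from Table 2, and `toy_model` is a hypothetical stand-in for the trained GRNN whose coefficients are invented for illustration only.

```python
import random
import statistics


def sample_truncated_normal(rng, mean, sd, low, high):
    """Draw from a normal PDF, redrawing until the value lies within [low, high]."""
    while True:
        value = rng.gauss(mean, sd)
        if low <= value <= high:
            return value


def monte_carlo(model, specs, n_runs, seed=0):
    """Propagate randomly sampled inputs through the model and collect the outputs."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_runs):
        inputs = {name: sample_truncated_normal(rng, *spec) for name, spec in specs.items()}
        outputs.append(model(inputs))
    return outputs


def toy_model(inputs):
    # Hypothetical stand-in for the trained GRNN: DO falls as temperature rises
    return 14.0 - 0.3 * inputs["T"]


# (mean, st. dev., minimum, maximum) for temperature, as listed in Table 2
specs = {"T": (13.9, 7.7, -2.0, 28.8)}
outputs = monte_carlo(toy_model, specs, n_runs=1000)
summary = (statistics.mean(outputs), statistics.stdev(outputs))
```

The minimum and maximum limits play the role described in the text: draws outside the observed range are rejected, so the simulated outputs never come from unrealistic extreme inputs. Comparing the distribution of `outputs` with the measured DO distribution would be the step where a two-sample test such as Kolmogorov–Smirnov applies.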

2.5. Model performance metrics

All created GRNN models were analyzed using multiple performance metrics (see Appendix B).

The aforementioned performance metrics have been considered because they measure the model's predictive capability, accuracy and precision. The index of agreement (IA) is a standardized measure of the degree of model prediction error and it is limited to the range of 0–1. IA represents the ratio between the mean square error and the potential error (Willmott, 1984). FA1.1 gives the percentage of predictions for which the values of the ratio between observed and predicted concentrations are in the range from 0.9 to 1.1, and therefore it shows whether the model has the same accuracy for different output values (e.g. low, mean, high). The mean absolute error (MAE) and the root mean squared error (RMSE) measure residual errors and they are valuable because they indicate the error in the output units.
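The four metrics can be sketched as below. Appendix B is not part of this excerpt, so the formulas are assumptions based on common definitions: IA follows Willmott's form, 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)², and FA1.1 uses the observed-to-predicted ratio described above. The function names and sample values are hypothetical.

```python
import math


def mae(obs, pred):
    """Mean absolute error, in the units of the output."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error, in the units of the output."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed form), bounded by 0 and 1."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((p - o) ** 2 for o, p in zip(obs, pred))
    potential_error = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                          for o, p in zip(obs, pred))
    return 1.0 - squared_error / potential_error


def fa11(obs, pred):
    """Percentage of predictions with observed/predicted ratio within 0.9 to 1.1."""
    hits = sum(1 for o, p in zip(obs, pred) if 0.9 <= o / p <= 1.1)
    return 100.0 * hits / len(obs)


# Hypothetical observed and predicted DO values (mg/l)
observed = [10.0, 8.0, 12.0, 9.0]
predicted = [10.5, 8.0, 11.0, 12.0]
metrics = {
    "MAE": mae(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "IA": index_of_agreement(observed, predicted),
    "FA1.1": fa11(observed, predicted),
}
```

In this example three of the four predictions fall within ±10% of the observation, so FA1.1 is 75%, while the single large miss dominates RMSE more than MAE.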

3. Results and discussion

3.1. Evaluation of normalization techniques

Before normalization, the main dataset is divided into three sub-datasets: training (used for finding the appropriate weight for each input), validation (used for the determination of the smoothing factor and ISF during the training process) and the test dataset (used for the evaluation of actual model performance). The validation and test datasets were extracted randomly, choosing 303 (20%) and 158 (10%) data patterns from the available 1512 data patterns, respectively.
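A random split with the pattern counts quoted above can be sketched as follows. The function name and the fixed seed are hypothetical, and the study does not state how its random extraction was implemented.

```python
import random


def split_indices(n_total, n_validation, n_test, seed=0):
    """Shuffle pattern indices and cut them into training, validation and test parts."""
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    validation_idx = indices[:n_validation]
    test_idx = indices[n_validation:n_validation + n_test]
    training_idx = indices[n_validation + n_test:]
    return training_idx, validation_idx, test_idx


# 1512 patterns in total, 303 for validation and 158 for testing
training_idx, validation_idx, test_idx = split_indices(1512, 303, 158)
```

With these counts, the remaining 1051 patterns form the training set.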

Fig. 4. Performance indicators for GRNN models with: (a) VIF selected inputs; (b) CA selected inputs; (c) GA selected inputs.

Normalization techniques were tested using the GRNN model with all available inputs and the values of the performance metrics are presented in Fig. 3. The model created using min–max normalized input values demonstrated the best performance, while the model with z-score values demonstrated the worst performance. As can be seen in Fig. 3, the application of sigmoid and tanh functions on the z-score normalized values led to the creation of GRNN models with improved performance compared to the original z-score model. Min–max normalized values can be adopted as the optimal form of data for the GRNN model for the purpose of DO forecasting; this model will be further referred to as GRNN19, and it will be used for benchmarking the performance of models created using different input selection techniques.
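The five normalization schemes compared here, Eqs. (5)–(9), can be sketched as plain functions. The min–max form follows Eq. (5) as it reads in this text; the more common variant also subtracts Imin from the value first, which only shifts the result and leaves distances between patterns unchanged. Function names are hypothetical, and the example uses the temperature statistics from Table 2.

```python
import math


def min_max(value, i_min, i_max):
    # Eq. (5) as read from the text; the common variant uses (value - i_min)
    return value / (i_max - i_min)


def median_norm(value, i_median):
    # Eq. (6)
    return value / i_median


def z_score(value, i_mean, i_sd):
    # Eq. (7)
    return (value - i_mean) / i_sd


def sigmoid_norm(value, i_mean, i_sd):
    # Eq. (8): sigmoid applied to the z-score
    return 1.0 / (1.0 + math.exp(-z_score(value, i_mean, i_sd)))


def tanh_norm(value, i_mean, i_sd):
    # Eq. (9): hyperbolic tangent applied to the z-score
    return math.tanh(z_score(value, i_mean, i_sd))


# Temperature statistics from Table 2 (degrees Celsius)
t_min, t_max, t_mean, t_sd, t_median = -2.0, 28.8, 13.9, 7.7, 13.7
temperature = 13.9
normalized = {
    "min_max": min_max(temperature, t_min, t_max),
    "median": median_norm(temperature, t_median),
    "z_score": z_score(temperature, t_mean, t_sd),
    "sigmoid": sigmoid_norm(temperature, t_mean, t_sd),
    "tanh": tanh_norm(temperature, t_mean, t_sd),
}
```

A value equal to the mean maps to a z-score of 0, which the sigmoid turns into 0.5 and tanh leaves at 0. Both transforms compress extreme z-scores into a bounded range.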

3.2. Selection of inputs

The first input selection strategy comprised the elimination of highly correlated inputs using VIF analysis. The VIF values for the primary selected inputs (Table 2) are given in Table 3. Since a general suggestion is that an input should be removed if its VIF > 10 (Kroll and Song, 2013), in this case only one input, alkalinity, needed to be removed. Since the goal was a reduction of more than one input, an additional eight GRNN models (marked as VIF) were created by sequentially removing inputs with the highest VIF values. The values of performance metrics are presented in Fig. 4 and they will be discussed along with the results of other input selection techniques, at the end of this section.


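Table 4. The results of correlation analysis.

| | T | pH | Cond. | CO₂ | HCO₃⁻ | Alk. | COD | TSS | NO₃-N | Cl⁻ | SO₄²⁻ | PO₄³⁻ | P | Ca | Mg | Hard. | Na | K | BOD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pH | 0.03 | 1.00 | | | | | | | | | | | | | | | | | |
| Cond.ᵃ | −0.59 | 0.03 | 1.00 | | | | | | | | | | | | | | | | |
| CO₂ | 0.02 | −0.59 | −0.05 | 1.00 | | | | | | | | | | | | | | | |
| HCO₃⁻ | −0.49 | −0.11 | 0.63 | 0.15 | 1.00 | | | | | | | | | | | | | | |
| Alk.ᵇ | −0.50 | −0.03 | 0.67 | 0.11 | 0.94 | 1.00 | | | | | | | | | | | | | |
| COD | 0.07 | 0.30 | −0.08 | −0.32 | −0.18 | −0.18 | 1.00 | | | | | | | | | | | | |
| TSS | 0.00 | 0.12 | −0.10 | −0.18 | −0.17 | −0.19 | 0.40 | 1.00 | | | | | | | | | | | |
| NO₃-N | −0.54 | 0.06 | 0.41 | −0.16 | 0.23 | 0.25 | 0.08 | 0.05 | 1.00 | | | | | | | | | | |
| Cl⁻ | −0.38 | 0.07 | 0.55 | −0.11 | 0.25 | 0.27 | 0.07 | 0.08 | 0.32 | 1.00 | | | | | | | | | |
| SO₄²⁻ | −0.30 | 0.08 | 0.36 | −0.25 | 0.19 | 0.20 | 0.18 | 0.13 | 0.23 | 0.45 | 1.00 | | | | | | | | |
| PO₄³⁻ | −0.09 | −0.28 | 0.16 | 0.22 | 0.17 | 0.15 | −0.19 | −0.07 | 0.06 | 0.06 | 0.07 | 1.00 | | | | | | | |
| P | 0.02 | −0.07 | 0.06 | 0.01 | 0.00 | −0.01 | 0.09 | 0.17 | 0.03 | 0.10 | 0.15 | 0.70 | 1.00 | | | | | | |
| Ca | −0.32 | −0.06 | 0.34 | 0.11 | 0.38 | 0.40 | −0.09 | −0.03 | 0.16 | 0.16 | 0.18 | 0.10 | 0.03 | 1.00 | | | | | |
| Mg | −0.24 | 0.03 | 0.46 | 0.09 | 0.46 | 0.48 | −0.15 | −0.15 | 0.15 | 0.23 | 0.15 | 0.10 | 0.00 | 0.14 | 1.00 | | | | |
| Hard.ᶜ | −0.53 | −0.03 | 0.65 | 0.12 | 0.70 | 0.73 | −0.20 | −0.17 | 0.31 | 0.34 | 0.33 | 0.19 | 0.04 | 0.43 | 0.68 | 1.00 | | | |
| Na | −0.40 | 0.12 | 0.59 | −0.24 | 0.31 | 0.32 | 0.18 | 0.13 | 0.35 | 0.71 | 0.57 | 0.02 | 0.10 | 0.18 | 0.24 | 0.36 | 1.00 | | |
| K | −0.17 | −0.01 | 0.31 | −0.05 | 0.17 | 0.18 | 0.04 | 0.01 | 0.18 | 0.29 | 0.23 | 0.06 | 0.05 | 0.09 | 0.19 | 0.23 | 0.33 | 1.00 | |
| BOD | 0.14 | 0.30 | −0.08 | −0.16 | −0.15 | −0.11 | 0.44 | 0.09 | −0.04 | 0.00 | 0.03 | −0.16 | −0.02 | −0.08 | −0.05 | −0.12 | 0.03 | −0.02 | 1.00 |
| DO | −0.73 | 0.32ᵐ | 0.43 | −0.19ᵍ | 0.32ᵐ | 0.39 | 0.09ᵉ | 0.04ᵈ | 0.43 | 0.31ˡ | 0.27ᵏ | −0.04ᵈ | −0.04ᵈ | 0.23ʲ | 0.20ʰ | 0.40 | 0.31ˡ | 0.10ᶠ | 0.22ⁱ |

ᵃ Conductivity.
ᵇ Alkalinity.
ᶜ Hardness.
ᵈ TSS, PO₄³⁻ and P are removed from CA16 model.
ᵉ COD is additionally removed from CA15 model.
ᶠ K is additionally removed from CA14 model.
ᵍ CO₂ is additionally removed from CA13 model.
ʰ Mg is additionally removed from CA12 model.
ⁱ BOD is additionally removed from CA11 model.
ʲ Ca is additionally removed from CA10 model (the best model).
ᵏ SO₄²⁻ is additionally removed from CA9 model.
ˡ Cl⁻ and Na are removed from CA7 model.
ᵐ HCO₃⁻ and pH are removed from CA5 model.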

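Table 5. ISF values for the created GRNN models.

| Input | GRNN19 | GA18 | GA17 | GA16 | GA15 | GA14 | GA13 | GA12 | GA11 | GA10 | GA9 | GA8 | GA7 | GA6 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| T | 2.6 | 2.1 | 2.8 | 1.9 | 2.5 | 2.7 | 2.5 | 2.0 | 2.8 | 2.8 | 2.9 | 2.1 | 2.5 | 2.4 |
| pH | 2.4 | 2.3 | 2.0 | 1.2 | 2.0 | 2.2 | 2.6 | 1.6 | 1.8 | 2.5 | 1.8 | 1.9 | 1.5 | 2.2 |
| Alkalinity | 1.2 | 1.7 | 2.9 | 2.1 | 1.5 | 2.0 | 1.6 | 2.2 | 2.9 | 2.8 | 2.2 | 2.5 | 2.2 | 2.4 |
| NO₃-N | 2.8 | 2.9 | 2.3 | 2.1 | 1.3 | 2.9 | 3.0 | 2.2 | 2.8 | 2.8 | 2.9 | 2.8 | 2.5 | 2.9 |
| SO₄²⁻ | 2.5 | 2.7 | 2.2 | 1.1 | 2.2 | 1.9 | 2.8 | 1.3 | 2.4 | 2.5 | 2.3 | 2.3 | 2.2 | 2.6 |
| Na | 3.0 | 2.2 | 2.7 | 1.3 | 1.8 | 2.5 | 1.9 | 1.2 | 1.0 | 2.8 | 1.0 | 1.7 | 1.1 | 0.7 |
| Mg | 2.2 | 1.2 | 3.0 | 2.1 | 2.1 | 2.7 | 1.3 | 1.2 | 1.1 | 2.6 | 1.0 | 1.4 | 1.1 |
| Hardness | 2.8 | 2.4 | 3.0 | 1.4 | 2.8 | 1.6 | 2.0 | 0.7 | 1.4 | 2.8 | 2.5 | 1.3 |
| Cl⁻ | 1.3 | 0.7 | 1.0 | 0.8 | 0.7 | 1.4 | 1.0 | 0.9 | 0.8 | 1.7 | 0.6 |
| CO₂ | 1.5 | 2.5 | 1.8 | 1.6 | 1.4 | 1.5 | 1.2 | 0.7 | 0.5 | 0.3 |
| K | 0.8 | 1.7 | 2.2 | 2.3 | 1.1 | 2.8 | 2.6 | 1.2 | 0.1 |
| Conductivity | 1.7 | 2.7 | 1.4 | 1.9 | 0.4 | 1.8 | 2.2 | 0.4 |
| Ca | 2.6 | 1.6 | 1.7 | 2.1 | 1.8 | 1.1 | 0.8 |
| BOD | 1.6 | 1.0 | 0.9 | 0.7 | 0.6 | 0.8 |
| COD | 0.7 | 1.1 | 1.0 | 1.5 | 0.2 |
| PO₄³⁻ | 2.4 | 1.0 | 0.7 | 0.4 |
| TSS | 1.5 | 1.8 | 0.0 |
| P | 2.6 | 0.5 |
| HCO₃⁻ | 0.4^a |

a Bold numbers indicate removed inputs.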


The second input selection strategy was to determine and eliminate inputs that are weakly correlated with the output. For this purpose, correlation analysis was applied, with the obtained correlation coefficients presented in Table 4. The results of the created models (marked as CA) are presented in Fig. 4, and they will also be discussed at the end of this section.
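The ranking behind the CA models can be sketched as follows. The coefficients are copied from the DO row of Table 4; the dictionary keys and the function name are illustrative only, and the sketch covers just the ranking step, not the retraining and comparison of the GRNN models that the study performed for each reduced input set.

```python
# Correlation of each candidate input with DO (DO row of Table 4).
do_correlation = {
    "T": -0.73, "pH": 0.32, "Conductivity": 0.43, "CO2": -0.19,
    "HCO3-": 0.32, "Alkalinity": 0.39, "COD": 0.09, "TSS": 0.04,
    "NO3-N": 0.43, "Cl-": 0.31, "SO4 2-": 0.27, "PO4 3-": -0.04,
    "P": -0.04, "Ca": 0.23, "Mg": 0.20, "Hardness": 0.40,
    "Na": 0.31, "K": 0.10, "BOD": 0.22,
}

def select_by_correlation(corr, n_inputs):
    """Keep the n_inputs variables with the largest |r| against the output."""
    ranked = sorted(corr, key=lambda name: abs(corr[name]), reverse=True)
    return ranked[:n_inputs]

ca10_inputs = select_by_correlation(do_correlation, 10)
print(sorted(ca10_inputs))
```

With these coefficients the ten retained variables coincide with the CA10 input set listed in the footnotes of Table 4; the weakest retained input (SO₄²⁻, |r| = 0.27) is clearly separated from the strongest discarded one (Ca, |r| = 0.23).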

The last input selection strategy was based on the ISF values determined during the GRNN training, since they indicate the relative importance of each particular input to the model. The ISF values obtained for GRNN19 and for the models created by the elimination of the input with the smallest ISF value (marked as GA) are presented in Table 5.
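The elimination rule can be illustrated with the first two columns of Table 5. In the sketch below the ISF values are copied from the table, while the helper name is hypothetical; the retraining that produces a new set of ISF values after each removal is not reproduced, which is why the second step reads its values from the GA18 column instead of recomputing them.

```python
# ISF values from Table 5: full model (GRNN19) and the model retrained
# after the first removal (GA18).
isf_grnn19 = {
    "T": 2.6, "pH": 2.4, "Alkalinity": 1.2, "NO3-N": 2.8, "SO4 2-": 2.5,
    "Na": 3.0, "Mg": 2.2, "Hardness": 2.8, "Cl-": 1.3, "CO2": 1.5,
    "K": 0.8, "Conductivity": 1.7, "Ca": 2.6, "BOD": 1.6, "COD": 0.7,
    "PO4 3-": 2.4, "TSS": 1.5, "P": 2.6, "HCO3-": 0.4,
}
isf_ga18 = {
    "T": 2.1, "pH": 2.3, "Alkalinity": 1.7, "NO3-N": 2.9, "SO4 2-": 2.7,
    "Na": 2.2, "Mg": 1.2, "Hardness": 2.4, "Cl-": 0.7, "CO2": 2.5,
    "K": 1.7, "Conductivity": 2.7, "Ca": 1.6, "BOD": 1.0, "COD": 1.1,
    "PO4 3-": 1.0, "TSS": 1.8, "P": 0.5,
}

def weakest_input(isf):
    """Return the input with the smallest smoothing factor."""
    return min(isf, key=isf.get)

first_removed = weakest_input(isf_grnn19)   # input dropped to form GA18
second_removed = weakest_input(isf_ga18)    # input dropped to form GA17
print(first_removed, second_removed)
```

Note that the ISF values change noticeably after retraining (P drops from 2.6 to 0.5), so the removal order cannot be read from the GRNN19 column alone.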

Regarding the results obtained from the 33 GRNN models created using the three different input selection strategies, it can be concluded that two models created using two different strategies (GA14 and CA10) demonstrated similar or better performance than the GRNN19 model. The CA10 model was selected for further analysis since it showed the best performance of all CA models, while GA14 was selected since it has the smallest number of inputs and the highest FA1.1 value compared to the other GA models.

Fig. 5. Measured versus modeled DO plot: (a) GRNN19; (b) GA14; (c) CA10.

Fig. 6. The monthly average profiles of DO content.

The VIF-based strategy proved to be inefficient in this case, the reason being the existence of only one highly correlated pair of inputs (Alkalinity and HCO₃⁻) (Table 4). Therefore, eliminating inputs using this technique led to a decrease in overall model performance (Fig. 4).

The plots of observed versus modeled DO values for the GRNN19, GA14 and CA10 models are presented in Fig. 5. It can be clearly seen that CA10 produces the most accurate DO predictions (R² = 0.853), without the outliers that are apparently characteristic of the GRNN19 and GA14 models. In addition, it should be noted that the CA10 model has only 10 inputs (T, pH, HCO₃⁻, SO₄²⁻, NO₃-N, Hardness, Na, Cl⁻, Conductivity and Alkalinity), which is a reduction of almost 50% compared to the GRNN19 model. Therefore, the CA10 model can be selected as the optimal DO forecasting model, and its uncertainty will be discussed in the next section.

The relatively low correlation (R² = 0.85) achieved between the measured and modeled DO probably occurred due to the non-homogeneous nature of the measured water quality inputs and DO values, since they were measured over a long period of time (9 years), at sampling sites distributed over a large geographical area, and under different atmospheric and water flow conditions. The monthly average profiles of DO content during the analyzed period for each monitoring site are presented in Fig. 6. Specific atmospheric conditions (temperature, pressure, wind, etc.) and the variation in water flow between the sampling sites, which ranged from 1250 to 5690 m³ s⁻¹ (Antanasijevic et al., 2013a), may have caused differences in the turbulence at the monitoring stations, which could in turn limit oxygen solubility in the water and hence directly affect the DO values. Since the corresponding data were not available, atmospheric conditions and water flow were not taken into account, and therefore the model cannot predict their influence on the DO content.

3.3. MCS uncertainty analysis

The first step was to determine the uncertainty and shapes of the probability density functions (PDFs) of the input variables. The assumed PDFs for the inputs used in the CA10 model and the values of the Kolmogorov–Smirnov test are presented in Table 6, with examples of the PDFs presented in Fig. 7. For the sensitivity analysis using MCS, 10,000 input patterns (vectors) were generated randomly according to the selected PDFs (Table 6). The dataset was constructed as 10 blocks of 1000 patterns, one block per input, where in each block one input took MCS values in the defined range while the other inputs were held at their measured mean values (Table 2). The tested input ranges and the obtained DO ranges are presented in Table 6.

The DO ranges obtained show that the inputs with the highest influence on the DO values in the CA10 model are T, pH and HCO₃⁻ (ΔDO = 3.59–7.38 mg/l). The second group includes four inputs (SO₄²⁻, NO₃-N, Hardness, Na), changes in which caused the DO to change by 1.67 to 2.88 mg/l, while the third group, with the lowest influence (ΔDO ≤ 1.03 mg/l), comprises Cl⁻, Conductivity and Alkalinity.
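The block-wise sampling scheme described above (one input varied per block, all others held at their means) can be sketched as follows. Everything model-specific here is a placeholder: `toy_model` is a made-up linear function standing in for the trained CA10 network, the mean values are invented, and uniform sampling replaces the fitted PDFs of Table 6. Only the structure of the procedure is meant to match the text.

```python
import random

def one_at_a_time_ranges(model, means, samplers, n_draws=1000, seed=0):
    """Vary one input per block and return the output range (max - min) per input."""
    rng = random.Random(seed)
    ranges = {}
    for name, sampler in samplers.items():
        outputs = []
        for _ in range(n_draws):
            pattern = dict(means)          # all inputs at their mean values
            pattern[name] = sampler(rng)   # except the one being perturbed
            outputs.append(model(pattern))
        ranges[name] = max(outputs) - min(outputs)
    return ranges

# Hypothetical stand-in for the trained network (NOT the CA10 model).
def toy_model(x):
    return 14.0 - 0.24 * x["T"] + 0.5 * (x["pH"] - 8.0)

means = {"T": 13.0, "pH": 8.0}                       # invented mean values
samplers = {
    "T": lambda rng: rng.uniform(0.0, 30.8),         # 30.8 °C wide, as in Table 6
    "pH": lambda rng: rng.uniform(7.0, 8.9),         # 1.9 units wide, as in Table 6
}
delta_do = one_at_a_time_ranges(toy_model, means, samplers)
print(delta_do)
```

With ten inputs and 1000 draws per block this yields the 10,000 patterns mentioned in the text; ranking the inputs by the resulting output range gives the kind of ordering reported in Table 6.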


Table 6. Simulated PDFs with the input ranges and obtained DO ranges.

| Input rank | Input | PDF | Kolmogorov–Smirnov test: Stat. | Kolmogorov–Smirnov test: Sig. (p) | ΔI^a | ΔDO (mg/l) |
| --- | --- | --- | --- | --- | --- | --- |
| 1. | T | Gaussian mixture (Fig. 7a) | 0.0277 | 0.190 | 30.8 | 7.38 |
| 2. | pH | Gaussian mixture^b | 0.0802 | 0.000 | 1.9 | 3.59 |
| 3. | HCO₃⁻ | Gaussian mixture | 0.0208 | 0.523 | 269.1 | 3.59 |
| 4. | SO₄²⁻ | Gaussian mixture^b | 0.0389 | 0.020 | 107 | 2.88 |
| 5. | NO₃-N | General extreme value | 0.0268 | 0.225 | 12.81 | 2.47 |
| 6. | Hardness | Folded normal (Fig. 7b) | 0.0314 | 0.099 | 315.2 | 1.72 |
| 7. | Na | Gaussian mixture^b | 0.0230 | 0.040 | 54.4 | 1.67 |
| 8. | Cl⁻ | Johnson SU (Fig. 7c) | 0.0278 | 0.191 | 48.49 | 1.03 |
| 9. | Conductivity | Gaussian mixture | 0.0110 | 0.992 | 443 | 0.73 |
| 10. | Alkalinity | Log normal (Fig. 7d) | 0.0250 | 0.295 | 207.5 | 0.49 |

a mg/l except for T (°C), pH and conductivity (μS/cm).
b Best ranked PDF.

Fig. 7. Examples of simulated PDFs: (a) Gaussian mixture for T; (b) Folded normal for hardness; (c) Johnson SU for Cl⁻; (d) Log normal for alkalinity.


The effect of the input variability on the variability of DO content in the river body was quantified using the standard deviation (Fig. 8a). The inputs with the highest impact can be ranked as follows: T > NO₃-N > pH > Na > remaining inputs. Since temperature has a major influence on the modeled DO values, the impact of its variability was examined more closely by splitting the tested temperature range into ten smaller intervals, each 3.08 °C wide. The standard deviations obtained for each temperature interval, along with the model response to the change of temperature, are presented in Fig. 8b. The impact of temperature variability is highest at both ends of the temperature range. Also, the decreasing DO trend caused by the increase of temperature, with a rate of −0.239 mg/l per °C, is very close to the theoretical rate of −0.233 mg/l per °C (at 1 atm pressure and for ΔT = 0–30 °C) (Wetzel, 2001). This indicates that the CA10 model has been adequately trained and is capable of precisely predicting the DO–T dependence.
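A quick arithmetic check of these figures, using only numbers quoted in the text and in Table 6, is sketched below. It assumes that the average rate can be approximated by dividing the DO range obtained for temperature by the tested temperature range; the study may have derived its rate differently (for example from a fitted trend), so a small difference from the quoted −0.239 is to be expected. The negative sign reflects the stated decrease of DO with increasing temperature.

```python
delta_do_mg_l = 7.38      # DO range when only T is varied (Table 6)
delta_t_celsius = 30.8    # tested temperature range (Table 6)
theoretical_rate = -0.233 # theoretical rate quoted in the text (Wetzel, 2001)

# Width of each of the ten temperature sub-intervals.
interval_width = delta_t_celsius / 10

# Average model response, mg/l per degree Celsius (negative: DO falls as T rises).
model_rate = -delta_do_mg_l / delta_t_celsius

# Relative difference between the model response and the theoretical rate.
relative_gap = abs(model_rate - theoretical_rate) / abs(theoretical_rate)

print(f"interval width: {interval_width:.2f} °C")
print(f"model rate: {model_rate:.4f} mg/l per °C")
print(f"relative gap to theory: {100 * relative_gap:.1f} %")
```

Under this approximation the model response differs from the theoretical rate by roughly 3%.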

In order to determine the uncertainty of the results of the DO model, another set of input data was randomly generated according to the PDFs defined for each input in Table 6, and 1000 simulations were performed. To validate the PDF obtained for the CA10 results, PDFs were also determined for the measured DO values and then compared. Descriptive and distribution statistics are presented in Table 7, and it can be seen that the same 6-parameter Gaussian mixture PDF, which was not rejected by the Kolmogorov–Smirnov test (p > 0.05), was determined in each case. The empirical and Gaussian mixture cumulative distribution functions (CDFs) for the different DO data (measured and predicted) are shown in Fig. 9. Fig. 9 also contains the lower and upper 95% confidence bands for the obtained empirical CDFs, and it shows that even though the CA10 model was applied to different input data, the distributions obtained are very similar to the corresponding distributions of the real (measured) data, including the data populating the extreme areas of the distribution curve.

Fig. 8. Predicted mean DO value and standard deviations related to: (a) the uncertainties of inputs; (b) the uncertainties of temperature for defined temperature intervals within the tested temperature range.

Table 7. Statistics of measured DO data and predictions for different datasets.

| Statistical metric | Test data: measured | Test data: CA10 model | All data: measured | MCS: CA10 model |
| --- | --- | --- | --- | --- |
| Descriptive statistics | | | | |
| N | 158 | 158 | 1512 | 1000 |
| Mean | 9.91 | 9.93 | 9.95 | 9.76 |
| St. dev. | 2.25 | 1.97 | 2.13 | 1.61 |
| Minimum | 4.2 | 5.55 | 3.7 | 5.71 |
| Maximum | 16.8 | 15.57 | 16.9 | 15.16 |
| Confidence −95% | 9.56 | 9.62 | 9.84 | 9.66 |
| Confidence +95% | 10.27 | 10.24 | 10.06 | 9.86 |
| Distribution statistics | | | | |
| PDF | Gaussian mixture | Gaussian mixture | Gaussian mixture | Gaussian mixture |
| Kolmogorov–Smirnov test: Stat. | 0.0405 | 0.0431 | 0.0191 | 0.0338 |
| Kolmogorov–Smirnov test: Sig. (p) | 0.949 | 0.9187 | 0.6309 | 0.1986 |
| PDF parameter 1 | 0.331 | 0.302 | 0.373 | 0.360 |
| PDF parameter 2 | 7.770 | 7.812 | 8.137 | 7.973 |
| PDF parameter 3 | 1.487 | 1.047 | 1.477 | 0.780 |
| PDF parameter 4 | 0.668 | 0.698 | 0.626 | 0.640 |
| PDF parameter 5 | 10.98 | 10.85 | 11.03 | 10.76 |
| PDF parameter 6 | 1.740 | 1.498 | 1.682 | 0.946 |
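The six PDF parameters in Table 7 can be cross-checked against the descriptive statistics. The sketch below assumes, since the table does not label them, that the parameters are ordered as (weight, mean, standard deviation) for the first Gaussian component followed by the same triple for the second; the function name is illustrative. Under that assumption the mean and standard deviation implied by the mixture should reproduce the sample values in the same column.

```python
import math

def mixture_moments(w1, mu1, sd1, w2, mu2, sd2):
    """Mean and standard deviation of a two-component Gaussian mixture."""
    total = w1 + w2                       # renormalise weights rounded in the table
    w1, w2 = w1 / total, w2 / total
    mean = w1 * mu1 + w2 * mu2
    second_moment = w1 * (sd1 ** 2 + mu1 ** 2) + w2 * (sd2 ** 2 + mu2 ** 2)
    return mean, math.sqrt(second_moment - mean ** 2)

# Parameters 1-6 from Table 7 (assumed order: w1, mu1, sd1, w2, mu2, sd2).
mean_test, sd_test = mixture_moments(0.331, 7.770, 1.487, 0.668, 10.98, 1.740)
mean_mcs, sd_mcs = mixture_moments(0.360, 7.973, 0.780, 0.640, 10.76, 0.946)

print(f"measured test data: mean {mean_test:.2f}, st. dev. {sd_test:.2f}")
print(f"MCS predictions:    mean {mean_mcs:.2f}, st. dev. {sd_mcs:.2f}")
```

Both columns agree with the tabulated means and standard deviations to within rounding, which supports the assumed parameter ordering and shows that the narrower MCS distribution comes mainly from smaller component standard deviations.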

4. Conclusions

The ANN model with the GRNN architecture described in this paper was developed in order to predict the DO content in the Danube River. The GRNN was trained, validated and tested using 19 water quality parameters as inputs, measured monthly and fortnightly over a period of 9 years, from 2002 to 2010. Five data normalization techniques (min–max, median, z-score, sigmoid and tanh) and three input selection techniques (VIF, correlation analysis and genetic algorithm) were tested. The performance of the ANN model was evaluated using the index of agreement, the mean absolute error, the root mean squared error and the percentage of predictions within a factor of 1.1 of the observed values (FA1.1).

The best results were obtained using min–max normalized data and the input selection based on the correlation between DO and the input variables, which provided the most accurate GRNN model with a reduced set of 10 inputs: T, pH, HCO₃⁻, SO₄²⁻, NO₃-N, Hardness, Na, Cl⁻, Conductivity and Alkalinity. The results show that the correlation coefficient between predicted and measured DO values is 0.85, with an FA1.1 of 81%.

The Monte Carlo Simulation (MCS) technique was also used to determine the relative importance of the uncertainty in different input variables. The MCS sensitivity analysis showed that the inputs with the most influence on the best DO model (CA10) were, in descending order, T, pH, HCO₃⁻, SO₄²⁻ and NO₃-N. The effect of the input variability on the variability of DO content in the river body was quantified using the standard deviation. Furthermore, the MCS uncertainty analysis was performed on the model results. It was concluded that the ANN could effectively forecast the DO content, since the model result distributions are very similar to the corresponding distributions of the real (measured) data, including the data populating the extreme areas of the distribution curve.

Fig. 9. Cumulative distribution function (CDF) with 95% confidence bounds for (a) measured DO (test data); (b) CA10 DO predictions (test data); (c) measured DO (all data); (d) DO predictions for the 1000 Monte Carlo Simulations with CA10 model.

Acknowledgement

The authors are grateful to the Ministry of Education, Science and Technological Development of the Republic of Serbia, Project No. 172007, for financial support.

Appendix A. Detail on the application of ANNs for DO modeling

For details on papers related to ANN models of DO at a single site, please see Table A1.

For details on papers related to ANN models of DO across multiple sites, please see Table A2.

Appendix B. Abbreviations and formulas of the model performance indicators used in this study and presented in Appendix A

$C_o$: observed value; $C_p$: predicted value; $\bar{C}_o$: mean of the observed data; $n$: the number of data patterns (observations).

The index of agreement (IA):

$$\mathrm{IA} = 1 - \frac{\overline{(C_p - C_o)^2}}{\overline{\left[\,|C_p - \bar{C}_o| + |C_o - \bar{C}_o|\,\right]^2}} \tag{11}$$

The percent of predictions within a factor of 1.1 of the observed values (FA1.1):

$$0.9 < \frac{C_p}{C_o} < 1.1 \tag{12}$$

The root mean squared error (RMSE):

$$\mathrm{RMSE} = \sqrt{\overline{(C_p - C_o)^2}} \tag{13}$$

The mean absolute error (MAE) and Bias:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |C_p - C_o| = \mathrm{Bias} \tag{14}$$

Standard error of prediction (SEP):

$$\mathrm{SEP} = \sqrt{\frac{\sum_{i=1}^{n} (C_p - C_o - \mathrm{Bias})^2}{n - 1}} \tag{15}$$
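A minimal implementation of the four indicators used in this study (IA, FA1.1, RMSE and MAE) is sketched below, with averages written out as explicit sums over the data patterns. The function names and the three-point dataset are made up for illustration and are not taken from the paper.

```python
import math

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

def index_of_agreement(obs, pred):
    """Index of agreement: 1 means perfect agreement."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den

def fa11(obs, pred):
    """Percentage of predictions within a factor of 1.1 of the observations."""
    hits = sum(1 for o, p in zip(obs, pred) if 0.9 < p / o < 1.1)
    return 100.0 * hits / len(obs)

observed = [8.0, 10.0, 12.0]    # made-up DO values, mg/l
predicted = [8.5, 9.5, 12.0]

print(f"RMSE  = {rmse(observed, predicted):.3f} mg/l")
print(f"MAE   = {mae(observed, predicted):.3f} mg/l")
print(f"IA    = {index_of_agreement(observed, predicted):.3f}")
print(f"FA1.1 = {fa11(observed, predicted):.0f} %")
```

Because the ratio of two means over the same n patterns equals the ratio of the corresponding sums, the sum form of IA used here is equivalent to a mean-based definition.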


Table A1. A summary of ANN models for DO forecasting at a single location.

| | Samandar (2010) | He et al. (2011) | Ay and Kisi (2012) | Ay and Kisi (2012) | Antanasijevic et al. (2013a) | Antanasijevic et al. (2013a) | Antanasijevic et al. (2013a) | Heddam (2014) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| River (country) | Melen (Turkey) | Bow (Canada) | Foundation Creek (USA) | Foundation Creek (USA) | Danube (Serbia) | Danube (Serbia) | Danube (Serbia) | Upper Klamath (USA) |
| Time period (years) | –^a | 3 | 8 | 8 | 5 | 5 | 5 | 3 |
| Number of data patterns | 159 | 543 | 2063 | 2063 | 73 | 73 | 73 | 2942 |
| Representation | Random | Time series | Time series | Time series | Time series | Time series | Time series | Random |
| ANN architecture (I–H–O, I–H\|H–O or I–P–S–O)^b | FFNN^c (8–17–1) | FFNN (2–4–1) | FFNN (3–1–1) | RBNN^d (3–1–1) | FFNN (4–10–1) | RNN^e (4–10\|1–1) | GRNN (4–61–2–1) | GRNN (4–2942^f–2–1) |
| Training algorithm | Back-propagation | Back-propagation | Back-propagation | Radial basis function | Back-propagation | Back-propagation | Genetic algorithm | –^a |
| Performance indicators^g: R² | 0.84 | 0.90 | 0.76 | 0.81 | 0.76 | 0.87 | 0.85 | 0.98 |
| RMSE (mg/l) | – | 0.47 | 0.61 | 0.55 | 0.83 | 0.59 | 0.78 | 0.49 |
| MAE (mg/l) | – | 0.38 | 0.44 | 0.40 | 0.72 | 0.49 | 0.60 | 0.32 |
| IA | – | – | – | – | 0.93 | 0.97 | 0.95 | 0.99 |
| FA1.1 (%) | – | – | – | – | 82 | 100 | 82 | – |

a Not stated.
b The number of neurons in layer: input (I), hidden (H), pattern (P), summation (S) and output (O).
c Feed-forward neural network – includes multilayer perceptron (MLP).
d Radial basis neural network.
e Recurrent neural network.
f Not stated – the authors' assessment.
g For indicator abbreviations and formulations see Appendix B.

Table A2. A summary of ANN models developed for DO forecasting across multiple sites.

| | Singh et al. (2009) | Basant et al. (2010) | Najah et al. (2011) | Wen et al. (2013) |
| --- | --- | --- | --- | --- |
| River (country) | Gomti (India) | Gomti (India) | Johor (Malaysia) | Heihe (China) |
| Number of monitoring stations | 8 | 8 | 4 | 3 |
| Time period (years) | 10 | 10 | 5 | 6 |
| Number of data patterns | 960 | 960 | –^a | 164 |
| Representation | Time series | Random^b | –^a | Random |
| ANN architecture (I–H–O)^c | FFNN^d (11–23–1) | FFNN^e (11–21–2) | FFNN (5–17–1) | FFNN (8–14–1) |
| Training algorithm | BP | BP | BP | BP |
| Performance indicators^f: R² | 0.76 | 0.71 | 0.99 | 0.97 |
| RMSE (mg/l) | 1.23 | 1.36 | 0.17 | 0.46 |
| Bias (mg/l) | −0.43 | −0.51 | – | – |
| SEP (mg/l) | – | 1.27 | – | – |
| Ef | – | 0.66 | 0.98 | – |
| Af | – | 1.38 | – | |

a Not stated.
b Using the Kennard–Stone algorithm.
c The number of neurons in layer: input (I), hidden (H) and output (O).
d Feed-forward neural network – includes multilayer perceptron (MLP).
e DO was simultaneously modeled with BOD.
f For indicator abbreviations and formulations see Appendix B.


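The Nash–Sutcliffe coefficient of efficiency ($E_f$):

$$E_f = 1 - \frac{\sum_{i=1}^{n} (C_p - C_o)^2}{\sum_{i=1}^{n} (C_o - \bar{C}_o)^2} \tag{16}$$

The accuracy factor ($A_f$):

$$A_f = 10^{\left( \frac{\sum \left| \log \frac{C_p}{C_o} \right|}{n} \right)} \tag{17}$$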
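For completeness, the two indicators above can be sketched in the same style. The code assumes the standard Nash–Sutcliffe form, with the variance of the observations about their mean in the denominator, and a base-10 logarithm in the accuracy factor, as implied by the power of 10; the function names and sample values are illustrative only.

```python
import math

def nash_sutcliffe(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def accuracy_factor(obs, pred):
    """Accuracy factor: 10 raised to the mean absolute log10 ratio (1 is ideal)."""
    mean_abs_log = sum(abs(math.log10(p / o)) for o, p in zip(obs, pred)) / len(obs)
    return 10.0 ** mean_abs_log

observed = [8.0, 10.0, 12.0]    # made-up DO values, mg/l
predicted = [8.5, 9.5, 12.0]

ef = nash_sutcliffe(observed, predicted)
af = accuracy_factor(observed, predicted)
print(f"Ef = {ef:.4f}, Af = {af:.3f}")
```

An accuracy factor of about 1.04 means that predictions deviate from observations by roughly 4% on average, in either direction.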

References

Antanasijevic, D., Pocajt, V., Povrenovic, D., Peric-Grujic, A., Ristic, M., 2013a. Modelling of dissolved oxygen content using artificial neural networks: Danube River, North Serbia, case study. Environ. Sci. Pollut. Res. 20, 9006–9013.

Antanasijevic, D.Z., Pocajt, V.V., Povrenovic, D.S., Ristic, M.Đ., Peric-Grujic, A.A., 2013b. PM10 emission forecasting using artificial neural networks and genetic algorithm input variable optimization. Sci. Total Environ. 443, 511–519.

Araoye, P.A., 2009. The seasonal variation of pH and dissolved oxygen (DO2) concentration in Asa lake Ilorin, Nigeria. Int. J. Phys. Sci. 4, 271–274.

Arhami, M., Kamali, N., Rajabi, M.M., 2013. Predicting hourly air pollutant levels using artificial neural networks coupled with uncertainty analysis by Monte Carlo simulations. Environ. Sci. Pollut. Res. 20, 4777–4789.

Ay, M., Kisi, O., 2012. Modeling of dissolved oxygen concentration using different neural network techniques in Foundation Creek, El Paso County, Colorado. J. Environ. Eng. 138, 654–662.

Banerjee, P., Singh, V.S., Chatttopadhyay, K., Chandra, P.C., Singh, B., 2011. Artificial neural network model as a potential alternative for groundwater salinity forecasting. J. Hydrol. 398, 212–220.

Basant, N., Gupta, S., Malik, A., Singh, K.P., 2010. Linear and nonlinear modeling for simultaneous prediction of dissolved oxygen and biochemical oxygen demand of the surface water—a case study. Chemometr. Intell. Lab. Syst. 104, 172–180.

Borrego, C., Monteiro, A., Ferreira, J., Miranda, A.I., Costa, A.M., Carvalho, A.C., Lopes, M., 2008. Procedures for estimation of modelling uncertainty in air quality assessment. Environ. Int. 34, 613–620.

Chang, F.-J., Chen, P.-A., Liu, C.-W., Liao, V.H.-C., Liao, C.-M., 2013. Regional estimation of groundwater arsenic concentrations through systematical dynamic-neural modeling. J. Hydrol. 499, 265–274.

Chapra, S., Pellettier, G., 2003. QUAL2K: A Modeling Framework for Simulating River and Stream Water Quality. Civil and Environmental Engineering Dept., Tufts University, Medford.

Chen, Y.-H., Chang, F.-J., 2009. Evolutionary artificial neural networks for hydrological systems forecasting. J. Hydrol. 367, 125–137.

Dehghani, M., Saghafian, B., Nasiri Saleh, F., Farokhnia, A., Noori, R., 2014. Uncertainty analysis of streamflow drought forecast using artificial neural networks and Monte-Carlo simulation. Int. J. Climatol. 34, 1169–1180.

Dimitrovska, O., Markoski, B., Toshevska, B.A., Milevski, I., Gorin, S., 2012. Surface water pollution of major rivers in the Republic of Macedonia. Procedia Environ. Sci. 14, 32–40.


Hanna, S.R., Chang, J.C., Fernau, M.E., 1998. Monte Carlo estimates of uncertainties in predictions by a photochemical grid model (UAM-IV) due to uncertainties in input variables. Atmos. Environ. 32, 3619–3628.

Hanna, A.M., Ural, D., Saygili, G., 2007. Neural network model for liquefaction potential in soil deposits using Turkey and Taiwan earthquake data. Soil Dynam. Earthquake Eng. 27, 521–540.

He, J., Chu, A., Ryan, M.C., Valeo, C., Zaitlin, B., 2011. Abiotic influences on dissolved oxygen in a riverine environment. Ecol. Eng. 37, 1804–1814.

Heddam, S., 2014. Generalized regression neural network-based approach for modelling hourly dissolved oxygen concentration in the Upper Klamath River, Oregon, USA. Environ. Technol. 35, 1650–1657.

The International Commission for the Protection of the Danube River (ICPDR), 2014. Countries of the Danube River Basin. <http://www.icpdr.org/main/danube-basin/countries-danube-river-basin#>.

Jayalakshmi, T., Santhakumaran, A., 2011. Statistical normalization and back propagation for classification. Int. J. Comput. Theor. Eng. 3, 1793–8201.

Kalogirou, S.A., 2003. Artificial intelligence for the modeling and control of combustion processes: a review. Prog. Energy Combust. Sci. 29, 515–566.

Khalil, B., Ouarda, T.B.M.J., St-Hilaire, A., Chebana, F., 2010. A statistical approach for the rationalization of water quality indicators in surface water quality monitoring networks. J. Hydrol. 386, 173–185.

Kim, S., Kim, H.S., 2008. Neural networks and genetic algorithm approach for nonlinear evaporation and evapotranspiration modeling. J. Hydrol. 351, 299–317.

Kisi, O., Ozkan, C., Akay, B., 2012. Modeling discharge–sediment relationship using neural networks with artificial bee colony algorithm. J. Hydrol. 428–429, 94–103.

Kroll, C.N., Song, P., 2013. Impact of multicollinearity on small sample hydrologic regression models. Water Resour. Res. 49, 3756–3769.

Maier, H.R., Jain, A., Dandy, G.C., Sudheer, K.P., 2010. Methods used for the development of neural networks for the prediction of water resource variables in river systems: current status and future directions. Environ. Model. Softw. 25, 891–909.

Mannina, G., Viviani, G., 2010. Water quality modelling for ephemeral rivers: model development and parameter assessment. J. Hydrol. 393, 186–196.

Najah, A., El-Shafie, A., Karim, O.A., Jaafar, O., El-shafie, A.H., 2011. An application of different artificial intelligences techniques for water quality prediction. Int. J. Phys. Sci. 6, 5298–5308.

Nayak, P.C., Sudheer, K.P., Ramasastri, K.S., 2005. Fuzzy computing based rainfall–runoff model for real time flood forecasting. Hydrol. Process. 19, 955–968.

Reichert, P., Borchardt, D., Henze, M., Rauch, W., Shanahan, P., Somlyody, L., Vanrolleghem, P., 2001. River Water Quality Model No. 1, IWA. London, UK.

Samandar, A., 2010. Ranking water quality variables using feature selection algorithms to improve generalization capability of artificial neural networks. Sci. Res. Essays 5, 1254–1259.

Serbian Agency for Environmental Protection (SEPA), 2013. <http://www.sepa.gov.rs/>.

Shrestha, D.L., Kayastha, N., Solomatine, D.P., 2009. A novel approach to parameter uncertainty analysis of hydrological models using neural networks. Hydrol. Earth Syst. Sci. 13, 1235–1248.

Singh, K.P., Basant, A., Malik, A., Jain, G., 2009. Artificial neural network modeling of the river water quality—a case study. Ecol. Model. 220, 888–895.

Specht, D.F., 1991. A general regression neural network. IEEE Trans. Neural Networks 2, 568–576.

StatSoft. Inc., 2010. Statistica (Data Analysis Software System), version 10 trial. Tulsa, USA.

Wen, X., Fang, J., Diao, M., Zhang, C., 2013. Artificial neural network modeling of dissolved oxygen in the Heihe River, Northwestern China. Environ. Monit. Assess. 185, 4361–4371.

Wetzel, R.G., 2001. Limnology, third ed. Academic Press, London, UK.

Willmott, C.J., 1984. On the evaluation of model performance in physical geography. In: Gaile, G.L., Willmott, C.J. (Eds.), Spatial Statistics and Models. Springer, Netherlands, Dordrecht, pp. 443–460.

Wool, T.A., Ambrose, R.B., Martin, J.L., Comer, E.A., 2006. Water Quality Analysis Simulation Program (WASP) Version 6.0 DRAFT: User’s Manual. US Environmental Protection Agency, Athens, GA.