MAL1303: STATISTICAL HYDROLOGY
Fitting Distribution & Markov Chain Analysis
Dr. Shamsuddin Shahid
Department of Hydraulics and Hydrology
Faculty of Civil Engineering, Universiti Teknologi Malaysia
Room No.: M46-332; Phone: 07-5531624; Mobile: 0182051586 Email: [email protected]
One common application of probability distributions is modeling univariate data with a specific probability distribution. This involves the following two steps:
1. Determination of the "best-fitting" distribution.
2. Estimation of the parameters (shape, location, and scale parameters) for that distribution.
There are various methods, both numerical and graphical, for estimating the parameters of a probability distribution:
1. Moments
2. Maximum likelihood
3. Least squares
4. Probability plots
5. Statistical tests
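As an illustration of the first method in the list, the sketch below estimates gamma parameters by the method of moments. It assumes the shape/scale form of the gamma distribution, for which mean = αβ and variance = αβ². The function name `gamma_moments` and the sample values are hypothetical and are not taken from the lecture.

```python
from statistics import mean, pvariance

def gamma_moments(data):
    """Method-of-moments estimates (shape alpha, scale beta) for a gamma distribution."""
    m = mean(data)
    v = pvariance(data)   # population variance (divides by n)
    alpha = m * m / v     # from mean = alpha*beta and variance = alpha*beta**2
    beta = v / m
    return alpha, beta

# Hypothetical sample, for illustration only (not the lecture data)
sample = [2.0, 4.0, 4.0, 6.0]
alpha_hat, beta_hat = gamma_moments(sample)
print(alpha_hat, beta_hat)  # 8.0 0.5
```

Maximum likelihood would generally give somewhat different estimates for the same sample; the moment estimates are often used as starting values.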
Modeling Distribution
11/23/2015 Shamsuddin Shahid, FKA, UTM
Probability Plot
Statistical Tests
• Chi-square Test
• Kolmogorov-Smirnov (K-S) Test
• Anderson-Darling Test
Fitting Data Distribution
Chi-square Test
Kolmogorov-Smirnov (K-S) Test
Anderson-Darling (AD) Test
Kolmogorov-Smirnov (K-S) Test
• A fully non-parametric test for comparing two distributions.
• Does not depend on approximations for the distribution.

Given two cumulative probability functions FX and FY, the test statistics are

$$D^{+} = \max_{x}\,\bigl(F_X(x) - F_Y(x)\bigr)$$
$$D^{-} = \max_{x}\,\bigl(F_Y(x) - F_X(x)\bigr)$$

Usually the value D = max{D+, D-} is used.
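The two one-sided statistics and their maximum can be sketched for two samples as follows. This is a minimal illustration that uses empirical cumulative distribution functions in place of FX and FY; the helper names `ecdf` and `ks_two_sample` and the sample values are hypothetical.

```python
def ecdf(sample, x):
    """Empirical CDF: fraction of sample values less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def ks_two_sample(xs, ys):
    """Return (D_plus, D_minus, D) for two samples."""
    points = sorted(set(xs) | set(ys))  # the ECDFs only change at data points
    d_plus = max(ecdf(xs, p) - ecdf(ys, p) for p in points)
    d_minus = max(ecdf(ys, p) - ecdf(xs, p) for p in points)
    return d_plus, d_minus, max(d_plus, d_minus)

# Hypothetical samples, for illustration only
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.5, 4.5, 5.5, 6.5]
d_plus, d_minus, d = ks_two_sample(xs, ys)
print(d_plus, d_minus, d)  # 0.75 0.0 0.75
```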
• It is non-parametric and hence robust.
• It does not rely only on the location of the mean (unlike the t-test).
• It works for non-normal data (the t-test can fail if the data are too far from normal).
• It is not sensitive to scaling.
• It is more powerful than the χ² test.
• However, it is less sensitive than the t-test if the data are indeed normal.
Kolmogorov-Smirnov Test: Advantages
Problem: Samples of groundwater depth (meter) in a catchment are collected as given below. What is the distribution of the data?
Kolmogorov-Smirnov (K-S) Test: Example
[Figure: probability plots of the groundwater depth data. Panels: Normal Distribution; Gamma Distribution (α=2; β=2)]
Probability Plots
To identify the distribution of the data, we have to fit the data with different types of distribution.
Let us first try the Gamma distribution with α=2; β=2.
Therefore,
H0: The groundwater depth data follow the Gamma distribution (α=2; β=2)
Ha: The groundwater depth data do not follow the Gamma distribution (α=2; β=2)
Kolmogorov-Smirnov Test (K-S)
Steps:
1. Arrange the data in ascending order.
2. Rank the data.
3. Calculate the observed cumulative frequency as i/(n+1), where i is the rank and n is the number of data points.
4. Calculate the expected cumulative frequency of the data for the particular distribution of interest.
Kolmogorov-Smirnov (K-S) Test
Steps (continued): Calculate the expected cumulative frequency of the data for the particular distribution of interest.
In the present case we calculate the expected cumulative frequency for the Gamma distribution (α=2; β=2) with the spreadsheet function
GAMMADIST(x, α, β, cumulative)
Example: GAMMADIST(4.13, 2, 2, 1) = 0.6113
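The spreadsheet value can be checked without a statistics library. The sketch below assumes an integer shape parameter, for which the cumulative gamma probability has the closed (Erlang) form 1 − e^(−x/β) Σ (x/β)^k / k!, summed over k = 0 … α−1. The function name `gamma_cdf_integer_shape` is hypothetical, and the closed form does not apply to non-integer α.

```python
import math

def gamma_cdf_integer_shape(x, alpha, beta):
    """Cumulative gamma probability for integer shape alpha and scale beta."""
    if x <= 0:
        return 0.0
    t = x / beta
    tail = sum(t**k / math.factorial(k) for k in range(alpha))
    return 1.0 - math.exp(-t) * tail

value = gamma_cdf_integer_shape(4.13, 2, 2)
print(round(value, 4))  # 0.6113, the same value as the GAMMADIST example
```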
Kolmogorov-Smirnov (K-S) Test
The K–S statistic Dn is defined as:
Dn = max[|Fn(x) – F(x)|]
Where,
n = total number of data points
F(x) = distribution function of the fitted distribution
Fn(x) = i/(n+1)
i = the cumulative rank of the data point.
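A minimal sketch of this statistic, using the observed cumulative frequency i/(n+1) as defined here, is given below. The fitted distribution is passed in as a function; a normal CDF built from `math.erf` is used only as an example. The function names, the sample values, and the normal parameters are hypothetical and are not the lecture's groundwater data.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative normal probability via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(data, cdf):
    """Dn = max |Fn(x) - F(x)| with Fn = i/(n+1) for the i-th smallest value."""
    ordered = sorted(data)
    n = len(ordered)
    return max(abs(i / (n + 1) - cdf(x)) for i, x in enumerate(ordered, start=1))

# Hypothetical depths in meters, for illustration only
depths = [4.1, 4.8, 5.2, 5.6, 6.3]
dn = ks_statistic(depths, lambda x: normal_cdf(x, 5.2, 0.8))
print(round(dn, 4))
```

The resulting Dn is then compared with the tabulated critical value for the chosen α and n.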
Kolmogorov-Smirnov (K-S) Test
The K–S statistic,
Dn = max[|Fn(x) – F(x)|] = 0.5302
α = 0.05; n = 17; Dcritical = 0.318
Since 0.5302 > 0.318, the null hypothesis is rejected.
Decision: Groundwater depth is significantly different from the Gamma distribution (α=2; β=2).
Kolmogorov-Smirnov (K-S) Test
Let us now try the normal distribution.
Therefore,
H0: The groundwater depth data are normally distributed
Ha: The groundwater depth data are not normally distributed
Kolmogorov-Smirnov (K-S) Test
In the present case we calculate the expected cumulative frequency for the normal distribution.
Mean of the data = 5.34
Standard deviation = 0.865722
NORMDIST(x, mean, stdev, cumulative)
Example: NORMDIST(4.13, 5.34, 0.865722, 1) = 0.0811
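The same cumulative normal probability can be reproduced with the standard-library error function. This is a small sketch; the function name `normdist_cumulative` is hypothetical and only mirrors the cumulative form of the spreadsheet call.

```python
import math

def normdist_cumulative(x, mean, stdev):
    """Cumulative normal probability P(X <= x)."""
    z = (x - mean) / stdev
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p = normdist_cumulative(4.13, 5.34, 0.865722)
print(round(p, 4))  # 0.0811
```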
Kolmogorov-Smirnov (K-S) Test
The K–S statistic,
Dn = max[|Fn(x) – F(x)|] = 0.1648
α = 0.05; n = 17; Dcritical = 0.318
Since 0.1648 < 0.318, the null hypothesis cannot be rejected.
Decision: Groundwater depth is not significantly different from the normal distribution.
Kolmogorov-Smirnov (K-S) Test
Anderson-Darling (AD) Test
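The Anderson-Darling (AD) test is also widely used in practice.
The AD goodness-of-fit test can be carried out using the following formula:

$$AD = -n - \frac{1}{n}\sum_{i=1}^{n} (2i-1)\Bigl[\ln F\bigl(Z_{(i)}\bigr) + \ln\Bigl(1 - F\bigl(Z_{(n+1-i)}\bigr)\Bigr)\Bigr]$$

where Z(i) is the i-th smallest value and F is the cumulative distribution function being tested.
The hypothesis is rejected if AD > CV,
where CV = 0.752/(1 + 0.75/n + 2.25/n²)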
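The AD statistic and the critical value can be sketched as below for a test against the normal distribution. This is an illustration under stated assumptions: the data are standardized with the sample mean and sample standard deviation (the slides do not say which form of the standard deviation is used), F is the standard normal CDF, and the function names and sample values are hypothetical.

```python
import math
from statistics import mean, stdev

def standard_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def anderson_darling_normal(data):
    """AD = -n - (1/n) * sum (2i-1) [ln F(Z_(i)) + ln(1 - F(Z_(n+1-i)))]."""
    n = len(data)
    m, s = mean(data), stdev(data)  # assumption: sample standard deviation
    f = [standard_normal_cdf((x - m) / s) for x in sorted(data)]
    total = sum((2 * i - 1) * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
                for i in range(1, n + 1))
    return -n - total / n

def critical_value(n):
    """CV = 0.752 / (1 + 0.75/n + 2.25/n^2), as given on the slide."""
    return 0.752 / (1 + 0.75 / n + 2.25 / n**2)

# Hypothetical depths in meters, for illustration only
depths = [4.1, 4.6, 4.9, 5.2, 5.4, 5.7, 6.0, 6.6]
ad = anderson_darling_normal(depths)
cv = critical_value(len(depths))
print(round(ad, 4), round(cv, 4), "reject" if ad > cv else "cannot reject")
```

For n = 17, `critical_value(17)` evaluates to about 0.715.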
Anderson-Darling (AD) Test: Example
Groundwater depth data of a catchment are given below. Find the distribution that best fits the data.
Solution:
First, we shall try the Normal distribution, then other distributions.
Gamma distribution with α=1; β=5.
Decision: Groundwater depth is not significantly different from the Gamma distribution with α=1; β=5.
Anderson-Darling (AD) Test: Example
A stochastic process is the counterpart to a deterministic process.
Instead of dealing with only one possible way the process might develop over time, in a stochastic or random process there is some indeterminacy described by probability distributions.
This means that even if the initial condition (or starting point) is known, there are many possibilities the process might go to, but some paths may be more probable and others less so.
In the simplest possible case, a stochastic process amounts to a sequence of random variables known as a time series.
Stochastic Processes
Stochastic hydrology is mainly concerned with the assessment of uncertainty in model predictions.
Stochastic hydrology is an essential base of water resources systems analysis, due to the inherent randomness of the input and, consequently, of the results.
Stochastic hydrology is very important in the decision-making process for the planning and management of water systems.
Stochastic Hydrology
In the simplest possible case, a stochastic process amounts to a sequence of random variables known as a time series.
Stochastic analysis seeks to recognize patterns in random events that carry a certain degree of uncertainty.
One such approach is known as Markov Chain Analysis.
Stochastic Hydrology
• A Markov chain, named for Andrey Markov, is a random process in which the next state depends only on the current state and not on the earlier past.
• A Markov analysis looks at a sequence of events and analyzes the tendency of one event to be followed by another.
• A Markov process is useful for analyzing dependent random events, that is, events whose likelihood depends on what happened last.
• The Markov chain is based on the assumption that the occurrence of one event depends upon the immediately preceding event.
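For a first-order chain, the property described in these bullets can be written as follows. The notation is an assumption introduced here for illustration: X_t is the state at step t and p_ij is the probability of moving from state i to state j.

```latex
P\left(X_{t+1} = j \mid X_t = i,\; X_{t-1} = i_{t-1},\; \dots,\; X_0 = i_0\right)
  = P\left(X_{t+1} = j \mid X_t = i\right) = p_{ij},
\qquad \sum_{j} p_{ij} = 1 .
```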
Markov Chain Analysis
• Markov chains are widely used in hydrology to predict the occurrence of hydrological events.
• Markov chain analysis has been used to quantify tendencies of hydrological processes. For example, will a certain phenomenon increase in the near term?
• Prediction of hydrological hazards or any other natural events.
• Prediction of weather, river discharge, etc.
Markov Chain Analysis
Markov Chain Analysis
Let us consider a sequence of events as given below:
ABACAABCABBBCABCCABBCA
Is there any pattern present in the sequence?
Apparently there is no clear pattern of occurrence of events.
Markov chain analysis tries to find the patterns present in the sequence.
Once patterns are identified, it is possible to estimate the probability of future events.
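One way to look for such a pattern is to count how often each event is followed by each other event. The sketch below does this for the sequence above; the variable names are illustrative only.

```python
from collections import Counter

sequence = "ABACAABCABBBCABCCABBCA"

# Count each (current event, next event) pair along the sequence
pairs = Counter(zip(sequence, sequence[1:]))

for (current, following), count in sorted(pairs.items()):
    print(f"{current} -> {following}: {count}")
```

Of the 21 transitions, A is followed by B five times out of seven, and C is followed by A five times out of six, so the sequence is less random than it first appears.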
Markov Analysis: Example
Let us consider the rainfall time series data for twenty years given below. We want to identify the pattern in rainfall and predict the future rainfall.
Sequence of Precipitation Climate is:
D N N N W W VW N N W W VW N D N N W W W VW D
Markov Analysis: Example
Markov Analysis
Step-1: Find the transitional frequency matrix
D N N N W W VW N N W W VW N D N N W W W VW D
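Step 1 can be sketched by counting transitions between consecutive entries of the sequence. The row and column order D, N, W, VW is an arbitrary choice made here; rows are the current state and columns are the following state.

```python
states = ["D", "N", "W", "VW"]
sequence = "D N N N W W VW N N W W VW N D N N W W W VW D".split()

index = {s: k for k, s in enumerate(states)}
freq = [[0] * len(states) for _ in states]

# Rows: current state; columns: following state
for current, following in zip(sequence, sequence[1:]):
    freq[index[current]][index[following]] += 1

print("    ", states)
for s, row in zip(states, freq):
    print(f"{s:>3} ", row)
```

For the sequence as printed, this gives 20 transitions in total, with row totals D = 2, N = 8, W = 7, VW = 3.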
Markov Analysis
Step-2: Find the transitional probability matrix
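The transitional probability matrix is obtained by dividing each row of the frequency matrix by its row total. The counts below were tallied from the precipitation sequence as printed, with rows and columns in the order D, N, W, VW (an ordering chosen for this sketch). The slide's own table is not reproduced in the text, so these counts should be checked against it.

```python
def transition_probabilities(freq):
    """Divide each row of a transition frequency matrix by its row total."""
    probs = []
    for row in freq:
        total = sum(row)
        probs.append([count / total if total else 0.0 for count in row])
    return probs

states = ["D", "N", "W", "VW"]
# Counts tallied from the printed sequence (rows: from, columns: to)
freq = [[0, 2, 0, 0],
        [1, 4, 3, 0],
        [0, 0, 4, 3],
        [1, 2, 0, 0]]

probs = transition_probabilities(freq)
for s, row in zip(states, probs):
    print(f"{s:>3} ", [round(p, 3) for p in row])
```

For example, the N row becomes 0.125, 0.5, 0.375, 0.0: after an N year, another N year is the most likely outcome.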
Markov Analysis
Step-3: Construction of flow diagram
Markov Analysis
Step-4: Find the likely cycles.
Markov Analysis: Testing of Transitional Frequency Matrix
χ² test for randomness in the transition frequency matrix

Dividing each column total of the observed transition frequency matrix by the total number of transitions gives the fixed probability vector. The expected random transition probability matrix is then determined by these probabilities:

State   Fixed probability
N       0.40
W       0.35
VW      0.15
D       0.10
Markov Analysis
The probabilities are converted into expected counts by multiplying by the row totals of the observed transition frequency matrix, which gives the expected random transition frequency matrix.
Now we have an observed transition frequency matrix and an expected random transition frequency matrix in the same form.
The difference between the observed and expected matrices can be evaluated by using χ².
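The construction of the expected random transition frequency matrix can be sketched as follows. The observed counts were tallied from the precipitation sequence as printed, in the order D, N, W, VW (an ordering chosen for this sketch), and should be checked against the slide's table, which is not reproduced in the text.

```python
states = ["D", "N", "W", "VW"]
# Observed transition counts (rows: from, columns: to)
observed = [[0, 2, 0, 0],
            [1, 4, 3, 0],
            [0, 0, 4, 3],
            [1, 2, 0, 0]]

total = sum(map(sum, observed))
row_totals = [sum(row) for row in observed]
col_totals = [sum(row[j] for row in observed) for j in range(len(states))]

# Fixed probability vector: column totals divided by the number of transitions
fixed = [c / total for c in col_totals]

# Expected counts under randomness: row total times fixed probability
expected = [[rt * p for p in fixed] for rt in row_totals]

print(dict(zip(states, fixed)))
for s, row in zip(states, expected):
    print(f"{s:>3} ", [round(e, 2) for e in row])
```

The fixed probability vector comes out as D = 0.1, N = 0.4, W = 0.35, VW = 0.15, which agrees with the values quoted on the slide.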
Markov Analysis
Null Hypothesis (H0): the data come from a population of transitions that are random; the probability of encountering a climate is not dependent on the previous climate.
Alternative Hypothesis (HA): the data come from a population of transitions that are non-random; the probability of encountering a climate is dependent on the previous climate.
Markov Analysis

$$\chi^2 = \sum_{j} \frac{(O_j - E_j)^2}{E_j}$$

Degrees of freedom:
v = ((no. of states) – 1)² = (4 – 1)² = 9
χ²(0.05, 9) = 16.92
χ²(calculated) > χ²(critical)
Null hypothesis rejected.
Decision: There is a significant Markov property. The occurrence of a climate is, to an extent, dependent on the preceding climate.
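The χ² calculation and the degrees of freedom can be sketched as a single function. The function name `chi_square_randomness` and the two-state observed matrix are hypothetical and serve only to show the arithmetic; the lecture's own observed matrix and calculated χ² are in tables that are not reproduced in the text. Cells with zero expected count are skipped, which is an assumption of this sketch.

```python
def chi_square_randomness(observed):
    """Chi-square of observed vs. expected random transition counts.

    Returns (chi2, degrees_of_freedom) with dof = (number of states - 1)**2.
    """
    n = len(observed)
    total = sum(map(sum, observed))
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(row[j] for row in observed) for j in range(n)]
    chi2 = 0.0
    for i in range(n):
        for j in range(n):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # assumption: skip cells with no expected count
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2, (n - 1) ** 2

# Hypothetical two-state transition counts, for illustration only
observed = [[8, 2],
            [3, 7]]
chi2, dof = chi_square_randomness(observed)
print(round(chi2, 3), dof)  # 5.051 1
```

With four states the function returns 9 degrees of freedom, matching the slide, and the calculated χ² is then compared with the tabulated critical value (16.92 at α = 0.05).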