
Atmospheric dynamics final project: A Trivial Test on a Bayesian Statistic Distribution Model for Air Pollution


Written Report
B02602005 顏東白

This final project is a small test of the accuracy of a statistical distribution model under development. It aims to examine further properties of the model, which may be used in a future air pollution project.

1. Theory

The model is a Bayesian model that assumes a Gaussian distribution. The velocity at each observation site gives the expected position at the next time step, while the variance is estimated from the data of all observation sites.

Two key concepts were used to optimize the estimation: accuracy and creditability.

Accuracy refers to the minimization of the squared error:

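$$\sum_j \left( \frac{\sum_i \varphi(\vec{x}_i, t_n)\, p\big(\vec{x}_j - \vec{x}_i, \vec{v}(\vec{x}_i, t)\big)}{\sum_i p\big(\vec{x}_j - \vec{x}_i, \vec{v}(\vec{x}_i, t)\big)} - \varphi(\vec{x}_j, t_{n+1}) \right)^2$$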
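As a minimal sketch of this squared-error criterion in R, assuming an isotropic Gaussian weight with variance `sigma2` centred on each site's advected position (the name `squared_error` and its arguments are illustrative and do not appear in the original code):

```r
# Hypothetical helper: squared error of the kernel-weighted one-step prediction.
# pos: n x 2 site coordinates; vel: n x 2 site velocities;
# phi_now, phi_next: values at t_n and t_{n+1}; dt: time step; sigma2: assumed variance.
squared_error <- function(pos, vel, phi_now, phi_next, dt, sigma2) {
  moved <- pos + vel * dt                  # expected position after one time step
  n <- nrow(pos)
  # d2[i, j]: squared distance from the advected site i to site j
  d2 <- outer(seq_len(n), seq_len(n), function(i, j) {
    (pos[j, 1] - moved[i, 1])^2 + (pos[j, 2] - moved[i, 2])^2
  })
  w <- exp(-d2 / (2 * sigma2))             # Gaussian weights p(x_j - x_i, v_i)
  pred <- colSums(w * phi_now) / colSums(w) # weighted mean over i for each site j
  sum((pred - phi_next)^2)
}

# Illustrative data: three sites in a solid-body rotation, uniform field.
pos <- cbind(c(0, 1, 0), c(0, 0, 1))
vel <- cbind(-0.5 * pos[, 2], 0.5 * pos[, 1])
squared_error(pos, vel, rep(3, 3), rep(3, 3), dt = 0.01, sigma2 = 0.4)
```

A uniform field is reproduced exactly by any weighted mean, so the error in this example is zero; it only serves as a sanity check on the weighting.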

Creditability refers to maximizing how well the sample points represent one another. This is done by maximizing the logarithm of the likelihood function:

$$\ln L\big(\sigma^2 \mid (x_i)_n\big) = -n^2 \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{ij} d_{ij}^2$$


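Minimization of the squared error yields:

$$\sum_{ij} \varphi_j d_{ij}^2 e^{-\frac{d_{ij}^2}{2\sigma^2}} \sum_{ij} e^{-\frac{d_{ij}^2}{2\sigma^2}} = \sum_{ij} \varphi_j e^{-\frac{d_{ij}^2}{2\sigma^2}} \sum_{ij} d_{ij}^2 e^{-\frac{d_{ij}^2}{2\sigma^2}}$$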

Maximization of the logarithm of the likelihood function yields:

$$\hat{\sigma}^2 = \frac{1}{2n^2} \sum_{ij} d_{ij}^2$$
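A short R sketch of this estimator, assuming for simplicity that the displacement by the site velocities is neglected, so that d_ij is the plain distance between sites i and j (the name `mle_variance` is hypothetical):

```r
# Hypothetical helper: variance estimate as the sum of squared pairwise
# distances divided by 2 n^2.
mle_variance <- function(x, y) {
  n <- length(x)
  d2 <- outer(x, x, "-")^2 + outer(y, y, "-")^2  # d_ij^2 for all pairs, including i = j
  sum(d2) / (2 * n^2)
}

# Two sites one unit apart: the sum of d_ij^2 is 2 and n = 2, giving 2 / 8.
mle_variance(c(0, 1), c(0, 0))
```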

An appropriate selection of sites is to select the sites for which

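$$E\left[\sum_{ij} \varphi_j d_{ij}^2 e^{-\frac{d_{ij}^2}{2\hat{\sigma}^2}} \sum_{ij} e^{-\frac{d_{ij}^2}{2\hat{\sigma}^2}}\right] = E\left[\sum_{ij} \varphi_j e^{-\frac{d_{ij}^2}{2\hat{\sigma}^2}} \sum_{ij} d_{ij}^2 e^{-\frac{d_{ij}^2}{2\hat{\sigma}^2}}\right]$$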

where $\hat{\sigma}$ is the standard deviation estimated by maximum likelihood (MLE). This selection criterion was not used because of its complexity; instead, a more naive estimate was adopted. We minimized the function

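$$\left(-\ln(2\pi\sigma^2) - \frac{1}{2n^2\sigma^2} \sum_{ij} d_{ij}^2\right)^2 + \left(\sum_{ij} \varphi_j d_{ij}^2 e^{-\frac{d_{ij}^2}{2\sigma^2}} \sum_{ij} e^{-\frac{d_{ij}^2}{2\sigma^2}} - \sum_{ij} \varphi_j e^{-\frac{d_{ij}^2}{2\sigma^2}} \sum_{ij} d_{ij}^2 e^{-\frac{d_{ij}^2}{2\sigma^2}}\right)^2$$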

as a trade-off between the MSE and MLE considerations.
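The hybrid function can be sketched in R as follows. This is an illustration under stated assumptions rather than the code used in the project: the name `hybrid_objective`, the five-site test data, and the use of `optimize()` in place of a manual scan over candidate values are all choices made here for brevity.

```r
# Hypothetical helper: hybrid objective combining the MLE and MSE terms.
# sigma2: trial variance; d2: n x n matrix of squared distances d_ij^2;
# phi: length-n vector of observed values (phi[j] belongs to column j).
hybrid_objective <- function(sigma2, d2, phi) {
  n <- nrow(d2)
  w <- exp(-d2 / (2 * sigma2))                 # Gaussian weights
  phi_mat <- matrix(phi, n, n, byrow = TRUE)   # phi_mat[i, j] = phi[j]
  mle_term <- -log(2 * pi * sigma2) - sum(d2) / (2 * n^2 * sigma2)
  mse_term <- sum(phi_mat * d2 * w) * sum(w) - sum(phi_mat * w) * sum(d2 * w)
  mle_term^2 + mse_term^2
}

# Illustrative data: five sites on a line with a Gaussian-shaped field.
x <- c(-1, -0.5, 0, 0.5, 1)
d2 <- outer(x, x, "-")^2
phi <- exp(-x^2)
fit <- optimize(hybrid_objective, interval = c(0.05, 5), d2 = d2, phi = phi)
fit$minimum   # variance minimizing the hybrid objective on this interval
```

`optimize()` performs a one-dimensional search and may settle on a local minimum, so plotting the objective over the interval first remains a useful check.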


2. Code

##Environment Setting
library(KernSmooth) #provides bkde2D for 2-D kernel density estimation
library(fields)
##Parameters Setting
#Gaussian Function: standard normal density, matching the N(0,1) samples below
G<-function(x){
  b=exp(-x^2/2)/sqrt(2*pi)
  return(b)
}
t=0.01 #time step
#random generator: 200 particles drawn from N(0,1) in x and y
x_0=rnorm(200,0,1)
y_0=rnorm(200,0,1)

data_0=cbind(x_0,y_0)
#process initial est.: kernel density estimate on a 1001 x 1001 grid, bandwidth 0.5
dis_0=bkde2D(data_0,c(0.5,0.5),gridsize=c(1001L,1001L))
se=matrix(nrow=1001,ncol=1001,byrow=TRUE) #squared error
rse=matrix(nrow=1001,ncol=1001,byrow=TRUE) #relative squared error
for(i in 1:1001){
  for(j in 1:1001){
    se[i,j]=(G(dis_0$x1[i])*G(dis_0$x2[j])-dis_0$fhat[i,j])^2
    rse[i,j]=((G(dis_0$x1[i])*G(dis_0$x2[j])-dis_0$fhat[i,j])/(G(dis_0$x1[i])*G(dis_0$x2[j])))^2
  }
}
##Run advection using RK4
#velocity field: solid-body rotation with angular velocity 0.5
ode<-function(x){

  m=matrix(ncol=2,nrow=1,byrow=TRUE)
  m[1]=-.5*x[2]
  m[2]=.5*x[1]
  return(m)
}
#the Runge-Kutta Method (classical fourth order, step size t)
RK<-function(x){
  k_1=ode(x)*t
  k_2=ode(x+0.5*k_1)*t
  k_3=ode(x+0.5*k_2)*t
  k_4=ode(x+k_3)*t
  x_f=x+(k_1+2*k_2+2*k_3+k_4)/6
  return(x_f)
}

#Run Advection: advance all 200 particles by 5000 steps
data=matrix(ncol=2,nrow=200,byrow=TRUE)
data=data_0
for(i in 1:5000){
  for(j in 1:200){
    data[j,]=RK(data[j,])
  }
}
#process final est.: compare the density of the advected particles with G
dis=bkde2D(data,c(0.5,0.5),gridsize=c(1001L,1001L))
se=matrix(nrow=1001,ncol=1001,byrow=TRUE)
rse=matrix(nrow=1001,ncol=1001,byrow=TRUE)
for(i in 1:1001){
  for(j in 1:1001){
    se[i,j]=(G(dis$x1[i])*G(dis$x2[j])-dis$fhat[i,j])^2

    rse[i,j]=((G(dis$x1[i])*G(dis$x2[j])-dis$fhat[i,j])/(G(dis$x1[i])*G(dis$x2[j])))^2
  }
}
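Because the velocity field is a solid-body rotation, each particle should stay at a constant distance from the origin during the advection. The following sketch checks this for the RK4 scheme. It assumes the same angular velocity (0.5), step size (0.01) and number of steps (5000) as the script above, but advances all particles at once as an n x 2 matrix; the names `velocity` and `rk4_step` are illustrative.

```r
# Solid-body rotation with angular velocity 0.5; x is an n x 2 matrix of positions.
velocity <- function(x) cbind(-0.5 * x[, 2], 0.5 * x[, 1])

# One classical fourth-order Runge-Kutta step of size dt for all particles.
rk4_step <- function(x, dt) {
  k1 <- velocity(x) * dt
  k2 <- velocity(x + 0.5 * k1) * dt
  k3 <- velocity(x + 0.5 * k2) * dt
  k4 <- velocity(x + k3) * dt
  x + (k1 + 2 * k2 + 2 * k3 + k4) / 6
}

set.seed(1)
particles <- cbind(rnorm(200), rnorm(200))
r_before <- sqrt(rowSums(particles^2))
for (step in 1:5000) particles <- rk4_step(particles, dt = 0.01)
r_after <- sqrt(rowSums(particles^2))
max(abs(r_after - r_before))   # drift in radius; should be close to zero
```

Operating on the whole matrix avoids the inner loop over particles, which matters little for 200 particles but keeps the check short.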

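##Kriging Estimation
#Gaussian Function: defines both the test field and the weighting kernel
G<-function(x){
  b=exp(-x^2)/sqrt(2*pi)
  return(b)
}
t=.01 #time step
#input observed data
ns=31 #number of observation sites
obs=matrix(nrow=ns,ncol=3,byrow=TRUE) #columns: value, x-velocity, y-velocity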

#sites on three rings (radius 1, 2 and 0.5), ten per ring, plus one at the origin
x_obs=numeric(ns)
y_obs=numeric(ns)
for(i in 1:10){
  x_obs[i]=cos(2*pi*i/10)
  x_obs[i+10]=2*cos(2*pi*i/10)
  x_obs[i+20]=.5*cos(2*pi*i/10)
  y_obs[i]=sin(2*pi*i/10)
  y_obs[i+10]=2*sin(2*pi*i/10)
  y_obs[i+20]=.5*sin(2*pi*i/10)
}
x_obs[31]=0
y_obs[31]=0
d=matrix(nrow=length(x_obs),ncol=length(y_obs),byrow=TRUE)
sigma=c()
for(j in 1:length(x_obs)){
  obs[j,1]=G(x_obs[j])*G(y_obs[j])

  obs[j,2]=-.5*y_obs[j]
  obs[j,3]=.5*x_obs[j]
}
#d[j,k]: squared distance from site j, displaced by its velocity over time t, to site k
for(j in 1:length(x_obs)){
  for(k in 1:length(x_obs)){
    d[j,k]=((x_obs[k]-x_obs[j])-obs[j,2]*t)^2+((y_obs[k]-y_obs[j])-obs[j,3]*t)^2
  }
}
#MSE: scan candidate values int*i for i from L to U
f=c()
int=.001
L=410
U=440
c_1=matrix(nrow=length(x_obs),ncol=length(y_obs),byrow=TRUE)

c_2=matrix(nrow=length(x_obs),ncol=length(y_obs),byrow=TRUE)
c_3=matrix(nrow=length(x_obs),ncol=length(y_obs),byrow=TRUE)
c_4=matrix(nrow=length(x_obs),ncol=length(y_obs),byrow=TRUE)
for(i in L:U){
  for(j in 1:length(x_obs)){
    for(k in 1:length(y_obs)){
      c_1[j,k]=obs[j,1]*d[j,k]*G(-sqrt(d[j,k]/(int*i)))
      c_2[j,k]=G(-sqrt(d[j,k]/(int*i)))
      c_3[j,k]=obs[j,1]*G(-sqrt(d[j,k]/(int*i)))
      c_4[j,k]=d[j,k]*G(-sqrt(d[j,k]/(int*i)))
    }
  }
  #squared residual of the MSE condition at this candidate value
  f[i]=(sum(c_1)*sum(c_2)-sum(c_4)*sum(c_3))^2
}
plot(seq(int*L,int*U,int),f[c(L:U)],main="Constrain Distribution",xlab="sigma",ylab="f(x)")

sigma=0.42 #minimum of f read from the plot (locating it automatically requires a sorting function)
#MLE
sigma=sum(d)/2/ns^2
#hybrid: scan candidate values int*i for i from L to U on a finer grid
f=c()
g=c()
h=c()
int=.00001
L=43590
U=43600
c_1=matrix(nrow=length(x_obs),ncol=length(y_obs),byrow=TRUE)
c_2=matrix(nrow=length(x_obs),ncol=length(y_obs),byrow=TRUE)
c_3=matrix(nrow=length(x_obs),ncol=length(y_obs),byrow=TRUE)
c_4=matrix(nrow=length(x_obs),ncol=length(y_obs),byrow=TRUE)

for(i in L:U){
  for(j in 1:length(x_obs)){
    for(k in 1:length(y_obs)){
      c_1[j,k]=obs[j,1]*d[j,k]*G(-sqrt(d[j,k]/(int*i)))
      c_2[j,k]=G(-sqrt(d[j,k]/(int*i)))
      c_3[j,k]=obs[j,1]*G(-sqrt(d[j,k]/(int*i)))
      c_4[j,k]=d[j,k]*G(-sqrt(d[j,k]/(int*i)))
    }
  }
  f[i]=(sum(c_1)*sum(c_2)-sum(c_4)*sum(c_3))^2 #MSE term
  g[i]=(log(2*pi*(int*i)^2)+sum(d)/2/(int*i)/ns^2)^2 #MLE term
  h[i]=g[i]+f[i] #hybrid objective
}
plot(seq(int*L,int*U,int),h[c(L:U)],main="Constrain Distribution",xlab="sigma",ylab="h(x)")

sigma=.43595 #value chosen from the hybrid scan above
#set grid
x_est=seq(-3,3,0.1)
y_est=seq(-3,3,0.1)
d_est=array(dim=c(length(x_est),length(y_est),length(x_obs)))
#estimation
est=matrix(nrow=length(x_est),ncol=length(y_est),byrow=TRUE) #estimated field
err=matrix(nrow=length(x_est),ncol=length(y_est),byrow=TRUE) #squared error
rel=matrix(nrow=length(x_est),ncol=length(y_est),byrow=TRUE) #relative squared error
for(i in 1:length(x_est)){
  for(j in 1:length(y_est)){
    for(s in 1:length(x_obs)){
      #squared distance from the displaced site s to grid point (i,j)
      d_est[i,j,s]=((x_est[i]-x_obs[s])-obs[s,2]*t)^2+((y_est[j]-y_obs[s])-obs[s,3]*t)^2
    }

    #kernel-weighted mean of the observed values
    est[i,j]=sum(obs[,1]*G(-sqrt(d_est[i,j,]/sigma)))/sum(G(-sqrt(d_est[i,j,]/sigma)))
    err[i,j]=(G(x_est[i])*G(y_est[j])-est[i,j])^2
    rel[i,j]=err[i,j]/(G(x_est[i])*G(y_est[j]))^2
  }
}

3. Results and Discussion

1) The error is greatest midway between sample points.
2) The relative error is acceptable only within the circle of radius 1, regardless of how the sites are selected.
3) Site selection only alters the predictive ability within the circle r < 1.
4) Inside r < 0.5, the predictive ability decreases; this can be improved by selecting sites closer to the origin.
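One way to examine statements about the error inside a given radius is to average a gridded error field over rings around the origin. The sketch below is illustrative only: `radial_mean` is a hypothetical helper, and the field passed to it is a synthetic stand-in (r squared), not the `rel` matrix computed in the project.

```r
# Hypothetical helper: mean of a gridded field within rings of radius r.
# field[i, j] is assumed to belong to the grid point (x[i], y[j]).
radial_mean <- function(field, x, y, breaks) {
  r <- sqrt(outer(x^2, y^2, "+"))   # radius of every grid point
  ring <- cut(as.vector(r), breaks = breaks, include.lowest = TRUE)
  tapply(as.vector(field), ring, mean)   # points outside all rings are dropped
}

# Synthetic stand-in for an error field on the same grid as the estimation.
x_grid <- seq(-3, 3, 0.1)
demo_field <- outer(x_grid^2, x_grid^2, "+")
radial_mean(demo_field, x_grid, x_grid, breaks = c(0, 0.5, 1, 2, 3))
```

Applied to a relative-error matrix, the ring means for r < 0.5, 0.5 < r < 1 and r > 1 could then be compared directly.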