Neural Network Models
김진석
Department of Statistics and Information Science
Dongguk University
E-mail: [email protected]
September 2008
Contents

Section 1. Structure of Neural Network Models
Section 2. Fitting Neural Network Models
  2.1 Learning Algorithms
  2.2 Some issues in learning networks
Section 3. R Practice
  3.1 Edgar Anderson's Iris Data
Section 4. Comparing Tree Models and Neural Networks
A neural network is a multi-stage regression or classification model.
• Regression problem
• Classification problem
[Figure 1: Feed-forward neural network. Input layer, hidden layer, and output layer; weights w connect input units to hidden units and weights v connect hidden units to output units.]
Section 1. Structure of Neural Network Models
A model with p input variables, M hidden units, and K output units has the form

f_k(x) = g_k\left[ c_2(\sigma(c_1(x))) \right] = g_k\left[ v_{0k} + \sum_{j=1}^{M} v_{jk}\, \sigma\left( w_{0j} + \sum_{i=1}^{p} w_{ij} x_i \right) \right]

where c_1 and c_2 are called combination functions, and \sigma and g_k are called activation functions.
In the figure above, the combination function from the input nodes to the j-th hidden node has the form

c_1(y) = w_{0j} + \sum_{i=1}^{p} w_{ij} y_i.
The combination function from the hidden nodes to output node k is

c_2(y_1^h, \ldots, y_M^h) = v_{0k} + \sum_{j=1}^{M} v_{jk} y_j^h.
At hidden node j, the activation function is the logistic function

\sigma(x_j^h) = \frac{\exp(x_j^h)}{1 + \exp(x_j^h)},
and at the output nodes the activation function takes one of the following forms:

g_k(x) = \begin{cases}
\exp(x_k) / \sum_{l=1}^{K} \exp(x_l), & \text{softmax} \\
x_k, & \text{identity} \\
I(x_k > 0), & \text{threshold}
\end{cases}
Table 1: Combination and activation functions for network units

                      input units    hidden units    output units
  combination c(x)    identity       linear          linear
  activation f(x)     identity       logistic        logistic, linear, or softmax
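To make the forward pass concrete, here is a minimal R sketch of f_k(x) for a 1-hidden-layer network with logistic hidden units and softmax outputs; the weight matrices W and V and the input x are made-up illustrative values, not taken from these notes.

sigma   <- function(z) 1 / (1 + exp(-z))     # logistic activation at hidden nodes
softmax <- function(z) exp(z) / sum(exp(z))  # output activation g_k

# Illustrative sizes: p = 4 inputs, M = 2 hidden units, K = 3 outputs
W <- matrix(rnorm(2 * 5), nrow = 2)  # row j holds (w_{0j}, w_{1j}, ..., w_{pj})
V <- matrix(rnorm(3 * 3), nrow = 3)  # row k holds (v_{0k}, v_{1k}, ..., v_{Mk})

forward <- function(x, W, V) {
  h <- sigma(W %*% c(1, x))  # c_1 plus logistic: hidden outputs y_j^h
  softmax(V %*% c(1, h))     # c_2 plus softmax: f_k(x), k = 1, ..., K
}

forward(c(5.1, 3.5, 1.4, 0.2), W, V)  # K = 3 class probabilities summing to 1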
Section 2. Fitting Neural Network Models
In the 1-hidden-layer case, the parameters (weights) to be estimated are:

{w_{0j}, w_j : j = 1, \ldots, M} : M × (p + 1) weights
{v_{0k}, v_k : k = 1, \ldots, K} : K × (M + 1) weights
The parameters are estimated by finding the values that minimize the following objective function (error function):

E = \sum_{i=1}^{N} \sum_{k=1}^{K} (y_{ik} - f_k(x_i))^2, for regression

E = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log f_k(x_i), for classification
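As a quick illustration, the two error functions can be written in R as below, assuming (hypothetically) that Y is the N × K matrix of targets (one-hot coded in the classification case) and Fhat the matrix of fitted values f_k(x_i):

## Objective-function sketch; Y and Fhat are hypothetical N x K matrices
sse_error     <- function(Y, Fhat) sum((Y - Fhat)^2)    # regression: squared error
entropy_error <- function(Y, Fhat) -sum(Y * log(Fhat))  # classification: cross-entropy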
2.1 Learning Algorithms
• Backpropagation Algorithm
• Steepest descent (a generic sketch follows this list)
• DFP (Davidon-Fletcher-Powell) algorithm
• BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm
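As a generic illustration of the steepest-descent idea, a minimal R sketch; this is illustrative only, not nnet's internal optimizer, and grad_fn, lr, and steepest_descent are hypothetical names:

## Generic steepest-descent sketch (not nnet's actual optimizer)
## grad_fn(w) should return the gradient of the error function at weights w
steepest_descent <- function(w, grad_fn, lr = 0.01, maxit = 100) {
  for (it in seq_len(maxit)) {
    w <- w - lr * grad_fn(w)  # step against the gradient direction
  }
  w
}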
2.2 Some issues in learning networks
• Starting values
• Overfitting - regularization (stopping rule, weight decay)
• Scaling of input variables
• Number of hidden units and layers
• Multiple minima
Section 3. R Practice
library(nnet)
nnet(
formula, # A formula of the form class ~ x1 + x2 + ...
weights, # case weights for each example. Default is 1.
size, # number of units in the hidden layer
data, # Data frame
linout, # switch for linear output units. Default is logistic.
entropy, # switch for entropy (cross-entropy) fit. Default is least-squares.
softmax, # switch for softmax
decay, # parameter for weight decay. Default 0.
maxit, # maximum number of iterations. Default 100.
trace, # switch for tracing optimization. Default TRUE.
)
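For example, a minimal hypothetical call for a regression-type fit; the data frame dat and variables y, x1, x2 are made up for illustration:

library(nnet)
## Hypothetical regression fit: linear output units, least-squares error
fit <- nnet(y ~ x1 + x2, data = dat, size = 3, linout = TRUE,
            decay = 1e-3, maxit = 200, trace = FALSE)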
3.1 Edgar Anderson’s Iris Data
This famous (Fisher's or Anderson's) iris data set gives the measurements in cm of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris.
• Target: Species (setosa, versicolor, and virginica).
• Inputs: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width.
> iris[sample(1:150, 6),]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
110 7.2 3.6 6.1 2.5 virginica
68 5.8 2.7 4.1 1.0 versicolor
143 5.8 2.7 5.1 1.9 virginica
6 5.4 3.9 1.7 0.4 setosa
103 7.1 3.0 5.9 2.1 virginica
102 5.8 2.7 5.1 1.9 virginica
pdf("iris.pdf")
plot(iris[,1:3], col=as.integer(iris$Species),
pch=substring((iris$Species),1,1))
dev.off()
Partition the data into training and test sets
[Figure 2: Iris data. Pairs plot of Sepal.Length, Sepal.Width, and Petal.Length, with points marked by the first letter of Species.]
data(iris)
# use half the iris data
samp <- c(sample(1:50,25), sample(51:100,25), sample(101:150,25))
iris.tr<-iris[samp,]
iris.te<-iris[-samp,]
Neural Network modeling
ir1 <- nnet(Species~., data=iris.tr, size = 2, decay = 5e-4)
# weights: 19
initial value 83.513437
iter 10 value 1.427673
iter 20 value 0.620816
iter 30 value 0.526732
iter 40 value 0.484637
iter 50 value 0.445681
...
iter 90 value 0.370717
iter 100 value 0.369270
final value 0.369270
stopped after 100 iterations
View the fitted neural network model
> names(ir1)
"value" : 에러함수의 값
"wts" : 모수추정치
"fitted.values" : output추정치
"residuals" : 잔차
> summary(ir1)
a 4-2-3 network with 19 weights
options were - softmax modelling decay=5e-04
b->h1 i1->h1 i2->h1 i3->h1 i4->h1 # hidden node 1
-6.48 -5.24 -3.83 8.15 6.37
b->h2 i1->h2 i2->h2 i3->h2 i4->h2 # hidden node 2
0.39 0.61 1.79 -3.02 -1.30
b->o1 h1->o1 h2->o1 # output node 1
-2.48 -1.83 9.14
b->o2 h1->o2 h2->o2 # output node 2
5.96 -9.13 -7.81
b->o3 h1->o3 h2->o3 # output node 3
-3.54 10.84 -1.26
Prediction function: predict new examples using a trained neural net.
predict(
object, # an object of class nnet (the fitted model).
newdata,# test data set (matrix or data frame)
type, # type = "raw", the matrix of values
# returned by the trained network;
# type = "class", the corresponding class .
...
)
Model evaluation: test error
y<-iris.te$Species
p<- predict(ir1, iris.te, type = "class")
tt<-table(y, p)
p
y setosa versicolor virginica
setosa 25 0 0
versicolor 0 21 4
virginica 0 0 25
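The test error rate is the off-diagonal proportion of this confusion matrix:

1 - sum(diag(tt)) / sum(tt)  # 4 of 75 misclassified = 0.05333333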
Test error by the number of hidden units
test.err<-function(h.size)
{
ir <- nnet(Species~., data=iris.tr, size = h.size,
decay = 5e-4, trace=F)
y<-iris.te$Species
p<- predict(ir, iris.te, type = "class")
err<-mean(y != p)
c(h.size, err)
}
out<-t(sapply(2:10, FUN=test.err))
pdf("nntest.pdf")
plot(out, type="b", xlab="The number of Hidden units",
ylab="Test Error")
dev.off()
Section 4. Comparing Tree Models and Neural Networks
library(tree)
ir.t <- tree(Species~., data=iris.tr, minsize=2, mincut=1)
[Figure 3: Test error by the number of hidden units. Plot of test error against 2 to 10 hidden units.]
cvt<-cv.tree(ir.t, FUN=prune.misclass)
for(i in 2:5)
{
cvt$dev<-cvt$dev + cv.tree(ir.t, FUN=prune.misclass)$dev
}
cvt$dev <- cvt$dev/5  # average the deviance over the 5 CV runs
K<-cvt$size[which.min(cvt$dev)]
ir.tp<-prune.misclass(ir.t, best=K)
pdf("tree_iris.pdf")
par(mfrow=c(1,2))
plot(cvt); plot(ir.tp); text(ir.tp)
dev.off()
Comparing the tree model with the NN
> p<- predict(ir.tp, iris.te, type = "class")
> err <- mean(iris.te$Species != p)
> err
[1] 0.06666667
> test.err(2) ## NN model with 2 hidden units
[1] 2.00000000 0.05333333
[Figure 4: Tree model with iris data. Left: cross-validated misclassification against tree size; right: pruned tree splitting on Petal.Length < 2.3, Petal.Length < 4.95, Petal.Width < 1.65, and Sepal.Length < 6, with leaves labeled setosa, versicolor, and virginica.]