
Neural Network Models

김진석

Department of Statistics and Information Science

Dongguk University

E-mail: jinseog.kim@gmail.com

September 2008


Contents

Section 1. Structure of the Neural Network Model

Section 2. Fitting the Neural Network Model
  2.1 Learning Algorithms
  2.2 Some issues in learning networks

Section 3. R Practice
  3.1 Edgar Anderson's Iris Data

Section 4. Comparing Tree Models and Neural Networks


A neural network is a multi-stage regression or classification model.

• Regression problem

• Classification problem

[Diagram: a feed-forward network with three input nodes, two hidden nodes, and two output nodes; weights w_{ij} connect the input layer to the hidden layer, and weights v_{jk} connect the hidden layer to the output layer.]

Figure 1: Feed-forward neural network


Section 1. Structure of the Neural Network Model

Consider a model with p input variables, M hidden units, and K output units:

$$f_k(x) = g_k\left[c_2\big(\sigma(c_1(x))\big)\right] = g_k\left[v_{0k} + \sum_{j=1}^{M} v_{jk}\,\sigma\Big(w_{0j} + \sum_{i=1}^{p} w_{ij}x_i\Big)\right]$$

Here $c_1$ and $c_2$ are called combination functions, and $\sigma$ and $g_k$ are called activation functions.


In the example above, the combination function from the input nodes to the j-th hidden node has the form

$$c_1(y) = w_{0j} + \sum_{i=1}^{p} w_{ij}y_i.$$

The combination function from the hidden nodes to output node k is

$$c_2(y_1^h, \ldots, y_M^h) = v_{0k} + \sum_{j=1}^{M} v_{jk}y_j^h.$$

The activation function at hidden node j is the logistic function,

$$\sigma(x_j^h) = \frac{\exp(x_j^h)}{1 + \exp(x_j^h)},$$


and the activation function at the output nodes takes one of the following forms:

$$g_k(x) = \begin{cases} \dfrac{\exp(x_k)}{\sum_{l=1}^{K}\exp(x_l)} & \text{softmax,}\\ x_k & \text{identity,}\\ I(x_k > 0) & \text{threshold.} \end{cases}$$

Table 1: Combination and activation functions for network units

                       input units   hidden units   output units
  combination (c(x))   identity      linear         linear
  activation (f(x))    identity      logistic       logistic, linear, softmax
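As a concrete illustration, the forward pass $f_k(x)$ can be written out directly in R. This is a minimal sketch of the formula above with assumed example dimensions (p = 4, M = 2, K = 3) and randomly drawn weights, not nnet's internal code:

## Forward pass: logistic hidden units, softmax output units.
set.seed(1)
p <- 4; M <- 2; K <- 3
W <- matrix(rnorm(M * (p + 1)), nrow = p + 1)  # column j: (w_0j, w_1j, ..., w_pj)
V <- matrix(rnorm(K * (M + 1)), nrow = M + 1)  # column k: (v_0k, v_1k, ..., v_Mk)

x <- rnorm(p)                      # one input vector
h <- plogis(colSums(W * c(1, x)))  # sigma(c_1(x)) at each hidden node
o <- colSums(V * c(1, h))          # c_2 at each output node
f <- exp(o) / sum(exp(o))          # softmax g_k; the K outputs sum to 1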


Section 2. Fitting the Neural Network Model

For a network with one hidden layer, the parameters (weights) to be estimated are

$$\{w_{0j}, \mathbf{w}_j : j = 1, \ldots, M\} \qquad M \times (p+1) \text{ weights,}$$
$$\{v_{0k}, \mathbf{v}_k : k = 1, \ldots, K\} \qquad K \times (M+1) \text{ weights.}$$

These parameters are estimated by finding the values that minimize an objective function (error function):

$$E = \sum_{i=1}^{N}\sum_{k=1}^{K} \big(y_{ik} - f_k(x_i)\big)^2 \quad \text{for regression,}$$
$$E = -\sum_{i=1}^{N}\sum_{k=1}^{K} y_{ik}\log f_k(x_i) \quad \text{for classification.}$$
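Given an N x K target matrix Y and the corresponding matrix F of network outputs $f_k(x_i)$, both objective functions are one-liners in R (a sketch; Y and F are assumed placeholders):

## Error functions over N cases and K outputs.
sse     <- sum((Y - F)^2)     # sum-of-squares error, for regression
entropy <- -sum(Y * log(F))   # cross-entropy; Y a 0/1 class-indicator matrix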


2.1 Learning Algorithms

• Backpropagation Algorithm

• Steepest descent

• DFP(Davidon-Fletcher-Powell)

• BFGS(Broyden-Fletcher-Goldfarb-Shanno) algorithm

2.2 Some issues in learning networks

• Starting values

• Overfitting - regularization (stopping rule, weight decay)

• Scaling of input variables (see the sketch after this list)


• Number of hidden units and layers

• Multiple minima
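Two of these issues, input scaling and weight decay, are handled directly in a typical nnet call. A minimal sketch (standardizing with scale() and the particular decay value are illustrative choices, not prescriptions):

library(nnet)

## Put all inputs on a comparable scale, then fit with a small
## weight-decay penalty to limit overfitting.
iris.sc <- iris
iris.sc[, 1:4] <- scale(iris.sc[, 1:4])   # center and scale each input
fit <- nnet(Species ~ ., data = iris.sc, size = 2,
            decay = 1e-3,                 # weight-decay regularization
            maxit = 200, trace = FALSE)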

Section 3. R Practice

library(nnet)

nnet(
  formula,  # a formula of the form class ~ x1 + x2 + ...
  weights,  # case weights for each example; default is 1
  size,     # number of units in the hidden layer
  data,     # data frame
  linout,   # switch for linear output units; default is logistic
  entropy,  # switch for entropy fitting; default is least-squares
  softmax,  # switch for softmax output
  decay,    # parameter for weight decay; default 0
  maxit,    # maximum number of iterations; default 100
  trace     # switch for tracing optimization; default TRUE
)
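The iris example below uses the classification defaults; for a numeric target, linout = TRUE gives linear output units fitted by least squares. A hypothetical sketch (the choice of Sepal.Length as target is illustrative):

## Regression network: predict Sepal.Length from the other measurements.
fit.reg <- nnet(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
                data = iris, size = 3, linout = TRUE,
                decay = 1e-3, maxit = 500, trace = FALSE)
sqrt(mean(fit.reg$residuals^2))   # training RMSE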

3.1 Edgar Anderson’s Iris Data

This famous (Fisher's or Anderson's) iris data set gives the measurements in cm of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris.

• Target: Species (setosa, versicolor, and virginica).

• Inputs: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width.


> iris[sample(1:150, 6),]

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

110 7.2 3.6 6.1 2.5 virginica

68 5.8 2.7 4.1 1.0 versicolor

143 5.8 2.7 5.1 1.9 virginica

6 5.4 3.9 1.7 0.4 setosa

103 7.1 3.0 5.9 2.1 virginica

102 5.8 2.7 5.1 1.9 virginica

pdf("iris.pdf")

plot(iris[,1:3], col=as.integer(iris$Species),

pch=substring((iris$Species),1,1))

dev.off()

Partition data into training data and test data


[Scatterplot matrix of Sepal.Length, Sepal.Width, and Petal.Length; points are plotted with the first letter of the species (s = setosa, v = versicolor or virginica).]

Figure 2: Iris data


data(iris)

# use half the iris data

samp <- c(sample(1:50,25), sample(51:100,25), sample(101:150,25))

iris.tr<-iris[samp,]

iris.te<-iris[-samp,]

Neural Network modeling

ir1 <- nnet(Species~., data=iris.tr, size = 2, decay = 5e-4)

# weights: 19

initial value 83.513437

iter 10 value 1.427673

iter 20 value 0.620816

iter 30 value 0.526732

iter 40 value 0.484637


iter 50 value 0.445681

...

iter 90 value 0.370717

iter 100 value 0.369270

final value 0.369270

stopped after 100 iterations

Viewing the fitted neural network model

> names(ir1)

"value" : 에러함수의 값

"wts" : 모수추정치

"fitted.values" : output추정치

"residuals" : 잔차

> summary(ir1)


a 4-2-3 network with 19 weights

options were - softmax modelling decay=5e-04

b->h1 i1->h1 i2->h1 i3->h1 i4->h1   # hidden node 1
-6.48 -5.24 -3.83 8.15 6.37

b->h2 i1->h2 i2->h2 i3->h2 i4->h2   # hidden node 2
0.39 0.61 1.79 -3.02 -1.30

b->o1 h1->o1 h2->o1   # output node 1
-2.48 -1.83 9.14

b->o2 h1->o2 h2->o2   # output node 2
5.96 -9.13 -7.81

b->o3 h1->o3 h2->o3   # output node 3
-3.54 10.84 -1.26
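These are exactly the w and v weights of Section 1, so the fitted probabilities can be reconstructed by hand. A minimal sketch, assuming nnet's weight ordering (hidden units first, with the bias leading each unit's incoming weights, as displayed in the summary above):

## Rebuild the softmax output for one training case from ir1$wts.
x <- as.numeric(iris.tr[1, 1:4])        # one input vector
W <- matrix(ir1$wts[1:10],  ncol = 2)   # column j: (b, i1..i4) -> hj
V <- matrix(ir1$wts[11:19], ncol = 3)   # column k: (b, h1, h2) -> ok
h <- plogis(colSums(W * c(1, x)))       # logistic hidden activations
o <- colSums(V * c(1, h))               # output combinations
exp(o) / sum(exp(o))                    # compare with ir1$fitted.values[1, ]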

Prediction function: predict new examples with a trained neural net.

predict(
  object,   # fitted model of class nnet
  newdata,  # test data set (matrix or data frame)
  type,     # type = "raw": the matrix of values returned
            #   by the trained network;
            # type = "class": the corresponding class
  ...
)

Model evaluation: test error

y<-iris.te$Species

p<- predict(ir1, iris.te, type = "class")

tt<-table(y, p)

> tt
            p
y            setosa versicolor virginica
  setosa         25          0         0
  versicolor      0         21         4
  virginica       0          0        25
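The misclassification rate follows from the off-diagonal counts (4 of the 75 test cases); as a quick check:

1 - sum(diag(tt)) / sum(tt)   # [1] 0.05333333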

Test error by the number of hidden units

test.err <- function(h.size)
{
  ir  <- nnet(Species ~ ., data = iris.tr, size = h.size,
              decay = 5e-4, trace = FALSE)
  y   <- iris.te$Species
  p   <- predict(ir, iris.te, type = "class")
  err <- mean(y != p)
  c(h.size, err)
}

out<-t(sapply(2:10, FUN=test.err))

pdf("nntest.pdf")

plot(out, type="b", xlab="The number of Hidden units",

ylab="Test Error")

dev.off()

Section 4. Comparing Tree Models and Neural Networks

library(tree)

ir.t <- tree(Species~., data=iris.tr, minsize=2, mincut=1)


[Plot: test error against the number of hidden units (x-axis 2-10, y-axis roughly 0.014-0.026).]

Figure 3: Test error by the number of hidden units


cvt <- cv.tree(ir.t, FUN=prune.misclass)

for(i in 2:5)
{
  cvt$dev <- cvt$dev + cv.tree(ir.t, FUN=prune.misclass)$dev
}

cvt$dev <- cvt$dev/5   # average the deviance over the 5 CV runs

K<-cvt$size[which.min(cvt$dev)]

ir.tp<-prune.misclass(ir.t, best=K)

pdf("tree_iris.pdf")

par(mfrow=c(1,2))

plot(cvt); plot(ir.tp); text(ir.tp)

dev.off()

Comparing the tree model with the neural network

> p <- predict(ir.tp, iris.te, type = "class")

> err <- mean(iris.te$Species != p)

> err

[1] 0.06666667

> test.err(2) ## NN model with 2 hidden units

[1] 2.00000000 0.05333333


[Left panel: cross-validated misclassification count versus tree size (1-5). Right panel: the pruned tree, splitting on Petal.Length < 2.3, then Petal.Length < 4.95, Petal.Width < 1.65, and Sepal.Length < 6, with leaves labeled setosa, versicolor, and virginica.]

Figure 4: Tree model with Iris data
