
"DMR-3 for Data Scientists"
"K Nearest Neighbor, Neural Network, Naive Bayesian, SVM"

Jinseog Kim, Dongguk University

[email protected]

2017-04-11


Supervised learning

1. generalized linear models
   1. linear regression
   2. logistic regression
   3. poisson, gamma, . . .
2. non-parametric models
   1. decision tree, k-NN
3. neural network (1 hidden layer)
4. penalized regression models
   1. ridge: l2 penalty
   2. lasso: l1 penalty
   3. elastic net: l1 + l2 penalty
5. ensembles
   1. bagging
   2. various boosting
   3. random forest
6. deep learning: h2o

Jinseog KimDongguk [email protected] ()“데이터 사이언티스트를 위한 DMR-3”“K Nearest Neighbor, Neural Network, Naive Bayesian, SVM”2017-04-11 2 / 18

Predictive modeling workflow and R methods (functions)

1. fit model: fit <- knn(trainingData, outcome, k = 5)
2. check or validate the model: print, plot, summary
3. predict using the selected model and test data: predict(fit, newdata)
4. performance evaluation: ROCR, caret packages
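A minimal runnable sketch of these four steps, using e1071::naiveBayes and the built-in iris data as stand-ins (the seed, split size, and model choice are illustrative, not prescribed by the workflow):

library(e1071)
set.seed(1)
tr <- sample(nrow(iris), 100)                        # hold out a test set
fit <- naiveBayes(Species ~ ., data = iris[tr, ])    # 1. fit model
print(fit)                                           # 2. check the fitted model
pred <- predict(fit, newdata = iris[-tr, ])          # 3. predict on test data
table(pred, iris$Species[-tr])                       # 4. evaluate: confusion matrix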


The Naive Bayes model

The conditional distributions of the input variables are mutually independent given the class; this is the naive Bayes assumption:

\[
P(Y = k \mid X_1 = x_1, \ldots, X_p = x_p)
  \propto P(X_1 = x_1, \ldots, X_p = x_p \mid Y = k)\, P(Y = k)
  \propto P(Y = k) \prod_{j=1}^{p} P(X_j = x_j \mid Y = k)
\]
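To make the factorization concrete, here is a sketch that reproduces the posterior for one iris observation by hand, assuming (as e1071::naiveBayes does for numeric inputs) Gaussian class-conditional densities:

x <- as.numeric(iris[1, 1:4])                  # one observation
prior <- table(iris$Species) / nrow(iris)      # P(Y = k)
lik <- sapply(levels(iris$Species), function(k) {
  sub <- iris[iris$Species == k, 1:4]
  # prod_j P(X_j = x_j | Y = k), one Gaussian per input variable
  prod(dnorm(x, mean = colMeans(sub), sd = apply(sub, 2, sd)))
})
post <- as.numeric(prior) * lik
post / sum(post)                               # normalized posterior P(Y = k | x)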


k-nearest neighbor classification

k-NN is a memory-based method that fits no statistical model. Given a point x, it compares the y values of the k training points nearest to x. Letting the k-nearest neighborhood N(x) be the set of the k training points closest to x,

\[
y(x) = \arg\max_{l \in \{-1, +1\}} \sum_{x_j \in N(x)} I(y_j = l)
\]

As k increases, bias grows and variance shrinks. Asymptotically, the misclassification rate of 1-NN is known to be at most twice the optimal Bayes misclassification rate.
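A from-scratch sketch of this rule (Euclidean distance, majority vote; the helper knn_one is illustrative, not from any package):

knn_one <- function(x, X, y, k = 3) {
  d <- sqrt(rowSums(sweep(X, 2, x)^2))   # distances from x to all training points
  Nx <- order(d)[1:k]                    # the k-nearest neighborhood N(x)
  names(which.max(table(y[Nx])))         # arg max_l of sum over N(x) of I(y_j = l)
}
# classify the first iris flower using the remaining 149 as training data
knn_one(as.numeric(iris[1, 1:4]), as.matrix(iris[-1, 1:4]), iris$Species[-1])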


Neural networks

Multilayer neural networks

1. Input layer: nodes corresponding to the input variables; the number of nodes equals the number of input variables.
2. Hidden layer: forms linear combinations of the values passed from the input layer, transforms them with a nonlinear function, and passes the results to the output layer or to another hidden layer.
3. Output layer: nodes corresponding to the output variable; a classification model gets one output node per class.
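A sketch of a single forward pass through such a network (4 inputs, 2 hidden nodes, 3 output classes; the weights here are random placeholders, not fitted values):

set.seed(1)
x  <- as.numeric(iris[1, 1:4])          # input layer: one node per input variable
W1 <- matrix(rnorm(8), 2, 4); b1 <- rnorm(2)
W2 <- matrix(rnorm(6), 3, 2); b2 <- rnorm(3)
z  <- 1 / (1 + exp(-(W1 %*% x + b1)))   # hidden layer: nonlinear transform of linear combinations
o  <- exp(W2 %*% z + b2)
as.numeric(o / sum(o))                  # output layer: softmax, one node per class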


[Figure: structure of a multilayer neural network model, showing the input layer (x), the hidden layer, and the output layer (y)]


SVM

Hinge loss: $(1 - y_i f(x_i))_+ := \max(1 - y_i f(x_i),\, 0)$

With tuning parameter $\lambda > 0$, the SVM minimizes the penalized hinge loss

\[
\min_{f \in \mathcal{H}} \left( \frac{1}{n} \sum_{i=1}^{n} (1 - y_i f(x_i))_+ + \lambda \|f\|^2 \right).
\]

By the representer theorem the minimizer has the form

\[
f(x) = \sum_{i=1}^{n} \alpha_i K(x_i, x),
\]

that is, f is a linear combination of kernels evaluated at the n training points. Equivalently, in terms of the feature map $\Phi$,

\[
\min_{\beta, \beta_0} \left( \frac{1}{n} \sum_{i=1}^{n} \bigl(1 - y_i (\beta_0 + \Phi(x_i)^T \beta)\bigr)_+ + \lambda \|\beta\|^2 \right).
\]
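A sketch of evaluating the penalized hinge-loss objective for a linear $f(x) = \beta_0 + x^T\beta$ on toy data (the coefficients and lambda below are arbitrary illustrations, not fitted values):

hinge_obj <- function(beta0, beta, X, y, lambda) {
  f <- as.numeric(beta0 + X %*% beta)
  # (1/n) sum_i (1 - y_i f(x_i))_+  +  lambda * ||beta||^2
  mean(pmax(1 - y * f, 0)) + lambda * sum(beta^2)
}
X <- as.matrix(iris[1:100, 1:2])
y <- ifelse(iris$Species[1:100] == "setosa", 1, -1)
hinge_obj(0, c(-1, 1), X, y, lambda = 0.1)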


R packages and functions

Package::function     Option         Description
e1071::naiveBayes     laplace        Laplace smoothing (default = 0)
class::knn            k              number of nearest neighbors
nnet::nnet            size           number of hidden nodes
                      linout         switch for linear output units
                      entropy        switch for entropy fit
                      softmax        switch for softmax
                      decay          parameter for weight decay (default = 0)
                      maxit          maximum number of backpropagation iterations (default = 100)
neuralnet::neuralnet  hidden         number of hidden nodes per layer
                      err.fct        error function ("sse" | "ce" = cross entropy)
                      act.fct        activation function ("logistic" | "tanh")
                      linear.output  switch for linear output units
                      algorithm      learning algorithm ("backprop" | "rprop+" | . . . )
e1071::svm            kernel         "linear" | "polynomial" | "radial basis"
                      gamma          parameter for the kernel function
                      type           "C-classification" | "one-classification" | "nu-classification"
                      cost           C constraint (regularization parameter; default = 1)
                      cross          K-fold cross-validation (default = 0)
datasets::iris                       Edgar Anderson's Iris Data
ElemStatLearn::spam                  spam mail data


Naive Bayes example

library(e1071)
model <- naiveBayes(Species ~ ., data = iris)
# predict
predict(model, iris[1:5, -5], type = "raw")

##      setosa   versicolor    virginica
## [1,]      1 2.981309e-18 2.152373e-25
## [2,]      1 3.169312e-17 6.938030e-25
## [3,]      1 2.367113e-18 7.240956e-26
## [4,]      1 3.069606e-17 8.690636e-25
## [5,]      1 1.017337e-18 8.885794e-26

pred <- predict(model, iris[, -5])
# confusion matrix
table(pred, iris$Species)

##
## pred         setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         47         3
##   virginica       0          3        47

# using Laplace smoothing (laplace affects only categorical predictors,
# so the numeric iris features give identical results)
model2 <- naiveBayes(Species ~ ., data = iris, laplace = 3)
pred <- predict(model2, iris[, -5])
table(pred, iris$Species)

##
## pred         setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         47         3
##   virginica       0          3        47


k-NN example

library(class)
set.seed(1)
y <- iris[, 5]
# partition indices into training and test sets
tr.idx <- sample(length(y), 75)
# prediction via knn
pred <- knn(train = iris[tr.idx, -5], test = iris[-tr.idx, -5], cl = y[tr.idx], k = 3)
# confusion matrix
table(pred, y[-tr.idx])

##
## pred         setosa versicolor virginica
##   setosa         24          0         0
##   versicolor      0         22         3
##   virginica       0          0        26
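The k = 3 above was fixed by hand; one common refinement, sketched here, picks k by leave-one-out cross-validation with class::knn.cv (the candidate grid 1:10 is an arbitrary choice):

err <- sapply(1:10, function(k)
  mean(knn.cv(train = iris[tr.idx, -5], cl = y[tr.idx], k = k) != y[tr.idx]))
which.min(err)   # k with the smallest LOOCV misclassification rate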


Neural network example: nnet

library(nnet)
ir1 <- nnet(Species ~ ., data = iris[tr.idx, ], size = 2, decay = 5e-4)

## # weights:  19
## initial  value 98.070522
## iter  10 value 49.984837
## iter  20 value 33.254723
## iter  30 value 4.667817
## iter  40 value 3.787903
## iter  50 value 3.701295
## iter  60 value 3.672516
## iter  70 value 3.670662
## iter  80 value 3.666205
## iter  90 value 3.664552
## iter 100 value 3.662954
## final  value 3.662954
## stopped after 100 iterations

summary(ir1)

## a 4-2-3 network with 19 weights
## options were - softmax modelling  decay=5e-04
##  b->h1 i1->h1 i2->h1 i3->h1 i4->h1
##   4.82   0.17   1.73  -1.43  -2.47
##  b->h2 i1->h2 i2->h2 i3->h2 i4->h2
##   0.42   0.68   1.79  -3.12  -1.42
##  b->o1 h1->o1 h2->o1
##  -4.75   1.80   9.45
##  b->o2 h1->o2 h2->o2
##  -1.43   8.42  -8.72
##  b->o3 h1->o3 h2->o3
##   6.12 -10.13  -0.80

# predict
pred <- predict(ir1, iris[-tr.idx, -5], type = "class")
# confusion matrix
table(iris$Species[-tr.idx], pred)

##             pred
##              setosa versicolor virginica
##   setosa         24          0         0
##   versicolor      0         20         2
##   virginica       0          1        28


Neural network example: neuralnet

neuralnet does not handle multiclass problems directly (a workaround is sketched at the end of this example). Using '.' in the model formula raises an error.

library(neuralnet)
iris2 <- iris[1:100, -5]
iris2$Y <- as.integer(iris$Species[1:100]) - 1
nn <- neuralnet(Y ~ Sepal.Length + Sepal.Width + Petal.Length, data = iris2,
                hidden = 2, err.fct = "ce", linear.output = FALSE)

# basic result
nn

## $call
## neuralnet(formula = Y ~ Sepal.Length + Sepal.Width + Petal.Length,
##     data = iris2, hidden = 2, err.fct = "ce", linear.output = FALSE)
##
## ... (the printed object echoes $response, $covariate, and $data for all
## 100 rows; the fitted values in $net.result, near 0 for rows 1-50 and
## near 1 for rows 51-100; the fitted and starting weights; and the
## generalized weights; that long listing is truncated here)
##
## $result.matrix
##                                        1
## error                     0.019334368928
## reached.threshold         0.009671954608
## steps                    90.000000000000
## Intercept.to.1layhid1    -2.107434624400
## Sepal.Length.to.1layhid1  0.246213556691
## Sepal.Width.to.1layhid1  -4.650552157343
## Petal.Length.to.1layhid1  5.498392519926
## Intercept.to.1layhid2     0.055469105794
## Sepal.Length.to.1layhid2  0.183732451433
## Sepal.Width.to.1layhid2   4.321715627463
## Petal.Length.to.1layhid2 -5.363674270992
## Intercept.to.Y           -0.369424491273
## 1layhid.1.to.Y            8.928802615695
## 1layhid.2.to.Y           -8.199786350485
##
## attr(,"class")
## [1] "nn"

# visualization (produces the figure on the next slide)
# plot(nn)
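As noted at the top of this slide, neuralnet lacks built-in multiclass handling. One workaround, sketched here, encodes the classes as indicator columns and fits one output node per class (iris3 and the column names set/ver/vir are illustrative; convergence may require tuning hidden or the stopping threshold):

iris3 <- cbind(iris[, 1:4], model.matrix(~ Species - 1, data = iris))
colnames(iris3)[5:7] <- c("set", "ver", "vir")
nn3 <- neuralnet(set + ver + vir ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                 data = iris3, hidden = 3, linear.output = FALSE)
# predicted class = output node with the largest fitted value
out <- compute(nn3, iris3[, 1:4])$net.result
table(max.col(out), iris$Species)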


Visualize NN: neuralnet

[Figure: plot(nn) output showing the fitted network: Sepal.Length, Sepal.Width, and Petal.Length feed two hidden nodes, which feed the output Y; edge labels give the fitted weights and bias weights. Error: 0.011453, Steps: 98]


Support Vector Machine: Spam Data

data(spam, package = "ElemStatLearn")

str(spam)

### 'data.frame': 4601 obs. of 58 variables:
### $ A.1 : num 0 0.21 0.06 0 0 0 0 0 0.15 0.06 ...
### $ A.2 : num 0.64 0.28 0 0 0 0 0 0 0 0.12 ...
### ...
### $ A.57: int 278 1028 2259 191 191 54 112 49 1257 749 ...
### $ spam: Factor w/ 2 levels "email","spam": 2 2 2 2 2 2 2 2 2 2 ...


Support Vector Machine: parameter tuning & fitting

# sampling index of train data
tr.idx <- sample(nrow(spam), 0.7 * nrow(spam))
# tuning parameters
obj <- tune(svm, spam ~ ., data = spam[tr.idx, ],
            ranges = list(cost = c(0.25, 0.35)))
summary(obj)

##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
##  cost
##  0.35
##
## - best performance: 0.08074534161
##
## - Detailed performance results:
##   cost        error    dispersion
## 1 0.25 0.08322981366 0.01093591109
## 2 0.35 0.08074534161 0.01003660426


Support Vector Machine: parameter tuning & fitting

# fit svm model
model <- svm(spam ~ ., data = spam[tr.idx, ], cost = obj$best.parameters$cost)
summary(model)  # coefs

##
## Call:
## svm(formula = spam ~ ., data = spam[tr.idx, ], cost = obj$best.parameters$cost)
##
## Parameters:
##    SVM-Type:  C-classification
##  SVM-Kernel:  radial
##        cost:  0.35
##       gamma:  0.01754385965
##
## Number of Support Vectors:  1142
##
##  ( 588 554 )
##
## Number of Classes:  2
##
## Levels:
##  email spam


SVM : prediction & performance evaluation

pred <- predict(model, newdata = spam[-tr.idx, ])
table(spam$spam[-tr.idx], pred)

##        pred
##         email spam
##   email   823   34
##   spam     69  455
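The workflow slide mentioned ROCR for performance evaluation; here is a sketch of an ROC curve and AUC from the SVM decision values (assumes the ROCR package is installed; the sign flip reflects that e1071 orients decision values toward the first factor level, "email"):

library(ROCR)
dv <- attr(predict(model, spam[-tr.idx, ], decision.values = TRUE),
           "decision.values")
# larger score should mean "more spam-like", hence the sign flip
score <- -as.numeric(dv)
rocr <- prediction(score, spam$spam[-tr.idx])
plot(performance(rocr, "tpr", "fpr"))    # ROC curve
performance(rocr, "auc")@y.values[[1]]   # area under the curve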
