TensorFlow Regression

TensorFlow Study: Regression

신림프로그래머, 최범균, 2016-11-25

Study material

Deep Learning for Everyone (모두를 위한 딥러닝), 김성훈

Caveat

• I am not an expert in this field, so this material may contain errors.

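1. Linear Regression

Linear regression

• Used to predict a value

• Analyzes the relationship between variables
  • Example: hours of sunshine vs. audience size

• Finds the line that best fits the data
  • This line is called the regression line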

Linear regression: example data

* Data source: Head First Statistics (헤드퍼스트 통계학)

x (sunshine)   y (audience size)
1.9            22
2.5            33
3.2            30
3.8            42
4.7            38
5.5            49
5.9            42
7.2            55

Linear regression: hypothesis

$$H(x) = wx + b$$

Find the weight w and the constant term b that best fit the training data (the sunshine and audience-size table above).

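Cost function (loss function)

• Measures how well a given weight and constant fit the data

• Many cost functions exist
  • Example: the mean of the squared errors

• The table below uses w = 7, b = 10 (error = y − y')

x     y' (predicted)   y (actual)   error   squared error
1.9   23.3             22           -1.3    1.69
2.5   27.5             33            5.5    30.25
3.2   32.4             30           -2.4    5.76
3.8   36.6             42            5.4    29.16
4.7   42.9             38           -4.9    24.01
5.5   48.5             49            0.5    0.25
5.9   51.3             42           -9.3    86.49
7.2   60.4             55           -5.4    29.16

cost (mean of the squared errors): 25.84625

$$\mathit{Cost}(X) = \frac{1}{m}\sum_{i=1}^{m}\left(H(x_i) - y_i\right)^2$$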
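As a rough check of the mean-squared-error cost, the sketch below recomputes it in plain Python for the sunshine and audience-size data. It uses w = 7 and b = 10, the values that reproduce the predicted column (23.3, 27.5, ...) of the table. The function names `hypothesis` and `mse_cost` are made up for this illustration and are not part of the original material.

```python
# Sunshine / audience-size data from the slides.
x_data = [1.9, 2.5, 3.2, 3.8, 4.7, 5.5, 5.9, 7.2]
y_data = [22, 33, 30, 42, 38, 49, 42, 55]


def hypothesis(x, w, b):
    """Linear hypothesis H(x) = w*x + b (hypothetical helper)."""
    return w * x + b


def mse_cost(xs, ys, w, b):
    """Mean of the squared differences between predictions and actual values."""
    squared_errors = [(hypothesis(x, w, b) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / len(xs)


# w = 7, b = 10 are the values consistent with the predicted column of the table.
cost = mse_cost(x_data, y_data, w=7.0, b=10.0)
print(cost)
```

The printed value should agree, up to floating-point rounding, with the cost of 25.84625 listed under the table.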

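Cost function and gradient descent

Gradient descent differentiates the cost function and repeatedly moves w and b against the gradient, looking for the w and b at which the cost is smallest.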
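To make the idea concrete, here is a minimal hand-written gradient descent for the mean-squared-error cost, in plain Python. It is only a sketch: the helper name `gradient_step` and the zero starting values are assumptions, while the learning rate 0.015 and the 1800 steps mirror the TensorFlow example in this deck.

```python
x_data = [1.9, 2.5, 3.2, 3.8, 4.7, 5.5, 5.9, 7.2]
y_data = [22, 33, 30, 42, 38, 49, 42, 55]


def gradient_step(w, b, xs, ys, learning_rate):
    """One gradient descent update for cost = mean((w*x + b - y)^2)."""
    m = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    # Partial derivatives of the cost with respect to w and b.
    grad_w = 2.0 / m * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2.0 / m * sum(errors)
    # Move against the gradient.
    return w - learning_rate * grad_w, b - learning_rate * grad_b


w, b = 0.0, 0.0  # assumed starting point; the slides start from a random w
for step in range(1800):
    w, b = gradient_step(w, b, x_data, y_data, learning_rate=0.015)

print(w, b)
```

With these settings w and b should end up close to the values the TensorFlow run converges toward (roughly 5.34 and 15.7).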

Implementing linear regression with TensorFlow

import tensorflow as tf

# Training data
x_data = [1.9, 2.5, 3.2, 3.8, 4.7, 5.5, 5.9, 7.2]
y_data = [22, 33, 30, 42, 38, 49, 42, 55]

W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))  # weight, random initial value
b = tf.Variable(tf.zeros([1]))                      # constant term, starts at 0
y = W * x_data + b  # hypothesis

loss = tf.reduce_mean(tf.square(y - y_data))  # mean of the squared differences

# Gradient descent with learning rate 0.015
train = tf.train.GradientDescentOptimizer(0.015).minimize(loss)

# initialize_all_variables() was later deprecated in favor of tf.global_variables_initializer()
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

print('FIRST', sess.run(loss), sess.run(W), sess.run(b))

for step in range(1800):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(loss), sess.run(W), sess.run(b))

FIRST 1365.22 [ 0.69029188] [ 0.]
0 180.363 [ 5.76034307] [ 1.07642579]
20 38.5053 [ 7.97524023] [ 2.60947013]
40 35.2595 [ 7.780509] [ 3.57756996]
60 32.475 [ 7.60014772] [ 4.47422886]
………
1420 15.671 [ 5.34873962] [ 15.66702652]
1440 15.6709 [ 5.34782982] [ 15.6715498]
1460 15.6708 [ 5.34698725] [ 15.67573929]
1480 15.6708 [ 5.34620667] [ 15.67961884]
1500 15.6707 [ 5.34548378] [ 15.68321228]
1520 15.6707 [ 5.3448143] [ 15.6865406]
1540 15.6707 [ 5.34419394] [ 15.68962383]
1560 15.6706 [ 5.34361982] [ 15.69247818]
1580 15.6706 [ 5.34308767] [ 15.69512367]
1600 15.6706 [ 5.3425951] [ 15.69757462]
1620 15.6706 [ 5.34213877] [ 15.69984341]
1640 15.6706 [ 5.34171629] [ 15.7019434]
1660 15.6705 [ 5.34132433] [ 15.70388889]

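Plotting the result with Matplotlib

import matplotlib
matplotlib.rcParams['font.family'] = 'NanumBarunGothic'  # font that can render the Korean axis labels

import matplotlib.pyplot as plt

plt.plot(x_data, y_data, 'ro', label='data')  # training data as red dots
plt.plot(x_data, sess.run(W) * x_data + sess.run(b), label='fit')  # learned regression line
plt.xlabel('햇볕')    # sunshine
plt.ylabel('관객수')  # audience size
plt.legend()
plt.show()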

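The same applies to multiple variables

$$H(x_1, x_2, x_3) = w_1 x_1 + w_2 x_2 + w_3 x_3 + b$$

$$H(x_1, x_2, x_3) = \begin{pmatrix} w_1 & w_2 & w_3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + b$$

$$W = \begin{pmatrix} w_1 & w_2 & w_3 \end{pmatrix}, \quad X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$$

$$H(X) = WX + b$$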

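The same applies to multiple variables, with b folded into W

$$H(x_1, x_2, x_3) = w_0 \times 1 + w_1 x_1 + w_2 x_2 + w_3 x_3$$

$$H(x_1, x_2, x_3) = \begin{pmatrix} w_0 \\ w_1 \\ w_2 \\ w_3 \end{pmatrix}^{T} \begin{pmatrix} 1 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix} = W^{T}X$$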
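The bias-folding trick can be checked numerically with a small plain-Python sketch. The weights, bias, and input values below are arbitrary numbers chosen only for illustration, and `dot` is a hypothetical helper.

```python
def dot(u, v):
    """Dot product of two equal-length lists (hypothetical helper)."""
    return sum(a * b for a, b in zip(u, v))


# Arbitrary illustrative values, not taken from the slides.
w = [0.5, -1.0, 2.0]   # w1, w2, w3
b = 3.0                # constant term
x = [1.0, 2.0, 3.0]    # x1, x2, x3

# Form 1: H(X) = WX + b
h_with_bias = dot(w, x) + b

# Form 2: prepend w0 = b to the weights and a constant 1 to the input.
W = [b] + w
X = [1.0] + x
h_folded = dot(W, X)

print(h_with_bias, h_folded)
```

Both forms give the same value, which is why the logistic regression data later in the deck carries a constant x0 = 1 column instead of a separate b.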

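2. Logistic Regression and Binary Classification

Binary (binomial) classification

• Classifies each case into one of two classes
  • Example: cancer recurrence / no recurrence

• Linear regression is not suitable for this

[Figure: plot marking the value 0.5]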

Binary logistic regression

• Uses linear regression combined with the sigmoid function

https://www.desmos.com/calculator/vfxhwrzho7

$$H(X) = \frac{1}{1 + e^{-W^{T}X}}$$
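A minimal sketch of the sigmoid hypothesis in plain Python follows. The function names and the example weights are assumptions for illustration, and the code is not hardened against overflow for very large negative inputs.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_hypothesis(weights, features):
    """H(X) = sigmoid(W^T X); features[0] is expected to be the constant 1."""
    z = sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Arbitrary illustrative weights and input (not from the slides).
example = logistic_hypothesis([-1.0, 0.5], [1.0, 4.0])
print(sigmoid(0.0), example)
```

An output above 0.5 would be read as class 1 and an output below 0.5 as class 0; `sigmoid(0.0)` sits exactly on that boundary.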

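Logistic regression cost function

• Uses cross entropy

$$\mathit{cost}(W) = \frac{1}{m}\sum c(H(x), y)$$

$$c(H(x), y) = \begin{cases} -\log(H(x)) & : y = 1 \\ -\log(1 - H(x)) & : y = 0 \end{cases}$$

$$c(H(x), y) = -y\log(H(x)) - (1 - y)\log(1 - H(x))$$

https://www.desmos.com/calculator/xm42ktnf29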
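The sketch below, in plain Python, shows that the single-line cross-entropy formula matches the two-case definition, and how the per-example values are averaged into the cost. The function names and the sample predictions are hypothetical.

```python
import math


def cross_entropy(h, y):
    """Single-line form: -y*log(h) - (1-y)*log(1-h), for 0 < h < 1."""
    return -y * math.log(h) - (1 - y) * math.log(1 - h)


def cross_entropy_by_case(h, y):
    """Two-case form: -log(h) when y == 1, -log(1-h) when y == 0."""
    return -math.log(h) if y == 1 else -math.log(1 - h)


def cost(predictions, labels):
    """Average of the per-example cross-entropy values."""
    values = [cross_entropy(h, y) for h, y in zip(predictions, labels)]
    return sum(values) / len(values)


# Hypothetical predictions and labels, for illustration only.
predictions = [0.9, 0.2, 0.6]
labels = [1, 0, 1]
print(cost(predictions, labels))
```

A confident correct prediction (0.9 for label 1) contributes a small value, while the same prediction for label 0 would contribute a large one, which is the behavior the cost function is meant to reward and punish.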

Implementing logistic regression with TensorFlow

• Example data
  • 아빠가 들려 주는 [통계] 로지스틱 회귀분석 후 ROC 커브 그리기 (Statistics Told by Dad: Drawing an ROC Curve after Logistic Regression Analysis) (https://goo.gl/bxcn4c)

#x0  AGE  SEX  WT  SMOKING  CHD
1    22   1    60  1        0
1    23   1    58  1        0
1    24   1    62  1        0
1    27   1    67  1        0
1    28   1    64  1        0
1    30   1    60  1        0
1    30   1    65  1        0
1    32   1    58  1        0
1    32   1    72  1        0
1    35   1    64  2        1
1    38   2    56  1        0
1    40   2    46  2        0
1    41   2    58  2        1
1    46   2    72  1        0
1    47   2    63  1        0
1    48   2    60  1        0
1    49   2    48  2        1
1    49   2    50  1        0
1    50   2    58  2        1
1    51   2    62  1        0



* Based on https://github.com/FuZer/Study_TensorFlow

import tensorflow as tf
import numpy as np

# unpack=True transposes the file, so each row of xy is one column of the data file
xy = np.loadtxt('logistic_train2.txt', unpack=True, dtype='float32')
x_data = xy[0:-1]  # features x0..SMOKING, shape [5, 20]
y_data = xy[-1]    # labels (CHD), shape [20]

X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)

W = tf.Variable(tf.random_uniform([1, len(x_data)], -1.0, 1.0))  # shape [1, 5]

h = tf.matmul(W, X)  # shape [1, 20]
hypothesis = tf.sigmoid(h)

# Cross-entropy cost
cost = -tf.reduce_mean(Y * tf.log(hypothesis) + (1 - Y) * tf.log(1 - hypothesis))

a = tf.Variable(0.0015)  # learning rate
optimizer = tf.train.GradientDescentOptimizer(a)
train = optimizer.minimize(cost)  # the goal is to minimize cost

$$\mathit{cost}(W) = \frac{1}{m}\sum c(\mathit{Hypo}(x), y)$$

$$c(H(x), y) = -y\log(H(x)) - (1 - y)\log(1 - H(x))$$

$$\mathit{Hypo}(X) = \frac{1}{1 + e^{-W^{T}X}}$$


init = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init)

for step in range(20001):  # tried values from 10,000 to 60,000
    sess.run(train, feed_dict={X: x_data, Y: y_data})
    if step % 200 == 0:
        print(step, sess.run(cost, feed_dict={X: x_data, Y: y_data}), sess.run(W))

print('-----------------------------------------')
# Predict for x0=1, AGE=35, SEX=1, WT=64, SMOKING=2; True means the output is above 0.5
print(sess.run(hypothesis, feed_dict={X: [[1], [35], [1], [64], [2]]}) > 0.5)

0.199018 [[-0.69001538 0.07054905 -1.43649220 -0.09584059 3.13020134]]
0.195910 [[-0.53398216 0.02894925 -0.96797615 -0.09904154 3.71121287]]



Test results

Weights: -0.69001538 0.07054905 -1.43649220 -0.09584059 3.13020134

#x0  AGE  SEX  WT  SMOKING  CHD  regression output
1    22   1    60  1        0    0.0394
1    23   1    58  1        0    0.0506
1    24   1    62  1        0    0.0375
1    27   1    67  1        0    0.0290
1    28   1    64  1        0    0.0409
1    30   1    60  1        0    0.0672
1    30   1    65  1        0    0.0427
1    32   1    58  1        0    0.0913
1    32   1    72  1        0    0.0256
1    35   1    64  2        1    0.6152
1    38   2    56  1        0    0.0423
1    40   2    46  2        0    0.7523
1    41   2    58  2        1    0.5078
1    46   2    72  1        0    0.0165
1    47   2    63  1        0    0.0409
1    48   2    60  1        0    0.0575
1    49   2    48  2        1    0.8255
1    49   2    50  1        0    0.1458
1    50   2    58  2        1    0.6606
1    51   2    62  1        0    0.0586
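The regression outputs in the table can be reproduced without TensorFlow by applying the sigmoid to the weighted sum of each row. The plain-Python sketch below does that with the weights listed above and counts how many rows a 0.5 threshold classifies correctly. The helper name `predict` is an assumption for this illustration.

```python
import math

# Learned weights for x0, AGE, SEX, WT, SMOKING, as listed in the slides.
weights = [-0.69001538, 0.07054905, -1.43649220, -0.09584059, 3.13020134]

# Rows of (x0, AGE, SEX, WT, SMOKING, CHD) from the training data.
rows = [
    (1, 22, 1, 60, 1, 0), (1, 23, 1, 58, 1, 0), (1, 24, 1, 62, 1, 0),
    (1, 27, 1, 67, 1, 0), (1, 28, 1, 64, 1, 0), (1, 30, 1, 60, 1, 0),
    (1, 30, 1, 65, 1, 0), (1, 32, 1, 58, 1, 0), (1, 32, 1, 72, 1, 0),
    (1, 35, 1, 64, 2, 1), (1, 38, 2, 56, 1, 0), (1, 40, 2, 46, 2, 0),
    (1, 41, 2, 58, 2, 1), (1, 46, 2, 72, 1, 0), (1, 47, 2, 63, 1, 0),
    (1, 48, 2, 60, 1, 0), (1, 49, 2, 48, 2, 1), (1, 49, 2, 50, 1, 0),
    (1, 50, 2, 58, 2, 1), (1, 51, 2, 62, 1, 0),
]


def predict(features):
    """Sigmoid of the weighted sum of the five input features."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


probabilities = [predict(row[:5]) for row in rows]
predicted = [1 if p > 0.5 else 0 for p in probabilities]
correct = sum(1 for label, row in zip(predicted, rows) if label == row[5])

print(correct, "of", len(rows), "rows classified correctly")
```

With a 0.5 threshold, the only row whose output lands on the wrong side is the 40-year-old with SMOKING = 2 and CHD = 0 (output 0.7523).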

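3. Multinomial Classification and Softmax

Softmax and multinomial classification

• Basic idea: compute a value for each class, then convert the values into probabilities

$$\begin{pmatrix} w_{a1} & w_{a2} & w_{a3} \\ w_{b1} & w_{b2} & w_{b3} \\ w_{c1} & w_{c2} & w_{c3} \end{pmatrix} \times \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \xrightarrow{\text{softmax}} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix}$$

Softmax

$$S(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$$

$$p_1 + p_2 + p_3 = 1.0$$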
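A small plain-Python sketch of softmax is shown below. Subtracting the maximum before exponentiating is a common numerical-stability step that is not in the slides; it does not change the result. The input scores are arbitrary illustrative values.

```python
import math


def softmax(values):
    """Convert a list of scores into probabilities that sum to 1."""
    largest = max(values)
    # Shifting by the maximum avoids overflow and leaves the ratios unchanged.
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Arbitrary per-class scores y1, y2, y3 (not from the slides).
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

The largest score receives the largest probability, and the three probabilities add up to 1 as stated above.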

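Softmax cost function

• Cross entropy

$$D(S, L) = -\sum_i L_i \log S_i$$

Here L is the label vector from the training data and S is the probability vector computed by softmax.

$$\mathit{Loss} = \frac{1}{m}\sum_i D(S(WX_i), L_i)$$

* D is the value for a single data point; Loss is the value over the entire training set.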
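The two formulas can be sketched in plain Python as follows. The weight matrix, inputs, and labels are made-up values for illustration; labels are assumed to be one-hot vectors, and the helper names are hypothetical.

```python
import math


def softmax(values):
    """Convert scores into probabilities (shifted by the max for stability)."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """D(S, L) = -sum_i L_i * log(S_i) for one data point."""
    return -sum(l * math.log(s) for s, l in zip(probs, label))


def loss(weight_rows, inputs, labels):
    """Average of D(S(W X_i), L_i) over all data points."""
    total = 0.0
    for x, label in zip(inputs, labels):
        # One score per class: each row of W dotted with the input.
        scores = [sum(w * v for w, v in zip(row, x)) for row in weight_rows]
        total += cross_entropy(softmax(scores), label)
    return total / len(inputs)


# Made-up example: 3 classes, 2 input features, one-hot labels (assumption).
inputs = [[1.0, 2.0], [3.0, 1.0]]
labels = [[1, 0, 0], [0, 0, 1]]
zero_weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

print(loss(zero_weights, inputs, labels))
```

With all-zero weights every class gets probability 1/3, so the loss equals log(3); training would lower it from there.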

Implementing softmax multinomial classification with TensorFlow

import numpy as np
import tensorflow as tf

xy = np.loadtxt('softmax_train.txt', unpack=False, dtype='float32')
x_data = xy[:, :2]  # features, shape [8, 2]
y_data = xy[:, 2:]  # labels, shape [8, 3]

X = tf.placeholder(tf.float32, [None, 2])  # shape [n, 2]
Y = tf.placeholder(tf.float32, [None, 3])  # shape [n, 3]

W = tf.Variable(tf.zeros([2, 3]))
b = tf.Variable(tf.zeros([3]))
h = tf.nn.softmax(tf.matmul(X, W) + b)  # shape [8, 3], h = softmax(XW + b)

# Cross entropy per row, averaged over all rows
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(h), reduction_indices=1))

optimizer = tf.train.GradientDescentOptimizer(0.001).minimize(cost)

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

for step in range(2001):
    sess.run(optimizer, feed_dict={X: x_data, Y: y_data})
    if step % 200 == 0:
        print(step, sess.run(cost, feed_dict={X: x_data, Y: y_data}), sess.run(W), sess.run(b))

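Implementing softmax multinomial classification with TensorFlow (continued)

# Note: assigning to b below rebinds only the Python name; the graph still uses the bias Variable.
a = sess.run(h, feed_dict={X: [[11, 7]]})
print("a :", a, sess.run(tf.arg_max(a, 1)))

b = sess.run(h, feed_dict={X: [[3, 4]]})
print("b :", b, sess.run(tf.arg_max(b, 1)))

c = sess.run(h, feed_dict={X: [[1, 0]]})
print("c :", c, sess.run(tf.arg_max(c, 1)))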

a : [[ 0.46272621 0.35483006 0.18244369]] [0]

b : [[ 0.33820099 0.42101386 0.24078514]] [1]

c : [[ 0.27002314 0.29085544 0.4391214 ]] [2]
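The class index printed at the end of each line is simply the position of the largest probability. A plain-Python sketch of that step, using the probabilities printed above (the `argmax` helper is hypothetical):

```python
# Probabilities printed by the TensorFlow run for inputs a, b, and c.
outputs = {
    "a": [0.46272621, 0.35483006, 0.18244369],
    "b": [0.33820099, 0.42101386, 0.24078514],
    "c": [0.27002314, 0.29085544, 0.4391214],
}


def argmax(values):
    """Index of the largest element (hypothetical helper)."""
    return max(range(len(values)), key=lambda i: values[i])


predicted = {name: argmax(probs) for name, probs in outputs.items()}
print(predicted)
```

The resulting indices match the [0], [1], and [2] shown in the output lines.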

References

• Online graphing calculator
  • https://www.desmos.com/calculator

• Deep Learning for Everyone (모두를 위한 딥러닝), Prof. 김성훈
  • https://hunkim.github.io/ml/

• Code used in this study
  • https://github.com/madvirus/tfstudy


Reference: installing with Anaconda

• anaconda + jupyter

• Create an environment
  • conda create -n tensorflow python=3.5

• Install TensorFlow and related libraries in the new environment
  • source activate tensorflow
  • pip install tensorflow
  • conda install matplotlib
  • conda install seaborn
  • conda install notebook ipykernel
  • ipython kernel install --user

• Reference
  • Installation Quickstart: TensorFlow, Anaconda, Jupyter (https://goo.gl/kSeZKI)
