
Page 1: [Tf2017] day1 jwkang_pub

누구나 TensorFlow! (TensorFlow for Everyone) — J. Kang, Ph.D.

누구나 TensorFlow — Module 1: Getting Ready

Jaewook Kang, Ph.D. — [email protected]

Sep. 2017

© 2017 Jaewook Kang. All Rights Reserved

Page 2: [Tf2017] day1 jwkang_pub

About the speaker: Jaewook Kang (강재욱)

GIST EEC Ph.D. (2015)

A signal-processing scientist and relentless tinkerer

Things I like:
- Statistical signal processing / wireless communication signal processing
- Implementing embedded audio DSP libraries in C/C++
- Machine-learning-based audio signal processing algorithms
- Studying so I can pass it on to others

Selected papers:
- Jaewook Kang, et al., "Bayesian Hypothesis Test using Nonparametric Belief Propagation for Noisy Sparse Recovery," IEEE Trans. on Signal Processing, Feb. 2015
- Jaewook Kang, et al., "Fast Signal Separation of 2D Sparse Mixture via Approximate Message-Passing," IEEE Signal Processing Letters, Nov. 2015

Page 3: [Tf2017] day1 jwkang_pub

The distance between academia and industry

- Academia is a 19th-century system: conferences and papers; the review-and-publish process; 5+ years for a CS Ph.D.
- Industry is a 21st-century system: online communities & contributors; open source + open platforms; blogs

Page 4: [Tf2017] day1 jwkang_pub

The distance between academia and industry

- What can be the bridge between the two?

- Academia
  - Most new discoveries still come out of Western academia (wireless communications, deep learning)
  - A dig-one-well spirit; failure and challenge

- Industry
  - Fast cycles
  - Money, money, money: real-world utility == money
  - As much interaction as possible

Page 5: [Tf2017] day1 jwkang_pub

Research in modern science

- Looking for that bridge, I jumped into the jungle called startups.

- Customer feedback is faster than paper peer review

- Revenue over citation counts

- Blogs over papers (and videos over blogs)

- One more like! over patents

Page 6: [Tf2017] day1 jwkang_pub

Research in modern science

- And you? What bridge will you look for?

Page 7: [Tf2017] day1 jwkang_pub


Page 8: [Tf2017] day1 jwkang_pub

- What did we feel two years ago?

- What has been happening in the two years since?

- What should we prepare?

  - Math

  - Coding

  - Machine learning

Page 9: [Tf2017] day1 jwkang_pub

- What did we feel two years ago?

- What has been happening in the two years since?

- What should we prepare?

  - Math: the most universal truth for explaining the world

  - Coding: let's not be digitally illiterate

  - Machine learning: a system that works in my place

Page 10: [Tf2017] day1 jwkang_pub

Goals of 누구나 TensorFlow

Teaching philosophy 1: Anyone can do "machine learning"!!

1) Understand the basic concepts of machine learning with mathematical intuition.

2) Implement basic machine-learning techniques in TensorFlow.

3) Handle collected data with Python.

4) Apply machine learning to your own problems.

5) Want to keep studying machine learning on your own.

Page 11: [Tf2017] day1 jwkang_pub

Goals of 누구나 TensorFlow

Teaching philosophy 2: You cannot take everything away from this course!

1) Understand the basic concepts of machine learning with mathematical intuition.

2) Implement basic machine-learning techniques in TensorFlow.

3) Handle collected data with Python.

4) Apply machine learning to your own problems.

5) Want to keep studying machine learning on your own.

Page 12: [Tf2017] day1 jwkang_pub

Goals of 누구나 TensorFlow

Intended audience: everyone who is getting started with machine learning!!

1) A college junior starting to take an interest in machine learning

2) A job seeker who wants to learn machine learning and say one more thing in interviews

3) A grad student whose advisor said "use machine learning" and who has no idea where to start

4) A junior developer who wants to apply machine learning with the minimum possible study!

Page 13: [Tf2017] day1 jwkang_pub

Course schedule

Module 1 — Getting ready with Python (4 h)
- Check the python / anaconda / PyCharm installation
- LAB0: setting up the Python development environment
- python, numpy / pandas basics
- LAB1: python exercise

Module 1 — Explaining machine learning to your family (4 h)
- What is machine learning? Why machine learning?
- Why machine learning is in the spotlight now
- Types of machine learning
- Trade-offs in machine learning

Module 2 — Must-know mathematical concepts (2 h)
- Probability theory basics
- Conditional probability / Bayes' rule
- Linear algebra basics

Module 2 — Getting ready with TensorFlow (3 h)
- Why TensorFlow? + installing TensorFlow
- LAB2: Hello world in TensorFlow
- TensorFlow basics
- LAB3: basic TensorFlow example: line-fitting
- ML training in TensorFlow

Module 2 — Building a linear predictive model: Linear Regression (3 h)
- Linear basis function model
- Maximum likelihood and least squares
- LAB4: TensorFlow curve-fitting

Page 14: [Tf2017] day1 jwkang_pub

Course schedule

Module 3 — Separating data with a line: Logistic classification
- Introduction to Linear Classification
- Naïve Bayes (NB)
- Linear Discriminant Analysis (LDA)
- Logistic Regression (LR)
- NB vs LDA vs LR
- LAB5: Linear Classification in TensorFlow

Module 4 — Neural networks, the ancestor of deep learning (4 h)
- Expressing a neuron mathematically
- Feed-Forward Neural Networks
- Limits of linear neurons and activation functions
- Gradient descent revisited
- Backpropagation algorithm
- LAB6: Two-layer neural net with backpropagation in TensorFlow

Page 15: [Tf2017] day1 jwkang_pub

Course schedule

Module 5 — Into deep learning
- History of neural networks (~1990s)
- Limits of pre-deep-learning artificial neural networks
- The keys to deep learning (2000s)
- Modern deep neural networks (2010s~)
- Convolutional Neural Net — LAB7: CNN in TensorFlow
- Recurrent Neural Net — LAB8: RNN in TensorFlow
- Generative Adversarial Net — LAB9: GAN in TensorFlow

Page 16: [Tf2017] day1 jwkang_pub

GitHub link

GitHub link (all public) — https://github.com/jwkanggist/EveryBodyTensorFlow

Page 17: [Tf2017] day1 jwkang_pub

1. Getting ready with Python

1. Install python / anaconda / PyCharm
2. LAB0: setting up the Python development environment
3. Hands-on basics of python, numpy / pandas.DataFrame
4. LAB1: python exercise: the stair-climbing problem

Researchers, break free from MATLAB!!

Page 18: [Tf2017] day1 jwkang_pub

Reference: Scientific computing in Python

Python for Data Analysis

Wes McKinney, 1st Edition, 2013, O'Reilly

Page 19: [Tf2017] day1 jwkang_pub

Why Python for Scientific Computing?

High productivity & openness!
- Script language + concise grammar: easy to learn and to be productive with
- License-free + extensive open libraries: numpy, scipy, pandas, tensorflow, ...

helloWorld.java:

public class Main {
    public static void main(String[] args) {
        System.out.println("hello world");
    }
}

helloWorld.py:

print('hello world')

Page 20: [Tf2017] day1 jwkang_pub

Why Python for Scientific Computing?

High productivity & openness!
- Script language + concise grammar: easy to learn and to be productive with
- License-free + extensive open libraries: numpy, scipy, pandas, tensorflow, ...

Compatibility & integration
- Very good compatibility with other languages: cython, jython, ...

For further interest, visit this link.

MATLAB
- MathWorks (1984)
- Engineering simulation
- Base engine + toolboxes
- Very efficient matrix computation
- Closed; limited compatibility
- Paid!!

Python
- Python Software Foundation (1991)
- General-purpose / concise syntax / readable
- Vast 3rd-party libraries
- Open + good compatibility + portability
- Everything is free!!

Page 21: [Tf2017] day1 jwkang_pub

Installing Python

Ubuntu / OSX: Python is installed by default
- This is the "system Python"; it is best not to touch it
- Develop inside a Python virtual environment instead
- pyenv + virtualenv: per-version installs / package management

[Image source] https://www.extramile.io/blog/how-to-setup-multiple-python-versions-with-pyenv/

Page 22: [Tf2017] day1 jwkang_pub

Installing Python

Ubuntu / OSX: Python is installed by default
- This is the "system Python"; it is best not to touch it
- Develop inside a Python virtual environment instead
- pyenv + virtualenv: per-version installs / package management

[Image source] http://qiita.com/hedgehoCrow/items/0733c63c690450b14dcf

Page 23: [Tf2017] day1 jwkang_pub

Installing Python

Ubuntu / OSX: Python is installed by default
- This is the "system Python"; it is best not to touch it
- Develop inside a Python virtual environment instead
- pyenv + virtualenv: per-version installs / package management
- autoenv: per-folder virtual environments

Win10: download the installer for your OS and install
- https://www.python.org/downloads/windows/
- On Win10, only Python 3.5 and above is compatible with TensorFlow
- Run the installer as administrator!

Page 24: [Tf2017] day1 jwkang_pub

pyenv setup (OSX)

Enables installing multiple versions of Python

Resolves Python "version dependency"

OSX: shell commands

# step I: install the latest Xcode

# step II: install pyenv via brew
$ brew update
$ brew install pyenv
$ brew install pyenv-virtualenv

# set the pyenv environment variables
$ echo 'export PATH="$HOME/.pyenv/shims:$PATH"' >> ~/.bash_profile
$ echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
$ echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bash_profile

Page 25: [Tf2017] day1 jwkang_pub

pyenv setup (Ubuntu)

Lets you install and use multiple Python versions

Resolves Python version dependencies

Ubuntu 16.04: shell commands

# install prerequisite packages
$ sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
  libreadline-dev libsqlite3-dev wget curl llvm
# install git to fetch the pyenv sources
$ sudo apt-get install git
# download and install pyenv
$ curl -L https://raw.githubusercontent.com/yyuu/pyenv-installer/master/bin/pyenv-installer | bash
# add pyenv to the environment variables
$ vim ~/.bashrc
#--------------------------------------------------------
# add the lines below
#--------------------------------------------------------
export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"

Page 26: [Tf2017] day1 jwkang_pub

pyenv setup (Win10)

pyenv does not officially support Windows.

Page 27: [Tf2017] day1 jwkang_pub

Using pyenv-virtualenv (Ubuntu/OSX only)

A Python environment is defined by its pip packages
- Packages are managed by pip / packages have versions
- Package compatibility is determined by those versions
- Stable development needs a separate environment per project / end target

Uses of pyenv-virtualenv
- Build and use multiple Python environments on a single local PC
- Resolves dependencies between pip packages
- Use the same Python version with different sets of packages

Page 28: [Tf2017] day1 jwkang_pub

Using pyenv-virtualenv (Ubuntu/OSX only)

pyenv: separates which Python version is in use

pyenv-virtualenv: separates Python version + installed pip packages

Purpose | Shell command
Check the installable version list | $ pyenv install --list
Install a specific python version | $ pyenv install [python version name]
Check the python versions installed locally | $ pyenv versions
Switch to the python version you want | $ pyenv shell [installed version name]

Purpose | Shell command
Create a virtualenv | $ pyenv virtualenv [python version] [user-specified virtualenv name]
  e.g. $ pyenv virtualenv 2.7.12 tensorflowenv
Check your local virtualenvs | $ pyenv versions
Activate a virtualenv | $ source activate [user-specified virtualenv name]
Deactivate a virtualenv | $ source deactivate [user-specified virtualenv name]

Page 29: [Tf2017] day1 jwkang_pub

누구나 TensorFlow!J. Kang Ph.D.

- Interpreter- Programming language- Standard library

- Efficient N-dimensional array data structure- Fast / flexible matrix&vector computing- Very good compatibility to C/C++ /Fortran

- Basic pip packages for python scientific computing

- Packages for machine learning extension

Python Scientific Computing Ecosystem

29

Page 30: [Tf2017] day1 jwkang_pub

Python Scientific Computing Ecosystem

Package | Description
numpy | Base N-dimensional array package
scipy | Fundamental library for scientific computing
ipython | Enhanced interactive Python console (like the MATLAB command window)
matplotlib | Comprehensive 2D plotting
scikit-learn | Library of off-the-shelf machine learning algorithms (higher-level)
tensorflow | Library for lower-level machine learning algorithms
pandas | Data analysis framework: easy data manipulation / importing / exporting

Page 31: [Tf2017] day1 jwkang_pub

Python Scientific Computing Ecosystem

What if this all feels like too much hassle?

Page 32: [Tf2017] day1 jwkang_pub

Python Scientific Computing Ecosystem

What if this all feels like too much hassle? Anaconda!

- Is this about snakes?

Page 33: [Tf2017] day1 jwkang_pub

Python Scientific Computing Ecosystem

Platform | Description
Python | Interpreter + standard library
Anaconda | Interpreter + standard libraries + most of the scientific-computing libraries

What if this all feels like too much hassle? Anaconda!
- No, not snakes -_-;;
- An integrated distribution of Python + related packages!!
- For Windows users: install Anaconda3 4.4 with Python 3.6
  - https://www.anaconda.com/download/#windows

Page 34: [Tf2017] day1 jwkang_pub

Installing the PyCharm IDE

A Python development tool (IDE) from JetBrains
- Git support
- pip package management support
- IPython console support
- Debugger support
- Terminal support (Ubuntu/OSX only)

Installation on Ubuntu / OSX / Windows
- https://www.jetbrains.com/pycharm/download/#section=mac
- Be sure to install the Community edition
- The install path must not contain Korean characters!

Page 35: [Tf2017] day1 jwkang_pub

LAB0: Setting up the Python development environment

Goal: install the basic environment for working with Python

Ubuntu / OSX users
- Install pyenv / pyenv-virtualenv
- Create a virtual environment for Python development
- Install Anaconda3
- Install the PyCharm Community edition
- After creating the virtual environment, verify package installation with pip:
  - $ pip install numpy scipy matplotlib pandas scikit-learn
  - $ pip list

Windows users
- Install anaconda3
- Install the PyCharm Community edition
- In the PyCharm IDE, set File > Settings > Project Interpreter to anaconda3

Page 36: [Tf2017] day1 jwkang_pub

LAB0: Setting up the Python development environment

For more detail, see the documents below:
- Ubuntu:
  https://docs.google.com/document/d/1DXncq3t9_UezhtiD_1-yWAYqGsnvem8L2TdU2CpeQZc/edit#heading=h.z0rg9qs9rz4s
- OSX:
  https://docs.google.com/document/d/1unqQ8HziRJcZyPiRVX_gMg6eQh65tERM-jhNOIHe44M/edit#heading=h.11whz3pkb7w4

Page 37: [Tf2017] day1 jwkang_pub

Jupyter Notebook

Q) The installation process is too complicated and difficult. Isn't there an easier way?
- Try the Jupyter Notebook
- It is omitted in this course

Page 38: [Tf2017] day1 jwkang_pub

1. Getting ready with Python

1. Install python 2.7 / anaconda2 / PyCharm
2. LAB0: setting up the Python development environment
3. Hands-on basics of python, numpy / pandas.DataFrame
4. LAB1: python exercise: the Random Walk problem

Page 39: [Tf2017] day1 jwkang_pub

GitHub Repository

All source code is shared via GitHub:
- https://github.com/jwkanggist/EveryBodyTensorFlow

Page 40: [Tf2017] day1 jwkang_pub

Python — Hello world

Python basics
- First things first: print hello world

Ex1) helloworld.py

$ vim helloworld.py
print 'Hello world'        # Python 2 print statement
print ('Hello world!!')    # also valid in Python 2, required in Python 3

$ python helloworld.py
Hello world
Hello world!!

Page 41: [Tf2017] day1 jwkang_pub

Python — package import

Python basics
- Python greatness 1: almost any API you can think of exists as open source.
- Python greatness 2: the pip + import combination makes using such open-source APIs very easy.

Ex2) Installing TensorFlow with pip and importing it

$ pip install tensorflow
[Installing tensorflow ...]

$ python
>>> import tensorflow as tf

Page 42: [Tf2017] day1 jwkang_pub

Python — package import

Python basics
- Python greatness 1: almost any API you can think of exists as an open-source package.
- Python greatness 2: the pip + import combination makes using those packages very easy.
- Python greatness 3: the open-source packages are of very high quality.

Ex3) Installing pandas and friends, then importing them

$ pip install numpy scipy pandas matplotlib
[Installing ...]

$ python
>>> import numpy as np
>>> import scipy as sp
>>> import pandas as pd
>>> import matplotlib.pyplot as plt

Page 43: [Tf2017] day1 jwkang_pub

Python — List

Python basics
- An ordered sequence of values
- A list may hold a single type or a mix of types
- Browsable with indices starting from 0

Ex4) List examples

>>> lst = ['a', 3.14, 55555, 'abcdef', ['a', 'b', 'c']]
>>> lst
['a', 3.14, 55555, 'abcdef', ['a', 'b', 'c']]
>>> lst = [3.14132323, 3535242, 'abc', "def", 'color']
>>> lst
[3.14132323, 3535242, 'abc', 'def', 'color']

Page 44: [Tf2017] day1 jwkang_pub

Python — List

Python basics
- Assignment binds a reference to the list, not a copy of it
- Lists have many member functions: append(), insert(), remove(), sort(), ...

Ex4) List examples

>>> a = ['adfa', '123', 'tda', '114']
>>> a
['adfa', '123', 'tda', '114']
>>> b = a          # b references the same list as a
>>> b
['adfa', '123', 'tda', '114']
>>> b[0] = '0'
>>> a
['0', '123', 'tda', '114']
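Since assignment binds only a reference, an explicit copy is needed when you want an independent list. A minimal sketch (an addition to the slide, not from it):

```python
a = ['adfa', '123', 'tda', '114']

b = a[:]       # slicing the whole list returns a new (shallow) copy
c = list(a)    # the list() constructor copies as well

b[0] = '0'     # modifying the copy...
print(a)       # ...leaves the original unchanged
```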

Page 45: [Tf2017] day1 jwkang_pub

Python — For statement

Python basics
- 1) Using for with range()

Ex5) examplefor.py

$ vim examplefor.py
#-*- coding: utf-8 -*-
for i in range(0, 5):
    print 'printing number: %s' % i

$ python examplefor.py
printing number: 0
printing number: 1
printing number: 2
printing number: 3
printing number: 4

Page 46: [Tf2017] day1 jwkang_pub

Python — For statement

Python basics
- 1) Using for with range()
- 2) Using for with a list

Ex6) examplefor2.py

$ vim examplefor2.py
#-*- coding: utf-8 -*-
namelist = ['kim', 'park', 'lee', 'kang']
for name in namelist:
    print 'printing name: %s' % name

$ python examplefor2.py
printing name: kim
printing name: park
printing name: lee
printing name: kang

Page 47: [Tf2017] day1 jwkang_pub

Python — If-elif-else statement

Python basics
- Used with the comparison operators: ==, !=, <, >, ...
- Logical operators: is, not, and, or

Ex7-1) exampleif.py

$ vim exampleif.py
#-*- coding: utf-8 -*-

number = 0

if number == 0:
    print 'zero'
elif number == 1:
    print 'one'
else:
    print 'any'

Page 48: [Tf2017] day1 jwkang_pub

Python — If-elif-else statement

Python basics
- Used with the comparison operators: ==, !=, <, >, ...
- Logical operators: is, not, and, or

Ex7-2) exampleif2.py

$ vim exampleif2.py
#-*- coding: utf-8 -*-

number = 0
number2 = 1

if (number == 0) and (number2 == 1):
    print 'Both True!!'
elif (number == 0) or (number2 == 1):
    print 'one of both is True'
else:
    print 'All False'

Page 49: [Tf2017] day1 jwkang_pub

Python — user-defined functions

Python basics
- Defined with the "def" keyword

Ex8) examplefunc.py

$ vim examplefunc.py
#-*- coding: utf-8 -*-
def myFuncPrint(keyword, numOfPrint):
    for i in range(0, numOfPrint):
        print 'The %s-th printing of %s' % (i + 1, keyword)

myFuncPrint('JaewookKang', 5)

$ python examplefunc.py
The 1-th printing of JaewookKang
The 2-th printing of JaewookKang
The 3-th printing of JaewookKang
The 4-th printing of JaewookKang
The 5-th printing of JaewookKang

Page 50: [Tf2017] day1 jwkang_pub

Numpy: Numerical Python

A tool for handling the N-dimensional arrays needed for vector & matrix computation
- Efficient N-dimensional array data structure: ndarray
- A rich set of ndarray-based math function APIs
- Convenient memory-to-ndarray interfaces

ndarray vs. List
- For vector & matrix computation, "ndarray" is without question the better choice!

            | ndarray                                  | List
Memory use  | Elements stored contiguously from a single base address | Elements stored at scattered addresses
Operations  | Good for sequential whole-array computation | Good for per-element operations
Implemented as | Array (C/C++ array)                   | Array of references to objects
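The contiguous layout in the table above is what enables NumPy's vectorized style: one expression applies to every element at once, whereas a list needs an explicit per-element loop. A small illustrative comparison (an addition to the slide):

```python
import numpy as np

lst = [1, 2, 3, 4, 5, 6]
arr = np.array(lst)

# list: element-wise work needs an explicit loop or comprehension
doubled_lst = [2 * x for x in lst]

# ndarray: one vectorized expression over the contiguous buffer
doubled_arr = 2 * arr

print(doubled_lst)   # [2, 4, 6, 8, 10, 12]
print(doubled_arr)   # [ 2  4  6  8 10 12]
```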

Page 51: [Tf2017] day1 jwkang_pub

Numpy: Numerical Python

The essentials of Numpy
- Creating an ndarray
- Indexing an ndarray
- ndarray universal functions

Page 52: [Tf2017] day1 jwkang_pub

Creating an ndarray

Practice 1: creating from a List

Ex9) ex_ndarray1.py

$ python
>>> import numpy as np
>>> templist = [1, 2, 3, 4, 5, 6]
>>> templist
[1, 2, 3, 4, 5, 6]
>>> nptemplist = np.array(templist)
>>> nptemplist
array([1, 2, 3, 4, 5, 6])

>>> whos            # IPython console
Variable     Type      Data/Info
--------------------------------
np           module    <module 'numpy' from ...>
nptemplist   ndarray   6: 6 elems, type `int64`, 48 bytes
templist     list      n=6

Page 53: [Tf2017] day1 jwkang_pub

Creating an ndarray

Practice 2: creating from a List + setting the dtype

Ex10) ex_ndarray2.py

$ python
>>> import numpy as np
>>> templist = [1, 2, 3, 4, 5, 6]
>>> templist
[1, 2, 3, 4, 5, 6]
>>> nptemplist = np.array(templist, dtype=np.int8)
>>> nptemplist
array([1, 2, 3, 4, 5, 6], dtype=int8)

>>> whos            # IPython console
Variable     Type      Data/Info
--------------------------------
np           module    <module 'numpy' from ...>
nptemplist   ndarray   6: 6 elems, type `int8`, 6 bytes
templist     list      n=6

Page 54: [Tf2017] day1 jwkang_pub

Creating an ndarray

Practice 3: creating all-zeros / all-ones vectors & matrices

Ex11) ex_zeros.py

$ python
>>> import numpy as np
>>> np.zeros(1)
array([ 0.])
>>> np.zeros(2)
array([ 0.,  0.])
>>> np.zeros([2,1])
array([[ 0.],
       [ 0.]])
>>> np.zeros([2,2])
array([[ 0.,  0.],
       [ 0.,  0.]])

Page 55: [Tf2017] day1 jwkang_pub

Creating an ndarray

Practice 3: creating all-zeros / all-ones vectors & matrices

Ex12) ex_ones.py

$ python
>>> import numpy as np
>>> np.ones(1)
array([ 1.])
>>> np.ones(2)
array([ 1.,  1.])
>>> np.ones([2,1])
array([[ 1.],
       [ 1.]])
>>> np.ones([2,2])
array([[ 1.,  1.],
       [ 1.,  1.]])

Page 56: [Tf2017] day1 jwkang_pub

Indexing an ndarray

Practice 4: ndarray indexing uses references
- Slicing: indexing with ":"
- The important difference from lists: a slice is a view of the original array

Ex13) ex_indexing.py

>>> arr = np.arange(10)
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> arr[5]
5
>>> arr[5:8]
array([5, 6, 7])
>>> arr[5:8] = 12
>>> arr
array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

Page 57: [Tf2017] day1 jwkang_pub

Indexing an ndarray

Practice 5: ndarray indexing uses references
- Slicing: indexing with ":"
- The important difference from lists: a slice is a view of the original array
- To get a copy of an ndarray, use e.g. arr[5:8].copy()

Ex14) ex_slicing.py

>>> arr_slice = arr[5:8]
>>> arr_slice[1] = 12345
>>> arr
array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,     9])
>>> arr_slice[:] = 128
>>> arr
array([  0,   1,   2,   3,   4, 128, 128, 128,   8,   9])

Page 58: [Tf2017] day1 jwkang_pub

ndarray universal functions

Practice 6: functions that operate element-wise on the data inside an ndarray

Function | Description
abs, fabs | Absolute value of each element; for non-complex data, fabs is faster
sqrt | Square root of each element
square | Square of each element
exp | Exponential e^x of each element
log, log10, log2 | Natural / base-10 / base-2 logarithm of each element
ceil | Round each element up to the nearest integer
floor | Round each element down to the nearest integer
modf | Return the fractional and integral parts of each element as separate arrays
isnan, isinf | Boolean arrays flagging NaN / Inf elements
cos, cosh, sin, sinh, tan, tanh | Trigonometric / hyperbolic functions of each element

Page 59: [Tf2017] day1 jwkang_pub

ndarray universal functions

Practice 6: functions that operate element-wise on the data inside an ndarray

Function | Description
add | Element-wise addition of two ndarrays
subtract | Element-wise subtraction of two ndarrays
multiply | Element-wise multiplication of two ndarrays
divide, floor_divide | Divide elements of the first array by those of the second; floor_divide keeps only the quotient
power | Raise elements of the first array to the powers given by the second array
maximum, fmax | Element-wise maximum of the two; fmax ignores NaN
minimum, fmin | Element-wise minimum of the two; fmin ignores NaN
logical_and, logical_or, logical_xor | Element-wise logical operations
mod | Element-wise remainder of dividing the first array by the second
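A few of the universal functions from the tables above, applied to small arrays (illustrative values of my own choosing):

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([2.0, 2.0, 2.0])

print(np.sqrt(x))         # element-wise square root: [1. 2. 3.]
print(np.add(x, y))       # element-wise addition:    [ 3.  6. 11.]
print(np.maximum(x, y))   # element-wise maximum:     [2. 4. 9.]

frac, whole = np.modf(np.array([1.5, -2.25]))
print(frac, whole)        # fractional and integral parts as two arrays
```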

Page 60: [Tf2017] day1 jwkang_pub

LAB1: The Random Walk problem

Starting from floor 0, each trial moves one step up or one step down with equal probability.

Simulate the floor number as a function of the number of trials:
- Initial position: position = 0
- Number of trials: step = 10

Print only the values where the random walk goes negative.

Implement it with Numpy:
- np.where()
- np.random.randint(2, size=1)
- np.cumsum()
- for and if statements

Reference github link: https://github.com/jwkanggist/EveryBodyTensorFlow/blob/master/lab1_randomwalk.py
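One way the lab above can be sketched with the suggested NumPy functions. This is a hedged sketch only: the reference solution lives in the linked lab1_randomwalk.py, and the fixed seed here is my own assumption for reproducibility.

```python
import numpy as np

np.random.seed(0)   # fixed seed (my assumption, not part of the LAB spec)

nsteps = 10         # number of trials, as in the LAB spec

# each trial moves one floor up (+1) or down (-1) with equal probability
steps = 2 * np.random.randint(2, size=nsteps) - 1
position = np.cumsum(steps)              # floor number after each trial

print(position)                          # the simulated walk
print(position[np.where(position < 0)])  # only the trials where the walk is negative
```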

Page 61: [Tf2017] day1 jwkang_pub

1. Explaining machine learning to your family

- What is machine learning?
- Why machine learning is in the spotlight now
- The three elements of machine learning
- Types of machine learning
- Trade-offs in machine learning

Page 62: [Tf2017] day1 jwkang_pub

Reference: Machine learning fundamentals

Hands-On Machine Learning with Scikit-Learn & TensorFlow

Aurélien Géron (Google), 2017, O'Reilly

Please note in advance that this lecture draws on this textbook.

Page 63: [Tf2017] day1 jwkang_pub

What is Machine Learning?

What is machine learning?

Page 64: [Tf2017] day1 jwkang_pub

What is Machine Learning?

What is machine learning?

- Computers having intelligence? Artificial intelligence? AlphaGo?

- Making predictions from data?

- Telling computers what to do?

Page 65: [Tf2017] day1 jwkang_pub

What is Machine Learning?

Definition (Machine Learning): Field of study that gives computers the ability to learn without being explicitly programmed.

- Arthur Samuel, 1959

What is machine learning?

- Computers having intelligence? Artificial intelligence? AlphaGo?

- Making predictions from data?

- Telling computers what to do?

Page 66: [Tf2017] day1 jwkang_pub

What is Machine Learning?

What is machine learning?

- Computers having intelligence? Artificial intelligence? AlphaGo?

- Making predictions from data?

- Telling computers what to do?

Definition (Machine Learning): A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, measured by P, is improved with experience E.

- Tom Mitchell, 1997

Page 67: [Tf2017] day1 jwkang_pub

What is Machine Learning?

What is machine learning?

- Training a computer to carry out a task (T), through the experience of data (E), with respect to some measure of productivity / accuracy / efficiency (P).

Definition (Machine Learning): A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, measured by P, is improved with experience E.

- Tom Mitchell, 1997

Page 68: [Tf2017] day1 jwkang_pub

The elements of machine learning

What do we need in order to do machine learning?

- Experience (E): the training data set

- Task (T): the system or model to be learned

- Measure (P): the performance measure, e.g. accuracy

Page 69: [Tf2017] day1 jwkang_pub

The elements of machine learning

What do we need in order to do machine learning?

- Experience (E): the training data set
- Task (T): the system or model to be learned
- Measure (P): the performance measure, e.g. accuracy

Training step (example: a spam mail filtering system):

- (E): the user-flagged spam list (collecting training data: Spam ✔ Spam ✔ Spam ✔)
- (T): the spam filtering system being trained
- (P): filtering accuracy

Page 70: [Tf2017] day1 jwkang_pub

The elements of machine learning

What do we need in order to do machine learning?

- Experience (E): the training data set
- Task (T): the system or model to be learned
- Measure (P): the performance measure, e.g. accuracy

Tasking step:

- An arbitrary input goes into (T), the spam mail filtering system, which performs the filtering (spam classification) and outputs SPAM or not.

Page 71: [Tf2017] day1 jwkang_pub

Why Machine Learning?

Conventional approach
- The programmer solves the problem with a handful of rules inferred from fragmentary observations:
  - It is hard to craft universal rules
  - Solving the real problem is not guaranteed
  - The outcome depends heavily on the engineer's experience

(Figure: Study the Problem -> Establish rules -> Evaluate -> Analyze errors -> Release!)

Page 72: [Tf2017] day1 jwkang_pub

Why Machine Learning?

ML approach

- Find universal rules for solving the problem from the data:
  - Generalization over the training data set
  - Problem solving driven by the data

(Figure: Study the Problem -> Train ML algorithm (with lots of data) -> Evaluate -> Analyze errors -> Release!)

Page 73: [Tf2017] day1 jwkang_pub

Why Machine Learning?

ML approach

- Automatically find the rules for solving the problem from the data.
  - For machine learning to be truly meaningful:
  - there is big data, and
  - the system can be trained automatically.

(Figure: Train ML algorithm (with lots of data) -> Evaluate solution -> Update data -> Release! — the loop can be automated)

Page 74: [Tf2017] day1 jwkang_pub

Why Machine Learning?

Summary of the ML approach

- All from data!!
  - Larger data, better systems
  - Simple but universal rules
  - Easy rule updates given new data
  - Getting unrecognized insights about complex problems

(Figure: Study the Problem -> Train ML algorithm (with lots of data) -> Inspect the solution -> Understand the problem better -> iterate if needed -> Solution / New Insight)

Page 75: [Tf2017] day1 jwkang_pub

Why Machine Learning Now?

The three elements of machine learning

- Big data: smart devices, IoT, sensor networks

- Algorithms: deep learning / reinforcement learning

- Computing power: GPUs, parallel / distributed computing

Page 76: [Tf2017] day1 jwkang_pub

Why Machine Learning Now?

The three elements of machine learning

- Big data >> computing power > algorithms

  - Given high-quality and plentiful data, the performance of a machine-learning system will almost unconditionally be good.

Page 77: [Tf2017] day1 jwkang_pub

Why Machine Learning Now?

The three elements of machine learning

- Big data >> computing power > algorithms

  - Given high-quality and plentiful data, the performance of a machine-learning system will almost unconditionally be good.

So with data, are engineers no longer needed?
- No! The three elements are rarely all given at once.

We need "Domain Adaptation!!"

Goal: better performance + shorter training time + minimal data volume

Page 78: [Tf2017] day1 jwkang_pub

We Still Need ML Engineers!

Domain Adaptation 1 — Feature Extraction
- Extract good features from the data to reduce the data volume
- Simplify data without losing too much information!
- Domain knowledge is key to doing feature extraction well

Domain Adaptation 2 — Learning Environment
- On mobile, computing power and data storage are scarce
- Long training times are also a burden
- Google's approach to a mobile ML environment (utblink):
  - Pre-train the model in the cloud (+ Cloud ML)
  - Locally fine-tune with private data in the mobile app
  - Run the prediction engine in the mobile app

Page 79: [Tf2017] day1 jwkang_pub

Types of ML Systems

By learning style:

- Supervised

- Unsupervised

- Reinforcement

By data-update style:

- Batch-based learning

- Online-based learning

Page 80: [Tf2017] day1 jwkang_pub

Types of ML Systems

Supervised learning
- The training data set consists of input-output pairs of the system
  - The output is called the label
  - The input is called the instance (or feature)

(Figure: a training set of labeled instances, and a new, unlabeled instance "?")

Page 81: [Tf2017] day1 jwkang_pub

Types of ML Systems

Supervised learning
- The training data set consists of input-output pairs of the system
- A teacher supplies the answer (label) while the computer learns
- The most common form of machine learning

An example: the spam filter revisited
- (E): the user-flagged spam list (collecting training data: Spam ✔ Spam ✔ Spam ✔)
- (T): the spam filtering system
- (P): filtering accuracy

Page 82: [Tf2017] day1 jwkang_pub

Types of ML Systems

Supervised learning

- Regression

- Classification

Models in supervised learning

- Linear models

- Neural networks

- SVM, RVM

Page 83: [Tf2017] day1 jwkang_pub

Types of ML Systems

Unsupervised learning
- The training data set contains only inputs

- The computer learns without a teacher

- Most data in the world has no labels

(Figure: an unlabeled training set)

Page 84: [Tf2017] day1 jwkang_pub

Types of ML Systems

Unsupervised learning
- The training data set contains only inputs

- The computer learns without a teacher

- Most data in the world has no labels

(Figure: clusters discovered in the unlabeled training set)

Page 85: [Tf2017] day1 jwkang_pub

Types of ML Systems

Unsupervised learning
- Clustering

- Dimensionality reduction
  - The goal is to simplify the data without losing too much information

- Feature extraction:
  - A kind of dimensionality reduction
  - Reduces the computational & memory cost of ML
  - Makes the ML system efficient!
  - Domain knowledge is required

Page 86: [Tf2017] day1 jwkang_pub

누구나 TensorFlow!J. Kang Ph.D.

Types of ML Systems

Semisupervised learning– Combination of supervised and unsupervised learning

• 1) Supervised learning with labeled data

• 2) Unsupervised learning with unlabeled data

• 3) Self-labelling of unlabeled data using step1 and step2

• 4) re-training of the system

86

이미지출처:https://www.researchgate.net/publication/277605013_Signal_Processing_Approaches_to_Minimize_or_Suppress_Calibration_Time_in_Oscillatory_Activity-Based_Brain-Computer_Interfaces
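The four steps above can be sketched with a toy nearest-centroid classifier in plain NumPy. The data, the classifier choice, and every name below are my own illustrative assumptions, not the lecture's:

```python
import numpy as np

rng = np.random.RandomState(0)
# two Gaussian blobs; only the first five points of each class are labeled
X = np.vstack([rng.randn(50, 2) + [2, 2], rng.randn(50, 2) - [2, 2]])
y = np.array([0] * 50 + [1] * 50)
labeled = np.r_[0:5, 50:55]
unlabeled = np.setdiff1d(np.arange(100), labeled)

def fit_centroids(X, y):
    # nearest-centroid "classifier": one mean vector per class
    return np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

centroids = fit_centroids(X[labeled], y[labeled])   # 1) supervised step on labeled data
pseudo = predict(centroids, X[unlabeled])           # 3) self-label the unlabeled data
y_all = np.concatenate([y[labeled], pseudo])        # 4) re-train on everything
centroids = fit_centroids(np.vstack([X[labeled], X[unlabeled]]), y_all)

print((predict(centroids, X) == y).mean())          # accuracy over all points
```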

Page 87: [Tf2017] day1 jwkang_pub

Types of ML Systems

Semisupervised learning
- Training data set = labeled data + unlabeled data

- A combination of supervised and unsupervised learning

(Figure: mostly unlabeled points with a few labeled ones, plotted over Feature 1 / Feature 2)

Page 88: [Tf2017] day1 jwkang_pub

Types of ML Systems

Semisupervised learning
- Training data set = labeled data + unlabeled data

- A combination of supervised and unsupervised learning

(Figure: supervised learning with the labeled data, over Feature 1 / Feature 2)

Page 89: [Tf2017] day1 jwkang_pub

Types of ML Systems

Semisupervised learning
- Training data set = labeled data + unlabeled data

- A combination of supervised and unsupervised learning

(Figure: labeling the unlabeled data with the trained classifier, over Feature 1 / Feature 2)

Page 90: [Tf2017] day1 jwkang_pub

Types of ML Systems

Semisupervised learning
- Training data set = labeled data + unlabeled data

- A combination of supervised and unsupervised learning

(Figure: re-training and classifier update, over Feature 1 / Feature 2)

Page 91: [Tf2017] day1 jwkang_pub

Types of ML Systems

Reinforcement learning
- The computer learns the best action strategy by itself from the environment, through rewards accumulated over time

- Learning on its own from outcomes

[Image source] https://www.analyticsvidhya.com/blog/2016/12/getting-ready-for-ai-based-gaming-agents-overview-of-open-source-reinforcement-learning-platforms/

Page 92: [Tf2017] day1 jwkang_pub

Types of ML Systems

MarI/O (2015)
- Goal: learn the sequence of button choices for the game
- https://www.youtube.com/watch?v=qv6UVOQ0F44
- Each game frame is modeled with a supervised DNN:
  - Input: the frame's screen image
  - Output: the button choice for that image (A, B, X, Y, R, L, up, down, right, left)
- Reinforcement learning:
  - The environment of each frame is defined as a "State" (block layout, enemy positions)
  - The Action (button choice) at the current state determines the next state
  - For each state, a "Reward" (fitness) is fed back according to the Action:
    - +1 for staying alive
    - +X for eating mushrooms or coins
    - -Y for taking damage from an enemy
  - Actions with higher reward in a state are given higher probability
  - The reward is fed back over the past N states (dynamic)

Page 93: [Tf2017] day1 jwkang_pub

Types of ML Systems

Google DeepMind's Breakout (2016)
- https://www.youtube.com/watch?v=V1eYniJ0Rnk

- The AI went beyond merely surviving and discovered, on its own, a strategy to score points comfortably

- Known as the first case of an AI evolving its own action strategy

Page 94: [Tf2017] day1 jwkang_pub

Types of ML Systems

Batch learning
- Feed the entire training data set at once and train the system in one go
  - Pros: simple / stable behavior within the given data
  - Cons:
    - Inefficient at coping with rapidly changing data
    - High learning cost:
      - Computation: large, because the data is large
      - Memory: the training data must be kept around at all times
      - Time: training takes long, because the data is large

Page 95: [Tf2017] day1 jwkang_pub

Types of ML Systems

Online learning
- Even after the system has been trained, it can be incrementally updated online as new instance data arrives

(Figure: Train ML algorithm (with lots of data) -> Evaluate solution -> Release! -> Run and update the model as New Data keeps arriving)

Page 96: [Tf2017] day1 jwkang_pub

Types of ML Systems

Online learning
- Even after the system has been trained, it can be incrementally updated online as new instance data arrives

- To reduce the hardware's computational load, batch learning can also be chopped into pieces and run as online learning (in practice, everyone does this)

(Figure: Study the Problem -> chop the data into pieces -> Train online ML algorithm -> Evaluate -> Analyze errors -> Release!)

Page 97: [Tf2017] day1 jwkang_pub

Types of ML Systems

Online learning
- Pros:
  - Responds immediately to changing data
  - Learning cost can be kept low
  - The training data need not be kept around

- Cons: the system can become overly sensitive to changes in the data

Page 98: [Tf2017] day1 jwkang_pub

Trade-offs in ML

Model selection

- Underfitting

- Overfitting

The curse of dimensionality

Page 99: [Tf2017] day1 jwkang_pub

Trade-offs in ML

Model selection
- Choosing a machine-learning model that matches the scope of the target data is crucial!

- With respect to the intended scope of the data:

Page 100: [Tf2017] day1 jwkang_pub

Trade-offs in ML

Model selection
- Choosing a machine-learning model that matches the scope of the target data is crucial!

- With respect to the intended scope of the data:
  - A model that explains it poorly has a very large prediction error (Underfitting)

(Figure: Training data / Validation data)

Page 101: [Tf2017] day1 jwkang_pub

Trade-offs in ML

Model selection
- Choosing a machine-learning model that matches the scope of the target data is crucial!

- With respect to the intended scope of the data:
  - A model that explains it poorly has a very large prediction error (Underfitting)
  - A model whose error is small only on the training data (Overfitting)
    - It fails to provide generalized predictions

(Figure: Training data / Validation data)

Page 102: [Tf2017] day1 jwkang_pub

Trade-offs in ML

Model selection
- Choosing a machine-learning model that matches the scope of the target data is crucial!

- With respect to the intended scope of the data:
  - A model that explains it poorly has a very large prediction error (Underfitting)
  - A model whose error is small only on the training data (Overfitting)
    - It fails to provide generalized predictions
  - A model that is no more complex than necessary

(Figure: Training data / Validation data)

Page 103: [Tf2017] day1 jwkang_pub

Trade-offs in ML

Model selection

- Simple models
  - Good:
    - 1) Produce generalized results for the given data set
    - 2) Can be trained with a small data set
    - 3) Outperform complex models on a narrow range of data
  - Bad:
    - 1) Underfitting: cannot explain (model) a diverse / wide range of data

- Complex models
  - Good:
    - 1) Can explain (model) a diverse / wide range of data
  - Bad:
    - 1) On a narrow range of data, performance degrades relative to a simple model
    - 2) Training requires a diverse and large data set
    - 3) Overfitting: it is hard to obtain generalized results on the target data
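The simple-vs-complex trade-off above can be seen in a few lines with polynomial fits of different degrees. A sketch with toy data of my own choosing (the sine target, noise level, and degrees are all assumptions):

```python
import numpy as np

rng = np.random.RandomState(0)
truth = lambda x: np.sin(2 * np.pi * x)

x_train = np.linspace(0, 1, 10)
y_train = truth(x_train) + 0.2 * rng.randn(x_train.size)   # small, noisy training set
x_val = np.linspace(0, 1, 50)
y_val = truth(x_val)                                       # clean validation targets

errors = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    errors[degree] = (train_err, val_err)
    # degree 1 underfits (both errors stay large); degree 9 drives the
    # training error toward zero while the validation error need not follow
    print(degree, train_err, val_err)
```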

Page 104: [Tf2017] day1 jwkang_pub

Trade-offs in ML

Model selection

(Figure: given a certain size of data, the validation error as a function of model complexity)

Page 105: [Tf2017] day1 jwkang_pub

Trade-offs in ML

The Curse of Dimensionality
- As the dimensionality of the data grows, the number of samples needed for learning grows exponentially

- The amount of data needed to model the same fraction of the space rises rapidly

- Example: the data needed to fill 20% of the space:
  - 0.2 (1-D), 0.9 (2-D), 1.74 (3-D)

[Image source] http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/

1-D data: 0.2 = 0.2

2-D data: 0.2 = 0.45 * 0.45

3-D data: 0.2 = 0.58 * 0.58 * 0.58
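The side lengths on this slide follow from requiring side^d = 0.2, which can be checked directly:

```python
# to cover a fraction f of a d-dimensional unit hypercube with a sub-cube,
# each side of the sub-cube must have length f ** (1/d)
f = 0.2
for d in (1, 2, 3):
    side = f ** (1.0 / d)
    print(d, round(side, 2))   # 1 -> 0.2, 2 -> 0.45, 3 -> 0.58
```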

Page 106: [Tf2017] day1 jwkang_pub

Trade-offs in ML

The Curse of Dimensionality
- An amount of data that was sufficient in low dimensions can become insufficient to describe the space (the learned model) in high dimensions
  - Insufficient data leads to an Overfitting problem

[Image source] http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/

Page 107: [Tf2017] day1 jwkang_pub

Trade-offs in ML

The Curse of Dimensionality
- An amount of data that was sufficient in low dimensions can become insufficient to describe the space (the learned model) in high dimensions
  - Insufficient data leads to an Overfitting problem

[Image source] http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/

Page 108: [Tf2017] day1 jwkang_pub

Trade-offs in ML

The Curse of Dimensionality
- An amount of data that was sufficient in low dimensions can become insufficient to describe the space (the learned model) in high dimensions

- Not enough data to explain the model?

Page 109: [Tf2017] day1 jwkang_pub

Trade-offs in ML

The Curse of Dimensionality
- An amount of data that was sufficient in low dimensions can become insufficient to describe the space (the learned model) in high dimensions

- Not enough data to explain the model? An Overfitting problem

- Remedies:
  - Increase the amount and diversity of the data
  - Keep only the important features: dimensionality reduction
    - Feature Extraction
    - Feature Selection
  - Use prior knowledge: Regularization

Page 110: [Tf2017] day1 jwkang_pub

Trade-offs in ML

Raw data vs. Featured data

Raw data | Featured data
No information loss; can discover unrecognized patterns | Some information loss; sometimes degrades performance
Curse of dimensionality | Dimensionality reduction
Higher computational cost, slower learning time | Lower computational cost, faster learning time
Large-scale storage is required | Small-scale storage is fine

Page 111: [Tf2017] day1 jwkang_pub

Trade-offs in ML

Big data vs. Small data

Big data | Small data
No overfitting to the training data | Overfitting to the training data is possible
Works with complex models for difficult problems | Works with simple models for easy problems
Higher computational cost, slower learning time | Lower computational cost, faster learning time
Large-scale storage is required | Small-scale storage is fine