Time-Series Data Analysis through Machine Learning and Applications to Financial Market Prediction · krnet.or.kr/board/data/dprogram/2190/S2-1_%C3%D6%C0%E7%BD%C4.pdf

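Time-Series Data Analysis through Machine Learning and Applications to Financial Market Prediction

Ulsan National Institute of Science and Technology (UNIST)

School of Electrical and Computer Engineering

Jaesik Choi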

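Facebook's face recognizer (DeepFace) achieves recognition performance close to that of humans

▪ Problem: name the celebrity shown in a photo

▪ Human accuracy: 97.5% vs DeepFace accuracy: 97.35% (March 2014)

Face recognition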

ImageNet (http://image-net.org): more than 15 million images (22,000 object categories)

ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

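Problem: for a given image, propose 5 of the 1,000 object categories

▪ Prediction error of pre-deep-learning systems (SVM, ensemble), 2012: >26.2%

▪ Deep learning systems: 2012 AlexNet (University of Toronto): 15.3%; 2014 GoogLeNet (Google): 6.7%; 2015 ResNet (Microsoft): 3.6%; 2016 Trimps (Chinese public-security research institute): 3.0%

Object recognition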

AlphaGo (https://deepmind.com/alpha-go)

AlphaGo won its match against Lee Sedol 4 to 1 (March 2016)

Go (AlphaGo)

https://www.youtube.com/watch?v=Dy0hJWltsyE

The Deep Learning Revolution - NVIDIA

What is the future of AI?

Artificial Intelligence

Impact of disruptive technologies on the world economy in 2025 (McKinsey, 2013)

Automation of Knowledge Work

Data collection / analysis / processing

Automation of Knowledge Work: Finance and Insurance

SOURCE: https://public.tableau.com/profile/mckinsey.analytics#!/vizhome/AutomationBySector/WhereMachinesCanReplaceHumans

Financial Time Series Analysis

AI based Startups for Fintech

"What is fear?" ... Cool-headed AI earns a 600% return while the stock market swings (Chosun Biz, March 18, 2017)

An Ensemble of Models for Stock Market Prediction

• Problem: predicting the S&P 500, 1992 to 2015

Krauss et al., Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500, European Journal of Operational Research, 2017.

Before using AI (1992 ~ 2001/3): 200%

Applying AI methods (2001/4 ~ 2008/8): 22%

Crisis (2008/09 ~ 2009/12): 400%

Recent models (2010/1 ~): 0%

Model: Deep Neural Networks + Gradient boosted tree + Random forest
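
The slide does not say how the three models are combined. As a minimal sketch, assume each model outputs a per-stock probability of outperforming the market, the ensemble is their equal-weight average, and the top and bottom k stocks are picked for a long/short portfolio. The tickers, probabilities, and function names below are hypothetical.

```python
def ensemble_scores(dnn, gbt, rf):
    """Equal-weight average of three models' per-stock probabilities (assumed scheme)."""
    return {ticker: (dnn[ticker] + gbt[ticker] + rf[ticker]) / 3.0 for ticker in dnn}


def long_short_portfolio(scores, k):
    """Rank stocks by ensemble score; go long the top k and short the bottom k."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], ranked[-k:]


# Hypothetical probabilities from a deep neural network, gradient-boosted trees,
# and a random forest for four made-up tickers.
dnn = {"AAA": 0.62, "BBB": 0.48, "CCC": 0.55, "DDD": 0.40}
gbt = {"AAA": 0.58, "BBB": 0.52, "CCC": 0.51, "DDD": 0.45}
rf = {"AAA": 0.60, "BBB": 0.47, "CCC": 0.53, "DDD": 0.41}

scores = ensemble_scores(dnn, gbt, rf)
longs, shorts = long_short_portfolio(scores, k=1)
```

Averaging is only one way to combine the models; weighting each model by its past performance would be an equally plausible choice.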

Traders cut from 600 to 2: Goldman Sachs becomes an IT company (Economy Chosun, February 22, 2017)

"About a quarter of Goldman Sachs's roughly 35,000 employees are computer engineers" (Goldman Sachs CFO, '2017 CSE Symposium')

The Relational Automatic Statistician (UNIST)

An AI system that analyzes changes in multiple time series and the relationships among them

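Stock data

Multiple stock data

Bayesian multiple kernel learning

Query

Automatic report writing / change prediction

Analysis of relationships between stocks

UNIST system: 40% lower prediction error than the MIT/Cambridge analysis system (June 2016)

Learning

Deep learning

Perceptron

Nonlinear Transform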

Linearly Separable Classes in a Multilayer Perceptron

After a nonlinear transformation, red and blue are linearly separable*

* Y. LeCun, Y. Bengio, G. Hinton (2015). Deep Learning. Nature 521, 436-444.
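
The effect of a nonlinear transformation can be seen on a toy problem. The sketch below is an illustration and is not taken from the slides: it trains a classic perceptron on the XOR points, which are not linearly separable, and then on the same points after adding the product feature x1*x2. The choice of XOR and of the product feature are assumptions made for the example.

```python
def train_perceptron(samples, labels, max_epochs=1000):
    """Classic perceptron rule. Returns (weights, bias, converged)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: the classes are separated
            return w, b, True
    return w, b, False


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def transform(point):
    """Nonlinear feature map: append the product of the two inputs."""
    x1, x2 = point
    return (x1, x2, x1 * x2)


xor_points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [-1, 1, 1, -1]

# In the raw input space no separating line exists, so training never converges.
_, _, raw_converged = train_perceptron(xor_points, xor_labels)

# After the transformation a separating plane exists and the perceptron finds it.
lifted_points = [transform(p) for p in xor_points]
w, b, lifted_converged = train_perceptron(lifted_points, xor_labels)
lifted_predictions = [predict(w, b, x) for x in lifted_points]
```

A multilayer perceptron learns such a transformation in its hidden layers instead of having it written by hand.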

How Does Deep Learning Work?

https://www.youtube.com/watch?v=He4t7Zekob0

Learning Feature Hierarchy - Examples

Recognizing drawing

Recognizing human faces

LeNet-5 (1989) vs GoogLeNet (2014)

LeNet-5: recognizing digits using a neural network with 5 layers

Recognizing human faces

Deep Learning Artistic Style

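Satellite-image-based soybean yield prediction (United States)

Soybean yield forecast (Bu/Ac)

Source    2016/08   2016/09   2016/10
KERNEL    50 >      51.0      52.1
USDA      48.9      50.6      51.4

Satellite-image, AI-based soybean yield prediction (city/county level) [US USDA: state-level forecasts]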

http://www.telluslabs.com/2016/10/12/telluslabs-forecasts-lead-usda-reports-corn-soy/

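Energy use (reducing carbon emissions)

Category: Description

Need: To cut greenhouse gas emissions, energy consumption must be reduced and more efficient ways of using energy must be found.

Results: A neural network model was built from citizens' energy-usage data (temperature, usage, delivery speed, etc.). The resulting model can be used to explore ways of reducing energy use.

Technologies used: Machine Learning / Deep Learning / Optimization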

https://environment.google/approach/

Google says that it emits 1.5m tonnes of carbon annually but claims that its data centres consume 50% less energy than the industry average.

https://www.theguardian.com/technology/google

www.google.com/about/datacenters/efficiency/internal/assets/machine-learning-applicationsfor-datacenter-optimization-finalv2.pdf

Energy use (reducing carbon emissions)

1st ranked model in Kaggle

https://www.kaggle.com/c/otto-group-product-classification-challenge/discussion/14335#133321

Predicting Markets with Google Trends

• Problem: investing in stocks based on Google search keywords

• Goal: maximize return

T. Preis, H. S. Moat and H. E. Stanley, Quantifying Trading Behavior in Financial Markets Using Google Trends, Scientific Reports, 2013.

Sell stocks at the close of the first trading day of the week when the search volume for "debt" exceeds its 3-week average. Otherwise, buy stocks at the close of the first trading day of the week.
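
The trading rule above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the position for week t is decided from the search volume of week t-1 compared with the mean of the 3 weeks before it, each position is held for one week, and performance is measured as a sum of log returns. The data and function names are made up for the example.

```python
import math


def weekly_positions(search_volume, window=3):
    """Return {week: +1 (buy) or -1 (sell)} for every week with enough history.

    Week t's position compares week t-1's search volume with the mean of
    the `window` weeks that precede it (an assumed timing convention).
    """
    positions = {}
    for t in range(window + 1, len(search_volume)):
        last = search_volume[t - 1]
        baseline = sum(search_volume[t - 1 - window:t - 1]) / window
        positions[t] = -1 if last > baseline else 1
    return positions


def strategy_log_return(prices, positions):
    """Sum of one-week log returns, signed by the position taken each week."""
    total = 0.0
    for t, position in positions.items():
        if t + 1 < len(prices):  # need next week's close to exit the trade
            total += position * math.log(prices[t + 1] / prices[t])
    return total


# Hypothetical weekly search volumes for "debt" and weekly closing prices.
debt_searches = [10, 10, 10, 10, 20, 5, 5, 5]
closes = [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 90.0, 99.0]

positions = weekly_positions(debt_searches)
total_log_return = strategy_log_return(closes, positions)
```

In this toy series the spike in searches during week 4 triggers a sell in week 5, just before the price drop, so the signed return is positive.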

Dataminr – Event Detection Technology

KB Knowledge Vitamin, September 2015

Reading Texts for Predicting the Future

Constant function

Sudden drop between 9/12/01 and 9/15/01

Smooth function (length scale: y weeks)

Rapidly varying smooth function (length scale: z hours)

Quarterly Report News

Deep learning and time-series data analysis

Residual Network (ResNet, He et al., 2015)

Residual learning

Comparison of ResNet

3.6% error in the ImageNet Challenge, 2015
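
Residual learning means a block outputs F(x) + x, so its layers only have to learn the residual F(x) on top of an identity shortcut. The following is a minimal sketch, assuming small fully connected layers in place of the convolutions and batch normalization used by He et al.; all names and weights are illustrative.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def linear(weights, bias, values):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two dense layers with a ReLU between."""
    hidden = relu(linear(w1, b1, x))
    residual = linear(w2, b2, hidden)                      # F(x)
    return relu([r + xi for r, xi in zip(residual, x)])   # skip connection adds x


# With all-zero weights F(x) = 0, so the block passes a non-negative input through.
zeros = [[0.0, 0.0], [0.0, 0.0]]
no_bias = [0.0, 0.0]
identity_output = residual_block([1.0, 2.0], zeros, no_bias, zeros, no_bias)
```

Because an untrained block starts close to the identity, stacking many such blocks does not by itself make the signal harder to propagate.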

Densely Connected Convolutional Networks (DenseNet, Huang et al., 2016)

Better performance than ResNet

CIFAR-10 error: 3.74% (ResNet: 4.62%)

CIFAR-100 error: 19.25% (ResNet: 22.71%)

Recurrent Convolutional Neural Networks (RCNN, Liang and Hu, 2015)

* Figure drawn by Subin Yi

Recurrent Convolutional Layer (RCL)
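
In a recurrent convolutional layer, the state at each step combines a feed-forward convolution of the input with a recurrent convolution of the layer's own previous state. A minimal one-dimensional sketch follows, assuming ReLU activation and zero padding, and omitting the bias and local response normalization of the original formulation; the kernels and function names are illustrative.

```python
def conv1d_same(signal, kernel):
    """1-D convolution (cross-correlation) with zero padding; odd kernel length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]


def recurrent_conv_layer(u, w_forward, w_recurrent, steps):
    """Unfold the layer for `steps` recurrent iterations over a fixed input u."""
    feed_forward = conv1d_same(u, w_forward)
    state = [max(0.0, z) for z in feed_forward]  # step 0: feed-forward only
    for _ in range(steps):
        recurrent = conv1d_same(state, w_recurrent)
        state = [max(0.0, f + r) for f, r in zip(feed_forward, recurrent)]
    return state


u = [0.0, 1.0, 0.0, 0.0]
w_forward = [0.0, 1.0, 0.0]    # copies the input
w_recurrent = [1.0, 0.0, 0.0]  # reads the left neighbour of the previous state

state_0 = recurrent_conv_layer(u, w_forward, w_recurrent, steps=0)
state_2 = recurrent_conv_layer(u, w_forward, w_recurrent, steps=2)
```

Each extra step lets a unit see one more position of context, which is how the layer enlarges its effective receptive field without adding parameters.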

RCNN on EEG Analysis

Hand Start

First Digit Touch

Lift off

Replace

Both Released

* Joint work with Azamatbek Akhmedov

Luciw et al., Multi-channel EEG recordings during 3,936 grasp and lift trials with varying weight and friction, Scientific Data (Nature), 2014

One chunk of data: (3584, 32)

Applying RCL

Convolutional Layer: (1,3584)

Max pooling

RCL: (1,896)

Max pooling

RCL: (1,224)

Max pooling

RCL: (1,56)

Max pooling

RCL: (1,14)

Max pooling

(1,7)

Fully Connected

(6)

97.687%
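
The layer sizes on this slide are consistent with four max-pooling steps of width 4 followed by one of width 2. The pool widths are inferred from the listed sizes, not stated in the source, so the check below is a sketch under that assumption.

```python
def pooled_lengths(length, pool_widths):
    """Sequence length after each non-overlapping max-pooling step."""
    lengths = [length]
    for width in pool_widths:
        if length % width != 0:
            raise ValueError(f"length {length} is not divisible by pool width {width}")
        length //= width
        lengths.append(length)
    return lengths


# Assumed pool widths that reproduce the sizes shown on the slide.
lengths = pooled_lengths(3584, [4, 4, 4, 4, 2])
```

The final length of 7 is what the fully connected layer maps to the 6 outputs.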

Complex systems: manufacturing

Deep learning model

Hot metal temperature prediction

Time-series data from about 400 sensors

Grouped CNN / Grouped RCNN (Yi, Ju and Choi, 2017)

Collected from 148 sensors over 12,654 time steps

Dataset: US Groundwater (Yi, Ju and Choi, 2017)

Collected from 88 sites over 28 years

Dataset: US Groundwater (Yi, Ju and Choi, 2017)

Groundwater Drone

Conclusion

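- Advances in AI and machine learning are expected to have a large impact on the recognition, analysis, and prediction of time-series data.

- Advances in deep learning for image and video recognition have also benefited time-series recognition.

- Recently, deep learning methods that make effective use of skip layers have shown good performance.

- Besides treating time series with dynamic models (RNN/LSTM), models that apply a CNN to a specific time-series interval also appear to be effective.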

Thank you
