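孫民 (Min Sun) / Artificial Intelligence from a Computer Vision Perspective: The Next Big Thing

Artificial Intelligence: The Next Big Thing

from a computer vision perspective, VSLab

Min Sun (孫民), Department of Electrical Engineering, National Tsing Hua University (NTHU)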

What’s the Next Big Thing?

http://research.microsoft.com/en-us/um/redmond/events/fs2015

“big data being the source, machine learning being the technique, and AI being the outcome” by Prof. Hsuan-Tien Lin at IEEE BigData 2016

The mapping from many kinds of sources (data) to many kinds of outcomes (AI tasks) can be trained end-to-end using Deep Learning (DL)

Classical AI Tests: Turing Test

by Alan Turing in 1950

Chatbot@F8

https://developers.facebook.com/videos/f8-2016/keynote/

Classical AI Tests: CAPTCHA

Breaking CAPTCHA

by vicarious.com

AlphaGo

2016 by Google DeepMind

Is this what AI is all about?

2014 Subfields of AI

Artificial General Intelligence (AGI)

Deep Learning (DL)

•  Data
•  GPU Computing
•  Talents

DL Fuses AI Subfields

•  Vision and Language: http://mscoco.org/

•  Vision and Control: Atari Breakout game & AlphaGo, DeepMind

•  Multiple Encoding and Decoding

-> AGI

Image Captioning

f(image) = “The man at bat is ready to swing at the pitch”

Vision → Language

Recurrent Neural Network (RNN), figure credit: Nature

Convolutional Neural Network (CNN), figure credit: Wikipedia
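To make the encoder/decoder idea concrete, here is a minimal pure-Python sketch of the two building blocks: one 2-D convolution (the core operation of a CNN image encoder) and one step of a simple Elman RNN, which a caption decoder would unroll word by word. All sizes, weights, and function names are hypothetical and chosen for illustration; a real captioning model learns its weights and stacks many layers.

```python
import math

def conv2d_valid(image, kernel):
    """2-D 'valid' cross-correlation, the operation a CNN convolution layer computes."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rnn_step(x, h, w_xh, w_hh, b):
    """One Elman RNN step: h_new = tanh(W_xh x + W_hh h + b)."""
    pre = [p + q + c for p, q, c in zip(matvec(w_xh, x), matvec(w_hh, h), b)]
    return [math.tanh(v) for v in pre]

# Toy "encoder": a 4x4 image with a vertical edge and an edge-detecting kernel.
image = [[0, 0, 1, 1]] * 4
edge_kernel = [[-1, 1], [-1, 1]]
features = conv2d_valid(image, edge_kernel)   # 3x3 feature map
x0 = [v for row in features for v in row]     # flattened image feature

# Toy "decoder" step with hypothetical weights and hidden size 2.
hidden = 2
w_xh = [[0.1] * len(x0), [0.0] * len(x0)]
w_hh = [[0.0] * hidden for _ in range(hidden)]
b = [0.0] * hidden
h1 = rnn_step(x0, [0.0] * hidden, w_xh, w_hh, b)
```

In a captioning model, the image feature initializes or feeds the RNN, and each hidden state is mapped to a distribution over the next word.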

Image Question Answering

http://visualqa.org/

Zhen et al. ECCV 2016 from VSLab and Stanford AI Lab

Big Video Data with Titles

•  Pairs of raw videos and titles

Figure: frames of a raw video, each encoded by a CNN

Viral Videos

Search Google for “viral video company”

Large Video Repository

Currently 28740 videos, and the collection keeps growing


Vision and Control

https://gym.openai.com/

•  Learning to play games with weak supervision: Reinforcement Learning (RL)

Where It All Begins …

by DeepMind at the NIPS 2013 Deep Learning Workshop

Playing Atari with Deep Reinforcement Learning

slides by Yen-Chen Lin

Control: Learning to Act

Playing Breakout amounts to:
•  Input: screen images
•  Output: actions (do nothing | left | right)

Supervised Classification

Supervised Solution
•  Training data: recorded game sessions of experts
•  Target label: the action the expert takes at every step

Problems:
•  What if there's no expert?
•  This is not how humans learn

How Humans Learn
•  We don't need somebody to tell us a million times which move to choose at each screen
•  We just need occasional feedback that we did the right thing

Reinforcement Learning
•  Somewhere between supervised and unsupervised learning
•  Sparse and time-delayed labels (rewards)

Based only on these rewards, the agent has to learn how to behave in the environment. A rational agent should maximize its total reward.

RL in A Nutshell

Markov Decision Process

•  State

•  Action

•  Reward

The probability of the next state $s_{i+1}$ depends only on the current state $s_i$ and action $a_i$: $P(s_{i+1} \mid s_0, a_0, \dots, s_i, a_i) = P(s_{i+1} \mid s_i, a_i)$.

Episode

One episode of this process (e.g., one game) forms a finite sequence of states, actions, and rewards: $s_0, a_0, r_1, s_1, a_1, r_2, \dots, s_{n-1}, a_{n-1}, r_n, s_n$
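A toy illustration of one episode, assuming a hypothetical five-cell corridor environment (not Breakout) that reuses Breakout's three actions. The agent starts in the middle cell; reaching the right end gives reward +1 and ends the episode, reaching the left end gives -1 and ends it. The rollout records exactly the sequence $s_0, a_0, r_1, s_1, \dots, s_n$, and the step function uses only the current state and action, which is the Markov property.

```python
import random

ACTIONS = ("do nothing", "left", "right")  # same action set as Breakout
LEFT_END, RIGHT_END = 0, 4                 # terminal cells of the hypothetical corridor

def corridor_step(state, action):
    """Hypothetical environment: returns (next_state, reward, done).

    The next state depends only on (state, action): the Markov property."""
    nxt = state + (0, -1, +1)[action]
    if nxt == RIGHT_END:
        return nxt, 1.0, True
    if nxt == LEFT_END:
        return nxt, -1.0, True
    return nxt, 0.0, False

def rollout(policy, start=2, max_steps=100, seed=0):
    """One episode: states s_0..s_n, actions a_0..a_{n-1}, rewards r_1..r_n.

    The episode is cut off after max_steps if no terminal cell is reached."""
    rng = random.Random(seed)
    states, actions, rewards = [start], [], []
    state = start
    for _ in range(max_steps):
        action = policy(state, rng)
        state, reward, done = corridor_step(state, action)
        actions.append(action)
        rewards.append(reward)
        states.append(state)
        if done:
            break
    return states, actions, rewards

def random_policy(state, rng):
    return rng.randrange(len(ACTIONS))

def always_right(state, rng):
    return 2

states, actions, rewards = rollout(always_right)
```

With `always_right`, the episode is states `[2, 3, 4]`, actions `[2, 2]`, rewards `[0.0, 1.0]`.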

Example: Breakout

•  State: game screen

•  Action: 1. do nothing 2. left 3. right

•  Reward: game score

Example: Breakout

•  State: successive game screens (a single screen does not show which way the ball is moving)

•  Action: 1. do nothing 2. left 3. right

•  Reward: game score

•  To perform well, we should also take future rewards into account. How do we do that?

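Reward

Total reward: $R = r_1 + r_2 + r_3 + \dots + r_n$

Total future reward from time step t: $R_t = r_t + r_{t+1} + r_{t+2} + \dots + r_n$

Discounted Future Reward

•  However, since the environment is stochastic, rewards further in the future are less certain; intuitively, one should earn reward as soon as possible

Total discounted future reward: $R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots + \gamma^{n-t} r_n$, where $0 \le \gamma \le 1$ is the discount factor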
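The discounted sum can be computed for every time step in a single backward pass, because it satisfies the recursion $R_t = r_t + \gamma R_{t+1}$ with $R_{n+1} = 0$. A minimal sketch (the function name is illustrative):

```python
def discounted_returns(rewards, gamma):
    """Return [R_1, ..., R_n] with R_t = r_t + gamma * R_{t+1} and R_{n+1} = 0.

    rewards[k] holds r_{k+1}, following the episode indexing r_1..r_n."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for k in range(len(rewards) - 1, -1, -1):  # walk backward from the last reward
        running = rewards[k] + gamma * running
        returns[k] = running
    return returns
```

For example, rewards `[0.0, 0.0, 1.0]` with `gamma=0.9` give approximately `[0.81, 0.9, 1.0]`: the same final reward is worth less the earlier you look from. With `gamma=1` this reduces to the undiscounted total future reward; with `gamma=0` only the immediate reward counts.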

Q Function

•  Q(s, a): the maximum discounted future reward when we perform action a in state s and continue optimally from that point on.

It represents the “quality” of a certain action in a given state.
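Written as equations, using the discounted return $R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots$ (so taking $a_t$ in $s_t$ earns $r_{t+1}$ first), the definition and the recursive (Bellman) form that Q-learning and Deep Q Networks train toward are as follows. The expectation accounts for the stochastic environment; in a deterministic game it drops out and the recursion reads $Q(s, a) = r + \gamma \max_{a'} Q(s', a')$.

$$Q(s_t, a_t) = \max_{\pi} \mathbb{E}\left[ R_{t+1} \mid s_t, a_t, \pi \right], \qquad R_{t+1} = r_{t+1} + \gamma R_{t+2}$$

$$Q(s_t, a_t) = \mathbb{E}\left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') \mid s_t, a_t \right]$$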

How to Choose an Action?

If we know the Q function, we pick the action with the highest Q-value: $\pi(s) = \arg\max_a Q(s, a)$

Here π represents the policy, the rule for choosing an action in each state.

Q Function Implementation (lookup table)

           action 0   action 1   action 2
state 0       -2         -1          5
state 1        3          2          3
state 2        5          6         -6
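The table is a direct lookup from (state, action) to Q-value, so the greedy policy is one argmax per row. The update function is the standard tabular Q-learning step, added here as an assumption about how such a table could be filled in; the step size `alpha` and discount `gamma` are hypothetical values, not from the slides.

```python
# The lookup table from the slide: Q_TABLE[state][action]
Q_TABLE = [
    [-2, -1, 5],   # state 0
    [3, 2, 3],     # state 1
    [5, 6, -6],    # state 2
]

def greedy_action(q_table, state):
    """pi(s) = argmax_a Q(s, a); max() keeps the first of tied actions."""
    row = q_table[state]
    return max(range(len(row)), key=lambda a: row[a])

def q_learning_update(q_table, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = r if done else r + gamma * max(q_table[s_next])
    q_table[s][a] += alpha * (target - q_table[s][a])
    return q_table[s][a]

policy = [greedy_action(Q_TABLE, s) for s in range(len(Q_TABLE))]
```

Here `policy` is `[2, 0, 1]`; in state 1, actions 0 and 2 tie at 3 and this sketch picks the lower index.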

If We Use Pixels as State

1.  Resize images to 84x84
2.  Convert to grayscale with 256 levels
3.  Use the last 4 frames to represent the state

$256^{84 \times 84 \times 4} \approx 10^{67970}$ possible game states

We can never cover all the cases!
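A minimal stdlib-only sketch of the three preprocessing steps and of the state-count arithmetic. The grayscale weights (ITU-R BT.601 luma), nearest-neighbour resizing, and repeating the first frame until four frames exist are all assumed choices, not details from the slides.

```python
import math
from collections import deque

FRAME_H = FRAME_W = 84   # step 1: resize to 84x84
GRAY_LEVELS = 256        # step 2: 256 grayscale levels
HISTORY = 4              # step 3: last 4 frames form the state

def to_gray(rgb_frame):
    """RGB -> 8-bit gray with BT.601 luma weights (an assumed choice)."""
    return [[min(GRAY_LEVELS - 1, round(0.299 * r + 0.587 * g + 0.114 * b))
             for (r, g, b) in row] for row in rgb_frame]

def resize_nearest(frame, out_h=FRAME_H, out_w=FRAME_W):
    """Nearest-neighbour resize (an assumed choice of resampling)."""
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]

class FrameHistory:
    """Keeps the last HISTORY preprocessed frames; their stack is the state."""
    def __init__(self):
        self.frames = deque(maxlen=HISTORY)  # oldest frame drops out automatically

    def push(self, rgb_frame):
        self.frames.append(resize_nearest(to_gray(rgb_frame)))
        # Hypothetical padding rule: repeat the first frame until HISTORY frames exist.
        while len(self.frames) < HISTORY:
            self.frames.append(self.frames[-1])
        return tuple(self.frames)

# Raw state space: 256 ** (84 * 84 * 4). That integer has 67,971 decimal
# digits, so work with its base-10 logarithm instead of printing it.
log10_num_states = FRAME_H * FRAME_W * HISTORY * math.log10(GRAY_LEVELS)
print(f"about 10^{math.floor(log10_num_states)} states")
```

Since $84 \times 84 \times 4 = 28224$ and $\log_{10} 256 \approx 2.408$, the exponent is about $28224 \times 2.408 \approx 67970$, which is why a lookup table over raw pixels is hopeless.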

Vision & Control: Deep Q Network

We use a CNN to represent the Q function. It takes:

•  Input: the state (the last 4 game screens)

•  Output: the Q-values of all actions a, i.e., Q(s, a) for every a, in one forward pass

$\pi(s) = \arg\max_a Q(s, a)$, where s is the stack of 4 screens
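A minimal sketch of how the network's output is used, assuming `q_network` is any callable that maps a state to a list of Q-values, one per action (in a DQN this would be the CNN over 4 stacked screens; here it is a stand-in). Action selection is one argmax over that list, and training regresses Q(s, a) toward the target r + γ max_a' Q(s', a'). In the NIPS 2013 paper the target uses the network's parameters from the previous iteration; this sketch uses a single callable for simplicity, and all function names are hypothetical.

```python
def greedy_action(q_values):
    """pi(s) = argmax_a Q(s, a), given the network's output vector for state s."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def dqn_targets(transitions, q_network, gamma=0.99):
    """Regression targets y for (s, a, r, s_next, done) transitions.

    y = r                                 if the episode ended at s_next
    y = r + gamma * max_a' Q(s_next, a')  otherwise
    """
    targets = []
    for s, a, r, s_next, done in transitions:
        targets.append(r if done else r + gamma * max(q_network(s_next)))
    return targets

def squared_td_loss(transitions, q_network, gamma=0.99):
    """Mean of (y - Q(s, a))^2, the quantity gradient descent minimizes."""
    ys = dqn_targets(transitions, q_network, gamma)
    errors = [(y - q_network(s)[a]) ** 2
              for y, (s, a, *_rest) in zip(ys, transitions)]
    return sum(errors) / len(errors)
```

Because one forward pass returns the values of every action, choosing an action costs a single network evaluation rather than one per action.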

Fusing Mul#ple Sensors

Figure: hand grasps observed from wearable cameras (labels: kettle, medium wrap, thumb-4 finger, manipulation region, side view)

Chan et al. ECCV 2015 from VSLab

Figure: synchronized left-hand, head, and right-hand camera views in an office scene

Recognition from Wearable Cameras
•  Gesture Recognition
•  Object Category Recognition

Real-time Wearable Demo

Hardware: fisheye camera, NVIDIA TK1

Classes: cellphone, bottle, keyboard, mouse, free hand

Take-Home Message
•  Encoding source (data)
   – N-D observation
   – N-D sequence of observations
•  Decoding outcome (AI tasks)
   – N-D single output
   – N-D open-ended sequence as output
•  Multiple encoding and decoding
•  If each module is differentiable (or approximately differentiable) -> end-to-end learning

We now have many tools to tackle Artificial General Intelligence
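The "differentiable modules -> end-to-end learning" point comes down to the chain rule. A minimal sketch, assuming two hypothetical scalar modules (an "encoder" and a "decoder") and a squared-error loss: the decoder hands its gradient back to the encoder, so a single gradient-descent loop updates both modules together. Real encoders and decoders work on tensors and rely on automatic differentiation, but the principle is the same.

```python
import math

# Two hypothetical differentiable modules with known derivatives.
def encoder(x, w1):
    return math.tanh(w1 * x)

def decoder(h, w2):
    return w2 * h

def loss_and_grads(x, target, w1, w2):
    """Forward through both modules, then backpropagate with the chain rule."""
    h = encoder(x, w1)
    y = decoder(h, w2)
    loss = 0.5 * (y - target) ** 2
    dloss_dy = y - target
    dloss_dw2 = dloss_dy * h                 # dy/dw2 = h
    dloss_dh = dloss_dy * w2                 # dy/dh = w2, the gradient handed to the encoder
    dloss_dw1 = dloss_dh * (1 - h * h) * x   # dh/dw1 = (1 - tanh^2(w1 x)) * x
    return loss, dloss_dw1, dloss_dw2

def train(x, target, w1=0.5, w2=0.5, lr=0.1, steps=200):
    """Gradient descent on both modules' parameters at once: end-to-end."""
    for _ in range(steps):
        _, g1, g2 = loss_and_grads(x, target, w1, w2)
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2

w1, w2 = train(x=1.0, target=0.8)
```

If either module were not differentiable, `dloss_dh` could not be passed back and the encoder would have to be trained separately.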

Just Try!

The Worst Thing: Doing Nothing

My Two Cents for Taiwan

Questions
•  Can I simply ask my engineers to use open-source deep learning tools to create new products?

Answer: Yes and not really. Yes, if you want to complete a well-known task, although Google's MLaaS products will almost always beat you. Not really, if you want to solve your own problem with your own data: you need talent, or engineers who are not afraid of failure.

Where can I find talent?
•  Most of the talent consists of PhD students or young professionals in the US and EU.

http://www.economist.com/news/business/21695908-silicon-valley-fights-talent-universities-struggle-hold-their

How can we compete?

Local Students •  Our students know deep learning is HOT!

[ Deep Learning Workshop, Academia Sinica (中研院) ] 500 participants

Case Study: NTHU@TW Undergraduate

https://github.com/yenchenlin1994/DeepLearningFlappyBird

Case Study: UNIST@Korea Undergraduate

To-Do for Local Students
•  We need more students to work on realistic deep learning projects with enough computing resources
•  We need some of them to stay in our local industry

Advanced Deep Learning Course at NTHU (academic year 105, i.e., 2016-2017)
1.  Taught by a group of professors
2.  Topics include the latest DNN models, distributed training, and DL for embedded systems
3.  Sponsored by MTK and ITRI's Big Data Center (巨資中心)
4.  More sponsors are welcome!

For Talents Abroad

Get in the Talent Race!

http://cvpr2016.thecvf.com/exhibit/industry_expo


Most of them are fresh PhDs

USD 1 billion pledged


AI is happening fast

Thanks!
