Artificial Intelligence: The Next Big Thing
from a computer vision perspective
VSLab
Min Sun, Department of Electrical Engineering, National Tsing Hua University
What’s the Next Big Thing?
http://research.microsoft.com/en-us/um/redmond/events/fs2015
Goal
“big data being the source, machine learning being the technique, and AI being the outcome” by Prof. Hsuan-Tien Lin at IEEE BigData 2016
Many kinds of sources (data) and outcomes (AI tasks) can be connected and trained end-to-end using Deep Learning (DL)
Classical AI Tests: Turing Test
by Alan Turing in 1950
Chatbot@F8
https://developers.facebook.com/videos/f8-2016/keynote/
Classical AI Tests: CAPTCHA
Breaking CAPTCHA
by vicarious.com
AlphaGo
2016 by Google DeepMind
Is this what AI is all about?
2014 Subfields of AI
2015
Artificial General Intelligence (AGI)
Deep Learning (DL)
• Data • GPU Computing • Talent
DL Fuses AI-subfields
• Vision and Language
• Vision and Control
http://mscoco.org/
Atari Breakout game & AlphaGo, DeepMind.
-> AGI
• Multiple Encoding and Decoding
Image Captioning
f(image) = The man at bat is ready to swing at the pitch
Vision Language
Recurrent Neural Network (RNN), credit: Nature
convolutions
Convolutional Neural Network (CNN), credit: wiki
Image Question Answering
http://visualqa.org/
Zhen et al. ECCV 2016 from VSLab and Stanford AI Lab
Big Video Data with Titles
• Pairs of raw video and title
[Figure: raw video frames, each passed through a CNN, paired with a title]
Viral Videos
Google for “viral video company”
Large Video Repository
Currently 28,740 videos and still growing
DL Fuses AI-subfields
• Vision and Language
• Vision and Control
http://mscoco.org/
Atari Breakout game & AlphaGo, DeepMind.
-> AGI
• Multiple Encoding and Decoding
Vision and Control
https://gym.openai.com/
• Learning to play games with weak supervision: Reinforcement Learning (RL)
Where It All Begins …
by DeepMind at the NIPS 2013 Deep Learning Workshop
Playing Atari with Deep Reinforcement Learning
slides by Yen-Chen Lin
Control: Learning to Act
Playing Breakout amounts to
• Input: screen images
• Output: actions (do nothing | left | right)
Supervised Classification
Supervised Solution
• Training data: Record expert game sessions
• Target label: The action the expert takes at every step
Problems:
• What if there’s no expert?
• This is not how humans learn
How Humans Learn
• We don’t need somebody to tell us a million times which move to choose at each screen
• Just need occasional feedback that we did the right thing
Reinforcement Learning
• Somewhere between supervised and unsupervised learning
• Sparse and time-delayed labels
Based only on those rewards, the agent has to learn to behave in the environment. A rational agent should optimize total reward.
RL in A Nutshell
Markov Decision Process
• State
• Action
• Reward
The probability of the next state $s_{i+1}$ depends only on the current state $s_i$ and action $a_i$.
Episode
One episode of this process (e.g. one game) forms a finite sequence of states, actions and rewards:
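With indexed symbols, such a sequence can be written as follows. The subscripts are an assumption, following the common convention that reward $r_{i+1}$ arrives after taking action $a_i$ in state $s_i$, and $n$ is the (assumed) number of steps in the episode:

```latex
s_0, a_0, r_1, s_1, a_1, r_2, \ldots, s_{n-1}, a_{n-1}, r_n, s_n
```

Here $s_n$ is the terminal state, for example the "game over" screen.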
Example: Breakout
• State: game screen
• Action: 1. do nothing 2. left 3. right
• Reward: game score
Example: Breakout
• State: successive game screens
• Action: 1. do nothing 2. left 3. right
• Reward: game score
Reward
• To perform well, we should also take future rewards into account. How do we do that?
Total reward:
Total future reward:
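One common way to write these two quantities, assuming an episode of $n$ steps in which reward $r_t$ is received at step $t$ (the symbols $R$ and $R_t$ are notation assumed here), is:

```latex
\begin{align}
R   &= r_1 + r_2 + \cdots + r_n \\
R_t &= r_t + r_{t+1} + \cdots + r_n
\end{align}
```

$R$ sums over the whole episode, while $R_t$ counts only the rewards from step $t$ onward.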
Discounted Future Reward
• However, since the environment is stochastic, intuitively one should earn reward as soon as possible
Total discounted future reward:
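A minimal sketch of the discounted sum, assuming a discount factor γ in [0, 1] (called `gamma` below) and the usual form R_t = r_t + γ r_{t+1} + γ² r_{t+2} + ... ; the function name and the sample rewards are made up for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Return rewards[0] + gamma*rewards[1] + gamma**2*rewards[2] + ...

    rewards[0] plays the role of r_t, later entries are future rewards.
    gamma is an assumed discount factor in [0, 1].
    """
    total = 0.0
    # Walk backwards so each step applies R_t = r_t + gamma * R_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical rewards: the same score, earned early versus late.
early = discounted_return([1, 0, 0, 0])
late = discounted_return([0, 0, 0, 1])
```

With `gamma = 1` this reduces to the undiscounted total future reward; a smaller `gamma` makes the same reward worth less the later it arrives, so `early` is larger than `late`.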
Q function
• Q(s, a):
The maximum discounted future reward when we perform action a in state s, and continue optimally from that point on.
It represents the “quality” of a certain action in a given state.
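In symbols, assuming $R_{t+1}$ denotes the discounted future reward collected from the step after action $a_t$ is taken in state $s_t$ (an indexing convention assumed here), the definition reads:

```latex
Q(s_t, a_t) = \max R_{t+1}
```

The maximum is taken over all ways of continuing the game after that action.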
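How to Choose an Action?
If we know the Q function, we pick the action with the highest Q-value in each state: $\pi(s) = \arg\max_a Q(s, a)$
Here π represents the policy, the rule for how we choose an action in each state.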
Q Function Implementation
           action 0   action 1   action 2
state 0       -2         -1          5
state 1        3          2          3
state 2        5          6         -6
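The table above can be stored as a nested list and queried for the best action per state. This is a minimal sketch; the names `Q_TABLE` and `greedy_action` are hypothetical, and breaking ties toward the lowest action index is an assumption.

```python
# Q-table from the slide: Q_TABLE[state][action]
Q_TABLE = [
    [-2, -1, 5],   # state 0
    [3, 2, 3],     # state 1
    [5, 6, -6],    # state 2
]


def greedy_action(q_table, state):
    """Return the index of the action with the highest Q-value in `state`.

    Ties are broken in favour of the lowest action index.
    """
    row = q_table[state]
    return max(range(len(row)), key=lambda a: row[a])


# Greedy policy: the best action for every state in the table.
policy = [greedy_action(Q_TABLE, s) for s in range(len(Q_TABLE))]
```

A lookup table like this works only when the number of states is small enough to enumerate.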
If We Use Pixels as State
1. Resize images to 84x84 2. Convert to grayscale with 256 levels 3. Use last 4 frames to represent state
$256^{84 \times 84 \times 4} \approx 10^{67970}$ possible game states
We can never cover all the cases!
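The size of this state space can be checked with a few lines of arithmetic. The sketch below works with the base-10 logarithm because the number itself is too large to print conveniently, and it also shows one possible way to keep the last 4 frames; the variable names and the use of integers as stand-ins for frames are assumptions.

```python
import math
from collections import deque

WIDTH, HEIGHT, FRAMES, LEVELS = 84, 84, 4, 256

# Number of distinct states is LEVELS ** (WIDTH * HEIGHT * FRAMES).
pixels_per_state = WIDTH * HEIGHT * FRAMES
log10_states = pixels_per_state * math.log10(LEVELS)
exponent = math.floor(log10_states)  # states are roughly 10 ** exponent

# "Use last 4 frames": a fixed-length buffer drops the oldest frame automatically.
frame_buffer = deque(maxlen=FRAMES)
for frame_id in range(6):  # frame_id stands in for a preprocessed 84x84 image
    frame_buffer.append(frame_id)
```

After the loop, `frame_buffer` holds only the four most recent frames, which together form one state.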
Vision & Control: Deep Q Network
We use a CNN to represent the Q function, which takes:
• Input: the state (4 game screens)
• Output: Q-values of the different actions a (i.e., Q(s,a))
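As a rough sketch of how such a CNN maps 4 stacked 84x84 screens to one Q-value per action, the snippet below only tracks tensor shapes and does no learning. The layer sizes (16 filters of 8x8 with stride 4, 32 filters of 4x4 with stride 2, a 256-unit fully connected layer) are taken from the NIPS 2013 paper cited earlier rather than from these slides, and the helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride):
    """Spatial output size of a convolution without padding."""
    return (size - kernel) // stride + 1


def dqn_shapes(num_actions, height=84, width=84, frames=4):
    """Return (name, shape) pairs for each stage of the assumed network."""
    shapes = [("input", (frames, height, width))]
    channels, h, w = frames, height, width
    # (name, filters, kernel, stride) for each assumed convolutional layer
    for name, filters, kernel, stride in [("conv1", 16, 8, 4), ("conv2", 32, 4, 2)]:
        h = conv_output_size(h, kernel, stride)
        w = conv_output_size(w, kernel, stride)
        channels = filters
        shapes.append((name, (channels, h, w)))
    shapes.append(("flatten", (channels * h * w,)))
    shapes.append(("fc", (256,)))
    shapes.append(("q_values", (num_actions,)))  # one Q(s, a) per action
    return shapes


# 3 actions in Breakout: do nothing, left, right
shapes = dict(dqn_shapes(num_actions=3))
```

Producing all Q-values in a single forward pass means the best action can be picked by comparing the outputs, without running the network once per action.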
$\pi(s) = \arg\max_a Q(s, a)$
Fusing Multiple Sensors
[Figure labels: Kettle; Medium wrap; Thumb-4-finger; Manipulation region; Side view]
Chan et al. ECCV 2015 from VSLab
[Figure: camera views (Left Hand, Head, Right Hand) across scenes (Lab, Office, Home)]
Recognition from Wearable Cameras
[Figure: predictions (Pred) vs. ground truth (GT) for Gesture Recognition and Object Category Recognition]
Real-time Wearable Demo
Fisheye camera, NVIDIA TK1
Real-time Wearable Demo: cellphone, bottle, keyboard, mouse, free hand
Take-Home Message
• Encoding Source (data)
– N-D observation
– N-D sequence of observations
• Decoding Outcome (AI tasks)
– N-D single output
– N-D open-ended sequence as output
• Multiple Encoding and Decoding
• If each module is differentiable/approximately differentiable -> End-to-End Learning
We now have many tools to tackle Artificial General Intelligence
Just Try!
Worst Thing: Do Nothing
My Two Cents for Taiwan
Questions
• Can I simply ask my engineers to use open source deep learning tools to create new products?
Answer: Yes and not really.
Yes, if you want to complete a well-known task. But Google’s MLaaS product will almost always beat you.
Not really, if you want to solve your own problem with your own data. You need talent, or engineers who are not afraid of failure.
Where can I find talent?
• Most of the talent is PhD students or young professionals in the US and EU.
http://www.economist.com/news/business/21695908-silicon-valley-fights-talent-universities-struggle-hold-their
How can we compete?
Local Students • Our students know deep learning is HOT!
[ Deep Learning Workshop, Academia Sinica ] 500 participants
Case Study: NTHU@TW Undergraduate
https://github.com/yenchenlin1994/DeepLearningFlappyBird
Case Study: UNIST@Korea Undergraduate
To-Do for Local Students
• We need more students to work on
– realistic deep learning projects with
– enough computing resources
• We need some of them to stay in our local industry
Advanced Deep Learning Course at NTHU (academic year 105, i.e. 2016-17)
1. Taught by a group of professors
2. Topics include the latest DNN models, distributed training, and DL for embedded systems
3. Sponsored by MTK and the ITRI Big Data Technology Center (巨資中心)
4. More sponsors are welcome!
For Talents Abroad: Get in the Talent Race!
http://cvpr2016.thecvf.com/exhibit/industry_expo
For Talents Abroad
Most of them are fresh PhDs
USD 1 billion pledged
For Talents Abroad
AI is happening Fast
Thanks!