Min Sun / Artificial Intelligence from a Computer Vision Perspective: The Next Big Thing

Artificial Intelligence: The Next Big Thing

from a computer vision perspective VSLab

Min Sun, Department of Electrical Engineering, National Tsing Hua University (NTHU)

What’s the Next Big Thing?

http://research.microsoft.com/en-us/um/redmond/events/fs2015

Goal

“big data being the source, machine learning being the technique, and AI being the outcome” by Prof. Hsuan-Tien Lin at IEEE BigData 2016

Mappings from many kinds of sources (data) to outcomes (AI tasks) can be trained end-to-end using Deep Learning (DL)

Classical AI Tests: Turing Test

by Alan Turing in 1950

Chatbot@F8

https://developers.facebook.com/videos/f8-2016/keynote/

Classical AI Tests: CAPTCHA

Breaking CAPTCHA

by vicarious.com

AlphaGo

2016 by Google DeepMind

Is this all that AI is about?

2014 Subfields of AI

2015

Artificial General Intelligence (AGI)

Deep Learning (DL)

•  Data •  GPU Computing •  Talents

DL Fuses AI-subfields
•  Vision and Language: http://mscoco.org/
•  Vision and Control: Atari Breakout game & AlphaGo, DeepMind.
•  Multiple Encoding and Decoding -> AGI

Image Captioning

f(image) = The man at bat is ready to swing at the pitch

Vision Language

Recurrent Neural Network (RNN) credit: Nature

convolutions

Convolutional Neural Network (CNN) credit: wiki

Image Question Answering

http://visualqa.org/

Zhen et al. ECCV 2016 from VSLab and Stanford AI Lab

Big Video Data with Titles
•  Pairs of raw video and title

[Figure labels: Raw Video, CNN (one per frame), Title]

Viral Videos

Google for “viral video company”

Large Video Repository

Currently 28,740 videos and still growing

DL Fuses AI-subfields
•  Vision and Language: http://mscoco.org/
•  Vision and Control: Atari Breakout game & AlphaGo, DeepMind.
•  Multiple Encoding and Decoding -> AGI

Vision and Control

https://gym.openai.com/

•  Learning to play games with weak supervision: Reinforcement Learning (RL)

Where It All Begins …

by DeepMind at the NIPS 2013 Deep Learning Workshop

Playing Atari with Deep Reinforcement Learning

slides by Yen-Chen Lin

Control: Learning to Act

Playing Breakout amounts to
•  Input: screen images
•  Output: actions (do nothing | left | right)

Supervised Classification


Supervised Solution
•  Training data: recorded expert game sessions
•  Target label: the action the expert takes at every step

Problems:
•  What if there’s no expert?
•  This is not how humans learn


How Humans Learn
•  We don’t need somebody to tell us a million times which move to choose at each screen

•  We just need occasional feedback that we did the right thing


Reinforcement Learning
•  Somewhere between supervised and unsupervised learning
•  Sparse and time-delayed labels (rewards)

Based only on those rewards, the agent has to learn to behave in the environment. A rational agent should optimize total reward.


RL in A Nutshell


Markov Decision Process

•  State

•  Action

•  Reward

The probability of the next state $s_{i+1}$ depends only on the current state $s_i$ and action $a_i$.

Episode

One episode of this process (e.g. one game) forms a finite sequence of states, actions and rewards:

$s_0, a_0, r_1, s_1, a_1, r_2, \dots, s_{n-1}, a_{n-1}, r_n, s_n$
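A minimal sketch of how one episode unfolds, assuming a hypothetical toy environment that is not from the slides: a corridor of cells 0..4 where reaching cell 4 pays reward 1 and either end terminates the game. The names `step`, `run_episode` and `always_right` are illustrative only, and the transitions are deterministic for simplicity.

```python
import random

# Hypothetical toy MDP (not from the slides): a 1-D corridor with cells 0..4.
# Reaching cell 4 gives reward 1 and ends the episode; reaching cell 0 ends
# the episode with reward 0.
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Return (next_state, reward, done); depends only on (state, action)."""
    next_state = state + action
    reward = 1 if next_state == 4 else 0
    done = next_state in (0, 4)
    return next_state, reward, done


def run_episode(policy, start_state=2, max_steps=100):
    """Roll out one episode as a list of (state, action, reward) triples."""
    episode = []
    state = start_state
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = step(state, action)
        episode.append((state, action, reward))
        state = next_state
        if done:
            break
    return episode, state


def random_policy(state):
    """Ignore the state and pick a random move."""
    return random.choice(ACTIONS)


def always_right(state):
    """Always move right."""
    return +1


episode, final_state = run_episode(always_right)
print(episode)      # [(2, 1, 0), (3, 1, 1)]
print(final_state)  # 4
```

Swapping in `random_policy` gives episodes of varying length, which is closer to how an untrained agent explores.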


Example: Breakout

•  State: game screen

•  Action: 1. do nothing 2. left 3. right

•  Reward: game score


Example: Breakout

•  State: successive game screens

•  Action: 1. do nothing 2. left 3. right

•  Reward: game score


Reward

•  To perform well, we should also take future rewards into account. How do we do that?

Total reward: $R = r_1 + r_2 + \dots + r_n$

Total future reward from time step $t$: $R_t = r_t + r_{t+1} + \dots + r_n$


Discounted Future Reward

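•  However, since the environment is stochastic, rewards further in the future are less certain, so intuitively one should earn reward as soon as possible

Total discounted future reward: $R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots + \gamma^{n-t} r_n$, where $\gamma \in [0, 1]$ is the discount factor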
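To make the effect of discounting concrete, here is a small sketch that sums a reward sequence with each later reward scaled by one more factor of a discount `gamma` between 0 and 1. The symbol `gamma`, the function name and the sample rewards are assumptions chosen for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum rewards, weighting the k-th future reward by gamma**k."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rewards = [1, 0, 0, 10]  # illustrative rewards, not from the slides
print(discounted_return(rewards, 1.0))  # 11.0 (no discounting)
print(discounted_return(rewards, 0.5))  # 2.25 (late reward counts for less)
print(discounted_return(rewards, 0.0))  # 1.0  (only the immediate reward)
```

With `gamma` close to 1 the agent is far-sighted; with `gamma` close to 0 it only cares about the immediate reward.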


Q function

•  Q(s, a):

The maximum discounted future reward when we perform action a in state s and continue optimally from that point on.

It represents the “quality” of a certain action in a given state.
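Written as formulas, the definition can be sketched as follows. The notation is an assumption: $R_{t+1}$ is the discounted future reward collected after taking action $a_t$ in state $s_t$, $\gamma$ is a discount factor, $r$ is the immediate reward and $s'$ the next state. The recursive form on the second line is the standard Bellman equation, which the slide does not state explicitly.

```latex
Q(s_t, a_t) = \max R_{t+1}

Q(s, a) = r + \gamma \max_{a'} Q(s', a')
```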


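How to Choose Action?

If we know the Q function, we pick the action with the highest Q-value in each state:

$\pi(s) = \arg\max_a Q(s, a)$

Here π represents the policy, the rule for how we choose an action in each state.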


Q Function Implementation

          action 0   action 1   action 2
state 0      -2         -1          5
state 1       3          2          3
state 2       5          6         -6
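The table above can be stored as a 2-D array, and the greedy policy then reduces to a row-wise argmax. This is a minimal sketch using the slide's numbers; the function name `greedy_action` is illustrative.

```python
import numpy as np

# Q-table from the slide: rows are states, columns are actions.
Q = np.array([
    [-2, -1,  5],   # state 0
    [ 3,  2,  3],   # state 1
    [ 5,  6, -6],   # state 2
])


def greedy_action(q_table, state):
    """Pick the action with the largest Q-value in the given state."""
    return int(np.argmax(q_table[state]))


for s in range(Q.shape[0]):
    print(s, greedy_action(Q, s))
# 0 2
# 1 0   (actions 0 and 2 tie; argmax returns the first)
# 2 1
```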


If We Use Pixels as State

1.  Resize images to 84x84
2.  Convert to grayscale with 256 levels
3.  Use the last 4 frames to represent the state

$256^{84 \times 84 \times 4} \approx 10^{67970}$ possible game states

We can never cover all the cases!
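The three preprocessing steps and the size of the resulting state space can be sketched as below. Several details are assumptions rather than the authors' pipeline: the raw frame size of 210x160 RGB, the luminance weights for grayscale conversion, nearest-neighbour resizing, and random arrays standing in for real game screens.

```python
import math
from collections import deque

import numpy as np

LUMA = np.array([0.299, 0.587, 0.114], dtype=np.float32)  # assumed RGB weights


def preprocess(frame_rgb, size=84):
    """Turn an (H, W, 3) RGB frame into a size x size grayscale uint8 image."""
    gray = frame_rgb.astype(np.float32) @ LUMA        # (H, W) luminance
    height, width = gray.shape
    rows = np.arange(size) * height // size           # nearest-neighbour rows
    cols = np.arange(size) * width // size            # nearest-neighbour cols
    small = gray[np.ix_(rows, cols)]
    return np.clip(small, 0, 255).astype(np.uint8)


# Keep only the last 4 preprocessed frames; older ones drop out automatically.
frames = deque(maxlen=4)
rng = np.random.default_rng(0)
for _ in range(6):
    # Random pixels stand in for a real game screen (assumed 210x160 RGB).
    raw = rng.integers(0, 256, size=(210, 160, 3), dtype=np.uint8)
    frames.append(preprocess(raw))

state = np.stack(frames, axis=-1)
print(state.shape, state.dtype)   # (84, 84, 4) uint8

# Each of the 84*84*4 pixels takes one of 256 values, so there are
# 256**num_pixels states, i.e. about 10**exponent.
num_pixels = 84 * 84 * 4
exponent = num_pixels * math.log10(256)
print(num_pixels, int(exponent))  # 28224 67970
```

The count is computed through logarithms because the number itself has tens of thousands of digits.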


Vision & Control: Deep Q Network

We use a CNN to represent the Q function, which takes:

•  Input: the state (4 game screens) and action

•  Output: Q-values of different actions a (i.e., Q(s,a))
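As a rough sketch of what such a network looks like, the snippet below traces tensor shapes through a small CNN, assuming the variant in which the network takes only the state and emits one Q-value per action. The layer sizes (16 filters of 8x8 with stride 4, 32 filters of 4x4 with stride 2, 256 hidden units) are assumed from the NIPS 2013 DQN paper and are not stated on the slide; the three outputs correspond to the Breakout actions listed earlier.

```python
def conv_output_size(size, kernel, stride):
    """Spatial size after a convolution without padding."""
    return (size - kernel) // stride + 1


# (num_filters, kernel, stride); assumed values, see the note above.
CONV_LAYERS = [(16, 8, 4), (32, 4, 2)]
HIDDEN_UNITS = 256   # assumed fully connected layer width
NUM_ACTIONS = 3      # do nothing, left, right


def dqn_shapes(input_size=84, input_channels=4):
    """List the activation shape after each layer of the sketched network."""
    shapes = [(input_size, input_size, input_channels)]
    size = input_size
    for filters, kernel, stride in CONV_LAYERS:
        size = conv_output_size(size, kernel, stride)
        shapes.append((size, size, filters))
    shapes.append((HIDDEN_UNITS,))
    shapes.append((NUM_ACTIONS,))   # one Q-value per action
    return shapes


print(dqn_shapes())
# [(84, 84, 4), (20, 20, 16), (9, 9, 32), (256,), (3,)]
```

Emitting all Q-values in one forward pass means the greedy action needs only a single network evaluation per state.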


$\pi(s) = \arg\max_a Q(s, a)$, where the state $s$ is the stack of game screens

Fusing Multiple Sensors

[Figure labels: Kettle, Medium-wrap, thumb-4-finger, Manipulation Region, Side-view]

Chan et al. ECCV 2015 from VSLab

[Figure: Left Hand, Head and Right Hand camera views in Lab, Office and Home scenes]

Recognition from Wearable Cameras

[Figure: predictions (Pred) vs. ground truth (GT) for Gesture Recognition and Object Category Recognition]

Real-time Wearable Demo

Fisheye camera, NVIDIA TK1

Real-time Wearable Demo: cellphone, bottle, keyboard, mouse, free hand

Take-Home Message
•  Encoding Source (data)
   – N-D observation
   – N-D sequence of observations
•  Decoding Outcome (AI tasks)
   – N-D single output
   – N-D open-ended sequence as output
•  Multiple Encoding and Decoding
•  If each module is differentiable/approximately differentiable -> End-to-End Learning
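A minimal sketch of why differentiability enables end-to-end learning: when an encoder and a decoder are both differentiable, the chain rule carries the loss gradient from the output back through the decoder into the encoder, so both modules can be updated from the same loss. The tiny tanh encoder, linear decoder, squared-error loss and random data are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)             # input observation (illustrative)
y = rng.normal(size=2)             # target output (illustrative)
W_enc = rng.normal(size=(3, 4))    # encoder weights
W_dec = rng.normal(size=(2, 3))    # decoder weights


def loss_fn(w_enc, w_dec):
    """Squared error of decoder(encoder(x)) against the target y."""
    h = np.tanh(w_enc @ x)         # encoder
    y_hat = w_dec @ h              # decoder
    return 0.5 * np.sum((y_hat - y) ** 2)


def gradients(w_enc, w_dec):
    """Backpropagate the loss through the decoder and then the encoder."""
    h = np.tanh(w_enc @ x)
    y_hat = w_dec @ h
    d_yhat = y_hat - y                   # dL/dy_hat
    grad_dec = np.outer(d_yhat, h)       # dL/dW_dec
    d_h = w_dec.T @ d_yhat               # gradient handed back to the encoder
    d_pre = d_h * (1.0 - h ** 2)         # through the tanh nonlinearity
    grad_enc = np.outer(d_pre, x)        # dL/dW_enc
    return grad_enc, grad_dec


g_enc, g_dec = gradients(W_enc, W_dec)

# Check one encoder entry against a central finite difference.
eps = 1e-6
plus, minus = W_enc.copy(), W_enc.copy()
plus[0, 0] += eps
minus[0, 0] -= eps
numeric = (loss_fn(plus, W_dec) - loss_fn(minus, W_dec)) / (2 * eps)
print(abs(numeric - g_enc[0, 0]) < 1e-5)  # True
```

A single gradient step such as `W_enc - lr * g_enc` and `W_dec - lr * g_dec` then trains both modules jointly.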

We get many tools to tackle Artificial General Intelligence

Just Try!

Worst Thing: Do Nothing

My Two Cents for Taiwan

Questions
•  Can I simply ask my engineers to use open source deep learning tools to create new products?

Answer: Yes and not really. Yes, if you want to complete a well-known task, but Google’s MLaaS product will almost always beat you. Not really, if you want to solve your own problem with your own data: you need talents, or engineers who are not afraid of failure.

Where can I find talents?
•  Most talents are PhD students or young professionals in the US and EU.

http://www.economist.com/news/business/21695908-silicon-valley-fights-talent-universities-struggle-hold-their

How can we compete?

Local Students •  Our students know deep learning is HOT!

[ Deep Learning Workshop, Academia Sinica ] 500 attendees

Case Study: NTHU@TW Undergraduate

https://github.com/yenchenlin1994/DeepLearningFlappyBird

Case Study: UNIST@Korea Undergraduate

To-Do for Local Students
•  We need more students to work on
   – realistic deep learning projects with
   – enough computing resources
•  We need some of them to stay in our local industry

Advanced Deep Learning Course at NTHU (academic year 105, i.e. 2016-2017)
1.  Taught by a group of professors
2.  Topics include the latest DNN models, distributed training, and DL for embedded systems
3.  Sponsored by MTK and the ITRI Big Data Center (巨資中心)
4.  More sponsors are welcome!

For Talents Abroad Get in the Talents Race!

http://cvpr2016.thecvf.com/exhibit/industry_expo

For Talents Abroad

Most of them fresh PhDs

1 Billion USD Pledged

For Talents Abroad

AI is happening Fast

Thanks!
