Min Sun / Artificial Intelligence from a Computer Vision Perspective: The Next Big Thing

Artificial Intelligence: The Next Big Thing

from a computer vision perspective

VSLab

Min Sun, Department of Electrical Engineering, National Tsing Hua University

What’s  the  Next  Big  Thing?  

http://research.microsoft.com/en-us/um/redmond/events/fs2015

Goal    

“big data being the source, machine learning being the technique, and AI being the outcome” by Prof. Hsuan-Tien Lin at IEEE BigData 2016

Many kinds of source (data) and outcomes (AI tasks) can be trained end-to-end using Deep Learning (DL)

Classical  AI  Tests:  Turing  Test  

by  Alan  Turing  in  1950  

Chatbot@F8  

https://developers.facebook.com/videos/f8-2016/keynote/

Classical  AI  Tests:  CAPTCHA  

Breaking  CAPTCHA  

by  vicarious.com  

AlphaGo  

2016  by  Google  DeepMind  

Is this all that AI is about?

2014  Subfields  of  AI  

2015  

Artificial General Intelligence (AGI)

Deep Learning (DL)

• Data • GPU Computing • Talent

DL Fuses AI Subfields

• Vision and Language: http://mscoco.org/

• Vision and Control: Atari Breakout game & AlphaGo, DeepMind.

• Multiple Encoding and Decoding

-> AGI

Image Captioning

f([image]) = “The man at bat is ready to swing at the pitch”

Vision   Language  

Recurrent Neural Network (RNN), credit: Nature

convolutions

Convolutional Neural Network (CNN), credit: wiki

Image Question Answering

http://visualqa.org/

Zhen  et  al.  ECCV  2016  from  VSLab  and  Stanford  AI  Lab  

Big Video Data with Titles

• Pairs of raw video and title

[Figure: raw video frames -> CNN, CNN, CNN, CNN -> title]

Viral  Videos  

Google  for  “viral  video  company”  

Large  Video  Repository  

Currently 28,740 videos and still growing

DL Fuses AI Subfields

• Vision and Language: http://mscoco.org/

• Vision and Control: Atari Breakout game & AlphaGo, DeepMind.

• Multiple Encoding and Decoding

-> AGI

Vision  and  Control  

https://gym.openai.com/

• Learning to play games with weak supervision: Reinforcement Learning (RL)

Where It All Begins …

by DeepMind at the NIPS 2013 Deep Learning Workshop

Playing Atari with Deep Reinforcement Learning

slides by Yen-Chen Lin

Control: Learning to Act

Playing Breakout amounts to:
• Input: screen images
• Output: actions (do nothing | left | right)

Supervised Classification

Supervised Solution

• Training data: record experts' game sessions
• Target label: the action the expert takes at every step

Problems:

• What if there's no expert?
• This is not how humans learn

How Humans Learn

• We don't need somebody to tell us a million times which move to choose at each screen
• We just need occasional feedback that we did the right thing

Reinforcement Learning

• Somewhere between supervised and unsupervised learning
• Sparse and time-delayed labels (rewards)

Based only on those rewards, the agent has to learn to behave in the environment. A rational agent should optimize total reward.

RL  in  A  Nutshell  

Markov  Decision  Process  

•  State    

•  Action    

•  Reward

The probability of the next state $s_{i+1}$ depends only on the current state $s_i$ and action $a_i$.
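The state, action, and reward loop of a Markov decision process can be sketched in a few lines. This is only an illustration under stated assumptions: `ToyEnv`, its five-cell corridor, and the always-right policy are hypothetical stand-ins, not the Atari setup discussed in these slides.

```python
class ToyEnv:
    """Hypothetical 1-D corridor: states 0..4, reward 1 on reaching state 4."""
    ACTIONS = (-1, 0, +1)  # left, do nothing, right

    def __init__(self):
        self.state = 0

    def step(self, action):
        # Markov property: the next state depends only on the current
        # state and the chosen action, not on any earlier history.
        self.state = min(4, max(0, self.state + action))
        done = self.state == 4
        reward = 1 if done else 0
        return self.state, reward, done


def run_episode(env, policy, max_steps=1000):
    """Collect (state, action, reward) triples until the episode ends."""
    trajectory = []
    state = env.state
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


# A trivial hand-written policy: always move right.
episode = run_episode(ToyEnv(), lambda s: +1)
print(episode)  # [(0, 1, 0), (1, 1, 0), (2, 1, 0), (3, 1, 1)]
```

The reward arrives only at the last step, which is the "sparse and time-delayed" feedback that reinforcement learning has to cope with.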

Episode  

One episode of this process (e.g. one game) forms a finite sequence of states, actions and rewards:
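In the usual MDP notation, and assuming rewards are indexed from 1, one episode of $n$ steps can be written as:

```latex
s_0, a_0, r_1, s_1, a_1, r_2, \ldots, s_{n-1}, a_{n-1}, r_n, s_n
```

Here $s_i$ is the state, $a_i$ the action taken in it, and $r_{i+1}$ the reward received afterwards; $s_n$ is the terminal state (for example, the game-over screen).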

Example:  Breakout  

• State: game screen

• Action: 1. do nothing 2. left 3. right

• Reward: game score

Example:  Breakout  

• State: successive game screens

• Action: 1. do nothing 2. left 3. right

• Reward: game score

Reward

• To perform well, we should also take future rewards into account. How do we do that?

Total reward:

Total future reward:
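Assuming an episode of $n$ steps yields the rewards $r_1, \ldots, r_n$, the total reward of the episode and the total future reward from step $t$ onward are commonly written as follows. This is the standard form, given as a sketch under that indexing assumption.

```latex
\begin{aligned}
R   &= r_1 + r_2 + r_3 + \cdots + r_n \\
R_t &= r_t + r_{t+1} + r_{t+2} + \cdots + r_n
\end{aligned}
```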

Discounted  Future  Reward  

• However, since the environment is stochastic, intuitively one should earn reward as soon as possible

Total discounted future reward:
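With a discount factor $\gamma$ between 0 and 1 (the symbol and the values used below are assumptions for illustration), the total discounted future reward from step $t$ is usually written $R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots + \gamma^{n-t} r_n$, which also satisfies $R_t = r_t + \gamma R_{t+1}$. A minimal sketch that computes it for every step of an episode:

```python
def discounted_returns(rewards, gamma=0.9):
    """Return R_t for every step t, given the rewards of one episode.

    Uses the recursion R_t = r_t + gamma * R_{t+1}, working backwards
    from the end of the episode.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: the only reward arrives at the last step.
print(discounted_returns([0, 0, 1], gamma=0.5))  # [0.25, 0.5, 1.0]
```

A smaller `gamma` makes distant rewards count for less, which encodes the intuition that an uncertain future reward is worth less than an immediate one.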

Q Function

• Q(s, a):

The maximum discounted future reward when we perform action a in state s and continue optimally from that point on.

It represents the “quality” of a certain action in a given state.
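In symbols, and assuming $R_{t+1}$ denotes the total discounted future reward from the next step onward, this definition is often written informally as:

```latex
Q(s_t, a_t) = \max R_{t+1}
```

The maximum is taken over all ways of continuing the game after performing $a_t$ in $s_t$.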

How to Choose an Action?

If we know the Q function, we pick the action with the highest Q-value:

$\pi(s) = \arg\max_a Q(s, a)$

Here π represents the policy, the rule for how we choose an action in each state.

Q Function Implementation

           action 0   action 1   action 2
state 0       -2         -1          5
state 1        3          2          3
state 2        5          6         -6
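For a small, discrete problem the Q function can be stored as a lookup table like the one above, and the greedy policy simply picks the largest entry in each row. A minimal sketch using the table's values; the names `Q_TABLE` and `greedy_action` are illustrative, and breaking ties in favor of the lowest action index is an assumption.

```python
# Rows are states, columns are actions; values are taken from the table above.
Q_TABLE = [
    [-2, -1, 5],   # state 0
    [3, 2, 3],     # state 1
    [5, 6, -6],    # state 2
]


def greedy_action(q_table, state):
    """Return the index of the action with the highest Q-value in `state`."""
    row = q_table[state]
    # max() keeps the first maximal index, so ties go to the lowest action.
    return max(range(len(row)), key=row.__getitem__)


for s in range(len(Q_TABLE)):
    print(f"state {s} -> action {greedy_action(Q_TABLE, s)}")
```

In state 1, actions 0 and 2 share the same Q-value, so either would be an equally good greedy choice.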

If  We  Use  Pixels  as  State  

1. Resize images to 84x84
2. Convert to grayscale with 256 levels
3. Use last 4 frames to represent state

$256^{84 \times 84 \times 4} \approx 10^{67970}$ possible game states

We  can  never  cover  all  the  cases!  
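The size of that number can be checked without ever constructing it, by working with base-10 logarithms. A short sketch of the arithmetic:

```python
import math

LEVELS = 256          # grayscale levels per pixel
PIXELS = 84 * 84 * 4  # 84x84 image, last 4 frames

# The number of states is LEVELS ** PIXELS; log10 gives its order of magnitude.
exponent = PIXELS * math.log10(LEVELS)
print(PIXELS)                # 28224
print(math.floor(exponent))  # 67970, i.e. about 10^67970 states
```

A table with one row per state is therefore out of the question, which motivates replacing the table with a function approximator.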

Vision & Control: Deep Q Network

We use a CNN to represent the Q function, which takes:

• Input: the state (4 game screens) and action

• Output: Q-values of different actions a (i.e., Q(s,a))
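The interface implied here, a state made of the last four screens going in and one Q-value per action coming out, can be sketched without any deep learning library. In the sketch below `FrameStack` is an illustrative helper and `dummy_q_network` is a hypothetical placeholder for the trained CNN; neither is taken from the DeepMind implementation.

```python
from collections import deque

ACTIONS = ("do nothing", "left", "right")


class FrameStack:
    """Keeps the most recent `size` frames; together they form the state."""

    def __init__(self, first_frame, size=4):
        # Fill the buffer with copies of the first frame so the state
        # always contains exactly `size` frames.
        self.frames = deque([first_frame] * size, maxlen=size)

    def push(self, frame):
        self.frames.append(frame)  # the oldest frame is dropped automatically
        return self.state()

    def state(self):
        return tuple(self.frames)


def dummy_q_network(state):
    """Placeholder for the CNN: returns one made-up Q-value per action."""
    newest = state[-1]
    return [0.0, float(-newest), float(newest)]


def act(state, q_network):
    """Greedy policy: pick the action with the largest predicted Q-value."""
    q_values = q_network(state)
    best = max(range(len(q_values)), key=q_values.__getitem__)
    return ACTIONS[best]


stack = FrameStack(first_frame=0)
for frame in (1, 2, 3, 4, 5):  # integers stand in for 84x84 grayscale images
    state = stack.push(frame)
print(state, act(state, dummy_q_network))  # (2, 3, 4, 5) right
```

Returning all Q-values in one call means a single forward pass is enough to choose an action, instead of one pass per candidate action.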

$\pi(\text{state}) = \arg\max_a Q(\text{state}, a)$, where the state is the stack of 4 game screens

Fusing Multiple Sensors

[Figure: annotated wearable-camera examples; labels: kettle, medium-wrap, thumb-4-finger, manipulation region, side-view]

Chan  et  al.  ECCV  2015  from  VSLab  

[Figure: views from the left-hand, head, and right-hand cameras in three scenes: lab, office, home]

Recognition from Wearable Cameras

[Figure: predictions (Pred) vs. ground truth (GT) for gesture recognition and object category recognition]

Real-time Wearable Demo

Fisheye camera, NVIDIA TK1

Real-time Wearable Demo: cellphone, bottle, keyboard, mouse, free hand

Take-Home Message

• Encoding Source (data)
  – N-D observation
  – N-D sequence of observations
• Decoding Outcome (AI tasks)
  – N-D single output
  – N-D open-ended sequence as output
• Multiple Encoding and Decoding
• If each module is differentiable/approximately differentiable -> End-to-End Learning

We get many tools to tackle Artificial General Intelligence

Just Try!

Worst Thing: Do Nothing

My  Two  Cents  for  Taiwan  

Questions

• Can I simply ask my engineers to use open source deep learning tools to create new products?

Answer: Yes and not really. Yes, if you want to complete a well-known task, but Google's MLaaS product will almost always beat you. Not really, if you want to solve your own problem with your own data. You need talent, or engineers who are not afraid of failure.

Where can I find talent?

• Most of the talent consists of PhD students or young professionals in the US and EU.

http://www.economist.com/news/business/21695908-silicon-valley-fights-talent-universities-struggle-hold-their

How  can  we  compete?  

Local  Students  •  Our  students  know  deep  learning  is  HOT!  

[ Deep Learning Workshop at Academia Sinica ] 500 participants

Case  Study:  NTHU@TW  Undergraduate  

https://github.com/yenchenlin1994/DeepLearningFlappyBird

Case Study: UNIST@Korea Undergraduate

To-Do for Local Students

• We need more students to work on
  – realistic deep learning projects with
  – enough computing resources
• We need some of them to stay in our local industry

Advanced Deep Learning Course at NTHU (academic year 105)

1. Taught by a group of professors
2. Topics including the latest DNN models, distributed training, and DL for embedded systems
3. Sponsored by MTK and ITRI's big data center (巨資中心)
4. More sponsors are welcome!

For Talents Abroad

Get in the Talent Race!

http://cvpr2016.thecvf.com/exhibit/industry_expo

For  Talents  Abroad  

Most of them are fresh PhDs

USD 1 billion pledged

For  Talents  Abroad  

AI  is  happening  Fast  

Thanks!  
