DESCRIPTION
In this talk I will describe our current project at the Interaction Lab, School of Mathematical and Computer Sciences, Heriot-Watt University, Scotland. Our research focuses on developing spoken interactive systems that can interact with people effectively and adaptively. Such systems often use Reinforcement Learning, a computational model that learns complex behaviours by trial and error. A drawback of such systems is limited scalability, i.e. difficulty coping with large spaces of possibilities and with parallel tasks. I will describe three possible solutions to this problem: the use of prior knowledge, the reuse of learned policies, and flexible interaction. All three approaches will be illustrated with working systems that have been tested with real users. Finally, I will discuss possible directions for future work aimed at deploying Reinforcement Learning systems in real-world (non-experimental) settings.
Hierarchical Reinforcement Learning for Interactive Systems and Robots
Heriberto Cuayáhuitl
Interaction Lab, School of Mathematical & Computer Sciences
Heriot-Watt University, Edinburgh, UK
[email protected]
AINL, Moscow, 12-13 September 2014
Mary Ellen Foster
Simon Keizer
Zhuoran Wang
Srini Janarthanam
Xingkun Liu
Helen Hastie Oliver Lemon
Verena Rieser
Dimitra Gkatzia
Nina Dethlefs Arash Eshghi
Heriberto Cuayahuitl
Ioannis Efstathiou
Katrin Lohan
Wenshuo Tang
Reinforcement Learning Projects
Interactive Learning System/Robot
• Interactive learning machine: an entity that improves its performance by interacting with other machines, its physical world and/or humans.
(Cuayáhuitl, H., et al., 2013, IJCAI-MLIS)
A Motivating Scenario
A robot learning to play multiple games from interaction
Outline
1. Reinforcement Learning (RL)
2. Hierarchical RL
3. Applications
4. Related Work
5. Future Directions
6. Summary
Interaction as a Markov Decision Process (MDP)
● The environment is described as an MDP:
  ● A set of states S;
  ● A set of actions A;
  ● A state transition function T;
  ● A reward function R.
● The MDP solution (the policy, i.e. the interaction manager) decides what to do, and is learned with reinforcement learning
[Figure: interaction as a sequence of choice points, with state transition probabilities Pr(s2|s1,a1)]
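The four MDP components above can be sketched as plain Python data structures. This is a two-state toy example of my own (states/actions are illustrative, not the slides' dialogue domain), with the transition probability Pr(s2|s1,a1) = 0.8 borrowed from the slide's figure:

```python
import random

S = ["s1", "s2"]          # set of states
A = ["a1", "a2"]          # set of actions
T = {                      # T[(s, a)] -> {next_state: probability}
    ("s1", "a1"): {"s1": 0.2, "s2": 0.8},   # Pr(s2|s1,a1) = 0.8
    ("s1", "a2"): {"s1": 1.0},
    ("s2", "a1"): {"s2": 1.0},
    ("s2", "a2"): {"s1": 0.5, "s2": 0.5},
}
R = {("s1", "a1"): 0.0, ("s1", "a2"): -1.0,
     ("s2", "a1"): 1.0, ("s2", "a2"): 0.0}  # reward function

def sample_next(s, a):
    """Draw a successor state according to Pr(s'|s,a)."""
    dist = T[(s, a)]
    return random.choices(list(dist), weights=list(dist.values()))[0]
```

A policy then simply maps each state in S to an action in A; learning one is the subject of the next slides.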
Reinforcement Learning is not Trivial
[Plot: state space growth, from 10^0 up to about 10^30 states, against the number of binary state variables (1 to 100)]
Known Issues: Scalability and Partial Observability
The Goal of Reinforcement Learners
The goal is to find an optimal policy:
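The equation on this slide did not survive extraction; the standard statement of the objective, consistent with the MDP notation above (a reconstruction, not necessarily the exact formula shown in the talk), is:

```latex
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \right],
\qquad
\pi^{*}(s) = \arg\max_{a \in A} Q^{*}(s, a)
```

where γ ∈ [0, 1) is a discount factor and Q*(s, a) is the optimal action-value function.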
How to Represent the Agent's Policy?
● Tabular representations
● Tree-based representations
● Function approximation
  ● Linear
  ● Non-linear
Reinforcement Learning Algorithms
● Q-Learning
● Q-Learning with Linear Function Approximation
(Sutton & Barto, MIT Press, 1998; Szepesvári, Morgan & Claypool, 2010)
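A minimal tabular Q-learning sketch, to make the update rule concrete. The domain is a toy 5-state corridor of my own (start at state 0, goal at state 4, actions 0=left / 1=right, reward +1 on reaching the goal), not any of the systems in the talk:

```python
import random

random.seed(0)
N, GOAL = 5, 4
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.1
Q = [[0.0, 0.0] for _ in range(N)]         # tabular Q-values

def greedy(qs):
    """Argmax with random tie-breaking."""
    m = max(qs)
    return random.choice([a for a in (0, 1) if qs[a] == m])

for _ in range(500):                        # training episodes
    s, steps = 0, 0
    while s != GOAL and steps < 100:
        a = random.randrange(2) if random.random() < EPS else greedy(Q[s])
        s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
        r = 1.0 if s2 == GOAL else 0.0
        target = r + (0.0 if s2 == GOAL else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])   # temporal-difference update
        s, steps = s2, steps + 1

policy = [greedy(Q[s]) for s in range(N - 1)]
print(policy)   # the learned policy goes right toward the goal
```

Q-learning with linear function approximation (the second bullet) replaces the table `Q[s][a]` with a weighted sum of state features and applies the same TD error to the weights.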
Illustrative Example: The Interactive Taxi
• State transitions: probability 0.8 of correct navigation/recognition
• Reward: +100 for reaching the goal, 0 otherwise
• Size of state-action space: |S×A| = 50 × 5^4 × 3 × 4 × 16 = 6M state-actions
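The slide's count can be reproduced directly (the individual factors are taken from the slide; their exact variable meanings are abbreviated there):

```python
# 50 grid cells x four 5-valued variables x 3- x 4- x 16-valued variables
state_actions = 50 * 5**4 * 3 * 4 * 16
print(f"{state_actions:,}")   # 6,000,000
```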
Hierarchical Reinforcement Learning
• Why? To learn system behaviours to carry out multiple tasks jointly (not separately)
"I know how to do that, from playing the other game"
Interaction as a Semi-Markov Decision Process (SMDP)
● Environment modelled as an SMDP:
  ● S: set of states
  ● A: set of (complex) actions
  ● T: state transition function
  ● R: reward function
● One SMDP for each task or subtask
● Hierarchical reinforcement learning algorithms to solve SMDPs (e.g. HSMQ, MAXQ)
[Diagram: a set of tasks (Task 1 … Task N), each decomposed into sub-tasks, with one SMDP per task or sub-task]
Conceptual SMDP for Interactive Systems
The goal is to find an optimal policy for each task and sub-task in the hierarchy.
Benefits: quicker learning, more scalability, behaviour reuse
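The slide's equation is likewise missing from the extraction; in the hierarchical setting the goal is usually written as one optimal policy per SMDP in the hierarchy (the i, j indexing over hierarchy levels and tasks is an assumption, following the style of the cited papers):

```latex
\pi_{ij}^{*}(s) = \arg\max_{a \in A_{ij}} Q_{ij}^{*}(s, a)
\quad \text{for each SMDP model } M_{ij}
```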
Hierarchical Reinforcement Learning Algorithms
● HSMQ-Learning
● HSMQ-Learning with Linear Function Approximation
● Other HRL algorithms: MAXQ, HAMQ
● Algorithms for structure learning: HEXQ, VISA, HI-MAT
(Barto & Mahadevan, 2003; Hengst, 2010)
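To illustrate the HSMQ idea, here is a toy sketch of my own construction (not the authors' implementation): a 7-state corridor where the root task must reach state 6 by invoking two subtasks, "mid" (terminates at state 3) and "goal" (terminates at state 6). Each (sub)task learns its own Q-table; subtasks learn from internal pseudo-rewards, while the root is updated, HSMQ-style, with the discounted real return accumulated while the chosen subtask ran:

```python
import random

random.seed(1)
GAMMA, ALPHA, EPS = 0.95, 0.5, 0.2
PRIMITIVES = (-1, +1)                      # left / right
SUBTASKS = {"mid": 3, "goal": 6}           # subtask name -> terminal state
NAMES = list(SUBTASKS)
q_sub = {name: {} for name in SUBTASKS}    # per-subtask Q-tables
q_root = {}                                # root Q over subtask choices

def step(s, a):
    return max(0, min(6, s + a))

def applicable(s):                          # subtasks not already terminated at s
    return [n for n in NAMES if SUBTASKS[n] != s]

def choose(q, s, actions):                  # epsilon-greedy
    if random.random() < EPS or s not in q:
        return random.choice(actions)
    return max(actions, key=lambda a: q[s].get(a, 0.0))

def run_subtask(name, s):
    """Q-learn inside one subtask; return (s', discounted real return, steps)."""
    goal, q = SUBTASKS[name], q_sub[name]
    ret, disc, steps = 0.0, 1.0, 0
    while s != goal and steps < 50:
        a = choose(q, s, PRIMITIVES)
        s2 = step(s, a)
        r_env = 10.0 if s2 == 6 else -0.1           # real (root-level) reward
        r_pseudo = 1.0 if s2 == goal else -0.1      # subtask pseudo-reward
        best2 = 0.0 if s2 == goal else max(
            q.get(s2, {}).get(b, 0.0) for b in PRIMITIVES)
        qs = q.setdefault(s, {})
        qs[a] = qs.get(a, 0.0) + ALPHA * (r_pseudo + GAMMA * best2 - qs.get(a, 0.0))
        ret += disc * r_env
        disc *= GAMMA
        s, steps = s2, steps + 1
    return s, ret, steps

for _ in range(2000):                       # training episodes
    s = random.randint(0, 5)
    for _ in range(10):                     # a few root decisions per episode
        if s == 6:
            break
        t = choose(q_root, s, applicable(s))
        s2, ret, k = run_subtask(t, s)
        best2 = 0.0 if s2 == 6 else max(
            q_root.get(s2, {}).get(n, 0.0) for n in applicable(s2))
        qr = q_root.setdefault(s, {})
        qr[t] = qr.get(t, 0.0) + ALPHA * (ret + (GAMMA ** k) * best2 - qr.get(t, 0.0))
        s = s2
```

The key HSMQ ingredient is the root update: the chosen subtask runs to completion over k primitive steps, and the root backs up the return it accumulated, discounted by γ^k. MAXQ and HAMQ differ in how they decompose the value function and constrain the policies, respectively.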
Illustrative Example: The Interactive Taxi
• State transitions: probability 0.8 of correct navigation/recognition
• Reward: +100 for reaching the goal, 0 otherwise
• State-action space: |S×A| = 10.7K state-actions (vs. 6M for the flat model)
Speech-Based Human-Machine Communication
HRL Agents
Application 1: Travel Planning
● HRL without prior knowledge (HSMQ-Learning)
● HRL with prior knowledge (HAM+HSMQ-Learning)
● Training with simulated interactions
● Testing with real users
(Cuayáhuitl et al., Computer Speech & Language, 2010)
W = joint state (SMDP + HAM)
Travel Planning Spoken Dialogue System
(Cuayáhuitl et al., Computer Speech & Language, 2010)
Results in the Travel Planning Domain
• HRL finds solutions faster than flat learning
• HRL is more scalable than flat learning
• Learnt policies outperform hand-coded ones
(Cuayáhuitl et al., Computer Speech & Language, 2010)
Application 2: Indoor Wayfinding
● HRL without policy reuse (HSMQ-Learning)
● HRL with policy reuse (HSMQ_PR-Learning)
  ● Detect situations where the system knows how to act
  ● Action selection using an optimal policy (if reuse=true) or an exploratory policy (if reuse=false)
● Training with simulated interactions
● Testing with real users
(Cuayáhuitl & Dethlefs, ACM Trans. Speech & Lang. Proc., 2011)
Indoor Wayfinding Dialogue System
(Cuayáhuitl & Dethlefs, ACM Trans. Speech & Lang. Proc., 2011)
Infokiosk & mobile phone interfaces
Results in the Indoor Wayfinding Domain
• Policy reuse finds solutions faster than without it
• Adaptive route instructions are more efficient
(Cuayáhuitl & Dethlefs, ACM Trans. Speech & Lang. Proc., 2011)
Application 3: Human-Robot Interaction
● HSMQ vs. FlexHSMQ Learning with linear function approximation
● Training with simulated interactions
● Testing with real users
(Cuayáhuitl et al., ACM Trans. Interactive Intelligent Sys., 2014)
Robot Dialogue System (Quiz Game)
Interaction Manager
(Cuayáhuitl et al., ACM Trans. Interactive Intelligent Sys., 2014)
Results in the Quiz Domain
• Non-strict HRL leads to more natural interactions
• Non-strict HRL is preferred by human users
(Cuayáhuitl et al., ACM Trans. Interactive Intelligent Sys., 2014)
Robot Asking and Answering Questions
(Belpaeme et al., 2012, Intl. Journal of HRI)
Learning with Large State Spaces
Learning under Uncertainty
Spectrum of Markov Process Models
Promising for multi-task learning systems
(Mahadevan, S. et al., 2004, Handbook of Learning and Approx. Dyn. Prog.)
Issues that Might Lead to Future Interactive Learning Systems
1. Big effort to make the system perform similar tasks
2. Simulations may not represent the real world
3. It is often hard to specify the reward function
4. The real world is partially known and dynamic
5. Poor spatial cognition will affect real-world impact
6. Small vocabularies discourage talking to machines
7. Lack of interactive learning systems in the real world
Towards Autonomous Interactive Systems and Robots
[Diagram: degree of autonomy vs. amount of tasks. Current interactive systems require a lot of human intervention; future interactive systems should be more autonomous. How do we get here?]
A holistic perspective on language, vision and robotics
Summary
• Machines can be programmed to behave just as expected, but the physical world and humans demand systems that can learn
• Hierarchical learning plays an important role for multi-tasked interactive systems and robots
• More autonomy is needed if systems are to learn new skills with little human intervention
• A holistic, interdisciplinary perspective is needed for intelligent interactive robots
References
• Cuayáhuitl, H., Dethlefs, N., Kruijff-Korbayová, I. (2014). Non-Strict Hierarchical Reinforcement Learning for Interactive Systems and Robots. To appear in ACM Transactions on Interactive Intelligent Systems, vol. 4, no. 3.
• Cuayáhuitl, H., Dethlefs, N. (2011). Spatially-Aware Dialogue Control Using Hierarchical Reinforcement Learning. ACM Transactions on Speech and Language Processing, vol. 7, no. 3, pp. 5:1-5:26.
• Cuayáhuitl, H., Renals, S., Lemon, O., Shimodaira, H. (2010). Evaluation of a Hierarchical Reinforcement Learning Spoken Dialogue System. Computer Speech and Language, vol. 24, no. 2, pp. 395-429.
E-Mail: [email protected]