53
Introduction to Deep Learning for Dialogue Systems Hwaran Lee SK T-Brain, AI Center October 10, 2019

Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Introduction to Deep Learning for Dialogue Systems

이 화 란 Hwaran Lee

SK T-Brain, AI CenterOctober 10, 2019

Page 2: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks
Page 3: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

OutlineI. Introduction to dialog systemsII. Background• Machine learning• Deep learning and Neural networks

III. Deep learning for Natural Language• Word embedding• Language models

IV. Deep learning for Dialog systems• SUMBT• LaRL• Challenges

3

Page 4: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Brief History of Dialogue Systems

4Material: https://deepdialogue.miulab.tw

MiPad

Page 5: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Brief History of Dialogue Systems

5

Google, Duplex (2018)

Naver Line, Duet (2019)

Microsoft, Xiaoice (2018)

Page 6: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Why Natural Language?• Global Digital Statistics (2018 January)

6

The more natural and convenient input of devices evolves towards speech

Material: https://deepdialogue.miulab.tw

Page 7: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

GUI v.s. CUI (Conversational UI)

7Material: https://deepdialogue.miulab.tw

Page 8: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

GUI v.s. CUI (Conversational UI)

8Material: https://deepdialogue.miulab.tw

Page 9: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Category of Dialogue Systems

9

User says:

• I am smart

• I have a questionWhen Iron Man is dead?

• I need to get this doneI want to book a restaurant

Material: Jianfeng Gao & Michel Galley, Tutorial, ICML, 2019

Dialogue Category

→ Chitchat

→ QA

→ Goal-oriented

Page 10: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Spoken Dialog Systems

10

Backend Action / External Knowledge

• Domain Identification • User Intent Detection • Slot Filling

• Dialog State Tracking (DST) • Dialog Policy

Are there any action movies to see this weekend?

Semantic Frame request_movie genre=action, date=this weekend

System Action / Policy request_locationWhere are you located?

Page 11: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Transition of NLP to Neural Approaches

11

Backend Action / External Knowledge

Neural Model for Each Module

Page 12: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Transition of NLP to Neural Approaches

12

Page 13: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

OutlineI. Introduction to dialog systemsII. Background• Machine learning• Deep learning and Neural networks

III. Deep learning for Natural Language• Word embedding• Language models

IV. Deep learning for Dialog systems• SUMBT• LaRL• Challenges

13

Page 14: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Machine Learning ≈ Find appropriate function

• Speech Recognition( ) = 안녕하세요

• Image classification( ) = Cat

• Go Playing( ) = 5-5 (next movement)

• Chat Bot( “오늘 점심 메뉴 뭐지?” ) = “오늘 점심 식단은…”

f

f

f

f

14Material: https://deepdialogue.miulab.tw

Model fInput Output

Page 15: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Types of Machine Learning

15

• Classification • Image classification • Sentiment text classification

• Regression • Weather forecasting • Market forecasting

• Clustering • Recommender system

• Dimension reduction • Meaningful compression • Feature extraction

• Topic modeling

• Robot Navigation • Game AI • Dialog Policy Learning

Page 16: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Neural Networks and Deep Learning

16

f

Page 17: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Training Neural Nets = Optimization

• Update the weights and biases to decrease loss function 1. Forward pass: compute network output and error2. Backward pass: compute gradient by EBP3. Update weights by gradient descent

w(t+1) ← w(t) − η∂ℒ∂w(t)

17

Forward pass Error Back Propagation

ℒ(θ) = − ∑i

(yi − f (xi))2

Page 18: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Three things defining deep learning

1. Neuron type (activation function)

2. Architecture

3. Learning algorithm:Loss function & Optimization

18

ℒ(θ) = − ∑i

(yi − f(xi))2

Page 19: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

1. Neuron type (activation function)

19

Page 20: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

2. Architecture

20

1. Deep Neural Networks (DNN) • Fully Connected Layers

2. Convolutional Neural Networks (CNN) • Weight sharing and pooling • Spatial data: Image

3. Recurrent Neural Networks (RNN) • Time series data • Speech, Language, Video

Page 21: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

3. Learning algorithm: Loss function & Optimization

• Loss function quantifies gap between prediction and ground truth (labels)

• For regression:• Mean Squared Error (MSE)

• For classification:• Cross Entropy Loss

(a.k.a. Negative Log Likelihood)

21

Mean Squared Error

ℒ(θ) = −1N ∑

i

(ti − f(xi))2

Cross Entropy Loss

ℒ(θ) = −C

∑i

ti log(p(y |xi))

• Optimization: Stochastic gradient descent (SGD)

Page 22: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Neural Network Playground

22http://playground.tensorflow.org

Page 23: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

OutlineI. Introduction to dialog systemsII. Background• Machine learning• Deep learning and Neural networks

III. Deep learning for Natural Language• Word embedding• Language models

IV. Deep learning for Dialog systems• SUMBT• LaRL• Challenges

23

Page 24: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Natural Language Process Tasks• Text Classification• Sentiment classification: I love it → positive? negative?

• Language Generation• Machine translation: 사랑합니다 → Love it

• Image captioning:• Question-answering

(Machine reading comprehension)• POS tagging• Chungking• …

24

Page 25: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Word Embeddings (word2vec)• How to represent word symbols as (semantic) vectors?

25

love it !

Pos Neg

?

https://developers.google.com/machine-learning/crash-course/embeddings/translating-to-a-lower-dimensional-space

Page 26: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Word Embeddings (word2vec)• Learn the meaning of a word from its neighborhoods!

26T. Mikolove et al., Efficient Estimation of Word representations is vector space, 2013.

Page 27: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Language Model• Probability of a sequence of m words:

• Application: Choose the next word:

• N-Gram LM

• (tri-gram)

• Count based approach has weakness on unseen word sequence• Fixed width context

• Neural Language Model• RNNLM (Mikolov, 2010)

p(w1, w2, . . . wm)p(wm+1 |w1,...,m)

p(wm+1 |wm,m−1) =count(wm+1, wm, wm−1)

count(wm, wm−1)

27

Page 28: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Recurrent Neural Networks

28

ht = f(xt, ht−1) = σ(Wxxt + Whht−1 + b)

Page 29: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Long-Term Dependency

29

Page 30: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Long Short-Term Memory (LSTM) Networks

30

Pick what to forget and what to remember!Previous output and new data are fed into the LSTM layers1. Decide what to forget from the memory cell (forget gate)2. Decide what to remember from the data and previous output (input gate)3. Decide what to output (output gate)

(ht−1) (xt)

Memory Cell

Page 31: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Bi-directional RNN

• Learn representations from both past and future time steps

31

Page 32: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Encoder-decoder architecture• Sequence-to-sequence (Seq2seq)

32

• Machine translation • Dialog Response generation

Page 33: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Attention Mechanism

33

Focus on certain parts of the input sequence when predicting a certain part of the output sequence

Page 34: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Transformer, Attention is all you need

34

• Without RNNs, only attention mechanism is used! • Self-attention • Multi-head attention • Positional encoding

Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems. 2017.

Page 35: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Recent Word and Sentence Representation• BERT: Bi-directional Encoder Representations from

Transformers

35

Transfer Learning

Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).

Page 36: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

BERT• Pretraining: Masked Language Model

36

Page 37: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

BERT• Pretraining: Two-sentence Classification

37

Page 38: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

BERT• Fine-tuning for downstream tasks

38

Page 39: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Pre-trained BERT for Korean• Google Bert (Multilingual) : https://github.com/google-research/bert/blob/

master/multilingual.md • ETRI, KorBert: http://aiopen.etri.re.kr/service_dataset.php • SK T-Brain, KoBERT: https://github.com/SKTBrain/KoBERT

39

Page 40: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

OutlineI. Introduction to dialog systemsII. Background• Machine learning• Deep learning and Neural networks

III. Deep learning for Natural Language• Word embedding• Language models

IV. Deep learning for Dialog systems• SUMBT• LaRL• Challenges

40

Page 41: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Conversational Agents

41Material: https://deepdialogue.miulab.tw

Page 42: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Spoken Dialog Systems

42

Backend Action / External Knowledge

• Domain Identification • User Intent Detection • Slot Filling

Are there any action movies to see this weekend?

Semantic Frame request_movie genre=action, date=this weekend

System Action / Policy request_locationWhere are you located?

Page 43: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Multi-domain Goal-Oriented Dialogue System

43

MultiWOZ dataset

Page 44: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Toward End-to-End Multi-Domain Goal-oriented Dialogue systems

44

• Dialog State Tracking (DST) • Dialog Policy

Backend Action / External Knowledge

• Domain Identification • User Intent Detection • Slot Filling

Are there any action movies to see this weekend?

Semantic Frame request_movie genre=action, date=this weekend

System Action / Policy request_locationWhere are you located?

Utterance —> Dialog State Tracking

Dialog States (—> Policy Action) —> Response

Page 45: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

SUMBT: Slot-Utterance Matching Belief Tracker

45Lee H, Lee J, Kim TY. SUMBT: Slot-Utterance Matching for Universal and Scalable Belief Tracking. ACL, 2019.

• Problem: Domain independent belief tracker • Key Idea: Find the slot-value of a domain-slot type from user and system’s

utterances using attention mechanism like question-answering problems

Model MultiWOZ MultiWOZ (Only Restaurant)

Joint Slot Joint SlotMDBT (Ramadan et al., 2018)* 0.1557 0.8953 0.1789 0.5499

GLAD (Zhong et al., 2018)* 0.3557 0.9544 0.5323 0.9654GCE (Nouri et al., 2018)* 0.3627 0.9842 0.6093 0.9585TRADE (Wu et al., 2019) 0.4862 0.9692 0.6535 0.9328

SUMBT 0.49065 0.97290 0.82840 0.96475

Page 46: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

LaRL: Latent Action Reinforcement Learning

46Zhao, T., Xie, K., & Eskenazi, M. Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models. NAACL-HLT 2019

• Problems: • Simple hand-crafted system action space • Word-level RL suffers from credit assignment

• Key Idea: Latent action spaces, decoupling the discourse-level decision-making from natural language generation

User simulator / Evaluator

(Environment)Model

State, Reward

Action

Page 47: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Toward End-to-End Multi-Domain Goal-oriented Dialogue systems

47

• Dialog State Tracking (DST) • Dialog Policy

Backend Action / External Knowledge

• Domain Identification • User Intent Detection • Slot Filling

Are there any action movies to see this weekend?

Semantic Frame request_movie genre=action, date=this weekend

System Action / Policy request_locationWhere are you located?

Utterance —> Dialog State Tracking

Dialog States (—> Policy Action) —> Response

DEMO

Page 48: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Evolution Roadmap

48Material: https://deepdialogue.miulab.tw

Page 49: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

XiaoIce System Architecture

49

Microsoft, Xiaoice (2018)

Page 50: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

Dialogue System with Personality

50

https://convai.huggingface.co

Page 51: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

SummaryI. Introduction to dialog systems• Brief history, components and categories of dialogue systems

II. Background• Machine learning:

Supervised, Unsupervised, Reinforcement Learning• Deep learning and Neural networks:

Neuron, Architecture, Learning AlgorithmIII. Deep learning for Natural Language• Word embedding: Skip-gram, CBOW• Language models: RNN, BERT (Attention, Transformer)

IV. Deep learning for Dialog systems• E2E Multi-domain Goal-oriented Dialog System• Future direction

• Empathic, Personality, Open domain, Common sense …

51

Page 52: Introduction to Deep Learning for Dialogue Systems · 2020-07-20 · Outline I. Introduction to dialog systems II. Background • Machine learning • Deep learning and Neural networks

SK T-Brain, AI Center• https://skt.ai

52