RNN & NLP Application
Changki Lee, College of IT, Kangwon National University

Source: cs.kangwon.ac.kr/~leeck/NLP/RNN_NLP.pdf


Page 1

RNN & NLP Application

College of IT, Kangwon National University

Changki Lee

Page 2

Contents

• RNN

• NLP applications

Page 3

Recurrent Neural Network

• The “recurrent” property makes the network a dynamical system unrolled over time
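To make the recurrence concrete, here is a minimal NumPy sketch of one vanilla RNN step; the weight names (W_xh, W_hh, b_h) and the toy dimensions are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Toy dimensions and weights (assumed purely for illustration).
rng = np.random.default_rng(0)
d_x, d_h = 4, 3
W_xh = rng.normal(scale=0.1, size=(d_h, d_x))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
b_h = np.zeros(d_h)

h = np.zeros(d_h)
for x in rng.normal(size=(5, d_x)):   # a length-5 input sequence
    h = rnn_step(x, h, W_xh, W_hh, b_h)
print(h)                              # final state summarizes the whole sequence
```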

Page 4

Bidirectional RNN

• Exploit future context as well as past
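A minimal sketch of bidirectional encoding under the same toy setup (the rnn_step helper is repeated so the block stands alone): one RNN runs left-to-right, another right-to-left, and their states are concatenated at each position:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

def bi_rnn_encode(xs, fwd, bwd, d_h):
    """Concatenate forward and backward hidden states at every position."""
    T = len(xs)
    hs_f, hs_b = [None] * T, [None] * T
    h = np.zeros(d_h)
    for t in range(T):                    # past -> future
        h = rnn_step(xs[t], h, *fwd); hs_f[t] = h
    h = np.zeros(d_h)
    for t in reversed(range(T)):          # future -> past
        h = rnn_step(xs[t], h, *bwd); hs_b[t] = h
    return [np.concatenate([f, b]) for f, b in zip(hs_f, hs_b)]

rng = np.random.default_rng(0)
d_x, d_h = 4, 3
make = lambda: (rng.normal(scale=0.1, size=(d_h, d_x)),
                rng.normal(scale=0.1, size=(d_h, d_h)),
                np.zeros(d_h))
xs = rng.normal(size=(5, d_x))
states = bi_rnn_encode(xs, make(), make(), d_h)
print(states[2].shape)   # (6,) = 3 forward dims + 3 backward dims
```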

Page 5

Long Short-Term Memory RNN

• Vanishing gradient problem in RNNs

• LSTM can preserve gradient information
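A small numerical illustration of why gradients vanish (toy weights, assumed purely for illustration): backpropagating through T tanh steps multiplies the error signal by diag(1 - h_t^2) W_hh^T at each step, so with modest recurrent weights its norm shrinks geometrically (with large weights it can instead explode):

```python
import numpy as np

rng = np.random.default_rng(0)
d_h = 8
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))   # small toy recurrent weights

# Forward pass: 50 tanh RNN steps with random inputs.
hs = []
h = np.zeros(d_h)
for _ in range(50):
    h = np.tanh(W_hh @ h + rng.normal(scale=0.5, size=d_h))
    hs.append(h)

# Backward pass: propagate an error signal from the last step toward the first.
grad = np.ones(d_h)
for t, h_t in enumerate(reversed(hs)):
    grad = W_hh.T @ ((1 - h_t**2) * grad)       # Jacobian of one tanh RNN step
    if t % 10 == 0:
        print(t, np.linalg.norm(grad))          # norm decays geometrically
```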

Page 6

LSTM Block Architecture

Page 7

Gated Recurrent Unit (GRU)

• $r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)$

• $z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)$

• $\tilde{h}_t = \phi(W_{xh} x_t + W_{hh}(r_t \odot h_{t-1}) + b_h)$

• $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$

• $y_t = g(W_{hy} h_t + b_y)$
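A direct NumPy transcription of these update equations; the dimensions, the initialization, and the choice φ = tanh are illustrative assumptions:

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU step, following the slide's equations (phi = tanh assumed)."""
    r = sigmoid(p["W_xr"] @ x_t + p["W_hr"] @ h_prev + p["b_r"])  # reset gate
    z = sigmoid(p["W_xz"] @ x_t + p["W_hz"] @ h_prev + p["b_z"])  # update gate
    h_tilde = np.tanh(p["W_xh"] @ x_t + p["W_hh"] @ (r * h_prev) + p["b_h"])
    return z * h_prev + (1 - z) * h_tilde   # interpolate old state and candidate

rng = np.random.default_rng(0)
d_x, d_h = 4, 3
p = {k: rng.normal(scale=0.1, size=(d_h, d_x if k.startswith("W_x") else d_h))
     for k in ["W_xr", "W_hr", "W_xz", "W_hz", "W_xh", "W_hh"]}
p.update({b: np.zeros(d_h) for b in ["b_r", "b_z", "b_h"]})

h = np.zeros(d_h)
for x in rng.normal(size=(5, d_x)):   # a length-5 input sequence
    h = gru_step(x, h, p)
print(h)
```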

Page 8

Contents

• RNN

• NLP applications

Page 9

Sequence Labeling – RNN, LSTM

Word embedding

Feature embedding

Page 10

FFNN (or CNN), CNN+CRF (SENNA)

[Figure: window-based FFNN/CNN tagger and CNN+CRF tagger over inputs x(t), hidden layers h(t), and labels y(t)]

Page 11

RNN, CRF, Recurrent CRF

[Figure: RNN, CRF, and Recurrent CRF tagging architectures; the CRF connects inputs x(t) directly to labels y(t), while the RNN and Recurrent CRF add hidden states h(t)]

Page 12

LSTM RNN + CRF = LSTM-CRF (KCC 15)

[Figure: LSTM block with input, forget, and output gates i(t), f(t), o(t) and cell state c(t); combining the LSTM RNN with a CRF output layer over y(t) yields the LSTM-CRF]

Page 13

LSTM-CRF

• $i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$

• $f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$

• $c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$

• $o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$

• $h_t = o_t \odot \tanh(c_t)$

• The softmax output $y_t = g(W_{hy} h_t + b_y)$ is replaced by unnormalized label scores $y_t = W_{hy} h_t + b_y$

• $s(\mathbf{x}, \mathbf{y}) = \sum_{t=1}^{T} (A_{y_{t-1}, y_t} + y_t)$, where $A$ is the label-transition score matrix and $y_t$ here denotes the score of the label chosen at step $t$

• $\log P(\mathbf{y} \mid \mathbf{x}) = s(\mathbf{x}, \mathbf{y}) - \log \sum_{\mathbf{y}'} \exp(s(\mathbf{x}, \mathbf{y}'))$
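A minimal NumPy sketch of this CRF layer on top of the per-step scores; emissions[t, k] plays the role of the slide's y_t for label k, and the transition from an initial START state is omitted for brevity (an illustrative simplification, not the authors' code):

```python
import numpy as np

def logsumexp(v):
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

def score(emissions, A, y):
    """s(x, y) = sum_t (A[y_{t-1}, y_t] + emissions[t, y_t]); START omitted."""
    s = emissions[0, y[0]]
    for t in range(1, len(y)):
        s += A[y[t - 1], y[t]] + emissions[t, y[t]]
    return s

def log_partition(emissions, A):
    """log sum over all sequences y' of exp(s(x, y')), via the forward algorithm."""
    alpha = emissions[0].copy()     # log of summed path scores per ending label
    for t in range(1, len(emissions)):
        alpha = np.array([logsumexp(alpha + A[:, k]) + emissions[t, k]
                          for k in range(emissions.shape[1])])
    return logsumexp(alpha)

rng = np.random.default_rng(0)
T, K = 5, 3                                  # sequence length, number of labels
emissions = rng.normal(size=(T, K))          # W_hy h_t + b_y at each step
A = rng.normal(size=(K, K))                  # label-transition scores
y = rng.integers(K, size=T)                  # some label sequence
print(score(emissions, A, y) - log_partition(emissions, A))   # log P(y | x)
```

Training maximizes this log-likelihood, so the network learns emission scores that cooperate with the transition matrix A.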

Page 14

GRU+CRF

• $r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)$

• $z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)$

• $\tilde{h}_t = \phi(W_{xh} x_t + W_{hh}(r_t \odot h_{t-1}) + b_h)$

• $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$

• The softmax output $y_t = g(W_{hy} h_t + b_y)$ is replaced by unnormalized label scores $y_t = W_{hy} h_t + b_y$

• $s(\mathbf{x}, \mathbf{y}) = \sum_{t=1}^{T} (A_{y_{t-1}, y_t} + y_t)$

• $\log P(\mathbf{y} \mid \mathbf{x}) = s(\mathbf{x}, \mathbf{y}) - \log \sum_{\mathbf{y}'} \exp(s(\mathbf{x}, \mathbf{y}'))$

[Figure: GRU+CRF tagging architecture over inputs x(t), hidden states h(t), and labels y(t)]
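At test time the CRF layer needs the single highest-scoring label sequence rather than a likelihood; a minimal Viterbi decoding sketch under the same toy setup as the block above (illustrative, not the authors' code):

```python
import numpy as np

def viterbi(emissions, A):
    """argmax over y of s(x, y) = sum_t (A[y_{t-1}, y_t] + emissions[t, y_t])."""
    T, K = emissions.shape
    delta = emissions[0].copy()            # best score of a path ending in label k
    back = np.zeros((T, K), dtype=int)     # backpointers
    for t in range(1, T):
        cand = delta[:, None] + A          # cand[j, k]: extend best j-path with k
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + emissions[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):          # follow backpointers right to left
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(delta.max())

rng = np.random.default_rng(0)
emissions, A = rng.normal(size=(5, 3)), rng.normal(size=(3, 3))
print(viterbi(emissions, A))               # best label sequence and its score
```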

Page 15

Bi-LSTM CRF

• Bidirectional LSTM+CRF

• Bidirectional GRU+CRF

• Stacked Bi-LSTM+CRF …

[Figure: bidirectional LSTM-CRF with forward states h(t) and backward states bh(t) jointly feeding the labels y(t)]

Page 16

Stacked LSTM CRF

[Figure: two variants: a bidirectional LSTM-CRF with backward states bh(t), and a stacked LSTM-CRF with second-layer states h2(t) above the first-layer states h(t)]

Page 17

LSTM-CRF with Context Words = CNN + LSTM-CRF

[Figure: LSTM-CRF whose input at each step is a window of context words x(t-2) ... x(t+2)]

• Bi-LSTM CRF ≈ LSTM-CRF with context words > LSTM-CRF

Page 18

Neural Architectures for NER (arXiv 2016)

• LSTM-CRF model + character-based word representation
– Characters encoded with a Bi-LSTM RNN

Page 19

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF (ACL 2016)

• LSTM-CRF model + character-level representation
– Characters encoded with a CNN

Page 20

NER with Bidirectional LSTM-CNNs (arXiv 2016)

Page 21

LSTM RNN-based Korean Sentiment Analysis

• LSTM RNN-based encoding
– Sentence embedding as input
– Fully connected NN for the output
– GRU encoding behaves similarly (see the sketch below)

[Figure: RNN encoder reading x(1) ... x(t) into hidden states h(1) ... h(t), with a single output y]
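A minimal sketch of this encode-then-classify setup: a GRU (equations as on the earlier GRU slide) reads the word embeddings, the final hidden state serves as the sentence embedding, and a fully connected softmax layer classifies it. All dimensions and weights are toy, illustrative assumptions:

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    r = sigmoid(p["W_xr"] @ x_t + p["W_hr"] @ h_prev + p["b_r"])
    z = sigmoid(p["W_xz"] @ x_t + p["W_hz"] @ h_prev + p["b_z"])
    h_tilde = np.tanh(p["W_xh"] @ x_t + p["W_hh"] @ (r * h_prev) + p["b_h"])
    return z * h_prev + (1 - z) * h_tilde

def classify(word_vectors, p, W_out, b_out, d_h):
    """Encode the sentence with a GRU, then classify from the final state."""
    h = np.zeros(d_h)
    for x in word_vectors:             # sentence embedding = final hidden state
        h = gru_step(x, h, p)
    logits = W_out @ h + b_out         # fully connected output layer
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()         # softmax over sentiment classes

rng = np.random.default_rng(0)
d_x, d_h, n_cls = 4, 3, 2
p = {k: rng.normal(scale=0.1, size=(d_h, d_x if k.startswith("W_x") else d_h))
     for k in ["W_xr", "W_hr", "W_xz", "W_hz", "W_xh", "W_hh"]}
p.update({b: np.zeros(d_h) for b in ["b_r", "b_z", "b_h"]})
W_out, b_out = rng.normal(scale=0.1, size=(n_cls, d_h)), np.zeros(n_cls)
sentence = rng.normal(size=(6, d_x))   # six word embeddings
print(classify(sentence, p, W_out, b_out, d_h))
```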

| Data set | Model | Accuracy |
| Mobile (train: 4,543 / test: 500) | SVM (word features) | 85.58 |
| | CNN (ReLU, kernel 3, hidden 50) + word embedding (word features) | 91.20 |
| | GRU encoding + fully connected NN | 91.12 |
| | LSTM RNN encoding + fully connected NN | 90.93 |

Page 22

Neural Machine Translation

[Table: example soft-alignment (attention) matrix. Columns are the Korean source tokens "777항공편 은 3시간 동안 지상 에 있 겠 습니다 . </s>" ("Flight 777 will be on the ground for three hours."); rows are the English target words flight, 777, is, on, the, ground, for, three, hours, ., </s>; each row's weights concentrate on the aligned source tokens]

Page 23

Recurrent NN Encoder–Decoder for Statistical Machine Translation (EMNLP 2014)

• GRU RNN encoding
• GRU RNN decoding
• Vocabulary: 15,000 (source and target)

Page 24

Sequence to Sequence Learning with Neural Networks (NIPS 2014, Google)

• Source vocabulary: 160,000
• Target vocabulary: 80,000
• Deep LSTMs with 4 layers
• Training: 7.5 epochs (12M sentences; 10 days on an 8-GPU machine)

Page 25

Neural MT by Jointly Learning to Align and Translate (ICLR 2015)

• GRU RNN encoding + alignment (attention)
• GRU RNN decoding
• Vocabulary: 30,000 (source and target)
• Training: 5 days
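A minimal sketch of the attention (soft alignment) step this model adds: at each decoder step, score every encoder state against the current decoder state, normalize with a softmax, and take the weighted sum as the context vector. The additive scoring form and all weights below are illustrative assumptions in the spirit of the paper:

```python
import numpy as np

def attention_context(dec_state, enc_states, W_d, W_e, v):
    """Additive attention: e_i = v . tanh(W_d s + W_e h_i), softmaxed into weights."""
    scores = np.array([v @ np.tanh(W_d @ dec_state + W_e @ h) for h in enc_states])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # one row of the alignment matrix
    context = weights @ enc_states       # weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(0)
d_h, d_a, T = 4, 5, 6
enc_states = rng.normal(size=(T, d_h))   # encoder hidden states h_1 .. h_T
dec_state = rng.normal(size=d_h)         # current decoder state s_t
W_d, W_e = rng.normal(size=(d_a, d_h)), rng.normal(size=(d_a, d_h))
v = rng.normal(size=d_a)
context, weights = attention_context(dec_state, enc_states, W_d, W_e, v)
print(weights.round(2), context.shape)   # weights sum to 1; context is d_h-dim
```

Collecting the weight vectors over all decoder steps yields exactly the kind of alignment matrix shown on the Neural Machine Translation slide.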

Page 26

Abstractive Text Summarization (한글 및 한국어 16)

Example output: 로드킬로 숨진 친구의 곁을 지키는 길고양이의 모습이 포착되었다. ("A stray cat was captured on camera keeping watch beside its friend killed by roadkill.")

Model: RNN-search + input feeding + CopyNet

Page 27

End-to-End Korean Morphological Analysis (Winter Conference 2016)

Attention + Input-feeding + Copying mechanism

Page 28

Sequence-to-sequence Korean Phrase-Structure Parsing (한글 및 한국어 16)

Input example 1: 43/SN 국/NNG <sp> 참가/NNG

Input example 2: 4 3 <SN> 국 <NNG> <sp> 참 가 <NNG>

Output parse tree for 43/SN 국/NNG + 참가/NNG: (NP (NP 43/SN + 국/NNG) (NP 참가/NNG))

[Figure: stacked GRU encoder–decoder with attention + input-feeding: the encoder reads x1 ... xT, and at each decoder step the stacked states h1(t), h2(t) and the context vector c(t) produce y(t)]

Input: 선 생 <NNG> 님 <XSN> 의 <JKG> <sp> 이 야 기 <NNG> <sp> 끝 나 <VV> 자 <EC> <sp> 마 치 <VV> 는 <ETM> <sp> 종 <NNG> 이 <JKS> <sp> 울 리 <VV> 었 <EP> 다 <EF> . <SF>

Gold: (S (S (NP_SBJ (NP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) (S (NP_SBJ (VP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) )

RNN-search[7] (beam size 10): (S (VP (NP_OBJ (NP_MOD XX ) (NP_OBJ XX ) ) (VP XX ) ) (S (NP_SBJ (VP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) )

RNN-search + input-feeding + dropout (beam size 10): (S (S (NP_SBJ (NP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) (S (NP_SBJ (VP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) )

Page 29

Sequence-to-sequence Korean Phrase-Structure Parsing (한글 및 한국어 16): results

| Input representation | Model | F1 |
| – | Stanford parser [13] | 74.65 |
| – | Berkeley parser [13] | 78.74 |
| Morphemes + <sp> | RNN-search[7] (beam size 10) | 87.34 (baseline), 87.65* (+0.31) |
| Morpheme syllables + POS tags + <sp> | RNN-search[7] (beam size 10) | 87.69 (+0.35), 88.00* (+0.66) |
| Morpheme syllables + POS tags + <sp> | RNN-search + input-feeding (beam size 10) | 88.23 (+0.89), 88.68* (+1.34) |
| Morpheme syllables + POS tags + <sp> | RNN-search + input-feeding + dropout (beam size 10) | 88.78 (+1.44), 89.03* (+1.69) |

Page 30

Page 31

Neural Responding Machine for Short-Text Conversation (ACL 15)

Page 32

Neural Responding Machine – cont’d

Page 33

Experimental Results (ACL 15)

Page 34

Short-Text Conversation (Winter Conference 2016)

– Data: Clien "ask anything" question board (클리앙 '아무거나 질문 게시판')
– 77,346 question-response pairs
– Train : dev : test = 8 : 1 : 1

Page 35

Introduction to Image Caption Generation

• Understanding image content → automatically generating a caption that describes it
– Image recognition (understanding) + natural language processing (generation)

• Applications
– Image search

– Photo descriptions and navigation for the blind

– Early-childhood education, ...

Page 36

Prior Work

• Multimodal RNN (m-RNN) [2] (Baidu)
– CNN + vanilla RNN; CNN: VGGNet

• Neural Image Caption generator (NIC) [4] (Google)
– CNN + LSTM RNN; CNN: GoogLeNet

• Deep Visual-Semantic alignments (DeepVS) [5] (Stanford University)
– RCNN + Bi-RNN for alignment (training); CNN + vanilla RNN for generation; CNN: AlexNet

Page 37

AlexNet, VGGNet

Page 38

Image Caption Generation with RNNs (Winter Conference 2015)

• CNN + RNN

– CNN: VGGNet 15th layer (4,096 dimensions)

– RNN: GRU (a variant of the LSTM RNN)
• Hidden layer units: 500, 1000 (best)

• Multimodal layer units: 500, 1000 (best)

– Word embedding
• SENNA: 50 dimensions (best)

• Word2Vec: 300 dimensions

– Data sets
• Flickr 8K: 8,000 images × 5 captions per image

– 6,000 train / 1,000 validation / 1,000 test

• Flickr 30K: 31,783 images × 5 captions per image

– 29,000 train / 1,014 validation / 1,000 test

– Four model variants: GRU-DO1, GRU-DO2, GRU-DO3, GRU-DO4 (see the sketch below)
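A minimal sketch of the shared pipeline behind the four variants, in the style of an m-RNN multimodal layer: the current word embedding, the GRU state, and the CNN image feature are projected into a common multimodal layer, then a softmax predicts the next word. All weights, dimensions, and the exact fusion are illustrative assumptions:

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    r = sigmoid(p["W_xr"] @ x_t + p["W_hr"] @ h_prev + p["b_r"])
    z = sigmoid(p["W_xz"] @ x_t + p["W_hz"] @ h_prev + p["b_z"])
    h_tilde = np.tanh(p["W_xh"] @ x_t + p["W_hh"] @ (r * h_prev) + p["b_h"])
    return z * h_prev + (1 - z) * h_tilde

def caption_step(w_t, h_prev, img_feat, p, q):
    """Predict the next word from the current word, GRU state, and image feature."""
    e = q["W_emb"][w_t]                                   # word embedding lookup
    h = gru_step(e, h_prev, p)                            # recurrent update
    m = np.tanh(q["W_me"] @ e + q["W_mh"] @ h + q["W_mv"] @ img_feat)  # multimodal
    logits = q["W_out"] @ m
    probs = np.exp(logits - logits.max())
    return h, probs / probs.sum()                         # distribution over W(t+1)

rng = np.random.default_rng(0)
V, d_e, d_h, d_m, d_v = 20, 5, 4, 6, 8                    # toy sizes
p = {k: rng.normal(scale=0.1, size=(d_h, d_e if k.startswith("W_x") else d_h))
     for k in ["W_xr", "W_hr", "W_xz", "W_hz", "W_xh", "W_hh"]}
p.update({b: np.zeros(d_h) for b in ["b_r", "b_z", "b_h"]})
q = {"W_emb": rng.normal(scale=0.1, size=(V, d_e)),
     "W_me": rng.normal(scale=0.1, size=(d_m, d_e)),
     "W_mh": rng.normal(scale=0.1, size=(d_m, d_h)),
     "W_mv": rng.normal(scale=0.1, size=(d_m, d_v)),
     "W_out": rng.normal(scale=0.1, size=(V, d_m))}
img_feat = rng.normal(size=d_v)                           # CNN feature vector
h, probs = caption_step(3, np.zeros(d_h), img_feat, p, q)
print(probs.argmax())                                     # greedy next-word choice
```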

Page 39

[Figure: four model variants (GRU-DO1, GRU-DO2, GRU-DO3, GRU-DO4) of the same pipeline: word embedding W(t) → GRU → multimodal layer (fused with the CNN image feature) → softmax over W(t+1)]

Page 40

Image Caption Generation with RNNs (Winter Conference 2015): results

[Figure: the GRU multimodal caption-generation architecture, as above]

| Flickr 30K | B-1 | B-2 | B-3 | B-4 |
| m-RNN (Baidu) [2] | 60.0 | 41.2 | 27.8 | 18.7 |
| DeepVS (Stanford) [5] | 57.3 | 36.9 | 24.0 | 15.7 |
| NIC (Google) [4] | 66.3 | 42.3 | 27.7 | 18.3 |
| Ours-GRU-DO1 | 63.01 | 43.60 | 29.74 | 20.14 |
| Ours-GRU-DO2 | 63.24 | 44.25 | 30.45 | 20.58 |
| Ours-GRU-DO3 | 62.19 | 43.23 | 29.50 | 19.91 |
| Ours-GRU-DO4 | 63.03 | 43.94 | 30.13 | 20.21 |

| Flickr 8K | B-1 | B-2 | B-3 | B-4 |
| m-RNN (Baidu) [2] | 56.5 | 38.6 | 25.6 | 17.0 |
| DeepVS (Stanford) [5] | 57.9 | 38.3 | 24.5 | 16.0 |
| NIC (Google) [4] | 63.0 | 41.0 | 27.0 | – |
| Ours-GRU-DO1 | 63.12 | 44.27 | 29.82 | 19.34 |
| Ours-GRU-DO2 | 61.89 | 43.86 | 29.99 | 19.85 |
| Ours-GRU-DO3 | 62.63 | 44.16 | 30.03 | 19.83 |
| Ours-GRU-DO4 | 63.14 | 45.14 | 31.09 | 20.94 |

Page 41

Flickr30K Results: example captions

• A black and white dog is jumping in the grass

• A group of people in the snow

• Two men are working on a roof

Page 42

New (unseen) images

• A large clock tower in front of a building

• A man in a field throwing a frisbee

• A little boy holding a white frisbee

• A man and a woman are playing with a sheep

Page 43

Korean Image Caption Generation

• 한 어린 소녀가 풀로 덮인 들판에 서 있다 (A young girl is standing in a grass-covered field)

• 건물 앞에 서 있는 한 남자 (A man standing in front of a building)

• 분홍색 개를 데리고 있는 한 여자와 한 여자 (A woman and a woman with a pink dog)

• 구명조끼를 입은 한 작은 소녀가 웃고 있다 (A little girl wearing a life jacket is smiling)

[Figure: the same GRU multimodal caption-generation architecture applied to Korean]

Page 44

Residual Network + Korean Image Caption Generation (Winter Conference 2016)