Research Trends in Natural Language Processing with Deep Learning


2016-04-28 AI


Hiroyuki Shindo

@haplotyper (Twitter), @hshindo (GitHub)

https://github.com/hshindo/Merlin.jl

NAIST, CREST

Document, Section, Paragraph, Sentence, Dependency, Word

User Interface / Document Visualization

stress sensor

A, B

impulsivity, Empathy

Twitter, Facebook

social type, fMRI

NAIST, CREST

MWE (multiword expressions), e.g. "a number of", "not only ... but ..."

Hello

Today's news, Summary

He have a pen. (has?)

etc.

Audio, Image, Text

From: https://www.tensorflow.org/


John loves Mary .


bot

NN (neural networks)

f

Sigmoid, Tanh, Rectified Linear Unit

Convolution

Pooling

etc.
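As a concrete illustration of the first three functions in that list, here is a minimal sketch in plain Julia. The names `sigmoid` and `relu` are our own definitions, not a library API; `tanh` is the one from Julia Base.

```julia
# Element-wise functions f applied to a layer's pre-activation values.
# `sigmoid` and `relu` are illustrative names defined here, not a library API;
# `tanh` comes from Julia Base.
sigmoid(x::Real) = 1 / (1 + exp(-x))   # output in (0, 1)
relu(x::Real) = max(zero(x), x)        # Rectified Linear Unit: max(0, x)

h = [-2.0, 0.0, 3.0]                   # example pre-activation values
println(sigmoid.(h))                   # the dot broadcasts f over every element
println(tanh.(h))                      # output in (-1, 1)
println(relu.(h))
```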


RNN

...

LSTM, GRU

Long Short-Term Memory (LSTM)

Gated Recurrent Unit (GRU)
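To show how an LSTM carries state through a sequence, here is a sketch of one step of the standard LSTM cell, run over a short random input sequence. The stacked weight layout, the sizes, and the random weights are assumptions for illustration only.

```julia
# One step of a standard LSTM cell, as a sketch with random placeholder weights.
# W stacks the weights of the input, forget and output gates and the candidate
# value into one 4H x (d + H) matrix; this layout is our own choice.
sigmoid(x) = 1 / (1 + exp(-x))

function lstm_step(W, b, x, h_prev, c_prev)
    H = length(h_prev)
    z = W * vcat(x, h_prev) .+ b     # pre-activations for all four parts
    i = sigmoid.(z[1:H])             # input gate
    f = sigmoid.(z[H+1:2H])          # forget gate
    o = sigmoid.(z[2H+1:3H])         # output gate
    g = tanh.(z[3H+1:4H])            # candidate cell value
    c = f .* c_prev .+ i .* g        # keep part of the old memory, add new
    h = o .* tanh.(c)                # exposed hidden state
    return h, c
end

# Run the cell over a sequence, starting from a zero state.
function run_lstm(W, b, xs, H)
    h, c = zeros(H), zeros(H)
    for x in xs
        h, c = lstm_step(W, b, x, h, c)
    end
    return h, c
end

d, H = 3, 4                          # input and hidden sizes (illustrative)
W, b = randn(4H, d + H), zeros(4H)
h, c = run_lstm(W, b, [randn(d) for _ in 1:5], H)
```

The additive update of `c` is what lets gradients flow over many steps; a GRU merges the gates into an update and a reset gate but follows the same pattern.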


The/DT auto/N maker/N sold/V 1000/CD cars/N last/JJ year/N .

45 tags

DT: determiner (the, a, an, ...), N: noun, V: verb, CD: cardinal number, JJ: adjective

The auto maker sold ...

(1, 0, 0, 1, ..., 0, 1)

w0 = maker, w1 = sold, w-1 = auto

w-1, w0, w1, (w0, w1, w-1), w0 n-gram, w0 && n-gram, w1 && n-gram, w2 && n-gram, etc.

10^6 to 10^9 dimensions
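The feature templates above can be sketched as string features mapped to indices of a very large sparse vector. The template strings, the `"<pad>"` symbol, the growing dictionary and the weight vector are hypothetical choices for illustration, not the speaker's implementation.

```julia
# Sketch of sparse indicator features for tagging word i with templates like
# the ones above. Template strings and the "<pad>" symbol are illustrative.
function features(words::Vector{String}, i::Int)
    w(k) = 1 <= i + k <= length(words) ? words[i+k] : "<pad>"
    return ["w-1=" * w(-1), "w0=" * w(0), "w1=" * w(1),
            "w0&w1=" * w(0) * "&" * w(1)]
end

# Assign each distinct feature string a column index (a growing dictionary);
# only these few indices are non-zero in the huge indicator vector.
feature_ids!(dict::Dict{String,Int}, feats::Vector{String}) =
    [get!(dict, f, length(dict) + 1) for f in feats]

words = ["The", "auto", "maker", "sold", "1000", "cars", "last", "year", "."]
dict = Dict{String,Int}()
feats = features(words, 3)            # features for "maker"
ids = feature_ids!(dict, feats)

# A linear model scores a tag by summing the weights of the active features.
θ = randn(100)                        # hypothetical weight vector for one tag
score = sum(θ[ids])
```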

VB

w-1, w0, w1

w-1, w0, w1

10^1 to 10^2 dimensions: (1.1, -0.5, -0.1, ..., 3.7, -2.1)
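The dense alternative can be sketched as an embedding lookup for each word in the window w-1, w0, w1, followed by concatenation and one linear layer that scores every tag. The vocabulary, tag set, sizes and random weights below are hypothetical placeholders; the weights are untrained, so the predicted tag is arbitrary.

```julia
# Sketch of the dense alternative: look up an embedding for each of w-1, w0, w1,
# concatenate them, and score every tag with one linear layer. The vocabulary,
# tag set, sizes and random weights are hypothetical placeholders.
vocab = Dict("The" => 1, "auto" => 2, "maker" => 3, "sold" => 4)
tags = ["DT", "NN", "VB"]
dim = 50                                   # embedding size (10^1 to 10^2)
E = randn(dim, length(vocab))              # one embedding column per word
W = randn(length(tags), 3 * dim)           # window of 3 words -> tag scores

function tag_scores(E, W, vocab, words, i)
    window = [E[:, vocab[words[i+k]]] for k in -1:1]   # w-1, w0, w1
    return W * vcat(window...)             # one score per tag
end

words = ["The", "auto", "maker", "sold"]
s = tag_scores(E, W, vocab, words, 3)      # scores for "maker"
predicted = tags[argmax(s)]                # untrained, so arbitrary here
```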

3.2, 1.4, 5.1, ???

RNN: A B C D → X Y Z

[Figure: an RNN reads A B C D and then generates X Y Z]

Sutskever et al., Sequence to Sequence Learning with Neural Networks, arXiv, 2014

RNN: A B C D → X Y Z

[Figure: an RNN reads A B C D and then generates X Y Z]

Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate, ICLR, 2015
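One attention step of the kind introduced by Bahdanau et al. can be sketched as follows. Each encoder state (for A, B, C, D) is scored against the current decoder state, the scores are normalized with a softmax, and the weighted sum becomes the context vector. For brevity this sketch uses dot-product scoring; Bahdanau et al. use an additive (MLP) score instead. The sizes and random states are illustrative.

```julia
# Sketch of one attention step: score each encoder state against the current
# decoder state, normalize with softmax, and take the weighted sum.
# Dot-product scoring is a simplification; Bahdanau et al. use an additive score.
function softmax(v)
    e = exp.(v .- maximum(v))
    return e ./ sum(e)
end

function attend(enc::Matrix{Float64}, s::Vector{Float64})
    scores = enc' * s                 # one score per source position
    α = softmax(scores)               # attention weights, summing to 1
    context = enc * α                 # weighted average of encoder states
    return α, context
end

enc = randn(4, 4)                     # columns: encoder states for A, B, C, D
s = randn(4)                          # decoder state while emitting X
α, ctx = attend(enc, s)
```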


RNN: Rush et al., A Neural Attention Model for Sentence Summarization, EMNLP, 2015

Input: russian defense minister ivanov called sunday for the creation of a joint front for combating global terrorism

Output: russia calls for joint front against terrorism

[Figure: words "A", "cat", "sofa"; the RNN generates "A cat is ..." word by word]

RNN (with LSTM, GRU)

Softmax

~10^5

~10^5, ~10^2: Softmax
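Here is a sketch of why the Softmax is expensive. The output layer of a neural language model computes a score for every word in the vocabulary before normalizing. Assuming a vocabulary of V ~ 10^5 words and a hidden size of d ~ 10^2, computing `W * h` costs about V·d ≈ 10^7 multiply-adds per predicted word. The example uses V = 10^4 only to keep it light, and the weights are random placeholders.

```julia
# Sketch of the output layer of a neural language model: scores o = W*h for
# every word in the vocabulary, normalized with a numerically stable softmax.
# W*h costs about V*d multiply-adds; V = 10^4 here only to keep it light.
function softmax(o::Vector{Float64})
    e = exp.(o .- maximum(o))        # shifting by the max avoids overflow
    return e ./ sum(e)
end

V, d = 10_000, 100
W = randn(V, d) ./ sqrt(d)           # hypothetical output weights
h = randn(d)                         # hidden state of the RNN
p = softmax(W * h)                   # distribution over the next word
next_word = argmax(p)
```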

Softmax

Approximating the Softmax: [Morin+ 2005], [Ji+ 2016]

Other approaches: Sparsemax [Martins+ 2016], Spherical softmax [Vincent+ 2015], Self-normalization [Andreas and Klein 2015]

Softmax: Vincent et al.

Vincent et al., Efficient Exact Gradient Update for training Deep Networks with Very Large Sparse Targets, arXiv, 2014

W: a D × d matrix (D: output dimension, d: hidden dimension)
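The core identity behind Vincent et al.'s exact update can be checked numerically for the squared-error loss, the case their exact method covers. If Q = WᵀW (d × d) is kept up to date and the target y has only K non-zeros, the loss can be evaluated without forming the D-dimensional output. This sketch only verifies the loss identity. It does not implement the paper's update of W and Q. Sizes and target positions are made up.

```julia
using LinearAlgebra

# Identity for squared error with a D x d output matrix W and K-sparse target y:
#   ||W*h - y||^2 = h'*Q*h - 2*(W'*y)'*h + ||y||^2,   with Q = W'*W (d x d).
# With Q maintained, the right-hand side costs O(d^2 + K*d) instead of O(D*d).
D, d = 20_000, 32
W = randn(D, d)
Q = W' * W                           # d x d; computed once here for the check
h = randn(d)

idx = [5, 17, 12_345]                # positions of the K = 3 non-zero targets
vals = [1.0, 1.0, 1.0]

y = zeros(D); y[idx] .= vals         # dense target, only for the naive check
naive = sum(abs2, W * h .- y)        # O(D*d)

Wty = sum(vals[j] .* W[idx[j], :] for j in eachindex(idx))   # O(K*d)
cheap = dot(h, Q * h) - 2 * dot(Wty, h) + sum(abs2, vals)
```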


Softmax

Approximating the Softmax: [Morin+ 2005], [Ji+ 2016]

Other approaches: Spherical softmax [Vincent+ 2015], Self-normalization [Andreas and Klein 2015]

Lateral Network

Devlin et al., Pre-Computable Multi-Layer Neural Network Language Models, EMNLP, 2015

Lateral Network

Pre-computation

Generating Image Descriptions with an RNN

Karpathy et al., Deep Visual-Semantic Alignments for Generating Image Descriptions, CVPR, 2015

CNN, RCNN

RNN

RNN + LSTM + Attention

Softmax

[Figure: dependency tree in which "loves" is the head of "John" and "Mary"]

Chen and Manning, A Fast and Accurate Dependency Parser using Neural Networks, ACL, 2014

Shift-reduce, Shift-reduce + NN
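Shift-reduce parsing can be sketched with the arc-standard transition system, the one Chen and Manning use. In their parser a neural network chooses each action from stack and buffer features. Here the action sequence is simply given by hand and stands in for that classifier. `arc_standard` and the action names are our own.

```julia
# Sketch of arc-standard shift-reduce parsing. The hand-written action list
# stands in for the neural classifier that would pick each action.
@enum Action SHIFT LEFT RIGHT

function arc_standard(n::Int, actions::Vector{Action})
    stack = [0]                        # 0 is the ROOT token
    buffer = collect(1:n)              # word indices 1..n
    heads = fill(-1, n)                # heads[m] = head of word m
    for a in actions
        if a == SHIFT                  # move the next word onto the stack
            push!(stack, popfirst!(buffer))
        elseif a == LEFT               # top becomes head of second-from-top
            dep = stack[end-1]
            heads[dep] = stack[end]
            deleteat!(stack, length(stack) - 1)
        else                           # RIGHT: second-from-top heads the top
            dep = pop!(stack)
            heads[dep] = stack[end]
        end
    end
    return heads
end

words = ["John", "loves", "Mary"]
heads = arc_standard(length(words), [SHIFT, SHIFT, LEFT, SHIFT, RIGHT, RIGHT])
```

The result attaches "John" and "Mary" to "loves" and "loves" to ROOT. Each word is shifted once and reduced once, so parsing takes 2n actions.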

Pei et al., An Effective Neural Network Model for Graph-based Dependency Parsing, ACL, 2015

Eisner, Eisner + NN

Shift-reduce

Eisner's Algorithm

She read a short novel. (positions 0 1 2 3 4)

Initialization

Eisner's Algorithm

[0, 1, comp] + [1, 2, comp] → [0, 2, incomp]


[0, 1, comp] + [1, 2, incomp] → [0, 2, comp]
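The span combinations above can be turned into a compact dynamic program. This is a sketch of first-order Eisner decoding over a hypothetical arc-score matrix `S` (in Pei et al. the scores would come from a neural network). Complete and incomplete tables correspond to the comp and incomp items, with separate entries for left- and right-headed spans. The sketch returns only the best score (no back-pointers) and does not force ROOT to have a single child.

```julia
# Sketch of first-order Eisner's algorithm: best projective dependency tree
# score in O(n^3). S[h, m] scores the arc h -> m over positions 1..N, where
# position 1 plays the role of ROOT (position 0 in the slides).
function eisner(S::Matrix{Float64})
    N = size(S, 1)
    # [s, t, 1]: head at t (left-pointing); [s, t, 2]: head at s (right-pointing)
    C = fill(-Inf, N, N, 2)           # complete spans ("comp")
    I = fill(-Inf, N, N, 2)           # incomplete spans ("incomp")
    for s in 1:N
        C[s, s, 1] = C[s, s, 2] = 0.0 # initialization: single words
    end
    for k in 1:N-1, s in 1:N-k
        t = s + k
        # comp + comp -> incomp: join two halves and add an arc between s and t
        best = maximum(C[s, r, 2] + C[r+1, t, 1] for r in s:t-1)
        I[s, t, 1] = best + S[t, s]
        I[s, t, 2] = best + S[s, t]
        # comp + incomp -> comp: extend a span with a finished half
        C[s, t, 1] = maximum(C[s, r, 1] + I[r, t, 1] for r in s:t-1)
        C[s, t, 2] = maximum(I[s, r, 2] + C[r, t, 2] for r in s+1:t)
    end
    return C[1, N, 2]
end

# ROOT John loves Mary: reward ROOT->loves, loves->John, loves->Mary.
S = fill(-1.0, 4, 4)
S[1, 3] = S[3, 2] = S[3, 4] = 1.0
best = eisner(S)
```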


Dyer et al., Recurrent Neural Network Grammars, arXiv, 2016

LSTM + Shift-reduce

LSTM; WSJ F-score 92.4 (state-of-the-art)

Linearization: Vinyals et al., Grammar as a Foreign Language, arXiv, 2015

3-layer LSTM, 1 pt

Linearization: 1.5%, Attention

Shift-reduce

QA (question answering)

Hermann et al., Teaching Machines to Read and Comprehend, arXiv, 2015

CNN, Bi-directional LSTM


Facebook bAbI tasks: Task 1 to Task 20

100%

Weston et al., Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks, arXiv, 2015


Dynamic Memory Networks

Kumar et al., Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, arXiv, 2015


Task 17: Positional Reasoning, Task 19: Path Finding

Xiong et al., Dynamic Memory Networks for Visual and Textual Question Answering, arXiv, 2016

Dynamic Memory Networks → DMN for Visual QA

CNN

Visual QA

Andreas et al., Learning to Compose Neural Networks for Question Answering, NAACL, 2016 (Best Paper Award)



A, B: parameters; x, y: inputs (Merlin.jl)

>> x = Var()
>> y = Var()
>> A = Var(rand(8,5))
>> B = Var(rand(8,5))
>> z = A*x + B*y
>> f = Graph(z)

>> fx = f(rand(5,3), rand(5,3))
>> backward!(fx)
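To make concrete what `backward!` has to compute for z = A*x + B*y, here is a hand-written sketch in plain Julia. It makes no Merlin.jl calls. The loss `sum(z)` is an assumption chosen so that the upstream gradient is all ones. A finite-difference check confirms one entry of the gradient with respect to A.

```julia
# Hand-written reverse-mode gradients for z = A*x + B*y (no Merlin.jl calls).
# The loss L = sum(z) is an illustrative choice, so dL/dz is all ones.
A, B = rand(8, 5), rand(8, 5)
x, y = rand(5, 3), rand(5, 3)

forward(A, x, B, y) = A * x + B * y
z = forward(A, x, B, y)
dz = ones(size(z))                 # dL/dz for L = sum(z)

gA = dz * x'                       # dL/dA
gx = A' * dz                       # dL/dx
gB = dz * y'                       # dL/dB
gy = B' * dz                       # dL/dy

# Finite-difference check of one entry of dL/dA.
δ = 1e-6
A2 = copy(A); A2[2, 3] += δ
fd = (sum(forward(A2, x, B, y)) - sum(z)) / δ
```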

gemm! (BLAS), in-place
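Julia's standard library exposes in-place matrix multiplication directly. `LinearAlgebra.BLAS.gemm!` overwrites its output argument with alpha·A·B + beta·C, and `mul!` is the generic in-place alternative. Neither allocates a new result array, which matters inside a training loop. The matrix sizes here are arbitrary.

```julia
using LinearAlgebra

# In-place multiply-accumulate with BLAS gemm!: no new array is allocated.
A = rand(8, 5)
B = rand(5, 3)
C = rand(8, 3)
C0 = copy(C)

# gemm!(tA, tB, alpha, A, B, beta, C) overwrites C with alpha*A*B + beta*C
LinearAlgebra.BLAS.gemm!('N', 'N', 1.0, A, B, 1.0, C)

# Generic in-place alternative: mul!(D, A, B) overwrites D with A*B
D = similar(C)
mul!(D, A, B)
```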

Pre-computation of W

embeddings

The auto maker ...

X

concat(x1, x2) W = x1 W1 + x2 W2, where W = [W1; W2]
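The pre-computation idea can be sketched under the assumption that the first layer multiplies the concatenated embeddings of the words by W. Each embedding is a row of a fixed table E, so the products E·W1 and E·W2 can be computed once. At run time the layer then reduces to two table lookups and an addition. The sizes, word ids and random values are hypothetical.

```julia
# Sketch of pre-computation: concat(x1, x2) * W == x1 * W1 + x2 * W2 with
# W = [W1; W2]. Since x1, x2 are rows of a fixed embedding table E, the
# products E * W1 and E * W2 can be computed once and looked up later.
V, d, h = 1000, 50, 200            # vocabulary, embedding, hidden sizes (hypothetical)
E = randn(V, d)                    # one embedding per row
W1, W2 = randn(d, h), randn(d, h)
W = vcat(W1, W2)                   # (2d) x h

P1 = E * W1                        # V x h, pre-computed once
P2 = E * W2

i, j = 17, 42                      # word ids at positions 1 and 2
direct = hcat(E[i:i, :], E[j:j, :]) * W   # 1 x h, computed on the fly
fast = P1[i:i, :] + P2[j:j, :]            # 1 x h, two lookups and an add
```

The trade-off is memory: each position needs its own V × h table.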

function fib(n::Int)
    if n < 2
        1
    else
        fib(n-1) + fib(n-2)
    end
end

built-in

C, Python

Julia

https://github.com/hshindo/Merlin.jl

Julia, Deep Learning: https://github.com/hshindo/Merlin.jl

NLP: https://github.com/hshindo/Jukai.jl

Julia, 100

in getting their money back

[Figure: a word-level CNN over "in getting their money back"; the vector for "getting" is built by a character-level CNN]

CNN-based POS Tagging [Santos+ 14]

[Figure: the characters of "getting" are embedded as 10 dim. vectors, a convolution is applied over character windows, and max-pooling yields one 10 dim. word vector]
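A character-level word encoder in the spirit of [Santos+ 14] can be sketched as follows. Each character gets a 10-dim. embedding. One linear filter bank is applied to every window of k characters, and max-pooling over positions gives a fixed-size word vector regardless of word length. The alphabet, window size, zero padding and random weights are assumptions, not the paper's exact configuration.

```julia
# Sketch of a character-level CNN word encoder: embed characters, convolve
# over windows of k characters, then max-pool over positions.
# Alphabet, window size, padding and random weights are illustrative.
function char_cnn(word::String, emb::Dict{Char,Vector{Float64}},
                  F::Matrix{Float64}, k::Int)
    dim = size(F, 2) ÷ k                         # character embedding size
    pad = [zeros(dim) for _ in 1:k÷2]            # zero padding at both ends
    cols = vcat(pad, [emb[c] for c in word], pad)
    # Convolution: each window's k vectors are concatenated and multiplied by F.
    conv = [F * vcat(cols[p:p+k-1]...) for p in 1:length(cols)-k+1]
    # Max-pooling: keep the largest response of each filter over all windows.
    return vec(maximum(reduce(hcat, conv), dims=2))
end

emb = Dict(c => randn(10) for c in 'a':'z')      # 10-dim. character embeddings
k, nfilters = 3, 10                              # 10-dim. word vector output
F = randn(nfilters, 10k)
v = char_cnn("getting", emb, F, k)
```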

CNN + CNN

CPU

Method       Accuracy (%)
CNN          96.83
CNN + CNN    97.28

Training data: WSJ newswire text, 40k sentences
Test data: WSJ newswire text, 2k sentences

CPU, Julia, 2016
