
AI&BigData Lab 2016. Anatoly Vostryakov: Translation from "bad" English to "good" English


Translation from "bad" English to "good" English

How to correct errors and improve text

Current methods of error correction

1. Constituency Parsing

Current methods of error correction

2. Dependency Parsing

Current methods of error correction

Mike showed Melissa how to use computer.

(“use”, “computer”) => 100,000

(“use”, “a”, “computer”) => 300,000

Mike showed Melissa how to use computer software.

3. Ngrams can be tricky (see the count-based sketch after this list)

4. Classical ML algorithms like SVM or Random Forest which are trained on several features:

- Ngrams
- Syntactic ngrams (dependency arcs)
- POS tags
- Length of words
- Count of synonyms
- etc.

Current methods of error correction
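A minimal sketch of the ngram heuristic from the example above: compare corpus counts for the phrase with and without the article and prefer the more frequent variant. The counts for the "software" phrases are invented for illustration; only the two counts shown on the slide come from the example.

    # Sketch of an ngram-count heuristic for article insertion.
    # Counts would normally come from a large corpus; the last two entries are made up.
    NGRAM_COUNTS = {
        ("use", "computer"): 100_000,
        ("use", "a", "computer"): 300_000,
        ("use", "computer", "software"): 250_000,
        ("use", "a", "computer", "software"): 1_000,
    }

    def prefer_article(verb, *rest):
        """Return True if inserting 'a' after the verb is more frequent in the corpus."""
        without_a = NGRAM_COUNTS.get((verb, *rest), 0)
        with_a = NGRAM_COUNTS.get((verb, "a", *rest), 0)
        return with_a > without_a

    print(prefer_article("use", "computer"))              # True: "use a computer"
    print(prefer_article("use", "computer", "software"))  # False: "use computer software"

The second call shows why ngrams are tricky: one extra word of context flips the decision.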

Why Deep Learning?

1. Possible to use a wider context while preserving the order of words
2. Possible to create generative models

What do we use for words in deep learning for NLP?

1. The first try: one-hot encoding
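A tiny illustration of one-hot encoding (the vocabulary and word are arbitrary): each word becomes a sparse vector of vocabulary size with a single 1 at its index.

    import numpy as np

    # Toy vocabulary; the index of each word is the position of the 1 in its one-hot vector.
    vocab = ["<pad>", "mike", "showed", "melissa", "how", "to", "use", "a", "computer"]
    word_to_idx = {w: i for i, w in enumerate(vocab)}

    def one_hot(word):
        v = np.zeros(len(vocab))
        v[word_to_idx[word]] = 1.0
        return v

    print(one_hot("computer"))  # [0. 0. 0. 0. 0. 0. 0. 0. 1.]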

Word vectors

vec(king)−vec(man)+vec(woman) = vec(queen)

Word vectors: Word2vec or GloVe
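A sketch of the analogy with pretrained vectors via gensim; the package, the dataset name and the exact score are assumptions, and any pretrained word2vec or GloVe model would do.

    # Requires: pip install gensim
    import gensim.downloader as api

    # Pretrained GloVe vectors (dataset name assumed from gensim's downloader catalogue).
    vectors = api.load("glove-wiki-gigaword-100")

    # vec(king) - vec(man) + vec(woman) ≈ vec(queen)
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
    # expected: [('queen', ...)]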

Embedding matrix:

Embedding layer

Indices of words in an embedding matrix
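The embedding layer is just a lookup: row i of the embedding matrix is the vector for word index i. A minimal numpy sketch, with arbitrary sizes and random values standing in for trained vectors:

    import numpy as np

    vocab_size, emb_dim = 10_000, 300
    rng = np.random.default_rng(0)
    embedding_matrix = rng.normal(size=(vocab_size, emb_dim)).astype("float32")

    # A sentence encoded as word indices (the indices here are arbitrary).
    word_indices = np.array([12, 7, 456, 3])

    # The "embedding layer": index rows of the embedding matrix.
    sentence_vectors = embedding_matrix[word_indices]   # shape (4, 300)
    print(sentence_vectors.shape)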

h_t = tanh(W_h h_{t-1} + W_x x_t)

RNN: vanilla RNN block
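A direct numpy transcription of the vanilla RNN update above; the dimensions are arbitrary, and the absence of a bias term follows the slide's formula.

    import numpy as np

    hidden_dim, input_dim = 128, 300
    rng = np.random.default_rng(1)
    W_h = rng.normal(scale=0.01, size=(hidden_dim, hidden_dim))
    W_x = rng.normal(scale=0.01, size=(hidden_dim, input_dim))

    def rnn_step(h_prev, x_t):
        # h_t = tanh(W_h h_{t-1} + W_x x_t)
        return np.tanh(W_h @ h_prev + W_x @ x_t)

    h = np.zeros(hidden_dim)
    for x_t in rng.normal(size=(5, input_dim)):  # five random "word vectors"
        h = rnn_step(h, x_t)
    print(h.shape)  # (128,)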

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

RNN: LSTM block

Hochreiter, Sepp and Schmidhuber, Jürgen, 1997

RNN: GRU block

New RNN blocks:
- Associative Long Short-Term Memory, 2016
- Unitary Evolution Recurrent Neural Networks, 2015

Multi-layer bidirectional LSTM

Input words W_1 … W_n are read by a forward LSTM (states f_1 … f_n) and, in reverse order, by a backward LSTM (states b_1 … b_n).

Output: y_i = [f_i; b_i]
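A sketch of the same idea in PyTorch, with arbitrary layer sizes; with bidirectional=True the output at each position is already the concatenation [f_i; b_i].

    # Requires: pip install torch
    import torch
    import torch.nn as nn

    emb_dim, hidden_dim, num_layers = 300, 256, 2
    bilstm = nn.LSTM(input_size=emb_dim, hidden_size=hidden_dim,
                     num_layers=num_layers, bidirectional=True, batch_first=True)

    words = torch.randn(1, 7, emb_dim)   # a batch of 1 sentence, 7 word vectors
    outputs, _ = bilstm(words)           # outputs[:, i, :] = [f_i; b_i]
    print(outputs.shape)                 # torch.Size([1, 7, 512]) = 2 * hidden_dim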

Words below: vectors of English words
Words above: one-hot vectors
<go>, <end>: special vectors that mark the start and the end of the output sentence

http://arxiv.org/pdf/1409.3215v3.pdf

Loss = −Σ_t log p(y_t | y_1, …, y_{t−1}, x), the negative log-likelihood (cross-entropy) of the reference output sentence

Neural machine translation or sequence-to-sequence architecture

Neural machine translation or sequence-to-sequence architecture

Encoder Decoder
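A compressed sketch of the encoder-decoder idea, not the exact model from the paper: the encoder reads the source sentence and its final state initialises the decoder, which is trained with cross-entropy against the reference output (teacher forcing). All sizes and vocabularies are arbitrary.

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab=1000, tgt_vocab=1000, emb=128, hidden=256):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, emb)
            self.tgt_emb = nn.Embedding(tgt_vocab, emb)
            self.encoder = nn.LSTM(emb, hidden, batch_first=True)
            self.decoder = nn.LSTM(emb, hidden, batch_first=True)
            self.out = nn.Linear(hidden, tgt_vocab)

        def forward(self, src_ids, tgt_ids):
            _, state = self.encoder(self.src_emb(src_ids))            # encode the source
            dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)   # teacher forcing
            return self.out(dec_out)                                  # logits per target position

    model = Seq2Seq()
    src = torch.randint(0, 1000, (2, 10))   # 2 source sentences of 10 tokens
    tgt = torch.randint(0, 1000, (2, 12))   # target tokens (in practice shifted right, with <go>)
    logits = model(src, tgt)
    loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), tgt.reshape(-1))
    print(logits.shape, loss.item())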

http://arxiv.org/pdf/1409.0473v6.pdf
http://www.aclweb.org/anthology/D15-1166

A global context vector c_t is then computed as the weighted average, according to a_t, over all the source states.

+ Attention
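A numpy sketch of the global attention step described above: dot-product scores between the current decoder state and every encoder state give the weights a_t, and c_t is the weighted average of the source states. Sizes and scoring function are illustrative choices.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    rng = np.random.default_rng(2)
    source_states = rng.normal(size=(7, 256))   # encoder hidden states, one per source token
    decoder_state = rng.normal(size=256)        # current decoder hidden state h_t

    scores = source_states @ decoder_state      # dot-product score for each source position
    a_t = softmax(scores)                       # attention weights over the source
    c_t = a_t @ source_states                   # global context vector: weighted average
    print(a_t.shape, c_t.shape)                 # (7,) (256,)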

What’s wrong with words?

Character-level error correction with attention

http://arxiv.org/pdf/1603.09727.pdf
The best system in the CoNLL-2014 Shared Task competition.

Encoder

Decoder

+ Attention

Output:

The weighted sum of the encoded hidden states, a_t, is then concatenated with d(M), and passed through another affine transform followed by a ReLU nonlinearity before the final softmax output layer.

where φ_1 and φ_2 represent feedforward affine transforms followed by a tanh nonlinearity

Sentence-level grammatical error identification as sequence-to-sequence correction

http://arxiv.org/pdf/1604.04677.pdf

Formulas for word/character level

Encoder with attention to get the context vector c_j:
Decoder:

The same architecture at the character level

CharCNN

Two separate CharCNNs for Encoder and Decoder:
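A sketch of a CharCNN word encoder, with arbitrary character vocabulary, embedding and filter sizes: embed the characters of a word, run a 1-D convolution over them, and max-pool over time to get a single vector for the word.

    import torch
    import torch.nn as nn

    char_vocab, char_emb, n_filters, width = 100, 16, 64, 3
    char_embedding = nn.Embedding(char_vocab, char_emb)
    conv = nn.Conv1d(in_channels=char_emb, out_channels=n_filters, kernel_size=width)

    word_chars = torch.randint(0, char_vocab, (1, 9))   # one word of 9 characters
    x = char_embedding(word_chars).transpose(1, 2)      # (1, char_emb, 9)
    features = torch.relu(conv(x))                      # (1, n_filters, 7)
    word_vector, _ = features.max(dim=2)                # max-over-time pooling -> (1, 64)
    print(word_vector.shape)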

Highway network

where f is ReLU; r = σ(W_r z + b_r); z is the vector from the CharCNN
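A numpy sketch of a highway layer over the CharCNN output z, following the slide's notation (r is the gate, f is ReLU); the transform weights W_h, b_h and all dimensions are assumptions for illustration.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    dim = 64
    rng = np.random.default_rng(3)
    W_r, b_r = rng.normal(scale=0.1, size=(dim, dim)), np.full(dim, -1.0)  # gate parameters
    W_h, b_h = rng.normal(scale=0.1, size=(dim, dim)), np.zeros(dim)       # transform parameters (assumed names)

    def highway(z):
        r = sigmoid(W_r @ z + b_r)             # transform gate r = σ(W_r z + b_r)
        h = np.maximum(0.0, W_h @ z + b_h)     # f = ReLU
        return r * h + (1.0 - r) * z           # mix the transformed input with the carried input

    z = rng.normal(size=dim)                   # vector from the CharCNN
    print(highway(z).shape)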

Thank you!

Design of slides: Elena Godina

My contacts: [email protected]

Anatoly Vostryakov on LinkedIn