
improving neural network models for natural language processing in russian with synonyms

Ruslan Galinsky, Anton Alekseev and Sergey Nikolenko

AINL 2016, St. Petersburg, November 11, 2016

Steklov Institute of Mathematics at St. Petersburg

Laboratory for Internet Studies, NRU Higher School of Economics, St. Petersburg

neural networks for nlp

• Recent success on numerous tasks: from text classification to syntax parsing.

• Training ANNs and tuning their parameters is hard.

• For many tasks there is not enough data to apply advanced neural approaches.


data augmentation

• Widely applied in computer vision: shifting, cropping, and resizing images, adding noise.

• Not a denoising technique: the goal is to build a better, bigger dataset for the task.

• Can be used for different NLP tasks.

PyData 2015 London, Python for Image Understanding: Deep Learning with Convolutional Neural Nets.


data augmentation for language

• One cannot simply insert/swap/remove random words.

• Paraphrases are hard to collect and expensive.

• Modify word order?

• Augment with synonyms?

• Augment with words of 'possibly unimportant' parts of speech?


quick glance at our study

• Character-level model for Russian language data.

• Text classification task: sentiment analysis.

• Augmentation with synonyms, adjectives, and syntax-unaware word shuffling.


why try character-level models for nlp

• Traditional (arguably) approach: various word-level models for different tasks. Conceptual flaws:
  • dealing with different forms of the same lexeme,
  • inferring the meaning of unknown words,
  • word dictionaries can be large,
  • spell-checking may be required.

• Hopefully all of these can be addressed at the subword level, especially for morphology-rich languages such as Russian.

• Character-level models work with text as a sequence of symbols.


model

• Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. 2015.

• The paper suggested an NN architecture and augmentation with thesauri; the authors collected their own large datasets.

• Architecture key points:
  • input: texts as padded sequences of one-hot-encoded symbols,
  • 6 convolutional layers, 3 max-pooling layers, 3 dense layers with 2 dropout layers in between,
  • output: one-hot-encoded label.


our study

• Sentiment analysis for Russian language datasets with a char-level ConvNet.

• Check which augmentation techniques are applicable: synonyms, adjectives, shuffling.

• Test the model on data from a different domain.


augmenting with synonyms

• Method:
  1. Select all nouns and adjectives (e.g., with pymorphy).
  2. For each word:
     2.1 lemmatize the word,
     2.2 look for replacement candidates in a pairs-of-synonyms dictionary (empirical observation: the 'synonymity' relation should be reflexive),
     2.3 choose the replacement by sampling a synonym from a multinomial distribution over candidates (the distribution reflects word occurrence statistics),
     2.4 "un-lemmatize" the chosen word.
  3. Insert the replacement into the appropriate position in the text.

• Synonym sources:
  • Abramov, N. Dictionary of Russian Synonyms and Synonymous Phrases. Moscow: Russkie Slovari, 1999.
  • Alexandrova, Z. E. Dictionary of Russian Synonyms. Moscow: Russkii Yazyk, 2001.
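A minimal sketch of the replacement steps above, with toy English words and hand-made dictionaries standing in for the real resources (pymorphy for morphology, the Abramov/Alexandrova dictionaries for synonyms); the plural-"s" rule is a stand-in for proper un-lemmatization:

```python
import random

# Toy stand-ins (assumptions) for the real morphological and synonym resources.
LEMMAS = {"cars": "car"}                    # surface form -> lemma
SYNONYMS = {"car": ["auto", "automobile"]}  # lemma -> synonym candidates
FREQ = {"auto": 30, "automobile": 10}       # corpus occurrence counts

def replace_with_synonym(token, rng):
    lemma = LEMMAS.get(token, token)                       # step 2.1: lemmatize
    candidates = SYNONYMS.get(lemma)                       # step 2.2: candidates
    if not candidates:
        return token
    weights = [FREQ.get(c, 1) for c in candidates]
    synonym = rng.choices(candidates, weights=weights)[0]  # step 2.3: multinomial
    # step 2.4: "un-lemmatize" -- restore the surface form (toy rule: plural -s)
    return synonym + "s" if token.endswith("s") else synonym

def augment(tokens, rng=None):
    rng = rng or random.Random(0)
    return [replace_with_synonym(t, rng) for t in tokens]
```

Frequency weighting keeps the augmented text close to natural word usage rather than promoting rare synonyms.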


augmenting with word shuffling

• We tried a simple bag-of-words-like approach: random shuffling.
  humpty dumpty sat on the wall → wall the on dumpty sat humpty

• Proposal for future research: syntax parsing, then swapping certain parse subtrees.
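The shuffling baseline is a one-liner in spirit; a seeded sketch for reproducibility:

```python
import random

def shuffle_augment(text, seed=0):
    # Bag-of-words-style augmentation: a random permutation of the word order.
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)
```

The permutation preserves the multiset of words, so any order-insensitive signal survives while word order is destroyed.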


augmenting with adjectives

• Suggested procedure:
  1. Count all word bigrams of POS "adjective-noun" and "noun-adjective".
  2. For every noun 𝑤 that does not have an associated adjective in a given text:
     2.1 sample whether to add an adjective, based on prior statistics of any adjective's occurrence next to 𝑤,
     2.2 if an adjective should be added, sample which one from a frequency-based multinomial distribution,
     2.3 decide whether it should be added before or after the noun 𝑤, sampling from the corresponding distribution,
     2.4 insert the chosen adjective into the appropriate place in the text.
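A minimal sketch of the procedure, with toy bigram statistics (assumptions) standing in for the corpus counts of step 1. Note one simplification: keeping position-specific adjective pools folds steps 2.2 and 2.3 together, sampling the position first and then the adjective.

```python
import random

# Toy bigram statistics (assumptions) standing in for step 1's corpus counts:
# how often each adjective was seen before/after a given noun.
ADJ_BEFORE = {"wall": {"old": 8, "high": 2}}   # "adjective-noun" bigrams
ADJ_AFTER = {"wall": {"opposite": 2}}          # "noun-adjective" bigrams
NOUN_COUNT = {"wall": 24}                      # total occurrences of the noun

def maybe_add_adjective(noun, rng):
    before = ADJ_BEFORE.get(noun, {})
    after = ADJ_AFTER.get(noun, {})
    total = sum(before.values()) + sum(after.values())
    count = NOUN_COUNT.get(noun, 0)
    # step 2.1: add an adjective at all? (prior = adjective bigrams / occurrences)
    if count == 0 or rng.random() >= total / count:
        return [noun]
    # step 2.3: before or after, sampled from the corresponding distribution
    pool = before if rng.random() < sum(before.values()) / total else after
    pool = pool or before or after             # fall back if one side is empty
    # step 2.2: which adjective, from a frequency-based multinomial
    adj = rng.choices(list(pool), weights=list(pool.values()))[0]
    return [adj, noun] if pool is before else [noun, adj]
```

In the full pipeline this runs over every bare noun in the text and the chosen adjective is spliced in at the sampled position.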


evaluation: datasets

Table 1: Dataset statistics

Dataset                            Reviews
                                   Positive   Negative   Total
Basic: torg.mail.ru + Restoclub    63088      35046      98134
Augmented with adjectives          126176     70092      196268
Augmented with synonyms            125523     69849      195372
Validation dataset: TripAdvisor    26807      11075      37882


evaluation: accuracy, test set

Figure 1: A comparison of test set accuracies of all models in the study. Accuracy (0.7-1.0) is plotted against training epochs (5-50) for four models: basic dataset, reshuffled, augmented with synonyms, augmented with adjectives.


evaluation: test set vs different domain

Table 2: Experimental results

Dataset                        Best accuracy
                               Test set   TripAdvisor set
Basic dataset                  0.8457     0.7163
Basic with reshuffled words    0.8445     0.7160
Augmented with adjectives      0.7241     0.5430
Augmented with synonyms        0.8700     0.7020


summary

• Several approaches to data augmentation in the context of character-level models were suggested.

• Simple augmentation with synonyms showed significant improvements for a sentiment analysis task.

• Not every augmentation technique is beneficial; each one should be tested carefully.

• Adding new words to reviews makes sense even if they slightly violate grammatical rules.


thank you!

Thank you for your attention!
