HSE-School of linguistics at Russian Paraphrase Detection Shared Task. Anastasia Romanova, Mikhail Nefedov. Saint-Petersburg, 2016.

AINL 2016: Romanova, Nefedov

Page 1: AINL 2016: Romanova, Nefedov

HSE-School of linguistics at Russian Paraphrase Detection Shared Task

Anastasia Romanova, Mikhail Nefedov

Saint-Petersburg, 2016

Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection Shared Task

Page 2: AINL 2016: Romanova, Nefedov

Overview

1 Introduction

2 Task

3 Standard Features

4 Word Embedding Features

5 Results

6 Next steps

Page 3: AINL 2016: Romanova, Nefedov

Introduction

Higher School of Economics School of Linguistics

Page 4: AINL 2016: Romanova, Nefedov

Task

Compare two sentences
Two types of classification
Standard and Non-standard runs

Page 5: AINL 2016: Romanova, Nefedov

Standard Features

Precision

precision = word-overlap(sentence1, sentence2) / word-count(sentence1)

Recall

recall = word-overlap(sentence1, sentence2) / word-count(sentence2)
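The two ratios above can be sketched in a few lines. This is a minimal illustration, assuming that word-overlap counts shared tokens with multiplicity and that sentences are already tokenized; the function names are hypothetical and not taken from the slides.

```python
from collections import Counter


def word_overlap(sentence1, sentence2):
    """Number of tokens shared by both sentences, counted with multiplicity (assumption)."""
    shared = Counter(sentence1) & Counter(sentence2)
    return sum(shared.values())


def precision(sentence1, sentence2):
    """Overlap divided by the length of the first sentence."""
    return word_overlap(sentence1, sentence2) / len(sentence1) if sentence1 else 0.0


def recall(sentence1, sentence2):
    """Overlap divided by the length of the second sentence."""
    return word_overlap(sentence1, sentence2) / len(sentence2) if sentence2 else 0.0


s1 = "the boy smiles".split()
s2 = "the girl laughs loudly".split()
print(precision(s1, s2), recall(s1, s2))
```

Only "the" is shared here, so precision is 1/3 and recall is 1/4.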

BLEU score
Proposed by IBM (Papineni et al., 2002) for evaluating Machine Translation Systems
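A rough sketch of sentence-level BLEU used as a similarity feature follows. The slides do not say how BLEU was configured, so the single reference, the maximum n-gram order of 2, and the absence of smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped precisions times the brevity penalty (max_n is an assumption)."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing: any empty n-gram match zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


print(bleu("the boy smiles".split(), "the boy laughs".split()))
```

Because BLEU is not symmetric, a paraphrase feature would plausibly be computed in both directions; that choice is not stated in the slides.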

Page 6: AINL 2016: Romanova, Nefedov

Standard Features

SyntaxNet
Released by Google in May 2016
Models for 40 languages

Dependency parse tree

Page 7: AINL 2016: Romanova, Nefedov

Standard Features

Tree Edit Distance (Zhang, Shasha, 1989)
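To make the feature concrete, here is a small sketch of ordered tree edit distance with unit costs for insertion, deletion, and relabeling. It is not the Zhang-Shasha dynamic program itself: it is the plain memoized forest recursion that the algorithm is built on, which yields the same distance but is slower on large trees. The tuple-based tree representation and the toy dependency trees are assumptions for illustration.

```python
from functools import lru_cache


def tree(label, *children):
    """A tree is a hashable pair: (label, tuple of child trees)."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests (tuples of trees), unit costs."""
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children take its place.
        return forest_distance(f, g[:-1] + g[-1][1]) + 1
    if not g:
        # Delete the rightmost root of f; its children take its place.
        return forest_distance(f[:-1] + f[-1][1], g) + 1
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + v_kids, g) + 1
    insert = forest_distance(f, g[:-1] + w_kids) + 1
    match = (forest_distance(v_kids, w_kids)
             + forest_distance(f[:-1], g[:-1])
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


# Toy dependency trees: the verb is the root, the noun depends on it.
t1 = tree("smiles", tree("boy", tree("the")))
t2 = tree("laughs", tree("girl", tree("the")))
print(tree_edit_distance(t1, t2))
```

The two toy trees differ by two relabelings, so the distance is 2.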

Page 8: AINL 2016: Romanova, Nefedov

Standard Results

Standard run

Page 9: AINL 2016: Romanova, Nefedov

Word Embedding Features

Words as vectors

Page 10: AINL 2016: Romanova, Nefedov

Word Embedding Features

Drawbacks of the averaging approach (Kenter and de Rijke, 2015)

Vectors for words Mean vectors
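One well-known drawback of averaging, which may or may not be the one this slide has in mind, is that the mean vector ignores word order. The sketch below uses made-up two-dimensional embeddings (an assumption; real Word2Vec vectors have hundreds of dimensions) to show two sentences with opposite meanings collapsing to the same mean vector.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sentence_vector(tokens, embeddings):
    """Represent a sentence as the mean of its word vectors."""
    return mean_vector([embeddings[token] for token in tokens])


# Toy embeddings, invented for illustration only.
toy_embeddings = {
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [1.0, 1.0],
}

a = sentence_vector("dog bites man".split(), toy_embeddings)
b = sentence_vector("man bites dog".split(), toy_embeddings)
print(a == b)  # the two sentences are indistinguishable after averaging
```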

Page 11: AINL 2016: Romanova, Nefedov

Word Embedding Features

Before preprocessing

Клинтон выступила с первой речью после поражения на выборах (Clinton gave her first speech after her defeat in the election)

After preprocessing

клинтон_S выступать_V первый_A речь_S поражение_S выбор_S (lemma_POS tokens: Clinton_S speak_V first_A speech_S defeat_S election_S)
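The transformation above can be sketched as a filter over morphological analysis output. The slides do not name the analyzer or the filtering rule, so the hard-coded (lemma, POS) pairs, the PR tag for prepositions, and the set of kept parts of speech are all hypothetical stand-ins.

```python
# Hypothetical analyzer output for the example sentence: (lemma, POS tag) pairs.
analysis = [
    ("клинтон", "S"),
    ("выступать", "V"),
    ("с", "PR"),
    ("первый", "A"),
    ("речь", "S"),
    ("после", "PR"),
    ("поражение", "S"),
    ("на", "PR"),
    ("выбор", "S"),
]

# Assumption: only nouns, verbs and adjectives survive preprocessing.
CONTENT_TAGS = {"S", "V", "A"}


def to_model_tokens(analysis, keep=CONTENT_TAGS):
    """Drop function words and join each lemma with its POS tag."""
    return [f"{lemma}_{pos}" for lemma, pos in analysis if pos in keep]


tokens = to_model_tokens(analysis)
print(" ".join(tokens))
```

Tokens in the lemma_POS form are what a POS-tagged Word2Vec model would expect as lookup keys.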

Page 12: AINL 2016: Romanova, Nefedov

Word Embedding Features

BM25 + Word2Vec

sl - the longer sentence of the pair
ss - the shorter sentence of the pair
avgsl - average sentence length
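The slide keeps only the legend, not the formula. The sketch below assumes the BM25-style score from the Kenter and de Rijke (2015) paper cited earlier, as best it can be reconstructed here: each word of the longer sentence contributes its IDF times a saturated semantic match, where the match is the maximum cosine similarity to any word of the shorter sentence. The parameter values k1 = 1.2 and b = 0.75, the default IDF of 1.0, the clamping of negative similarities to zero, and the toy vectors are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_match(word, sentence, vectors):
    """Best similarity between `word` and any word of `sentence`, clamped at 0."""
    return max(0.0, max(cosine(vectors[word], vectors[other]) for other in sentence))


def bm25_word2vec(sentence1, sentence2, vectors, idf, avgsl, k1=1.2, b=0.75):
    """BM25-style sentence similarity with embedding-based term matching."""
    # sl is the longer sentence, ss the shorter one.
    sl, ss = (sentence1, sentence2) if len(sentence1) >= len(sentence2) else (sentence2, sentence1)
    score = 0.0
    for word in sl:
        sem = semantic_match(word, ss, vectors)
        length_norm = k1 * (1 - b + b * len(ss) / avgsl)
        score += idf.get(word, 1.0) * sem * (k1 + 1) / (sem + length_norm)
    return score


# Toy vectors and IDF values, invented for illustration.
toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
toy_idf = {"a": 1.0, "b": 1.0}
print(bm25_word2vec(["a", "b"], ["a", "b"], toy_vectors, toy_idf, avgsl=2.0))
```

With identical sentences of average length and unit IDF, every term contributes exactly 1, so the score equals the sentence length.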

Page 13: AINL 2016: Romanova, Nefedov

Word Embedding Features

All to all similarities

The boy smiles - The girl laughs

Similarity matrix

Page 14: AINL 2016: Romanova, Nefedov

Word Embedding Features

All to all similarities

The boy smiles - The girl laughs

Bins for all values

Bins for maximum values
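The all-to-all features can be sketched as follows: build the word-by-word cosine similarity matrix, then count how many values fall into each similarity bin, once over all cells and once over the row maxima. The bin edges and the five-dimensional toy vectors are assumptions; the slides do not give the intervals actually used.

```python
import math


def cosine(u, v):
    """Cosine similarity, clamped to [-1, 1] to absorb rounding error."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if not norm_u or not norm_v:
        return 0.0
    return max(-1.0, min(1.0, dot / (norm_u * norm_v)))


def similarity_matrix(sentence1, sentence2, vectors):
    """Rows: words of sentence1; columns: words of sentence2."""
    return [[cosine(vectors[a], vectors[b]) for b in sentence2] for a in sentence1]


def bin_counts(values, edges):
    """Histogram over half-open bins [edges[i], edges[i+1]); the last bin is closed."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= value < edges[i + 1] or (is_last and value == edges[-1]):
                counts[i] += 1
                break
    return counts


# Toy unit-length vectors, invented so that boy~girl = 0.8 and smiles~laughs = 0.6.
toy_vectors = {
    "the": [1.0, 0.0, 0.0, 0.0, 0.0],
    "boy": [0.0, 1.0, 0.0, 0.0, 0.0],
    "girl": [0.0, 0.8, 0.6, 0.0, 0.0],
    "smiles": [0.0, 0.0, 0.0, 1.0, 0.0],
    "laughs": [0.0, 0.0, 0.0, 0.6, 0.8],
}
EDGES = [-1.0, 0.0, 0.5, 0.7, 1.0]  # assumed bin boundaries

matrix = similarity_matrix("the boy smiles".split(), "the girl laughs".split(), toy_vectors)
all_values = [value for row in matrix for value in row]
max_values = [max(row) for row in matrix]

print(bin_counts(all_values, EDGES))  # bins for all values
print(bin_counts(max_values, EDGES))  # bins for maximum values
```

Both histograms have a fixed length regardless of sentence length, which is what makes them usable as classifier features.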

Page 15: AINL 2016: Romanova, Nefedov

Word Embedding Features

Per-dimension similarities

Cosine similarity

Similarity bins
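The slide does not define per-dimension similarity, so the following is only one plausible reading: normalize the two sentence vectors to unit length and take their element-wise product. Under that assumption the per-dimension values sum to the cosine similarity, which ties the three items on this slide together.

```python
import math


def unit(vector):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def per_dimension_similarity(u, v):
    """Element-wise product of the unit-normalized vectors (assumed definition)."""
    return [a * b for a, b in zip(unit(u), unit(v))]


def cosine_similarity(u, v):
    """Cosine similarity expressed as the sum of per-dimension contributions."""
    return sum(per_dimension_similarity(u, v))


# Toy sentence vectors, invented for illustration.
sentence_vector_1 = [1.0, 0.0]
sentence_vector_2 = [1.0, 1.0]
print(per_dimension_similarity(sentence_vector_1, sentence_vector_2))
print(cosine_similarity(sentence_vector_1, sentence_vector_2))
```

The per-dimension values could then be histogrammed into similarity bins in the same way as word-level similarities.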

Page 16: AINL 2016: Romanova, Nefedov

Results

Non-standard run

Page 17: AINL 2016: Romanova, Nefedov

Next steps

Find optimal intervals for bins
Create a new Word2Vec model
Test AdaGram
Compute idf on a larger corpus
Include dependency weighting into BM25

Page 18: AINL 2016: Romanova, Nefedov

Contacts I

[email protected]@gmail.com
