
Clef2010 QA


Page 1: Clef2010 QA

Question Answering on Romanian, English and French Languages

„Al. I. Cuza” University of Iași, Romania

Faculty of Computer Science

Page 2: Clef2010 QA

Introduction

System components
◦ Question analysis
◦ Index creation and information retrieval
◦ Answer extraction

Results

Applications of the QA system
◦ eLearning
◦ Robotics
◦ CriES 2010

Conclusions

Page 3: Clef2010 QA

Our group has participated in the CLEF exercises since 2006:
◦ 2006 – Ro–En (English collection) – 9.47% right answers
◦ 2007 – Ro–Ro (Romanian Wikipedia) – 12%
◦ 2008 – Ro–Ro (Romanian Wikipedia) – 31%
◦ 2009 – Ro–Ro, En–En (JRC-Acquis) – 47.2% (48.6%)
◦ 2010 – Ro–Ro, En–En, Fr–Fr (JRC-Acquis, Europarl) – 47.5% (42.5%, 27%)

Page 4: Clef2010 QA

System architecture (diagram):
◦ Initial questions → Question analysis (tokenization & lemmatization; focus, keywords and named entity identification; question classification) → Lucene queries
◦ JRC-Acquis corpus and EUROPARL corpus → Lucene index
◦ Lucene queries + Lucene index → Information retrieval → Relevant snippets
◦ Relevant snippets → Answer extraction (Definition, Reason and Other answer extraction modules, supported by a Romanian grammar) → Final answers
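Question classification relies on hand-written patterns (about 40 for question types and about 30 for answer types, per the next slide), but the patterns themselves are not listed. The sketch below uses a few invented Romanian regular expressions only to illustrate an ordered, first-match-wins pattern list; `QUESTION_TYPE_PATTERNS`, `ANSWER_TYPE_PATTERNS`, `classify` and the fallback label `OTHER` are hypothetical.

```python
import re

# Hypothetical patterns; the real system uses ~40 question-type and ~30 answer-type patterns.
# Order matters: the first matching pattern wins.
QUESTION_TYPE_PATTERNS = [
    (re.compile(r"^(ce|care)\s+(este|sunt|înseamnă)\b", re.IGNORECASE), "DEFINITION"),  # "What is ..."
    (re.compile(r"^de\s+ce\b", re.IGNORECASE), "REASON-PURPOSE"),  # "Why ..."
    (re.compile(r"^cum\b", re.IGNORECASE), "PROCEDURE"),  # "How ..."
]

ANSWER_TYPE_PATTERNS = [
    (re.compile(r"\b(procent|câţi|câte|cât)\b", re.IGNORECASE), "MEASURE"),  # "what percentage", "how many"
]


def classify(question: str) -> tuple[str, str]:
    """Return (question_type, answer_type) using first-match-wins pattern lists."""
    text = question.strip()
    qtype = next((label for pat, label in QUESTION_TYPE_PATTERNS if pat.search(text)), "FACTOID")
    atype = next((label for pat, label in ANSWER_TYPE_PATTERNS if pat.search(text)), "OTHER")
    return qtype, atype


print(classify("Ce procent al populaţiei din Italia contează pe televiziune pentru a obţine informaţii"))
```

Questions that match no question-type pattern fall back to FACTOID, which is how the example question on the next slide would be labelled.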

Page 5: Clef2010 QA

Q1: What percentage of people in Italy relies on television for information?

<q q_id="0001" source_lang="EN" target_lang="RO">
  <string>Ce procent al populaţiei din Italia contează pe televiziune pentru a obţine informaţii</string>
  <focus>procent</focus>
  <verb>contează obţine</verb>
  <noun>populaţiei televiziune informaţii</noun>
  <nameEntities>Italia</nameEntities>
  <luceneQuery>procent~0.7 populaţiei~0.7 Italia^3 (contează^2 conta) televiziune~0.7 obţine informaţii~0.7</luceneQuery>
  <questionType>FACTOID</questionType>
  <answerType>MEASURE</answerType>
</q>

Question types are identified with ~40 patterns and answer types with ~30 patterns.
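The `<luceneQuery>` element combines fuzzy terms (`~0.7`), a boosted named entity (`^3`) and a boosted inflected verb alternated with its lemma (`^2`). The slide does not state the exact rules, so the sketch below uses one reading of this single example that reproduces it: nouns become fuzzy terms, named entities are boosted, inflected verbs are boosted and OR-ed with their lemma, and verbs already in lemma form stay plain. The `Token` class, its `kind` tags and `build_lucene_query` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str   # surface form as it appears in the question
    lemma: str  # lemma produced by the lemmatizer
    kind: str   # hypothetical tag: "noun", "verb", "entity" or "other"


def build_lucene_query(tokens: list[Token]) -> str:
    """Build a Lucene query string with rules inferred from a single example (assumption)."""
    parts = []
    for tok in tokens:
        if tok.kind == "entity":
            parts.append(f"{tok.form}^3")  # named entity: boost x3
        elif tok.kind == "noun":
            parts.append(f"{tok.form}~0.7")  # noun or focus word: fuzzy match
        elif tok.kind == "verb":
            if tok.lemma != tok.form:
                # inflected verb: boosted surface form OR its lemma
                parts.append(f"({tok.form}^2 {tok.lemma})")
            else:
                parts.append(tok.form)  # verb already in lemma form: plain term
        # "other" tokens (question words, stop words) are dropped
    return " ".join(parts)


EXAMPLE = [
    Token("Ce", "ce", "other"),
    Token("procent", "procent", "noun"),
    Token("al", "al", "other"),
    Token("populaţiei", "populaţie", "noun"),
    Token("din", "din", "other"),
    Token("Italia", "Italia", "entity"),
    Token("contează", "conta", "verb"),
    Token("pe", "pe", "other"),
    Token("televiziune", "televiziune", "noun"),
    Token("pentru", "pentru", "other"),
    Token("a", "a", "other"),
    Token("obţine", "obţine", "verb"),
    Token("informaţii", "informaţie", "noun"),
]
print(build_lucene_query(EXAMPLE))
```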

Page 6: Clef2010 QA

We used Lucene to create two indexes: one at paragraph level and one at document level

Using these indexes, the Lucene search engine runs the Lucene queries and returns, for every question, a ranked list of snippets that serve as candidate answers
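Lucene's own scoring (TF-IDF based by default in the versions of that period) is more elaborate than this, but the idea of ranking snippets can be shown with a plain TF-IDF score over a small in-memory paragraph collection. The sketch keeps a document id on every paragraph so the same ranking can be aggregated to document level; `rank_snippets`, `rank_documents` and the summing of paragraph scores are hypothetical choices, not the system's actual code.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def rank_snippets(query_terms, paragraphs, top_k=3):
    """Rank (doc_id, text) paragraphs by a plain TF-IDF score; a toy stand-in for Lucene.

    Returns up to top_k (score, doc_id, text) tuples with a positive score, best first.
    """
    term_counts = [Counter(tokenize(text)) for _, text in paragraphs]
    n = len(paragraphs)
    results = []
    for (doc_id, text), tf in zip(paragraphs, term_counts):
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            df = sum(1 for counts in term_counts if term in counts)  # paragraphs containing the term
            score += tf[term] * math.log((n + 1) / df)  # smoothed IDF, always > 0
        if score > 0:
            results.append((score, doc_id, text))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_k]


def rank_documents(snippets):
    """Aggregate snippet scores to document level by summing them (hypothetical choice)."""
    totals = Counter()
    for score, doc_id, _ in snippets:
        totals[doc_id] += score
    return totals.most_common()


paragraphs = [
    ("doc1", "television is the main source of information in italy"),
    ("doc1", "radio and television audiences"),
    ("doc2", "the council adopted the regulation"),
]
top = rank_snippets(["television", "italy"], paragraphs)
print(top)
print(rank_documents(top))
```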

Page 7: Clef2010 QA

Answer selection depends on the Lucene score; in addition, we built special modules to extract answers for questions of type DEFINITION, REASON-PURPOSE, PROCEDURE and OPINION

Two threshold values
◦ A higher one – the system returns many NOA (no answer) responses: RA (right answers) is affected, but c@1 is higher
◦ A lower one – the system returns only a few NOA responses: RA is higher, but c@1 is lower
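The slides give neither the threshold values nor the exact score they are compared with. The sketch below assumes one normalized confidence score per candidate answer and hypothetical thresholds, and shows the trade-off described above: raising the threshold turns more low-confidence answers into NOA responses. `select_answer`, `count_noa` and the candidate data are invented for illustration.

```python
NOA = "NOA"  # "no answer"


def select_answer(candidates, threshold):
    """Return the best-scoring candidate answer, or NOA if its score is below threshold.

    candidates: list of (answer_text, score) pairs, e.g. ranked snippets with their scores.
    """
    if not candidates:
        return NOA
    text, score = max(candidates, key=lambda c: c[1])
    return text if score >= threshold else NOA


def count_noa(questions, threshold):
    """Count how many questions would be left unanswered at the given threshold."""
    return sum(1 for cands in questions if select_answer(cands, threshold) == NOA)


# Hypothetical candidate lists for three questions.
questions = [
    [("candidate A", 0.9), ("candidate B", 0.4)],
    [("candidate C", 0.55)],
    [("candidate D", 0.3)],
]
print(count_noa(questions, threshold=0.6))  # higher threshold: more NOA responses
print(count_noa(questions, threshold=0.2))  # lower threshold: fewer NOA responses
```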

Page 8: Clef2010 QA

                      RO-RO          EN-EN          FR-FR
                    run 1 run 2    run 1 run 2    run 1 run 2
answered right        95   102       85    78       54    47
answered wrong        74    93       98    99      124   153
total answered       169   195      183   177      178   200
unanswered right       0     0        0     0        0     0
unanswered wrong       0     0        0     0        0     0
unanswered empty      31     5       17    23       22     0
total unanswered      31     5       17    23       22     0
c@1 measure         0.55  0.42     0.46  0.43     0.30  0.24
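c@1 is the evaluation measure used in the CLEF ResPubliQA evaluations. With n questions, n_R answered right and n_U left unanswered, c@1 = (n_R + n_U · n_R / n) / n: each unanswered question is credited with the system's accuracy instead of counting as an error, which is why returning NOA can raise c@1 while lowering the number of right answers. The short function below (its name is ours) reproduces the table's c@1 values for the first run of each language pair, with 200 questions per run.

```python
def c_at_1(n_right: int, n_unanswered: int, n_total: int) -> float:
    """c@1 = (n_R + n_U * n_R / n) / n."""
    accuracy = n_right / n_total
    return (n_right + n_unanswered * accuracy) / n_total


# First run of each language pair, 200 questions each (values from the table above).
for pair, right, unanswered in [("RO-RO", 95, 31), ("EN-EN", 85, 17), ("FR-FR", 54, 22)]:
    print(pair, round(c_at_1(right, unanswered, 200), 2))
```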

Page 9: Clef2010 QA

eLearning – fast answers for ~30% of the questions

Robotics – communication with a robot

CriES 2010 – identifying experts on Yahoo! Answers

Page 10: Clef2010 QA

eLearning application (screenshot, translated from Romanian):

Similar questions and their answers
◦ "Are there applications in which using design patterns would not be efficient?" – "Obviously (for example, a Hello World program)..."
◦ "What else could design patterns be applied to?" – "Finding the solution to a problem, creating an advanced programmer's language, writing documentation, discussions with colleagues at a software company."
◦ "What is the difference between a pattern and a coding expression (idiom)?" – "I haven't heard of the term 'coding idiom', but it seems to be something specific, whereas a design pattern is general..."
◦ "What is the difference between a pattern and classes?" – "A design pattern is a solution to a problem and is therefore made up of a hierarchy of classes with relationships between them."
◦ "Is a design pattern different from a pattern? Why was this name chosen?" – "A design pattern is a pattern in the field of software engineering. I don't know why this name was chosen... :)"
◦ "Do we use design patterns in the same application or in different applications?" – "In the same application."
◦ "What is a design pattern?" – "First of all: a name, a problem and a solution."

Questions (answer, priority, status)
◦ "What are design patterns used for?" – not yet answered – normal – nevoieNeaparat ("absolutely needed")
◦ "Can exception handling in Java be considered an application of the Decorator pattern?" – not yet answered – urgent – nevoieNeaparat ("absolutely needed")
◦ "Are there applications in which using design patterns would not be efficient?" – "Obviously (for example, a Hello World program)..." – normal – doarAsa ("just curious")
◦ "What else could design patterns be applied to?" – "Finding the solution to a problem, creating an advanced programmer's language, writing documentation, discussions with colleagues at a software company." – normal – saAfluMulte ("to learn more")
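The slides do not describe how similar questions are found. One simple way to give a fast answer, assumed here purely for illustration, is to compare the content words of a new question with those of questions already answered (Jaccard similarity) and reuse the stored answer when the overlap passes a threshold. The stop-word list, the 0.5 threshold and the function names are hypothetical.

```python
import re

STOP_WORDS = {"ce", "este", "un", "o", "si", "de", "la", "in", "a", "al"}  # tiny hypothetical list


def content_words(text: str) -> set[str]:
    """Lowercase word set without stop words."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_similar(question, answered, threshold=0.5):
    """Return (stored_question, answer, score) for the most similar answered question, or None."""
    words = content_words(question)
    best = max(
        ((q, ans, jaccard(words, content_words(q))) for q, ans in answered.items()),
        key=lambda item: item[2],
        default=None,
    )
    return best if best is not None and best[2] >= threshold else None


answered = {
    "Ce este un design pattern?": "In primul rand: un nume, o problema si o solutie",
    "Care este diferenta dintre pattern si clase?": (
        "Un design pattern este o solutie la o problema si prin urmare este compusa "
        "dintr-o ierarhie de clase intre care avem relatii."
    ),
}
print(find_similar("Ce este, de fapt, un design pattern?", answered))
```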

Page 11: Clef2010 QA

We extend the robot's knowledge base with Swoogle: the ontologies it returns are converted to AIML format and saved in the robot's memory
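The conversion rules are not shown on the slide. As a hedged illustration, the sketch below assumes the ontology facts returned by Swoogle have already been reduced to (subject, relation, object) triples and turns each one into an AIML `category` whose `pattern` is a hypothetical "WHAT IS ..." question and whose `template` states the fact. Only the standard AIML `aiml`, `category`, `pattern` and `template` elements are used; `triples_to_aiml` and the question form are our own.

```python
import xml.etree.ElementTree as ET


def triples_to_aiml(triples):
    """Convert (subject, relation, object) triples into an AIML document string.

    The "WHAT IS <SUBJECT>" question form is a hypothetical mapping chosen for illustration.
    """
    root = ET.Element("aiml", version="1.0.1")
    for subject, relation, obj in triples:
        category = ET.SubElement(root, "category")
        # AIML patterns are conventionally upper case and without punctuation.
        ET.SubElement(category, "pattern").text = f"WHAT IS {subject.upper()}"
        ET.SubElement(category, "template").text = f"{subject} {relation} {obj}."
    return ET.tostring(root, encoding="unicode")


print(triples_to_aiml([("Decorator", "is a", "structural design pattern")]))
```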

Page 12: Clef2010 QA

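CriES 2010 system (diagram):
◦ Inputs: initial digraph; initial Yahoo! Answers collections (en, fr, ge, sp); initial users' questions
◦ Yahoo! Answers collections → eliminate stop words → domain keywords → relevant words for domains
◦ Users' questions → eliminate stop words → question keywords → relevant words for questions
◦ Relevant words for questions + relevant words for domains → similarity score between questions and domains → Run 0, Run 1, Run 2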
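The diagram does not specify the similarity function. A minimal sketch, assuming bag-of-words keyword vectors and cosine similarity (a common choice, not necessarily the one used in the submitted runs), removes stop words, scores a question against each domain's keywords and ranks the domains. The stop-word list, domain texts and function names are hypothetical.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"how", "do", "i", "my", "the", "a", "an", "of", "to", "is", "and"}  # hypothetical


def keywords(text: str) -> Counter:
    """Eliminate stop words and count the remaining keywords."""
    return Counter(w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_domains(question: str, domains: dict[str, str]):
    """Return (domain, score) pairs sorted by similarity to the question, best first."""
    q = keywords(question)
    scores = [(name, cosine(q, keywords(text))) for name, text in domains.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


domains = {
    "cars": "car engine repair oil tyres engine",
    "cooking": "recipe oven bake sugar flour",
}
print(rank_domains("How do I fix my car engine?", domains))
```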

Page 13: Clef2010 QA

The UAIC QA system has improved over time (from 9% right answers in 2006 to 47.5% in 2010)

The main problem is the quality and quantity of the Romanian resources available

We are currently concerned with using QA components in other applications in order to improve their capabilities
