
Identify Experts from a Domain of Interest


Page 1: Identify Experts from a Domain of Interest

Identify Experts from a Domain of Interest

„Al. I. Cuza” University of Iași, Romania

Faculty of Computer Science

Page 2: Identify Experts from a Domain of Interest

Context
Statistics
CriES2010
Input data
System components
◦ Questions and answers pre-processing
◦ Pre-processing of interest areas
◦ Getting the list of experts
Results
Conclusions

Page 3: Identify Experts from a Domain of Interest

Yahoo! Answers: a multilingual, collaborative community service through which members can ask questions and receive answers

Page 4: Identify Experts from a Domain of Interest

Google Ad Planner traffic statistics for Y!A, December 2009:
◦ 26,000,000 unique visitors (users) (US)
◦ 110,000,000 total visits (US)

Y!A accounts for between 1.03% and 1.7% of Yahoo! traffic

At present, the identification of experts is done semi-automatically

Page 5: Identify Experts from a Domain of Interest

Automatic search for human experts in the multilingual context offered by the Yahoo! Answers network

Participants start from a collection of questions and answers and must identify the experts able to answer a new question

Page 6: Identify Experts from a Domain of Interest

[Figure: system overview. Inputs: the initial digraph, the initial Yahoo! Answers collections (en, fr, ge, sp) and the initial users' questions. Processing steps: eliminate stop words; extract domain keywords and question keywords; select relevant words for domains and for questions; compute the similarity score between questions and domains. Outputs: Run 0, Run 1, Run 2.]

Page 7: Identify Experts from a Domain of Interest

Initially we split the original XML (over 800 MB) into 204 smaller files (the largest was “Other – Internet”, ~80 MB, and the smallest was “MSN”, ~670 bytes)

Examples of categories obtained:
◦ Alergia, Alergias, Allergies
◦ Astronomy
◦ Biology
◦ Mathematics
◦ Monitors
◦ Paranormal

Page 8: Identify Experts from a Domain of Interest

For every question from a category, we process the information in the <title> and <description> tags

First we removed the stop words and punctuation marks:

<topic lang="en">
  <title>Do animals have feelings?</title>
  <description>can an animal feel regrets , compassion, sad, fear etc?</description>
  <category>Zoology</category>
  <tokens>animals, feelings, animal, feel, regrets, compassion, sad, fear</tokens>
</topic>
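
The token extraction above can be sketched roughly as follows. This is only an illustration: the slides do not say which stop-word list or tokenizer was used, so `STOP_WORDS` and `extract_tokens` are hypothetical names, and the list is deliberately tiny.

```python
import re

# Hypothetical, deliberately tiny stop-word list (the slides do not name the list used).
STOP_WORDS = {"a", "an", "the", "do", "have", "can", "etc", "is", "of", "to", "and"}


def extract_tokens(title, description, stop_words=STOP_WORDS):
    """Lowercase the text, keep only runs of letters, then drop stop words."""
    text = f"{title} {description}".lower()
    words = re.findall(r"[^\W\d_]+", text)  # letters only: punctuation and digits are dropped
    return [word for word in words if word not in stop_words]


tokens = extract_tokens(
    "Do animals have feelings?",
    "can an animal feel regrets , compassion, sad, fear etc?",
)
print(", ".join(tokens))
```

With this particular stop-word list, the call yields the same eight tokens as the <tokens> tag in the example topic.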

Page 9: Identify Experts from a Domain of Interest

For English topics we used WordNet:

<topic lang="en">
  <title>What is the origin of "foobar"?</title>
  <description>I want to know the meaning of the word and how to explain to my friends.</description>
  <category>Programming&Design</category>
  (1) <tokens>origin,foobar,meaning,word,explain,friends</tokens>
  (2) <synonyms>descent,extraction,origination,inception,significance,signification,import,substance</synonyms>
</topic>
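
A minimal sketch of the synonym expansion step, assuming a plain dictionary in place of a real WordNet lookup. `TOY_THESAURUS` and `expand_synonyms` are hypothetical, and attaching the listed synonyms to "origin" and "meaning" is an assumption, since the slide does not say which token produced which synonym.

```python
# Hypothetical stand-in for a WordNet lookup: a tiny hand-written thesaurus.
# Which token each synonym belongs to is an assumption, not stated on the slide.
TOY_THESAURUS = {
    "origin": ["descent", "extraction", "origination", "inception"],
    "meaning": ["significance", "signification", "import", "substance"],
}


def expand_synonyms(tokens, thesaurus=TOY_THESAURUS):
    """Collect synonyms for every token, skipping duplicates and the tokens themselves."""
    seen = set(tokens)
    synonyms = []
    for token in tokens:
        for candidate in thesaurus.get(token, []):
            if candidate not in seen:
                seen.add(candidate)
                synonyms.append(candidate)
    return synonyms


tokens = ["origin", "foobar", "meaning", "word", "explain", "friends"]
synonyms = expand_synonyms(tokens)
print(",".join(synonyms))
```

Tokens with no thesaurus entry, such as "foobar", simply contribute no synonyms.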

Page 10: Identify Experts from a Domain of Interest

For other languages we first used the Google Translate service and then the English WordNet:

<topic lang="fr">
  <title>ki connaitre l'histoire de l'aspirine?</title>
  <description/>
  <category>Biologie</category>
  <questioner>u8620</questioner>
  <answerer>u313460</answerer>
  (1) <tokens> connaitre, histoire, aspirine</tokens>
  (2.1) <tokens_en>know,history,aspirin</tokens_en>
  (2.2) <synonyms_en>account,chronicle,story,acetylsalicylic acid,Bayer,Empirin,St. Joseph</synonyms_en>
  (2.3) <synonyms>compte, chronique, l'histoire, l'acide acétylsalicylique, Bayer, Empirin, Saint-Joseph</synonyms>
</topic>
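
The three numbered steps (2.1) to (2.3) could be wired together as sketched below. The translation and synonym services are passed in as plain functions, so nothing here calls Google Translate or WordNet. The lookup tables are hypothetical and cover only a subset of the slide's example, and the grouping of synonyms under "history" and "aspirin" is an assumption.

```python
def expand_via_english(tokens, to_english, synonyms_of, from_english):
    """Translate tokens to English, look up English synonyms, translate them back."""
    tokens_en = [to_english(token) for token in tokens]                     # step (2.1)
    synonyms_en = [syn for token in tokens_en for syn in synonyms_of(token)]  # step (2.2)
    synonyms = [from_english(syn) for syn in synonyms_en]                   # step (2.3)
    return tokens_en, synonyms_en, synonyms


# Hypothetical lookup tables standing in for the translation and synonym services.
FR_TO_EN = {"connaitre": "know", "histoire": "history", "aspirine": "aspirin"}
EN_SYNONYMS = {
    "history": ["account", "chronicle", "story"],
    "aspirin": ["acetylsalicylic acid"],
}
EN_TO_FR = {
    "account": "compte",
    "chronicle": "chronique",
    "story": "l'histoire",
    "acetylsalicylic acid": "l'acide acétylsalicylique",
}

tokens_en, synonyms_en, synonyms = expand_via_english(
    ["connaitre", "histoire", "aspirine"],
    to_english=lambda word: FR_TO_EN.get(word, word),
    synonyms_of=lambda word: EN_SYNONYMS.get(word, []),
    from_english=lambda word: EN_TO_FR.get(word, word),
)
print(tokens_en, synonyms_en, synonyms)
```

Injecting the three services as arguments keeps the pipeline independent of any particular translation or thesaurus backend.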

Page 11: Identify Experts from a Domain of Interest

For each new question we calculate a similarity score between it and existing answered questions from the same topic

The similarity score depends on the words in common in the <tokens> and <synonyms> tags

The solution: the first 10 experts, selected in descending order of similarity score
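
One way to read the scoring and ranking described here is sketched below. The slides do not give the exact formula, so two assumptions are made: the score is the number of shared words, and each answerer is credited with the best score among the questions they answered. The function names and user ids are hypothetical.

```python
def similarity(new_terms, old_terms):
    """Assumed score: the number of distinct words the two questions share."""
    return len(set(new_terms) & set(old_terms))


def rank_experts(new_terms, answered, k=10):
    """Return up to k answerers, best score first (ties broken by user id).

    `answered` is a list of (answerer_id, terms) pairs from the same topic.
    Answerers with no shared word are left out.
    """
    best = {}
    for answerer, terms in answered:
        score = similarity(new_terms, terms)
        if score > best.get(answerer, 0):
            best[answerer] = score
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [answerer for answerer, _ in ranked[:k]]


# Hypothetical answered questions: (answerer id, tokens plus synonyms).
answered = [
    ("u1", ["know", "history", "aspirin", "account"]),
    ("u2", ["laptop", "battery"]),
    ("u3", ["aspirin", "dose"]),
    ("u1", ["aspirin"]),
]
experts = rank_experts(["history", "aspirin", "headache"], answered)
print(experts)
```

Passing only the tokens, without the synonyms, as the term lists gives the variant in which the score depends on the <tokens> tag alone.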

Page 12: Identify Experts from a Domain of Interest

Similar to Run 0:
◦ For each new question we calculate a similarity score between it and existing answered questions from the same topic
◦ The solution: the first 10 experts, selected in descending order of similarity score

Difference: the similarity score depends only on the words in common in the <tokens> tag

Page 13: Identify Experts from a Domain of Interest

In this case we used only the input digraph:

<edge source="u765155" target="u52050">
  <desc>1592994;Laptops & Notebooks</desc>
</edge>

For every topic and every person, we count the number of questions answered by that person in that topic (using the “target” attribute)
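
The counting step can be sketched with the standard library as below. Several things are assumed rather than taken from the slides: the `<graph>` root element, the reading of <desc> as "id;topic", the escaped `&amp;` needed for a well-formed XML sample, and all edges other than the first, which are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample: only the first edge comes from the slide; the <graph> root is assumed.
SAMPLE = """<graph>
  <edge source="u765155" target="u52050"><desc>1592994;Laptops &amp; Notebooks</desc></edge>
  <edge source="u1" target="u52050"><desc>2;Laptops &amp; Notebooks</desc></edge>
  <edge source="u2" target="u99"><desc>3;Laptops &amp; Notebooks</desc></edge>
  <edge source="u3" target="u99"><desc>4;Astronomy</desc></edge>
</graph>"""


def count_answers(xml_text):
    """Map each topic to a Counter of how many edges point at each target user."""
    counts = defaultdict(Counter)
    for edge in ET.fromstring(xml_text).iter("edge"):
        desc = edge.findtext("desc", default="")
        _, _, topic = desc.partition(";")  # assumed layout: "<id>;<topic>"
        counts[topic.strip()][edge.get("target")] += 1
    return counts


def top_experts(counts, topic, k=10):
    """Return the k users with the most answers in the given topic."""
    return [user for user, _ in counts[topic].most_common(k)]


counts = count_answers(SAMPLE)
print(top_experts(counts, "Laptops & Notebooks"))
```

Because this approach never looks at the question text, it needs no stop-word removal, translation or synonym lookup.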


Page 14: Identify Experts from a Domain of Interest

Run Id  Strict P@10  Strict MRR  Lenient P@10  Lenient MRR  Characteristics
0       0.52         0.80        0.82          0.94         We eliminate stop words and we consider relevant keywords and their synonyms (using Google Translate and English WordNet)
1       0.47         0.77        0.77          0.93         We eliminate stop words and we consider only relevant keywords
2       0.62         0.84        0.83          0.94         We consider only the digraph provided by Yahoo
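
For readers unfamiliar with the two measures, the sketch below uses the standard definitions of precision at 10 and mean reciprocal rank. It is not the evaluation code used for these runs, and the slides do not specify which judgments count as relevant under the strict and lenient settings. The function names and sample data are hypothetical.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top k ranked experts that are judged relevant."""
    return sum(1 for user in ranked[:k] if user in relevant) / k


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant expert, or 0.0 if none is found."""
    for position, user in enumerate(ranked, start=1):
        if user in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results):
    """Average reciprocal rank over (ranked list, relevant set) pairs, one per question."""
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in results) / len(results)


# Hypothetical rankings for two questions.
results = [(["a", "b", "c"], {"b"}), (["x", "y"], {"x"})]
p10 = precision_at_k(["a", "b", "c"], {"b"})
mrr = mean_reciprocal_rank(results)
print(p10, mrr)
```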

Page 15: Identify Experts from a Domain of Interest

Runs 2 and 0 obtained good results (expected for Run 0, unexpected for Run 2)

Execution time was a problem for our runs (a few hours)

Future work is related to multilinguality:
◦ In our approach, Allergies, Allergien, Alergias and Alergia represent different topics with different experts
◦ We are still looking for an algorithm to identify the best multilingual expert
