Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services
Jae-Gil Lee
Department of Knowledge Service Engineering, KAIST
2/12/2014 2
Contents
• Background and Motivation
• Overview of the Methodology
• Detailed Methodology
• Experiment Evaluation
• Conclusions
This paper received the Best Paper Award at AAAI ICWSM-13
Community-Based Question Answering (CQA) Services

Current problems in CQA services
• Too many questions: it is hard to find questions to answer
• Solutions: expert finding, question routing [Zhou et al. 2009]

Search engines are weak at
• recently updated information
• personalized information
• advice & opinions [Budalakoti et al. 2010]

[Figure: the ask/answer flow in CQA services, with volumes of 160,000 and 50,000 questions per day]
Question Routing
• Graph-based: HITS, PageRank; find influential answerers
• Content-based: language modeling; match questions & answerers
• Profile-based: find experts based on user profiles
• Also, hybrid methods

Two important factors in question routing
• Expertise: answerers need proper knowledge of the question area
• Availability: answerers need time to answer
[Horowitz et al. 2010, Li et al. 2010, Zhang et al. 2007]

There is a trade-off between expertise and availability
Short Tail vs. Long Tail
• Most contributions (i.e., answers) in CQA services are made by a small number of heavy users
• Many questions will go unanswered if such heavy users become unavailable
• A system is not robust if it relies heavily on a small number of users
On the other hand, recently joined users are prone to leave CQA services
• Example: the appearances of the 9,874 answerers who wrote answers in the Computers category of KiN
• Only 8.4% of answerers remained after a year
Comparison with Traditional Question Routing

Motivating such recently joined users to become heavy users, by routing proper questions to them so that they can contribute easily, is of prime importance for the success of the services

[Figure: existing methodologies vs. our methodology]

Which users should we take care of? Recently joined expert users!
Problem Setting
Developing a methodology for measuring the likelihood of a light user becoming a contributive (i.e., heavy) user in the future in CQA services
• Input: (i) the statistics of each heavy user, (ii) the answers written by heavy users, (iii) the answers written by light users
• Output: the likelihood of each light user becoming a heavy user in the future, called Answer Affordance
Challenges
There is not sufficient information (i.e., answers) to judge the expertise of recently joined users!
• A kind of cold-start problem

How can we cope with the lack of information?
Intuition
• A person's active vocabulary reveals his/her knowledge
• Vocabulary has sharable characteristics, so domain-specific words are repeatedly used by expert answerers
• Idea: use the active vocabulary of a user to infer his/her expertise, i.e., use the vocabulary to bridge the gap between heavy users and light users
Vocabulary Level
Vocabulary knowledge
• "Vocabulary knowledge should at least comprise two dimensions, which are vocabulary breadth (or size), and depth (or quality)" [Marjorie et al. 1996]
Three dimensions of lexical competence
• "(a) partial to precise knowledge, (b) depth of knowledge, and (c) receptive to productive use ability" [Henriksen 1999]
Productive vocabulary ability
• "It implies degrees of knowledge. A learner may be reluctant to use an infrequent word, using a simpler, more frequent word of a similar meaning. Such reluctance is often a result of uncertainty about the word's usage. Lack of confidence is a reflection of imperfect knowledge. We refer to the ability to use a word at one's free will as free productive ability" [Laufer et al. 1999]
Domain Experts' Vocabulary Usage
• "Experts generated queries containing words from domain-specific lexicons fifty percent more often than non-experts. In addition to being able to generate more technically-sophisticated queries, experts also generated longer queries in terms of tokens and characters. It may be because domain experts are more familiar with the domain vocabulary." [White et al. 2009]
• "Behavior of software engineers is quite distinct from general web search behavior. They use longer and more detailed queries. They make heavy use of specialized terms and search syntax. … Controlled vocabulary look-up lists or query processing tools should be in place to deal with acronyms, product names, and other technical terms" [Freund et al. 2006]
• "When searching, experts found slightly more relevant documents. Experts issued more queries per task and longer queries, and their vocabulary overlapped somewhat more with thesaurus entries" [Zhang et al. 2005]

Domain experts use specialized, but formatted/standardized, words
Domain Experts' Vocabulary Durability
• "One important change in behavior was the use of a more specific vocabulary as students learned more about their research topic" [Vakkari et al. 2003]
• "Experts' use of domain-specific vocabulary changes only slightly over the duration of the study. However, many non-expert users exhibit an increase in their usage of domain-specific vocabulary" [White et al. 2009]

A domain expert's unique word set remains unchanged for a long time
Usage of the Vocabulary: Overview

[Figure: the shared vocabulary of words links heavy users to light users]
Basics of CQA Services
• Top-level categories (e.g., Computers, Travel)
• We define the expertise of a user on a top-level category in our methodology
• User profile: Selection Count = A, Selection Ratio = B = A/D, Recommendation Count = C
Answer Affordance
Considering both expertise and availability:

Affordance(u_l) = EstimatedExpertise(u_l) × Availability(u_l)
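As a minimal sketch, assuming the two factors are combined multiplicatively (the slide's formula is cut off, so the exact combination rule is an assumption), ranking light users by affordance might look like this:

```python
def affordance(estimated_expertise: float, availability: float) -> float:
    """Combine the two factors behind Affordance(u_l).

    A simple product is assumed here; the paper's actual
    combination rule may weight the two factors differently.
    """
    return estimated_expertise * availability

# Hypothetical (expertise, availability) scores for three light users.
light_users = {"u1": (0.9, 0.2), "u2": (0.6, 0.7), "u3": (0.3, 0.9)}
ranked = sorted(light_users,
                key=lambda u: affordance(*light_users[u]),
                reverse=True)
print(ranked)  # "u2" leads: moderate expertise but high availability
```

Note how the product penalizes a user who scores near zero on either factor, which matches the expertise/availability trade-off mentioned earlier.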
Estimated Expertise

[Figure: overview of the four steps. Step 1: compute Expertise(u) for each heavy user in U_H. Step 2: compute WordLevel(w_i) for each word w_i in the vocabulary. Step 3: propagate word levels to the words used by each light user in U_L. Step 4: compute EstimatedExpertise(u) for each light user.]
Step 1: the expertise score of a heavy user is calculated from the abundant historical data: Expertise(u_h)
• The expertise of a user becomes higher (i) as the user's answers are more concentrated on the target category and (ii) as the user has a higher selection count, selection ratio, and recommendation count
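Step 1 can be sketched as follows; the functional form is a hypothetical one that merely respects the two monotonicity properties stated above (the slide does not give the paper's actual formula):

```python
import math

def expertise(answers_in_category: int, answers_total: int,
              selection_count: int, selection_ratio: float,
              recommendation_count: int) -> float:
    """Hypothetical Expertise(u_h): higher when the user's answers
    concentrate on the target category (i) and when the selection
    count, selection ratio, and recommendation count are high (ii).
    """
    concentration = answers_in_category / max(answers_total, 1)
    quality = (math.log1p(selection_count) * selection_ratio
               + math.log1p(recommendation_count))
    return concentration * quality
```

The logarithms damp the raw counts so a few extra selections matter less for an already prolific user; that damping is a design assumption, not something the slide specifies.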
Step 2: the level of a word is determined by the expertise scores of the heavy users who used the word before: WordLevel(w_i)
• The level of a word becomes higher as the word is used by more expert users and more frequently
• Decomposing answers into words is reliable even for a small number of answers, because each answer typically contains quite a few words
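Step 2 can be sketched as an expertise-weighted count over the heavy users' answers; the exact weighting below is an assumption consistent with the two stated properties:

```python
from collections import defaultdict

def word_levels(heavy_answers, expertise_scores):
    """Hypothetical WordLevel(w): sum the expertise score of a heavy
    user over each occurrence of w in that user's answers, so a word
    scores higher when used by more expert users and more frequently.

    heavy_answers: {user_id: [tokenized answers]}
    expertise_scores: {user_id: Expertise(u_h)}
    """
    levels = defaultdict(float)
    for user, answers in heavy_answers.items():
        weight = expertise_scores[user]
        for tokens in answers:
            for word in tokens:
                levels[word] += weight
    return dict(levels)

# Toy example: "router" is used by both users, "firmware" only by
# the more expert one.
levels = word_levels(
    {"h1": [["router", "firmware"]], "h2": [["router", "hello"]]},
    {"h1": 2.0, "h2": 0.5},
)
```

In the toy example "router" accumulates weight from both users, so shared domain words end up with the highest levels.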
Step 3: these word levels are propagated to the set of words used by a light user in his/her answers
• This step is supported by the observation that the vocabulary of an expert stays mostly unchanged despite a temporal gap [White, Dumais, and Teevan 2009]
Example: sample words in the Travel category with their WordLevel(w_i) values
Step 4: the expertise score of the light user is calculated in reverse from his/her vocabulary: EstimatedExpertise(u_l)
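Steps 3 and 4 together can be sketched as aggregating the propagated word levels over the light user's vocabulary; averaging is an assumed aggregation, as the slide does not state how the levels are normalized:

```python
def estimated_expertise(light_user_words, word_level):
    """Hypothetical EstimatedExpertise(u_l): mean level of the words
    the light user has used in his/her answers. Words unseen among
    heavy users contribute a level of 0.
    """
    if not light_user_words:
        return 0.0
    total = sum(word_level.get(w, 0.0) for w in light_user_words)
    return total / len(light_user_words)
```

Averaging rather than summing keeps a verbose low-level user from outscoring a concise expert purely on vocabulary size.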
Availability
Simply measuring the number of a user's answers, with each answer's importance proportional to its recency
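A recency-weighted answer count can be sketched with exponential decay; the decay function and the 30-day half-life are assumptions, since the slide does not specify them:

```python
import math

def availability(answer_ages_days, half_life_days=30.0):
    """Hypothetical Availability(u): each answer contributes a weight
    that halves every `half_life_days`, so recent answers count more.

    answer_ages_days: age in days of each of the user's answers.
    """
    decay = math.log(2.0) / half_life_days
    return sum(math.exp(-decay * age) for age in answer_ages_days)
```

Under this scheme an answer written today contributes a full unit while a month-old answer contributes about half, so a recently active user scores higher than one with the same answer count long ago.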
Data Set
• Collected from Naver Knowledge-In (KiN): http://kin.naver.com
• Ranging from September 2002 to August 2012 (ten years)
• Including two categories: Computers and Travel
• Computers: factual information; Travel: subjective opinions
• Entropy is used for measuring the expertise of a user, working especially well for categories where factual expertise is primarily sought [Adamic et al. 2008]

Statistics      Computers    Travel
# of answers    3,926,794    585,316
# of words      191,502      232,076
# of users      228,369      44,866
Period Division
• Dividing the 10-year period into three periods
• The resource period is sufficiently long to learn the expertise of users, and so is the test period; in contrast, the training period is not
• Heavy users: those who joined during the resource period
• Light users: those who joined during the training period (only one year)
• Assuming that the end of the training period is the present
Accuracy of Expertise Prediction: Preliminary Tests
• Extracting the main interest declared by each user in the CQA service
• Measuring the ratio of such self-declared experts on the target category among the top-k light users sorted by EstimatedExpertise()

[Figure: the ratio of users who expressed their interests; (a) Computers, (b) Travel]
Accuracy of Expertise Prediction: Evaluation Method
• Finding the top-k users by EstimatedExpertise() from the training period (our prediction)
• Finding the top-k users by KiN's ranking scheme from the test period (ground truth)
• KiN's ranking scheme is a weighted sum of the selection count and the selection ratio
• Measuring (i) P@k and (ii) R-precision
• Repeating the same procedure for comparison with the following approaches:
  - Expertise(): ranking heavy users rather than light users in our methodology
  - SelCount(): the selection count
  - RecommCount(): the recommendation count
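The two evaluation measures named above are standard retrieval metrics and can be sketched directly:

```python
def precision_at_k(predicted, ground_truth, k):
    """P@k: fraction of the top-k predicted users that appear
    in the ground-truth set."""
    return sum(1 for u in predicted[:k] if u in ground_truth) / k

def r_precision(predicted, ground_truth):
    """R-precision: P@R with R equal to the ground-truth size."""
    return precision_at_k(predicted, ground_truth, len(ground_truth))
```

R-precision avoids having to fix k by hand: it equals 1.0 exactly when the top-R prediction reproduces the whole ground-truth set.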
Accuracy of Expertise Prediction: Results

[Figure: the precision performance for the Travel category]
[Figure: the precision performance for the Computers category]
Accuracy of Answer Affordance: Evaluation Method
• Finding the top-k users by Affordance() for light users (our methodology)
• Finding the top-k users managed by KiN (competitor)
• Measuring the user availability and the answer possession for the next one month
  - User availability: the ratio of the number of the top-k users who appeared on a day to the total number of users who appeared on that day
  - Answer possession: the ratio of the number of answers posted by the top-k users on a day to the total number of answers posted on that day
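The two day-level ratios defined above can be sketched as follows (the argument names are illustrative):

```python
def user_availability(top_k_users, users_active_on_day):
    """Ratio of the top-k users who appeared on the day to the
    total number of users who appeared on that day."""
    active = set(users_active_on_day)
    return len(active & set(top_k_users)) / len(active)

def answer_possession(answer_authors_on_day, top_k_users):
    """Ratio of the day's answers posted by the top-k users to all
    answers posted that day (one author id per answer)."""
    chosen = set(top_k_users)
    return (sum(1 for a in answer_authors_on_day if a in chosen)
            / len(answer_authors_on_day))
```

Both ratios are computed per day and then tracked over the one-month test window.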
[Figure: the result of the answer possession; (a) Computers, (b) Travel]
[Figure: the result of the user availability; (a) Computers, (b) Travel]
Conclusions
• Developed a new methodology that can make CQA services more active and robust
• Verified the effectiveness of our methodology using a real data set spanning ten years
• Quote from the reviews: "I'm sold. If these results hold on another CQA site, this will be a very significant contribution to online communities. The study is well done, it's incredibly readable and clear, and the evaluation dataset is impeccable (10 years of data from one of the top 3 sites)."
Thank You!