Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services
Jae-Gil Lee
Department of Knowledge Service Engineering, KAIST
2/12/2014 2
Contents
• Background and Motivation
• Overview of the Methodology
• Detailed Methodology
• Experiment Evaluation
• Conclusions
This paper received the Best Paper Award at AAAI ICWSM-13
Community-Based Question Answering (CQA) Services

Current problems in CQA services
• Too many questions: it is hard to find questions to answer
• Solutions: expert finding, question routing [Zhou et al. 2009]

Search engines are weak at
• recently updated information
• personalized information
• advice & opinions [Budalakoti et al. 2010]

[Figure: the ask/answer flow in CQA services, with volumes of 160,000 and 50,000 questions per day]
Question Routing
• Graph-based: HITS, PageRank; find influential answerers
• Content-based: language modeling; match questions & answerers
• Profile-based: find experts based on user profiles
• Also, hybrid methods

Two important factors in question routing
• Expertise: answerers need proper knowledge of the question area
• Availability: answerers need time to answer
[Horowitz et al. 2010, Li et al. 2010, Zhang et al. 2007]

There is a trade-off between expertise and availability
Short Tail vs. Long Tail
• Most contributions (i.e., answers) in CQA services are made by a small number of heavy users
• Many questions will go unanswered if such heavy users become unavailable
• A system is not robust if it relies heavily on a small number of users
On the other hand, recently joined users are prone to leave CQA services
• Example: the appearances of the 9,874 answerers who wrote answers in the Computers category of KiN
• Only 8.4% of answerers remained after a year
Comparison with Traditional Question Routing

Motivating such recently joined users to become heavy users, by routing proper questions to them so that they can contribute easily, is of prime importance for the success of the services

[Figure: existing methodologies vs. our methodology]

Which users should we take care of? Recently joined expert users!
Problem Setting
Developing a methodology for measuring the likelihood of a light user becoming a contributive (i.e., heavy) user in the future in CQA services
• Input: (i) the statistics of each heavy user, (ii) the answers written by heavy users, (iii) the answers written by light users
• Output: the likelihood of each light user becoming a heavy user in the future, called Answer Affordance
Challenges
There is not sufficient information (i.e., answers) to judge the expertise of recently joined users!
• A kind of cold-start problem

How can we cope with the lack of information?
Intuition
• A person's active vocabulary reveals his/her knowledge
• Vocabulary has sharable characteristics, so domain-specific words are repeatedly used by expert answerers
• Idea: use the active vocabulary of a user to infer his/her expertise, i.e., use the vocabulary to bridge the gap between heavy users and light users
Vocabulary Level
Vocabulary knowledge
• "Vocabulary knowledge should at least comprise two dimensions, which are vocabulary breadth (or size), and depth (or quality)" [Marjorie et al. 1996]
Three dimensions of lexical competence
• "(a) partial to precise knowledge, (b) depth of knowledge, and (c) receptive to productive use ability" [Henriksen 1999]
Productive vocabulary ability
• "It implies degrees of knowledge. A learner may be reluctant to use an infrequent word, using a simpler, more frequent word of a similar meaning. Such reluctance is often a result of uncertainty about the word's usage. Lack of confidence is a reflection of imperfect knowledge. We refer to the ability to use a word at one's free will as free productive ability" [Laufer et al. 1999]
Domain Experts' Vocabulary Usage
• "Experts generated queries containing words from domain-specific lexicons fifty percent more often than non-experts. In addition to being able to generate more technically-sophisticated queries, experts also generated longer queries in terms of tokens and characters. It may be because domain experts are more familiar with the domain vocabulary." [White et al. 2009]
• "Behavior of software engineers is quite distinct from general web search behavior. They use longer and more detailed queries. They make heavy use of specialized terms and search syntax. … Controlled vocabulary look-up lists or query processing tools should be in place to deal with acronyms, product names, and other technical terms" [Freund et al. 2006]
• "When searching, experts found slightly more relevant documents. Experts issued more queries per task and longer queries, and their vocabulary overlapped somewhat more with thesaurus entries" [Zhang et al. 2005]

Domain experts use specialized, but formatted/standardized, words
Domain Experts' Vocabulary Durability
• "One important change in behavior was the use of a more specific vocabulary as students learned more about their research topic" [Vakkari et al. 2003]
• "Experts' use of domain-specific vocabulary changes only slightly over the duration of the study. However, many non-expert users exhibit an increase in their usage of domain-specific vocabulary" [White et al. 2009]

A domain expert's unique word set remains unchanged for a long time
Usage of the Vocabulary: Overview

[Figure: the shared vocabulary of words links heavy users to light users]
Basics of CQA Services
• Top-level categories (e.g., Computers, Travel)
• We define the expertise of a user on a top-level category in our methodology
• User profile: Selection Count = A, Selection Ratio = B = A/D, Recommendation Count = C
Answer Affordance
Considering both expertise and availability:

Affordance(u_l) = EstimatedExpertise(u_l) × Availability(u_l)
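As a minimal sketch, assuming the two factors are combined multiplicatively (the slide's formula is cut off, so the exact combination rule is an assumption), ranking light users by affordance might look like this:

```python
def affordance(estimated_expertise: float, availability: float) -> float:
    """Combine the two factors behind Affordance(u_l).

    A simple product is assumed here; the paper's actual
    combination rule may weight the two factors differently.
    """
    return estimated_expertise * availability

# Hypothetical (expertise, availability) scores for three light users.
light_users = {"u1": (0.9, 0.2), "u2": (0.6, 0.7), "u3": (0.3, 0.9)}
ranked = sorted(light_users,
                key=lambda u: affordance(*light_users[u]),
                reverse=True)
print(ranked)  # "u2" leads: moderate expertise but high availability
```

Note how the product penalizes a user who scores near zero on either factor, which matches the expertise/availability trade-off mentioned earlier.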
Estimated Expertise

[Figure: overview of the four steps. Step 1: compute Expertise(u) for each heavy user in U_H. Step 2: compute WordLevel(w_i) for each word w_i in the vocabulary. Step 3: propagate word levels to the words used by each light user in U_L. Step 4: compute EstimatedExpertise(u) for each light user.]
Step 1: the expertise score of a heavy user is calculated from the abundant historical data: Expertise(u_h)
• The expertise of a user becomes higher (i) as the user's answers are more concentrated on the target category and (ii) as the user has a higher selection count, selection ratio, and recommendation count
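Step 1 can be sketched as follows; the functional form is a hypothetical one that merely respects the two monotonicity properties stated above (the slide does not give the paper's actual formula):

```python
import math

def expertise(answers_in_category: int, answers_total: int,
              selection_count: int, selection_ratio: float,
              recommendation_count: int) -> float:
    """Hypothetical Expertise(u_h): higher when the user's answers
    concentrate on the target category (i) and when the selection
    count, selection ratio, and recommendation count are high (ii).
    """
    concentration = answers_in_category / max(answers_total, 1)
    quality = (math.log1p(selection_count) * selection_ratio
               + math.log1p(recommendation_count))
    return concentration * quality
```

The logarithms damp the raw counts so a few extra selections matter less for an already prolific user; that damping is a design assumption, not something the slide specifies.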
Step 2: the level of a word is determined by the expertise scores of the heavy users who used the word before: WordLevel(w_i)
• The level of a word becomes higher as the word is used by more expert users and more frequently
• Decomposing answers into words is reliable even for a small number of answers, because each answer typically contains quite a few words
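Step 2 can be sketched as an expertise-weighted count over the heavy users' answers; the exact weighting below is an assumption consistent with the two stated properties:

```python
from collections import defaultdict

def word_levels(heavy_answers, expertise_scores):
    """Hypothetical WordLevel(w): sum the expertise score of a heavy
    user over each occurrence of w in that user's answers, so a word
    scores higher when used by more expert users and more frequently.

    heavy_answers: {user_id: [tokenized answers]}
    expertise_scores: {user_id: Expertise(u_h)}
    """
    levels = defaultdict(float)
    for user, answers in heavy_answers.items():
        weight = expertise_scores[user]
        for tokens in answers:
            for word in tokens:
                levels[word] += weight
    return dict(levels)

# Toy example: "router" is used by both users, "firmware" only by
# the more expert one.
levels = word_levels(
    {"h1": [["router", "firmware"]], "h2": [["router", "hello"]]},
    {"h1": 2.0, "h2": 0.5},
)
```

In the toy example "router" accumulates weight from both users, so shared domain words end up with the highest levels.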
Step 3: these word levels are propagated to the set of words used by a light user in his/her answers
• This step is supported by the observation that the vocabulary of an expert stays mostly unchanged despite a temporal gap [White, Dumais, and Teevan 2009]
Example: sample words in the Travel category with their WordLevel(w_i) values
Step 4: the expertise score of the light user is calculated in reverse from his/her vocabulary: EstimatedExpertise(u_l)
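Steps 3 and 4 together can be sketched as aggregating the propagated word levels over the light user's vocabulary; averaging is an assumed aggregation, as the slide does not state how the levels are normalized:

```python
def estimated_expertise(light_user_words, word_level):
    """Hypothetical EstimatedExpertise(u_l): mean level of the words
    the light user has used in his/her answers. Words unseen among
    heavy users contribute a level of 0.
    """
    if not light_user_words:
        return 0.0
    total = sum(word_level.get(w, 0.0) for w in light_user_words)
    return total / len(light_user_words)
```

Averaging rather than summing keeps a verbose low-level user from outscoring a concise expert purely on vocabulary size.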
Availability
Simply measuring the number of a user's answers, with each answer's importance proportional to its recency
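A recency-weighted answer count can be sketched with exponential decay; the decay function and the 30-day half-life are assumptions, since the slide does not specify them:

```python
import math

def availability(answer_ages_days, half_life_days=30.0):
    """Hypothetical Availability(u): each answer contributes a weight
    that halves every `half_life_days`, so recent answers count more.

    answer_ages_days: age in days of each of the user's answers.
    """
    decay = math.log(2.0) / half_life_days
    return sum(math.exp(-decay * age) for age in answer_ages_days)
```

Under this scheme an answer written today contributes a full unit while a month-old answer contributes about half, so a recently active user scores higher than one with the same answer count long ago.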
Data Set
• Collected from Naver Knowledge-In (KiN): http://kin.naver.com
• Ranging from September 2002 to August 2012 (ten years)
• Including two categories: Computers and Travel
• Computers: factual information; Travel: subjective opinions
• Entropy is used for measuring the expertise of a user, working especially well for categories where factual expertise is primarily sought [Adamic et al. 2008]

Statistics      Computers    Travel
# of answers    3,926,794    585,316
# of words      191,502      232,076
# of users      228,369      44,866
Period Division
• Dividing the 10-year period into three periods
• The resource period is sufficiently long to learn the expertise of users, and so is the test period; in contrast, the training period is not
• Heavy users: those who joined during the resource period
• Light users: those who joined during the training period (only one year)
• Assuming that the end of the training period is the present
Accuracy of Expertise Prediction: Preliminary Tests
• Extracting the main interest declared by each user in the CQA service
• Measuring the ratio of such self-declared experts on the target category among the top-k light users sorted by EstimatedExpertise()

[Figure: the ratio of users who expressed their interests; (a) Computers, (b) Travel]
Accuracy of Expertise Prediction: Evaluation Method
• Finding the top-k users by EstimatedExpertise() from the training period (our prediction)
• Finding the top-k users by KiN's ranking scheme from the test period (ground truth)
• KiN's ranking scheme is a weighted sum of the selection count and the selection ratio
• Measuring (i) P@k and (ii) R-precision
• Repeating the same procedure for comparison with the following approaches:
  - Expertise(): ranking heavy users rather than light users in our methodology
  - SelCount(): the selection count
  - RecommCount(): the recommendation count
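The two evaluation measures named above are standard retrieval metrics and can be sketched directly:

```python
def precision_at_k(predicted, ground_truth, k):
    """P@k: fraction of the top-k predicted users that appear
    in the ground-truth set."""
    return sum(1 for u in predicted[:k] if u in ground_truth) / k

def r_precision(predicted, ground_truth):
    """R-precision: P@R with R equal to the ground-truth size."""
    return precision_at_k(predicted, ground_truth, len(ground_truth))
```

R-precision avoids having to fix k by hand: it equals 1.0 exactly when the top-R prediction reproduces the whole ground-truth set.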
Accuracy of Expertise Prediction: Results

[Figure: the precision performance for the Travel category]
[Figure: the precision performance for the Computers category]
Accuracy of Answer Affordance: Evaluation Method
• Finding the top-k users by Affordance() for light users (our methodology)
• Finding the top-k users managed by KiN (competitor)
• Measuring the user availability and the answer possession for the next one month
  - User availability: the ratio of the number of the top-k users who appeared on a day to the total number of users who appeared on that day
  - Answer possession: the ratio of the number of answers posted by the top-k users on a day to the total number of answers posted on that day
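The two day-level ratios defined above can be sketched as follows (the argument names are illustrative):

```python
def user_availability(top_k_users, users_active_on_day):
    """Ratio of the top-k users who appeared on the day to the
    total number of users who appeared on that day."""
    active = set(users_active_on_day)
    return len(active & set(top_k_users)) / len(active)

def answer_possession(answer_authors_on_day, top_k_users):
    """Ratio of the day's answers posted by the top-k users to all
    answers posted that day (one author id per answer)."""
    chosen = set(top_k_users)
    return (sum(1 for a in answer_authors_on_day if a in chosen)
            / len(answer_authors_on_day))
```

Both ratios are computed per day and then tracked over the one-month test window.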
[Figure: the result of the answer possession; (a) Computers, (b) Travel]
[Figure: the result of the user availability; (a) Computers, (b) Travel]
Conclusions
• Developed a new methodology that can make CQA services more active and robust
• Verified the effectiveness of our methodology using a real data set spanning ten years
• Quote from the reviews: "I'm sold. If these results hold on another CQA site, this will be a very significant contribution to online communities. The study is well done, it's incredibly readable and clear, and the evaluation dataset is impeccable (10 years of data from one of the top 3 sites)."
Thank You!