Douglas Oard, Tamer Elsayed, Yejun Wu, Pengyi Zhang, Eileen Abels, Jimmy Lin, and Dagobert Soergel
TREC-2006 at Maryland: Blog, Enterprise and QA Tracks
QA Track: ciQA Task
Conclusion
[Figure: Title+Narr/Email, retrieved. x-axis: Topic (sorted by difference); y-axis: Diff. from Median (AP), -0.8 to 0.8]
[Figure: Title+Narr/Email, retrieved+supported. x-axis: Topic (sorted by difference); y-axis: Diff. from Median (AP), -0.8 to 0.8]
[Figure: Email Support Ratio. x-axis: Topic (sorted by email support ratio); y-axis: Ratio, 0.0 to 1.0]
[Figure: Number of Support Emails. x-axis: Topic (sorted by email support ratio); y-axis: Emails, 0 to 1200]
[Figure: Title+Narr/Email, retrieved. x-axis: Topic (sorted by email support ratio); y-axis: Diff. from Median (AP), -0.8 to 0.8]
[Figure: Title+Narr/Email, retrieved+supported. x-axis: Topic (sorted by email support ratio); y-axis: Diff. from Median (AP), -0.8 to 0.8]
[Figure: Title+Narr/Email, retrieved+supported. x-axis: Topic (sorted by email support ratio); y-axis: Diff. from Best (AP), -1.0 to 0.0; R^2 = 0.4758]
Enterprise Track: Expert Search Task
Blog Track: Opinion Retrieval Task
Retrieval Results
Supported Retrieval Results
Improved reference resolution
Parameter tuning for weighted-field credit
Learning from reply features
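Thread-based Scoring
$$\mathrm{score}(cand \mid T) = \sum_{h \in H} \sum_{d \in h} \mathrm{support}(d \mid cand, T)$$
$$\mathrm{support}(d \mid cand, T) = \mathrm{sim}(\mathrm{root}(d), T) \cdot \mathrm{assoc}(cand, d)$$, with an additional $\mathrm{level}(d, h)$ term
Email-based Scoring
$$\mathrm{score}(cand \mid T) = \sum_{d \in R_T} \mathrm{support}(d \mid cand, T)$$
$$\mathrm{support}(d \mid cand, T) = \mathrm{sim}(d, T) \cdot \mathrm{assoc}(cand, d)$$
$$\mathrm{assoc}(cand, d) = \sum_{f \in d} w_f(cand)$$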
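As a rough illustration of the email-based scoring, the sketch below sums per-email support for one candidate, taking support as sim(d, T) times assoc(cand, d). The data layout (dicts keyed by email id), the function names, and the toy values are assumptions made for the example, not the system's actual code.

```python
from typing import Dict


def support(sim_d_t: float, assoc_cand_d: float) -> float:
    """Support of one email d for a candidate: sim(d, T) * assoc(cand, d)."""
    return sim_d_t * assoc_cand_d


def score_candidate(sim: Dict[str, float], assoc: Dict[str, float]) -> float:
    """Sum support over the retrieved emails R_T.

    sim   maps email id -> sim(d, T) for each retrieved email (hypothetical layout).
    assoc maps email id -> assoc(cand, d); emails with no reference to the
          candidate are absent and contribute nothing.
    """
    return sum(support(s, assoc.get(d, 0.0)) for d, s in sim.items())


# Toy values, not taken from the W3C collection.
retrieved = {"email-1": 0.8, "email-2": 0.5, "email-3": 0.1}
candidate_assoc = {"email-1": 2.0, "email-3": 1.0}
print(score_candidate(retrieved, candidate_assoc))
```

Candidates would then be ranked by this score. A thread-based variant would iterate over threads and the emails inside each thread instead of over individually retrieved emails.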
[Diagram: expert search system. Labels: Candidate List; Models of Identity (Email Addresses, Full Names, Nicknames); Enriched Candidate Models; Topic; W3C Mailing Lists; Duplicate Removal; Email and Thread Index; Retrieval Engine; Reference Recognition; Candidate Scoring; Ranked List]
Average performance
Threads help in short queries
More email support, more accurate
Future Work
College of Information Studies / Computer Science Department / UMIACS, University of Maryland, College Park, USA
                                 Retrieval       Support
Query                 Approach   MAP    P@10     MAP    P@10
Title                 Email      0.195  0.406    0.072  0.182
Title + Narrative     Email      0.350  0.504    0.141  0.298
Title                 Thread     0.218  0.449    0.090  0.198
Title + Narrative     Thread     0.343  0.514    0.139  0.294
Title + Description   Thread     0.315  0.502    0.119  0.278
Avg. of Medians                  0.341  0.508    0.154  0.294
Removing non email-supported     0.365  0.525    0.147  0.311
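For readers unfamiliar with the two measures reported in the table, the following is a minimal sketch of how MAP and P@10 are conventionally computed from a ranked list and a set of relevant items. The function names and toy data are illustrative assumptions; official TREC scores come from the track's evaluation tooling, not from this code.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision at each rank holding a relevant item, divided by
    the total number of relevant items. Assumes no duplicates in ranked."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(topics: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP over (ranked list, relevant set) pairs, one pair per topic."""
    return sum(average_precision(r, rel) for r, rel in topics) / len(topics)


# Toy ranking of candidate ids for one topic.
ranking = ["c1", "c2", "c3", "c4"]
experts = {"c1", "c3"}
print(average_precision(ranking, experts), precision_at_k(ranking, experts, k=2))
```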
Reference Credit w_f for Email Fields
Field             Credit
Sender            2.0
Receiver          1.0
Subject           1.0
New text          tf
Quoted sender     1.0
Quoted receiver   0.5
Quoted text       tf
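One way to read the credit table is as a lookup used when summing field credits into assoc(cand, d). The sketch below assumes that "tf" means the credit equals the number of references to the candidate found in that field, and that fixed-credit fields count once however many references they hold; both readings, the field keys, and the input format are assumptions for illustration.

```python
from typing import Dict

# Fixed per-field credits from the table; None marks the "tf" fields, where
# the credit is taken to be the number of references found in that field.
FIELD_CREDIT = {
    "sender": 2.0,
    "receiver": 1.0,
    "subject": 1.0,
    "new_text": None,
    "quoted_sender": 1.0,
    "quoted_receiver": 0.5,
    "quoted_text": None,
}


def assoc(reference_counts: Dict[str, int]) -> float:
    """assoc(cand, d): sum of field credits w_f(cand) over one email.

    reference_counts maps a field name to how many references to the
    candidate were recognized there (a hypothetical input format).
    """
    total = 0.0
    for field, count in reference_counts.items():
        if count <= 0:
            continue
        credit = FIELD_CREDIT[field]
        total += float(count) if credit is None else credit
    return total


# A candidate who sent the email and is mentioned three times in its new text.
print(assoc({"sender": 1, "new_text": 3}))  # 2.0 + 3 = 5.0
```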
[Figure: Title+Narr/Email, retrieved. x-axis: Topic (sorted by email support ratio); y-axis: Diff. from Best (AP), -1.0 to 0.0; R^2 = 0.3531]
Results
How Much Email Support Over Topics?
Performance Relative to Email Support
Task Goal: locating blog posts that express an opinion about a target.
Retrieval Unit: Permalinks (postings + comments): 3,215,171 documents
Comparison at Topic Relevance
Runs           MAP     Bpref   P@10    R-Prec
ParTitDesDef   0.2849  0.3998  0.6200  0.3490
ParTiDesDmt2   0.2845  0.4040  0.6200  0.3501
ParTiDesDmt3   0.2812  0.4034  0.6200  0.3542
ParTiDef       0.2362  0.3580  0.5280  0.3162
PasTiDesDef    0.2733  0.3866  0.5800  0.3516
Comparison at Opinion Relevance
Runs           MAP     Bpref   P@10    R-Prec
ParTitDesDef   0.1882  0.2521  0.3780  0.2441
ParTiDesDmt2   0.1887  0.2573  0.3780  0.2421
ParTiDesDmt3   0.1873  0.2568  0.3780  0.2417
ParTiDef       0.1547  0.2256  0.3360  0.2106
ParrTiDesDef   0.1631  0.2274  0.3460  0.2264
Conclusions
Paragraphs better for both topic and opinion retrieval.
Title+Description queries beat title only.
Demoting non-opinionated documents had little effect.
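The demotion of non-opinionated documents mentioned in the conclusions can be pictured with a small sketch. It assumes that "demotion by 2 or 3 times" means dividing a document's retrieval score by that factor when its normalized opinion score is below 0.15, and that retrieval scores are positive; both points, along with the function and parameter names, are assumptions rather than details confirmed by the poster.

```python
from typing import Dict, List, Tuple


def demote(
    retrieval_scores: Dict[str, float],
    opinion_scores: Dict[str, float],
    factor: float = 2.0,
    threshold: float = 0.15,
) -> List[Tuple[str, float]]:
    """Re-rank documents, dividing the score of weakly opinionated ones.

    retrieval_scores: doc id -> positive retrieval score (assumed positive,
                      since dividing a negative score would promote it).
    opinion_scores:   doc id -> opinion score normalized to [0, 1];
                      missing documents are treated as 0.0.
    """
    ranked = []
    for doc_id, score in retrieval_scores.items():
        if opinion_scores.get(doc_id, 0.0) < threshold:
            score = score / factor
        ranked.append((doc_id, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Toy scores: "a" ranks first on retrieval but carries little opinion.
print(demote({"a": 1.0, "b": 0.9}, {"a": 0.05, "b": 0.5}))
```

With a factor of 2, document "a" drops below "b" in this toy case; passing factor=3.0 would give the stronger demotion.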
Future Work
Parameter tuning for: Low frequency words. Paragraph detection, passage size. Aggregation of opinion scores. Threshold of opinion scores.
[Figure: Difference in AP by topic (topics 851 to 900); regions labeled "ParTiDesDmt2 Better" and "Median Better"; y-axis -0.20 to 0.60]
[Figure: Difference in AP by topic (topics 851 to 900); regions labeled "ParTiDesDmt2 Better" and "ParTiDesDef Better"; y-axis -0.0800 to 0.0600]
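Compute Semantic Orientation of words (Turney & Littman, 2002):
$$\mathrm{PMI}(w_1, w_2) = \log_2 \left( \frac{\frac{1}{N}\,\mathrm{hits}(w_1 \ \mathrm{NEAR} \ w_2)}{\frac{1}{N}\,\mathrm{hits}(w_1) \cdot \frac{1}{N}\,\mathrm{hits}(w_2)} \right)$$
SO-PMI(w) = PMI(w, {positive paradigms}) - PMI(w, {negative paradigms})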
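The PMI and SO-PMI definitions translate directly into a few lines of code. The sketch below assumes the hit counts have already been collected (for example from an index that supports a NEAR operator) and are all positive; the function names, argument names, and toy counts are illustrative assumptions.

```python
import math


def pmi(hits_w1: float, hits_w2: float, hits_near: float, n: float) -> float:
    """PMI(w1, w2) = log2( (hits_near / n) / ((hits_w1 / n) * (hits_w2 / n)) ).

    All counts must be positive; zero counts would need smoothing,
    which this sketch leaves out.
    """
    return math.log2((hits_near / n) / ((hits_w1 / n) * (hits_w2 / n)))


def so_pmi(
    hits_w: float,
    hits_pos: float,
    hits_neg: float,
    near_pos: float,
    near_neg: float,
    n: float,
) -> float:
    """SO-PMI(w) = PMI(w, positive paradigms) - PMI(w, negative paradigms).

    hits_pos / hits_neg: hits for the positive / negative paradigm word sets.
    near_pos / near_neg: hits for w NEAR each paradigm set.
    """
    return pmi(hits_w, hits_pos, near_pos, n) - pmi(hits_w, hits_neg, near_neg, n)


# Toy counts: w co-occurs four times as often with positive paradigm words.
print(so_pmi(hits_w=1000, hits_pos=10000, hits_neg=10000,
             near_pos=80, near_neg=20, n=1_000_000))  # about 2.0
```

A positive value suggests a positive orientation and a negative value a negative one. Note that N cancels in the difference when both paradigm sets are counted over the same collection.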
[Diagram: opinion retrieval system. Recoverable labels:]
Permalink Docs; cleaning (Cleaning rules based on top 5 blog hosting sites); “Cleaned” Docs
Paragraphs; Fixed sized Passages (Window = 50 words, Overlap = 10 words)
Indri Index; query; Top 1000 paragraphs; merge; Docs; Ranked List
Topic relevance evaluation; Opinion relevance evaluation
Compute SO of words: Lemmatize; Remove: stop words & not in dictionary by “spell”; DF<=40
16773 lemmas; 8221 lemmas; (-3 ~ -0.05) negative, (0.05 ~ 5) positive
Wilson & Wiebe’s lexicon; lemmatized
Par0001 Par0002 Par0003 Par0004 Par0005; values 0.21 0.12 0.44 0.56 0.32; positions 1 2 3 4 5
Demotion by 2 or 3 times if <0.15 normalized
Other values shown: 0.2, 1.0
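The fixed-size passage setting (Window = 50 words, Overlap = 10 words) can be illustrated with a short sliding-window sketch. Whitespace tokenization and the function name are assumptions for the example; the poster does not say how words were delimited.

```python
from typing import List


def fixed_size_passages(text: str, window: int = 50, overlap: int = 10) -> List[str]:
    """Split text into passages of `window` words, where consecutive
    passages share `overlap` words (so the window advances by
    window - overlap words each step)."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    words = text.split()
    step = window - overlap
    passages = []
    for start in range(0, len(words), step):
        passages.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # the last window already reaches the end of the text
    return passages


# A 100-word toy document yields windows starting at words 0, 40 and 80.
doc = " ".join(f"w{i}" for i in range(100))
for passage in fixed_size_passages(doc):
    print(passage.split()[0], "...", passage.split()[-1])
```

The final passage may be shorter than the window when the text length is not a multiple of the step.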
BLOG06-20051224-029-0001622821.cln
Notify Blogger about objectionable content. What does this mean? Blogger Get your own blog Flag Blog Next blog rabbit + crow blog It's like anything. Tuesday, December 20, 2005 Blue Planet "...Whoa!...Wow!...WOW!...Holy shit!...WOW!!!..." (my wife and I watching the first episode of Sir David Attenborough's "The Blue Planet" tonight) posted by Neal Romanek at 9:18 PM - permalink 0 Comments: Comment? About Me My Photo Name:Neal Romanek Location:Los Angeles
The Previous Posts Archives SUBSCRIBE to the Rabbit + Crow Blog! WEEKLEY POALE In our last Weeklie Poll, we asked which was your favorite Ceratopsian. The winner, amid spiky competition, was, of course... ...STYRACOSAURUS. THIS WEEK... If you found you could no longer walk, which mode of ambulation would you instead adopt? (_) Bustling (_) Charging (_) Creeping (_) Tip-Toeing (_) All of the above in various combinations (_) None of the above. I would adopt a stony stillness. buy the new shirt questions? suggestions? [email protected] Listed on BlogShares blog search directory
Example Topic:
<num> Number: 851
<title> "March of the Penguins"
<desc> Description:
Provide opinion of the film documentary "March of the Penguins".
<narr> Narrative:
Relevant documents should include opinions concerning the film documentary "March of the Penguins". Articles or comments about penguins outside the context of this film documentary are not relevant.
Participation Goals
Building an expert search baseline system
Applying models of identity to public mailing lists
Building a reference-resolution infrastructure
Participation Goals
Results and Analysis
Conclusion
Pre   Post               Manual        Manual + AutoFiller   Automatic
Consistent judgments     427 (87.9%)   995 (90.3%)           452 (90.0%)
Y     Y                  194           224                   78
N     N                  233           771                   374
Inconsistent judgments   59 (12.1%)    107 (9.7%)            50 (10.0%)
Y     N                  37            48                    20
N     Y                  22            59                    30
Difference               -15           +11                   +10
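The percentages and the Difference row follow from the four Pre/Post counts in each column, as the short check below shows. It assumes Difference is the number of N-to-Y changes minus the number of Y-to-N changes, which matches the reported values; the function name is illustrative, and the column labels are copied as they appear in the extracted table.

```python
from typing import Tuple


def judgment_consistency(yy: int, nn: int, yn: int, ny: int) -> Tuple[float, int]:
    """Return (share of consistent judgments, net change in Y judgments).

    yy, nn: judged the same before and after (consistent).
    yn, ny: judgment flipped from Y to N or from N to Y (inconsistent).
    """
    consistent = yy + nn
    inconsistent = yn + ny
    return consistent / (consistent + inconsistent), ny - yn


# Counts in the order (Y Y, N N, Y N, N Y), taken from the table.
counts = {
    "Manual": (194, 233, 37, 22),
    "Manual + AutoFiller": (224, 771, 48, 59),
    "Automatic": (78, 374, 20, 30),
}
for name, (yy, nn, yn, ny) in counts.items():
    rate, net = judgment_consistency(yy, nn, yn, ny)
    print(f"{name}: {rate:.1%} consistent, difference {net:+d}")
```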
Type # Topics Avg. Improvement
#1 10 -0.0124 (-4.0%)
#2 12 0.0300 (8.2%)
#3 8 0.106 (44.0%)
             Relevant Sentences   Partially Relevant Sentences   Not Relevant Sentences
Nugget       74                   8                              16
Not Nugget   258                  69                             270
All          332                  77                             286
Percentage   22%                  10%                            6%
Relevance feedback does not always work for QA
The error margin of nugget judgments is ~10%
Relevant sentence ≠ answer nugget
To explore the effectiveness of single-iteration written clarification dialogs; To explore different strategies for clarifying user needs in question answering; To better understand the nature of complex, template-based questions.
Run         F-Score
UMDM1pre    0.316
UMDM1post   0.350 (+10.6%)
UMDA1pre    0.224
UMDA1post   0.180 (-19.4%)
Analysis 2: Consistency in Judgment
Analysis 3: Relevant Sentences vs. Answer Nuggets
Future Work
Examination of possible systematic errors in nugget judgments
Exploration of the relationship between relevant sentences and answer nuggets
[Diagram: QA system. Labels: Questions; Queries; Document Retrieval; Top 20 relevant documents; Answer Generation; Unordered Answers; Answer Ranking; Ordered Answers; Interaction Forms Generation; Analysis of Interaction Responses; Refined Answers]
Example Question: Topic 26. Question: What evidence is there for transport of [smuggled VCDs] from [Hong Kong] to [China]? Narrative: The analyst is particularly interested in knowing the volume of smuggled VCDs and also the ruses used by smugglers to hide their efforts.
External resources: • CIA World Fact Book • Google • WordNet • Roget’s Thesaurus • Wikipedia
Interaction Questions
Topic 026
1. What types of smuggled disks are you interested in? Check all that apply: □ VCDs □ CDs □ DVDs □ Other. Please specify: …
Importance of Answer Types
Topic 042
Please rate the importance of the following types of evidence.
1. General claim of effects of aspirin. ○ Important. ○ Somewhat important. ○ Not needed at all.
2. Guideline of how aspirin can be used to treat heart diseases. ○ Important. ○ Somewhat important. ○ Not needed at all.
…
Relevance Feedback
Topic 055
Please indicate the relevance of the following answers.
1. Most of Sierra Leone's diamonds were and still are smuggled into neighboring Liberia for sale, according to several human rights groups and diamond industry experts. ○ Relevant. ○ Somewhat relevant. ○ Not relevant.
…
Three types of interaction:
Methods
Analysis 1: Interaction Performances by Type of Interaction
[Figure: improvement of F-score (%) by topic; y-axis -50% to 300%; series: Sample relevance feedback, Importance of answer types, Clarification questions]