Douglas Oard, Tamer Elsayed, Yejun Wu, Pengyi Zhang, Eileen Abels, Jimmy Lin, and Dagobert Soergel

TREC-2006 at Maryland: Blog, Enterprise and QA Tracks

QA Track: ciQA Task

Conclusion

[Figure: Title+Narr/Email, retrieved. Diff. from Median (AP) per topic, y-axis -0.8 to 0.8; x-axis: Topic (sorted by difference).]

[Figure: Title+Narr/Email, retrieved+supported. Diff. from Median (AP) per topic, y-axis -0.8 to 0.8; x-axis: Topic (sorted by difference).]

[Figure: Email Support Ratio. Ratio per topic, y-axis 0.0 to 1.0; x-axis: Topic (sorted by email support ratio).]

[Figure: Number of Support Emails. Emails per topic, y-axis 0 to 1200; two accompanying panels show Diff. from Median (AP), y-axis -0.8 to 0.8.]

[Figure: scatter plot with fitted trend, R² = 0.4758; y-axis: Diff. from Best (AP), -1.0 to 0.0.]

Enterprise Track: Expert Search Task

Blog Track: Opinion Retrieval Task

Retrieval Results

Supported Retrieval Results

• Improved reference resolution
• Parameter tuning for weighted-field credit
• Learning from reply features

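Thread-based Scoring

\[ \mathrm{score}(cand \mid T) = \sum_{h \in H} \sum_{d \in h} \mathrm{support}(d, cand \mid T) \]

\[ \mathrm{support}(d, cand \mid T) = \frac{\mathrm{sim}(\mathrm{root}(d), T) \cdot \mathrm{assoc}(cand, d)}{\mathrm{level}(d, h)} \]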

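Email-based Scoring

\[ \mathrm{score}(cand \mid T) = \sum_{d \in R_T} \mathrm{support}(d, cand \mid T) \]

\[ \mathrm{support}(d, cand \mid T) = \mathrm{sim}(d, T) \cdot \mathrm{assoc}(cand, d) \]

\[ \mathrm{assoc}(cand, d) = \sum_{f \in d} w_f(cand) \]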

[Figure: expert search system architecture. Labels: W3C Mailing Lists; Duplicate Removal; Email and Thread Index; Topic; Retrieval Engine; Candidate List; Models of Identity; Email Addresses, Full Names, Nicknames; Enriched Candidate Models; Reference Recognition; Candidate Scoring; Ranked List.]

• Average performance
• Threads help in short queries
• More email support, more accurate

Future Work

College of Information Studies / Computer Science Department / UMIACS, University of Maryland, College Park, USA

                                Retrieval       Support
Query                 Approach  MAP     P@10    MAP     P@10
Title                 Email     0.195   0.406   0.072   0.182
Title + Narrative     Email     0.350   0.504   0.141   0.298
Title                 Thread    0.218   0.449   0.090   0.198
Title + Narrative     Thread    0.343   0.514   0.139   0.294
Title + Description   Thread    0.315   0.502   0.119   0.278
Avg. of Medians                 0.341   0.508   0.154   0.294
Removing non email-supported    0.365   0.525   0.147   0.311
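For readers unfamiliar with the two measures in the table, the sketch below shows the standard definitions of average precision (averaged over topics to give MAP) and precision at 10 for a ranked list of candidates. The function names and the toy data are illustrative assumptions, not the evaluation code used for the track, and the "supported" variants (which also require supporting documents) are not modeled.

```python
def average_precision(ranked, relevant):
    """AP for one topic: precision at each rank that holds a relevant item,
    summed and divided by the total number of relevant items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked items that are relevant."""
    relevant = set(relevant)
    return sum(1 for item in ranked[:k] if item in relevant) / k


def mean_average_precision(topics):
    """MAP: mean of per-topic AP; topics is a list of (ranked, relevant) pairs."""
    aps = [average_precision(ranked, relevant) for ranked, relevant in topics]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Hypothetical candidate ranking for one topic.
    ranked = ["cand-a", "cand-b", "cand-c", "cand-d"]
    relevant = {"cand-a", "cand-c", "cand-x"}
    print(average_precision(ranked, relevant))
    print(precision_at_k(ranked, relevant, k=2))
```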

Reference Credit W_f for Email Fields

Field             W_f
Sender            2.0
Receiver          1.0
Subject           1.0
New text          tf
Quoted sender     1.0
Quoted receiver   0.5
Quoted text       tf
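A minimal sketch of how these field credits could feed the email-based score, assuming that assoc(cand, d) sums the credit W_f over the fields of email d that reference the candidate, that "tf" means the number of references found in that text, and that score(cand | T) sums sim(d, T) · assoc(cand, d) over the retrieved emails. The field keys, the input layout, and the similarity values are hypothetical; the retrieval engine that would produce sim(d, T) is not modeled.

```python
from collections import defaultdict

# Fixed credits from the table above.
FIELD_CREDIT = {
    "sender": 2.0,
    "receiver": 1.0,
    "subject": 1.0,
    "quoted_sender": 1.0,
    "quoted_receiver": 0.5,
}
# Fields whose credit is "tf": assumed here to be the reference count itself.
TF_FIELDS = {"new_text", "quoted_text"}


def assoc(references):
    """Association of one candidate with one email.

    references maps a field name to the number of references to the
    candidate recognized in that field (hypothetical input format).
    """
    total = 0.0
    for field, count in references.items():
        if count <= 0:
            continue
        total += count if field in TF_FIELDS else FIELD_CREDIT[field]
    return total


def score_candidates(retrieved):
    """retrieved: list of (sim, {candidate: references}) for the emails
    returned for a topic; returns {candidate: score}."""
    scores = defaultdict(float)
    for sim, candidates in retrieved:
        for cand, references in candidates.items():
            scores[cand] += sim * assoc(references)
    return dict(scores)


if __name__ == "__main__":
    retrieved = [
        (0.5, {"alice": {"sender": 1, "new_text": 3}}),
        (0.2, {"alice": {"receiver": 1}, "bob": {"quoted_receiver": 1}}),
    ]
    for cand, score in sorted(score_candidates(retrieved).items(),
                              key=lambda kv: kv[1], reverse=True):
        print(cand, round(score, 3))
```

A thread-based variant would sum over the emails of each retrieved thread instead of over individual retrieved emails.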


[Figure: scatter plot with fitted trend, R² = 0.3531; y-axis: Diff. from Best (AP), -1.0 to 0.0.]

Results

How Much Email Support Over Topics?

Performance Relative to Email Support

Task Goal: locating blog posts that express an opinion about a target.

Retrieval Unit: Permalinks (postings + comments): 3,215,171 documents

Comparison at Topic Relevance

Runs           MAP      Bpref    P@10     R-Prec
ParTitDesDef   0.2849   0.3998   0.6200   0.3490
ParTiDesDmt2   0.2845   0.4040   0.6200   0.3501
ParTiDesDmt3   0.2812   0.4034   0.6200   0.3542
ParTiDef       0.2362   0.3580   0.5280   0.3162
PasTiDesDef    0.2733   0.3866   0.5800   0.3516

Comparison at Opinion Relevance

Runs           MAP      Bpref    P@10     R-Prec
ParTitDesDef   0.1882   0.2521   0.3780   0.2441
ParTiDesDmt2   0.1887   0.2573   0.3780   0.2421
ParTiDesDmt3   0.1873   0.2568   0.3780   0.2417
ParTiDef       0.1547   0.2256   0.3360   0.2106
ParrTiDesDef   0.1631   0.2274   0.3460   0.2264

Conclusions
• Paragraphs better for both topic and opinion retrieval.
• Title+Description queries beat title only.
• Demoting non-opinionated documents had little effect.

Future Work
Parameter tuning for:
• Low frequency words.
• Paragraph detection, passage size.
• Aggregation of opinion scores.
• Threshold of opinion scores.

[Figure: per-topic Difference in AP between ParTiDesDmt2 and the median, topics 851 to 900; y-axis -0.20 to 0.60; directions labeled "ParTiDesDmt2 Better" and "Median Better".]

[Figure: per-topic Difference in AP between ParTiDesDmt2 and ParTiDesDef, topics 851 to 900; y-axis -0.0800 to 0.0600; directions labeled "ParTiDesDmt2 Better" and "ParTiDesDef Better".]

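Compute Semantic Orientation of words (Turney & Littman, 2002):

\[ \mathrm{PMI}(w_1, w_2) = \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(w_1\ \mathrm{NEAR}\ w_2)}{\frac{1}{N}\,\mathrm{hits}(w_1) \cdot \frac{1}{N}\,\mathrm{hits}(w_2)} \]

\[ \text{SO-PMI}(w) = \mathrm{PMI}(w, \{\text{positive paradigms}\}) - \mathrm{PMI}(w, \{\text{negative paradigms}\}) \]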
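The semantic orientation computation can be sketched as follows: PMI is the base-2 log of the normalized NEAR co-occurrence count divided by the product of the normalized individual counts, and SO-PMI is the difference between the PMI with the positive paradigm set and the PMI with the negative one. All hit counts and the corpus size N below are made-up numbers, and treating each paradigm set as a single query with its own hit count is an assumption of this sketch. Zero counts are rejected rather than smoothed.

```python
import math


def pmi(hits_near, hits_w1, hits_w2, n):
    """PMI(w1, w2) = log2( (hits(w1 NEAR w2)/N) / ((hits(w1)/N) * (hits(w2)/N)) )."""
    if min(hits_near, hits_w1, hits_w2, n) <= 0:
        # No smoothing in this sketch: zero counts would make the log undefined.
        raise ValueError("all counts must be positive")
    return math.log2((hits_near / n) / ((hits_w1 / n) * (hits_w2 / n)))


def so_pmi(hits_w, hits_pos, hits_neg, near_pos, near_neg, n):
    """SO-PMI(w) = PMI(w, positive paradigms) - PMI(w, negative paradigms)."""
    return pmi(near_pos, hits_w, hits_pos, n) - pmi(near_neg, hits_w, hits_neg, n)


if __name__ == "__main__":
    # Hypothetical counts: the word co-occurs four times as often with the
    # positive paradigm set as with the negative one.
    value = so_pmi(hits_w=1000, hits_pos=2000, hits_neg=2000,
                   near_pos=40, near_neg=10, n=1_000_000)
    print(round(value, 3))  # positive value suggests positive orientation
```

Note that N and hits(w) cancel in the difference, so in this form SO-PMI depends only on the two NEAR counts and the two paradigm-set counts.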

[Figure: opinion retrieval pipeline. Recoverable labels:
• Permalink Docs; cleaning (cleaning rules based on top 5 blog hosting sites); "Cleaned" Docs
• Paragraphs; Fixed-sized Passages (Window = 50 words, Overlap = 10 words)
• Indri Index; query; Top 1000 paragraphs; merge; Docs; Ranked List
• Topic relevance evaluation; Opinion relevance evaluation
• Compute SO of words; Wilson & Wiebe's lexicon; lemmatized; 16773 lemmas; 8221 lemmas; (-3 ~ -0.05) negative, (0.05 ~ 5) positive; 0.2; 1.0
• Lemmatize; remove: stop words and words not in dictionary by "spell"; DF<=40
• Paragraph IDs Par0001 to Par0005; example scores 0.21, 0.12, 0.44, 0.56, 0.32
• Demotion by 2 or 3 times if <0.15 normalized]
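The fixed-size passage setting in the pipeline (Window = 50 words, Overlap = 10 words) can be illustrated with a small sliding-window sketch. The whitespace tokenization, the function name, and the handling of the final short passage are assumptions made for illustration; the poster does not specify them.

```python
def fixed_passages(words, window=50, overlap=10):
    """Split a list of words into passages of `window` words, where each
    passage shares its first `overlap` words with the end of the previous one."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap
    passages = []
    start = 0
    while start < len(words):
        passages.append(words[start:start + window])
        if start + window >= len(words):
            break  # the last passage already reaches the end of the document
        start += step
    return passages


if __name__ == "__main__":
    # Hypothetical 100-word document; real input would be a cleaned permalink.
    doc = " ".join(f"w{i}" for i in range(100)).split()
    for passage in fixed_passages(doc):
        print(len(passage), passage[0], passage[-1])
```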

BLOG06-20051224-029-0001622821.cln
Notify Blogger about objectionable content. What does this mean? Blogger Get your own blog Flag Blog Next blog rabbit + crow blog It's like anything. Tuesday, December 20, 2005 Blue Planet "...Whoa!...Wow!...WOW!...Holy shit!...WOW!!!..." (my wife and I watching the first episode of Sir David Attenborough's "The Blue Planet" tonight) posted by Neal Romanek at 9:18 PM - permalink 0 Comments: Comment? About Me My Photo Name:Neal Romanek Location:Los Angeles

The Previous Posts Archives SUBSCRIBE to the Rabbit + Crow Blog! WEEKLEY POALE In our last Weeklie Poll, we asked which was your favorite Ceratopsian. The winner, amid spiky competition, was, of course... ...STYRACOSAURUS. THIS WEEK... If you found you could no longer walk, which mode of ambulation would you instead adopt? (_) Bustling (_) Charging (_) Creeping (_) Tip-Toeing (_) All of the above in various combinations (_) None of the above. I would adopt a stony stillness. buy the new shirt questions? suggestions? [email protected] Listed on BlogShares blog search directory

Example Topic:

<num> Number: 851

<title> "March of the Penguins"

<desc> Description:

Provide opinion of the film documentary "March of the Penguins".

<narr> Narrative:

Relevant documents should include opinions concerning the film documentary "March of the Penguins". Articles or comments about penguins outside the context of this film documentary are not relevant.

Participation Goals
• Building an expert search baseline system
• Applying models of identity to public mailing lists
• Building a reference-resolution infrastructure

Participation Goals

Results and Analysis

Conclusion

Pre   Post                Manual        Manual + AutoFiller   Automatic
Consistent judgments      427 (87.9%)   995 (90.3%)           452 (90.0%)
Y     Y                   194           224                   78
N     N                   233           771                   374
Inconsistent judgments    59 (12.1%)    107 (9.7%)            50 (10.0%)
Y     N                   37            48                    20
N     Y                   22            59                    30
Difference                -15           +11                   +10
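The percentages and the Difference row can be reproduced from the four cell counts of each column. The sketch below assumes that Y/N denote the nugget judgment before (Pre) and after (Post) interaction, and that Difference is the number of N-to-Y changes minus the number of Y-to-N changes; that reading is inferred from the numbers and is not stated on the poster. The function and key names are illustrative.

```python
def consistency_summary(yy, nn, yn, ny):
    """Summarize pre/post judgment pairs for one run.

    yy, nn: pairs judged the same both times (Y/Y and N/N)
    yn, ny: pairs that changed (Y then N, N then Y)
    """
    total = yy + nn + yn + ny
    consistent = yy + nn
    inconsistent = yn + ny
    return {
        "consistent": consistent,
        "consistent_pct": round(100 * consistent / total, 1),
        "inconsistent": inconsistent,
        "inconsistent_pct": round(100 * inconsistent / total, 1),
        "difference": ny - yn,  # assumed: gains (N->Y) minus losses (Y->N)
    }


if __name__ == "__main__":
    columns = {
        "Manual": (194, 233, 37, 22),
        "Manual + AutoFiller": (224, 771, 48, 59),
        "Automatic": (78, 374, 20, 30),
    }
    for name, counts in columns.items():
        print(name, consistency_summary(*counts))
```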

Type   # Topics   Avg. Improvement
#1     10         -0.0124 (-4.0%)
#2     12         0.0300 (8.2%)
#3     8          0.106 (44.0%)

             Relevant    Partially Relevant   Not Relevant
             Sentences   Sentences            Sentences
Nugget       74          8                    16
Not Nugget   258         69                   270
All          332         77                   286
Percentage   22%         10%                  6%
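The Percentage row appears to be the share of sentences in each relevance class that were also judged to be nuggets (Nugget divided by All). A short check under that assumption, with illustrative names:

```python
def nugget_percentage(nugget_count, all_count):
    """Share of sentences in a relevance class that are answer nuggets, in percent."""
    return 100 * nugget_count / all_count


if __name__ == "__main__":
    classes = {
        "Relevant": (74, 332),
        "Partially relevant": (8, 77),
        "Not relevant": (16, 286),
    }
    for name, (nuggets, total) in classes.items():
        print(f"{name}: {nugget_percentage(nuggets, total):.0f}%")
```

Even among sentences judged relevant, fewer than a quarter are nuggets, which is the basis for the conclusion that a relevant sentence is not the same thing as an answer nugget.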

• Relevance feedback does not always work for QA
• The error margin of nugget judgments is ~10%
• Relevant sentence ≠ answer nugget

• To explore the effectiveness of single-iteration written clarification dialogs;
• To explore different strategies for clarifying user needs in question answering;
• To better understand the nature of complex, template-based questions.

Run         F-Score
UMDM1pre    0.316
UMDM1post   0.350 (+10.6%)
UMDA1pre    0.224
UMDA1post   0.180 (-19.4%)

Analysis 2: Consistency in Judgment

Analysis 3: Relevant Sentences vs. Answer Nuggets

Future Work
• Examination of possible systematic errors in nugget judgments
• Exploration of the relationship between relevant sentences and answer nuggets

[Figure: ciQA system pipeline. Labels: Questions; Queries; Document Retrieval; Top 20 relevant documents; Answer Generation; Unordered Answers; Answer Ranking; Ordered Answers; Interaction Forms Generation; Analysis of Interaction Responses; Refined Answers.]

Example Question: Topic 26.
Question: What evidence is there for transport of [smuggled VCDs] from [Hong Kong] to [China]?
Narrative: The analyst is particularly interested in knowing the volume of smuggled VCDs and also the ruses used by smugglers to hide their efforts.

External resources:
• CIA World Fact Book
• Google
• WordNet
• Roget’s Thesaurus
• Wikipedia

Interaction Questions
Topic 026
1. What types of smuggled disks are you interested in? Check all that apply: □ VCDs □ CDs □ DVDs □ Other. Please specify: …

Importance of Answer Types
Topic 042
Please rate the importance of the following types of evidence.
1. General claim of effects of aspirin. ○ Important. ○ Somewhat important. ○ Not needed at all.
2. Guideline of how aspirin can be used to treat heart diseases. ○ Important. ○ Somewhat important. ○ Not needed at all.
…

Relevance Feedback
Topic 055
Please indicate the relevance of the following answers.
1. Most of Sierra Leone's diamonds were and still are smuggled into neighboring Liberia for sale, according to several human rights groups and diamond industry experts. ○ Relevant. ○ Somewhat relevant. ○ Not relevant.
…

Three types of interaction: clarification questions, importance of answer types, and sample relevance feedback.

Methods

Analysis 1: Interaction Performances by Type of Interaction

[Figure: improvement of F-score (%) per topic, y-axis -50% to 300%; x-axis: topics; series: Sample relevance feedback, Importance of answer types, Clarification questions.]
