Page 1: Assessing The Retrieval

Assessing The Retrieval

A.I Lab, 2007.01.20, 박동훈

Page 2: Assessing The Retrieval

Contents

• 4.1 Personal Assessment of Relevance
• 4.2 Extending the Dialog with RelFbk
• 4.3 Aggregated Assessment: Search Engine Performance
• 4.4 RAVE: A Relevance Assessment Vehicle
• 4.5 Summary

Page 3: Assessing The Retrieval

4.1 Personal Assessment of Relevance

• 4.1.1 Cognitive Assumptions
– Users trying to do 'object recognition'
– Comparison with respect to a prototypic document
– Reliability of user opinions?
– Relevance scale
– RelFbk is nonmetric

Page 4: Assessing The Retrieval

Relevance Scale

Page 5: Assessing The Retrieval

• Users naturally provide only preference information

• Not a (metric) measurement of how relevant a retrieved document is!

RelFbk is nonmetric

Page 6: Assessing The Retrieval

4.2 Extending the Dialog with RelFbk

RelFbk Labeling of the Retr Set

Page 7: Assessing The Retrieval

Query Session, Linked by RelFbk

Page 8: Assessing The Retrieval

4.2.1 Using RelFbk for Query Refinement
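The original slides illustrate this step with figures (pages 9-11) that are not reproduced here. As an illustration only, the sketch below implements Rocchio-style refinement, a classic way RelFbk is folded back into the query vector; the weights alpha, beta, gamma and the toy vectors are assumptions, not values from the slides.

```python
import numpy as np

def rocchio_refine(query, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style query refinement from relevance feedback.

    Moves the query vector toward the centroid of documents the user marked
    relevant and away from the centroid of those marked irrelevant.
    """
    query = np.asarray(query, dtype=float)
    refined = alpha * query
    if rel_docs:
        refined += beta * np.mean(rel_docs, axis=0)
    if nonrel_docs:
        refined -= gamma * np.mean(nonrel_docs, axis=0)
    # Negative term weights are usually clipped to zero.
    return np.clip(refined, 0.0, None)

# Toy example: 4-term vocabulary, one relevant and one irrelevant document.
q     = [1.0, 0.0, 1.0, 0.0]
d_pos = [[0.8, 0.1, 0.9, 0.0]]
d_neg = [[0.0, 0.9, 0.1, 0.7]]
print(rocchio_refine(q, d_pos, d_neg))
```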

Page 9: Assessing The Retrieval
Page 10: Assessing The Retrieval
Page 11: Assessing The Retrieval

4.2.2 Document Modifications due to RelFbk

• Fig 4.7
• Change documents!?
• Make them more/less like the queries that successfully/unsuccessfully match them (sketched below)
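One minimal way to write this idea down (my notation, not the slides'): a document judged relevant to a query is nudged toward it, and one judged irrelevant is nudged away, with a small step size η.

```latex
% Illustrative document-modification rule for RelFbk; eta is a small step size (assumed notation)
\[
  d_i' =
  \begin{cases}
    d_i + \eta\,(q - d_i) & \text{if the user judged } d_i \text{ relevant to } q,\\[2pt]
    d_i - \eta\,(q - d_i) & \text{if the user judged } d_i \text{ irrelevant to } q.
  \end{cases}
\]
```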

Page 12: Assessing The Retrieval

4.3 Aggregated Assessment : Search Engine Performance

• 4.3.1 Underlying Assumptions
– RelFbk(q,di) assessments independent
– Users' opinions will all agree with a single 'omniscient' expert's

Page 13: Assessing The Retrieval

4.3.2 Consensual relevance

(Figure: consensually relevant documents)
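The figure itself is not reproduced. As an illustration of the idea, the sketch below aggregates several users' binary judgments and keeps the documents a majority agrees on; the majority threshold is my assumption, not something the slides specify.

```python
from collections import Counter

def consensual_relevance(judgments, threshold=0.5):
    """Aggregate per-user binary relevance judgments into a consensus set.

    judgments: dict mapping user -> set of doc ids that user marked relevant.
    A document is 'consensually relevant' if more than `threshold` of the
    users marked it relevant (the threshold is an illustrative choice).
    """
    n_users = len(judgments)
    votes = Counter(doc for rel_set in judgments.values() for doc in rel_set)
    return {doc for doc, v in votes.items() if v / n_users > threshold}

users = {
    "u1": {"d1", "d2", "d3"},
    "u2": {"d2", "d3"},
    "u3": {"d3", "d5"},
}
print(consensual_relevance(users))   # {'d2', 'd3'}
```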

Page 14: Assessing The Retrieval

4.3.4 Basic Measures

• Relevant versus Retrieved Sets

Page 15: Assessing The Retrieval

Contingency table

                 Relevant      ¬Relevant
Retrieved        RelRetr       ¬RelRetr       NRet
Not retrieved    Rel¬Retr      ¬Rel¬Retr      NNRet
                 NRel          NNRel          NDoc

• NRel : the number of relevant documents

• NNRel : the number of irrelevant documents

• NDoc : the total number of documents

• NRet : the number of retrieved documents

• NNRet : the number of documents not retrieved
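As a sketch of how these counts relate, the snippet below builds them from a relevant set and a retrieved set; the variable names follow the table, while the document sets themselves are invented for illustration.

```python
# Contingency counts from a relevant set and a retrieved set (toy data).
all_docs  = {f"d{i}" for i in range(1, 11)}       # NDoc = 10
relevant  = {"d1", "d2", "d3", "d4"}              # the relevant documents
retrieved = {"d2", "d3", "d5", "d6", "d7"}        # the retrieved documents

RelRetr       = relevant & retrieved              # relevant and retrieved
NotRelRetr    = retrieved - relevant              # irrelevant but retrieved
RelNotRetr    = relevant - retrieved              # relevant but missed
NotRelNotRetr = all_docs - relevant - retrieved   # irrelevant and not retrieved

NRel, NNRel = len(relevant),  len(all_docs - relevant)
NRet, NNRet = len(retrieved), len(all_docs - retrieved)
NDoc = len(all_docs)

# The four cells partition the collection.
assert len(RelRetr) + len(NotRelRetr) + len(RelNotRetr) + len(NotRelNotRetr) == NDoc
```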

Page 16: Assessing The Retrieval

4.3.4 Basic Measures (cont)

Precision = RelRetr / NRet   (fraction of the retrieved documents that are relevant)

Recall = RelRetr / NRel   (fraction of the relevant documents that are retrieved)

Page 17: Assessing The Retrieval

4.3.4 Basic Measures (cont)

• Fallout = ¬RelRetr / NNRel   (fraction of the irrelevant documents that are retrieved)
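Using the same toy sets as in the contingency-table sketch above, the three measures come out as follows (illustrative values only):

```python
# Toy illustration of precision, recall and fallout.
all_docs  = {f"d{i}" for i in range(1, 11)}
relevant  = {"d1", "d2", "d3", "d4"}
retrieved = {"d2", "d3", "d5", "d6", "d7"}

precision = len(relevant & retrieved) / len(retrieved)            # 2/5 = 0.40
recall    = len(relevant & retrieved) / len(relevant)             # 2/4 = 0.50
fallout   = len(retrieved - relevant) / len(all_docs - relevant)  # 3/6 = 0.50
print(precision, recall, fallout)
```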

Page 18: Assessing The Retrieval

4.3.5 Ordering the Retr set

• Each document is assigned a hitlist rank Rank(di)
• Descending Match(q,di)
• Rank(di) < Rank(dj) ⇔ Match(q,di) > Match(q,dj)
  – Rank(di) < Rank(dj) ⇔ Pr(Rel(di)) > Pr(Rel(dj))
• Coordination level: document's rank in Retr
  – Number of keywords shared by doc and query
• Goal: Probability Ranking Principle

Page 19: Assessing The Retrieval

• A tale of two retrievals

(Figure: Query1 vs. Query2)

Page 20: Assessing The Retrieval

Recall/precision curve: Query1

Page 21: Assessing The Retrieval

Recall/precision curve: Query1
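The curves themselves are figures in the original deck. As a minimal sketch, the snippet below computes recall/precision points from a ranked hitlist; the ranking and relevance labels are invented for illustration.

```python
def recall_precision_points(hitlist, relevant):
    """Walk down a ranked hitlist and record (recall, precision) after each rank."""
    points, hits = [], 0
    for i, doc in enumerate(hitlist, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / i))   # (recall, precision)
    return points

hitlist  = ["d2", "d5", "d3", "d6", "d1", "d7"]   # ranked by descending Match(q, d)
relevant = {"d1", "d2", "d3", "d4"}
for r, p in recall_precision_points(hitlist, relevant):
    print(f"recall={r:.2f}  precision={p:.2f}")
```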

Page 22: Assessing The Retrieval

Retrieval envelope

Page 23: Assessing The Retrieval

4.3.6 Normalized recall

ri : hitlist rank of the i-th relevant document

(Figure: step curves for the worst-case and best-case orderings)
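The slide presents the measure only as a figure; the standard form, which I believe is the one intended but offer here as a reconstruction, compares the actual ranks of the relevant documents with the best possible ranks, scaled by the worst case:

```latex
% Normalized recall (standard form; reconstruction, the slide shows only a figure)
\[
  R_{\mathrm{norm}} = 1 - \frac{\sum_{i=1}^{N_{Rel}} r_i - \sum_{i=1}^{N_{Rel}} i}
                               {N_{Rel}\,(N_{Doc} - N_{Rel})}
\]
```

Here r_i is the hitlist rank of the i-th relevant document; the measure is 1 for the best possible ordering and 0 for the worst.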

Page 24: Assessing The Retrieval

4.3.8 One-Parameter Criteria

• Combining recall and precision
• Classification accuracy
• Sliding ratio
• Point alienation

Page 25: Assessing The Retrieval

Combining recall and precision

• F-measure
  – [Jardine & van Rijsbergen, 1971]
  – [Lewis & Gale, 1994]
• Effectiveness
  – [van Rijsbergen, 1979]
• E = 1 - F, α = 1/(β² + 1)
• α = 0.5 ⇒ harmonic mean of precision & recall

F = (β² + 1) · Precision · Recall / (β² · Precision + Recall)

E = 1 - 1 / ( α · (1/Precision) + (1 - α) · (1/Recall) )
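A small sketch of these two formulas and of the α = 0.5 special case; the example precision and recall values are arbitrary.

```python
def f_measure(precision, recall, beta=1.0):
    """F = (beta^2 + 1) * P * R / (beta^2 * P + R)."""
    return (beta**2 + 1) * precision * recall / (beta**2 * precision + recall)

def effectiveness(precision, recall, alpha=0.5):
    """van Rijsbergen's E = 1 - 1 / (alpha/P + (1 - alpha)/R)."""
    return 1.0 - 1.0 / (alpha / precision + (1.0 - alpha) / recall)

p, r = 0.40, 0.50
print(f_measure(p, r))        # beta = 1: harmonic mean 2PR/(P+R) ≈ 0.444
print(effectiveness(p, r))    # E = 1 - F when alpha = 1/(beta^2 + 1) = 0.5
print(1.0 - f_measure(p, r))  # same value as the line above
```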

Page 26: Assessing The Retrieval

Classification accuracy

• Accuracy
• Correct identification of relevant and irrelevant

Accuracy = (RelRetr + ¬Rel¬Retr) / NDoc

Page 27: Assessing The Retrieval

Sliding ratio

• Imagine a nonbinary, metric Rel(di) measure
• Rank1, Rank2 computed by two separate systems (see the sketch below)
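One common form of the sliding ratio compares the cumulative relevance accumulated by the two rankings at each cutoff; treat the exact formulation below as an assumption, since the slide gives no formula. The relevance scores and orderings are toy data.

```python
def sliding_ratio(rank1, rank2, rel):
    """Cumulative-relevance ratio of two rankings at each cutoff i.

    rank1, rank2: lists of doc ids in ranked order (two systems' hitlists).
    rel: dict mapping doc id -> nonbinary, metric relevance score Rel(di).
    """
    ratios, cum1, cum2 = [], 0.0, 0.0
    for d1, d2 in zip(rank1, rank2):
        cum1 += rel.get(d1, 0.0)
        cum2 += rel.get(d2, 0.0)
        ratios.append(cum1 / cum2 if cum2 else float("inf"))
    return ratios

rel = {"d1": 3.0, "d2": 2.0, "d3": 1.0, "d4": 0.0}
print(sliding_ratio(["d2", "d1", "d4", "d3"],   # system 1's ordering
                    ["d1", "d2", "d3", "d4"],   # system 2's ordering
                    rel))                        # [0.67, 1.0, 0.83, 1.0]
```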

Page 28: Assessing The Retrieval

Point alienation

• Developed to measure human preference data
• Capturing the fundamentally nonmetric nature of RelFbk

Page 29: Assessing The Retrieval

4.3.9 Test corpora

• More data required for a "test corpus"
• Standard test corpora
• TREC: Text REtrieval Conference
• TREC's refined queries
• TREC constantly expanding, refining tasks

Page 30: Assessing The Retrieval

More data required for “test corpus”

• Documents
• Queries
• Relevance assessments Rel(q,d)
• Perhaps other data too
  – Classification data (Reuters)
  – Hypertext graph structure (EB5)

Page 31: Assessing The Retrieval

Standard test corpora

Page 32: Assessing The Retrieval

TREC constantly expanding, refining tasks

• Ad hoc query task
• Routing/filtering task
• Interactive task

Page 33: Assessing The Retrieval

Other Measures

• Expected search length (ESL) (a sketch follows this list)
  – Length of the "path" as the user walks down the HitList

  – ESL = number of irrelevant documents encountered before each relevant document

– ESL for random retrieval

– ESL reduction factor
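A minimal sketch of the basic ESL computation described above; the hitlist and relevant set are invented, and the random-retrieval baseline and reduction factor are left out.

```python
def expected_search_length(hitlist, relevant, n_wanted=1):
    """Number of irrelevant documents passed before finding `n_wanted` relevant ones."""
    irrelevant_seen, found = 0, 0
    for doc in hitlist:
        if doc in relevant:
            found += 1
            if found == n_wanted:
                return irrelevant_seen
        else:
            irrelevant_seen += 1
    return irrelevant_seen   # fewer than n_wanted relevant docs in the hitlist

hitlist  = ["d5", "d2", "d6", "d3", "d7", "d1"]
relevant = {"d1", "d2", "d3", "d4"}
print(expected_search_length(hitlist, relevant, n_wanted=1))   # 1 irrelevant doc (d5)
print(expected_search_length(hitlist, relevant, n_wanted=3))   # 3 (d5, d6, d7)
```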

Page 34: Assessing The Retrieval

4.5 Summary

• Discussed both metric and nonmetric relevance feedback

• The difficulties in getting users to provide relevance judgments for documents in the retrieved set

• Quantified several measures of system performance