Assessing The Retrieval
A.I Lab · 2007.01.20 · 박동훈
Contents
• 4.1 Personal Assessment of Relevance
• 4.2 Extending the Dialog with RelFbk
• 4.3 Aggregated Assessment: Search Engine Performance
• 4.4 RAVE: A Relevance Assessment Vehicle
• 4.5 Summary
4.1 Personal Assessment of Relevance
• 4.1.1 Cognitive Assumptions
– Users trying to do 'object recognition'
– Comparison with respect to a prototypical document
– Reliability of user opinions?
– Relevance scale
– RelFbk is nonmetric
Relevance Scale
• Users naturally provide only preference information
• Not a (metric) measurement of how relevant a retrieved document is!
RelFbk is nonmetric
4.2 Extending the Dialog with RelFbk
RelFbk Labeling of the Retr Set
Query Session, Linked by RelFbk
4.2.1 Using RelFbk for Query Refinement
4.2.2 Document Modifications due to RelFbk
• Fig 4.7
• Change the documents!?
• Move documents to match more/less strongly the queries that successfully/unsuccessfully match them
4.3 Aggregated Assessment : Search Engine Performance
• 4.3.1 Underlying Assumptions
– RelFbk(q,di) assessments independent
– Users' opinions will all agree with a single 'omniscient' expert's
4.3.2 Consensual relevance
Consensually relevant
4.3.4 Basic Measures
• Relevant versus Retrieved Sets
Contingency table:

                 Relevant    ¬Relevant
Retrieved        RelRetr     ¬RelRetr     NRet
¬Retrieved       Rel¬Retr    ¬Rel¬Retr    NNRet
                 NRel        NNRel        NDoc

• NRel : the number of relevant documents
• NNRel : the number of irrelevant documents
• NDoc : the total number of documents
• NRet : the number of retrieved documents
• NNRet : the number of documents not retrieved
• RelRetr : the number of relevant documents retrieved
4.3.4 Basic Measures (cont)
• Precision = RelRetr / NRet
• Recall = RelRetr / NRel
4.3.4 Basic Measures (cont)
• Fallout = ¬RelRetr / NNRel
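The three basic measures follow directly from the contingency counts above. A minimal Python sketch (the function name and the set-based representation of the relevant and retrieved sets are illustrative, not from the source):

```python
def basic_measures(relevant: set, retrieved: set, ndoc: int) -> dict:
    """Compute precision, recall, and fallout from contingency counts."""
    rel_retr = len(relevant & retrieved)        # RelRetr: relevant and retrieved
    nret, nrel = len(retrieved), len(relevant)  # NRet, NRel
    nnrel = ndoc - nrel                         # NNRel: irrelevant documents
    return {
        "precision": rel_retr / nret if nret else 0.0,  # RelRetr / NRet
        "recall": rel_retr / nrel if nrel else 0.0,     # RelRetr / NRel
        "fallout": (nret - rel_retr) / nnrel if nnrel else 0.0,  # ¬RelRetr / NNRel
    }

m = basic_measures(relevant={1, 2, 3, 4}, retrieved={3, 4, 5}, ndoc=10)
print(m)  # precision 2/3, recall 2/4 = 0.5, fallout 1/6
```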
4.3.5 Ordering the Retr set
• Each document assigned hitlist rank Rank(di)
• Descending Match(q,di)
• Rank(di) < Rank(dj) ⇔ Match(q,di) > Match(q,dj)
– Rank(di) < Rank(dj) ⇔ Pr(Rel(di)) > Pr(Rel(dj))
• Coordination level: document's rank in Retr
– Number of keywords shared by document and query
• Goal: Probability Ranking Principle
• A tale of two retrievals: Query 1 vs. Query 2
[Figure: recall/precision curves for Query 1 and Query 2]
[Figure: retrieval envelope]
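A recall/precision curve is traced by walking down the ranked hitlist and recording (recall, precision) at each rank. A sketch in Python (function name and example document labels are illustrative):

```python
def recall_precision_curve(ranked: list, relevant: set) -> list:
    """Walk down the hitlist; emit a (recall, precision) point at each rank."""
    points, rel_seen = [], 0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            rel_seen += 1
        points.append((rel_seen / len(relevant), rel_seen / k))
    return points

# Two retrievals of the same 3 relevant documents, ranked differently:
q1 = recall_precision_curve(["a", "b", "x", "c"], {"a", "b", "c"})
q2 = recall_precision_curve(["x", "a", "b", "c"], {"a", "b", "c"})
```

Plotting both curves on the same axes gives the "tale of two retrievals" comparison: the curve that stays higher and further right dominates.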
4.3.6 Normalized recall
ri : hitlist rank of the i-th relevant document
• Worst and best possible orderings of the relevant documents bound normalized recall between 0 and 1
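Normalized recall compares the actual ranks of the relevant documents against the best case (all relevant documents at the top of the hitlist) and the worst case (all at the bottom). A sketch using the standard formula R_norm = 1 − (Σ ri − Σ i) / (NRel · (NDoc − NRel)), reconstructed here from the slide's worst/best bounds (the function name is illustrative):

```python
def normalized_recall(ranks: list, ndoc: int) -> float:
    """ranks: hitlist ranks r_i of the relevant documents (1-based).
    R_norm = 1 - (sum(r_i) - sum(i)) / (NRel * (NDoc - NRel))."""
    nrel = len(ranks)
    best = sum(range(1, nrel + 1))  # relevant docs occupy the top ranks
    return 1 - (sum(ranks) - best) / (nrel * (ndoc - nrel))

print(normalized_recall([1, 2, 3], ndoc=10))   # best case  -> 1.0
print(normalized_recall([8, 9, 10], ndoc=10))  # worst case -> 0.0
```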
4.3.8 One-Parameter Criteria
• Combining recall and precision• Classification accuracy• Sliding ratio• Point alienation
Combining recall and precision
• F-measure
– [Jardine & van Rijsbergen, 1971]
– [Lewis & Gale, 1994]
• Effectiveness
– [van Rijsbergen, 1979]
• F = ((β² + 1) · Precision · Recall) / (β² · Precision + Recall)
• E = 1 − F, with α = 1/(β² + 1)
• α = 0.5 ⇒ F is the harmonic mean of precision & recall
• E = 1 − 1 / (α/Precision + (1 − α)/Recall)
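The F- and E-measures above can be sketched directly; with β = 1 the F-measure reduces to the harmonic mean of precision and recall (function names are illustrative):

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F = (beta^2 + 1) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

def e_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """van Rijsbergen's effectiveness measure: E = 1 - F."""
    return 1 - f_measure(precision, recall, beta)

print(f_measure(0.5, 0.5))  # harmonic mean -> 0.5
```

β > 1 weights recall more heavily, β < 1 weights precision more heavily; at β = 1 the two are balanced.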
Classification accuracy
• Accuracy: correct identification of both relevant and irrelevant documents
• Accuracy = (RelRetr + ¬Rel¬Retr) / NDoc
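Accuracy counts both kinds of correct decisions: relevant documents retrieved and irrelevant documents left unretrieved. A minimal sketch using the same set-based representation as before (the function name is illustrative):

```python
def classification_accuracy(relevant: set, retrieved: set, ndoc: int) -> float:
    """Accuracy = (RelRetr + NotRel-NotRetr) / NDoc."""
    rel_retr = len(relevant & retrieved)            # relevant and retrieved
    notrel_notretr = ndoc - len(relevant | retrieved)  # irrelevant and not retrieved
    return (rel_retr + notrel_notretr) / ndoc

print(classification_accuracy({1, 2, 3, 4}, {3, 4, 5}, 10))  # (2 + 5) / 10 = 0.7
```

Note that when relevant documents are rare, a system retrieving nothing scores high accuracy, which is why precision and recall are usually preferred.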
Sliding ratio
• Imagine a nonbinary, metric Rel(di) measure
• Rank1, Rank2 computed by two separate systems
Point alienation
• Developed to measure human preference data
• Captures the fundamental nonmetric nature of RelFbk
4.3.9 Test corpora
• More data required for a "test corpus"
• Standard test corpora
• TREC: Text REtrieval Conference
• TREC's refined queries
• TREC constantly expanding, refining tasks
More data required for “test corpus”
• Documents• Queries• Relevance assessments Rel(q,d)• Perhaps other data too
– Classification data (Reuters)– Hypertext graph structure (EB5)
Standard test corpora
TREC constantly expanding, refining tasks
• Ad hoc queries tasks• Routing/filtering task• Interactive task
Other Measure
• Expected search length (ESL)
– Length of the "path" as the user walks down the hitlist
– ESL = number of irrelevant documents seen before each relevant document
– ESL for random retrieval
– ESL reduction factor
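Expected search length counts the irrelevant documents a user must inspect while walking down the hitlist until the desired number of relevant ones has been found. A sketch under that reading of the slide (function and parameter names are illustrative):

```python
def expected_search_length(ranked: list, relevant: set, want: int) -> int:
    """Number of irrelevant documents inspected while walking down the
    hitlist until `want` relevant documents have been found."""
    irrelevant_seen, rel_found = 0, 0
    for doc in ranked:
        if doc in relevant:
            rel_found += 1
            if rel_found == want:
                return irrelevant_seen
        else:
            irrelevant_seen += 1
    return irrelevant_seen  # hitlist exhausted before `want` were found

print(expected_search_length(["x", "a", "x", "b"], {"a", "b"}, want=2))  # 2
```

The ESL reduction factor then compares this against the ESL of a random ordering of the corpus: the smaller the ratio, the more work the ranking saves the user.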
4.5 Summary
• Discussed both metric and nonmetric relevance feedback
• The difficulties in getting users to provide relevance judgments for documents in the retrieved set
• Quantified several measures of system performance