IR MODELING: Modeling a Retrieval System
NAVER 임민섭

PROBABILISTIC RELEVANCE FRAMEWORK: BM25 AND BEYOND
• INTRODUCTION
The classical probabilistic modeling of IR is usually framed as
"estimate the probability of relevance between the query and each document, and rank the documents for a given query in decreasing order of that probability,"
and the best-known term-weighting and document-scoring function in this framework is BM25.
DEVELOPMENT OF THE BASIC MODEL
• Let's revisit relevance.
- The relatedness, or suitability, that a document has for an information need, as judged by the user.
• And let's add a few assumptions:
- It is a property that can be judged from the information need alone, without reference to other documents.
- It is a binary property, taking one of two values: rel or nonrel.
- These assumptions may not be universally accepted, but if we take "relevant" to mean "what the user wants to see," they seem like a reasonable notion.
PROBABILITY RANKING PRINCIPLE
• Since the system cannot know in advance the relevance property of each document, expressing it as a probability is the best we can do.
• The system can estimate the probability of relevance from the information it has about the document and the query.
• "If retrieved documents are ordered by decreasing probability of relevance on the data available, then the system's effectiveness is the best that can be obtained for the data."
RANKING FUNCTION FOR QUERY TERMS
• First, let's represent a document d as a vector of term frequencies:

$$d = (tf_1, tf_2, \ldots, tf_{|V|})$$

• What we want is the probability that query q and document d are relevant:

$$P(R = rel \mid q, d)$$

• Since we care about ranking rather than the probability itself, we can use the rank-preserving odds of relevance as the ranking function:

$$O(R = rel \mid d, q) = \frac{P(R = rel \mid d, q)}{P(R = \overline{rel} \mid d, q)}$$
RANKING FUNCTION FOR QUERY TERMS

$$O(R = rel \mid d, q) = \frac{P(rel \mid d, q)}{P(\overline{rel} \mid d, q)} = \frac{P(rel \mid q)\,P(d \mid rel, q)\,/\,P(d \mid q)}{P(\overline{rel} \mid q)\,P(d \mid \overline{rel}, q)\,/\,P(d \mid q)} \quad \text{by Bayes' rule}$$

$$= \frac{P(rel \mid q)}{P(\overline{rel} \mid q)} \cdot \frac{P(d \mid rel, q)}{P(d \mid \overline{rel}, q)}$$

The first factor is a positive constant for a given query, so

$$O(R = rel \mid d, q) \;\propto_q\; \frac{P(d \mid rel, q)}{P(d \mid \overline{rel}, q)}$$
RANKING FUNCTION FOR QUERY TERMS
By the vector representation of the document and conditional independence of terms:

$$\propto_q \frac{P(d \mid rel, q)}{P(d \mid \overline{rel}, q)} = \prod_{i=1}^{|V|} \frac{P(tf_i \mid rel, q)}{P(tf_i \mid \overline{rel}, q)}$$

Splitting the product into the terms with nonzero term frequency and those with zero:

$$= \prod_{tf_i > 0} \frac{P(tf_i \mid rel, q)}{P(tf_i \mid \overline{rel}, q)} \cdot \prod_{tf_i = 0} \frac{P(0 \mid rel, q)}{P(0 \mid \overline{rel}, q)}$$
RANKING FUNCTION FOR QUERY TERMS
• Let's add one more assumption:
Terms not in the query appear with equal probability in relevant and non-relevant documents, i.e.

$$P(tf_i \mid rel, q) = P(tf_i \mid \overline{rel}, q) \quad \text{for } t_i \notin q$$

• So the non-query terms cancel out, and rewriting the expression over query terms only:

$$= \prod_{t_i \in q,\; tf_i > 0} \frac{P(tf_i \mid rel, q)}{P(tf_i \mid \overline{rel}, q)} \cdot \prod_{t_i \in q,\; tf_i = 0} \frac{P(0 \mid rel, q)}{P(0 \mid \overline{rel}, q)}$$
RANKING FUNCTION FOR QUERY TERMS
• If we multiply the terms with $tf_i > 0,\; t_i \in q$ into the right-hand product as well, and divide them out of the left-hand one, we get the same expression:

$$= \prod_{t_i \in q,\; tf_i > 0} \frac{P(tf_i \mid rel, q)\,P(0 \mid \overline{rel}, q)}{P(tf_i \mid \overline{rel}, q)\,P(0 \mid rel, q)} \cdot \prod_{t_i \in q} \frac{P(0 \mid rel, q)}{P(0 \mid \overline{rel}, q)}$$

The second product is a positive constant for a given query, so

$$\propto_q \prod_{t_i \in q,\; tf_i > 0} \frac{P(tf_i \mid rel, q)\,P(0 \mid \overline{rel}, q)}{P(tf_i \mid \overline{rel}, q)\,P(0 \mid rel, q)}$$
RANKING FUNCTION FOR QUERY TERMS
Since the log function preserves ranking, writing $U_i(tf_i)$ for each factor of the product above:

$$\propto_q \log\Bigl(\prod_{t_i \in q,\; tf_i > 0} U_i(tf_i)\Bigr) = \sum_{t_i \in q,\; tf_i > 0} \log U_i(tf_i)$$

$$\Rightarrow\; P(rel \mid q, d) \;\propto_q\; \sum_{t_i \in q,\; tf_i > 0} w_i \quad \text{where } w_i = \log U_i(tf_i)$$
RANKING FUNCTION FOR QUERY TERMS
• The drawback? This ranking function focuses only on rank order. In some situations an explicit probability of relevance for each document is preferred over a rank order, and unfortunately this ranking function cannot produce one...
DERIVED MODELS
• The random variable TF we saw earlier can represent not just term frequency but any property a document has!
• First, let's look at the model where TF is a binary property (present/absent in the document).
THE BINARY INDEPENDENCE MODEL
• Since TF is a binary random variable, the weight function we saw earlier,

$$w_i = \log\left(\frac{P(tf_i \mid rel, q)\,P(0 \mid \overline{rel}, q)}{P(tf_i \mid \overline{rel}, q)\,P(0 \mid rel, q)}\right)$$

becomes this:

$$w_i^{BIM} = \log\left(\frac{P(t_i \mid rel, q)\,(1 - P(t_i \mid \overline{rel}, q))}{P(t_i \mid \overline{rel}, q)\,(1 - P(t_i \mid rel, q))}\right)$$
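As a quick numeric sanity check, the BIM weight can be sketched in Python. The presence probabilities below are made-up values for illustration, not estimates from any real collection:

```python
import math

def bim_weight(p_rel, p_nonrel):
    """w_BIM = log[ p(1-q') / (q'(1-p)) ] where p = P(t|rel,q) and
    q' = P(t|nonrel,q) are probabilities of the term being present."""
    return math.log(p_rel * (1 - p_nonrel) / (p_nonrel * (1 - p_rel)))

# A term more likely in relevant docs gets a positive weight,
# one more likely in non-relevant docs a negative weight:
positive = bim_weight(0.6, 0.1)
negative = bim_weight(0.1, 0.6)
```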
THE BINARY INDEPENDENCE MODEL
• Since the conditioning in this formula is on relevance, let's assume for now that the whole collection has relevance judgements, and define the following notation:
N = size of the whole collection
n_i = number of docs in the collection containing t_i
R = relevant set size
r_i = number of judged relevant docs containing t_i

$$w_i^{BIM} = \log\left(\frac{P(t_i \mid rel, q)\,(1 - P(t_i \mid \overline{rel}, q))}{P(t_i \mid \overline{rel}, q)\,(1 - P(t_i \mid rel, q))}\right)$$
THE BINARY INDEPENDENCE MODEL
• With these, the intuitive estimates would be

$$P(t_i \mid rel, q) = \frac{r_i}{R}, \qquad P(t_i \mid \overline{rel}, q) = \frac{n_i - r_i}{N - R}$$

but once the log is applied there are cases where the weight shoots off to infinity (e.g. R = 1, r_i = 0), so these won't do as-is.
• To fix this, let's add a small constant (pseudo-count) to the numerators and denominators.
THE BINARY INDEPENDENCE MODEL
And substituting these into the weight function above?

$$P(t_i \mid rel, q) = \frac{r_i}{R} \;\to\; \frac{r_i + 0.5}{R + 1}, \qquad P(t_i \mid \overline{rel}, q) = \frac{n_i - r_i}{N - R} \;\to\; \frac{n_i - r_i + 0.5}{N - R + 1}$$

$$w_i^{RSJ} = \log\left(\frac{(r_i + 0.5)(N - R - n_i + r_i + 0.5)}{(n_i - r_i + 0.5)(R - r_i + 0.5)}\right)$$
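The RSJ weight above is easy to sketch directly. The counts below are made-up, just to show the formula never divides by zero or takes log of zero thanks to the 0.5 pseudo-counts:

```python
import math

def rsj_weight(N, n_i, R, r_i):
    """Robertson/Sparck Jones weight with 0.5 pseudo-counts."""
    return math.log(
        (r_i + 0.5) * (N - R - n_i + r_i + 0.5)
        / ((n_i - r_i + 0.5) * (R - r_i + 0.5))
    )

# e.g. a collection of 1000 docs, 100 containing the term,
# 20 judged relevant, 15 of which contain the term:
w = rsj_weight(N=1000, n_i=100, R=20, r_i=15)

# The previously problematic case R = 1, r_i = 0 is now finite:
w_edge = rsj_weight(N=1000, n_i=100, R=1, r_i=0)
```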
THE BINARY INDEPENDENCE MODEL
• Next, let's consider a more realistic situation: only a small portion of the collection has relevance judgements, not the whole thing.
• For the judged set, weights can be computed with the w_RSJ we derived earlier. For the unjudged set, add one more assumption: every unjudged document is nonrel. This is the complement method.
• Updating the notation under the new assumption:
N = size of the judged sample
n_i = number of docs in the sample containing t_i
THE BINARY INDEPENDENCE MODEL
• Experimentally, the complement method gives better results than estimating from the judged sample alone!
• Finally, let's assume we have no relevance information at all. Using the complement method, every document is non-relevant to the query, i.e. R = r_i = 0. The weight function then becomes the following (a weight function similar to classical idf):

$$w_i^{IDF} = \log\left(\frac{N - n_i + 0.5}{n_i + 0.5}\right) \;\sim\; w_i^{idf} = \log\left(\frac{N}{n_i}\right)$$
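The two idf variants above can be compared numerically. This is a minimal sketch with an arbitrary collection size; for terms that appear in only a small fraction of documents the two weights come out close:

```python
import math

def rsj_idf(N, n_i):
    """w_RSJ with no relevance information (R = r_i = 0)."""
    return math.log((N - n_i + 0.5) / (n_i + 0.5))

def classical_idf(N, n_i):
    return math.log(N / n_i)

# e.g. N = 1000 docs, term in 10 of them:
# rsj_idf(1000, 10) ≈ 4.55, classical_idf(1000, 10) ≈ 4.61
a = rsj_idf(1000, 10)
b = classical_idf(1000, 10)
```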
RELEVANCE FEEDBACK AND QUERY EXPANSION
• Oh, so now relevance feedback (query modification) is possible! With no prior relevance information at all, do term weighting with w_IDF; once some relevance judgements come in, re-weight with w_RSJ and then include terms in the query in decreasing order of term weight.
• But term re-weighting is not very effective at improving retrieval, and terms included this way tend to be noisy. Is there a more conservative approach?
• With w_RSJ, rare terms get excessively high weights. A rare term probably does correlate strongly with relevance, but since few documents contain such terms, they won't improve retrieval results much.
• So let's look at a weight that measures the impact that including a term has on the overall score: the offer weight.
RELEVANCE FEEDBACK AND QUERY EXPANSION

$$OW_i = \bigl(P(tf_i \mid rel) - P(tf_i \mid \overline{rel})\bigr) \cdot w_i$$

$$\approx P(tf_i \mid rel) \cdot w_i \quad \text{because } P(tf_i \mid rel) \gg P(tf_i \mid \overline{rel})$$

$$\approx \frac{r_i}{R}\, w_i \;\propto_q\; r_i \cdot w_i^{RSJ} = OW_i^{RSJ} \quad \text{(R is a positive constant for a given query)}$$
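The offer weight is then just r_i times the RSJ weight. The made-up counts below illustrate the trade-off discussed above: a rare term can have the higher per-term w_RSJ, yet a common term seen in more relevant documents can still win on OW:

```python
import math

def rsj_weight(N, n_i, R, r_i):
    return math.log(
        (r_i + 0.5) * (N - R - n_i + r_i + 0.5)
        / ((n_i - r_i + 0.5) * (R - r_i + 0.5))
    )

def offer_weight(N, n_i, R, r_i):
    """OW_RSJ = r_i * w_RSJ: ranks candidate expansion terms."""
    return r_i * rsj_weight(N, n_i, R, r_i)

# Very rare term (n_i = 2) in 1 of 10 relevant docs vs. a more common
# term (n_i = 150) in 8 of 10:
rare = offer_weight(N=1000, n_i=2, R=10, r_i=1)
common = offer_weight(N=1000, n_i=150, R=10, r_i=8)
```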
RELEVANCE FEEDBACK AND QUERY EXPANSION
• Then the query expansion process using OW would go like this:
(1) Extract all terms from the relevant documents and rank them by OW_RSJ.
(2) Include the top k terms from the ranked list in the query.
• This query expansion method is aimed at models using BIM with RSJ weighting, but reportedly it has been fairly successful with BM25 as well. Of course, it has a downside: it can bring synonyms into the query, but in doing so it naturally weakens the query term independence assumption (and the weight functions considered so far all rest on that independence assumption).
BLIND FEEDBACK
• First, assume there are no relevance judgements at all, and follow this procedure:
(1) Run a search with the initial query.
(2) Assume the top k documents are relevant.
(3) Apply the query expansion method we just saw: extract terms from the relevant set assumed in (2), rank them with OW_RSJ, and include the top m in the query!
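The three-step procedure above can be sketched end-to-end on a toy corpus. Everything below is hypothetical (the corpus, the idf-based first-pass search) and only meant to show the control flow:

```python
import math
from collections import Counter

# Toy corpus: doc id -> set of terms (real systems use an inverted index)
docs = {
    1: {"bm25", "ranking", "probabilistic"},
    2: {"bm25", "idf", "weighting"},
    3: {"poisson", "model", "ranking"},
    4: {"cooking", "recipes"},
    5: {"retrieval", "ranking", "idf"},
    6: {"cooking", "pasta"},
}
N = len(docs)
df = Counter(t for terms in docs.values() for t in terms)

def idf(t):
    return math.log((N - df[t] + 0.5) / (df[t] + 0.5))

def search(query):
    """(1) Rank docs by sum of idf weights of matched query terms."""
    scores = {d: sum(idf(t) for t in query if t in terms)
              for d, terms in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

def rsj(n_i, R, r_i):
    return math.log((r_i + 0.5) * (N - R - n_i + r_i + 0.5)
                    / ((n_i - r_i + 0.5) * (R - r_i + 0.5)))

def blind_feedback(query, k=2, m=2):
    pseudo_rel = search(query)[:k]            # (2) top k assumed relevant
    R = len(pseudo_rel)
    r = Counter(t for d in pseudo_rel for t in docs[d])
    # (3) rank candidate terms by offer weight OW_RSJ = r_i * w_RSJ
    ow = {t: r[t] * rsj(df[t], R, r[t]) for t in r if t not in query}
    expansion = sorted(ow, key=ow.get, reverse=True)[:m]
    return list(query) + expansion

expanded = blind_feedback(["bm25"])
```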
BLIND FEEDBACK
• Let's look at a few points and wrap up blind feedback.
(1) On average, blind feedback does help retrieval quality, but it fails when the initial results are poor... (in the extreme case, if none of the top k documents of the initial search were relevant, the query gets expanded with wildly off-topic terms...)
(2) Since the choice is, quite literally, made blindly, it would be better to consider an explicit probability of relevance rather than flatly declaring documents "relevant" — but as noted earlier, the current model (which considers only rank) has no way of obtaining one...
THE ELITENESS MODEL AND BM25
• Let's introduce a new concept: eliteness. This is another binary property, which can be thought of as a mapping

$$E : T \times D \to \{elite, \overline{elite}\}$$

with the following definition:

$$E(t, d) = \begin{cases} elite, & \text{if } d \text{ is about the term } t \\ \overline{elite}, & \text{otherwise} \end{cases}$$
THE ELITENESS MODEL AND BM25
• Assumptions that come with the eliteness property:
(1) $tf_i$ depends on the eliteness.
- That is, how often term t appears within a document d is influenced by "whether or not d is about t."
(2) There "may be" an association between eliteness and relevance.
(3) These two assumptions alone explain the relationship between tf and relevance (that is, given eliteness, tf is independent of relevance).
THE ELITENESS MODEL AND BM25
• Now for the notation of the eliteness property:
(1) $p_{i1} = P(E_i = elite \mid rel)$ : the probability that d is about t_i, given that document d is relevant.
(2) $p_{i0} = P(E_i = elite \mid \overline{rel})$
(3) $E_{i1}(tf) = P(TF_i = tf \mid E_i = elite)$ : the probability that t_i occurs tf times in d, given that document d is about t_i.
(4) $E_{i0}(tf) = P(TF_i = tf \mid E_i = \overline{elite})$
(5) $\Rightarrow\; P(TF_i = tf_i \mid rel) = p_{i1} \cdot E_{i1}(tf_i) + (1 - p_{i1}) \cdot E_{i0}(tf_i)$
THE ELITENESS MODEL AND BM25
• Substituting (5) into the earlier weight function (dropping the i subscripts for readability):

$$w_i^{elite} = \log\left(\frac{(p_1 E_1(tf) + (1 - p_1)E_0(tf))\,(p_0 E_1(0) + (1 - p_0)E_0(0))}{(p_1 E_1(0) + (1 - p_1)E_0(0))\,(p_0 E_1(tf) + (1 - p_0)E_0(tf))}\right)$$
THE ELITENESS MODEL AND BM25
• Just one more assumption: term frequency follows a Poisson distribution.
That is, $E_{i0}(tf) \sim Poisson(\lambda_{i0})$, and likewise $E_{i1}(tf) \sim Poisson(\lambda_{i1})$. We generally expect $\lambda_{i1} > \lambda_{i0}$ (if d is about t_i, we expect t_i to appear more often — makes sense). This is called the 2-Poisson model.
• The Poisson distribution:

$$P(k \text{ events in interval}) = \frac{\lambda^k e^{-\lambda}}{k!}$$
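Putting (5) and the Poisson assumption together, $P(TF_i = tf \mid rel)$ is a two-component Poisson mixture. A minimal sketch, with illustrative parameter values (p1, λ1, λ0 below are arbitrary, not estimated from data):

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of k events: lam^k * e^(-lam) / k!"""
    return lam**k * math.exp(-lam) / math.factorial(k)

def p_tf_given_rel(tf, p1, lam1, lam0):
    """2-Poisson mixture: P(TF = tf | rel) = p1*E1(tf) + (1-p1)*E0(tf),
    with E1 ~ Poisson(lam1) for elite docs, E0 ~ Poisson(lam0) otherwise."""
    return p1 * poisson_pmf(tf, lam1) + (1 - p1) * poisson_pmf(tf, lam0)

# Elite documents (lam1 = 5) put much more mass on high tf than
# non-elite ones (lam0 = 0.5):
mix_at_5 = p_tf_given_rel(5, p1=0.3, lam1=5.0, lam0=0.5)
```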
THE ELITENESS MODEL AND BM25
• But wait — why would term frequency follow a Poisson distribution? Where did this come from?
• Looking at Harter's modeling:
(1) Each document is generated by "filling word positions," and
(2) the probability distribution of which word fills each position is multinomial.
- That is, the probability of each word filling a position is fixed (not position-dependent) and independent of the other words.
(3) Therefore, for a given term, tf follows a binomial distribution, which is well approximated by a Poisson! Good — now it's clear why a Poisson distribution was assumed.
THE ELITENESS MODEL AND BM25
• So we can substitute the Poisson distributions in place of $E_{i1}$ and $E_{i0}$ and look at the result. But the result is too messy... can't we simplify it somehow?
• It would be nice to approximate $w_i^{elite}$ with a simpler function of similar "shape." So let's look at the general characteristics of $w_i^{elite}$:
(1) $w_i^{elite}(0) = 0$ (if the term frequency is 0, the weight is 0).
(2) $w_i^{elite}(tf)$ increases as $tf$ increases.
(3) But as $tf$ goes to infinity, $w_i^{elite}(tf)$ converges to $w_i^{BIM}$: saturation (there is a limit to how much influence any one term can have on a document's score).
THE ELITENESS MODEL AND BM25
• Hmm... which functions satisfy those properties? Since it is not obvious, start with a simple one:
  tf / (k + tf), for some k > 0
• This function satisfies all three properties listed earlier.
• When k is small, the function reaches its asymptote quickly; the larger k is, the longer each additional increment of tf keeps contributing.
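The saturating shape of tf / (k + tf) is easy to see numerically. A minimal sketch, with a few arbitrary k values picked only for illustration:

```python
def saturation(tf: float, k: float) -> float:
    """tf / (k + tf): zero at tf = 0, strictly increasing, asymptote at 1."""
    return tf / (k + tf)

# Small k saturates quickly; large k keeps rewarding extra occurrences.
for k in (0.5, 1.2, 5.0):
    values = [round(saturation(tf, k), 3) for tf in (0, 1, 2, 5, 20)]
    print(f"k={k}: {values}")
```

Each row climbs toward 1 but never reaches it, and the k = 0.5 row is nearly flat after the first couple of occurrences, matching the three properties above.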
THE ELITENESS MODEL AND BM25
• Combining saturation with w_i^RSJ, the approximation of the asymptotic maximum w_i^BIM, gives the first version of BM25:
  w_i(tf) = tf / (k + tf) · w_i^RSJ
• One last thing remains to be considered: document length!
THE ELITENESS MODEL AND BM25
• Document length can grow because of two factors:
(1) Verbosity : an author who says the same thing at great length (extreme case: copy-pasting the document to make it n times larger)
(2) Scope : an author who crams several topics into one document (extreme case: concatenating documents on entirely different topics)
• When the cause is verbosity, normalise by document length; when it is scope, do the opposite and leave the frequencies unnormalised.
THE ELITENESS MODEL AND BM25
• Real documents exhibit both verbosity and scope, generally as a combination of the two. With that in mind, define the following notions:
  Document length : dl = Σ_{i=1}^{|V|} tf_i
  Average doclength : avdl
  Length normalisation component : B = ((1 − b) + b · dl/avdl), 0 ≤ b ≤ 1
  (b = 1 gives full normalisation, b = 0 gives none)
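The behaviour of B at the two extremes of b can be sketched in a few lines. The avdl and dl values here are hypothetical:

```python
def length_norm(dl: float, avdl: float, b: float) -> float:
    """Length normalisation component B = (1 - b) + b * dl / avdl."""
    assert 0.0 <= b <= 1.0, "b must lie in [0, 1]"
    return (1.0 - b) + b * dl / avdl

avdl = 100.0
for b in (0.0, 0.75, 1.0):
    # A document twice the average length:
    print(f"b={b}: B={length_norm(200.0, avdl, b):.2f}")
```

At b = 0, B is always 1 (no normalisation); at b = 1, B equals dl/avdl (full normalisation); intermediate b values interpolate between the two.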
THE ELITENESS MODEL AND BM25
• Now, let us apply these terms to the saturation function:
  tf′ = tf / B
  w_i^BM25(tf) = tf′ / (k_1 + tf′) · w_i^RSJ = tf / (k_1·((1 − b) + b · dl/avdl) + tf) · w_i^RSJ
• Many experiments suggest that 0.5 < b < 0.8 and 1.2 < k_1 < 2 work well, although the optimal values are said to depend on other factors such as the type of documents and queries.
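As a sketch, the BM25 term weight above can be written directly in code. The collection statistics (N documents, n_i of them containing the term) are hypothetical, and a smoothed IDF stands in for w_i^RSJ, as is usual when no relevance information is available:

```python
from math import log

def idf(N: int, n_i: int) -> float:
    """Smoothed IDF, a common stand-in for the RSJ weight when no
    relevance judgements are available."""
    return log((N - n_i + 0.5) / (n_i + 0.5))

def bm25_weight(tf: int, dl: float, avdl: float, N: int, n_i: int,
                k1: float = 1.2, b: float = 0.75) -> float:
    """w_i^BM25 = tf / (k1 * ((1 - b) + b * dl/avdl) + tf) * w_i^RSJ."""
    B = (1.0 - b) + b * dl / avdl          # length normalisation component
    return tf / (k1 * B + tf) * idf(N, n_i)

# Hypothetical collection: 10,000 docs, term occurs in 100 of them.
print(bm25_weight(tf=3, dl=120.0, avdl=100.0, N=10_000, n_i=100))
```

Note that k1 = 1.2 and b = 0.75 sit inside the ranges quoted on the slide; the weight grows with tf but can never exceed the IDF term, which is exactly the saturation property.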
MULTIPLE STREAMS AND BM25F
• So far we have treated a document as a single body of text with no particular structure. In practice, most documents have at least minimal structure. Now assume a document is composed of fields, or streams.
Example) A paper : Title / abstract / body / references
• Title, abstract, body, and references are the streams of this document.
• The reason for splitting into streams is that some streams may influence relevance more than others (e.g., a query term matching in the title stream can be taken as stronger evidence of relevance than the same term matching in the references stream).
MULTIPLE STREAMS AND BM25F
• Now suppose we have a scoring function f that applies to any chunk of text (not divided into streams). The intuitive approach is to apply f to each stream and take a linear combination of the results.
• Under that approach, though, the Eliteness model would assign a separate eliteness property to each (stream, term) pair, meaning a term's eliteness would be independent across the streams of a single document. That makes little sense...
• A better setup: the (term, document) eliteness property is shared across the streams of the document.
MULTIPLE STREAMS AND BM25F
• Notation:
  streams : s = 1, ..., S
  stream length : sl_s
  stream weights : v_s
  document : (tf_1, ..., tf_|V|)
  tf_i vector : (tf_1i, ..., tf_Si)
MULTIPLE STREAMS AND BM25F
• A simple extension of BM25 that adds stream weights:
  t̃f_i = Σ_{s=1}^{S} v_s · tf_si
  d̃l = Σ_{s=1}^{S} v_s · sl_s
  ãvdl = average of d̃l across documents
  w_i^simpleBM25F = t̃f_i / (k_1·((1 − b) + b · d̃l/ãvdl) + t̃f_i) · w_i^RSJ
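The simple BM25F variant merges the streams first and then applies ordinary BM25. A minimal sketch; the stream weights, lengths, and the IDF value standing in for w_i^RSJ are all hypothetical:

```python
from math import log

def simple_bm25f(tfs: list, sls: list, vs: list,
                 avdl_tilde: float, rsj: float,
                 k1: float = 1.2, b: float = 0.75) -> float:
    """Merge streams with weights v_s, then apply the BM25 saturation."""
    tf_tilde = sum(v * tf for v, tf in zip(vs, tfs))   # weighted combined tf
    dl_tilde = sum(v * sl for v, sl in zip(vs, sls))   # weighted doc length
    B = (1.0 - b) + b * dl_tilde / avdl_tilde          # shared normalisation
    return tf_tilde / (k1 * B + tf_tilde) * rsj

# Hypothetical paper with title / abstract / body streams, weighted 5 / 2 / 1.
score = simple_bm25f(tfs=[1, 2, 4], sls=[8.0, 50.0, 500.0],
                     vs=[5.0, 2.0, 1.0], avdl_tilde=700.0,
                     rsj=log(10_000 / 100))   # IDF stands in for w_i^RSJ
print(score)
```

Because the title stream carries weight 5, a single title match contributes as much to t̃f_i as five body matches, which is exactly the intent of stream weighting.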
MULTIPLE STREAMS AND BM25F
• Another extension that uses a different normalisation factor per stream:
  B_s = ((1 − b_s) + b_s · sl_s/avsl_s), 0 ≤ b_s ≤ 1
  t̃f_i = Σ_{s=1}^{S} v_s · tf_si / B_s
  w_i^BM25F = t̃f_i / (k_1 + t̃f_i) · w_i^RSJ
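In this variant each stream's tf is normalised by its own B_s before the streams are combined. A sketch with hypothetical title/body statistics and an IDF value standing in for w_i^RSJ:

```python
from math import log

def bm25f(tfs: list, sls: list, avsls: list,
          vs: list, bs: list, rsj: float, k1: float = 1.2) -> float:
    """BM25F: normalise each stream's tf by its own B_s, then combine."""
    tf_tilde = 0.0
    for tf, sl, avsl, v, b in zip(tfs, sls, avsls, vs, bs):
        assert 0.0 <= b <= 1.0, "each b_s must lie in [0, 1]"
        B_s = (1.0 - b) + b * sl / avsl      # per-stream normalisation
        tf_tilde += v * tf / B_s
    return tf_tilde / (k1 + tf_tilde) * rsj  # note: no B in the denominator

# Hypothetical title / body example with per-stream b values.
score = bm25f(tfs=[1, 4], sls=[8.0, 600.0], avsls=[10.0, 500.0],
              vs=[5.0, 1.0], bs=[0.5, 0.75],
              rsj=log(10_000 / 100))         # IDF stands in for w_i^RSJ
print(score)
```

The design difference from the simple variant is where normalisation happens: here length normalisation is folded into t̃f_i stream by stream, so the outer saturation needs no length term.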
MULTIPLE STREAMS AND BM25F
• Both weighting functions perform well.
• There is a degenerate case: a stream so verbose that it contains almost every term.
• As before, when no relevance information is available, substitute IDF for the RSJ weight.
  w_i^simpleBM25F = t̃f_i / (k_1·B̃ + t̃f_i) · w_i^RSJ,   w_i^BM25F = t̃f_i / (k_1 + t̃f_i) · w_i^RSJ
  (where B̃ = (1 − b) + b · d̃l/ãvdl)
References
• The Probabilistic Relevance Framework: BM25 and Beyond, by S. Robertson and H. Zaragoza.
• Probabilistic information retrieval, Introduction to Information Retrieval : http://nlp.stanford.edu/IR-book/pdf/11prob.pdf
• IRBasic_Modeling_조근희.pdf
• 정보 검색론, 이준호