1
Iterative residual rescaling: An analysis and generalization of LSI
Rie Kubota Ando & Lillian Lee. Iterative residual rescaling: An analysis and generalization of LSI. In the 24th Annual International ACM SIGIR Conference (SIGIR 2001), 2001.
Presenter: 游斯涵
2
Introduction
• The disadvantage of VSM:
– Documents that do not share terms are mapped to orthogonal vectors even if they are clearly related, e.g. d_i = (0, 1, 0) and d_j = (1, 0, 1) have cosine similarity 0.
• LSI attempts to overcome this shortcoming by projecting the term-document matrix onto a lower-dimensional subspace.
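The orthogonality problem on the slide's two example vectors can be checked directly (a minimal sketch using NumPy):

```python
import numpy as np

# Two documents that share no terms: their VSM vectors are orthogonal,
# so cosine similarity is 0 even if the documents are topically related.
d_i = np.array([0.0, 1.0, 0.0])
d_j = np.array([1.0, 0.0, 1.0])

cos = d_i @ d_j / (np.linalg.norm(d_i) * np.linalg.norm(d_j))
print(cos)  # 0.0
```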
3
Introduction of IRR
• LSI: weight the term-document matrix A, apply the SVD A = U Σ V^T (eigenvalues / eigenvectors), and keep the leading left singular vectors.
• IRR: like LSI, but the residual vectors are rescaled before each new basis vector is extracted.
(Diagram: term-document matrix A with weighting → SVD → U, Σ, V^T; IRR adds a rescaling step.)
4
Frobenius norm and matrix 2-norm
• Frobenius norm: for X in R^{m×n} with h = rank(X),

  ||X||_F ≝ sqrt( Σ_{i=1}^m Σ_{j=1}^n X[i,j]^2 )

• 2-norm:

  ||X||_2 ≝ max_{||y||_2 = 1} ||X y||_2
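The two norms can be computed and compared numerically; a small sketch on an arbitrary example matrix:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])

# Frobenius norm: square root of the sum of squared entries.
fro = np.sqrt((X ** 2).sum())

# 2-norm: max of ||X y||_2 over unit vectors y = the largest singular value.
two = np.linalg.svd(X, compute_uv=False)[0]

assert np.isclose(fro, np.linalg.norm(X, 'fro'))
assert np.isclose(two, np.linalg.norm(X, 2))
assert two <= fro  # always holds: ||X||_2 <= ||X||_F
```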
5
Analyzing LSI
• Topic-based similarities– C: an n-document collection– D: m-by-n term-document matrix – k: underlying topics (k<n)– Relevance score:
for each document and each topic:
for each document:
True topic-based similarity between and
then we can get a n-by-n matrix S
),( dtrel
)(
2 1),(Ctopicst
dtrel
d d
)(
),(),(),(ctopicst
dtreldtrelddsim
),(],[ ddsimddS
doc
topic
Sdoc
topic
topic
doc
doc
doc
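The construction of S from the relevance scores can be sketched in a few lines; the relevance values below are hypothetical, chosen only to illustrate the normalization and the product:

```python
import numpy as np

# Hypothetical relevance scores rel(t, d): k=2 topics x n=3 documents.
rel = np.array([[0.9, 0.7, 0.1],
                [0.2, 0.4, 0.8]])

# Normalize so that sum_t rel(t, d)^2 = 1 for each document (column).
rel = rel / np.linalg.norm(rel, axis=0)

# True topic-based similarity: S[i, j] = sum_t rel(t, d_i) * rel(t, d_j).
S = rel.T @ rel

assert S.shape == (3, 3)
assert np.allclose(np.diag(S), 1.0)  # each document is fully similar to itself
```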
6
The optimum subspace
• Given a subspace χ of R^m, let B form an orthonormal basis of χ. The projection onto χ is

  P_χ(x) ≝ B B^T x

• Properties:
  – for x ∈ χ: P_χ(x) = x
  – for x ⊥ χ: P_χ(x) = 0
  – B preserves angles: cos(B x, B y) ≝ (B x)^T (B y) / (||B x|| · ||B y||) = cos(x, y)
7
The optimum subspace
• We have the m-by-n term-document matrix D = [d_1 d_2 ... d_n].
• The projection of D onto χ is P_χ(D) = [P(d_1) ... P(d_n)].
• Cosine similarity after projection:

  cos(P(d_i), P(d_j)) = P(d_i)^T P(d_j) / ( ||P(d_i)|| · ||P(d_j)|| )
8
The optimum subspace
• Deviation matrix:

  diff_{S,D}(P) ≝ S − P(D)^T P(D)

  We want a subspace for which the entries of this matrix are small.
• The optimum subspace:

  χ_opt ≝ argmin_{χ ⊆ range(D)} ||diff_{S,D}(P_χ)||_2

• Optimum error: ||diff_{S,D}(P_{χ_opt})||_2
  If the optimum error is high, we cannot expect the optimum subspace to fully reveal the topic dominances.
9
The singular value decomposition and LSI
• SVD: Z = U Σ V^T
  The left singular vectors span Z's range, and σ_1 = ||Z||_2.
• Intuition for the left singular vectors, gained by the following observation:
  let proj^{(i)}(d_j) be the projection of d_j onto the span of u_1, u_2, ..., u_i,
  and let r_j^{(i)} be the residual vector:

  r_j^{(i)} = d_j − proj^{(i−1)}(d_j)
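The projection/residual decomposition above can be verified numerically: the residuals are orthogonal to the leading left singular vectors, and their total squared size equals the sum of the discarded squared singular values. A sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((5, 4))          # m-by-n term-document matrix (toy data)
U, s, Vt = np.linalg.svd(D, full_matrices=False)

i = 2
# proj^{(i)}(d_j): projection of d_j onto span{u_1, ..., u_i}.
B = U[:, :i]
proj = B @ (B.T @ D)

# Residual vectors r_j^{(i+1)} = d_j - proj^{(i)}(d_j), collected as columns.
R = D - proj

# Residuals live in the orthogonal complement of span{u_1, ..., u_i},
# and ||R||_F^2 equals the sum of the remaining squared singular values.
assert np.allclose(B.T @ R, 0.0)
assert np.isclose((R ** 2).sum(), (s[i:] ** 2).sum())
```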
10
Analysis of LSI
11
Non-uniformity and LSI
• A crucial quantity in our analysis is the dominance δ_t of a given topic t:

  δ_t ≝ sqrt( Σ_{d ∈ C} rel(t, d)^2 )
12
Non-uniformity and LSI
• Topic mingling:

  μ(C) ≝ Σ_{t, t' ∈ topics(C), t ≠ t'} Σ_{d ∈ C} ( rel(t, d) · rel(t', d) )^2

• If the topic mingling is high, each document has high similarity with several different topics, so the topics will be fairly difficult to distinguish.
• In the single-topic-documents case, μ(C) = 0.
13
Non-uniformity and LSI
• Let σ_i be the ith largest singular value of P_{χ_opt}(D). Then σ_i^2 approximates the squared topic dominance δ_i^2, with error governed by the optimum error and the topic mingling μ(C).
(Recall: δ_t^2 = Σ_{d ∈ C} rel(t, d)^2; diff_{S,D}(P) = S − P(D)^T P(D); optimum error ||diff_{S,D}(P_{χ_opt})||_2; μ(C) = Σ_{t ≠ t'} Σ_{d ∈ C} (rel(t, d) rel(t', d))^2.)
14
Non-uniformity and LSI
• Define δ_max ≝ δ_1 and δ_min ≝ δ_h, where h is the dimension of χ_opt.
• We can then study the ratio δ_max / δ_min:
  the more the largest topic dominates the collection, the higher this ratio will tend to be.
15
Non-uniformity and LSI
• Original error:
  Let χ_VSM denote the VSM space, so that P_{χ_VSM}(Ã) = Ã. Then

  E_VSM ≝ ||S − Ã^T Ã||_2

• Root original error (the input error): ε̂_vsm ≝ sqrt(E_VSM)
16
Non-uniformity and LSI
• Let χ_LSI be the h-dimensional LSI subspace spanned by the first h left singular vectors of D.
• If δ_min > ε̂_vsm, then

  tan(∠(χ_LSI, χ_opt)) ≤ ( ε̂_vsm · δ_max ) / ( δ_min^2 · (1 − (ε̂_vsm / δ_min)^2) )

• So χ_LSI must be close to χ_opt when the topic-document distribution is relatively uniform (δ_max / δ_min small) and the input error is small.
17
Notation for related values
• μ is the topic mingling.
• For x_1 ≥ x_2 ≥ ... ≥ x_n ≥ 0 and y_1 ≥ y_2 ≥ ... ≥ y_n ≥ 0, we write x_i ≈ y_i if

  max_i |x_i^2 − y_i^2| ≤ E_max^opt  and  (1/n) Σ_{i=1}^n |x_i^2 − y_i^2| ≤ E_avg^opt

  where E_max^opt ≝ ||diff_{S,D}(P_{χ_opt})||_2 and E_avg^opt ≝ ||diff_{S,D}(P_{χ_opt})||_F / n.
• The approximation becomes closer as the optimum error E_max^opt (or the average error E_avg^opt) becomes smaller.
18
Ando’s IRR algorithm
• IRR algorithm: starting from R^{(1)} = D̃, repeatedly extract a basis vector and remove it from the residuals:

  u_i = argmax_{||x|| = 1} Σ_{j=1}^n ( ||r_j^{(i)}|| · cos(x, r_j^{(i)}) )^2

  R^{(i+1)} = R^{(i)} − proj(R^{(i)}, u_i)
19
Introduction of IRR
20
Ando’s IRR algorithm
• IRR(R^{(1)}, q) with R^{(1)} = D̃:

  u_i = argmax_{||x|| = 1} Σ_{j=1}^n pow(r_j^{(i)}, q) · ( ||r_j^{(i)}|| · cos(x, r_j^{(i)}) )^2,
  where pow(r, q) ≝ ||r||_2^q  (the rescaling step)

  R^{(i+1)} = R^{(i)} − proj(R^{(i)}, u_i)

• That is, find the unit vector x that best approximates the rescaled residuals R; the u_i span χ_IRR.
21
Ando’s IRR algorithm
IRR(q, l):
  R := D̃, with residual columns r_1, r_2, ..., r_n
  for j := 1, 2, ..., l:
    for i := 1, 2, ..., n:
      r̂_i := rescaled r_i (scale by ||r_i||^q)
    b_j := argmax_{||x|| = 1} Σ_{i=1}^n ( ||r̂_i|| · cos(x, r̂_i) )^2
    for i := 1, 2, ..., n:
      subtract from r_i its projection onto b_j
  χ_IRR := span(B), and the IRR representation of D is B B^T D, where B = [b_1 b_2 ... b_l]
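The loop above can be sketched in NumPy. This is a reconstruction, not the authors' code: the maximizing unit vector is the top eigenvector of the rescaled scatter matrix Σ_i ||r_i||^q · r_i r_i^T, and with q = 0 the procedure reduces to plain LSI (the leading left singular vectors):

```python
import numpy as np

def irr(D, q, l):
    """Sketch of IRR: at each step, take the direction that best fits the
    current residuals after rescaling each residual by its length^q, then
    subtract every residual's projection onto that direction."""
    R = D.copy()                      # R^{(1)} = D (residual matrix)
    basis = []
    for _ in range(l):
        scale = np.linalg.norm(R, axis=0) ** q   # rescaling factors ||r_i||^q
        # argmax_{||x||=1} sum_i scale_i * (x . r_i)^2
        # = top eigenvector of sum_i scale_i * r_i r_i^T.
        M = (R * scale) @ R.T
        w, V = np.linalg.eigh(M)
        b = V[:, -1]                  # eigenvector of the largest eigenvalue
        basis.append(b)
        R = R - np.outer(b, b @ R)    # subtract projection onto b
    return np.column_stack(basis)

rng = np.random.default_rng(1)
D = rng.standard_normal((6, 5))       # toy term-document matrix
B = irr(D, q=0.0, l=3)                # q = 0: no rescaling, i.e. plain LSI

U = np.linalg.svd(D)[0][:, :3]
assert np.allclose(B.T @ B, np.eye(3), atol=1e-8)   # orthonormal basis
assert np.allclose(U @ (U.T @ B), B, atol=1e-6)     # spans the LSI subspace
```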
22
Auto-scale method
• Automatic scaling-factor determination:

  f(D) ≝ ||D̃^T D̃||_F^2 / n

• Why this measures non-uniformity: since D̃^T D̃ ≈ S,

  ||D̃^T D̃||_F^2 ≈ ||S||_F^2 = Σ_d Σ_{d'} ( Σ_t rel(t, d) · rel(t, d') )^2

  and when the documents are approximately single-topic this reduces to

  ||S||_F^2 ≈ Σ_{t=1}^k ( Σ_d rel(t, d)^2 )^2 = Σ_{t=1}^k δ_t^4

  so f(D) grows with the dominance of the largest topics.
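The behaviour of f(D) can be illustrated on toy data (the two collections below are hypothetical, and the exact normalization of f is a reconstruction from the slide; only the relative comparison matters here):

```python
import numpy as np

def normalize(D):
    """Length-normalize the document (column) vectors."""
    return D / np.linalg.norm(D, axis=0)

def f(D):
    # Reconstructed from the slide: f(D) = ||D^T D||_F^2 / n, which
    # approximates sum_t delta_t^4 for single-topic collections.
    return np.linalg.norm(D.T @ D, 'fro') ** 2 / D.shape[1]

# Skewed collection: all four documents on one topic (high non-uniformity).
skewed = normalize(np.tile([[1.0], [0.1]], (1, 4)))
# Balanced collection: two topics, two documents each.
balanced = normalize(np.array([[1.0, 1.0, 0.1, 0.1],
                               [0.1, 0.1, 1.0, 1.0]]))

assert f(skewed) > f(balanced)   # more dominance -> larger f -> larger q
```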
23
Auto-scale method
• Implementing auto-scale:
  We set q to a linear function of f(D): q = α · f(D) + β
24
Dimension selection
• Stopping criterion: the residual ratio ||R^{(j)}||_F^2 / n (effective for both LSI and IRR).
25
Evaluation Metrics
• Kappa average precision:
– Pair-wise average precision: the measured similarity for any two intra-topic documents (sharing at least one topic) should be higher than for any two cross-topic documents, which have no topics in common.
– Let p_j denote the document pair with the jth largest measured cosine. Then

  prec(p_i) = #{ intra-topic pairs p_j such that j ≤ i } / i

– Correcting for chance:

  prec_κ(p_j) = ( prec(p_j) − chance ) / ( 1 − chance )

  where chance ≝ #intra-topic pairs / #document pairs (the probability that a random pair is intra-topic).
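A sketch of the metric, averaging the chance-corrected precision over the intra-topic pairs (the pair data below is hypothetical):

```python
import numpy as np

def kappa_avg_precision(cosines, intra):
    """Pair-wise average precision corrected for chance.
    cosines[j]: measured similarity of document pair j;
    intra[j]:   True if the pair shares at least one topic."""
    order = np.argsort(cosines)[::-1]          # pairs by descending cosine
    intra = np.asarray(intra)[order]
    ranks = np.arange(1, len(intra) + 1)
    prec = np.cumsum(intra) / ranks            # prec(p_j) at each cutoff j
    chance = intra.mean()                      # #intra-topic pairs / #pairs
    kappa = (prec - chance) / (1 - chance)
    return kappa[intra].mean()                 # average over intra-topic pairs

# Perfect ranking: every intra-topic pair outranks every cross-topic pair.
perfect = kappa_avg_precision([0.9, 0.8, 0.2, 0.1], [True, True, False, False])
assert np.isclose(perfect, 1.0)
```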
26
Evaluation Metrics
• Clustering:
  Let C be a cluster-topic contingency table, where C[i, j] is the number of documents in cluster i that are relevant to topic j. Define:

  S(C) = Σ_{i,j} N_{i,j} / n

  where N_{i,j} = C[i, j] if C[i, j] is the unique maximum in both its row and its column, and N_{i,j} = 0 otherwise.
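The definition of S(C) translates directly into code; the contingency table below is a hypothetical example (two clusters, two topics, n = 20 documents):

```python
import numpy as np

def clustering_score(C):
    """S(C) = sum_{i,j} N_{i,j} / n, where N_{i,j} = C[i,j] if C[i,j] is the
    unique maximum of both its row and its column, and 0 otherwise."""
    C = np.asarray(C)
    n = C.sum()
    total = 0
    for i in range(C.shape[0]):
        for j in range(C.shape[1]):
            row, col = C[i, :], C[:, j]
            unique_row_max = row.size == 1 or C[i, j] > np.delete(row, j).max()
            unique_col_max = col.size == 1 or C[i, j] > np.delete(col, i).max()
            if unique_row_max and unique_col_max:
                total += C[i, j]
    return total / n

# Hypothetical contingency table: clusters (rows) x topics (columns).
C = [[8, 1],
     [2, 9]]
assert clustering_score(C) == (8 + 9) / 20   # only the diagonal cells count
```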
27
Experimental setting
• (1) Choose two TREC topics (more than two can be chosen).
• (2) Specify seven distribution types:
  – (25,25), (30,20), (35,15), (40,10), (43,7), (45,5), (46,4)
  – Each document was relevant to exactly one of the pre-selected topics.
• (3) Extract single-word stemmed terms using TALENT and remove stop-words.
• (4) Create the term-document matrix and length-normalize the document vectors.
• (5) Implement AUTO-SCALE: set q = α · f(D) + β, where α = 3.5 and β = 0.
28
Controlled-distribution results
• The chosen scaling factor increases on average as the non-uniformity δ_max / δ_min goes up.
29
Controlled-distribution results
(Figure: example clusterings with the highest and the lowest S(C).)
30
Controlled-distribution results
31
Conclusion
• Provided a new theoretical analysis of LSI.
• Showed a precise relationship between LSI's performance and the uniformity of the underlying topic-document distribution.
• Extended Ando's IRR algorithm.
• IRR delivers very good performance in comparison to LSI.
32
IRR on summarization
(Diagram: turn the term-document matrix into a term-by-sentence matrix, apply IRR to obtain U, Σ, V^T, then use the whole document as a query to compute similarity with each sentence.)