International Workshop on Machine Learning and Text Analytics (MLTA2013)
Linear Algebra for Machine Learning and IR
Manoj Kumar Singh
DST-Centre for Interdisciplinary Mathematical Sciences (DST-CIMS)
Banaras Hindu University (BHU), Varanasi-221005, INDIA.
E-mail: manoj.dstcims@bhu.ac.in
December 15, 2013
South Asian University (SAU), New Delhi.
Contents
Vector Matrix Model in IR, ML and Other Areas
Vector Space
- Formal definition - Linear Combination - Independence - Generator and Basis
- Dimension - Inner Product, Norm, Orthogonality - Examples ($\mathbb{R}^n(\mathbb{R})$, $\mathbb{C}^n(\mathbb{C})$)
Linear Transformation
- Definition - Matrix and Determinant - LT using Matrix - Rank and Nullity
- Column Space and Row Space - Invertibility - Singularity and Non-Singularity - Eigenvalues and Eigenvectors - Linear Algebra
Different Types of Matrices and Matrix Algebra
Matrix Factorization
Applications
Vector Matrix Model in IR
A collection consisting of the following five documents is queried for latent semantic indexing (q):
d1 = LSI tutorials and fast tracks.
d2 = Books on semantic analysis.
d3 = Learning latent semantic indexing.
d4 = Advances in structures and advances in indexing.
d5 = Analysis of latent structures.
Rank the documents in decreasing order of relevance to the query.
Recommendation System: Item-based collaborative filtering
Item1 Item2 Item3 Item4 Item5
Alice 5 3 4 4 ?
User1 3 1 2 3 3
User2 4 3 4 3 5
User3 3 3 1 5 4
User4 1 5 5 2 1
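As a concrete illustration of the vector-matrix view, here is a minimal item-based collaborative-filtering sketch in NumPy that predicts Alice's missing rating for Item5 from the table above. The plain (unadjusted) cosine similarity and the similarity-weighted average are assumptions for illustration; the slide does not prescribe a particular weighting scheme.

```python
import numpy as np

# Ratings from the table; rows are Alice, User1..User4, columns Item1..Item5.
R = np.array([
    [5, 3, 4, 4, 0],   # Alice; Item5 unknown (0 is just a placeholder)
    [3, 1, 2, 3, 3],
    [4, 3, 4, 3, 5],
    [3, 3, 1, 5, 4],
    [1, 5, 5, 2, 1],
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

others = R[1:]                      # users who rated every item
item5 = others[:, 4]
# Similarity of each of Item1..Item4 to Item5, over the users who rated Item5.
sims = np.array([cosine(others[:, j], item5) for j in range(4)])

# Predict Alice's rating as a similarity-weighted average of her known ratings.
pred = sims @ R[0, :4] / sims.sum()
print(f"Predicted rating for Alice on Item5: {pred:.2f}")
```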
Classification
Blind Source Separation
Cocktail Party Problem
Humans are capable of steering their auditory attention, i.e., identifying a source of interest. Blind source separation (BSS) goes further: it separates the sources themselves, which is very close to a solution of the cocktail party problem.
What, who, from where?
Confused Computer in a Cocktail Party Situation
In a multiple-speaker environment, microphones collect a hotchpotch of speech!
The measured signals are modelled as mixtures of the sources: $x = As$, where the mixing matrix $A = [a_{ij}]_{m \times n}$.
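The slide only states the mixing model $x = As$; as a sketch of how the sources can be recovered, here is independent component analysis on synthetic signals standing in for microphone recordings (assumes scikit-learn's FastICA is available; the signals and mixing matrix are illustrative, not from the slide).

```python
import numpy as np
from sklearn.decomposition import FastICA  # assumption: scikit-learn is installed

# Two independent sources s, mixed as x = A s (x: what the microphones measure).
t = np.linspace(0, 8, 2000)
s = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])                 # mixing matrix
x = s @ A.T                                 # measured (mixed) signals

ica = FastICA(n_components=2, random_state=0)
s_hat = ica.fit_transform(x)                # recovered sources, up to scale/permutation
A_hat = ica.mixing_                         # estimate of the mixing matrix
```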
Imaging Application
Figure 1. PSF of the components in an FPA imaging system: the input scene $I(x,y)$ passes through the lens ($h_{optics}(x,y)$), detector ($h_{det}(x,y)$), sample-and-hold circuit ($h_{sh}(x,y)$), waveguide array ($h_{WG}(x,y)$), receiver ($h_{Rx}(x,y)$), electronics ($h_{elec}(x,y)$), display ($h_{disp}(x,y)$) and human eye ($h_{eye}(x,y)$) to produce the output scene $O(x,y)$. Restoring the scene amounts to solving the linear system $\hat{f} = H^{-1} y$.
Vector Space

Def.: An algebraic structure $(V, F, \oplus, +, \cdot, \odot)$ with sets $V$, $F$ and binary operations
$\oplus: V \times V \to V$, $+: F \times F \to F$, $\cdot: F \times F \to F$, $\odot: F \times V \to V$
is a vector space if:

$(V, \oplus)$ is an abelian group:
i. Associativity: $(a \oplus b) \oplus c = a \oplus (b \oplus c)$, $\forall a, b, c \in V$
ii. Identity: $\exists\, e \in V$ s.t. $a \oplus e = e \oplus a = a$, $\forall a \in V$
iii. Inverse: $\forall a \in V$, $\exists\, a^{-1} \in V$ s.t. $a \oplus a^{-1} = a^{-1} \oplus a = e$
iv. Commutativity: $a \oplus b = b \oplus a$, $\forall a, b \in V$

$(F, +, \cdot)$ is a field:
i. $(F, +)$ is an abelian group.
ii. $(F^{*}, \cdot)$ is an abelian group, where $F^{*} = F - \{0\}$.
iii. Multiplication is distributive over addition: $a \cdot (b + c) = a \cdot b + a \cdot c$, $\forall a, b, c \in F$.

Scalar multiplication satisfies the following:
i. $\alpha \odot a \in V$, $\forall a \in V$, $\alpha \in F$
ii. $\alpha \odot (a \oplus b) = \alpha \odot a \oplus \alpha \odot b$, $\forall a, b \in V$, $\alpha \in F$
iii. $(\alpha + \beta) \odot a = \alpha \odot a \oplus \beta \odot a$, $\forall a \in V$, $\alpha, \beta \in F$
iv. $(\alpha \cdot \beta) \odot a = \alpha \odot (\beta \odot a)$, $\forall a \in V$, $\alpha, \beta \in F$
v. $1 \odot a = a$, $\forall a \in V$, where 1 is the unity element of $F$.
Vector Space
Note: 1. Elements of $V$ are called vectors and elements of $F$ scalars.
2. "Vector" here does not mean a vector quantity (a directed line segment) as defined in vector algebra.
3. We say "vector space $V$ over the field $F$" and denote it $V(F)$.
Linear Algebra: A vector space $(V, F, \oplus, +, \cdot, \odot)$ is called a linear algebra over the field $F$ if there is an additional operation $\ast: V \times V \to V$, called multiplication of vectors, satisfying the following postulates:
i. $a \ast b \in V$, $\forall a, b \in V$
ii. $a \ast (b \ast c) = (a \ast b) \ast c$, $\forall a, b, c \in V$
iii. $a \ast (b \oplus c) = (a \ast b) \oplus (a \ast c)$, $\forall a, b, c \in V$
iv. $\alpha \odot (a \ast b) = (\alpha \odot a) \ast b$, $\forall a, b \in V$, $\alpha \in F$
If there is an element $1$ in $V$ such that $1 \ast a = a \ast 1 = a$, $\forall a \in V$, then $V$ is a linear algebra with identity, and $1$ is called the identity of $V$.
The algebra $V(F)$ is commutative if $a \ast b = b \ast a$, $\forall a, b \in V$.
Vector Space

Linear Combination: Let $V(F)$ be a vector space. Any vector $a = \alpha_1 a_1 + \alpha_2 a_2 + \dots + \alpha_n a_n$, where $\alpha_1, \alpha_2, \dots, \alpha_n \in F$, is called a linear combination of the vectors $a_1, a_2, \dots, a_n$.

Subspace: Let $V(F)$ be a vector space. $W \subseteq V$ is called a subspace of $V$ if $W(F)$ is itself a vector space w.r.t. the operations of $V(F)$.

Generator: Let $V(F)$ be a vector space and $S \subseteq V$. If $U$ is a subspace of $V$ containing $S$ and contained in every subspace of $V$ containing $S$, then $U$ is the smallest subspace of $V$ containing $S$. This subspace $U$ is called the subspace of $V$ generated (or spanned) by $S$, and is denoted $[S]$; i.e., $U = [S]$.

e.g. $W = \{(x, 2y, 3z) : x, y, z \in \mathbb{R}\}$ is a subspace of $\mathbb{R}^3(\mathbb{R})$.
$W = \{(x, y, 0) : x, y \in \mathbb{R}\}$ is a subspace of $\mathbb{R}^3(\mathbb{R})$.
Let $V(F)$ be the vector space of all $n \times 1$ matrices over $F$ and $A$ an $m \times n$ matrix over $F$; then $W = \{x \in V : Ax = 0\}$ is a subspace of $V$.
Vector Space

Linear Span: Let $V(F)$ be a vector space and $S \subseteq V$. The linear span of $S$, $L(S)$, is the set of all linear combinations of finite sets of elements of $S$:
$L(S) = \{\alpha_1 a_1 + \alpha_2 a_2 + \alpha_3 a_3 + \dots + \alpha_n a_n : \alpha_i \in F,\ \{a_1, a_2, a_3, \dots, a_n\} \subseteq S\}$
Note: $L(S)$ is a subspace of $V(F)$ and $L(S) = [S]$.

Linear Dependence (LD): Let $V(F)$ be a vector space. $\{a_1, a_2, \dots, a_n\} \subseteq V$ is said to be LD if $\exists\, \alpha_1, \alpha_2, \dots, \alpha_n \in F$ s.t. $\alpha_1 a_1 + \alpha_2 a_2 + \dots + \alpha_n a_n = 0$ with some $\alpha_i \ne 0$.

Linear Independence (LI): Let $V(F)$ be a vector space. $\{a_1, a_2, \dots, a_n\} \subseteq V$ is said to be LI if $\alpha_1 a_1 + \alpha_2 a_2 + \dots + \alpha_n a_n = 0$, $\alpha_i \in F$, implies $\alpha_i = 0$ for $1 \le i \le n$.

Basis: $S$ is a basis of the vector space $V(F)$ if i) $S$ consists of LI elements, and ii) $V = [S] = L(S)$.

Dimension: $V(F)$ is said to be finite dimensional if there is a finite subset $S \subseteq V$ such that $V = L(S) = [S]$. The number of elements in a basis of a finite dimensional $V(F)$ is the dimension of $V(F)$.

e.g. $S_1 = \{(1,0,0), (0,1,0), (0,0,1)\}$ and $S_2 = \{(1,0,0), (1,1,0), (1,1,1)\}$ are bases of $\mathbb{R}^3(\mathbb{R})$.
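A quick numerical check of this example (a sketch using NumPy's rank computation): $n$ vectors in $\mathbb{R}^n$ form a basis iff the matrix built from them has full rank.

```python
import numpy as np

S1 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
S2 = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1]])

for name, S in (("S1", S1), ("S2", S2)):
    # Full rank <=> the rows are LI <=> the set is a basis of R^3.
    print(name, "is a basis of R^3(R):", np.linalg.matrix_rank(S) == 3)
```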
Vector Space

Inner Product: An inner product on a vector space $V(\mathbb{R}/\mathbb{C})$ is a function $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{R}/\mathbb{C}$ which assigns to each ordered pair of vectors $a, b$ in $V$ a scalar $\langle a, b \rangle$ such that
i. $\langle a, b \rangle = \overline{\langle b, a \rangle}$ [the bar denotes the complex conjugate]
ii. $\langle \alpha a + \beta b, c \rangle = \alpha \langle a, c \rangle + \beta \langle b, c \rangle$
iii. $\langle a, a \rangle \ge 0$, and $\langle a, a \rangle = 0 \iff a = 0$

e.g. $V = C[a, b]$: $\langle x, y \rangle = \int_a^b x(t)\, y(t)\, dt$.
$V = \mathbb{R}^n$: the inner product of $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$ is $\langle x, y \rangle = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$.

Norm / Length: the length of a vector $x$ in $V(F)$ is $\|x\| = \sqrt{\langle x, x \rangle}$.

Distance: the distance between two vectors $x, y$ in $V(F)$ is $d(x, y) = \|x - y\| = \sqrt{\langle x - y, x - y \rangle}$.
Note: $(V, d)$ is a metric space.

Orthogonality: Let $(V, \langle\cdot,\cdot\rangle)$ be an inner product space and $x, y \in V$. The vectors $x$ and $y$ are said to be orthogonal to each other if $\langle x, y \rangle = 0$.

Gram-Schmidt: an orthogonal set (not containing $0$) is LI; LI does not imply orthogonality, but an LI set can always be orthogonalized (Gram-Schmidt). Orthonormal: orthogonal with $\|x\| = 1$.
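A minimal classical Gram-Schmidt sketch in NumPy (it assumes the input vectors are LI, as the procedure requires):

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn an LI set of vectors into an orthonormal set."""
    basis = []
    for v in vectors:
        # Subtract the components of v along the vectors found so far.
        w = v - sum((v @ b) * b for b in basis)
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

# The LI (but not orthogonal) basis S2 from the previous slide.
Q = gram_schmidt(np.array([[1., 0., 0.], [1., 1., 0.], [1., 1., 1.]]))
print(np.allclose(Q @ Q.T, np.eye(3)))   # True: the rows are orthonormal
```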
Linear Transformation

Definition (LT): Let $U(F)$ and $V(F)$ be two vector spaces. A linear transformation from $U$ into $V$ is a function $T: U \to V$ such that
$T(\alpha x + \beta y) = \alpha T(x) + \beta T(y)$, $\forall \alpha, \beta \in F$ and $x, y \in U$.

Linear Operator: A linear operator on $V(F)$ is a function $T: V \to V$ such that $T(\alpha x + \beta y) = \alpha T(x) + \beta T(y)$, $\forall \alpha, \beta \in F$ and $x, y \in V$.

Range Space of LT: Let $T: U(F) \to V(F)$ be a LT. The range space of $T$, $R(T)$, is given as follows:
$R(T) = \{T(x) \in V : x \in U\}$

Null Space of LT: Let $T: U(F) \to V(F)$ be a LT. The null space of $T$, $N(T)$, is given as follows:
$N(T) = \{x \in U : T(x) = 0 \in V\}$

Note:
1. $R(T) \subseteq V$ is a subspace of $V$; $N(T) \subseteq U$ is a subspace of $U$.
2. If $U(F)$ is finite dimensional, then $R(T)$ is also finite dimensional.

Rank and Nullity of LT:
1. Rank: the dimension of the range space of the LT, $\rho(T) = \dim(R(T))$.
2. Nullity: the dimension of the null space of the LT, $\nu(T) = \dim(N(T))$.
Note: for $T: U(F) \to V(F)$, $\rho(T) + \nu(T) = \dim(U)$.

Non-Singular Transform: A LT $T: U \to V$ is non-singular if $N(T) = \{0\}$, i.e., $x \in U$ and $T(x) = 0 \implies x = 0$.
Singular Transform: A LT $T: U \to V$ is singular if $\exists\, x (\ne 0) \in U$ such that $T(x) = 0$.
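A numerical illustration of the rank-nullity relation above (a sketch; the 3x4 matrix is an arbitrary example, and SciPy's null_space is assumed to be available):

```python
import numpy as np
from scipy.linalg import null_space  # assumption: SciPy is installed

# T: R^4 -> R^3 represented by a 3x4 matrix (row 3 = row 1 + row 2).
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])

rank = np.linalg.matrix_rank(A)      # rho(T) = dim R(T)
nullity = null_space(A).shape[1]     # nu(T)  = dim N(T)
print(rank, nullity, rank + nullity == A.shape[1])   # 2 2 True
```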
Matrices

Definition: A set of $mn$ elements of any field $F$ arranged in the form of a rectangular array having $m$ rows and $n$ columns is called an $m \times n$ matrix over the field $F$:
$A = [a_{ij}]_{m \times n} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$
If $m = n$, the matrix is called a square matrix, and the elements $a_{ij}$ for which $i = j$ constitute its principal diagonal.

Unit / Identity Matrix: $I_n = [\delta_{ij}]_{n \times n}$, where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \ne j$:
$I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$

Diagonal Matrix: a square matrix $A = [a_{ij}]_{n \times n}$ for which $a_{ij} = 0$ for $i \ne j$, e.g. $D = \mathrm{diag}(d_1, 0, d_3)$ in the $3 \times 3$ case.

Scalar Matrix: a diagonal matrix $A = [a_{ij}]_{n \times n}$ with $a_{ii} = k$ for some fixed $k$, i.e.
$S = \begin{bmatrix} k & 0 & \cdots & 0 \\ 0 & k & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & k \end{bmatrix}_{n \times n} = kI_n$
If $A$ is any matrix and $S$ is a scalar matrix, then $SA = AS = kA$.
Matrices

Upper Triangular Matrix: a square matrix $A = [a_{ij}]_{n \times n}$ is upper triangular if $a_{ij} = 0$ whenever $i > j$:
$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3n} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{bmatrix}$

Lower Triangular Matrix: a square matrix $A = [a_{ij}]_{n \times n}$ is lower triangular if $a_{ij} = 0$ whenever $i < j$:
$A = \begin{bmatrix} a_{11} & 0 & 0 & \cdots & 0 \\ a_{21} & a_{22} & 0 & \cdots & 0 \\ a_{31} & a_{32} & a_{33} & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}$

Symmetric: a square matrix $A = [a_{ij}]_{n \times n}$ is symmetric if $a_{ij} = a_{ji}$, $\forall i, j$, e.g.
$D = \begin{bmatrix} a & b & c \\ b & e & d \\ c & d & f \end{bmatrix}_{3 \times 3}$

Skew Symmetric: a square matrix $A = [a_{ij}]_{n \times n}$ is skew symmetric if $a_{ij} = -a_{ji}$, $\forall i, j$, e.g.
$D = \begin{bmatrix} 0 & h & g \\ -h & 0 & f \\ -g & -f & 0 \end{bmatrix}_{3 \times 3}$
Matrices

Transpose: For $A = [a_{ij}]_{m \times n}$, the $n \times m$ matrix obtained from $A$ by changing its rows into columns and its columns into rows is called the transpose of $A$, denoted $A'$ or $A^T$. e.g.
$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \\ 3 & 4 & 2 & 1 \end{bmatrix}_{3 \times 4}, \quad A^T = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 2 \\ 4 & 1 & 1 \end{bmatrix}_{4 \times 3}$

Trace: For a square matrix $A = [a_{ij}]_{n \times n}$, the sum of the main diagonal elements of $A$ is the trace of the matrix: $\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}$.

Addition: If $A = [a_{ij}]_{m \times n}$ and $B = [b_{ij}]_{m \times n}$, then $C = A \pm B$ is defined as $c_{ij} = a_{ij} \pm b_{ij}$. Note: $(M, +)$ is an abelian group.

Scalar Mult.: $kA = Ak = [k\, a_{ij}]_{m \times n}$, where $k$ is a scalar.

Multiplication: For $A = [a_{ij}]_{m \times n}$ and $B = [b_{ij}]_{n \times p}$, $AB$ is possible when the number of columns in $A$ is equal to the number of rows in $B$. $AB$ is the matrix $C = [c_{ik}]_{m \times p}$ such that
$c_{ik} = \sum_{j=1}^{n} a_{ij}\, b_{jk}$

Row / Column Vector Representation of a Matrix:
The $i$th row of the matrix is denoted by the vector $r_i = (a_{i1}, a_{i2}, a_{i3}, \dots, a_{in})$.
The $j$th column of the matrix is denoted by the vector $c_j = (a_{1j}, a_{2j}, a_{3j}, \dots, a_{mj})$.
$A_{m \times n} = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{bmatrix} = \begin{bmatrix} c_1, c_2, c_3, \dots, c_n \end{bmatrix}$
The row vectors $r_1, r_2, \dots, r_m \in V_n(F)$ and the column vectors $c_1, c_2, \dots, c_n \in V_m(F)$.
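The transpose, trace and product rules above in NumPy (a small sketch reusing the slide's 3x4 example; the second matrix B is an arbitrary choice so that the product is defined):

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [2, 3, 4, 1],
              [3, 4, 2, 1]])        # the 3x4 matrix from the transpose example

print(A.T.shape)                    # (4, 3): rows and columns interchanged
print(np.trace(A @ A.T))            # trace of a square matrix: sum of its diagonal
B = np.arange(8).reshape(4, 2)      # 4x2: cols of A == rows of B, so AB exists
C = A @ B                           # c_ik = sum_j a_ij * b_jk
print(C.shape)                      # (3, 2)
```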
Matrices

Row Space and Row Rank of a Matrix: Let $R = \{r_1, r_2, \dots, r_m\}$; the linear span $L(R) \subseteq V_n(F)$ is called the row space of the matrix. The row rank of the matrix is $\rho_r(A) = \dim(L(R)) \le \dim(V_n(F)) = n$.

Column Space and Column Rank of a Matrix: Let $C = \{c_1, c_2, \dots, c_n\}$; the linear span $L(C) \subseteq V_m(F)$ is called the column space of the matrix. The column rank of the matrix is $\rho_c(A) = \dim(L(C)) \le \dim(V_m(F)) = m$.

Rank of a Matrix: $\rho(A) = \min(\rho_r(A), \rho_c(A))$.
$\rho_r(A) = m \iff \{r_1, r_2, \dots, r_m\}$ is LI; $\rho_c(A) = n \iff \{c_1, c_2, \dots, c_n\}$ is LI.
Determinant of a Square Matrix:
Let $f(x_1, x_2, \dots, x_n)$ be a scalar function (not a vector or a matrix function) of the rows $x_1, x_2, \dots, x_n$ of $A$, called the determinant of $A$, satisfying the following conditions:
(i) $f(x_1, \dots, c\,x_i, \dots, x_n) = c\, f(x_1, \dots, x_i, \dots, x_n)$, where $c$ is a scalar. This condition means that if any row is multiplied by a scalar, then it is equivalent to multiplying the whole determinant by the scalar.
(ii) $f(x_1, \dots, x_i + c\,x_j, \dots, x_n) = f(x_1, \dots, x_i, \dots, x_n)$. If a scalar multiple of the $j$th row (col.) is added to the $i$th row (col.), the value of the determinant remains the same.
(iii) If $x_i$ is written as a sum of two vectors, $x_i = y_i + z_i$, then $f(x_1, \dots, y_i + z_i, \dots, x_n) = f(x_1, \dots, y_i, \dots, x_n) + f(x_1, \dots, z_i, \dots, x_n)$. This means that if the $i$th row (col.) is split as a sum of two vectors (cols.) $y_i, z_i$, then the determinant becomes a sum of two determinants.
(iv) $f(e_1, e_2, \dots, e_i, \dots, e_n) = 1$, where $e_1, e_2, \dots, e_n$ are the basic unit vectors. This condition says that the determinant of the identity matrix is 1.
Determinant

The conditions (i)-(iv), called the postulates, define the determinant of a square matrix. Standard notation: $|A|$ or $\det(A)$ = determinant of $A$.

Some Properties of the Determinant:
i. The determinant of a $2 \times 2$ matrix $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ is $|A| = a_{11}a_{22} - a_{12}a_{21}$.
ii. The determinant of a square null matrix is zero. The determinant of a square matrix with one or more null rows or columns is zero.
iii. The determinant of a diagonal matrix is the product of the diagonal elements.
iv. The determinant of a triangular matrix is the product of the diagonal elements.
v. If any two rows (cols.) are interchanged, the value of the determinant of the new matrix is $-1$ times the value of the original determinant. The value of the determinant of a matrix of real numbers can be negative, positive, or zero.
vi. From postulate (ii) the value of the determinant remains the same if any multiple of any row (col.) is added to any other row (col.). Thus if one or more rows (cols.) are LD on other rows (cols.), these dependent rows (cols.) can be made null by linear operations, and hence the determinant is zero. $|A| \ne 0$ iff all rows (cols.) form an LI set of vectors, i.e. $\rho_r(A) = \rho_c(A) = n$.
vii. For $n \times n$ matrices $A$ and $B$: $|AB| = |A|\,|B|$.
viii. Let $A$ be any $n \times n$ matrix. The matrix $B$, if it exists, such that $AB = BA = I_n$ is denoted $B = A^{-1}$.
ix. $AA^{-1} = A^{-1}A = I \implies |AA^{-1}| = |A^{-1}A| = |I| = 1 \implies |A|\,|A^{-1}| = 1 \implies |A^{-1}| = \dfrac{1}{|A|}$.
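Several of these properties checked numerically (a sketch with an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

det = np.linalg.det
print(np.isclose(det(A @ B), det(A) * det(B)))        # vii: |AB| = |A||B|
print(np.isclose(det(np.linalg.inv(A)), 1 / det(A)))  # ix:  |A^-1| = 1/|A|

A[1] = 2 * A[0]                                       # make the rows LD
print(np.isclose(det(A), 0))                          # vi: dependent rows => |A| = 0
```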
Cofactor Expansion

Minors: Let $A = [a_{ij}]$ be an $n \times n$ matrix. Deleting rows and columns of $A$ leaves a square submatrix whose determinant is called a minor. If the $i$th row and the $j$th column are deleted, the determinant of the resulting submatrix is called the minor of $a_{ij}$.
e.g. for $A = \begin{bmatrix} 2 & 0 & 1 \\ 1 & 2 & 4 \\ 0 & 1 & 5 \end{bmatrix}$: the minor of $a_{11}$ is $\begin{vmatrix} 2 & 4 \\ 1 & 5 \end{vmatrix}$, and the minor of $a_{12}$ is $\begin{vmatrix} 1 & 4 \\ 0 & 5 \end{vmatrix}$.

Leading Minors: If the submatrices are formed by deleting the rows and columns from the 2nd onward, from the 3rd onward, and so on, then the corresponding minors are called the leading minors:
$\begin{vmatrix} 2 \end{vmatrix}, \quad \begin{vmatrix} 2 & 0 \\ 1 & 2 \end{vmatrix}, \quad \begin{vmatrix} 2 & 0 & 1 \\ 1 & 2 & 4 \\ 0 & 1 & 5 \end{vmatrix}$

Cofactors: Let $A = [a_{ij}]$ be an $n \times n$ matrix. The cofactor of $a_{ij}$ is defined as $(-1)^{i+j}$ times the minor of $a_{ij}$. That is, if the cofactor and minor of $a_{ij}$ are denoted by $C_{ij}$ and $M_{ij}$ respectively, then
$C_{ij} = (-1)^{i+j} M_{ij}$
Cofactor Expansion

Evaluation of the Determinant: Let $A = [a_{ij}]$ be an $n \times n$ matrix, and let the cofactor and minor of $a_{ij}$ be denoted by $C_{ij}$ and $M_{ij}$. Then
$|A| = a_{i1} C_{i1} + a_{i2} C_{i2} + \dots + a_{in} C_{in} = a_{i1} (-1)^{i+1} M_{i1} + a_{i2} (-1)^{i+2} M_{i2} + \dots + a_{in} (-1)^{i+n} M_{in}$, for any $i = 1, 2, \dots, n$.

Cofactor Matrix: Let $A = [a_{ij}]$ be an $n \times n$ matrix with the cofactor of $a_{ij}$ denoted by $C_{ij}$. Then the cofactor matrix of $A$ is
$\mathrm{cof}(A) = \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & & & \vdots \\ C_{n1} & C_{n2} & \cdots & C_{nn} \end{bmatrix}$

Inverse of a Matrix: Let $A = [a_{ij}]$ be an $n \times n$ matrix. The inverse of $A$, if it exists, is given by
$A^{-1} = \dfrac{1}{|A|}\,[\mathrm{cof}(A)]^T, \quad |A| \ne 0$

Singular and Non-Singular Matrix: A square matrix $A = [a_{ij}]_{n \times n}$ is said to be non-singular or singular according as $|A| \ne 0$ or $|A| = 0$.
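A direct (if inefficient) implementation of the cofactor/adjugate formula, checked against NumPy's inverse; a sketch suited only to small matrices:

```python
import numpy as np

def cofactor_inverse(A):
    """A^-1 = cof(A)^T / |A|, via minors M_ij and cofactors C_ij = (-1)^(i+j) M_ij."""
    n = A.shape[0]
    d = np.linalg.det(A)
    if np.isclose(d, 0):
        raise ValueError("singular matrix: |A| = 0")
    cof = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T / d

A = np.array([[2., 0., 1.], [1., 2., 4.], [0., 1., 5.]])   # matrix from the minors example
print(np.allclose(cofactor_inverse(A), np.linalg.inv(A)))  # True
```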
Cofactor Expansion

Invertibility of a Matrix:
Rank of a Matrix: A number $r$ is said to be the rank of a matrix $A$ if it possesses the following two properties:
i. There is at least one square submatrix of $A$ of size $r \times r$ whose determinant is not zero.
ii. If the matrix contains any square submatrix of size $(r+1) \times (r+1)$, then the determinant of every such square submatrix must be zero.

The following are equivalent statements:
$A^{-1}$ exists $\iff AA^{-1} = A^{-1}A = I \iff |A| \ne 0 \iff A$ is non-singular $\iff \rho(A) = n \iff \rho_r(A) = n \iff \rho_c(A) = n \iff R = \{r_1, r_2, \dots, r_n\}$ is LI $\iff C = \{c_1, c_2, \dots, c_n\}$ is LI.
LT using Matrix

Let $T: U(F) \to V(F)$ be a LT, and let $B = \{\alpha_1, \alpha_2, \dots, \alpha_n\}$ and $B' = \{\beta_1, \beta_2, \dots, \beta_m\}$ be ordered bases for $U$ and $V$. Then each of the $n$ vectors $T(\alpha_j) \in V$ is uniquely expressed as a linear combination of elements of $B'$:
$T(\alpha_j) = a_{1j}\beta_1 + a_{2j}\beta_2 + \dots + a_{mj}\beta_m = \sum_{i=1}^{m} a_{ij}\,\beta_i$
i.e.
$[T; B, B'] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}_{m \times n}$
is the matrix of $T$ relative to $B$, $B'$; its $j$th column holds the coordinates of $T(\alpha_j)$ w.r.t. $B'$.

Example: Let $T$ be a LT on the vector space $V_2(F)$ defined by $T(a, b) = (a, 0)$. Find the matrix of $T$ relative to the standard basis $B = \{e_1, e_2\} = \{(1,0), (0,1)\}$.
$T(e_1) = T(1,0) = (1,0) = 1(1,0) + 0(0,1) = 1e_1 + 0e_2$
$T(e_2) = T(0,1) = (0,0) = 0(1,0) + 0(0,1) = 0e_1 + 0e_2$
The matrix of $T$ relative to the ordered basis $B$ is $[T; B] = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$.
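Building the matrix of a linear map column by column from its action on the standard basis, exactly as in the example (a small sketch):

```python
import numpy as np

def matrix_of(T, dim):
    """Matrix of T: R^dim -> R^dim w.r.t. the standard basis.
    Column j holds the coordinates of T(e_j)."""
    return np.column_stack([T(e) for e in np.eye(dim)])

T = lambda v: np.array([v[0], 0.0])   # the slide's map T(a, b) = (a, 0)
print(matrix_of(T, 2))                # [[1. 0.]
                                      #  [0. 0.]]
```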
Eigenvalues and Eigenvectors

Eigenvalue and Eigenvector of an LT: Let $T: V(F) \to V(F)$ be a LT. The scalar $c \in F$ is called an eigenvalue of $T$ if $\exists\, x (\ne 0) \in V$ such that
$T(x) = cx$
The vector $x$ is then called an eigenvector corresponding to the eigenvalue $c$.
$T(x) = cx \implies T(x) = cI(x)$, where $I$ is the identity transform $\implies T(x) - cI(x) = 0 \implies (T - cI)(x) = 0 \implies T'(x) = 0$, where $T' = T - cI$ is a LT. $\exists\, x (\ne 0) \in V$ such that $T'(x) = 0 \implies T'$ is singular $\implies \det(T') = 0$.

Eigenvalue and Eigenvector of a Matrix: Let $A$ be an $n \times n$ matrix. Consider the equation
$Ax = \lambda x$
where $\lambda$ is a scalar and $x$ is an $n \times 1$ vector. The null vector is a trivial solution of this equation. If the equation has a solution for some $\lambda$ and a non-null $x$, then $\lambda$ is called an eigenvalue (or characteristic or latent root) of $A$, and the non-null $x$ satisfying the equation for that particular $\lambda$ is called an eigenvector (characteristic vector, latent vector) corresponding to that eigenvalue $\lambda$.
$Ax = \lambda x \implies Ax = \lambda I x \implies (A - \lambda I)x = 0$ is a homogeneous linear system; it has a non-null solution $\iff A - \lambda I$ is singular $\iff |A - \lambda I| = 0$.
Eigenvalues and Eigenvectors

Properties:
i. The eigenvalues of a diagonal matrix are its diagonal elements.
ii. The eigenvalues of a triangular (upper or lower) matrix are its diagonal elements.
iii. The eigenvalues of a scalar matrix with diagonal elements $c$ are $c$ repeated $n$ times.
iv. The eigenvalues of an identity matrix are 1 repeated $n$ times.
v. $|A| = \lambda_1 \lambda_2 \lambda_3 \cdots \lambda_n$.
vi. The matrix $A$ is singular iff at least one of its eigenvalues is zero.
vii. $\mathrm{tr}(A) = a_{11} + a_{22} + \dots + a_{nn} = \lambda_1 + \lambda_2 + \lambda_3 + \dots + \lambda_n$.
viii. $A$ and $A^T$ have the same eigenvalues.
ix. Eigenvectors corresponding to different eigenvalues are LI.
x. If $x_1, x_2$ are two eigenvectors corresponding to the same eigenvalue, then $c_1 x_1 + c_2 x_2$ is also an eigenvector for the same eigenvalue.
xi. The eigenvalues of a real symmetric matrix are real.
xii. Eigenvectors corresponding to different eigenvalues of a real symmetric matrix are orthogonal.
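Properties v, vii and xii checked numerically on a real symmetric matrix (a sketch; the matrix is an arbitrary example with distinct eigenvalues 1, 2 and 4):

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])                        # real symmetric

lam, X = np.linalg.eig(A)
print(np.isclose(np.prod(lam), np.linalg.det(A)))   # v:   |A| = product of eigenvalues
print(np.isclose(np.sum(lam), np.trace(A)))         # vii: tr(A) = sum of eigenvalues
print(np.allclose(X.T @ X, np.eye(3)))              # xii: distinct eigenvalues of a
                                                    # symmetric matrix give orthogonal
                                                    # (unit) eigenvectors
```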
Similarity of Matrices

Def.: Let $A$ and $B$ be square matrices of order $n$. Then $B$ is said to be similar to $A$ if there exists a non-singular matrix $P$ such that
$B = P^{-1}AP$

Diagonalizable Matrix: A matrix $A$ is said to be diagonalizable if it is similar to a diagonal matrix. Thus $A$ is diagonalizable if there exists an invertible matrix $P$ such that
$P^{-1}AP = D$, where $D$ is a diagonal matrix.

Note:
1. Similarity is an equivalence relation.
2. If the matrix $A$ is similar to a diagonal matrix $D$, then the diagonal elements of $D$ are the eigenvalues of $A$.

i. An $n \times n$ matrix is diagonalizable iff it possesses $n$ LI eigenvectors.
ii. If the eigenvalues of an $n \times n$ matrix are all distinct, then it is always similar to a diagonal matrix.
iii. Two $n \times n$ matrices with the same set of $n$ distinct eigenvalues are similar.
iv. $P^{-1}AP = D \iff A = PDP^{-1}$ is the eigenvalue decomposition (EVD).
v. Spectral Decomposition (for a symmetric matrix): a square symmetric matrix $A$ can be expressed in terms of its eigenvalue-eigenvector pairs $(\lambda_i, e_i)$ as
$A = \lambda_1 e_1 e_1^T + \lambda_2 e_2 e_2^T + \lambda_3 e_3 e_3^T + \dots + \lambda_n e_n e_n^T$
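The spectral decomposition and the EVD form verified with NumPy's symmetric eigensolver (a sketch, reusing the symmetric example from the previous slide):

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])                    # symmetric matrix

lam, E = np.linalg.eigh(A)                      # eigh: orthonormal eigenvectors
# Spectral decomposition: A = sum_i lambda_i e_i e_i^T
A_rebuilt = sum(l * np.outer(e, e) for l, e in zip(lam, E.T))
print(np.allclose(A, A_rebuilt))                # True
# EVD: A = P D P^-1 with P = E and D = diag(lam); here P^-1 = P^T.
print(np.allclose(A, E @ np.diag(lam) @ E.T))   # True
```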
Singular Value Decomposition

A singular value and corresponding singular vectors of a rectangular matrix $A$ are, respectively, a scalar $\sigma$ and a pair of vectors $u$ and $v$ that satisfy
$Av = \sigma u \quad \text{and} \quad A^T u = \sigma v$
With the singular values on the diagonal of a diagonal matrix $\Sigma$ and the corresponding singular vectors forming the columns of two orthogonal matrices $U$ and $V$, we have
$AV = U\Sigma \quad \text{and} \quad A^T U = V\Sigma$
Since $U$ and $V$ are orthogonal, this becomes the singular value decomposition
$A = U \Sigma V^T$

Def.: Every $m \times n$ matrix $A$ can be written $A = U \Sigma V^T$, where $U$ ($m \times m$) and $V$ ($n \times n$) are orthogonal matrices and $\Sigma$ is an $m \times n$ diagonal matrix.

Note:
1. The diagonal elements of $\Sigma$ are termed the singular values of $A$.
2. Using the SVD directly we get $A^T A = (U\Sigma V^T)^T (U\Sigma V^T) = V \Sigma^2 V^T$ and $A A^T = U \Sigma^2 U^T$. The columns of $U$ and $V$ represent the eigenvectors of $AA^T$ and $A^T A$ respectively, and the diagonal entries of $\Sigma^2$ represent their set of eigenvalues.
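A short NumPy check of the definition and of note 2 (a sketch on an arbitrary 2x3 matrix):

```python
import numpy as np

A = np.array([[1., 0., 2.],
              [0., 1., 1.]])

U, s, Vt = np.linalg.svd(A)                 # A = U Sigma V^T
Sigma = np.zeros_like(A)
Sigma[:2, :2] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))       # True

# Note 2: singular values squared = eigenvalues of A A^T (and of A^T A).
print(np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A @ A.T))))
```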
Matrix Factorization

LU Factorization: LU factorization, or Gaussian elimination, expresses any square matrix $A$ as the product of a permutation of a lower triangular matrix and an upper triangular matrix,
$A = LU$
where $L$ is a permutation of a lower triangular matrix with ones on its diagonal and $U$ is an upper triangular matrix. Then $|A| = |L|\,|U| = u_{11} u_{22} \cdots u_{nn}$ and $A^{-1} = U^{-1} L^{-1}$.

Cholesky Factorization: The Cholesky factorization expresses a symmetric matrix as the product of a triangular matrix and its transpose,
$A = R^T R$
where $R$ is an upper triangular matrix. Not all symmetric matrices can be factored in this way; the matrices that have such a factorization are said to be positive definite. The Cholesky factorization allows the linear system $Ax = b$ to be replaced by $R^T R x = b$, a pair of triangular systems of equations solved easily by forward and backward substitution.

QR Factorization: The orthogonal, or QR, factorization expresses any rectangular matrix as the product of an orthogonal or unitary matrix and an upper triangular matrix,
$A = QR$
where $Q$ is orthogonal or unitary and $R$ is upper triangular.
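All three factorizations in SciPy, including the Cholesky-based triangular solves described above (a sketch; SciPy is assumed available, and the positive definite example matrix is arbitrary):

```python
import numpy as np
from scipy.linalg import lu, cholesky, qr, solve_triangular

A = np.array([[4., 2., 0.],
              [2., 5., 2.],
              [0., 2., 3.]])        # symmetric positive definite
b = np.array([2., 4., 6.])

P, L, U = lu(A)                     # LU: A = P L U
print(np.allclose(A, P @ L @ U))

R = cholesky(A)                     # Cholesky: A = R^T R, R upper triangular
y = solve_triangular(R.T, b, lower=True)   # forward substitution:  R^T y = b
x = solve_triangular(R, y)                 # backward substitution: R x = y
print(np.allclose(A @ x, b))

Q, R2 = qr(A)                       # QR: A = Q R, Q orthogonal
print(np.allclose(A, Q @ R2))
```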
APPLICATION
Documents Ranking
Rank the documents in decreasing order of relevance to the query.
Documents Ranking
A collection consisting of the following five documents:
d1 = LSI tutorials and fast tracks. d2 = Books on semantic analysis. d3 = Learning latent semantic indexing.
d4 = Advances in structures and advances in indexing. d5 = Analysis of latent structures.
queried for latent semantic indexing (q).
Decreasing order of cosine similarities
Assume that:
1. Documents are linearized, tokenized, and their stop words removed. Stemming is not used. The surviving terms are used to construct a term-document matrix A. This matrix is populated with term weights $a_{ij} = L_{ij} G_i N_j$, where
- $L_{ij} = f_{ij}$, the frequency of term $i$ in document $j$. This is the so-called FREQ model.
- $G_i = \log(D/d_i)$, where $D$ is the collection size and $d_i$ is the number of documents containing term $i$. This is the so-called IDF model; IDF stands for Inverse Document Frequency.
- $N_j = 1/l$; i.e., document lengths are normalized to $1/l$. In general, $l$ is the so-called $L_2$ norm or Frobenius length.
So $a_{ij} = f_{ij}\, \log(D/d_i)\, N_j$.
Documents Ranking
2. Query terms are scored using FREQ; i.e., $a_{iq} = L_{iq} = f_{iq}$, where $f_{iq}$ is the frequency of term $i$ in the query q.

Procedure:
1. Compute A and q.
2. Normalize the document vectors and the query vector: $A_n$, $q_n$, where the subscript n denotes a normalized vector.
3. Compute $q_n^T A_n$.

Documents in the collection:
d1 = LSI tutorials and fast tracks.
d2 = Books on semantic analysis.
d3 = Learning latent semantic indexing.
d4 = Advances in structures and advances in indexing.
d5 = Analysis of latent structures.

Term-Document Matrix:
             d1  d2  d3  d4  d5
LSI           1   0   0   0   0
tutorials     1   0   0   0   0
fast          1   0   0   0   0
tracks        1   0   0   0   0
books         0   1   0   0   0
semantic      0   1   1   0   0
analysis      0   1   0   0   1
learning      0   0   1   0   0
latent        0   0   1   0   1
indexing      0   0   1   1   0
advances      0   0   0   2   0
structures    0   0   0   1   1
Documents Ranking
Step 1: Weight Matrix (entries $L_{ij} G_i = f_{ij} \log_{10}(5/d_i)$):

A =
              d1          d2          d3          d4          d5
LSI        1 log(5/1)      0           0           0           0
tutorials  1 log(5/1)      0           0           0           0
fast       1 log(5/1)      0           0           0           0
tracks     1 log(5/1)      0           0           0           0
books          0       1 log(5/1)      0           0           0
semantic       0       1 log(5/2)  1 log(5/2)      0           0
analysis       0       1 log(5/2)      0           0       1 log(5/2)
learning       0           0       1 log(5/1)      0           0
latent         0           0       1 log(5/2)      0       1 log(5/2)
indexing       0           0       1 log(5/2)  1 log(5/2)      0
advances       0           0           0       2 log(5/1)      0
structures     0           0           0       1 log(5/2)  1 log(5/2)

=
              d1      d2      d3      d4      d5
LSI        0.6990    0       0       0       0
tutorials  0.6990    0       0       0       0
fast       0.6990    0       0       0       0
tracks     0.6990    0       0       0       0
books         0    0.6990    0       0       0
semantic      0    0.3979  0.3979    0       0
analysis      0    0.3979    0       0     0.3979
learning      0       0    0.6990    0       0
latent        0       0    0.3979    0     0.3979
indexing      0       0    0.3979  0.3979    0
advances      0       0       0    1.3980    0
structures    0       0       0    0.3979  0.3979

q = (0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0)$^T$ (FREQ scores for the query "latent semantic indexing").
Documents Ranking
Step 2: Normalization. Each document vector and the query vector is divided by its Frobenius norm ($L_2$ norm, Euclidean length):

An =
              d1      d2      d3      d4      d5
LSI        0.5000    0       0       0       0
tutorials  0.5000    0       0       0       0
fast       0.5000    0       0       0       0
tracks     0.5000    0       0       0       0
books         0    0.7790    0       0       0
semantic      0    0.4434  0.4054    0       0
analysis      0    0.4434    0       0     0.5774
learning      0       0    0.7121    0       0
latent        0       0    0.4054    0     0.5774
indexing      0       0    0.4054  0.2640    0
advances      0       0       0    0.9277    0
structures    0       0       0    0.2640  0.5774

$q_n^T$ = [0 0 0 0 0 0.5774 0 0 0.5774 0.5774 0 0]
Documents Ranking
Step 3: Compute $q_n^T A_n$:

             d1      d2      d3      d4      d5
$q_n^T A_n$ = [ 0    0.2560  0.7022  0.1524  0.3334 ]

The documents rank as follows: $d_3 \succ d_5 \succ d_2 \succ d_4 \succ d_1$.
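The whole procedure in a few lines of NumPy (a sketch; base-10 logarithms reproduce the 0.6990/0.3979 weights, and the printed scores should match the row above):

```python
import numpy as np

docs = ["d1", "d2", "d3", "d4", "d5"]
F = np.array([                       # term frequencies f_ij (term-document matrix)
    [1,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0],  # LSI, tutorials, fast, tracks
    [0,1,0,0,0], [0,1,1,0,0], [0,1,0,0,1], [0,0,1,0,0],  # books, semantic, analysis, learning
    [0,0,1,0,1], [0,0,1,1,0], [0,0,0,2,0], [0,0,0,1,1],  # latent, indexing, advances, structures
], dtype=float)
q = np.array([0,0,0,0,0,1,0,0,1,1,0,0], dtype=float)     # "latent semantic indexing"

D = F.shape[1]                        # collection size
d = (F > 0).sum(axis=1)               # documents containing each term
A = F * np.log10(D / d)[:, None]      # step 1: a_ij = f_ij log(D/d_i)

An = A / np.linalg.norm(A, axis=0)    # step 2: normalize document vectors
qn = q / np.linalg.norm(q)            #         and the query vector
scores = qn @ An                      # step 3: cosine similarities
for doc, s in sorted(zip(docs, scores), key=lambda t: -t[1]):
    print(f"{doc}: {s:.4f}")
```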
Exercises
1. Repeat the above calculations, this time including all stopwords. Explain any difference in computed results.
2. Repeat the above calculations, this time scoring global weights using probabilistic IDF (IDFP): $G_i = \log((D - d_i)/d_i)$.
APPLICATION
Latent Semantic Indexing (LSI)
Using SVD
Latent Semantic Indexing
Use of LSI to cluster terms, and to find the terms that could be used to expand or reformulate the query.

Example: The collection consists of the following documents:
d1 = Shipment of gold damaged in a fire.
d2 = Delivery of silver arrived in a silver truck.
d3 = Shipment of gold arrived in a truck.
SVD
Every matrix A of dimensions $m \times n$, $m \ge n$, can be decomposed as $A = U \Sigma V^T$, where:
- $U$ has dimension $m \times m$ and its columns are orthogonal, i.e. $U U^T = U^T U = I_m$.
- $\Sigma$ has dimension $m \times n$; the only non-zero elements are on the main diagonal.
- $V$ has dimension $n \times n$ and its columns are orthogonal, i.e. $V V^T = V^T V = I_n$.
In the reduced (rank-$p$) form $A \approx U_p \Sigma_p V_p^T$:
- $U_p$ is $m \times p$, with orthogonal columns;
- $\Sigma_p$ is $p \times p$ and diagonal;
- $V_p$ is $n \times p$, with orthogonal columns.
Assume that the query is gold silver truck.
Latent Semantic Indexing (Procedure)
Step 1: Score term weights and construct the term-document matrix A and the query vector q.

             d1  d2  d3     q
a             1   1   1     0
arrived       0   1   1     0
damaged       1   0   0     0
delivery      0   1   0     0
fire          1   0   0     0
gold          1   0   1     1
in            1   1   1     0
of            1   1   1     0
shipment      1   0   1     0
silver        0   2   0     1
truck         0   1   1     1
Step 2: Decompose the matrix A using the SVD procedure into U, Σ and V matrices: $A = U \Sigma V^T$.

U =
  -0.42012  -0.0748   -0.04597
  -0.29949   0.200092  0.407828
  -0.12063  -0.27489  -0.4538
  -0.15756   0.304648 -0.20065
  -0.12063  -0.27489  -0.4538
  -0.26256  -0.37945   0.154674
  -0.42012  -0.0748   -0.04597
  -0.42012  -0.0748   -0.04597
  -0.26256  -0.37945   0.154674
  -0.31512   0.609295 -0.40129
  -0.29949   0.200092  0.407828

Σ =
  4.098872  0         0
  0         2.361571  0
  0         0         1.273669

V =
  -0.49447  -0.64918  -0.57799
  -0.64582   0.719447 -0.25556
  -0.58174  -0.24691   0.774995
Step 3: Rank-2 approximation (keep the first two columns of each factor):

Uk =
  -0.42012  -0.0748
  -0.29949   0.200092
  -0.12063  -0.27489
  -0.15756   0.304648
  -0.12063  -0.27489
  -0.26256  -0.37945
  -0.42012  -0.0748
  -0.42012  -0.0748
  -0.26256  -0.37945
  -0.31512   0.609295
  -0.29949   0.200092

Σk =
  4.098872  0
  0         2.361571

Vk =
  -0.49447  -0.64918
  -0.64582   0.719447
  -0.58174  -0.24691
Latent Semantic Indexing (Procedure)
Step 4: Find the new term vector coordinates in this reduced 2-dimensional space.
The rows of Uk hold the eigenvector values: these are the coordinates of the individual term vectors. Thus, from the reduced matrix Uk:
 1 a          -0.42012  -0.0748
 2 arrived    -0.29949   0.200092
 3 damaged    -0.12063  -0.27489
 4 delivery   -0.15756   0.304648
 5 fire       -0.12063  -0.27489
 6 gold       -0.26256  -0.37945
 7 in         -0.42012  -0.0748
 8 of         -0.42012  -0.0748
 9 shipment   -0.26256  -0.37945
10 silver     -0.31512   0.609295
11 truck      -0.29949   0.200092
Step 5: Find the new query vector coordinates in the reduced 2-dimensional space, using
$q_k = q^T U_k \Sigma_k^{-1}$
With $q^T$ = [0 0 0 0 0 1 0 0 0 1 1] (gold, silver, truck), $q^T U_k$ picks out and sums the Uk rows for gold, silver and truck, giving $(-0.87717,\ 0.42994)$; dividing by the singular values 4.098872 and 2.361571 yields
$q_k$ = [-0.2140, 0.1821]
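Steps 2-5 reproduced with NumPy (a sketch; note that SVD factors are unique only up to the signs of matching columns of U and V, so printed signs may differ from the slide's while cosine similarities are unaffected):

```python
import numpy as np

# Term-document matrix A (11 terms x 3 docs) and query "gold silver truck".
A = np.array([
    [1,1,1], [0,1,1], [1,0,0], [0,1,0], [1,0,0], [1,0,1],
    [1,1,1], [1,1,1], [1,0,1], [0,2,0], [0,1,1]], dtype=float)
q = np.array([0,0,0,0,0,1,0,0,0,1,1], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: U 11x3, s (3,), Vt 3x3
k = 2
Uk, sk = U[:, :k], s[:k]                           # rank-2 pieces
docs = Vt[:k, :].T                                 # rows: document coordinates

qk = q @ Uk / sk                                   # q_k = q^T U_k Sigma_k^-1
for i, dvec in enumerate(docs, start=1):
    cos = dvec @ qk / (np.linalg.norm(dvec) * np.linalg.norm(qk))
    print(f"d{i}: cosine with query = {cos:+.4f}")
```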
Latent Semantic Indexing (Procedure)
Step 6: Group terms into clusters.
Grouping is done by comparing the cosine of the angle between each pair of term vectors.
The following clusters are obtained:
1. a, in, of
2. gold, shipment
3. damaged, fire
4. arrived, truck
5. silver
6. delivery
Some vectors are not shown in the plot since they are completely superimposed; this is the case for points 1-4. If unit vectors are used and small deviations are ignored, clusters 3 and 4 and clusters 4 and 5 can be merged.
Latent Semantic Indexing (Procedure)
Step 7: Find terms that could be used to expand or reformulate the query.
The query is gold silver truck. Note that, in relation to the query, clusters 1, 2 and 3 are far away from the query; similarity-wise these could be viewed as belonging to a "long tail". If we insist on combining these with the query, possible expanded queries could be:
gold silver truck shipment; gold silver truck damaged;
gold silver truck shipment damaged; gold silver truck damaged in a fire;
shipment of gold silver truck damaged in a fire; etc.
Looking around the query, the closer clusters are 4, 5, and 6. We could use these clusters to expand
or reformulate the query. For example, the following are some of the expanded queries one could
test.
gold silver truck arrived; delivery gold silver truck;
gold silver truck delivery; gold silver truck delivery arrived; etc.
Documents containing these terms should be more relevant to the initial query.
APPLICATION
Latent Semantic Indexing (LSI)
Exercise
Latent Semantic Indexing (Exercise)
The SVD was the original factorization proposed for Latent Semantic Indexing (LSI), the process of replacing a term-document matrix A with a low-rank approximation Ap which reveals implicit relationships among documents that don't necessarily share common terms. Example:
Term D1 D2 D3 D4 D5
twain 53 65 0 30 1
clemens 10 20 40 43 0
huckleberry 30 10 25 52 70
A query on clemens will retrieve D1, D2, D3, and D4.
A query on twain will retrieve D1, D2, and D4.
For p = 2, the SVD gives:
Term D1 D2 D3 D4 D5
twain 49 65 7 34 -5
clemens 23 22 14 30 21
huckleberry 25 9 34 57 63
Now a query on clemens will retrieve all documents.
A query on twain will retrieve D1, D2, D4, and possibly D3.
The negative entry is disturbing to some and motivates the nonnegative factorizations.
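The rank-2 table can be reproduced directly (a sketch; rounding may make individual entries differ slightly from the slide's):

```python
import numpy as np

A = np.array([[53., 65.,  0., 30.,  1.],    # twain
              [10., 20., 40., 43.,  0.],    # clemens
              [30., 10., 25., 52., 70.]])   # huckleberry

U, s, Vt = np.linalg.svd(A, full_matrices=False)
p = 2
Ap = U[:, :p] @ np.diag(s[:p]) @ Vt[:p, :]  # best rank-2 approximation (Eckart-Young)
print(np.round(Ap).astype(int))             # compare with the table above
```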