Data Mining Lab, Big Data Research Center, UESTC
Email: [email protected]
Web: http://dm.uestc.edu.cn/
DM LESS IS MORE
Multi-View Learning: NMF Based Method
Zhong Zhang
Outline
• Introduction
• Motivation, assumptions and existing methods
• Non-negative Matrix Factorization (NMF) based method
• A straightforward case
• A little derivation
Figure: a) a web document can be represented by its URL and by the words on the page; b) a web image can be depicted by its surrounding text separately from its visual information; c) images of a 3D object taken from different viewpoints; d) video clips are combinations of audio signals and visual frames; e) multilingual documents have one view in each language.
Introduction
Section 1: Motivation, assumptions and existing methods
Motivation
Tasks
• Multi-view clustering
• Multi-view classification
• Multi-view semi-supervised learning
• Multi-view transfer learning
• Multi-view dimensionality reduction
• Multi-view ensemble learning
• Multi-view image denoising
• Multi-view matrix completion
• …
Motivation
Common assumptions
• Consensus (共识): different views share a common part or latent subspace.
• Complementary (互补): each view carries its own specific information.
• Sufficiency (充足性): each view is sufficient for classification on its own.
• Compatibility (兼容性): the target functions of the views predict the same labels for co-occurring features with high probability.
• Conditional independence (条件独立性): the views are conditionally independent given the label.
Motivation
Some other assumptions
• Partial view assumption: some instances appear in one view but not in another.
• Multi-task assumption: some instances' labels are not consistent across views.
• Multi-label assumption: instances in different views share partially consistent labels.
• …
Motivation
Key points
• How to integrate multi-view data and build an efficient model for your problem.
• For the clustering task, how to extract consistent or common information from multi-view data.
• For the classification task, how to extract discriminative information while fusing multi-view data.
Methods
Common methods
• Co-training, co-EM: train alternately to maximize the mutual agreement on two distinct views of the unlabeled data.
• Multiple kernel learning: exploits kernels corresponding to different views to fuse the data.
• Subspace learning (projection): learns a common or distinctive subspace, e.g. CCA, LRR, RPCA, NMF, …
• Semi-supervised learning
• Transfer learning
• Metric learning
• Propagation
• Probabilistic graphical models
• …
Section 2: NMF based method
NMF
Standard NMF
Each column of the original matrix V is a weighted sum of the columns of the left matrix W, with the weights given by the corresponding column of the right matrix H; hence W is called the basis matrix and H the coefficient matrix.

$V \in \mathbb{R}^{m \times n},\; W \in \mathbb{R}^{m \times k},\; H \in \mathbb{R}^{k \times n}$

$\min_{W,H} \|V - WH\|_F^2 \quad \text{s.t.}\; W \ge 0,\; H \ge 0$
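As a concrete illustration, here is a minimal NumPy sketch of this factorization using the classic multiplicative updates (all names are illustrative, not from the slides):

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Minimal NMF: V (m x n) ~ W (m x k) @ H (k x n), via Lee-Seung
    multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps  # basis matrix, stays non-negative
    H = rng.random((k, n)) + eps  # coefficient matrix, stays non-negative
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # H update
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # W update
    return W, H

# Usage: factor a non-negative matrix and check the reconstruction error.
V = np.random.default_rng(1).random((20, 15))
W, H = nmf(V, k=5)
err = np.linalg.norm(V - W @ H)
```

The updates only multiply non-negative quantities, so the constraints W, H ≥ 0 are maintained automatically.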
NMF
Update rules
• There are many ways to derive update rules. For the Frobenius loss, the standard multiplicative updates are:

$H \leftarrow H \odot \frac{W^\top V}{W^\top W H}, \qquad W \leftarrow W \odot \frac{V H^\top}{W H H^\top}$

• If a divergence $D(V\|WH)$ is used to describe the reconstruction loss, the update rules become:

$H_{aj} \leftarrow H_{aj} \frac{\sum_i W_{ia} V_{ij}/(WH)_{ij}}{\sum_i W_{ia}}, \qquad W_{ia} \leftarrow W_{ia} \frac{\sum_j H_{aj} V_{ij}/(WH)_{ij}}{\sum_j H_{aj}}$
NMF
Advantages of NMF
• Interpretable
• Adaptable and flexible
• Mature theory
• Easy to optimize
• Suitable for multi-view learning
• …

Disadvantages of NMF
• Sensitive to parameters
• Only locally optimal solutions
• Initialization dependent
• Lack of innovation
• …
NMF
Semi-Supervised NMF (SSNMF)
• Idea: use label information to minimize an empirical loss, forcing the coefficient matrix to be more discriminative.
• Target:

$\min_{W,H,B} \|I \odot (V - WH)\|_F^2 + \lambda \|L \odot (Y - BH)\|_F^2 \quad \text{s.t.}\; W \ge 0,\; H \ge 0,\; B \ge 0$

where

$I_{i,j} = \begin{cases} 1, & \text{if } V_{i,j} \text{ is observed} \\ 0, & \text{if } V_{i,j} \text{ is unobserved} \end{cases} \qquad L_{:,j} = \begin{cases} 1, & \text{if the label of } v_j \text{ is known} \\ 0, & \text{otherwise} \end{cases}$

• Update rules:

$W \leftarrow W \odot \frac{[I \odot V] H^\top}{[I \odot WH] H^\top}, \qquad B \leftarrow B \odot \frac{[L \odot Y] H^\top}{[L \odot BH] H^\top}, \qquad H \leftarrow H \odot \frac{W^\top [I \odot V] + \lambda B^\top [L \odot Y]}{W^\top [I \odot WH] + \lambda B^\top [L \odot BH]}$
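A NumPy sketch of these SSNMF updates (the helper name and the toy data are illustrative, not from the paper):

```python
import numpy as np

def ssnmf(V, Y, I, L, k, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """SSNMF sketch: min ||I*(V-WH)||_F^2 + lam*||L*(Y-BH)||_F^2,
    using the multiplicative updates from the slide."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    c = Y.shape[0]
    W = rng.random((m, k)) + eps  # basis matrix
    B = rng.random((c, k)) + eps  # label-prediction matrix
    H = rng.random((k, n)) + eps  # coefficient matrix
    for _ in range(n_iter):
        W *= ((I * V) @ H.T) / (((I * (W @ H)) @ H.T) + eps)
        B *= ((L * Y) @ H.T) / (((L * (B @ H)) @ H.T) + eps)
        H *= (W.T @ (I * V) + lam * B.T @ (L * Y)) / (
             W.T @ (I * (W @ H)) + lam * B.T @ (L * (B @ H)) + eps)
    return W, B, H

# Toy usage: fully observed data, labels known for the first half of columns.
rng = np.random.default_rng(1)
V = rng.random((10, 8))
Y = np.eye(2)[rng.integers(0, 2, 8)].T   # 2 x 8 one-hot labels
I = np.ones_like(V)                      # every entry of V observed
L = np.zeros_like(Y); L[:, :4] = 1       # labels known for first 4 instances
W, B, H = ssnmf(V, Y, I, L, k=3)
```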
NMF
Fisher NMF (FNMF)
• Idea: add the Fisher discrimination criterion to encode supervised information.
• Target: the reconstruction loss plus a Fisher term on the coefficients, typically of the form

$\min_{W,H} \|V - WH\|_F^2 + \alpha \big( \operatorname{tr}(S_w) - \operatorname{tr}(S_b) \big) \quad \text{s.t.}\; W \ge 0,\; H \ge 0$

where $S_w$ and $S_b$ are the within-class and between-class scatter matrices of the columns of H.
NMF
Graph regularized NMF (GNMF)
• Idea: construct a graph (similarity matrix) describing the similarity or distance among instances, and encode this similarity into NMF.
• Target:

$\min_{W,H} \|V - WH\|_F^2 + \lambda \operatorname{tr}(H L H^\top) \quad \text{s.t.}\; W \ge 0,\; H \ge 0$

where $L = D - S$ is the graph Laplacian of the similarity matrix S, with degree matrix $D_{ii} = \sum_j S_{ij}$.
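A sketch of GNMF under this objective, following the standard multiplicative rules for the graph term (names and toy data are illustrative):

```python
import numpy as np

def gnmf(V, S, k, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """GNMF sketch: min ||V - WH||_F^2 + lam * tr(H (D - S) H^T),
    where S is a non-negative similarity matrix over the n instances."""
    D = np.diag(S.sum(axis=1))  # degree matrix of the similarity graph
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # graph term: lam*H@S enters the numerator, lam*H@D the denominator
        H *= (W.T @ V + lam * H @ S) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H

# Usage with a toy symmetric similarity graph over 9 instances.
rng = np.random.default_rng(2)
V = rng.random((12, 9))
S = rng.random((9, 9)); S = (S + S.T) / 2  # symmetrize
W, H = gnmf(V, S, k=4)
```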
NMF
Constrained NMF (CNMF)
• Idea: construct an indicator matrix to force the latent representations to be identical when instances share the same label.
• Target: introduce an auxiliary matrix Z so that the coefficients become AZ, i.e. roughly

$\min_{W,Z} \|X - W (A Z)^\top\|_F^2 \quad \text{s.t.}\; W \ge 0,\; Z \ge 0$

• A is the indicator matrix: $A_{i,j} = 1$ if instance i has label j, and the block of A corresponding to unlabeled data is an identity matrix.
• If $x_i$ and $x_j$ have the same label, then $v_i = v_j$.
NMF in Multi-View
Unsupervised NMF
MultiNMF (Jialu Liu, ICDM, 2013)
• Idea: each view can be decomposed into its own basis and its own coefficient matrix; forcing all coefficient matrices to agree on one shared V* integrates the multi-view data.
• Method: learn a shared coefficient matrix V* across the views.
• Target (roughly):

$\min \sum_{v} \|X^{(v)} - U^{(v)} (V^{(v)})^\top\|_F^2 + \sum_{v} \lambda_v \|V^{(v)} - V^*\|_F^2 \quad \text{s.t.}\; U^{(v)} \ge 0,\; V^{(v)} \ge 0,\; V^* \ge 0$

• From another point of view, each view corresponds to a specific subspace from which the data are generated; since there is consistency among the views, a shared V* can be extracted.
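A simplified sketch of this idea (not the paper's exact algorithm: it omits the norm normalization MultiNMF uses, and ties each view's coefficients to a consensus via a quadratic penalty):

```python
import numpy as np

def multi_nmf(Xs, k, lam=0.5, n_iter=200, eps=1e-9, seed=0):
    """Simplified multi-view NMF sketch: per-view factorization
    X_v ~ W_v H_v plus a penalty lam*||H_v - H_star||_F^2 tying all
    coefficient matrices to a consensus H_star."""
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[1]
    Ws = [rng.random((X.shape[0], k)) + eps for X in Xs]
    Hs = [rng.random((k, n)) + eps for X in Xs]
    H_star = sum(Hs) / len(Hs)
    for _ in range(n_iter):
        for v, X in enumerate(Xs):
            Ws[v] *= (X @ Hs[v].T) / (Ws[v] @ Hs[v] @ Hs[v].T + eps)
            # consensus penalty: lam*H_star in numerator, lam*H_v in denominator
            Hs[v] *= (Ws[v].T @ X + lam * H_star) / (
                     Ws[v].T @ Ws[v] @ Hs[v] + lam * Hs[v] + eps)
        H_star = sum(Hs) / len(Hs)  # consensus = average of view coefficients
    return Ws, Hs, H_star

# Two toy views of the same 10 instances with different feature dimensions.
rng = np.random.default_rng(3)
Xs = [rng.random((6, 10)), rng.random((9, 10))]
Ws, Hs, H_star = multi_nmf(Xs, k=3)
```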
Unsupervised NMF
MultiNMF (Jialu Liu, ICDM, 2013)
• Method: learn a shared coefficient matrix V* from the multiple views.
• Question: why not directly use a single common V for all views?
Unsupervised NMF
ColNMF (Akata, CVWW, 2011) (Yuhong Guo, AAAI, 2013)
• Method: use a shared coefficient matrix but different basis matrices across the views.
• Target:

$\min \sum_{v} \lambda_v \|X^{(v)} - U^{(v)} V^*\|_F^2 \quad \text{s.t.}\; U^{(v)} \ge 0,\; V^* \ge 0$

• Note: if we set $\lambda_v = 1$, ColNMF is equivalent to concatenating the features of all views and running NMF directly on the concatenated view.
• Similarly,

$\sum_{v=1}^{n} \|X^{(v)} - U^* V^{(v)}\|_F^2$

is equivalent to concatenating the instances of all views, so people rarely do this without modification…
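To make the λv = 1 note concrete, a small sketch (toy data, illustrative names): running one NMF on the vertically concatenated views yields per-view bases and one shared coefficient matrix:

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Standard multiplicative-update NMF (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# ColNMF with all lambda_v = 1: stack the views' feature matrices and
# factor once; the rows of W split back into per-view bases U^(v),
# while H plays the role of the shared coefficient matrix V*.
rng = np.random.default_rng(4)
X1, X2 = rng.random((6, 10)), rng.random((9, 10))  # two views, 10 instances
X_cat = np.vstack([X1, X2])                        # concatenate the features
W, H = nmf(X_cat, k=3)
U1, U2 = W[:6], W[6:]                              # per-view basis matrices
```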
Unsupervised NMF
SCDTM (Hannah Kim, KDD, 2015)
• Idea: document collections contain both common and discriminative topics; reveal the two kinds of topics while performing clustering and topic modeling.
• Method: learn a partially shared basis matrix, achieved by adding a constraint that divides the basis matrix into common and discriminative parts.
Unsupervised NMF
JSNMF (Sunil Kumar Gupta, KDD, 2010)
• Idea: similar to the previous work; it assumes the two views share a partial common subspace, and uses the discriminant subspaces together with the jointly shared subspace to improve social media retrieval performance.

RJSNMF (Sunil Kumar Gupta, DMKD, 2011)
• Tweak the formulation above and add a regularization term, and you get another new piece of work:
• Add a constraint on the individual subspaces that prevents them from capturing basis vectors of the shared subspace.
• Add a constraint that stops leakage of individual basis vectors into the shared subspace.
Unsupervised NMF
PVC (Shao-Yuan Li, AAAI, 2014)
• Idea: some instances do not appear in all views, while instances corresponding to the same example in different views share a common latent representation.
• Method: learn a partially shared coefficient matrix.
• Note: the notation in this paper differs slightly from the previous papers: P denotes the coefficient matrix and U denotes the basis matrix.
• Target:
Supervised NMF
SULF (Yu Jiang, Machine Vision and Applications, 2014)
• Idea: similar to ColNMF, it assumes the instances of the different views are generated from different subspaces but share the same latent representation, since they are different angles of one object. In addition, label information is integrated to help classification or clustering.
• Method: jointly learn a unified latent space shared across the views while minimizing a prediction loss on the latent representations.
• Target:
Supervised NMF
PSLF (Jing Liu, IEEE T NEUR NET LEAR, 2015)
• Idea: consider a partially shared coefficient matrix and labeled/unlabeled data at the same time.
• Method: only slightly different from SULF; each view's coefficient matrix has a shared part and a view-specific part.
• Target:
Section 3: A straightforward case
AMIV
AMIV (Yue Zhu, ACML, 2014)
• Motivation:
• Consider a scenario in which a single-instance view is augmented with a multi-instance view: an instance (a feature vector) x comes with augmented information in the form of a bag of instances B. Note that B contains multiple instances with unknown labels.
• The task is to learn a latent semantic subspace (LSS) and representations for both the single-instance view and the augmented multi-instance view, and then handle classification in the LSS.
• Major assumption:
• A single-instance-view instance and its corresponding augmented multi-instance-view bag B are close to each other in the LSS.

Figure: a paper (feature vector) and its reference bag {REF 1, REF 2, …, REF n}.
AMIV
AMIV (Yue Zhu, ACML, 2014)
• Idea 1:
• Documents can be decomposed to establish the latent semantic subspace (LSS) and to obtain representations of instances.
• Notation: the n instances in the single-instance view X, the representation of X in the LSS, and the basis of the LSS.
AMIV
AMIV (Yue Zhu, ACML, 2014)
• Idea 2:
• The augmented view also comes from the LSS, but with another representation.
• Z denotes the instances in the augmented multi-instance view.
AMIV
AMIV (Yue Zhu, ACML, 2014)
• Problem:
• The instances in B have unknown labels, and manually labeling these augmented instances is impractical; meanwhile the augmented view may not be robust (i.e. not all labels of instances in B are identical to the label of the original instance).
AMIV
AMIV (Yue Zhu, ACML, 2014)
• Idea 3:
• Use a key prototype to represent the whole bag of instances.
• Obviously, taking the center of the instances in the bag is not ideal.
• Taking a locally linear assumption (Roweis and Saul, 2000), it assumes that the prototype s_i of bag B_i is a linear combination of the intra-bag instances in the neighborhood of x_i, where δ is a k-nearest-neighbors indicator.
• The prototype also has its own latent representation.
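A toy sketch of the prototype step (hypothetical helper: it uses uniform weights over the k nearest bag instances, whereas the paper learns the combination weights):

```python
import numpy as np

def bag_prototype(x, bag, k=3):
    """Represent bag B_i by a linear combination of the bag instances
    nearest to the single-view instance x (uniform weights for simplicity)."""
    d = np.linalg.norm(bag - x, axis=1)     # distance of each bag instance to x
    idx = np.argsort(d)[:min(k, len(bag))]  # delta: k-nearest-neighbor indicator
    return bag[idx].mean(axis=0)            # uniform linear combination

# Usage: the far-away outlier [5, 5] is excluded from the prototype.
x = np.array([0.0, 0.0])
bag = np.array([[0.1, 0.0], [-0.1, 0.0], [5.0, 5.0]])
s = bag_prototype(x, bag, k=2)
```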
AMIV
AMIV (Yue Zhu, ACML, 2014)
• Combining things together:
• Stage 2:
• To handle the classification task in the LSS, it trains an SVM with an additional constraint that the representation of a prototype s_i should carry the same label as the representation of the corresponding single-instance-view instance x_i.
Section 4: Derivation
Derivation
Solving the NMF problem
• Optimizing all variables jointly is a non-convex problem, so we usually optimize one variable while fixing the others.
• Multiplicative update rules are derived using the gradient descent algorithm and the KKT conditions.
• An appropriate step size for gradient descent is constructed so that the update stays multiplicative (and hence non-negative).
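As a worked example of this recipe, the multiplicative rule for H under the Frobenius loss can be derived as follows (the W rule is symmetric):

```latex
% Objective and gradient with respect to H:
%   L(H) = ||V - WH||_F^2,   \nabla_H L = 2 (W^T W H - W^T V)
% Plain gradient descent:  H <- H - eta .* \nabla_H L
% Choosing the element-wise step size
%   eta_{aj} = H_{aj} / (2 (W^T W H)_{aj})
% cancels the subtractive term and yields the multiplicative rule
\[
  H_{aj} \leftarrow H_{aj}\,\frac{(W^\top V)_{aj}}{(W^\top W H)_{aj}}.
\]
% At any fixed point, (W^T W H - W^T V)_{aj} H_{aj} = 0, which is exactly
% the KKT complementary slackness condition for the constraint H >= 0.
```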
Conclusion
• NMF is a simple yet powerful tool, useful not only in multi-view learning but in many other directions.
• In multi-view learning, NMF is very flexible and fits a wide range of scenarios and assumptions.
• NMF is highly interpretable, which is perhaps its greatest advantage, and it naturally suits the multi-view setting.
• Precisely because NMF is simple, the motivation and assumptions of a paper often matter more than the method itself; the objective function can be designed flexibly around one's own assumptions and application scenario, so even simple methods can lead to strong papers.
• Results depend heavily on parameter tuning and initialization.
Thanks