Data Mining Lab, Big Data Research Center, UESTC
Email: [email protected]
Web: http://dm.uestc.edu.cn/
DM LESS IS MORE
Multi-View Learning: NMF Based Method
Zhong Zhang
Outline
• Introduction
• Motivation, assumptions and existing methods
• Non-negative Matrix Factorization (NMF) based method
• A straightforward case
• A little derivation
Figure: a) a web document can be represented by its URL and by the words on the page; b) a web image can be depicted by its surrounding text separately from its visual information; c) images of a 3D object taken from different viewpoints; d) video clips are combinations of audio signals and visual frames; e) multilingual documents have one view in each language.
Introduction
Section 1: Motivation, assumptions and existing methods
Motivation
Tasks
• Multi-view clustering
• Multi-view classification
• Multi-view semi-supervised learning
• Multi-view transfer learning
• Multi-view dimensionality reduction
• Multi-view ensemble learning
• Multi-view image denoising
• Multi-view matrix completion
• …
Motivation
Common assumptions
• Consensus (共识): different views share a common part or latent subspace.
• Complementary (互补): each view carries its own specific information.
• Sufficiency (充足性): each view is sufficient for classification on its own.
• Compatibility (兼容性): the target functions of the views predict the same labels for co-occurring features with high probability.
• Conditional independence (条件独立性): the views are conditionally independent given the label.
Motivation
Some other assumptions
• Partial view assumption: some instances appear in one view but not in another.
• Multi-task assumption: some instances' labels are not consistent across views.
• Multi-label assumption: instances in different views share partially consistent labels.
• …
Motivation
Key points
• How to integrate multi-view data and build an efficient model for your problem.
• For the clustering task, how to extract consistent or common information from multi-view data.
• For the classification task, how to extract discriminative information while fusing multi-view data.
Methods
Common methods
• Co-training, co-EM: train alternately to maximize the mutual agreement on two distinct views of the unlabeled data.
• Multiple kernel learning: exploits kernels corresponding to different views to fuse the data.
• Subspace learning (projection): learns a common or distinctive subspace, e.g. CCA, LRR, RPCA, NMF, …
• Semi-supervised learning
• Transfer learning
• Metric learning
• Propagation
• Probabilistic graphical models
• …
Section 2: NMF based method
NMF
Standard NMF
Each column of the original matrix V is a weighted sum of the columns of the left matrix W, with the weights given by the corresponding column of the right matrix H; hence W is called the basis matrix and H the coefficient matrix.

$V \in \mathbb{R}^{m \times n},\; W \in \mathbb{R}^{m \times k},\; H \in \mathbb{R}^{k \times n}$

$\min_{W,H} \|V - WH\|_F^2 \quad \text{s.t.}\; W \ge 0,\; H \ge 0$
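As a concrete illustration, here is a minimal NumPy sketch of this factorization using the classic multiplicative updates (all names are illustrative, not from the slides):

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Minimal NMF: V (m x n) ~ W (m x k) @ H (k x n), via Lee-Seung
    multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps  # basis matrix, stays non-negative
    H = rng.random((k, n)) + eps  # coefficient matrix, stays non-negative
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # H update
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # W update
    return W, H

# Usage: factor a non-negative matrix and check the reconstruction error.
V = np.random.default_rng(1).random((20, 15))
W, H = nmf(V, k=5)
err = np.linalg.norm(V - W @ H)
```

The updates only multiply non-negative quantities, so the constraints W, H ≥ 0 are maintained automatically.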
NMF
Update rules
• There are many ways to derive update rules. For the Frobenius loss, the standard multiplicative updates are:

$H \leftarrow H \odot \frac{W^\top V}{W^\top W H}, \qquad W \leftarrow W \odot \frac{V H^\top}{W H H^\top}$

• If a divergence $D(V\|WH)$ is used to describe the reconstruction loss, the update rules become:

$H_{aj} \leftarrow H_{aj} \frac{\sum_i W_{ia} V_{ij}/(WH)_{ij}}{\sum_i W_{ia}}, \qquad W_{ia} \leftarrow W_{ia} \frac{\sum_j H_{aj} V_{ij}/(WH)_{ij}}{\sum_j H_{aj}}$
NMF
Advantages of NMF
• Interpretable
• Adaptable and flexible
• Mature theory
• Easy to optimize
• Suitable for multi-view learning
• …

Disadvantages of NMF
• Sensitive to parameters
• Only locally optimal solutions
• Initialization dependent
• Lack of innovation
• …
NMF
Semi-Supervised NMF (SSNMF)
• Idea: use label information to minimize an empirical loss, forcing the coefficient matrix to be more discriminative.
• Target:

$\min_{W,H,B} \|I \odot (V - WH)\|_F^2 + \lambda \|L \odot (Y - BH)\|_F^2 \quad \text{s.t.}\; W \ge 0,\; H \ge 0,\; B \ge 0$

where

$I_{i,j} = \begin{cases} 1, & \text{if } V_{i,j} \text{ is observed} \\ 0, & \text{if } V_{i,j} \text{ is unobserved} \end{cases} \qquad L_{:,j} = \begin{cases} 1, & \text{if the label of } v_j \text{ is known} \\ 0, & \text{otherwise} \end{cases}$

• Update rules:

$W \leftarrow W \odot \frac{[I \odot V] H^\top}{[I \odot WH] H^\top}, \qquad B \leftarrow B \odot \frac{[L \odot Y] H^\top}{[L \odot BH] H^\top}, \qquad H \leftarrow H \odot \frac{W^\top [I \odot V] + \lambda B^\top [L \odot Y]}{W^\top [I \odot WH] + \lambda B^\top [L \odot BH]}$
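A NumPy sketch of these SSNMF updates (the helper name and the toy data are illustrative, not from the paper):

```python
import numpy as np

def ssnmf(V, Y, I, L, k, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """SSNMF sketch: min ||I*(V-WH)||_F^2 + lam*||L*(Y-BH)||_F^2,
    using the multiplicative updates from the slide."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    c = Y.shape[0]
    W = rng.random((m, k)) + eps  # basis matrix
    B = rng.random((c, k)) + eps  # label-prediction matrix
    H = rng.random((k, n)) + eps  # coefficient matrix
    for _ in range(n_iter):
        W *= ((I * V) @ H.T) / (((I * (W @ H)) @ H.T) + eps)
        B *= ((L * Y) @ H.T) / (((L * (B @ H)) @ H.T) + eps)
        H *= (W.T @ (I * V) + lam * B.T @ (L * Y)) / (
             W.T @ (I * (W @ H)) + lam * B.T @ (L * (B @ H)) + eps)
    return W, B, H

# Toy usage: fully observed data, labels known for the first half of columns.
rng = np.random.default_rng(1)
V = rng.random((10, 8))
Y = np.eye(2)[rng.integers(0, 2, 8)].T   # 2 x 8 one-hot labels
I = np.ones_like(V)                      # every entry of V observed
L = np.zeros_like(Y); L[:, :4] = 1       # labels known for first 4 instances
W, B, H = ssnmf(V, Y, I, L, k=3)
```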
NMF
Fisher NMF (FNMF)
• Idea: add the Fisher discrimination criterion to encode supervised information.
• Target: the reconstruction loss plus a Fisher term on the coefficients, typically of the form

$\min_{W,H} \|V - WH\|_F^2 + \alpha \big( \operatorname{tr}(S_w) - \operatorname{tr}(S_b) \big) \quad \text{s.t.}\; W \ge 0,\; H \ge 0$

where $S_w$ and $S_b$ are the within-class and between-class scatter matrices of the columns of H.
NMF
Graph regularized NMF (GNMF)
• Idea: construct a graph (similarity matrix) describing the similarity or distance among instances, and encode this similarity into NMF.
• Target:

$\min_{W,H} \|V - WH\|_F^2 + \lambda \operatorname{tr}(H L H^\top) \quad \text{s.t.}\; W \ge 0,\; H \ge 0$

where $L = D - S$ is the graph Laplacian of the similarity matrix S, with degree matrix $D_{ii} = \sum_j S_{ij}$.
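A sketch of GNMF under this objective, following the standard multiplicative rules for the graph term (names and toy data are illustrative):

```python
import numpy as np

def gnmf(V, S, k, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """GNMF sketch: min ||V - WH||_F^2 + lam * tr(H (D - S) H^T),
    where S is a non-negative similarity matrix over the n instances."""
    D = np.diag(S.sum(axis=1))  # degree matrix of the similarity graph
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # graph term: lam*H@S enters the numerator, lam*H@D the denominator
        H *= (W.T @ V + lam * H @ S) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H

# Usage with a toy symmetric similarity graph over 9 instances.
rng = np.random.default_rng(2)
V = rng.random((12, 9))
S = rng.random((9, 9)); S = (S + S.T) / 2  # symmetrize
W, H = gnmf(V, S, k=4)
```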
NMF
Constrained NMF (CNMF)
• Idea: construct an indicator matrix to force the latent representations to be identical when instances share the same label.
• Target: introduce an auxiliary matrix Z so that the coefficients become AZ, i.e. roughly

$\min_{W,Z} \|X - W (A Z)^\top\|_F^2 \quad \text{s.t.}\; W \ge 0,\; Z \ge 0$

• A is the indicator matrix: $A_{i,j} = 1$ if instance i has label j, and the block of A corresponding to unlabeled data is an identity matrix.
• If $x_i$ and $x_j$ have the same label, then $v_i = v_j$.
NMF in Multi-View
Unsupervised NMF
MultiNMF (Jialu Liu, ICDM, 2013)
• Idea: each view can be decomposed into its own basis and its own coefficient matrix; forcing all coefficient matrices to agree on one shared V* integrates the multi-view data.
• Method: learn a shared coefficient matrix V* across the views.
• Target (roughly):

$\min \sum_{v} \|X^{(v)} - U^{(v)} (V^{(v)})^\top\|_F^2 + \sum_{v} \lambda_v \|V^{(v)} - V^*\|_F^2 \quad \text{s.t.}\; U^{(v)} \ge 0,\; V^{(v)} \ge 0,\; V^* \ge 0$

• From another point of view, each view corresponds to a specific subspace from which the data are generated; since there is consistency among the views, a shared V* can be extracted.
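A simplified sketch of this idea (not the paper's exact algorithm: it omits the norm normalization MultiNMF uses, and ties each view's coefficients to a consensus via a quadratic penalty):

```python
import numpy as np

def multi_nmf(Xs, k, lam=0.5, n_iter=200, eps=1e-9, seed=0):
    """Simplified multi-view NMF sketch: per-view factorization
    X_v ~ W_v H_v plus a penalty lam*||H_v - H_star||_F^2 tying all
    coefficient matrices to a consensus H_star."""
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[1]
    Ws = [rng.random((X.shape[0], k)) + eps for X in Xs]
    Hs = [rng.random((k, n)) + eps for X in Xs]
    H_star = sum(Hs) / len(Hs)
    for _ in range(n_iter):
        for v, X in enumerate(Xs):
            Ws[v] *= (X @ Hs[v].T) / (Ws[v] @ Hs[v] @ Hs[v].T + eps)
            # consensus penalty: lam*H_star in numerator, lam*H_v in denominator
            Hs[v] *= (Ws[v].T @ X + lam * H_star) / (
                     Ws[v].T @ Ws[v] @ Hs[v] + lam * Hs[v] + eps)
        H_star = sum(Hs) / len(Hs)  # consensus = average of view coefficients
    return Ws, Hs, H_star

# Two toy views of the same 10 instances with different feature dimensions.
rng = np.random.default_rng(3)
Xs = [rng.random((6, 10)), rng.random((9, 10))]
Ws, Hs, H_star = multi_nmf(Xs, k=3)
```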
Unsupervised NMF
MultiNMF (Jialu Liu, ICDM, 2013)
• Method: learn a shared coefficient matrix V* from the multiple views.
• Question: why not directly use a single common V for all views?
Unsupervised NMF
ColNMF (Akata, CVWW, 2011) (Yuhong Guo, AAAI, 2013)
• Method: use a shared coefficient matrix but different basis matrices across the views.
• Target:

$\min \sum_{v} \lambda_v \|X^{(v)} - U^{(v)} V^*\|_F^2 \quad \text{s.t.}\; U^{(v)} \ge 0,\; V^* \ge 0$

• Note: if we set $\lambda_v = 1$, ColNMF is equivalent to concatenating the features of all views and running NMF directly on the concatenated view.
• Similarly,

$\sum_{v=1}^{n} \|X^{(v)} - U^* V^{(v)}\|_F^2$

is equivalent to concatenating the instances of all views, so people rarely do this without modification…
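To make the λv = 1 note concrete, a small sketch (toy data, illustrative names): running one NMF on the vertically concatenated views yields per-view bases and one shared coefficient matrix:

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Standard multiplicative-update NMF (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# ColNMF with all lambda_v = 1: stack the views' feature matrices and
# factor once; the rows of W split back into per-view bases U^(v),
# while H plays the role of the shared coefficient matrix V*.
rng = np.random.default_rng(4)
X1, X2 = rng.random((6, 10)), rng.random((9, 10))  # two views, 10 instances
X_cat = np.vstack([X1, X2])                        # concatenate the features
W, H = nmf(X_cat, k=3)
U1, U2 = W[:6], W[6:]                              # per-view basis matrices
```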
Unsupervised NMF
SCDTM (Hannah Kim, KDD, 2015)
• Idea: document collections contain both common and discriminative topics; reveal the two kinds of topics while performing clustering and topic modeling.
• Method: learn a partially shared basis matrix, achieved by adding a constraint that divides the basis matrix into common and discriminative parts.
Unsupervised NMF
JSNMF (Sunil Kumar Gupta, KDD, 2010)
• Idea: similar to the previous work; it assumes the two views share a partial common subspace, and uses the discriminant subspaces together with the jointly shared subspace to improve social media retrieval performance.

RJSNMF (Sunil Kumar Gupta, DMKD, 2011)
• Tweak the formulation above and add a regularization term, and you get another new piece of work:
• Add a constraint on the individual subspaces that prevents them from capturing basis vectors of the shared subspace.
• Add a constraint that stops leakage of individual basis vectors into the shared subspace.
Unsupervised NMF
PVC (Shao-Yuan Li, AAAI, 2014)
• Idea: some instances do not appear in all views, while instances corresponding to the same example in different views share a common latent representation.
• Method: learn a partially shared coefficient matrix.
• Note: the notation in this paper differs slightly from the previous papers: P denotes the coefficient matrix and U denotes the basis matrix.
• Target:
Supervised NMF
SULF (Yu Jiang, Machine Vision and Applications, 2014)
• Idea: similar to ColNMF, it assumes the instances of the different views are generated from different subspaces but share the same latent representation, since they are different angles of one object. In addition, label information is integrated to help classification or clustering.
• Method: jointly learn a unified latent space shared across the views while minimizing a prediction loss on the latent representations.
• Target:
Supervised NMF
PSLF (Jing Liu, IEEE T NEUR NET LEAR, 2015)
• Idea: consider a partially shared coefficient matrix and labeled/unlabeled data at the same time.
• Method: only slightly different from SULF; each view's coefficient matrix has a shared part and a view-specific part.
• Target:
Section 3: A straightforward case
AMIV
AMIV (Yue Zhu, ACML, 2014)
• Motivation:
• Consider a scenario in which a single-instance view is augmented with a multi-instance view: an instance (a feature vector) x comes with augmented information in the form of a bag of instances B. Note that B contains multiple instances with unknown labels.
• The task is to learn a latent semantic subspace (LSS) and representations for both the single-instance view and the augmented multi-instance view, and then handle classification in the LSS.
• Major assumption:
• A single-instance-view instance and its corresponding augmented multi-instance-view bag B are close to each other in the LSS.

Figure: a paper (feature vector) and its reference bag {REF 1, REF 2, …, REF n}.
AMIV
AMIV (Yue Zhu, ACML, 2014)
• Idea 1:
• Documents can be decomposed to establish the latent semantic subspace (LSS) and to obtain representations of instances.
• Notation: the n instances in the single-instance view X, the representation of X in the LSS, and the basis of the LSS.
AMIV
AMIV (Yue Zhu, ACML, 2014)
• Idea 2:
• The augmented view also comes from the LSS, but with another representation.
• Z denotes the instances in the augmented multi-instance view.
AMIV
AMIV (Yue Zhu, ACML, 2014)
• Problem:
• The instances in B have unknown labels, and manually labeling these augmented instances is impractical; meanwhile the augmented view may not be robust (i.e. not all labels of instances in B are identical to the label of the original instance).
AMIV
AMIV (Yue Zhu, ACML, 2014)
• Idea 3:
• Use a key prototype to represent the whole bag of instances.
• Obviously, taking the center of the instances in the bag is not ideal.
• Taking a locally linear assumption (Roweis and Saul, 2000), it assumes that the prototype s_i of bag B_i is a linear combination of the intra-bag instances in the neighborhood of x_i, where δ is a k-nearest-neighbors indicator.
• The prototype also has its own latent representation.
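A toy sketch of the prototype step (hypothetical helper: it uses uniform weights over the k nearest bag instances, whereas the paper learns the combination weights):

```python
import numpy as np

def bag_prototype(x, bag, k=3):
    """Represent bag B_i by a linear combination of the bag instances
    nearest to the single-view instance x (uniform weights for simplicity)."""
    d = np.linalg.norm(bag - x, axis=1)     # distance of each bag instance to x
    idx = np.argsort(d)[:min(k, len(bag))]  # delta: k-nearest-neighbor indicator
    return bag[idx].mean(axis=0)            # uniform linear combination

# Usage: the far-away outlier [5, 5] is excluded from the prototype.
x = np.array([0.0, 0.0])
bag = np.array([[0.1, 0.0], [-0.1, 0.0], [5.0, 5.0]])
s = bag_prototype(x, bag, k=2)
```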
AMIV
AMIV (Yue Zhu, ACML, 2014)
• Combining things together:
• Stage 2:
• To handle the classification task in the LSS, it trains an SVM with an additional constraint that the representation of a prototype s_i should carry the same label as the representation of the corresponding single-instance-view instance x_i.
Section 4: Derivation
Derivation
Solving the NMF problem
• Optimizing all variables jointly is a non-convex problem, so we usually optimize one variable while fixing the others.
• Multiplicative update rules are derived using the gradient descent algorithm and the KKT conditions.
• An appropriate step size for gradient descent is constructed so that the update stays multiplicative (and hence non-negative).
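As a worked example of this recipe, the multiplicative rule for H under the Frobenius loss can be derived as follows (the W rule is symmetric):

```latex
% Objective and gradient with respect to H:
%   L(H) = ||V - WH||_F^2,   \nabla_H L = 2 (W^T W H - W^T V)
% Plain gradient descent:  H <- H - eta .* \nabla_H L
% Choosing the element-wise step size
%   eta_{aj} = H_{aj} / (2 (W^T W H)_{aj})
% cancels the subtractive term and yields the multiplicative rule
\[
  H_{aj} \leftarrow H_{aj}\,\frac{(W^\top V)_{aj}}{(W^\top W H)_{aj}}.
\]
% At any fixed point, (W^T W H - W^T V)_{aj} H_{aj} = 0, which is exactly
% the KKT complementary slackness condition for the constraint H >= 0.
```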
Conclusion
• NMF is a simple yet powerful tool, useful not only in multi-view learning but in many other directions.
• In multi-view learning, NMF is very flexible and fits a wide range of scenarios and assumptions.
• NMF is highly interpretable, which is perhaps its greatest advantage, and it naturally suits the multi-view setting.
• Precisely because NMF is simple, the motivation and assumptions of a paper often matter more than the method itself; the objective function can be designed flexibly around one's own assumptions and application scenario, so even simple methods can lead to strong papers.
• Results depend heavily on parameter tuning and initialization.
Thanks