
Page 1: master thesis presentation


Generalized Semi-Supervised Learning on Undirected Graphs with Multiscale Spectral Graph Wavelet Transformation

Cheng Gao

Advisor: Prof. Risi Kondor

Department of Statistics, The University of Chicago

June 17, 2013

Page 2: master thesis presentation


Outline

1. Semi-Supervised Learning on Graphs

2. Spectral Graph Wavelet Theory
   - Spectral Graph Wavelet Transformation
   - Spectral Graph Theory
   - Spectral Graph Wavelets
   - Scaling Waveform

3. Semi-Supervised Learning with SGWT
   - Generalized Semi-Supervised Learning with SGWT
   - Semi-Supervised Squared Loss Regression with SGWT
   - Semi-Supervised Support Vector Machine with SGWT

4. Application
   - Background
   - Results

Page 3: master thesis presentation


Semi-Supervised Learning on Graphs

In graph-based methods, we have a weighted graph where the nodes are the labeled and unlabeled data points and the weighted edges reflect the similarity of nodes.

Most of the current graph-based approaches to semi-supervised learning are based on the assumption that the target function is smooth with respect to the graph topology.

However, labels may have large local variations. One possible way to solve this problem is to construct a learning method with multiscale wavelets that can approximate piecewise smooth signals.

Page 4: master thesis presentation


Spectral Graph Wavelet Transformation

Weighted graphs G = (E, V, w):
- N vertices; w is a weight function on the edge set E
- Symmetric adjacency matrix A: A_{m,n} = w(e_{m,n}) if e_{m,n} ∈ E
- f(m) is the value of the signal on the m-th node

Wavelets on weighted graphs:
- Linear, multiscale representation for functions on the vertices
- Overcomplete transform
- Measure "how much of" a wavelet is present in the signal

SGWT:
- Flexibly models complicated data domains
- Can be used to approximate piecewise smooth signals

Page 5: master thesis presentation


Spectral Graph Theory

Graph Laplacian:

L = D - A, where D_{i,i} = \sum_{k=1}^{N} A_{k,i} is the degree matrix.

Spectral decomposition of L:

L \chi_\ell = \lambda_\ell \chi_\ell.   (1)

Graph "Fourier transform":

\hat{f}(\ell) = \langle \chi_\ell, f \rangle = \sum_{n=1}^{N} \chi_\ell^*(n) f(n).   (2)

Graph "inverse Fourier transform":

f(n) = \sum_{\ell=0}^{N-1} \hat{f}(\ell) \chi_\ell(n).   (3)
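These definitions map directly onto a few lines of linear algebra. A minimal sketch, assuming a small hand-written adjacency matrix (variable names are illustrative, not from the thesis):

```python
import numpy as np

# Small example: symmetric weighted adjacency matrix of a 3-node graph (assumed input)
A = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 1.0],
              [0.5, 1.0, 0.0]])

D = np.diag(A.sum(axis=0))       # degree matrix D_{i,i} = sum_k A_{k,i}
L = D - A                        # unnormalized graph Laplacian

# Spectral decomposition L chi_l = lambda_l chi_l (L is symmetric, so eigh applies)
lam, chi = np.linalg.eigh(L)     # eigenvalues lam, eigenvectors as columns of chi

f = np.array([1.0, -2.0, 0.5])   # a signal on the vertices

f_hat = chi.T @ f                # graph Fourier transform (2): f_hat(l) = <chi_l, f>
f_rec = chi @ f_hat              # inverse transform (3) recovers f
assert np.allclose(f_rec, f)
```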

Page 6: master thesis presentation


Spectral Graph Wavelets

Wavelet kernels g : R+ → R+:
- behave as a band-pass filter
- provide localization at small scales

Wavelet operator at scale t:

T_g^t = g(tL).   (4)

Wavelets:

\psi_{t,n} = T_g^t \delta_n,   (5)

where \psi_{t,n}(m) = \sum_{\ell=0}^{N-1} g(t\lambda_\ell) \chi_\ell^*(n) \chi_\ell(m).

Wavelet coefficients:

W_f(t,n) = (T_g^t f)(n) = \langle \psi_{t,n}, f \rangle = \sum_{\ell=0}^{N-1} g(t\lambda_\ell) \hat{f}(\ell) \chi_\ell(n).   (6)
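For intuition, (6) can be computed naively from the full eigendecomposition (in practice Hammond et al. [1] use a Chebyshev polynomial approximation instead). A sketch, where the band-pass kernel g is an illustrative assumption rather than the thesis's choice:

```python
import numpy as np

def sgwt_coefficients(L, f, g, scales):
    """Naive SGWT via eigendecomposition: W_f(t, n) = sum_l g(t * lam_l) * f_hat(l) * chi_l(n)."""
    lam, chi = np.linalg.eigh(L)
    f_hat = chi.T @ f
    # one row of coefficients per scale t, one column per vertex n
    return np.vstack([chi @ (g(t * lam) * f_hat) for t in scales])

# Illustrative band-pass kernel (assumption): g(x) = x * exp(-x), so g(0) = 0.
g = lambda x: x * np.exp(-x)

# Example use with L and f from the previous sketch:
# coeffs = sgwt_coefficients(L, f, g, scales=[0.5, 1.0, 2.0, 4.0])
```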

Page 7: master thesis presentation


Scaling Waveform

Scaling waveforms:
- Used to represent the low-frequency content of the signal f
- Help ensure stable recovery of the original signal f from the wavelet coefficients when the scale parameter t is sampled at a discrete number of values

Scaling kernel h: h(0) > 0 and h(x) → 0 as x → ∞.

Spectral graph scaling functions:

\phi_n = T_h \delta_n = h(L) \delta_n.   (7)

Stable recovery:
Hammond et al. [1] prove that stable recovery is assured if the quantity

G(\lambda) = h(\lambda)^2 + \sum_{j=1}^{J} g(t_j \lambda)^2

is bounded away from zero on the spectrum of L, i.e. bounded below by a constant A > 0 and above by a constant B.
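Since G(λ) only needs to be evaluated on the spectrum of L, this condition is easy to check numerically. A sketch under the same illustrative kernel assumptions as before:

```python
import numpy as np

def frame_bounds(lam, g, h, scales):
    """Evaluate G(lambda) = h(lambda)^2 + sum_j g(t_j * lambda)^2 on the eigenvalues of L
    and return its minimum and maximum, which play the roles of A and B."""
    G = h(lam) ** 2 + sum(g(t * lam) ** 2 for t in scales)
    return G.min(), G.max()

# Illustrative kernels (assumptions): a low-pass h and a band-pass g.
h = lambda x: np.exp(-x ** 2)
g = lambda x: x * np.exp(-x)

# Example use with the eigenvalues lam from the earlier sketch:
# A, B = frame_bounds(lam, g, h, scales=[0.5, 1.0, 2.0, 4.0])
```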

Page 8: master thesis presentation


SGWT Design

Page 9: master thesis presentation


Reconstruction

SGWT reconstruction [1]:

V = (W^* W)^{-1} W^* c.   (8)

Simplified "reconstruction":

V = W^* b.   (9)

Lemma
Let W be the linear map R^N → R^{N(J+1)} with \psi_{t_k, n}(m) as the (kN + m)-th row, n-th column element of the matrix; then W^* W is a diagonal matrix.

Page 10: master thesis presentation


Generalized Regularization Framework

b = \arg\min_b \frac{1}{|N_\ell|} \sum_{i \in N_\ell} \omega_i V(f_i, (W^* b)_i) + \lambda \| b_W \|_{\tau; q, p},   (10)

where N_\ell is the set of labeled data; \omega_i is the weight for the i-th label; b_W denotes the wavelet coefficient vector; \lambda is the tuning parameter; V is the loss function; and the \ell_q/\ell_p penalty is

\| b_W \|_{\tau; q, p} = \Big( \sum_{n=1}^{N} \Big( \sum_{j=1}^{L} \tau_{j,n} \| b_W^{t_j, n} \|^p \Big)^{q/p} \Big)^{1/q}.

Why use this penalization?
- Wavelet coefficients across spatial locations tend to be sparse, but the scaling coefficients are not.
- Coefficients at the same location are consistent.
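A small sketch of the mixed-norm penalty in (10), assuming the wavelet coefficients and the weights τ are stored as L-by-N arrays (scales by vertices); the function name and array layout are illustrative:

```python
import numpy as np

def mixed_norm_penalty(b_w, tau, q, p):
    """||b_W||_{tau; q, p} = ( sum_n ( sum_j tau[j, n] * |b_w[j, n]|**p )**(q/p) )**(1/q).

    b_w, tau : arrays of shape (L, N) -- wavelet coefficients and per-coefficient weights.
    """
    inner = (tau * np.abs(b_w) ** p).sum(axis=0)   # sum over scales j, one value per vertex n
    return float((inner ** (q / p)).sum() ** (1.0 / q))
```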

Page 11: master thesis presentation


Optimization

Douglas-Rachford splitting method [2]:
- "Splitting" in that the two functions are handled individually
- Each convex function enters only through its proximity operator (prox)
- Able to deal with non-smooth convex functions
- Quick convergence rates

Algorithm scheme:
Fix ε ∈ (0, 1), γ > 0, y_0 ∈ R^N.
For n = 0, 1, ...:
    x_n = prox_{γ f_2}(y_n)
    choose λ_n ∈ [ε, 2 − ε]
    y_{n+1} = y_n + λ_n ( prox_{γ f_1}(2 x_n − y_n) − x_n ).
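A minimal sketch of this iteration with a fixed relaxation λ_n = λ; this is generic Douglas-Rachford, not the thesis's exact implementation, and the commented example split is an assumption:

```python
import numpy as np

def douglas_rachford(prox_f1, prox_f2, y0, gamma=1.0, lam=1.0, n_iter=200):
    """Douglas-Rachford splitting for min_b f1(b) + f2(b).

    prox_f1, prox_f2 : callables (v, gamma) -> prox_{gamma*f1}(v), prox_{gamma*f2}(v).
    """
    y = y0.copy()
    for _ in range(n_iter):
        x = prox_f2(y, gamma)
        y = y + lam * (prox_f1(2 * x - y, gamma) - x)
    return prox_f2(y, gamma)   # the x-iterates converge to a minimizer

# Hypothetical split: f1(b) = mu * ||b||_1 with the soft-thresholding prox
# prox_f1 = lambda v, g: np.sign(v) * np.maximum(np.abs(v) - g * mu, 0.0)
# and f2 a smooth data-fit term whose prox solves a small linear system (see the next slide).
```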

Page 12: master thesis presentation


Semi-Supervised Squared Loss Regression

Assumptions:
- Labels belong to a continuous space.
- Labels are piecewise smooth with respect to the graph topology.

Model set-up:

b = \arg\min_b \frac{1}{|N_\ell|} \| f_\ell - W_\ell^* b \|^2 + \lambda \| b_W \|_{\tau; p, 1},   (11)

where f_\ell is the vector whose i-th entry is \omega_i f_i, and W_\ell^* consists of the rows of W^* with indices i ∈ N_\ell.
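To solve (11) with the Douglas-Rachford scheme above, one natural split (an assumption about the implementation, not stated on the slide) is f2(b) = (1/|N_ℓ|) ||f_ℓ − W_ℓ^* b||^2 and f1(b) = λ ||b_W||_{τ;p,1}. The prox of the quadratic data-fit term has a closed form via a linear solve:

```python
import numpy as np

def prox_data_fit(v, gamma, W_l_star, f_l):
    """prox_{gamma * f2}(v) for f2(b) = (1/m) * ||f_l - W_l_star @ b||^2:
    minimize gamma * f2(b) + 0.5 * ||b - v||^2, i.e. solve a ridge-type linear system."""
    m = W_l_star.shape[0]
    A = (2.0 * gamma / m) * W_l_star.T @ W_l_star + np.eye(W_l_star.shape[1])
    rhs = v + (2.0 * gamma / m) * W_l_star.T @ f_l
    return np.linalg.solve(A, rhs)
```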

Page 13: master thesis presentation


Example 1

- X = 1, 2, ..., 1000
- Jump at x = 500
- 25% labeled data
- Thresholded Gaussian weight function:

\omega_{u,v} = \exp\big( -(x_u - x_v)^2 / (2\sigma^2) \big)   if ||x_u - x_v|| ≤ \varpi,
               0                                             otherwise.
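A sketch of this weight construction (the particular σ and ϖ values below are illustrative assumptions, not the thesis's settings):

```python
import numpy as np

def thresholded_gaussian_weights(x, sigma, varpi):
    """Weight matrix with Gaussian weights in |x_u - x_v|, zeroed beyond the threshold varpi."""
    d = np.abs(x[:, None] - x[None, :])      # pairwise distances |x_u - x_v|
    W = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    W[d > varpi] = 0.0
    np.fill_diagonal(W, 0.0)                 # no self-loops
    return W

# Example use for the 1-D chain of Example 1:
# x = np.arange(1, 1001, dtype=float)
# A = thresholded_gaussian_weights(x, sigma=2.0, varpi=5.0)
```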

Page 14: master thesis presentation


Estimated Wavelet Coefficients

[Figure: true wavelet coefficients vs. estimated wavelet coefficients]

Both are only active around the discontinuities in the signal.

Page 15: master thesis presentation


Robustness Test by Adding Noise in Labels

Note: normal noise with mean 0 and standard deviation σ is added to the true labels.

Page 16: master thesis presentation


Semi-Supervised Support Vector Machine

Assumption:
Labels are in {−1, 1}.

Model set-up:

b = \arg\min_b \frac{1}{|N_\ell|} \sum_{i \in N_\ell} \max(0, 1 - y_i \Omega_i b) + \lambda \| b_W \|_{\tau; 1, p},   (12)

where y_i is the i-th entry of the label vector f_\ell and \Omega_i is the i-th row of W_\ell^*.

We can give a higher penalty to higher-frequency information to "smooth" the estimate. The tuning parameter λ controls the overall amount of penalization.
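A short sketch of how the objective in (12) can be evaluated, reusing mixed_norm_penalty from the earlier sketch; the (n_scales, N) layout of the wavelet coefficients is an assumption:

```python
import numpy as np

def hinge_term(b, W_l_star, y):
    """Averaged hinge loss: (1/|N_l|) * sum_i max(0, 1 - y_i * (W_l_star @ b)_i)."""
    return float(np.maximum(0.0, 1.0 - y * (W_l_star @ b)).mean())

# Full objective of (12), with b_w the wavelet part of b reshaped to (n_scales, N):
# objective = hinge_term(b, W_l_star, y) + lam * mixed_norm_penalty(b_w, tau, q=1.0, p=2.0)
```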

Page 17: master thesis presentation


Example 2

1000 nodes x = (x_1, x_2) randomly placed in the [0,1] × [0,1] square. The labels are given by

f(x) = −1   if x_1 < 0.8 and x_2 ≤ (x_1 + 0.5)^2,
        1   otherwise.   (13)

[Figure: original signal, 25% sample, estimated signal]

Note: the misclassification rate is 0.051.
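A sketch of generating this synthetic example (the random seed is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(1000, 2))     # nodes placed uniformly in the unit square
labels = np.where((X[:, 0] < 0.8) & (X[:, 1] <= (X[:, 0] + 0.5) ** 2), -1, 1)   # rule (13)
```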

Page 18: master thesis presentation


Average Misclassification vs. Sample Size

Page 19: master thesis presentation


Robustness Test by Changing the Graph

Page 20: master thesis presentation


Example 3

f(x) = −1   if (x_1 ≤ 0.4 and x_2 ≥ (x_1 + 0.8)^2) or (x_1 − 0.45)^2 + (x_2 − 0.45)^2 ≤ 0.04 or (x_2 ≤ x_1 − 0.5 and x_1 ≥ 0.5),
        1   otherwise.

Only 5% of the data is sampled.

[Figure: original signal, 5% sample, estimated signal]

Page 21: master thesis presentation


Control the Smoothness

To "smooth" the result, we used the function below to givehigher penalty to higher freqency information.

τj,n ∝ t−5j (14)

where tj is the value of j th scale j = 1,2,3,4.

The model with that structure of penalty at different scaleshas smaller missclassification rate(0.095) than the modelwith same penalty at different scales(0.119) when we onlyhave 5% of labeled data.
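A sketch of these scale-dependent weights (the scale values t_j and the proportionality constant below are illustrative assumptions):

```python
import numpy as np

t = np.array([2.0, 1.0, 0.5, 0.25])                # hypothetical values of the four scales t_j
tau_per_scale = t ** -5                            # tau_{j, n} proportional to t_j^{-5}, as in (14)
tau = np.tile(tau_per_scale[:, None], (1, 1000))   # same weight for every vertex n
```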

Page 22: master thesis presentation


Background

The semi-supervised support vector machine with SGWT is applied to document categorization experiments using the 20 newsgroups dataset.

This dataset contains 205 Life Sciences articles and 280 Medicine articles from Science News, and 1153 words are chosen as being relevant for this body of documents.

A document-word matrix whose (i, j) entry is the frequency of the j-th dictionary word in the i-th document was constructed. The following weight function on the edges is used:

w_{u,v} = \exp\Big( -\frac{1}{0.03} \Big( 1 - \frac{u^T v}{|u||v|} \Big) \Big).   (15)
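A sketch of building these weights from the document-word matrix (the function name is illustrative):

```python
import numpy as np

def cosine_weights(X, scale=0.03):
    """Edge weights (15): w_{u,v} = exp(-(1 - cos(u, v)) / scale) for document rows u, v of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    cos = (X @ X.T) / (norms * norms.T)      # cosine similarity u^T v / (|u| |v|)
    W = np.exp(-(1.0 - cos) / scale)
    np.fill_diagonal(W, 0.0)                 # drop self-loops
    return W
```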

Page 23: master thesis presentation


Average Misclassification vs. Sample Size

Page 24: master thesis presentation


Thanks!

Acknowledgement: Prof. Risi Kondor

Page 25: master thesis presentation


Reference I

[1] D. K. Hammond, P. Vandergheynst, and R. Gribonval (2011). Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129-150.

[2] P. L. Combettes and J.-C. Pesquet (2007). A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE Journal of Selected Topics in Signal Processing, 1(4):564-574.