
Page 1: master thesis presentation


Generalized Semi-Supervised Learning on Undirected Graphs with Multiscale Spectral Graph Wavelet Transformation

Cheng Gao

Advisor: Prof. Risi Kondor

Department of Statistics, The University of Chicago

June 17, 2013

Page 2: master thesis presentation


Outline

1. Semi-Supervised Learning on Graphs

2. Spectral Graph Wavelet Theory
   - Spectral Graph Wavelet Transformation
   - Spectral Graph Theory
   - Spectral Graph Wavelets
   - Scaling Waveform

3. Semi-Supervised Learning with SGWT
   - Generalized Semi-Supervised Learning with SGWT
   - Semi-Supervised Squared Loss Regression with SGWT
   - Semi-Supervised Support Vector Machine with SGWT

4. Application
   - Background
   - Results

Page 3: master thesis presentation


Semi-Supervised Learning on Graphs

In graph-based methods, we have a weighted graph where the nodes are the labeled and unlabeled data points and the weighted edges reflect the similarity of nodes.

Most of the current graph-based approaches to semi-supervised learning are based on the assumption that the target function is smooth with respect to the graph topology.

However, labels may have large local variations. One possible way to solve this problem is to construct a learning method with multiscale wavelets that can approximate piecewise smooth signals.

Page 4: master thesis presentation


Spectral Graph Wavelet Transformation

Weighted graphs G = (E, V, w):
- N vertices; w is a weight function on the edge set E
- Symmetric adjacency matrix A: A_{m,n} = w(e_{m,n}) if e_{m,n} ∈ E
- f(m) is the value of the signal on the m-th node

Wavelets on weighted graphs:
- Linear, multiscale representation for functions on the vertices
- Overcomplete transform
- Measure "how much of" a wavelet is present in the signal

SGWT:
- Flexibly models complicated data domains
- Can be used to approximate piecewise smooth signals

Page 5: master thesis presentation


Spectral Graph Theory

Graph Laplacian:

L = D - A, where D_{i,i} = \sum_{k=1}^{N} A_{k,i} is the degree matrix.

Spectral decomposition of L:

L \chi_\ell = \lambda_\ell \chi_\ell.   (1)

Graph "Fourier transform":

\hat{f}(\ell) = \langle \chi_\ell, f \rangle = \sum_{n=1}^{N} \chi_\ell^*(n) f(n).   (2)

Graph "inverse Fourier transform":

f(n) = \sum_{\ell=0}^{N-1} \hat{f}(\ell) \chi_\ell(n).   (3)
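These definitions map directly onto a few lines of linear algebra. A minimal sketch, assuming a small hand-written adjacency matrix (variable names are illustrative, not from the thesis):

```python
import numpy as np

# Small example: symmetric weighted adjacency matrix of a 3-node graph (assumed input)
A = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 1.0],
              [0.5, 1.0, 0.0]])

D = np.diag(A.sum(axis=0))       # degree matrix D_{i,i} = sum_k A_{k,i}
L = D - A                        # unnormalized graph Laplacian

# Spectral decomposition L chi_l = lambda_l chi_l (L is symmetric, so eigh applies)
lam, chi = np.linalg.eigh(L)     # eigenvalues lam, eigenvectors as columns of chi

f = np.array([1.0, -2.0, 0.5])   # a signal on the vertices

f_hat = chi.T @ f                # graph Fourier transform (2): f_hat(l) = <chi_l, f>
f_rec = chi @ f_hat              # inverse transform (3) recovers f
assert np.allclose(f_rec, f)
```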

Page 6: master thesis presentation


Spectral Graph Wavelets

Wavelet kernels g : R+ → R+:
- behave as a band-pass filter
- provide localization at small scales

Wavelet operator at scale t:

T_g^t = g(tL).   (4)

Wavelets:

\psi_{t,n} = T_g^t \delta_n,   (5)

where \psi_{t,n}(m) = \sum_{\ell=0}^{N-1} g(t\lambda_\ell) \chi_\ell^*(n) \chi_\ell(m).

Wavelet coefficients:

W_f(t,n) = (T_g^t f)(n) = \langle \psi_{t,n}, f \rangle = \sum_{\ell=0}^{N-1} g(t\lambda_\ell) \hat{f}(\ell) \chi_\ell(n).   (6)
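For intuition, (6) can be computed naively from the full eigendecomposition (in practice Hammond et al. [1] use a Chebyshev polynomial approximation instead). A sketch, where the band-pass kernel g is an illustrative assumption rather than the thesis's choice:

```python
import numpy as np

def sgwt_coefficients(L, f, g, scales):
    """Naive SGWT via eigendecomposition: W_f(t, n) = sum_l g(t * lam_l) * f_hat(l) * chi_l(n)."""
    lam, chi = np.linalg.eigh(L)
    f_hat = chi.T @ f
    # one row of coefficients per scale t, one column per vertex n
    return np.vstack([chi @ (g(t * lam) * f_hat) for t in scales])

# Illustrative band-pass kernel (assumption): g(x) = x * exp(-x), so g(0) = 0.
g = lambda x: x * np.exp(-x)

# Example use with L and f from the previous sketch:
# coeffs = sgwt_coefficients(L, f, g, scales=[0.5, 1.0, 2.0, 4.0])
```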

Page 7: master thesis presentation


Scaling Waveform

Scaling waveforms:
- Used to represent the low-frequency content of the signal f
- Help ensure stable recovery of the original signal f from the wavelet coefficients when the scale parameter t is sampled at a discrete number of values

Scaling kernel h: h(0) > 0 and h(x) → 0 as x → ∞.

Spectral graph scaling functions:

\phi_n = T_h \delta_n = h(L) \delta_n.   (7)

Stable recovery:
Hammond et al. [1] prove that stable recovery is assured if the quantity

G(\lambda) = h(\lambda)^2 + \sum_{j=1}^{J} g(t_j \lambda)^2

is bounded away from zero on the spectrum of L, i.e. bounded below by a constant A > 0 and above by a constant B.
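Since G(λ) only needs to be evaluated on the spectrum of L, this condition is easy to check numerically. A sketch under the same illustrative kernel assumptions as before:

```python
import numpy as np

def frame_bounds(lam, g, h, scales):
    """Evaluate G(lambda) = h(lambda)^2 + sum_j g(t_j * lambda)^2 on the eigenvalues of L
    and return its minimum and maximum, which play the roles of A and B."""
    G = h(lam) ** 2 + sum(g(t * lam) ** 2 for t in scales)
    return G.min(), G.max()

# Illustrative kernels (assumptions): a low-pass h and a band-pass g.
h = lambda x: np.exp(-x ** 2)
g = lambda x: x * np.exp(-x)

# Example use with the eigenvalues lam from the earlier sketch:
# A, B = frame_bounds(lam, g, h, scales=[0.5, 1.0, 2.0, 4.0])
```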

Page 8: master thesis presentation


SGWT Design

Page 9: master thesis presentation


Reconstruction

SGWT reconstruction [1]:

V = (W^* W)^{-1} W^* c.   (8)

Simplified "reconstruction":

V = W^* b.   (9)

Lemma
Let W be the linear map R^N → R^{N(J+1)} with \psi_{t_k, n}(m) as the (kN + m)-th row, n-th column element of the matrix; then W^* W is a diagonal matrix.

Page 10: master thesis presentation


Generalized Regularization Framework

b = \arg\min_b \frac{1}{|N_\ell|} \sum_{i \in N_\ell} \omega_i V(f_i, (W^* b)_i) + \lambda \| b_W \|_{\tau; q, p},   (10)

where N_\ell is the set of labeled data; \omega_i is the weight for the i-th label; b_W denotes the wavelet coefficient vector; \lambda is the tuning parameter; V is the loss function; and the \ell_q/\ell_p penalty is

\| b_W \|_{\tau; q, p} = \Big( \sum_{n=1}^{N} \Big( \sum_{j=1}^{L} \tau_{j,n} \| b_W^{t_j, n} \|^p \Big)^{q/p} \Big)^{1/q}.

Why use this penalization?
- Wavelet coefficients across spatial locations tend to be sparse, but the scaling coefficients are not.
- Coefficients at the same location are consistent.
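A small sketch of the mixed-norm penalty in (10), assuming the wavelet coefficients and the weights τ are stored as L-by-N arrays (scales by vertices); the function name and array layout are illustrative:

```python
import numpy as np

def mixed_norm_penalty(b_w, tau, q, p):
    """||b_W||_{tau; q, p} = ( sum_n ( sum_j tau[j, n] * |b_w[j, n]|**p )**(q/p) )**(1/q).

    b_w, tau : arrays of shape (L, N) -- wavelet coefficients and per-coefficient weights.
    """
    inner = (tau * np.abs(b_w) ** p).sum(axis=0)   # sum over scales j, one value per vertex n
    return float((inner ** (q / p)).sum() ** (1.0 / q))
```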

Page 11: master thesis presentation


Optimization

Douglas-Rachford splitting method [2]:
- "Splitting" in that the two functions are handled individually
- Each convex function enters only through its proximity operator (prox)
- Able to deal with non-smooth convex functions
- Quick convergence rates

Algorithm scheme:
Fix ε ∈ (0, 1), γ > 0, y_0 ∈ R^N.
For n = 0, 1, ...:
    x_n = prox_{γ f_2}(y_n)
    choose λ_n ∈ [ε, 2 − ε]
    y_{n+1} = y_n + λ_n ( prox_{γ f_1}(2 x_n − y_n) − x_n ).
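A minimal sketch of this iteration with a fixed relaxation λ_n = λ; this is generic Douglas-Rachford, not the thesis's exact implementation, and the commented example split is an assumption:

```python
import numpy as np

def douglas_rachford(prox_f1, prox_f2, y0, gamma=1.0, lam=1.0, n_iter=200):
    """Douglas-Rachford splitting for min_b f1(b) + f2(b).

    prox_f1, prox_f2 : callables (v, gamma) -> prox_{gamma*f1}(v), prox_{gamma*f2}(v).
    """
    y = y0.copy()
    for _ in range(n_iter):
        x = prox_f2(y, gamma)
        y = y + lam * (prox_f1(2 * x - y, gamma) - x)
    return prox_f2(y, gamma)   # the x-iterates converge to a minimizer

# Hypothetical split: f1(b) = mu * ||b||_1 with the soft-thresholding prox
# prox_f1 = lambda v, g: np.sign(v) * np.maximum(np.abs(v) - g * mu, 0.0)
# and f2 a smooth data-fit term whose prox solves a small linear system (see the next slide).
```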

Page 12: master thesis presentation


Semi-Supervised Squared Loss Regression

Assumptions:
- Labels belong to a continuous space.
- Labels are piecewise smooth with respect to the graph topology.

Model set-up:

b = \arg\min_b \frac{1}{|N_\ell|} \| f_\ell - W_\ell^* b \|^2 + \lambda \| b_W \|_{\tau; p, 1},   (11)

where f_\ell is the vector whose i-th entry is \omega_i f_i, and W_\ell^* consists of the rows of W^* with indices i ∈ N_\ell.
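To solve (11) with the Douglas-Rachford scheme above, one natural split (an assumption about the implementation, not stated on the slide) is f2(b) = (1/|N_ℓ|) ||f_ℓ − W_ℓ^* b||^2 and f1(b) = λ ||b_W||_{τ;p,1}. The prox of the quadratic data-fit term has a closed form via a linear solve:

```python
import numpy as np

def prox_data_fit(v, gamma, W_l_star, f_l):
    """prox_{gamma * f2}(v) for f2(b) = (1/m) * ||f_l - W_l_star @ b||^2:
    minimize gamma * f2(b) + 0.5 * ||b - v||^2, i.e. solve a ridge-type linear system."""
    m = W_l_star.shape[0]
    A = (2.0 * gamma / m) * W_l_star.T @ W_l_star + np.eye(W_l_star.shape[1])
    rhs = v + (2.0 * gamma / m) * W_l_star.T @ f_l
    return np.linalg.solve(A, rhs)
```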

Page 13: master thesis presentation


Example 1

- X = 1, 2, ..., 1000
- Jump at x = 500
- 25% labeled data
- Thresholded Gaussian weight function:

\omega_{u,v} = \exp\big( -(x_u - x_v)^2 / (2\sigma^2) \big)   if ||x_u - x_v|| ≤ \varpi,
               0                                             otherwise.
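A sketch of this weight construction (the particular σ and ϖ values below are illustrative assumptions, not the thesis's settings):

```python
import numpy as np

def thresholded_gaussian_weights(x, sigma, varpi):
    """Weight matrix with Gaussian weights in |x_u - x_v|, zeroed beyond the threshold varpi."""
    d = np.abs(x[:, None] - x[None, :])      # pairwise distances |x_u - x_v|
    W = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    W[d > varpi] = 0.0
    np.fill_diagonal(W, 0.0)                 # no self-loops
    return W

# Example use for the 1-D chain of Example 1:
# x = np.arange(1, 1001, dtype=float)
# A = thresholded_gaussian_weights(x, sigma=2.0, varpi=5.0)
```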

Page 14: master thesis presentation


Estimated Wavelet Coefficients

[Figure: true wavelet coefficients vs. estimated wavelet coefficients]

Both are only active around the discontinuities in the signal.

Page 15: master thesis presentation


Robustness Test by Adding Noise in Labels

Note: normal noise with mean 0 and standard deviation σ is added to the true labels.

Page 16: master thesis presentation


Semi-Supervised Support Vector Machine

Assumption:
Labels are in {−1, 1}.

Model set-up:

b = \arg\min_b \frac{1}{|N_\ell|} \sum_{i \in N_\ell} \max(0, 1 - y_i \Omega_i b) + \lambda \| b_W \|_{\tau; 1, p},   (12)

where y_i is the i-th entry of the label vector f_\ell and \Omega_i is the i-th row of W_\ell^*.

We can give a higher penalty to higher-frequency information to "smooth" the estimate. The tuning parameter λ controls the overall amount of penalization.
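A short sketch of how the objective in (12) can be evaluated, reusing mixed_norm_penalty from the earlier sketch; the (n_scales, N) layout of the wavelet coefficients is an assumption:

```python
import numpy as np

def hinge_term(b, W_l_star, y):
    """Averaged hinge loss: (1/|N_l|) * sum_i max(0, 1 - y_i * (W_l_star @ b)_i)."""
    return float(np.maximum(0.0, 1.0 - y * (W_l_star @ b)).mean())

# Full objective of (12), with b_w the wavelet part of b reshaped to (n_scales, N):
# objective = hinge_term(b, W_l_star, y) + lam * mixed_norm_penalty(b_w, tau, q=1.0, p=2.0)
```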

Page 17: master thesis presentation


Example 2

1000 nodes x = (x_1, x_2) randomly placed in the [0,1] × [0,1] square. The labels are given by

f(x) = −1   if x_1 < 0.8 and x_2 ≤ (x_1 + 0.5)^2,
        1   otherwise.   (13)

[Figure: original signal, 25% sample, estimated signal]

Note: the misclassification rate is 0.051.
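A sketch of generating this synthetic example (the random seed is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(1000, 2))     # nodes placed uniformly in the unit square
labels = np.where((X[:, 0] < 0.8) & (X[:, 1] <= (X[:, 0] + 0.5) ** 2), -1, 1)   # rule (13)
```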

Page 18: master thesis presentation


Average Misclassification vs. Sample Size

Page 19: master thesis presentation


Robustness Test by Changing the Graph

Page 20: master thesis presentation


Example 3

f(x) = −1   if (x_1 ≤ 0.4 and x_2 ≥ (x_1 + 0.8)^2) or (x_1 − 0.45)^2 + (x_2 − 0.45)^2 ≤ 0.04 or (x_2 ≤ x_1 − 0.5 and x_1 ≥ 0.5),
        1   otherwise.

Only 5% of the data is sampled.

[Figure: original signal, 5% sample, estimated signal]

Page 21: master thesis presentation


Control the Smoothness

To "smooth" the result, we used the function below to givehigher penalty to higher freqency information.

τj,n ∝ t−5j (14)

where tj is the value of j th scale j = 1,2,3,4.

The model with that structure of penalty at different scaleshas smaller missclassification rate(0.095) than the modelwith same penalty at different scales(0.119) when we onlyhave 5% of labeled data.
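A sketch of these scale-dependent weights (the scale values t_j and the proportionality constant below are illustrative assumptions):

```python
import numpy as np

t = np.array([2.0, 1.0, 0.5, 0.25])                # hypothetical values of the four scales t_j
tau_per_scale = t ** -5                            # tau_{j, n} proportional to t_j^{-5}, as in (14)
tau = np.tile(tau_per_scale[:, None], (1, 1000))   # same weight for every vertex n
```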

Page 22: master thesis presentation


Background

The semi-supervised support vector machine with SGWT is applied to document categorization experiments using the 20 newsgroups dataset.

This dataset contains 205 Life Sciences articles and 280 Medicine articles from Science News, and 1153 words are chosen as being relevant for this body of documents.

A document-word matrix whose (i, j) entry is the frequency of the j-th dictionary word in the i-th document was constructed. The following weight function on the edges is used:

w_{u,v} = \exp\Big( -\frac{1}{0.03} \Big( 1 - \frac{u^T v}{|u||v|} \Big) \Big).   (15)
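A sketch of building these weights from the document-word matrix (the function name is illustrative):

```python
import numpy as np

def cosine_weights(X, scale=0.03):
    """Edge weights (15): w_{u,v} = exp(-(1 - cos(u, v)) / scale) for document rows u, v of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    cos = (X @ X.T) / (norms * norms.T)      # cosine similarity u^T v / (|u| |v|)
    W = np.exp(-(1.0 - cos) / scale)
    np.fill_diagonal(W, 0.0)                 # drop self-loops
    return W
```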

Page 23: master thesis presentation


Average Misclassification vs. Sample Size

Page 24: master thesis presentation


Thanks!

Acknowledgement: Prof. Risi Kondor

Page 25: master thesis presentation


Reference I

[1] D. K. Hammond, P. Vandergheynst, and R. Gribonval (2011). Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129-150.

[2] P. L. Combettes and J.-C. Pesquet (2007). A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE Journal of Selected Topics in Signal Processing, 1(4):564-574.