Generalized Semi-Supervised Learning on Undirected Graphs with Multiscale Spectral Graph Wavelet Transformation

Cheng Gao
Advisor: Prof. Risi Kondor
Department of Statistics, The University of Chicago
June 17, 2013
Outline
1. Semi-Supervised Learning on Graphs
2. Spectral Graph Wavelet Theory
   - Spectral Graph Wavelet Transformation
   - Spectral Graph Theory
   - Spectral Graph Wavelets
   - Scaling Waveform
3. Semi-Supervised Learning with SGWT
   - Generalized Semi-Supervised Learning with SGWT
   - Semi-Supervised Squared Loss Regression with SGWT
   - Semi-Supervised Support Vector Machine with SGWT
4. Application
   - Background
   - Results
Semi-Supervised Learning on Graphs
In graph-based methods, we have a weighted graph whose nodes are the labeled and unlabeled data points, and whose weighted edges reflect the similarity between nodes.

Most current graph-based approaches to semi-supervised learning assume that the target function is smooth with respect to the graph topology.

However, labels may have large local variations. One way to address this is to construct a learning method based on multiscale wavelets, which can approximate piecewise smooth signals.
Spectral Graph Wavelet Transformation
Weighted graph G = (V, E, w):
- N vertices; w is a weight function on the edges E
- Symmetric adjacency matrix A: A_{m,n} = w(e_{m,n}) if e_{m,n} ∈ E
- f(m) is the value of the signal on the mth node

Wavelets on weighted graphs:
- Linear, multiscale representation for functions on the vertices
- Overcomplete transform
- Measure "how much of" a wavelet is present in the signal

SGWT:
- Flexibly models complicated data domains
- Can be used to approximate piecewise smooth signals
Spectral Graph Theory
Graph Laplacian:

  L = D − A, where D_{i,i} = Σ_{k=1}^N A_{k,i} is the degree matrix

Spectral decomposition of L:

  L χ_ℓ = λ_ℓ χ_ℓ    (1)

Graph "Fourier transform":

  f̂(ℓ) = ⟨χ_ℓ, f⟩ = Σ_{n=1}^N χ_ℓ*(n) f(n)    (2)

Graph "inverse Fourier transform":

  f(n) = Σ_{ℓ=0}^{N−1} f̂(ℓ) χ_ℓ(n)    (3)
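As an illustration (not from the slides), the Laplacian, its eigendecomposition, and the Fourier pair (2)-(3) can be sketched in NumPy; the 4-node path graph is a made-up example:

```python
import numpy as np

# Hypothetical example: path graph on 4 nodes with unit edge weights.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=0))          # degree matrix D_{i,i} = sum_k A_{k,i}
L = D - A                           # graph Laplacian

# Spectral decomposition L chi_l = lambda_l chi_l (L is symmetric PSD).
lam, chi = np.linalg.eigh(L)

f = np.array([1.0, 2.0, 3.0, 4.0])  # a signal on the vertices
f_hat = chi.T @ f                   # graph Fourier transform <chi_l, f>
f_rec = chi @ f_hat                 # inverse transform recovers f

assert np.allclose(f_rec, f)        # Eqs. (2)-(3) are inverse to each other
```

Because the eigenvectors χ_ℓ form an orthonormal basis, the forward and inverse transforms are exact inverses, which the final check confirms numerically.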
Spectral Graph Wavelets
Wavelet kernel g : R+ → R+:
- behaves as a band-pass filter
- localized at small scales

Wavelet operator at scale t:

  T_g^t = g(tL)    (4)

Wavelets:

  ψ_{t,n} = T_g^t δ_n, where ψ_{t,n}(m) = Σ_{ℓ=0}^{N−1} g(tλ_ℓ) χ_ℓ*(n) χ_ℓ(m)    (5)

Wavelet coefficients:

  W_f(t, n) = (T_g^t f)(n) = ⟨ψ_{t,n}, f⟩ = Σ_{ℓ=0}^{N−1} g(tλ_ℓ) f̂(ℓ) χ_ℓ(n)    (6)
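A minimal sketch of Eqs. (4)-(6), assuming a made-up band-pass kernel g and a small cycle graph (neither is specified in the slides): the operator g(tL) is applied in the eigenbasis, and the result matches the spectral-sum form of Eq. (6).

```python
import numpy as np

def g(x):
    # Hypothetical band-pass kernel: g(0) = 0, decays for large x.
    return x * np.exp(-x)

# Cycle graph on 6 nodes (illustrative assumption).
N = 6
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
L = np.diag(A.sum(0)) - A
lam, chi = np.linalg.eigh(L)

t = 2.0                                   # scale parameter
Tg = chi @ np.diag(g(t * lam)) @ chi.T    # wavelet operator T_g^t = g(tL)

f = np.sin(2 * np.pi * np.arange(N) / N)  # test signal on the vertices
coeffs = Tg @ f                           # W_f(t, n) = (T_g^t f)(n)
# Same coefficients via the spectral sum g(t lam_l) f_hat(l) chi_l(n):
coeffs_spectral = chi @ (g(t * lam) * (chi.T @ f))
assert np.allclose(coeffs, coeffs_spectral)
```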
Scaling Waveform
Scaling waveforms:
- Used to represent the low-frequency content of the signal f
- Help ensure stable recovery of the original signal f from the wavelet coefficients when the scale parameter t is sampled at a discrete number of values

Scaling kernel h: h(0) > 0 and h(x) → 0 as x → ∞

Spectral graph scaling functions:

  φ_n = T_h δ_n = h(L) δ_n    (7)

Stable recovery: Hammond et al. [1] prove that stable recovery is assured if the quantity

  G(λ) = h(λ)² + Σ_{j=1}^J g(t_j λ)²

satisfies 0 < A ≤ G(λ) ≤ B on the spectrum of L, for frame bounds A and B.
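The frame condition can be checked numerically. The kernels below are made-up stand-ins (the slides do not give explicit formulas for g and h), and the spectrum is replaced by a grid of candidate eigenvalues:

```python
import numpy as np

def g(x):
    # Hypothetical band-pass kernel: g(0) = 0, peak near x = 1.
    return x * np.exp(1.0 - x)

def h(x):
    # Hypothetical low-pass scaling kernel: h(0) = 1 > 0, h -> 0.
    return np.exp(-x**4)

scales = [4.0, 2.0, 1.0, 0.5]              # J = 4 scales t_j (assumed)
lam_grid = np.linspace(0.0, 2.0, 201)      # stand-in for spec(L)

# G(lam) = h(lam)^2 + sum_j g(t_j lam)^2 must stay in [A, B] with A > 0.
G = h(lam_grid)**2 + sum(g(t * lam_grid)**2 for t in scales)
A_bound, B_bound = G.min(), G.max()
assert A_bound > 0                          # bounded away from zero
```

If A_bound were 0 (e.g. because all kernels vanish at some eigenvalue), a signal component at that frequency would be lost and stable recovery would fail.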
SGWT Design
Reconstruction
SGWT reconstruction [1]:

  V = (W*W)^{−1} W* c    (8)

Simplified "reconstruction":

  V = W* b    (9)

Lemma
Let W be the linear map R^N → R^{N(J+1)} with ψ_{t_k,n}(m) as the (kN + m)th-row, nth-column entry of the matrix; then W*W is a diagonal matrix.
Generalized Regularization Framework
  b̂ = argmin_b (1/|N_ℓ|) Σ_{i∈N_ℓ} ω_i V(f_i, (W* b)_i) + λ ‖b_W‖_{τ;q,p}    (10)

where N_ℓ is the set of labeled data; ω_i is the weight for the ith label; b_W is the vector of wavelet coefficients; the ℓ_q/ℓ_p penalty function is

  ‖b_W‖_{τ;q,p} = ( Σ_{n=1}^N ( Σ_{j=1}^J τ_{j,n} |b_{W,t_j,n}|^p )^{q/p} )^{1/q};

λ is the tuning parameter; V is the loss function.
Why use this penalization?
- Wavelet coefficients across spatial locations tend to be sparse, but the scaling coefficients are not.
- Coefficients at the same location are consistent.
Optimization
Douglas-Rachford splitting method [2]:
- "Splitting" in that the two functions are used individually
- Each convex function enters through its proximity operator (prox)
- Able to deal with non-smooth convex functions
- Quick convergence rates

Algorithm scheme: Fix ε ∈ (0, 1), γ > 0, y_0 ∈ R^N.

For n = 0, 1, ...:
  x_n = prox_{γ f_2}(y_n)
  choose λ_n ∈ [ε, 2 − ε]
  y_{n+1} = y_n + λ_n ( prox_{γ f_1}(2 x_n − y_n) − x_n )
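The scheme above can be sketched on a toy problem (an assumption, not the slides' objective): f_1(x) = μ‖x‖₁ and f_2(x) = ½‖x − b‖², whose minimizer is the soft-thresholding of b at μ, so the result is easy to verify.

```python
import numpy as np

# Toy instance (assumed): minimize mu*||x||_1 + 0.5*||x - b||^2.
b = np.array([3.0, -0.2, 1.5, 0.05])
mu, gamma, eps = 0.5, 1.0, 0.1

def prox_f1(v):
    # prox of gamma*mu*||.||_1: componentwise soft thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - gamma * mu, 0.0)

def prox_f2(v):
    # prox of gamma*0.5*||. - b||^2 in closed form.
    return (v + gamma * b) / (1.0 + gamma)

y = np.zeros_like(b)
for n in range(200):
    x = prox_f2(y)                   # x_n = prox_{gamma f2}(y_n)
    lam_n = 1.0                      # any lam_n in [eps, 2 - eps]
    y = y + lam_n * (prox_f1(2 * x - y) - x)

# The known minimizer: soft-threshold b at mu.
expected = np.sign(b) * np.maximum(np.abs(b) - mu, 0.0)
assert np.allclose(x, expected, atol=1e-6)
```

Only the two proximity operators are ever evaluated, which is what makes the method attractive for the non-smooth wavelet penalties used here.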
Semi-Supervised Squared Loss Regression
Assumptions:
- Labels belong to a continuous space.
- Labels are piecewise smooth with respect to the graph topology.
Model Set-Up

  b̂ = argmin_b (1/|N_ℓ|) ‖f_ℓ − W_ℓ* b‖² + λ ‖b_W‖_{τ;p,1}    (11)

where f_ℓ is the vector whose ith entry is ω_i f_i, and W_ℓ* consists of the rows of W* with indices i ∈ N_ℓ.
Example 1
- X = 1, 2, ..., 1000
- Jump at x = 500
- 25% labeled data
- Thresholded Gaussian weight function:

  ω_{u,v} = exp( −(x_u − x_v)² / (2σ²) ) if ‖x_u − x_v‖ ≤ ϖ, and 0 otherwise
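Constructing this thresholded Gaussian weight matrix is straightforward; the values of σ and the threshold ϖ below are assumptions, since the slides leave them unspecified:

```python
import numpy as np

# Example 1 setup: X = 1, ..., 1000 on a line.
x = np.arange(1, 1001, dtype=float)
sigma, varpi = 2.0, 4.0              # kernel width and threshold (assumed)

# Thresholded Gaussian weights: Gaussian within distance varpi, else 0.
diff = np.abs(x[:, None] - x[None, :])
W = np.exp(-diff**2 / (2 * sigma**2))
W[diff > varpi] = 0.0                # zero out edges beyond the threshold
np.fill_diagonal(W, 0.0)             # no self-loops

assert np.allclose(W, W.T)           # symmetric, as an adjacency must be
assert W[0, 1] > 0 and W[0, 999] == 0.0
```

The threshold keeps the graph sparse, so the resulting Laplacian (and hence the SGWT) is cheap to work with even for 1000 nodes.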
Estimated Wavelet Coefficients
(Figure: true coefficients vs. estimated coefficients.)

Both are active only around the discontinuities in the signal.
Robustness Test by Adding Noise in Labels
Note: Normal error with mean 0 and standard deviation σ is added to the true labels.
Semi-Supervised Support Vector Machine
Assumption: labels take values in {−1, 1}.
Model Set-Up
  b̂ = argmin_b (1/|N_ℓ|) Σ_{i∈N_ℓ} max(0, 1 − y_i Ω_i b) + λ ‖b_W‖_{τ;1,p}    (12)

where y_i is the ith entry of the label vector f_ℓ, and Ω_i is the ith row of W_ℓ*.

We can give a higher penalty to higher-frequency information to "smooth" the estimate. The tuning parameter λ controls the overall amount of penalization.
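To make Eq. (12) concrete, here is a sketch of evaluating that hinge-loss objective for a given coefficient vector b; the labeled rows W_ℓ*, labels, weights τ, and λ are all toy assumptions, not the slides' data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_labeled, n_coef = 5, 8
Wl = rng.standard_normal((n_labeled, n_coef))  # assumed rows of W* (labeled)
y = np.array([1, -1, 1, 1, -1], dtype=float)   # labels in {-1, 1}
tau = np.ones(n_coef)                          # flat scale weights (assumed)
lam = 0.1

def objective(b):
    # Average hinge loss over the labeled set, as in Eq. (12).
    margins = y * (Wl @ b)
    hinge = np.maximum(0.0, 1.0 - margins).mean()
    # Simplified l1-style stand-in for the tau-weighted wavelet penalty.
    penalty = lam * np.sum(tau * np.abs(b))
    return hinge + penalty

b0 = np.zeros(n_coef)
assert np.isclose(objective(b0), 1.0)  # at b = 0 every hinge term equals 1
```

Because the hinge term is non-smooth, this objective is again a natural target for the Douglas-Rachford scheme described earlier.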
Example 2
1000 nodes x = (x_1, x_2) randomly placed in the [0,1] × [0,1] square. The labels are given by

  f(x) = −1 if x_1 < 0.8 and x_2 ≤ (x_1 + 0.5)², and 1 otherwise.    (13)

(Figure: original signal, 25% sample, estimated signal.)
Note: The misclassification rate is 0.051.
Average Misclassification Rate vs. Sample Size
Robustness Test by Changing the Graph
Example 3
  f(x) = −1 if (x_1 ≤ 0.4 and x_2 ≥ (x_1 + 0.8)²), or (x_1 − 0.45)² + (x_2 − 0.45)² ≤ 0.04, or (x_2 ≤ x_1 − 0.5 and x_1 ≥ 0.5); and 1 otherwise.

Only 5% of the data is sampled.

(Figure: original signal, 5% sample, estimated signal.)
Control the Smoothness
To "smooth" the result, we use the function below to give a higher penalty to higher-frequency information:

  τ_{j,n} ∝ t_j^{−5}    (14)

where t_j is the value of the jth scale, j = 1, 2, 3, 4.

With this scale-dependent penalty the model attains a smaller misclassification rate (0.095) than the model with the same penalty at every scale (0.119) when only 5% of the data is labeled.
Background
The Semi-Supervised Support Vector Machine with SGWT isapplied to document categorization experiments using the 20newsgroups dataset.
This dataset contains 205 Life Sciences articles and 280 Medicine articles from Science News; 1153 words are chosen as relevant for this body of documents.

A document-word matrix whose (i, j) entry is the frequency of the jth dictionary word in the ith document was constructed. The following weight function on the edges is used:
  w_{u,v} = exp( −(1/0.03) ( 1 − uᵀv / (|u||v|) ) )    (15)
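Eq. (15) turns cosine similarity between word-frequency vectors into an edge weight. A small sketch, with made-up document vectors:

```python
import numpy as np

def weight(u, v, width=0.03):
    # Eq. (15): edge weight from cosine similarity of word-frequency rows.
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.exp(-(1.0 - cos) / width)

# Toy document-word frequency rows (assumed, 3-word dictionary).
u = np.array([3.0, 0.0, 1.0])
v = np.array([3.0, 0.0, 1.0])

w_same = weight(u, v)                # identical documents: exp(0) = 1
w_ortho = weight(np.array([1.0, 0.0, 0.0]),
                 np.array([0.0, 1.0, 0.0]))

assert np.isclose(w_same, 1.0)
assert w_ortho < 1e-10               # no shared words: exp(-1/0.03), ~0
```

The small width 0.03 makes the weight fall off sharply as documents share fewer words, so only genuinely similar documents get strongly connected.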
Average Misclassification Rate vs. Sample Size
Appendix
Thanks!
Acknowledgement: Prof. Risi Kondor
Reference I
[1] D. K. Hammond, P. Vandergheynst, R. Gribonval (2011). Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2), 129-150.

[2] P. L. Combettes, J.-C. Pesquet (2007). A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE Journal of Selected Topics in Signal Processing, 1(4), 564-574.