Transcript
Page 1: Graph-based Semi-Supervised and Active Learning for Edge …arb/posters/FlowSSL-KDD-2019-poster.pdfGraph-based Semi-Supervised and Active Learning for Edge Flows Junteng Jia, Michael

Graph-based Semi-Supervised and Active Learning for Edge FlowsJunteng Jia, Michael T. Schaub, Santiago Segarra and Austin R. BensonR [email protected], [email protected], [email protected], [email protected]

� https://github.com/000Justin000/ssl_edge

Motivation & Problem StatementConsider the problem of monitoring traffic flows in a region.Setting up sensors on all roads would provide accurate mea-surements, but is costly. Given traffic flow measurements ona subset of the roads, can we estimate the remaining flows?

Problem statement

Given:◦ a graph topology G = (V , E)◦flows on a subset of the edges EL

Infer:◦unknown flows on EU ≡ E/E L.

?? ?

??

labeled unknown?

Key insight: a suitable learning assumption for edge flows.Flow conservation – flows that enter/exit a node must balance.

Edge- vs. vertex-based semi-supervised learning

labeled inferred inferredlabeled

Vertex-based learning◦given some vertex labels◦ impose smoothness assumption◦ interpolate unknown vertices

Edge-based flow learning◦given some edge flows◦ impose flow conservation◦ infer unknown edge flows

Formulation & Inference Algorithm

1

2

3 4

5

6

71

2

3

4

5

6

7

80

1 1

1

0 00

0-1

0 0

0

0 0

00

0 0

01 1 0

0 00

000 0

0

0

0

0

0

0 1-1

00

0 -1

0

1

-1 0

-1 0

0

0-10

1 0

0

-1-1

incidence matrix B

◦undirected graph G = (V , E) with |V| = n and |E | = m◦vertex label vector y ∈ Rn

◦define (net) edge flow as alternating function f : V ×V → R

f (i, j) ={− f (j, i), ∀ (i, j) ∈ E0, otherwise.

edge flow vector f ∈ Rm with fr = f (i, j) if Er ≡ (i, j), i < j

◦vertex-edge incidence matrix B ∈ Rn×m (right panel)

Computations: edge vs. vertex-based learningVertex-based learning

◦ ‖Bᵀy‖2 = ∑(i,j)∈E(yi− yj)2 measures “unsmoothness”

◦minimize sum-of-squares difference

y∗ = arg miny‖Bᵀy‖2 s.t. yi = yi, ∀Vi ∈ VL.

Edge flow learning

◦ (Bf)i measures the flow “divergence” on the ith vertex◦minimize sum-of-square divergence, with regularization

f∗ = arg minf‖Bf‖2 + λ2 · ‖f‖2 s.t. fr = fr, ∀Er ∈ EL.

◦ least-square solution (null space method f = f0 + ΦfU)

fU∗ = arg minfU

∥∥∥∥[BΦλ · I

]fU−

[−Bf0

0

]∥∥∥∥2

.

Empirical results & Reconstruction error bound

ZeroFill

LineGraph

FlowSSL

0.0 0.2 0.4 0.6 0.8 1.0

ratio labeled (| L|/| |)

0.0 0.2 0.4 0.6 0.8 1.0

ratio labeled (| L|/| |)

WinnipegPower Grid

0.0

0.2

0.4

0.6

0.8

1.0

corr

ela

tion

◦Transforming flow learning problem with LineGraph to besolved as vertex label learning problem performs no betterthan consistently predicting zero.

◦FlowSSL, our proposed semi-supervised edge-flow learningalgorithm, outperforms the baselines by a large margin.

Theorem: Assume the ground truth flow f = f + δ, where fis a divergence free flow; and we have flow measurements ona subset C edges with cardinality at least m− n + 1. Denotethe null-space of the incidence matrix as V = Null(B). Thenas the regularization parameter λ → 0 in our method, the re-construction error is bounded by [σ−1

min(VC, :) + 1] · ‖δ‖, whereσmin(·) is the smallest singular value of a matrix.

Active Learning Problem & Strategies

Goal: Select a set of edges |EL| = mL to minimize reconstruc-tion error (optimal sensor deployment with a limited budget).

1. RRQR – minimize error bound◦use rank revealing QR (RRQR)

to select well conditioned rows

VᵀC, : Π = Q [R1 R2] .

◦ EL from leading columns of Π

2. RB – select bottleneck edges◦ capture global flow trends◦ recursively bisect (RB) & select

edges that bridge clustersblack arrow: edges selected by RB

0.0 0.2 0.4 0.6 0.8 1.0

ratio labeled (| L|/| |)0.0 0.2 0.4 0.6 0.8 1.0

ratio labeled (| L|/| |)

Power Grid

0.0

0.2

0.4

0.6

0.8

1.0

corr

ela

tion

Winnipeg

Baseline

Random

RRQR

RB

Findings: RRQR provides additional gains for approximatelydivergence-free flows (left), RB works well for flows withglobal trends (right).

This research was supported by NSF award DMS-1830274, ARO AwardW911NF-19-1-0057, and European Union’s Horizon 2020 research andinnovation programme under the Marie Sklodowska-Curie grant agree-ment No 702410.

Recommended