Constrained Graph Construction Problems in Network Modeling

Zoltán Toroczkai
Department of Physics, University of Notre Dame

Collaborators:
- Szabolcs Horvát (U. Notre Dame)
- István Miklós (Rényi Inst. Math.)
- Peter L. Erdős (Rényi Inst. Math.)
- Kevin E. Bassler (U. Houston)
- Charo I. Del Genio (U. Warwick)
- Hyunju Kim (Arizona State)
- László Székely (U. South Carolina)
- Éva Czabarka (U. South Carolina)


Page 2

Chess Puzzle: Swap the positions of white knights with those of the black knights

Page 3

Page 4

[Figure: the puzzle board, with files a-d and ranks 1-4]

Page 5

[Figure: the board next to a graph whose nodes are the squares c4, b2, d1, d3, c3, a2, b1, c1, b3, a1, c2]

I. Network representation

II. Redirected the process of thought to the “where” pathway.
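The network representation of the puzzle can be sketched in a few lines. This is a minimal illustration, assuming that the eleven node labels on the slide are the squares of the board and that two squares are linked whenever a knight can jump between them; the names `SQUARES`, `to_xy` and `knight_graph` are hypothetical.

```python
from itertools import combinations

# Squares read off the slide's node labels (an assumption about the board shape).
SQUARES = ["a1", "a2", "b1", "b2", "b3", "c1", "c2", "c3", "c4", "d1", "d3"]


def to_xy(square):
    """Convert a label such as 'c4' to zero-based (file, rank) coordinates."""
    return ord(square[0]) - ord("a"), int(square[1]) - 1


def knight_graph(squares):
    """Adjacency sets: two squares are linked if a knight can jump between them."""
    adj = {s: set() for s in squares}
    for s, t in combinations(squares, 2):
        (x1, y1), (x2, y2) = to_xy(s), to_xy(t)
        if sorted((abs(x1 - x2), abs(y1 - y2))) == [1, 2]:
            adj[s].add(t)
            adj[t].add(s)
    return adj


graph = knight_graph(SQUARES)
for square in sorted(graph):
    print(square, "->", sorted(graph[square]))
```

Under these assumptions the graph has 11 nodes and 11 edges: a single 6-cycle with a few short branches attached, which is much easier to reason about than the board itself.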

Page 6

[Figure: two layouts of the graph on the squares c4, b2, d1, d3, c3, a2, b1, c1, b3, a1, c2]

Not optimal: it only respects the relationships.

Optimized layout: minimal edge crossings, minimal wire length. This representation allows us to infer and exploit GLOBAL information quickly.

Global information is necessary for finding solutions fast (esp. NP-hard problems).

“Dumb” algorithms: representation-independent, very inefficient.

“Smart” algorithms: exploit the structure of the data / global information.

How do we know that there is global information in a dataset?

How do we extract it? OPEN!

Page 7

[Figure: the graph on the squares c4, b2, d1, d3, c3, a2, b1, c1, b3, a1, c2]

This is very typical, e.g., the interareal network in the macaque cortex:

N.T. Markov, et al. Science 342(6158), 1238406 (2013)

Page 8

We are looking for the essential factors that generate the global information within the structure.

I understand a network if I can generate it (or similar versions of it).

Essential factors Constraints may appear through

Page 9

Indeed, for the cortical network:

M. Ercsey-Ravasz et al., Neuron 80, 184-197 (2013).

N.T. Markov, et al. Science 342(6158), 1238406 (2013)

Wiring costs and cortical geometry

+

- Many features and network measures captured

- What is not captured: noise, or structures that need new constraints/info

Page 10

Data Driven Network Modeling

Constraints → ensemble

Typical scenarios

o Data: partial info. Want: good guesses about the rest.

o Data: complete. Want: plausible constraints capturing the data.

This setting defines a set of fundamental problems related to ensemble-based modeling of complex networks.

Constraints can be imposed:

• precisely/verbatim – Sharp constraints

• “softly”, via ensemble averages – Average constraints

Page 11

Sharp Constraints

Consider:
- the set of all simple graphs on N nodes: $\mathcal{G}_N$
- a set of graph measures, or observables (the constraints): $\mathbf{m} = (m_1, \dots, m_K)$

Def. 1: Sharply constrained ensemble:

$$\mathcal{E}(\bar{\mathbf{m}}) = \{ G \in \mathcal{G}_N : \mathbf{m}(G) = \bar{\mathbf{m}} \}$$

i.e., all members of the ensemble have precisely the same values for the corresponding graph measures as given by the constraints $\bar{\mathbf{m}}$.

There are 4 main problem classes related to network modeling with sharp constraints:

Existence: Under what conditions on $\bar{\mathbf{m}}$ is $\mathcal{E}(\bar{\mathbf{m}}) \neq \emptyset$?

Construction: How to build any (or all) members of $\mathcal{E}(\bar{\mathbf{m}})$?

Sampling: How to sample members of $\mathcal{E}(\bar{\mathbf{m}})$ by some distribution (e.g., uniformly)?

Counting: How to compute or estimate $|\mathcal{E}(\bar{\mathbf{m}})|$?

Page 12

Typically studied problems:

Degree Sequence

- for undirected graphs: $\mathbf{d} = (d_1, \dots, d_N)$
- for directed graphs: $\mathbf{d} = ((d_1^{in}, d_1^{out}), \dots, (d_N^{in}, d_N^{out}))$
- for bipartite graphs: $\mathbf{d} = ((a_1, \dots, a_n), (b_1, \dots, b_m))$

Specifies the number of neighbors for all nodes.

Joint Degree Matrix (JDM)

A JDM specifies the number of edges between nodes of given degrees, for all degree pairs.

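Partition the nodes into groups of given degrees (classes): $V_k = \{ v : d(v) = k \}$, $k = 1, \dots, \Delta$, where $\Delta$ is the largest degree.

Then: $J_{kl}$ is the number of edges between $V_k$ and $V_l$ (for $k = l$, the number of edges within $V_k$).

A JDM is a stronger constraint than the degree sequence, which it also determines uniquely:

$$|V_k| = \frac{1}{k}\Big( J_{kk} + \sum_{l=1}^{\Delta} J_{kl} \Big)$$

One can think of the JDM as also specifying “two-point correlations” between nodal degrees.

Applications include, e.g., social networks, which are distinguished by positive degree correlations (assortative networks).

A.N. Patrinos & S.L. Hakimi. Discr. Math. 15, 347 (1976).
I. Stanton & A. Pinar. ACM J. Exp. Alg. 17(3), 3.5 (2012).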
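As an illustration of how a JDM is read off a graph and how it fixes the degree class sizes, here is a minimal sketch. The function names and the small example graph are hypothetical; the JDM is stored as a dictionary keyed by degree pairs (smaller degree first).

```python
from collections import Counter


def joint_degree_matrix(edges):
    """Return (degree, jdm) for a simple undirected graph given as an edge list.

    jdm[(k, l)] with k <= l counts the edges joining a degree-k node to a degree-l node.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    jdm = Counter()
    for u, v in edges:
        k, l = sorted((degree[u], degree[v]))
        jdm[(k, l)] += 1
    return dict(degree), dict(jdm)


def class_sizes(jdm):
    """Recover the class sizes: the degree sum of class k is 2*J_kk + sum_{l != k} J_kl."""
    stubs = Counter()
    for (k, l), count in jdm.items():
        stubs[k] += count
        stubs[l] += count  # when k == l this adds J_kk a second time, as required
    return {k: total // k for k, total in stubs.items()}


# Hypothetical example: the path 0-1-2-3 with an extra leaf 4 attached to node 1.
edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
degree, jdm = joint_degree_matrix(edges)
print(jdm)
print(class_sizes(jdm))
```

In this example the JDM alone tells us that there are three nodes of degree 1, one of degree 2 and one of degree 3, i.e., it determines the degree sequence.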

Page 13

Existence

Degree Sequence: well known, characterized.

Erdős-Gallai (EG)/Fulkerson-Ryser type theorems

E.g., for $d_1 \ge d_2 \ge \dots \ge d_N$: 1) $\sum_{i=1}^{N} d_i$ must be even and 2) $\sum_{i=1}^{k} d_i \le k(k-1) + \sum_{i=k+1}^{N} \min(k, d_i)$ must hold for all $1 \le k \le N$.
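The Erdős-Gallai conditions translate directly into a test. The sketch below is a straightforward, unoptimized version for undirected simple graphs; the function name is illustrative.

```python
def is_graphical(degrees):
    """Erdős-Gallai test: can `degrees` be realized by a simple undirected graph?"""
    d = sorted(degrees, reverse=True)
    n = len(d)
    # Condition 1: the degree sum must be even (and no degree may be negative).
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    # Condition 2: the EG inequality must hold for every prefix length k.
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(k, x) for x in d[k:])
        if lhs > rhs:
            return False
    return True


print(is_graphical([3, 3, 3, 3]))  # the complete graph on 4 nodes
print(is_graphical([3, 3, 1, 1]))  # even sum, but the inequality fails at k = 2
```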

Havel-Hakimi algorithm: Given a graphical sequence, choose a node $i$, and connect all its stubs to the other nodes with the largest residual degrees. Repeat until all stubs are connected into edges.
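A minimal sketch of the Havel-Hakimi construction follows. The slide leaves the choice of node open; this version always picks a node with the largest residual degree, which is one valid choice. It assumes non-negative integer degrees and returns `None` when the stubs cannot all be connected.

```python
def havel_hakimi(degrees):
    """Return an edge list realizing `degrees`, or None if the sequence is not graphical."""
    residual = dict(enumerate(degrees))
    edges = []
    while any(residual.values()):
        # Choose a node (here: one with the largest residual degree).
        node = max(residual, key=residual.get)
        need = residual[node]
        residual[node] = 0
        # Connect all of its stubs to the nodes with the largest residual degrees.
        candidates = [v for v in residual if v != node and residual[v] > 0]
        partners = sorted(candidates, key=residual.get, reverse=True)[:need]
        if len(partners) < need:
            return None  # not enough distinct partners: the sequence is not graphical
        for v in partners:
            residual[v] -= 1
            edges.append((node, v))
    return edges


print(havel_hakimi([3, 3, 2, 2, 2]))
print(havel_hakimi([3, 3, 1, 1]))
```

Because a node is finished as soon as it has been processed, no pair of nodes is ever connected twice, so the result is always a simple graph.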

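Def. 2: If the ensemble of graphs satisfying a constraint is non-empty, we say that the constraint is graphical. Any graph $G$ in the ensemble is said to realize the constraint; $G$ in this case is called a graphical realization of the constraint.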

Joint Degree Matrix (JDM)

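Theorem. A symmetric matrix $J$ with non-negative integer entries is a graphical JDM iff:

1) $n_k = \frac{1}{k}\big( J_{kk} + \sum_{l} J_{kl} \big)$ is an integer for all $k$;
2) $J_{kk} \le \binom{n_k}{2}$ for all $k$;
3) $J_{kl} \le n_k n_l$ for all $k \neq l$.

É. Czabarka, A. Dutle, P.L. Erdős, I. Miklós. Disc. Appl. Math. 181, 283 (2015): a clean and short proof of this EG-type theorem.

Others have also provided similar characterizations (Stanton-Pinar, Amanatidis-Green-Mihail).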
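The JDM characterization can be checked mechanically. The sketch below assumes the standard form of the conditions (integer class sizes, at most $\binom{n_k}{2}$ edges inside a class, at most $n_k n_l$ edges between two classes); the dictionary representation and the function name are illustrative, and all degrees are assumed to be positive.

```python
from math import comb


def is_graphical_jdm(jdm):
    """Check a JDM given as {(k, l): J_kl} with k <= l and positive degrees k, l."""
    degrees = {k for pair in jdm for k in pair}
    stubs = {k: 0 for k in degrees}
    for (k, l), count in jdm.items():
        if count < 0:
            return False
        stubs[k] += count
        stubs[l] += count  # J_kk is counted twice: each inner edge uses two stubs
    # 1) every class size n_k = (J_kk + sum_l J_kl) / k must be an integer
    if any(total % k for k, total in stubs.items()):
        return False
    n = {k: total // k for k, total in stubs.items()}
    for (k, l), count in jdm.items():
        # 2) J_kk <= C(n_k, 2);  3) J_kl <= n_k * n_l for k != l
        limit = comb(n[k], 2) if k == l else n[k] * n[l]
        if count > limit:
            return False
    return True


print(is_graphical_jdm({(2, 2): 3}))  # a triangle
print(is_graphical_jdm({(2, 2): 2}))  # two nodes cannot carry two edges
```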

Page 14

Construction

How do we build any graph from the ensemble? Efficiently?

o Direct construction: sequentially connect stubs (half-edges).

o Switches/Swaps: start from a realization, then move edges around via some operations (e.g., edge swaps/switches) to arrive at another member of the ensemble.

What operations guarantee that all members can be reached this way?

Degree Sequences

o Direct construction

Undirected graphs: H. Kim, Z. Toroczkai, P.L. Erdős, I. Miklós and L.A. Székely. J. Phys. A: Math. Theor. 42, 392001 (2009).

Directed graphs: P.L. Erdős, I. Miklós and Z. Toroczkai. Elec. J. Comb. 17(1), R66 (2010).

Theorem (KTEMS): Provides necessary and sufficient conditions for the graphicality of degree sequences that are restricted by forbidden edges forming a k-star on an arbitrary node i.

[Figure legend: non-edges (forbidden links)]

J. Blitzstein, P. Diaconis. Internet Mathematics 6(4), 487–520 (2010).

Page 15

o Switches/Swaps

Swap the ends of two independent edges (2-swap):

[Figure: a 2-swap on the nodes 1, 2, 3, 4]

- This preserves the degree sequence and connects the space of all realizations (Ryser).

- Start from a graphical realization (e.g., one made by Havel-Hakimi), then do 2-swaps.
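A 2-swap can be sketched as a small function on an edge set. The representation (a set of sorted node pairs) and the names `norm` and `two_swap` are illustrative choices; the move is rejected when the two edges share a node or when it would create a multi-edge.

```python
def norm(u, v):
    """Canonical form of an undirected edge."""
    return (u, v) if u < v else (v, u)


def two_swap(edges, e1, e2, cross=False):
    """Return a new edge set with e1=(a,b), e2=(c,d) rewired, or None if not allowed.

    cross=False gives (a,c),(b,d); cross=True gives (a,d),(b,c).
    """
    (a, b), (c, d) = e1, e2
    if len({a, b, c, d}) < 4:
        return None  # the edges must be independent (no shared node)
    new1, new2 = (norm(a, d), norm(b, c)) if cross else (norm(a, c), norm(b, d))
    if new1 in edges or new2 in edges:
        return None  # the swap would create a multi-edge
    return (edges - {norm(a, b), norm(c, d)}) | {new1, new2}


edges = {(1, 2), (3, 4)}
print(two_swap(edges, (1, 2), (3, 4)))
print(two_swap(edges, (1, 2), (3, 4), cross=True))
```

Every node keeps exactly as many incident edges as before, which is why the degree sequence is preserved.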

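Joint Degree Matrix (JDM)

o Switches/Swaps

- Restricted Swap Operation (RSO): a 2-swap between two edges that have end nodes in the same degree class.

[Figure: an RSO on the nodes 1, 2, 3, 4, two of which are in the same degree class]

- The RSO preserves the JDM and connects the space of all its realizations.

É. Czabarka, A. Dutle, P.L. Erdős, I. Miklós. Disc. Appl. Math. 181, 283 (2015).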
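The restricted swap can be sketched as follows, assuming the usual form of the operation: the edges (a,c) and (b,d) become (a,d) and (b,c), and the move is allowed only if a and b are in the same degree class. The helper `jdm_of`, the function name and the example graph are hypothetical.

```python
def jdm_of(edges, degree):
    """Count edges per (degree, degree) pair, with the smaller degree first."""
    jdm = {}
    for u, v in edges:
        key = tuple(sorted((degree[u], degree[v])))
        jdm[key] = jdm.get(key, 0) + 1
    return jdm


def restricted_swap(edges, degree, e1, e2):
    """Swap (a,c),(b,d) -> (a,d),(b,c); allowed only when deg(a) == deg(b)."""
    (a, c), (b, d) = e1, e2
    if len({a, b, c, d}) < 4 or degree[a] != degree[b]:
        return None
    new1, new2 = tuple(sorted((a, d))), tuple(sorted((b, c)))
    if new1 in edges or new2 in edges:
        return None  # the swap would create a multi-edge
    old = {tuple(sorted(e1)), tuple(sorted(e2))}
    return (edges - old) | {new1, new2}


# Hypothetical example graph; nodes 0 and 1 both have degree 2.
edges = {(0, 2), (0, 4), (1, 3), (1, 4), (3, 5), (3, 6)}
degree = {0: 2, 1: 2, 2: 1, 3: 3, 4: 2, 5: 1, 6: 1}
swapped = restricted_swap(edges, degree, (0, 2), (1, 3))
print(swapped is not None and jdm_of(swapped, degree) == jdm_of(edges, degree))
```

Since a and b have the same degree, the two new edges join the same pairs of degree classes as the two removed ones, so every JDM entry is unchanged.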

o Direct construction

- Generate a degree spectrum (any), then build all bipartite graphs between the degree classes, then create all simple graphs within every degree class.

P.L. Erdős, I. Miklós, C. I. Del Genio, K.E. Bassler & Z. Toroczkai. New J. Phys. 17, 083052 (2015).

Def.: Let $d_j(v)$ be the degree of node $v$ towards the degree class $V_j$. The degree “spectrum” of node $v$ is the vector $(d_1(v), \dots, d_\Delta(v))$.

Page 16

Sampling

o Direct construction based importance sampling

o Markov Chain Monte Carlo (MCMC) based on switches

Requirements:
- Sample a member of the ensemble in a polynomial number of steps (“in poly-time”).
- Obtain pseudo-random realizations via MCMC switching in poly-mixing time.

Degree Sequence

o Direct construction based

C.I. Del Genio, H. Kim, Z. Toroczkai and K.E. Bassler. PLoS ONE 5(4), e10012 (2010). - undirected
H. Kim, C.I. Del Genio, K.E. Bassler and Z. Toroczkai. New J. Phys. 14, 023012 (2012). - directed

o MCMC based on edge swaps

This is the most studied, in particular the Mixing Time Problem.

Definitions:

- $\mathbb{G}$: the “supergraph” whose nodes are all the graphical realizations of the degree sequence.

- A “super-edge” $(G, G')$ means that a 2-swap in the graph $G$ takes it to the graph $G'$.

- $P$: the transition matrix of a Markov chain on $\mathbb{G}$.

The MCMC is a random walk on $\mathbb{G}$ with transition probability matrix $P$.
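A minimal sketch of such a random walk is given below. It is one common way to set up the chain, not necessarily the variant analyzed in the cited works: each step is lazy with probability 1/2, otherwise it picks two edges and one of the two possible rewirings uniformly at random and stays put if the proposal is invalid. The function name and parameters are illustrative, and the graph needs at least two edges.

```python
import random


def swap_mcmc(edges, steps, seed=0):
    """Lazy random walk over realizations: each step proposes one 2-swap."""
    rng = random.Random(seed)
    current = {tuple(sorted(e)) for e in edges}
    for _ in range(steps):
        if rng.random() < 0.5:
            continue  # lazy step: stay put, which removes periodicity
        (a, b), (c, d) = rng.sample(sorted(current), 2)
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings
        new1, new2 = tuple(sorted((a, c))), tuple(sorted((b, d)))
        if len({a, b, c, d}) < 4 or new1 in current or new2 in current:
            continue  # rejected proposal: the chain stays at the current graph
        current -= {(a, b), tuple(sorted((c, d)))}
        current |= {new1, new2}
    return current


# Start from a 6-cycle and take 500 steps.
start = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
print(sorted(swap_mcmc(start, 500, seed=1)))
```

The proposal is symmetric (a move and its reverse are proposed with the same probability), so the uniform distribution over the reachable realizations is stationary; how many steps are needed to get close to it is exactly the mixing time question.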

Page 17

Let $1 = \lambda_1 > \lambda_2 \ge \dots \ge \lambda_M > -1$ be the eigenvalues of the transition matrix $P$ and $\lambda^* = \max\{\lambda_2, |\lambda_M|\}$.

Thus, to show fast mixing one needs to find a polynomial upper bound (in the size of the graphs, N, the nr of nodes) on the mixing time, or on the relaxation time: $\tau_{rel} = 1/(1 - \lambda^*)$.

Conjecture (Kannan, Tetali, Vempala, 1999):

The switch MCMC based on 2-swaps mixes rapidly over the set of all realizations of any graphical degree sequence.

This is still open!

- They have shown it only for regular bipartite graphs (same degrees everywhere).
R. Kannan, P. Tetali and S. Vempala. Rand. Struct. Alg. 14 (4), 293-308 (1999)

- Cooper, Dyer and Greenhill have shown it for arbitrary regular undirected graphs.
C. Cooper, M. Dyer and C. Greenhill. Comb. Prob. Comp. 16 (4), 557-593 (2007)

- Greenhill proved it for regular directed graphs.
C. Greenhill. Electronic J. Comb. 18 (1), #P234 (2011)

- C. Greenhill proved it for general undirected graphs with bounded maximum degree. Proc. 26th ACM-SIAM Symp. Discr. Alg., New York-Philadelphia, pp. 1564-1572 (2015). http://arxiv.org/abs/1412.5249

Page 18

Additionally:

- Miklós, Erdős and Soukup have just proved it for half-regular bipartite graphs (a very technical proof of over 50 pages).

I. Miklós, P.L. Erdős & L. Soukup. Electronic J. of Comb. 20 (1), #P16, 1-51 (2013).

- Can we generate graphs uniformly at random that realize a given graphical degree sequence such that all realizations avoid creating edges from a forbidden subgraph?

P.L. Erdős, S.Z. Kiss, I. Miklós and L. Soukup. PLOS ONE, #e0131300 (2015). http://arxiv.org/abs/1301.7523v2

They answered this question affirmatively for the following constraints:

- the degree sequence is a half-regular bi-degree sequence, i.e., the degrees are all equal in one node class and arbitrary in the other;

- the forbidden subgraph consists of a 1-factor (a perfect matching) between the two node classes and a k-star centered on some node.

Theorem: There is a switch MCMC that mixes fast (in poly-time) on the state space of all such realizations.

Page 19

Joint Degree Matrix (JDM)

Theorem: The space of all realizations of any given JDM is connected via RSOs.

The RSO-based MCMC is irreducible.

Question: is the RSO-based MCMC mixing rapidly (poly-time in N ) on the set of all realizations of a JDM?

Theorem: The restricted swap operation Markov chain mixes rapidly over the balanced realizations of any JDM, i.e., its relaxation time is bounded by a polynomial in N, where N is the number of nodes.

P.L. Erdős, I. Miklós & Z. Toroczkai. SIAM J. Discr. Math. 29, 481 (2015). http://arxiv.org/abs/1307.5295

All graphical JDMs admit balanced realizations. A JDM realization is balanced if the degrees of nodes within a degree class towards another degree class are as uniformly distributed as possible and this is true for all degree classes.

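Def.: A realization of a JDM is balanced iff for all degree classes $V_i$, $V_j$ and all nodes $u, v \in V_i$: $|d_j(u) - d_j(v)| \le 1$, where $d_j(v)$ is the number of neighbors of $v$ in $V_j$.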

É. Czabarka, A. Dutle, P.L. Erdős, I. Miklós. Disc. Appl. Math. 181, 283 (2015).
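Whether a given realization is balanced can be checked directly from the prose definition above. The sketch below reads "as uniformly distributed as possible" as: within a degree class, the numbers of neighbors towards any fixed class differ by at most 1. The function name and the two example graphs are hypothetical.

```python
from collections import Counter, defaultdict


def is_balanced(edges):
    """True if, within each degree class, degrees towards every class differ by <= 1."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # spectrum[v][k] = number of neighbours of v that have degree k
    spectrum = defaultdict(Counter)
    for u, v in edges:
        spectrum[u][degree[v]] += 1
        spectrum[v][degree[u]] += 1
    classes = defaultdict(list)
    for v, k in degree.items():
        classes[k].append(v)
    for members in classes.values():
        for k in classes:
            values = [spectrum[v][k] for v in members]
            if max(values) - min(values) > 1:
                return False
    return True


# Two realizations of the same JDM (two (1,2) edges and three (2,2) edges).
path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
cherry_plus_triangle = [(0, 1), (0, 2), (3, 4), (3, 5), (4, 5)]
print(is_balanced(path), is_balanced(cherry_plus_triangle))
```

In the second graph one degree-2 node has two degree-1 neighbors while the triangle nodes have none, so that realization is not balanced, whereas the path is.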

Page 20

Counting

Compute or estimate the number of graphs in the ensemble.

How constraining (or “non-random”) is the set of constraints?

Def. 3:

Fully Polynomial Almost Uniform Sampler (FPAUS) (sampling):

- An MCMC algorithm that generates graph samples almost uniformly, in poly-time.

Fully Polynomial Randomized Approximation Scheme (FPRAS) (counting):

- An algorithm that estimates the size of the ensemble in poly-time.

• M.R. Jerrum, L.G. Valiant and V.V. Vazirani. Theor. Comput. Sci. 43, 169 (1986).

• V.V. Vazirani. Approximation Algorithms. Springer (2003).

• http://www.cc.gatech.edu/~vigoda/MCMC_Course/Sampling-Counting.pdf

FPRAS FP Exact U Sampler

Computational hardness: “A is harder than B”:

Counting is harder than uniform sampling, which is harder than construction, which is harder than existence.
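For very small graphs the size of a degree-constrained ensemble can be obtained by brute force, which gives a reference point for any sampler or estimator. The sketch below enumerates all edge subsets of the right size, so its cost grows exponentially and it is only meant as an illustration; the function name is hypothetical.

```python
from itertools import combinations


def realizations(degrees):
    """Yield every simple graph (as a tuple of edges) with the given degree sequence.

    Brute force over all edge subsets of the right size, so only usable for tiny N.
    """
    n = len(degrees)
    if sum(degrees) % 2:
        return
    all_pairs = list(combinations(range(n), 2))
    for subset in combinations(all_pairs, sum(degrees) // 2):
        deg = [0] * n
        for u, v in subset:
            deg[u] += 1
            deg[v] += 1
        if deg == list(degrees):
            yield subset


# Every 2-regular graph on 5 labeled nodes is a 5-cycle.
count = sum(1 for _ in realizations([2, 2, 2, 2, 2]))
print(count)
```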

Page 21

Def. 4 : A problem is self-reducible if the solutions to any of its instances can be recursively generated from solutions to smaller instances of the same problem, s.t. the number of branches at each recursion step is polynomially bounded by the size of the problem instance.

M.R. Jerrum, L.G. Valiant and V.V. Vazirani. Theor. Comput. Sci. 43, 169 (1986)

Implications:

Exact Counter ⇒ Exact U Sampler

FPRAS ⇔ FPAUS

Thus, if we have an FPAUS, we can estimate the size of the ensemble efficiently.

The classical degree-based graph construction problem is not self-reducible.

Theorem: The degree sequence problem constrained by a 1-factor and a k-star is self-reducible.

This implies that an FPRAS can be constructed, which allows the number of such realizations to be estimated.

P.L. Erdős, S.Z. Kiss, I. Miklós and L. Soukup. PLOS ONE, #e0131300 (2015). http://arxiv.org/abs/1301.7523v2

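Page 22

Soft Constraints: Maximum Entropy Ensembles

Graph measures: $\mathbf{m}(G) = (m_1(G), \dots, m_K(G))$

E.g.: # of edges, # of two-stars, # of triangles, ...

Soft constraints:

Find a distribution $P(G)$ over the set of all graphs such that the ensemble average obeys:

$$\langle \mathbf{m} \rangle = \sum_G P(G)\, \mathbf{m}(G) = \bar{\mathbf{m}}$$

- $\bar{\mathbf{m}}$ are the constraints, e.g., given by data.

There are many ways to choose probabilities $P(G)$ that satisfy these! How do we choose the $P(G)$?

Equivalent treatment: use distributions over measures instead of over graphs:

$$p(\mathbf{m}) = \sum_{G:\, \mathbf{m}(G) = \mathbf{m}} P(G)$$

with $\mathcal{N}(\mathbf{m})$ the nr of graphs on N nodes with property $\mathbf{m}(G) = \mathbf{m}$ (the density of states).

The Maximum Entropy Principle:

E. T. Jaynes, Physical Review 106, 620 (1957).

Choose the distribution that maximizes the information entropy

$$S[P] = -\sum_G P(G) \ln P(G)$$

subject to the constraints $\langle \mathbf{m} \rangle = \bar{\mathbf{m}}$ and $\sum_G P(G) = 1$. This gives

$$P(G) = \frac{e^{-\boldsymbol{\beta} \cdot \mathbf{m}(G)}}{Z(\boldsymbol{\beta})}, \qquad p(\mathbf{m}) = \frac{\mathcal{N}(\mathbf{m})\, e^{-\boldsymbol{\beta} \cdot \mathbf{m}}}{Z(\boldsymbol{\beta})}, \qquad \text{where } Z(\boldsymbol{\beta}) = \sum_G e^{-\boldsymbol{\beta} \cdot \mathbf{m}(G)}.$$

The parameters $\boldsymbol{\beta}$ control $\langle \mathbf{m} \rangle$. In practice, $\boldsymbol{\beta}$ is typically found numerically for a given $\bar{\mathbf{m}}$.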
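The numerical fit of the parameters can be illustrated on the simplest case, a single constraint on the average number of edges. The sketch below assumes the sign convention $p(m) \propto \mathcal{N}(m)\, e^{-\beta m}$ with density of states $\mathcal{N}(m) = \binom{M}{m}$, $M = \binom{N}{2}$; the function names and the bisection bounds are illustrative choices.

```python
from math import comb, exp


def edge_distribution(n_nodes, beta):
    """p(m) = N(m) exp(-beta m) / Z, with density of states N(m) = C(M, m), M = C(n, 2)."""
    M = comb(n_nodes, 2)
    weights = [comb(M, m) * exp(-beta * m) for m in range(M + 1)]
    Z = sum(weights)
    return [w / Z for w in weights]


def mean_edges(n_nodes, beta):
    """Ensemble average of the number of edges at parameter beta."""
    return sum(m * p for m, p in enumerate(edge_distribution(n_nodes, beta)))


def fit_beta(n_nodes, target, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection: the mean number of edges decreases monotonically with beta."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_edges(n_nodes, mid) > target:
            lo = mid  # too many edges on average: beta must be larger
        else:
            hi = mid
    return (lo + hi) / 2


beta = fit_beta(5, 3)  # 5 nodes, 3 edges on average
print(beta, mean_edges(5, beta))
```

For this one-parameter model the answer is also known in closed form, since each of the $M$ possible edges is present independently with probability $1/(1 + e^{\beta})$; with more constraints a numerical fit of this kind is usually the only option.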

Page 23

The Degeneracy problem (Strauss 1986): The sampled graphs may not be representative of the averages.

[Figure: distributions of the # of edges, ranging from sparse to dense graphs]

This happens when the distribution of the measures, $p(\mathbf{m})$, is not unimodal.

o How does $p(\mathbf{m})$ become bimodal/multimodal?

o What can we do to eliminate/minimize this issue?

Example:

terrorist cells

pairs interacted

triples collaborated

The probability that the 9 cells form a connected network?

What is the most likely network?

Using the MaxEnt:

[Figure: probabilities of disconnected (Disctd.) vs. connected (Conctd.) networks]

Exact enum: none connected!

Yet MaxEnt says that it is connected with 0.6 probability! (But none has 17 edges and 19 triangles!)

Page 24

E. T. Jaynes, Physical Review 106, 620 (1957); ibid. 108, 171 (1957)

MaxEnt has been used extensively:

It is applicable to systems of any size

Tool to study mesoscale systems!

R.V. Chamberlin. Phys. Rev. Lett. 82, 2520 (1999); R.V. Chamberlin. Nature 408, 337 (2000).

Nanothermodynamics: R.V. Chamberlin. Science 298, 1172 (2002); R. Balian. From Microphysics to Macrophysics: Methods and Applications of Statistical Physics (Springer) 2007.

Many applications:

- Image reconstruction: S.F. Gull, G.J. Daniell. Nature 272, 686 (1978) [real-space images from x-ray scattering data]

- Fluorescence of L-tryptophan: A.K. Livesey, J.C. Brochon. Biophys. J. 52, 693 (1987)

- Conformational states of poly-(L-proline) from single-molecule Förster resonance energy transfer data: L.P. Watkins, H. Chang, H. Yang. J. Phys. Chem. A 110, 5191 (2006).

- Folding kinetics of dihydrofolate reductase: P.J. Steinbach, R. Ionescu, C.R. Matthews. Biophys. J. 82, 2244 (2002).

- CO ligand rebinding to a heme protein: P.J. Steinbach, K. Chu, H. Frauenfelder, et al. Biophys. J. 61, 235 (1991).

- Gene regulatory networks: A.M. Walczak, G. Tkacik, W. Bialek. Phys. Rev. E. 81, 041905 (2010).

- Infotaxis of moths: M. Vergassola, E. Villermaux, B.I. Shraiman. Nature 446, 406 (2007).

- And many, many others...

Page 25

Sz. Horvát, É. Czabarka & Z. Toroczkai. Phys. Rev. Lett. 114, 158701 (2015).

THEOREM: The MaxEnt model is non-degenerate if and only if the density of states function is log-concave.
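For a single measure, log-concavity of the density of states is easy to test numerically. The sketch below covers only this one-dimensional case (the theorem concerns a density of states over a vector of measures); the function name and the second example are hypothetical.

```python
from math import comb


def is_log_concave(counts):
    """1-D check: N(m)^2 >= N(m-1) * N(m+1) on a support with no internal zeros."""
    support = [i for i, c in enumerate(counts) if c > 0]
    if not support:
        return True  # an empty density of states has nothing to violate
    lo, hi = support[0], support[-1]
    if any(counts[i] == 0 for i in range(lo, hi + 1)):
        return False  # a gap in the support breaks log-concavity
    return all(counts[i] ** 2 >= counts[i - 1] * counts[i + 1]
               for i in range(lo + 1, hi))


# Density of states of the edge count on 5 nodes: N(m) = C(10, m).
print(is_log_concave([comb(10, m) for m in range(11)]))
# A hypothetical two-peaked density of states.
print(is_log_concave([1, 5, 2, 6, 1]))
```

The binomial density of states of the edges-only model passes the test, consistent with that model being non-degenerate, while any density of states with two separated peaks fails it.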

# of edges: $m_|$

# of two-stars: $m_\vee$

Another example

For the terrorist network

Page 26

A solution:

We still use the same data as in the degenerate model; however, we consider a one-to-one transformation $\mathbf{m} \mapsto \boldsymbol{\xi} = F(\mathbf{m})$ such that the corresponding density of states function of $\boldsymbol{\xi}$ is log-concave, and thus the corresponding model is non-degenerate.

We can still work in the same coordinate system, but the states are sampled by the non-degenerate model with constraints on $\langle F(\mathbf{m}) \rangle$.

How to choose $F$?

The typical reason why the density of states is not log-concave is that its domain is not convex.

Any transformation that convexifies the domain is good!

Page 27

[Figure panels: $(\langle m_| \rangle, \langle m_\vee \rangle)$ model; $(\langle m_|^2 \rangle, \langle m_\vee \rangle)$ model]

A data network example: Zachary’s Karate Club (ZKC)

Consider

Fit:

is degenerate!

Page 28

After linearization to obtain a convex domain.

Let us try to predict the number of triangles

The distribution of triangles by the same model is also bimodal.

Both and appear with very low probability in this model.

Recall:

Page 29

The linearized (or convexified) model produces a unimodal distribution.

It predicts:

Both and appear with high probability in this model.