Structure Learning
Probabilistic Graphical Models: BN Structure Learning


Page 1

Daphne Koller

Structure Learning
Probabilistic Graphical Models: BN Structure Learning

Page 2

Why Structure Learning
•  To learn a model for new queries, when domain expertise is not perfect
•  For structure discovery, when inferring the network structure is a goal in itself

Page 3

Importance of Accurate Structure

Missing an arc
•  Incorrect independencies
•  Correct distribution P* cannot be learned
•  But could generalize better

Adding an arc
•  Spurious dependencies
•  Can correctly learn P*
•  Increases # of parameters
•  Worse generalization

(Figure: networks over A, B, C, D illustrating a missing arc and an added arc)

Page 4

Score-Based Learning

Data: samples over A,B,C — <1,0,0>, <1,1,1>, <0,0,1>, <0,1,1>, …, <0,1,0>

•  Define a scoring function that evaluates how well a structure matches the data
•  Search for a structure that maximizes the score

(Figure: candidate networks over A, B, C)

Page 5

Likelihood Structure Score
Probabilistic Graphical Models: BN Structure Learning

Page 6

Likelihood Score
•  Find (G, θ) that maximize the likelihood
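The likelihood score the bullet refers to is the standard one; a reconstruction, since the slide's formula did not survive extraction:

```latex
\mathrm{score}_L(G : \mathcal{D}) \;=\; \ell(\hat{\theta}_G : \mathcal{D})
\;=\; \log P(\mathcal{D} \mid \hat{\theta}_G, G),
\qquad \hat{\theta}_G = \text{MLE parameters for } G
```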

Page 7

Example
(Figure: two candidate structures over X, Y — X → Y vs. disconnected X, Y)

Page 8

General Decomposition
•  The likelihood score decomposes as:
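The standard decomposition, reconstructed since the slide's formula was lost (M is the number of samples, P̂ the empirical distribution, I mutual information, H entropy):

```latex
\mathrm{score}_L(G : \mathcal{D})
\;=\; M \sum_{i=1}^{n} \mathbf{I}_{\hat{P}}\!\big(X_i \,;\, \mathrm{Pa}_{X_i}^{G}\big)
\;-\; M \sum_{i=1}^{n} \mathbf{H}_{\hat{P}}(X_i)
```

The entropy term does not depend on G, so only the mutual-information term matters when comparing structures.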

Page 9

Limitations of Likelihood Score
•  Mutual information is always ≥ 0
•  Equals 0 iff X, Y are independent
   –  In the empirical distribution
•  Adding edges can't hurt, and almost always helps
•  Score maximized for the fully connected network

(Figure: disconnected X, Y vs. X → Y)

Page 10

Avoiding Overfitting
•  Restricting the hypothesis space
   –  restrict # of parents or # of parameters
•  Scores that penalize complexity:
   –  Explicitly
   –  Bayesian score: averages over all possible parameter values

Page 11

Summary
•  Likelihood score computes log-likelihood of D relative to G, using MLE parameters
   –  Parameters optimized for D
•  Nice information-theoretic interpretation in terms of (in)dependencies in G
•  Guaranteed to overfit the training data (if we don't impose constraints)

Page 12

BIC Score and Asymptotic Consistency
Probabilistic Graphical Models: BN Structure Learning

Page 13

Penalizing Complexity
•  Tradeoff between fit to data and model complexity
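The BIC score in question has a standard definition, reconstructed here since the slide's formula did not survive extraction (Dim[G] is the number of independent parameters in G):

```latex
\mathrm{score}_{BIC}(G : \mathcal{D})
\;=\; \ell(\hat{\theta}_G : \mathcal{D}) \;-\; \frac{\log M}{2}\,\mathrm{Dim}[G]
```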

Page 14

Asymptotic Behavior
•  Mutual information grows linearly with M, while complexity grows logarithmically with M
   –  As M grows, more emphasis is given to fit to data

Page 15

Consistency
•  As M → ∞, the true structure G* (or any I-equivalent structure) maximizes the score
   –  Asymptotically, spurious edges will not contribute to likelihood and will be penalized
   –  Required edges will be added, due to linear growth of the likelihood term compared to logarithmic growth of model complexity

Page 16

Summary
•  BIC score explicitly penalizes model complexity (# of independent parameters)
   –  Its negation is often called MDL
•  BIC is asymptotically consistent:
   –  If data is generated by G*, networks I-equivalent to G* will have the highest score as M grows to ∞

Page 17

Bayesian Score
Probabilistic Graphical Models: BN Structure Learning

Page 18

Bayesian Score

P(G | D) = P(D | G) P(G) / P(D)

•  P(D | G): marginal likelihood (marginal probability of the data given G)
•  P(G): prior over structures

Page 19

Marginal Likelihood of Data Given G

P(D | G) = ∫ P(D | G, θ) P(θ | G) dθ

•  P(D | G, θ): likelihood
•  P(θ | G): prior over parameters

Page 20

Marginal Likelihood Intuition
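The slide's body did not survive extraction; the standard intuition it develops is that, by the chain rule, the marginal likelihood can be read as sequential prediction — how well G predicts each instance given the previous ones — rather than fit after seeing all the data (reconstruction; ξ[m] denotes the m-th instance):

```latex
P(\mathcal{D} \mid G) \;=\; \prod_{m=1}^{M} P\big(\xi[m] \,\big|\, \xi[1], \ldots, \xi[m-1],\, G\big)
```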

Page 21

Marginal Likelihood: BayesNets

Gamma function:
Γ(x) = ∫₀^∞ t^(x−1) e^(−t) dt
Γ(x + 1) = x · Γ(x)

Page 22

Marginal Likelihood Decomposition
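The slide's formula was lost; for multinomial networks with Dirichlet priors, the marginal likelihood decomposes by family into the standard closed form (reconstruction; α are Dirichlet hyperparameters, M[·] sufficient-statistic counts):

```latex
P(\mathcal{D} \mid G) \;=\;
\prod_{i=1}^{n} \;\prod_{u \in \mathrm{Val}(\mathrm{Pa}_{X_i}^{G})}
\frac{\Gamma(\alpha_{u})}{\Gamma(\alpha_{u} + M[u])}
\prod_{x \in \mathrm{Val}(X_i)}
\frac{\Gamma(\alpha_{x \mid u} + M[x, u])}{\Gamma(\alpha_{x \mid u})},
\qquad \alpha_{u} = \sum_{x} \alpha_{x \mid u}
```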

Page 23

Structure Priors
•  Structure prior P(G)
   –  Uniform prior: P(G) ∝ constant
   –  Prior penalizing # of edges: P(G) ∝ c^|G| (0 < c < 1)
   –  Prior penalizing # of parameters
•  Normalizing constant across networks is similar and can thus be ignored

Page 24

Parameter Priors
•  Parameter prior P(θ | G) is usually the BDe prior
   –  α: equivalent sample size
   –  B₀: network representing prior probability of events
   –  Set α(x_i, pa_i^G) = α · P(x_i, pa_i^G | B₀)
•  Note: pa_i^G are not the same as the parents of X_i in B₀
•  A single network provides priors for all candidate networks
•  Unique prior with the property that I-equivalent networks have the same Bayesian score

Page 25

BDe and BIC
•  As M → ∞, a network G with Dirichlet priors satisfies:
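The asymptotic relation the bullet refers to is the standard one, reconstructed since the slide's formula was lost — the Bayesian score converges to BIC plus a bounded term:

```latex
\log P(\mathcal{D} \mid G) \;=\; \ell(\hat{\theta}_G : \mathcal{D})
\;-\; \frac{\log M}{2}\,\mathrm{Dim}[G] \;+\; O(1)
```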

Page 26

Summary
•  Bayesian score averages over parameters to avoid overfitting
•  Most often instantiated as BDe
   –  BDe requires assessing a prior network
   –  Can naturally incorporate prior knowledge
   –  I-equivalent networks have the same score
•  Bayesian score
   –  Asymptotically equivalent to BIC
   –  Asymptotically consistent
   –  But for small M, BIC tends to underfit

Page 27

Structure Learning in Trees
Probabilistic Graphical Models: BN Structure Learning

Page 28

Score-Based Learning

Data: samples over A,B,C — <1,0,0>, <1,1,1>, <0,0,1>, <0,1,1>, …, <0,1,0>

•  Define a scoring function that evaluates how well a structure matches the data
•  Search for a structure that maximizes the score

(Figure: candidate networks over A, B, C)

Page 29

Optimization Problem
Input:
–  Training data
–  Scoring function (including priors, if needed)
–  Set of possible structures
Output: A network that maximizes the score
Key Property: Decomposability

Page 30

Learning Trees/Forests
•  Forests
   –  At most one parent per variable
•  Why trees?
   –  Elegant math
   –  Efficient optimization
   –  Sparse parameterization

Page 31

Learning Forests
•  p(i) = parent of X_i, or 0 if X_i has no parent
•  Score = sum of edge scores + constant
   –  Constant: score of the “empty” network
   –  Edge scores: improvement over the “empty” network
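The decomposition behind this bullet, in its usual form (reconstruction of the lost slide formula):

```latex
\mathrm{score}(G : \mathcal{D}) \;=\;
\underbrace{\sum_{i=1}^{n} \mathrm{score}(X_i : \mathcal{D})}_{\text{score of the empty network}}
\;+\;
\underbrace{\sum_{i \,:\, p(i) > 0} \Big[\mathrm{score}\big(X_i \mid X_{p(i)} : \mathcal{D}\big) - \mathrm{score}(X_i : \mathcal{D})\Big]}_{\text{improvement over the empty network}}
```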

Page 32

Learning Forests I
•  Set w(i→j) = Score(X_j | X_i) − Score(X_j)
•  For the likelihood score, w(i→j) = M · I(X_i; X_j), and all edge weights are nonnegative
   –  Optimal structure is always a tree
•  For BIC or BDe, weights can be negative
   –  Optimal structure might be a forest

Page 33

Learning Forests II
•  A score satisfies score equivalence if I-equivalent structures have the same score
   –  Such scores include likelihood, BIC, and BDe
•  For such a score, we can show w(i→j) = w(j→i), and use an undirected graph

Page 34

Learning Forests III (for score-equivalent scores)
•  Define an undirected graph with nodes {1,…,n}
•  Set w(i,j) = max[Score(X_j | X_i) − Score(X_j), 0]
•  Find the forest with maximal weight
   –  Standard algorithms for max-weight spanning trees (e.g., Prim’s or Kruskal’s) in O(n²) time
   –  Remove all edges of weight 0 to produce a forest
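The procedure above can be sketched directly as Kruskal's algorithm with a union-find structure, dropping zero-weight edges so the result may be a forest. The weight matrix below is a hypothetical stand-in for real edge scores w(i,j):

```python
# Max-weight forest via Kruskal's algorithm: keep the heaviest edges that do
# not form a cycle, and stop at weight 0 so disconnected components remain.

def max_weight_forest(w):
    """w: symmetric n x n matrix of nonnegative edge weights.
    Returns a list of (i, j) edges forming a max-weight forest."""
    n = len(w)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Consider candidate edges in order of decreasing weight
    edges = sorted(((w[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    forest = []
    for weight, i, j in edges:
        if weight <= 0:        # zero-weight edges are dropped -> forest, not tree
            break
        ri, rj = find(i), find(j)
        if ri != rj:           # adding the edge keeps the graph acyclic
            parent[ri] = rj
            forest.append((i, j))
    return forest

# Hypothetical weights for 4 variables: X0-X1 and X2-X3 strongly dependent,
# all other pairs independent (weight 0), so we expect two disjoint edges.
w = [[0, 5, 0, 0],
     [5, 0, 0, 0],
     [0, 0, 0, 3],
     [0, 0, 3, 0]]
print(max_weight_forest(w))  # -> [(0, 1), (2, 3)]
```

With a score-equivalent score the learned edges are undirected; a directed tree can then be recovered by picking an arbitrary root and orienting edges away from it.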

Page 35

Learning Forests: Example

(Figure: tree learned from data of the Alarm network, shown against the original network, with correct and spurious edges marked)

•  Not every edge in the tree is in the original network
•  Inferred edges are undirected – can’t determine direction

Page 36

Summary
•  Structure learning is an optimization over the combinatorial space of graph structures
•  Decomposability: network score is a sum of terms for different families
•  Optimal tree-structured network can be found using standard MST algorithms
•  Computation takes quadratic time

Page 37

General Graphs: Search
Probabilistic Graphical Models: BN Structure Learning

Page 38

Optimization Problem
Input:
–  Training data
–  Scoring function
–  Set of possible structures
Output: A network that maximizes the score

Page 39

Beyond Trees
•  Problem is not obvious for general networks
   –  Example: allowing two parents, a greedy algorithm is no longer guaranteed to find the optimal network
•  Theorem:
   –  Finding the maximal-scoring network structure with at most k parents for each variable is NP-hard for k > 1

Page 40

Heuristic Search

(Figure: candidate networks over A, B, C, D related by local moves)

Page 41

Heuristic Search
•  Search operators:
   –  local steps: edge addition, deletion, reversal
   –  global steps
•  Search techniques:
   –  Greedy hill-climbing
   –  Best-first search
   –  Simulated annealing
   –  ...

Page 42

Search: Greedy Hill Climbing
•  Start with a given network
   –  empty network
   –  best tree
   –  a random network
   –  prior knowledge
•  At each iteration
   –  Consider the score for all possible changes
   –  Apply the change that most improves the score
•  Stop when no modification improves the score
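The loop above can be sketched as follows, assuming the score is available as a black-box function over edge sets. The toy score at the bottom is a hypothetical stand-in for a real Bayesian network score, chosen only so the example terminates at a known structure:

```python
# Minimal greedy hill-climbing over DAG structures, using the local moves
# named in the slides: edge addition, deletion, and reversal.
from itertools import permutations

def is_acyclic(edges, n):
    """Kahn-style check: repeatedly peel off nodes with no incoming edges."""
    remaining, nodes = set(edges), set(range(n))
    while nodes:
        sources = {v for v in nodes if not any(e[1] == v for e in remaining)}
        if not sources:
            return False  # every remaining node has an incoming edge: cycle
        nodes -= sources
        remaining = {e for e in remaining if e[0] in nodes and e[1] in nodes}
    return True

def neighbors(edges, n):
    """Local moves: edge addition, deletion, reversal."""
    for i, j in permutations(range(n), 2):
        e = (i, j)
        if e in edges:
            yield edges - {e}                  # deletion
            yield (edges - {e}) | {(j, i)}     # reversal
        elif (j, i) not in edges:
            yield edges | {e}                  # addition

def hill_climb(score, n, edges=frozenset()):
    while True:
        best, best_score = None, score(edges)
        for cand in neighbors(edges, n):
            cand = frozenset(cand)
            if is_acyclic(cand, n) and score(cand) > best_score:
                best, best_score = cand, score(cand)
        if best is None:   # stop when no modification improves the score
            return edges
        edges = best

# Hypothetical score that peaks at the chain 0 -> 1 -> 2, for illustration only
target = {(0, 1), (1, 2)}
toy_score = lambda es: len(es & target) - len(es - target)
print(sorted(hill_climb(toy_score, 3)))  # -> [(0, 1), (1, 2)]
```

A real implementation would exploit decomposability rather than re-scoring each candidate from scratch, as the later slides discuss.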

Page 43

Greedy Hill Climbing Pitfalls
•  Greedy hill-climbing can get stuck in:
   –  Local maxima
   –  Plateaux
      •  Typically because equivalent networks are often neighbors in the search space

Page 44

Why Edge Reversal

(Figure: two networks over A, B, C that differ in the direction of a single edge)

Page 45

A Pretty Good, Simple Algorithm
•  Greedy hill-climbing, augmented with:
•  Random restarts:
   –  When we get stuck, take some number of random steps and then start climbing again
•  Tabu list:
   –  Keep a list of the K most recently taken steps
   –  Search cannot reverse any of these steps

Page 46

Example: ICU-Alarm

(Figure: KL divergence vs. number of samples M, for M from 0 to 5000; curves compare True Structure/BDe α = 10 against Unknown Structure/BDe α = 10)

Page 47

JamBayes

Horvitz, Apacible, Sarin, & Liao, UAI 2005

Page 48

Predicting Surprises

Horvitz, Apacible, Sarin, & Liao, UAI 2005

Page 49

Learned Model

Horvitz, Apacible, Sarin, & Liao, UAI 2005

Page 50

Influences in Learned Model

Horvitz, Apacible, Sarin, & Liao, UAI 2005

Page 51

Biological Network Reconstruction

(Figure: learned signaling network over phospho-proteins and phospho-lipids — PKC, PKA, Raf, Mek, Erk, Akt, Jnk, P38, Plcγ, PIP2, PIP3 — with variables perturbed in the data marked)

•  Edges: known 15/17, supported 2/17, reversed 1, missed 3
•  Subsequently validated in wet lab

From “Causal protein-signaling networks derived from multiparameter single-cell data,” Sachs et al., Science 308:523, 2005. Reprinted with permission from AAAS. This figure may be used for non-commercial and classroom purposes only; any other uses require prior written permission from AAAS.

Page 52

Summary
•  Useful for building better predictive models:
   –  when domain experts don’t know the structure
   –  for knowledge discovery
•  Finding the highest-scoring structure is NP-hard
•  Typically solved using simple heuristic search
   –  local steps: edge addition, deletion, reversal
   –  hill-climbing with tabu lists and random restarts
•  But there are better algorithms

Page 53

General Graphs: Decomposability
Probabilistic Graphical Models: BN Structure Learning

Page 54

Heuristic Search

(Figure: candidate networks over A, B, C, D related by local moves)

Page 55

Naïve Computational Analysis
•  Operators per search step: O(n²)
•  Cost per network evaluation:
   –  Components in score
   –  Compute sufficient statistics
   –  Acyclicity check
•  Total: O(n² (Mn + m)) per search step

Page 56

Exploiting Decomposability

(Figure: network over A, B, C, D before and after adding the edge B → D)

score  = Score(A | {}) + Score(B | {}) + Score(C | {A,B}) + Score(D | {C})
score′ = Score(A | {}) + Score(B | {}) + Score(C | {A,B}) + Score(D | {B,C})

Δscore(D) = Score(D | {B,C}) − Score(D | {C})

Page 57

Exploiting Decomposability

(Figure: networks over A, B, C, D reached by different local moves)

Δscore(D) = Score(D | {B,C}) − Score(D | {C})   (add B → D)
Δscore(C) = Score(C | {A}) − Score(C | {A,B})   (delete B → C)
Δscore(C) + Δscore(B) = Score(C | {A}) − Score(C | {A,B}) + Score(B | {C}) − Score(B | {})   (reverse B → C)

Page 58

Exploiting Decomposability

(Figure: networks over A, B, C, D reached by different local moves)

Δscore(C) = Score(C | {A}) − Score(C | {A,B})

To recompute scores, we only need to re-score families that changed in the last move

Page 59

Computational Cost
•  Cost per move
   –  Compute the O(n) delta-scores affected by the move
   –  Each one takes O(M) time
•  Keep a priority queue of operators sorted by delta-score
   –  O(n log n)
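The priority-queue idea can be sketched with `heapq` and lazy invalidation: stale entries stay in the heap and are skipped on pop, and only the moves whose families changed get re-pushed with fresh delta-scores. The moves and delta values below are hypothetical, not real BN score differences:

```python
# Priority queue of candidate moves keyed by delta-score, with lazy
# invalidation: `current` records the latest delta for each move, so
# outdated heap entries are detected and discarded when popped.
import heapq

class MoveQueue:
    def __init__(self):
        self.heap = []       # (-delta, move): heapq is a min-heap
        self.current = {}    # move -> latest delta-score

    def update(self, move, delta):
        self.current[move] = delta
        heapq.heappush(self.heap, (-delta, move))  # old entries become stale

    def pop_best(self):
        while self.heap:
            neg_delta, move = heapq.heappop(self.heap)
            if self.current.get(move) == -neg_delta:  # skip stale entries
                del self.current[move]
                return move, -neg_delta
        return None

q = MoveQueue()
q.update(("add", "B", "D"), 2.0)
q.update(("del", "B", "C"), 0.5)
q.update(("add", "B", "D"), 1.0)   # delta changed after a move elsewhere
print(q.pop_best())  # -> (('add', 'B', 'D'), 1.0)
```

Push and pop are O(log n) each, matching the O(n log n) cost quoted on the slide for maintaining the queue across a move.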

Page 60

More Computational Efficiency
•  Reuse and adapt previously computed sufficient statistics
•  Restrict in advance the set of operators considered in the search

Page 61

Summary
•  Even heuristic structure search can get expensive for large n
•  Can exploit decomposability to get orders-of-magnitude reduction in cost
•  Other tricks are also used for scaling