Praveen, Paurush, and Holger Fröhlich. "Boosting probabilistic graphical model inference by
incorporating prior knowledge from multiple sources." PloS one 8.6 (2013): e67410.
@St_Hakky
(This is a summary of a paper I read, not a paper I wrote.)
Introduction
• Graphical models are useful for extracting meaningful biological insights from experimental data in life science research.
• (Dynamic) Bayesian Networks
• Gaussian Graphical Models
• Poisson Graphical Models
• And many other methods…
• These models can infer features of cellular networks in a data-driven manner.
Problem
• Network inference from experimental data is challenging because of:
• the typically low signal-to-noise ratio
• high dimensionality
• a low number of replicates and noisy measurements (the sample size is very small)
• The inference of regulatory networks from such data is hence challenging and often fails to reach the desired level of accuracy.
How to deal with this problem?
• One can work at the experimental level by increasing the sample size.
• This would be the best way, but in practice it is very difficult because of cost and time limitations.
• Alternatively, we can deal with this problem at the inference level, which is what we consider here.
There is a lot of prior knowledge in biology
• Much knowledge is already available in biology: pathway databases [6–8], Gene Ontology [9] and others.
• Integrating known information from databases and the biological literature as prior knowledge thus appears beneficial.
Integrating heterogeneous information into the learning process is not straightforward
• However, biological knowledge covers many different aspects and is widely distributed across multiple knowledge resources.
• So, integrating heterogeneous information into the learning process is not straightforward.
Useful databases
• Pathway databases [6–8]
• Gene Ontology [9]
• Promoter sequences
• Protein-protein interactions
• Evolutionary information
• KEGG pathways
• GO annotation
How to deal with the problem at the inference level
• Integrate one or more particular information sources into the learning process.
• For example, gene regulatory networks have been inferred from a combination of gene expression data with transcription factor binding motifs.
• Other usable sources: promoter sequences, protein-protein interactions, evolutionary information, KEGG pathways, GO annotation.
Several approaches have been proposed
• On the technical side, several approaches for integrating prior knowledge into the inference of probabilistic graphical models have been published.
Prior Research
• In [16] and [17] the authors only generate candidate structures with significance above a certain threshold according to prior knowledge.
• [16] Larsen, Peter, et al. "A statistical method to incorporate biological knowledge for generating testable novel gene regulatory interactions from microarray experiments." BMC Bioinformatics 8.1 (2007): 1.
• [17] Almasri, Eyad, et al. "Incorporating literature knowledge in Bayesian network for inferring gene networks with gene expression data." International Symposium on Bioinformatics Research and Applications. Springer Berlin Heidelberg, 2008.
• Another idea is to introduce a probabilistic Bayesian prior over network structures. [18] introduced a prior for individual edges based on an a-priori assumed degree of belief.
• [18] Fröhlich, Holger, et al. "Large scale statistical inference of signaling pathways from RNAi and microarray data." BMC Bioinformatics 8.1 (2007): 1.
Prior Research
• [19] describes a more general set of priors, which can also capture global network properties, such as scale-free behavior.
• [19] Mukherjee, Sach, and Terence P. Speed. "Network inference using informative priors." Proceedings of the National Academy of Sciences 105.38 (2008): 14313-14318.
• [20] use a similar form of prior as [18], but additionally combine multiple information sources via a linear weighting scheme.
• [20] Werhli, Adriano V., and Dirk Husmeier. "Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge." Statistical Applications in Genetics and Molecular Biology 6.1 (2007).
Prior Research
• [21] treat different information sources as statistically independent, and consequently the overall prior is just the product over the priors for the individual information sources.
• [21] Gao, Shouguo, and Xujing Wang. "Quantitative utilization of prior biological knowledge in the Bayesian network modeling of gene expression data." BMC Bioinformatics 12.1 (2011): 1.
We focus on…
• The focus of this paper is the construction of consensus priors from multiple, heterogeneous knowledge sources.
Proposed Methods
• Latent Factor Model (LFM)
• LFM relies on the idea of a generative process for the individual information sources and uses Bayesian inference to estimate a consensus prior.
• Noisy-OR
• The second model integrates different information sources via a Noisy-OR gate.
Materials and Methods
• 1.1: Edge-wise Priors for Data Driven Network Inference
• 1.2: Latent Factor Model (LFM)
• 1.3: Noisy-OR Model (NOM)
• 1.4: Information Sources
Edge-wise Priors for Data Driven Network Inference
• D: experimental data
• G: the network graph (represented by an m×m adjacency matrix) to infer from the data.
• Bayes rule: P(G | D) ∝ P(D | G) · P(G)
Edge-wise Priors for Data Driven Network Inference
• P(G) is the prior over network structures. We assume P(G) can be decomposed into a product of edge-wise terms.
• B = (B_ij) is a matrix of prior edge confidences.
• B_ij = 1: high prior degree of belief in the existence of the edge i→j
Our purpose
• Our purpose is to compile B in a consistent manner from the n available resources.
• That is, we have n edge confidence matrices B^(1), …, B^(n), one per information source.
Latent Factor Model (LFM)
• LFM is based on the idea that the prior information encoded in the matrices B^(1), …, B^(n) all originates from the true but unknown network W.
Latent Factor Model (LFM)
• We use this notion to conduct joint Bayesian inference on W as well as additional parameters, given B^(1), …, B^(n).
• The idea behind this is that we can identify the consensus prior B with the posterior P(W | B^(1), …, B^(n)).
Latent Factor Model (LFM)
• The entries of each matrix B^(k) are assumed to follow a beta distribution.
• (Details are omitted here, but since the parameters α and β can be adjusted, the importance of each information source can be set independently.)
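As a rough sketch (not the paper's implementation), the role of the per-source beta parameters can be illustrated as follows; the (α, β) values below are hypothetical, chosen only to show how they encode source reliability:

```python
import random

random.seed(0)

def sample_edge_confidences(alpha, beta, n_edges):
    """Draw beta-distributed prior edge confidences for one source."""
    return [random.betavariate(alpha, beta) for _ in range(n_edges)]

# Hypothetical per-source (alpha_k, beta_k) values, not the paper's:
sources = {
    "trusted source":   (8.0, 2.0),  # confidences concentrate near 1
    "uncertain source": (2.0, 2.0),  # confidences spread around 0.5
}

for name, (a, b) in sources.items():
    draws = sample_edge_confidences(a, b, 1000)
    print(name, sum(draws) / len(draws))  # sample mean ~= a / (a + b)
```

A higher α relative to β shifts the confidence mass toward 1, so a source believed to be reliable contributes stronger edge priors.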
MCMC to get W and (α, β)
• Employ an adaptive Markov Chain Monte Carlo strategy to learn the latent variable W together with the parameters α and β.
• In network space, the MCMC moves are edge insertion, deletion and reversal.
• In parameter space, α and β are adapted on a log scale using a multivariate Gaussian transition kernel.
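The three structure moves can be sketched in a few lines; this is only an illustration of the proposal step, with the Metropolis-Hastings acceptance step and any acyclicity checks omitted:

```python
import random

def propose_move(adj, rng=random):
    """Propose one MCMC structure move on a binary adjacency matrix:
    edge insertion, deletion, or reversal. Acceptance and acyclicity
    checks, needed by a real sampler, are omitted in this sketch."""
    m = len(adj)
    new = [row[:] for row in adj]
    i, j = rng.sample(range(m), 2)   # pick an ordered node pair i != j
    if new[i][j] == 1:
        new[i][j] = 0                # deletion of i->j ...
        if rng.random() < 0.5:
            new[j][i] = 1            # ... or reversal to j->i
    else:
        new[i][j] = 1                # insertion of i->j
    return new

random.seed(1)
net = [[0, 1], [0, 0]]
print(propose_move(net))
```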
What is MCMC?
• Chapter 11 of PRML and around Chapter 8 of the "green book" (緑本) are useful references.
Noisy-OR model (NOM)
• The NOM represents a non-deterministic disjunctive relation between an effect and its possible causes, and has been extensively used in artificial intelligence.
Noisy-OR model (NOM)
• The Noisy-OR principle is governed by two hallmarks:
• First, each cause has a probability to produce the effect.
• Second, the probability of each cause being sufficient to produce the effect is independent of the presence of other causes.
Noisy-OR model (NOM)
• In our case the B^(k)_ij are interpreted as causes and W_ij as the effect. The link between both is given by
• B_ij = P(W_ij = 1 | B^(1)_ij, …, B^(n)_ij) = 1 − ∏_k (1 − B^(k)_ij)
• B_ij = 1: high prior degree of belief in the existence of the edge i→j.
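The Noisy-OR combination of per-source confidences is simple to compute; a minimal sketch:

```python
def noisy_or(confidences):
    """Combine per-source edge confidences b_ij^(1..n) into a single
    prior via the Noisy-OR gate: 1 - prod_k (1 - b_k)."""
    p = 1.0
    for b in confidences:
        p *= 1.0 - b
    return 1.0 - p

# One strong source is enough to push the combined prior close to 1:
print(noisy_or([0.9, 0.1, 0.2]))  # ~0.928
```

This makes the contrast with LFM concrete: the product of (1 − b_k) terms collapses toward 0 as soon as any single b_k is near 1, so no cross-source agreement is required.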
LFM vs NOM
• LFM
• A high level of agreement between information sources is required in order to achieve high values in B.
• NOM
• In contrast, B_ij becomes close to 1 if the edge i→j has a high confidence in at least one information source, because then the product ∏_k (1 − B^(k)_ij) gets close to 0.
NOM.RNK
• In addition to NOM, we also experimented with a variant based on relative ranks.
Information Sources
• We use the following information as prior information:
• GO annotation
• Pathway databases
• KEGG, PathwayCommons
• Protein domain annotation
• InterPro
• Protein domain interaction
• DOMINE
GO annotation
• Interacting proteins often function in the same biological process.
• So, two proteins acting in the same biological process are more likely to interact than two proteins involved in different processes.
• We used the default similarity measure for gene products implemented in the R package GOSim.
Pathway Databases
• KEGG
• R package: KEGGgraph
• PathwayCommons database
• To compute a confidence value for each interaction between a pair of genes/proteins we can work at the level of this graph in different ways:
• Shortest path distance between the two entities
• Dijkstra's algorithm; R package: RBGL
• In this paper, this measure is used, because we observed that using a diffusion kernel in place of the shortest path measure did not affect the results much.
• Diffusion kernels
• R package: pathClass
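A minimal sketch of the shortest-path idea, using a stdlib Dijkstra on a toy pathway graph; the mapping 1/(1 + d) from distance to confidence is an assumption for illustration, not necessarily the paper's exact transform:

```python
import heapq

def dijkstra(graph, src):
    """Shortest-path distances from src; graph is given as
    {node: [(neighbor, weight), ...]}."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy pathway graph (hypothetical), one unit of distance per interaction:
pathway = {"A": [("B", 1)], "B": [("C", 1)], "C": []}
d = dijkstra(pathway, "A")["C"]   # two steps from A to C
confidence = 1.0 / (1.0 + d)      # assumed distance-to-confidence mapping
print(confidence)
```

Entities close together in the pathway graph thus receive a higher edge confidence than distant ones.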
Protein domain annotation
• Proteins with similar domains are more likely to act in similar biological pathways.
• The confidence of an interaction between two proteins can thus be seen as a function of the similarity of the domain annotations of the proteins.
• Protein domain annotation was compared on the basis of a binary vector representation via the cosine similarity.
Protein domain annotation
• Protein domain annotation can be retrieved from the InterPro database.
• A "1" in a component indicates that the protein is annotated with the corresponding domain; otherwise a "0" is filled in.
• The similarity between two binary vectors u, v (domain signatures) is given by the cosine similarity: sim(u, v) = (u · v) / (‖u‖ ‖v‖)
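The cosine similarity of two binary domain signatures can be sketched as follows (the signatures are hypothetical):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two binary domain signatures."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Two hypothetical proteins sharing one of their two domains each:
u = [1, 1, 0, 0]  # annotated with domains 1 and 2
v = [1, 0, 1, 0]  # annotated with domains 1 and 3
print(cosine_similarity(u, v))  # ~0.5
```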
Protein domain interaction
• The relative frequency of interacting protein domain pairs was taken as another confidence measure for an edge a–b.
• Two proteins are more likely to interact if they contain domains which can potentially interact.
• The DOMINE database collates known and predicted domain-domain interactions.
• Confidence = H / (D_A · D_B), where H is the number of hit pairs found in the DOMINE database and D_A and D_B are the number of domains in proteins A and B, respectively.
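The H / (D_A · D_B) measure can be sketched directly; the domain identifiers and the interaction pair below are hypothetical, not real DOMINE content:

```python
def domain_interaction_confidence(domains_a, domains_b, domine_pairs):
    """Confidence H / (D_A * D_B): the fraction of domain pairs between
    two proteins found as interacting in a DOMINE-like pair set."""
    hits = sum((da, db) in domine_pairs or (db, da) in domine_pairs
               for da in domains_a for db in domains_b)
    return hits / (len(domains_a) * len(domains_b))

# Hypothetical example: protein A has two domains, protein B has one,
# and exactly one of the two cross pairs is a known interaction.
pairs = {("PF00001", "PF00010")}
print(domain_interaction_confidence(["PF00001", "PF00002"],
                                    ["PF00010"], pairs))  # 0.5
```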
Results
• 2.1: Correlation of Prior Edge Confidences with the True Biological Network
• Network Sampling
• Simulated Information Sources
• Weighting of Information Sources
• Real Information Sources
• 2.2: Enhancement of Data Driven Network Reconstruction Accuracy
• Simulated Data
• Application to Breast Cancer
• Application to Yeast Heat-Shock Network
• Section 2.1 is omitted in this summary.
Enhancement of Data Driven Network Reconstruction Accuracy
• We next investigated to what extent our priors could enhance the reconstruction performance of Bayesian Networks learned from data.
• From the set of best-fitting network structures, the one with the minimum BIC value was selected.
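The BIC-based selection step can be sketched as follows; the candidate structures and their log-likelihoods are hypothetical, just to show the selection rule:

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion: lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

# Hypothetical candidates: (log-likelihood, number of free parameters).
candidates = {
    "sparse": (-120.0, 4),
    "dense":  (-115.0, 12),
}
n_samples = 50

# Pick the structure with minimum BIC value.
best = min(candidates, key=lambda k: bic(*candidates[k], n_samples))
print(best)  # "sparse"
```

The log(n_samples) penalty on parameters is what lets a slightly worse-fitting but sparser network win, which matters at the small sample sizes typical for this data.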
Network reconstruction for the breast cancer data
• (Right) Without using any prior
• (Left) Using the NOM prior
• Black edges could be verified with established literature knowledge.
• Grey edges could not be verified.
Network reconstruction for the breast cancer data
Yeast heat-shock response network obtained via Bayesian network reconstruction
• (a) Network without any prior knowledge
• (b) The gold standard network from the YEASTRACT database
• (c) Network reconstructed with prior knowledge (NOM)
Reconstruction performance of the yeast heat-shock response network with Bayesian Networks and different priors (NP = No Prior).
Conclusion
• We propose two methods:
• Latent Factor Model (LFM)
• Noisy-OR Model (NOM)
• Results:
• Both of our models yielded priors which were significantly closer to the true biological network than competing methods.
• They could significantly enhance the reconstruction performance of Bayesian Networks compared to a situation without any prior.
Conclusion
• Our methods were also superior to purely using STRING edge confidence scores as prior information.
• The proposed methods are robust to network size: they can handle small, medium and large networks.
• LFM's computational cost is higher than NOM's, so NOM should be preferred for large networks.