Marek Cieplak
Institute of Physics, Polish Academy of Sciences, Warsaw
Inferring genetic interaction networks from gene expression patterns in microarrays (for Saccharomyces cerevisiae)
Microarrays: arrays of spots responding to specific genes that are being expressed
Fluorescently labeled mobile probes (DNA or mRNA) hybridize with the complementary template
cDNA – complementary DNA – obtained by reverse transcription from mRNA (no introns)
glass slides, silicon chips, nylon membranes
cDNA Microarrays: single strands attached covalently at fixed locations on a solid support. The location identifies a particular gene.
medical diagnostics
Incubate and then wash out unbound probes
Detect by using laser confocal fluorescence scanning
1 spot: ~ 100 μm
Oligonucleotide microarrays
Santa Clara, CA since 1991
First product: HIV genotyping GeneChip in 1994
Light-directed, spatially addressable parallel chemical synthesis, S.P.A. Fodor, J.L. Read, M.C. Pirrung, L. Stryer, A.T. Lu, D. Solas, Science 251, 767 (1991): 1024 spots
Agilent Technologies
Spots with oligonucleotide fragments (5-50 bases)
$700 mln. market in the US
Semiconductor-based photolithography with photo-sensitive linkers, building the strands one nucleotide at a time
Spotted microarrays
Schena M, Shalon D, Davis RW, Brown PO (1995). Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270: 467-470.
Robotic spotting: droplets of a DNA probe sample are deposited onto a functionalized glass slide
Poly-lysine or poly-amine coating for electrostatic adsorption
Streptavidin and biotin-labeled DNA probes
Up to 10 000 spots in 1 cm²
Spots with single-strand cDNA (100 - 5000 bases)
The first had 96 spots
Competing technology: high-throughput gene sequencing systems
Assignment of function to genes
The kinds and amounts of mRNA produced by a cell tell which genes are expressed when the cell responds to its needs and stimuli.
Gene expression is a highly complex and tightly regulated process: both an "on/off" switch and a "volume control".
Brown: studies of anaerobic fermentation by yeast
Saccharomyces cerevisiae, the well studied eukaryote
Study gene expression as a function of time for thousands of genes simultaneously
Egyptian relief tomb sculpture from 2400 BC – steps in the brewing of beer. Grain.
Known since at least 4000 BC. Arose perhaps 8000 BC
C6H12O6 → 2 C2H5OH + 2 CO2
Saccharomyces cerevisiae
Used in production of top-fermenting ales
Anaerobic fermentation by yeast led to the discovery of proteins (invertase) and their enzymatic function
[Figure: glucose or fructose → ethanol; invertase]
A good supply of sugar: nuclear DNA expresses invertase to catalyse FERMENTATION.
Otherwise, yeast turns on its mitochondrial DNA to perform RESPIRATION
Mitochondrion (19 genes): O2-consuming phosphorylation (production of ATP from ADP) in the inner membrane; burns available sources of carbon, including the alcohol
Addition of the PO4 2- phosphate group
Nucleus with DNA (~6200 genes)
Diauxic ('second life') shift, discovered by J. Monod in 1941: enzymes do not adapt; other enzymes are expressed when conditions change.
[Figure: time courses of ethanol, glucose, glycogen, trehalose, pH and cell number; labels: inside cells, in milieu]
Growth larger during fermentation
Metabolic oscillations in yeast cultures (Porro et al. 1988): 6 h cycle
Both the pH of the culture and the cell numbers oscillate: cell division gets in step as well
The cells go back and forth between consuming glucose and producing ethanol, and storing the glucose as glycogen and trehalose.
[Figure labels: respiration, fermentation]
The period of oscillations depends on the rate of sugar delivery
dissolved oxygen = - (oxygen consumed)
Two partially overlapping signaling pathways that control this have been identified: protein kinase A (PKA) and target-of-rapamycin (TOR), the two key proteins that are expressed on the pathways (Raught, Gingras, Sonenberg 2001; Rohde, Heitman, Cardenas 2001).
Rapamycin: a molecule (antibiotic) produced by the bacterium Streptomyces hygroscopicus, found in the soil of Rapa Nui. Used e.g. to prevent rejection of kidney transplants.
Mitochondria have to give a shout to the nucleus to make all the proteins that are involved in their structure and in breaking down sugars and performing oxidative phosphorylation.
TOR signals hunger: couples nutrient availability with cell growth
40 minute metabolic oscillations
Klevecz, Bolen, Forrest, Murray (Yale) 2004
Total of 6100 genes expressed: 5329 nuclear & 19 mitochondrial
RNA samples collected every 4 min.
A genomewide oscillation
[Figure labels: 650, 2429, 2250; respiration; fermentation]
Mycoplasma genitalium – 470 genes – the smallest genome sustaining independent life was not studied
Genes involved in similar functions are expressed in groups at the same time
(another ~11 hour cycle: DeRisi, Iyer, Brown 1997)
Statistical clustering: guilt-by-association ("members of the same choir")
Clustering determined through correlations
Ubiquitin-Proteasome: degradation
Cytosolic Ribosomal: construction of proteins and rRNA in ribosomes
Sulfur & Methionine metabolism
Mitochondrial Ribosomal
Tu, Kudlicki, Rowicka, McKnight, Science 310:1152 (2005)
300 min. cycle, ~6209 genes, sampled every ~25 min
Ribosomal proteins
Mitochondrial proteins
DNA replication/cell division
Energy metabolism
Redox homeostasis/sulfur metabolism
Functionally interconnected groups of genes: proteins encoded by genes involved in the same biological process are often co-regulated (Eisen, Spellman, Brown, Botstein 1998; Saldanha, Brauer, Botstein 2004)
Statistical tools (correlations) detect significant differences in expression levels and identify groups of genes exhibiting similar expression patterns.
Correlation measures do not provide direct insight into the identity or nature of the gene interactions that give rise to the observed expression patterns.
Violin players "express" in a correlated way, but it is the conductor that directs the expression.
WHO ARE THE CONDUCTORS IN THE GENETIC NETWORKS?
WHAT ARE THE EFFECTIVE GENE-GENE INTERACTIONS?
INTERACTIONS MEDIATED BY PROTEINS
The challenge of direct network inference: What are the gene interactions that coordinate cellular changes in response to the environment?
G genes
P << G measurements of transcript levels
G x P data points
G(G-1)/2 binary interaction values
The problem: under-determined
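To make the counting concrete, here is a minimal sketch that compares the number of pairwise couplings, G(G-1)/2, with the number of data points, G x P. The function name and the value P = 30 are assumptions chosen for illustration; G = 582 is the size of one gene set used later in the talk.

```python
def count_unknowns(n_genes: int, n_experiments: int) -> dict:
    """Compare the number of pairwise couplings with the number of data points."""
    pair_couplings = n_genes * (n_genes - 1) // 2   # G(G-1)/2
    data_points = n_genes * n_experiments           # G x P
    return {
        "pair_couplings": pair_couplings,
        "data_points": data_points,
        "unknowns_per_data_point": pair_couplings / data_points,
    }

# G = 582 is taken from the talk; P = 30 is a hypothetical value for illustration.
counts = count_unknowns(n_genes=582, n_experiments=30)
print(counts)
```

Whenever P is much smaller than G, the couplings outnumber the measurements, which is the sense in which the problem is under-determined.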
A variety of approaches to extract gene interactions:
Simple Boolean networks (Kauffman 1969; Liang, Fuhrman, Somogyi 1998; Akutsu, Miyano, Kuhara 2000; Shmulevich et al. 2002)
Dynamical models of cellular processes (Chen et al. 2000)
Boolean networks: reverse engineering based on wiring rules for binary elements
Xi(t+1) = Fi[X1(t), ..., XN(t)]
e.g. X1(t+1) = 1 if X2(t) and X3(t) are 1
Fixed points and limit cycles
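The Boolean update rule can be sketched for a toy three-gene network. Only the rule for X1 comes from the slide; the rules for X2 and X3, and the function names `step` and `find_attractor`, are hypothetical and serve only to show how fixed points and limit cycles are found by iteration.

```python
from itertools import product

def step(state):
    """One synchronous update of a 3-gene Boolean network."""
    x1, x2, x3 = state
    return (
        x2 & x3,   # rule from the slide: X1(t+1) = 1 if X2(t) and X3(t) are 1
        x1 | x3,   # hypothetical rule, for illustration only
        1 - x1,    # hypothetical rule, for illustration only
    )

def find_attractor(state):
    """Iterate until a state repeats; return the repeating part of the trajectory."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

# Collect the distinct attractors reached from all 2^3 initial states.
attractors = {frozenset(find_attractor(s)) for s in product((0, 1), repeat=3)}
for cycle in attractors:
    kind = "fixed point" if len(cycle) == 1 else f"limit cycle of length {len(cycle)}"
    print(kind, sorted(cycle))
```

With these particular rules every initial state ends on the same limit cycle; other wiring rules would give fixed points or several attractors.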
Dynamical models:
dXi/dt = Fi[X1(t), ..., XN(t), I]
dXi/dt = -Xi/τi + Σj Tij fj(Xj) + Σjk Tijk fj(Xj) fk(Xk) + Ii(t)
Xi: gene product concentration
I: noise, external input
fi: sigmoidal regulation function
τi: degradation time
Tij: pairwise couplings; Tijk: triple couplings
Also spatially non-uniform
Guess the coupling constants
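As a rough illustration of the dynamical model, the sketch below integrates the pairwise part of the equation (no triple couplings, constant input) with a simple forward Euler step. The coupling matrix, degradation times, step size and the logistic form of f are all assumed values, not taken from the talk.

```python
import numpy as np

def sigmoid(x):
    """Assumed sigmoidal regulation function f(X)."""
    return 1.0 / (1.0 + np.exp(-x))

def simulate(T, tau, I, x0, dt=0.01, steps=2000):
    """Forward Euler for dX_i/dt = -X_i/tau_i + sum_j T_ij f(X_j) + I_i."""
    x = np.array(x0, dtype=float)
    traj = np.empty((steps + 1, x.size))
    traj[0] = x
    for n in range(steps):
        dxdt = -x / tau + T @ sigmoid(x) + I   # T[i, j]: effect of gene j on gene i
        x = x + dt * dxdt
        traj[n + 1] = x
    return traj

# Hypothetical two-gene system: gene 0 activates gene 1, gene 1 represses gene 0.
T = np.array([[0.0, -2.0],
              [2.0,  0.0]])
traj = simulate(T, tau=np.array([1.0, 1.0]), I=np.array([1.0, 0.0]), x0=[0.5, 0.5])
print(traj[-1])
```

Inferring a network with this kind of model means guessing T so that simulated trajectories match the measured ones, which is why the number of coupling constants matters.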
Bayesian network models (Friedman 2004): the probability for a gene transcript to be expressed at a given level is conditionally dependent on the expression levels of only a few other genes (its 'parents'); a directed network. Equations for probabilities.
Graphical Gaussian models (Toh, Horimoto 2002; Schafer, Strimmer 2005): direct and indirect couplings in an undirected network based on 'conditional independence' between two genes; correlations
Relevance networks (Butte, Kohane 2000): metrics reflecting functional relationships of genes; Pearson correlation coefficient above a threshold
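The relevance-network idea (link two genes when their Pearson correlation exceeds a threshold) can be sketched as follows. The function name, the toy expression profiles and the threshold of 0.9 are assumptions for illustration only.

```python
import numpy as np

def relevance_network(X, threshold):
    """Return gene pairs (i, j), i < j, whose |Pearson r| exceeds the threshold.

    X has one row per gene and one column per measurement.
    """
    r = np.corrcoef(X)            # G x G matrix of Pearson coefficients
    n_genes = r.shape[0]
    return [
        (i, j)
        for i in range(n_genes)
        for j in range(i + 1, n_genes)
        if abs(r[i, j]) > threshold
    ]

# Toy data: gene 1 follows gene 0 exactly, gene 2 is uncorrelated with both.
t = np.arange(6, dtype=float)
X = np.array([
    t,
    2.0 * t + 1.0,
    [1.0, -1.0, 0.0, 0.0, -1.0, 1.0],
])
edges = relevance_network(X, threshold=0.9)
print(edges)
```

Such a network records which profiles move together, but not which gene drives which.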
'Petri nets', 'process algebra', 'grammars', 'rule-based formalism'
Become unwieldy with increasing gene numbers
Efforts to constrain the model space by incorporating additional information from interventions and perturbations, other types of molecular data, or literature mining are useful on a small scale
But myriad networks can reproduce the observed data with fidelity
SOLUTION: the principle of information entropy maximization to identify the most probable network
PNAS 103, 19033-19038 (2006)
Jayanth R. Banavar: Department of Physics, Penn State University
Nina Fedoroff: Biology Dept. & Huck Inst., Penn State Univ. & Santa Fe Inst.
Amos Maritan: Dipartimento di Fisica, Università di Padova, Italy
Timothy R. Lezon: Penn State University, now UPitt.
Previous: PNAS 97 (2000) & 98 (2001), also with Holter and Mitra: fundamental modes in the temporal patterns of genetic expression
The universal tendency for the amount of disorder in a system to increase
1865: Clausius introduces concept of entropy for thermodynamics
1877: Boltzmann defines entropy in terms of probabilities, extending it to statistical mechanics
1948: Shannon bases his theory of information on entropy
1957: Jaynes
Current applications: global climate, neural networks, plate tectonics, ecosystems
Entropy maximization applied to gene expression data
G = number of relevant degrees of freedom ("genes")
$x_i$ = expression level of gene i
$\vec{x} = (x_1, \ldots, x_G)$: the state of the genome in a given photograph
$\rho(\vec{x})$: probability of observing the state $\vec{x}$
Find the form of $\rho(\vec{x})$ that maximizes the system entropy:
Entropy: $S = -\sum_{\vec{x}} \rho(\vec{x}) \ln \rho(\vec{x})$ (summation over the various possible values of $\vec{x}$)
Constraints imposed by Lagrange multipliers:
$\sum_{\vec{x}} \rho(\vec{x}) = 1$
$\langle x_i \rangle = \sum_{\vec{x}} \rho(\vec{x})\, x_i = \frac{1}{P} \sum_{k=1}^{P} x_i^k$
$\langle x_i x_j \rangle = \sum_{\vec{x}} \rho(\vec{x})\, x_i x_j = \frac{1}{P} \sum_{k=1}^{P} x_i^k x_j^k$
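The step from these constraints to a Gaussian form of $\rho$ can be sketched as a standard Lagrange-multiplier calculation. The multiplier names $\lambda$ and $\mu_i$ and the symbol $\mathcal{L}$ are notation assumed here; $M_{ij}$ is used for the multipliers of the pair constraints so that it matches the interaction matrix of the talk.

```latex
\begin{align*}
\mathcal{L} &= -\sum_{\vec{x}} \rho \ln \rho
  - \lambda \Big( \sum_{\vec{x}} \rho - 1 \Big)
  - \sum_i \mu_i \Big( \sum_{\vec{x}} \rho\, x_i - \langle x_i \rangle \Big)
  - \frac{1}{2} \sum_{i,j} M_{ij} \Big( \sum_{\vec{x}} \rho\, x_i x_j - \langle x_i x_j \rangle \Big) \\
0 &= \frac{\partial \mathcal{L}}{\partial \rho(\vec{x})}
  = -\ln \rho(\vec{x}) - 1 - \lambda - \sum_i \mu_i x_i - \frac{1}{2} \sum_{i,j} M_{ij} x_i x_j \\
\rho(\vec{x}) &\propto \exp\Big( -\sum_i \mu_i x_i - \frac{1}{2} \sum_{i,j} M_{ij} x_i x_j \Big)
\end{align*}
```

For data with zero mean the linear term drops out ($\mu_i = 0$), leaving $\rho(\vec{x}) \propto e^{-\frac{1}{2} \vec{x} \cdot M \cdot \vec{x}}$, a Gaussian whose covariance is the inverse of $M$ wherever that inverse exists.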
Statistical inference with minimal reliance on the form of missing information
Most robust to experimental errors and noise in data
Use polished data: $X_i \to X_i - \langle X_i \rangle$, so that $\langle X_i \rangle = 0$
P = number of experiments ("photographs")
$x_i^k$ = expression level of gene i in photograph k
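$M = C^{-1}$
The matrix of interactions is the pseudo-inverse of the correlation matrix
$C_{ij} = \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle$
$\langle x_i x_j \rangle = \frac{1}{P} \sum_{k=1}^{P} x_i^k x_j^k$
$\langle x_i \rangle = \frac{1}{P} \sum_{k=1}^{P} x_i^k$
$\rho(\vec{x}) \sim e^{-\frac{1}{2} \vec{x} \cdot M \cdot \vec{x}}$ for the polished data
M is the matrix of pair-wise interactions between genes (higher order interactions determined perturbatively)
C is G x G, but there are only P data points on $C_{ij}$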
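A minimal NumPy sketch of this step, assuming the expression data are stored as a G x P array (one row per gene, one column per photograph). The function name, the random toy data and the cutoff `rcond=1e-10` are assumptions; the talk does not specify an implementation.

```python
import numpy as np

def interaction_matrix(X):
    """Return (C, M) for data X of shape (G, P): genes in rows, experiments in columns."""
    P = X.shape[1]
    Y = X - X.mean(axis=1, keepdims=True)   # polished data: zero mean for every gene
    C = Y @ Y.T / P                         # C_ij = <x_i x_j> - <x_i><x_j>
    # C is singular when P < G, so use the pseudo-inverse rather than the inverse.
    M = np.linalg.pinv(C, rcond=1e-10, hermitian=True)
    return C, M

# Random toy data with more genes than experiments (G > P).
rng = np.random.default_rng(0)
G, P = 6, 4
X = rng.normal(size=(G, P))
C, M = interaction_matrix(X)
rank = np.linalg.matrix_rank(C, tol=1e-10)
print("rank of C:", rank, "out of", G)
```

Subtracting the mean removes one degree of freedom, so for generic data the rank of C is P-1 rather than P.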
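PSEUDOINVERSE MATRIX $A^{-}$
$A = \begin{pmatrix} 2 & 3 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}$ has no inverse, BUT
$A^{-} A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$
where
$A^{-} = \begin{pmatrix} 2 & -3 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}$ is the pseudoinverse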
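The 3 x 3 example can be checked numerically with NumPy's Moore-Penrose pseudo-inverse. The variable names and the cutoff `rcond=1e-10` are choices made for this sketch.

```python
import numpy as np

A = np.array([[2.0, 3.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 0.0]])

# A is singular (its last row and column are zero), so np.linalg.inv would fail.
A_pinv = np.linalg.pinv(A, rcond=1e-10)

expected_pinv = np.array([[ 2.0, -3.0, 0.0],
                          [-1.0,  2.0, 0.0],
                          [ 0.0,  0.0, 0.0]])
projector = np.diag([1.0, 1.0, 0.0])   # identity on the non-zero subspace only

print(np.allclose(A_pinv, expected_pinv))
print(np.allclose(A_pinv @ A, projector))
```

The pseudo-inverse inverts the invertible 2 x 2 block and leaves the zero block at zero.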
For polished data y = x
If P < G, then C is singular and has only P-1 non-zero eigenvalues $\lambda_k$ with the eigenvectors $v^k$. Pseudo-inverse in the non-zero eigenspace:
$M_{ij} = \sum_{k=1}^{P-1} \frac{1}{\lambda_k} v_i^k v_j^k$
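The spectral formula can be written out directly and compared with a library pseudo-inverse. This is a sketch on random toy data; the function name `pinv_from_modes` and the sizes G = 8, P = 5 are assumptions.

```python
import numpy as np

def pinv_from_modes(C, n_modes):
    """M_ij = sum_k v_i^k v_j^k / lambda_k over the n_modes largest eigenvalues of C."""
    lam, V = np.linalg.eigh(C)                 # eigenvalues ascending, eigenvectors in columns
    lam, V = lam[-n_modes:], V[:, -n_modes:]   # keep only the non-zero eigenspace
    return (V / lam) @ V.T                     # each eigenvector weighted by 1/lambda_k

# Toy data with P < G, polished so that every gene has zero mean.
rng = np.random.default_rng(1)
G, P = 8, 5
X = rng.normal(size=(G, P))
Y = X - X.mean(axis=1, keepdims=True)
C = Y @ Y.T / P

M = pinv_from_modes(C, n_modes=P - 1)
M_ref = np.linalg.pinv(C, rcond=1e-10, hermitian=True)
print(np.allclose(M, M_ref))
```

Because each mode enters with weight 1/λ_k, the eigenvectors with small non-zero eigenvalues contribute the most to M.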
The gross, general correlations indicate little about the nature of the couplings between genes.
The eigenvectors with small eigenvalues dominate the calculation of M. These eigenvectors correspond to the residual fluctuations in expression levels that remain when the common, large-scale fluctuations are removed.
[Figure: axis marks at P-1 and G]
Negative $M_{ij}$: excitatory. The change in the expression level of either gene leads to a similar change in the other
Positive $M_{ij}$: inhibitory. The change is opposite
Diagonal $M_{ii}$: self-regulation
Nodes with large diagonal values generally have strong couplings with several other nodes
M is like a Hamiltonian
$\rho(\vec{x}) \sim e^{-\frac{1}{2} \vec{x} \cdot M \cdot \vec{x}}$
Spin systems: $H_{ij} = -J_{ij} S_i S_j$
$M_{ij} \sim -J_{ij}$
Data on 5846 genes, 19 of them mitochondrial
4670 genes
Switching from 582 to 1008 genes in the set leaves the $M_{ij}$ largely intact
Adding noise to the data: robust to noise
Experimental level noise ~5
582: mean + standard deviation
Yeast chemostat cultures showing 40-min metabolic oscillations
Selecting the genes: highest profile variance
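Selecting genes by profile variance can be sketched as a simple ranking. The function name and the toy profiles are hypothetical; the talk applies the idea to gene sets of several hundred genes.

```python
import numpy as np

def top_variance_genes(X, n_keep):
    """Return the sorted row indices of the n_keep genes with the largest variance.

    X has one row per gene and one column per time point.
    """
    variances = X.var(axis=1)
    order = np.argsort(variances)[::-1]    # highest variance first
    return np.sort(order[:n_keep])

# Toy profiles: gene 0 is flat, gene 1 barely moves, genes 2 and 3 oscillate.
X = np.array([
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 0.1, 0.0, 0.1, 0.0],
    [0.0, 5.0, 0.0, 5.0, 0.0],
    [0.0, 1.0, 2.0, 1.0, 0.0],
])
selected = top_variance_genes(X, n_keep=2)
print(selected)
```

Genes with flat profiles carry little information about the oscillation, so the ranking keeps the ones that change the most.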
Amino acid and protein synthesis
Peroxisomal and degradative processes
Sulfur metabolism, redox homeostasis, stress response
Mitochondrial function and biogenesis
Cell division, DNA synthesis, cytoskeleton
Nucleotide and RNA metabolism
Carbon metabolism
Lipid metabolism
Other or unknown
The pairwise interaction network: 582 genes, 110 strongest interactions
Top 6 strongest
The hubs: 7
Red: $M_{ij} < 0$ (excitatory); blue: $M_{ij} > 0$ (inhibitory)
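One way to read off a network of this kind from M is to keep the off-diagonal entries of largest magnitude and count how many of them touch each gene. This is a sketch under that assumption; the function names and the toy 4 x 4 matrix are hypothetical, and the talk does not state how hubs were defined.

```python
import numpy as np

def strongest_interactions(M, n_edges):
    """Return the n_edges pairs (i, j, M_ij), i < j, with the largest |M_ij|."""
    i, j = np.triu_indices_from(M, k=1)             # off-diagonal upper triangle
    order = np.argsort(np.abs(M[i, j]))[::-1][:n_edges]
    return [(int(i[k]), int(j[k]), float(M[i[k], j[k]])) for k in order]

def hub_degrees(edges, n_genes):
    """Count how many of the selected interactions involve each gene."""
    degree = np.zeros(n_genes, dtype=int)
    for a, b, _ in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

# Toy symmetric interaction matrix for four genes.
M = np.array([
    [ 1.0, -0.9,  0.8,  0.1 ],
    [-0.9,  1.0,  0.05, 0.02],
    [ 0.8,  0.05, 1.0, -0.7 ],
    [ 0.1,  0.02, -0.7, 1.0 ],
])
edges = strongest_interactions(M, n_edges=3)
degree = hub_degrees(edges, n_genes=4)
for a, b, m in edges:
    print(a, b, "excitatory" if m < 0 else "inhibitory")
print("degrees:", degree)
```

The sign convention in the printout follows the talk: negative entries are excitatory and positive entries are inhibitory.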
Nutrient signaling hubs: coordinating the nucleus and mitochondria
HFD1: mitochondrial membrane protein; affects spindle pole body organization
FPR1: part of TOR nutrient signaling pathway
BMH1: regulates retrograde signaling from mitochondrion to nucleus; intersection between carbon and nitrogen nutrient sensing
RPP1A: ribosomal stalk protein; under TOR regulation
CMD1: calmodulin; involved in organization of actin cytoskeleton, endocytosis and nuclear division
ARC15: involved in assembly of actin-based cellular structures; required for mitochondrial motility during mitosis
UTH1: cell wall and mitochondrial outer membrane protein, involved in mitochondrial biogenesis and rapamycin resistance
[Diagram of the hubs. Nodes: PKA, TOR, FPR1; Bmh1 (retrograde signaling, or back signalling); Rpp1A (ribosome biogenesis); Uth1 (mitochondrial biogenesis); Arc15 (mitochondrial motility); Hfd1 (mitochondrial membrane); Cmd1 (cytoskeletal dynamics). Other labels: nucleus, mitochondrion, glucose regulation, translation, transcription, autophagy (in lysosomes), cell division. Links marked excitatory or inhibitory. Categories: mitochondrial function and biogenesis; amino acid and protein synthesis; cell division, DNA synthesis, cytoskeleton.]
The pairwise interaction network: 1008 genes
Reveals more hubs, e.g. SNO1
SNO1 encodes a subunit of a glutaminase required for pyridoxine biosynthesis, and its transcription is nutrient-regulated through the TOR pathway
The previously identified hubs, like BMH1, are still well connected
No HFD1 & CMD1
Cellular infrastructure hubs
Level 1: nutrient signaling
[Diagram labels: PKA, TOR, FPR1; Bmh1 (retrograde signaling); Rpp1A (ribosome biogenesis); Uth1; Arc15 (mitochondrial motility); Cmd1 (cytoskeletal dynamics); Hfd1; nucleus, mitochondrion, glucose regulation; translation, transcription, autophagy, cell division]
Level 2: cellular infrastructure
Rim101: cell wall, pH regulation
Pol30: DNA replication & repair, chromatin
Pet18: mitochondrial maintenance
Sur1: sphingolipid biosynthesis, Ca++ homeostasis
Rpb8: RNA synthesis, RNA polymerase II
Sno1: pyridoxine biosynthesis, enzyme cofactors
Pbp4: RNA synthesis, polyadenylation
110 weakest interactions
Disjoint, like for a random network
The contribution of two- and three-gene interactions
[Figure labels: pairs, triplets, all genes]
3-body couplings obtained perturbatively
Small strengths of the 3-body couplings
110 strongest interactions inferred from the top 693 genes in the long-period dataset (5 h)
Overlaps between the major gene categories
Long: 130 genes; short: 102
Different networks (but correlations ~ the same)
The hub common in both networks: Rpp1A (ribosomal)
Conclusions
A method based on the principle of entropy maximization identifies the gene interaction network with the highest probability of giving rise to experimentally observed transcript profiles.
Analysis of microarray data from genes in Saccharomyces cerevisiae identifies a gene interaction network that reflects the intracellular communication pathways that adjust cellular metabolic activity and cell division to the limiting nutrient conditions that trigger metabolic oscillations.
The method extracts meaningful genetic connections and hubs in the network.