
Genomic Firsts

  • Upload
    trevet


Page 1: Genomic Firsts

Genomic Firsts

1976: RNA virus -- Phage MS2 (3 kbp)

1977: DNA virus -- Phage Φ-X174 (6 kbp)

1995: Bacteria -- Haemophilus influenzae (1.8 Mbp)

1996: Eukarya -- Saccharomyces cerevisiae (12 Mbp)

1996: Archaea -- Methanococcus jannaschii (1.6 Mbp)

2000: draft human genome -- J. Craig Venter (3 Gbp)

Page 2: Genomic Firsts

Genome Sequencing Explosion

Page 3: Genomic Firsts

Genome Sequencing Explosion

Page 4: Genomic Firsts

Three domains of life

16S rRNA sequences

Woese 1987

Page 5: Genomic Firsts

Global phylogeny of 191 organisms derived from 31 conserved protein genes.

Tree is fairly well resolved and agrees mostly with rRNA tree.

Ciccarelli et al (2006) Science

Page 6: Genomic Firsts

~1000 bp/gene

short intergenic regions

Genomic streamlining in prokaryotes

Proteobacteria (from Higgs & Attwood)

Page 7: Genomic Firsts

Hou and Lin – PLoS ONE 2009

Efficiency in the Genome

Small organisms care about DNA replication time.

No wasted space.

High coding density (85-90%).

1 gene per 1000 bases in prokaryotes

Haemophilus influenzae: 1762 genes in 1.8 Mb

H: 23000 genes in 3080 Mb
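The contrast between the two genomes is easier to see as a worked calculation. The sketch below uses only the gene counts and genome sizes quoted on the slide; the function names are illustrative.

```python
# Gene counts and genome sizes as quoted on the slide.
genomes = {
    "Haemophilus influenzae": {"genes": 1762, "size_bp": 1_800_000},
    "Human": {"genes": 23_000, "size_bp": 3_080_000_000},
}


def genes_per_kb(genes: int, size_bp: int) -> float:
    """Number of genes per 1000 bases of genome."""
    return genes / (size_bp / 1000)


def bases_per_gene(genes: int, size_bp: int) -> float:
    """Average stretch of genome available per gene."""
    return size_bp / genes


for name, g in genomes.items():
    density = genes_per_kb(g["genes"], g["size_bp"])
    spacing = bases_per_gene(g["genes"], g["size_bp"])
    print(f"{name}: {density:.4f} genes/kb, {spacing:,.0f} bases per gene")
```

H. influenzae comes out at roughly one gene per kilobase, while the human figure is more than a hundred kilobases per gene.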

Eukaryotic genomes have lots of transposons and repetitive sequences.

The larger organelle genomes also have a greater fraction of non-coding sequence, but small animal mitochondria fit the trend of the bacteria.

Page 8: Genomic Firsts

Large variation in genome size between bacteria

Sorangium cellulosum (14000 kb)

11599 coding sequences

Soil bacterium

Tremblaya princeps (140 kb)

121 coding sequences

Endosymbiont in insect cells

Page 9: Genomic Firsts

McCutcheon and Moran, Nature Reviews (2012)

Page 10: Genomic Firsts

McCutcheon and Moran, Nature Reviews (2012)

Reduced size genomes evolve independently in different lineages. Usually on long branches = fast sequence evolution.

Page 11: Genomic Firsts

Subdivisions of proteobacteria identified using 16S rRNA originally

α-proteobacteria

- Agrobacterium tumefaciens – genetic engineering

- Rickettsia conorii – ticks – spotted fever

- Rickettsia prowazekii – lice – typhus

β-proteobacteria

- Neisseria meningitidis

- N. gonorrhoeae

γ-proteobacteria

- Escherichia coli – commensal – lab study

- Yersinia pestis – plague

- Haemophilus influenzae – respiratory pathogen (first bacterial genome)

- Xanthomonas / Xylella – plant pathogens

ε-proteobacteria

- Helicobacter pylori – gastric infections

Considerable change in GC content among related genomes.

Short genomes are derived from longer genomes – lots of deletions in cases of intracellular parasites and endosymbionts.

Page 12: Genomic Firsts

Pathogens and intracellular bacteria have low GC content. This may be a result of the metabolic cost of synthesis of G and C being higher (Rocha and Danchin, 2002).

These genomes are also small – use it or lose it! This may explain the correlation of GC content with genome size.

It has also been argued that there is a general mutation bias towards AT, and that selection for GC keeps this from going to very low GC in most organisms. This stabilizing selection might be weaker in smaller intracellular organisms. Therefore smaller genomes have more AT.

...However, two extremely small genomes break the trend. Maybe these have a mutation bias in the other direction (towards GC) – this is not yet measured.

Page 13: Genomic Firsts

Circular representation of the R. conorii genome (strain Malish 7). The outermost circle indicates the nucleotide positions. The second and third circles locate the ORFs on the plus and minus strands, respectively. Function categories are color-coded [see Web fig. 1 (10)]. The fourth and fifth circles locate tRNAs. The locations of three rRNAs are indicated by black arrows. The sixth and seventh circles indicate the locations of repeats. The eighth circle shows the G-C skew, (G-C)/(G+C), with a window size of 10 kb. The region locally breaking the genome colinearity with R. prowazekii is indicated by a shaded sector. The four major genomic segments involved in this rearrangement are colored in blue, yellow, green, and red. Ogata et al – Science (2001)
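The G-C skew in the eighth circle can be computed directly from the sequence. The following is a minimal sketch, assuming consecutive non-overlapping windows (the caption gives the 10 kb window size but not how the windows were stepped) and a skew of 0 for windows with no G or C. The function name and the toy sequence are illustrative.

```python
def gc_skew(seq: str, window: int = 10_000) -> list[float]:
    """G-C skew, (G - C) / (G + C), for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # A window with no G or C has an undefined skew; report 0.0 (assumption).
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


# Toy sequence: a G-rich half followed by a C-rich half.
toy = "GGGCAT" * 5 + "CCCGAT" * 5
print(gc_skew(toy, window=30))
```

On a real bacterial chromosome the sign of the skew typically switches at the origin and terminus of replication, which is why it is plotted around the circle.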

Page 14: Genomic Firsts

Illustration of the colinearity. Three distinct segments from the R. conorii genome aligned with the homologous segments from the R. prowazekii genome are shown. These segments were chosen to show three types of gene alteration: split genes in R. prowazekii (top), a split gene in R. conorii (middle), and a gene remnant in R. prowazekii (bottom).

Page 15: Genomic Firsts

Comparison of genomes of related organisms shows synteny – but relatively rapid evolution of gene order

Mycoplasma genitalium and M. pneumoniae

Each dot shows a high-scoring BLAST match between a gene of one species and a gene of the other species

Page 16: Genomic Firsts

Gene gain via Horizontal Gene Transfer (mostly prokaryotes)

Page 17: Genomic Firsts

Gene gain via Gene Duplication (mostly eukaryotes)

Page 18: Genomic Firsts

Genomic streamlining in symbionts and pathogens

McCutcheon & Moran (2012)

Page 19: Genomic Firsts

Free-living bacteria

• Selection to maintain a reasonably large set of functional genes.

• Gene acquisition balances gene loss.

• HGT mediated by viruses and plasmids → gain of functions.

• Some cells are competent for DNA uptake (transformation).

• Homologous recombination can eliminate some deleterious mutations.

Host-restricted parasites and endosymbionts

• Fewer essential genes because of the environment provided by the host.

• Smaller effective population size (bottlenecks).

• Reduced selection against slightly deleterious mutations and reduced opportunity for homologous recombination → faster sequence evolution, reduced functionality and stability of proteins (need for a high level of chaperones).

• Reduced selection against the deletion of slightly beneficial genes, an inherent bias toward deletions, and reduced opportunity to acquire genes horizontally → gene loss much faster than gene gain.

Page 20: Genomic Firsts

Balance between selection and mutation in a large population

Fitness $w = (1-s)^k$

$n_k$ = number of individuals with k deleterious mutations

N = total population size

U = number of deleterious mutations per genome per generation

Assume no advantageous mutations. Back-mutations are very rare.

For a very large population, selection balances mutation.

There is a stationary state:

$$\frac{n_k}{N} = \frac{(U/s)^k}{k!} \exp(-U/s)$$
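The stationary state is a Poisson distribution with mean U/s, so the size of each mutation class is easy to tabulate. The sketch below does that for illustrative parameter values (N, U and s here are assumptions chosen for the example, not values from the slides).

```python
import math


def stationary_classes(N: float, U: float, s: float, k_max: int) -> list[float]:
    """Expected number of individuals n_k with k deleterious mutations, k = 0..k_max.

    Implements n_k = N * (U/s)^k / k! * exp(-U/s).
    """
    mean = U / s
    return [
        N * mean**k / math.factorial(k) * math.exp(-mean)
        for k in range(k_max + 1)
    ]


# Illustrative values (assumptions): large population, U/s = 10.
n = stationary_classes(N=1_000_000, U=0.1, s=0.01, k_max=40)
print(f"fittest class n_0 = {n[0]:.1f}")
print(f"most common class k = {max(range(len(n)), key=n.__getitem__)}")
```

Even with a million individuals, the mutation-free class holds only a few dozen of them when U/s = 10, because n_0 = N exp(-U/s).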

Page 21: Genomic Firsts

Muller’s Ratchet – Accumulation of deleterious mutations in asexual species with small populations

If N is fairly small, then the number of individuals in the fittest class, n0, can be very small.

This fluctuates, and eventually goes to zero. If there are no back-mutations, the fittest class is gone forever. This is one click of the ratchet.

fitness

More and more deleterious mutations with time until “mutational meltdown” kills the species
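The ratchet can be watched in a small simulation. What follows is a minimal sketch, not the model behind the slide's figure: it assumes a Wright-Fisher population of fixed size N, fitness (1-s)^k, a Poisson number of new mutations per offspring with mean U, and no back-mutation or recombination. All parameter values and function names are illustrative.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_ratchet(N: int, U: float, s: float, generations: int, seed: int = 0) -> list[int]:
    """Return, per generation, the mutation count of the least-loaded class."""
    rng = random.Random(seed)
    pop = [0] * N  # number of deleterious mutations carried by each individual
    least_loaded = []
    for _ in range(generations):
        # Parents are sampled in proportion to fitness w = (1 - s)^k.
        weights = [(1 - s) ** k for k in pop]
        parents = rng.choices(pop, weights=weights, k=N)
        # Offspring inherit the parental load plus new mutations; none are removed.
        pop = [k + poisson(rng, U) for k in parents]
        least_loaded.append(min(pop))
    return least_loaded


history = simulate_ratchet(N=50, U=0.5, s=0.05, generations=200)
print("mutations in the fittest class after 200 generations:", history[-1])
```

Because offspring can never carry fewer mutations than their parent, the minimum load in `history` never decreases; each increase corresponds to the loss of the current fittest class.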

Page 22: Genomic Firsts

Muller’s Ratchet is stopped by recombination

Initial population.

After one click of the ratchet, every chromosome has at least one deleterious mutation, but they don’t all have the same one.

mutation

recombination

Cross-over can recreate the fittest class. This is much more likely than back-mutation in sexual species.

Page 23: Genomic Firsts

Muller’s Ratchet and the Evolution of Sex

• Two-fold cost of males in sexual species → there must be a big benefit of sex to outweigh this cost.

• A few parthenogenetic species are derived from sexual ancestors. These do not do well in the long term.

• The ability of recombination to stop Muller’s ratchet is one large advantage of sex, and is one possible reason for the prevalence of sexual species.

• Host-parasite co-evolution is probably another important reason.

• Maybe most free-living bacteria should be thought of as sexual, not asexual.

• Uptake of fragments of DNA from similar cells gives the possibility of homologous recombination. This functions like sex in eukaryotes. It can remove deleterious mutations.

• Uptake of DNA from distantly related organisms (Horizontal Gene Transfer) can lead to the spread of beneficial genes

• When bacteria become obligate parasites or endosymbionts, they become truly asexual.

• Consequences are gene loss and accumulation of deleterious mutations.

Page 24: Genomic Firsts

Global phylogeny of 191 organisms derived from 31 conserved protein genes.

Tree is fairly well resolved and agrees mostly with rRNA tree.

Ciccarelli et al (2006) Science

Page 25: Genomic Firsts

Do prokaryotic taxa mean anything?

γ-Proteobacteria?

Enterobacteriaceae?

E. coli?

Need to consider Eukaryotes separately for 2 reasons.

(i) Almost everyone believes there is a tree for Eukaryotes.

(ii) Origin of Eukaryotes is a later unique event that is very likely not tree-like.

Page 26: Genomic Firsts

Criticisms of the Prokaryotic Tree of Life (Bapteste et al. 2009)

“Belief in the universal tree of life is stronger than the evidence from genomes that supports it.”

1. Circularity of tree methods – Phylogenetic methods always produce a tree of some kind.

2. Statistical problems – weak signals from many individual genes. Failure to reject the consensus tree is not necessarily support for it.

3. Systematic biases in phylogenetic methods.

4. Large-scale exclusion of conflicting data. Core genes not necessarily representative of a species tree.

5. Closely related species may exchange genes more frequently.

6. Unrelated species in similar niches may exchange genes more frequently. Convergent evolution?

This is an interesting paper but take it with a pinch of salt

Page 27: Genomic Firsts

Spectrum of Opinions

1. The tree of rRNA and translational genes is the species tree. Other genes appear to give different trees just because of noise and phylogenetic errors. HGT is unimportant.

2. The tree of rRNA and translational genes is the best information we have about the tree of cell divisions and speciations. Most genes follow this tree most of the time, even if most genes may have been horizontally transferred at some point in their history.

3. The tree of rRNA and translational genes tells us only about the history of these genes, and is therefore not particularly important. There are other essential groups of genes that follow other evolutionary paths. We need a network representation, not a single tree.

4. HGT is so frequent that all genes follow different histories. Therefore tree-building is a waste of time. We only get results that look like trees because our methods are designed to produce trees.

Page 28: Genomic Firsts

Gene Content Variation among E. coli genomes. Evidence for horizontal transfer –

Welch et al (2002).

Core genome = intersection of sets

Pangenome = union of sets
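These two definitions map directly onto set operations. A minimal sketch with three hypothetical strains and made-up gene family identifiers:

```python
# Hypothetical gene content of three strains (gene family identifiers are made up).
genomes = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g6"},
}

# Core genome: gene families present in every genome.
core = set.intersection(*genomes.values())

# Pangenome: gene families present in at least one genome.
pan = set.union(*genomes.values())

print("core:", sorted(core))
print("pangenome size:", len(pan))
```

In practice the hard part is deciding which genes in different strains count as "the same" gene family; the set arithmetic itself is this simple.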

Page 29: Genomic Firsts

Core genome

Pan-genome

Rasko et al (2008) J. Bacteriol.

Core and Pan-genome of E. coli

Page 30: Genomic Firsts

Rapid Gain and Loss of genes among closely related genomes of Bacillus

Hao and Golding (2006) Genome Research

• Assumes a tree to begin with (many conserved genes)

• Only two of the patterns shown require more than one character change

• Does not distinguish HGT from innovation

Page 31: Genomic Firsts

Gao and Gupta (2007) BMC Genomics

Tree of Archaea based on signature genes

• Signature genes are those that are shared by all members of a group and are not possessed by any other species.

• Can the tree be constructed from gene content alone?

• Does not show events that do not fit the hierarchical tree.

• What about transfers within niches? Groups of genes confer metabolic activity.

Page 32: Genomic Firsts

Phylogeny of three domains of life based on shared gene content

SHOT – Korbel et al (2002)

S = fraction of genes that are orthologues between two species

d = -ln S

Input d to NJ method

Major domains and groups of bacteria are obtained the same as for rRNA.

Does not work for very reduced genomes of parasites & symbionts.
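The distance d = -ln S can be sketched for genomes represented as sets of orthologous group identifiers. The slide does not say how the fraction S is normalized when genomes differ in size; the sketch below assumes division by the size of the smaller genome, which is one common choice and not necessarily the one used by SHOT. The genome names and identifiers are made up.

```python
import math
from itertools import combinations


def gene_content_distance(a: set, b: set) -> float:
    """d = -ln(S), with S = shared orthologues / size of the smaller genome (assumed)."""
    shared = len(a & b)
    if shared == 0:
        return math.inf  # no shared genes: distance is undefined, treated as infinite
    return -math.log(shared / min(len(a), len(b)))


# Hypothetical genomes as sets of orthologous group identifiers.
genomes = {
    "sp1": {1, 2, 3, 4},
    "sp2": {1, 2, 3, 5},
    "sp3": {1, 2, 6, 7},
}

# Pairwise distance matrix entries, ready to pass to a neighbour-joining program.
distances = {
    (x, y): gene_content_distance(genomes[x], genomes[y])
    for x, y in combinations(genomes, 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

The failure for reduced genomes is visible in this formulation: a parasite that has lost most of its genes shares few genes with anything, whatever its true relatives are.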

Page 33: Genomic Firsts

Always possible to explain a presence/absence pattern by either multiple deletions or by horizontal transfer.

Examples from Dagan et al (2007)

(a) Loss only, (b) Single origin, (c) Origin + 1 HGT, (d) Origin + 2 HGTs

The problem is, we don’t know the ratio of HGT to deletions….

Page 34: Genomic Firsts

If HGT is disallowed or penalized too much, then ancestral genomes must have been far larger than any current genomes.

If HGT is too frequent then ancestral genomes are apparently too small.

This helps to find a moderate value for the ratio of HGT to deletions.

Reconstructing ancestral genomes using parsimony (Dagan et al 2007)

Page 35: Genomic Firsts

Collect genomes from NCBI

All-vs-All BLASTP

Single-linkage clustering

Global amino acid alignment

Phylogenetic reconstruction using Maximum Likelihood

Identification of universal single-copy clusters

Concatenation of alignments

Method of Collins & Higgs (2012)
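Two of the steps in this pipeline, single-linkage clustering of BLAST hits and identification of universal single-copy clusters, can be sketched in a few lines. This is an illustration under stated assumptions, not the code of Collins & Higgs: hits are assumed to be already filtered to significant pairs, genes are named `genome|gene` (a made-up convention), and clustering uses a simple union-find structure.

```python
def single_linkage(hits):
    """Group genes into clusters: any two genes joined by a chain of hits share a cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for gene in list(parent):
        clusters.setdefault(find(gene), set()).add(gene)
    return list(clusters.values())


def is_universal_single_copy(cluster, genome_names):
    """True if the cluster holds exactly one gene from every genome."""
    owners = sorted(gene.split("|")[0] for gene in cluster)
    return owners == sorted(genome_names)


# Hypothetical significant BLASTP hits between genes of genomes A, B and C.
hits = [
    ("A|g1", "B|g7"), ("B|g7", "C|g3"),                    # one gene in each genome
    ("A|g2", "B|g9"),                                      # missing from C
    ("A|g5", "A|g6"), ("A|g6", "B|g4"), ("B|g4", "C|g8"),  # duplicated in A
]
clusters = single_linkage(hits)
universal = [c for c in clusters if is_universal_single_copy(c, ["A", "B", "C"])]
print(len(clusters), "clusters,", len(universal), "universal single-copy")
```

Single linkage is deliberately permissive: one spurious hit is enough to merge two families, which is one reason the later steps restrict the tree to single-copy clusters.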

Page 36: Genomic Firsts

Core and Pangenomes

Closed – means that pangenome size tends to a maximum as number of genomes increases

Open – means that pangenome keeps increasing as you add new genomes

Fitting the data suggests that the pangenome is open for most groups of bacteria and that $G_{pan}(n)$ increases in proportion to ln(n).

This is expected on a tree like a coalescent (a). On a star tree (b), it would increase linearly with n.
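Whether a pangenome looks open or closed is judged from how its size grows as genomes are added. A minimal sketch of that accumulation curve, averaging over every subset of n genomes (feasible only for a handful of genomes; real analyses sample random orderings). The gene content is made up and the function name is illustrative.

```python
from itertools import combinations
from statistics import mean


def pangenome_curve(genomes):
    """Mean pangenome size for n = 1..len(genomes), averaged over all subsets of size n."""
    sets = list(genomes)
    curve = []
    for n in range(1, len(sets) + 1):
        sizes = [len(set().union(*combo)) for combo in combinations(sets, n)]
        curve.append(mean(sizes))
    return curve


# Hypothetical gene content of three genomes.
toy_genomes = [
    {"g1", "g2", "g3", "g4"},
    {"g1", "g2", "g3", "g5"},
    {"g1", "g2", "g6"},
]
curve = pangenome_curve(toy_genomes)
print(curve)
```

A curve that keeps rising, for example roughly in proportion to ln(n), indicates an open pangenome; one that flattens to a maximum indicates a closed one.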

Page 37: Genomic Firsts

9 Prochlorococcus genomes – Baumdicker et al (2009)

293 Bacterial genomes – Lapierre and Gogarten (2009)

Gene Frequency Spectra

G(k) is the number of genes found in k genomes from a group of n.

There is a U-shape: many genes found in only 1 or 2 genomes, a certain number of core genes in (almost) all n, and fewer genes in between.

The U-shape applies at all scales from species to the full bacterial domain.
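The spectrum G(k) is a simple count once each genome is represented as a set of gene families. A minimal sketch with four hypothetical genomes (all gene identifiers are made up):

```python
from collections import Counter


def gene_frequency_spectrum(genomes):
    """Return [G(1), ..., G(n)], where G(k) is the number of genes found in exactly k genomes."""
    occurrences = Counter(gene for genome in genomes for gene in genome)
    genes_per_k = Counter(occurrences.values())
    return [genes_per_k.get(k, 0) for k in range(1, len(genomes) + 1)]


# Hypothetical genomes: three core genes, one shared by two genomes, the rest unique.
toy_genomes = [
    {"c1", "c2", "c3", "a1", "a2", "x"},
    {"c1", "c2", "c3", "b1", "x"},
    {"c1", "c2", "c3", "e1"},
    {"c1", "c2", "c3", "d1", "d2"},
]
spectrum = gene_frequency_spectrum(toy_genomes)
print(spectrum)
```

The toy data give many genes at k = 1, a core peak at k = n and little in between, which is the U-shape described above.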

Collins and Higgs (2012)

Page 38: Genomic Firsts

Core, Shell and Cloud genes (Koonin and Wolf – 2012)

Page 39: Genomic Firsts

Collins and Higgs (2011)

The role of gene duplication: Gene family size distributions

Page 40: Genomic Firsts

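[Figure residue: state diagram with labels 0, 1, 2, 3, 4, etc.; the rate labels on the transitions were lost in extraction]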

Modelling duplication and deletion of genes

Page 41: Genomic Firsts

Origin of Mitochondria

Sequence similarity to Rickettsia – within α-proteobacteria

Also conserved gene order between Rickettsia and the mitochondrial genome of the protist Reclinomonas (one of the largest mitochondrial genomes).

Page 42: Genomic Firsts

Gene order and phylogeny for Hodgkinia (very small endosymbiont – see assignment 3)

Shows it has evolved independently of the lineage leading to Rickettsia and mitochondria

Derived change in Rickettsia not shared with Hodgkinia

Hodgkinia placed within Rhizobiales – raises questions of GC content bias and long branch attraction

Page 43: Genomic Firsts

Long Branch Attraction - An artefact of phylogenetic methods that tends to put unrelated species with rapid evolution together.

It can also draw long branch species closer to the root, because they are attracted to the outgroup.

Rooting the tree of life using ancient gene duplications

Page 44: Genomic Firsts

Long Branch attraction and the tree of rRNA (Gribaldo and Philippe 2002)

Typical tree in older papers shows many lineages on long branches close to the roots of Bacteria and Eukarya

Were ancestral organisms hyperthermophiles?

Are there any eukaryotes that never had mitochondria?

Root is usually inferred from ancient gene duplications – eg EFTu and EFG

Page 45: Genomic Firsts

After correcting for long branch attraction...

Microsporidia are now related to fungi. They have small genomes with lots of gene loss and rapid sequence evolution.

Current thought says there may never have been eukaryotes without mitochondria. Eukaryotes evolved by fusion of a proteobacterium with an archaeon. The event that created the mitochondria also created the nucleus.

Phylogeny of major bacterial groups is still uncertain. Deduction of temperature at base of tree is difficult. Most papers still argue for hyperthermophiles at common ancestor of archaea and bacteria.

Root is still most likely here, although this paper questions it.

Seems strange! This would make prokaryotes monophyletic after all

Page 46: Genomic Firsts

Growth temperature mapped onto the rRNA tree

Or was there a mesophilic origin after all?

Page 47: Genomic Firsts

TA Williams, et al. Nature 504, 231-236 (2013) doi:10.1038/nature12779

Competing hypotheses for the origin of the eukaryotic host cell.

Standard picture:

The root is on the bacterial branch

There is a common ancestor of archaea and eukaryotes

Eocyte hypothesis:

The root is (still) on the bacterial branch.

Eukaryotes fall within the archaea. They have a common ancestor with Eocytes/Crenarchaeota.

Only Two Domains!

Page 48: Genomic Firsts

Maybe Giant Viruses are a Fourth Domain?

RNA polymerase sequences from the Global Ocean Sampling (GOS) expedition

Wu et al. (2011)