Comparative Microbial Genomics group, Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark. Dave Ussery. 4th annual workshop on Comparative Microbial Genomics and Taxonomy, Petrópolis, Brazil. Lecture #2, Tuesday, 4 August, 2009: 16S rRNA trees & BLAST matrices



  • Universal phylogenetic tree (rRNA tree) showing the relationships among Bacteria (e.g., most bacteria and blue-green algae), Archaea (e.g., methanogens and halophiles) and Eucarya (e.g., protists, plants, animals, and fungi).

    Labels highlighted on the tree: corn, humans, vibrios.

  • From: [email protected]

    Subject: Re: Bachelor project

    Date: 4 August 2009 07:33:18 GMT-03:00

    To: [email protected]

    Cc: [email protected], [email protected]

    Danish title: Comparative microbial genomics for Staphylococcus aureus

    English title: Comparative microbial genomics for Staphylococcus aureus

    ECTS points: 20

    Start date: 31 August 2009

    End date: December 2009

    Language of description: English

    Supervisor(s): David W. Ussery

    Supervisor (external): Henrik Westh

    Student (student number): s072486

    Collaborating institutions: Department of Clinical Microbiology, Hvidovre Hospital

    Form of collaboration: Bachelor project

  • www.cbs.dtu.dk/services/GenomeAtlas/ 868 projects found

    as of 4 August, 2009

    Genera highlighted in the project list: Vibrio (5 entries), Clostridium (4 entries), Bacillus (1 entry)

  • Figure with panels a, b and c (image not reproduced).

    Panel a: percentage of genome for selected COG categories: Transcription; Replication, recombination and repair; Signal transduction mechanisms; Cell wall or cell membrane biogenesis; Cell motility; Post-translational modification and protein turnover; Energy production and conversion. Legend: V. cholerae, V. fischeri, V. parahaemolyticus, V. vulnificus YJ016, P. profundum, Aquatic, Host-associated, Terrestrial.

    Panels b and c: number of genomes (y-axis) against number of 16S rRNAs and number of tRNAs (x-axis).

    The toxin, pilin and chitin connection

    The two main virulence factors of V. cholerae, cholera toxin (CT), which is encoded on the filamentous cholera toxin phage (CTX) and causes a profuse watery diarrhoea, and the toxin co-regulated pilus (TCP), which is an essential intestinal colonization factor and the host receptor for CT, were both acquired by a subset of isolates by horizontal gene transfer (HGT) [44,45]. In the aquatic environment, V. cholerae has been found in association with zooplankton and phytoplankton, on the chitinous exoskeletons and moults of copepods (crustaceans) and in the mucilaginous sheaths of blue-green algae [46-49]. The mannose-sensitive haemagglutinin (MSHA), which is encoded by all the sequenced Vibrionaceae genomes, is involved in V. cholerae adherence to zooplankton [50]. Recent studies have described a role for a chitin-binding protein, ORF VCA0811, from V. cholerae in attachment to both the chitinous exoskeleton of zooplankton and human epithelial cells, by binding to the sugars present on both surfaces [51,52]. This protein is found in the genomes of all sequenced Vibrionaceae and might also be involved in the environmental persistence of these bacteria.

    V. fischeri exists naturally either in a free-living planktonic state or as a symbiont of the luminescent bobtail squid, E. scolopes [53]. V. fischeri provides the source of luminescence, which the squid can use for camouflage, protection against predators, and which might also be involved in mating [8]. The bacteria only luminesce in this symbiotic state in the light organ and not when they are free living. The interaction of V. fischeri with the light organ of squid involves several genes, including mot, fla, flr and rpoN [8]. Colonization is aided by the secretion of a mucoid substance above the pores of the light organ, which traps the V. fischeri, and MSHA is required for subsequent colonization [54,55].

    Although, at first glance, the human pathogen V. cholerae and the squid symbiont V. fischeri have markedly different lifestyles, there are suggestive similarities between both species. In particular, the abundance of chitinase genes in their genomes reflects the high degradative ability attributed to vibrios [56]. Chitin is a biopolymer of N-acetylglucosamine and, after cellulose, is the most abundant carbohydrate polymer; it is present in large quantities in the sea, being a constituent of the exoskeletons of crustaceans and zooplankton. Chitin induces TCP production, which, along with MSHA, allows V. cholerae to colonize the exoskeletons of crustaceans and zooplankton [57,58]. V. cholerae, in addition to V. fischeri, can use chitin as a carbon and nitrogen source. Homologues of TCP were identified in the V. fischeri genome sequence [5]. Chitin might therefore have a similar role in V. fischeri, stimulating TCP-mediated biofilm formation or perhaps even facilitating colonization of the squid.

    The diversity of TCP-mediated bacteria-host interactions in the aquatic niche has led to speculation on the role of CT in the environment [59,60]. CT might function as an osmoregulator for crustaceans by the removal of salts from the cell by its intrinsic function. Briefly, CT increases Cl- secretion and reduces Na+ absorption, which could be advantageous to the crustacean as it moves into environments of increasing salinity [60]. As V. cholerae is associated with copepods, V. cholerae might establish a symbiotic relationship with these crustaceans, obtaining a suitable place (chitinous exoskeleton) to attach and feed, and providing the host with a powerful osmoregulator (CT). Homologues of the CTX prophage were identified in the sequenced genome of V. fischeri, however this prophage did not carry the ctxAB genes [5]. CTX prophage lacking the ctxAB genes were also observed in V. cholerae, indicating the presence of a precursor CTX phage that acquired the CT genes from a still-unknown source [61].

    Another pathogenicity factor in V. cholerae is neuraminidase, which is encoded by the nanH gene on Vibrio pathogenicity island-2 (VPI-2) [62]. Neuraminidase cleaves mucin from intestinal cells, unmasking GM1 gangliosides, the receptors for CT, and releasing sialic acid, a carbon source [63,64].

    Figure caption (figure number and title not recoverable): a | Distribution of clusters of orthologous groups (COGs) among the [...] and terrestrial species. The data available at NCBI were used to generate a distribution chart in which the gene content for any particular function can be compared across the genus and in the context of the Proteobacteria and bacteria as a whole. b, c | Distribution of rRNA and tRNA genes among the sequenced bacterial genomes. The sequenced vibrios are indicated in yellow. The y-axis represents the number of sequenced genomes containing a particular number of rRNA or tRNA genes (x-axis).

    Vibrios Vibrios

    NATURE REVIEWS | MICROBIOLOGY, 4:697-704, (2006).

    The genomic code: inferring Vibrionaceae niche specialization

  • BASE ATLAS: C. tetani E88, 2,799,251 bp (Center for Biological Sequence Analysis, http://www.cbs.dtu.dk/)

    Circular map (image not reproduced), 0M to 2.5M, with the origin and the rRNA operons labeled.

    Lanes and colour-scale ranges: G content (fix avg) 0.00 to 0.23; A content (fix avg) 0.00 to 0.45; T content (fix avg) 0.00 to 0.45; C content (fix avg) 0.00 to 0.23; Annotations (CDS +, CDS -, rRNA, tRNA); AT skew (fix avg) -0.12 to 0.12; GC skew (fix avg) -0.10 to 0.10; Percent AT (dev avg) 0.67 to 0.75.

    Resolution: 1120
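The GC skew and AT skew lanes of the atlas can be illustrated with a short calculation. The sketch below uses the common definitions, (G - C) / (G + C) and (A - T) / (A + T), over non-overlapping windows. The window size, the function names `skew` and `sliding_skew`, and the toy sequence are assumptions made for illustration; the slide does not state how the atlas windows or averages its values.

```python
from typing import List, Tuple


def skew(window: str, plus: str, minus: str) -> float:
    """Return (plus - minus) / (plus + minus) for one window; 0.0 if neither base occurs."""
    w = window.upper()
    p, m = w.count(plus), w.count(minus)
    return (p - m) / (p + m) if (p + m) else 0.0


def sliding_skew(seq: str, window: int, step: int) -> List[Tuple[int, float, float]]:
    """Return (start, GC skew, AT skew) for each full window along seq."""
    out = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        out.append((start, skew(w, "G", "C"), skew(w, "A", "T")))
    return out


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the C. tetani genome.
    toy = "GGGGCCAT" * 2 + "CCCCGGTA" * 2
    for start, gc, at in sliding_skew(toy, window=8, step=8):
        print(start, round(gc, 2), round(at, 2))
```

On a real chromosome the sign of the GC skew typically flips near the origin and terminus of replication, which is why the origin is a natural landmark on such maps.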

  • BASE ATLAS: P. profundum SS9, 4,085,304 bp (Center for Biological Sequence Analysis, http://www.cbs.dtu.dk/)

    Circular map (image not reproduced), 0M to 3.5M, with the rRNA operons labeled.

    Lanes and colour-scale ranges: G content (dev avg) 0.17 to 0.25; A content (dev avg) 0.26 to 0.32; T content (dev avg) 0.26 to 0.32; C content (dev avg) 0.17 to 0.25; Annotations (CDS +, CDS -, rRNA, tRNA); AT skew (fix avg) -0.03 to 0.03; GC skew (fix avg) -0.05 to 0.05; Percent AT (fix avg) 0.55 to 0.61.

    Resolution: 1635

  • GENOME ATLAS: P. profundum SS9, 4,085,304 bp (Center for Biological Sequence Analysis, http://www.cbs.dtu.dk/)

    Circular map (image not reproduced), 0M to 3.5M, with the rRNA operons labeled.

    Lanes and colour-scale ranges: Intrinsic curvature (dev avg) 0.18 to 0.23; Stacking energy (dev avg) -8.05 to -7.36; Position preference (dev avg) 0.14 to 0.16; Annotations (CDS +, CDS -, rRNA, tRNA); Global direct repeats (fix avg) 5.00 to 7.50; Global inverted repeats (fix avg) 5.00 to 7.50; GC skew (fix avg) -0.05 to 0.05; Percent AT (fix avg) 0.55 to 0.61.

    Resolution: 1635

  • DNA, and comparative genome analysis revealed preferential insertion of novel DNA at three tRNA sites among the Vibrionaceae: tRNA-Ser, tRNA-Met and tmRNA, with tmRNA being a generally popular insertion site for novel DNA in bacteria [11-13].

    Two is better than one

    An interesting although not unique feature in the genetic organization of the Vibrionaceae is the presence of two chromosomes of unequal size [14,15] (TABLE 1). This feature is also found in other Proteobacteria, with another example from the γ-Proteobacteria being Pseudoalteromonas haloplanktis. In the Vibrionaceae, the larger chromosome, chromosome 1, generally carries most of the essential genes whereas the smaller chromosome, chromosome 2, carries more species-specific genes. Many studies have focused on this distribution of functional genes between the large and small chromosomes [2,4,16,17].

    The presence of genes encoding proteins involved in several essential metabolic and regulatory pathways on chromosome 2 demonstrates that this smaller chromosome is essential for growth and viability [2-4]. Indeed, it has been proposed that a two-chromosome genome structure is advantageous to vibrios under the specific environmental conditions that they encounter in their life cycles, and might contribute to the environmental diversity in the genus [14,18]. It has been proposed that chromosome 2 might have been acquired as a megaplasmid, probably before the diversification of the Vibrionaceae [2,19].

    The small chromosome has probably evolved an as-yet-undefined specialized function leading to selective pressure to maintain independent replication [2,14,19]. The presence of two replicons might also be an important factor in the ability to replicate rapidly and therefore add to the evolutionary fitness of these aquatic bacteria. Although the origins of replication (ori) of both chromosomes are different, they share common initiation factors and initiate replication synchronously [19,20].

    Our analysis of the Vibrionaceae using the Artemis Comparison Tool (ACT) revealed that the evolution of their genome structure is marked by several intra- and interchromosomal rearrangements, but the overall gene content and position in chromosome 1 is better conserved than in the smaller chromosome 2 (REFS 21,22) (FIG. 3). There is some conservation of gene order and sequence homology between chromosome 2 of V. vulnificus YJ016 and V. parahaemolyticus RIMD 2210633, however, limited sequence similarity was identified between the other species examined at chromosome 2 using ACT (FIG. 3).

    Stress in the aquatic environment

    Survival in any particular ecosystem requires a microorganism to be equipped with a battery of adaptive response mechanisms to meet demands such as nutrient limitation, UV stress, temperature fluctuations, protozoan predation, viral infection and changes in salinity. Many of these responses are mediated through quorum sensing, a global phenomenon first identified in luminescent V. fischeri and Vibrio harveyi [23]. The signalling systems identified among the Vibrionaceae are divergent and respond to different environmental stimuli that are related to the interaction between a specific species and its environment or host [24]. The LuxI/R system, the paradigm for quorum sensing

    Table 1 | Genome composition among members of the family Vibrionaceae (column headers and table body not recoverable)

    NATURE REVIEWS | MICROBIOLOGY

  • 32 Vibrio genomes

    Vibrio cholerae strains: N16961, O395 TIGR, O395 TEDA, MO10, V52, MZO-2, 1587, 2740-80, AM-19226, MJ-1236, M66-2, 12129(1), VL426, TM 11079-80, B33, BX330286, TMA 21, RC9

    Other Vibrionaceae: Vibrio campbellii AND4, Vibrio harveyi ATCC BAA-1116, Vibrio fischeri ES114, Vibrio fischeri MJ11, Vibrio vulnificus CMCP6, Vibrio vulnificus YJ016, Vibrio shilonii AK1, Vibrio sp. Ex25, Vibrio sp. MED222, Vibrio splendidus LGP32, Vibrio parahaemolyticus 16, Vibrio parahaemolyticus RIMD2210633, Aliivibrio salmonicida LFI1238, Photobacterium profundum SS9

    Also listed: Pelagibacter ubique HTCC1062

  • 16S rRNA tree

    27 species, 618 sites. Maximum Likelihood (ln(L) = -1375.065), 50 bootstrap replicates. Scale bar: 0.006.

    Taxa on the tree: V. angustum S14, P. profundum SS9, V. alginolyticus 12G01, V. vulnificus CMCP6, V. vulnificus YJ016, V. parahaemolyticus RIMD 2210633, V. parahaemolyticus AQ3810, V. sp. Ex25, V. salmonidica 25contigs, V. fischeri ES114, V. splendidus 12B01, V. sp. MED222

    V. cholerae strains on the tree: O395, MZO-2, NCTC8457, V51, 623-39, 2740-80, RC385, N16961, MO10, MZO-3, V52, MAK757, B33, 1587, AM-19226
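The tree on this slide was inferred by maximum likelihood from 618 aligned sites. As a simpler stand-in for how aligned 16S sequences become evolutionary distances, the sketch below computes pairwise p-distances and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). It is not the method used for the slide; the function names and the toy alignment are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, skipping columns with a gap or N in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; only defined for p < 0.75."""
    if p >= 0.75:
        raise ValueError("p too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(aln: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Corrected distance for every unordered pair of sequences in the alignment."""
    return {(m, n): jukes_cantor(p_distance(aln[m], aln[n]))
            for m, n in combinations(aln, 2)}


if __name__ == "__main__":
    # Hypothetical toy alignment; real input would be aligned 16S rRNA genes.
    toy = {"strainA": "ACGTACGTAC", "strainB": "ACGTACGTAT", "strainC": "ACGAACGTTT"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 4))
```

A matrix like this could feed a distance-based method such as neighbour joining. Maximum likelihood, as on the slide, instead works from the alignment columns directly.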

  • RESEARCH LETTER

    The nonrandom microheterogeneity of 16S rRNA genes in Vibrio splendidus may reflect adaptation to versatile lifestyles

    Sigmund Jensen (1), Petter Frost (2) & Vigdis L. Torsvik (1)

    (1) Department of Biology, University of Bergen, Bergen, Norway; and (2) Institute of Marine Research, Bergen, Norway

    Correspondence: Sigmund Jensen, Department of Biology, University of Bergen, PO Box 7800, Jahnebakken 5, N-5020 Bergen, Norway. Tel.: +47 5558 2661; fax: +47 5558 9671; e-mail: [email protected]

    Present address: Petter Frost, Intervet Norbio As, Thormøhlens gate 55, N-5008 Bergen, Norway.

    Received 23 January 2009; accepted 4 March 2009. First published online 1 April 2009.

    DOI:10.1111/j.1574-6968.2009.01567.x

    Editor: Aharon Oren

    Keywords: Vibrio splendidus; 16S rRNA; microheterogeneity; phylogeny; lateral gene transfer (LGT).

    Abstract

    16S rRNA molecules in a microbial strain can differ due to nucleotide variation between their genes. This is a typical trait of fast-growing bacteria to cope with different niches. We investigated characteristics of 16S rRNA genes in Vibrio splendidus strain PB1-10, from the normal flora of Atlantic halibut. Sequencing of 16S rRNA gene clones detected 35 variable positions in a total of 13 different gene copies. More than two-thirds of the substitutions occurred in regions corresponding to helix H6 and helix H17 of the 16S rRNA molecule. Possible recombination between these helixes in related bacteria (Vibrio, Photobacterium, Colwellia) from similar environments impacts 16S rRNA-based phylogeny of V. splendidus. We argue that these nonrandom modifications are maintained to provide a fine-tuning of the ribosome function to optimize translation machinery performance and ultimately bacterial niche fitness.

    Introduction

    Ribosomal RNA (rRNA) provides ribosomes with shape and function and plays a central role in protein synthesis of all life. rRNA genes are also important phylogenetic markers (Woese, 1987; Amann et al., 1995). Microheterogeneity in 16S rRNA originates from duplication and mutation (Ueda et al., 1999) and is opposed by conversion and concerted evolution (Lan & Reeves, 1998; Liao, 2000). Variability causes many single-species GenBank sequences to be more different than would be expected from sequencing errors (Clayton et al., 1995). Typically, < 1% divergent 16S rRNA gene copies occur in prokaryotic cells (Coenye & Vandamme, 2003; Acinas et al., 2004). Extreme variation applies to Clostridium paradoxum and Photobacterium profundum with 15 gene copies each (Rainey et al., 1996; Vezzi et al., 2005) and to Thermoanaerobacter tengcongensis with 11.6% copy variability (Acinas et al., 2004). The advent of genome sequencing has provided direct information about intraspecies microheterogeneity. Completed Vibrio genome sequences show the following 16S rRNA gene copy number and variability: Vibrio cholerae (8, 1.0%), Vibrio splendidus (8, 1.5%), Vibrio vulnificus (9, 0.9%), Vibrio parahaemolyticus (11, 0.6%), Vibrio harveyi (11, 1.2%) and Aliivibrio fischeri (12, 1.1%) (Acinas et al., 2004; http://www.ncbi.nlm.nih.gov/).

    Vibrio bacteria are of major importance for the mineralization of organic material in the sea and for causing diseases, which lead to serious health and economical problems (Reen et al., 2006). The free-living, symbiotic and pathogenic vibrios are closely related, and their lifestyle is difficult to disclose. Over 1000 different genotypes of V. splendidus have been found as part of coastal plankton (Thompson et al., 2005b) and there is currently 117 sequenced V. splendidus 16S rRNA genes in GenBank (January 2009). Analysis of 16S rRNA gene variability revealed microheterogeneity in every Vibrio strain examined from culture collections and coastal seawater (Moreno et al.,

    FEMS Microbiol Lett 294 (2009) 207-215. © 2009 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved
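The abstract above reports "35 variable positions in a total of 13 different gene copies". One way to arrive at such a count is to scan an alignment of the gene copies column by column, as in the minimal sketch below. The excerpt does not say how the paper defines its percent variability, so the definition used here (variable columns divided by alignment length) is an assumption, and the function names and toy copies are hypothetical.

```python
from typing import List


def variable_positions(copies: List[str]) -> List[int]:
    """Return 0-based alignment columns where the copies differ (columns with gaps are skipped)."""
    if len({len(c) for c in copies}) != 1:
        raise ValueError("gene copies must be aligned to one common length")
    cols = []
    for i, column in enumerate(zip(*copies)):
        bases = {b.upper() for b in column}
        if "-" in bases:
            continue
        if len(bases) > 1:
            cols.append(i)
    return cols


def percent_variability(copies: List[str]) -> float:
    """Variable columns as a percentage of alignment length (assumed definition)."""
    return 100.0 * len(variable_positions(copies)) / len(copies[0])


if __name__ == "__main__":
    # Hypothetical aligned fragments standing in for 16S rRNA gene copies of one strain.
    toy_copies = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
    print(variable_positions(toy_copies))
    print(percent_variability(toy_copies))
```

Listing the positions, rather than only counting them, makes it easy to see whether the variation clusters in particular regions, as the paper reports for helixes H6 and H17.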


  • Comparative Microbial Genomics group Ce

    nte

    r for B

    iolo

    gic

    al S

    eq

    ue

    nc

    e a

    na

    lysis

    Departm

    ent of System

    s Biology, Technical U

    niversity of Denm

    ark

    nucleotides were distributed nonrandomly in covaryingpairs to maintain the necessary secondary structure. Theintraspecific 16S rRNA gene sequence variability inV. splendidus is c. 2%, which is high, taking into considera-tion that the most divergent Vibrio species have 16S rRNAgene distances of c. 7% (Kita-Tsukamoto et al., 1993;Moreno et al., 2002), and that 16S rRNA genes of V. lentusare only 0.8% different from those of the closest V. splendi-dus strains. H6 and H17 variability of the genus Vibrio wasfirst described by Dorsch et al. (1992), who proposed theseregions as targets for specific oligonucleotide primers andprobes. In accordance with the present study, their se-

    quences and the sequence of V. splendidus type strain (ATCC33125T) show undetermined positions. Heteroduplexscreening (Moreno et al., 2002) verified that V. splendidus,like many other vibrios, have several variable positions in the16S rRNA helixes H6 and H17. Further investigations ofH17 in V. parahaemolyticus identified four H17 variants(up to three in the same strain), indicating very high geneticrecombination of 16S rRNA genes in the genus Vibrio(Gonzalez-Escalona et al., 2005; Harth et al., 2007). Thecurrent genome draft of V. splendidus (strain 12B01; M. Polz,S. Ferriera, J. Johnson et al., unpublished data) and acompleted genome sequence of V. splendidus (strain

    Fig. 2. Variants of helixes H6 and H17 in 16S rRNA genes from Vibrio splendidus PB1-10. Each variant (I–IV) is defined by a characteristic set of nucleotides (in bold). The helixes were drawn from the secondary structure model of Escherichia coli 16S rRNA gene (Gutell et al., 1994) modified to locate all variable positions of PB1-10 (pinpointed). Double-headed arrows indicate the number of GenBank sequences exactly matching these helix variants (January 2009; BLASTN, PROBE MATCH). A shaded box indicates lateral transfer with species of Photobacterium and Colwellia. Included below is a linkage map of the helixes in their respective rrn genes. Alignments with potential LGT donors to rrnG (H6-III) and to rrnE (H17-III) are shown at the bottom, to visualize the similarity of these regions.

    FEMS Microbiol Lett 294 (2009) 207–215. © 2009 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

    Variable 16S rRNA genes in Vibrio splendidus

    LGP32; D. Mazel & F. Le Roux, unpublished data) contain 12 and eight 16S rRNA genes, respectively. Helix H6 variants I and II are present in both genomes, with the addition of H6-IV (12B01) and H6-V (LGP32), shared with 54 and one other Vibrio species, respectively (BLASTN, February 2009). Only helix H17-I is represented by the drafted 12B01 sequences. The completed LGP32 sequences, however, contain H17-I and II plus two additional variants, which we name H17-V and VI, that are shared with 49–63 other vibrios (BLASTN, February 2009). Interestingly, although sequences of these helixes vary, their conserved positions (Fig. 2) are identical. Further homology searches show that V. splendidus strain 3d harbours helix variants H6-I, II and H17-I, II, IV. The H17 variants of V. parahaemolyticus (Gonzalez-Escalona et al., 2005; Harth et al., 2007) differ from H17 in V. splendidus, but nucleotides in the loop are conserved.

    Phylogenetic analysis revealed that V. splendidus 16S rRNA genes overlap with other Vibrio species. This was observed for all the cloned 16S rRNA genes examined (strains PB1-10, 3d, 12B01 and LGP) irrespective of the method used, and also appears to be the case for the A. salmonicida strains PB1-8 and PB3-7 (data not shown). Evolutionary rates vary within the rRNA gene (Smit et al., 2007), and highly conserved regions, such as around primer 338f, remain unchanged because they are critical for the function of the ribosome. It is, however, unclear why regions like

    Fig. 3. Implications of microheterogeneous 16S rRNA genes for the phylogeny of Vibrio splendidus. Overlap between individual V. splendidus 16S rRNA genes (cloned) and extant species 16S rRNA genes (genomic PCR products) is shown. Cloned genes from PB1-10 (rrnA–rrnM), Vibrio sp. 3d and V. splendidus 12B01 are each marked with their own symbol. The maximum-likelihood tree was constructed with ARB (AxML) and a filter of 1362 aligned nucleotide positions (43–1405, Escherichia coli numbering), excluding ambiguities and missing data. Nodes supported with bootstrap values above 50 are marked (PHYLIP; 100 DNAML iterations). Partial sequences (111–880 bp; dashed line) were added using a parsimony option within ARB. The 16S rRNA gene of Colwellia psychrerythraea (AF001375) was used as outgroup. The scale bar represents 2% sequence divergence.
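    Bootstrap values such as those in Fig. 3 come from resampling alignment columns with replacement and inferring a tree from each replicate. The Python sketch below shows the resampling step only. The three-sequence alignment is a made-up toy, `bootstrap_columns` is an illustrative name, and tree inference (for example with DNAML) is deliberately left out.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate: columns drawn with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same column picks to every sequence so columns stay intact.
    return ["".join(seq[i] for i in picks) for seq in alignment]


# Toy alignment (hypothetical data, not from the paper).
alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]
rng = random.Random(42)  # fixed seed so the run is repeatable
replicates = [bootstrap_columns(alignment, rng) for _ in range(100)]
print(len(replicates), "replicates of", len(replicates[0][0]), "columns each")
```

    Each replicate would then be passed to the tree-building program, and the support for a node is the percentage of replicate trees that contain it.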

    S. Jensen et al.

  • Phylogeny-Aware Gap Placement Prevents Errors in Sequence Alignment and Evolutionary Analysis

    Ari Löytynoja* and Nick Goldman

    Genetic sequence alignment is the basis of many evolutionary and comparative studies, and errors in alignments lead to errors in the interpretation of evolutionary information in genomes. Traditional multiple sequence alignment methods disregard the phylogenetic implications of gap patterns that they create and infer systematically biased alignments with excess deletions and substitutions, too few insertions, and implausible insertion-deletion event histories. We present a method that prevents these systematic errors by recognizing insertions and deletions as distinct evolutionary events. We show theoretically and practically that this improves the quality of sequence alignments and downstream analyses over a wide range of realistic alignment problems. These results suggest that insertions and sequence turnover are more common than is currently thought and challenge the conventional picture of sequence evolution and mechanisms of functional and structural changes.

    New DNA sequencing methods permit quick and affordable exploration of genomic sequences of different organisms. Some of the greatest beneficiaries of the rapid increase of sequence data are comparative genomic studies that seek to provide increasingly accurate reconstruction of evolutionary histories of related genomes, e.g., to study functional and structural sequence changes leading to phenotypic differences between species (1–4). However, all sequence analyses that rely on evolutionary information require an accurate sequence alignment, i.e., the correct identification of homologous nucleotides or amino acids and the positioning of gaps indicating inserted and deleted sequence.

    Alignment is still a highly error-prone step in comparative sequence analysis. Different multiple sequence alignment methods often lead to drastically different conclusions in both phylogenetic analyses and functional studies (supporting online material text), and alternative alignments of the same data can support entirely different mechanisms driving evolutionary and functional changes in sequences. As an example, a traditional alignment of HIV and SIV envelope glycoprotein gp120 (5) (Fig. 1A) has a familiar pattern of insertions and deletions squeezed compactly between conserved blocks of structurally important residues and suggests that part of the variable V2 region has a high amino acid substitution rate and has shortened over time at a mutation hotspot where overlapping sites have been independently deleted in different evolutionary branches: some sites as many as eight times among the 23 sequences included. With an alignment method that considers the sequences' phylogeny and distinguishes insertions from deletions (5), the story is different: Instead of multiple point substitutions and loss of sequence, the region evolves through short insertions and deletions, allowing for rapid and radical changes in the coding sequence (Fig. 1B). The latter alignment, which suggests rapid turnover of sequence material instead of long ancestral sequences shrinking in length, provides a more convincing mechanism for the evolution of this region. Furthermore, its association of gap patterns with meaningful insertion and deletion events at the branches of the phylogenetic tree, i.e., specific points in the history of the sequences, allows a realistic reconstruction of the evolutionary process leading to the present-day sequences. In this example, the different implications of the alternative alignments for the mechanisms and time scale of sequence changes may be of medical importance for understanding the evolutionary dynamics of HIV (6), particularly in this protein region where insertions, deletions, and substitutions are associated with the efficiency of HIV entry, biological phenotype, and neutralizing antibody response (7–11).

    Progressive algorithms (12–15), the multiple sequence alignment methods most widely used today, are based on backtracking the evolutionary process and building a multiple alignment from pairwise alignments between sequences and sequence alignments, performed in order of decreasing relatedness (Fig. 2) (supporting online material text). However, whereas insertion and deletion events are indistinguishable when comparing one pair of sequences, the two events differ greatly in progressive iteration of pairwise alignments. A gap for a deletion, with its associated penalty, is created only once, but a gap for an insertion has to be opened multiple times (Fig. 2, A and B). Simple iteration associates a full penalty with each of these gap-opening events, which leads to excessive penalization of single insertion events.
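    The asymmetry described above can be made concrete with a toy count. The Python sketch below is not an aligner. It assumes the true homology is known, represents each sequence as the set of site indices it contains, and counts how many gaps must be opened as sequences join the alignment one at a time. The names `count_runs` and `gap_openings` and the site numbers are illustrative assumptions.

```python
def count_runs(sites):
    """Number of runs of consecutive site indices, i.e. separate gaps to open."""
    ordered = sorted(sites)
    return sum(1 for i, s in enumerate(ordered) if i == 0 or s != ordered[i - 1] + 1)


def gap_openings(sequences):
    """Count gap-opening events when sequences join a growing alignment in order.

    Each sequence is a set of the homologous site indices it actually contains.
    """
    profile = set(sequences[0])
    openings = 0
    for seq in sequences[1:]:
        openings += count_runs(profile - seq)  # gaps opened in the incoming sequence
        openings += count_runs(seq - profile)  # gaps opened in the existing alignment
        profile |= seq
    return openings


full = set(range(10))   # sequence carrying sites 0..9
short = full - {4, 5}   # same sequence without sites 4 and 5

# One sequence gained sites 4-5 (insertion); the other three never had them.
insertion_case = [full, short, short, short]
# One sequence lost sites 4-5 (deletion); the other three still have them.
deletion_case = [short, full, full, full]

print(gap_openings(insertion_case))  # 3: the same gap is reopened at every step
print(gap_openings(deletion_case))   # 1: the gap is created once
```

    With a full gap-opening penalty charged per event, the single insertion costs three times as much as the single deletion in this toy case.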

    No alignment methods have previously implemented a precise solution to this problem; instead, heuristics to lower the penalty for opening gaps at positions already containing gaps have been used (12, 14). Although these site-specific penalties reduce the high overall cost of single insertion events and encourage subsequent alignment iterations to correctly place their gaps at the same position, the approach fails when there are multiple nearby insertions and deletions and becomes systematically biased. By definition, inserted characters are not descendants of, and thus are not homologous with, any other insertions or ancestral characters, and should never align with anything (Fig. 2C, evolution). Progressive algorithms, however, always incorrectly align neighboring insertions in the same column if that is not explicitly prevented; the use of site-specific gap penalties, instead of preventing the incorrect matching of independent insertions, encourages it (Fig. 2C, site-specific alignment). Such collapsed insertions create incorrect homologies and, as the resulting gap pattern implies multiple independent deletions, give an impression of deletion hotspots where the overly long ancestral sequences are shortened (Fig. 2C, interpretation). In addition, the procedure also lowers the penalties at deletion sites where no further gaps are required, creating gap magnets that make nearby deletions coincide in subsequent stages of progressive iteration (Fig. 2D, evolution and site-specific alignment). Similarly to incorrectly aligned insertions, the clustering of deletions creates false homologies and gives an impression of deletion hotspots (Fig. 2D, interpretation).

    We previously identified the problem of multiple penalization of insertions and reported a preliminary attempt to solve it (16). This uses a phylogeny-aware approach that flags the gaps made in previous alignments and, using evolutionary information from related sequences to indicate whether each gap has been created by an insertion or a deletion, permits their reuse for inserted characters without further penalty in the next stage of the progressive alignment (Fig. 2C, phylogeny-aware alignment). In addition, information from closely related sequences can be used to infer sites as permanent insertions that cannot be matched in subsequent alignments (5), so that distinct insertion events are correctly kept separate even when they occur at exactly the same position. If related sequences indicate that a gap is caused by a deletion, flags are removed and no further free gaps at that position are permitted (Fig. 2D), and the effect is correctly targeted on insertions only.
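    The flag-and-reuse idea can be sketched as simple bookkeeping. The Python example below is a loose illustration under stated assumptions, not the authors' implementation. Gaps are hypothetical `(start, end)` column intervals, and the `insertion_gaps` argument stands in for the inference from related sequences that decides whether a gap came from an insertion.

```python
def alignment_gap_cost(required_gaps, insertion_gaps, gap_open=1.0):
    """Total gap-opening cost over successive progressive-alignment steps.

    required_gaps: one list of (start, end) gaps per step.
    insertion_gaps: gaps judged to be insertions; once opened they are
    flagged and can be reused later without further penalty.
    """
    flagged = set()
    cost = 0.0
    for step_gaps in required_gaps:
        for gap in step_gaps:
            if gap in flagged:
                continue  # flagged insertion gap: reuse is free
            cost += gap_open
            if gap in insertion_gaps:
                flagged.add(gap)  # deletions are never flagged
    return cost


# The same gap at columns 4-6 is needed in three successive steps.
steps = [[(4, 6)], [(4, 6)], [(4, 6)]]

print(alignment_gap_cost(steps, insertion_gaps=set()))     # 3.0, plain iteration
print(alignment_gap_cost(steps, insertion_gaps={(4, 6)}))  # 1.0, flagged and reused
```

    Passing an empty `insertion_gaps` reproduces the repeated penalty of simple iteration, so the two calls differ only in whether the gap is recognized as an insertion.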

    To understand the type and magnitude of algorithm-based errors in traditional sequence

    European Molecular Biology Laboratory–European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.

    *To whom correspondence should be addressed. E-mail: [email protected]

    20 JUNE 2008 VOL 320 SCIENCE www.sciencemag.org 1632

    REPORTS

  • prankster /Users/dave/Desktop/Vibrios_all_16S.txt

    [Figure: prankster view of the Vibrios_all_16S.txt alignment, columns 60–110. A guide tree with nodes numbered 1–58 is drawn beside the aligned 16S rRNA gene copies, whose labels are truncated by the viewer: Pprof (15 copies), Vfish (12), Vpara (11), Vib40B (7), VvulC (9) and Vib1DA3 (5).]

  • ORIGINAL PAPER

    Intraspecific polymorphism of 16S rRNA genes in two halophilic archaeal genera, Haloarcula and Halomicrobium

    Heng-Lin Cui · Pei-Jin Zhou · Aharon Oren · Shuang-Jiang Liu

    Received: 9 July 2008 / Accepted: 16 September 2008. © Springer 2008

    Abstract All members of the genera Haloarcula and Halomicrobium whose names have been validly published were surveyed for 16S rRNA gene polymorphism, and the transcription of the genes from two species was investigated during growth at different NaCl concentrations. The species of Haloarcula and Halomicrobium harbour at least two different 16S rRNA gene copies, and 18 new sequences of 16S rRNA genes were obtained. The type I and type II 16S rRNA genes of Haloarcula are divergent at 4.8–5.6% of their nucleotide positions. The type III and type IV 16S rRNA genes from Halomicrobium mukohataei JCM 9738T are 9.0% divergent, which represents the highest intraspecific divergent 16S rRNA genes so far seen. Phylogenetic analysis based on 16S rRNA genes indicated that all type I 16S rRNA genes were clustered, and the same was true for the type II 16S rRNA genes of Haloarcula species. The two clusters, respectively generated from type I and type II 16S rRNA genes, were sharply separated and their divergences (4.8–5.6%) are in the range of various divergence usually found between genera in the order Halobacteriales (about 5–10%). Results from reverse transcription-PCR showed that the type I and type II copies of Har. amylolytica BD-3T and type III and type IV copies of Hmc. mukohataei JCM 9738T were all transcribed to 16S rRNA molecules under different salt concentrations (15–28% NaCl).
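    Divergence figures like those quoted in the abstract are percentages of differing positions between two aligned gene copies. The Python sketch below computes an uncorrected pairwise difference under some assumptions: the two short sequences are invented for illustration and are not from the paper, the function name `percent_divergence` is hypothetical, and columns with a gap or an `N` in either sequence are skipped.

```python
def percent_divergence(a: str, b: str) -> float:
    """Percentage of compared alignment columns at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue  # skip gaps and undetermined positions
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * differing / compared


# Two made-up aligned fragments standing in for two rRNA gene copies.
copy_1 = "AGCGGAAACGAGTTATCTGAACCTTCGG"
copy_2 = "AGCGGAAACGACTTAACTGAACCTTCGG"
print(f"{percent_divergence(copy_1, copy_2):.1f}% divergent")
```

    Published values are often computed over a filtered set of positions or with a substitution-model correction, so numbers from this simple count need not match them exactly.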

    Keywords Halophilic archaea · Haloarcula · Halomicrobium · 16S rRNA gene polymorphism

    Introduction

    The gene coding for the small subunit ribosomal RNA (SSU) has been an important molecular chronometer for identification and classification of prokaryotes (Woese et al. 1990) and is now being extensively used to evaluate prokaryotic diversity in natural environments (Case et al. 2007). However, some prokaryotic species harbour several divergent 16S rRNA genes in their genomes, which cause concern about the reliability of 16S rRNA gene analysis in the classification and identification of prokaryotes, as well as in the evaluation of prokaryotic diversity (Clayton et al. 1995; Wang et al. 1997; Yap et al. 1999; Marchandin et al. 2003). Within the domain Archaea, several methanogenic species such as Methanocaldococcus jannaschii, Methanospirillum hungatei and Methanothermobacter thermautotrophicus exhibit 16S rRNA gene polymorphisms displaying 0.1% divergence, while some halophilic archaea harbour more divergent 16S rRNA gene copies

    Communicated by A. Driessen.

    H.-L. Cui · P.-J. Zhou · S.-J. Liu: State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, 100080 Beijing, People's Republic of China

    H.-L. Cui: School of Food and Biological Engineering, Jiangsu University, 212013 Zhenjiang, People's Republic of China

    A. Oren: The Institute of Life Sciences and the Moshe Shilo Minerva Center for Marine Biogeochemistry, The Hebrew University of Jerusalem, 91904 Jerusalem, Israel

    S.-J. Liu (corresponding author): Institute of Microbiology, Chinese Academy of Sciences, Datun Road Jia 3#, Chaoyang District, 100101 Beijing, People's Republic of China; e-mail: [email protected]

    Extremophiles

    DOI 10.1007/s00792-008-0194-2

    ORIGINAL PAPER

    Intraspecific polymorphism of 16S rRNA genes in two halophilicarchaeal genera, Haloarcula and Halomicrobium

    Heng-Lin Cui Pei-Jin Zhou Aharon Oren Shuang-Jiang Liu

    Received: 9 July 2008 / Accepted: 16 September 2008! Springer 2008

    Abstract All members of the genera Haloarcula and Halomicrobium whose names have been validly published were surveyed for 16S rRNA gene polymorphism, and the transcription of the genes from two species was investigated during growth at different NaCl concentrations. The species of Haloarcula and Halomicrobium harbour at least two different 16S rRNA gene copies, and 18 new sequences of 16S rRNA genes were obtained. The type I and type II 16S rRNA genes of Haloarcula are divergent at 4.8–5.6% of their nucleotide positions. The type III and type IV 16S rRNA genes from Halomicrobium mukohataei JCM 9738T are 9.0% divergent, which represents the highest intraspecific divergence of 16S rRNA genes seen so far. Phylogenetic analysis based on 16S rRNA genes indicated that all type I 16S rRNA genes were clustered, and the same was true for the type II 16S rRNA genes of Haloarcula species. The two clusters, generated from type I and type II 16S rRNA genes respectively, were sharply separated, and their divergences (4.8–5.6%) are in the range of divergence usually found between genera in the order Halobacteriales (about 5–10%). Results from reverse transcription-PCR showed that the type I and type II copies of Har. amylolytica BD-3T and the type III and type IV copies of Hmc. mukohataei JCM 9738T were all transcribed to 16S rRNA molecules under different salt concentrations (15–28% NaCl).

    Keywords Halophilic archaea · Haloarcula · Halomicrobium · 16S rRNA gene polymorphism

    Introduction

    The gene coding for the small subunit ribosomal RNA (SSU) has been an important molecular chronometer for identification and classification of prokaryotes (Woese et al. 1990) and is now being extensively used to evaluate prokaryotic diversity in natural environments (Case et al. 2007). However, some prokaryotic species harbour several divergent 16S rRNA genes in their genomes, which cause concern about the reliability of 16S rRNA gene analysis in the classification and identification of prokaryotes, as well as in the evaluation of prokaryotic diversity (Clayton et al. 1995; Wang et al. 1997; Yap et al. 1999; Marchandin et al. 2003). Within the domain Archaea, several methanogenic species such as Methanocaldococcus jannaschii, Methanospirillum hungatei and Methanothermobacter thermautotrophicus exhibit 16S rRNA gene polymorphisms displaying 0.1% divergence, while some halophilic archaea harbour more divergent 16S rRNA gene copies


  • EF645694, DQ826512–DQ826513, DQ854818). The overall size of the amplified fragments of 16S rRNA genes of Haloarcula species was 1472 bp, while the two 16S rRNA gene fragments of Hmc. mukohataei JCM 9738T were 1473 bp (rrnA) and 1472 bp (rrnB), respectively. Sequence alignments and analyses of the 16S rRNA genes of Haloarcula species showed that these genes formed two stable clusters (Fig. 1). The genes that clustered with rrnA and rrnB or rrnC of Har. marismortui ATCC 43049T were, respectively, designated as type I and type II 16S rRNA genes. Alignment of the type I and type II 16S rRNA gene sequences of all Haloarcula species revealed 4.8–5.6% nucleotide substitutions. The rrnA and rrnB of Hmc. mukohataei JCM 9738T were, respectively, designated as type III and type IV 16S rRNA genes, and they showed 9.0% divergence, representing the highest intraspecific divergent 16S rRNA genes so far known.
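The divergence percentages quoted above come from comparing aligned gene copies position by position. The following is a minimal sketch of that calculation, assuming the two copies are already aligned to equal length and that columns containing a gap are skipped; the function name, the gap handling and the toy sequences are illustrative assumptions, not the procedure or data used in the paper.

```python
def percent_divergence(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of differing positions between two aligned sequences.

    Columns where either sequence carries a gap are ignored
    (an assumption; other conventions count gaps as differences).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


# Toy aligned fragments, made up for illustration (not real rrnA/rrnB data).
type_i = "ACGTTGCA-GGCTAAGTCCA"
type_ii = "ACGTAGCAAGGCTTAGTCCA"
print(round(percent_divergence(type_i, type_ii), 1))
```

In this toy pair, 19 ungapped columns are compared and 2 of them differ. Real 16S rRNA comparisons would start from a proper multiple alignment of the full-length copies.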

    Detection of various types of 16S rRNA molecules by RT-PCR

    To examine whether all the 16S rRNA genes from Haloarcula and Halomicrobium species were transcribed and whether their transcription was regulated by the NaCl concentration, total RNAs were prepared from cultures grown in CM medium containing different concentrations of NaCl. The cDNAs were first synthesized from the total RNAs by RT, using the universal primer 1518R that was complementary to the 3′ end of both type I and type II, or type III and type IV 16S rRNAs. Second, the specific PCR primers (Table 1) were used, so that either type I, type II, type III, or type IV cDNAs of 16S rRNA fragments were amplified. The results are shown in Figs. 3 and 4. When the cDNAs were used as the template in PCR, all pairs of primers (I–IV, Table 1) amplified fragments of the correct size. None of the primers amplified products from the mock cDNA reaction where RT was omitted, which excluded the possibility of PCR amplification from genomic DNA contaminants. The RT-PCR experiment clearly demonstrated that both type I and type II 16S rRNA genes were transcribed in Har. amylolytica. Although these experiments did not quantify the transcription of each type of 16S rRNA gene, the amounts of the type I and type II 16S rRNAs from cells cultivated at different NaCl concentrations were comparable as judged by the intensities of the bands of PCR-amplified fragments, indicating that the type

    Fig. 1 Specificity of primers for detection of two types of 16S rRNA genes in Har. amylolytica BD-3T (left) and Hmc. mukohataei JCM 9738T (right). Lanes M: DNA marker; lanes 1 and 4: Har. amylolytica BD-3T rrnA; lanes 2 and 5: Har. amylolytica BD-3T rrnB; lanes 3 and 6: Har. amylolytica BD-3T rrnC; lanes 1′ and 2′: Hmc. mukohataei JCM 9738T rrnA; lanes 2′ and 4′: Hmc. mukohataei JCM 9738T rrnB

    [Figure: phylogenetic tree of the rrnA, rrnB and rrnC 16S rRNA gene copies of Har. marismortui, Har. vallismortis, Har. quadrata, Har. japonica, Har. argentinensis, Har. amylolytica and Har. hispanica strains, plus Hmc. mukohataei JCM 9738T, each leaf labelled with its GenBank accession number; clusters marked I, II, III and IV; bootstrap values from 71 to 100 on the branches; scale bar 0.01]

    Fig. 2 Neighbour-Joining phylogenetic tree based on 16S rRNA gene sequences of all members of Haloarcula, Halomicrobium. Bootstrap values (%) are based on 1,000 replicates and are shown for branches with more than 70% bootstrap support. Bar represents 0.01 expected changes per site. The boldfaced species were newly sequenced in this study


  • 31 universally conserved proteins tree

    [Figure: Neighbor Joining tree; 28 species, 1812 sites (global gap removal); observed divergence; 500 bootstrap replicates; scale bar 0.062. Leaves: 15 V. cholerae strains (MO10, V52, MAK757, 623-39, AM-19226, MZO-2, N16961, O395, 1587, NCTC8457, B33, 2740-80, MZO-3, RC385, V51), V. vulnificus CMCP6 and YJ016, V. parahaemolyticus AQ3810 and RIMD2210633, V. alginolyticus 12G01, V. splendidus 12B01, Vibrio sp. MED222 and Ex25, V. fischeri ES114, V. salmonicida LFI1238, V. angustum S14, and P. profundum 3TCK and SS9]

  • MLST tree

    [Figure: BIONJ tree; 28 species, 1338 sites (global gap removal); observed divergence; 500 bootstrap replicates; scale bar 0.046. Leaves: 15 V. cholerae strains, V. vulnificus CMCP6 and YJ016, V. parahaemolyticus AQ3810 and RIMD2210633, V. alginolyticus 12G01, V. splendidus 12B01, Vibrio sp. MED222 and Ex25, V. fischeri ES114, V. salmonicida LFI1238, V. angustum S14, and P. profundum 3TCK and SS9]

  • gene order (synteny) tree

    [Figure: tree printed from the file abc.nice.tree (Tue Feb 20 14:11:01 2007); scale bar 2e+02. Leaves: 15 V. cholerae strains, V. vulnificus CMCP6 and YJ016, V. parahaemolyticus AQ3810 and RIMD2210633, V. alginolyticus 12G01, V. splendidus 12B01, Vibrio sp. MED222 and Ex25, V. fischeri ES114, V. salmonicida LFI1238, V. angustum S14, and P. profundum 3TCK and SS9]

  • genome-based most common tree (492 genes)

    [Figure: consensus tree printed from the file 492.consensus.tree (Tue Feb 20 14:10:57 2007); scale bar 5e+02. Leaves: 15 V. cholerae strains, V. vulnificus CMCP6 and YJ016, V. parahaemolyticus AQ3810 and RIMD2210633, V. alginolyticus 12G01, V. splendidus 12B01, Vibrio sp. MED222 and Ex25, V. fischeri ES114, V. salmonicida LFI1238, V. angustum S14, and P. profundum 3TCK and SS9]

    [Figure: Maximum Likelihood tree (ln(L) = -1375.065); 27 species, 618 sites; 50 bootstrap replicates; scale bar 0.006. Leaves: 15 V. cholerae strains (O395, MZO-2, NCTC8457, V51, 623-39, 2740-80, RC385, N16961, MO10, MZO-3, V52, MAK757, B33, 1587, AM-19226), V. vulnificus CMCP6 and YJ016, V. parahaemolyticus RIMD2210633 and AQ3810, V. alginolyticus 12G01, V. splendidus 12B01, Vibrio sp. Ex25 and MED222, V. salmonicida (25 contigs), V. fischeri ES114, V. angustum S14 and P. profundum SS9]

  • "On the Origins of a Vibrio Species", Tammi Vesth, et al.

    16S rRNA tree

    [Figure: 16S rRNA tree. Leaves: V. cholerae N16961, MO10, O395 TIGR, V52, AM-19226, 2740-80, 1587, MZO-2, 12129, B33VCE, VL426, BX330286, RC9, TM11079-80, TMA21, MJ1236, M66-2 and O395 TEDA; V. vulnificus YJ016 and CMCP6; V. shilonii AK1; V. parahaemolyticus 2210633 and 16; Vibrio sp. Ex25; Vibrio sp. MED222; V. harveyi BAA1116; V. campbellii AND4; V. splendidus LGP32; A. salmonicida LFI1238; P. profundum SS9; A. fischeri ES114 and MJ11]

  • [Figure: pan- and core-genome plot with three series, "Pan genome", "Core genome" and "New gene families"; y-axis ticks from 0 to 25000. Genomes on the x-axis, in order: V. cholerae N16961, M66-2, O395 TEDA, O395 TIGR, MO10, BX330286, RC9, MJ1236, B33VCE, 2740-80, 1587, AM-19226, MZO-2, 12129, TMA21, TM11079-80, VL426 and V52; Vibrio sp. MED222; V. splendidus LGP32; A. fischeri ES114 and MJ11; A. salmonicida LFI1238; Vibrio sp. Ex25; V. campbellii; V. harveyi BAA-1116; V. shilonii AK1; P. profundum SS9; V. parahaemolyticus 2210633 and 16; V. vulnificus CMCP6 and YJ016]

  • [Figure: dendrogram titled "Vibrio"; axis "Relative manhattan distance" running from 0.30 to 0.00; bootstrap values from 56 to 100 on the nodes. Leaves, top to bottom: Aliivibrio salmonicida LFI1238; Vibrio fischeri ES114 and MJ11; Vibrio cholerae V52, 2740-80, O395 TEDA, M66-2, N16961, BX330286, RC9, MJ1236, B33VCE, O395, MO10, VL426, TM11079-80, 12129, TMA21, MZO-2, AM-19226 and 1587; Vibrio parahaemolyticus RIMD2210633; Vibrio sp. Ex25; Vibrio campbellii AND4; Vibrio harveyi ATCC BAA1116; Vibrio parahaemolyticus 16; Vibrio vulnificus YJ016 and CMCP6; Vibrio splendidus LGP32; Vibrio sp. MED222; Photobacterium profundum SS9; Vibrio shilonii AK1]

  • D.W. Ussery et al., Computing for Comparative Microbial Genomics, Computational Biology 8, DOI 10.1007/978-1-84800-255-5_14, Springer-Verlag London Limited 2009

    Chapter 14
    Evolution of Microbial Communities; or, On the Origins of Bacterial Species

    Outline

    Evolution can be thought of as the adaptation or optimization of species to their environment. Since, at the level of microorganisms, there can be considerable differences in microenvironments, it is not hard to imagine that many bacteria have a constant need to be adaptable and ready to change to new surroundings. In this final chapter, we will take a look at the processes that drive evolution, and at the evolutionary traces that are visible in the DNA sequences of genomes. Mobile DNA elements play an important role in evolution and an example is given for insertion sequences in Shigella flexneri. Genome islands can be considered genetic building blocks that can be added to or removed from a genome core. Finally, we will take a closer look at Vibrio cholerae, to see how this species differs from other Vibrio species, and how a relatively small set of genes can be responsible for niche adaptation (and sometimes speciation). The amount of genomic diversity within closely related bacterial populations is far greater than anyone had imagined, and the raw material for evolution is abundant in the microbial world.

    Introduction

    As mentioned in the first chapter, cells obey the laws of chemistry and physics, and there is no need to invoke supernatural forces to explain the physical mechanical events happening inside bacterial cells. One of the undercurrent themes of this book has been to build up a firm post-genomic foundation from which to view the bacterial communities. We've now come full circle, and in this last chapter, we will have a look at the evidence for evolution within individual genomes, and how we can extrapolate such observations to bacterial populations.

    In order for evolution to happen, three components are necessary: (1) a number of organisms must have a diverse set of traits that have different advantages under different conditions, (2) these traits must have the ability to change, and finally (3) selection must take place by some particular condition so that (some of) these traits become dominant in the offspring population. We can add the time factor to this as an essential component, because evolution is rarely instantaneous. Before turning to biological examples, we will first take a closer look at evolution in general.

  • …hybridization experiments of bacterial genomes typically use either simple cutoff values to partition data points into present and absent DNA sequence segments, e.g., based on estimates from known reference hybridizations (3) or based on standard deviation estimates (11). However, the physical chromosomal position (mapping) of a probe is often ignored when analyzing this type of data. Statistical approaches for this purpose have been widely developed for copy number analyses in cancer research. These methods use statistics for partitioning probes into sets with the same copy number (corresponding to the same level of DNA). Recent advances and evaluation of their performance have demonstrated their usefulness and superiority compared to one-probe-at-a-time approaches (38).
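The contrast drawn above, between one-probe-at-a-time cutoffs and calls that take chromosomal position into account, can be illustrated with a small sketch. It assumes each probe has a log-ratio signal and that the list is ordered by chromosomal position. The running median used here is only a simple stand-in for the copy-number segmentation statistics the text refers to, and the cutoff value, window size and data are all hypothetical.

```python
from statistics import median


def call_by_cutoff(log_ratios, cutoff=-1.0):
    """One-probe-at-a-time call: True = present, False = absent."""
    return [r > cutoff for r in log_ratios]


def call_by_position(log_ratios, cutoff=-1.0, window=3):
    """Position-aware call (illustrative stand-in, not the paper's method).

    Each probe is judged by the median signal of itself and its
    chromosomal neighbours, so an isolated noisy probe is outvoted.
    """
    half = window // 2
    calls = []
    for i in range(len(log_ratios)):
        lo = max(0, i - half)
        hi = min(len(log_ratios), i + half + 1)
        calls.append(median(log_ratios[lo:hi]) > cutoff)
    return calls


# Hypothetical probe signals in chromosomal order: probe 2 is a noisy probe
# inside a present region, probe 7 a noisy probe inside an absent segment.
signal = [0.1, -0.2, -1.6, 0.0, 0.2, -2.0, -1.8, -0.3, -2.2, -1.9, 0.1, 0.0]

print(call_by_cutoff(signal))
print(call_by_position(signal))
```

With these made-up values the plain cutoff flips the two noisy probes, while the position-aware call reports one contiguous absent segment (probes 5 to 9). A real analysis would use a dedicated segmentation method rather than a fixed window.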

    In early 2005, when this study began, seven completely sequenced E. coli genomes were publicly available, including both pathogenic and nonpathogenic strains. These genomes vary in size from approximately 4.6 Mbp to 5.5 Mbp, and among these, there is a considerable amount of diversity as illustrated by the matrix shown in Fig. 1A, which compares the coding sequence overlap between the seven different E. coli genomes. Moreover, next to the matrix, their relatedness is illustrated by a phylogenetic tree, based on their 16S rRNAs. The low relatedness of CFT073 to the other strains may also be illustrated by several large distinct chromosomal regions that contain genes unique to the CFT073 genome compared to other E. coli genomes (Fig. 1B).
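A pairwise overlap matrix of the kind described above can be sketched as follows. The sketch assumes each genome has already been reduced to a set of gene-family identifiers (for example by clustering BLAST hits), and it scores a pair as shared families divided by all families found in either genome. Both the reduction step and the scoring convention are assumptions for illustration; the text does not state how the matrix in Fig. 1A was computed, and the strain names and family labels below are invented.

```python
from itertools import combinations


def overlap_matrix(genomes):
    """Percent of gene families shared between every pair of genomes.

    `genomes` maps a genome name to a set of gene-family identifiers.
    Score = 100 * |A & B| / |A | B| (one possible convention).
    """
    names = sorted(genomes)
    matrix = {name: {name: 100.0} for name in names}
    for a, b in combinations(names, 2):
        shared = len(genomes[a] & genomes[b])
        total = len(genomes[a] | genomes[b])
        pct = 100.0 * shared / total if total else 0.0
        matrix[a][b] = matrix[b][a] = pct  # the matrix is symmetric
    return matrix


# Invented toy genomes; real input would hold thousands of families each.
toy = {
    "strain_A": {"f1", "f2", "f3", "f4"},
    "strain_B": {"f1", "f2", "f3", "f5"},
    "strain_C": {"f1", "f6", "f7", "f8"},
}
m = overlap_matrix(toy)
for name in sorted(m):
    print(name, [round(m[name][other], 1) for other in sorted(m)])
```

In the toy data, strain_C shares only one family with each of the others, so its row stands out with low percentages, much as a genome with many unique regions would in a real matrix.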

    Here we describe the design and use of a high-density oligonucleotide microarray covering seven sequenced E. coli genomes as well as several sequenced E. coli plasmids, bacteriophages, pathogenicity islands, and virulence genes. The performance of this microarray is evaluated, and its utility is illustrated for the hybridization of genomic DNA in order to compare two uncharacterized E. coli strains which have not been sequenced with the seven known, sequenced E. coli strains. Recent advances in analysis of genomic DNA hybridization data were exploited. In particular, the physical mapping information was used to classify genes detected in the hybridization data into present and absent chromosomal segments.

    MATERIALS AND METHODS

    In this paper, we distinguish between the sequenced E. coli strains for which probes were designed on our custom-