Mitochondrial genome.pdf

Embed Size (px)

Citation preview

  • 8/13/2019 Mitochondrial genome.pdf

    1/17

  • 8/13/2019 Mitochondrial genome.pdf

    2/17

    sequencing results, the libraries yielding the largest number of mito-chondrial sequences exhibited very low terminal C to T substitution

    frequencies (#3% and#6% at 59 and 39 ends, respectively; ExtendedData Fig. 2) indicating that they are dominated by present-day humancontamination. Libraries showing C to T substitution frequencies ofless than 5% at either end were considered to be too contaminated andtherefore disregarded in subsequent analyses.

    Variation in C to T substitution frequencies among libraries suggestthat two populations of sequences are present in the data, an endo-genous population strongly affected by cytosine deamination and acontaminating population showing much less deamination. To test ifthis is the case, we determined the 59 C to T substitution frequenciesfor sequences showing a 39 C to T difference to the reference and vice

    versa, thereby enriching for putatively endogenous DNA. C to T sub-stitution frequencies indeed increased to 55% at 59 ends and 62% at 39ends, numbers that are close to those determined for the U. deningeri

    sample from Sima de losHuesos14 (Fig.3a). Furthermore,stratificationof the deamination signal by fragment length shows that the endogen-ous DNA is primarily present among sequences that are shorter than45 bp, again in agreement with the situation in the U. deningeri sample(Fig. 3b and Extended Data Fig. 3). Based on these results, we removedsequences longer than 45bp and those that do not carry a terminal Cto T substitution on either the 59 or 39 end (Extended Data Table 3). Inaddition, we applied a mapping quality filter to ensure unique place-ment of the sequences within the mtDNA genome and readjusted thealignment parameters to tolerate up to five C to T differences but nomore than two other differences to the reference mtDNA sequence todiscriminate against spurious alignments. Finally, T bases at the firstand last three positions of each sequence were masked to reduce theimpactof deamination-induced substitutions during consensus calling.

    We first called consensus bases for 15,181 positions of the mito-chondrial genome that were covered by 5 or more sequences of whichat least 80% agreed. Average coverage across these positions was 21.8.However, such strict filtering increases the risk of ascertainment biasbecause residual modern human contamination as well as capture andmapping biases may lead to the exclusion of positions where the Simade losHuesos specimendiffersfrom theprobes or thereferencesequence.We therefore built a second more inclusive consensus by consideringthe three terminal positions while selecting sequences with C to Tsubstitution and loweringthe requirementsfor coverageand consensusagreement to 3 and .67%, respectively. This consensus encompasses16,302 positions or,98% of the human mitochondrial reference gen-ome, with an average coverage of 31.6 (Extended Data Fig. 4). Third,to evaluate whether the use of Denisovan capture probes influence the

    results, we built a consensususing thestrictest filtering criteria described

    above, but including only sequences isolated with present-day humanmtDNA probes (Extended Data Fig. 5).

    We reconstructed phylogenetic trees in a Bayesian statistical frame-work24 using thethree Sima de los Huesos consensusmtDNAsequencesas well as themtDNAsof present-day andancienthumans, Neanderthals,Denisovans, chimpanzeesandbonobos.All three trees support a topologyin which the Sima de los Huesos mtDNA shares a common ancestorwith Denisovan mtDNAs to the exclusion of the other mtDNAs ana-lysed with maximum posterior probability (Fig. 4 and Extended DataFig. 6). As expected owing to its age, the branch leading to the Sima delos Huesos mtDNA is shorter than those leading to any of the otherarchaic or present-day humans. Using 13 directly dated ancient mtDNAsequences for calibration25 and the three consensus sequences, we esti-mated the age of the Sima de los Huesos specimen based on the lengthof its mtDNA branch (Table 1). These dates vary between 0.15 to0.64 million years with point estimates close to 400,000 years. This is

    in striking agreement with the point estimate of 409,000years for theU. deningeri mtDNA14. We similarly estimated the divergence times ofthemajormitochondrial lineages (Table1 andExtended Data Table 4)and find that the estimates for the divergence of the mtDNAs of the

    1.0

    0 1 2 3 4 5

    Distance to 5end Distance to 3end

    Fragment size (bp)

    5end3end

    All sequencesSequences with 5C to TSequences with 3C to TSima de los Huesos cave bear

    6 7 8 9 10 10 9 8 7 6 5 4 3 2 1

    3135

    3640

    4145

    4650

    5155

    5660

    6165

    66700

    Ct

    oTsubstitutionfrequency

    Ct

    oTsubstitutionfrequency

    CtoTsubstitutionfre

    quency

    0.9

    0.8

    0.7

    0.6

    0.5

    0.4

    0.30.2

    0.1

    0.0

    1.0 0.40

    0.35

    0.30

    0.25

    0.20

    0.15

    0.10

    0.05

    0

    0.9

    0.8

    0.7

    0.6

    0.5

    0.4

    0.30.2

    0.1

    0.0

    ba

    Figure 3| Patterns of cytosine deamination in the libraries constructedfrom the Sima de los Huesos hominin femur. a, C to T substitutionfrequencies areshownfor theterminalpositions of thealigned sequencesfor allsequences (black), those sequences carrying a C to T substitutions at their 59

    ends (blue), at their 39 ends (red), and for all Sima de los Huesos cave bearsequences from theU. deningerisample9 (dotted line).b, C to T substitutionfrequencies at the first and last base of sequences in different fragmentlength bins.

    Africans

    Neanderthals

    Denisovans

    Sima de los Huesos

    Asians and

    Europeans

    0.0090

    Figure 4| Bayesian phylogenetic tree of hominin mitochondrialrelationships based on the Simade losHuesosmtDNAsequence determinedusing the inclusive filtering criteria. All nodes connecting the denotedhominin groups are supported with posterior probability of 1. The tree wasrootedusing chimpanzee and bonobomtDNA genomes. The scale bar denotes

    substitutions per site.

    RESEARCH LETTER

    2 | N A T U R E | V O L 0 0 0 | 0 0 M O N T H 2 0 1 3

    Macmillan Publishers Limited. All rights reserved2013

  • 8/13/2019 Mitochondrial genome.pdf

    3/17

    Sima de los Huesos hominin and Densiovans vary between 0.40 and1.06 million years with point estimates around 700,000 years ago.

    The fact that the Sima de los Huesos mtDNA shares a commonancestor with Denisovan rather than Neanderthal mtDNAs is unex-pected in light of the fact that the Sima de los Huesos fossils carryNeanderthal-derived features (for example, in their dental,mandibular,midfacial, supraorbital and occipital morphology2,6,7,26). Denisovanswere identified in 2010 based on DNA sequences retrieved from amanual phalanx and a molar found in southern Siberia9,23. Based onanalyses of their nuclear genome9,10 they area sistergroup of Neanderthals,although themtDNAsof Neanderthals and present-day humansshare anmtDNA ancestor more recently with each other than with Denisovans23.

    This maybe owing to eitherincompletelineage sorting in thecommonancestral populations of these groups or to gene flow into Denisovansfrom another archaic group9.

    Several evolutionary scenarios are compatible with the presence ofa mtDNA sequence that falls on the Denisovan mtDNA lineage in a,400,000-year-old hominin in western Europe. First, the Sima de losHuesos hominins maybe closelyrelated to the ancestors of Denisovans.However, this seems unlikely,becausethepresence of Denisovans in west-ern Europe would indicate an extensive spatial overlap withNeanderthalancestors, raising the question how the two groups could geneticallydiverge while overlapping in range. Furthermore, although almost nomorphological information is available for Denisovans, a molar thatcarries Denisovan DNA is of exceptionally large size9 and does notexhibit the cusp reduction seen in the Sima de los Huesos hominins7.

    Most importantly, the Sima de los Huesos specimen is so old that itprobably predates the population split time between Denisovans andNeanderthals, which is estimated to one-half to two-thirds of the timeto the split between Neanderthals and modern humans, which is esti-matedtobe170,000to700,000yearsago9.Second,itispossiblethattheSima de los Huesos hominins represent a group distinct from bothNeanderthals and Denisovans thatlater perhapscontributed themtDNAto Denisovans. However, this scenario would imply the independentemergence of several Neanderthal-like morphological features in agroup unrelated to Neanderthals. Third, the Sima de losHuesos homi-nins may be related to the population ancestral to both NeanderthalsandDenisovans. Considering theage of theSimade los Huesos remainsand their incipient Neanderthal-like morphology, this scenario seemsplausible to us, but it requires an explanation for the presence of two

    deeply divergent mtDNA lineages in the same archaic group, one thatlater recurred in Denisovans and one that became fixed in Neanderthals,respectively. A forth possible scenario is that gene flow from anotherhominin population brought the Denisova-like mtDNA into the Simade los Huesos population or its ancestors. Such a hominin group mighthave also contributed mtDNA to the Densiovans in Asia9,10. Based onthe fossil record, more than one evolutionary lineage may have existedin Europe during the Middle Pleistocene27. Several fossils have beenfound in Europe as well as in Africa and Asia that are close in time toSima de los Huesos but do not exhibit clear Neanderthal traits. Thesefossils are often grouped into H. heidelbergensis, a taxon that is difficultto define8,28,29, particularly with regard to whether the Sima de losHuesos hominins should be included8. Furthermore, there may havebeenrelict populations of stillearlier hominins, notably thoseclassified

    as Homo antecessor, whichshare some morphologicaltraits with Asian

    Homo erectus30 and have been found just a few hundred metres awayfrom Sima de los Huesos in Gran Dolina.

    Although nuclear sequence data are needed to clarify the geneticrelationship of the Sima de los Huesos hominins to Neanderthals andDensiovans,the mtDNA sequence establishesan unexpectedlink betweenDenisovans and the western European MiddlePleistocenefossil record.Futureefforts will nowfocuson describing themtDNA variation of theSima de los Huesos hominins and retrieving nuclear DNA sequencesfrom them. The latter will be a huge challenge given that almost twograms of bone were required to generate the mtDNA sequence eventhough several hundred copies of mtDNA exist per cell. Althoughpreservation of DNA for such long periods of time may be favoured

    by unique preservation conditions in the Sima de los Huesos, thepresent results show that ancient DNA sequencing techniques havebecome sensitive enough to warrant further investigation of DNAsurvival at sites where Middle Pleistocene hominins are found.

    METHODS SUMMARYA detailed description of the methods used for data generation and analysis isprovided in the Methods section.

    Online Content Any additional Methods, ExtendedData displayitems andSourceData are available in theonline version of the paper; references unique to thesesections appear only in the online paper.

    Received 10 September; accepted 17 October 2013.

    Published online 4 December 2013.

    1. Carbonell, E.et al. The first hominin of Europe.Nature 452,465469 (2008).2. Arsuaga, J. L., Martinez, I., Gracia, A. & Lorenzo, C. The Sima de los Huesos crania

    (Sierra de Atapuerca, Spain). A comparative study.J. Hum. Evol.33,219281(1997).

    3. Arsuaga, J. L.et al.Size variation in Middle Pleistocene humans.Science277,10861088 (1997).

    4. Bermudez de Castro, J. M. & Nicolas, M. E. Palaeodemography of the Atapuerca-SH Middle Pleistocene hominid sample.J. Hum. Evol.33, 333355 (1997).

    5. Bischoff, J. L. et al.Geology and preliminary dating of the hominid-bearingsedimentaryfill of theSima de losHuesos Chamber, Cueva Mayor of theSierradeAtapuerca, Burgos, Spain.J. Hum. Evol.33, 129154 (1997).

    6. Martnez,I. & Arsuaga, J. L. The temporal bones from Sima de los Huesos MiddlePleistocene site (Sierra de Atapuerca, Spain). A phylogenetic approach.J. Hum.Evol. 33,283318 (1997).

    7. Martinon-Torres,M., Bermudez deCastro,J.M., Gomez-Robles, A.,Prado-Simon,L.& Arsuaga, J. L. Morphological descriptionand comparison of the dental remainsfrom Atapuerca-Sima de los Huesos site (Spain).J. Hum. Evol.62, 758 (2012).

    8. Stringer, C. The status ofHomo heidelbergensis(Schoetensack 1908).Evol.Anthropol. 21, 101107 (2012).

    9. Reich,D. etal. Genetichistoryof an archaic hominin group fromDenisova CaveinSiberia. Nature 468, 10531060 (2010).

    10. Meyer, M.et al.A high-coverage genome sequence from a n archaic Denisovanindividual. Science 338, 222226 (2012).

    11. Ortega, A. I. et al. Evolution of multilevel caves in the Sierra de Atapuerca (Burgos,Spain) and its relation to human occupation.Geomorphology196,122137(2013).

    12. Arsuaga, J. L.et al.Sima de los Huesos (Sierra de Atapuerca, Spain). The site.J. Hum. Evol.33, 109127 (1997).

    13. Valdiosera, C.et al. Typing single polymorphic nucleotides in mitochondrial DNAas a way to access Middle Pleistocene DNA.Biol. Lett.2, 601603 (2006).

    14. Dabney, J.et al. Complete mitochondrial genome sequence of a MiddlePleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. NatlAcad. Sci. USA110,1575815763 (2013).

    15. Willerslev, E.et al. Ancient biomolecules from deep ice cores reveal a forestedsouthern Greenland.Science317, 111114 (2007).

    16. Orlando, L. et al. Recalibrating Equusevolution using the genome sequence of anearly Middle Pleistocene horse.Nature 499, 7478 (2013).

    Table 1| Divergence timesof themajor hominin mtDNAlineages and theageof theSimade losHuesosspecimen asestimated byusingthreedifferent filtering strategies for consensus calling

    Mitochondrial lineages Divergence dates and molecular age estimates in Myr BP (95% highest posterior density (HPD) intervals in brackets)

    Strict filters Inclusive filters Strict filters enriched with human

    probes only

    Sima de los Huesos age estimate 0.38 (0.180.60) 0.37 (0.160.59) 0.40 (0.150.64)Human Neanderthal 0.51 (0.350.69) 0.53 (0.360.72) 0.44 (0.290.61)Sima de los Huesos/Denisova Human/Neanderthal 0.99 (0.701.33 ) 1.04 (0.721.41 ) 0.83 (0.541.11 )Sima de los Huesos Denisova 0.73 (0.501.01) 0.76 (0.511.06) 0.65 (0.400.92)

    LETTER RESEARCH

    0 0 M O N T H 2 0 1 3 | V O L 0 0 0 | N A T U R E | 3

    Macmillan Publishers Limited. All rights reserved2013

    http://www.nature.com/doifinder/10.1038/nature12788http://www.nature.com/doifinder/10.1038/nature12788
  • 8/13/2019 Mitochondrial genome.pdf

    4/17

  • 8/13/2019 Mitochondrial genome.pdf

    5/17

    METHODSDescription of the femur, archaeological context and sampling. The largestfemoralfragment(AT-999) wasdiscoveredin 1994and represents thedistal(lower)two thirds of the bone. The proximal (upper) third of the femur (AT-2943) wasrecovered in 1999. The third femoral fragment (AT-2944), also found in 1999, isa much smaller shaft fragment, which partially connects the two larger femoralfragments(see Fig. 2).All three fragments were foundcloseto each other insquareU-15 (the Sima de losHuesosexcavation grid hassquares of 0.5m in length). U-15excavation squareis in thecentral area of thesite andis particularly rich in human

    fossils, including complete skulls.Sampling was performed by drilling small holes into the cortical tissue of all

    threebone fragments starting frompre-existing fractures. To reduce the impact ofmodern human contamination, approximately 1 mm of surface material wasremoved beforedrilling holes in thebone at lowspeed(1,000r.p.m.) usinga steriledentistry drill. No damage was done to the outer surface of the femur.DNA extraction, library preparation and shotgun sequencing.Using 1.95 g ofbone material in total, 39 DNA extracts were made from between 25 and75 mg ofbone powder each using a recently published silica-based DNA extraction pro-tocol optimized for the recovery of very short molecules from ancient biologicalmaterial14. Two of these extracts were made from surface material, whereas allother extracts were generated from bone powder sampled from inside the bone.Extraction blank controls were carried alongside with the samples in each set ofDNA extractions. As substantialpellets of undigested bone powder remained afterextraction, we also generated re-extracts from6 bone pellets to investigate whetheradditional DNA can be released by repeating the extraction. Libraries were pre-pared insets of 16usingbetween20 and30ml of sample or blankDNA extract(outof50ml totalvolume) and followinga single-stranded library preparation protocolspecifically developed for highly degraded ancient DNA18. One positive controlandone blank control were included in each setof library preparations. No uracil-DNA glycosylase treatment was performed to preserve the C to T substitutionpatterns that are typical for sequences from ancient DNA20.

    The number of unique molecules in each library was estimated by quantitativePCR (qPCR), using primers hybridizing to the adaptor sequences18. All samplelibraries(n5 77)yieldedqPCR moleculecounts between 1.93 109 and2.43 1010,with exception of the libraries prepared from re-extracted bone pellets, whichreturned numbers in the range of 1.33 108 to 1.13109 (Extended Data Table 1).Molecule counts of the extraction and library preparation blanks (n5 13), whichrepresent artefacts and library molecules derived from contamination with exo-genous DNA, were consistently lower (in the range of 4.93 106 to 4.83 107).qPCR molecule counts thus indicate that substantial amounts of DNA reside in

    the bone and that most of this DNA is successfully released in a single round ofDNA extraction.Each library wasdividedintofouraliquotsof 12.5ml, whichwere then amplified

    into PCR plateau in 100 ml reactions with AccuPrime Pfx DNA polymerase (LifeTechnologies) as described elsewhere31. During amplification, two unique indexsequences were introduced into the adaptors of each library, following a double-indexing scheme described elsewhere32. Amplification products from the samelibrary werepooledand purifiedusingthe MinElutePCR purificationkit (Qiagen).

    Aliquotsfrom a subset of libraries werepooled and subjected to shallow shotgunsequencing using Illuminas MiSeq platformin double-index configuration (23 76123 7 cycles)32. Raw sequence data were processed as described below. Thefragment size distribution of the sequences before mapping shows a mode around30 bp (ExtendedData Fig.1), indicating that smallDNA fragmentswere efficientlyextractedand convertedinto library molecules. Sequenceswere aligned against thehuman genome (GRCh37/1000 Genomesrelease) using BWA19 withrelaxedalign-ment parameters10. In most libraries, less than 0.1% of the sequences $ 35bp

    mapped against the human genome with a mapping quality$30 (see ExtendedDataTable 2).Owingto the small numbersof alignedsequences,C to T substitutionfrequencies cannot be determined with confidence in most cases. A small numberof libraries, most notably the ones generated from surface material, standout withhigh fractions of aligned sequences (up to 8.4%). However, the C to T substitutionfrequencies in theselibraries are extremely low,indicating that the vast majority ofsequences are derived from modern human contamination. Based on these ana-lyses, none of the libraries prepared from the femur is a suitable candidate fordeeper shotgun sequencing.Enrichment of mitochondrial DNA and sequencing. All sample and blanklibraries were first enriched for mitochondrial DNA using present-day humanmitochondrial probes synthesized on an oligonucleotide array (in 3-bp tiling den-sity, using humanmtDNAsequence NC_001807),following themethod describedin ref. 33, except that the hybridization and wash temperatures were lowered to60 uC and 55 uC to facilitate enrichment of short library molecules14. After phylo-genetic analyses showedthat themitochondrial genomeof theSima de losHuesos

    hominin is closer to Denisovans than modern humans, the libraries were also

    enriched using Denisovan mitochondrial probes. To construct these probes, 19overlapping DNA fragments of approximately 1 kb were designed (GeneArtFragments, Life Technologies) using the mitochondrial genome of the Denisovanmanual phalanx23 as reference. The fragments encompassedthe following sequencecoordinates: 3191289, 12232191, 21013088, 30183950, 38974889, 48065763, 56886663, 66127601, 75298486, 84289418, 937110300, 1020311156, 1108512017, 1196612931, 1288113813, 1376214706, 1464115600,1555116503 and 16460381. One of the fragments (84289418) failed severalsynthesisattemptsand couldnot be includedin the probe pool. The 18 successfully

    synthesized fragments were amplified with Q5 Hot Start High-Fidelity DNAPolymerase (NEB) according to the suppliers instructions using a 59 biotinylatedforward primer and an unmodified reverse primer. Amplified fragments werepurified using solid-phase reversible immobilization (SPRI) beads as describedelsewhere34 and pooled in equimolar ratios. Bead capture was performed asdescribed in ref. 35, but with lowered hybridization and wash temperatures asdetailed above. Two successive rounds of hybridization enrichment were carriedout with both probe sets.

    The enriched libraries were combined into three pools and sequenced onIlluminas HiSeq 2500 platform in rapid mode, using recipes for paired-endsequencing with two index reads (961 71 961 7 or 76171 761 7 cycles)32.The first pool(includinglibrariesB2949B2994enriched withpresent-dayhumanprobes) was sequenced together with libraries from another experiment (mito-chondrial captures of ancient human samples), occupying 75% of two lanes of aflow cell. The second and third pools (including libraries A1543A2045 enrichedwith present-day human probes and all libraries enriched with Denisovan probes,respectively) were sequenced on one lane of a flow cell each.

    Raw data processing and mapping. Base calling was performed using Bustard(Illumina) or freeIbis36. Sequencesthat didnot perfectlymatchone of theexpectedindex combinations werediscardedand full-lengthmolecule sequences wererecon-structed by overlap-merging of paired-end reads37. Merged sequences $ 30bpwere aligned against the revised Cambridge reference sequence (NC_012920)using BWA19 with the parameters 2n 5, which allows up to five mismatches,and 2l 16500, which turns off seeding. Sequences with identical alignment startand end coordinates were collapsed into single sequences by consensus calling23.

    Enrichment success and recovery of mitochondrialsequences. The efficiency ofenrichment varied between the human and Denisovan probe sets, with 6.8% and27.6% of the sequences $ 30 bp aligning to the human mitochondrial referencegenome before duplicate removal (compare Extended Data Table 3). Each uniquesequence is represented by 21 duplicates on average (but note that this value isdeflated by few libraries yielding very large numbers of sequences; see Extended

    DataTable 1),indicating thatthe librarieswere sequencedto exhaustion. Thereareremarkable differences in the number of unique sequences obtained from eachlibrary (ExtendedDataTable1 andExtended Data Fig. 2),rangingfrom122 to 719for the blank controls, from 448 to 9,757 for the libraries prepared from re-extracted bone pellets, and from 1,529 to 773,319 for the regular sample libraries.Extended Data Table 3 summarizes the number of sequences retained in thesample libraries after each processing step.

    In previous work on the cave bear sample from Sima de los Huesos, almost allsequenced fragments (94%) were # 50 bp in length14. The fragment size distri-bution inferred from the hominin sequencesexhibits a larger proportion of longermolecules (Extended Data Fig.3), possibly reflecting contaminant sequences. Whenstratifying terminal C to T substitution frequencies by sequence length, we find astrong decline of the deamination signal with length (Fig. 3b), indicating that thepronounced tail of long sequences is due to modern human contamination.

    Basic filters applied to the sequences before consensus calling. We used the

    followingsetof filters todecreasethe load ofmodern humancontamination andtoeliminate spuriousalignments before consensus calling. The number of sequencesretained after each step of filtering is provided in Extended Data Table 3. First, weexcluded libraries without substantial signals of cytosine deamination (,5.0%terminal C to T substitution frequencies). Second, for mapping the sequenceswithBWA we allowed up to 5 mismatches and one insertion or deletion to prevent theloss of sequences with several damage-derived C to T substitutions, but theseparametersare extremelypermissive and do not sufficientlydiscriminative againstspurious alignments. We therefore removed sequences showing more than twodifferences to thehuman mitochondrial reference genome thatcannotbe explainedby cytosine deamination (that is, sequences with more than two non-C-to-T sub-stitutions in the orientation as sequenced). In addition, we limited the maximumnumber of acceptable differences to the reference to five, counting also insertionsor deletions. Third, sequences witha mappingqualityof lessthan 30 wereremovedto ensure secure placement within the mitochondrial genome. Fourth, sequenceslonger than 45 bp were removed, because they are particularly rich in contamina-

    tion (see Fig. 3b).

    LETTER RESEARCH

    Macmillan Publishers Limited. All rights reserved2013

  • 8/13/2019 Mitochondrial genome.pdf

    6/17

  • 8/13/2019 Mitochondrial genome.pdf

    7/17

    41. Briggs, A. W.et al.Targeted retrieval and analysis of five Neandertal mtDNAgenomes. Science325, 318321 (2009).

    42. Zsurka, G.et al. Distinct patterns of mitochondrial genome diversity in bonobos(Pan paniscus) and humans.BMC Evol. Biol.10, 270 (2010).

    43. Bjork, A., Liu, W.,Wertheim,J. O.,Hahn, B.H. & Worobey,M. Evolutionaryhistoryofchimpanzees inferred from complete mitochondrial genomes. Mol. Biol. Evol.28,615623 (2011).

    44. Katoh, K. & Standley,D. M. MAFFT multiple sequence alignmentsoftware version7: improvements in performance and usability.Mol. Biol. Evol.30,772780(2013).

    45. Posada, D. & Crandall, K. A. MODELTEST: testing the model of DNA substitution.

    Bioinformatics 14, 817818 (1998).

    46. Drummond, A. J. & Rambaut, A. BEAST: Bayesian evolutionary analysis bysampling trees.BMC Evol. Biol.7, 214 (2007).

    47. Kass, R. E. & Raftery, A. E. Bayes Factors.J. Am. Stat. Assoc. 90, 773795 (1995).48. Horai, S.et al.Mans place in Hominoidea revealed by mitochondrial DNA

    genealogy. J. Mol. Evol.35,3243 (1992).49. Green, R. E.et al.A complete Neandertal mitochondrial genome sequence

    determined by high-throughput sequencing.Cell 134, 416426 (2008).50. Stone, A.C. etal. Morereliableestimatesof divergencetimesin Pan usingcomplete

    mtDNA sequences and accounting for population structure.Phil. Trans. R. Soc.Lond. B365,32773288 (2010).

    51. Soares, P.et al.Correcting for purifying selection: an improved human

    mitochondrial molecular clock.Am. J. Hum. Genet.84,740759 (2009).

    LETTER RESEARCH

    Macmillan Publishers Limited. All rights reserved2013

  • 8/13/2019 Mitochondrial genome.pdf

    8/17

    Extended Data Figure 1| Size distribution of all overlap-merged sequences generated by shotgun sequencing (before mapping).

    RESEARCH LETTER

    Macmillan Publishers Limited. All rights reserved2013

  • 8/13/2019 Mitochondrial genome.pdf

    9/17

  • 8/13/2019 Mitochondrial genome.pdf

    10/17

    Extended Data Figure 3| Sequence length distribution of unique sequences. The distribution obtained from the Sima de los Huesos cave bear is shown forcomparison.

    RESEARCH LETTER

    Macmillan Publishers Limited. All rights reserved2013

  • 8/13/2019 Mitochondrial genome.pdf

    11/17

    Extended Data Figure 4| Sequence coverage of the mitochondrial genome obtained from sequences with terminal C to T substitutions.

    LETTER RESEARCH

    Macmillan Publishers Limited. All rights reserved2013

  • 8/13/2019 Mitochondrial genome.pdf

    12/17

    ExtendedData Figure 5 | Sequence coverage of themitochondrial genomeplotted separately forboth capture probe sets used (basedon sequenceswith a Cto T substitution at the first or last alignment position).

    RESEARCH LETTER

    Macmillan Publishers Limited. All rights reserved2013

  • 8/13/2019 Mitochondrial genome.pdf

    13/17

    Extended Data Figure 6| Complete view of the mid-point rootedphylogenetic tree constructed with a Bayesian approach under aGTR1 I1Cmodel of sequence evolution using the Sima de los Huesos

    consensus sequence generated with inclusive filters as wellas 54 present-dayhumans, 9 ancient humans, 7 Neanderthals, 2 Denosivans, 22 bonobos and24 chimpanzees. Theposterior probabilities are providedfor the major nodes.

    LETTER RESEARCH

    Macmillan Publishers Limited. All rights reserved2013

  • 8/13/2019 Mitochondrial genome.pdf

    14/17

  • 8/13/2019 Mitochondrial genome.pdf

    15/17

    Extended Data Table 2| Results from shallow shotgun sequencing of a subset of libraries

    LETTER RESEARCH

    Macmillan Publishers Limited. All rights reserved2013

  • 8/13/2019 Mitochondrial genome.pdf

    16/17

    Extended Data Table 3| Number of sequences retained in the sample libraries after each step of processing and filtering

    Sequences obtained from the blank controls are not included. All filters are additive in the order indicated.

    RESEARCH LETTER

    Macmillan Publishers Limited. All rights reserved2013

  • 8/13/2019 Mitochondrial genome.pdf

    17/17

    Extended Data Table 4| Inferred time to the most recent common ancestor (TMRCA) of the modern human, Neanderthal, chimpanzee andbonobo mtDNAs, as well as divergence estimates for human/chimpanzee and bonobo/chimpanzee mtDNA (continuation of Table 1)

    LETTER RESEARCH