Xin Zhou - Saturday Closing Plenary

Taxon diversity analysis for bulk insect samples using the Illumina HiSeq platform
Xin ZHOU, Shanlin LIU, Yiyuan LI, Qing YANG, and Xu SU
Department of Science and Technology, Environmental Genomics Research Group, BGI, China
Adelaide, Australia, 3 December 2011

Problem / Solutions?
  • Opt. 1: ......zzzzZZZZZ
  • Opt. 2: morph sorting → individual ID → Opt. 1
  • Opt. 3: morph sorting → individual barcoding → Opt. 1
  • Opt. 4: grinding up → NGS → clustering/BLAST → DIVERSITY!
Zhou et al. 2011, 4th International Barcode of Life Conference

Environmental barcoding of bulk insects
  • Aquatic insects: mini-barcode (130 bp), 454
  • Bat diet (insects): COI fragment, 157 bp, 454
  • Malaise trap (insects): COI fragment, ~400 bp, 454 (Yu D.W. et al., "Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring", in review)

Major NGS platforms applicable in environmental barcoding
  NGS platform                  Read length   Data/run (GB)   Run time   Library construction required
  454 (GS FLX Titanium XL+)     ~400 bp       0.7             23 hr      Yes
  Illumina HiSeq 2000           150 bp PE     600             14 d       Yes
  Illumina MiSeq                150 bp PE     2               27 hr      Yes
  Ion Torrent                   200 bp        ~1              3.5 hr     No
Illumina HiSeq:
  • Higher throughput
  • Lower cost per bp
  • Increasing read length
  • A variety of bioinformatics tools available from genomic pipelines

Sequencing capacity at BGI
  • Instruments: 28 Illumina GAIIx, 137 Illumina HiSeq 2000, 25 Life Tech SOLiD 4, 16 ABI 3730XL, 110 MegaBACE, 2 Illumina iScan, 1 Roche 454, 1 Ion Torrent, 1 Illumina MiSeq
  • Data production: 100 Gb/day (2009); >5 Tb/day (end of 2010), i.e. >1500X human genome per day

What I am NOT going to talk about:
  • Primer optimization
  • Systematic comparisons of NGS platforms
  • Quantitative diversity analysis
What I AM going to talk about:
  • Can Illumina NGS be used in diversity analysis?

Can Illumina NGS be used in diversity analysis?
  • Sequencing error rate
  • Read length
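The throughput figures on the slides can be sanity-checked with simple arithmetic. This is a minimal sketch, assuming a haploid human genome of about 3.1 Gb (not stated on the slides) and reading the table's "GB" column as gigabases; the function names are hypothetical.

```python
# Back-of-envelope checks on throughput figures quoted on the slides.
# Assumption (not from the slides): a haploid human genome of ~3.1 Gb.
# Assumption: the table's "GB" column is read as gigabases.
HUMAN_GENOME_BP = 3.1e9


def genome_equivalents(bases_per_day: float, genome_bp: float = HUMAN_GENOME_BP) -> float:
    """How many human-genome-sized coverages a given daily output represents."""
    return bases_per_day / genome_bp


def output_per_hour(data_gb: float, run_hours: float) -> float:
    """Average output per hour of run time, in the table's GB units."""
    return data_gb / run_hours


# BGI output at the end of 2010, as stated on the slide: >5 Tb/day.
print(f"BGI: ~{genome_equivalents(5e12):.0f}X human genome per day")

# (data per run in GB, run time in hours), taken from the platform table.
platforms = {
    "454 GS FLX Titanium XL+": (0.7, 23),
    "Illumina HiSeq 2000": (600, 14 * 24),
    "Illumina MiSeq": (2, 27),
}
for name, (gb, hours) in platforms.items():
    print(f"{name}: {output_per_hour(gb, hours):.3f} GB per hour")
```

On these assumptions, 5 Tb/day corresponds to roughly 1,600 human genomes, consistent with the ">1500X" on the slide, and per hour of run time the HiSeq 2000 yields roughly 60 times more data than the 454.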
Sequencing error rate
  • No indel issue in homopolymers
  • Sequencing quality keeps increasing: with Illumina's V3 chemistry, even at 100 bp only about 10% of base calls have an error rate >1%
  • Rare nucleotide errors can be easily corrected by increasing sequencing depth, by paired-end (PE) sequencing, and by setting stringent matching criteria in the overlapping fragment (allowing only >99% identity)
  • 150 bp + 150 bp PE reads with a 250 nt insert size enable forming sequence contigs

Read length
  • Read length keeps increasing
  • 150PE with a 250 nt insert size yields a contig read of 250 bp
  • Shotgun reads can be further assembled into longer fragments (the shotgun assembly strategy used in genome sequencing projects)
  • Option of scaffold assembly

Illumina environmental barcoding (e-barcoding)
  • PCR based, Lib1 (658 bp, 150PE): full-length COI barcode PE sequencing → full-length COI
  • PCR based, Lib2 (200 bp, 150PE): COI amplicon shotgun PE sequencing → full-length COI
  • PCR free: mitochondrial shotgun PE sequencing → full-length COI without PCR bias

Approach #1: PCR-based
  Sample information       Mock                      XSBN (provided by Yu et al.)
  # Specimens              23                        292
  # Haplotypes (2%)        12                        230
  Soup protocol            DNA extracted individually and mixed for PCR
  PCR primers              LepF1/LepR1               Customized
  Sequence length          658 bp                    700 bp
  Sequencing library       Full length (658 bp) + shotgun library (~200 bp)
  Sequencing protocol      150PE

Approach #1: PCR-based, pre-analysis data filtering (Lib 1)
                                       Mock          XSBN
  Raw data                             1.67G         4.04G
  After adapter filtering              1.60G         1.28G
  High quality (Q20)                   0.35G         0.50G
  # Reads (primer removed)             1,081,997     1,150,477
  # Unique reads (abundance > 1)       36,618        45,444
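The "Sequencing error rate" and "Read length" slides describe joining 150 bp + 150 bp read pairs from a 250 nt insert through their overlap, accepting only overlaps with >99% identity. A minimal sketch of that idea follows; `merge_pair`, `min_overlap` and the "keep the read-1 base" rule are hypothetical choices, not the pipeline used in the talk. Note that a 250 nt insert leaves a 50 bp overlap, where a single mismatch already gives 98% identity, so the >99% rule effectively requires a perfect overlap.

```python
import random
from typing import Optional

# Complement table for A/C/G/T (N maps to itself).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 20,
               min_identity: float = 0.99) -> Optional[str]:
    """Merge a read pair into one contig, or return None if no overlap qualifies.

    read2 is taken as sequenced (opposite strand) and reverse-complemented first.
    Overlap lengths are tried from longest to shortest; the first whose identity
    is strictly greater than min_identity is accepted. Inside the overlap the
    read-1 base is kept (no quality-aware consensus in this sketch).
    """
    r2 = reverse_complement(read2)
    for olap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail = read1[len(read1) - olap:]  # 3' end of read 1
        head = r2[:olap]                  # 5' end of reverse-complemented read 2
        matches = sum(x == y for x, y in zip(tail, head))
        if matches / olap > min_identity:
            return read1 + r2[olap:]
    return None


# Usage: a simulated 250 nt insert read as 150 bp + 150 bp pairs (50 bp overlap).
rng = random.Random(42)
fragment = "".join(rng.choice("ACGT") for _ in range(250))
read1 = fragment[:150]
read2 = reverse_complement(fragment[-150:])
contig = merge_pair(read1, read2)
print(contig == fragment)
```

Real mergers also weigh base qualities in the overlap; this sketch only shows why a stringent identity cutoff over the overlap rejects pairs carrying a sequencing error there.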
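The pre-analysis filtering table reports a Q20 quality step and then counts unique reads with abundance > 1. The slides do not say exactly how Q20 was applied, so the sketch below assumes Phred+33 quality strings and an "every base ≥ Q20" rule; `passes_q20` and `dereplicate` are hypothetical helper names.

```python
from collections import Counter


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in quality]


def passes_q20(quality: str, threshold: int = 20) -> bool:
    """Assumed rule: keep a read only if every base call is at least Q20."""
    return min(phred_scores(quality)) >= threshold


def dereplicate(reads: list[str], min_abundance: int = 2) -> dict[str, int]:
    """Collapse identical reads; keep sequences seen >= min_abundance times
    (min_abundance=2 corresponds to "abundance > 1", i.e. singletons dropped)."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.most_common() if n >= min_abundance}


# Usage with toy (sequence, quality) records; 'I' encodes Q40 and '#' encodes Q2.
records = [
    ("ACGTACGT", "IIIIIIII"),
    ("ACGTACGT", "IIIIIIII"),
    ("ACGTACGA", "IIII#III"),  # one low-quality base: fails the Q20 filter
    ("TTGCAAGC", "IIIIIIII"),  # passes, but is a singleton
]
high_quality = [seq for seq, qual in records if passes_q20(qual)]
unique_reads = dereplicate(high_quality)
print(unique_reads)
```

Dropping singletons removes many reads that carry a unique sequencing error, at the cost of also discarding genuinely rare sequences.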
OTU filtering workflow
  Step:   Unique reads (abundance > 1) → OTU cluster (98%) → Alignment → Remove chimeras → Compared to reads of Lib 2
  Mock:   36,618 → 784 → 490 → 119 → 44
  XSBN:   45,444 → 4,189 → 3,887 → 403 → 399

Sanger reference vs. NGS OTUs (BLAST at 100% identity)
                                  Reference only   Shared   NGS OTUs only
  Mock (LepF1/R1 primers)         4                8        36
  XSBN (customized primers)       32               198      197

Sanger reference vs. NGS OTUs: Mock
  • False negatives (4): not found in raw data, likely due to primer failure
  • Shared: 8
  • False positives? (36): 31 can be found in our total sample, from which our mock samples were assembled; 5 are likely to be PCR errors

Sanger reference vs. NGS OTUs: XSBN
  • Reference only (32): 17 not found in raw data (primer failure); 15 were lost in data filtering
  • Shared: 198
  • NGS OTUs only (197): cross-sample contamination?
  • [Chart: mean + SE, group 1 vs. group 2]

Sanger reference vs. NGS OTUs
  • Significantly fewer false positives after removal of sequences with abundance …

… >98% identity; 2. Reference coverage > 90%
  • Illustration: Reference 1 (coverage 100%) = correct mapping; Reference 2 (coverage 30%) = incorrect mapping
  Taxon groups      # OTUs
  Lepidoptera       20
  Diptera           2
  Hemiptera         3
  Psocoptera        1
  Total             26
  Not found         13

Potential sources of failure in detecting taxa
  • Taxon-specific? Or biomass (size & number)?

Failures in taxon detection: taxon bias?
  Taxon groups     # Total OTUs   # OTUs undetected (missing)
  Lepidoptera      25             5
  Diptera          7              5
  Hymenoptera      2              2
  Hemiptera        4              1
  Psocoptera       1              0
  Total            39             13

Failures in taxon detection: OR biomass (body size, # individuals)?
  • Readily detected: average length > 5 mm
  • Missing: average length < 5 mm
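The OTU filtering workflow clusters unique reads at 98% identity. The slides do not name the clustering tool, so the sketch below shows one common approach, greedy abundance-sorted centroid clustering, as an illustration only. It assumes equal-length, pre-trimmed sequences so identity can be counted position by position; real pipelines align first. `identity` and `greedy_otus` are hypothetical names.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions. Assumes equal-length, pre-trimmed
    sequences, so no alignment is done (real pipelines align first)."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(abundances: dict[str, int], threshold: float = 0.98) -> dict[str, list[str]]:
    """Greedy centroid clustering: sequences are visited from most to least
    abundant; each joins the first existing centroid with identity >= threshold,
    otherwise it becomes the centroid of a new OTU."""
    otus: dict[str, list[str]] = {}
    for seq in sorted(abundances, key=abundances.get, reverse=True):
        for centroid, members in otus.items():
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            otus[seq] = [seq]
    return otus


# Usage with toy 100 bp sequences and made-up abundances.
centroid = "ACGT" * 25            # toy haplotype
one_off = "T" + centroid[1:]      # 1 difference  -> 99% identity, joins
three_off = "TTT" + centroid[3:]  # 3 differences -> 97% identity, new OTU
other = "TGCA" * 25               # unrelated sequence, new OTU
otus = greedy_otus({centroid: 120, other: 40, one_off: 5, three_off: 3})
for c, members in otus.items():
    print(len(members), "member(s); centroid starts", c[:8])
```

On a 658 bp barcode, a 98% threshold tolerates at most 13 differences (645/658 ≈ 0.980).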
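The taxon-bias table can be summarised as per-group detection rates. Reading "detected" as total OTUs minus undetected OTUs (our interpretation), the counts reproduce the OTU numbers listed under correct mapping (Lepidoptera 20, Diptera 2, Hemiptera 3, Psocoptera 1; total 26), and 26 + 13 not found gives the 39 total. A minimal sketch using only the counts on the slides:

```python
# (total OTUs, undetected OTUs) per taxon group, from the taxon-bias table.
TABLE = {
    "Lepidoptera": (25, 5),
    "Diptera": (7, 5),
    "Hymenoptera": (2, 2),
    "Hemiptera": (4, 1),
    "Psocoptera": (1, 0),
}


def detected_counts(table: dict[str, tuple[int, int]]) -> dict[str, int]:
    """OTUs detected per group = total OTUs minus undetected OTUs."""
    return {group: total - missing for group, (total, missing) in table.items()}


def detection_rates(table: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Fraction of each group's OTUs that were detected."""
    return {group: (total - missing) / total for group, (total, missing) in table.items()}


detected = detected_counts(TABLE)
rates = detection_rates(TABLE)
total_otus = sum(total for total, _ in TABLE.values())
total_missing = sum(missing for _, missing in TABLE.values())
for group, rate in rates.items():
    print(f"{group:12s} {detected[group]:>2d}/{TABLE[group][0]:<2d} detected ({rate:.0%})")
print(f"Overall: {total_otus - total_missing}/{total_otus} detected")
```

With totals as small as 1 or 2 OTUs in some groups, per-group rates should be read with caution, which fits the slides' turn toward biomass as an alternative explanation.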
Approach #2: PCR-free method
Method 2: Reference independent (Will we be able to identify diversity without reference MT genomes for the targeted species?)
Workflow:
  1. Assembly of the COI gene using a genome assembly program (SOAPdenovo);
  2. Annotation using ~240 MT genomes downloaded from GenBank.

PCR-free reference-independent: results
  • 23/31 fall in the standard COI barcode region (mostly >600 bp)
  • 1 of 23 is not in our reference barcodes (Insecta; Lepidoptera; Pyralidae)
  • Multiple genes obtained simultaneously
  • 1 nearly complete mitochondrial genome (~15 kbp)
  • 3 fragments >6000 bp

Reference independent
  • 23/31 fall in the standard COI barcode region (mostly >600 bp); 1 of 23 was not present in our reference barcodes (Insecta; Lepidoptera; Pyralidae)
  • Number of individuals we collected: 89 individuals; 5 individuals failed in Sanger sequencing
  • Barcode references: 39 OTUs (84 individuals)
  • 3 OTUs not detected in the reference independent method because: (1) sequencing depth is too low
  • References based: 26 OTUs (
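SOAPdenovo is a de Bruijn graph assembler. The toy sketch below only illustrates the de Bruijn idea of stitching overlapping shotgun reads into a longer contig; it is not SOAPdenovo's algorithm. It has no error correction, repeat resolution or paired-end scaffolding, and it treats the molecule as linear even though mitochondrial genomes are circular. `build_graph` and `walk_contig` are hypothetical names.

```python
from collections import defaultdict


def build_graph(reads: list[str], k: int) -> dict[str, set[str]]:
    """De Bruijn graph: nodes are (k-1)-mers; every distinct k-mer seen in the
    reads adds one edge from its prefix to its suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_contig(graph: dict[str, set[str]]) -> str:
    """Start at a node with no incoming edges and extend while the next step is
    unambiguous (exactly one successor). Linear-molecule toy: circular genomes
    and branches caused by sequencing errors or repeats are not handled."""
    indegree: dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    starts = [node for node in graph if indegree[node] == 0]
    if not starts:
        raise ValueError("no start node (graph is cyclic); not handled in this sketch")
    node = starts[0]
    contig, seen = node, {node}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # guard against cycles
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Usage: three overlapping toy reads from a 20 nt sequence, assembled with k = 6.
target = "ATGCGTACGTTAGCCATGAC"
reads = [target[0:10], target[5:15], target[10:20]]
contig = walk_contig(build_graph(reads, k=6))
print(contig, contig == target)
```

Neighbouring reads must share at least k - 1 bases for the path to stay connected, which is one reason sequencing depth matters for recovering taxa without a reference.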