Sequencing technologies and Velvet assembly

Lecturer: Du Shengyang, September 29, 2012



Advances in DNA Sequencing Technology

The first generation of sequencing technologies

Chemical degradation method

Sanger method

Automated fluorescent sequencing

The second generation of sequencing technologies

454

Solexa

SOLiD


The third generation of sequencing technologies

1. Helicos BioSciences single-molecule sequencing

2. Pacific Biosciences SMRT technology

3. Oxford Nanopore Technologies nanopore single-molecule sequencing


Advantages of third-generation sequencing

High throughput, low cost, long read length, and short sequencing time

Avoids the PCR amplification step required by second-generation sequencing

Reduces the sequencing error rate and achieves true single-molecule sequencing


Keys to sequencing success

1. Sample preparation

2. Choosing the right sequencing platform

3. Downstream bioinformatics analysis


Bioinformatics analysis

Introduction

Some sequencing technologies are commercially available (e.g. 454 Sequencing, Solexa)

454 Sequencing: reads of ~100–200 bp

Solexa: reads of ~30 bp


Introduction

The Euler assembler (Pevzner 2001) used k-mers as the nodes of a de Bruijn graph

Reads are mapped as paths through the de Bruijn graph

High redundancy does not affect the number of nodes

Velvet deals effectively with experimental errors and repeats by using de Bruijn graphs built from k-mers
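The point about redundancy can be illustrated with a minimal sketch. It assumes a simplified graph with one node per k-mer and ignores reverse complements; the function names `kmers` and `build_graph` and the toy reads are illustrative and do not come from the Velvet source code.

```python
def kmers(read, k):
    """Return the k-mers of a read in order; consecutive k-mers overlap by k-1 bases."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_graph(reads, k):
    """Map each k-mer (node) to the set of k-mers that follow it in some read."""
    graph = {}
    for read in reads:
        path = kmers(read, k)          # a read is a path through the graph
        for node in path:
            graph.setdefault(node, set())
        for a, b in zip(path, path[1:]):
            graph[a].add(b)            # arc between adjacent k-mers
    return graph


reads = ["ACGTAC", "CGTACG"]           # toy reads (assumed data)
once = build_graph(reads, k=4)
many = build_graph(reads * 50, k=4)    # the same reads at 50-fold redundancy
print(len(once), len(many))            # node count is unchanged by redundancy
```

Repeating the reads only revisits existing nodes and arcs, which is why higher coverage does not enlarge the graph.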


De Bruijn Graphs - structure


De Bruijn Graphs – construction

Adjacent k-mers overlap by k-1 nucleotides

Each node is attached to a twin node

The twin node holds the reverse series of reverse-complement k-mers

This captures overlaps between reads from opposite strands

The union of a node and its twin node is called a “block”
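The relationship between a node and its twin can be sketched as follows. The representation of a node as a plain list of k-mers and the names `reverse_complement` and `twin` are assumptions made for illustration only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def twin(node_kmers):
    """Twin of a node: the reverse-complement k-mers, in reverse order."""
    return [reverse_complement(kmer) for kmer in reversed(node_kmers)]


node = ["ACGG", "CGGT", "GGTA"]        # adjacent k-mers overlap by k-1 = 3 bases
block = (node, twin(node))             # a "block" is a node plus its twin
print(block[1])
```

Taking the twin of the twin returns the original node, so the two halves of a block describe the same stretch of DNA read from opposite strands.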


De Bruijn Graphs – construction

For each k-mer, a hash table records the ID of the first read in which it occurs and its position in that read

Each k-mer is recorded together with its reverse complement

Reads are then traced through the graph

A directed arc is created between nodes where necessary
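A minimal sketch of this bookkeeping is given below. Storing a k-mer and its reverse complement under one shared key (the lexicographically smaller of the two) is an assumption chosen for simplicity, as are the function name `index_reads` and the toy reads; Velvet's actual data structures are more involved.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def index_reads(reads, k):
    """Record the first (read ID, position) of every k-mer and the arcs between k-mers."""
    first_seen = {}                    # shared key -> (read ID, position)
    arcs = set()                       # directed arcs between consecutive k-mers
    for read_id, read in enumerate(reads):
        previous = None
        for pos in range(len(read) - k + 1):
            kmer = read[pos:pos + k]
            # Assumption: a k-mer and its reverse complement share one entry.
            key = min(kmer, reverse_complement(kmer))
            first_seen.setdefault(key, (read_id, pos))
            if previous is not None:
                arcs.add((previous, kmer))   # created only if not already present
            previous = kmer
    return first_seen, arcs


reads = ["GATTC", "GAATC"]             # the second read is the reverse complement of the first
first_seen, arcs = index_reads(reads, k=3)
print(first_seen)
```

Because the second read comes from the opposite strand, it adds no new hash-table entries; every one of its k-mers was already recorded by the first read.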


De Bruijn Graphs – simplification

Chains of blocks are simplified with no loss of information

If node A has only one outgoing arc, pointing to node B, and node B has only one incoming arc, the two nodes are merged

[Figure: nodes A and B merged into a single node]
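The merge rule can be sketched on a small directed graph. The dictionaries `succ` (successors of each node) and `seq` (sequence held by each node), and the function name `simplify`, are assumptions for illustration; every node is assumed to appear as a key of `succ`.

```python
def simplify(succ, seq, k):
    """Merge A and B whenever A has one outgoing arc (to B) and B has one incoming arc."""
    succ = {node: set(outs) for node, outs in succ.items()}
    seq = dict(seq)
    changed = True
    while changed:
        changed = False
        # Recompute predecessors after every merge.
        pred = {node: set() for node in succ}
        for a, outs in succ.items():
            for b in outs:
                pred[b].add(a)
        for a in list(succ):
            if len(succ[a]) != 1:
                continue
            (b,) = succ[a]
            if b == a or len(pred[b]) != 1:
                continue
            seq[a] += seq[b][k - 1:]   # append only the bases not in the k-1 overlap
            succ[a] = succ.pop(b)      # A inherits B's outgoing arcs
            del seq[b]
            changed = True
            break
    return succ, seq


succ = {"ACG": {"CGT"}, "CGT": {"GTA"}, "GTA": {"TAC", "TAG"}, "TAC": set(), "TAG": set()}
seq = {node: node for node in succ}    # each node starts as a single 3-mer
merged_succ, merged_seq = simplify(succ, seq, k=3)
print(merged_seq["ACG"], merged_succ["ACG"])
```

The linear chain ACG, CGT, GTA collapses into one node carrying ACGTA, while the branch to TAC and TAG is kept, so no sequence information is lost.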


De Bruijn Graphs – error removal

Velvet focuses on the topological features of the graph

First step: remove tips

Tip: a chain of nodes that is disconnected at one end

Two criteria are used: (1) length and (2) minority count

Length: a tip is removed only if it is shorter than 2k bp, since two nearby errors can create a tip of up to 2k bp

[Figure: two nearby errors, each contributing up to k bp to a tip]


De Bruijn Graphs – error removal

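Minority count: the arc leading into the tip has multiplicity m, lower than the multiplicity n of another arc leaving the same node (m < n)

Starting from node B, going through the tip is an alternative to a more common path

[Figure: nodes A, B and C with a tip branching off at B; arc multiplicities m (tip) and n (more common path)]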
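The two tip criteria, length and minority count, can be combined in a short decision function. This is a sketch under the assumption that the tip length and the arc multiplicities at the junction node are already known; the function and parameter names are hypothetical.

```python
def should_clip_tip(tip_length_bp, k, tip_multiplicity, other_multiplicities):
    """Return True if a tip meets both removal criteria.

    tip_length_bp        -- length of the tip in bp
    tip_multiplicity     -- m, multiplicity of the arc leading into the tip
    other_multiplicities -- multiplicities n of the other arcs leaving the junction node
    """
    short_enough = tip_length_bp < 2 * k
    minority = any(tip_multiplicity < n for n in other_multiplicities)
    return short_enough and minority


k = 31
print(should_clip_tip(40, k, 1, [12]))    # short and in the minority
print(should_clip_tip(70, k, 1, [12]))    # too long (70 >= 2k = 62)
print(should_clip_tip(40, k, 12, [3]))    # short, but the tip is the more common path
```

Requiring both conditions protects a short but well-supported branch from being clipped.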


De Bruijn Graphs – error removal

Second step: remove bubbles using the Tour Bus algorithm

A bubble is formed by redundant paths that start and end at the same nodes

Bubbles are created by errors or by biological variants such as SNPs

[Figure: a bubble]


De Bruijn Graphs – error removal

Tour Bus:

1. Detect redundant paths

2. Compare them using dynamic programming

3. If they are similar, merge them
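Steps 2 and 3 can be sketched with a standard edit-distance computation. This is not the Tour Bus implementation: path detection (step 1) is omitted, the similarity threshold `max_divergence` is a hypothetical value, and "merging" is simplified here to keeping the path with the higher coverage.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, computed by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]


def resolve_bubble(path_a, path_b, max_divergence=0.2):
    """Each path is a (sequence, coverage) tuple. Return the list of paths that are kept."""
    (seq_a, cov_a), (seq_b, cov_b) = path_a, path_b
    divergence = edit_distance(seq_a, seq_b) / max(len(seq_a), len(seq_b))
    if divergence > max_divergence:
        return [path_a, path_b]        # too different: treat as distinct sequences
    return [path_a if cov_a >= cov_b else path_b]


print(resolve_bubble(("ACGTACGT", 30), ("ACGAACGT", 2)))
print(resolve_bubble(("AAAAAAAA", 10), ("CCCCCCCC", 9)))
```

In the first call the two paths differ by a single base, so only the well-covered path survives; in the second the paths are unrelated and both are kept.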


De Bruijn Graphs – error removal

Third step: remove erroneous connections

Erroneous connections are removed after the Tour Bus algorithm has run

They are removed with a basic coverage cutoff

Genuine short nodes that cannot be simplified in the graph should have high coverage
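A basic coverage cutoff amounts to a simple filter over the nodes. In this sketch the node table, the cutoff value and the function name are all assumed for illustration.

```python
def apply_coverage_cutoff(nodes, cutoff):
    """Keep only nodes whose coverage reaches the cutoff.

    nodes -- dict mapping node ID to (length in bp, coverage)
    """
    return {node_id: (length, coverage)
            for node_id, (length, coverage) in nodes.items()
            if coverage >= cutoff}


nodes = {
    "n1": (120, 28.5),   # long, well-covered node
    "n2": (33, 1.2),     # short, low-coverage node: likely an erroneous connection
    "n3": (35, 31.0),    # short but genuine: high coverage
}
kept = apply_coverage_cutoff(nodes, cutoff=4.0)
print(sorted(kept))
```

The short node n3 survives because its coverage is high, matching the expectation stated above for genuine short nodes.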


Breadcrumb: resolution of repeats

1. Using read pairs, pair up the long nodes

2. Flag paired reads using unambiguous long nodes

[Figure: unambiguous long nodes]


Breadcrumb: resolution of repeats

Extend the nodes as far as possible using the flagged paired reads

All nodes between A and B are paired up to either A or B
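The pairing and flagging ideas can be sketched very roughly as follows. This is a strong simplification and not the Breadcrumb implementation: each read pair is assumed to be already mapped to a pair of node IDs, the set of long nodes is assumed to be given, and `min_pairs` is a hypothetical support threshold.

```python
from collections import Counter, defaultdict


def pair_long_nodes(read_pairs, long_nodes, min_pairs=2):
    """Pair up long nodes that are linked, unambiguously, by enough read pairs."""
    links = Counter()
    for n1, n2 in read_pairs:
        if n1 in long_nodes and n2 in long_nodes and n1 != n2:
            links[frozenset((n1, n2))] += 1
    partners = defaultdict(set)
    for link, count in links.items():
        if count >= min_pairs:
            a, b = link
            partners[a].add(b)
            partners[b].add(a)
    pairs = set()
    for a, candidates in partners.items():
        if len(candidates) == 1:
            (b,) = candidates
            if partners[b] == {a}:     # each node is the other's only partner
                pairs.add(frozenset((a, b)))
    return pairs


def flag_short_nodes(read_pairs, long_nodes):
    """Flag each short node with the long nodes whose read mates land in it."""
    flags = defaultdict(set)
    for n1, n2 in read_pairs:
        for anchor, mate in ((n1, n2), (n2, n1)):
            if anchor in long_nodes and mate not in long_nodes:
                flags[mate].add(anchor)
    return dict(flags)


long_nodes = {"A", "B", "C"}
read_pairs = [("A", "B"), ("B", "A"), ("A", "x"), ("y", "B"), ("A", "C")]
print(pair_long_nodes(read_pairs, long_nodes))
print(flag_short_nodes(read_pairs, long_nodes))
```

Here A and B are paired because two read pairs link them, the single A to C link falls below the threshold, and the short nodes x and y are flagged as belonging to A and B respectively.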


Experimental Results

Test of the error removal pipeline on simulated data

Simulated reads are from E. coli, S. cerevisiae, C. elegans, and H. sapiens


Experimental Results

Test of the error removal pipeline on experimental data

A 173,428 bp human BAC was sequenced using Solexa machines

Reads were 35 bp long, and k = 31

Tour Bus increased sensitivity by correcting errors and preserved the integrity of the graph structure


Experimental Results (cont)


Conclusions

Velvet is a de Bruijn graph-based sequence assembly method for short reads

Errors are handled by tip removal and the Tour Bus algorithm

A large number of repeats are resolved by the Breadcrumb algorithm

Velvet was assessed on simulated and real datasets and performed well
