Lesson 5 Genetic Variant Annotation


Lesson 5

Genetic Variant Annotation

Linlin Yan (颜林林), Center for Bioinformatics, Peking University

Jun 13, 2011

CBI Tech. Workshop - NGS Special Session

Outline

Review & Overview

Thoughts & Methods: Variant Browsing, Variant Annotation, Association Study, More Beyond

Demos & Exercises

Part I: Review & Overview

Workshop Schedule

| Lesson | Topic | Title | Speaker | Date |
|---|---|---|---|---|
| 0 | Warm-up | Warm-up and Introduction | GaoG | 4-25 |
| 1 | Basic | File Format & Reads Mapping | YanLL | 5-9 |
| 2 | | Solexa Pipeline | CaiT | 5-16 |
| 3 | Genetics | Alignment File Manipulate | YeYX | 5-23 |
| 4 | | Genetic Variant Caller | LiuH | 5-30 |
| 5 | | Genetic Variant Annotation | YanLL | 6-13 |
| 6 | | Genome Assembling | LiZ | 6-20 |
| 7 | Transcriptome (RNA-Seq) | ... | CaiT | 6-27 |
| 8 | | Transcript Mapping | ZhaoHQ | 7-4 |
| 9 | | Transcript Assembling | LiuXQ | 7-11 |
| 10 | | Differential Expression Caller | ChenWB | 7-18 |
| 11 | ChIP-Seq | Peak Caller | TangX | 7-25 |

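NGS Analysis Workflow

[Workflow diagram] Sequencer -> Short Reads -> Assembling / Mapping

Assembling -> Contigs / Scaffolds

Mapping -> Alignments -> Call Variants -> SNV / CNV / SV

Alignments -> Calculate Expression -> Expression Profile

Alignments -> Call Peaks -> Peaks / Regions

Downstream of these results: Annotation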

Genetic Variant Analysis Workflow

Workflow: Sequencer -> Short Reads -> Mapping -> Alignments -> Call Variants -> SNV / CNV / SV -> Annotation

Solexa Pipeline (Lesson 2)

File Format (Lesson 1): FASTQ / Quality / SAM / ...

Reads Mapping (Lesson 1): Maq / Bowtie / BWA

Alignment File Manipulate (Lesson 3): SAMtools / BEDTools / FASTX-Toolkit

Genetic Variant Caller (Lesson 4): GATK

Genetic Variant Annotation (Lesson 5): PolyPhen / SIFT / ANNOVAR / PLINK / ...

Part II: Thoughts & Methods

What Could Be Inferred from Variants

What is at the positions? => Genome Browser

How do they affect function? => Variant Annotation

What is related to phenotype? => Association Study

More beyond ... => Disease: CDCV vs. CDRV

Diagram labels: Genetic Variants (SNV / CNV / SV), Genome Annotation, Mutation Effects, Phenotype, Disease

Genome Browser

Online Browsers:

UCSC Genome Browser: http://genome.ucsc.edu/

Ensembl Genome Browser: http://www.ensembl.org/

DNAnexus: https://dnanexus.com/genomes/hg18/public_browse

Local Browsers:

IGV (Integrative Genomics Viewer): http://www.broadinstitute.org/igv/

UCSC Genome Browser

(http://genome.ucsc.edu/cgi-bin/hgTracks?clade=mammal&org=Human&db=hg19)

UCSC Genome Browser (cont.)

Supported Formats: BED / bigBed, bedGraph, GFF, GTF, WIG / bigWig, MAF, BAM, BED detail, Personal Genome SNP, PSL

(http://genome.ucsc.edu/)
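As a rough illustration of the BED format listed above, the sketch below writes a few SNVs as custom-track text that could be uploaded to the browser. BED coordinates are 0-based and half-open, so an SNV at 1-based position p becomes start p-1, end p. The variant list, track name, and function name are hypothetical.

```python
def snvs_to_bed(snvs, track_name="my_snvs"):
    """Format (chrom, 1-based position, label) tuples as BED custom-track text."""
    lines = ['track name="%s" description="Example SNVs"' % track_name]
    for chrom, pos, label in snvs:
        # BED uses 0-based, half-open intervals: [pos - 1, pos)
        lines.append("%s\t%d\t%d\t%s" % (chrom, pos - 1, pos, label))
    return "\n".join(lines) + "\n"

# Hypothetical variants, for illustration only
example_snvs = [("chr1", 1000, "A>G"), ("chr2", 2500, "C>T")]
bed_text = snvs_to_bed(example_snvs)
print(bed_text)
```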

IGV (Integrative Genomics Viewer)

(http://www.broadinstitute.org/igv/)

UCSC: Table Browser & Public DB

Retrieve track data in batch

Retrieve sequences in specific regions

Combine regions and/or annotations

Query track data in public MySQL database

(http://genome.ucsc.edu/cgi-bin/hgTables)
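Querying track data in the public MySQL database usually comes down to an interval-overlap query on a track table. The sketch below only builds such a query; it does not connect anywhere. The table name `snp131` and the column names (`chrom`, `chromStart`, `chromEnd`, `name`) are assumptions about the track schema and should be checked against the table description in the Table Browser. The `%s` placeholders assume a MySQL driver that uses that parameter style.

```python
def region_query(table, chrom, start, end):
    """Build a parameterized SQL query for features overlapping [start, end)."""
    # Table names cannot be bound as parameters, so accept only simple identifiers.
    if not table.replace("_", "").isalnum():
        raise ValueError("unexpected table name: %r" % table)
    sql = ("SELECT chrom, chromStart, chromEnd, name FROM " + table +
           " WHERE chrom = %s AND chromStart < %s AND chromEnd > %s")
    # Overlap test for 0-based half-open intervals: feature starts before the
    # region ends and ends after the region starts.
    return sql, (chrom, end, start)

# Assumed table name and an arbitrary region, for illustration only
sql, params = region_query("snp131", "chr1", 1000000, 1001000)
print(sql)
print(params)
```

With a MySQL client library, the pair could then be passed to a cursor's `execute(sql, params)` on a connection to the public server.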

These are KNOWN variants.

How about UNKNOWN variants?

Mutation Effects Prediction

SIFT (Sorting Intolerant From Tolerant): http://sift.jcvi.org/

PolyPhen (Polymorphism Phenotyping): http://genetics.bwh.harvard.edu/pph/

MAPP (Multivariate Analysis of Protein Polymorphism): http://mendel.stanford.edu/SidowLab/downloads/MAPP/index.html

SNPs3D: http://www.snps3d.org/

Automatic Variant Annotation

ANNOVAR (ANNOtate VARiation): http://www.openbioinformatics.org/annovar/

Gene-based annotation: SNPs/CNVs that affect protein coding

Region-based annotation: variants in specific regions

Filter-based annotation: variants reported in dbSNP or 1000 Genomes; filter by SIFT score

Others: retrieve sequences or candidate gene lists in batch
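The three annotation types can be illustrated on toy data. This is a minimal sketch of the ideas, not ANNOVAR's implementation or command-line interface. The gene model, region list, known-variant set, and SIFT scores are all invented, and the 0.05 cutoff is the conventional SIFT threshold below which a substitution is predicted to be deleterious.

```python
# Toy data (all hypothetical): 1-based inclusive intervals on one chromosome
EXONS = {"GENE_A": [(100, 200), (300, 400)]}   # gene -> sorted exon intervals
REGIONS = {"conserved_1": (150, 350)}          # named regions of interest
KNOWN = {("chr1", 120, "A", "G")}              # variants already in a database
SIFT = {("chr1", 180, "C", "T"): 0.01, ("chr1", 320, "G", "A"): 0.40}

def gene_based(pos):
    """Return 'exonic:<gene>', 'intronic:<gene>', or 'intergenic'."""
    for gene, exons in EXONS.items():
        if any(s <= pos <= e for s, e in exons):
            return "exonic:" + gene
        if exons[0][0] <= pos <= exons[-1][1]:
            return "intronic:" + gene
    return "intergenic"

def region_based(pos):
    """Return the names of all regions that contain the position."""
    return [name for name, (s, e) in REGIONS.items() if s <= pos <= e]

def filter_based(variants, sift_cutoff=0.05):
    """Drop known variants, then keep those with a SIFT score below the cutoff.
    Variants without a score are treated as tolerated and dropped."""
    novel = [v for v in variants if v not in KNOWN]
    return [v for v in novel if SIFT.get(v, 1.0) < sift_cutoff]

variants = [("chr1", 120, "A", "G"), ("chr1", 180, "C", "T"),
            ("chr1", 250, "T", "C"), ("chr1", 320, "G", "A")]
for v in variants:
    print(v, gene_based(v[1]), region_based(v[1]))
print(filter_based(variants))
```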

Between Patients and Normals

Too many variants detected

Most variants are not related to target disease

Comparing the MAF (Minor Allele Frequency) between patients and normals can indicate disease-related variants

| SNP | MAF in Patients | MAF in Normals | Related |
|---|---|---|---|
| SNP1 | 5% | 5% | No |
| SNP2 | 40% | 10% | Yes |
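One simple way to make this comparison quantitative is a chi-square test on a 2x2 table of allele counts. The sketch below assumes 100 patients and 100 normals (200 alleles per group), which the slide does not state; only the MAF values come from the table above. Association tools such as PLINK run tests of this kind across many variants.

```python
import math

def allelic_chi2(minor_cases, total_cases, minor_controls, total_controls):
    """Chi-square test (1 df) on a 2x2 table of minor/major allele counts."""
    a, b = minor_cases, total_cases - minor_cases
    c, d = minor_controls, total_controls - minor_controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

ALLELES = 200  # assumed: 100 diploid individuals per group
for name, maf_pat, maf_norm in [("SNP1", 0.05, 0.05), ("SNP2", 0.40, 0.10)]:
    chi2, p = allelic_chi2(round(maf_pat * ALLELES), ALLELES,
                           round(maf_norm * ALLELES), ALLELES)
    print(name, "chi2=%.2f" % chi2, "p=%.3g" % p)
```

Under these assumed sample sizes, SNP1 shows no difference at all, while SNP2 gives a large statistic and a very small p-value, matching the "Related" column.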

More Beyond: Find Out Causal Gene

Two Disease Hypothesis Models:

CDCV: Common Disease, Common Variant

CDRV: Common Disease, Rare Variant

To Find Out Rare Variants:

From GWAS (Microarray) to Sequencing

More Samples

Pool-up analysis methods

Rare Variant Analysis

Gene-Based Method

(PMID:17660818)

Pool Up The Rare Variants

Fixed-Threshold Method (Li et al., 2008)

Weighted Approach (Madsen et al., 2009)

Variable-Threshold Method (VT-Test) (Price et al., 2010): http://genetics.bwh.harvard.edu/rare_variants/
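The idea behind fixed-threshold pooling can be sketched as follows: variants in a gene whose MAF falls below a cutoff are collapsed into one indicator per individual (carrier of any rare allele or not), and carrier counts are then compared between groups. This is a simplified illustration, not the published method. The genotypes and the 5% cutoff are invented.

```python
def collapse_rare(genotypes, maf_cutoff=0.05):
    """genotypes: one list per individual of minor-allele counts (0, 1, 2) per variant.
    Returns a 0/1 carrier indicator per individual over variants with MAF < cutoff."""
    n_ind = len(genotypes)
    n_var = len(genotypes[0])
    # MAF per variant, estimated from all individuals passed in
    mafs = [sum(ind[j] for ind in genotypes) / (2.0 * n_ind) for j in range(n_var)]
    rare = [j for j in range(n_var) if mafs[j] < maf_cutoff]
    return [int(any(ind[j] > 0 for j in rare)) for ind in genotypes]

# Hypothetical gene with 3 variants; variant 0 is common, variants 1 and 2 are rare
cases = [[1, 1, 0], [0, 1, 0], [1, 0, 1]] + [[1, 0, 0]] * 4 + [[0, 0, 0]] * 5
controls = [[1, 0, 1]] + [[1, 0, 0]] * 5 + [[0, 0, 0]] * 6

carriers = collapse_rare(cases + controls)
case_carriers = sum(carriers[:len(cases)])
control_carriers = sum(carriers[len(cases):])
print("rare-variant carriers: cases=%d controls=%d" % (case_carriers, control_carriers))
```

The two carrier counts could then be compared with a standard 2x2 test. The weighted and variable-threshold approaches change how variants are selected or weighted before pooling.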

Part III: Demos & Exercises

Demos

Data Preparing: Reads Mapping, Variant Calling, BED/Wig generation
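For the BED/Wig generation step, a per-base depth signal can be written in the variableStep WIG format, which uses 1-based positions. This is a minimal sketch: the depth values and function name are hypothetical, and in practice the depths would be derived from the alignment file.

```python
def depth_to_wig(chrom, depths, track_name="coverage"):
    """depths: dict of 1-based position -> read depth. Returns variableStep WIG text."""
    lines = ['track type=wiggle_0 name="%s"' % track_name,
             "variableStep chrom=%s span=1" % chrom]
    for pos in sorted(depths):
        if depths[pos] > 0:  # skip positions with no coverage
            lines.append("%d %d" % (pos, depths[pos]))
    return "\n".join(lines) + "\n"

# Hypothetical depths, for illustration only
wig_text = depth_to_wig("chr1", {1001: 12, 1000: 8, 1002: 0, 1005: 3})
print(wig_text)
```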

Demos (cont.)

UCSC Genome Browser: Uploading BAM/BED/Wig

IGV Genome Browser: Loading BAM/BED/Wig

UCSC Table Browser: Retrieve track data, Retrieve coding sequences

UCSC Public Database

Demos (cont.)

SIFT & PolyPhen

ANNOVAR

PLINK

VT-Test

Thanks for your attention!
