Analysis Of High-throughput DNA Methylation Profiling 張益峯 Yi-Feng Chang PhD Candidate, Biomedical Informatics NYMU

20140613 Analysis of High Throughput DNA Methylation Profiling

Page 1: 20140613 Analysis of High Throughput DNA Methylation Profiling

Analysis Of High-throughput DNA Methylation Profiling

張益峯 (Yi-Feng Chang), PhD Candidate, Biomedical Informatics, NYMU

Page 2: 20140613 Analysis of High Throughput DNA Methylation Profiling

Outline

DNA methylation

Fundamentals of bisulfite sequencing technology

Current status of published BS-Seq resources

Information that can be presented in a BS-Seq study

Published tools for analyzing BS-Seq data

A comprehensive BS-Seq analysis tool: MethPipe

Page 3: 20140613 Analysis of High Throughput DNA Methylation Profiling

Epigenetics Overview

http://commonfund.nih.gov/epigenomics/figure.aspx

Page 4: 20140613 Analysis of High Throughput DNA Methylation Profiling

DNA Methylation

Singal, R. & Ginder, G.D. DNA Methylation. Blood Journal 93, 4059-4070 (1999).

Page 5: 20140613 Analysis of High Throughput DNA Methylation Profiling

DNA Methylation Pathway

Moore, L.D., Le, T. & Fan, G. DNA methylation and its basic function. Neuropsychopharmacology 38, 23-38 (2013).

Singal, R. & Ginder, G.D. DNA Methylation. Blood Journal 93, 4059-4070 (1999).

Page 6: 20140613 Analysis of High Throughput DNA Methylation Profiling

DNA Demethylation Pathway

Moore, L.D., Le, T. & Fan, G. DNA methylation and its basic function. Neuropsychopharmacology 38, 23-38 (2013).

• Tet: Ten-eleven translocation enzymes

• AID/APOBEC: activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme complex

• TDG: Thymine DNA glycosylase

• SMUG1: Single-strand-selective monofunctional uracil-DNA glycosylase 1

• 5mC: 5-Methylcytosine

• 5hmC: 5-hydroxymethyl-cytosine

• 5hmU: 5-hydroxymethyl-uracil

• 5fC: 5-formyl-cytosine

• 5caC: 5-carboxy-cytosine

Page 7: 20140613 Analysis of High Throughput DNA Methylation Profiling

Timeline of DNA Methylation Analysis

Harrison, A. & Parle-McDermott, A. DNA methylation: a timeline of methods and applications. Front Genet 2, 74 (2011).

MS-HRM

MeDIP-Seq

BS-Seq

MethylC-Seq

TAB-Seq

Page 8: 20140613 Analysis of High Throughput DNA Methylation Profiling

Bisulfite Sequencing Technology

Page 9: 20140613 Analysis of High Throughput DNA Methylation Profiling

The Steps to Determining the Methylation Status of Cytosine in a Known DNA Sequence by The Bisulfite Conversion Method

Singal, R. & Ginder, G.D. DNA Methylation. Blood Journal 93, 4059-4070 (1999).
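As a rough illustration of the principle on this slide, the following Python sketch mimics bisulfite conversion in silico: unmethylated cytosines are read as T after conversion and PCR, while methylated cytosines stay C. Comparing the converted read with the known sequence then reveals the methylation status. The function names and the toy sequence are assumptions made for illustration and do not come from the cited protocol.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Return what one strand looks like after bisulfite treatment and PCR.

    Unmethylated C is deaminated to U and amplified as T; methylated C
    (0-based indices in methylated_positions) is protected and stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(reference, converted):
    """Compare a converted read with the known reference sequence.

    Returns {position: bool} for every reference C: True when the read
    still shows C (methylated), False otherwise (a T is expected).
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), converted)):
        if ref_base == "C":
            calls[i] = read_base == "C"
    return calls


reference = "ACGTCCGA"          # hypothetical known sequence
converted = bisulfite_convert(reference, methylated_positions={1})
print(converted)                               # ACGTTTGA
print(call_methylation(reference, converted))  # {1: True, 4: False, 5: False}
```

Only the strand given as input is handled here; the complementary strand is converted independently and would need the same treatment on its own sequence.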

Page 10: 20140613 Analysis of High Throughput DNA Methylation Profiling

Techniques for Enrichment of Methylated or Target Regions Prior to BS Sequencing

Lister, R. & Ecker, J.R. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 19, 959-66 (2009).

Genomic DNA

Deep Sequencing

Harrison, A. & Parle-McDermott, A. DNA methylation: a timeline of methods and applications. Front Genet 2, 74 (2011).

Page 11: 20140613 Analysis of High Throughput DNA Methylation Profiling

Techniques for Genome-Wide Sequencing of Cytosine Methylation Sites

Lister, R. & Ecker, J.R. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 19, 959-66 (2009).

Genomic DNA

Deep Sequencing

TAB-Seq: Tet-assisted BS-Seq

Yu, M. et al. Tet-assisted bisulfite sequencing of 5-hydroxymethylcytosine. Nat Protoc 7, 2159-70 (2012).

Yu, M. et al. Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell 149, 1368-80 (2012).

Page 12: 20140613 Analysis of High Throughput DNA Methylation Profiling

Genomic Coverage of MeDIP-seq, MethylCap-seq, RRBS and Infinium

Bock, C. et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol 28, 1106-14 (2010).

MeDIP-seq and MethylCap-seq provide broad coverage of the genome, whereas RRBS and Infinium are more restricted to CpG islands and promoter regions.

Page 13: 20140613 Analysis of High Throughput DNA Methylation Profiling

Key Metrics of the Technology Comparison

Beck, S. Taking the measure of the methylome. Nat Biotechnol 28, 1026-8 (2010).

Page 14: 20140613 Analysis of High Throughput DNA Methylation Profiling

Sequencing Coverages of NGS Platforms

Sims, D., Sudbery, I., Ilott, N.E., Heger, A. & Ponting, C.P. Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet 15, 121-32 (2014).

Page 15: 20140613 Analysis of High Throughput DNA Methylation Profiling

Whole Genome Bisulfite Sequencing Library Construction

Library preparation using PE sample prep kit:

Purified gDNA (5 μg)

Fragmentation

End repair

3' end adenylation

Methylated adapter ligation

Purify ligation product

Fragment size selection, 200-400 bp (3 libraries: 200-250 bp, 250-300 bp, 300-350 bp)

Bisulfite conversion with Zymo EZ DNA Methylation Kit (or Qiagen EpiTect Kit): methylated C stays C, unmethylated C becomes U

Purify

PCR, 4 to 8 cycles, with PfuTurbo Cx Hotstart DNA polymerase (3 separate tubes for each library)

Purify

Validate library

Source: 陽明大學榮陽基因體研究中心 (VYM Genome Research Center, National Yang-Ming University)

Page 16: 20140613 Analysis of High Throughput DNA Methylation Profiling

Whole Genome Bisulfite Sequencing Library Construction

Library preparation using PE sample prep kit:

Purify 3-5 μg genomic DNA

DNA fragmentation

End repair

3' end adenylation

C-methylated adapter ligation

Purify ligated product

Recover 200-400 bp fragments (3 libraries: 200-250 bp, 250-300 bp, 300-350 bp)

Bisulfite conversion with Zymo EZ DNA Methylation Kit (or Qiagen EpiTect Kit): methylated C stays C, unmethylated C becomes U

Purify

PCR, 4 to 8 cycles, with PfuTurbo Cx Hotstart DNA polymerase (3 separate tubes for each library)

Purify

Validate library

Sequencing

Page 17: 20140613 Analysis of High Throughput DNA Methylation Profiling

IVC (Intensity versus Cycle) Plot of Bisulfite Sequencing

[Figure: IVC plots (% base and % intensity versus cycle, Read 1 and Read 2) for the PhiX control and for bisulfite libraries of 250 bp, 350 bp and 430 bp. GC content labels: 45%, 29%, 40%, 22%. One panel is annotated "Reading into adapter".]

Page 18: 20140613 Analysis of High Throughput DNA Methylation Profiling

IVC (Intensity versus Cycle) Plot of Bisulfite Sequencing

[Figure: IVC plots (% base and % intensity versus cycle, Read 1 and Read 2) for the PhiX control and for bisulfite libraries of 250 bp, 350 bp and 430 bp. GC content labels: 45%, 29%, 40%, 22%. One panel is annotated "Reading into adapter".]

Page 19: 20140613 Analysis of High Throughput DNA Methylation Profiling

Fragment Size Effects

[Figure: panels for the PhiX control and for library sizes of 300 bp, 400 bp and 500 bp; read length 2x75 bp.]

Reading into adapter; genomic coverage will be uneven

Amplification bias, bisulfite conversion bias, sequencing bias

DNA fragment size < 250 bp, library size < 350 bp (insert + 121 bp)
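The size arithmetic on this slide can be sketched in a few lines of Python. The only figure taken from the slide is the relation "library size = insert + 121 bp" and the 2x75 bp read length; the function names and the example insert sizes are hypothetical, and real libraries have a distribution of fragment sizes rather than a single value.

```python
ADAPTER_TOTAL_BP = 121  # from the slide: library size = insert + 121 bp


def library_size(insert_bp, adapter_bp=ADAPTER_TOTAL_BP):
    """Size of the adapter-ligated library molecule for a given insert."""
    return insert_bp + adapter_bp


def adapter_readthrough(insert_bp, read_len):
    """Bases of one read that run past the insert into the adapter."""
    return max(0, read_len - insert_bp)


def pair_overlap(insert_bp, read_len):
    """Bases of the insert covered by both mates of a read pair."""
    return max(0, min(insert_bp, 2 * read_len - insert_bp))


READ_LEN = 75  # 2x75 bp, as on the slide
for insert in (60, 100, 179):  # hypothetical insert sizes
    print(insert, library_size(insert),
          adapter_readthrough(insert, READ_LEN),
          pair_overlap(insert, READ_LEN))
# 60 181 15 60
# 100 221 0 50
# 179 300 0 0
```

Under these assumptions, a read runs into the adapter only when the insert is shorter than the read, and the two mates cover the same bases whenever the insert is shorter than twice the read length.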

Page 20: 20140613 Analysis of High Throughput DNA Methylation Profiling

Public BS-Seq Resources from MethBase

http://smithlabresearch.org/software/methbase/

Page 21: 20140613 Analysis of High Throughput DNA Methylation Profiling

Human

Acute Myeloid Leukemia

B Cells

BCell/Fibro/iPSC

Blood Cells from Different Ages

Brains (Chimp)

Breast Cancer

Buccal cells

Chronic Lymphocytic Leukemia

Colon Cancer

Colorectal Cancer and Adenomatous Polyp

Developing human brain

ENCODE RRBS Dataset

ESC Differentiation

Fetal Lung Fibroblasts

Fibroblasts

Hematopoietic Stem Cells (Chimp)

Induced Pluripotent Stem Cells

Leukocytes

LuWen-Brain-2014

Lymphoblastoid

Multiple tissues

Neuroepithelium Cells

Neuronal Cells

Peripheral Blood Mononuclear Cells

Placenta, kidney, etc

Sperm (Chimp)

Page 22: 20140613 Analysis of High Throughput DNA Methylation Profiling

Mouse

5hmC in ESC

Aid Deficiency

Colon Epithelial Cells

Developing human brain

Early Embryo

Embryonic Fibroblasts

Embryonic Stem Cells and Neuroprogenitors

Frontal Cortex

Gamete and Early Embryo

Normal liver vs HCC (HBx TG mouse liver) GEO: GSE48052

Hematopoietic Cells, DNMT3A KO

Hematopoietic Cells, IDH1-R132H KI

Intestinal stem cell

Lung Tissue

mESC

mESC (Tet1)

Mouse B Lymphocyte

Multiple tissues (17)

Nucleus-transferred Zygotes

Oocyte

Oocytes and Preimplantation Embryos

Primordial Germ Cells

Page 23: 20140613 Analysis of High Throughput DNA Methylation Profiling

Plant

Endosperm, embryo and aerial tissue

Floral and leaf (IDN mutant)

Floral buds methylome: C24 and Ler hybrid

IDM1 regulates active demethylation

Leaf: ATXR5/ATXR6 mutants

Leaf: spontaneous epimutation

Rosettes: spontaneous epimutation

Seedling: hybrid

Page 24: 20140613 Analysis of High Throughput DNA Methylation Profiling

Other Organisms (from NCBI GEO)

Glycine max (Soy beans)

Schistocerca gregaria (Locust)

Rattus norvegicus (Rat)

Danio rerio (Zebrafish)

Drosophila melanogaster (Fruit fly)

Oryza sativa (Rice)

Pan troglodytes (Chimp)

Macaca mulatta (Rhesus monkey)

Mus musculus domesticus (Western European house mouse)

Xenopus (Silurana) tropicalis (Frog)

Cynoglossus semilaevis (Tongue sole, bony fish)

Bombyx mori (Silkworm)

Harpegnathos saltator (Jerdon's jumping ant)

Camponotus floridanus (Florida carpenter ant)

Page 25: 20140613 Analysis of High Throughput DNA Methylation Profiling

To access MethBase

http://smithlabresearch.org/software/methbase/

Page 26: 20140613 Analysis of High Throughput DNA Methylation Profiling

Information That Can Be Presented in a BS-Seq Study

Sequencing depth

Coverage of genome length and CpG sites

Bisulfite conversion rate (estimated from lambda virus DNA)

CHG and CHH sites (H = not G, i.e. A, C, or T)

Statistics of methylation ratios of CpG, CHG, CHH

Methylation ratios of gene structures

Association with regulatory elements

Differential methylation region (DMR)
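Two of the items above, the CpG/CHG/CHH contexts and the per-context methylation ratios, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the toy sequence and the read counts are hypothetical, only forward-strand cytosines are classified, and the ratio is the pooled fraction of methylated reads, which is one of several possible definitions.

```python
from collections import defaultdict

H = set("ACT")  # H = not G, as defined on the slide


def cytosine_context(seq, i):
    """Classify the forward-strand cytosine at 0-based index i.

    Returns 'CpG', 'CHG' or 'CHH', or None if seq[i] is not C or the
    context cannot be determined (sequence end, ambiguous base).
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    first = seq[i + 1] if i + 1 < len(seq) else ""
    second = seq[i + 2] if i + 2 < len(seq) else ""
    if first == "G":
        return "CpG"
    if first in H and second == "G":
        return "CHG"
    if first in H and second in H:
        return "CHH"
    return None


def methylation_ratios(seq, counts):
    """Pool read counts per context.

    counts maps position -> (methylated_reads, total_reads).
    Returns {context: methylated_reads / total_reads}.
    """
    meth = defaultdict(int)
    total = defaultdict(int)
    for pos, (m, n) in counts.items():
        ctx = cytosine_context(seq, pos)
        if ctx is None or n == 0:
            continue
        meth[ctx] += m
        total[ctx] += n
    return {ctx: meth[ctx] / total[ctx] for ctx in total}


seq = "ACGTCAGTCTTA"                            # hypothetical reference
counts = {1: (9, 10), 4: (1, 10), 8: (0, 20)}   # hypothetical read counts
print([cytosine_context(seq, i) for i in (1, 4, 8)])  # ['CpG', 'CHG', 'CHH']
print(methylation_ratios(seq, counts))  # {'CpG': 0.9, 'CHG': 0.1, 'CHH': 0.0}
```

Cytosines on the reverse strand appear as G in the forward sequence; classifying them would require applying the same function to the reverse complement.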

Page 27: 20140613 Analysis of High Throughput DNA Methylation Profiling

DNA Methylome Studies

Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315-22 (2009).

Cokus, S.J. et al. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452, 215-9 (2008).

Methylome only Methylome/Transcriptome

Page 28: 20140613 Analysis of High Throughput DNA Methylation Profiling

Contrast Studies

Hon, G.C. et al. Epigenetic memory at embryonic enhancers identified in DNA methylation maps from adult mouse tissues. Nat Genet 45, 1198-206 (2013).

Lister, R. et al. Global epigenomic reconfiguration during mammalian brain development. Science 341, 1237905 (2013).

17 Tissues

Human/Mouse Brain Development

Page 29: 20140613 Analysis of High Throughput DNA Methylation Profiling

Association with Regulatory Elements

Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315-22 (2009).

Hon, G.C. et al. Epigenetic memory at embryonic enhancers identified in DNA methylation maps from adult mouse tissues. Nat Genet 45, 1198-206 (2013).

Page 30: 20140613 Analysis of High Throughput DNA Methylation Profiling

Differential methylation region (DMR)

Hon, G.C. et al. Epigenetic memory at embryonic enhancers identified in DNA methylation maps from adult mouse tissues. Nat Genet 45, 1198-206 (2013).

Page 31: 20140613 Analysis of High Throughput DNA Methylation Profiling

DNA Methylome Analysis Using BS-Seq Data

Page 32: 20140613 Analysis of High Throughput DNA Methylation Profiling

Effect and Problems of Bisulfite Treatment of DNA

Krueger, F., Kreck, B., Franke, A. & Andrews, S.R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).

Xi, Y. & Li, W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 10, 232 (2009).

Mapping bisulfite reads to the 4 possible bisulfite strands (BSW/BSWR/BSC/BSCR) is equivalent to mapping the bisulfite read and its reverse complement to both the Watson and Crick strands of the original reference sequence.
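To make the four strands concrete, the Python sketch below derives them from a toy Watson-strand sequence. It assumes full conversion of every cytosine (no methylation) and reads the abbreviations as bisulfite-converted Watson (BSW), its reverse complement (BSWR), bisulfite-converted Crick (BSC) and its reverse complement (BSCR); the function names and the example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def c_to_t(seq):
    """Full bisulfite conversion: every C is read as T."""
    return seq.replace("C", "T")


def bisulfite_strands(watson):
    """Return the four strands produced by bisulfite treatment plus PCR."""
    watson = watson.upper()
    crick = revcomp(watson)
    bsw = c_to_t(watson)   # converted Watson strand
    bsc = c_to_t(crick)    # converted Crick strand
    return {
        "BSW": bsw,
        "BSWR": revcomp(bsw),  # PCR copy of BSW
        "BSC": bsc,
        "BSCR": revcomp(bsc),  # PCR copy of BSC
    }


print(bisulfite_strands("AACG"))
# {'BSW': 'AATG', 'BSWR': 'CATT', 'BSC': 'TGTT', 'BSCR': 'AACA'}
```

After conversion the Watson and Crick strands are no longer complementary to each other, which is why four distinct sequences have to be considered instead of two.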

Page 33: 20140613 Analysis of High Throughput DNA Methylation Profiling

How to Align BS Reads Against Reference Genome?

Krueger, F. & Andrews, S.R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics (2011).

Bock, C. Analysing and interpreting DNA methylation data. Nat Rev Genet 13, 705-19 (2012).

[Figure: bisulfite read alignment strategies, with Y = C or T; example sequences TCGA, TCGT, ACGTATGA, TTGT and ATGT; both panels annotated "Multiple hits".]
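The two alignment strategies that recur in the tool descriptions, wild-card and three-letter, can be contrasted with a toy exact-match search in Python. This is only a sketch under simplifying assumptions: forward strand only, no mismatches or gaps, and a brute-force scan instead of an index. It is not how Bismark, BSMAP or any other published aligner is implemented, and the function names and sequences are hypothetical.

```python
def wildcard_hits(read, reference):
    """Wild-card strategy: a reference C may pair with C or T in the read
    (the reference C behaves like Y = C or T); other bases must match."""
    n = len(read)
    hits = []
    for i in range(len(reference) - n + 1):
        window = reference[i:i + n]
        if all(r == g or (r == "T" and g == "C") for r, g in zip(read, window)):
            hits.append(i)
    return hits


def three_letter_hits(read, reference):
    """Three-letter strategy: convert every C to T in both read and
    reference, then look for exact matches in the reduced alphabet."""
    conv_read = read.replace("C", "T")
    conv_ref = reference.replace("C", "T")
    n = len(conv_read)
    return [i for i in range(len(conv_ref) - n + 1)
            if conv_ref[i:i + n] == conv_read]


reference = "ACGTATGTTCGT"  # hypothetical reference

# A fully converted read is ambiguous under both strategies:
print(wildcard_hits("ATGT", reference))      # [0, 4]
print(three_letter_hits("ATGT", reference))  # [0, 4]

# A read that kept a C (methylated) is unique with wild-cards,
# but still ambiguous once its C is converted for three-letter search:
print(wildcard_hits("ACGT", reference))      # [0]
print(three_letter_hits("ACGT", reference))  # [0, 4]
```

In this toy case the wild-card search uses the unconverted C to resolve the ambiguity, whereas the three-letter search discards that information before matching.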

Page 34: 20140613 Analysis of High Throughput DNA Methylation Profiling

Recommended Workflow for the Primary Analysis of BS-Seq data

Krueger, F., Kreck, B., Franke, A. & Andrews, S.R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).

http://omictools.com/bisulfite-seq/

Page 35: 20140613 Analysis of High Throughput DNA Methylation Profiling

Published Tools

Bock, C. et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol 28, 1106-14 (2010).

Krueger, F., Kreck, B., Franke, A. & Andrews, S.R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).

http://omictools.com/bisulfite-seq/

B-SOLANA: Bisulphite aligner for processing bisulphite-sequencing color space data. http://code.google.com/p/bsolana

BatMeth: Base and color space data. http://code.google.com/p/batmeth

Bicycle: Lister et al. 2009 workflow. http://sing.ei.uvigo.es/bicycle/howitworks.html

BiQ Analyzer HT: Locus-specific analysis and visualization of high-throughput bisulfite sequencing data. http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de

BiSeq: DMR for RRBS data. R/Bioconductor package BiSeq

BISMA: Supports analysis of repetitive sequences. http://biochem.jacobs-university.de/BDPC/BISMA

Bismark: Probably the most widely used three-letter bisulphite aligner; supports both Bowtie (fast, gap-free alignment) and Bowtie 2.0 (sensitive, gapped alignment). http://www.bioinformatics.babraham.ac.uk/projects/bismark

Bis-SNP: Variant caller for inferring DNA methylation levels and genomic variants from BS-Seq reads that have been aligned by other tools. http://epigenome.usc.edu/publicationdata/bissnp2011

Bisulfighter: Uses Last for mapping, HMM for DMR detection. http://epigenome.cbrc.jp/bisulfighter

BRAT: Highly configurable and well-documented three-letter BS-Seq aligner. http://compbio.cs.ucr.edu/brat

BS-Seeker / BS-Seeker 2: Three-letter BS-Seq aligner based on Bowtie. http://pellegrini.mcdb.ucla.edu/BS_Seeker/BS_Seeker.html

BSMAP: Probably the most widely used wild-card BS-Seq aligner. http://code.google.com/p/bsmap

BSmooth: Mapping, quality control and DMR analysis pipeline. http://rafalab.jhsph.edu/bsmooth

COHCAP: Integration with gene expression data. https://sourceforge.net/projects/cohcap/

CpG_MPs: Methylation patterns of genomic regions. http://202.97.205.78/CpG_MPs/

DMAP: DMR for BS-Seq and RRBS data. http://biochem.otago.ac.nz/research/databases-software/

DSS: Bayesian hierarchical model to detect differentially methylated loci (DML). R/Bioconductor package DSS

Epidiff: DMR detection. http://bioinfo.hrbmu.edu.cn/epidiff

Page 36: 20140613 Analysis of High Throughput DNA Methylation Profiling

Published Tools (cont.)

GSNAP: Wild-card BS-Seq aligner included in a widely used general-purpose alignment tool. http://share.gene.com/gmap

GBSA: Analysis pipeline for gene-centric or gene-independent focus. http://ctrad-csi.nus.edu.sg/gbsa

FadE: Mapping for base and color space. http://code.google.com/p/fade

Kismeth: Designed to be used with plants. http://katahdin.mssm.edu/kismeth

Last: Recent and well-validated wild-card BS aligner included in a general-purpose alignment tool. http://last.cbrc.jp

MethPipe: Mapping, BS conversion rate, HMR, DMR pipeline. http://smithlabresearch.org/software/methpipe

Methyl-MAPS / Methyl-Analyzer: Base and color space data + post analysis. http://epigenomicspub.columbia.edu/methylanalyzer_data.html

MethylCoder: Three-letter BS-Seq aligner that can be used with either Bowtie (high speed) or GSNAP (high sensitivity). https://github.com/brentp/methylcode

MethylExtract: Detects variation. http://bioinfo2.ugr.es/MethylExtract

MethylSig: R package pipeline for BS-Seq and RRBS. http://sartorlab.ccmb.med.umich.edu/software

MOABS: DMR detection. http://code.google.com/p/moabs

Pash: Wild-card BS aligner included in a general-purpose alignment tool. http://brl.bcm.tmc.edu/pash

RMAP / RMAPBS: Wild-card BS aligner included in a general-purpose alignment tool. http://www.cmb.usc.edu/people/andrewds/rmap and http://smithlabresearch.org/software/methpipe

RRBSMAP: Variant of BSMAP that is specialized on reduced-representation bisulphite sequencing (RRBS) data. http://rrbsmap.computational-epigenetics.org

SAAP-RRBS: RRBS mapping. http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm

segemehl: Wild-card bisulphite aligner included in a general-purpose alignment tool. http://www.bioinf.uni-leipzig.de/Software/segemehl

SOCS-B: Rabin-Karp hashing, color space data. http://solidsoftwaretools.com/gf/project/socs

Bock, C. et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol 28, 1106-14 (2010).

Krueger, F., Kreck, B., Franke, A. & Andrews, S.R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).

http://omictools.com/bisulfite-seq/

Page 37: 20140613 Analysis of High Throughput DNA Methylation Profiling

How to Select a BS-Seq Analysis Tool?

Actively updated

Good support from authors or communities: BS-Seeker 2, Bismark

Post-analysis tools: MethPipe

Kunde-Ramamoorthy, G. et al. Comparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing. Nucleic Acids Res 42, e43 (2014):

• Bismark (balanced speed and genome coverage)

• BSMAP (low genome coverage)

• Pash (high genome coverage, slow)

Page 38: 20140613 Analysis of High Throughput DNA Methylation Profiling

MethPipe

Allele-specific methylated regions: amrfinder, allelicmeth

Differential methylation regions: dmr

Large hypo/hyper-methylation domains: pmd

Hypo/hyper-methylation regions: hmr, hmr_plant, pmr

Methylation calling: methcounts

Bisulfite conversion rate: bsrate

Remove duplicate reads: duplicate-remover

Mapping: rmapbs, rmapbs-pe

Quality trimming: fastq_masker

Cross-species comparison of methylomes: liftOver

Calculating methylation ratio for genomic regions: bigWigAverageOverBed, roimethstat, bwtool

Generate methylation BED file: Bedtools, bedGraphToBigWig

fastx toolkit: http://hannonlab.cshl.edu/fastx_toolkit/

MethPipe: http://smithlabresearch.org/software/methpipe/

Bedtools: https://github.com/arq5x/bedtools2

Programs from UCSC Genome Browser: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64

bwtool: https://github.com/CRG-Barcelona/bwtool/wiki
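The bisulfite conversion rate step in this pipeline can be illustrated with a short Python sketch. It is not the implementation of bsrate or of any other MethPipe program; it only shows the underlying idea, assuming reads aligned without gaps to the forward strand of an unmethylated control such as lambda DNA, where every C that remains a C counts as a conversion failure. All names and data are hypothetical.

```python
def count_conversions(reference, reads):
    """Count converted and unconverted cytosines over aligned reads.

    reads is a list of (start, sequence) pairs, aligned without gaps to
    the forward strand of an unmethylated control reference.
    """
    converted = unconverted = 0
    for start, seq in reads:
        for offset, base in enumerate(seq):
            if reference[start + offset] != "C":
                continue
            if base == "T":
                converted += 1      # C was converted
            elif base == "C":
                unconverted += 1    # C escaped conversion
            # other bases at a reference C are treated as sequencing errors
    return converted, unconverted


def conversion_rate(converted, unconverted):
    """Fraction of control cytosines that were converted."""
    total = converted + unconverted
    if total == 0:
        raise ValueError("no cytosines observed")
    return converted / total


control = "ACCGTC"                          # hypothetical control sequence
reads = [(0, "ATTGTT"), (1, "TCGTT")]       # hypothetical aligned reads
conv, unconv = count_conversions(control, reads)
print(conv, unconv)                            # 5 1
print(round(conversion_rate(conv, unconv), 3))  # 0.833
```

A low rate on the control suggests incomplete conversion, in which case unconverted cytosines in the sample would be overcounted as methylated.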

Page 39: 20140613 Analysis of High Throughput DNA Methylation Profiling

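基因體技術與資料分析手冊 (Handbook of Genomic Technologies and Data Analysis), published by 陽明大學榮陽基因體研究中心 (VYM Genome Research Center, National Yang-Ming University)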

Page 40: 20140613 Analysis of High Throughput DNA Methylation Profiling

Analysis Of High-throughput DNA Methylation Profiling

DNA methylation

Fundamentals of bisulfite sequencing technology

Current status of published BS-Seq resources

Information that can be presented in a BS-Seq study

Published tools for analyzing BS-Seq data

A comprehensive BS-Seq analysis tool: MethPipe

Questions?
