22
Differential Methylation Analysis Simon Andrews [email protected] @simon_andrews 1

Differential Methylation Analysis Simon Andrews [email protected] @simon_andrews 1

Embed Size (px)

Citation preview

Page 1: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

1

Differential Methylation Analysis

Simon [email protected]

@simon_andrews

Page 2: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

2

A basic question…

Page 3: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

3

Factors to consider

• Number of observations• Magnitude of effect• Technical considerations• Biological variability• Biological common sense

Page 4: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

4

The problem of power…• Ideally want to cover every Cytosine (CpG)• Have to correct for the number of tests

• There’s no way you’ll collect enough data to analyse each C and have p-values which survive multiple testing correction

• Stats have to find a way to work round this.

Page 5: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

5

Maximising power• Options– Analyse in windows– Pre-filter– Hierarchical or Adaptive filtering

Page 6: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

6

Window sizes

Small windows• Good resolution• Specific biological effects• High MTC burden• Small observations• High p-values

Large windows• Lots of data• High statistical power• Low MTC burden• Low p-values• Effect averaging

Page 7: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

7

Simple Statistical Approach

• Is the proportion of methylated calls different between two samples, given the number of observations?

Meth count A Unmeth count A Meth count B Unmeth count B % change Significant?

2 0 0 2 100 No

200 2 198 5 1.5 No

100 50 75 60 11 Probably

Page 8: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

8

Contingency tests• Chi-square / G-test / Fisher’s exact test– Differ only at low observations– Significant changes require enough observations

that any of these should give the same answer• Operates on single replicates• Technical measure of difference

Meth A Unmeth A

Meth B Unmeth B

Page 9: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

9

Chi-Square results

Page 10: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

10

Biological considerations

• Minimum relevant effect size?– Balance power vs change– What makes biological sense – (what would you follow up?)

• Minimum coverage worth testing– No point testing poorly covered regions

Page 11: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

11

Effect of pre-filtering

Page 12: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

12

Distribution of methylation

Chi square assumes a normal distribution, and methylation data isn’t normally distributed

Page 13: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

13

Beta binomial distribution

More relevant statistics than chi-square. Need to fit custom model to actual data.

Page 14: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

14

Implications of a beta distribution• Many summaries assume normality– Mean– Standard Deviation– Boxplots

• None of these is strictly appropriate when looking at methylation data

Page 15: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

15

Dealing with replicates• Simple approach

– Merge data from replicates together– Single test, High power– Post-hoc test for consistency

• Explicitly account for batch effects– Logistic regression– Measures batch effects and excludes them from final significance

calculation

• Work with methylation values– Normalise percentage methylation values– Use conventional statistics (t-tests etc) for comparing groups

Page 16: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

16

Hierarchical testing

• Test larger regions– Windows / Features etc.

• Take significant hits and subdivide– Smaller windows– Individual CpGs– Correct only for these tests

• Assemble hits together to make up DMRs

Page 17: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

17

Hierarchical testing

GenomeCGI CGI CGI CGI CGI CGI

GenomeCGI CGI CGI CGI CGI CGIX X X XGenomeCGI CGI CGI CGI CGI CGIX X X X

Statistically ‘creative’ solution to not having enough data

Page 18: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

18

Methylation statistics packages• swDMR (Perl/R-package)

Sliding window DMR finding (choose between t_test, Kolmogorov, Fisher, ChiSquare, Wilcoxon for n = 2; ANOVA, Kruskal for n > 3)

• methylKit* (R-package by A. Akalin et al.)Sliding window, Fisher’s exact test or logistic regression. Adjusts p-values to q-values using SLIM method.

• bsseq* (R/Bioconductor by K.D. Hansen) Implements the BSmooth smoothing algorithm. Numerous CpG-wise t-tests and p-value cutoff to define DMRs. Outperforms Fisher’s exact

test. Requires biological replicates for DMR detection

• BiSeq* (R/Bioconductor by K. Hebestreit et al.)Beta regression model, impractical for very large data other than RRBS or targeted BS-Seq

• RnBeads* (R package by F. Mueller et al.) works for 450K arrays, BS-Seq, MeDIP or MBD-Seq data

• DMAP* (C command line tool by P. Stockwell et al.)RRBS fragment or fixed window approach, Fisher’s exact test, Chi-squared or ANOVA

• RADMeth (C++ command line tool by E. Dolzhenko and A.D. Smith)Beta-binomial regression analysis to find DMCs or DMRs, local likelihood, adjust for neighbouring CpGs

• MOABS* (C++ command line tool by D. Sun et al.)Beta binomial hierarchical model to capture sampling and biological variation, Credible Methylation Difference (CDIF) single metric that

combines biological and statistical significance

• ComMet (Y. Saito et al., 2014)Bisulfighter suite; DMR detection based on hidden Markov models (HMMs) that enable automated adjustment of DMC chaining criteria. Does not require

biological replicates

• DSS (R/Bioconductor by Feng et al., 2014)Constructs genome-wide prior distribution for beta-binomial dispersion. Bayesian hierarchical model to detect differentially methylated loci

• more appearing every other week… * interface well with

Page 19: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

19

Tool Statistical test Suitable for Implementation Notes

bsseq Sample-wise smoothing, then group differences via CpG-wise t-tests (p-value cutoff to define adjacent CpG sites as DMRs)

WGBS; not designed for targeted BS-Seq or RRBS

R package/Bioconductor

Outperforms Fisher’s exact test; intended to compare 2 groups;replicates required

BiSeq Define CpG clusters, smooth methylation data, model and test group effect (fitting beta regression model to smoothed methylation levels and testing for group effect using the Wald test), hierarchical testing procedure on CpG clusters, then define DMR boundaries

RRBS; targeted BS-Seq; for WGBS

R package/Bioconductor

Very computationally intensive; Not limited to 2 groups

MethylKit Models CpG methylation within a logistic regression. Sliding linear model (SLIM) to correct for multiple testing

(e)RRBS R package

* WGBS = whole genome BS-Seq; (e)RRBS = (enhanced) reduced representation BS-Seq

Page 20: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

20

bsseq – for whole genome BS-Seq• Smoothing of low coverage BS-Seq first to get reliable semi-local

methylation estimation estimates

• Not suitable for captured or restricted data

• After smoothing it uses biological replicates to estimate biological variation and identify methylated regions (DMRs)

• Smoothing suitable for even a single sample

• Works for CpG context in humans, will probably not scale to 2x585M Cs in non-CG context

Page 21: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

21

BSmooth algorithm

black: 25x (Lister)pink: 4x (Lister)

Page 22: Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews 1

22

Bsmooth t-values