Multi-platform and cross-methodological reproducibility of transcriptome profiling by RNA-seq in the ABRF Next-Generation Sequencing Study (ABRF-NGS) and FDA’s SEQC Study. Christopher E. Mason, Ph.D., Assistant Professor, Department of Physiology and Biophysics & The Institute for Computational Biomedicine at Weill Cornell Medical College and the Tri-Institutional Program on Computational Biology and Medicine; Affiliate Fellow of the Information Society Project, Yale Law School. July 10th, 2014. @mason_lab

20140710 6 c_mason_ercc2.0_workshop


DESCRIPTION

20140710 Christopher Mason ERCC 2.0 Workshop


Page 1: 20140710 6 c_mason_ercc2.0_workshop

Multi-platform and cross-methodological reproducibility of transcriptome profiling by RNA-seq in the ABRF Next-Generation Sequencing Study (ABRF-NGS) and FDA’s SEQC Study

Christopher E. Mason, Ph.D.
Assistant Professor
Department of Physiology and Biophysics & The Institute for Computational Biomedicine at Weill Cornell Medical College and the Tri-Institutional Program on Computational Biology and Medicine
Affiliate Fellow of the Information Society Project, Yale Law School

July 10th, 2014
@mason_lab

Page 2: 20140710 6 c_mason_ercc2.0_workshop

0. Background
1. RNA Standards
2. Removing RNA-Seq Variation
3. Epitranscriptome
4. #JustSequenceIt

 

Page 3: 20140710 6 c_mason_ercc2.0_workshop

Exome:  2%  of  the  genome?  

Page 4: 20140710 6 c_mason_ercc2.0_workshop

     

Human Genome Organization

from  Mason,  State,  and  Moldin,  Kaplan  &  Sadock’s  Comprehensive  Textbook  of  Psychiatry,  2009  

Page 5: 20140710 6 c_mason_ercc2.0_workshop

ENCODE active elements

“These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions.”

Page 6: 20140710 6 c_mason_ercc2.0_workshop

“The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks.”

“We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.”

Page 7: 20140710 6 c_mason_ercc2.0_workshop

GENCODE annotation (v19)
July 10th, 2014

Coding genes: 20,345
Noncoding genes: 22,883
Pseudogenes: 14,206
Other genes: 516
57,820 total

 

Page 8: 20140710 6 c_mason_ercc2.0_workshop

RNA-Seq and all its flavors create excitement

Page 9: 20140710 6 c_mason_ercc2.0_workshop

RNA-seq = Love R-make tools

Sequencing Data (Reads/bp)

1. Alignment on HPC nodes: reads, alignments, sorted BAMs, GENCODE annotation queries
   References: hg19, RefSeq, miRBase, rRNA, Adapters
2. Genetic variation (SNVs and indels). Algorithms: STAR/GATK, r-make
3. Differential expression by gene, exon, splice isoform, allele, & transcript. Algorithms: STAR, r-make, ASE, limma-voom, RSEM
4. Find ncRNAs and new genes. Algorithms: r-make, Aceview
5. Gene fusion detection. Algorithms: r-make, Snowshoes
6. Predict polyA sites & gain/loss of miRNA binding sites. Algorithms: r-make, BAGET, AlexaSeq, TargetScan
7. Remaining Reads (e.g., TCTGCTTTAGGATAGATCGATAGCTAGTTCAT): Viruses/Bacteria/Other. Algorithm: MG-RAST

Page 10: 20140710 6 c_mason_ercc2.0_workshop

But!  There  is  some  noise  

Page 11: 20140710 6 c_mason_ercc2.0_workshop

What  is  the  source  of  the  wiggles?

Genes  

ESTs  

Liver

Kidney

Adult Brain

Fetal Hippocampus

Fetal Amygdala

Page 12: 20140710 6 c_mason_ercc2.0_workshop

The Dirty Dozen: >= 12 Sources of Technical Noise in RNA-Seq
(1) RNA integrity: Sample purity or degradation
(2) Sample RNA complexity: polyA RNA, total RNA, miRNA
(3) cDNA synthesis: random hexamer vs. polyA-primed
(4) Library isolation: Gel excision vs. column
(5) Technical Errors: Machine, Site, Lane, Technician, Library Size
(6) Amplification Cycles or Methods: NuGen, Tn5, Phi29
(7) Input amount: (1, 10, 100, 1000 cells)
(8) Algorithms: for alignment and assembly
(9) Fragment size distribution: Paired-End, Single-End (adaptors)
(10) Ligation Efficiency: Multiplexing/Barcoding and RNA ligases
(11) Depth of Sequencing: cost/benefit point
(12) RNA fragmentation: cation, enzymatic

Page 13: 20140710 6 c_mason_ercc2.0_workshop

Which type of RNA?

Type | Abbreviation | Function | Organisms
7SK RNA | 7SK | negatively regulating CDK9/cyclin T complex | metazoans
Signal recognition particle RNA | 7SRNA | Membrane integration | All organisms
Antisense RNA | aRNA | Regulatory | All organisms
CRISPR RNA | crRNA | Resistance to parasites | Bacteria and archaea
Guide RNA | gRNA | mRNA nucleotide modification | Kinetoplastid mitochondria
Long noncoding RNA | lncRNA | XIST (dosage compensation), HOTAIR (cancer) | Eukaryotes
MicroRNA | miRNA | Gene regulation | Most eukaryotes
Messenger RNA | mRNA | Codes for protein | All organisms
Piwi-interacting RNA | piRNA | Transposon defense, maybe other functions | Most animals
Repeat associated siRNA | rasiRNA | Type of piRNA; transposon defense | Drosophila
Retrotransposon | retroRNA | self-propagation | Eukaryotes and some bacteria
Ribonuclease MRP | RNase MRP | rRNA maturation, DNA replication | Eukaryotes
Ribonuclease P | RNase P | tRNA maturation | All organisms
Ribosomal RNA | rRNA | Translation | All organisms
Small Cajal body-specific RNA | scaRNA | Guide RNA to telomere in active cells | Metazoans
Small interfering RNA | siRNA | Gene regulation | Most eukaryotes
SmY RNA | SmY | mRNA trans-splicing | Nematodes
Small nucleolar RNA | snoRNA | Nucleotide modification of RNAs | Eukaryotes and archaea
Small nuclear RNA | snRNA | Splicing and other functions | Eukaryotes and archaea
Trans-acting siRNA | tasiRNA | Gene regulation | Land plants
Telomerase RNA | telRNA | Telomere synthesis | Most eukaryotes
Transfer-messenger RNA | tmRNA | Rescuing stalled ribosomes | Bacteria
Transfer RNA | tRNA | Translation | All organisms
Viral Response RNA | viRNA | Anti-viral immunity | C elegans
Vault RNA | vRNA | self-propagation; expulsion of xenobiotics | -
Y RNA | yRNA | RNA processing, DNA replication | Animals

Page 14: 20140710 6 c_mason_ercc2.0_workshop

And - which one do we use? Technologies bifurcate into two main realms:

Table 1: Types of High-Throughput Sequencing Technologies

Optical Sequencing

Platform | Instrument | Template Preparation | Chemistry | Average Length | Longest Read
Illumina | HiSeq2500 | BridgePCR/cluster | Rev. Term., SBS | 100 | 150
Illumina | HiSeq2000 | BridgePCR/cluster | Rev. Term., SBS | 100 | 150
Illumina | MiSeq | BridgePCR/cluster | Rev. Term., SBS | 250 | 300
GnuBio | GnuBio | emPCR | Hyb-Assist Sequencing | 1000* | 64,000*
Life Technologies | SOLiD 5500 | emPCR | Seq. by Lig. | 75 | 100
LaserGen | LaserGen | emPCR | Rev. Term., SBS | 25* | 100*
Pacific Biosciences | RS | Polymerase Binding | Real-time | 1800 | 15,000
454 | Titanium | emPCR | PyroSequencing | 650 | 1100
454 | Junior | emPCR | PyroSequencing | 400 | 650
Helicos | Heliscope | adaptor ligation | Rev. Term., SBS | 35 | 57
Intelligent BioSystems | MAX-Seq | Rolony amplification | Two-Step SBS (label/unlabell) | 2x100 | 300
Intelligent BioSystems | MINI-20 | Rolony amplification | Two-Step SBS (label/unlabell) | 2x100 | 300
ZS Genetics | N/A | Atomic Labeling | Electron Microscope | N/A | N/A
Halcyon Molecular | N/A | N/A | Direct Observation of DNA | N/A | N/A

Electrical Sequencing

Platform | Instrument | Template Preparation | Chemistry | Average Length | Longest Read
IBM DNA Transistor | N/A | none | Microchip Nanopore | N/A | N/A
NABsys | N/A | none | Nanochannel | N/A | N/A
Bionanogenomics | N/A | anneal 7mers | Nanochannel | 10,000* | 1,000,000*
Life Technologies | PGM | emPCR | Semi-conductor | 150 | 300
Life Technologies | Proton | emPCR | Semi-conductor | 120 | 240
Life Technologies | Proton 2 | emPCR | Semi-conductor | 400* | 800*
Genia | N/A | none | Protein nanopore (α-hemolysin) | N/A | N/A
Oxford Nanopore | MinION | none | Protein Nanopore | 10,000 | 10,000*
Oxford Nanopore | GridION 2K | none | Protein Nanopore | 10,000 | 500,000*
Oxford Nanopore | GridION 8K | none | Protein Nanopore | 10,000 | 500,000*

*Values are estimates from companies that have not yet released actual data

Mason CE, Porter SG, Smith TM. Adv Exp Med Biol. 2014

Page 15: 20140710 6 c_mason_ercc2.0_workshop

0. Genomes
1. RNA Standards
2. Removing RNA-Seq Variation
3. Epitranscriptome
4. #JustSequenceIt

 

Page 16: 20140710 6 c_mason_ercc2.0_workshop
Page 17: 20140710 6 c_mason_ercc2.0_workshop

The complexity of the transcriptome is vast

Variants from Exon 1, Exon 2, Exon 3:
exon1, exon2, exon3
exon1-exon2, exon1-exon3, exon2-exon3
exon1-exon2-exon3

Additional variants once Exon 4 is added:
exon4, exon1-exon4, exon2-exon4, exon3-exon4
exon1-exon2-exon4, exon1-exon3-exon4, exon2-exon3-exon4
exon1-exon2-exon3-exon4

Exons | Variants | Junctions
1 | 1 | 0
2 | 3 | 1
3 | 7 | 3
4 | 15 | 6
5 | 31 | 10
6 | 63 | 15
7 | 127 | 21
8 | 255 | 28

Variants: $2^n - 1$
Junctions: $\frac{n(n-1)}{2}$

$8 \times 10^{83}$ theoretical transcript combinations
$8 \times 10^{80}$ atoms in the universe ($10^{59}$ atoms/star, $10^{11}$ stars/galaxy, $10^{10}$ galaxies)
Li and Mason, ARGHG, 2014
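The counts in the Exons / Variants / Junctions table can be reproduced with a short sketch. It assumes, as the slide's enumeration suggests, that a variant is any non-empty subset of exons kept in genomic order and that a junction is any pair of exons joined directly. The function names are illustrative and are not taken from the talk.

```python
from itertools import combinations
from typing import List


def exon_variants(exons: List[str]) -> List[str]:
    """List every non-empty combination of exons, keeping their given order."""
    return [
        "-".join(combo)
        for size in range(1, len(exons) + 1)
        for combo in combinations(exons, size)
    ]


def count_variants(n_exons: int) -> int:
    """Closed form for the number of variants: 2^n - 1."""
    return 2 ** n_exons - 1


def count_junctions(n_exons: int) -> int:
    """Closed form for the number of pairwise exon-exon junctions: n(n-1)/2."""
    return n_exons * (n_exons - 1) // 2


if __name__ == "__main__":
    # Enumerate the three-exon case shown on the slide.
    print(exon_variants(["exon1", "exon2", "exon3"]))
    # Reproduce the table rows for 1 to 8 exons.
    for n in range(1, 9):
        print(n, count_variants(n), count_junctions(n))
```

Explicit enumeration is only practical for a handful of exons. The closed forms are what make the growth argument on the slide: each added exon roughly doubles the number of possible variants.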

Page 18: 20140710 6 c_mason_ercc2.0_workshop

Estimate of False Junction Rate

Exon #: 1 2 3 4 (each exon drawn 5’ to 3’)

Exon-Edge DB (3’-5’ junctions)
False Exon-Edge DB (3’-3’ junctions)
False Exon-Edge DB (5’-5’ junctions)
False Exon-Edge DB (5’-3’ junctions)
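One possible reading of this slide is a target-decoy scheme: reads are aligned against a database of real exon-edge junctions and against decoy databases built from edge pairings that should not occur, and the decoy hit rate estimates the false junction rate. The sketch below follows that reading only. The edge length `k`, the function names, and the rate formula (mean decoy hits divided by true-database hits) are assumptions and are not stated in the talk.

```python
from typing import Dict, List


def build_junction_dbs(exons: List[str], k: int = 4) -> Dict[str, List[str]]:
    """Build true and decoy junction sequences from exon edges.

    `exons` holds exon sequences ordered 5' to 3'; `k` bases are taken
    from each exon edge (hypothetical parameter).
    """
    dbs: Dict[str, List[str]] = {
        "true_3p_5p": [],
        "false_3p_3p": [],
        "false_5p_5p": [],
        "false_5p_3p": [],
    }
    for i in range(len(exons)):
        for j in range(i + 1, len(exons)):
            up, down = exons[i], exons[j]
            # Real junction: 3' end of the upstream exon + 5' start of the downstream exon.
            dbs["true_3p_5p"].append(up[-k:] + down[:k])
            # Decoys: edge pairings that a correctly spliced transcript should not contain.
            dbs["false_3p_3p"].append(up[-k:] + down[-k:])
            dbs["false_5p_5p"].append(up[:k] + down[:k])
            dbs["false_5p_3p"].append(up[:k] + down[-k:])
    return dbs


def false_junction_rate(true_hits: int, decoy_hits: List[int]) -> float:
    """Assumed estimator: mean hits across decoy DBs relative to true-DB hits."""
    if true_hits <= 0:
        raise ValueError("true_hits must be positive")
    return (sum(decoy_hits) / len(decoy_hits)) / true_hits
```

For example, `false_junction_rate(1000, [10, 20, 30])` averages the three decoy counts to 20 and reports that as a fraction of the 1000 true-database hits.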

Page 19: 20140710 6 c_mason_ercc2.0_workshop

FDA’s Sequencing Quality Control (SeQC) Consortium: Seq Standards from 323 members of Government (FDA), Industry, Academia

RNA sequence from pilot study based on MAQC samples and MAQC design:
~Pilot data from 2008-2012: MAQC Sample A/B, & ERCCs from:
• 454
• Helicos
• Illumina
• Lifetech
~1000 Gb of Illumina BodyMap data
~300 Gb of experimental RNA-Seq

Page 20: 20140710 6 c_mason_ercc2.0_workshop

Association of Biomolecular Research Facilities (ABRF) Next Generation Sequencing (NGS) Study:
Phase I: RNA-Seq and degraded RNA-Seq (2011-2013)
Phase II: DNA-Seq and Hard-to-sequence regions (2014-2016)
Phase III: Asteroid and Martian Surface Sequencing (2017-2050?)

1. Cross-Platform: HiSeq2000, 454, PGM, Proton, PacBio
2. Cross-Protocol: ribo-, directional, non-directional, degraded RNA
3. Cross-Site: 3 sites for each platform, replicates at each site

Improve the detection and quantification of molecular signatures for use in basic, applied, and clinical research. Specifically, the ABRF-NGS group will test and characterize the myriad, rapidly evolving NGS techniques for RNA-Seq and DNA-Seq.

Page 21: 20140710 6 c_mason_ercc2.0_workshop

Samples of the MAQC, SEQC and ABRF-NGS Study

Page 22: 20140710 6 c_mason_ercc2.0_workshop

← SEQC samples (FDA)

[Figure: experimental design diagram assigning samples (A, B, C, D, E, F) to sequencing platforms, including Pacific Biosciences RS, Life Technologies Ion PGM, and Life Technologies Ion Proton]

A: 10 cancer cell lines
B: Brain tissues

Page 23: 20140710 6 c_mason_ercc2.0_workshop

Cross-platform experimental design

[Figure: experimental design diagram assigning samples (A, B, C, D, E, F) to sequencing platforms, including Pacific Biosciences RS, Life Technologies Ion PGM, and Life Technologies Ion Proton]

Page 24: 20140710 6 c_mason_ercc2.0_workshop

Cross-Site experimental design

[Figure: experimental design diagram assigning samples (A, B, C, D, E, F) to sequencing platforms, including Pacific Biosciences RS, Life Technologies Ion PGM, and Life Technologies Ion Proton]

Page 25: 20140710 6 c_mason_ercc2.0_workshop

Cross-Protocol experimental design

[Figure: experimental design diagram assigning samples (A, B, C, D, E, F) to sequencing platforms, including Pacific Biosciences RS, Life Technologies Ion PGM, and Life Technologies Ion Proton]

Page 26: 20140710 6 c_mason_ercc2.0_workshop

All technologies have strengths & weaknesses

Vendor | Instrument | Version | Run Time (hours) | Read Length (mean) | Reads per Run (million) | Yield per Run | Cost per Run ($)1 | Cost per Mb ($) | Paired-end | Application | Strengths
Illumina | HiSeq 2000 | High Output | 132 | 50 | 6,000 | 300 Gb2 | 18,725.00 | 0.06 | Yes | Gene expression; splice junction detection; variant calling; fusions | Deep read counts for transcript quantification
Illumina | HiSeq 2500 | High Output | 132 | 50 | 6,000 | 300 Gb2 | 18,725.00 | 0.06 | Yes | as above | as above
Illumina | MiSeq | v2 kit | 39 | 250 | 30 | 7.5 Gb | 982.75 | 0.13 | Yes | Splice junction detection; variant calling | Rapid transcript quantification and variant detection
Life Technologies | PGM | 318 chip | 7.3 | 176 | 6 | 1.056 Gb | 749.00 | 0.71 | No | Splice junction detection; variant calling | Rapid transcript quantification and variant detection
Life Technologies | Proton | Proton I chip | 2 to 4 | 81 | 70 | 5.67 Gb | 834.00 | 0.15 | No | Gene expression; variant calling | Good read depth and length for transcript quantification
Pacific Biosciences | RS | RS | 0.5 to 2 | 1,289 | 0.03 | 38.67 Mb | 136.38 | 3.53 | No | Splice junction detection; full length gene coverage | Extended read lengths
Roche 454 | Genome Sequencer | FLX+ | 20 | 686 | 1 | 686 Mb | 5,985.00 | 8.72 | No | Splice junction detection | Read length

1 Sequencing reaction reagents, at academic list price in U.S. dollars; does not include library preparation reagents, labor, data storage or analysis, equipment or maintenance.
2 One sequencing run using 2 flowcells.
Pacific Biosciences is calculated based on CCS reads. Gb: billion bases; Mb: million bases.

Table 1. Summary of RNA-seq platforms as deployed in the ABRF-NGS Study.
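The yield and cost-per-Mb columns of this table follow from the other columns: yield in Mb is reads per run (in millions) times mean read length, and cost per Mb is cost per run divided by that yield. The sketch below recomputes both from the table's values. The dictionary layout and function names are illustrative, not part of the study.

```python
from typing import Dict, Tuple

# Platform: (reads per run in millions, mean read length in bases, cost per run in USD),
# copied from the table above.
PLATFORMS: Dict[str, Tuple[float, float, float]] = {
    "Illumina HiSeq 2000": (6000, 50, 18725.00),
    "Illumina MiSeq": (30, 250, 982.75),
    "Life Technologies PGM": (6, 176, 749.00),
    "Life Technologies Proton": (70, 81, 834.00),
    "Pacific Biosciences RS": (0.03, 1289, 136.38),
    "Roche 454 FLX+": (1, 686, 5985.00),
}


def yield_mb(reads_million: float, mean_read_length: float) -> float:
    """Yield per run in megabases: millions of reads times bases per read."""
    return reads_million * mean_read_length


def cost_per_mb(cost_per_run: float, run_yield_mb: float) -> float:
    """Reagent cost per megabase for one run."""
    return cost_per_run / run_yield_mb


if __name__ == "__main__":
    for name, (reads, length, cost) in PLATFORMS.items():
        mb = yield_mb(reads, length)
        print(f"{name}: {mb:,.2f} Mb per run, ${cost_per_mb(cost, mb):.2f} per Mb")
```

Recomputing the columns this way is a quick consistency check on the table. It also shows how far the per-megabase cost depends on yield rather than on the price of a run.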

Page 27: 20140710 6 c_mason_ercc2.0_workshop

I: Inter-platform performance

Page 28: 20140710 6 c_mason_ercc2.0_workshop

Error models highly variable among platforms

[Figure: (a) mismatch types percentage per sample (deletion, insertion, single base substitution), axis 0% to 100%; (b) indels mismatch percentage per platform (454, ILMN, PAC, PGM, PRO), axis 0% to 6%]

454: Roche 454 GS FLX+
ILMN: Illumina HiSeq 2000/2500
PAC: Pacific Biosciences RS I
PGM: Ion Personal Genome Machine
PRO: Ion Torrent Proton

       

Page 29: 20140710 6 c_mason_ercc2.0_workshop

Full length genebody coverage

[Figure: coverage across genebody (%), 5' to 3', one row per library, annotated by platform (454, ILMN, PAC, PGM, PRO), sample (A, B, C, D), site, and type]

Note: PAC used CCS reads only; PAC on average covers only 3k GENCODE genes.

Page 30: 20140710 6 c_mason_ercc2.0_workshop

Inter-site normalized gene expression variation: CV and R²

[Figure, panels a-g, showing:
- Coefficient of variation (%), axis 0 to 200, for 454-A, 454-B, ILMN-A, ILMN-B, PAC-A1 to PAC-A3, PAC-B1 to PAC-B3, PGM-A, PGM-B, PRO-A, PRO-B
- Correlation coefficients by platform. Intra-platform, sample A: 454 0.92, ILMN 0.97, PGM 0.91, PRO 0.96. Intra-platform, sample B: 454 0.85, ILMN 0.98, PGM 0.95, PRO 0.95. Inter-platform, sample A: 454-ILMN 0.82, 454-PGM 0.88, 454-PRO 0.85, ILMN-PGM 0.84, ILMN-PRO 0.93, PGM-PRO 0.89. Inter-platform, sample B: 454-ILMN 0.75, 454-PGM 0.82, 454-PRO 0.78, ILMN-PGM 0.86, ILMN-PRO 0.92, PGM-PRO 0.92
- Correlation coefficients by sample: A-AH 0.95, A-AS 0.94, A-AR 0.94, AH-AS 0.96, AH-AR 0.97, AR-AS 0.97, B-BR 0.96
- Detected junctions vs. bases sequenced (log10), by platform, samples A and B
- Junctions per million bases, by platform, samples A and B
- Junctions (known, novel) by occurrence among platforms, samples A and B
- Detected genes vs. bases sequenced (log10), by platform, samples A and B]

ILMN with lowest inter-site CV
ILMN with highest inter-site R²
ILMN-PRO & PRO-PGM with highest R²
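The two statistics named on this slide can be sketched as follows: the coefficient of variation (CV) of one gene's normalized expression across sites, and the squared Pearson correlation (R²) between two expression profiles. This is a minimal sketch. It assumes the sample standard deviation (n - 1 denominator) and a plain Pearson correlation, and the study's exact normalization and estimator are not given in the slides.

```python
import math
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("mean is zero, CV undefined")
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return 100.0 * math.sqrt(variance) / mean


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least two")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile, correlation undefined")
    return (sxy * sxy) / (sxx * syy)
```

A gene measured at 8, 10, and 12 normalized units across three sites has a mean of 10 and a sample standard deviation of 2, so its CV is 20%. A lower CV and a higher R² both indicate better agreement between sites.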

               

Page 31: 20140710 6 c_mason_ercc2.0_workshop

Gene detection is log-linear; junction detection is length-dependent

[Figure: the multi-panel figure (panels a-g) of inter-site CV, correlation coefficients, detected junctions vs. bases sequenced (log10), junctions per million bases, junction occurrence among platforms, and detected genes vs. bases sequenced (log10), for platforms 454, ILMN, PAC, PGM, PRO and samples A and B]
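A log-linear relationship between detected genes and sequencing depth means that gene count rises roughly linearly with log10 of the bases sequenced. The sketch below fits such a line by ordinary least squares. The function names are illustrative, and any inputs must come from the reader's own data, since the slide text gives no per-sample values.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(bases: Sequence[float], genes: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of genes = intercept + slope * log10(bases).

    Returns (slope, intercept). The slope is the number of additional
    genes detected per tenfold increase in bases sequenced.
    """
    if len(bases) != len(genes) or len(bases) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log10(b) for b in bases]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(genes) / len(genes)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all depths are identical, slope undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, genes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_genes(slope: float, intercept: float, bases: float) -> float:
    """Predicted number of detected genes at a given sequencing depth."""
    return intercept + slope * math.log10(bases)
```

Under this model, each tenfold increase in sequencing adds about the same number of newly detected genes, which is the diminishing return that a log-linear curve implies.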

               

Page 32: 20140710 6 c_mason_ercc2.0_workshop

Junc$ons  detec$on  efficiency  highly  variable;  agreement  common  

[Figure, panels a to g, samples A and B on 454, ILMN, PAC, PGM, and PRO:
coefficient of variation (%) per platform and sample (454-A/B, ILMN-A/B, PAC-A1 to A3, PAC-B1 to B3, PGM-A/B, PRO-A/B);
intra-platform and inter-platform correlation coefficients (printed intra-platform values 0.85 to 0.98, inter-platform values 0.75 to 0.93);
correlation coefficients for the sample pairs A-AH, A-AS, A-AR, AH-AS, AH-AR, AR-AS, and B-BR (printed values 0.94 to 0.97);
detected junctions vs. bases sequenced (log10);
junctions per million bases by platform, known and novel;
number of junctions by occurrence among platforms;
detected genes vs. bases sequenced (log10)]

Note: Only a subset of reads from each platform is used, to normalize the scale

Most of the known junctions are shared by at least 3 platforms

Page 33: 20140710 6 c_mason_ercc2.0_workshop

Sequencing depth is important for discovering low-abundance transcripts

[Figure: panels a, b]
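The link between depth and low-abundance transcripts can be illustrated with a small calculation. This is a minimal sketch, not taken from the slides: it assumes read counts for a transcript follow a Poisson distribution whose mean is the total number of reads times the transcript's share of the library, and the function name, the share of 1e-7, and the 5-read threshold are all hypothetical.

```python
import math

def detection_probability(total_reads, transcript_fraction, min_reads=1):
    """P(at least min_reads reads) under an assumed Poisson model."""
    lam = total_reads * transcript_fraction  # expected reads for this transcript
    # P(X < min_reads) = sum over k < min_reads of e^-lam * lam^k / k!
    p_below = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                  for k in range(min_reads))
    return 1.0 - p_below

# Hypothetical transcript that makes up one in ten million sequenced reads.
fraction = 1e-7
for depth in (1e6, 1e7, 1e8):
    p = detection_probability(depth, fraction, min_reads=5)
    print(f"{depth:.0e} reads: P(>= 5 reads) = {p:.4f}")
```

Under these assumptions the probability of seeing the transcript moves from negligible to near certain as depth grows by two orders of magnitude, which is the qualitative point of the slide.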

Page 34: 20140710 6 c_mason_ercc2.0_workshop

Inter-platform differential gene expression shows 88-97% agreement

Shared sets of more than 1000 genes are indicated in red, 100-999 in yellow, and <100 in blue.

Unique DEGs:
• 454: 3.0%
• POLYA: 9.2%
• RIBO: 8.8%
• PRO: 11.9%
• PGM: 3.9%

Page 35: 20140710 6 c_mason_ercc2.0_workshop

Proportional Venn diagrams don't always add clarity, but they are pretty

[Figure: proportional Venn diagrams of DEG sets shared among RIBO, POLYA, 454, PGM, and PRO, with gene counts per region]

Page 36: 20140710 6 c_mason_ercc2.0_workshop

II: Inter-protocol quantification
• Illumina
  – Poly-A selection (POLYA)
  – Ribo-depletion (RIBO)

Page 37: 20140710 6 c_mason_ercc2.0_workshop

Gene region distribution varies between protocols

[Figure, panels a to d, RIBO (ribo-depletion) vs. POLYA libraries:
a: mapping distribution (%) per sample across intergenic, mito, exon, 3' UTR, 5' UTR, intron, and ribo regions;
b: number of genes detected at ≥2, ≥4, ≥8, ≥16, ≥32, ≥64, and ≥128 mapped reads for PolyA, RiboDep, and their overlap;
c: number of UTRs detected at the same mapped-read thresholds for PolyA, RiboDep, and their overlap;
d: frequency of FPKM differences, log2 RIBO - log2 POLYA]

       

Page 38: 20140710 6 c_mason_ercc2.0_workshop

Very similar gene measurements (RPKM) between polyA and Ribo-depletion

[Figure: number of genes vs. RPKM difference (polyA – ribo0), with regions marked Ribo0 > PolyA and PolyA > Ribo0]

Page 39: 20140710 6 c_mason_ercc2.0_workshop

ENSEMBL Gene ID  Gene Symbol  Description  Length  PolyA Reads  RiboDep Reads  PolyA FPKM  RiboDep FPKM   FPKM Diff
ENSG00000210082  J01415.4     Mt_rRNA        1559     17040155      133679955    18568.31     200404.74  -181836.43
ENSG00000211459  J01415.24    Mt_rRNA         954      2232892       14445488     3976.16      35389.28   -31413.11
ENSG00000202198  RN7SK        misc_RNA        331         3303        2818202       16.95      19899.03   -19882.08
ENSG00000258486  RN7SL1       antisense       300         3061         520624       17.33       4055.93    -4038.60
ENSG00000202364  SNORD3A      snoRNA          216          718         186779        5.65       2020.98    -2015.33
ENSG00000202538  RNU4-2       snRNA           141          168         102560        2.02       1699.99    -1697.97
ENSG00000251562  MALAT1       lincRNA        8708       437102        4931496       85.27       1323.57    -1238.30
ENSG00000199916  RMRP         misc_RNA        264          148          83019        0.95        734.96     -734.00
ENSG00000201098  RNY1         misc_RNA        113          172          19210        2.59        397.32     -394.73
ENSG00000238741  SCARNA7      snoRNA          330           53          50182        0.27        355.40     -355.13
ENSG00000200795  RNU4-1       snRNA           141           35          18724        0.42        310.36     -309.94
ENSG00000200087  SNORA73B     snoRNA          204          336          23778        2.80        272.42     -269.62
ENSG00000252010  SCARNA5      snoRNA          276           29          25379        0.18        214.91     -214.73
ENSG00000199568  RNU5A-1      snRNA           116           38          10224        0.56        205.99     -205.44
ENSG00000252481  SCARNA13     snoRNA          275          145          24034        0.90        204.26     -203.36
ENSG00000212232  SNORD17      snoRNA          237          102          20571        0.73        202.86     -202.13
ENSG00000207008  SNORA54      snoRNA          123            8           9655        0.11        183.46     -183.35
ENSG00000200156  RNU5B-1      snRNA           116           28           8301        0.41        167.25     -166.84
ENSG00000209582  SNORA48      snoRNA          135           38           9272        0.48        160.52     -160.04
ENSG00000239002  SCARNA10     snoRNA          330           19          21801        0.10        154.40     -154.30
ENSG00000254911  SCARNA9      antisense       353          501          19842        2.41        131.37     -128.96
ENSG00000230043  TMSB4XP6     pseudogene      135            0           6272        0.00        108.58     -108.58
ENSG00000239039  SNORD13      snoRNA          104            0           4521        0.00        101.60     -101.60
ENSG00000208892  SNORA49      snoRNA          136            7           5352        0.09         91.97      -91.89
ENSG00000223336  RNU2-6P      snRNA           190           22           6365        0.20         78.29      -78.10

Supplemental Table 3 - Top 25 Genes with Highest Enrichment from Ribo-Depletion Preparation
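The FPKM columns in the table can be related to the read counts with the usual FPKM definition. The sketch below is illustrative only: it assumes the table's read counts are the fragment counts that went into FPKM and that the standard formula (fragments per kilobase per million mapped fragments) was used. The slides do not state the library totals, so the code only checks that two rows imply roughly the same PolyA total. The function names are hypothetical.

```python
def fpkm(fragments, length_bp, total_mapped):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)

def implied_total(fragments, length_bp, fpkm_value):
    """Invert the FPKM formula to get the library total a table row implies."""
    return fragments * 1e9 / (length_bp * fpkm_value)

# PolyA columns for two rows of the table: (length, reads, FPKM).
rows = {
    "J01415.4": (1559, 17040155, 18568.31),
    "RN7SK": (331, 3303, 16.95),
}
totals = {name: implied_total(reads, length, value)
          for name, (length, reads, value) in rows.items()}
for name, total in totals.items():
    print(f"{name}: implied PolyA total of about {total / 1e6:.0f} million")
```

If the two implied totals agree closely, the assumed formula is at least consistent with how the table was computed; it does not confirm the pipeline that produced it.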

Page 40: 20140710 6 c_mason_ercc2.0_workshop

polyA favors high expressors
• Gencode14
• Unique genes: 55889

[Figure: two bar charts, "Genome-level Gene Counts" and "Gene Overlap", showing the number of genes (0 to 30,000) detected at read counts ≥2, ≥4, ≥8, ≥16, ≥32, ≥64, and ≥128 for PolyA and RiboZero]

Page 41: 20140710 6 c_mason_ercc2.0_workshop

High-overlap DEGs and comparable accuracy

[Figure, panels a to c, for sample comparisons A-B, A-C, A-D, B-C, B-D, and C-D at FDR 0.001, 0.01, 0.05 and fold change (FC) 1.5 and 2:
a: proportion of DEGs by type (POLYA, RIBO, Shared);
b: number of DEGs (up, down) for POLYA-POLYA and RIBO-RIBO libraries;
ROC curves, true positive rate (sensitivity) vs. false positive rate (1 - specificity), one panel per comparison and threshold;
c: external validation (MCC) for RIBO and POLYA]

Values printed on the ROC panels (RIBO / POLYA):

Threshold            A-B          A-C          A-D          B-C          B-D          C-D
FDR 0.05,  FC 1.5    0.83 / 0.80  0.86 / 0.89  0.80 / 0.78  0.86 / 0.83  0.82 / 0.82  0.83 / 0.81
FDR 0.01,  FC 1.5    0.83 / 0.81  0.87 / 0.89  0.80 / 0.78  0.85 / 0.82  0.82 / 0.82  0.82 / 0.80
FDR 0.001, FC 1.5    0.83 / 0.81  0.87 / 0.90  0.79 / 0.77  0.85 / 0.83  0.83 / 0.83  0.83 / 0.81
FDR 0.05,  FC 2      0.84 / 0.82  0.91 / 0.95  0.84 / 0.82  0.88 / 0.85  0.86 / 0.88  0.84 / 0.82
FDR 0.01,  FC 2      0.83 / 0.82  0.91 / 0.95  0.83 / 0.82  0.88 / 0.85  0.86 / 0.88  0.84 / 0.82
FDR 0.001, FC 2      0.83 / 0.82  0.92 / 0.95  0.82 / 0.81  0.87 / 0.85  0.86 / 0.88  0.85 / 0.83

Using TaqMan as external validation (812 reactions)
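The external validation panel is labeled MCC, which is read here as the Matthews correlation coefficient (an assumption; the slide does not expand the abbreviation). As a hedged sketch of how such a score could be computed from RNA-seq DEG calls and a reference such as TaqMan, with hypothetical function names and made-up calls:

```python
import math

def confusion(calls, truth):
    """Count TP, FP, TN, FN for paired boolean calls (test vs. reference)."""
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    fp = sum(1 for c, t in zip(calls, truth) if c and not t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    fn = sum(1 for c, t in zip(calls, truth) if not c and t)
    return tp, fp, tn, fn

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0

# Made-up example: is each gene called differentially expressed?
rnaseq_calls = [True, True, False, False, True, False, True, False]
taqman_calls = [True, True, False, False, False, False, True, True]
print(f"MCC = {mcc(*confusion(rnaseq_calls, taqman_calls)):.2f}")
```

MCC uses all four cells of the confusion table, so it stays informative when DEGs and non-DEGs are unbalanced, which may be one reason to prefer it over plain accuracy here.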

               

Page 42: 20140710 6 c_mason_ercc2.0_workshop

III:  Degraded  RNA?  

Page 43: 20140710 6 c_mason_ercc2.0_workshop

[Figure: coverage (%) along the gene (%, 5' to 3') for UHR-RIN0, UHR-RIN3, UHR-RIN6, and UHR-RIN9]

Page 44: 20140710 6 c_mason_ercc2.0_workshop

Many areas could be impacted

1. Sequencing data: reads, alignments, sorted BAMs, annotations, queries; alignment on HPC nodes; references: hg19, RefSeq, miRBase, rRNA, adapters
2. Genetic variation (SNVs, indels, CNVs)
3. Differential expression by gene, exon, splice isoform, allele, & transcript
4. Find ncRNAs and new genes
5. Predict gene fusions, polyA sites, and alterations to miRNA binding sites
6. De novo assembly of remaining reads (viruses/bacteria/other), e.g. TCTGCTTTAGGATAGATCGATAGCTAGTTCAT

Page 45: 20140710 6 c_mason_ercc2.0_workshop

Entropy  is  usually  a  source  of  fear

Page 46: 20140710 6 c_mason_ercc2.0_workshop

"FEAR is  the  main  source  of    superstition,    and  one  of  the  main  sources  of      cruelty. To  conquer  fear  is  the  beginning  of    wisdom."      

–  Bertrand  Russell  

[Figure: correlation heatmap of HBR-RIN9, HBR-RIN0, UHR-RIN0, UHR-RIN3, UHR-RIN9, and UHR-RIN6; color key from 0.6 to 1]

Page 47: 20140710 6 c_mason_ercc2.0_workshop

Can we remove superstition?

• HEAT: 99 °C for 10 minutes
• SONICATION: 6 x 55 seconds at 5% DC, intensity 3, 100 c/b
• RNase: RNase A, 1 ng/uL

Until RIN < 2

Page 48: 20140710 6 c_mason_ercc2.0_workshop

Degraded RNA looks great!

[Figure: heatmap of coverage across the gene body (%), 5' to 3', annotated by type (intact or degraded), site, sample, and platform (454, ILMN, PAC, PGM, PRO)]

Page 49: 20140710 6 c_mason_ercc2.0_workshop

Degraded RNA highly correlates with intact RNA gene expression

[Figure: the multi-panel platform comparison (coefficient of variation, intra- and inter-platform correlation, detected junctions and genes vs. bases sequenced), including the correlation coefficients for the sample pairs A-AH, A-AS, A-AR, AH-AS, AH-AR, AR-AS, and B-BR (printed values 0.94 to 0.97). Platform labels shown: Pacific Biosciences RS, Life Technologies Ion PGM, Life Technologies Ion Proton]

(AR) (AS) (AH) (BR)

Illumina Ribo-depletion protocol
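A correlation like the ones reported for the intact and degraded samples could be computed along these lines. This is a sketch under assumptions not stated in the slides: it uses Pearson correlation on log2-transformed expression with a pseudocount of 1, and the expression values are made up for illustration.

```python
import math

def log2_pearson(x, y, pseudocount=1.0):
    """Pearson correlation of log2(value + pseudocount) for paired gene values."""
    lx = [math.log2(v + pseudocount) for v in x]
    ly = [math.log2(v + pseudocount) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var_x = sum((a - mx) ** 2 for a in lx)
    var_y = sum((b - my) ** 2 for b in ly)
    return cov / math.sqrt(var_x * var_y)

# Made-up expression values for six genes in an intact and a degraded library.
intact = [0.0, 5.2, 48.0, 310.0, 1250.0, 9.7]
degraded = [0.3, 4.1, 55.0, 280.0, 1100.0, 12.5]
print(f"r = {log2_pearson(intact, degraded):.3f}")
```

The log transform keeps a handful of very highly expressed genes from dominating the coefficient; whether the study used Pearson or a rank-based measure is not stated on the slide.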

Page 50: 20140710 6 c_mason_ercc2.0_workshop

0.    Genomes    

1.  RNA  Standards  

2. Removing RNA-Seq Variation

3.  Epitranscriptome  

4.  #JustSequenceIt  

 

Page 51: 20140710 6 c_mason_ercc2.0_workshop

How does sequencing quality affect inter-site gene expression quantification?

SEQC  study  

Page 52: 20140710 6 c_mason_ercc2.0_workshop

Ameliorating Inter-site Variation

SEQC samples: each site has 4 replicates
Sites: ILM1, ILM2, ILM3, ILM4, ILM5, ILM6
Replicate 5 libraries were vendor supplied

SEQC study

HiSeq 2000

Page 53: 20140710 6 c_mason_ercc2.0_workshop

Ameliorating Inter-site Variation

SEQC samples: each site has 4 replicates
Sites: ILM1, ILM2, ILM3, ILM4, ILM5, ILM6
Replicate 5 libraries were vendor supplied

SEQC study

Page 54: 20140710 6 c_mason_ercc2.0_workshop

Determine sources of sequencing variation

SEQC study

Page 55: 20140710 6 c_mason_ercc2.0_workshop

Nucleotide frequency bias from library preparation

[Figure, panels a to f, SEQC study, samples A to D, replicates 1 to 5, sites ILM1 to ILM6:
percentage of reads vs. percent of gene body (5' to 3');
maximum GC read % by site;
nucleotide frequency (%) by position in read, for A, G, C, and T;
coefficient of variation (gene body %) by site;
average base error rate (%) by site;
percentage of reads vs. GC content (0% to 100%)]

Replicate 5 libraries were vendor supplied

<- 5th replicate of ILM2 and ILM3

<- 1st replicate of ILM2

<- 1st replicate of ILM3
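The per-position nucleotide frequency and GC panels summarize simple read-level statistics. A minimal sketch of how they could be tabulated from raw read sequences follows. The function names and the toy reads are hypothetical, and a real analysis would stream reads from FASTQ files rather than a list.

```python
from collections import Counter

def position_frequencies(reads):
    """Percentage of A, C, G, T at each read position (other symbols ignored)."""
    length = max(len(read) for read in reads)
    counts = [Counter() for _ in range(length)]
    for read in reads:
        for i, base in enumerate(read.upper()):
            if base in "ACGT":
                counts[i][base] += 1
    table = []
    for c in counts:
        total = sum(c.values())
        table.append({b: (100.0 * c[b] / total if total else 0.0) for b in "ACGT"})
    return table

def gc_percent(read):
    """GC content of one read, as a percentage of its A/C/G/T bases."""
    bases = [b for b in read.upper() if b in "ACGT"]
    return 100.0 * sum(b in "GC" for b in bases) / len(bases) if bases else 0.0

# Toy reads standing in for one library.
reads = ["ACGTAC", "ACGTTT", "GCGTAA", "ACCTAG"]
for i, freq in enumerate(position_frequencies(reads), start=1):
    print(i, {b: round(p, 1) for b, p in freq.items()})
print("GC% per read:", [round(gc_percent(r), 1) for r in reads])
```

Comparing these tables between sites or replicates is one way to spot the kind of library-preparation bias the slide attributes to the vendor-supplied replicate 5 libraries.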

Page 56: 20140710 6 c_mason_ercc2.0_workshop

Sample A vs. A: the answer should be zero DEGs; SVA can remove false positive DEGs

[Figure, panels a to c, Illumina, SEQC study:
a: inter-site correlation coefficients for each pair of sites ILM1 to ILM6 (printed values 0.955 to 0.976);
b: two-dimensional plot (Dimension 1 vs. Dimension 2) of samples A, B, C, and D by site;
c: false positive DEGs per site comparison for samples A to D, without SVA and with SVA]

SEQC study; Leek JT, Storey JD. PLoS Genet. 2007

Compare the same sample across sites
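The idea behind removing site-driven false positives can be sketched as follows. This is a deliberately simplified illustration, not the SVA algorithm of Leek and Storey (which is available as the Bioconductor R package sva). Here the known sample effect is removed gene by gene, and the leading factor of the residuals is taken as an estimate of an unmodeled variable such as site. All data are simulated and all names are hypothetical.

```python
import math
import random

def remove_group_means(row, groups):
    """Subtract each known group's mean so only unmodeled variation remains."""
    sums, counts = {}, {}
    for value, group in zip(row, groups):
        sums[group] = sums.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    return [value - sums[group] / counts[group] for value, group in zip(row, groups)]

def leading_sample_factor(residuals, iterations=100):
    """Power iteration for the top right singular vector of a genes x samples matrix."""
    n_samples = len(residuals[0])
    v = [1.0 / (j + 1) for j in range(n_samples)]  # arbitrary starting vector
    for _ in range(iterations):
        u = [sum(r * x for r, x in zip(row, v)) for row in residuals]   # R v
        w = [sum(row[j] * u_i for row, u_i in zip(residuals, u))        # R^T u
             for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Simulated data: 200 genes, 12 libraries, known sample label, hidden site label.
random.seed(0)
samples = ["A"] * 6 + ["B"] * 6
site = [0, 1] * 6
expression = []
for _ in range(200):
    base, sample_shift, site_shift = random.gauss(8, 1), random.gauss(0, 1), random.gauss(0, 2)
    expression.append([base + sample_shift * (s == "B") + site_shift * k + random.gauss(0, 0.5)
                       for s, k in zip(samples, site)])

residuals = [remove_group_means(row, samples) for row in expression]
factor = leading_sample_factor(residuals)
print(f"|correlation| with hidden site label: {abs(pearson(factor, site)):.2f}")
```

In a differential expression model, a factor estimated this way would be added as a covariate so that site-driven differences are not reported as DEGs. The real method also chooses the number of factors and weights genes, which this sketch omits.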

       

Page 57: 20140710 6 c_mason_ercc2.0_workshop

SVA removes false positive DEGs; validated by PGM data from the ABRF-NGS study

[Figure, Illumina (SEQC study), panels a to c: inter-site correlation coefficients, two-dimensional sample plot, and false positive DEGs per site comparison without SVA and with SVA, for sites ILM1 to ILM6.
Figure, PGM (ABRF-NGS study), panels a to g, samples A and B, sites PGM1 to PGM3, replicates 1 to 4: two-dimensional sample plot (Dimension 1 vs. Dimension 2); maximum GC read % by site; average base error rate (%) by site; coefficient of variation (gene body %) by site; false positive DEGs per site comparison, number of DEGs per site, and MCC per site, each without SVA and with SVA]

SEQC study; Li et al., in press, Nat. Biotech., 2014; ABRF-NGS study

       

Page 58: 20140710 6 c_mason_ercc2.0_workshop

NIST, SEQC, ABRF, ENCODE, others

•  Provide reference data resources
•  Best practices for
   –  gene quantification,
   –  isoform characterization,
   –  dynamic range comparisons,
   –  managing inter-site and intra-site variation,
   –  analysis pipelines, and
   –  cross-platform testing of transcriptome hypotheses
•  To address some other aspects of RNA-seq, including
   –  variant detection,
   –  allele-specific expression,
   –  RNA editing, and gene fusions.
•  And more …

Page 59: 20140710 6 c_mason_ercc2.0_workshop

0.    Genomes    

1.  RNA  Standards  

2.  Removing RNA-Seq Variation

3.  Epitranscriptome  

4.  #JustSequenceIt  

 

Page 60: 20140710 6 c_mason_ercc2.0_workshop

Which  transcriptome  is  important?      

All  …  RNA  bases.  

Page 61: 20140710 6 c_mason_ercc2.0_workshop
Page 62: 20140710 6 c_mason_ercc2.0_workshop

The four-base genome is just the beginning

Page 63: 20140710 6 c_mason_ercc2.0_workshop

The four-base genome is just the beginning

Page 64: 20140710 6 c_mason_ercc2.0_workshop

There are many RNA modifications as well:

Chuan  He  

Page 65: 20140710 6 c_mason_ercc2.0_workshop

Table 1 - List of Base Modifications Covered by Claims

Abbreviation   Chemical name
m1acp3Ψ        1-methyl-3-(3-amino-3-carboxypropyl) pseudouridine
m1A            1-methyladenosine
m1G            1-methylguanosine
m1I            1-methylinosine
m1Ψ            1-methylpseudouridine
m1Am           1,2'-O-dimethyladenosine
m1Gm           1,2'-O-dimethylguanosine
m1Im           1,2'-O-dimethylinosine
m2A            2-methyladenosine
ms2io6A        2-methylthio-N6-(cis-hydroxyisopentenyl) adenosine
ms2hn6A        2-methylthio-N6-hydroxynorvalyl carbamoyladenosine
ms2i6A         2-methylthio-N6-isopentenyladenosine
ms2m6A         2-methylthio-N6-methyladenosine
ms2t6A         2-methylthio-N6-threonyl carbamoyladenosine
s2Um           2-thio-2'-O-methyluridine
s2C            2-thiocytidine
s2U            2-thiouridine
Am             2'-O-methyladenosine
Cm             2'-O-methylcytidine
Gm             2'-O-methylguanosine
Im             2'-O-methylinosine
Ψm             2'-O-methylpseudouridine
Um             2'-O-methyluridine
Ar(p)          2'-O-ribosyladenosine (phosphate)
Gr(p)          2'-O-ribosylguanosine (phosphate)
acp3U          3-(3-amino-3-carboxypropyl)uridine
m3C            3-methylcytidine
m3Ψ            3-methylpseudouridine
m3U            3-methyluridine
m3Um           3,2'-O-dimethyluridine
imG-14         4-demethylwyosine
s4U            4-thiouridine
chm5U          5-(carboxyhydroxymethyl)uridine
mchm5U         5-(carboxyhydroxymethyl)uridine methyl ester
inm5s2U        5-(isopentenylaminomethyl)-2-thiouridine
inm5Um         5-(isopentenylaminomethyl)-2'-O-methyluridine
inm5U          5-(isopentenylaminomethyl)uridine
nm5s2U         5-aminomethyl-2-thiouridine
ncm5Um         5-carbamoylmethyl-2'-O-methyluridine
ncm5U          5-carbamoylmethyluridine
cmnm5Um        5-carboxymethylaminomethyl-2'-O-methyluridine
cmnm5s2U       5-carboxymethylaminomethyl-2-thiouridine
cmnm5U         5-carboxymethylaminomethyluridine
cm5U           5-carboxymethyluridine
f5Cm           5-formyl-2'-O-methylcytidine
f5C            5-formylcytidine
hm5C           5-hydroxymethylcytidine
ho5U           5-hydroxyuridine
mcm5s2U        5-methoxycarbonylmethyl-2-thiouridine
mcm5Um         5-methoxycarbonylmethyl-2'-O-methyluridine
mcm5U          5-methoxycarbonylmethyluridine
mo5U           5-methoxyuridine
m5s2U          5-methyl-2-thiouridine
mnm5se2U       5-methylaminomethyl-2-selenouridine
mnm5s2U        5-methylaminomethyl-2-thiouridine
mnm5U          5-methylaminomethyluridine
m5C            5-methylcytidine
m5D            5-methyldihydrouridine
m5U            5-methyluridine
τm5s2U         5-taurinomethyl-2-thiouridine
τm5U           5-taurinomethyluridine
m5Cm           5,2'-O-dimethylcytidine
m5Um           5,2'-O-dimethyluridine
preQ1          7-aminomethyl-7-deazaguanosine
preQ0          7-cyano-7-deazaguanosine
m7G            7-methylguanosine
G+             archaeosine
D              dihydrouridine
oQ             epoxyqueuosine
galQ           galactosyl-queuosine
OHyW           hydroxywybutosine
I              inosine
imG2           isowyosine
k2C            lysidine
manQ           mannosyl-queuosine
mimG           methylwyosine
m2G            N2-methylguanosine
m2Gm           N2,2'-O-dimethylguanosine
m2,7G          N2,7-dimethylguanosine
m2,7Gm         N2,7,2'-O-trimethylguanosine
m²₂G           N2,N2-dimethylguanosine
m²₂Gm          N2,N2,2'-O-trimethylguanosine
m2,2,7G        N2,N2,7-trimethylguanosine
ac4Cm          N4-acetyl-2'-O-methylcytidine
ac4C           N4-acetylcytidine
m4C            N4-methylcytidine
m4Cm           N4,2'-O-dimethylcytidine
m⁴₂Cm          N4,N4,2'-O-trimethylcytidine
io6A           N6-(cis-hydroxyisopentenyl)adenosine
ac6A           N6-acetyladenosine
g6A            N6-glycinylcarbamoyladenosine
hn6A           N6-hydroxynorvalylcarbamoyladenosine
i6A            N6-isopentenyladenosine
m6t6A          N6-methyl-N6-threonylcarbamoyladenosine
m6A            N6-methyladenosine
t6A            N6-threonylcarbamoyladenosine
m6Am           N6,2'-O-dimethyladenosine
m⁶₂A           N6,N6-dimethyladenosine
m⁶₂Am          N6,N6,2'-O-trimethyladenosine
o2yW           peroxywybutosine
Ψ              pseudouridine
Q              queuosine
OHyW           undermodified hydroxywybutosine
cmo5U          uridine 5-oxyacetic acid
mcmo5U         uridine 5-oxyacetic acid methyl ester
yW             wybutosine
imG            wyosine

m6A is just 1 of the 110 known RNA modifications from the RNA Modification Database

Page 66: 20140710 6 c_mason_ercc2.0_workshop

m6A  marks  are  widespread  in  cancer  cells’  RNA  

Meyer  et  al.,  Cell,  2012  

Page 67: 20140710 6 c_mason_ercc2.0_workshop

A new method: MeRIP-Seq

Meyer  et  al.,  Cell,  2012  

Page 68: 20140710 6 c_mason_ercc2.0_workshop

Conservation of signal and sites in orthologous genes

Meyer  et  al.,  Cell,  2012  

Page 69: 20140710 6 c_mason_ercc2.0_workshop

Peaks also show strong conservation

Meyer,  2012  

Page 70: 20140710 6 c_mason_ercc2.0_workshop

Many putative roles for m6A in RNA

Paz-Yaakov et al., 2010

Saletore, Chen-Kiang, and Mason, RNA Biology, 2013.

Page 71: 20140710 6 c_mason_ercc2.0_workshop
Page 72: 20140710 6 c_mason_ercc2.0_workshop

RNA modifications give a new layer of cellular regulation

[Figure: chemical structures of RNA adenosine labeled A, m6A, hm6A, f6A, and ca6A, with enzyme labels Mettl3 and FTO/ALKBH5 (O2, FeII, α-KG).]

Li  and  Mason,  ARGHG,  2014  

Page 73: 20140710 6 c_mason_ercc2.0_workshop

New  layer  to  study!  

Page 74: 20140710 6 c_mason_ercc2.0_workshop

Direct Detection of Methylation on PacBio

[Figure: two fluorescence intensity (a.u.) vs. time (s) traces, spanning 70.5-74.5 s and 104.5-108.5 s, annotated with base calls; one trace includes a methylated adenosine (mA).]

Approach: Kinetic detection of methylated bases during SMRT DNA sequencing

Example: N6-methyladenosine (mA)

Page 75: 20140710 6 c_mason_ercc2.0_workshop

SMRT RNA Sequencing – Assay Design

I.  RNA template (purple) is hybridized to a DNA primer (orange) and immobilized on the bottom of a zero-mode waveguide (ZMW) in the presence of fluorescently labeled nucleotide analogues.

II.  After HIV RT is added to the solution, the polymerase binds to the RNA template at the 3' end of the DNA primer.

III.  The first nucleotide analogue can bind to the active site of the polymerase, producing a fluorescent pulse whose color is characteristic of the nucleotide type (i.e., A, C, G, or T). With HIV RT, the rates of nucleotide binding, dissociation, and incorporation have not been optimized to the point where every binding event results in an incorporation event (image IV.). Consequently, nucleotide analogues often dissociate before they can be incorporated, producing a series of like pulses before the polymerase can translocate to the next position and bind the nucleotide analogue complementary to the next nucleotide in the RNA template.

IV.  Incorporation of the nucleotide and translocation to the next nucleotide on the RNA template.

Page 76: 20140710 6 c_mason_ercc2.0_workshop

First demonstration SMRT RNA Sequencing – Expected Signal

A.  Example RNA template is shown in black. DNA primer is shown in orange. The first cDNA strand synthesized by HIV RT during SMRT RNA Sequencing is color-coded (A – yellow, C – red, G – green, T – blue). The first cDNA strand is complementary to the unhybridized section of the example RNA template and can be used to determine the sequence of that section.

B.  A typical time trace obtained in a ZMW with RNA template immobilized on the bottom surface. One can observe a succession of blocks of like-colored pulses corresponding to the sequence of cDNA strand (blocks are indicated by lines above the trace).

C.  As mentioned on Slide 3 – Point III., due to the kinetics of the current HIV RT, we observe multiple binding events at each RNA template position before nucleotide incorporation and translocation to the next position.
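The "blocks of like-colored pulses" described in points B and C can be illustrated with a small sketch. This is not the authors' base caller: the function names and the example pulse list are hypothetical, and the sketch assumes each pulse has already been labeled A, C, G, or T. It simply collapses each run of identical pulses into one call for the cDNA strand.

```python
from itertools import groupby

def collapse_pulses(pulses):
    """Collapse consecutive identical pulse labels into one base per block.

    `pulses` is an iterable of single-letter labels (hypothetical input).
    """
    return "".join(base for base, _run in groupby(pulses))

def block_lengths(pulses):
    """Return (base, number_of_pulses) for each block of like pulses."""
    return [(base, sum(1 for _ in run)) for base, run in groupby(pulses)]

# Hypothetical trace: several binding events per template position.
example = list("AAACCGGGGTTA")
print(collapse_pulses(example))
print(block_lengths(example))
```

A caveat of this naive collapsing: two adjacent identical bases in the template would merge into a single block, so a real caller would presumably need more than pulse labels (for example, timing between pulses) to resolve them.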


Page 77: 20140710 6 c_mason_ercc2.0_workshop

Saletore  et  al.,  Genome  Biology,  2012  

Page 78: 20140710 6 c_mason_ercc2.0_workshop

m6A is distinct from A

Saletore  et  al.,  Genome  Biology,  2012  

Page 79: 20140710 6 c_mason_ercc2.0_workshop

Epigenome                              

Epitranscriptome                              

Epiproteome  (PTM)                              

DNA   RNA   Protein   transcription   translation

RT  

ACGT  

viRNA            

INHERITANCE or TRANSMISSION

prions  

iDNA  

mC,hmC,  8oxoG,  m6A  

(sn/sno/g)RNA   (t/r/tm)RNA  

(lnc/nc/mi/piwi/si/vi)RNA  

scRNA  

RbP  

Ribozymes   Prions  

iRNA   iProtein  

mC,hmC,  8oxoG,  m6A,  …   m6A,  5’G,polyA,  …   Acet,  Phos,  Citr,  SUMO,  …    

from  Saletore  et  al.,  Genome  Biology,  2012  

Page 80: 20140710 6 c_mason_ercc2.0_workshop

0.    Genomes    

1.  RNA  Standards  

2.  Removing RNA-Seq Variation

3.  Epitranscriptome  

4.  #JustSequenceIt  

 

Page 81: 20140710 6 c_mason_ercc2.0_workshop

Which  genome  is  important?      

All  …  Cells.  

Page 82: 20140710 6 c_mason_ercc2.0_workshop

Single-cell RNA-seq reveals extraordinary heterogeneity

Leukemia Sample

Workflow labels: Sort Cells, Load plate; Check Cells; Lyse, cDNA; Library prep. & RNA-Seq

[Figure: PCA plot (Principal Component 1 vs. Principal Component 2) of 96 purified single cells, C01-C96; heatmap of AML-related genes' expression (RPKM, color scale 0-30) across purified cells 1-96.]

Page 83: 20140710 6 c_mason_ercc2.0_workshop

Which  genome  is  important?      

All  …  Species.  

Page 84: 20140710 6 c_mason_ercc2.0_workshop

nhprtr.org  Pipes,  Li,  Bozinoski  et  al.,  NAR,  2013  

Page 85: 20140710 6 c_mason_ercc2.0_workshop

14 species selected for phylogenetic & pharmacogenomic significance:

Hominoids: Human, Chimpanzee, Gorilla
Old World monkeys: Rhesus Macaque, Japanese Macaque, Cynomolgus Macaque, Pig-tailed Macaque, Vervet, Sooty Mangabey, Olive Baboon
New World monkeys: Common Marmoset, Squirrel Monkey
Prosimians: Ring-tailed Lemur, Mouse Lemur

[Phylogeny figure: divergence-time labels ~70 mya, 35-40, 23-25, 10, 9, 8, 6.]

• Samples from Washington, Oregon, Yerkes, and Southwestern National Primate Research Centers; the Duke Lemur Center; the MD Anderson Cancer Center; and Covance, Inc.

Page 86: 20140710 6 c_mason_ercc2.0_workshop

21 Matching "BodyMap" Dissected Tissues

Brain          | Immune/Sex  | Soma
Cerebellum     | Bone Marrow | Adipose
Frontal Cortex | Lymph Node  | Colon
Hippocampus    | Spleen      | Heart
Hypothalamus   | Thymus      | Kidney
Pituitary      | Thyroid     | Liver
Temporal Lobe  | Ovary       | Lung
               | Testis      | Skeletal Muscle
               |             | Whole Blood

Phase I (2010-2012): RNA extracted from each tissue, pooled, sequenced on two FCs (HS2K, GAIIx) for overall annotation.

Phase II (2012-2014): RNA extracted from 12 tissues, individually sequenced (1/2 lane HS2K), for tissue-specific annotation.

Page 87: 20140710 6 c_mason_ercc2.0_workshop

AceView Genes Expressed

Species/isolate | Number of genes expressed | Known human genes | Un-annotated human genes
BAB | 52065 | 22376 | 29689
CMC | 50073 | 21916 | 28157
CMM | 50155 | 21974 | 28181
JMI | 49253 | 21785 | 27468
PTM | 50863 | 22084 | 28779
RMC | 49845 | 21888 | 27957
RMI | 49922 | 21913 | 28009

Jean Thierry-Mieg

Page 88: 20140710 6 c_mason_ercc2.0_workshop

Species | Library | # input sequences (billion) | # contigs | Total length (Mb) | N25 (bp) | N50 (bp) | N75 (bp) | Longest contig (bp)
Baboon | TOT | 0.149 | 658,581 | 281 | 1,029 | 434 | 276 | 35,170
Baboon | UDG | 1.543 | 1,131,951 | 844 | 3,534 | 1,368 | 479 | 131,395
Chimpanzee | UDG | 1.465 | 987,615 | 1,433 | 6,356 | 3,806 | 1,666 | 47,873
Cynomolgus Macaque Chinese | RNA | 1.676 | 911,282 | 864 | 4,769 | 2,316 | 706 | 59,032
Cynomolgus Macaque Chinese | UDG | 1.630 | 990,604 | 1,055 | 5,479 | 2,822 | 875 | 122,916
Cynomolgus Macaque Mauritian | RNA | 1.078 | 1,142,531 | 929 | 4,368 | 1,813 | 525 | 30,364
Cynomolgus Macaque Mauritian | TOT | 0.166 | 526,723 | 200 | 657 | 377 | 266 | 36,976
Cynomolgus Macaque Mauritian | UDG | 1.177 | 1,015,657 | 834 | 4,360 | 1,927 | 532 | 22,239
Gorilla | UDG | 1.256 | 732,336 | 1,122 | 5,878 | 3,680 | 1,804 | 33,526
Japanese Macaque | RNA | 1.863 | 703,246 | 738 | 5,010 | 2,687 | 898 | 21,620
Marmoset | TOT | 0.253 | 332,782 | 118 | 545 | 357 | 261 | 14,561
Marmoset | UDG | 1.659 | 814,235 | 476 | 2,206 | 785 | 359 | 122,518
Pig-tailed Macaque | RNA | 1.725 | 969,993 | 1,044 | 5,189 | 2,807 | 916 | 25,056
Pig-tailed Macaque | UDG | 1.573 | 1,301,087 | 1,222 | 5,015 | 2,265 | 668 | 32,637
Rhesus Macaque Chinese | RNA | 1.210 | 923,017 | 836 | 4,694 | 2,230 | 639 | 32,796
Rhesus Macaque Chinese | UDG | 1.310 | 969,421 | 850 | 4,638 | 2,114 | 594 | 34,020
Rhesus Macaque Indian | RNA | 3.200 | 703,246 | 738 | 5,010 | 2,687 | 898 | 21,620
Rhesus Macaque Indian | UDG | 1.410 | 1,051,149 | 833 | 3,966 | 1,644 | 519 | 68,269
Ring-tailed Lemur | UDG | 1.403 | 611,678 | 822 | 6,169 | 3,630 | 1,468 | 33,401
Sooty Mangabey | UDG | 1.635 | 1,188,472 | 1,466 | 6,431 | 3,483 | 1,116 | 30,331
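For readers unfamiliar with the N25/N50/N75 columns, here is a minimal sketch of how such statistics are commonly computed. It assumes the usual definition (the slides do not say which convention was used): Nx is the length of the shortest contig in the smallest set of longest contigs that together cover at least x% of the total assembly length. The function name `nx` and the toy contig lengths are illustrative only.

```python
def nx(lengths, x):
    """Return the Nx statistic (e.g. x=50 for N50) for a list of contig lengths."""
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until x% of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= total * x:
            return length
    raise ValueError("no contigs given")

# Toy assembly (hypothetical lengths in bp, total 300).
contigs = [100, 80, 60, 40, 20]
print(nx(contigs, 25), nx(contigs, 50), nx(contigs, 75))  # 100 80 60
```

Under this definition N25 ≥ N50 ≥ N75 always holds, which matches the ordering seen in every row of the table.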

Page 89: 20140710 6 c_mason_ercc2.0_workshop

Chimpanzee de novo assembled transcriptome fully recovers most genes in current annotation

[Figure: histogram titled "Chimpanzee De Novo Assembled Transcriptome"; x-axis: proportion of gene covered by transcriptome (0.0-1.0); y-axis: number of genes (0-12,000).]

Page 90: 20140710 6 c_mason_ercc2.0_workshop

Which  genome  is  important?      

All … Interacting Elements.

Page 91: 20140710 6 c_mason_ercc2.0_workshop

We need more Longitudinal, Integrative Systems Biology

Page 92: 20140710 6 c_mason_ercc2.0_workshop
Page 93: 20140710 6 c_mason_ercc2.0_workshop
Page 94: 20140710 6 c_mason_ercc2.0_workshop

Scott Kelly – ISS for one year

Mark Kelly – Earth control

Telomere Length
DNA Mutations & Structural Variation
DNA Hydroxy-methylation
Chromatin
RNA expression & RNA Methylation
Proteomics
Antibody Titers
Cytokines
DNA Methylation
B-cells / T-cells
Targeted and Global Metabolomics
Microbiome
Cognition
Vasculature

Page 95: 20140710 6 c_mason_ercc2.0_workshop

These People are Awesome @mason_lab  

Page 96: 20140710 6 c_mason_ercc2.0_workshop

Gratitude to Many People and Places

Illumina: Gary Schroth, Marc Van Oene

Univ. Chicago: Yoav Gilad

Yale University: Nenad Sestan, Sherman Weissman

FDA/SEQC: Leming Shi

NIH/UDP/NCBI: Jean and Danielle Thierry-Mieg, William Gahl

Baylor: Jeff Rogers

MSKCC: Christina Leslie, Ross Levine

PKI/Geospiza: Todd Smith

Mason Lab: Noah Alexander, Marjan Bozinoski, Dhruva Chandramohan, Sagar Chhangawala, Jorge Gandara, Fran Garrett-Bakelman, Dyala Jaroudi, Sheng Li, Cem Meyden, Lenore Pipes, Darryl Reeves, Yogesh Saletore, Priyanka Vijay, Paul Zumbo

Cornell/WCMC: Jason Banfelder, Scott Blanchard, Selina Chen-Kiang, Olivier Elemento, Yariv Houvras, Samie Jaffrey, Ari Melnick, Margaret Ross, Adam Siepel, Epigenomics Core

Katze Lab/NHPRT: Michael Katze, Bob Palermo, Xinxia Pan

Icahn/MSSM: Eric Schadt, Andrew Kasarskis, Joel Dudley, Ali Bashir, Bobby Sebra

NASA: Kosuke Fujishima, Lynn Rothschild

ABRF: George Grills, Scott Tighe, Don Baldwin

UMMS: Maria E Figueroa

AMNH: George Amato, Mark Sidall

U-Minnesota: Ran Blekhman

@mason_lab