RNA-SEQ Francesca Bertolini DTU-Bioinformatics 08-06-2018

RNA-SEQ

Francesca Bertolini, DTU Bioinformatics

08-06-2018

DTU Bioinformatics, Technical University of Denmark

SUMMARY

1. Definition
2. List of major applications
3. RNA classifications
4. Sample collection and RNA integrity
5. Library preparation
6. Data analyses
   – Long reads alignment/assembly
   – Read count
   – Normalization
   – Differential expression
   – Gene enrichment
   – Short RNA

1. TRANSCRIPTOMICS

• Transcriptome: the complete set of transcripts in a cell and their quantity, for a specific developmental stage or physiological condition.
• Accurate profiling depends heavily on the quality of the annotation of the reference genome.


2. RNA-seq applications

• Abundance estimation/differential expression
• Alternative splicing
• RNA editing
• Novel transcripts
• Allele-specific expression
• Fusion transcripts


3. RNA classification

• Ribosomal RNA (rRNA): catalytic component of ribosomes (about 80-85% of total RNA)
• Transfer RNA (tRNA): transfers amino acids to the polypeptide chain at the ribosomal site of protein synthesis (about 15%)
• Coding RNA (mRNA): carries information about a protein sequence to the ribosomes (about 5%)
• Other non-coding regulatory RNAs

Other non-coding regulatory RNAs

Source: Delpu et al. 2016. Drug Discovery in Cancer Epigenetics

Long RNAs: splicing (diagram labels: DNA, RNA, mRNA, lncRNA)

4. Sample collection and RNA integrity

Before RNA extraction

RNA is less stable than DNA, so extra precautions are needed to avoid degradation.

TISSUE COLLECTION:
• Liquid nitrogen
• RNAlater (for solid tissues)
• Tempus/PAXgene tubes (for blood)

-80 °C storage
-20 °C short term
-80 °C long term

After RNA extraction

RIN (RNA integrity number): algorithm for assigning integrity values to RNA measurements.

• 10: maximum integrity (intact RNA)
• 1: minimum integrity (fully degraded RNA)

RIN > 7 is generally considered acceptable.
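As a small illustration of the RIN > 7 rule of thumb, the following Python sketch splits samples into those that pass and those that fail the threshold. The sample names and RIN values are invented for the example, and the cut-off is a parameter because the appropriate value can depend on tissue and protocol.

```python
def split_by_rin(rin_by_sample, threshold=7.0):
    """Split samples into (passing, failing) lists using RIN > threshold."""
    passing = sorted(s for s, rin in rin_by_sample.items() if rin > threshold)
    failing = sorted(s for s, rin in rin_by_sample.items() if rin <= threshold)
    return passing, failing


# Hypothetical Bioanalyzer RIN values, for illustration only.
example_rin = {"liver_1": 9.2, "liver_2": 6.8, "blood_1": 7.0, "blood_2": 8.1}
passing, failing = split_by_rin(example_rin)
print("keep:", passing)
print("re-extract or exclude:", failing)
```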

RNA quality (RIN) and quantification: Bioanalyzer

5. Library preparation

Before library preparation

▪ Total RNA-seq (ribosomal depletion, DNase treatment, fragmentation, library preparation)
▪ mRNA + lncRNA (polyA+) RNA-seq (polyA enrichment, DNase treatment, fragmentation, library preparation)
▪ Short RNA-seq (size selection, DNase treatment, library preparation)


6. Data analyses

WORKFLOW FOR LONG READS

Martin et al. 2011, Nature Reviews Genetics

Transcriptome assembly strategies

❖ Reference-based

❖ De novo

❖ Combined (Reference based + de novo)

❖ Pseudoalignment


Reference-based: Most common tools

• Unspliced read aligners
  ✓ BWA
  ✓ Bowtie2
  ✓ Novoalign
  Splice junctions are not considered; ideal for mapping against cDNA databases.

• Spliced read aligners
  ✓ TopHat2/HISAT2
  ✓ STAR
  Novel splice junctions are detected; better performance for polymorphic regions and pseudogenes.
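To make the difference between spliced and unspliced alignments concrete: in SAM/BAM output, a read that spans an intron carries an `N` operation (skipped reference region) in its CIGAR string. The sketch below extracts intron coordinates from a CIGAR string and the 1-based leftmost mapping position. The function name and the example alignments are made up for illustration; real pipelines would normally rely on an existing SAM/BAM parsing library.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def splice_junctions(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based inclusive, for one alignment.

    pos is the 1-based leftmost reference position (SAM POS field).
    """
    junctions = []
    ref_pos = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":  # skipped reference region, i.e. an intron
            junctions.append((ref_pos, ref_pos + length - 1))
        if op in REF_CONSUMING:
            ref_pos += length
    return junctions


# Hypothetical alignments: one unspliced, one spanning a 500 bp intron.
print(splice_junctions(1000, "75M"))
print(splice_junctions(1000, "40M500N35M"))
```

An unspliced aligner would have to soft-clip or fail to place the second read, which is why splice-aware tools are preferred when mapping against a genome.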

Splice junctions view through IGV (Integrative Genomics Viewer)


De novo assembly: Most common tools

• Velvet
  ✓ Genomics and transcriptomics
• Trinity
  ✓ Transcriptomics
• Cufflinks
  ✓ Assembles pre-aligned reads into transcripts (so it requires a reference alignment) to find alternative splicing and differential expression

COMBINED ASSEMBLY

Martin et al. 2011, Nature Reviews Genetics

READ COUNT

Count the number of reads aligned to each known transcript/isoform.

• Needs an annotation file
• E.g. HTSeq-count
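The counting step can be sketched as follows. This is a deliberately simplified illustration, not the HTSeq-count implementation: genes are plain intervals in a dictionary standing in for an annotation file, strand and exon structure are ignored, and a read is assigned to a gene only when it overlaps exactly one gene. The gene names, coordinates, and the `no_feature` / `ambiguous` labels are assumptions made for the example.

```python
from collections import Counter


def count_reads(reads, genes):
    """Count reads per gene.

    reads: iterable of (chrom, start, end) alignments, 1-based inclusive.
    genes: dict gene_id -> (chrom, start, end), a stand-in for an annotation file.
    A read is assigned to a gene only if it overlaps exactly one gene.
    """
    counts = Counter({gene_id: 0 for gene_id in genes})
    for chrom, start, end in reads:
        hits = [
            gene_id
            for gene_id, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start <= g_end and end >= g_start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["no_feature"] += 1
        else:
            counts["ambiguous"] += 1
    return counts


# Hypothetical annotation and alignments.
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 450, 900)}
reads = [
    ("chr1", 120, 170),  # inside geneA only
    ("chr1", 460, 510),  # overlaps geneA and geneB
    ("chr1", 600, 650),  # inside geneB only
    ("chr2", 100, 150),  # no annotated gene
]
read_counts = count_reads(reads, genes)
print(dict(read_counts))
```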

NORMALIZATION

• Longer genes will have more reads mapping to them (within samples)
• Sequencing runs with greater depth will have more reads mapping to each gene (between samples)

Raw counts

Gene name   Rep 1   Rep 2   Rep 3
A (2 kb)    10      12      30
B (4 kb)    20      25      60
C (1 kb)    5       8       15
D (10 kb)   0       0       1
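The two effects (gene length and sequencing depth) can be illustrated on the raw counts table with RPKM and TPM. The slides do not say which measure was used in the lecture, so the choice of these two is an assumption; the toy totals stand in for the number of mapped reads, and the function names are illustrative.

```python
def rpkm(counts, lengths_kb):
    """Reads per kilobase of gene per million mapped reads, for one sample."""
    per_million = sum(counts.values()) / 1e6
    return {g: counts[g] / per_million / lengths_kb[g] for g in counts}


def tpm(counts, lengths_kb):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {g: counts[g] / lengths_kb[g] for g in counts}
    total = sum(rate.values())
    return {g: rate[g] / total * 1e6 for g in counts}


# Gene lengths (kb) and raw counts from the table.
lengths_kb = {"A": 2, "B": 4, "C": 1, "D": 10}
raw = {
    "Rep 1": {"A": 10, "B": 20, "C": 5, "D": 0},
    "Rep 2": {"A": 12, "B": 25, "C": 8, "D": 0},
    "Rep 3": {"A": 30, "B": 60, "C": 15, "D": 1},
}
for rep, counts in raw.items():
    values = tpm(counts, lengths_kb)
    print(rep, {g: round(v) for g, v in values.items()})
```

In Rep 1, genes A, B and C receive the same TPM even though their raw counts differ, because their counts are exactly proportional to gene length. TPM values sum to one million in every sample, which makes them easier to compare between samples than RPKM.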

DIFFERENTIAL EXPRESSION

Differential gene expression analysis is based on the negative binomial distribution, which accounts for the variability of gene expression (e.g. lowly expressed genes have higher relative variance than highly expressed genes).
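A short sketch of why the negative binomial captures this behaviour: with mean μ and dispersion α, the variance is μ + αμ², so the squared coefficient of variation is 1/μ + α. The absolute variance grows with the mean, while the relative variability is largest for lowly expressed genes. The dispersion value below is an arbitrary illustrative number, not an estimate from data.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial count with the given mean and dispersion."""
    return mean + dispersion * mean ** 2


def squared_cv(mean, dispersion):
    """Squared coefficient of variation (variance / mean^2) = 1/mean + dispersion."""
    return nb_variance(mean, dispersion) / mean ** 2


dispersion = 0.1  # illustrative value, not estimated from data
for mean in (5, 50, 500, 5000):
    print(mean, nb_variance(mean, dispersion), round(squared_cv(mean, dispersion), 3))
```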

DESeq2

Input: read count table (e.g. from HTSeq, but also from pseudoalignment)

Output: table containing statistics on whether a gene is differentially expressed between conditions
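DESeq2 is an R package, so the following Python code is only an illustration of the median-of-ratios idea it uses to correct for sequencing depth before testing. It is a minimal sketch on the toy counts from the normalization slide, not a replacement for the package; the function name and data layout are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict gene -> list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if min(gene_counts) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for i, c in enumerate(gene_counts):
            ratios[i].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in ratios]


# Raw counts for genes A-D in Rep 1, Rep 2, Rep 3.
raw_counts = {"A": [10, 12, 30], "B": [20, 25, 60], "C": [5, 8, 15], "D": [0, 0, 1]}
factors = size_factors(raw_counts)
normalized = {
    g: [c / f for c, f in zip(row, factors)] for g, row in raw_counts.items()
}
print([round(f, 2) for f in factors])
```

Dividing each sample's counts by its size factor puts the replicates on a common scale, so that the deeper-sequenced Rep 3 no longer appears to have higher expression for every gene.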

Differentially expressed gene categories

The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:

• Molecular function: molecular activities of gene products
• Cellular component: where gene products are active
• Biological process: pathways and larger processes made up of the activities of multiple gene products
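One common way to ask whether a GO term is over-represented among differentially expressed genes is a one-sided hypergeometric test. The sketch below shows that calculation with the Python standard library. Individual enrichment tools may use different statistics and multiple-testing corrections, so this is an assumption about the general approach, and all gene numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_de, n_de_in_term):
    """P(X >= n_de_in_term) when drawing n_de genes from the background
    without replacement, where n_in_term background genes carry the GO term."""
    max_overlap = min(n_in_term, n_de)
    total = comb(n_background, n_de)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_de - k)
        for k in range(n_de_in_term, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 annotated genes, 200 in the GO term,
# 500 differentially expressed genes, 15 of which carry the term.
p = enrichment_p_value(20000, 200, 500, 15)
print(f"p = {p:.3g}")
```

When many GO terms are tested, the resulting p-values need a multiple-testing correction before interpretation.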

Some GO and pathway analysis tools

http://amp.pharm.mssm.edu/Enrichr/

http://cbl-gorilla.cs.technion.ac.il/

https://david.ncifcrf.gov/

https://www.qiagenbioinformatics.com/

Micro RNA (miRNA)

• 18-22 nucleotides in length
• One of the most studied among the small RNAs
• miRNA-mRNA interaction
• www.mirbase.org
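Because mature miRNAs fall in a narrow size range, small RNA workflows often select reads by length after adapter trimming. The sketch below applies the 18-22 nt window from the slide. The function name, read names and the idea of filtering before mapping are assumptions made for illustration, and the window is a parameter because other small RNA classes have different lengths.

```python
def select_mirna_sized_reads(reads, min_len=18, max_len=22):
    """Keep (name, sequence) reads whose length is within [min_len, max_len]."""
    return [(name, seq) for name, seq in reads if min_len <= len(seq) <= max_len]


# Hypothetical adapter-trimmed reads; sequences are for illustration only.
trimmed_reads = [
    ("read1", "TGAGGTAGTAGGTTGTATAGTT"),  # within the window
    ("read2", "ACGT" * 10),               # too long
    ("read3", "TAGCTTATCAGACTGATGTTGA"),  # within the window
    ("read4", "ACGTACGTAC"),              # too short
]
kept = select_mirna_sized_reads(trimmed_reads)
print([name for name, _ in kept])
```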

The short sequence length makes small RNAs difficult to map to large and complex reference genomes. Common aligners for long RNA are therefore not accurate for short RNA mapping (Ziemann et al. 2016, RNA).

THANK YOU !

