Measuring Isoform Expression from RNA-Seq data Based on LDA


Page 1: Measuring Isoform Expression from RNA-Seq data Based on LDA

Measuring Isoform Expression from RNA-Seq data Based on LDA

刘学军 (Liu Xuejun), 2012.9.21

Page 2: Measuring Isoform Expression from RNA-Seq data Based on LDA

Outline

• Background

• Modeling RNA-Seq data

• Results

Page 3: Measuring Isoform Expression from RNA-Seq data Based on LDA

Alternatively spliced isoforms

Page 4: Measuring Isoform Expression from RNA-Seq data Based on LDA

RNA-Seq data – an example

reference ACGTCCCC

12 ACGTC reads

8 CGTCC reads

9 GTCCC reads

5 TCCCC reads

This gene can be summarized by a sequence of counts 12, 8, 9, 5.
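The per-position counts can be reproduced with a short sketch. It assumes every read matches the reference exactly and is tallied at the position where it starts; the function name `position_counts` and the exact-match rule are illustrative assumptions, not something stated on the slide.

```python
from collections import Counter


def position_counts(reference, reads, read_length):
    """Count reads by the reference position at which they start (exact match only)."""
    n_positions = len(reference) - read_length + 1
    # Map each length-k substring of the reference to its first start position.
    starts = {}
    for i in range(n_positions):
        starts.setdefault(reference[i:i + read_length], i)
    # Reads that do not occur in the reference are ignored.
    tally = Counter(starts[r] for r in reads if r in starts)
    return [tally.get(i, 0) for i in range(n_positions)]


reference = "ACGTCCCC"
# Read multiset from the slide: 12 ACGTC, 8 CGTCC, 9 GTCCC, 5 TCCCC.
reads = ["ACGTC"] * 12 + ["CGTCC"] * 8 + ["GTCCC"] * 9 + ["TCCCC"] * 5
counts = position_counts(reference, reads, read_length=5)
print(counts)  # [12, 8, 9, 5]
```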

Page 5: Measuring Isoform Expression from RNA-Seq data Based on LDA

Structure of RNA-Seq data

Page 6: Measuring Isoform Expression from RNA-Seq data Based on LDA

LDA: Latent Dirichlet Allocation

Page 7: Measuring Isoform Expression from RNA-Seq data Based on LDA

LDAseq - probe

Page 8: Measuring Isoform Expression from RNA-Seq data Based on LDA

LDAseq

Page 9: Measuring Isoform Expression from RNA-Seq data Based on LDA

Convert \theta to expression level

• Obtain P(\theta|D)

• Normalize counts to sequencing depth and isoform length:
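The normalization formula itself did not survive on this slide. A widely used normalization of this kind is RPKM (reads per kilobase of transcript per million mapped reads), so the sketch below uses it as a stand-in. It assumes that \theta gives the fraction of a gene's reads attributed to each isoform; the function name `isoform_rpkm` and all numbers are hypothetical.

```python
def isoform_rpkm(theta, gene_read_count, isoform_lengths_bp, total_mapped_reads):
    """RPKM-style expression values for the isoforms of one gene.

    theta: assumed fraction of the gene's reads attributed to each isoform.
    """
    values = []
    for share, length_bp in zip(theta, isoform_lengths_bp):
        isoform_reads = share * gene_read_count
        # 1e9 = 1e3 (per kilobase) * 1e6 (per million mapped reads).
        values.append(1e9 * isoform_reads / (length_bp * total_mapped_reads))
    return values


# Hypothetical gene: 400 reads, two isoforms of 2000 bp and 1000 bp,
# sequenced at a depth of 20 million mapped reads.
example = isoform_rpkm([0.75, 0.25], 400, [2000, 1000], 20_000_000)
print(example)  # [7.5, 5.0]
```

Dividing by both length and depth is what makes values comparable across isoforms of different lengths and across samples sequenced to different depths.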

Page 10: Measuring Isoform Expression from RNA-Seq data Based on LDA

Workflow of LDAseq

Page 11: Measuring Isoform Expression from RNA-Seq data Based on LDA

Data set 1

Tissue   Sample    Number of reads
liver    liver1    31578097
liver    liver2    44173056
brain    brain1    31116663
brain    brain2    47781892
muscle   muscle1   31763031
muscle   muscle2   38007934

3 conditions, each with 2 technical replicates

9370 genes which contain multiple isoforms.

Page 12: Measuring Isoform Expression from RNA-Seq data Based on LDA

Data set 1

• Histogram of probe number per gene

• 72.43 probes per gene on average

Page 15: Measuring Isoform Expression from RNA-Seq data Based on LDA

Data set 2

• Two conditions, 8 qRT-PCR validated isoforms

Condition MCF-7 HME

# of lanes 4 7

# of reads 16059515 19282346

Page 16: Measuring Isoform Expression from RNA-Seq data Based on LDA

Data set 2

• Histogram of probe number per gene

• 72.57 probes per gene on average

Page 17: Measuring Isoform Expression from RNA-Seq data Based on LDA

Comparisons

Gene       Comparison                         qRT-PCR  Cufflinks  RSEM    LDAseq
TRAP1      uc002cvt.2, HME vs. MCF-7          -(0.4)   -(0.6)     -(0.4)  -(0.9)
TRAP1      uc002cvs.1, HME vs. MCF-7          -(0.5)   -(1.1)     -(0.8)  -(0.6)
TRAP1      HME, uc002cvt.2 vs. uc002cvs.1     -(0.9)   +(4.8)     +(4.9)  -(0.8)
TRAP1      MCF-7, uc002cvt.2 vs. uc002cvs.1   -(1.0)   +(4.3)     +(4.4)  -(0.6)
ZNF581/0   uc002qlq.1, HME vs. MCF-7          -(0.3)   -(1.2)     -(0.9)  -(1.4)
ZNF581/0   uc002qlp.1, HME vs. MCF-7          -(1.0)   -(1.2)     -(0.7)  -(1.4)
ZNF581/0   HME, uc002qlq.1 vs. uc002qlp.1     +(1.2)   +(1.9)     +(1.3)  +(0.7)
ZNF581/0   MCF-7, uc002qlq.1 vs. uc002qlp.1   +(1.0)   +(1.9)     +(1.5)  +(0.8)
WISP2      uc002xmn.1, HME vs. MCF-7          -(5.6)   -(6.9)     -(5.4)  -(8.3)
WISP2      uc002xmo.1, HME vs. MCF-7          -(4.5)   -(5.4)     -(4.7)  -(6.0)
WISP2      HME, uc002xmn.1 vs. uc002xmo.1     +(0.4)   (0.0)      (0.0)   -(0.4)
WISP2      MCF-7, uc002xmn.1 vs. uc002xmo.1   +(1.5)   +(1.5)     +(0.8)  +(1.9)
HIST1H2BD  uc003ngr.1, HME vs. MCF-7          -(4.7)   -(3.7)     -(2.9)  -(2.4)
HIST1H2BD  uc003ngs.1, HME vs. MCF-7          -(5.2)   -(4.2)     -(4.5)  -(3.9)
HIST1H2BD  HME, uc003ngr.1 vs. uc003ngs.1     -(5.4)   +(2.4)     +(1.8)  -(2.4)
HIST1H2BD  MCF-7, uc003ngr.1 vs. uc003ngs.1   -(5.9)   +(1.8)     +(0.2)  -(3.9)

Correlation coefficient with qRT-PCR results  -        0.4899     0.5279  0.8596
# of wrong regulation direction               -        4          5       1
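The two summary rows can be explored with a small sketch. It assumes each table entry is a signed value, with the sign giving the regulation direction and the number in parentheses the magnitude, and that the correlation is a plain Pearson coefficient over all 16 rows. Neither assumption is stated on the slide, and the helper names are illustrative.

```python
import math

# Signed values transcribed from the comparison table, in row order (16 comparisons).
qrt_pcr = [-0.4, -0.5, -0.9, -1.0, -0.3, -1.0, 1.2, 1.0,
           -5.6, -4.5, 0.4, 1.5, -4.7, -5.2, -5.4, -5.9]
cufflinks = [-0.6, -1.1, 4.8, 4.3, -1.2, -1.2, 1.9, 1.9,
             -6.9, -5.4, 0.0, 1.5, -3.7, -4.2, 2.4, 1.8]
rsem = [-0.4, -0.8, 4.9, 4.4, -0.9, -0.7, 1.3, 1.5,
        -5.4, -4.7, 0.0, 0.8, -2.9, -4.5, 1.8, 0.2]
ldaseq = [-0.9, -0.6, -0.8, -0.6, -1.4, -1.4, 0.7, 0.8,
          -8.3, -6.0, -0.4, 1.9, -2.4, -3.9, -2.4, -3.9]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def opposite_sign_count(a, b):
    """Number of pairs whose signs disagree; pairs containing a zero are skipped."""
    return sum(1 for x, y in zip(a, b) if x * y < 0)


for name, values in [("Cufflinks", cufflinks), ("RSEM", rsem), ("LDAseq", ldaseq)]:
    print(name, round(pearson(qrt_pcr, values), 4), opposite_sign_count(qrt_pcr, values))
```

Entries of (0.0) carry no sign, and the slide does not say how they were counted, so the counts and coefficients from this sketch need not match the summary rows exactly.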

Page 18: Measuring Isoform Expression from RNA-Seq data Based on LDA

Modelling Multi-response Surfaces for Airfoil Design with Multiple Output Gaussian Process Regression

Page 19: Measuring Isoform Expression from RNA-Seq data Based on LDA

• Gaussian processes

• Multiple output GP

• MGP in airfoil design

Page 20: Measuring Isoform Expression from RNA-Seq data Based on LDA

Gaussian Processes

• A Gaussian process (GP) is used to describe a distribution over functions.

• A GP is a collection of random variables, any finite number of which have a joint Gaussian distribution.

Page 21: Measuring Isoform Expression from RNA-Seq data Based on LDA

Gaussian Processes

The mean function and the covariance function are defined,

The GP can be written as
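The formulas on this slide were not preserved. In the standard GP notation, which the slide presumably follows, the definitions read as below; the symbols m, k and f are the conventional ones and are an assumption about what the slide showed.

```latex
m(\mathbf{x}) = \mathbb{E}[f(\mathbf{x})], \qquad
k(\mathbf{x}, \mathbf{x}') = \mathbb{E}\big[(f(\mathbf{x}) - m(\mathbf{x}))\,(f(\mathbf{x}') - m(\mathbf{x}'))\big]

f(\mathbf{x}) \sim \mathcal{GP}\big(m(\mathbf{x}),\, k(\mathbf{x}, \mathbf{x}')\big)
```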

Page 23: Measuring Isoform Expression from RNA-Seq data Based on LDA

Gaussian Processes

The covariance function implies the prior distribution over functions.
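A minimal sketch of this idea: evaluating the covariance function on a finite grid gives a joint Gaussian, and drawing from it yields sample functions from the prior. The squared-exponential kernel and its hyperparameters are assumptions chosen for illustration; the slide does not say which covariance function it plots.

```python
import numpy as np


def se_kernel(xa, xb, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)


rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 50)
# A small jitter on the diagonal keeps the matrix numerically positive definite.
K = se_kernel(x, x) + 1e-8 * np.eye(x.size)
# Each row is one function drawn from the zero-mean GP prior, evaluated at x.
samples = rng.multivariate_normal(np.zeros(x.size), K, size=3)
print(samples.shape)  # (3, 50)
```

A shorter `length_scale` produces more rapidly varying samples, which is one way to see how the covariance function shapes the prior.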

Page 25: Measuring Isoform Expression from RNA-Seq data Based on LDA

Gaussian Processes

Prediction with noise-free observations,
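The predictive equations on this slide were not preserved. The sketch below uses the standard result for conditioning a joint Gaussian: the predictive mean is K(X*, X) K(X, X)^{-1} f and the predictive covariance is K(X*, X*) - K(X*, X) K(X, X)^{-1} K(X, X*). The squared-exponential kernel, the toy data, and the function names are assumptions for illustration.

```python
import numpy as np


def se_kernel(xa, xb, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)


def gp_predict_noise_free(x_train, f_train, x_test, jitter=1e-10):
    """Posterior mean and covariance of a zero-mean GP given exact observations."""
    K = se_kernel(x_train, x_train) + jitter * np.eye(x_train.size)
    K_s = se_kernel(x_test, x_train)
    K_ss = se_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} f, computed with two triangular solves.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v
    return mean, cov


# Toy data (hypothetical): three exact observations of sin(x).
x_train = np.array([-2.0, 0.0, 1.5])
f_train = np.sin(x_train)
x_test = np.array([-2.0, 0.5])
mean, cov = gp_predict_noise_free(x_train, f_train, x_test)
print(mean, np.diag(cov))
```

At a training input the prediction reproduces the observed value with essentially zero variance, while the variance grows away from the observations.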

Page 26: Measuring Isoform Expression from RNA-Seq data Based on LDA

Gaussian Processes

Page 27: Measuring Isoform Expression from RNA-Seq data Based on LDA

Multiple Outputs

Page 28: Measuring Isoform Expression from RNA-Seq data Based on LDA

Convolution processes for multiple outputs

• Consider a set of D output functions defined over a common input domain. Each output function is expressed as
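The expression itself is missing from the slide. In the usual convolution-process construction, each output is the convolution of a smoothing kernel with a shared latent function. The symbols below (f_d for the outputs, G_d for the smoothing kernels, u for the latent function, and \mathcal{X} for the input domain) are assumed notation, not taken from the slide.

```latex
f_d(\mathbf{x}) = \int_{\mathcal{X}} G_d(\mathbf{x} - \mathbf{z})\, u(\mathbf{z})\, \mathrm{d}\mathbf{z},
\qquad d = 1, \dots, D
```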

Page 29: Measuring Isoform Expression from RNA-Seq data Based on LDA

Convolution processes for multiple outputs

Consider more than one latent function.

The latent functions are taken to be drawn from a zero-mean GP with
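The formulas for this case are missing from the slide. A common way to write it, assuming Q mutually independent latent functions u_q with covariance functions k_q (assumed notation), is:

```latex
f_d(\mathbf{x}) = \sum_{q=1}^{Q} \int_{\mathcal{X}} G_{d,q}(\mathbf{x} - \mathbf{z})\, u_q(\mathbf{z})\, \mathrm{d}\mathbf{z},
\qquad
\operatorname{cov}\big[u_q(\mathbf{z}),\, u_{q'}(\mathbf{z}')\big] = k_q(\mathbf{z}, \mathbf{z}')\, \delta_{qq'}

\operatorname{cov}\big[f_d(\mathbf{x}),\, f_{d'}(\mathbf{x}')\big]
  = \sum_{q=1}^{Q} \int_{\mathcal{X}} \int_{\mathcal{X}}
    G_{d,q}(\mathbf{x} - \mathbf{z})\, G_{d',q}(\mathbf{x}' - \mathbf{z}')\,
    k_q(\mathbf{z}, \mathbf{z}')\, \mathrm{d}\mathbf{z}\, \mathrm{d}\mathbf{z}'
```

The second line follows from the first by linearity of the integral and the independence of the latent functions.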

Page 30: Measuring Isoform Expression from RNA-Seq data Based on LDA

Convolution processes for multiple outputs

If

Page 31: Measuring Isoform Expression from RNA-Seq data Based on LDA

Convolution processes for multiple outputs

If the kernel smoothing function is

and the covariance for the latent process is

the covariance for the multiple responses is
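The closed forms on this slide were not preserved. One case that does have a closed form, and is assumed here purely for illustration, is a Gaussian smoothing kernel combined with a Gaussian latent covariance in one dimension: the double integral is then a convolution of three Gaussians, so the cross-covariance is a Gaussian in x - x' whose variance is the sum of the three variances. The sketch checks this numerically; the function names and parameter values are hypothetical.

```python
import numpy as np


def gauss(t, var):
    """Normalized 1-D Gaussian density with zero mean and variance var."""
    return np.exp(-0.5 * t ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def cross_cov_numeric(x, x_prime, var_d, var_dp, var_latent,
                      half_width=12.0, step=0.05):
    """Riemann-sum approximation of
    int int G_d(x - z) k(z - z') G_d'(x' - z') dz dz'."""
    z = np.arange(-half_width, half_width + step, step)
    g_d = gauss(x - z, var_d)        # smoothing kernel of output d
    g_dp = gauss(x_prime - z, var_dp)  # smoothing kernel of output d'
    k_latent = gauss(z[:, None] - z[None, :], var_latent)
    return (g_d @ k_latent @ g_dp) * step ** 2


def cross_cov_closed_form(x, x_prime, var_d, var_dp, var_latent):
    """Gaussian in x - x' with the three variances added."""
    return gauss(x - x_prime, var_d + var_dp + var_latent)


numeric = cross_cov_numeric(0.3, -0.4, 0.5, 0.8, 1.0)
closed = cross_cov_closed_form(0.3, -0.4, 0.5, 0.8, 1.0)
print(numeric, closed)
```

Because the cross-covariance between any two outputs is available in closed form under these assumptions, the full multi-output covariance matrix can be assembled without numerical integration.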

Page 32: Measuring Isoform Expression from RNA-Seq data Based on LDA

Convolution processes for multiple outputs

Each of the outputs can be corrupted with an independent process,

The likelihood is

and the prediction is
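The expressions on this slide are missing. Assuming the independent process on each output is zero-mean Gaussian noise with variance \sigma_d^2, the standard forms are as follows. Here \mathbf{K}_{\mathbf{f},\mathbf{f}} is the covariance of all outputs at the training inputs, \boldsymbol{\Sigma} is the diagonal noise covariance, and \boldsymbol{\phi} collects the hyperparameters; all of these symbols are assumed notation.

```latex
y_d(\mathbf{x}) = f_d(\mathbf{x}) + w_d(\mathbf{x}), \qquad w_d(\mathbf{x}) \sim \mathcal{N}(0, \sigma_d^2)

p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\phi})
  = \mathcal{N}\big(\mathbf{y} \mid \mathbf{0},\, \mathbf{K}_{\mathbf{f},\mathbf{f}} + \boldsymbol{\Sigma}\big)

p(\mathbf{y}_* \mid \mathbf{y}, \mathbf{X}, \mathbf{X}_*, \boldsymbol{\phi})
  = \mathcal{N}\big(\mathbf{y}_* \mid \boldsymbol{\mu}_*,\, \boldsymbol{\Lambda}_*\big)

\boldsymbol{\mu}_* = \mathbf{K}_{\mathbf{f}_*,\mathbf{f}}
  \big(\mathbf{K}_{\mathbf{f},\mathbf{f}} + \boldsymbol{\Sigma}\big)^{-1} \mathbf{y}

\boldsymbol{\Lambda}_* = \mathbf{K}_{\mathbf{f}_*,\mathbf{f}_*}
  - \mathbf{K}_{\mathbf{f}_*,\mathbf{f}}
    \big(\mathbf{K}_{\mathbf{f},\mathbf{f}} + \boldsymbol{\Sigma}\big)^{-1}
    \mathbf{K}_{\mathbf{f},\mathbf{f}_*}
  + \boldsymbol{\Sigma}_*
```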

Page 33: Measuring Isoform Expression from RNA-Seq data Based on LDA

Group A                 Group B
Label  f  xf  t         Label  f     xf    t
A1     1  3   11        B1     1.12  4.79  13.56
A2     1  4   14        B2     1.34  3.11  11.53
A3     1  5   13        B3     1.66  3.83  13.67
A4     2  3   12        B4     2.04  4.20  17.05
A5     2  4   15        B5     2.29  5.15  15.10
A6     2  5   11        B6     2.63  3.02  16.72
A7     3  3   16        B7     2.67  2.96  12.43
A8     3  4   11        B8     2.94  4.92  11.28
A9     3  5   13        B9     2.99  3.09  14.17
A10    4  3   11        B10    3.51  4.12  13.33
A11    4  4   13        B11    3.88  3.75  11.61
A12    4  5   16        B12    3.93  3.20  17.07
A13    5  3   11        B13    4.08  5.03  11.10
A14    5  4   12        B14    4.26  3.40  12.04
A15    5  5   17        B15    4.53  4.75  12.43
A16    6  3   14        B16    4.59  2.76  13.21
A17    6  4   15        B17    4.77  4.69  14.34
A18    6  5   12        B18    5.76  4.92  13.11

Page 34: Measuring Isoform Expression from RNA-Seq data Based on LDA

Correlation between Cl and Cd

R^2=0.8525

Page 35: Measuring Isoform Expression from RNA-Seq data Based on LDA

PDF of the predictive joint distribution

• A12

Page 36: Measuring Isoform Expression from RNA-Seq data Based on LDA

Method          RMSE
SOGP
  SEISO         0.2446
  SEARD         0.2337
  NN            0.1498
  NN+SEISO      0.2668
  NN+SEARD      0.2684
BP              0.1491
RBF             0.2279
MOGP            0.1000

Page 38: Measuring Isoform Expression from RNA-Seq data Based on LDA

Inverse design

• Pressure distribution -> airfoil shape

Page 39: Measuring Isoform Expression from RNA-Seq data Based on LDA

Method          RMSE
SOGP
  SEISO         0.1253
  SEARD         0.1057
  NN            0.1428
  NN+SEISO      0.1333
  NN+SEARD      0.1230
BP              0.1601
RBF             0.1758
MOGP            0.1008

Page 41: Measuring Isoform Expression from RNA-Seq data Based on LDA

Acknowledgment

• 李蒙 (Li Meng)

• 闫国启 (Yan Guoqi)

• 张礼 (Zhang Li)

• 祝青雷 (Zhu Qinglei)