Measuring Isoform Expression from RNA-Seq data Based on LDA

刘学军 (Liu Xuejun), 2012.9.21

Outline

• Background

• Modeling RNA-Seq data

• Results

Alternatively spliced isoforms

RNA-Seq data – an example

reference ACGTCCCC

12 ACGTC reads

8 CGTCC reads

9 GTCCC reads

5 TCCCC reads

This gene can be summarized by a sequence of counts 12, 8, 9, 5.
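The toy example above can be reproduced with a minimal sketch. It assumes exact-match alignment and that every read-length substring occurs only once in the reference; the function name `position_counts` is hypothetical and not part of LDAseq.

```python
def position_counts(reference, reads, read_length):
    """Count reads by the reference position where they match exactly."""
    n_positions = len(reference) - read_length + 1
    # Map each substring of the reference to its start position.
    # Assumption: substrings are unique, as in the toy example.
    starts = {}
    for i in range(n_positions):
        starts.setdefault(reference[i:i + read_length], i)
    counts = [0] * n_positions
    for read in reads:
        pos = starts.get(read)
        if pos is not None:  # reads that do not match are ignored
            counts[pos] += 1
    return counts


reference = "ACGTCCCC"
reads = ["ACGTC"] * 12 + ["CGTCC"] * 8 + ["GTCCC"] * 9 + ["TCCCC"] * 5
print(position_counts(reference, reads, 5))  # [12, 8, 9, 5]
```

Each entry of the result is the number of reads starting at that position, which is the count sequence used to summarize the gene.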

Structure of RNA-seq data

LDA: latent Dirichlet allocation

LDAseq - probe

LDAseq

Convert \theta to expression level

• Obtain P(\theta|D)

• Normalize counts to sequencing depth and isoform length:
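The normalization formula itself is not reproduced in the text. As a sketch only, the widely used RPKM normalization (reads per kilobase per million mapped reads) is shown below; whether LDAseq uses exactly this form is an assumption. The helper `isoform_rpkm` is hypothetical: it takes a point estimate of \theta (for example the posterior mean) as the fraction of a gene's reads assigned to each isoform.

```python
def rpkm(read_count, isoform_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (isoform_length_bp * total_mapped_reads)


def isoform_rpkm(gene_read_count, theta, isoform_lengths_bp, total_mapped_reads):
    """Split a gene's reads by theta, then normalize each share.

    Assumption: theta[k] is the estimated fraction of the gene's reads
    that originate from isoform k (the entries sum to 1).
    """
    return [
        rpkm(gene_read_count * share, length, total_mapped_reads)
        for share, length in zip(theta, isoform_lengths_bp)
    ]


# Toy numbers, not taken from the data sets in this talk.
print(isoform_rpkm(1000, [0.25, 0.75], [1000, 3000], 10_000_000))
```

In the toy call both isoforms receive the same normalized level, because the isoform with three times the reads is also three times as long.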

Workflow of LDAseq

Data set 1

Tissue   Sample    Number of reads
liver    liver1    31578097
liver    liver2    44173056
brain    brain1    31116663
brain    brain2    47781892
muscle   muscle1   31763031
muscle   muscle2   38007934

3 conditions, each with 2 technical replicates

9370 genes which contain multiple isoforms.

Data set 1

• Histogram of probe number per gene

• 72.43 on average

Data set 2

• Two conditions, 8 qRT-PCR validated isoforms

Condition MCF-7 HME

# of lanes 4 7

# of reads 16059515 19282346

Data set 2

• Histogram of probe number per gene

• 72.57 on average

Comparisons

Gene       Comparison                         qRT-PCR  Cufflinks  RSEM    LDAseq
TRAP1      uc002cvt.2 HME vs. MCF-7           -(0.4)   -(0.6)     -(0.4)  -(0.9)
TRAP1      uc002cvs.1 HME vs. MCF-7           -(0.5)   -(1.1)     -(0.8)  -(0.6)
TRAP1      HME uc002cvt.2 vs. uc002cvs.1      -(0.9)   +(4.8)     +(4.9)  -(0.8)
TRAP1      MCF-7 uc002cvt.2 vs. uc002cvs.1    -(1.0)   +(4.3)     +(4.4)  -(0.6)
ZNF581/0   uc002qlq.1 HME vs. MCF-7           -(0.3)   -(1.2)     -(0.9)  -(1.4)
ZNF581/0   uc002qlp.1 HME vs. MCF-7           -(1.0)   -(1.2)     -(0.7)  -(1.4)
ZNF581/0   HME uc002qlq.1 vs. uc002qlp.1      +(1.2)   +(1.9)     +(1.3)  +(0.7)
ZNF581/0   MCF-7 uc002qlq.1 vs. uc002qlp.1    +(1.0)   +(1.9)     +(1.5)  +(0.8)
WISP2      uc002xmn.1 HME vs. MCF-7           -(5.6)   -(6.9)     -(5.4)  -(8.3)
WISP2      uc002xmo.1 HME vs. MCF-7           -(4.5)   -(5.4)     -(4.7)  -(6.0)
WISP2      HME uc002xmn.1 vs. uc002xmo.1      +(0.4)   (0.0)      (0.0)   -(0.4)
WISP2      MCF-7 uc002xmn.1 vs. uc002xmo.1    +(1.5)   +(1.5)     +(0.8)  +(1.9)
HIST1H2BD  uc003ngr.1 HME vs. MCF-7           -(4.7)   -(3.7)     -(2.9)  -(2.4)
HIST1H2BD  uc003ngs.1 HME vs. MCF-7           -(5.2)   -(4.2)     -(4.5)  -(3.9)
HIST1H2BD  HME uc003ngr.1 vs. uc003ngs.1      -(5.4)   +(2.4)     +(1.8)  -(2.4)
HIST1H2BD  MCF-7 uc003ngr.1 vs. uc003ngs.1    -(5.9)   +(1.8)     +(0.2)  -(3.9)

Correlation coefficient with qRT-PCR results  -        0.4899     0.5279  0.8596
# of wrong regulation direction               -        4          5       1
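The two summary statistics can be recomputed from the comparison values with a short sketch. It assumes that an entry such as -(0.4) is read as the signed value -0.4, and it counts a direction as wrong only when the two signs are strictly opposite. The listed values are rounded to one decimal, so the results only approximate the figures reported above.

```python
import math

# Signed values per comparison, in the order of the comparison rows.
qrt_pcr = [-0.4, -0.5, -0.9, -1.0, -0.3, -1.0, 1.2, 1.0,
           -5.6, -4.5, 0.4, 1.5, -4.7, -5.2, -5.4, -5.9]
methods = {
    "Cufflinks": [-0.6, -1.1, 4.8, 4.3, -1.2, -1.2, 1.9, 1.9,
                  -6.9, -5.4, 0.0, 1.5, -3.7, -4.2, 2.4, 1.8],
    "RSEM": [-0.4, -0.8, 4.9, 4.4, -0.9, -0.7, 1.3, 1.5,
             -5.4, -4.7, 0.0, 0.8, -2.9, -4.5, 1.8, 0.2],
    "LDAseq": [-0.9, -0.6, -0.8, -0.6, -1.4, -1.4, 0.7, 0.8,
               -8.3, -6.0, -0.4, 1.9, -2.4, -3.9, -2.4, -3.9],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def wrong_directions(reference, estimate):
    """Number of comparisons whose signs are strictly opposite."""
    return sum(1 for r, e in zip(reference, estimate) if r * e < 0)


for name, values in methods.items():
    print(name, round(pearson(qrt_pcr, values), 4),
          wrong_directions(qrt_pcr, values))
```

How a 0.0 entry is treated changes the direction count for Cufflinks and RSEM, so the rule used here is one possible convention rather than the one used for the slide.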

Modelling Multi-response Surfaces for Airfoil Design with Multiple Output Gaussian Process Regression

• Gaussian processes

• Multiple output GP

• MGP in airfoil design

Gaussian Processes

• A Gaussian process (GP) is used to describe a distribution over functions.

• A GP is a collection of random variables, any finite number of which have a joint Gaussian distribution.

Gaussian Processes

The mean function and the covariance function are defined,

The GP can be written as
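The formulas for this slide are not reproduced in the text. In standard textbook notation, which is assumed here rather than taken from the slides, the mean function m(x) and the covariance function k(x, x') of a real process f(x), and the resulting GP, are written as:

```latex
\begin{aligned}
m(x) &= \mathbb{E}[f(x)], \\
k(x, x') &= \mathbb{E}\big[(f(x) - m(x))\,(f(x') - m(x'))\big], \\
f(x) &\sim \mathcal{GP}\big(m(x),\, k(x, x')\big).
\end{aligned}
```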

Gaussian Processes

The covariance function implies the prior distribution over functions.

Gaussian Processes

Prediction with noise-free observations,
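For noise-free observations f at inputs X, the standard zero-mean GP predictive distribution at a test input x* has mean k*^T K^{-1} f and variance k(x*, x*) - k*^T K^{-1} k*, where K is the covariance matrix of the training inputs and k* the covariances between x* and the training inputs. The sketch below implements this for one-dimensional inputs. The squared-exponential kernel, the function names, the small jitter term and the toy data are assumptions made for illustration, not details from the slides.

```python
import math


def se_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def chol_solve(low, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(x_train, f_train, x_star, length_scale=1.0, jitter=1e-10):
    """Predictive mean and variance at x_star for a zero-mean GP."""
    n = len(x_train)
    # Jitter on the diagonal only stabilizes the factorization.
    K = [[se_kernel(x_train[i], x_train[j], length_scale)
          + (jitter if i == j else 0.0) for j in range(n)] for i in range(n)]
    low = cholesky(K)
    k_star = [se_kernel(x_star, xi, length_scale) for xi in x_train]
    mean = sum(k * a for k, a in zip(k_star, chol_solve(low, f_train)))
    var = se_kernel(x_star, x_star, length_scale) - sum(
        k * v for k, v in zip(k_star, chol_solve(low, k_star)))
    return mean, max(var, 0.0)


x_train = [0.0, 1.0, 2.0, 3.0]   # toy inputs
f_train = [0.0, 0.8, 0.9, 0.1]   # toy noise-free function values
print(gp_predict(x_train, f_train, 1.0))
print(gp_predict(x_train, f_train, 1.5))
```

At a training input the prediction reproduces the observed value with essentially zero variance, while far from the data it reverts to the prior mean 0 and prior variance 1.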

Gaussian Processes

Multiple Outputs

Convolution processes for multiple outputs

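• Consider a set of D output functions f_d(x), d = 1, ..., D, where x lies in the input domain. Each f_d(x) is expressed as the convolution of a kernel smoothing function with a latent function.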

Convolution processes for multiple outputs

Consider more than one latent function

The latent functions are taken to be drawn from a zero-mean GP with a specified covariance function.
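The formulas for these slides are not reproduced in the text. A common way to write the convolution-process construction is sketched below. The symbols G_{d,q} (kernel smoothing functions), u_q (latent functions, q = 1, ..., Q) and k_q (latent covariances) are assumed notation, and the covariance expression additionally assumes that the latent processes are zero-mean and mutually independent.

```latex
\begin{aligned}
f_d(x) &= \sum_{q=1}^{Q} \int G_{d,q}(x - z)\, u_q(z)\, \mathrm{d}z, \\
\operatorname{cov}\big[f_d(x),\, f_{d'}(x')\big]
  &= \sum_{q=1}^{Q} \int\!\!\int G_{d,q}(x - z)\, G_{d',q}(x' - z')\,
     k_q(z, z')\, \mathrm{d}z\, \mathrm{d}z'.
\end{aligned}
```

Because every output is driven by the same latent functions, the outputs are correlated with one another, which is what allows a multiple-output GP to share information between responses.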

Convolution processes for multiple outputs

If the kernel smoothing function is

and the covariance for the latent process is

the covariance for the multiple responses is

Convolution processes for multiple outputs

Each of the outputs can be corrupted with an independent process,

The likelihood is

and the prediction is
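The three formulas on this slide are not reproduced in the text. Under the usual Gaussian assumptions they take the form sketched below. The notation is assumed: y stacks the noisy observations of all outputs, K_{f,f} is the covariance of the outputs at the training inputs X, \Sigma is the covariance of the independent noise processes w_d, and a subscript * marks test inputs.

```latex
\begin{aligned}
y_d(x) &= f_d(x) + w_d(x), \\
p(\mathbf{y} \mid X, \theta) &= \mathcal{N}\big(\mathbf{y} \mid \mathbf{0},\; K_{f,f} + \Sigma\big), \\
p(\mathbf{y}_* \mid \mathbf{y}, X, X_*, \theta) &= \mathcal{N}\big(\mathbf{y}_* \mid \mu_*,\; \Lambda_*\big), \\
\mu_* &= K_{f_*,f}\,(K_{f,f} + \Sigma)^{-1}\,\mathbf{y}, \\
\Lambda_* &= K_{f_*,f_*} - K_{f_*,f}\,(K_{f,f} + \Sigma)^{-1} K_{f,f_*} + \Sigma_*.
\end{aligned}
```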

Group A              Group B
Label  f  xf  t      Label  f     xf    t
A1     1  3   11     B1     1.12  4.79  13.56
A2     1  4   14     B2     1.34  3.11  11.53
A3     1  5   13     B3     1.66  3.83  13.67
A4     2  3   12     B4     2.04  4.20  17.05
A5     2  4   15     B5     2.29  5.15  15.10
A6     2  5   11     B6     2.63  3.02  16.72
A7     3  3   16     B7     2.67  2.96  12.43
A8     3  4   11     B8     2.94  4.92  11.28
A9     3  5   13     B9     2.99  3.09  14.17
A10    4  3   11     B10    3.51  4.12  13.33
A11    4  4   13     B11    3.88  3.75  11.61
A12    4  5   16     B12    3.93  3.20  17.07
A13    5  3   11     B13    4.08  5.03  11.10
A14    5  4   12     B14    4.26  3.40  12.04
A15    5  5   17     B15    4.53  4.75  12.43
A16    6  3   14     B16    4.59  2.76  13.21
A17    6  4   15     B17    4.77  4.69  14.34
A18    6  5   12     B18    5.76  4.92  13.11

Correlation between Cl and Cd

R^2=0.8525

PDF of the predictive joint distribution

• A12

Method     RMSE
SOGP
SEISO      0.2446
SEARD      0.2337
NN         0.1498
NN+SEISO   0.2668
NN+SEARD   0.2684
BP         0.1491
RBF        0.2279
MOGP       0.1000

Inverse design

• Pressure distribution -> airfoil shape

Method     RMSE
SOGP
SEISO      0.1253
SEARD      0.1057
NN         0.1428
NN+SEISO   0.1333
NN+SEARD   0.1230
BP         0.1601
RBF        0.1758
MOGP       0.1008

Acknowledgment

• 李蒙 (Li Meng)

• 闫国启 (Yan Guoqi)

• 张礼 (Zhang Li)

• 祝青雷 (Zhu Qinglei)