Measuring Isoform Expression from RNA-Seq data Based on LDA
刘学军 (Liu Xuejun), 2012.9.21
Outline
• Background
• Modeling RNA-Seq data
• Results
Alternatively spliced isoforms
RNA-Seq data – an example
reference ACGTCCCC
12 ACGTC reads
8 CGTCC reads
9 GTCCC reads
5 TCCCC reads
This gene can be summarized by a sequence of counts 12, 8, 9, 5.
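The count summary above can be sketched in a few lines: map each distinct read to the position where it starts in the reference and add its count there. This is a minimal sketch assuming error-free, fixed-length reads with a single exact match each; the function name `start_position_counts` is hypothetical and not part of LDAseq.

```python
def start_position_counts(reference, read_counts, read_len):
    """Summarize reads as the number of reads starting at each reference position.

    reference:   reference sequence, e.g. "ACGTCCCC"
    read_counts: mapping from read sequence to how many times it was observed
    read_len:    fixed read length
    """
    counts = [0] * (len(reference) - read_len + 1)   # one slot per possible start position
    for read, n in read_counts.items():
        if len(read) != read_len:
            raise ValueError(f"read {read!r} does not have length {read_len}")
        pos = reference.find(read)                   # first exact match (assumed unique)
        if pos < 0:
            raise ValueError(f"read {read!r} does not map to the reference")
        counts[pos] += n
    return counts


reads = {"ACGTC": 12, "CGTCC": 8, "GTCCC": 9, "TCCCC": 5}
print(start_position_counts("ACGTCCCC", reads, read_len=5))  # [12, 8, 9, 5]
```

Real RNA-Seq reads are aligned with mismatches and may map to several positions or isoforms; the exact-match lookup here only reproduces the toy example.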
Structure of RNA-Seq data
LDA: latent Dirichlet allocation
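For reference, the standard LDA generative process draws per-document topic proportions \theta ~ Dirichlet(\alpha), then for each word draws a topic z ~ Categorical(\theta) and a word from that topic's distribution \phi_z. The NumPy sketch below samples a toy corpus this way. Reading the topics as isoforms in the toy example is only an assumption, and `sample_lda_corpus` is a hypothetical helper, not LDAseq code.

```python
import numpy as np


def sample_lda_corpus(alpha, phi, doc_lengths, rng):
    """Sample documents from the LDA generative process.

    alpha: Dirichlet concentration over K topics, shape (K,)
    phi:   per-topic word distributions, shape (K, V), rows sum to 1
    """
    n_topics, vocab_size = phi.shape
    docs, thetas = [], []
    for n_words in doc_lengths:
        theta = rng.dirichlet(alpha)                           # topic proportions of this document
        z = rng.choice(n_topics, size=n_words, p=theta)        # topic assignment of each word
        words = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])  # each word drawn from its topic
        docs.append(words)
        thetas.append(theta)
    return docs, np.array(thetas)


rng = np.random.default_rng(0)
alpha = np.ones(2)                       # two topics (read as two isoforms: an assumption)
phi = np.array([[0.5, 0.5, 0.0],         # topic 0 never emits word 2
                [0.0, 0.5, 0.5]])        # topic 1 never emits word 0
docs, thetas = sample_lda_corpus(alpha, phi, doc_lengths=[50, 80], rng=rng)
```

Inference runs this process in reverse: given the observed words, it estimates the posterior over \theta.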
LDAseq - probe
LDAseq
Convert \theta to expression level
• Obtain P(\theta|D)
• Normalize counts to sequencing depth and isoform length:
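A minimal sketch of this conversion, assuming \theta holds the fraction of a gene's reads attributed to each isoform (for example the posterior mean of P(\theta|D)). It uses an RPKM-style normalization (reads per kilobase of isoform per million mapped reads) as a stand-in for the exact formula. The helper `isoform_rpkm` and its parameters are hypothetical.

```python
def isoform_rpkm(theta, gene_reads, isoform_lengths_bp, total_mapped_reads):
    """RPKM-style expression for each isoform of one gene (assumed normalization).

    theta:              fraction of the gene's reads assigned to each isoform
    gene_reads:         number of reads mapped to the gene
    isoform_lengths_bp: length of each isoform in base pairs
    total_mapped_reads: sequencing depth of the sample
    """
    if abs(sum(theta) - 1.0) > 1e-9:
        raise ValueError("theta must sum to 1")
    expression = []
    for frac, length in zip(theta, isoform_lengths_bp):
        reads_k = frac * gene_reads                              # reads attributed to isoform k
        per_kb = reads_k / (length / 1e3)                        # normalize by isoform length
        expression.append(per_kb / (total_mapped_reads / 1e6))   # normalize by sequencing depth
    return expression


# Toy gene (hypothetical numbers) at the depth of liver1 from data set 1.
print(isoform_rpkm([0.75, 0.25], gene_reads=1200,
                   isoform_lengths_bp=[2400, 1500], total_mapped_reads=31578097))
```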
Workflow of LDAseq
Data set 1
Tissue   Replicate   Number of reads
liver    liver1      31578097
liver    liver2      44173056
brain    brain1      31116663
brain    brain2      47781892
muscle   muscle1     31763031
muscle   muscle2     38007934
3 conditions, each with 2 technical replicates
9370 genes which contain multiple isoforms.
Data set 1
• Histogram of the number of probes per gene
• 72.43 probes per gene on average
Data set 2
• Two conditions, 8 qRT-PCR-validated isoforms
Condition    MCF-7       HME
# of lanes   4           7
# of reads   16059515    19282346
Data set 2
• Histogram of the number of probes per gene
• 72.57 probes per gene on average
Comparisons
Gene       Comparison                        qRT-PCR   Cufflinks RSEM      LDAseq
TRAP1      uc002cvt.2, HME vs. MCF-7         -(0.4)    -(0.6)    -(0.4)    -(0.9)
TRAP1      uc002cvs.1, HME vs. MCF-7         -(0.5)    -(1.1)    -(0.8)    -(0.6)
TRAP1      HME, uc002cvt.2 vs. uc002cvs.1    -(0.9)    +(4.8)    +(4.9)    -(0.8)
TRAP1      MCF-7, uc002cvt.2 vs. uc002cvs.1  -(1.0)    +(4.3)    +(4.4)    -(0.6)
ZNF581/0   uc002qlq.1, HME vs. MCF-7         -(0.3)    -(1.2)    -(0.9)    -(1.4)
ZNF581/0   uc002qlp.1, HME vs. MCF-7         -(1.0)    -(1.2)    -(0.7)    -(1.4)
ZNF581/0   HME, uc002qlq.1 vs. uc002qlp.1    +(1.2)    +(1.9)    +(1.3)    +(0.7)
ZNF581/0   MCF-7, uc002qlq.1 vs. uc002qlp.1  +(1.0)    +(1.9)    +(1.5)    +(0.8)
WISP2      uc002xmn.1, HME vs. MCF-7         -(5.6)    -(6.9)    -(5.4)    -(8.3)
WISP2      uc002xmo.1, HME vs. MCF-7         -(4.5)    -(5.4)    -(4.7)    -(6.0)
WISP2      HME, uc002xmn.1 vs. uc002xmo.1    +(0.4)    (0.0)     (0.0)     -(0.4)
WISP2      MCF-7, uc002xmn.1 vs. uc002xmo.1  +(1.5)    +(1.5)    +(0.8)    +(1.9)
HIST1H2BD  uc003ngr.1, HME vs. MCF-7         -(4.7)    -(3.7)    -(2.9)    -(2.4)
HIST1H2BD  uc003ngs.1, HME vs. MCF-7         -(5.2)    -(4.2)    -(4.5)    -(3.9)
HIST1H2BD  HME, uc003ngr.1 vs. uc003ngs.1    -(5.4)    +(2.4)    +(1.8)    -(2.4)
HIST1H2BD  MCF-7, uc003ngr.1 vs. uc003ngs.1  -(5.9)    +(1.8)    +(0.2)    -(3.9)
Correlation coefficient with qRT-PCR         -         0.4899    0.5279    0.8596
# of wrong regulation directions             -         4         5         1
Modelling Multi-response Surfaces for Airfoil Design with Multiple Output Gaussian Process Regression
• Gaussian processes
• Multiple output GP (MOGP)
• MOGP in airfoil design
Gaussian Processes
• A Gaussian process (GP) is used to describe a distribution over functions.
• A GP is a collection of random variables, any finite number of which have a joint Gaussian distribution.
Gaussian Processes
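The mean function and the covariance function are defined as
m(x) = \mathbb{E}[f(x)]
k(x, x') = \mathbb{E}[(f(x) - m(x))(f(x') - m(x'))]
and the GP can be written as
f(x) \sim \mathcal{GP}(m(x), k(x, x'))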
Gaussian Processes
The covariance function implies the prior distribution over functions.
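To see how the covariance function shapes the prior, the sketch below draws sample functions from a zero-mean GP with a squared-exponential covariance, an assumed choice for illustration. A longer length scale gives smoother draws. `se_kernel` and `sample_gp_prior` are hypothetical helpers.

```python
import numpy as np


def se_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance for 1-D inputs (assumed kernel)."""
    d = x1[:, None] - x2[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def sample_gp_prior(x, n_samples, rng, jitter=1e-8, **kernel_args):
    """Draw n_samples functions f ~ GP(0, k) evaluated at the points x."""
    K = se_kernel(x, x, **kernel_args) + jitter * np.eye(len(x))  # jitter keeps K numerically PD
    L = np.linalg.cholesky(K)                                     # K = L L^T
    z = rng.standard_normal((len(x), n_samples))
    return (L @ z).T                                              # one sampled function per row


x = np.linspace(0.0, 5.0, 100)
rng = np.random.default_rng(0)
smooth = sample_gp_prior(x, 3, rng, length_scale=2.0)   # long length scale: slowly varying
wiggly = sample_gp_prior(x, 3, rng, length_scale=0.3)   # short length scale: rapidly varying
```

Plotting the rows of `smooth` and `wiggly` against `x` shows the effect of the length scale.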
Gaussian Processes
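Prediction with noise-free observations (training inputs X with function values f, test inputs X_*):
f_* | X_*, X, f \sim \mathcal{N}(K(X_*, X) K(X, X)^{-1} f, K(X_*, X_*) - K(X_*, X) K(X, X)^{-1} K(X, X_*))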
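The noise-free predictive mean K(X_*, X) K(X, X)^{-1} f and covariance K(X_*, X_*) - K(X_*, X) K(X, X)^{-1} K(X, X_*) can be sketched with NumPy through a Cholesky factor of K(X, X). The squared-exponential kernel, the small jitter term and the helper names are assumptions for illustration. Because the observations are noise-free, the mean reproduces the training values and the variance collapses to (almost) zero at the training inputs.

```python
import numpy as np


def se_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance for 1-D inputs (assumed kernel)."""
    d = x1[:, None] - x2[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_predict_noise_free(x_train, f_train, x_test, jitter=1e-10, **kernel_args):
    """Predictive mean and covariance of f(x_test) given exact values f(x_train)."""
    K = se_kernel(x_train, x_train, **kernel_args) + jitter * np.eye(len(x_train))
    K_s = se_kernel(x_test, x_train, **kernel_args)             # K(X_*, X)
    K_ss = se_kernel(x_test, x_test, **kernel_args)             # K(X_*, X_*)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f_train))  # K(X, X)^{-1} f
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)                               # L^{-1} K(X, X_*)
    cov = K_ss - v.T @ v                                        # K_** - K_*X K^{-1} K_X*
    return mean, cov


x_train = np.array([-2.0, 0.0, 1.5, 3.0])
f_train = np.sin(x_train)
x_test = np.array([-2.0, 0.5, 3.0])
mean, cov = gp_predict_noise_free(x_train, f_train, x_test)
```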
Gaussian Processes
Multiple Outputs
Convolution processes for multiple outputs
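• Consider a set of D output functions \{f_d(x)\}_{d=1}^{D}, x \in \mathcal{X}, where \mathcal{X} is the input domain. Each f_d(x) is expressed as the convolution of a smoothing kernel G_d with a latent function u:
f_d(x) = \int_{\mathcal{X}} G_d(x - z) u(z) dz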
Convolution processes for multiple outputs
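Consider more than one latent function:
f_d(x) = \sum_{q=1}^{Q} \int_{\mathcal{X}} G_{d,q}(x - z) u_q(z) dz
where each latent function u_q(z) is drawn from a zero-mean GP with covariance k_q(z, z').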
Convolution processes for multiple outputs
If
Convolution processes for multiple outputs
If the kernel smoothing function is
and the covariance for the latent process is
the covariance for the multiple responses is
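One case with a closed form, assumed here for illustration with a single latent function u: if each smoothing kernel is a scaled Gaussian, G_d(\tau) = S_d \mathcal{N}(\tau | 0, P_d^{-1}), and the latent covariance is Gaussian, k(\tau) = \mathcal{N}(\tau | 0, \Lambda^{-1}), then cov[f_d(x), f_{d'}(x')] = S_d S_{d'} \mathcal{N}(x - x' | 0, P_d^{-1} + P_{d'}^{-1} + \Lambda^{-1}), because convolving Gaussians adds their variances. The NumPy sketch below builds the joint covariance of D outputs for 1-D inputs; `convolved_cov` and its parameter names (`smooth_var[d]` for P_d^{-1}, `latent_var` for \Lambda^{-1}) are hypothetical.

```python
import numpy as np


def gauss(tau, var):
    """1-D Gaussian density N(tau | 0, var), applied elementwise."""
    return np.exp(-0.5 * tau ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def convolved_cov(x, sens, smooth_var, latent_var):
    """Joint covariance of D outputs at 1-D inputs x under Gaussian kernels (assumed).

    Rows and columns are ordered output by output: [f_1(x), ..., f_D(x)].
    """
    n, D = len(x), len(sens)
    tau = x[:, None] - x[None, :]
    K = np.empty((D * n, D * n))
    for d in range(D):
        for e in range(D):
            var = smooth_var[d] + smooth_var[e] + latent_var   # variances add under convolution
            K[d * n:(d + 1) * n, e * n:(e + 1) * n] = sens[d] * sens[e] * gauss(tau, var)
    return K


x = np.linspace(0.0, 1.0, 5)
K = convolved_cov(x, sens=[1.0, -0.5], smooth_var=[0.05, 0.2], latent_var=0.1)
```

A negative sensitivity for one output makes it negatively correlated with the other, which a set of independent single-output GPs cannot express.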
Convolution processes for multiple outputs
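Each output can be corrupted by an independent noise process,
y_d(x) = f_d(x) + w_d(x),  w_d(x) \sim \mathcal{N}(0, \sigma_d^2).
With y stacking all observed outputs, K_{f,f} the corresponding covariance, and \Sigma = \mathrm{blockdiag}(\sigma_1^2 I, \dots, \sigma_D^2 I), the likelihood is
p(y | X) = \mathcal{N}(y | 0, K_{f,f} + \Sigma)
and the prediction at test inputs X_* is
f_* | y \sim \mathcal{N}(K_{*,f}(K_{f,f} + \Sigma)^{-1} y, K_{*,*} - K_{*,f}(K_{f,f} + \Sigma)^{-1} K_{f,*})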
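A minimal NumPy sketch of the noisy-output log likelihood and prediction, computed through a Cholesky factor of K_{f,f} + \Sigma, where \Sigma is diagonal with one noise variance per output. It assumes white Gaussian noise. The helper name is hypothetical, and the toy joint covariance `np.kron(B, Kx)` is a simple stand-in, not the convolution covariance from the slides.

```python
import numpy as np


def gp_noisy_predict(K_ff, K_sf, K_ss, y, noise_var):
    """Log marginal likelihood and prediction for y = f + w, w ~ N(0, diag(noise_var)).

    K_ff: covariance of the stacked training outputs; K_sf: test-vs-training
    covariance; K_ss: test covariance; noise_var: one noise variance per observation.
    """
    Ky = K_ff + np.diag(noise_var)                       # K_{f,f} + Sigma
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K_{f,f} + Sigma)^{-1} y
    log_lik = (-0.5 * y @ alpha
               - np.log(np.diag(L)).sum()                # equals -0.5 * log det(Ky)
               - 0.5 * len(y) * np.log(2.0 * np.pi))
    mean = K_sf @ alpha
    v = np.linalg.solve(L, K_sf.T)
    cov = K_ss - v.T @ v
    return log_lik, mean, cov


# Toy two-output example (hypothetical numbers).
x = np.linspace(0.0, 1.0, 4)
Kx = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.3 ** 2)
B = np.array([[1.0, 0.8], [0.8, 1.0]])                   # output-output correlation
K_ff = np.kron(B, Kx)                                    # 8 x 8, output 1 block first
y = np.concatenate([np.sin(3 * x), 0.8 * np.sin(3 * x)])
noise_var = np.repeat([0.01, 0.04], len(x))              # sigma_1^2, then sigma_2^2
log_lik, mean, cov = gp_noisy_predict(K_ff, K_ff, K_ff, y, noise_var)
```

Maximizing `log_lik` over the kernel and noise parameters is the usual way to fit such a model.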
Group A            | Group B
Label  f  x_f  t   | Label  f     x_f   t
A1     1  3    11  | B1     1.12  4.79  13.56
A2     1  4    14  | B2     1.34  3.11  11.53
A3     1  5    13  | B3     1.66  3.83  13.67
A4     2  3    12  | B4     2.04  4.20  17.05
A5     2  4    15  | B5     2.29  5.15  15.10
A6     2  5    11  | B6     2.63  3.02  16.72
A7     3  3    16  | B7     2.67  2.96  12.43
A8     3  4    11  | B8     2.94  4.92  11.28
A9     3  5    13  | B9     2.99  3.09  14.17
A10    4  3    11  | B10    3.51  4.12  13.33
A11    4  4    13  | B11    3.88  3.75  11.61
A12    4  5    16  | B12    3.93  3.20  17.07
A13    5  3    11  | B13    4.08  5.03  11.10
A14    5  4    12  | B14    4.26  3.40  12.04
A15    5  5    17  | B15    4.53  4.75  12.43
A16    6  3    14  | B16    4.59  2.76  13.21
A17    6  4    15  | B17    4.77  4.69  14.34
A18    6  5    12  | B18    5.76  4.92  13.11
Correlation between C_l and C_d (lift and drag coefficients)
R^2 = 0.8525
PDF of the predictive joint distribution
• Airfoil A12
Method      RMSE
SOGP (single-output GP)
  SEISO     0.2446
  SEARD     0.2337
  NN        0.1498
  NN+SEISO  0.2668
  NN+SEARD  0.2684
BP          0.1491
RBF         0.2279
MOGP        0.1000
Inverse design
• Pressure distribution -> airfoil shape
Method      RMSE
SOGP
  SEISO     0.1253
  SEARD     0.1057
  NN        0.1428
  NN+SEISO  0.1333
  NN+SEARD  0.1230
BP          0.1601
RBF         0.1758
MOGP        0.1008
Acknowledgments
• 李蒙 (Li Meng)
• 闫国启 (Yan Guoqi)
• 张礼 (Zhang Li)
• 祝青雷 (Zhu Qinglei)