Deep Learning based Multi-omics integration A survey

Page 1: Deep learning based multi-omics integration, a survey

Deep Learning based Multi-omics integration

A survey

Page 2: Deep learning based multi-omics integration, a survey

Deep Learning in Bioinformatics

Min, Seonwoo, Byunghan Lee, and Sungroh Yoon. "Deep learning in bioinformatics." Briefings in Bioinformatics (2016)

Page 3: Deep learning based multi-omics integration, a survey

Outline
• Summarize three related works on deep learning based feature extraction / survival prediction on omics data
  • Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders
  • A deep learning approach for cancer detection and relevant gene identification
  • Deep Learning based multi-omics integration robustly predicts survival in liver cancer

Page 4: Deep learning based multi-omics integration, a survey

Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders
Pacific Symposium on Biocomputing, 2015

Page 5: Deep learning based multi-omics integration, a survey

Denoising Auto-Encoder (DAE)
• Build features that reconstruct the initial input data from corrupted data
• Generate robust features
• Unsupervised learning
• Extract features in the non-linear space
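The bullets above can be illustrated with a minimal sketch of a one-layer denoising autoencoder in NumPy. This is not the surveyed paper's implementation: masking noise, tied weights, sigmoid units, squared-error loss, the hyperparameters, and the toy data are all assumptions chosen to keep the example short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, n_hidden=4, corruption=0.2, lr=0.5, epochs=200, seed=0):
    """Train a one-layer denoising autoencoder with tied weights.

    X is a (n_samples, n_genes) array scaled to [0, 1].
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    W = rng.normal(0.0, 0.1, size=(n_genes, n_hidden))
    b_h = np.zeros(n_hidden)
    b_o = np.zeros(n_genes)
    losses = []
    for _ in range(epochs):
        # Corrupt the input: zero out a random fraction of entries.
        mask = rng.random(X.shape) >= corruption
        X_noisy = X * mask
        H = sigmoid(X_noisy @ W + b_h)      # encode the corrupted input
        X_hat = sigmoid(H @ W.T + b_o)      # decode with the transposed weights
        # The error is measured against the clean input, not the corrupted one.
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagation for the squared-error loss.
        d_out = err * X_hat * (1.0 - X_hat)
        d_hid = (d_out @ W) * H * (1.0 - H)
        grad_W = (X_noisy.T @ d_hid + d_out.T @ H) / n_samples
        W -= lr * grad_W
        b_h -= lr * d_hid.mean(axis=0)
        b_o -= lr * d_out.mean(axis=0)
    return W, b_h, b_o, losses

# Toy data (made up): 40 samples x 20 "genes" following two expression patterns.
pattern_a = np.r_[np.full(10, 0.9), np.full(10, 0.1)]
pattern_b = pattern_a[::-1]
X = np.vstack([np.tile(pattern_a, (20, 1)), np.tile(pattern_b, (20, 1))])

W, b_h, b_o, losses = train_dae(X)
features = sigmoid(X @ W + b_h)  # constructed features, one column per hidden node
print(round(losses[0], 3), "->", round(losses[-1], 3))
```

Each column of `features` plays the role of a hidden node's activity, and each column of `W` holds the per-gene weights of that node.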

Page 6: Deep learning based multi-omics integration, a survey

Data
• Two largest breast cancer datasets
• Train DAEs and identify predictive features with the METABRIC dataset
  • 2137 samples, 3000 2520 genes
  • Gene expression data from the European Genome-phenome Archive
• Evaluate independently with the TCGA dataset
  • 547 samples, 2520 genes

Page 7: Deep learning based multi-omics integration, a survey

Features to clinical characteristics
• Genes are not linked to their neighbors
• Genes are linked by transcription factors and pathway memberships
• Are constructed features linked to clinical and molecular features of the samples?
  • Categorize tumor / normal samples
  • Categorize ER+/- samples
  • Categorize samples into molecular subtypes (Luminal A/B, Basal-like, HER2-enriched, Normal-like)

Page 8: Deep learning based multi-omics integration, a survey

Features to clinical characteristics
• Classifying tumor from normal samples
• Classifying ER+ from ER- samples

Robust performance across datasets

Page 9: Deep learning based multi-omics integration, a survey

Features to transcription factor
• Breast cancer related transcription factors are linked to these high-weight features (Node58)
• It contained genes that reflect activity of key ER-associated TFs

Most genes gave zero or low weight to a hidden node

[Figure legend: high positive weight, high negative weight]

Page 10: Deep learning based multi-omics integration, a survey

Features to patient survival
• Node whose activities best separated the two survival groups, high / low (Node5)
• Highly predictive of patient survival

Page 11: Deep learning based multi-omics integration, a survey

Features to Biological pathways
• Pathways significantly associated with genes that consistently gave high weights to a node

[Figure: PID pathways enriched in Node5 (5th feature)]

Page 12: Deep learning based multi-omics integration, a survey

Summary
• Unsupervised feature construction and interpretation based on DAEs
• Applied to breast cancer gene expression data
• Consistent results across different datasets
• In the future
  • Multiple layers of stacked DAEs
  • Consistency across datasets will be useful for data integration
  • Limitations for large-scale data integration

Page 13: Deep learning based multi-omics integration, a survey

A deep learning approach for cancer detection and relevant gene identification
Pacific Symposium on Biocomputing, 2016

Page 14: Deep learning based multi-omics integration, a survey

Overview

[Figure: workflow diagram. Labels: TCGA RNA-seq samples (healthy, cancer); 1210 breast cancer samples; train / test; SDAE features; weights; DCGs; model validation; supervised classification (cancer detection); highly interactive genes identification]

Page 15: Deep learning based multi-omics integration, a survey

Stacked Denoising Auto-Encoder
• Extract functional features from high dimensional, noisy gene expression profiles with reduced loss of information
• Select a layer that has both low dimension and low validation error
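The layer-selection bullet above does not say how dimension and validation error are traded off. One possible rule, sketched below purely as an assumption, is to take the smallest layer whose validation error is within a tolerance of the best error. The `tolerance` parameter and the candidate numbers are made up for illustration.

```python
def select_layer(candidates, tolerance=0.01):
    """Pick the lowest-dimensional layer whose validation error is near the best.

    candidates: list of (dimension, validation_error) pairs, one per SDAE layer.
    tolerance:  hypothetical slack allowed above the best validation error.
    """
    best_error = min(err for _, err in candidates)
    eligible = [(dim, err) for dim, err in candidates if err <= best_error + tolerance]
    # Tuples compare by dimension first, then by error, so min() prefers small layers.
    return min(eligible)

# Made-up (dimension, validation error) values for four stacked layers.
layers = [(15000, 0.020), (10000, 0.018), (2000, 0.022), (500, 0.060)]
chosen = select_layer(layers)
print(chosen)
```

With these toy numbers the 500-unit layer is rejected because its error is well above the best one, and the 2000-unit layer is chosen over the larger layers.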

Page 16: Deep learning based multi-omics integration, a survey

Classification result
• Classify cancer samples from healthy control samples
• Feature extraction
  • SDAE
  • Differentially expressed genes (DIFFEXP)
  • PCA
  • KPCA (RBF kernel)
• Classification model
  • SVM
  • SVM (RBF kernel)
  • Single-layer ANN

Page 17: Deep learning based multi-omics integration, a survey

Deeply connected genes
• Genes with the largest weights in W (the product of the weight matrices for each layer) are the most strongly connected to the extracted and highly predictive features

But lower performance than SDAE features
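The definition of W on this slide can be sketched as follows. The slide only says that genes with the largest weights in W are selected. Ranking genes by their largest absolute entry in W is an assumption made here, and the gene names and weight values are hypothetical.

```python
import numpy as np

def deeply_connected_genes(weights, gene_names, top_k=2):
    """Rank genes by their strongest connection in W = W1 @ W2 @ ... @ Wn.

    weights:    list of per-layer weight matrices; weights[0] has one row per gene.
    gene_names: labels for the rows of weights[0].
    The scoring rule (max absolute weight per gene) is an illustrative assumption.
    """
    W = weights[0]
    for W_next in weights[1:]:
        W = W @ W_next                       # shape: (n_genes, n_top_layer_features)
    score = np.abs(W).max(axis=1)            # strongest link of each gene to any feature
    order = np.argsort(score)[::-1][:top_k]  # indices of the top-scoring genes
    return [gene_names[i] for i in order], W

# Hypothetical two-layer network: 4 genes -> 3 hidden units -> 2 features.
W1 = np.array([[1.0, 0.0, 0.0],
               [0.0, 0.1, 0.0],
               [0.0, 0.0, 2.0],
               [0.1, 0.1, 0.1]])
W2 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])
top, W = deeply_connected_genes([W1, W2], ["g0", "g1", "g2", "g3"])
print(top)
```

Multiplying the weight matrices ignores the non-linear activations between layers, so W is only a linear summary of how each gene connects to the top-layer features.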

Page 18: Deep learning based multi-omics integration, a survey

Summary
• SDAE to transform high-dimensional, noisy gene expression data to a lower dimensional, meaningful representation
• Classify breast cancer samples from the healthy control samples using new compact features
• Identify a set of highly interactive genes critical for the diagnosis of breast cancer
• In the future
  • Need to improve the extraction of DCGs
  • Limitation on the requirement for large data sets
  • Identify cross-cancer biomarkers through the analysis of aggregated heterogeneous cancer data

Page 19: Deep learning based multi-omics integration, a survey

Deep Learning based multi-omics integration robustly predicts survival in liver cancer
preprint, 2017

Page 20: Deep learning based multi-omics integration, a survey

[Figure: workflow diagram. Labels: 360 tumor samples; inputs of 15629 genes, 365 miRNAs, 19883 genes; 100 features; 37 features; high / poor survival]

Page 21: Deep learning based multi-omics integration, a survey

Why Autoencoders?
• Produce features linked to clinical outcomes
• Analyze high-dimensional gene expression data
• Integrate heterogeneous data
• Interpret the biological functions (aggregate genes sharing similar pathways)
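As a rough illustration of integrating heterogeneous data, the sketch below stacks several omics blocks measured on the same samples and splits the samples into two groups. It is a simplification under stated assumptions: per-feature z-scoring, column-wise concatenation, and two-cluster k-means are choices made here. The autoencoder and feature-selection stages are omitted, and the toy block sizes (5, 2, 4 columns) are made up.

```python
import numpy as np

def stack_omics(blocks):
    """Z-score each feature, then concatenate omics blocks column-wise.

    blocks: list of (n_samples, n_features_i) arrays with rows in the same sample order.
    """
    n = blocks[0].shape[0]
    if any(b.shape[0] != n for b in blocks):
        raise ValueError("all omics blocks must have the same number of samples")
    scaled = []
    for b in blocks:
        sd = b.std(axis=0)
        sd[sd == 0] = 1.0                    # avoid dividing by zero for constant features
        scaled.append((b - b.mean(axis=0)) / sd)
    return np.hstack(scaled)

def two_means(Z, n_iter=20):
    """Plain k-means with k=2 and a deterministic initialisation."""
    c0 = Z[0]
    c1 = Z[np.argmax(((Z - c0) ** 2).sum(axis=1))]   # sample farthest from sample 0
    centers = np.stack([c0, c1]).astype(float)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = Z[labels == k].mean(axis=0)
    return labels

# Toy data (made up): 6 samples, two hidden groups, three omics blocks.
rng = np.random.default_rng(0)
group = np.array([0, 0, 0, 1, 1, 1])
blocks = [group[:, None] * 5.0 + rng.normal(0.0, 0.1, size=(6, p)) for p in (5, 2, 4)]
Z = stack_omics(blocks)
labels = two_means(Z)
print(Z.shape, labels.tolist())
```

Scaling each block before concatenation keeps a block with many features on a comparable per-feature scale to a small one, although the larger block still contributes more columns to the distance.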

Page 22: Deep learning based multi-omics integration, a survey

Classification result

PCA

Page 23: Deep learning based multi-omics integration, a survey

Classification result

Single-omics based DL models

Page 24: Deep learning based multi-omics integration, a survey

Validation in five cohorts
• Robustness of the model at predicting survival outcomes

Page 25: Deep learning based multi-omics integration, a survey

Adding clinical information
• Age, Stage, Grade, Race, Risk factors (HBV, HCV, Alcohol, …)
• DL-based multi-omics model performs sufficiently well even without clinical features

Page 26: Deep learning based multi-omics integration, a survey

Functional analysis of the survival-subgroups
• KEGG pathway analysis to pinpoint the pathways enriched in two subtypes
• Two subtypes have different and disjoint active pathways

Page 27: Deep learning based multi-omics integration, a survey

Enriched pathway-gene analysis for upregulated genes
• S1 aggressive tumor sub-group
• Enriched with cancer related pathways

Page 28: Deep learning based multi-omics integration, a survey

Enriched pathway-gene analysis for upregulated genes
• S2 less aggressive tumor sub-group
• Activated metabolism related pathways

Page 29: Deep learning based multi-omics integration, a survey

Summary
• Contributions
  • Identified two subtypes at the molecular level
  • Consistent performance, implying the reliability and robustness of the model
  • Sufficient performance without adding clinical features
  • The AE is much more efficient at inferring features linked to survival
  • Validated in five additional cohorts
• Challenges
  • The absence of cluster label information in original reports
  • Lack of survival data in some cases

Page 30: Deep learning based multi-omics integration, a survey

Conclusion
• Feature extraction with SDAE
  • Robust to noisy datasets
  • Extract meaningful features and reflect both linear and non-linear relationships
  • Consistent performance, good for multi-omics integration
• Multi-omics integration
  • More sophisticated strategy to combine multiple features
  • May incorporate pathways, handle overlapping genes

Page 31: Deep learning based multi-omics integration, a survey

Thank you!
Q & A