Approaches for the Integration of Visual and Computational Analysis of Biomedical Data
HARVARD MEDICAL SCHOOL DEPARTMENT OF BIOMEDICAL INFORMATICS
NILS GEHLENBORG
@nils_gehlenborg
http://gehlenborglab.org
FRITZ LEKSCHAS HARVARD MEDICAL SCHOOL
BIG PILES OF DATA …
Data Repositories
general specialized
ArrayExpress GEO
MetaboLights PRIDE
dbGaP …
ENCODE Roadmap Epigenomics
…
… OFFER OPPORTUNITIES …
SINGLE OR FEW DATA SETS
Test hypotheses without generating new data.
Use published data as supporting evidence for findings based on your own data sets.
MANY DATA SETS
Conduct meta-analyses, e.g., to characterize expression patterns in human tissues or to link diseases.
M. Lukk et al., Nature Biotechnology 28(4):322–324 (2010)
S. Suthram et al., PLoS Computational Biology 6(2) (2010)
COMMON BEHAVIOR OF RESEARCH PARASITES!
N Gehlenborg et al., manuscript in preparation
DATA REPOSITORY
VISUALIZATION TOOLS
ANALYSIS PIPELINES
GALAXY
Toolshed
Workflow Editor
Tools
REST API
Workflow Inputs
Workflow Outputs
http://www.refinery-platform.org
… BUT NOT SO FAST!
[Figure: Text-Based Search. Labels: Data Sets, Metadata, Data Files, Free Text, Keywords, Annotation Mapping, Ontologies, Root, Terminal, "subclass of".]
[Figure: Text-Based Search and Semantic Visual Exploration (SATORI). Labels: Data Sets, Metadata, Data Files, Free Text, Keywords, Annotation Mapping, Ontologies, Root, Terminal, "subclass of".]
SATORI: A System for Ontology-Guided Visual Exploration of Biomedical Data Repositories
http://satori.refinery-platform.org
Data set
Repository
Collection of interest
Data Analyst · Group Leader · Data Curator
Need 1: Find data sets that match certain experimental characteristics.
Need 2: Find data sets that are similar (or dissimilar) to given data sets.
Need 3: Get an overview of the distribution of the experimental characteristics across a collection of data sets.
Need 4: Get an overview of the annotation term hierarchy and term usage.
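Needs 1 and 3 can be illustrated with a minimal sketch, assuming that each data set is annotated with ontology terms and that terms are linked by "subclass of" relations. The term names, data set identifiers, and function names are hypothetical and are not taken from SATORI.

```python
from collections import Counter

# Hypothetical "subclass of" relations: child term -> parent terms.
SUBCLASS_OF = {"B": ["A"], "C": ["B"], "D": ["A"]}

# Hypothetical annotations: data set id -> directly annotated terms.
ANNOTATIONS = {"ds1": {"B"}, "ds2": {"C"}, "ds3": {"D"}}


def descendants(term):
    """Return the term plus every term that is a transitive subclass of it."""
    found = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child, parents in SUBCLASS_OF.items():
            if current in parents and child not in found:
                found.add(child)
                stack.append(child)
    return found


def find_data_sets(term):
    """Need 1: data sets annotated with the term or any of its subclasses."""
    wanted = descendants(term)
    return sorted(ds for ds, terms in ANNOTATIONS.items() if terms & wanted)


def term_distribution():
    """Need 3: number of data sets directly annotated with each term."""
    return Counter(t for terms in ANNOTATIONS.values() for t in terms)
```

Querying for a broad term such as `"A"` returns data sets annotated only with its subclasses, which a plain keyword match on `"A"` would likely miss.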
Peter Pirolli and Stu Card
[Figure: Scenarios 1 to 3 for terms A, B, and C, shown as tree, tree map, list graph, and data sets with their term annotations.]
The Art Institute of Chicago
HARVARD MEDICAL SCHOOL
JOHANNES KEPLER UNIVERSITY LINZ Stefan Luger, Holger Stitz, Marc Streit
Web http://satori.refinery-platform.org · http://refinery-platform.org
Acknowledgements
Peter J Park & all members of the Computational Genomics Lab
Fritz Lekschas, Jennifer K Marx, Scott Ouellette, Anton Xue, Psalm Haseley
HARVARD SCHOOL OF PUBLIC HEALTH Ilya Sytchev, Shannan Ho Sui
UNIVERSITY OF SHEFFIELD David R Jones, Winston Hide
Funding NIH/NHGRI R00 HG007583, Harvard Stem Cell Institute
We are hiring postdocs & developers!
HARVARD MEDICAL SCHOOL DEPARTMENT OF BIOMEDICAL INFORMATICS
See http://gehlenborglab.org or http://dbmi.med.harvard.edu for details.
Data visualization, analysis, and management for:
• genomic structural variants
• dynamics of the 3D genome
• cancer subtypes in patient cohorts
• exploration tools for data repositories
• provenance graphs
[Figure: Term hierarchy. Legend: Term, Terminal term, To be deleted, To be duplicated.]
[Figure: Term size and cumulative size per term.]
[Figure: 1. Global, 2. Tree Map, 3. Node-Link Diagram.]
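The "Term size" and "Cumulative size" labels in the figure can be read as two counts per ontology term. The sketch below assumes that term size is the number of data sets annotated directly with a term, and that cumulative size also counts data sets annotated with any of its subclasses, each data set once. Both definitions and all names are assumptions for illustration; the slides do not spell them out.

```python
# Hypothetical hierarchy: parent term -> child terms ("subclass of" reversed).
# Term D has two parents, so the hierarchy is a graph rather than a strict tree.
CHILDREN = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

# Hypothetical direct annotations: term -> data sets annotated with it.
DIRECT = {"A": set(), "B": {"ds1"}, "C": {"ds2"}, "D": {"ds2", "ds3"}}


def cumulative_sets(term, cache=None):
    """Data sets annotated with the term or any of its descendants."""
    cache = {} if cache is None else cache
    if term not in cache:
        result = set(DIRECT.get(term, set()))
        for child in CHILDREN.get(term, []):
            result |= cumulative_sets(child, cache)
        cache[term] = result
    return cache[term]


def sizes(term):
    """Return (term size, cumulative size) under the assumed definitions."""
    return len(DIRECT.get(term, set())), len(cumulative_sets(term))
```

Using set unions rather than summing child counts keeps a data set from being counted twice when a term is reachable through more than one parent.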