Computational Pathology: Research
Joel Saltz MD, PhD
Chair, Biomedical Informatics, Stony Brook University
Associate Director for Informatics, Stony Brook Cancer Center
Computational Pathology Research
• Computational Science – Context
• High Dimensional Fused Informatics
• Internet of People and Things
Computational Science
• Detect and track changes in data during production
• Invert data for reservoir properties
• Detect and track reservoir changes
• Assimilate data and reservoir properties into the evolving reservoir model
• Use simulation and optimization to guide future production
Example: Oil Field Management – Joint ITR with Mary Wheeler, Paul Stoffa
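The assimilate-and-update loop in the bullets above can be illustrated with a scalar Kalman filter, a standard data assimilation technique swapped in here for illustration; it is not the method of the joint ITR project. The single scalar "reservoir property", the persistence forecast and the variances are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    mean: float      # best estimate of a hypothetical scalar reservoir property
    variance: float  # uncertainty of that estimate


def forecast(est: Estimate, process_var: float) -> Estimate:
    """Simulation step (hypothetical persistence model): keep the mean, grow uncertainty."""
    return Estimate(est.mean, est.variance + process_var)


def assimilate(est: Estimate, observation: float, obs_var: float) -> Estimate:
    """Kalman update: blend the forecast with a new field measurement."""
    gain = est.variance / (est.variance + obs_var)
    return Estimate(
        mean=est.mean + gain * (observation - est.mean),
        variance=(1.0 - gain) * est.variance,
    )


def run_cycle(prior: Estimate, observations: list[float],
              process_var: float = 1.0, obs_var: float = 4.0) -> Estimate:
    """Alternate forecast and assimilation for each incoming production measurement."""
    est = prior
    for obs in observations:
        est = forecast(est, process_var)
        est = assimilate(est, obs, obs_var)
    return est


print(run_cycle(Estimate(100.0, 4.0), [110.0], process_var=0.0))
# Estimate(mean=105.0, variance=2.0)
```

With equal forecast and observation variance the update splits the difference, and each assimilated measurement shrinks the uncertainty that the next forecast starts from.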
Coupled Ground Water and Surface Water Simulations
• Multiple codes, e.g. a fluid code and a contaminant transport code
• Different space and time scales
• Data from a given fluid code run is used in different contaminant transport code scenarios
Pete Beckman – Workshop on Big Data and Extreme Scale Computing
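The coupling pattern on the previous slide (one fluid run feeding several transport scenarios that step on a finer time scale) can be sketched by caching the fluid result. Both "codes" here are hypothetical toy functions with made-up velocities; real coupled simulations exchange fields on meshes, not scalar series.

```python
import math
from functools import lru_cache

FLUID_RUNS = 0  # how many times the (expensive) fluid code actually executed


@lru_cache(maxsize=None)
def fluid_code(n_days: int) -> tuple[float, ...]:
    """Hypothetical fluid code: one mean flow velocity (m/day) per simulated day."""
    global FLUID_RUNS
    FLUID_RUNS += 1
    return tuple(1.0 + 0.1 * (day % 7) for day in range(n_days))


def transport_scenario(n_days: int, decay_rate: float, steps_per_day: int = 24):
    """Hypothetical transport code on a finer time step: moves a plume front with
    the daily velocity and applies first-order decay (decay_rate is per day)."""
    velocities = fluid_code(n_days)  # cached: reused, not recomputed, across scenarios
    dt = 1.0 / steps_per_day
    front, mass = 0.0, 1.0
    for v in velocities:
        for _ in range(steps_per_day):
            front += v * dt
            mass *= math.exp(-decay_rate * dt)
    return front, mass


for rate in (0.0, 0.05, 0.1):
    front, mass = transport_scenario(7, rate)
    print(f"decay={rate}: front={front:.2f} m, remaining mass={mass:.3f}")
```

All three scenarios share a single fluid run (FLUID_RUNS stays at 1), while each transport run advances 24 sub-steps per fluid time step.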
Titan – Peak speed: 30,000,000,000,000,000 (3 × 10^16, i.e. 30 petaflops) floating point operations per second!
Pete Beckman – Workshop on Big Data and Extreme Scale Computing
Computational Pathology: High Dimensional Fused Informatics
• Anatomic/functional characterization at fine and gross level
• Integration of anatomic/functional characterization, multiple types of “omic” information, and outcome
• Predict treatment outcome; select and monitor treatments
• Integrated analysis and presentation of observations, features and analytical results, both human and machine generated
Ex-vivo Imaging
Patient Outcome
In vivo imaging
“Omic” Data
Correlating Imaging Phenotypes with Genomic Signatures: Scientific Opportunities
(Imaging Genomics Workshop NCI June 2013)
Clinical Approach and Use
• Development of imaging + analysis methods to characterize heterogeneity:
  – within a tumor at one time point
  – evolution over time
  – among different tumor types
• Development of imaging metrics that:
  – can predict and detect emergence of resistance?
  – correlate with genomic heterogeneity?
  – correlate with habitat heterogeneity?
  – can identify more homogeneous sub-types?
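One simple candidate for an imaging heterogeneity metric, assumed here for illustration and not proposed by the workshop, is the Shannon entropy of the proportions of cells or regions assigned to each phenotype label within a tumor. Computing it per time point would give one way to track evolution over time.

```python
import math
from collections import Counter


def heterogeneity_bits(labels: list[str]) -> float:
    """Shannon entropy (bits) of phenotype-label proportions within one tumor.

    0.0 means every cell/region carries the same label; each doubling of the
    number of equally frequent labels adds one bit.
    """
    total = len(labels)
    counts = Counter(labels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(heterogeneity_bits(["A"] * 8))                 # 0.0, homogeneous
print(heterogeneity_bits(["A", "B"] * 4))            # 1.0, two equal sub-populations
print(heterogeneity_bits(["A", "B", "C", "D"] * 2))  # 2.0, four equal sub-populations
```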
Tumor Heterogeneity
Marusyk 2012
Pathology Analytical Imaging
• Provide rich information about morphological and functional characteristics
• Image analysis, feature extraction on multiple scales
• Spatially mapped “omics”
• Multiple microscopy modalities
Glass Slides → Scanning → Whole Slide Images → Image Analysis
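Whole slide images are far too large to analyze in one piece (about 10^5 × 10^5 pixels each, per the requirements listed later in this deck), so analysis pipelines commonly split them into tiles that can be processed in parallel. A minimal sketch of the tiling step; the 4096-pixel tile size and the function name are assumptions:

```python
from typing import Iterator


def tile_grid(width: int, height: int, tile: int = 4096) -> Iterator[tuple[int, int, int, int]]:
    """Yield (x, y, w, h) tiles that cover a width x height slide; edge tiles are clipped."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)


tiles = list(tile_grid(100_000, 100_000))
print(len(tiles))   # 625 (a 25 x 25 grid)
print(tiles[-1])    # (98304, 98304, 1696, 1696)
```

Clipping the last row and column keeps every pixel covered exactly once, so per-tile results can be merged without double counting.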
• Quantitative Feature Analysis in Pathology: Emory In Silico Center for Brain Tumor Research (PI = Dan Brat, PD= Joel Saltz)
• NLM/NCI: Integrative Analysis/Digital Pathology R01LM011119, R01LM009239 (Dual PIs Joel Saltz, David Foran)
• New - NCI: 1U24CA180924-01A1 Tools to Analyze Morphology and Spatially Mapped Molecular Data (PI=Saltz)
Direct Study of Relationship Between vs
Lee Cooper, Carlos Moreno
Clustering identifies three morphological groups
• Analyzed 200 million nuclei from 162 TCGA GBMs (462 slides)
• Named for functions of associated genes: Cell Cycle (CC), Chromatin Modification (CM), Protein Biosynthesis (PB)
• Prognostically-significant (logrank p=4.5e-4)
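The slide does not say which clustering method produced the three groups. As a hedged stand-in, plain k-means (Lloyd's algorithm) with a deterministic farthest-first initialization shows the basic idea of grouping per-patient morphology feature vectors; the feature vectors below are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def init_farthest(points, k):
    """Deterministic farthest-first seeding: start at points[0], then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k=3, max_iter=100):
    """Lloyd's k-means; returns (labels, centroids)."""
    centroids = init_farthest(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


pts = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4),
       (10.0, 10.0), (10.3, 9.8), (9.9, 10.2),
       (0.0, 10.0), (0.2, 9.7), (0.4, 10.1)]
labels, _ = kmeans(pts, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

In practice features would be standardized first, and cluster stability would need checking before naming groups after associated gene functions.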
[Figure: feature-index heatmap for the CC, CM and PB groups; survival curves (Survival vs. Days) for CC, CM and PB]
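Survival curves like the ones compared above are typically Kaplan–Meier estimates, with the log-rank test used to compare groups. A minimal Kaplan–Meier estimator, assuming follow-up times in days and a 1/0 event indicator (1 = death observed, 0 = censored):

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up in days; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # deaths and censored subjects both leave the risk set
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1])` gives [(1, 0.75), (2, 0.5), (3, 0.0)] up to floating-point rounding; the subject censored at day 2 still counts as at risk at day 2.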
Associations
Gene Expression Correlates of GBM with High Oligo-Astro Ratio
• Oligo-related genes: Myelin Basic Protein, Proteolipoprotein, HoxD1
• Nuclear features most associated with oligo signature genes: Circularity (high), Eccentricity (low)
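Circularity and eccentricity have several definitions, and the slide does not say which were used. A sketch using two common ones, assuming each nucleus is available as a polygon outline: circularity = 4πA/P² (1.0 for a circle) and the eccentricity of the ellipse with the same second moments as the region (0.0 for a circle, approaching 1.0 for elongated shapes). The function name is hypothetical.

```python
import math


def shape_features(poly):
    """Return (circularity, eccentricity) of a simple polygon of (x, y) vertices."""
    n = len(poly)
    a = cx = cy = ixx = iyy = ixy = perim = 0.0
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        iyy += (x0 * x0 + x0 * x1 + x1 * x1) * cross  # accumulates integral of x^2
        ixx += (y0 * y0 + y0 * y1 + y1 * y1) * cross  # accumulates integral of y^2
        ixy += (x0 * y1 + 2 * x0 * y0 + 2 * x1 * y1 + x1 * y0) * cross
        perim += math.hypot(x1 - x0, y1 - y0)
    a /= 2.0  # signed area (positive for counter-clockwise vertices)
    cx /= 6.0 * a
    cy /= 6.0 * a
    # Central second moments: the covariance of points spread uniformly over the region.
    mu20 = iyy / (12.0 * a) - cx * cx
    mu02 = ixx / (12.0 * a) - cy * cy
    mu11 = ixy / (24.0 * a) - cx * cy
    half_trace = (mu20 + mu02) / 2.0
    root = math.sqrt(((mu20 - mu02) / 2.0) ** 2 + mu11 ** 2)
    lam1, lam2 = half_trace + root, half_trace - root  # principal variances
    circularity = 4.0 * math.pi * abs(a) / perim ** 2
    eccentricity = math.sqrt(max(0.0, 1.0 - lam2 / lam1))
    return circularity, eccentricity


rect = [(0, 0), (4, 0), (4, 2), (0, 2)]
print(shape_features(rect))  # circularity ~0.70, eccentricity ~0.87
```

A near-circular nucleus scores high circularity and low eccentricity, matching the pattern the slide associates with the oligo signature.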
Microenvironment and Master Regulators
• Extent of necrosis related to expression of master regulators of the mesenchymal transition
Necrosis and C/EBP-β
Computation and Data Management: Requirements and Challenges
• Explosion of derived data
  – 10^5 × 10^5 pixels per image
  – 1 million objects per image
  – Hundreds to thousands of images per study
• High computational complexity
  – Image analysis, feature extraction, machine learning pipelines
  – Spatial queries involve heavy-duty geometric computations
Projection – 2025
• 100K–1M pathology slides/hospital/year
• 2 GB compressed per slide
• 1–10 slides used for pathologist computer-aided diagnosis
• 100–10K slides used in hospital quality control
• Groups of 100K+ slides used for clinical research studies, combined with molecular and outcome data
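The storage figures on this and the preceding slide can be checked with a few lines of arithmetic. The 2 GB per slide and 100K–1M slides/year come from the projection; 3 bytes per pixel for an uncompressed image is an assumption (8-bit RGB).

```python
GB, TB, PB = 10**9, 10**12, 10**15

PIXELS_PER_SLIDE = 10**5 * 10**5             # from the requirements slide
RAW_BYTES_PER_SLIDE = PIXELS_PER_SLIDE * 3   # assumption: uncompressed 8-bit RGB
COMPRESSED_BYTES_PER_SLIDE = 2 * GB          # from the 2025 projection


def annual_storage_bytes(slides_per_year: int,
                         bytes_per_slide: int = COMPRESSED_BYTES_PER_SLIDE) -> int:
    """Compressed slide storage a hospital accumulates in one year."""
    return slides_per_year * bytes_per_slide


for n in (100_000, 1_000_000):
    print(f"{n:>9,} slides/year -> {annual_storage_bytes(n) / TB:,.0f} TB")
print(f"one uncompressed slide: {RAW_BYTES_PER_SLIDE / GB:.0f} GB")
```

This gives 200 TB to 2 PB of compressed slides per hospital per year, and about 30 GB for a single uncompressed 10^5 × 10^5 RGB image.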
HPC: Tools for Image Analysis, Feature Extraction, Machine Learning Pipelines
Large Scale Data Management
• Data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.
• Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships
• Highly optimized spatial query and analyses
• Implemented in a variety of ways, including optimized CPU/GPU, Hadoop/HDFS and IBM DB2
PAIS Database
• Implemented with IBM DB2 for large-scale pathology image metadata (~1 million markups per slide)
• Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.
• Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships
• Support for high-level statistical data analysis
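The bullets above describe the PAIS data model only at a high level. The sketch below is a hypothetical, much-simplified schema in the same spirit (markups, annotations, algorithm provenance), using SQLite from the Python standard library in place of IBM DB2, and axis-aligned bounding boxes in place of true polygon geometry. It runs a window query: markups lying entirely inside a rectangle. Table, column and function names are illustrative.

```python
import sqlite3

SCHEMA = """
CREATE TABLE algorithm_run (            -- provenance: which algorithm/version made a markup
    run_id    INTEGER PRIMARY KEY,
    algorithm TEXT NOT NULL,
    version   TEXT NOT NULL
);
CREATE TABLE markup (                   -- one segmented object, e.g. a nucleus
    markup_id INTEGER PRIMARY KEY,
    slide_id  TEXT NOT NULL,
    run_id    INTEGER NOT NULL REFERENCES algorithm_run(run_id),
    min_x REAL, min_y REAL, max_x REAL, max_y REAL   -- bounding box in slide pixels
);
CREATE TABLE annotation (               -- named feature values attached to a markup
    markup_id INTEGER NOT NULL REFERENCES markup(markup_id),
    name      TEXT NOT NULL,
    value     REAL
);
CREATE INDEX markup_bbox ON markup(slide_id, min_x, min_y);
"""


def window_query(con, slide_id, x0, y0, x1, y1):
    """Return ids of markups whose bounding boxes lie entirely inside the rectangle."""
    rows = con.execute(
        "SELECT markup_id FROM markup WHERE slide_id = ? "
        "AND min_x >= ? AND min_y >= ? AND max_x <= ? AND max_y <= ? "
        "ORDER BY markup_id",
        (slide_id, x0, y0, x1, y1),
    )
    return [r[0] for r in rows]


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
con.execute("INSERT INTO algorithm_run VALUES (1, 'nuclear-seg', 'v1')")
con.executemany(
    "INSERT INTO markup VALUES (?, 'slide-A', 1, ?, ?, ?, ?)",
    [(1, 10, 10, 20, 20), (2, 50, 50, 60, 60), (3, 95, 95, 120, 120)],
)
con.execute("INSERT INTO annotation VALUES (1, 'area', 78.5)")
print(window_query(con, "slide-A", 0, 0, 100, 100))  # [1, 2]
```

Markup 3 extends past the window edge and is excluded; a real spatial database would evaluate the same predicate on the actual polygon boundaries.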
Spatial Centric – Pathology Imaging “GIS”
Point query: human-marked point inside a nucleus
Window query: return markups contained in a rectangle
Spatial join query: algorithm validation/comparison
Containment query: nuclear feature aggregation in tumor regions
Fusheng Wang
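The point and containment queries above reduce to point-in-polygon tests. A pure-Python sketch using the ray-casting rule; the function names and the (x, y, feature) tuple layout for nuclei are hypothetical, and a production system would add a spatial index and an optimized geometry engine instead of scanning every object.

```python
def point_in_polygon(x, y, poly):
    """Ray casting: count crossings of a horizontal ray from (x, y) with polygon edges."""
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the ray's height
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def mean_feature_in_region(nuclei, region):
    """Containment query: average a nuclear feature over nuclei whose centroid lies
    in the region. nuclei: list of (centroid_x, centroid_y, feature_value)."""
    values = [v for x, y, v in nuclei if point_in_polygon(x, y, region)]
    return sum(values) / len(values) if values else None


tumor_region = [(0, 0), (10, 0), (10, 10), (0, 10)]
nuclei = [(2, 2, 1.0), (8, 8, 3.0), (20, 20, 100.0)]
print(mean_feature_in_region(nuclei, tumor_region))  # 2.0
```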
MICCAI 2014 BRAIN TUMOR
Classification and Segmentation Challenges
TCGA
TCIA
IMAGING CHALLENGE
DIGITAL PATHOLOGY CHALLENGE
Phase 1: Training June 20 – July 31
Phase 2: Leader Board Aug 1 – Aug 29
Phase 3: Test Sept 8 – Sept 12
For more information about these challenges and a related workshop on September 14, 2014 at MICCAI in Boston, see: cancerimagingarchive.net
MICCAI: Medical Image Computing and Computer Assisted Intervention – MICCAI2014.org
TCGA: The Cancer Genome Atlas – cancergenome.nih.gov
TCIA: The Cancer Imaging Archive – cancerimagingarchive.net
Digital Pathology/Brain Tumor Image Segmentation (BRATS)
• Use data currently available through data archive resources of the National Institutes of Health (NIH), namely The Cancer Genome Atlas (TCGA) and The Cancer Imaging Archive (TCIA)
• The Digital Pathology challenge will use digital slides from patients whose genomic data are available from TCGA. Similarly, the BRATS 2014 challenge will use clinical MRI data, also from TCGA study subjects.
• Coordinated Pathology/Radiology 2015 challenge – feature selection and statistical/machine learning algorithms to leverage radiology, pathology and “omic” features to predict outcome and response to treatment
Computational Pathology: Populations
Suffolk County PPS IT Architecture
• Suffolk County Providers
• Stony Brook Medicine
• EMRs or Clinical Information Systems
• Suffolk County PPS Master Patient Index (MPI)
• Suffolk County PPS Health Information Exchange (HIE)
• e-HNLI RHIO (HIE)
• Suffolk County PPS Patient Portal
• Suffolk County Big Data Platform
• Suffolk County PPS Population Management Tools
Functions: eForms, Patient Wellness, Alerts, Mobile Monitoring, Patient Education, Clinical Records, Collaboration
Analytics: Predictive Analytics, Event Engine, Machine Learning, NLP, Anomaly Detection, Rules
Data sources: Structured Data, Financial Data, Legacy Data, Unstructured Data, Wearables Data, Social Data, Device Data, HL7/CCD, Open Data
Clinical Data for Patient Care
Jim Murry CIO, Charles Boisey
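A master patient index links each person's records across the participating EMRs to one enterprise identifier. The sketch below uses a deliberately simple deterministic rule (exact match on normalized name and date of birth), which is an assumption for illustration; production MPIs generally use probabilistic matching over many more fields. The record fields and the EID format are hypothetical.

```python
from collections import defaultdict


def match_key(record: dict) -> tuple[str, str, str]:
    """Normalize name and date of birth into a deterministic matching key."""
    return (
        record["last_name"].strip().lower(),
        record["first_name"].strip().lower(),
        record["dob"],  # ISO date string, e.g. "1960-04-02"
    )


def build_mpi(records: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """Assign an enterprise id to each distinct key; map it to (source, local_id) pairs."""
    key_to_eid: dict[tuple[str, str, str], str] = {}
    mpi: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for rec in records:
        key = match_key(rec)
        if key not in key_to_eid:
            key_to_eid[key] = f"EID{len(key_to_eid) + 1:06d}"
        mpi[key_to_eid[key]].append((rec["source"], rec["local_id"]))
    return dict(mpi)


records = [
    {"source": "hospital", "local_id": "H1", "last_name": "Smith", "first_name": "Ann", "dob": "1960-04-02"},
    {"source": "clinic", "local_id": "C9", "last_name": " SMITH", "first_name": "ann ", "dob": "1960-04-02"},
    {"source": "clinic", "local_id": "C10", "last_name": "Smith", "first_name": "Ann", "dob": "1971-01-15"},
]
print(build_mpi(records))
```

The first two records differ only in case and whitespace and share one enterprise id; the third has a different date of birth and gets its own.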
Suffolk PPS Organizational Structure for exchange of clinical data and alerts for patient visits through e-HNLI
Stony Brook Medicine
Suffolk PPS HIE (SB Clinical Network IPA, LLC)
• Health Systems
• Hospitals
• Community Health Centers
• Behavioral Healthcare Providers
• Skilled Nursing Facilities
• CHHA’s/LTHHC
• Physician Groups
• Health Homes
• Community-Based Agencies
• Pharmacies
Those not part of the Stony Brook Medicine Network:
• Other Healthcare Providers
• Developmental Disability Providers
Suffolk County RHIO (e-HNLI)
Jim Murry CIO, Charles Boisey
The Internet of People and Things
• Distributed mHealth devices, sensors, point-of-care devices, EHRs, computers and databases
• Collections of interacting services
• Ubiquitous access to all clinical, laboratory, sensor, radiology, pathology and treatment data
• Iteratively scan patient information to evaluate interventions
• Aggregate and iteratively mine patient information to evaluate how to optimize treatment
• Predictive/interactive analytics that anticipate problems and launch preventive measures
• QC/QA on data and process
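For the predictive-analytics and QC/QA items above, one of the simplest building blocks is a rolling z-score check on a sensor stream. A sketch; the function name, window length and 3-SD threshold are hypothetical defaults, not clinically validated values.

```python
import statistics
from collections import deque


def rolling_zscore_alerts(readings, window=12, threshold=3.0):
    """Flag indices whose value deviates more than `threshold` standard deviations
    from the mean of the preceding `window` readings (window must be >= 2)."""
    history = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mean = statistics.fmean(history)
            sd = statistics.stdev(history)
            if sd > 0 and abs(value - mean) / sd > threshold:
                alerts.append(i)
        history.append(value)  # the flagged value still enters the baseline
    return alerts


heart_rate = [70, 72, 71, 69, 70, 73, 72, 70, 71, 72, 70, 71, 120, 71]
print(rolling_zscore_alerts(heart_rate))  # [12]
```

Because the spike enters the baseline, the next normal reading is not flagged; whether outliers should be excluded from the baseline is a design choice that depends on the sensor.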
Minimize Surprise
• Evaluate, track, quantify progression of known disease states
• Track, evaluate risk factors and carry out diagnostic screenings where risk factors are significant
• Active learning to formulate the correct questions to ask
• When an unanticipated catastrophic event occurs, or disease is first found in an advanced state, carry out a systematic retrospective population study
  – Identify what was different between “surprise” patients and unaffected cohorts
Our work at Emory: find hot spots in readmissions within 30 days
  – Integrative analysis, in which lab data play a crucial role, to characterize co-morbidities and clinical course
  – What fraction of patients with a given principal diagnosis will be readmitted within 30 days?
  – What fraction of patients with a given set of diseases will be readmitted within 30 days?
  – How do the severity and time course of co-morbidities affect readmissions?
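The first readmission question above can be answered directly from admission records. A sketch, assuming a hypothetical record layout (patient_id, admit_date, discharge_date, principal_dx) and counting every stay as an index stay; formal readmission measures add exclusion rules that are not modeled here.

```python
from collections import defaultdict
from datetime import date


def readmission_rates(admissions, days=30):
    """admissions: list of (patient_id, admit_date, discharge_date, principal_dx).

    Returns {principal_dx: fraction of stays followed by another admission
    within `days` days of discharge}.
    """
    by_patient = defaultdict(list)
    for adm in admissions:
        by_patient[adm[0]].append(adm)
    index_stays = defaultdict(int)
    readmitted = defaultdict(int)
    for stays in by_patient.values():
        stays.sort(key=lambda s: s[1])  # order each patient's stays by admit date
        for current, nxt in zip(stays, stays[1:] + [None]):
            dx = current[3]
            index_stays[dx] += 1
            if nxt is not None and (nxt[1] - current[2]).days <= days:
                readmitted[dx] += 1
    return {dx: readmitted[dx] / n for dx, n in index_stays.items()}


admissions = [
    ("p1", date(2014, 1, 1), date(2014, 1, 5), "CHF"),
    ("p1", date(2014, 1, 20), date(2014, 1, 25), "CHF"),   # 15 days after discharge
    ("p2", date(2014, 2, 1), date(2014, 2, 3), "CHF"),
    ("p3", date(2014, 3, 1), date(2014, 3, 4), "COPD"),
    ("p3", date(2014, 5, 1), date(2014, 5, 2), "COPD"),    # 58 days later
]
print(readmission_rates(admissions))
```

Here one of three CHF stays is followed by a readmission within 30 days, while the COPD readmission falls outside the window.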
EMR Data Analytics: Tools for Clinical Phenotyping and Population Health
Johns Hopkins Medical Institutions
Department of Pathology Johns Hopkins
(1999) Joel Saltz MD, PhD – Director Pathology Informatics
Jim Nichols, MD -- Assistant Professor JHU and head of POCT Program
Merwyn Taylor, PhD -- Instructor, Informatics Division, Dept of Pathology, JHU
Laboratory Without Walls
Johns Hopkins Medical Institutions
POCT Anywhere
● Provide patients with up-to-date clinical data, interpretations of clinical data and health-related educational materials
● Integrated archive of patient clinical information and education materials used by patients, families and health care providers
● Maintain a collection of medical information gathered at the patient’s home, in clinics and during hospitalizations
● Alert clinicians about abnormal values and non-compliance
● Interactive monitoring of POC devices
Where Does Pathology Fit In?
• Capture and analysis of laboratory data is Pathology
• Sensor data can be thought of as generalized lab data
• Clinical Pathology: data quality, process control, statistical analyses, analytic vs. biological variation
• Predictions improved by including novel tests – reduction of “omics” to routine clinical testing
• Pharmacogenomics is just the beginning …
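The process-control item above refers to statistical quality control of laboratory results. Westgard rules are one widely used scheme; the slide does not name them, so this is an illustrative choice. A sketch of two of the rules, assuming the control material's target mean and SD are known; the function name is hypothetical.

```python
def westgard_violations(values, mean, sd):
    """Check QC control results against two Westgard rules.

    1_3s: one result beyond mean +/- 3 SD.
    2_2s: two consecutive results beyond the same limit (mean + 2 SD or mean - 2 SD).
    Returns a list of (index, rule) pairs.
    """
    z = [(v - mean) / sd for v in values]
    violations = []
    for i, zi in enumerate(z):
        if abs(zi) > 3:
            violations.append((i, "1_3s"))
        if i > 0 and ((zi > 2 and z[i - 1] > 2) or (zi < -2 and z[i - 1] < -2)):
            violations.append((i, "2_2s"))
    return violations


controls = [100, 101, 99, 105, 105, 100, 107]
print(westgard_violations(controls, mean=100, sd=2))  # [(4, '2_2s'), (6, '1_3s')]
```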
Where Does Computational Pathology Fit In?
• Machine learning and predictive analytics algorithms applied to population health
• Context-sensitive modeling of how integrated data from multiple sources influences probability distributions associated with different health conditions
• Applied population “omics”
• Integration and analysis of data from patient sensors
• Integration and analysis of spatial data sources
Thanks!