Toward the Future of AI-Driven Medicine

on-demand.gputechconf.com/gtc-taiwan/2018/pdf/3-3...

Dr. 葉肇元

CEO, aetherAI (雲象科技)

Bring state-of-the-art technology to healthcare.

Our Core

Our Mission

Our Goal

We’re a Medical Image AI company.

Empower medical imaging with A.I.

A survey of deep learning in medical image analysis

Mammographic Mass Classification

Diabetic Retinopathy Detection

Breast Cancer Metastasis Detection

Brain Lesion Segmentation

Airway Segmentation of Chest CT Image

Lung Nodule Detection

Bone Suppression in X-Ray Image

Skin Disease Classification

Prostate Segmentation

Organs of interest for Medical Image AI

• Brain
  • Brain tumor segmentation
  • Disease classification
  • Survival prediction
• Eyes (retina)
  • Diabetic retinopathy, cardiac risk factors
• Lungs
  • Lung nodule detection
• Breast
  • Breast cancer screening
• Heart
  • Cardiac image analysis
• Intestine
  • Polyp classification
• Prostate
  • Prostate segmentation
• Bones
  • Age determination
• Skin
  • Disease classification
• Blood vessels
  • Blood vessel segmentation
• Blood
  • Blood cell counting and classification

• Authored by Google, Verily Life Sciences, and Stanford School of Medicine

• Inception-v3 model trained on data from 236,234 patients from EyePACS and 48,101 patients from UK Biobank; validated on data from 12,026 patients from UK Biobank and 999 patients from EyePACS.

• Used retinal fundus images to predict:
  • Age, gender, smoking status, BMI, systolic blood pressure, diastolic blood pressure

Poplin R, et al. Nature Biomedical Engineering, volume 2, pages 158–164 (2018)

MAE: Mean Absolute Error. For continuous risk factors (like age), the baseline value is the mean absolute error of predicting the mean value for all patients.
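The baseline described above can be made concrete with a minimal sketch. The ages and predictions below are hypothetical illustration values, not data from the study, and the function names are made up for this example.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def baseline_mae(y_true):
    """MAE of a trivial model that predicts the mean value for every patient."""
    avg = mean(y_true)
    return mean(abs(t - avg) for t in y_true)


# Hypothetical ages in years (illustration only).
ages = [40, 50, 60, 70]
predicted = [43, 48, 61, 66]

model_mae = mean_absolute_error(ages, predicted)
baseline = baseline_mae(ages)
print(f"model MAE = {model_mae}, baseline MAE = {baseline}")
```

A model is only informative for a continuous risk factor if its MAE is clearly below this baseline.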

The cost of making medical image AI that is not often talked about:

Time

Expected Timeline for a Medical Image AI Project

[Figure: project timeline. Horizontal axis: Time (Month), ticks at 2, 4, 6, 8. Stages in order: Identify Topic; Collect & Process Data; Train & Validate Model; Collect More Data; Train & Validate Model; Deploy.]

Required Skill Category:

• Interdisciplinary Knowledge

• Hospital Information System

• AI Software and Hardware

• Healthcare Workflow

In reality...

[Figure: project timeline. Horizontal axis: Time (Month), ticks at 2, 4, 6, 8. Stages in order: Identify Topic; Collect, Process and Label Data; Train & Validate Model.]

Houston, we’ve got a problem.

• So it takes ten months to make one AI model happen (if you’re lucky).

• But there are thousands of clinical tasks that could potentially benefit from the help of A.I.!

• (How on earth can we replace doctors with A.I.?)

How Do We Get There?

[Figure: target project timeline. Horizontal axis: Time (Month). Stages in order: Identify Topic; Collect Data; Train and Validate Model; Deploy.]

What’s holding us back? Infrastructure.

• Hospital Information System

• AI Software and Hardware

• Interdisciplinary Knowledge

• Healthcare Workflow

Interdisciplinary Knowledge

Essential Ingredient of a Successful Medical Image AI Project

• Interdisciplinary knowledge

• Intricacies of medical diagnostic procedures

• Capabilities of different neural network models

• How medical data can be digested by neural networks and turned into insight

Our first attempt at Digital Pathology AI

• Lymphoma screening using whole slide image

Digging into data : examining raw input

Dark Zone

Light Zone

Follicular Lymphoma

Mantle Zone

Tingible Body Macrophage

Web interface for deep learning inferencing

Training statistics

Lymphoma Screening Model Used on Whole Slide Image

Improved Tools for Whole Slide Image Labeling

Dataset Statistics

• Labeled training slides: 56 cancer, 56 benign

• Total number of extracted patches:

  Benign       Cancer     Background
  4,460,452    147,533    87,974

• Validation: ~40,000 patches

• Testing: ~40,000 patches

Neural Network Architecture

• Modified ResNet-50:

• Dense layer after Global Average Pooling for tissue / background binary classification

• Separate path with additional dense layers for cell type (cancer / benign) classification

Neural Network Training

• Heavy data augmentation

• Flipping (Up-down, left-right), Add, Multiply, Add to Hue and Saturation, Contrast Normalization, Gaussian Blur, Gaussian Noise

• Class balancing : random sampling of equal number from each class

• Optimizer : Adam Optimizer

• Early Stopping
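The class balancing and flipping steps listed above can be sketched in a few lines. This is a minimal illustration, not the authors' training code: patches are toy nested lists, sampling is assumed to be with replacement, and all function names are hypothetical.

```python
import random


def flip_left_right(patch):
    """Mirror each row of a 2-D patch."""
    return [row[::-1] for row in patch]


def flip_up_down(patch):
    """Reverse the row order of a 2-D patch."""
    return patch[::-1]


def random_flip(patch, rng):
    """Apply each flip independently with probability 0.5."""
    if rng.random() < 0.5:
        patch = flip_left_right(patch)
    if rng.random() < 0.5:
        patch = flip_up_down(patch)
    return patch


def balanced_batch(patches_by_class, per_class, rng):
    """Draw the same number of patches from every class (with replacement)."""
    batch = []
    for label, patches in patches_by_class.items():
        for patch in rng.choices(patches, k=per_class):
            batch.append((random_flip(patch, rng), label))
    rng.shuffle(batch)
    return batch


# Toy, heavily imbalanced data: 100 "benign" patches versus 5 "cancer" patches.
toy = {
    "benign": [[[i, i + 1], [i + 2, i + 3]] for i in range(100)],
    "cancer": [[[i, i + 1], [i + 2, i + 3]] for i in range(5)],
}
rng = random.Random(0)
batch = balanced_batch(toy, per_class=4, rng=rng)
```

Each batch then contains equal numbers of every class regardless of how imbalanced the patch pool is, at the cost of revisiting minority-class patches more often.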

Training Result

Foreground / background classification

Benign / Cancer classification

Loss

Accuracy

Statistics Of Validation Result

Testing Result

Recall = Sensitivity

Precision = Positive Predictive Value
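For reference, these metrics follow directly from the counts in a binary confusion matrix. The sketch below uses hypothetical counts, not the results reported in this talk.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision (PPV), and recall (sensitivity) from confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),  # of predicted positives, how many are right
        "recall": tp / (tp + fn),     # of actual positives, how many are found
    }


# Hypothetical counts (illustration only).
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=70)
print(metrics)
```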

Prediction on Separate Test Slide

[Figure panels: Prediction by Neural Network; Ground Truth]

Yellow: Cancer, Blue: Benign, Red: Predicted cancer region

Accuracy: 90.4%, Precision: 93.4%, Recall: 93.0%

Class Activation Map

AI Software and Hardware

• One digital slide is larger than the entire CIFAR-10 dataset

• Digital slide : 80000*60000

• CIFAR-10 : 32*32*60000
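The comparison can be checked with simple arithmetic, using the dimensions exactly as given on the slide (pixel counts only, ignoring color channels).

```python
# Dimensions as stated above.
slide_pixels = 80_000 * 60_000        # one digital slide
cifar10_pixels = 32 * 32 * 60_000     # all 60,000 CIFAR-10 images

ratio = slide_pixels / cifar10_pixels
print(f"slide: {slide_pixels:,} px, CIFAR-10: {cifar10_pixels:,} px, ratio: {ratio}")
```

By this count a single slide holds roughly 78 times as many pixels as the whole CIFAR-10 dataset.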

Medical Images AI Needs a Lot of Memory

• Medical images have very high spatial resolution:

• Radiography image : 5000*4000 uint16

• CT image : 512*512*300 uint16

• Digital Slide : 60000*60000*3 uint8

• Average ImageNet image : 469*387*3 uint8
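To put these resolutions in terms of memory, the following sketch computes the raw, uncompressed size of each image type from the shapes and data types listed above. The helper name is hypothetical.

```python
BYTES_PER_ELEMENT = {"uint8": 1, "uint16": 2}


def raw_size_bytes(shape, dtype):
    """Uncompressed size of an array with the given shape and element type."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_ELEMENT[dtype]


images = {
    "Radiography image": ((5000, 4000), "uint16"),
    "CT image": ((512, 512, 300), "uint16"),
    "Digital slide": ((60000, 60000, 3), "uint8"),
    "Average ImageNet image": ((469, 387, 3), "uint8"),
}

sizes = {name: raw_size_bytes(shape, dtype) for name, (shape, dtype) in images.items()}
for name, size in sizes.items():
    print(f"{name}: {size / 1e6:,.1f} MB")
```

These are input sizes only; the activations a network stores during training are typically much larger than the input itself.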

GPU memory alone is not sufficient for Medical Image AI

• For VGG-16, during training

• A GTX-1080Ti can take an image up to 1200*1200

• A Tesla P40 can take an image up to 1700*1700

• A Tesla V100 can take an image up to 2100*2100

• CUDA unified memory
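Why the maximum image side grows so slowly with GPU memory can be seen from a rough estimate: activation memory scales with the number of pixels, that is, with the square of the side length. The sketch below counts only the outputs of the 13 convolution layers in the standard VGG-16 configuration, assuming float32 values and "same" padding. It is a lower bound, not a reproduction of the limits above, because real training also stores gradients, pooling and fully connected outputs, weights, optimizer state, and library workspace.

```python
# Standard VGG-16 convolutional configuration:
# numbers are output channels of 3x3 conv layers, "M" is 2x2 max pooling.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]


def conv_activation_count(height, width):
    """Number of values produced by all conv layers for one input image."""
    total = 0
    for layer in VGG16_CONFIG:
        if layer == "M":
            height, width = height // 2, width // 2  # pooling halves each side
        else:
            total += layer * height * width          # padded conv keeps H x W
    return total


def conv_activation_gib(height, width, bytes_per_value=4):
    """Size in GiB of the conv outputs for one image (float32 by default)."""
    return conv_activation_count(height, width) * bytes_per_value / 2**30


for side in (1200, 1700, 2100):
    print(f"{side} x {side}: {conv_activation_gib(side, side):.2f} GiB (conv outputs only)")
```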

CUDA Unified Memory in Tensorflow

Specialized Hardware for AI Compute

A BREAKTHROUGH IN TRAINING AND INFERENCE

Each of Tesla V100's 640 Tensor Cores operates on a 4x4 matrix, and their associated data paths are custom-designed to dramatically increase floating-point compute throughput with high energy efficiency.

This key capability enables Volta to deliver 3X performance speedups in training and inference over the previous generation.
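The 4x4 operation mentioned above is a matrix multiply-accumulate, D = A x B + C. The following plain-Python sketch shows only the arithmetic of one such operation; it says nothing about how the hardware schedules it or which precisions it uses, and the function name is made up for this example.

```python
def matrix_multiply_accumulate(a, b, c):
    """Return D = A x B + C for 4x4 matrices given as nested lists."""
    n = 4
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j] for j in range(n)]
        for i in range(n)
    ]


identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
b = [[4 * i + j for j in range(4)] for i in range(4)]
ones = [[1] * 4 for _ in range(4)]

d = matrix_multiply_accumulate(identity, b, ones)  # equals b with 1 added to every entry
```

One such operation involves 4 x 4 x 4 = 64 multiply-add pairs, which is why many small units working in parallel add up to high throughput.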

The Power of Tensor Cores

[Figure: two bar charts of batches per second (vertical axis, 0 to 16) for GTX 1080 Ti, Titan Xp, and Titan V. One chart is for Float16 and one for Float32, both at batch size 512.]

Development environment:

GTX 1080 Ti: TensorFlow 1.4, CUDA 8, cuDNN 5, nvidia-381 driver

Titan Xp: TensorFlow 1.4, CUDA 9, cuDNN 7, nvidia-387 driver

Titan V: TensorFlow 1.4, CUDA 9, cuDNN 7, nvidia-387 driver

Neural network: convolution * 6 + fully connected * 2, trained on CIFAR-10 * 2

GPU Is Often Thirsty: The Importance of Pipelining

[Figure: bar chart of training time per epoch (vertical axis, 0 to 16) versus number of CPUs (1, 2, 4, 8, 16), comparing Without Queue and With Queue. Data labels shown: 9.7, 15.2, 8.1, 4.2, 2.33, 2.28.]
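The idea behind the queue is a producer-consumer pipeline: CPU workers prepare upcoming batches while the GPU is busy with the current one, so the GPU rarely waits for data. Below is a minimal sketch using Python's standard `queue` and `threading` modules. The loading and training functions are stand-ins, and threads are used only for illustration; a real input pipeline for CPU-heavy preprocessing would more likely use processes or the framework's own input queue.

```python
import queue
import threading


def load_and_preprocess(index):
    """Stand-in for CPU-side work such as decoding and augmenting a batch."""
    return [index * 10 + i for i in range(4)]


def train_step(batch):
    """Stand-in for the GPU-side training step."""
    return sum(batch)


def producer(batch_indices, batch_queue):
    """Prepare batches and block when the queue is full."""
    for index in batch_indices:
        batch_queue.put(load_and_preprocess(index))


def run_pipeline(num_batches, num_workers=2, queue_size=8):
    """Consume batches from a bounded queue filled by background workers."""
    batch_queue = queue.Queue(maxsize=queue_size)
    workers = [
        threading.Thread(
            target=producer,
            args=(range(w, num_batches, num_workers), batch_queue),
            daemon=True,
        )
        for w in range(num_workers)
    ]
    for worker in workers:
        worker.start()
    # The consumer takes exactly as many batches as the workers produce.
    outputs = [train_step(batch_queue.get()) for _ in range(num_batches)]
    for worker in workers:
        worker.join()
    return outputs


results = run_pipeline(num_batches=10)
```

Because several workers fill the queue concurrently, batches may arrive in a different order on each run.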

Healthcare Information System

Problems with Existing Hospital Information System

• Databases are not tightly connected

• Limited search functions

• The majority of data exists in unstructured format (.txt, .pdf, etc)

Unified Web Interface for Medical Image AI

• Web-based system that integrates:

• Clinical data

• Digital slides

• DICOM images / videos

• Deep learning annotation, training and inferencing

Annotation Interface with Structured Reporting

Annotation and Image Markup (AIM)

• An NCI-initiated project that provides a solution to the following imaging challenges:

• No agreed-upon syntax for annotation and markup

• No agreed-upon semantics to describe annotations

• No standard format (for example, DICOM, XML, HL7) for annotations and markup

• The link between the semantics and image annotation will help make medical image AI more useful and more interpretable.

https://wiki.nci.nih.gov/display/AIM/Annotation+and+Image+Markup+-+AIM

AIM Example

Medical Record De-Identification

• Due to privacy concerns, AI research requires that personally identifiable information be removed from medical records.

• It’s hard to achieve satisfactory results using regular expressions or other rule-based methods.

• Using tools like NeuroNER (named-entity recognition), we’ve successfully achieved an F1 score of >97% on a public dataset.
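A small example shows why rule-based methods fall short. The sketch below is a hypothetical regular-expression scrubber, not the NeuroNER approach: the patterns, labels, and sample note are all invented for illustration. It catches identifiers with a fixed format but misses free-text items such as names.

```python
import re

# Illustrative patterns only; real records need far more rules than this.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{4}[-/]\d{1,2}[-/]\d{1,2}\b")),
    ("PHONE", re.compile(r"\b\d{2,4}-\d{3,4}-\d{3,4}\b")),
    ("ID", re.compile(r"\b[A-Z][12]\d{8}\b")),
]


def deidentify(text):
    """Replace every match of each pattern with a bracketed placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


# Invented sample note (not real patient data).
note = "Seen on 2018-05-30 by Dr. Wang. ID A123456789, phone 02-2345-6789."
scrubbed = deidentify(note)
print(scrubbed)
```

The date, ID, and phone number are masked, but the name "Wang" passes through untouched, which is the kind of case a learned named-entity recognizer is meant to handle.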

Next Generation IT Infrastructure for AI-Powered Hospital

Clinical Terminal
• Structured Report for Both Clinical and AI Use

Hybrid Storage
• Fast: Cache for AI Training
• Slow: Data Archive

AI Training Server
• High Compute Capacity
• Job Queues for Non-Stop Learning

Main Server
• High Availability
• Advanced Database System
• Job Flow Control

AI Inferencing Server
• Virtualization for On-Demand AI Inferencing
• Optimized for Inferencing Speed

[Diagram: arrows between the components are labeled Clinical Data and AI Model.]

AI-Powered Diagnostic Aid

Acknowledgement

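• Associate Professor 莊文郁, Department of Pathology, Chang Gung Memorial Hospital

• Director 張尚宏, Center for Big Data Analytics and Statistics, Chang Gung Memorial Hospital

• Professor 王宗道, Cardiology, National Taiwan University Hospital

• Dr. 李文正, Department of Medical Imaging, National Taiwan University Hospital

• 張哲惟, aetherAI

• 游為翔, aetherAI

• 楊証琨, aetherAI

• 蔡岳霖, aetherAI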
