PowerPoint 簡報 · 2019. 11. 14. · segmentation of optic cup and disc in the retinal fundus image is useful in glaucoma screening segmentation of polyp in colonoscopy image is

1

長庚紀念醫院

醫療人工智能核心實驗室

翁唯城

2019/11/07

https://github.com/Bala93/Multi-task-deep-network/

https://github.com/Bala93/Multi-task-deep-network/

segmentation of optic cup and disc in the retinal fundus image is useful in glaucoma screening

segmentation of polyp in colonoscopy image is helpful in cancer diagnosis

segmentation of the organ, bones benefit surgery planning

segmentation of lung nodules in chest Computed Tomography aids physicians to differentiate malignant lesions from benign lesions

2

Applications of Image Segmentation

Comparison of the Four Models

3

U-Net DCAN Multi Psi-Net

Shape Information Χ √ √ √

Class Imbalance Χ Χ √ √

Smooth Boundary Χ Χ √ √

Multiple Object Instances √ √ Χ √

• U-Net: Convolutional Networks for Biomedical Image Segmentationhttps://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/

• DCAN: Deep Contour-Aware Networks for Accurate Gland Segmentationhttps://ieeexplore.ieee.org/document/7780642https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/580828/

• Deep multi-task and task-specific feature learning network for robust shape preserved organ segmentationhttps://ieeexplore.ieee.org/document/8363791

https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/

https://ieeexplore.ieee.org/document/7780642

https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/580828/

https://ieeexplore.ieee.org/document/8363791

U-Net

4

U-Net is an encoder-decoder type of network which takes an image as input and outputs a pixel-wise classification probability score with cross-entropy as its loss function. This network has set new state of the art results for different medical image segmentation tasks.

Drawbacks:• The encoder block of the network undersamples the input through max-pooling layers which results in loss of spatial

information.• U-Net having pixel-wise classification alone as a loss function produces uneven mask boundaries and outliers.• The loss function doesn’t take shape information into account which can help in performance improvement.• Using cross-entropy as a loss function introduces class imbalance problem for images in which background dominates

the object of interest which is very common in the medical images.

5

DCAN

Multi

DCAN & Multi

The decoders are used for mask and contour prediction in DCAN whereas in Multi it is used for estimating mask and distance map.

• Contour and distance map estimation act as regularizers to mask prediction. Shape information is imposed through contour and distance map.

• Class imbalance problem is mitigated in Multi through its joint classification and regression approach while it will still be an issue in DCAN because of both decoders acting as classifiers.

• The boundaries obtained using Multi are smooth and the segmentation has reduced outliers compared to U-Net and DCAN.

• But in multi-instance object segmentation cases, an object of smaller size can be treated as an outlier resulting in unsatisfactory segmentation.

contour

distance map

➢ A novel multi-task network Psi-Net has a single encoder and three decoders (architecture with shape ψ). The decoders are used to learn three different tasks in parallel. The mask prediction is the primary task while the contour detection and distance map estimation are auxiliary tasks. These additional tasks are used to regularize the mask prediction path to produce a refined mask with smooth boundaries.

➢ A novel joint loss function is proposed to handle the three different tasks together. The joint loss function consists of a combination of Negative Log Likelihood (NLL) for mask, Negative Log Likelihood (NLL) for contour, and Mean Square Error (MSE) for distance.

6

Psi-Net Introduction

Psi-Net Architecture

7

➢ The architecture Psi-Net is a UNet-like encoder-decoder network.

➢ The former two are pixel-wise classification tasks while the latter is a regression task.J/ψ

Loss Function

8

Mask

Contour

Distance

Lmask denotes the pixel-wise classification error. x is the pixel position in image space Ω. pmask(x; lmask) denotes the predicted probability for true label lmask after softmax activation function.

Lcontour denotes the pixel-wise classification error. pcontour(x; lcontour) denotes the predicted probability for true label lcontour after softmax activation function.

Ldistance denotes the pixel-wise mean square error. Dhat(x) is the estimated distance map after sigmoid activation function while D(x) is the ground-truth distance map.

Dataset, Preprocessing, Implementation

9

Dataset

• Optic cup and disc segmentation : ORIGA dataset consists of 650 color fundus image with ground truth segmentations for optic disc and cup. The color fundus images are of dimension 256 × 256. Ellipse fit is applied to output segmentation mask.https://www.ncbi.nlm.nih.gov/pubmed/21095735https://www.medicmind.tech/resources-2/

• Polyp segmentation : Dataset from MICCAI 2018 Gastrointestinal Image ANalysis (GIANA). The dataset consists of 912 images with ground truth masks. The dataset is split into 70% for training and 30% for testing. The images are center cropped and resized to 256 × 256.https://www.ncbi.nlm.nih.gov/pubmed/29065595(EndoVis) https://endovis.grand-challenge.org/

Preprocessing

The dataset contains only segmentation mask. The team made ground truth contour and distance maps.• The contour map is obtained by estimating the boundaries of connected components. These boundaries are

subsequently dilated with a disk filter of radius 5.• The distance map is obtained by applying an euclidean distance transform to the mask. The final distance

map will contain zeros in the mask region, with the rest of the pixels denoting the shortest distance between that pixel and the mask boundary.

Implementation

All the models are implemented using PyTorch. Models are trained for 150 epochs using Adam optimizer, with a learning rate of 1e-4 and batch size 4. Experiments have been conducted with NVIDIA GeForce GTX 1060 GPU -6GB RAM.

https://www.ncbi.nlm.nih.gov/pubmed/21095735

https://www.medicmind.tech/resources-2/

https://www.ncbi.nlm.nih.gov/pubmed/29065595

https://endovis.grand-challenge.org/

補充介紹

10

Cosine Similarity

Euclidean Distance

二維空間的歐幾里得距離

Evaluation Metrics

11

Segmentation Evaluation

Shape SimilarityIn mathematics, the Hausdorff distance measures how far two subsets of a metric space are from each other.

Jaccard Index

Dice Index

Hausdorff Distance

A: output of the used methodB: actual ground truth

Comparison of Segmentation and Shape Metrics

12

Psi-Net

Multi

DCAN

U-Net

Percent of Misclassified Pixels within Trimaps of Different Widths

13

• 一張影像的「Trimap」，是指將影像中的每個像素劃分為三種區域：前景（英語：Foreground）、背景（英語：Background）和待確認（英語：Unknown）。演算法會將標定好的前景和背景當成已知，再藉由顏色等資訊將待確認區域中的像素標為前景或背景。https://zh.wikipedia.org/zh-tw/%E5%BD%B1%E5%83%8F%E5%8E%BB%E8%83%8C

• https://medium.com/@literallywords/reproducing-deep-image-matting-4c01f956ba0f

https://zh.wikipedia.org/zh-tw/%E5%BD%B1%E5%83%8F%E5%8E%BB%E8%83%8C

https://medium.com/@literallywords/reproducing-deep-image-matting-4c01f956ba0f

Qualitative Comparison

14

Ground Truth Mask

U-Net

DCAN

Multi

Psi-Net

Documents

PowerPoint 簡報 · 2019. 11. 14. · segmentation of optic cup and disc in the retinal fundus image is useful in glaucoma screening segmentation of polyp in colonoscopy image is