Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
1
長庚紀念醫院
醫療人工智能核心實驗室
翁唯城
2019/11/07
https://github.com/Bala93/Multi-task-deep-network/
segmentation of optic cup and disc in the retinal fundus image is useful in glaucoma screening
segmentation of polyp in colonoscopy image is helpful in cancer diagnosis
segmentation of the organ, bones benefit surgery planning
segmentation of lung nodules in chest Computed Tomography aids physicians to differentiate malignant lesions from benign lesions
2
Applications of Image Segmentation
Comparison of the Four Models
3
U-Net DCAN Multi Psi-Net
Shape Information Χ √ √ √
Class Imbalance Χ Χ √ √
Smooth Boundary Χ Χ √ √
Multiple Object Instances √ √ Χ √
• U-Net: Convolutional Networks for Biomedical Image Segmentationhttps://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
• DCAN: Deep Contour-Aware Networks for Accurate Gland Segmentationhttps://ieeexplore.ieee.org/document/7780642https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/580828/
• Deep multi-task and task-specific feature learning network for robust shape preserved organ segmentationhttps://ieeexplore.ieee.org/document/8363791
U-Net
4
U-Net is an encoder-decoder type of network which takes an image as input and outputs a pixel-wise classification probability score with cross-entropy as its loss function. This network has set new state of the art results for different medical image segmentation tasks.
Drawbacks:• The encoder block of the network undersamples the input through max-pooling layers which results in loss of spatial
information.• U-Net having pixel-wise classification alone as a loss function produces uneven mask boundaries and outliers.• The loss function doesn’t take shape information into account which can help in performance improvement.• Using cross-entropy as a loss function introduces class imbalance problem for images in which background dominates
the object of interest which is very common in the medical images.
5
DCAN
Multi
DCAN & Multi
The decoders are used for mask and contour prediction in DCAN whereas in Multi it is used for estimating mask and distance map.
• Contour and distance map estimation act as regularizers to mask prediction. Shape information is imposed through contour and distance map.
• Class imbalance problem is mitigated in Multi through its joint classification and regression approach while it will still be an issue in DCAN because of both decoders acting as classifiers.
• The boundaries obtained using Multi are smooth and the segmentation has reduced outliers compared to U-Net and DCAN.
• But in multi-instance object segmentation cases, an object of smaller size can be treated as an outlier resulting in unsatisfactory segmentation.
contour
distance map
➢ A novel multi-task network Psi-Net has a single encoder and three decoders (architecture with shape ψ). The decoders are used to learn three different tasks in parallel. The mask prediction is the primary task while the contour detection and distance map estimation are auxiliary tasks. These additional tasks are used to regularize the mask prediction path to produce a refined mask with smooth boundaries.
➢ A novel joint loss function is proposed to handle the three different tasks together. The joint loss function consists of a combination of Negative Log Likelihood (NLL) for mask, Negative Log Likelihood (NLL) for contour, and Mean Square Error (MSE) for distance.
6
Psi-Net Introduction
Psi-Net Architecture
7
➢ The architecture Psi-Net is a UNet-like encoder-decoder network.
➢ The former two are pixel-wise classification tasks while the latter is a regression task.J/ψ
Loss Function
8
Mask
Contour
Distance
Lmask denotes the pixel-wise classification error. x is the pixel position in image space Ω. pmask(x; lmask) denotes the predicted probability for true label lmask after softmax activation function.
Lcontour denotes the pixel-wise classification error. pcontour(x; lcontour) denotes the predicted probability for true label lcontour after softmax activation function.
Ldistance denotes the pixel-wise mean square error. Dhat(x) is the estimated distance map after sigmoid activation function while D(x) is the ground-truth distance map.
Dataset, Preprocessing, Implementation
9
Dataset
• Optic cup and disc segmentation : ORIGA dataset consists of 650 color fundus image with ground truth segmentations for optic disc and cup. The color fundus images are of dimension 256 × 256. Ellipse fit is applied to output segmentation mask.https://www.ncbi.nlm.nih.gov/pubmed/21095735https://www.medicmind.tech/resources-2/
• Polyp segmentation : Dataset from MICCAI 2018 Gastrointestinal Image ANalysis (GIANA). The dataset consists of 912 images with ground truth masks. The dataset is split into 70% for training and 30% for testing. The images are center cropped and resized to 256 × 256.https://www.ncbi.nlm.nih.gov/pubmed/29065595(EndoVis) https://endovis.grand-challenge.org/
Preprocessing
The dataset contains only segmentation mask. The team made ground truth contour and distance maps.• The contour map is obtained by estimating the boundaries of connected components. These boundaries are
subsequently dilated with a disk filter of radius 5.• The distance map is obtained by applying an euclidean distance transform to the mask. The final distance
map will contain zeros in the mask region, with the rest of the pixels denoting the shortest distance between that pixel and the mask boundary.
Implementation
All the models are implemented using PyTorch. Models are trained for 150 epochs using Adam optimizer, with a learning rate of 1e-4 and batch size 4. Experiments have been conducted with NVIDIA GeForce GTX 1060 GPU -6GB RAM.
補充介紹
10
Cosine Similarity
Euclidean Distance
二維空間的歐幾里得距離
Evaluation Metrics
11
Segmentation Evaluation
Shape SimilarityIn mathematics, the Hausdorff distance measures how far two subsets of a metric space are from each other.
Jaccard Index
Dice Index
Hausdorff Distance
A: output of the used methodB: actual ground truth
Comparison of Segmentation and Shape Metrics
12
Psi-Net
Multi
DCAN
U-Net
Percent of Misclassified Pixels within Trimaps of Different Widths
13
• 一張影像的「Trimap」,是指將影像中的每個像素劃分為三種區域:前景(英語:Foreground)、背景(英語:Background)和待確認(英語:Unknown)。演算法會將標定好的前景和背景當成已知,再藉由顏色等資訊將待確認區域中的像素標為前景或背景。https://zh.wikipedia.org/zh-tw/%E5%BD%B1%E5%83%8F%E5%8E%BB%E8%83%8C
• https://medium.com/@literallywords/reproducing-deep-image-matting-4c01f956ba0f
Qualitative Comparison
14
Ground Truth Mask
U-Net
DCAN
Multi
Psi-Net