Multiple Object Class Detection & Localization with Deep Learning (CNN)
Hansung Lee, SW R&D Center, Samsung Electronics
16th Oct. 2015

Multiple Object Class Detection & Localization with Deep …mlcenter.postech.ac.kr/files/attach/workshop_fall_2015/삼성전자... · From H. Han, Deep Learning for Image Understanding




Outline

1. Introduction – definition of object recognition problems

2. Feature Extraction vs. Feature Learning

3. Design Issues

4. Methods and Algorithms

References


I. Introduction – definition of object recognition problems


Object Recognition Problem [1]

Major Tasks

Object Recognition
• Object Instance Recognition
• Object Class Recognition

Object Instance Recognition
• Identifies previously seen object instances
• A matching problem between the stored exemplars and the objects to be re-identified in an input image
• Needs some alignment process

Object Class Recognition
• Also known as category-level or generic object recognition
• Focuses on recognizing previously unseen instances of some predefined categories

• Challenging Problems
1) The inter-category visual differences may sometimes be very small
2) Large intra-category appearance variations caused by different object colors, textures, shapes, as well as varying imaging conditions
3) An object in a real-world scene often occupies just a small portion of the scene and is occluded by others or accompanied by similar-looking background structures

Object Class Detection
• Determine whether or not any instances of the categories of interest are present in an input image
• Locate instances of the categories of interest accurately in the image to separate them from the background

X. Zhang et al., “Object Class Detection: A Survey,” ACM Computing Surveys, vol. 46, no. 1, pp. 10:1-10:53, 2013.


Object Class Detection (1/2) [1]

Different facets related to object class detection

From X. Zhang et al., “Object Class Detection: A Survey,” ACM Computing Surveys, vol. 46, no. 1, pp. 10:1-10:53, 2013.


Object Class Detection (2/2) [1]

The bridging role of categorical appearance models in object class detection

From X. Zhang et al., “Object Class Detection: A Survey,” ACM Computing Surveys, vol. 46, no. 1, pp. 10:1-10:53, 2013.


Description of Relevant Visual Cues [1]

Descriptor
• Captures the discriminating visual properties of the target categories or their components
• Keeps sufficient robustness against possible intra-class variations

Pixel-Level Feature Description
• Gray-level pixel intensity
• Color histogram

Patch-Level Feature Description
• Support region or the neighborhood of the point
• Local feature descriptors
• SIFT and its variants
• Filter bank responses

Region-Level Feature Description
• Pixel intensities, colors, textures, edges, etc.
• Bag of Features
• HoG and its variants
• GIST feature
• Shape feature
• Self-similarity feature

X. Zhang et al., “Object Class Detection: A Survey,” ACM Computing Surveys, vol. 46, no. 1, pp. 10:1-10:53, 2013.


II. Feature Extraction vs. Feature Learning


Filter Banks – Extracting Feature from Image (1/2) [2]

Gabor Filter Bank (figure panels: Real Part, Magnitude Part)

Characteristics

• Pros:
1) Similar to the human visual system
2) Appropriate for texture representation
3) The visual cortex of mammalian brains can be modeled by Gabor functions

• Cons:
1) Difficult to analyze the results
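A Gabor filter bank such as the one pictured on this slide can be sketched in a few lines of NumPy. This is only an illustration: the kernel size, wavelengths, number of orientations and the `sigma = 0.56 * wavelength` relation are assumptions, not values taken from the slides.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Return a complex Gabor kernel of shape (size, size); size should be odd."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate frame by theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope multiplied by a complex sinusoidal carrier.
    envelope = np.exp(-(x_r**2 + (gamma * y_r)**2) / (2.0 * sigma**2))
    carrier = np.exp(1j * (2.0 * np.pi * x_r / wavelength + psi))
    return envelope * carrier

def gabor_bank(size=31, wavelengths=(4.0, 8.0, 16.0), n_orientations=8):
    """Stack kernels for every (wavelength, orientation) pair (assumed settings)."""
    bank = []
    for lam in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            bank.append(gabor_kernel(size, lam, theta, sigma=0.56 * lam))
    return np.stack(bank)

bank = gabor_bank()
real_part = bank.real        # corresponds to the "Real Part" panel
magnitude = np.abs(bank)     # corresponds to the "Magnitude Part" panel
```

Convolving an image with each kernel and keeping the response magnitudes gives one feature map per scale and orientation.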


Filter Banks – Extracting Feature from Image (2/2) [3 - 5]

Leung-Malik (LM) Filter Bank
• Multi-scale, multi-orientation filter bank with 48 filters
• Mixture of edge, bar and spot filters
• 2 Gaussian derivative filters at 6 orientations and 3 scales, 8 Laplacian of Gaussian filters and 4 Gaussian filters

The Schmid (S) Filter Bank
• 13 rotationally invariant filters
• 13 isotropic, "Gabor-like" filters

The Maximum Response (MR) Filter Bank
• 2 anisotropic filters (an edge and a bar filter, at 6 orientations and 3 scales)
• 2 rotationally symmetric ones (a Gaussian and a Laplacian of Gaussian)
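The filter counts quoted on this slide can be checked with simple arithmetic. The snippet below only counts filters and builds no kernels; the variable names are illustrative.

```python
# Counts taken from the slide text: 6 orientations, 3 scales.
n_orientations, n_scales = 6, 3

# LM bank: 2 Gaussian derivative filters (edge and bar) per orientation and scale,
# plus 8 Laplacian of Gaussian filters and 4 Gaussian filters.
lm_derivative = 2 * n_orientations * n_scales
lm_total = lm_derivative + 8 + 4

# MR bank: 2 anisotropic filters per orientation and scale,
# plus 2 rotationally symmetric filters (Gaussian and Laplacian of Gaussian).
mr_total = 2 * n_orientations * n_scales + 2

print(lm_derivative, lm_total, mr_total)
```

The LM total agrees with the 48 filters stated on the slide.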


Filters Learnt by Convolutional Neural Network

1st Convolutional Layer


Feature Descriptor – Bag of Feature [6 – 10]

Bag of Words (figure panels: SIFT Feature, SURF Feature, FAST Feature, BRIEF Feature)

SIFT: Scale-Invariant Feature Transform
SURF: Speeded-Up Robust Features
FAST: Features from Accelerated Segment Test
BRIEF: Binary Robust Independent Elementary Features

From http://www.codeproject.com/Articles/619039/Bag-of-Features-Descriptor-on-SIFT-Features-with-O
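The bag-of-features idea behind this slide can be sketched as follows: local descriptors are clustered into a visual vocabulary, and each image becomes a histogram of its nearest visual words. The sketch uses random 128-dimensional vectors as stand-ins for real SIFT descriptors, and a small hand-written k-means; the vocabulary size of 16 is an arbitrary assumption.

```python
import numpy as np

def kmeans(descriptors, k, n_iter=20, seed=0):
    """Plain k-means; returns the k cluster centers (the visual vocabulary)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:          # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def bof_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word and normalize the counts."""
    dist = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = dist.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(500, 128))   # stand-in for SIFT descriptors
vocabulary = kmeans(train_descriptors, k=16)
image_descriptors = rng.normal(size=(60, 128))    # descriptors of one image
hist = bof_histogram(image_descriptors, vocabulary)
```

The resulting fixed-length histogram is what a classifier would consume, regardless of how many keypoints the image produced.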


Visual Vocabulary vs. Activation Feature Map of CNN

Convolutional Neural Network

From https://gilscvblog.wordpress.com/2013/08/23/bag-of-words-models-for-visual-categorization/

From H. Han, Deep Learning for Image Understanding - Applying in the Real World, BigComp 2015.


Visualization of Feature Characteristics (1/4)

Gabor Feature Normalized RGB Feature

Similarity Matrix Visualization
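A similarity matrix like the ones visualized on these slides can be computed from any set of feature vectors. The slides do not say which similarity measure was used; cosine similarity is assumed here, and the random features are placeholders.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """features: (n_samples, n_dims). Returns an (n_samples, n_samples) matrix."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)   # guard against zero vectors
    return unit @ unit.T

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 32))   # placeholder for Gabor, BoF or CNN features
S = cosine_similarity_matrix(features)
```

With samples sorted by class, a discriminative feature would show bright blocks along the diagonal of `S` when it is displayed as an image.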


BoF - FAST

Visualization of Feature Characteristics (2/4)

BoF - SIFT


BoF - SURF

Visualization of Feature Characteristics (3/4)


Visualization of Feature Characteristics (4/4)

Feature Extraction from CNN (7 L) Feature Extraction from CNN (5 P)

Kernels


III. Design Issues


Categories of Object Detection

Object Detection
• Class-specific Object Detection
• Generic Object Detection
• Single Object Detection
• Multiple Object Detection

Class-specific Object Detection
• Object detectors are specialized for one object class
• Examples: face detection (Haar features + AdaBoost), human body detection (HoG features + SVM)

Generic Object Detection
• Generally a saliency-based approach
• Objectness score, saliency measure
• Examples: BING, EdgeBoxes, etc.

Objects are standalone things with a well-defined boundary and center, as opposed to amorphous background stuff.


Multiple Object Recognition & Localization [11]

Basic Design of MORL

• Issues:

1) Even a highly accurate classifier will produce false positives when faced with so many proposals.

2) Small sections of background can resemble actual objects, causing detection errors.

J. Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection,” arXiv: 1506.02640v3 [cs.CV], jun. 2015.


GODL Approaches – Objectness [12, 13]

Visual Cues
• Multi-scale Saliency: a unique/salient appearance
• Color Contrast: a different appearance
• Edge Density: a closed boundary
• Superpixels Straddling: a closed boundary

Learning Cue Parameters

Bayesian Cue Integration

Characteristics
• Uses three characteristics of objects: a different appearance, a unique/salient appearance, and a closed boundary
• Pros:
- High recall with a small number of proposals
- Easy to control the number of proposals
• Cons:
- Slower than BING and EdgeBoxes
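The Bayesian cue integration step can be illustrated with a naive Bayes combination, in which the cues are treated as independent given the window's label. This is a sketch under that assumption; the likelihood values and the prior below are made up for illustration and are not learned cue parameters.

```python
import numpy as np

def objectness_posterior(lik_obj, lik_bg, prior_obj=0.5):
    """Combine per-cue likelihoods into p(object | cues) for each window.

    lik_obj, lik_bg: (n_windows, n_cues) arrays of p(cue | object) and
    p(cue | background). Cues are assumed conditionally independent.
    """
    lik_obj = np.asarray(lik_obj, dtype=float)
    lik_bg = np.asarray(lik_bg, dtype=float)
    num = prior_obj * np.prod(lik_obj, axis=-1)
    den = num + (1.0 - prior_obj) * np.prod(lik_bg, axis=-1)
    return num / den

# Hypothetical likelihoods for four cues (saliency, color contrast,
# edge density, superpixels straddling) on two candidate windows.
lik_obj = [[0.8, 0.7, 0.6, 0.9], [0.2, 0.3, 0.4, 0.1]]
lik_bg = [[0.2, 0.3, 0.4, 0.1], [0.8, 0.7, 0.6, 0.9]]
scores = objectness_posterior(lik_obj, lik_bg)
```

Windows can then be ranked by `scores`, and the number of proposals is controlled by how many top-ranked windows are kept.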


Pipeline for Object Detection

Generating the Proposal Windows

Matching Predefined Features of Objects

Object Localization

Classifying Each Bounding Box

Reject the Invalid Bounding Boxes

Pruning the Invalid Bounding Boxes

Finding Bounding Boxes with Objectness Measurement & Heuristics

Detecting & Localizing the Objects with Classifier
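The slide does not say how invalid bounding boxes are pruned. One common choice, assumed here purely for illustration, is to drop low-scoring boxes and then apply greedy non-maximum suppression (NMS) based on intersection over union (IoU). The thresholds and example boxes are arbitrary.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def prune_boxes(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Return indices of boxes kept after score thresholding and greedy NMS."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    candidates = np.flatnonzero(scores >= score_thresh)
    order = candidates[np.argsort(-scores[candidates])]   # best score first
    kept = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        kept.append(int(best))
        # Discard remaining boxes that overlap the kept box too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return kept

boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140], [0, 0, 5, 5]]
scores = [0.9, 0.8, 0.7, 0.2]
kept = prune_boxes(boxes, scores)
```

Here the second box is suppressed because it overlaps the first, and the last box is rejected for its low score.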


Design Issues – Training dataset vs. Testing dataset

Training Data Testing Data

Image instance

Resized & Cropped Image

Image Instance(s)(Bounding Box)

Cropped & Resized Image

Matching


Design Issues – Low confidence values


Low confidence

High confidence

Design Issues – Low confidence value with high objectness


Contextual Information [14]

The appearance of the object alone is not enough to tell us the object’s identity.

The scene adds contextual information about the object’s identity, so we can identify the object as a kettle.

Possibly from C. Galleguillos et al., “Context based Object Categorization: A Critical Survey,” Computer Vision and Image Understanding (CVIU), vol. 114, pp. 712-722, 2010.


IV. Methods and Algorithms


HCP – Hypotheses-CNN-Pooling (1/3) [15]

HCP Framework

From Y. Wei et al., “CNN: Single-label to Multi-label,” arXiv:1406.5726v3 [cs.CV] 9 Jul 2014.


HCP – Hypotheses-CNN-Pooling (2/3) [15]

Initialization of HCP

From Y. Wei et al., “CNN: Single-label to Multi-label,” arXiv:1406.5726v3 [cs.CV] 9 Jul 2014.


HCP – Hypotheses-CNN-Pooling (3/3) [15]

Samples of Predicted Scores

From Y. Wei et al., “CNN: Single-label to Multi-label,” arXiv:1406.5726v3 [cs.CV] 9 Jul 2014.


R-CNN – Regions with CNN (1/3) [16, 17]

Object Detection System Overview

• Takes an input image

• Extracts around 2000 bottom-up region proposals

• Computes features for each proposal using a large convolutional neural network (CNN)

• Classifies each region using class-specific linear SVMs
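The four steps above can be traced in a toy sketch. Everything below is a stand-in: the proposals are hard-coded rather than produced by a bottom-up proposal method, `extract_features` simply flattens the warped patch instead of running a CNN, and the SVM weights are random rather than trained. Only the data flow mirrors the slide.

```python
import numpy as np

def warp(image, box, size=8):
    """Crop a proposal (x1, y1, x2, y2) and resize it to size x size
    by nearest-neighbour sampling (a tiny size is used for illustration)."""
    x1, y1, x2, y2 = box
    ys = np.linspace(y1, y2 - 1, size).round().astype(int)
    xs = np.linspace(x1, x2 - 1, size).round().astype(int)
    return image[np.ix_(ys, xs)]

def extract_features(patch):
    """Placeholder for the CNN: NOT a real network, just flattens the patch."""
    return patch.reshape(-1).astype(float)

def classify_regions(image, proposals, W, b):
    """Score every proposal with one linear SVM per class: feats @ W.T + b."""
    feats = np.stack([extract_features(warp(image, p)) for p in proposals])
    return feats @ W.T + b

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))                       # input image
proposals = [(0, 0, 32, 32), (16, 16, 64, 64), (40, 8, 60, 30)]
n_classes, feat_dim = 20, 8 * 8 * 3
W = rng.normal(size=(n_classes, feat_dim))            # untrained SVM weights
b = np.zeros(n_classes)
scores = classify_regions(image, proposals, W, b)     # shape (n_proposals, n_classes)
```

Because the CNN runs once per proposal, the cost grows with the number of proposals, which is the bottleneck later methods in this deck address.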

From R. Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation” In CVPR, 2014.

From R. Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation Supplementary material” In CVPR, 2014.


R-CNN – Regions with CNN (2/3) [16, 17]

Object Proposal Transformations

From R. Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation” In CVPR, 2014.

From R. Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation Supplementary material” In CVPR, 2014.

Bounding Box Regression
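The slide names bounding-box regression without giving the formula. The sketch below uses the parameterization described in the R-CNN paper [16], where a proposal P is mapped to a ground-truth box G through scale-invariant center offsets and log-space size ratios. The example boxes are made up.

```python
import numpy as np

def to_center(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h

def regression_targets(proposal, ground_truth):
    """Targets t = (tx, ty, tw, th) that the regressor learns to predict."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(ground_truth)
    return np.array([(gx - px) / pw, (gy - py) / ph,
                     np.log(gw / pw), np.log(gh / ph)])

def apply_deltas(proposal, t):
    """Invert the parameterization: move and rescale the proposal by t."""
    px, py, pw, ph = to_center(proposal)
    cx, cy = px + pw * t[0], py + ph * t[1]
    w, h = pw * np.exp(t[2]), ph * np.exp(t[3])
    return np.array([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h])

proposal = (10.0, 20.0, 110.0, 220.0)
ground_truth = (20.0, 30.0, 100.0, 230.0)
t = regression_targets(proposal, ground_truth)
refined = apply_deltas(proposal, t)   # recovers the ground-truth box
```

At test time the regressor predicts `t` from the region's features, and `apply_deltas` produces the refined box.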


R-CNN – Regions with CNN (3/3) [16, 17]

From R. Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation” In CVPR, 2014.

From R. Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation Supplementary material” In CVPR, 2014.

Experimental Results


Fast R-CNN [18]

Contributions of Fast R-CNN

• Higher detection quality (mAP) than R-CNN, SPPnet

• Training is single-stage, using a multi-task loss

• Training can update all network layers

• No disk storage is required for feature caching
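The multi-task loss mentioned above combines a classification term and a box localization term, as described in the Fast R-CNN paper [18]: log loss on the true class plus a smooth L1 loss on the box offsets, with the box term switched off for background. The sketch below assumes class 0 is background and uses made-up numbers.

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x**2, x - 0.5)

def multi_task_loss(class_probs, true_class, box_pred, box_target, lam=1.0):
    """L = L_cls + lam * [true_class >= 1] * L_loc (class 0 = background)."""
    cls_loss = -np.log(class_probs[true_class])
    if true_class >= 1:
        loc_loss = smooth_l1(box_pred - box_target).sum()
    else:
        loc_loss = 0.0
    return cls_loss + lam * loc_loss

probs = np.array([0.1, 0.7, 0.2])              # softmax output over 3 classes
box_pred = np.array([0.1, 0.0, 2.0, 0.0])      # predicted (tx, ty, tw, th)
box_target = np.zeros(4)
loss_fg = multi_task_loss(probs, 1, box_pred, box_target)
loss_bg = multi_task_loss(probs, 0, box_pred, box_target)
```

Because both terms are computed from one network output, training can be single-stage and gradients reach all layers.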

From R. Girshick, “Fast R-CNN,” arXiv:1504.08083v2 [cs.CV] 27 Sep 2015.


Faster R-CNN: Region Proposal Network [19]

Object Detection System Overview

• Takes an image (of any size) as input and outputs a set of rectangular object proposals, each with an objectness score
- Slides a small network over the conv feature map output by the last shared conv layer
- Each sliding window is mapped to a lower-dimensional vector
- This vector is fed into two sibling fully-connected layers: a box-regression layer (reg) and a box-classification layer (cls)
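The sliding-window head described above can be sketched in NumPy. The 3 x 3 window, the ReLU, the number of reference boxes per location `k`, and the output sizes (2k scores, 4k box values) follow the Faster R-CNN paper [19] rather than the slide text; the channel counts and random weights are arbitrary assumptions.

```python
import numpy as np

def rpn_head(feature_map, W_win, W_cls, W_reg):
    """feature_map: (H, W, C). Returns cls (H, W, 2k) and reg (H, W, 4k)."""
    H, W, C = feature_map.shape
    padded = np.pad(feature_map, ((1, 1), (1, 1), (0, 0)))   # zero padding
    cls = np.zeros((H, W, W_cls.shape[1]))
    reg = np.zeros((H, W, W_reg.shape[1]))
    for i in range(H):
        for j in range(W):
            window = padded[i:i + 3, j:j + 3, :].reshape(-1)   # 3x3xC window
            hidden = np.maximum(window @ W_win, 0.0)           # lower-dimensional vector
            cls[i, j] = hidden @ W_cls                         # objectness scores
            reg[i, j] = hidden @ W_reg                         # box regression values
    return cls, reg

rng = np.random.default_rng(0)
C, d, k = 16, 8, 9                      # channels, hidden size, boxes per location
feature_map = rng.normal(size=(5, 7, C))
W_win = rng.normal(size=(9 * C, d)) * 0.1
W_cls = rng.normal(size=(d, 2 * k)) * 0.1
W_reg = rng.normal(size=(d, 4 * k)) * 0.1
cls, reg = rpn_head(feature_map, W_win, W_cls, W_reg)
```

Because the same weights are applied at every location, the head is equivalent to a 3 x 3 convolution followed by two 1 x 1 convolutions, which is why it adds little cost on top of the shared conv layers.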

From S. Ren et al., “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” arXiv:1506.01497v1 [cs.CV] 4 Jun 2015.


YOLO - You Only Look Once (1/4) [11]

From J. Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection,” arXiv: 1506.02640v3 [cs.CV], jun. 2015.

YOLO, a unified pipeline for object detection

• Defines object detection as a regression problem to spatially separated bounding boxes and associated class probabilities.

• A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation.

(1) Resizes the input image to 448 × 448.

(2) Runs a single convolutional network on the image.

(3) Thresholds the resulting detections by the model’s confidence.


YOLO - You Only Look Once (2/4) [11]

From J. Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection,” arXiv: 1506.02640v3 [cs.CV], jun. 2015.

How It Works

• Divides the image into regions.

• Predicts bounding boxes and probabilities for each region.

• Bounding boxes are weighted by the predicted probabilities.

• Thresholds the detections by some value to show only high-scoring detections.
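The steps above can be sketched as a decoding routine over a grid of per-region predictions. The layout used here is an assumption made for illustration: a 7 x 7 grid with 24 values per cell, taken to be 20 class probabilities followed by one box (x, y, w, h), with x and y as offsets inside the cell and w and h relative to the whole image. The threshold is arbitrary.

```python
import numpy as np

def decode_grid(pred, n_classes=20, threshold=0.2):
    """pred: (S, S, n_classes + 4). Returns a list of
    (class_id, score, x1, y1, x2, y2) in normalized image coordinates."""
    S = pred.shape[0]
    detections = []
    for row in range(S):
        for col in range(S):
            probs = pred[row, col, :n_classes]
            x, y, w, h = pred[row, col, n_classes:]
            cls = int(probs.argmax())
            score = float(probs[cls])      # the box is weighted by this probability
            if score < threshold:          # keep only high-scoring detections
                continue
            cx, cy = (col + x) / S, (row + y) / S
            detections.append((cls, score,
                               cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return detections

pred = np.zeros((7, 7, 24))
pred[3, 2, 5] = 0.9                        # class 5 in grid cell (row 3, col 2)
pred[3, 2, 20:] = [0.5, 0.5, 0.2, 0.4]     # box centered in that cell
detections = decode_grid(pred)
```

Since every cell is decoded from a single network output, the whole image is processed in one forward pass.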

From http://pjreddie.com/darknet/yolo/


YOLO - You Only Look Once (3/4) [11]

From J. Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection,” arXiv: 1506.02640v3 [cs.CV], jun. 2015.

Unified Detection Model

A regression problem to a 7 × 7 × 24 tensor which encodes bounding boxes and class probabilities for all objects in the image.

24 convolutional layers + 2 fully connected layers.


YOLO - You Only Look Once (4/4) [11]

From J. Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection,” arXiv: 1506.02640v3 [cs.CV], jun. 2015.

Experimental Results


References


[1] X. Zhang et al., “Object Class Detection: A Survey,” ACM Computing Surveys, vol. 46, no. 1, pp. 10:1-10:53, 2013.

[2] M. Haghighat et al., “Identification Using Encrypted Biometrics,” Computer Analysis of Images and Patterns, pp. 440-448, 2013.

[3] T. Leung et al., “Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons,” Int. Journal of Computer Vision, vol. 43, no. 1, pp. 29-44, June 2001.

[4] C. Schmid et al., “Constructing Models for Content-based Image Retrieval,” CVPR, vol. 2, pp. 39-45, 2001.

[5] J. Geusebroek et al., “Fast Anisotropic Gauss Filtering,” IEEE Transactions on Image Processing, vol. 12, no. 8, pp. 938-943, 2003.

[6] G. Csurka et al., “Visual Categorization with Bags of Keypoints,” ECCV, vol. 1, 2004.

[7] D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” Int. Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.

[8] H. Bay et al., “Speeded-Up Robust Features (SURF),” Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346-359, 2008.

[9] E. Rosten et al., “Faster and Better: A Machine Learning Approach to Corner Detection,” IEEE TPAMI, vol. 32, no. 1, pp. 105-119, 2009.

[10] M. Calonder et al., “BRIEF: Computing a Local Binary Descriptor Very Fast,” IEEE TPAMI, vol. 34, no. 7, pp. 1281-1298, 2012.

[11] J. Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection,” arXiv:1506.02640v3 [cs.CV], Jun. 2015.


[12] B. Alexe, T. Deselaers, and V. Ferrari, “What is an Object?,” CVPR, 2010.

[13] B. Alexe, T. Deselaers, and V. Ferrari, “Measuring the Objectness of Image Windows,” IEEE TPAMI, vol. 34, no. 11, pp. 2189-2202, 2012.

[14] C. Galleguillos et al., “Context based Object Categorization: A Critical Survey,” Computer Vision and Image Understanding (CVIU), vol. 114, pp. 712-722, 2010.

[15] Y. Wei et al., “CNN: Single-label to Multi-label,” arXiv:1406.5726v3 [cs.CV] 9 Jul 2014.

[16] R. Girshick et al., “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation,” in CVPR, 2014.

[17] R. Girshick et al., “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation: Supplementary Material,” in CVPR, 2014.

[18] R. Girshick, “Fast R-CNN,” arXiv:1504.08083v2 [cs.CV] 27 Sep 2015.

[19] S. Ren et al., “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” arXiv:1506.01497v1 [cs.CV] 4 Jun 2015.

[20] F. Anselmi et al., “Deep Convolutional Networks are Hierarchical Kernel Machines,” CBMM Memo, NSF, no. 35, 2015.