
Wang midterm-defence


These are the slides used for my mid-term defense.


Page 1: Wang midterm-defence

Combining Motion Information with Appearance Information and Utilizing Mutual Information Encoded Among Image Features of the Same Object for Better Detection Performance

Zhipeng Wang

Directed by: Prof. Katsushi Ikeuchi, Dr. Masataka Kegesawa, and Dr. Shintaro Ono

Computer Vision Laboratory, The University of Tokyo

(Japanese title) Combining motion information with appearance information, and utilizing mutual information encoded among image features of the same object, for improved detection performance

Page 2: Wang midterm-defence

Outline

Background and state-of-the-art detection methods

Our work (goal, challenges, related work, method, results, contribution, and conclusion)

Sections 1, 2: utilizing motion information

Sections 3, 4: utilizing mutual information

Conclusion

Publication list

Page 3: Wang midterm-defence

Background

Object detection from images: one of the basic perceptual skills in humans, and it plays an important role in the machine vision area

Page 4: Wang midterm-defence

Two primary categories

Sliding-window methods: use classifiers to decide whether each sub-image contains a target object

Hough-transform-based methods: infer object location and label in a bottom-up manner from detected image features

Page 5: Wang midterm-defence

Better detection performance

Representative methods:

Image features: HOG, SIFT

Classifiers / machine learning methods: deep learning

Efficient search techniques: branch and bound [1]

[1] C. H. Lampert, M. B. Blaschko, and T. Hofmann. Efficient subwindow search: A branch and bound framework for object localization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12):2129–2142, Dec. 2009.

Page 6: Wang midterm-defence

Our work towards better detection performance

Combining motion information with appearance information

Using mutual information encoded among the features of the same object

By using extra information, we expect to achieve better detection performance

Page 7: Wang midterm-defence

Our work

Combining motion information with appearance information

1. Common Fate Hough Transform

2. Emergency Telephone Indicator Detection in Tunnel Environment

Page 8: Wang midterm-defence

1. Common Fate Hough Transform

Page 9: Wang midterm-defence

Goal and challenges

Where is each object in the scene, and what is its label?

Challenges:
Objects close to each other
Similar objects of different classes

Common Fate Hough Transform

Page 10: Wang midterm-defence

Related work

Motion for object detection: background subtraction, optical flow [1]

Combination of appearance and motion information: mainly used for tracking

[1] G. Brostow and R. Cipolla. Unsupervised Bayesian detection of independent motion in crowds. In CVPR, pages I: 594–601, 2006.

Page 11: Wang midterm-defence

Hough transform

On the current image:
Detect keypoints and extract image features
Find the best-matching codes from a trained codebook
Each keypoint votes for object center and label

B. Leibe and B. Schiele. Interleaved object categorization and segmentation. In BMVC, pages 759–768, 2003.

Page 12: Wang midterm-defence

Hough transform

Discrete to continuous:
Blur each vote
Sum up the votes from all object parts
(a minimal voting sketch follows below)

Common Fate Hough Transform
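
To make the voting and blurring steps concrete, here is a minimal sketch of codebook-based Hough voting for a single class. It assumes a hypothetical codebook in which each code stores the center offsets observed during training; the names, data layout, and blur width are illustrative, not the deck's actual implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hough_vote(keypoints, descriptors, codebook, image_shape, sigma=4.0):
    """Accumulate votes for object centers and blur them (illustrative sketch).

    keypoints   : (N, 2) array of keypoint positions (x, y)
    descriptors : (N, D) array of keypoint descriptors
    codebook    : list of (code_descriptor, list_of_center_offsets) pairs
    """
    H, W = image_shape
    hough = np.zeros((H, W), dtype=np.float64)
    for kp, desc in zip(keypoints, descriptors):
        # Find the best-matching code by descriptor distance
        dists = [np.linalg.norm(desc - code) for code, _ in codebook]
        _, offsets = codebook[int(np.argmin(dists))]
        # Each stored offset casts one discrete vote for an object center
        for off in offsets:
            x, y = (kp + off).astype(int)
            if 0 <= y < H and 0 <= x < W:
                hough[y, x] += 1.0
    # "Discrete to continuous": blur each vote and sum them in one pass
    return gaussian_filter(hough, sigma=sigma)
```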

Page 13: Wang midterm-defence

The common fate principle

One of the Gestalt laws: elements with the same motion tend to be perceived as one unit

Common Fate Hough Transform

Page 14: Wang midterm-defence

Common fate Hough transform

Motion analysis [1]:

Keypoint tracking (KLT tracker)

Clustering the trajectories

(a tracking and grouping sketch follows the citation below)

[1] G. Brostow and R. Cipolla. Unsupervised Bayesian detection of independent motion in crowds. In CVPR, pages I: 594–601, 2006.
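
A minimal sketch of the tracking side using OpenCV's KLT tracker, followed by a simple agglomerative grouping of trajectories by mean velocity. The grouping criterion is a crude stand-in for the Bayesian clustering of [1], and the parameter values are assumptions.

```python
import cv2
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def track_keypoints(frames, max_corners=500):
    """Track corners through a list of grayscale frames with pyramidal KLT."""
    p0 = cv2.goodFeaturesToTrack(frames[0], maxCorners=max_corners,
                                 qualityLevel=0.01, minDistance=7)
    tracks = [[pt.ravel()] for pt in p0]
    for prev, curr in zip(frames[:-1], frames[1:]):
        p1, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None,
                                                 winSize=(21, 21), maxLevel=3)
        for tr, pt, ok in zip(tracks, p1, status.ravel()):
            if ok:
                tr.append(pt.ravel())
        p0 = p1
    return [np.array(tr) for tr in tracks if len(tr) >= 2]

def group_trajectories(tracks, threshold=2.0):
    """Group trajectories by mean velocity: a crude 'common fate' clustering."""
    velocities = np.array([np.diff(tr, axis=0).mean(axis=0) for tr in tracks])
    return fcluster(linkage(velocities, method='average'),
                    t=threshold, criterion='distance')
```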

Page 15: Wang midterm-defence

Common fate Hough transform

The weight of each vote is related to the support it gains from its motion group (a weighting sketch follows below)
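
One plausible reading of this weighting is sketched below: a vote is up-weighted when other keypoints in the same motion group vote for a nearby center. This is an illustrative interpretation, not the deck's exact formula; the radius and normalization are assumptions.

```python
import numpy as np

def weight_votes(vote_centers, group_ids, radius=10.0):
    """Up-weight votes whose voted center is supported by same-group votes.

    vote_centers : (N, 2) voted object-center positions
    group_ids    : (N,) motion-group label of the keypoint casting each vote
    """
    vote_centers = np.asarray(vote_centers, dtype=float)
    group_ids = np.asarray(group_ids)
    weights = np.ones(len(vote_centers))
    for i, (c, g) in enumerate(zip(vote_centers, group_ids)):
        same_group = (group_ids == g)
        dists = np.linalg.norm(vote_centers[same_group] - c, axis=1)
        # Support = number of same-group votes landing near this center
        support = np.count_nonzero(dists < radius) - 1  # exclude the vote itself
        weights[i] += support
    return weights / weights.max()
```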

Page 16: Wang midterm-defence

Common fate Hough Transform

Page 17: Wang midterm-defence

Common fate Hough transform

Page 18: Wang midterm-defence

Common Fate Hough Transform

Inference:
1. Find the highest peak in the confidence space
2. Exclude the object parts belonging to the highest peak
3. If the Hough image is not empty, go to step 1
(a sketch of this loop follows below)
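
A minimal sketch of this greedy loop, assuming each vote is a (center, weight) pair. The peak radius used for excluding supporting parts and the "not empty" test are simplifications of my own, not the deck's exact criteria.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def greedy_inference(votes, image_shape, sigma=4.0, min_peak=0.5):
    """votes: list of ((x, y) center, weight) pairs. Returns detected centers."""
    H, W = image_shape
    detections, remaining = [], list(votes)
    while remaining:
        hough = np.zeros((H, W))
        for (x, y), w in remaining:
            if 0 <= int(y) < H and 0 <= int(x) < W:
                hough[int(y), int(x)] += w
        hough = gaussian_filter(hough, sigma)
        peak = np.unravel_index(np.argmax(hough), hough.shape)
        if hough[peak] < min_peak:              # Hough image effectively empty
            break
        py, px = peak
        detections.append((px, py))
        # Exclude the object parts (votes) belonging to this peak
        kept = [v for v in remaining
                if np.hypot(v[0][0] - px, v[0][1] - py) > 3 * sigma]
        if len(kept) == len(remaining):         # safety check: guarantee progress
            break
        remaining = kept
    return detections
```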

Pages 19–26: Wang midterm-defence

Common Fate Hough Transform

Inference

(Figure slides illustrating successive iterations of the peak-finding and part-exclusion procedure.)

Page 27: Wang midterm-defence

Experimental results

Dataset:
720×576 resolution
401 consecutive frames
633 ground-truth bounding boxes of different classes on 79 frames

Common Fate Hough Transform

Pages 28–37: Wang midterm-defence

Common Fate Hough Transform

Results

(Figure slides showing detection results on the test sequence.)

Page 38: Wang midterm-defence

Result comparison

[1] O. Barinova, V. Lempitsky, and P. Kohli. On detection of multiple object instances using Hough transforms. In CVPR, pages 2233–2240, 2010.

Page 39: Wang midterm-defence

Contribution

A detection method with better detection performance than the CVPR’10 method

A successful combination of motion and appearance information for detection

A successful attempt to incorporate human perception rules into a detection method in the computer vision area

Common Fate Hough Transform

Page 40: Wang midterm-defence

Conclusion

Motion information, when effectively combined with appearance information, largely improves detection results

However, the method is not efficient, which motivates the next method we propose

Common Fate Hough Transform

Page 41: Wang midterm-defence

2. Detection of Emergency Telephone Indicators by Fusion of Motion Information with Appearance Information

Page 42: Wang midterm-defence

Goal

Emergency Telephone Indicator Detection

For vehicle positioning: detect emergency telephone indicators

Indicators are installed at known locations

Infrared (far-infrared) cameras are installed on top of the vehicle

Page 43: Wang midterm-defence

Challenges

Challenges:
Noisy objects
Real-time requirement

Emergency Telephone Indicator Detection

Page 44: Wang midterm-defence

Method

A two-step method:
1. Detect, verify, and cluster keypoints
2. Verify keypoint clusters by appearance and motion information

Emergency Telephone Indicator Detection

Page 45: Wang midterm-defence

Keypoint Detection

Keypoint Verification

Keypoint Clustering

Keypoint Cluster Verification by Appearance

Keypoint Cluster Tracking

Keypoint Cluster Verification by Motion

Pipeline

Page 46: Wang midterm-defence

Pipeline

Detect keypoints:
Uniform sampling
Intensity thresholds of 160–190 (on a 0–255 scale)

Emergency Telephone Indicator Detection

Page 47: Wang midterm-defence

Pipeline

Verify keypoints:
Intensity histogram around each keypoint
Build a mixture model using k-means
(a verification sketch follows below)

Emergency Telephone Indicator Detection
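
A rough sketch of how such a k-means-based verification might look: cluster local intensity histograms from positive training patches, then accept a keypoint if its histogram is close enough to one of the learned cluster centers. The patch size, bin count, and threshold are illustrative assumptions, not values from the deck.

```python
import numpy as np
from sklearn.cluster import KMeans

def local_histogram(image, x, y, half=8, bins=16):
    """Normalized intensity histogram of the patch around a keypoint."""
    patch = image[max(0, y - half):y + half, max(0, x - half):x + half]
    hist, _ = np.histogram(patch, bins=bins, range=(0, 255), density=True)
    return hist

def train_verifier(positive_hists, k=5):
    """Model the appearance of true indicators as k histogram clusters."""
    return KMeans(n_clusters=k, n_init=10).fit(np.asarray(positive_hists))

def verify_keypoint(model, image, x, y, max_dist=0.05):
    """Accept the keypoint if its histogram lies near some cluster center."""
    h = local_histogram(image, x, y)
    dists = np.linalg.norm(model.cluster_centers_ - h, axis=1)
    return dists.min() < max_dist
```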

Page 48: Wang midterm-defence

Pipeline

Cluster keypoints:
Build a minimum spanning tree (Euclidean distance)
Split the tree
(a clustering sketch follows below)

Emergency Telephone Indicator Detection
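
A minimal sketch of MST-based keypoint clustering using SciPy. The splitting rule (cut edges longer than a threshold) is a common choice and an assumption on my part, since the slides do not spell out how the tree is split.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

def cluster_keypoints(points, max_edge=20.0):
    """Group 2D keypoints by building an MST and cutting long edges.

    points : (N, 2) array of keypoint coordinates
    returns: (N,) array of cluster labels
    """
    dists = squareform(pdist(points))          # full Euclidean distance matrix
    mst = minimum_spanning_tree(dists).toarray()
    mst[mst > max_edge] = 0                    # split the tree: drop long edges
    # Remaining connected components are the keypoint clusters
    _, labels = connected_components(mst, directed=False)
    return labels
```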

Page 49: Wang midterm-defence

Pipeline

Verify keypoint clusters by appearance:
AdaBoost classifier, trained from positive and negative examples
Input feature: intensity histogram
(a classification sketch follows below)

Emergency Telephone Indicator Detection
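
An illustrative sketch of this appearance check using scikit-learn's AdaBoost on intensity histograms. The feature extraction and training-data layout are assumptions, not the deck's implementation.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def cluster_histogram(image, cluster_points, bins=16):
    """Intensity histogram of the bounding box around a keypoint cluster."""
    xs = cluster_points[:, 0].astype(int)
    ys = cluster_points[:, 1].astype(int)
    patch = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    hist, _ = np.histogram(patch, bins=bins, range=(0, 255), density=True)
    return hist

def train_cluster_verifier(X, y):
    """X: (M, bins) histograms of labeled examples, y: (M,) labels in {0, 1}."""
    return AdaBoostClassifier(n_estimators=100).fit(X, y)

def verify_cluster(clf, image, cluster_points):
    """Return True if the cluster's histogram is classified as an indicator."""
    h = cluster_histogram(image, cluster_points).reshape(1, -1)
    return bool(clf.predict(h)[0])
```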

Page 50: Wang midterm-defence

Pipeline

Keypoint cluster tracking:
No occlusion is assumed
Connect each detection response to its nearest trajectory (by appearance, scale, and time gap)

Emergency Telephone Indicator Detection

Page 51: Wang midterm-defence

Pipeline

Keypoint cluster verification by motion:
Fit each trajectory with a straight line
Use the significance of the fit as the criterion
(a fitting sketch follows below)

Emergency Telephone Indicator Detection
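
A small sketch of the motion check: fit the image positions against the frame index with a least-squares line and use the goodness of fit to accept or reject the trajectory. The R² score and threshold used here are one possible "significance" measure and an assumption, not the deck's exact criterion.

```python
import numpy as np

def trajectory_is_linear(trajectory, min_r2=0.95):
    """trajectory: (T, 2) array of (x, y) positions over consecutive frames."""
    t = np.arange(len(trajectory))
    r2 = []
    for coord in (trajectory[:, 0], trajectory[:, 1]):
        slope, intercept = np.polyfit(t, coord, deg=1)   # least-squares line
        pred = slope * t + intercept
        ss_res = np.sum((coord - pred) ** 2)
        ss_tot = np.sum((coord - coord.mean()) ** 2)
        r2.append(1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0)
    # A true indicator should sweep across the image on a nearly straight path
    return min(r2) > min_r2
```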

Page 52: Wang midterm-defence

Benefits of the pipeline

The system runs in real time because the more time-consuming steps deal with fewer instances:
Original image: ~10^5 pixels
Keypoint detection: ~10^4 points
Keypoint verification: ~10^3 keypoints
Keypoint clustering: ~10^2 keypoints
Fewer than 10 keypoint clusters

Emergency Telephone Indicator Detection

Page 53: Wang midterm-defence

Experimental results

Real-time (34 fps) on a laptop with an Intel Core 2 Duo 2.8 GHz processor

Page 54: Wang midterm-defence

Experimental results

Emergency Telephone Indicator Detection

Page 55: Wang midterm-defence

Contribution

A specialized real-time detection method for cluttered data

A successful combination of motion and appearance information for detection

The significance of this research: a potentially applicable solution for vehicle positioning in tunnel environments

Emergency Telephone Indicator Detection

Page 56: Wang midterm-defence

Conclusion

Motion information plays an important role in the detection process

Emergency Telephone Indicator Detection

Page 57: Wang midterm-defence

Future work

We have collected new data with a new camera and plan to improve the pipeline on the newly collected data

Emergency Telephone Indicator Detection

Page 58: Wang midterm-defence

Two methods utilizing motion information are proposed

When motion information is not available, we turn to the following approach

Page 59: Wang midterm-defence

Our work

Utilizing Mutual Information Encoded Among Image Features of the Same Object

3. Pyramid Match Score for Object Detection

4. Detection of Object With In-plane Rotations

Page 60: Wang midterm-defence

3. Pyramid Match Score for Object Detection

Page 61: Wang midterm-defence

Motivation

Sliding-window-based methods: often ignore the positional information of each image feature [1]

Hough-transform-based methods: ignore the overall information of each object

Intention of this method: an efficient way to utilize both the positional information of each image feature and the overall information gained from the whole object

[1] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, pages 2169–2178, 2006.

Page 62: Wang midterm-defence

Pyramid Match

Pyramid match is a method to find an approximate best one-to-one match between two sets of points
A 2D example is shown here; the real data is 14-dimensional
Appearance information: 12D; position information: 2D

Page 63: Wang midterm-defence

Pyramid match

Divide the space from fine to coarse
If two points fall in the same grid cell at some division level, they are considered matched and excluded

Page 64: Wang midterm-defence

Pyramid match

The distance between two matched points does not need to be calculated; it is assigned according to the size of the grid cell at which the match is found
e.g., 1/4 × 2, 1/2 × 2, 1 × 2 for the successive levels
(a matching sketch follows below)
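
To make the matching mechanics concrete, here is a minimal sketch of greedy pyramid matching in the spirit of the pyramid match kernel: points matched in the same cell at a fine level are removed, and unmatched points are retried at coarser levels with larger assigned distances. The cell sizes and per-level cost are illustrative assumptions, not the deck's exact weights.

```python
import numpy as np
from collections import defaultdict

def pyramid_match_distance(set_a, set_b, levels=(0.25, 0.5, 1.0)):
    """Approximate matching cost between two point sets (values in [0, 1]^d).

    At each level, points of set_a and set_b falling into the same grid cell
    are matched greedily and charged a cost proportional to the cell size.
    """
    a, b = list(map(np.asarray, set_a)), list(map(np.asarray, set_b))
    total = 0.0
    for cell in levels:                                   # fine to coarse
        bins_b = defaultdict(list)
        for j, p in enumerate(b):
            bins_b[tuple(np.floor(p / cell).astype(int))].append(j)
        matched_a, matched_b = [], set()
        for i, p in enumerate(a):
            key = tuple(np.floor(p / cell).astype(int))
            for j in bins_b.get(key, []):
                if j not in matched_b:
                    matched_b.add(j)
                    matched_a.append(i)
                    total += cell                         # assigned distance ~ cell size
                    break
        a = [p for i, p in enumerate(a) if i not in set(matched_a)]
        b = [p for j, p in enumerate(b) if j not in matched_b]
    # Points still unmatched at the coarsest level get the maximum penalty
    total += (len(a) + len(b)) * levels[-1]
    return total
```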

Page 65: Wang midterm-defence

Pyramid Match Score

Pyramid match between two sets:
Each sub-image is considered as a set of 14D points (appearance: 12D, position: 2D)
A "super template" contains all 14D points from the training images

Page 66: Wang midterm-defence

Pyramid Match Score

Find the near-best match between each sub-image of a test image and the "super template"

The match score (distance) is used as the confidence that the sub-image contains a target object

Page 67: Wang midterm-defence

Results on UIUC cars

Page 68: Wang midterm-defence

Results

(Precision vs. detection-rate curves; figure not reproduced.)

Page 69: Wang midterm-defence

Comments

The results are not yet good enough

The method seems promising, and we will improve it in the future

This mutual information can be further used

Page 70: Wang midterm-defence

4. Detection of Objects with In-plane Rotations

Page 71: Wang midterm-defence

About the related work

Existing approaches require training examples from all rotation directions, are not robust, or are very time-consuming

Page 72: Wang midterm-defence

Related work

Two neural networks deal with each sub-image:

One estimates the sub-image's rotation angle; the sub-image is then rotated so that the potential face is upright

The other decides whether an object is present

H. Rowley, S. Baluja, and T. Kanade. Rotation invariant neural network-based face detection. In CVPR, pages 38–44, 1998.

Page 73: Wang midterm-defence

Related work

Local feature methods: infer rotation based on the gradient direction of SIFT features

Graph-based methods: model the object as a graph and transform the graph

Detection of Object With In-plane Rotations

Page 74: Wang midterm-defence

Idea

Suppose we have a codebook trained from objects without in-plane rotations

Detection of Object With In-plane Rotations

Page 75: Wang midterm-defence

Idea

With proof omitted: for two or more votes from different keypoints, there exists one and only one rotation angle that minimizes the disagreement among the voted centers

If all votes are good estimates, this angle can be expected to be a good estimate of the object's rotation angle
(an angle-estimation sketch follows below)

Detection of Object With In-plane Rotations
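
A small sketch of this idea: given keypoint positions and the center offsets stored in an upright-trained codebook, search over candidate rotation angles for the one that makes the rotated votes agree most tightly. The brute-force search here stands in for the closed-form solution hinted at by the "proof omitted" remark.

```python
import numpy as np

def rotation_matrix(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def estimate_rotation(keypoints, offsets, n_angles=360):
    """Find the in-plane rotation that minimizes the spread of voted centers.

    keypoints : (N, 2) keypoint positions in the test image
    offsets   : (N, 2) offsets to the object center learned from upright objects
    """
    best_angle, best_spread = 0.0, np.inf
    for theta in np.linspace(0, 2 * np.pi, n_angles, endpoint=False):
        R = rotation_matrix(theta)
        centers = keypoints + offsets @ R.T        # rotate each stored offset
        spread = np.var(centers, axis=0).sum()     # disagreement of voted centers
        if spread < best_spread:
            best_angle, best_spread = theta, spread
    return best_angle, best_spread
```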

Page 76: Wang midterm-defence

To do

Propose a method capable of detecting objects with in-plane rotations

Evaluate the robustness of the method

Detection of Object With In-plane Rotations

Page 77: Wang midterm-defence

Conclusion

Motion information combined with appearance information:
Distinguishes target objects from noisy objects
Enhances the detection rate

Mutual information encoded among the image features of the same object:
We expect to obtain a method capable of utilizing positional and overall information
We expect to obtain a method capable of detecting objects with in-plane rotations

Page 78: Wang midterm-defence

Publication list

[1] Zhipeng Wang, Jinshi Cui, Hongbin Zha, Masataka Kegesawa, and Katsushi Ikeuchi. Object detection by common fate Hough transform. In First Asian Conference on Pattern Recognition (ACPR), pages 613–617, Nov. 2011.

[2] Zhipeng Wang, Masataka Kagesawa, Shintaro Ono, Atsuhiko Banno, and Katsushi Ikeuchi. Emergency light detection in tunnel environment: An efficient method. In First Asian Conference on Pattern Recognition (ACPR), pages 628–632, Nov. 2011.

[3] Zhipeng Wang, Jinshi Cui, Hongbin Zha, Masataka Kegesawa, Shintaro Ono, and Katsushi Ikeuchi. Detection by motion-based grouping of object parts. International Journal of ITS Research (submitted).

[4] Zhipeng Wang, Masataka Kegesawa, Shintaro Ono, Atsuhiko Banno, Takeshi Oishi, and Katsushi Ikeuchi. Detection of emergency telephone indicators using infrared cameras for vehicle positioning in tunnel environment. ITS World Congress 2013 (submitted).

Page 79: Wang midterm-defence

Thank you very much!