These are the slides used for my mid-term defense.
Combining Motion Information with Appearance Information and Utilizing Mutual Information Encoded Among Image Features of the Same Object for Better Detection Performance
Zhipeng Wang
Directed by: Prof. Katsushi Ikeuchi, Dr. Masataka Kagesawa, and Dr. Shintaro Ono
Computer Vision Laboratory, The University of Tokyo
Outline
Background and state-of-the-art detection methods
Our work (goal, challenges, related work, method, results, contribution, and conclusion)
Sections 1, 2: utilizing motion information
Sections 3, 4: utilizing mutual information
Conclusion
Publication list
Background
Object detection from images is one of the basic human perceptual skills and plays an important role in the machine vision area
Two primary categories:
Sliding-window methods: use classifiers to answer whether each sub-image contains a target object
Hough-transform-based methods: infer object status in a bottom-up manner from detected image features; better detection performance
Representative methods:
Image features: HOG, SIFT
Classifiers: machine learning methods such as deep learning
Efficient search techniques: branch and bound [1]
[1] C. H. Lampert, M. B. Blaschko, and T. Hofmann, "Efficient Subwindow Search: A Branch and Bound Framework for Object Localization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 12, pp. 2129–2142, Dec. 2009.
Our work towards better detection performance
Combining motion information with appearance information
Using mutual information encoded among the features of the same object
By using extra information, we expect to achieve better detection performance
Our work
Combining motion information with appearance information
1. Common Fate Hough Transform
2. Emergency Telephone Indicator Detection in Tunnel Environment
1. Common Fate Hough Transform
Goal and challenges
Goal: where is each object in the scene, and what is its label?
Challenges: nearby objects; similar objects of different classes
Common Fate Hough Transform
Related work
Motion for object detection: background subtraction; optical flow [1]
Combination of appearance and motion information: mainly for tracking
[1] G. Brostow and R. Cipolla. Unsupervised Bayesian detection of independent motion in crowds. In CVPR, pages I: 594–601, 2006.
Hough transform
On the current image:
Detect keypoints and extract image features
Find the best-matched codes from a trained codebook
Each keypoint votes for an object center and label
B. Leibe and B. Schiele. Interleaved object categorization and segmentation. In BMVC, pages 759–768, 2003.
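The voting step above can be sketched as follows. This is a minimal toy example, not the actual trained system: the 2-D descriptors, the two-entry codebook, and the stored center offsets are all hypothetical stand-ins.

```python
import numpy as np

# Hypothetical toy codebook: each code stores a descriptor and the
# offset from the keypoint to the object center observed in training.
codebook = {
    "desc": np.array([[1.0, 0.0], [0.0, 1.0]]),      # toy 2-D descriptors
    "offset": np.array([[5.0, 5.0], [-5.0, 5.0]]),   # keypoint -> center
}

def cast_votes(keypoints, descriptors, codebook):
    """Each keypoint votes for an object center via its best-matching code."""
    votes = []
    for kp, d in zip(keypoints, descriptors):
        # Nearest code by Euclidean distance in descriptor space.
        dists = np.linalg.norm(codebook["desc"] - d, axis=1)
        best = int(np.argmin(dists))
        votes.append(kp + codebook["offset"][best])
    return np.array(votes)

keypoints = np.array([[10.0, 10.0], [20.0, 10.0]])
descriptors = np.array([[0.9, 0.1], [0.1, 0.9]])
votes = cast_votes(keypoints, descriptors, codebook)
```

Here both keypoints belong to the same object, so their votes agree on one center, which is exactly the effect the Hough transform exploits.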
Hough transform
From discrete to continuous: blur each vote, then sum up the votes from all object parts
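Blurring and summing can be sketched as a sum of Gaussian kernels over a toy Hough grid; the grid size, the kernel width `sigma`, and the votes below are illustrative choices, not the method's actual parameters.

```python
import numpy as np

def hough_confidence(votes, grid_shape, sigma=2.0):
    """Blur each vote with a Gaussian and sum them into a continuous map."""
    ys, xs = np.mgrid[0:grid_shape[0], 0:grid_shape[1]]
    conf = np.zeros(grid_shape)
    for vy, vx in votes:
        conf += np.exp(-((ys - vy) ** 2 + (xs - vx) ** 2) / (2 * sigma ** 2))
    return conf

votes = [(15, 15), (15, 15), (16, 15)]          # votes from object parts
conf = hough_confidence(votes, (32, 32))
peak = np.unravel_index(np.argmax(conf), conf.shape)
```

The blurring turns nearly-agreeing discrete votes into one smooth peak, so small voting errors no longer split a single object into several weak maxima.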
Common Fate Hough Transform
The common fate principle
One of the Gestalt laws: elements with the same motion tend to be perceived as one unit
Common Fate Hough Transform
Common fate Hough transform
Motion analysis [1]
Keypoint tracking (KLT tracker)
Clustering the trajectories
Common fate Hough transform
The weight of each vote is related to the support it gains from the motion group
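A minimal sketch of such a motion-support weight, assuming each keypoint already carries a motion-group label from the trajectory clustering. The neighborhood `radius` and the toy votes are hypothetical; only the idea that support comes from the vote's own motion group follows the slide.

```python
import numpy as np

def common_fate_weights(votes, groups, radius=3.0):
    """Weight each vote by the support it gains from its motion group:
    the fraction of same-group votes landing near it in Hough space."""
    votes = np.asarray(votes, dtype=float)
    groups = np.asarray(groups)
    weights = np.zeros(len(votes))
    for i, (v, g) in enumerate(zip(votes, groups)):
        same = votes[groups == g]                       # same motion group
        near = np.linalg.norm(same - v, axis=1) <= radius
        weights[i] = near.sum() / len(same)
    return weights

votes = [(15, 15), (15, 16), (40, 40)]   # third vote is an outlier
groups = [0, 0, 0]                        # all from one moving object
w = common_fate_weights(votes, groups)
```

Votes consistent with their group's consensus get high weight, while a stray vote from the same group is down-weighted.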
Common Fate Hough Transform
Inference:
1. Find the highest peak in the confidence space
2. Exclude the object parts belonging to the highest peak
3. If the Hough image is not empty, go to 1
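The three steps can be sketched as a greedy loop over toy 2-D votes. The peak definition, the `radius` for "belonging to" a peak, and the `min_votes` stopping rule are simplifications of the actual confidence-space inference.

```python
import numpy as np

def greedy_inference(votes, radius=3.0, min_votes=2):
    """1. take the strongest remaining peak, 2. remove the object parts
    belonging to it, 3. repeat until too few votes remain."""
    votes = np.asarray(votes, dtype=float)
    detections = []
    remaining = np.ones(len(votes), dtype=bool)
    while remaining.sum() >= min_votes:
        active = votes[remaining]
        # Peak = the vote with the most neighbours within `radius`.
        counts = [(np.linalg.norm(active - v, axis=1) <= radius).sum()
                  for v in active]
        if max(counts) < min_votes:
            break
        best = active[int(np.argmax(counts))]
        detections.append(tuple(best))
        # Exclude the object parts belonging to this peak.
        member = np.linalg.norm(votes - best, axis=1) <= radius
        remaining &= ~member
    return detections

votes = [(15, 15), (15, 16), (16, 15), (40, 40), (40, 41)]
dets = greedy_inference(votes)
```

Removing a peak's supporting votes before the next iteration prevents one strong object from being detected twice.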
Experimental results
Dataset: 720×576, 401 continuous frames; 633 ground-truth bounding boxes of different classes on 79 frames
Common Fate Hough Transform
Results
Result comparison
[1] O. Barinova, V. Lempitsky, and P. Kohli. On detection of multiple object instances using Hough transforms. In CVPR, pages 2233–2240, 2010.
Contribution
A detection method with better detection performance than the CVPR'10 method [1]
A successful combination of motion and appearance information for detection
A successful attempt to incorporate a human perception rule into a detection method in the computer vision area
Common Fate Hough Transform
Conclusion
Motion information largely improves detection results when effectively combined with appearance information
The method is not efficient, which motivates our next method
Common Fate Hough Transform
2. Detection of Emergency Telephone Indicators by Fusion of Motion Information with Appearance Information
Goal
Emergency Telephone Indicator Detection
For vehicle positioning: detect emergency telephone indicators installed at known locations, using far-infrared cameras mounted on top of the vehicle
Challenges
Noisy objects; real-time requirement
Emergency Telephone Indicator Detection
Method: a two-step method
1. Detect, verify, and cluster keypoints
2. Verify keypoint clusters by appearance and motion information
Emergency Telephone Indicator Detection
Keypoint Detection
Keypoint Verification
Keypoint Clustering
Keypoint Cluster Verification by Appearance
Keypoint Cluster Tracking
Keypoint Cluster Verification by Motion
Pipeline
Pipeline
Detect keypoints: uniform sampling; intensity thresholds of 160–190 (out of 0–255)
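A minimal sketch of this detection step, assuming a grayscale image array. Only the 160–190 intensity band comes from the slides; the sampling step of 4 pixels is an illustrative choice.

```python
import numpy as np

def detect_keypoints(image, step=4, lo=160, hi=190):
    """Uniformly sample pixels and keep those whose intensity falls in
    the [lo, hi] band (160-190 out of 0-255 in the slides)."""
    pts = []
    for y in range(0, image.shape[0], step):
        for x in range(0, image.shape[1], step):
            if lo <= image[y, x] <= hi:
                pts.append((y, x))
    return pts

img = np.zeros((16, 16), dtype=np.uint8)
img[4:8, 4:8] = 170              # a bright indicator-like blob
kps = detect_keypoints(img)
```

Uniform sampling plus a cheap threshold keeps this first stage fast, which matters because it runs over every pixel sampled from the full frame.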
Emergency Telephone Indicator
Pipeline
Verify keypoints: intensity histogram; build a mixture model using k-means
Emergency Telephone Indicator Detection
Pipeline
Cluster keypoints: build a minimum spanning tree (Euclidean distance), then split the tree
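The clustering step might be sketched as below, using Prim's algorithm and a hypothetical `split_len` threshold for cutting long edges; the slides do not specify the actual splitting criterion.

```python
import numpy as np

def mst_clusters(points, split_len=5.0):
    """Prim's MST over Euclidean distances; edges longer than `split_len`
    are cut, and the remaining connected components become clusters."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    in_tree = [0]
    edges = []
    while len(in_tree) < n:          # grow the MST one vertex at a time
        best = None
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = np.linalg.norm(pts[i] - pts[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        edges.append(best)
        in_tree.append(best[2])
    # Keep only short edges, then label connected components.
    labels = list(range(n))
    for d, i, j in edges:
        if d <= split_len:
            old, new = labels[j], labels[i]
            labels = [new if l == old else l for l in labels]
    return labels

points = [(0, 0), (1, 0), (0, 1), (20, 20), (21, 20)]
labels = mst_clusters(points)
```

Cutting long MST edges separates spatially distant keypoint groups, so each indicator's keypoints end up in one cluster.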
Emergency Telephone Indicator Detection
Pipeline
Verify keypoint clusters by appearance: an AdaBoost classifier over intensity histograms, trained from positive and negative examples
Emergency Telephone Indicator Detection
Pipeline
Keypoint cluster tracking: assuming no occlusion, connect each detection response to its nearest trajectory (by appearance, scale, and time gap)
Emergency Telephone Indicator Detection
Pipeline
Keypoint cluster verification by motion: fit each trajectory as a straight line and use the significance of the fit as the criterion
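A sketch of the line-fitting check, using the RMS residual of a least-squares line as a stand-in for the "significance of the fitting"; the actual significance measure is not specified in the slides.

```python
import numpy as np

def trajectory_linearity(traj):
    """Fit a trajectory (x_t, y_t) with a straight line by least squares
    and return the RMS residual; a small residual suggests the smooth,
    camera-motion-induced track of a fixed indicator."""
    traj = np.asarray(traj, dtype=float)
    x, y = traj[:, 0], traj[:, 1]
    # Fit y = a*x + b (assumes the track is not vertical in the image).
    a, b = np.polyfit(x, y, 1)
    return float(np.sqrt(np.mean((y - (a * x + b)) ** 2)))

straight = [(t, 2 * t + 1) for t in range(10)]                   # indicator-like
wobbly = [(t, 2 * t + 1 + (-1) ** t * 3) for t in range(10)]     # noise-like
r_straight = trajectory_linearity(straight)
r_wobbly = trajectory_linearity(wobbly)
```

Thresholding the residual separates straight trajectories from erratically moving noise.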
Emergency Telephone Indicator Detection
Benefits of the pipeline
The system is real-time because the more time-consuming steps deal with fewer instances:
Original image: ~10^5 pixels
Keypoint detection: ~10^4 points
Keypoint verification: ~10^3 keypoints
Keypoint clustering: ~10^2 keypoints
Fewer than 10 keypoint clusters
Emergency Telephone Indicator Detection
Experimental results
Real-time (34 fps) on a laptop with an Intel Core 2 Duo 2.8 GHz processor
Experimental results
Emergency Telephone Indicator Detection
Contribution
A specialized real-time detection method on cluttered data
A successful combination of motion and appearance information for detection
The significance of this research: a potentially applicable solution for vehicle positioning in tunnel environments
Emergency Telephone Indicator Detection
Conclusion
Motion information plays an important role in the detection process
Emergency Telephone Indicator Detection
Future work
We have collected new data using a new camera and plan to improve the pipeline on the newly collected data
Emergency Telephone Indicator Detection
Two methods utilizing motion information have been proposed
When motion information is not available, we turn to:
Our work
Utilizing Mutual Information Encoded Among Image Features of the Same Object
3. Pyramid Match Score for Object Detection
4. Detection of Objects with In-plane Rotations
3. Pyramid Match Score for Object Detection
Motivation
Sliding-window-based methods often ignore the positional information of each image feature [1]
Hough-transform-based methods ignore the overall information of each object
Intention of this method: an efficient way to utilize both the positional information of each image feature and the overall information gained from the whole object
[1] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories," in CVPR, vol. 2, pp. 2169–2178, 2006.
Pyramid Match
Pyramid Match is a method to find the best one-to-one match between two sets of points
A 2-D example is shown here; the real data is 14-D (appearance information: 12-D, position information: 2-D)
Pyramid match
Divide the space from fine to coarse
If two points fall in the same grid cell at some division, they are considered a match and excluded
Pyramid match
The distance between two matched points need not be calculated; it is assigned according to the size of the grid cell in which the match is found (e.g., 1/4 × 2, 1/2 × 2, 1 × 2)
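The level-wise assignment of match distances can be sketched in 1-D (the real data is 14-D). The doubling integer bin widths and the greedy histogram-intersection matching below are a simplification of the actual pyramid match.

```python
# 1-D toy pyramid match: the bin width doubles at each level, and a new
# match found at level l is assigned a distance of 2**l (the bin width)
# instead of being computed exactly.
def pyramid_match_cost(set_a, set_b, levels=3):
    prev_matches = 0
    cost = 0.0
    for level in range(levels):
        width = 2 ** level
        ha, hb = {}, {}
        for p in set_a:
            ha[p // width] = ha.get(p // width, 0) + 1
        for p in set_b:
            hb[p // width] = hb.get(p // width, 0) + 1
        # Histogram intersection = matches found by this grid size.
        matches = sum(min(c, hb.get(k, 0)) for k, c in ha.items())
        cost += (matches - prev_matches) * width   # new matches at this level
        prev_matches = matches
    return cost

close_cost = pyramid_match_cost([0, 5], [0, 5])   # matched at the finest level
far_cost = pyramid_match_cost([0, 5], [1, 6])     # matched only at coarser levels
```

Points that match at a fine level are charged a small assigned distance, while points that only meet in a coarse cell are charged more, so the total cost tracks set similarity without any pairwise distance computation.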
Pyramid Match Score
Pyramid match between two sets:
Each sub-image is considered a 14-D point set (appearance: 12, position: 2)
A "super template" contains all 14-D points of the training images
Pyramid Match Score
Find a near-optimal match between each sub-image of a test image and the "super template"
The match score (distance) is used as the confidence that the sub-image contains a target object
Results on UIUC cars
Results
[Figure: precision vs. detection rate curves on the UIUC cars dataset]
Comments
The results are not yet good enough, but the method seems promising, and we will improve it in future work
This mutual information can be further used:
4. Detection of Objects with In-plane Rotations
About the related work
Existing methods either require training examples for all rotation directions, are not robust, or are very time-consuming
Related work
Two neural networks handle each sub-image:
One estimates the sub-image's rotation angle; the sub-image is then rotated so that the potential face is upright
The other decides whether an object exists
H. Rowley, S. Baluja, and T. Kanade. Rotation invariant neural network-based face detection. In CVPR, pages 38–44, 1998.
Related work
Local-feature methods: infer the rotation from the gradient direction of the SIFT feature
Graph-based methods: consider the object as a graph and transform the graph
Detection of Object With In-plane Rotations
Idea
Suppose we have a codebook trained from objects without in-plane rotations
Detection of Object With In-plane Rotations
Idea
With the proof omitted: for two or more votes from different keypoints, there exists one and only one rotation angle that minimizes the difference of the voted centers
If all votes are good estimates, we can expect this angle to be a good estimate of the object's rotation angle
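The idea can be sketched as a brute-force search over candidate angles for toy 2-D keypoints and codebook offsets (all values hypothetical); in practice the minimizing angle would be found in closed form rather than by search.

```python
import numpy as np

def rot(v, theta):
    """Rotate a 2-D vector by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])

def estimate_rotation(keypoints, offsets, n_angles=360):
    """Search the angle that minimizes the spread of the voted centers:
    center_i(theta) = keypoint_i + R(theta) @ offset_i."""
    best = (np.inf, 0.0)
    for theta in np.linspace(0, 2 * np.pi, n_angles, endpoint=False):
        centers = np.array([kp + rot(off, theta)
                            for kp, off in zip(keypoints, offsets)])
        spread = centers.var(axis=0).sum()   # disagreement of the votes
        if spread < best[0]:
            best = (spread, theta)
    return best[1]

# Toy object rotated by 90 degrees: keypoints observed in the image,
# offsets learned from upright training examples.
true_theta = np.pi / 2
offsets = [np.array([5.0, 0.0]), np.array([0.0, 5.0]), np.array([-5.0, 0.0])]
center = np.array([50.0, 50.0])
keypoints = [center - rot(o, true_theta) for o in offsets]
theta_hat = estimate_rotation(keypoints, offsets)
```

At the true rotation angle all votes collapse onto one center, so the spread reaches its unique minimum there.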
Detection of Object With In-plane Rotations
To do
Propose a method capable of detecting objects with in-plane rotations
Evaluate the robustness of the proposed method
Detection of Object With In-plane Rotations
Conclusion
Motion information combined with appearance information: distinguishes target objects from noisy objects and enhances the detection rate
Mutual information encoded among the image features of the same object: we expect a method capable of utilizing positional and overall information, and a method capable of detecting objects with in-plane rotations
Publication list
[1] Zhipeng Wang, Jinshi Cui, Hongbin Zha, M. Kagesawa, and K. Ikeuchi, "Object detection by common fate Hough transform," in Proc. First Asian Conference on Pattern Recognition (ACPR), pp. 613–617, Nov. 2011.
[2] Zhipeng Wang, M. Kagesawa, S. Ono, A. Banno, and K. Ikeuchi, "Emergency light detection in tunnel environment: An efficient method," in Proc. First Asian Conference on Pattern Recognition (ACPR), pp. 628–632, Nov. 2011.
[3] Zhipeng Wang, Jinshi Cui, Hongbin Zha, Masataka Kagesawa, Shintaro Ono, and Katsushi Ikeuchi, "Detection by Motion-based Grouping of Object Parts," International Journal of ITS Research (submitted).
[4] Zhipeng Wang, Masataka Kagesawa, Shintaro Ono, Atsuhiko Banno, Takeshi Oishi, and Katsushi Ikeuchi, "Detection of Emergency Telephone Indicators Using Infrared Cameras for Vehicle Positioning in Tunnel Environment," ITS World Congress 2013 (submitted).
Thank you very much!