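CV Reading Group: Putting Objects in Perspective
Fujiyoshi Laboratory, 土屋成光, July 1, 2008
Background
Generic object recognition / image scene recognition
– Low resolution
– Differences in appearance
– Size differences caused by depth
⇒ Local recognition methods do not work
Humans exploit the relationships between objects
– Modeling the 3D structure
– Making local recognition methods more accurate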
Putting Objects in Perspective
Derek Hoiem , Alexei A. Efros , Martial Hebert
Carnegie Mellon University Robotics Institute
CVPR2006
Understanding an Image
Today: Local and Independent
Detection results
Local Object Detection
True Detections
Missed
False Detections
Local Detector: [Dalal-Triggs 2005]
Object Support
Surface Estimation
Image Support Vertical Sky
V-Left V-Center V-Right V-Porous V-Solid
[Hoiem, Efros, Hebert ICCV 2005]
Software available online
Object
Surface?
Support?
Object Size in the Image
Image World
Input Image
Object Size ↔ Camera Viewpoint
Input Image, Loose Viewpoint Prior
Object Position/Sizes, Viewpoint
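The link between object size and camera viewpoint can be made concrete with a small sketch. It assumes a roughly level camera, objects standing on the ground plane, and image rows measured upward from the bottom of the image. Under those assumptions the real-world height is approximately the image height times the camera height, divided by the distance between the horizon row and the object's bottom row. The function and parameter names are illustrative and do not come from the slides.

```python
def world_height(bottom_row, pixel_height, horizon_row, camera_height_m):
    """Approximate real-world height (meters) of an object standing on the ground.

    Assumptions (illustrative, not from the slides): rows are measured upward
    from the image bottom, the camera is roughly level, and the object's bottom
    edge touches the ground plane below the horizon.
    """
    gap = horizon_row - bottom_row  # rows between the ground contact and the horizon
    if gap <= 0:
        raise ValueError("object bottom must lie below the horizon")
    return pixel_height * camera_height_m / gap


def expected_pixel_height(bottom_row, horizon_row, camera_height_m, object_height_m):
    """Invert the relation: how tall should an object of known size appear?"""
    gap = horizon_row - bottom_row
    if gap <= 0:
        raise ValueError("object bottom must lie below the horizon")
    return object_height_m * gap / camera_height_m
```

For example, a box that is 85 rows tall, with its bottom 100 rows below the horizon and a camera 2 m above the ground, corresponds to an object about 1.7 m tall. This is a plausible pedestrian, so a detection of that size at that position is consistent with the viewpoint.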
Efficient from surface and viewpoint
Image
P(object)
P(surfaces), P(viewpoint)
P(object | surfaces), P(object | viewpoint)
P(object | surfaces, viewpoint)
Scene Parts Are All Interconnected
Objects
3D Surfaces
Camera Viewpoint
Input to Algorithm
Surface Estimates Viewpoint Prior
Surfaces: [Hoiem-Efros-Hebert 2005]
Local Car Detector
Local Ped Detector
Object Detection
Local Detector: [Dalal-Triggs 2005]
Approximate Model
Nodes: Viewpoint θ; Objects o1 ... on; Local Surfaces s1 ... sn
Evidence: Local Object Evidence, Local Surface Evidence
Inference over Tree
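Because the objects are connected only through the viewpoint, the model is a tree and exact inference is cheap. The sketch below is a simplified illustration, not the authors' implementation. It discretizes the viewpoint θ into K values, treats each candidate o_i as binary, and leaves out the surface nodes s_i. All names and the input layout are assumptions made for the example.

```python
import math


def infer_viewpoint_and_objects(theta_prior, p_obj_given_theta, lik_obj, lik_bg):
    """Exact inference in a star-shaped tree: one viewpoint node, N object nodes.

    theta_prior[k]          : P(theta = k) over K discretized viewpoints (all > 0)
    p_obj_given_theta[i][k] : P(o_i = 1 | theta = k)
    lik_obj[i], lik_bg[i]   : P(e_i | o_i = 1), P(e_i | o_i = 0) from a local detector
    Returns (posterior over theta, list of P(o_i = 1 | all evidence)).
    """
    K, N = len(theta_prior), len(lik_obj)
    # Message from object i to the viewpoint: sum over o_i of P(o_i | theta) P(e_i | o_i).
    on = [[p_obj_given_theta[i][k] * lik_obj[i] for k in range(K)] for i in range(N)]
    msg = [[on[i][k] + (1.0 - p_obj_given_theta[i][k]) * lik_bg[i] for k in range(K)]
           for i in range(N)]
    # Unnormalized log-belief at the viewpoint node (logs avoid underflow for large N).
    log_belief = [math.log(theta_prior[k]) + sum(math.log(msg[i][k]) for i in range(N))
                  for k in range(K)]
    peak = max(log_belief)
    weights = [math.exp(v - peak) for v in log_belief]
    total = sum(weights)
    theta_post = [w / total for w in weights]
    # Object posterior: divide out the object's own message, then marginalize theta.
    obj_post = []
    for i in range(N):
        cavity = [weights[k] / msg[i][k] for k in range(K)]
        num = sum(cavity[k] * on[i][k] for k in range(K))
        den = sum(cavity[k] * msg[i][k] for k in range(K))
        obj_post.append(num / den)
    return theta_post, obj_post
```

With two candidates that both fit the first viewpoint better, a confident detection of the first raises the posterior of the second even when the second's local evidence is uninformative. This is the coupling through the viewpoint that a purely local detector cannot exploit.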
Viewpoint estimation
Viewpoint Prior
Plot axes: Horizon, Height, Likelihood
Viewpoint Final
Object Identities
Local detector
Surface Geometry
Probability map
Object detection
Ped Detection, Car Detection
Local Detector: [Dalal-Triggs 2005]
Initial (Local) vs. Final (Global)
Legend: Car: TP / FP, Ped: TP / FP
Counts shown on the figure panels: 4 TP / 2 FP, 3 TP / 2 FP, 4 TP / 1 FP, 4 TP / 0 FP
Experiments on LabelMe Dataset
Testing with LabelMe dataset: 422 images
– 923 Cars at least 14 pixels tall
– 720 Peds at least 36 pixels tall
Each piece of evidence improves performance
Local Detector from [Murphy-Torralba-Freeman 2003]
Car Detection Pedestrian Detection
Can be used with any detector that outputs confidences
Local Detector: [Dalal-Triggs 2005] (SVM-based)
Car Detection Pedestrian Detection
Accurate Horizon Estimation
Median Error:
– Horizon Prior: 8.5%
– [Murphy-Torralba-Freeman 2003]: 4.5%
– [Dalal-Triggs 2005]: 3.0%
90% Bound:
Qualitative Results
Local Detector from [Murphy-Torralba-Freeman 2003]
Legend: Car: TP / FP, Ped: TP / FP
– Initial: 2 TP / 3 FP, Final: 7 TP / 4 FP
– Initial: 1 TP / 14 FP, Final: 3 TP / 5 FP
– Initial: 1 TP / 23 FP, Final: 0 TP / 10 FP
– Initial: 0 TP / 6 FP, Final: 4 TP / 3 FP
Geometric Context
Estimate surface
ground: green, sky: blue, vertical: red, o:porous, x: solid
Geometric Cues
Color
Location
Texture
Perspective
Robust Spatial Support
RGB Pixels Superpixels
[Felzenszwalb and Huttenlocher 2004]
oversegmentation
Multiple Segmentations
Superpixels
…
A single segmentation may contain segmentation errors, so the image is segmented several times with different numbers of segments
Labeling Segments
…
…
Integrate the results from each segmentation
Learn from training images
Preparation
– Compute the multiple segmentations
– Compute the label of each segment: ground, vertical, sky, or “mixed”
Density estimation with boosted decision trees
– 8 nodes per tree
– Logistic regression version of Adaboost [Collins and Schapire and Singer 2002]
Label Likelihood
Homogeneity Likelihood
Image Labeling
…
Labeled Segmentations
Labeled Pixels
Learned from training images
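One way to read the step from labeled segmentations to labeled pixels is as a weighted vote. Each segmentation contributes its segment's label likelihood, weighted by the likelihood that the segment is homogeneous. The sketch below illustrates that reading for a single pixel. The function name, the input layout, and the final normalization are assumptions made for the example and are not taken from the slides.

```python
def pixel_label_confidence(segment_estimates, labels=("ground", "vertical", "sky")):
    """Combine per-segment estimates from several segmentations for one pixel.

    segment_estimates: one (homogeneity, label_probs) pair per segmentation, where
        homogeneity is the likelihood that the segment containing the pixel has a
        single label, and label_probs maps each label to its likelihood given that
        the segment is homogeneous.
    Returns a dict of confidences per label, normalized to sum to one.
    """
    scores = {label: 0.0 for label in labels}
    for homogeneity, label_probs in segment_estimates:
        for label in labels:
            # A segment that is likely "mixed" contributes little to any label.
            scores[label] += homogeneity * label_probs[label]
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no segmentation gives this pixel any support")
    return {label: score / total for label, score in scores.items()}
```

A pixel covered by a clean segment that says "ground" and by a probably mixed segment that says "vertical" ends up labeled ground. This is how multiple segmentations soften the errors of any single one.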
Summary & Future Work
Figure axes: meters, meters. Figure labels: Ped, Ped, Car
Reasoning in 3D:
• Object to object
• Scene label
• Object segmentation
Conclusion
Image understanding is a 3D problem
– Must be solved jointly
This paper is a small step
– Much remains to be done
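CV Reading Group: Recovering Occlusion Boundaries from a Single Image, Closing the Loop in Scene Interpretation
Fujiyoshi Laboratory, 土屋成光, August 26, 2008
Background
Generic object recognition / image scene recognition
– Low resolution
– Differences in appearance
– Size differences caused by depth
⇒ Local recognition methods do not work
Humans exploit the relationships between objects
– Modeling the 3D structure
– Making local recognition methods more accurate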
Recovering Occlusion Boundaries from a Single Image
Derek Hoiem , Andrew N. Stein, Alexei A. Efros , Martial Hebert
Carnegie Mellon University Robotics Institute
ICCV’07
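Understanding occlusion from a single image
Understanding occlusions and boundaries
– Essential when searching for objects
– Estimated from edge, region, and depth cues
Method overview
1. Segment the image into roughly a thousand regions: watershed with Pb soft boundaries
2. Compute region, boundary, and 3D cues; depth: horizon + junction to ground
3. Estimate the boundaries: conditional random field (CRF)
4. Use the boundaries to segment further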
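Step 2 derives a depth cue from the horizon and the point where a region meets the ground. The slides give no formula, so the sketch below uses the standard level pinhole-camera relation to show why those two quantities constrain depth. It assumes rows measured upward from the image bottom and a known focal length and camera height. The paper's own cue may be defined differently, and all names here are illustrative.

```python
def ground_depth(contact_row, horizon_row, focal_length_px, camera_height_m):
    """Distance (meters) to the point where a region touches the ground plane.

    Assumptions (illustrative, not from the slides): level pinhole camera, flat
    ground, rows measured upward from the image bottom, so the ground contact
    point lies below the horizon row.
    """
    gap = horizon_row - contact_row  # rows between the contact point and the horizon
    if gap <= 0:
        raise ValueError("ground contact must lie below the horizon")
    # Similar triangles: camera_height / depth == gap / focal_length.
    return focal_length_px * camera_height_m / gap
```

A contact point close to the horizon maps to a large depth and one near the image bottom maps to a small depth, so ordering regions by their ground junction already gives a coarse depth ordering.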
results
Boundary
Object popout
Closing the Loop in Scene Interpretation
Derek Hoiem , Alexei A. Efros , Martial Hebert
Carnegie Mellon University Robotics Institute
CVPR’08
Putting Objects in Perspective
Ped Detection, Car Detection
Local Detector: [Dalal-Triggs 2005]
Initial (Local) vs. Final (Global)
Legend: Car: TP / FP, Ped: TP / FP
Counts shown on the figure panels: 4 TP / 2 FP, 3 TP / 2 FP, 4 TP / 1 FP, 4 TP / 0 FP
Scene Parts Are All Interconnected
Objects
3D Surfaces
Camera Viewpoint
with Occlusions
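Generic object recognition framework: Putting Objects in Perspective
Scene structure recognition: Automatic Photo Pop-up
Use of occlusion and boundary information
Relationship model
Mutually related
Application to Putting Objects
Higher accuracy by exchanging information between the components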
Initial: Dalal-Triggs, Iter 1: Hoiem et al., Final: This paper
Car: up, Ped: down. The accuracy of boundary lines within crowds is the problem
Application to Photo Pop-up
Higher accuracy through the use of occlusion and object information
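Summary
Computing occlusions/boundaries
– Computed from a single image using geometry, depth, and other cues
– High-accuracy segmentation
Using occlusions/boundaries
– Reduces errors caused by segmentation
– Useful for generic object recognition
Open issue:
– Improving the accuracy of boundaries obtained from crowds and similar scenes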