Nadia Barbara Figueroa Fernandez
3D Computer Vision and Applications in Robotics and Multimedia
Reconstruct your world
Reconstruct yourself
AGENDA
• BACKGROUND
• 3D COMPUTER VISION
• APPLICATIONS IN ROBOTICS
  – Research Projects at TU Dortmund
  – Master’s Thesis at DLR
• APPLICATIONS IN MULTIMEDIA
  – Research Projects at NYU Abu Dhabi
[Figure: DLR’s Rollin’ Justin humanoid]
3D COMPUTER VISION
Fundamentals
1 General Definition
“Ability of powered devices to acquire a real-time picture of the world in three dimensions.” - Wikipedia
2 My Definition
“Generate 3D representations of the world from the viewpoint of a sensor, generally in the form of 3D point clouds.”
3 What is a point cloud?
“A point cloud is a set of points $p \in P$ where $p = (x, y, z, r, g, b)$.”
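A minimal sketch of the definition above: each point carries a position and a colour, and the cloud is a collection of such points. The class name `Point`, the helper `centroid`, and the choice of metres and 0-255 colour values are illustrative assumptions, not part of the original slides.

```python
from typing import List, NamedTuple, Tuple


class Point(NamedTuple):
    """One element p of the cloud P (units are assumed: metres, 0-255 colour)."""
    x: float
    y: float
    z: float
    r: int
    g: int
    b: int


def centroid(cloud: List[Point]) -> Tuple[float, float, float]:
    """Mean (x, y, z) position of all points in the cloud."""
    n = len(cloud)
    return (sum(p.x for p in cloud) / n,
            sum(p.y for p in cloud) / n,
            sum(p.z for p in cloud) / n)


# Two hypothetical points: one red, one green.
cloud = [Point(0.0, 0.0, 1.0, 255, 0, 0),
         Point(2.0, 0.0, 1.0, 0, 255, 0)]
centre = centroid(cloud)
```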
Sensing Devices
1 Time-Of-Flight Sensors
• LIDAR (Light Detection and Ranging)
• Radar
• Sonar
• TOF Cameras
• PMD (Photonic Mixing Device)
2 Triangulation-based Systems
• Stereo Systems
• Multi-Camera Stereo
3 Light Coding – Structured Light
• Primesense 3D sensor
• Microsoft Kinect
APPLICATIONS IN ROBOTICS
Calibration and Verification
Mapping and Navigation
Object Recognition and Mobile Manipulation
Nadia Figueroa and Jittu Kurian
OBJECT RECOGNITION FOR A MOBILE MANIPULATION PLATFORM
GOAL: Detect and estimate the pose of a wanted object in a table-top scenario.
PROPOSED APPROACH: Use CCD and PMD cameras.
PRE-REQUISITES:
1. Calibration of PMD-CCD Camera Rig
2. Object Database
Pre-Requisite 1: Calibration of PMD-CCD rig
Calibration and camera set-up (CCD-PMD)
• Binocular camera setup of PMD and CCD camera.
• Stereo system calibration method.
  – Mathematically align the 2 cameras in 1 viewing plane.
  – Using epipolar geometry, calculate the essential and fundamental matrices.
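The relation between the two matrices named above can be sketched as follows. This is only an illustration under stated assumptions: the intrinsics `K1` and `K2`, the rotation `R`, and the baseline `t` are made-up values, not the calibration of the actual PMD-CCD rig, and the convention used is X2 = R X1 + t.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def essential_matrix(R, t):
    """E = [t]_x R for a second camera whose coordinates are X2 = R @ X1 + t."""
    return skew(t) @ R


def fundamental_matrix(E, K1, K2):
    """F = K2^-T E K1^-1, so that x2^T F x1 = 0 for matching pixels x1, x2."""
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)


# Hypothetical intrinsics for a high-resolution and a low-resolution camera.
K1 = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[200.0, 0.0, 100.0], [0.0, 200.0, 100.0], [0.0, 0.0, 1.0]])

# Hypothetical relative pose: small rotation about y, 10 cm baseline along x.
c, s = np.cos(0.05), np.sin(0.05)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([0.1, 0.0, 0.0])

F = fundamental_matrix(essential_matrix(R, t), K1, K2)

# Project one 3D point into both cameras and check the epipolar constraint.
X1 = np.array([0.2, -0.1, 1.5])
X2 = R @ X1 + t
x1 = K1 @ X1 / X1[2]
x2 = K2 @ X2 / X2[2]
residual = float(x2 @ F @ x1)
```

The residual is zero up to floating-point error, which is the property a stereo calibration exploits when aligning the two cameras.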
Pre-Requisite 2: Object Database
Object model generation
• Each object is matched with 20 training images.
• The keypoints (SURF) that are repeatedly matched are selected as the “best” keypoints.
• After training each object, we get 100 keypoints per object.
[Figure panels: Object 1, Object 2, Object 3]
PMD Data Flattening and Variance Segmentation Algorithm
[Figure panels: Original PMD, Flattened PMD, Segmented PMD]
DLR’S ROLLIN’ JUSTIN
Built of light-weight structures and joints with mechanical compliances and flexibilities.
(+) Compliant behavior of the arm.
(-) Low positioning accuracy at the TCP (Tool-Center-Point) end pose.
Designed to interact with humans and unknown environments.
How is this low positioning accuracy compensated in this lightweight design?
Using the torque sensors.
(+) An approximation of a joint’s deflection is obtained by:
$\Theta_i = \theta_i + \frac{\tau_i}{K_i}$
$\tau$: measured torque
$K$: stiffness coefficient of the gear
(-) This approximation is insufficient. It cannot measure the remaining mechanical flexibilities.
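As a worked example of the torque-based correction, assuming the formula on this slide reads $\Theta_i = \theta_i + \tau_i / K_i$ (deflection equals torque divided by stiffness). The function name and the numeric values below are hypothetical and only illustrate the arithmetic.

```python
def corrected_joint_position(theta, tau, stiffness):
    """Joint position theta plus the gear deflection tau / stiffness (radians)."""
    return theta + tau / stiffness


# Hypothetical values: 0.5 rad measured position, 10 Nm torque, 10000 Nm/rad gear stiffness.
big_theta = corrected_joint_position(theta=0.5, tau=10.0, stiffness=10000.0)
```

With these numbers the deflection term is 0.001 rad, which shows why the correction matters for a compliant arm yet cannot capture flexibilities outside the gear.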
MASTER THESIS MOTIVATION
Problem
The feasibility of motion planning is highly dependent on the positioning accuracy.
Goal
Create a verification routine to identify the maximum bounds of the TCP positioning errors of humanoid Justin’s upper kinematic chains.
Requirements
1. Avoid using any external sensory system.
2. Avoid any human intervention.
Supervisors: Florian Schmidt and Haider Ali
3D REGISTRATION FOR VERIFICATION OF HUMANOID JUSTIN’S UPPER BODY KINEMATICS
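Proposed Approach: Use the on-board stereo vision system to estimate the TCP end-pose.
TCP measured by forward kinematics:
$TCP = T^{w}_{h} \, T^{h}_{a} \, T^{a}_{tcp}$
TCP measured by stereo vision system:
$TCP = T^{w}_{h} \, T^{h}_{s} \, T^{s}_{tcp}$
TCP End-Pose Error: the difference between the two estimates.
[Figure: frame diagram showing TCP and the transforms $T^{w}_{h}$, $T^{h}_{a}$, $T^{a}_{tcp}$, $T^{h}_{s}$, $T^{s}_{tcp}$]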
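The two transform chains on this slide can be sketched with 4x4 homogeneous matrices. Reading the indices as world (w), head (h), arm (a), and stereo camera (s) is an assumption, as are all numeric values and the particular error metric (norm of the relative translation and angle of the relative rotation); the slides do not state how the end-pose error is computed.

```python
import numpy as np


def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(angle):
    """Rotation of `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def pose_error(T_a, T_b):
    """Translational error (unit of t) and rotational error (radians) between two poses."""
    delta = np.linalg.inv(T_a) @ T_b
    e_t = float(np.linalg.norm(delta[:3, 3]))
    cos_angle = (np.trace(delta[:3, :3]) - 1.0) / 2.0
    e_theta = float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return e_t, e_theta


# Hypothetical forward-kinematics chain: world -> head -> arm -> TCP.
T_wh = make_transform(rot_z(0.3), [0.0, 0.0, 1.2])
T_ha = make_transform(rot_z(-0.2), [0.2, -0.3, 0.0])
T_atcp = make_transform(np.eye(3), [0.5, 0.0, 0.0])
tcp_fk = T_wh @ T_ha @ T_atcp

# Hypothetical stereo chain: world -> head -> stereo -> TCP. The stereo estimate
# is simulated so that it differs from forward kinematics by a known offset.
offset = make_transform(rot_z(0.01), [0.002, 0.0, 0.0])
T_hs = make_transform(np.eye(3), [0.0, 0.05, 0.1])
T_stcp = np.linalg.inv(T_hs) @ T_ha @ T_atcp @ offset
tcp_stereo = T_wh @ T_hs @ T_stcp

e_t, e_theta = pose_error(tcp_fk, tcp_stereo)
```

Here the recovered error equals the simulated offset: 2 mm in translation and 0.01 rad in rotation.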
Data Acquisition
3D point clouds of the hand from the stereo cameras (dense 3D point cloud generated from stereo).
Model Generation
Model generated from an extended metaview registration method from a selected subset of views generated by analyzing the distribution of max/min depth values.
Pose Estimation
Estimate TCP by using registration between a point cloud of the hand and a model.
Registration method evaluation
1. Keypoint extraction (SIFT) & point-to-point correspondence.
2. Local descriptor (FPFH/SHOT/CSHOT) matching using RANSAC-based correspondence search.
Point Cloud Processing
• Pass-through filter (remove background).
• Statistical Outlier Removal (remove outliers).
• Voxel Grid Filter (downsample).
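The three processing steps can be sketched in plain NumPy. This is a simplified illustration, not the implementation used in the thesis: the function names, parameters (`k`, `std_ratio`, `leaf`), and the synthetic data are all assumptions, and the brute-force neighbour search is only suitable for small clouds.

```python
import numpy as np


def pass_through(points, axis, lo, hi):
    """Keep points whose coordinate along `axis` lies in [lo, hi]."""
    mask = (points[:, axis] >= lo) & (points[:, axis] <= hi)
    return points[mask]


def statistical_outlier_removal(points, k=8, std_ratio=1.0):
    """Drop points whose mean distance to their k nearest neighbours is unusually large."""
    diff = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diff, axis=2)
    # Column 0 of each sorted row is the point itself (distance 0), so skip it.
    knn_mean = np.sort(dists, axis=1)[:, 1:k + 1].mean(axis=1)
    threshold = knn_mean.mean() + std_ratio * knn_mean.std()
    return points[knn_mean <= threshold]


def voxel_grid(points, leaf):
    """Replace all points inside the same cubic voxel of edge `leaf` by their centroid."""
    keys = np.floor(points / leaf).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    n_voxels = int(inverse.max()) + 1
    sums = np.zeros((n_voxels, 3))
    np.add.at(sums, inverse, points)
    counts = np.bincount(inverse, minlength=n_voxels)
    return sums / counts[:, None]


# Synthetic scene: a "hand" near z = 0.5 m, background near z = 2 m, one stray point.
rng = np.random.default_rng(0)
hand = rng.uniform(0.0, 0.1, (200, 3)) + np.array([0.0, 0.0, 0.5])
background = rng.uniform(0.0, 0.1, (50, 3)) + np.array([0.0, 0.0, 2.0])
stray = np.array([[0.5, 0.5, 0.6]])
cloud = np.vstack([hand, background, stray])

cropped = pass_through(cloud, axis=2, lo=0.3, hi=1.0)   # removes the background
clean = statistical_outlier_removal(cropped)            # removes the stray point
down = voxel_grid(clean, leaf=0.02)                     # downsamples
```

The order matters: cropping first keeps the pairwise-distance matrix small, and removing outliers before downsampling prevents stray points from surviving as voxel centroids.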
Model Generation
Extended Metaview Registration Method
Consists of 3 steps:
1. Global Thresholding Process: Reject the views that lie in unstable areas.
2. Next Best View Ordering Algorithm: Find an order for incrementally registering the subset of point clouds.
3. Metaview Registration: The resulting subset of views is registered and merged.
Verification Routine
$e_k = \langle e_t, e_\theta \rangle$
$f_k = \mathrm{3dRMS}$
$E = (e_1, \ldots, e_N)$
$F = (f_1, \ldots, f_N)$
$F^* = \mathrm{RANSAC}(F)$
$e_b = \langle \max(e_t \in E^*), \max(e_\theta \in E^*) \rangle$
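One way to read the routine above: each trial k yields an error pair e_k and a registration fitness f_k, RANSAC keeps the trials whose fitness is consistent, and the bound e_b is the maximum error over the kept trials. The slides do not say which model RANSAC fits to F, so the sketch below assumes the simplest one (a constant score with tolerance `tol`); the function names and all numbers are hypothetical.

```python
import random


def ransac_inliers(fitness, tol, iterations=100, seed=0):
    """Indices of the largest set of scores within `tol` of a randomly sampled score."""
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        candidate = rng.choice(fitness)
        inliers = [k for k, f in enumerate(fitness) if abs(f - candidate) <= tol]
        if len(inliers) > len(best):
            best = inliers
    return best


def error_bounds(errors, fitness, tol):
    """e_b = <max e_t, max e_theta> over the trials kept by RANSAC on the fitness scores."""
    kept = ransac_inliers(fitness, tol)
    return (max(errors[k][0] for k in kept),
            max(errors[k][1] for k in kept))


# Hypothetical trials: (translational error, rotational error) and 3D RMS fitness.
errors = [(2.1, 0.5), (2.4, 0.7), (1.9, 0.4), (9.0, 3.0), (2.2, 0.6)]
fitness = [0.8, 0.9, 0.7, 5.0, 0.85]
bounds = error_bounds(errors, fitness, tol=0.5)
```

The fourth trial has a poor registration fitness, so its large error is excluded from the bound rather than being attributed to the kinematics.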
Method Evaluation (Ground Truth)
Pose Estimation using IR ART tracking system (Ground Truth)
ART System Set-up
– Multi-camera setup that estimates the 6DOF pose of the tracking targets.
– Mean accuracy of 0.04 pixels.
– Speed of 100 fps.
Implicit loop closure with tracking system (Ground Truth)
– By expressing $TCP_{fk}$ and $TCP_{reg}$ in the ART coordinate system, a double loop closure is generated.
$TCP_{fk} = T^{art}_{heT} \, T^{heT}_{h} \, T^{h}_{a} \, T^{a}_{tcp}$
$TCP_{reg} = T^{art}_{heT} \, T^{heT}_{h} \, T^{h}_{s} \, T^{s}_{tcp}$
$TCP_{art} = (T^{art}_{heT} \, T^{heT}_{h})^{-1} \, T^{art}_{haT} \, T^{haT}_{tcp}$
§ Error Identification
[Figure: frame diagram showing ART, TCP and the transforms $T^{art}_{heT}$, $T^{art}_{haT}$, $T^{heT}_{h}$, $T^{haT}_{tcp}$, $T^{h}_{a}$, $T^{a}_{tcp}$, $T^{h}_{s}$, $T^{s}_{tcp}$]
Calibration of Tracking Targets to Justin
– The estimation of $TCP_{art}$ relies on the identification of $T^{heT}_{h}$ and $T^{haT}_{tcp}$.
Two-step calibration:
I. Center of Rotation Estimation: non-rigid, geometrically constrained sphere-fitting.
$\min \; u^{T} D^{T} D \, u \quad \text{subject to} \quad u^{T} C \, u = 1$
$u$: spherical fit
$D$: measurements
$C$: spherical constraint
II. Axis of Rotation Estimation: combined plane/circle fitting for each axis.
$\min \; f = \sum_{k=1}^{N} (\delta_k^{2} + \varepsilon_k^{2}), \qquad \varepsilon_k = \|v_k - m\|^{2} - r^{2}$
$\delta_k$: planar
$\varepsilon_k$: radial
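For intuition on the center-of-rotation step, here is a plain algebraic least-squares sphere fit. It is deliberately simpler than the constrained formulation named on the slide (there is no normalisation constraint), so treat it as a stand-in rather than the method used. It relies on rewriting $\|v - m\|^2 = r^2$ as the linear system $\|v\|^2 = 2\, m \cdot v + (r^2 - \|m\|^2)$; the function name and the synthetic trajectory are assumptions.

```python
import numpy as np


def fit_sphere(points):
    """Unconstrained algebraic least-squares sphere fit; returns (centre m, radius r)."""
    A = np.hstack([2.0 * points, np.ones((len(points), 1))])
    b = np.sum(points ** 2, axis=1)
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    centre = solution[:3]
    # The fourth unknown is r^2 - ||m||^2, so the radius follows from it.
    radius = float(np.sqrt(solution[3] + centre @ centre))
    return centre, radius


# Synthetic spherical trajectory around a hypothetical joint centre.
true_centre = np.array([0.1, -0.2, 1.0])
true_radius = 0.3
polar = np.linspace(0.2, 1.2, 6)
azimuth = np.linspace(0.0, 2.0, 8)
pp, aa = np.meshgrid(polar, azimuth)
directions = np.stack([np.sin(pp) * np.cos(aa),
                       np.sin(pp) * np.sin(aa),
                       np.cos(pp)], axis=-1).reshape(-1, 3)
trajectory = true_centre + true_radius * directions

centre, radius = fit_sphere(trajectory)
```

On noise-free data covering only a patch of the sphere, the fit still recovers the centre, which is the situation a tracked target moving around a joint produces.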
Calibration of Tracking Targets to Justin (cont’d)
– Create spherical trajectories around the $head$ and the $TCP$.
– CoR is the position of the joint: $t = [m_x, m_y, m_z]^{T}$
– AoRs are the rotations: $R = [AoR_x, AoR_y, AoR_z]$
– Mounting frames:
$T^{haT}_{tcp} = TCP(R,t)^{-1} \, T^{art}_{haT}$
$T^{heT}_{h} = head(R,t)^{-1} \, T^{art}_{heT}$
[Figure: deviations of $T^{haT}_{tcp}$ and $T^{heT}_{h}$ throughout 10 calibrations]
Experimental Results (Translational Error)
Experimental Results (Rotational Error)
Nadia Figueroa and Haider Ali (DLR)
SEGMENTATION AND POSE ESTIMATION OF PLANAR METALLIC OBJECTS
PROBLEM: Pose estimation of planar metallic objects in a pile.
PROPOSED APPROACH:
(i) Segmentation using Euclidean clustering
(ii) Pose estimation using registration
Data Acquisition
3D point clouds of the pile from a range sensor.
Euclidean Clustering
We extract n clusters C from pile P that represent the planar objects by analyzing the angle deviations between the surface normal vectors.
Cluster Registration
[Figure panels: Model, Positive aligned clusters]
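A small region-growing sketch of clustering by distance and normal deviation: a point joins a cluster when it is within `radius` of a cluster member and their surface normals differ by less than `max_angle`. The function names, thresholds, and the six sample points are assumptions for illustration; the slides do not give the actual parameters or how normals are estimated.

```python
import math
from collections import deque


def angle_between(n1, n2):
    """Angle in radians between two unit normals, ignoring the normal's sign."""
    dot = abs(sum(a * b for a, b in zip(n1, n2)))
    return math.acos(min(1.0, dot))


def euclidean_clusters(points, normals, radius, max_angle):
    """Group points closer than `radius` whose normals deviate less than `max_angle`."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        queue, cluster = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= radius
                    and angle_between(normals[i], normals[j]) <= max_angle]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        clusters.append(sorted(cluster))
    return clusters


# Two touching plates: one flat, one tilted by 45 degrees (hypothetical data).
s = math.sqrt(0.5)
points = [(0.00, 0.0, 0.00), (0.01, 0.0, 0.00), (0.02, 0.0, 0.00),
          (0.03, 0.0, 0.00), (0.04, 0.0, 0.01), (0.05, 0.0, 0.02)]
normals = [(0.0, 0.0, 1.0)] * 3 + [(-s, 0.0, s)] * 3
clusters = euclidean_clusters(points, normals, radius=0.015,
                              max_angle=math.radians(10.0))
```

The third and fourth points are only 1 cm apart, yet they land in different clusters because their normals disagree, which is what separates overlapping planar parts in a pile.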
CONTEXTUAL OBJECT CATEGORY RECOGNITION IN RGB-D SCENES
PROBLEM: Object category recognition in RGB-D data.
PROPOSED APPROACH:
(i) Novel combination of depth and color features.
(ii) Scene segmentation based on table detection and Euclidean clustering.
(iii) Classification results augmented by a context model learnt from social media.
RGB-D Object Features and Classifier
We train a linear SVM on 6 object categories. The accuracy of our classification framework (63.91%) is nearly four times the baseline of a random guess (16.67%).
Multi-object Classification
Kinect Fusion
Uses a Truncated Signed Distance Function (TSDF) to represent the 3D data.
What is a TSDF? A TSDF cloud is a point cloud which makes use of how the data is stored within the GPU at KinFu runtime.
Each element in the grid represents a voxel, and the value inside it represents the TSDF value. The TSDF value is the signed distance to the nearest isosurface, truncated to a fixed maximum magnitude.
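To make the TSDF value concrete, the sketch below computes it for voxels along a single camera ray. It is a simplified illustration, not KinFu's GPU implementation: values are kept in metres rather than normalised, there is no weighting or fusion across frames, and the function name and numbers are assumptions.

```python
def tsdf_along_ray(voxel_depths, surface_depth, truncation):
    """Truncated signed distance from each voxel centre to the surface along one ray."""
    values = []
    for depth in voxel_depths:
        signed = surface_depth - depth  # > 0 in front of the surface, < 0 behind it
        values.append(max(-truncation, min(truncation, signed)))
    return values


# Hypothetical ray: surface measured at 1.0 m, truncation distance 0.25 m.
voxel_depths = [0.5, 0.9, 1.0, 1.1, 1.5]
tsdf = tsdf_along_ray(voxel_depths, surface_depth=1.0, truncation=0.25)
```

The values change sign at the voxel where the surface lies, so the reconstructed surface is the zero crossing of the grid.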
RGB-D KINECT FUSION FOR CONSISTENT RECONSTRUCTIONS OF INDOOR SPACES
Nadia Figueroa, Haiwei Dong and Abdulmotaleb El Saddik
PROBLEM: Generating geometric models of environments for interior design, architecture, and the repair or remodeling of indoor spaces.
PROPOSED APPROACH: RGB-D Kinect Fusion, a combined approach to consistent reconstructions of indoor spaces that pairs Kinect Fusion with 6D RGB-D odometry based on efficient feature matching.
FROM SENSE TO PRINT
Segmentation based on Camera Pose Semantics
• Object on Table Top Segmentation
• Human Bust Segmentation