Nadia Barbara Figueroa Fernandez

3D Computer Vision and Applications in Robotics and Multimedia

Reconstruct your world

Reconstruct yourself

AGENDA

•  BACKGROUND

•  3D COMPUTER VISION

•  APPLICATIONS IN ROBOTICS: Research Projects at TU Dortmund, Master’s Thesis at DLR

•  APPLICATIONS IN MULTIMEDIA: Research Projects at NYU Abu Dhabi

DLR’s Rollin’ Justin Humanoid

BACKGROUND

Education and Research Positions

3D COMPUTER VISION

Fundamentals

1 General Definition

“Ability of powered devices to acquire a real-time picture of the world in three dimensions.” - Wikipedia

2 My Definition

“Generate 3D representations of the world from the viewpoint of a sensor, generally in the form of 3D point clouds.”

3 What is a point cloud?

“A point cloud is a set of points $p \in P$ where $p = (x, y, z, r, g, b)$.”
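As a minimal illustration of this definition, a coloured point cloud can be held as an N×6 array, one row per point. The coordinate unit (metres), the 0-255 colour range and the variable names are assumptions made for this sketch, not part of the slides.

```python
import numpy as np

# Hypothetical 4-point cloud: each row is one point p = (x, y, z, r, g, b).
# Metres for coordinates and 0-255 for colours are illustrative choices.
cloud = np.array([
    [0.0, 0.0, 1.0, 255,   0,   0],
    [0.1, 0.0, 1.0,   0, 255,   0],
    [0.0, 0.1, 1.2,   0,   0, 255],
    [0.1, 0.1, 1.2, 255, 255, 255],
], dtype=np.float64)

xyz = cloud[:, :3]   # geometry of every point
rgb = cloud[:, 3:]   # colour of every point

# Centre of mass of the geometry, a typical first quantity to compute
centroid = xyz.mean(axis=0)
```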

Sensing Devices

1 Time-Of-Flight Sensors

•  LIDAR (Light Detection and Ranging) •  Radar •  Sonar

•  TOF Cameras •  PMD (Photonic Mixing Device)

2 Triangulation-based Systems

•  Stereo Systems

•  Multi-Camera Stereo

3 Light Coding – Structured Light

•  Primesense 3D sensor •  Microsoft Kinect

APPLICATIONS IN ROBOTICS

Calibration and Verification

Mapping and Navigation

Object Recognition and Mobile Manipulation

Nadia Figueroa and Jittu Kurian

OBJECT RECOGNITION FOR A MOBILE MANIPULATION PLATFORM

GOAL: Detect and estimate the pose of a target object in a tabletop scenario.

PROPOSED APPROACH: Use CCD and PMD cameras.

PRE-REQUISITES:

1. Calibration of PMD-CCD Camera Rig

2. Object Database

Pre-Requisite 1: Calibration of PMD-CCD rig


Calibration and camera set-up (CCD-PMD)

•  Binocular camera setup of PMD and CCD camera.

•  Stereo System Calibration Method.

–  Mathematically align the 2 cameras in 1 viewing plane.

–  Using epipolar geometry, calculate essential and fundamental matrices.
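The relation between the two matrices can be sketched as follows, assuming the relative pose of the rig is known and follows the convention X2 = R·X1 + t. The intrinsic matrices `K1` and `K2` and all numeric values are made up for illustration; they are not the calibration results of the actual PMD-CCD rig.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_and_fundamental(R, t, K1, K2):
    """E = [t]_x R and F = K2^-T E K1^-1, for camera-2 points X2 = R @ X1 + t."""
    E = skew(t) @ R
    F = np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)
    return E, F

# Hypothetical rig: small rotation about the y axis and a 10 cm baseline
theta = 0.05
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.1, 0.0, 0.0])
K1 = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])  # assumed CCD intrinsics
K2 = np.array([[200.0, 0.0, 100.0], [0.0, 200.0, 80.0], [0.0, 0.0, 1.0]])   # assumed PMD intrinsics

E, F = essential_and_fundamental(R, t, K1, K2)

# Project one 3D point into both cameras and check the epipolar constraint u2^T F u1 = 0
X1 = np.array([0.2, -0.1, 2.0])
X2 = R @ X1 + t
u1 = K1 @ (X1 / X1[2])
u2 = K2 @ (X2 / X2[2])
residual = float(u2 @ F @ u1)
```

In a real calibration the pose and intrinsics are estimated from checkerboard views rather than given; the sketch only shows how E and F follow from them.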

Pre-Requisite 2: Object Database

Object model generation

•  Each object is matched with 20 training images.

•  The keypoints (SURF) that are repeatedly matched are selected as the “best” keypoints.

•  After training each object, we get 100 keypoints per object.
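One plausible reading of "repeatedly matched" is to count, for every model keypoint, in how many training images it found a match, and to keep the most frequent ones. The sketch below assumes keypoints are identified by integer ids and that the matching itself (SURF descriptors) has already been done; the function name and data layout are hypothetical.

```python
from collections import Counter

def select_best_keypoints(matches_per_image, n_best=100):
    """Rank keypoints by the number of training images that matched them; keep the top n_best."""
    counts = Counter()
    for matched_ids in matches_per_image:
        counts.update(set(matched_ids))  # count each keypoint at most once per image
    # Sort by descending count, then by id so the result is deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [kp_id for kp_id, _ in ranked[:n_best]]

# Toy data: 3 training images instead of 20, and the 2 best keypoints instead of 100
matches = [[1, 2, 3], [2, 3, 4], [3, 5]]
best = select_best_keypoints(matches, n_best=2)
```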

Object 1 Object 2 Object 3

Object Recognition Algorithm


PMD Data Flattening and Variance Segmentation Algorithm


Original PMD

Segmented PMD

Flattened PMD


DLR’S ROLLIN’ JUSTIN

Built of light-weight structures and joints with mechanical compliances and flexibilities.

(+) Compliant behavior of the arm

(-) Low positioning accuracy at the TCP (Tool-Center-Point) end pose.

Designed to interact with humans and unknown environments.

ROLLIN’ JUSTIN’S LOW POSITION ACCURACY

How is this low position accuracy compensated in this lightweight design?

Using the torque sensors.

(+) An approximation of a joint’s deflection is obtained by:

$\Theta_i = \theta_i + \tau_i / K_i$

where $\tau$ is the measured torque and $K$ is the stiffness coefficient of the gear.

(-) This approximation is insufficient. It cannot measure the remaining mechanical flexibilities.

MASTER THESIS MOTIVATION

Problem: The feasibility of motion planning is highly dependent on the positioning accuracy.

Goal: Create a verification routine to identify the maximum bounds of the TCP positioning errors of humanoid Justin’s upper kinematic chains.

Requirements:

1. Avoid using any external sensory system.

2. Avoid any human intervention.

Supervisors: Florian Schmidt and Haider Ali

3D REGISTRATION FOR VERIFICATION OF HUMANOID JUSTIN’S UPPER BODY KINEMATICS

TCP measured by forward kinematics:

$TCP = T^{w}_{h}\, T^{h}_{a}\, T^{a}_{tcp}$

TCP measured by stereo vision system:

$TCP = T^{w}_{h}\, T^{h}_{s}\, T^{s}_{tcp}$

TCP End-Pose Error:
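The slides show the end-pose error only graphically. A common way to quantify the difference between two TCP poses, assumed here rather than taken from the thesis, is the norm of the relative translation and the angle of the relative rotation. Both poses are 4×4 homogeneous transforms; the numeric values are invented.

```python
import numpy as np

def pose_error(T_a, T_b):
    """Translational error (input units) and rotational error (radians) between two 4x4 poses."""
    delta = np.linalg.inv(T_a) @ T_b                  # relative transform from pose a to pose b
    e_t = float(np.linalg.norm(delta[:3, 3]))
    cos_angle = (np.trace(delta[:3, :3]) - 1.0) / 2.0
    e_theta = float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return e_t, e_theta

# Hypothetical poses: forward kinematics gives the identity,
# stereo registration is off by 5 mm and 0.02 rad about z
T_fk = np.eye(4)
angle = 0.02
T_stereo = np.array([[np.cos(angle), -np.sin(angle), 0.0, 0.003],
                     [np.sin(angle),  np.cos(angle), 0.0, 0.004],
                     [0.0,            0.0,           1.0, 0.0],
                     [0.0,            0.0,           0.0, 1.0]])

e_t, e_theta = pose_error(T_fk, T_stereo)
```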

Proposed Approach: Use the on-board stereo vision system to estimate the TCP end-pose.

Data Acquisition: 3D point clouds of the hand from the stereo cameras.

Model Generation: Model generated from an extended metaview registration method from a selected subset of views generated by analyzing the distribution of max/min depth values.

Pose Estimation: Estimate TCP by using registration between a point cloud of the hand and a model.

Registration method evaluation:

1. Keypoint extraction (SIFT) & point-to-point correspondence.

2. Local descriptor (FPFH/SHOT/CSHOT) matching using RANSAC-based correspondence search.

Data Acquisition: Dense 3D point cloud generated from stereo

Point Cloud Processing

•  Pass-through filter (remove background).

•  Statistical Outlier Removal (remove outliers).

•  Voxel Grid Filter (downsample).
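The three processing steps can be sketched in plain NumPy as below. This is a brute-force illustration of the ideas, not the filter implementation used in the thesis; the function names, thresholds and synthetic data are assumptions.

```python
import numpy as np

def pass_through(points, axis, lo, hi):
    """Keep points whose coordinate along `axis` lies in [lo, hi]."""
    mask = (points[:, axis] >= lo) & (points[:, axis] <= hi)
    return points[mask]

def statistical_outlier_removal(points, k=8, std_mul=1.0):
    """Drop points whose mean distance to their k nearest neighbours is unusually large."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)        # brute-force O(N^2) pairwise distances
    knn = np.sort(dist, axis=1)[:, 1:k + 1]    # column 0 is the distance to the point itself
    mean_d = knn.mean(axis=1)
    keep = mean_d <= mean_d.mean() + std_mul * mean_d.std()
    return points[keep]

def voxel_grid(points, leaf):
    """Replace all points that fall into the same cubic voxel by their centroid."""
    idx = np.floor(points / leaf).astype(np.int64)
    _, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    n_voxels = int(inverse.max()) + 1
    sums = np.zeros((n_voxels, 3))
    np.add.at(sums, inverse, points)           # accumulate coordinates per voxel
    counts = np.bincount(inverse, minlength=n_voxels)
    return sums / counts[:, None]

# Synthetic scene: a dense blob (the "hand"), a far background and one stray point
rng = np.random.default_rng(0)
hand = rng.uniform(0.0, 0.1, size=(200, 3)) + np.array([0.0, 0.0, 0.5])
background = rng.uniform(0.0, 1.0, size=(50, 3)) + np.array([0.0, 0.0, 3.0])
stray = np.array([[0.5, 0.5, 0.6]])
raw = np.vstack([hand, background, stray])

cropped = pass_through(raw, axis=2, lo=0.4, hi=0.8)          # removes the background
clean = statistical_outlier_removal(cropped, k=8, std_mul=1.0)  # removes the stray point
down = voxel_grid(clean, leaf=0.02)                          # downsamples
```

For real clouds a spatial index (k-d tree) would replace the O(N²) distance matrix.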

3D Registration Methods


Model Generation


Extended Metaview Registration Method

Consists of 3 steps:

1. Global Thresholding Process: Reject the views that lie in unstable areas.

2. Next Best View Ordering Algorithm: Find an order for incrementally registering the subset of point clouds.

3. Metaview Registration: The resulting subset of views is registered and merged.

Verification Routine

$e_k = \langle e_t, e_\theta \rangle$

$f_k = 3dRMS$

$E = (e_1, \ldots, e_N)$

$F = (f_1, \ldots, f_N)$

$F^* = RANSAC(F)$

$e_b = \langle \max(e_t \in E^*), \max(e_\theta \in E^*) \rangle$
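The slides do not spell out how RANSAC is applied to the fitness scores. The sketch below is one possible reading, stated as an assumption: a 1-D RANSAC keeps the largest group of mutually consistent registration scores, and the error bound is the maximum translational and rotational error among those measurements. All function names, units and numbers are hypothetical.

```python
import numpy as np

def ransac_inliers_1d(values, threshold, iterations=100, seed=0):
    """1-D RANSAC: sample one value as the model, keep the largest set within `threshold` of it."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    best = np.zeros(len(values), dtype=bool)
    for _ in range(iterations):
        model = values[rng.integers(len(values))]
        inliers = np.abs(values - model) <= threshold
        if inliers.sum() > best.sum():
            best = inliers
    return best

def error_bounds(e_t, e_theta, fitness, threshold):
    """Return e_b = (max e_t, max e_theta) over measurements with consistent fitness scores."""
    mask = ransac_inliers_1d(fitness, threshold)
    e_b = (float(np.max(np.asarray(e_t)[mask])), float(np.max(np.asarray(e_theta)[mask])))
    return e_b, mask

# Toy run with N = 5 measurements; the last registration clearly failed
fitness = [1.0, 1.1, 0.9, 1.05, 5.0]   # f_k, e.g. 3D RMS of each registration
e_t = [2.0, 3.0, 2.5, 2.8, 40.0]       # translational errors (assumed mm)
e_theta = [0.5, 0.7, 0.6, 0.4, 9.0]    # rotational errors (assumed degrees)

e_b, mask = error_bounds(e_t, e_theta, fitness, threshold=0.3)
```

The failed registration is excluded, so it does not inflate the reported bound.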


Method Evaluation (Ground Truth)


Pose Estimation using IR ART tracking system (Ground Truth)

ART System Set-up

–  Multi-camera setup that estimates the 6DOF pose of the tracking targets.

–  Mean accuracy of 0.04 pixels.

–  Speed of 100 fps.



Implicit loop closure with tracking system (Ground Truth)

–  By expressing $TCP_{fk}$ and $TCP_{reg}$ in the ART coordinate system, a double loop closure is generated.

$TCP_{fk} = T^{art}_{heT}\, T^{heT}_{h}\, T^{h}_{a}\, T^{a}_{tcp}$

$TCP_{reg} = T^{art}_{heT}\, T^{heT}_{h}\, T^{h}_{s}\, T^{s}_{tcp}$

$TCP_{art} = (T^{art}_{heT}\, T^{heT}_{h})^{-1}\, T^{art}_{haT}\, T^{haT}_{tcp}$

§  Error Identification

Calibration of Tracking targets to Justin

–  The estimation of $TCP_{art}$ relies on the identification of $T^{heT}_{h}$ and $T^{haT}_{tcp}$.

Two step calibration:

I. Center of Rotation Estimation: Non-rigid geometrically constrained sphere-fitting

$\min_u \; u^T D^T D u \quad \text{subject to} \quad u^T C u = 1$

$u$: spherical fit, $D$: measurements, $C$: spherical constraint

II. Axis of Rotations Estimation: Combined plane/circle fitting for each axis.

$\min f = \sum_{k=1}^{N} (\delta_k^2 + \varepsilon_k^2), \qquad \varepsilon_k = \|v_k - m\|^2 - r^2$

$\delta_k$: planar, $\varepsilon_k$: radial
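As a simpler stand-in for the constrained formulation on this slide, a plain algebraic least-squares sphere fit already recovers a center of rotation from points measured on a spherical trajectory. It minimises the sum of squared residuals of the form ‖v_k − m‖² − r². The function name and the synthetic data are assumptions for illustration.

```python
import numpy as np

def fit_sphere(points):
    """Algebraic least-squares sphere fit: returns (centre m, radius r).

    Uses ||v||^2 = 2 v.m + (r^2 - ||m||^2), which is linear in m and in c = r^2 - ||m||^2.
    """
    A = np.hstack([2.0 * points, np.ones((len(points), 1))])
    b = np.sum(points ** 2, axis=1)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    centre = x[:3]
    radius = float(np.sqrt(x[3] + centre @ centre))
    return centre, radius

# Synthetic marker positions on a sphere around a hypothetical joint centre
rng = np.random.default_rng(1)
directions = rng.normal(size=(50, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
true_centre = np.array([0.1, -0.2, 0.3])
true_radius = 0.25
samples = true_centre + true_radius * directions

centre, radius = fit_sphere(samples)
```

With noisy data or a small range of motion the unconstrained fit can be biased, which is one motivation for constrained variants such as the one named on the slide.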


Calibration of Tracking targets to Justin (cont’d)

–  Create spherical trajectories around $head$ and $TCP$.

–  CoR is the position of the joint: $t = [m_x, m_y, m_z]^T$

–  AoRs are the rotations: $R = [AoR_x, AoR_y, AoR_z]$

–  Mounting frames:

$T^{haT}_{tcp} = TCP(R,t)^{-1}\, T^{art}_{haT}$

$T^{heT}_{h} = head(R,t)^{-1}\, T^{art}_{heT}$

–  $T^{haT}_{tcp}$ deviations throughout 10 calibrations.

–  $T^{heT}_{h}$ deviations throughout 10 calibrations.





Experimental Results (Translational Error)


Experimental Results (Rotational Error)


Nadia Figueroa and Haider Ali (DLR)

SEGMENTATION AND POSE ESTIMATION OF PLANAR METALLIC OBJECTS

PROBLEM: Pose estimation of planar metallic objects in a pile.

PROPOSED APPROACH:

(i) Segmentation using Euclidean clustering

(ii) Pose Estimation using Registration

Data Acquisition: 3D point clouds of the pile from a range sensor.

Euclidean Clustering: We extract n clusters C from pile P that represent the planar objects by analyzing the angle deviations between the surface normal vectors.

Cluster Registration: Model, positive aligned clusters.
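The Euclidean clustering step on this slide can be sketched as region growing in which a neighbour joins a cluster only if it is close enough and its surface normal deviates by less than a threshold angle. Normals are assumed to be precomputed unit vectors; the function name, thresholds and toy geometry are illustrative assumptions.

```python
import numpy as np
from collections import deque

def cluster_by_distance_and_normal(points, normals, max_dist, max_angle_deg):
    """Label points by region growing on Euclidean distance and normal-angle deviation."""
    n = len(points)
    cos_min = np.cos(np.radians(max_angle_deg))
    labels = np.full(n, -1, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue                       # already assigned to a cluster
        labels[seed] = current
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            close = np.linalg.norm(points - points[i], axis=1) <= max_dist
            aligned = normals @ normals[i] >= cos_min   # unit normals assumed
            for j in np.flatnonzero(close & aligned & (labels == -1)):
                labels[j] = current
                queue.append(j)
        current += 1
    return labels

# Toy pile: a flat 5x5 patch touching a second patch tilted by 30 degrees
g = np.arange(5) * 0.01
xx, yy = np.meshgrid(g, g)
flat = np.column_stack([xx.ravel(), yy.ravel(), np.zeros(25)])
tilt = np.radians(30.0)
tx = xx.ravel() + 0.05
tilted = np.column_stack([tx, yy.ravel(), (tx - 0.04) * np.tan(tilt)])
points = np.vstack([flat, tilted])
normals = np.vstack([np.tile([0.0, 0.0, 1.0], (25, 1)),
                     np.tile([-np.sin(tilt), 0.0, np.cos(tilt)], (25, 1))])

labels = cluster_by_distance_and_normal(points, normals, max_dist=0.015, max_angle_deg=10.0)
```

The two patches are within the distance threshold of each other, so distance alone would merge them; the normal check is what separates the two planar objects.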

CONTEXTUAL OBJECT CATEGORY RECOGNITION IN RGB-‐D SCENES

PROBLEM: Object category recognition in RGB-D data.

PROPOSED APPROACH:

(i) Novel combination of depth and color features.

(ii) Scene segmentation based on table detection and Euclidean clustering.

(iii) Classification results augmented by a context model learnt from social media.

System Architecture

RGB-D Object Features and Classifier

We train a linear SVM on 6 object categories. The accuracy of our classification framework (63.91%) is roughly four times (about 3.8×) the minimum baseline generated by a random guess (1/6 ≈ 16.67%).

Multi-object Classification

APPLICATIONS IN MULTIMEDIA

World, object, human reconstruction

Rapid Replication (3D printing)

Gaming

Kinect Fusion

Uses Truncated Signed Distance Function (TSDF) to represent the 3D data.

What is a TSDF? A TSDF cloud is a point cloud which makes use of how the data is stored within GPU at KinFu runtime.

Each element in the grid represents a voxel, and the value inside it represents the TSDF value. The TSDF value is the distance to the nearest isosurface.
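To make the voxel-grid idea concrete, the sketch below fills a small grid with the truncated signed distance to an analytic sphere. Kinect Fusion instead integrates distances from depth maps on the GPU; the sphere, the grid size, the sign convention (positive outside, negative inside) and the normalisation to [-1, 1] are assumptions chosen for illustration.

```python
import numpy as np

def tsdf_sphere(grid_size, voxel_size, centre, radius, truncation):
    """TSDF of a sphere on a cubic voxel grid, normalised to [-1, 1]."""
    idx = np.indices((grid_size,) * 3).reshape(3, -1).T
    voxel_centres = (idx + 0.5) * voxel_size
    # Signed distance to the surface: positive outside, negative inside
    signed = np.linalg.norm(voxel_centres - centre, axis=1) - radius
    # Truncate: distances beyond +/- truncation carry no extra information
    tsdf = np.clip(signed / truncation, -1.0, 1.0)
    return tsdf.reshape((grid_size,) * 3)

grid = tsdf_sphere(grid_size=32, voxel_size=1.0 / 32,
                   centre=np.array([0.5, 0.5, 0.5]), radius=0.25, truncation=0.1)

inside = grid[16, 16, 16]        # deep inside the sphere, truncated to -1
outside = grid[0, 0, 0]          # far outside, truncated to +1
near_surface = grid[24, 16, 16]  # within the truncation band, so |value| < 1
```

The surface is the zero crossing of the grid values, which is why only voxels near it need un-truncated distances.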

RGB-D KINECT FUSION FOR CONSISTENT RECONSTRUCTIONS OF INDOOR SPACES

Nadia Figueroa, Haiwei Dong and Abdulmotaleb El Saddik

PROBLEM: Generating geometric models of environments for interior design, architecture, and repair or remodeling of indoor spaces.

PROPOSED APPROACH: RGB-D Kinect Fusion, a combined approach towards consistent reconstructions of indoor spaces based on Kinect Fusion and 6D RGB-D Odometry based on efficient feature matching.

6D RGB-D ODOMETRY

FROM SENSE TO PRINT

Nadia Figueroa, Haiwei Dong and Abdulmotaleb El Saddik

Segmentation based on Camera Pose Semantics

Object on Table Top Segmentation

Human Bust Segmentation

THANK YOU!