
A Multimodal Approach for Face Modeling and Recognition


Page 1: A Multimodal Approach for Face Modeling and Recognition

1

A Multimodal Approach for FaceModeling and Recognition

Advisor: Prof. 萬書言    Student: 何炳杰

Page 2: A Multimodal Approach for Face Modeling and Recognition

2

Outline

Abstract
Introduction
3-D Face Recognition Based On Ridge Images And Iterative Closest Points
2-D Face Recognition Based On Attributed Graphs
Fusing The Information From 2-D And 3-D
Experiments And Results

Page 3: A Multimodal Approach for Face Modeling and Recognition

3

Abstract 1/3

In this paper, we present a fully automated multimodal (3-D and 2-D) face recognition system.

For the 3-D modality, we model the facial image as a 3-D binary ridge image that contains the ridge lines on the face.

We use the maximum principal curvature (k_max) to extract the locations of the ridge lines around the important facial regions of the range image (i.e., the eyes, the nose, and the mouth).

Page 4: A Multimodal Approach for Face Modeling and Recognition

4

Abstract 2/3

For the 2-D modality, we model the face by an attributed relational graph (ARG).

Each node of the graph corresponds to a facial

feature point. At each facial feature point, a set of attributes is extracted by applying Gabor wavelets to the 2-D image and assigned to the node of the graph.

Page 5: A Multimodal Approach for Face Modeling and Recognition

5

Abstract 3/3

Finally, we fuse the matching results of the 3-D and the 2-D modalities at the score level to improve the overall performance of the system.

Page 6: A Multimodal Approach for Face Modeling and Recognition

6

Introduction 1/5

In this paper, we present a multimodal face recognition system that fuses results from both 3-D and 2-D face recognition.

Because the 2-D and the 3-D modeling data in our system are independent of each other, the system can be employed in different face recognition scenarios: 2-D or 3-D face recognition individually, or multimodal face recognition.

Page 7: A Multimodal Approach for Face Modeling and Recognition

7

Introduction 2/5

Fig. 1 illustrates a general block diagram of our system: the 3-D branch produces a 3-D binary ridge image, and the 2-D branch produces an ARG.

Page 8: A Multimodal Approach for Face Modeling and Recognition

8

Introduction 3/5

For the 3-D modality:

(i) we use the maximum principal curvature (k_max) to extract the locations of the ridge lines around the important facial regions in the range image (i.e., the eyes, nose, and mouth);

(ii) we represent the face image as a 3-D binary ridge image that contains the ridge lines on the face;

(iii) in the matching phase, instead of using the entire surface of the face, we match only the ridge lines, which reduces the computation during the matching process.

Page 9: A Multimodal Approach for Face Modeling and Recognition

9

Introduction 4/5

For the 2-D modality, we build an attributed relational graph using nodes at certain labeled facial points.

In order to automatically extract the locations of the facial points, we use an improved version of the active shape model (ASM).

At each node of the graph, we compute the responses of 40 Gabor filters in eight orientations and five wavelengths.

The similarity between the ARG models is employed for 2-D face recognition.

Page 10: A Multimodal Approach for Face Modeling and Recognition

10

Introduction 5/5

In summary, the main contributions of this paper are:

presenting a fully automated algorithm for 3-D face recognition based on the ridge lines of the face;

developing a fully automated algorithm for 2-D face recognition based on attributed relational graph models;

presenting and comparing two methods for the fusion of 2-D and 3-D face recognition, based on the Dempster–Shafer (DS) theory of evidence and the weighted sum of scores technique;

evaluating the performance of the system using the FRGC v2.0 database.

Page 11: A Multimodal Approach for Face Modeling and Recognition

11

3-D Face Recognition Based On Ridge Images And Iterative Closest Points 1/3

A. Ridge Images

Our goal is to extract and use the points lying on ridge lines as the feature points on the surface.

For facial range images, these are points on the lines around the eyes, the nose, and the mouth.

In the literature [13], ridges are defined as the points at which the principal curvature of the surface attains a local positive maximum.

Intuitively, valleys are the points that trace the drainage patterns of the surface; they are referred to as ridges when viewed from the opposite side.

Page 12: A Multimodal Approach for Face Modeling and Recognition

12

Fig. 2 shows an example of a ridge image obtained by thresholding k_max: a 3-D binary image that shows the locations of the ridge lines on the facial surface.
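The ridge-image construction above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the range image is a 2-D NumPy array of depth values with unit grid spacing, estimates the principal curvatures from finite-difference derivatives of the surface, and thresholds k_max (the threshold value here is arbitrary):

```python
import numpy as np

def principal_curvatures(z):
    """Principal curvatures of a range image z = f(x, y) via finite differences."""
    fy, fx = np.gradient(z)           # first derivatives (axis 0, axis 1)
    fxy, fxx = np.gradient(fx)        # second derivatives of fx
    fyy, _ = np.gradient(fy)
    denom = 1.0 + fx**2 + fy**2
    # Gaussian (K) and mean (H) curvature of the graph surface z = f(x, y)
    K = (fxx * fyy - fxy**2) / denom**2
    H = ((1 + fy**2) * fxx - 2 * fx * fy * fxy + (1 + fx**2) * fyy) / (2 * denom**1.5)
    disc = np.sqrt(np.maximum(H**2 - K, 0.0))
    return H + disc, H - disc         # k_max, k_min

def ridge_image(z, thresh):
    """Binary ridge image: points where the max principal curvature is large."""
    k_max, _ = principal_curvatures(z)
    return k_max > thresh
```

With this sign convention, depth valleys (eye and mouth regions of a range image) produce large positive k_max, which matches the slide's remark that valleys become ridges when viewed from the opposite side.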

Page 13: A Multimodal Approach for Face Modeling and Recognition

13

3-D Face Recognition Based On Ridge Images And Iterative Closest Points 2/3

B. Ridge Image Matching

In this work, we use a fast ICP variant [33].

The difference between the ICP used in this paper and the ICP in [33] is in the feature point selection phase: we do not rely on random sampling of the points; instead, we use all of the feature points in the 3-D ridge image during the matching process.

Although random sampling of the points speeds up the matching process, it has a major effect on the accuracy of the final results. (the authors' view)
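A basic ICP loop with exhaustive (non-sampled) correspondence selection might look like the sketch below. This is not the fast variant of [33], just an illustration of the closest-point iteration using all feature points, with the rigid transform solved in closed form by the Kabsch/SVD method:

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, mu_d - R @ mu_s

def icp(src, dst, iters=50):
    """Basic ICP: uses *all* source points each iteration (no random sampling)."""
    tree = cKDTree(dst)
    cur = src.copy()
    for _ in range(iters):
        _, idx = tree.query(cur)      # closest-point correspondences
        R, t = best_rigid_transform(cur, dst[idx])
        cur = cur @ R.T + t
    residual = np.mean(np.linalg.norm(cur - dst[tree.query(cur)[1]], axis=1))
    return cur, residual
```

As in the paper, ICP needs a reasonable initial alignment; the deck describes obtaining one from three facial landmarks before matching.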

Page 14: A Multimodal Approach for Face Modeling and Recognition

14

3-D Face Recognition Based On Ridge Images And Iterative Closest Points 3/3

Before matching the ridge images, we initially align the ridge images using three extracted facial landmarks (i.e., the two inner corners of the eyes and the tip of the nose).

We use a fully automated technique to extract these facial landmarks, based on Gaussian curvature.

Page 15: A Multimodal Approach for Face Modeling and Recognition

15

As shown in Fig. 3, a surface that has either a peak or a pit shape has a positive Gaussian curvature value.
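Gaussian-curvature-based candidate detection can be sketched as follows. The threshold is illustrative, and the paper's full landmark extraction (selecting the two inner eye corners and the nose tip among the candidates) is not reproduced; only the "elliptic points have K > 0" criterion from the slide is shown:

```python
import numpy as np

def gaussian_curvature(z):
    """Gaussian curvature K of a range image z = f(x, y); K > 0 at peaks and pits."""
    fy, fx = np.gradient(z)
    fxy, fxx = np.gradient(fx)
    fyy, _ = np.gradient(fy)
    return (fxx * fyy - fxy**2) / (1.0 + fx**2 + fy**2)**2

def landmark_candidates(z, k_thresh=1e-3):
    """Mask of elliptic surface points (peak- or pit-shaped), as landmark candidates."""
    return gaussian_curvature(z) > k_thresh
```

Saddle-shaped regions, by contrast, have negative Gaussian curvature and are excluded.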

Page 16: A Multimodal Approach for Face Modeling and Recognition

16

Fig. 4 shows a sample range image with the three extracted facial landmarks (i.e., the two inner corners of the eyes and the tip of the nose).

Page 17: A Multimodal Approach for Face Modeling and Recognition

17

2-D Face Recognition Based On Attributed Graphs 1/14

Elastic bunch graph matching (EBGM) represents a facial image by a labeled graph called a bunch graph, in which edges are labeled with distance information and nodes are labeled with wavelet responses bundled in jets.

In addition, bunch graphs are treated as combinatorial entities in which, for each fiducial point, a set of jets from different sample faces is combined, thus creating a highly adaptable model.

Page 18: A Multimodal Approach for Face Modeling and Recognition

18

2-D Face Recognition Based On Attributed Graphs 2/14

In mathematics, a geometric graph is a graph in which the vertices or edges are associated with geometric objects or configurations.

A triangulation is a technique for building a geometric graph.

A Delaunay triangulation is a graph defined from a set of points in the plane by connecting two points with an edge whenever a circle exists that contains only those two points.

Page 19: A Multimodal Approach for Face Modeling and Recognition

19

Delaunay triangulation

Page 20: A Multimodal Approach for Face Modeling and Recognition

20

2-D Face Recognition Based On Attributed Graphs 3/14

In this paper, the goal is to model 2-D facial images by attributed relational graphs.

Page 21: A Multimodal Approach for Face Modeling and Recognition

21

2-D Face Recognition Based On Attributed Graphs 4/14

A. Building the Attributed Graph

An ARG [26] consists of a set of nodes, edges, and mutual relations between them.

Let us denote the ARG by g = (V, E, R), where V = {v_1, v_2, ..., v_N} is the set of N nodes of the graph and E = {e_1, e_2, ..., e_M} is the set of M edges.

The nodes of the graph represent the extracted facial features.

R is a set of mutual relations between the three edges of each triangle in the Delaunay triangulation.

Page 22: A Multimodal Approach for Face Modeling and Recognition

22

2-D Face Recognition Based On Attributed Graphs 5/14

Mathematically, we write R = {r_ijk | e_i, e_j, e_k ∈ D_t(P)}, where D_t(P) is the set of triangles in the Delaunay triangulation.

Recall that a Delaunay triangulation D_t(P) for a set of points P satisfies the condition that no point in P is inside the circumcircle of any triangle in D_t(P).
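The graph skeleton of such an ARG can be sketched with SciPy's Delaunay triangulation. This is only the structural part (nodes V, edges E, and one relation slot per triangle); the Gabor attributes and the actual relation values are omitted:

```python
import numpy as np
from scipy.spatial import Delaunay

def build_arg(points):
    """Graph skeleton of an ARG: node indices, Delaunay edges, per-triangle relations.

    points: (N, 2) array of 2-D facial feature point coordinates.
    """
    tri = Delaunay(points)
    edges = set()
    triangles = []
    for i, j, k in tri.simplices:
        triangles.append((i, j, k))                       # one relation r_ijk per triangle
        edges |= {tuple(sorted(e)) for e in [(i, j), (j, k), (i, k)]}
    return list(range(len(points))), sorted(edges), triangles
```

For N points in convex position, the triangulation has 2N − 2 − h triangles and 3N − 3 − h edges (h = hull points), so the relation set R grows linearly with the number of feature points.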

Page 23: A Multimodal Approach for Face Modeling and Recognition

23

2-D Face Recognition Based On Attributed Graphs 6/14

Here θ specifies the orientation of the wavelet, λ is the wavelength of the sine wave, σ is the radius of the Gaussian, φ is the phase of the sine wave, and γ specifies the aspect ratio of the Gaussian.

The kernels of the Gabor filters are selected at eight orientations, θ ∈ {0, π/8, 2π/8, 3π/8, 4π/8, 5π/8, 6π/8, 7π/8}, and five wavelengths, λ ∈ {1, √2, 2, 2√2, 4}.
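The 40-filter Gabor bank (eight orientations × five wavelengths) can be sketched as below. The kernel size, the σ-from-λ heuristic, and the unit aspect ratio are assumptions for illustration, not the paper's exact parameters:

```python
import numpy as np

def gabor_kernel(theta, lam, sigma=None, gamma=1.0, size=15):
    """Complex Gabor kernel at orientation theta and wavelength lam."""
    sigma = sigma if sigma is not None else 0.56 * lam    # common bandwidth heuristic
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)            # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    return env * np.exp(2j * np.pi * xr / lam)            # Gaussian envelope x carrier

def node_attributes(image, row, col, size=15):
    """40 Gabor magnitudes (8 orientations x 5 wavelengths) at one feature point."""
    thetas = [k * np.pi / 8 for k in range(8)]
    lams = [1, np.sqrt(2), 2, 2 * np.sqrt(2), 4]
    half = size // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    return np.array([abs(np.sum(patch * gabor_kernel(t, l, size=size)))
                     for t in thetas for l in lams])
```

The magnitudes of the 40 complex responses form the attribute vector assigned to each ARG node.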

Page 24: A Multimodal Approach for Face Modeling and Recognition

24

2-D Face Recognition Based On Attributed Graphs 7/14

Specifically, referring to Fig. 5, the mutual relations used in this work are defined to be:

Page 25: A Multimodal Approach for Face Modeling and Recognition

25

2-D Face Recognition Based On Attributed Graphs 8/14

B. Facial Feature Extraction

In this paper, we transform the color image into HSV space and assume that the three channels (i.e., hue, saturation, and value) are statistically independent and that the normalized first derivative of each channel along a profile line follows a multivariate Gaussian distribution.

Page 26: A Multimodal Approach for Face Modeling and Recognition

26

2-D Face Recognition Based On Attributed Graphs 9/14

The best match of a probe sample in HSV color space to a reference model is found by minimizing the distance, where:

g_i is the sample profile;

ḡ_i and Σ_i are the mean and the covariance of the profile line of the i-th component of the Gaussian model, respectively;

w_i is the weighting factor for the i-th component of the model, with the constraint that w_h + w_s + w_v = 1.
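A sketch consistent with the symbols above is a weighted sum of per-channel Mahalanobis distances over the H, S, and V profiles; the paper's exact distance may differ, so treat this as an illustration of the idea:

```python
import numpy as np

def profile_distance(g, means, covs, weights):
    """Weighted Mahalanobis distance of a sample profile to the Gaussian model.

    g, means, covs: per-channel (H, S, V) profile vectors, model means, covariances.
    weights: (w_h, w_s, w_v), constrained to sum to 1.
    """
    d = 0.0
    for gi, mu, cov, w in zip(g, means, covs, weights):
        diff = gi - mu
        d += w * (diff @ np.linalg.solve(cov, diff))  # Mahalanobis term per channel
    return d
```

The probe position along the ASM search profile that minimizes this distance is taken as the best match.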

Page 27: A Multimodal Approach for Face Modeling and Recognition

27

2-D Face Recognition Based On Attributed Graphs 10/14

C. Feature Selection

The number of feature points affects the performance of the graph representation for face recognition.

In this work, we initially extracted 75 feature points and we then used a standard template to add more features at certain positions on the face, such as the cheek and the points on the ridge of the nose.

Page 28: A Multimodal Approach for Face Modeling and Recognition

28

2-D Face Recognition Based On Attributed Graphs 11/14

By using the standard template (Fig. 6), the total number of the feature point candidates represented by the nodes of the ARG model was increased to 111 points.

Page 29: A Multimodal Approach for Face Modeling and Recognition

29

2-D Face Recognition Based On Attributed Graphs 12/14

Fig. 7 shows a sample face in the gallery along with the candidate points for building the ARG model.

Page 30: A Multimodal Approach for Face Modeling and Recognition

30

2-D Face Recognition Based On Attributed Graphs 13/14

D. Recognition

Assume that the ARG models of two faces, g_1 and g_2, are given.

The dissimilarity between these two ARGs is defined as a weighted combination of a node term and a relation term, where (1 − S_v(·)) and D_r(·) are functions that measure the differences between the nodes of the graphs and between the mutual relations of the corresponding triangles from the Delaunay triangulation, respectively, and w_v and w_r are weighting factors.

Page 31: A Multimodal Approach for Face Modeling and Recognition

31

2-D Face Recognition Based On Attributed Graphs 14/14

The similarity measure S_v(·) is defined over the Gabor attributes of corresponding nodes, where a_j is the magnitude of the set of 40 complex coefficients of the Gabor filter responses obtained at the j-th node of the graph.
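One common choice for S_v in EBGM-style matching is the normalized dot product of the magnitude vectors; the sketch below assumes that form (the paper's exact S_v may differ) and takes the relation term D_r as a precomputed cost:

```python
import numpy as np

def jet_similarity(a, b):
    """Normalized correlation of two 40-D Gabor magnitude vectors (in [0, 1])."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def graph_dissimilarity(A, B, rel_diff, wv=0.5, wr=0.5):
    """ARG dissimilarity: weighted node term (1 - S_v) plus relation term D_r.

    A, B: lists of per-node magnitude vectors (corresponding nodes);
    rel_diff: precomputed relation cost D_r; wv, wr: weighting factors.
    """
    node_term = np.mean([1.0 - jet_similarity(a, b) for a, b in zip(A, B)])
    return wv * node_term + wr * rel_diff
```

Identical attribute vectors give S_v = 1, so the node term vanishes and only the relation term contributes.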

Page 32: A Multimodal Approach for Face Modeling and Recognition

32

Fusing The Information From 2-D And 3-D 1/4

The tanh-estimators score normalization is efficient and robust and is defined as

n_j = (1/2) { tanh( 0.01 ( (s_j − μ_GH) / σ_GH ) ) + 1 }

where s_j and n_j are the scores before and after normalization, and μ_GH and σ_GH are the mean and standard deviation estimates, respectively, given by the Hampel estimators.
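The normalization can be sketched directly; the 0.01 scaling constant is the value commonly used with the tanh estimator, and mu/sigma stand for the robust Hampel-based estimates:

```python
import numpy as np

def tanh_normalize(scores, mu, sigma):
    """Tanh-estimator score normalization: maps raw scores into (0, 1).

    mu, sigma: robust (Hampel-based) estimates of the score mean and spread.
    """
    return 0.5 * (np.tanh(0.01 * (np.asarray(scores, dtype=float) - mu) / sigma) + 1.0)
```

A score equal to mu maps to 0.5, and the mapping is monotone, so score rankings are preserved while tails are compressed.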

Page 33: A Multimodal Approach for Face Modeling and Recognition

33

Fusing The Information From 2-D And 3-D 2/4

Hampel estimators are based on the following influence function:

where sign(u) = +1 if u ≥ 0; otherwise, sign(u) = −1.

The Hampel influence function reduces the influence of the scores at the tails of the distribution (identified by a, b, and c).
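The standard three-part Hampel influence function (with cutoffs 0 < a < b < c) that this slide refers to can be sketched as:

```python
import numpy as np

def hampel_psi(u, a, b, c):
    """Hampel three-part influence function: identity in the core, clipped in the
    middle band, linearly redescending to 0 beyond c (tail scores get zero weight)."""
    u = np.asarray(u, dtype=float)
    sign = np.where(u >= 0, 1.0, -1.0)
    au = np.abs(u)
    return np.select(
        [au <= a, au <= b, au <= c],
        [u, a * sign, a * sign * (c - au) / (c - b)],
        default=0.0)
```

Scores far in the tails (|u| > c) are ignored entirely, which is what makes the resulting mean/spread estimates robust to outliers.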

Page 34: A Multimodal Approach for Face Modeling and Recognition

34

Fusing The Information From 2-D And 3-D 3/4

B. Fusion Techniques

The weighted sum score fusion technique is defined as

s = Σ_{j=1..R} w_j s_j^n

where w_j is the weight of the j-th modality, with the condition Σ_{j=1..R} w_j = 1, and s_j^n is the normalized score of the j-th modality.
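The weighted sum rule is a one-liner in practice; the sketch below (the function name is ours) just enforces the sum-to-one constraint on the weights:

```python
import numpy as np

def weighted_sum_fusion(norm_scores, weights):
    """Fused score s = sum_j w_j * s_j^n, with the weights summing to 1."""
    w = np.asarray(weights, dtype=float)
    assert abs(w.sum() - 1.0) < 1e-9, "weights must sum to 1"
    return float(np.asarray(norm_scores, dtype=float) @ w)
```

With two modalities, the fused score is simply w_1 s_1^n + w_2 s_2^n.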

Page 35: A Multimodal Approach for Face Modeling and Recognition

35

Fusing The Information From 2-D And 3-D 4/4

In our case, the weights w_1 and w_2 correspond to the 3-D and the 2-D modalities, respectively.

Another fusion algorithm that we applied to combine the results of the 2-D and 3-D face recognition is the DS theory.

Based on the Dempster rule of combination, the match scores obtained from two different techniques (i.e., two modalities in our work) can be fused.
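The Dempster rule of combination can be sketched for two mass functions over a small frame of discernment (e.g., {genuine, impostor}). This is a generic illustration of the rule, not the paper's exact score-to-mass mapping:

```python
def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset focal elements.

    Multiplies masses of intersecting focal elements and renormalizes by the
    total non-conflicting mass (1 - K, where K is the conflict)."""
    combined, conflict = {}, 0.0
    for B, p in m1.items():
        for C, q in m2.items():
            inter = B & C
            if inter:
                combined[inter] = combined.get(inter, 0.0) + p * q
            else:
                conflict += p * q                  # mass assigned to the empty set
    return {A: v / (1.0 - conflict) for A, v in combined.items()}
```

When both modalities put most of their mass on the same hypothesis, the combined belief in that hypothesis exceeds either individual one, which is the intuition behind DS fusion of match scores.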

Page 36: A Multimodal Approach for Face Modeling and Recognition

36

Experiments And Results 1/5

Fig. 8 shows the results of the verification experiment for neutral versus neutral facial images.

As the ROC curve shows (also the second row of Table II), the 3-D modality has better performance than the 2-D modality (88.5% versus 79.80% verification at 0.1% FAR) and the best verification rate of multimodal (3-D + 2-D) fusion belongs to the DS combination rule (94.49% at 0.1% FAR).

Page 37: A Multimodal Approach for Face Modeling and Recognition

37

Table II

Page 38: A Multimodal Approach for Face Modeling and Recognition

38

Experiments And Results 2/5

Fig. 9 shows the verification rate of the multimodal (3-D + 2-D) fusion, at 0.1% FAR, with respect to different weights for each modality.

Since there are only two modalities, w_1 + w_2 = 1, and the x axis of Fig. 9 is w_1.

Page 39: A Multimodal Approach for Face Modeling and Recognition

39

Experiments And Results 3/5

As the figure shows, the optimum weights that produce the maximum fusion performance are 0.7 and 0.3 for w_1 and w_2, respectively.

Page 40: A Multimodal Approach for Face Modeling and Recognition

40

Experiments And Results 4/5

Fig. 10 shows the average rank-one identification rate for various numbers of subjects enrolled in the database.

Page 41: A Multimodal Approach for Face Modeling and Recognition

41

Experiments And Results 5/5

Fig. 11 shows the cumulative match characteristic (CMC) curve for the recognition, based on ridge images, of faces with expressions using the FRGC v2.0 database.

Page 42: A Multimodal Approach for Face Modeling and Recognition

42

Thank you!