Human behaviour analysis and interpretation based on the video modality: postures, facial expressions and head movements

A. Benoit, L. Bonnaud, A. Caplier, N. Eveno, V. Girondel, Zakia Hammal, M. Rombaut
Introduction
Looking-at-people domain: automatic analysis and interpretation of human actions (gestures, behaviour, expressions).
This needs low-level information (a video analysis step answering "how are things?") and high-level interpretation (a data fusion step answering "what is happening?").
Applications
- multimodal interactions and interfaces (cf. the SIMILAR NoE)
- mixed reality systems
- smart rooms
- smart surveillance systems (hypovigilance detection, detection of distress cases: elderly people surveillance, bus surveillance)
- e-learning assistance
Outline
1. Global posture recognition
2. Facial expressions recognition
3. Head motion analysis and interpretation
4. Conclusion

Which expression? Which head motion? Which posture?
System overview
1. Low-level data extraction:
- segmentation
- temporal tracking
- skin detection and face localization
2. Static posture recognition:
- low-level data
- belief theory: definitions, models, data fusion, decision
3. Results:
- training and test sets
- recognition results
- video sequence
System overview
[Figure: processing pipeline for an indoor scene filmed by a static camera. Video sequence -> segmentation -> temporal tracking -> skin detection / face localization, which together form the low-level data extraction, followed by static posture recognition; example: frame 1030 is classified as "sitting".]
Low-level data: person segmentation

Adaptive background removal algorithm: consecutive frame differences + adaptive reference image.
A. Caplier, L. Bonnaud and J. M. Chassery, "Robust fast extraction of video objects combining frame differences and reference image", in Proc. IEEE International Conference on Image Processing, pp. 785-788, September 2001.
Low-level data computed:
- rectangular bounding box (SRBB)
- principal axes box (SPAB)
- gravity center

[Figure: frame 475 with the SRBB, the SPAB and the gravity center overlaid.]
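A minimal sketch of this idea (not the exact algorithm of the cited paper): the current frame is compared with an adaptive reference image, and the reference is updated only where no motion is detected. `ALPHA`, `DIFF_THRESH` and the function names are illustrative assumptions.

```python
import cv2
import numpy as np

ALPHA = 0.02       # adaptation rate of the reference image (assumed)
DIFF_THRESH = 25   # per-pixel foreground threshold in grey levels (assumed)

def segment_person(frame_gray, reference):
    """Binary foreground mask from the difference with the reference image."""
    diff = cv2.absdiff(frame_gray, reference.astype(np.uint8))
    _, mask = cv2.threshold(diff, DIFF_THRESH, 255, cv2.THRESH_BINARY)
    # Update the reference only on background pixels, so that the person
    # is not progressively absorbed into the reference image.
    cv2.accumulateWeighted(frame_gray.astype(np.float32), reference,
                           ALPHA, mask=cv2.bitwise_not(mask))
    return mask

def low_level_data(mask):
    """SRBB and gravity center of the segmented person (SPAB omitted)."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    srbb = (xs.min(), ys.min(), xs.max(), ys.max())
    center = (xs.mean(), ys.mean())
    return srbb, center
```

The reference here is a float32 array initialised from the first frame; the SPAB would additionally require the principal axes (eigenvectors of the foreground pixel covariance).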
Low-level data: skin detection

Color space YCbCr: no conversion needed, luminance and chrominance are kept apart.
Skin databases: Von Luschan's, plus one acquired with the camera.
Method: thresholding in the CbCr plane, with initial thresholds Cb ∈ [86, 140] and Cr ∈ [139, 175].
[Figure: skin pixel distributions in the Y-Cb and Y-Cr planes.]
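A direct reading of this slide as code, using the initial thresholds given above (the adapted thresholds of the following slides would simply replace them):

```python
import numpy as np

CB_RANGE = (86, 140)    # initial Cb thresholds from the slide
CR_RANGE = (139, 175)   # initial Cr thresholds from the slide

def skin_mask(cb, cr, cb_range=CB_RANGE, cr_range=CR_RANGE):
    """Boolean skin mask from the Cb and Cr planes (2-D uint8 arrays).
    Luminance Y is deliberately ignored, as on the slide."""
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
```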
Low-level data: temporal tracking

Computation of the SRBBs overlap between consecutive frames, matched forward and backward.
Low-level data computed:
- identification numbers (IDs)
- temporal split/merge information

[Figure: SRBB overlap between frames T-1 and T.]
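A hypothetical sketch of ID propagation by SRBB overlap; the slide's forward/backward matching is reduced here to a best-overlap match, and split/merge bookkeeping is only hinted at in the comments.

```python
def overlap(a, b):
    """Intersection area of two boxes given as (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def propagate_ids(prev_boxes, curr_boxes):
    """prev_boxes: {id: box} at frame T-1; curr_boxes: list of boxes at T.
    Returns {id: box} at frame T. Two current boxes matching the same
    previous ID would signal a split; the converse, a merge."""
    out = {}
    next_id = max(prev_boxes, default=-1) + 1
    for box in curr_boxes:
        best = max(prev_boxes,
                   key=lambda i: overlap(prev_boxes[i], box),
                   default=None)
        if best is not None and overlap(prev_boxes[best], box) > 0 \
                and best not in out:
            out[best] = box          # ID kept from the previous frame
        else:
            out[next_id] = box       # new or ambiguous object: new ID
            next_id += 1
    return out
```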
Low-level data: face localization

Automatic threshold adaptation: translation and reduction of the detection intervals towards the Cb and Cr mean values.
Face and hands identification:
- sorted lists of skin patches
- criteria related to temporal tracking and human morphology

V. Girondel, A. Caplier and L. Bonnaud, "Hands Detection and Tracking for Interactive Multimedia Applications", in Proc. International Conference on Computer Vision and Graphics, pp. 282-287.
[Figure: adapted detection box in the Cb-Cr plane; skin patches labelled face, left hand and right hand.]
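A sketch of the adaptation rule as it can be read from the slide: the Cb and Cr detection intervals are translated and shrunk towards the mean chrominance of the currently detected skin pixels. The adaptation rate `LAM` is an assumed parameter.

```python
import numpy as np

LAM = 0.5   # assumed adaptation rate: 0 keeps the interval, 1 collapses it

def adapt_interval(lo, hi, skin_values, lam=LAM):
    """Move both bounds of [lo, hi] towards the mean of the detected
    skin pixel values (applied separately to the Cb and Cr planes)."""
    m = float(np.mean(skin_values))
    return (1 - lam) * lo + lam * m, (1 - lam) * hi + lam * m
```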
System overview
1. Low-level data extraction:
- segmentation
- temporal tracking
- skin detection and face localization
2. Static posture recognition:
- low-level data
- belief theory: definitions, models, data fusion, decision
3. Results:
- training and test sets
- recognition results
- video sequence
Posture recognition: measures
Reference posture: Da Vinci's Vitruvian Man, i.e. standing with the arms stretched horizontally.
Distance measurements D1, D2, D3, D4; underlying ideas: person height and shape compactness.
Normalization: ri = Di / Diref.

[Figure: example frames (860, 112) with the SRBB and SPAB from which the Di are measured.]
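The normalization can be written directly; the exact definition of the four distances is not given on the slide, so the helper below simply assumes they are available as numbers (e.g. derived from the SRBB and SPAB):

```python
def normalized_measures(distances, reference_distances):
    """r_i = D_i / D_i^ref. The references are measured once while the
    person stands in the Vitruvian-Man-like reference posture, which
    makes the r_i largely independent of the person's size."""
    return [d / d_ref for d, d_ref in zip(distances, reference_distances)]
```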
Posture recognition: belief theory

Advantages:
- can use imprecise, conflicting data
- not computationally expensive (compared with HMMs or NNs)

Universe: Ω = {Hi}, i = 1..N, giving 2^N subsets A of Ω.
The hypotheses Hi are mutually exclusive; if they are also exhaustive the universe is closed, otherwise it is open.

Considered postures: standing (H1), sitting (H2), squatting (H3) and lying (H4); one hypothesis is added for unknown postures (H0).
Belief mass distribution m: a confidence degree m(A) in each subset A, with

m : 2^Ω -> [0, 1], Σ_{A⊆Ω} m(A) = 1
Posture recognition: measures evolution

Example for the r1 measurement:

[Figure: evolution of r1 as a function of the frame number.]
Posture recognition: measures modeling

Belief mass distributions m_ri encode the measurements' imprecision.

[Figure: the mass distribution models associated with the normalized measurements ri.]
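The slide's model figure suggests mass distributions with transition zones encoding the measurement imprecision. As an illustration only (the breakpoints are hypothetical, not the trained ones), a piecewise-linear model over two postures and their disjunction:

```python
A, B, C = 0.55, 0.70, 0.85   # hypothetical breakpoints on r_i

def bba_from_measure(r):
    """Basic belief assignment {frozenset of postures: mass}, sum = 1.
    In the transition zones the mass moves through the doubt set."""
    sit, stand = frozenset({"sitting"}), frozenset({"standing"})
    doubt = sit | stand
    if r <= A:
        return {sit: 1.0}
    if r < B:                        # sitting fades into doubt
        t = (r - A) / (B - A)
        return {sit: 1 - t, doubt: t}
    if r < C:                        # doubt fades into standing
        t = (r - B) / (C - B)
        return {doubt: 1 - t, stand: t}
    return {stand: 1.0}
```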
Posture recognition: data fusion
Final belief mass distribution: m_r1234.

Orthogonal sum of Dempster:

m_{ri⊕rj}(C) = Σ_{A∩B=C} m_ri(A) · m_rj(B)

Example: see the sketch below.

Conflict: a non-null belief mass on the empty set reveals contradictory measurements.
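A minimal sketch of the orthogonal sum with set-valued focal elements; the masses in the example are made up, and the conflict appears as the mass of the empty set:

```python
from itertools import product

def dempster(m1, m2):
    """Unnormalized orthogonal sum of two belief mass distributions,
    each represented as {frozenset(hypotheses): mass}."""
    out = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        c = a & b                        # hypotheses compatible with both
        out[c] = out.get(c, 0.0) + wa * wb
    return out

m_r1 = {frozenset({"standing"}): 0.8,
        frozenset({"standing", "sitting"}): 0.2}
m_r2 = {frozenset({"sitting"}): 0.6,
        frozenset({"standing", "sitting"}): 0.4}
m = dempster(m_r1, m_r2)
print(m[frozenset()])   # 0.48: the conflict between the two measurements
```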
System overview
1. Low-level data extraction:
- segmentation
- temporal tracking
- skin detection and face localization
2. Static posture recognition:
- low-level data
- belief theory: definitions, models, data fusion, decision
3. Results:
- training and test sets
- recognition results
- video sequence
Results: implemented system and computing time

Implemented system:
- Sony DFW-VL500 camera
- YCbCr 4:2:0 format, 30 fps, 640x480 resolution
- low-end PC: 1.8 GHz
- unoptimized C++ code

Computing time:
- segmentation: 42%
- temporal tracking: 3%
- skin detection and face localization: 50%
- static posture recognition: 5%

Results are obtained at approximately 11 fps.
Results: databases
Training set:
- 6 video sequences, different persons of various heights
- 10 consecutive postures
- normal postures, facing the camera

Test set:
- 6 video sequences, other persons of various heights
- 7 different postures
- free postures, i.e. moving the arms, sitting sideways
Results: recognition rates
Training set: mean recognition rate of 88.2%.
Test set: mean recognition rate of 80.8%.
Results: video example
Outline
1. Global posture recognition
2. Facial expressions recognition
3. Head motion analysis and interpretation
4. Conclusion

Which expression? Which head motion? Which posture?
Facial expressions analysis
1. Assumptions
2. Facial features segmentation: low-level data
3. Facial expressions recognition: high-level interpretation
4. Facial expression recognition based on audio: a step towards a multimodal system
Assumptions
Facial expression recognition is based on the analysis of the deformations of the permanent facial features (lips, eyes and brows).
6 universal emotions: surprise, joy, disgust, anger, fear, sadness.

Is it possible to recognize the facial expression?
Facial expressions analysis
1. Assumptions and Applications
2. Facial features segmentation: low-level data
3. Facial expressions recognition: high-level interpretation
4. Facial expression recognition based on audio: a step towards a multimodal system
Facial features extraction: models choice

Open eye: circle, parabola, Bezier curve.
Closed eye: line.
Brow: Bezier curve.
External lips: 4 cubic curves, 2 broken lines.

[Figure: the chosen models with their control points (P1-P7).]

The more complex the model, the more possible deformations.
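As an illustration of the Bezier-based models: such a curve is fully defined by a few control points, which is what makes the later deformation step tractable. A generic evaluation by the de Casteljau algorithm (the degree and the control points below are illustrative, not the ones used by the authors):

```python
import numpy as np

def bezier(control_points, n=50):
    """Sample a Bezier curve of arbitrary degree (de Casteljau)."""
    pts = np.asarray(control_points, dtype=float)
    samples = []
    for t in np.linspace(0.0, 1.0, n):
        p = pts.copy()
        while len(p) > 1:                 # repeated linear interpolation
            p = (1 - t) * p[:-1] + t * p[1:]
        samples.append(p[0])
    return np.array(samples)

# A brow-like curve from four hypothetical control points:
brow = bezier([(0, 0), (10, -6), (25, -7), (40, -2)])
```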
Facial features extraction: models initialisation

Detection of characteristic points (eye corners, mouth corners) using luminance and chrominance gradient information.

[Figure: detected characteristic points marked on the eyes and mouth (luminance gradient information).]
Facial features extraction: models deformations (1)

Maximisation of the gradient flow of luminance and/or chrominance through the model curve:

E = Σ_{p ∈ curve} ∇I(p) · n(p)

where ∇I(p) is the image gradient at curve point p and n(p) the normal to the curve at p.
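A sketch of this criterion as reconstructed above, with image gradients obtained by finite differences; `points` and `normals` would come from sampling the current model curve:

```python
import numpy as np

def gradient_flow(image, points, normals):
    """E = sum over curve points p of grad I(p) . n(p).
    image: 2-D luminance (or chrominance) array;
    points: (N, 2) integer (row, col) samples of the model curve;
    normals: (N, 2) unit normals at those samples."""
    gy, gx = np.gradient(image.astype(float))      # d/drow, d/dcol
    r, c = points[:, 0], points[:, 1]
    grads = np.stack([gy[r, c], gx[r, c]], axis=1)
    return float(np.einsum("ij,ij->i", grads, normals).sum())
```

The deformation stage then displaces the control points in the direction that increases E, so the curve locks onto the strong luminance (or chrominance) transitions of the feature boundary.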
Facial features extraction: models deformations (2)

- single control point displacement: gradient flow of luminance
- 2 or 3 control points displacement: gradient flow of luminance
- mouth corners displacement: gradient flow of chrominance and luminance
Facial features extraction: some results

The results show the flexibility and the accuracy of the chosen models.
Facial expressions analysis
1. Assumptions and Applications
2. Facial features segmentation: low-level data
3. Facial expressions recognition: high-level interpretation
4. Facial expression recognition based on audio: a step towards a multimodal system
Facial expressions recognition: recognition on facial skeletons

[Figure: facial skeletons for the six expressions: disgust, fear, anger, surprise, joy, sadness.]
Facial expressions recognition: characteristic distances

Facial feature deformations are described by characteristic distances.

[Figure: the five characteristic distances D1-D5 measured on the face.]

Neutral => the Dni are the reference values.
Joy: {open mouth} => D3 > Dn3 and D4 > Dn4; {mouth corners pulled backwards} => D5 < Dn5; {relaxed brows} => D2 unchanged.
Surprise: {raised brows} => D2 > Dn2; {wide-open eyes} => D1 > Dn1; {open mouth} => D3 < Dn3 and D4 > Dn4.
Facial expressions recognition: distances discretisation

Each distance Di is discretised into symbolic states (Dni: the distance for the neutral expression):
- S: stable, Di close to Dni
- C+: Di significantly higher than Dni
- C-: Di significantly lower than Dni
- doubt states S∨C+ (Di > Dni) and S∨C- (Di < Dni)

[Figure: evolution of D2 (surprise) and of D5 (smile).]
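The doubt states are produced by the belief modelling of the next slides; the basic three-state decision can nonetheless be sketched with an assumed tolerance around the neutral value (the actual thresholds are learned from training data):

```python
EPS = 0.1   # assumed relative dead zone around the neutral distance

def distance_state(d, d_neutral, eps=EPS):
    """Symbolic state of one distance: 'C+', 'C-' or 'S'."""
    if d > d_neutral * (1 + eps):
        return "C+"    # significantly higher than the neutral value
    if d < d_neutral * (1 - eps):
        return "C-"    # significantly lower than the neutral value
    return "S"         # stable, close to the neutral value
```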
Facial expressions recognition: basis of rules

Expression   D1       D2       D3       D4       D5
joy          C-       S / C-   C+       C+       C-
surprise     C+       C+       C-       C+       C+
disgust      C-       C-       S / C+   C+       S / C-
anger        C+       C-       S        S / C-   S
sadness      C-       C+       S        S        S
fear         S / C+   S / C+   S / C-   S / C+   S
neutral      S        S        S        S        S
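The table reads directly as a lookup. A sketch of the matching step (before any belief combination), where entries such as "S / C+" accept either state:

```python
RULES = {   # (D1, D2, D3, D4, D5) states per expression, from the table
    "joy":      ("C-",   "S/C-", "C+",   "C+",   "C-"),
    "surprise": ("C+",   "C+",   "C-",   "C+",   "C+"),
    "disgust":  ("C-",   "C-",   "S/C+", "C+",   "S/C-"),
    "anger":    ("C+",   "C-",   "S",    "S/C-", "S"),
    "sadness":  ("C-",   "C+",   "S",    "S",    "S"),
    "fear":     ("S/C+", "S/C+", "S/C-", "S/C+", "S"),
    "neutral":  ("S",    "S",    "S",    "S",    "S"),
}

def candidate_expressions(states):
    """states: measured (D1..D5) symbolic states for one frame."""
    return [expr for expr, rule in RULES.items()
            if all(s in r.split("/") for r, s in zip(rule, states))]

print(candidate_expressions(("C+", "C+", "C-", "C+", "C+")))  # ['surprise']
```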
Facial expressions recognition: evidence mass distribution and modelling

With each Di is associated the following mass of evidence:

m_Di : 2^Ω -> [0, 1], A ↦ m_Di(A)
Modelling: the thresholds (a to h) related to each Di are estimated after a training step (analysis of the distance evolutions for 4 facial expressions and 13 different persons).
Facial expressions recognition: method principle (1)

1. Measurement of the distances Di and determination of their symbolic states.
2. Computation of the mass distribution for each Di state.
3. With the basis of rules, computation of the evidence mass for each expression and each Di.
4. Combination of the evidence mass distributions, in order to take all the measures into account before taking a decision.
Facial expressions recognition: method principle (2)

Mass of evidence combination (orthogonal sum):

m_{D1⊕D2}(C) = Σ_{A∩B=C} m_D1(A) · m_D2(B)

A, B and C: expressions or subsets of expressions.
Reject class (E8: unknown): an expression which is different from all the expressions described in the rules table.
Facial expressions recognition: method principle (3)
Facial expressions recognition: results (1)

[Figure: example results on the Hammal-Caplier database and on the Cohn-Kanade database.]
Facial expressions recognition: results (2)

neutral: 100%; unknown: 100%; joy: 100%.
The 3 frames between neutral and joy correspond to a transitory expression.
Facial expressions recognition: results (3)

Results on the Hammal-Caplier database (21 samples for each considered expression, 630 images). Classification rates per presented expression (answers in a compound state, e.g. E1∨E3, count towards the total):

Expression      singleton state   Total
joy (E1)        76.36%            87.26%
surprise (E2)   72.44%            84.44%
disgust (E3)    43.10%            51.72%
neutral (E7)    88%               88%

The remaining answers are spread over the other expressions and the reject class (E8: unknown).
Facial expressions analysis
1. Assumptions
2. Facial features segmentation: low-level data
3. Facial expressions recognition: high-level interpretation
4. Facial expression recognition based on audio: a step towards a multimodal system
Facial expressions analysis based on audio (collaboration with Mons University)

Idea: expressions are also characterized in the speech signal => use of statistical speech features such as speech rate, SPI, energy and pitch.

Problem: the expression classes are different. After a preliminary study, 2 classes are suitable for speech: active (joy, surprise, anger) and passive (neutral, sadness).

Perspectives: definition of a multimodal system for facial expression recognition.
Outline
1. Global posture recognition
2. Facial expressions recognition
3. Head motion analysis and interpretation
4. Conclusion

Which expression? Which head motion? Which posture?
Head motion interpretation
1. Introduction
2. Head motion estimation: biological modelling
3. Examples of head motion interpretation
Introduction
Idea: global head motions, such as nods, and local facial motions, such as blinking, are involved in the human-to-human communication process.
Aim: automatic analysis and interpretation of such gestures.
Head motion interpretation
1. Introduction
2. Head motion estimation: biological modelling
3. Examples of head motion interpretation
Head motion estimation: biological modelling
Algorithm overview: human visual system modelling
Head motion estimation: retina filtering
OPL stage: spatio-temporal filtering
- contour enhancement
- noise attenuation
- removal of illumination variations

IPL stage: temporal high-pass filtering dedicated to moving stimuli
- extraction of the moving contours (perpendicular to the motion direction)
- removal of the static contours
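A much simplified stand-in for the two retina stages (a difference of Gaussians for the OPL spatio-temporal filter, a first-order temporal high-pass for the IPL stage); the constants are illustrative, not the parameters of the retina model used by the authors:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def opl(frame, sigma_center=1.0, sigma_surround=3.0):
    """Contour enhancement and removal of slow illumination variations
    via a difference of Gaussians."""
    f = frame.astype(float)
    return gaussian_filter(f, sigma_center) - gaussian_filter(f, sigma_surround)

class IPL:
    """Temporal high-pass: responds to moving contours only; the
    response to static contours decays to zero."""
    def __init__(self, alpha=0.7):
        self.alpha, self.state = alpha, None
    def __call__(self, x):
        if self.state is None:
            self.state = x.copy()
        y = x - self.state                      # change w.r.t. the past
        self.state = self.alpha * self.state + (1 - self.alpha) * x
        return y
```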
Head motion estimation: log-polar spectrum computation

Computation of the spectrum of the retina-filtered image in the log-polar domain => the spectrum becomes easier to analyse:
- roll and zoom = global translations of the energy spectrum
- pan and tilt = local translations of the energy spectrum
- translations = no changes in the energy spectrum from frame to frame
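A sketch of this computation (the sampling resolutions are illustrative): the 2-D amplitude spectrum is resampled on a log-polar grid, and summing over the radius gives the cumulated energy per orientation used in the following slides.

```python
import numpy as np

def logpolar_spectrum(frame, n_theta=180, n_rho=64):
    """Amplitude spectrum resampled on an (orientation, log-radius) grid."""
    f = np.abs(np.fft.fftshift(np.fft.fft2(frame)))
    h, w = f.shape
    cy, cx = h / 2.0, w / 2.0
    theta = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rho = np.exp(np.linspace(0.0, np.log(min(cy, cx) - 1.0), n_rho))
    tt, rr = np.meshgrid(theta, rho, indexing="ij")
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    return f[ys, xs]

def energy_per_orientation(frame):
    """Cumulated spectrum energy per orientation; the abscissas of the
    maxima indicate the orientation of the moving contours."""
    return logpolar_spectrum(frame).sum(axis=1)
```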
Head motion estimation: log-polar spectrum interpretation (1)

Maxima of energy lie on the contours perpendicular to the motion direction => cumulated curve of energy per orientation.
The abscissa of the maxima gives the motion direction; the temporal evolution of the abscissa gives the motion type.
The amplitude of a maximum is proportional to the motion amplitude => the energy decreases, or vanishes, when the motion stops.
Head motion estimation: log-polar spectrum interpretation (2)
Head motion estimation: log-polar spectrum interpretation (3)
Each minimum of energy is related to a motion stop
Head motion estimation: log-polar spectrum interpretation (4)

To summarize, properties of the retina-filtered energy spectrum in the log-polar domain:
- the energy maxima are associated with the contours moving perpendicular to the motion direction; no motion = no energy
- orientation of the energy maxima = motion direction
- movement of the energy maxima over time = motion type
Head motion interpretation
1. Introduction
2. Head motion estimation: biological modelling
3. Examples of head motion interpretation
Head nods of approbation or negation (1)
Idea: detection of periodic head motions ("I am still with you").
Approach: apply a biological head motion detector to the face bounding box and monitor all the head movements.
Goal: recognition of head nods of approbation and negation (see the sketch below):
- approbation: periodic head tilting
- negation: periodic head panning
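A hedged sketch of this detector: the vertical and horizontal motion-energy signals (e.g. the orientation-energy curves sampled near 90 and 0 degrees over time) are tested for a dominant periodic component. The spectral-peak test and its threshold are assumptions.

```python
import numpy as np

def is_periodic(signal, min_ratio=5.0):
    """True if the signal has a dominant non-DC spectral peak."""
    x = np.asarray(signal, dtype=float)
    spec = np.abs(np.fft.rfft(x - x.mean()))
    if spec.size < 3:
        return False
    return spec[1:].max() > min_ratio * np.median(spec[1:])  # skip DC bin

def classify_nod(vertical_energy, horizontal_energy):
    v, h = is_periodic(vertical_energy), is_periodic(horizontal_energy)
    if v and not h:
        return "approbation"    # periodic tilt: 'yes'
    if h and not v:
        return "negation"       # periodic pan: 'no'
    return "none"
```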
Head nods of approbation or negation (2)
Blinking detection
Blink: vertical motion of the eyelid.
Approach: apply a biological motion detector to a bounding box around the eyes.
Yawning detection
Yawn: vertical motion of the mouth.
Approach: apply a biological motion detector to a bounding box around the mouth.
Hypovigilance detection

Hypovigilance shows as:
- short or long eye closings
- multiple head rotations
- frequent yawning

Approach: combine the information coming from the 3 biological motion detectors, as sketched below.
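A hedged sketch of the combination step (the decision rules, window length and thresholds are illustrative assumptions, not the authors' values): each detector emits time-stamped events, and simple rules over a sliding window raise the hypovigilance alarm.

```python
WINDOW_S = 60.0   # assumed sliding window, in seconds

def hypovigilant(blinks, yawns, rotations, now,
                 long_closing_s=0.5, max_yawns=2, max_rotations=4):
    """blinks: list of (time, duration); yawns, rotations: lists of times.
    Returns True if any rule fires inside the last WINDOW_S seconds."""
    recent = lambda ts: [t for t in ts if now - t <= WINDOW_S]
    long_closings = [d for t, d in blinks
                     if now - t <= WINDOW_S and d >= long_closing_s]
    return (len(long_closings) > 0
            or len(recent(yawns)) > max_yawns
            or len(recent(rotations)) > max_rotations)
```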
Outline
1. Global posture recognition
2. Facial expressions recognition
3. Head motion analysis and interpretation
4. Conclusion

Which expression? Which head motion? Which posture?
Conclusion
Human activity analysis and recognition based on video and image data.
Unified approach: extraction of low-level data, then a fusion process for high-level semantic interpretation.
Correlation with applications; example: project 4 of the eNTERFACE Workshop, on attention-level detection of a driver.