Human behaviour analysis and interpretation based on the video modality: postures, facial expressions and head movements

A. Benoit, L. Bonnaud, A. Caplier, N. Eveno, V. Girondel, Zakia Hammal, M. Rombaut
Introduction
Looking-at-people domain: automatic analysis and interpretation of human actions (gestures, behaviour, expressions).
This needs low-level information (a video analysis step answering "how are things?") and high-level interpretation (a data fusion step answering "what is happening?").
Applications
- multimodal interactions and interfaces (cf. the SIMILAR NoE)
- mixed reality systems
- smart rooms
- smart surveillance systems (hypovigilance detection, detection of distress cases: elderly people surveillance, bus surveillance)
- e-learning assistance
Outline
1. Global posture recognition
2. Facial expressions recognition
3. Head motion analysis and interpretation
4. Conclusion

Which expression? Which head motion? Which posture?
System overview
1. Low-level data extraction:
- segmentation
- temporal tracking
- skin detection and face localization
2. Static posture recognition:
- low-level data
- belief theory: definitions, models, data fusion, decision
3. Results:
- training and test sets
- recognition results
- video sequence
System overview
[Figure: processing pipeline for an indoor scene filmed by a static camera. Video sequence -> segmentation -> temporal tracking -> skin detection / face localization, which together form the low-level data extraction, followed by static posture recognition; example: frame 1030 is classified as "sitting".]
Low-level data: person segmentation

Adaptive background removal algorithm: consecutive frame differences + adaptive reference image.
A. Caplier, L. Bonnaud and J. M. Chassery, "Robust fast extraction of video objects combining frame differences and reference image", in Proc. IEEE International Conference on Image Processing, pp. 785-788, September 2001.
Low-level data computed:
- rectangular bounding box (SRBB)
- principal axes box (SPAB)
- gravity center

[Figure: frame 475 with the SRBB, the SPAB and the gravity center overlaid.]
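A minimal sketch of this idea (not the exact algorithm of the cited paper): the current frame is compared with an adaptive reference image, and the reference is updated only where no motion is detected. `ALPHA`, `DIFF_THRESH` and the function names are illustrative assumptions.

```python
import cv2
import numpy as np

ALPHA = 0.02       # adaptation rate of the reference image (assumed)
DIFF_THRESH = 25   # per-pixel foreground threshold in grey levels (assumed)

def segment_person(frame_gray, reference):
    """Binary foreground mask from the difference with the reference image."""
    diff = cv2.absdiff(frame_gray, reference.astype(np.uint8))
    _, mask = cv2.threshold(diff, DIFF_THRESH, 255, cv2.THRESH_BINARY)
    # Update the reference only on background pixels, so that the person
    # is not progressively absorbed into the reference image.
    cv2.accumulateWeighted(frame_gray.astype(np.float32), reference,
                           ALPHA, mask=cv2.bitwise_not(mask))
    return mask

def low_level_data(mask):
    """SRBB and gravity center of the segmented person (SPAB omitted)."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    srbb = (xs.min(), ys.min(), xs.max(), ys.max())
    center = (xs.mean(), ys.mean())
    return srbb, center
```

The reference here is a float32 array initialised from the first frame; the SPAB would additionally require the principal axes (eigenvectors of the foreground pixel covariance).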
Low-level data: skin detection

Color space YCbCr: no conversion needed, luminance and chrominance are kept apart.
Skin databases: Von Luschan's, plus one acquired with the camera.
Method: thresholding in the CbCr plane, with initial thresholds Cb ∈ [86, 140] and Cr ∈ [139, 175].
[Figure: skin pixel distributions in the Y-Cb and Y-Cr planes.]
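A direct reading of this slide as code, using the initial thresholds given above (the adapted thresholds of the following slides would simply replace them):

```python
import numpy as np

CB_RANGE = (86, 140)    # initial Cb thresholds from the slide
CR_RANGE = (139, 175)   # initial Cr thresholds from the slide

def skin_mask(cb, cr, cb_range=CB_RANGE, cr_range=CR_RANGE):
    """Boolean skin mask from the Cb and Cr planes (2-D uint8 arrays).
    Luminance Y is deliberately ignored, as on the slide."""
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
```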
Low-level data: temporal tracking

Computation of the SRBBs overlap between consecutive frames, matched forward and backward.
Low-level data computed:
- identification numbers (IDs)
- temporal split/merge information

[Figure: SRBB overlap between frames T-1 and T.]
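A hypothetical sketch of ID propagation by SRBB overlap; the slide's forward/backward matching is reduced here to a best-overlap match, and split/merge bookkeeping is only hinted at in the comments.

```python
def overlap(a, b):
    """Intersection area of two boxes given as (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def propagate_ids(prev_boxes, curr_boxes):
    """prev_boxes: {id: box} at frame T-1; curr_boxes: list of boxes at T.
    Returns {id: box} at frame T. Two current boxes matching the same
    previous ID would signal a split; the converse, a merge."""
    out = {}
    next_id = max(prev_boxes, default=-1) + 1
    for box in curr_boxes:
        best = max(prev_boxes,
                   key=lambda i: overlap(prev_boxes[i], box),
                   default=None)
        if best is not None and overlap(prev_boxes[best], box) > 0 \
                and best not in out:
            out[best] = box          # ID kept from the previous frame
        else:
            out[next_id] = box       # new or ambiguous object: new ID
            next_id += 1
    return out
```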
Low-level data: face localization

Automatic threshold adaptation: translation and reduction of the detection intervals towards the Cb and Cr mean values.
Face and hands identification:
- sorted lists of skin patches
- criteria related to temporal tracking and human morphology

V. Girondel, A. Caplier and L. Bonnaud, "Hands Detection and Tracking for Interactive Multimedia Applications", in Proc. International Conference on Computer Vision and Graphics, pp. 282-287.
[Figure: adapted detection box in the Cb-Cr plane; skin patches labelled face, left hand and right hand.]
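A sketch of the adaptation rule as it can be read from the slide: the Cb and Cr detection intervals are translated and shrunk towards the mean chrominance of the currently detected skin pixels. The adaptation rate `LAM` is an assumed parameter.

```python
import numpy as np

LAM = 0.5   # assumed adaptation rate: 0 keeps the interval, 1 collapses it

def adapt_interval(lo, hi, skin_values, lam=LAM):
    """Move both bounds of [lo, hi] towards the mean of the detected
    skin pixel values (applied separately to the Cb and Cr planes)."""
    m = float(np.mean(skin_values))
    return (1 - lam) * lo + lam * m, (1 - lam) * hi + lam * m
```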
System overview
1. Low-level data extraction:
- segmentation
- temporal tracking
- skin detection and face localization
2. Static posture recognition:
- low-level data
- belief theory: definitions, models, data fusion, decision
3. Results:
- training and test sets
- recognition results
- video sequence
Posture recognition: measures
Reference posture: Da Vinci's Vitruvian Man, i.e. standing with the arms stretched horizontally.
Distance measurements D1, D2, D3, D4; underlying ideas: person height and shape compactness.
Normalization: ri = Di / Diref.

[Figure: example frames (860, 112) with the SRBB and SPAB from which the Di are measured.]
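The normalization can be written directly; the exact definition of the four distances is not given on the slide, so the helper below simply assumes they are available as numbers (e.g. derived from the SRBB and SPAB):

```python
def normalized_measures(distances, reference_distances):
    """r_i = D_i / D_i^ref. The references are measured once while the
    person stands in the Vitruvian-Man-like reference posture, which
    makes the r_i largely independent of the person's size."""
    return [d / d_ref for d, d_ref in zip(distances, reference_distances)]
```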
Posture recognition: belief theory

Advantages:
- can use imprecise, conflicting data
- not computationally expensive (compared with HMMs or NNs)

Universe: Ω = {Hi}, i = 1..N, giving 2^N subsets A of Ω.
The hypotheses Hi are mutually exclusive; if they are also exhaustive the universe is closed, otherwise it is open.

Considered postures: standing (H1), sitting (H2), squatting (H3) and lying (H4); one hypothesis is added for unknown postures (H0).
Belief mass distribution m: a confidence degree m(A) in each subset A, with

m : 2^Ω -> [0, 1], Σ_{A⊆Ω} m(A) = 1
Posture recognition: measures evolution

Example for the r1 measurement:

[Figure: evolution of r1 as a function of the frame number.]
Posture recognition: measures modeling

Belief mass distributions m_ri encode the measurements' imprecision.

[Figure: the mass distribution models associated with the normalized measurements ri.]
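The slide's model figure suggests mass distributions with transition zones encoding the measurement imprecision. As an illustration only (the breakpoints are hypothetical, not the trained ones), a piecewise-linear model over two postures and their disjunction:

```python
A, B, C = 0.55, 0.70, 0.85   # hypothetical breakpoints on r_i

def bba_from_measure(r):
    """Basic belief assignment {frozenset of postures: mass}, sum = 1.
    In the transition zones the mass moves through the doubt set."""
    sit, stand = frozenset({"sitting"}), frozenset({"standing"})
    doubt = sit | stand
    if r <= A:
        return {sit: 1.0}
    if r < B:                        # sitting fades into doubt
        t = (r - A) / (B - A)
        return {sit: 1 - t, doubt: t}
    if r < C:                        # doubt fades into standing
        t = (r - B) / (C - B)
        return {doubt: 1 - t, stand: t}
    return {stand: 1.0}
```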
Posture recognition: data fusion
Final belief mass distribution: m_r1234.

Orthogonal sum of Dempster:

m_{ri⊕rj}(C) = Σ_{A∩B=C} m_ri(A) · m_rj(B)

Example: see the sketch below.

Conflict: a non-null belief mass on the empty set reveals contradictory measurements.
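A minimal sketch of the orthogonal sum with set-valued focal elements; the masses in the example are made up, and the conflict appears as the mass of the empty set:

```python
from itertools import product

def dempster(m1, m2):
    """Unnormalized orthogonal sum of two belief mass distributions,
    each represented as {frozenset(hypotheses): mass}."""
    out = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        c = a & b                        # hypotheses compatible with both
        out[c] = out.get(c, 0.0) + wa * wb
    return out

m_r1 = {frozenset({"standing"}): 0.8,
        frozenset({"standing", "sitting"}): 0.2}
m_r2 = {frozenset({"sitting"}): 0.6,
        frozenset({"standing", "sitting"}): 0.4}
m = dempster(m_r1, m_r2)
print(m[frozenset()])   # 0.48: the conflict between the two measurements
```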
System overview
1. Low-level data extraction:
- segmentation
- temporal tracking
- skin detection and face localization
2. Static posture recognition:
- low-level data
- belief theory: definitions, models, data fusion, decision
3. Results:
- training and test sets
- recognition results
- video sequence
Results: implemented system and computing time

Implemented system:
- Sony DFW-VL500 camera
- YCbCr 4:2:0 format, 30 fps, 640x480 resolution
- low-end PC: 1.8 GHz
- unoptimized C++ code

Computing time:
- segmentation: 42%
- temporal tracking: 3%
- skin detection and face localization: 50%
- static posture recognition: 5%

Results are obtained at approximately 11 fps.
Results: databases
Training set:
- 6 video sequences, different persons of various heights
- 10 consecutive postures
- normal postures, facing the camera

Test set:
- 6 video sequences, other persons of various heights
- 7 different postures
- free postures, i.e. moving the arms, sitting sideways
Results: recognition rates
Training set: mean recognition rate of 88.2%.
Test set: mean recognition rate of 80.8%.
Results: video example
Outline
1. Global posture recognition
2. Facial expressions recognition
3. Head motion analysis and interpretation
4. Conclusion

Which expression? Which head motion? Which posture?
Facial expressions analysis
1. Assumptions
2. Facial features segmentation: low-level data
3. Facial expressions recognition: high-level interpretation
4. Facial expression recognition based on audio: a step towards a multimodal system
Assumptions
Facial expression recognition is based on the analysis of the deformations of the permanent facial features (lips, eyes and brows).
6 universal emotions: surprise, joy, disgust, anger, fear, sadness.

Is it possible to recognize the facial expression?
Facial expressions analysis
1. Assumptions and Applications
2. Facial features segmentation: low-level data
3. Facial expressions recognition: high-level interpretation
4. Facial expression recognition based on audio: a step towards a multimodal system
Facial features extraction: models choice

Open eye: circle, parabola, Bezier curve.
Closed eye: line.
Brow: Bezier curve.
External lips: 4 cubic curves, 2 broken lines.

[Figure: the chosen models with their control points (P1-P7).]

The more complex the model, the more possible deformations.
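As an illustration of the Bezier-based models: such a curve is fully defined by a few control points, which is what makes the later deformation step tractable. A generic evaluation by the de Casteljau algorithm (the degree and the control points below are illustrative, not the ones used by the authors):

```python
import numpy as np

def bezier(control_points, n=50):
    """Sample a Bezier curve of arbitrary degree (de Casteljau)."""
    pts = np.asarray(control_points, dtype=float)
    samples = []
    for t in np.linspace(0.0, 1.0, n):
        p = pts.copy()
        while len(p) > 1:                 # repeated linear interpolation
            p = (1 - t) * p[:-1] + t * p[1:]
        samples.append(p[0])
    return np.array(samples)

# A brow-like curve from four hypothetical control points:
brow = bezier([(0, 0), (10, -6), (25, -7), (40, -2)])
```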
Facial features extraction: models initialisation

Detection of characteristic points (eye corners, mouth corners) using luminance and chrominance gradient information.

[Figure: detected characteristic points marked on the eyes and mouth (luminance gradient information).]
Facial features extraction: models deformations (1)

Maximisation of the gradient flow of luminance and/or chrominance through the model curve:

E = Σ_{p ∈ curve} ∇I(p) · n(p)

where ∇I(p) is the image gradient at curve point p and n(p) the normal to the curve at p.
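A sketch of this criterion as reconstructed above, with image gradients obtained by finite differences; `points` and `normals` would come from sampling the current model curve:

```python
import numpy as np

def gradient_flow(image, points, normals):
    """E = sum over curve points p of grad I(p) . n(p).
    image: 2-D luminance (or chrominance) array;
    points: (N, 2) integer (row, col) samples of the model curve;
    normals: (N, 2) unit normals at those samples."""
    gy, gx = np.gradient(image.astype(float))      # d/drow, d/dcol
    r, c = points[:, 0], points[:, 1]
    grads = np.stack([gy[r, c], gx[r, c]], axis=1)
    return float(np.einsum("ij,ij->i", grads, normals).sum())
```

The deformation stage then displaces the control points in the direction that increases E, so the curve locks onto the strong luminance (or chrominance) transitions of the feature boundary.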
Facial features extraction: models deformations (2)

- single control point displacement: gradient flow of luminance
- 2 or 3 control points displacement: gradient flow of luminance
- mouth corners displacement: gradient flow of chrominance and luminance
Facial features extraction: some results

The results show the flexibility and the accuracy of the chosen models.
Facial expressions analysis
1. Assumptions and Applications
2. Facial features segmentation: low-level data
3. Facial expressions recognition: high-level interpretation
4. Facial expression recognition based on audio: a step towards a multimodal system
Facial expressions recognition: recognition on facial skeletons

[Figure: facial skeletons for the six expressions: disgust, fear, anger, surprise, joy, sadness.]
Facial expressions recognition: characteristic distances

Facial feature deformations are described by characteristic distances.

[Figure: the five characteristic distances D1-D5 measured on the face.]

Neutral => the Dni are the reference values.
Joy: {open mouth} => D3 > Dn3 and D4 > Dn4; {mouth corners pulled backwards} => D5 < Dn5; {relaxed brows} => D2 unchanged.
Surprise: {raised brows} => D2 > Dn2; {wide-open eyes} => D1 > Dn1; {open mouth} => D3 < Dn3 and D4 > Dn4.
Facial expressions recognition: distances discretisation

Each distance Di is discretised into symbolic states (Dni: the distance for the neutral expression):
- S: stable, Di close to Dni
- C+: Di significantly higher than Dni
- C-: Di significantly lower than Dni
- doubt states S∨C+ (Di > Dni) and S∨C- (Di < Dni)

[Figure: evolution of D2 (surprise) and of D5 (smile).]
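The doubt states are produced by the belief modelling of the next slides; the basic three-state decision can nonetheless be sketched with an assumed tolerance around the neutral value (the actual thresholds are learned from training data):

```python
EPS = 0.1   # assumed relative dead zone around the neutral distance

def distance_state(d, d_neutral, eps=EPS):
    """Symbolic state of one distance: 'C+', 'C-' or 'S'."""
    if d > d_neutral * (1 + eps):
        return "C+"    # significantly higher than the neutral value
    if d < d_neutral * (1 - eps):
        return "C-"    # significantly lower than the neutral value
    return "S"         # stable, close to the neutral value
```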
Facial expressions recognition: basis of rules

Expression   D1       D2       D3       D4       D5
joy          C-       S / C-   C+       C+       C-
surprise     C+       C+       C-       C+       C+
disgust      C-       C-       S / C+   C+       S / C-
anger        C+       C-       S        S / C-   S
sadness      C-       C+       S        S        S
fear         S / C+   S / C+   S / C-   S / C+   S
neutral      S        S        S        S        S
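The table reads directly as a lookup. A sketch of the matching step (before any belief combination), where entries such as "S / C+" accept either state:

```python
RULES = {   # (D1, D2, D3, D4, D5) states per expression, from the table
    "joy":      ("C-",   "S/C-", "C+",   "C+",   "C-"),
    "surprise": ("C+",   "C+",   "C-",   "C+",   "C+"),
    "disgust":  ("C-",   "C-",   "S/C+", "C+",   "S/C-"),
    "anger":    ("C+",   "C-",   "S",    "S/C-", "S"),
    "sadness":  ("C-",   "C+",   "S",    "S",    "S"),
    "fear":     ("S/C+", "S/C+", "S/C-", "S/C+", "S"),
    "neutral":  ("S",    "S",    "S",    "S",    "S"),
}

def candidate_expressions(states):
    """states: measured (D1..D5) symbolic states for one frame."""
    return [expr for expr, rule in RULES.items()
            if all(s in r.split("/") for r, s in zip(rule, states))]

print(candidate_expressions(("C+", "C+", "C-", "C+", "C+")))  # ['surprise']
```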
Facial expressions recognition: evidence mass distribution and modelling

With each Di is associated the following mass of evidence:

m_Di : 2^Ω -> [0, 1], A ↦ m_Di(A)
Modelling: the thresholds (a to h) related to each Di are estimated after a training step (analysis of the distance evolutions for 4 facial expressions and 13 different persons).
Facial expressions recognition: method principle (1)

1. Measurement of the distances Di and determination of their symbolic states.
2. Computation of the mass distribution for each Di state.
3. With the basis of rules, computation of the evidence mass for each expression and each Di.
4. Combination of the evidence mass distributions, in order to take all the measures into account before taking a decision.
Facial expressions recognition: method principle (2)

Mass of evidence combination (orthogonal sum):

m_{D1⊕D2}(C) = Σ_{A∩B=C} m_D1(A) · m_D2(B)

A, B and C: expressions or subsets of expressions.
Reject class (E8: unknown): an expression which is different from all the expressions described in the rules table.
Facial expressions recognition: method principle (3)
Facial expressions recognition: results (1)

[Figure: example results on the Hammal-Caplier database and on the Cohn-Kanade database.]
Facial expressions recognition: results (2)

neutral: 100%; unknown: 100%; joy: 100%.
The 3 frames between neutral and joy correspond to a transitory expression.
Facial expressions recognition: results (3)

Results on the Hammal-Caplier database (21 samples for each considered expression, 630 images). Classification rates per presented expression (answers in a compound state, e.g. E1∨E3, count towards the total):

Expression      singleton state   Total
joy (E1)        76.36%            87.26%
surprise (E2)   72.44%            84.44%
disgust (E3)    43.10%            51.72%
neutral (E7)    88%               88%

The remaining answers are spread over the other expressions and the reject class (E8: unknown).
Facial expressions analysis
1. Assumptions
2. Facial features segmentation: low-level data
3. Facial expressions recognition: high-level interpretation
4. Facial expression recognition based on audio: a step towards a multimodal system
Facial expressions analysis based on audio (collaboration with Mons University)

Idea: expressions are also characterized in the speech signal => use of statistical speech features such as speech rate, SPI, energy and pitch.

Problem: the expression classes are different. After a preliminary study, 2 classes are suitable for speech: active (joy, surprise, anger) and passive (neutral, sadness).

Perspectives: definition of a multimodal system for facial expression recognition.
Outline
1. Global posture recognition
2. Facial expressions recognition
3. Head motion analysis and interpretation
4. Conclusion

Which expression? Which head motion? Which posture?
Head motion interpretation
1. Introduction
2. Head motion estimation: biological modelling
3. Examples of head motion interpretation
Introduction
Idea: global head motions, such as nods, and local facial motions, such as blinking, are involved in the human-to-human communication process.
Aim: automatic analysis and interpretation of such gestures.
Head motion interpretation
1. Introduction
2. Head motion estimation: biological modelling
3. Examples of head motion interpretation
Head motion estimation: biological modelling
Algorithm overview: human visual system modelling
Head motion estimation: retina filtering
OPL stage: spatio-temporal filtering
- contour enhancement
- noise attenuation
- removal of illumination variations

IPL stage: temporal high-pass filtering dedicated to moving stimuli
- extraction of the moving contours (perpendicular to the motion direction)
- removal of the static contours
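A much simplified stand-in for the two retina stages (a difference of Gaussians for the OPL spatio-temporal filter, a first-order temporal high-pass for the IPL stage); the constants are illustrative, not the parameters of the retina model used by the authors:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def opl(frame, sigma_center=1.0, sigma_surround=3.0):
    """Contour enhancement and removal of slow illumination variations
    via a difference of Gaussians."""
    f = frame.astype(float)
    return gaussian_filter(f, sigma_center) - gaussian_filter(f, sigma_surround)

class IPL:
    """Temporal high-pass: responds to moving contours only; the
    response to static contours decays to zero."""
    def __init__(self, alpha=0.7):
        self.alpha, self.state = alpha, None
    def __call__(self, x):
        if self.state is None:
            self.state = x.copy()
        y = x - self.state                      # change w.r.t. the past
        self.state = self.alpha * self.state + (1 - self.alpha) * x
        return y
```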
Head motion estimation: log-polar spectrum computation

Computation of the spectrum of the retina-filtered image in the log-polar domain => the spectrum becomes easier to analyse:
- roll and zoom = global translations of the energy spectrum
- pan and tilt = local translations of the energy spectrum
- translations = no changes in the energy spectrum from frame to frame
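A sketch of this computation (the sampling resolutions are illustrative): the 2-D amplitude spectrum is resampled on a log-polar grid, and summing over the radius gives the cumulated energy per orientation used in the following slides.

```python
import numpy as np

def logpolar_spectrum(frame, n_theta=180, n_rho=64):
    """Amplitude spectrum resampled on an (orientation, log-radius) grid."""
    f = np.abs(np.fft.fftshift(np.fft.fft2(frame)))
    h, w = f.shape
    cy, cx = h / 2.0, w / 2.0
    theta = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rho = np.exp(np.linspace(0.0, np.log(min(cy, cx) - 1.0), n_rho))
    tt, rr = np.meshgrid(theta, rho, indexing="ij")
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    return f[ys, xs]

def energy_per_orientation(frame):
    """Cumulated spectrum energy per orientation; the abscissas of the
    maxima indicate the orientation of the moving contours."""
    return logpolar_spectrum(frame).sum(axis=1)
```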
Head motion estimation: log-polar spectrum interpretation (1)

Maxima of energy lie on the contours perpendicular to the motion direction => cumulated curve of energy per orientation.
The abscissa of the maxima gives the motion direction; the temporal evolution of the abscissa gives the motion type.
The amplitude of a maximum is proportional to the motion amplitude => the energy decreases, or vanishes, when the motion stops.
Head motion estimation: log-polar spectrum interpretation (2)
Head motion estimation: log-polar spectrum interpretation (3)
Each minimum of energy is related to a motion stop
Head motion estimation: log-polar spectrum interpretation (4)

To summarize, properties of the retina-filtered energy spectrum in the log-polar domain:
- the energy maxima are associated with the contours moving perpendicular to the motion direction; no motion = no energy
- orientation of the energy maxima = motion direction
- movement of the energy maxima over time = motion type
Head motion interpretation
1. Introduction
2. Head motion estimation: biological modelling
3. Examples of head motion interpretation
Head nods of approbation or negation (1)
Idea: detection of periodic head motions ("I am still with you").
Approach: apply a biological head motion detector to the face bounding box and monitor all the head movements.
Goal: recognition of head nods of approbation and negation (see the sketch below):
- approbation: periodic head tilting
- negation: periodic head panning
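A hedged sketch of this detector: the vertical and horizontal motion-energy signals (e.g. the orientation-energy curves sampled near 90 and 0 degrees over time) are tested for a dominant periodic component. The spectral-peak test and its threshold are assumptions.

```python
import numpy as np

def is_periodic(signal, min_ratio=5.0):
    """True if the signal has a dominant non-DC spectral peak."""
    x = np.asarray(signal, dtype=float)
    spec = np.abs(np.fft.rfft(x - x.mean()))
    if spec.size < 3:
        return False
    return spec[1:].max() > min_ratio * np.median(spec[1:])  # skip DC bin

def classify_nod(vertical_energy, horizontal_energy):
    v, h = is_periodic(vertical_energy), is_periodic(horizontal_energy)
    if v and not h:
        return "approbation"    # periodic tilt: 'yes'
    if h and not v:
        return "negation"       # periodic pan: 'no'
    return "none"
```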
Head nods of approbation or negation (2)
Blinking detection
Blink: vertical motion of the eyelid.
Approach: apply a biological motion detector to a bounding box around the eyes.
Yawning detection
Yawn: vertical motion of the mouth.
Approach: apply a biological motion detector to a bounding box around the mouth.
Hypovigilance detection

Hypovigilance shows as:
- short or long eye closings
- multiple head rotations
- frequent yawning

Approach: combine the information coming from the 3 biological motion detectors, as sketched below.
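A hedged sketch of the combination step (the decision rules, window length and thresholds are illustrative assumptions, not the authors' values): each detector emits time-stamped events, and simple rules over a sliding window raise the hypovigilance alarm.

```python
WINDOW_S = 60.0   # assumed sliding window, in seconds

def hypovigilant(blinks, yawns, rotations, now,
                 long_closing_s=0.5, max_yawns=2, max_rotations=4):
    """blinks: list of (time, duration); yawns, rotations: lists of times.
    Returns True if any rule fires inside the last WINDOW_S seconds."""
    recent = lambda ts: [t for t in ts if now - t <= WINDOW_S]
    long_closings = [d for t, d in blinks
                     if now - t <= WINDOW_S and d >= long_closing_s]
    return (len(long_closings) > 0
            or len(recent(yawns)) > max_yawns
            or len(recent(rotations)) > max_rotations)
```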
Outline
1. Global posture recognition
2. Facial expressions recognition
3. Head motion analysis and interpretation
4. Conclusion

Which expression? Which head motion? Which posture?
Conclusion
Human activity analysis and recognition based on video and image data.
Unified approach: extraction of low-level data, then a fusion process for high-level semantic interpretation.
Correlation with applications; example: project 4 of the eNTERFACE Workshop, on attention-level detection of a driver.