HCI Course - Final Project

Page 1: HCI Course - Final Project

HCI Course - Final Project
Lecturer: Dr. 連震杰
Student ID: P78971252
Student Name: 傅夢璇
Mail address: [email protected]
Department: Information Engineering
Laboratory: ISMP Lab
Supervisor: Dr. 郭耀煌

Page 2: HCI Course - Final Project

Final Project
•Title: Real-time classification of evoked emotions using facial feature tracking and physiological responses
▫Jeremy N. Bailenson, Emmanuel D. Pontikakis, Iris B. Mauss, James J. Gross, Maria E. Jabon, Cendri A.C. Hutcherson, Clifford Nass, Oliver John
▫International Journal of Human-Computer Studies, Volume 66, Issue 5, May 2008, pp. 303-317
•Keywords: Facial tracking; Emotion; Computer vision

Page 3: HCI Course - Final Project

Abstract
•The real-time models use videotapes of subjects' faces and physiological measurements to predict rated emotion
•The data come from the real behavior of subjects watching emotional videos, rather than from actors making deliberate facial poses
•The models predict both the type of emotion (amusement vs. sadness) and its intensity level
•Results demonstrated a better fit for discrete emotion categories, for amusement ratings of emotion type, and for person-specific models

Page 4: HCI Course - Final Project

Introduction
•Cameras are constantly capturing images of a person's face, in cell phones, webcams, and even automobiles, often with the goal of using that facial information as a clue to the user's current state of mind
•Many car companies are installing cameras in the dashboard with the goal of detecting angry, drowsy, or drunk drivers
•The goals are to assist human-computer interfaces and to understand emotions from facial expressions

Page 5: HCI Course - Final Project

Related work
•There are at least three main ways in which psychologists assess facial expressions of emotion (Rosenberg and Ekman, 2000)
▫1st: naïve coders view images or videotapes, and then make holistic judgments concerning the degree of emotion shown on the target faces
▫This approach is limited in that the coders may miss subtle facial movements, and the coding may be biased by idiosyncratic morphological features of various faces

Page 6: HCI Course - Final Project

Related work
▫2nd: use componential coding schemes, in which trained coders apply a highly regulated procedural technique to detect facial actions, such as the Facial Action Coding System (Ekman and Friesen, 1978)
▫An advantage of this technique is the richness of the resulting dataset
▫The disadvantage is that frame-by-frame coding of the points is extremely laborious

Page 7: HCI Course - Final Project

Related work
▫3rd: obtain more direct measures of muscle movement via facial electromyography (EMG), with electrodes attached to the skin of the face
▫Although this allows sensitive measurement of features, placing the electrodes is difficult and relatively constraining for the subjects who wear them
▫This approach is also not helpful for coding archival footage

Page 8: HCI Course - Final Project

The approach (Part 1 - Actual facial emotion)
•First, the stimuli used as input
▫Input: videotapes chosen to elicit intense emotions from the people watching them
•Actual emotional behavior is better captured than in studies that used deliberately posed faces
•Automatic facial expressions appear to be more informative about underlying mental states than posed ones

Page 9: HCI Course - Final Project

The approach (Part 2 - Opposite emotions and intensity)
•Second, the emotions were coded second-by-second on a linear scale for the oppositely valenced emotions of amusement and sadness
•The learning algorithms are trained both on a binary set of data and on a linear set of data spanning the full scale of emotional intensity

Page 10: HCI Course - Final Project

The approach (Part 3 - Three model types)
•Hundreds of video frames rated individually for amusement and sadness were collected from each person, enabling three model types to be created
▫1st is a "universal model", which predicts how amused any face is by using one set of subjects' faces as training data and another independent set of subjects' faces as testing data
▫This model would be useful for HCI applications in bank automated teller machines, traffic-light cameras, and public computers with webcams

Page 11: HCI Course - Final Project

The approach (Part 3 - Three model types)
▫2nd is an "idiosyncratic model", which predicts how amused or sad a face is by using training and testing data from the same subject for each model
▫This model is useful for HCI applications where the same person uses the same interface
▫For example, driving one's own car, using the same computer with a webcam, or any application with a camera in a private home

Page 12: HCI Course - Final Project

The approach (Part 3 - Three model types)
▫3rd is a gender-specific model, trained and tested using only data from subjects of the same gender
▫This model is useful for HCI applications that target a specific gender
▫For example, make-up advertisements directed at female consumers, or home repair advertisements targeted at males (see the sketch below)
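The three model types differ only in how training and testing data are split. A minimal sketch of the three splits, assuming a hypothetical pandas DataFrame with one row per rated second and "subject" and "gender" columns (a layout I am assuming, not the authors' code):

```python
# Sketch of the three train/test split strategies described above.
# Assumes a DataFrame `df` with one row per rated second and columns
# "subject" and "gender" plus feature/label columns (hypothetical layout).
import pandas as pd

def universal_split(df: pd.DataFrame, train_subjects: set, test_subjects: set):
    """Universal model: train on one set of subjects, test on an independent set."""
    return (df[df["subject"].isin(train_subjects)],
            df[df["subject"].isin(test_subjects)])

def idiosyncratic_split(df: pd.DataFrame, subject: str, train_frac: float = 0.5):
    """Idiosyncratic model: train and test on different seconds of the same subject."""
    rows = df[df["subject"] == subject]
    n_train = int(len(rows) * train_frac)
    return rows.iloc[:n_train], rows.iloc[n_train:]

def gender_split(df: pd.DataFrame, gender: str, train_subjects: set, test_subjects: set):
    """Gender-specific model: both train and test sets restricted to one gender."""
    rows = df[df["gender"] == gender]
    return (rows[rows["subject"].isin(train_subjects)],
            rows[rows["subject"].isin(test_subjects)])
```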

Page 13: HCI Course - Final Project

The approach (Part 4 - Features)
•Physiological responses
▫Cardiovascular activity
▫Electrodermal responding
▫Somatic activity
•Facial features from a camera
•Heart rate from the hands gripping the steering wheel

Page 14: HCI Course - Final Project

The approach (Part 5 - Real-time algorithm)
•Computer vision algorithms detecting facial features
•Real-time physiological measures
•Applications can respond to a user's emotion to improve the interaction
•For example, cars that seek to avoid accidents caused by drowsy drivers, or advertisements that seek to match their content to the mood of a person walking by

Page 15: HCI Course - Final Project

The approach
•The emotions of amusement and sadness and the accompanying physiological responses were collected in order to sample both positive and negative emotions
•Only two emotions were chosen, since increasing the number of emotions would come at the cost of sacrificing the reliability of the ratings
•The selected films induced dynamic changes in emotional state over the 9-min period, ranging from neutral to more intense emotional states, because different individuals responded to the films with different degrees of intensity

Page 16: HCI Course - Final Project

Data collection
•Training data
▫Taken from 151 Stanford undergraduates who watched movies pretested to elicit amusement and sadness; their physiological responses were also assessed while they watched

Page 17: HCI Course - Final Project

Data collection
•Laboratory session
▫The participants watched a 9-min film clip
▫The film was composed of an amusing, a neutral, a sad, and another neutral segment (each segment was approximately 2 min long)
▫From the larger dataset of 151 subjects, 41 were randomly chosen to train and test the learning algorithms

Page 18: HCI Course - Final Project

Expert ratings of emotions
•Trained raters used laboratory software to rate the amount of amusement and sadness displayed in each second of video
•The scale was anchored at 0 for neutral and at 8 for strong laughter (amusement) or a strong expression of sadness
•Average inter-rater reliabilities were satisfactory, with Cronbach's alpha = 0.89 (S.D. = 0.13) for amusement behavior and 0.79 (S.D. = 0.11) for sadness behavior
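The Cronbach's alpha figures above measure agreement among the raters. For reference, a minimal sketch of the standard formula, treating each rater as one column (not the authors' implementation):

```python
# Minimal sketch of Cronbach's alpha for inter-rater reliability
# (standard formula; not the authors' implementation).
# `ratings` has shape (n_seconds, n_raters): one column per rater.
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    n_raters = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1).sum()  # sum of per-rater variances
    total_var = ratings.sum(axis=1).var(ddof=1)    # variance of the summed score
    return (n_raters / (n_raters - 1)) * (1.0 - item_vars / total_var)
```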

Page 19: HCI Course - Final Project

Physiological measures
•15 physiological measures were monitored
▫Heart rate
▫Systolic blood pressure
▫Diastolic blood pressure
▫Mean arterial blood pressure
▫Pre-ejection period
▫Skin conductance level
▫Finger temperature
▫Finger pulse amplitude
▫Finger pulse transit time
▫Ear pulse transit time
▫Ear pulse amplitude
▫Composite of peripheral sympathetic activation
▫Composite cardiac activation
▫Somatic activity

Page 20: HCI Course - Final Project

System structure
•The videos of the 41 participants were analyzed at a resolution of 20 frames per second
•The level of amusement/sadness of every person for every second was rated from 0 (less amused/sad) to 8 (more amused/sad)
•The goal is to predict, for every individual second, the level of amusement or sadness of every person
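The frame rate (20 fps) is finer than the per-second labels, so frame-level features must be aligned with the one-rating-per-second ground truth. One plausible alignment, averaging the 20 frames inside each second, is sketched below; the exact mapping the authors used is not stated on these slides, so this is an assumption:

```python
# Sketch: collapsing 20 frames/s of tracked features into one vector per
# rated second, so frames line up with the per-second 0-8 labels.
# `frame_feats` shape: (n_frames, n_features), captured at 20 fps (assumed layout).
import numpy as np

FPS = 20

def per_second_features(frame_feats: np.ndarray) -> np.ndarray:
    n_seconds = frame_feats.shape[0] // FPS
    clipped = frame_feats[: n_seconds * FPS]         # drop any trailing partial second
    return clipped.reshape(n_seconds, FPS, -1).mean(axis=1)  # mean over each second
```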

Page 21: HCI Course - Final Project

System structure
•[Diagram: inputs → processing → outputs]

Page 22: HCI Course - Final Project

Measuring the facial expressions
•For measuring the facial expression of the person at every frame, the NEVEN Vision Facial Feature Tracker is used
•22 points are tracked on the face, grouped into four blocks: mouth, nose, eyes, and eyebrows
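A minimal sketch of one way to represent that grouping in code; the point indices below are hypothetical, since the tracker's actual point numbering is not given on these slides:

```python
# Hypothetical grouping of the 22 tracked points into the four facial
# blocks named above (the tracker's real point indices are not given here).
FACE_REGIONS = {
    "mouth":    list(range(0, 8)),    # assumed indices
    "nose":     list(range(8, 11)),
    "eyes":     list(range(11, 19)),
    "eyebrows": list(range(19, 22)),
}
assert sum(len(pts) for pts in FACE_REGIONS.values()) == 22
```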

Page 23: HCI Course - Final Project

Chi-square values in amusement
•[Table: top 20 features in the amusement analysis, ranked by chi-square value]

Page 24: HCI Course - Final Project

Chi-square values in sadness
•[Table: top 20 features in the sadness analysis, ranked by chi-square value]
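The two tables above rank features by their chi-square scores against the emotion labels. A minimal sketch of such a ranking, using scikit-learn's chi2 as a stand-in for whatever tooling the authors used (feature names and min-max scaling are assumptions):

```python
# Sketch of ranking features by chi-square score against discrete emotion
# labels, with scikit-learn as a stand-in for the authors' tooling.
import numpy as np
from sklearn.feature_selection import chi2
from sklearn.preprocessing import MinMaxScaler

def top_k_features(X: np.ndarray, y: np.ndarray, names: list[str], k: int = 20):
    X_scaled = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative input
    scores, _ = chi2(X_scaled, y)
    order = np.argsort(scores)[::-1][:k]        # highest chi-square first
    return [(names[i], float(scores[i])) for i in order]
```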

Page 25: HCI Course - Final Project

Predicting emotion intensity
•Two-fold cross-validation was performed on each dataset using two non-overlapping sets of subjects
•Separate tests were performed for both sadness and amusement
•Three tests were run: using face video alone, using physiological features alone, and using both together to predict the expert ratings
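Splitting by subject, rather than by row, is what keeps the two folds non-overlapping in subjects. A minimal sketch using scikit-learn's GroupKFold (an assumption on my part; the slides do not say how the authors implemented the split):

```python
# Sketch of two-fold cross-validation over non-overlapping subject sets,
# so no subject appears in both the training and the test fold.
import numpy as np
from sklearn.model_selection import GroupKFold

def subject_twofold(X: np.ndarray, y: np.ndarray, subjects: np.ndarray):
    for train_idx, test_idx in GroupKFold(n_splits=2).split(X, y, groups=subjects):
        yield X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```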

Page 26: HCI Course - Final Project

Intensity predicting
•The table demonstrates that predicting the intensity of amusement is easier than that of sadness
•The correlation coefficients of the sadness neural nets were consistently 20–40% lower than those for the amusement classifiers
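Those coefficients compare predicted intensity against the expert ratings, second by second. A minimal sketch, assuming Pearson's r (the slides do not name the exact coefficient used):

```python
# Sketch: correlation between predicted intensity and expert ratings,
# assuming Pearson's r (the exact coefficient is an assumption).
import numpy as np

def intensity_correlation(predicted: np.ndarray, rated: np.ndarray) -> float:
    return float(np.corrcoef(predicted, rated)[0, 1])
```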

Page 27: HCI Course - Final Project

Emotion classification
•A Support Vector Machine classifier with a linear kernel and a Logitboost classifier with a decision-stump weak learner run for 40 iterations (Freund and Schapire, 1996; Friedman et al., 2000) were applied
•Each dataset was processed with the WEKA machine learning software package (Witten and Frank, 2005)
•The data were split into two non-overlapping datasets, and two-fold cross-validation was performed on all classifiers
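A minimal scikit-learn sketch of the two classifier types; note this is a stand-in, since the authors used WEKA, and scikit-learn ships AdaBoost rather than LogitBoost:

```python
# Stand-ins for the two classifiers named above, in scikit-learn rather
# than WEKA (which the authors actually used).
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier

svm_clf = SVC(kernel="linear")  # SVM with a linear kernel

# AdaBoost as a stand-in for LogitBoost: scikit-learn has no LogitBoost.
# The default weak learner is a depth-1 decision tree, i.e. a decision stump.
boost_clf = AdaBoostClassifier(n_estimators=40)  # 40 boosting iterations
```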

Page 28: HCI Course - Final Project

Emotion classification
•The precision, the recall, and the F1 measure (defined as the harmonic mean of the precision and the recall) were computed
•Consider a multi-class classification problem with classes Ai, i = 1, ..., M, where each class Ai has a total of Ni instances in the dataset
•If the classifier correctly predicts Ci instances of Ai, and predicts C′i instances to be in Ai that in fact belong to other classes (misclassifies them), then these measures are defined as
▫Precisioni = Ci / (Ci + C′i)
▫Recalli = Ci / Ni
▫F1i = 2 · Precisioni · Recalli / (Precisioni + Recalli)
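A quick numeric check of those definitions, with hypothetical counts (80 of 100 class instances predicted correctly, plus 20 false positives):

```python
# Per-class precision, recall, and F1 from the counts defined above.
def prf1(C_i: int, Cprime_i: int, N_i: int) -> tuple[float, float, float]:
    precision = C_i / (C_i + Cprime_i)  # correct / everything predicted as A_i
    recall = C_i / N_i                  # correct / all true A_i instances
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

print(prf1(C_i=80, Cprime_i=20, N_i=100))  # (0.8, 0.8, 0.8)
```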

Page 29: HCI Course - Final Project

Emotion classification
•[Table: discrete classification results for the all-subject datasets]

Page 30: HCI Course - Final Project

Experimental results - Subjects (1)
•[Table: linear classification results for the individual subjects]

Page 31: HCI Course - Final Project

Experimental results - Subjects (2)
•[Table: discrete classification results for the individual subjects]

Page 32: HCI Course - Final Project

Experimental results - Gender
•Predicting continuous ratings within gender
•The subjects were split into two non-overlapping datasets in order to perform two-fold cross-validation on all classifiers

Page 33: HCI Course - Final Project

Classification Results by Gender (1)
•[Table: linear classification results for the gender-specific datasets]

Page 34: HCI Course - Final Project

Classification Results by Gender (2)
•[Table: discrete classification results for the gender-specific datasets]

Page 35: HCI Course - Final Project

Conclusion
•A real-time system for emotion recognition is presented
•A relatively large number of subjects watched emotional videos while their facial and physiological responses were recorded to recognize amusement and sadness
•Second-by-second ratings by trained coders of the intensity of expressed amusement and sadness were used as training data

Page 36: HCI Course - Final Project

Discussions
•Emotion recognition from facial expressions while subjects watch videotapes is not strongly proven, because of limitations in the content of the videotapes
•In addition, the accuracy of the devices used to collect the physiological features must also be considered
•The statistics for emotion intensity do not show a significant improvement