(Jeremy) Behavioral Informatics and Interaction Computation Lab (BIIC)
January 15, 2017
THIS IS SUBWAY MAP
Data Science
Naive Bayes Algorithm
Transfer learning
Apriori Algorithm
Gaussian distribution
Random Forests
Logistic Regression
(Deep) Neural Networks
Decision Trees
Nearest Neighbour
Support Vector Machine
K-Means Algorithm
Linear Regression
Active learning
Domain adaptation
Semi-supervised learning
Reinforcement learning
Unsupervised learning
Supervised learning
Emotion
Health Care
Education
Voice Recognition
Symptom diagnosis
Behavior Activity
Image Recognition
Medical
IBM Pathway Genomics
Detection of Diabetic Retinopathy in Retinal Fundus Photographs
customer behavior
Medical Imaging
Genomic Medicine
What do I do? &
What am I going to share?
Behavioral signal processing
Professor Shrikanth Narayanan, USC
Seek a window into human mind and traits
through engineering approach
S. Narayanan and P. G. Georgiou, "Behavioral signal processing: Deriving human behavioral informatics from speech and language," Proceedings of the IEEE, vol. 101, no. 5, pp. 1203-1233, 2013.
Behavioral Signal Processing (BSP)
Compute Human Behavior Traits and States for Domain Experts' Decision Making
Help experts to do things they know in a more efficient manner at scale
Develop novel behavioral analytics framework for possible scientific discovery
from qualitative to quantitative . . .
through verbal and non-verbal behavioral cues . . .
Part I
(Signals) (System)
High-level (Abstraction) . . .
(Self Report)
(Self Report)
NRS
Autism Diagnostic Observation Schedule (ADOS)
BSP Role . . .
BSP Technology
(reliability) (repeatable) (scalable)
QUANTITATIVE: QUANTITATIVE EVIDENCE DIRECTLY FROM MEASURABLE SIGNALS
EFFICIENCY: HELP DO THINGS THAT EXPERTS KNOW TO DO WELL MORE EFFICIENTLY, CONSISTENTLY & AT SCALE
SUPPLEMENTARY: COMPLEMENT WITH GOLD STANDARD METHOD WHEN APPROPRIATE
POSSIBILITY: TOOLS FOR NOVEL ACTIONABLE INSIGHT DISCOVERY
COMPUTING BEHAVIORAL TRAITS & STATES FOR DECISION MAKING & ACTION
aim..
BSP Enablers . . .
Text Processing
Voice Activity Detection
Alignment
Transcription
Keyword Spotting
Prosody Modeling
Voice Quality
Diarization
Speaker Identification
Dialog Act Tagging
Face Detection
Expression recognition
Action recognition
Language Understanding
Affective Computing
Speaker State and Trait
Joint Speech Visual Processing
Interaction Modeling
Sentiment Analysis
Enabling Technologies
Domain Experts' Knowledge
Low level descriptors
Acoustic features
Motion features
Text features
Image features
Speech recognition
Face recognition
Action recognition
Dialog act tagging
Keyword spotting
Text processing
Sentiment Analysis
Affect recognition
Speaker states and traits
Visual-speech processing
Interaction modeling
Subjective assessment
Internal state & construct
Neuro-developmental disorder
Evidence-based observational coding
Intervention efficacy
Coder variability control
Development of coding manual
Self report measure validity
Coding mechanism
Social behavior
Affective behavior
Communicative behavior
Dyadic behavior
Behavioral signal processing
BSP INGREDIENTS
BSP Operational Definition
Computational Methods that Model Human Behavior Signals
Manifested in Overt and Covert Cues
Processed and Used by Humans Explicitly or Implicitly
Facilitate Human Analysis and Decision Making
Outcome of Behavioral Signal Processing
Behavioral Analytics
QUANTIFYING HUMAN EXPRESSED BEHAVIOR AND HUMAN FELT SENSE
DERIVING INTERPRETABLE BEHAVIOR ANALYTICS FROM DATA FOR ACTIONABLE INSIGHTS
(March 13, 2013)
(May 29, 2013)
200/
Can you tell the difference?
1. Subjective evaluation
2. Time-consuming
3. Non-scalable
[Figure: bar chart, "2010~2015 THE NUMBER OF EMERGENCY PATIENTS"; x-axis 2010 to 2015, y-axis 0 to 8,000,000; labeled value 7,200,000]
(Taiwan Triage and Acuity Scale, TTAS)
(NRS-11)
The difficulty of implementing the NRS
NRS-11
BSP in Autism?
What is Autism?
Social-communicative neurodevelopmental disorder
Prevalence: 1 in 68 children (1 in 42 males) diagnosed [CDC 2014]
ASD: spectrum disorder due to extreme heterogeneity
Intervention leads to improved outcomes
ROLE OF BSP?
ADOS social and interactive
AIM?
Analysis at scale
Quantitative evidence from signals
New findings beyond the current status quo in psychiatry (?)
Qualitative description
Example: a snippet of an actual clinical ADOS diagnostic session
Can we?
Automatically measure spontaneous social (verbal/nonverbal) behavior between clinician and child to predict the child's rating of atypical amount of social reciprocal communication
from qualitative to quantitative . . .
through verbal and non-verbal behavioral cues . . .
Part 2:
BSP INGREDIENTS
(ecologically-valid)
ease-of-application, realism
established instrument
scientific rigor
ensure domain-applicable analytics
where
when
how
BIIC
Ensure the current system is not altered too much at the BEGINNING: at scale, ease-of-application is crucial
ecological validity & quality control
BIIC
BIIC, Kinect, synchronized
360
where
when
how
BIIC
Ensure the current system is not altered too much at the BEGINNING: at scale, ease-of-application is crucial
ecological validity & quality control
BIIC
BIIC
250
Verbal Numerical Rating Scale (NRS)
11-point self-report pain-level assessment (0 - 10)
Considered a clinically valid gold standard for assessing pain
where
when
how
BIIC
Research oriented: we have a little more flexibility in the room design!!
ecological validity & quality control
BIIC, ADOS
ADOS
BIIC
Two HD cameras
Two lapel microphones (synced through mixers)
~40 subjects
Autism Diagnostic Observation Schedule [Lord 2001]
Subject interacts with a psychologist for ~45 minutes
Current gold standard, research-level observational coding
Psychologists are trained using a stringent training protocol
Semi-structured assessment that elicits socio-communicative behavior of children with ASD for diagnosis
Multiple sub-part events (14), with ratings on a wide range of socio-communicative behaviors (28)
Internal quality control
ADOS
BSP INGREDIENTS
Pre-processing
Data collection-dependent
Smart utilization of current progress in audio-video processing
Label?
Label consistency
Reliable labeling
Construct validity
Voice Activity Detector
Speech signal per session
Energy every frame
frame = 25ms
standard deviation (normalize D.C. offset)
Threshold
speech percentage in the wav
Speech Segments
Energy > Threshold Energy
Short-time energy formula: $E_n = \sum_{m=n-N+1}^{n} x^2(m)$
[Figure: human annotation compared with VAD output]
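The energy-based voice activity detector outlined above can be sketched as follows. This is a minimal illustration, not the lab's implementation: the function names, the non-overlapping 25 ms frames, and the rule "threshold = a fixed fraction of the maximum frame energy" are assumptions made for the example (the slides only state that a threshold is applied to per-frame energy after removing the D.C. offset).

```python
import math

def frame_energies(signal, sample_rate, frame_ms=25):
    """Short-time energy of consecutive, non-overlapping frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    mean = sum(signal) / len(signal)            # remove the D.C. offset
    centered = [s - mean for s in signal]
    return [
        sum(x * x for x in centered[start:start + frame_len])
        for start in range(0, len(centered) - frame_len + 1, frame_len)
    ]

def detect_speech(signal, sample_rate, frame_ms=25, threshold_ratio=0.1):
    """Return (speech segments in seconds, fraction of frames marked as speech)."""
    energies = frame_energies(signal, sample_rate, frame_ms)
    threshold = threshold_ratio * max(energies)  # assumed thresholding rule
    is_speech = [e > threshold for e in energies]
    segments, start = [], None
    for i, flag in enumerate(is_speech + [False]):  # sentinel closes a trailing segment
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start * frame_ms / 1000, i * frame_ms / 1000))
            start = None
    return segments, sum(is_speech) / len(is_speech)

# Usage: 0.5 s of silence, 0.5 s of a 440 Hz tone, 0.5 s of silence, all with a D.C. offset.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 2)]
signal = [0.1] * (rate // 2) + [0.1 + s for s in tone] + [0.1] * (rate // 2)
segments, speech_ratio = detect_speech(signal, rate)
```

In practice the threshold would be tuned per recording setup, and the detected segments would be compared against human annotation.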
(Part 3)
(Diarization)
Segmentation and Clustering (Diarization)
Speaker B
Speaker A
Where are speaker changes?
Which segments are from the same speaker?
MFCC low-level descriptors (Part 3)
(frame)
Segmentation: speaker change detection
frame
Bayesian Information Criterion (BIC)
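A minimal sketch of BIC-based speaker change detection, assuming that the frames on each side of a candidate boundary are modeled by a single full-covariance Gaussian over frame-level features such as MFCCs. The function name `delta_bic`, the penalty weight, and the small regularization term are illustrative assumptions, not details taken from the slides. A positive value favors placing a speaker change at the candidate frame.

```python
import numpy as np

def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for splitting `frames` (n_frames x n_dims) at index `split`."""
    frames = np.asarray(frames, dtype=float)
    n, d = frames.shape
    left, right = frames[:split], frames[split:]

    def logdet_cov(x):
        # Maximum-likelihood covariance, lightly regularized for stability.
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True)) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    # Likelihood gain of two Gaussians over one Gaussian.
    gain = 0.5 * (n * logdet_cov(frames)
                  - len(left) * logdet_cov(left)
                  - len(right) * logdet_cov(right))
    # Penalty for the extra mean vector and covariance matrix.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty

# Usage with synthetic 2-D "features": two different speakers vs. one speaker.
rng = np.random.default_rng(0)
two_speakers = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
one_speaker = rng.normal(0, 1, (400, 2))
change_score = delta_bic(two_speakers, 200)
no_change_score = delta_bic(one_speaker, 200)
```

Sliding the candidate boundary across a window and keeping the positions where the score peaks above zero gives the segment boundaries.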
Clustering: speaker change detection
1. Generate an i-vector for each segment
2. Compute pair-wise similarity between clusters
3. Merge the closest clusters
4. Update distances of remaining clusters to the new cluster
5. Iterate steps 2-4 until the stopping criterion is met
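The clustering steps above can be sketched as a small agglomerative loop. This is an illustrative sketch only: it assumes each segment already has a fixed-length embedding (standing in for an i-vector, whose extraction is not shown), uses cosine similarity between cluster centroids, and stops when the best similarity drops below a hypothetical threshold `stop_similarity`.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def agglomerative_cluster(vectors, stop_similarity=0.5):
    """Greedy bottom-up clustering of segment embeddings; returns lists of segment indices."""
    vectors = np.asarray(vectors, dtype=float)
    clusters = [[i] for i in range(len(vectors))]
    centroids = [vectors[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        # Step 2: pair-wise similarity between current clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centroids[i], centroids[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        if s < stop_similarity:          # Step 5: stopping criterion
            break
        # Steps 3-4: merge the closest pair and update its centroid.
        merged = clusters[i] + clusters[j]
        del clusters[j], centroids[j]    # j > i, so index i stays valid
        clusters[i] = merged
        centroids[i] = vectors[merged].mean(axis=0)
    return clusters

# Usage: five segments from two hypothetical speakers.
segment_vectors = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
speaker_clusters = agglomerative_cluster(segment_vectors)
```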
Speaker Diarization
68 facial landmarks (OpenFace toolkit)
Face detection
68 facial landmark detection
Pre-trained Constrained Local Neural Field (CLNF) method
. . .
(learn the hard way!)
TAILORED SOLUTION
Label
dynamic range
4 dimensions: 95% variance
( 20% )
( 20% )
( 20% )
( 10% )
( 10% )
( 10% )
( 10% )
(100%)
label -
PCA
First principal axis weights
inter-evaluator agreement level
concept!
rank-normalized
Depends on the scenarios (sometimes reviewers too!)
Cronbach's alpha, intra-class correlation, Fleiss' kappa, Cohen's kappa
0.55, 0.39, 0.43, 0.58
0.63
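As an illustration of one of the agreement measures listed above, here is a minimal sketch of Cronbach's alpha computed over raters, treating each rater as an "item". The function name and the toy ratings are assumptions for the example; the values on the slide come from the lab's own data and are not reproduced here.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """ratings: one list per rater, each holding that rater's score for every sample."""
    k = len(ratings)
    rater_variances = sum(variance(r) for r in ratings)
    totals = [sum(scores) for scores in zip(*ratings)]   # per-sample sum over raters
    return k / (k - 1) * (1 - rater_variances / variance(totals))

# Usage: perfect agreement vs. partial agreement between raters.
alpha_perfect = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
alpha_partial = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])
```

Which measure is appropriate depends on the label type (continuous vs. categorical) and the number of raters, as the slide notes.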
Label
Self report
framework, sample?
Rule:
Data samples
IEEE
Label
Label: Social Reciprocity (ADOS)
Description of picture
Creating a story
Emotion
Joint interactive play
label
1. label
2. domain experts
Enabling Technologies
Domain Experts' Knowledge
Low level descriptors
Acoustic features
Motion features
Text features
Image features
Speech recognition
Face recognition
Action recognition
Voice activity
Diarization
Text processing
Sentiment Analysis
Affect recognition
Speaker states and traits
Visual-speech processing
Interaction modeling
Subjective assessment
Internal state & construct
Neuro-developmental disorder
Evidence-based observational coding
Intervention efficacy
Coder variability control
Development of coding manual
Self report measure validity
Coding mechanism
Social behavior
Affective behavior
Communicative behavior
Dyadic behavior
Label
1. data
2. label/data
3. behavior analytics
Part 3:
BSP INGREDIENTS
human computing (signal) research
Data & algorithm go hand-in-hand
Algorithms
/Profile
/Profile
/Profile
Behavioral Analytics
(low-level descriptors)
/Profile
(frame) Overlapping step
Source, Filter
LLDs
Pitch (source): $R_n(k) = \sum_{m=0}^{N-1-k} x(n+m)\,x(n+m+k), \quad k \ge 0$
Intensity (pressure): $E_n = \sum_{m=n-N+1}^{n} x^2(m)$
MFCC (filter): MFCC (13)
Praat, openSMILE
Versatile and fast audio feature extractor
Open-source and cross-platform
Abundant speech-related features: signal energy, loudness, Mel-spectra, MFCC, PLP-CC, pitch
Audio I/O
Supports many I/O formats: WEKA, HTK, LibSVM
. . .
/Profile
Histogram of oriented gradients (HOG), Scale-invariant feature transform (SIFT), Local binary pattern (LBP), 3D SIFT, HOG3D
texture, shape, keypoint, edge
() frame
Histogram of oriented gradients (HOG), Local binary pattern (LBP)
C++: OpenCV
Python: cv2 (OpenCV), scikit-image
trajectory
Per-frame ?
Improved Dense Trajectory
Optical flow
Trajectory + HOG + HOF + MBH
data
(encoding/profile)
10ms
66ms
Analysis unit, session
Label (time granularity), analysis unit
Analysis unit
Functionals
LLDs
feature, analysis unit
speaker state, emotion recognition baseline!!
# of LLDs x # of functionals = feature dimension
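Functionals turn a variable-length sequence of frame-level LLDs inside one analysis unit into a fixed-length feature vector. The sketch below is a minimal illustration with an assumed set of four functionals (mean, standard deviation, min, max); toolkits such as openSMILE apply much larger sets. The function name and the toy values are hypothetical.

```python
from statistics import mean, pstdev

def apply_functionals(lld_frames):
    """lld_frames: list of frames, each a list with one value per LLD.

    Returns one flat feature vector of size (# of LLDs) x (# of functionals).
    """
    functionals = [mean, pstdev, min, max]
    features = []
    for track in zip(*lld_frames):        # one track (time series) per LLD
        for fn in functionals:
            features.append(fn(track))
    return features

# Usage: three frames with two LLDs each (say, pitch in Hz and intensity).
unit = [[100.0, 0.2], [110.0, 0.4], [120.0, 0.6]]
unit_features = apply_functionals(unit)
```

Here two LLDs and four functionals give an 8-dimensional vector regardless of how many frames the analysis unit contains.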
k-means clustering
Histograms
Dictionary
Bag-of-feature encoding
LLDs
k-means clustering
audio, video features
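Bag-of-feature encoding can be sketched in two steps: learn a dictionary of codewords with k-means over frame-level descriptors, then describe each analysis unit by the normalized histogram of its frames' nearest codewords. The code below is a minimal sketch under those assumptions; the plain Lloyd's iteration, the random initialization, and all names are illustrative rather than the lab's pipeline.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers (the dictionary)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members):              # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    return centers

def bag_of_features(unit_frames, centers):
    """Normalized histogram of nearest-codeword assignments for one analysis unit."""
    unit_frames = np.asarray(unit_frames, dtype=float)
    dists = np.linalg.norm(unit_frames[:, None, :] - centers[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Usage: learn a 2-word dictionary, then encode one unit against fixed codewords.
rng = np.random.default_rng(1)
training_frames = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
dictionary = kmeans(training_frames, k=2)
codewords = np.array([[0.0, 0.0], [10.0, 10.0]])
unit_hist = bag_of_features([[0.1, 0.0], [9.0, 9.0], [10.0, 11.0], [0.0, 0.2]], codewords)
```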
Analysis unit
/Profile
(:analysis unit)
Distributed word representation
Term Weighting Method
A simplifying representation by term count
Term Frequency: how important (or informative) a word is in a document.
Inverse Document Frequency: how important (or informative) a word is in the corpus.
$\mathrm{tf}_{t,d} = \frac{n_{t,d}}{\sum_{k} n_{k,d}}$
$\mathrm{idf}_{t} = \log \frac{N}{1 + |\{d : t \in d\}|}$
Term Frequency-Inverse Document Frequency (TF-IDF)
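A minimal sketch of TF-IDF weighting on tokenized documents. It assumes term frequency is the within-document relative count and inverse document frequency is log(N / (1 + document frequency)), where N is the number of documents; this smoothed form is one common variant and is an assumption here. With this variant a term that appears in almost every document can receive a zero or negative weight. The function name and toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))         # count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / (1 + doc_freq[term]))
            for term, count in counts.items()
        })
    return weights

# Usage on three toy documents.
docs = [["pain", "chest", "pain"], ["chest", "fever"], ["cough"]]
doc_weights = tf_idf(docs)
```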
. . .
N-gram: turn unigram terms into bigram terms at the word-tokenization step, for instance:
John also likes to watch football games
[ 'John also' , 'also likes' , 'likes to' , 'to watch' , 'watch football' , 'football games' ]
[ 1 , 1 , 1 , 1 , 1 , 1 ]
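The bigram example above can be reproduced with a few lines of code. This is a minimal sketch that assumes whitespace tokenization; the function name `ngrams` is illustrative.

```python
from collections import Counter

def ngrams(tokens, n=2):
    """Join every run of n consecutive tokens into one term."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "John also likes to watch football games"
bigrams = ngrams(sentence.split(), n=2)
bigram_counts = Counter(bigrams)          # term-count representation of the sentence
```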
Distributed word representation
CBOW: predicting the word given its context
Skip-gram: predicting the context given a word
The hidden layer of the neural network encodes the distributed representation of each word
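The difference between the two objectives is easiest to see in the training examples they generate. The sketch below only builds those examples (it does not train a network); the window size and function names are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """(center word, context word) pairs: predict the context given a word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """(context words, target word) examples: predict the word given its context."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        examples.append((context, target))
    return examples

tokens = ["John", "also", "likes", "football"]
sg = skipgram_pairs(tokens, window=1)
cb = cbow_examples(tokens, window=1)
```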
(low-level descriptors)
(multimodal) work
/Profile
/Profile
/Profile
Behavioral Analytics
Behavioral Analytics
Behavioral Analytics
/Profile
/Profile
/Profile
Behavioral Analytics
Note*
(D/R)NN, (B)LSTM
BSP work: just be aware of f(# of data), and sometimes
[Figure: feature extraction pipeline. Video: dense point tracking per frame (TRAJ, MBHxy), giving one derived video feature vector per 66 ms unit, with the encoding labeled "Dense Trajectory Fisher-". Audio: acoustic LLDs with functionals, giving one dense acoustic feature vector per 200 ms unit, encoded with K-means bag-of-words.]
[Example: Chinese text segmented and part-of-speech tagged with Jieba; tags include c, n, v, r, p, vn, d, ng, uj, m, zg, l, b]
Jieba
"Built to be the best Python Chinese word segmentation module"
Word2Vec
Yahoo news, wiki, PTT
...
N-gram, K-means: all documents
BOW: per document
Word2vec
N
functional, context, BOW
/Profile
/Profile
/Profile
Behavioral Analytics
analytics?
Inter-evaluator agreement 0.63
. . . (part 4)
Spearman correlation
0.3 - 0.4
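Spearman correlation compares the rank order of predicted and reference scores, which suits ordinal clinical ratings. A minimal sketch, assuming ties receive average ranks and the coefficient is the Pearson correlation of the ranks; function names and toy data are illustrative.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Usage: predicted scores vs. reference ratings for five sessions.
rho = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```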
[Figure: raw audio-video recording split into segments S1, S2, . . ., Sk; MFCC, pitch, and intensity extracted for each segment]
Action-unit inspired facial low-level descriptors computation
Facial landmarks, head pose estimation (X, Y, Z axes)
Head orientation movement
/Profile
/Profile
Behavioral Analytics
NRS
Self-report NRS-11!
74%
52%
. . . (part 4)
audio video>
Quantitatively, Automatically
ADOS description
ADOS Emotion Part
Multimodal Turn-taking Behavior Coordination Time Series
Automatically generating a time series of multimodal behavior coordination measures across a session . . .
Audio
Pitch
Intensity
MFCC
Delta Delta-Delta
Video
Head poses
Eye gaze
Delta Delta-Delta
/Profile
/Profile
Canonical correlation analysis
(symbol)
turn-taking: (1.5 second) sliding
1.5 s
X: 3 3 3 2 1 1 2 1 3
Y: 2 1 2 1 3 1 1 2 3
Shift
Session-level descriptors
Behavioral Analytics
n turn, n
Logistic regression
(dependency)
Binary Classification between typical vs. atypical
ADOS: Social reciprocity score (B9)
data science work
analytics
? (part 4)
/Profile
/Profile
/Profile
Behavioral Analytics
Behavioral Analytics
Behavioral Analytics
176
concept
General end-to-end system needs more R&D
Context-dependent (whatever works)
Good rule of thumb
map, construct
Part 4:
BSP INGREDIENTS
Higher consistency
Extension
Good collaborative vibe . . .
multi-task learning
Task 1 - feature
Task 2 - feature
. . .
Task 8 - feature
Kernel
Multi-task learning
Actionable insights that were not clear before. Hence, the project continues.
Pain levels (mild: 0-3, moderate: 4-6, severe: 7-10)
74%
Content Validity
Validity
Construct Validity
Criterion Validity
acute pain, elderly
self-report
complement, gold standard
NRS-11
A-V + FEATURE 43%
70%
Project continues
POINT TO HIGHER ATYPICALITY
BSP
Psychologists unconsciously alter their communicative social behavior strategy (cueing behavior?) conditioned on the ASD child's ability to carry out reciprocal communication during interaction
0.81
Insight beyond current capability, opportunity now emerges
We can now start imagining the applications of this:
More?
Descriptors Included    Child Prosody    Psych Prosody    Child and Psych Prosody
Spearman's rho          0.64***          0.79***          0.67***
Psychologist's acoustics are at least as predictive of the child's ASD severity ratings
ADOS!
[1] Daniel Bone, Chi-Chun Lee, Matthew P. Black, Marian E. Williams, Pat Levitt, Sungbok Lee, and Shrikanth Narayanan, "The Psychologist as an Interlocutor in Autism Spectrum Disorder Assessment: Insights from a Study of Spontaneous Prosody", Journal of Speech, Language, and Hearing Research, 2014, 57(4), 1162-1177.
Hard to obtain scientific insights without such behavioral analytics for domain experts
NEED MORE VERIFICATION
1. Data
Is it Technical? Example Pitfall 1
Controlling for Channel Factors
Interspeech 2013 Autism Challenge
Baseline Approach
Black-box (works well)
2-class baseline: 92.8% UAR (chance is 50% UAR)
Hypothesis: Model captures channel, not diagnosis
ASD/SLI from 2 clinics, TD from classrooms
Simple experiment showed channel differences
Matched baseline
Conclusion: Mitigate (or note) noise sources in data collection.
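Unweighted average recall (UAR), the metric quoted above, is the mean of per-class recalls, so a classifier that always predicts the majority class scores only chance level. A minimal sketch with hypothetical labels:

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; each class counts equally regardless of its size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# Usage: an imbalanced 2-class problem where everything is predicted as "TD".
y_true = ["ASD"] * 2 + ["TD"] * 8
y_pred = ["TD"] * 10
uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Accuracy looks high on this toy example while UAR stays at the 2-class chance level, which is why UAR is the fairer baseline number for imbalanced data.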
Daniel Bone, Theodora Chaspari, Kartik Audhkhasi, James Gibson, Andreas Tsiartas, Maarten Van Segbroeck, Ming Li, Sungbok Lee, and Shrikanth Narayanan, "Classifying Language-Related Developmental Disorders from Speech Cues: the Promise and the Potential Confounds", InterSpeech, 2013.
11/11/2014
2. Cross-validation
Is it Technical? Example Pitfall 2
Behavior Analysis & Modeling: Cross-validation
They do not perform speaker-separated cross-fold validation!
Can we detect United States Senators' party affiliations from speech features (with a black-box approach)?
Performance increases as # samples/speaker increases
Conclusion: Always perform speaker-separated cross-validation!
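Speaker-separated cross-validation means that all samples from one speaker fall in the same fold, so the model is never tested on a voice it has already seen. The sketch below is a minimal illustration; the round-robin assignment of speakers to folds and the function name are assumptions for the example.

```python
from collections import defaultdict

def speaker_separated_folds(speaker_ids, n_folds):
    """Return (train_indices, test_indices) per fold with no speaker shared across the split."""
    by_speaker = defaultdict(list)
    for idx, speaker in enumerate(speaker_ids):
        by_speaker[speaker].append(idx)
    speakers = sorted(by_speaker)
    folds = []
    for f in range(n_folds):
        test_speakers = set(speakers[f::n_folds])     # round-robin speaker assignment
        test = [i for s in speakers if s in test_speakers for i in by_speaker[s]]
        train = [i for s in speakers if s not in test_speakers for i in by_speaker[s]]
        folds.append((train, test))
    return folds

# Usage: seven samples from four speakers, two folds.
speaker_ids = ["a", "a", "b", "b", "c", "c", "d"]
folds = speaker_separated_folds(speaker_ids, n_folds=2)
```

Splitting by sample instead of by speaker lets speaker identity leak into the test set, which is the effect behind the "performance increases as # samples/speaker increases" observation.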
Affective Computing
Social Signal Processing
Paralinguistic Recognition
Physiological/Pathological Disorder Recognition/Prediction
BSP
In-car
In-home
In-classroom
on and on
application domain
Motivational Interviewing: Addiction Therapy
By Professor Shrikanth Narayanan
System in clinical trial
Behavioral Signal Processing (BSP)
Compute Human Behavior Traits and States for Domain Experts' Decision Making
Help experts to do things they know in a more efficient manner at scale
Develop novel behavioral analytics framework for possible scientific discovery
from qualitative to quantitative . . .
through verbal and non-verbal behavioral cues . . .
Transformative effort . . .
COMPUTING OF, FOR, BY HUMANS
OF: Human action and behavior data
FOR: Meaningful analysis, timely decision making & intervention (action)
BY: Collaborative integration of human expertise with automated processing
By Professor Shrikanth Narayanan
Relatively new: RICH R&D OPPORTUNITIES (CHALLENGES)
BSP INGREDIENTS
Pattern
Contextualize
I was challenged and inspired
Challenging the status quo / Pushing scientific boundaries
Making a positive impact
BIIC lab @ NTHU EE
http://biic.ee.nthu.edu.tw
THANK YOU . . .
many COLLABORATORS + the entire BIIC lab