A Regression Approach to Music Emotion Recognition
Yi-Hsuan Yang, Yu-Ching Lin, Ya-Fan Su, and Homer H. Chen, Fellow, IEEE
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 2, FEBRUARY 2008
Sung Eun Park
2009-11-20
Intelligent Database Systems Lab, School of Computer Science & Engineering, Seoul National University, Seoul, Korea
Copyright 2008 by CEBT 2
Contents
Introduction
Simple concept of the model
Body
Regression approach
Model Explanation
Evaluation
Conclusion
Discussion
Contribution
Q&A
Brief Concept of the Model
Thayer’s arousal-valence emotion plane.
An application using this concept
Musicovery is based on the same concept as this model.
Click a point on the emotion plane to find music relevant to that point.
Regression Approach
Many good regressors (regression algorithms) are readily available.
Given N inputs (x_i, y_i), 1 ≤ i ≤ N, where x_i is a feature vector for the ith input sample and y_i ∈ R is the real value to be predicted for the ith sample, the regression system trains a regression algorithm (regressor) R(·) such that the mean squared error ε is minimized:
$\varepsilon = \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - R(x_i) \bigr)^2$
Here x_i is a feature vector, y_i is the real value, and R(x_i) is the predicted value; the regressor R(·) is what we need to find.
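As a minimal sketch of this objective (not the paper's implementation), the snippet below computes the mean squared error ε for a candidate regressor and fits a one-feature least-squares line, which minimizes ε among linear functions of a single feature. The function names and the toy data are illustrative assumptions.

```python
from statistics import mean

def mean_squared_error(regressor, xs, ys):
    """epsilon = (1/N) * sum_i (y_i - R(x_i))^2."""
    return mean((y - regressor(x)) ** 2 for x, y in zip(xs, ys))

def fit_line(xs, ys):
    """Closed-form least-squares fit y ~ a*x + b for a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return lambda x: a * x + b

# Toy data (hypothetical): one feature value and one emotion value per song.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [-0.8, -0.3, 0.2, 0.9]

fitted = fit_line(xs, ys)
baseline = lambda x: mean(ys)  # always predicts the average label

print("fitted MSE:  ", mean_squared_error(fitted, xs, ys))
print("baseline MSE:", mean_squared_error(baseline, xs, ys))
```

The fitted line should never do worse on the training data than the constant baseline, since the constant predictor is itself a special case of a line.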
The model
[Diagram: music samples go through a subjective test (giving the ground truth) and feature extraction (giving the musical features); regression trains the regressors Reg.A and Reg.V, whose outputs drive the emotion visualization.]
The model in detail
[Diagram: Training: training data → preprocessing → subjective test and feature extraction → regressor training → Reg.A, Reg.V. Testing: test data → preprocessing → feature extraction → Reg.A, Reg.V → emotion visualization.]
An Issue of the Continuous Perspective
The two dimensions, arousal and valence, are not independent of each other.
What is positive music?
Then what is energetic music?
Principal component analysis (PCA) is a common way of reducing the correlation between variables.
[Figure: original data on an energetic-calm axis, with the principal components computed by PCA.]
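To illustrate how PCA removes the correlation between two variables, here is a small sketch for the two-dimensional case. It rotates centered (arousal, valence) points onto their principal axes using the closed-form rotation angle for a 2x2 covariance matrix. The sample values and function names are assumptions made for illustration, not data from the study.

```python
import math
from statistics import mean

def covariance(us, vs):
    """Sample covariance of two equally long sequences."""
    u_bar, v_bar = mean(us), mean(vs)
    return sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs)) / (len(us) - 1)

def pca_rotate_2d(arousal, valence):
    """Project centered (arousal, valence) points onto their principal axes."""
    a_bar, v_bar = mean(arousal), mean(valence)
    saa = covariance(arousal, arousal)
    svv = covariance(valence, valence)
    sav = covariance(arousal, valence)
    # Angle of the first principal axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sav, saa - svv)
    c, s = math.cos(theta), math.sin(theta)
    p = [c * (a - a_bar) + s * (v - v_bar) for a, v in zip(arousal, valence)]
    q = [-s * (a - a_bar) + c * (v - v_bar) for a, v in zip(arousal, valence)]
    return p, q

# Hypothetical labels in which arousal and valence are correlated.
arousal = [-0.8, -0.4, 0.0, 0.5, 0.9]
valence = [-0.5, -0.3, 0.1, 0.2, 0.6]

p, q = pca_rotate_2d(arousal, valence)
print("covariance in AV plane:", covariance(arousal, valence))
print("covariance in PC plane:", covariance(p, q))
```

The covariance in the rotated coordinates should be zero up to floating-point error, which is the property the slide relies on.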
Reducing Correlation Between Variables
AV plane: some dependency exists
PC plane: no dependency exists
Train regressors Rp and Rq
Test in the PQ plane and compare with the AV plane
Details follow later in the presentation
Dataset
195 popular songs selected from a number of Western, Chinese, and Japanese albums.
1) These songs should be distributed uniformly in each quadrant of the emotion plane.
2) Each music sample should express a certain dominant emotion.
Subjective Test
253 volunteers from the campus
Each volunteer is asked to listen to ten music samples randomly drawn from the music database and to label the AV values from -1.0 to 1.0 in 11 ordinal levels.
Volunteers label the evoked emotion rather than the perceived one.
The standard deviation of the ratings given to the same song is 0.3 (which is acceptable).
The same person tends to give the same label to the same music.
Feature Extraction
• PsySound aims to model parameters of auditory sensation based on psychoacoustic models.
• Earlier research found that 15 of the features are more closely related to emotion perception.
Feature Extraction
Select, from all extracted features, those that are related to emotion.
RReliefF is used as the feature selection algorithm (FSA).
RRFm,n is the feature space made of the top-m and top-n selected features (for arousal and valence, respectively).
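The idea of keeping only the top-m and top-n ranked features can be sketched as follows. Note that this sketch does not implement RReliefF: it swaps in the absolute Pearson correlation with the label as a simple stand-in relevance score. The feature names, values, and the choices m = 2 and n = 1 are hypothetical.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient (0.0 if either input is constant)."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = (sum((x - x_bar) ** 2 for x in xs)
           * sum((y - y_bar) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

def top_k_features(features, target, k):
    """Names of the k features with the highest stand-in relevance score."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical feature columns (one value per song) and labels.
features = {
    "loudness":   [0.1, 0.4, 0.5, 0.9],
    "sharpness":  [0.7, 0.2, 0.6, 0.3],
    "dissonance": [0.9, 0.6, 0.5, 0.2],
}
arousal = [-0.9, -0.2, 0.1, 0.8]
valence = [0.6, -0.5, 0.4, -0.3]

m, n = 2, 1  # assumed sizes of the selected feature sets
arousal_features = top_k_features(features, arousal, m)
valence_features = top_k_features(features, valence, n)
print(arousal_features, valence_features)
```

Ranking separately per label means the arousal and valence regressors can be trained on different feature subsets.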
Regression Algorithms
Three regression algorithms:
1. Multiple linear regression (MLR)
• Assumes a linear relationship
• Simple method
2. Support vector regression (SVR)
• Nonlinearly maps input features into a higher-dimensional feature space
• In many cases superior to existing machine learning methods
3. AdaBoost.RT (BoostR)
• Nonlinear regression algorithm
• A number of regression trees are trained iteratively and weighted according to their prediction accuracy
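Of the three algorithms, MLR is simple enough to sketch without any library. The snippet below fits the weights by solving the normal equations with Gauss-Jordan elimination; SVR and AdaBoost.RT are not covered here. The helper names and the toy data are assumptions for illustration only.

```python
def solve(a, b):
    """Solve the linear system a.w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_mlr(rows, ys):
    """Least-squares weights for y ~ w0 + w1*x1 + ... via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    d = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(d)]
           for i in range(d)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(d)]
    return solve(xtx, xty)

def predict(weights, row):
    """Apply the fitted linear model to one feature vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))

# Toy data generated from y = 0.1 + 0.5*x1 - 0.3*x2 (hypothetical).
rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [0.1, 0.6, -0.2, 0.3, 0.8]

weights = fit_mlr(rows, ys)
print(weights, predict(weights, [2.0, 0.0]))
```

Because the toy labels are exactly linear in the features, the fit should recover the generating weights up to rounding error.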
Evaluation
Method
R² statistic: shows how close the predictions are to the real values.
AV and PC plane comparison: the effect of the dependency between the variables.
[Table not reproduced; annotations: "the best combination" and "no significant difference".]
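For reference, the R² statistic can be computed as in the sketch below. It uses the standard coefficient of determination, R² = 1 - Σ(y_i - ŷ_i)² / Σ(y_i - ȳ)², which is assumed here to match the definition used in the evaluation. The function name and example values are illustrative.

```python
from statistics import mean

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

labels = [-0.8, -0.3, 0.2, 0.9]                      # hypothetical ground truth
print(r_squared(labels, labels))                     # perfect prediction
print(r_squared(labels, [sum(labels) / 4] * 4))      # always predicting the mean
```

A value of 1 means perfect prediction, 0 means no better than always predicting the mean label, and negative values mean worse than that baseline.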
Evaluation
Regressor comparison
[Table not reproduced; annotations: "a plane with no correlation" and "selected feature space".]
Evaluation – The Prediction Accuracy
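[Figure not reproduced; legend: ground truth (+) and prediction result.]
The best performance of the regression approach, measured by the R² statistic, reaches 58.3% for arousal and 28.1% for valence, using SVR with the RRF feature space in the PC plane.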
Performance Evaluation
Using the same ground truth data and feature data
=100.3
=117.7
Discussion
Subjectivity issue
Individual differences: emotion perception is influenced by many factors, such as cultural background, generation, sex, and personality.
GWMER (group-wise MER scheme)
Personalization can be an alternative.
[Diagram: users are divided into groups G1, G2, G3, G4, G…; regressors R1, R2, R3, R4, R… are available; a regressor-choosing step selects the regressor for a user.]
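The regressor-choosing step of a group-wise scheme could look roughly like the sketch below: one regressor per user group, looked up by the group a user belongs to. The group labels, the toy regressors, and the fallback to a general regressor for unknown groups are all assumptions added for illustration; the slides do not specify these details.

```python
def choose_regressor(user_group, group_regressors, general_regressor):
    """Pick the regressor trained for the user's group, else a general one."""
    return group_regressors.get(user_group, general_regressor)

# Hypothetical per-group regressors; each maps a feature vector to a value.
group_regressors = {
    "G1": lambda features: 0.8 * features[0],
    "G2": lambda features: 0.5 * features[0],
}
general_regressor = lambda features: 0.6 * features[0]  # assumed fallback

song_features = [0.5]
prediction_g1 = choose_regressor("G1", group_regressors, general_regressor)(song_features)
prediction_new = choose_regressor("G9", group_regressors, general_regressor)(song_features)
print(prediction_g1, prediction_new)
```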
Contribution
One of the first attempts to develop an MER system from a continuous perspective (each song maps to a point in the emotion plane).
A sound theoretical foundation is proposed.
Regression theory.
Extensive performance study.
Several algorithms are tested.
Dealing with the subjectivity issues of music emotion recognition (MER).
Emotion differs from person to person.
The dependency between the two dimensions of the emotion plane is reduced using PCA.
Thank you…
Q&A