A Regression Approach to Music Emotion Recognition

Yi-Hsuan Yang, Yu-Ching Lin, Ya-Fan Su, and Homer H. Chen, Fellow, IEEE

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 2, FEBRUARY 2008

Sung Eun Park

2009-11-20

Intelligent Database Systems Lab
School of Computer Science & Engineering
Seoul National University, Seoul, Korea

Copyright 2008 by CEBT

Contents

Introduction

Simple concept of the model

Body

Regression approach

Model Explanation

Evaluation

Conclusion

Discussion

Contribution

Q&A


Brief Concept of the Model

Thayer’s arousal-valence emotion plane.



An application using this concept

Musicovery is based on the same concept as this model.

Click a point on the emotion plane to find music relevant to that point.


Regression Approach

Many good regressors (regression algorithms) are readily available.

Given N inputs (x_i, y_i), 1 ≤ i ≤ N, where x_i is a feature vector for the ith input sample and y_i ∈ R is the real value to be predicted for the ith sample, the regression system trains a regression algorithm (regressor) R(·) such that the mean squared error ε is minimized:

$\varepsilon = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - R(x_i)\right)^2$

Here x_i is a feature vector, y_i is the real value, and R(x_i) is the predicted value. The regressor R(·) is what we need to find.
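As a minimal sketch of the objective on this slide, the mean squared error ε of any candidate regressor can be computed as below. The toy samples and the two candidate regressors are hypothetical and only serve to show that a better regressor yields a smaller ε.

```python
from typing import Callable, Sequence, Tuple

# One sample is (feature vector x_i, real value y_i).
Sample = Tuple[Sequence[float], float]


def mean_squared_error(regressor: Callable[[Sequence[float]], float],
                       samples: Sequence[Sample]) -> float:
    """Return epsilon = (1/N) * sum_i (y_i - R(x_i))^2."""
    return sum((y - regressor(x)) ** 2 for x, y in samples) / len(samples)


# Hypothetical toy data: one feature per sample, target equals 2 * feature.
samples = [([0.0], 0.0), ([0.5], 1.0), ([1.0], 2.0)]


def constant_regressor(x):
    """Ignores the features and always predicts 1.0."""
    return 1.0


def linear_regressor(x):
    """Matches the toy data exactly."""
    return 2.0 * x[0]


mse_constant = mean_squared_error(constant_regressor, samples)
mse_linear = mean_squared_error(linear_regressor, samples)
```

Training a regressor amounts to searching for the R(·) that makes this quantity as small as possible on the training samples.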


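The model

Diagram: the music samples go through a subjective test (giving the ground truth) and through feature extraction (giving the musical features). Regression on both yields the regressors Reg.A and Reg.V, which are used for emotion visualization.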


The model in detail

Training: Training data → Preprocessing → Subjective test and Feature extraction → Regressor training → Reg.A, Reg.V

Testing: Test data → Preprocessing → Feature extraction → Reg.A, Reg.V → Emotion visualization


An Issue of the Continuous Perspective

The dependency between the two dimensions, arousal and valence

What is positive music?

Then what is energetic music?

Principal Component Analysis (PCA) is a common way of reducing the correlation between variables.

Figure labels: energetic, calm; original data; principal component computed by PCA.


Reducing Correlation Between Variables

AV plane: some dependency exists

PC plane: no dependency exists

Train regressors Rp and Rq

Test in the PQ plane and compare with the AV plane. Details follow later in the presentation.
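The decorrelation step can be illustrated with a small sketch. It assumes two-dimensional (arousal, valence) annotations and uses the closed-form rotation onto the principal axes of a 2×2 covariance matrix, which is a special case of PCA. The ratings are hypothetical, and the names p and q for the rotated axes follow the Rp and Rq naming on this slide.

```python
import math


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def pca_rotate_2d(arousal, valence):
    """Center the (arousal, valence) points and rotate them onto axes (p, q)."""
    ma, mv = sum(arousal) / len(arousal), sum(valence) / len(valence)
    saa = covariance(arousal, arousal)
    svv = covariance(valence, valence)
    sav = covariance(arousal, valence)
    # Angle of the first principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sav, saa - svv)
    c, s = math.cos(theta), math.sin(theta)
    p = [c * (a - ma) + s * (v - mv) for a, v in zip(arousal, valence)]
    q = [-s * (a - ma) + c * (v - mv) for a, v in zip(arousal, valence)]
    return p, q


# Hypothetical annotations in which arousal and valence are correlated.
arousal = [-0.8, -0.4, 0.0, 0.4, 0.8]
valence = [-0.5, -0.1, 0.1, 0.2, 0.6]

p, q = pca_rotate_2d(arousal, valence)
cov_av = covariance(arousal, valence)   # clearly nonzero on the AV plane
cov_pq = covariance(p, q)               # numerically zero on the PC plane
```

Regressors trained on p and q then predict uncorrelated targets, and the predictions can be rotated back to the AV plane for comparison.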


Dataset

195 popular songs were selected from a number of Western, Chinese, and Japanese albums according to two criteria:

1) The songs should be distributed uniformly across the quadrants of the emotion plane.

2) Each music sample should express a certain dominant emotion.


Subjective Test

253 volunteers from the campus

Each volunteer is asked to listen to ten music samples randomly drawn from the music database and to label the AV values from –1.0 to 1.0 in 11 ordinal levels.

Subjects label the evoked emotion rather than the perceived one.

The standard deviation of the ratings given to the same song is 0.3 (which is acceptable).

The same person tends to give the same labels to the same music.
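The labeling scheme can be sketched as follows. The 11 ordinal levels come from the slide; the individual ratings are hypothetical, and taking the mean over subjects as the ground truth is an assumption made for illustration.

```python
import statistics

# The 11 ordinal levels from -1.0 to 1.0 in steps of 0.2.
LEVELS = [round(-1.0 + 0.2 * k, 1) for k in range(11)]


def snap_to_level(value):
    """Map a raw rating to the nearest of the 11 ordinal levels."""
    return min(LEVELS, key=lambda level: abs(level - value))


# Hypothetical arousal ratings given by five subjects to one song.
ratings = [snap_to_level(v) for v in (0.58, 0.43, 0.81, 0.2, 0.66)]

# Assumption: the ground truth of a song is the mean rating over subjects.
ground_truth = statistics.mean(ratings)

# Population standard deviation: how much the subjects disagree on this song.
spread = statistics.pstdev(ratings)
```

A per-song spread computed this way is the kind of quantity the 0.3 figure on the slide refers to.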


Feature Extraction

• PsySound aims to model parameters of auditory sensation based on some psychoacoustic models.
• Earlier research found that 15 of the features are more closely related to emotion perception.


Feature Extraction

From all extracted features, select those that are related to emotion.

RReliefF is used as the feature selection algorithm (FSA).

RRFm,n is the feature space made up of the top-m and top-n selected features.
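The top-m and top-n selection can be sketched as below. This does not implement RReliefF itself: the relevance weights are hypothetical stand-ins for what a feature selection algorithm would output, the feature names are illustrative, and reading m as the arousal count and n as the valence count is an assumption.

```python
def top_k_features(weights, k):
    """Return the names of the k features with the largest relevance weights."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:k]


def rrf_space(arousal_weights, valence_weights, m, n):
    """Top-m features for arousal and top-n for valence (assumed reading of RRFm,n)."""
    return top_k_features(arousal_weights, m), top_k_features(valence_weights, n)


# Hypothetical relevance weights; a real FSA such as RReliefF would compute these.
arousal_weights = {"loudness": 0.31, "sharpness": 0.12, "tempo": 0.27, "dissonance": 0.05}
valence_weights = {"loudness": 0.04, "sharpness": 0.09, "tempo": 0.11, "dissonance": 0.22}

arousal_features, valence_features = rrf_space(arousal_weights, valence_weights, 2, 1)
```

Each regressor is then trained only on its own selected subset of features.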


Regression Algorithms

Three regression algorithms:

1. Multiple linear regression (MLR)
• Assumes a linear relationship
• Simple method

2. Support vector regression (SVR)
• Nonlinearly maps input features into a higher-dimensional feature space
• In many cases superior to existing machine learning methods

3. AdaBoost.RT (BoostR)
• Nonlinear regression algorithm
• A number of regression trees are trained iteratively and weighted according to their prediction accuracy
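Of the three algorithms, MLR is simple enough to sketch in a few lines. The version below solves the least-squares normal equations with plain Gauss-Jordan elimination; it is an illustration under the assumption of a small, well-conditioned problem, not the implementation used in the paper, and the toy features and targets are hypothetical. SVR and AdaBoost.RT are not shown.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes the matrix is nonsingular; a singular matrix raises ZeroDivisionError.
    """
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_mlr(features, targets):
    """Least-squares weights for y ~ w0 + w1*x1 + ... via the normal equations."""
    rows = [[1.0] + list(x) for x in features]  # prepend the bias term
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict_mlr(weights, x):
    """Predict one value from a feature vector."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


# Hypothetical toy data generated from y = 0.5 + 2*x1 - 1*x2.
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
targets = [0.5, 2.5, -0.5, 1.5, 3.5]

weights = fit_mlr(features, targets)
prediction = predict_mlr(weights, [1.0, 2.0])
```

Because the toy targets are exactly linear in the features, the fitted weights recover the generating coefficients up to rounding error.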


Evaluation

Method

R² statistic: shows how close the predictions are to the real values.

AV and PC plane comparison: the effect of the dependency between the two dimensions
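For reference, the R² statistic used as the evaluation measure can be computed as in the sketch below, using the standard definition of one minus the ratio of residual to total sum of squares. The actual and predicted values are hypothetical.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical ground-truth arousal values and regressor predictions.
actual = [-0.6, -0.2, 0.2, 0.6]
predicted = [-0.4, -0.2, 0.2, 0.4]

r2 = r_squared(actual, predicted)

# Reference points: perfect prediction, and always predicting the mean.
r2_perfect = r_squared(actual, actual)
r2_mean_only = r_squared(actual, [0.0] * len(actual))
```

A value of 1 means perfect prediction, while 0 means the regressor does no better than always predicting the mean of the ground truth.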

The best combination

No significant difference


Evaluation

Regressor Comparison

A plane with no correlation

Selected feature space


Evaluation – The Prediction Accuracy

Figure legend: ground truth (+) and prediction result.

The best performance of the regression approach, measured by the R² statistic, reaches 58.3% for arousal and 28.1% for valence, using SVR on the PC plane with RRF-selected features.


Performance Evaluation

Using the same ground truth data and feature data

=100.3

=117.7


Discussion

Subjectivity issue

Individual differences: emotion perception is influenced by many factors, such as cultural background, generation, sex, and personality.

GWMER (group-wise MER scheme)

Personalization can be an alternative.

Diagram: users are assigned to groups G1, G2, G3, G4, …; regressor choosing selects the matching regressor among R1, R2, R3, R4, ….
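The regressor-choosing step of a group-wise scheme can be sketched as a simple lookup. Everything here is hypothetical: the group labels mirror the G1, G2, … names in the diagram, the per-group regressors are toy functions, and falling back to a general regressor for an unknown group is an assumption rather than something stated on the slide.

```python
def choose_regressor(group, group_regressors, general_regressor):
    """Pick the regressor trained for the user's group, else a general one."""
    return group_regressors.get(group, general_regressor)


# Hypothetical per-group regressors; each maps a feature vector to a valence value.
group_regressors = {
    "G1": lambda x: 0.8 * x[0],
    "G2": lambda x: 0.5 * x[0] - 0.1,
}


def general_regressor(x):
    """Hypothetical regressor trained on all users."""
    return 0.6 * x[0]


features = [0.5]
valence_g1 = choose_regressor("G1", group_regressors, general_regressor)(features)
valence_unknown = choose_regressor("G9", group_regressors, general_regressor)(features)
```

Personalization would take the same idea one step further by keeping one regressor per user instead of one per group.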


Contribution

One of the first attempts to develop an MER system from a continuous perspective (each song maps to a point in the emotion plane).

A sound theoretical foundation is proposed.

Regression theory.

Extensive performance study.

Several algorithms are tested.

Dealing with the subjectivity issues of Music Emotion Recognition (MER).

Emotion differs from person to person.

The two dimensions of the emotion plane are not independent.

Thank you…

Q&A
