2015/6/28



MIR: Status and Trends (Music Information Retrieval: Current Status and Future)

J.-S. Roger Jang ( 張智星 )

Multimedia Information Retrieval Lab

CS Dept., Tsing Hua Univ., Taiwan




Intro. to music information retrieval (MIR)
Our work on MIR (with demos)
  Query by singing/humming (QBSH)
  Singing voice separation



Types of MIR Systems

Text-based MIR
  Text input
    Metadata: genre, mood, popular hits
Content-based MIR
  Symbolic input
    Music score info: notes, beats, chords, etc.
  Acoustic input
    By example: the original recording as input
    By humans: singing/humming, whistling, tapping, drum sounds


Span of MIR Research

Content analysis
  Audio music
    Low-level feature extraction
    High-level feature representation
  Symbolic music
    High-level feature
Retrieval methods
  Text-based information retrieval
  Data clustering
  Pattern recognition
  Distance measures


MIR Methods for Audio Music

Audio features
  Low-level features: MFCC, spectral flux, rolloff freq, …
  High-level features: Pitch, onset, beat, tempo, chord, key, …
  Vocal extraction
  Others: Collaborative filtering
Retrieval methods
  Clustering: K-means, VQ, hierarchical clustering
  Classification: SVM, GMM, LSA, …
  Distance measure: DTW, KL, cosine similarity, edit distance
  Others: Learning to rank
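Two of the distance measures listed on this slide, cosine similarity and edit distance, can be sketched in plain Python as follows. This is only an illustrative sketch: the function names are hypothetical, feature vectors are assumed to be plain lists of numbers, and the edit distance is the unit-cost Levenshtein variant, which may differ from what a given MIR system uses.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def edit_distance(a, b):
    """Unit-cost Levenshtein distance between two sequences (e.g. note strings)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

Cosine similarity suits fixed-length feature vectors, while edit distance suits symbolic sequences such as note names.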


MIR Major Events

ISMIR/MIREX
  Int. Sym. on music information retrieval, since 2000
  Music Information Retrieval Evaluation eXchange, since
ICMC
  Int. Computer Music Conference, since 1974
ICASSP
  Int. Conf. on Acoustics, Speech, and Signal Processing, since 1976


ISMIR Growth: 2000-2009



2000    Plymouth, MA        35    155    63
2001    Bloomington, IN     41    222    86
2002    Paris, FR           57    300   117
2003    Baltimore, MD       50    209   111
2004    Barcelona, ES      105    582   214
2005    London, UK         114    697   233
2006    Victoria, BC        95    397   198
2007    Vienna, AT         127    486   267
2008    Philadelphia, PA   105    630   253
2009    Kobe, JP           124    773   301
TOTALS  ----               853   4451  ----


ISMIR Locations

2000, Plymouth

2001, Bloomington

2002, Paris

2003, Baltimore

2004, Barcelona

2005, London

2006, Victoria

2007, Vienna

2008, Philadelphia

2009, Kobe


State-of-the-Art MIR: Tasks at MIREX

Audio music
  High-level feature identification
    Audio onset detection
    Audio beat tracking
    Audio tempo extraction
    Audio key detection
    Audio chord estimation
    Multiple fundamental frequency estimation & tracking
    Audio structural segmentation
  Classification
    Artist
    Genre
    Mood
  Retrieval
    Audio cover song identification
    Audio tag classification
    Audio music similarity and retrieval
  Alignment
    Real-time audio to score alignment (a.k.a. score following)
Symbolic music
  Symbolic melodic similarity
  Symbolic music similarity and retrieval
Hybrid
  Query by singing/humming
  Query by tapping


MIREX: 2005 - 2008

                                      2005  2006  2007  2008
Number of Task (and Subtask) “Sets”     10    13    12    18
Number of Individuals                   82    50    73    84
Number of Countries                     19    14    15    19
Number of Runs                          86    92   122   169


Our Work on MIR

QBSH: Query by Singing/Humming
Singing voice separation
Audio melody extraction


Introduction to QBSH

QBSH: Query by Singing/Humming
  Input: Singing or humming from microphone
  Output: A ranking list retrieved from the song database
Overview
  First paper: Around 1994
  Extensive studies since 2001
  State of the art: QBSH tasks at ISMIR/MIREX


Challenges in QBSH Systems

Reliable pitch tracking for acoustic input
  Input from mobile devices or noisy karaoke bar
Song database preparation
  MIDIs, singing clips, or audio music
Efficient/effective retrieval
  Karaoke machine: ~10,000 songs
  Internet music search engine: ~500,000,000 songs



QBSH: Goal and Approach

Goal: To retrieve songs effectively within a given response time, say 5 seconds or so
Our strategy
  Multi-stage progressive filtering
  Indexing for different comparison methods
  Repeating pattern identification


Flowchart of QBSH

Two steps
  Pitch tracking
  Comparison methods


Frame Blocking for Pitch Tracking

256 points/frame
84 points overlap
11025/(256-84) = 64 pitch/sec
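The frame-blocking arithmetic above can be checked with a short sketch. The function name `frame_blocking` is hypothetical, the signal is assumed to be a plain list of samples at 11025 Hz, and trailing samples that do not fill a whole frame are simply dropped, which is one possible convention.

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a signal into overlapping frames; incomplete tail frames are dropped."""
    hop = frame_size - overlap  # 256 - 84 = 172 samples between frame starts
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += hop
    return frames


fs = 11025                       # sampling rate in Hz, as on the slide
one_second = list(range(fs))     # placeholder samples standing in for audio
frames = frame_blocking(one_second)
pitch_rate = fs / (256 - 84)     # about 64.1 pitch points per second
```

With these settings one second of audio yields 63 complete frames, consistent with the roughly 64 pitch points per second quoted above.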

[Figure: two waveform plots against sample index, one a zoomed-in view of the other]









ACF: Auto-correlation Function

Frame s(n):

Shifted frame s(n-τ):

acf(30) = inner product of the overlapping part = dot(s(30:256), s(1:227))


Pitch period
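A minimal sketch of pitch estimation from the autocorrelation function follows. It assumes zero-based lags (so the slide's 1-indexed `acf(30)` corresponds to `values[29]` here), and the search range of 82 Hz to 1047 Hz and the function names are illustrative choices, not the author's implementation.

```python
import math


def acf(frame):
    """acf[lag] = inner product of the frame with itself shifted by `lag` samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(n)]


def acf_pitch(frame, fs, f_min=82.0, f_max=1047.0):
    """Pick the lag with the largest ACF value inside the plausible pitch range."""
    values = acf(frame)
    lag_min = max(1, int(fs / f_max))               # shortest period considered
    lag_max = min(int(fs / f_min), len(frame) - 1)  # longest period considered
    best_lag = max(range(lag_min, lag_max + 1), key=lambda lag: values[lag])
    return fs / best_lag                            # pitch period -> frequency in Hz


fs = 11025
# Synthetic test tone with a period of exactly 50 samples (220.5 Hz).
frame = [math.sin(2 * math.pi * n / 50) for n in range(256)]
pitch_hz = acf_pitch(frame, fs)
```

For the synthetic tone the estimate lands near 220 Hz. Real singing input usually needs extra steps such as volume thresholding and smoothing, which this sketch omits.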


Frequency to Semitone Conversion

Semitone: A music scale based on A440

Reasonable pitch range: E2 - C6 (82 Hz - 1047 Hz)

$\text{semitone} = 69 + 12 \log_2(\text{freq} / 440)$
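The frequency-to-semitone conversion can be sketched as below. The offset of 69 for A440 is an assumption taken from the MIDI note-number convention, since the formula on the slide did not survive extraction intact; the function names are hypothetical.

```python
import math


def freq_to_semitone(freq_hz):
    """Map a frequency in Hz to a semitone number, with A440 at 69 (MIDI convention, assumed)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_freq(semitone):
    """Inverse mapping: semitone number back to frequency in Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


low = round(freq_to_semitone(82))     # E2 end of the range
high = round(freq_to_semitone(1047))  # C6 end of the range
```

Under this convention the 82 Hz to 1047 Hz range maps to semitones 40 to 84, so the usable pitch range spans 44 semitones.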




Example of Pitch Tracking

[Figure: two panels over time (horizontal axis 1 to 8). PT using ptByDpOverPfMex, with pfWeight=1 and indexDiffWeight=22. Legend: pitch1: computed pitch]


Typical Result of Pitch Tracking

Pitch tracking via autocorrelation for 茉莉花 (Jasmine Flower) [audio]


Comparison of Pitch Vectors

Yellow line: Target pitch vector


Linear Scaling (LS)

Scale the query linearly to match the candidate

A typical example of linear scaling


Linear Scaling (LS)

Characteristics
  One-shot for dealing with key transposition
  Efficient and effective
  Some indexing methods
  Cannot deal with large tempo variations
  #1 method for task 2 in
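Linear scaling can be sketched as follows: the query pitch vector is stretched or compressed by a few fixed factors, shifted so its mean matches the candidate (a one-shot key transposition), and compared point by point. The scaling factors, the mean-based shift, the L1 distance, and all function names are assumptions made for illustration, not the parameters of the system described in these slides.

```python
def resample(pitch, new_length):
    """Linearly interpolate a pitch vector to `new_length` points (new_length >= 2)."""
    out = []
    for k in range(new_length):
        pos = k * (len(pitch) - 1) / (new_length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(pitch) - 1)
        frac = pos - lo
        out.append(pitch[lo] * (1 - frac) + pitch[hi] * frac)
    return out


def linear_scaling_distance(query, target,
                            factors=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0)):
    """Smallest mean absolute pitch difference over all tried scaling factors."""
    best = float("inf")
    for factor in factors:
        length = int(round(len(query) * factor))
        if length < 2 or length > len(target):
            continue  # scaled query would not fit the candidate
        scaled = resample(query, length)
        segment = target[:length]  # compare against the start of the candidate
        # One-shot key transposition: align the mean pitch of both vectors.
        shift = sum(segment) / length - sum(scaled) / length
        dist = sum(abs(s + shift - t) for s, t in zip(scaled, segment)) / length
        best = min(best, dist)
    return best


target = [60, 62, 64, 65, 67, 69, 71, 72, 71, 69, 67, 65]
query = [t - 7 for t in target[:8]]        # same melody sung 7 semitones lower
other = [60, 55, 67, 50, 70, 52, 66, 58]   # unrelated contour
```

Because the whole query is scaled by a single factor, a singer who speeds up or slows down partway through is not handled well, which is the tempo-variation limitation noted above.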


Typical mapping path


DTW Path of “Match Beginning”


DTW Path of “Match Anywhere”


DTW Path of “Match Anywhere”
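The difference between the "match beginning" and "match anywhere" DTW paths can be sketched with one function and a flag. This is a basic DTW with the three standard moves and an absolute-difference local cost; the step pattern, the free end point, and the function name are assumptions for illustration, and key transposition is not handled here.

```python
def dtw_distance(query, target, match_anywhere=False):
    """Accumulated DTW cost of aligning the whole query to part of the target.

    match_anywhere=False: the path must start at the first target point.
    match_anywhere=True:  the path may start at any target point.
    In both cases the path may end at any target point.
    """
    inf = float("inf")
    n, m = len(query), len(target)
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    if match_anywhere:
        table[0] = [0.0] * (m + 1)  # free starting position in the target
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - target[j - 1])
            table[i][j] = local + min(table[i - 1][j],      # query advances
                                      table[i][j - 1],      # target advances
                                      table[i - 1][j - 1])  # both advance
    return min(table[n][1:])  # free ending position in the target


target = [60, 60, 62, 64, 65, 67, 67, 65]
query = [64, 65, 67]              # a fragment from the middle of the target
slow = [64, 64, 65, 65, 67, 67]   # the same fragment sung at half tempo
```

Here `dtw_distance(query, target, match_anywhere=True)` is 0 while the match-beginning variant is positive, and the half-tempo query also aligns at zero cost, which linear scaling with a single factor cannot always achieve.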


QBSH at MIREX 2006


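Corpus:
  The singing/humming test data consist of 2797 wav files (8 seconds each, 8 kHz/8-bit), recorded by 118 people and covering 48 children's songs; freely downloadable.
  The song database contains 2048 monophonic MIDI files; apart from the 48 children's songs above, the remaining songs were provided by the organizers and are not public.
Evaluation tasks:
  Use the 2797 wav files as queries to retrieve from the 2048 MIDI files: the metric is mean reciprocal rank; we reached 0.883 (3rd place among 13 participating teams worldwide).
  Use the 2797 wav files as queries to retrieve from the other 2797 wav files: the metric is mean precision; we reached 0.926 (1st place among 10 participating teams worldwide).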
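Mean reciprocal rank, the metric used for the first task, averages 1/rank of the correct song over all queries. The sketch below assumes each query returns a ranked list of song IDs and that a query whose correct song is missing from the list scores 0; the function name and toy data are hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, ground_truths):
    """Average of 1/rank of the correct item; 0 when it is not in the list."""
    total = 0.0
    for ranked, truth in zip(ranked_lists, ground_truths):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)  # ranks start at 1
    return total / len(ground_truths)


# Three toy queries whose correct answer "a" is ranked 1st, 2nd, and 3rd.
ranked_lists = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
ground_truths = ["a", "a", "a"]
mrr = mean_reciprocal_rank(ranked_lists, ground_truths)  # (1 + 1/2 + 1/3) / 3
```

A score close to 1 means the correct song is almost always returned at the top of the list.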


Demos of QBSH

Real-time pitch tracking demo
  SAP toolbox
Demo of QBSH
  http://mirlab.org/new/mir_products.asp#miracle
Most successful QBSH application
  http://www.midomi.com


Singing Voice Separation

Characteristics
  Easier on karaoke stereo songs
  Harder for monaural polyphonic songs
  Important step for a number of MIR applications
Demo clips
  http://sites.google.com/site/



On-going Research at AIST, Japan

Systems for listening to singing voices
  LyricSynchronizer: Automatic sync. of lyrics with polyphonic music recordings
  Singer ID: Singer identification
  MiruSinger: Singing skill visualization/training
  Hyperlinking Lyrics: Creating hyperlinks between phrases in song lyrics
  Breath Detection: Automatic detection of breath sounds in unaccompanied singing voice


On-going Research at AIST, Japan (II)

Systems for music information retrieval based on singing voices
  VocalFinder: Music information retrieval based on singing voice timbre
  Voice Drummer: Music notation of drums using vocal percussion input
Systems for singing synthesis
  SingBySpeaking: Speech-to-singing synthesis
  VocaListener: Singing-to-singing synthesis


The Grand Challenges of MIR

Polyphonic audio music transcription
  Analogy to the problem of image understanding over semitranslucent overlaid images
  As hard as telling that a turtle or a frog swam by just from watching the ripples on the water



MIR research is on the rise!
  MIR research over audio music (which accounts for 86% of MIREX tasks from 2005 to 2008)
    High-level feature identification
    Applications to genre/mood/tag classification/retrieval
Preexisting approaches shed light on MIR.
  Speech recognition/synthesis
  Text information retrieval
  Music theory


References
  J. S. Downie, D. Byrd, and T. Crawford, “Ten Years of ISMIR: Reflections on Challenges and Opportunities”, Keynote talk, Kobe, ISMIR 2009.
  M. A. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney, “Content-Based Music Information Retrieval: Current Directions and Future Challenges”, Proceedings of the IEEE, Vol. 96, No. 4, April 2008.
  J.-S. R. Jang and H.-R. Lee, “A General Framework of Progressive Filtering and Its Application to Query by Singing/Humming”, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, pp. 350-358, Feb 2008.
  Z.-S. Chen and J.-S. R. Jang, “On the Use of Anti-word Models for Audio Music Annotation and Retrieval”, IEEE Transactions on Audio, Speech, and Language Processing, 2009.
  C.-L. Hsu and J.-S. R. Jang, “On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset”, IEEE Transactions on Audio, Speech, and Language Processing, 2009.
  M. Goto, T. Saitou, T. Nakano, and H. Fujihara, “Singing Information Processing Based on Singing Voice Modeling”, ICASSP 2010, pp. 5506-5509.
