Upload
dustin
View
78
Download
0
Embed Size (px)
DESCRIPTION
Query by Tapping 敲擊選歌. J.-S. Roger Jang ( 張智星 ) Multimedia Information Retrieval Lab CS Dept., Tsing Hua Univ., Taiwan http://mirlab.org/jang. Query by Tapping. Goal: Music search based on uses’ tapping (at notes’ onsets) over the microphone/keyboard Characteristics - PowerPoint PPT Presentation
Citation preview
112/04/22 1
Query by Tapping敲擊選歌
J.-S. Roger Jang ( 張智星 )Multimedia Information Retrieval Lab
CS Dept., Tsing Hua Univ., Taiwan
http://mirlab.org/jang
-2-
Query by Tapping
Goal: Music search based on uses’ tapping (at notes’
onsets) over the microphone/keyboardCharacteristics
Only note duration is used for comparison, note pitch is discarded.
A hard task for human to recognize (which is different from query by singing/humming)Try this…
-3-
Query by Tapping
Goal: Music search based on uses’ tapping (at notes’
onsets) over the microphone/keyboardCharacteristics
Only note duration is used for comparison, note pitch is discarded.
A hard task for human to recognize (which is different from query by singing/humming)Try this…
-4-
Query by Tapping
Challenges: Users is unlikely to use the same tempo as the
intended song Users tend to lose notes instead of gaining ones We have about 13,000 songs in the database
Major approach: A distance measure based on dynamic
programming
-6-
Feature Extraction via Microphone
Microphone input:
After frame blocking, energy computation, and thresholding:
-7-
Performance Evaluation of Onset Detection
simSequence.m
0 1 2 3 4 5 6
-1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
Computed
GT
precision=3/6=0.5recall=3/5=0.6f-measure=2pr/(p+r)=0.5455
-8-
Similarity Comparison with Songs in Database
A fast method based on IOI ratios Compute the IOI ratios for both query and db IOI
vectors Compute the Euclidean distance these two ratio
vectors
-9-
Music Note Alignment
t(3)
t: test (input) IOI vectorr: reference IOI vector
r(1)t(1)
t(2)r(2)
r(3)
NormalizationAlignment
by DP
t r t r t r
-10-
Normalization
Normalization to have
(Multiplication of 1000 to guarantee high resolution in fixed-point computation.)
)~,~(min),(2~2
qppq
rtDrtdist
))):1((/):1(*1000(~))(/*1000(~
qrsumqrroundr
tsumtroundt
q
1000)(~)(~
11
q
jq
p
i
irit
-11-
Dynamic-programming-based Distance
i
j
t(i-2)
r(j-1)
1~1,0),1(
1~1,0)1,(
)1()1()2()1,2(
)1()1()1,1(
)1()1()2()2,1(
min
),(
2
1
njjD
miiD
jrititjiD
jritjiD
itjrjrjiD
jiD
),( jiD
t: test IOI vector of length mr: reference IOI vector of length n
Recurrent relation:
r(j-2)
t(i-1)t(1) t(2)
r(1)
r(2)
-12-
Experimental Environment
269 test wave files of tapping clips 9 contributors (7 males, 2 females) Wave length: 15 seconds Wave format: PCM, 11025Hz, 8bits, Mono Start position: Beginning of a song
Environment Pentium III 800, 256MB RAM
Database 11,744 MIDI files
-13-
Test Results Using Clips of 15 Seconds
Average response time: 3.42 seconds (29.98 notes)
Recognition rates: Top-1 (top 0.0085%): 15% Top-10 (top 0.085%): 51% Top-100 (top 0.85%): 80%
-14-
Error Analysis
Errors analysis of low-ranked clips Some users cannot tap consistently through 15
seconds Feature extraction is not robust enough to handle
noisy input. Some MIDI files are not faithful rendition of the
original tunes. Users cannot keep up with short consecutive notes.
-15-
Recog. Rates w.r.t. Tapping Duration
Top-100 and 1000 curves level off after 10 seconds.
Top-100 curve does not go up monotonically.
Top-100
Top-10
Top-1000
-16-
Demo
No. of MIDI files: 12982
-17-
Partial List of Songs All I have to do is dream You are my sunshine Beautiful Sunday Do Re Mi Feelings A time for us Love is blue Let it be me My way Love story More than I can say Only you Rain and tears
Rhythm of the rain Rose Rose I love you The sound of silence Unchained melody We are the world Yesterday I just call to say I love you Close to you Mr. Lonely Ben Hey Jude Donna Donna Sealed with a kiss
-18-
Potential Applications
Interactive toysBeat-tracking training and gamesSong retrieval in noisy karaoke bars
-19-
Conclusions
Our MIR system is the first one with query-by-tapping capability.
Rhythm-based search can be used in conjunction with pitch-contour-based search to achieve a better recognition rate.
-20-
Future Work
Search scope expansion How to retrieve MP3 or CD music directly?
Scale-up by hierarchical filtering method How to deal with database with 100,000 songs? What if the user tap from anywhere in the middle
of a song?