Endpoint Detection (端點偵測)
Jyh-Shing Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept.
National Taiwan Univ., Taiwan
-2-
Intro to Endpoint Detection
Endpoint detection (EPD)
  Goal: Determine the start and end of voice activity
  Also known as voice activity detection (VAD)
Importance
  Acts as a preprocessing step for speech-based applications
  Should require as little computing power as possible
Two recording modes for speech-based applications
  Push to talk: offline EPD
    Example: Voice command
  Continuously listening: online EPD
    Example: Dialog system, such as Siri
Quiz!
Cell phone too!
-3-
Types of Features for EPD
Time-domain
  Volume only
  Volume and ZCR (zero crossing rate)
  Volume and HOD (high-order difference)
  …
Frequency-domain
  Variance of spectrum
  Entropy of spectrum
  Spectrum
  MFCC
  …
Some features belong to both!
-4-
Typical Approaches to EPD
Thresholding
  Simple thresholding
    Compute a feature (e.g., volume) from each frame
    Select a threshold vth to identify frames of voice activity
  Combined thresholding
    Use two features (e.g., volume and ZCR) to make the decision
Static classification
  Extract features
  Perform binary classification
    Negative: silence or noise
    Positive: voice activity
Sequence alignment
  Use hidden Markov models (HMM) for sequence alignment
You need to use these approaches in the EPD program competition.
-5-
Performance Evaluation for EPD (1/2)
Two types of errors (typical for all binary classification)
  False negative (aka false rejection): a positive frame is classified as negative
  False positive (aka false acceptance): a negative frame is classified as positive
Confusion matrix/table
Quiz!
-6-
Performance Evaluation for EPD (2/2)
Typical methods
  Start & end position accuracy
  Frame-based accuracy
Quiz!
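The two error types and the frame-based accuracy listed above can be illustrated with a minimal Python sketch. It assumes per-frame boolean labels (True = voice activity). The function names and the toy label sequences are hypothetical and do not come from the slides.

```python
def confusion_counts(truth, predicted):
    """Count (TP, FN, FP, TN) for per-frame labels, where True = voice activity."""
    if len(truth) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)      # false rejection
    fp = sum(1 for t, p in pairs if not t and p)      # false acceptance
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp, fn, fp, tn


def frame_accuracy(truth, predicted):
    """Fraction of frames whose predicted label matches the ground truth."""
    if not truth:
        raise ValueError("need at least one frame")
    tp, fn, fp, tn = confusion_counts(truth, predicted)
    return (tp + tn) / len(truth)


# Toy example (made-up labels): one false acceptance and one false rejection.
truth     = [False, False, True, True, True, True,  False, False]
predicted = [False, True,  True, True, True, False, False, False]
print(confusion_counts(truth, predicted))  # (3, 1, 1, 3)
print(frame_accuracy(truth, predicted))    # 0.75
```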
-8-
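EPD by Volume Thresholding
The simplest method for EPD
  Volume is the sum of the absolute values of the samples in a frame.
Four intuitive ways to select vth (v1 is the volume of the first frame; α, β, γ, δ are tunable ratios):
  vth = vmax*α
  vth = vmedian*β
  vth = vmin*γ
  vth = v1*δ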
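As a rough Python sketch of the definitions above: volume is computed per non-overlapping frame, and the four candidate thresholds are derived from the frame volumes. The function names, the non-overlapping framing, and the default ratio values are assumptions chosen for illustration only.

```python
from statistics import median


def frame_volumes(samples, frame_size):
    """Volume of each non-overlapping frame: sum of absolute sample values."""
    n_frames = len(samples) // frame_size
    return [
        sum(abs(x) for x in samples[i * frame_size:(i + 1) * frame_size])
        for i in range(n_frames)
    ]


def candidate_thresholds(volumes, alpha=0.1, beta=0.5, gamma=10.0, delta=5.0):
    """The four candidate thresholds; the default ratios are illustrative only."""
    return {
        "max": max(volumes) * alpha,
        "median": median(volumes) * beta,
        "min": min(volumes) * gamma,
        "first": volumes[0] * delta,
    }


samples = [0, 1, -1, 0, 5, -8, 9, -4, 1, 0, -1, 2]
volumes = frame_volumes(samples, frame_size=4)
print(volumes)  # [2, 26, 4]
print(candidate_thresholds(volumes))
```

A frame is then marked as voice activity when its volume exceeds the chosen threshold.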
-9-
How Do They Fail?
Unfortunately…
  All the thresholds fail one way or another.
  Under what situations do they fail?
    vth = vmax*α: plosive sounds
    vth = vmedian*β: silence too long
    vth = vmin*γ: all-zero frame
    vth = v1*δ: unstable first frame
We need a better strategy…
-10-
A Better Strategy for Threshold Finding
A presumably better way to select vth
  vlower = 3rd percentile of volumes
  vupper = 97th percentile of volumes
  vth = (vupper-vlower)*α+vlower
Why do we need to use percentiles?
  To deal with plosive sounds
  To deal with all-zero frames
Does it fail?
  Yes, still, in certain situations…
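The percentile-based rule above can be sketched in Python as follows. The linear-interpolation percentile used here is one common definition and may differ slightly from the one used in the original scripts. The function names and the default ratio 0.1 are assumptions.

```python
def percentile(values, q):
    """q-th percentile (0 to 100) using linear interpolation between sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def volume_threshold(volumes, alpha=0.1, lower_q=3, upper_q=97):
    """vth = (vupper - vlower) * alpha + vlower, with percentile-based bounds."""
    v_lower = percentile(volumes, lower_q)
    v_upper = percentile(volumes, upper_q)
    return (v_upper - v_lower) * alpha + v_lower


# One extreme frame (e.g. a plosive burst) does not move the threshold,
# because it lies above the 97th percentile.
clean = list(range(101))
with_spike = list(range(100)) + [10**6]
print(volume_threshold(clean), volume_threshold(with_spike))
```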
-11-
Example: EPD by Volume
epdByVol01.m
[Figure: top panel "Waveform and EP (method=vol)" (y-axis: amplitude); bottom panel "Volume" (y-axis: volume).]
-12-
How to Enhance EPD by Volume?
Major problems of EPD by volume
  Threshold is hard to determine
    Corpus-based fine-tuning
  Unvoiced parts are likely to be ignored
We need a feature to enhance the unvoiced parts
  This can be achieved by ZCR or HOD
-14-
ZCR for Unvoiced Sound Detection
ZCR: zero crossing rate
  Number of zero crossings in a frame
  ZCRvoiced < ZCRsilence < ZCRunvoiced
Example: epdShowZcr01.m
[Figure: SingaporeIsAFinePlace.wav. Top panel: waveform (amplitude vs. time in sec); bottom panel "ZCR" (zero-crossing count per frame).]
Quiz: If frame = [-1 2 -2 3 5 2 -2 1], what is its ZCR?
Quiz!
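A minimal Python sketch of the zero-crossing count is shown here. It assumes a crossing is a strict sign change between two consecutive samples, so a sample that is exactly zero is never counted. Other conventions exist, and the function name is hypothetical.

```python
def zero_crossing_count(frame):
    """Count strict sign changes between consecutive samples of one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


# Signs are + - - + + -, so there are three sign changes.
print(zero_crossing_count([3, -1, -2, 4, 0.5, -6]))  # 3
```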
-15-
EPD by Volume and ZCR
1. Determine initial endpoints by the upper volume threshold τu
2. Expand the initial endpoints based on the lower volume threshold τl
3. Further expand the endpoints based on the ZCR threshold τzc
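The three steps above can be sketched in Python roughly as follows. This simplified version assumes a single speech segment per recording and takes per-frame volume and ZCR lists as input. The function name, the parameter names (tau_u, tau_l, tau_zc), and the toy values are hypothetical.

```python
def epd_by_vol_zcr(volume, zcr, tau_u, tau_l, tau_zc):
    """Return (start, end) frame indices of one detected segment, or None."""

    def expand(start, end, feature, threshold):
        # Grow the segment outward while neighbouring frames exceed the threshold.
        while start > 0 and feature[start - 1] > threshold:
            start -= 1
        while end < len(feature) - 1 and feature[end + 1] > threshold:
            end += 1
        return start, end

    # Step 1: initial endpoints from frames whose volume exceeds tau_u.
    above = [i for i, v in enumerate(volume) if v > tau_u]
    if not above:
        return None
    start, end = above[0], above[-1]
    # Step 2: expand while the volume stays above the lower threshold tau_l.
    start, end = expand(start, end, volume, tau_l)
    # Step 3: expand further while the ZCR stays above tau_zc (unvoiced parts).
    start, end = expand(start, end, zcr, tau_zc)
    return start, end


volume = [1, 2, 6, 20, 30, 25, 5, 2, 1, 1]
zcr = [10, 60, 20, 15, 12, 14, 30, 70, 65, 10]
print(epd_by_vol_zcr(volume, zcr, tau_u=15, tau_l=4, tau_zc=50))  # (1, 8)
```

In the toy data, step 1 gives frames 3 to 5, step 2 widens this to 2 to 6, and step 3 picks up the high-ZCR frames on both sides.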
-16-
Example: EPD by Volume and ZCR
epdByVolZcr01.m
[Figure: four panels: "Waveform and EP (method=volZcr)" (amplitude), "Volume" (volume), "Zero crossing rate" (ZCR), and "Waveform after EPD" (amplitude).]
-18-
EPD by Volume and HOD
Another feature to enhance unvoiced sounds: high-order difference (HOD)
  Order-1 HOD = sum(abs(diff(s)))
  Order-2 HOD = sum(abs(diff(diff(s))))
  Order-3 HOD = sum(abs(diff(diff(diff(s)))))
  …
Quiz: If frame = [-1 2 -2 3 -3 2 -2 1], what is its order-n HOD when n is 1, 2, and 3?
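The HOD formulas above translate directly into a short Python sketch, where diff plays the role of MATLAB's diff on a vector. The function names are hypothetical.

```python
def diff(s):
    """First difference: s[1]-s[0], s[2]-s[1], ..."""
    return [b - a for a, b in zip(s, s[1:])]


def high_order_difference(s, order):
    """Order-n HOD: sum of absolute values after differencing n times."""
    if order < 1:
        raise ValueError("order must be at least 1")
    d = list(s)
    for _ in range(order):
        d = diff(d)
    return sum(abs(x) for x in d)


s = [0, 2, 1, 4]
print([high_order_difference(s, n) for n in (1, 2, 3)])  # [6, 7, 7]
```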
-19-
Example: Plots of Volume and HOD
highOrderDiff01.m
[Figure: top panel "Waveform" (amplitude vs. time in sec); bottom panel with curves for volume and order-1 to order-4 differences.]
-20-
Example: EPD by Vol. and HOD
epdByVolHod01.m
[Figure: three panels: "Waveform and EP (method=volHod)" (amplitude), "Volume & HOD" (curves: Volume, HOD), and "VH".]
-21-
Hard Example: EPD by Vol. and HOD
A hard example: epdByVolHod02.m
[Figure: three panels: "Waveform and EP (method=volHod)" (amplitude), "Volume & HOD" (curves: Volume, HOD), and "VH".]
-23-
Spectrogram
Goal
  Describe the energy distribution in each frame along time
MATLAB command
  [S,F,T] = spectrogram(signal, frameSize, overlap, fftSize, fs);
Facts
  The FFT of a real signal has a complex-conjugate-symmetric spectrum
    Take the first fftSize/2+1 points (frameSize/2+1 without zero padding) when we consider magnitude only
  Use zero padding to have a larger fftSize and thus finer frequency spacing
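The conjugate-symmetry fact can be checked with a small Python sketch that uses a naive DFT in place of an FFT library. The function names are hypothetical, and fft_size mirrors the fftSize argument above. For a real frame, X[N-k] equals the complex conjugate of X[k], so the first fft_size/2+1 points carry all the magnitude information.

```python
import cmath


def dft(frame, fft_size):
    """Naive DFT of a real frame, zero-padded to fft_size points."""
    if fft_size < len(frame):
        raise ValueError("fft_size must be at least the frame length")
    padded = list(frame) + [0.0] * (fft_size - len(frame))
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * n / fft_size)
            for n, x in enumerate(padded))
        for k in range(fft_size)
    ]


def one_sided_magnitude(frame, fft_size):
    """Magnitudes of the first fft_size // 2 + 1 DFT points."""
    spectrum = dft(frame, fft_size)
    return [abs(c) for c in spectrum[: fft_size // 2 + 1]]


frame = [1.0, 2.0, 0.5, -1.0]
X = dft(frame, 8)  # zero-padded from 4 to 8 points
symmetric = all(abs(X[k] - X[8 - k].conjugate()) < 1e-9 for k in range(1, 8))
print(symmetric, len(one_sided_magnitude(frame, 8)))
```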
-24-
EPD by Spectrum
epdShowSpec01.m, epdShowSpec02.m
[Figure: SingaporeIsAFinePlace.wav. Top panel: waveform (amplitude vs. time in sec); bottom panel: spectrogram (frequency in Hz).]
[Figure: noisy4epd.wav. Top panel: waveform (amplitude vs. time in sec); bottom panel: spectrogram (frequency in Hz).]
-25-
How to Aggregate Spectrum?
How to aggregate the spectrum into a single feature which is larger (or smaller) when the spectral energy distribution is diversified?
  Entropy function
  Geometric mean over arithmetic mean
-26-
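Entropy Function (1/2)
Entropy function
  $\mathbf{p} = (p_1, p_2, \ldots, p_n)$, with $p_i \ge 0\ \forall i$ and $\sum_{i=1}^{n} p_i = 1$
  $\mathrm{entropy}(\mathbf{p}) = -\sum_{i=1}^{n} p_i \ln p_i$
Property
  $\mathrm{entropy}(\mathbf{p})$ achieves its minimum 0 when one of the $p_i$ is 1.
  $\mathrm{entropy}(\mathbf{p})$ achieves its maximum when $p_1 = p_2 = \cdots = p_n = 1/n$.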
Quiz!
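A minimal Python sketch of the entropy function and its two extreme cases is given here. It uses the usual convention that a term with p_i = 0 contributes 0. The function name is hypothetical.

```python
import math


def entropy(p):
    """Entropy -sum(p_i * ln p_i) of a discrete distribution p."""
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be non-negative and sum to 1")
    # Terms with p_i == 0 are skipped (0 * ln 0 is taken as 0).
    return sum(-x * math.log(x) for x in p if x > 0)


print(entropy([1.0, 0.0, 0.0, 0.0]))      # minimum: 0.0
print(entropy([0.25, 0.25, 0.25, 0.25]))  # maximum: ln 4
print(entropy([0.7, 0.1, 0.1, 0.1]))      # somewhere in between
```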
-27-
Entropy Function (2/2)
Proof by taking derivative
  $\mathrm{entropy}(\mathbf{p})$ achieves its minimum 0 when one of the $p_i$ is 1.
  $\mathrm{entropy}(\mathbf{p})$ achieves its maximum when $p_1 = p_2 = \cdots = p_n = 1/n$.
Quiz!
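One possible way to carry out the derivative-based proof is sketched below with a Lagrange multiplier for the constraint that the probabilities sum to 1. The slides do not spell out the derivation, so this is only one standard route.

```latex
% Maximum: stationary point of the Lagrangian
\begin{aligned}
L(\mathbf{p}, \lambda) &= -\sum_{i=1}^{n} p_i \ln p_i + \lambda \Bigl( \sum_{i=1}^{n} p_i - 1 \Bigr), \\
\frac{\partial L}{\partial p_i} &= -\ln p_i - 1 + \lambda = 0
  \;\Rightarrow\; p_i = e^{\lambda - 1} \quad \text{(the same for every } i\text{)}, \\
\sum_{i=1}^{n} p_i = 1 &\;\Rightarrow\; p_1 = p_2 = \cdots = p_n = \frac{1}{n},
  \qquad \mathrm{entropy}(\mathbf{p}) = \ln n .
\end{aligned}
% The stationary point is a maximum because
% \partial^2 L / \partial p_i^2 = -1/p_i < 0.

% Minimum: every term is non-negative on [0, 1]
-p_i \ln p_i \ge 0 \ \text{ for } 0 \le p_i \le 1,
\quad \text{with equality iff } p_i \in \{0, 1\},
\quad \text{so } \mathrm{entropy}(\mathbf{p}) = 0 \text{ iff one } p_i = 1 .
```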
-29-
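Spectral Entropy
PDF:
  $p_i = \dfrac{s(f_i)}{\sum_{k=1}^{N} s(f_k)}, \quad i = 1, \ldots, N$
Normalization:
  $s(f_i) = 0$ if $f_i < 250$ Hz or $f_i > 6000$ Hz
  $p_i = 0$ if $p_i < \delta_2$ or $p_i > \delta_1$
Spectral entropy:
  $H = -\sum_{k=1}^{N} p_k \log p_k$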
Reference: Jialin Shen, Jeihweih Hung, Linshan Lee, “Robust entropy-based endpoint detection for speech recognition in noisy environments”, International Conference on Spoken Language Processing, Sydney, 1998
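The spectral entropy computation can be sketched in Python as follows, assuming the spectrum is given as parallel lists of bin frequencies and non-negative spectral values. Only the band-limiting step (250 Hz to 6000 Hz) is implemented; the thresholding of individual p_i values is left out. The function name and parameter names are hypothetical.

```python
import math


def spectral_entropy(freqs_hz, spectrum, f_low=250.0, f_high=6000.0):
    """Entropy of the band-limited spectrum, normalized to a PDF."""
    # Normalization step: zero the components outside [f_low, f_high].
    s = [v if f_low <= f <= f_high else 0.0 for f, v in zip(freqs_hz, spectrum)]
    total = sum(s)
    if total <= 0:
        raise ValueError("no spectral energy inside the band")
    p = [v / total for v in s]
    # Bins with p_k == 0 contribute nothing to the sum.
    return sum(-x * math.log(x) for x in p if x > 0)


freqs = [0, 1000, 2000, 3000, 7000]
flat = [1, 1, 1, 1, 1]    # evenly spread energy: high entropy
peaky = [0, 5, 0, 0, 9]   # one in-band peak: zero entropy
print(spectral_entropy(freqs, flat), spectral_entropy(freqs, peaky))
```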
-30-
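Geometric/Arithmetic Means
Arithmetic & geometric means
  $\mathbf{p} = (p_1, p_2, \ldots, p_n)$, with $p_i \ge 0\ \forall i$
  $am(\mathbf{p}) = \dfrac{\sum_{i=1}^{n} p_i}{n}, \quad gm(\mathbf{p}) = \Bigl( \prod_{i=1}^{n} p_i \Bigr)^{1/n}$
Property
  $am(\mathbf{p}) \ge gm(\mathbf{p})$
  $gm(\mathbf{p}) / am(\mathbf{p})$ achieves its maximum when $p_1 = p_2 = \cdots = p_n$.
Proof…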
Quiz!
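The ratio of geometric mean to arithmetic mean can be sketched in Python as follows. The geometric mean is computed through logarithms to avoid overflow in the product, and any zero entry makes the ratio 0. The function name is hypothetical.

```python
import math


def gm_over_am(p):
    """Geometric mean divided by arithmetic mean of non-negative values."""
    if not p or any(x < 0 for x in p):
        raise ValueError("p must be non-empty and non-negative")
    am = sum(p) / len(p)
    if am == 0:
        raise ValueError("the ratio is undefined when all values are zero")
    if any(x == 0 for x in p):
        return 0.0  # a single zero makes the geometric mean zero
    gm = math.exp(sum(math.log(x) for x in p) / len(p))
    return gm / am


print(gm_over_am([2, 2, 2, 2]))  # equal values: ratio is 1 (the maximum)
print(gm_over_am([1, 4]))        # gm = 2, am = 2.5: ratio is 0.8
```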