Special Project (1) Introduction

Speech Project Week 1

(1) Introduction
Prof. Lin-Shan Lee

Speech Recognition by Kaldi toolkit
Introduction of the Project

Input Speech -> Recognition System -> Output Sentence

How to do recognition?
How to map speech O to a word sequence W?

P(O|W): acoustic model
P(W): language model
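The two models above are usually combined through Bayes' rule. The derivation below is a standard sketch rather than a copy of the slide; the symbol W* for the recognized word sequence is notation assumed here.

```latex
\begin{align}
W^{*} &= \arg\max_{W} P(W \mid O) \\
      &= \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)} \\
      &= \arg\max_{W} P(O \mid W)\, P(W)
\end{align}
```

P(O) does not depend on W, so it can be dropped from the maximization, leaving only the acoustic model and the language model.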

Language model P(W)
W = w1, w2, w3, ..., wn
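One common way to compute P(W) for such a word sequence is the chain rule followed by an n-gram approximation. The bigram case is shown as an illustration; which n-gram order the project uses is not stated here, and w_0 is assumed to denote a sentence-start symbol.

```latex
\begin{align}
P(W) &= P(w_1, w_2, \dots, w_n) \\
     &= \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1}) \\
     &\approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})
       \qquad \text{(bigram approximation)}
\end{align}
```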

Language model examples
Probability in log scale

Acoustic Model P(O|W)
Model of a phone
Gaussian Mixture Model
Markov Model

Acoustic Model P(O|W)
How to compute P(O|W)?

Feature Extraction

MFCC (Mel-frequency cepstral coefficients)
13-dimensional vector

Lexicon

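[Figure: block diagram of the recognition system. Blocks: Input Speech, Front-end Signal Processing, Feature Vectors, Linguistic Decoding and Search Algorithm, Output Sentence, Speech Corpora, Acoustic Model Training, Acoustic Models, Lexical Knowledge-base, Lexicon, Text Corpora, Language Model Construction, Language Model, Grammar]
Use Kaldi as tool

Linux Introduction

Vim
vim hello.txt (open hello.txt)
i (enter insert mode)
ESC (return to normal mode)
/ (search)
:w (save)
:wq (save and quit)

Screen
1) Type "screen" to start a new screen session
4) Type "exit" inside a screen to close it
5) Press "Ctrl + a" then "d" to detach from the screen (it keeps running in the background)
6) Type "screen -r" to return to the detached screen
7) If there are several screens, "screen -r" lists them; use "screen -r" followed by the screen id to choose one

Linux Shell Script Basics
echo Hello (print Hello on the screen)
a=ABC (assign ABC to a)
echo $a (will print ABC on the screen)
b=$a.log (assign ABC.log to b)
cat $b > testfile (write the contents of the file ABC.log to testfile)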
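The shell script basics above can be tried as one small script. This is a minimal sketch: the temporary working directory and the line that creates ABC.log are additions assumed here so that cat has a file to read; they are not part of the slide.

```shell
#!/bin/sh
# Work in a throwaway directory so that no existing files are touched (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo Hello                 # print Hello on the screen
a=ABC                      # assign ABC to a (no spaces around '=')
echo $a                    # print the value of a
b=$a.log                   # b now holds the string ABC.log

# Added for the sketch: create ABC.log, otherwise cat would report a missing file.
echo "first line" > "$b"

cat $b > testfile          # copy the contents of ABC.log into testfile
cat testfile               # show what was written
```

Note that cat reads the file named by $b; to write the string ABC.log itself into testfile, echo $b > testfile would be used instead.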

-h (will output the help information)

02.01.extract.feat.sh
Feature Extraction

Feature Extraction - MFCC

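Extract Feature (02.extract.feat.sh)
Training Set, Development Set, Testing Set
Input, Output, Archive

Kaldi rspecifier & wspecifier format
ark: an archive file that holds the data itself (wav, mfcc, statistics)
scp: a script file (e.g. material/train.wav.scp) that lists where each entry is stored, such as a location inside an ark
ark,t: an archive written in text mode (t)
ark,scp: writes an ark together with an scp that indexes it

Extract Feature (extract.feat.sh)
add-deltas
compute-cmvn-stats
apply-cmvn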
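The three tools above can be chained with rspecifiers and wspecifiers roughly as follows. This is a sketch under assumptions, not the content of 02.extract.feat.sh: every file name is hypothetical, the order (deltas first, then CMVN) simply follows the list above, and the block skips itself when Kaldi or the input archive is not available.

```shell
#!/bin/sh
# Hypothetical file names; only the tool names come from the slides.
mfcc=feat/train.13.ark        # assumed input: 13-dimensional MFCC archive
delta=feat/train.39.ark       # assumed output: MFCC + deltas + delta-deltas
cmvn=feat/train.cmvn.ark      # assumed output: CMVN statistics
norm=feat/train.39.cmvn.ark   # assumed output: normalized features

if command -v add-deltas >/dev/null 2>&1 && [ -f "$mfcc" ]; then
  # "ark:FILE" as first argument is an rspecifier (read), as last a wspecifier (write).
  add-deltas "ark:$mfcc" "ark:$delta" &&
  compute-cmvn-stats "ark:$delta" "ark:$cmvn" &&
  # apply-cmvn normalizes only the mean unless --norm-vars=true is given.
  # "ark,scp:A,B" writes the archive A and its index B in one pass.
  apply-cmvn --norm-vars=true "ark:$cmvn" "ark:$delta" \
    "ark,scp:$norm,${norm%.ark}.scp" &&
  status=ran || status=failed
else
  status=skipped
fi
echo "feature pipeline: $status"
```

Replacing the last wspecifier with ark,t:- would print the normalized features as text on the screen, which is convenient for inspecting the output.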

MFCC Add delta
add-deltas
Deltas and Delta-Deltas
MFCC (13 dimensions) becomes 39 dimensions
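The step from 13 to 39 dimensions can be written out as follows. The regression formula is the commonly used definition of a delta feature; the window size N and the symbols c_t, Δc_t are notation assumed here, and the exact window used by the project scripts is not stated in the slides.

```latex
\begin{align}
\Delta c_t &= \frac{\sum_{n=1}^{N} n \,\bigl(c_{t+n} - c_{t-n}\bigr)}{2 \sum_{n=1}^{N} n^{2}} \\
\Delta^{2} c_t &= \frac{\sum_{n=1}^{N} n \,\bigl(\Delta c_{t+n} - \Delta c_{t-n}\bigr)}{2 \sum_{n=1}^{N} n^{2}} \\
\dim\bigl[\, c_t ;\ \Delta c_t ;\ \Delta^{2} c_t \,\bigr] &= 13 + 13 + 13 = 39
\end{align}
```

Here c_t is the 13-dimensional MFCC vector of frame t, so each frame gains a first-order (delta) and a second-order (delta-delta) difference of the same size.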


MFCC CMVN
CMVN: Cepstral Mean and Variance Normalization

compute-cmvn-stats (computes the mean and variance statistics of the features)
apply-cmvn (normalizes the features using those statistics)
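In formula form, cepstral mean and variance normalization can be sketched as below. The notation is assumed here: x_t[d] is dimension d of the feature vector at frame t, and T is the number of frames over which the statistics are collected (for example one utterance).

```latex
\begin{align}
\mu[d] &= \frac{1}{T} \sum_{t=1}^{T} x_t[d] \\
\sigma^{2}[d] &= \frac{1}{T} \sum_{t=1}^{T} \bigl(x_t[d] - \mu[d]\bigr)^{2} \\
\hat{x}_t[d] &= \frac{x_t[d] - \mu[d]}{\sigma[d]}
\end{align}
```

The first two lines correspond to what compute-cmvn-stats accumulates, and the last line to what apply-cmvn produces when variance normalization is enabled.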

Hint (Important!!)
Linux, background knowledge
01.format.sh, 02.extract.feat.sh

Homework
Linux: http://linux.vbird.org/linux_basic/0220filemanager.php
vim: http://linux.vbird.org/linux_basic/0310vi.php

Homework (optional)
https://www.dropbox.com/s/dsaqh6xa9dp3dzw/wfst_thesis.pdf

Data
pietty/putty/Xshell
ssh 140.112.21.9 port 22
(/proj1/)
cp /share/proj1.ASTMIC.subset.tar.gz
tar zxvf proj1.ASTMIC.subset.tar.gz

To Do
Step 1: Execute the following commands:
script/01.format.sh | tee log/01.format.log
script/02.extract.feat.sh | tee log/02.extract.feat.sh.log
Step 2: Add-delta, CMVN
Observe the output and report

Questions
The intuition behind the Mel-scale filter bank.
Why does the dimension of MFCC = 13?
How do we extract features from speech? Draw the work flow of extracting MFCC.

Schedule
Week  Progress                                        Group
2     Introduction: Linux + Feature extraction
3     Acoustic model training: monophone & triphone
4     Language model training + Decoding              A
5     Live demo system                                B
6     Progress Report                                 A
7     Progress Report                                 B

If you have any problem
Facebook Group 103
Lecture system: http://speech.ee.ntu.edu.tw/
[email protected]

e-mail, facebook: [email protected]
Group (A/B), emails, facebook
Thanks
