Source: speech.ee.ntu.edu.tw/Project2018Spring/SpeechProj2.pdf · 2017. 9. 18.


Special Research Project, Week 2

Prof. Lin-shan Lee
TA: Cheng-Chieh Yeh

1. Recap

2. Apply HMM to Acoustic Modeling

3. Acoustic Model Training

4. Homework

Outline

Recap

Speech Recognition System

[Figure: block diagram of a speech recognition system. Input Speech → Front-end Signal Processing → Feature Vectors → Linguistic Decoding and Search Algorithm → Output Sentence. The decoder draws on the Acoustic Models, the Lexicon, and the Language Model. Acoustic Model Training uses Speech Corpora; Language Model Construction uses Text Corpora; the Lexicon and Grammar come from a Lexical Knowledge-base.]

Last time: front-end signal processing (feature extraction).
Today: acoustic models and acoustic model training.

How to do recognition?

◻ How to map speech O to a word sequence W?
◻ P(O|W): acoustic model
◻ P(W): language model
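Combining the two models through Bayes' rule gives the standard decoding criterion; P(O) is constant over W and can be dropped:

```latex
W^{*} = \arg\max_{W} P(W \mid O)
      = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
      = \arg\max_{W} P(O \mid W)\, P(W)
```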

Apply HMM to Acoustic Modeling

Hidden Markov Model

Simplified HMM example: what problem can an HMM model?
Observation sequence: RGBGGBBGRRR……

Hidden Markov Model

◻ Elements of an HMM {S, A, B, π}
⬜ S is a set of N states
⬜ A is the N×N matrix of state transition probabilities
⬜ B is a set of N probability functions, each describing the observation probability with respect to a state
⬜ π is the vector of initial state probabilities

[Figure: a three-state example with states s1, s2, s3. The emission distributions {A:.7, B:.1, C:.2}, {A:.3, B:.2, C:.5}, and {A:.3, B:.6, C:.1} are attached to the three states; the transition probabilities (0.6, 0.7, 0.3, 0.3, 0.2, 0.2, 0.1, 0.3, 0.7) label the arrows between states.]

Gaussian Mixture Model (GMM)

• What if the observation is continuous (e.g., MFCC feature vectors)?
• We then need a continuous probability density function to model the observations, which is often assumed to be a GMM.
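As a minimal sketch of what "assumed to be a GMM" means in practice, the density of a feature vector under a diagonal-covariance GMM can be evaluated as a weighted sum of Gaussians (all names here are illustrative, not Kaldi APIs):

```python
import math

def gmm_logpdf(x, weights, means, variances):
    """Log-density of a diagonal-covariance GMM at vector x.

    weights:   mixture weights, summing to 1
    means:     one mean vector per component, same length as x
    variances: one diagonal-variance vector per component
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # log of w * N(x; mu, diag(var)), accumulated per dimension
        lp = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            lp += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(lp)
    # log-sum-exp over components, for numerical stability
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))
```

With one component this reduces to a single Gaussian log-density, which is an easy sanity check.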

HMM: Three Basic Problems

◻ Given an observation sequence O = (o1, o2, …, oT) and an HMM λ = (A, B, π):
⬜ Problem 1: How to efficiently compute P(O|λ)? ⇒ Evaluation problem
⬜ Problem 2: How to choose an optimal state sequence q = (q1, q2, …, qT)? ⇒ Decoding problem
⬜ Problem 3: Given some observations O for the HMM λ, how to adjust the model parameters λ = (A, B, π) to maximize P(O|λ)? ⇒ Learning/Training problem

Problems 1 and 2 are solved with dynamic programming (DP) and its variations; Problem 3 is solved with EM and its variations.
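Problem 1 is solved by the forward algorithm, the basic DP recursion. A minimal sketch for a discrete-observation HMM (the toy 2-state model in the test is an assumption, not the slide's example):

```python
def forward(obs, pi, A, B):
    """P(O | lambda) via the forward algorithm (Problem 1).

    pi[i]: initial probability of state i
    A[i][j]: transition probability from state i to state j
    B[i][o]: probability of emitting symbol o in state i
    """
    N = len(pi)
    # initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    # induction: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # termination: P(O|lambda) = sum_i alpha_T(i)
    return sum(alpha)
```

This costs O(N²T), versus O(Nᵀ) for naive enumeration over all state sequences.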

HMM Training

◻ Concept: EM training
◻ Ref: the instructor's lecture slides, 4.0
◻ 1. Coarse tuning: Segmental K-means
◻ 2. Fine tuning: Baum-Welch algorithm
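The coarse-tuning step relies on a single best state alignment, which is Problem 2. A sketch of Viterbi decoding for a discrete-observation HMM (the left-to-right toy model in the test is an assumption for illustration):

```python
def viterbi(obs, pi, A, B):
    """Most likely state sequence (Problem 2) by dynamic programming.

    pi, A, B as in the forward algorithm; returns a list of state indices.
    """
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]
    back = []  # back[t][j]: best predecessor of state j at time t+1
    for o in obs[1:]:
        prev = delta
        back.append([max(range(N), key=lambda i: prev[i] * A[i][j])
                     for j in range(N)])
        delta = [prev[back[-1][j]] * A[back[-1][j]][j] * B[j][o]
                 for j in range(N)]
    # backtrace from the best final state
    q = [max(range(N), key=lambda j: delta[j])]
    for bp in reversed(back):
        q.append(bp[q[-1]])
    return list(reversed(q))
```

Segmental k-means alternates this alignment step with re-estimating the per-state distributions from the aligned frames.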

Acoustic Model P(O|W)

◻ How to compute P(O|W)?

ㄐ 一ㄣ ㄊ 一ㄢ (the phone sequence of「今天」, "today")

Acoustic Model P(O|W)

◻ Model of a phone

[Figure: an HMM for one phone: a Markov model whose per-state observation densities are Gaussian Mixture Models.]

Segmental K-means

Suppose four speakers all produce the sound「ㄅ」, but each utterance has a different duration.

An Example of Modifying the HMM

[Figure: observations O1–O10, each a sequence of the symbols v1/v2 over up to 10 time frames, aligned to a three-state (s1, s2, s3) HMM. Counting symbols within each state's aligned frames gives the re-estimated emission probabilities: b1(v1)=3/4, b1(v2)=1/4; b2(v1)=1/3, b2(v2)=2/3; b3(v1)=2/3, b3(v2)=1/3.]
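The re-estimation in this example is just relative-frequency counting over aligned frames. A minimal sketch (the particular alignment used in the test is a hypothetical one constructed to reproduce the slide's numbers, not the figure's actual data):

```python
from collections import Counter, defaultdict

def reestimate_emissions(alignment):
    """Re-estimate discrete emission probabilities from a state alignment,
    as in the M-like step of segmental k-means.

    alignment: list of (state, symbol) pairs, one per aligned frame
    Returns b[state][symbol] = relative frequency within that state.
    """
    counts = defaultdict(Counter)
    for state, symbol in alignment:
        counts[state][symbol] += 1
    return {s: {v: c / sum(cnt.values()) for v, c in cnt.items()}
            for s, cnt in counts.items()}
```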

Monophone vs. Triphone

⬜ Monophone: considers only one phone's information per model
⬜ Triphone: takes into consideration both the left and right neighboring phones

Ex. Monophone: 一, ㄨ, ㄩ; Triphone: ㄐ-一+ㄢ, ㄐ-一+ㄤ
Note that the two「一」get different triphone models!
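Expanding a monophone sequence into context-dependent labels can be sketched as below; the L-C+R notation matches the slide, while the "sil" boundary padding is an assumption for illustration:

```python
def to_triphones(phones, boundary="sil"):
    """Expand a monophone sequence into left-right context triphone
    labels written in the L-C+R style, padding utterance boundaries
    with a (hypothetical) silence phone."""
    padded = [boundary] + phones + [boundary]
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]
```

The same center phone with different neighbors yields different labels, hence different models.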

Triphone

◻ A phone model taking into consideration both the left and right neighboring phones. With about 60 phones, 60³ → 216,000 possible triphones.
◻ Many triphones are rarely observed, so we need to share their parameters with other triphones.

Generalized Triphone

Shared Distribution Model (SDM)
• Sharing at the model level
• Sharing at the state level

Nowadays, decision-tree-based methods are used instead.

Training Triphone Models with Decision Trees

Example questions:
12: Is the left context a vowel?
24: Is the left context a back vowel?
30: Is the left context a low vowel?
32: Is the left context a rounded vowel?

[Figure: an example binary decision tree for the triphones of phone b with right context u, written "( _ ‒ ) b ( + _ )". Each node asks a yes/no question about the context; the leaves group triphones such as sil-b+u, a-b+u, o-b+u, y-b+u, Y-b+u, U-b+u, u-b+u, i-b+u, e-b+u, r-b+u, N-b+u, M-b+u, and E-b+u.]
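The tree is grown greedily: at each node, pick the question whose yes/no split most increases the likelihood of the data. A toy sketch using discrete symbol counts per triphone (real systems score splits with Gaussian sufficient statistics; the triphone labels and questions in the test are hypothetical):

```python
import math
from collections import Counter

def set_loglik(counts):
    """Log-likelihood of pooled observation counts under their own
    maximum-likelihood distribution."""
    total = sum(counts.values())
    return sum(c * math.log(c / total) for c in counts.values() if c > 0)

def best_question(states, questions):
    """Greedy split selection for one tree node.

    states:    {triphone label: Counter of observed symbols}
    questions: {question name: predicate on the triphone label}
    Returns the question maximizing the likelihood gain of the split.
    """
    pooled = sum(states.values(), Counter())
    base = set_loglik(pooled)
    best, best_gain = None, 0.0
    for name, pred in questions.items():
        yes = sum((c for t, c in states.items() if pred(t)), Counter())
        no = pooled - yes
        if not yes or not no:
            continue  # a one-sided split gains nothing
        gain = set_loglik(yes) + set_loglik(no) - base
        if gain > best_gain:
            best, best_gain = name, gain
    return best
```

Splitting stops when the best gain falls below a threshold; each leaf then shares one distribution among its triphones.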

How Kaldi Uses Decision Trees to Train Triphones

Ref: http://www.aclweb.org/anthology/H94-1062

Cluster-based vs. tree-based clustering: Kaldi uses no expert-written rule questions; the splits are fully data-driven.

Acoustic Model Training

03.mono.train.sh
05.tree.build.sh
06.tri.train.sh

Acoustic Model Training Steps

◻ Step 1: Train monophone models
◻ Step 2: Build decision trees
◻ Step 3: Train triphone models

Train Monophone

◻ Get features (last time)
◻ Train the monophone model:
⬜ a. gmm-init-mono: initialize the monophone model
⬜ b. compile-train-graphs: get training graphs
⬜ c. align-equal-compiled: model → decode & align (use gmm-align-compiled instead when looping)
⬜ d. gmm-acc-stats-ali: EM training, E step
⬜ e. gmm-est: EM training, M step
⬜ f. go to step c
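The structure of steps c through e can be sketched with a discrete toy analogue: an equally spaced initial alignment (the role of align-equal-compiled), then one accumulate/estimate pass (the roles of gmm-acc-stats-ali and gmm-est). This is only a conceptual sketch; Kaldi's real binaries operate on GMMs and FST training graphs:

```python
from collections import Counter, defaultdict

def equal_align(obs, num_states):
    """Equally spaced alignment: split the utterance into num_states
    spans, mapping frame t to state floor(t * num_states / T)."""
    T = len(obs)
    return [min(t * num_states // T, num_states - 1) for t in range(T)]

def acc_and_est(utterances, num_states):
    """One E/M-like pass: accumulate per-state symbol counts over the
    equal alignments (E), then normalize into distributions (M)."""
    counts = defaultdict(Counter)
    for obs in utterances:
        for state, sym in zip(equal_align(obs, num_states), obs):
            counts[state][sym] += 1
    return {s: {v: c / sum(cnt.values()) for v, c in cnt.items()}
            for s, cnt in counts.items()}
```

Step f then repeats the pass, but realigning with the current model instead of the equal split.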

Train Triphone

◻ Train the triphone model:
⬜ a. gmm-init-model: initialize the GMM (from the decision tree)
⬜ b. gmm-mixup: Gaussian merging (increase #Gaussians)
⬜ c. convert-ali: convert alignments (model ↔ decision tree)
⬜ d. compile-train-graphs: get training graphs
⬜ e. gmm-align-compiled: model → decode & align
⬜ f. gmm-acc-stats-ali: EM training, E step
⬜ g. gmm-est: EM training, M step
⬜ h. go to step e; train for several iterations

How to get Kaldi usage?

source setup.sh
align-equal-compiled --help

align-equal-compiled
Write an equally spaced alignment (for getting training started)
Usage: align-equal-compiled <graphs-rspecifier> <features-rspecifier> <alignments-wspecifier>
e.g.: align-equal-compiled 1.mdl 1.fsts scp:train.scp ark:equal.ali

gmm-align-compiled $scale_opts --beam=$beam --retry-beam=$[$beam*4] <hmm-model*> ark:$dir/train.graph ark,s,cs:$feat ark:<alignment*>
For the first iteration (in monophone training) the beam width is 6; otherwise 10.
Realign only at the iterations in $realign_iters:
(monophone) $realign_iters="1 2 3 4 5 6 7 8 9 10 12 14 16 18 20 23 26 29 32 35 38"
(triphone) $realign_iters="10 20 30"

gmm-acc-stats-ali

Accumulate stats for GMM training (E step).
Usage: gmm-acc-stats-ali [options] <model-in> <feature-rspecifier> <alignments-rspecifier> <stats-out>
e.g.: gmm-acc-stats-ali 1.mdl scp:train.scp ark:1.ali 1.acc

gmm-acc-stats-ali --binary=false <hmm-model*> ark,s,cs:$feat ark,s,cs:<alignment*> <stats>

gmm-est

Do Maximum Likelihood re-estimation of a GMM-based acoustic model.
Usage: gmm-est [options] <model-in> <stats-in> <model-out>
e.g.: gmm-est 1.mdl 1.acc 2.mdl

gmm-est --binary=false --write-occs=<*.occs> --mix-up=$numgauss <hmm-model-in> <stats> <hmm-model-out>
--write-occs: file to write pdf occupation counts to.
$numgauss increases at every iteration.

Homework

03.mono.train.sh, 05.tree.build.sh, 06.tri.train.sh
Reading: Introduction to Digital Speech Processing, ch. 4 and ch. 5 (optional)

ToDo
◻ Step 0: Make sure last time's results exist.
⬜ e.g., feat/train.39.cmvn.ark ...
◻ Step 1: Execute the following commands.
⬜ script/03.mono.train.sh | tee log/03.mono.train.log
⬜ script/05.tree.build.sh | tee log/05.tree.build.log
⬜ script/06.tri.train.sh | tee log/06.tri.train.log
◻ Step 2: Finish the code in the ToDo sections, then redo Step 1.
⬜ script/03.mono.train.sh
⬜ script/06.tri.train.sh
◻ Step 3: Observe the output and results.
◻ Step 4 (opt.): Tune #Gaussians and #iterations.

Hint (extremely important!!)

◻ 03.mono.train.sh
⬜ Use the variables already defined.
⬜ Use these formulas:
⬜ Pipe stderr to the log file:
■ compute-mfcc-feats … 2> $log

Kaldi HMM Resources

◻ http://kaldi-asr.org/doc/hmm.html
◻ http://blog.csdn.net/u010731824/article/details/69668765#transitionmodel
◻ http://blog.csdn.net/u010731824/article/details/70161677

Questions?

◻ Try drawing the workflow of training.