1 Digital Speech Processing— Lecture 16 Speech Coding Methods Based on Speech Waveform Representations and Speech Models—Adaptive and Differential Coding


Page 1: Digital Speech Processing— Lecture 16 - UCSB...1 Digital Speech Processing— Lecture 16 Speech Coding Methods Based on Speech Waveform Representations and Speech Models—Adaptive


Digital Speech Processing—Lecture 16

Speech Coding Methods Based on Speech Waveform

Representations and Speech Models—Adaptive

and Differential Coding

Page 2:

Speech Waveform Coding-Summary of Part 1

1. Probability density function for speech samples

Gamma: p(x) = [ √3 / (8 π σx |x|) ]^{1/2} e^{−√3 |x| / (2 σx)}

Laplacian: p(x) = (1 / (√2 σx)) e^{−√2 |x| / σx}

2. Coding paradigms

• uniform -- divide interval from +Xmax to –Xmax into 2^B intervals of length ∆ = 2Xmax/2^B for a B-bit quantizer

(figure: uniform quantizer steps of size ∆ over the range −Xmax = −4σx to +Xmax = +4σx)
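The uniform coding paradigm above can be sketched in a few lines. This is a minimal mid-rise quantizer (reconstruction at interval midpoints) with clipping at the range edges; both details are assumptions for illustration, not from the slide:

```python
import numpy as np

def uniform_quantize(x, B, Xmax):
    """B-bit uniform (mid-rise) quantizer on [-Xmax, +Xmax].

    Step size Delta = 2*Xmax / 2**B; samples outside the range clip
    to the extreme levels (quantizer overload)."""
    Delta = 2 * Xmax / (2 ** B)
    # interval index of each sample, clipped to the 2**B available levels
    idx = np.clip(np.floor(x / Delta), -(2 ** (B - 1)), 2 ** (B - 1) - 1)
    return (idx + 0.5) * Delta  # reconstruct at interval midpoints

# 3-bit example, Xmax = 1 -> Delta = 0.25; the sample at 2.0 overloads
xq = uniform_quantize(np.array([0.1, -0.6, 2.0]), B=3, Xmax=1.0)
```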

Page 3:

Speech Waveform Coding-Summary of Part 1

(figure: uniform quantizer steps of size ∆ over the range −Xmax = −4σx to +Xmax = +4σx)

x̂[n] = x[n] + e[n]

SNR = 6B + 4.77 − 20 log10(Xmax/σx)   (uniform quantizer, in dB)

For B = 8:

Xmax/σx   20 log10(Xmax/σx)   SNR (dB)
   2            6.02            46.75
   4           12.04            40.73
   8           18.06            34.71
  16           24.08            28.69
  32           30.10            22.67
  64           36.12            16.65

(figure: quantization-error density p(e) = 1/∆, uniform on [−∆/2, ∆/2])

• sensitivity to Xmax/σx (σx varies a lot!!!); not great use of bits for actual speech densities!
• 30 dB loss as Xmax/σx varies over a 32:1 range
• Xmax (or equivalently σx) varies a lot across sounds, speakers, environments; need to adapt the coder (∆[n]) to the time-varying σx; the key question is how to adapt
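The formula and table above can be checked numerically; this helper simply evaluates the slide's SNR expression:

```python
import math

def snr_uniform(B, ratio):
    """SNR (dB) of a B-bit uniform quantizer; ratio = Xmax / sigma_x.
    Formula from the slide: SNR = 6B + 4.77 - 20 log10(Xmax/sigma_x)."""
    return 6 * B + 4.77 - 20 * math.log10(ratio)

# reproduce the B = 8 table: each doubling of Xmax/sigma_x costs ~6 dB
table = {r: round(snr_uniform(8, r), 2) for r in (2, 4, 8, 16, 32, 64)}
```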

Page 4:

Speech Waveform Coding-Summary of Part 1

• pseudo-logarithmic (constant percentage error)
  - compress x[n] by a pseudo-logarithmic compander
  - quantize the companded x[n] uniformly
  - expand the quantized signal

y[n] = F[ x[n] ] = Xmax · [ log(1 + μ|x[n]|/Xmax) / log(1 + μ) ] · sign(x[n])

for large |x[n]|:  y[n] ≈ Xmax · log( μ|x[n]|/Xmax ) / log(μ)

(figure: μ-law companded quantizer over the range −Xmax = −4σx to +Xmax = +4σx)
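The compression/expansion pair can be sketched directly from the formula above; the default μ = 255 is the common telephony choice, an assumption here rather than something stated on the slide:

```python
import numpy as np

def mu_compress(x, Xmax=1.0, mu=255.0):
    """Pseudo-logarithmic (mu-law) compression, as in the formula above."""
    return Xmax * np.log1p(mu * np.abs(x) / Xmax) / np.log1p(mu) * np.sign(x)

def mu_expand(y, Xmax=1.0, mu=255.0):
    """Inverse compander: undoes mu_compress exactly."""
    return (Xmax / mu) * np.expm1(np.abs(y) / Xmax * np.log1p(mu)) * np.sign(y)

x = np.array([-0.5, 0.01, 0.9])
y = mu_compress(x)      # small amplitudes are boosted before uniform quantization
x_rec = mu_expand(y)    # round trip (without quantization) recovers x
```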

Page 5:

Speech Waveform Coding-Summary of Part 1

SNR(μ) = 6B + 4.77 − 20 log10[ ln(1 + μ) ] − 10 log10[ 1 + (Xmax/(μσx))^2 + √2 · Xmax/(μσx) ]  (dB)

insensitive to Xmax/σx over a wide range for large μ

X

• maximum SNR coding — match signal quantization intervals to model probability distribution (Gamma, Laplacian)

• interesting—at least theoretically

Page 6:

Adaptive Quantization

• linear quantization => SNR depends on σx being constant (this is clearly not the case)
• instantaneous companding => SNR only weakly dependent on Xmax/σx for large μ-law compression (μ = 100-500)
• optimum SNR => minimize σe^2 when σx^2 is known; non-uniform distribution of quantization levels

Quantization dilemma: want to choose the quantization step size large enough to accommodate the maximum peak-to-peak range of x[n]; at the same time need to make the quantization step size small so as to minimize the quantization error
– the non-stationary nature of speech (variability across sounds, speakers, backgrounds) compounds this problem greatly

Page 7:

Solutions to the Quantization Dilemma

• Solution 1 - let ∆ vary to match the variance of the input signal => ∆[n]
• Solution 2 - use a variable gain, G[n], followed by a fixed quantizer step size, ∆ => keep the signal variance of y[n] = G[n]·x[n] constant

Adaptive Quantization:
Case 1: ∆[n] proportional to σx => quantization levels and ranges linearly scaled to match σx^2 => need to reliably estimate σx^2
Case 2: G[n] proportional to 1/σx to give σy^2 ≈ constant

• need a reliable estimate of σx^2 for both types of adaptive quantization

Page 8:

Types of Adaptive Quantization

• instantaneous - amplitude changes reflect sample-to-sample variations in x[n] => rapid adaptation
• syllabic - amplitude changes reflect syllable-to-syllable variations in x[n] => slow adaptation
• feed-forward - adaptive quantizers that estimate σx^2 from x[n] itself
• feedback - adaptive quantizers that adapt the step size, ∆, on the basis of the quantized signal, x̂[n] (or equivalently the codewords, c[n])

Page 9:

Feed-Forward Adaptation

Variable step size:
• assume a uniform quantizer with step size ∆[n]
• x[n] is quantized using ∆[n] => c[n] and ∆[n] need to be transmitted to the decoder
• if c′[n] = c[n] and ∆′[n] = ∆[n] => no errors in the channel, and x̂′[n] = x̂[n]

We don't have x[n] at the decoder to estimate ∆[n] => need to transmit ∆[n]; this is a major drawback of feed-forward adaptation.

Page 10:

Feed-Forward Quantizer

time-varying gain, G[n] => c[n] and G[n] need to be transmitted to the decoder

We can't estimate G[n] at the decoder => it has to be transmitted.

Page 11:

Feed-Forward Quantizers

• feed-forward systems make estimates of σx^2, then make ∆ or the quantization levels proportional to σx, or make the gain inversely proportional to σx

• assume a short-time energy estimate

  σ^2[n] = Σ_{m=−∞}^{∞} x^2[m] h[n−m] / Σ_{m=0}^{∞} h[m], where h[n] is a lowpass filter

• E[σ^2[n]] ∝ σx^2 (this can be shown)

• consider h[n] = α^{n−1}, n ≥ 1 (0 otherwise), with 0 < α < 1

  σ^2[n] = Σ_{m=−∞}^{n−1} x^2[m] α^{n−m−1}

  σ^2[n] = α σ^2[n−1] + x^2[n−1]  (recursion)

• this gives ∆[n] = ∆0 σ[n] and G[n] = G0 / σ[n]

Page 12:

Slowly Adapting Gain Control

y[n] = G[n] x[n]

σ^2[n] = α σ^2[n−1] + (1−α) x^2[n−1] = (1−α) Σ_{m=−∞}^{n−1} α^{n−m−1} x^2[m]

G[n] = G0 / σ[n],  Gmin ≤ G[n] ≤ Gmax
∆[n] = ∆0 σ[n],  ∆min ≤ ∆[n] ≤ ∆max

ŷ[n] = Q∆{ G[n] x[n] }
x̂[n] = Q∆[n]{ x[n] }

α = 0.99 => brings up the level in low-amplitude regions => time constant of 100 samples (12.5 msec for an 8 kHz sampling rate) => syllabic rate
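The recursive variance estimate and gain above can be sketched as follows; the clamp values Gmin and Gmax and the eps divide-by-zero guard are illustrative assumptions:

```python
import numpy as np

def adaptive_gain(x, alpha=0.99, G0=1.0, Gmin=0.01, Gmax=100.0, eps=1e-8):
    """sigma^2[n] = alpha*sigma^2[n-1] + (1-alpha)*x^2[n-1], using only
    past samples, then G[n] = G0/sigma[n] clamped to [Gmin, Gmax]."""
    sigma2 = np.empty(len(x))
    G = np.empty(len(x))
    s2 = 0.0
    for n in range(len(x)):
        sigma2[n] = s2
        G[n] = np.clip(G0 / max(np.sqrt(s2), eps), Gmin, Gmax)
        s2 = alpha * s2 + (1 - alpha) * x[n] ** 2   # update for the next sample
    return sigma2, G

# a constant-level input settles to sigma^2 = 4, so G -> G0/2
sig2, G = adaptive_gain(np.full(5000, 2.0))
```

With α = 0.99 the estimate tracks at roughly the syllabic rate described above; α = 0.9 (next slide) tracks an order of magnitude faster.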

Page 13:

Rapidly Adapting Gain Control

y[n] = G[n] x[n]

σ^2[n] = α σ^2[n−1] + (1−α) x^2[n−1] = (1−α) Σ_{m=−∞}^{n−1} α^{n−m−1} x^2[m]

G[n] = G0 / σ[n],  Gmin ≤ G[n] ≤ Gmax
∆[n] = ∆0 σ[n],  ∆min ≤ ∆[n] ≤ ∆max

ŷ[n] = Q∆{ G[n] x[n] }
x̂[n] = Q∆[n]{ x[n] }

α = 0.9 => system reacts to amplitude variations more rapidly => provides a better approximation to σy^2 = constant => time constant of 9 samples (about 1 msec at 8 kHz) => instantaneous rate

Page 14:

Feed-Forward Quantizers

• ∆[n] and G[n] vary slowly compared to x[n]
  – they must be sampled and transmitted as part of the waveform coder parameters
  – the rate of sampling depends on the bandwidth of the lowpass filter, h[n]: for α = 0.99 the rate is about 13 Hz; for α = 0.9, about 135 Hz

• it is reasonable to place limits on the variation of G[n] or ∆[n], of the form

  Gmin ≤ G[n] ≤ Gmax
  ∆min ≤ ∆[n] ≤ ∆max

• for obtaining σy^2 ≈ constant over a 40 dB range in signal levels =>

  ∆max/∆min = 100  (40 dB range)

Page 15:

Feed-Forward Adaptation Gain

σ^2[n] = (1/M) Σ_{m=n−M+1}^{n} x^2[m]

• ∆[n] or G[n] evaluated every M samples
• used M = 128 and M = 1024 samples for the estimates
• the adaptive quantizer achieves up to 5.6 dB better SNR than non-adaptive quantizers
• can achieve this SNR with low "idle channel noise" and wide speech dynamic range by suitable choice of ∆min and ∆max

(figure: feed-forward adaptation gain with B = 3; about 3 dB less gain for M = 1024 than for M = 128 => M = 1024 is too long an interval)

Page 16:

Feedback Adaptation

• σ2[n] estimated from quantizer output (or the code words)

• advantage of feedback adaptation is that neither ∆[n] nor G[n] needs to be transmitted to the decoder since they can be derived from the code words

• disadvantage of feedback adaptation is increased sensitivity to errors in codewords, since such errors affect ∆[n] and G[n]

Page 17:

Feedback Adaptation

σ^2[n] = Σ_{m=−∞}^{∞} x̂^2[m] h[n−m] / Σ_{m=0}^{∞} h[m]

• σ^2[n] is based only on past values of x̂[n]
• two typical windows/filters are:
  1. h[n] = α^{n−1}, n ≥ 1; 0 otherwise
  2. h[n] = 1/M, 1 ≤ n ≤ M; 0 otherwise  =>  σ^2[n] = (1/M) Σ_{m=n−M}^{n−1} x̂^2[m]

• can use very short window lengths (e.g., M = 2) to achieve 12 dB SNR for a B = 3 bit quantizer

Page 18:

Alternative Approach to Adaptation

∆[n] = P · ∆[n−1];  P ∈ {P1, P2, P3, P4}
P ∝ |c[n−1]|
x̂[n] = (∆[n]/2) · sign(c[n]) + ∆[n] · c[n]

• ∆[n] only depends on ∆[n−1] and c[n−1] => only need to transmit the codewords
• also necessary to impose the limits ∆min ≤ ∆[n] ≤ ∆max
• the ratio ∆max/∆min controls the dynamic range of the quantizer

Page 19:

Adaptation Gain

• key issue is how P should vary with |c[n−1]|
  – if c[n−1] is either the largest positive or largest negative codeword, then the quantizer is overloaded and the quantizer step size is too small => P4 > 1
  – if c[n−1] is either the smallest positive or negative codeword, then the quantization error is too large => P1 < 1
  – need choices for P2 and P3

Page 20:

Adaptation Gain

Q = (1 + 2|c[n−1]|) / (2^B − 1)

• shaded area is the variation in the range of P values due to different speech sounds or different B values

Can see that step-size increases (P > 1) are more vigorous than step-size decreases (P < 1), since signal growth needs to be kept within the quantizer range to avoid 'overloads'.
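A sketch of the feedback ("Jayant") step-size logic described above, for B = 2; the multipliers (0.8, 1.6), the initial step size, and the limits are illustrative assumptions, not the optimal values tabulated on the next slide:

```python
import numpy as np

def jayant_quantize(x, B=2, delta0=0.1, P=(0.8, 1.6), dmin=1e-4, dmax=1.0):
    """Feedback-adaptive quantizer: Delta[n] = P(|c[n-1]|) * Delta[n-1],
    so the decoder can re-derive Delta[n] from the codewords alone."""
    levels = 2 ** (B - 1)                  # magnitude levels per sign
    delta, codes, xq = delta0, [], []
    for xn in x:
        mag = min(int(abs(xn) / delta), levels - 1)   # magnitude codeword
        codes.append(mag if xn >= 0 else -(mag + 1))  # fold in the sign
        xq.append((1.0 if xn >= 0 else -1.0) * (mag + 0.5) * delta)
        delta = float(np.clip(delta * P[mag], dmin, dmax))  # adapt step size
    return np.array(xq), codes

xq_big, c_big = jayant_quantize([10.0] * 30)       # overload: step size grows (P > 1)
xq_small, c_small = jayant_quantize([0.001] * 30)  # tiny input: step size shrinks (P < 1)
```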

Page 21:

Optimal Step Size Multipliers

• optimal values of P for B=2,3,4,5

• improvements in SNR

• 4-7 dB improvement over μ-law

• 2-4 dB improvement over non-adaptive optimum quantizers

Page 22:

Quantization of Speech Model Parameters

• Excitation and vocal tract (linear system) are characterized by sets of parameters which can be estimated from a speech signal by LP or cepstral processing.

• We can use the set of estimated parameters to synthesize an approximation to the speech signal whose quality depends on a range of factors.

Page 23:

Quantization of Speech Model Parameters

• Quality and data rate of the synthesis depend on:
  – the ability of the model to represent speech
  – the ability to reliably and accurately estimate the parameters of the model
  – the ability to quantize the parameters in order to obtain a low data rate digital representation that will yield a high quality reproduction of the speech signal

Page 24:

Closed-Loop and Open-Loop Speech Coders

Closed-loop – used in a feedback loop where the synthetic speech output is compared to the input signal, and the resulting difference used to determine the excitation for the vocal tract model.

Open-loop – the parameters of the model are estimated directly from the speech signal with no feedback as to the quality of the resulting synthetic speech.

Page 25:

Scalar Quantization

• Scalar quantization – treat each model parameter separately and quantize using a fixed number of bits
  – need to measure (estimate) statistics of each parameter, i.e., mean, variance, minimum/maximum value, pdf, etc.
  – each parameter has a different quantizer with a different number of bits allocated
• Example of scalar quantization
  – pitch period typically ranges from 20-150 samples (at an 8 kHz sampling rate) => need about 128 values (7 bits) uniformly over the range of pitch periods, including a value of zero for unvoiced/background
  – an amplitude parameter might be quantized with a μ-law quantizer using 4-5 bits per sample
  – at a frame rate of 100 frames/sec, you would need about 700 bps for the pitch period and 400-500 bps for the amplitude
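The bit-budget arithmetic in the example checks out directly (taking 5 bits for the amplitude, the upper end of the slide's 4-5 bit range):

```python
# scalar-quantization bit budget at 100 frames/sec, as in the example above
frame_rate = 100                 # frames per second
pitch_bps = 7 * frame_rate       # 7 bits index about 128 pitch-period values
amp_bps = 5 * frame_rate         # 5 bits for the mu-law amplitude parameter
total_bps = pitch_bps + amp_bps  # pitch + amplitude side information
```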

Page 26:

Scalar Quantization

• 20th-order LPC analysis frame

• Each PARCOR coefficient is transformed to the range −π/2 < sin^{−1}(ki) < π/2 and then quantized with both a 4-bit and a 3-bit uniform quantizer.

• Total rate of quantized representation of speech about 5000 bps.

Page 27:

Techniques of Vector Quantization

Page 28:

Vector Quantization

• code block of scalars as a vector, rather than individually

• design an optimal quantization method based on a mean-squared distortion metric

• essential for model-based and hybrid coders

Page 29:

Vector Quantization

Page 30:

Waveform Coding Vector Quantizer

VQ codes pairs of waveform samples, X[n] = (x[2n], x[2n+1]):

(b) Single element codebook with cluster centroid (0-bit codebook)

(c) Two element codebook with two cluster centers (1-bit codebook)

(d) Four element codebook with four cluster centers (2-bit codebook)

(e) Eight element codebook with eight cluster centers (3-bit codebook)

Page 31:

Toy Example of VQ Coding

• 2-pole model of the vocal tract => 4 reflection coefficients
• 4 possible vocal tract shapes => 4 sets of reflection coefficients

1. Scalar Quantization - assume 4 values for each reflection coefficient => 2 bits x 4 coefficients = 8 bits/frame

2. Vector Quantization - only 4 possible vectors => 2 bits to choose which of the 4 vectors to use for each frame (a pointer into a codebook)

• this works because the scalar components of each vector are highly correlated

• if scalar components are independent => VQ offers no advantage over scalar quantization

Page 32:

Elements of a VQ Implementation

1. A large training set of analysis vectors; X={X1,X2,…,XL}, L should be much larger than the size of the codebook, M, i.e., 10-100 times the size of M.

2. A measure of distance, dij=d(Xi,Xj), between a pair of analysis vectors, both for clustering the training set as well as for classifying test set vectors into unique codebook entries.

3. A centroid computation procedure and a centroid splitting procedure.

4. A classification procedure for arbitrary analysis vectors that chooses the codebook vector closest in distance to the input vector, providing the codebook index of the resulting nearest codebook vector.

Page 33:

VQ Implementation

Page 34:

The VQ Training Set

• The VQ training set of L ≥ 10M vectors should span the anticipated range of:
  – talkers, ranging in age, accent, gender, speaking rate, speaking level, etc.
  – speaking conditions, ranging from quiet rooms to automobiles to noisy work places
  – transducers and transmission systems, including a range of microphones, telephone handsets, cellphones, speakerphones, etc.
  – speech, including carefully recorded material, conversational speech, telephone queries, etc.

Page 35:

The VQ Distance Measure

• The VQ distance measure depends critically on the nature of the analysis vector, X.

• If X is a log spectral vector, then a possible distance measure would be an Lp log spectral distance of the form:

  d(Xi, Xj) = [ Σ_{k=1}^{R} | xi^k − xj^k |^p ]^{1/p}

• If X is a cepstral vector, then the distance measure might well be a cepstral distance of the form:

  d(Xi, Xj) = [ Σ_{k=1}^{R} ( xi^k − xj^k )^2 ]^{1/2}

Page 36:

Clustering Training Vectors

• Goal is to cluster the set of L training vectors into a set of M codebook vectors using the generalized Lloyd algorithm (also known as the K-means clustering algorithm), with the following steps:
  1. Initialization – arbitrarily choose M vectors (initially out of the training set of L vectors) as the initial set of codewords in the codebook
  2. Nearest-Neighbor Search – for each training vector, find the codeword in the current codebook that is closest (in distance) and assign that vector to the corresponding cell
  3. Centroid Update – update the codeword in each cell to the centroid of all the training vectors assigned to that cell in the current iteration
  4. Iteration – repeat steps 2 and 3 until the average distance between centroids at successive iterations falls below a preset threshold
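The four steps can be sketched as a small generalized Lloyd (K-means) routine; the random initialization and the squared-Euclidean distance are assumptions (the usual choices), not mandated by the slide:

```python
import numpy as np

def lloyd_kmeans(X, M, iters=50, tol=1e-6, seed=0):
    """Generalized Lloyd / K-means codebook design (the 4 steps above)."""
    rng = np.random.default_rng(seed)
    codebook = X[rng.choice(len(X), M, replace=False)]   # 1. initialization
    cells = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # 2. nearest-neighbor search (squared Euclidean distance)
        d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        cells = d.argmin(1)
        # 3. centroid update (keep the old codeword if a cell is empty)
        new = np.array([X[cells == m].mean(0) if (cells == m).any()
                        else codebook[m] for m in range(M)])
        # 4. iterate until the codebook stops moving
        done = np.abs(new - codebook).max() < tol
        codebook = new
        if done:
            break
    return codebook, cells

# two well-separated clusters -> centroids near (0, 0) and (10, 10)
X = np.array([[0, 0], [0.1, 0], [0, 0.1],
              [10, 10], [10.1, 10], [10, 10.1]], dtype=float)
cb, cells = lloyd_kmeans(X, 2)
```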

Page 37:

Clustering Training Vectors

Voronoi regions and

centroids

Page 38:

Centroid Computation

Assume we have a set of V vectors, X^C = {X1^C, X2^C, …, XV^C}, where all vectors are assigned to cluster C. The centroid of the set is defined as the vector Y that minimizes the average distortion, i.e.,

  Y = argmin_Y (1/V) Σ_{i=1}^{V} d(Xi^C, Y)

The solution for the centroid is highly dependent on the choice of distance measure. When both X and Y are measured in a K-dimensional space with the L2 norm, the centroid is the mean of the vector set:

  Y = (1/V) Σ_{i=1}^{V} Xi^C

When using an L1 distance measure, the centroid is the median vector of the set of vectors assigned to the given class.

Page 39:

Vector Classification Procedure

The classification procedure for arbitrary test-set vectors is a full search through the codebook to find the "best" (minimum-distance) match. If we denote the codebook vectors of an M-vector codebook as CBi, for 1 ≤ i ≤ M, and we denote the vector to be classified (and vector quantized) as X, then the index, i*, of the best codebook entry is:

  i* = argmin_{1 ≤ i ≤ M} d(X, CBi)

Page 40:

Binary Split Codebook Design

1. Design a 1-vector codebook; the single vector in the codebook is the centroid of the entire set of training vectors
2. Double the size of the codebook by splitting each current codebook vector, Ym, according to the rule:

   Ym+ = Ym (1 + ε)
   Ym− = Ym (1 − ε)

   where m varies from 1 to the size of the current codebook, and ε is a splitting parameter (typically 0.01)
3. Use the K-means clustering algorithm to get the best set of centroids for the split codebook
4. Iterate steps 2 and 3 until a codebook of size M is designed
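The binary-split procedure can be sketched on top of a small K-means refinement; target codebook sizes are assumed to be powers of 2, and the inner refinement loop uses a fixed iteration count for simplicity:

```python
import numpy as np

def binary_split_codebook(X, M, eps=0.01, kmeans_iters=30):
    """LBG binary-split design: 1-vector codebook -> split each Ym into
    Ym*(1+eps) and Ym*(1-eps) -> K-means refinement -> repeat to size M."""
    codebook = X.mean(axis=0, keepdims=True)          # step 1: global centroid
    while len(codebook) < M:
        codebook = np.vstack([codebook * (1 + eps),   # step 2: split
                              codebook * (1 - eps)])
        for _ in range(kmeans_iters):                 # step 3: refine
            d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            cells = d.argmin(1)
            codebook = np.array([X[cells == m].mean(0) if (cells == m).any()
                                 else codebook[m]
                                 for m in range(len(codebook))])
    return codebook                                   # step 4: reached size M

X = np.vstack([np.zeros((10, 2)), 10 * np.ones((10, 2))])
cb = binary_split_codebook(X, 2)   # splits the global centroid (5, 5)
```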

Page 41:

Binary Split Algorithm

Page 42:

VQ Codebook of LPC Vectors

64 vectors in a codebook of spectral shapes

Page 43:

VQ Cells

Page 44:

VQ Cells

Page 45:

VQ Coding for Speech

‘distortion’ in coding is computed using a spectral distortion measure related to the difference in log spectra between the original and the codebook vectors

10-bit VQ comparable to 24-bit scalar quantization for these examples

Page 46:

Differential Quantization

Theory and Practice

Page 47:

Differential Quantization

• we have carried instantaneous quantization of x[n] as far as possible
• time to consider correlations between speech samples separated in time => differential quantization
• high correlation values => the signal does not change rapidly in time => the difference between adjacent samples should have lower variance than the signal itself
• differential quantization can increase SNR at a given bit rate, or lower the bit rate for a given SNR

Page 48:

Example of Difference Signal

Prediction error from LPC analysis using p = 12.

The prediction error is about 2.5 times smaller than the signal.

Page 49:

Differential Quantization

d[n] = x[n] − x̃[n], where
• x[n] = unquantized input sample
• x̃[n] = estimate or prediction of x[n]
• x̃[n] is the output of a predictor system, P, whose input is x̂[n], a quantized version of x[n]
• d[n] = prediction error signal
• d̂[n] = quantized difference (prediction error) signal

(a) Encoder block diagram: x[n] minus x̃[n] gives d[n]; d[n] is quantized by Q[·] (step size ∆) to give d̂[n], which is encoded into c[n]; d̂[n] plus x̃[n] gives x̂[n], the input to the predictor P.

Page 50:

Differential Quantization

(b) Decoder block diagram: c′[n] is decoded (step size ∆) to give d̂′[n]; d̂′[n] plus x̃′[n] gives x̂′[n], the input to the predictor P.

d[n] = x[n] − x̃[n], where
• x[n] = unquantized input sample
• x̃[n] = estimate or prediction of x[n]
• x̃[n] is the output of a predictor system, P, whose input is x̂[n], a quantized version of x[n]
• d[n] = prediction error signal
• d̂[n] = quantized difference (prediction error) signal

Page 51:

Differential Quantization

(encoder block diagram as on Page 49; the feedback path around the quantizer reconstructs the quantized signal, x̂[n])

d[n] = x[n] − x̃[n]
d̂[n] = d[n] + e[n]

x̂[n] = x̃[n] + d̂[n]
  ⇒ x̂[n] = x[n] + e[n]

P(z) = Σ_{k=1}^{p} αk z^{−k}

x̃[n] = Σ_{k=1}^{p} αk x̂[n−k]
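The encoder/decoder relations above can be sketched end-to-end; a fixed first-order predictor and a fixed uniform (rounding) step size are simplifying assumptions here, since real coders adapt both:

```python
import numpy as np

def dpcm_encode(x, alpha, delta):
    """d[n] = x[n] - x~[n] is quantized; the encoder then forms
    x^[n] = x~[n] + d^[n], so its predictor runs on the same quantized
    signal the decoder will see."""
    p = len(alpha)
    xhat = np.zeros(len(x) + p)           # x^[n], with p zero initial samples
    codes = []
    for n in range(len(x)):
        xtilde = sum(alpha[k] * xhat[n + p - 1 - k] for k in range(p))
        d = x[n] - xtilde                 # prediction error
        c = int(round(d / delta))         # uniform quantization -> codeword
        codes.append(c)
        xhat[n + p] = xtilde + c * delta  # d^[n] = c*delta; x^ = x~ + d^
    return codes, xhat[p:]

def dpcm_decode(codes, alpha, delta):
    """Decoder: the identical predictor loop driven by the codewords."""
    p = len(alpha)
    xhat = np.zeros(len(codes) + p)
    for n, c in enumerate(codes):
        xtilde = sum(alpha[k] * xhat[n + p - 1 - k] for k in range(p))
        xhat[n + p] = xtilde + c * delta
    return xhat[p:]

x = np.sin(2 * np.pi * np.arange(200) / 50)
codes, xq = dpcm_encode(x, alpha=[0.9], delta=0.05)
xdec = dpcm_decode(codes, alpha=[0.9], delta=0.05)
```

Because x̂[n] = x[n] + e[n] and |e[n]| ≤ ∆/2 for a rounding quantizer, the reconstruction error never exceeds half a step as long as the quantizer does not overload.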

Page 52:

Differential Quantization

• the difference signal, d[n], is quantized - not x[n]
• the quantizer can be fixed or adaptive, uniform or non-uniform
• quantizer parameters are adjusted to match the variance of d[n]

d̂[n] = d[n] + e[n], where e[n] = quantization error
x̂[n] = x̃[n] + d̂[n]  (predicted plus quantized difference)
x̂[n] = x[n] + e[n]  => the quantized input has the same quantization error as the difference signal

• if σd^2 < σx^2, the error is smaller
• independent of the predictor, P, the quantized x̂[n] differs from the unquantized x[n] by e[n], the quantization error of the difference signal!
• good prediction gives lower quantization error than quantizing the input directly

Page 53:

Differential Quantization

• the quantized difference signal is encoded into c[n] for transmission
• first reconstruct the quantized difference signal from the decoder codeword, c′[n], and the step size ∆
• next reconstruct the quantized input signal using the same predictor, P, as used in the encoder

X̂′(z) = D̂′(z) + P(z) X̂′(z)

X̂′(z) = D̂′(z) / (1 − P(z)) = H(z) D̂′(z)

H(z) = 1 / (1 − Σ_{k=1}^{p} αk z^{−k})

Page 54:

SNR for Differential Quantization

The SNR of the differential coding system is

  SNR = E[x^2[n]] / E[e^2[n]] = σx^2 / σe^2

  SNR = (σx^2 / σd^2) · (σd^2 / σe^2) = GP · SNRQ

where

  SNRQ = σd^2 / σe^2 = signal-to-quantizing-noise ratio of the quantizer
  GP = σx^2 / σd^2 = gain due to differential quantization
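The gain GP = σx²/σd² can be measured on a signal by comparing the input variance with the prediction-error variance; the fine-quantization assumption x̂ ≈ x (predicting from unquantized samples) is made here:

```python
import numpy as np

def prediction_gain_db(x, alpha):
    """G_P = sigma_x^2 / sigma_d^2 in dB, with
    d[n] = x[n] - sum_k alpha_k * x[n-k]  (x^ ~= x assumed)."""
    p = len(alpha)
    d = x[p:] - sum(a * x[p - 1 - k:len(x) - 1 - k]
                    for k, a in enumerate(alpha))
    return 10 * np.log10(np.var(x) / np.var(d))

# synthetic AR(1) signal x[n] = 0.9 x[n-1] + w[n]: for the matched
# first-order predictor, theory gives G_P = 1/(1 - 0.81), about 7.2 dB
rng = np.random.default_rng(1)
w = rng.standard_normal(20000)
x = np.zeros(20000)
for n in range(1, 20000):
    x[n] = 0.9 * x[n - 1] + w[n]
gp_db = prediction_gain_db(x, [0.9])
```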

Page 55:

SNR for Differential Quantization

• SNRQ depends on the chosen quantizer and can be maximized using all of the previous quantization methods (uniform, non-uniform, optimal)
• GP, hopefully > 1, is the gain in SNR due to differential coding
• want to choose the predictor, P, to maximize GP => since σx^2 is fixed, we need to minimize σd^2, i.e., design the best predictor

Page 56:

Predictor for Differential Quantization

• consider the class of linear predictors

  x̃[n] = Σ_{k=1}^{p} αk x̂[n−k]

• x̃[n] is a linear combination of previous quantized values of x[n]
• the predictor z-transform is

  P(z) = Σ_{k=1}^{p} αk z^{−k} = 1 − A(z)  -- predictor system function

• with predictor impulse response coefficients (FIR filter)

  p[n] = αn, 1 ≤ n ≤ p; 0 otherwise

Page 57: Digital Speech Processing— Lecture 16 - UCSB...1 Digital Speech Processing— Lecture 16 Speech Coding Methods Based on Speech Waveform Representations and Speech Models—Adaptive

57

Predictor for Differential Quantization

• the reconstructed signal is the output, \hat{x}[n], of a system with system function

H(z) = \frac{\hat{X}(z)}{\hat{D}(z)} = \frac{1}{1-P(z)} = \frac{1}{A(z)} = \frac{1}{1-\sum_{k=1}^{p}\alpha_k z^{-k}}

where the input to the system is the quantized difference signal, \hat{d}[n]

• the difference signal is

d[n] = x[n] - \tilde{x}[n] = x[n] - \sum_{k=1}^{p}\alpha_k\,\hat{x}[n-k]

where \sum_{k=1}^{p}\alpha_k\,\hat{x}[n-k] = \tilde{x}[n]

58

Predictor for Differential Quantization

• to solve for the optimum predictor, we need an expression for the variance of the difference signal, \sigma_d^2:

\sigma_d^2 = E\left[d^2[n]\right] = E\left[\left(x[n]-\tilde{x}[n]\right)^2\right]

= E\left[\left(x[n]-\sum_{k=1}^{p}\alpha_k\,\hat{x}[n-k]\right)^2\right]

= E\left[\left(x[n]-\sum_{k=1}^{p}\alpha_k\left(x[n-k]+e[n-k]\right)\right)^2\right]

using \hat{x}[n] = x[n] + e[n]

59

Solution for Optimum Predictor

• want to choose \{\alpha_j\}, 1 \le j \le p, to minimize \sigma_d^2 => differentiate \sigma_d^2 with respect to each \alpha_j and set the derivatives to zero, giving

\frac{\partial\sigma_d^2}{\partial\alpha_j} = -2\,E\left[\left(x[n]-\sum_{k=1}^{p}\alpha_k\left(x[n-k]+e[n-k]\right)\right)\cdot\left(x[n-j]+e[n-j]\right)\right] = 0, \quad 1 \le j \le p

which can be written in the more compact form

E\left[\left(x[n]-\tilde{x}[n]\right)\cdot\hat{x}[n-j]\right] = E\left[d[n]\cdot\hat{x}[n-j]\right] = 0, \quad 1 \le j \le p

• the predictor coefficients that minimize \sigma_d^2 are the ones that make the difference signal, d[n], uncorrelated with the past values of the predictor input, \hat{x}[n-j], 1 \le j \le p
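The orthogonality condition above can be checked numerically. A minimal sketch (not from the slides), assuming no quantizer (so x̂ = x and e[n] = 0) and a synthetic first-order autoregressive source whose first normalized autocorrelation coefficient is ρ:

```python
import random

random.seed(0)

# Synthetic AR(1) source: x[n] = rho*x[n-1] + g[n], for which rho[1] = rho,
# so the optimum first-order predictor (neglecting quantization noise) is
# alpha_1 = rho.
rho = 0.8
N = 200_000
x = [0.0] * N
for n in range(1, N):
    x[n] = rho * x[n - 1] + random.gauss(0.0, 1.0)

alpha1 = rho
# d[i] is the prediction error for sample n = i + 1.
d = [x[n] - alpha1 * x[n - 1] for n in range(1, N)]

var_x = sum(v * v for v in x) / N
var_d = sum(v * v for v in d) / len(d)

# E[d[n]*x[n-1]] should be ~0: the prediction error is uncorrelated with
# the past sample the predictor used.
corr = sum(d[i] * x[i] for i in range(len(d))) / len(d)
print(corr / var_x, var_x / var_d)   # ~0, and ~1/(1 - rho^2) = 2.78
```

The ratio var_x/var_d is the prediction gain G_P of the later slides; for this source it approaches 1/(1 − ρ²).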

60

Solution for Alphas

E\left[\left(x[n]-\tilde{x}[n]\right)\cdot\hat{x}[n-j]\right] = E\left[d[n]\cdot\hat{x}[n-j]\right] = 0, \quad 1 \le j \le p

• basic equations of differential coding:

\hat{d}[n] = d[n] + e[n] \quad (quantization of the difference signal)
\hat{x}[n] = x[n] + e[n] \quad (same error for the original signal)
\hat{x}[n] = \tilde{x}[n] + \hat{d}[n] \quad (feedback loop for the signal)
\tilde{x}[n] = \sum_{k=1}^{p}\alpha_k\,\hat{x}[n-k] \quad (prediction loop based on the quantized input)

• direct substitution gives

d[n] = x[n] - \sum_{k=1}^{p}\alpha_k\,\hat{x}[n-k]

\hat{x}[n-j] = x[n-j] + e[n-j]

d[n] = x[n] - \sum_{k=1}^{p}\alpha_k\left(x[n-k]+e[n-k]\right)

61

Solution for Alphas

• substituting

\hat{x}[n-j] = x[n-j] + e[n-j]

d[n] = x[n] - \sum_{k=1}^{p}\alpha_k\left(x[n-k]+e[n-k]\right)

into E\left[d[n]\cdot\hat{x}[n-j]\right] = 0, 1 \le j \le p, and expanding the products gives

E\left[x[n]\,x[n-j]\right] + E\left[x[n]\,e[n-j]\right]
= \sum_{k=1}^{p}\alpha_k E\left[x[n-k]\,x[n-j]\right] + \sum_{k=1}^{p}\alpha_k E\left[e[n-k]\,x[n-j]\right]
+ \sum_{k=1}^{p}\alpha_k E\left[x[n-k]\,e[n-j]\right] + \sum_{k=1}^{p}\alpha_k E\left[e[n-k]\,e[n-j]\right]

62

Solution for Optimum Predictor

• solution for the \alpha_k -- first expand terms to give

E\left[x[n-j]\cdot x[n]\right] + E\left[e[n-j]\cdot x[n]\right] = \sum_{k=1}^{p}\alpha_k E\left[x[n-j]\cdot x[n-k]\right]
+ \sum_{k=1}^{p}\alpha_k E\left[e[n-j]\cdot x[n-k]\right] + \sum_{k=1}^{p}\alpha_k E\left[x[n-j]\cdot e[n-k]\right]
+ \sum_{k=1}^{p}\alpha_k E\left[e[n-j]\cdot e[n-k]\right], \quad 1 \le j \le p

• assume fine quantization, so that e[n] is uncorrelated with x[n], and e[n] is stationary white noise (zero mean), giving

E\left[x[n-j]\cdot e[n-k]\right] = 0, \quad \forall\ j, k

E\left[e[n-j]\cdot e[n-k]\right] = \sigma_e^2\cdot\delta[j-k]

63

Solution for Optimum Predictor

• we can now simplify the solution to the form

\phi[j] = \sum_{k=1}^{p}\alpha_k\left(\phi[j-k] + \sigma_e^2\,\delta[j-k]\right), \quad 1 \le j \le p

where \phi[j] is the autocorrelation of x[n]. Defining the normalized terms \rho[j] = \phi[j]/\sigma_x^2 and SNR = \sigma_x^2/\sigma_e^2, this becomes

\rho[j] = \sum_{k=1}^{p}\alpha_k\left(\rho[j-k] + \delta[j-k]/SNR\right), \quad 1 \le j \le p

• or in matrix form, C\alpha = \rho, with

\alpha = \left[\alpha_1, \alpha_2, \ldots, \alpha_p\right]^T, \quad \rho = \left[\rho[1], \rho[2], \ldots, \rho[p]\right]^T

C = \begin{bmatrix} 1+1/SNR & \rho[1] & \cdots & \rho[p-1] \\ \rho[1] & 1+1/SNR & \cdots & \rho[p-2] \\ \vdots & & & \vdots \\ \rho[p-1] & \rho[p-2] & \cdots & 1+1/SNR \end{bmatrix}

64

Solution for Optimum Predictor

• the matrix form C\alpha = \rho (with SNR = \sigma_x^2/\sigma_e^2 on the diagonal of C) has the matrix solution

\alpha = C^{-1}\rho

• C is a Toeplitz matrix => C^{-1} can be computed via well understood numerical methods
• the problem here is that C depends on SNR = \sigma_x^2/\sigma_e^2, but SNR depends on the coefficients of the predictor, which in turn depend on SNR => a bit of a dilemma
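For small p the system C α = ρ can be solved in closed form; a sketch for p = 2 using Cramer's rule (the correlation values and the operating SNR here are illustrative assumptions, not values from the slides):

```python
# Solve C·alpha = rho for a second-order predictor (p = 2), where
#   C = [[1 + 1/SNR, rho1], [rho1, 1 + 1/SNR]],  right-hand side [rho1, rho2].
rho1, rho2 = 0.86, 0.56   # illustrative normalized autocorrelations
snr = 100.0               # assumed operating SNR = sigma_x^2 / sigma_e^2

c11 = 1.0 + 1.0 / snr     # diagonal inflated by the quantization-noise term
det = c11 * c11 - rho1 * rho1
alpha1 = (rho1 * c11 - rho2 * rho1) / det   # Cramer's rule, first column
alpha2 = (c11 * rho2 - rho1 * rho1) / det   # Cramer's rule, second column
print(round(alpha1, 3), round(alpha2, 3))
```

The circular dependence noted on the slide (C depends on SNR, which depends on α) is typically handled by iterating: assume an SNR, solve for α, re-estimate SNR, and repeat.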

65

Solution for Optimum Predictor

• special case of p = 1, where we can solve directly for the coefficient \alpha_1 of this first order linear predictor, as

\alpha_1 = \frac{\rho[1]}{1 + 1/SNR}

• can see that \alpha_1 < \rho[1] < 1; we will look further at this special case later

66

Solution for Optimum Predictor

• in spite of the problems in solving for the optimum predictor coefficients, we can solve for the prediction gain, G_P, in terms of the \alpha_k coefficients, as

\sigma_d^2 = E\left[\left(x[n]-\tilde{x}[n]\right)\cdot\left(x[n]-\tilde{x}[n]\right)\right]
= E\left[\left(x[n]-\tilde{x}[n]\right)\cdot x[n]\right] - E\left[\left(x[n]-\tilde{x}[n]\right)\cdot\tilde{x}[n]\right]

where the term x[n]-\tilde{x}[n] is the prediction error; we can show that the second term in the expression above is zero, i.e., the prediction error is uncorrelated with the predicted value; thus

\sigma_d^2 = E\left[\left(x[n]-\tilde{x}[n]\right)\cdot x[n]\right]
= E\left[\left(x[n]-\sum_{k=1}^{p}\alpha_k\left(x[n-k]+e[n-k]\right)\right)\cdot x[n]\right]

• assuming uncorrelated signal and noise, we get

\sigma_d^2 = \sigma_x^2 - \sum_{k=1}^{p}\alpha_k\,\phi[k] = \sigma_x^2\left(1 - \sum_{k=1}^{p}\alpha_k\,\rho[k]\right)

giving, for the optimum values of \alpha_k,

G_P\big|_{opt} = \frac{1}{1 - \sum_{k=1}^{p}\alpha_k\,\rho[k]}

67

First Order Predictor Solution

• for the case p = 1 we can examine the effects of a sub-optimum value of \alpha_1 on the quantity G_P = \sigma_x^2/\sigma_d^2. The optimum solution is:

\alpha_1\big|_{opt} = \frac{\rho[1]}{1 + 1/SNR}

• consider choosing an arbitrary value for \alpha_1; then we get

\sigma_d^2 = \sigma_x^2\left[1 - 2\alpha_1\rho[1] + \alpha_1^2\right] + \alpha_1^2\sigma_e^2

giving the sub-optimum result

G_P\big|_{arb} = \frac{1}{1 - 2\alpha_1\rho[1] + \alpha_1^2\left(1 + 1/SNR\right)}

where the term \alpha_1^2/SNR represents the increase in variance of d[n] due to the feedback of the error signal, e[n].

68

First Order Predictor Solution

• can reformulate G_P as

G_P\big|_{arb} = \frac{1 - \alpha_1^2/SNR_Q}{1 - 2\alpha_1\rho[1] + \alpha_1^2}

for any value of \alpha_1 (including the optimum value). Consider the case of \alpha_1 = \rho[1] (sub-optimum):

G_P\big|_{subopt} = \left[\frac{1}{1-\rho^2[1]}\right]\cdot\left[1 - \frac{\rho^2[1]}{SNR_Q}\right]

• the gain in prediction is a product of the prediction gain without the quantizer, 1/(1-\rho^2[1]), reduced by the loss due to feedback of the error signal.

69

First Order Predictor Solution

• we showed before that the optimum value of \alpha_1 was \alpha_1\big|_{opt} = \rho[1]/(1 + 1/SNR). If we neglect the term in 1/SNR (usually very small), then

\alpha_1\big|_{opt} \approx \rho[1]

and the gain due to prediction is

G_P\big|_{opt} = \frac{1}{1-\rho^2[1]}

• thus there is a prediction gain so long as \rho[1] \neq 0
• it is reasonable to assume that for speech \rho[1] > 0.8, giving

G_P\big|_{opt} > 2.77 \quad (or\ 4.43\ dB)
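The gain figures above follow directly from the formula; a quick numerical check of G_P = 1/(1 − ρ²[1]) at ρ[1] = 0.8:

```python
import math

def prediction_gain(rho1):
    """Optimum first-order prediction gain, neglecting the 1/SNR term."""
    return 1.0 / (1.0 - rho1 * rho1)

gp = prediction_gain(0.8)              # rho[1] = 0.8 as assumed for speech
gp_db = 10.0 * math.log10(gp)
print(round(gp, 2), round(gp_db, 2))   # 2.78, 4.44 (slide truncates: 2.77, 4.43)
```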

70

Differential Quantization

• block diagram summary (encoder: the predictor output \tilde{x}[n] is subtracted from x[n] to form d[n], the quantizer Q produces \hat{d}[n], and the predictor P operates on \hat{x}[n]):

\tilde{x}[n] = \sum_{k=1}^{p}\alpha_k\,\hat{x}[n-k]
d[n] = x[n] - \tilde{x}[n]
\hat{d}[n] = d[n] + e[n]
\hat{x}[n] = \tilde{x}[n] + \hat{d}[n]
\hat{x}[n] = x[n] + e[n]

• the error, e[n], in quantizing d[n] is the same as the error in representing x[n]
• first order predictor:

SNR = \frac{\sigma_x^2}{\sigma_e^2} = \frac{\sigma_x^2}{\sigma_d^2}\cdot\frac{\sigma_d^2}{\sigma_e^2} = G_P\cdot SNR_Q

\alpha_1 = \frac{\rho[1]}{1 + 1/SNR}

G_P = \left[\frac{1}{1-\rho^2[1]}\right]\cdot\left[1 - \frac{\rho^2[1]}{SNR_Q}\right] \approx \frac{1}{1-\rho^2[1]}

• prediction gain dependent on \rho[1], the first correlation coefficient
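The five equations above fully specify a DPCM coder; a minimal sketch with a first-order predictor and a uniform mid-rise quantizer (the step size and predictor coefficient are illustrative, and the quantized differences stand in for the transmitted codewords c[n]):

```python
import math

DELTA = 0.1    # quantizer step size (illustrative)
ALPHA = 0.8    # first-order predictor coefficient (illustrative)

def quantize(d):
    """Uniform mid-rise quantizer: |e[n]| = |dhat - d| <= DELTA/2."""
    return DELTA * (math.floor(d / DELTA) + 0.5)

def dpcm_encode(x):
    """Return the quantized differences d^[n]."""
    xhat, codes = 0.0, []
    for xn in x:
        xtilde = ALPHA * xhat         # x~[n] = alpha * x^[n-1]
        dhat = quantize(xn - xtilde)  # d^[n] = d[n] + e[n]
        xhat = xtilde + dhat          # x^[n] = x~[n] + d^[n]
        codes.append(dhat)
    return codes

def dpcm_decode(codes):
    """Same predictor as the encoder, so it tracks x^[n] exactly."""
    xhat, out = 0.0, []
    for dhat in codes:
        xhat = ALPHA * xhat + dhat
        out.append(xhat)
    return out

x = [math.sin(0.05 * n) for n in range(500)]
y = dpcm_decode(dpcm_encode(x))

# Reconstruction error equals the quantization error: x^[n] = x[n] + e[n],
# so it is bounded by about DELTA/2.
err = max(abs(a - b) for a, b in zip(x, y))
print(err)
```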

71

Long-Term Spectrum and Correlation

(figure: long-term speech spectrum and correlation, measured with a 32-point Hamming window; the first three correlation coefficients are 0.8579, 0.5680, and 0.2536)

72

Computed Prediction Gain


73

Actual Prediction Gains for Speech

• variation in gain across 4 speakers

• can get about 6 dB improvement in SNR => 1 extra bit equivalent in quantization—but at a price of increased complexity in quantization

• differential quantization works!!

• gain in SNR depends on signal correlations

• fixed predictor cannot be optimum for all speakers and for all speech material


74

Delta Modulation

Linear and Adaptive


75

Delta Modulation

• the simplest form of differential quantization is delta modulation (DM)
• the sampling rate is chosen to be many times the Nyquist rate for the input signal => adjacent samples are highly correlated
• in the limit as T \to 0, we expect \phi[1] \to \sigma_x^2
• this leads to a high ability to predict x[n] from past samples, with the variance of the prediction error being very low, leading to a high prediction gain => can use a simple 1-bit (2-level) quantizer => the bit rate for DM systems is just the (high) sampling rate of the signal

76

Linear Delta Modulation

• 2-level quantizer with fixed step size, \Delta, of the form

\hat{d}[n] = +\Delta \quad if\ d[n] \ge 0 \quad (c[n] = 0)
\hat{d}[n] = -\Delta \quad if\ d[n] < 0 \quad (c[n] = 1)

• using a simple first order predictor, the optimum prediction gain is

G_P\big|_{opt} = \frac{1}{1-\rho^2[1]} \to \infty \quad as\ \rho[1] \to 1

(qualitatively only, since the assumptions under which the equation was derived break down as \rho[1] \to 1)

77

Illustration of DM

• basic equations of DM are

\hat{x}[n] = \alpha\hat{x}[n-1] + \hat{d}[n]

d[n] = x[n] - \hat{x}[n-1] = x[n] - x[n-1] - e[n-1] \quad when\ \alpha = 1

• when \alpha = 1, the reconstruction is essentially digital integration (accumulation of increments of \pm\Delta)
• d[n] is a first backward difference of x[n], or an approximation to the derivative of the input
• how big do we make \Delta? at the maximum slope of x_a(t) we need

\frac{\Delta}{T} \ge \max\left|\frac{dx_a(t)}{dt}\right|

or else the reconstructed signal will lag the actual signal => called the 'slope overload' condition, with the resulting quantization error called 'slope overload distortion'
• since \hat{x}[n] can only increase by fixed increments of \Delta, fixed DM is called linear DM, or LDM

(figure: LDM waveform illustrating the slope overload condition and granular noise)
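A minimal linear delta modulator, assuming α = 1 (pure digital integration) and an illustrative fixed step size; the input is heavily oversampled so that its per-sample slope stays below ∆ and slope overload is avoided:

```python
import math

DELTA = 0.05   # fixed step size (illustrative)

def ldm_encode(x):
    """One bit per sample: c[n] = 0 => +DELTA, c[n] = 1 => -DELTA."""
    xhat, bits = 0.0, []
    for xn in x:
        c = 0 if xn - xhat >= 0 else 1   # sign of d[n] = x[n] - x^[n-1]
        xhat += DELTA if c == 0 else -DELTA
        bits.append(c)
    return bits

def ldm_decode(bits):
    """Decoder is just an accumulator of +/-DELTA increments."""
    xhat, out = 0.0, []
    for c in bits:
        xhat += DELTA if c == 0 else -DELTA
        out.append(xhat)
    return out

# Slowly varying input: slope ~0.005 per sample << DELTA, so the coder
# tracks the input within the granular-noise band of roughly +/-DELTA.
x = [0.5 * math.sin(0.01 * n) for n in range(2000)]
y = ldm_decode(ldm_encode(x))
err = max(abs(a - b) for a, b in zip(x, y))
print(err)
```

Increasing the input slope beyond ∆ per sample reproduces the slope overload behavior described above.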

78

DM Granular Noise

• when x_a(t) has small slope, \Delta determines the peak error; when x_a(t) = 0, the quantizer output will be an alternating sequence of 0's and 1's, and \hat{x}[n] will alternate around zero with a peak variation of \Delta => this condition is called "granular noise"
• need a large step size to handle a wide dynamic range
• need a small step size to accurately represent low level signals
• with LDM we need to worry about dynamic range and amplitude of the difference signal => choose \Delta to minimize the mean-squared quantization error (a compromise between slope overload and granular noise)

79

Performance of DM Systems

• normalized step size defined as

\frac{\Delta}{\left(E\left[\left(x[n]-x[n-1]\right)^2\right]\right)^{1/2}}

• oversampling index defined as F_0 = F_S/(2F_N), where F_S is the sampling rate of the DM and F_N is the Nyquist frequency of the signal; the total bit rate of the DM is

BR = F_S = 2F_0\cdot F_N

• can see that for a given value of F_0, there is an optimum value of \Delta
• the optimum SNR increases by 9 dB for each doubling of F_0 => this is better than the 6 dB obtained by increasing the number of bits/sample by 1 bit
• the curves are very sharp around the optimum value of \Delta => the SNR is very sensitive to input level
• for SNR = 35 dB with F_N = 3 kHz => 200 Kbps rate
• for toll quality, need much higher rates

80

Adaptive Delta Mod

• step size adaptation for DM (from codewords):

\Delta[n] = M[n]\cdot\Delta[n-1], \quad \Delta_{min} \le \Delta[n] \le \Delta_{max}

• M[n] is a function of c[n] and c[n-1]; since c[n] depends only on the sign of

d[n] = x[n] - \alpha\hat{x}[n-1]

the sign of d[n] can be determined before the actual quantized value \hat{d}[n], which needs the new value of \Delta[n] for evaluation
• the algorithm for choosing the step size multiplier is

M[n] = P > 1 \quad if\ c[n] = c[n-1]
M[n] = Q < 1 \quad if\ c[n] \neq c[n-1]
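The adaptation rule can be bolted onto the LDM loop; a sketch with P = 2, Q = 1/2 (the step-size range limits are illustrative):

```python
import math

P, Q = 2.0, 0.5            # step multipliers, PQ = 1
DMIN, DMAX = 0.01, 1.0     # step-size range (illustrative)

def adm_encode(x):
    """1-bit adaptive DM; returns the bit stream and the step-size track."""
    xhat, delta, prev_c = 0.0, DMIN, None
    bits, steps = [], []
    for xn in x:
        c = 0 if xn - xhat >= 0 else 1    # sign of d[n], known before
        if prev_c is not None:            #   delta[n] is applied
            m = P if c == prev_c else Q   # repeats => grow, flips => shrink
            delta = min(max(m * delta, DMIN), DMAX)
        xhat += delta if c == 0 else -delta
        bits.append(c)
        steps.append(delta)
        prev_c = c
    return bits, steps

x = [math.sin(0.05 * n) for n in range(1000)]
bits, steps = adm_encode(x)

# Step size grows exponentially during runs of equal bits (slope overload)
# and decays toward DMIN when the bits alternate (granular regions),
# while always staying inside [DMIN, DMAX].
print(min(steps), max(steps))
```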

81

Adaptive DM Performance

• slope overload in LDM causes runs of 0’s or 1’s

• granularity causes runs of alternating 0’s and 1’s

• figure above shows how adaptive DM performs with P=2, Q=1/2, α=1

• during slope overload, step size increases exponentially to follow increase in waveform slope

• during granularity, step size decreases exponentially to ∆min and stays there as long as slope remains small

(figure: ADM waveform, with the slope overload and granular noise regions marked)

82

ADM Parameter Behavior

• ADM parameters are P, Q, ∆min and ∆max

• choose ∆min and ∆max to provide desired dynamic range

• choose ∆max/ ∆min to maintain high SNR over range of input signal levels

• ∆min should be chosen to minimize idle channel noise

• PQ should satisfy PQ≤1 for stability

• PQ chosen to be 1

• peak of SNR at P=1.5, but for the range 1.25<P<2, the SNR varies only slightly


83

Comparison of LDM, ADM and log PCM

• ADM is 8 dB better SNR at 20 Kbps than LDM, and 14 dB better SNR at 60 Kbps than LDM

• ADM gives a 10 dB increase in SNR for each doubling of the bit rate; LDM gives about 6 dB

• for bit rate below 40 Kbps, ADM has higher SNR than μ-law PCM; for higher bit rates log PCM has higher SNR


84

Higher Order Prediction in DM

• the first order predictor gave \tilde{x}[n] = \alpha_1\hat{x}[n-1], with the reconstructed signal satisfying

\hat{x}[n] = \alpha_1\hat{x}[n-1] + \hat{d}[n]

with system function

H(z) = \frac{\hat{X}(z)}{\hat{D}(z)} = \frac{1}{1-\alpha_1 z^{-1}}

the digital equivalent of a leaky integrator
• consider a second order predictor with

\tilde{x}[n] = \alpha_1\hat{x}[n-1] + \alpha_2\hat{x}[n-2]

with reconstructed signal

\hat{x}[n] = \alpha_1\hat{x}[n-1] + \alpha_2\hat{x}[n-2] + \hat{d}[n]

and system function

H(z) = \frac{\hat{X}(z)}{\hat{D}(z)} = \frac{1}{1-\alpha_1 z^{-1}-\alpha_2 z^{-2}}

• assuming two real poles, we can write H(z) as

H(z) = \frac{1}{\left(1-az^{-1}\right)\left(1-bz^{-1}\right)}, \quad 0 < a, b < 1

• better prediction is achieved using this "double integration" system, with up to 4 dB better SNR; there are issues of interaction between the adaptive quantizer and the predictor
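The pole factoring on this slide implies α₁ = a + b and α₂ = −ab; a quick check that the second-order difference equation matches a cascade of two leaky integrators (the pole values are illustrative):

```python
# H(z) = 1/((1 - a z^-1)(1 - b z^-1)) = 1/(1 - alpha1 z^-1 - alpha2 z^-2)
a, b = 0.9, 0.7                    # two real poles inside the unit circle
alpha1, alpha2 = a + b, -a * b     # expand the denominator product

def direct_form(d):
    """x^[n] = alpha1*x^[n-1] + alpha2*x^[n-2] + d^[n]."""
    x1 = x2 = 0.0
    out = []
    for dn in d:
        xn = alpha1 * x1 + alpha2 * x2 + dn
        out.append(xn)
        x1, x2 = xn, x1
    return out

def cascade(d):
    """Two leaky integrators in series, one per pole."""
    y = xn = 0.0
    out = []
    for dn in d:
        y = a * y + dn       # first leaky integrator
        xn = b * xn + y      # second leaky integrator
        out.append(xn)
    return out

impulse = [1.0] + [0.0] * 19
h_direct, h_cascade = direct_form(impulse), cascade(impulse)
diff = max(abs(p - q) for p, q in zip(h_direct, h_cascade))
print(diff < 1e-9)   # True: the two realizations are the same filter
```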

85

Differential PCM (DPCM)

• fixed predictors can give from 4-11 dB SNR improvement over direct quantization (PCM)

• most of the gain occurs with first order predictor

• prediction up to 4th or 5th order helps


86

DPCM with Adaptive Quantization

• quantizer step size proportional to the variance at the quantizer input

• can use d[n] or x[n] to control step size

• get 5 dB improvement in SNR over μ-law non-adaptive PCM

• get 6 dB improvement in SNR using a differential configuration with fixed prediction => the ADPCM SNR is about 10-11 dB better than that of a fixed quantizer


87

Feedback ADPCM

can achieve same improvement in SNR as feed forward system


88

DPCM with Adaptive Prediction

need adaptive prediction to handle the non-stationarity of speech


89

DPCM with Adaptive Prediction

• prediction coefficients are assumed to be time-dependent, of the form

\tilde{x}[n] = \sum_{k=1}^{p}\alpha_k[n]\,\hat{x}[n-k]

• assume speech properties remain fixed over short time intervals
• choose the \alpha_k[n] to minimize the average squared prediction error over short intervals
• the optimum predictor coefficients satisfy the relationships

R_n[j] = \sum_{k=1}^{p}\alpha_k[n]\,R_n[j-k], \quad j = 1, 2, \ldots, p

where R_n[j] is the short-time autocorrelation function, of the form

R_n[j] = \sum_{m=-\infty}^{\infty} x[m]w[n-m]\,x[j+m]w[n-m-j], \quad 0 \le j \le p

• w[n-m] is a window positioned at sample n of the input
• update the \alpha's every 10-20 msec
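For p = 1 the normal equation above reduces to α₁[n] = R_n[1]/R_n[0]; a sketch of frame-by-frame coefficient estimation with a rectangular window (the frame length and test signal are illustrative):

```python
import math

def frame_alpha1(frame):
    """First-order predictor coefficient from the short-time autocorrelation."""
    r0 = sum(s * s for s in frame)                                    # R_n[0]
    r1 = sum(frame[m] * frame[m + 1] for m in range(len(frame) - 1))  # R_n[1]
    return r1 / r0 if r0 > 0.0 else 0.0

# 160-sample frames (20 ms at 8 kHz), predictor updated once per frame.
fs, frame_len = 8000, 160
x = [math.sin(2 * math.pi * 500 * n / fs) for n in range(4 * frame_len)]

alphas = [frame_alpha1(x[i:i + frame_len])
          for i in range(0, len(x), frame_len)]

# For a pure 500 Hz tone, rho[1] = cos(2*pi*500/8000) ~ 0.924, and each
# frame estimate lands close to that value.
print([round(a, 3) for a in alphas])
```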

90

Prediction Gain for DPCM with Adaptive Prediction

G_P = 10\log_{10}\frac{E\left[x^2[n]\right]}{E\left[d^2[n]\right]}

• fixed prediction → 10.5 dB prediction gain for large p
• adaptive prediction → 14 dB gain for large p
• adaptive prediction is more robust to speaker and speech material

91

Comparison of Coders

• 6 dB between curves

• sharp increase in SNR with both fixed prediction and adaptive quantization

• almost no gain for adapting first order predictor