Speech Coding (Part I) Waveform Coding

虞台文

Page 1: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

虞台文

Page 2: Speech Coding (Part I)    Waveform Coding

Content

Overview
Linear PCM (Pulse-Code Modulation)
Nonlinear PCM
Max-Lloyd Algorithm
Differential PCM (DPCM)
Adaptive PCM (ADPCM)
Delta Modulation (DM)

Page 3: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

Overview

Page 4: Speech Coding (Part I)    Waveform Coding

Classification of Coding schemes

Waveform coding

Vocoding

Hybrid coding

Page 5: Speech Coding (Part I)    Waveform Coding

Quality versus Bitrate of Speech Codecs

Page 6: Speech Coding (Part I)    Waveform Coding

Waveform coding

Encodes the waveform itself in an efficient, signal-independent way; offers good-quality speech at a bandwidth of 16 kbps or more.

Time-domain techniques
– Linear PCM (Pulse-Code Modulation)
– Nonlinear PCM: μ-law, A-law
– Differential coding: DM, DPCM, ADPCM

Frequency-domain techniques
– SBC (Sub-band Coding), ATC (Adaptive Transform Coding)

Wavelet techniques

Page 7: Speech Coding (Part I)    Waveform Coding

Vocoding

‘Voice’ + ‘coding’: encoding information about how the speech signal was produced by the human vocal system.

These techniques can produce intelligible communication at very low bit rates, usually below 4.8 kbps. However, the reproduced speech often sounds quite synthetic, and the speaker is often not recognisable.

LPC-10 codec: 2400 bps, American military standard.

Page 8: Speech Coding (Part I)    Waveform Coding

Hybrid coding combines waveform and source coding methods to improve speech quality and reduce the bit rate.

Typical bandwidth requirements lie between 4.8 and 16 kbps.

Technique: analysis-by-synthesis
– RELP (Residual Excited Linear Prediction)
– CELP (Codebook Excited Linear Prediction)
– MPLP (Multipulse Excited Linear Prediction)
– RPE (Regular Pulse Excitation)

Page 9: Speech Coding (Part I)    Waveform Coding

Quality versus Bitrate of Speech Codecs

Page 10: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

Linear PCM(Pulse-Code Modulation)

Page 11: Speech Coding (Part I)    Waveform Coding

Pulse-Code Modulation (PCM)

A method for quantizing an analog signal for the purpose of transmitting or storing the signal in digital form.

Page 12: Speech Coding (Part I)    Waveform Coding

Quantization

The process of mapping a continuous range of amplitudes onto a finite set of discrete levels.

Page 13: Speech Coding (Part I)    Waveform Coding

Linear/Uniform Quantization

Page 14: Speech Coding (Part I)    Waveform Coding

Quantization Error/Noise

Page 15: Speech Coding (Part I)    Waveform Coding

Quantization Error/Noise

granular noise (small errors within the quantizer range); overload noise (clipping beyond ±X_max, at both extremes)

Page 16: Speech Coding (Part I)    Waveform Coding

Quantization Error/Noise

Page 17: Speech Coding (Part I)    Waveform Coding

Quantization Error/Noise

(Figure: unquantized sine wave; 3-bit quantization waveform; 3-bit quantization error; 8-bit quantization error.)

Quantization step size: Δ = 2X_max / 2^b
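As a sketch of the step-size relation above, a midrise uniform quantizer with 2^b levels on [−X_max, X_max] can be written as follows (the function name and the sine test signal are illustrative, not from the slides):

```python
import numpy as np

def uniform_quantize(x, b, x_max):
    """Midrise uniform quantizer: 2**b levels spanning [-x_max, x_max]."""
    delta = 2.0 * x_max / 2**b                    # step size: 2*Xmax / 2^b
    k = np.floor(x / delta)                       # decision-interval index
    k = np.clip(k, -2**(b - 1), 2**(b - 1) - 1)   # overload clipping
    return (k + 0.5) * delta                      # mid-interval reconstruction

t = np.linspace(0, 1, 500, endpoint=False)
x = 0.9 * np.sin(2 * np.pi * 4 * t)
for b in (3, 8):
    e = x - uniform_quantize(x, b, 1.0)
    print(b, "bits: max |error| =", np.max(np.abs(e)))
```

For inputs inside the range, the granular error is bounded by Δ/2 = X_max / 2^b, matching the 3-bit versus 8-bit error plots on the slide.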

Page 18: Speech Coding (Part I)    Waveform Coding

The Model of Quantization Noise

Quantization step size: Δ = 2X_max / 2^b

x(n) → [Quantizer] → x̂(n)

Model: x̂(n) = x(n) + e(n), i.e., the quantizer output is the input plus additive quantization noise e(n) with variance σ²_e.

Page 19: Speech Coding (Part I)    Waveform Coding

Signal-to-Quantization-Noise Ratio (SQNR)

A measurement of the effect of quantization errors introduced by analog-to-digital conversion at the ADC.

SQNR_dB = 10 log10(σ²_signal / σ²_q-noise) = 20 log10(σ_signal / σ_q-noise)

Page 20: Speech Coding (Part I)    Waveform Coding

Signal-to-Quantization-Noise Ratio (SQNR)

SQNR_dB = 10 log10(σ²_signal / σ²_q-noise) = 20 log10(σ_signal / σ_q-noise)

With x̂(n) = x(n) + e(n) and Δ = 2X_max / 2^b, assume e(n) ~ U(−Δ/2, Δ/2), so that

σ²_e = Δ²/12 = X²_max / (3 · 2^{2b})

SQNR_dB = 10 log10(σ²_x / σ²_e)
        = 10 log10(3 · 2^{2b} σ²_x / X²_max)
        = 10 log10 3 + 20b log10 2 − 20 log10(X_max / σ_x)
        = 4.77 + 6.02b − 20 log10(X_max / σ_x)
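The derivation above can be checked empirically. The sketch below measures the SQNR of a midrise uniform quantizer on a Gaussian signal; the choices σ_x = 0.1 and X_max = 1 are illustrative, picked so overload is negligible and the uniform-noise assumption holds:

```python
import numpy as np

rng = np.random.default_rng(0)

def sqnr_db(b, x_max=1.0, sigma_x=0.1):
    """Empirical SQNR of a b-bit midrise uniform quantizer, Gaussian input."""
    x = rng.normal(0.0, sigma_x, 200_000)
    delta = 2 * x_max / 2**b
    k = np.clip(np.floor(x / delta), -2**(b - 1), 2**(b - 1) - 1)
    e = x - (k + 0.5) * delta                  # quantization noise e(n)
    return 10 * np.log10(np.var(x) / np.mean(e**2))

for b in (6, 8, 10):
    theory = 4.77 + 6.02 * b - 20 * np.log10(1.0 / 0.1)
    print(b, "bits:", round(sqnr_db(b), 2), "dB; theory:", round(theory, 2), "dB")
```

The measured values track 4.77 + 6.02b − 20 log10(X_max/σ_x) to within a fraction of a dB, confirming the roughly 6 dB gained per code bit.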

Page 21: Speech Coding (Part I)    Waveform Coding

Signal-to-Quantization-Noise Ratio (SQNR)

SQNR_dB = 4.77 + 6.02b − 20 log10(X_max / σ_x)

(derived under the assumption e(n) ~ U(−Δ/2, Δ/2), with σ²_e = Δ²/12 and Δ = 2X_max/2^b)

Is the assumption always appropriate?

Page 22: Speech Coding (Part I)    Waveform Coding

Signal-to-Quantization-Noise Ratio (SQNR)

SQNR_dB = 4.77 + 6.02b − 20 log10(X_max / σ_x)

Each code bit contributes about 6 dB.

The term X_max/σ_x tells how big a signal can be accurately represented.

Page 23: Speech Coding (Part I)    Waveform Coding

Signal-to-Quantization-Noise Ratio (SQNR)

SQNR_dB = 4.77 + 6.02b − 20 log10(X_max / σ_x)

σ_x depends on the distribution of the signal, which in turn depends on users and time. X_max is determined by the A/D converter.

Page 24: Speech Coding (Part I)    Waveform Coding

Signal-to-Quantization-Noise Ratio (SQNR)

SQNR_dB = 4.77 + 6.02b − 20 log10(X_max / σ_x)

Under what conditions is the formula reasonable?

Page 25: Speech Coding (Part I)    Waveform Coding

Overload Distortion

(Figure: quantizer input–output characteristics over [−X_max, X_max], midtread and midrise.)

Page 26: Speech Coding (Part I)    Waveform Coding

Probability of Distortion

(Figure: midtread and midrise characteristics over [−X_max, X_max], with the input pdf overlaid.)

Assume x ~ N(0, σ²_x).

Page 27: Speech Coding (Part I)    Waveform Coding

Probability of Distortion

Assume x ~ N(0, σ²_x). Then

P("overload") = P(|x| > X_max) = 2Q(X_max / σ_x)

For X_max = 3σ_x:  P("overload") ≈ 0.0026
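The two-sided Gaussian tail above can be evaluated with the complementary error function, using the standard identity 2Q(k) = erfc(k/√2) (the function name is illustrative):

```python
import math

def overload_probability(ratio):
    """P(|x| > X_max) for x ~ N(0, sigma_x^2), where ratio = X_max / sigma_x."""
    # two-sided tail: 2*Q(k) = erfc(k / sqrt(2))
    return math.erfc(ratio / math.sqrt(2.0))

print(overload_probability(3.0))  # ~0.0027, matching the slide's ~0.0026
```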

Page 28: Speech Coding (Part I)    Waveform Coding

Overload and Quantization Noise with Gaussian Input pdf and b = 4

(Figure: noise power σ_e (dB) versus X_max/σ_x (dB), with x ~ N(0, σ²_x), for midtread and midrise quantizers.)

Page 29: Speech Coding (Part I)    Waveform Coding

Uniform Quantizer Performance

(Figures: SQNR (dB) versus X_max/σ_x (dB), for a uniform input pdf and for a Gaussian input pdf.)

Page 30: Speech Coding (Part I)    Waveform Coding

More on Uniform Quantization

SQNR_dB = 4.77 + 6.02b − 20 log10(X_max / σ_x)

Uniform quantization is conceptually and implementationally simple:
– It imposes no restrictions on the signal's statistics.
– It maintains a constant maximum error across its total dynamic range.

However, σ_x varies greatly (on the order of 40 dB) across sounds, speakers, and input conditions. We need a quantizing system whose SQNR is independent of the signal's dynamic range, i.e., a near-constant SQNR across its dynamic range.

Page 31: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

Nonlinear PCM

Page 32: Speech Coding (Part I)    Waveform Coding

Probability Density Functionsof Speech Signals

Counting the number of samples in each interval provides an estimate of the pdf of the signal.

Page 33: Speech Coding (Part I)    Waveform Coding

Probability Density Functionsof Speech Signals

Page 34: Speech Coding (Part I)    Waveform Coding

Probability Density Functionsof Speech Signals

A good approximation is a gamma distribution, of the form

p(x) = ( √3 / (8π σ_x |x|) )^{1/2} · exp( −√3 |x| / (2σ_x) ),   p(0) = ∞

A simpler approximation is a Laplacian density, of the form

p(x) = ( 1 / (√2 σ_x) ) · exp( −√2 |x| / σ_x ),   p(0) = 1 / (√2 σ_x)

Page 35: Speech Coding (Part I)    Waveform Coding

Probability Density Functionsof Speech Signals

Distributions normalized so that μ_x = 0 and σ_x = 1.

Gamma density more closely approximates measured distribution for speech than Laplacian.

Laplacian is still a good model in analytical studies.

Small amplitudes much more likely than large amplitudes—by 100:1 ratio.

Page 36: Speech Coding (Part I)    Waveform Coding

Companding

The dynamic range of signals is compressed before transmission and is expanded to the original value at the receiver.

This allows signals with a large dynamic range to be transmitted over facilities that have a smaller dynamic-range capability.

Companding reduces the noise and crosstalk levels at the receiver.

Page 37: Speech Coding (Part I)    Waveform Coding

Companding

x → Compressor C(x) → y → Uniform Quantizer → ŷ → Expander C⁻¹(·) → x̂

Page 38: Speech Coding (Part I)    Waveform Coding

Companding

x → Compressor C(x) → y → Uniform Quantizer → ŷ → Expander C⁻¹(·) → x̂

Page 39: Speech Coding (Part I)    Waveform Coding

Companding

x → Compressor C(x) → y → Uniform Quantizer → ŷ → Expander C⁻¹(·) → x̂

After compression, y is nearly uniformly distributed.

Page 40: Speech Coding (Part I)    Waveform Coding

The Quantization-Error Variance of a Nonuniform Quantizer

x → Compressor C(x) → Uniform Quantizer → Expander C⁻¹(·) → x̂

σ²_e = (Δ²/12) ∫_{−X_max}^{X_max} p(x) / [C′(x)]² dx,  with Δ = 2X_max / 2^b    (Jayant and Noll)


Page 42: Speech Coding (Part I)    Waveform Coding

The Optimal C(x)

x → Compressor C(x) → Uniform Quantizer → Expander C⁻¹(·) → x̂

σ²_e = (Δ²/12) ∫_{−X_max}^{X_max} p(x) / [C′(x)]² dx    (Jayant and Noll)

If the signal's pdf is known, the minimum σ²_e (maximum SQNR) is achievable by letting

C(x) = X_max · ∫₀^x [p(u)]^{1/3} du / ∫₀^{X_max} [p(u)]^{1/3} du

Page 43: Speech Coding (Part I)    Waveform Coding

The Optimal C(x)

σ²_e = (Δ²/12) ∫_{−X_max}^{X_max} p(x) / [C′(x)]² dx    (Jayant and Noll)

C(x) = X_max · ∫₀^x [p(u)]^{1/3} du / ∫₀^{X_max} [p(u)]^{1/3} du

Is the assumption (a known pdf) realistic?

Page 44: Speech Coding (Part I)    Waveform Coding

PDF-Independent Nonuniform Quantization

Assuming no overload,

SQNR = σ²_x / σ²_e = ∫_{−X_max}^{X_max} x² p(x) dx / ( (Δ²/12) ∫_{−X_max}^{X_max} p(x) / [C′(x)]² dx )

We require that the SQNR be independent of p(x). This holds if

1 / [C′(x)]² = x² / k²   ⇒   C′(x) = k / x   ⇒   C(x) = k ln x + A

Page 45: Speech Coding (Part I)    Waveform Coding

Logarithmic Companding

C(x) = k ln x + A

Page 46: Speech Coding (Part I)    Waveform Coding

μ-Law & A-Law Companding

μ-Law
– A North American PCM standard
– Used in North America and Japan

A-Law
– An ITU PCM standard
– Used in Europe

x → Compressor → Uniform Quantizer → Expander → x̂

Page 47: Speech Coding (Part I)    Waveform Coding

μ-Law & A-Law Companding

μ-law compressor (μ = 255 in the U.S. and Canada):

y = C(x) = sign(x) · ln(1 + μ|x|) / ln(1 + μ)

A-law compressor (A = 87.56 in Europe):

y = C_A(x) = sign(x) · A|x| / (1 + ln A),               0 ≤ |x| ≤ 1/A
y = C_A(x) = sign(x) · (1 + ln(A|x|)) / (1 + ln A),     1/A ≤ |x| ≤ 1

Page 48: Speech Coding (Part I)    Waveform Coding

μ-Law & A-Law Companding

(Figure: the compressor characteristics y = C(x) for μ-law and y = C_A(x) for A-law.)

Page 49: Speech Coding (Part I)    Waveform Coding

μ-Law Companding

y = C(x) = X_max · ln(1 + μ|x|/X_max) / ln(1 + μ) · sign(x)

Properties:
– x(n) = 0 ⇒ y(n) = 0
– x(n) = ±X_max ⇒ y(n) = ±X_max
– y(n) has the same sign as x(n)
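The μ-law compressor above and its inverse (the expander) can be sketched directly from the formula; the function names are illustrative, and `log1p`/`expm1` are used for numerical robustness near zero:

```python
import numpy as np

def mu_law_compress(x, mu=255.0, x_max=1.0):
    """y = Xmax * ln(1 + mu|x|/Xmax) / ln(1 + mu) * sign(x)."""
    return x_max * np.log1p(mu * np.abs(x) / x_max) / np.log1p(mu) * np.sign(x)

def mu_law_expand(y, mu=255.0, x_max=1.0):
    """Inverse of the compressor: |x| = (Xmax/mu) * ((1+mu)^(|y|/Xmax) - 1)."""
    return x_max / mu * np.expm1(np.abs(y) * np.log1p(mu) / x_max) * np.sign(y)

x = np.linspace(-1, 1, 11)
y = mu_law_compress(x)
print(np.max(np.abs(mu_law_expand(y) - x)))  # round-trip error ~ 0
```

Note the endpoint properties from the slide hold by construction: C(0) = 0 and C(±X_max) = ±X_max.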

Page 50: Speech Coding (Part I)    Waveform Coding

μ-Law Companding

y = C(x) = X_max · ln(1 + μ|x|/X_max) / ln(1 + μ) · sign(x)

Using ln(1 + z) ≈ z for z ≪ 1 and ln(1 + z) ≈ ln z for z ≫ 1:

C(x) ≈ (X_max / ln μ) · (μ|x| / X_max) · sign(x),      μ|x|/X_max ≪ 1

C(x) ≈ (X_max / ln μ) · ln(μ|x| / X_max) · sign(x),    μ|x|/X_max ≫ 1

Page 51: Speech Coding (Part I)    Waveform Coding

μ-Law Companding

y = C(x) = X_max · ln(1 + μ|x|/X_max) / ln(1 + μ) · sign(x)

Small signals (μ|x|/X_max ≪ 1): C(x) is approximately linear in x.
Large signals (μ|x|/X_max ≫ 1): C(x) is approximately logarithmic in x.

Page 52: Speech Coding (Part I)    Waveform Coding

Histogram for μ-Law Companding

(Figure: histograms of the input x(n) and of the compressed output y(n).)

Page 53: Speech Coding (Part I)    Waveform Coding

μ-law Approximation to Log

(Figure: distribution of quantization levels for a μ-law 3-bit quantizer.)

Page 54: Speech Coding (Part I)    Waveform Coding

SQNR of μ-law Quantizer

SQNR_dB = 6.02b + 4.77 − 20 log10[ln(1 + μ)] − 10 log10[ 1 + (X_max/(μσ_x))² + √2 · X_max/(μσ_x) ]

– 6.02b dependence on b (good)
– Much less dependence on X_max/σ_x (good)
– For large μ, the SQNR is less sensitive to changes in X_max/σ_x (good)

Page 55: Speech Coding (Part I)    Waveform Coding

Comparison of Linear and μ-law Quantizers

μ-law: SQNR_dB = 6.02b + 4.77 − 20 log10[ln(1 + μ)] − 10 log10[ 1 + (X_max/(μσ_x))² + √2 · X_max/(μσ_x) ]

Linear: SQNR_dB = 6.02b + 4.77 − 20 log10(X_max / σ_x)

Page 56: Speech Coding (Part I)    Waveform Coding

A-Law Companding

y = C_A(x) = sign(x) · A|x| / (1 + ln A),               0 ≤ |x| ≤ 1/A
y = C_A(x) = sign(x) · (1 + ln(A|x|)) / (1 + ln A),     1/A ≤ |x| ≤ 1

Page 57: Speech Coding (Part I)    Waveform Coding

A-Law Companding

y = C_A(x) = sign(x) · A|x| / (1 + ln A),               0 ≤ |x| ≤ 1/A   (linear in x)
y = C_A(x) = sign(x) · (1 + ln(A|x|)) / (1 + ln A),     1/A ≤ |x| ≤ 1   (logarithmic in x)

Page 58: Speech Coding (Part I)    Waveform Coding

A-Law Companding

x

()

yC

x

x

()

Ay

Cx

Page 59: Speech Coding (Part I)    Waveform Coding

SQNR of A-Law Companding

SQNR_dB = 6.02b + 4.77 − 20 log10(1 + ln A)
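A sketch of the two-segment A-law compressor, with A = 87.56 as on the earlier slide (the function name and the log-clamping trick are illustrative). The two branches meet continuously at |x| = 1/A, which the example verifies:

```python
import numpy as np

def a_law_compress(x, A=87.56):
    """A-law compressor on |x| <= 1: linear below 1/A, logarithmic above."""
    ax = np.abs(x)
    denom = 1.0 + np.log(A)
    small = A * ax / denom                                  # 0 <= |x| <= 1/A
    # clamp avoids log(0) in the branch that np.where does not select
    large = (1.0 + np.log(np.maximum(A * ax, 1e-300))) / denom
    return np.sign(x) * np.where(ax < 1.0 / A, small, large)

knee = 1.0 / 87.56
print(a_law_compress(knee), "≈", 87.56 * knee / (1.0 + np.log(87.56)))
```

Both branches give 1/(1 + ln A) at the knee, and C_A(±1) = ±1, so the curve spans the full range just like the μ-law compressor.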

Page 60: Speech Coding (Part I)    Waveform Coding

Demonstration

PCM Demo

Page 61: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

Max-Lloyd Algorithm

Page 62: Speech Coding (Part I)    Waveform Coding

How to design a nonuniform quantizer?

(Figure: staircase Q(x) with decision thresholds x_{k−1}, x_k, x_{k+1}, codewords c_{k−1}, c_k, and reconstruction levels q_{k−1}, q_k.)

Q(x): quantization (reconstruction) level

If x_k ≤ x < x_{k+1}, then Q(x) = q_k.

Page 63: Speech Coding (Part I)    Waveform Coding

How to design a nonuniform quantizer?

If x_k ≤ x < x_{k+1}, then Q(x) = q_k. How should the thresholds x_k and levels q_k be chosen?

Page 64: Speech Coding (Part I)    Waveform Coding

How to design a nonuniform quantizer?

(Figure: codewords c_k, decision thresholds x_k, and reconstruction levels q_k along the amplitude axis.)

Major tasks:
1. Determine the decision thresholds x_k's
2. Determine the reconstruction levels q_k's

Related task:
3. Determine the codewords c_k's

Page 65: Speech Coding (Part I)    Waveform Coding

Optimal Nonuniform Quantization

An optimal quantizer is one that minimizes the quantization-error variance

σ²_e = E[ (X − Q(X))² ]

Major tasks:
1. Determine the decision thresholds x_k's
2. Determine the reconstruction levels q_k's

Page 66: Speech Coding (Part I)    Waveform Coding

Optimal Nonuniform Quantization

σ²_e = E[ (X − Q(X))² ] = ∫ (x − Q(x))² p(x) dx = Σ_{k=1}^{N} ∫_{x_k}^{x_{k+1}} (x − q_k)² p(x) dx

(x*_1, …, x*_{N+1}, q*_1, …, q*_N) = argmin_{x_1,…,x_{N+1}, q_1,…,q_N} Σ_{k=1}^{N} ∫_{x_k}^{x_{k+1}} (x − q_k)² p(x) dx

Page 67: Speech Coding (Part I)    Waveform Coding

Necessary Conditions for an Optimum

σ²_e = Σ_{k=1}^{N} ∫_{x_k}^{x_{k+1}} (x − q_k)² p(x) dx

Setting ∂σ²_e/∂q_k = 0 leads to the “centroid” condition.

Setting ∂σ²_e/∂x_k = 0 leads to the “nearest neighbor” condition.

Page 68: Speech Coding (Part I)    Waveform Coding

Necessary Conditions for an Optimum

∂σ²_e/∂q_k = 0 leads to the “centroid” condition:

q_k = ∫_{x_k}^{x_{k+1}} x p(x) dx / ∫_{x_k}^{x_{k+1}} p(x) dx,   k = 1, …, N

∂σ²_e/∂x_k = 0 leads to the “nearest neighbor” condition:

x_k = (q_{k−1} + q_k) / 2,   k = 2, …, N

Page 69: Speech Coding (Part I)    Waveform Coding

Optimal Nonuniform Quantization

Centroid condition: q_k = ∫_{x_k}^{x_{k+1}} x p(x) dx / ∫_{x_k}^{x_{k+1}} p(x) dx

Nearest-neighbor condition: x_k = (q_{k−1} + q_k) / 2

The two conditions suggest an iterative algorithm to reach the optimum.

Page 70: Speech Coding (Part I)    Waveform Coding

The Max-Lloyd algorithm

1. Initialize a set of decision levels {x_k} and set σ²_e = ∞.
2. Calculate the reconstruction levels {q_k} by
   q_k = ∫_{x_k}^{x_{k+1}} x p(x) dx / ∫_{x_k}^{x_{k+1}} p(x) dx
3. Calculate the MSE:
   σ̂²_e = Σ_k ∫_{x_k}^{x_{k+1}} (x − q_k)² p(x) dx
4. If |σ²_e − σ̂²_e| is below a tolerance, exit.
5. Set σ²_e = σ̂²_e and adjust the decision levels {x_k} by
   x_k = (q_{k−1} + q_k) / 2
6. Go to 2.
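The iteration above can be sketched by discretizing the pdf on a grid and replacing the integrals with weighted sums (the grid size, tolerance, and uniform-pdf example are illustrative choices, not from the slides):

```python
import numpy as np

def lloyd_max(pdf, lo, hi, n_levels, n_iter=200, tol=1e-12, n_grid=40001):
    """Lloyd-Max quantizer design by discretizing the pdf on [lo, hi]."""
    g = np.linspace(lo, hi, n_grid)
    w = pdf(g)
    w = w / w.sum()                        # discrete probability weights
    x = np.linspace(lo, hi, n_levels + 1)  # step 1: initial decision levels
    q = np.zeros(n_levels)
    mse_prev = np.inf
    for _ in range(n_iter):
        mse = 0.0
        for k in range(n_levels):
            m = (g >= x[k]) & (g < x[k + 1]) if k < n_levels - 1 else (g >= x[k])
            q[k] = (g[m] * w[m]).sum() / w[m].sum()     # step 2: centroid
            mse += (((g[m] - q[k]) ** 2) * w[m]).sum()  # step 3: MSE
        if abs(mse_prev - mse) < tol:                   # step 4: converged
            break
        mse_prev = mse
        x[1:-1] = 0.5 * (q[:-1] + q[1:])   # step 5: x_k = (q_{k-1}+q_k)/2
    return x, q, mse

# Sanity check: for a uniform pdf the optimal quantizer is itself uniform.
x, q, mse = lloyd_max(lambda v: np.ones_like(v), -1.0, 1.0, 4)
print(np.round(q, 3))  # levels near [-0.75, -0.25, 0.25, 0.75]
```

For a nonuniform pdf (e.g., the Laplacian model of speech), the same loop yields thresholds packed densely near zero, where the probability mass is.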

Page 71: Speech Coding (Part I)    Waveform Coding

The Max-Lloyd algorithm

(Same algorithm steps as the previous slide.)

This version assumes that the pdf of the signal is available.

Page 72: Speech Coding (Part I)    Waveform Coding

The Max-Lloyd algorithm (Practical Version)

Exercise

Page 73: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

Differential PCM (DPCM)

Page 74: Speech Coding (Part I)    Waveform Coding

Typical Audio Signals

(Figure: a segment of an audio signal, with a zoomed-in view of samples 1250–1750.)

Do you find any correlation and/or redundancy among the samples?

Page 75: Speech Coding (Part I)    Waveform Coding

The Basic Idea of DPCM

Adjacent samples exhibit a high degree of correlation.

By removing this adjacent redundancy before encoding, a more efficiently coded signal results.

How?
– Employ prediction (e.g., linear prediction)
– Encode only the prediction error

Page 76: Speech Coding (Part I)    Waveform Coding

Linear Prediction

Prediction: s̃(n) = Σ_{k=1}^{p} a_k s(n−k)

Prediction error: e(n) = s(n) − s̃(n)

Total squared error over a frame: E = Σ_n e²(n)

Optimal coefficients: a* = argmin_a E,  a = (a_1, …, a_p)
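Minimizing E = Σ e²(n) over the frame is a linear least-squares problem, sketched below by stacking the past-sample rows and solving with `lstsq` (the AR(2) test signal and its coefficients 1.5, −0.7 are illustrative, not from the slides):

```python
import numpy as np

def lp_coeffs(s, p):
    """Least-squares linear-prediction coefficients a_1..a_p for frame s."""
    # Row n of A holds [s(n-1), ..., s(n-p)]; the target is s(n).
    A = np.column_stack([s[p - k:len(s) - k] for k in range(1, p + 1)])
    b = s[p:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a

# A stable AR(2) signal is predicted almost exactly by a 2-tap predictor.
rng = np.random.default_rng(1)
s = np.zeros(5000)
for n in range(2, 5000):
    s[n] = 1.5 * s[n - 1] - 0.7 * s[n - 2] + 0.01 * rng.normal()
print(np.round(lp_coeffs(s, 2), 2))  # close to [1.5, -0.7]
```

The recovered coefficients match the generating recursion, which is why the prediction error e(n) has far less energy than s(n) itself.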

Page 77: Speech Coding (Part I)    Waveform Coding

Linear Predictor

s̃(n) = Σ_{k=1}^{p} a_k s(n−k)

s(n) → [Predictor] → s̃(n)

Page 78: Speech Coding (Part I)    Waveform Coding

DPCM Codec

(Codec diagram.) Encoder: the A/D converter supplies s(n); the prediction error e(n) = s(n) − s̃(n) is quantized to ê(n) and sent over the channel; the predictor runs on the reconstructed signal ŝ(n) = s̃(n) + ê(n).

Decoder: ŝ(n) = s̃(n) + ê(n), with the same predictor driven by ŝ(n).

Page 79: Speech Coding (Part I)    Waveform Coding

DPCM Codec

(Same codec diagram as the previous slide.)

The dynamic range of the prediction error is much smaller than the signal's, so fewer quantization levels are needed.
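The codec loop above can be sketched as follows. For simplicity the sketch uses a uniform quantizer with an unconstrained index and a fixed predictor; the predictor taps, step size, and sine test signal are illustrative:

```python
import numpy as np

def dpcm_encode(s, a, q_step):
    """DPCM encoder sketch: predict from the reconstructed signal,
    quantize the prediction error e(n) uniformly with step q_step."""
    p = len(a)
    s_hat = np.zeros(len(s) + p)      # reconstructed signal, zero history
    codes = np.zeros(len(s))
    for n in range(len(s)):
        pred = sum(a[k] * s_hat[n + p - 1 - k] for k in range(p))  # s~(n)
        e = s[n] - pred
        codes[n] = np.round(e / q_step)           # quantized error index
        s_hat[n + p] = pred + codes[n] * q_step   # s^(n) = s~(n) + e^(n)
    return codes

def dpcm_decode(codes, a, q_step):
    """Mirror of the encoder loop, driven only by the received codes."""
    p = len(a)
    s_hat = np.zeros(len(codes) + p)
    for n in range(len(codes)):
        pred = sum(a[k] * s_hat[n + p - 1 - k] for k in range(p))
        s_hat[n + p] = pred + codes[n] * q_step
    return s_hat[p:]

t = np.arange(200) / 8000.0
s = np.sin(2 * np.pi * 300 * t)
codes = dpcm_encode(s, a=[1.0], q_step=0.05)  # trivial predictor s~(n)=s^(n-1)
r = dpcm_decode(codes, a=[1.0], q_step=0.05)
print(np.max(np.abs(s - r)) <= 0.025 + 1e-9)  # → True
```

Because the predictor runs on ŝ(n) at both ends, quantization errors do not accumulate: the reconstruction error per sample is bounded by half the step size.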

Page 80: Speech Coding (Part I)    Waveform Coding

Performance of DPCM

By using a logarithmic compressor and a 4-bit quantizer for the error sequence e(n), DPCM results in high-quality speech at a rate of 32,000 bps, which is a factor of two lower than logarithmic PCM.

Page 81: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

Adaptive PCM (ADPCM)

Page 82: Speech Coding (Part I)    Waveform Coding

Basic Concept

The power level in a speech signal varies slowly with time.

Let the quantization step Δ(n) dynamically adapt to the slowly time-varying power level σ(n):  Δ(n) ∝ σ(n)

Page 83: Speech Coding (Part I)    Waveform Coding

Adaptive Quantization Schemes

Feed-forward-adaptive quantizers
– estimate σ(n) from x(n) itself
– the step size must be transmitted

Feedback-adaptive quantizers
– adapt the step size on the basis of the quantized signal x̂(n)
– the step size need not be transmitted

Page 84: Speech Coding (Part I)    Waveform Coding

Feed Forward Adaptation

Encoder: x(n) → Quantizer → x̂(n) → Encoder → c(n), with a step-size adaptation system deriving Δ(n) from x(n); Δ(n) is sent along with c(n).

Decoder: c(n) → Decoder → x̂(n), using the received Δ(n).

Page 85: Speech Coding (Part I)    Waveform Coding

Feed Forward Adaptation

(Same feed-forward structure as the previous slide.)

The source signal is not available at the receiver, so the receiver cannot evaluate Δ(n) by itself: Δ(n) has to be transmitted.

Quantization error: e(n) = x̂(n) − x(n)

Page 86: Speech Coding (Part I)    Waveform Coding

The Step-Size Adaptation System

Estimate the signal's short-time energy σ²(n) and make Δ(n) ∝ σ(n):

Δ(n) = Δ₀ σ(n)

Page 87: Speech Coding (Part I)    Waveform Coding

The Step-Size Adaptation System Low-Pass Filter Approach

2 2( ) ( ) ( )n

m

n x m h n m

( ) , 0,0 1nh n n

2 ( )n

n m

m

x m

1

2 2( ) ( )n

n m

m

x m x n

1

1 2 2( ) ( )n

n m

m

x m x n

2 2( 1) ( )n x n

0( ) ( )n n
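The one-pole recursion above makes the step-size computation a single line per sample. A sketch (the values α = 0.99, Δ₀ = 0.02, and the two-level noise test signal are illustrative):

```python
import numpy as np

def step_sizes(x, alpha=0.99, delta0=0.02):
    """Feed-forward step-size adaptation via a one-pole energy estimate:
    sigma2(n) = alpha*sigma2(n-1) + x(n-1)^2;  Delta(n) = delta0*sigma(n)."""
    sigma2 = np.zeros(len(x))
    for n in range(1, len(x)):
        sigma2[n] = alpha * sigma2[n - 1] + x[n - 1] ** 2
    return delta0 * np.sqrt(sigma2)

# Louder passages get a larger step size, quieter ones a smaller one.
rng = np.random.default_rng(0)
x = np.concatenate([0.1 * rng.normal(size=2000), 1.0 * rng.normal(size=2000)])
d = step_sizes(x)
print(d[1500] < d[3500])  # → True
```

With α = 0.99 the estimate averages over roughly the last 100 samples, which matches the "slowly varying power level" assumption; α = 0.9 tracks faster but is noisier, as the next slide's figure shows.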

Page 88: Speech Coding (Part I)    Waveform Coding

The Step-Size Adaptation System: Low-Pass Filter Approach

Δ(n) = Δ₀ σ(n)

(Figure: step-size tracks for α = 0.99 and α = 0.9.)

Page 89: Speech Coding (Part I)    Waveform Coding

The Step-Size Adaptation System: Moving Average Approach

σ²(n) = Σ_{m=n−M}^{n−1} x²(m) h(n − m),  with h(n) = 1/M for 1 ≤ n ≤ M

i.e., σ²(n) = (1/M) Σ_{m=n−M}^{n−1} x²(m)

Δ(n) = Δ₀ σ(n)

Page 90: Speech Coding (Part I)    Waveform Coding

Feed-Forward Quantizer

σ²(n) = (1/M) Σ_{m=n−M}^{n−1} x²(m),  Δ(n) = Δ₀ σ(n)

– Δ(n) is evaluated every M samples
– Use M = 128 or 1024 for the estimates
– Choose Δ_min and Δ_max suitably

Page 91: Speech Coding (Part I)    Waveform Coding

Feed-Forward Quantizer

σ²(n) = (1/M) Σ_{m=n−M}^{n−1} x²(m),  Δ(n) = Δ₀ σ(n)

– Δ(n) is evaluated every M samples
– Use M = 128 or 1024 for the estimates (M = 1024 is too long)
– Choose Δ_min and Δ_max suitably

Page 92: Speech Coding (Part I)    Waveform Coding

Feedback Adaptation

Encoder: x(n) → Quantizer → x̂(n) → Encoder → c(n); the step-size adaptation system derives Δ(n) from the quantized signal x̂(n).

Decoder: c(n) → Decoder → x̂(n), running the same step-size adaptation system on x̂(n).

Δ(n) can be evaluated at both sides using the same algorithm; hence it need not be transmitted.

Page 93: Speech Coding (Part I)    Waveform Coding

The Step-Size Adaptation System

The same as feed-forward adaptation, except that the adaptation system's input is the quantized signal x̂(n) rather than x(n).

Page 94: Speech Coding (Part I)    Waveform Coding

Alternative Approach to Adaptation

Δ(n) = P(n) · Δ(n−1)

P(n) ∈ {P₁, P₂, …} depends on c(n−1).

Limits must be imposed: Δ_min ≤ Δ(n) ≤ Δ_max. The ratio Δ_max/Δ_min controls the dynamic range of the quantizer.

Page 95: Speech Coding (Part I)    Waveform Coding

Alternative Approach to Adaptation

Δ(n) = P(n) · Δ(n−1)

P(n) ∈ {P₁, P₂, …, P₈} depends on c(n−1): each of the eight code levels selects one of the multipliers P₁, …, P₈.

Limits: Δ_min ≤ Δ(n) ≤ Δ_max; the ratio Δ_max/Δ_min controls the dynamic range of the quantizer.

Page 96: Speech Coding (Part I)    Waveform Coding

Alternative Approach to Adaptation

Page 97: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

Delta Modulation

(DM)

Page 98: Speech Coding (Part I)    Waveform Coding

Delta Modulation

Simplest form of DPCM:
– The prediction of the next sample is simply the current sample.
– The sampling rate is chosen to be many times (e.g., 5×) the Nyquist rate, so adjacent samples are highly correlated, i.e., s(n) ≈ s(n−1).
– A 1-bit (2-level) quantizer is used.
– Bit rate = sampling rate.
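The scheme above reduces to a one-line update per sample: transmit the sign of the prediction error and move the staircase by ±Δ. A sketch (the sampling rate, frequency, and step size are illustrative, chosen so the slope-tracking condition holds):

```python
import numpy as np

def dm_encode(s, delta):
    """Delta modulation: 1-bit quantization of e(n) = s(n) - s^(n-1)."""
    bits = np.zeros(len(s), dtype=int)
    s_hat = 0.0
    for n in range(len(s)):
        bits[n] = 1 if s[n] >= s_hat else 0   # sign of the prediction error
        s_hat += delta if bits[n] else -delta # staircase update
    return bits

def dm_decode(bits, delta):
    """Rebuild the staircase from the bit stream alone."""
    out = np.zeros(len(bits))
    s_hat = 0.0
    for n in range(len(bits)):
        s_hat += delta if bits[n] else -delta
        out[n] = s_hat
    return out

# Heavily oversampled sine: the staircase tracks within a couple of steps.
fs, f = 64000, 400
t = np.arange(1000) / fs
s = 0.5 * np.sin(2 * np.pi * f * t)
r = dm_decode(dm_encode(s, 0.03), 0.03)
print(np.max(np.abs(s - r)) < 0.08)  # → True
```

Here Δ = 0.03 exceeds the maximum per-sample slope of the input (about 0.02), so the staircase never falls behind; shrinking Δ or the oversampling factor produces the slope-overload distortion discussed on the following slides.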

Page 99: Speech Coding (Part I)    Waveform Coding

Review DPCM

(The DPCM codec diagram again: the quantized prediction error ê(n) is transmitted, and both encoder and decoder predict from the reconstructed signal ŝ(n).)

Page 100: Speech Coding (Part I)    Waveform Coding

DM Codec

(The DPCM codec with a 1-bit quantizer and a one-sample-delay predictor z⁻¹: the prediction is simply the previous reconstructed sample.)

Page 101: Speech Coding (Part I)    Waveform Coding

Distortions of DM

(Figure: staircase output with step size Δ and sampling period T; code-word stream shown: 0 1 1 1 1 1 0 0 0 0 1 0 0 1 0.)

Code words: c(n) = 1 if ê(n) = +Δ; c(n) = 0 if ê(n) = −Δ.

Page 102: Speech Coding (Part I)    Waveform Coding

Distortions of DM

(Same figure.) Two kinds of distortion: granular noise, where the staircase hunts around a slowly varying signal, and the slope-overload condition, where the signal changes faster than Δ per sample and the staircase falls behind.

Page 103: Speech Coding (Part I)    Waveform Coding

Choosing of Step Size

Slowly varying regions need a small step size; steep regions need a large step size.

Page 104: Speech Coding (Part I)    Waveform Coding

Adaptive DM (ADM)

Δ(n) = Δ(n−1) · K^{e(n)e(n−1)},  where e(n) = ±1 is the sign of the prediction error.

Page 105: Speech Coding (Part I)    Waveform Coding

Adaptive DM (ADM)

Δ(n) = Δ(n−1) · K^{e(n)e(n−1)},  with K > 1 (e.g., K = 2): two equal error signs in a row grow the step, a sign change shrinks it.
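The update rule above can be sketched in a few lines; the multiplier K = 1.5, the step-size limits, and the sine test signal are illustrative choices, not values from the slides:

```python
import numpy as np

def adm_encode_decode(s, delta0=0.01, K=1.5, d_min=1e-4, d_max=1.0):
    """Adaptive delta modulation sketch: Delta(n) = Delta(n-1)*K^(e(n)e(n-1)).
    Two equal error signs (slope overload) multiply the step by K; a sign
    change (granular region) divides it by K; the step is clamped."""
    s_hat, delta, prev_e = 0.0, delta0, 1
    out = np.zeros(len(s))
    for n in range(len(s)):
        e = 1 if s[n] >= s_hat else -1          # 1-bit error sign
        delta = float(np.clip(delta * K ** (e * prev_e), d_min, d_max))
        s_hat += e * delta
        out[n] = s_hat
        prev_e = e
    return out

fs = 16000
t = np.arange(2000) / fs
s = 0.5 * np.sin(2 * np.pi * 200 * t)
r = adm_encode_decode(s)
print(np.max(np.abs(s[500:] - r[500:])))
```

Because the step size is derived from the transmitted error signs alone, a decoder can run the identical update and stay in sync, so, as with feedback-adaptive PCM, no side information is needed.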