Speech Coding (Part I) Waveform Coding

虞台文

Page 1: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

虞台文

Page 2: Speech Coding (Part I)    Waveform Coding

Content

Overview
Linear PCM (Pulse-Code Modulation)
Nonlinear PCM
Max-Lloyd Algorithm
Differential PCM (DPCM)
Adaptive PCM (ADPCM)
Delta Modulation (DM)

Page 3: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

Overview

Page 4: Speech Coding (Part I)    Waveform Coding

Classification of Coding schemes

Waveform coding

Vocoding

Hybrid coding

Page 5: Speech Coding (Part I)    Waveform Coding

Quality versus Bitrate of Speech Codecs

Page 6: Speech Coding (Part I)    Waveform Coding

Waveform coding

Encodes the waveform itself in an efficient, signal-independent way; offers good-quality speech at a bandwidth of 16 kbps or more.

Time-domain techniques
– Linear PCM (Pulse-Code Modulation)
– Nonlinear PCM: μ-law, A-law
– Differential coding: DM, DPCM, ADPCM

Frequency-domain techniques
– SBC (Sub-band Coding), ATC (Adaptive Transform Coding)

Wavelet techniques

Page 7: Speech Coding (Part I)    Waveform Coding

Vocoding

‘Voice’ + ‘coding’: encoding information about how the speech signal was produced by the human vocal system.

These techniques can produce intelligible communication at very low bit rates, usually below 4.8 kbps. However, the reproduced speech often sounds quite synthetic, and the speaker is often not recognisable.

LPC-10 codec: 2400 bps, American military standard.

Page 8: Speech Coding (Part I)    Waveform Coding

Hybrid coding combines waveform and source coding methods to improve speech quality and reduce the bit rate.

Typical bandwidth requirements lie between 4.8 and 16 kbps.

Technique: analysis-by-synthesis
– RELP (Residual Excited Linear Prediction)
– CELP (Codebook Excited Linear Prediction)
– MPLP (Multipulse Excited Linear Prediction)
– RPE (Regular Pulse Excitation)

Page 9: Speech Coding (Part I)    Waveform Coding

Quality versus Bitrate of Speech Codecs

Page 10: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

Linear PCM(Pulse-Code Modulation)

Page 11: Speech Coding (Part I)    Waveform Coding

Pulse-Code Modulation (PCM)

A method for quantizing an analog signal for the purpose of transmitting or storing the signal in digital form.

Page 12: Speech Coding (Part I)    Waveform Coding

Quantization

The process of mapping a continuous range of amplitudes onto a finite set of discrete levels.

Page 13: Speech Coding (Part I)    Waveform Coding

Linear/Uniform Quantization

Page 14: Speech Coding (Part I)    Waveform Coding

Quantization Error/Noise

Page 15: Speech Coding (Part I)    Waveform Coding

Quantization Error/Noise

granular noise (small errors within the quantizer range); overload noise (clipping beyond ±X_max, at both extremes)

Page 16: Speech Coding (Part I)    Waveform Coding

Quantization Error/Noise

Page 17: Speech Coding (Part I)    Waveform Coding

Quantization Error/Noise

(Figure: unquantized sine wave; 3-bit quantization waveform; 3-bit quantization error; 8-bit quantization error.)

Quantization step size: Δ = 2X_max / 2^b
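As a sketch of the step-size relation above, a midrise uniform quantizer with 2^b levels on [−X_max, X_max] can be written as follows (the function name and the sine test signal are illustrative, not from the slides):

```python
import numpy as np

def uniform_quantize(x, b, x_max):
    """Midrise uniform quantizer: 2**b levels spanning [-x_max, x_max]."""
    delta = 2.0 * x_max / 2**b                    # step size: 2*Xmax / 2^b
    k = np.floor(x / delta)                       # decision-interval index
    k = np.clip(k, -2**(b - 1), 2**(b - 1) - 1)   # overload clipping
    return (k + 0.5) * delta                      # mid-interval reconstruction

t = np.linspace(0, 1, 500, endpoint=False)
x = 0.9 * np.sin(2 * np.pi * 4 * t)
for b in (3, 8):
    e = x - uniform_quantize(x, b, 1.0)
    print(b, "bits: max |error| =", np.max(np.abs(e)))
```

For inputs inside the range, the granular error is bounded by Δ/2 = X_max / 2^b, matching the 3-bit versus 8-bit error plots on the slide.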

Page 18: Speech Coding (Part I)    Waveform Coding

The Model of Quantization Noise

Quantization step size: Δ = 2X_max / 2^b

x(n) → [Quantizer] → x̂(n)

Model: x̂(n) = x(n) + e(n), i.e., the quantizer output is the input plus additive quantization noise e(n) with variance σ²_e.

Page 19: Speech Coding (Part I)    Waveform Coding

Signal-to-Quantization-Noise Ratio (SQNR)

A measurement of the effect of quantization errors introduced by analog-to-digital conversion at the ADC.

SQNR_dB = 10 log10(σ²_signal / σ²_q-noise) = 20 log10(σ_signal / σ_q-noise)

Page 20: Speech Coding (Part I)    Waveform Coding

Signal-to-Quantization-Noise Ratio (SQNR)

SQNR_dB = 10 log10(σ²_signal / σ²_q-noise) = 20 log10(σ_signal / σ_q-noise)

With x̂(n) = x(n) + e(n) and Δ = 2X_max / 2^b, assume e(n) ~ U(−Δ/2, Δ/2), so that

σ²_e = Δ²/12 = X²_max / (3 · 2^{2b})

SQNR_dB = 10 log10(σ²_x / σ²_e)
        = 10 log10(3 · 2^{2b} σ²_x / X²_max)
        = 10 log10 3 + 20b log10 2 − 20 log10(X_max / σ_x)
        = 4.77 + 6.02b − 20 log10(X_max / σ_x)
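The derivation above can be checked empirically. The sketch below measures the SQNR of a midrise uniform quantizer on a Gaussian signal; the choices σ_x = 0.1 and X_max = 1 are illustrative, picked so overload is negligible and the uniform-noise assumption holds:

```python
import numpy as np

rng = np.random.default_rng(0)

def sqnr_db(b, x_max=1.0, sigma_x=0.1):
    """Empirical SQNR of a b-bit midrise uniform quantizer, Gaussian input."""
    x = rng.normal(0.0, sigma_x, 200_000)
    delta = 2 * x_max / 2**b
    k = np.clip(np.floor(x / delta), -2**(b - 1), 2**(b - 1) - 1)
    e = x - (k + 0.5) * delta                  # quantization noise e(n)
    return 10 * np.log10(np.var(x) / np.mean(e**2))

for b in (6, 8, 10):
    theory = 4.77 + 6.02 * b - 20 * np.log10(1.0 / 0.1)
    print(b, "bits:", round(sqnr_db(b), 2), "dB; theory:", round(theory, 2), "dB")
```

The measured values track 4.77 + 6.02b − 20 log10(X_max/σ_x) to within a fraction of a dB, confirming the roughly 6 dB gained per code bit.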

Page 21: Speech Coding (Part I)    Waveform Coding

Signal-to-Quantization-Noise Ratio (SQNR)

SQNR_dB = 4.77 + 6.02b − 20 log10(X_max / σ_x)

(derived under the assumption e(n) ~ U(−Δ/2, Δ/2), with σ²_e = Δ²/12 and Δ = 2X_max/2^b)

Is the assumption always appropriate?

Page 22: Speech Coding (Part I)    Waveform Coding

Signal-to-Quantization-Noise Ratio (SQNR)

SQNR_dB = 4.77 + 6.02b − 20 log10(X_max / σ_x)

Each code bit contributes about 6 dB.

The term X_max/σ_x tells how big a signal can be accurately represented.

Page 23: Speech Coding (Part I)    Waveform Coding

Signal-to-Quantization-Noise Ratio (SQNR)

SQNR_dB = 4.77 + 6.02b − 20 log10(X_max / σ_x)

σ_x depends on the distribution of the signal, which in turn depends on users and time. X_max is determined by the A/D converter.

Page 24: Speech Coding (Part I)    Waveform Coding

Signal-to-Quantization-Noise Ratio (SQNR)

SQNR_dB = 4.77 + 6.02b − 20 log10(X_max / σ_x)

Under what conditions is the formula reasonable?

Page 25: Speech Coding (Part I)    Waveform Coding

Overload Distortion

(Figure: quantizer input–output characteristics over [−X_max, X_max], midtread and midrise.)

Page 26: Speech Coding (Part I)    Waveform Coding

Probability of Distortion

(Figure: midtread and midrise characteristics over [−X_max, X_max], with the input pdf overlaid.)

Assume x ~ N(0, σ²_x).

Page 27: Speech Coding (Part I)    Waveform Coding

Probability of Distortion

Assume x ~ N(0, σ²_x). Then

P("overload") = P(|x| > X_max) = 2Q(X_max / σ_x)

For X_max = 3σ_x:  P("overload") ≈ 0.0026
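The two-sided Gaussian tail above can be evaluated with the complementary error function, using the standard identity 2Q(k) = erfc(k/√2) (the function name is illustrative):

```python
import math

def overload_probability(ratio):
    """P(|x| > X_max) for x ~ N(0, sigma_x^2), where ratio = X_max / sigma_x."""
    # two-sided tail: 2*Q(k) = erfc(k / sqrt(2))
    return math.erfc(ratio / math.sqrt(2.0))

print(overload_probability(3.0))  # ~0.0027, matching the slide's ~0.0026
```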

Page 28: Speech Coding (Part I)    Waveform Coding

Overload and Quantization Noise with Gaussian Input pdf and b = 4

(Figure: noise power σ_e (dB) versus X_max/σ_x (dB), with x ~ N(0, σ²_x), for midtread and midrise quantizers.)

Page 29: Speech Coding (Part I)    Waveform Coding

Uniform Quantizer Performance

(Figures: SQNR (dB) versus X_max/σ_x (dB), for a uniform input pdf and for a Gaussian input pdf.)

Page 30: Speech Coding (Part I)    Waveform Coding

More on Uniform Quantization

SQNR_dB = 4.77 + 6.02b − 20 log10(X_max / σ_x)

Uniform quantization is conceptually and implementationally simple:
– It imposes no restrictions on the signal's statistics.
– It maintains a constant maximum error across its total dynamic range.

However, σ_x varies greatly (on the order of 40 dB) across sounds, speakers, and input conditions. We need a quantizing system whose SQNR is independent of the signal's dynamic range, i.e., a near-constant SQNR across its dynamic range.

Page 31: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

Nonlinear PCM

Page 32: Speech Coding (Part I)    Waveform Coding

Probability Density Functionsof Speech Signals

Counting the number of samples in each interval provides an estimate of the pdf of the signal.

Page 33: Speech Coding (Part I)    Waveform Coding

Probability Density Functionsof Speech Signals

Page 34: Speech Coding (Part I)    Waveform Coding

Probability Density Functionsof Speech Signals

A good approximation is a gamma distribution, of the form

p(x) = ( √3 / (8π σ_x |x|) )^{1/2} · exp( −√3 |x| / (2σ_x) ),   p(0) = ∞

A simpler approximation is a Laplacian density, of the form

p(x) = ( 1 / (√2 σ_x) ) · exp( −√2 |x| / σ_x ),   p(0) = 1 / (√2 σ_x)

Page 35: Speech Coding (Part I)    Waveform Coding

Probability Density Functionsof Speech Signals

Distributions normalized so that μ_x = 0 and σ_x = 1.

Gamma density more closely approximates measured distribution for speech than Laplacian.

Laplacian is still a good model in analytical studies.

Small amplitudes much more likely than large amplitudes—by 100:1 ratio.

Page 36: Speech Coding (Part I)    Waveform Coding

Companding

The dynamic range of signals is compressed before transmission and is expanded to the original value at the receiver.

This allows signals with a large dynamic range to be transmitted over facilities that have a smaller dynamic-range capability.

Companding reduces the noise and crosstalk levels at the receiver.

Page 37: Speech Coding (Part I)    Waveform Coding

Companding

x → Compressor C(x) → y → Uniform Quantizer → ŷ → Expander C⁻¹(·) → x̂

Page 38: Speech Coding (Part I)    Waveform Coding

Companding

x → Compressor C(x) → y → Uniform Quantizer → ŷ → Expander C⁻¹(·) → x̂

Page 39: Speech Coding (Part I)    Waveform Coding

Companding

x → Compressor C(x) → y → Uniform Quantizer → ŷ → Expander C⁻¹(·) → x̂

After compression, y is nearly uniformly distributed.

Page 40: Speech Coding (Part I)    Waveform Coding

The Quantization-Error Variance of a Nonuniform Quantizer

x → Compressor C(x) → Uniform Quantizer → Expander C⁻¹(·) → x̂

σ²_e = (Δ²/12) ∫_{−X_max}^{X_max} p(x) / [C′(x)]² dx,  with Δ = 2X_max / 2^b    (Jayant and Noll)


Page 42: Speech Coding (Part I)    Waveform Coding

The Optimal C(x)

x → Compressor C(x) → Uniform Quantizer → Expander C⁻¹(·) → x̂

σ²_e = (Δ²/12) ∫_{−X_max}^{X_max} p(x) / [C′(x)]² dx    (Jayant and Noll)

If the signal's pdf is known, the minimum σ²_e (maximum SQNR) is achievable by letting

C(x) = X_max · ∫₀^x [p(u)]^{1/3} du / ∫₀^{X_max} [p(u)]^{1/3} du

Page 43: Speech Coding (Part I)    Waveform Coding

The Optimal C(x)

σ²_e = (Δ²/12) ∫_{−X_max}^{X_max} p(x) / [C′(x)]² dx    (Jayant and Noll)

C(x) = X_max · ∫₀^x [p(u)]^{1/3} du / ∫₀^{X_max} [p(u)]^{1/3} du

Is the assumption (a known pdf) realistic?

Page 44: Speech Coding (Part I)    Waveform Coding

PDF-Independent Nonuniform Quantization

Assuming no overload,

SQNR = σ²_x / σ²_e = ∫_{−X_max}^{X_max} x² p(x) dx / ( (Δ²/12) ∫_{−X_max}^{X_max} p(x) / [C′(x)]² dx )

We require that the SQNR be independent of p(x). This holds if

1 / [C′(x)]² = x² / k²   ⇒   C′(x) = k / x   ⇒   C(x) = k ln x + A

Page 45: Speech Coding (Part I)    Waveform Coding

Logarithmic Companding

C(x) = k ln x + A

Page 46: Speech Coding (Part I)    Waveform Coding

μ-Law & A-Law Companding

μ-Law
– A North American PCM standard
– Used in North America and Japan

A-Law
– An ITU PCM standard
– Used in Europe

x → Compressor → Uniform Quantizer → Expander → x̂

Page 47: Speech Coding (Part I)    Waveform Coding

μ-Law & A-Law Companding

μ-law compressor (μ = 255 in the U.S. and Canada):

y = C(x) = sign(x) · ln(1 + μ|x|) / ln(1 + μ)

A-law compressor (A = 87.56 in Europe):

y = C_A(x) = sign(x) · A|x| / (1 + ln A),               0 ≤ |x| ≤ 1/A
y = C_A(x) = sign(x) · (1 + ln(A|x|)) / (1 + ln A),     1/A ≤ |x| ≤ 1

Page 48: Speech Coding (Part I)    Waveform Coding

μ-Law & A-Law Companding

(Figure: the compressor characteristics y = C(x) for μ-law and y = C_A(x) for A-law.)

Page 49: Speech Coding (Part I)    Waveform Coding

μ-Law Companding

y = C(x) = X_max · ln(1 + μ|x|/X_max) / ln(1 + μ) · sign(x)

Properties:
– x(n) = 0 ⇒ y(n) = 0
– x(n) = ±X_max ⇒ y(n) = ±X_max
– y(n) has the same sign as x(n)
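The μ-law compressor above and its inverse (the expander) can be sketched directly from the formula; the function names are illustrative, and `log1p`/`expm1` are used for numerical robustness near zero:

```python
import numpy as np

def mu_law_compress(x, mu=255.0, x_max=1.0):
    """y = Xmax * ln(1 + mu|x|/Xmax) / ln(1 + mu) * sign(x)."""
    return x_max * np.log1p(mu * np.abs(x) / x_max) / np.log1p(mu) * np.sign(x)

def mu_law_expand(y, mu=255.0, x_max=1.0):
    """Inverse of the compressor: |x| = (Xmax/mu) * ((1+mu)^(|y|/Xmax) - 1)."""
    return x_max / mu * np.expm1(np.abs(y) * np.log1p(mu) / x_max) * np.sign(y)

x = np.linspace(-1, 1, 11)
y = mu_law_compress(x)
print(np.max(np.abs(mu_law_expand(y) - x)))  # round-trip error ~ 0
```

Note the endpoint properties from the slide hold by construction: C(0) = 0 and C(±X_max) = ±X_max.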

Page 50: Speech Coding (Part I)    Waveform Coding

μ-Law Companding

y = C(x) = X_max · ln(1 + μ|x|/X_max) / ln(1 + μ) · sign(x)

Using ln(1 + z) ≈ z for z ≪ 1 and ln(1 + z) ≈ ln z for z ≫ 1:

C(x) ≈ (X_max / ln μ) · (μ|x| / X_max) · sign(x),      μ|x|/X_max ≪ 1

C(x) ≈ (X_max / ln μ) · ln(μ|x| / X_max) · sign(x),    μ|x|/X_max ≫ 1

Page 51: Speech Coding (Part I)    Waveform Coding

μ-Law Companding

y = C(x) = X_max · ln(1 + μ|x|/X_max) / ln(1 + μ) · sign(x)

Small signals (μ|x|/X_max ≪ 1): C(x) is approximately linear in x.
Large signals (μ|x|/X_max ≫ 1): C(x) is approximately logarithmic in x.

Page 52: Speech Coding (Part I)    Waveform Coding

Histogram for μ-Law Companding

(Figure: histograms of the input x(n) and of the compressed output y(n).)

Page 53: Speech Coding (Part I)    Waveform Coding

μ-law Approximation to Log

(Figure: distribution of quantization levels for a μ-law 3-bit quantizer.)

Page 54: Speech Coding (Part I)    Waveform Coding

SQNR of μ-law Quantizer

SQNR_dB = 6.02b + 4.77 − 20 log10[ln(1 + μ)] − 10 log10[ 1 + (X_max/(μσ_x))² + √2 · X_max/(μσ_x) ]

– 6.02b dependence on b (good)
– Much less dependence on X_max/σ_x (good)
– For large μ, the SQNR is less sensitive to changes in X_max/σ_x (good)

Page 55: Speech Coding (Part I)    Waveform Coding

Comparison of Linear and μ-law Quantizers

μ-law: SQNR_dB = 6.02b + 4.77 − 20 log10[ln(1 + μ)] − 10 log10[ 1 + (X_max/(μσ_x))² + √2 · X_max/(μσ_x) ]

Linear: SQNR_dB = 6.02b + 4.77 − 20 log10(X_max / σ_x)

Page 56: Speech Coding (Part I)    Waveform Coding

A-Law Companding

y = C_A(x) = sign(x) · A|x| / (1 + ln A),               0 ≤ |x| ≤ 1/A
y = C_A(x) = sign(x) · (1 + ln(A|x|)) / (1 + ln A),     1/A ≤ |x| ≤ 1

Page 57: Speech Coding (Part I)    Waveform Coding

A-Law Companding

y = C_A(x) = sign(x) · A|x| / (1 + ln A),               0 ≤ |x| ≤ 1/A   (linear in x)
y = C_A(x) = sign(x) · (1 + ln(A|x|)) / (1 + ln A),     1/A ≤ |x| ≤ 1   (logarithmic in x)

Page 58: Speech Coding (Part I)    Waveform Coding

A-Law Companding

x

()

yC

x

x

()

Ay

Cx

Page 59: Speech Coding (Part I)    Waveform Coding

SQNR of A-Law Companding

SQNR_dB = 6.02b + 4.77 − 20 log10(1 + ln A)
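A sketch of the two-segment A-law compressor, with A = 87.56 as on the earlier slide (the function name and the log-clamping trick are illustrative). The two branches meet continuously at |x| = 1/A, which the example verifies:

```python
import numpy as np

def a_law_compress(x, A=87.56):
    """A-law compressor on |x| <= 1: linear below 1/A, logarithmic above."""
    ax = np.abs(x)
    denom = 1.0 + np.log(A)
    small = A * ax / denom                                  # 0 <= |x| <= 1/A
    # clamp avoids log(0) in the branch that np.where does not select
    large = (1.0 + np.log(np.maximum(A * ax, 1e-300))) / denom
    return np.sign(x) * np.where(ax < 1.0 / A, small, large)

knee = 1.0 / 87.56
print(a_law_compress(knee), "≈", 87.56 * knee / (1.0 + np.log(87.56)))
```

Both branches give 1/(1 + ln A) at the knee, and C_A(±1) = ±1, so the curve spans the full range just like the μ-law compressor.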

Page 60: Speech Coding (Part I)    Waveform Coding

Demonstration

PCM Demo

Page 61: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

Max-Lloyd Algorithm

Page 62: Speech Coding (Part I)    Waveform Coding

How to design a nonuniform quantizer?

(Figure: staircase Q(x) with decision thresholds x_{k−1}, x_k, x_{k+1}, codewords c_{k−1}, c_k, and reconstruction levels q_{k−1}, q_k.)

Q(x): quantization (reconstruction) level

If x_k ≤ x < x_{k+1}, then Q(x) = q_k.

Page 63: Speech Coding (Part I)    Waveform Coding

How to design a nonuniform quantizer?

If x_k ≤ x < x_{k+1}, then Q(x) = q_k. How should the thresholds x_k and levels q_k be chosen?

Page 64: Speech Coding (Part I)    Waveform Coding

How to design a nonuniform quantizer?

(Figure: codewords c_k, decision thresholds x_k, and reconstruction levels q_k along the amplitude axis.)

Major tasks:
1. Determine the decision thresholds x_k's
2. Determine the reconstruction levels q_k's

Related task:
3. Determine the codewords c_k's

Page 65: Speech Coding (Part I)    Waveform Coding

Optimal Nonuniform Quantization

An optimal quantizer is one that minimizes the quantization-error variance

σ²_e = E[ (X − Q(X))² ]

Major tasks:
1. Determine the decision thresholds x_k's
2. Determine the reconstruction levels q_k's

Page 66: Speech Coding (Part I)    Waveform Coding

Optimal Nonuniform Quantization

σ²_e = E[ (X − Q(X))² ] = ∫ (x − Q(x))² p(x) dx = Σ_{k=1}^{N} ∫_{x_k}^{x_{k+1}} (x − q_k)² p(x) dx

(x*_1, …, x*_{N+1}, q*_1, …, q*_N) = argmin_{x_1,…,x_{N+1}, q_1,…,q_N} Σ_{k=1}^{N} ∫_{x_k}^{x_{k+1}} (x − q_k)² p(x) dx

Page 67: Speech Coding (Part I)    Waveform Coding

Necessary Conditions for an Optimum

σ²_e = Σ_{k=1}^{N} ∫_{x_k}^{x_{k+1}} (x − q_k)² p(x) dx

Setting ∂σ²_e/∂q_k = 0 leads to the “centroid” condition.

Setting ∂σ²_e/∂x_k = 0 leads to the “nearest neighbor” condition.

Page 68: Speech Coding (Part I)    Waveform Coding

Necessary Conditions for an Optimum

∂σ²_e/∂q_k = 0 leads to the “centroid” condition:

q_k = ∫_{x_k}^{x_{k+1}} x p(x) dx / ∫_{x_k}^{x_{k+1}} p(x) dx,   k = 1, …, N

∂σ²_e/∂x_k = 0 leads to the “nearest neighbor” condition:

x_k = (q_{k−1} + q_k) / 2,   k = 2, …, N

Page 69: Speech Coding (Part I)    Waveform Coding

Optimal Nonuniform Quantization

Centroid condition: q_k = ∫_{x_k}^{x_{k+1}} x p(x) dx / ∫_{x_k}^{x_{k+1}} p(x) dx

Nearest-neighbor condition: x_k = (q_{k−1} + q_k) / 2

The two conditions suggest an iterative algorithm to reach the optimum.

Page 70: Speech Coding (Part I)    Waveform Coding

The Max-Lloyd algorithm

1. Initialize a set of decision levels {x_k} and set σ²_e = ∞.
2. Calculate the reconstruction levels {q_k} by
   q_k = ∫_{x_k}^{x_{k+1}} x p(x) dx / ∫_{x_k}^{x_{k+1}} p(x) dx
3. Calculate the MSE:
   σ̂²_e = Σ_k ∫_{x_k}^{x_{k+1}} (x − q_k)² p(x) dx
4. If |σ²_e − σ̂²_e| is below a tolerance, exit.
5. Set σ²_e = σ̂²_e and adjust the decision levels {x_k} by
   x_k = (q_{k−1} + q_k) / 2
6. Go to 2.
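The iteration above can be sketched by discretizing the pdf on a grid and replacing the integrals with weighted sums (the grid size, tolerance, and uniform-pdf example are illustrative choices, not from the slides):

```python
import numpy as np

def lloyd_max(pdf, lo, hi, n_levels, n_iter=200, tol=1e-12, n_grid=40001):
    """Lloyd-Max quantizer design by discretizing the pdf on [lo, hi]."""
    g = np.linspace(lo, hi, n_grid)
    w = pdf(g)
    w = w / w.sum()                        # discrete probability weights
    x = np.linspace(lo, hi, n_levels + 1)  # step 1: initial decision levels
    q = np.zeros(n_levels)
    mse_prev = np.inf
    for _ in range(n_iter):
        mse = 0.0
        for k in range(n_levels):
            m = (g >= x[k]) & (g < x[k + 1]) if k < n_levels - 1 else (g >= x[k])
            q[k] = (g[m] * w[m]).sum() / w[m].sum()     # step 2: centroid
            mse += (((g[m] - q[k]) ** 2) * w[m]).sum()  # step 3: MSE
        if abs(mse_prev - mse) < tol:                   # step 4: converged
            break
        mse_prev = mse
        x[1:-1] = 0.5 * (q[:-1] + q[1:])   # step 5: x_k = (q_{k-1}+q_k)/2
    return x, q, mse

# Sanity check: for a uniform pdf the optimal quantizer is itself uniform.
x, q, mse = lloyd_max(lambda v: np.ones_like(v), -1.0, 1.0, 4)
print(np.round(q, 3))  # levels near [-0.75, -0.25, 0.25, 0.75]
```

For a nonuniform pdf (e.g., the Laplacian model of speech), the same loop yields thresholds packed densely near zero, where the probability mass is.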

Page 71: Speech Coding (Part I)    Waveform Coding

The Max-Lloyd algorithm

(Same algorithm steps as the previous slide.)

This version assumes that the pdf of the signal is available.

Page 72: Speech Coding (Part I)    Waveform Coding

The Max-Lloyd algorithm (Practical Version)

Exercise

Page 73: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

Differential PCM (DPCM)

Page 74: Speech Coding (Part I)    Waveform Coding

Typical Audio Signals

(Figure: a segment of an audio signal, with a zoomed-in view of samples 1250–1750.)

Do you find any correlation and/or redundancy among the samples?

Page 75: Speech Coding (Part I)    Waveform Coding

The Basic Idea of DPCM

Adjacent samples exhibit a high degree of correlation.

By removing this adjacent redundancy before encoding, a more efficiently coded signal results.

How?
– Employ prediction (e.g., linear prediction)
– Encode only the prediction error

Page 76: Speech Coding (Part I)    Waveform Coding

Linear Prediction

Prediction: s̃(n) = Σ_{k=1}^{p} a_k s(n−k)

Prediction error: e(n) = s(n) − s̃(n)

Total squared error over a frame: E = Σ_n e²(n)

Optimal coefficients: a* = argmin_a E,  a = (a_1, …, a_p)
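Minimizing E = Σ e²(n) over the frame is a linear least-squares problem, sketched below by stacking the past-sample rows and solving with `lstsq` (the AR(2) test signal and its coefficients 1.5, −0.7 are illustrative, not from the slides):

```python
import numpy as np

def lp_coeffs(s, p):
    """Least-squares linear-prediction coefficients a_1..a_p for frame s."""
    # Row n of A holds [s(n-1), ..., s(n-p)]; the target is s(n).
    A = np.column_stack([s[p - k:len(s) - k] for k in range(1, p + 1)])
    b = s[p:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a

# A stable AR(2) signal is predicted almost exactly by a 2-tap predictor.
rng = np.random.default_rng(1)
s = np.zeros(5000)
for n in range(2, 5000):
    s[n] = 1.5 * s[n - 1] - 0.7 * s[n - 2] + 0.01 * rng.normal()
print(np.round(lp_coeffs(s, 2), 2))  # close to [1.5, -0.7]
```

The recovered coefficients match the generating recursion, which is why the prediction error e(n) has far less energy than s(n) itself.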

Page 77: Speech Coding (Part I)    Waveform Coding

Linear Predictor

s̃(n) = Σ_{k=1}^{p} a_k s(n−k)

s(n) → [Predictor] → s̃(n)

Page 78: Speech Coding (Part I)    Waveform Coding

DPCM Codec

(Codec diagram.) Encoder: the A/D converter supplies s(n); the prediction error e(n) = s(n) − s̃(n) is quantized to ê(n) and sent over the channel; the predictor runs on the reconstructed signal ŝ(n) = s̃(n) + ê(n).

Decoder: ŝ(n) = s̃(n) + ê(n), with the same predictor driven by ŝ(n).

Page 79: Speech Coding (Part I)    Waveform Coding

DPCM Codec

(Same codec diagram as the previous slide.)

The dynamic range of the prediction error is much smaller than the signal's, so fewer quantization levels are needed.
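The codec loop above can be sketched as follows. For simplicity the sketch uses a uniform quantizer with an unconstrained index and a fixed predictor; the predictor taps, step size, and sine test signal are illustrative:

```python
import numpy as np

def dpcm_encode(s, a, q_step):
    """DPCM encoder sketch: predict from the reconstructed signal,
    quantize the prediction error e(n) uniformly with step q_step."""
    p = len(a)
    s_hat = np.zeros(len(s) + p)      # reconstructed signal, zero history
    codes = np.zeros(len(s))
    for n in range(len(s)):
        pred = sum(a[k] * s_hat[n + p - 1 - k] for k in range(p))  # s~(n)
        e = s[n] - pred
        codes[n] = np.round(e / q_step)           # quantized error index
        s_hat[n + p] = pred + codes[n] * q_step   # s^(n) = s~(n) + e^(n)
    return codes

def dpcm_decode(codes, a, q_step):
    """Mirror of the encoder loop, driven only by the received codes."""
    p = len(a)
    s_hat = np.zeros(len(codes) + p)
    for n in range(len(codes)):
        pred = sum(a[k] * s_hat[n + p - 1 - k] for k in range(p))
        s_hat[n + p] = pred + codes[n] * q_step
    return s_hat[p:]

t = np.arange(200) / 8000.0
s = np.sin(2 * np.pi * 300 * t)
codes = dpcm_encode(s, a=[1.0], q_step=0.05)  # trivial predictor s~(n)=s^(n-1)
r = dpcm_decode(codes, a=[1.0], q_step=0.05)
print(np.max(np.abs(s - r)) <= 0.025 + 1e-9)  # → True
```

Because the predictor runs on ŝ(n) at both ends, quantization errors do not accumulate: the reconstruction error per sample is bounded by half the step size.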

Page 80: Speech Coding (Part I)    Waveform Coding

Performance of DPCM

By using a logarithmic compressor and a 4-bit quantizer for the error sequence e(n), DPCM results in high-quality speech at a rate of 32,000 bps, which is a factor of two lower than logarithmic PCM.

Page 81: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

Adaptive PCM (ADPCM)

Page 82: Speech Coding (Part I)    Waveform Coding

Basic Concept

The power level in a speech signal varies slowly with time.

Let the quantization step Δ(n) dynamically adapt to the slowly time-varying power level σ(n):  Δ(n) ∝ σ(n)

Page 83: Speech Coding (Part I)    Waveform Coding

Adaptive Quantization Schemes

Feed-forward-adaptive quantizers
– estimate σ(n) from x(n) itself
– the step size must be transmitted

Feedback-adaptive quantizers
– adapt the step size on the basis of the quantized signal x̂(n)
– the step size need not be transmitted

Page 84: Speech Coding (Part I)    Waveform Coding

Feed Forward Adaptation

Encoder: x(n) → Quantizer → x̂(n) → Encoder → c(n), with a step-size adaptation system deriving Δ(n) from x(n); Δ(n) is sent along with c(n).

Decoder: c(n) → Decoder → x̂(n), using the received Δ(n).

Page 85: Speech Coding (Part I)    Waveform Coding

Feed Forward Adaptation

(Same feed-forward structure as the previous slide.)

The source signal is not available at the receiver, so the receiver cannot evaluate Δ(n) by itself: Δ(n) has to be transmitted.

Quantization error: e(n) = x̂(n) − x(n)

Page 86: Speech Coding (Part I)    Waveform Coding

The Step-Size Adaptation System

Estimate the signal's short-time energy σ²(n) and make Δ(n) ∝ σ(n):

Δ(n) = Δ₀ σ(n)

Page 87: Speech Coding (Part I)    Waveform Coding

The Step-Size Adaptation System Low-Pass Filter Approach

2 2( ) ( ) ( )n

m

n x m h n m

( ) , 0,0 1nh n n

2 ( )n

n m

m

x m

1

2 2( ) ( )n

n m

m

x m x n

1

1 2 2( ) ( )n

n m

m

x m x n

2 2( 1) ( )n x n

0( ) ( )n n
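The one-pole recursion above makes the step-size computation a single line per sample. A sketch (the values α = 0.99, Δ₀ = 0.02, and the two-level noise test signal are illustrative):

```python
import numpy as np

def step_sizes(x, alpha=0.99, delta0=0.02):
    """Feed-forward step-size adaptation via a one-pole energy estimate:
    sigma2(n) = alpha*sigma2(n-1) + x(n-1)^2;  Delta(n) = delta0*sigma(n)."""
    sigma2 = np.zeros(len(x))
    for n in range(1, len(x)):
        sigma2[n] = alpha * sigma2[n - 1] + x[n - 1] ** 2
    return delta0 * np.sqrt(sigma2)

# Louder passages get a larger step size, quieter ones a smaller one.
rng = np.random.default_rng(0)
x = np.concatenate([0.1 * rng.normal(size=2000), 1.0 * rng.normal(size=2000)])
d = step_sizes(x)
print(d[1500] < d[3500])  # → True
```

With α = 0.99 the estimate averages over roughly the last 100 samples, which matches the "slowly varying power level" assumption; α = 0.9 tracks faster but is noisier, as the next slide's figure shows.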

Page 88: Speech Coding (Part I)    Waveform Coding

The Step-Size Adaptation System: Low-Pass Filter Approach

Δ(n) = Δ₀ σ(n)

(Figure: step-size tracks for α = 0.99 and α = 0.9.)

Page 89: Speech Coding (Part I)    Waveform Coding

The Step-Size Adaptation System: Moving Average Approach

σ²(n) = Σ_{m=n−M}^{n−1} x²(m) h(n − m),  with h(n) = 1/M for 1 ≤ n ≤ M

i.e., σ²(n) = (1/M) Σ_{m=n−M}^{n−1} x²(m)

Δ(n) = Δ₀ σ(n)

Page 90: Speech Coding (Part I)    Waveform Coding

Feed-Forward Quantizer

σ²(n) = (1/M) Σ_{m=n−M}^{n−1} x²(m),  Δ(n) = Δ₀ σ(n)

– Δ(n) is evaluated every M samples
– Use M = 128 or 1024 for the estimates
– Choose Δ_min and Δ_max suitably

Page 91: Speech Coding (Part I)    Waveform Coding

Feed-Forward Quantizer

σ²(n) = (1/M) Σ_{m=n−M}^{n−1} x²(m),  Δ(n) = Δ₀ σ(n)

– Δ(n) is evaluated every M samples
– Use M = 128 or 1024 for the estimates (M = 1024 is too long)
– Choose Δ_min and Δ_max suitably

Page 92: Speech Coding (Part I)    Waveform Coding

Feedback Adaptation

Encoder: x(n) → Quantizer → x̂(n) → Encoder → c(n); the step-size adaptation system derives Δ(n) from the quantized signal x̂(n).

Decoder: c(n) → Decoder → x̂(n), running the same step-size adaptation system on x̂(n).

Δ(n) can be evaluated at both sides using the same algorithm; hence it need not be transmitted.

Page 93: Speech Coding (Part I)    Waveform Coding

The Step-Size Adaptation System

The same as feed-forward adaptation, except that the adaptation system's input is the quantized signal x̂(n) rather than x(n).

Page 94: Speech Coding (Part I)    Waveform Coding

Alternative Approach to Adaptation

Δ(n) = P(n) · Δ(n−1)

P(n) ∈ {P₁, P₂, …} depends on c(n−1).

Limits must be imposed: Δ_min ≤ Δ(n) ≤ Δ_max. The ratio Δ_max/Δ_min controls the dynamic range of the quantizer.

Page 95: Speech Coding (Part I)    Waveform Coding

Alternative Approach to Adaptation

Δ(n) = P(n) · Δ(n−1)

P(n) ∈ {P₁, P₂, …, P₈} depends on c(n−1): each of the eight code levels selects one of the multipliers P₁, …, P₈.

Limits: Δ_min ≤ Δ(n) ≤ Δ_max; the ratio Δ_max/Δ_min controls the dynamic range of the quantizer.

Page 96: Speech Coding (Part I)    Waveform Coding

Alternative Approach to Adaptation

Page 97: Speech Coding (Part I)    Waveform Coding

Speech Coding (Part I) Waveform Coding

Delta Modulation

(DM)

Page 98: Speech Coding (Part I)    Waveform Coding

Delta Modulation

Simplest form of DPCM:
– The prediction of the next sample is simply the current sample.
– The sampling rate is chosen to be many times (e.g., 5×) the Nyquist rate, so adjacent samples are highly correlated, i.e., s(n) ≈ s(n−1).
– A 1-bit (2-level) quantizer is used.
– Bit rate = sampling rate.
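The scheme above reduces to a one-line update per sample: transmit the sign of the prediction error and move the staircase by ±Δ. A sketch (the sampling rate, frequency, and step size are illustrative, chosen so the slope-tracking condition holds):

```python
import numpy as np

def dm_encode(s, delta):
    """Delta modulation: 1-bit quantization of e(n) = s(n) - s^(n-1)."""
    bits = np.zeros(len(s), dtype=int)
    s_hat = 0.0
    for n in range(len(s)):
        bits[n] = 1 if s[n] >= s_hat else 0   # sign of the prediction error
        s_hat += delta if bits[n] else -delta # staircase update
    return bits

def dm_decode(bits, delta):
    """Rebuild the staircase from the bit stream alone."""
    out = np.zeros(len(bits))
    s_hat = 0.0
    for n in range(len(bits)):
        s_hat += delta if bits[n] else -delta
        out[n] = s_hat
    return out

# Heavily oversampled sine: the staircase tracks within a couple of steps.
fs, f = 64000, 400
t = np.arange(1000) / fs
s = 0.5 * np.sin(2 * np.pi * f * t)
r = dm_decode(dm_encode(s, 0.03), 0.03)
print(np.max(np.abs(s - r)) < 0.08)  # → True
```

Here Δ = 0.03 exceeds the maximum per-sample slope of the input (about 0.02), so the staircase never falls behind; shrinking Δ or the oversampling factor produces the slope-overload distortion discussed on the following slides.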

Page 99: Speech Coding (Part I)    Waveform Coding

Review DPCM

(The DPCM codec diagram again: the quantized prediction error ê(n) is transmitted, and both encoder and decoder predict from the reconstructed signal ŝ(n).)

Page 100: Speech Coding (Part I)    Waveform Coding

DM Codec

(The DPCM codec with a 1-bit quantizer and a one-sample-delay predictor z⁻¹: the prediction is simply the previous reconstructed sample.)

Page 101: Speech Coding (Part I)    Waveform Coding

Distortions of DM

(Figure: staircase output with step size Δ and sampling period T; code-word stream shown: 0 1 1 1 1 1 0 0 0 0 1 0 0 1 0.)

Code words: c(n) = 1 if ê(n) = +Δ; c(n) = 0 if ê(n) = −Δ.

Page 102: Speech Coding (Part I)    Waveform Coding

Distortions of DM

(Same figure.) Two kinds of distortion: granular noise, where the staircase hunts around a slowly varying signal, and the slope-overload condition, where the signal changes faster than Δ per sample and the staircase falls behind.

Page 103: Speech Coding (Part I)    Waveform Coding

Choosing of Step Size

Slowly varying regions need a small step size; steep regions need a large step size.

Page 104: Speech Coding (Part I)    Waveform Coding

Adaptive DM (ADM)

Δ(n) = Δ(n−1) · K^{e(n)e(n−1)},  where e(n) = ±1 is the sign of the prediction error.

Page 105: Speech Coding (Part I)    Waveform Coding

Adaptive DM (ADM)

Δ(n) = Δ(n−1) · K^{e(n)e(n−1)},  with K > 1 (e.g., K = 2): two equal error signs in a row grow the step, a sign change shrinks it.
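The update rule above can be sketched in a few lines; the multiplier K = 1.5, the step-size limits, and the sine test signal are illustrative choices, not values from the slides:

```python
import numpy as np

def adm_encode_decode(s, delta0=0.01, K=1.5, d_min=1e-4, d_max=1.0):
    """Adaptive delta modulation sketch: Delta(n) = Delta(n-1)*K^(e(n)e(n-1)).
    Two equal error signs (slope overload) multiply the step by K; a sign
    change (granular region) divides it by K; the step is clamped."""
    s_hat, delta, prev_e = 0.0, delta0, 1
    out = np.zeros(len(s))
    for n in range(len(s)):
        e = 1 if s[n] >= s_hat else -1          # 1-bit error sign
        delta = float(np.clip(delta * K ** (e * prev_e), d_min, d_max))
        s_hat += e * delta
        out[n] = s_hat
        prev_e = e
    return out

fs = 16000
t = np.arange(2000) / fs
s = 0.5 * np.sin(2 * np.pi * 200 * t)
r = adm_encode_decode(s)
print(np.max(np.abs(s[500:] - r[500:])))
```

Because the step size is derived from the transmitted error signs alone, a decoder can run the identical update and stay in sync, so, as with feedback-adaptive PCM, no side information is needed.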