Waveform SpeechCoding Algorithms: An Overview Outline Introduction Concepts Quantization PCM DPCM ADPCM Standards & Applications G711

Waveform Speech Coding Algorithms: An Overview

Outline

Introduction

Concepts

Quantization

PCM

DPCM

ADPCM

Standards & Applications

G711

G726

Performance Comparison & Examples

Summary & Conclusion



Introduction

Motivation

What is Speech Coding ?

It is the procedure of representing a digitized speech signal as efficiently as possible, while maintaining a reasonable level of speech quality.

Why would we want to do that ?

To answer this, let’s have a look at the structure of the coding system.

Our Guy


Introduction

Motivation

Filtering & Sampling (1)


Introduction

Motivation


[1]

Most of the speech content lies between 300 and 3400 Hz.
According to the Nyquist theorem, Fs >= 2 fm (to avoid aliasing).
A value of 8 kHz is selected (8 >= 2 * 3.4).
For good quality, 16 bits are used to represent each sample.
Bit-rate = 8 kHz * 16 bits = 128 kbps (Input Rate)

The input rate can be even higher. Skype, for example, uses a 16 kHz sampling frequency, resulting in an input rate of 192 kbit/s. This is a waste of bandwidth that could instead be used by other services and applications.

Source Coding (Speech Coding in this Context)


Introduction

Motivation

Desirable Properties of a Speech Coder

Low Bit-Rate: A lower bit-rate needs a smaller transmission bandwidth, leaving room for other services and applications.

High Speech Quality: Speech quality competes with “low bit-rate”. The decoded speech quality must be acceptable for the target application.

Low Coding Delay: The process of speech coding introduces extra delay, which might affect applications that have real-time requirements.

[1]


Introduction

Speech Coding Categories

What are the different Categories of speech coding ?

Speech coding is divided into three different categories:
Waveform codecs (PCM, DM, APCM, DPCM, ADPCM)
Vocoders (LPC, Homomorphic, … etc.)
Hybrid codecs (CELP, SELP, RELP, APC, SBC, … etc.)

[2]


Concepts

Quantization

What Is Quantization ?

Quantization is the process of transforming the sample amplitude of a message into a discrete amplitude from a finite set of possible amplitudes.

[3]

Each sampled value is approximated with a quantized pulse. For a step size q, the approximation results in an error no larger than q/2 in the positive direction or –q/2 in the negative direction.


Concepts

Quantization

Understanding Quantization

To understand quantization a bit more let’s have a look at the following Example:


Concepts

Quantization

Classification Of Quantization Process

The Quantization process is classified as follows:

Uniform Quantization: The representation levels are equally (uniformly) spaced.
Midtread type
Midrise type

Non-Uniform Quantization: The representation levels have variable spacing from one another.
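The two uniform types can be illustrated with a minimal Python sketch. The function names `midtread` and `midrise` and the step size `q = 0.25` are assumptions chosen for illustration only. A midtread quantizer has zero as one of its output levels, while a midrise quantizer has a decision boundary at zero; in both cases the error stays within q/2.

```python
import math

def midtread(x: float, q: float) -> float:
    """Uniform mid-tread quantizer: zero is one of the output levels."""
    return q * math.floor(x / q + 0.5)

def midrise(x: float, q: float) -> float:
    """Uniform mid-rise quantizer: zero is a decision boundary, not a level."""
    return q * (math.floor(x / q) + 0.5)

q = 0.25  # illustrative step size
for x in (-0.30, -0.05, 0.0, 0.05, 0.30):
    print(f"{x:+.2f} -> midtread {midtread(x, q):+.3f}, midrise {midrise(x, q):+.3f}")
```

Small inputs around zero map to 0 with the midtread type and to +q/2 or -q/2 with the midrise type.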

But why do we need such classification ?!

[4]


Concepts

Quantization

Human Speech – Excursion & Recap (1)

Speech can be broken into two different categories:
Voiced (zzzzz)
Un-voiced (sssss)

Naturally occurring speech signals are composed of a combination of the above categories, take the word “Goat” for example:

[4]

“Goat” contains two voiced signals followed by a partial closure of the vocal tract and then an un-voiced signal. These occur at 3400-3900, 3900-5400, and 6300-6900, respectively.


Concepts

Quantization - why do we need such classification ?! (1)

Human Speech – Excursion & Recap (2)

It should be noted that:
The peak-to-peak amplitude of voiced signals is approximately ten times that of un-voiced signals.
Un-voiced signals contain more information, and thus higher entropy, than voiced signals.
The telephone system must therefore provide higher resolution for lower-amplitude signals.

Statistics of Speech Signals:

[3] [6] Amplitude of speech signals


Concepts

Quantization - why do we need such classification ?! - (2)

Quantization Noise

The quantization process is lossy (erroneous).

The error is defined as the difference between the input signal M and the output signal V. This error E is called the quantization noise.

Consider the simple example:
M = (3.117, 4.56, 2.31, 7.82, 1)
V = (3, 3, 2, 7, 2)
E = M – V = (0.117, 1.56, 0.31, 0.82, -1)

How do we calculate the noise power? Consider an input m of continuous amplitude in the range (-M_max, M_max). Assuming a uniform quantizer, how do we get the quantization noise power?
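The standard textbook answer, under the assumption that the error is uniformly distributed over (-q/2, q/2), is a noise power of q^2 / 12, where q = 2 * M_max / L for L levels. The sketch below is only an empirical check of that result, not a derivation; the function name `noise_power`, the sample count and the seed are illustrative assumptions.

```python
import random

def noise_power(q: float, m_max: float = 1.0, n: int = 100_000, seed: int = 0) -> float:
    """Mean squared error of a uniform quantizer with step q on random inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        m = rng.uniform(-m_max, m_max)   # input sample in (-M_max, M_max)
        v = q * round(m / q)             # nearest quantization level
        total += (m - v) ** 2            # squared quantization error
    return total / n

levels = 256
q = 2 * 1.0 / levels                     # step size for M_max = 1
measured = noise_power(q)
predicted = q * q / 12
print(f"measured {measured:.3e}, predicted q^2/12 = {predicted:.3e}")
```

The measured value should land close to the prediction; the agreement degrades when the input is not spread evenly within each quantization interval.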



Concepts

Quantization - why do we need such classification ?! - (3)

Comparison – Uniform Vs. Non-Uniform Usage

Speech signals do not require high quantization resolution at high amplitudes (50% vs. 15%). Is it wasteful to use a uniform quantizer? The goal is to increase the SQNR: more levels for low amplitudes, fewer levels for high ones. Maybe use a non-uniform quantizer?

[3]


Concepts

Quantization

More About Non-Uniform Quantizers (Companding)

Non-uniform quantizer = use more levels where you need them.
The human ear follows a logarithmic process in which high-amplitude sounds do not require the same resolution as low-amplitude sounds.
One way to achieve non-uniform quantization is to use what is called “companding”.
Companding = “Compressing + Expanding”

[Block diagram: Compressor Function -> Uniform Quantization -> Expander Function (the inverse of the compressor function)]
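The compress, quantize, expand chain can be sketched in Python as follows. This is an illustration under stated assumptions: it uses the continuous µ-law curve with µ = 255 as the compressor, inputs normalized to [-1, 1], and a 256-level mid-rise quantizer; all function names are hypothetical.

```python
import math

MU = 255.0  # assumed compression parameter

def compress(x: float, mu: float = MU) -> float:
    """mu-law compressor for x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def expand(y: float, mu: float = MU) -> float:
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def uniform_quantize(y: float, levels: int = 256) -> float:
    """Mid-rise uniform quantizer on [-1, 1] with exactly `levels` levels."""
    q = 2.0 / levels
    index = math.floor(y / q)
    index = max(-levels // 2, min(levels // 2 - 1, index))
    return (index + 0.5) * q

def companded_quantize(x: float, levels: int = 256) -> float:
    """Compressor -> uniform quantizer -> expander."""
    return expand(uniform_quantize(compress(x), levels))

for x in (0.01, 0.1, 0.9):
    err_plain = abs(x - uniform_quantize(x))
    err_comp = abs(x - companded_quantize(x))
    print(f"x={x}: uniform error {err_plain:.5f}, companded error {err_comp:.5f}")
```

Because the compressor stretches small amplitudes before the uniform quantizer sees them, the effective step size is finer near zero and coarser near full scale.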


Concepts

Quantization

What is the purpose of a Compander ?

The purpose of a compander is to equalize the histogram of speech signals so that the reconstruction levels tend to be equally used.

[6] [6]

There are two famous companding techniques that follow the encoding law:

A-Law Companding
µ-Law Companding



Concepts

Quantization

A-Law Encoding µ-Law Encoding

[3]


Concepts

Quantization

Companding Approximation

Logarithmic functions are slow to compute, so why not approximate them?
The curve is approximated by 8 linear segments (chords), selected by 3 bits.

P is the sign bit of the output
S’s are the segment code bits
Q’s are the quantization code bits

[3]


Concepts

Quantization

Companding Approximation – Algorithm

Encoding

1. Add a bias of 33 to the absolute value of the input sample.
2. Determine the bit position of the most significant 1 bit among bits 5 to 12 of the biased value.
3. Subtract 5 from that position; the result is the segment code.
4. Finally, set the 4-bit quantization code to the 4 bits that follow the most significant 1 bit found in step 2.

Decoding

1. Multiply the quantization code by 2 and add the bias of 33 to the result.
2. Multiply the result by 2 raised to the power of the segment code.
3. Decrement the result by the bias.
4. Use the P bit to determine the sign of the result.

Example ?!

[3]


Concepts

Quantization

µ-Law Encoding - Example

Example input: -656

1 1 0 0 0 1 0 1

P S2 S1 S0 Q3 Q2 Q1 Q0

The sample is negative, so bit P becomes 1.
Add 33 to the absolute value to bias high input values (due to wrapping).
The result after adding is 689 = 0001010110001 in binary.
The most significant 1 bit among positions 5 to 12 is at position 9.
Subtracting 5 from the position yields 4, the segment code (100).
Finally, the 4 bits after that position (0101) are inserted as the quantization code.


Concepts

Quantization

µ-Law Decoding - Example

Example input: -656, encoded as

1 1 0 0 0 1 0 1

P S2 S1 S0 Q3 Q2 Q1 Q0

The quantization code is 0101 = 5, so 5 * 2 + 33 = 43.
The segment code is 100 = 4, so 43 * 2^4 = 688.
Decrement by the bias: 688 - 33 = 655.
P is 1, so the final result is -655.
The quantization noise is 1 (very small).
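The encoding and decoding steps of this worked example can be written as a short Python sketch. It follows only the steps given in these slides; the function names and the clipping limit `MAX_MAG` are assumptions, and details of the deployed G711 format (such as the bit inversion applied to transmitted code words) are left out.

```python
BIAS = 33
MAX_MAG = 8191 - BIAS  # assumed clip so the biased magnitude fits in bits 0..12

def mulaw_encode(sample: int) -> int:
    """Encode a signed linear sample into an 8-bit P/S/Q code word."""
    sign = 1 if sample < 0 else 0
    mag = min(abs(sample), MAX_MAG) + BIAS      # add the bias of 33
    pos = mag.bit_length() - 1                  # most significant 1 bit (5..12)
    segment = pos - 5                           # 3-bit segment code
    quant = (mag >> (pos - 4)) & 0x0F           # the 4 bits after that position
    return (sign << 7) | (segment << 4) | quant

def mulaw_decode(code: int) -> int:
    """Decode an 8-bit P/S/Q code word back to a linear sample."""
    sign = (code >> 7) & 1
    segment = (code >> 4) & 0x07
    quant = code & 0x0F
    mag = ((quant * 2 + BIAS) << segment) - BIAS
    return -mag if sign else mag

code = mulaw_encode(-656)
print(format(code, "08b"), mulaw_decode(code))  # prints: 11000101 -655
```

Taking the bit position from `int.bit_length()` avoids an explicit search loop over bits 5 to 12.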


Concepts

Quantization

µ-Law Encoding

Approximately linear for smaller values & logarithmic for high input values

The practically used value for µ is 255

Used for speech signals

Used for PCM telephone systems in the US, Canada and Japan

A-Law Encoding

Linear segments for low-level inputs & a logarithmic segment for high-level inputs

The practically used value for A is 87.6

Used for PCM telephone systems in Europe


Concepts

Pulse Code Modulation (PCM)

PCM Description

Sampling results in PAM.
PCM uniformly quantizes the PAM samples.
The results of PCM are PCM words.
Each PCM word is l = log2(L) bits long, where L is the number of quantization levels.
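A minimal Python sketch of how samples become PCM words of l = log2(L) bits is shown below. The function name `pcm_encode`, the normalized range [-1, 1] and the choice of L = 8 levels are assumptions made for illustration; L is assumed to be a power of two.

```python
import math

def pcm_encode(samples, levels: int = 8, m_max: float = 1.0):
    """Uniformly quantize samples and return one binary PCM word per sample."""
    bits = int(math.log2(levels))            # l = log2(L) bits per word
    q = 2 * m_max / levels                   # step size
    words = []
    for m in samples:
        index = int((m + m_max) / q)         # interval number, counted from -m_max
        index = min(max(index, 0), levels - 1)
        words.append(format(index, f"0{bits}b"))
    return words

print(pcm_encode([-1.0, 0.0, 0.99]))
```

With L = 8 each word is 3 bits long; with L = 256 each word would be 8 bits long.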

[3]


Concepts

Differential Pulse Code Modulation (DPCM)

DPCM Description

Signals that are sampled at a high rate have highly correlated samples.
The difference between adjacent samples will not be large.
Instead of quantizing each sample, why not quantize the difference?
This results in a quantizer that needs far fewer bits.

[7] [7]

This is the simple first-order form, in which the prediction is the previous sample.
More than one past sample can be used in the prediction (N-th order).
Problems with this approach?


Concepts


DPCM Example

It is clear here from the table that the error adds up to produce an output signal which is completely different from the original one

[7]


Concepts


DPCM Prediction

Previously, the input to the predictor in the encoder was different from the one in the decoder. The difference between the two predictor inputs led to an accumulating reconstruction error e[n] = x[n] – x’[n]. To solve this problem, the same predictor that is used in the decoder is also used in the encoder, driven by the same reconstructed samples.

Therefore the reconstruction error at the decoder output will be the same as the quantization error at the encoder. The quantization error will not accumulate.
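The difference between the two schemes can be sketched in Python with a first-order predictor (the prediction is simply the previous sample). The function names, the slow ramp input and the coarse step size q = 1.0 are assumptions picked to make the accumulation visible.

```python
def quantize(d: float, q: float) -> float:
    """Uniform quantizer for the difference signal."""
    return q * round(d / q)

def dpcm_encode_open_loop(x, q):
    """Encoder predicts from the ORIGINAL previous sample (errors accumulate)."""
    prev, diffs = 0.0, []
    for s in x:
        diffs.append(quantize(s - prev, q))
        prev = s
    return diffs

def dpcm_encode_closed_loop(x, q):
    """Encoder predicts from the RECONSTRUCTED previous sample, like the decoder."""
    recon, diffs = 0.0, []
    for s in x:
        dq = quantize(s - recon, q)
        diffs.append(dq)
        recon += dq                      # same update the decoder performs
    return diffs

def dpcm_decode(diffs):
    """Decoder: accumulate the quantized differences."""
    recon, out = 0.0, []
    for dq in diffs:
        recon += dq
        out.append(recon)
    return out

x = [0.4 * k for k in range(1, 21)]      # slow ramp
for name, enc in (("open loop", dpcm_encode_open_loop),
                  ("closed loop", dpcm_encode_closed_loop)):
    y = dpcm_decode(enc(x, 1.0))
    print(name, "final error:", abs(x[-1] - y[-1]))
```

In the closed-loop version the error at each sample is just the quantization error of that sample's difference, so it stays within q/2.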

Channel


Concepts

Adaptive Differential Pulse Code Modulation (ADPCM)

ADPCM Description

As can be inferred from the name, ADPCM combines PCM and DPCM and adds adaptation.
The “A” in ADPCM stands for “Adaptive”.
In DPCM, the difference between x[k] and x[k-1] is transmitted instead of x[k].
To further reduce the number of bits per sample, ADPCM adapts the quantization levels to the characteristics of the analog signal.
The original 32 kbps ADPCM used 4 bits per sample.
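The idea of adapting the quantizer can be sketched in Python as a DPCM loop whose step size grows when large codes are used and shrinks when small codes are used. This is a toy illustration only, not the adaptation logic of any standard; the function names, the growth and shrink factors and the step limits are all assumptions.

```python
import math

def _adapt(step: float, code: int, max_code: int) -> float:
    """Widen the step after large codes, narrow it after small ones (assumed rule)."""
    factor = 1.5 if abs(code) > max_code // 2 else 0.9
    return min(max(step * factor, 1e-3), 1.0)

def adpcm_encode(x, bits: int = 4, step0: float = 0.02):
    max_code = 2 ** (bits - 1) - 1           # 4 bits -> codes -7..7 (15 levels)
    step, recon, codes = step0, 0.0, []
    for s in x:
        code = round((s - recon) / step)     # quantize the prediction error
        code = max(-max_code, min(max_code, code))
        codes.append(code)
        recon += code * step                 # track what the decoder will rebuild
        step = _adapt(step, code, max_code)
    return codes

def adpcm_decode(codes, bits: int = 4, step0: float = 0.02):
    max_code = 2 ** (bits - 1) - 1
    step, recon, out = step0, 0.0, []
    for code in codes:
        recon += code * step
        out.append(recon)
        step = _adapt(step, code, max_code)  # same adaptation as the encoder
    return out

x = [math.sin(2 * math.pi * k / 50) for k in range(200)]
y = adpcm_decode(adpcm_encode(x))
print("max error:", max(abs(a - b) for a, b in zip(x, y)))
```

Because the step size is updated from the transmitted codes alone, the decoder can follow the adaptation without any side information.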

[9]


Standards, Examples & Applications

G711

G711 Description

A waveform codec that was released in 1972.
Its formal name is Pulse Code Modulation (PCM), since it uses PCM in its encoding.
G711 achieves a 64 kbps bit rate (8 kHz sampling frequency x 8 bits per sample).
G711 defines two main compression algorithms:

µ-Law (used in North America & Japan)
A-Law (used in Europe and the rest of the world)

µ-law and A-law take 14-bit and 13-bit signed linear PCM samples, respectively, as input and compress them to 8-bit samples.

Applications:
Public Switched Telephone Network (PSTN)
WiFi phones, VoWLAN
Wideband IP Telephony
Audio & Video Conferencing
H.320 & H.323 specifications


Standards, Examples & Applications

G726

G726 Description

G726 converts a 64 kbps A-law or µ-law PCM channel to and from a 40, 32, 24 or 16 kbps channel.
The conversion is applied to raw PCM using the ADPCM encoding technique.
Different rates are achieved by adapting the number of quantization levels:

4 levels (2 bits, 16 kbps)
7 levels (3 bits, 24 kbps)
15 levels (4 bits, 32 kbps)
31 levels (5 bits, 40 kbps)
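The relation between the number of levels, the bits per sample and the channel rate can be checked with a few lines of Python. The sketch assumes the 8 kHz sampling frequency stated for G711; the function name is hypothetical. The results agree with the 40, 32, 24 and 16 kbps channel rates named in the description.

```python
import math

SAMPLE_RATE_HZ = 8000  # 8 kHz sampling, as for G711

def adpcm_rate_kbps(levels: int) -> int:
    """Bits needed to index `levels` levels, times the sampling rate, in kbps."""
    bits = math.ceil(math.log2(levels))
    return bits * SAMPLE_RATE_HZ // 1000

for levels in (4, 7, 15, 31):
    print(levels, "levels ->", adpcm_rate_kbps(levels), "kbps")
```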

Includes G721 and G723

[12]


Performance Comparison

[1]


Summary & Conclusion

Summary

We talked about quantization concepts in all their flavors.
We discussed the category of waveform coding (PCM, DPCM and ADPCM).
We presented the ITU standards (G711 and G726) and mentioned some examples and applications.
Finally, we compared the most prominent speech codecs out there.

Conclusion

Speech coding is an important concept that is required to use the existing bandwidth efficiently.
There are many important metrics to keep in mind when doing speech coding, and a good speech coder must balance them. The most important ones are:

Data Rate
Speech Quality
Delay

Waveform codecs achieve the best speech quality as well as low delays.
Vocoders achieve a low data rate, but at the cost of delay and speech quality.
Hybrid coders achieve acceptable speech quality with acceptable delay and data rate.


References

1. Wai C. Chu: Speech Coding Algorithms: Foundation & Evolution of Standardized Coders
2. Speech Coding: http://www-mobile.ecs.soton.ac.uk/speech_codecs/
3. Sklar: Digital Communications: Fundamentals and Applications
4. A-Law and mu-Law Companding Implementations Using the TMS320C54x
5. Michael Langer: Data Compression – Introduction to lossy compression
6. Signal Quantization and Compression Overview: http://www.ee.ucla.edu/~dsplab/sqc/over.html
7. Wajih Abu-Al-Saud: Ch. VI Sampling & Pulse Code Mod., Lecture 25
8. Yuli You: Audio Coding: Theory and Applications
9. Tarmo Anttalainen: Introduction to Telecommunications Network Engineering
10. Wikipedia, G711: http://en.wikipedia.org/wiki/G.711
11. David Salomon: Data Compression: The Complete Reference
12. ITU CCITT Recommendation G.726 ADPCM



Questions & Discussion

Thank you!!
