
AN INTERACTIVE SPEECH CODING TOOL USING LABVIEW™

    Karthikeyan N. Ramamurthy, Jayaraman J. Thiagarajan and Andreas Spanias

    SenSIP Center, School of ECEE, Arizona State University, Tempe, AZ USA 85287-5706

    ABSTRACT

    Code Excited Linear Prediction (CELP) is a closed-loop

    analysis-by-synthesis speech coding algorithm that has been

    standardized in Federal Standard-1016. Variants of the

    CELP algorithm form the core of many speech coding

    standards that exist today. In this paper, we discuss the

    development of an interactive speech coding tool in National

Instruments LabVIEW™ software for the Federal Standard-

    1016 CELP algorithm. A brief description of the speech

    coding algorithm and the features of the LabVIEW speech

    coding tool are presented. Illustrations demonstrating the use

of the interactive software tool in analyzing the speech coding algorithm are provided. This tool can be used to

teach the various modules of CELP-based speech coders

to undergraduate and graduate students.

Index Terms— Speech coding, LabVIEW, code

    excited linear prediction, interactive tools.

    1. INTRODUCTION

    Speech coding is concerned with compact digital

    representations of voice signals for the purpose of efficient

    transmission or storage [1-6]. Linear predictive coding is the

    core of many speech coding standards that exist today [7,8].

Linear predictive coding relies on the source-system model of speech production, which is inspired by the human speech

    production mechanism. Voiced speech is produced by

    exciting the vocal tract filter with periodic impulses and

    unvoiced speech is generated using random pseudo-white

    noise excitation. The vocal tract is usually represented by a

    tenth-order digital all-pole filter. This source-system

    analysis-synthesis model is used in most standardized

    algorithms. In fact, the Levinson-Durbin linear prediction

    algorithm is embedded in every cell phone.
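The Levinson-Durbin recursion referred to above can be written compactly. The following Python sketch is purely illustrative (the paper's tool is built on MATLAB and LabVIEW, not Python), and the function name is our own:

```python
# Illustrative Levinson-Durbin recursion: solve the LP normal equations
# from autocorrelation values r[0..order]. FS-1016 uses order 10.

def levinson_durbin(r, order):
    a = [0.0] * (order + 1)   # predictor coefficients; a[0] unused
    e = r[0]                  # prediction error energy
    for i in range(1, order + 1):
        # reflection coefficient for stage i
        acc = r[i]
        for j in range(1, i):
            acc -= a[j] * r[i - j]
        k = acc / e
        # update coefficients a[1..i]
        a_new = a[:]
        a_new[i] = k
        for j in range(1, i):
            a_new[j] = a[j] - k * a[i - j]
        a = a_new
        e *= (1.0 - k * k)    # error energy shrinks at every stable stage
    return a[1:], e
```

For the autocorrelation of a first-order AR source with coefficient 0.9, the recursion recovers a1 = 0.9 and drives the second coefficient to zero.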

    The closed-loop source-system encoders use linear

    prediction (LP) along with an excitation scheme determined

    by closed-loop analysis-by-synthesis (A-by-S) optimization.

    The excitation sequence that minimizes the perceptually-weighted (PW) mean-square-error (MSE) between the input

speech and the reconstructed speech is chosen as the optimal

excitation [9]. In the CELP algorithm [10,11], the excitation sequences

    are stored in two code books and the indices to the

    codebooks are chosen during the PW MSE minimization

    process. The adaptive code book (ACB) predicts the pitch

    delay using the long term predictor (LTP) and the stochastic

    code book (SCB) predicts the random component of the

    excitation. Other components of a generic CELP encoder

    include autocorrelation analysis and linear prediction, and

line spectral pair (LSP) computation. The CELP decoder

implements a subset of the encoder itself. A generic CELP

    encoder is illustrated in Figure 1.
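The A-by-S selection described above can be illustrated with a toy search: each candidate excitation is synthesized, scaled by its optimal gain, and scored against the target. Perceptual weighting by W(z) is omitted for brevity, and the filter, codebook, and function names here are invented for illustration:

```python
# Toy analysis-by-synthesis codebook search, as in a generic CELP encoder.

def synthesize(x, a):
    """All-pole filter 1/A(z): s[n] = x[n] + sum_j a[j] * s[n-j-1]."""
    s = []
    for n in range(len(x)):
        acc = x[n]
        for j, aj in enumerate(a):
            if n - j - 1 >= 0:
                acc += aj * s[n - j - 1]
        s.append(acc)
    return s

def search_codebook(target, codebook, a):
    """Return (index, gain) of the codebook vector minimizing the MSE."""
    best = None
    for i, x in enumerate(codebook):
        s = synthesize(x, a)
        energy = sum(v * v for v in s)
        if energy == 0.0:
            continue
        g = sum(t * v for t, v in zip(target, s)) / energy  # optimal gain
        err = sum((t - g * v) ** 2 for t, v in zip(target, s))
        if best is None or err < best[0]:
            best = (err, i, g)
    return best[1], best[2]
```

When the target is an exactly reproducible synthesized vector, the search recovers the generating codebook entry and its gain.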

LabVIEW™ [12] was chosen as the programming

    environment to implement the CELP algorithm as it has a

    rich set of signal processing and visualization functions, and

    real-time signal acquisition capabilities. Implementation of

speech coding algorithms involves integration of software and hardware components, which can be easily performed

    with LabVIEW. The graphical programming approach

    enables users to easily visualize and understand the basic

    blocks of the speech analysis-synthesis procedure. This

    speech coding tool is scalable in the sense that additional

    options and capabilities could be added.

    In this paper, we extend the work published in [13] and

    discuss the implementation of the Federal Standard 1016

    (FS-1016) version of the CELP speech coder in

LabVIEW™. Our main goal here is to introduce and

    demonstrate the concepts of speech coding to students and

    enhance their learning experience using an interactive visual

interface. We choose the CELP coder for analysis because it can be connected to several concepts covered in DSP

    classes including digital filter theory, estimation of

    periodicity, autocorrelation computation and filter stability.

    Exercises that expose students to the non-stationarity of the

    speech signal, the all-pole spectral modeling performed by

    LP analysis-synthesis and the distortion caused by

    quantization of LSP parameters will be developed. The tool

    can be used along with the books [4,10] that have

    demonstrations of the FS-1016 algorithm and exercises

    based on MATLAB. The speech coding tool is of value

    not only to undergraduate and graduate students but also to

    DSP practitioners. The tool can also be used in high school

science classes after some simplifications, for demonstrating the basic aspects of coding and transmission of speech.

Assessment instruments will be developed, and pre- and

post-quizzes and interviews will be conducted among the

students.

978-1-61284-227-1/11/$26.00 © 2011 IEEE. DSP/SPE 2011


    2. CELP BASED SPEECH CODING STANDARDS

    The speech coding standards based on CELP are surveyed in

    this section. In our survey, we divide the algorithms based

on CELP into three categories based on the chronology of their development, i.e., first-generation CELP (1986-1992),

    second-generation CELP (1993-1998), and third-generation

    CELP (1999-present). A detailed description of the FS-1016

    standard is also provided.

    2.1. Survey of Speech Coders

The first-generation CELP algorithms are generally

high-complexity, non-toll-quality coders that operate at bit rates

between 4.8 kb/s and 16 kb/s. Some of the first-generation

    CELP algorithms include: the FS-1016 CELP, the IS-54

    vector sum excited linear prediction (VSELP), the ITU-T

G.728 low-delay CELP, and the IS-96 Qualcomm CELP. The newer second- and third-generation A-by-S coders

    replaced most of these standardized CELP algorithms.

    The second-generation CELP algorithms are targeted

    for Internet audio streaming, voice-over-Internet-protocol

    (VoIP), teleconferencing applications, and secure

    communications. Some of the second-generation CELP

    algorithms include: the ITU-T G.723.1 dual-rate speech

    codec [14], the GSM EFR [15], the IS-127 Relaxed CELP

    (RCELP) [16], and the ITU-T G.729 CS-ACELP [17]. The

    algebraic CELP (ACELP) uses algebraic codes in place of

    the SCB and hence provides a huge reduction in

    computational complexity for code book search.

The third-generation (3G) CELP algorithms accommodate different bit rates and are multimodal. They

    are designed to operate in different modes: low-mobility,

high-mobility, indoor, etc., consistent with the vision of

    wideband wireless standards. There are at least two

    algorithms that have been developed and standardized for

these applications. The Global System for Mobile

Communications (GSM) community standardized the Adaptive

Multi-Rate (AMR) coder [18] in Europe and the Telecommunications

    Industry Association (TIA) has tested the Selectable Mode

    Vocoder (SMV) [19] in the U.S.

    2.2. The FS-1016 CELP Coder

    FS-1016 is a 4.8 kb/s CELP algorithm that was adopted in

    the late 1980s by the Department of Defense (DoD) for use

    in the third-generation secure telephone unit (STU-III). The

    CELP FS-1016 remains interesting for our study as it

    contains core elements of A-by-S algorithms that are still

    very useful. The synthesis configuration for the FS-1016

    CELP is shown in Figure 2. Speech is sampled at 8 kHz and

segmented into frames of 30 ms in the FS-1016 CELP. Each

frame is segmented into sub-frames of 7.5 ms. The excitation

    in CELP is formed by combining vectors from an adaptive

    and a stochastic codebook (gain-shape VQ). The excitation

vectors are selected in every sub-frame by minimizing the perceptually weighted error measure.
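The framing arithmetic above is easy to verify: at 8 kHz a 30 ms frame holds 240 samples and each 7.5 ms sub-frame holds 60, i.e., four sub-frames per frame. A small Python sketch (the helper name is ours):

```python
# FS-1016 framing arithmetic: 8 kHz sampling, 30 ms frames,
# 7.5 ms sub-frames (four sub-frames per frame).

FS = 8000
FRAME = int(FS * 30 / 1000)      # 240 samples per frame
SUBFRAME = int(FS * 7.5 / 1000)  # 60 samples per sub-frame

def split_frames(signal):
    """Yield (frame, sub-frames) pairs; a trailing partial frame is dropped."""
    for start in range(0, len(signal) - FRAME + 1, FRAME):
        frame = signal[start:start + FRAME]
        subs = [frame[k:k + SUBFRAME] for k in range(0, FRAME, SUBFRAME)]
        yield frame, subs
```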

    The codebooks are searched sequentially starting with

    the ACB. The ACB contains the history of past excitation

signals and the LTP lag search is carried out over 128 integer

    (20 to 147) and 128 non-integer delays. Only a subset of

    lags is searched in even sub-frames to reduce the

    computational complexity. The SCB contains 512 sparse

    and overlapping code vectors [20]. Each code vector

    consists of sixty samples and each sample is ternary valued

Figure 2. FS-1016 CELP synthesis: adaptive and stochastic codebook excitations, scaled by gains ga and gs, are summed and passed through the LP synthesis filter 1/A(z) and a postfilter to produce the output speech.

Figure 1. Block diagram of a generic CELP encoder: codebook excitation vectors x(i) with gain g(i) drive the LTP synthesis filter 1/AL(z) and the LP synthesis filter 1/A(z); the residual error between the input speech and the synthetic speech is perceptually weighted by W(z) and used for MSE minimization.



Figure 3. Block diagram illustrating the steps involved in building the speech coding tool: MATLAB implementation of the algorithm -> create a shared library using the MATLAB Compiler -> create a C++ wrapper function -> create a new shared library for LabVIEW -> build the user interface in LabVIEW.

(1, 0, -1) [21] to allow for fast convolution.

Ten short-term prediction parameters are encoded as

    LSPs on a frame-by-frame basis. LSPs are more amenable

    to quantization and hence they are transmitted instead of LP

    coefficients. Sub-frame LSPs are obtained by performing

    linear interpolation of frame LSPs. A short-term pole-zero

    postfilter is also part of the standard. The details on the bit

    allocations are given in the standard [11]. The computational

    complexity of FS-1016 CELP was estimated at 16 Million

    Instructions per Second (MIPS) for partially searched

    codebooks and the Diagnostic Rhyme Test (DRT) and Mean

Opinion Scores (MOS) were reported to be 91.5 and 3.2, respectively.
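The sub-frame LSP interpolation mentioned above can be sketched as a linear blend between the previous and the current frame's LSP vectors. The per-sub-frame weights below are illustrative, not the exact weights of the standard, and the function names are ours:

```python
# Sub-frame LSPs as a linear interpolation between the previous and
# current frame LSP vectors (illustrative weights, one per sub-frame).

def interpolate_lsps(prev_lsps, curr_lsps, weight):
    """Blend two LSP vectors: weight=0 -> previous frame, 1 -> current."""
    return [(1.0 - weight) * p + weight * c
            for p, c in zip(prev_lsps, curr_lsps)]

def subframe_lsps(prev_lsps, curr_lsps, n_subframes=4):
    """One interpolated LSP vector per 7.5 ms sub-frame."""
    weights = [(k + 1) / n_subframes for k in range(n_subframes)]
    return [interpolate_lsps(prev_lsps, curr_lsps, w) for w in weights]
```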

    3. LABVIEW SPEECH CODING TOOL

    In this section, we present the software tool developed for

    teaching speech coding theory and the CELP algorithm

    using the National Instruments LabVIEW package.

    Implementation of highly complex signal processing

    algorithms involves integration of several software and

    hardware components developed across different platforms.

    Hence, there is a need for a scalable framework that

provides flexibility for extensions and the ability to perform

    detailed analysis under different system conditions. Such a

    framework can be realized using two different approaches:

    a) hybrid programming and b) integration of existing

    software.

    Hybrid programming combines the inherent graphical

    programming functions of LabVIEW with the textual

programming using MathScript. The primary limitations of

    this approach are the speed of execution and the overhead

    involved in converting the external source code from the

native programming language to MathScript. The other

    approach is to integrate existing software and this requires a

    complete understanding of the underlying platform in which

    the native code was developed. The primary challenge in

    this case is to develop suitable software interfaces for

LabVIEW to communicate with the different components. The important limitation of this approach is that extension of

    the algorithm and modification of the native source code

    may be required. However, we base our speech coding tool

    on this approach, since hybrid programming is not fast

    enough to realize real-time applications.

    Figure 4. An example LabVIEW model using the built dll.

    3.1. Basic Framework

The framework has been built using shared libraries that

exploit an existing MATLAB implementation along with

LabVIEW's native functionalities. We make use of shared

    libraries that are built from the native implementation and

    integrated with software/hardware components developed in

    LabVIEW. The basic steps involved in building the speech

coding tool using LabVIEW are illustrated in Figure 3.

i) MATLAB implementation: The speech coding/processing algorithms are implemented using MATLAB [10]. This

    implementation includes functions from specific toolboxes.

    The inputs and outputs of the speech coding tool are

    identified.

    ii) Create shared library from MATLAB: The MATLAB

    compiler is used to build a shared C library of the algorithm.

    This requires the MATLAB Component Runtime (MCR).

    iii) Create C++ wrapper: A C++ wrapper is built to

    interface the MATLAB library and LabVIEW. This step

    includes the identification of the functions that are to be

    exposed to LabVIEW.

iv) Make library for LabVIEW: A new shared library is built

over the wrapper code. In effect, invoking a function of this

new library will implicitly invoke the functions in the

    MATLAB shared library.

v) Build user interface for the tool: The Call Library node is used

    to call the external shared library in LabVIEW. A graphical



    interface is developed to handle the inputs and outputs of

the library function.

    3.2. Challenges

    The primary challenge involved in this process is in the

creation of the wrapper function. In addition to

translating data types between MATLAB and

LabVIEW, it needs to account for memory management issues in

    LabVIEW. When LabVIEW loads a VI, it loads all the

    subVIs into memory. Specifically, it loads all the shared

    libraries (*.dll) used. The dlls are erased from memory only

    when the top-level VI is closed. This is a problem when

MATLAB libraries are used in our speech coding tool. We cannot initialize the MATLAB dll again once we have

    terminated the tool. This is because the dlls are not erased

    from the memory unless the LabVIEW application is

    restarted. This implies that we are only able to run the dll

once. Therefore, the solution we adopt is to split the

    initialize, execute and terminate functions of the MATLAB

libraries. Then, when running the tool in LabVIEW, we

initialize the libraries only once before running the dll

functions, and terminate them just before shutdown. Figure 4

    illustrates an example LabVIEW model that uses the built

    dll.
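The lifecycle split described above (initialize once, execute many times, terminate only at shutdown) can be expressed as a simple contract. The Python class below merely mimics that contract; no MATLAB dll is actually loaded, and the class name is ours:

```python
# Conceptual sketch of the initialize/execute/terminate split used for
# the MATLAB shared libraries. This stands in for the real dll calls.

class SharedLibrarySession:
    def __init__(self):
        self.initialized = False
        self.calls = 0

    def initialize(self):
        """Done exactly once, before any library call."""
        if self.initialized:
            raise RuntimeError("already initialized")
        self.initialized = True

    def execute(self, *args):
        """May be called repeatedly while the session is live."""
        if not self.initialized:
            raise RuntimeError("initialize() must be called first")
        self.calls += 1
        return args   # stand-in for the real coder invocation

    def terminate(self):
        """Done once, just before the tool shuts down."""
        self.initialized = False
```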

    4. USER INTERFACE OF THE TOOL

Figure 5 shows the user interface of the LabVIEW speech coding tool. The interface consists of multiple tabs that

    illustrate several modules of the FS-1016 algorithm. The

software can access either an audio (.wav) file or real-time

    speech input. The user also has options to change certain

    speech parameters to analyze the performance and behavior

    of the algorithm under different conditions. The

    preprocessed input speech is displayed and processed on a

    frame-by-frame basis. Frame-by-frame display is also used

    to view the spectra of the decoded output speech frames, the

    LP spectral envelopes before and after quantizing the LSPs,

    pole-zero plots of the synthesis filter, synthesized speech

waveforms, etc. The software has options to save the output

speech. The user can also analyze the subjective quality of these algorithms by listening to the synthesized speech with

    the aid of the playback feature.

    In the following sections, the various outputs obtained

    with the LabVIEW tool for a single frame of speech will be

    shown. The analyzed frame is a voiced frame with a pitch

    period of 65 samples.
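The pitch period of a voiced frame can be estimated, much as the LTP search does, by locating the autocorrelation peak over the FS-1016 integer lag range of 20 to 147 samples. This Python sketch is simplified (no non-integer lags, no perceptual weighting), and the function name is ours:

```python
# Estimate the pitch period of a frame by picking the lag with the
# largest autocorrelation over the FS-1016 integer lag range 20..147.

def estimate_pitch(frame, lo=20, hi=147):
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, min(hi, len(frame) - 1) + 1):
        # autocorrelation at this lag
        val = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag
```

For a synthetic pulse train with period 65 samples (like the analyzed frame), the estimator returns 65.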

    Figure 5. User interface of the LabVIEW speech coding tool.

Figure 6. (a) Input spectrum and (b) output spectrum for the given frame of speech data.



    4.1. Input and Output Spectra

    The Fourier magnitude spectra of the input speech frame and

    the output speech frame can be observed using the tool. In

    Figure 6, the spectra for the analyzed frame as obtained from

    the LabVIEW tool are shown. The user can analyze the

    spectra of any desired frame.
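The spectra displayed by the tool can be reproduced offline by windowing a frame and taking the magnitude of its DFT. A plain-Python DFT keeps this sketch self-contained; in practice an FFT routine would be used, and the function name is ours:

```python
# One-sided magnitude spectrum of a speech frame: Hamming window
# followed by a direct DFT (illustrative; use an FFT in practice).
import cmath
import math

def magnitude_spectrum(frame):
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)))
                for k, x in enumerate(frame)]   # Hamming window
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * t * f / n)
                    for t in range(n)))
            for f in range(n // 2 + 1)]         # bins 0..n/2
```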

    4.2. Quantized and Unquantized LP Spectra

    The LP spectra obtained before and after quantizing the

    LSPs for the given frame are shown in Figure 7. This feature

of the tool is very useful for analyzing the spectral distortion caused by the quantization of LSP parameters.

    The roots of the input LP polynomial obtained using the

    unquantized LSPs and the output polynomial obtained from

    the quantized LSPs are shown in Figure 9. It can be seen that

the output LP filter is still stable after quantization. This can be used to demonstrate the preservation of stability by

    quantizing LSPs instead of LP coefficients. Quantization of

    LP coefficients is more likely to result in an unstable filter.
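The stability claim above can be checked without explicit root finding: the synthesis filter 1/A(z) is stable iff every reflection coefficient produced by the step-down (reverse Levinson) recursion has magnitude below one. A Python sketch, assuming the predictor convention A(z) = 1 - sum_j a_j z^-j (the function name is ours):

```python
# Stability test for an LP synthesis filter 1/A(z) via the step-down
# recursion: stable iff all reflection coefficients satisfy |k| < 1.

def is_stable(a):
    """a = predictor coefficients a[1..p] with A(z) = 1 - sum a_j z^-j."""
    c = [1.0] + [-aj for aj in a]   # polynomial coefficients of A(z)
    for m in range(len(a), 0, -1):
        k = c[m]                    # reflection coefficient (c[0] stays 1)
        if abs(k) >= 1.0:
            return False
        # step down to the order-(m-1) polynomial
        c = [(c[j] - k * c[m - j]) / (1.0 - k * k) for j in range(m)]
    return True
```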

    4.3. Subjective Quality

    The subjective quality of the speech coder can be analyzed

    using the options to play back the postfiltered, high-pass

    filtered and non-postfiltered speech as shown in Figure 8.

    5. UTILITY IN EDUCATION AND ASSESSMENT

    The main educational objective of the LabVIEW speech

    coding tool developed in this paper is to introduce and

    demonstrate the concepts of speech coding, in particular

    coding based on analysis-by-synthesis methods. The

    interactive visual interface of LabVIEW is intuitive and the

    tabbed interface of the tool allows the students to visualize

    various concepts of speech coding simultaneously, which is

    not possible when text-based programming languages are

    used. The tool can also be extended easily to include other

    outputs that are useful for student learning. This can be used

    to demonstrate speech coding in a DSP class or in a more

    advanced speech coding class. The authors have written

    books on audio coding [4] and FS-1016 [10], which contain

    exercises and demonstrations of the FS-1016 in MATLAB.

    The proposed tool can be used along with these books for

    demonstrating speech coding concepts.

    The following exercises will be developed and

    presented to the students as a part of the proposed

    assessment. Assessment results will be generated after

    introducing the students to the speech coding tool in the

    DSP class at Arizona State University. The assessment

    results will include pre- and post-quizzes on the

    fundamentals of speech coding.

5.1. Analysis of Voiced/Unvoiced/Mixed Frames

    The students are required to identify voiced, unvoiced and

mixed frames from a speech file. They will plot the time-

domain input and output waveforms, Fourier spectra, and the

unquantized and quantized LP spectra. The time and

    frequency domain characteristics of the voiced/unvoiced and

    mixed frames will be analyzed.

    5.2. Subjective Quality Analysis

    The students will be asked to evaluate the performance of

    the FS-1016 coder with the speech files provided. The three

    speech outputs, (a) postfiltered speech, (b) non-postfiltered

    speech and, (c) highpass filtered speech will be listened to

    and the differences in subjective quality will be analyzed.

    The students will also provide a MOS, which is a measure of

    perceived speech quality.

    5.3. Pitch Forcing

    The students will have the option to force the pitch to a

    predefined value, using the Force Pitch option in the tool.

Different pitch period values, e.g., 40, 75 and 110 samples,

Figure 7. (a) Input LP spectrum and (b) output LP spectrum for the given frame of speech data.

    Figure 8. Options for subjective quality analysis.



    will be forced and the students will evaluate the perceptual

    quality of output speech.

    6. CONCLUSIONS

    In this paper, a LabVIEW speech coding tool that

    implements the FS-1016 algorithm was presented. The steps

    involved in creating the software tool from the existing

    MATLAB implementation of FS-1016 were described.

    The tool will be very useful to students and practitioners of

    DSP for teaching and understanding the principles behind

CELP-based speech coding algorithms.

7. ACKNOWLEDGEMENTS

    Portions of this work have been sponsored by the ASU

SenSIP Center National Instruments project and the NSF

    CCLI award 0443137.

    8. REFERENCES

[1] A. Spanias, "Speech Coding: A Tutorial Review," Proceedings of the IEEE, Vol. 82, No. 10, Oct. 1994.

[2] V. Atti, "A Simulation Tool for Introducing Algebraic CELP (ACELP) Coding Concepts in a DSP Course," IEEE 2002 DSP Workshop, Callaway, Georgia, Oct. 2002.

[3] A. Spanias and E.M. Painter, "A Software Tool for Introducing Speech Coding Fundamentals in a DSP Course," IEEE Trans. on Education, Vol. 39, No. 2, pp. 143-152, May 1996.

[4] A. Spanias, T. Painter and V. Atti, Audio Signal Processing and Coding, Wiley, Feb. 2007, ISBN 0-471-79147-4.

[5] A. Spanias, Digital Signal Processing: An Interactive Approach, Jan. 2007, ISBN 978-1-4243-2524-5.

[6] V. Atti, "Interactive On-line Undergraduate Laboratories Using J-DSP," IEEE Trans. on Education, Special Issue on Web-based Instruction, Vol. 48, No. 4, pp. 735-749, Nov. 2005.

[7] A. Spanias, "Speech Coding Standards," Chapter 3, pp. 25-44, invited, Ed. G. Gibson, Academic Press, 2000, ISBN 0-12-282160-2.

[8] FS-1016 CELP C Code Implementation, available at: ftp://svr-ftp.eng.cam.ac.uk/comp.speech/coding/celp_3.2a.tar.Z

[9] M.R. Schroeder and B. Atal, "Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates," Proc. ICASSP-85, p. 937, Apr. 1985.

[10] K. Ramamurthy and A. Spanias, MATLAB Software for the Code Excited Linear Prediction Algorithm: The Federal Standard-1016, Morgan & Claypool, 2010.

[11] J.P. Campbell Jr., T.E. Tremain and V.C. Welch, "The Federal Standard 1016 4800 bps CELP Voice Coder," Digital Signal Processing, Academic Press, Vol. 1, No. 3, pp. 145-155, 1991.

[12] LabVIEW Fundamentals, available at: http://www.ni.com/pdf/manuals/374029a.pdf

[13] A. Spanias, K. Natesan, J. Jayaraman and P. Spanias, "Work in Progress - Teaching Speech Signal Processing and Coding Using LabVIEW™," Proc. IEEE FIE, pp. T1C-22-T1C-23, Oct. 2007.

[14] ITU Recommendation G.723.1, "Dual Rate Speech Coder for Multimedia Communications Transmitting at 5.3 and 6.3 kb/s," Draft 1995.

[15] TIA/EIA/IS-641, "Cellular/PCS Radio Interface - Enhanced Full-Rate Speech Codec," TIA, 1996.

[16] TIA/EIA/IS-127, "Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems," TIA, 1997.

[17] ITU Study Group 15 Draft Recommendation G.729, "Coding of Speech at 8 kb/s Using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP)," 1995.

[18] R. Ekudden, R. Hagen, I. Johansson and J. Svedburg, "The Adaptive Multi-Rate Speech Coder," Proc. IEEE Workshop on Speech Coding, pp. 117-119, Jun. 1999.

[19] Y. Gao et al., "The SMV Algorithm Selected by TIA and 3GPP2 for CDMA Applications," Proc. IEEE ICASSP-01, Vol. 2, pp. 709-712, May 2001.

[20] W.B. Kleijn, "Source-Dependent Channel Coding and its Application to CELP," in Advances in Speech Coding, Eds. B. Atal, V. Cuperman and A. Gersho, pp. 257-266, Kluwer Academic Publishers, 1990.

[21] D. Lin, "New Approaches to Stochastic Coding of Speech Sources at Very Low Bit Rates," Proc. EUSIPCO-86, p. 445, 1986.

    Figure 9. Roots of input (left) and output (right) LP

    polynomials.
