AN INTERACTIVE SPEECH CODING TOOL USING LABVIEW™
Karthikeyan N. Ramamurthy, Jayaraman J. Thiagarajan and Andreas Spanias
SenSIP Center, School of ECEE, Arizona State University, Tempe, AZ USA 85287-5706
ABSTRACT
Code Excited Linear Prediction (CELP) is a closed-loop
analysis-by-synthesis speech coding algorithm that has been
standardized in Federal Standard-1016. Variants of the
CELP algorithm form the core of many speech coding
standards that exist today. In this paper, we discuss the
development of an interactive speech coding tool in National
Instruments LabVIEW™ software for the Federal Standard-1016
CELP algorithm. A brief description of the speech
coding algorithm and the features of the LabVIEW speech
coding tool are presented. Illustrations demonstrating the use
of the interactive software tool in analyzing the speech coding
algorithm are provided. This tool can be used to
teach the various modules of the CELP based speech coders
to undergraduate and graduate students.
Index Terms: Speech coding, LabVIEW, code
excited linear prediction, interactive tools.
1. INTRODUCTION
Speech coding is concerned with compact digital
representations of voice signals for the purpose of efficient
transmission or storage [1-6]. Linear predictive coding is the
core of many speech coding standards that exist today [7,8].
Linear predictive coding relies on the source-system model
of speech production, which is inspired by the human speech
production mechanism. Voiced speech is produced by
exciting the vocal tract filter with periodic impulses and
unvoiced speech is generated using random pseudo-white
noise excitation. The vocal tract is usually represented by a
tenth-order digital all-pole filter. This source-system
analysis-synthesis model is used in most standardized
algorithms. In fact, the Levinson-Durbin linear prediction
algorithm is embedded in every cell phone.
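As a rough illustration of the linear prediction step mentioned above, the following sketch computes a frame autocorrelation and solves the normal equations with the Levinson-Durbin recursion. It is not taken from the FS-1016 reference code; the function names and the sign convention A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p are assumptions made for this example, and windowing and bandwidth expansion are omitted.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the unnormalized autocorrelation of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

    Returns (a, k, err): the predictor polynomial, the reflection
    coefficients and the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = float(r[0])
    for m in range(1, order + 1):
        # Correlation between the current prediction error and lag m.
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k_m = -acc / err
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k_m * prev[m - j]
        a[m] = k_m
        k.append(k_m)
        err *= 1.0 - k_m * k_m
    return a, k, err


# Toy input (assumption): autocorrelation of a first-order process whose
# correlation decays by 0.9 per sample.
r = [0.9 ** lag for lag in range(3)]
a, k, err = levinson_durbin(r, order=2)
```

For a speech frame, one would typically call `autocorrelation(frame, 10)` followed by `levinson_durbin(r, 10)` to obtain the tenth-order all-pole model described in the text.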
The closed-loop source-system encoders use linear
prediction (LP) along with an excitation scheme determined
by closed-loop analysis-by-synthesis (A-by-S) optimization.
The excitation sequence that minimizes the perceptually weighted (PW) mean-square error (MSE) between the input
speech and the reconstructed speech is chosen as the optimal
excitation [9]. In the CELP algorithm [10-11], the excitation sequences
are stored in two codebooks and the indices to the
codebooks are chosen during the PW MSE minimization
process. The adaptive codebook (ACB) represents the periodic (pitch)
component of the excitation using the long-term predictor (LTP) and the stochastic
codebook (SCB) represents the random component of the
excitation. Other components of a generic CELP encoder
include autocorrelation analysis and linear prediction, and
line spectral pair (LSP) computation. The CELP decoder
implements a part of the encoder itself. A generic CELP
encoder is illustrated in Figure 1.
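The analysis-by-synthesis idea can be sketched in a few lines: every candidate excitation is passed through the synthesis filter, an optimal gain is computed, and the candidate with the smallest squared error is kept. The sketch below is a simplification under stated assumptions: it searches a single tiny codebook, omits the perceptual weighting filter and the adaptive codebook, and uses hypothetical names (`synthesize`, `search_codebook`) and a toy first-order filter.

```python
def synthesize(excitation, a):
    """All-pole filter 1/A(z) with zero initial state, A(z) = 1 + sum a[k] z^-k."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the code vector that best matches target."""
    target_energy = sum(t * t for t in target)
    best = None
    for index, code in enumerate(codebook):
        y = synthesize(code, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero vector cannot match anything
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy                       # optimal gain for this shape
        error = target_energy - corr * corr / energy
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Toy data (assumptions): three ternary code vectors and a one-pole filter.
codebook = [
    [1, 0, 0, -1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, -1, 0, 0],
    [0, 0, 1, 0, 1, 0, 0, 0],
]
a = [1.0, -0.9]
target = [2.5 * v for v in synthesize(codebook[1], a)]
index, gain, error = search_codebook(target, codebook, a)
```

Because the target here was built from the second code vector scaled by 2.5, the search should recover that index and gain with a residual error close to zero.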
LabVIEW™ [12] was chosen as the programming
environment to implement the CELP algorithm as it has a
rich set of signal processing and visualization functions, and
real-time signal acquisition capabilities. Implementation of
speech coding algorithms involves integration of software
and hardware components, which can be easily performed
with LabVIEW. The graphical programming approach
enables users to easily visualize and understand the basic
blocks of the speech analysis-synthesis procedure. This
speech coding tool is scalable in the sense that additional
options and capabilities could be added.
In this paper, we extend the work published in [13] and
discuss the implementation of the Federal Standard 1016
(FS-1016) version of the CELP speech coder in
LabVIEW™. Our main goal here is to introduce and
demonstrate the concepts of speech coding to students and
enhance their learning experience using an interactive visual
interface. We choose the CELP coder for analysis, because
it can be connected to several concepts covered in DSP
classes including digital filter theory, estimation of
periodicity, autocorrelation computation and filter stability.
Exercises that expose students to the non-stationarity of the
speech signal, the all-pole spectral modeling performed by
LP analysis-synthesis and the distortion caused by
quantization of LSP parameters will be developed. The tool
can be used along with the books [4,10] that have
demonstrations of the FS-1016 algorithm and exercises
based on MATLAB. The speech coding tool is of value
not only to undergraduate and graduate students but also to
DSP practitioners. The tool can also be used in high school
science classes after some simplifications, for demonstrating
the basic aspects of coding and transmission of speech.
Assessment instruments will be developed and pre-, post-
quizzes and interviews will be conducted among the
students.
978-1-61284-227-1/11/$26.00 ©2011 IEEE, DSP/SPE 2011
2. CELP BASED SPEECH CODING STANDARDS
The speech coding standards based on CELP are surveyed in
this section. We divide the CELP-based algorithms
into three categories according to the chronology of their development, i.e., first-generation CELP (1986-1992),
second-generation CELP (1993-1998), and third-generation
CELP (1999-present). A detailed description of the FS-1016
standard is also provided.
2.1. Survey of Speech Coders
The first-generation CELP algorithms are generally of high
complexity and non-toll quality, and operate at bit rates
between 4.8 kb/s and 16 kb/s. Some of the first-generation
CELP algorithms include: the FS-1016 CELP, the IS-54
vector sum excited linear prediction (VSELP), the ITU-T
G.728 low-delay CELP, and the IS-96 Qualcomm CELP.
The newer second- and third-generation A-by-S coders
replaced most of these standardized CELP algorithms.
The second-generation CELP algorithms are targeted
for Internet audio streaming, voice-over-Internet-protocol
(VoIP), teleconferencing applications, and secure
communications. Some of the second-generation CELP
algorithms include: the ITU-T G.723.1 dual-rate speech
codec [14], the GSM EFR [15], the IS-127 Relaxed CELP
(RCELP) [16], and the ITU-T G.729 CS-ACELP [17]. The
algebraic CELP (ACELP) uses algebraic codes in place of
the SCB and hence provides a large reduction in the
computational complexity of the codebook search.
The third-generation (3G) CELP algorithms
accommodate different bit rates and are multimodal. They
are designed to operate in different modes (low-mobility,
high-mobility, indoor, etc.) and are consistent with the vision of
wideband wireless standards. There are at least two
algorithms that have been developed and standardized for
these applications. The Global System for Mobile
Communications (GSM) standardized the Adaptive Multi-Rate
(AMR) coder [18] in Europe and the Telecommunications
Industry Association (TIA) has tested the Selectable Mode
Vocoder (SMV) [19] in the U.S.
2.2. The FS-1016 CELP Coder
FS-1016 is a 4.8 kb/s CELP algorithm that was adopted in
the late 1980s by the Department of Defense (DoD) for use
in the third-generation secure telephone unit (STU-III). The
CELP FS-1016 remains interesting for our study as it
contains core elements of A-by-S algorithms that are still
very useful. The synthesis configuration for the FS-1016
CELP is shown in Figure 2. Speech is sampled at 8 kHz and
segmented into frames of 30 ms in the FS-1016 CELP. Each
frame is segmented into four sub-frames of 7.5 ms. The excitation
in CELP is formed by combining vectors from an adaptive
and a stochastic codebook (gain-shape VQ). The excitation
vectors are selected in every sub-frame by minimizing the
perceptually weighted error measure.
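The framing figures quoted above imply some simple arithmetic, which the short sketch below makes explicit: 240 samples per frame, 60 samples per sub-frame, four sub-frames per frame, and 144 bits available per frame at 4.8 kb/s. The constant and function names are illustrative and not part of the standard.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 30.0
SUBFRAME_MS = 7.5
BIT_RATE_BPS = 4800

FRAME_LEN = round(SAMPLE_RATE_HZ * FRAME_MS / 1000)        # samples per frame
SUBFRAME_LEN = round(SAMPLE_RATE_HZ * SUBFRAME_MS / 1000)  # samples per sub-frame
SUBFRAMES_PER_FRAME = FRAME_LEN // SUBFRAME_LEN
BITS_PER_FRAME = round(BIT_RATE_BPS * FRAME_MS / 1000)


def split_frames(samples):
    """Yield (frame_index, sub_frames) for every complete frame in samples."""
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        sub_frames = [frame[i:i + SUBFRAME_LEN]
                      for i in range(0, FRAME_LEN, SUBFRAME_LEN)]
        yield start // FRAME_LEN, sub_frames


# One second of silence, used only to exercise the framing logic.
frames = list(split_frames([0.0] * SAMPLE_RATE_HZ))
```

One second of 8 kHz input yields 33 complete frames; the trailing 80 samples would wait for the next block of input.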
The codebooks are searched sequentially starting with
the ACB. The ACB contains the history of past excitation
signals and the LTP lag search is carried out over 128 integer
(20 to 147) and 128 non-integer delays. Only a subset of
lags is searched in even sub-frames to reduce the
computational complexity. The SCB contains 512 sparse
and overlapping code vectors [20]. Each code vector
consists of sixty samples and each sample is ternary valued
(1, 0, -1) [21] to allow for fast convolution.
Figure 1. Block diagram of a generic CELP encoder. (Blocks: excitation vector codebook, gain g(i), LTP synthesis filter 1/AL(z), LP synthesis filter 1/A(z), perceptual weighting filter W(z) applied to the input and synthetic speech, residual error, MSE minimization.)
Figure 2. FS-1016 CELP synthesis. (Blocks: stochastic codebook addressed by the VQ index with gain gs, adaptive codebook addressed by the lag index with gain ga, LP filter A(z), postfilter, output speech.)
Figure 3. Block diagram illustrating the steps involved in building the speech coding tool. (Steps: MATLAB implementation of the algorithm; create a shared library using the MATLAB Compiler; create a C++ wrapper function; create a new shared library for LabVIEW; build the user interface in LabVIEW.)
Ten short-term prediction parameters are encoded as
LSPs on a frame-by-frame basis. LSPs are more amenable
to quantization and hence they are transmitted instead of LP
coefficients. Sub-frame LSPs are obtained by performing
linear interpolation of frame LSPs. A short-term pole-zero
postfilter is also part of the standard. The details on the bit
allocations are given in the standard [11]. The computational
complexity of FS-1016 CELP was estimated at 16 Million
Instructions per Second (MIPS) for partially searched
codebooks and the Diagnostic Rhyme Test (DRT) and Mean
Opinion Scores (MOS) were reported to be 91.5 and 3.2,
respectively.
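The sub-frame interpolation of LSPs mentioned above can be sketched as a weighted average of the previous and current frame LSP vectors. The weights and the LSP values below are assumptions chosen for illustration; the standard [11] should be consulted for the actual interpolation rule. A useful property is that any convex combination of two ordered LSP sets is itself ordered, which is the condition for a stable synthesis filter.

```python
# Assumed (previous-frame, current-frame) weights for the four sub-frames.
SUBFRAME_WEIGHTS = [(0.875, 0.125), (0.625, 0.375), (0.375, 0.625), (0.125, 0.875)]


def interpolate_lsps(prev_lsps, curr_lsps, weights=SUBFRAME_WEIGHTS):
    """Return one interpolated LSP vector per sub-frame."""
    return [[w_prev * p + w_curr * c for p, c in zip(prev_lsps, curr_lsps)]
            for w_prev, w_curr in weights]


def is_ordered(lsps):
    """True if the LSP frequencies are strictly increasing."""
    return all(lo < hi for lo, hi in zip(lsps, lsps[1:]))


# Hypothetical LSP frequencies in Hz for two consecutive frames.
prev_lsps = [250, 420, 700, 1100, 1500, 1900, 2300, 2700, 3100, 3500]
curr_lsps = [300, 500, 650, 1200, 1450, 2000, 2250, 2800, 3050, 3600]
sub_frame_lsps = interpolate_lsps(prev_lsps, curr_lsps)
```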
3. LABVIEW SPEECH CODING TOOL
In this section, we present the software tool developed for
teaching speech coding theory and the CELP algorithm
using the National Instruments LabVIEW package.
Implementation of highly complex signal processing
algorithms involves integration of several software and
hardware components developed across different platforms.
Hence, there is a need for a scalable framework that
provides flexibility to extensions and ability to perform
detailed analysis under different system conditions. Such a
framework can be realized using two different approaches:
a) hybrid programming and b) integration of existing
software.
Hybrid programming combines the inherent graphical
programming functions of LabVIEW with textual
programming using MathScript. The primary limitations of
this approach are the speed of execution and the overhead
involved in converting the external source code from its
native programming language to MathScript. The other
approach is to integrate existing software, which requires a
complete understanding of the underlying platform in which
the native code was developed. The primary challenge in
this case is to develop suitable software interfaces for
LabVIEW to communicate with the different components.
The important limitation of this approach is that extension of
the algorithm may require modification of the native source
code. However, we base our speech coding tool
on this approach, since hybrid programming is not fast
enough to realize real-time applications.
Figure 4. An example LabVIEW model using the built dll.
3.1. Basic Framework
The framework has been built using shared libraries that
exploit the existing MATLAB implementation along with
LabVIEW's native functionalities. We make use of shared
libraries that are built from the native implementation and
integrated with software/hardware components developed in
LabVIEW. The basic steps involved in building the speech
coding tool using LabVIEW are illustrated in Figure 3.
i) MATLAB implementation: The speech coding/processing
algorithms are implemented using MATLAB [10]. This
implementation includes functions from specific toolboxes.
The inputs and outputs of the speech coding tool are
identified.
ii) Create shared library from MATLAB: The MATLAB
compiler is used to build a shared C library of the algorithm.
This requires the MATLAB Component Runtime (MCR).
iii) Create C++ wrapper: A C++ wrapper is built to
interface the MATLAB library and LabVIEW. This step
includes the identification of the functions that are to be
exposed to LabVIEW.
iv) Make library for LabVIEW: A new shared library is built
over the wrapper code. In effect, invoking a function of this
new library will implicitly invoke the functions in the
MATLAB shared library.
v) Build user interface for the tool: The Call Library Function Node is used
to call the external shared library in LabVIEW. A graphical
interface is developed to handle the inputs and outputs of
the library function.
3.2. Challenges
The primary challenge involved in this process is in the
creation of the wrapper function. In addition to
communicating data types between MATLAB and
LabVIEW, it needs to account for the memory issues in
LabVIEW. When LabVIEW loads a VI, it loads all the
subVIs into memory. Specifically, it loads all the shared
libraries (*.dll) used. The dlls are erased from memory only
when the top-level VI is closed. This is a problem when
MATLAB libraries are used in our speech coding tool. We
cannot initialize the MATLAB dll again once we have
terminated the tool. This is because the dlls are not erased
from the memory unless the LabVIEW application is
restarted. This implies that we are only able to run the dll
once. Our solution is therefore to split the
initialize, execute and terminate functions of the MATLAB
libraries. When running the tool in LabVIEW, we
initialize the libraries only once, before running the dll
functions, and terminate them before we shut down. Figure 4
illustrates an example LabVIEW model that uses the built
dll.
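The initialize-once, execute-many, terminate-once pattern described in this section can be sketched independently of LabVIEW and MATLAB. In the example below, `FakeCodecLibrary` is a stand-in written for illustration; it does not correspond to any function exported by the MATLAB Compiler or by the authors' wrapper, and all names are hypothetical. The session object guarantees that initialization and termination each happen exactly once around an arbitrary number of frame-processing calls.

```python
class FakeCodecLibrary:
    """Stand-in for the wrapped shared library; every name is hypothetical."""

    def __init__(self):
        self.initialized = False
        self.init_calls = 0
        self.terminate_calls = 0

    def initialize(self):
        if self.initialized:
            raise RuntimeError("library already initialized")
        self.initialized = True
        self.init_calls += 1

    def encode_frame(self, frame):
        if not self.initialized:
            raise RuntimeError("library not initialized")
        return len(frame)  # placeholder for the real per-frame result

    def terminate(self):
        if self.initialized:
            self.initialized = False
            self.terminate_calls += 1


class CodecSession:
    """Context manager separating initialize, execute and terminate."""

    def __init__(self, library):
        self._library = library

    def __enter__(self):
        self._library.initialize()   # done once, before any frame is processed
        return self

    def process(self, frame):
        return self._library.encode_frame(frame)

    def __exit__(self, exc_type, exc, tb):
        self._library.terminate()    # done once, when the tool shuts down
        return False


library = FakeCodecLibrary()
with CodecSession(library) as session:
    results = [session.process([0.0] * 240) for _ in range(3)]
```

Keeping the three phases in separate entry points mirrors the split the authors describe, so that the execute step can be called for every frame without re-initializing the runtime.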
4. USER INTERFACE OF THE TOOL
Figure 5 shows the user interface of the LabVIEW speech
coding tool. The interface consists of multiple tabs that
illustrate several modules of the FS-1016 algorithm. The
software can access either an audio (.wav) file or real-time
speech input. The user also has options to change certain
speech parameters to analyze the performance and behavior
of the algorithm under different conditions. The
preprocessed input speech is displayed and processed on a
frame-by-frame basis. Frame-by-frame display is also used
to view the spectra of the decoded output speech frames, the
LP spectral envelopes before and after quantizing the LSPs,
pole-zero plots of the synthesis filter, synthesized speech
waveforms etc. The software has options to save the output
speech. The user can also analyze the subjective quality of
these algorithms by listening to the synthesized speech with
the aid of the playback feature.
In the following sections, the various outputs obtained
with the LabVIEW tool for a single frame of speech will be
shown. The analyzed frame is a voiced frame with a pitch
period of 65 samples.
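To connect the quoted pitch period with the "estimation of periodicity" topic, the sketch below estimates the pitch lag of a synthetic voiced-like frame by picking the autocorrelation peak over the integer lag range 20 to 147 given earlier. This open-loop estimate is only an illustration; it is not the closed-loop adaptive codebook search used by the coder, and the test signal is an assumption.

```python
MIN_LAG, MAX_LAG = 20, 147   # integer lag range quoted for the ACB search
SAMPLE_RATE_HZ = 8000


def estimate_pitch_lag(frame, min_lag=MIN_LAG, max_lag=MAX_LAG):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


# Synthetic frame (assumption): decaying pulses repeating every 65 samples.
PERIOD = 65
frame = [0.8 ** (n % PERIOD) for n in range(240)]
lag = estimate_pitch_lag(frame)
pitch_hz = SAMPLE_RATE_HZ / lag
```

A lag of 65 samples at 8 kHz corresponds to a fundamental frequency of about 123 Hz.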
Figure 5. User interface of the LabVIEW speech coding tool.
Figure 6. (a) Input spectrum and (b) output spectrum for the given frame of speech data.
4.1. Input and Output Spectra
The Fourier magnitude spectra of the input speech frame and
the output speech frame can be observed using the tool. In
Figure 6, the spectra for the analyzed frame as obtained from
the LabVIEW tool are shown. The user can analyze the
spectra of any desired frame.
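A frame magnitude spectrum of the kind shown in Figure 6 can be computed with a windowed discrete Fourier transform. The sketch below uses a direct DFT from the standard library for clarity rather than speed; the Hamming window, the dB scaling and the 1 kHz test tone are assumptions made for the example and are not stated in the paper.

```python
import cmath
import math


def magnitude_spectrum_db(frame):
    """Return the magnitude spectrum in dB for bins 0..N/2 of one frame."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    windowed = [w * x for w, x in zip(window, frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spectrum.append(20 * math.log10(abs(acc) + 1e-12))  # avoid log(0)
    return spectrum


SAMPLE_RATE_HZ = 8000
FRAME_LEN = 240
tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE_HZ) for i in range(FRAME_LEN)]
spectrum_db = magnitude_spectrum_db(tone)
peak_bin = max(range(len(spectrum_db)), key=spectrum_db.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE_HZ / FRAME_LEN
```

With 240-sample frames at 8 kHz the bin spacing is about 33.3 Hz, so the 1 kHz tone falls exactly on bin 30.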
4.2. Quantized and Unquantized LP Spectra
The LP spectra obtained before and after quantizing the
LSPs for the given frame are shown in Figure 7. This feature
of the tool is useful for analyzing the spectral
distortion caused by the quantization of LSP parameters.
The roots of the input LP polynomial obtained using the
unquantized LSPs and the output polynomial obtained from
the quantized LSPs are shown in Figure 9. It can be seen that
the output LP filter is still stable after quantization. This can
be used to demonstrate the preservation of stability by
quantizing LSPs instead of LP coefficients. Quantization of
LP coefficients is more likely to result in an unstable filter.
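The sensitivity of direct-form LP coefficients can be illustrated with a stability test based on the step-down (backward Levinson) recursion: the synthesis filter 1/A(z) is stable only if every reflection coefficient has magnitude below one. The example polynomials below are toy values chosen for this sketch, not data from the tool, and the code covers only the coefficient-domain side of the argument; it does not implement LSP conversion.

```python
def reflection_coefficients(a):
    """Step-down recursion for A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

    Returns the reflection coefficients k[1..p], or None as soon as one
    with magnitude >= 1 is found (1/A(z) is then not stable).
    """
    poly = [float(c) for c in a]
    ks = []
    for m in range(len(poly) - 1, 0, -1):
        k = poly[m]
        if abs(k) >= 1.0:
            return None
        denom = 1.0 - k * k
        # Recover the order m-1 predictor from the order m predictor.
        poly = [1.0] + [(poly[j] - k * poly[m - j]) / denom for j in range(1, m)]
        ks.append(k)
    return ks[::-1]


def is_stable(a):
    return reflection_coefficients(a) is not None


exact = [1.0, -1.8, 0.81]      # double pole at z = 0.9
perturbed = [1.0, -1.8, 0.79]  # a[2] changed by 0.02, as coarse quantization might
stable_before = is_stable(exact)
stable_after = is_stable(perturbed)
```

In this toy case a change of 0.02 in one coefficient moves a pole outside the unit circle (to roughly z = 1.04), which is the kind of failure that quantizing ordered LSPs is meant to avoid.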
4.3. Subjective Quality
The subjective quality of the speech coder can be analyzed
using the options to play back the postfiltered, high-pass
filtered and non-postfiltered speech as shown in Figure 8.
5. UTILITY IN EDUCATION AND ASSESSMENT
The main educational objective of the LabVIEW speech
coding tool developed in this paper is to introduce and
demonstrate the concepts of speech coding, in particular
coding based on analysis-by-synthesis methods. The
interactive visual interface of LabVIEW is intuitive and the
tabbed interface of the tool allows the students to visualize
various concepts of speech coding simultaneously, which is
not possible when text-based programming languages are
used. The tool can also be extended easily to include other
outputs that are useful for student learning. This can be used
to demonstrate speech coding in a DSP class or in a more
advanced speech coding class. The authors have written
books on audio coding [4] and FS-1016 [10], which contain
exercises and demonstrations of the FS-1016 in MATLAB.
The proposed tool can be used along with these books for
demonstrating speech coding concepts.
The following exercises will be developed and
presented to the students as a part of the proposed
assessment. Assessment results will be generated after
introducing the students to the speech coding tool in the
DSP class at Arizona State University. The assessment
results will include pre- and post-quizzes on the
fundamentals of speech coding.
5.1. Analysis of Voiced/Unvoiced/Mixed Frames
The students are required to identify voiced, unvoiced and
mixed frames from a speech file. They will plot the time
domain input and output waveforms, Fourier spectra, and
unquantized and quantized LPC plots. The time and
frequency domain characteristics of the voiced/unvoiced and
mixed frames will be analyzed.
5.2. Subjective Quality Analysis
The students will be asked to evaluate the performance of
the FS-1016 coder with the speech files provided. The three
speech outputs, (a) postfiltered speech, (b) non-postfiltered
speech and (c) high-pass filtered speech, will be listened to
and the differences in subjective quality will be analyzed.
The students will also provide a MOS, which is a measure of
perceived speech quality.
5.3. Pitch Forcing
The students will have the option to force the pitch to a
predefined value, using the Force Pitch option in the tool.
Different pitch periods, e.g., 40, 75 and 110 samples,
will be forced and the students will evaluate the perceptual
quality of the output speech.
Figure 7. (a) Input LP spectrum and (b) output LP spectrum for the given frame of speech data.
Figure 8. Options for subjective quality analysis.
6. CONCLUSIONS
In this paper, a LabVIEW speech coding tool that
implements the FS-1016 algorithm was presented. The steps
involved in creating the software tool from the existing
MATLAB implementation of FS-1016 were described.
The tool will be very useful to students and practitioners of
DSP for teaching and understanding the principles behind
CELP based speech coding algorithms.
7. ACKNOWLEDGEMENTS
Portions of this work have been sponsored by the ASU
SenSIP center National Instruments project and the NSF
CCLI award 0443137.
8. REFERENCES
[1] A. Spanias, Speech Coding: A Tutorial Review,
Proceedings of the IEEE, Vol.82, Issue 10, Oct 1994.
[2] V. Atti, A Simulation Tool For Introducing Algebraic
CELP (ACELP) Coding Concepts In A DSP Course, IEEE
2002 DSP Workshop, Callaway, Georgia, Oct. 2002.
[3] A. Spanias, E.M. Painter, A Software Tool for
Introducing Speech Coding Fundamentals in a DSP Course,
IEEE Trans. on Education, Vol. 39, No. 2, pp. 143-152, May
1996.
[4] A. Spanias, T Painter, V. Atti, Audio Signal Processing
and Coding, ISBN: 0-471-79147-4, Wiley, February 2007.
[5] A. Spanias, Digital Signal Processing; An Interactive
Approach, ISBN: 978-1-4243-2524-5, January 2007.
[6] V. Atti, Interactive On-line Undergraduate Laboratories
Using J-DSP, IEEE Trans. on Education Special Issue on
Web-based Instruction, vol. 48, no. 4, pp. 735-749, Nov.
2005.
[7] A. Spanias, Chapter 3: Speech Coding Standards, pp.
25-44, Invited. Academic Press, Ed: G. Gibson, 2000, ISBN
0-12-282160-2.
[8] FS-1016 CELP C Code Implementation, Available at
World Wide Web: ftp://svr-ftp.eng.cam.ac.uk/ comp.speech/
coding/celp_3.2a.tar.Z.
[9] M.R. Schroeder and B. Atal, Code-Excited Linear
Prediction (CELP): High Quality Speech at Very Low Bit
Rates, Proc. ICASSP-85, p. 937, Apr. 1985.
[10] K. Ramamurthy and A. Spanias, MATLAB Software for
the Code Excited Linear Prediction Algorithm: The Federal
Standard-1016, Morgan and Claypool, 2010.
[11] J.P. Campbell Jr., T.E. Tremain and V.C. Welch, The
Federal Standard 1016 4800 bps CELP Voice Coder,
Digital Signal Processing, Academic Press, Vol. 1, No. 3, pp.
145-155, 1991.
[12] LabVIEW Fundamentals, Available at World Wide
Web: http://www.ni.com/pdf/manuals/374029a.pdf
[13] A. Spanias, K. Natesan, J. Jayaraman and P. Spanias,
Work in progress - teaching speech signal processing and
coding using LabVIEW™, Proc. IEEE FIE, pp. T1C-22-T1C-23,
Oct. 2007.
[14] ITU Recommendation G.723.1, Dual Rate Speech
Coder for Multimedia Communications transmitting at 5.3
and 6.3 kb/s, Draft 1995.
[15] TIA/EIA/IS-641, Cellular/PCS Radio Interface -
Enhanced Full-Rate Speech Codec, TIA 1996.
[16] TIA/EIA/IS-127, Enhanced Variable Rate Codec,
Speech Service Option 3 for Wideband Spread Spectrum
Digital Systems, TIA, 1997.
[17] ITU Study Group 15 Draft Recommendation G.729,
Coding of Speech at 8kb/s using Conjugate-Structure
Algebraic-Code-Excited Linear-Prediction (CS-ACELP),
1995.
[18] R. Ekudden, R. Hagen, I. Johansson, and J. Svedburg,
The Adaptive Multi-Rate speech coder, Proc. IEEE
Workshop on Speech Coding, pp. 117-119, Jun. 1999.
[19] Y. Gao et al., The SMV algorithm selected by TIA
and 3GPP2 for CDMA applications, Proc. IEEE ICASSP-
01, vol. 2, pp. 709-712, May 2001.
[20] W.B. Kleijn, Source-Dependent Channel Coding and
its Application to CELP, Advances in Speech Coding, Eds.
B. Atal, V. Cuperman, and A. Gersho, pp. 257-266, Kluwer
Ac. Publ., 1990.
[21] D. Lin, New Approaches to Stochastic Coding of
Speech Sources at Very Low Bit Rates, Proc. EUSIPCO-86,
p. 445, 1986.
Figure 9. Roots of input (left) and output (right) LP
polynomials.