IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 5, SEPTEMBER 1999 495
Interframe LSF Quantization for Noisy Channels

Thomas Eriksson, Jan Linden, Member, IEEE, and Jan Skoglund, Member, IEEE
Abstract: In linear predictive speech coding algorithms, transmission of the linear predictive coding (LPC) parameters, often transformed to the line spectrum frequencies (LSF) representation, consumes a large part of the total bit rate of the coder. Typically, the LSF parameters are highly correlated from one frame to the next, and a considerable reduction in bit rate can be achieved by exploiting this interframe correlation. However, interframe coding leads to error propagation if the channel is noisy, which can cancel the achievable gain. In this paper, several algorithms for exploiting the interframe correlation of LSF parameters are compared. In particular, performance for transmission over noisy channels is examined, and methods to improve noisy-channel performance are proposed. By combining an interframe quantizer and a memoryless safety-net quantizer, we demonstrate that the advantages of both quantization strategies can be utilized, and the performance for both noiseless and noisy channels improves. The results indicate that the best interframe method performs as well as a memoryless quantization scheme with 4 bits less per frame. Subjective listening tests have been employed that verify the results from the objective measurements.

Index Terms: Interframe coding, memory-based vector quantization, robust coding, spectrum coding, speech coding, vector quantization.
I. INTRODUCTION
MODERN digital communication applications, such as cellular telephony, have led to an increasing need for high-quality speech coding schemes operating at lower
and lower bit rates. Most contemporary speech coders arebased on linear predictive coding (LPC), where a fairly white
excitation signal is fed into an all-pole filter representing the
spectral information of speech. For many applications, the LPC
spectrum is the major side information, and thus it is important
to encode the LPC parameters using as few bits as possible
while maintaining high speech quality. The aim of this study is
to investigate the problem of efficient transmission of spectral
Manuscript received August 19, 1996; revised March 31, 1999. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Joseph Campbell.
T. Eriksson was with the Department of Information Theory, Chalmers University of Technology, SE-412 96 Goteborg, Sweden. He is now with the Information Theory Group, Department of Signals and Systems, Chalmers University of Technology, SE-412 96 Goteborg, Sweden (e-mail: [email protected]).
J. Linden was with the Department of Information Theory, Chalmers University of Technology, Goteborg, Sweden. He is now with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106 USA, and SignalCom Inc., Goleta, CA 93117 USA (e-mail: [email protected]).
J. Skoglund was with the Department of Information Theory, Chalmers University of Technology, Goteborg, Sweden. He is now with AT&T Labs-Research, Shannon Laboratory, Florham Park, NJ 07932 USA (e-mail: [email protected]).
Publisher Item Identifier S 1063-6676(99)06560-8.
information by exploiting interframe correlation for noiseless
and noisy channels. The subject of LPC quantization has been studied intensively for many years, initially with the focus on which parameter set to use for LPC representation [1], [2]. In competition with reflection coefficients and log area ratios (LAR), the line spectral frequencies or line spectrum pairs (LSF or LSP, introduced in [3]) have been shown to be a suitable representation, and are the prevailing LPC parameter set in speech coding today.
Up to about 1990, almost all coding schemes relied on
scalar quantization to some extent. Complexity reasons limited
the use of vector quantization (VQ), and therefore methods
designed to exploit intraframe correlation (correlation between parameters within one frame) using scalar quantization were proposed; see, e.g., [4]–[8]. The first work that incorporated
VQ was described in [9], but far from acceptable performance
was obtained with a VQ of 10 bits/frame. Instead, several
hybrids of scalar quantization and VQ were investigated, e.g.,
[10], [11]. Direct application of a single VQ is still not suitable
in practice (though it has been done in, e.g., [12]) but different
schemes that reduce the VQ complexity at the expense of
degraded performance have been demonstrated to outperform
earlier scalar systems. In [13] it is proposed that transparent
quantization can be achieved at 24 bits/frame if the LSF vector
is split into two vectors, each quantized with a separate VQ
(this procedure is usually referred to as split VQ). Another
efficient way of reducing VQ complexity is multistage VQ [14]. In [15] it is stated that the same performance as for the
24 bits/frame split VQ can be achieved at 22 bits/frame with
multistage VQ.
In memoryless quantization, each LSF parameter vector is
quantized independently of previous LSF vectors. This is not,
however, the most efficient way to encode the LSF vectors. Pa-
rameters extracted from speech, such as the LPC coefficients,
typically show a significant interframe correlation (correlation
between successive frames). Consequently, large gains can be
obtained by exploiting the interframe correlation. A number of
memory-based quantization schemes, i.e., schemes that utilize
correlation between successive frames, have been proposed
during the last ten years. In Section II, an overview of some
successful methods to exploit interframe correlation in LSF
quantization is presented. Among the most popular memory-
based VQ schemes is predictive VQ (PVQ) [16]–[19], a
straightforward extension of a scalar predictive quantizer,
and finite-state VQ (FSVQ) [20], [21], where a next-state
function determines which of a set of quantizers to use for the
next vector. Other quantizers with memory include methods
based on the discrete cosine transform, two-dimensional (2-D)
prediction, noiseless coding of VQ indices etc.
1063-6676/99$10.00 © 1999 IEEE
Fig. 1. Predictive vector quantizer, encoder (left) and decoder (right). An error vector, which is quantized with a VQ, is formed by subtracting a prediction based on previously quantized vectors from the current input vector.
Several studies of LSF quantization can be found in the
literature. However, the results can in general not be directly
compared since there may be large deviations in the exper-
imental setups. We have observed that different databases
can lead to different objective performance for the same
quantization scheme. Furthermore, there are several possi-
ble methods of performing the LPC analysis. For example,
both the autocorrelation method and the stabilized covariance
method are common for LPC analysis, and procedures such as high frequency compensation and bandwidth expansion
also affect the result. The frame length varies between 5
and 40 ms in different papers, and the analysis window
overlap is also not consistent from one work to another (two
factors that are of significant importance for the performance
of memory-based methods [22]). Consequently, the greatest
caution should be exercised when comparing results from
different studies. Several memoryless quantization schemes are
compared using a common database in [23]. In this work we
have incorporated some of the most popular memory-based
VQ schemes and compared their performance for the same
database and analysis method. Throughout this paper, the order
of the linear prediction filter is 10 and the frame length is
20 ms with a 25-ms analysis window. More details about the
experimental setup are found in Section VI-A.
An interesting subject in LSF quantization is the perfor-
mance of memory-based VQ methods when the transmission
channel is noisy. For such channels, bit errors are unavoidable.
This may cause the state of the encoder and decoder to differ.
In a memory-based scheme, this leads to a sequence of errors,
error propagation, which possibly cancels the advantage over
a memoryless VQ. We have studied a new technique called
safety-net VQ, which is shown to significantly decrease error
propagation. The safety-net can be used as an extension to a
memory-based VQ, thereby improving the performance both for transmission over noisy and noiseless channels. In this paper, we study spectrum coding performance for noisy channels without using explicit error protection on the transmitted bits; such protection could improve noisy-channel performance, but at the expense of fewer bits available for source coding.
The main topics of this report are 1) to study the perfor-
mance gains of exploiting interframe correlation for coding
of LPC parameters, and 2) to investigate the performance of
memory-based VQ for noisy channels.
The paper is organized as follows. Several of the most
commonly used memory-based VQ schemes are described in
Section II. In Section III, we calculate some estimates of the
achievable gains with interframe coding. The new safety-net
technique is thoroughly described in Section IV. Section V
investigates how performance can be improved for memory-
based VQs when channel noise is present. Simulation results
of the various systems under noisy and noiseless conditions
are given in Section VI in terms of objective measures as well
as in terms of subjective listening tests. Finally, conclusions
are given in Section VII.
II. MEMORY-BASED QUANTIZATION METHODS
A memory-based quantizer is a quantizer that incorporates
knowledge of previously quantized vectors when coding the
current input vector. The memory in the quantizer makes it
possible to exploit memory in the input process, i.e., interframe
dependencies. Both scalar and vector quantizers with memory
are common in the literature. Here we describe some of the
most successful memory-based quantization methods for LSF
parameters.
A. Predictive VQ
A straightforward method of taking advantage of the mem-
ory of the source is to utilize (linear) predictive vector quan-
tization (PVQ). PVQ is an extension of standard scalar pre-
dictive quantization (DPCM) obtained by replacing the scalar
predictor and scalar quantizer by their vector counterparts.
PVQ was introduced in [24] and [25], and further developed
in for example [19] and [26].
A vector linear predictor forms an estimate of the incoming
vectors1 as a linear combination of earlier quantized vectors,
and the prediction residual vector is quantized by a vector
quantizer. A PVQ encoder and decoder are depicted in Fig. 1.
The vector predictor can be written

$$\hat{\mathbf{x}}_n = \sum_{i=1}^{p} \mathbf{A}_i \tilde{\mathbf{x}}_{n-i} \qquad (1)$$

where $\hat{\mathbf{x}}_n$ is the one-step-ahead prediction vector, $\tilde{\mathbf{x}}_{n-i}$ are earlier quantized input vectors, and $\mathbf{A}_i$ are the prediction matrices. The optimum values (in a minimum mean square error sense) of the prediction matrices can be found by
1 In the following discussion, we will assume that the incoming vectors have zero mean, and that the vector process is ergodic and wide sense stationary. The formulas can easily be generalized to vectors with a nonzero mean.
Fig. 2. Encoder (left) and decoder (right) of a finite-state VQ. Which of the K memoryless codebooks that is used at a certain coding instant is determined by a next-state function. The input to the next-state function is the last chosen codevector and the previous state.
solving a system of linear matrix equations

$$\sum_{i=1}^{p} \mathbf{A}_i \mathbf{R}_{j-i} = \mathbf{R}_j \quad \text{for } j = 1, \dots, p \qquad (2)$$

where $\mathbf{R}_k$ are the correlation matrices

$$\mathbf{R}_k = E\{\mathbf{x}_n \mathbf{x}_{n-k}^T\}. \qquad (3)$$

For simplicity, the unquantized input process, $\mathbf{x}_n$, is often used to estimate the correlation matrices, instead of $\tilde{\mathbf{x}}_n$, which would be more correct. The solution for a first-order predictor ($p = 1$) is particularly simple. For this case, the optimum prediction matrix can be found by simple matrix inversion and multiplication:

$$\mathbf{A}_1 = \mathbf{R}_1 \mathbf{R}_0^{-1}. \qquad (4)$$
For higher order predictors, a generalized version of the Levinson–Durbin algorithm [27] can be applied. In this work, only first-order prediction has been simulated. As is pointed out in Section III, most of the achievable prediction gain can be realized with a first-order vector predictor.
The correlation matrices are usually estimated from a train-
ing database, for example the same database that is later used
to train the vector quantizer in the PVQ. The simplest method,
and the one used in this paper, is the autocorrelation method,
where the correlation matrices are estimated as

$$\hat{\mathbf{R}}_k = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n \mathbf{x}_{n-k}^T. \qquad (5)$$

Values of $\mathbf{x}_n$ outside the observable window are assigned the value zero. In [28], the autocorrelation method and the
covariance method for estimating correlation matrices in a
PVQ system are treated in more detail. After determining the prediction matrices, the VQ is trained, either by an open-loop or closed-loop procedure. In the open-loop approach, the
predictor is designed first, without taking the VQ into account.
Then the VQ is separately trained on the resulting prediction
errors.
In the closed-loop approach, the predictor and the VQ are
first designed from the database, as in the open-loop approach.
Then, the PVQ system with the current VQ is used to generate
a new set of vectors for additional training of the VQ. This
process is iterated until a stopping criterion is reached. It is also
possible to update the predictor coefficients in a closed-loop
design process. The closed-loop PVQ design was proposed
in [19].
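The encoder/decoder loop of Fig. 1 can be sketched as follows. This is an illustrative toy, not a trained system: the residual codebook is random and the prediction matrix is an assumed diagonal. The point it demonstrates is that both ends predict from the same quantized history, so the decoder reconstructs exactly what the encoder produced.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
A1 = 0.7 * np.eye(d)                      # assumed first-order prediction matrix
codebook = rng.standard_normal((64, d))   # toy residual VQ codebook

def nearest(cb, v):
    return int(np.argmin(np.sum((cb - v) ** 2, axis=1)))

def pvq_encode(x_seq):
    """PVQ encoder: quantize the prediction residual, update quantized history."""
    xq_prev = np.zeros(d)
    indices, recons = [], []
    for x in x_seq:
        pred = A1 @ xq_prev                # prediction from the quantized past
        i = nearest(codebook, x - pred)    # quantize the residual with the VQ
        indices.append(i)
        xq_prev = pred + codebook[i]       # same reconstruction the decoder forms
        recons.append(xq_prev.copy())
    return indices, np.array(recons)

def pvq_decode(indices):
    """PVQ decoder: add the decoded residual to its own prediction."""
    xq_prev = np.zeros(d)
    out = []
    for i in indices:
        xq_prev = A1 @ xq_prev + codebook[i]
        out.append(xq_prev.copy())
    return np.array(out)

x_seq = np.cumsum(0.3 * rng.standard_normal((50, d)), axis=0)  # correlated input
idx, xq_enc = pvq_encode(x_seq)
xq_dec = pvq_decode(idx)
```

On a noiseless channel `xq_dec` equals `xq_enc` frame by frame; a single index error would desynchronize the two histories, which is precisely the error-propagation problem studied later in the paper.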
Another version of predictive VQ is the MA-PVQ, where
the decoder includes a moving average (MA) filter instead of
an autoregressive (AR) filter as in the standard PVQ solution.
In most cases, the MA predictor system requires a predictor of
higher order to reach the same performance as an AR predictor
system. The main advantage of the MA configuration is the finite impulse response of the decoder filter, which leads to limited bit error propagation. In this report, we study other
methods to limit the bit error propagation (see Sections IV and
V), and we will not discuss the MA predictor further. In [29], a
comparison between MA and AR prediction is presented and it
is found that using the methods described in Sections IV and V
the two prediction paradigms obtain comparable performance.
Other reports that study MA prediction include [30] and [31],
and the ITU-T 8 kb/s speech coding standard includes a fourth-
order MA predictor for LSF quantization [32].
Applications of PVQ to spectrum quantization can be found
in [16][18]. In [33] and [34], 2-D predictive quantization
is proposed, with the predictor utilizing both intraframe and interframe correlation simultaneously. Some studies of nonlinear prediction can also be found, e.g., [35], [36]. A general
treatment of the concept of predictive VQ can be found in [37].
B. Finite-State VQ
Finite-state VQ (FSVQ), first reported on in [38], can
be viewed as a collection of memoryless vector quantizers,
together with a selection rule that determines which is the
current state, cf. Fig. 2. Each state is associated with one of
the memoryless VQs. The codebooks of the memoryless VQs
are called state codebooks and the union of them is usually referred to as the super codebook. A next-state function is
employed to determine the new encoder state.
An input vector is encoded by searching the codebook,
corresponding to the current state, for the closest codevector.
The new encoder state is determined by the previous state
and the selected codeword in the state codebook, by use of
the next-state function. Only the codeword index has to be
transmitted since the current state is known by the decoder
which uses the same next-state function as the encoder. Note
that predictive VQ can be viewed as a special case of FSVQ
where the number of states is infinite.
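The finite-state mechanism of Fig. 2 can be sketched with toy codebooks and a toy next-state function (neither is trained; sizes and the state-update rule are assumptions for the example). The key property it shows: since the next state depends only on the previous state and the transmitted index, the decoder tracks the encoder exactly without any state information being sent.

```python
import numpy as np

rng = np.random.default_rng(2)
K, cb_size, d = 4, 8, 2
# One toy state codebook per state (in a real FSVQ these are trained).
state_cbs = rng.standard_normal((K, cb_size, d))

def next_state(state, index):
    # Toy next-state function; any deterministic map of (state, index) works.
    return (state + index) % K

def encode(x_seq):
    state, indices, recons = 0, [], []
    for x in x_seq:
        cb = state_cbs[state]
        i = int(np.argmin(np.sum((cb - x) ** 2, axis=1)))
        indices.append(i)
        recons.append(cb[i])
        state = next_state(state, i)       # state update from the sent index
    return indices, np.array(recons)

def decode(indices):
    state, out = 0, []
    for i in indices:
        out.append(state_cbs[state][i])
        state = next_state(state, i)       # same rule as the encoder
    return np.array(out)

x_seq = rng.standard_normal((30, d))
idx, xq_enc = encode(x_seq)
xq_dec = decode(idx)
```

Only log2(`cb_size`) bits per vector are transmitted, even though the super codebook holds `K * cb_size` codevectors; this is where the rate saving of FSVQ comes from.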
Several different FSVQ methods have been proposed in the
literature. The main difference is the next-state function, and
the way the codebooks are represented [20], [21]. We will in
this section describe two methods. The first is the very simple
nearest neighbor FSVQ (NN-FSVQ). In Section IV-C it will
be shown how performance can be significantly improved for
NN-FSVQ by extending the design. The next method is called
omniscient labeled-transitions FSVQ (OT-FSVQ) [20], which
has been found to yield the best codes of the proposed FSVQ
schemes with reasonable complexity in most applications [21].
In the NN-FSVQ approach, it is assumed that successive
input vectors are highly correlated and, consequently, succes-
sive coded vectors are close to each other. The basic idea
of NN-FSVQ is to design a very large memoryless (super-)
codebook, but only use a subset of the codevectors at every
coding instant. The smaller set of codevectors, chosen as the
nearest neighbor codevectors to the last chosen codevector in
the super codebook, constitutes the state codebook.
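The nearest-neighbor state codebook construction can be sketched directly; the super codebook here is random and the sizes are illustrative, not those used in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
super_cb = rng.standard_normal((256, 2))   # large memoryless super codebook
M = 16                                     # state codebook size (a subset)

def state_codebook(last_codevector):
    """State codebook = the M nearest neighbors, within the super codebook,
    of the last chosen codevector."""
    d2 = np.sum((super_cb - last_codevector) ** 2, axis=1)
    return np.argsort(d2)[:M]              # indices into the super codebook

def encode_step(x, last_codevector):
    cand = state_codebook(last_codevector)
    d2 = np.sum((super_cb[cand] - x) ** 2, axis=1)
    j = int(np.argmin(d2))                 # only log2(M) bits are transmitted
    return int(cand[j])

i = encode_step(np.array([0.1, -0.2]), super_cb[0])
```

If the input jumps outside the current neighborhood, the best super-codebook codevector is unreachable; that is the derailment problem discussed next.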
The gain compared to memoryless quantization of using
such a scheme is in general quite small. This is explained
by the fact that if the best codevector in the super codebook is not contained in the current state codebook, the best state
is lost and may never be recovered. This problem, usually
referred to as the derailment problem, is very similar to the
slope overload phenomenon well known from scalar delta
modulator quantizers. The NN-FSVQ method is, in its most
straightforward implementation, not practicable in the pres-
ence of channel noise, because the derailment problem then
becomes unmanageable.
The omniscient FSVQ technique has shown good perfor-
mance in many applications, especially for image coding,
e.g., [39] but also to some extent for speech coding [21],
There are two possible representations for the omniscient FSVQ: labeled-states and labeled-transitions. We will here only discuss the labeled-transition case, as it has been shown to perform better in applications [21], [40]. The first
step in the omniscient FSVQ design is to find a state classifier,
for example a memoryless VQ with the same number of
codevectors as the desired number of states. The training
data is then divided into subsets using the classifier. The
training subset for state $k$ consists of all training vectors whose immediate predecessors have been classified to state $k$. The state codebook is then designed by applying a standard VQ training algorithm using the subset of the training data corresponding to state $k$.
The decoder cannot track the omniscient next-state rule
defined above, since it depends on the input rather than on the encoded input. However, if the actual input is replaced
with the encoded input, we get an approximation of the
next-state rule used in the design. Hence, the next state is
determined from the encoder output as depicted in Fig. 2,
which makes it possible for the encoder and the decoder to
be synchronized. The state codebooks can then be fine-tuned
by encoding the whole training sequence using the new FSVQ
encoder and replacing each codevector with the centroid of
the training vectors assigned to it. A closed-loop optimization
similar to that described for PVQ can be applied to improve
performance.
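The two design steps above can be sketched end-to-end; everything here is a toy (synthetic correlated data, a single Lloyd pass per codebook, small illustrative sizes), meant only to show the structure of the omniscient labeled-transition design, not the authors' training procedure.

```python
import numpy as np

rng = np.random.default_rng(4)
K, cb_size, d = 4, 8, 2

# Toy training data: a correlated 2-D vector process.
N = 5000
x = np.zeros((N, d))
for n in range(1, N):
    x[n] = 0.9 * x[n - 1] + 0.3 * rng.standard_normal(d)

def nearest(cb, v):
    return int(np.argmin(np.sum((cb - v) ** 2, axis=1)))

# Step 1: state classifier = a memoryless VQ with K codevectors
# (initialized from training vectors, refined by a few Lloyd iterations).
classifier = x[rng.choice(N, K, replace=False)].copy()
for _ in range(5):
    labels = np.array([nearest(classifier, v) for v in x])
    for k in range(K):
        if np.any(labels == k):
            classifier[k] = x[labels == k].mean(axis=0)

# Step 2: partition the training set by the state of the *predecessor* vector,
# then train one state codebook per subset (one Lloyd pass for brevity).
state_cbs = np.zeros((K, cb_size, d))
pred_state = np.array([nearest(classifier, v) for v in x[:-1]])
for k in range(K):
    subset = x[1:][pred_state == k]
    cb = subset[rng.choice(len(subset), cb_size, replace=False)].copy()
    lab = np.array([nearest(cb, v) for v in subset])
    for j in range(cb_size):
        if np.any(lab == j):
            cb[j] = subset[lab == j].mean(axis=0)   # centroid update
    state_cbs[k] = cb
```

The need to populate every (state, subset) pair with enough training vectors is visible even in this toy: with many states, some subsets become small, which is the data-hunger problem noted in the text.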
The omniscient FSVQ technique requires very large
databases for training purposes, especially if the number of
states is large. Even for the relatively small number of states
we have experimented with, the training is very complex
and requires a large training database. Another problem is
robustness, both against changes in the input signal and against
channel errors. In Section VI, results for OT-FSVQ with eight
states are reported. It is worth noting that if the number of
states is increased, the performance is expected to increase as
well. However, the performance improvement is in general
small, and is achieved at the expense of increased complexity
and storage requirements [40].
C. Other Memory-Based Quantization Schemes
Although finite-state VQ and predictive VQ are the most
commonly treated memory-based VQ methods in the literature,
there are also other methods to exploit interframe correlation.
Most of these other methods imply an increased coding delay,
high complexity, variable bit rate, etc. Variable bit rate and
high coding delay are acceptable in certain applications, suchas speech storage. However, in other applications, such as
speech coding for mobile telephony, it is of great importance
to keep the coding delay as low as possible. Variable bit rate
requires complex protocols in most channel access schemes,
and is hence not possible to use in many applications. In order
to keep the cost and power consumption of the hardware
(on which the coder is implemented) as low as possible, it
is important that the computational complexity is reasonably
low. Also, most speech coders operate in real-time, limiting
the computational delay to one frame. Brief explanations of
several methods to exploit interframe redundancy are given
below, but no measurements of performance are included in
this article.

In matrix quantization, two or more vectors are compiled
into a matrix and are quantized simultaneously. This approach
is straightforward and clear, but it has two major disadvan-
tages: 1) the coding delay increases since two or more vectors
are buffered before quantization and 2) the complexity is often
very high. Hence, the usage of matrix quantization is in general limited to very low rate applications. Complexity reductions for matrix quantization have been proposed in, e.g., [41] and [42].
Phamdo and Farvardin [43] propose a scheme called tree-searched VQ with interblock noiseless coding (TSVQ-IBNC)
for coding of LSF parameters. This scheme relies on a tech-
nique developed by Neuhoff and Moayeri [44]. In TSVQ of
a correlated source, it is likely that the codewords of two consecutive vectors share a common part. Therefore, it is
possible to transmit only the altered bits of each codeword,
together with the length of the common part. This procedure
obviously results in a variable rate scheme.
Another scheme that works with the codewords instead of
directly on the vectors is relative index coding (RIC), proposed
by Bruhn in [45]. The codewords are sorted according to their distance from the previously selected codevector: index zero denotes the same codevector as the previous one, index one the closest codevector, and so on. The sorted index can then be
Huffman coded, resulting in a variable-rate scheme.
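The index-reordering step of RIC can be sketched as follows (the Huffman stage is omitted, and the codebook is a random toy): a correlated source mostly produces small ranks, which a subsequent entropy coder can exploit.

```python
import numpy as np

rng = np.random.default_rng(5)
codebook = rng.standard_normal((64, 2))    # toy codebook

def relative_index(chosen, prev_index):
    """Rank of the chosen codevector when the codebook is sorted by
    distance from the previously selected codevector."""
    d2 = np.sum((codebook - codebook[prev_index]) ** 2, axis=1)
    order = np.argsort(d2)                 # rank 0 = the previous codevector itself
    return int(np.where(order == chosen)[0][0])

rank = relative_index(3, 10)               # rank actually sent to the entropy coder
```

Since the map from index to rank is a permutation known to both ends (it depends only on the previous codevector), the decoder inverts it losslessly.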
Fig. 3. Left: Histogram of LSF parameters 1 to 10. Right: Scatter plot showing the distribution of LSF 1 and 2.
In [6], Farvardin and Laroia propose the use of the discrete
cosine transform to decorrelate consecutive LSF vectors. This
scheme requires an increased coding delay to obtain acceptable
performance.

Interpolation of LSF parameters also relies on interframe
correlation. In [46], four out of eight frames are selected for
transmission, and the spectra of the remaining frames are
derived by interpolation. The coding delay is eight frames,
which is far from acceptable in low-delay applications.
Codebook adaptation is another popular procedure. Xydeas
and So [47] first search a fixed VQ for the best index, then
try to encode the index by use of a long history quantization
codebook, which is updated to contain the most common
indices. In [48], the first codebook in a two-stage VQ is
adapted by a deletion and partition operation.
III. ESTIMATES OF INTERFRAME CODING PERFORMANCE
In this section, we try to estimate the theoretically achiev-
able gains if interframe dependencies of the LSF vector
process are exploited. Rate distortion theory [49], [50] can
be of great help when the performance of a coding scheme is to be estimated. The rate distortion function (RDF), $R(D)$, gives a lower bound for the required rate $R$ (number of bits per parameter) in coding a stochastic process at a desired distortion $D$ (commonly a quadratic distortion measure).
By computing the RDF for a memoryless coding scheme
and for a scheme where interframe correlation is exploited,
both at the same distortion, we can obtain an estimate of the achievable gains with interframe coding. The RDF is fully
determined by the probability density function (pdf) of the
actual process. However, the pdf of the LSF vector process is
not trivial to estimate, and even if a good estimate of the pdf
exists, the corresponding rate distortion function is difficult to
compute. Fortunately, there are some cases where the RDF is
simple to compute. In this section, we compute two estimates
of the RDF, based on different assumptions. In Section III-A,
we calculate the entropy of the index source generated by an
LSF VQ, and in Section III-B we make the assumption that the
distribution of the LSF parameter vector is jointly Gaussian.
A. Approximation 1: Entropy Measurements
In this section, we estimate the RDF of the LSF process by
entropy measurements.
First we design a vector quantizer for the LSF source using the algorithm in [51]. This VQ encodes the LSF source with a certain distortion $D$, producing a stream of indices $I_n$. Assuming that this index source is memoryless (or, alternatively, choosing not to exploit the memory in the process), we can find a lower bound for the required number of bits to transmit the VQ indices by computing the entropy of this source,

$$H(I_n) = -\sum_{i=1}^{L} P(i) \log_2 P(i) \qquad (6)$$

where $L$ is the number of vectors in the VQ. The index
source can be transmitted at a rate arbitrarily close to the
entropy by use of a noiseless coding scheme such as Huffman
coding, applied to long sequences of indices. This procedure is
impractical, due to the extra delay introduced. Therefore these results should be considered performance bounds, and not recipes for how to encode the VQ indices.
To estimate the required rate when knowledge of the previ-
ous indices is exploited, we compare the entropy above with
the conditional entropy, computed as

$$H(I_n \mid \mathcal{H}_n) = -E\left[ \log_2 P(I_n \mid \mathcal{H}_n) \right] \qquad (7)$$

where $\mathcal{H}_n$ is the history of the source, $\mathcal{H}_n = \{I_{n-1}, I_{n-2}, \dots\}$. For reasons of simplicity, we approximate the LSF process as a first-order Markov process, with

$$H(I_n \mid \mathcal{H}_n) \approx H(I_n \mid I_{n-1}). \qquad (8)$$

We write

$$H(I_n \mid I_{n-1}) = -\sum_{i=1}^{L} \sum_{j=1}^{L} P(I_n = i, I_{n-1} = j) \log_2 P(I_n = i \mid I_{n-1} = j). \qquad (9)$$
The mutual information, defined as the difference $I(I_n; I_{n-1}) = H(I_n) - H(I_n \mid I_{n-1})$, constitutes an estimate of the performance gain if knowledge of previously encoded vectors is fully exploited. We note that the entropies
above are straightforward to determine once the probabilities
have been estimated, as shown in Section III-C.
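The entropy, conditional entropy, and their difference are simple to estimate once the index stream is available. The sketch below runs on a synthetic first-order Markov index source (a crude stand-in for VQ indices; the repeat probability 0.7 and alphabet size 8 are assumptions for the example), not on the LSF VQ indices of Section III-C.

```python
import math
import random
from collections import Counter

random.seed(0)
L = 8   # number of codevectors

# Synthetic first-order Markov index stream: with probability 0.7 repeat the
# previous index, otherwise draw uniformly from the alphabet.
idx = [0]
for _ in range(200000):
    idx.append(idx[-1] if random.random() < 0.7 else random.randrange(L))

def entropy(stream):
    """H(I_n), eq. (6), from relative frequencies."""
    c, n = Counter(stream), len(stream)
    return -sum((m / n) * math.log2(m / n) for m in c.values())

def cond_entropy(stream):
    """H(I_n | I_{n-1}), eq. (9), from pair frequencies."""
    pairs = Counter(zip(stream[:-1], stream[1:]))
    prev = Counter(stream[:-1])
    n = len(stream) - 1
    # -sum_{j,i} P(j, i) log2 P(i | j), with P(i | j) = count(j,i)/count(j)
    return -sum((m / n) * math.log2(m / prev[j]) for (j, i), m in pairs.items())

H = entropy(idx)
Hc = cond_entropy(idx)
gain = H - Hc            # estimated saving, in bits per index
print(f"H = {H:.2f} bits, H_cond = {Hc:.2f} bits, gain = {gain:.2f} bits")
```

For this source the conditional entropy is well below the marginal entropy, and the difference is the mutual information, i.e., the per-index bit saving an interframe scheme could hope for.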
B. Approximation 2: Gaussian pdf
If the probability density function of the LSF vectors
is assumed to be jointly Gaussian, there is a simple way
to estimate the gains if interframe correlation is exploited.
For Gaussian continuous-valued processes, the rate distortion
theory is well developed, and simple formulas exist. However,
a real LSF process is not Gaussian, partly due to the ordering
property of LSF parameters. In Fig. 3, histograms and scatter
plots of the LSF parameters are plotted. We conclude that the
one-dimensional marginal distributions of the LSFs are well
approximated with one-dimensional (1-D) Gaussian pdfs, but
we also note that a 2-D scatter plot of LSF 1 and 2 does
not seem Gaussian at all. Still, we think that valuable insight into the LSF vector process can be gained from the discussion below.
The rate distortion function, $R(D)$, for jointly Gaussian pdfs is given parametrically in the form [49]

$$D(\theta) = \frac{1}{d} \sum_{k=1}^{d} \min(\theta, \lambda_k), \qquad R(\theta) = \frac{1}{d} \sum_{k=1}^{d} \max\left(0, \tfrac{1}{2} \log_2 \frac{\lambda_k}{\theta}\right) \qquad (10)$$

where $d$ is the dimension of the vectors and $\lambda_k$ are the eigenvalues of the covariance matrix of the vector process. For high rates, the distortion rate function can be simplified to

$$D(R) = \left( \prod_{k=1}^{d} \lambda_k \right)^{1/d} 2^{-2R}. \qquad (11)$$
Among all pdfs with a given covariance, the Gaussian pdf requires the highest rate to achieve a given distortion. This means that the Gaussian RDF can serve as an upper bound on the rate for any non-Gaussian pdf. The rate distortion function is fully determined by the eigenvalues of the covariance matrix $\mathbf{R}_0$ [defined in (3)].
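The parametric form (10) is straightforward to evaluate numerically as a reverse water-filling over the eigenvalues; the eigenvalues below are illustrative, not measured from LSF data.

```python
import numpy as np

def gaussian_rdf(eigvals, theta):
    """Reverse water-filling, eq. (10): per-dimension distortion D and rate R
    (bits) for a Gaussian vector source with the given covariance eigenvalues."""
    lam = np.asarray(eigvals, dtype=float)
    D = np.mean(np.minimum(theta, lam))
    R = np.mean(np.maximum(0.0, 0.5 * np.log2(lam / theta)))
    return D, R

lam = [4.0, 1.0, 0.25]              # illustrative eigenvalues of R0
D, R = gaussian_rdf(lam, theta=0.1)

# At high rates (theta below every eigenvalue), the simplified form (11),
# D(R) = (prod lam)^(1/d) 2^(-2R), coincides with the parametric curve.
d = len(lam)
D_highrate = np.prod(lam) ** (1 / d) * 2 ** (-2 * R)
```

With `theta` below the smallest eigenvalue, every component is active, `D` equals `theta`, and the high-rate formula reproduces it exactly.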
Now we want to compute the RDF for a system with memory. For jointly Gaussian vectors, the optimum minimum mean square error one-step-ahead prediction is a linear combination of the previous vectors (see, e.g., [37])

$$\hat{\mathbf{x}}_n = \sum_{i=1}^{p} \mathbf{A}_i \mathbf{x}_{n-i}. \qquad (12)$$

The prediction error vector process, $\mathbf{e}_n = \mathbf{x}_n - \hat{\mathbf{x}}_n$, is Gaussian as well. The covariance matrix for the error process of a $p$th-order minimum error variance predictor is given by

$$\mathbf{R}_e = \mathbf{R}_0 - \sum_{i=1}^{p} \mathbf{A}_i \mathbf{R}_i^T \qquad (13)$$
TABLE I. ESTIMATED BIT SAVING, ENTROPY MEASUREMENTS
TABLE II. ESTIMATED BIT SAVING, GAUSSIAN APPROXIMATION
where $\mathbf{A}_i$ are the optimum prediction matrices and $\mathbf{R}_i$ the correlation matrices, defined in Section II-A. From the covariance matrix we can compute the eigenvalues and the RDF for the prediction error. If we compute the RDF both for the original LSF process and for the prediction error process at the same distortion $D$, we get an estimate of the achievable interframe coding gain, measured in bits/vector. The results of such RDF measurements are presented in the next section.
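The gain computation can be sketched end-to-end on a synthetic Gaussian AR source (all sizes and the AR parameter are illustrative assumptions): form the error covariance from (13), then compare the rates of the original and prediction-error processes at the same distortion via the high-rate formula (11).

```python
import numpy as np

rng = np.random.default_rng(6)
d = 3

# Synthetic stationary Gaussian vector AR(1) source, x_n = A x_{n-1} + w_n.
A_true = 0.85 * np.eye(d)
N = 30000
x = np.zeros((N, d))
for n in range(1, N):
    x[n] = A_true @ x[n - 1] + rng.standard_normal(d)

R0 = x[1:].T @ x[1:] / (N - 1)
R1 = x[1:].T @ x[:-1] / (N - 1)
A1 = R1 @ np.linalg.inv(R0)         # optimum first-order predictor, eq. (4)
Re = R0 - A1 @ R1.T                 # error covariance, eq. (13) with p = 1

def rate_at(eigvals, D):
    # High-rate Gaussian RDF per dimension, inverted from eq. (11); valid
    # when D is below the smallest eigenvalue.
    lam = np.asarray(eigvals)
    return 0.5 * np.log2(np.prod(lam) ** (1 / len(lam)) / D)

D = 0.01
lam_x = np.linalg.eigvalsh(R0)
lam_e = np.linalg.eigvalsh(Re)
gain_bits = d * (rate_at(lam_x, D) - rate_at(lam_e, D))   # bits per vector
print(f"estimated interframe gain: {gain_bits:.1f} bits/vector")
```

The gain reduces to half the base-2 log of the ratio of the two covariance determinants, so it depends only on how much prediction shrinks the covariance volume, not on the chosen distortion level.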
C. Interframe Gain Computation
The database for computing the entropies and rate distortion
functions is the same as the one described in Section VI-A,
consisting of almost two hours of speech recorded from FM
radio. The frame size is 20 ms, with a 2.5 ms overlap on both
sides. The ten-dimensional LSF vectors are split into three
vectors, with 3, 3, and 4 LSF parameters, respectively.First we trained a three-split VQ for the LSF process and
computed the entropy and the conditional entropy
for the stream of indices. It can be shown (the
proof is simple but beyond the scope of this study) that the
larger the size of the VQ, the higher the gains that can be
expected. However, for a -bit VQ, probabilities must be
estimated, and our database limits to be seven or less in order
to get accurate estimates of and
. Therefore we have computed the entropies for a three-split
VQ with 7 bits in each split, even though 8 bits would have
been more appropriate to estimate correlation gains relative
a realistic 24 bits ( ) memoryless VQ. The entropies and
conditional entropies of the three VQs are given in Table I.For this experiment, the results indicate that a total gain of
5.6 bits for three-split interframe encoding of the LSF vector
process can be expected.
In the second experiment, we computed the RDF (with
a Gaussian assumption) for three-split LSF vectors, and for
three-split of the first-order prediction error vectors. The RDF
was computed at a distortion of $D = 625\ \mathrm{Hz}^2$ per LSF (standard deviation 25 Hz per LSF), which is close to the distortion experimentally found for a 24-bit LSF VQ. The
corresponding rates for the LSF vectors are given in Table II,
together with the rates for the prediction error vectors. As
Fig. 4. Safety-net principle: Combine a memory-based VQ with a fixed memoryless VQ (the safety-net VQ).
can be seen in this table, we can expect to gain a total of
6.2 bits if first order interframe coding is employed. If we
use a second order predictor instead, the computations show
that we can expect to gain an extra 0.2–0.3 bits, making a total gain of 6.4–6.5 bits compared to a standard memoryless
LSF quantizer. If the predictor order is further increased, only
very little can be gained.2 We conclude that a first order
predictor achieves most of the gain, which is also confirmed
by experiments and other reports [17], [22].
These two gain estimates indicate that 5–6 bits can be saved by exploiting interframe correlation in a three-split structure.
The error in the above estimates of the achievable interframe
coding gain comes partly from the fact that the distortion
measure we seek to minimize in LSF quantization is not
the quadratic distance between original and encoded LSF
vector, but rather the spectral distance (SD). The necessary
approximations also lead to errors. However, we think that
the experiments in this section give a hint of the achievable
gains of interframe LSF coding.
IV. SAFETY-NET VQ
In this section, we propose an extension of existing memory-
based VQ systems with a fixed memoryless VQ, hereby
denoted safety-net VQ. The safety-net concept was introduced
in [52], and has also been reported in [53] and [54]. Similar
systems have also been studied in for example [10], [17], [18],
[55], and [56]. In this paper we further develop the ideas, and
study the performance for transmission over noisy channels.
The main principle of the safety-net extension is illustrated
in Fig. 4. A memory-based VQ is combined with a fixed
memoryless VQ that operates independently of the memory-
based VQ. At each coding instant both codebooks are searched
for the best codevector.
By using this arrangement, we aim to achieve three objec-
tives.
To encode outliers, i.e., low-correlation frames, sep-
arately from the typical high-correlation frames. Many
memory-based VQ systems show good performance for
highly correlated input vectors, but perform worse than
memoryless systems for the occasional low correlation
frames. This results in low average distortion, but the
number of high distortion frames increases. In encoding
of, for example, spectrum coefficients, this is a serious
2 Note that there might be considerable long-time dependencies in the LSF vectors, since the speaker can be expected to repeat phonemes at irregular intervals. However, these dependencies are difficult to exploit by linear prediction.
problem, since there is a significant perceptual importance
of keeping the number of high distortion frames low.
This fact is emphasized in several studies [13], [57].
By adding a fixed memoryless codebook to the memory-
based VQ system, the low correlation frames are encoded
in a standard memoryless VQ, and a lower number of high
distortion frames can be expected.
Since outliers are separately encoded in the safety-net VQ,
the memory-based VQ can focus on the highly correlated
frames. A standard memory-based VQ encodes frames
with both high and low interframe correlation in the
same quantizer. The VQ must be designed to handle
both these cases, and the high interframe correlation in
the typical frames cannot be fully exploited. Some of
the potential performance gain of exploiting interframe
correlation in the memory-based VQ is lost due to the
need to compromise. The addition of a fixed memoryless
codebook that encodes outliers separately enables the
memory-based VQ to exploit interframe correlation to a
higher degree, and lower average distortion should result.
A serious objection to memory-based VQ systems is the performance when the index must be transmitted over
a noisy channel, which is often the case in realistic
systems. An error in a memory-based VQ transmission
leads to error propagation, i.e., to a sequence of frames
where the internal state of the encoder and the decoder
differs, and thus a sequence of data with large errors is
produced. Most systems with memory "forget" the bit
error reasonably fast, but error propagation is nevertheless
a serious problem in memory-based VQs. By including a
fixed memoryless codebook, error propagation is canceled
every time an entry from the fixed codebook is selected
and correctly transmitted to the decoder. The improve-
ments in performance over noisy channels are perhaps the strongest reason for extending the design with a
memoryless codebook. In Section V this subject is studied
in more detail.
The combination of the two VQs can be described as
(14)
where a fixed memoryless codebook is combined with
an adaptive memory-based codebook, resulting in the
extended codebook. The search process is performed by
first searching the adaptive codebook for the best vector,
then searching the fixed codebook for the best fixed vector.
The winning candidates from the two codebooks are compared, and the best of these two vectors3 is encoded and
transmitted to the decoder as follows:
(15)
3 The distortion criterion we have used to find the two candidate vectors is the weighted minimum squared error criterion (see Section VI-A), mainly due to the comparably low complexity. When the best of the two candidates is to be chosen, more complex criteria can be considered, since only two vectors need to be compared; here we have used the spectral distance measure (Section VI-A).
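A minimal sketch of this two-codebook search (an illustration under names of our own choosing; like footnote 3, it uses a weighted squared error for the search, and the returned flag is the extra bit that signals the chosen codebook):

```python
def wse(x, y, w):
    """Weighted squared error between vectors x and y with weights w."""
    return sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w))

def safety_net_encode(x, adaptive_cb, fixed_cb, w):
    """Search both codebooks and return (use_fixed, index, codevector).

    adaptive_cb is the memory-based codebook (changes every frame);
    fixed_cb is the memoryless safety-net codebook. The boolean
    use_fixed tells the decoder which codebook the index refers to."""
    best_a = min(range(len(adaptive_cb)), key=lambda i: wse(x, adaptive_cb[i], w))
    best_f = min(range(len(fixed_cb)), key=lambda i: wse(x, fixed_cb[i], w))
    if wse(x, fixed_cb[best_f], w) <= wse(x, adaptive_cb[best_a], w):
        return True, best_f, fixed_cb[best_f]
    return False, best_a, adaptive_cb[best_a]
```

In the paper the final choice between the two candidates is made with the spectral distance measure; plain weighted squared error is used throughout here to keep the sketch short.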
Fig. 6. Principle of a 2-D DCVQ. The nearest neighbors to the previous codevector together with the fixed codebook form the combined codebook.
number of outliers. PVQ design procedures are described in
Section II-A.
C. Safety-Net FSVQ
The second safety-net method is a combination of the simple
nearest neighbor FSVQ technique, described in Section II-
B, and a safety-net VQ. It was first presented in [52] and
will be referred to as dynamic combination VQ (DCVQ).
Derailment occurs for NN-FSVQ when the input vector has
low correlation with the previous vector, and thus no good
representation of the input vector exists in the NN-FSVQ
codebook. Since the problem with outlying vectors propagates
to the next frame, due to the memory in the quantizer,
the result is a sequence of inadequately quantized vectors.
By introducing a safety-net to take care of outliers, the
DCVQ solves the derailment problem. Hence, performance
for transmission over noisy channels is also significantly
improved. The major disadvantage of the DCVQ technique
is the same as one of the major problems with the NN-FSVQ:
the storage requirements are large. An illustration of the
combination of a NN-FSVQ and a fixed memoryless quantizer
is given in Fig. 6.
The design of the DCVQ is simple, as described earlier:
The NN-FSVQ is trained using the full training database, and
a nearest neighbor table is stored, as described in Section II-B.
The safety-net VQ is also trained using the full training
database. Closed-loop training procedures can be applied for
this case as well, but in general the improvement is negligible.
V. MEMORY-BASED VQ ON NOISY CHANNEL
For the case of a memoryless VQ, the effect of channel noise
is straightforward. An error in the transmission of a codeword
index only affects the distortion of the current vector, since no
memory is incorporated. Systems with memory are affected
differently by channel errors than memoryless systems because
the memory in the decoder causes error propagation. The
effects of error propagation can be very serious in some
systems, if precautions are not taken. In a nearest
neighbor FSVQ, for example, a bit error could cause the system to derail and
never recover. In other systems, error propagation causes long
sequences of highly distorted vectors. One way to decrease the
effect of channel errors for memory-based VQ schemes is to
periodically perform a full search that forces the coder into the
best possible state, which is transmitted to the decoder. This
should be done quite infrequently, as the cost of sending extra
information gets high. This method is not suitable for PVQ
systems, because of the infinite number of states.
In this section, we study performance of memory-based VQ
systems operating on noisy channels, and try to decrease the
effects of error propagation. Other work that treats memory-
based LSF quantization in the presence of channel noise
includes [58] and [40], while, for example, [59] and [13]
investigate noisy channel performance of memoryless LSF
quantization.
A. Optimization of Index Assignment for Memory-Based VQ
Index assignment is the procedure of numbering the vectors
in a vector quantizer (assigning indices to the vectors). Noisy
channel performance of vector quantizers having random index
assignments is in general poor. In order to minimize the effect
of channel errors on the output signal, the codebook should be
reordered such that the Hamming distance (assuming a binary
channel) between any two codevector indices corresponds
closely to the Euclidean distance between the corresponding
codevectors. For this ordering problem, it is hard to find
optimal solutions. A number of suboptimal algorithms have
been proposed [60]–[63]. We have applied a fast and reli-
able method denoted the linearity increasing swap algorithm
(LISA), described in [63]. The choice of LISA for index
assignment is justified by its superior speed compared to other
methods (10-bit VQs are processed in seconds).
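The objective such algorithms pursue can be made concrete with a toy figure of merit (our own simplified sketch, not LISA itself): the average distortion caused by a single bit flip in the transmitted index, which is small exactly when Hamming-close indices carry Euclidean-close codevectors.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def channel_sensitivity(codebook, bits):
    """Average squared error introduced when one of `bits` index bits
    flips, assuming all indices are equally likely. Index assignment
    algorithms try to reorder the codebook so this number is small."""
    n = len(codebook)
    total = 0.0
    for i in range(n):
        for b in range(bits):
            j = i ^ (1 << b)  # index received after a single bit error
            total += sqdist(codebook[i], codebook[j])
    return total / (n * bits)

# Two assignments of the same four scalar codevectors: the ordered one
# keeps Hamming neighbors close in value and is less sensitive.
ordered = [[0.0], [1.0], [2.0], [3.0]]
shuffled = [[0.0], [2.0], [3.0], [1.0]]
```

Swap-based algorithms such as LISA iteratively exchange codebook entries whenever the exchange lowers a criterion of this kind.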
In this study, procedures to improve the index assignment are applied to all vector quantizers. The VQ schemes in
this comparison benefit from improving index assignment
to various degrees. The gains are larger for methods with
memory, because the effect of error propagation is reduced.
It is not obvious how to apply such an algorithm to all of the
coding schemes; therefore, we briefly describe how
it has been done.
1) Index Assignment for PVQ: The possible reconstruction
vectors at time n are x̂_i(n) = p(n) + c_i, where p(n) is the
prediction from previously coded vectors, c_i is a codeword
in the actual prediction error codebook, and i is the codeword
index. Evidently, the distance between codevectors x̂_i(n) and
x̂_j(n) is the same as the distance between the corresponding vectors c_i and c_j in the codebook. Thus an index assign-
ment algorithm operating on the final reconstruction vectors is
identical to one operating on the codebook (i.e., not changing
with time). Consequently, for PVQ we can simply apply an
algorithm that improves the index assignment of the prediction
error codebook directly.
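This invariance is easy to check numerically (all values hypothetical): adding the same prediction vector to every codeword leaves all pairwise distances between reconstruction vectors unchanged.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# A small prediction-error codebook and an arbitrary prediction vector.
codebook = [[0.1, 0.2], [0.4, 0.1], [0.3, 0.5]]
prediction = [1.7, -0.9]

# Reconstruction vectors: prediction + codeword (same offset for all).
recon = [[p + c for p, c in zip(prediction, cw)] for cw in codebook]

# Pairwise distances are identical, so one index assignment computed
# once on the error codebook is optimal at every coding instant.
for i in range(len(codebook)):
    for j in range(len(codebook)):
        assert abs(sqdist(recon[i], recon[j]) - sqdist(codebook[i], codebook[j])) < 1e-12
```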
2) Index Assignment for FSVQ: For each state we have as-
signed one codebook and hence we can optimize the index
assignment for each of the state codebooks independently. Still
there is a problem when, due to channel errors, the encoder
and the decoder do not agree on the current state. In this
case, another index assignment that takes into account that
the state is not correct would be optimal, but not practically
implementable. Thus, applying the index assignment algorithm
independently to each state codebook for FSVQ will result in
less frequent erroneous state decisions, but cannot improve
performance if they do occur.
3) Index Assignment for Safety-Net Methods: This problem
is more complicated because of the existence of two code-
books. If, for each coding instant, the adaptive and the memo-
ryless codebook are combined into one, and the optimization is
carried out for the combined codebook, the best possible index
assignment is achieved. This is only feasible if all possible
codebook combinations are known in the design procedure.
Otherwise, a new index assignment must be found at every
coding instant. For DCVQ all combinations of the codebooks
are known beforehand, which means that, at least theoretically,
it is possible to find an optimal index assignment.
Because of the increased complexity and storage requirements
that result, we have not implemented this strategy. Instead, we
have applied the index assignment algorithm independently to
the memoryless VQ and to the adaptive VQ. For SN-PVQ
we have also applied the index assignment algorithm to the
two codebooks independently, but we have also improved the
index assignment by the help of a simple algorithm presented
in [22]. In short, a few different index assignments for the
adaptive VQ are precomputed, and which of these to use is
determined by classification of the current prediction vector.
The safety-net VQ still uses independently optimized index
assignment.
B. Channel Optimization for Memory-Based VQ
Index assignment does not take into account any explicit
knowledge about the channel error probability. If knowledge
about the channel can be incorporated in the design, perfor-
mance can be significantly improved. This is usually referred
to as channel optimized VQ (COVQ) [60]. A disadvantage is
that the performance degrades if the actual channel differs from
the design channel, or is changing with time. As is the case
in the index assignment design, a simultaneously optimized
COVQ requires that all combinations of the codebooks are
known already in the design procedure for the safety-net
extended systems. Hence, only independent COVQ designs for
the adaptive VQ and the memoryless VQ are feasible. In this
paper, we have not experimentally evaluated this method of
improving noisy channel performance, but in [40] it is shown
how COVQ can improve performance for omniscient FSVQ, and in [64], COVQ is employed to improve PVQ and SN-
PVQ. In [65], a design method that simultaneously trains the
codebook and the predictor for noisy channel PVQ is proposed.
C. Reducing Error Propagation
The fact that codevectors from the memoryless VQ are
frequently chosen implies that the error propagation is much
less prominent in a memory-based VQ scheme that includes
a safety-net than in one without a safety-net. The reason is
that if a codevector is chosen from the memoryless codebook
and the corresponding index is correctly conveyed over the
channel, the decoder will be forced into the same state as
the encoder. Consequently, it is desirable to increase the
number of times the encoder chooses a codevector from the
memoryless codebook as much as possible if the channel is
noisy, without increasing the total distortion noticeably. One
way to accomplish this is to study the relative number of
codevectors in the memoryless and adaptive codebook. If
the two codebook sizes in (16) are equal, one bit is used to distinguish which
codebook the current vector originates from. If we want to
increase the usage of the memoryless VQ we can simply
increase the size of the memoryless codebook while at the same
time decreasing the size of the adaptive codebook. However,
due to the indexing problems that arise when the codebook
size is not a power of two, we have chosen not to investigate
other choices than equal sizes.
Another way to increase usage of the memoryless VQ is
to bias the selection process to favor the safety-net vectors.
The bias can be a constant, or it can be a function of the
number of transmitted vectors since the last time a safety-net
vector was selected. With this method, the attractive limited
error propagation feature of moving average prediction can
be mimicked, by forcing the encoder to select a memoryless
vector after a predetermined number of vectors from the
memory-based VQ. This bias should be chosen depending
on the actual channel statistics. However, in this work we
use a constant bias of 0.15 dB (in SD) which is a heuristic
compromise for the range of error probabilities that was used
in the noisy channel experiments. Additional experiments on
biased decision can be found in [22].
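The biased decision can be sketched as follows (the 0.15 dB constant is the value quoted above; the function and its arguments are our own illustration):

```python
SAFETY_NET_BIAS_DB = 0.15  # constant bias in spectral distortion (dB)

def choose_codebook(sd_adaptive, sd_fixed, bias_db=SAFETY_NET_BIAS_DB):
    """Biased selection between the two candidate vectors: the
    safety-net (fixed) candidate wins unless the adaptive candidate
    is better by more than bias_db. Choosing the safety-net vector
    resynchronizes the decoder once its index arrives intact."""
    return "fixed" if sd_fixed <= sd_adaptive + bias_db else "adaptive"
```

A channel-dependent or time-varying bias, as discussed above, would simply replace the constant with a function of the channel statistics or of the frames elapsed since the last safety-net selection.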
D. Optimization of the Prediction Matrices for PVQ Systems
The performance of LSF quantization in a PVQ system
deteriorates much faster with increasing channel noise than
the performance of memoryless LSF VQ. This fact motivated
us to improve the PVQ system for use over noisy channels.
In Fig. 1, a PVQ system is depicted. The VQ and the channel
in the system are modeled as white Gaussian noise sources;
see Fig. 7.
If we try to optimize the prediction matrices of the system,
in order to minimize the effects of quantization and channel
noise, we find that this problem is hard to treat mathematically.
However, for noiseless channels we have experimentally found
that the optimum predictor matrices are close to diagonal. Thus
the vector predictor can be approximated as a set of scalar
predictors. Some of the above approximations can be avoided
by orthogonalizing the input vectors before the analysis,
by use of the Karhunen–Loève transform (see, e.g., [66]).
However, we find the approximations reasonable. For example,
by excluding all components of the prediction matrices outside
the diagonals, we have found that only about 0.15 bits is lost
for the full LSF vector. Therefore, we proceed by analyzing
the problem as a set of independent scalar problems.
Finding the equations for the optimum prediction coeffi-
cients for a set of independent problems is comparably simple.
For noiseless channels, the result
(17)
8/3/2019 Inter Frame Lsf
11/15
ERIKSSON et al.: INTERFRAME LSF QUANTIZATION 505
Fig. 7. Model of VQ and channel in a PVQ system as Gaussian noise sources.
is obvious from Fig. 1, but worthwhile to emphasize. For noisy
channels, a term is added to , where is the
vector impulse response of the decoder filter, and denotes
convolution. Since we have a set of approximately independent
problems, we can consider the components of the vectors one
at a time. From (17) with the added channel noise term, we
write the error in a component as
(18)
where is the power transfer factor for a component of the
decoder filter, i.e., the factor by which the power of a white
noise input signal is amplified by the filter. The quantizer and
channel error variances are assumed to be proportional to the
prediction error variance ,
and (19)
and we rewrite (18) as
(20)
This result is also derived in [66]. Now we want to express
the power transfer factor and the prediction error variance
as functions of the coefficients of the input process and
prediction filter. For the sake of simplicity, we restrict the
calculations to first- and second-order AR processes, generated
by
(21)
The linear predictor is written as
(22)
After some work, we find expressions for these two quantities:
(23)
(24)
By inserting (23) and (24) in (20) we obtain an expression
for the error variance of the PVQ system in the presence of
channel noise. Also, for given values of , , , , and
(derived from the VQ, the channel and the input process),
we can find the optimum predictor coefficients. Even for this
simplified system, an analytic solution is hard to find, but a
numerical solution is easily obtained. Note that (20) must be
independently solved for each component in the LSF vector.
Fig. 8. Performance of a 20-bit SN-PVQ as a function of the mix between vectors in the memory-based VQ and the safety-net VQ in terms of average spectral distortion.
The result from the above analysis is used to improve
the PVQ and SN-PVQ performance for noisy channels, in
Section VI-B. The diagonal elements of the prediction ma-
trices are optimized for high noise levels ( ), and
no optimization for actual channel noise level is performed.
That is, the error probability of the channel is not a design parameter. Even the results for noiseless channels are obtained
with the prediction matrices optimized for high noise lev-
els. However, the increase in noiseless spectral distortion when
the matrices are optimized for high noise is small, 0.02–0.04
dB, while the gains for severely degraded channels can be
several dB.
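The flavor of this numerical optimization can be conveyed by a scalar sketch for a first-order AR input (a hedged illustration following the standard DPCM analysis of [66], [67], not a reproduction of (17)–(24); all names and the proportionality model are our own): the total error combines quantization noise and channel noise amplified by the power transfer factor 1/(1 − a²) of the decoder filter 1/(1 − a z⁻¹), and the best coefficient is found numerically.

```python
def total_error_variance(a, rho, eps_q, eps_c, var_x=1.0):
    """Reconstruction error variance for predictor coefficient a.

    The input is AR(1) with one-lag correlation rho; quantizer and
    channel noise variances are taken proportional (eps_q, eps_c) to
    the prediction error variance, and channel noise is amplified by
    the decoder filter's power transfer factor 1 / (1 - a^2)."""
    var_e = var_x * (1.0 - 2.0 * a * rho + a * a)  # prediction error variance
    gain = 1.0 / (1.0 - a * a)                     # decoder power transfer
    return var_e * (eps_q + eps_c * gain)

def optimum_predictor(rho, eps_q, eps_c, steps=9999):
    """Grid search over a in [0, 0.999] for the minimizing coefficient."""
    grid = [0.999 * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda a: total_error_variance(a, rho, eps_q, eps_c))
```

With eps_c = 0 the search returns a ≈ rho, the noiseless optimum; with channel noise the optimum shrinks well below rho, consistent with the observation above that coefficients optimized for a noisy channel cost only a small SD increase on a clean one.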
In [67], Chang and Donaldson derive formulas for optimum
predictor coefficients for scalar DPCM systems. Jayant and
Noll [66] give a general overview of the problem of transmit-
ting DPCM over noisy channels, and Noll [68] analyzes the
noisy channel performance of PCM and DPCM quantizing
schemes.
VI. EXPERIMENTS
In this section, the experiments used to determine optimal
parameters of the memory-based and safety-net methods are
described. Comparisons of all tested methods are given, both
for noiseless and noisy channels. To verify the objective
results, a listening test is presented.
A. Experimental Setup
The speech training database used to design all the VQs
in this work consists of 250 000 vectors. Another set of
20 000 vectors is used for evaluation.4 The speech is recorded
from FM radio and includes a large number of speakers of
both genders. The language is mostly Swedish. The speech is
digitized at 16 kHz, lowpass-filtered at 3.4 kHz and decimated
to 8 kHz sampling frequency. A tenth-order LPC analysis
using the stabilized covariance method with high-frequency
compensation and error weighting [13] is performed every
20 ms using a 25 ms analysis window. A fixed 10 Hz
bandwidth expansion is applied to each pole of the LPC
coefficient vector.
One of the key issues in vector quantization is the selec-
tion of an appropriate distortion measure for the codebook
search. The Euclidean distance measure is often used for its
simplicity. Here, we employ the weighted Euclidean distance
measure presented in [13] that has been shown to improve both
the objective quality (measured in spectral distortion, SD),
and the subjective quality of the coded speech. This distance
measure has been used in the design and evaluation of all VQ
techniques presented in this work except for the design of the
next-state functions in OT-FSVQ and DCVQ, where the un-
weighted Euclidean measure was employed. For measuring the
quantization performance, we calculate the spectral distortion
in the 0–3 kHz range.
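The SD computation can be sketched as a root-mean-square log-spectral difference between two LPC model spectra over 0–3 kHz (a minimal illustration; the weighting of [13] and the exact integration used in the experiments are omitted, and the names are ours):

```python
import cmath
import math

def lpc_power_spectrum(a, f, fs=8000.0):
    """Power spectrum 1/|A(e^jw)|^2 of an all-pole LPC model with
    A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p, at frequency f Hz."""
    w = 2.0 * math.pi * f / fs
    A = 1.0 + sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
    return 1.0 / abs(A) ** 2

def spectral_distortion(a1, a2, f_lo=0.0, f_hi=3000.0, n=512, fs=8000.0):
    """RMS log-spectral difference (dB) between two LPC models,
    approximated by a midpoint sum over [f_lo, f_hi]."""
    acc = 0.0
    for i in range(n):
        f = f_lo + (f_hi - f_lo) * (i + 0.5) / n
        d = 10.0 * math.log10(lpc_power_spectrum(a1, f, fs) /
                              lpc_power_spectrum(a2, f, fs))
        acc += d * d
    return math.sqrt(acc / n)
```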
Large savings in complexity and storage requirements can
be achieved if a product-code technique is employed. In
this work we have utilized a three-split VQ scheme for all
quantizers, where the dimensions in the splits are 3, 3, and 4,
respectively. An important design issue for a split VQ system
is the number of bits to allocate for the individual VQs. It is
common that the bits are evenly distributed over the splits in
order to keep the largest codebook as small as possible [13],
[40]. However, since the difference in complexity is relatively
small, we have here used the bit configurations that result in
the best performance in terms of average spectral distortion.
A typical example is 24-bit VQs, where 8 bits were used for
the first split, 9 bits for the second, and 7 bits for the last split.
All the investigated quantization methods used the same bit
allocations.
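A minimal sketch of the three-split search (unweighted squared error for brevity, whereas the experiments use the weighted measure of [13]; the codebook contents below are hypothetical toy values):

```python
def split_vq_encode(lsf, codebooks, dims=(3, 3, 4)):
    """Encode a 10-dimensional LSF vector with a three-split VQ:
    each sub-vector is quantized independently in its own codebook
    (e.g. 2^8, 2^9, and 2^7 entries for a 24-bit quantizer)."""
    indices, start = [], 0
    for dim, cb in zip(dims, codebooks):
        sub = lsf[start:start + dim]
        best = min(range(len(cb)),
                   key=lambda i: sum((x - y) ** 2 for x, y in zip(sub, cb[i])))
        indices.append(best)
        start += dim
    return indices

# Toy two-entry codebooks for each split.
cbs = [[[0.0] * 3, [1.0] * 3],
       [[0.0] * 3, [2.0] * 3],
       [[0.0] * 4, [3.0] * 4]]
```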
For the PVQ systems, first-order prediction is used, with the
prediction matrix optimized for high noise level according to
(20), (23), and (24) ( ). For the OT-FSVQ schemes
the number of states is chosen to be eight. We will not report
on any results for NN-FSVQ, instead the FSVQ class will be
represented by OT-FSVQ, due to its superior performance. In
the following MLVQ denotes memoryless VQ.
B. Performance for Noiseless Channels
An important aspect of the design of a safety-net extended
memory-based VQ system is which codebook sizes should
be assigned to the safety-net VQ and the memory-based VQ,
respectively. We have investigated the performance of a 20-bit
SN-PVQ for five different constellations, and the results of the
simulations are depicted in Fig. 8. These results indicate that
4 In this study, we are mostly interested in the relative performance between different quantization schemes, and the size of the evaluation set is considered sufficient for this purpose. Note, as mentioned in Section I, that the performance in absolute figures may differ if another speech material is used.
Fig. 9. Performance of the VQs in terms of average spectral distortion. The curves correspond to: A) memoryless VQ; B) OT-FSVQ; C) PVQ; D) DCVQ; and E) SN-PVQ.
the best performance is obtained by choosing the safety-net
size to be somewhere between 25–50% of the total size. This
experiment, together with the discussions in Sections IV-A
and V-C, motivates us to use a mix coefficient of 50% in all
experiments.
In Fig. 9, the average SD for the investigated coding
schemes is plotted as a function of the number of bits used. For
the safety-net configurations, one bit was used to determine the
chosen codebook. From this figure it is clear that all memory-
based VQ methods can utilize the interframe correlation and
achieve performance significantly better than the memoryless
VQ. Among the memory-based methods, the SN-PVQ is
clearly the best in these simulations followed by DCVQ and
PVQ and last OT-FSVQ. When employing the SN-PVQ the
required rate can be reduced by 4–5 bits/vector compared to
the memoryless VQ without reduction in performance. It can
also be seen that if a memory-based scheme is extended with
a safety-net, approximately 1 bit is gained. If we compare the
results with what was theoretically predicted in Section III-C,
we can conclude that with SN-PVQ performance close to the
predicted is achieved.
Differences in the analysis conditions and databases make it
difficult to compare our results to other similar work. However,
we can, if the analysis conditions are similar, compare the
relative improvement of using a memory-based VQ scheme
compared to a memoryless VQ. For OT-FSVQ, we compare
our results to the results by Hussain and Farvardin in [40].
They report a performance gain of slightly less than 3 bits
for the OT-FSVQ, which is very close to what is obtained in
this work. For the case of PVQ it is more difficult to find a
comparable investigation. For example, Loo and Chan in [35],
[36] report a gain of 5–6 bits for PVQ, but for a completely
different coding situation than the one in this work.
In Table III, the performance both in average SD as well as
outlier percentage is depicted for all five investigated coding
methods at 24 bits. As expected, the introduction of a safety-
net VQ not only decreases the average distortion but also
the number of outliers.
C. Performance for Noisy Channels
In the preceding section, we have verified that a number of
memory-based VQ schemes outperform conventional mem-
TABLE III
QUANTIZER PERFORMANCE AT 24 BITS AND BIT ERROR RATE q OF 0% AND 0.5%
Fig. 10. Performance of the VQs at 24 bits in terms of average spectral distortion as a function of bit error rate. The curves correspond to: (a) memoryless VQ, (b) OT-FSVQ, (c) PVQ, (d) DCVQ, and (e) SN-PVQ.
TABLE IV
SD COMPARISON OF SN-PVQ, PVQ, AND
MEMORYLESS VQ FOR DIFFERENT BIT ERROR RATES
oryless VQs under noiseless conditions. However, in order
to be useful for practical applications, it is essential that the
coding scheme can cope with channel noise. Therefore we
have performed a study of the behavior under noisy conditions.
Here we assume a memoryless binary symmetric channel with
bit error probability . For all vector quantizers, procedures
to improve the index assignment are applied, as described in
Section V-A.
The performance for all methods under equal noisy con-
ditions at 24 bits in terms of average SD is depicted in
Fig. 10. From the curves in Fig. 10 we conclude that SN-
PVQ is better than all other methods for all tested error rates.
The other memory-based methods only perform better than
the memoryless VQ for small error probabilities. OT-FSVQ is the
scheme in this investigation that is most sensitive to channel
errors. Again we see that the introduction of a safety-net VQ
clearly improves performance. In Table III, average SD and
outlier percentage are presented for q = 0.5%. Even though low
values of average SD are achieved for some of the methods,
the number of outliers caused by bit errors is high, and hence
the distortion is clearly audible.
Fig. 11. Synthetic speech production for the listening tests.
For high error probabilities, the average SD is higher for
all methods than what can be accepted in most applications.
However, the results for high error rates can be significantly
improved if channel coding is applied, see for instance [13].
We have also found that the gain of using larger codebooks is
almost negligible for high error rates. Thus, if more bits can
be used it is more efficient to use them for channel coding
than increasing the codebook sizes.
Another interesting comparison is given in Table IV. Here
we compare a 20-bit SN-PVQ, with a 21-bit PVQ and a 24-bit
memoryless VQ, which all perform approximately equal with-
out noise. The results in Table IV lead to the conclusion that
a saving of 4 bits compared to a memoryless VQ can be
obtained for all tested error rates by SN-PVQ. Compared to a
PVQ scheme, an improvement of at least 1 bit is achievable.
Note that the performance degrades more for the PVQ system
when the bit error rate is increased, compared to the other two
methods. Hence, for large error probabilities the PVQ loses
more than 1 bit compared to SN-PVQ.
D. Subjective Evaluation
We have performed listening tests to verify the objective
results in the previous subsection. In the test, a 20-bit SN-
PVQ was compared to a 24-bit memoryless VQ. The coders
were compared both for noiseless conditions and for a bit
error rate of 1%.
A diagram of the model for studying the effects of quantization of the LSF parameters is shown in Fig. 11. A prediction
residual is formed by filtering the speech signal using an
unquantized prediction filter, and synthetic speech is generated
by exciting a quantized inverse prediction filter with the undis-
torted residual. In this way, the effects of LSF quantization can
be studied separately from any encoding of the residual.
Twelve short Swedish sentences uttered by male and female
speakers are encoded by the memoryless VQ and the SN-
PVQ, with and without channel noise. The sentences were
pairwise compared, including some comparisons with the
uncoded original sentences. Twelve test persons listened with
headphones to each pair (a total of 60 pairs) and were asked to
indicate a preference for either the first or the second sentence.
The listening tests revealed that for a noiseless channel, the
20-bit SN-PVQ was preferred to the 24-bit memoryless VQ
in 58% of the comparisons. For a channel with 1% bit errors,
the result is very clear: A 20-bit SN-PVQ was preferred to a
24-bit memoryless VQ in 78% of the comparisons. Statistical
tests verified that at a confidence level of 95%, the SN-PVQ is
preferred to the memoryless VQ, both for the noisy and the
noiseless case.
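As a sanity check on such a claim, the significance of a preference proportion can be tested against chance with a normal approximation to the binomial (a sketch; the trial count n = 360 below is hypothetical, since the exact number of comparisons behind each percentage is not broken out here):

```python
import math

def preference_z(n, k):
    """z statistic for testing a preference proportion k/n against
    chance (0.5) with the normal approximation to the binomial;
    a two-sided test at the 95% level requires |z| > 1.96."""
    return (k / n - 0.5) / math.sqrt(0.25 / n)

# With a hypothetical n = 360 comparisons per condition, both reported
# preference rates (about 58% and 78%) clear the 1.96 threshold.
z_noiseless = preference_z(360, 209)  # ~58% preference
z_noisy = preference_z(360, 281)      # ~78% preference
```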
The outcome from the listening tests shows that the objec-
tive performance advantage for the SN-PVQ over a standard
memoryless VQ also holds in subjective tests.
VII. CONCLUSIONS
The most important results of the experiments can be
summarized in the following points.
A memory-based LSF quantizer has an advantage of
3–5 bits over memoryless VQ for error-free transmission.
The SN-PVQ method is the best in this work, with an
advantage of 4–5 bits.
A safety-net extension of an existing memory-based VQ
can improve the performance by 1–2 bits for error-free
transmission. For transmission over noisy channels, the
performance gain is even larger.
For noisy channels, conventional memory-based VQ
methods rapidly lose the advantage over memoryless
VQ. However, the proposed safety-net extension of the
memory-based VQ algorithms improves the performance,
and the SN-PVQ method performs similarly to the memoryless
VQ for all tested error probabilities, with 4 bits less.
The above objective results are further strengthened by
subjective tests of speech quality. In the listening tests, a 20-
bit SN-PVQ was preferred to a 24-bit memoryless VQ in 58%
of the evaluated sentences for a noiseless channel, and in 78%
of the sentences for a channel with 1% bit error rate.
All results in this work are derived for 20 ms frames,
windowed with an overlap of 2.5 ms on both sides. The
difference between memory-based and memoryless methods
will increase if the frame length is decreased, or if the overlap
between frames is increased. The performance for all methods
in general, and the memory-based methods in particular, will
also improve if the channel noise distribution is assumed to
be known, and channel optimization procedures can be used.
Thomas Eriksson was born in Skövde, Sweden, in 1964. He received the M.S. degree in electrical engineering in 1990, and the Ph.D. degree in information theory in 1996, both from Chalmers University of Technology, Göteborg, Sweden.
From 1990 to 1996, he was with the Department of Information Theory, Chalmers University of Technology. From 1997 to 1998, he was at AT&T Labs-Research, Florham Park, NJ, and in 1998 and 1999 he worked on a joint research project with the Royal Institute of Technology and Ericsson Radio Systems AB, both in Stockholm, Sweden. He is currently an Associate Professor at the Department of Signals and Systems, Chalmers University of Technology, where his main research interests are vector quantization and speech coding.
Jan Linden (S'92–M'98) was born in Göteborg, Sweden, in 1966. He received the M.S. degree in electrical engineering, the Licentiate of Engineering, and the Ph.D. degree in information theory from Chalmers University of Technology, Göteborg, Sweden, in 1991, 1996, and 1998, respectively.
From 1992 to 1998, he was a Research and Teaching Assistant at the Department of Information Theory, Chalmers University of Technology. His research at Chalmers included low bit rate speech coding based on glottal pulse modeling and memory-based vector quantization for noisy channels. He is currently a Post-Doctoral Researcher at the University of California, Santa Barbara (UCSB), and a Research Engineer at SignalCom, Inc., Goleta, CA. His research at UCSB is focused on audio coding and wideband speech coding, and at SignalCom he is working on algorithm development for speech coding applications.
Jan Skoglund (S'93–M'98) was born in Göteborg, Sweden, in 1967. He received the M.S. degree in electrical engineering, the Lic. Eng., and the Ph.D. degree in information theory from Chalmers University of Technology, Göteborg, Sweden, in 1992, 1996, and 1998, respectively. His Ph.D. dissertation addressed different aspects of speech coding such as spectrum quantization, pulse excitation modeling, and perceptual coding.
From 1992 to 1998, he was with the Department of Information Theory, Chalmers University of Technology. Since 1999, he has been a Consultant at the Speech and Image Processing Service Research Laboratory, AT&T Labs-Research, Shannon Laboratory, Florham Park, NJ, where he is working on low bit rate speech coding.