NTT Labs. 2005 2005.12.16 NTT Communication Science Labs. Takehiro Moriya 守谷 健弘 Coding Technologies for Speech and Audio Signals ISPACS 2005

Page 1: Coding Technologies for Speech and Audio Signals

Takehiro Moriya 守谷 健弘
NTT Communication Science Labs.
ISPACS 2005, 2005.12.16

Page 2: Self introduction

• 1980 Joined NTT, basic research
– Transform domain interleave VQ
– Conjugate VQ
• 1989 Guest researcher at AT&T Bell Labs
• 1990 Standardization for Japanese PDC (PSI-CELP)
• 1993 Standardization for ITU-T (CS-ACELP)
• 1995 Standardization for MPEG-4 (TwinVQ)
• 2001 Standardization for MPEG lossless audio

Page 3: Technologies of speech and audio coding

[Figure: coding technologies plotted by year (1975 to 2005) against bit rate in kbit/s (2 to 1024, doubling at each tick). Technologies shown: PARCOR, LSP, APC-AB, VSELP, PSI-CELP, G.711, G.726, G.728, G.729, G.722, CD, DAT, MPEG-1, MPEG-2, MPEG-4, MPEG-4 (lossless), MP3, AAC. Application labels: vocoder, mobile, telephone, mobile phone, VoIP/mobile, wideband, music, streaming, archive, ubiquitous.]

Page 4: Outline

• 1. Fundamentals
– 1.1 Time domain for speech
– 1.2 Frequency domain for audio
• 2. Standardization
– 2.1 ITU-T speech coding
– 2.2 MPEG audio coding
• 3. Hot topics
– 3.1 MPEG lossless (ALS, SLS, DST)
– 3.2 MPEG SBR and SSC
– 3.3 MPEG surround

Page 5: Fundamentals

Page 6: Category of coding

[Figure: classification diagram of coding. Labels: coding; compression, presentation, metadata; lossless, lossy; time-domain, frequency-domain; speech, language; text, speech, audio, image, video.]

Page 7: Time-domain

• linear prediction -> CELP
• predictive coefficients
– PARCOR (partial autocorrelation)
– LSP (line spectral pair)
• vector quantization of excitation source
– algebraic structure (ACELP)
• Big market for cellular phones and VoIP

Page 8: LPC (Linear Predictive Coding)

[Figure: LPC synthesis filter. The excitation (innovation, prediction residual) enters a summing node Σ that produces the synthesized output; the output is fed back through a chain of unit delays z^-1, each tap weighted by one of the predictive coefficients α1, α2, ..., αp.]
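The synthesis filter on this slide can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function names `lpc_synthesize` and `lpc_residual` are hypothetical, and the sign convention (output = excitation + weighted past outputs) is an assumption, since some texts define the coefficients with the opposite sign.

```python
def lpc_synthesize(excitation, alpha):
    """All-pole synthesis: y[n] = e[n] + sum_i alpha[i-1] * y[n-i] (assumed sign convention)."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for i in range(1, len(alpha) + 1):
            if n - i >= 0:
                acc += alpha[i - 1] * out[n - i]
        out.append(acc)
    return out


def lpc_residual(signal, alpha):
    """Inverse (analysis) filter: e[n] = x[n] - sum_i alpha[i-1] * x[n-i]."""
    res = []
    for n, x in enumerate(signal):
        pred = 0.0
        for i in range(1, len(alpha) + 1):
            if n - i >= 0:
                pred += alpha[i - 1] * signal[n - i]
        res.append(x - pred)
    return res


# A single pulse through a first-order filter decays geometrically.
impulse_response = lpc_synthesize([1.0, 0.0, 0.0, 0.0], [0.5])
```

Feeding the synthesized output back through `lpc_residual` with the same coefficients recovers the excitation, which is why the excitation is also called the prediction residual.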

Page 9: Family of LPC parameters

• predictive coefficients: α1 ... αp
• PARCOR coefficients: k1 ... kp
• LSP parameters: ω1 ... ωp (positions ω1, ω2, ..., ωp on the frequency axis)

Merits of LSP
• stability
• interpolation
• quantization
• prediction
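The link between predictive coefficients and PARCOR coefficients can be illustrated with the Levinson-Durbin recursion, which yields both from an autocorrelation sequence. The sketch below is a textbook form, not taken from the slides; the name `levinson_durbin` and the sign convention for k (positive for positively correlated signals) are assumptions, and conventions differ between references.

```python
def levinson_durbin(r, order):
    """Return (alpha, k, err) from autocorrelation values r[0..order].

    alpha: predictive coefficients, x_hat[n] = sum_i alpha[i-1] * x[n-i]
    k:     PARCOR (reflection) coefficients, one per recursion stage
    err:   remaining prediction error energy
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    k = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        km = acc / err
        k.append(km)
        new_a = a[:]
        new_a[m] = km
        for i in range(1, m):
            new_a[i] = a[i] - km * a[m - i]
        a = new_a
        err *= 1.0 - km * km
    return a[1:], k, err


# Autocorrelation of a first-order process with correlation 0.5 between neighbours.
alpha, parcor, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

When every PARCOR coefficient has magnitude below 1, the synthesis filter is stable, which makes stability easy to check after quantization.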

Page 10: CELP (Code Excited Linear Prediction)

[Figure: CELP encoder. An adaptive codebook (periodic) and a random codebook (noise, pulse) each pass through a gain and are summed (+); the sum drives LPC synthesis, which is controlled by the LSP parameters. The perceptual error between the synthesized signal and the input is fed back to the codebook selection (analysis by synthesis).]
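The analysis-by-synthesis loop can be sketched as an exhaustive codebook search: each candidate excitation is synthesized, scaled by its best gain, and compared with the target. This is a simplified illustration under stated assumptions: there is no perceptual weighting and no adaptive codebook, and the names `synthesize` and `search_codebook` are hypothetical.

```python
def synthesize(excitation, alpha):
    """All-pole LPC synthesis, y[n] = e[n] + sum_i alpha[i-1] * y[n-i]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for i in range(1, len(alpha) + 1):
            if n - i >= 0:
                acc += alpha[i - 1] * out[n - i]
        out.append(acc)
    return out


def search_codebook(target, codebook, alpha):
    """Return (index, gain, error) of the code vector whose synthesized,
    gain-scaled output is closest to the target in squared error."""
    best = None
    for index, code in enumerate(codebook):
        y = synthesize(code, alpha)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        gain = sum(t * v for t, v in zip(target, y)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
target = [2.0 * v for v in synthesize(codebook[1], [0.5])]
index, gain, error = search_codebook(target, codebook, [0.5])
```

Only the index and the quantized gain need to be transmitted, because the decoder holds the same codebook and synthesis filter.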

Page 11: Synthesis model for vocoder

[Figure: the excitation is either a pulse train (pitch interval, gain) or a random source; it passes through the summing node Σ into the synthesis filter.]

Page 12: Synthesis model for multi-pulse

[Figure: excitation components are a pitch component (pitch interval, gain) and pulses described by amplitude and position; they are summed (Σ) and drive the synthesis filter.]

Page 13: Synthesis model for regular multi-pulse

[Figure: excitation components are a pitch component (pitch interval, gain) and regularly spaced pulses described by amplitude only; they are summed (Σ) and drive the synthesis filter.]

Page 14: Synthesis model for CELP

[Figure: excitation components are a pitch component (pitch interval, gain) and one code vector selected from a codebook; they are summed (Σ) and drive the synthesis filter.]

Page 15: Synthesis model of VSELP

[Figure: excitation components are a pitch component (pitch interval, gain) and a set of basis vectors, each given a polarity (+/-); they are summed (Σ) and drive the synthesis filter.]

Page 16: Synthesis model for CS-CELP

[Figure: excitation components are a pitch component (pitch interval, gain) and a selected pair of code vectors, each with a polarity (+/-); they are summed (Σ) and drive the synthesis filter.]

Page 17: Synthesis model of ACELP

[Figure: excitation components are a pitch component (pitch interval, gain) and unit pulses, each defined by a selected position and a polarity (+/-); they are summed (Σ) and drive the synthesis filter.]

Simplicity is the seal of truth

Page 18: Frequency-domain

• Lapped transform: MDCT
– overlap between frames gives no frame-boundary noise and no information loss
• Filter bank: QMF
– trade-off between time and frequency resolution
• adaptive noise control
• psycho-acoustics

Page 19: Transform coding

[Figure: transform coding block diagram. Encoder: the input is transformed from time to frequency and quantized; envelope estimation controls adaptive bit allocation and is sent as side information. Decoder: the quantized coefficients are transformed from frequency to time to give the output.]

Page 20: Basis of DCT

[Figure: DCT basis functions plotted against time, ordered by frequency.]

Page 21: Basis of MDCT

[Figure: MDCT basis functions spanning one frame, with one half overlapping the previous frame and the other half overlapping the next frame. Labels on the two overlap regions: symmetry, anti-symmetry.]
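The overlap property can be checked numerically. The sketch below is a direct, unoptimized MDCT with a sine window and overlap-add; it assumes the common textbook definition of the transform and a 2/N scaling of the inverse, and all function names are hypothetical. Each hop of N new samples produces only N coefficients, yet the overlap-add output matches the input.

```python
import math


def mdct(frame):
    """2N windowed samples -> N coefficients."""
    n_half = len(frame) // 2
    return [sum(frame[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                for n in range(2 * n_half))
            for k in range(n_half)]


def imdct(coeffs):
    """N coefficients -> 2N samples that still contain time-domain aliasing."""
    n_half = len(coeffs)
    return [2.0 / n_half * sum(coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                               for k in range(n_half))
            for n in range(2 * n_half)]


def analyze_synthesize(signal, n_half):
    """Window, MDCT, IMDCT, window again, and overlap-add with hop n_half.

    Assumes len(signal) is a multiple of n_half.
    """
    window = [math.sin(math.pi * (n + 0.5) / (2 * n_half)) for n in range(2 * n_half)]
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        frame = [padded[start + n] * window[n] for n in range(2 * n_half)]
        y = imdct(mdct(frame))
        for n in range(2 * n_half):
            out[start + n] += y[n] * window[n]
    return out[n_half:n_half + len(signal)]


signal = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
rebuilt = analyze_synthesize(signal, 4)
```

The aliasing left in each inverse-transformed frame has opposite sign in the two frames that share an overlap region, so it cancels in the sum.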

Page 22: QMF for MPEG-1, 2 Layer-I, II

[Figure: a 32-band QMF filter bank (analysis) splits the frequency axis into bands; after processing, the bit stream is sent to reconstruction and a 32-band QMF filter bank (synthesis).]

• down sample
• adaptive bit allocation for 32 equal bands (energy, masking)
• adaptive quantization

Page 23: QMF for MPEG-1, 2 Layer-III

[Figure: a 32-band QMF filter bank (analysis) splits the frequency axis into bands; after processing, the bit stream is sent to reconstruction and a 32-band QMF filter bank (synthesis).]

• down sample
• long and short MDCT
• adaptive bit allocation for Bark-scale (energy, masking)
• adaptive quantization (Huffman coding), bit reservoir

Page 24: QMF for MPEG extension tools

[Figure: a 32-band QMF filter bank (analysis) splits the frequency axis into bands; after processing, the bit stream is sent to reconstruction and a 32-band QMF filter bank (synthesis).]

• SBR (Spectral Band Replication)
• PS (Parametric Stereo)
• Surround

Page 25: Masking effect

[Figure: log spectrum against frequency, showing the original spectrum, the allowable noise level, the audible level, and the masked region.]

Page 26: Physical and perceptual distortion

[Figure: the original surrounded by an un-noticeable region (masking). Labels: result of compression, additive noise, additive echo, characteristics of perception, application.]

Page 27: Distortion by additive noise

[Figure: original and distortion shown as waveforms against time and as log spectra against frequency. The distortion is noticeable.]

Page 28: Distortion by data compression

[Figure: original and distortion shown as waveforms against time and as log spectra against frequency. Quantization noise is controlled so that the distortion is masked.]

Page 29: Distortion by echo

[Figure: original and distortion shown as waveforms against time (a 40 ms interval is marked) and as log spectra against frequency. The echo is masked. Application labels: watermark, search or recognition.]

Page 30: Predictive coding and transform coding

                    Speech (5 ms)                      Audio (30 ms)
method              time-domain (prediction)           frequency-domain (transform, subband)
large correlation   predictable                        varied spectrum
small correlation   unpredictable                      flat spectrum
gain                prediction gain                 =  transform gain
                    waveform energy / residual energy  arithmetic mean / geometric mean
effect              closed-loop quantization           adaptive bit allocation, weighted quantization
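The two gain definitions on this slide can be written out directly. The sketch below is an illustration only: the function names are hypothetical, the spectrum is taken as a list of positive power values, and the equality of the two gains holds only as an idealized limit, not for arbitrary short frames.

```python
import math


def prediction_gain(waveform, residual):
    """Waveform energy divided by prediction residual energy."""
    return sum(x * x for x in waveform) / sum(e * e for e in residual)


def transform_gain(power_spectrum):
    """Arithmetic mean divided by geometric mean of the power spectrum.

    Assumes every spectral power value is strictly positive.
    """
    n = len(power_spectrum)
    arithmetic = sum(power_spectrum) / n
    geometric = math.exp(sum(math.log(p) for p in power_spectrum) / n)
    return arithmetic / geometric


flat = transform_gain([1.0, 1.0, 1.0, 1.0])    # flat spectrum: no gain
varied = transform_gain([4.0, 1.0])            # varied spectrum: gain above 1
```

A flat spectrum gives a gain of 1, which corresponds to the "small correlation" row: nothing can be predicted and adaptive bit allocation has nothing to exploit.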

Page 31: Standards

Page 32: Example of standard

• ITU-T
– cellular phone
– VoIP
– TV-phone
– FAX
• ISO/IEC JPEG, MPEG
– digital camera, video
– digital broadcasting
– portable music player, DVD

Page 33: Merits of standard

• interoperability
• open source
– long term maintenance
– visible patent holders
• integration of the highest technologies
• cost reduction by mass production

market creation

Page 34: Circulatory evolution of market

[Figure: cycle diagram. Labels: basic research, R & D, disclosure of technology, patent, patent pool, standard, service, product, service and products, competition, cost reduction, convenient, users, market research, royalty.]

Page 35: Standardization for speech

• ITU-T G. series
• IMT-2000 (International Mobile Telecommunications)
• GSM (Europe, Asia)
• TIA (North America)
• US FS-1015 (LPC-10), 1016 (CELP), 1017 (MELP)
• Japanese Cellular
– PDC full/half rate
– PHS
– cdmaOne
– PDC enhanced full rate

Page 36: ITU-T standard for speech

• Telephone band (8 kHz sample)
– G.711 PCM 64 kbit/s
– G.726 ADPCM 32 kbit/s (16, 24, 40 kbit/s)
– G.727 Embedded ADPCM 32 kbit/s (16, 24, 40 kbit/s)
– G.728 Low-delay CELP 16 kbit/s
– G.723.1 ACELP/MP-MLQ 5.3/6.3 kbit/s
– G.729 CS-ACELP 8 kbit/s
• Wide band (16 kHz sample)
– G.722 SB-ADPCM 64, 56, 48 kbit/s
– G.722.1 Transform coding 24, 32 kbit/s
– G.722.2 AMR-WB 6.6 to 24 kbit/s
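The bit rates above become easier to compare when converted to average bits per sample. The helper below is a small arithmetic sketch; the function name `bits_per_sample` is hypothetical, and the figures are the bit rates and sampling rates listed on this slide.

```python
def bits_per_sample(bit_rate_kbps, sampling_rate_khz):
    """Average number of bits spent on each input sample."""
    return bit_rate_kbps / sampling_rate_khz


telephone_band = {
    "G.711 PCM": bits_per_sample(64, 8),
    "G.726 ADPCM": bits_per_sample(32, 8),
    "G.728 Low-delay CELP": bits_per_sample(16, 8),
    "G.729 CS-ACELP": bits_per_sample(8, 8),
}
wide_band = {
    "G.722 SB-ADPCM": bits_per_sample(64, 16),
}
```

Going from G.711 to G.729 reduces the budget from 8 bits to 1 bit per sample, which is why the lower-rate coders rely on prediction and codebooks rather than on quantizing each sample directly.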

Page 37: Standard for IMT-2000

• 3GPP (3rd Generation Partnership Project) (ARIB, TTC, T1, ETSI, TTA)
• 3GPP2
• bi-directional CODEC
– AMR (Adaptive Multi-Rate)
– AMR-WB (wide band)
• video phone (H.263)
• Audio/Low rate speech
• packet transmission (MPEG-4)

Page 38: Bandwidth and bitrate for audio coding

[Figure: bandwidth in kHz (0 to 24) against rate in kbit/s (24 to 768). Labels: MPEG-4, MPEG-1, MPEG-2 (1/2 sample), MPEG-2 multi-channel, AC-3, AAC, MD, CD, DAT.]

Page 39: Basic technology for audio coding

              Transform       Quantization
MPEG-1 L1,2   subband         adaptive bit
MPEG-1 L3     subband+MDCT    adaptive+Huffman
ATRAC         subband+MDCT    adaptive bit
AC-3          MDCT            adaptive+Huffman
AAC           MDCT            adaptive+Huffman
TwinVQ        MDCT            adaptive VQ

Page 40: MPEG-1, 2 audio

• MPEG-1
– sampling rate: 32, 44.1, 48 kHz stereo
– algorithm:
  Layer-I: 32 band split
  Layer-II: + improved quantizer
  Layer-III: + MDCT + variable length + bit reservoir ++
• MPEG-2
– low sampling rate: 16, 22.05, 24 kHz
– multi channel: 5.1ch
– backward compatibility

Page 41: MPEG-2/AAC

• 3 profiles: Main, LC (Low Complexity), SSR (Scalable Sampling Rate)
• sampling rate: 32, 44.1, 48 kHz, plus x2, x1/2, x1/4
• channel: 1-48
• bit rate: 8-576 kbit/s/ch
• MDCT 1024 or 128
• TNS (Temporal Noise Shaping)
• MS (Mid/Side) stereo, intensity stereo
• non-linear scale quantizer + variable length code (2 and 4 dimension Huffman code)
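The idea behind MS stereo can be shown with a sum and difference transform. This is a time-domain sketch under stated assumptions: an actual AAC encoder applies the decision per band on MDCT coefficients, and the names `ms_encode` and `ms_decode` and the 1/2 scaling are illustrative choices.

```python
def ms_encode(left, right):
    """Convert left/right into mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [4.0, 6.0, -2.0]
right = [4.0, 2.0, -2.0]
mid, side = ms_encode(left, right)
```

When the two channels are similar, the side signal is close to zero and costs few bits, while the transform itself loses nothing.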

Page 42: Tools in MPEG-4 audio

• Low rate speech: HVXC (Harmonic Vector eXcitation Coding)
• Speech (narrow/wide): CELP
• Low rate audio: TwinVQ (Transform domain Weighted Interleave VQ)
• Audio: MPEG-2 AAC (Advanced Audio Coding)
• Error resilient framework
• Parametric audio coding: HILN
• Fine granular scalable audio coding: BSAC
• Low delay audio coding: LD-AAC
• Low overhead audio transport: LATM

Page 43: MPEG-4 General audio

[Figure: decoder structure with three alternatives for the MDCT coefficients: TwinVQ (interleave VQ), AAC (scale factor, Huffman coding), and BSAC (scale factor, bit-slice arithmetic coding). Common tools: stereo coding, scalability, TNS, LTP, IMDCT, leading to the output.]

Page 44: Audio Demo (low rate)

• ITU-T G.711 64 kbit/s
• ITU-T G.726 32 kbit/s
• ITU-T G.728 16 kbit/s
• ITU-T G.729 8 kbit/s
• PDC Full 6.7 kbit/s
• PDC Half 3.45 kbit/s
• MPEG-4 HVXC 2 kbit/s
• MPEG-4 TwinVQ 8 kbit/s

Page 45: Hot Topics

Page 46: Background of lossless coding

• Demand for lossless compression of audio
– archiving analog and digital contents
– delivery over broadband network
– high quality audio format (up to 24 bit, 192 kHz sampling)
– multi-channel
• medical data, seismic data, sensor array, etc.
• MPEG-4 extension
– official tools (open source)
– interoperability (good for over 100 years)

Page 47: Family of MPEG lossless

• ALS
– one-step compression in time domain
• SLS
– scalable to lossless from MPEG lossy core
– fine grain scalability in frequency domain
– Integer MDCT
• DST
– 1-bit oversample format
– compatible with Sony-Philips SACD format

Page 48: Property of ALS

• Time domain adaptive prediction
– simple to high-performance backward prediction
– BGMC for prediction residual
– Golomb-Rice code for PARCOR
– Progressive order prediction
– Long-term prediction
– Hierarchical block switching
• extension
– Floating-point support
– Multi-channel predictive coding
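Golomb-Rice coding, named in the list above, can be sketched generically. The code below is not the ALS bitstream format: the signed-to-unsigned mapping, the unary convention (ones terminated by a zero), the string-of-bits representation, and all function names are assumptions chosen for readability.

```python
def zigzag(n):
    """Map signed to non-negative: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)


def rice_encode(values, k):
    """Encode integers as a bit string: unary quotient, then k remainder bits."""
    bits = []
    for v in values:
        u = zigzag(v)
        quotient, remainder = u >> k, u & ((1 << k) - 1)
        bits.append("1" * quotient + "0")
        if k > 0:
            bits.append(format(remainder, "0{}b".format(k)))
    return "".join(bits)


def rice_decode(bits, k, count):
    """Decode `count` integers from a bit string produced by rice_encode."""
    out = []
    pos = 0
    for _ in range(count):
        quotient = 0
        while bits[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1  # skip the terminating zero
        remainder = int(bits[pos:pos + k], 2) if k > 0 else 0
        pos += k
        out.append(unzigzag((quotient << k) | remainder))
    return out


residual = [0, -1, 3, -4, 2]
encoded = rice_encode(residual, 1)
```

Small values get short codes, so the scheme suits residuals that cluster around zero; the parameter k is chosen to match their typical magnitude.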

Page 49: Prediction residual

[Figure: amplitude against time for the original wave and for the prediction residual wave.]

Page 50: Predictive coding

[Figure: the input passes through prediction to give the residual (shown magnified 30 times), which drives synthesis. Three coder types differ in how the residual is represented:
• vocoder: parameters, pulse interval; compression ratio 1/30
• waveform coding: codebook for residual; ratio 1/10
• lossless coding: all residual; ratio 1/2
Caption: different framework, rich commonality.]

Page 51: Compression and decoding time

[Figure: two panels of compression ratio in % (45 to 50) against averaged decoding time for 30 sec files (48, 96, 192 kHz); one panel spans 0 to 15 and the other 20 to 140 sec. Labels: Monkey's Audio (free software), OptimFrog (free software), MPEG-4 SLS, ALS (reference decoder), ALS (high-compression), ALS (enhanced decoder).]

Page 52: Quality improvements by SBR and PS

[Figure: relative quality against stereo bit rate (24 to 144 kbit/s) for MP3, AAC, HE-AAC and HE-AAC V2. Tools and profiles: AAC (AAC profile), SBR (HE-AAC profile), PS (HE-AAC V2 profile). Applications marked: Japanese digital broadcasting (2003), Japanese mobile digital broadcasting (2006).]

Page 53: MPEG SBR (HE-AAC)

[Figure: encoder: the full-band input is downsampled to a low-pass input for the AAC stereo encoder, which produces the AAC stereo bit stream; high frequency analysis (Spectral Band Replication) of envelope and excitation produces the SBR bit stream. Decoder: the AAC stereo decoder gives the low-pass output, and high frequency synthesis turns it into the full-band output.]

Page 54: MPEG SBR+PS (HE-AAC v2)

[Figure: encoder: the stereo input is mixed down to a monaural input for the AAC monaural encoder, which produces the AAC monaural bit stream; PS (parametric stereo) analysis produces the PS bit stream. Decoder: the AAC monaural decoder gives the monaural output, and PS synthesis turns it into the stereo output.]

PS parameters:
• Channel level differences
• Inter channel correlation
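The two parameters named on this slide can be computed for a block of samples as follows. This is a full-band, unquantized sketch: a real parametric stereo encoder works per frequency band and time segment, and the function name `ps_parameters` and the simple average used for the mixdown are assumptions.

```python
import math


def ps_parameters(left, right):
    """Return (mono, cld_db, icc) for one block of stereo samples.

    mono:   mixdown, here the plain average of the two channels
    cld_db: channel level difference in dB (left energy over right energy)
    icc:    inter-channel correlation, normalized to the range -1..1
    """
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    if energy_left == 0.0 or energy_right == 0.0:
        raise ValueError("both channels need non-zero energy in this sketch")
    cross = sum(l * r for l, r in zip(left, right))
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    cld_db = 10.0 * math.log10(energy_left / energy_right)
    icc = cross / math.sqrt(energy_left * energy_right)
    return mono, cld_db, icc


# Right channel is the left channel at half amplitude: about 6 dB apart, fully correlated.
mono, cld_db, icc = ps_parameters([1.0, 2.0, 3.0], [0.5, 1.0, 1.5])
```

Only the mono signal is waveform-coded; the level difference and correlation are compact side information from which the decoder rebuilds a stereo image.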

Page 55: MPEG surround

[Figure: encoder: the 5-ch input is mixed down to a stereo input for the AAC stereo encoder, which produces the AAC stereo bit stream; surround analysis produces the surround bit stream. Decoder: the AAC stereo decoder gives the stereo output, and surround synthesis turns it into the 5-ch output.]

Surround parameters:
• Channel level differences
• Inter channel correlation
• Channel prediction coefficients

Page 56: History of MPEG Audio

[Figure: timeline from 1992 to 2006. Entries: MPEG-1; MPEG-2 MC/LSF*; MPEG-2 AAC; MPEG-4 V1, V2; 2001; SBR; SSC; MP3 on 4; lossless: 2005 DST, ALS, SLS; surround. Annotation: forward and backward compatibility.]

* MC/LSF: Multi-channel and Low Sampling Frequency

Page 57: Future challenge

• Open problems
– universal coder for both speech and audio at less than 16 kbit/s
– wave field synthesis (multi-channel)
• Integrated service
– video
– copyright management