Digital Audio Coding: Perceptual Coding, International Standard, MPEG-1 Layer 3

1

Digital Audio Coding

- Perceptual Coding
- International Standard
- MPEG-1 Layer 3

2

Perceptual Coding

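- Reduce storage space
- Lower bandwidth requirements
- Integrate with other media

[Figure: the original audio reaching the human ear, with the masked audio and the inaudible audio marked.]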

3

Perceptual Coding

- Non-uniform sensitivity: hearing sensitivity is not uniform
- Spectral masking: masking of hearing across the spectrum
- Temporal masking: masking of hearing along the time axis

4

Non-uniform sensitivity

Critical bands

Band no.   Center freq (Hz)   Bandwidth range (Hz)
1          50                 ~100
2          150                100~200
3          250                200~300
4          350                300~400
...        ...                ...
25         19500              15500~
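The table can be read as a lookup from frequency to band number. The following is a minimal sketch: only bands 1 to 4 and band 25 come from the slide, while the edges for the elided rows are the commonly quoted Zwicker values and are an assumption here. The helper name `critical_band` is hypothetical.

```python
from bisect import bisect_right

# Upper band edges in Hz for bands 1..24. The first four edges and the last
# one (15500) follow the slide; the edges in between are the commonly quoted
# Zwicker values and are an assumption, because the slide elides those rows.
UPPER_EDGES_HZ = [
    100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500,
]


def critical_band(freq_hz: float) -> int:
    """Return the 1-based critical band number that contains freq_hz."""
    if freq_hz < 0:
        raise ValueError("frequency must be non-negative")
    # Everything at or above 15500 Hz falls into band 25.
    return bisect_right(UPPER_EDGES_HZ, freq_hz) + 1
```

A frequency that sits exactly on an edge is assigned to the higher band, which is a design choice of this sketch rather than something the slide specifies.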

5

Spectral masking

6

Temporal masking

[Figure: temporal masking ranges around a masker, showing pre-masking and post-masking.]

7

Masking curve

[Figure: minimum masking threshold in dB (vertical axis, 45 to 80 dB) versus subband number (horizontal axis, 0 to 35).]

8

International standards for audio and video coding

Standards body        Audio standard   Typical application   Year finalized
ITU-T                 G.728/G.722      Videoconference       1991
ISO (MPEG-1)          11172-3          PC multimedia         1992
ISO/ITU-T (MPEG-2)    13818-3          Digital movie         1995
ITU-T                 G.723.1          Videophone            1996
ISO (AAC)             13818-7          Digital audio         1997

9

Origins of ISO MPEG-1

- Simplified MUSICAM => MPEG-1 Layer 1
- Modified MUSICAM => MPEG-1 Layer 2
- Modified ASPEC => MPEG-1 Layer 3 (MP3)

- Sound quality: ASPEC is better than MUSICAM
- Complexity: MUSICAM is simpler than ASPEC
- Coding delay: MUSICAM is shorter than ASPEC

10

MPEG-1 Layer I, II

[Block diagram.
Encoder: PCM input -> analysis filterbank -> scaler and quantizer -> mux -> digital channel. Side chain: FFT -> masking thresholds -> dynamic bit and scalefactor allocator and coder.
Decoder: digital channel -> demux -> dequantizer and descaler -> synthesis filterbank -> PCM output. Side chain: dynamic bit and scalefactor decoder.]

11

MPEG-1 Layer III

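[Block diagram.
Encoder: PCM input -> analysis filterbank -> MDCT with dynamic windowing -> scaler and quantizer -> Huffman coding -> mux -> digital channel. Side chain: FFT -> masking thresholds -> coding of side information.
Decoder: digital channel -> demux -> Huffman decoding -> dequantizer and descaler -> inverse MDCT with dynamic windowing -> synthesis filterbank -> PCM output. Side chain: decoding of side information.]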

12

Characteristics of each MPEG-1 Audio layer

Feature                          Layer I           Layer II          Layer III
Analysis/synthesis               32 subbands       32 subbands       Subband + MDCT
Output bit rate                  32-448 kbps       32-384 kbps       32-320 kbps
Efficient bit rate               160-224 kbps      96-128 kbps       64-96 kbps
Sampling freq.                   32, 44.1, 48 kHz  32, 44.1, 48 kHz  32, 44.1, 48 kHz
Intensity stereo                 Yes               Yes               Yes
Quantization                     Uniform           Uniform           Non-uniform
Window                           Fixed             Fixed             Dynamic
Entropy coding                   No                No                Yes
Slot size                        4 bytes           1 byte            1 byte
Frame size                       384 samples       1152 samples      1152 samples
Bit-allocation representation    Explicit          Indexing          Indexing
Frame self-decodable             Yes               Yes               No
Suggested psychoacoustic model   Model 1           Model 1           Model 2

13

Quantization

[Figure: uniform quantization versus non-uniform quantization.]

14

Syntax of MPEG-1 Audio

Bit stream: Frame | Frame | Frame | Frame | Frame

Frame: Header | Error check | Audio data | Ancillary data

Header field          Bits
Syncword              12
ID                    1
Layer                 2
Protection bit        1
Bit rate index        4
Sampling frequency    2
Padding bit           1
Private bit           1
Mode                  2
Mode extension        2
Copyright             1
Original/copy         1
Emphasis              2
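The field widths listed above add up to 32 bits, so a header can be unpacked from four bytes with shifts and masks. A minimal sketch, assuming the fields are packed most significant bit first in the order shown; the function name `parse_header` and the dictionary keys are hypothetical.

```python
# (name, width in bits), in transmission order as listed on the slide.
HEADER_FIELDS = [
    ("syncword", 12), ("id", 1), ("layer", 2), ("protection_bit", 1),
    ("bitrate_index", 4), ("sampling_frequency", 2), ("padding_bit", 1),
    ("private_bit", 1), ("mode", 2), ("mode_extension", 2),
    ("copyright", 1), ("original_copy", 1), ("emphasis", 2),
]


def parse_header(data: bytes) -> dict:
    """Split the first 4 bytes of a frame into the raw header field values."""
    if len(data) < 4:
        raise ValueError("a header needs 4 bytes")
    word = int.from_bytes(data[:4], "big")
    fields = {}
    shift = 32
    for name, width in HEADER_FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    if fields["syncword"] != 0xFFF:
        raise ValueError("syncword not found")
    return fields
```

For example, `parse_header(bytes.fromhex("fffb9064"))` yields layer bits `01`, bit rate index `1001`, and mode bits `01`. The values are returned as raw codes; mapping them to meanings is left to the tables on the following slides.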

15

Header (1)

Syncword = '1111 1111 1111'

ID:
  1    MPEG audio
  0    reserved

Layer:
  11   Layer I
  10   Layer II
  01   Layer III
  00   reserved

Protection bit:
  0    CRC check present
  1    CRC check absent

16

Header (2): bit rate index

Bit-rate index   Layer I       Layer II      Layer III
0000             Free format   Free format   Free format
0001             32 kbit/s     32 kbit/s     32 kbit/s
0010             64 kbit/s     48 kbit/s     40 kbit/s
0011             96 kbit/s     56 kbit/s     48 kbit/s
0100             128 kbit/s    64 kbit/s     56 kbit/s
0101             160 kbit/s    80 kbit/s     64 kbit/s
0110             192 kbit/s    96 kbit/s     80 kbit/s
0111             224 kbit/s    112 kbit/s    96 kbit/s
1000             256 kbit/s    128 kbit/s    112 kbit/s
1001             288 kbit/s    160 kbit/s    128 kbit/s
1010             320 kbit/s    192 kbit/s    160 kbit/s
1011             352 kbit/s    224 kbit/s    192 kbit/s
1100             384 kbit/s    256 kbit/s    224 kbit/s
1101             416 kbit/s    320 kbit/s    256 kbit/s
1110             448 kbit/s    384 kbit/s    320 kbit/s
1111             forbidden     forbidden     forbidden
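The bit rate table translates directly into a lookup keyed by layer and index. A minimal sketch; the name `bitrate_kbps` and the choice to return `None` for free format are assumptions made for illustration.

```python
# Bit rates in kbit/s for index 0001..1110, copied from the table above.
BITRATES_KBPS = {
    1: [32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448],
    2: [32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384],
    3: [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
}


def bitrate_kbps(layer: int, index: int):
    """Return the bit rate in kbit/s, or None for free format (index 0)."""
    if layer not in BITRATES_KBPS:
        raise ValueError("layer must be 1, 2 or 3")
    if not 0 <= index <= 15:
        raise ValueError("bit rate index is a 4-bit value")
    if index == 15:
        raise ValueError("bit rate index 1111 is forbidden")
    if index == 0:
        return None  # free format: the rate is not given by the table
    return BITRATES_KBPS[layer][index - 1]
```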

17

Header (3)

Sampling frequency:
  00   44.1 kHz
  01   48 kHz
  10   32 kHz
  11   reserved

Padding bit:
  0    without padding slot
  1    with padding slot

Private bit: 1 bit for private usage
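The padding slot makes sense once the frame length is computed from the bit rate and sampling frequency. The sketch below combines the frame sizes and slot sizes from the layer comparison table earlier in the deck (384 samples and 4-byte slots for Layer I, 1152 samples and 1-byte slots for Layers II and III). Rounding down to whole slots and adding one slot when the padding bit is set is an assumption based on common decoder practice, and `frame_length_bytes` is a hypothetical name.

```python
SAMPLES_PER_FRAME = {1: 384, 2: 1152, 3: 1152}
SLOT_BYTES = {1: 4, 2: 1, 3: 1}


def frame_length_bytes(layer: int, bitrate_bps: int, sample_rate_hz: int,
                       padding_bit: int) -> int:
    """Frame length in bytes for a given bit rate and sampling frequency."""
    slot = SLOT_BYTES[layer]
    # A frame lasts samples / sample_rate seconds, so it carries
    # samples * bitrate / sample_rate bits; convert that to whole slots.
    slots = SAMPLES_PER_FRAME[layer] * bitrate_bps // (8 * slot * sample_rate_hz)
    return (slots + padding_bit) * slot
```

At 44.1 kHz the division is not exact, which is why an encoder needs the padding bit: it alternates between the shorter and the longer frame so that the average bit rate stays on target.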

18

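Header (4)

Mode:
  00   stereo
  01   joint stereo
  10   dual channel
  11   single channel

Mode extension (Layer I, II):
  00   subbands 4-31 in joint (intensity) stereo
  01   subbands 8-31 in joint (intensity) stereo
  10   subbands 12-31 in joint (intensity) stereo
  11   subbands 16-31 in joint (intensity) stereo

Copyright: "1" means copyright protected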

19

Joint Stereo

Stereo -> Joint stereo

- MS stereo (Layer III only): left/right are transformed to middle/side
- Intensity stereo
  - Layer I, II: specify two channel scale factors on common data
  - Layer III: specify one ratio on common data

20

Header (5)

Original/copy:
  0    copy
  1    original

Emphasis:
  00   no emphasis
  01   50/15 microseconds
  10   reserved
  11   CCITT J.17

21

Error Check

A 16-bit parity-check word is used for optional error detection.

The generator polynomial is:

$$G(X) = X^{16} + X^{15} + X^{2} + 1$$
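The generator polynomial corresponds to the 16-bit constant 0x8005 once the leading X^16 term is made implicit. The following is a minimal bitwise sketch of such a CRC, processing the most significant bit first. The initial register value of 0xFFFF is an assumption not stated on the slide, and the function name `crc16` is hypothetical.

```python
def crc16(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16 over data with generator X^16 + X^15 + X^2 + 1."""
    poly = 0x8005  # X^15 + X^2 + 1; the X^16 term is implied by the shift-out
    crc = init
    for byte in data:
        crc ^= byte << 8  # feed the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

This version works on whole bytes. That fits Layer III, where the protected header bits and side information are byte aligned; Layers I and II protect bit ranges that may not end on a byte boundary, so a real decoder would feed the register bit by bit.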

22

Protection Range

- Layer I: bits 16-31 of the header, plus the bit allocation
- Layer II: bits 16-31 of the header, plus the bit allocation and the scalefactor selection information
- Layer III: bits 16-31 of the header, plus the side information: 136 bits (single channel) or 256 bits (other modes)

Frame: Header | Error check | Audio data | Ancillary data

23

Layer III Coding(MP3)

[Block diagram: audio in -> 32-band filterbank -> 18-line MDCT (576 spectral lines) -> non-uniform quantization -> Huffman coding -> multiplex -> bit stream output. Side chain: perception model (masking calculation) -> bit allocation; side information is also fed to the multiplexer.]

24

Advanced Features in Layer III

- Higher frequency resolution (576 spectral lines)
- Variable length of data frame
- Adaptive windowing
- Variable length coding (32 VLC tables)
- Two granules in one frame
- Spectral region partition
- Non-uniform scale factor bands
- Special stereo mode extension

25

Variable Length of Main Data

Frame is not self-decodable

[Figure: four consecutive frames (frame 1 to frame 4), each starting with sync and side info. The pointers main_data_begin1 to main_data_begin4 mark where the main data of each frame starts, which may lie inside an earlier frame.]

26

Adaptive Windowing

Pre-echo

[Figure: original signal versus reconstructed signal.]

27

Adaptive Windowing

[Figure: the four window shapes: normal window, start window, short window, stop window.]

28

Variable Length Coding

Fixed length coding

Symbol   Code   Probability
A        00     0.5
B        01     0.25
C        10     0.125
D        11     0.125

Average length: 2*0.5 + 2*0.25 + 2*0.125 + 2*0.125 = 2 bits

Variable length coding

Symbol   Code   Probability
A        0      0.5
B        10     0.25
C        110    0.125
D        111    0.125

Average length: 1*0.5 + 2*0.25 + 3*0.125 + 3*0.125 = 1.75 bits
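The two averages can be checked by summing code length times probability over each table. A minimal sketch; the function names are hypothetical. For these probabilities the variable length code reaches the source entropy, so no prefix code can do better.

```python
import math

# symbol -> (code, probability), copied from the two tables on the slide
FIXED = {"A": ("00", 0.5), "B": ("01", 0.25), "C": ("10", 0.125), "D": ("11", 0.125)}
VARIABLE = {"A": ("0", 0.5), "B": ("10", 0.25), "C": ("110", 0.125), "D": ("111", 0.125)}


def average_code_length(table: dict) -> float:
    """Expected number of bits per symbol for a code table."""
    return sum(len(code) * prob for code, prob in table.values())


def entropy_bits(table: dict) -> float:
    """Shannon entropy of the symbol probabilities, in bits per symbol."""
    return -sum(prob * math.log2(prob) for _, prob in table.values())
```

Here `average_code_length(FIXED)` gives 2 bits and `average_code_length(VARIABLE)` gives 1.75 bits per symbol.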

29

Granule

[Figure: 1152 samples pass through the filter banks and transform, producing granule 0 (576 spectral lines) and granule 1 (576 spectral lines).]

30

Spectral region partition

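- Big value range (coded in pairs)
- One value range (coded in quadruples)
- Zero range

[Figure: the 576 spectral lines in order of frequency. The big value range ends at line 2*big_values, the one value range ends at line 2*big_values + 4*count1, and the zero range runs from there to line 576.]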
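The three regions follow from two numbers, big_values and count1. A minimal sketch of the partition; the function name `partition_spectrum` and the returned dictionary keys are hypothetical.

```python
def partition_spectrum(big_values: int, count1: int, total_lines: int = 576) -> dict:
    """Split the spectral lines of one granule into the three coding regions."""
    big_end = 2 * big_values           # big values are coded in pairs
    count1_end = big_end + 4 * count1  # one values are coded in quadruples
    if big_values < 0 or count1 < 0 or count1_end > total_lines:
        raise ValueError("regions do not fit into the granule")
    return {
        "big_values": range(0, big_end),
        "count1": range(big_end, count1_end),
        "zero": range(count1_end, total_lines),
    }
```

For example, `big_values = 100` and `count1 = 50` give lines 0 to 199, 200 to 399, and 400 to 575 respectively.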

31

Non-uniform Scale Factor Bands (long blocks, 32 kHz)

SF band   Range
0         0~3
1         4~7
2         8~11
3         12~15
4         16~19
5         20~23
6         24~29
7         30~35
8         36~43
9         44~53
10        54~65
11        66~81
12        82~101
13        102~125
14        126~155
15        156~193
16        194~239
17        240~295
18        296~363
19        364~447
20        448~549
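Because the bands get wider toward high frequencies, finding the band of a spectral line is a search over the band start positions. A minimal sketch using the long-block 32 kHz table above; the name `sfb_for_line` is hypothetical, and returning `None` for lines 550 to 575 is a choice of this sketch because the table stops at line 549.

```python
from bisect import bisect_right

# First spectral line of scale factor bands 0..20 (long blocks, 32 kHz).
SFB_LONG_32K_START = [0, 4, 8, 12, 16, 20, 24, 30, 36, 44, 54, 66, 82,
                      102, 126, 156, 194, 240, 296, 364, 448]
SFB_LONG_32K_END = 550  # first line past band 20


def sfb_for_line(line: int):
    """Return the scale factor band holding a spectral line, or None above band 20."""
    if not 0 <= line < 576:
        raise ValueError("a granule has spectral lines 0..575")
    if line >= SFB_LONG_32K_END:
        return None
    return bisect_right(SFB_LONG_32K_START, line) - 1


def sfb_widths():
    """Width in spectral lines of each band, to show the non-uniform spacing."""
    edges = SFB_LONG_32K_START + [SFB_LONG_32K_END]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]
```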

32

Non-uniform Scale Factor Bands (short blocks, 32 kHz)

SF band   Range
0         0~3
1         4~7
2         8~11
3         12~15
4         16~21
5         22~29
6         30~41
7         42~57
8         58~77
9         78~103
10        104~137
11        138~179

33

Layer III Mode Extension

When mode = '01' (joint stereo):

Mode_extension   MS_stereo   Intensity_stereo
00               -           -
01               -           On
10               On          -
11               On          On

34

MS_stereo

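- Sum/difference is coded instead of left/right
- This mode is effective when difference << sum, because the S channel becomes sparse

$$\begin{pmatrix} M \\ S \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} L \\ R \end{pmatrix}$$

$$\begin{pmatrix} L \\ R \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} M \\ S \end{pmatrix}$$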
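The middle/side transform can be tried on a few samples to see why it helps when the two channels are similar. A minimal sketch, assuming the normalization by the square root of 2 so that the same matrix serves for encoding and decoding; the function names are hypothetical.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def ms_encode(left, right):
    """Transform left/right spectra into middle (sum) and side (difference)."""
    mid = [(l + r) * INV_SQRT2 for l, r in zip(left, right)]
    side = [(l - r) * INV_SQRT2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right; the transform is its own inverse."""
    left = [(m + s) * INV_SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) * INV_SQRT2 for m, s in zip(mid, side)]
    return left, right
```

With nearly identical channels, for example `left = [1.0, 2.0]` and `right = [1.0, 2.1]`, the side values stay close to zero and therefore cost few bits after quantization.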

35

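Intensity Stereo (Layer III)

Stereo: left and right are coded as separate channels.

Intensity stereo: both channels are derived from one transmitted spectrum $L_i^{O}$:

$$L_i = L_i^{O} \cdot \frac{r_i}{1 + r_i}$$

$$R_i = L_i^{O} \cdot \frac{1}{1 + r_i}$$

where $r_i = \tan\left(ip_i \cdot \frac{\pi}{12}\right)$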
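The intensity stereo formulas split one transmitted value between the two channels according to a position index. A minimal sketch of that split; the valid range 0 to 6 for the position and the special handling of position 6 (where the tangent is unbounded) are assumptions not stated on the slide, and `intensity_decode` is a hypothetical name.

```python
import math


def intensity_decode(common: float, is_pos: int):
    """Split one transmitted spectral value into (left, right)."""
    if not 0 <= is_pos <= 6:
        raise ValueError("intensity position assumed to lie in 0..6")
    if is_pos == 6:
        # tan(pi/2) is unbounded: everything goes to the left channel.
        return common, 0.0
    ratio = math.tan(is_pos * math.pi / 12)
    return common * ratio / (1 + ratio), common / (1 + ratio)
```

Position 0 sends everything to the right channel, position 3 splits the value evenly, and in every case left plus right equals the transmitted value.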

36

MS + Intensity Stereo (Layer III)

MS + intensity stereo:

$$L_i = L_i^{O} \cdot \frac{r_i}{1 + r_i}$$

$$R_i = L_i^{O} \cdot \frac{1}{1 + r_i}$$

where $r_i = \tan\left(ip_i \cdot \frac{\pi}{12}\right)$

37

Layer III Audio Data

Frame: Header | Error check | Audio data | Ancillary data

Audio data:
- Side information (description field)
  - Main data begin, private bits
  - Scale factor selection information
  - Granule 0 information
  - Granule 1 information
- Main data field (variable length)
  - Scale factors (granule 0)
  - Huffman data field (granule 0)
  - Scale factors (granule 1)
  - Huffman data field (granule 1)

38

Side information (sizes in bits)

                                   Unit   Window_switch_flag=1   Window_switch_flag=0
Field                                     1 ch      2 ch         1 ch      2 ch
Main_data_begin                    9      9         9            9         9
Private_bits                       5,3    5         3            5         3
Scfsi[ch][scfsi_band]              1      4         8            4         8
Part2_3_length[gr][ch]             12     24        48           24        48
Big_values[gr][ch]                 9      18        36           18        36
Global_gain[gr][ch]                8      16        32           16        32
Scalefac_compress[gr][ch]          4      8         16           8         16
Window_switching_flag[gr][ch]      1      2         4            2         4
Block_type[gr][ch]                 2      4         8            -         -
Mixed_block_flag[gr][ch]           1      2         4            -         -
Table_select[gr][ch][region]       5      20        40           30        60
Subblock_gain[gr][ch][window]      3      18        36           -         -
Region0_count[gr][ch]              4      -         -            8         16
Region1_count[gr][ch]              3      -         -            6         12
Preflag[gr][ch]                    1      2         4            2         4
Scalefac_scale[gr][ch]             1      2         4            2         4
Count1table_select[gr][ch]         1      2         4            2         4
Sum                                       136       256          136       256
Bytes                                     17        32           17        32
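The totals in the table can be reproduced by adding the per-field widths for two granules and one or two channels. A minimal sketch; the function name `side_info_bits` is hypothetical, and it applies one window switching setting to every granule and channel, as the table does.

```python
def side_info_bits(channels: int, window_switching: bool) -> int:
    """Total side information size in bits for one Layer III frame."""
    if channels not in (1, 2):
        raise ValueError("channels must be 1 or 2")
    # Fields sent once per frame: main_data_begin, private_bits, scfsi.
    bits = 9 + (5 if channels == 1 else 3) + 4 * channels
    # Fields sent per granule and channel.
    per_gr_ch = 12 + 9 + 8 + 4 + 1  # part2_3_length .. window_switching_flag
    if window_switching:
        # block_type, mixed_block_flag, 2 table_select, 3 subblock_gain
        per_gr_ch += 2 + 1 + 2 * 5 + 3 * 3
    else:
        # 3 table_select, region0_count, region1_count
        per_gr_ch += 3 * 5 + 4 + 3
    per_gr_ch += 1 + 1 + 1  # preflag, scalefac_scale, count1table_select
    return bits + 2 * channels * per_gr_ch
```

Both branches of the window switching choice add 22 bits, which is why the totals do not depend on the flag.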

39

Main Data Begin

A 9-bit unsigned integer that gives the location of the frame's main data as a negative offset, in bytes, from the first byte of the audio syncword.

[Figure: four consecutive frames (frame 1 to frame 4), each starting with sync and side info, with the pointers main_data_begin1 to main_data_begin4 marking where the main data of each frame starts.]

40

Scfsi Bands

- If scfsi[scfsi_band] = 0, scale factors are transmitted for each granule
- If scfsi[scfsi_band] = 1, scale factors are transmitted for granule 0 only; they are also valid for granule 1

Scfsi band   SF bands   Range
0            0~5        0~23
1            6~10       24~65
2            11~15      66~193
3            16~20      194~549
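The scfsi rule can be sketched as a merge: for each scfsi band, granule 1 either reuses the scale factors of granule 0 or takes newly transmitted ones. A minimal sketch for long blocks with 21 scale factor bands; the function name and argument layout are hypothetical.

```python
# Scale factor bands covered by each of the four scfsi bands (from the table).
SCFSI_BANDS = [range(0, 6), range(6, 11), range(11, 16), range(16, 21)]


def granule1_scalefactors(scfsi, granule0, transmitted):
    """Build the 21 scale factors of granule 1.

    scfsi: four flags, one per scfsi band.
    granule0: the 21 scale factors decoded for granule 0.
    transmitted: values actually sent for granule 1, in band order.
    """
    sent = iter(transmitted)
    result = []
    for flag, bands in zip(scfsi, SCFSI_BANDS):
        for sfb in bands:
            # flag = 1: share granule 0's value; flag = 0: read a new value
            result.append(granule0[sfb] if flag else next(sent))
    return result
```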

41

Part2_3_length

- The number of main data bits, where main data bits = scale factor bits + Huffman code data bits
- Used to calculate the beginning of the next granule (or of the ancillary data)

[Figure: part2_3_length spanning the main data of one granule within the sequence of frames (sync, side info, main data).]

42

Big_values

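- Big value range = 2*big_values (big_values is a 9-bit field)
- One value range
- Zero range

[Figure: spectral lines 0 to 576 split at 2*big_values and at 2*big_values + 4*count1; the amplitude axis is marked up to 8191.]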

43

Global_gain

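Dequantization of a spectral value $s_i$:

$$r_i = \operatorname{sign}(s_i)\,|s_i|^{4/3} \cdot 2^{\frac{1}{4}\left(\mathrm{global\_gain}[gr] - 210\right)} \cdot 2^{-\mathrm{subgain}}$$

Long blocks:

$$\mathrm{subgain}[sfb][ch][gr] = \mathrm{scalefac\_multiplier} \cdot \left(\mathrm{scalefac\_l}[sfb][ch][gr] + \mathrm{preflag}[gr] \cdot \mathrm{pretab}[sfb]\right)$$

Short blocks:

$$\mathrm{subgain}[sfb][ch][gr][window] = 2 \cdot \mathrm{subblock\_gain}[ch][gr][window] + \mathrm{scalefac\_multiplier} \cdot \mathrm{scalefac\_s}[gr][ch][sfb][window]$$

- pretab[sfb]: value from the pre-emphasis table
- 210: constant used to scale the output properly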
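The dequantization step can be sketched numerically: a 4/3 power law on the quantized value, a global gain in steps of a quarter of a power of two, and a per-band attenuation. This is a minimal sketch of the formulas as read from the slide; the function names are hypothetical, and the mapping from scalefac_scale to the multiplier (0 gives 1/2, 1 gives 1) is taken from the Scalefac_scale slide later in the deck.

```python
import math


def subgain_long(scalefac_l: int, preflag: int, pretab: int, scalefac_scale: int) -> float:
    """Per-band attenuation exponent for long blocks."""
    multiplier = 1.0 if scalefac_scale else 0.5
    return multiplier * (scalefac_l + preflag * pretab)


def subgain_short(scalefac_s: int, subblock_gain: int, scalefac_scale: int) -> float:
    """Per-band, per-window attenuation exponent for short blocks."""
    multiplier = 1.0 if scalefac_scale else 0.5
    return 2 * subblock_gain + multiplier * scalefac_s


def dequantize(s: int, global_gain: int, subgain: float = 0.0) -> float:
    """Reconstruct a spectral value r from its quantized integer s."""
    if s == 0:
        return 0.0
    magnitude = abs(s) ** (4.0 / 3.0)
    scale = 2.0 ** ((global_gain - 210) / 4.0 - subgain)
    return math.copysign(magnitude * scale, s)
```

With `global_gain = 210` and no subgain the scale factor is exactly 1, and each increase of 4 in global_gain doubles the output.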

44

Scalefac_compress

Selects the number of bits used for the transmission of scale factors.

Scalefac_compress   Slen1   Slen2
0                   0       0
1                   0       1
2                   0       2
3                   0       3
4                   3       0
5                   1       1
6                   1       2
7                   1       3
8                   2       1
9                   2       2
10                  2       3
11                  3       1
12                  3       2
13                  3       3
14                  4       2
15                  4       3

45

Scale factor length (slen1, slen2)

Block_type   Mixed_block_flag   SF bands using slen1   SF bands using slen2
0, 1, 3      -                  0~10                   11~20
2            0                  0~5                    6~11
2            1                  0~7 (L) + 3~5 (S)      6~11
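Together, the scalefac_compress table and the band ranges determine how many bits the scale factors of a granule occupy. A minimal sketch, assuming three windows per short-block band and ignoring any scale factors that granule 1 shares with granule 0 through scfsi; the function name `scalefactor_bits` is hypothetical.

```python
# (slen1, slen2) for scalefac_compress 0..15, from the Scalefac_compress table.
SLEN = [(0, 0), (0, 1), (0, 2), (0, 3), (3, 0), (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3), (4, 2), (4, 3)]


def scalefactor_bits(scalefac_compress: int, block_type: int,
                     mixed_block_flag: int = 0) -> int:
    """Number of bits spent on scale factors in one granule and channel."""
    slen1, slen2 = SLEN[scalefac_compress]
    if block_type != 2:
        # Long blocks: bands 0~10 use slen1, bands 11~20 use slen2.
        return 11 * slen1 + 10 * slen2
    if mixed_block_flag:
        # 8 long bands plus short bands 3~5 (3 windows each) use slen1.
        return (8 + 3 * 3) * slen1 + (6 * 3) * slen2
    # Pure short blocks: bands 0~5 and 6~11, 3 windows each.
    return (6 * 3) * slen1 + (6 * 3) * slen2
```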

46

Region0_count, Region1_count[gr][ch]

- Region0_count: 4 bits
- Region1_count: 3 bits
- Region0_count + 1 = number of scale factor bands in region 0
- Region1_count + 1 = number of scale factor bands in region 1

[Figure: the big values region, from line 0 to line 2*big_values, divided into region 0, region 1 and region 2.]

47

Window_switching_flag

If window_switch_flag == 1, then block_type != 0 and the region counts take fixed values:

Block_type   Mixed_block_flag   Region0   Region1
1, 3         -                  7         36
2            0                  7         36
2            1                  8         36

If window_switch_flag == 0, then block_type = 0 and the region counts are transmitted:

Block_type   Mixed_block_flag   Region0      Region1
0            Not present        Designated   Designated

48

Block_type

Block_type   Meaning
0            -
1            Start block
2            3 short windows
3            End block

49

Mixed_block_flag[gr][ch]

- If mixed_block_flag = 0:
  - All 32 sub-bands are transformed with block_type[gr][ch]
- If mixed_block_flag = 1:
  - The two lowest sub-bands are transformed with long blocks
  - The other 30 sub-bands are transformed with block_type[gr][ch]

50

Table_select[gr][ch][region]

- 5 bits for each region, each channel, each granule
- Selects one of the 32 possible Huffman tables for the big values region

Example Huffman table:

X   Y   Hlen   Hcod
0   0   1      1
0   1   3      010
0   2   6      000001
1   0   3      011
1   1   3      001
1   2   5      00001
2   0   5      00011
2   1   5      00010
2   2   6      000000
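Because no code word in the example table is a prefix of another, a bit string can be decoded by extending the current code one bit at a time until it matches. A minimal sketch operating on a string of '0' and '1' characters; sign bits, which a real bit stream attaches to nonzero values, are left out, and the function name `decode_pairs` is hypothetical.

```python
# Hcod -> (X, Y), copied from the example table above.
HUFFMAN_TABLE = {
    "1": (0, 0), "010": (0, 1), "000001": (0, 2),
    "011": (1, 0), "001": (1, 1), "00001": (1, 2),
    "00011": (2, 0), "00010": (2, 1), "000000": (2, 2),
}


def decode_pairs(bits: str):
    """Decode a string of '0'/'1' characters into a list of (X, Y) pairs."""
    pairs = []
    code = ""
    for bit in bits:
        if bit not in ("0", "1"):
            raise ValueError("bits must contain only '0' and '1'")
        code += bit
        if code in HUFFMAN_TABLE:
            pairs.append(HUFFMAN_TABLE[code])
            code = ""
    if code:
        raise ValueError("bit string ends in the middle of a code word")
    return pairs
```

The most probable pair (0, 0) costs a single bit, while the rare pair (2, 2) costs six.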

51

Subblock_gain[gr][ch][window]

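- Used only for block type 2
- 3 bits for each window, each channel, each granule

$$\mathrm{subgain}[sfb][ch][gr][window] = 2 \cdot \mathrm{subblock\_gain}[ch][gr][window] + \mathrm{scalefac\_multiplier} \cdot \mathrm{scalefac\_s}[gr][ch][sfb][window]$$

$$r_i = \operatorname{sign}(s_i)\,|s_i|^{4/3} \cdot 2^{\frac{1}{4}\left(\mathrm{global\_gain}[gr] - 210\right)} \cdot 2^{-\mathrm{subgain}}$$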

52

Preflag[gr][ch]

- If preflag = 1, internal pre-emphasis is used
- If block_type = 2, preflag is never used

SF band   Pretab   Range
0         0        0~3
1         0        4~7
2         0        8~11
3         0        12~15
4         0        16~19
5         0        20~23
6         0        24~29
7         0        30~35
8         0        36~43
9         0        44~53
10        0        54~65
11        1        66~81
12        1        82~101
13        1        102~125
14        1        126~155
15        2        156~193
16        2        194~239
17        3        240~295
18        3        296~363
19        3        364~447
20        2        448~549

53

Scalefac_scale[gr][ch]

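- If scalefac_scale = 0, scalefac_multiplier = 1/2
- If scalefac_scale = 1, scalefac_multiplier = 1

$$r_i = \operatorname{sign}(s_i)\,|s_i|^{4/3} \cdot 2^{\frac{1}{4}\left(\mathrm{global\_gain}[gr] - 210\right)} \cdot 2^{-\mathrm{subgain}}$$

- Long blocks:

$$\mathrm{subgain}[sfb][ch][gr] = \mathrm{scalefac\_multiplier} \cdot \left(\mathrm{scalefac\_l}[sfb][ch][gr] + \mathrm{preflag}[gr] \cdot \mathrm{pretab}[sfb]\right)$$

- Short blocks:

$$\mathrm{subgain}[sfb][ch][gr][window] = 2 \cdot \mathrm{subblock\_gain}[ch][gr][window] + \mathrm{scalefac\_multiplier} \cdot \mathrm{scalefac\_s}[gr][ch][sfb][window]$$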

54

Count1table_select

- If count1table_select = 0, Table A is selected (Huffman coding)
- If count1table_select = 1, Table B is selected (fixed 4 bits)

[Figure: spectral lines 0 to 576; the range between 2*big_values and 2*big_values + 4*count1 is coded as quadruples.]