57
1 Case Study: DCT 賴瑞欣(Larry) Slides are prepared by Prof. Shao-Yi Chien, Information Theory and Coding Technique

Case Study: DCT - CSIEc.csie.org/~itct/slide/DCT_larry.pdf · 1 Case Study: DCT 賴瑞欣(Larry) Slides are prepared by Prof. Shao-Yi Chien, Information Theory and Coding Technique

  • Upload
    lehanh

  • View
    220

  • Download
    5

Embed Size (px)

Citation preview

1

Case Study: DCT

賴瑞欣(Larry) Slides are prepared by Prof. Shao-Yi Chien,

Information Theory and Coding Technique

DSP in VLSI Design Shao-Yi Chien 2

Outline

DCT Algorithm

1-D DCT : Row-Column Method

Direct 2-D Architecture

3

DCT Algorithm

DSP in VLSI Design

Discrete Fourier Transform (DFT)

Shao-Yi Chien 4

DSP in VLSI Design

Discrete Cosine Transform (DCT)

Shao-Yi Chien 5

DSP in VLSI Design

Shao-Yi Chien 6

DSP in VLSI Design

Shao-Yi Chien 7

DSP in VLSI Design Shao-Yi Chien 8

Discrete Cosine Transform

DCT

block size 8 x 8

( ie.N=8 )

DSP in VLSI Design Shao-Yi Chien 9

DCT-Based Coding

Optimal transform is KLT, but

KLT is image dependent

High computing complexity

DCT-based coding,

Image independent, unlike KLT for highly

correlated image data

DCT compaction efficiency is close to KLT

Computations of DCT can be performed with fast

algorithms which can be easily implemented on

parallel architectures.

DSP in VLSI Design Shao-Yi Chien 10

Roles in Video Encoder

DCT Q

IQ

IDCT

Frame(s)

Memory

Motion

Estimation

Entropy

Coding

Rate Control

Compressed

Data T

Motion Vector

Q-Step

Original

SequenceM

u

l

t

i

p

l

e

x

e

r

Bitstream

Motion

Coding

Compressed

Data M

DSP in VLSI Design Shao-Yi Chien 11

Roles in Video Decoder

IQ IDCT

Frame(s)

Memory

Entropy

Decoding

Compressed

Data T

Motion Vector

Q-StepD

e

m

u

l

t

i

p

l

e

x

e

r

Bitstream

Motion

Decoding

Reconstructed

Sequence

Compressed

Data M

DSP in VLSI Design Shao-Yi Chien 12

Hardware/Software Trade-Off

For low-end applications, using software

approach is powerful enough

For high-end applications, must use hardware

approach

For middle-end applications, either software or

hardware approach is possible, depending on

the target design platform

DSP in VLSI Design Shao-Yi Chien 13

Basic Transformation Forms

2-D forward transforms

2-D inverse transforms

1

0

1

0),,,(),(),(

N

x

N

yvuyxfyxgvuT

1

0

1

0),,,(),(),(

N

u

N

vvuyxivuTyxg

DSP in VLSI Design Shao-Yi Chien 14

1-D Discrete Cosine Transform

Forward DCT

Backward DCT

DSP in VLSI Design Shao-Yi Chien 15

1D 8-Point DCT Basis Functions

k = 0

k = 1

k = 5

k = 3

k = 4 k = 7

k = 2

k = 6

DSP in VLSI Design Shao-Yi Chien 16

2-D Discrete Cosine Transform

Forward DCT

Backward DCT

otherwise , 1

0,for , 2/1)(),( where

nmnumu

Block size = N x N

X(m,n) : DCT coefficients

x(i,j) : image samples in block

DSP in VLSI Design Shao-Yi Chien 17

2-D 8x8 DCT Basis Functions

The DCT represents each block of image

samples as a weighted sum of 2-D cosine

functions (basis functions)

DSP in VLSI Design Shao-Yi Chien 18

DCT Coefficients

dc Low

frequencies

Medium

frequencies

High

frequencies

dc Vertical edges

High

frequencies

DSP in VLSI Design Shao-Yi Chien 19

An Example of Energy Compaction

DSP in VLSI Design Shao-Yi Chien 20

An Example of DCT

DC: F(0,0)= (1/8) SS f(m,n)

related to the average value of the block

52 55 61 66 71 61 64 7363 59 66 90 109 585 69 7262 59 68 113 144 104 66 7363 58 71 122 154 106 70 6967 61 68 104 126 88 68 7079 65 60 70 77 68 58 7585 71 64 59 55 61 65 8387 79 69 58 65 76 78 94

609 -29 -62 25 55 -20 -1 37 -21 -62 9 11 -7 -6 6

-46 8 77 -25 -30 10 7 -5-50 13 35 -15 -9 6 0 311 -8 -13 -2 -1 1 -4 1-10 1 3 -3 -1 0 2 -1-4 -1 2 -1 2 -3 1 -2-1 -1 -1 -2 -1 -1 0 -1

DCT IDCT

Pixel Domain

Frequency Domain

DC

AC

DSP in VLSI Design Shao-Yi Chien 21

DCT Algorithm Classification

Direct 2-D Method The 2-D transforms, DCT and IDCT, to be applied directly on

the N x N input data items

Row-Column Method The 2-D transform can be carried out with two passes of 1-D

transforms

The separability property of 2-D DCT/IDCT allows the transform to be applied on one dimension (row) then on the other (column)

Requires 2N instances of N-point 1-D DCT to implement an N x N 2-D DCT

DSP in VLSI Design Shao-Yi Chien 22

Row-Column Decomposition

1D DCT

UNIT

1D DCT

UNIT

TRANSPOSE

MEMORY

X Z

(Y)

Y=AX Z=YAT

Separable,

row-column decomposition

TAXAZ 0for 1)( and 2

1)0(

1,...,1,0,

]4

)12(2cos[)(

2),(

kkcc

Nnk

N

knkc

Nnka

DSP in VLSI Design Shao-Yi Chien 23

Straightforward Approach

Carry out the computation as full matrix-vector

multiplications

1-D transform requires N*N multiplications and N* (N-1)

additions

2-D transform requires N*N*N*N multiplications and N * N

*(N * N-1) additions

Although requiring the most number of operations, this

method is very regular

Most suitable for vector processors or deeply pipelined

architectures for high PE utilization

1-D fast algorithms => O(N*logN)

2-D fast algorithms => O(N*N*logN)

DSP in VLSI Design Shao-Yi Chien 24

DSP in VLSI Design Shao-Yi Chien 25

DSP in VLSI Design Shao-Yi Chien 26

DSP in VLSI Design Shao-Yi Chien 27

DSP in VLSI Design Shao-Yi Chien 28

DSP in VLSI Design Shao-Yi Chien 29

x(2)

x(2)

DSP in VLSI Design Shao-Yi Chien 30

DSP in VLSI Design Shao-Yi Chien 31

DSP in VLSI Design Shao-Yi Chien 32

DSP in VLSI Design Shao-Yi Chien 33

34

1-D DCT

Row-Column

Method

DSP in VLSI Design Shao-Yi Chien 35

Row-Column Method(1/2)

Basic concept:

2-D DCT = 1-D DCT(row) → 1-D DCT(column)

1-D DCT 1-D DCTMEMORY

DCT for row DCT for column

Input (X) Output (Z)

(Y)

DSP in VLSI Design Shao-Yi Chien 36

1-D DCTTRANSPOSE

MEMORY

Row-Column Method(2/2)

Use transpose memory

Input (X) Output

(Y)

(Z)

Z = AXAT

(A)

37

Row-Column

Method Example

A. Madisetti and A. N. Willson Jr., “A 100 MHz 2-D

8x8 DCT/IDCT processor for HDTV applications,”

IEEE Transactions on Circuits and Systems for Video

Technology, vol. 5, no. 2, April 1995

DSP in VLSI Design Shao-Yi Chien 38

Matrix Decomposition

Reduce an 8x8 matrix computation to two

4x4 matrix computation

DCT

IDCT

DSP in VLSI Design Shao-Yi Chien 39

Implementation(1/9)

For 8-bit 1D-DCT unit array A:

Y = AX

DSP in VLSI Design Shao-Yi Chien 40

Implementation(2/9)

Use symmetrical property of DCT coefficients:

DCT:

IDCT:

DSP in VLSI Design Shao-Yi Chien 41

Overall Architecture (1/2)

Architecture:

DSP in VLSI Design Shao-Yi Chien 42

Overall Architecture (2/2)

BDEG MATRIX VECTOR MULTIPLIER

TRANSPOSEMEMORY

IDRUDRU

ACF MATRIX VECTOR MULTIPLIER

DSP in VLSI Design Shao-Yi Chien 43

X

Y

MUX

MUX LIFO

Data Reorder Unit (1/3)

DRU architecture

add sub even odd

DCT IDCT

DSP in VLSI Design Shao-Yi Chien 44

Y7Y6Y5Y4

X3X2X1X0Y3Y2Y1Y0 Y0Y1Y2Y3

X3X2X1X0

Y5Y4

X1X0Y3Y2Y1Y0 Y2Y3

X1X0Y1Y0

Y6Y5Y4

X2X1X0Y3Y2Y1Y0 Y1Y2Y3

X2X1X0Y0

Y0

Y0

Y2Y1Y0

Y2Y1Y0

Y1Y0

Y1Y0

Y3Y2Y1Y0

Y3Y2Y1Y0

Y4

X0Y3Y2Y1Y0 Y3

X0Y2Y1Y0

X

Y

MUX

MUX LIFO

Data Reorder Unit (2/3)

DRU_1 Data flow:

DSP in VLSI Design Shao-Yi Chien 45

Data Reorder Unit (3/3)

DRU_2 Data flow:

add sub even odd

Y0Y1Y2Y3

Y7Y6Y5Y4

Y0+ Y7

Y1+ Y6

Y2+ Y5

Y3+ Y4

Y0 - Y7

Y1 - Y6

Y2 - Y5

Y3 - Y4

Y0

Y6

Y2

Y4

Y7

Y1

Y5

Y3

DSP in VLSI Design Shao-Yi Chien 46

Matrix multiplier (1/4)

ACF matrix vector multiplier

acc0

DSP in VLSI Design Shao-Yi Chien 47

Matrix multiplier (2/4)

BDEG matrix vector multiplier acc0

DSP in VLSI Design Shao-Yi Chien 48

Matrix multiplier (3/4)

Hardwired multipliers

Use Sign Digit Code(SDC) number system

DSP in VLSI Design Shao-Yi Chien 49

Matrix multiplier (4/4)

Accumulator

DSP in VLSI Design Shao-Yi Chien 50

Transpose Memory (1/2)

Pin-pong mode

Using RAMs ADDR

0 1 2 3... 7 8 9 10 11...15 16 17...62 63

0 8 16 24...56 1 9 17 25...57 2 10...55 63

0 1 2 3... 7 8 9 10 11...15 16 17...62 63

...

TIME

DSP in VLSI Design Shao-Yi Chien 51

Transpose Memory (2/2)

Using registers

(0,7) (0,6) (0,0)

(6,7)

(7,7) (7,6)

(6,6)

(7,0)

(6,0)

IN

. . .

. . .

. . .

. . .

. . .

. . .

MUX OUT

DSP in VLSI Design Shao-Yi Chien 52

Scheduling

i: block index

R: row

C: column

X

XiR0

XiR7

Y

Y

YiC7YiC0

Z

ZiC7ZiC0

YiR0

YiR7

DSP in VLSI Design Shao-Yi Chien 53

Scheduling

DRU

ACF/

BDEG

IDRU

X0R0

X0R0 >

Y0R0

Y0R0

X0R1

X0R1 >

Y0R1

Y0R1

X0R2

X0R2 >

Y0R2

Y0R2

X0R3

X0R3 >

Y0R3

Y0R3

X0R4

X0R4 >

Y0R4

Y0R4

X0R5

X0R5 >

Y0R5

Y0R5

X0R6

X0R6 >

Y0R6

8 cycles

DSP in VLSI Design Shao-Yi Chien 54

Scheduling

DRU

ACF/

BDEG

IDRUY0R5

X0R6

X0R6 >

Y0R6

Y0R6

X0R7

X0R7 >

Y0R7

Y0R7

8 cycles

X1R0

X1R0 >

Y1R0

Y1R0

X1R1

X1R1 >

Y1R1

Y1R1

X1R2

X1R2 >

Y1R2

Y1R2

X1R3

X1R3 >

Y1R3

Y1R3

X1R4

X1R4 >

Y1R4

Y0C0

Y0C0 >

Z0C0

Z0C0

Y0C1

Y0C1 >

Z0C1

Z0C1

Y0C2

Y0C2 >

Z0C2

Z0C2

Y0C3

Y0C3 >

Z0C3

Z0C3

Y0C4

100% hardware utilization!

55

Direct 2-D DCT

Architecture

DSP in VLSI Design Shao-Yi Chien 56

Direct 2-D DCT Method

Computing the transform directly from the N x

N input numbers

Derive fast DCT algorithms from the signal flow

graph (like FFT)

Based on 1-D DCT

Larger flow graph

Global routing

More temporal storage

Larger datapath

DSP in VLSI Design Shao-Yi Chien 57

An Example

of 2-D DCT Architecture