Textbook: VLSI Array Processors, S.Y. Kung, Prentice-Hall, Inc. (開發圖書)
Instructor: Ching-Long Su
Course: Digital Integrated Circuit Design
E-mail: [email protected]


Chapter 4: Systolic Array Processors

Outline of Chapter 4

4.1 Introduction
4.2 Systolic Array Processors
4.3 Mapping DGs and SFGs to Systolic Arrays
4.4 Performance Analysis and Design Optimization
4.5 Systolic Arrays for the Transitive Closure and Dynamic Programming Problems
4.6 Systolic Design for Artificial Neural Network
4.7 Concluding Remarks
4.8 Problems

4.1 Introduction

In Chapter 4:

1. Review the methodology for mapping algorithms onto signal flow graphs (SFGs).
2. Discuss the cut-set systolization (retiming) method for systolic array design.

4.2 Systolic Array Processors

Systolic Array Processor

1. “A systolic system is a network of processors which rhythmically compute and pass data through the system.”
2. A systolic array processor avoids the classic memory-access bottleneck.
3. A systolic array processor can handle both compute-bound and I/O-bound computations.

Basic Configuration of a Systolic Array

[Figure: a memory unit connected to a linear array of eight processing elements (PEs).]

Definition of Systolic Arrays

1. Synchrony: the data are rhythmically computed (timed by a global clock) and passed through the network.
2. Modularity and regularity
3. Spatial locality and temporal locality
4. Pipelinability

Example 1: Systolic Array for Convolution

[Figure: a linear array of four PEs, each with a stored weight. The input samples u0, u1, ... enter interleaved with idle slots (-), a 0 is fed into the partial-sum line, and the outputs y0, y1, ... emerge interleaved with idle slots.]

Each PE has inputs ain and bin, outputs aout and bout, and a stored weight Wi:

aout = ain
bout = bin + ain * Wi
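The PE rule above can be checked with a small clock-by-clock simulation. This is a minimal sketch under stated assumptions, not the slide's exact design: the samples are assumed to move left to right, the partial sums right to left, and both streams are assumed to carry one idle slot between valid items. The function name `systolic_convolve` and the register lists `a` and `b` are hypothetical.

```python
def systolic_convolve(weights, u):
    """Clock-by-clock simulation of a linear systolic convolver.

    Assumed dataflow (not taken from the slide): samples move left to
    right, partial sums move right to left, one idle slot between items.
    """
    K, N = len(weights), len(u)
    n_out = N + K - 1                 # length of the full convolution
    a = [0] * K                       # a[p]: sample latched at aout of PE p
    b = [None] * K                    # b[p]: partial sum latched at bout of PE p
    y = []
    for t in range(2 * n_out + K):
        # Samples enter PE 0 on every other clock, starting at clock K - 1.
        k = t - (K - 1)
        a_in = u[k // 2] if k >= 0 and k % 2 == 0 and k // 2 < N else 0
        # Fresh partial sums (0) enter the last PE on every other clock.
        b_in = 0 if t % 2 == 0 and t // 2 < n_out else None
        a = [a_in] + a[:-1]           # aout = ain
        b_ins = b[1:] + [b_in]        # PE p reads bin from PE p + 1
        # bout = bin + ain * Wi (idle partial-sum slots stay idle)
        b = [None if bi is None else bi + ai * w
             for ai, bi, w in zip(a, b_ins, weights)]
        if b[0] is not None:          # a finished output leaves PE 0
            y.append(b[0])
    return y


print(systolic_convolve([2, -1], [1, 2, 3]))
```

With this spacing each PE is busy only on alternate clocks, which is the price of letting the two streams move in opposite directions.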

Example 2: Hexagonal Systolic Array for Band Matrix Multiplication

[Figure: hexagonal array for band matrix multiplication. The entries of A (a11, a21, a12, a31, a22, a32, a23), B (b11, b12, b21, b13, b22, b23, b32) and C (c11, c21, c12, c31, c22, c13, c32, c23, c41, c14) move through the array along three directions; the data wavefronts on each stream are labeled T=1 to T=4, and the results leave at "C out".]

Properties of Systolic Arrays

1. Simple and regular design
2. Concurrency and communication
3. Balancing computation with I/O

Clock Distribution Scheme for Synchronization of the Systolic Array System

H-tree layout for balancing the clock circuit delay.

[Figure: H-tree clock layouts for a linear array, a square array, and a hexagonal array.]

Systolic vs. SIMD vs. SFG Arrays

[Figure: SIMD array. A central control unit drives the processing units over a control bus; the processing units are linked by a local interconnection network, and a data bus provides global communication.]

[Figure: Systolic array. Each processing unit has its own control unit; the processing units are linked only by a local interconnection network.]

[Figure: SFG array. Each processing unit has its own control unit; the processing units are linked by a local interconnection network, and a data bus provides global communication.]

4.3 Mapping DGs and SFGs to Systolic Arrays

Three Stages of the Canonical Mapping Algorithm for Systolic Array Design

1. Derive a (local) dependence graph (DG) from the algorithm.
2. Map the DG to an SFG array.
3. Transform the SFG to a systolic array (i.e. retiming).

The major gap in systolic array design is that most SFGs are not given in temporally localized form.

Systolic Array = SFG Array + Pipeline Retiming

Cut-Set Retiming Procedure

1. Time scaling: all delays D may be scaled by a single positive integer α, i.e. D → αD. The input and output rates are correspondingly slowed down by a factor of α (I/O down-sampling).
2. Delay transfer: given a cut-set of the SFG, which partitions the graph into two components, we can group the edges of the cut-set into inbound edges and outbound edges. The same number of delays k may then be added to every inbound edge and removed from every outbound edge, or vice versa, as long as no edge ends up with a negative delay.

Delay Transfer Rule

[Figure: a cut separating the SFG into two components. Each inbound edge gains k delays (+kD) and each outbound edge loses k delays (-kD).]
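The two retiming rules can be sketched on a toy graph. This is a minimal illustration, assuming an SFG stored as a dictionary that maps each edge `(src, dst)` to its number of delays; the function names `scale_delays` and `transfer_delays` and the two-node example are hypothetical and not taken from the textbook.

```python
def scale_delays(edges, alpha):
    """Rule 1 (time scaling): every delay D becomes alpha * D."""
    return {edge: alpha * d for edge, d in edges.items()}


def transfer_delays(edges, component, k):
    """Rule 2 (delay transfer) for the cut around `component`.

    Edges entering the component (inbound) gain k delays; edges leaving
    it (outbound) lose k delays. Edges not crossing the cut are unchanged.
    """
    retimed = {}
    for (src, dst), d in edges.items():
        if src not in component and dst in component:
            d += k                      # inbound edge: +kD
        elif src in component and dst not in component:
            d -= k                      # outbound edge: -kD
        if d < 0:
            raise ValueError(f"edge {src}->{dst} would get a negative delay")
        retimed[(src, dst)] = d
    return retimed


# Toy recurrence: A feeds B with no delay, B feeds A back through one delay.
sfg = {("A", "B"): 0, ("B", "A"): 1}
scaled = scale_delays(sfg, 2)                # {A->B: 0, B->A: 2}
systolic = transfer_delays(scaled, {"B"}, 1)
print(systolic)                              # every edge now carries one delay
```

In this toy case, time scaling is what makes the transfer possible: without it the feedback edge has only one delay to give, and moving it would leave that edge with none. The total delay around the loop is the same before and after the transfer.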

Systolization Procedure

1. Selection of basic operation modules
2. Applying the retiming rules
3. Combination of delay and operation modules

Systolization Procedure: Example of Lattice Filters

SFG for AR Lattice Filter

[Figure: six multiply-add (*+) modules with coefficients ki and three delays D. Inputs X0, X1, X2 enter on consecutive clocks and outputs Y0, Y1, Y2 leave on consecutive clocks. Critical path = 6 MAC.]

Step 1. Time-Rescaled SFG for AR Lattice Filter

[Figure: the same SFG with every delay D scaled to 2D. The inputs (- X2 - X1 - X0) and outputs (Y0 - Y1 - Y2 -) are interleaved with idle slots.]

Step 2. Retimed SFG for AR Lattice Filter

[Figure: one delay is transferred across each cut, so each 2D edge becomes 2D-D and the opposing edge gains +D.]

Step 3. Systolic Array for AR Lattice Filter

[Figure: the resulting systolic array, with inputs - X2 - X1 - X0 and outputs Y0 - Y1 - Y2 -. Critical path = 2 MAC.]
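The drop in critical path from 6 MAC to 2 MAC can be illustrated by measuring the longest chain of modules joined by zero-delay edges. The sketch below is only an illustration on a hypothetical six-module chain, not the lattice filter's actual graph; the function name `critical_path` is an assumption, and the zero-delay edges are assumed to form no cycle.

```python
def critical_path(edges):
    """Longest chain of modules joined by zero-delay edges, in modules.

    `edges` maps (src, dst) to a delay count. Assumes the zero-delay
    edges contain no cycle.
    """
    nodes = {node for edge in edges for node in edge}
    succ = {}
    for (src, dst), delays in edges.items():
        if delays == 0:
            succ.setdefault(src, []).append(dst)

    memo = {}

    def longest_from(node):
        # One module for `node` plus the longest zero-delay continuation.
        if node not in memo:
            memo[node] = 1 + max(
                (longest_from(nxt) for nxt in succ.get(node, [])), default=0
            )
        return memo[node]

    return max(longest_from(node) for node in nodes)


# Hypothetical chain of six MAC modules with no delays between them.
chain = {(i, i + 1): 0 for i in range(5)}
# Same chain with a delay placed after every second module.
pipelined = dict(chain)
pipelined[(1, 2)] = 1
pipelined[(3, 4)] = 1

print(critical_path(chain), critical_path(pipelined))
```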

Systolization Procedure: Example of Matrix Multiplication

[Figure: a 4 x 4 array of PEs with one delay D at each PE. The elements of B enter as b14 b13 b12 b11 / b24 b23 b22 b21 / b34 b33 b32 b31 / b44 b43 b42 b41, and the elements of A enter as a11 a21 a31 a41 / a12 a22 a32 a42 / a13 a23 a33 a43 / a14 a24 a34 a44.]
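A square systolic array for matrix multiplication can be simulated in a few lines. This is a minimal sketch assuming an output-stationary dataflow, which may differ from the figure: A is assumed to enter from the left with row i skewed by i clocks, B from the top with column j skewed by j clocks, and each PE accumulates one entry of C in place. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(A, B):
    """Simulate an n x n systolic array computing C = A * B.

    Assumed dataflow: A moves left to right, B moves top to bottom,
    and PE (i, j) accumulates C[i][j].
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]   # A value latched in PE (i, j)
    b_reg = [[0] * n for _ in range(n)]   # B value latched in PE (i, j)
    for t in range(3 * n - 2):            # last product occurs at clock 3n - 3
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:                # left edge: row i is skewed by i clocks
                    k = t - i
                    a = A[i][k] if 0 <= k < n else 0
                else:                     # otherwise take the left neighbor's value
                    a = a_reg[i][j - 1]
                if i == 0:                # top edge: column j is skewed by j clocks
                    k = t - j
                    b = B[k][j] if 0 <= k < n else 0
                else:                     # otherwise take the upper neighbor's value
                    b = b_reg[i - 1][j]
                C[i][j] += a * b          # multiply-accumulate in place
                new_a[i][j], new_b[i][j] = a, b
        a_reg, b_reg = new_a, new_b
    return C


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because of the input skew, PE (i, j) sees A[i][k] and B[k][j] on the same clock t = i + j + k, so every communication is between neighboring PEs only.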

All systolic arrays obtained from linear projections of the DG can be derived by the following two steps:

1. Map the DG onto an SFG by the SFG projection procedure.
2. Map the SFG onto a systolic array by cut-set retiming.
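The first step, projecting a DG onto an SFG, can be sketched for a two-dimensional DG. This is a minimal illustration, assuming each DG node (i, j) is assigned to the PE given by its coordinate along a vector orthogonal to the projection direction d, and to the time step given by its dot product with the schedule vector s. The function name `project` and the 3 x 3 grid are hypothetical.

```python
def project(nodes, d, s):
    """Map 2-D DG nodes to (PE index, time step) pairs.

    d is the projection direction and s the schedule vector. Nodes that
    lie along d share a PE, so s . d must be positive for them to be
    executed at distinct, increasing times.
    """
    if s[0] * d[0] + s[1] * d[1] <= 0:
        raise ValueError("schedule must advance along the projection direction")
    p = (-d[1], d[0])                     # a vector orthogonal to d
    return {
        (i, j): (p[0] * i + p[1] * j, s[0] * i + s[1] * j)
        for (i, j) in nodes
    }


grid = [(i, j) for i in range(3) for j in range(3)]
mapping = project(grid, d=(1, 0), s=(1, 0))
print(mapping[(2, 1)])                    # node (2, 1) runs on PE 1 at time 2
```

The PE indices are only labels and may come out negative for some choices of d; what matters is that no two nodes receive the same (PE, time) pair.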

Retiming in the Sorting Systolic Arrays: Insertion Sorter

[Figure: DG for sorting on index axes i and j. The inputs (INPUT) x11, x12, x13, x14 enter along one boundary and -∞ is fed in along the other; the node outputs are labeled m11 ... m54 and x11 ... x45. Projection with d = s = [1, 0] yields a linear array of four PEs, each containing one delay D.]
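The behavior of a linear sorting array can be sketched as follows. This is a minimal simulation under assumptions, not a transcription of the figure: each PE is assumed to keep the larger of its stored value and the incoming item and to pass the smaller one to its neighbor, with the stored values initialized to -∞ as in the DG. The function name `insertion_sorter` is hypothetical.

```python
import math


def insertion_sorter(values):
    """Simulate a linear systolic insertion sorter with one PE per item.

    Each PE keeps the larger of (stored value, incoming item) and passes
    the smaller one to the next PE on the following clock.
    """
    n = len(values)
    m = [-math.inf] * n        # value stored in each PE
    x = [None] * n             # item waiting at the input of each PE
    for t in range(2 * n):     # the last item reaches the last PE at clock 2n - 1
        passed = [None] * n
        for p in range(n):
            if x[p] is not None:
                lo, hi = min(m[p], x[p]), max(m[p], x[p])
                m[p], passed[p] = hi, lo
        # One new item enters PE 0 per clock; the rest shift by one PE.
        x = [values[t] if t < n else None] + passed[:-1]
    return m


print(insertion_sorter([3, 1, 4, 1, 5]))
```

After the pipeline drains, PE 0 holds the largest item and the stored values read off in descending order.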

Retiming in the Sorting Systolic Arrays: Selection Sorter

[Figure: the same sorting DG on index axes i and j, with INPUT and -∞ on the boundaries and node outputs m11 ... m54 and x11 ... x45. Projection with d = s = [0, 1] yields a linear array of four PEs, numbered 1 to 4, with inputs x1j, x2j, x3j, x4j and one delay D each.]

Retiming in the Sorting Systolic Arrays: Bubble Sorter

[Figure: the same sorting DG on index axes i and j, with INPUT and -∞ on the boundaries and node outputs m11 ... m54 and x11 ... x45. Projection with d = s = [1, 1] yields a linear array with six delays D, inputs x11, x22, x33, x44, and outputs labeled m13, m15, m17.]

Rotation of Schedule Vector for Insertion Sorter

[Figure: the insertion sorter DG and its linear array for d = s = [1, 0], with the equitemporal lines of the default schedule and of the desired schedule marked.]

Bit-Level Systolic Arrays: Example of Inner Product of Two Vectors

The inner product c of two vectors a and b is computed as:

\[
c = \sum_{k=1}^{n} a_k b_k
\]

Assume that the elements of a and b are m-bit integers.

Bit-Level Systolic Arrays: Example of Inner Product of Two Vectors

\[
\begin{aligned}
c &= \sum_{k=1}^{n} a_k \times b_k \\
  &= a_1 \times b_1 + a_2 \times b_2 + \cdots + a_n \times b_n \\
  &= \sum_{j=0}^{m-1} a_{1,j} 2^j \times \sum_{j=0}^{m-1} b_{1,j} 2^j + \cdots + \sum_{j=0}^{m-1} a_{n,j} 2^j \times \sum_{j=0}^{m-1} b_{n,j} 2^j
\end{aligned}
\]

Top layer (k = 1):

Operand bits:
a1,m-1 ... a1,1 a1,0
b1,m-1 ... b1,1 b1,0

Partial products:
a1,m-1 b1,0 ... a1,1 b1,0 a1,0 b1,0
a1,m-1 b1,1 ... a1,1 b1,1 a1,0 b1,1
...
a1,m-1 b1,m-1 ... a1,1 b1,m-1 a1,0 b1,m-1

Bottom layer (k = n):

Operand bits:
an,m-1 ... an,1 an,0
bn,m-1 ... bn,1 bn,0

Partial products:
an,m-1 bn,0 ... an,1 bn,0 an,0 bn,0
an,m-1 bn,1 ... an,1 bn,1 an,0 bn,1
...
an,m-1 bn,m-1 ... an,1 bn,m-1 an,0 bn,m-1
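The bit-level expansion of the inner product can be checked numerically. The sketch below is an arithmetic check only, assuming unsigned m-bit operands; it sums the one-bit partial products layer by layer and does not model the adder cells of the array. The names `bits` and `bit_level_inner_product` are hypothetical.

```python
def bits(value, m):
    """Return the m low-order bits of value, least significant first."""
    return [(value >> j) & 1 for j in range(m)]


def bit_level_inner_product(a, b, m):
    """Inner product of two vectors of unsigned m-bit integers,
    accumulated from one-bit partial products."""
    c = 0
    for a_k, b_k in zip(a, b):            # one layer per index k
        a_bits, b_bits = bits(a_k, m), bits(b_k, m)
        for i in range(m):                # row i of the layer: bit i of b_k
            for j in range(m):            # column j: bit j of a_k
                # One-bit product (AND) weighted by 2^(i + j).
                c += (a_bits[j] & b_bits[i]) << (i + j)
    return c


print(bit_level_inner_product([3, 5, 7], [2, 4, 6], m=3))
```

Each layer contributes m x m one-bit products, so n layers give n * m * m AND terms in total.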

Bit-Level Systolic Arrays: Example of Inner Product of Two Vectors

[Figure: bit-level array built from half adders (HA) and full adders (FA) that pass sum and carry signals. The input bits a0,0 ... a2,2 and b0,0 ... b2,2 are arranged in layers from the top layer to the bottom layer, and the output bits are c0 ... c4.]