Array Operation Synthesis to Optimize Data Parallel Programs
Department of Computer Science,
National Tsing-Hua University
Student: Gwan-Hwan Hwang
Advisor: Dr. Jenq Kuen Lee
Array Operation Synthesis on Distributed-memory Machines
Department of Computer Science, National Tsing-Hua University
Gwan-Hwan Hwang, Ph.D.
Research Interests and Key Issues
Compiler Optimization for Parallel Computations on Distributed & Shared Memory Machines
• Communication Code for Block-Cyclic Distribution of HPF (IPPS’98)
• Array Operation Synthesis for Intrinsic Array Functions (JPDC, ACM PPoPP’95, ICPP’96)
• Automatic Alignment for Data Parallel Languages (LCPC’97)
Concurrent Testing
• Reachability Testing of Concurrent Programs (IJSEKE’95, APSEC’93)
Parallel Object Program Model & Heterogeneous Computing
• Java-Based Network Computing Environment
• Transparent Parallel Computing Environment (Ongoing)
Outline of Presentation
• Fortran 90 Intrinsic Array Operations
• Array Operation Synthesis (AOS)
• SYNTOOL
• Apply AOS to Shared-Memory Machines
• Apply AOS to Distributed-Memory Machines
• Integrate AOS with Automatic Data Alignment
• Conclusion and Future Work
Intrinsic Array Operations
• Provided by Modern Programming Languages, e.g. Fortran 90, High Performance Fortran (HPF), HPF2, Fortran 97, APL, MATLAB, MATHEMATICA, NESL, C*
• Used in Engineering and Scientific Applications
• Facilitate Compile-Time Analysis for Optimization
• Support Parallel Execution and Portability
Intrinsic Array Operations (Cont’d)
• Array Operations Provided by Fortran 90, HPF.
• Examples: CSHIFT, TRANSPOSE, MERGE, EOSHIFT, RESHAPE, SPREAD, Section Move, Where Constructs, Reductions.

Array A            B=CSHIFT(A,1,1)    C=TRANSPOSE(B)
 1  2  3  4         5  6  7  8         5  9 13  1
 5  6  7  8         9 10 11 12         6 10 14  2
 9 10 11 12        13 14 15 16         7 11 15  3
13 14 15 16         1  2  3  4         8 12 16  4
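The two operations in this example can be sketched in a few lines. This is only an illustration in Python (used here as a stand-in for Fortran 90, with 0-based nested lists instead of 1-based arrays); the helper names `cshift_dim1` and `transpose` are hypothetical and are not part of the presentation.

```python
# Pure-Python stand-ins for two Fortran 90 intrinsics on square 2-D arrays.

def cshift_dim1(a, shift):
    """Circular shift along the first dimension: b[i][j] = a[(i + shift) mod n][j]."""
    n = len(a)
    return [list(a[(i + shift) % n]) for i in range(n)]

def transpose(a):
    """b[i][j] = a[j][i]."""
    return [list(col) for col in zip(*a)]

A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

B = cshift_dim1(A, 1)   # plays the role of B=CSHIFT(A,1,1)
C = transpose(B)        # plays the role of C=TRANSPOSE(B)

for row in C:
    print(row)
```

Each call allocates a full result array, which is the cost that chaining several such operations multiplies.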
Consecutive Array Expressions
• Array Expression
  C=EOSHIFT(MERGE(RESHAPE(S,/N,N/),A+B,T),1,0,1)
• Consecutive Array Operations
  FXP=CSHIFT(F1,1,+1)
  FXM=CSHIFT(F1,1,-1)
  FYP=CSHIFT(F1,2,+1)
  FYM=CSHIFT(F1,2,-1)
  FDERIV=ZXP*(FXP-F1)+ZXM*(FXM-F1)+ZYP*(FYP-F1)+ZYM*(FYM-F1)
Classification of Array Operations
• Model Array Operations by Data Access Functions (DAF)
                   Single-Clause   Multiple-Clause
Single-Source      TYPE 1          TYPE 3
Multiple-Source    TYPE 2          TYPE 4
Data Access Functions
• Represent Array Operations by Mathematical Functions
• Model Array Operations by Data Access Functions (DAF): Single-Source or Multiple-Source, Single-Clause or Multiple-Clause
Type 1: Single-source Single-clause Data Access Function
• One Source Array
• One Data Access Pattern
B=TRANSPOSE(A)

Array A            Array B
 5  6  7  8         5  9 13  1
 9 10 11 12         6 10 14  2
13 14 15 16         7 11 15  3
 1  2  3  4         8 12 16  4
Data Access Function is B(I,J)=A(J,I)
Type 2: Multiple-source Single-clause Data Access Function
• Multiple Source Arrays
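• One Data Access Pattern
R=MERGE(T,F,M)
Data Access Function is
\[
R(I,J) = \mathcal{F}\bigl(T(I,J),\, F(I,J),\, M(I,J)\bigr),
\quad\text{where}\quad
\mathcal{F}(x,y,z) =
\begin{cases}
x & \text{if } z = \text{True} \\
y & \text{if } z = \text{False}
\end{cases}
\]

Array T    Array F    Array M    Array R
1 1 1      2 2 2      T F F      1 2 2
1 1 1      2 2 2      F T T      2 1 1
1 1 1      2 2 2      T F T      1 2 1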
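The MERGE data access function reads one element from each of three sources at the same index. A minimal Python sketch of that element-wise selection, using the arrays from the figure (the function name `merge` is a hypothetical helper, not the Fortran intrinsic itself):

```python
def merge(t, f, m):
    """R[i][j] = T[i][j] where M[i][j] is true, otherwise F[i][j]."""
    return [[tv if mv else fv for tv, fv, mv in zip(t_row, f_row, m_row)]
            for t_row, f_row, m_row in zip(t, f, m)]

T = [[1, 1, 1] for _ in range(3)]
F = [[2, 2, 2] for _ in range(3)]
M = [[True, False, False],
     [False, True, True],
     [True, False, True]]

R = merge(T, F, M)
for row in R:
    print(row)
```

All three sources share one access pattern (the identity index), which is what makes this a multiple-source, single-clause function.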
Type 3: Single-source Multiple-clause Data Access Function
• Single Source Array
• Multiple Data Access Patterns
B=CSHIFT(A,1,1)
Data Access Function is
\[
B(I,J) =
\begin{cases}
A(I+1,J) \mid \phi(/I,J/,\ /1:3:1,\ 1:4:1/) \\
A(I-3,J) \mid \phi(/I,J/,\ /4:4:1,\ 1:4:1/)
\end{cases}
\]
\phi: a segmentation descriptor

Array A            Array B
 1  2  3  4         5  6  7  8
 5  6  7  8         9 10 11 12
 9 10 11 12        13 14 15 16
13 14 15 16         1  2  3  4
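One way to picture a multiple-clause data access function is as a list of (source index function, segmentation descriptor) pairs, where a clause applies only to the index points inside its descriptor. The sketch below is an assumption about representation, not the presentation's implementation: `in_segment`, `CSHIFT_CLAUSES`, and `apply_daf` are hypothetical names, ranges are inclusive `(l, u, s)` triples, and indices are 1-based as on the slide.

```python
def in_segment(point, ranges):
    """True if every coordinate lies in its inclusive l:u:s range."""
    return all(l <= x <= u and (x - l) % s == 0
               for x, (l, u, s) in zip(point, ranges))

# B=CSHIFT(A,1,1) on a 4x4 array, written as two clauses.
CSHIFT_CLAUSES = [
    (lambda i, j: (i + 1, j), [(1, 3, 1), (1, 4, 1)]),   # rows 1..3 read the row below
    (lambda i, j: (i - 3, j), [(4, 4, 1), (1, 4, 1)]),   # row 4 wraps around to row 1
]

def apply_daf(clauses, a):
    """Evaluate a single-source, multiple-clause DAF over a square array."""
    n = len(a)
    b = [[None] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for source_index, ranges in clauses:
                if in_segment((i, j), ranges):
                    si, sj = source_index(i, j)
                    b[i - 1][j - 1] = a[si - 1][sj - 1]
                    break
    return b

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = apply_daf(CSHIFT_CLAUSES, A)
```

The descriptors partition the index space of B, so exactly one clause fires for each element.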
Type 4: Multiple-source Multiple-clause Data Access Function
• Multiple Source Arrays
• Multiple Data Access Patterns
• No array operation of Fortran 90 belongs to type 4
• Synthesis of multiple array operations may derive a type 4 data access function
Straightforward Compilation
• Translate each operation into a parallel loop
B=CSHIFT(TRANSPOSE(EOSHIFT(A,1,0,1)),1,1)

! EOSHIFT: shift by one along dimension 1, fill the vacated row with 0
FORALL (I=1:N:1; J=1:N:1)
  IF (1<=I<=N-1) and (1<=J<=N) THEN T1(I,J)=A(I+1,J)
  ELSE T1(I,J)=0
ENDFORALL

! TRANSPOSE
FORALL (I=1:N:1; J=1:N:1)
  T2(I,J)=T1(J,I)
ENDFORALL

! CSHIFT: shift by one along dimension 1, row N wraps around to row 1
FORALL (I=1:N:1; J=1:N:1)
  IF (1<=I<=N-1) and (1<=J<=N) THEN B(I,J)=T2(I+1,J)
  ELSE B(I,J)=T2(I-N+1,J)
ENDFORALL
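The cost of this scheme is easiest to see when the three loops are written out with their temporaries. The following is a rough Python rendering (0-based lists standing in for the 1-based Fortran arrays); the function name `straightforward` is hypothetical, and the wrap-around row is taken to be row 1 of T2, as circular-shift semantics require.

```python
def straightforward(a):
    """B = CSHIFT(TRANSPOSE(EOSHIFT(A,1,0,1)),1,1) as three passes with temporaries."""
    n = len(a)
    # Pass 1, EOSHIFT: T1(I,J) = A(I+1,J) for I <= N-1, and 0 in the last row.
    t1 = [[a[i + 1][j] if i < n - 1 else 0 for j in range(n)] for i in range(n)]
    # Pass 2, TRANSPOSE: T2(I,J) = T1(J,I).
    t2 = [[t1[j][i] for j in range(n)] for i in range(n)]
    # Pass 3, CSHIFT: B(I,J) = T2(I+1,J) for I <= N-1, and T2(1,J) in the last row.
    b = [[t2[i + 1][j] if i < n - 1 else t2[0][j] for j in range(n)] for i in range(n)]
    return b

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
for row in straightforward(A):
    print(row)
```

Two full-size temporaries (`t1`, `t2`) and three sweeps over the data are needed to produce one result array.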
Array Operation Synthesis
• Construct the Parse Tree of Array Expression
• Represent Array Operations by Mathematical Functions (DAF)
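B=CSHIFT(TRANSPOSE(EOSHIFT(A,1,0,1)),1,1)
Parse tree (root to leaf): CSHIFT, TRANSPOSE, EOSHIFT
CSHIFT:
\[
B(I,J) =
\begin{cases}
T2(I+1,J) \mid \phi(/I,J/,\ /1:N-1:1,\ 1:N:1/) \\
T2(I-N+1,J) \mid \phi(/I,J/,\ /N:N:1,\ 1:N:1/)
\end{cases}
\]
TRANSPOSE:
\[
T2(I,J) = T1(J,I)
\]
EOSHIFT:
\[
T1(I,J) =
\begin{cases}
A(I+1,J) \mid \phi(/I,J/,\ /1:N-1:1,\ 1:N:1/) \\
0 \mid \phi(/I,J/,\ /N:N:1,\ 1:N:1/)
\end{cases}
\]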
Array Operation Synthesis (Cont’d)
Synthesis of two functions (CSHIFT + TRANSPOSE):
\[
B(I,J) =
\begin{cases}
T1(J,I+1) \mid \phi(/I,J/,\ /1:N-1:1,\ 1:N:1/) \\
T1(J,I-N+1) \mid \phi(/I,J/,\ /N:N:1,\ 1:N:1/)
\end{cases}
\]
Synthesis of the result with EOSHIFT:
\[
B(I,J) =
\begin{cases}
A(J+1,I+1) \mid \phi(/I,J/,\ /1:N-1,\ 1:N/) \wedge \phi(/J,I+1/,\ /1:N-1,\ 1:N/) \\
0 \mid \phi(/I,J/,\ /1:N-1,\ 1:N/) \wedge \phi(/J,I+1/,\ /N:N,\ 1:N/) \\
A(J+1,I-N+1) \mid \phi(/I,J/,\ /N:N,\ 1:N/) \wedge \phi(/J,I-N+1/,\ /1:N-1,\ 1:N/) \\
0 \mid \phi(/I,J/,\ /N:N,\ 1:N/) \wedge \phi(/J,I-N+1/,\ /N:N,\ 1:N/)
\end{cases}
\]
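Synthesis of two Data Access Functions
• Substitution (a term-rewriting-like method)
Having two Data Access Patterns:
\[
T(i_1,\dots,i_n) = S\bigl(f_1(i_1,\dots,i_n),\ f_2(i_1,\dots,i_n),\ \dots,\ f_m(i_1,\dots,i_n)\bigr)
\]
\[
S(i_1,\dots,i_m) = Q\bigl(h_1(i_1,\dots,i_m),\ h_2(i_1,\dots,i_m),\ \dots,\ h_p(i_1,\dots,i_m)\bigr)
\]
The Synthesized Data Access Pattern is:
\[
T(i_1,\dots,i_n) = Q\bigl(g_1(i_1,\dots,i_n),\ g_2(i_1,\dots,i_n),\ \dots,\ g_p(i_1,\dots,i_n)\bigr)
\]
where
\[
g_i(i_1,\dots,i_n) = h_i\bigl(f_1(i_1,\dots,i_n),\ f_2(i_1,\dots,i_n),\ \dots,\ f_m(i_1,\dots,i_n)\bigr), \quad 1 \le i \le p
\]
A segmentation descriptor attached to S is rewritten in the same way:
\[
\phi' = \phi\bigl(/f_1(i_1,\dots,i_n),\ \dots,\ f_m(i_1,\dots,i_n)/,\ /l_1:u_1:s_1,\ \dots,\ l_m:u_m:s_m/\bigr)
\]
where
\[
\phi = \phi\bigl(/i_1,\dots,i_m/,\ /l_1:u_1:s_1,\ \dots,\ l_m:u_m:s_m/\bigr)
\]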
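Synthesis of two DAFs (Cont’d)
• For example, with N = 4,
\[
T1(i,j) = A(i+1,j) \mid \phi(/i,j/,\ /1:3,\ 1:4/)
\]
\[
B(i,j) =
\begin{cases}
T1(j,i+1) \mid \phi(/i,j/,\ /1:3,\ 1:4/) \\
T1(j,i-3) \mid \phi(/i,j/,\ /4:4,\ 1:4/)
\end{cases}
\]
• By the substitution rule, the first clause of B becomes
\[
B(i,j) = A(j+1,i+1) \mid \phi(/i,j/,\ /1:3,\ 1:4/) \wedge \phi(/j,i+1/,\ /1:3,\ 1:4/)
\]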
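The substitution rule amounts to composing index functions and conjoining segmentation descriptors. The sketch below illustrates that idea under assumed representations: each clause is an `(index_fn, ranges)` pair, `index_fn` is `None` for a constant-zero clause, and all names (`synthesize`, `B_OVER_T1`, `T1_OVER_A`) are hypothetical. It uses N = 4 and 1-based indices.

```python
def in_segment(point, ranges):
    """True if every coordinate lies in its inclusive l:u:s range."""
    return all(l <= x <= u and (x - l) % s == 0
               for x, (l, u, s) in zip(point, ranges))

def synthesize(outer, inner):
    """Substitute every inner clause into every outer clause.

    Returns (g, guard) pairs: g(i, j) is the composed source index (None for a
    constant-zero clause); guard(i, j) is the outer descriptor tested on (i, j)
    AND the inner descriptor tested on the outer index expression f(i, j).
    """
    result = []
    for f, f_ranges in outer:
        for h, h_ranges in inner:
            g = None if h is None else (lambda i, j, f=f, h=h: h(*f(i, j)))
            guard = (lambda i, j, f=f, fr=f_ranges, hr=h_ranges:
                     in_segment((i, j), fr) and in_segment(f(i, j), hr))
            result.append((g, guard))
    return result

# B over T1 (CSHIFT + TRANSPOSE) and T1 over A (EOSHIFT), for N = 4.
B_OVER_T1 = [(lambda i, j: (j, i + 1), [(1, 3, 1), (1, 4, 1)]),
             (lambda i, j: (j, i - 3), [(4, 4, 1), (1, 4, 1)])]
T1_OVER_A = [(lambda i, j: (i + 1, j), [(1, 3, 1), (1, 4, 1)]),
             (None,                    [(4, 4, 1), (1, 4, 1)])]

clauses = synthesize(B_OVER_T1, T1_OVER_A)
g, guard = clauses[0]
print(g(1, 1), guard(1, 1))
```

Two outer clauses times two inner clauses give four synthesized clauses, and no intermediate array T1 is ever materialized.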
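Synthesis of two DAFs (Cont’d)
• Multiple-clause case: when both T and S are defined by several clauses, the substitution rule is applied to every pair of one clause of T and one clause of S. The synthesized data access function therefore has up to as many clauses as the product of the two clause counts, and each clause is guarded by the conjunction of the two segmentation descriptors involved.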
Code Generation for Synthesized Data Access Function
• Generate one guarded assignment per clause of the synthesized data access function

FORALL (I=1:N:1; J=1:N:1)
  IF φ(/I,J/,/1:N-1,1:N/) ∧ φ(/J,I+1/,/1:N-1,1:N/) THEN B(I,J)=A(J+1,I+1)
  IF φ(/I,J/,/1:N-1,1:N/) ∧ φ(/J,I+1/,/N:N,1:N/) THEN B(I,J)=0
  IF φ(/I,J/,/N:N,1:N/) ∧ φ(/J,I-N+1/,/1:N-1,1:N/) THEN B(I,J)=A(J+1,I-N+1)
  IF φ(/I,J/,/N:N,1:N/) ∧ φ(/J,I-N+1/,/N:N,1:N/) THEN B(I,J)=0
ENDFORALL
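A rough Python rendering of this generated loop is shown below, with the segmentation descriptors evaluated at run time for every element. It is a sketch only: `in_seg` and `synthesized` are hypothetical names, ranges are inclusive `(l, u)` pairs with unit stride, and a small 1-based accessor hides Python's 0-based lists.

```python
def in_seg(point, ranges):
    """Segmentation-descriptor test: each coordinate inside its inclusive l:u range."""
    return all(l <= x <= u for x, (l, u) in zip(point, ranges))

def synthesized(a):
    """One loop nest for B = CSHIFT(TRANSPOSE(EOSHIFT(A,1,0,1)),1,1), no temporaries."""
    n = len(a)
    A = lambda i, j: a[i - 1][j - 1]              # 1-based accessor
    b = [[None] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if in_seg((i, j), [(1, n - 1), (1, n)]) and in_seg((j, i + 1), [(1, n - 1), (1, n)]):
                v = A(j + 1, i + 1)
            elif in_seg((i, j), [(1, n - 1), (1, n)]) and in_seg((j, i + 1), [(n, n), (1, n)]):
                v = 0
            elif in_seg((i, j), [(n, n), (1, n)]) and in_seg((j, i - n + 1), [(1, n - 1), (1, n)]):
                v = A(j + 1, i - n + 1)
            else:   # (i, j) in N:N,1:N and (j, i-n+1) in N:N,1:N
                v = 0
            b[i - 1][j - 1] = v
    return b

A4 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
for row in synthesized(A4):
    print(row)
```

The temporaries are gone, but every element still pays for up to three descriptor tests, which motivates simplifying the ranges at compile time.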
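After Optimization
\[
B(I,J) =
\begin{cases}
A(J+1,I+1) \mid \phi(/I,J/,\ /1:N-1,\ 1:N-1/) \\
0 \mid \phi(/I,J/,\ /1:N-1,\ N:N/) \\
A(J+1,I-N+1) \mid \phi(/I,J/,\ /N:N,\ 1:N-1/) \\
0 \mid \phi(/I,J/,\ /N:N,\ N:N/)
\end{cases}
\]
(Figure: the N×N index space of B, split at N-1 along both I and J into four regions, one per clause.)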
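Once each clause is guarded by a single rectangular descriptor over (I,J), the guards can be turned into loop bounds. The sketch below shows one way this could look in Python; the function name `optimized` is hypothetical, and loop variables are kept 1-based so that `a[j][i]` corresponds to A(J+1,I+1).

```python
def optimized(a):
    """Four guard-free loop nests, one per clause of the optimized DAF."""
    n = len(a)
    b = [[None] * n for _ in range(n)]
    # Region I in 1:N-1, J in 1:N-1: B(I,J) = A(J+1,I+1)
    for i in range(1, n):
        for j in range(1, n):
            b[i - 1][j - 1] = a[j][i]
    # Region I in 1:N-1, J = N: B(I,J) = 0
    for i in range(1, n):
        b[i - 1][n - 1] = 0
    # Region I = N, J in 1:N-1: B(I,J) = A(J+1,I-N+1) = A(J+1,1)
    for j in range(1, n):
        b[n - 1][j - 1] = a[j][0]
    # Region I = N, J = N: B(I,J) = 0
    b[n - 1][n - 1] = 0
    return b

A4 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
for row in optimized(A4):
    print(row)
```

Every element is written exactly once and no membership test remains inside the loops.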
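Optimization
• Simplifying the ranges at compilation time instead of runtime
• Optimization process:
Normalize:
\[
\phi(/\dots,\ 3I+5,\ \dots/,\ /\dots,\ 4:100:5,\ \dots/) \;\Rightarrow\; \phi(/\dots,\ I,\ \dots/,\ /\dots,\ 3:28:5,\ \dots/)
\]
Intersection for each dimension:
\[
\phi(/\dots,\ I,\ \dots/,\ /\dots,\ 5:100:6,\ \dots/) \wedge \phi(/\dots,\ I,\ \dots/,\ /\dots,\ 7:200:5,\ \dots/) \;\Rightarrow\; \phi(/\dots,\ I,\ \dots/,\ /\dots,\ 17:77:30,\ \dots/)
\]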
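Both steps reduce to small integer computations on `l:u:s` triples. The following sketch is one possible way to carry them out, not necessarily the method used in the presentation; `normalize` and `intersect` are hypothetical names, ranges are inclusive `(l, u, s)` tuples, and positive coefficients and strides are assumed.

```python
from math import gcd

def normalize(a, b, rng):
    """Rewrite 'a*I + b in l:u:s' as 'I in l2:u2:s2' (assumes a > 0 and s > 0)."""
    l, u, s = rng
    step = s // gcd(a, s)
    lo = -((b - l) // a)      # ceil((l - b) / a): smallest I with a*I + b >= l
    hi = (u - b) // a         # floor((u - b) / a): largest I with a*I + b <= u
    for first in range(lo, lo + step):
        if (a * first + b - l) % s == 0:
            break
    else:
        return None           # no I ever lands on the strided range
    if first > hi:
        return None
    return (first, first + (hi - first) // step * step, step)

def intersect(r1, r2):
    """Intersection of two strided ranges over the same index, or None if empty."""
    (l1, u1, s1), (l2, u2, s2) = r1, r2
    step = s1 * s2 // gcd(s1, s2)             # least common multiple of the strides
    lo, hi = max(l1, l2), min(u1, u2)
    for first in range(lo, lo + step):
        if (first - l1) % s1 == 0 and (first - l2) % s2 == 0:
            break
    else:
        return None
    if first > hi:
        return None
    return (first, first + (hi - first) // step * step, step)

print(normalize(3, 5, (4, 100, 5)))           # the slide's normalization example
print(intersect((5, 100, 6), (7, 200, 5)))    # the slide's intersection example
```

Because the triples are compile-time constants (or symbolic in N), the resulting ranges can be used directly as loop bounds.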