( (TCCI))
2013A 15 2013725
1
2
Hartree-Fock
MP2
MP2
2
2
Rys quadrature (Rys)
Pople-Hehre
McMurchie-Davidson (Hermite Gaussian)
Obara-Saika (vertical recurrence relation, VRR)
Head-Gordon-Pople (horizontal recurrence relation, HRR, + VRR)
ACE
PRISM (contraction)
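$(\mu\nu|\lambda\sigma) = \int d\mathbf{r}_1 \int d\mathbf{r}_2 \, \phi_\mu(\mathbf{r}_1)\,\phi_\nu(\mathbf{r}_1)\,\frac{1}{r_{12}}\,\phi_\lambda(\mathbf{r}_2)\,\phi_\sigma(\mathbf{r}_2)$
$\phi_\mu(\mathbf{r}_1)$: Gaussian basis function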
2
Method    PH      MD      HGP     DRK
x         220     1500    1400    1056
y         2300    1700    30      30
z         4000    0       800     800
(sp,sp|sp,sp)
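$(\mu\nu|\lambda\sigma) = \int d\mathbf{r}_1 \int d\mathbf{r}_2 \, \phi_\mu(\mathbf{r}_1)\,\phi_\nu(\mathbf{r}_1)\,\frac{1}{r_{12}}\,\phi_\lambda(\mathbf{r}_2)\,\phi_\sigma(\mathbf{r}_2)$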
=
exp 2
Cost of two-electron integral evaluation = $xK^4 + yK^2 + z$ (K: degree of contraction)
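$\exp(-\alpha r_A^2)\,\exp(-\beta r_B^2) = \exp\!\left(-\frac{\alpha\beta}{\alpha+\beta} R_{AB}^2\right) \exp\!\left(-(\alpha+\beta)\, r_P^2\right)$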
5
Pople-Hehre
xAB = 0, xPQ = constant
yAB = 0, yPQ = constant
yCD = 0
McMurchie-Davidson
(ss|ss)
[0](m)(=(ss|ss)(m))-> (r) -> (p|q) -> (AB|CD)
[Figure: coordinate axes x, y, z with centers A, B, C, D and points P, Q for the integral (AB|CD)]
K. Ishimura, S. Nagase, Theoret Chem Acc, (2008) 120, 185-189.
6
[0](m)(=(ss|ss)(m)) [A+B+C+D] [A+B|C+D] [AB|CD]
xPQ = constant, xAB = 0
yPQ = constant, yAB = 0
yCD = 0
Cost = $xK^4 + yK^2 + z$ (K: degree of contraction)
(sp,sp|sp,sp)
Method    PH      PH+MD
x         220     180
y         2300    1100
z         4000    5330

6-31G(d), cc-pVDZ
Method          PH       PH+MD
K=1             6520     6583
K=2             16720    12490
K=3 (STO-3G)    42520    29535
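As a quick check of the cost model xK^4 + yK^2 + z, the following minimal sketch evaluates it for the PH coefficients listed for (sp,sp|sp,sp). Python is used purely for illustration (the slides themselves work in Fortran), and the function name `eri_cost` is hypothetical.

```python
def eri_cost(x, y, z, k):
    """Operation count x*K^4 + y*K^2 + z for contraction degree K."""
    return x * k**4 + y * k**2 + z

# PH coefficients for (sp,sp|sp,sp) as listed on the slide.
PH = (220, 2300, 4000)

# Evaluate the model for K = 1, 2, 3.
ph_costs = {k: eri_cost(*PH, k) for k in (1, 2, 3)}
print(ph_costs)
```

The PH column of the table is reproduced exactly by these coefficients. The PH+MD column is not (180 + 1100 + 5330 gives 6610, while the table lists 6583 for K=1), presumably because the PH+MD coefficients shown are rounded.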
7
(ss|ss)(dd|dd)21
xAB=0, yAB=0FortranPerl
FortranPerl
2
dosqrt
GAMESSint2b.src (sp), int2[r-w].src (spd)
2 4
2005GAMESS
8doCPU
6d5d(GAMESS)
8
GAMESS Fock matrix computation time (sec), Pentium4 3.0GHz
                        Taxol (C47H51NO14)   Taxol (C47H51NO14)    Luciferin (C11H8N2O3S2)
                        STO-3G (361 AOs)     6-31G(d) (1032 AOs)   aug-cc-pVDZ (550 AOs)
Original GAMESS (PH)    85.7                 2015.2                2014.9
PH+MD                   69.9                 1361.8                1154.5
FMO, DC, ONIOM, QM/MM
ECP, Frozen core,
()
9
1 ()
SIMD
1
dofor
()
A(ix, iy, iz) or A(iz, iy, ix)
10 100 10 1
L2 L1
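Assuming the point of the A(ix, iy, iz) versus A(iz, iy, ix) comparison is contiguous memory access, the sketch below computes the linear memory offset of each element of a Fortran (column-major) array and compares the stride of the innermost loop for the two loop orders. It is written in Python for illustration only; the helper `offset` is hypothetical.

```python
def offset(ix, iy, iz, nx, ny):
    """0-based linear offset of the 1-based element A(ix, iy, iz)
    in a column-major (Fortran) array A(nx, ny, nz)."""
    return (ix - 1) + nx * ((iy - 1) + ny * (iz - 1))

nx, ny, nz = 4, 3, 2

# Innermost loop over the FIRST index: consecutive memory locations.
good = [offset(ix, iy, iz, nx, ny)
        for iz in range(1, nz + 1)
        for iy in range(1, ny + 1)
        for ix in range(1, nx + 1)]

# Innermost loop over the LAST index: each step jumps nx*ny elements.
bad = [offset(ix, iy, iz, nx, ny)
       for ix in range(1, nx + 1)
       for iy in range(1, ny + 1)
       for iz in range(1, nz + 1)]

good_strides = {b - a for a, b in zip(good, good[1:])}
print("strides with first index innermost:", good_strides)
print("stride with last index innermost:", bad[1] - bad[0])
```

With the first index innermost every access is adjacent to the previous one, which is the pattern that makes best use of cache lines.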
2 ()
BLAS, LAPACK
BLASCPU(100)
BLAS2BLAS3
11
BLAS2 (matrix-vector)    operations O(N^2)    data O(N^2)
BLAS3 (matrix-matrix)    operations O(N^3)    data O(N^2)
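To make the BLAS2 versus BLAS3 comparison concrete, the sketch below estimates the ratio of floating-point operations to data volume for both cases. It is a rough Python illustration that assumes dense square N x N matrices and counts one multiply-add as two operations; the function names are hypothetical.

```python
def matvec_counts(n):
    """Approximate (operations, data elements) for y = A x."""
    flops = 2 * n * n          # n*n multiplies and about n*n adds
    data = n * n + 2 * n       # matrix, input vector, output vector
    return flops, data

def matmul_counts(n):
    """Approximate (operations, data elements) for C = A B."""
    flops = 2 * n ** 3
    data = 3 * n * n           # three n x n matrices
    return flops, data

for n in (100, 1000):
    f2, d2 = matvec_counts(n)
    f3, d3 = matmul_counts(n)
    print(n, "BLAS2 ops/data:", round(f2 / d2, 2), "BLAS3 ops/data:", round(f3 / d3, 2))
```

The matrix-vector ratio stays close to a constant, while the matrix-matrix ratio grows linearly with N, which is why BLAS3 routines can reuse data in cache far more effectively.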
1
116GB, 81PB
()
(MPI)(OpenMP)
12
2(MPI)
MPI
()
(1MB)
OpenMPMPI ()
MPI
MPI
13
3(OpenMP)
!$OMP parallel, !$OMP do (schedule(dynamic)), OpenMP
()OpenMP
critical, atomic
(private, reduction)
Common, module, private, threadprivate
common, module
14
Hartree-Fock, (DFT) 2()
O(N3N4) O(N2)
-
O(N5) O(N4)
O(N5) O(N4)
N: (or)
15
N x 2:     N^3 x 8,        N^5 x 32
N x 10:    N^3 x 1,000,    N^5 x 100,000
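The growth factors quoted here follow directly from the power-law scaling. A minimal Python sketch (illustration only; `cost_growth` is a hypothetical helper):

```python
def cost_growth(size_factor, exponent):
    """Factor by which an O(N^exponent) cost grows when N is multiplied by size_factor."""
    return size_factor ** exponent

for factor in (2, 10):
    # Growth of O(N^3) and O(N^5) costs when the system size grows by `factor`.
    print(factor, cost_growth(factor, 3), cost_growth(factor, 5))
```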
Hartree-Fock
,,
||2i
iiCCHF
(AO)2
SCFC
Fock
$FC = SC\varepsilon$
F: Fock matrix, C: molecular orbital coefficients, S: overlap matrix, ε: orbital energies
C
AO two-electron integrals + Fock matrix construction (O(N^4))
Fock matrix diagonalization (O(N^3))
C
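$(\mu\nu|\lambda\sigma) = \int d\mathbf{r}_1 \int d\mathbf{r}_2 \, \phi_\mu(\mathbf{r}_1)\,\phi_\nu(\mathbf{r}_1)\,\frac{1}{r_{12}}\,\phi_\lambda(\mathbf{r}_2)\,\phi_\sigma(\mathbf{r}_2)$
$\phi_\mu(\mathbf{r}_1)$: Gaussian basis function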
16
MPI/OpenMP1
!$OMP parallel do schedule(dynamic,1) reduction(+:Fock)
do mu = n, 1, -1   ! outermost loop (index name assumed), shared among OpenMP threads
MPI/OpenMP2
GAMESS
Loop 1: OpenMP; loop 3: MPI
MPI MPIOpenMP
MPImodIF
MPI
IF
MPIOpenMP
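The division of work between MPI ranks and OpenMP threads can be sketched as follows. This is a Python illustration under stated assumptions, not the GAMESS implementation: the outermost loop is the one that would be shared dynamically among threads, and the third loop is split cyclically over MPI ranks with a `mod`-based starting point. The exact `start` formula and the name `rank_tasks` are assumptions; the slide only indicates that `mod` and `IF` are used for the MPI distribution.

```python
from collections import Counter

def rank_tasks(n, rank, nproc):
    """Yield the (mu, nu, lam) index triples handled by one MPI rank."""
    for mu in range(n, 0, -1):              # loop 1: would be shared among OpenMP threads
        for nu in range(1, mu + 1):         # loop 2
            munu = mu * (mu - 1) // 2 + nu  # compound index of the (mu, nu) pair
            start = (munu + rank) % nproc + 1   # assumed mod-based offset per rank
            for lam in range(start, mu + 1, nproc):  # loop 3: cyclic over MPI ranks
                yield (mu, nu, lam)

n, nproc = 6, 4
per_rank = [list(rank_tasks(n, r, nproc)) for r in range(nproc)]
all_tasks = [t for tasks in per_rank for t in tasks]
counts = Counter(all_tasks)

print("tasks per rank:", [len(tasks) for tasks in per_rank])
print("every task assigned exactly once:", set(counts.values()) == {1})
```

Shifting the starting point with the compound index means the short inner loops do not always begin on the same rank, which helps balance the load without any communication.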
18
MPI/OpenMP3
OpenMP
(shared), (private)
private, common, threadprivate
xyzxyz
19
SCF, Newton-Raphson, Second-Order SCF
20
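$C_1 = S_{11}^{-1} S_{12} C_2 \left( C_2^{t} S_{12}^{t} S_{11}^{-1} S_{12} C_2 \right)^{-1/2}$
C1: initial orbital coefficients, C2: Huckel orbital coefficients, S11: overlap matrix of the basis functions, S12: overlap matrix between the basis functions and the Huckel basis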
D. Cremer and J. Gauss, J. Comput. Chem. 7 274 (1986).
Hartree-Fock
21
Computer: Cray XT5, 2048 CPU cores (Opteron 2.4GHz, Shanghai, 8 cores/node)
Compiler: PGI fortran compiler-8.0.2
BLAS, LAPACK: XT-Libsci-10.3.3.5
MPI: XT-mpt-3.1.2.1 (MPICH2-1.0.6p1)
Program: GAMESS
Molecule: TiO2 (Ti35O70) (6-31G, 1645 functions, 30 SCF cycles)
TiO2
22
[Figure: TiO2 scaling plot; horizontal axis: number of CPU cores (0 to 2048); vertical axis: 0.0 to 1400.0; series: GAMESS (Flat MPI), this work (Flat MPI), this work (MPI/OpenMP)]
MPI/OpenMP
Table: total time in seconds (speed-up in parentheses)
CPU cores                 16                256               1024             2048
Original GAMESS, MPI      18176.4 (16.0)    1368.6 (212.5)    527.6 (551.2)    383.5 (758.3)
This work, MPI            18045.6 (16.0)    1241.2 (232.6)    428.7 (673.5)    273.7 (1054.9)
This work, MPI/OpenMP     18121.6 (16.0)    1214.6 (238.7)    381.1 (760.8)    234.2 (1238.0)
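The values in parentheses appear to be speed-ups normalized so that the 16-core run counts as 16.0. A short Python sketch (illustration only; `speedup` is a hypothetical helper) recomputes them for the MPI/OpenMP row and adds the corresponding parallel efficiency:

```python
def speedup(t_ref, t, ref_cores=16):
    """Speed-up relative to a reference run that is assigned the value ref_cores."""
    return ref_cores * t_ref / t

cores = (16, 256, 1024, 2048)
hybrid = (18121.6, 1214.6, 381.1, 234.2)   # MPI/OpenMP timings (sec) from the table

values = [round(speedup(hybrid[0], t), 1) for t in hybrid]
efficiency = [round(100 * s / c, 1) for s, c in zip(values, cores)]

print("speed-up:", values)
print("parallel efficiency (%):", efficiency)
```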
TiO2
23
Table: Fock matrix time in seconds (speed-up in parentheses)
CPU cores                 16                256               1024             2048
Original GAMESS, MPI      17881.8 (16.0)    1175.2 (243.5)    334.0 (856.6)    188.6 (1517.0)
This work, MPI            17953.5 (16.0)    1175.2 (244.4)    360.0 (797.9)    203.1 (1414.4)
This work, MPI/OpenMP     17777.6 (16.0)    1150.4 (247.3)    316.4 (899.0)    174.8 (1627.2)

Table: time in seconds (fraction of the total time in parentheses)
CPU cores                 16               256              1024             2048
Original GAMESS, MPI      166.2 (0.9%)     143.6 (10.5%)    143.6 (27.2%)    143.8 (37.5%)
This work, MPI            20.2 (0.1%)      18.6 (1.5%)      18.9 (4.4%)      19.2 (7.0%)
This work, MPI/OpenMP     18.6 (0.1%)      13.2 (1.1%)      13.6 (3.6%)      13.8 (5.9%)
HF
CPU
1%20484
OpenMPGAMESS
CommonHartree-FockDFT
commonOpenMP
24
2(MP2)
(-)
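$(ai|bj) = \sum_{\mu\nu\lambda\sigma} C_{\mu a} C_{\nu i} C_{\lambda b} C_{\sigma j} (\mu\nu|\lambda\sigma)$
$E_{\mathrm{MP2}} = \sum_{ij}^{occ} \sum_{ab}^{vir} \frac{(ai|bj)\left[2(ai|bj) - (aj|bi)\right]}{\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}$
(μν|λσ) (O(N^4))
(μi|λσ) (O(N^5))
(μi|λj) (O(N^5))
(ai|λj) (O(N^5))
(ai|bj) (O(N^5))
MP2 energy (O(N^4))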
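The sequence of steps above can be sketched in a few lines. The Python example below (illustration only, with random numbers standing in for real integrals and coefficients) transforms one index at a time, checks the result against a direct four-index sum, and evaluates the closed-shell MP2 energy expression. All function names are hypothetical.

```python
import itertools
import random

def transform_index(tensor, dims, coeff, pos):
    """Contract AO index `pos` of a 4-index tensor with coeff[ao][mo] (one quarter transformation)."""
    n_ao, n_mo = len(coeff), len(coeff[0])
    new_dims = list(dims)
    new_dims[pos] = n_mo
    out = {}
    for idx in itertools.product(*(range(d) for d in new_dims)):
        total = 0.0
        for ao in range(n_ao):
            src = idx[:pos] + (ao,) + idx[pos + 1:]
            total += coeff[ao][idx[pos]] * tensor[src]
        out[idx] = total
    return out, new_dims

def ao_to_mo(ao, n_ao, c_occ, c_virt):
    """(mu nu|lam sig) -> (mu i|lam sig) -> (mu i|lam j) -> (a i|lam j) -> (a i|b j)."""
    t, dims = ao, [n_ao] * 4
    for coeff, pos in ((c_occ, 1), (c_occ, 3), (c_virt, 0), (c_virt, 2)):
        t, dims = transform_index(t, dims, coeff, pos)
    return t  # keys are (a, i, b, j)

def mp2_energy(mo, eps_occ, eps_virt):
    """Sum of (ai|bj)[2(ai|bj) - (aj|bi)] / (e_i + e_j - e_a - e_b)."""
    e2 = 0.0
    for (a, i, b, j), v in mo.items():
        denom = eps_occ[i] + eps_occ[j] - eps_virt[a] - eps_virt[b]
        e2 += v * (2.0 * v - mo[(a, j, b, i)]) / denom
    return e2

def direct_element(ao, n_ao, c_occ, c_virt, a, i, b, j):
    """Reference value from the untruncated four-fold sum over AO indices."""
    return sum(c_virt[m][a] * c_occ[n][i] * c_virt[l][b] * c_occ[s][j] * ao[(m, n, l, s)]
               for m, n, l, s in itertools.product(range(n_ao), repeat=4))

random.seed(0)
n_ao, n_occ, n_virt = 3, 1, 2
ao = {idx: random.uniform(-1, 1) for idx in itertools.product(range(n_ao), repeat=4)}
c_occ = [[random.uniform(-1, 1) for _ in range(n_occ)] for _ in range(n_ao)]
c_virt = [[random.uniform(-1, 1) for _ in range(n_virt)] for _ in range(n_ao)]

mo = ao_to_mo(ao, n_ao, c_occ, c_virt)
max_diff = max(abs(v - direct_element(ao, n_ao, c_occ, c_virt, *key))
               for key, v in mo.items())
print("max deviation from direct transformation:", max_diff)
print("toy MP2 energy:", mp2_energy(mo, [-0.5], [0.4, 0.9]))
```

Transforming one index at a time keeps every step at O(N^5), whereas the direct four-fold sum used here only as a check would scale as O(N^8) if applied to all elements.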
Hartree-Fock
25
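ε: orbital energies, C: molecular orbital coefficients
i, j: occupied orbitals; a, b: virtual orbitals; μ, ν, λ, σ: atomic orbitals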
AOMO
AO:MO2
MO:AO2 or Broadcast R. A. Whiteside, J. S. Binkley, M. E. Colvin, H. F. Schaefer (1987) (32CPU)
I. M. B. Nielsen, C. L. Janssen (2000) (MPI + pthreads)
Global arrays, ARMCI, DDI
AOMO
J. Baker, P. Pulay (2002) ()
26
MP2
27
K. Ishimura, P. Pulay, S. Nagase, J Comput Chem 2006, 27, 407.
do μ, λ (AO index)
   calculate (μν|λσ) [ν, σ] (all ν, σ)      AO integrals
   calculate (μi|λσ) [i, σ] (all i)         1st transformation
   calculate (μi|λj) [ij, μ, λ] (i≥j)       2nd transformation
end do μ, λ
calculate (μi|bj) [b, ij] (all b)           3rd transformation
(μi|bj) [b, ij, μ]
do i, j (MO index)
   (μi|bj): MPI_isend, irecv
   calculate (ai|bj) [b, a] (all a, b)      4th transformation
   calculate MP2 energy
end do i, j
MPIAOMO
BLAS
DGEMM(-)
DGEMM
3
28
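calculate (μi|λj) [ij, μ, λ] (i≥j)   2nd transformation
calculate (μi|bj) [b, ij] (all b)    3rd transformation
$(\mu i|bj) = \sum_{\lambda} C_{\lambda b}\,(\mu i|\lambda j)$
do μ
   call dgemm('T','T',...,C,...,(μi|λj),...,(μi|bj),...,zero,...)
enddo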
DGEMM tip 1
For A(M,K) and B(K,N), C = AB can be stored in two ways, C(M,N) or C(N,M):
C(M,N): DGEMM('N','N',...,A,...,B,...,C,...)
C(N,M): DGEMM('T','T',...,B,...,A,...,C,...)
29
[Figure: block diagrams of the product of A and B for the two layouts, C1 (M x N) and C2 (N x M)]
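The second DGEMM call works because the transpose of a product is the product of the transposes in reverse order, (AB)^t = B^t A^t. A small pure-Python check (illustration only; `matmul` and `transpose` are hypothetical helpers, not BLAS routines):

```python
def matmul(a, b):
    """Plain triple-loop matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

A = [[1, 2, 3],
     [4, 5, 6]]          # M = 2, K = 3
B = [[1, 0],
     [2, 1],
     [0, 3]]             # K = 3, N = 2

C_mn = matmul(A, B)                          # analogue of DGEMM('N','N',...,A,...,B,...)
C_nm = matmul(transpose(B), transpose(A))    # analogue of DGEMM('T','T',...,B,...,A,...)

print(C_mn)
print(C_nm)
```

Choosing the call that produces the layout needed by the next step avoids a separate transposition pass over C.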
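DGEMM tip 2
For a 3-index array A(M,L,K) and B(K,N), C = AB can be stored as C(M,L,N), C(N,M,L), C(M,N,L), or C(N,L,M):
C(M,L,N): DGEMM('N','N',...,A,...,B,...,C,...), treating A as A(M*L,K) and C as C(M*L,N)
C(N,M,L): DGEMM('T','T',...,B,...,A,...,C,...), treating A as A(M*L,K) and C as C(N,M*L)
C(M,N,L): loop over L and call DGEMM on A(1:M,l,1:K)
A(M,L,K+1)
C(N,L,M): A(M,L,K+1), C(N,L,M+1)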
30
do l
   call DGEMM('N','N',M,N,K,one,A(1,l,1),M*L,B,K,zero,C(1,1,l),M)
enddo
do l
   call DGEMM('T','T',N,M,K,one,B,K,A(1,l,1),M*L,zero,C(1,l,1),N*L)
enddo
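The first loop relies on the leading dimension argument: starting at A(1,l,1) with leading dimension M*L makes DGEMM see the M x K slice of A for a fixed l without copying it. The Python sketch below mimics this on flat column-major lists. It is an illustration only; `dgemm_nn` is a hypothetical stand-in that covers just the no-transpose case with alpha = 1 and beta = 0, not the BLAS routine.

```python
def dgemm_nn(m, n, k, a, a_off, lda, b, ldb, c, c_off, ldc):
    """C(m,n) = A(m,k) * B(k,n) on flat column-major lists with leading dimensions."""
    for j in range(n):
        for i in range(m):
            c[c_off + i + j * ldc] = sum(a[a_off + i + p * lda] * b[p + j * ldb]
                                         for p in range(k))

M, L, K, N = 2, 3, 2, 2

# A(M,L,K) stored column-major: element (m, l, k), 0-based, sits at m + M*(l + L*k).
A = [float(m + 10 * l + 100 * k) for k in range(K) for l in range(L) for m in range(M)]
# B(K,N) stored column-major: B(1,1)=1, B(2,1)=2, B(1,2)=3, B(2,2)=4.
B = [1.0, 2.0, 3.0, 4.0]
# C(M,N,L) stored column-major: element (m, n, l) sits at m + M*(n + N*l).
C = [0.0] * (M * N * L)

for l in range(L):
    # Mirrors: call DGEMM('N','N',M,N,K,one,A(1,l,1),M*L,B,K,zero,C(1,1,l),M)
    dgemm_nn(M, N, K, A, M * l, M * L, B, K, C, M * N * l, M)

print(C)
```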
[Figure: memory layout of A, showing blocks (M,1), (M,2), (M,3), (M,4) along K]
MP2
31
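Computer: Pentium4 3.0GHz
Network: Gigabit Ethernet
Program: GAMESS
Molecule: Taxol (C47H51NO14)
Basis set               6-31G(d)      6-311G(d,p)
Basis functions         1032          1484
Occupied orbitals       164           164
Virtual orbitals        806           1258
Memory per process      0.67GB        0.96GB
Disk per process        90GB/nproc    202GB/nproc
Total disk              90GB          202GB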
MP2
32
Number of CPUs            1       2       4       8       16
6-31G(d) (1032 AOs)
  Time (hour)             10.2    5.08    2.54    1.31    0.64
  Speed-up                1.0     2.0     4.0     7.8     15.8
6-311G(d,p) (1484 AOs)
  Time (hour)             31.6    16.3    8.06    4.05    2.05
  Speed-up                1.0     1.9     3.9     7.8     15.4
MP2
33
[Equations: MP2 energy gradient. Symbols on the slide: the gradient E_MP2^x, derivative integrals H^x, S^x and (ia|jb)^x, amplitudes t_ij^ab with denominators D_ij^ab, second-order density matrices P^(2), energy-weighted matrices W^(2)[I], W^(2)[II], W^(2)[III], orbital energies, and the Lagrangian L_ai; sums run over occupied (i, j, k), virtual (a, b, c), and all (p, q) orbitals.]
MP2
34
aijaiabijabijab
ij tIWIWIWPPt ,,,,,