Scaling Up Quantum Chemistry Calculations 2

  • ( (TCCI))

    2013A, No. 15, 25 July 2013

  • 2

    Hartree-Fock

    MP2

    MP2

  • 2

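    Rys quadrature (Rys polynomials)

    Pople-Hehre (coordinate-axis rotation)

    McMurchie-Davidson (Hermite Gaussian)

    Obara-Saika (vertical recurrence relation (VRR))

    Head-Gordon-Pople (horizontal recurrence relation (HRR) + VRR)

    ACE (accompanying coordinate expansion)

    PRISM (contraction)

    (\mu\nu|\lambda\sigma) = \int d\mathbf{r}_1 \int d\mathbf{r}_2 \, \phi_\mu(\mathbf{r}_1) \phi_\nu(\mathbf{r}_1) \frac{1}{r_{12}} \phi_\lambda(\mathbf{r}_2) \phi_\sigma(\mathbf{r}_2)

    \phi_\mu(\mathbf{r}_1): Gaussian basis function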

  • 2

    Method      PH     MD    HGP    DRK
    x          220   1500   1400   1056
    y         2300   1700     30     30
    z         4000      0    800    800

    (sp,sp|sp,sp)

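    (\mu\nu|\lambda\sigma) = \int d\mathbf{r}_1 \int d\mathbf{r}_2 \, \phi_\mu(\mathbf{r}_1) \phi_\nu(\mathbf{r}_1) \frac{1}{r_{12}} \phi_\lambda(\mathbf{r}_2) \phi_\sigma(\mathbf{r}_2)

    \phi_\mu(\mathbf{r}) = \sum_{i}^{K} c_i \exp\left(-\alpha_i |\mathbf{r}-\mathbf{A}|^2\right)

    Two-electron integral cost = xK^4 + yK^2 + z   (K: degree of contraction)

    \exp\left(-\alpha |\mathbf{r}-\mathbf{A}|^2\right) \exp\left(-\beta |\mathbf{r}-\mathbf{B}|^2\right) = \exp\left(-\frac{\alpha\beta}{\alpha+\beta} |\mathbf{A}-\mathbf{B}|^2\right) \exp\left(-(\alpha+\beta) |\mathbf{r}-\mathbf{P}|^2\right)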

  • Pople-Hehre

    xAB=0 xPQ=

    yAB=0 yPQ=

    yCD=0

    McMurchie-Davidson

    (ss|ss)

    [0](m)(=(ss|ss)(m))-> (r) -> (p|q) -> (AB|CD)

    [Figure: centers A, B, C, D, P, Q in the x, y, z coordinate system]

    (AB|CD)

    2

    K. Ishimura, S. Nagase, Theoret Chem Acc, (2008) 120, 185-189.

  • [0](m)(=(ss|ss)(m)) -> [A+B+C+D] -> [A+B|C+D] -> [AB|CD]

    xPQ= xAB=0

    yPQ= yAB=0

    yCD=0

    Two-electron integral cost = xK^4 + yK^2 + z   (K: degree of contraction)

    (sp,sp|sp,sp)

    Method      PH   PH+MD
    x          220     180
    y         2300    1100
    z         4000    5330

    6-31G(d), cc-pVDZ

    Method            PH   PH+MD
    K=1             6520    6583
    K=2            16720   12490
    K=3 (STO-3G)   42520   29535
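    The cost model xK^4 + yK^2 + z can be checked against the tabulated operation counts. The sketch below is an illustration only; the function and dictionary names are hypothetical. With the PH parameters it reproduces the K=1 to K=3 counts exactly. With the PH+MD parameters as printed it gives 6610, 12610, and 29810, about 1% above the tabulated 6583, 12490, and 29535, presumably because the listed y and z are rounded.

```python
def eri_cost(x, y, z, K):
    """Operation-count model x*K**4 + y*K**2 + z for one integral class."""
    return x * K**4 + y * K**2 + z


# (x, y, z) exactly as printed in the (sp,sp|sp,sp) table
PARAMS = {
    "PH": (220, 2300, 4000),
    "PH+MD": (180, 1100, 5330),
}


def cost_table(max_K=3):
    """Model cost for K = 1..max_K for every method in PARAMS."""
    return {
        name: [eri_cost(x, y, z, K) for K in range(1, max_K + 1)]
        for name, (x, y, z) in PARAMS.items()
    }


if __name__ == "__main__":
    for name, costs in cost_table().items():
        print(name, costs)
```

    The K^4 term dominates quickly, which is why the smaller x of PH+MD pays off for contracted basis sets even though its z is larger.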

  • (ss|ss) through (dd|dd): 21 integral classes

    xAB=0, yAB=0; Fortran, Perl

    Fortran, Perl

    2

    do, sqrt

    GAMESS: int2b.src (sp), int2[r-w].src (spd)

  • 2 4

    2005GAMESS

    8doCPU

    6d5d(GAMESS)

    Fock matrix construction time in GAMESS (sec), Pentium4 3.0GHz

    Molecule                Taxol (C47H51NO14)   Taxol (C47H51NO14)    Luciferin (C11H8N2O3S2)
    Basis set               STO-3G (361 AOs)     6-31G(d) (1032 AOs)   aug-cc-pVDZ (550 AOs)
    Original GAMESS (PH)    85.7                 2015.2                2014.9
    PH+MD                   69.9                 1361.8                1154.5

  • FMO, DC, ONIOM, QM/MM

    ECP, Frozen core,

  • 1 ()

    SIMD

    1

    do, for

    ()

    A(ix, iy, iz) or A(iz, iy, ix)

    10 100 10 1

    L2 L1
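    The contrast between A(ix, iy, iz) and A(iz, iy, ix) concerns which index the innermost loop runs over. Fortran stores arrays column-major, so the leftmost index is contiguous in memory. The sketch below, with hypothetical helper names, lists the memory offsets visited by two loop orders over the same array; it illustrates the access pattern only and does not measure anything.

```python
def fortran_offset(ix, iy, iz, nx, ny):
    """0-based memory offset of A(ix, iy, iz) in a Fortran array A(nx, ny, nz).

    Indices are 1-based; column-major storage makes ix the fastest index.
    """
    return (ix - 1) + nx * ((iy - 1) + ny * (iz - 1))


def visit_order(nx, ny, nz, innermost):
    """Offsets touched by a triple loop whose innermost loop runs over `innermost`."""
    offsets = []
    if innermost == "ix":
        for iz in range(1, nz + 1):
            for iy in range(1, ny + 1):
                for ix in range(1, nx + 1):
                    offsets.append(fortran_offset(ix, iy, iz, nx, ny))
    elif innermost == "iz":
        for ix in range(1, nx + 1):
            for iy in range(1, ny + 1):
                for iz in range(1, nz + 1):
                    offsets.append(fortran_offset(ix, iy, iz, nx, ny))
    else:
        raise ValueError("innermost must be 'ix' or 'iz'")
    return offsets


if __name__ == "__main__":
    print(visit_order(2, 3, 4, "ix")[:6])  # consecutive addresses
    print(visit_order(2, 3, 4, "iz")[:4])  # jumps of nx*ny elements
```

    With ix innermost every access is to the next memory location, which suits cache lines and SIMD loads; with iz innermost each access jumps by nx*ny elements.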

  • 2 ()

    BLAS, LAPACK

    BLAS, CPU (100)

    BLAS2, BLAS3

    Level                    Operations   Data
    BLAS2 (matrix-vector)    O(N^2)       O(N^2)
    BLAS3 (matrix-matrix)    O(N^3)       O(N^2)
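    The difference between the two rows is the ratio of arithmetic to data moved. A rough count, assuming square N x N operands and counting one multiply and one add per inner-product term (the function names are hypothetical):

```python
def blas2_counts(n):
    """Flops and data elements for y = A x with an n x n matrix."""
    flops = 2 * n * n       # one multiply and one add per matrix element
    data = n * n + 2 * n    # matrix A plus vectors x and y
    return flops, data


def blas3_counts(n):
    """Flops and data elements for C = A B with n x n matrices."""
    flops = 2 * n**3        # n multiply-add pairs for each of n*n results
    data = 3 * n * n        # matrices A, B and C
    return flops, data


def intensity(counts):
    """Flops per data element."""
    flops, data = counts
    return flops / data


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, intensity(blas2_counts(n)), intensity(blas3_counts(n)))
```

    The BLAS2 ratio stays near 2 for any N, while the BLAS3 ratio grows as 2N/3, so matrix-matrix kernels can reuse data in cache and run much closer to peak.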

  • 1

    116GB, 81PB

    ()

    (MPI)(OpenMP)

  • 2(MPI)

    MPI

    ()

    (1MB)

    OpenMPMPI ()

    MPI

    MPI

  • 3(OpenMP)

    OpenMP: !$OMP parallel, !$OMP do (schedule(dynamic))

    ()OpenMP

    critical, atomic

    (private, reduction)

    Common and module variables: private via threadprivate

    common, module

  • Hartree-Fock, (DFT) 2()

    O(N^3)-O(N^4) O(N^2)

    -

    O(N^5) O(N^4)

    O(N^5) O(N^4)

    N: (or)

    Size x2: N^3 cost x8, N^5 cost x32; size x10: N^3 cost x1,000, N^5 cost x100,000

  • Hartree-Fock

    ,,

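    F_{\mu\nu} = H_{\mu\nu} + \sum_{i}^{occ} \sum_{\lambda\sigma} C_{\lambda i} C_{\sigma i} \left[ 2(\mu\nu|\lambda\sigma) - (\mu\lambda|\nu\sigma) \right]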

    (AO)2

    SCFC

    Fock

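    F: Fock matrix, C: MO coefficients, S: overlap matrix, e: orbital energies

    C

    AO two-electron integrals + Fock matrix construction (O(N^4))

    Fock matrix diagonalization (O(N^3))

    C

    (\mu\nu|\lambda\sigma) = \int d\mathbf{r}_1 \int d\mathbf{r}_2 \, \phi_\mu(\mathbf{r}_1) \phi_\nu(\mathbf{r}_1) \frac{1}{r_{12}} \phi_\lambda(\mathbf{r}_2) \phi_\sigma(\mathbf{r}_2)

    \phi_\mu(\mathbf{r}_1): Gaussian basis function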

  • MPI/OpenMP1

    !$OMP parallel do schedule(dynamic,1) reduction(+:Fock)

    do mu = n, 1, -1

  • MPI/OpenMP2

    GAMESS

    1st loop: OpenMP, 3rd loop: MPI

    MPI, MPI/OpenMP

    MPI: mod, IF

    MPI

    IF

    MPI, OpenMP
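    The notes above place OpenMP on the first loop and MPI on the third, with a mod-based assignment that avoids IF tests. The loop body itself is not preserved here, so the following is only a sketch under stated assumptions: a cyclic distribution in which each rank starts the third loop at an offset computed with mod and then strides by the number of ranks. The function names, the compound index munu, and the exact start formula are hypothetical.

```python
def assigned_triples(nbasis, nproc, rank):
    """(mu, nu, lam) index triples handled by one MPI rank (assumed scheme)."""
    work = []
    for mu in range(nbasis, 0, -1):           # first loop: shared among OpenMP threads
        for nu in range(1, mu + 1):
            munu = mu * (mu - 1) // 2 + nu    # compound index of the (mu, nu) pair
            start = (munu + rank) % nproc + 1
            # third loop: each rank takes every nproc-th lam, no IF test needed
            for lam in range(start, mu + 1, nproc):
                work.append((mu, nu, lam))
    return work


def all_ranks(nbasis, nproc):
    """Concatenate the work lists of every rank."""
    return [t for rank in range(nproc) for t in assigned_triples(nbasis, nproc, rank)]


if __name__ == "__main__":
    for rank in range(4):
        print(rank, len(assigned_triples(6, 4, rank)))
```

    Because the start offset shifts with the pair index, consecutive (mu, nu) pairs begin on different ranks, which spreads the short inner loops evenly; every triple is visited by exactly one rank.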

  • MPI/OpenMP3

    OpenMP

    (shared), (private)

    private: common, threadprivate

    xyzxyz

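  • SCF, Newton-Raphson, Second-Order SCF

    C_1 = S_{11}^{-1} S_{12} C_2 \left( C_2^{t} S_{12}^{t} S_{11}^{-1} S_{12} C_2 \right)^{-1/2}

    C_1: MO coefficients in the target basis, C_2: Huckel MO coefficients, S_{11}: overlap matrix of the target basis, S_{12}: overlap between the target basis and the Huckel basis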

    D. Cremer and J. Gauss, J. Comput. Chem. 7 274 (1986).

  • Hartree-Fock

    Machine: Cray XT5, 2048 CPU cores (Opteron 2.4GHz, Shanghai, 8 cores/node)

    Compiler: PGI fortran compiler-8.0.2

    BLAS, LAPACK: XT-Libsci-10.3.3.5

    MPI: XT-mpt-3.1.2.1 (MPICH2-1.0.6p1)

    Program: GAMESS

    Molecule: TiO2 (Ti35O70) (6-31G, 1645 functions, 30 SCF cycles)

  • TiO2

    [Figure: speed-up vs. number of CPU cores (0 to 2048) for GAMESS (Flat MPI), this work (Flat MPI), and this work (MPI/OpenMP)]

    Table: overall computation time (sec) and (speed-up)

    CPU cores                 16               256              1024            2048
    GAMESS      MPI           18176.4 (16.0)   1368.6 (212.5)   527.6 (551.2)   383.5 (758.3)
    This work   MPI           18045.6 (16.0)   1241.2 (232.6)   428.7 (673.5)   273.7 (1054.9)
    This work   MPI/OpenMP    18121.6 (16.0)   1214.6 (238.7)   381.1 (760.8)   234.2 (1238.0)
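    The values in parentheses are consistent with a speed-up normalized so that the 16-core run counts as 16.0. A minimal sketch of that convention, using the GAMESS MPI timings from the table (the function name is hypothetical):

```python
def scaled_speedups(times, base_cores):
    """Speed-up relative to the first timing, which is assigned the value base_cores."""
    t_base = times[0]
    return [base_cores * t_base / t for t in times]


if __name__ == "__main__":
    cores = [16, 256, 1024, 2048]
    gamess_mpi = [18176.4, 1368.6, 527.6, 383.5]
    for n, s in zip(cores, scaled_speedups(gamess_mpi, base_cores=16)):
        print(n, round(s, 1), f"{100.0 * s / n:.0f}% efficiency")
```

    Dividing each speed-up by the core count gives the parallel efficiency, which makes the gap between the three rows at 2048 cores easier to compare.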

  • TiO2

    Table: Fock matrix construction time (sec) and (speed-up)

    CPU cores                 16               256              1024            2048
    GAMESS      MPI           17881.8 (16.0)   1175.2 (243.5)   334.0 (856.6)   188.6 (1517.0)
    This work   MPI           17953.5 (16.0)   1175.2 (244.4)   360.0 (797.9)   203.1 (1414.4)
    This work   MPI/OpenMP    17777.6 (16.0)   1150.4 (247.3)   316.4 (899.0)   174.8 (1627.2)

    Table: time (sec) and (share of the overall computation time)

    CPU cores                 16             256             1024            2048
    GAMESS      MPI           166.2 (0.9%)   143.6 (10.5%)   143.6 (27.2%)   143.8 (37.5%)
    This work   MPI           20.2 (0.1%)    18.6 (1.5%)     18.9 (4.4%)     19.2 (7.0%)
    This work   MPI/OpenMP    18.6 (0.1%)    13.2 (1.1%)     13.6 (3.6%)     13.8 (5.9%)

  • HF

    CPU

    1%20484

    OpenMPGAMESS

    CommonHartree-FockDFT

    commonOpenMP

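  • Second-order Møller-Plesset perturbation theory (MP2)

    (ai|bj) = \sum_{\mu\nu\lambda\sigma} C_{\mu a} C_{\nu i} C_{\lambda b} C_{\sigma j} (\mu\nu|\lambda\sigma)

    E_{MP2} = \sum_{ij}^{occ} \sum_{ab}^{vir} \frac{(ai|bj) \left[ 2(ai|bj) - (aj|bi) \right]}{\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}

    (\mu\nu|\lambda\sigma) (O(N^4))

    (\mu i|\lambda\sigma) (O(N^5))

    (\mu i|\lambda j) (O(N^5))

    (ai|\lambda j) (O(N^5))

    (ai|bj) (O(N^5))

    MP2 (O(N^4))

    Hartree-Fock

    \varepsilon: orbital energies, C: MO coefficients

    i,j: occupied orbitals; a,b: virtual orbitals; \mu,\nu,\lambda,\sigma: atomic orbitals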
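    Once the transformed integrals (ai|bj) are available, the final O(N^4) step is a plain sum over orbital pairs. The sketch below assumes the standard closed-shell expression, E = sum over i,j,a,b of (ai|bj)[2(ai|bj) - (aj|bi)] / (e_i + e_j - e_a - e_b), and a nested-list layout ints[a][i][b][j]; the function name, the layout, and the toy numbers are illustrative assumptions, not the production data structure.

```python
def mp2_energy(ints, eps_occ, eps_vir):
    """Closed-shell MP2 correlation energy from MO integrals.

    ints[a][i][b][j] holds (ai|bj); eps_occ and eps_vir are orbital energies.
    """
    energy = 0.0
    for i, e_i in enumerate(eps_occ):
        for j, e_j in enumerate(eps_occ):
            for a, e_a in enumerate(eps_vir):
                for b, e_b in enumerate(eps_vir):
                    coulomb = ints[a][i][b][j]    # (ai|bj)
                    exchange = ints[a][j][b][i]   # (aj|bi)
                    denom = e_i + e_j - e_a - e_b
                    energy += coulomb * (2.0 * coulomb - exchange) / denom
    return energy


if __name__ == "__main__":
    # Toy system: one occupied and one virtual orbital, (ai|bj) = 0.1
    toy_ints = [[[[0.1]]]]
    print(mp2_energy(toy_ints, eps_occ=[-0.5], eps_vir=[0.5]))
```

    For the toy input the single term is 0.1 * (0.2 - 0.1) / (-2.0). Each (i, j) pair is independent of the others, which is what allows this step to be distributed over MO index pairs.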

  • AOMO

    AO: MO 2

    MO: AO 2 or Broadcast

    R. A. Whiteside, J. S. Binkley, M. E. Colvin, H. F. Schaefer (1987) (32 CPU)

    I. M. B. Nielsen, C. L. Janssen (2000) (MPI + pthreads)

    Global arrays, ARMCI, DDI

    AO, MO

    J. Baker, P. Pulay (2002) ()

  • MP2

    K. Ishimura, P. Pulay, S. Nagase, J Comput Chem 2006, 27, 407.

    do μ, λ (AO index)
      calculate (μν|λσ) [ν, σ] (all ν, σ)      AO integrals
      calculate (μi|λσ) [i, σ] (all i)         1st transformation
      calculate (μi|λj) [ij, μ, λ] (ij)        2nd transformation
    end do μ, λ
    calculate (μi|bj) [b, ij] (all b)          3rd transformation
    (μi|bj) [b, ij, μ]
    do i,j (MO index)
      (μi|bj) + MPI_isend,irecv
      calculate (ai|bj) [b, a] (all a,b)       4th transformation
      calculate MP2 energy
    end do i,j

    MPI: AO, MO

  • BLAS

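    DGEMM (matrix-matrix multiplication)

    DGEMM

    3

    calculate (μi|λj) [ij, μ, λ] (ij)      2nd transformation
    calculate (μi|bj) [b, ij] (all b)      3rd transformation

    (\mu i|bj) = \sum_{\lambda} C_{\lambda b} (\mu i|\lambda j)

    do
      call dgemm('T','T',...,C,...,(μi|λj),...,(μi|bj),...,zero,...)
    enddo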

  • DGEMM1

    For A(M,K) and B(K,N), the product C=AB can be stored as C(M,N) or as C(N,M): 2 ways to call DGEMM

    C(M,N): DGEMM(N,N,...,A,...,B,...,C,...)

    C(N,M): DGEMM(T,T,...,B,...,A,...,C,...)
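    The second call works because of the identity (AB)^t = B^t A^t: passing B and A in swapped order with both transposed yields the product already stored as (N,M). A small pure-Python check of the identity, with hypothetical helper names and arbitrary test matrices (it does not call BLAS):

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    inner = len(B)
    cols = len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


if __name__ == "__main__":
    A = [[1, 2, 3],
         [4, 5, 6]]                # M=2, K=3
    B = [[1, 0, 2, 1],
         [0, 1, 3, 1],
         [2, 3, 0, 1]]             # K=3, N=4
    C_mn = matmul(A, B)                         # stored as (M,N)
    C_nm = matmul(transpose(B), transpose(A))   # stored as (N,M)
    print(C_nm == transpose(C_mn))
```

    Choosing the call that produces the layout needed by the next step avoids a separate transpose pass over the result.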

    [Figure: C1 = A B stored as (M,N); C2 = A B stored as (N,M)]

  • DGEMM2

    For a 3-index array A(M,L,K) and B(K,N), the product C=AB can be stored as C(M,L,N), (N,M,L), (M,N,L), or (N,L,M)

    (M,L,N): DGEMM(N,N,...,A,...,B,...,C,...), with A(M*L,K), C(M*L,N)

    (N,M,L): DGEMM(T,T,...,B,...,A,...,C,...), with A(M*L,K), C(N,M*L)

    (M,N,L): L calls to DGEMM, A(1:M,l,1:K)

    A(M,L,K+1)

    do l = 1, L
      call DGEMM('N','N',M,N,K,one,A(1,l,1),M*L,B,K,zero,C(1,1,l),M)
    enddo

    (N,L,M): A(M,L,K+1), C(N,L,M+1)

    do l = 1, L
      call DGEMM('T','T',N,M,K,one,B,K,A(1,l,1),M*L,zero,C(1,l,1),N*L)
    enddo

    [Figure: layout of A with blocks (M,1), (M,2), (M,3), (M,4) along K]

  • MP2

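    Machine: Pentium4 3.0GHz

    Network: Gigabit Ethernet

    GAMESS

    Molecule: Taxol (C47H51NO14)

    Basis set                         6-31G(d)     6-311G(d,p)
    Basis functions                   1032         1484
    Occupied orbitals (correlated)    164          164
    Virtual orbitals                  806          1258
    Per process                       0.67GB       0.96GB
    Per process                       90GB/nproc   202GB/nproc
    Total                             90GB         202GB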

  • MP2

    Number of CPUs            1      2      4      8      16

    6-31G(d) (1032 AOs)
      Time (hour)             10.2   5.08   2.54   1.31   0.64
      Speed-up                1.0    2.0    4.0    7.8    15.8

    6-311G(d,p) (1484 AOs)
      Time (hour)             31.6   16.3   8.06   4.05   2.05
      Speed-up                1.0    1.9    3.9    7.8    15.4
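    The speed-up rows follow from the timings as T(1)/T(p). A minimal sketch with hypothetical function names, using the 6-311G(d,p) timings from the table:

```python
def speedups(times):
    """Speed-up of each run relative to the first (single-CPU) timing."""
    t1 = times[0]
    return [t1 / t for t in times]


def efficiencies(cpus, times):
    """Parallel efficiency: speed-up divided by the number of CPUs."""
    return [s / p for s, p in zip(speedups(times), cpus)]


if __name__ == "__main__":
    cpus = [1, 2, 4, 8, 16]
    hours = [31.6, 16.3, 8.06, 4.05, 2.05]   # 6-311G(d,p), 1484 AOs
    for p, s, e in zip(cpus, speedups(hours), efficiencies(cpus, hours)):
        print(p, round(s, 1), f"{100.0 * e:.0f}%")
```

    Recomputing the 6-31G(d) row from the rounded timings gives 15.9 rather than the tabulated 15.8 at 16 CPUs, presumably because the printed times are themselves rounded.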

  • MP2

    MP2 energy gradient

    E^x_{MP2} is assembled from the derivative integrals H^x_{pq}, S^x_{pq}, and (pq|rs)^x, contracted with:

    t_{ij}^{ab}; P^{(2)}_{ij}, P^{(2)}_{ab}; W^{(2)}[I]_{ij}, W^{(2)}[I]_{ab}, W^{(2)}[I]_{ai}; W^{(2)}[II]; W^{(2)}[III]; L_{ai}; A_{pqij}

  • MP2

    t_{ij}^{ab}, P_{ij}, P_{ab}, W_{ij}[I], W_{ab}[I], W_{ai}[I]