Graduation Thesis Presentation Slides: Extension of the Divide-and-Conquer Method

1. 2013/02/06, B4 (fourth-year undergraduate), 0811112

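2. The divide-and-conquer (D&C) method; ELPA; ELPA and ScaLAPACK

3. Divide & Conquer method
D&C solves the symmetric eigenvalue problem in three steps:
step 1. Householder transformations reduce the matrix to tridiagonal form; the transformations are accumulated in $Q$.
step 2. The tridiagonal eigenvalue problem is solved by D&C.
step 3. The eigenvectors from step 2 are back-transformed with $Q$.

4. Extended D&C
step 1. Householder reduction; step 2. D&C; step 3. back-transformation with $Q$. The extension targets step 2 (… steps 1, 3 …).

5. D&C routines
Tridiagonal D&C: DSTEDC (LAPACK), PDSTEDC (ScaLAPACK), solve_tridi (ELPA).
Extended D&C: MY_DSTEDC, MY_PDSTEDC, my_solve_band().

6. Tridiagonal D&C
procedure tridiagonal eigen($T$, $Q$, $D$)
(1) $T = \mathrm{diag}(T_1, T_2) + b_m v v^T$
(2) tridiagonal eigen($T_1$, $Q_1$, $D_1$)
(3) tridiagonal eigen($T_2$, $Q_2$, $D_2$)
(4) from $Q_1, Q_2, D_1, D_2$ form $D + c\,u u^T$, where $D = \mathrm{diag}(D_1, D_2)$, $c = b_m$ and $u = \mathrm{diag}(Q_1, Q_2)^T v$
(5) deflate($D$, $c$, $u$)
(6) rank1($D$, $c$, $u$, $\Lambda$, $\tilde{Q}$): $D + c\,u u^T = \tilde{Q} \Lambda \tilde{Q}^T$
(7) $Q = \mathrm{diag}(Q_1, Q_2)\,\tilde{Q}$
return $Q$, $D = \Lambda$

7. Band D&C
procedure bandmatrix eigen($T$, $Q$, $D$)
(1) $T = \begin{pmatrix} T_1 & C^T \\ C & T_2 \end{pmatrix}$
(2) DGESVD($C$, $W$, $S$, $V$, ...): $C = W S V^T$
(3) $Q = W S^{1/2}$, $R = S^{1/2} V^T$
(4) $T = \mathrm{diag}(T_1 - R^T R,\ T_2 - Q Q^T) + U U^T$, where $U = \begin{pmatrix} R^T \\ Q \end{pmatrix}$
(5) bandmatrix eigen($T_1 - R^T R$, $Q_1$, $D_1$)
(6) bandmatrix eigen($T_2 - Q Q^T$, $Q_2$, $D_2$)
(7) $D^{(0)} = \mathrm{diag}(D_1, D_2)$
(8) for $j := 1$ to $k$ do begin
(9)     $u_j = U_j$ (the $j$-th column of $U$)
        if $j = 1$ then
(10)        $w_j = \mathrm{diag}(Q_1, Q_2)^T u_j$
        else
(11)        $w_j = Q^T \mathrm{diag}(Q_1, Q_2)^T u_j$
(12)    deflate($D^{(j-1)}$, $c_j$, $w_j$)
(13)    rank1($D^{(j-1)}$, $c_j$, $w_j$, $D^{(j)}$, $Q^{(j)}$)
        if $j = 1$ then
(14)        $Q = Q^{(1)}$
        else
(15)        $Q = Q\,Q^{(j)}$
    end;
(16) $Q = \mathrm{diag}(Q_1, Q_2)\,Q$
return $Q$, $D = D^{(k)}$

8. ELPA (Eigenvalue soLvers for Petaflops Applications)
Built on BLAS, LAPACK, BLACS, ScaLAPACK and MPI.
"Faster replacements for ScaLAPACK..."
… ScaLAPACK … 2 …

9. Features of ELPA
1. Tridiagonalization: (i) full → band, (ii) band → tridiagonal; Householder transformations using BLAS-3 (cache blocking); MPI …
2. Tridiagonal eigensolver: D&C

10. Data distribution in ELPA and ScaLAPACK
Block size NB = 16; 16 processes (P0–P15).
[Figure: NB × NB blocks of the matrix assigned to processes P0–P15, shown for ELPA and for ScaLAPACK]

11. Experiment: D&C in ELPA (solve_tridi) vs. ScaLAPACK (PDSTEDC)
Machine: JED cluster, nodes purple01–30 and blue01–20 (50 nodes in total)
CPU: Intel(R) Core(TM)2 Duo CPU E8300 @ 2.83GHz; memory: 3598232 kB; OS: Vine Linux 6.0
Compiler: mpif90, option -O3; timing: MPI_Wtime()
Matrix size: 4000; block size: 16
Number of processes: 4, 8, 16, 19, 25, 32, 41, 50 (… 2 …)

12. Execution time of ELPA and ScaLAPACK
[Figure: execution time [sec] of ELPA and ScaLAPACK versus the number of processes (up to 50)]

13. ELPA vs. ScaLAPACK: D&C … 3 … ELPA …

14. Structure of the D&C in ELPA
1. solve_tridi
   - solve_tridi_col (1, 2)
   - merge_recursive
2. solve_tridi_col (3)
   - solve_tridi_single
   - merge_systems (4, 5)
3. solve_tridi_single
4. merge_systems (4)
5. merge_recursive

15–16. References
1. Pham Huu Phuong, …, 22.
2. T. Auckenthaler, V. Blum, H.-J. Bungartz, T. Huckle, R. Johanni, L. Krämer, B. Lang, H. Lederer, and P. R. Willems: "Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations", Parallel Computing 37, pp. 783-794 (2011).
3. http://elpa-lib.fhi-berlin.mpg.de/wiki/index.php/Main Page (ELPA …)
4. …, Vol.3, No.20-29 (June 2010).
5. …, GPU … GPU …, NVIDIA …, GTC Workshop Japan 2011.
6. …, The Japan Society for Industrial and Applied Mathematics, Vol.20, No.3, pp. 212-222, Sep. 2010 (…).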
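Steps 1 and 3 of the D&C pipeline on slides 3 and 4 can be illustrated with the pure-Python sketch below. It performs an unblocked Householder reduction of a symmetric matrix to tridiagonal form and accumulates the reflections in $Q$, so that $A = Q T Q^T$. An eigenvector $y$ of $T$ found in step 2 is then back-transformed in step 3 as $x = Q y$. This is only an illustration under simplifying assumptions: it is neither LAPACK's blocked DSYTRD nor ELPA's two-stage full → band → tridiagonal reduction, and `householder_tridiagonalize` is a hypothetical name.

```python
import math


def householder_tridiagonalize(A):
    """Step 1 (illustrative sketch): reduce a symmetric matrix A (list of rows)
    to tridiagonal T = Q^T A Q with unblocked Householder reflections.

    Returns (a, b, Q): a = diagonal of T, b = sub-diagonal of T, and the
    orthogonal Q (list of rows) needed for the back-transformation of step 3.
    """
    n = len(A)
    T = [row[:] for row in A]                      # working copy, becomes T
    Q = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        x = [T[i][k] for i in range(k + 1, n)]     # entries of column k below the diagonal
        norm_x = math.sqrt(sum(t * t for t in x))
        alpha = -math.copysign(norm_x, x[0])       # sign chosen to avoid cancellation
        v = x[:]
        v[0] -= alpha                              # v = x - alpha * e_1
        vv = sum(t * t for t in v)
        if vv == 0.0:
            continue                               # column is already reduced
        idx = range(k + 1, n)
        # T <- H T with H = I - 2 v v^T / (v^T v), acting on rows k+1..n-1
        for j in range(n):
            f = 2.0 * sum(v[p] * T[i][j] for p, i in enumerate(idx)) / vv
            for p, i in enumerate(idx):
                T[i][j] -= f * v[p]
        # T <- T H and Q <- Q H, acting on columns k+1..n-1
        for M in (T, Q):
            for row in M:
                f = 2.0 * sum(v[p] * row[i] for p, i in enumerate(idx)) / vv
                for p, i in enumerate(idx):
                    row[i] -= f * v[p]
    a = [T[i][i] for i in range(n)]
    b = [T[i + 1][i] for i in range(n - 1)]
    return a, b, Q
```

For example, on the 4 × 4 matrix `[[4, 1, -2, 2], [1, 2, 0, 1], [-2, 0, 3, -2], [2, 1, -2, -1]]`, the first off-diagonal entry comes out as $-3$ and the second diagonal entry as $10/3$. The trace (8) is preserved.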
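The tridiagonal D&C procedure on slide 6 can be sketched in pure Python as below. This is a minimal, unoptimized illustration, not the code of LAPACK DSTEDC or ELPA solve_tridi, and `tridiagonal_eigen` and `_secular_roots` are hypothetical names. The sketch makes four choices:

- **Step (1) variant.** It uses $\rho = |b_m|$ and $v = e_m + \mathrm{sign}(b_m)\,e_{m+1}$, so the rank-one term is never negative.
- **Step (6).** The eigenvalues of $D + \rho\,u u^T$ are the roots of the secular equation $1 + \rho \sum_i u_i^2/(d_i - \lambda) = 0$, found by bisection.
- **Eigenvectors.** They are taken as $(D - \lambda I)^{-1} u$.
- **No deflation.** Step (5) is omitted. The sketch therefore assumes the merged $d_i$ are distinct and the $u_i$ nonzero. This holds for generic input but fails for, e.g., `[[2, 1], [1, 2]]`.

```python
import math


def _secular_roots(d, u, rho):
    """Roots of f(x) = 1 + rho * sum(u_i^2 / (d_i - x)) for rho > 0, by bisection.

    Assumes distinct d_i and nonzero u_i (no deflation): one root then lies in
    each interval (d_(k), d_(k+1)) and one in (d_max, d_max + rho * ||u||^2].
    """
    def f(x):
        return 1.0 + rho * sum(ui * ui / (di - x) for di, ui in zip(d, u))

    ds = sorted(d)
    norm2 = sum(ui * ui for ui in u)
    roots = []
    for k in range(len(ds)):
        lo = ds[k]
        hi = ds[k + 1] if k + 1 < len(ds) else ds[k] + rho * norm2
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if mid <= lo or mid >= hi:
                break                      # interval cannot be halved further
            if f(mid) < 0.0:               # f is increasing on each interval
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots


def tridiagonal_eigen(a, b):
    """Eigenpairs of the symmetric tridiagonal matrix with diagonal a and
    off-diagonal b, by Cuppen-style divide and conquer (sketch, no deflation).

    Returns (lam, Q): lam in ascending order, column j of Q (list of rows)
    is a unit eigenvector for lam[j].
    """
    n = len(a)
    if n == 1:
        return [a[0]], [[1.0]]
    m = n // 2
    beta = b[m - 1]                        # coupling entry T[m-1][m]
    rho = abs(beta)
    s = 1.0 if beta >= 0.0 else -1.0
    # (1) T = diag(T1, T2) + rho * v v^T with v = e_m + s * e_{m+1}
    a1, a2 = a[:m], a[m:]
    a1[-1] -= rho
    a2[0] -= rho
    # (2), (3) solve both halves recursively
    d1, q1 = tridiagonal_eigen(a1, b[:m - 1])
    d2, q2 = tridiagonal_eigen(a2, b[m:])
    # (4) D + rho * u u^T with D = diag(D1, D2) and u = diag(Q1, Q2)^T v
    d = d1 + d2
    u = q1[-1] + [s * x for x in q2[0]]
    if rho == 0.0:                         # T is already block diagonal
        lam = d
        cols = [[float(i == j) for i in range(n)] for j in range(n)]
    else:
        # (6) eigenvalues from the secular equation; (5) deflation is omitted
        lam = _secular_roots(d, u, rho)
        cols = []
        for lj in lam:                     # eigenvector (D - lam I)^{-1} u, normalised
            z = [ui / (di - lj) for di, ui in zip(d, u)]
            nz = math.sqrt(sum(x * x for x in z))
            cols.append([x / nz for x in z])
    # (7) Q = diag(Q1, Q2) * [z_1 ... z_n]
    Q = [[sum(q1[r][i] * c[i] for i in range(m)) for c in cols] for r in range(m)]
    Q += [[sum(q2[r][i] * c[m + i] for i in range(n - m)) for c in cols]
          for r in range(n - m)]
    return lam, Q
```

Bisection is slower than the rational-interpolation root finders used in production D&C codes. It was chosen here because its correctness on each interval follows directly from $f$ being increasing between consecutive poles.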
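Steps (1)–(4) of the band procedure on slide 7 can be checked numerically with the sketch below. It makes these assumptions:

- $C$ denotes the `bw` × `bw` nonzero corner of the coupling block of a band matrix with half-bandwidth `bw`: rows $m, \dots, m+bw-1$ and columns $m-bw, \dots, m-1$, 0-based.
- LAPACK's DGESVD is replaced by a small one-sided Jacobi SVD, so the code runs with the Python standard library only.
- `jacobi_svd` and `split_band` are hypothetical names.

Splitting the singular values as $S^{1/2}$ on both sides, as in step (3), gives $QR = WSV^T = C$. The subtracted terms $R^T R$ and $Q Q^T$ only touch the `bw` × `bw` corners next to the split, so both halves remain band matrices with the same half-bandwidth.

```python
import math


def jacobi_svd(C):
    """One-sided (Hestenes) Jacobi SVD of a square matrix C (list of rows).

    Returns (A, V) with A = C V, V orthogonal and the columns of A mutually
    orthogonal, so sigma_j = ||A[:, j]|| and C = W S V^T with W[:, j] = A[:, j] / sigma_j.
    """
    k = len(C)
    A = [row[:] for row in C]
    V = [[float(i == j) for j in range(k)] for i in range(k)]
    for _ in range(60):                                   # sweeps
        rotated = False
        for p in range(k - 1):
            for q in range(p + 1, k):
                alpha = sum(row[p] ** 2 for row in A)
                beta = sum(row[q] ** 2 for row in A)
                gamma = sum(row[p] * row[q] for row in A)
                if abs(gamma) <= 1e-14 * math.sqrt(alpha * beta):
                    continue                              # columns p, q already orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for M in (A, V):                          # same rotation on A and V
                    for row in M:
                        xp, xq = row[p], row[q]
                        row[p], row[q] = c * xp - s * xq, s * xp + c * xq
        if not rotated:
            break
    return A, V


def split_band(T, m, bw):
    """Steps (1)-(4) of the band D&C (sketch): split a symmetric band matrix T
    (half-bandwidth bw, list of rows) at row m into
    T = diag(T1 - R^T R, T2 - Q Q^T) + U U^T with U = [R^T; Q].
    Returns (T1 - R^T R, T2 - Q Q^T, U).
    """
    n = len(T)
    # (1) nonzero bw x bw corner of the coupling block C
    Cb = [[T[m + i][m - bw + j] for j in range(bw)] for i in range(bw)]
    # (2) C = W S V^T
    A, V = jacobi_svd(Cb)
    sig = [math.sqrt(sum(row[j] ** 2 for row in A)) for j in range(bw)]
    # (3) Q = W S^(1/2) = A S^(-1/2) and R = S^(1/2) V^T, so that Q R = C
    Qb = [[A[i][j] / math.sqrt(sig[j]) if sig[j] > 0.0 else 0.0 for j in range(bw)]
          for i in range(bw)]
    Rb = [[math.sqrt(sig[j]) * V[i][j] for i in range(bw)] for j in range(bw)]
    # U = [R^T; Q] is nonzero only in rows m-bw .. m+bw-1
    U = [[0.0] * bw for _ in range(n)]
    for i in range(bw):
        for j in range(bw):
            U[m - bw + i][j] = Rb[j][i]
            U[m + i][j] = Qb[i][j]
    # (4) T - U U^T: its off-diagonal blocks cancel, leaving the two corrected halves
    Tp = [[T[i][j] - sum(U[i][l] * U[j][l] for l in range(bw)) for j in range(n)]
          for i in range(n)]
    return [row[:m] for row in Tp[:m]], [row[m:] for row in Tp[m:]], U
```

The loop (8)–(16) would then apply the columns of `U` one at a time, each as a rank-one update of the same kind as step (6) on slide 6.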
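The data distribution on slide 10 appears to be the 2D block-cyclic layout used by ScaLAPACK, which ELPA also accepts. The sketch below maps a global entry to its owning process and local indices. It also counts local rows with the same formula as ScaLAPACK's NUMROC, with the source process fixed at 0. It assumes 0-based indices and a 4 × 4 process grid whose ranks are numbered row by row, with P0–P3 in the first process row as the figure suggests. `owner` is a hypothetical helper, and `numroc` mirrors NUMROC's arithmetic.

```python
def numroc(n, nb, iproc, nprocs):
    """Rows (or columns) of an n-long dimension, dealt out in blocks of nb over
    nprocs processes, that land on process iproc (source process 0)."""
    nblocks = n // nb
    count = (nblocks // nprocs) * nb           # full rounds of blocks
    extra = nblocks % nprocs                   # leftover full blocks
    if iproc < extra:
        count += nb
    elif iproc == extra:
        count += n % nb                        # trailing partial block
    return count


def owner(i, j, nb, prow_count, pcol_count):
    """Owner rank and local indices of global entry (i, j) under a 2D
    block-cyclic distribution; ranks numbered row-major over the grid (assumption)."""
    prow = (i // nb) % prow_count
    pcol = (j // nb) % pcol_count
    li = (i // (nb * prow_count)) * nb + i % nb
    lj = (j // (nb * pcol_count)) * nb + j % nb
    return prow * pcol_count + pcol, li, lj
```

For example, take the matrix size 4000 and block size 16 from slide 11 on a 4 × 4 grid. Then `numroc(4000, 16, p, 4)` gives 1008, 1008, 992 and 992 rows for process rows 0–3, and `owner(17, 35, 16, 4, 4)` returns `(6, 1, 3)`.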