CMSI Computational Science and Technology Special Lecture A (11): Fast Algorithms in Matrix Computation 2

  • 1. 27 June 2013, CMSI Computational Science and Technology Special Lecture A

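2. Arnoldi method; Implicitly Restarted Arnoldi method
3. Lanczos method; Thick Restart Lanczos method
4. Classes of eigenvalue problems. Matrix type: symmetric $A = A^T$ or nonsymmetric $A \ne A^T$ (for complex matrices $A^T$ is replaced by $A^* = A^H$). Problem type: standard $Ax = \lambda x$; generalized $Ax = \lambda Bx$; quadratic $\lambda^2 Ax + \lambda Bx + Cx = 0$; nonlinear $A(\lambda)x = 0$.
5. (text not recoverable)
6. Real symmetric matrix: $A = A^T$, $n \times n$. Hermitian matrix: $A = A^*$.
7. Eigenvectors $\{x_i\}$ of the $n \times n$ matrix $A$ (remaining text not recoverable).
8. Nonsymmetric $A$. Left eigenvectors: the eigenvector $y_i$ of $A^T$ for the eigenvalue $\lambda_i$ satisfies $A^T y_i = \lambda_i y_i$, that is, $y_i^T A = \lambda_i y_i^T$. Schur decomposition: $U^T A U = T$.
9. Similarity transformation: for nonsingular $Z$, let $C = Z^{-1} A Z$. $C$ has the same eigenvalues $\lambda_i$ as $A$, and if $w_i$ is an eigenvector of $C$, then $x_i = Z w_i$ is an eigenvector of $A$. Reduction to tridiagonal form is a transformation of this kind.
10. Generalized eigenvalue problem $Ax = \lambda Bx$.
    If $A$ is symmetric and $B$ is symmetric positive definite: with the Cholesky factorization $B = LL^T$, solve $L^{-1} A L^{-T} y = \lambda y$ and set $x = L^{-T} y$.
    If $B$ is merely nonsingular: $B^{-1} A x = \lambda x$, where $B^{-1}A$ is in general nonsymmetric.
    Otherwise: QZ method.
11. Variational characterization of the extreme eigenvalues. Let the symmetric matrix $A$ have eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ with orthonormal eigenvectors $v_i$, and expand a unit vector as $x = \sum_{i=1}^{n} c_i v_i$ with $\sum_{i=1}^{n} c_i^2 = 1$. Then
    $x^T A x = \left(\sum_{i=1}^{n} c_i v_i\right)^T \left(\sum_{i=1}^{n} c_i \lambda_i v_i\right) = \sum_{i=1}^{n} c_i^2 \lambda_i$.
    The maximum is attained at $c_1 = 1$, $c_2 = \dots = c_n = 0$ and equals $\lambda_1$; the minimum equals $\lambda_n$.
12. Courant-Fischer minimax theorem. The $r$-th largest eigenvalue of $A$ is characterized through $r$-dimensional subspaces $S \subset R^n$ and the Rayleigh quotient $x^T A x / x^T x$:
    $\lambda_r = \max_{\dim S = r} \min_{x \in S,\, x \ne 0} \frac{x^T A x}{x^T x}$.
13. Sensitivity of an eigenvalue of $A$ under a perturbation $A + \delta A$, expressed through the right and left eigenvectors $x$ and $y$.
14. Solution methods for symmetric $A$ (dense case).
    Jacobi method: applied to $A$ directly.
    Tridiagonalization $T = Q^T A Q$ by the Householder method or the Lanczos method, followed by bisection or the QR method on the tridiagonal matrix $T$.
    The eigenvalues $\lambda_i$ and eigenvectors $u_i$ of $T$ give the eigenvalues $\lambda_i$ and eigenvectors $v_i$ of $A$.
15. Solution methods for nonsymmetric $A$.
    Reduction to Hessenberg form $H = Q^T A Q$ by the Householder method or the Arnoldi method, followed by the QR method on $H$, which yields the upper triangular matrix $U = P^T H P$.
    The eigenvalues $\lambda_i$ and eigenvectors $u_i$ of $H$ give the eigenvalues $\lambda_i$ and eigenvectors $v_i$ of $A$.
16. Power method: starting from an initial vector $x^{(0)}$, multiply repeatedly by $A$ to obtain $x^{(k)}$; $x^{(k)}$ is taken as the approximate eigenvector.
17. Convergence of the power method. Assume $|\lambda_1| > |\lambda_i|$ for $i \ge 2$ and expand $x^{(0)} = \sum_{i=1}^{n} c_i v_i$ with $c_1 \ne 0$. Then the direction of $x^{(k)}$ converges to $v_1$.
18. Convergence speed of the power method: determined by the ratio $|\lambda_1| / |\lambda_2|$.
19. Improvements of the power method.
    Inverse iteration: apply the power method to $(A - sI)^{-1}$ to obtain the eigenvalue closest to the shift $s$.
    Lanczos/Arnoldi methods: use all of $x^{(0)}, x^{(1)}, \dots, x^{(k)}$ instead of $x^{(k)}$ alone.
    QR method: iterate on $n$ vectors simultaneously.
20. Idea of the Arnoldi method. Use all of $x^{(0)}, x^{(1)}, \dots, x^{(k)}$. Orthonormalize them into $q^{(1)}, q^{(2)}, \dots, q^{(k)}$, a basis of the Krylov subspace $K_k(A; x^{(0)})$. Ritz-Galerkin approach: seek an approximate eigenvector $x \in K_k(A; x^{(0)})$ whose residual $r = Ax - \lambda x$ is orthogonal to $K_k(A; x^{(0)})$.
21. Arnoldi method
22. Arnoldi process. $Aq_i$ lies in the span of $q_1, q_2, \dots, q_{i+1}$:
    $Aq_i = h_{1i} q_1 + h_{2i} q_2 + \dots + h_{i+1,i} q_{i+1}$.
    Collecting $k$ steps gives $AQ_k = Q_{k+1} \tilde{H}_k$, where $Q_k$ is $n \times k$ and $\tilde{H}_k$ is $(k+1) \times k$.
23. Arnoldi process (continued). Multiplying $AQ_k = Q_{k+1} \tilde{H}_k$ from the left by $Q_k^T$ gives $Q_k^T A Q_k = H_k$, with $Q_k \in R^{n \times k}$ and $H_k \in R^{k \times k}$ the Hessenberg matrix formed by the first $k$ rows of $\tilde{H}_k$. The columns of $Q_k$ form an orthonormal basis of $K_k(A; x^{(0)})$.
24. Approximate eigenpairs from the Arnoldi process. Write $x \in K_k(A; x^{(0)})$ as $x = Q_k y$ with $y \in R^k$. The Ritz-Galerkin condition becomes
    $Q_k^T (Ax - \lambda x) = Q_k^T A Q_k y - \lambda Q_k^T Q_k y = H_k y - \lambda y = 0$,
    that is, the $k \times k$ eigenvalue problem $H_k y = \lambda y$.
25. Arnoldi method: algorithm and cost.
    Compute $Q_k$ and $H_k$ by the Arnoldi process.
    Solve the $k \times k$ eigenvalue problem $H_k y = \lambda y$ (for example with the QR method in LAPACK).
    Form $x = Q_k y$.
    Cost: $O(k^2 n)$ for the Arnoldi process, $O(k^3)$ for the $k \times k$ eigenvalue problem, $O(kn)$ memory. The method pays off when $k$ is much smaller than $n$.
26. Difficulties of the Arnoldi method and restarting. The cost grows as $O(k^2 n)$ with the number of steps $k$. A simple remedy is to restart after $k$ steps with a new starting vector $q_1$ taken from $q_k$, but this discards the information contained in $q_1, q_2, \dots, q_{k-1}$.
27. Implicitly Restarted Arnoldi (1). Compress $K_k(A; x^{(0)})$ to an $l$-dimensional subspace ($l < k$): build $l$ new basis vectors $q_1^+, q_2^+, \dots, q_l^+$ from $q_1, q_2, \dots, q_k$, then extend from $l$ back to $k$ vectors with the Arnoldi process. This requires that the new vectors again satisfy an Arnoldi relation $AQ_l^+ = Q_{l+1}^+ \tilde{H}_l^+$, where $[q_1^+, q_2^+, \dots, q_l^+] = [q_1, q_2, \dots, q_k] U$ with $U \in R^{k \times l}$.
28.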
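Implicitly Restarted Arnoldi (2): computing $q_1^+, q_2^+, \dots, q_l^+$.
    Start from the k-step Arnoldi decomposition $AQ_k = Q_k H_k + h_{k+1,k} q_{k+1} e_k^T$.
    Choose shifts $\mu_1, \mu_2, \dots, \mu_{k-l}$ and apply $k-l$ steps of the shifted QR method to $H_k$: $Q_k^+ = Q_k U$, $H_k^+ = U^T H_k U$, where $U = U_1 U_2 \cdots U_{k-l}$ is the product of the orthogonal factors of the QR steps.
    Then $AQ_k^+ = Q_k^+ H_k^+ + h_{k+1,k} q_{k+1} e_k^T U$, and the first $l$ columns give an l-step Arnoldi decomposition $AQ_l^+ = Q_l^+ H_l^+ + h_{l+1,l}^+ q_{l+1}^+ e_l^T$.
29. Implicitly Restarted Arnoldi (3). [Figure: block diagrams of the three decompositions $AQ_k = Q_k H_k + h_{k+1,k} q_{k+1} e_k^T$, then $AQ_k^+ = Q_k^+ H_k^+ + h_{k+1,k} q_{k+1} e_k^T U$ after the QR steps, then the truncated l-step Arnoldi decomposition $AQ_l^+ = Q_l^+ H_l^+ + h_{l+1,l}^+ q_{l+1}^+ e_l^T$, which is extended from $l$ to $k$ steps. In $e_k^T U$ only the last $k-l+1$ entries are nonzero.]
30. Implicitly Restarted Arnoldi (4): relation to the QR decomposition. Let $R = R_{k-l} \cdots R_2 R_1$, where $R_i$ is the upper triangular factor of the $i$-th QR step for the shifts $\mu_1, \mu_2, \dots, \mu_{k-l}$. Then
    $UR = (H_k - \mu_1 I)(H_k - \mu_2 I) \cdots (H_k - \mu_{k-l} I)$.
    The first column $q_1^+$ of $Q_k^+$ therefore satisfies
    $q_1^+ = Q_k U e_1 = Q_k (H_k - \mu_1 I)(H_k - \mu_2 I) \cdots (H_k - \mu_{k-l} I) R^{-1} e_1 = r_{11}^{-1} (A - \mu_1 I)(A - \mu_2 I) \cdots (A - \mu_{k-l} I) q_1$,
    that is, $q_1^+ \propto (A - \mu_1 I)(A - \mu_2 I) \cdots (A - \mu_{k-l} I) q_1$. The process is equivalent to restarting Arnoldi from this filtered vector without forming it explicitly (implicit restart).
31. Implicitly Restarted Arnoldi (5): the starting vector $q_1$ (remaining text not recoverable).
32. Implicitly Restarted Arnoldi: algorithm.
    Compute the k-step Arnoldi decomposition $AQ_k = Q_k H_k + h_{k+1,k} q_{k+1} e_k^T$.
    DO j = 1, 2, 3, ...
        Choose the shifts $\mu_1, \mu_2, \dots, \mu_{k-l}$.
        Apply $k-l$ steps of the shifted QR method to $H_k$ and accumulate $U = U_1 U_2 \cdots U_{k-l}$.
        $Q_k^+ = Q_k U$, $H_k^+ = U^T H_k U$.
        $\beta_l = (H_k^+)_{l+1,l}$, $\sigma_l = U_{kl}$.
        $q_{l+1}^+ = q_{l+1}^+ \beta_l + h_{k+1,k} q_{k+1} \sigma_l$.
        $h_{l+1,l}^+ = \| q_{l+1}^+ \|_2$, $q_{l+1}^+ := q_{l+1}^+ / h_{l+1,l}^+$.
        Take the first $l$ columns of $Q_k^+$ as $Q_l^+$ and the leading $l \times l$ block of $H_k^+$ as $H_l^+$.
        Starting from the Arnoldi decomposition $AQ_l^+ = Q_l^+ H_l^+ + h_{l+1,l}^+ q_{l+1}^+ e_l^T$, extend it from $l$ to $k$ steps with the Arnoldi process.
    END DO
33. Numerical experiments (1). Machine: SR8000 (F1). Methods compared: Arnoldi method and IRAM (ARPACK).
34. Numerical experiments (2). Convergence criterion: relative residual of $Ax - \lambda x$ below $10^{-14}$. Matrix sizes N = 13112 and 51482; number of eigenvalues NE = 30 and 100. Quantities compared for the Arnoldi method and IRAM: Krylov subspace dimension and operation count (flop).
    Values as extracted for NE = 30 (N = 13112, N = 51482): 930, 50, 1600, 60; 26.1G, 7.69G, 339G, 40.6G.
    Values as extracted for NE = 100 (N = 13112, N = 51482): 2000, 140, 160; 242G, 77.0G, 382G.
    (The assignment of these values to table rows and columns is not recoverable.)
35. Numerical experiments (3). [Figure: N = 51482, convergence of eigenvalues 1 and 30. Vertical axis: residual, 1.0E-14 to 1.0E+00; horizontal axis: operation count (Gflop), 0 to 50. Curves: Arnoldi (Step = 100) and IRAM (Step = 60).]
36. Numerical experiments (4). [Figure: N = 51482, convergence of eigenvalue 100 for IRAM and Arnoldi. Vertical axis: residual, 1.0E-14 to 1.0E+00; horizontal axis: operation count (Gflop), 0 to 500. Curves: Arnoldi (Step = 500) and IRAM (Step = 160).]
37. Lanczos method: the Arnoldi method (Krylov subspace $K_k(A; x^{(0)})$ with the Ritz-Galerkin condition) applied to a symmetric matrix $A$.
38. Lanczos process: $Q_k \in R^{n \times k}$ and $T_k \in R^{k \times k}$, where $T_k$ is tridiagonal and the columns of $Q_k$ span the Krylov subspace $K_k(A; x^{(0)})$.
39. Lanczos method: algorithm and cost.
    Compute $Q_k$ and $T_k$ by the Lanczos process.
    Solve the $k \times k$ eigenvalue problem $T_k y = \lambda y$ (for example with the QR method in LAPACK).
    Form the Ritz vector $x = Q_k y$; $\lambda$ is the corresponding Ritz value.
    Cost: $O(kn)$ for the Lanczos process, $O(k^3)$ for the $k \times k$ eigenvalue problem, $O(kn)$ memory. The method pays off when $k$ is much smaller than $n$.
40. Convergence of the Lanczos method. [Figure: example with n = 20 comparing "Power method" and "Lanczos method". Horizontal axis: 0 to 14; vertical axis: 1 to 2.4.]
41.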
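Optimality of the Lanczos method. The Lanczos method maximizes the Rayleigh quotient $x^T A x / x^T x$ over the whole Krylov subspace $K_k(A; x^{(0)}) = \mathrm{span}\{x^{(0)}, x^{(1)}, \dots, x^{(k-1)}\}$: with $Q_k = [q_1, q_2, \dots, q_k]$ and $y$ the eigenvector of $T_k$ for its largest eigenvalue, the maximizer is $x = Q_k y$. The power method uses only one vector of $\mathrm{span}\{x^{(0)}, x^{(1)}, \dots, x^{(k-1)}\}$, so the Lanczos approximation to $v_1$ is at least as good.
42. Optimality of the Lanczos method for the $r$-th eigenvalue. The $r$-th eigenvalue of $T_k$ is the value given by the Courant-Fischer theorem when the subspaces $S$ are restricted to $K_k(A; x^{(0)})$, so the Lanczos method yields the optimal approximation within $K_k(A; x^{(0)})$ in the Courant-Fischer sense.
43. Loss of orthogonality in the Lanczos method. In finite precision the vectors $q_1, q_2, \dots, q_k$ lose orthogonality. Reorthogonalizing against $q_1, q_2, \dots, q_k$ costs $O(k^2 n)$ and removes the cost advantage of the Lanczos method.
44. Restarting the Lanczos method. With reorthogonalization the cost grows as $O(k^2 n)$ with $k$. Restarting with $q_k$ as the new $x^{(0)}$ discards the information in $q_1, q_2, \dots, q_k$. Remedy: Thick Restart Lanczos method (TR-Lanczos)*.
    * http://crd-legacy.lbl.gov/~kewu/ps/trlan.html
45. Thick Restart Lanczos method. After $k$ steps of the Lanczos process, compute the $k$ Ritz vectors, keep $l$ of them ($l < k$), and restart the Lanczos process from these $l$ vectors, extending from $l$ to $k$. The process maintains a Lanczos-type relation $AV_k = V_k T_k + \beta_k v_{k+1} e_k^T$.
46. Numerical experiments (1). Machine: EP8000 /690Turbo. Test matrices from Matrix Market. Methods compared: plain Lanczos method and TR-Lanczos. Gap = $(\lambda_1 - \lambda_{100}) / \lambda_1$.
    Matrix      n       nz        Gap
    bcsstk16    4884    147631    0.58
    bcsstk17    10974   219812    0.88
    bcsstk18    11984   80519     0.73
    bcsstk21    3600    15100     0.13
    bcsstk23    3134    24156     0.98
    bcsstk24    3562    81736     1.00
    bcsstk25    15439   133840    1.00
    bcsstk35    30237   740200    0.99
    bcsstk36    23052   583096    0.82
    bcsstk37    25503   583240    0.60
    bcsstk39    46772   1068034   0.34
    crystk02    13965   491274    0.99
47. Numerical experiments (2). Convergence criterion: relative residual of $Ax - \lambda x$ below $10^{-10}$; NE = 30 and 100. [Figure: two bar charts, one for NE = 30 (axis 0 to 400) and one for NE = 100 (axis 0 to 700), comparing "simple" and "TR" for bcsstk16, bcsstk17, bcsstk18, bcsstk21, bcsstk23, bcsstk24, bcsstk25, bcsstk35, bcsstk36, bcsstk37, bcsstk39 and crystk02.]
48. Numerical experiments (3). [Figure: bcsstk39, convergence of eigenvalues 28, 29 and 30 for "TR" and "simple". Vertical axis: residual, 1.0E-17 to 1.0E-01; horizontal axis: operation count [Gflop], 0 to 5. Conv. check: Simple: 100, TR: 90.]
49. Numerical experiments (4). [Figure: bcsstk21, TR-Lanczos, convergence of eigenvalues 28, 29 and 30 for "TR" and "simple". Vertical axis: residual, 1.0E-16 to 1.0E+00; horizontal axis: operation count [Gflop], 0 to 1.2. Conv. check: Simple: 20, TR: 70.]
50. Summary. Arnoldi method and Lanczos method; Implicitly Restarted Arnoldi method and Thick Restart Lanczos method; Jacobi-Davidson method. Keywords: IRAM, TRLAN, JD.
51. References
    L. N. Trefethen and D. Bau III: Numerical Linear Algebra, SIAM, Philadelphia, 1997.
    Z. Bai, J. Demmel, J. Dongarra, A. Ruhe and H.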
    Van der Vorst (eds.): Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM, Philadelphia, 2000.
    G. W. Stewart: Matrix Algorithms, Vol. II: Eigensystems, SIAM, Philadelphia, 2001.
    Y. Saad: Iterative Methods for Sparse Linear Systems, SIAM, Philadelphia, 2011.
    (author and title not recoverable), 2009.
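Slide 10 reduces the symmetric-definite generalized problem Ax = λBx to a standard problem through the Cholesky factorization B = LL^T. The following is a minimal pure-Python sketch of that reduction on a 2 x 2 example. The function names (`cholesky`, `forward_solve`, `reduce_to_standard`, `eig_sym_2x2`) and the example matrices are illustrative assumptions, not taken from the lecture.

```python
import math

def cholesky(B):
    """Lower-triangular L with B = L L^T (B symmetric positive definite)."""
    n = len(B)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = B[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][p] * z[p] for p in range(i))) / L[i][i])
    return z

def reduce_to_standard(A, B):
    """Return C = L^{-1} A L^{-T} as a list of rows."""
    L = cholesky(B)
    n = len(A)
    # cols[c] is column c of M = L^{-1} A.
    cols = [forward_solve(L, [A[r][c] for r in range(n)]) for c in range(n)]
    rows_m = [[cols[c][r] for c in range(n)] for r in range(n)]
    # Row r of C = M L^{-T} equals L^{-1} applied to row r of M.
    return [forward_solve(L, rows_m[r]) for r in range(n)]

def eig_sym_2x2(C):
    """Eigenvalues of a symmetric 2 x 2 matrix, in ascending order."""
    tr = C[0][0] + C[1][1]
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return [(tr - disc) / 2.0, (tr + disc) / 2.0]

def det_pencil(A, B, lam):
    """det(A - lam B) for 2 x 2 matrices; zero at a generalized eigenvalue."""
    m = [[A[i][j] - lam * B[i][j] for j in range(2)] for i in range(2)]
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Illustrative data (assumed): A symmetric, B symmetric positive definite.
A = [[2.0, 1.0], [1.0, 3.0]]
B = [[4.0, 2.0], [2.0, 5.0]]
C = reduce_to_standard(A, B)
lams = eig_sym_2x2(C)
```

Each value in `lams` should make `det_pencil(A, B, lam)` vanish up to rounding, which confirms that the reduced matrix has the eigenvalues of the original pencil. The eigenvectors would follow from x = L^{-T} y, which this sketch leaves out.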
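The Arnoldi process of slides 22 and 23 can be sketched in a few lines. The version below uses modified Gram-Schmidt orthogonalization, which is an assumption, since the surviving slide text does not say which variant the lecture uses. The helper names and the 5 x 5 test matrix are likewise illustrative. It returns the k+1 basis vectors and the (k+1) x k Hessenberg matrix, then measures how well A Q_k = Q_{k+1} H holds.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def arnoldi(A, x0, k):
    """k Arnoldi steps with modified Gram-Schmidt.

    Returns Q (list of k+1 orthonormal vectors) and H ((k+1) x k, nested
    lists) such that A q_i = sum_j H[j][i] q_j, i.e. A Q_k = Q_{k+1} H.
    """
    nrm = math.sqrt(dot(x0, x0))
    Q = [[xi / nrm for xi in x0]]
    H = [[0.0] * k for _ in range(k + 1)]
    for i in range(k):
        w = matvec(A, Q[i])
        for j in range(i + 1):
            H[j][i] = dot(Q[j], w)
            w = [wi - H[j][i] * qj for wi, qj in zip(w, Q[j])]
        H[i + 1][i] = math.sqrt(dot(w, w))
        if H[i + 1][i] < 1e-12:
            raise ValueError("breakdown: Krylov subspace is invariant")
        Q.append([wi / H[i + 1][i] for wi in w])
    return Q, H

def arnoldi_residual(A, Q, H):
    """Largest entry of |A Q_k - Q_{k+1} H|."""
    k = len(H[0])
    worst = 0.0
    for i in range(k):
        Aq = matvec(A, Q[i])
        rec = [sum(H[j][i] * Q[j][r] for j in range(k + 1))
               for r in range(len(Aq))]
        worst = max(worst, max(abs(a - b) for a, b in zip(Aq, rec)))
    return worst

def orthonormality_error(Q):
    """Largest deviation of q_i . q_j from the Kronecker delta."""
    return max(abs(dot(Q[i], Q[j]) - (1.0 if i == j else 0.0))
               for i in range(len(Q)) for j in range(len(Q)))

# Illustrative nonsymmetric matrix and starting vector (assumed).
A = [[4.0, 1.0, 0.0, 0.0, 0.0],
     [2.0, 3.0, 1.0, 0.0, 0.0],
     [0.0, 1.0, 5.0, 2.0, 0.0],
     [1.0, 0.0, 1.0, 6.0, 1.0],
     [0.0, 0.0, 2.0, 1.0, 7.0]]
x0 = [1.0, 1.0, 1.0, 1.0, 1.0]
Q, H = arnoldi(A, x0, 3)
resid = arnoldi_residual(A, Q, H)
orth = orthonormality_error(Q)
```

Both `resid` and `orth` should be at rounding level. The leading k x k block of `H` is the matrix H_k whose eigenpairs give the Ritz values and Ritz vectors of slide 24. A dense eigenvalue solver for H_k is outside the scope of this sketch.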
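The identity on slide 30 that makes the restart implicit can be checked by hand for a single shift. The derivation below is a sketch under stated assumptions: the shift symbol $\mu$ is not in the surviving text, $k \ge 2$, and $U_1 R_1$ denotes the QR factorization of $H_k - \mu I$.

```latex
\begin{align*}
A Q_k &= Q_k H_k + h_{k+1,k}\, q_{k+1} e_k^T
  && \text{(k-step Arnoldi decomposition)} \\
A q_1 = A Q_k e_1 &= Q_k H_k e_1 + h_{k+1,k}\, q_{k+1} (e_k^T e_1)
  = Q_k H_k e_1
  && (e_k^T e_1 = 0 \text{ for } k \ge 2) \\
(A - \mu I)\, q_1 &= Q_k (H_k - \mu I)\, e_1
  = Q_k U_1 R_1 e_1
  && (H_k - \mu I = U_1 R_1) \\
 &= r_{11}\, Q_k U_1 e_1 = r_{11}\, q_1^+
  && (R_1 e_1 = r_{11} e_1,\; Q_k^+ = Q_k U_1)
\end{align*}
```

So one shifted QR step on $H_k$ replaces the starting vector by $(A - \mu I) q_1$ up to scaling, without any extra multiplication by $A$. The same argument extends to $k-l$ shifts because $H_k$ is Hessenberg: $e_k^T H_k^j e_1 = 0$ for $j \le k-2$, so the residual term never reaches the first column while the filter polynomial has degree below $k$.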
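Slides 39 to 41 state that the largest Ritz value from k Lanczos steps is at least as good as the Rayleigh quotient of the power-method iterate drawn from the same Krylov subspace. The pure-Python sketch below illustrates this on a diagonal matrix with n = 20. It makes several assumptions that are not from the lecture: the test matrix, the absence of reorthogonalization, and the use of Sturm-sequence bisection in place of the LAPACK QR solver mentioned on slide 39.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lanczos(A, x0, k):
    """k Lanczos steps (three-term recurrence, no reorthogonalization).

    Returns the diagonal alpha and off-diagonal beta of the tridiagonal T_k.
    """
    nrm = math.sqrt(dot(x0, x0))
    q_prev = [0.0] * len(x0)
    q = [xi / nrm for xi in x0]
    alpha, beta = [], []
    for i in range(k):
        w = matvec(A, q)
        a = dot(q, w)
        alpha.append(a)
        if i == k - 1:
            break
        b_prev = beta[-1] if beta else 0.0
        w = [wi - a * qi - b_prev * pi for wi, qi, pi in zip(w, q, q_prev)]
        b = math.sqrt(dot(w, w))
        if b < 1e-12:  # invariant subspace found, stop early
            break
        beta.append(b)
        q_prev, q = q, [wi / b for wi in w]
    return alpha, beta

def count_below(alpha, beta, x):
    """Number of eigenvalues of T smaller than x (Sturm sequence count)."""
    count, d = 0, 1.0
    for i, a in enumerate(alpha):
        b2 = beta[i - 1] ** 2 if i > 0 else 0.0
        d = (a - x) - b2 / d
        if d == 0.0:
            d = -1e-300  # avoid division by zero in the next step
        if d < 0.0:
            count += 1
    return count

def largest_eig_tridiag(alpha, beta):
    """Largest eigenvalue of T by bisection on Gershgorin-type bounds."""
    k = len(alpha)
    bmax = max((abs(b) for b in beta), default=0.0)
    lo, hi = min(alpha) - 2.0 * bmax, max(alpha) + 2.0 * bmax
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if count_below(alpha, beta, mid) == k:
            hi = mid  # every eigenvalue lies below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def power_rayleigh(A, x0, m):
    """Rayleigh quotient of the power-method iterate after m multiplications."""
    x = list(x0)
    for _ in range(m):
        y = matvec(A, x)
        nrm = math.sqrt(dot(y, y))
        x = [yi / nrm for yi in y]
    return dot(x, matvec(A, x)) / dot(x, x)

# Illustrative test problem (assumed): A = diag(1, ..., 20), x0 = ones.
n, k = 20, 8
A = [[float(i + 1) if i == j else 0.0 for j in range(n)] for i in range(n)]
x0 = [1.0] * n
ritz = largest_eig_tridiag(*lanczos(A, x0, k))
rq = power_rayleigh(A, x0, k - 1)  # x^(k-1) lies in K_k(A; x0)
```

Here `ritz` should lie between `rq` and the exact largest eigenvalue 20, in line with the optimality argument of slide 41. For larger k the loss of orthogonality described on slide 43 would have to be handled, which this sketch does not attempt.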