
High-Performance Quantum Simulation: A Challenge to the Schrödinger Equation on 256^4 Grids

* Toshiyuki Imamura (今村俊幸) 1,3

Thanks to Susumu Yamada 2,3, Takuma Kano 2, and Masahiko Machida 2,3

1. UEC (University of Electro-Communications, 電気通信大学)
2. CCSE, JAEA (Japan Atomic Energy Agency)
3. CREST, JST (Japan Science and Technology Agency)

Jan. 4-8, 2008, RANMEP2008, NCTS, Taiwan (Tsing Hua University, Hsinchu, Taiwan)

□ Outline

I. Physics, Review of Quantum Simulation
II. Mathematics, Numerical Algorithm
III. Grand Challenge, Parallel Computing on ES
IV. Numerical Results
V. Conclusion

I. Physics, Review of Quantum Simulation, etc.

□ 1.1 Quantum Simulation (1/2)

Crossover from Classical to Quantum?

[Figure: down-sizing of the junction width W; panels: Classical Equation of Motion, Schrödinger Equation]

□ 1.2 Quantum Simulation (2/2)

Numerical Simulation for Coupled Schrödinger Eq.

• α: Coupling
• β: 1/Mass ∝ 1/W

Requirement of Exact Diagonalization for the Hamiltonian H

• [equation not recovered]: spectral expansion by the eigenvectors {u_n}
• Ψ: possible state; not a value but a vector!

Numerical method to solve the above equation
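The spectral expansion referred to on this slide is not legible in this copy. As a reminder only, its standard textbook form is sketched below, assuming a time-independent Hamiltonian $H$ with eigenpairs $(E_n, u_n)$ and units with $\hbar = 1$; the symbols $E_n$ and $c_n$ are introduced here for illustration and are not taken from the slide.

```latex
H u_n = E_n u_n, \qquad
\Psi(t) = \sum_n c_n \, e^{-i E_n t} \, u_n, \qquad
c_n = \langle u_n, \Psi(0) \rangle .
```

In this form the state at any time follows from the eigenpairs of $H$, which is consistent with the slide's call for an exact diagonalization of the Hamiltonian.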

II. Mathematics, Numerical Algorithm, etc.

□ 2.1 Krylov Subspace Iteration

• Lanczos (Traditional method)
    Krylov + GS: simple, but a shift+invert version is needed
• LOBPCG (Locally Optimal Block PCG)
    {Krylov base, Ritz vector, prior vector}: CG approach
    Restart at every iteration
    INVERSE-free -> Less Communication

[Figure: LOBPCG vs. Lanczos]
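To make the {Krylov base, Ritz vector, prior vector} idea concrete, here is a minimal single-vector LOBPCG sketch in Python with NumPy. It is an illustration on a small dense matrix, not the authors' parallel solver: the function name `lobpcg_smallest`, the test matrix `A_demo`, and the tolerances are all assumptions made for this example. Linearly dependent trial directions are dropped with a QR step here, whereas the slides describe a custom DSYGV with LDL and deflation.

```python
import numpy as np


def lobpcg_smallest(A, x0, precond=None, tol=1e-8, max_iter=1000):
    """Single-vector LOBPCG sketch: smallest eigenpair of a symmetric matrix A.

    Every iteration restarts a Rayleigh-Ritz projection on span{x, w, p}:
    x is the current Ritz vector, w the (optionally preconditioned) residual,
    and p the previous search direction. No system with A is solved.
    """
    x = x0 / np.linalg.norm(x0)
    Ax = A @ x
    lam = x @ Ax
    p = np.zeros_like(x)
    for it in range(max_iter):
        r = Ax - lam * x                       # residual of the current Ritz pair
        if np.linalg.norm(r) < tol:
            return lam, x, it
        w = r if precond is None else precond(r)
        # Normalize the trial vectors, orthonormalize them with QR, and drop
        # any direction that has become (numerically) linearly dependent.
        cols = [v / np.linalg.norm(v) for v in (x, w, p) if np.linalg.norm(v) > 0.0]
        Q, R = np.linalg.qr(np.column_stack(cols))
        Q = Q[:, np.abs(np.diag(R)) > 1e-10]
        AQ = A @ Q                             # up to 3 matrix-vector products
        theta, C = np.linalg.eigh(Q.T @ AQ)    # small projected eigenproblem
        c = C[:, 0]                            # lowest Ritz vector coefficients
        p = Q[:, 1:] @ c[1:]                   # update component outside span{x}
        x = Q @ c
        Ax = AQ @ c
        nrm = np.linalg.norm(x)
        x, Ax = x / nrm, Ax / nrm
        lam = theta[0]
    return lam, x, max_iter


# Hypothetical test problem: diagonal 1..n with weak nearest-neighbour coupling.
n = 40
A_demo = np.diag(np.arange(1.0, n + 1)) + 0.1 * (np.eye(n, k=1) + np.eye(n, k=-1))

lam_plain, _, it_plain = lobpcg_smallest(A_demo, np.ones(n))
# Point-Jacobi preconditioner: divide the residual by the diagonal of A.
lam_jacobi, _, it_jacobi = lobpcg_smallest(
    A_demo, np.ones(n), precond=lambda r: r / np.diag(A_demo)
)
print(lam_plain, it_plain, lam_jacobi, it_jacobi)
```

The sketch multiplies A by all basis vectors in each iteration, which mirrors the "3*MV" cost noted for LOBPCG; the `precond` hook is where an approximate inverse of the operator would plug in.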

□ 2.2 LOBPCG

Costly! Since the block is updated at every iteration, an MV operation is also required!!

• Lanczos: 1*MV / every iteration
• LOBPCG: 3*MV / every iteration

Other difficulties in implementation
• Breakdown of linear independence: make our own DSYGV using LDL and deflation (not Cholesky)
• Growth of numerical error in {W,X,P}: detect numerical error and recalculate them automatically
• Choice of the shift
• Portability

□ 2.3 Preconditioning

$T \sim H^{-1}$

$H = A + B_1 + B_2 + B_3 + B_4 + C_{12} + C_{23} + C_{34}$

• H1 (Point Jacobi): $H \sim A$
• H2 (LDL): $H \sim (A + B_1)$
• H3 (LDL): $H \sim (A + B_1) A^{-1} (A + B_2)$

Here, A: diagonal; A+Bx: block-tridiagonal.
shift + LDLt is used.

[Figure: residual error (log scale, 1e-6 to 100) vs. iteration count (0 to 500) for no preconditioner, H1 (Point Jacobi), H2 (LDL), and H3 (LDL)]
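As a side calculation, and assuming the product form $(A + B_1) A^{-1} (A + B_2)$ is the H3 preconditioner (the pairing of labels and formulas is inferred from the slide layout), multiplying it out shows which terms of $H$ it reproduces:

```latex
(A + B_1)\, A^{-1} (A + B_2)
  = (A + B_1)\,(I + A^{-1} B_2)
  = A + B_1 + B_2 + B_1 A^{-1} B_2 .
```

So this form matches $A + B_1 + B_2$ exactly and adds the extra cross term $B_1 A^{-1} B_2$, while still needing only solves with the block-tridiagonal factors $A + B_1$ and $A + B_2$.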

III. Grand Challenge, Parallel Computing on ES, etc.

□ 3.2 Technical Issues on the Earth Simulator

Programming model: hybrid of distributed parallelism and thread parallelism.

3-level parallelism
• Inter-Node: MPI (Message Passing Interface)
    Low latency (6.63 [us])
    Very fast (11.63 [GB/s])
• Intra-Node: auto-parallelization, OpenMP (thread-level parallelism)
• Vector Processor (innermost loops): auto-/manual vectorization

[Figure: nodes, each with Processor 0, Processor 1, ..., Processor 7; levels labeled Inter-Node, Intra-Node, and Vector processing]

□ 3.3 Quantum Simulation parallel code

Application flow chart
• Eigenmode calculation: parallel LOBPCG solver developed on ES
• Time Integrator: parallel code on ES
• Quantum state analyzer: parallel code on ES
• Visualization: visualized by AVS

□ 3.4 Handling of Huge Data

Data distribution in case of a 4D array

• (k, l) / NP: 2-dimensional loop decomposition (inter-node)
• j / MP: 1-dimensional loop decomposition (intra-node parallelization)
• i: loop length = 256, vector processing

NP: number of MPI processes
MP: number of microtasking processes (= 8)
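The index arithmetic behind this decomposition can be sketched as follows. This is a minimal illustration only: it assumes a contiguous block split of the collapsed (k, l) index space (the slide does not say whether the split is block or cyclic), and the helper names and the example value NP = 512 are hypothetical.

```python
N = 256   # grid points per dimension (from the slide)
MP = 8    # microtasking processes per MPI process (from the slide)


def block_range(n_items, n_parts, part):
    """Half-open [start, stop) range of `part` in a contiguous block split."""
    base, extra = divmod(n_items, n_parts)
    start = part * base + min(part, extra)
    stop = start + base + (1 if part < extra else 0)
    return start, stop


def owned_kl_pairs(rank, n_procs, n=N):
    """(k, l) pairs owned by one MPI rank; the (k, l) loops are collapsed, k fastest."""
    start, stop = block_range(n * n, n_procs, rank)
    return [(idx % n, idx // n) for idx in range(start, stop)]


def thread_j_range(thread, n_threads=MP, n=N):
    """Range of j handled by one thread inside a node; i stays a full vector loop."""
    return block_range(n, n_threads, thread)


# Example with a hypothetical NP = 512 MPI processes.
NP = 512
pairs = owned_kl_pairs(0, NP)
print(len(pairs), pairs[0], pairs[-1])   # (k, l) pairs on rank 0
print(thread_j_range(7))                 # j range of the last thread
```

With these assumed numbers each rank holds 128 (k, l) pairs and each thread 32 values of j, leaving the full i loop of length 256 for the vector unit.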

□ 3.5 Parallel LOBPCG

• Core implementation is the MATRIX-VECTOR multiplication.
• 3-level parallelism is carefully done in our implementation.
• In inter-node parallelization, communication pipelining is used.
• In the Rayleigh-Ritz part, ScaLAPACK is used.

LOBPCG (Acg.f):

    do l=1,256              ! inter-node parallelism
      do k=1,256            ! inter-node parallelism
        do j=1,256          ! intra-node (thread) parallelism
          do i=1,256        ! vectorization
            w(i,j,k,l)=a(i,j,k,l)*v(i,j,k,l) &
                +b*(v(i+1,j,k,l)+...) +c*(v(i+1,j+1,k,l)+...)
          enddo
        enddo
      enddo
    enddo

IV. Numerical Results

□ 4.1 Numerical Result

Preliminary test of our eigensolver
4-junction system -> 256^4 dimension

Performance (5 eigenmodes)

    CPUs    time [s]    TFLOPS
    2048    3118        3.65
    3072    2535        4.49
    4096    1621        7.02
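The performance figures can be cross-checked with a little arithmetic. The sketch below derives speedup, parallel efficiency relative to the 2048-CPU run, and the fraction of peak. The three data rows come from the slide; the peak of 8 GFLOPS per Earth Simulator processor is an assumption added here, as are the names `RUNS` and `analyze`.

```python
# Rows copied from the slide's performance table (5 eigenmodes, 256^4 grid).
RUNS = [  # (CPUs, wall time [s], sustained TFLOPS)
    (2048, 3118, 3.65),
    (3072, 2535, 4.49),
    (4096, 1621, 7.02),
]

# Assumption (not on the slide): 8 GFLOPS theoretical peak per ES processor.
PEAK_GFLOPS_PER_CPU = 8.0


def analyze(runs, peak_gflops_per_cpu=PEAK_GFLOPS_PER_CPU):
    """Return speedup, parallel efficiency, and fraction of peak for each run."""
    base_cpus, base_time, _ = runs[0]
    rows = []
    for cpus, time_s, tflops in runs:
        speedup = base_time / time_s
        rows.append({
            "cpus": cpus,
            "speedup": speedup,
            "efficiency": speedup / (cpus / base_cpus),
            "peak_fraction": tflops / (cpus * peak_gflops_per_cpu / 1000.0),
        })
    return rows


for row in analyze(RUNS):
    print(f"{row['cpus']:5d} CPUs: speedup {row['speedup']:.2f}, "
          f"efficiency {row['efficiency']:.1%}, "
          f"{row['peak_fraction']:.1%} of assumed peak")
```

Under the assumed per-processor peak, the 4096-CPU run comes out at about 21.4% of peak, which agrees with the figure quoted in the conclusion.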

Convergence history (10 eigenmodes)

[Figure: residual error (log scale, 1e-12 to 1e+4) vs. iteration count (0 to 3000) for the ground state and the 2nd to 10th lowest states]

□ 4.2 Numerical Result (Scenario)

The simplest case (two junctions)

• Initial state
• Potential change: only a single junction
• Capacitive coupling

Question: synchronization or independence (localization)?

□ 4.3 Numerical Result

Two-Stacked Intrinsic Josephson Junction

• Classical regime: independent dynamics
• Quantum regime: ?

[Figure: snapshots over (q1, q2) at t = 0.0, 2.9, 9.2, and 10.0 (a.u.) for α = 0.4, β = 0.2]

[Figure: snapshots over (q1, q2) at t = 0.0, 2.5, 4.2, and 10.0 (a.u.) for α = 0.4, β = 1.0]

Two Junctions

• Weakly quantum (classical): independence
• Strongly quantum: synchronization

Three Junctions

[Figure: results for α = 0.4, β = 0.2]

[Figure: results for α = 0.4, β = 1.0]

4 Junctions: Quantum Assisted Synchronization

[Figure: <q1>, <q2>, <q3>, <q4> vs. t (a.u.); panel (a): α = 0.4, β = 0.2; panel (b): α = 0.4, β = 1.0]

V. Conclusion

□ 5. Conclusion

Collective MQT in Intrinsic Josephson Junctions via parallel computing on ES

Direct Quantum Simulation (4 Junctions)
• Quantum (synchronous) vs. classical (localized)
• Quantum Assisted Synchronization

High Performance Computing
• Novel eigenvalue algorithm LOBPCG
• Communication-free (or less) implementation
• Sustained 7 TFLOPS (21.4% of peak)
• Toward peta-scale computing?

Thank you! 謝謝

Further information
• Physics: [email protected] (jaea.go.jp)
• HPC: [email protected] (im.uec.ac.jp)