POLITECHNIKA WROCŁAWSKA
WYDZIAŁ INFORMATYKI I ZARZĄDZANIA

GPGPU driven simulations of zero-temperature 1D Ising model with Glauber dynamics

Daniel Kosalla

FINAL THESIS
under supervision of
Dr inż. Dariusz Konieczny

Wrocław 2013

Acknowledgments: Dr inż. Dariusz Konieczny

Contents

1. Motivation
2. Target
3. Scope of work
4. Theoretical background and proposed model
   4.1. Ising model
   4.2. Historic methods
   4.3. Updating
   4.4. Simulations
   4.5. Distributions
   4.6. Updating
   4.7. Algorithm
5. General Purpose Graphic Processing Units
   5.1. History of General Purpose GPUs
   5.2. CUDA Architecture
6. CPU Simulations
   6.1. Sequential algorithm
   6.2. Random number generation on CPU
   6.3. CPU performance
7. GPU Simulations - thread per simulation
   7.1. Thread per simulation
   7.2. Running the simulation
   7.3. Solution space
   7.4. Random Number Generators
   7.5. Thread per simulation - static memory
   7.6. Comparison of static and dynamic memory use
8. GPU Simulations - thread per spin
   8.1. Thread per spin approach
   8.2. Concurrent execution
   8.3. Thread communication
   8.4. Race conditions with shared memory
   8.5. Thread per spin approach - reduction
   8.6. Thread per spin approach - flags
   8.7. Thread-per-spin performance
   8.8. Thread-per-spin vs thread-per-simulation performance
9. Bond density for some W0 values
10. Conclusions
11. Future work
Appendix
A. Sequential algorithm - CPU
B. Thread per simulation - no optimizations
C. Thread per simulation - static memory
D. Thread per spin - no optimizations
E. Thread per spin - parallel reduction
F. Thread per spin - update flag

1. Motivation

In the presence of recent developments in SCMs (Single Chain Magnets) [14], the issue of criticality in 1D Ising-like magnet chains has turned out to be a promising field of study [5-8]. Some practical applications have already been suggested [2]. Unfortunately, the details of the general mechanism driving these changes in the real world are yet to be discovered. Traditionally, Monte Carlo simulations of the Ising model were conducted on CPUs¹. However, with the advent of powerful GPGPUs², a new trend in scientific computations has begun, enabling more detailed and even faster calculations.

2. Target

This document describes the developed GPGPU applications capable of producing insights into the underlying physical problem, examines different approaches to conducting Monte Carlo simulations on a GPGPU, and compares the developed parallel GPGPU algorithms with the sequential CPU-based approach.
3. Scope of work

The scope of this work includes the development of five parallel GPGPU algorithms, namely:

- Thread-per-simulation algorithm
- Thread-per-simulation algorithm with static memory
- Thread-per-spin algorithm
- Thread-per-spin algorithm with flags
- Thread-per-spin algorithm with reduction

¹ CPU - Central Processing Unit
² GPGPU - General Purpose Graphics Processing Unit

4. Theoretical background and proposed model

4.1. Ising model

Although the model was initially proposed by Wilhelm Lenz, it was Ernst Ising [10] who developed a mathematical model for ferromagnetic phenomena. The Ising model is usually represented by means of a lattice of spins - discrete variables {-1, 1} representing the magnetic dipole moments of molecules in the material. The spins interact with their neighbours, which may cause a phase transition of the whole lattice.

4.2. Historic methods

A Monte Carlo (MC) simulation of the Ising model consists of a sequence of lattice updates. Traditionally, either all spins (synchronous updating) or a single spin (sequential updating) are updated in each iteration, producing the lattice state for subsequent iterations. The update methods are based on so-called dynamics that describe the spin interactions.

4.3. Updating

The idea of a partially synchronous updating scheme has been suggested [5-7]. This c-synchronous mode has a fixed fraction of spins being updated in one time step. However, one can imagine that the number of updated spins/molecules (often referred to as cL, where L denotes the size of the chain and c ∈ (0, 1]) changes as the simulation progresses. If so, it is either linked to some characteristics of the system or may be expressed with some probability distribution (described in subsection 4.5). This approach of changing the c parameter can be applied while choosing spins randomly as well as in clusters (subsection 4.6), but only the latter will be considered in this document.

4.4. Simulations

In the proposed model, cL sequential updating is used, with c drawn from the provided distribution.
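The thesis draws a fresh c from the provided distribution before each time step (the distribution itself is described in subsection 4.5, where it is specified as triangular with its peak at c = 1/2). A minimal sketch of this step, assuming Python and illustrative function names that are not taken from the thesis code:

```python
import random

def draw_c(peak=0.5):
    # Sample the synchronicity parameter c from a triangular
    # distribution on [0, 1] peaked at `peak` (here c = 1/2).
    return random.triangular(0.0, 1.0, peak)

def updates_per_step(L, c):
    # Number of consecutive spins (cL) updated in one time step.
    return max(1, round(c * L))

c = draw_c()
n = updates_per_step(1000, c)  # somewhere between 1 and 1000
```

`random.triangular(low, high, mode)` from the Python standard library matches the shape in Figure 4.2; the GPU implementations in the appendices would use a cuRAND-based equivalent instead.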
The considered environment consists of a one-dimensional array of L spins s_i = ±1. The index of each spin is denoted by i = 1, 2, ..., L. Periodic boundary conditions are assumed, i.e. s_{L+1} = s_1. It has been shown in [8] that the system under synchronous Glauber dynamics reaches one of two absorbing states: ferromagnetic or antiferromagnetic. Therefore, let us introduce the density of bonds (ρ) as an order parameter:

    ρ = (1 / 2L) · Σ_{i=1}^{L} (1 − s_i s_{i+1})    (4.1)

As stated in [8], phase transitions under synchronous updating and c-sequential updating [7] ought to be continuous (in cases other than c = 1 for the latter). A smooth phase transition can be observed in Figure 4.1.

Figure 4.1. The average density of active bonds in the stationary state <ρ_st> as a function of W0 for c = 0.9 and several lattice sizes L. [7] B. Skorupa, K. Sznajd-Weron, and R. Topolnicki, Phase diagram for a zero-temperature Glauber dynamics under partially synchronous updates, Phys. Rev. E 86, 051113 (2012).

The system is considered at low temperature (T), and therefore T = 0 can be assumed. The Metropolis algorithm can be considered a special case of zero-temperature Glauber dynamics for spin-1/2 systems. Each spin is flipped (s_i → −s_i) with rate W(ΔE) per unit time. For T = 0:

    W(ΔE) = 1    if ΔE < 0,
            W0   if ΔE = 0,
            0    if ΔE > 0    (4.3)

In the case of T = 0, the ordering parameter W0 ∈ [0, 1] (e.g. the Glauber rate W0 = 1/2 or the Metropolis rate W0 = 1) is assumed to be constant. One can imagine that even the W0 parameter could be changed during the simulation process, but that is outside the scope of the proposed model. The system starts in the fully ferromagnetic state (ρ = ρ_f = 0). After each time step, the changes are applied to the system and the next time step is evaluated. After a predetermined number of time steps, the state of the system is investigated. If the chain has reached the antiferromagnetic state (ρ = ρ_af = 1), or a sufficiently large number of time steps has been inconclusive, the whole simulation is shut down.
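The definitions above can be checked with a short sketch. It assumes spins stored as ±1 in a Python list and illustrative names (this is not the thesis code): `bond_density` implements eq. (4.1) with periodic boundaries, and `flip_probability` implements the zero-temperature rates of eq. (4.3).

```python
def bond_density(s):
    # Order parameter rho, eq. (4.1): fraction of active bonds,
    # with the periodic boundary s_{L+1} = s_1.
    L = len(s)
    return sum(1 - s[i] * s[(i + 1) % L] for i in range(L)) / (2 * L)

def flip_probability(dE, W0):
    # Zero-temperature Glauber rate W(dE), eq. (4.3).
    if dE < 0:
        return 1.0
    if dE == 0:
        return W0
    return 0.0

ferro = [1] * 8           # fully ferromagnetic chain
antiferro = [1, -1] * 4   # fully antiferromagnetic chain
print(bond_density(ferro))      # 0.0, i.e. rho_f
print(bond_density(antiferro))  # 1.0, i.e. rho_af
```

The two printed values correspond to the absorbing states ρ_f = 0 and ρ_af = 1 between which the simulated chains evolve.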
4.5. Distributions

During the simulation, c will not be fixed in time but will instead vary over [0, 1] according to a triangular continuous probability distribution [9], presented in Figure 4.2. While studying different initial conditions for the simulations, the distributions are to be adjusted in order to provide peak values in the range (0, 1). This is due to the fact that a peak value of 0.5 (as presented in the plot) means that in each time step roughly half of the spins get updated.

Figure 4.2. The triangular distribution of c: c can take any value in the interval [0, 1] but is most likely to be around c = 1/2. Other values are possible, but their probabilities are inversely proportional to their distance from c = 1/2.

4.6. Updating

The following algorithms make use of the triangular probability distribution to assign an appropriate c value before each time step. After (on average) L updated spins, one Monte Carlo Step (MCS) can be distinguished.

4.7. Algorithm

Transforming the above-mentioned rules into a set of instructions yields the following description: update cL consecutive spins starting from a randomly chosen one, saving each change to a new array rather than the old one. After each Stop, the updated spins are saved and a new time step can be started. In pseudocode:

1. Assign a c value from the given distribution
2. Choose a random value of i ∈ [0, L]
3. max = i + cL
4. Let s_i be the i-th spin:
   if s_{i+1} = s_{i−1}: set s'_i = s_{i+1} = s_{i−1}
   otherwise: flip s_i with probability W0
5. If i < max: i = i + 1, go to step 4
6. Stop

5. General Purpose Graphic Processing Units

5.1. History of General Purpose GPUs

Traditionally, the GPU in a desktop computer is a highly specialized electronic circuit designed to robustly handle 2D and 3D graphics. In 1992, Silicon Graphics released the OpenGL library, meant as a standardised, platform-independent interface for writing 3D graphics. By the mid-1990s, an increasing demand for 3D applications appeared in the consumer market. I