GTC Workshop Japan 2011 (2011 年 7 月 22 日） Multi-GPU Computing of Large Scale Phase-Field Simulation for Dendritic Solidification on TSUBAME 2.0 Takashi Shimokawabe 1) , Takayuki Aoki 1) , Tomohiro Takaki 2) , and Akinori Yamanaka 1) 1) Tokyo Institute of Technology (2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8550) 2) Kyoto Institute of Technology (Gosyokaido-cyo, Matsugasaki, Sakyo-ku, Kyoto, 606-8585) Key Words: CUDA, Phase-field method, Multi-GPU computing, Dendritic solidification 1 Introduction The mechanical properties and performance of metal materials depend on the intrinsic microstructures in these materials. In materials science and related areas, the phase-field method is widely used as one of the powerful computational methods to simulate the formation of complex microstructures such as den- drites during solidification and phase transformation of metals. The phase-field model consists of a large number of complex nonlinear terms comparing with other stencil computation. As a result, the phase-field simulations require large-scale computation, such as multi-GPU computing, to perform realistic three- dimensional simulations in the typical scales of the microstruc- tural pattern. 2 Multi-GPU implementation and results In this simulation, the time integration of the phase field and the solute concentration are carried out by the second-order finite difference scheme for space with the first-order forward Euler- type finite difference method for time on a three-dimensional regular computational grid. We decompose the whole computational domain in both the y- and z-directions (2D decomposition) and allocate each sub domain to a GPU since 3D decomposition tends to degrade the GPU performance that we are able to achieve. Similar to conven- tional multi-CPU calculations with MPI, multi-GPU computing requires boundary data exchanges between sub domains. Be- cause a GPU cannot directly access to the global memory of the other GPUs, host CPUs are used to bridge GPUs for the data exchange between the neighbor GPUs. This process is composed of the following three steps: (1) the data transfer from GPU to CPU using the CUDA runtime library, (2) the data exchange between nodes with the MPI library, and (3) the data transfer back from CPU to GPU with the CUDA runtime library. Figure 1 demonstrates the dendritic growth during the binary alloy solidification using the mesh size of 768 × 1632 × 3264 on 1156 GPUs of TSUBAME 2.0. To evaluate the performance of our multi-GPU code for the phase-field simulations, we use the TSUBAME 2.0 supercomputer at the Tokyo Institute of Technology. The TSUBAME 2.0 supercomputer in Tokyo Institute of Technology is equipped with 4224 NVIDIA Tesla M2050 GPUs. Each node of TSUB- AME 2.0 has three Tesla M2050 attached to the PCI Express Fig. 1: Dendritic growth in the binary alloy solidification bus 2.0 ×16 (8 GB/s), two QDR InfiniBand and two sockets of the Intel CPU Xeon X5670 (Westmere-EP) 2.93 GHz 6-core. Figure 2 shows that the performance of the phase-field simulation by the multi-GPU computing using TSUBAME 2.0 in both double- and single-precision. The performance of 1.017 PFlops is achieved for the mesh size of 4, 096 × 6, 500 × 10, 400 using 4,000 GPUs with 16,000 CPU cores of TSUBAME 2.0 in single precision. Number of GPUs 10 2 10 3 10 Performance [TFlops] 1 10 2 10 3 10 GPU, Single Precision GPU, Double Precision Fig. 2: Performance of multi-GPU computation REFERENCES [1] T. Aoki, S. Ogawa, and A. Yamanaka. Multiple-GPU scal- ability of phase-field simulation for dendritic solidification. Progress in Nuclear Science and Technology, in press.

Multi-GPU Computing of Large Scale Phase-Field Simulation

Download PDF Report

Upload
others
View
2
Download
0

Embed Size (px)

Citation preview

Page 1: Multi-GPU Computing of Large Scale Phase-Field Simulation

GTC Workshop Japan 2011 (2011年 7月 22日）

Multi-GPU Computing of Large Scale Phase-FieldSimulation for Dendritic Solidification on TSUBAME 2.0

Takashi Shimokawabe1), Takayuki Aoki1), Tomohiro Takaki2), and Akinori Yamanaka1)

1) Tokyo Institute of Technology (2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8550)

2) Kyoto Institute of Technology (Gosyokaido-cyo, Matsugasaki, Sakyo-ku, Kyoto, 606-8585)

Key Words: CUDA, Phase-field method, Multi-GPU computing, Dendritic solidification

1 Introduction

The mechanical properties and performance of metal mate-rials depend on the intrinsic microstructures in these materials.In materials science and related areas, the phase-field method iswidely used as one of the powerful computational methods tosimulate the formation of complex microstructures such as den-drites during solidification and phase transformation of metals.

The phase-field model consists of a large number of complexnonlinear terms comparing with other stencil computation. Asa result, the phase-field simulations require large-scale compu-tation, such as multi-GPU computing, to perform realistic three-dimensional simulations in the typical scales of the microstruc-tural pattern.

2 Multi-GPU implementation and results

In this simulation, the time integration of the phase field andthe solute concentration are carried out by the second-order finitedifference scheme for space with the first-order forward Euler-type finite difference method for time on a three-dimensionalregular computational grid.

We decompose the whole computational domain in both they- and z-directions (2D decomposition) and allocate each subdomain to a GPU since 3D decomposition tends to degrade theGPU performance that we are able to achieve. Similar to conven-tional multi-CPU calculations with MPI, multi-GPU computingrequires boundary data exchanges between sub domains. Be-cause a GPU cannot directly access to the global memory of theother GPUs, host CPUs are used to bridge GPUs for the data ex-change between the neighbor GPUs. This process is composedof the following three steps: (1) the data transfer from GPU toCPU using the CUDA runtime library, (2) the data exchange be-tween nodes with the MPI library, and (3) the data transfer backfrom CPU to GPU with the CUDA runtime library.

Figure 1 demonstrates the dendritic growth during the binaryalloy solidification using the mesh size of 768×1632×3264 on1156 GPUs of TSUBAME 2.0.

To evaluate the performance of our multi-GPU code for thephase-field simulations, we use the TSUBAME 2.0 supercom-puter at the Tokyo Institute of Technology. The TSUBAME2.0 supercomputer in Tokyo Institute of Technology is equippedwith 4224 NVIDIA Tesla M2050 GPUs. Each node of TSUB-AME 2.0 has three Tesla M2050 attached to the PCI Express

Fig. 1: Dendritic growth in the binary alloy solidification

bus 2.0 ×16 (8 GB/s), two QDR InfiniBand and two socketsof the Intel CPU Xeon X5670 (Westmere-EP) 2.93 GHz 6-core.Figure 2 shows that the performance of the phase-field simula-tion by the multi-GPU computing using TSUBAME 2.0 in bothdouble- and single-precision. The performance of 1.017PFlopsis achieved for the mesh size of 4, 096× 6, 500× 10, 400 using4,000 GPUs with 16,000 CPU cores of TSUBAME 2.0 in singleprecision.

Number of GPUs10 210 310

Perf

orm

ance

[TFl

ops]

210

310GPU, Single Precision

GPU, Double Precision

Performance of Phase-field method on TSUBAME 2.0

Fig. 2: Performance of multi-GPU computation

REFERENCES

[1] T. Aoki, S. Ogawa, and A. Yamanaka. Multiple-GPU scal-ability of phase-field simulation for dendritic solidification.Progress in Nuclear Science and Technology, in press.

GPU-based high-performance computing for urban · PDF fileNonlinear Cyclic Pushover Analysis Normalized by the weight of one story ... GPU-based high-performance computing for urban

Documents

GPU-Programmierung: OpenCL€¦ · Einsatzgebiete von GPU-Computing Entwicklung von GPU-Computing 2 OpenCL Entwicklung Architektur Spracheigenschaften Vergleich mit CUDA Beispiel

Documents

The Impact of Novel Computing Architectures on Large-Scale … · Dipartimento di Informatica Dottorato di Ricerca in Informatica Ph.D. Thesis The Impact of Novel Computing Architectures

Documents

MEC (Mobile Edge Computing) + GPUコンピューティングについて

Technology

ΣΥΝΔΥΑΣΤΙΚΗ ΒΕΛΤΙΣΤΟΠΟΙΗΣΗopencourses.uom.gr/assets/site/public/...Sifaleras.pdfNetwork Flow Problem”, Chapter 5, GPU Computing Gems Jade Edition, Morgan

Documents

Experiences with gpu Computing - University of California ...jowens/talks/intel-santaclara-070420.… · • Mark Harris, unc/nvidia • gpu Gems (Addison-Wesley) • Vol 1: 2004;

Documents

Masterarbeit akultätF Informatik Untersuchungen zur ...meixner/masterarbeitKlass.pdf · the GPU computing power in higher-level programming languages increases. Both the evaluation

Documents

Introduction of GPU Computing by Y.C.J.. GPU supported Cloud and Grid Computing （雲端運算） Matlab (Jacket) MathematicA BLAS Linkpack Lapack Guassian?

Documents

Cloud for Cognitive Computing - Huodongjia.com...2017/06/29 · open source cloud stack • Enabled GPU sharing service on OpenStack Cloud • Enabled GPU on Mesos, Marathon and Kuberntes

Documents

OpenACCで始めるGPUコンピューティング： OpenACC GPU ......Introduction of GPU computing by OpenACC: Optimize Loops 成瀬彰 1 はじめに前回までに、我々が推奨するOpenACCによるアプ

Documents

GPU-Computing mit CUDA und OpenCL in der Praxis

Technology

A GPU-based Framework for Large-scale Multi-Ag

Documents

Real-time 3D Video Processing Using Multi-stream GPU ... 3D Video Processing U… · Real-time 3D Video Processing Using Multi-stream GPU Parallel Computing Kenia Picos, Víctor H

Documents

$Sicurezza, protezione e integrità dei sistemi Cloud: …Introduzione al Cloud Computing \Cloud computing is a large-scale distributed computing paradigm that is driven by economies$