
Computer Physics Communications 184 (2013) 456–468


Multi-physics simulations using a hierarchical interchangeable software interface

Simon F. Portegies Zwart a,∗, Stephen L.W. McMillan b, Arjen van Elteren a, F. Inti Pelupessy a, Nathan de Vries a

a Sterrewacht Leiden, P.O. Box 9513, 2300 RA Leiden, The Netherlands
b Department of Physics, Drexel University, Philadelphia, PA 19104, USA

Article info

    Article history:

    Received 11 June 2012

Received in revised form 17 September 2012

    Accepted 20 September 2012

    Available online 26 September 2012

    Keywords:

    Computer applications

    Physical sciences and engineering

    Astronomy

Computing methodologies: simulation, modeling, and visualization

    Distributed computing

Abstract

We introduce a general-purpose framework for interconnecting scientific simulation programs using a homogeneous, unified interface. Our framework is intrinsically parallel, and conveniently separates all component numerical modules in memory. This strict separation allows automatic unit conversion, distributed execution of modules on different cores within a cluster or grid, and orderly recovery from errors. The framework can be efficiently implemented and incurs an acceptable overhead. In practice, we measure the time spent in the framework to be less than 1% of the wall-clock time. Due to the unified structure of the interface, incorporating multiple modules addressing the same physics in different ways is relatively straightforward. Different modules may be advanced serially or in parallel. Despite initial concerns, we have encountered relatively few problems with this strict separation between modules, and the results of our simulations are consistent with earlier results using more traditional monolithic approaches. This framework provides a platform to combine existing simulation codes or develop new physical solver codes within a rich ecosystem of interchangeable modules.

Crown Copyright © 2012 Published by Elsevier B.V. All rights reserved.

    1. Introduction

Large-scale, high-resolution computer simulations dominate many areas of theoretical and computational science. The demand for such simulations has expanded steadily over the past decade, and is likely to continue to grow in coming years due to the increase in the volume, precision, and dynamic range of experimental data, as well as the widening spectral coverage of observations and laboratory experiments. Simulations are often used to mine and understand large observational and experimental datasets, and the quality of these simulations must keep pace with the increasingly high quality of experimental data.

In our own specialized discipline of computational astrophysics, numerical simulations have increased dramatically in both scope and scale over the past four decades. In the 1970s and 1980s, large-scale astrophysical simulations generally incorporated mono-physics solutions, in our case the sub-disciplines of stellar evolution [1], gas dynamics [2], and gravitational dynamics [3]. A decade later, it became common to study phenomena combining a few different physics solvers [4]. Today's simulation environments incorporate multiple physical domains, and their nominal dynamic range often exceeds the standard numerical precision of available compilers and hardware [3].

∗ Corresponding author.
E-mail address: [email protected] (S.F. Portegies Zwart).

Recent developments in hardware, in particular the rapidly increasing availability of multi-core architectures, have led to a surge in computer performance [5]. With the volume and quality of experimental data continuously improving, simulations expanding in scope and scale, and raw computational speed growing more rapidly than ever before, one might expect commensurate returns in the scientific results. However, a major bottleneck in modern computer modeling lies in the software, the growing complexity of which is evident in the increase in the number of code lines, the lengthening lists of input parameters, the number of underlying (and often undocumented) assumptions, and the expanding range of initial and boundary conditions.

Simulation environments have grown substantially in recent years by incorporating more detailed interactions among constituent systems, resulting in the need to incorporate very different physical solvers into the simulations, but the fundamental design of the underlying codes has remained largely unchanged since the introduction of object-oriented programming [6] and patterns [7]. As a result, maintaining and extending existing large-scale, multi-physics solvers has become a major undertaking. The legacy of design choices made long ago can hamper further development and expansion of a code, prevent scaling on large parallel computers, and render maintenance almost impossible. Even configuring, compiling, and running such a code has become a complex endeavor, not for the faint of heart. It has become increasingly difficult to reproduce simulation results, and independent code verification and validation are rarely performed, even though all researchers agree that computer experiments require the same degree of reproducibility as is customary in laboratory settings.

We suggest that the root cause of much of this code complexity lies in the traditional approach to incorporating multi-physics components into a simulation, namely solving the equations appropriate to all components in a single monolithic software suite, often written by a single researcher or research group. Such a solution may seem desirable from the standpoint of consistency and performance, but the resulting software generally suffers from all of the fundamental problems just described. In addition, integration of new components often requires sweeping redesign and redevelopment of the code. Writing a general multi-physics application from scratch is a major undertaking, and the stringent requirements of high-quality, fault-tolerant scientific simulation software render such code development by a single author almost impossible.

But why reinvent or re-engineer a monolithic suite of coupled mono-physics solvers when well-tested applications already exist to perform many or all of the necessary individual tasks? In many scientific communities there is a rich tradition of sharing scientific software. Many of these programs have been written by experts who have spent careers developing these codes and using them to conduct a wide range of numerical experiments. These packages are generally developed and maintained independently of one another. We refer to them collectively as community software. The term is intended to encompass both legacy codes that are still maintained but are no longer under active development, and new codes still under development to address physical problems of current interest. Together, community codes represent an invaluable, if incoherent, resource for computational science.

Coupling community codes raises new issues not found in monolithic applications. Aside from the wide range of physical processes, underlying equations, and numerical solvers they represent, these independent codes generally also employ a wide variety of units, input and output methods, file formats, numerical assumptions, and boundary conditions. Their originality and independence are strengths, but the lack of uniformity can also significantly reduce the shelf life of the software. In addition, directly coupling the very dissimilar algorithms and data representations used in different community codes can be a difficult task, almost as complex as rewriting the codes themselves. But whatever the internal workings of the codes, they are designed to represent a given domain of physics, and different codes may (and do in practice) implement alternate descriptions of the same physical processes. This suggests that integrating community codes should be possible using interfaces based on physical principles.

In this paper we present a comprehensive solution to many of the problems mentioned above, in the form of a software framework that combines remote function calls with physically based interfaces, and implements an object-oriented data model, automatic conversion of units, and a state-handling model for the component solvers, including error-recovery mechanisms. Communication between the various solvers is realized via a centralized message-passing framework, under the overall control of a high-level user interface. In Section 2 we introduce our framework, MUSE, the MUlti-physics Software Environment. An example of the MUSE framework is presented in Section 3. In Section 4 we describe a production implementation of AMUSE, the astrophysics MUSE environment, which supports a wide range of programming languages and physical environments.

    1.1. An historical perspective on MUSE

The basic concepts of MUSE are rooted in the earliest development of multi-scale software in computational astrophysics. The idea of combining codes within a flexible framework began with the NEMO project in 1986 at the Institute for Advanced Study (IAS) in Princeton [8,9]. NEMO was (and still is) aimed primarily at collisionless galactic dynamics. It used a uniform file structure to communicate data among its component programs.

The Starlab package [10,11], begun in 1993 (again at IAS), adopted the NEMO toolbox approach, but used pipes instead of files for communication among modules. The goal of the project was to combine dynamics and stellar and binary evolution for studies of collisional systems, such as star clusters. The stellar/binary evolution code SeBa [12] was combined with a high-performance gravitational stellar dynamics simulator. Because of the heterogeneous nature of the data, not all tools were aware of all data types (for example, the stellar evolution tools generally had no inherent knowledge of large-scale gravitational dynamics). As a result, the package used an XML-like tagged data format to ensure that no information was lost in the production pipeline: unknown particle data were simply passed unchanged from input to output.

The intellectual parent of (A)MUSE is the MODEST initiative, begun in 2002 at a workshop at the American Museum of Natural History in New York. The goal of that workshop was to formalize some of the ideas of modular software frameworks then circulating in the community into a coherent system for simulating dense stellar systems. Originally, MODEST stood for MOdeling DEnse STellar systems (star clusters and galactic nuclei). The name was later expanded, at the suggestion of Giampaolo Piotto (Padova), to Modeling and Observing DEnse STellar systems. The MODEST web page can be found at http://www.manybody.org/modest. Since then, MODEST has gone on to provide a lively and long-lived forum for discussion of many topics in astrophysics. (A)MUSE is in many ways the software component of the MODEST community. An early example of MUSE-like code can be found in the proceedings of the MODEST-1 meeting [13].

Subsequent MODEST meetings discussed many new ideas for modular multi-physics applications [e.g. 14]. The basic MUSE architecture, as described in this paper, was conceived during the 2005 MODEST-6a workshop in Lund, Sweden [15]. The MUSE name, and the first lines of MUSE code, were created during MODEST-6e in Amsterdam in 2006, and expanded upon over the next 1–2 years. The Noah's Ark milestone (meeting our initial goal of having two independent modules for solving each particular type of physics) was realized in 2007, during MODEST-7f in Amsterdam and MODEST-7a in Split, Croatia [16]. The AMUSE project, short for Astrophysics MUlti-purpose Software Environment, a re-engineered version of MUSE (MUSE 2.0, building on the lessons learned during the previous 3 years), began at Leiden Observatory in 2009.

    2. The MUSE framework

Each of the problems discussed in Section 1 could in principle be addressed by designing an entirely new suite of programs from scratch. However, this idealized approach fails to capitalize on the existing strengths of the field by ignoring the wealth of highly capable scientific software that has been developed over the last four or five decades. We will argue that it is more practical, and considerably easier, to introduce a generalized interface that connects existing solvers to a homogeneous and easily extensible framework.

At first sight, the approach of assimilating existing software into a larger framework would appear to be a difficult undertaking, particularly since community software may be written in a wide variety of languages, such as FORTRAN, C, and C++, and exhibits an enormous diversity of internal units and data structures. On the other hand, such a framework, if properly designed, could be relatively easy to use, since learning one simulation package is considerably easier than mastering the idiosyncrasies of many separate programs. The use of standard calling procedures can enable even novice researchers to become acquainted quickly with the environment, and to perform relatively complicated simulations using it.

To this end, we propose the Multi-physics Software Environment, or MUSE. Within MUSE, a user writes a relatively simple script with a standardized calling sequence to each of the underlying community codes. Instructions are communicated through a message-passing interface to any of a number of spawned community modules, which respond by performing operations and transferring data back through the interface to the user script.

As illustrated in Fig. 1, the two top-level components of MUSE are:

The user script, which implements a specific physical problem or set of problems, in the form of system-provided or user-written scripts that serve as the user interface onto the MUSE framework. The coupling between community codes is implemented in this layer, with the help of support classes provided by the manager. A minimal sketch of this layering, as seen from the user script, follows the component list below.

The community module, comprising three key elements:

1. The manager, which provides an object-oriented interface onto the communication layer via a suite of system-provided utility functions. This layer handles unit conversion, as discussed below, and also contains the state engine and the associated data repository, both of which are required to guarantee consistency of data across modules. The role of this layer is generic: it is not specific to any particular problem or to any single physical domain. In Section 4.1 we discuss an actual implementation of the manager for astrophysical problems.

2. The communication layer, which realizes the bi-directional communication between the manager and the community code layer. This is implemented via a proxy and an associated partner, which together provide the connection to the community code.

3. The community code layer, which contains the actual community codes and implements control and data management operations on them. Each piece of code in this layer is domain-specific, although the code may be designed to be very general within its particular physical domain.
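To make the division of labor concrete, the minimal sketch below shows the layering from the user's point of view: the script talks only to a manager-level object, while the community code (here a trivial stand-in running in the same process) stays hidden behind it. All class and method names are illustrative placeholders, not the actual MUSE interface.

    # Illustrative sketch only: hypothetical names, not the MUSE API.
    class ToyGravityCode:
        """Stands in for a community code (normally a separate C/Fortran process)."""
        def __init__(self):
            self.time = 0.0
        def evolve(self, t_end):
            self.time = t_end   # a real code would integrate the equations of motion
            return 0            # error code, to be checked by the manager

    class GravityManager:
        """Stands in for the manager: checks error codes, hides the communication layer."""
        def __init__(self, code):
            self._code = code
        @property
        def model_time(self):
            return self._code.time
        def evolve_model(self, t_end):
            error = self._code.evolve(t_end)
            if error != 0:
                raise RuntimeError(f"community code failed with error {error}")

    # The user script only ever talks to the manager-level object.
    gravity = GravityManager(ToyGravityCode())
    gravity.evolve_model(10.0)
    print("model time:", gravity.model_time)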

    2.1. Choice of programming languages

We have adopted python as the implementation language for the MUSE framework and high-level management functions, including the bindings to the communication interface. This choice is motivated by python's broad acceptance in the scientific community, its object-oriented design, and its ability to allow rapid prototyping, which shortens the software development cycle and enables easy access to the community code in the community module, albeit at the cost of slightly reduced performance. However, the entire MUSE framework is organized in such a way that relatively little computer time is actually spent in the framework itself, as most time is spent in the community code. The overhead from python compared to a compiled high-level language is ∼10%, and often much less (see Section 5).

As discussed further in Section 2.2.2, our implementation of the communication layer in MUSE uses the standard Message Passing Interface protocol [17], although we also have an operational implementation using SmartSockets [18] via the Ibis framework [19,20] (see [21] for an actual implementation). Normally MPI is used for communication between compute nodes on parallel distributed-memory systems. However, here it is used as the communication channel between all processes, whether or not they reside on the same node as the control script, and whether or not they are themselves parallel.

Fig. 1. The MUSE framework architecture is based on a three-layered design. The top layer consists of a user script written by the end-user in python (top box). The manager is part of the community module and consists of a data repository, unit conversion, and a state engine. The community module also includes the partner-proxy pair in the communication layer (a bi-directional message-passing framework) and a community code (bottom box).

The proxy side of the message-passing framework in the communication layer is implemented in python, but the partner side is normally written in the language used in the community code. Thus our only real restriction on supported languages is the requirement that the community code is written in a language with standard MPI bindings, such as C, C++, or a member of the FORTRAN family.
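As an illustration of this process separation, the sketch below spawns a stand-alone worker process and exchanges a single request and reply over MPI with mpi4py. It assumes an MPI installation with dynamic process management and mpi4py available; it is a toy, not the MUSE communication channel itself, and the worker script is only a stand-in for a wrapped community code.

    import sys, tempfile, textwrap
    from mpi4py import MPI

    worker_source = textwrap.dedent("""
        from mpi4py import MPI
        parent = MPI.Comm.Get_parent()           # intercommunicator back to the user script
        request = parent.recv(source=0, tag=1)   # wait for an instruction
        parent.send({"echo": request}, dest=0, tag=2)
        parent.Disconnect()
    """)

    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(worker_source)
        worker_path = f.name

    # Spawn the worker as a separate process; it shares no memory with this script.
    child = MPI.COMM_SELF.Spawn(sys.executable, args=[worker_path], maxprocs=1)
    child.send({"call": "evolve", "t_end": 1.0}, dest=0, tag=1)
    print(child.recv(source=0, tag=2))
    child.Disconnect()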

    2.2. The community module

The community module consists of the actual community code and the bi-directional communication layer enabling low-level MPI communication between the proxy and the partner. Each community module contains a python class of functions dedicated to communication with the community code. Direct use of this low-level class from within a user script is not trivial, because it requires considerable replication of code dealing with data models, unit conversion, and exception handling, and is discouraged. Instead, the MUSE framework provides the manager layer above the community module layer to manage the bookkeeping needed to maintain the interface with the low-level class.

    2.2.1. The community code

Community applications may be programmed in any computer language that has bindings to the MPI protocol. In practice, these codes exhibit a wide variety of structural properties, model diverse physical domains, and span broad ranges of temporal and spatial scales. In our astrophysical AMUSE implementation (see Section 4.1), this diversity is exemplified by a suite of community programs ranging from toy codes to production-quality high-precision dedicated solvers. A user script can be developed quickly using toy codes until the production and data-analysis pipelines are fully developed, then easily switched to production-quality implementations of the physical solvers for actual large-scale simulations.

In principle, each of the community modules can be used stand-alone, but the main strength of MUSE is in the coupling between them. Many of the codes incorporated in our practical implementation are not written by us, but are publicly available and amply described in the literature.

    2.2.2. The communication layer

Bi-directional communication between a running community process and the manager can in principle be realized using any protocol for remote interprocess communication. Our main reasons for choosing MPI are similar to those for adopting python: widespread acceptance in the computational science community and broad support in many programming languages and on most computing platforms.

The communication layer consists of two main parts, a python proxy on the manager side of the framework and a partner on the community code side. The proxy converts python commands into MPI messages, transmits them to the partner, then waits for a response. The partner waits for and decodes MPI messages from the proxy into community code commands, executes them, then sends a reply. The MPI interface allows the user script and community modules to operate asynchronously and independently. As a practical matter, the detailed coding, decoding, and communication operations in the proxy and the partner are never hand-coded. Rather, they are automatically generated as part of the MUSE build process, from a high-level python description of the community module interface.
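The sketch below gives the flavor of such a high-level description: a declarative specification of one interface function, together with a helper that packs a call into the kind of flat message a generated proxy would transmit. The specification format and all names are hypothetical, not the actual MUSE build machinery.

    # Hypothetical interface description; a code generator could emit the proxy and
    # partner stubs from a specification like this.
    INTERFACE = {
        "orbital_period": {
            "parameters": [("mass", "float64"), ("separation", "float64")],
            "result": "float64",
        },
    }

    def encode_call(function_name, *args):
        """Pack a call into the kind of flat message a proxy would send over MPI."""
        spec = INTERFACE[function_name]
        if len(args) != len(spec["parameters"]):
            raise TypeError(f"{function_name} expects {len(spec['parameters'])} arguments")
        return {"function": function_name, "arguments": list(args)}

    print(encode_call("orbital_period", 2.0, 1.0))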

We note that other solutions to linking high-level languages within python exist, e.g. swig and f2py, and in fact both were used in earlier implementations of the MUSE framework [16]. However, despite their generality, these solutions cannot maintain name-space independence between community modules, and in addition are incapable of accommodating high-performance parallel community codes. For these reasons, we have abandoned the standard solutions in favor of our customized, high-performance MPI alternative.

The use of MPI may seem like overkill in the case of serial operation on a single computing node (as might well be the case), but it imposes negligible overhead, and even here it offers significant practical advantages. It rigorously separates all community processes in memory, guaranteeing that multiple instances of a community module are independent and explicitly avoiding name-space conflicts. It also ensures that the framework remains thread-safe when using older community codes that may not have been written with that concept in mind.

Despite the initial threshold for its implementation and use, the relative simplicity of the message-passing framework allows us to easily couple multiple independent community codes as community modules, or multiple copies of the same community module, if desired (see Fig. 1). An example of the latter might be to simulate simultaneously two separate objects of the same type, or the same physics on different scales, or with different boundary conditions, without having to modify the data structures in the community code.

As a bonus, the framework naturally accommodates inherently parallel community modules and allows simultaneous execution of independent modules from the user script. Individual community modules can be offloaded to other processors, in a cluster or on the grid, and can be run concurrently without requiring any changes in the user script. This makes the MUSE framework well suited for distributed computing with a wide diversity of hardware architectures (see Section 5).

    2.2.3. The manager

Between the user script and the communication layer we introduce the manager, part of the community module. This part of the code is visible to the script-writing user and guarantees that all accessible data are up to date, have the proper format, and are expressed in the correct units.

The manager layer constructs data models (such as particle sets or tessellated grids) on top of all functions in the low-level communication class and checks the error codes they return. To allow different modules to work in their preferred systems of units, the manager incorporates a unit transformation protocol (see Section 2.4). It also includes a state engine to ensure that functions are called in a controlled way, as many community codes are written with specific assumptions about calling procedures. In addition, a data repository is introduced to guarantee that at least one self-consistent copy of the simulation data always exists at any given time. This repository can be structured in one or more native formats, such as particles, grids, tessellations, etc., depending on the topology of the data adopted in the community code.
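A minimal sketch of such a state engine is given below; the states and allowed transitions are hypothetical examples, not those of any particular community code.

    # Toy state engine: refuses calls that a community code would not accept in its
    # current state, e.g. evolving before the code has been initialized.
    ALLOWED = {
        ("UNINITIALIZED", "initialize"): "INITIALIZED",
        ("INITIALIZED", "commit_particles"): "RUN",
        ("RUN", "evolve"): "RUN",
    }

    class StateEngine:
        def __init__(self):
            self.state = "UNINITIALIZED"

        def invoke(self, method_name):
            key = (self.state, method_name)
            if key not in ALLOWED:
                raise RuntimeError(f"'{method_name}' is not allowed in state {self.state}")
            self.state = ALLOWED[key]

    engine = StateEngine()
    engine.invoke("initialize")
    engine.invoke("commit_particles")
    engine.invoke("evolve")          # fine: the code is in the RUN state
    # engine.invoke("initialize")    # would raise: re-initializing a running code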

Each community code has its own internal data, needed for modeling operations but not routinely exposed to the user script. However, some of these data may be needed by other parts of the framework. To share data effectively, and to minimize the bookkeeping required to manage data coherency, the most fundamental parts of the community data are replicated in the structured repository in the manager. The internal data structures differ, but the repository imposes a standard format. The repository allows the user to access community module data from the user script without having to make individual (and often idiosyncratic) calls to individual community modules. The repository is updated from the community module on demand, and is considered to be authoritative within the script.

The separation of the manager from the community module realizes a flexibility that allows the user to swap modules and recompute the same problem using different physics solvers, providing an independent check of the implementation, validation of the model, and verification of the results, simply by rerunning the same initial state using another community module. Potentially even more interesting is the possibility of solving some parts of a problem with one solver and others with a different solver of the same type, combining the results at a later stage to complete the solution. While studying the general behavior of a simulation, perhaps to test a hypothesis or guide our physical intuition, we may elect to use computationally cheaper physics solvers, then switch to more robust, but computationally more expensive, modules in a production run. Alternatively, we might choose to adopt less accurate solvers for physical domains that are not deemed crucial to capturing the correct overall behavior.
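The sketch below illustrates the idea of rerunning identical initial conditions with interchangeable solvers; the two toy integrators (for a unit harmonic oscillator) stand in for community modules sharing a common interface and are not AMUSE codes.

    # Two interchangeable "solvers" with the same evolve() interface.
    class EulerGravity:
        """Cheap, low-accuracy integrator for a unit harmonic oscillator (a = -x)."""
        def evolve(self, x, v, dt):
            return x + v * dt, v - x * dt

    class LeapfrogGravity:
        """Same interface, different (time-symmetric) algorithm."""
        def evolve(self, x, v, dt):
            v_half = v - 0.5 * dt * x
            x_new = x + dt * v_half
            return x_new, v_half - 0.5 * dt * x_new

    initial_state = (0.0, 1.0)                  # position, velocity
    for solver in (EulerGravity(), LeapfrogGravity()):
        x, v = initial_state                    # identical initial conditions for each module
        for _ in range(10):
            x, v = solver.evolve(x, v, dt=0.1)
        print(type(solver).__name__, "final position:", x)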

Another advantage of separating the manager from the community module is the possibility of combining existing community modules into new community modules which can themselves be coupled via the manager, building a hierarchical environment for simulating complex physical systems with multiple components.

    2.3. The user script

The MUSE framework is controlled by the user via python scripts (see Sections 3 and 4.1). The main tasks of these scripts are to identify and spawn community modules, control their calling sequences, and manage the data flow among them. The user script and the spawned community code are connected by a communication channel embedded in the community module and maintained by the manager. Since the community modules do not communicate directly with one another, the only communication channels are between modules and the user script. The number of community modules controlled by a user script is effectively unlimited, and multiple instances of identical modules can be spawned.


    Fig. 2. The community code.

The main objective of the user script is to read in or generate an input model, process these initial conditions through one or more simulation modules, and subsequently mine and analyze the resultant data. In practice the user script will itself be composed of a series of scripts, each performing one specific function.

    2.4. Inter-module data transfer and unit conversion

Computer scientists tend to attribute types to parameters and variables, and generally impose strict rules for the conversion from one type to another. In physics, data types are almost never important, but units are crucial for validating and checking the consistency of a calculation or simulation. The lack of coherent unit handling in programming environments is notorious, and can cause significant problems in multi-physics simulations, particularly for inexperienced researchers.

In many cases, a researcher can define a set of dimensionless variables applicable to a specific simulation, within the confines of a set of rather strict assumptions. For a mono-physics simulation, the consistent use of such variables is generally clear to most users. However, as soon as an expert from another field attempts to interpret the results, or when the output of one simulation is used by another program, units must be restored and the data converted to another physical system for further processing and interpretation.

The absence of units in scientific software raises two important problems. The first is the loss of an independent consistency check for theoretical calculations. The second is the expert knowledge and intuition required to manage dimensionless variables or otherwise unfamiliar units. Few would recognize 4.303 × 10⁴³ as the radius of the Sun in units of the Planck length, even though it is not completely inconceivable that an astronomer might use such units in a simulation. To address these issues, MUSE incorporates a unit-conversion module as part of the manager (see Section 2.2.3) to guarantee that all communications within the top-level user script are performed in the proper units. In order to prevent unit checking and conversion from becoming a performance bottleneck, we adopt lazy evaluation, performing unit conversion only when explicitly required.
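A minimal sketch of such lazily evaluated unit handling is shown below; it is not the MUSE/AMUSE units module, and the conversion table is limited to the few units needed for the illustration.

    # A quantity stores its value and unit as given; conversion factors are applied
    # only when a value in another unit is explicitly requested (lazy evaluation).
    TO_SI = {"m": 1.0, "au": 1.495978707e11, "rsun": 6.957e8}  # metres per unit

    class Quantity:
        def __init__(self, value, unit):
            self.value, self.unit = value, unit      # no conversion at construction time

        def value_in(self, unit):
            # conversion happens only here, on explicit request
            return self.value * TO_SI[self.unit] / TO_SI[unit]

    separation = Quantity(1.0, "au")
    print(separation.value_in("rsun"))   # ~215 solar radii, converted only on demand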

    3. A simple implementation

The above description of MUSE is somewhat abstract. Here we present a simple example: the calculation of the orbital period of a binary star in a circular orbit. This is a straightforward astrophysical calculation that serves to illustrate the basic operation of the framework. The community code that calculates the orbital period is presented in Fig. 2.
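For reference, the physical relation evaluated by such a community code is Kepler's third law for a circular orbit,

    T = 2\pi \sqrt{\frac{a^{3}}{G\,M}},

where a is the orbital separation, M the total mass of the binary, and G the gravitational constant. In the units adopted in this example (solar masses, astronomical units, and years) the prefactor cancels and the expression reduces to T = sqrt(a³/M) yr.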

The partner requires the definitions of all interface functions in the community code, which are included from the community.h header file, presented in Fig. 3. (Note that, as mentioned earlier, in a practical implementation the communication code is actually machine generated. The code here is hand-written, for purposes of exposition.)

Fig. 3. The community header file.

The user script that initializes the required parameters (orbital separation, total mass of the binary) and computes the orbital period is presented in Fig. 4. Because of the simplicity of this example we do not include unit conversion here; we tacitly assume that the total binary mass is in units of the mass of the Sun, and the orbital separation in astronomical units (the average orbital separation of the Earth from the Sun). The output orbital period is expressed in years. This example illustrates how a set of units can be convenient within the implicit choices of a community code, but counterintuitive for researchers from other disciplines.
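The following sketch conveys the flavor of such a user script; the proxy is replaced by a locally defined stand-in, and the names and call signature are illustrative, not the actual interface shown in Fig. 4.

    import math

    class BinaryCodeProxy:
        """Stand-in for the generated proxy; a real proxy would forward this call over MPI."""
        def orbital_period(self, total_mass, separation):
            # Kepler's third law in units of solar masses, astronomical units, and years
            return math.sqrt(separation**3 / total_mass)

    code = BinaryCodeProxy()
    period = code.orbital_period(total_mass=2.0, separation=1.0)   # two solar masses, 1 au
    print(f"orbital period: {period:.3f} yr")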

Calls to the community code are initiated by the user script, sent to the proxy, received by the partner, and executed by the community code. The interaction between these MUSE components is illustrated in Fig. 5. The python proxy class takes care of setting up the community code, encoding the arguments, sending the message, and decoding the results. In an actual MUSE implementation the proxy is split into two parts, one to translate the arguments into a generic message object (and translate the returned message into return parameters), and one to send the message using the communication library (MPI). The source code listing in Fig. 6 presents a working example of a proxy.
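The sketch below mimics this two-part split: one piece translates a python call into a generic message object, the other hands the message to a communication channel. The channel here is a local stub standing in for the MPI library, and all names are hypothetical.

    import math

    class Message:
        """Generic message object: the function name plus its encoded arguments."""
        def __init__(self, function, arguments):
            self.function, self.arguments = function, arguments

    class StubChannel:
        """Stand-in for the MPI-based channel; it 'executes' the call locally."""
        def send_and_receive(self, message):
            if message.function == "orbital_period":
                mass, separation = message.arguments
                return math.sqrt(separation**3 / mass)   # Kepler, in (Msun, au, yr) units
            raise NotImplementedError(message.function)

    class Proxy:
        """Part 1 encodes the call into a Message; part 2 hands it to the channel."""
        def __init__(self, channel):
            self.channel = channel
        def orbital_period(self, mass, separation):
            message = Message("orbital_period", [mass, separation])   # encode
            return self.channel.send_and_receive(message)             # transmit

    print(Proxy(StubChannel()).orbital_period(2.0, 1.0))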

Messages sent via MPI are received by the partner code, which decodes them and executes the actual function calls in the community code. Subsequently it encodes and returns the results in a message to the proxy. The partner code is generally written in the same language as the community code, in this case C++. In the source code listing in Fig. 7 we present a working example of a partner.

Fig. 5 and the associated source listing in Fig. 7 might seem at first glance to be a rather complicated way to perform some rather simple message passing. However, this procedure allows us to encapsulate existing code, rendering it largely independent of the rest of the framework, allowing the implementation of a parallel internal architecture, and opening the possibility of launching a particular application on a remote computer. Also, if one of these encapsulated codes stops prematurely, the framework remains operational, and simply detects the unscheduled termination of that particular application. A complete crash of one of the community modules, however, will likely still cause the framework to break.

Fig. 4. The main user script.

Fig. 5. The MUSE framework manages the interaction between the user script (left) and the community code (right). When the script is started, process 1 spawns the community code and message-passing partner as process 2. Process 2 starts by sending a request for instructions, and then waits. At some later time the user script requests the execution of the function orbital_period(). This results in a request to the communication layer. Since process 2 already has an open request, both processes return with the confirmation that the request is satisfied. Process 1 subsequently sends a new request to return the resulting data and a confirmation that the function orbital_period() has been executed correctly. In the meantime process 2 executes the function, returning the requested data and a message to the framework once the execution is complete.

4. Advanced MUSE

The simple example described in Section 3 does not demonstrate the full potential of a MUSE framework, and specifically the wide range of possibilities that arise from coupling community codes.

Many problems in astrophysics encompass a wide variety of physics. As a practical demonstration of MUSE, we present here an open-source production environment in an astrophysics context, implemented by coupling community codes designed for gravitational dynamics, stellar evolution, hydrodynamics, and radiative transfer, the most common physical domains encountered in astrophysical simulations. We call this implementation AMUSE, the Astrophysics MUltiphysics Software Environment [22,23].

    4.1. AMUSE: a MUSE implementation for astrophysics

AMUSE¹ is a complete implementation of MUSE, with a fully functional interface including automated unit conversion, a structured data repository (see Section 2.2.3), multi-stepping via operator splitting (see Section 4.3.1), and a (limited) ability to recover from fatal errors (see Section 4.2). A wide variety of community codes is available, including multiple modules for each of the core domains listed above, and the framework is currently used in a variety of production simulations. AMUSE is designed for use by researchers in many fields of astrophysics, targeted at a wide variety of problems, but it is currently also used for educational purposes, in particular for training M.Sc. and Ph.D. students.

1 See http://amusecode.org.

The development of AMUSE sprang from our desire to simulate a multi-physics system within a single conceptual framework. The fundamental design stems from our earlier endeavor in the development of the starlab software environment [10], in which we combined stellar evolution and gravitational dynamics. The drawback of starlab was the rigid coupling between domains, which made the framework inflexible and less applicable than desired to a wider range of practical problems. The AMUSE framework provides us with a general solution to the problem, allowing us to hierarchically combine numerical solvers within a single environment to create new and more capable community modules. Examples in AMUSE include coupling a direct N-body algorithm with a hierarchical tree code, and combining a smoothed-particle hydrodynamics solver with a larger-scale grid-based hydrodynamics solver.

Fig. 6. The proxy part of the community module.

Such dynamic coupling has practical applications in resolving short-time-scale shocks using a grid-based hydrodynamics solver within a low-resolution smoothed-particle hydrodynamics environment, or in changing the evolutionary prescription of a star at run time, for example when two stars collide and the standard lookup description of their evolution must be replaced by live integration of the merger product. The validation and verification of coupled solvers remain a delicate issue. It may well be that the effective strength of the coupling can only be determined at run time, and a thorough study may be required for each application to identify the range of parameters within which the coupled solver can be applied.

Adding new community modules is straightforward. The environment runs efficiently on a single PC, on multi-core architectures, on supercomputers, and on distributed computers [21]. Given our experience with AMUSE, we are confident that it can be relatively easily adapted as the basis for a MUSE implementation in another research field, coupling a different suite of numerical solvers. The AMUSE source code is freely available from the AMUSE website.²

We have applied AMUSE to a number of interesting problems incorporating stellar evolution and gravitational dynamics [24,25], and to the combination of stellar evolution and the dynamics of stars and gas [26]. The former reference provides an explanation for the formation of the binary millisecond pulsar PSR J1903+0327; the latter explores the consequences of early mass loss from stars in young star clusters by means of stellar winds and supernovae. In Fig. 8 we present three-dimensional renderings of several snapshots of the latter calculation.

2 http://www.amusecode.org.

Fig. 7. The partner part of the community module.

    4.2. Failure recovery

Real-world simulation codes crash. Crashes are inevitable, but restarting and continuing a simulation afterwards can be a very delicate procedure, and there is a fine line between science and nonsense in the resulting data. In many cases catastrophic failure is caused by numerical instabilities that are either not understood, in which case a researcher would like to study them further, or are part of a natural phenomenon that cannot be modeled in sufficient detail by the current community code. Recognizing and understanding code failure is a time-consuming but essential part of the gritty reality of working with simulation environments.

In many cases, simulation codes are developed with the implicit assumption that a user has some level of expert knowledge, in lieu of providing user-friendly error messages, debugging assistance, and exception handling. However, for a framework like AMUSE, such an assumption may pose problems for the run-time behavior and stability of the system, as well as being confusing and frustrating for the user.

To some degree, code fatalities can be handled gracefully by the MUSE manager. The failure of a particular community module does not necessarily result in the failure of the entire framework. In response, the framework can return a meaningful error message or, more usefully, continue the simulation using another community code written for the same problem. The approach is analogous to fault-tolerant computing, in which a management process detects node failure and redirects the processes running on the failed node.

Fig. 8. Computer rendering of a simulation of a star cluster (1000 stars with a Salpeter mass function [27] distributed in a 1 pc Plummer sphere [28]) in which the mutual gravity and motion of the stars were solved self-consistently with the evolution of the stars and the hydrodynamics of the protostellar gas, together with the stellar outflow in winds and supernovae [26]. The top image presents the stars and the gas distribution at the birth of the cluster, the second image at an age of about 4 Myr, and the bottom image at about 8 Myr, just before the last residual gas is blown out by the first supernova explosion. The 3D visualization was created by Maarten van Meersbergen; the animation can be downloaded from http://www.cs.vu.nl/ibis/demos.html [21].

    4.3. Code coupling strategies

In the AMUSE framework we recognize six distinct strategies for coupling community modules. Some can be programmed easily by hand, although they can be labor intensive, while others are enabled by our implementation of the AMUSE framework. The examples presented below are drawn from the public version of AMUSE.

1. Input/Output coupling: The loosest type of coupling occurs when the result of code A generates the initial conditions for code B. For example, a Henyey stellar evolution code might generate mass, density, and temperature profiles for a subsequent 3-dimensional hydrodynamical representation of a star.

2. One-way coupling: System A interacts with (sub)system B, but the back-coupling from B to A is negligible. For example, stellar mass loss due to internal nuclear evolution is often important for the global dynamics of a star cluster, but the dynamics of the cluster usually does not affect the evolution of individual stars (except in the rare case of a physical stellar collision). A minimal sketch of this type of coupling follows the list.

3. Hierarchical coupling: Subsystems A and A′ (and possibly more) are embedded in parent system B, which affects their evolution, but the subsystems do not affect the parent or each other. For example, the evolution of cometary Oort clouds is strongly influenced by, but is irrelevant to, the larger galactic potential.

4. Serial coupling: A system evolves through distinct regimes which need different codes A, B, C, etc., applied subsequently in an interleaved way, to model them. For example, a collision between two stars in a dense star cluster may be resolved using one or more specialized hydrodynamics solvers, after which the collision product is reinserted into the gravitational dynamics code.

5. Interaction coupling: This type of coupling occurs when there is neither a clear temporal nor a spatial separation between systems A and B. For example, in the interaction between the interstellar medium and the gravitational dynamics of an embedded star cluster, both the stellar and the gas dynamics must incorporate the combined gravitational potential of the gas and the stars.

6. Intrinsic coupling: This may occur where the physics of the problem necessitates a solver that encompasses several types of physics simultaneously, and does not allow for temporal or spatial separation. An example is magnetohydrodynamics, where the gas dynamics and the electrodynamics are so tightly coupled that they cannot be separated.
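As an illustration of the one-way case (strategy 2), the sketch below feeds the mass loss computed by a toy stellar-evolution module into a toy dynamics module at every step, with no information flowing back. The classes and mass-loss rate are invented for the example and are not AMUSE modules.

    class ToyStellarEvolution:
        def __init__(self, mass):
            self.mass = mass
        def evolve(self, dt):
            self.mass *= 1.0 - 1e-3 * dt      # fractional wind mass loss per time unit

    class ToyClusterDynamics:
        def __init__(self):
            self.total_mass = 0.0
        def set_total_mass(self, mass):
            self.total_mass = mass            # dynamics sees the stellar masses...
        def evolve(self, dt):
            pass                              # ...but never feeds anything back to them

    stars = [ToyStellarEvolution(m) for m in (1.0, 5.0, 20.0)]
    dynamics = ToyClusterDynamics()
    for step in range(10):
        for star in stars:
            star.evolve(dt=1.0)
        dynamics.set_total_mass(sum(star.mass for star in stars))   # one-way data flow
        dynamics.evolve(dt=1.0)
    print("total stellar mass after mass loss:", dynamics.total_mass)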

With the exception of intrinsic coupling, all of these types can be efficiently implemented in AMUSE using single-component solvers. Most of the coupling strategies are straightforward to realize in AMUSE; the interaction and intrinsic coupling types require more care. For interaction coupling, the symplectic multi-physics time-stepping approach originally described in [29] usually works very well. For intrinsically coupled codes it may still be more efficient to write a monolithic solver in a single language, rather than adopt the method proposed here.

    4.3.1. Multi-physics time stepping by operator splitting

The relative independence of the various community modules in AMUSE allows us to combine them in the user script by calling them consecutively. This is adequate for many applications, but in other situations such rigid time stepping is known to have disastrous consequences for the stability of stiff systems, preventing convergence of non-linear phenomena [30].

In some circumstances there is no alternative to simply alternating between modules in subsequent time steps. But in others we can write down the Hamiltonian of the combined solution and integrate it iteratively, using robust numerical integration schemes. This operator-splitting approach has been demonstrated to work effectively and efficiently by [29], who adopted the Verlet-leapfrog algorithm to combine two independent gravitational N-body solvers. It has been incorporated into AMUSE for resolving interactions between gravitational and hydrodynamical community modules, and is called Bridge after the introducing paper [29].

The classical Bridge scheme [29] considers a star cluster orbiting a parent galaxy. The cluster is integrated using accurate direct summation of the gravitational forces among all stars. Interactions among the stars in the galaxy, and between galactic and cluster stars, are computed using a hierarchical tree force-evaluation method [31].

Table 1
Relative timing measurements of AMUSE in various configurations. The first column gives the number of star particles used in the simulations. Subsequent columns give the fraction of time spent in the AMUSE interface, rather than in the module(s). The relative timings in the second column were obtained using Fi as a stand-alone module for both the gravity and the SPH inside AMUSE. For the other simulations we adopted a Bridge (see Section 4.3.1) interface using two codes, for which we adopted Fi, PhiGPU, and a simple analytic tidal potential (field).

N      Fi       Fi + Fi   PhiGPU + Fi   Fi + field   PhiGPU + field
10     0.1980   0.563     0.243         0.943        0.170
50              0.129     0.109         0.885        0.308
100    0.0010   0.061     0.057         0.701        0.322
500             0.021     0.021         0.200        0.377
1000   0.0003   0.017     0.017         0.113        0.415
5000   0.0003   0.016     0.017         0.053        0.446

Fig. 9. Schematic kick-drift-kick procedure for the generalized mixed-variable symplectic method [32].

In Bridge, the Hamiltonian of the entire system is divided into two parts:

H = H_A + H_B,    (1)

where H_A is the potential energy of the gravitational interaction between galaxy particles and the star cluster (W_gc):

H_A = W_gc,    (2)

and H_B is the sum of the total kinetic energy of all particles (K_g + K_c) and the potential energies of the star cluster particles (W_c) and the galaxy (W_g):

H_B = K_g + W_g + K_c + W_c ≡ H_g + H_c.    (3)

The time evolution of any quantity f under this Hamiltonian can then be written approximately (because we have truncated the formal solution to just the second-order terms) as

f(t + Δt) ≈ e^{Δt A/2} e^{Δt B} e^{Δt A/2} f(t),    (4)

where the operators A and B are defined by Af = {f, H_A}, Bf = {f, H_B}, and {·,·} is a Poisson bracket. The evolution operator e^{Δt B} splits into two independent parts because H_B consists of two parts without cross terms. This is the familiar second-order leapfrog algorithm. The evolution can be implemented as a kick-drift-kick scheme, as illustrated in Fig. 9.
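The sketch below spells out one such kick-drift-kick step in the spirit of Eq. (4): the cross interaction (H_A) supplies the half-step kicks, and each subsystem then drifts internally (H_B) for a full step. The toy systems and their force law are placeholders, not the AMUSE Bridge implementation.

    class ToySystem:
        def __init__(self, x, v):
            self.x, self.v = x, v
        def kick(self, other, dt):
            self.v += -(self.x - other.x) * dt   # toy mutual attraction between systems
        def drift(self, dt):
            self.x += self.v * dt                # internal evolution of the subsystem

    def bridge_step(system_a, system_b, dt):
        system_a.kick(system_b, 0.5 * dt)        # e^{dt A / 2}
        system_b.kick(system_a, 0.5 * dt)
        system_a.drift(dt)                       # e^{dt B}: each subsystem evolves on its own
        system_b.drift(dt)
        system_a.kick(system_b, 0.5 * dt)        # e^{dt A / 2}
        system_b.kick(system_a, 0.5 * dt)

    a, b = ToySystem(1.0, 0.0), ToySystem(-1.0, 0.0)
    for _ in range(100):
        bridge_step(a, b, dt=0.01)
    print("positions after 100 bridge steps:", a.x, b.x)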

    4.4. Performance

The performance of the AMUSE framework depends on the codes used, the problem size, the choice of initial conditions, and the interactions among the component parts as defined in the user script. It is therefore not possible to present a general account of the overhead of the framework, or of the timing of the individual modules used. However, in order to provide some understanding of the time spent in the framework, as opposed to the community modules, we present the results of two independent performance analyses: one using a suite of 9 gravitational N-body solvers (Figs. 10 and 11), three of which we report on in more detail in Fig. 12, and an analysis of a coupled hydrodynamics and gravitational dynamics solver in several framework configurations (Table 1).

Fig. 10. Wall-clock time for 1 N-body time unit (averaged over runs of 10 N-body time units) for a selection of 9 gravitational N-body solvers in the AMUSE framework. The direct codes are represented as dashed curves, the tree codes as solid lines. The scalings for direct N-body (∝ N²) and tree algorithms (∝ N log N) are indicated with the black solid lines. The 9 production-quality codes include direct (N²) N-body solvers with a 6th-order Hermite predictor-corrector integrator (MI6 [37,38]), several 4th-order Hermite integrators (Hermite [39], ph4), a semi-symplectic code based on a Hamiltonian-splitting integrator (Huayno), and a suite of tree codes (BHTree [31], Gadget2 [40], Bonsai [33]). For ph4 we also show the GPU-accelerated version, which is realized using the Sapporo library. The performance of PhiGPU [41] is not shown because it is almost identical to that of ph4 with GPU support; the similar performance stems from the use of the Sapporo GPU library for the N-body force evaluation in both codes. Octgrav [42] is also omitted; up to about N ≈ 10⁴ it performs like Bonsai and then follows Fi. All runs were carried out on a 4-core workstation with an Intel Xeon E5620 CPU operating at 2.40 GHz and an NVIDIA G96 (Quadro FX 580) GPU, running a generic 64-bit Ubuntu Linux kernel 2.6.35-32.

    4.4.1. Mono-physics solver performance

In Fig. 10 we compare the performance of several N-body solvers in AMUSE. Some of these solvers are GPU accelerated (ph4 and Bonsai [33]), others run on multiple cores (Fi [34,35]), but most were run on a single processor even though they could have been run in parallel. Each calculation was run using standard parameters (double precision, a time step of dt = 2⁻⁶ or an adaptive time step determined by the community code, an opening angle of θ = 0.75 for the tree codes, and a softening length ∝ N^(-1/3)). The initial conditions were selected from a Plummer [28] sphere in N-body units in which all particles had the same mass. The runs were performed for 10 N-body time units [36] and include framework overhead and analysis of the data, but not the generation of the initial conditions or the spawning of the community code.

Fig. 11. Overhead (fraction of time spent in the AMUSE framework) for a number of N-body simulation codes embedded in the AMUSE framework. The line styles are as in Fig. 10.

For small N (≲10) the performance of all codes saturates, mainly due to the start-up cost of the AMUSE framework, the construction of a tree for only a few particles, or communication with the GPU (which is particularly notable for Bonsai). For large N, the performance of the direct N-body codes (dashed curves) scales ∝ N². The wall-clock time of the tree codes (solid curves) scales ∝ N log N, as expected, but with a rather wide range in start-up time for small N depending on the particulars of the implementation. The largest offset is measured for Bonsai; because of its relatively large start-up time of about 0.2 s, its timing remains roughly constant until N ≈ 10³, after which it performs considerably better than the other (tree) codes, which have a smaller offset but reach their terminal speed at relatively small N. It is in particular due to the use of the GPU that the start-up times for Bonsai (and also Octgrav, not shown) dominate until quite large N, and terminal speed is not reached until N ≈ 10⁴. These codes require large N to benefit fully from the massive parallelism of the GPU.

In Fig. 11 we present the fraction of the wall-clock time (in Fig. 10) spent in the AMUSE framework. This figure gives a different perspective from Fig. 10. The tree codes generally perform worse in terms of efficiency, mainly because of the relatively small amount of calculation time spent in the N-body engine compared to the much more expensive direct N-body codes; this is particularly notable for Fi and Bonsai. Eventually, each code reaches a terminal efficiency, the value of which depends on the details of the implementation. Direct N-body codes tend to converge to an average overhead of ∼0.1% of the runtime, whereas tree codes reach something like ∼1% for the fastest code (Bonsai). The particular shape of the various curves depends on the scaling with N of the work done by the community code and the framework. We can draw a distinction between generating the initial conditions (O(N)), starting up the community module (


running a tree code. The analysis and commit parts still have the same scaling with N, and for large N less than 1% of the wall-clock time is spent in the framework. The difference between the GPU and CPU versions of the direct code is most evident in committing the data to the code, which is relatively slow for the GPU-enabled code, and particularly severe for small N.

4.4.2. Multi-physics performance

One of the main advantages of AMUSE is the possibility of running different codes concurrently, with interactions among them, as discussed in Section 4.3.1. To explore the additional overhead generated in such cases, we present in Table 1 timings for a series of simulations combining gravitational dynamics and hydrodynamics. The timings listed in the table are determined using three different codes: the treeSPH code Fi [34,35], the direct gravitational N-body code PhiGPU [41], and an analytic static background potential (identified as field in the top row of the table). The treeSPH code is implemented in FORTRAN90, the direct gravitational N-body code in FORTRAN77, and the analytic tidal field in python.

The simulations were performed for N stars, each of one solar mass, in a cluster with a Plummer [28] initial density profile and a characteristic radius of 1 pc. We simulated the orbits of cluster stars within a cloud of gas having 9 times as much mass as in stars, and the same initial density distribution. The gas content was simulated using 10 N gas particles. To treat the direct gravitational N-body solver PhiGPU in the same way as the tree code Fi for solving the inter-particle gravity, we adopted a softening length of 0.01 pc for the stars. When employing Fi as an SPH code we adopted a smoothing length of 0.05 pc for the gas particles. The N-body time step was 0.01 N-body time units [36], and the interactions between the stellar- and fluid-dynamical codes were resolved using a Bridge time step of 0.125 N-body time units (see Section 4.3.1; for more details about the simulation see [26]).

The first column in Table 1 gives the number of stars in the simulation. Subsequent columns give the fraction of the wall-clock time spent in the AMUSE framework relative to the time spent in the two production codes. In the second column, we used Fi for the stars as well as for the gas particles. No Bridge was used in this case; rather, the stars were implemented as massive dark particles in the SPH code. For the simulations presented in the third column we used Bridge to couple the massive dark particles and the SPH particles, but both the gas dynamics and the gravitational stellar dynamics were again calculated using Fi as a gravitational tree code. Not surprisingly, the AMUSE overhead using Bridge is larger than when all calculations are performed entirely within Fi (compare the second and third columns in Table 1), but for N ≳ 500 the overhead in AMUSE becomes negligible. For N = 1000 the simulation lasted 769.1 s when using only Fi, whereas when we adopted Bridge the wall-clock time was 897.7 s. In production simulations the number of particles will generally greatly exceed 1000, and we consider the additional cost associated with the flexibility of using Bridge time well spent.

For small N (≲ 50), a considerable fraction (up to 83%) of the wall-clock time is spent in the AMUSE framework. This fraction is high, particularly for the combination of PhiGPU and the analytic field, because of the small wall-clock time (less than 10 s) of these simulations. The complexity of both codes, O(N^2) for PhiGPU and O(N) for the analytic external field, introduces a relatively large overhead in the python layer compared to the gravity solver, in particular since the gravitational interactions are calculated using a graphical processing unit, which for a relatively small number of particles introduces a rather severe communication bottleneck. This combination of unfavorable circumstances causes even the N = 5000 runs with PhiGPU and the tidal field to spend more than 50% of the total wall-clock time in the AMUSE framework. However, we do not consider this a serious drawback, since such calculations are generally only performed for test purposes. In this case it demonstrates that AMUSE has a range of parameters within which it is efficient, but that there is also a regime for which AMUSE does not provide optimal performance.

    5. Discussion

We have described MUSE, a general software framework incorporating a broad palette of domain-specific solvers, within which scientists can perform detailed simulations using different component solvers, and AMUSE, an astrophysical implementation of the concept. The design goal of the environment is that it be easy to learn and use, and enable combinations and comparisons of different numerical solvers without the need to make major changes in the underlying codes.

One of the hardest, and only partly solved, problems in (A)MUSE is a general way to convert data structures from one type to another. In some cases such conversion can be realized by simply changing the units, or reorganizing a grid, but in more complex cases, for example, converting particle distributions to a tessellated grid, and vice versa, the solution is less clear. Standard procedures exist for some such conversions, but it is generally not guaranteed that the solution is unique.
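As an illustration of one such standard but non-unique procedure, the sketch below deposits a particle distribution onto a regular density grid with cloud-in-cell weights. This is a generic numpy example, not the conversion machinery of AMUSE itself; replacing the assignment kernel (nearest grid point, cloud-in-cell, or an SPH kernel) yields an equally valid but different grid, which is the non-uniqueness referred to above.

    import numpy as np

    def deposit_to_grid(positions, masses, box_size, n_cells):
        # Map particles onto a periodic density grid using
        # cloud-in-cell (CIC) weights.
        cell = box_size / n_cells
        density = np.zeros((n_cells,) * 3)
        x = positions / cell - 0.5      # fractional cell coordinates
        base = np.floor(x).astype(int)
        frac = x - base                 # weight toward the upper neighbor
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    weight = (np.abs(1 - dx - frac[:, 0]) *
                              np.abs(1 - dy - frac[:, 1]) *
                              np.abs(1 - dz - frac[:, 2]))
                    idx = (base + [dx, dy, dz]) % n_cells
                    np.add.at(density, (idx[:, 0], idx[:, 1], idx[:, 2]),
                              masses * weight)
        return density / cell**3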

Some drawbacks of realizing the inter-community module communication using MPI are the relatively slow response of spawning processes, the limited flexibility of the communication interface, and the replication of large data sets in multiple realizations of the same community module. Although it is straightforward to spawn multiple instances of the same code, these processes do not (by design) share data storage, aside from the data repository in the manager. For codes that require large data storage, for example look-up tables for opacities or other common physical constants, the data are reproduced as many times as the process is spawned. This limitation may be overcome by the use of shared data structures on multi-core machines, or by allowing inter-community module communication via tunneling, where data move directly between community modules, rather than being channeled through the manager.
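The spawning overhead mentioned above is a property of MPI dynamic process creation itself. The fragment below is a generic mpi4py illustration of that mechanism, not the actual AMUSE channel implementation; the worker script "worker.py" is hypothetical, and only the spawn call and its timing are of interest here.

    import sys
    import time
    from mpi4py import MPI

    start = time.time()
    # Spawn a single worker process running a (hypothetical) worker script.
    # Each spawned worker holds its own copy of whatever large tables it
    # loads; nothing is shared with the parent apart from explicit messages.
    intercomm = MPI.COMM_SELF.Spawn(sys.executable,
                                    args=["worker.py"], maxprocs=1)
    print("spawning one worker took %.3f s" % (time.time() - start))

    intercomm.send({"command": "stop"}, dest=0, tag=0)
    intercomm.Disconnect()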

The python programming environment is not known for speed, although this is generally not a problem in AMUSE, since little time is spent in the user script and the underlying codes are highly optimized, allowing good overall performance. Typical monolithic software environments lose performance by unnecessary calls among domains; in MUSE this is prevented by the loose coupling between community modules. We have experimented with a range of possible ways to couple the various community modules, and can roughly quantify the degree to which community modules are coupled.

MUSE, as described here, is best suited for problems in which the different physical solvers are relatively weakly coupled. Weak and strong coupling may be distinguished by the ratio of the time intervals on which different community modules are called (see also Section 4). For time step ratios ≲ 1/10, the AMUSE implementation gives excellent performance, but if the ratio approaches unity the MUSE approach becomes progressively more expensive. It is, however, not clear to what extent a monolithic solver will give a better performance in these strongly coupled cases. The upshot is that there is a range of problems for which our implementation works well, but there are surely other interesting multi-physics problems in astrophysics and elsewhere for which this separation in scales is not optimal. However, we have experimented with more strongly coupled community modules, and have not yet found it to be a limiting factor in the system design.

    Acknowledgments

We are grateful to Jeroen Bédorf, Niels Drost, Michiko Fujii, Evghenii Gaburov, Stefan Harfst, Piet Hut, Masaki Iwasawa, Jun Makino, Maarten van Meersbergen, and Alf Whitehead for useful discussions and for contributing their codes. SPZ and SMcM thank Sverre Aarseth and Peter Eggleton for their discussion at the IAU Symposium 246 on the possible assimilation of each other's production software, which started us thinking seriously about (A)MUSE. This work was supported by NWO (grants VICI [#639.073.803], AMUSE [#614.061.608] and LGM [#612.071.503]), NOVA and the LKBF in the Netherlands, and by NSF grant AST-0708299 and NASA grant NNX07AG95G in the U.S.

    References

[1] P.P. Eggleton, MNRAS 163 (1973) 279.
[2] R.A. Gingold, J.J. Monaghan, MNRAS 181 (1977) 375.
[3] S.J. Aarseth, Multiple Time Scales, 1985, pp. 377–418.
[4] B. Fryxell, et al., ApJS 131 (2000) 273.
[5] M. Pharr, R. Fernando, GPU Gems 2 (Programming Techniques for High-Performance Graphics and General-Purpose Computation), Addison Wesley, ISBN: 0-321-33559-7, 2005.
[6] J. McCarthy, P. Abrahams, D. Edwards, S. Hart, M.I. Levin, LISP 1.5 Programmer's Manual: Object, a synonym for atomic symbol, MIT Press, 1962, p. 105.
[7] B. Kent, W. Cunningham, OOPSLA '87 workshop on Specification and Design for Object-Oriented Programming, 1987.
[8] P. Teuben, in: R.A. Shaw, H.E. Payne, J.J.E. Hayes (Eds.), Astronomical Data Analysis Software and Systems IV, in: Astronomical Society of the Pacific Conference Series, vol. 77, 1995, p. 398.
[9] J. Barnes, P. Hut, P. Teuben, Astrophysics Source Code Library, 2010, p. 10051.
[10] S.F. Portegies Zwart, S.L.W. McMillan, P. Hut, J. Makino, MNRAS 321 (2001) 199.
[11] P. Hut, S. McMillan, J. Makino, S. Portegies Zwart, Astrophysics Source Code Library, 2010, p. 10076.
[12] S.F. Portegies Zwart, F. Verbunt, A&A 309 (1996) 179.
[13] P. Hut, et al., New Astronomy 8 (2003) 337.
[14] A. Sills, et al., New Astronomy 8 (2003) 605.
[15] M.B. Davies, et al., New Astronomy 12 (2006) 201.
[16] S. Portegies Zwart, et al., New Astronomy 14 (2009) 369.
[17] E. Lusk, N. Doss, A. Skjellum, Parallel Comput. 22 (1996) 789.
[18] J. Maassen, H. Bal, in: the 16th International Symposium on High Performance Distributed Computing, New York, NY, USA, 2007, pp. 1–10.
[19] F.J. Seinstra, et al., Jungle computing: distributed supercomputing beyond clusters, grids, and clouds, 2011.
[20] H.E. Bal, et al., Computer 43 (8) (2010) 54.
[21] N. Drost, et al., ArXiv e-prints, 2012.
[22] S. Portegies Zwart, S. McMillan, I. Pelupessy, A. van Elteren, ArXiv e-prints, 2011.
[23] S. McMillan, S. Portegies Zwart, A. van Elteren, A. Whitehead, ArXiv e-prints, 2011.
[24] S. Portegies Zwart, E.P.J. van den Heuvel, J. van Leeuwen, G. Nelemans, ApJ 734 (2011) 55.
[25] M.S. Fujii, S. Portegies Zwart, Science 334 (2011) 1380.
[26] F.I. Pelupessy, S. Portegies Zwart, MNRAS 420 (2012) 1503.
[27] E.E. Salpeter, ApJ 121 (1955) 161.
[28] H.C. Plummer, MNRAS 71 (1911) 460.
[29] M. Fujii, M. Iwasawa, Y. Funato, J. Makino, Publ. Astron. Soc. Japan 59 (2007) 1095.
[30] T.E. Hull, W.H. Enright, B.M. Fellen, A.E. Sedgwick, SIAM J. Numer. Anal. 9 (4) (1972) 603.
[31] J. Barnes, P. Hut, Nature 324 (1986) 446.
[32] J. Wisdom, M. Holman, AJ 102 (1991) 1528.
[33] J. Bédorf, E. Gaburov, S. Portegies Zwart, J. Comput. Phys. 231 (2012) 2825.
[34] F.I. Pelupessy, P.P. van der Werf, V. Icke, A&A 422 (2004) 55.
[35] F.I. Pelupessy, P.P. Papadopoulos, P. van der Werf, ApJ 645 (2006) 1024.
[36] D.C. Heggie, R.D. Mathieu, in: P. Hut, S. McMillan (Eds.), The Use of Supercomputers in Stellar Dynamics, in: Lecture Notes in Physics, vol. 267, Springer-Verlag, Berlin, 1986.
[37] K. Nitadori, J. Makino, New Astronomy 13 (2008) 498.
[38] M. Iwasawa, S. An, T. Matsubayashi, Y. Funato, J. Makino, ApJL 731 (2011) L9.
[39] P. Hut, J. Makino, S. McMillan, ApJL 443 (1995) L93.
[40] V. Springel, MNRAS 364 (2005) 1105.
[41] S. Harfst, et al., New Astronomy 12 (2007) 357.
[42] E. Gaburov, J. Bédorf, S. Portegies Zwart, Procedia Computer Science 1 (2010) 1119–1127.