TVE-F 17 012 juni

Degree project (Examensarbete), 15 credits
June 2017

Bachelor's Thesis in Scientific Computing
Optimizing stochastic simulation of a neuron with parallelization

Anders Liss



Faculty of Science and Technology (Teknisk- naturvetenskaplig fakultet), UTH unit
Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0
Postal address: Box 536, 751 21 Uppsala
Telephone: 018 – 471 30 03
Fax: 018 – 471 30 00
Website: http://www.teknat.uu.se/student

Abstract

Optimizing stochastic simulation of a neuron with parallelization

Anders Liss

In order to optimize the solving of stochastic simulations of neuron channels, an attempt to parallelize the solver has been made. The implementation was unsuccessful. However, a working implementation is not impossible, and it remains a field of research with great potential for improving the performance of stochastic simulations.

ISSN: 1401-5757, TVE-F 17 012 juni
Examiner: Martin Sjödin
Subject reader: Igor Rocha
Supervisor: Stefan Engblom


Contents

1 Introduction

2 Theory
  2.1 Modeling a neuron channel
    2.1.1 Microscale: Ion channel gating process
    2.1.2 Intermediate scale: current-balance and cable equation
    2.1.3 Macroscale: extracellular field potentials
  2.2 URDME
    2.2.1 Background
    2.2.2 Implementation
    2.2.3 Ingoing arguments to URDME
  2.3 SSA - Stochastic Simulation Algorithm
    2.3.1 Calculations
    2.3.2 Algorithm
  2.4 Parallelization using OpenMP
  2.5 Random Number Generation (RNG) using GNU Scientific Library (GSL)

3 Method
  3.1 Resources
  3.2 Implementation

4 Result

5 Discussion
  5.1 Discussion of results
  5.2 Discussion of future implementations
  5.3 Discussion of project
    5.3.1 Project work - Time
    5.3.2 Project plan

6 Conclusion


1 Introduction

Using a software framework called URDME, stochastic simulations can be carried out in complicated geometries. To use URDME effectively, the simulations can be run in parallel to increase their speed. The goal of this BSc thesis is to improve, by parallelization, one of the solvers included in URDME, called "SSA". This solver is particularly useful in the modeling of neurons.

With an appropriate model, such as the neuron model, combined with a stochastic simulation framework such as URDME and its SSA solver, the components needed to make a stochastic simulation are present. Generally, the simulation of any model of a complex geometry will share the overall structure of the multiscale neuron model detailed below. The neuron model is used and detailed in this BSc thesis because the work on the SSA solver was done in conjunction with the multiscale neuron model, which served as a platform for testing the solver.

This report begins with the background of the stochastic simulation. To describe the process of taking a complex geometry and simulating a model, an example containing a neuron channel is presented. The framework used to make the stochastic simulation, URDME and the SSA solver, is then described, including a specification of the ingoing data used in the calculations. Next, the software used to optimize the stochastic simulation, OpenMP and the GNU Scientific Library random number generators, is described. With the surrounding theory made clear, the report covers the implementation of OpenMP in the SSA solver and the results that followed. Finally, the results are discussed, together with some reflection on the work done in the project and on how this implementation can be done in the future.

2 Theory

2.1 Modeling a neuron channel

The model of the neuron in this BSc thesis is from the work of P. Bauer, S. Engblom and S. Mikulovic in Multiscale modeling via split-step methods in neural firing. The model is split into layers. The inner layer, the model at its microscale, is the ion channel gating process. At this scale, the membrane of the neuron acts as a gate: when open, it allows ions to pass to and from the intracellular structure of the neuron. This is followed by the intermediate layer, the current-balance and cable equation of the neuron as a whole. At this scale, the geometry of a neuron is divided into compartments, each with different current sources acting on it. Finally, there is the extracellular layer, which models the distribution of an electric field resulting from the change in voltage inside the neuron [2].

2.1.1 Microscale: Ion channel gating process

The membrane of the ion channel reacts to a change in the voltage across the membrane. A channel of this type is called a voltage-gated ion channel. Another example is the ligand-gated ion channel, where the membrane reacts to a ligand chemical acting on it. As a result of a change in voltage or the action of a chemical on the membrane, the ion channel may open and allow ions to pass through the membrane through pathways called pores [8]. The model of this BSc thesis is focused on the voltage-gated ion channel. This gating process is modeled stochastically, with transition properties measured from experiments using the voltage or current clamp [8][2].

When modeling the gating process, the membrane of the neuron can be seen as taking on discrete states. The state of the neuronal membrane corresponds to a set of transition parameters. It also dictates the probability of the membrane changing to another state. In other words, the state dictates whether ions can pass through the membrane. As P. Bauer, S. Engblom and S. Mikulovic write in Multiscale modeling via split-step methods in neural firing: "The transition rates between the states depend on the membrane voltage and on the state itself. When these transitions take place in a microscopic environment where molecular noise is present, a continuous-time Markov chain (CTMC) is the most suitable model" [2].

The current state of the neuron can be seen as corresponding to a discrete state of a Markov chain [2]; see Figure 1.


Figure 1: A visualization of the states, and the transitions between states in the gating process.

2.1.2 Intermediate scale: current-balance and cable equation

Following the modeling of the inner workings of the ion channel, the intermediate scale models the behavior of a collection of connected neuronal "compartments" [2]. The voltage in each compartment is calculated by solving the current-balance equation with the current sources acting on that compartment; see Figure 2. The currents acting on each compartment are [2]:

• the total axial current, $I^{a}_{\mathrm{axial}}$, with respect to its connected compartments,

• the ionic current, $I^{a}_{\mathrm{ionic}}$, from the microscale ion gating process,

• the leakage current, $I^{a}_{L}$,

• a possible external current source, $I^{a}_{\mathrm{inj}}$.

The currents are summed in the current-balance equation, which is then solved for the voltage of each compartment.

Figure 2: Current sources acting on one compartment of a neuron.

2.1.3 Macroscale: extracellular field potentials

By relating the ion channel to a cable-like geometry, for example a copper wire, one can assume that Maxwell's equations describe the electrostatic environment surrounding the neuron. The resulting extracellular electric field can affect nearby neurons by acting on them as an external current source, perhaps causing nearby voltage-gated ion channels to open [2].


2.2 URDME

2.2.1 Background

Living cells such as neurons create intrinsic noise as a result of their microphysical conditions. This underlying noise results in fluctuations of the probability of a reaction happening. Stochastic simulation has emerged as an important computational tool for portraying this underlying noise, and the resulting fluctuations in a cell's probability of reaction, more accurately [9] [2]. The creators, B. Drawert, S. Engblom and A. Hellander, describe URDME in the article URDME: a modular framework for stochastic simulation of reaction-transport processes in complex geometries: "URDME uses Unstructured triangular and tetrahedral meshes to resolve general geometries, and relies on the Reaction-Diffusion Master Equation formalism to model the processes under study" [5].

2.2.2 Implementation

The URDME framework is modular. This means that the core of the program is built to accept multiple different sources of input and is compatible with different types of solvers. The logical structure of URDME is shown in Figure 3. URDME comprises three distinct layers. The first layer contains the external geometric mesh generator. The data structures provided by the generator are described in Table 1 [5].

The second layer of URDME is the Matlab interface. Here the user specifies the propensities and transition rates of the reactions in the model [5]. See Table 2 for details on which data structures it provides.

Finally, the last layer contains the solver used to perform the stochastic simulation based on the data structures provided by the two previous layers. The SSA solver uses a form of Gillespie's algorithm to obtain information about the states and reactions occurring in the model [5] [4].

Figure 3: Implementation of URDME using an external mesh generator for the geometric shape, Matlab as the interface, and finally the solvers.

2.2.3 Ingoing arguments to URDME

As previously stated, the ingoing data structures for the simulation are created in part by the externally generated mesh and in part by Matlab. The data structures created by the external mesh generator relate to the geometry of the model; see Table 1. The data structures created in Matlab relate to the stochastic simulation of the model; see Table 2.

The main goal of the data structures from the external geometric mesh generator is to provide the states and diffusion rates of each discrete state, along with the actual geometry of the object being modeled [4].

The different matrices created in Matlab supply the information about which reactions take place on the geometry and when. The stoichiometric matrix defines the effect of the chemical reactions on the state. The dependency graph indicates which reaction propensities need to be updated after a reaction has occurred [4].
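To make the roles of the stoichiometric matrix and the dependency graph concrete, the following minimal C sketch applies one reaction to the state of a single subvolume. It assumes a hypothetical two-species, two-reaction gate model and dense arrays; URDME itself passes N and G as sparse matrices, and the names apply_reaction and needs_update are illustrative, not part of URDME.

```c
#include <assert.h>

enum { MSPECIES = 2, MREACTIONS = 2 };

/* Hypothetical model: species 0 = closed channels, species 1 = open channels.
   Reaction 0 is closed -> open, reaction 1 is open -> closed.
   N is stored column-major: column r holds the state change of reaction r. */
static const int N[MSPECIES * MREACTIONS] = {
    -1, +1,   /* column 0: closed -> open */
    +1, -1    /* column 1: open -> closed */
};

/* Dense stand-in for the dependency graph: G[r][q] != 0 means that the
   propensity of reaction q must be recomputed after reaction r has fired.
   Here both reactions change both species, so every entry is set. */
static const int G[MREACTIONS][MREACTIONS] = {
    {1, 1},
    {1, 1}
};

/* Add column `reaction` of N to the state vector of one subvolume. */
void apply_reaction(int *state, int reaction)
{
    assert(reaction >= 0 && reaction < MREACTIONS);
    for (int s = 0; s < MSPECIES; ++s)
        state[s] += N[s + MSPECIES * reaction];
}

/* Return nonzero if propensity q has to be updated after `reaction` fired. */
int needs_update(int reaction, int q)
{
    assert(reaction >= 0 && reaction < MREACTIONS);
    assert(q >= 0 && q < MREACTIONS);
    return G[reaction][q];
}
```

Starting from ten closed and zero open channels, calling apply_reaction with reaction 0 moves one channel from the closed to the open count, and needs_update then tells the solver which propensities to recompute.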


Table 1: Ingoing data to solver from external geometric mesh generator. [4]

Name        Description
Ncells      Number of subvolumes.
Mspecies    Number of different states.
D           Sparse matrix, the transpose of the diffusion matrix. Each column corresponds to a subvolume and D(i,j) gives the diffusion rate from subvolume i to subvolume j.
vol         The volume of the compartments.
sd          Subdomain numbers for all subvolumes.

Table 2: Ingoing data to solver from Matlab. [4]

Name        Description
Mreactions  Number of reactions defined in the model.
N           The stoichiometric matrix. Each column corresponds to a reaction.
G           Dependency graph used to calculate reaction propensities between different states of the model, depending on the current state of the subvolumes.
u0          Initial states of the model.
tspan       Time vector containing the points in time where the state of the system is to be returned.
data        Optional. Generalized data vector for additional arguments for propensities in subvolumes.

2.3 SSA - Stochastic Simulation Algorithm

As stated in Section 2.1.1, the ion gating process in the neuron can be modeled as a Markov process. The current discrete state of a compartment of the neuron, or more generally of a cell being simulated with URDME, is associated with a set of propensities for that state to change over time. To simulate this dynamic change in each compartment of the neuron, one has to accurately simulate the Markov process taking place inside the individual compartments. This can be done using the Stochastic Simulation Algorithm, SSA [5]. In outline, the SSA solver provides stochastic information about when a reaction occurs, what kind of reaction it is, and how it affects the model as a whole [7].

2.3.1 Calculations

The SSA solver takes initial data from Matlab and the geometric mesh generator, including a stoichiometric matrix, a propensity function and a dependency graph; see Table 1 and Table 2 for the complete ingoing data [5]. The main loop of the SSA solver begins by determining the time of the next reaction and then, if a reaction occurred, which reaction it was. The change of the system resulting from the reaction is determined using the stoichiometric matrix. The dependency graph is then used to indicate which propensities need to be updated, i.e. which states have changed and therefore have different reaction propensities. Once this is done, the reaction has been recorded, the states of all the subvolumes are updated and recorded, and the algorithm can begin the next iteration of its loop.

In the case of the neuron, URDME uses the SSA solver to determine the states of the neuron channel gates throughout the entire timespan tspan. An array containing the corresponding states of the neuron is then used in the Matlab interface to calculate the current propagating through the neuron, as specified in Section 2.1.
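The step "determining which reaction" in the main loop described above can be sketched as follows. This is a minimal illustration of the selection rule in Gillespie's direct method [7], under the assumption that the propensities are held in a plain array; the function name pick_reaction is hypothetical, and the actual URDME source may organize this step differently.

```c
#include <assert.h>

/* Pick the index of the next reaction, given the propensities rrate[0..n-1],
   their sum srrate, and a uniform random number r in [0, 1).
   Reaction j is chosen with probability rrate[j] / srrate. */
int pick_reaction(const double *rrate, int n, double srrate, double r)
{
    assert(n > 0 && srrate > 0.0 && r >= 0.0 && r < 1.0);

    double threshold = r * srrate;
    double cumulative = 0.0;
    for (int j = 0; j < n; ++j) {
        cumulative += rrate[j];
        if (threshold < cumulative)
            return j;
    }

    /* Rounding can leave threshold >= cumulative; fall back to the last
       reaction that has a positive propensity. */
    for (int j = n - 1; j > 0; --j)
        if (rrate[j] > 0.0)
            return j;
    return 0;
}
```

With propensities 1.0 and 3.0, values of r below 0.25 select reaction 0 and all other values select reaction 1, so the second reaction fires three times as often on average.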


2.3.2 Algorithm

Algorithm 1 displays how the SSA solver operates. r is a random number drawn from a uniform distribution using the library function drand48().

for all subvolumes do
    calculate propensity of reactions that can occur, store in rrate;
    store sum of reaction propensities in srrate;
    for do
        step forward in time: t = t - log(1 - r) / srrate;
        determine the reaction;
        update state of the subvolume according to reaction;
        recalculate reaction propensity of subvolume to reflect new states;
    end
end

Algorithm 1: Stochastic Simulation Algorithm used in URDME
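The time update in Algorithm 1 can be read as inverse transform sampling of an exponentially distributed waiting time, which is the standard choice in Gillespie's method [7]. The short derivation below is a sketch; the symbols \lambda (for srrate) and \tau (for the time step) are notation introduced here and do not appear in the URDME source.

```latex
% Waiting time \tau until the next reaction, with total propensity \lambda = srrate:
\[
  P(\tau \le s) = 1 - e^{-\lambda s}, \qquad s \ge 0 .
\]
% Setting the distribution function equal to a uniform random number r in [0,1):
\[
  r = 1 - e^{-\lambda \tau}
  \quad\Longrightarrow\quad
  \tau = -\frac{\ln(1 - r)}{\lambda},
  \qquad
  t \leftarrow t + \tau = t - \frac{\ln(1 - r)}{\lambda} .
\]
```

Because drand48() returns values in [0, 1), the argument 1 - r is never zero, so the logarithm is always defined.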

2.4 Parallelization using OpenMP

Instead of running serially from start to finish, a parallel program is split up and run on different cores of the computer. OpenMP is one of the frameworks available for parallelization and is "... a set of compiler directives and callable runtime library routines that extend Fortran (and separately, C and C++) to express shared-memory parallelism." (L. Dagum, R. Menon. OpenMP: An Industry-Standard API for Shared Memory Programming) [3].

OpenMP works by splitting the program across threads, which are typically scheduled on different cores of the computer. OpenMP is added to existing code by inserting the directive

#pragma omp parallel.

This starts the parallelization of the program. The directive accepts a multitude of clauses that can be used to make sure the parallel threads work correctly. A clause can, for example, direct the threads to share the variables already initialized before the parallel region, or to create their own set of variables, private to each thread. Along with clauses, OpenMP supports a flexible system for opening and closing parallel regions. Specific parts of the code that need more computing power can be parallelized, after which the program returns to running serially [1]; see Figure 4.

Figure 4: Visualization of OpenMP splitting a program from one master thread into severalthreads[3].
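The shared and private clauses described above can be illustrated with a small, self-contained loop. This is a generic sketch and not code from the SSA solver; the function name sum_of_squares is hypothetical. It uses integer arithmetic so that the result does not depend on the order in which threads combine their partial sums.

```c
#include <assert.h>

/* Sum the squares 0^2 + 1^2 + ... + (n-1)^2 with an OpenMP work-sharing loop.
   If the compiler is run without OpenMP support, the pragma is ignored and
   the loop runs serially with the same result. */
long sum_of_squares(int n)
{
    assert(n >= 0);
    long total = 0;   /* combined across threads by the reduction clause */
    long square;      /* scratch value; each thread gets its own private copy */

    /* default(none) forces every variable used in the region to be listed,
       which makes accidental sharing visible at compile time. */
    #pragma omp parallel for default(none) shared(n) private(square) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        square = (long)i * i;
        total += square;
    }
    return total;
}
```

With GCC the pragma is activated by compiling with the -fopenmp flag. The parallel region ends with the loop, after which execution continues on the master thread as in Figure 4.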

2.5 Random Number Generation (RNG) using GNU Scientific Library (GSL)

One example of a common random number generator is the library function drand48(), used in the SSA solver (Section 2.3) to provide random numbers for the stochastic simulation. Although reliable in its generation of random numbers, drand48() does not keep track of its own state. Instead, it stores the state in a global variable. As a result, calls to drand48() on multiple threads will each try to read and write that state at the same time, which ultimately causes performance drops.


What is needed is a random number generator that keeps track of its own state, thus allowing each parallel thread to keep its own generator. The GSL contains RNGs where "Each instance of a generator keeps track of its own state, allowing the generators to be used in multi-threaded programs." [6]. The library comes with a wide variety of generators, based on different algorithms and systems; see Table 3.

Table 3: Relative performance of a selection of random number generators. [6]

Type of generator   10^3 integers/sec   10^3 doubles/sec
taus                1754                870
gfsr4               1613                855
mt19937             1370                769
ranlxs0             565                 571
ranlxs1             400                 405
mrg                 490                 389
ranlux              407                 297
ranlxd1             243                 254
ranlxs2             251                 253
cmrg                238                 215
ranlux389           247                 198
ranlxd2             141                 140
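The idea of a generator that owns its state can be shown without linking against GSL. The sketch below is a hand-written stand-in, not GSL code: it follows the 48-bit linear congruential recurrence that drand48() is specified to use, but keeps the state in a caller-supplied struct, so that each thread can hold its own instance. The names rng48, rng48_seed and rng48_uniform are hypothetical. With GSL, the same pattern would use one gsl_rng instance per thread.

```c
#include <assert.h>
#include <stdint.h>

/* Generator state owned by the caller instead of a hidden global variable. */
typedef struct {
    uint64_t x;   /* 48-bit state */
} rng48;

/* Seed in the style of srand48(): the seed fills the high 32 bits of the
   state and the low 16 bits are set to 0x330E. */
void rng48_seed(rng48 *g, uint32_t seed)
{
    assert(g != 0);
    g->x = ((uint64_t)seed << 16) | 0x330EULL;
}

/* Advance the state with x = (a*x + c) mod 2^48 and return a double in [0, 1). */
double rng48_uniform(rng48 *g)
{
    const uint64_t a = 0x5DEECE66DULL;
    const uint64_t c = 0xBULL;
    const uint64_t mask = (1ULL << 48) - 1;

    assert(g != 0);
    g->x = (a * g->x + c) & mask;
    return (double)g->x / 281474976710656.0;   /* divide by 2^48 */
}
```

Because no state is shared, two generators seeded identically produce the same sequence regardless of what other generators do in between, which is the property that makes per-thread generators safe to use inside a parallel region.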


3 Method

3.1 Resources

The following software was used to work on the SSA solver.

• Matlab 2016a (MATLAB and Statistics Toolbox Release 2016a, The MathWorks, Inc., Natick, Massachusetts, United States.)

• GNU Scientific Library 2.3

• OpenMP 4.5

• GCC 6.1 compiler for Linux

• URDME 1.3

3.2 Implementation

The code of the SSA solver was parallelized primarily with Matlab as the program for revising code. This was because the solver itself has to be able to compile with Matlab when finished. The process of parallelizing SSA was the following.

• Getting the compilers for OpenMP and the GNU Scientific Library (GSL) to function. At the start this work was done on a Macbook, but it was later moved to a computer running Linux.

• When both GSL and OpenMP were working, the task of making SSA a parallel solver was started.

• Random number generators from GSL were implemented in order to experiment with performance changes related to different RNGs. This work was done while the code was still running serially, not in parallel.

• OpenMP was implemented by expanding the memory allocated for vectors so that they could store data from each parallel thread, and by structuring the output of the calculations so that the returned data remained coherent and correct.

• Work on the code also included working out solutions for minor incompatibilities between OpenMP and the original code.
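One possible way to organize the output so that parallel threads cannot interfere with each other is to give every subvolume its own slice of a single output array. The sketch below shows that layout only; it is not the implementation attempted in this thesis, the function name simulate_all is hypothetical, and the value written is a placeholder for the real simulation result.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill an ncells-by-ntimes output array in which row i belongs to subvolume i.
   Each loop iteration writes only to its own row, so no two threads ever
   write to the same memory location. Returns NULL if the allocation fails;
   the caller is responsible for calling free() on the result. */
int *simulate_all(int ncells, int ntimes)
{
    assert(ncells > 0 && ntimes > 0);

    int *out = malloc((size_t)ncells * (size_t)ntimes * sizeof *out);
    if (out == NULL)
        return NULL;

    #pragma omp parallel for
    for (int i = 0; i < ncells; ++i) {
        /* Declared inside the loop body, so each thread has its own pointer. */
        int *row = out + (size_t)i * (size_t)ntimes;
        for (int k = 0; k < ntimes; ++k)
            row[k] = i * 1000 + k;   /* placeholder for the state at time k */
    }
    return out;
}
```

Allocating the whole array before the parallel region and indexing it only by the loop variable keeps all pointer arithmetic in one place, which makes out-of-bounds writes easier to rule out.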

4 Result

The work of parallelizing the SSA solver was not successful. Due to segmentation errors, Matlab crashed every time the program was run after the attempt to implement OpenMP. The actual cause of the crash is unclear, but because it was reported as a segmentation violation, the error points to the allocation of memory or to the pointers used within the parallel threads.

5 Discussion

5.1 Discussion of results

Matlab crashed every time the software was run with OpenMP implemented in the SSA code. While the implementation of OpenMP was not successful, the work with the GSL random number generators was, and it gave an insight into the performance of different random number generators.


5.2 Discussion of future implementations

While the project yielded no results, parallelizing the SSA solver remains an interesting field of work. With a parallelized SSA solver, one could move on to assess the performance of different random number generators in a parallel program. It would also further increase the performance of the model simulations that URDME already carries out.

5.3 Discussion of project

5.3.1 Project work - Time

The parallel SSA solver most likely failed because of errors in allocating memory or in pointing to incorrect memory addresses when programming in C. Looking back on the beginning of the project, more time and effort should have been spent on getting a deeper understanding of this new programming language. Further, a deeper understanding of the algorithm used in SSA would have given a better grasp of the underlying problems when parallelizing it. These are some of the reasons for the unfortunate results of this BSc thesis.

A few external causes may also have influenced the results of this BSc thesis. The first was the initial work to get a functioning compiler working on a Macbook, a process that took several weeks. By comparison, the same process took a couple of hours after the switch to Linux. In retrospect, this is probably the biggest reason for falling behind schedule, and ultimately the cause of the poor results.

5.3.2 Project plan

The project plan written for this project followed a logical form: work on code, and continually write about progress or results in the report. In retrospect, the main issue with the plan was the order of the work. Too little attention was given to the report in the beginning. The work during the starting weeks should have been focused on writing the theory and method parts of the report to completion. This would have increased the knowledge base of the project going in, while also helping to establish a solid work order to fall back on.

6 Conclusion

The implementation of OpenMP in order to make the SSA solver parallel was not successful. Although no results were produced during this project, the framework to continue the parallelization of the SSA solver is in place. With the increasing quality of the models being simulated, the use of computational power will have to be optimized in order to produce more complex simulations efficiently.


References

[1] Blaise Barney. OpenMP, June 2016.

[2] Pavol Bauer, Stefan Engblom, and Sanja Mikulovic. Multiscale modeling via split-step methods in neural firing. arXiv preprint arXiv:1611.00509, 2016.

[3] Leonardo Dagum and Ramesh Menon. OpenMP: an industry standard API for shared-memory programming. IEEE Computational Science and Engineering, 5(1):46–55, 1998.

[4] Brian Drawert, Stefan Engblom, and Andreas Hellander. URDME 1.1: User's manual. ArXiv e-prints, 2011.

[5] Brian Drawert, Stefan Engblom, and Andreas Hellander. URDME: a modular framework for stochastic simulation of reaction-transport processes in complex geometries. BMC Systems Biology, 6(1):76, 2012.

[6] M. Galassi et al. GNU Scientific Library Reference Manual - Third Edition. Network Theory Ltd., 2009.

[7] Daniel T Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of Computational Physics, 22(4):403–434, 1976.

[8] Bertil Hille et al. Ion channels of excitable membranes, volume 507. Sinauer Sunderland, MA, 2001.

[9] Mukund Thattai and Alexander Van Oudenaarden. Intrinsic noise in gene regulatory networks. Proceedings of the National Academy of Sciences, 98(15):8614–8619, 2001.
