A GPU-based Framework for Large-scale Multi-Agent Traffic Simulations

Yoshihito Sano

Graduate School of Informatics, Shizuoka University, Hamamatsu, Japan

Naoki Fukuta

Graduate School of Informatics, Shizuoka University, Hamamatsu, Japan

Abstract-To reproduce real situations faithfully, agents in a simulation should respond to dynamic environmental changes, and that processing must be computed efficiently because such simulations often become very large. In this paper, we present an approach and a basic architecture for an efficient and scalable GPGPU-based framework built on an OpenCL-based, multi-platform agent code conversion engine. We also present a prototype implementation of the framework that makes it easy to test and try implemented path planning code in various settings.

Keywords-multiagent system; traffic simulation; GPU computing

I. INTRODUCTION

Multi-agent simulation has been applied to various fields, including traffic simulation[1], crowd simulation[11], and airport evacuation simulation[6]. A good agent simulation must be able to handle fine details. For example, the airport evacuation simulation proposed in [6] models details such as emotions and human relationships, and it shows that people may fail to escape in a disaster scenario that had been considered safe for evacuation. To analyze how vehicles and people act in unusual situations, such as disasters or events, or to investigate cases in which the precise behavior of individuals may greatly affect overall behavior, agents need to respond dynamically to changes in their environment. Agents should therefore be coded to respond to such changes, and the simulation must process those responses efficiently enough to finish in a reasonable time. As an example of such a scenario, replanning by car agents in a road traffic simulation should run at a fine-grained time interval within the simulation[3].

Realizing multi-agent simulation on a large scale is another promising issue. In actual situations, traffic in even a single city is influenced by traffic to and from other connected cities. Even when traffic flowing into the city is not considered, a simulation of one city must handle millions of vehicles to reproduce the phenomena that actually occur there. For example, [4] proposed a multi-agent simulation that examines the effects of introducing lanes reserved for electric vehicles in a city; that simulation was performed using approximately 3 million agents.

Making multi-agent simulations large in scale and increasing their execution efficiency is an important issue to be investigated[12]. However, a simulation that handles hundreds of thousands of individual agents with credible behaviors in a very large and complex environment (e.g., an airport, a crowded train station, or a whole megacity) requires a massive amount of computational power.

Our goal is to improve the scalability of multi-agent simulations by accelerating the code with which agents respond to dynamic environmental changes, using modern computing infrastructure that has not commonly been regarded as a resource for this purpose. In this paper, we present a preliminary framework that allows its users to easily realize GPGPU-based large-scale simulations covering various agent models. We show that the framework can help handle the large processing demands of a multi-agent simulation by efficiently using rich computational resources such as GPUs.

II. RELATED WORK

Several works handle large-scale multi-agent simulation by changing the level of detail in agent behavior according to what the user needs to examine (e.g., [2]). Approaches to implementing such a simulation can be divided into two groups. Because of the tradeoff between the complexity and the granularity of a simulation, some simplification is often required. For example, the behavior of each car agent can be simplified so that the phenomenon of daily traffic across a whole city can be handled in the simulation. A method for extending the scale of a multi-agent simulation is also presented in [11]. It limits the space in which an agent can move to a set of linked nodes instead of considering the whole three-dimensional space, and the authors showed that this dramatically reduces the cost of calculating collisions among agents. These methods help realize large-scale multi-agent simulation by effectively simplifying the details of the simulation.

2013 Second IIAI International Conference on Advanced Applied Informatics

978-0-7695-5071-8/13 $26.00 © 2013 IEEE

DOI 10.1109/IIAI-AAI.2013.75

Moreover, a number of approaches have been proposed to process large-scale traffic simulations efficiently on massively parallel computers. In [13], the agent server IBM Zonal Agent-based Simulation Environment (ZASE), which can efficiently run thread-level parallel programs, was developed as a base for agent simulation. ZASE combines two or more agent servers accelerated by thread-level parallel execution and decomposes the agent simulation into multiple processes that can run on massively parallel computers, thereby realizing a large-scale agent simulation. However, this approach normally targets clusters of SMP-based scalar processor computers.

Cloud computing infrastructures and frameworks (e.g., Hadoop) could also be effective for improving the scalability of a simulation. However, because a huge amount of communication is needed to synchronize data among distributed processes, keeping network latency low and providing sufficient bandwidth are crucial for scalability. In addition, developers who want to run a simulation cannot always prepare a large number of computers connected by such low-latency networks. In this paper, we therefore focus initially on how a large-scale simulation can be run on a single computer, or a small number of computers, each of which has many computation cores.

GPUs (Graphics Processing Units) have been widely used for high-performance computing, especially for graphics-related operations. To utilize the rich computational resources of GPUs for non-graphics operations, GPGPU (General Purpose GPU)-based programming models have been proposed. When an agent simulation is run on a GPGPU-based computing infrastructure, the code for each agent's internal processing has to be developed with special coding techniques and detailed parameter tuning for each specific runtime environment. In this paper, we propose a framework that effectively supports these coding, verification, and parameter tuning processes and that converts the code so that it can easily run on existing simulation systems. Although our work initially targets road traffic simulation, it could be extended to generic simulations on a network represented as a graph. In simulations that reproduce phenomena caused by the traffic of cars, people, and so on, the most common unit of behavior is the movement of agents in a graph. In typical traffic simulations, an origin and a destination are given for each agent, and then the simulation engine reproduces each agent's moves toward the destination. To reproduce such moves, some kind of graph search algorithm is needed to find each agent's itinerary and path to the destination. Several approaches calculate shortest paths on large-scale graphs using GPGPU[9,8,7], and it has been reported that GPGPU performs well at computing shortest paths compared with a CPU. Because agents may replan in order to respond to dynamic environmental changes, such graph algorithms are executed very frequently. However, even when shortest path search can be computed on a GPGPU, an agent simulation that aims to reproduce more realistic traffic behavior may need search algorithms that are more intelligent and complex than shortest path search. Moreover, the algorithm to be applied may vary with the aim of the simulation and its environment. Our framework therefore enables simulation developers to develop and run complicated planning and other processing on GPGPUs.

III. PROPOSED FRAMEWORK

There are several GPGPU-based computing platforms (e.g., CUDA, ATI Stream, and OpenCL). In this paper, we initially focus on OpenCL because it is easy to learn and supports a wider range of hardware platforms. The person who wants to develop an agent simulation is not necessarily a specialist in GPGPU-based programming. We therefore created a framework with which developers can easily build and analyze agent simulations of a certain scale powered by GPGPU, and which provides a kind of instant testing environment so that they can easily analyze and tune execution speed for a given hardware configuration. When agents must respond to dynamic environmental changes in the simulation, they perform replanning, e.g., re-determining the route to the destination in a car traffic simulation. Our framework helps developers process such replanning efficiently with GPGPU acceleration.

GPUs are good at SIMD computation, which applies a single instruction to multiple data, and at running such SIMD computing threads in parallel. In GPU processing, the core program of such parallel processing is called a kernel program. Therefore, the code that applies the same instructions to multiple data in the replanning process should also be written as a kernel program. In addition, we should consider the case in which the path planning algorithm differs from agent to agent in order to express each agent's behavior. We therefore focus on improving the speed of the whole simulation, in which planning for every agent is processed in parallel on the GPU, rather than on presenting a search and planning algorithm that itself runs faster on GPUs.

In the OpenCL programming model, there are two basic types of parallel processing: data parallel processing and task parallel processing. Data parallel processing applies a single instruction on each processor to multiple data. Computation with the data parallel approach is often faster because similar computations are applied to multiple data at once. In our case, this ‘data parallel’ approach can be applied to the path planning processing for every agent.
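
As a rough illustration of the data parallel pattern described above, the following Python sketch emulates an OpenCL-style kernel: one function body runs once per work-item, and each work-item reads its own agent's origin and destination by index. The names `plan_kernel` and `run_data_parallel`, the array layout, and the use of breadth-first search are assumptions made for illustration; they are not part of the framework, where a real kernel would be written in OpenCL C and dispatched by the runtime.

```python
from collections import deque

def plan_kernel(gid, adjacency, origins, destinations, out_hops):
    """Body executed once per work-item; gid selects this agent's data."""
    start, goal = origins[gid], destinations[gid]
    # Breadth-first search: number of links on a fewest-links route.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    out_hops[gid] = dist.get(goal, -1)  # -1 marks an unreachable goal

def run_data_parallel(kernel, global_size, *args):
    """Sequential stand-in for a data parallel launch over global_size items."""
    for gid in range(global_size):
        kernel(gid, *args)

# Hypothetical toy network (directed links) and three agents.
adjacency = {0: [1, 2], 1: [3], 2: [3], 3: []}
origins = [0, 1, 3]
destinations = [3, 3, 0]
hops = [None] * len(origins)
run_data_parallel(plan_kernel, len(origins), adjacency, origins, destinations, hops)
print(hops)
```

Because every work-item writes only to its own slot of `out_hops`, the loop in `run_data_parallel` could be replaced by a truly parallel launch without changing the kernel body.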

A developer using our framework can use the OpenCL programming model within a C program and describe various kinds of computation for the agents, e.g., path planning. The road network data, the number of agents, and so on are received as arguments of specific kernel functions, and the developer can write various programs that access those data. When the kernel programs are complete, the developer registers them with the runtime platform by simply specifying their function names. The developer can then run a test simulation with the newly developed path planning code. In addition, we designed our framework so that developers can convert their code to source code that runs on another simulation platform such as MATSim[1]. To run on other simulation platforms, the code may ultimately have to be written in a different language (e.g., Java or Python), where the OpenCL programming model may not be directly usable. Therefore, our framework uses a kind of universal description language for the computation (e.g., path planning to the target) that does not use OpenCL-based notation directly. To adapt this functionality to other simulation platforms, we prepared a source code converter that uses external libraries to allow the code to use OpenCL from Java or other programming languages.
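
The idea of registering planning code by function name and then selecting it for a test simulation can be sketched as follows. This is a minimal Python analogy, not the framework's actual interface: `PLANNERS`, `register`, `run_test_simulation`, and the two toy planners are hypothetical names introduced only for this example.

```python
PLANNERS = {}  # hypothetical registry: function name -> planner

def register(func):
    """Register a planner under its own function name."""
    PLANNERS[func.__name__] = func
    return func

@register
def straight_line(origin, destination):
    # Toy planner: pretend the two nodes are directly linked.
    return [origin, destination]

@register
def stay_put(origin, destination):
    # Toy planner: the agent never leaves its origin.
    return [origin]

def run_test_simulation(planner_name, trips):
    """Look the planner up by name and plan a route for every trip."""
    planner = PLANNERS[planner_name]
    return [planner(origin, destination) for origin, destination in trips]

routes = run_test_simulation("straight_line", [(1, 4), (2, 3)])
print(routes)
```

Selecting code by name keeps the test harness independent of any particular planner, which is what allows a newly written planner to be tried without changing the surrounding simulation code.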

There are two ways to import a road network into the runtime platform. First, the developer can prepare a road network manually. Using the map data creation function of our runtime platform, the developer arranges nodes on the screen and creates map data by connecting the nodes with links. The edited map data can be exported to a file, and the stored map data can be imported into the system again. Developers can reuse the data in another simulation or use it as a base for a new map. Second, the developer can import map data used in MATSim, which is described in XML. Our runtime platform can import a road network stored in a MATSim-compatible XML file and adjust the relevant parameters if necessary. The runtime platform can also be used to extend an imported road network by arranging new nodes and connecting them with links.
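
To make the second import path more concrete, the sketch below reads a small MATSim-style network description into a node table and an adjacency list using Python's standard XML parser. The element and attribute names (`node` with `id`, `x`, `y`; `link` with `from`, `to`, `length`) follow the MATSim network file layout as we understand it; the sample data, the function `load_network`, and the in-memory representation are assumptions for illustration and do not describe how the runtime platform stores networks.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-node network written in a MATSim-style layout.
SAMPLE = """<network name="toy">
  <nodes>
    <node id="1" x="0.0" y="0.0"/>
    <node id="2" x="100.0" y="0.0"/>
    <node id="3" x="100.0" y="50.0"/>
  </nodes>
  <links>
    <link id="a" from="1" to="2" length="100.0" freespeed="13.9" capacity="1000" permlanes="1"/>
    <link id="b" from="2" to="3" length="50.0" freespeed="13.9" capacity="1000" permlanes="1"/>
  </links>
</network>"""

def load_network(xml_text):
    """Return (nodes, adjacency) parsed from a MATSim-style network string."""
    root = ET.fromstring(xml_text)
    nodes = {n.get("id"): (float(n.get("x")), float(n.get("y")))
             for n in root.iter("node")}
    adjacency = {node_id: [] for node_id in nodes}
    for link in root.iter("link"):
        # Each link is directed: it is stored only under its "from" node.
        adjacency[link.get("from")].append(
            (link.get("to"), float(link.get("length"))))
    return nodes, adjacency

nodes, adjacency = load_network(SAMPLE)
print(adjacency)
```

Attributes such as `freespeed` and `capacity` are ignored here; they are examples of the parameters that could be adjusted after import.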

Here, we consider how to improve the scalability of a multi-agent simulation that is performed on a large-scale road network. In such a case, the processing load on each GPU core can become huge, depending on the path planning algorithm used, and the GPU may not be able to process it efficiently. In some cases, results may even be corrupted because of a lack of resources (e.g., a specific kind of memory). To examine the scalability of the agent code to be run (e.g., a path planning algorithm) with respect to the size of the map that can be processed directly on the GPU, we prepared a function that temporarily expands or reduces the size of the road network. With this function, a road network can be shrunk by cutting some of its links while keeping it consistent, and it can be expanded by copying it two or more times and connecting the copies, while preventing situations in which agents cannot move between specific nodes because no links connect them. In this way, the system effectively supports tests at the scale of actual simulations.
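
One possible reading of the expand and shrink operations is sketched below in Python. It assumes that "keeping consistency" means every node must remain reachable from every other node (strong connectivity); the paper does not define the criterion, so this interpretation, the functions `expand` and `shrink`, and the choice of a single `bridge` node for joining copies are all assumptions.

```python
def reachable(adjacency, start):
    """Set of nodes reachable from start by following directed links."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def strongly_connected(adjacency):
    """True if every node can reach every other node."""
    nodes = list(adjacency)
    if not nodes:
        return True
    reverse = {n: [] for n in adjacency}
    for src, targets in adjacency.items():
        for dst in targets:
            reverse[dst].append(src)
    root = nodes[0]
    return len(reachable(adjacency, root)) == len(nodes) == len(reachable(reverse, root))

def expand(adjacency, copies, bridge):
    """Tile the network `copies` times; join consecutive copies at `bridge`."""
    big = {}
    for c in range(copies):
        for src, targets in adjacency.items():
            big[(c, src)] = [(c, dst) for dst in targets]
    for c in range(copies - 1):
        big[(c, bridge)].append((c + 1, bridge))
        big[(c + 1, bridge)].append((c, bridge))
    return big

def shrink(adjacency, links_to_cut):
    """Cut each candidate link only if all nodes stay mutually reachable."""
    result = {n: list(t) for n, t in adjacency.items()}
    for src, dst in links_to_cut:
        if dst in result[src]:
            result[src].remove(dst)
            if not strongly_connected(result):
                result[src].append(dst)  # undo: the cut would strand agents
    return result

ring = {0: [1, 2], 1: [2], 2: [0]}
doubled = expand(ring, 2, bridge=0)
smaller = shrink(ring, [(0, 2), (1, 2)])
print(len(doubled), smaller)
```

In the example, the link from node 0 to node 2 is redundant and is cut, while cutting the link from node 1 to node 2 would leave node 1 with no way out, so that cut is rolled back.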

Our goal is to help realize large-scale simulations of various phenomena in disasters and other situations in which dynamic events occur in road traffic. With this use case in mind, we prepared a function for applying dynamic changes to the road network while the simulation is running on the runtime platform. Because the time at which a specific road should disappear depends on the event or disaster being reproduced, the function allows users to set when such road connections should disappear and by what algorithm (e.g., probabilistic disconnection). This makes it easy to test various conditions under which the road network changes dynamically.
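
A scheduled, probabilistic disconnection of the kind described above might look like the following Python sketch. The event format `(time step, rule, probability)`, the functions `probabilistic_disconnect` and `run`, and the fixed random seed are assumptions chosen for illustration; the runtime platform's actual configuration mechanism is not specified in this paper.

```python
import random

def probabilistic_disconnect(adjacency, probability, rng):
    """Drop each directed link independently with the given probability."""
    removed = []
    for src in sorted(adjacency):
        kept = []
        for dst in adjacency[src]:
            if rng.random() < probability:
                removed.append((src, dst))
            else:
                kept.append(dst)
        adjacency[src] = kept
    return removed

def run(adjacency, events, steps, seed=0):
    """Advance the clock and apply each disruption event at its time step."""
    rng = random.Random(seed)  # seeded so that a scenario can be repeated
    log = {}
    for t in range(steps):
        for when, rule, probability in events:
            if when == t:
                log[t] = rule(adjacency, probability, rng)
        # Agents would move, and replan around removed links, at this point.
    return log

network = {0: [1, 2], 1: [2], 2: [0]}
log_all = run({k: list(v) for k, v in network.items()},
              [(3, probabilistic_disconnect, 1.0)], steps=5)
log_none = run({k: list(v) for k, v in network.items()},
               [(3, probabilistic_disconnect, 0.0)], steps=5)
print(log_all, log_none)
```

Passing the rule as a function makes it straightforward to swap in other disconnection algorithms while keeping the schedule unchanged.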

IV. IMPLEMENTATION

We implemented a runtime platform based on the framework proposed in the previous section. Figure 2 shows an overview of the implemented runtime platform. It can perform a simple road traffic simulation and execute an agent's internal processing (e.g., dynamic planning) using OpenCL-based code. We implemented four major path planning algorithms, Dijkstra, A*, RTA*[5], and LRTA*[10], as a sample code set for further investigation by users of our framework.
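
For readers unfamiliar with the first two algorithms, the following Python sketch shows a textbook A* search that reduces to Dijkstra's algorithm when the heuristic is zero. It is a CPU-side illustration under stated assumptions, not the OpenCL sample code of the runtime platform: the toy network `roads`, the coordinates, and the straight-line heuristic are hypothetical, and the closed-set variant used here assumes a consistent heuristic.

```python
import heapq
import math

def a_star(adjacency, start, goal, heuristic=lambda node: 0.0):
    """A* search; with the default zero heuristic it behaves like Dijkstra."""
    best = {start: 0.0}      # cheapest known cost from start to each node
    parent = {start: None}   # predecessor on that cheapest route
    frontier = [(heuristic(start), start)]
    closed = set()
    while frontier:
        _, node = heapq.heappop(frontier)
        if node in closed:
            continue
        closed.add(node)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return best[goal], path[::-1]
        for nxt, cost in adjacency[node]:
            candidate = best[node] + cost
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                parent[nxt] = node
                heapq.heappush(frontier, (candidate + heuristic(nxt), nxt))
    return float("inf"), []  # goal cannot be reached

# Hypothetical four-node network with link lengths and node coordinates.
coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (2, 1)}
roads = {"A": [("B", 1.0), ("C", 2.5)], "B": [("C", 1.0), ("D", 3.0)],
         "C": [("D", 1.0)], "D": []}

def straight_line_to_d(node):
    # Straight-line distance to D never overestimates the remaining length.
    return math.hypot(coords["D"][0] - coords[node][0],
                      coords["D"][1] - coords[node][1])

dijkstra_result = a_star(roads, "A", "D")
a_star_result = a_star(roads, "A", "D", straight_line_to_d)
print(dijkstra_result, a_star_result)
```

RTA* and LRTA* differ in that they interleave planning and movement with a bounded lookahead, so they are not covered by this sketch.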

We conducted a preliminary evaluation of parallel processing performance on the path planning algorithms to validate the potential scalability of our framework. In each planning problem, each agent is given a random origin and destination, and we measured the time required for all agents to obtain their routes under various conditions of scale, parameters, and parallelization methods that can be specified in OpenCL. To keep the evaluation simple, we used a map that consists of 22 links among 12 nodes. The experiments were run on a MacBook Pro (OS: OS X 10.8.2, CPU: 2.4 GHz Intel Core 2 Duo, compiler: gcc 4.2.1 build 5658, GPU: NVIDIA GeForce 320M, memory: 8 GB 1067 MHz DDR3).
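
The measurement procedure described above (random origin and destination per agent, total time until all agents have a route) can be outlined as a small timing harness. The Python sketch below is only an outline of that procedure: `measure`, `dummy_planner`, the agent counts, and the seed are hypothetical, and it times CPU-side Python calls rather than OpenCL kernel launches.

```python
import random
import time

def measure(planner, node_ids, agent_counts, seed=0):
    """Time how long it takes to plan routes for each number of agents."""
    rng = random.Random(seed)
    timings = {}
    for count in agent_counts:
        # Each agent receives a random origin and a random destination.
        trips = [(rng.choice(node_ids), rng.choice(node_ids))
                 for _ in range(count)]
        started = time.perf_counter()
        for origin, destination in trips:
            planner(origin, destination)
        timings[count] = time.perf_counter() - started
    return timings

def dummy_planner(origin, destination):
    # Placeholder for a real path planning routine.
    return [origin, destination]

# Twelve node identifiers, matching the size of the map used in the evaluation.
timings = measure(dummy_planner, list(range(12)), [1, 16, 256])
for count, seconds in timings.items():
    print(count, seconds)
```

Generating the trips before starting the clock keeps the random number generation out of the measured interval.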

Figure 3 shows the total processing time on the GPU (using OpenCL) for two planning algorithms, Dijkstra and A*. The horizontal axis shows the number of agents and the vertical axis shows the processing time. The graph shows four conditions: “GPU sequential path planning”, “GPU sequential all process”, “GPU parallel path planning”, and “GPU parallel all process”.

Figure 1. The outline of proposed framework

Figure 2. The overview of execution environment

“GPU sequential path planning” shows the results when path planning for all agents is restricted to a single GPU core, and “GPU sequential all process” shows the results including the whole time spent on the tasks required to issue the OpenCL-based processing for “GPU sequential path planning”. “GPU parallel path planning” shows the results when path planning for all agents is allowed to run in parallel on two or more GPU cores, and “GPU parallel all process” shows the results including the whole time spent on all tasks required to issue the OpenCL-based processing for “GPU parallel path planning”. As Fig. 3 shows, with sequential processing the processing time increased in proportion to the number of agents. With parallel processing, however, the processing times were nearly the same in each experimental setting as long as the number of agents was less than 256. From this result, we confirmed that the planning was performed in parallel by multiple GPU cores. In addition, we confirmed that RTA* and LRTA* showed similar scalability.

Figure 3. The test about parallel processing
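
The qualitative difference between the sequential and parallel conditions can be pictured with an idealized cost model in which agents are planned in waves. The Python sketch below is our illustration only: it is not fitted to the measurements, the per-plan time of 2.0 is an arbitrary unit, and the value 256 for `parallel_lanes` is chosen merely to mirror the threshold mentioned in the text rather than any measured property of a GPU.

```python
import math

def estimated_time(agents, parallel_lanes, time_per_plan, launch_overhead=0.0):
    """Idealized time when agents are planned in waves of `parallel_lanes`."""
    waves = math.ceil(agents / parallel_lanes)
    return launch_overhead + waves * time_per_plan

agent_counts = (64, 128, 256)
# One lane: every agent is planned after the previous one.
sequential = [estimated_time(n, 1, 2.0) for n in agent_counts]
# Many lanes: all agents fit in one wave, so the time stays flat.
parallel = [estimated_time(n, 256, 2.0) for n in agent_counts]
print(sequential, parallel)
```

Under this model the sequential time grows linearly with the number of agents, while the parallel time stays constant until the number of agents exceeds the number of lanes.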

Next, we examined how running performance differs when different GPUs are used. In this comparison, we used an NVIDIA GeForce 320M, an NVIDIA GeForce 8800GT, and an AMD Radeon HD 6750M as the runtime hardware.

Figure 4 compares the results obtained with these GPUs for Dijkstra and A*. The horizontal axis shows the number of agents and the vertical axis shows the processing time. In the comparison between the GeForce 320M and the GeForce 8800GT, which share the same architecture, the GeForce 8800GT consistently processed the tasks in a shorter time. This means that our runtime platform can be used to confirm differences in throughput between GPUs whose architectures are similar or the same but whose performance differs. In addition, Figure 4 shows that when the number of agents was small, the Radeon HD 6750M completed the computation in a shorter time than the NVIDIA GPUs. However, as the number of agents increased, its processing time became longer. The runtime platform can thus help reproduce phenomena such as the following: when scalability in the number of agents is evaluated, NVIDIA-based GPUs complete the processing in a more stable time than AMD-based GPUs, and the overhead of parallel processing differs between GPU architectures. This result suggests that AMD-based GPUs may require suitable tuning for better scalability. In these experiments, parameters such as the memory arrangement of the program and the maximum number of parallel threads were not tuned for the specific problems; we simply used the standard (default) parameters for each architecture. We are currently developing techniques, and implementing them in our framework, for automatically adjusting to the optimal parameter set for each GPU and simulation setting. We are also implementing functionality that carries out such auto-optimization tests and adjustments simultaneously on two or more execution environments with different kinds of GPUs.

V. CONCLUSION

In this paper, we presented a framework that helps investigate how multi-agent simulations can be scaled up, as an approach to analyzing what could happen in large-scale simulations covering the movements of people, vehicles, and other moving objects when disasters or events occur. The proposed framework can be useful for investigating scalability issues in such simulations running on GPU computing resources. We presented a preliminary case study on a multi-agent traffic simulation with dynamic changes in road conditions. For instance, agents can replan within a short simulation period in response to changes in the road network caused by disasters, traffic congestion, and other factors. We initially prepared an OpenCL-based implementation of four major planning algorithms on our runtime platform. When agents' actions are coded using OpenCL, the proposed framework can reduce the effort of analyzing the characteristics of each GPU type, parameter setting, and so on.

Future work includes evaluating the effectiveness of the proposed framework in an actual scalability improvement scenario for a specific simulation problem. Another task is to make it easy to utilize two or more GPUs in our framework. There are several possible approaches: using two or more GPUs in one computer, running the simulation on two or more computers each of which has a single GPU, and combinations of these. Because it is not easy to prepare all possible settings as actual execution environments, it could be helpful to collect key characteristics from machines with typical configurations and have the framework predict the possible performance of a specific (or optimal) configuration of equipment.

Figure 4. Comparison by multiple GPU(s)

Currently, the runtime platform can measure the execution performance of each agent program, but it has to be run on each test environment. Extending the framework so that it can be deployed in a networked environment, in order to run the necessary measurements on different computers, is also future work. These extensions will help predict the possible performance range of a specific set of computers for each specific application.

REFERENCES

[1] M. Balmer, K. Meister, M. Rieser, K. Nagel, and K. Axhausen. Agent-based simulation of travel demand: Structure and computational performance of matsim-t. In Proc. the 2nd TRB Conference on Innovations in Travel Modeling, 2008.

[2] L. Navarro, F. Flacher and V. Corruble. Dynamic level of detail for large scale agent-based urban simulations. In Proc. of 10th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS2011), pages 701–708, 2011.

[3] E. de la Hoz, I. Marsa-Maestre, M. A. Lopez-Carmona, and P. Perez. Extending matsim to allow the simulation of route coordination mechanisms. In Proc. The 1st International Workshop on Multi-Agent Smart Computing (MASmart 2011), pages 1–15, 2011.

[4] R. Kanamori, T. Morikawa, and T. Ito. Evaluation of special lanes as incentive policies for promoting electric vehicles. In Proc. The 1st International Workshop on Multi-Agent Smart Computing (MASmart 2011), pages 45–56, 2011.

[5] R. E. Korf. Real-time heuristic search. Artificial Intelligence, 42(2-3):189–211, 1990.

[6] J. Tsai, N. Fridman, E. Bowring, M. Brown, S. Epstein, G. Kaminka, S. Marsella, A. Ogden, I. Rika, A. Sheel, M. E. Taylor, X. Wang, A. Zilka, and M. Tambe. Escapes - evacuation simulation with children, authorities, parents, emotions, and social comparison. In Proc. of 10th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS2011), pages 457–464, 2011.

[7] P. Harish, V. Vineet, and P. J. Narayanan. Large Graph Algorithms for Massively Multithreaded Architectures. Technical Report IIIT/TR/2009/74, 2009.

[8] V. Vineet, P. Harish, S. Patidar, and P. J. Narayanan. Fast minimum spanning tree for large graphs on the gpu. In Proc. of the Conference on High Performance Graphics 2009, HPG ’09, pages 167–171, New York, NY, USA, 2009. ACM.

[9] P. Harish and P. J. Narayanan. Accelerating large graph algorithms on the gpu using cuda. In Proc. of the 14th international conference on High performance computing, HiPC’07, pages 197–208, Berlin, Heidelberg, 2007. Springer-Verlag.

[10] T. Ishida and M. Shimbo. Path Learning by Realtime Search. Japanese Society for Artificial Intelligence, 11(3):411–419, 1996. (in Japanese)

[11] T. Yamashita, T. Okada and I. Noda. Implementation of Simulation Environment for Control of Huge-scale Pedestrian Flow. In Joint Agent Workshop and Symposium (JAWS), 2012. (in Japanese)

[12] Y. Nakajima, S. Yamane and H. Hattori. Multi-model based Simulation Platform for Urban Traffic Simulation. In Joint Agent Workshop and Symposium (JAWS), 2010. (in Japanese)

[13] S. Kato, G. Yamamoto, H. Tai and H. Mizuta. Large-scale Traffic Simulation with Scholar SMP Supercomputing System. In Joint Agent Workshop and Symposium (JAWS), 2008. (in Japanese)
