A GPU-based Framework for Large-scale Multi-Agent Traffic Simulations
Yoshihito Sano
Graduate School of Informatics, Shizuoka University, Hamamatsu, Japan
Naoki Fukuta
Graduate School of Informatics, Shizuoka University, Hamamatsu, Japan
Abstract—To improve the reproducibility of real situations, agents should respond to dynamic environmental changes, and their behavior should be computed efficiently because such simulations often become very large. In this paper, we present an approach and a basic architecture for an efficient and scalable GPGPU-based framework that applies an OpenCL-based multi-platform agent code conversion engine. We present a prototype implementation of the framework that makes it easy to test and try implemented path planning codes in various settings.
Keywords-multiagent system; traffic simulation; GPU computing;
I. INTRODUCTION
Multi-agent simulation has been applied to various fields,
including traffic simulation[1], crowd simulation[11], and
evacuation simulation of an airport[6]. To deploy a good
agent simulation, it is important to be able to handle
details in the simulation. The agent-based evacuation
simulation of an airport proposed in [6] handles details
such as emotions and human relationships. It shows that
people may fail to escape in a disaster scenario that had
been considered a safe case for evacuation. To analyze how
vehicles and people act in unusual situations, such as
disasters or events, or to investigate cases in which the
precise behaviors of individuals greatly affect the overall
behavior, agents need to respond dynamically to changes in
their environment. Agents should therefore be coded to
respond to such changes, and the simulation must be able to
process those responses and still run in a reasonable time.
As an example scenario in which agents have to respond
dynamically to environmental changes, replanning by car
agents in a road traffic simulation should run at a
fine-grained time step in the simulation[3].
Realizing a multi-agent simulation on a large scale is one
of the promising research issues. In practice, even a
traffic simulation that covers just one city is influenced
by traffic to and from other connected cities, and even to
simulate the traffic of one city without considering the
traffic flowing into it, millions of vehicles must be
handled to reproduce the phenomena that actually happen
there. For example, a multi-agent simulation was proposed
to examine the effects of introducing lanes reserved for
electric vehicles in a city[4]. The simulation in [4] was
performed with approximately 3 million agents.
Making multi-agent simulations large scale and increasing
the efficiency of their execution is an important issue to
be investigated[12]. However, a simulation that handles
hundreds of thousands of individual agents with credible
behaviors in a very large and complex environment (e.g.,
an airport, a crowded train station, or a whole megacity)
requires a massive amount of computational power.
Our goal is to improve the scalability of multi-agent
simulations by accelerating the code with which agents
respond to dynamic environmental changes, using modern
computation infrastructures that have not been widely
recognized as a resource for this purpose. In this paper,
we present a preliminary framework that allows its users to
easily realize GPGPU-based large-scale simulations covering
various agent models. We show that the framework can help
handle the large processing demands of a multi-agent
simulation by efficiently using rich computational
resources such as GPUs.
II. RELATED WORK
Several works handle a large-scale multi-agent simulation
by changing the level of detail in the behavior of agents
according to what the user needs to examine (e.g., [2]).
Approaches to implementing such a simulation can be divided
into two groups. Because of the tradeoff between the
complexity and the granularity of the simulation, some
simplification is often required. For example, the behavior
of each car agent can be simplified to cover daily traffic
phenomena so that a whole city can be handled. A method for
extending the scale of a multi-agent simulation is also
presented in [11]. It limits the search space in which an
agent moves to a set of linked nodes instead of considering
the whole three-dimensional space. In [11], the authors
showed that this can dramatically reduce the calculation of
collisions
among agents. These methods can help realize a large-scale
multi-agent simulation by effectively simplifying the details
in an agent simulation.
2013 Second IIAI International Conference on Advanced Applied Informatics
978-0-7695-5071-8/13 $26.00 © 2013 IEEE
DOI 10.1109/IIAI-AAI.2013.75
Moreover, a number of approaches have been proposed to
realize efficient processing of large-scale traffic
simulations on massively parallel computers. In [13], the
agent server IBM Zonal Agent-based Simulation Environment
(ZASE), which can efficiently run thread-level parallel
programs, was developed as a base for agent simulations.
ZASE combines two or more agent servers accelerated by
thread-level parallel execution, and decomposes the agent
simulation into multiple processes that can run on
massively parallel computers to realize a large-scale agent
simulation. However, the approach is normally applicable to
clusters of SMP-based scalar processor computers.
To improve the scalability of a simulation, cloud
computing infrastructures and frameworks (e.g., Hadoop)
could also be effective. However, because a huge amount of
communication is necessary to synchronize data among
distributed processes, it is crucial to keep network
latency low and to provide enough bandwidth to maintain
scalability. In addition, developers who want to run a
simulation cannot always prepare a large number of
computers connected by such low-latency networks. In this
paper, we therefore focus initially on how a large-scale
simulation can be run on a single computer, or a small
number of computers, each of which has many computation
cores.
GPUs (Graphics Processing Units) have been widely used for
high-performance computing, especially for graphics-related
operations. To utilize the rich computation resources of
GPUs for non-graphics operations, GPGPU (General-Purpose
GPU) programming models have been proposed. When an agent
simulation runs on a GPGPU-based computing infrastructure,
the code for each agent's internal processing has to be
developed with special coding techniques and detailed
parameter tuning for each specific runtime environment. In
this paper, we propose a framework that supports these
coding, verification, and parameter tuning processes, and
that converts the code so that it can easily run on
existing simulation systems. Although our work initially
targets road traffic simulation, it could be extended to
generic simulation on any network represented as a graph.
When phenomena caused by the traffic of cars, humans, etc.
are reproduced in a simulation, the most common unit is the
movement of agents in a graph. In typical traffic
simulations, an origin and a destination are given for each
agent, and then the simulation engine reproduces each
agent's moves toward the destination. To reproduce such
moves, some kind of graph search algorithm is used to find
each agent's itinerary and path to the destination. There
are approaches that calculate shortest paths in a
large-scale graph using GPGPU[9,8,7], and GPGPU has been
reported to compute shortest paths well compared with a
CPU. Because agents may replan in order to respond to
dynamic environmental changes, such graph algorithms are
used very frequently. However, even when a shortest path
search in a graph can be calculated on a GPGPU, an agent
simulation that aims at reproducing more realistic traffic
behaviors may need search algorithms that are more
intelligent and complex than a shortest path search. In an
agent simulation, the algorithm to be applied may vary with
its aim and its environment. Our framework enables
simulation code developers to develop and run complicated
planning and other processing using GPGPUs.
III. PROPOSED FRAMEWORK
There are several GPGPU-based computing platforms
(e.g., CUDA, ATI Stream, and OpenCL). In this paper,
we initially focus on OpenCL because it is easy to learn
and supports a wider range of hardware platforms.
A person who wants to develop an agent simulation is not
necessarily a specialist in GPGPU-based programming. We
therefore create a framework with which developers can
easily build and analyze agent simulations of a certain
scale powered by GPGPUs, and which also lets them easily
analyze and tune execution speed for a particular hardware
configuration by providing a kind of instant testing
environment. When agents respond to dynamic environmental
changes in the simulation, they should perform replanning,
e.g., re-determining the route to the destination in a car
traffic simulation. Our framework helps developers process
such replanning efficiently using GPGPU acceleration.
A GPU is good at SIMD computation, which applies a single
instruction to multiple data, and at running such SIMD
computing threads in parallel. In GPU processing, the core
program of such parallel processing is called a kernel
program. Therefore, the code that performs the same
instructions on multiple data in the replanning process
should also be described as a kernel program. In addition,
we should consider the case in which the path planning
algorithm differs from agent to agent in order to express
each agent's behavior. Therefore, rather than presenting a
search and planning algorithm that itself runs faster on
GPUs, we focus on improving the speed of the whole
simulation, including the planning for every agent, by
processing agents in parallel on the GPU.
In the OpenCL programming model, there are two basic
types of parallel processing: data parallel processing and
task parallel processing. In data parallel processing, each
processor applies a single instruction to multiple data.
Computation with the data parallel approach is often faster
because similar computations are applied to multiple data
at once. In our case, this data parallel approach can be
applied to the path planning of every agent.
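The data parallel idea can be illustrated with a minimal CPU-side sketch in Python. It is not the OpenCL code of the framework: the toy road network, the names `plan_kernel` and `dijkstra_cost`, and the use of a thread pool in place of GPU work-items are all assumptions made for illustration. Each call to `plan_kernel` plays the role of one work-item that plans the route of the agent whose index is `gid`.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy road network: node -> list of (neighbor, link length).
GRAPH = {
    0: [(1, 1.0), (2, 4.0)],
    1: [(2, 1.0), (3, 5.0)],
    2: [(3, 1.0)],
    3: [],
}

def dijkstra_cost(graph, origin, destination):
    """Return the shortest-path cost from origin to destination, or None."""
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == destination:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, length in graph[node]:
            new_cost = cost + length
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return None

def plan_kernel(gid, origins, destinations, results):
    """Body of one emulated work-item: plan the route of agent number gid."""
    results[gid] = dijkstra_cost(GRAPH, origins[gid], destinations[gid])

origins = [0, 1, 0, 2]
destinations = [3, 3, 2, 3]
results = [None] * len(origins)

# Every agent runs the same kernel body on its own element of the data.
with ThreadPoolExecutor() as pool:
    list(pool.map(lambda gid: plan_kernel(gid, origins, destinations, results),
                  range(len(origins))))
```

Because each work-item writes only to its own slot of `results`, no synchronization between agents is needed, which is the property that makes per-agent path planning a natural fit for the data parallel model.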
A developer using our framework can use the OpenCL
programming model within a C program and describe
various kinds of computations for the agents, e.g., path
planning. The road network data, the number of agents,
etc. are received as arguments of specific kernel
functions, and the developer can write various programs
that access those data. When the kernel programs are
complete, the developer registers them with the runtime
platform simply by specifying their function names. The
developer can then run a test simulation with the newly
developed path planning, etc. In addition, we designed our
framework to let developers convert their code into source
code that can run on another simulation platform such as
MATSim[1]. To run on other simulation platforms, the code
must ultimately be written in a different language (e.g.,
Java or Python), and it may not be able to use the OpenCL
programming model directly. Therefore, our framework uses
a kind of universal description language for the
computation (e.g., path planning to the target) that does
not use OpenCL-based notation directly. To adapt this
functionality to other simulation platforms, we prepared a
source code converter that uses external libraries to allow
the code to use OpenCL from Java or other programming
languages.
There are two ways to import a road network into the
runtime platform. First, the developer can prepare road
networks manually. Using the map data creation function of
our runtime platform, the developer places nodes on the
screen and creates map data by connecting the nodes with
links. The edited map data can be exported to a file, and
the stored map data can be imported into the system again.
Developers can reuse the data in another simulation or use
it as a base for a new map. Second, the developer can
import map data used in MATSim, which is described in XML.
Our runtime platform can import a road network stored in a
MATSim-compatible XML file and adjust the relevant
parameters if necessary. The runtime platform can also be
used to extend an imported road network by placing new
nodes and connecting them with links.
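As an illustration of the second import route, the following Python sketch reads a small network written in the style of a MATSim network file into an adjacency structure. The sample document and the attribute names used here (`id`, `x`, `y`, `from`, `to`, `length`) reflect a common layout of MATSim network files and should be treated as an assumption; `load_network` is a hypothetical helper, not part of the runtime platform.

```python
import xml.etree.ElementTree as ET

# Assumed MATSim-style network; real files may carry further attributes.
SAMPLE_XML = """<network name="toy">
  <nodes>
    <node id="1" x="0.0" y="0.0"/>
    <node id="2" x="100.0" y="0.0"/>
    <node id="3" x="100.0" y="100.0"/>
  </nodes>
  <links>
    <link id="a" from="1" to="2" length="100.0" freespeed="13.9" capacity="1800" permlanes="1"/>
    <link id="b" from="2" to="3" length="100.0" freespeed="13.9" capacity="1800" permlanes="1"/>
  </links>
</network>"""

def load_network(xml_text):
    """Parse node coordinates and directed links from the XML text."""
    root = ET.fromstring(xml_text)
    nodes = {n.get("id"): (float(n.get("x")), float(n.get("y")))
             for n in root.iter("node")}
    adjacency = {node_id: [] for node_id in nodes}
    for link in root.iter("link"):
        # Each link is directed: it leaves `from` and arrives at `to`.
        adjacency[link.get("from")].append(
            (link.get("to"), float(link.get("length"))))
    return nodes, adjacency

nodes, adjacency = load_network(SAMPLE_XML)
```

The resulting `adjacency` dictionary has the same node-to-neighbors shape that a graph search routine would consume, so adjusting parameters after import amounts to editing the tuples stored per link.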
Here, we consider how to improve the scalability of a
multi-agent simulation that is performed on a large-scale
road network. In such a case, the processing load on each
GPU core can become huge, depending on the path planning
algorithm used, and the GPU might not be able to process
it efficiently. In some cases, results might even be
corrupted because of a lack of resources (e.g., a specific
kind of memory). To examine the scalability of the agent
code (e.g., a path planning algorithm) with respect to the
size of the map that can be processed directly on the GPU,
we prepared a function that temporarily expands or reduces
the size of the road network. With this function, a road
network can be shrunk by cutting some of its links while
keeping it consistent, and it can be expanded by copying it
two or more times and connecting the copies, while
preventing situations in which agents cannot move between
specific nodes because no links interconnect them. In this
way, the system effectively supports testing at the scale
of actual simulations.
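The expansion half of this function can be sketched in Python as follows. This is only one possible reading of "copying and connecting": the names `expand_network` and `reachable_from`, the choice of a single pair of bridging links between neighbouring copies, and the toy base network are assumptions, not the platform's actual procedure.

```python
def expand_network(adjacency, copies, exit_node, entry_node):
    """Tile the network `copies` times and join neighbouring copies both ways."""
    expanded = {}
    for k in range(copies):
        for node, neighbors in adjacency.items():
            # Nodes of copy k are renamed (k, node) to keep them distinct.
            expanded[(k, node)] = [(k, n) for n in neighbors]
    for k in range(copies - 1):
        # Two directed links, so agents can travel between copies in both directions.
        expanded[(k, exit_node)].append((k + 1, entry_node))
        expanded[(k + 1, entry_node)].append((k, exit_node))
    return expanded

def reachable_from(adjacency, start):
    """Return the set of nodes an agent can reach from `start`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen

base = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
big = expand_network(base, copies=3, exit_node="c", entry_node="a")
```

Running `reachable_from` from a node in the first copy and from a node in the last copy is a simple way to check the condition stated above, namely that no pair of nodes is left without a connecting route after the expansion.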
Our goal is to help realize large-scale simulations of
various phenomena in disasters and in other situations
where dynamic events occur in road traffic. With this use
case in mind, we prepared a function that changes the road
network dynamically while the simulation is running on the
runtime platform. Since the time at which a specific road
should disappear depends on the event or disaster to be
reproduced, the function lets users set when road
connections should disappear and by what algorithm (e.g.,
probabilistic disconnection). This makes it easy to test
various conditions under which the road network changes
dynamically.
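A probabilistic disconnection rule of the kind mentioned above could be sketched in Python as follows. The paper does not specify the rule, so this sketch assumes the simplest variant: at one user-chosen time, each link is closed independently with a fixed probability. The names `schedule_closures` and `open_links` are hypothetical.

```python
import random

def schedule_closures(link_ids, closing_time, probability, rng):
    """Pick each link independently with `probability` to close at `closing_time`."""
    return {link: closing_time for link in link_ids if rng.random() < probability}

def open_links(link_ids, closures, now):
    """Return the links that are still usable at simulation time `now`."""
    return [link for link in link_ids if closures.get(link, float("inf")) > now]

links = ["l1", "l2", "l3", "l4"]
# A seeded generator keeps a test scenario repeatable between runs.
closures = schedule_closures(links, closing_time=60.0, probability=0.5,
                             rng=random.Random(0))
before = open_links(links, closures, now=30.0)
after = open_links(links, closures, now=90.0)
```

Agents that replan after the closing time would then search only over the links returned by `open_links`, which is how a dynamic change of the network reaches the path planning code.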
IV. IMPLEMENTATION
We implemented a runtime platform based on the framework
proposed in the previous section. Figure 2 shows
an overview of the runtime platform.
It can perform a simple road traffic simulation and perform
an agent's internal processing (e.g., dynamic planning) using
OpenCL-based code. We implemented four major path
planning algorithms, Dijkstra, A*, RTA*[5], and LRTA*[10],
as a sample code set that users of our framework can
investigate further.
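To give a flavor of the real-time search algorithms in this sample set, the following Python sketch runs one trial of LRTA* in the textbook form attributed to Korf[5]. It is not the OpenCL sample code of the platform; the function name `lrta_star_trial`, the toy graph, and the all-zero initial heuristic are assumptions for illustration.

```python
def lrta_star_trial(graph, heuristic, start, goal, max_steps=100):
    """Run one LRTA* trial; return (visited path, learned heuristic table)."""
    h = dict(heuristic)          # learned estimates of the cost to the goal
    current, path = start, [start]
    for _ in range(max_steps):
        if current == goal:
            return path, h
        # Look one step ahead: f(s') = c(s, s') + h(s').
        candidates = [(cost + h.get(nxt, 0.0), nxt) for nxt, cost in graph[current]]
        if not candidates:
            return None, h       # dead end
        best_f, best_next = min(candidates)
        h[current] = best_f      # LRTA* stores the best f; RTA* would store the second best
        current = best_next
        path.append(current)
    return None, h               # step budget exhausted

# Hypothetical toy network: node -> list of (neighbor, link cost).
GRAPH = {
    "A": [("B", 1.0), ("C", 1.0)],
    "B": [("A", 1.0), ("D", 3.0)],
    "C": [("A", 1.0), ("D", 1.0)],
    "D": [],
}
ZERO_H = {node: 0.0 for node in GRAPH}
path, learned = lrta_star_trial(GRAPH, ZERO_H, "A", "D")
```

Unlike Dijkstra or A*, the agent commits to a move after each one-step lookahead, so the visited path may contain detours; the updated table `learned` is what lets later trials, or later replanning by the same agent, avoid them.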
We conducted a preliminary evaluation of the parallel
processing performance of the path planning algorithms
to validate the potential scalability of our framework. In
each planning problem, each agent receives a random origin
and destination, and we measured the time for the whole
processing in which all agents obtain their routes, under
various conditions of scale, parameters, and
parallelization methods that can be specified in OpenCL.
To keep the evaluation simple, we used a map that consists
of 22 links among 12 nodes. We used a MacBook Pro (OS: OS X
10.8.2, CPU: 2.4 GHz Intel Core 2 Duo, compiler: gcc 4.2.1
build 5658, GPU: NVIDIA GeForce 320M, memory: 8 GB 1067 MHz
DDR3) as the execution environment for the experiment.
Figure 3 shows the total processing time on the GPU
(using OpenCL) for two planning algorithms, Dijkstra and
A*. The horizontal axis shows the number of agents and the
vertical axis shows the processing time. The graph shows
four conditions: “GPU sequential path planning”, “GPU
sequential all process”, “GPU parallel path planning”, and
“GPU parallel all process”.
Figure 1. The outline of proposed framework
Figure 2. The overview of execution environment
“GPU sequential path planning” shows the results when the
path planning of all agents is restricted to a single GPU
core, and “GPU sequential all process” shows the results
including the whole time spent on the tasks required to
issue the OpenCL-based processing for “GPU sequential path
planning”. “GPU parallel path planning” shows the results
when the path planning of all agents is allowed to run in
parallel on two or more GPU cores. “GPU parallel all
process” shows the results including the whole time spent
on all tasks required to issue the OpenCL-based processing
for “GPU parallel path planning”. Figure 3 shows that, with
sequential processing, the processing time increased in
proportion to the number of agents. With parallel
processing, however, we observed that the processing times
were mostly the same in each experimental setting when
fewer than 256 agents were processed. From this result, we
confirmed that the planning was performed in parallel by
multiple GPU cores. In addition,
we also confirmed that RTA* and LRTA* performed
similarly in terms of scalability.
Figure 3. The test about parallel processing
Next, we examined how the running performance differs
when different GPUs are used. In this comparison, we used
an NVIDIA GeForce 320M, an NVIDIA GeForce 8800GT, and an
AMD Radeon HD 6750M as the runtime hardware.
Figure 4 compares the results on those GPUs for Dijkstra
and A*. The horizontal axis shows the number of agents and
the vertical axis shows the processing time. Between the
GeForce 320M and the GeForce 8800GT, which share the same
architecture, the GeForce 8800GT consistently processed the
tasks in a shorter time. This means that our runtime
platform can be used to confirm differences in throughput
between GPUs whose architectures are similar or the same
but whose performance differs. In addition, Figure 4 shows
that, when the number of agents is small, the Radeon HD
6750M completed the computation in a shorter time than the
NVIDIA GPUs. However, as the number of agents increased,
its processing time could become longer. The runtime
platform can thus help reproduce phenomena such as
NVIDIA-based GPUs showing more stable processing times than
AMD-based GPUs as the number of agents scales, and the
overhead of parallel processing differing between GPU
architectures. From this result, AMD-based GPUs may require
suitable tuning for better scalability. In these
experiments, none of the parameters, such as the memory
arrangement of the program and the maximum number of
parallel threads, were tuned for the specific problems.
Instead, we used the standard (default) parameters for
those architectures. Currently, we are developing
techniques (and implementing them in our framework) for
automatically adjusting to the optimal parameter set for
each GPU and simulation setting. In addition, we are also
implementing functionality that can carry out such
auto-optimization tests and adjustments simultaneously on
two or more execution settings with different kinds of
GPUs.
V. CONCLUSION
In this paper, we presented a framework that helps
investigate how multi-agent simulations can be scaled up,
as an approach to analyzing what could happen in
large-scale simulations covering the movements of people,
vehicles, and other moving objects when disasters or events
occur. The proposed framework could be useful for
investigating scalability issues of such simulations
running on GPU computing resources. We presented a
preliminary case study on a multi-agent traffic simulation
with dynamic changes in the road situation. For instance,
agents can perform replanning within a short simulation
period in consideration of changes to the road network
caused by disasters, traffic congestion, and other reasons.
We initially prepared an OpenCL-based implementation on our
runtime platform that covers four major planning
algorithms. When an agent's actions are coded using OpenCL,
the proposed framework could reduce the effort of analyzing
the characteristics of each GPU type, parameter, etc.
Future work includes evaluating the effectiveness of our
proposed framework in an actual scalability improvement
scenario on a specific simulation problem. In addition, we
plan to make it easy to utilize two or more GPUs in our
framework. There could be several approaches to utilizing
two or more GPUs. For example, using two or more GPUs in
one computer is one possible scenario, as is running the
simulation on two or more computers, each of which has a
single GPU,
and their combinations. Because it is not easy to prepare
all possible settings as actual execution environments, it
could be helpful to collect some key characteristics from
machines with typical configurations and have the framework
predict the possible performance of a specific (or optimal)
configuration of equipment.
Currently, the runtime platform can measure the execution
performance of each agent program. However, it has to be
run on each test environment. Extending the framework so
that it can be deployed in a networked environment, and run
the necessary measurements on different computers, is also
future work. Those extensions will help predict the
possible performance range of each specific application on
a specific set of computers.
Figure 4. Comparison by multiple GPU(s)
REFERENCES
[1] M. Balmer, K. Meister, M. Rieser, K. Nagel, and K. Axhausen. Agent-based simulation of travel demand: Structure and computational performance of MATSim-T. In Proc. the 2nd TRB Conference on Innovations in Travel Modeling, 2008.
[2] L. Navarro, F. Flacher, and V. Corruble. Dynamic level of detail for large scale agent-based urban simulations. In Proc. of 10th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS2011), pages 701–708, 2011.
[3] E. de la Hoz, I. Marsa-Maestre, M. A. Lopez-Carmona, and P. Perez. Extending MATSim to allow the simulation of route coordination mechanisms. In Proc. The 1st International Workshop on Multi-Agent Smart Computing (MASmart 2011), pages 1–15, 2011.
[4] R. Kanamori, T. Morikawa, and T. Ito. Evaluation of special lanes as incentive policies for promoting electric vehicles. In Proc. The 1st International Workshop on Multi-Agent Smart Computing (MASmart 2011), pages 45–56, 2011.
[5] R. E. Korf. Real-time heuristic search. Artificial Intelligence, 42(2-3):189–211, 1990.
[6] J. Tsai, N. Fridman, E. Bowring, M. Brown, S. Epstein, G. Kaminka, S. Marsella, A. Ogden, I. Rika, A. Sheel, M. E. Taylor, X. Wang, A. Zilka, and M. Tambe. Escapes - evacuation simulation with children, authorities, parents, emotions, and social comparison. In Proc. of 10th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS2011), pages 457–464, 2011.
[7] P. Harish, V. Vineet, and P. J. Narayanan. Large Graph Algorithms for Massively Multithreaded Architectures. Technical Report IIIT/TR/2009/74, 2009.
[8] V. Vineet, P. Harish, S. Patidar, and P. J. Narayanan. Fast minimum spanning tree for large graphs on the GPU. In Proc. of the Conference on High Performance Graphics 2009, HPG ’09, pages 167–171, New York, NY, USA, 2009. ACM.
[9] P. Harish and P. J. Narayanan. Accelerating large graph algorithms on the GPU using CUDA. In Proc. of the 14th International Conference on High Performance Computing, HiPC ’07, pages 197–208, Berlin, Heidelberg, 2007. Springer-Verlag.
[10] T. Ishida and M. Shimbo. Path Learning by Realtime Search. Japanese Society for Artificial Intelligence, 11(3):411–419, 1996. (in Japanese)
[11] T. Yamashita, T. Okada, and I. Noda. Implementation of Simulation Environment for Control of Huge-scale Pedestrian Flow. In Joint Agent Workshop and Symposium (JAWS), 2012. (in Japanese)
[12] Y. Nakajima, S. Yamane, and H. Hattori. Multi-model based Simulation Platform for Urban Traffic Simulation. In Joint Agent Workshop and Symposium (JAWS), 2010. (in Japanese)
[13] S. Kato, G. Yamamoto, H. Tai, and H. Mizuta. Large-scale Traffic Simulation with Scholar SMP Supercomputing System. In Joint Agent Workshop and Symposium (JAWS), 2008. (in Japanese)