8/2/2019 Padrao Spl2010 Ufpe v5
AN ENVIRONMENT FOR ENERGY CONSUMPTION ANALYSIS OF
CACHE MEMORIES IN SOC PLATFORMS
Cordeiro, F.R.; Silva-Filho, A.G.; Araujo, C.C.; Gomes, M.; Barros, E.N.S. and Lima, M.E.
Informatics Center (CIn)
Federal University of Pernambuco (UFPE)
Av. Prof. Luiz Freire s/n, Cidade Universitária, Recife/PE - Brasil
email: {frc, agsf, cca2, maag, ensb, mel}@cin.ufpe.br
ABSTRACT
The tuning of cache architectures in platforms for
embedded systems applications can dramatically reduce
energy consumption. Existing cache exploration
environments constrain the designer to analyzing cache
energy consumption on single-processor systems and,
worse, on systems based on a single processor type. This
paper presents PCacheEnergyAnalyzer, an
environment for energy consumption analysis of cache
memory on SoC platforms. The environment combines
efficient tools for static and dynamic energy consumption
analysis, the flexibility to support architecture
exploration of cache memories on platforms that are not
bound to a specific processor, and fast simulation
techniques. The proposed environment has been integrated
into the SoC modeling framework PDesigner, providing a
user-friendly graphical interface that allows the integrated
modeling and cache energy analysis of SoCs.
PCacheEnergyAnalyzer has been validated with four
applications of the Mediabench benchmark suite.
1. INTRODUCTION
Currently, the energy consumed by the memory hierarchy
can account for up to 50% of the total energy spent by
microprocessor-based architectures [1][2]. This fact
becomes more critical with the emergence of SoCs,
since a large share of integrated circuits now contains
heterogeneous processors and, often, cache memories.
Moreover, current semiconductor technologies have raised
static memory consumption from negligible to up to 30%
of the energy in CMOS circuits [14].
Many approaches do not take this fact into consideration.
Many efforts have been made to reduce energy
consumption by adjusting cache parameters to the
needs of a particular application [3][4][5][6]. However,
since the fundamental purpose of the cache subsystem is to
provide high-performance memory access, cache
optimization techniques should be driven not only
by energy savings but also by the need to avoid degrading
the application's performance.
No single combination of cache parameters (total size,
line size and associativity), also known as a cache
configuration, is ideal for all applications.
Therefore, cache subsystems have been customized
to deal with specific characteristics and to optimize their
energy consumption when running a particular application.
By adjusting the parameters of a cache memory to a
specific application, it is possible to save on average 60% of
energy consumption [3].
Nevertheless, finding a suitable cache configuration
for a specific application can be a complex task and may
take a long time in simulation and analysis. Most tools use
exhaustive or heuristic-based exploration [4][5][6] of the
possible cache configurations. These are cost-intensive
approaches that can lead to unacceptable exploration times.
The available tools for cache analysis lack the resources
designers need to perform energy consumption analysis
effectively and efficiently. Some do not take static energy
consumption into consideration. They are also inflexible:
they are normally bound to a specific processor, so designers
face difficulties when they need to analyze
caches on a platform whose processors differ from
the ones supported by the tools.
Some environments have been developed for
cache parameter exploration [4][5][7]. However, none of
them uses cache memory energy models
that consider both energy components, static and
dynamic. Silva-Filho [6] considers static and dynamic
energy components; however, that work focuses on a single
platform and is not integrated into a graphical
environment for platform analysis.
Although cache memory exploration considering energy
consumption is not a new issue in Design Space
Exploration (DSE), this work contributes a new
approach for exploring platforms with cache memory
architectures, considering energy consumption. Unlike
other approaches, whose analysis is intended for only one
processor, this paper presents an environment for energy
consumption analysis called PCacheEnergyAnalyzer. This
environment supports the
exploration of cache memory configurations in terms of
static and dynamic energy consumption. Moreover, it uses a
fast exploration strategy based on single-step simulation for
simulating multiple cache sizes simultaneously. It also
supports cache exploration on platforms not bound to a
specific processor. All these features are integrated in an
easy-to-use graphical environment in the PDesigner
framework.
The rest of this paper is structured as follows. In the
next section, we discuss some recent related work. In
Section 3 the proposed approach for a cache energy
environment is presented. In Section 4 some results are
presented comparing two different
processors and several applications using the
PCacheEnergyAnalyzer environment. Finally, in Section 5
the conclusions and future directions are discussed.
2. RELATED WORK
Some existing methods still apply exhaustive search to
find the optimal cache configuration in the design space.
However, the time required for such an exhaustive search
is often prohibitive. Platune [8] is an example of a
framework for tuning configurable System-on-Chip
(SoC) platforms that uses exhaustive search
for one-level caches and just one type of processor (a MIPS
core). It is suitable only when
there is a small number of possible configurations
[8]; for a large design space, a long exploration time
would be required. Even the use of heuristics may be
unsuitable when several long simulations are needed.
Palesi et al. [9] reduce the possible configuration space
by using a genetic algorithm and produce faster results
than the Platune approach. Zhang et al. [3] have developed
a heuristic based on the influence of each cache parameter
(cache size, line size and associativity) on the overall
energy consumption. However, the simulation mechanism
used by the previous approaches is based on the
SimpleScalar [10] and CACTI [11] tools. SimpleScalar is a
command-line microprocessor simulation tool
that reports application
performance results. The CACTI tool generates the
energy consumption per access for a given cache
configuration. In these cases, simulating different
configurations for the same application may take a long
time.
Prete et al. [12] proposed a simulation tool called
ChARM for tuning ARM-based embedded systems that
also include cache memories. This tool provides a
parametric, trace-driven simulation for tuning the system
configuration. Unlike previous approaches, it provides a
graphical interface that allows designers to configure
component parameters, evaluate
execution time, conduct a set of simulations, and analyze
the results. However, energy results are not supported by
this approach.
On the other hand, Silva-Filho [6] takes into account
static and dynamic energy consumption estimates in his
analysis with the TECH-CYCLES heuristic. This heuristic
uses the eCACTI [13] cache memory model to determine
the energy consumption of the hierarchy. eCACTI,
unlike other approaches, considers the two
energy components: static and dynamic. The static energy
component, which was negligible in previous technologies,
represents up to 30% of the
energy in CMOS circuits for recent technologies [14].
eCACTI is an up-to-date cache memory model that
was extended from the original CACTI model [11]. The
original CACTI tool does not consider the static
component of energy. Also, the transistor width of various
devices is assumed to be constant (except for wordlines)
when analyzing power and delay. Nowadays this
assumption would be incorrect [11], because the transistor
widths in actual cache designs change according to their
capacitive load. These limitations lead to significant
inaccuracies in the CACTI power estimates.
The PDesigner framework is an Eclipse-based
framework [15] that supports the modeling and
simulation of SoC and MPSoC platforms. Using this
framework, the platform designer can build the platform
graphically and generate an executable simulator.
Currently, PDesigner is a free solution and supports
modeling platforms with different components such as
processors, cache memories, memories, buses and connections.
Performance results are obtained from this approach;
however, energy results are not supported.
Table 1 makes it
evident that no existing environment combines the
flexibility to model multiple platforms with caches, the use
of an approach based on a single simulation, the capability
to estimate both dynamic and static energy consumption of
cache memories, and the possibility to explore the platform
configuration design space graphically.
Table 1. Comparison of related studies.
Columns: Multi-Platform Modeling | Single Simul. | Dynamic Consump. | Static Consump. | Graphical Exploration
Zhang - - - -
Palesi - - - -
Silva-Filho - - -
Platune - - -
SimpleScalar - - - - -
ChARM - - - -
PDesigner - - -
[Fig. 1 blocks: Analysis Flow; Library Extension; Interactive Graphical Environment; Dynamic & Static Energy Estimation; PCacheEnergyAnalyzer Plugin; Integration in PDesigner; Visual Environment Interaction]
3. PROPOSED APPROACH
In this paper, we propose a cache
energy consumption estimation tool that implements an
energy consumption analysis flow, and its integration as a
plugin in the PDesigner framework. The plugin, called
PCacheEnergyAnalyzer, provides dynamic and static
energy consumption statistics for the cache memory
components of a SoC. The plugin is also an interactive
environment that provides a user-friendly graphical
interface for cache analysis and interacts with the
platform model already provided by PDesigner.
The proposed approach is depicted in Figure 1. The first
step was the definition of a cache energy
analysis flow. To implement the flow, a
new SystemC component that generates traces of memory
accesses was created and added to the
PDesigner library. Moreover, two additional tools were
created: an interactive graphical environment that
allows the designer to control the analysis and view its results,
and a tool for dynamic and static energy consumption
estimation based on the eCACTI model. These two tools
comprise the PCacheEnergyAnalyzer plugin. The plugin
allows the designer to select a cache on the platform,
define the design space to be explored, visualize the results
in charts, select the desired cache configuration from the
chart and reflect the decision on the platform.
Finally, the updated library and the
PCacheEnergyAnalyzer plugin have been integrated into
the PDesigner framework. The result is a powerful tool
that supports the modeling of platforms and the cache
architecture exploration.
In the rest of this section, the analysis flow, its
implementation by PCacheEnergyAnalyzer and the
integration in PDesigner are explained.
Fig. 1. Proposed approach.
3.1. Cache Energy Consumption Analysis Flow
Figure 2 shows the flow used to analyze energy
consumption in cache memories. Each step is
detailed in this section.
Fig. 2. Energy consumption analysis flow.
Initially, the desired platform is graphically constructed
from a list of components available in the PDesigner
component library. System designers model the
architecture by dragging and dropping components
from the component palette. The component palette has the
following component types: processor, bus, device,
memory and cache memory. Figure 3 shows an example of
a platform composed of a MIPS processor, cache memory,
bus and main memory. The master and slave
protocol ports of the components are linked through
connections. The designer can also change the component
parameters by selecting them and using the properties view
(lower part of Figure 3).
The application is a binary code compiled for the target
processor. The designer selects the processor and
associates the binary file with the triple {processor,
memory, load address}.
To perform energy analysis on a cache memory, the designer
right-clicks on the cache component and selects the
PCacheEnergyAnalyzer option.
This option enables the exploration of energy
consumption in the cache memory component.
Once the cache component has been selected, the
designer can change the cache memory properties. In the
Properties window shown in Figure 3, the designer can
change the exploration space of the cache memory
component. This is done by defining minimum and
maximum values for each cache memory parameter. The
parameters are the following: cache size, cache line size
and associativity. For the associativity, only the
maximum value is defined.
Next, an executable simulator of the platform is
generated. The simulator performs a single simulation and
generates miss and hit statistics for the entire
configuration space defined by the designer. The result
of the simulator execution is an XML file that contains the
cache configuration ID, the cache parameters (size, line
size and associativity), the number of accesses and the miss rate.
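The exploration space defined by minimum and maximum parameter values can be sketched as follows. This is a minimal illustration, not the PDesigner implementation: it assumes that each parameter steps in powers of two and that a configuration is valid only if it holds at least one set (size >= line size x associativity). The function names are hypothetical.

```python
def powers_of_two(lo, hi):
    """Return lo, 2*lo, 4*lo, ... up to and including hi."""
    values = []
    v = lo
    while v <= hi:
        values.append(v)
        v *= 2
    return values


def exploration_space(size_min, size_max, line_min, line_max, assoc_max):
    """Enumerate (size, line_size, associativity) tuples.

    Assumption: associativity starts at 1 (direct mapped), since only a
    maximum is given for it. Configurations without at least one complete
    set are discarded as invalid.
    """
    configs = []
    for size in powers_of_two(size_min, size_max):
        for line in powers_of_two(line_min, line_max):
            for assoc in powers_of_two(1, assoc_max):
                if size >= line * assoc:
                    configs.append((size, line, assoc))
    return configs


if __name__ == "__main__":
    for cfg in exploration_space(256, 512, 16, 32, 2):
        print(cfg)
```

Other validity rules (for example, a minimum number of sets) could be added to the filter in the innermost loop.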
A single-pass simulation
technique, based on the work in [16], has been adopted. Usually,
simulations of this kind are trace-based and
require more than a single simulation run [16][17]. For
instance, the single-pass cache evaluation mechanism
proposed in [16] is 70 times faster than a simulation-based
mechanism for the ADPCM application from Mediabench.
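To illustrate the general idea of evaluating several cache sizes in one pass over a trace, the sketch below uses the classic LRU stack-distance technique for fully associative caches. It is not the algorithm of [16], which is not described here; the function name and the restriction to fully associative LRU caches are assumptions made for the example.

```python
def single_pass_lru_misses(trace, line_size, capacities):
    """Count misses for several fully associative LRU caches in one pass.

    trace       -- iterable of byte addresses
    line_size   -- cache line size in bytes
    capacities  -- cache capacities expressed in number of lines

    An access hits in a cache of capacity c exactly when the stack
    distance of the accessed block (its depth in the LRU stack) is
    smaller than c.
    """
    stack = []  # most recently used block first
    misses = {c: 0 for c in capacities}
    for addr in trace:
        block = addr // line_size
        if block in stack:
            depth = stack.index(block)
            stack.pop(depth)
        else:
            depth = None  # first reference: a miss for every capacity
        stack.insert(0, block)
        for c in capacities:
            if depth is None or depth >= c:
                misses[c] += 1
    return misses


if __name__ == "__main__":
    trace = [0, 16, 32, 0, 16, 32]
    print(single_pass_lru_misses(trace, line_size=16, capacities=[2, 4]))
```

The trace is read once, and the miss counts for all capacities are updated together, which is what makes a single simulation sufficient for the whole set of sizes.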
Fig. 3. PDesigner, Architecture Modeling, Component
Palette and Configuration Space.
The exploration space may contain cache
configurations that are invalid or that are not interesting to
the designer. After simulation, the designer can select
some or all configurations for energy analysis and define
the configuration space that contains all the desired cache
configurations through a Configuration Selection Window.
This window allows the designer to select the transistor
technology size as well as the cache configurations in the
configuration space. After the configuration space has been
defined, the energy module calculates the energy
consumption and number of cycles for each selected
configuration.
The cache memory energy consumption calculation
flow is depicted in Figure 4.
A parser receives as input the selected
Configurations Space saved in the XML file and separates
it into two sets of information. The first is the cache
parameters and technology information, which are provided to
the eCACTI tool for the calculation of dynamic and static
energy per access. The second contains the
number of misses, the number of accesses and the cache
parameters of the chosen configuration. This information,
together with the dynamic and static energy provided by
eCACTI, is used to calculate the total static and
dynamic energies consumed by the cache memory for the
application. In this step the total number of
cycles needed to run the application is also calculated.
Once these values have been calculated, another parser generates
the energy estimation results for each configuration, also as an
XML file.
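The paper does not give the equations used in this step. The sketch below shows one commonly used form of such a calculation, under stated assumptions: cycles are the sum of hit cycles and miss penalty cycles, dynamic energy is charged per access plus an extra cost per miss, and static energy is charged per cycle. All names and parameters (CacheStats, EnergyModel, e_miss_j, miss_penalty_cycles and so on) are hypothetical and do not come from eCACTI or PDesigner.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    accesses: int  # number of cache accesses from the simulation
    misses: int    # number of cache misses from the simulation


@dataclass
class EnergyModel:
    e_dyn_access_j: float     # dynamic energy per access (assumed input)
    e_static_cycle_j: float   # static energy per cycle (assumed input)
    e_miss_j: float           # extra energy per miss (assumption)
    hit_cycles: int           # cycles per access on a hit (assumption)
    miss_penalty_cycles: int  # extra cycles per miss (assumption)


def estimate(stats, model):
    """Return total cycles and dynamic, static and total energy in joules."""
    cycles = (stats.accesses * model.hit_cycles
              + stats.misses * model.miss_penalty_cycles)
    dynamic = (stats.accesses * model.e_dyn_access_j
               + stats.misses * model.e_miss_j)
    static = cycles * model.e_static_cycle_j
    return {
        "cycles": cycles,
        "dynamic_j": dynamic,
        "static_j": static,
        "total_j": dynamic + static,
    }


if __name__ == "__main__":
    stats = CacheStats(accesses=1000, misses=100)
    model = EnergyModel(1e-9, 1e-10, 2e-8, 1, 20)
    print(estimate(stats, model))
```

The numeric values in the usage example are placeholders chosen only to exercise the function; they are not measurements.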
Fig. 4. Energy Calculation Flow
A cost function, F = Energy x Cycles,
is also calculated. Minimizing this cost
function makes it possible to obtain cache
configurations close to Pareto-optimal [8]. These cache
configurations present a tradeoff between performance and
energy consumption. The configuration that has the lowest
Energy x Cycles cost is also identified.
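As a small worked example of the cost function, the sketch below computes F = Energy x Cycles for a few configurations, picks the one with the lowest cost, and also lists the Pareto-optimal points (those not dominated in both energy and cycles). The configuration names and values are made up for illustration, and the dictionary layout is an assumption.

```python
def cost(energy_j, cycles):
    """Cost function F = Energy x Cycles."""
    return energy_j * cycles


def lowest_cost(configs):
    """Return the configuration with the smallest Energy x Cycles cost."""
    return min(configs, key=lambda c: cost(c["energy_j"], c["cycles"]))


def pareto_front(configs):
    """Return the configurations not dominated in both energy and cycles."""
    front = []
    for c in configs:
        dominated = any(
            o["energy_j"] <= c["energy_j"]
            and o["cycles"] <= c["cycles"]
            and (o["energy_j"] < c["energy_j"] or o["cycles"] < c["cycles"])
            for o in configs
        )
        if not dominated:
            front.append(c)
    return front


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    configs = [
        {"name": "A", "energy_j": 0.02, "cycles": 1000},
        {"name": "B", "energy_j": 0.01, "cycles": 3000},
        {"name": "C", "energy_j": 0.03, "cycles": 1500},
    ]
    print(lowest_cost(configs)["name"])
    print([c["name"] for c in pareto_front(configs)])
```

With positive energy and cycle values, the configuration that minimizes the product is never dominated, so it always lies on the Pareto front.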
Once the energy calculation flow is concluded, the user
graphically visualizes the results of the cache energy
analysis. The energy consumption estimate for each of
the configurations in the configuration space is displayed
in an interactive chart, as depicted in Figure 5. The
chart displays the energy consumed on the y-axis and
the performance, in number of clock cycles, on the x-axis.
Each point on the chart corresponds to one of the
configurations in the configuration space.
The chart is interactive, meaning the user can select one
of the points and display information about it. There are
two types of information: the first, in the form of a tool tip,
is depicted by the rectangle in Figure 5 and contains the
number of cycles and energy consumed by the selected
configuration; the second form of presenting information is
by viewing properties, also shown in Figure 5.
[Fig. 6 chart: energy (Joules), axis from 0.0000 to 0.1200, for Timing, Rawcaudio, Rawdaudio and FFT; series: MIPS (Cost Function), MIPS (Lowest Energy), SPARC (Cost Function), SPARC (Lowest Energy)]
Fig. 5. Energy estimation interactive chart.
Here the following information is displayed: the cache
configuration parameter values, miss rate, number of
accesses, the cost value computed by the cost function,
dynamic and static energy consumption, the
total cycles required to run the application and the total
energy consumption.
The configuration with the lowest calculated cost is
shown in the interactive chart in a different color. The
user can use this configuration as a reference but
is not obliged to choose it as the final
configuration.
The user can also interact with the chart in order to view the
properties of a particular cache configuration. In this step,
the designer selects one of the configurations that meets
his/her performance/energy consumption requirements
by simply clicking on the corresponding
point in the chart. The designer then updates the
platform by replacing the parameter values of the current
cache component with those of the selected configuration. The
PCacheEnergyAnalyzer plugin makes the substitution
automatically by interacting with the PDesigner
framework.
4. RESULTS
The PCacheEnergyAnalyzer tool has been used to explore
the cache memory design space for four different
applications of the Mediabench benchmark suite [18]: fft,
timing, rawcaudio and rawdaudio.
The architecture is composed of one interconnection
structure (SimpleBus), one cache memory and one RAM
memory. The parameters of the cache memory are varied
and the exploration is performed for two different
processors (MIPS and SPARCV8) and four different
applications from the Mediabench suite [18].
The configuration space considers 50 different
configurations for each application. The selected
technology was 0.18 µm. The cache size varies from 256 to
8192 bytes; the cache line size ranges from 16 to 64 bytes;
and the associativity ranges from 1 to 4.
The energy consumption estimates and performance
have been calculated following the flow depicted in Figure
4. The results are then displayed in the energy estimation
interactive chart of Figure 5.
Fig. 6. Energy estimation for different applications.
Figure 6 summarizes the energy consumption estimates
for the cache configurations with the best cost function
and the configurations with the lowest energy consumption in
the configuration space for each application, running on the
MIPS and SPARCV8 processors.
Although these two processors have similar architectures,
their compilers and compilation optimizations differ
in some cases. The chart shows that
the MIPS processor has much lower energy
consumption than the SPARCV8 for the timing
application, and slightly higher energy consumption for the
other applications.
Additionally, the proposed approach was compared
with existing work using the basicmath_small application from the
Mibench suite [19]. SimpleScalar and
PCacheEnergyAnalyzer (PCEA) were compared in terms of
fidelity by analyzing the energy consumption for several
different cache configurations. Each point in Figure 7
represents the energy consumption for a given cache
configuration (cache size, cache line size, associativity).
Fig. 7. Normalized Energy comparison for SimpleScalar
and PCEA approaches.
Although the SimpleScalar tool does not support energy
consumption analysis, energy was calculated with an approach
based on the work of Zhang [3], using a one-level cache and the
eCACTI cache memory energy model. For simplicity of
the analysis, the data and instruction cache configurations
are assumed to be the same.
The results shown in Figure 7 indicate that both approaches
present fidelity. We believe that the precision difference
depicted in Figure 7 is due to the compilers and
compilation optimizations used.
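The paper does not define how fidelity is measured. One common reading is that two estimators have fidelity when they rank configurations in the same order, even if their absolute values differ. The sketch below computes the fraction of configuration pairs that both estimators order the same way; the metric, the function name and the sample values are assumptions made for illustration, not the authors' procedure.

```python
from itertools import combinations


def fidelity(a, b):
    """Fraction of configuration pairs ordered identically by a and b.

    a and b are sequences of energy estimates for the same configurations,
    in the same order. A result of 1.0 means the two estimators always
    agree on which configuration of a pair consumes more energy.
    """
    if len(a) != len(b):
        raise ValueError("both estimators must cover the same configurations")
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0

    def sign(x):
        return (x > 0) - (x < 0)

    agree = sum(1 for i, j in pairs
                if sign(a[i] - a[j]) == sign(b[i] - b[j]))
    return agree / len(pairs)


if __name__ == "__main__":
    # Hypothetical normalized energies for three configurations.
    print(fidelity([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```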
5. CONCLUSION
This work has presented the PCacheEnergyAnalyzer
environment for energy consumption analysis. The tool
supports cache memory energy consumption
estimation on SoC platforms. Initial studies focused
on one-level caches; however, the approach can be easily
extended to more levels. Results have shown that it is a useful tool
for helping users find interesting cache configurations
for a particular application, considering not only
performance but also the best relation between
performance and energy consumption.
PCacheEnergyAnalyzer fills the gaps of the existing
tools by simultaneously providing multiplatform support,
extensibility, dynamic and static energy consumption
estimation and a graphical environment.
6. REFERENCES
[1] H. Chang, L. Cooke, M. Hunt, G. Martin, A.J. McNelly and L. Todd, Surviving the SOC Revolution: A Guide to Platform-Based Design, Kluwer Academic Publishers, 1st ed., 1999.
[2] A. Malik, B. Moyer and D. Cermak, A Low Power Unified Cache Architecture Providing Power and Performance Flexibility, Int. Symp. on Low Power Electronics and Design, June 2000, pp. 241-243.
[3] C. Zhang, F. Vahid, Cache configuration exploration on prototyping platforms, 14th IEEE International Workshop on Rapid System Prototyping (June 2003), vol. 00, p. 164.
[4] A. Gordon-Ross, F. Vahid, N. Dutt, Automatic Tuning of Two-Level Caches to Embedded Applications, DATE, pp. 208-213 (Feb 2004).
[5] A. Gordon-Ross, et al., Fast Configurable-Cache Tuning with a Unified Second-Level Cache, ISLPED'05, 2005.
[6] A.G. Silva-Filho, F.R. Cordeiro, R.E. Sant'Anna and M.E. Lima, Heuristic for Two-Level Cache Hierarchy Exploration Considering Energy Consumption and Performance, PATMOS 2006, Montpellier, France, September 13-15, 2006, pp. 75-83.
[7] A. Halambi, et al., EXPRESSION: A language for architecture exploration through compiler/simulator retargetability, DATE, March 1999, pp. 485-491.
[8] T. Givargis, F. Vahid, Platune: A Tuning Framework for System-on-a-Chip Platforms, IEEE Trans. Computer-Aided Design, vol. 21, Nov. 2002, pp. 1-11.
[9] M. Palesi, T. Givargis, Multi-objective design space exploration using genetic algorithms, International Workshop on Hardware/Software Codesign (May 2002).
[10] D. Burger, T.M. Austin, The SimpleScalar Tool Set, Version 2.0, Computer Architecture News, vol. 25(3), June 1997, pp. 13-25.
[11] P. Shivakumar, N.P. Jouppi, Cacti 3.0: An Integrated Cache Timing, Power and Area Model, WRL Research Report 2001/2.
[12] C.A. Prete, M. Graziano, F. Lazzarini, The ChARM Tool for Tuning Embedded Systems, IEEE Micro, 1997, vol. 17, pp. 67-76.
[13] N. Dutt, M. Mamidipaka, eCACTI: An Enhanced Power Estimation Model for On-chip Caches, TR 04-28, Sept. 2004.
[14] E. Macii, et al., Energy-Aware Design of Embedded Memories: A Survey of Technologies, Architectures and Optimization Techniques, ACM Transactions on Embedded Computing Systems, vol. 2, no. 1, Feb. 2003, pp. 5-32.
[15] Eclipse, available at http://www.eclipse.org.
[16] P. Viana, et al., Cache-Analyzer: Design Space Evaluation of Configurable-Caches in a Single-Pass, International Workshop on Rapid System Prototyping, pp. 3-9, May 2007.
[17] R.A. Sugumar and S.G. Abraham, Efficient simulation of multiple cache configurations using binomial trees, CSE-TR-111-91, CSE Div., Univ. of Michigan, 1991. Available in: .
[18] Mediabench: http://cares.icsl.ucla.edu/MediaBench/, 2006.
[19] M.R. Guthaus, et al., Mibench: A free, commercially representative embedded benchmark suite, IEEE 4th Annual Workshop on Workload Characterization, pp. 1-12, Dec. 2001.