Padrao Spl2010 Ufpe v5

  • 8/2/2019 Padrao Spl2010 Ufpe v5

    AN ENVIRONMENT FOR ENERGY CONSUMPTION ANALYSIS OF

    CACHE MEMORIES IN SOC PLATFORMS

    Cordeiro, F.R.; Silva-Filho, A.G.; Araujo, C.C.; Gomes, M.; Barros, E.N.S. and Lima, M.E.

    Informatics Center (CIn)

    Federal University of Pernambuco (UFPE)

    Av. Prof. Luiz Freire s/n, Cidade Universitária, Recife/PE - Brasil

    email: {frc, agsf, cca2, maag, ensb, mel}@cin.ufpe.br

    ABSTRACT

    The tuning of cache architectures in platforms for

    embedded systems applications can dramatically reduce energy consumption. The existing cache exploration

    environments constrain the designer to analyze cache

    energy consumption on single processor systems and

    worse, on systems that are based on a single processor type. This paper presents the PCacheEnergyAnalyzer

    environment for energy consumption analysis of cache

    memory on SoC platforms. This is a powerful energy

    analysis environment that combines the use of efficient

    tools to provide static and dynamic energy consumption

    analysis, the flexibility to support the architecture

    exploration of cache memories on platforms that are not

    bound to a specific processor, and fast simulation

    techniques. The proposed environment has been integrated into the SoC modeling framework PDesigner, providing a

    user-friendly graphical interface allowing the integrated

    modeling and cache energy analysis of SoCs. The

    PCacheEnergyAnalyzer has been validated with four

    applications of the Mediabench benchmark suite.

    1. INTRODUCTION

    Currently, the energy consumed by the memory hierarchy

    can account for up to 50% of the total energy spent by

    microprocessor-based architectures [1][2]. This fact

    becomes more critical due to the emergence of SoCs, meaning a large part of the integrated circuits contains

    heterogeneous processors and often cache memories.

    Moreover, current semiconductor technologies have raised

    static memory consumption from negligible to up to 30%.

    Many approaches do not take this fact into consideration.

    Many efforts have been made to reduce the consumption of energy by adjusting cache parameters to the

    needs of a particular application [3][4][5][6]. However,

    since the fundamental purpose of the cache subsystem is to

    provide high performance for memory accessing, cache

    optimization techniques are supposed to be driven not only

    by energy savings but also by preventing degradation of the

    application's performance.

    No single combination of cache parameters (total size,

    line size and associativity), also known as cache

    configuration, would be perfect for all applications.

    Therefore cache subsystems have been customized in order

    to deal with specific characteristics and to optimize their energy consumption when running a particular application.

    By adjusting the parameters of a cache memory to a

    specific application it is possible to save on average 60% of

    energy consumption [3].

    Nevertheless, finding a suitable cache configuration

    (combination of total size, line size and associativity) for a

    specific application can be a complex task and may take a

    long time for simulation and analysis. Most of the tools use

    exhaustive or heuristic-based exploration [4][5][6] of all

    possible cache configurations. These are cost-intensive

    approaches, leading to unacceptable exploration times.

    The tools for cache analysis lack the resources designers

    need to perform an energy consumption analysis with efficacy and efficiency. Some do not take into consideration

    static energy consumption. These tools are not flexible;

    they are normally bound to a specific processor. Designers

    experience several difficulties when they need to analyze caches on a platform with processors that are different from

    the ones in the tools.

    Some environments have been developed aimed at

    cache parameters exploration [4][5][7]. However, none of

    them has taken into account cache memory energy models

    that consider the two energy components: static and

    dynamic. Silva-Filho [6] considers static and dynamic

    energy components; however, the work focuses on a single

    platform and is not integrated into a graphical interface environment for platform analysis.

    Although cache memory exploration considering energy

    consumption is not a new issue in Design Space

    Exploration (DSE), this work contributes with a new

    approach for exploiting platforms with cache memory

    architectures, considering energy consumption. Differently

    from other approaches with analysis intended for only one

    processor, this paper presents an environment for energy

    consumption analysis called PCacheEnergyAnalyzer. This

    is an environment that provides support for the

    exploration of cache memory configurations in terms of

    static and dynamic energy consumption. Moreover, it uses a

    fast exploration strategy based on single-step simulation for

    simulating multiple sizes of caches simultaneously. It also

    supports the cache exploration in platforms not bound to a

    specific processor. All these features are integrated in an

    easy-to-use graphical environment in the PDesigner Framework.

    The rest of this paper is structured as follows. In the

    next section, we discuss some recent related work. In

    Section 3 the proposed approach for a cache energy

    environment is presented. In Section 4 some results are

    presented comparing the potentialities for two different

    processors and several applications by using the

    PCacheEnergyAnalyzer environment. Finally, in Section 5

    the conclusions and future directions are discussed.

    2. RELATED WORK

    Some existing methods still apply the exhaustive search to

    find the optimal cache configuration in the design space.

    However, the time required for such an exhaustive search

    is often prohibitive. Platune [8] is an example of a

    framework for adjusting configurable System-on-Chip

    (SoC) platforms that utilizes the exhaustive search method

    for one-level caches and just one type of processor (MIPS

    core processor). It is suitable only in some cases when there are only a small number of possible configurations

    [8]. But for a large design space, a long exploration time

    would be required. Even the use of heuristics may be

    unsuitable for several long simulations.

    Palesi et al. [9] reduce the possible configuration space by using a genetic algorithm and produce faster results than the Platune approach. Zhang et al. [3] have developed

    a heuristic based on the influence of each cache parameter

    (cache size, line size and associativity) in the overall

    energy consumption. However, the simulation mechanism

    used by the previous approaches is based on the

    SimpleScalar [10] and CACTI tools [11]. SimpleScalar is a

    command-line microprocessor simulation tool that

    reports the results of the application's

    performance. The CACTI tool is intended to generate

    energy consumption per access for a given cache

    configuration. In these cases, the simulation of different

    configurations for the same application may take a long period.

    Prete et al. [12] proposed the simulation tool called

    ChARM for tuning ARM-based embedded systems that

    also include cache memories. This tool provides a

    parametric, trace-driven simulation for tuning system configuration. Unlike previous approaches, it provides a

    graphical interface that allows designers to configure the

    parameters of the components, evaluate

    execution time, conduct a set of simulations, and analyze

    the results. However, energy results are not supported by

    this approach.

    On the other hand, Silva-Filho [6] takes into account

    static and dynamic energy consumption estimates in his

    analysis with the TECH-CYCLES heuristic. This heuristic

    uses the eCACTI [13] cache memory model to determine

    the energy consumption of the hierarchy. eCACTI, unlike other approaches, considers the two

    energy components: static and dynamic. The static energy

    component that was negligible in previous technologies

    represents, for recent technologies, up to 30% of the

    energy in CMOS circuits [14].

    The eCACTI is an up-to-date cache memory model that

    was extended from the original CACTI model [11]. The

    original CACTI tool does not consider the static

    component of energy. Also, the transistor width of various

    devices is assumed to be constant (except for wordlines) when analyzing power and delay. Nowadays this

    assumption would be incorrect [11], because the transistor

    widths in actual cache designs change according to their capacitive load. These limitations lead to significant inaccuracies in

    the CACTI power estimates.

    The PDesigner framework is an Eclipse-based framework [15] that provides support for the modeling and

    simulation of SoC and MPSoC platforms. By using this

    framework the platform designer can build the platform

    graphically and generate an executable simulator.

    Currently, PDesigner is a free solution and supports

    modeling platforms with different components such as

    processors, cache memory, memory, bus and connections.

    Performance results are obtained from this approach;

    however, energy results are not supported.

    Looking at the situation depicted in Table 1, it becomes

    evident that there is no environment that combines the flexibility to model multiple platforms with caches; the use

    of an approach based on a single simulation; the capability

    to estimate both dynamic and static energy consumption of

    cache memories; and the possibility to explore the platform

    configuration design space graphically.

    Table 1. Comparison of related studies.

    Multi-Platform Modeling | Single Simul. | Dynamic Consump. | Static Consump. | Graphical Exploration

    Zhang - - - -

    Palesi - - - -

    Silva-Filho - - -

    Platune - - -

    SimpleScalar - - - - -

    ChARM - - - -

    PDesigner - - -

    [Fig. 1 labels: Analysis Flow; Library Extension; Interactive Graphical Environment; Visual Environment Interaction; Dynamic & Static Energy estimation; PCacheEnergyAnalyzer Plugin; Integration in PDesigner]

    3. PROPOSED APPROACH

    In this paper, we propose the development of a cache

    energy consumption estimation tool that implements an

    energy consumption analysis flow and its integration as a plugin in the PDesigner framework. The plugin, called

    PCacheEnergyAnalyzer, provides dynamic and static

    energy consumption statistics for cache memory

    components of a SoC. The plugin is also an interactive

    environment that provides a graphical user-friendly

    interface for cache analysis and its interaction with the

    platform model already provided by the PDesigner.

    The proposed approach is depicted in Figure 1. The first

    step in the approach has been the definition of an energy

    cache analysis flow. For the implementation of the flow, a

    new SystemC component that generates traces of memory accesses has been created and added to the

    PDesigner library. Moreover, two additional tools have

    been created: an interactive graphical environment that

    allows the control and view of the results of the analysis;

    and a tool for dynamic and static energy consumption

    estimation based on the eCACTI model. These two tools comprise the PCacheEnergyAnalyzer plugin. The plugin

    allows the designer to select a cache on the platform,

    define the design space to be explored, visualize the results

    in charts, select the desired cache configuration from the

    chart and reflect the decision on the platform.

    Finally, the updated library and the

    PCacheEnergyAnalyzer plugin have been integrated into

    the PDesigner framework. The result is a powerful tool

    that supports the modeling of platforms and the cache

    architecture exploration.

    In the rest of this section the analysis flow, its implementation by the PCacheEnergyAnalyzer and the

    integration in the PDesigner are explained.

    Fig. 1. Proposed approach.

    3.1. Cache Energy Consumption Analysis Flow

    Figure 2 shows the flow used to analyze energy

    consumption in cache memories. All necessary steps are

    detailed carefully in this section.

    Fig. 2. Energy consumption analysis flow.

    Initially, the desired platform is graphically constructed

    from a list of components available in the PDesigner

    component library. System designers model the

    architecture by dragging and dropping the components from the component palette. The component palette has the

    following component types: processor, bus, device,

    memory and cache memory. Figure 3 shows an example of

    a platform composed of a MIPS processor, cache memory,

    bus and main memory. The component master and slave

    protocol ports are connected through connections. The designer can also change the component parameters by

    selecting them and using the properties view (lower part of

    Figure 3).

    The application is a binary code compiled for the target

    processor. The designer selects the processor and

    associates the binary file with the triple {processor,

    memory, load address}.

    To perform energy analysis on a cache memory, the

    designer right-clicks on the cache component and selects

    the PCacheEnergyAnalyzer option.

    This option enables the platform to explore energy

    consumption in the cache memory component.

    Once the cache component has been selected, the

    designer can change the cache memory properties. In the Properties window shown in Figure 3, the designer can

    change the exploration space of the cache memory

    component. This is done by defining minimum and

    maximum values for each cache memory parameter. The

    parameters are the following: cache size, cache line size

    and associativity. For the associativity there is only the

    maximum parameter.

    Next, an executable simulator of the platform is

    generated. The simulator performs a single simulation and

    generates miss and hit statistics for the entire

    configuration space defined by the designer. So, the result

    of the simulator execution is an XML file that contains the

    cache configuration ID, cache parameters such as size, line

    size, associativity, number of accesses and miss rate.

    [Fig. 2 labels: Define; Simulate; View Results; Select Configuration; Update Platform; Select Cache; Calculate Energy; Platform; Mapping; Application; Energy Analysis; Exploration Space; Configuration Space; Define Transistor Technology]
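    The exploration space described above, bounded by minimum and maximum values for cache size and line size and by a maximum associativity, can be enumerated with a short sketch. The power-of-two sweep and the validity rule (a configuration must hold at least one set) are assumptions made for illustration; the function names are hypothetical and are not part of PDesigner.

```python
from itertools import product

def powers_of_two(lo, hi):
    """Yield lo, 2*lo, 4*lo, ... up to and including hi."""
    value = lo
    while value <= hi:
        yield value
        value *= 2

def exploration_space(size_range, line_range, max_assoc):
    """List (size, line_size, associativity) tuples within the given bounds.

    Assumption: parameters are swept in powers of two, and a configuration
    is kept only if it holds at least one set (size >= line_size * assoc).
    """
    configs = []
    for size, line, assoc in product(
        powers_of_two(*size_range),
        powers_of_two(*line_range),
        powers_of_two(1, max_assoc),  # associativity has only a maximum
    ):
        if size >= line * assoc:
            configs.append((size, line, assoc))
    return configs

# Hypothetical bounds: sizes 256..1024 bytes, lines 16..32 bytes, up to 2 ways.
space = exploration_space((256, 1024), (16, 32), 2)
```

    Configurations that the designer does not want can still be filtered out later, when the configuration space is selected for energy analysis.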

    A simulation mechanism using a single-pass simulation

    technique, based on the work in [16], has been adopted. Usually,

    simulations using this method are based on traces and require more than a single simulation [16][17]. For

    instance, the single-pass cache evaluation mechanism

    proposed in [16] is 70 times faster than a simulation-based

    mechanism for the ADPCM application from Mediabench.
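    The idea of obtaining statistics for several cache configurations from one pass over a memory trace can be illustrated with a deliberately naive sketch. It is not the algorithm of [16], which is not detailed here; it simply keeps one independent LRU cache model per configuration and feeds every address to all of them. The LRU policy, the class name and the function name are assumptions.

```python
from collections import OrderedDict

class LruCache:
    """Set-associative cache model with LRU replacement (tags only)."""

    def __init__(self, size, line_size, assoc):
        self.line_size = line_size
        self.assoc = assoc
        self.num_sets = size // (line_size * assoc)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, address):
        block = address // self.line_size
        lines = self.sets[block % self.num_sets]
        self.accesses += 1
        if block in lines:
            lines.move_to_end(block)        # mark as most recently used
        else:
            self.misses += 1
            if len(lines) >= self.assoc:
                lines.popitem(last=False)   # evict the least recently used line
            lines[block] = True

def simulate_once(trace, configs):
    """Read the trace a single time and return {config: (accesses, misses)}."""
    caches = {cfg: LruCache(*cfg) for cfg in configs}
    for address in trace:
        for cache in caches.values():
            cache.access(address)
    return {cfg: (c.accesses, c.misses) for cfg, c in caches.items()}

# Hypothetical four-access trace alternating between two 64-byte blocks.
stats = simulate_once([0, 64, 0, 64], [(64, 64, 1), (128, 64, 1), (128, 64, 2)])
```

    The cost of this naive version grows with the number of configurations; dedicated single-pass techniques reduce that cost by sharing state between configurations.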

    Fig. 3. PDesigner, Architecture Modeling, Component

    Palette and Configuration Space.

    The exploration space may contain cache

    configurations that are invalid or that are not interesting for

    the designer. After simulation, the designer is able to select

    some or all configurations for energy analysis and define the configuration space that contains all the desired cache

    configurations through a Configuration Selection Window.

    This window allows the designer to select the transistor

    technology size and also all the cache configurations in the

    configuration space. After the configuration space has been

    defined, the energy module calculates the energy consumption and number of cycles for each selected

    configuration.

    The cache memory energy consumption calculation

    flow is depicted in Figure 4.

    A parser receives as input the Selected Configurations Space saved in the XML file and separates

    it into two sets of information. The first comprises the cache

    parameters and technology information, which are provided to

    the eCACTI tool for the dynamic and static energy

    calculation per access. The second one contains the

    number of misses, the number of accesses and cache parameters of the chosen configuration. This information,

    together with the dynamic and static energy provided by

    the eCACTI, is used to calculate the total static and

    dynamic energies consumed by the cache memory for the

    application. In addition, in this step the total number of

    cycles needed to run the application is also calculated.

    Once these parameters have been calculated, another parser generates

    the energy estimation results for each configuration, also as an

    XML file.
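    The calculation flow described above can be sketched as follows. The XML element and attribute names, the hit latency, the miss penalty and the energy equations are all assumptions chosen for illustration; the text does not specify the file schema or the exact formulas, and the per-access energies would in practice come from eCACTI. Technology information is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout of the selected configuration space file.
SAMPLE_XML = """
<configurations>
  <configuration id="0" size="256" line="16" assoc="1" accesses="1000" missrate="0.10"/>
  <configuration id="1" size="512" line="16" assoc="2" accesses="1000" missrate="0.05"/>
</configurations>
"""

def split_configurations(xml_text):
    """Return (energy-model inputs, simulation statistics), keyed by configuration ID."""
    model_inputs, statistics = {}, {}
    for node in ET.fromstring(xml_text).iter("configuration"):
        cid = int(node.get("id"))
        accesses = int(node.get("accesses"))
        misses = round(accesses * float(node.get("missrate")))
        # First set: cache parameters handed to the energy model.
        model_inputs[cid] = tuple(int(node.get(k)) for k in ("size", "line", "assoc"))
        # Second set: access statistics from the simulation.
        statistics[cid] = (accesses, misses)
    return model_inputs, statistics

def totals(accesses, misses, e_dyn, e_static, hit_cycles=1, miss_penalty=20):
    """Assumed model: per-access energies scaled by accesses, cycles from hits and misses."""
    return {
        "dynamic": accesses * e_dyn,
        "static": accesses * e_static,
        "cycles": accesses * hit_cycles + misses * miss_penalty,
    }

inputs, stats = split_configurations(SAMPLE_XML)
result = totals(*stats[0], e_dyn=2.0, e_static=0.5)  # made-up energy units per access
```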

    Fig. 4. Energy Calculation Flow

    A cost function, F = Energy x Cycles,

    is also calculated. The minimization of this cost

    function makes it possible to obtain cache

    configurations close to Pareto-optimal [8]. These cache configurations present a tradeoff between performance and

    energy consumption. The configuration that has the lowest

    Energy x Cycles cost is also identified.
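    As a small illustration of the cost function F = Energy x Cycles, the sketch below picks the configuration with the lowest cost and, for comparison, lists the configurations that no other configuration beats in both energy and cycles. The configuration labels and the numbers are invented for the example; the Pareto filter is added here only to show how the lowest-cost point relates to the tradeoff curve.

```python
def best_by_cost(results):
    """Return the configuration ID that minimizes F = energy * cycles."""
    return min(results, key=lambda cid: results[cid][0] * results[cid][1])

def pareto_front(results):
    """IDs of configurations not dominated in both energy and cycles."""
    front = []
    for cid, (energy, cycles) in results.items():
        dominated = any(
            e <= energy and c <= cycles and (e < energy or c < cycles)
            for e, c in results.values()
        )
        if not dominated:
            front.append(cid)
    return front

# Hypothetical (energy in joules, cycles) per configuration.
results = {
    "A": (0.040, 900_000),
    "B": (0.030, 1_000_000),
    "C": (0.050, 1_300_000),
    "D": (0.025, 2_000_000),
}
lowest_cost = best_by_cost(results)
front = pareto_front(results)
```

    In this made-up data set the lowest-cost configuration lies on the Pareto front, while the lowest-energy configuration ("D") is a different point on the same front.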

    Once the energy calculation flow is concluded, the user

    graphically visualizes the results of the cache energy

    analysis. The energy consumption estimation for each of the configurations in the configuration space is displayed

    in a visual interactive chart as depicted in Figure 5. The chart displays on the y-axis the energy consumed and, on

    the x-axis, the performance in number of clock cycles.

    Each point on the chart corresponds to one of the

    configurations in the configuration space.

    The chart is interactive, meaning the user can select one

    of the points and display information about it. There are

    two types of information: the first, in the form of a tool tip,

    is depicted by the rectangle in Figure 5 and contains the

    number of cycles and energy consumed by the selected

    configuration; the second form of presenting information is

    by viewing properties, also shown in Figure 5.

    [Fig. 4 labels: Selected Configurations Space (.XML); parser; Cache parameters and technology; eCACTI; Dynamic and Static Energy per access; Cache parameters, # Miss, # Accesses; Energy, Cycles Calculation; Energy, Cycles Results; parser; Energy Consumption Estimation Results (.XML)]

    [Fig. 3 labels: Processor; Cache Memory; Bus; Main Memory; Component; Exploration Space; Processor Load Address]

    [Fig. 6 chart: y-axis Energy (Joules), ticks 0.0000 to 0.1200; x-axis Timing, Rawcaudio, Rawdaudio, FFT; series MIPS (Cost Function), MIPS (Lowest Energy), SPARC (Cost Function), SPARC (Lowest Energy)]

    Fig. 5. Energy estimation interactive chart.

    Here the following information is displayed: the cache configuration parameter values, miss rate, number of

    accesses, the cost value based on the cost function

    calculation, dynamic and static energy consumption, the

    total cycles required to run the application and the total

    energy consumption.

    The configuration with the lowest calculated cost is represented in the interactive chart in a different color. The

    user can use this configuration as a reference; however,

    he/she is not obliged to choose it as the optimal

    configuration.

    The user can also interact with the chart in order to view the properties of a particular cache configuration. In this step,

    the designer selects one of the configurations that meets

    his/her performance/energy consumption requirements. The

    user selects the configuration by simply clicking on the

    point in the chart. In the next step, the designer updates the

    platform by replacing the current cache component parameters with the selected configuration parameter values. The

    PCacheEnergyAnalyzer plugin makes the substitution

    automatically by interacting with the PDesigner

    Framework.

    4.RESULTS

    The PCacheEnergyAnalyzer tool has been used to explore

    the cache memory design space for four different

    applications of the Mediabench benchmark suite [18]: fft,

    timing, rawcaudio and rawdaudio.

    The architecture is composed of one interconnection structure (SimpleBus), one cache memory and a RAM

    memory. The parameters of the cache memory are varied

    and the exploration is performed for the two different

    processors and four different applications from the

    Mediabench suite [18].

    The configuration space used considers 50 different configurations for each application. The selected

    technology was 0.18 µm. The cache size varies from 256 to

    8192 bytes; the cache line size ranges from 16 to 64 bytes;

    and the associativity ranges from 1 to 4.

    The energy consumption estimation and performance

    have been calculated based on the flow depicted in Figure 4. The results are then displayed in the energy estimation

    interactive chart of Figure 5.

    Fig. 6. Energy estimation for different applications.

    Figure 6 summarizes the energy consumption estimation

    values of the cache configurations with best cost function

    and configurations with the lowest energy consumption in

    the configuration space for each application, running in the

    MIPS and SPARCV8 processors.

    Although these two processors have similar architectures,

    their compilers and compilation optimizations present some

    differences in some cases. It can be seen in the chart that

    the MIPS processor presents a much lower energy

    consumption than the SPARCV8 for the timing

    application, and slightly higher energy consumption for the other applications.

    Additionally, the proposed approach was also compared

    with existing work by using the basicmath_small application from the

    MiBench suite [19]. SimpleScalar and

    PCacheEnergyAnalyzer (PCEA) were compared in terms of

    fidelity by analyzing the energy consumption for several different cache configurations. Each pixel in Figure 7

    represents the energy consumption for a given cache

    configuration (cache size, cache line size, associativity).

    Fig. 7. Normalized Energy comparison for SimpleScalar

    and PCEA approaches.

    Although the SimpleScalar tool does not support energy consumption analysis, energy was calculated with an approach

    based on Zhang's work [3], using a one-level cache and the

    eCACTI cache memory energy model. For simplicity of

    the analysis, data and instruction cache configurations

    are assumed to be the same.

    Results shown in Figure 7 indicate that both approaches exhibit fidelity. We believe that the precision difference

    depicted in Figure 7 is due to the compilers used and

    compilation optimizations.
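    The text does not state how fidelity was quantified. One common way to express it, sketched here purely as an assumption, is the fraction of configuration pairs that two estimators rank in the same order, computed after normalizing each tool's energies to its own maximum. The configuration labels and the values are made up.

```python
from itertools import combinations

def normalize(energies):
    """Scale each energy by the largest value so results lie in (0, 1]."""
    peak = max(energies.values())
    return {cfg: e / peak for cfg, e in energies.items()}

def fidelity(ref, other):
    """Fraction of configuration pairs that both tools rank in the same order."""
    pairs = list(combinations(ref, 2))
    agree = sum(
        (ref[a] - ref[b]) * (other[a] - other[b]) > 0
        for a, b in pairs
    )
    return agree / len(pairs)

# Hypothetical energies reported by two tools for three configurations.
tool_a = normalize({"c1": 1.0, "c2": 2.0, "c3": 3.0})
tool_b = normalize({"c1": 10.0, "c2": 30.0, "c3": 20.0})
score = fidelity(tool_a, tool_b)
```

    A score of 1.0 would mean that both tools order every pair of configurations identically, even if their absolute energy values differ.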

    5. CONCLUSION

    This work has presented the PCacheEnergyAnalyzer

    environment for energy consumption analysis. The tool

    provides support for cache memory energy consumption

    estimation on SoC platforms. Initial studies focused

    on one-level caches; however, the tool can be easily extended to more levels. Results have shown that it is a powerful tool

    for helping users to find interesting cache configurations

    for a particular application, which consider not only

    performance, but also the best relation between

    performance and energy consumption.

    PCacheEnergyAnalyzer fills the gaps of the existing

    tools by simultaneously providing multiplatform support,

    extensibility, dynamic and static energy consumption

    estimation and a graphical environment.

    6. REFERENCES

    [1] H. Chang, L. Cooke, M. Hunt, G. Martin, A.J. McNelly and L. Todd, Surviving the SOC Revolution: A Guide to Platform-Based Design, Kluwer Academic Publishers, 1st ed., 1999.

    [2] A. Malik, B. Moyer and D. Cermak, A Low Power Unified Cache Architecture Providing Power and Performance Flexibility, Int. Symp. on Low Power Electronics and Design, June 2000, pp. 241-243.

    [3] C. Zhang, F. Vahid, Cache configuration exploration on prototyping platforms, 14th IEEE International Workshop on Rapid System Prototyping, June 2003, vol. 00, p. 164.

    [4] A. Gordon-Ross, F. Vahid, N. Dutt, Automatic Tuning of Two-Level Caches to Embedded Applications, DATE, Feb. 2004, pp. 208-213.

    [5] A. Gordon-Ross, et al., Fast Configurable-Cache Tuning with a Unified Second-Level Cache, ISLPED'05, 2005.

    [6] A.G. Silva-Filho, F.R. Cordeiro, R.E. Sant'Anna and M.E. Lima, Heuristic for Two-Level Cache Hierarchy Exploration Considering Energy Consumption and Performance, PATMOS 2006, Montpellier, France, September 13-15, 2006, pp. 75-83.

    [7] A. Halambi, et al., EXPRESSION: A language for architecture exploration through compiler/simulator retargetability, DATE, March 1999, pp. 485-491.

    [8] T. Givargis, F. Vahid, Platune: A Tuning framework for system-on-a-chip platforms, IEEE Trans. Computer-Aided Design, vol. 21, Nov. 2002, pp. 1-11.

    [9] M. Palesi, T. Givargis, Multi-objective design space exploration using genetic algorithms, International Workshop on Hardware/Software Codesign, May 2002.

    [10] D. Burger, T.M. Austin, The SimpleScalar Tool Set, Version 2.0, Computer Architecture News, vol. 25(3), June 1997, pp. 13-25.

    [11] P. Shivakumar, N.P. Jouppi, Cacti 3.0: An Integrated Cache Timing, Power and Area Model, WRL Research Report 2001/2.

    [12] C.A. Prete, M. Graziano, F. Lazzarini, The ChARM Tool for Tuning Embedded Systems, IEEE Micro, 1997, vol. 17, pp. 67-76.

    [13] N. Dutt, M. Mamidipaka, eCACTI: An Enhanced Power Estimation Model for On-chip Caches, TR 04-28, Sep. 2004.

    [14] E. Macii, et al., Energy-Aware Design of Embedded Memories: A Survey of Technologies, Architectures and Optimization Techniques, ACM Transactions on Embedded Computing Systems, vol. 2, no. 1, Feb. 2003, pp. 5-32.

    [15] Eclipse, available at http://www.eclipse.org.

    [16] P. Viana, et al., Cache-Analyzer: Design Space Evaluation of Configurable-Caches in a Single-Pass, International Workshop on Rapid System Prototyping, May 2007, pp. 3-9.

    [17] R.A. Sugumar and S.G. Abraham, Efficient simulation of multiple cache configurations using binomial trees, CSE-TR-111-91, CSE Div., Univ. of Michigan, 1991.

    [18] Mediabench: http://cares.icsl.ucla.edu/MediaBench/, 2006.

    [19] M.R. Guthaus, et al., MiBench: A free, commercially representative embedded benchmark suite, IEEE 4th Annual Workshop on Workload Characterization, Dec. 2001, pp. 1-12.