0502 Ghlai Slide


  • 7/30/2019 0502 Ghlai Slide


    A Hybrid Energy-Estimation Technique for Extensible Processors

    Fei, Y.; Ravi, S.; Raghunathan, A.; Jha, N.K.

    IEEE Transactions on Computer-Aided Design of

    Integrated Circuits and Systems

    Volume: 23 Issue: 5

    Pages: 652-664

    May 2004

  • 7/30/2019 0502 Ghlai Slide


    2005/7/13 A Hybrid Energy-Estimation Technique for Extensible Processor 2/24

    Abstract

    In this paper, we present an efficient and accurate methodology for estimating the energy consumption of application programs running on extensible processors. Extensible processors, which are getting increasingly popular in embedded system design, allow a designer to customize a base processor core through instruction set extensions. Existing processor energy macromodeling techniques are not applicable to extensible processors, since they assume that the instruction set architecture as well as the underlying structural description of the micro-architecture remain fixed. Our solution to the above problem is a hybrid energy macromodel suitably parameterized to estimate the energy consumption of an application running on the corresponding application-specific extended processor instance, which incorporates any custom instruction extension. Such a characterization is facilitated by careful selection of macromodel parameters/variables that can capture both the functional and structural aspects of the execution of a program on an extensible processor.

  • 7/30/2019 0502 Ghlai Slide


    Abstract (cont.)

    Another feature of the proposed energy characterization flow is the use of regression analysis to build the macromodel. Regression analysis allows for in-situ characterization, thus allowing arbitrary test programs to be used during macromodel construction. We validated the proposed methodology by characterizing the energy consumption of a state-of-the-art extensible processor (Tensilica's Xtensa). We used the macromodel to analyze the energy consumption of several benchmark applications with custom instructions. The mean absolute error in the macromodel estimates is only 3.3%, when compared to the energy values obtained by a commercial tool operating on the synthesized register-transfer level (RTL) description of the custom processor. Our approach achieves an average speedup of three orders of magnitude over the commercial RTL energy estimator. Our experiments show that the proposed methodology also achieves good relative accuracy, which is essential in energy optimization studies. Hence, our technique is both efficient and accurate.

  • 7/30/2019 0502 Ghlai Slide


    Outline

    - What's the problem
    - Introduction & related work
    - Extensible processor energy macromodel requirements
    - Proposed energy estimation methodology
    - Experimental results and evaluation
    - Conclusions

  • 7/30/2019 0502 Ghlai Slide


    What's the Problem

    - Existing processor energy estimation frameworks are impractical for the energy optimization done in the ASIP design cycle
      - The extension to the base processor ISA is not fixed
      - The number of configurations/extensions is large
    - It is essential to have a fast and accurate energy estimate of an application running on an extensible processor for each candidate configuration in energy optimization studies

  • 7/30/2019 0502 Ghlai Slide


    Related Work

    - Structural macromodeling
      - Characterizes the energy consumption of the processor's constituent hardware modules
        $E = E_{m_1,i}(\text{bit transition}) + E_{m_2,i}(\text{bit transition}) + \cdots + E_{m_k,i}(\text{bit transition})$
        ($E_{m_1,i}(\text{bit transition})$ denotes the energy per access of module 1)
      - Advantage: high accuracy
      - Disadvantages:
        1) Low efficiency (RTL simulation of a processor is extremely slow)
        2) Requires an RTL hardware description of the processor
    - Suitable for energy estimation of a processor core

  • 7/30/2019 0502 Ghlai Slide


    Related Work (cont.)

    - Instruction-level macromodeling
      - Characterizes the energy consumption of each instruction of the processor
        $E = E_{IC_1} \cdot Cyc_{IC_1} + E_{IC_2} \cdot Cyc_{IC_2} + E_{IC_3} \cdot Cyc_{IC_3} + \cdots + E_{IC_k} \cdot Cyc_{IC_k}$
        ($E_{IC_1}$ denotes the average energy consumed by instruction class 1)
        ($Cyc_{IC_1}$ denotes the number of cycles taken by instruction class 1)
      - The energy coefficient $E_{IC_1}$ is acquired by actual measurement of a chip implementation
      - Advantage: high efficiency (an ISS is used to yield the energy estimate)
      - Disadvantages:
        1) Low accuracy
        2) Requires an actual chip implementation, which is infeasible for power tradeoff studies early in the design cycle
    - Suitable for energy estimation of software on a fixed processor architecture
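The instruction-level sum above is easy to evaluate once an ISS reports cycle counts per instruction class. The following is a minimal sketch under stated assumptions: the class names, the coefficient values, and the cycle counts are all made up for illustration and are not measurements from the paper.

```python
# Hypothetical per-cycle energy coefficients E_IC (arbitrary units).
# In the instruction-level approach these would come from chip measurements.
ENERGY_PER_CYCLE = {"arith": 1.0, "load": 1.5, "store": 1.25, "branch": 2.0}


def instruction_level_energy(cycles_by_class, energy_per_cycle):
    """Return E = sum over instruction classes of E_IC * Cyc_IC."""
    return sum(energy_per_cycle[ic] * cyc for ic, cyc in cycles_by_class.items())


# Made-up cycle counts per instruction class, as an ISS might report them.
cycles = {"arith": 100, "load": 40, "store": 20, "branch": 10}
total_energy = instruction_level_energy(cycles, ENERGY_PER_CYCLE)
print(total_energy)
```

Because the model is a plain weighted sum, its cost is independent of the hardware description, which is where the efficiency of the instruction-level approach comes from.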

  • 7/30/2019 0502 Ghlai Slide


    Related Work (cont.)

    - Statistical analysis and prediction macromodeling
      - Energy coefficients are calculated with regression analysis to build the macromodel
        $E_i = C_1 M_{1,i} + C_2 M_{2,i} + \cdots + C_k M_{k,i} + \varepsilon_i \quad (i = 1, 2, \ldots, n)$
        (The total energy consumption $E_i$ is the dependent variable)
        (The macromodel parameters $M_{1,i}, \ldots, M_{k,i}$ are the independent variables)
        ($\varepsilon_i$ denotes the inaccuracy)
      - A set of given $(E_i, M_{1,i}, \ldots, M_{k,i})$, $i = 1, 2, \ldots, n$, is used to predict the best energy coefficients $C_1, C_2, \ldots, C_k$
    - Energy macromodel generation
        $\hat{E} = \hat{C}_1 M_1 + \hat{C}_2 M_2 + \cdots + \hat{C}_k M_k$
        ($\hat{C}_1, \ldots, \hat{C}_k$ denote the estimates of the energy coefficients)
        ($\hat{E}$ denotes the estimate of the total energy consumption)
        (The macromodel parameters $M_1, \ldots, M_k$ are observable during ISS)
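The regression step can be sketched in a few lines. This is an illustration only, not the tooling used in the paper: `fit_least_squares` is a hypothetical helper that solves the normal equations in pure Python, and the parameter values and energies below are made-up numbers generated from known coefficients (2.0 and 0.5) so the fit can be checked.

```python
def fit_least_squares(M, E):
    """Least-squares estimate of C in M * C = E via the normal equations.

    M is a list of n rows with k parameter values each; E is a list of n energies.
    """
    n, k = len(M), len(M[0])
    # Augmented k x (k + 1) system [M^T M | M^T E].
    A = [
        [sum(M[r][i] * M[r][j] for r in range(n)) for j in range(k)]
        + [sum(M[r][i] * E[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("parameters are linearly dependent; add test programs")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Four hypothetical test programs with two macromodel parameters each.
M = [[10.0, 4.0], [20.0, 2.0], [5.0, 8.0], [15.0, 6.0]]
E = [22.0, 41.0, 14.0, 33.0]  # generated as 2.0 * M1 + 0.5 * M2, no noise
C_hat = fit_least_squares(M, E)

# Estimate the energy of a new program from its observed parameters.
new_params = [12.0, 10.0]
E_hat = sum(c * m for c, m in zip(C_hat, new_params))
print(C_hat, E_hat)
```

With real characterization data the energies carry noise, so the fitted coefficients only approximate the underlying ones; the noise-free data here simply makes the result easy to verify.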

  • 7/30/2019 0502 Ghlai Slide


    Paper Overview and Contributions

    - Hybrid energy macromodeling
      - Instruction-level macromodeling for the base processor
      - Structural macromodeling for the custom hardware extensions
      - Regression macromodeling for energy characterization
    - Contributions
      - Energy consumption can be determined simply by instruction set simulation
      - Combines the efficiency of instruction-level approaches and the accuracy of structural approaches
      - Only needs the custom instruction descriptions
      - Does not require the custom processor to be synthesized
      - This is the only work to evaluate energy/performance tradeoffs among candidate custom instructions for extensible processors early in the design cycle

  • 7/30/2019 0502 Ghlai Slide


    Extensible Xtensa Processor

    - Xtensa's ISA consists of a basic set of instructions plus a set of configurable and extensible options
    - Extensibility is achieved by specifying application-specific functionality through custom instructions
      - The behavior of a custom instruction is described using the TIE (Tensilica Instruction Extension) language
      - TIE is independent of the processor's pipeline
        - Only the semantics of the instructions need to be described, as if they consisted only of combinational logic
    - The TIE compiler automatically derives
      - The hardware implementation of the custom instructions
      - The corresponding software development kit for the configuration
        - ANSI C/C++ compiler, linker, assembler, debugger
        - Cycle-accurate instruction set simulator (ISS)

  • 7/30/2019 0502 Ghlai Slide


    Example Containing Three Custom Instructions

    - user register statement: specifies the custom state register and indices
    - iclass statement: defines a new instruction class with one or multiple custom instructions
    - semantic statement: describes the behavior of the instruction class
    - schedule statement (used for multiple-cycle instructions): schedules the operation sequence of the custom instruction
      - Needs ars and art at the beginning of the first cycle
      - Needs ACCU at the beginning of the second cycle
      - Produces the new ACCU at the end of the second cycle

  • 7/30/2019 0502 Ghlai Slide


    Partial Architecture of an Extended Processor

    - Augmented with custom hardware to implement three custom instructions: MULT, MAC, and CUS
    - MULT and MAC perform their functionality using shared custom hardware (which depends on the base processor operand buses)
      - A multiplier (X), a multiplexer (MUX1), and an adder (+1)
    - CUS accesses custom registers CR0 to CR2 (which are independent of the base processor operand buses)

    [Figure: partial architecture of the extended processor; recoverable labels: temp1, temp2, ACCU]

  • 7/30/2019 0502 Ghlai Slide


    Snapshot of Dynamic Execution of a Program

    - The top horizontal bar lists the sequence of processor events dictated by each instruction's execution
    - The bottom bar depicts the side effects in either the base processor or the custom hardware
      - Execution of the base processor instruction add activates custom hardware (X, MUX1, +1) in the second cycle
      - Execution of the custom instructions (I2 and I3) activates base processor hardware (ALU) in the second cycle
      - The side effects occur because the custom hardware and the ALU of the base processor share the same operand buses

  • 7/30/2019 0502 Ghlai Slide


    Different Factors of the Energy Macromodel

    - Energy consumed by base processor instructions on the base processor core
    - Energy dependency on inter-instruction correlation and other nonideal features (such as stalls, cache misses, etc.)
    - Energy consumed by custom instructions on the custom hardware
      - Only custom hardware computation energy
        - The second box in the top bar of I2, I3, I4
    - Interplay between the base processor and custom hardware
      - Active energy of custom hardware owing to base processor instructions
        - Computation side effect in the EXE stage: the bottom bar of instruction I1
      - Active energy of base processor hardware owing to custom instructions
        - Computation side effect in the EXE stage: the bottom bar of instructions I2 and I3
        - Involvement of the base processor in other pipeline stages: the RdReg, Wait, WrReg, and WrCR events in the top bar of instructions I2, I3, I4

  • 7/30/2019 0502 Ghlai Slide


    Extensible Processor Energy Estimation Flowchart

    Characterization Flow

    - Constructing the macromodel template: $E = E_0 X_0 + E_1 X_1 + \cdots + E_n X_n$
      - Expresses energy consumption (the dependent variable) as a function of the characteristic parameters (the independent variables)
      - $E_0, \ldots, E_n$ are constants called energy coefficients
      - $X_1, \ldots, X_n$ are chosen from both the instruction-level and structural domains
    - The test program suite incorporates custom instructions to cover all the custom HW library components
    - Regression analysis requires knowledge of both the dependent variable and the independent variables
      - Steps 3-7 repeat for all the test programs
    - Regression analysis finds the estimates of the energy coefficients (energy macromodel construction complete)

  • 7/30/2019 0502 Ghlai Slide


    Extensible Processor Energy Estimation Flowchart

    Estimation Flow

    - Step 9 gathers the instruction-level macromodel parameter values (instruction-level execution statistics)
    - Step 10 gathers the structural macromodel parameter values (the activation of custom hardware)
    - The parameter values are fed to the energy macromodel to yield the energy estimate

  • 7/30/2019 0502 Ghlai Slide


    Energy Macromodel Template Generation

    - $E_{ins}$ is a linear function of the instruction-level parameters and depicts energy on the base processor
    - $E_{struc}$ is a linear function of the structural parameters and depicts energy on the custom hardware

    Instruction-level macromodel parameters

    - Reflect the usage of the base processor core due to either base processor or custom instructions
    - Energy components of the base processor core
      - Energy of the base processor owing to base processor instructions
        - $E_{arith}, \ldots, E_{br\_utk}$ represent the average energy consumption of each instruction class
        - $Cyc_{arith}, \ldots, Cyc_{br\_utk}$ represent the number of cycles taken by each instruction class
      - Energy due to inter-instruction correlation and other nonideal features
        - Macromodel parameters $Num_{i}, \ldots, Num_{interlock}$ denote the number of times each nonideal case occurs
      - Energy consumption in the base processor imposed by custom instructions (energy consumption in the four pipeline stages other than the EXE stage)
        - Macromodel parameter $Cyc_{side\_tie}$ accounts for the number of cycles taken by all custom instructions

    $$E_{ins} = E_{arith} Cyc_{arith} + E_{ld} Cyc_{ld} + E_{st} Cyc_{st} + E_{j} Cyc_{j} + E_{br\_tk} Cyc_{br\_tk} + E_{br\_utk} Cyc_{br\_utk} + E_{i} Num_{i} + E_{d} Num_{d} + E_{uncache} Num_{uncache} + E_{interlock} Num_{interlock} + E_{side\_tie} Cyc_{side\_tie}$$

  • 7/30/2019 0502 Ghlai Slide


    Energy Macromodel Template Generation

    - Structural macromodel parameters
      - Reflect the usage of custom hardware extensions due to either base processor or custom instructions
        - Macromodel parameters $Cyc_1, \ldots, Cyc_{10}$ denote the number of cycles in which each custom hardware component category is active
        - Energy coefficients $E_1, \ldots, E_{10}$ represent the average energy consumption of each custom hardware component category
    - Energy components of the custom hardware extensions
      - Custom functional blocks are activated when any custom instruction executes
      - Custom functional blocks can also be activated when base processor instructions are running
        - The side effect due to the sharing of the same operand buses still affects the custom hardware
    - Dynamic resource usage analysis of the execution trace identifies the activated custom functional blocks (HW components) for each instruction
    - Custom hardware energy consumption is expressed as:
      $E_{struc} = E_1 Cyc_1 + E_2 Cyc_2 + E_3 Cyc_3 + \cdots + E_{10} Cyc_{10}$
    - Note: the structural macromodel parameters should cover all the components present in the custom hardware library (10 component categories in this paper)
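The trace-based counting of structural parameters can be illustrated with a small sketch. The instruction names `add`, `MULT`, `MAC`, and `CUS` follow the example earlier in the slides; everything else is an assumption made for illustration: the component category names, the activation table, one active cycle per instruction, and the coefficient values.

```python
from collections import Counter

# Hypothetical activation table: which custom-hardware component categories
# each instruction switches on. The base instruction "add" is listed because
# it shares operand buses with the custom hardware (the side effect).
ACTIVATES = {
    "add": ["multiplier", "mux", "adder"],
    "MULT": ["multiplier", "mux", "adder"],
    "MAC": ["multiplier", "mux", "adder"],
    "CUS": ["custom_register"],
    "load": [],
}


def structural_cycles(trace):
    """Count active cycles per component category over an execution trace."""
    cycles = Counter()
    for instruction, active_cycles in trace:
        for component in ACTIVATES.get(instruction, []):
            cycles[component] += active_cycles
    return cycles


def structural_energy(cycles, coefficients):
    """Return E_struc = sum over categories of E_j * Cyc_j."""
    return sum(coefficients[c] * n for c, n in cycles.items())


# Made-up trace of (instruction, cycles in which its components are active).
trace = [("add", 1), ("MULT", 1), ("MAC", 1), ("CUS", 1), ("load", 1)]
cyc = structural_cycles(trace)

# Made-up per-cycle energy coefficients (arbitrary units).
coeff = {"multiplier": 4.0, "mux": 0.5, "adder": 1.0, "custom_register": 0.25}
e_struc = structural_energy(cyc, coeff)
print(dict(cyc), e_struc)
```

Note that the `add` entry contributes to the multiplier, multiplexer, and adder counts even though it is a base processor instruction, which is the side effect the structural parameters are meant to capture.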

  • 7/30/2019 0502 Ghlai Slide


    Macromodel Fitting Through Regression Analysis

    - Determining the energy coefficients in the macromodel template
      - Solving the linear matrix equation $M_{n \times 21} \, C_{21 \times 1} = E_{n \times 1}$
        - E is an $n \times 1$ column vector holding the energy consumption data of the n test programs
        - M is an $n \times 21$ matrix holding the corresponding macromodel parameter values
        - C is the energy coefficient vector corresponding to
          {Earith, Eld, Est, Ej, Ebr_tk, Ebr_utk, Ei, Ed, Euncache, Einterlock, Eside_tie, E1, E2, E3, E4, E5, E6, E7, E8, E9, E10}
      - Regression yields the estimated energy coefficient vector $\hat{C}$ such that the mean square error is minimized
        ($\hat{C}$ denotes the estimate of the energy coefficient vector C)
        ($\hat{E}$ denotes the estimate of the total energy consumption E)
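As a rough illustration of the matrix form, the sketch below assembles M and E from per-test-program records and evaluates the mean square error of a candidate coefficient vector. The parameter names are a naming convenience mirroring the 21 coefficients listed above, and the records and coefficient values are invented for the example.

```python
# Column order of M, mirroring the 21 coefficients of the macromodel template.
PARAMS = (
    ["Cyc_arith", "Cyc_ld", "Cyc_st", "Cyc_j", "Cyc_br_tk", "Cyc_br_utk",
     "Num_i", "Num_d", "Num_uncache", "Num_interlock", "Cyc_side_tie"]
    + [f"Cyc_{j}" for j in range(1, 11)]
)


def build_system(records):
    """Turn (parameter dict, reference energy) records into M and E.

    Parameters missing from a record are treated as zero.
    """
    M = [[float(params.get(p, 0)) for p in PARAMS] for params, _ in records]
    E = [float(energy) for _, energy in records]
    return M, E


def mean_square_error(M, C, E):
    """Mean of squared residuals (M * C - E) over all test programs."""
    residuals = [sum(m * c for m, c in zip(row, C)) - e for row, e in zip(M, E)]
    return sum(r * r for r in residuals) / len(residuals)


# Two made-up test-program records (far fewer than a real fit would need).
records = [
    ({"Cyc_arith": 10, "Cyc_ld": 5}, 20.0),
    ({"Cyc_arith": 4, "Cyc_1": 3}, 6.0),
]
M, E = build_system(records)

# Candidate coefficient vector: only two nonzero entries, chosen arbitrarily.
C = [0.0] * len(PARAMS)
C[PARAMS.index("Cyc_arith")] = 1.0
C[PARAMS.index("Cyc_ld")] = 2.0
mse = mean_square_error(M, C, E)
print(len(PARAMS), mse)
```

A unique least-squares solution for all 21 coefficients needs at least 21 test programs whose parameter rows are linearly independent, so a real characterization suite would be much larger than this toy example.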

  • 7/30/2019 0502 Ghlai Slide


    Energy Coefficients of the Xtensa Processor

    - Energy consumption for each base processor instruction category, per cycle
    - Energy consumption for the side effect, per cycle
    - Energy consumption for execution-time effects, per miss / per interlock
    - Energy consumption for the different custom hardware components, per cycle

  • 7/30/2019 0502 Ghlai Slide


    Absolute Accuracy Examination

    Application Energy Estimates

    - The maximum estimation error is 8.5%
    - The average absolute error is only 3.3%
    - The proposed energy estimation methodology is very fast
      - WattWatcher needs several more hours for energy estimation (RTL description generation + RTL simulation + power estimation using WattWatcher)
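The two accuracy figures quoted above are a maximum and a mean of per-application absolute percentage errors. A minimal sketch of that calculation follows; the estimate and reference values are invented and do not reproduce the paper's benchmark data.

```python
def percent_errors(estimates, references):
    """Signed percentage error of each estimate relative to its reference."""
    return [100.0 * (est - ref) / ref for est, ref in zip(estimates, references)]


# Made-up macromodel estimates and reference (RTL-level) energies.
estimates = [102.0, 95.0, 210.0, 49.0]
references = [100.0, 100.0, 200.0, 50.0]

errors = percent_errors(estimates, references)
mean_abs_error = sum(abs(e) for e in errors) / len(errors)
max_abs_error = max(abs(e) for e in errors)
print(errors, mean_abs_error, max_abs_error)
```

Taking absolute values before averaging matters: signed errors of opposite sign would otherwise cancel and understate the inaccuracy.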

  • 7/30/2019 0502 Ghlai Slide


    Absolute Accuracy Examination (cont.)

    - Energy consumption due to custom hardware can be significant
    - The accuracy of the macromodel is high for both the base processor and the custom hardware

  • 7/30/2019 0502 Ghlai Slide


    Relative Accuracy Examination

    - Our macromodel shows good relative accuracy
    - The proposed energy estimation methodology offers high relative accuracy at low effort (no custom processor generation, no RTL simulation)
    - Therefore, it is highly suitable for energy optimization studies

  • 7/30/2019 0502 Ghlai Slide


    Conclusions

    - Presented an efficient and accurate energy estimation methodology for extensible processors
      - High efficiency comes from requiring only instruction-set-simulation-based analysis of the application
      - High accuracy comes from dynamic analysis of the custom hardware usage pattern
    - Although it speeds up energy estimation, it still has good absolute accuracy (average absolute error of only 3.3%) and also achieves high relative accuracy