Autotuning at Illinois María Jesús Garzarán University of Illinois


Autotuning at Illinois

María Jesús Garzarán

University of Illinois


Outline

1. Why Autotuning?

2. What is Autotuning?

3. Research Problems


Why autotuning?

• In the era of parallelism…
• Applications and software must maintain high efficiency as machines evolve.
  – Otherwise, there is no reason for new machines.

• Problem: high efficiency requires laborious tuning.
  – Increased cost.
  – Low performance when there are not enough resources for tuning.

• We would like to automate tuning.


Compilers

• One way is compilers, but compilers have limitations:
  – They lack semantic information → fewer choices
  – They must target all applications
  – They must be reasonably fast

Compiler vs. Manual Tuning: Discrete Fourier Transform

Compiler vs. Manual Tuning: Matrix-Matrix Multiplication

[Figure: MFLOPS vs. matrix size for Intel MKL, icc -O3 -xT, and icc -O3; annotated gap: 20x.]

Compiler vs. Manual Tuning: Matrix-Matrix Multiplication

loop 1: c[i*N+j] += a[i*N+k]*b[k*N+j]

loop 2: c[i][j] += a[i][k]*b[k][j]

loop 3: C += a[i][k]*b[k][j]
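A minimal C sketch of the three variants above, assuming square N × N row-major matrices (N = 64 is an illustrative value) and assuming that loop 3's `C` is a scalar accumulator (`acc` here) stored back to c[i][j] after the k loop. All three compute the same product; only the form of the source handed to the compiler differs.

```c
#include <assert.h>
#include <stddef.h>

#define N 64  /* illustrative matrix size */
static_assert(N > 0, "matrix size must be positive");

/* loop 1: linearized (1-D) indexing into flat row-major arrays */
void mmm_linear(const double *a, const double *b, double *c) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            for (size_t k = 0; k < N; k++)
                c[i * N + j] += a[i * N + k] * b[k * N + j];
}

/* loop 2: 2-D array indexing */
void mmm_2d(double a[N][N], double b[N][N], double c[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            for (size_t k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* loop 3: accumulate into a scalar, so c[i][j] is loaded and stored once */
void mmm_scalar(double a[N][N], double b[N][N], double c[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            double acc = c[i][j];
            for (size_t k = 0; k < N; k++)
                acc += a[i][k] * b[k][j];
            c[i][j] = acc;
        }
}
```

Compiling the three functions with the same flags and timing them is one way to observe how sensitive a compiler's output can be to the source form.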


Compilers …

• Can and should improve

• But we will need other strategies (at least in the short term)


Outline

1. Why Autotuning?

2. What is Autotuning?

3. Research Problems

What is Autotuning?

• An emerging strategy: empirical search
  – Goal: automatically generate highly efficient code for each target machine (and input set).
  – Programmers develop metaprograms (programs that generate programs) that search the space of possible algorithms/implementations.

Autotuning with empirical search

[Figure: a metaprogram (description of the space of versions) drives a generator of versions that emits high-level code; a source-to-source optimizer and the native compiler turn it into object code, which is executed on training input data; the measured performance is fed back to the generator, and the best version is output as the selected high-level code.]

Autotuning

• More laborious than conventional programming, but:
  – Longer lifetime → cost reduction
  – Can accumulate experience → better results
  – Can afford to search more extensively → better results

Examples of Existing Autotuning Systems

• ATLAS: Whaley, Petitet, Dongarra (Tennessee)
• BeBop: Demmel, Yelick, Im, Vuduc (Berkeley)
• Datamining: Jian, Garzarán, Snir (Illinois)
• FFTW: Frigo (MIT)
• Illinois Sorting: Li, Garzarán, Padua (Illinois)
• Matrix-matrix multiplication for GPU: Jiang, Snir (Illinois)
• PHiPAC: Bilmes, Asanovic, Vuduc, Iyer, Demmel, Chin, Lan (Berkeley)
• Space Pruning for GPU: Ryoo, Rodrigues, Stone, Baghsorkhi, Ueng, Stratton, Hwu (Illinois)
• SPIRAL: Moura, Pueschel (CMU), Johnson (Drexel), Garzarán, Padua (Illinois)
• SPIKETune: Wong, Kuck (Intel), Sameh (Purdue), Padua (Illinois)


Outline

1. Why Autotuning?

2. What is Autotuning?

3. Research Problems

Autotuning with empirical search

[Figure: the empirical-search loop (metaprogram describing the version space, generator of versions, source-to-source optimizer, native compiler, execution on training input data, performance feedback, selected code), annotated with these questions:]
• How to specify the search space?
• How to drive the search?
• What is performance (execution time, power)?
• What to do when performance depends on the input?


Research Issues

1. What to do when performance depends on input

2. Modeling/Search

3. Description of the space

4. What to tune

5. What to tune for

Very promising, but much to learn


Issue 1: Performance depends on input

• When performance depends on the input, we must generate routines that adapt dynamically.
  – Illustrated with the generation of sorting routines.

[CGO04] Li, Garzarán, Padua. A Dynamically Tuned Sorting Library. In Proc. of the Int. Symp. on Code Generation and Optimization, 2004.

[CGO05] Li, Garzarán, Padua. Optimizing Sorting with Genetic Algorithms. In Proc. of the Int. Symp. on Code Generation and Optimization, 2005.

Issue 1: Sorting

• Different algorithms to perform sorting:
  – Radix sort
  – Quicksort
  – Merge sort

• No single algorithm is the best for all inputs and platforms


Our Contribution

• Design of hybrid algorithms and use of genetic search to find sorting routines that automatically adapt to the target machine and the input characteristics.

• Result:
  – Generation of the fastest sorting routines for sequential and parallel execution

Sorting

[Figure: sorting performance (keys per cycle) vs. standard deviation of the input for CC-Radix, Merge Sort, and Quicksort on Intel Xeon and AMD Athlon MP. Annotation: same input, different performance.]


Hybrid sorting for dynamic adaptation

[Figure: the sorting genome, a tree whose nodes are the primitives Divide with pivot, Divide by digit, Divide into blocks, and Select with entropy; a Select with entropy node branches on entropy < theta vs. ≥ theta.]

Example of hybrid sorting

[Figure, built up over several slides: the input is divided with a pivot into Bucket 1 and Bucket 2; for each bucket, operations are selected based on entropy (< theta vs. ≥ theta), choosing between Divide by digit and Divide into blocks, until both buckets are sorted.]

Learning: Algorithm Selection

[Figure: training inputs are run on the target machine; a learning mechanism builds a mapping from input data to the best algorithm, which is used at runtime.]
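One simple way to realize the learning step, sketched in C under assumptions the slide does not state: each training input is summarized by a single feature (for instance, the entropy or standard deviation of its keys), labeled with whichever of two algorithms ran fastest on the target machine, and the learner fits a one-threshold decision stump. At runtime the same feature is computed for the actual input and mapped to an algorithm. The enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Label: which algorithm was fastest on a training input. */
enum algo { ALGO_PIVOT = 0, ALGO_RADIX = 1 };

/* Learning (offline, on the target machine): fit the stump
   "feature >= theta -> RADIX, else PIVOT", trying each training
   feature value as theta and keeping the one with the fewest errors. */
double learn_threshold(const double *feature, const enum algo *best, size_t n) {
    assert(n > 0);
    double best_theta = feature[0];
    size_t best_errors = n + 1;
    for (size_t c = 0; c < n; c++) {
        size_t errors = 0;
        for (size_t i = 0; i < n; i++) {
            enum algo predicted = feature[i] >= feature[c] ? ALGO_RADIX : ALGO_PIVOT;
            if (predicted != best[i]) errors++;
        }
        if (errors < best_errors) { best_errors = errors; best_theta = feature[c]; }
    }
    return best_theta;
}

/* Used at runtime: map the actual input's feature to an algorithm. */
enum algo select_algorithm(double feature, double theta) {
    return feature >= theta ? ALGO_RADIX : ALGO_PIVOT;
}
```

A richer learner (more features, more algorithms, a deeper tree) follows the same pattern: training happens once per machine, and the runtime cost is only computing the features and a lookup.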

Results: Sequential Sorting

[Figure: performance of ClassifierSort, IBM ESSL, and C++ STL on IBM Power3; annotated difference: 26%.]

Results: Parallel Sorting

[Figure: two panels, both on Intel Quad Core.]


Research Issues

1. Performance depends on input

2. Modeling/Search

3. Description of the space

4. What to tune

5. What to tune for


Issue 2: Modeling/Search

• When the search space is too big, we must use models or better search mechanisms. Illustrated with:

1. An analytical model and hybrid approach for ATLAS

[PLDI03] Yotov, Li, Ren, Cibulskis, DeJong, Garzarán, Padua, Pingali, Stodghill, and Wu. A Comparison of Empirical and Model-driven Optimization. In PLDI, 2003.

[Proc of IEEE] Yotov, Li, Ren, Garzarán, Padua, Pingali, and Stodghill. Is Search Really Necessary to Generate High-Performance BLAS? In Proc. of the IEEE, 2005.

[LCPC05] Epshteyn, Garzarán, DeJong, Padua, Ren, Li, Yotov, and Pingali. Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization. In LCPC, 2005.

2. Genetic search for sorting [CGO04, CGO05]

ATLAS Modeling

• ATLAS = Automatically Tuned Linear Algebra Software, developed by R. Clint Whaley, Antoine Petitet, and Jack Dongarra at the University of Tennessee.

• ATLAS uses empirical search to automatically generate highly tuned Basic Linear Algebra Subprograms (BLAS).
  – It uses search to adapt to the target machine.

Our Contribution

• Development of methods to speed up the search process:
  – Analytical models that replace the search
  – Hybrid models that combine models with empirical search

[LCPC05] Epshteyn, Garzarán, DeJong, Padua, Ren, Li, Yotov, and Pingali. Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization. In LCPC, 2005.

• The result:
  – Same performance
  – Faster generation

ATLAS Infrastructure

[Figure: Detect Hardware Parameters supplies NR, MulAdd, Latency, and L1Size to the ATLAS Search Engine (MMSearch); the search engine passes NB, MU, NU, KU, xFetch, MulAdd, and Latency to the ATLAS MM Code Generator (MMCase); the generated MiniMMM source is compiled, executed, and measured, and the resulting MFLOPS are fed back to the search engine.]

Modeling for Optimization Parameters

• Our Modeling Engine

• Optimization parameters:
  – NB: hierarchy of models (later)
  – MU, NU: $MU \cdot NU + MU + NU + \mathit{Latency} \le NR$ (number of registers)
  – KU: maximize subject to the L1 instruction cache size
  – Latency, MulAdd: from hardware parameters
  – xFetch: set to 2

[Figure: the ATLAS infrastructure with a model in place of the search engine; the model takes NR, MulAdd, Latency, L1Size, and L1I$Size from Detect Hardware Parameters and computes NB, MU, NU, KU, xFetch, MulAdd, and Latency for the ATLAS MM Code Generator (MMCase).]
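A brute-force C sketch of the register-tile model, assuming the constraint MU·NU + MU + NU + Latency ≤ NR suggested by the figure labels (an MU × NU tile of C, one column of A, one row of B, plus registers to cover the multiply-add latency); the published model's exact latency term may differ. Among feasible tiles it keeps the largest MU·NU, preferring the more square tile on ties. The function and struct names are hypothetical.

```c
#include <assert.h>

struct reg_tile { int mu, nu; };

static int skew(struct reg_tile t) { return t.mu > t.nu ? t.mu - t.nu : t.nu - t.mu; }

/* Choose MU x NU maximizing MU*NU subject to
   MU*NU + MU + NU + latency <= nr (assumed form of the model). */
struct reg_tile choose_register_tile(int nr, int latency) {
    assert(nr - latency >= 3);   /* MU = NU = 1 must be feasible */
    struct reg_tile best = {1, 1};
    for (int mu = 1; mu <= nr; mu++)
        for (int nu = 1; nu <= nr; nu++) {
            if (mu * nu + mu + nu + latency > nr) continue;
            struct reg_tile cand = {mu, nu};
            int area = mu * nu, best_area = best.mu * best.nu;
            if (area > best_area || (area == best_area && skew(cand) < skew(best)))
                best = cand;
        }
    return best;
}
```

For example, with 32 registers and the latency term ignored, the sketch picks MU = 3, NU = 7 (21 accumulators, 31 registers in total); a model replaces the empirical search over MU and NU with this closed computation.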

Modeling for Tile Size (NB)

• Models of increasing complexity:
  – $3\,NB^2 \le C$
    • The whole working set fits in L1
  – $NB^2 + NB + 1 \le C$
    • Fully associative
    • Optimal replacement
    • Line size: 1 word
  – Refined inequalities (two alternatives each) for:
    • Line size > 1 word
    • LRU replacement

[Figure: the A (M × K), B (K × N), and C (M × N) matrices with NB × NB tiles, for loops over M (I), N (J), and K.]
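The first two NB models can be solved directly for the largest integer tile size. A small C sketch; C is the L1 capacity measured in matrix elements, so a 32 KB L1 holding 8-byte doubles gives C = 4096 (an illustrative configuration). The refined line-size and LRU models would replace the inequality tested inside each loop.

```c
#include <assert.h>

/* Largest NB with 3*NB^2 <= c: the three NB x NB tiles (A, B, C)
   all fit in an L1 cache holding c elements. */
int nb_whole_working_set(long c) {
    assert(c >= 3);
    long nb = 1;
    while (3 * (nb + 1) * (nb + 1) <= c) nb++;
    return (int)nb;
}

/* Largest NB with NB^2 + NB + 1 <= c: fully associative cache,
   optimal replacement, 1-word lines. */
int nb_optimal_replacement(long c) {
    assert(c >= 3);
    long nb = 1;
    while ((nb + 1) * (nb + 1) + (nb + 1) + 1 <= c) nb++;
    return (int)nb;
}
```

For C = 4096 the two models give NB = 36 and NB = 63 respectively; the simplest model is the most conservative.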

MMM Performance

[Figure: MMM performance (MFLOPS) vs. matrix size on SGI R12000, Sun UltraSparc III, and Intel Pentium III, comparing BLAS, COMPILER, ATLAS, and MODEL.]

Models/Search

• Models reduce search time to zero.

• However, search is still necessary when no model exists.

Genetic search for sorting

[Figure: the sorting genome, built from Divide with pivot, Select with entropy (< theta / ≥ theta), Divide into blocks, and Divide by digit.]

Genetic operators are used to derive new offspring:
  – Mutation (add or remove subtrees, change parameters)
  – Crossover
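A compact C sketch of the genetic operators. To stay short it assumes a flat, fixed-length genome of three integer parameters (hypothetical stand-ins such as a threshold, a radix width, and a block size) rather than the tree-shaped sorting genome, and a made-up fitness function in place of generating and timing a sorting routine. Selection is a 2-way tournament, crossover is one-point, mutation resets one gene, and the best individual is always carried over (elitism), so the best fitness never decreases.

```c
#include <assert.h>
#include <stdint.h>

#define GENES 3
#define POP 16
#define GENERATIONS 40
static_assert(GENES >= 2, "one-point crossover needs at least two genes");

/* Flat stand-in for a sorting genome (hypothetical parameters). */
typedef struct { int g[GENES]; } genome;
static const int LO[GENES] = {0, 4, 16};
static const int HI[GENES] = {16, 16, 1024};

/* Small deterministic LCG so runs are reproducible. */
static uint32_t rng_state = 12345u;
static uint32_t rnd(void) { rng_state = rng_state * 1664525u + 1013904223u; return rng_state >> 8; }
static int rnd_in(int lo, int hi) { return lo + (int)(rnd() % (uint32_t)(hi - lo + 1)); }

/* Hypothetical fitness (higher is better); a real autotuner would build
   the routine described by the genome and time it on training inputs. */
static long fitness(const genome *x) {
    long d0 = x->g[0] - 9, d1 = x->g[1] - 11, d2 = (x->g[2] - 256) / 16;
    return -(d0 * d0 + d1 * d1 + d2 * d2);
}

static genome random_genome(void) {
    genome x;
    for (int k = 0; k < GENES; k++) x.g[k] = rnd_in(LO[k], HI[k]);
    return x;
}

/* Mutation: replace one randomly chosen gene with a new random value. */
static void mutate(genome *x) {
    int k = rnd_in(0, GENES - 1);
    x->g[k] = rnd_in(LO[k], HI[k]);
}

/* One-point crossover: genes before the cut from p, the rest from q. */
static genome crossover(const genome *p, const genome *q) {
    int cut = rnd_in(1, GENES - 1);
    genome c;
    for (int k = 0; k < GENES; k++) c.g[k] = k < cut ? p->g[k] : q->g[k];
    return c;
}

static int best_index(const genome *pop) {
    int b = 0;
    for (int i = 1; i < POP; i++) if (fitness(&pop[i]) > fitness(&pop[b])) b = i;
    return b;
}

/* 2-way tournament selection. */
static const genome *pick(const genome *pop) {
    const genome *a = &pop[rnd_in(0, POP - 1)], *b = &pop[rnd_in(0, POP - 1)];
    return fitness(a) >= fitness(b) ? a : b;
}

genome genetic_search(long *initial_best) {
    genome pop[POP], next[POP];
    for (int i = 0; i < POP; i++) pop[i] = random_genome();
    *initial_best = fitness(&pop[best_index(pop)]);
    for (int gen = 0; gen < GENERATIONS; gen++) {
        next[0] = pop[best_index(pop)];                /* elitism */
        for (int i = 1; i < POP; i++) {
            next[i] = crossover(pick(pop), pick(pop));
            if (rnd() % 4 == 0) mutate(&next[i]);       /* about 25% mutation rate */
        }
        for (int i = 0; i < POP; i++) pop[i] = next[i];
    }
    return pop[best_index(pop)];
}
```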

Issue 2: Modeling/Search

We need tools to guide models and search:

P-Ray: characterization of hardware

[LCPC08] Duchateau, Sidelnik, Garzarán, Padua. P-RAY: A Suite of Micro benchmarks for Multi-core Architectures. In LCPC, 2008.

Characterize Hardware

• P-Ray: development of benchmarks to measure hardware characteristics of multicore platforms

[Figure: the ATLAS infrastructure diagram, with Detect Hardware Parameters supplying NR, MulAdd, Latency, L1Size, and L1I$Size to the search engine and code generator.]

Our Contribution

• P-Ray: a tool to measure:
  – Block size
  – Cache mapping
  – Processor mapping
  – Effective bandwidth

• The result:
  – Correct results for 3 different platforms (Intel Xeon Harpertown, Sun UltraSparc T1 Niagara, Intel Core 2 Quad Kentsfield)

P-Ray: Processor Mapping

[Figure: 8-core Intel Harpertown with two chips (Chip 1, Chip 2); cores 1 and 3, 5 and 7, 2 and 4, and 6 and 8 each share an L2 cache.]


Research Issues

1. Performance depends on input

2. Modeling/Search

3. Description of the space

4. What to tune

5. What to tune for

Issue 3: Description of the Space

• The ATLAS generator is written in C

• We need more effective notations to implement a generator (i.e., to describe the search space)

• Two possibilities:
  – Domain-specific languages
  – General-purpose languages

Issue 3: Description of the Space

Illustrated with:

1. SPIRAL (domain-specific language)

[Proc. of IEEE05] Püschel, Moura, Johnson, Padua, Veloso, Singer, Xiong, Franchetti, Gacic, Voronenko, Chen, Johnson, and Rizzolo. Spiral: Code Generation for DSP Transforms. Proc. of the IEEE, 2005.

http://www.spiral.net

2. Metalanguage (general-purpose language)

[LCPC05] Donadio, Brodman, Roeder, Yotov, Barthou, Cohen, Garzarán, Padua, and Pingali. A Language for the Compact Representation of Multiple Program Versions. In LCPC, 2005.

SPIRAL

• SPIRAL: a generator of signal processing algorithms (DFT, DCT, WHT, filters, …)

• SPIRAL uses empirical search to generate routines that adapt to the target machine:
  – Sequential, parallel, SIMD, …

SPIRAL Contribution

• Declarative domain-specific language and rewriting rules to specify the search space.

• The result:
  – Generation of routines that run faster than IPP (manually tuned)
  – Intel has started to use SPIRAL to generate parts of the IPP library

SPIRAL

• Search based on breakdown and rewriting rules:

This is SPL, the SPIRAL metalanguage.

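SPIRAL Program Generation

• Transform: a parameterized matrix, e.g. $\mathrm{DFT}_n$

• Rule: a breakdown strategy, e.g. Cooley-Tukey (CT), expressed as a product of sparse matrices:
  $\mathrm{DFT}_{mn} = (\mathrm{DFT}_m \otimes I_n)\, D\, (I_m \otimes \mathrm{DFT}_n)\, P$
  where $D$ is a diagonal matrix of twiddle factors and $P$ is a permutation.

• Ruletree: recursive application of CT down to $\mathrm{DFT}_2$ leaves; for $\mathrm{DFT}_8$ there are two:
  (a) DFT_8 → (DFT_2, DFT_4), with DFT_4 → (DFT_2, DFT_2)
  (b) DFT_8 → (DFT_4, DFT_2), with DFT_4 → (DFT_2, DFT_2)

• SPL formula: the ruletree written as a matrix expression; for (a), with $F_2 = \mathrm{DFT}_2$,
  $\mathrm{DFT}_8 = (F_2 \otimes I_4)\, D\, (I_2 \otimes \mathrm{DFT}_4)\, P$,
  where $\mathrm{DFT}_4 = (F_2 \otimes I_2)\, D'\, (I_2 \otimes F_2)\, P'$ by the same rule.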
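Counting ruletrees shows how quickly SPIRAL's search space grows. A minimal C sketch, assuming only the Cooley-Tukey rule with power-of-two splits and DFT_2 as the only leaf (SPIRAL has more rules, so its real space is larger). Under these assumptions the count for DFT_{2^k} is the Catalan number C_{k-1}.

```c
#include <assert.h>

/* Counts Cooley-Tukey ruletrees for DFT_{2^k}: each node DFT_{2^n} (n >= 2)
   splits into (DFT_{2^a}, DFT_{2^(n-a)}) for some 1 <= a < n, and DFT_2 is
   the only leaf. */
unsigned long long count_ruletrees(int k) {
    assert(k >= 1 && k <= 32);   /* results stay well within 64 bits */
    unsigned long long t[33] = {0};
    t[1] = 1;                    /* DFT_2: a single leaf */
    for (int n = 2; n <= k; n++)
        for (int a = 1; a < n; a++)
            t[n] += t[a] * t[n - a];   /* left subtree choices x right subtree choices */
    return t[k];
}
```

count_ruletrees(3) is 2, matching ruletrees (a) and (b) for DFT_8; for DFT_1024 (k = 10) there are already 4862, each with different memory access patterns and ILP.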

CT


SPIRAL Program Generation

SPIRAL

• Why is search important?
  – Different formulas (algorithms) have different execution times:
    • They differ in their memory access patterns
    • They have different ILP


SPIRAL Performance Results


Metaprogramming

• General-purpose programming of autotuned libraries and applications.

• A metaprogram contains a compact description of the space of program versions and how to proceed with the search.

Metaprogram example

  %try s in {2,4,8}            ← search strategy
  for j=1 to 128 by %s         ← program shape for each value
    %for k=j to j+s-1
      a(%k) = …

Generated versions:

  for j=1 to 128 by 2
    a(j) = …
    a(j+1) = …

  for j=1 to 128 by 4
    a(j) = …
    a(j+1) = …
    a(j+2) = …
    a(j+3) = …

  for j=1 to 128 by 8
    a(j) = …
    a(j+1) = …
    a(j+2) = …
    a(j+3) = …
    a(j+4) = …
    a(j+5) = …
    a(j+6) = …
    a(j+7) = …
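A minimal C sketch of what this metaprogram expands to and how the search might choose among the versions. It stands in for the metaprogram with three hand-expanded functions and a hypothetical loop body (the slide elides the right-hand side), uses 0-based indices, and times each version with the standard clock(); a real tuner would repeat measurements and guard more carefully against the compiler optimizing the work away.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define LEN 128   /* "for j=1 to 128", shifted to 0-based indices */
static_assert(LEN % 8 == 0, "LEN must be a multiple of every unroll factor");

/* Hypothetical loop body: the slide elides the right-hand side. */
#define BODY(k) (a[(k)] = 0.5 * a[(k)] + 1.0)

/* The three versions generated for s = 2, 4, 8. */
static void unroll2(double *a) {
    for (size_t j = 0; j < LEN; j += 2) { BODY(j); BODY(j + 1); }
}
static void unroll4(double *a) {
    for (size_t j = 0; j < LEN; j += 4) { BODY(j); BODY(j + 1); BODY(j + 2); BODY(j + 3); }
}
static void unroll8(double *a) {
    for (size_t j = 0; j < LEN; j += 8) {
        BODY(j); BODY(j + 1); BODY(j + 2); BODY(j + 3);
        BODY(j + 4); BODY(j + 5); BODY(j + 6); BODY(j + 7);
    }
}

static void (*const versions[])(double *) = {unroll2, unroll4, unroll8};
const int unroll_factor[] = {2, 4, 8};
static volatile double sink;   /* keeps the results observable */

/* Search strategy (%try): time each version, keep the fastest.
   Returns an index into unroll_factor. */
int select_fastest(int reps) {
    assert(reps > 0);
    static double a[LEN];
    int best = 0;
    double best_time = 0.0;
    for (int v = 0; v < 3; v++) {
        for (size_t i = 0; i < LEN; i++) a[i] = 0.0;
        clock_t start = clock();
        for (int r = 0; r < reps; r++) versions[v](a);
        double t = (double)(clock() - start) / CLOCKS_PER_SEC;
        sink += a[0];
        if (v == 0 || t < best_time) { best = v; best_time = t; }
    }
    return best;
}
```

`unroll_factor[select_fastest(100000)]` yields the chosen s; the winner can differ between machines and compilers, which is why the choice is made by search rather than fixed in the source.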

Research Issues

1. Performance depends on input

2. Modeling/Search

3. Description of the space

4. What to tune

5. What to tune for


Issue 4: What to tune

1. Kernels (MMM, FFT, sorting, …)

2. Codelets

3. Primitives


Codelets

• A class of (short) code sequences that appear often in an application domain

• The set of codelets should cover much of the execution domain

• Applications are decomposed into codelets

• Codelets are autotuned

Codelets

• Need a database of codelets
  – Each codelet in the database contains a set of compiler optimizations

• The application is decomposed into codelets that are matched against the codelets in the database
  – Application codelets are optimized using the set of optimizations of the matched codelet in the database

• Collaboration with David Kuck and David Wong, Intel

Primitive Operations

• Same as codelets, but not identified automatically by the compiler

• The user is expected to write the application using primitives

• The primitive operations are tuned for each target platform

Example of Primitive Operations

• HTA: Hierarchically Tiled Arrays

[PPoPP06] Bikshandi, Guo, Hoeflinger, Almasi, Fraguela, Garzarán, Padua, and von Praun. Programming for Parallelism and Locality with Hierarchically Tiled Arrays. In PPoPP, 2006.

[PPoPP08] Guo, Bikshandi, Fraguela, Garzarán, and Padua. Programming with Tiles. In PPoPP, 2008.

Hierarchically Tiled Arrays (HTAs)

• HTA is a data type in which tiles are explicit

• HTAs are manipulated with data-parallel primitives
  – HTA programs look like sequential programs in which parallelism is encapsulated in the data-parallel primitives

• Result:
  – Programs that run as fast as MPI (tested with the NAS benchmarks)
  – Fewer lines of code
  – Portable code


FFT using HTA parallel primitives

Can be autotuned

Data Parallel Primitives

• Challenge:

Can we extend data-parallel primitive operations to other complex data types, such as sets, trees, and graphs?


Research Issues

1. Performance depends on input

2. Modeling/Search

3. Description of options/space search

4. What to tune

5. What to tune for

Issue 5: What to tune for

1. Execution time (all the previous systems)

2. Power (preliminary data in the next slides)

3. Space

4. Reliability

Power in SPIRAL

• Processors allow software control of operating frequency and voltage

• For example, the Intel Pentium M 770 has 6 settings:
  – 2.13 GHz at 1.340 V (max performance)
  – 800 MHz at 0.988 V (min power/energy)
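A rough worked estimate of why the lowest setting saves power, using the standard CMOS approximation for dynamic power, $P_{\mathrm{dyn}} \approx \alpha\, C_{\mathrm{sw}}\, V^2 f$, and assuming the activity factor α and switched capacitance C_sw stay constant (static power is ignored):

```latex
\frac{P_{800\,\mathrm{MHz}}}{P_{2.13\,\mathrm{GHz}}}
  \approx \left(\frac{0.988}{1.340}\right)^{2} \cdot \frac{800}{2133}
  \approx 0.544 \times 0.375
  \approx 0.20
```

Dynamic power at the minimum setting is thus roughly one fifth of that at the maximum. For a CPU-bound region, however, the run takes about 2133/800 ≈ 2.67 times longer, so dynamic energy falls only to about 0.20 × 2.67 ≈ 0.54 of its high-setting value (the V² ratio); in a memory-bound region the slowdown is smaller, which is consistent with running memory-bound regions at low frequency.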

Experimental Setup

• Intel Pentium M model 770, six <frequency, voltage> settings:
  – <2133 MHz, 1.34 V>, <1866 MHz, 1.27 V>, <1600 MHz, 1.2 V>, <1333 MHz, 1.13 V>, <1067 MHz, 1.06 V>, <800 MHz, 0.99 V>

• Measurements:
  – HW: Agilent 34134A current probe and Agilent 34401A DMM
  – SW: SPIRAL-controlled automatic runtime and energy measurement routine

• Optimization space:
  – Voltage-frequency scaling

Dynamic voltage-frequency scaling

• Use of voltage scaling instructions:
  – CPU-bound region → run at high frequency
  – Memory-bound region → run at low frequency

• Minimal impact on execution time and a significant reduction in energy consumption
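A sketch of this frequency-selection policy in C. The single miss-ratio threshold separating memory-bound from CPU-bound regions is a hypothetical parameter (in an autotuner it would itself be tuned), and only the two extreme settings are modeled. The helper also counts how many frequency switches a schedule needs, since each switch has a cost.

```c
#include <assert.h>
#include <stddef.h>

enum freq { FREQ_LOW = 0, FREQ_HIGH = 1 };

/* Hypothetical policy: a region whose cache miss ratio exceeds `threshold`
   is treated as memory bound (low frequency); otherwise CPU bound (high). */
enum freq choose_frequency(double miss_ratio, double threshold) {
    assert(miss_ratio >= 0.0);
    return miss_ratio > threshold ? FREQ_LOW : FREQ_HIGH;
}

/* Build a frequency schedule for a sequence of profiled regions and
   return the number of frequency switches it requires. */
size_t plan_schedule(const double *miss, size_t n, double threshold, enum freq *out) {
    size_t switches = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = choose_frequency(miss[i], threshold);
        if (i > 0 && out[i] != out[i - 1]) switches++;
    }
    return switches;
}
```

When memory-bound and CPU-bound regions alternate, the switch count grows with every change of behavior, which limits how much voltage scaling can save.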

Dynamic voltage-frequency scaling: memory profile

[Figure: cache miss ratio vs. time for WHT of size 2^19 (out of cache); each point shows the cache miss ratio every 100 seconds. A region is marked for zooming.]

Dynamic voltage-frequency scaling: memory profile

[Figure: zoomed view of the cache miss ratio vs. time for WHT of size 2^19 (out of cache), each point every 100 seconds; phases are labeled low frequency and high frequency.]

Dynamic voltage-frequency scaling: results

[Figure: energy versus execution time for WHT of size 2^19; energy (joules) vs. execution time (seconds).]

Dynamic voltage-frequency scaling: results

[Figure: energy versus execution time for WHT of size 2^19, with the Dynamic Voltage Scaling result annotated: same execution time with 10% less energy; same energy with less execution time.]

Compiler Optimizations (Future work)

[Figure: two plots of cache miss ratio vs. iterations (0 to 5000).]

Applying dependence analysis and grouping together iterations with similar cache miss ratios increases the benefit of dynamic voltage scaling.
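The grouping idea can be sketched in C, assuming dependence analysis has already shown that the iterations may be reordered freely and that a single hypothetical threshold classifies each iteration's miss ratio. After grouping, at most one frequency switch is needed instead of one per change in behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Reorder iteration indices so that CPU-bound iterations (miss ratio at or
   below `threshold`) run first and memory-bound ones run together after them.
   Only legal if the iterations may be reordered. Writes the new order to
   `order` and returns the number of frequency switches it needs (0 or 1). */
size_t group_iterations(const double *miss, size_t n, double threshold, size_t *order) {
    assert(n == 0 || (miss != NULL && order != NULL));
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (miss[i] <= threshold) order[k++] = i;   /* high frequency group */
    size_t cpu_bound = k;
    for (size_t i = 0; i < n; i++)
        if (miss[i] > threshold) order[k++] = i;    /* low frequency group */
    return (cpu_bound > 0 && cpu_bound < n) ? 1 : 0;
}
```

For the miss-ratio sequence {2, 30, 3, 28, 1} with threshold 10, the original order needs four switches, while the grouped order {0, 2, 4, 1, 3} needs one.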


Research Agenda

1. Performance depends on input

2. Modeling/Search

3. Description of the space

4. What to tune

5. What to tune for
