22
IBM Research - Tokyo June 15, 2012 | ISMM 2012 at Beijing, China © 2012 IBM Corporation Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

IBM Research - Tokyo

June 15, 2012 | ISMM 2012 at Beijing, China © 2012 IBM Corporation

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

Hiroshi Inoue and Toshio NakataniIBM Research - Tokyo

Page 2: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

2

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Motivation and GoalCache miss information from Hardware Performance Monitor (HPM) is useful for runtime optimizations

– Prefetch injection: Adl-Tabatabai et al. 2004

– Object placement optimization: Schneider et al. 2007, etc..

HPM is difficult to use in the real world– HPM may require a special device driver and root privilege

– Only one process can use hardware at a time

To enable runtime optimization by identifying the sources of cache misses without using HPM

Our Goal

Page 3: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

3

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Presentation Overview

Motivation and Goal

Analysis– Key Observation– Our Technique– Evaluation in Coverages

Optimization– Object Alignment and Collocation Optimizations– Our Techniques– Performance Evaluation

Summary

Page 4: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

4

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Key Observation

Many cache misses in Java programs are caused in a simple idiomatic code pattern

We can heuristically identify instructions and classes that may cause frequent cache misses by matching hot loops with the idiomatic pattern

load a reference and touch the referenced object in a hot loop

Pattern

Page 5: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

5

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

So Simple Basic Code Pattern Tends to Cause Frequent Cache Misses

ClassA objA;while (!end) {

...objA = ...;...access to objA; ...

}

access to the referenced objecta field access (load/store)a metadata access (checkcast, monitor enter)

load a reference from a field of an object or a return value of a method call

within a hot loopdetected by software-based profiling

This access to objA tends to cause frequent cache misses

miss

Page 6: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

6

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Anti-Pattern That Rarely Causes Cache Misses

ClassA objA;while (!end) {

...objA = ...;...access to objA; ...

}

access to the referenced objecta field access (load/store)a metadata access (checkcast, monitor enter)

within a hot loopdetected by software-based profiling

This access to objA does NOT cause frequent cache misses

if the first load is loop invariant

Page 7: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

7

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Implementation

We implemented the analysis in 32-bit IBM J9/TR JVM Java 6 SR2We execute pattern matching in JIT compiler

– after applying optimizations including method inlining and loop-invariant code motion

– using execution frequency information obtained by software-based profiling to identify the hot loops

– only for hot methods that are recompiled with higher optimization levels than the initial level

Page 8: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

8

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Evaluation

EnvironmentProcessor: POWER6 4.0GHz

– 64-KB L1D cache, 64-KB L1I cache– 4-MB unified L2 cache– 128-byte cache line for both L1 and L2 cache

OS: RedHat Enterprise Linux 5.2Benchmark: SPECpower_ssj2008, SPECjbb2005, SPECjvm2008, DaCapo-9.12

We show coverages by instructions identified by our technique forStatic count of load and store instructionsL1D cache missesL2 cache data missesMemory accesses

Page 9: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

9

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Coverages by Instructions Selected by Our Technique

total in JIT-compiledcode

0%10%20%30%40%50%60%70%80%90%

100%

static count ofload/store

instructions

L1D miss L2 data miss

total in hot methods(upper boundfor us)

selectedby ourtechnique

(average of all benchmarks, see the paper for more detail)

☺ Our technique selects only 2.8% of load and store instructions and they cover about 50% of the total cache misses compared to total in JIT-compiled code

Page 10: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

10

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Coverages by Instructions Selected by Our Technique

total in JIT-compiledcode

0%10%20%30%40%50%60%70%80%90%

100%

static count ofload/store

instructions

L1D miss L2 data miss

total in hot methods(upper boundfor us)

selectedby ourtechnique

(average of all benchmarks, see the paper for more detail)

☺ Our technique selects 14% of load and store instructions and they cover 64% in L1 miss and 69% in L2 miss compared to total in hot methods

Page 11: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

11

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

0%10%20%30%40%50%60%70%80%90%

100%

Memory access L1D miss L2 data miss

Not Only Accessed Frequently

total in JIT-compiledcode

total in hot methods(upper boundfor us)

selectedby ourtechnique

(average of all benchmarks, see the paper for more detail)

☺ Instructions selected by our technique causes about 2x more cache misses per executionthan the instructions not selected

Page 12: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

12

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Presentation Overview

Motivation and Goal

Analysis– Key Observation– Our Technique– Evaluation in Coverages

Optimization– Object Alignment and Collocation Optimizations– Our Techniques– Performance Evaluation

Summary

Page 13: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

13

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Application in Runtime Optimization

We implemented two object placement optimizations to reduce cache misses

– Object alignment

– Object collocation

•••

cache line (128 byte)

objAhot fields

••• ••• ObjAhot fields

•••padding

align object

••••••

cache line (128 byte) reference

••••••

reference

objAcollocate

two objectsobjB objBobjA

Both techniques can reduce cache misses from two to one

Page 14: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

14

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Our Approach for Optimizations

We identify target classes for optimization based on our analysis in JIT compiler

– If two accesses to distinct fields of a object are selected in a hot loop

select the class for target of alignment optimization– If two accesses to objects, where one has a reference to

another, are selected in a hot loopselect the pair of classes for targets of collocation

optimization

We do optimizations both in garbage collector (for objects in tenure space) or at allocation time (for objects in nursery space)

Page 15: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

15

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Pattern for Alignment Optimization

Derived from the basic pattern

ClassA objA;while (!end) { // in a hot loop

...// 1) first, load a reference to a ClassA’s instanceobjA = ...;...// 2) then, access at least two different fields of objAaccess to objA.field1; ...access to objA.field2;...

}

objA

loop variant load

ClassA is selected for target of alignment optimization

Page 16: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

16

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Pattern for Collocation Optimization

ClassA objA;ClassB objB;while (!end) { // in a hot loop

...// 1) first, load a reference to a ClassA’s instanceobjA = ...;...// 2) next, load a reference of ClassB from objAobjB = objA.referenceToClassB;...// 3) then, access at least one field of objBaccess to objB.field1; ...

}

objA

objB

loop variant load

pair of ClassA and ClassB is selected for target of collocation optimization

Page 17: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

17

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Special Handling For checkcast OperationClassA objA;ClassS objS; // ClassS is a super class of ClassAwhile (!end) { // in a hot loop

...// 1) first, load a reference of a super class of ClassAobjS = ...;...// 2) next, cast objC to ClassA (access to object header)objA = (ClassA) objS;...// 3) then, access at least one field of objAaccess to objA.field1;...

}

objA

loop variant load

header accessed bycheckcast

ClassA (not ClassS) is selected for target of alignment optimizationCommon pattern in HashMap or TreeMap accesses in Java

Page 18: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

18

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Performance Improvements by Optimizations

high

er is

fast

er

Object collocationObject alignment

Our technique achieved comparable performance gains in cache-miss-intensive programs without relying on the hardware help

0.9

0.95

1

1.05

1.1

1.15

SPECpower_

ssj20

08

SPECjbb20

05

SPECjvm20

08

DaCap

o-9.12

rela

tive

thro

ughp

ut .

Baseline (w/o alignment optimization)Our software-only optimizationHPM-based optmization

0.9

0.95

1

1.05

1.1

1.15

SPECpower_

ssj20

08

SPECjbb20

05

SPECjvm20

08

DaCap

o-9.12

Baseline (w/o collocation optimization)Our software-only optimizationHPM-based optmization

(non-zeroorigin)

non intensivecache miss intensivecache miss intensive non intensive

Page 19: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

19

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Remaining Challenges

For Optimizations– Pattern matching in compiler cannot tell us the location of the

objects in the Java heap (e.g. tenure or nursery)– An instance of a subclass of the identified target may cause

cache misses More detailed software-based profiling can help (in trade for additional overhead)

For Analysis– Current pattern matching cannot identify frequent cache misses

caused by conflicting writes from multiple threadDifferent patterns and profiling information is required to achieve higher coverage

Page 20: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

20

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Summary

We present a technique to identify the instructions and objects that frequently cause cache misses in Java programs without relying on the HPM

Matching hot loops with simple idiomatic patterns worked very well for many Java programs

We showed the effectiveness of our approach using two types of optimizations

Page 21: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

21

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

backup

Page 22: Identifying the Sources of Cache Misses in Java Programs ......2 IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters

22

IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters © 2012 IBM Corporation

Coverage for each benchmark (L1D cache misses)

0%10%20%30%40%50%60%70%80%90%

100%

SPECpower_

ssj20

08SPECjbb

2005

compil

er.co

mpiler

compil

er.su

nflow

compre

ssde

rbympe

gaud

iose

rial

sunfl

owxm

l.tran

sform

xml.v

alida

tion

avror

aba

tikec

lipse fop h2

jytho

nlui

ndex

lusea

rch pmd

sunfl

owtom

cat

trade

bean

stra

deso

apxa

lanav

erage

cove

rage

(rat

io to

the

tota

l num

ber o

f eve

nts

cau

sed

in J

IT-c

ompi

led

code

)

Our technique with Aggressive threshold All in hot methods (upper bound for us) All in JITted code