2017.11.18
Kenji Kazumura
G1GC as Seen from the CPU
Copyright 2017 FUJITSU LIMITED
JJUG CCC 2017 Fall
C-7
Agenda
Motivation
History of GC
PA analysis
JIT and GC
Closing remarks
About the speaker
Professional in Java Core Tech
JCP Executive Committee
JSR382 Configuration API Expert Group
Java online courses
https://directshop.fom.fujitsu.com/shop/commodity_param/ctc/el_middleit/shc/0/cmc/ASP03737
https://directshop.fom.fujitsu.com/shop/commodity_param/ctc/el_middleit/shc/0/cmc/ASP03738
@kkzr
Agenda: Motivation
GC trends
[Figure: Google Trends search interest for "g1 gc", "parallel gc", and "shenandoah gc"; vertical axis 0 to 60]
https://trends.google.com
G1GC
G1GC is the default GC in JDK 9
It aims to deliver throughput and pause-time goals at the same time
"It attempts to meet garbage collection pause-time goals with high probability while achieving high throughput with little need for configuration."
HotSpot Virtual Machine Garbage Collection Tuning Guide
https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector.htm
SPECjbb2015
Two metrics: throughput (max-jOPS) and response (critical-jOPS)
critical-jOPS world record (as of November 2017):
https://www.spec.org/jbb2015/results/res2017q4/jbb2015-20171011-00259.html
In SPECjbb2015, G1GC is not used
Large-memory support
"The Garbage-First (G1) garbage collector is targeted for multiprocessor machines with a large amount of memory."
HotSpot Virtual Machine Garbage Collection Tuning Guide
https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector.htm
G1 targets machines with a large amount of memory
So, as requested, let's compare performance on a large-memory machine
Source
public class GCTest extends Thread
{
    static final int N = 7000000;   // slots per thread
    static final int M = 4;         // number of threads
    static GCTest[] g;
    Object[] objs = new Object[N];

    public static void main(String ... arg) throws Exception
    {
        g = new GCTest[M];
        for (int i = 0 ; i < M ; ++i)
            g[i] = new GCTest();

        // Warm-up pass: fills every slot with a new X
        System.out.println("warm up ...");
        for (int i = 0 ; i < N ; ++i)
            for (int j = 0 ; j < M ; ++j)
                g[j].doIt(i);

        System.out.println("start");

        for (int j = 0 ; j < M ; ++j)
            g[j].start();
    }
    public void run() {
        long start = System.currentTimeMillis();
        for (int j = 0 ; j < 60 ; ++j)
            for (int i = 0 ; i < N ; ++i)
                doIt(i);
        long end = System.currentTimeMillis();
        System.out.println("time: " + (end-start) + "ms");
    }

    void doIt(int i) {
        if (objs[i] == null)
            objs[i] = new X();
        else
            // Reference store: copies a reference held by another thread's array
            objs[i] = g[i%M].objs[i];
    }

    static class X {
        byte[] b = new byte[128];
    }
}
Demo
Results
                 G1GC         ParallelGC
Xms/Xmx          96GB/96GB    96GB/96GB
Program          same         same
GC occurrences   none         none
Elapsed time     16 s         2.5 s
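The "GC occurrences: none" row can be checked from inside the JVM. The following is a minimal sketch, assuming the standard java.lang.management API; the class name GcCountCheck, the helper totalCollections, and the small stand-in workload are illustrative and not part of the original program.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCountCheck {
    // Total number of collections reported by all collectors in this JVM.
    // getCollectionCount() returns -1 when a collector does not report a count.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();

        // Hypothetical stand-in for the measured workload (not the GCTest program).
        byte[][] keep = new byte[1000][];
        for (int i = 0; i < keep.length; i++) {
            keep[i] = new byte[128];
        }

        long after = totalCollections();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector: " + gc.getName());
        }
        System.out.println("collections during the run: " + (after - before));
        System.out.println("objects kept alive: " + keep.length);
    }
}
```

Running it once with -XX:+UseG1GC and once with -XX:+UseParallelGC shows which collectors are active; a difference of zero between the two counts means the measured section ran without any collection.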
Agenda: History of GC
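Time spent releasing memory
[Figure: execution-time distribution for C/C++ and for Java with the serial GC; legend: application work, memory-release work]
The total time spent releasing memory is probably about the same
On multi-core machines, throughput becomes a problem in addition to total pause time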
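Two families of GC for multi-core environments
[Figure: execution timelines for C/C++, Java with the parallel GC, and Java with the concurrent GC/G1GC; legend: application work, memory-release work]
Parallel GC: all cores work on GC at once
Concurrent GC/G1GC: GC runs in the background on dedicated cores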
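GC comparison
                                            Serial   Parallel              Concurrent          G1
Application pause time (young generation)   long     short                 short               short
Application pause time (old generation)     long     short                 very short          very short
GC execution time (impact on application)   long     short                 long                long
Intended use                                client   throughput-oriented   response-oriented   throughput and response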
Agenda: PA analysis
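PA
CPU performance statistics
For example, the number of load instructions executed or the number of branch mispredictions
Useful for analysis when CPU utilization is high
Collection tools: cpustat and cputrack on Solaris, perf on Linux, and others
Developer Studio: maps JIT-compiled code to Java methods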
Developer Studio
http://www.oracle.com/technetwork/jp/server-storage/developerstudio/overview/index.html
Collect data with the collect command
% collect -h cycle_counts,on,effective_instruction_counts,on -j on java -Xmx96g -Xms96g GCTest
Visualize with the er_print command
% cat scr
outfile result.txt
viewmode machine
metrics e+cycle_counts:e+effective_instruction_counts
func
% er_print -script scr test.1.er
Function list
Cycles    Instructions
8.4E+10   1.2E+11   <Total>
8.3E+10   1.2E+11   GCTest.run()
9.3E+8    1.1E+9    GCTest.run()
1.9E+8    4.6E+7    GCTest.run()
6.4E+7    1.5E+8    Interpreter
6.4E+7    3.8E+7    GCTest.doIt(int)
3.2E+7    0         GCTest.run()
0         0         <Unknown>
0         0         AbstractCompiler::nsic_available‥
0         0         AddPNode::Ideal(PhaseGVN*,bool)
0         0         AddPNode::bottom_type()const
0         0         AdvancedThresholdPolicy::common‥
0         0         AdvancedThresholdPolicy::method‥‥
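PA analysis
           Cycles    Instructions  CPI   IPC   (Elapsed time)
Parallel   8.4E+10   1.2E+11       0.7   1.4   (2.5 s)
G1         7.7E+11   4.2E+11       1.8   0.5   (16 s)
Roughly 3x the instructions
Roughly 9x the cycles
Why did the instruction count go up when the program is the same?
Why did the cycle count go up by even more than the instruction count?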
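CPI and IPC follow directly from the two counters: CPI is cycles divided by instructions, and IPC is its reciprocal. A minimal sketch that recomputes them from the cycle and instruction counts on this slide (the class and method names are illustrative):

```java
import java.util.Locale;

public class CpiIpc {
    // Cycles per instruction
    static double cpi(double cycles, double instructions) {
        return cycles / instructions;
    }

    // Instructions per cycle
    static double ipc(double cycles, double instructions) {
        return instructions / cycles;
    }

    public static void main(String[] args) {
        // Counter values taken from the PA analysis slide
        double parallelCycles = 8.4e10, parallelInsts = 1.2e11;
        double g1Cycles = 7.7e11, g1Insts = 4.2e11;

        System.out.printf(Locale.ROOT, "Parallel: CPI=%.1f IPC=%.1f%n",
                cpi(parallelCycles, parallelInsts), ipc(parallelCycles, parallelInsts));
        System.out.printf(Locale.ROOT, "G1:       CPI=%.1f IPC=%.1f%n",
                cpi(g1Cycles, g1Insts), ipc(g1Cycles, g1Insts));
        System.out.printf(Locale.ROOT, "G1/Parallel: instructions x%.1f, cycles x%.1f%n",
                g1Insts / parallelInsts, g1Cycles / parallelCycles);
    }
}
```

The ratios come out at about 3.5x for instructions and 9.2x for cycles, which the slide rounds to 3x and 9x. Because cycles grew far more than instructions, each instruction got slower on average, which is what the CPI increase from 0.7 to 1.8 expresses.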
PA analysis (cycle distribution)
Parallel vs. G1
[Figure: stacked bars (0% to 100%) of the cycle distribution for Parallel and G1; categories: commit, waiting on computation, branch execution, branch misprediction, L2$ miss, L1$ miss, other; data labels as extracted: 49.2%, 22.6%, 19.5%, 8.8%, 8.0%, 5.3%, 1.2%, 2.7%, 0.7%, 16.3%, 21.6%, 42.0%, 2.3%]
Function list (ParallelGC)
Cycles    Instructions
8.4E+10   1.2E+11   <Total>
8.3E+10   1.2E+11   GCTest.run()
9.3E+8    1.1E+9    GCTest.run()
1.9E+8    4.6E+7    GCTest.run()
6.4E+7    1.5E+8    Interpreter
6.4E+7    3.8E+7    GCTest.doIt(int)
3.2E+7    0         GCTest.run()
0         0         <Unknown>
0         0         AbstractCompiler::nsic_available‥
0         0         AddPNode::Ideal(PhaseGVN*,bool)
0         0         AddPNode::bottom_type()const
0         0         AdvancedThresholdPolicy::common‥
0         0         AdvancedThresholdPolicy::method‥‥
Function list (G1GC)
Cycles    Instructions
7.7E+11   4.2E+11   <Total>
3.7E+11   2.2E+11   GCTest.run()
1.2E+11   8.1E+10   OtherRegionsTable::add_refere‥‥
9.5E+10   1.7E+10   ObjArrayKlass::oop_oop_iterate‥‥
7.2E+10   8.8E+10   G1UpdateRSOrPushRefOopClosure‥‥
5.0E+10   5.2E+9    G1RemSet::refine_card(signed ‥‥
5.0E+10   2.1E+9    G1HotCardCache::insert(signed ‥‥
4.0E+9    2.2E+9    GCTest.run()
1.9E+9    2.8E+9    HeapRegion::oops_on_card_seq_ite‥‥
1.4E+9    1.9E+9    RefineCardTableEntryClosure::do‥‥
1.1E+9    4.0E+8    DirtyCardQueueSet::apply_closure‥‥
9.0E+8    9.5E+8    ObjArrayKlass::oop_oop_iterate‥‥
5.1E+8    5.2E+8    G1CardCounts::add_card_count(‥‥
PA analysis (GCTest.run)
[Figure: bar charts comparing Total and GCTest.run() for Parallel and G1]
Cycles         Parallel   G1
Total          8.4E+10    7.7E+11
GCTest.run()   8.3E+10    3.7E+11
Instructions   Parallel   G1
Total          1.2E+11    4.2E+11
GCTest.run()   1.2E+11    2.2E+11
Call Tree (ParallelGC)
Cycles    Instructions
8.4E+10   1.2E+11   +-<Total>
8.4E+10   1.2E+11   +-_lwp_start
8.4E+10   1.2E+11   | +-thread_native_entry
8.4E+10   1.2E+11   | | +-JavaThread::run()
8.4E+10   1.2E+11   | | | +-JavaThread::thread_main_‥‥
8.4E+10   1.2E+11   | | | +-thread_entry(JavaThread‥‥
8.4E+10   1.2E+11   | | | | +-JavaCalls::call_virtua‥
8.4E+10   1.2E+11   | | | | +-JavaCalls::call_virtu‥
8.4E+10   1.2E+11   | | | | +-JavaCalls::call_hel‥
8.4E+10   1.2E+11   | | | | +-call_stub
8.3E+10   1.2E+11   | | | | +-GCTest.run()
9.3E+8    1.1E+9    | | | | +-GCTest.run()
2.2E+8    6.9E+7    | | | | +-GCTest.run()
3.2E+7    2.3E+7    | | | | | +-GCTest.doIt(in‥
6.4E+7    1.7E+8    | | | | +-Interpreter
Call Tree (G1GC) (1/2)
Cycles    Instructions
7.7E+11   4.2E+11   +-<Total>
7.7E+11   4.2E+11   +-_lwp_start
7.7E+11   4.2E+11   | +-thread_native_entry
3.9E+11   2.0E+11   | | +-ConcurrentGCThread::run()
3.9E+11   2.0E+11   | | | +-ConcurrentG1RefineThread::r‥
3.9E+11   2.0E+11   | | | | +-DirtyCardQueueSet::appl‥‥
3.9E+11   2.0E+11   | | | | | +-RefineCardTableEntryClo‥
3.9E+11   2.0E+11   | | | | | | +-G1RemSet::refine_ca‥‥
2.9E+11   1.9E+11   | | | | | | | +-HeapRegion::oops‥‥
2.9E+11   1.9E+11   | | | | | | | | +-ObjArrayKlass::‥
1.2E+11   8.1E+10   | | | | | | | | | +-OtherRegionsTa‥
...
Call Tree (G1GC) (2/2)
Cycles    Instructions
...
3.7E+11   2.2E+11   | | +-JavaThread::run()
3.7E+11   2.2E+11   | | | +-JavaThread::thread_main_in‥‥
3.7E+11   2.2E+11   | | | +-thread_entry(JavaThread*,‥‥
3.7E+11   2.2E+11   | | | | +-JavaCalls::call_virtual‥‥
3.7E+11   2.2E+11   | | | | +-JavaCalls::call_virtual‥
3.7E+11   2.2E+11   | | | | +-JavaCalls::call_help‥‥
3.7E+11   2.2E+11   | | | | +-call_stub
3.7E+11   2.2E+11   | | | | +-GCTest.run()
4.5E+8    3.1E+7    | | | | | +-PtrQueue::enqueue‥
2.2E+8    0         | | | | | | +-PtrQueueSet::all‥
9.6E+7    0         | | | | | | | +-Monitor::lock‥
9.6E+7    0         | | | | | | | | +-Monitor::ILo‥
9.6E+7    0         | | | | | | | | +-Monitor::Tr‥
3.2E+7    0         | | | | | | | +-Monitor::IUnlo‥
CPU utilization (with Parallel selected)
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0   25  221    1  14    0    4    0   0     1   0   2  0  98
...
  8    0   0    3   10    0   6    0    3    0   0     0   0   0  0 100
  9    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
 10    0   0    0    7    0   0    4    0    0   0     0 100   0  0   0
...
 29    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
 30    0   0    0    5    0   0    4    0    0   0     0 100   0  0   0
 31    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
...
 44    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
 45    0   0    0    5    0   0    4    0    0   0     0 100   0  0   0
 46    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
 47    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
 48    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
 49    0   0    0    7    0   0    4    0    0   0     0 100   0  0   0
 50    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
 51    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
...
CPU utilization (with G1 selected)
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0   50  213    0   0    7    0    0   0     0  99   1  0   0
  1    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
...
  5    0   0    0    7    0   0    5    0    0   0     0  99   1  0   0
...
 17    0   0    1    6    1   2    0    0    0   0   516   1   0  0  99
...
 22    0   0    0    7    0   0    5    0    0   0     0 100   0  0   0
...
 26    0   0  128   17    0  18    7    0    0   0   137  99   0  0   1
...
 45    0   0    0    6    0   0    5    0    0   0     0 100   0  0   0
...
 49    0   0    0    7    0   0    5    0    0   0     0 100   0  0   0
...
 58    0   0    9   10    2   0    7    0    0   0     9 100   0  0   0
 59    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
 60    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
 61    0   0    1  135    0 255    4    1    0   0   127  48   0  0  52
 62    0   0    0    6    0   0    5    0    0   0     0 100   0  0   0
...
Assembler (G1GC) (GCTest.run)
Cycles   Instructions  L1$ misses
0        0             0         [34] 3fc: mov 32, %g1
3.2E+7   0             0         [34] 400: cmp %l7, %g1
3.2E+7   3.1E+7        0         [34] 404: be,pn %icc,0x2f0
3.2E+7   0             0         [34] 408: nop
0        6.2E+7        0         [34] 40c: ldx [%g2 + 904], %l2
0        0             0         [34] 410: ldx [%g2 + 896], %l7
0        3.1E+7        0         [34] 414: membar #StoreLoad
8.0E+8   8.3E+8        5.5E+8    [34] 418: ldsb [%o0], %g1
3.2E+7   0             0         [34] 41c: cmp %g1, 0
2.9E+8   3.1E+7        2.5E+8    [34] 420: be,pn %icc,0x2f0
6.4E+7   0             6.2E+7    [34] 424: nop
0        0             0         [34] 428: cmp %l2, 0
3.2E+7   0             0         [34] 42c: bne,pn %xcc,0x44c
0        0             0         [34] 430: clrb [%o0]
0        0             0         [34] 434: mov %g2, %o1
0        0             0         [34] 438: call 0x1542cce0 ! (Unable to (‥
0        0             0         [34] 43c: mov %g2, %l7
Memory barriers
JSR-133 Cookbook
http://gee.cs.oswego.edu/dl/jmm/cookbook.html
Four kinds of memory barrier: LoadLoad/StoreStore/LoadStore/StoreLoad
Details differ slightly from CPU to CPU, but the overall concepts are much the same
From the CPU's point of view, StoreLoad is the most expensive
"The sequence: Store1; StoreLoad; Load2 ensures that Store1's data are made visible to other processors (i.e., flushed to main memory) before data accessed by Load2 and all subsequent load instructions are loaded."
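The "Store1; StoreLoad; Load2" sequence can be tried out in plain Java. The sketch below is an illustration rather than anything from the talk: it assumes Java 9 or later and uses VarHandle.fullFence(), which is stronger than a bare StoreLoad barrier but includes its effect. Each thread stores to its own variable, fences, then loads the other thread's variable. The class and method names are hypothetical.

```java
import java.lang.invoke.VarHandle;

public class StoreLoadDemo {
    static int x, y;     // shared plain fields
    static int r1, r2;   // value each thread observed

    // One round: Store1; full fence (includes StoreLoad); Load2 on each thread.
    // Returns true if both threads read 0, the outcome the fence is meant to rule out.
    static boolean bothSawZero() {
        x = 0;
        y = 0;
        Thread t1 = new Thread(() -> { x = 1; VarHandle.fullFence(); r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; VarHandle.fullFence(); r2 = x; });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return r1 == 0 && r2 == 0;
    }

    public static void main(String[] args) {
        int violations = 0;
        for (int i = 0; i < 2000; i++) {
            if (bothSawZero()) {
                violations++;
            }
        }
        System.out.println("rounds where both threads read 0: " + violations);
    }
}
```

With the fences in place, at least one thread should always see the other's store, so the count is expected to stay at zero. Removing the fences allows the store and the following load to be reordered, which is the reordering that StoreLoad exists to prevent and the reason it is the costly one.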
Agenda: JIT and GC
hsdis
HotSpot disassembler
https://github.com/AdoptOpenJDK/jitwatch/wiki/Building-hsdis
https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly
Usage (shown for SPARC; other platforms are similar)
Build hsdis-sparcv9.so
Copy hsdis-sparcv9.so to ${JDK}/lib/server, or make it reachable through LD_LIBRARY_PATH
Run the java command with these options:
/export/home/JDK/jdk-9.0.1/bin/java -Xms96g -Xmx96g -verbose:gc -XX:+UnlockDiagnosticVMOptions '-XX:CompileCommand=print,GCTest.run()' GCTest |& tee grun.asm
Assembler via hsdis (GCTest.run)
CompileCommand: print GCTest.run()
Java HotSpot(TM) 64-Bit Server VM warning: printing of assembly code is enabled; turning on DebugNonSafepoints to gain additional output
warm up ...
start
Compiled method (c1) 8386 147 % 3 GCTest::run @ 15 (78 bytes)
total in heap [0xffffffff5cc42b90,0xffffffff5cc44440] = 6320
relocation [0xffffffff5cc42d00,0xffffffff5cc42ef8] = 504
main code [0xffffffff5cc42f00,0xffffffff5cc43ac0] = 3008
stub code [0xffffffff5cc43ac0,0xffffffff5cc43d48] = 648
oops [0xffffffff5cc43d48,0xffffffff5cc43d78] = 48
metadata [0xffffffff5cc43d78,0xffffffff5cc43e28] = 176
scopes data [0xffffffff5cc43e28,0xffffffff5cc44040] = 536
scopes pcs [0xffffffff5cc44040,0xffffffff5cc443d0] = 912
dependencies [0xffffffff5cc443d0,0xffffffff5cc443d8] = 8
handler table [0xffffffff5cc443d8,0xffffffff5cc44420] = 72
nul chk table [0xffffffff5cc44420,0xffffffff5cc44440] = 32
Loaded disassembler from /export/home/JDK/jdk-9.0.1/lib/server/hsdis-sparcv9.so
----------------------------------------------------------------------
GCTest.run()V [0xffffffff5cc42f00, 0xffffffff5cc43d48] 3656 bytes
[Disassembling for mach='sparc:v9b']
[Entry Point]
[Constants]
# {method} {0xffffffe702c007e0} 'run' '()V' in 'GCTest'
0xffffffff5cc42f00: ldx [ %o0 + 8 ], %g3
0xffffffff5cc42f04: cmp %g3, %g5
0xffffffff5cc42f08: be %xcc, 0xffffffff5cc42f40
0xffffffff5cc42f0c: nop
0xffffffff5cc42f10: sethi %hi(0xa3fb8400), %g3
0xffffffff5cc42f14: xor %g3, -1024, %g3
Assembler via hsdis (GCTest.run)
0xffffffff63c0bdc0: ldsb [ %o0 ], %l2
0xffffffff63c0bdc4: mov 0x20, %l7
0xffffffff63c0bdc8: cmp %l2, %l7
0xffffffff63c0bdcc: be,pn %icc, 0xffffffff63c0be24
0xffffffff63c0bdd0: nop
0xffffffff63c0bdd4: ldx [ %g2 + 0x388 ], %l7
0xffffffff63c0bdd8: ldx [ %g2 + 0x380 ], %g1
0xffffffff63c0bddc: membar #StoreLoad
0xffffffff63c0bde0: ldsb [ %o0 ], %l2
0xffffffff63c0bde4: cmp %l2, 0
0xffffffff63c0bde8: be,pn %icc, 0xffffffff63c0be24
0xffffffff63c0bdec: nop
0xffffffff63c0bdf0: cmp %l7, 0
0xffffffff63c0bdf4: bne,pn %xcc, 0xffffffff63c0be14
0xffffffff63c0bdf8: clrb [ %o0 ]
0xffffffff63c0bdfc: mov %g2, %o1
0xffffffff63c0be00: call 0xffffffff75c4f0e0 ; {runtime_call void SharedRuntime::g1_wb_post(void*,JavaThread*)}
0xffffffff63c0be04: mov %g2, %l7
...
0xffffffff63c0beb0: clrb [ %o0 ] ;*putfield b {reexecute=0 rethrow=0 return_oop=0}
; - G$X::<init>@10 (line 47)
; - G::doIt@18 (line 41)
; - G::run@26 (line 34)
HotSpot (runtime) source
share/vm/runtime/sharedRuntime.cpp
// G1 write-barrier post: executed after a pointer store.
JRT_LEAF(void, SharedRuntime::g1_wb_post(void* card_addr, JavaThread* thread))
  thread->dirty_card_queue().enqueue(card_addr);
JRT_END
Where does the JIT generate the code that calls g1_wb_post?
Look around share/vm/opto
HotSpot (JIT) source
share/vm/opto/graphKit.cpp
Node* GraphKit::store_oop(Node* ctl,
...
  Node* store = store_to_memory(control(), adr, val, bt, adr_idx, mo, mismatched);
  post_barrier(control(), store, obj, adr, adr_idx, val, bt, use_precise);
  return store;
HotSpot (JIT) source
Code corresponding to putfield
share/vm/opto/graphKit.cpp
void GraphKit::post_barrier(Node* ctl,
...
  BarrierSet* bs = Universe::heap()->barrier_set();
  set_control(ctl);
  switch (bs->kind()) {
    case BarrierSet::G1SATBCTLogging:
      g1_write_barrier_post(store, obj, adr, adr_idx, val, bt, use_precise);
      break;
HotSpot (JIT/G1GC) source
share/vm/opto/graphKit.cpp
void GraphKit::g1_write_barrier_post(Node* oop_store,
...
  // Offsets into the thread
  const int index_offset = in_bytes(JavaThread::dirty_card_queue_offset() +
                                    DirtyCardQueue::byte_offset_of_index());
  const int buffer_offset = in_bytes(JavaThread::dirty_card_queue_offset() +
                                     DirtyCardQueue::byte_offset_of_buf());
  // Pointers into the thread
  Node* buffer_adr = __ AddP(no_base, tls, __ ConX(buffer_offset));
  Node* index_adr = __ AddP(no_base, tls, __ ConX(index_offset));

  // Now some values
  // Use ctrl to avoid hoisting these values past a safepoint, which could
  // potentially reset these fields in the JavaThread.
  Node* index = __ load(__ ctrl(), index_adr, TypeX_X, TypeX_X->basic_type(), Compile::AliasIdxRaw);
  Node* buffer = __ load(__ ctrl(), buffer_adr, TypeRawPtr::NOTNULL, T_ADDRESS, Compile::AliasIdxRaw);

  // Convert the store obj pointer to an int prior to doing math on it
  // Must use ctrl to prevent "integerized oop" existing across safepoint
  Node* cast = __ CastPX(__ ctrl(), adr);

  // Divide pointer by card size
  Node* card_offset = __ URShiftX( cast, __ ConI(CardTableModRefBS::card_shift) );

  // Combine card table base and card offset
  Node* card_adr = __ AddP(no_base, byte_map_base_node(), card_offset );

  __ if_then(card_val, BoolTest::ne, young_card); {
    sync_kit(ideal);
    // Use Op_MemBarVolatile to achieve the effect of a StoreLoad barrier.
    insert_mem_bar(Op_MemBarVolatile, oop_store);
    __ sync_kit(this);
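The steps visible in the JIT source and in the disassembly (shift the address to get a card, skip young cards, StoreLoad barrier, re-read the card, skip already dirty cards, then dirty and enqueue) can be modelled in a few lines of Java. This is a toy model under stated assumptions, not HotSpot code: the card table is a byte array, addresses are plain ints, the young value 32 and dirty value 0 are read off the assembly listing, and the card shift of 9 and clean value of -1 are assumptions of the sketch. The real barrier has further checks that the excerpt does not show.

```java
import java.lang.invoke.VarHandle;
import java.util.ArrayDeque;
import java.util.Arrays;

public class PostBarrierModel {
    static final int CARD_SHIFT = 9;   // assumption: 512-byte cards
    static final byte YOUNG = 32;      // "mov 32" in the listing
    static final byte DIRTY = 0;       // "cmp %g1, 0" and "clrb" in the listing
    static final byte CLEAN = -1;      // assumption of this sketch

    final byte[] cards;
    final ArrayDeque<Integer> dirtyCardQueue = new ArrayDeque<>();

    PostBarrierModel(int heapBytes) {
        cards = new byte[heapBytes >>> CARD_SHIFT];
        Arrays.fill(cards, CLEAN);
    }

    // Runs after a reference store to the toy address 'addr'.
    // Returns true when the card was newly dirtied and enqueued.
    boolean postBarrier(int addr) {
        int card = addr >>> CARD_SHIFT;   // divide the address by the card size
        if (cards[card] == YOUNG) {
            return false;                 // young card: nothing to record
        }
        VarHandle.fullFence();            // stands in for membar #StoreLoad
        if (cards[card] == DIRTY) {
            return false;                 // already dirty: already queued
        }
        cards[card] = DIRTY;              // clrb [card]
        dirtyCardQueue.add(card);         // enqueue for later refinement
        return true;
    }

    public static void main(String[] args) {
        PostBarrierModel heap = new PostBarrierModel(1 << 20);
        System.out.println(heap.postBarrier(1000));   // first store to card 1
        System.out.println(heap.postBarrier(1023));   // same card again
        System.out.println(heap.postBarrier(1024));   // first store to card 2
        System.out.println("queued cards: " + heap.dirtyCardQueue);
    }
}
```

In the profile, the expensive instruction is the card re-read right after the barrier: every reference store that passes the young-card filter pays for the StoreLoad barrier before it can find out that the card is already dirty.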
HotSpot (JIT/GC) source
share/vm/opto/graphKit.cpp
void GraphKit::g1_mark_card(IdealKit& ideal,
...
  __ make_leaf_call(tf, CAST_FROM_FN_PTR(address, SharedRuntime::g1_wb_post),
                    "g1_wb_post", card_adr, __ thread());
Mark & Evacuation
[Figure: Mark and Evacuation phases; labels: root set, region 1, garbage, free region, copy]
Concurrent marking
[Figure: Mark and Evacuation phases with the application running concurrently; labels: root set, region 1, region 2, free region, Application, copy]
An object can be wrongly treated as garbage
Store barrier
[Figure: Mark and Evacuation phases with the application running concurrently; labels: root set, region 1, region 2, free region, Application, copy]
Remembered Set: treated as part of the root set
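The idea that the Remembered Set acts as an extra root set can be shown with a toy heap. The sketch below is a simplified model, not G1's data structure: objects carry a region number and one outgoing reference, the store barrier records holders in other regions that point into a region, and liveness for one region is computed from the roots plus that region's remembered set. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RememberedSetModel {
    static class Obj {
        final int region;
        Obj ref;   // single outgoing reference, enough for the sketch
        Obj(int region) { this.region = region; }
    }

    // Per region: objects in other regions that have pointed into it
    final Map<Integer, Set<Obj>> rememberedSets = new HashMap<>();

    // Reference store followed by a store barrier that records cross-region pointers.
    void store(Obj holder, Obj target) {
        holder.ref = target;
        if (target != null && holder.region != target.region) {
            rememberedSets.computeIfAbsent(target.region, r -> new HashSet<>()).add(holder);
        }
    }

    // Objects of 'region' that must be copied: reachable from the roots
    // or from a remembered-set entry that still points into the region.
    Set<Obj> liveIn(int region, Collection<Obj> roots) {
        Deque<Obj> work = new ArrayDeque<>();
        for (Obj root : roots) {
            if (root.region == region) work.add(root);
        }
        for (Obj holder : rememberedSets.getOrDefault(region, Collections.emptySet())) {
            if (holder.ref != null && holder.ref.region == region) work.add(holder.ref);
        }
        Set<Obj> live = new HashSet<>();
        while (!work.isEmpty()) {
            Obj o = work.poll();
            if (live.add(o) && o.ref != null && o.ref.region == region) work.add(o.ref);
        }
        return live;
    }

    // Region 1 holds a and b; only a is referenced, and only from region 2.
    static boolean[] demo() {
        RememberedSetModel heap = new RememberedSetModel();
        Obj a = new Obj(1), b = new Obj(1), c = new Obj(2);
        heap.store(c, a);
        Set<Obj> live = heap.liveIn(1, Collections.emptyList());
        return new boolean[] { live.contains(a), live.contains(b) };
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("a survives: " + result[0] + ", b survives: " + result[1]);
    }
}
```

Without the barrier, object a would have no root inside region 1 and would be discarded during evacuation even though region 2 still points to it.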
Refinement thread
Updates the Remembered Set asynchronously
+-<Total>
+-_lwp_start
| +-thread_native_entry
| | +-ConcurrentGCThread::run()
| | | +-ConcurrentG1RefineThread::run_service()
| | | | +-DirtyCardQueueSet::apply_closure_to‥‥
| | | | | +-RefineCardTableEntryClosure::do_‥‥
| | | | | | +-G1RemSet::refine_card(signed‥‥
Refinement thread
// G1 write-barrier post: executed after a pointer store.
JRT_LEAF(void, SharedRuntime::g1_wb_post(void* card_addr, JavaThread* thread))
  thread->dirty_card_queue().enqueue(card_addr);
Application thread: updates a reference, then writes to the queue
Refinement thread: reads from the queue, then updates the Remembered Set
The two sides run asynchronously
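The hand-off between the application thread and the Refinement thread is a producer/consumer queue. The sketch below models it with java.util.concurrent; it is an illustration of the asynchronous structure only, not of HotSpot's per-thread buffers. Cards are plain ints, the remembered set is a single concurrent set, and the STOP sentinel is a hypothetical device for ending the demo.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class RefinementModel {
    static final int STOP = -1;   // hypothetical sentinel that ends the refinement loop

    final BlockingQueue<Integer> dirtyCardQueue = new LinkedBlockingQueue<>();
    final Set<Integer> rememberedSet = ConcurrentHashMap.newKeySet();

    // Application side: after updating a reference, only enqueue the card.
    void writeBarrierPost(int card) {
        dirtyCardQueue.add(card);
    }

    // Refinement side: read cards from the queue and update the remembered set.
    void refineLoop() {
        try {
            for (int card = dirtyCardQueue.take(); card != STOP; card = dirtyCardQueue.take()) {
                rememberedSet.add(card);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Enqueues 1000 stores spread over 10 cards and returns the remembered-set size.
    static int runDemo() {
        RefinementModel model = new RefinementModel();
        Thread refinement = new Thread(model::refineLoop, "refinement");
        refinement.start();
        for (int i = 0; i < 1000; i++) {
            model.writeBarrierPost(i % 10);
        }
        model.writeBarrierPost(STOP);
        try {
            refinement.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return model.rememberedSet.size();
    }

    public static void main(String[] args) {
        System.out.println("cards in remembered set: " + runDemo());
    }
}
```

The application thread pays only for the enqueue; the set update runs on another core. That matches the CPU utilization table for G1, where more CPUs are busy than the four application threads account for, and the call tree, where the refinement path consumes about half of all cycles.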
Agenda: Closing remarks
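Wrap Up
Evaluating a GC by its collection work alone is not enough
Also consider its impact while the application is running
Choose a GC by measuring on real machines, without preconceptions
Example: SPECjbb2015
When things are unclear, PA may offer a hint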
Conclusion
Two barriers
Q/A