X-Eye: A Profiling Tool for X10

Seisei Itahashi, Yoshiki Sato, Prof. Shigeru Chiba
Master's student in Chiba Lab, The University of Tokyo

Source: x10.sourceforge.net/documentation/presentations/X10Day...



Background

Programming languages such as X10 are evolving toward abstracting away low-level processing

Program behavior becomes more implicit and can no longer be seen directly

Performance tuning can therefore be difficult

Motivation: Visualize Implicit Behavior

1. Implicit data transfer
   When is data transferred, and how much?

2. Waste of CPU resources in synchronization
   Which activities cause long sync-wait times?

3. Scattered synchronization code
   Which async corresponds to which finish? Can this be determined statically?

1. Implicit data transfer

When is the value of a transferred to place 2? How much data is transferred? What route does the value of a take?

• Users want to know this easily
  – For performance tuning

• But it is hard to see at a glance
  – Harder still if the code is more complex

Annotations on the example code:

• Variable a is initialized as an array with 10 elements, each with the value 1
• Moves 2 times sequentially
• Only the first element is referenced

2. Waste of CPU resources in sync

• Sync and activity creation are easy in X10
  – Useless sync blocking can easily happen

finish {
    async { S1; }
    async { S2; }
    async { S3; }
}

[Figure: activities A, B, and C run S1, S2, and S3 between sync start and sync end; the blocking time may waste CPU resources]
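The blocking time in a finish like this one can be put into numbers. The following is a minimal Java sketch, not X10 and not part of X-Eye, that computes how long each activity would sit idle before the enclosing finish completes. It assumes that all activities start together and that the finish ends when the slowest activity ends; the class name, method name, and run times are made up for illustration.

```java
import java.util.Arrays;

class SyncWaitSketch {
    /**
     * Given each activity's run time, returns how long each one idles
     * before the enclosing finish completes. Simplifying assumption:
     * all activities start at the same moment.
     */
    static long[] idleTimes(long[] runTimes) {
        long finishEnd = Arrays.stream(runTimes).max().orElse(0L);
        return Arrays.stream(runTimes).map(t -> finishEnd - t).toArray();
    }

    public static void main(String[] args) {
        // Hypothetical run times for S1, S2, S3 in milliseconds.
        long[] idle = idleTimes(new long[] {30, 100, 50});
        System.out.println(Arrays.toString(idle)); // prints [70, 0, 50]
    }
}
```

In this made-up case S2 dominates: the other two activities are done long before the sync ends, which is the kind of imbalance the deck wants to make visible.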

3. Scattered synchronization code

• Which async corresponds to which finish?
  – The number of asyncs executed during the finish clause depends on the variables num and taskSet.

Annotations on the example code:

• Jump to a separate source file
• How many asyncs?

Moreover…

Which class's run() method is called? The candidates:

• Run sequentially (1 activity runs all tasks)
• Run in parallel (1 activity per task)
• Run in parallel as appropriate (the values of num and taskSet decide the number of activities)
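The difficulty can be illustrated with a small Java sketch (the deck's example is X10, and its code is not reproduced here). The interface `Runner`, the three classes, and the rule `min(num, taskSet.size())` are all assumptions made for illustration; the point is only that one call site can spawn a different number of activities depending on the runtime class of the receiver.

```java
import java.util.List;

class DispatchSketch {
    /** Hypothetical interface: run() reports how many activities it would spawn. */
    interface Runner { int run(List<String> taskSet); }

    /** One activity runs all tasks. */
    static final class SequentialRunner implements Runner {
        public int run(List<String> taskSet) { return 1; }
    }

    /** One activity per task. */
    static final class ParallelRunner implements Runner {
        public int run(List<String> taskSet) { return taskSet.size(); }
    }

    /** num and taskSet together decide the count (assumed rule: the smaller of the two). */
    static final class BoundedRunner implements Runner {
        private final int num;
        BoundedRunner(int num) { this.num = num; }
        public int run(List<String> taskSet) { return Math.min(num, taskSet.size()); }
    }

    /** The call site is identical for all three classes. */
    static int activitiesSpawned(Runner r, List<String> taskSet) {
        return r.run(taskSet);
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("t1", "t2", "t3", "t4");
        System.out.println(activitiesSpawned(new SequentialRunner(), tasks)); // prints 1
        System.out.println(activitiesSpawned(new ParallelRunner(), tasks));   // prints 4
        System.out.println(activitiesSpawned(new BoundedRunner(2), tasks));   // prints 2
    }
}
```

Reading `activitiesSpawned` alone does not tell you how many asyncs run inside the enclosing finish, which is why a runtime record of the events is useful.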

X-Eye: Profiler and Visualizer for X10 (1/2)

• Profiler: records X10-specific events at runtime, at parallel and distribution constructs such as at and finish
  – Along with the transferred data size and activity identifiers

Workflow (from the diagram):

1. X10 source code goes into the Profiler, which inserts profiling code and compiles it (implemented by extending the Polyglot-based X10 compiler)
2. The resulting X10 binary code (both Managed and Native X10) is run and generates log data
3. The Visualizer (based on JavaFX) analyzes the log

X-Eye: Profiler and Visualizer for X10 (2/2)

• Visualizer: not only visualizes the events but can also interactively limit the scope of visualization using the Scoping DSL
  – The grammar resembles the Stream API in Java 8
  – Filter conditions can be written as lambdas
  – Events can be filtered interactively while watching the interim results

Ex: eventStream = eventStream.filter(e -> e.execPlace == 0);

Run

Display only the events that happened in Place 1
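Since the Scoping DSL follows the Java 8 Stream API, a filter can be tried out in plain Java. The sketch below is not X-Eye's code: the `Event` class and its `kind` field are assumptions, and only the `execPlace` field and the first `filter` line come from the slide. It shows how a second filter can narrow the result of the first, which is the interactive refinement the slide describes.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ScopingSketch {
    /** Hypothetical event shape; only execPlace appears on the slide. */
    static final class Event {
        final String kind;    // e.g. "at", "async", "finish" (assumed encoding)
        final int execPlace;  // place where the event was executed
        Event(String kind, int execPlace) {
            this.kind = kind;
            this.execPlace = execPlace;
        }
    }

    /** Applies the slide's filter, then narrows the scope one step further. */
    static List<Event> scope(List<Event> log) {
        Stream<Event> eventStream = log.stream();
        eventStream = eventStream.filter(e -> e.execPlace == 0);        // from the slide
        eventStream = eventStream.filter(e -> e.kind.equals("async"));  // assumed follow-up
        return eventStream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Event> log = List.of(
                new Event("async", 0), new Event("at", 0), new Event("async", 1));
        System.out.println(scope(log).size()); // prints 1
    }
}
```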

Demo


Implementation of X-Eye

• Profiler
  – Extends the X10 compiler to insert profiling code into the AST
    • Developed visitors for Polyglot

• Visualizer
  – The GUI is implemented using JavaFX
  – The Scoping DSL is Java code, so it is compiled in memory, output as a class file, loaded, and run
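The compile, load, and run step for the Scoping DSL can be sketched with the standard `javax.tools` compiler API. This is an illustration under assumptions, not X-Eye's implementation: the class names are invented, the filter is modeled as a `java.util.function.IntPredicate` over the place number, and the class bytes are kept in memory rather than written out as a class file. It needs a JDK at runtime, because `ToolProvider.getSystemJavaCompiler()` returns null on a plain JRE.

```java
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

class InMemoryCompileSketch {
    /** Java source held in a String rather than a file. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    /** Receives the compiler's class output as bytes in memory. */
    static final class ClassBytes extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ClassBytes(String className) {
            super(URI.create("bytes:///" + className + ".class"), Kind.CLASS);
        }
        @Override public OutputStream openOutputStream() { return bytes; }
    }

    /** Compiles a class implementing IntPredicate, loads it, and instantiates it. */
    static IntPredicate compileFilter(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) throw new IllegalStateException("a JDK is required");
        Map<String, ClassBytes> out = new HashMap<>();
        // Redirect every class the compiler emits into the map above.
        JavaFileManager manager = new ForwardingJavaFileManager<StandardJavaFileManager>(
                compiler.getStandardFileManager(null, null, null)) {
            @Override
            public JavaFileObject getJavaFileForOutput(Location location, String name,
                    JavaFileObject.Kind kind, FileObject sibling) {
                ClassBytes cb = new ClassBytes(name);
                out.put(name, cb);
                return cb;
            }
        };
        Boolean ok = compiler.getTask(null, manager, null, null, null,
                List.of(new StringSource(className, source))).call();
        if (!ok) throw new IllegalArgumentException("compilation failed");
        // Define the compiled class from the captured bytes.
        ClassLoader loader = new ClassLoader(InMemoryCompileSketch.class.getClassLoader()) {
            @Override protected Class<?> findClass(String name) throws ClassNotFoundException {
                ClassBytes cb = out.get(name);
                if (cb == null) throw new ClassNotFoundException(name);
                byte[] b = cb.bytes.toByteArray();
                return defineClass(name, b, 0, b.length);
            }
        };
        try {
            return (IntPredicate) loader.loadClass(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String src = "public class PlaceFilter implements java.util.function.IntPredicate {"
                   + "  public boolean test(int execPlace) { return execPlace == 0; } }";
        IntPredicate filter = compileFilter("PlaceFilter", src);
        System.out.println(filter.test(0) + " " + filter.test(1)); // prints "true false"
    }
}
```

Compiling the user's filter at runtime is what lets the visualizer accept ordinary Java lambdas and conditions without a separate parser for the DSL.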

Data recorded by our tool

• Execution events of
  – at, async, finish, clocked, etc.

• Size of transferred data
  – During compilation, the tool statically determines which values are transferred, then estimates the size at every at.
    • The size is estimated based on the implementation of the X10 runtime
  – At runtime, the profiling code records the estimated size of the transferred data at the beginning of the at block.

Data recorded by our tool (cont.)

• Activity identifier
  – Tracks which activity is executing
  – The profiling code records the activity identifier
    • To trace the relation between async and finish
  – It is also used to address the 2nd motivation

Our tool inserts an extra parameter into every method. The parameter specifies an activity identifier.

The X10 API does not provide activity identifiers, because an activity is not always bound to a single thread. Instead, a worker handles multiple activities. But users want to distinguish activities, so we implemented it!
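Why an extra parameter rather than something thread-based can be shown with a Java analogue. In the sketch below, which is not X-Eye's generated code, two pool threads stand in for X10 workers and run three activities, so the thread identity cannot tell activities apart; the identifier is instead passed explicitly through every call. The id format "0-0", "0-1", ... is modeled on the activity names shown later in the deck, and what the two numbers mean is an assumption here, as are all names in the code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ActivityIdSketch {
    // Before instrumentation this would be work(int n); the first
    // parameter is the inserted activity identifier.
    static void work(String activityId, int n, List<String> log) {
        log.add(activityId + " work " + n);
        helper(activityId, log); // the id is forwarded on every call
    }

    static void helper(String activityId, List<String> log) {
        log.add(activityId + " helper");
    }

    /** Runs the activities on fewer workers than activities; returns the sorted log. */
    static List<String> runAll(int activities) {
        List<String> log = new CopyOnWriteArrayList<>();
        ExecutorService workers = Executors.newFixedThreadPool(2);
        for (int i = 0; i < activities; i++) {
            String id = "0-" + i; // assumed id format
            int n = i;
            workers.execute(() -> work(id, n, log));
        }
        workers.shutdown();
        try {
            if (!workers.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        List<String> sorted = new ArrayList<>(log);
        Collections.sort(sorted); // execution order varies, so sort for a stable view
        return sorted;
    }

    public static void main(String[] args) {
        runAll(3).forEach(System.out::println);
    }
}
```

Every log line carries the right activity id regardless of which worker thread happened to execute it.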

An example of profiling code inserted by our tool

• We've modified the Polyglot-based X10 compiler

Annotation on the example code: insert profiling code

An extra parameter added to a method

Annotation on the example code: insert profiling code

An example of JSON event log file


Our tool reveals tuning opportunities such as...

• KMeansDist.x10
  – Some activities move to the place they are already in (unnecessary!)
  – Activity 0-0 (the main activity) moves to every place one by one to create a new activity
    • This can be parallelized

• NQueensPar.x10
  – Activities don't move; all activities are in the same place
  – Some activities cause long sync-blocking times

Annotations on the visualizer screenshot:

• Source Place and Target Place are the same
• You can see several D events (move events with data transfer)
• You can also check this by following the transfer route of variable p
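Moves whose source and target place coincide can also be picked out of a log programmatically. The Java sketch below uses a hypothetical `MoveEvent` with `sourcePlace`, `targetPlace`, and `bytes` fields; the actual log schema is not given in the deck, so these names are assumptions. It counts, per place, the moves that stay in the same place.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SamePlaceMoveSketch {
    /** Hypothetical shape of a move event with data transfer. */
    static final class MoveEvent {
        final int sourcePlace;
        final int targetPlace;
        final long bytes; // estimated transferred size
        MoveEvent(int sourcePlace, int targetPlace, long bytes) {
            this.sourcePlace = sourcePlace;
            this.targetPlace = targetPlace;
            this.bytes = bytes;
        }
    }

    /** Counts, per place, the moves whose source and target are the same place. */
    static Map<Integer, Long> redundantMovesPerPlace(List<MoveEvent> moves) {
        return moves.stream()
                .filter(m -> m.sourcePlace == m.targetPlace)
                .collect(Collectors.groupingBy(
                        m -> m.sourcePlace, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<MoveEvent> moves = List.of(
                new MoveEvent(0, 0, 80), new MoveEvent(0, 1, 80),
                new MoveEvent(1, 1, 80), new MoveEvent(1, 1, 80));
        System.out.println(redundantMovesPerPlace(moves)); // prints {0=1, 1=2}
    }
}
```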

The main activity goes around the places to spawn child activities

The program may be accelerated by parallelizing this part

Our tool reveals tuning opportunities such as...

• KMeansDist.x10
  – Some activities move to the place they are already in (unnecessary!)
  – Activity 0-0 moves to every place one by one to create a new activity
    • This can be parallelized

• NQueensPar.x10
  – Activities don't move; all activities are in the same place
  – Some activities cause long sync-blocking times

No activity ever moves

Computing resources can't be used efficiently even if the program runs on several places

Activity 0-2 causes a long blocking time in sync

F: "finish" event

Overhead of Profiler (result from the X10 Workshop '14)

• KMeansDist.x10
  – CPU: Intel Xeon E5-2687W 3.10 GHz, 8 cores x 2; RAM: 64 GB
  – OS: CentOS release 6.2
  – X10 version: 2.4.0
  – Number of places: 2 to 10

Large overheads are caused by inter-place communication for profiling
  – The profiler frequently moves to the first place to record the data

Overhead of Profiler (latest result)

The overhead was reduced
  – The profiler collects the results only at the end of the program (= the main function)

But… the overhead increases as the number of places increases

Number of places: 1 to 20
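The difference between the two recording strategies can be seen from a simple count of inter-place transfers. The Java sketch below is a back-of-the-envelope model, not a measurement and not X-Eye's code: it assumes index 0 is the first place, that per-event recording costs one transfer for every event outside the first place, and that end-of-run collection costs one transfer per other place that recorded anything. The event counts are made up.

```java
class LogCollectionSketch {
    /** Per-event recording: each event outside the first place costs one transfer. */
    static long transfersPerEvent(long[] eventsPerPlace) {
        long total = 0;
        for (int p = 1; p < eventsPerPlace.length; p++) {
            total += eventsPerPlace[p];
        }
        return total;
    }

    /** End-of-run collection: each other place with events ships its buffer once. */
    static long transfersBatched(long[] eventsPerPlace) {
        long total = 0;
        for (int p = 1; p < eventsPerPlace.length; p++) {
            if (eventsPerPlace[p] > 0) total++;
        }
        return total;
    }

    public static void main(String[] args) {
        long[] events = {1000, 1000, 1000, 1000}; // hypothetical counts for 4 places
        System.out.println(transfersPerEvent(events)); // prints 3000
        System.out.println(transfersBatched(events));  // prints 3
    }
}
```

Under this model the batched count no longer depends on the number of events, only on the number of places.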

We believe the "at" events cause the overhead
  – Part of the inserted profiling code is transferred when an "at" happens
  – The number of "at" events increases as the number of places increases


Related Work

• Guiding X10 Programmers to Improve Runtime Performance
  – XAnalyzer
  – Detects code matching 8 predefined patterns
  – Suggests better code for each pattern

• Data-centric Performance Analysis of PGAS Applications
  – Detects reads and writes of global data objects
  – Target language is Global Arrays

• Automatic Communications Performance Debugging in PGAS Languages
  – ti-trend-prof
  – Debugs the performance of remote reads and writes
  – Target language is Titanium

Conclusion

• Developed a tool that exposes implicit behavior in X10
  – Makes implicit data transfer explicit
  – Captures the behavior of activities inside a synchronization even if…
    • Dynamic dispatch happens
    • The code inside the sync is scattered across files

• Developed interactive filtering functionality
  – Events can be filtered during visualization using the Scoping DSL
  – Filtering can be refined step by step while watching the interim results

Future Work

• Reduce the overhead when there are many places

• Extend the event filtering functionality
  – Not only during visualization, but also to restrict the range of profiling while profiling
    • This would reduce the overhead
    • Interactive filtering based on execution contexts