TensorFlow XLA: AOT Edition (Quick-Look Version)

@Vengineer, 2017/3/20

TensorFlow XLA Code Analysis: AOT Edition

Quick-look version

About the author

Study groups organized: Xilinx Zynq MPSoC (2016/02/20), Altera SDK for OpenCL (2016/06/10), Xilinx SDSoC (2017/01/28), PYNQ Festival (2017/03/04), FPGA Deep Learning Practice social gathering (2017/05/20)

Blog: Vengineerの戯言 (Vengineer's Ramblings) http://blogs.yahoo.co.jp/verification_engineer

Twitter: @Vengineer

Book: SystemVerilogスタートアップ (SystemVerilog Startup) http://www.cqpub.co.jp/hanbai/books/36/36191.htm

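What is TensorFlow XLA? https://www.tensorflow.org/performance/xla/

XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra that optimizes TensorFlow computations. The results are improvements in speed, memory usage, and portability on server and mobile platforms. Initially, most users will not see large benefits from XLA, but they can start experimenting by using XLA via JIT (Just-In-Time) compilation or AOT (Ahead-Of-Time) compilation. Developers targeting new hardware accelerators are especially encouraged to try out XLA.

The Japanese version of this passage was produced by passing the original English text as is through Google Translate.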


I also wrote about this on my blog

"The Impact of TensorFlow XLA", February 20, 2017

http://blogs.yahoo.co.jp/verification_engineer/71016304.html

In short

TensorFlow XLA supports the following two modes:

1) JIT (Just-In-Time) compilation: single machine only, with one GPU

2) AOT (Ahead-Of-Time) compilation: CPU only (x86/x86-64/ARM/AARCH64/PowerPC)

This document summarizes an analysis of the AOT-related code in TensorFlow XLA.

Use it at your own risk.

Using AOT compilation https://www.tensorflow.org/performance/xla/tfcompile

・What is tfcompile?

・What does tfcompile do?

・How to use tfcompile

As of TensorFlow r1.0, AOT supports CPU targets only (x86/x86-64/ARM/ARM64/PowerPC).

What is tfcompile?

・A tool for compiling TensorFlow graphs into executable code

・It reduces binary size and runtime overhead

・Example use: compiling an inference graph into executable code for mobile devices

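The runtime goes away

A TensorFlow graph is normally executed by the TensorFlow runtime. This incurs runtime overhead for the execution of each node in the graph. The binary also grows, because the code for the TensorFlow runtime has to be included in addition to the graph itself.

The executable code generated by tfcompile does not use the TensorFlow runtime and depends only on the kernels that the computation actually uses.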

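What does tfcompile do?

tfcompile takes a TensorFlow subgraph and generates a function that implements that subgraph.

Feeds become the input arguments of the function, and fetches become its output arguments.

All Placeholders and Variables must be specified as feeds so that they become input arguments of the function.

The file generated by tfcompile can be used as the object file for that function.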
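To make the feed/fetch mapping concrete, the following is a minimal hand-written sketch of the shape such a function could take. It is not tfcompile output: the class name MatMulSketch, the 2x3 and 3x2 feed shapes, and the accessor names are all assumptions chosen for illustration. The two feeds show up as input buffers and the single fetch as an output buffer.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a compiled subgraph that multiplies a 2x3 feed
// by a 3x2 feed. Hand-written for illustration; not generated by tfcompile.
class MatMulSketch {
 public:
  // Feeds: the caller writes the inputs into these row-major buffers.
  float* arg0_data() { return arg0_.data(); }  // 2x3 matrix
  float* arg1_data() { return arg1_.data(); }  // 3x2 matrix

  // Fetch: the caller reads the output from this buffer after Run().
  const float* result0_data() const { return result0_.data(); }  // 2x2 matrix

  // Runs the computation directly, with no framework runtime involved.
  bool Run() {
    for (std::size_t i = 0; i < 2; ++i) {
      for (std::size_t j = 0; j < 2; ++j) {
        float sum = 0.0f;
        for (std::size_t k = 0; k < 3; ++k) {
          sum += arg0_[i * 3 + k] * arg1_[k * 2 + j];
        }
        result0_[i * 2 + j] = sum;
      }
    }
    return true;
  }

 private:
  std::array<float, 6> arg0_{};
  std::array<float, 6> arg1_{};
  std::array<float, 4> result0_{};
};
```

A caller would copy its inputs into arg0_data() and arg1_data(), call Run(), and then read four floats from result0_data().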

How to use tfcompile

Step 1: Configure the subgraph to compile

Step 2: Compile the subgraph with the tf_library build macro

Step 3: Write the code that invokes the subgraph

Step 4: Build the final binary

tfcompile

It is not distributed as a binary, so it has to be built from source.

Directory layout

The compiler directory holds TensorFlow XLA:

・aot
・jit
・tests
・tf2xla
・xla

The AOT-related code lives mainly in the aot directory.

TensorFlow is built with Bazel, so the BUILD file is the place to start.

Bazel: https://bazel.build/

First, the BUILD file: aot/BUILD

    cc_binary(
        name = "tfcompile",
        visibility = ["//visibility:public"],
        deps = [":tfcompile_main"],
    )

The only dependency of the binary is tfcompile_main.

tfcompile_main (aot/BUILD)

    cc_library(
        name = "tfcompile_main",
        srcs = ["tfcompile_main.cc"],
        visibility = ["//visibility:public"],
        deps = [
            ":tfcompile_lib",
            ":tfcompile_proto",
            …..
        ],
    )

tfcompile_main.cc

    int main(int argc, char** argv) {
      // Set the default values of the processing flags
      tensorflow::tfcompile::MainFlags flags;
      flags.target_triple = "x86_64-pc-linux";
      flags.out_object = "out.o";
      flags.out_header = "out.h";

      std::vector<tensorflow::Flag> flag_list;
      AppendMainFlags(&flag_list, &flags);
      xla::legacy_flags::AppendCompilerFunctorFlags(&flag_list);
      xla::legacy_flags::AppendCpuCompilerFlags(&flag_list);
      xla::legacy_flags::AppendCpuRuntimeFlags(&flag_list);

      // Process the command-line arguments
      tensorflow::string usage = tensorflow::tfcompile::kUsageHeader;
      usage += tensorflow::Flags::Usage(argv[0], flag_list);
      bool parsed_flags_ok = tensorflow::Flags::Parse(&argc, argv, flag_list);

      tensorflow::port::InitMain(usage.c_str(), &argc, &argv);

      tensorflow::tfcompile::Main(flags);
      return 0;
    }
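The pattern in main is "fill in defaults first, then let command-line arguments override them". The following is a minimal, self-contained sketch of that pattern only. SketchFlags and ParseSketchFlags are hypothetical names, the "--name=value" syntax is an assumption, and none of this is the tensorflow::Flags API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified counterpart of MainFlags; only three fields are
// shown, initialized to the defaults that main() sets.
struct SketchFlags {
  std::string target_triple = "x86_64-pc-linux";
  std::string out_object = "out.o";
  std::string out_header = "out.h";
};

// Applies "--name=value" arguments on top of the defaults.
// Returns false on an unknown or malformed argument.
bool ParseSketchFlags(const std::vector<std::string>& args,
                      SketchFlags* flags) {
  for (const std::string& arg : args) {
    const std::size_t eq = arg.find('=');
    if (arg.compare(0, 2, "--") != 0 || eq == std::string::npos) return false;
    const std::string name = arg.substr(2, eq - 2);
    const std::string value = arg.substr(eq + 1);
    if (name == "target_triple") {
      flags->target_triple = value;
    } else if (name == "out_object") {
      flags->out_object = value;
    } else if (name == "out_header") {
      flags->out_header = value;
    } else {
      return false;
    }
  }
  return true;
}
```

Any flag that is not passed keeps its default, which is why the defaults are assigned before parsing.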

tfcompile::Main (aot/tfcompile_main.cc)

    // Read the config file and the graph file
    ReadProtoFile("config", flags.config, &config);
    ReadProtoFile("graph", flags.graph, &graph_def);

    // Initialize the graph
    InitGraph(graph_def, config, flags, &flib, &graph);

    // Compile the graph
    CompileGraph(std::move(graph), flags, &flib, &compile_result);

    // Write out the files (object, header)
    WriteStringToFile( …., …., …. );

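Processing flow: graph information + config information → convert the graph information to HLO (optimization) → convert the HLO to CPU executable code with LLVM → write out the object file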

Reading the proto file (aot/tfcompile_main.cc)

    if (StringPiece(fname).ends_with(".pbtxt")) {
      // ReadTextProto is defined in core/platform/env.cc
      return ReadTextProto(Env::Default(), fname, proto);
    } else {
      // ReadBinaryProto is defined in core/platform/env.cc
      return ReadBinaryProto(Env::Default(), fname, proto);
    }
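The dispatch rule is simply "a .pbtxt suffix means text format, anything else means binary". Here is a self-contained sketch of that rule using only the standard library. ProtoFormat, EndsWith, and DetectProtoFormat are hypothetical names introduced for illustration and are not part of TensorFlow.

```cpp
#include <string>

// Hypothetical tag for the two proto encodings.
enum class ProtoFormat { kText, kBinary };

// Returns true if `name` ends with `suffix`.
bool EndsWith(const std::string& name, const std::string& suffix) {
  return name.size() >= suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Files ending in ".pbtxt" are treated as text protos, all others as binary.
ProtoFormat DetectProtoFormat(const std::string& fname) {
  return EndsWith(fname, ".pbtxt") ? ProtoFormat::kText : ProtoFormat::kBinary;
}
```

Under this rule a file such as graph.pb, or any file with another extension, takes the binary path.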

Initializing the graph (aot/compile.cc: InitGraph)

    // Create a new graph
    std::unique_ptr<Graph> g(new Graph(flib));
    GraphDef copy_def(graph_def);
    AddDefaultAttrsToGraphDef(&copy_def, *g->op_registry(), 0);

    // Convert the graph definition (GraphDef) into a graph
    ConvertGraphDefToGraph(GraphConstructorOptions(), copy_def, g.get());

    // Add the feeds and fetches to the graph as nodes (_Arg / _Retval)
    RewriteAndPruneGraph(g.get(), config, flags);
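The function name suggests that, once feeds and fetches are turned into argument and return nodes, everything not needed to compute the fetches is pruned away. The sketch below shows that idea on a toy graph and is not TensorFlow's implementation: ToyGraph and PruneToSubgraph are hypothetical, and the assumption is that a feed cuts the graph, so nothing upstream of it is kept.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy graph: each node name maps to the names of the nodes it reads from.
using ToyGraph = std::map<std::string, std::vector<std::string>>;

// Returns the nodes needed to compute `fetches`, treating every node in
// `feeds` as an argument whose own inputs are no longer needed.
std::set<std::string> PruneToSubgraph(const ToyGraph& graph,
                                      const std::set<std::string>& feeds,
                                      const std::vector<std::string>& fetches) {
  std::set<std::string> kept;
  std::vector<std::string> stack(fetches.begin(), fetches.end());
  while (!stack.empty()) {
    const std::string node = stack.back();
    stack.pop_back();
    if (!kept.insert(node).second) continue;  // already visited
    if (feeds.count(node) > 0) continue;      // feed: stop walking upstream
    const auto it = graph.find(node);
    if (it == graph.end()) continue;          // unknown node: no inputs
    for (const std::string& input : it->second) stack.push_back(input);
  }
  return kept;
}
```

Walking backward from the fetches means that nodes downstream of a fetch, and nodes upstream of a feed, never enter the kept set.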

Compiling the graph (aot/compile.cc: CompileGraph)

    // Convert the TensorFlow graph to the XLA (HLO) format
    ConvertGraphToXla(client, std::move(graph), flib,
                      &computation, &compile_result->has_context_arg);

    // Set the compilation options
    xla::cpu::CpuAotCompilationOptions aot_opts(
        flags.target_triple, flags.target_cpu, flags.target_features,
        flags.entry_point,
        xla::cpu::CpuAotCompilationOptions::RelocationModel::BigPic);

    // Compile the XLA (HLO) computation
    return CompileXla(client, computation, aot_opts, compile_result);

Writing out the files (aot/tfcompile_main.cc: Main)

    // Write out the object file
    const std::vector<char>& obj = compile_result.aot->object_file_data();
    WriteStringToFile(env, flags.out_object,
                      StringPiece(obj.data(), obj.size()));

    // Parse the C++ class name
    ParseCppClass(flags.cpp_class, &header_opts.class_name,
                  &header_opts.namespaces);

    // Generate the header
    GenerateHeader(header_opts, config, compile_result, &header);

    // Write out the header file
    WriteStringToFile(env, flags.out_header, header);
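The class-name parsing step takes a single string and yields a class name plus a list of namespaces. As a simplified sketch of that kind of split, assuming the input has the form ns1::ns2::Class: SplitCppClass is a hypothetical helper and not TensorFlow's ParseCppClass, and it rejects any empty component, including one produced by a leading "::".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified sketch: splits "ns1::ns2::Class" into namespaces and class name.
// Returns false when any component is empty. Not TensorFlow's implementation.
bool SplitCppClass(const std::string& cpp_class, std::string* class_name,
                   std::vector<std::string>* namespaces) {
  class_name->clear();
  namespaces->clear();

  // Split on "::" and collect every component, including empty ones.
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (true) {
    const std::size_t pos = cpp_class.find("::", start);
    if (pos == std::string::npos) {
      parts.push_back(cpp_class.substr(start));
      break;
    }
    parts.push_back(cpp_class.substr(start, pos - start));
    start = pos + 2;
  }

  for (const std::string& part : parts) {
    if (part.empty()) return false;
  }

  // The last component is the class; everything before it is a namespace.
  *class_name = parts.back();
  parts.pop_back();
  *namespaces = parts;
  return true;
}
```

With an input of foo::bar::MatMulComp, this sketch yields the namespaces foo and bar and the class name MatMulComp.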

Thank you very much
