An annotated context-free grammar based vulnerability detection using LALR parser

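Ruo Ando, Network Security Research Institute, National Institute of Information and Communications Technology (NICT)

Detection and evaluation of CVE-2013-4371 using an LALR (look-ahead bottom-up) parser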

CVE-2013-4371 Realloc() vulnerability under high memory pressure

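Technical Committee on Information and Communication System Security (ICSS), Friday, November 27, 2015, 15:05-15:30

Overview: large-scale program vulnerability scanning using an LALR (look-ahead bottom-up) parser

■ In recent years, large open-source projects (Linux, Xen, OpenFlow, etc.) have come to be used as mission-critical information infrastructure software, and the size of this open-source code is growing explosively.

■ With the spread of parsing (language implementation) technologies such as ANTLR, Bison/Flex, and Boost Spirit, functional-language frameworks are increasingly applied to processing and generating large programs. Examples: ANTLR in the Android Open Source Project, Bison/Flex in OpenFlow.

■ With the recent spread of cluster computing and cloud computing, bottom-up text processing techniques such as Map-Reduce and LALR (Look-Ahead LR) can now be run in practical computing time.

■ In this paper we apply an LALR parser to vulnerability detection in source code. We describe, with a context-free grammar, a vulnerability in large programs that risks information leakage under high memory pressure, and we show that it can be detected by a full scan of the source code in acceptable computing time.

Background and design policy (Scalability vs False Negatives)

In designing a vulnerability checker, we face a difficult choice between precision and scalability. In particular, security system design is forced to emphasize either false negatives or false positives. In today's era of large-scale computing, we conclude that the false negative rate should be as close to 0 as possible.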

As of January 2013, GitHub had grown to 3 million users and 4.9 million repositories (repositories are histories of code shared on the site). By December of that year, the company hit 10 million repositories.

http://slideplayer.us/slide/703331/

http://thenextweb.com/insider/2013/01/16/github-300-million-users/#%21pQyn3

Two streams of computing paradigms (1940-2015): imperative vs declarative

[Figure: timeline with the periods 1940-1950, 1960, 1970-1980, 1990, 2000, and 2010. Labels shown in the figure: Turing machine, lambda calculus, John von Neumann, Kurt Gödel, quantum mechanics, set theory, first-order logic, resolution, mainframe, assembler, Lisp (1958-), C language (1972-), Prolog (1972-), ICOT, Otter (first-order theorem prover), Isabelle, ProVerif, Haskell, OCaml, Scala, Java, Ruby / Python, Dalvik VM, map and fold, MapReduce, and the lambda expression -> x { -> y { x.call(y) } }.]

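Long-term trend (inspection methods and problem domains)

[Figure: prior work arranged by inspection method and problem domain.]

ITS4, ACSAC 2000
Automation, NDSS 2000
MC (Meta-Level Compilation), OSDI 2000
StackGuard, USENIX Security 1998
Format String, USENIX Security 2001
MOPS, CCS 2002
MOPS (2), CCS 2004
Metal Compiler Extension, SSP 2002
SLAM, POPL 2002
ProVerif, SSP 2006
F7 verification, CCS 2010
ConfAid, OSDI 2011
MACE (concolic execution), USENIX Security 2011
Computational Verification (ProVerif), CCS 2012
FortNOX, HotSDN 2012
CHUCKY, CCS 2013
COTS (ROP), USENIX 2013
Branch Tracing (ROP), USENIX Security 2013
Dowser, USENIX Security 2013
MetaSymploit, USENIX Security 2013

Trend labels in the figure: refinement of protocol verification; hybrid approaches; configuration consistency; response to increasingly rapid attack techniques.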

Classification of inspection methods

■ Syntax-directed (Syntax Directed Translation): this translator consists of a parser (or grammar) with embedded actions that immediately generate output.
Formalism: regular expressions, finite automata.
Examples: ITS4: a static vulnerability scanner for C and C++ code, ACSAC 2000; Chucky: exposing missing checks in source code for vulnerability discovery, CCS 2013.

■ Rule-based (Rule Based Translation): rule-based translators use the DSL of a particular rule engine to specify a set of "this goes to that" translation rules.
Formalism: transition rules, pushdown automata.
Examples: Using programmer-written compiler extensions to catch security holes, SSP 2002; Checking system rules using system-specific, programmer-written compiler extensions, OSDI 2000.

■ Model-driven (Model Driven Translation): from the input model, a translator can emit output directly, build up strings, build up templates (documents with "holes" in them where we can stick values), or build up specialized output objects.
Formalism: model checking and execution systems.
Examples: MOPS: an infrastructure for examining security properties of software, CCS 2002; Chucky: exposing missing checks in source code for vulnerability discovery, CCS 2013.

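Proposed method (tagging, LALR parsing, and binary search)

A-1 List the files in the target source tree.
A-2 Tag the functions found in the file list (for example, the line number of each function).
B-1 Detect the loops in the target source tree.
B-2 Detect calls to realloc() in the target source tree.
C-3 For each detected loop, run a binary search between the array of function line numbers and the line number of realloc().
D-1 Summarize the information about the vulnerability based on steps A through C.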

{ "_id" : ObjectId("5633ed7f42e0e0048307ec14"), "loop_end_line" : 420, "realloc" : 1, "loop_start_line" : 398, "loop_type" : 1, "realloc_line" : 402, "file_name" : "tools_libxl_libxl__c", "func_line_number" : 388 }

Function analysis part

Loop analysis part
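The binary search step can be sketched as follows. This is only an illustration under assumptions, not the Saturator implementation: the struct layout, the function names, and the reading of the step as "find the last tagged function line at or before the loop" are all hypothetical. The field names simply mirror the JSON record shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layout mirroring the fields of the JSON document. */
struct loop_record {
    int func_line_number;  /* start line of the enclosing function, -1 if unknown */
    int loop_start_line;
    int loop_end_line;
    int realloc_line;
    int realloc;           /* 1 if the realloc() line lies inside the loop */
};

/* Return the largest tagged function start line that is <= line,
 * or -1 if there is none. func_lines must be sorted in ascending order. */
int enclosing_function_line(const int *func_lines, size_t n, int line)
{
    size_t lo = 0, hi = n;   /* half-open search window [lo, hi) */
    int best = -1;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (func_lines[mid] <= line) {
            best = func_lines[mid];  /* candidate; look for a later one */
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}

/* Combine one detected loop and one detected realloc() line into a record. */
struct loop_record make_record(const int *func_lines, size_t n,
                               int loop_start, int loop_end, int realloc_line)
{
    struct loop_record r;

    assert(loop_start <= loop_end);
    r.loop_start_line = loop_start;
    r.loop_end_line = loop_end;
    r.realloc_line = realloc_line;
    r.realloc = (realloc_line >= loop_start && realloc_line <= loop_end);
    r.func_line_number = enclosing_function_line(func_lines, n, loop_start);
    return r;
}
```

With tagged function lines such as {120, 250, 388, 430} (only 388 comes from the slides, the rest are made up), a loop spanning lines 398 to 420 and a realloc() at line 402 would yield func_line_number 388 and realloc 1. Sorting the tags once keeps each lookup logarithmic in the number of functions.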

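Compared method (SCIS 2015): block analysis with a pushdown automaton

Saturator-1: lightweight code checker with document database
https://github.com/RuoAndo/Saturator-1

[Figure: processing pipeline. The main loop iterates over each token produced by the lexer and passes it to the Token Analyzer and the Block Handler, which store state in and query a document database.]

Token Analyzer, NFA (finite automaton): detects and handles identifiers (control statements, memory operation instructions, etc.).

Block Handler, PDA (pushdown automaton): manages the nesting of block statements (loops, branches).

Document Database: stores, and answers queries about, the state of the processing system (such as the position in the program).

Lexer fragment (iteration for each token):

    switch (charatyp[ch]) {
    case Letter:
        for ( ; charatyp[ch]==Letter || charatyp[ch]==Digit; ch=nextCh())
            if (p < p_16) *p++ = ch;
        *p = '\0';

Token Analyzer fragment:

    if (strcmp(tkn.text, "for") == 0)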
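The idea of pairing a token scanner with a pushdown stack for block nesting can be sketched in a few lines of C. This is a simplified illustration, not the Saturator-1 code: the function name, the stack limit, and the block kinds are assumptions, and the scanner deliberately ignores comments, string literals, preprocessor lines, and loops without braces.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_DEPTH 64  /* hypothetical limit on block nesting */

enum block_kind { BLOCK_OTHER, BLOCK_FOR };

/* Count "realloc" identifiers that appear inside the braces of a for loop.
 * Returns -1 if blocks are nested deeper than MAX_DEPTH. */
int count_realloc_in_for(const char *src)
{
    enum block_kind stack[MAX_DEPTH];  /* the pushdown store */
    int depth = 0;        /* current stack height */
    int for_depth = 0;    /* number of BLOCK_FOR entries on the stack */
    int paren = 0;        /* parenthesis depth, to skip ';' in for headers */
    int pending_for = 0;  /* "for" seen, its '{' not yet reached */
    int hits = 0;
    const char *p = src;

    assert(src != NULL);
    while (*p != '\0') {
        unsigned char c = (unsigned char)*p;

        if (isalnum(c) || c == '_') {
            /* Finite-automaton part: collect one word. */
            char word[32];
            size_t len = 0;
            while (isalnum((unsigned char)*p) || *p == '_') {
                if (len < sizeof word - 1)
                    word[len++] = *p;
                p++;
            }
            word[len] = '\0';
            if (strcmp(word, "for") == 0)
                pending_for = 1;
            else if (strcmp(word, "realloc") == 0 && for_depth > 0)
                hits++;
            continue;
        }

        if (c == '(') {
            paren++;
        } else if (c == ')') {
            if (paren > 0)
                paren--;
        } else if (c == ';') {
            if (paren == 0)
                pending_for = 0;  /* the for statement had no braces */
        } else if (c == '{') {
            if (depth == MAX_DEPTH)
                return -1;
            stack[depth++] = pending_for ? BLOCK_FOR : BLOCK_OTHER;
            for_depth += pending_for;
            pending_for = 0;
        } else if (c == '}' && depth > 0) {
            if (stack[--depth] == BLOCK_FOR)
                for_depth--;
        }
        p++;
    }
    return hits;
}
```

Keeping a separate count of for blocks on the stack means a realloc nested inside an if inside a for is still attributed to the loop, which is the property the block handler needs.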

Inspection target: CVE-2013-4371, Xen Hypervisor

402 tmp = realloc(ptr, (i + 1) * sizeof(libxl_cpupoolinfo));

388 libxl_cpupoolinfo * libxl_list_cpupool(libxl_ctx *ctx, int *nb_pool)
389 {
397     poolid = 0;
398     for (i = 0;; i++) {
399         info = xc_cpupool_getinfo(ctx->xch, poolid);
400         if (info == NULL)
401             break;
402         tmp = realloc(ptr, (i + 1) * sizeof(libxl_cpupoolinfo));
403         if (!tmp) {
404             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "allocating cpupool info");
405             free(ptr);
406             xc_cpupool_infofree(ctx->xch, info);
407             return NULL;
408         }
409         ptr = tmp;
410         ptr[i].poolid = info->cpupool_id;
411         ptr[i].sched_id = info->sched_id;
412         ptr[i].n_dom = info->n_dom;
413         if (libxl_cpumap_alloc(ctx, &ptr[i].cpumap)) {
414             xc_cpupool_infofree(ctx->xch, info);
415             break;
416         }
417         memcpy(ptr[i].cpumap.map, info->cpumap, ptr[i].cpumap.size);
418         poolid = info->cpupool_id + 1;
419         xc_cpupool_infofree(ctx->xch, info);
420     }

realloc use-after-free vulnerability

Use-after-free vulnerability in the libxl_list_cpupool function in the libxl toolstack library in Xen 4.2.x and 4.3.x, when running "under memory pressure," returns the original pointer when the realloc function fails, which allows local users to cause a denial of service (heap corruption and crash) and possibly execute arbitrary code via unspecified vectors.

At line 402, Xen calls realloc to grow the buffer. Note that the address of the libxl_cpupoolinfo array has already been assigned outside this routine. Under high memory pressure, realloc cannot extend the block at the original pointer that was already obtained. In that case realloc yields a new address, while the data to be written still refers to the old one.

realloc is executed with an inappropriate pointer argument inside a loop whose boundary (termination condition) is loose.
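For contrast, the general C idiom for growing a buffer in a loop can be sketched as below. This is not the Xen patch; the element type, the function name, and the explicit max_items bound are hypothetical and only illustrate the two properties discussed above: the loop has a firm termination condition, and a pointer that has been freed after a realloc failure is never returned to the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type standing in for libxl_cpupoolinfo. */
struct item {
    int id;
};

/* Grow an array by one element per iteration, up to max_items.
 * On success returns the array (the caller frees it) and stores its
 * length in *count. If an allocation fails, or max_items is 0,
 * returns NULL with *count set to 0. */
struct item *collect_items(int max_items, int *count)
{
    struct item *ptr = NULL;
    int i;

    assert(count != NULL && max_items >= 0);
    *count = 0;
    for (i = 0; i < max_items; i++) {      /* explicit upper bound */
        struct item *tmp = realloc(ptr, (size_t)(i + 1) * sizeof *ptr);
        if (tmp == NULL) {
            /* On failure realloc leaves the old block allocated. */
            free(ptr);
            *count = 0;
            return NULL;                   /* never return the freed address */
        }
        ptr = tmp;                         /* the old address is now obsolete */
        ptr[i].id = i;
        *count = i + 1;
    }
    return ptr;
}
```

Assigning the result of realloc to a temporary first is what keeps the original block reachable on failure; the remaining risk, which the scanner targets, is code that goes on using or returning the old pointer afterwards.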

Loop representation and semantic action

Grammar rule for a line:

    line
        : for_statement_1
        | for_statement_2
        | for_loop_start
        | condition_1
        | condition_2
        | realloc
        | block
        ;

Semantic action for a block:

    block
        : BRACE_LEFT {
          }
        | BRACE_RIGHT {
              counter = yylval.ival;
              func_for_statement_end();
              func_for_statement_insert();
          }

Lexer rules:

    "for"     { return FOR; }
    "realloc" { return REALLOC; }
    [0-9*]    { return NUMBER; }
    "("       { return PAREN_LEFT; }

[Figure: parser build and run flow. The LR specification is compiled by Yacc or Bison into y.tab.c, and the C compiler turns y.tab.c into the parser binary (a.out). The lexer and parser in a.out read the input stream, parse it bottom-up, and the semantic actions produce the output stream.]

Example record produced by a semantic action:

{ "_id" : ObjectId("5633ed7f42e0e0048307ec14"), "loop_end_line" : 420, "realloc" : 1, "loop_start_line" : 398, "loop_type" : 1, "realloc_line" : 402, "file_name" : "tools_libxl_libxl__c", "func_line_number" : 388 }

Evaluation: CVE-2013-4371, parallelized pushdown automaton

{"_id" : ObjectId("53f9ec4764e21cef244d69fb"), "located" : "402", "functionName" : "libxl_list_cpupool", "functionLine" : "388", "filename" : "libxl.c"}

{"_id" : ObjectId("53f9ec9464e21cef244d6a0e"), "start_line" : "398", "end_line" : "420", "functionName" : "libxl_list_cpupool", "functionLine" : "388", "filename" : "libxl.c"}

realloc

{"_id" : ObjectId("53d291fe40c2acf65bbbf9f7"), "located" : "145", "functionName" : "xc_vcpu_setaffinity", "functionLine" : "116", "filename" : "xc_domain.c" }

CVE-2013-4371 description: http://www.cvedetails.com/cve/CVE-2013-4371/

We compiled our system on Ubuntu 12.04 LTS with Linux kernel 3.2.0. The proposed system is hosted on an Intel Xeon E5645 with a 2.4 GHz clock.

version  forloop  realloc  functions  real       user      sys        real       user      sys
4.0.4    5438     76       1314       3m41.925s  0m9.213s  0m22.837s  0m17.817s  0m2.880s  0m0.328s
4.1.0    5579     80       1373       5m35.133s  0m9.381s  0m25.002s  0m18.597s  0m2.980s  0m0.448s
4.1.2    5547     76       1368       2m2.915s   0m9.301s  0m23.545s  0m18.432s  0m3.012s  0m0.396s

[Figure legend: blue, no parallelization; red, proposed method (task parallelization).]

Evaluation (2): identifying loop types and related properties with LALR

{ "_id" : ObjectId("5633ed7f42e0e0048307ec14"), "loop_end_line" : 420, "realloc" : 1, "loop_start_line" : 398, "loop_type" : 1, "realloc_line" : 402, "file_name" : "tools_libxl_libxl__c", "func_line_number" : 388 }

version  computing time  detected realloc() loop  detected loop
4.2.1    10m40.012s      21                       3734
4.2.5    11m17.259s      20                       3737
4.3.1    11m54.117s      18                       3907
4.3.4    12m3.511s       18                       3911

    for_statement_1
        : for_loop_start condition_1_1 condition_1_2 condition_1_3
          {
              printf("for_statement type:1 starts at :");
              print_line_number();
              func_for_statement_1_start();
              for_loop_flag = 1;
          }
        ;

    for_statement_2
        : for_loop_start condition_1_1 condition_2_2
          {
              printf("for_statement type:2 started at :");
              print_line_number();
              func_for_statement_2_start();
              for_loop_flag = 1;
          }
        ;

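Loop implementation with an insufficient termination condition (boundary)

The proposed LALR approach can identify variable transitions and loop types, but because the parser is not reentrant (pure), it cannot currently be parallelized.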

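Ongoing analysis and future work

Function containing the vulnerable loop: libxl_cpupoolinfo * libxl_list_cpupool(libxl_ctx *ctx, int *nb_pool)

Function through which an external attacker can influence the number of loop iterations (main)

Exhaustive path search (enumeration of all paths from the source code)

Linux 4.1.3: 400 minutes

Inspection of other open-source software for the same loop implementation

For Xen 4.2.0, exhaustive path search can cover all paths in about *** minutes.

httpd 2.2.31: 268 minutes

Summary and future work: large-scale program vulnerability scanning using an LALR (look-ahead bottom-up) parser

■ With the spread of parsing (language implementation) technologies such as ANTLR, Bison/Flex, and Boost Spirit, functional-language frameworks are increasingly applied to processing and generating large programs. Examples: ANTLR in the Android Open Source Project, Bison/Flex in OpenFlow.

■ With the recent spread of cluster computing and cloud computing, bottom-up text processing techniques such as Map-Reduce and LALR (Look-Ahead LR) can now be run in practical computing time.

■ In this paper we applied an LALR parser to vulnerability detection in source code. We described, with a context-free grammar, a vulnerability in large programs that risks information leakage under high memory pressure, and we showed that it can be detected by a full scan of the source code in acceptable computing time.

■ Future work: parallelization by making the proposed LALR parser pure (reentrant), speeding up exhaustive path search (full path scanning), and strengthening semantic actions by applying ANTLR and Boost Spirit.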
