Introduction: Program Analysis. Generally speaking, program analysis is the automated analysis of program behaviors. Flow analysis tasks (data/control flow analysis, information flow analysis for security, points-to/alias analysis, …) can be modeled as graph reachability problems.
Outline
Introduction
• Program Analysis: graph reachability problem
• Summary-based Analysis: one challenge is callbacks
CFL-reachability
Reachability Analysis for Callbacks
• Callbacks: conditions
• TAL-reachability: conditional reachability
Introduction: Program Analysis
Generally speaking, automated analysis of program behaviors
Flow analysis tasks
• data/control flow analysis
• information flow analysis (security)
• points-to/alias analysis
• …
These tasks can be modeled as graph reachability problems
Introduction: Example: transitive data dependence analysis
    int gcd(int a, int b, string msg) {
        write(msg);
        while (b != 0) {
            int tmp = a % b;
            a = b;
            b = tmp;
        }
        return a;
    }
[Figure: data dependence graph over the nodes a, b, tmp, ret, and msg]
All transitive data dependence relationships:
a --> b, b --> a, a --> tmp, b --> tmp, tmp --> a, tmp --> b, a --> ret, b --> ret, tmp --> ret
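The nine relationships listed for gcd can be reproduced as plain graph reachability. The following is a minimal sketch, assuming a hand-written set of direct dependence edges (the names DIRECT, transitive_closure, and transitive_dependences are illustrative and not from the slides); control dependences are ignored.

```python
# Direct data dependences read off the gcd body (src flows into dst).
# This edge set is an assumption of the sketch, not taken from the slides.
DIRECT = {
    ("a", "tmp"), ("b", "tmp"),   # tmp = a % b
    ("b", "a"),                   # a = b
    ("tmp", "b"),                 # b = tmp
    ("a", "ret"),                 # return a
}

def transitive_closure(edges):
    """Naive fixpoint: keep joining (u, v) and (v, w) into (u, w)."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (u, v) in list(closure):
            for (v2, w) in list(closure):
                if v == v2 and (u, w) not in closure:
                    closure.add((u, w))
                    changed = True
    return closure

def transitive_dependences(edges):
    # Drop self-dependences such as a --> a, which the slide does not list.
    return {(u, v) for (u, v) in transitive_closure(edges) if u != v}

deps = transitive_dependences(DIRECT)
```

Under these assumptions, msg takes part in no relationship because it only flows into write.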
Introduction: Summary-based Analysis
Summarizing behaviors of a component (modular/compositional analysis)
• Result: summary
• Goal:
  • reusable: to reuse analysis results
  • concise: to hide internal complexity
  • efficient: to avoid unnecessary re-computation
• A general model
  • transferring summary function from entries to exits
Introduction: Example: summary of gcd
Of all the transitive data dependence relationships of gcd, the summary keeps only those that lead from an entry (a parameter) to an exit (the return value):
Summary: a --> ret, b --> ret
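Extracting the summary from the full relation is a simple filter. The sketch below assumes the nine relationships from the gcd example as input and treats the parameters as entries and ret as the only exit; the names ALL_DEPS, ENTRIES, EXITS, and summarize are hypothetical.

```python
# The nine transitive data dependence relationships of the gcd example.
ALL_DEPS = {
    ("a", "b"), ("b", "a"), ("a", "tmp"), ("b", "tmp"),
    ("tmp", "a"), ("tmp", "b"),
    ("a", "ret"), ("b", "ret"), ("tmp", "ret"),
}

ENTRIES = {"a", "b", "msg"}   # parameters of gcd
EXITS = {"ret"}               # return value of gcd

def summarize(deps, entries, exits):
    """Keep only relationships from an entry node to an exit node."""
    return {(u, v) for (u, v) in deps if u in entries and v in exits}

summary = summarize(ALL_DEPS, ENTRIES, EXITS)
```

The local variable tmp disappears from the result, which is the sense in which a summary hides internal complexity.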
Introduction: Summary-based Analysis
Summarizing behaviors of a component (modular/compositional analysis)
Challenge: handling incompleteness (incomplete/partial program analysis)
• calling context
  • unknown parameters
  • global variables
  • …
• callbacks (due to dynamic dispatch)
  • unknown client code
Introduction: Example
    class Math {
        int gcd(int a, int b);
        int gcd20(int a) { return gcd(a, 20); }
    }

    class Math1 extends Math {
        int gcd(int a, int b) {…}
    }

    class Math2 extends Math {
        int gcd(int a, int b) {…}
    }

    // main
    Math2 m = new Math2();
    int x = 30;
    int y = m.gcd20(x);
[Figure: call graph over Main, Math::gcd20, Math::gcd, Math1::gcd, and Math2::gcd, with regions labeled "client code" and "library code (incomplete)"]
CFL-reachability: Interprocedural Analysis
IFDS/IDE [Reps et al. 1995, Sagiv et al. 1996]
• Realizable path: matched parentheses
• Filtering out unrealizable paths
    void fun() {
        …
        y1 = p(x1);
        …
        y2 = p(x2);
        …
    }

    int p(int x) {
        …
    }
[Figure: the two call sites of p in fun; the call edges are labeled {1 and {2, the matching return edges }1 and }2]
Matched parenthesis language:
S -> e | S S | {i S }i,  i = 1, 2, …
* In the following, we discuss only realizable paths (reachability) defined by the matched parenthesis language.
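Whether a single path is realizable can be checked with a stack over its edge labels. This is a minimal sketch, assuming labels are encoded as "e" for normal edges and ("{", i) or ("}", i) for call and return edges of call site i; the function name is_realizable is illustrative. It is a plain balance check, so it is slightly more permissive than the grammar as written (it also accepts the empty path and a call immediately followed by its return).

```python
def is_realizable(labels):
    """Return True if the call/return labels along a path are matched.

    labels: list of "e", ("{", i), or ("}", i).
    """
    stack = []
    for lab in labels:
        if lab == "e":
            continue                      # normal edges do not affect matching
        kind, i = lab
        if kind == "{":
            stack.append(i)               # enter the callee from call site i
        elif not stack or stack.pop() != i:
            return False                  # return does not match the last call
    return not stack                      # every call must have returned

# Path through p entered at call site 1 and returning to call site 1.
ok = is_realizable([("{", 1), "e", ("}", 1)])
# Path entered at call site 1 but returning to call site 2: unrealizable.
bad = is_realizable([("{", 1), "e", ("}", 2)])
```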
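CFL-reachability: Algorithm
Dynamic programming (similar to the Floyd-Warshall algorithm), O(n^3)
[Figure: example graph over the nodes a, b, p, q, x, y with edges labeled {1, {2, }1, }2, and e, plus the derived S edges]
Matched parenthesis language:
S -> e | S S | {i S }i,  i = 1, 2, …
Graph
• Invocation edge: {i
• Return edge: }i
• Normal edge: e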
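The three productions translate directly into three rules for adding S edges to the graph. The sketch below is a naive fixpoint over those rules, not the optimized O(n^3) dynamic-programming formulation; the edge encoding and the name cfl_reachable are assumptions made for illustration.

```python
def cfl_reachable(edges):
    """Compute all pairs (u, v) joined by a path whose labels derive from S.

    edges: iterable of (src, label, dst), where label is "e",
    ("{", i) for an invocation edge, or ("}", i) for a return edge.
    """
    edges = list(edges)
    # S -> e
    s = {(u, v) for (u, lab, v) in edges if lab == "e"}
    opens = [(u, lab[1], v) for (u, lab, v) in edges if lab != "e" and lab[0] == "{"]
    closes = [(u, lab[1], v) for (u, lab, v) in edges if lab != "e" and lab[0] == "}"]
    changed = True
    while changed:
        new = set()
        # S -> S S
        for (u, v) in s:
            for (v2, w) in s:
                if v == v2:
                    new.add((u, w))
        # S -> {i S }i
        for (u, i, a) in opens:
            for (b, j, v) in closes:
                if i == j and (a, b) in s:
                    new.add((u, v))
        changed = not new <= s
        s |= new
    return s

# The two call sites of p: x is p's parameter, r its return value.
graph = [
    ("x1", ("{", 1), "x"), ("r", ("}", 1), "y1"),
    ("x2", ("{", 2), "x"), ("r", ("}", 2), "y2"),
    ("x", "e", "r"),
]
pairs = cfl_reachable(graph)
```

In this example x1 reaches y1 and x2 reaches y2, while the unrealizable pair (x1, y2) is filtered out.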
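Reachability Analysis for Callbacks: Summarizing “Incomplete” Graph
• Postponing analysis of callbacks
• Leaving unnecessary nodes in the summary
[Figure: library graph with nodes a, b, c, d and nested call/return edges {2 {3 {4 … }4 }3 }2; the nodes u, v, x, y outside the library are attached by the edges {1, }1, {5, }5; the callback site is d = g(c), where g is a callback function]
S -> e | S S | {i S }i,  i = 1, 2, …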
Reachability Analysis for Callbacks
[Figure: library graph with nodes a, b, c, d and edges {2 {3 {4 … }4 }3 }2; the callback site between c and d is marked, labeled "Conditional Reachability"]
Reachability Analysis for Callbacks: Conditional Reachability
• Conditional reachability: CRa,b(x,y): x ~> y, if a ~> b
• Unconditional reachability (by CFL-reachability): UR(x,y): x ~> y
• Summary: CRa,b(x,y) and UR(x,y)
[Figure: nodes x, a, b, y on a path with matched parentheses { }, shown twice]
Reachability Analysis for Callbacks: Client-code Analysis
Turn conditional into unconditional
• if the condition is satisfied
[Figure: library summary CRc,d(a,b) over the nodes a, b, c, d, with client nodes x, y and the edges {1, }1, {5, }5; the result is UR(a,b)]
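Instantiating a summary in the client can be pictured as a small fixpoint: whenever the condition of a conditional fact is already known to hold, its conclusion becomes an unconditional fact. The sketch below assumes a tuple encoding (c, d, a, b) for CRc,d(a,b); the function instantiate and its parameters are hypothetical, not the authors' implementation.

```python
def instantiate(cond_reach, uncond_reach, client_facts):
    """Turn conditional facts into unconditional ones where possible.

    cond_reach:   set of (c, d, a, b), read as CRc,d(a,b): a ~> b if c ~> d
    uncond_reach: set of (x, y) taken from the library summary
    client_facts: set of (x, y) established by analyzing the client code
    """
    known = set(uncond_reach) | set(client_facts)
    changed = True
    while changed:
        changed = False
        for (c, d, a, b) in cond_reach:
            if (c, d) in known and (a, b) not in known:
                known.add((a, b))         # condition satisfied: a ~> b holds
                changed = True
    return known

library_cr = {("c", "d", "a", "b")}
# Assume the client analysis shows that the callback connects c to d.
with_callback = instantiate(library_cr, set(), {("c", "d")})
# Without that fact, the condition stays open and a ~> b is not derived.
without_callback = instantiate(library_cr, set(), set())
```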
Reachability Analysis for Callbacks: Library Summarization
Unconditional Reachability
• CFL-reachability
Conditional Reachability
• ?
Reachability Analysis for Callbacks: Tree Adjoining Language (TAL)
• Mildly context-sensitive language
• Parsable in O(n^6)
• Application: natural language processing
Our Contribution
• TAL-reachability: conditional reachability
Reachability Analysis for Callbacks: Tree Adjoining Language
• Strings
• Non-terminals
• Operators
Reachability Analysis for Callbacks
First-order “S”
• One string
• Reachability for a 2-tuple (x,y)
• One path
[Figure: nodes x, a, b, y on a path with matched parentheses { }]
UR(x,y)
Reachability Analysis for Callbacks
Second-order “𝕊”
• A pair of strings
• Reachability for a 4-tuple (x,a,b,y)
• A pair of paths
[Figure: nodes x, a, b, y on a path with matched parentheses { }]
CRa,b(x,y)
Reachability Analysis for Callbacks: Operators
Operations for TAL-reachability
[Figure: operands α and β over the nodes a, b, p, q]
Reachability Analysis for Callbacks: Algorithm
Result: concise and efficient summary
Keep three types of node (empirically fewer than 10% of the nodes)
• boundary nodes (entries and exits of the library)
• chaining nodes
• hidden chaining nodes
Evaluation: about 8X speed-up of client-code analysis
Reachability Analysis for Callbacks: Future Work
Callback analysis for real applications
• Android / Web applications
A more general case
• Handling multiple callbacks in a path
Conclusion
An important question, but few research papers
• Callbacks in summary-based analysis techniques
Borrow ideas from other research fields
• Tree adjoining language (NLP)
Create conditions for unknown facts
• Instantiate when facts are available
Thank you!
Library Summarization: TAL Reachability
Complete TAL Grammar
[Figure: grammar, not recoverable from the extracted text]
Derivation rules (each illustrated on its slide by a small graph over the named nodes):
• {i(x1,y1) + }i(y2,x2) => CRy1,y2(x1,x2)
• CRy,y(x1,x2) => UR(x1,x2)
• CRy1,y2(x1,x2) + CRz1,z2(y1,y2) => CRz1,z2(x1,x2)
• CRy1,y2(x1,x2) + UR(x0,x1) => CRy1,y2(x0,x2)
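The four rules shown on these slides can be run as a fixpoint over sets of facts. The following is a minimal sketch that covers only these four rules, not the complete TAL grammar; the tuple encoding (y1, y2, x1, x2) for CRy1,y2(x1,x2) and the name derive are assumptions made for illustration.

```python
def derive(opens, closes, ur=()):
    """Apply the four slide rules until no new fact appears.

    opens:  set of (i, x1, y1) for an edge x1 -{i-> y1
    closes: set of (i, y2, x2) for an edge y2 -}i-> x2
    ur:     initial unconditional facts (x, y)
    Returns (cr, ur), where (y1, y2, x1, x2) in cr means CRy1,y2(x1,x2).
    """
    ur = set(ur)
    # Rule 1: {i(x1,y1) + }i(y2,x2) => CRy1,y2(x1,x2)
    cr = {(y1, y2, x1, x2)
          for (i, x1, y1) in opens
          for (j, y2, x2) in closes if i == j}
    changed = True
    while changed:
        new_cr, new_ur = set(), set()
        for (y1, y2, x1, x2) in cr:
            if y1 == y2:
                new_ur.add((x1, x2))               # Rule 2: CRy,y => UR
            for (z1, z2, p1, p2) in cr:
                if (p1, p2) == (y1, y2):
                    new_cr.add((z1, z2, x1, x2))   # Rule 3: CR + CR => CR
            for (x0, t) in ur:
                if t == x1:
                    new_cr.add((y1, y2, x0, x2))   # Rule 4: CR + UR => CR
        changed = not (new_cr <= cr and new_ur <= ur)
        cr |= new_cr
        ur |= new_ur
    return cr, ur

# A call whose callee body is the single node y: entry and exit coincide,
# so the condition is trivially satisfied and rule 2 yields UR(x1,x2).
cr_facts, ur_facts = derive({(1, "x1", "y")}, {(1, "y", "x2")})
```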
Reachability Analysis for Callbacks
Keeping reachability between only boundary nodes is not sufficient
Chaining nodes & hidden chaining nodes
• Chaining nodes (“connectors”): x1, x2
• Hidden chaining nodes (“start/end nodes”): x0, x3
[Figure: path over the nodes x0, x1, x2, x3 with edges labeled {2, }2, {3, }3, {4, }4]
Reachability Analysis for Callbacks
[Figure: library graph with the nodes a, b, c, d, p, q, r, s and the edges {2 {3 {4 … }4 }3 }2]
Pairing each call edge with its matching return edge:
• {2 + }2 => CRp,q(a,b)
• {3 + }3 => CRr,s(p,q)
• {4 + }4 => CRc,d(r,s)
Composing conditional reachability:
• CRp,q(a,b) + CRr,s(p,q) => CRr,s(a,b)
• CRr,s(p,q) + CRc,d(r,s) => CRc,d(p,q)
• CRr,s(a,b) + CRc,d(r,s) => CRc,d(a,b)
Reachability Analysis for Callbacks
Derived facts: CRp,q(a,b), CRr,s(p,q), CRc,d(r,s), CRr,s(a,b), CRc,d(p,q), CRc,d(a,b)
• Redundant reachability relationships
[Figure: library graph with the nodes a, b, c, d, p, q, r, s and the edges {2 {3 {4 … }4 }3 }2, with the boundary nodes marked]
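One way to picture the pruning of redundant relationships is to keep only the facts whose nodes all belong to a retained node set. This is a simplified sketch, assuming a, b, c, d are the nodes to retain in this example; in general the retained set also contains chaining and hidden chaining nodes, and the name restrict_to is hypothetical.

```python
# The six conditional facts derived for the example,
# encoded as (cond_src, cond_dst, src, dst).
CR = {
    ("p", "q", "a", "b"), ("r", "s", "p", "q"), ("c", "d", "r", "s"),
    ("r", "s", "a", "b"), ("c", "d", "p", "q"), ("c", "d", "a", "b"),
}

def restrict_to(cr, kept_nodes):
    """Keep only facts that mention retained nodes exclusively."""
    return {fact for fact in cr if all(n in kept_nodes for n in fact)}

# Assumption: only a, b, c, d need to be retained here.
pruned = restrict_to(CR, {"a", "b", "c", "d"})
```

Of the six facts, only CRc,d(a,b) survives; the other five mention the internal nodes p, q, r, s.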
Reachability Analysis for Callbacks: Evaluation: 15 subjects
Fundamental nodes: <10%
• Fundamental nodes
  • Boundary nodes
  • Chaining nodes
  • Hidden chaining nodes
Reachability Analysis for Callbacks: Evaluation: library summarization
• 3.16X slow-down
• More memory required
Reachability Analysis for Callbacks: Evaluation: client-code analysis
• 8.24X speed-up
• Less memory required