43
String Analysis for String Analysis for Binaries Binaries Mihai Christodorescu Nicholas Kidd Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

Embed Size (px)

DESCRIPTION

6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE Why Do We Need String Analysis? We could just use the strings program: $ strings no_msg /lib/ld-linux.so.2 libc.so.6... no msg This food has %s. $  

Citation preview

Page 1: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

String Analysis for BinariesString Analysis for Binaries

Mihai ChristodorescuNicholas KiddNicholas KiddWen-Han Goh

University of Wisconsin, Madison

Page 2: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 2

What is What is String AnalysisString Analysis??• Recovery of values a string variable

might take at a given program point.void main( void ){char * msg = “no msg”;printf( “This food has %s.\n”, msg );

}

Output: This food has no msg.

Page 3: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 3

Why Do We Need String Why Do We Need String Analysis?Analysis?

• We could just use the strings program:$ strings no_msg/lib/ld-linux.so.2libc.so.6......no msgThis food has %s.$

Page 4: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 4

Why Perform String Analysis?Why Perform String Analysis?• Computer forensics

Given an unknown program, we want to know the files it might access,the registry keys it might get and set,the commands it might execute.

• Program verificationSQL queries, embedded scripting, …

Page 5: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 5

A Complicated ExampleA Complicated Examplevoid main( void ){char buf[257];strcpy( buf, “/” );strcat( buf, “b” );strcat( buf, “i” );strcat( buf, “n” );...system( buf );

}

Running strings:/abcd...

Running a string analysis:/bin/ifconfig –a |

/bin/mail ...@...

Page 6: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 6

Our ContributionsOur Contributions• Developed a string analysis for

binaries.

• Implemented x86sa, a string analyzer for Intel IA-32 binaries.

• Evaluated on both benign and malicious binaries.

Page 7: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 7

OutlineOutline String analysis for Java.

String analysis for x86.

Evaluation.

Applications & future work.

Page 8: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 8

String Analysis for JavaString Analysis for Java1.Create string flowgraph.void main( void ){

String x = “/”;x = x + “b”;x = x + “i”;x = x + “n”;...System.exec( x );

}

Christensen, Møller, Schwartzbach "Precise Analysis of String Expressions" (SAS’03)

“/” “b”

concat “i”

concat...

Page 9: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 9

String Analysis for Java [2]String Analysis for Java [2]2. Create context-free grammar.3. Approximate with finite automaton.

“/” “b”

concat “i”

concat...

A1 “/” “b”A2 A1 “i”A3 A2 “n”A4 A3 “/”…

“/bin/ifconfig ...”

Page 10: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 10

From Java to x86 executablesFrom Java to x86 executables

Flowgraph CFG FSAJava.class

Page 11: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 11

From Java to x86 executablesFrom Java to x86 executables

Rest of this talk:Bridge the syntactic and semantic gaps between Java and assembly language.

x86executable Flowgraph CFG FSA

Page 12: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 12

OutlineOutline String analysis for Java.

String analysis for x86.

Evaluation.

Applications & future work.

Page 13: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 13

Four Problems with AssemblyFour Problems with Assembly1. No types.

2. No high-level constructs.

3. No argument passing convention.

4. No Java string semantics.

Page 14: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 14

Problem 1: Problem 1: No TypesNo Types• Solution: infer types from C lib. funcs.

Assumption #1:Strings are manipulated only using string library functions.

char * strcat( char * dest, char * src )• After: “eax” points to a string.• Before: “dest” and “src” point to a

string.

Page 15: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 15

Problem 1: Problem 1: No Types [cont.]No Types [cont.]• Perform a backwards analysis to find

the strings:– Destination registers “kill” string type

information.– Libc string functions “gen” string type

information.– Strings at entry to CFG are constant

strings or function parameters.

Page 16: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 16

Problem 1: Problem 1: No Types No Types [example][example]

String variables:– after the call: { eax }– before the call: { ebx, ecx }

eax = _strcat( ebx, ecx );

Page 17: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 17

Problem 2: Problem 2: Function Function ParametersParameters

• Function parameters are not explicit in x86 machine code.

mov ecx, [ebp+var1]push ecxmov ebx, [ebp+var2]push ebxcall _strcatadd esp, 8

Page 18: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 18

Problem 2: Problem 2: Fn. Params Fn. Params [example][example]

mov ecx, [ebp+var1]push ecxmov ebx, [ebp+var2]push ebxcall _strcatadd esp, 8

Stack pointer push ecx

push ebx

Page 19: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 19

mov ecx, [ebp+var1]push ecxmov ebx, [ebp+var2]push ebxcall _strcatadd esp, 8

Problem 2: Problem 2: Function Function ParametersParameters

• Solution: Perform forwards analysis modeling x86 instructions effects on the stack.

Page 20: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 20

Problem 3: Problem 3: Unmodeled Unmodeled FunctionsFunctions

• String type information and stack model may be incorrect!

Assumption #2:“_cdecl” calling convention and well behaved functions

• Treat all function arguments and return values as strings.

Page 21: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 21

Problem 4: Problem 4: Java vs. x86 Java vs. x86 SemanticsSemantics

• Java strings are immutable,x86 strings are not.

String y, x=“x”;y = x;y = y + “123”;System.out.println(x);

char *y;char x[10] = “x”;y = x;y = strcat(y,”123”);printf(x);

=> “x” => “x123”

Page 22: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 22

Problem 4: Problem 4: Java vs. x86 Java vs. x86 SemanticsSemantics

• Solution: May-Must alias analysis.

0x200: _strcat( ebx,ecx )

eax

ebx

esi

200

eax

ebx

esi

“May alias” relations:

Page 23: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 23

Problem 4: Problem 4: Java vs. x86 Java vs. x86 SemanticsSemantics

• Solution: May-Must alias analysis.

0x200: _strcat( ebx,ecx )

eax

ebx

esi

200

eax

ebx

esi

“Must alias” relations:

Page 24: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 24

Stack HeightTransformers

String TypeTransformersAlias AnalysisTransformers

UTF-8 func’s

Stack HeightTransformers

String TypeTransformersAlias AnalysisTransformers

libc char* func’s

Stack HeightTransformers

String TypeTransformersAlias AnalysisTransformers

Unicode func’s

x86sax86sa Architecture Architecture

EXE

IDA Pro Connector WPDS++

JSA

Page 25: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 25

IntraIntraprocedural Analysis procedural Analysis SummarySummary

1. Recover callsite arguments.(stack-operation modeling)

2. Infer string types.(backward type analysis)

3. Discover aliases.(may-, must-alias forward analysis)

Generate the String Flow Graph for the Control Flow Graph.

Page 26: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 26

OutlineOutline String analysis for Java.

String analysis for x86.

Evaluation.

Applications & future work.

Page 27: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 27

Example 1: Example 1: simplesimplechar * s1 = "Argc has ";char * s2;char * s3 = " arguments";char * s4;switch( argc ) { case 1: s2 = "1“ ; break; case 2: s2 = "2“ ; break; default: s2 = "> 2"; break;}s4 = malloc( strlen(s1)+strlen(s2)+strlen(s3)+1 );s4[0] = 0;strcat( strcat( strcat( s4, s1 ), s2), s3 );printf( "%s\n", s4 );

Page 28: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 28

Example 1: String Flow GraphExample 1: String Flow Graph“Argc has”

“ arguments”

“1”

concat

concat

concat

malloc “2” “> 2”

assign

Our result:"Argc has 1 arguments""Argc has 2 arguments""Argc has > 2 arguments"

Page 29: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 29

Example 2: Example 2: cstarcstarchar * c = “c";char * s4 = malloc(101);for( int i=0; i < 100 ; i++ ) { strcat( s4, c );}printf( "%s\n", s4 );

Page 30: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 30

Example 2: String Flow GraphExample 2: String Flow Graph

“c”

concat

malloc

assign

Our result: c*Correct answer: c100

Page 31: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 31

Example 3: Example 3: Lion WormLion Worm• Code and String Flow Graph omitted.

• x86sa analysis results:"/sbin/ifconfig -a|/bin/mail [email protected]"

Page 32: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 32

Future Work:Future Work:InterInterprocedural Analysisprocedural Analysis

1. Inline everything and apply intra-procedural analysis.

2. “Hook” intraprocedural String Flow Graphs into a “Super String Flow Graph”.

3. Polyvariant analysis over function summaries for String Flow Graphs.

Page 33: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 33

Future Work:Future Work:Relax AssumptionsRelax Assumptions

• Relax the assumptions:– Strings can be manipulated in many ways.– Calling conventions can vary in a

program.

• Value Set Analysis looks promising:– Identifies “variables” based on usage

patterns.

Page 34: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 34

Future Work:Future Work:More ApplicationsMore Applications

• Malicious code analysis

• Analysis of dynamic code generators:– Packed programs– Shell code generators

Page 35: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 35

What to Take Away:What to Take Away:• If you have type information, AND

arguments to calls, ANDno aliasing

then string analysis is straightforward.

• String analysis on binaries performs well and provides useful results.

Page 36: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

String Analysis for BinariesString Analysis for Binaries

Mihai ChristodorescuNicholas KiddNicholas KiddWen-Han Goh

University of Wisconsin, Madison{ mihai, kiddkidd, wen-han }@cs.wisc.edu

Page 37: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 37

From Java to x86 BinariesFrom Java to x86 Binaries• Input: x86 binary file

• Java string analyzer input: Flowgraph• Output: finite automaton

?

Page 38: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 38

Problem 4: Java vs. x86 Problem 4: Java vs. x86 SemanticsSemantics

• Solution: Alias analysis.0x100: mov eax,ecx

• λS.let A = aliases(ecx)S – {(eax,*)} ({eax} A)

0x200: _strcat( ebx, ecx )• λS. let A = alias(ebx)

let N = vars(A) (S (N {200})) – {(ebx,*),{(eax,*)}

{(ebx,200),(eax,200)}

Page 39: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 39

Example 1: x86sa Analysis Example 1: x86sa Analysis ResultsResults

1. "Argc has 1 arguments"2. "Argc has 2 arguments"3. "Argc has > 2 arguments"

Page 40: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 40

cstar’s x86sa analysis resultscstar’s x86sa analysis results• Infinitely many strings with common

prefix: “c”

Page 41: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 41

Transformers – String Transformers – String InferenceInference

• 0x200: _strcat( ebx, ecx )• String inference

λd.{ d – {eax} {ebx,ecx} }

Page 42: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 42

Transformers – Alias AnalysisTransformers – Alias Analysis• 0x200: _strcat( ebx, ecx )

• ebx mustlet T = aliasmust(ebx)let V = aliasmay(ebx)λd. { must(d) {(ebx, *), (eax, *)}

{(ebx, 200), (eax, 200)} -aT{(a,*)} aT{(a,200)},

(may(d) {(ebx, *), (eax, *)} aV{(a,200)})

Page 43: String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 43

Transformers – Alias AnalysisTransformers – Alias Analysis• 0x200: _strcat( ebx, ecx )

• ebx maylet T = aliasmust(ebx)let V = aliasmay(ebx)λd. { must(d) {(eax, *)} -aT{(a,*)}

{(ebx, 200), (eax, 200)}, {may(d) {(ebx, *), (eax, *)}

aT{(a,200),(a,old(a)} aV{(a,200)}}