Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience
Lab for Internet & Security Technology (LIST), Northwestern University
2
The Spread of Sapphire/Slammer Worms
3
Desired Requirements for Polymorphic Worm Signature Generation
•Network-based signature generation
–Worms spread at exponential speed, so detecting them at an early stage is crucial… However
»At their early stage there are limited worm samples.
–A high-speed network router may see more worm samples… But
»Need to keep up with the network speed!
»Can only use network-level information
4
Desired Requirements for Polymorphic Worm Signature Generation
No existing work satisfies these requirements!
•Noise tolerant
–Most network flow classifiers suffer false positives.
–Even host-based approaches can be injected with noise.
•Attack resilience
–Attackers always try to evade detection systems
•Efficient signature matching for high-speed links
5
Outline
•Motivation
•Hamsa Design
•Model-based Signature Generation
•Evaluation
•Related Work
•Conclusion
6
Choice of Signatures
•Two classes of signatures
–Content based
»Token: a substring with reasonable coverage of the suspicious traffic
»Signatures: conjunction of tokens
–Behavior based
•Our choice: content based
–Fast signature matching: ASIC-based approaches can achieve 6–8 Gb/s
–Generic, independent of any protocol or server
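A conjunction-of-tokens signature matches a flow only if every token occurs in it; a minimal sketch (the function name is illustrative, and the token list is the Code-Red II signature reported later in the deck):

```python
def matches(signature, flow):
    """A flow matches a conjunction signature iff it contains
    every token of the signature as a substring."""
    return all(token in flow for token in signature)

# Tokens of the Code-Red II signature shown in the evaluation slides.
code_red2 = [".ida?", "%u780", " HTTP/1.0\r\n", "GET /"]
```

Matching cost per flow is then a handful of substring searches, which is what makes ASIC- or multi-pattern-based deployment on high-speed links practical.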
7
Unique Invariants of Worms
•Protocol Frame
–The code path to the vulnerable part, usually infrequently used
–Code-Red II: ‘.ida?’ or ‘.idq?’
•Control Data: leading to control-flow hijacking
–Hard-coded value to overwrite a jump target or a function call
•Worm Executable Payload
–CLET polymorphic engine: ‘0\x8b’, ‘\xff\xff\xff’ and ‘t\x07\xeb’
•Possible to have worms with no such invariants, but very hard
8
Hamsa Architecture
[Diagram: Network Tap → Known Worm Filter → Protocol Classifier (TCP 80, TCP 25, TCP 53, TCP 137, UDP 1434, …) → Worm Flow Classifier → Suspicious Traffic Pool; a normal traffic reservoir feeds the Normal Traffic Pool; both pools feed the Hamsa Signature Generator, which outputs Signatures. Real time; policy driven.]
9
Components from existing work
•Worm flow classifiers
–Scan-based detector [Autograph]
–Byte-spectrum-based approach [PAYL]
–Honeynet/Honeyfarm sensors [Honeycomb]
10
Hamsa Design
•Key idea: model the uniqueness of worm invariants
–Greedy algorithm for finding token-conjunction signatures
•Highly accurate while much faster
–Both analytically and experimentally
–Compared with the latest work, Polygraph
–Suffix-array-based token extraction
•Provable attack resilience guarantee
•Noise tolerant
11
Outline
•Motivation
•Hamsa Design
•Model-based Signature Generation
•Evaluation
•Related Work
•Conclusion
12
Hamsa Signature Generator
•Core part: Model-based Greedy Signature Generation
•Iterative approach for multiple worms
[Flowchart: Suspicious Traffic Pool → Filter → “Pool size too small?” (YES → quit; NO → continue) → Token Extractor → tokens → Core: Signature Token Identification (against the Normal Traffic Pool) → Signature Refiner → signature]
13
Problem Formulation
Signature Generator: given a suspicious pool and a normal pool, output a signature that maximizes the coverage in the suspicious pool while the false positive rate in the normal pool stays below a given bound.
•With noise: NP-Hard!
•Without noise: can be solved in linear time using token extraction
14
Model Uniqueness of Invariants
Example token statistics:
–Candidate t1 tokens, FP in normal traffic: 21%, 9%, 17%, 5%
–Candidate t2 tokens, joint FP with t1: 2%, 0.5%, 1%
U(1) = upper bound of FP(t1); U(2) = upper bound of FP(t1, t2)
The total number of tokens is bounded by k*
15
Signature Generation Algorithm
Suspicious pool → token extraction → tokens, ordered by coverage (COV, FP):
(82%, 50%), (70%, 11%), (67%, 30%), (62%, 15%), (50%, 25%), (41%, 55%), (36%, 41%), (12%, 9%)
u(1) = 15% → t1 is the highest-coverage token with FP ≤ u(1): (70%, 11%)
16
Signature Generation Algorithm
With t1 = (70%, 11%) chosen, the remaining tokens are ordered by joint coverage with t1 (COV, FP):
(69%, 9.8%), (68%, 8.5%), (67%, 1%), (40%, 2.5%), (35%, 12%), (31%, 9%), (10%, 0.5%)
u(2) = 7.5% → t2 is the highest joint-coverage token with joint FP ≤ u(2): (67%, 1%)
Signature: {t1, t2}
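The selection steps on these two slides can be sketched as a brute-force Python loop (illustrative names and parameters; coverage here is recomputed by scanning the pools, whereas Hamsa obtains these counts from suffix arrays):

```python
def greedy_signature(tokens, suspicious, normal, u, rho, k_star):
    """Model-based greedy selection sketch.
    u[i] is the FP upper bound allowed for a signature of i+1 tokens;
    rho is the target false-positive bound on the normal pool."""
    def cov(sig, pool):
        # fraction of flows containing every token of sig
        return sum(all(t in f for t in sig) for f in pool) / len(pool)

    sig = []
    for i in range(k_star):
        # candidates that keep the joint FP under the model bound u(i+1)
        ok = [t for t in tokens
              if t not in sig and cov(sig + [t], normal) <= u[i]]
        if not ok:
            break
        # pick the candidate maximizing joint coverage of the suspicious pool
        sig.append(max(ok, key=lambda t: cov(sig + [t], suspicious)))
        if cov(sig, normal) <= rho:  # FP goal reached, stop early
            break
    return sig
```

Each round mirrors the slides: restrict to tokens whose joint false positive stays under u(i), then take the one with the best joint coverage.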
17
Algorithm Runtime Analysis
•Preprocessing: O(m + n + T*l + T*(|M|+|N|))
•Running time: O(T*(|M|+|N|))
–In most cases |M| < |N|, so it reduces to O(T*|N|)
T: the # of tokens; l: the maximum length of a token
|M|: the # of flows in the suspicious pool; |N|: the # of flows in the normal pool
m: the # of bytes in the suspicious pool; n: the # of bytes in the normal pool
18
Provable Attack Resilience Guarantee
•Proved the worst-case bound on false negatives given the false positive bound
•Analytically bounds the worst attackers can do!
•Example: k*=5, u(1)=0.2, u(2)=0.08, u(3)=0.04, u(4)=0.02, u(5)=0.01 and ρ=0.01
•The better the flow classifier, the lower the false negatives

Noise ratio | FP upper bound | FN upper bound
5% | 1% | 1.84%
10% | 1% | 3.89%
20% | 1% | 8.75%
19
Attack Resilience Assumptions
•Common assumptions for any signature generation system
1. The attacker cannot control which worm samples are encountered by Hamsa
2. The attacker cannot control which worm samples encountered will be classified as worm samples by the flow classifier
•Unique assumptions for token-based schemes
1. The attacker cannot change the frequency of tokens in normal traffic
2. The attacker cannot control which normal samples encountered are classified as worm samples by the worm flow classifier
20
Attack Resilience Assumptions
•Attacks on the flow classifier
–Our approach does not depend on perfect flow classifiers
–But with 99% noise, no approach can work!
–High noise injection makes the worm propagate less efficiently.
•Enhance flow classifiers
–Cluster suspicious flows by return messages
–Information-theory-based approaches (DePaul Univ)
21
Generalizing Signature Generation with noise
•BEST signature = balanced signature
–Balance the sensitivity with the specificity
–Define a scoring function score(COV, FP, …) to evaluate the goodness of a signature
–Currently used: score(COV, FP, LEN) = COV · FP^(−log10 a) + ε·LEN
»Intuition: it is better to reduce the coverage by 1/a if the false positive becomes 10 times smaller.
»The small weight on the signature length (LEN) breaks ties between signatures with the same coverage and false positive
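A sketch of such a scoring function. The exponent is the one that makes the stated trade-off exact (dividing coverage by a while dividing false positive by 10 leaves the score unchanged); the constants a and eps are illustrative assumptions, not values from the paper:

```python
import math

def score(cov, fp, length, a=5.0, eps=1e-6):
    """score(COV, FP, LEN) = COV * FP^(-log10 a) + eps * LEN.
    The first term is invariant under (cov -> cov/a, fp -> fp/10);
    the eps*LEN term only breaks ties between equal (cov, fp) pairs."""
    fp = max(fp, 1e-9)  # guard against fp == 0 in the power term
    return cov * fp ** (-math.log10(a)) + eps * length
```

Because fp < 1 and the exponent is negative, shrinking the false positive raises the score, and raising coverage raises it too, which is exactly the balance the slide asks for.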
22
Hamsa Signature Generator
Next: token extraction and token identification
[Flowchart: Suspicious Traffic Pool → Filter → Token Extractor → tokens → Core: Signature Token Identification (against the Normal Traffic Pool) → Signature Refiner]
23
Token Extraction
•Problem formulation:
–Input: a set of strings, a minimum length l, and a minimum coverage COVmin
–Output:
»A set of tokens (substrings) meeting the minimum length and coverage requirements
•Coverage: the portion of strings containing the token
»Corresponding sample vectors for each token
•Main techniques:
–Suffix array
–LCP (Longest Common Prefix) array and LCP intervals
–Token Extraction Algorithm (TEA)
24
Suffix Array
•Illustration by an example
–String1: abrac, String2: adabra
–Concatenated: abracadabra$
–All suffixes: a$, ra$, bra$, abra$, dabra$, …
–Sort all the suffixes (table below)
–4n space
–Sorting can be done in 4n space and O(n log(n)) time
a 10
abra 7
abracadabra 0
acadabra 3
adabra 5
bra 8
bracadabra 1
cadabra 4
dabra 6
ra 9
racadabra 2
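The sorted-suffix table above can be reproduced with a naive construction (fine for illustration; the real system builds an enhanced suffix array far more efficiently):

```python
def suffix_array(s):
    """Suffix start positions, sorted by the suffix they start.
    Naive O(n^2 log n) construction, for illustration only."""
    return sorted(range(len(s)), key=lambda i: s[i:])

sa = suffix_array("abracadabra$")
# sa[0] is the '$' suffix; sa[1:] gives the slide's rows:
# a(10), abra(7), abracadabra(0), acadabra(3), adabra(5), ...
```

Any token that occurs in the string corresponds to a contiguous run of entries in this array, which is what the later LCP-interval and binary-search steps exploit.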
25
LCP Array and LCP Intervals
Suffixes sufarr lcparr idx str
a 10 - (0) 0 2
abra 7 1 1 2
abracadabra 0 4 2 1
acadabra 3 1 3 1
adabra 5 1 4 2
bra 8 0 5 2
bracadabra 1 3 6 1
cadabra 4 0 7 1
dabra 6 0 8 2
ra 9 0 9 2
racadabra 2 2 10 1
0-[0,10]
1-[0,4] 3-[5,6] 2-[9,10]
4-[1..2]
LCP intervals => tokens
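The lcparr column above can be computed in linear time with Kasai's algorithm; a sketch (a naive suffix array is built inline for illustration):

```python
def lcp_array(s, sa):
    """Kasai's algorithm: lcp[i] is the length of the longest common
    prefix of the suffixes at sa[i-1] and sa[i] (lcp[0] = 0). O(n)."""
    n = len(s)
    rank = [0] * n
    for pos, suf in enumerate(sa):
        rank[suf] = pos
    lcp, h = [0] * n, 0
    for i in range(n):            # walk suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]   # lexicographic predecessor
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            h = max(h - 1, 0)     # common prefix shrinks by at most 1
        else:
            h = 0
    return lcp

s = "abracadabra$"
sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix array
```

Maximal runs where the LCP stays at or above a value form the LCP intervals, and each interval names one candidate token with its occurrence count.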
26
Token Extraction Algorithm (TEA)
•Find eligible LCP intervals first
•Then find the tokens
27
Token Extraction Algorithm (TEA)
28
Token Extraction Algorithm (TEA)
29
Token Identification
•For normal traffic, pre-compute and store the suffix array offline
•For a given token, binary search in the suffix array gives the corresponding LCP interval
•O(log(n)) time complexity
–A more sophisticated O(1) algorithm is possible, but may require more space
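Counting a token's occurrences in the precomputed suffix array amounts to two binary searches that bracket the interval of suffixes starting with the token; a sketch (naive suffix array used for illustration):

```python
def count_token(s, sa, token):
    """Occurrences of token in s. Suffixes starting with token form a
    contiguous interval of the suffix array; two binary searches find
    its ends in O(|token| * log n)."""
    m, n = len(token), len(sa)
    lo, hi = 0, n
    while lo < hi:                       # first suffix whose m-prefix >= token
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < token:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, n
    while lo < hi:                       # first suffix whose m-prefix > token
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= token:
            lo = mid + 1
        else:
            hi = mid
    return lo - start

s = "abracadabra$"
sa = sorted(range(len(s)), key=lambda i: s[i:])
```

Dividing such a count by the number of flows in the normal pool gives the token's false positive rate used when checking the U-bounds.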
30
Implementation Details
•Token extraction: extract a set of tokens with minimum length l and minimum coverage COVmin.
–Polygraph uses a suffix-tree-based approach: 20n space and time consuming.
–Our approach: enhanced suffix array, 8n space and much faster (at least 20 times)!
•Calculate the false positive rate when checking U-bounds (token identification)
–Again suffix-array-based, but for a 300MB normal pool a 1.2GB suffix array is still large!
–Optimization: using MMAP, memory usage is 150–250MB
31
Hamsa Signature Generator
[Flowchart: Suspicious Traffic Pool → Filter → Token Extractor → tokens → Core: Signature Token Identification (against the Normal Traffic Pool) → Signature Refiner]
Next: signature refinement
32
Signature Refinement
•Why refinement?
–Produce a signature with the same sensitivity but better specificity
•How?
–After the core algorithm produces the greedy signature, we believe the samples matched by it are all worm samples
–This reduces to signature generation without noise: do another round of token extraction
33
Extend to Detect Multiple Worms
•Iteratively use the single-worm detector to detect multiple worms
–In the first iteration, the algorithm finds the signature for the most popular worm in the suspicious pool.
–All other worms and normal traffic are treated as noise
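The iteration can be sketched as follows (generate_one stands for the single-worm generator; MIN_POOL is an illustrative stopping threshold, not a value from the paper):

```python
MIN_POOL = 3  # illustrative: quit when the residual pool is this small

def generate_all(suspicious, normal, generate_one):
    """Run the single-worm generator repeatedly. Flows matched by each
    new signature are removed from the pool, so the next most popular
    worm surfaces in the following round."""
    signatures, pool = [], list(suspicious)
    while len(pool) >= MIN_POOL:
        sig = generate_one(pool, normal)
        if not sig:
            break
        signatures.append(sig)
        # drop flows matched by the signature (conjunction of its tokens)
        pool = [f for f in pool if not all(t in f for t in sig)]
    return signatures
```

Each pass sees the remaining worms plus normal flows as noise, which is why the single-worm noise tolerance carries the whole scheme.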
34
Practical Issues on Data Normalization
•Typical cases that need data normalization
–IP packet fragmentation
–TCP flow reassembly (to defend against fragroute)
–RPC fragmentation
–URL obfuscation
–HTML obfuscation
–Telnet/FTP evasion via \backspace or \delete keys
•Normalization translates data into a canonical form
35
Practical Issues on Data Normalization (II)
•Hamsa with data normalization works better
•Without or with weak data normalization, Hamsa still works
–But because the data may have different forms of encoding, it may produce multiple signatures for a single worm
–Needs sufficient samples for each form of encoding
36
Outline
• Motivation
• Hamsa Design
• Model-based Signature Generation
• Evaluation
• Related Work
• Conclusion
37
Experiment Methodology
•Experimental setup:
–Suspicious pool:
»Three pseudo-polymorphic worms based on real exploits (Code-Red II, Apache-Knacker and ATPhttpd)
»Two polymorphic engines from the Internet (CLET and TAPiON)
–Normal pool: 2-hour departmental HTTP trace (326MB)
•Signature evaluation:
–False negatives: 5000 generated worm samples per worm
–False positives:
»4-day departmental HTTP trace (12.6GB)
»3.7GB web crawl including .mp3, .rm, .ppt, .pdf, .swf etc.
»/usr/bin of Linux Fedora Core 4
38
Results on Signature Quality
•Single worm with noise
–Suspicious pool size: 100 and 200 samples
–Noise ratio: 0%, 10%, 30%, 50%, 70%
–Noise samples randomly picked from the normal pool
–We always obtained the signatures and accuracy below.

Worms | Training FN | Training FP | Evaluation FN | Evaluation FP | Binary evaluation FP | Signature
Code-Red II | 0 | 0 | 0 | 0 | 0 | {'.ida?': 1, '%u780': 1, ' HTTP/1.0\r\n': 1, 'GET /': 1, '%u': 2}
CLET | 0 | 0.109% | 0 | 0.06236% | 0.268% | {'0\x8b': 1, '\xff\xff\xff': 1, 't\x07\xeb': 1}
39
Results on Signature Quality (II)
•Suspicious pool with high noise ratio:
–For noise ratios of 50% and 70%, we sometimes produce two signatures: one is the true worm signature, the other comes solely from noise, due to the locality of the noise.
–The false positives of these noise signatures are very small:
»Mean: 0.09%
»Maximum: 0.7%
•Multiple worms with noise give similar results
40
Experiment: U-bound evaluation
•To be conservative we chose k*=15.
–u(k*) = u(15) = 9.16×10⁻⁶
•u(1) and ur evaluation, with u(i) = u(1)·ur^(i−1) for 1 ≤ i ≤ k*
–We tested: u(1) = [0.02, 0.04, 0.06, 0.08, 0.10, 0.20, 0.30, 0.40, 0.5]
–and ur = [0.20, 0.40, 0.60, 0.8]
–The minimum (u(1), ur) that works for all our worms was (0.08, 0.20)
–In practice, we use the conservative values (0.15, 0.5)
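The quoted u(k*) follows from a geometric schedule u(i) = u(1)·ur^(i−1), which is consistent with u(15) = 9.16×10⁻⁶ for the practical values u(1) = 0.15, ur = 0.5; a quick check:

```python
def u_bounds(u1, ur, k_star):
    """Geometric FP upper-bound schedule: u(i) = u1 * ur**(i-1),
    for i = 1 .. k_star."""
    return [u1 * ur ** (i - 1) for i in range(1, k_star + 1)]
```

Each additional token is thus required to shrink the joint false positive by the factor ur, which is what drives the attack-resilience bounds.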
41
Speed Results
•Implementation in C++/Python
–500 samples with 20% noise and a 100MB normal traffic pool: 15 seconds on a 2.8GHz Xeon, 112MB memory consumption
•Speed comparison with Polygraph
–Asymptotic runtime: O(T) vs. O(|M|²); when |M| increases, T won’t increase as fast as |M|!
–Experimental: 64 to 361 times faster (Polygraph vs. ours, both in Python)
[Chart: number of tokens (0–3000) vs. suspicious pool size (100–400) at 20%, 30%, 40% and 50% noise]
42
Experiment: Sample requirement
•Coincidental-pattern attack [Polygraph]
•Results
–For the three pseudo worms, 10 samples give good results
–CLET and TAPiON need at least 50 samples
•Conclusion
–For better signatures, to be conservative, at least 100+ samples are needed
–This requires scalable and fast signature generation!
43
Token-fit Attack Can Fail Polygraph
•Polygraph: hierarchical clustering to find signatures with the smallest false positives
•Using the token distribution of the noise in the suspicious pool, the attacker can make worm samples look more like noise traffic
–Different worm samples encode different noise tokens
•Our approach still works!
44
Token-fit attack could make Polygraph fail
[Diagram: noise samples N1, N2, N3 and worm samples W1, W2, W3; hierarchical clustering produces merge candidates 1–3 that mix worm and noise samples; it CANNOT merge further, so NO true signature is found!]
45
Experiment: Token-fit attack
•Suspicious pool of 50 samples with 50% noise
•Different worm samples are crafted to resemble different noise samples
•Results
–Polygraph: 100% false negatives
–Hamsa still gets the correct signature as before!
46
Outline
• Motivation
• Hamsa Design
• Model-based Signature Generation
• Evaluation
• Related Work
• Conclusion
47
Related works

| | Hamsa | Polygraph | CFG | PADS | Nemean | COVERS | Malware Detection |
| Network or host based | Network | Network | Network | Host | Host | Host | Host |
| Content or behavior based | Content | Content | Behavior | Content | Content | Behavior | Behavior |
| Noise tolerance | Yes | Yes (slow) | Yes | No | No | Yes | Yes |
| Multi worms in one protocol | Yes | Yes (slow) | Yes | No | Yes | Yes | Yes |
| On-line sig matching | Fast | Fast | Slow | Fast | Fast | Fast | Slow |
| Generality | General purpose | General purpose | General purpose | General purpose | Protocol specific | Server specific | General purpose |
| Provable atk resilience | Yes | No | No | No | No | No | No |
Information exploited
48
Conclusion
•Network-based signature generation and matching are important and challenging
•Hamsa: automated signature generation
–Fast
–Noise tolerant
–Provable attack resilience
–Capable of detecting multiple worms in a single application protocol
•Proposed a model to describe the worm invariants

Questions?
50
Results on Signature Quality (II)
•Suspicious pool with high noise ratio:
–For noise ratios of 50% and 70%, we sometimes produce two signatures: one is the true worm signature, the other comes solely from noise.
–The false positives of these noise signatures are very small:
»Mean: 0.09%
»Maximum: 0.7%
•Multiple worms with noise give similar results
51
Normal Traffic Poisoning Attack
•We found our approach is not sensitive to the normal traffic pool used
•History: last 6 months’ time window
•The attacker has to poison the normal traffic 6 months ahead!
•Within 6 months the vulnerability may have been patched!
•Poisoning a popular protocol is very difficult.
52
Red Herring Attack
•Hard to implement
•Dynamic updating problem; again, our approach is fast
•Partial signature matching, in the extended version
53
Coincidental Attack
•As mentioned in the Polygraph paper, this increases the sample requirement
•Again, our approach is scalable and fast
54
Model Uniqueness of Invariants
•Let the worm have a set of invariant tokens. Determine their order by:
t1: the token with the minimum false positive in normal traffic; u(1) is the upper bound of the false positive of t1:
  FP({t1}) ≤ FP({tj})  ∀j
t2: the token with the minimum joint false positive with t1; FP({t1, t2}) is bounded by u(2):
  FP({t1, t2}) ≤ FP({t1, tj})  ∀j > 1
ti: the token with the minimum joint false positive with {t1, t2, …, ti−1}; FP({t1, t2, …, ti}) is bounded by u(i):
  FP({t1, …, ti−1, ti}) ≤ FP({t1, …, ti−1, tj})  ∀j > i − 1
The total number of tokens is bounded by k*
55
Problem Formulation
Noisy Token Multiset Signature Generation Problem:
INPUT: Suspicious pool M and normal traffic pool N; a value ρ < 1.
OUTPUT: A multiset-of-tokens signature S = {(t1, n1), …, (tk, nk)} that maximizes the coverage in the suspicious pool while keeping the false positive rate in the normal pool below ρ.
•Without noise, a polynomial-time algorithm exists
•With noise, NP-Hard
56
Generalizing Signature Generation with noise
•BEST signature = balanced signature
–Balance the sensitivity with the specificity
–But how? Define a scoring function score(COV, FP, …) to evaluate the goodness of a signature
–Currently used: score(COV, FP, LEN) = COV · FP^(−log10 a) + ε·LEN
»Intuition: it is better to reduce the coverage by 1/a if the false positive becomes 10 times smaller.
»The small weight on the signature length (LEN) breaks ties between signatures with the same coverage and false positive
57
Generalizing Signature Generation with noise
• Algorithm: similar
• Running time: same as previous simple form
• Attack Resilience Guarantee: similar
58
Extension to multiple worms
•Iteratively use the single-worm detector to detect multiple worms
–In the first iteration, the algorithm finds the signature for the most popular worm in the suspicious pool. All other worms and normal traffic are treated as noise.
–Though the single-worm analysis applies to multiple worms, the bounds are not very promising. Reason: high noise ratio
59
Token Extraction
•Extract a set of tokens with minimum length lmin and coverage COVmin, and for each token output the frequency vector.
•Polygraph uses a suffix-tree-based approach: 20n space and time consuming.
•Our approach:
–Enhanced suffix array, 4n space
–Much faster, at least 50 times!
–Can be applied to Polygraph as well.
60
Calculate the false positive
•We need the false positive rate to check the U-bounds
•Again suffix-array-based, but for a 300MB normal pool a 1.2GB suffix array is still large!
•Improvements
–Caching
–MMAP the suffix array; true memory usage: 150–250MB
–2-level normal pool
–Hardware-based fast string matching
–Compress the normal pool and run string matching algorithms directly over the compressed strings
61
Future works
•Enhance the flow classifiers
–Cluster suspicious flows by return messages
–Malicious flow verification by replaying to servers with Address Space Randomization enabled