Transcript

Learning on User Behavior for Novel Worm Detection

Steve Martin, Anil Sewani, Blaine Nelson, Karl Chen, and

Anthony Joseph

{steve0, anil, nelsonb, quarl, adj}@cs.berkeley.edu

University of California at Berkeley

The Problem: Email Worms

[Chart: most virulent worms of 2004 — source: http://www.sophos.com]

• Email worms cause billions of dollars of damage yearly.
  – Nearly all of the most virulent worms of 2004 spread by email.

Current Solutions

• Signature-based methods are effective against known worms only.
  – 25 new Windows viruses a day were released during 2004!

• The human element slows reaction times.
  – Signature generation can take hours to days.
  – Signature acquisition and application can take hours, or may never happen.

• Signature methods are mired in an arms race.
  – MyDoom.m and Netsky.b got through the EECS mail scanners.

Statistical Approaches

• Unsupervised learning on network behavior.

– Leverage behavioral invariant: a worm seeks to propagate itself over a network.

• Previous work: novelty detection by itself is not enough.
  – Many false negatives = the worm attack will succeed.
  – Many false positives = irritated network admins.

• Common solution: make the novelty detector model very sensitive.
  – Tradeoff: introduces additional false positives.
  – Can render a detection system useless.

Our Approach

• Use a two-layer approach to filter novelty detector results.
  – The novelty detector minimizes false negatives.
  – A secondary classifier filters out false positives.

• Leverage human reactions and existing methods to improve the secondary classifier.
  – Use supervisor feedback to partially label the data corpus.
  – Correct and retrain as signatures become available.

• Filter novelty detection results with per-user classifier trained on semi-supervised data.

Per-User Detection Pipeline

[Diagram: Email Data → Novelty Detector → Parametric Classifier → Network, with quarantined results routed to a Human Supervisor]

1. Features are calculated on outgoing email.

2. Novelty detector classifies based on global email behavior.

3. Per-user parametric classifier filters results; it is continually re-trained on the semi-labeled data corpus.

4. Quarantined results reported to human supervisor.

Pipeline Details

• Both per-email and per-user features are used.

– User features capture elements of behavior over a window of time.

– Email features examine individual snapshots of behavior.
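The two feature families above can be sketched as follows. The specific features shown (recipient count, attachment count, sending rate) are illustrative assumptions, since the talk does not enumerate its feature set here.

```python
# Hypothetical feature extraction for the pipeline. An email is modeled
# as a plain dict; the feature names below are illustrative, not the
# talk's actual feature set.

def email_features(email):
    """Per-email features: a snapshot of a single message."""
    return {
        "num_recipients": len(email["to"]),
        "num_attachments": len(email["attachments"]),
        "body_length": len(email["body"]),
    }

def user_features(emails, window_hours=24.0):
    """Per-user features: behavior aggregated over a window of time."""
    recent = [e for e in emails if e["age_hours"] <= window_hours]
    n = len(recent)
    return {
        "emails_per_hour": n / window_hours,
        "mean_recipients": (sum(len(e["to"]) for e in recent) / n) if n else 0.0,
    }
```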

• Any novelty detector can be inserted.
  – These results use a Support Vector Machine.
  – One SVM is trained on all users’ normal email.

• Parametric classifier leverages distinct feature distributions via a generative graphical model.
  – A separate model is fit for each user.
  – The classifier retrains over semi-supervised data.
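A minimal sketch of the two-layer idea, assuming scikit-learn's `OneClassSVM` as the global novelty detector and a Gaussian naive Bayes model standing in for the per-user generative classifier (the talk's actual graphical model is not specified on this slide); the synthetic data is purely illustrative:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data: normal email features cluster near 0,
# worm-generated email features cluster near 4.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(400, 4))   # all users' normal email
wormy = rng.normal(4.0, 1.0, size=(100, 4))    # worm-generated email

# Layer 1: novelty detector trained only on normal traffic, tuned
# sensitive so few worm emails slip through (few false negatives).
detector = OneClassSVM(nu=0.2, gamma="scale").fit(normal)

# Layer 2: per-user classifier trained on (semi-)labeled data,
# used to filter the novelty detector's positives.
X = np.vstack([normal, wormy])
y = np.array([0] * len(normal) + [1] * len(wormy))
per_user = GaussianNB().fit(X, y)

def classify(email_features):
    """Flag an email as a worm only if BOTH layers agree."""
    novel = detector.predict([email_features])[0] == -1
    if not novel:
        return "clean"
    return "worm" if per_user.predict([email_features])[0] == 1 else "clean"
```

The second layer only ever overrides positives from the first, which is how the pipeline trades away false positives without reintroducing false negatives.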

System Deployment

[Diagram: Users 1–3 send mail through the Mail Server; a Global SVM feeds per-user classifiers (User 1, User 2, User 3 Classifiers); thresholded results go to the Supervisor]

Using Feedback

• Use existing virus scanners to update the corpus.
  – For each email within the last d days:
    • If the scanner returns virus, we label it virus.
    • If the scanner returns clean, we leave the current label unchanged.
  – Outside the previous d days, the scanner labels directly.

• Threshold the number of emails classified as virus to detect user infection.
  – The machine is quarantined and infected emails are queued.

• If infection is confirmed, i random messages from the queue are labeled by the supervisor.
  – The model is retrained.
  – Labels are retained until the virus scanner corrects them.
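The labeling rule above can be sketched as follows. The data layout (email dicts with a `scan` callback) and the concrete value of d are illustrative assumptions:

```python
from datetime import datetime, timedelta

D_DAYS = 7  # assumed window length d; the talk leaves d as a parameter

def update_labels(emails, scan, now):
    """Apply the scanner-feedback rule to a list of email dicts.

    Each email has a 'time' (datetime) and a 'label' that may have
    been set earlier by the supervisor or a previous pass.
    """
    cutoff = now - timedelta(days=D_DAYS)
    for e in emails:
        verdict = scan(e)  # 'virus' or 'clean'
        if e["time"] >= cutoff:
            # Within the last d days: only upgrade to 'virus'; a
            # 'clean' verdict leaves the current label untouched,
            # since the scanner may simply lack a signature yet.
            if verdict == "virus":
                e["label"] = "virus"
        else:
            # Older than d days: signatures have had time to appear,
            # so trust the scanner outright.
            e["label"] = verdict
    return emails
```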

Feedback Utilization Process

[Timeline diagram: the Mail Server stream is completely labeled data older than d days and semi-labeled data within the last d days; the Virus Scanner labels the stream, and when the threshold fires the user is quarantined and the Administrator labels i queued messages. Legend: Unlabeled Email, Known Clean Email, Known Virus Email, Suspected Virus Email.]

Evaluation

• Examined feature distributions on real email.
  – Live study with an augmented mail server and 20 users.
  – Used the Enron data set for further evaluation.

• Collected virus data for six email worms using virtual machines and a real address book.
  – BubbleBoy, MyDoom.u, MyDoom.m, Netsky.d, Sobig.f, Bagle.f

• Constructed training/test sets of real email traffic artificially ‘infected’ with viruses.
  – Infections interleaved while preserving the intervals between worm emails.
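The interleaving step can be sketched as a simple merge: the worm trace is translated to a chosen infection time so its inter-email gaps are preserved, then sorted into the clean traffic. Timestamps as plain seconds and the function name are illustrative assumptions:

```python
# Sketch of the 'artificial infection' construction. worm_times is the
# timestamp sequence captured from a live worm (assumed sorted and
# non-empty); shifting the whole sequence by a constant offset keeps
# every interval between consecutive worm emails intact.

def interleave(clean_times, worm_times, infection_start):
    """Merge clean and worm timestamps into one labeled trace."""
    offset = infection_start - worm_times[0]
    shifted = [(t + offset, "worm") for t in worm_times]
    merged = [(t, "clean") for t in clean_times] + shifted
    merged.sort()
    return merged
```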

Results I

• Average Accuracy: 79.45%
• Training set: 1000 infected emails from 5 different worms, 400 clean emails
• Test set: 200 infected emails, 1200 clean emails

Table 1. Results using only SVM

Virus Name False Positives False Negatives Accuracy

BubbleBoy 23.56% 1.01% 79.64%

Bagle.F 23.90% 0.00% 79.50%

Netsky.D 24.06% 0.00% 79.36%

Mydoom.U 23.98% 0.00% 79.43%

Mydoom.M 23.61% 0.00% 79.71%

Sobig.F 24.14% 1.51% 79.07%

Results II

• Average Accuracy: 99.69%
• Training set: 1000 infected emails from 5 different worms, 400 clean emails
• Test set: 200 infected emails, 1200 clean emails

Table 2. Results using SVM and Semi-Sup Classifier

Virus Name False Positives False Negatives Accuracy

BubbleBoy 0.00% 1.51% 99.79%

Bagle.F 0.00% 2.01% 99.71%

Netsky.D 0.00% 2.01% 99.71%

Mydoom.U 0.00% 2.01% 99.64%

Mydoom.M 0.00% 2.03% 99.64%

Sobig.F 0.00% 2.01% 99.64%
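As a sanity check, the per-worm accuracies in Tables 1 and 2 follow from the false-positive and false-negative rates on the stated test set (1200 clean, 200 infected emails), with error counts recovered by rounding rate × class size:

```python
# Accuracy implied by FP/FN rates on a 1200-clean / 200-infected test set.
# Error counts are inferred by rounding rate * class size back to integers.

def accuracy(fp_rate, fn_rate, n_clean=1200, n_infected=200):
    fp = round(fp_rate * n_clean)      # clean emails flagged as worms
    fn = round(fn_rate * n_infected)   # worm emails missed
    total = n_clean + n_infected
    return (total - fp - fn) / total

print(f"{accuracy(0.2356, 0.0101):.4f}")  # BubbleBoy, SVM only -> 0.7964
print(f"{accuracy(0.0000, 0.0151):.4f}")  # BubbleBoy, SVM + classifier -> 0.9979
```

Both values match the corresponding table rows (79.64% and 99.79%), which is consistent with the stated test-set sizes.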

