Learning on User Behavior for Novel Worm Detection
Steve Martin, Anil Sewani, Blaine Nelson, Karl Chen, and
Anthony Joseph
{steve0, anil, nelsonb, quarl, adj}@cs.berkeley.edu
University of California at Berkeley
The Problem: Email Worms
(source: http://www.sophos.com)
• Email worms cause billions of dollars of damage yearly.
– Nearly all of the most virulent worms of 2004 spread by email.
Current Solutions
• Signature-based methods are effective against known worms only.
– Roughly 25 new Windows viruses a day were released during 2004!
• The human element slows reaction times.
– Signature generation can take hours to days.
– Signature acquisition and application can take hours, or may never happen at all.
• Signature methods are mired in an arms race.
– MyDoom.m and Netsky.b got through the EECS mail scanners.
Statistical Approaches
• Unsupervised learning on network behavior.
– Leverages a behavioral invariant: a worm seeks to propagate itself over the network.
• Previous work: novelty detection by itself is not enough.
– Many false negatives = the worm attack will succeed.
– Many false positives = irritated network admins.
• Common solution: make the novelty detector very sensitive.
– Tradeoff: introduces additional false positives.
– Can render a detection system useless.
Our Approach
• Use a two-layer approach to filter novelty-detector results.
– The novelty detector minimizes false negatives.
– A secondary classifier filters out false positives.
• Leverage human reactions and existing methods to improve the secondary classifier.
– Use supervisor feedback to partially label the data corpus.
– Correct and retrain as signatures become available.
• In short: filter novelty-detection results with a per-user classifier trained on semi-supervised data.
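The two-layer decision rule above can be sketched as follows. The class and function names here are our own, not the system's: the stand-in detector and classifier stubs just illustrate how a sensitive first layer keeps false negatives low while the second layer vetoes most of its false positives.

```python
# Minimal sketch (our own naming) of the two-layer filtering idea:
# a sensitive novelty detector rarely misses worms, and a per-user
# classifier then discards most of the detector's false alarms.

class ThresholdNoveltyDetector:
    """Stand-in for the global novelty detector (an SVM in the paper)."""
    def __init__(self, threshold):
        self.threshold = threshold

    def is_anomalous(self, score):
        return score > self.threshold


class PerUserClassifier:
    """Stand-in for the per-user parametric classifier."""
    def __init__(self, virus_cutoff):
        self.virus_cutoff = virus_cutoff

    def predict(self, score):
        return "virus" if score > self.virus_cutoff else "clean"


def classify_email(score, novelty_detector, user_classifier):
    """Quarantine only when both layers agree the email looks like a worm."""
    if not novelty_detector.is_anomalous(score):
        return "clean"          # layer 1 passes most normal traffic
    if user_classifier.predict(score) == "virus":
        return "virus"          # both layers agree: quarantine
    return "clean"              # layer 2 filtered a layer-1 false positive
```

An email is flagged only when the sensitive novelty detector *and* the per-user classifier agree, which is what lets the first layer be tuned aggressively without flooding admins with false positives.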
Per-User Detection Pipeline
(Diagram: outgoing Email Data flows through the Novelty Detector and then the Parametric Classifier before reaching the Network; the Human Supervisor provides feedback.)
1. Features are calculated on outgoing email.
2. The novelty detector classifies based on global email behavior.
3. A per-user parametric classifier filters the results; it is continually retrained on the semi-labeled data corpus.
4. Quarantined results are reported to the human supervisor.
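Step 1's feature calculation splits into the two feature types described on the next slide: per-email snapshots and per-user windowed statistics. A rough sketch, with illustrative feature names of our own choosing (the paper's exact feature set is not reproduced here):

```python
from collections import deque
from statistics import mean

def email_features(email):
    """Per-email features: a snapshot of one outgoing message.
    Feature names are illustrative, not the paper's exact set."""
    return {
        "num_recipients": len(email["to"]),
        "has_attachment": int(bool(email["attachments"])),
        "body_length": len(email["body"]),
    }

class UserWindow:
    """Per-user features: behavior aggregated over a sliding
    window of the user's most recent emails."""
    def __init__(self, window_size=20):
        self.recent = deque(maxlen=window_size)

    def update(self, feats):
        self.recent.append(feats)

    def user_features(self):
        return {
            "avg_recipients": mean(f["num_recipients"] for f in self.recent),
            "attachment_rate": mean(f["has_attachment"] for f in self.recent),
        }
```

A worm that suddenly mails attachments to many recipients shifts the windowed statistics away from the user's baseline even when any single email looks plausible.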
Pipeline Details
• Both per-email and per-user features are used.
– User features capture elements of behavior over a window of time.
– Email features examine individual snapshots of behavior.
• Any novelty detector can be inserted.
– These results use a Support Vector Machine.
– One SVM is trained on all users' normal email.
• The parametric classifier leverages distinct feature distributions via a generative graphical model.
– A separate model is fit for each user.
– The classifier retrains over semi-supervised data.
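To make the per-user generative piece concrete, here is a toy stand-in: a one-dimensional Gaussian fit per class, labeling by the higher class likelihood. The paper's actual classifier is a richer graphical model over many features; this sketch only shows the "fit a separate generative model per user, classify by likelihood" shape.

```python
from math import log, pi

class PerUserGaussianModel:
    """Toy stand-in for the per-user generative classifier: fits one
    Gaussian per class over a single scalar feature and predicts the
    class with the higher log-likelihood. (The paper uses a richer
    generative graphical model over many features.)"""

    def fit(self, values, labels):
        self.params = {}
        for cls in set(labels):
            xs = [v for v, lab in zip(values, labels) if lab == cls]
            mu = sum(xs) / len(xs)
            var = sum((x - mu) ** 2 for x in xs) / len(xs) or 1e-9
            self.params[cls] = (mu, var)
        return self

    def _log_lik(self, x, cls):
        mu, var = self.params[cls]
        return -0.5 * (log(2 * pi * var) + (x - mu) ** 2 / var)

    def predict(self, x):
        return max(self.params, key=lambda c: self._log_lik(x, c))
```

One such model would be fit per user, so that behavior normal for one user (say, frequent attachments) is not penalized for another.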
System Deployment
(Diagram: the Mail Server routes each user's outgoing email through the Global SVM; each user's flagged traffic is then filtered by that user's per-user classifier, and thresholded results go to the Supervisor.)
Using Feedback
• Use existing virus scanners to update the corpus.
– For each email within the last d days:
• If the scanner returns virus, we label it virus.
• If the scanner returns clean, we leave the current label.
– Outside the previous d days, the scanner's labels are applied directly.
• Threshold the number of emails classified as virus to detect user infection.
– The machine is quarantined and its infected emails are queued.
• If infection is confirmed, i random messages from the queue are labeled by the supervisor.
– The model is retrained.
– Labels are retained until the virus scanner corrects them.
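The corpus-update rule above can be sketched directly. Field names (`id`, `received_day`, `label`) are our own; the key asymmetry is that inside the d-day window a scanner "virus" verdict overrides the stored label while a "clean" verdict does not, since recent worms may still be unknown to the scanner.

```python
# Sketch of the label-update rule (field names are ours, not the paper's).

def update_labels(corpus, scanner_verdicts, now, d_days):
    """Within the last d days, only scanner 'virus' verdicts override
    the current label; older email is labeled by the scanner directly."""
    for email in corpus:
        verdict = scanner_verdicts[email["id"]]
        if now - email["received_day"] <= d_days:
            if verdict == "virus":
                email["label"] = "virus"   # scanner positives always win
            # scanner "clean": keep the current (possibly supervisor) label
        else:
            email["label"] = verdict       # outside the window: trust scanner
    return corpus
```

This is what lets supervisor labels on recent suspicious email survive until signatures catch up, at which point the scanner corrects them automatically.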
Feedback Utilization Process
(Timeline diagram: the Mail Server's email stream is split at d days before the present. Email older than d days is completely labeled by the Virus Scanner; the most recent d days remain semi-labeled. When the threshold fires and a user is quarantined, the Administrator labels i random messages. Legend: unlabeled email, known clean email, known virus email, suspected virus email.)
Evaluation
• Examined feature distributions on real email.
– Live study with an augmented mail server and 20 users.
– Used the Enron data set for further evaluation.
• Collected virus data for six email worms using virtual machines and a real address book.
– BubbleBoy, MyDoom.u, MyDoom.m, Netsky.d, Sobig.f, Bagle.f.
• Constructed training/test sets of real email traffic artificially 'infected' with viruses.
– Infections were interleaved while preserving the intervals between worm emails.
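The interleaving step can be sketched as a time-shift merge (our own formulation): the worm's trace is shifted so its first email lands at the chosen injection time, which preserves the inter-arrival gaps between worm emails while mixing them into real traffic.

```python
# Sketch of interval-preserving infection injection (our formulation).

def inject_worm(clean_trace, worm_trace, start_time):
    """Interleave a worm's emails into clean traffic. Each trace is a
    time-sorted list of (timestamp, payload) tuples; the worm trace is
    shifted to begin at start_time, preserving its inter-email gaps."""
    offset = start_time - worm_trace[0][0]
    shifted = [(t + offset, payload) for t, payload in worm_trace]
    return sorted(clean_trace + shifted, key=lambda e: e[0])
```

Preserving the gaps matters because the per-user features are computed over time windows, so compressing or stretching the worm's sending pattern would distort exactly the behavior the detector relies on.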
Results I
• Average accuracy: 79.45%
• Training set: 1000 infected emails from 5 different worms, 400 clean emails
• Test set: 200 infected emails, 1200 clean emails

Table 1. Results using only the SVM

Virus Name | False Positives | False Negatives | Accuracy
BubbleBoy  | 23.56%          | 1.01%           | 79.64%
Bagle.F    | 23.90%          | 0.00%           | 79.50%
Netsky.D   | 24.06%          | 0.00%           | 79.36%
Mydoom.U   | 23.98%          | 0.00%           | 79.43%
Mydoom.M   | 23.61%          | 0.00%           | 79.71%
Sobig.F    | 24.14%          | 1.51%           | 79.07%
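As a sanity check, the per-worm accuracies follow from the per-class error rates and the test-set sizes (1200 clean, 200 infected). For the BubbleBoy row this reproduces the reported accuracy up to rounding:

```python
def overall_accuracy(fp_rate, fn_rate, n_clean, n_infected):
    """Overall accuracy implied by a false-positive rate on clean email
    and a false-negative rate on infected email."""
    errors = fp_rate * n_clean + fn_rate * n_infected
    return 1 - errors / (n_clean + n_infected)

# BubbleBoy row of Table 1 (1200 clean, 200 infected test emails):
acc = overall_accuracy(0.2356, 0.0101, n_clean=1200, n_infected=200)
# ~0.7966, matching the reported 79.64% up to rounding
```

The high false-positive rate dominates here because clean email outnumbers infected email six to one in the test set, which is exactly the weakness the second-layer classifier targets.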
Results II
• Average accuracy: 99.69%
• Training set: 1000 infected emails from 5 different worms, 400 clean emails
• Test set: 200 infected emails, 1200 clean emails

Table 2. Results using the SVM and the semi-supervised classifier

Virus Name | False Positives | False Negatives | Accuracy
BubbleBoy  | 0.00%           | 1.51%           | 99.79%
Bagle.F    | 0.00%           | 2.01%           | 99.71%
Netsky.D   | 0.00%           | 2.01%           | 99.71%
Mydoom.U   | 0.00%           | 2.01%           | 99.64%
Mydoom.M   | 0.00%           | 2.03%           | 99.64%
Sobig.F    | 0.00%           | 2.01%           | 99.64%