11 Packet Sampling for Worm and Botnet Detection in TCP Connections Reporter: 林佳宜 Email: [email protected] 2010/10/25

11

Packet Sampling for Worm and Botnet Detection in TCP Connections

Reporter: 林佳宜Email: [email protected]/10/25

ReferencesLothar Braun, Gerhard Münz, Georg

Carle. "Packet Sampling for Worm and Botnet Detection in TCP Connections." In Proc. of IEEE/IFIP Network Operations and Management Symposium (NOMS) 2010

2

3

OutlineIntroductionSampling algorithmTCP connectionBloom filtersConclusion

IntroductionSignature-based network intrusion

detection systems NIDS Ex: Snort packet headers and packet payload are checked

The permanent growth of network traffic

single system are often not sufficient to analyse the entire traffic

random packet is likely loss

Present a new “sampling algorithm” selects packets carrying the first N payload bytes of every

TCP connection can deployment in high-speed networks the required amount of memory is constant

4

Previous workDetect First 1000 bytes of payload

Anomalous Payload-Based Network Intrusion Detection. (RAID) 2004

Classify a flow as belonging to a P2P application after looking at the first ten packets

Accurate, Scalable In-Network Identification of P2P Traffic Using Application Signatures (www) 2004

Most application signatures appear within the first five packets of a flow

A Hybrid Approach for Accurate Application Traffic Identification (E2EMon) 2006

5

TCP connectionsFor NIDS performing session analysis

the traffic volume in both direction is usually not symmetric

separate limits for both directions result in an earlier cut-off of one direction

IDS cannot properly continue session analysis as one direction of traffic is missing

Beginning of a flow is that first packets sufficient information to classify the entire flow as

harmful or benign

6

Analysis is based on traffic93 different worms and bots

run in a controlled environment

The detection is done with two rule sets shipped with Snort 2.8.4.1 emergingthreats.net (on June 11th 2009)

4030 alarms are raised 3511 by TCP packets 451 by UDP packets 68 by ICMP packets

The TCP alarms constitute the majority of alarms.

7

TCP Alarms95 alarms are raised by the shipped3416 by setemergingthreats rule set

8

Calculate the positionWe calculate the position within the TCP

connection at which an alarm is raised using

TCP sequence acknowledgement numbers TCP payload length

30% of all alarms are only need the handshake packets

96% are found within less than 3kB of payloadThe last alarms are found at about 682kB

belong to a generic shellcode rule

9

Number of payload bytes

10

Accurate TCP Connection TrackingTCP connections are identified by the

addresses and ports

Start and end of a regular TCP connection are defined

SYN, FIN, and RST packets that are exchanged a buffer is needed to withhold a packet

Exceptional situations resulting in an increased number of TCP connections

TCP port and network scans SYN flooding attacks

11

Simplified TCP Connection Tracking[1/2]The first simplification concerns the

detection of connection establishments SYN packet shortly followed by a second packet

without SYN flag

Both packets have to be exchanged between the same endpoints

identified by tuples of IP address and port number

The two packets have to be observed within a small time interval

determined three seconds to be an appropriate value

12

Simplified TCP Connection Tracking[2/2]The second simplification concerns

the connection reassembly: do not perform any packet reordering nor do

we remove duplicated packets

Leave these tasks to the subsequent packet analysis step

e.g., an NIDS

13

Selecting and dropping packetsSample those packets containing the first N

bytes of payloadThey count the payload lengths in the

order of packet arrival But without regarding sequence numbers

In the presence of packet reordering and duplicates

selecting packets which should not be sampled (false positives)

dropping packets which should be sampled (false negatives).

14

Bloom filtersBloom filter is a probabilistic data

structure composed of a bit array and a set of hash

functions Initially, every bit in the array is set to zero

If a new element is tobe inserted into the set

hash functions must be calculated for this element All corresponding bits are then set to one

False positives are possible due to collisions in the hash functions

query id true, but not in the set

15

Memory consumptioncalculated depending on the

number of used hash functions (l)the collision probability of the hash

functions (p)the number of stored elements (k)

16

Bloom filters variantsThey use two Bloom filters variants

summarized

The first variant is called Time-out Bloom filter[1]

Its array is composed of timestamps instead of bits.

The second Bloom filter variant is called Count-Min Sketch (CMS)[2]

17

Sampling algorithmSelects the first packets of a TCP

connection until a maximum of N bytes of payload has been exported

uses two Time-out Bloom filters one CMS to store the required connection

states

Each TCP connection is identified source IP address (SA), destination IP address

(DA), source port (SP), and destination port (DP)

18

The three filtersThe first Time-out Bloom filter

stores the timestamps of all observed SYN packets

The CMS stores the number of payload bytes which need

to be exported for an established TCP connection

The second Time-out Bloom filter stores the point in time because the maximum number of payload bytes

was reached or because a FIN or RST packet was observed.

19

Traces Used in EvaluationUsing two traffic traces

a student residence at the University of Twente

their research group at the Technische Universit¨at Munich

The two traces will be called Twente trace and Munich trace

20

Amount of sampled trafficTwente trace, between 476,817 ( N = 1kB)

and 512,334 (N = 25kB) packets are sampled

Munich trace , between 314,063 and 326,742 TCP packets are sampled

21

Empirical Analysis of Sampling ErrorsThe number of hash functions

influences the computing cost a small number of hash functions is desired

A large number of hash functions reduces the probability of collisions

A good trade-off between computational complexity and collision probability

found that l = 3 hash functions

22

Sampling errorsSampling of payload in the Twente

trace 1kB only 30,130 (7.35%) and 30,694 (7.36%)

error 25kB only 59,789 (11.6%) and 73,925

(14.4%) error

Bloom filters increases the sampling errors by 0.1% to 24%, depending on the traffic

23

Evading detectionAttacker to circumvent packet

selection locate the malicious part of payload beyond

the first N bytes of a TCP connection inserting packets with identical addresses

and ports but invalid sequence numbers performing a TCP scan in order to poison the

values stored sending empty TCP ACK packets

To make evasion more difficult can dynamically vary N over time

24

25

ConclusionWe demonstrate the usability of our

sampling strategy for botnet

Using of a simplified TCP connection tracking mechanism and Bloom filters to store the required connection states

the algorithm can be efficiently implemented in software or hardware

It would be possible to develop a similar sampling algorithm for UDP traffic

Questions

26

1. S. Kong, T. He, X. Shao, and X. Li, “Time-out Bloom Filter: A New Sampling Method for Recording More Flows,” in Proc. of International Conference on Information Networking (ICOIN) 2006, Sendai, Japan, Jan. 2006.

2. G. Cormode and S. Muthukrishnan, “An Improved Data Stream Summary: The Count-Min Sketch and its Applications,” Journal of Algorithms, vol. 55, no. 1, 2005.

27

Documents

11 Packet Sampling for Worm and Botnet Detection in TCP Connections Reporter: 林佳宜 Email: [email protected] 2010/10/25