10 Aniello


  • 7/27/2019 10 Aniello

    1/6

    Inter-Domain Stealthy Port Scan Detection through Complex Event Processing

    Leonardo Aniello, Giorgia Lodi and Roberto Baldoni

    Dipartimento di Informatica e Sistemistica "Antonio Ruberti", Università degli Studi di Roma "La Sapienza", Via Ariosto 25, 00185 Roma, Italy

    aniello,lodi,[email protected]

    ABSTRACT

    Large enterprises are nowadays complex interconnected software systems spanning several domains. This new dimension makes it difficult for enterprises to enable efficient security defenses. This paper addresses the problem of detecting inter-domain stealthy port scans and proposes an architecture for an Intrusion Detection System which uses, for this purpose, an open-source Complex Event Processing engine named Esper. Esper provides low cost of ownership and high flexibility. The architecture consists of software sensors deployed at different enterprise domains. Each sensor sends events to the Esper event processor for correlation. We implemented an algorithm for the detection of inter-domain SYN port scans named the Rank-based SYN (R-SYN) port scan detection algorithm. It combines and adapts three detection techniques in order to obtain a unique global statement about the malicious behavior of host activities. The accuracy of our approach has been evaluated using several traces, some of them original traffic dumps and others altered by injecting packets that simulate port scan activities. Accuracy results show that our algorithm is able to produce a list of scanners characterized by high detection and low false positive rates.

    Categories and Subject Descriptors

    C.2.0 [Computer-Communication Networks]: General: Security and protection; I.5.2 [Pattern Recognition]: Design Methodology: Pattern analysis

    General Terms

    Algorithms, Measurement, Security

    Keywords

    Intrusion detection systems, Complex Event Processing, Portscan

    EWDC '11, May 11-12, 2011, Pisa, Italy. Copyright 2011 ACM 978-1-4503-0284-5/11/05 ...$10.00.

    1. INTRODUCTION

    Port scanning is one of the most widespread mechanisms used by attackers to obtain information on possible vulnerabilities of a target. A port scan is a preparatory action in several security threats, such as worm spreading, botnet formation and DDoS attacks. Among the various existing port scan variants, a very common one is the SYN port scan, also named Half Open (HO) port scan [16]. The HO port scan is a form of stealthy port scan that aims at uncovering the status of certain TCP ports without being traced by application-level loggers.

    When an attacker targets an organization or enterprise, the first steps usually consist of reconnaissance activities, carried out through port scanning, that discover whether any vulnerability exists that could be exploited at a later time for various purposes (e.g., capturing sensitive information, disrupting service operation). The attack consists in sending probing packets to specific enterprise sites in order to find the sites that exhibit some leaks. An enterprise defends itself from such activity by means of Intrusion Detection Systems (IDSs). IDSs monitor the traffic coming into the enterprise network, looking for suspicious patterns and signatures. Attackers are usually aware of the presence of such systems, so they attempt to perform their activities in a stealthy fashion in order to elude the IDSs. IDSs trace the amount and pattern of connections issued by source hosts and check behavioral profiles against configured rules based on time windows and thresholds. From an attacker's point of view, one of the most effective ways of circumventing these security checks consists in distributing the port scan both in space, probing a few ports of interest at different sites to avoid exceeding configured thresholds, and in time, delaying the single probes to bypass time window controls.

    If the enterprise is deployed on top of a single administrative domain and is connected to the Internet through a single physical network, IDSs can be effectively tuned, and most of the early reconnaissance activities can be detected by correlating the data received by enterprise hosts. For large-scale enterprises (e.g., corporate banks, power grid companies) spanning multiple domains, each including many sites, the defense offered by these IDSs can be insufficient, since probing packets could be sent to distinct enterprise domains whose data cannot be correlated. Without a global correlation over the traffic observed at every domain, no suspicion would arise about the behavior of an attacker who is carrying out an inter-domain port scan targeting the domains of a corporate company.

    In this paper we address the general problem of detecting inter-domain port scanning activities originating from single sources. The detection is carried out in a cooperative fashion by correlating network traffic data coming from geographically distributed enterprise nodes. To this end, we have designed an IDS architecture that (i) can easily deal with the evolution of the monitored system: in a large-scale enterprise network additional domains may be added, and the architecture is able to extend its deployment in order to monitor these new parts of the system; and (ii) provides an easy way of updating the detection logic, with low cost of ownership and high flexibility. As new security mechanisms are put in practice, malicious attackers devise new ways of circumventing them. To cope with this evolving scenario, the architecture must be able to promptly deploy new techniques for facing these brand-new threats.

    The proposed solution employs so-called Gateway components, i.e., software sensors located at each geographically dispersed enterprise domain that is to be monitored. Gateways send captured network traffic data to a Complex Event Processing (CEP) engine. The engine is responsible for correlating the data and thus discovering spatial and/or temporal relationships among apparently uncorrelated data that would have gone undetected by in-house IDSs. We use Esper [6] as the CEP engine. Esper carries out its computation on the basis of a set of SQL-like queries that can be configured at run time. This latter feature of Esper allows us to dynamically adapt the detection logic the IDS is expected to perform, integrating queries for facing new threats that may arise. To validate the effectiveness of the proposed architecture, we implemented a new SYN port scan detection algorithm, the Rank-based SYN (R-SYN) algorithm, which adapts and combines three port scan detection techniques: half-open connections detection, horizontal and vertical scans detection, and entropy-based failed connections detection. The algorithm is implemented through a set of queries hosted at the CEP engine; the queries take as input the packets of the TCP three-way handshake sent by the Gateways.

    We carried out an experimental evaluation in order to assess the detection accuracy of our architecture. We used real traffic traces, both in their original version and after injecting packets simulating SYN port scan activities. In our evaluation we point out the importance of using all three techniques, including the entropy on failed connections, by showing the differences in accuracy when our detection algorithm is used with and without one of the techniques. The results, obtained with a formal evaluation model, show high detection and low false positive rates.

    The rest of the paper is organized as follows. Section 2 introduces related work in the field of IDSs and CEP to motivate the choice of Esper. Section 3 describes the R-SYN port scan detection algorithm. Section 4 presents the solution we designed and implemented. Section 5 outlines the experimental evaluation we carried out. Section 6 discusses the main conclusions of this work.

    2. RELATED WORK

    Many free IDSs exist that are deployed in enterprise settings. Snort [10] is an open-source Network Intrusion Prevention/Detection System. It performs real-time traffic analysis and packet logging on IP networks to detect probes or attacks. Bro [7] is an open-source Network IDS that passively monitors network traffic and searches for suspicious activity. Its analysis includes the detection of specific attacks, using both defined signatures and event patterns, and of unusual activities. Even though some works addressing distributed Snort-based or Bro-based IDSs exist (e.g., [5] [15] [2]), to the best of our knowledge no concrete application seems to have been developed that faces the issues of gathering raw events online from geographically dispersed nodes and correlating those events in real time. Moreover, [5], [15] and [2] propose the correlation of alerts produced by peripheral sensors. Such alerts are generated using local data only. The information that is cut away by the peripheral computations could cause IDSs to miss crucial details necessary for gaining the global knowledge required to detect inter-domain malicious behaviors. Our solution addresses precisely this issue through the usage of a general-purpose CEP engine that offers great flexibility in the management of the detection logic. The need for ease of deployment also drove us to choose among Java-based CEP systems.

    CEP and Stream Processing (SP) systems play an important role in information technology. IBM System S [11] has been used by market makers to process high-volume market data and obtain low-latency results, as reported in [19]. System S (like other CEP/SP systems, e.g., [12] [18]) is based on event detection across distributed event sources.

    However, all these systems exhibit a high cost of ownership. Our solution instead considers open-source CEP systems such as JBoss Drools [8] and Esper [6]. JBoss Drools is a Business Logic Integration Platform which provides a unified and integrated platform for Rules, Workflow and Event Processing. Esper is a CEP engine technology that processes events and discovers complex patterns among multiple streams of event data. As our solution did not require an entire Business Logic Platform such as the one provided by JBoss Drools, we chose the lightweight Esper engine for our research.

    3. RANK-BASED SYN PORT SCAN DETECTION ALGORITHM

    Let us consider a scanner S, a target T and a port P to scan. A TCP SYN (half-open) port scan is characterized as follows: S sends a SYN packet to T:P and waits for a response. If a SYN-ACK packet is received, S can conclude that P is open and optionally reply with an RST packet to reset the connection. In contrast, if an RST-ACK packet is received, S can consider P closed. If no packet is received at all and S has some knowledge that T is reachable, then S can conclude that P is filtered. Otherwise, if S does not have any clue about the reachability status of T, it cannot assume anything about the state of P.
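The four outcomes of a single probe can be summarized as a small decision function. The following is a minimal Python sketch for illustration only: the `PortState` enum, the function name and the string labels used for reply packets are hypothetical, and the authors' implementation is built on Esper queries rather than on code like this.

```python
from enum import Enum
from typing import Optional


class PortState(Enum):
    OPEN = "open"
    CLOSED = "closed"
    FILTERED = "filtered"
    UNKNOWN = "unknown"


def classify_probe(reply: Optional[str], target_reachable: bool) -> PortState:
    """Infer the state of port P on target T from the reply to one SYN probe.

    reply: hypothetical label of the reply packet ("SYN-ACK" or "RST-ACK"),
           or None if nothing was received.
    target_reachable: True if S has independent evidence that T is reachable.
    """
    if reply == "SYN-ACK":
        return PortState.OPEN      # S may now send an RST to reset the connection
    if reply == "RST-ACK":
        return PortState.CLOSED
    if reply is None and target_reachable:
        return PortState.FILTERED  # silence from a host known to be up
    # No reply and no reachability evidence (or an unrecognized reply).
    return PortState.UNKNOWN
```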

    Not all port scans can be considered malicious. For instance, there exist search engines that carry out port scanning activities in order to discover Web servers to index [14]. It is therefore crucial to distinguish accurately between actual malicious port scanning activities and benign ones. To this end, we have designed the so-called Rank-based SYN (R-SYN) port scan detection algorithm, which adapts and combines three port scan detection techniques, namely (i) Half Open connections detection, (ii) Horizontal and Vertical port scans detection, and (iii) Entropy-based failed connections detection.

    Half open connections detection. This technique analyzes the sequence of SYN, ACK and RST packets in the TCP three-way handshake. In normal activities the following sequence is observed: (i) SYN, (ii) SYN-ACK, (iii) ACK. In the presence of a SYN port scan, the connection looks like the following: (i) SYN, (ii) SYN-ACK, (iii) RST (or nothing); we refer to it as an incomplete connection. For a given IP address, if the number of incomplete connections is higher than a certain threshold T_HO (see below), we can conclude that the IP address is likely carrying out malicious port scanning activities.

    DEF: for a given IP address x, let count_HO(x) be the number of incomplete connections issued by x; we define HO(x) as follows:

    $$\mathit{HO}(x) = \begin{cases} 1 & \text{if } \mathit{count}_{HO}(x) > T_{HO} \\ 0 & \text{otherwise} \end{cases}$$

    Horizontal and vertical port scans detection. In a horizontal port scan, attackers are interested in a port across all IP addresses within a range. In a vertical port scan, attackers scan some or all of the ports of a single destination host [17].

    In our algorithm we adapt the Threshold Random Walk (TRW) technique described in [14]. TRW classifies a host as malicious by observing the sequence of its requests: looking at the pattern of successful and failed requests of a certain source IP, it attempts to infer whether the host is behaving as a scanner. While the original technique considers as a failure a connection attempt either to an unreachable host or to a closed port on a reachable host, we adapt TRW in order to distinguish between connection attempts to unreachable hosts and attempts to closed ports on reachable hosts, since the former concern horizontal port scans whereas the latter concern vertical port scans. We have therefore designed two modified versions of the original TRW algorithm that take these aspects into account.
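The TRW decision rule can be sketched as a likelihood-ratio test. This minimal Python version is an illustration under explicit assumptions: the parameter values (θ0 = 0.8, θ1 = 0.2, a target detection probability of 0.99 and a false positive probability of 0.01) are illustrative and not taken from this paper, and the ratio is evaluated on aggregate success and failure counts, whereas the original TRW in [14] walks the sequence of outcomes and also stops early, declaring a host benign, when the ratio falls below a lower threshold. For the horizontal variant a success is a connection to a reachable host and a failure one to an unreachable host; for the vertical variant they are connections to open and closed ports.

```python
import math


def trw_is_scanner(successes: int, failures: int,
                   theta0: float = 0.8, theta1: float = 0.2,
                   p_detect: float = 0.99, p_false: float = 0.01) -> bool:
    """TRW-style decision evaluated on aggregate counts (illustrative parameters).

    theta0 = P(success | benign host), theta1 = P(success | scanner).
    Each success multiplies the likelihood ratio by theta1/theta0 (< 1) and
    each failure by (1 - theta1)/(1 - theta0) (> 1). The host is declared a
    scanner when the ratio reaches eta1 = p_detect / p_false.
    """
    # Work in log space so hosts with many attempts cannot overflow.
    log_ratio = (successes * math.log(theta1 / theta0)
                 + failures * math.log((1 - theta1) / (1 - theta0)))
    log_eta1 = math.log(p_detect / p_false)
    return log_ratio >= log_eta1
```

With these values each failure multiplies the ratio by 4 and each success by 0.25, so four failures with no successes already exceed the upper threshold of 99.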

    Specifically, in order to detect horizontal port scans, we identify connections to both unreachable and reachable hosts. A host is considered unreachable if the sender, within a time interval after sending a SYN packet, receives neither a SYN-ACK nor an RST-ACK packet, or if it receives an ICMP packet of type 3 indicating that the host is unreachable. In contrast, a host is reachable if the sender receives a SYN-ACK or RST-ACK packet. For each source IP address we then count the number of connections to unreachable and reachable hosts and apply the TRW algorithm. Let TRW_HS(x) be the boolean output of our TRW version for horizontal port scans computed for a certain IP address x. A true output indicates that x is considered a scanner; otherwise x is considered an honest host.

    DEF: for a given IP address x, we define HS(x) as follows:

    $$\mathit{HS}(x) = \begin{cases} 1 & \text{if } \mathit{TRW}_{HS}(x) = \text{true} \\ 0 & \text{otherwise} \end{cases}$$
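The reachability rule for horizontal scans can be sketched as follows. This is a minimal Python illustration, assuming that the replies passed in already belong to the same connection as the SYN and that the time interval is 10 seconds, the window used by the EPL queries in Section 4; the `Reply` type, its string labels (with "ICMP-3" standing for an ICMP type 3 message) and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List

TIMEOUT_S = 10.0  # assumed interval, matching the 10 s window of the EPL queries


@dataclass
class Reply:
    time: float  # capture timestamp in seconds
    kind: str    # hypothetical labels: "SYN-ACK", "RST-ACK", "ICMP-3"


def host_reachable(syn_time: float, replies: List[Reply]) -> bool:
    """Return True if the target of a SYN sent at syn_time counts as reachable."""
    for reply in sorted(replies, key=lambda r: r.time):
        if not syn_time <= reply.time <= syn_time + TIMEOUT_S:
            continue  # outside the interval: ignored
        if reply.kind in ("SYN-ACK", "RST-ACK"):
            return True   # the host answered, whatever the port state
        if reply.kind == "ICMP-3":
            return False  # explicit "destination unreachable"
    return False  # no SYN-ACK or RST-ACK within the interval
```

The per-source counts of True and False outcomes are what the horizontal TRW variant consumes.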

    In order to detect vertical port scans, we first identify connections to open and closed ports. We then count such connections for each source IP address. Let TRW_VS(x) be the boolean output of our TRW version for vertical port scans computed for a certain IP address x.

    DEF: for a given IP address x, we define VS(x) as follows:

    $$\mathit{VS}(x) = \begin{cases} 1 & \text{if } \mathit{TRW}_{VS}(x) = \text{true} \\ 0 & \text{otherwise} \end{cases}$$

    Entropy-based failed connections detection. Not all suspicious activities are actually performed by scanners. There exist cases in which the connections are simply failures and not deliberate malicious port scans [14]. As in [13], in order to discriminate failures from malicious port scans, we use an entropy-based approach. In our case, the entropy is evaluated considering two elements of a TCP connection, namely the destination IP and the destination port. The entropy assumes a value in the range [0, 1]: if a source IP address is issuing failed connections towards the same destination IP and port, its entropy is close to 0; otherwise, if the IP address unsuccessfully attempts to connect to different destination IPs and ports, its entropy is close to 1. This evaluation originates from the observation that a scanner IP address does not repeatedly perform the same operation towards specific hosts or ports: if an attempt fails, a scanner likely continues its malicious port scan towards different targets.

    Given a source IP address x, a destination IP address y and a destination port p, we define failures(x,y,p) as the number of failed connection attempts of x towards y:p. For a given IP address x, we define N(x) as follows:

    $$N(x) = \sum_{y,p} \mathit{failures}(x,y,p)$$

    In addition, we introduce a statistic about the ratio of failed connection attempts towards a specific destination IP and port. We define stat(x,y,p) as follows:

    $$\mathit{stat}(x,y,p) = \frac{\mathit{failures}(x,y,p)}{N(x)}$$

    The normalized entropy can then be evaluated by applying the following formula.

    DEF: for a given IP address x,

    $$\mathit{EN}(x) = -\frac{\sum_{y,p} \mathit{stat}(x,y,p)\,\log_2\big(\mathit{stat}(x,y,p)\big)}{\log_2\big(N(x)\big)}$$
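The definitions above translate directly into code. A minimal Python sketch, assuming the failed attempts of a single source are available as (destIP, destPort) pairs; the function name is hypothetical, and returning 0 when N(x) ≤ 1 (where the normalization term log2 N(x) vanishes) is an assumption the paper does not address.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def normalized_entropy(failed: Iterable[Tuple[str, int]]) -> float:
    """EN(x) for one source x, given its failed attempts as (destIP, destPort) pairs."""
    failures = Counter(failed)       # failures(x, y, p) for each (y, p)
    n = sum(failures.values())       # N(x)
    if n <= 1:
        # log2(N(x)) is 0 here, so EN(x) is undefined; 0.0 is an assumed convention.
        return 0.0
    # stat(x, y, p) = failures(x, y, p) / N(x)
    h = -sum((c / n) * math.log2(c / n) for c in failures.values())
    return h / math.log2(n)
```

Repeated failures towards one destination give 0, while failures spread over distinct destination/port pairs give 1.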

    Combining the techniques. We have designed a ranking mechanism that minimizes the probability that a scanner evades detection by apparently behaving well: even if a scanner manages to circumvent one technique, its malicious activities are likely to be recognized by one of the other two, so that it can be identified anyway.

    Our mechanism sums the three values related to half-open connections, horizontal port scans and vertical port scans, and weights the total using the entropy of failed connections.

    DEF: for a given IP address x, we define rank(x) as follows:

    $$\mathit{rank}(x) = \big(\mathit{HO}(x) + \mathit{HS}(x) + \mathit{VS}(x)\big)\cdot \mathit{EN}(x)$$

    This ranking is compared to a fixed threshold in order to mark an IP address as a scanner.

    4. INTER-DOMAIN IDS ARCHITECTURE

    The architecture of the inter-domain IDS consists of a Gateway component installed in each domain and a single Esper CEP engine instance deployed in any of the available domains. The architecture is shown in Figure 1.

    Gateway. Traffic data are captured from the internal domains. In order to be analyzed by the engine, the data have to be normalized and transformed into Plain Old Java Objects (POJOs). To this end, the Gateway component has been designed and implemented to (i) take as input the flows of network data (TCP data in Figure 1), (ii) filter them to keep only packets related to the TCP three-way handshake, and finally (iii) wrap each packet in a proper POJO to be sent to Esper. We implemented TCPPojo for TCP packets and ICMPPojo for ICMP packets. Each POJO maps every field in the header of the related protocol. POJOs are serialized and sent through Java sockets to Esper. When sending the POJOs, our implementation maintains the order of the captured packets, which is crucial when evaluating sequence operators in the EPL queries of the Esper engine.
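The Gateway's filtering step can be sketched in a few lines. This Python fragment is only an analogue of the Java implementation: `TcpEvent` is a hypothetical, reduced stand-in for TCPPojo, and the rule deciding which packets are handshake-related (any segment carrying SYN or RST, plus payload-free pure ACKs such as the final handshake step) is our assumption, since the exact filter is not specified. The generator preserves capture order, which the EPL sequence operators rely on.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class TcpEvent:
    """Hypothetical, reduced Python analogue of TCPPojo."""
    source_ip: str
    source_port: int
    dest_ip: str
    dest_port: int
    syn: bool
    ack: bool
    rst: bool
    payload_len: int


def is_handshake_related(pkt: TcpEvent) -> bool:
    # Assumed rule: SYN or RST segments, plus payload-free ACKs
    # (which include the third step of the three-way handshake).
    return pkt.syn or pkt.rst or (pkt.ack and pkt.payload_len == 0)


def gateway_filter(packets: Iterable[TcpEvent]) -> Iterator[TcpEvent]:
    """Forward handshake-related packets in their original capture order."""
    for pkt in packets:
        if is_handshake_related(pkt):
            yield pkt
```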


    Figure 1: Inter-domain IDS architecture

    Complex Event Processing (CEP). The Esper CEP engine [6] receives POJOs that represent the events it has to analyze (input streams). The processing logic is specified in a high-level language similar to SQL, namely the Event Processing Language (EPL). In order to detect malicious port scanning activities, a number of EPL queries are defined and executed by the engine, as shown in Figure 1. EPL queries run over a continuous stream of POJOs and produce output streams. When an EPL query finds a match against its clauses in its input stream, it generates a new tuple that is added to its output stream. A subscriber is a Java object that can be subscribed to a particular output stream so that, whenever the query outputs a new tuple, the update() method of the subscriber is invoked with the tuple as argument.
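The query/subscriber interaction can be illustrated with a toy Python analogue. It does not use or reproduce the Esper API: `ContinuousQuery`, `send_event` and `SynCounter` are hypothetical names, a query is reduced to a filter predicate, and tuples are plain dictionaries. It only shows the push model, in which every tuple a query outputs is handed to the `update()` method of each registered subscriber.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class ContinuousQuery:
    """Toy stand-in for a CEP statement (not the Esper API): filters an input
    stream and pushes every matching tuple to its subscribers."""

    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        self.predicate = predicate
        self.subscribers: List[Any] = []

    def subscribe(self, subscriber: Any) -> None:
        self.subscribers.append(subscriber)

    def send_event(self, event: Row) -> None:
        if self.predicate(event):
            for subscriber in self.subscribers:
                subscriber.update(event)  # mirrors the subscriber's update() callback


class SynCounter:
    """Example subscriber: keeps per-source counts in a shared structure."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def update(self, row: Row) -> None:
        src = row["sourceIP"]
        self.counts[src] = self.counts.get(src, 0) + 1
```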

    Each detection technique of the R-SYN port scan detection algorithm is implemented using a combination of EPL queries and subscribers. The EPL queries are in charge of recognizing packet patterns that reveal anomalous behavior according to the detection metric. Subscribers are invoked whenever such matches occur; they update global data structures that maintain the behavioral profile of each IP address that may become suspect in the future. The data structures are the following: IPs suspected by the Half Open connections technique (list_HO); IPs suspected by the Horizontal port scan technique (list_HS); IPs suspected by the Vertical port scan technique (list_VS); and the stat values for computing the entropy (entropy).

    General queries. We implemented general queries shared by all three techniques. For instance, the following query (syn_stream) filters SYN packets:

        insert into syn_stream
        select sourceIP, destIP, ...
        from TCPPojo
        where SYNFlag = true and ACKFlag = false

    A further query (syn_ack_stream) similarly filters SYN+ACK packets.

    Half Open connections detection. Incomplete connections are identified using the following query:

        insert into halfopen_connection
        select ...
        from pattern [
            every a = syn_stream -> (( b = syn_ack_stream(...) -> (
                <RST filter> and not <ACK stream>
            ) where timer:within(10 sec) ) ) ]

    Note that in this case we exploit the pattern construct of Esper to detect patterns of incomplete connections. In particular, a is the stream of SYN packets, b is the stream of SYN+ACK packets, <RST filter> is a filter for RST packets and <ACK stream> is the stream of ACK packets that would correctly complete the three-way handshake. The pattern matches if every a packet is followed by a b packet, which is in turn followed by an RST packet not preceded by an ACK, within a time window of 10 seconds. Additional queries are then used and bound to Subscribers for filtering the IP addresses that made more than T_HO (equal to 2) incomplete connections, and for updating list_HO.
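A simplified offline equivalent of this detection can be written as a per-connection state machine. This Python sketch is not the authors' code: the event tuple layout and the function names are hypothetical, it counts only the RST variant of an incomplete connection (the one the RST filter captures), and it assumes, as our reading of where the timer:within guard applies, that the 10-second window is measured from the SYN+ACK.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW_S = 10.0  # window used in the half-open EPL pattern
T_HO = 2         # threshold used in the experiments

# (timestamp, client IP, connection id, packet kind). The connection id is a
# hypothetical stand-in for the TCP 4-tuple; kinds are "SYN", "SYN-ACK",
# "ACK" and "RST".
Event = Tuple[float, str, str, str]


def incomplete_connections(events: List[Event]) -> Dict[str, int]:
    """count_HO(x): SYN -> SYN-ACK -> RST with no ACK in between."""
    by_conn: Dict[str, List[Event]] = defaultdict(list)
    for ev in sorted(events):  # sorts by timestamp first
        by_conn[ev[2]].append(ev)

    counts: Dict[str, int] = defaultdict(int)
    for conn_events in by_conn.values():
        state, start = "idle", 0.0
        for t, client, _, kind in conn_events:
            if kind == "SYN":
                state = "syn"                  # every SYN starts a new attempt
            elif state == "synack" and t - start > WINDOW_S:
                state = "idle"                 # window expired: no match
            elif kind == "SYN-ACK" and state == "syn":
                state, start = "synack", t     # window assumed to start here
            elif kind == "ACK" and state == "synack":
                state = "idle"                 # handshake completed normally
            elif kind == "RST" and state == "synack":
                counts[client] += 1            # RST instead of the final ACK
                state = "idle"
    return dict(counts)


def ho(count: int) -> int:
    """HO(x) = 1 if count_HO(x) > T_HO, else 0."""
    return 1 if count > T_HO else 0
```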

    Horizontal port scans detection. Connection attempts to both reachable and unreachable hosts are recognized. The following query shows how connections to reachable hosts can be detected:

        insert into host_reach_unreach
        select ..., -1 as value
        from pattern [
            every a = syn_stream -> ( (( <SYN+ACK> or <RST+ACK> ) and not <ICMP>
            ) where timer:within(10 sec) ) ]

    Tuples of the related output stream have a value field that is -1 for reachable hosts and 1 for unreachable hosts. The pattern for distinguishing reachable hosts consists of a SYN packet (a) followed by a packet that can be a SYN+ACK (<SYN+ACK>) or an RST+ACK (<RST+ACK>) but not an ICMP packet (<ICMP>). The pattern matches if the involved packets are within a time window of 10 seconds. The query we use for unreachable hosts can be expressed as follows:

        insert into host_reach_unreach
        select ..., 1 as value
        from pattern [
            every a = syn_stream -> timer:interval(10 sec)
                and not ( ( <SYN+ACK> or <RST+ACK> ) and not <ICMP> ) ]

    In this case, we search for a pattern in which a SYN packet is not followed by any packet matching the expression (( <SYN+ACK> or <RST+ACK> ) and not <ICMP>) within a time interval of 10 seconds. The symbols in the expression have the same meaning as in the query that detects reachable hosts. The output stream is then processed by another query that creates a further stream as follows:

        select sourceIP, count(*) as SUM, sum(value) as DIFF
        from host_reach_unreach
        group by sourceIP

    A Subscriber registered for this query executes the TRW calculations. In particular, the numbers of connections to reachable and unreachable hosts required by the TRW computation are obtained from the SUM and DIFF fields that result from the above query. If an IP address is suspected of carrying out a horizontal port scan, list_HS is updated accordingly.
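Because each tuple carries value -1 for a reachable host and +1 for an unreachable one, the two aggregates satisfy SUM = r + u and DIFF = u - r, where r and u are the numbers of connections to reachable and unreachable hosts; hence u = (SUM + DIFF)/2 and r = (SUM - DIFF)/2. A minimal Python sketch of this recovery step (the function name is hypothetical) follows; the same arithmetic applies to the open/closed port stream used for vertical scans.

```python
from typing import Tuple


def reach_counts(total: int, diff: int) -> Tuple[int, int]:
    """Recover (reachable, unreachable) from SUM = r + u and DIFF = u - r."""
    if (total + diff) % 2 != 0 or abs(diff) > total:
        raise ValueError("inconsistent SUM/DIFF pair")
    unreachable = (total + diff) // 2
    reachable = (total - diff) // 2
    return reachable, unreachable
```

The reachable count then plays the role of TRW successes and the unreachable count that of TRW failures.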

    Vertical port scans detection. In order to detect vertical port scans, connection attempts to both open and closed ports are discovered so as to produce the port_open_closed stream. This stream has a value field equal to 1 for closed ports and -1 for open ports, like the host_reach_unreach stream described earlier. The main difference from horizontal port scan detection is the pattern used to single out connections to closed ports and connections to open ports. As for horizontal port scan detection, another query is used to compute the SUM and DIFF fields, which are in turn used by a Subscriber for vertical scans that executes the TRW calculations and updates list_VS.

    Entropy-based failed connections detection. We implemented three queries for detecting connections to unreachable hosts, connections to closed ports and incomplete connections, respectively, and for updating the failures stream, which has sourceIP, sourcePort, destIP and destPort fields. The following query creates a stream containing the information required for computing the stat values:

        insert into entropy_syn_stream
        select sourceIP, destIP, destPort, count(*) as stat
        from failures
        group by sourceIP, destIP, destPort

    The Subscriber registered for this stream updates the entropy data structure used in our entropy-based computation.

    Computing the entropy requires a large amount of storage, because sourceIP can be any possible IP address, whereas destIP is restricted to the IP addresses of the enterprise domains. A rough estimate of the storage necessary to compute the entropy is on the order of 256 TBytes when considering large organizations owning 2^16 IP addresses. If this storage is not available, a windowing mechanism can be introduced to reduce it.

    Ranking. Once the global data structures are updated, the ranking of suspicious hosts is evaluated. Given an IP address x, HO(x), HS(x) and VS(x) are equal to 1 if and only if x belongs to list_HO, list_HS and list_VS, respectively, and 0 otherwise. EN(x) is computed from the stat values contained in the entropy data structure. If the computed rank(x) exceeds a certain threshold, x is marked as a scanner. In the experiments shown in the next section, the threshold is set to 0.52.
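The final decision can be sketched as follows, assuming the three suspect lists are available as sets and EN(x) has already been computed. The function names are hypothetical; the threshold 0.52 and the strict comparison follow the text ("exceeds").

```python
from typing import Set

RANK_THRESHOLD = 0.52  # value used in the experiments


def rank(x: str, list_ho: Set[str], list_hs: Set[str], list_vs: Set[str],
         en: float) -> float:
    """rank(x) = (HO(x) + HS(x) + VS(x)) * EN(x), with list-membership indicators."""
    ho = 1 if x in list_ho else 0
    hs = 1 if x in list_hs else 0
    vs = 1 if x in list_vs else 0
    return (ho + hs + vs) * en


def is_scanner(x: str, list_ho: Set[str], list_hs: Set[str], list_vs: Set[str],
               en: float) -> bool:
    return rank(x, list_ho, list_hs, list_vs, en) > RANK_THRESHOLD
```

With this threshold, a source flagged by a single technique is marked as a scanner only if EN(x) > 0.52, by two techniques if EN(x) > 0.26, and by all three if EN(x) > 0.52/3 ≈ 0.173.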

    5. EXPERIMENTAL EVALUATION

    Testbed. We deployed the IDS prototype on a small cluster of 4 Windows Virtual Machines (VMs), each equipped with 2GB of RAM and 63GB of disk space. The 4 VMs were hosted in a cluster of 4 dual-processor quad-core 2.8 GHz physical machines equipped with 24GB of RAM and connected by a 10 Gbit LAN. One VM was dedicated to hosting the Esper CEP engine. Each of the remaining 3 VMs represented the resources made available for a simulated domain, and each of them hosted a Gateway component.

    Traces. To test the effectiveness of our IDS, we used 4 real traces obtained from the ITOC research web site [3], the LBNL/ICSI Enterprise Tracing Project [4] and the MIT DARPA Intrusion Detection project [1]. To test the detection accuracy of our IDS architecture, we synthetically added scanners to all the traces with a trace infector program that infects a trace by injecting TCP packets which simulate all known patterns of SYN port scanning. Table 1 provides a high-level summary of the content of the traces. We manually assessed the content of the traces offline in order to understand the attack scenarios included in those data, and compared those scenarios with the results obtained from the R-SYN algorithm.

        trace     size    sources  connections  scanners
        trace1    3MB     10       1165         0
        trace1*   3MB     10       1429         7
        trace2    5MB     15       223          1
        trace2*   5MB     15       487          8
        trace3    80MB    36       9559         0
        trace3*   80MB    36       9823         7
        trace4    160MB   39       413772       3
        trace4*   160MB   39       414036       10

        * infected trace

    Table 1: Content of the traces

    Aggregate Input Streams Bandwidth. Our centralized IDS may suffer from scalability issues when new domains to monitor are added. To estimate how the IDS network load increases with the number of domains, we evaluated the aggregate bandwidth used by the Gateways for sending POJOs to Esper. Each of the 8 traces was partitioned over the three simulated domains, and the required aggregate bandwidth was always less than 80 kbit/s. When attacks are carried out we could expect an increase of this bandwidth; however, it is expected to remain within a reasonable range with respect to the available bandwidth of the enterprise connection. The measured aggregate bandwidth also suggests that our IDS can scale to tens of domains.

    Results. In order to assess the accuracy of the IDS, we partitioned the traces, both in their original and infected versions, and injected the resulting sub-traces into the available Gateways to observe what we were able to detect. In particular, in order to show the effectiveness of our R-SYN algorithm, we ran a number of tests using the traces and the following accuracy metrics (following the assessment described in [20]): (i) TP (True Positive), the number of suspicious activities that we detect as attacks and that are true attack activities; (ii) FP (False Positive), the number of detection errors, that is, normal activities erroneously considered attacks; (iii) TN (True Negative), the number of normal activities that we detect as normal; (iv) FN (False Negative), the number of real attack activities that we do not detect. From these values we computed the DR (Detection Rate) as DR = TP/(TP+FN) and the FPR (False Positive Rate) as FPR = FP/(FP+TN). The results of these tests are reported in Table 2.
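The two rates can be computed directly from the four counts. A minimal Python sketch (function names hypothetical); returning None when the denominator is zero mirrors the missing DR for traces without scanners.

```python
from typing import Optional


def detection_rate(tp: int, fn: int) -> Optional[float]:
    """DR = TP / (TP + FN); undefined when the trace contains no scanners."""
    return tp / (tp + fn) if tp + fn else None


def false_positive_rate(fp: int, tn: int) -> Optional[float]:
    """FPR = FP / (FP + TN); undefined when there are no honest sources."""
    return fp / (fp + tn) if fp + tn else None
```

For trace4 (TP = 2, FP = 2, TN = 34, FN = 1) this gives DR ≈ 66.7% and FPR ≈ 5.6%, which Table 2 reports as 66% and 6%.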

    For trace1 and trace3 there is no DR, as no scanner activities are included in those traces (see Table 1). The results show that the R-SYN algorithm is able to accurately detect attackers' activities, with the exception of trace4 and trace4*, where one scanner is not recognized and two source IPs (trace4) or one source IP (trace4*) are erroneously considered malicious.

    In order to show the effect of the entropy on the R-SYN algorithm, we report the results of tests executed with a ranking function that does not exploit the entropy computation. In this case rank(x) is an integer in the interval [0, 3] and the threshold used to detect scanners is set to 1. Table 3 reports only the values of the traces that change with respect to Table 2. Comparing the two tables, it clearly emerges that the entropy correction is highly effective in reducing the FPR: in all 4 traces reported in Table 3, at least 1/3 of the non-malicious source IPs are recognized as scanners. The much lower FPR in Table 2 is due to the ability of the entropy-based technique to properly discriminate suspicious activities from honest TCP failures.

        trace     TP  FP  TN  FN  DR    FPR
        trace1    0   0   10  0   n/a   0%
        trace1*   7   0   3   0   100%  0%
        trace2    1   0   14  0   100%  0%
        trace2*   8   0   7   0   100%  0%
        trace3    0   0   36  0   n/a   0%
        trace3*   7   0   29  0   100%  0%
        trace4    2   2   34  1   66%   6%
        trace4*   9   1   28  1   90%   3%

        * infected trace

    Table 2: Accuracy metrics for R-SYN Algorithm

        trace     TP  FP  TN  FN  DR    FPR
        trace1    0   4   6   0   n/a   40%
        trace1*   7   1   2   0   100%  33%
        trace4    2   17  19  1   66%   47%
        trace4*   9   10  19  1   90%   34%

        * infected trace

    Table 3: Accuracy metrics for R-SYN Algorithm without entropy correction

    In conclusion, the ranking system used in the R-SYN algorithm shows increased detection accuracy when it considers all three proposed detection techniques, including the entropy-based technique.

    6. CONCLUDING REMARKS

    More and more cyber attacks are carried out against large organizations (e.g., corporate banks, power grid companies). Currently, 1 out of 5 such attacks is accompanied by extortion [9]. Port scanning is a preparatory action for many major attacks perpetrated against large organizations, such as worm spreading and DDoS.

    In this paper we described an IDS architecture based on CEP which specifically targets the detection of reconnaissance activities against organizations spanning multiple administrative domains. The architecture is based on Gateways deployed at the different domains of the organization that send only the necessary data (i.e., packets related to the three-way handshake) to a central CEP engine for correlation purposes. We presented the R-SYN port scan detection algorithm and deployed it on the inter-domain IDS architecture. Results on real traces show the effectiveness of our approach with respect to detection accuracy. In particular, we obtained high detection and low false positive rates.

    Even though the storage and bandwidth requirements of the inter-domain IDS can be met, in order to increase scalability we are studying techniques to bound the storage needed at the CEP side and to reduce the bandwidth necessary to transfer the data from the Gateways to the engine. We are currently conducting a number of experiments aimed at evaluating both the scalability of the entire system and its performance in terms of detection latency in WAN settings.

    7. ACKNOWLEDGEMENTS

    This work has been partially supported by the EU project CoMiFin on the protection of the Financial Infrastructure from Cyber Attacks.

    8. REFERENCES

    [1] 2000 DARPA Intrusion Detection Scenario Specific Data Sets. http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/2000data.html.

    [2] Broccoli, the Bro client communications library. http://www.icir.org/christian/broccoli/index.html.

    [3] ITOC Research: CDX Datasets. http://www.itoc.usma.edu/research/dataset/index.html.

    [4] LBNL/ICSI Enterprise Tracing Project. http://www.icir.org/enterprise-tracing/.

    [5] Complete Snort-based IDS Architecture. http://cybervlad.net/ids/index.html, 2002.

    [6] Where Complex Event Processing meets Open Source: Esper and NEsper. http://esper.codehaus.org/, 2009.

    [7] Bro: an open source Unix based Network intrusion detection system (NIDS). http://www.bro-ids.org/, 2010.

    [8] JBoss Drools Fusion. http://www.jboss.org/drools/drools-fusion.html, 2010.

    [9] McAfee report "In the crossfire: critical infrastructures in the age of cyber war", 2010.

    [10] Snort: an open source network intrusion prevention and detection system (IDS/IPS). http://www.snort.org/, 2010.

    [11] System S. http://domino.research.ibm.com/comm/research_projects.nsf/pages/esps.index.html, 2010.

    [12] M. Akdere, U. Cetintemel, and N. Tatbul. Plan-based complex event detection across distributed sources. PVLDB, 1(1):66-77, 2008.

    [13] H. Zhang, X. Zhu, and W. Guo. TCP portscan detection based on single packet flows and entropy. In ICIS '09: Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, 2009.

    [14] J. Jung, V. Paxson, A. W. Berger, and H. Balakrishnan. Fast portscan detection using sequential hypothesis testing. In Proceedings of the IEEE Symposium on Security and Privacy, 2004.

    [15] C. Kreibich and R. Sommer. Policy-controlled event management for distributed intrusion detection. In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems Workshops, 2005.

    [16] S. Staniford, J. A. Hoagland, and J. M. McAlerney. Practical automated detection of stealthy portscans. In Proceedings of the 7th ACM Conference on Computer and Communications Security, 2000.

    [17] H. Singh and R. Chun. Distributed Port Scan. Handbook of Information and Communication Security, Part B, pages 221-234, 2010.

    [18] C. Tang, M. Steinder, M. Spreitzer, and G. Pacifici. A Scalable Application Placement Controller for Enterprise Data Centers. In Proceedings of the 16th International Conference on World Wide Web, 2007.

    [19] X. J. Zhang, H. Andrade, B. Gedik, R. King, J. Morar, S. Nathan, Y. Park, R. Pavuluri, E. Pring, R. Schnier, P. Selo, M. Spicer, V. Uhlig, and C. Venkatramani. Implementing a high-volume, low-latency market data processing system on commodity hardware using IBM middleware. In WHPCF '09: Proceedings of the 2nd Workshop on High Performance Computational Finance, pages 1-8, New York, NY, USA, 2009. ACM.

    [20] C. Zhou, S. Karunasekera, and C. Leckie. Evaluation of a Decentralized Architecture for Large Scale Collaborative Intrusion Detection. In Proceedings of the 10th IFIP/IEEE International Symposium on Integrated Network Management, 2007.