Complex Event Processing with Esper

Talk I gave at Codebits 2011 on 11/11/11 about Complex Event Processing using Esper.

@antonioalegria

CEP

Complex Event Processing?

“Complex Event is an event that could only happen if lots of other events happened”

“CEP is a set of tools and techniques for analyzing and controlling the complex series of interrelated events that drive modern distributed information systems”

David Luckham, 2002

Example

• Church bell ringing

• Appearance of a man in a tuxedo

• Appearance of a woman in a white gown

• Rice flying through the air

Wedding has happened!
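The wedding example can be sketched in a few lines of Python. This is only an illustration of the idea of inferring a complex event from simpler ones, not how Esper does it; the event names and the one-hour window are assumptions made up for the sketch.

```python
# Hypothetical event names; a real system would define its own event types.
REQUIRED = {"church_bell", "man_in_tuxedo", "woman_in_white_gown", "rice_flying"}


def detect_wedding(events, window_seconds=3600):
    """Return the timestamp at which every required event has been seen
    within `window_seconds` of each other, or None if that never happens.

    `events` is an iterable of (timestamp_seconds, name), sorted by timestamp.
    """
    last_seen = {}
    for ts, name in events:
        if name not in REQUIRED:
            continue  # unrelated events are ignored
        last_seen[name] = ts
        all_seen = len(last_seen) == len(REQUIRED)
        if all_seen and ts - min(last_seen.values()) <= window_seconds:
            return ts  # the complex event "wedding" is inferred here
    return None


events = [
    (0, "church_bell"),
    (60, "man_in_tuxedo"),
    (90, "car_horn"),
    (120, "woman_in_white_gown"),
    (900, "rice_flying"),
]
wedding_ts = detect_wedding(events)
```

None of the four simple events means "wedding" on its own; the complex event only exists once all of them have been observed close together in time.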

CEP Use Cases

• Are our business processes running on time and correctly?

• Can we detect an opportunity for arbitrage in our trading department?

• Are we servicing our call center customers’ requests in a timely fashion?

• Was there a breach in our network?

It’s not a technology

like SOA!

It’s a Buzzword

It’s an Architectural Pattern

What do you need for CEP?

Event driven

Right-time (rather than soft real-time)

Across all layers of organization

Event Aggregation

Event Relationships

• Causality

• Membership

• Timing

Event Patterns

Domain Specific Language for Event Processing

What you need for CEP

• Event Driven

• Right-time

• Across all layers

• Aggregation, Correlation & Traceability

• Patterns

• DSL

Common CEP Operations

• Windowing

• Transformation

• Aggregation/Grouping

• Merging/Union

• Filtering

• Sorting

• Correlation

• Pattern Detection

http://esper.codehaus.org

Esper

Esper makes it easier to build a CEP app

Not meant to replace Databases

But some parallels can be made

DB vs. Esper

• DB stores data; Esper stores queries

• DB runs on-demand queries; Esper runs continuous queries

• DB: time is a data type; Esper: time is a dimension

• DB uses SQL; Esper uses EPL

• DB has tables; Esper has event streams

• DB has rows; Esper has events

Esper Processing Model

EPL: Event Processing Language

Event Definition (1/2)

create schema Event (
  id string,  // Event unique identifier
  ts long     // Timestamp (milliseconds)
);

create schema Tweet (
  user string,       // username (e.g. 'codebits')
  text string,       // actual tweet
  retweet_of string  // references a Tweet.id
) inherits Event;

Event Definition (2/2)

create schema Hashtag (
  tweet_id string,  // references a Tweet.id
  user string,
  value string
) inherits Event;

// Create Url and Mention event types as a copy of Hashtag

create schema Url() copyfrom Hashtag;

create schema Mention() copyfrom Hashtag;

Looks like SQL...

// All events
select * from Event;

// Only tweets
select user, text as status
from Tweet;

Filtering

// Tweets from @codebits
select * from Tweet(user = 'codebits');

// Another way to do it
select * from Tweet where user = 'codebits';

// All occurrences of #codebits not posted by @codebits
select user, value as hashtag, current_timestamp() as ts
from Hashtag(value = 'codebits' and user != 'codebits');

Stream Creation and Redirection

insert into CodebitsTweets
select * from Tweet(user = 'codebits');

select * from CodebitsTweets;

Aggregation

insert into UrlsPerSecond
select count(*) as count from Url.win:time_batch(1 sec);

// Every second (driven by above rule) calculate for last minute
// - average Urls tweeted
// - total Urls tweeted
select avg(count), sum(count)
from UrlsPerSecond.win:length(60);
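To make the two-stage aggregation above concrete, here is a rough Python simulation: first count URLs per one-second batch, then keep the average and total over the last N counts. It is a sketch of the semantics only. The function names are made up, batches are aligned to whole seconds for simplicity, and empty seconds are counted as 0, which may differ from how the engine aligns and emits batches.

```python
from collections import Counter, deque


def counts_per_second(timestamps_ms):
    """Count events in each 1-second bucket between the first and last event.
    Seconds with no events are reported as 0 (an assumption of this sketch)."""
    if not timestamps_ms:
        return []
    buckets = Counter(ts // 1000 for ts in timestamps_ms)
    first, last = min(buckets), max(buckets)
    return [buckets.get(second, 0) for second in range(first, last + 1)]


def rolling_avg_and_sum(counts, length=60):
    """After each new count, report (average, sum) over the last `length` counts."""
    window = deque(maxlen=length)  # oldest count drops out automatically
    results = []
    for count in counts:
        window.append(count)
        results.append((sum(window) / len(window), sum(window)))
    return results


url_timestamps_ms = [100, 900, 1500, 3200]
per_second = counts_per_second(url_timestamps_ms)
stats = rolling_avg_and_sum(per_second, length=2)
```

With `length=60` the second function reports the average and total URLs per second over the last minute, which is what the second EPL statement computes from the `UrlsPerSecond` stream.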

Grouping

select value as hashtag, count(*)
from Hashtag(value != null).win:time(30 seconds)
group by value;

Simple Event Views

select * from Tweet.win:time(5 min);

select * from Tweet.win:time_batch(1 hour);

select * from Tweet.win:length(10);

select * from Tweet.win:length_batch(10);
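The difference between a sliding window and a batch window is easier to see in a small simulation. The Python below is a simplified sketch with hypothetical function names: it only re-evaluates the time window when a new event arrives, whereas a real engine also expires events as time passes.

```python
from collections import deque


def sliding_time_window(events, size):
    """After each arrival, yield (ts, payloads still inside the window).
    An event stays in the window until it is `size` time units old."""
    window = deque()
    for ts, payload in events:
        window.append((ts, payload))
        while window[0][0] <= ts - size:
            window.popleft()  # expire events that are too old
        yield ts, [p for _, p in window]


def length_batch(events, n):
    """Collect events and release them n at a time, then start over."""
    batch = []
    for _, payload in events:
        batch.append(payload)
        if len(batch) == n:
            yield batch
            batch = []


events = [(0, "a"), (1, "b"), (4, "c"), (6, "d")]
sliding = list(sliding_time_window(events, size=5))
batches = list(length_batch(events, n=2))
```

The sliding window produces a result on every arrival and events leave one by one; the batch window stays silent until it is full and then releases everything at once.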

Other Standard Event Views

// Don’t use system clock, use event stream property
select * from Tweet.win:ext_timed(ts, 5 min);

// Last 10 tweets per user
select * from Tweet.std:groupwin(user).win:length(10);

// Top 5 Hashtags
select * from HashtagsPerMinute.ext:sort(5, count desc);
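As a rough illustration of what the grouped window and the sort view keep in memory, here is a Python sketch. The function names and sample data are invented for the example and it does not reproduce the engine's incremental behaviour.

```python
import heapq
from collections import defaultdict, deque


def last_n_per_user(tweets, n):
    """Keep a separate length-n window for each user (one deque per group)."""
    per_user = defaultdict(lambda: deque(maxlen=n))
    for user, text in tweets:
        per_user[user].append(text)
    return {user: list(window) for user, window in per_user.items()}


def top_n(counts, n):
    """Return the n (key, count) pairs with the highest counts, largest first."""
    return heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])


tweets = [("a", "1"), ("b", "x"), ("a", "2"), ("a", "3")]
recent = last_n_per_user(tweets, n=2)
top = top_n({"x": 3, "y": 7, "z": 5}, n=2)
```

Grouping means the length limit applies per user rather than to the stream as a whole, so a very active user cannot push other users' tweets out of the window.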

You can create your own custom Views

Correlation (1/2)

// Associate hashtags used to describe a URL
insert into UrlTags
select u.value as url, h.value as hashtag
from Url.std:lastevent() as u, Hashtag.std:lastevent() as h
where u.tweet_id = h.tweet_id;

insert into UrlTagsCount
select url, hashtag, count(*) as count
from UrlTags.win:time(1 hour)
group by url, hashtag;

Correlation (2/2)

// Every minute, output Top 3 hashtags per URL
select * from UrlTagsCount.ext:sort(3, count desc)
output snapshot at(*/1,*,*,*,*);

Event Patterns

// Measure how long it takes users to respond to a Tweet
insert into ResponseDelay
select t.id as tweet_id,
  t.user as author,
  m.user as responder,
  t.ts as start_ts,
  m.ts as stop_ts,
  m.ts - t.ts as duration
from pattern [
  every (t=Tweet -> m=Mention(value = t.user))
];
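The followed-by pattern above can be approximated in plain Python to show how matching proceeds. This sketch assumes that `every (A -> B)` tracks one pending A at a time and only looks for the next A after B has matched; the dictionary keys and sample events are hypothetical.

```python
def response_delays(events):
    """Emit one record each time a Tweet is followed by a Mention of its author.
    While a tweet is pending, later tweets are ignored, mirroring the
    assumption that the pattern restarts only after a match."""
    pending = None
    matches = []
    for ev in events:
        if pending is None:
            if ev["type"] == "Tweet":
                pending = ev  # start waiting for a mention of this author
        elif ev["type"] == "Mention" and ev["value"] == pending["user"]:
            matches.append({
                "tweet_id": pending["id"],
                "author": pending["user"],
                "responder": ev["user"],
                "duration": ev["ts"] - pending["ts"],
            })
            pending = None  # restart: look for the next tweet
    return matches


events = [
    {"type": "Tweet", "id": "t1", "user": "alice", "ts": 1000},
    {"type": "Tweet", "id": "t2", "user": "bob", "ts": 1500},  # ignored, t1 pending
    {"type": "Mention", "user": "carol", "value": "alice", "ts": 4000},
    {"type": "Mention", "user": "dave", "value": "alice", "ts": 5000},  # nothing pending
]
delays = response_delays(events)
```

If every tweet should be tracked independently, the pattern would need the `every` operator on the tweet alone, and the sketch would need a list of pending tweets instead of a single one.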

Detecting Missing Events

// No Tweet from @codebits in 1 hour
select *
from pattern [
  every Tweet(user = 'codebits') ->
    (timer:interval(1 hour) and not Tweet(user = 'codebits'))
];
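Detecting that something did not happen is the less intuitive case, so here is a small offline sketch in Python. It assumes every tweet starts its own timer and that an alarm fires at the deadline if no later tweet arrived before it; the function name and the `now` parameter are inventions for this example.

```python
def silence_alarms(tweet_times, interval, now):
    """Return the times at which 'no tweet for `interval`' would fire.

    `tweet_times` is a sorted list of timestamps (seconds) and `now` is the
    current time, so deadlines still in the future are not reported.
    """
    alarms = []
    for i, ts in enumerate(tweet_times):
        deadline = ts + interval
        next_ts = tweet_times[i + 1] if i + 1 < len(tweet_times) else None
        followed_in_time = next_ts is not None and next_ts <= deadline
        if deadline <= now and not followed_in_time:
            alarms.append(deadline)
    return alarms


tweet_times = [0, 1800, 9000]
alarms_early = silence_alarms(tweet_times, interval=3600, now=10000)
alarms_late = silence_alarms(tweet_times, interval=3600, now=13000)
```

The first tweet is followed within the hour, so it raises nothing; the second is followed too late, so the alarm fires one hour after it, and the last tweet only raises an alarm once its own hour has passed.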

Other features

• Subqueries

• Inner, outer joins

• Named windows

• 1st class integration with databases (JDBC)

• Regex-like Event Pattern matching (match_recognize)

Esper is awesome!

well, duh!

It’s not a silver bullet

Memory Usage

Resilience & Persistence

Weak Pattern matching

Drill-down not trivial

It’s NOT distributed!

Not full-stack

For more: @antonioalegria

Q&A
