Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington

EXTRACTING EVENTS FROM PROBABILISTIC STREAMS


One Slide Overview

Motivating App: RFID Ecosystem
○ Tagged people, cups, books, keys, laptops, etc.

Event queries [Cayuga, SASE, Snoop]
○ Alert when anyone enters the coffee room

Two problems
○ Missed readings: read rates in practice are low
○ Granularity mismatch, e.g. Office v. Antenna 41

Instead, infer location from sensors

Proposal: keep probs & query with PEEX+

PEEX+ (Probabilistic Event EXtraction) keeps data probabilistic to get higher P/R and is still efficient.

Motivating Apps

RFID apps

Diary and Active Calendar Application.
○ Alert if I go to a database meeting.

Supply chain
○ Alert if Mach 3 razors are being stolen

Many independent HMMs

Elder care [Intel, Patterson]
○ Alert if elder takes their medicine with water

Financial applications on predictive HMM
○ Alert if head-and-shoulders market

Outline

○ RFID to Probabilities via Particle Filters
○ PEEX+ query language
○ Extended Regular Query Algorithm
○ Experiments

The source of probabilities

[Figure: 6th Floor in PAC, shown with a Connectivity Diagram and the Antennas. Each orange particle is a guess of the true location; the blue ring is ground truth.]

PFs to a (prob) DB

At(tag,loc)

Tag   t   Loc   P
Joe   7   O2    0.4
          H2    0.2
          H3    0.4
Joe   8   O2    0.6
          H2    0.2
          H3    0.2
Sue   7   …     …

To query Particle Filter output, query At

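A minimal sketch of how particle filter output could populate At, assuming equally weighted particles so that each probability is the fraction of particles in a location. The function name `particles_to_at_rows` and the ten-particle sample are hypothetical, chosen so the result matches the Joe, t=7 rows.

```python
from collections import Counter

def particles_to_at_rows(tag, t, particles):
    """Turn one tag's particles at time t into (tag, t, loc, P) rows.

    Assumes equally weighted particles: P(loc) = fraction of particles in loc.
    """
    counts = Counter(particles)
    n = len(particles)
    return [(tag, t, loc, count / n) for loc, count in sorted(counts.items())]

# Hypothetical particle set: 4 guesses in O2, 2 in H2, 4 in H3.
joe_particles = ["O2"] * 4 + ["H2"] * 2 + ["H3"] * 4
rows = particles_to_at_rows("Joe", 7, joe_particles)
for row in rows:
    print(row)
```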

Semantics of the Model

A possible stream (world) picks one location per tag and timestep:

At(tag,loc)

Tag   t   Loc
Joe   7   O2
Joe   8   O2
Sue   7   …

Prob = 0.4 * 0.6 * …

NB: Markovian correlations OK

Query semantics: sum the weight of all worlds where Q is true at time t

“Joe enters O2 at t=8”

Probability outside O2 (in H2, H3) at t=7, times probability in O2 at t=8:

(0.2 + 0.4) * 0.6 = 0.36
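The sum over possible worlds can be checked by brute force. This is only a sketch: it treats the two timesteps as independent (no Markovian correlations), and the helper names `enters` and `query_probability` are made up for the illustration.

```python
from itertools import product

# Marginals for Joe from the At table (treated as independent across timesteps).
at_joe = {
    7: {"O2": 0.4, "H2": 0.2, "H3": 0.4},
    8: {"O2": 0.6, "H2": 0.2, "H3": 0.2},
}

def enters(world, room, t):
    """True if the tag is outside `room` at t-1 and inside it at t."""
    return world[t - 1] != room and world[t] == room

def query_probability(at, room, t):
    """Sum the weight of every possible world in which the query holds at t."""
    times = sorted(at)
    total = 0.0
    for locs in product(*(at[u] for u in times)):
        world = dict(zip(times, locs))
        weight = 1.0
        for u in times:
            weight *= at[u][world[u]]
        if enters(world, room, t):
            total += weight
    return total

p_enter = query_probability(at_joe, "O2", 8)
print(round(p_enter, 2))  # 0.36
```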



A hierarchy of PEEX+ queries

Regular Queries
○ Alert me when Joe goes to the coffee room

Extended Regular
○ Alert when anyone goes to the coffee room

Safe
○ Alert when anyone goes to the coffee room and a DB member follows them.

Hard
○ Others (Simulation)

This line is sharp for some queries

PEEX+ Queries

Fragment of Cayuga; queries define events.

Operator        Description    Example
                Base stream    $At(p, l_1)$
semicolon (;)   Sequence       $At(p, l_1);\ At(p, l_2)$ (same p in both)
$\sigma$        Select         $\sigma_{Room(l_1)}(At(p, l_1))$
$+$             Kleene+        $(\sigma_{Hall(l)}(At(p, l)))^{+}\{p, \ldots\}$

p in some location

Technical Point: Left-to-right eval, $E_1; E_2; E_3 = (E_1; E_2); E_3$

Regular and Extended Regular

A query is regular if no variable is shared between subgoals

$\sigma_{l = \text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$

A query is extended regular if any variable shared by two subgoals is shared by all subgoals, i.e. it is a templated regular query

$\sigma_{l = \text{'502'}}(At(p, \text{'501'});\ At(p, l))$ (p is shared between subgoals)
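The two definitions can be read as a check on the variable sets of the subgoals. A small sketch, where `classify` and its return labels are hypothetical names and each subgoal is represented only by the set of variables it mentions:

```python
def classify(subgoal_vars):
    """Classify a query from the variable sets of its subgoals.

    subgoal_vars: list of sets, one per subgoal, holding that subgoal's variables.
    """
    shared = set()
    for i, a in enumerate(subgoal_vars):
        for b in subgoal_vars[i + 1:]:
            shared |= a & b          # variables used by at least two subgoals
    if not shared:
        return "regular"
    if all(shared <= vars_ for vars_ in subgoal_vars):
        return "extended regular"    # every shared variable is in every subgoal
    return "other"

q_joe = classify([set(), {"l"}])          # At('Joe','501'); At('Joe', l)
q_anyone = classify([{"p"}, {"p", "l"}])  # At(p,'501'); At(p, l)
print(q_joe, "/", q_anyone)
```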

Wrinkle in the language: Filter v. Selection

“Alert next time Joe is in 502 after he is in 501”

$q_{filter} = At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, \text{'502'})$

“Alert if the next place Joe is in after 501 is 502”

$q_{select} = \sigma_{l = \text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$

Example stream over Time: At(Joe,501), At(Joe,503), At(Joe,502)
○ $q_{filter}$: Yes
○ $q_{select}$: No
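One reading of the difference, sketched on a deterministic stream. The function names and the three-event stream are assumptions for illustration: the filter skips events that do not match its second subgoal, while the select binds the very next Joe event and only then tests the location.

```python
def filter_alerts(stream, tag, first, second):
    """q_filter = At(tag, first); At(tag, second): skip events that do not match."""
    alerts, armed = [], False
    for t, (who, loc) in enumerate(stream):
        if who != tag:
            continue
        if armed and loc == second:
            alerts.append(t)
            armed = False
        if loc == first:
            armed = True
    return alerts

def select_alerts(stream, tag, first, second):
    """q_select: the very next event for `tag` after `first` must be `second`."""
    alerts, armed = [], False
    for t, (who, loc) in enumerate(stream):
        if who != tag:
            continue
        if armed and loc == second:
            alerts.append(t)
        armed = (loc == first)
    return alerts

stream = [("Joe", "501"), ("Joe", "503"), ("Joe", "502")]
print(filter_alerts(stream, "Joe", "501", "502"))  # [2]
print(select_alerts(stream, "Joe", "501", "502"))  # []
```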



Why are ER queries hard?

Regular Queries ~ Regular Expressions

Mapping is non-trivial
○ similar to Cayuga [Demers et al. 06]

Queries have #P combined complexity
○ Can encode mDNF as regular expression

Intuition: n-sized automaton leads to $\mathrm{time}(2^n)$

Extended regular ~ 1 NFA per person

k persons implies O(k)-size automaton

Exponential cost

When ER, can avoid blowup

Algorithm for Regular Queries Overview

Deterministic Algorithm

1. Compile a query q
   1. NFA-like thing in a language $L_q$
   2. Mapping $M_q$ from events to subsets of $L_q$
2. At runtime, at time t have events E
   1. Create set of symbols at time t: $M_q(E) = \bigcup_{e \in E} M_q(e)$
   2. Process NFA on $M_q(E)$

Focus on the compilation

Compile Select and Filter

Intuition: each goal maps to two letters:
○ match (m): matches filter
○ accept (a): accepted by select

$L = \{m_1, a_1, m_2, a_2\}$

[Figure: automaton with edges labeled $a_1$, $a_2$ and $\{m_2\}$, ending in a Final state. Legend: "Does contain" / "Does not contain".]

The language and automaton are the same for both queries

The difference is the mapping

Event          Filter             Select
(Joe, 501)     $\{m_1, a_1\}$     $\{m_1, a_1, m_2\}$
(Joe, 502)     $\{m_2, a_2\}$     $\{m_2, a_2\}$
(Joe, $l_0$)   $\emptyset$        $\{m_2\}$
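A sketch of how one automaton can serve both queries with only the mapping changed. The mappings follow the Event / Filter / Select table; the transition rules in `step` are an interpretation of the "does contain / does not contain" edges, not something stated on the slides, and all function names are hypothetical.

```python
# Symbol mappings from the table (l_0 stands for any other location).
def m_filter(event):
    who, loc = event
    if who != "Joe":
        return set()
    return {"501": {"m1", "a1"}, "502": {"m2", "a2"}}.get(loc, set())

def m_select(event):
    who, loc = event
    if who != "Joe":
        return set()
    return {"501": {"m1", "a1", "m2"}, "502": {"m2", "a2"}}.get(loc, {"m2"})

def step(states, symbols):
    """One NFA step on a set of symbols. States: 0 start, 1 after goal 1, 2 final.

    Assumed edge rules: 0 -> 1 if a1 is present; 1 -> 2 if a2 is present;
    1 stays alive only if m2 is absent; 0 is always active.
    """
    nxt = {0}
    for s in states:
        if s == 0 and "a1" in symbols:
            nxt.add(1)
        elif s == 1:
            if "a2" in symbols:
                nxt.add(2)
            elif "m2" not in symbols:
                nxt.add(1)
    return frozenset(nxt)

def run(mapping, timesteps):
    """timesteps: list of event sets E; returns the times at which Final is reached."""
    states, hits = frozenset({0}), []
    for t, events in enumerate(timesteps):
        symbols = set().union(*(mapping(e) for e in events))  # M_q(E)
        states = step(states, symbols)
        if 2 in states:
            hits.append(t)
    return hits

trace = [{("Joe", "501")}, {("Joe", "503")}, {("Joe", "502")}]
print(run(m_filter, trace))  # [2]
print(run(m_select, trace))  # []
```

Under these assumed rules, the extra $m_2$ that the select mapping emits for (Joe, 503) is what kills the pending match.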

Regular Queries w. Probabilities

Probabilistic Algorithm

1. Compile a query q (stays the same)
   1. NFA with transitions in a language $L_q$
   2. Mapping $M_q$ from events to subsets of $L_q$
2. At time t have events E with probs
   1. Create set of symbols at time t: $M_q(E) = \bigcup_{e \in E} M_q(e)$ (distribution on inputs)
   2. Process NFA on $M_q(E)$ (distribution on states)

State at t+1 only depends on state at t and input at t+1

Algorithm is constant in data, exponential in |Q|
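A sketch of the "distribution on states" idea, using Joe's rows from the At table and a made-up select query ("Joe is in H2 and his next location is O2"). The NFA rules in `step` are the same assumed interpretation of the compiled automaton, and the timesteps are treated as independent; names such as `advance` are hypothetical.

```python
from collections import defaultdict

def symbols_for(loc):
    """Mapping for the select query 'Joe in H2, and his next location is O2'."""
    return {"H2": {"m1", "a1", "m2"}, "O2": {"m2", "a2"}}.get(loc, {"m2"})

def step(states, symbols):
    """Assumed NFA rules: 0 start (always active), 1 after goal 1, 2 final."""
    nxt = {0}
    if 0 in states and "a1" in symbols:
        nxt.add(1)
    if 1 in states:
        if "a2" in symbols:
            nxt.add(2)
        elif "m2" not in symbols:
            nxt.add(1)
    return frozenset(nxt)

def advance(state_dist, loc_dist):
    """Push the distribution on NFA state sets through one timestep."""
    new_dist = defaultdict(float)
    for states, p_states in state_dist.items():
        for loc, p_loc in loc_dist.items():
            new_dist[step(states, symbols_for(loc))] += p_states * p_loc
    return dict(new_dist)

# Joe's rows of the At table, treated as independent across timesteps.
at_joe = {7: {"O2": 0.4, "H2": 0.2, "H3": 0.4},
          8: {"O2": 0.6, "H2": 0.2, "H3": 0.2}}

dist = {frozenset({0}): 1.0}
for t in sorted(at_joe):
    dist = advance(dist, at_joe[t])
    p_true = sum(p for states, p in dist.items() if 2 in states)
    print(t, round(p_true, 2))
```

The number of state sets is bounded by the query, not by the stream length, which is the sense in which the cost is constant in the data and exponential in |Q|.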

Extension to Extended regular

“Alert when anyone in 501 and next step in 502”

$q_{select} = \sigma_{l = \text{'502'}}(At(p, \text{'501'});\ At(p, l))$

If substitute for p, result is regular

$q[p \to \text{Joe}] = \sigma_{l = \text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$

$q[p \to \text{Tom}] = \sigma_{l = \text{'502'}}(At(\text{'Tom'}, \text{'501'});\ At(\text{'Tom'}, l))$

Bindings use disjoint sets of tuples.

Algorithm: independent copies, multiply

Depends on # distinct values (shared vars), not # of timesteps, so can stream
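A sketch of the "independent copies, multiply" step, under the assumption that the per-person probabilities are independent because the bindings use disjoint tuples. The function `prob_anyone` and the two input probabilities are hypothetical.

```python
from math import prod

def prob_anyone(per_person):
    """P(query holds for at least one binding of p), assuming independent bindings.

    per_person: dict mapping a person to P(q[p -> person] is true at time t).
    """
    return 1.0 - prod(1.0 - p for p in per_person.values())

# Hypothetical per-person probabilities from separate copies of the regular algorithm.
p_any = prob_anyone({"Joe": 0.12, "Tom": 0.5})
print(round(p_any, 2))  # 0.56
```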

Recap of Algorithms

Regular Queries
○ Compiled them to an NFA, then used image
○ Data complexity O(1)

Extended regular
○ Several regulars multiplied together
○ Depends on number of distinct people in the data, not number of time steps.

Markov Correlations: more arithmetic & state

PEEX+ Algorithms and Analysis

Compilation procedures

Safe plans.
○ More complicated, based on algebra
○ Cost grows with data (useful for archives)

Aggregates

Complexity: Can we do better?
○ For a restricted class, draw a crisp line
○ Minor variants of safe result in hardness



Experimental Setup

Quality Experiment

52 objects, 352 locations, 10k sq. ft.
○ 2x30m trace with 10 m break in between

Participants marked down true locations

“Alert when anyone enters the Coffee Room”

$\sigma_{Coffee(l_2)}(\sigma_{Hallway(l_1)}(At(p, l_1));\ At(p, l_2))$

Consider two Scenarios
○ Realtime (No correlations) v. MLE
○ Archived (Smoothing) v. Viterbi

In practice, can smooth in a short time

Quality: Realtime

Declare an event “true” if its Pr > threshold

Vary threshold

[Figure: three panels, Precision, Recall and F1, each on a 0 to 1 axis]

10% improvement in F1
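The thresholding rule on this slide can be sketched as follows. The event probabilities and ground truth here are invented for illustration and are not the experimental data; `precision_recall_f1` is a hypothetical helper.

```python
def precision_recall_f1(event_probs, truth, threshold):
    """Declare an event true if its probability exceeds `threshold`, then score it.

    event_probs: dict event_id -> probability; truth: set of event_ids that happened.
    """
    declared = {e for e, p in event_probs.items() if p > threshold}
    tp = len(declared & truth)
    precision = tp / len(declared) if declared else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# Hypothetical data: four candidate events, two of which really occurred.
probs = {"e1": 0.9, "e2": 0.6, "e3": 0.3, "e4": 0.1}
truth = {"e1", "e3"}
for threshold in (0.2, 0.5, 0.8):
    print(threshold, precision_recall_f1(probs, truth, threshold))
```

Raising the threshold trades recall for precision, which is why the slide reports all three measures as the threshold varies.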

Quality: Archived

Smoothing v. Viterbi

PEEX keeps track of Markovian Correlations

[Figure: three panels, Precision, Recall and F1, each on a 0 to 1 axis]

Approx. 30% gain in F1

Performance

Conclusion

Showed PEEX+

Processed output of several inference tasks
○ Applies more generally than just RFID

Quality (F1) gains by keeping probability
○ 50% from probs, 50% from correlations

Performance was usable in real-time
○ No indexing!

Preprint available on request

Future Work

Implementing archived stream indexing.
○ Aggregations in time
○ Aggressive indexing
○ Ranking? Top-K?

Sharper lines for complexity
○ Are there more streamable queries?

Richer language
○ Similar to linear style plans
○ What do people need?

Temporal Models!
○ Consistency
○ Correlations

Sequencing by example

Sequencing is parameterized [Cayuga]

$\sigma_{l = \text{'502'}}(At(p, \text{'501'});\ At(p, l))$

[Figure: Time axis with events (Joe,501), (Bob,502), (Joe,502); also shown: (Joe,503)]

Semicolon means “the next event among those that match next goal”

Semicolon is not “after”

Compilation by example

Each goal “corresponds” to two letters:
○ move (m): the query should advance
○ accept (a): the next subgoal accepts

$q_1 = \sigma_{l = \text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$

$L_1 = \{m_1, a_1, m_2, a_2\}$

Mapping $M_q$:
○ (Joe, 501) → $\{m_1, a_1, m_2\}$
○ (Joe, 502) → $\{m_2, a_2\}$
○ (Joe, $l_0$) → $\{m_2\}$
○ Any other maps to empty set

[Figure: automaton with edges labeled $a_1$, $a_2$ and $\{m_2\}$, ending in a Final state. Legend: "Does contain" / "Does not contain".]

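Subtle example

What about:

$q_1 = \sigma_{l = \text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$

$q_2 = At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, \text{'502'})$

$L_1 = \{m_1, a_1, m_2, a_2\}$

Mapping $M_1$:
○ (Joe, 501) → $\{m_1, a_1, m_2\}$
○ (Joe, 502) → $\{m_2, a_2\}$
○ (Joe, $l_0$) → $\{m_2\}$
○ Any other maps to empty set

Mapping $M_2$: (Joe, $l_0$)

[Figure: automaton with edges labeled $a_1$, $a_2$ and $\{m_2\}$, ending in a Final state. Legend: "Does contain" / "Does not contain".]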
