EXTRACTING EVENTS FROM PROBABILISTIC STREAMS

Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu
University of Washington


Page 2

One Slide Overview

Motivating App: RFID Ecosystem
  Tagged people, cups, books, keys, laptops, etc.

Event queries [Cayuga, SASE, Snoop]
  Alert when anyone enters the coffee room

Two problems
  Missed readings: read rates in practice are low
  Granularity mismatch, e.g. Office v. Antenna 41

Instead, infer location from sensors
  Proposal: keep probabilities and query with PEEX+

PEEX+ (Probabilistic Event EXtraction) keeps data probabilistic to get higher precision and recall (P/R), and is still efficient.

Page 3

Motivating Apps

RFID apps
  Diary and Active Calendar Application
    ○ Alert if I go to a database meeting
  Supply chain
    ○ Alert if Mach 3 razors are being stolen

Many independent HMMs
  Elder care [Intel, Patterson]
    ○ Alert if elder takes their medicine with water
  Financial applications on predictive HMM
    ○ Alert if head-and-shoulders market

Page 4

Outline

RFID to Probabilities via Particle Filters
PEEX+ query language
Extended Regular Query Algorithm
Experiments

Page 5

The source of probabilities

[Figure: floor plan labeled "6th Floor in PAC", with a connectivity diagram and antennas marked. Each orange particle is a guess of the true location; the blue ring is ground truth.]

Page 6

PFs to a (prob) DB

At(tag,loc)

Tag (person)   t   Loc   P
Joe            7   O2    0.4
                   H2    0.2
                   H3    0.4
Joe            8   O2    0.6
                   H2    0.2
                   H3    0.2
Sue            7   …     …

To query Particle Filter output, query At
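As a rough illustration of how particle filter output could become rows of At, the sketch below counts particles per location and normalizes. It assumes equally weighted particles; the function name `particles_to_rows` and the ten-particle sample are hypothetical and not taken from the slides.

```python
from collections import Counter

def particles_to_rows(tag, t, particle_locs):
    """Turn one tag's particles at time t into (tag, t, loc, prob) rows.

    Assumes every particle carries the same weight (an illustrative simplification).
    """
    counts = Counter(particle_locs)
    total = len(particle_locs)
    return [(tag, t, loc, n / total) for loc, n in sorted(counts.items())]

# Hypothetical particle set: 10 guesses of Joe's location at t = 7.
particles = ["O2"] * 4 + ["H2"] * 2 + ["H3"] * 4
rows = particles_to_rows("Joe", 7, particles)
for row in rows:
    print(row)
```

With this sample the rows reproduce the t = 7 marginals shown for Joe in the table.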

Page 7

Semantics of the Model

At(tag,loc)

Tag   t   Loc   P
Joe   7   O2    0.4
          H2    0.2
          H3    0.4
Joe   8   O2    0.6
          H2    0.2
          H3    0.2
Sue   7   …     …

One possible stream (world):

Tag   t   Loc
Joe   7   O2
Joe   8   O2
Sue   7   …

Prob = 0.4 * 0.6 * …

NB: Markovian correlations OK

Query semantics: sum the weight of all worlds where Q is true at time t

Example: "Joe enters O2 at t=8"

$(0.2 + 0.4) \times 0.6 = 0.36$, where $0.2 + 0.4$ is the probability that Joe is outside O2 (in H2 or H3) at t=7
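The possible-worlds semantics can be checked by brute force on the slide's numbers. The sketch below enumerates every world for Joe at t = 7 and t = 8 and sums the weights of those where he enters O2 at t = 8. It assumes the two time steps are independent; the helper names are illustrative only.

```python
from itertools import product

# Marginals for Joe, copied from the At(tag, loc) table on the slide.
joe = {
    7: {"O2": 0.4, "H2": 0.2, "H3": 0.4},
    8: {"O2": 0.6, "H2": 0.2, "H3": 0.2},
}

def query_probability(dist, predicate):
    """Sum the weight of every possible world in which predicate holds."""
    times = sorted(dist)
    total = 0.0
    for locs in product(*(dist[t] for t in times)):
        world = dict(zip(times, locs))
        weight = 1.0
        for t in times:
            # Independence across time steps is an assumption of this sketch.
            weight *= dist[t][world[t]]
        if predicate(world):
            total += weight
    return total

def enters_o2_at_8(world):
    """True when Joe is outside O2 at t = 7 and inside O2 at t = 8."""
    return world[7] != "O2" and world[8] == "O2"

p_enter = query_probability(joe, enters_o2_at_8)
print(round(p_enter, 2))
```

Enumeration is exponential in the number of time steps, which is why the later slides compile queries to automata instead.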

Page 8

Outline

RFID to Probabilities via Particle Filters
PEEX+ query language
Extended Regular Query Algorithm
Experiments

Page 9

A hierarchy of PEEX+ queries

Regular Queries
  Alert me when Joe goes to the coffee room

Extended Regular
  Alert when anyone goes to the coffee room

Safe
  Alert when anyone goes to the coffee room and a DB member follows them.

Hard
  Others (Simulation)
  This line is sharp for some queries

Page 10

Peex+ Queries

Fragment of Cayuga; queries define events.

Operator    Description    Example
            Base stream    $At(p, l_1)$
;           Sequence       $At(p, l_1);\ At(p, l_2)$ (same p in both)
$\sigma$    Select         $\sigma_{Room(l_1)}(At(p, l_1))$
+           Kleene+        over $At(p, l)$ with predicate $Hall(l)$ and set $\{p, \ldots\}$ (p in some location)

Technical Point: left-to-right evaluation,

$E_1; E_2; E_3 = (E_1; E_2); E_3$

Page 11

Regular and Extended Regular Queries

Query is regular if no variable is shared between subgoals

$\sigma_{l='502'}(At('Joe', '501');\ At('Joe', l))$

Query is extended regular if any variable shared by two subgoals is shared by all subgoals, i.e. a templated regular query

$\sigma_{l='502'}(At(p, '501');\ At(p, l))$ (p is shared between subgoals)
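The two definitions translate directly into a check on the variable sets of a query's subgoals. The sketch below is one way to write that check; the representation (one Python set of variable names per subgoal) and the `classify` function are assumptions made for illustration, not part of PEEX+.

```python
def classify(subgoal_vars):
    """Classify a query from the variable sets of its subgoals.

    subgoal_vars: list of sets, one per subgoal, e.g. [{"p"}, {"p", "l"}].
    """
    # Variables that occur in at least two different subgoals.
    shared = {
        v
        for i, a in enumerate(subgoal_vars)
        for b in subgoal_vars[i + 1:]
        for v in a & b
    }
    if not shared:
        return "regular"
    if all(shared <= vars_ for vars_ in subgoal_vars):
        return "extended regular"
    return "other"

# At('Joe','501'); At('Joe', l): only constants in the first subgoal.
print(classify([set(), {"l"}]))
# At(p,'501'); At(p, l): p appears in every subgoal.
print(classify([{"p"}, {"p", "l"}]))
```

A query whose shared variable is missing from some subgoal falls outside both classes and is reported as "other" here.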

Page 12

Wrinkle in the language: Filter v. Selection

"Alert next time Joe is in 502 after he is in 501"

$q_{filter} = At('Joe', '501');\ At('Joe', '502')$

"Alert if the next place Joe is in after 501 is 502"

$q_{select} = \sigma_{l='502'}(At('Joe', '501');\ At('Joe', l))$

Example stream of At events over time: (Joe, 501), (Joe, 503), (Joe, 502)

$q_{filter}$: Yes
$q_{select}$: No
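The difference between the two readings is easy to see on a deterministic stream. The sketch below evaluates both queries directly over a list of Joe's successive locations; the function names and the list-based stream are hypothetical simplifications, and the stream order 501, 503, 502 is the one that makes the filter fire while the selection does not.

```python
def filter_query(stream, first="501", second="502"):
    """At('Joe', first); At('Joe', second).

    Fires on the next sighting at `second` after a sighting at `first`;
    sightings elsewhere in between are skipped.
    """
    seen_first = False
    for loc in stream:
        if seen_first and loc == second:
            return True
        if loc == first:
            seen_first = True
    return False

def select_query(stream, first="501", wanted="502"):
    """sigma_{l = wanted}(At('Joe', first); At('Joe', l)).

    The very next sighting after `first` must be at `wanted`.
    """
    for prev, nxt in zip(stream, stream[1:]):
        if prev == first and nxt == wanted:
            return True
    return False

stream = ["501", "503", "502"]  # Joe's locations in time order
print(filter_query(stream), select_query(stream))
```

On a stream where 502 immediately follows 501, both functions return True.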

Page 13

Outline

RFID to Probabilities via Particle Filters
PEEX+ query language
Extended Regular Query Algorithm
Experiments

Page 14

Why are ER queries hard?

Regular Queries ~ Regular Expressions
  Mapping is non-trivial
    ○ similar to Cayuga [Demers et al. 06]
  Queries have #P combined complexity
    ○ Can encode mDNF as regular expression
  Intuition: n-sized automaton leads to $time(2^n)$

Extended regular ~ 1 NFA per person
  k persons implies O(k)-size automaton
  Exponential cost

When ER, can avoid blowup

Page 15

Algorithm for Regular Queries Overview

Deterministic Algorithm

1. Compile a query q
   1. NFA-like thing in a language $L_q$
   2. Mapping $M_q$ from events to subsets of $L_q$
2. At runtime, at time t, have events E
   1. Create the set of symbols at time t: $M_q(E) = \bigcup_{e \in E} M_q(e)$
   2. Process the NFA on $M_q(E)$

Focus on the compilation

Page 16

Compile Select and Filter

$q_{filter} = At('Joe', '501');\ At('Joe', '502')$

$q_{select} = \sigma_{l='502'}(At('Joe', '501');\ At('Joe', l))$

Intuition: each goal maps to two letters:
  match (m): matches filter
  accept (a): accepted by select

$L = \{m_1, a_1, m_2, a_2\}$

[Automaton figure: transitions labeled $a_1$, $a_2$ and $\{m_2\}$, ending in a Final state. The legend distinguishes "does contain" edges from "does not contain" edges.]

The language and automaton are the same for both queries

Page 17

The difference is the mapping

$q_{filter} = At('Joe', '501');\ At('Joe', '502')$

$q_{select} = \sigma_{l='502'}(At('Joe', '501');\ At('Joe', l))$

$L = \{m_1, a_1, m_2, a_2\}$

[Automaton figure: transitions labeled $a_1$, $a_2$ and $\{m_2\}$, ending in a Final state. The legend distinguishes "does contain" edges from "does not contain" edges.]

Event          Filter            Select
(Joe, 501)     $\{m_1, a_1\}$    $\{m_1, a_1, m_2\}$
(Joe, 502)     $\{m_2, a_2\}$    $\{m_2, a_2\}$
(Joe, $l_0$)                     $\{m_2\}$
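To make the role of the mapping concrete, the sketch below runs one small automaton over sets of letters and changes only the event-to-letters mapping between the two queries. The mappings follow the table; the empty set for other events under the filter and the transition function `step` are assumptions of this sketch, since the slide's automaton figure did not survive extraction.

```python
# Event-to-letter mappings M_q(e), keyed by Joe's location.
FILTER_MAP = {"501": {"m1", "a1"}, "502": {"m2", "a2"}}
SELECT_MAP = {"501": {"m1", "a1", "m2"}, "502": {"m2", "a2"}}
FILTER_DEFAULT = frozenset()          # assumed: other events map to the empty set
SELECT_DEFAULT = frozenset({"m2"})    # (Joe, l0) under select

def symbols(events, mapping, default):
    """M_q(E): union of M_q(e) over the events seen at one time step."""
    out = set()
    for loc in events:
        out |= mapping.get(loc, default)
    return out

def step(state, syms):
    """Hypothetical transitions: 0 = start, 1 = first goal accepted, 2 = final."""
    if state == 0:
        return 1 if "a1" in syms else 0
    if state == 1:
        if "a2" in syms:
            return 2
        if "m2" in syms:
            # The second goal matched but was not accepted: the attempt fails,
            # unless the same input also restarts the first goal.
            return 1 if "a1" in syms else 0
        return 1
    return 2  # final is absorbing in this sketch

def run(stream, mapping, default):
    """Feed one set of events per time step; report whether final was reached."""
    state = 0
    for events in stream:
        state = step(state, symbols(events, mapping, default))
    return state == 2

stream = [["501"], ["503"], ["502"]]
print(run(stream, FILTER_MAP, FILTER_DEFAULT), run(stream, SELECT_MAP, SELECT_DEFAULT))
```

The same `step` function serves both queries, which mirrors the slide's point that only the mapping differs.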

Page 18

Regular Queries w. Probabilities

Probabilistic Algorithm

1. Compile a query q (stays the same)
   1. NFA with transitions in a language $L_q$
   2. Mapping $M_q$ from events to subsets of $L_q$
2. At time t, have events E with probabilities
   1. Create the set of symbols at time t: $M_q(E) = \bigcup_{e \in E} M_q(e)$ (a distribution on inputs)
   2. Process the NFA on $M_q(E)$ (a distribution on states)

Algorithm is constant in data, exponential in |Q|

State at t+1 only depends on state at t and input at t+1
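Because the next state depends only on the current state and the next input, the algorithm can carry a distribution over automaton states instead of a single state. The sketch below does this for "Joe enters O2" using the marginals from the earlier At table. The three-state deterministic automaton `delta` is a simplified stand-in chosen for illustration, and the time steps are assumed independent.

```python
def delta(state, loc):
    """Transition for 'Joe enters O2' (a simplified, hypothetical automaton).

    States: "init", "out" (last seen outside O2), "in", "entered" (just moved in).
    """
    if loc == "O2":
        return "entered" if state == "out" else "in"
    return "out"

def advance(state_dist, loc_dist):
    """One time step: push the state distribution through the input distribution."""
    nxt = {}
    for state, ps in state_dist.items():
        for loc, pl in loc_dist.items():
            target = delta(state, loc)
            nxt[target] = nxt.get(target, 0.0) + ps * pl
    return nxt

dist = {"init": 1.0}
dist = advance(dist, {"O2": 0.4, "H2": 0.2, "H3": 0.4})  # t = 7
dist = advance(dist, {"O2": 0.6, "H2": 0.2, "H3": 0.2})  # t = 8
print(round(dist["entered"], 2))
```

Each step touches only the current state distribution, so the work per time step does not grow with the length of the stream.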

Page 19

Extension to Extended Regular

"Alert when anyone in 501 and next step in 502"

$q_{select} = \sigma_{l='502'}(At(p, '501');\ At(p, l))$

If we substitute for p, the result is regular:

$q[p \to Joe] = \sigma_{l='502'}(At(Joe, '501');\ At(Joe, l))$

$q[p \to Tom] = \sigma_{l='502'}(At(Tom, '501');\ At(Tom, l))$

Bindings use disjoint sets of tuples. Algorithm: independent copies, multiply

Depends on # distinct values (shared vars), not # of timesteps, so it can stream
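One reading of "independent copies, multiply" is that each binding of p gets its own regular-query probability, and the probability that at least one binding fires is one minus the product of the complements. The sketch below shows that arithmetic only. The per-person values are placeholders (Tom's 0.5 is invented for the example), and `prob_any` is a hypothetical helper.

```python
from math import prod

def prob_any(per_person):
    """P(at least one binding of p satisfies q), assuming independent people."""
    return 1.0 - prod(1.0 - p for p in per_person.values())

# Placeholder per-person probabilities for the substituted regular queries.
p_alert = prob_any({"Joe": 0.36, "Tom": 0.5})
print(round(p_alert, 2))
```

The cost of this step grows with the number of distinct people, not with the number of time steps, which matches the streaming claim on the slide.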

Page 20

Recap of Algorithms

Regular Queries
  Compiled them to an NFA, then used image
  Data complexity O(1)

Extended regular
  Several regulars multiplied together
  Depends on number of distinct people in the data, not number of time steps

Markov Correlations: more arithmetic & state

Page 21

PEEX+ Algorithms and Analysis

Compilation procedures

Safe plans
  More complicated, based on algebra
  Cost grows with data (useful for archives)

Aggregates

Complexity: Can we do better?
  For a restricted class, draw a crisp line
  Minor variants of safe result in hardness

Page 22

Outline

RFID to Probabilities via Particle Filters
PEEX+ query language
Extended Regular Query Algorithm
Experiments

Page 23

Experimental Setup

Quality Experiment
  52 objects, 352 locations, 10k sq. ft.
    ○ 2 x 30 min trace with a 10 min break in between
  Participants marked down true locations
  "Alert when anyone enters the Coffee Room"

$\sigma_{Coffee(l_2)}(\sigma_{Hallway(l_1)}(At(p, l_1));\ At(p, l_2))$

Consider two scenarios
  Realtime (no correlations) v. MLE
  Archived (smoothing) v. Viterbi

In practice, can smooth in a short time

Page 24

Quality: Realtime

Declare an event "true" if its Pr > threshold

Vary threshold

[Figure: three panels, Precision, Recall and F1, each on a 0 to 1 axis.]

10% improvement in F1
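The thresholding rule and the three reported metrics can be sketched as follows. The event names, probabilities and ground-truth set are made up for illustration and are not the experiment's data; `prf1` is a hypothetical helper.

```python
def prf1(event_probs, truth, threshold):
    """Precision, recall and F1 when events with Pr > threshold are declared true."""
    predicted = {e for e, p in event_probs.items() if p > threshold}
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Illustrative data only: query probabilities per candidate event, and ground truth.
event_probs = {"e1": 0.9, "e2": 0.6, "e3": 0.3, "e4": 0.1}
truth = {"e1", "e3"}

for threshold in (0.2, 0.5, 0.8):
    print(threshold, prf1(event_probs, truth, threshold))
```

Lowering the threshold trades precision for recall, which is the curve the slide's "vary threshold" experiment traces.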

Page 25

Quality: Archived

Smoothing v. Viterbi

PEEX keeps track of Markovian correlations

[Figure: three panels, Precision, Recall and F1, each on a 0 to 1 axis.]

Approx. 30% gain in F1

Page 26

Performance

Page 27

Conclusion

Showed PEEX+
  Processed output of several inference tasks
    ○ Applies more generally than just RFID

Quality (F1) gains by keeping probability
  50% from probs, 50% from correlations

Performance was usable in real time
  No indexing!

Preprint available on request

Page 28

Page 29

Future Work

Implementing archived stream indexing
  Aggregations in time
  Aggressive indexing
  Ranking? Top-K?

Sharper lines for complexity
  Are there more streamable queries?

Richer language
  Similar to linear style plans
  What do people need?

Temporal Models!
  Consistency

Page 30

Correlations

Page 31

Sequencing by example

Sequencing is parameterized [Cayuga]

$\sigma_{l='502'}(At(p, '501');\ At(p, l))$

Events shown on the timeline: (Joe, 501), (Bob, 502), (Joe, 502)

Also shown: (Joe, 503)

Semicolon means "the next event among those that match next goal"

Semicolon is not "after"

Page 32

Compilation by example

Each goal "corresponds" to two letters:
  move (m): the query should advance
  accept (a): the next subgoal accepts

$q_1 = \sigma_{l='502'}(At(Joe, '501');\ At(Joe, l))$

$L_1 = \{m_1, a_1, m_2, a_2\}$

Mapping $M_q$:
  (Joe, 501) $\to \{m_1, a_1, m_2\}$
  (Joe, 502) $\to \{m_2, a_2\}$
  (Joe, $l_0$) $\to \{m_2\}$
  Any other maps to the empty set

[Automaton figure: transitions labeled $a_1$, $a_2$ and $\{m_2\}$, ending in a Final state. The legend distinguishes "does contain" edges from "does not contain" edges.]

Page 33

Subtle example..

$q_1 = \sigma_{l='502'}(At(Joe, '501');\ At(Joe, l))$

$L_1 = \{m_1, a_1, m_2, a_2\}$

Mapping $M_1$:
  (Joe, 501) $\to \{m_1, a_1, m_2\}$
  (Joe, 502) $\to \{m_2, a_2\}$
  (Joe, $l_0$) $\to \{m_2\}$
  Any other maps to the empty set

[Automaton figure: transitions labeled $a_1$, $a_2$ and $\{m_2\}$, ending in a Final state. The legend distinguishes "does contain" edges from "does not contain" edges.]

What about:

$q_2 = At(Joe, '501');\ At(Joe, '502')$

Mapping $M_2$: (Joe, $l_0$)