Fast Mining and Forecasting of Complex Time-Stamped Eventsyasuko/PUBLICATIONS/... · Money.com...

Preview:

Citation preview

Fast Mining and Forecasting of Complex Time-Stamped Events

Paper ID: 183

Yasuko Matsubara

Kyoto University y.matsubara@db.soc.i.kyoto-u.ac.jp

Yasushi Sakurai

NTT Communication Science Labs yasushi.sakurai@acm.org

Christos Faloutsos

Carnegie Mellon University christos@cs.cmu.edu

Tomoharu Iwata

NTT Communication Science Labs iwata.tomoharu@lab.ntt.co.jp

Masatoshi Yoshikawa

Kyoto University yoshikawa@i.kyoto-u.ac.jp

Motivation Complex time-stamped events Web click events: {time, url, user, access device, http referrer,…}

Challenges

L We cannot see any trends!!

Problem definition Given: A sequence of complex time-stamped events Goal: (1) Find major topics (2) Forecast future events

Timestamp URL User Device 2012-08-01-12:00 CNN.com Smith iphone 2012-08-02-15:00 YouTube.com Brown iphone 2012-08-02-19:00 CNET.com Smith mac

… … … …

Proposed method: TriMine Main idea (1): M-way analysis (a) Complex time-stamped tensor (b) Decompose to 3 topic vectors (c) Inference (Gibbs sampling) Infer k topics for each element according to probability p: Main idea (2): Multi-scale analysis TriMine is linear on the input size N, i.e., (N: counts of events in X, n: duration of X)

u

vn

Object

Actor

Time

Topic1 (business)

Topic2 (news)

Topic3 (media)

Object

Actor

Time

TriMine-F – forecasts Use topic matrices O, A, C [step1] Forecast time-topic matrix C’ Forecast C’ using multiple levels of matrices [step2] Generate events using three matrices O, A, C’

(a)  Count estimation – X’ = O * A * C’ (b)  Complex event generation – sampling approach

Experiments web click data / On demand TV data TriMine-plot (red points: each URL/user/TV program) Benefit of multi-scale forecasting Forecasting accuracy Temporal perplexity RMSE between original & forecasted events Scalability

Conclusions We addressed the problem of complex event mining TriMine has following properties:

• Effective: It finds meaningful patterns in real datasets • Accurate: It enables forecasting • Scalable: It is linear on the database size

Code: http://www.kecl.ntt.co.jp/csl/sirg/people/yasuko/software.html

Original access counts (100 users, 1 week)

Mth order tensor (M=3) cube {object x actor x time} : event count of object i, actor j at time t xijt

e.g., (‘Smith’, ‘CNN.com’, ‘Aug1 12pm’; 21 times)

Object/URL

Actor/user

Time

u

v

n

xijt

e.g., Topic 1 :business topic (Higher value: related topic)

Object/URL

Money.com

CNN.com

Smith Johnson

Actor/user

Time

Mon-Fri Sat-Sun

Object matrix

Actor matrix

Time matrix

u

v n

k

k

k€

O

A

C

time actors

O

A

n€

u

k

u

k€

k

χ = χ(0)

u

χ(1)

C(1)

k

u

n / l1

C(2)

k

n / l2

χ(2)

l0

l1

l2

v

v

v

v

n / l0

C(0)

Share O & A for all levels

Hourly pattern

Daily pattern

Weekly pattern

Compute C for each level

Future events

Tensor X €

O

AC C

O

AC

time 1−t2−t

c r(0)

c r(1)

c r(2)

r,t(0)c

Forecasted value

[Step 1] [Step 2]

102 103 104102

104

106

Duration

Wal

l clo

ck ti

me

(sec

.)

ARTriMine−F (naive)TriMine−F

7x

74x

10 20 3020

40

60

80

Time (per 2 hours)

RM

SE

Marginal

TriMine−FPLiFT2

1 2 3

103

104

Time (per 1 day)

RM

SE

Marginal

TriMine−FPLiFT2AR

O(N logn)→O(N )

0 100 200 3000

0.005

0.01

0.015

0.02

0.025

Time (per 1 hour)

Valu

e

"Business""Media""Drive"

Weekend

Weekend

"Business"

"Drive"

"Media"

car&biketravelarea,

word−of mouth

newsportal

business news

pet

restaurant

money,finance

URL matrix User matrix Time matrix

"Blog"

"Communication"

"Food" blog

route map, diet

community

pet

rss

web mailmobile mail

restaurant0 100 200 300

0

0.002

0.004

0.006

0.008

0.01

0.012

0.014

Time (per 1 hour)

Valu

e

"Blog""Food""Communication"

11pm4pm

0 20 40 60 800

0.005

0.01

0.015

Time (per 4 hours)

Valu

e

"Sports""Romance""Action"

Saturday

SundaySaturday

"Romance"

"Sports"

"Action"

LOST

Taxi 1 & 2Oceans 11 & 12

Australian Open tennis

French Open tennis final (women)

Desperate Housewives

French Open tennis final (men)

TV matrix Time matrix

0

5

10 x 10−3

Time (per 2hours)

Valu

e

2 topics by TriMine

0 50 100 150 200 250 3000

5

10 x 10−3

Time (per 2hours)

Valu

e

Forecasting − TriMine−F (single)

0 50 100 150 200 250 3000

5

10 x 10−3

Time (per 2hours)

Valu

e

Forecasting − TriMine−F

0

5

10 x 10−3

Time (per 2hours)

Valu

e

2 topics by TriMine

0 50 100 150 200 250 3000

5

10 x 10−3

Time (per 2hours)

Valu

e

Forecasting − TriMine−F (single)

0 50 100 150 200 250 3000

5

10 x 10−3

Time (per 2hours)

Valu

e

Forecasting − TriMine−F

0

5

10 x 10−3

Time (per 2hours)

Valu

e

2 topics by TriMine

0 50 100 150 200 250 3000

5

10 x 10−3

Time (per 2hours)

Valu

e

Forecasting − TriMine−F (single)

0 50 100 150 200 250 3000

5

10 x 10−3

Time (per 2hours)

Valu

e

Forecasting − TriMine−F

Original C matrix Single scale: failed Multi-scale: capture patterns

Any trends?

Can we forecast future events?

0 50 100 150 200 250 3000

100

200

300

Time

Cou

nt o

f clic

ks URL: blog site

Bursty L

Noisy L

URL matrix Time matrix

Similar trends very clear groups

daily + weekly periodicities

180 200 220 240 260 280 300

105

Time (per 2 hours)

Perp

lexi

ty

TriMine−FT2

170 180 190 200 210 220 230 240

106

Time (per 4 hours)

Perp

lexi

ty

TriMine−FT2

T2: [Hong et al.’11], PLiF [Li et al.’10]