TMSG – Paper Reading 黃福銘 (Angus)

黃福銘 (Angus). Angus Fuming Huang, Academia Sinica, Institute of Information Science, ANTS Lab. Jae-Gil Lee, Jiawei Han (UIUC), Kyu-Young Whang (KAIST), ACM SIGMOD’07



Page 2

Angus Fuming Huang
Academia Sinica, Institute of Information Science, ANTS Lab

Trajectory Clustering: A Partition-and-Group Framework

Jae-Gil Lee, Jiawei Han (UIUC)
Kyu-Young Whang (KAIST)

ACM SIGMOD’07

2012.01.04

Page 3

Paper outline

- Introduction
- Trajectory clustering
- Trajectory partitioning
- Line segment clustering
- Experimental evaluation
- Discussion and conclusions

Page 4

1. Introduction

- Background
- The key observation
- Examples in real applications
- Possible arguments
- Contributions

Page 5

Background

- Previous research has mainly dealt with clustering point data
  - K-means, BIRCH, DBSCAN, OPTICS, STING
- Recent research clusters trajectories as a whole
- Improvements in satellites and tracking facilities make large amounts of trajectory data available

Page 6

The key observation

Clustering trajectories as a whole cannot detect similar portions (common sub-trajectories) of the trajectories

Page 7

Examples in real applications

- Hurricanes: landfall forecasts
  - Coastline: behavior at the time of landfall
  - Sea: behavior before landfall
- Animal movements: effects of roads and traffic
  - Road segments
  - Traffic rate

Page 8

Possible arguments

- Argument: why not prune the useless parts of trajectories and keep only the interesting ones?
  - It is tricky to determine which parts of the trajectories are useless
  - Pruning the useless parts of trajectories prevents us from discovering unexpected clustering results

Page 9

Contributions

- A partition-and-group framework
  - To cluster trajectories
  - To discover common sub-trajectories
- A formal trajectory partitioning algorithm
  - Based on the minimum description length (MDL) principle
- A density-based clustering algorithm for line segments
- Effectiveness demonstrated on various real data sets

Page 10

2. Trajectory clustering

- Problem statement
- The TRACLUS algorithm
- Distance function

Page 11

Problem statement

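Input: a set of trajectories $T = \{TR_1, \ldots, TR_{num_{tra}}\}$

Output: a set of clusters $O = \{C_1, \ldots, C_{num_{clus}}\}$

- Trajectory: $TR_i = p_1 p_2 p_3 \cdots p_j \cdots p_{len_i}$
- Sub-trajectory (trajectory partition): a line segment $p_i p_j$ ($i < j$) between two points of the same trajectory
- Characteristic point: a point where the behavior of a trajectory changes rapidly; trajectories are partitioned at these points
- Cluster: a set of trajectory partitions
- Representative trajectory: the major behavior of the trajectory partitions in a cluster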

Page 12

Example of trajectory clustering

Page 13

The TRACLUS algorithm
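The algorithm figure on this slide did not survive extraction. As a rough illustration of its two phases (partition every trajectory, then group all the resulting line segments), here is a hypothetical Python skeleton. The name `traclus` and the three callables it takes are placeholders of my own, not the paper's interface, and noise is assumed to be labeled -1.

```python
from collections import defaultdict


def traclus(trajectories, partition, group, represent):
    """Two-phase skeleton (hypothetical interface).

    partition(points) -> indices of characteristic points
    group(segments, traj_ids) -> one cluster label per segment (-1 = noise)
    represent(cluster_segments) -> representative trajectory of one cluster
    """
    # Partitioning phase: cut each trajectory at its characteristic points
    segments, traj_ids = [], []
    for tid, points in enumerate(trajectories):
        cps = partition(points)
        for a, b in zip(cps, cps[1:]):
            segments.append((points[a], points[b]))
            traj_ids.append(tid)

    # Grouping phase: cluster the pooled line segments from all trajectories
    labels = group(segments, traj_ids)
    clusters = defaultdict(list)
    for seg, label in zip(segments, labels):
        if label != -1:
            clusters[label].append(seg)

    # One representative trajectory per surviving cluster
    return {label: represent(segs) for label, segs in clusters.items()}
```

Keeping the trajectory id of every segment matters because the grouping phase later checks how many distinct trajectories contribute to a cluster.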

Page 14

Distance function
- The perpendicular distance (d⊥)
- The parallel distance (d∥)
- The angle distance (dθ)
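A minimal Python sketch of the three components, as I understand them from the paper: L_i is taken to be the longer segment, p_s and p_e are the projections of L_j's end points onto the line through L_i, and the combined distance is a weighted sum with every weight defaulting to 1. The names `segment_distances` and `traclus_distance` are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def _norm(v):
    return math.hypot(v[0], v[1])


def _project(p, seg):
    """Projection of point p onto the infinite line through seg."""
    s, e = seg
    d = _sub(e, s)
    denom = _dot(d, d)
    if denom == 0.0:  # degenerate segment: project onto its single point
        return s
    u = _dot(_sub(p, s), d) / denom
    return (s[0] + u * d[0], s[1] + u * d[1])


def segment_distances(li, lj):
    """Return (d_perp, d_par, d_theta) for segments given as ((x1, y1), (x2, y2))."""
    if _norm(_sub(li[1], li[0])) < _norm(_sub(lj[1], lj[0])):
        li, lj = lj, li  # Li must be the longer segment
    (si, ei), (sj, ej) = li, lj
    ps, pe = _project(sj, li), _project(ej, li)

    # Perpendicular distance: Lehmer mean of order 2 of the two offsets
    l_perp1, l_perp2 = _norm(_sub(sj, ps)), _norm(_sub(ej, pe))
    total = l_perp1 + l_perp2
    d_perp = 0.0 if total == 0 else (l_perp1 ** 2 + l_perp2 ** 2) / total

    # Parallel distance: how far the projections fall from Li's end points
    l_par1 = min(_norm(_sub(ps, si)), _norm(_sub(ps, ei)))
    l_par2 = min(_norm(_sub(pe, si)), _norm(_sub(pe, ei)))
    d_par = min(l_par1, l_par2)

    # Angle distance: ||Lj|| * sin(theta) below 90 degrees, ||Lj|| otherwise
    vi, vj = _sub(ei, si), _sub(ej, sj)
    len_i, len_j = _norm(vi), _norm(vj)
    if len_i == 0 or len_j == 0:
        d_theta = 0.0
    else:
        cos_t = max(-1.0, min(1.0, _dot(vi, vj) / (len_i * len_j)))
        theta = math.acos(cos_t)
        d_theta = len_j * math.sin(theta) if theta < math.pi / 2 else len_j
    return d_perp, d_par, d_theta


def traclus_distance(li, lj, w_perp=1.0, w_par=1.0, w_theta=1.0):
    """Weighted sum of the three components."""
    d_perp, d_par, d_theta = segment_distances(li, lj)
    return w_perp * d_perp + w_par * d_par + w_theta * d_theta
```

For L_i from (0, 0) to (10, 0) and L_j from (2, 1) to (5, 1), this gives d⊥ = 1, d∥ = 2 and dθ = 0; reversing L_j's direction turns dθ into ||L_j|| = 3, so the angle component is what makes the distance direction-aware.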

Page 15

3. Trajectory partitioning

- Desirable properties
- Formalization using the MDL principle
- Approximate solution

Page 16

Desirable properties
- Preciseness: the difference between a trajectory and the set of its trajectory partitions should be as small as possible
- Conciseness: the number of trajectory partitions should be as small as possible

Page 17

Formalization using the MDL principle

Goal: find the optimal tradeoff between preciseness and conciseness using the minimum description length (MDL) principle

Cost components: H is the hypothesis and D is the data. L(H) is the length, in bits, of the description of the hypothesis; L(D|H) is the length, in bits, of the description of the data when encoded with the help of the hypothesis.

Definition: The best hypothesis H to explain D is the one that minimizes the sum of L(H) and L(D|H).

A hypothesis corresponds to a specific set of trajectory partitions, so finding the optimal partitioning translates to finding the best hypothesis using the MDL principle.

Page 18

Formulation of the MDL cost
- L(H) represents the sum of the lengths of all trajectory partitions
- L(D|H) represents the sum of the differences between a trajectory and the set of its trajectory partitions

So, let's minimize L(H) + L(D|H).
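Written out for a trajectory TR_i whose characteristic points are p_{c_1}, …, p_{c_{par_i}}, the two terms take the following form (my transcription of the paper's formulas; only d⊥ and dθ enter L(D|H), with lengths and distances encoded through log2):

```latex
L(H) = \sum_{j=1}^{par_i - 1} \log_2\bigl(\mathrm{len}(p_{c_j} p_{c_{j+1}})\bigr)

L(D \mid H) = \sum_{j=1}^{par_i - 1} \sum_{k=c_j}^{c_{j+1} - 1}
  \Bigl[ \log_2\bigl(d_\perp(p_{c_j} p_{c_{j+1}},\, p_k p_{k+1})\bigr)
       + \log_2\bigl(d_\theta(p_{c_j} p_{c_{j+1}},\, p_k p_{k+1})\bigr) \Bigr]
```

Fewer, longer partitions shrink L(H) but make each partition deviate more from the original points, which raises L(D|H); this is the preciseness/conciseness tradeoff in numeric form.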

Page 19

MDL=L(H)+L(D|H)

Approximate solution
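The approximate solution can be sketched as a greedy scan: starting from p_1, keep extending the candidate partition while the MDL cost of partitioning (MDL_par) does not exceed the cost of keeping the original points (MDL_nopar), and otherwise cut at the previous point. A minimal Python sketch under these assumptions: indices are 0-based, log2 of a zero distance is treated as 0 (a hypothetical guard, since the formula is undefined there), and all function names are illustrative.

```python
import math


def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def _perp_and_angle(li, lj):
    """d_perp and d_theta between two segments; li is swapped to be the longer one."""
    if _dist(*li) < _dist(*lj):
        li, lj = lj, li
    (si, ei), (sj, ej) = li, lj
    dx, dy = ei[0] - si[0], ei[1] - si[1]
    len_i = math.hypot(dx, dy)
    if len_i == 0.0:  # both segments are single points
        return 0.0, 0.0
    # Perpendicular offsets of sj and ej from the line through li (cross product)
    l1 = abs(dx * (sj[1] - si[1]) - dy * (sj[0] - si[0])) / len_i
    l2 = abs(dx * (ej[1] - si[1]) - dy * (ej[0] - si[0])) / len_i
    d_perp = 0.0 if l1 + l2 == 0 else (l1 ** 2 + l2 ** 2) / (l1 + l2)
    vx, vy = ej[0] - sj[0], ej[1] - sj[1]
    len_j = math.hypot(vx, vy)
    if len_j == 0.0:
        return d_perp, 0.0
    cos_t = max(-1.0, min(1.0, (dx * vx + dy * vy) / (len_i * len_j)))
    theta = math.acos(cos_t)
    d_theta = len_j * math.sin(theta) if theta < math.pi / 2 else len_j
    return d_perp, d_theta


def _log2(x):
    # Hypothetical guard: a zero distance contributes 0 bits instead of -inf
    return math.log2(x) if x > 0 else 0.0


def mdl_par(pts, i, j):
    """L(H) + L(D|H) when p_i and p_j are the only characteristic points in [i, j]."""
    seg = (pts[i], pts[j])
    cost = _log2(_dist(*seg))  # L(H)
    for k in range(i, j):      # L(D|H)
        d_perp, d_theta = _perp_and_angle(seg, (pts[k], pts[k + 1]))
        cost += _log2(d_perp) + _log2(d_theta)
    return cost


def mdl_nopar(pts, i, j):
    """Cost of keeping the original points between p_i and p_j: L(D|H) is zero."""
    return sum(_log2(_dist(pts[k], pts[k + 1])) for k in range(i, j))


def approximate_partition(pts):
    """Greedy approximate partitioning; returns indices of characteristic points."""
    cps = [0]
    start, length = 0, 1
    while start + length < len(pts):
        curr = start + length
        if mdl_par(pts, start, curr) > mdl_nopar(pts, start, curr):
            cps.append(curr - 1)  # partition at the previous point
            start, length = curr - 1, 1
        else:
            length += 1
    if cps[-1] != len(pts) - 1:
        cps.append(len(pts) - 1)
    return cps
```

For an L-shaped path (0, 0), (10, 0), (20, 0), (20, 10), (20, 20) this returns the indices [0, 2, 4], i.e. the corner becomes a characteristic point. Because the costs take log2 of raw lengths, the result depends on the coordinate scale: with unit-length steps even a straight line would be cut at every point, since log2(2) > log2(1) + log2(1).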

Page 20

4. Line segment clustering

- Density of line segments
- Clustering algorithm
- Representative trajectory of a cluster
- Heuristic for parameter value selection

Page 21

Density of line segments

ε-neighborhood

Core line segment

Directly density-reachable

Density-reachable

Density-connected

Density-connected set
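For reference, the notions above can be written as follows, with D the set of all line segments and dist the distance function from p.14 (a transcription following DBSCAN-style definitions adapted to line segments, as I understand the paper):

```latex
N_\varepsilon(L_i) = \{\, L_j \in D \mid \mathrm{dist}(L_i, L_j) \le \varepsilon \,\}

L_i \text{ is a core line segment} \iff |N_\varepsilon(L_i)| \ge \mathit{MinLns}

L_i \text{ directly density-reachable from } L_j \iff
  L_i \in N_\varepsilon(L_j) \;\wedge\; |N_\varepsilon(L_j)| \ge \mathit{MinLns}

L_i \text{ density-reachable from } L_j \iff \exists\, M_1 = L_j, M_2, \ldots, M_n = L_i :\;
  M_{k+1} \text{ directly density-reachable from } M_k \ (1 \le k < n)

L_i, L_j \text{ density-connected} \iff \exists\, L_k \in D :\;
  L_i \text{ and } L_j \text{ are both density-reachable from } L_k

C \subseteq D,\ C \ne \emptyset \text{ is a density-connected set} \iff
  (\forall L_i, L_j \in C :\ L_i, L_j \text{ density-connected}) \;\wedge\;
  (\forall L_i \in C,\ L_j \in D :\ L_j \text{ density-reachable from } L_i \Rightarrow L_j \in C)
```

Direct density-reachability is asymmetric: it only requires the *source* segment to be a core segment, which is why a non-core segment can be reached from a core one but not the other way around.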

Page 22

Density-reachability and density-connectivity

- L1, L2, L3, L4, and L5 are core line segments
- L2 (or L3) is directly density-reachable from L1
- L6 is density-reachable from L1, but not vice versa
- L1, L4, and L5 are all density-connected

Page 23

Influence of a very short line segment on clustering

- A very short line segment might induce over-clustering
- The authors' experience indicates that increasing the length of trajectory partitions by 20~30% generally improves the clustering quality

Page 24

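Line segment clustering algorithm

- A cluster is a density-connected set
- Each line segment is classified either as part of a cluster or as noise
- ε-neighborhood computation and core line segment test
- Clusters are expanded through directly density-reachable line segments
- Trajectory cardinality checking: a cluster is discarded when its line segments come from fewer than MinLns distinct trajectories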
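The steps above can be sketched as a DBSCAN-style Python function. This assumes the segment distance is passed in as a callable (for instance the three-component distance from p.14) and that a segment's ε-neighborhood includes the segment itself; `cluster_segments` and `NOISE` are hypothetical names, and cluster ids may have gaps after the cardinality check.

```python
from collections import deque

NOISE = -1


def cluster_segments(segments, traj_ids, dist, eps, min_lns):
    """Density-based grouping of line segments.

    segments[i] comes from trajectory traj_ids[i]; dist(a, b) is a segment distance.
    Returns one label per segment: a cluster id, or NOISE.
    """
    n = len(segments)

    def neighborhood(i):
        # Linear scan; an index structure would speed this up
        return [j for j in range(n) if dist(segments[i], segments[j]) <= eps]

    labels = [None] * n  # None means "unclassified"
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighborhood(i)
        if len(nbrs) < min_lns:  # not a core line segment
            labels[i] = NOISE
            continue
        queue = deque()
        for j in nbrs:
            if labels[j] is None and j != i:
                queue.append(j)
            if labels[j] is None or labels[j] == NOISE:
                labels[j] = cluster_id
        while queue:  # expand through directly density-reachable segments
            m = queue.popleft()
            m_nbrs = neighborhood(m)
            if len(m_nbrs) < min_lns:
                continue
            for x in m_nbrs:
                if labels[x] is None:
                    queue.append(x)
                if labels[x] is None or labels[x] == NOISE:
                    labels[x] = cluster_id
        cluster_id += 1

    # Trajectory cardinality check: too few distinct trajectories -> noise
    for c in range(cluster_id):
        members = [i for i in range(n) if labels[i] == c]
        if len({traj_ids[i] for i in members}) < min_lns:
            for i in members:
                labels[i] = NOISE
    return labels
```

The final check is what separates trajectory clustering from plain DBSCAN on segments: many segments from a single trajectory can be dense, yet they do not represent a behavior shared by several moving objects.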

Page 25

Representative trajectory of a cluster

Page 26

Rotation of the X and Y axes
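The slide's formulas did not survive extraction. As I understand the paper, the average direction vector of a cluster C (with v_k the direction vector of its k-th line segment) and the rotation that makes the X′ axis parallel to it are:

```latex
\vec{V} = \frac{\vec{v}_1 + \vec{v}_2 + \cdots + \vec{v}_{|C|}}{|C|},
\qquad \phi = \text{angle between } \vec{V} \text{ and the } X \text{ axis}

\begin{pmatrix} x' \\ y' \end{pmatrix} =
\begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
```

After the rotation, a sweep line perpendicular to X′ moves along the cluster's main direction; the inverse (transposed) matrix maps the averaged points back to the original axes.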

Page 27

Representative trajectory generation algorithm
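A minimal sweep-line sketch of the generation algorithm in Python, under these assumptions: the X′ axis is aligned with the average direction vector; a point is emitted only where at least MinLns segments overlap the sweep line and its X′ value is at least γ away from the previously emitted point (one reading of the smoothing rule); and a segment perpendicular to the sweep direction contributes its middle Y′ value. The function name is hypothetical.

```python
import math


def representative_trajectory(segments, min_lns, gamma):
    """Sweep-line sketch; segments are ((x1, y1), (x2, y2)) from one cluster."""
    if not segments:
        return []
    n = len(segments)
    # Average direction vector V and its angle phi with the X axis
    vx = sum(e[0] - s[0] for s, e in segments) / n
    vy = sum(e[1] - s[1] for s, e in segments) / n
    phi = math.atan2(vy, vx)
    cs, sn = math.cos(phi), math.sin(phi)

    def rotate(p):  # original axes -> X'Y' axes (X' parallel to V)
        return (p[0] * cs + p[1] * sn, -p[0] * sn + p[1] * cs)

    def unrotate(p):  # X'Y' axes -> original axes
        return (p[0] * cs - p[1] * sn, p[0] * sn + p[1] * cs)

    rotated = [(rotate(a), rotate(b)) for a, b in segments]
    # Sweep over all start and end points in X' order
    xs = sorted(x for a, b in rotated for x in (a[0], b[0]))
    result, prev_x = [], None
    for x in xs:
        ys = []  # Y' values where the sweep line crosses each segment
        for (x1, y1), (x2, y2) in rotated:
            if min(x1, x2) <= x <= max(x1, x2):
                if x1 == x2:  # segment perpendicular to the sweep direction
                    ys.append((y1 + y2) / 2)
                else:
                    ys.append(y1 + (y2 - y1) * (x - x1) / (x2 - x1))
        if len(ys) >= min_lns and (prev_x is None or x - prev_x >= gamma):
            result.append(unrotate((x, sum(ys) / len(ys))))
            prev_x = x
    return result
```

For three parallel horizontal segments at y = 0, 2 and 4 spanning x = 0..10, with MinLns = 3 and γ = 1, this yields the two points (0, 2) and (10, 2): the averaged path down the middle of the cluster.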

Page 28

Heuristic for parameter value selection

- Choosing the values of ε and MinLns
- An entropy function over the sizes of the ε-neighborhoods
- Simulated annealing to find the ε that minimizes the entropy
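A sketch of the entropy heuristic, assuming a precomputed pairwise distance matrix between line segments. The entropy is H(X) = −Σ p(x_i) log2 p(x_i) with p(x_i) = |N_ε(x_i)| / Σ_j |N_ε(x_j)|. It is maximal when all neighborhoods have the same size (ε too small or too large), so a good ε minimizes it. For brevity this swaps simulated annealing for a plain grid search over candidate values; `neighborhood_entropy` and `choose_eps` are hypothetical names.

```python
import math


def neighborhood_entropy(dmat, eps):
    """H(X) = -sum_i p(x_i) log2 p(x_i), p(x_i) = |N_eps(x_i)| / sum_j |N_eps(x_j)|."""
    # Each row counts the segment itself, since its distance to itself is 0
    sizes = [sum(1 for d in row if d <= eps) for row in dmat]
    total = sum(sizes)
    return -sum((s / total) * math.log2(s / total) for s in sizes if s > 0)


def choose_eps(dmat, candidates):
    """Candidate eps with minimum entropy (grid search standing in for simulated annealing)."""
    return min(candidates, key=lambda eps: neighborhood_entropy(dmat, eps))
```

With three mutually close segments and one outlier, ε = 0.5 and ε = 20 both give the uniform maximum of 2 bits, while ε = 2 separates the dense group from the outlier and scores lower, so it is selected.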

Page 29

5. Experimental evaluation

- Experimental setting
- Results for hurricane track data
- Results for animal movement data
- Effects of parameter values

Page 30

Experimental setting

Hurricane track data set
- Atlantic, 1950~2004
- 570 trajectories and 17736 points
- Latitude and longitude

Animal movement data set
- Elk (1993): 33 trajectories and 47204 points
- Deer (1995): 32 trajectories and 20065 points

Page 31

Clustering quality measurement

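- There is no well-defined quality measure for density-based clustering methods
- QMeasure = total Sum of Squared Error (SSE) + noise penalty, where N is the set of all noise line segments
- The noise penalty becomes larger if we select too small an ε or too large a MinLns, because more line segments are then classified as noise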
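The measure can be sketched in Python as below, assuming the paper's form QMeasure = Σ_i (1 / 2|C_i|) Σ_{x,y ∈ C_i} dist(x, y)² + (1 / 2|N|) Σ_{w,z ∈ N} dist(w, z)², i.e. the noise penalty is an SSE-style term computed over N. The function name and its arguments are hypothetical; `dist` would be the line segment distance in practice.

```python
def qmeasure(clusters, noise, dist):
    """Total SSE over all clusters plus an SSE-style penalty over the noise set N."""
    def sse(group):
        if not group:
            return 0.0
        pair_sum = sum(dist(x, y) ** 2 for x in group for y in group)
        return pair_sum / (2 * len(group))

    return sum(sse(c) for c in clusters) + sse(noise)
```

Lower is better. Without the penalty term, labeling almost everything as noise would trivially minimize the SSE, which is exactly the degenerate outcome of a too small ε or a too large MinLns.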

Page 32

Results for hurricane track data

Page 33

Results for animal movement data

Page 34

Effects of parameter values

- Using a smaller ε or a larger MinLns discovers a larger number of smaller clusters
- Using a larger ε or a smaller MinLns discovers a smaller number of larger clusters

Page 35

7. Discussion and conclusions

- Discussion
- Conclusions

Page 36

Discussion

- Extensibility: undirected or weighted trajectories
- Parameter insensitivity: point data, trajectory data
- Efficiency: using an index
- Movement patterns: circular motion, …
- Temporal information

Page 37

Conclusions

- Partition-and-group framework
- Trajectory clustering algorithm TRACLUS
- Experiments on two real data sets
- A visual inspection tool
- Discovery of common sub-trajectories

Page 38

My comments

- Detailed sentences with explicit illustrations!
- What is the principle behind the parallel distance function? (p.14)
- What is the basis for increasing the length of trajectory partitions by 20~30%? (p.23)