TMSG – Paper Reading
黃福銘 (Angus Fuming Huang)
Academia Sinica, Institute of Information Science, ANTS Lab
Trajectory Clustering: A Partition-and-Group Framework
Jae-Gil Lee, Jiawei Han (UIUC)
Kyu-Young Whang (KAIST)
ACM SIGMOD’07
2012.01.04
Paper outline
Introduction
Trajectory clustering
Trajectory partitioning
Line segment clustering
Experimental evaluation
Discussion and conclusions
1. Introduction
Background
The key observation
Examples in real applications
Possible arguments
Contributions
Background
Previous research has mainly dealt with clustering of point data
K-means, BIRCH, DBSCAN, OPTICS, STING
Recent research clusters trajectories as a whole
Improvements in satellites and tracking facilities
The key observation
Clustering trajectories as a whole cannot detect similar portions of the trajectories
Examples in real applications
Hurricanes: landfall forecasts
    Coastline: at the time of landing
    Sea: before landing
Animal movements: effects of roads and traffic
    Road segments
    Traffic rate
Possible arguments
Argument: why not prune the useless parts of trajectories and keep only the interesting ones?
    It is tricky to determine which parts of a trajectory are useless
    Pruning the useless parts prevents us from discovering unexpected clustering results
Contributions
Partition-and-group framework
    To cluster trajectories
    To discover common sub-trajectories
Formal trajectory partitioning algorithm
    Minimum description length principle
Density-based clustering algorithm for line segments
Demonstrated on various real data sets
2. Trajectory clustering
Problem statement
The TRACLUS algorithm
Distance function
Problem statement
Input: a set of trajectories $T = \{TR_1, \ldots, TR_{num_{tra}}\}$
Output: a set of clusters $O = \{C_1, \ldots, C_{num_{clus}}\}$
Trajectory: $TR_i = p_1 p_2 p_3 \ldots p_j \ldots p_{len_i}$
Sub-trajectory
Characteristic point
Cluster
    A set of trajectory partitions
Representative trajectory
    The major behavior of the trajectory partitions
Example of trajectory clustering
The TRACLUS algorithm
Distance function
The perpendicular distance ($d_\perp$)
The parallel distance ($d_\parallel$)
The angle distance ($d_\theta$)
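The slide only names the three components, so the sketch below fills in their definitions as I read them in the TRACLUS paper: the longer segment is the reference, the shorter one is projected onto it, and the total distance is a weighted sum. The function names and the tuple-based segment format are assumptions made for illustration.

```python
import math

def _project(p, s, e):
    """Project point p onto the infinite line through s and e."""
    dx, dy = e[0] - s[0], e[1] - s[1]
    denom = dx * dx + dy * dy
    if denom == 0.0:  # degenerate reference segment: fall back to s
        return s
    u = ((p[0] - s[0]) * dx + (p[1] - s[1]) * dy) / denom
    return (s[0] + u * dx, s[1] + u * dy)

def segment_distances(a, b):
    """Return (d_perp, d_par, d_theta) for segments given as ((x, y), (x, y))."""
    # The longer segment plays the role of Li, the shorter one Lj.
    li, lj = (a, b) if math.dist(*a) >= math.dist(*b) else (b, a)
    (si, ei), (sj, ej) = li, lj
    ps, pe = _project(sj, si, ei), _project(ej, si, ei)

    # Perpendicular distance: Lehmer mean of order 2 of the two offsets.
    l1, l2 = math.dist(sj, ps), math.dist(ej, pe)
    d_perp = 0.0 if l1 + l2 == 0.0 else (l1 * l1 + l2 * l2) / (l1 + l2)

    # Parallel distance: smallest gap between a projection point and an endpoint of Li.
    d_par = min(math.dist(ps, si), math.dist(ps, ei),
                math.dist(pe, si), math.dist(pe, ei))

    # Angle distance: |Lj| * sin(theta) below 90 degrees, |Lj| otherwise.
    len_i, len_j = math.dist(si, ei), math.dist(sj, ej)
    if len_i == 0.0 or len_j == 0.0:
        d_theta = 0.0
    else:
        dot = (ei[0] - si[0]) * (ej[0] - sj[0]) + (ei[1] - si[1]) * (ej[1] - sj[1])
        cos_t = max(-1.0, min(1.0, dot / (len_i * len_j)))
        d_theta = len_j * math.sqrt(1.0 - cos_t * cos_t) if cos_t > 0.0 else len_j
    return d_perp, d_par, d_theta

def segment_distance(a, b, w_perp=1.0, w_par=1.0, w_theta=1.0):
    """Weighted sum of the three components (all weights default to 1)."""
    d_perp, d_par, d_theta = segment_distances(a, b)
    return w_perp * d_perp + w_par * d_par + w_theta * d_theta
```

For two parallel segments, `((0, 0), (10, 0))` and `((2, 1), (6, 1))`, the angle term vanishes and only the perpendicular offset and the parallel gap contribute.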
3. Trajectory partitioning
Desirable properties
Formalization using the MDL principle
Approximate solution
Desirable properties
Preciseness
    The difference between a trajectory and a set of its trajectory partitions should be as small as possible
Conciseness
    The number of trajectory partitions should be as small as possible
Formalization using the MDL principle
To find the optimal tradeoff between preciseness and conciseness
Minimum description length (MDL)
    Cost components: H is the hypothesis; D is the data. L(H) is the length, in bits, of the description of the hypothesis; L(D|H) is the length, in bits, of the description of the data when encoded with the help of the hypothesis.
    Definition: the best hypothesis H to explain D is the one that minimizes the sum of L(H) and L(D|H).
A hypothesis corresponds to a specific set of trajectory partitions
Finding the optimal partitioning translates to finding the best hypothesis using the MDL principle
Formulation of the MDL cost
L(H) represents the sum of the lengths of all trajectory partitions
L(D|H) represents the sum of the differences between a trajectory and a set of its trajectory partitions
The goal is to minimize L(H) + L(D|H)
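Written out, the two cost terms look roughly as follows. This is my reading of the paper's formulation rather than something shown on the slide: $par_i$ (the number of characteristic points of $TR_i$), $c_j$ (the index of the $j$-th characteristic point) and $len(\cdot)$ (Euclidean length) are notation assumed here.

```latex
L(H) = \sum_{j=1}^{par_i - 1} \log_2\bigl(len(p_{c_j} p_{c_{j+1}})\bigr)

L(D \mid H) = \sum_{j=1}^{par_i - 1} \sum_{k=c_j}^{c_{j+1}-1}
  \Bigl\{ \log_2 d_\perp\bigl(p_{c_j} p_{c_{j+1}},\, p_k p_{k+1}\bigr)
        + \log_2 d_\theta\bigl(p_{c_j} p_{c_{j+1}},\, p_k p_{k+1}\bigr) \Bigr\}
```

L(H) grows with the number and length of partitions (conciseness), while L(D|H) grows with how far the original segments deviate from the partition that replaces them (preciseness). As far as I can tell, the parallel distance is left out of L(D|H) because a partition always encloses the segments it summarizes.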
Approximate solution
MDL = L(H) + L(D|H)
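A minimal sketch of the approximate partitioning idea, assuming the greedy scheme I recall from the paper: extend the current partition point by point, and cut at the previous point as soon as partitioning becomes more expensive than keeping the original segments. The helper names are hypothetical, and clamping the logarithm at zero for values below 1 is an assumption added to keep costs non-negative.

```python
import math

def _log2(x):
    # Assumption: lengths/distances below 1 cost zero bits (avoids log of 0).
    return math.log2(x) if x > 1.0 else 0.0

def _perp_and_angle(part, seg):
    """d_perp and d_theta of an original segment relative to a candidate partition."""
    (s, e), (a, b) = part, seg
    dx, dy = e[0] - s[0], e[1] - s[1]
    plen, slen = math.hypot(dx, dy), math.dist(a, b)
    if plen == 0.0 or slen == 0.0:
        return 0.0, 0.0
    # Offsets of the segment endpoints from the line through the partition.
    l1 = abs((a[0] - s[0]) * dy - (a[1] - s[1]) * dx) / plen
    l2 = abs((b[0] - s[0]) * dy - (b[1] - s[1]) * dx) / plen
    d_perp = 0.0 if l1 + l2 == 0.0 else (l1 * l1 + l2 * l2) / (l1 + l2)
    cos_t = ((b[0] - a[0]) * dx + (b[1] - a[1]) * dy) / (plen * slen)
    cos_t = max(-1.0, min(1.0, cos_t))
    d_theta = slen * math.sqrt(max(0.0, 1.0 - cos_t * cos_t)) if cos_t > 0.0 else slen
    return d_perp, d_theta

def mdl_par(points, i, j):
    """MDL cost when points[i] and points[j] are the only characteristic points."""
    part = (points[i], points[j])
    cost = _log2(math.dist(*part))                      # L(H)
    for k in range(i, j):                               # L(D|H)
        d_perp, d_theta = _perp_and_angle(part, (points[k], points[k + 1]))
        cost += _log2(d_perp) + _log2(d_theta)
    return cost

def mdl_nopar(points, i, j):
    """MDL cost when the original segments are kept, so L(D|H) is zero."""
    return sum(_log2(math.dist(points[k], points[k + 1])) for k in range(i, j))

def approximate_partition(points):
    """Return the indices of the characteristic points of one trajectory."""
    cps = [0]
    start, length = 0, 1
    while start + length < len(points):
        curr = start + length
        if mdl_par(points, start, curr) > mdl_nopar(points, start, curr):
            cps.append(curr - 1)        # cut at the previous point
            start, length = curr - 1, 1
        else:
            length += 1
    cps.append(len(points) - 1)
    return cps
```

On an L-shaped trajectory such as `[(0, 0), (10, 0), (20, 0), (20, 10), (20, 20)]` the corner point is picked up as a characteristic point, while a straight trajectory keeps only its two endpoints.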
4. Line segment clustering
Density of line segments
Clustering algorithm
Representative trajectory of a cluster
Heuristic for parameter value selection
Density of line segments
ε-neighborhood
Core line segment
Directly density-reachable
Density-reachable
Density-connected
Density-connected set
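The first two notions can be sketched directly: the ε-neighborhood of a segment is every segment within distance ε of it, and a core line segment has at least MinLns segments in its neighborhood. The distance used below (between segment midpoints) is only a stand-in assumed for brevity; any line segment distance can be passed in instead.

```python
import math

def midpoint_distance(a, b):
    """Stand-in distance (an assumption): Euclidean distance between midpoints."""
    ma = ((a[0][0] + a[1][0]) / 2.0, (a[0][1] + a[1][1]) / 2.0)
    mb = ((b[0][0] + b[1][0]) / 2.0, (b[0][1] + b[1][1]) / 2.0)
    return math.dist(ma, mb)

def eps_neighborhoods(segments, eps, dist=midpoint_distance):
    """For each segment i, list the indices j with dist(segment i, segment j) <= eps.

    A segment always belongs to its own neighborhood.
    """
    n = len(segments)
    return [[j for j in range(n) if dist(segments[i], segments[j]) <= eps]
            for i in range(n)]

def core_segments(neighborhoods, min_lns):
    """Indices of segments whose neighborhood holds at least min_lns segments."""
    return [i for i, nb in enumerate(neighborhoods) if len(nb) >= min_lns]
```

This brute-force version compares every pair of segments, which is fine for illustration but quadratic in the number of segments.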
Density-reachability and density-connectivity
L1, L2, L3, L4, and L5 are core line segments
L2 (or L3) is directly density-reachable from L1
L6 is density-reachable from L1, but not vice versa
L1, L4, and L5 are all density-connected
Influence of a very short line segment on clustering
A short line segment might induce over-clustering
Our experience indicates that increasing the length of trajectory partitions by 20~30% generally improves the clustering quality
Line segment clustering algorithm
A cluster is a density-connected set
Trajectory cardinality
Each line segment is classified as belonging to a cluster or as noise
Directly density-reachable
ε-neighborhood
Core line segment
Cardinality checking
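The pieces listed above fit together roughly as in the DBSCAN-style sketch below. It assumes the ε-neighborhoods have already been computed as index lists and that each segment carries the id of the trajectory it came from; both inputs and the function name are hypothetical. Reusing MinLns as the trajectory-cardinality threshold follows my reading of the paper.

```python
from collections import deque

NOISE = -1

def cluster_segments(neighborhoods, traj_ids, min_lns):
    """Return one label per segment: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(neighborhoods)      # None means unclassified
    cluster_id = 0
    for i, nb in enumerate(neighborhoods):
        if labels[i] is not None:
            continue
        if len(nb) < min_lns:                 # not a core line segment
            labels[i] = NOISE
            continue
        # Start a new cluster from the core segment i.
        labels[i] = cluster_id
        queue = deque()
        for j in nb:
            if labels[j] is None:
                queue.append(j)
            if labels[j] is None or labels[j] == NOISE:
                labels[j] = cluster_id
        # Expand through directly density-reachable segments.
        while queue:
            m = queue.popleft()
            if len(neighborhoods[m]) >= min_lns:
                for x in neighborhoods[m]:
                    if labels[x] is None:
                        queue.append(x)
                    if labels[x] is None or labels[x] == NOISE:
                        labels[x] = cluster_id
        cluster_id += 1
    # Cardinality check: drop clusters drawn from too few distinct trajectories.
    for c in range(cluster_id):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if len({traj_ids[i] for i in members}) < min_lns:
            for i in members:
                labels[i] = NOISE
    return labels
```

The cardinality check is what separates this from plain DBSCAN: a dense bundle of segments that all come from one or two trajectories does not count as a common sub-trajectory.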
Representative trajectory of a cluster
Rotation of the X and Y axes
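A small sketch of the rotation step, assuming the usual approach: take the average direction vector of the cluster's segments, then rotate the axes by its angle φ so that the new X axis runs parallel to that direction. Function names are hypothetical.

```python
import math

def average_direction_angle(segments):
    """Angle (radians) of the average direction vector of the segments."""
    n = len(segments)
    vx = sum(e[0] - s[0] for s, e in segments) / n
    vy = sum(e[1] - s[1] for s, e in segments) / n
    return math.atan2(vy, vx)

def rotate_axes(point, phi):
    """Coordinates of `point` in axes rotated counter-clockwise by phi.

    x' =  x cos(phi) + y sin(phi)
    y' = -x sin(phi) + y cos(phi)
    """
    x, y = point
    c, s = math.cos(phi), math.sin(phi)
    return (x * c + y * s, -x * s + y * c)
```

After the rotation, sorting segment endpoints by x' orders them along the cluster's main direction, which is what a sweep over the cluster needs.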
Representative trajectory generation algorithm
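The algorithm figure did not survive extraction, so here is a hedged sketch of a sweep-line version as I understand it: rotate the cluster so its average direction lies along the X axis, sweep over the endpoints in x' order, and wherever at least MinLns segments cross the sweep line, emit the average of their y' values. The smoothing parameter `gamma` and the choice to measure spacing against the previously emitted point are assumptions.

```python
import math

def _y_at(seg, x):
    """y' of a left-to-right segment at sweep position x."""
    (x0, y0), (x1, y1) = seg
    if x1 == x0:                       # vertical in the rotated frame
        return (y0 + y1) / 2.0
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def representative_trajectory(segments, min_lns, gamma):
    """Return the representative trajectory of one cluster as a list of points.

    Assumes `segments` is non-empty.
    """
    n = len(segments)
    vx = sum(e[0] - s[0] for s, e in segments) / n
    vy = sum(e[1] - s[1] for s, e in segments) / n
    phi = math.atan2(vy, vx)
    c, s_ = math.cos(phi), math.sin(phi)

    def rot(p):      # original axes -> rotated axes
        return (p[0] * c + p[1] * s_, -p[0] * s_ + p[1] * c)

    def unrot(p):    # rotated axes -> original axes
        return (p[0] * c - p[1] * s_, p[0] * s_ + p[1] * c)

    # Rotate every segment and orient it left to right.
    rotated = [tuple(sorted((rot(a), rot(b)))) for a, b in segments]
    sweep_xs = sorted({p[0] for seg in rotated for p in seg})

    rep, last_x = [], None
    for x in sweep_xs:
        hits = [seg for seg in rotated if seg[0][0] <= x <= seg[1][0]]
        if len(hits) < min_lns:
            continue                   # too few segments cross the sweep line
        if last_x is not None and x - last_x < gamma:
            continue                   # too close to the previous emitted point
        avg_y = sum(_y_at(seg, x) for seg in hits) / len(hits)
        rep.append(unrot((x, avg_y)))
        last_x = x
    return rep
```

For three parallel horizontal segments at y = 0, 2 and 4 spanning x from 0 to 10, the result is the middle line from (0, 2) to (10, 2).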
Heuristic for parameter value selection
The values of ε and MinLns
Simulated annealing
Entropy function
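A sketch of the entropy heuristic as I understand it: treat the normalized ε-neighborhood sizes as a probability distribution and prefer the ε that minimizes its entropy, since a uniform distribution (every neighborhood the same size) means ε is too small or too large to reveal structure. The slide mentions simulated annealing for the search; the plain scan over candidate values below is a simpler substitute, not the authors' procedure, and `sizes_for` is a hypothetical callback.

```python
import math

def neighborhood_entropy(sizes):
    """H(X) = -sum p_i log2 p_i, with p_i = |N_eps(x_i)| / sum_j |N_eps(x_j)|."""
    total = sum(sizes)
    entropy = 0.0
    for size in sizes:
        if size > 0:
            p = size / total
            entropy -= p * math.log2(p)
    return entropy

def pick_eps(candidates, sizes_for):
    """Return the candidate eps with the lowest entropy.

    `sizes_for(eps)` must return the list of neighborhood sizes for that eps.
    A plain scan stands in for simulated annealing here.
    """
    return min(candidates, key=lambda eps: neighborhood_entropy(sizes_for(eps)))
```

If I recall the paper correctly, MinLns is then chosen slightly above the average neighborhood size at the selected ε (the average plus 1 to 3).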
5. Experimental evaluation
Experimental setting
Results for hurricane track data
Results for animal movement data
Effects of parameter values
Experimental setting
Hurricane track data set
    Atlantic, 1950~2004
    570 trajectories and 17736 points
    Latitude and longitude
Animal movement data set
    Elk, 1993: 33 trajectories and 47204 points
    Deer, 1995: 32 trajectories and 20065 points
Clustering quality measurement
No well-defined measure for density-based clustering methods
Sum of Squared Error (SSE)
N: the set of all noise line segments
The noise penalty becomes larger if we select too small an ε or too large a MinLns
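The formula itself was an image on the slide. As I read the paper, the measure is the total SSE over clusters plus a noise penalty of the same form computed over N; the sketch below follows that reading, with the function name and the pluggable `dist` argument as assumptions.

```python
def q_measure(clusters, noise, dist):
    """Total SSE over clusters plus the noise penalty.

    For a group G the term is (1 / (2|G|)) * sum over x, y in G of dist(x, y)^2.
    `clusters` is a list of groups, `noise` is the group of noise items.
    """
    def half_mean_squared(group):
        if not group:
            return 0.0
        total = sum(dist(x, y) ** 2 for x in group for y in group)
        return total / (2 * len(group))

    total_sse = sum(half_mean_squared(c) for c in clusters)
    noise_penalty = half_mean_squared(noise)
    return total_sse + noise_penalty
```

With one-dimensional stand-in items and `dist = lambda a, b: abs(a - b)`, a cluster `[0, 2]` contributes 2 and noise `[10, 14]` contributes 8. Lower values indicate tighter clusters and less scattered noise.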
Results for hurricane track data
Results for animal movement data
Effects of parameter values
Use a smaller ε or a larger MinLns
    Discovers a larger number of smaller clusters
Use a larger ε or a smaller MinLns
    Discovers a smaller number of larger clusters
7. Discussion and conclusions
Discussion
Conclusions
Discussion
Extensibility
    Undirected or weighted trajectories
Parameter insensitivity
    Point data, trajectory data
Efficiency
    Index
Movement patterns
    Circular motion
Temporal information
Conclusions
Partition-and-group framework
Trajectory clustering algorithm TRACLUS
Experiments on two real data sets
A visual inspection tool
Common sub-trajectories
My comments
Detailed sentences with explicit illustrations!
What is the principle of the parallel distance function? (p.14)
What is the basis for the 20~30% length increase? (p.23)