Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive transportation data

GPSInsights: Towards an efficient framework for storing and mining massive real-time vehicle location data

Linh-Truong Hoang, Duy-Khanh Bui, Viet-Trung Tran

Hanoi University of Science and Technology

1  

Agenda

•  Motivation •  System architecture •  Scalable map-matching •  Experimentation •  Conclusion

2  

Global Navigation Satellite System (GNSS)

•  Autonomous geo-spatial positioning – position – velocity –  time

•  "Great" points about GNSS – Free – Real-time – No required local infrastructures

3  

GNSS as part of Intelligent transport system (ITS)

•  "Precious" data for real-time traffic management –  traffic dashboard –  speed control –  traffic jam monitoring

4  

Need for collecting and mining massive GNSS data in REAL-TIME

GNSS data characteristics

•  Real-time –  reported every second

•  Massive in volume –  from millions of cars

•  "Bad" data

•  Needs to be processed within digital map topology

5  

GNSS data exhibits Big Data's 5 Vs

6  

SYSTEM ARCHITECTURE

Store massive GNSS data Real-time mining

7  

8  

Elasticity, high throughput, fault tolerance

Scalable, first-class spatio-temporal API

High throughput, fault tolerance, online processing

Scalable, fault tolerance

Leverage open-source components

9  

Apache Spark processing

•  Resilient Distributed Dataset (RDD) –  In-memory, backed by persistent storage (HDFS) –  Fault tolerance through lineage –  Supports interactive and iterative analysis
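The lineage idea in the bullet above can be illustrated with a toy model. This is only a sketch in plain Python, not Spark's API: the `ToyRDD` class and its methods are hypothetical names invented for illustration, and the `source` list stands in for blocks read from persistent storage. Each derived dataset remembers its parent and the transformation that produced it, so a lost in-memory partition can be rebuilt instead of being replicated.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyRDD:
    """Toy stand-in for an RDD: cached partitions plus the recipe to rebuild them."""
    source: Optional[List[list]] = None          # root data only (stands in for HDFS blocks)
    parent: Optional["ToyRDD"] = None            # where this dataset was derived from
    fn: Optional[Callable[[list], list]] = None  # per-partition transformation
    cache: dict = field(default_factory=dict)    # in-memory partitions, keyed by index

    def map(self, f):
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def num_partitions(self):
        return len(self.source) if self.parent is None else self.parent.num_partitions()

    def compute(self, i):
        """Return partition i, rebuilding it from the lineage if it is not cached."""
        if i not in self.cache:
            if self.parent is None:
                self.cache[i] = list(self.source[i])
            else:
                self.cache[i] = self.fn(self.parent.compute(i))
        return self.cache[i]

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]


# Two partitions of hypothetical speed readings in km/h.
speeds = ToyRDD(source=[[10, 55, 0], [72, 30]])
moving = speeds.filter(lambda v: v > 0).map(lambda v: v / 3.6)  # keep moving vehicles, convert to m/s

first = moving.collect()
moving.cache.pop(1)                 # simulate losing one in-memory partition
assert moving.collect() == first    # the partition is recomputed from its lineage
```

Only the lost partition is recomputed here; the other partition is served from memory, which is the property that makes iterative and interactive workloads cheap to repeat.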

10  

Spark streaming

11  

Apache Storm

12  

MongoDB with geo-indexing

13  

GeoMesa: Accumulo + geo-indexing

14  

SCALABLE MAP-MATCHING ALGORITHM

15  

Map-matching

•  Online vs. Offline

•  OSM map

16  

Algorithm

•  OSM map format

•  Filling intermediate points – Millions more points – Massive data – but simple calculations •  real-time, scalable
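The "filling intermediate points" step can be sketched as follows. The slides do not give the exact procedure, so this is an assumption: each road (an OSM way, taken here as a list of latitude/longitude pairs) is densified by linear interpolation so that consecutive points are at most `max_gap_m` metres apart. The function names and the 10 m default are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in metres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def fill_intermediate_points(way: List[Point], max_gap_m: float = 10.0) -> List[Point]:
    """Insert points along each segment so that no gap exceeds max_gap_m.

    Interpolation is linear in latitude/longitude, which is a reasonable
    approximation only for short road segments.
    """
    if not way:
        return []
    dense = [way[0]]
    for start, end in zip(way, way[1:]):
        pieces = max(1, math.ceil(haversine_m(start, end) / max_gap_m))
        for k in range(1, pieces):
            t = k / pieces
            dense.append((start[0] + t * (end[0] - start[0]),
                          start[1] + t * (end[1] - start[1])))
        dense.append(end)  # keep the original node exactly
    return dense


# A hypothetical road segment of roughly 111 m.
way = [(21.0, 105.8), (21.001, 105.8)]
dense = fill_intermediate_points(way, max_gap_m=10.0)
print(len(way), "->", len(dense), "points")
```

Each segment is processed independently of the others, which is why this step produces a very large number of points yet parallelises easily.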

17  

K-d tree for closest neighbours

•  Run by Apache Spark/Storm
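A minimal sketch of the closest-neighbour lookup, assuming road points have already been projected to planar (x, y) coordinates in metres. It is a plain in-memory 2-D k-d tree; the names `build` and `nearest` are hypothetical, and how the tree is shared across Spark or Storm workers is not stated in the slides and is left out here.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # planar (x, y), e.g. projected metres


class Node:
    __slots__ = ("point", "axis", "left", "right")

    def __init__(self, point, axis, left, right):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting on the median, alternating x and y."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(node: Optional[Node], query: Point, best: Optional[Point] = None) -> Optional[Point]:
    """Return the stored point closest to query (None for an empty tree)."""
    if node is None:
        return best
    if best is None or sq_dist(query, node.point) < sq_dist(query, best):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match so far.
    if diff * diff < sq_dist(query, best):
        best = nearest(far, query, best)
    return best


# Hypothetical densified road points and one GPS fix to match.
road_points = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (20.0, 10.0)]
tree = build(road_points)
print(nearest(tree, (18.5, 2.0)))  # prints (20.0, 0.0)
```

The tree is built once from the map and is read-only afterwards, so each incoming GPS record can be matched independently of the others.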

18  

EXPERIMENTATION

19  

Experiment setup

•  12 million GPS records collected in March 2014 by vehicles equipped with GPS receivers

•  4-node cluster – 8-core Intel Xeon 2.6 GHz CPU, 32 GB memory

20  

Map-matching completion time

21  

Latency

22  

"Scalability"

23  

Demonstration

24  

Real-time traffic monitoring

25  

Real-time shortest path

26  

Conclusion

•  GPSInsights: Scalable framework for storing and mining massive location data – Built on open-source scalable components – Scalable storage + real-time mining – Pluggable components – Demonstration with scalable map-matching algorithm

•  Future work – Advanced map-matching algorithms – Traffic jam prediction

27  

Current state of the art

•  PostGIS – Spatial object management over Postgres – Small size – No mining supported

28