TIPMAX
Hsiang-Hsuan Hung
Motivation: Helping taxi drivers maximize their income
Web App: TipMax http://www.tipmaxnyc.xyz
Data Source
http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
Pipeline
Flask
Batch process
Problem: the raw data is not ordered by time, and it is 220 GB with 13 billion events
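One possible way to put unordered trip records into time order without holding everything in memory is to sort manageable chunks and then merge them lazily. The sketch below is only an illustration of that idea, not the method used in TipMax; the `(timestamp, payload)` tuples and the helper names `sort_chunk` and `merge_chunks` are hypothetical, and the sample events are made up.

```python
import heapq
from datetime import datetime


def sort_chunk(rows):
    """Sort one in-memory chunk of (timestamp, payload) events by timestamp."""
    return sorted(rows, key=lambda r: r[0])


def merge_chunks(chunks):
    """Lazily merge already-sorted chunks into one time-ordered stream."""
    return heapq.merge(*chunks, key=lambda r: r[0])


# Made-up sample events; real records would be read from the TLC files.
chunk_a = [
    (datetime(2015, 1, 1, 8, 5), "trip-2"),
    (datetime(2015, 1, 1, 8, 0), "trip-1"),
]
chunk_b = [(datetime(2015, 1, 1, 8, 3), "trip-3")]

ordered = list(merge_chunks([sort_chunk(chunk_a), sort_chunk(chunk_b)]))
for timestamp, payload in ordered:
    print(timestamp.isoformat(), payload)
```

Because `heapq.merge` is lazy, only one pending event per chunk has to be held in memory while the merged stream is consumed.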
Real-Time Pipeline
Flask
Batch process
…
Engineer real-time streaming
Challenges
• Connector between Cassandra and Spark
• Design primary keys for data query
• Cleaning data
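The cleaning and key-design challenges above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the TipMax implementation: the column names `pickup_datetime`, `fare_amount`, and `tip_amount`, the validity rules, and the choice to partition by (date, hour) and cluster by pickup time are all hypothetical.

```python
from datetime import datetime


def clean_trip(row):
    """Return a normalized trip dict, or None if the record looks invalid."""
    try:
        pickup = datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
        fare = float(row["fare_amount"])
        tip = float(row["tip_amount"])
    except (KeyError, TypeError, ValueError):
        return None  # missing column or unparseable value
    if fare <= 0 or tip < 0:
        return None  # assumed rule: drop non-positive fares and negative tips
    return {"pickup": pickup, "fare": fare, "tip": tip}


def primary_key(trip):
    """Hypothetical key: partition by (date, hour), cluster by pickup time."""
    partition = (trip["pickup"].strftime("%Y-%m-%d"), trip["pickup"].hour)
    clustering = trip["pickup"]
    return partition, clustering
```

Bucketing the partition key by date and hour keeps each partition bounded in size and lets a query for one time window read a single partition, with rows inside it ordered by the clustering column.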
Challenges
• Time series forecast?
About Me
• UCSD, Physics PhD 2011
• U Illinois, ECE 2011-2012
• U Texas Austin, Physics 2012-2015
• Computational material science: HPC, e.g. quantum Monte Carlo…
• Programming, travel, fitness…
More complicated queries
• Will passengers give higher tips during rush hours?
• Will tips vary by payment type, year, weather, or number of passengers?
• ….....
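The rush-hour question above could be explored with a simple aggregation. The sketch below is only an illustration: the definition of rush hours, the `(pickup, fare, tip)` tuple layout, and the sample trips are all assumptions made up for the example, and weekends are not treated separately.

```python
from collections import defaultdict
from datetime import datetime

# Assumed definition of rush hours (hour of day, 24-hour clock).
RUSH_HOURS = {7, 8, 9, 16, 17, 18}


def mean_tip_rate(trips):
    """Average tip/fare ratio for rush-hour and off-peak trips."""
    totals = defaultdict(lambda: [0.0, 0])  # label -> [sum of ratios, count]
    for pickup, fare, tip in trips:
        if fare <= 0:
            continue  # skip records where the ratio is undefined
        label = "rush" if pickup.hour in RUSH_HOURS else "off_peak"
        totals[label][0] += tip / fare
        totals[label][1] += 1
    return {label: total / count for label, (total, count) in totals.items()}


# Made-up sample trips: (pickup time, fare, tip).
sample = [
    (datetime(2015, 1, 1, 8, 0), 10.0, 2.0),
    (datetime(2015, 1, 1, 8, 30), 20.0, 5.0),
    (datetime(2015, 1, 1, 13, 0), 10.0, 1.0),
]
rates = mean_tip_rate(sample)
print(rates)
```

The same grouping pattern extends to the other questions by swapping the label for payment type, year, or passenger count.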