Data Modeling In Depth: A Time Series Example
Thom Valley Solutions Engineer, DataStax
1 Introduction – Sample Application
2 Data Modeling in Cassandra
3 Time Series Tips and Tricks
4 Example Models
Introduction
Former COO @ Fybr • Operations • Systems • Account Management
Parking Sensors?
• Wireless / Battery Powered • On Street Parking • Cellular Network Gateways • Real Time Acquisition / Delivery
Use Cases
• Real Time Parking Availability • Utilization Analysis and Planning • Predictive Analytics • Directed Enforcement • Operations, Performance and PM
System Diagram
SFPark Project
• ~8,200 parking spaces + control areas • 11 city districts • <10 second avg. reaction time
• Motorist way finding • Parking Utilization Analysis / Demand Pricing
1 Sensor Message Archive
2 Parking Sessions
3 Current Space Status
Sample Models
• Master Data (we won’t cover defining these) – Sensor – Space – Block – District – Customer – Types (parking, non-parking, metered, etc.)
Additional Considerations
• Distributed / replicated (always on) • High performance writes • Easy to expand capacity • Events stored in sequence • Reads done in sequence (fast)
Why Cassandra for Time Series?
• Define our objects / conceptual model • Define our queries • Create our logical model • Validate
– Query hot spotting – Excessive Duplication – Overwriting – Partition size
Approach
• Level of Granularity • Clustering Columns • Enable range queries • Managing Aggregations
Guidelines (time series)
1 Sensor Message Archive
2 Parking Sessions
3 Current Space Status
Sample Models
• Every sensor message received • Newer messages more important
• Show list of recent events (20) • Validate message processing
• Drive sensor health reporting
Sensor Message Archive
Sensor Message Archive
SENSOR MESSAGE
event_time
sensor_id
space*
event
battery
msg_count
arr_time
gateway
QUERIES • Messages for time range by sensor • Current battery status by sensor • Current message count by sensor
Sensor Message Archive
VALIDATE • Duplication • Partition Size:
– 120 * 365 * 7 = 306,600 • Hot Spotting • Overwriting
sensor_id K event_time C(D) arr_time space event battery msg_count gateway
sensor_message
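The partition size check above can be reproduced as plain arithmetic. This is a minimal sketch, assuming the slide's figures mean roughly 120 messages per sensor per day, 365 days per year, and 7 years of data in one partition; the constant names are illustrative and not from the original deck. It also shows what a one-month bucket (the `year_month` key used in the next model) would hold under the same assumptions.

```python
# Assumption: 120 = messages per sensor per day, 7 = years kept per sensor.
MESSAGES_PER_DAY = 120
DAYS_PER_YEAR = 365
YEARS_RETAINED = 7


def rows_per_partition(per_day: int, days: int) -> int:
    """Rows that accumulate in one partition over `days` days."""
    return per_day * days


# One partition per sensor, never bucketed.
unbucketed = rows_per_partition(MESSAGES_PER_DAY, DAYS_PER_YEAR * YEARS_RETAINED)

# One partition per sensor per month (longest month has 31 days).
monthly = rows_per_partition(MESSAGES_PER_DAY, 31)

print(unbucketed)  # 306600
print(monthly)     # 3720
```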
Sensor Message Archive
sensor_id K year_month K event_time C(D) arr_time space event battery msg_count gateway
sensor_message
create table sensor_message(
  sensor_id text,
  year_month text,
  event_time timestamp,
  arr_time timestamp,
  space text,
  event text,
  battery text,
  msg_count int,
  gateway text,
  PRIMARY KEY ((sensor_id, year_month), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
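With `(sensor_id, year_month)` as the partition key, the application has to derive the bucket on every write and name every bucket a time-range read touches. The sketch below shows one way to do that. The `YYYY-MM` text format and both helper names are assumptions; the deck does not say how `year_month` is encoded.

```python
from datetime import datetime
from typing import List


def year_month(ts: datetime) -> str:
    """Bucket value for the year_month column, e.g. '2015-03' (assumed format)."""
    return ts.strftime("%Y-%m")


def buckets_between(start: datetime, end: datetime) -> List[str]:
    """Every year_month bucket touched by the range [start, end], newest first."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months[::-1]


# A range query for one sensor would be issued once per bucket, newest first.
print(buckets_between(datetime(2014, 11, 20), datetime(2015, 2, 3)))
```

Reading newest-first matches the DESC clustering order, so a "recent events" query can often stop after the first bucket.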
Sensor Message Archive
SENSOR MESSAGE
event_time
sensor_id
space*
event
battery
msg_count
arr_time
gateway
QUERIES • Messages for time range by sensor • Current battery status by sensor • Current message count by sensor
Sensor Message Archive
sensor_id K event_time arr_time space event battery msg_count gateway
last_sensor_message
create table last_sensor_message(
  sensor_id text,
  event_time timestamp,
  arr_time timestamp,
  space text,
  event text,
  battery text,
  msg_count int,
  gateway text,
  PRIMARY KEY (sensor_id)
);
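Because `last_sensor_message` has one row per `sensor_id`, each write replaces the previous row, which is how the "current battery" and "current message count" queries become single-row reads. The sketch below models that in memory. It also assumes messages can arrive out of order (the table stores both `event_time` and `arr_time`), so it keeps a message only if its `event_time` is newer. The `record` helper and the sample values are hypothetical, not from the deck.

```python
from datetime import datetime

# sensor_id -> latest message; stands in for one row per partition.
last_sensor_message = {}


def record(message: dict) -> bool:
    """Keep `message` only if it is newer (by event_time) than the stored one."""
    current = last_sensor_message.get(message["sensor_id"])
    if current is not None and current["event_time"] >= message["event_time"]:
        return False  # late arrival of an older event; ignore it
    last_sensor_message[message["sensor_id"]] = message
    return True


record({"sensor_id": "s-1", "event_time": datetime(2015, 3, 9, 10, 5), "battery": "3.1V"})
record({"sensor_id": "s-1", "event_time": datetime(2015, 3, 9, 10, 0), "battery": "3.2V"})
print(last_sensor_message["s-1"]["battery"])  # 3.1V
```

In Cassandra itself a plain write is resolved by write timestamp, so a comparable guard would have to come from the application or from writing with an explicit timestamp derived from `event_time`; either choice is an assumption beyond what the slides show.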
Options
Set a default TTL: AND default_time_to_live = 157680000 (5 years, in seconds)
Use DateTieredCompactionStrategy
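The TTL value is expressed in seconds. A quick check of the five-year figure, assuming 365-day years (the variable name is illustrative):

```python
from datetime import timedelta

# Five 365-day years, converted to seconds for default_time_to_live.
ttl_seconds = int(timedelta(days=5 * 365).total_seconds())

print(ttl_seconds)  # 157680000
```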
DateTieredCompactionStrategy (DTCS)
• Frequently higher performing for time series data • Manages compaction by age of data in each SSTable • Can significantly reduce compaction overhead
Best Practices
• Data is written in time order • Data is immutable • Old data is infrequently accessed
1 Sensor Message Archive
2 Parking Sessions
3 Current Space Status
Sample Models
Parking Session
session
session_start
space
inferred
session_end
QUERIES • Last / current session by space • Total occupancy for a period
• Start / End of each space being occupied • Newer sessions more important
• Show last session for a space • Calculate occupancy / vacancy ratio • Calculate space turnover (number of sessions in time period)
Parking Session
Parking Session
VALIDATE • Duplication • Partition Size:
– 25 * 365 * 3 = 27,375 • Hot Spotting • Overwriting
space K session_start C(D) session_end inferred
session
Parking Session
space K session_start C(D) session_end inferred
session
create table session(
  space text,
  session_start timestamp,
  session_end timestamp,
  inferred text,
  PRIMARY KEY (space, session_start)
) WITH CLUSTERING ORDER BY (session_start DESC);
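The occupancy ratio and turnover calculations listed for this model can be done client-side over the rows read from one `space` partition. This is a minimal sketch, assuming sessions for a space never overlap and that each row has been reduced to a `(session_start, session_end)` pair; the function names and sample times are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

Session = Tuple[datetime, datetime]  # (session_start, session_end)


def occupancy_ratio(sessions: List[Session], period_start: datetime, period_end: datetime) -> float:
    """Fraction of the period during which the space was occupied."""
    occupied = 0.0
    for start, end in sessions:
        # Clip each session to the reporting period before summing.
        overlap_start = max(start, period_start)
        overlap_end = min(end, period_end)
        if overlap_end > overlap_start:
            occupied += (overlap_end - overlap_start).total_seconds()
    return occupied / (period_end - period_start).total_seconds()


def turnover(sessions: List[Session], period_start: datetime, period_end: datetime) -> int:
    """Number of sessions that started inside the period."""
    return sum(1 for start, _ in sessions if period_start <= start < period_end)


day = datetime(2015, 3, 9)
sessions = [
    (day.replace(hour=17), day.replace(hour=19)),             # newest first,
    (day.replace(hour=10), day.replace(hour=12, minute=30)),  # as the DESC
    (day.replace(hour=7), day.replace(hour=9)),               # clustering returns
]
print(occupancy_ratio(sessions, day.replace(hour=8), day.replace(hour=18)))
print(turnover(sessions, day.replace(hour=8), day.replace(hour=18)))
```

The vacancy ratio is one minus the occupancy ratio for the same period.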
1 Sensor Message Archive
2 Parking Sessions
3 Current Space Status
Sample Models
Current Space Status
current_status
district
space
block
QUERIES • Current Status
– by space – by block – by district – by customer
status
customer
last_change
• Last status change for every space • Only one status per space • High read / low write / almost no deletes
• Feed public way finding applications
Current Space Status
Current Space Status
space K customer district block status last_change
current_status
QUERIES • Current Status
– by space – by block – by district – by customer
Current Space Status
customer K district C block C space status last_change
current_status
QUERIES • Current Status
– by space – by block – by district – by customer
Current Space Status
VALIDATE • Duplication • Partition Size:
– 10,000 • Hot Spotting • Overwriting
customer K district C block C space C status last_change
current_status
Current Space Status
VALIDATE • Duplication • Partition Size:
– 10,000 • Hot Spotting • Overwriting
customer K district K block C space C status last_change
current_status
Current Space Status
customer K district K block C space C status last_change
current_status
create table current_status(
  customer text,
  district text,
  block text,
  space text,
  status text,
  last_change timestamp,
  PRIMARY KEY ((customer, district), block, space)
);
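The key layout `((customer, district), block, space)` can be pictured as a two-level map: the partition key selects a district, and the clustering columns order rows by block and then space inside it. The in-memory sketch below illustrates the access pattern only and is not how Cassandra stores data; the helper names and sample values are hypothetical. It shows why a second write for the same space overwrites the first, which is the "only one status per space" behaviour, and how the by-district, by-block, and by-space reads narrow the same partition.

```python
# (customer, district) -> {(block, space): row}
current_status = {}


def upsert(customer, district, block, space, status, last_change):
    """Write or overwrite the single status row for one space."""
    partition = current_status.setdefault((customer, district), {})
    partition[(block, space)] = {"status": status, "last_change": last_change}


def by_district(customer, district):
    """All rows in one partition, in clustering order (block, space)."""
    return sorted(current_status.get((customer, district), {}).items())


def by_block(customer, district, block):
    """Rows sharing the first clustering column."""
    return [(key, row) for key, row in by_district(customer, district) if key[0] == block]


def by_space(customer, district, block, space):
    """The single row identified by the full primary key, or None."""
    return current_status.get((customer, district), {}).get((block, space))


upsert("acme", "downtown", "b-100", "sp-1", "VACANT", "2015-03-09T10:00")
upsert("acme", "downtown", "b-100", "sp-2", "OCCUPIED", "2015-03-09T10:01")
upsert("acme", "downtown", "b-100", "sp-1", "OCCUPIED", "2015-03-09T10:02")  # overwrite
print(by_space("acme", "downtown", "b-100", "sp-1")["status"])  # OCCUPIED
print(len(by_block("acme", "downtown", "b-100")))  # 2
```

In this layout a by-customer read has to visit each of that customer's district partitions, since the partition key includes `district`.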
Options
customer K district K block C space C status last_change
current_status
Use In Memory Table? • Relatively small footprint of data set • Very few deletes • Significant mutation in place • Very high read / low latency requirements
Current Space Status
customer K district K block C space C status last_change
current_status
create table current_status(
  customer text,
  district text,
  block text,
  space text,
  status text,
  last_change timestamp,
  PRIMARY KEY ((customer, district), block, space)
) WITH compaction = { 'class': 'MemoryOnlyStrategy', 'size_limit_in_mb': 25 }
AND caching = 'NONE';
Results
create table current_status(
  customer text,
  district text,
  block text,
  space text,
  status text,
  last_change timestamp,
  PRIMARY KEY ((customer, district), block, space)
) WITH compaction = { 'class': 'MemoryOnlyStrategy', 'size_limit_in_mb': 25 }
AND caching = 'NONE';
create table sensor_message(
  sensor_id text,
  year_month text,
  event_time timestamp,
  arr_time timestamp,
  space text,
  event text,
  battery text,
  msg_count int,
  gateway text,
  PRIMARY KEY ((sensor_id, year_month), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
create table last_sensor_message(
  sensor_id text,
  event_time timestamp,
  arr_time timestamp,
  space text,
  event text,
  battery text,
  msg_count int,
  gateway text,
  PRIMARY KEY (sensor_id)
);
create table session(
  space text,
  session_start timestamp,
  session_end timestamp,
  inferred text,
  PRIMARY KEY (space, session_start)
) WITH CLUSTERING ORDER BY (session_start DESC);
Thank You