Cassandra Day Atlanta 2015: Data Modeling In-Depth: A Time Series Example

Data Modeling In Depth: A Time Series Example

Thom Valley Solutions Engineer, DataStax

1 Introduction – Sample Application

2 Data Modeling in Cassandra

3 Time Series Tips and Tricks

4 Example Models

Introduction

Former COO @ Fybr
• Operations
• Systems
• Account Management

Parking Sensors?

• Wireless / Battery Powered
• On Street Parking
• Cellular Network Gateways
• Real Time Acquisition / Delivery

Use Cases

• Real Time Parking Availability
• Utilization Analysis and Planning
• Predictive Analytics
• Directed Enforcement
• Operations, Performance and PM

System Diagram

SFPark Project

• ~8,200 parking spaces + control areas
• 11 city districts
• <10 second avg. reaction time
• Motorist way finding
• Parking Utilization Analysis / Demand Pricing

Sample Models

1 Sensor Message Archive

2 Parking Sessions

3 Current Space Status

Additional Considerations

• Master Data (we won’t cover defining these)
  – Sensor
  – Space
  – Block
  – District
  – Customer
  – Types (parking, non-parking, metered, etc.)

Why Cassandra for Time Series?

• Distributed / replicated (always on)
• High performance writes
• Easy to expand capacity
• Events stored in sequence
• Reads done in sequence (fast)

Approach

• Define our objects / conceptual model
• Define our queries
• Create our logical model
• Validate
  – Query hot spotting
  – Excessive Duplication
  – Overwriting
  – Partition size

Guidelines (time series)

• Level of Granularity
• Clustering Columns
• Enable range queries
• Managing Aggregations

Sample Models

1 Sensor Message Archive

2 Parking Sessions

3 Current Space Status

Sensor Message Archive

• Every sensor message received
• Newer messages more important
• Show list of recent events (20)
• Validate message processing
• Drive sensor health reporting

Sensor Message Archive

SENSOR MESSAGE

event_time

sensor_id

space*

event

battery

msg_count

arr_time

gateway

QUERIES
• Messages for time range by sensor
• Current battery status by sensor
• Current message count by sensor

Sensor Message Archive

sensor_message
  sensor_id    K
  event_time   C(D)
  arr_time
  space
  event
  battery
  msg_count
  gateway

VALIDATE
• Duplication
• Partition Size:
  – 120 * 365 * 7 = 306,600
• Hot Spotting
• Overwriting
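The partition-size check above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the slide's factors mean 120 messages per sensor per day, 365 days per year, and 7 years of retention; the slide does not label them, and the function name is hypothetical. It also shows how much smaller a partition becomes if rows are bucketed by calendar month, which is what the year_month key on the next slide does.

```python
def rows_per_partition(events_per_day: int, days_in_partition: int) -> int:
    """Estimate how many rows land in a single partition."""
    return events_per_day * days_in_partition


# Assumed reading of the slide's "120 * 365 * 7".
MESSAGES_PER_DAY = 120
DAYS_PER_YEAR = 365
YEARS_RETAINED = 7

# One partition per sensor for its whole life (partition key: sensor_id).
whole_life = rows_per_partition(MESSAGES_PER_DAY, DAYS_PER_YEAR * YEARS_RETAINED)

# One partition per sensor per month (partition key: sensor_id + year_month),
# sized for the longest month.
one_month = rows_per_partition(MESSAGES_PER_DAY, 31)

print(whole_life)  # 306600
print(one_month)   # 3720
```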

Sensor Message Archive

sensor_message
  sensor_id    K
  year_month   K
  event_time   C(D)
  arr_time
  space
  event
  battery
  msg_count
  gateway

create table sensor_message(
    sensor_id text,
    year_month text,
    event_time timestamp,
    arr_time timestamp,
    space text,
    event text,
    battery text,
    msg_count int,
    gateway text,
    PRIMARY KEY ((sensor_id, year_month), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
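Because year_month is part of the partition key, a "messages for time range by sensor" read has to name every monthly bucket the range touches and issue one query per bucket. The sketch below shows one way a client might derive those buckets. The 'YYYY-MM' text format and both function names are assumptions for illustration; the slides only say that year_month is a text column.

```python
from datetime import datetime
from typing import List

# One query per bucket; the range restriction applies to the clustering column.
RANGE_QUERY = (
    "SELECT * FROM sensor_message "
    "WHERE sensor_id = ? AND year_month = ? "
    "AND event_time >= ? AND event_time <= ?"
)


def year_month(ts: datetime) -> str:
    """Bucket value for a timestamp, assuming a 'YYYY-MM' text format."""
    return ts.strftime("%Y-%m")


def buckets_for_range(start: datetime, end: datetime) -> List[str]:
    """All monthly buckets touched by the closed range [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return buckets


print(buckets_for_range(datetime(2014, 11, 20), datetime(2015, 2, 3)))
# ['2014-11', '2014-12', '2015-01', '2015-02']
```

A range that stays inside one month maps to a single partition, so the common "recent events" read stays a single-partition query.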

Sensor Message Archive

SENSOR MESSAGE

event_time

sensor_id

space*

event

battery

msg_count

arr_time

gateway

QUERIES
• Messages for time range by sensor
• Current battery status by sensor
• Current message count by sensor

Sensor Message Archive

last_sensor_message
  sensor_id    K
  event_time
  arr_time
  space
  event
  battery
  msg_count
  gateway

create table last_sensor_message(
    sensor_id text,
    event_time timestamp,
    arr_time timestamp,
    space text,
    event text,
    battery text,
    msg_count int,
    gateway text,
    PRIMARY KEY (sensor_id)
);
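With two tables answering different queries, each incoming message is written twice: once to the archive and once to last_sensor_message, where the single-column key means the newest write replaces the previous row for that sensor. The following is a rough sketch of how a client might build both statements. The helper names, the dict-shaped message, and the 'YYYY-MM' bucket format are assumptions, and no driver is involved; the function only returns statement text and parameters.

```python
from datetime import datetime
from typing import Any, Dict, List, Sequence, Tuple

# Non-key columns shared by both tables.
COLUMNS = ("event_time", "arr_time", "space", "event", "battery", "msg_count", "gateway")


def _insert(table: str, columns: Sequence[str]) -> str:
    """Build a parameterised INSERT for the given table and columns."""
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"


def statements_for(msg: Dict[str, Any]) -> List[Tuple[str, tuple]]:
    """Return (cql, params) pairs for the archive table and the latest-state table."""
    values = tuple(msg[c] for c in COLUMNS)
    bucket = msg["event_time"].strftime("%Y-%m")  # assumed bucket format
    archive = (
        _insert("sensor_message", ("sensor_id", "year_month") + COLUMNS),
        (msg["sensor_id"], bucket) + values,
    )
    latest = (
        _insert("last_sensor_message", ("sensor_id",) + COLUMNS),
        (msg["sensor_id"],) + values,
    )
    return [archive, latest]


example = {
    "sensor_id": "S-1",
    "event_time": datetime(2015, 3, 9, 12, 0),
    "arr_time": datetime(2015, 3, 9, 12, 0, 4),
    "space": "SP-100",
    "event": "ARRIVAL",
    "battery": "OK",
    "msg_count": 42,
    "gateway": "GW-7",
}
stmts = statements_for(example)
```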

Options

Set a default TTL (5 years)
    AND default_time_to_live = 157680000

Use DateTieredCompactionStrategy
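default_time_to_live is expressed in seconds, so the 5-year figure can be checked with simple arithmetic. A small sketch, assuming 365-day years with no leap-day adjustment (which is what the slide's number implies); the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_seconds(years: int, days_per_year: int = 365) -> int:
    """Convert a retention period in years to a TTL in seconds."""
    return years * days_per_year * SECONDS_PER_DAY


print(ttl_seconds(5))  # 157680000
```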

DateTieredCompactionStrategy (DTCS)

• Frequently higher performing for time series data
• Manages compaction by age of data in each SSTable
• Can significantly reduce compaction overhead

Best Practices
• Data is written in time order
• Data is immutable
• Old data is infrequently accessed

Sample Models

1 Sensor Message Archive

2 Parking Sessions

3 Current Space Status

Parking Session

session

session_start

space

inferred

session_end

QUERIES
• Last / current session by space
• Total occupancy for a period

Parking Session

• Start / End of each space being occupied
• Newer messages more important
• Show last session for a space
• Calculate occupancy / vacancy ratio
• Calculate space turnover (number of sessions in time period)

Parking Session

session
  space           K
  session_start   C(D)
  session_end
  inferred

VALIDATE
• Duplication
• Partition Size:
  – 25 * 365 * 3 = 27,375
• Hot Spotting
• Overwriting

Parking Session

session
  space           K
  session_start   C(D)
  session_end
  inferred

create table session(
    space text,
    session_start timestamp,
    session_end timestamp,
    inferred text,
    PRIMARY KEY (space, session_start)
) WITH CLUSTERING ORDER BY (session_start DESC);
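The occupancy ratio and turnover figures listed for this model can be computed client-side from the rows of one space partition. The sketch below assumes sessions for a space do not overlap, that an open session has no session_end yet (modelled here as None), and that turnover counts sessions starting inside the window. None of these details are spelled out in the slides, and the function names are hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# (session_start, session_end); session_end is None while the space is still occupied.
Session = Tuple[datetime, Optional[datetime]]


def occupied_seconds(sessions: List[Session], window_start: datetime,
                     window_end: datetime) -> float:
    """Total occupied time inside the window, clipping sessions at its edges."""
    total = 0.0
    for start, end in sessions:
        clipped_start = max(start, window_start)
        clipped_end = window_end if end is None else min(end, window_end)
        if clipped_end > clipped_start:
            total += (clipped_end - clipped_start).total_seconds()
    return total


def occupancy_ratio(sessions: List[Session], window_start: datetime,
                    window_end: datetime) -> float:
    """Fraction of the window during which the space was occupied."""
    window = (window_end - window_start).total_seconds()
    if window <= 0:
        raise ValueError("window_end must be after window_start")
    return occupied_seconds(sessions, window_start, window_end) / window


def turnover(sessions: List[Session], window_start: datetime,
             window_end: datetime) -> int:
    """Number of sessions that started inside the window."""
    return sum(1 for start, _ in sessions if window_start <= start < window_end)


day = datetime(2015, 3, 1)
sessions = [
    (day.replace(hour=11, minute=30), None),                          # still occupied
    (day.replace(hour=9), day.replace(hour=10)),                      # fully inside
    (day.replace(hour=7, minute=30), day.replace(hour=8, minute=30)), # starts before window
]
window = (day.replace(hour=8), day.replace(hour=12))

print(occupancy_ratio(sessions, *window))  # 0.5
print(turnover(sessions, *window))         # 2
```

The sample list is ordered newest first, matching the DESC clustering order of the table.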

Sample Models

1 Sensor Message Archive

2 Parking Sessions

3 Current Space Status

Current Space Status

current_status

district

space

block

status

customer

last_change

QUERIES
• Current Status
  – by space
  – by block
  – by district
  – by customer

Current Space Status

• Last status change for every space
• Only one status per space
• High read / low write / almost no deletes
• Feed public way finding applications

Current Space Status

current_status
  space         K
  customer
  district
  block
  status
  last_change

QUERIES
• Current Status
  – by space
  – by block
  – by district
  – by customer

Current Space Status

current_status
  customer      K
  district      C
  block         C
  space
  status
  last_change

QUERIES
• Current Status
  – by space
  – by block
  – by district
  – by customer

Current Space Status

current_status
  customer      K
  district      C
  block         C
  space         C
  status
  last_change

VALIDATE
• Duplication
• Partition Size:
  – 10,000
• Hot Spotting
• Overwriting

Current Space Status

current_status
  customer      K
  district      K
  block         C
  space         C
  status
  last_change

VALIDATE
• Duplication
• Partition Size:
  – 10,000
• Hot Spotting
• Overwriting

Current Space Status

current_status
  customer      K
  district      K
  block         C
  space         C
  status
  last_change

create table current_status(
    customer text,
    district text,
    block text,
    street text,
    space text,
    status text,
    last_change timestamp,
    PRIMARY KEY ((customer, district), block, space)
);
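One way to check the four "current status by ..." queries against this key is to apply the basic CQL restriction rule: every partition key column must be given, and clustering columns may only be restricted as a leading prefix. The sketch below encodes a simplified version of that rule (equality restrictions only, no secondary indexes, no ALLOW FILTERING); the function name is hypothetical.

```python
from typing import Sequence, Set

PARTITION_KEY = ("customer", "district")
CLUSTERING = ("block", "space")


def is_supported(restricted: Set[str],
                 partition_key: Sequence[str] = PARTITION_KEY,
                 clustering: Sequence[str] = CLUSTERING) -> bool:
    """True if equality restrictions on these columns form a valid key lookup."""
    # Every partition key column must be restricted.
    if not set(partition_key) <= restricted:
        return False
    remaining = restricted - set(partition_key)
    # The rest must be exactly a leading prefix of the clustering columns.
    prefix_len = 0
    for column in clustering:
        if column not in remaining:
            break
        prefix_len += 1
    return remaining == set(clustering[:prefix_len])


print(is_supported({"customer", "district"}))                    # True  (by district)
print(is_supported({"customer", "district", "block"}))           # True  (by block)
print(is_supported({"customer", "district", "block", "space"}))  # True  (by space)
print(is_supported({"customer"}))                                # False (by customer alone)
```

Under this rule, a by-customer read against the final key would be issued as one query per district, and a by-space read needs the customer, district, and block as well as the space.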

Options

current_status
  customer      K
  district      K
  block         C
  space         C
  status
  last_change

Use In Memory Table?
• Relatively small foot print of data set
• Very few deletes
• Significant mutation in place
• Very high read / low latency requirements

Current Space Status

current_status
  customer      K
  district      K
  block         C
  space         C
  status
  last_change

create table current_status(
    customer text,
    district text,
    block text,
    space text,
    status text,
    last_change timestamp,
    PRIMARY KEY ((customer, district), block, space)
) WITH compaction = { 'class': 'MemoryOnlyStrategy', 'size_limit_in_mb': 25 }
  AND caching = 'NONE';

Results

create table current_status(
    customer text,
    district text,
    block text,
    space text,
    status text,
    last_change timestamp,
    PRIMARY KEY ((customer, district), block, space)
) WITH compaction = { 'class': 'MemoryOnlyStrategy', 'size_limit_in_mb': 25 }
  AND caching = 'NONE';

create table sensor_message(
    sensor_id text,
    year_month text,
    event_time timestamp,
    arr_time timestamp,
    space text,
    event text,
    battery text,
    msg_count int,
    gateway text,
    PRIMARY KEY ((sensor_id, year_month), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

create table last_sensor_message(
    sensor_id text,
    event_time timestamp,
    arr_time timestamp,
    space text,
    event text,
    battery text,
    msg_count int,
    gateway text,
    PRIMARY KEY (sensor_id)
);

create table session(
    space text,
    session_start timestamp,
    session_end timestamp,
    inferred text,
    PRIMARY KEY (space, session_start)
) WITH CLUSTERING ORDER BY (session_start DESC);

Thank You