CASSANDRA PERFORMANCE OPTIMIZATION
Andrey Kozlov
CASSANDRA FEATURES
● “Masterless” architecture
● Scalable
● Fast inserts
COLUMN FAMILY BASICS
Column Family Rows:
THRIFT AND SUPER COLUMN FAMILY
Super Column Family Rows:
CQL AND COLUMN GROUPS
CQL Partitions
CQL TABLE AND TYPICAL REQUESTS
CREATE TABLE users (
  org text,
  name text,
  department text,
  age int,
  role text,
  PRIMARY KEY (org, name)
);
• SELECT * FROM users; - NON-OPTIMAL REQUEST
• SELECT * FROM users WHERE org IN ('altoros', 'google'); - NORMAL REQUEST
• SELECT * FROM users WHERE org = 'altoros' AND name = 'john'; - OPTIMAL REQUEST
CASSANDRA IS SLOW :(
SIGNAL EXAMPLE
The signal was stored in Thrift format. The row key was defined by these params:
• APPLICATIONID: id of the application
• METRICID: id of the metric
• SUBJECTTYPE: server, group of servers, or application
• SUBJECTID: id of the server, group, or application
• ROLLEDDATA: true if this row has rolled-up data, false if it has raw data
• ROLLUPINTERVAL: 60, 300, 900, 1800, ...
• AGGREGATOR: min, max, avg
Each row contains one column per timestamp; size of data is 8 bytes.

servers = 10
metrics = 500
threads = 20
apps = 40

Batch size: 10-100
Astyanax implementation, 1 node installation (initial point)
writes/second: 270k
reads/second: 70-80k
BASIC OPTIMIZATION
1. Change the Thrift protocol to CQL
   A. It reduces the number of rows (partitions)
   B. According to the documentation, Cassandra 2.1 with the CQL driver is faster than Thrift (because of Nagle’s algorithm and the Shared Executor Pool)
2. Start using prepared statements
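Step 2 can be sketched with the Datastax Java driver. This is a minimal illustration, not code from the talk; the contact point, keyspace name, and bound values are placeholder assumptions, and the `users` table matches the earlier CQL example:

```java
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class PreparedInsert {
    public static void main(String[] args) {
        // Contact point and keyspace are placeholders for a real cluster.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("demo");

        // The statement is parsed once; later executions only ship bound values,
        // which is what makes prepared statements cheaper than plain string CQL.
        PreparedStatement ps = session.prepare(
            "INSERT INTO users (org, name, department, age, role) VALUES (?, ?, ?, ?, ?)");

        BoundStatement bound = ps.bind("altoros", "john", "engineering", 30, "developer");
        session.execute(bound);

        cluster.close();
    }
}
```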
THRIFT VS CQL 2.1
CQL MODEL FOR SIGNAL

CREATE TABLE test_table (
  appid bigint,
  subject_type int,
  subject_id bigint,
  metric_id bigint,
  rolled_data boolean,
  rollup_interval int,
  aggregator_type int,
  time timestamp,
  value blob,
  PRIMARY KEY ((appid, subject_type, subject_id, metric_id), rolled_data, rollup_interval, aggregator_type, time)
) WITH compaction = {'class': 'SizeTieredCompactionStrategy'}
  AND compression = {'sstable_compression': 'LZ4Compressor'}
  AND caching = 'keys_only';
WRITE PERFORMANCE

Data was written in batches of 10-100 values.
Astyanax implementation, 1 node installation (initial point)
writes/second: 270k

Datastax implementation, 1 node installation, cache disabled
writes/second: 400k

Datastax implementation, 1 node installation, cache disabled (2 generators)
writes/second: 600k
CASSANDRA ASYNCHRONOUS REQUESTS
All the requests in Cassandra are asynchronous
ASYNCHRONOUS READING IN JAVA

List<ResultSetFuture> futures =
    Lists.newArrayListWithExpectedSize(ids.size() * timeValueNumber);
for (Identifier key : ids) {
    BoundStatement st = createBoundStatement(key);
    futures.add(session.executeAsync(st));
}
List<ListenableFuture<ResultSet>> listenableFutures =
    Futures.inCompletionOrder(futures);
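The snippet above relies on Guava's Futures.inCompletionOrder to consume results as they finish rather than in submission order. The same pattern can be sketched with only the JDK, using ExecutorCompletionService; here the submitted task is a stand-in for session.executeAsync, not the real driver call:

```java
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionOrderSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // A completion service hands futures back in the order they complete,
        // analogous to Guava's Futures.inCompletionOrder.
        ExecutorCompletionService<String> ecs = new ExecutorCompletionService<>(pool);

        int requests = 5;
        for (int id = 0; id < requests; id++) {
            final int rowId = id;
            // Stand-in for session.executeAsync(boundStatement)
            ecs.submit(() -> "row-" + rowId);
        }

        // take() blocks only until the NEXT result is ready, so one slow
        // query does not delay processing of the fast ones.
        for (int i = 0; i < requests; i++) {
            System.out.println(ecs.take().get());
        }
        pool.shutdown();
    }
}
```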
ASYNCHRONOUS READ PERFORMANCE

Data was read asynchronously in 5-value batches.
Astyanax implementation, 1 node installation (initial point)
reads/second: 70-80k

Datastax implementation, 1 node installation, cache disabled
reads/second: 190-220k

Datastax implementation, 1 node installation, cache disabled (2 generators)
reads/second: 250k
* request latency is above 2-5 milliseconds
OPTIMUM DATA MODEL

The best solution is to make metric_id part of the clustering key. This allows reading, and batch-writing, a range of server metrics within one partition in a single operation.
CREATE TABLE test_table (
  appid bigint,
  subject_type int,
  subject_id bigint,
  rolled_data boolean,
  rollup_interval int,
  aggregator_type int,
  time timestamp,
  metric_id bigint,
  value blob,
  PRIMARY KEY ((appid, subject_type, subject_id), rolled_data, rollup_interval, aggregator_type, time, metric_id)
)
OPTIMUM PARTITION STRUCTURE
PARTITION SIZE

Size = Nr x (rolled_data + rollup_interval + aggregator_type + time + metric_id + value + valueTimestamp)

Size = 86,400 x (1 + 4 + 4 + 8 + 8 + 8 + 8) = 3,542,400 bytes (per day)
Size = 24,796,800 bytes (per week)
Size = 106,272,000 bytes (per month)
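The arithmetic above can be reproduced in a few lines of Java. Field sizes come from the slide; Nr = 86,400 assumes one data point per second per day:

```java
public class PartitionSize {
    public static void main(String[] args) {
        // Bytes per clustered row, per the slide's formula:
        // rolled_data(1) + rollup_interval(4) + aggregator_type(4)
        // + time(8) + metric_id(8) + value(8) + value timestamp(8)
        int bytesPerRow = 1 + 4 + 4 + 8 + 8 + 8 + 8; // 41 bytes

        int rowsPerDay = 86_400; // one data point per second (assumption)
        long perDay = (long) rowsPerDay * bytesPerRow;

        System.out.println(perDay);      // per day
        System.out.println(perDay * 7);  // per week
        System.out.println(perDay * 30); // per month
    }
}
```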
PARTITION SIZE

CREATE TABLE test_table (
  appid bigint,
  subject_type int,
  subject_id bigint,
  first_day_of_week timestamp, -- first day of the week, to make separate partitions
  metric_id bigint,
  rolled_data boolean,
  rollup_interval int,
  aggregator_type int,
  time timestamp,
  value blob,
  PRIMARY KEY ((appid, subject_type, subject_id, first_day_of_week, metric_id), rolled_data, rollup_interval, aggregator_type, time)
)
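The deck does not show how first_day_of_week is derived on the client. One possible sketch, assuming Monday-start weeks in UTC and the java.time API (the method name is illustrative, not from the talk):

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.TemporalAdjusters;

public class WeekBucket {
    // Truncate a measurement timestamp to the Monday of its week (UTC),
    // so every point from the same week lands in the same partition.
    static LocalDate firstDayOfWeek(Instant ts) {
        return ts.atZone(ZoneOffset.UTC).toLocalDate()
                 .with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2015-03-18T10:15:30Z"); // a Wednesday
        System.out.println(firstDayOfWeek(ts)); // 2015-03-16, that week's Monday
    }
}
```

This keeps each partition to one week of data, matching the per-week size computed on the previous slide.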
CASSANDRA 2-LEVEL CACHING
CASSANDRA ROW CACHE

Cassandra 2.0
Row cache can store only the whole partition
• Can be used with small partitions
• Can be used with a small number of partitions

Cassandra 2.1
Row cache stores part of a partition
• It is possible to manually set the number of cached rows per partition
• It is possible to set a TTL for rows in a partition
CASSANDRA ROW CACHE PERFORMANCE
PERFORMANCE OPTIMIZATION RESULTS

Astyanax implementation, 1 node installation (initial point)
writes/second: 270k
reads/second: 70-80k

Datastax implementation, 1 node installation, cache enabled (2 generators)
writes/second: 600k
reads/second: 500-600k

Datastax implementation, 3 node installation (replication factor 2), cache enabled (3 generators)
writes/second: 1.4 million
reads/second: 0.8-1.2 million
THANKS!