
Transcript
Page 1: Metrics: where and how

Metrics: where and how
graphite-oriented story

Page 2: Metrics: where and how

• Vsevolod Polyakov
• Platform Engineer at Grammarly

Page 3: Metrics: where and how

Graphite
All whisper-based systems

Page 4: Metrics: where and how

Default graphite architecture

Page 5: Metrics: where and how

what?
• RRD-like (gram.ly/gfsx)
• so.it.is.my.metric → /so/it/is/my/metric.wsp
• Fixed retention (by name/pattern)
• Fixed size (actually no)
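The name-to-path mapping above can be sketched in a few lines of Go. This is only an illustration: the function name `MetricPath` and the `root` argument are hypothetical, and real installations keep the files under their configured whisper data directory.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// MetricPath maps a dotted metric name to the whisper file that stores it.
// root is a hypothetical storage root; each dot-separated part of the name
// becomes a directory, and the last part becomes the .wsp file name.
func MetricPath(root, metric string) string {
	parts := strings.Split(metric, ".")
	return path.Join(append([]string{root}, parts...)...) + ".wsp"
}

func main() {
	fmt.Println(MetricPath("/", "so.it.is.my.metric")) // /so/it/is/my/metric.wsp
}
```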

Page 6: Metrics: where and how

Retention and size
• 1s:1d → 1 036 828 bytes
• 10s:10d → 1 036 828 bytes
• 1s:365d → 378 432 028 bytes (1 TB ~ 3 000)
• 10s:365d → 37 843 228 bytes (1 TB ~ 30 000)

whisper calc
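The single-archive sizes on this slide can be reproduced with a short calculation. The sketch below assumes the standard whisper file layout: a 16-byte file header, 12 bytes of archive info per archive, and 12 bytes per datapoint (4-byte timestamp plus 8-byte value). The type and function names are made up for the example.

```go
package main

import "fmt"

const (
	metadataSize    = 16 // file header: aggregation, max retention, xFilesFactor, archive count
	archiveInfoSize = 12 // per archive: offset, seconds per point, number of points
	pointSize       = 12 // per datapoint: 4-byte timestamp + 8-byte double
	secondsPerDay   = 24 * 60 * 60
)

// Archive is one retention rule such as 10s:10d (hypothetical type).
type Archive struct {
	SecondsPerPoint int64
	Days            int64
}

// WhisperSize returns the file size in bytes for the given archives.
func WhisperSize(archives ...Archive) int64 {
	size := int64(metadataSize)
	for _, a := range archives {
		points := a.Days * secondsPerDay / a.SecondsPerPoint
		size += archiveInfoSize + points*pointSize
	}
	return size
}

func main() {
	fmt.Println(WhisperSize(Archive{1, 1}))    // 1s:1d    → 1036828
	fmt.Println(WhisperSize(Archive{10, 10}))  // 10s:10d  → 1036828
	fmt.Println(WhisperSize(Archive{1, 365}))  // 1s:365d  → 378432028
	fmt.Println(WhisperSize(Archive{10, 365})) // 10s:365d → 37843228
}
```

1s:1d and 10s:10d come out the same size because both hold 86 400 points.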

Page 7: Metrics: where and how

Retention and size
• 10s:30d,1m:120d,10m:365d → 4 564 864 bytes
• 240 864 metrics in 1 TB
• aggregation: average, sum, min, max, and last.
• can be assigned per metric
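When points move from a finer archive to a coarser one, several values are collapsed into one using the chosen aggregation method. A minimal sketch of the five methods listed above follows; `Aggregate` is a hypothetical helper, not whisper's own code.

```go
package main

import "fmt"

// Aggregate collapses the values of several fine-grained points into the
// single value stored in the coarser archive.
func Aggregate(method string, values []float64) (float64, error) {
	if len(values) == 0 {
		return 0, fmt.Errorf("no values to aggregate")
	}
	result := values[0]
	switch method {
	case "average", "sum":
		for _, v := range values[1:] {
			result += v
		}
		if method == "average" {
			result /= float64(len(values))
		}
	case "min":
		for _, v := range values[1:] {
			if v < result {
				result = v
			}
		}
	case "max":
		for _, v := range values[1:] {
			if v > result {
				result = v
			}
		}
	case "last":
		result = values[len(values)-1]
	default:
		return 0, fmt.Errorf("unknown aggregation method %q", method)
	}
	return result, nil
}

func main() {
	// Six 10s points rolled up into one 1m point.
	points := []float64{4, 8, 15, 16, 23, 42}
	for _, m := range []string{"average", "sum", "min", "max", "last"} {
		v, _ := Aggregate(m, points)
		fmt.Println(m, v)
	}
}
```

A counter-like metric usually wants sum or last, while a gauge usually wants average, which is why choosing the method per metric matters.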

Page 8: Metrics: where and how

How
• terraform (https://www.terraform.io/)
• docker (https://www.docker.com/)
• ansible (https://www.ansible.com/)
• rocker (https://github.com/grammarly/rocker)
• rocker-compose (https://github.com/grammarly/rocker-compose)

Page 9: Metrics: where and how

Default graphite architecture

Page 10: Metrics: where and how

carbon-cache.py
• single-core
• many options in config file
• default


Page 11: Metrics: where and how

architecture
carbon-cache.py

Page 12: Metrics: where and how

Start load testing
• m4.xlarge instance (4 CPU, 16 GB RAM, 256 GB disk EBS gp2)
• retentions = 1s:1d
• MAX_CACHE_SIZE, MAX_UPDATES_PER_SECOND, MAX_CREATES_PER_MINUTE = inf
• defaults
• almost 1.5h to get limit :(
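The slides do not say which load generator was used. As an illustration of what one "request" looks like, the sketch below assumes Graphite's plaintext protocol, where every datapoint is one `name value timestamp` line. The metric prefix and helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// FormatPoint renders one datapoint as a plaintext-protocol line.
func FormatPoint(name string, value float64, ts int64) string {
	return fmt.Sprintf("%s %g %d\n", name, value, ts)
}

// Batch builds n distinct metrics for a single timestamp. With 1s:1d
// retention, sending such a batch once per second gives n req/s.
func Batch(prefix string, n int, ts int64) string {
	var b strings.Builder
	for i := 0; i < n; i++ {
		name := fmt.Sprintf("%s.metric%d", prefix, i)
		b.WriteString(FormatPoint(name, float64(i), ts))
	}
	return b.String()
}

func main() {
	// A real generator would write this to a TCP connection to the
	// daemon under test instead of printing it.
	fmt.Print(Batch("loadtest", 3, 1458000000))
}
```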

Page 13: Metrics: where and how

carbon-cache.py cache size → 75k req/s

Page 14: Metrics: where and how
Page 15: Metrics: where and how
Page 16: Metrics: where and how

results

• 75 000 req/s max
• 60 000 req/s flagman speed
• I/O :(

Page 17: Metrics: where and how

Try to tune!

• WHISPER_SPARSE_CREATE = true (don’t allocate space on creation): non-linear I/O load.

• CACHE_WRITE_STRATEGY = sorted (default)
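Put together, the settings named so far would look roughly like the fragment below. The option names come from the slides. The `[default]` schema section name and the catch-all pattern are assumptions for the example.

```ini
# carbon.conf
[cache]
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = inf
MAX_CREATES_PER_MINUTE = inf
WHISPER_SPARSE_CREATE = True
CACHE_WRITE_STRATEGY = sorted

# storage-schemas.conf
[default]
pattern = .*
retentions = 1s:1d
```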

Page 18: Metrics: where and how

cache size 1k → 195k req/s

Page 19: Metrics: where and how

results

• 120 000 req/s flagman speed
• cache flush problem :(

Page 20: Metrics: where and how

Try to tune!

• CACHE_WRITE_STRATEGY = max: gives a strong flush preference to frequently updated metrics and also reduces random file I/O.

Page 21: Metrics: where and how

from 1k to 150k

Page 22: Metrics: where and how

results

• 90 000 req/s flagman speed
• cache flush problem :(

Page 23: Metrics: where and how

Try to tune!

• CACHE_WRITE_STRATEGY = naive: just flush. Better with random I/O.
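The three strategies differ only in which cached metric gets written next. The sketch below is a simplified model of that choice, not carbon's implementation. `Cache` and the function names are hypothetical, and ties are broken by name only to keep the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Cache maps a metric name to its queued, not yet written values.
type Cache map[string][]float64

// NextMax models "max": always pick the metric with the most queued points.
func NextMax(c Cache) string {
	best, bestLen := "", -1
	for name, pts := range c {
		if len(pts) > bestLen || (len(pts) == bestLen && name < best) {
			best, bestLen = name, len(pts)
		}
	}
	return best
}

// OrderSorted models "sorted": take a snapshot of the cache ordered by
// queue length, largest first, and write the metrics in that order.
func OrderSorted(c Cache) []string {
	names := OrderNaive(c)
	sort.Slice(names, func(i, j int) bool {
		li, lj := len(c[names[i]]), len(c[names[j]])
		if li != lj {
			return li > lj
		}
		return names[i] < names[j]
	})
	return names
}

// OrderNaive models "naive": no ordering at all. Go map iteration order
// is unspecified, which matches "just flush".
func OrderNaive(c Cache) []string {
	names := make([]string, 0, len(c))
	for name := range c {
		names = append(names, name)
	}
	return names
}

func main() {
	c := Cache{"a": {1}, "b": {1, 2, 3}, "c": {1, 2}}
	fmt.Println(NextMax(c))     // b
	fmt.Println(OrderSorted(c)) // [b c a]
	fmt.Println(len(OrderNaive(c)))
}
```

In this model "max" rescans the cache on every pick and "sorted" pays for a sort per snapshot, while "naive" does no bookkeeping at all.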

Page 24: Metrics: where and how

from 45k to 135k

Page 25: Metrics: where and how

results

• 120 000 req/s flagman speed
• still CPU

Page 26: Metrics: where and how

sorted

max

naive

Page 27: Metrics: where and how

• Maybe it’s an EBS I/O limitation? → 512 GB disk.
• No.

Page 28: Metrics: where and how

go-carbon
• multi-core single daemon
• written in golang
• not many options to tune :(


Page 29: Metrics: where and how

Start load testing
• m4.xlarge instance (4 CPU, 16 GB RAM, 256 GB disk EBS gp2)
• retentions = 1s:1d
• max-size = 0
• max-updates-per-second = 0
• almost 1h to get limit :(

Page 30: Metrics: where and how

1k → 130k req/s, ~3k/min

Page 31: Metrics: where and how
Page 32: Metrics: where and how

results
• 120 000 req/s flagman speed
• but it’s without sparse.
• try to implement

Page 33: Metrics: where and how

try to tune!

remaining := whisper.Size() - whisper.MetadataSize()
// sparse-create hack: seek ahead and write a single byte instead of zero-filling
whisper.file.Seek(int64(remaining-1), 0)
whisper.file.Write([]byte{0})
chunkSize := 16384
zeros := make([]byte, chunkSize)
for remaining > chunkSize {
	// zero-fill write disabled:
	// if _, err = whisper.file.Write(zeros); err != nil {
	// 	return nil, err
	// }
	remaining -= chunkSize
}
if _, err = whisper.file.Write(zeros[:remaining]); err != nil {
	return nil, err
}
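The patch above is a fragment of a larger function. A self-contained sketch of the same sparse-creation idea might look like the following. It is not the go-carbon code: `CreateSparse` and `CreatedSize` are hypothetical helpers, and whether the skipped range really occupies no disk blocks depends on the filesystem.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// CreateSparse creates a file of the given length without writing the
// zeros: it seeks to the last byte and writes a single 0 there.
func CreateSparse(path string, size int64) error {
	if size <= 0 {
		return fmt.Errorf("size must be positive, got %d", size)
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	if _, err := f.Seek(size-1, io.SeekStart); err != nil {
		return err
	}
	_, err = f.Write([]byte{0})
	return err
}

// CreatedSize is a demo helper: it creates a sparse file in a temporary
// directory and returns the length the filesystem reports for it.
func CreatedSize(size int64) (int64, error) {
	dir, err := os.MkdirTemp("", "sparse")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "metric.wsp")
	if err := CreateSparse(path, size); err != nil {
		return 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	n, err := CreatedSize(1036828) // size of a 1s:1d whisper file
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n)
}
```

Creation becomes one small write instead of about a megabyte of zeros per new metric, which is what removes the create-time I/O spike in the test.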

Page 34: Metrics: where and how

180 000 req/s!

Page 35: Metrics: where and how
Page 36: Metrics: where and how

try to tune!

• max update operation = 1500

Page 37: Metrics: where and how

results

• TLDR: 210 000 - 240 000 req/s flagman speed
• 31 000 000 cache size!

Page 38: Metrics: where and how
Page 39: Metrics: where and how

try to tune!

• max update operation = 0
• input-buffer = 400 000

Page 40: Metrics: where and how

results

• 270 000 req/s flagman speed
• 10-20 million req cache size!

Page 41: Metrics: where and how
Page 42: Metrics: where and how

try to tune!

• vm.dirty_background_ratio=40
• vm.dirty_ratio=60

Page 43: Metrics: where and how

300 000 req/s

Page 44: Metrics: where and how

results

• 300 000 req/s flagman speed
• 180k+ req/s more or less without cache

Page 45: Metrics: where and how

Re:Lays

Page 46: Metrics: where and how

Default graphite architecture

Page 47: Metrics: where and how

arch forward

Page 48: Metrics: where and how

arch named\regexp
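Named/regexp routing sends each metric to the destination of the rule whose pattern matches its name. The sketch below is a simplified illustration in which the first matching rule wins and unmatched metrics go to a fallback. The rule type, patterns, and host names are all made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule pairs a name pattern with the destination that should receive
// matching metrics.
type Rule struct {
	Pattern     *regexp.Regexp
	Destination string
}

// NewRule compiles the pattern and panics on an invalid expression.
func NewRule(pattern, destination string) Rule {
	return Rule{Pattern: regexp.MustCompile(pattern), Destination: destination}
}

// Route returns the destination of the first rule that matches the
// metric name, or fallback when no rule matches.
func Route(rules []Rule, metric, fallback string) string {
	for _, r := range rules {
		if r.Pattern.MatchString(metric) {
			return r.Destination
		}
	}
	return fallback
}

func main() {
	rules := []Rule{
		NewRule(`^app\.`, "store-a:2003"),
		NewRule(`^sys\.`, "store-b:2003"),
	}
	fmt.Println(Route(rules, "app.web.requests", "store-default:2003")) // store-a:2003
	fmt.Println(Route(rules, "other.metric", "store-default:2003"))     // store-default:2003
}
```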

Page 49: Metrics: where and how

arch hash

Page 50: Metrics: where and how

arch hash, replication factor: 2
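With hash routing and a replication factor of 2, every metric name is hashed onto a ring of destinations and written to the first two distinct nodes found from that position. The sketch below shows the idea with a generic consistent-hash ring. The FNV hash, the virtual-node count, and the node names are choices made for this example and are not the hash functions the carbon relays actually use.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a minimal consistent-hash ring.
type Ring struct {
	points []uint32          // sorted positions on the ring
	owner  map[uint32]string // position → node
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places vnodes positions per node on the ring.
func NewRing(nodes []string, vnodes int) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < vnodes; i++ {
			p := hashKey(fmt.Sprintf("%s#%d", n, i))
			if _, taken := r.owner[p]; taken {
				continue // skip the rare position collision
			}
			r.owner[p] = n
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Destinations walks the ring from the metric's hash and returns the
// first `replicas` distinct nodes (fewer if the ring has fewer nodes).
func (r *Ring) Destinations(metric string, replicas int) []string {
	var out []string
	if len(r.points) == 0 {
		return out
	}
	seen := make(map[string]bool)
	h := hashKey(metric)
	start := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	for i := 0; i < len(r.points) && len(out) < replicas; i++ {
		node := r.owner[r.points[(start+i)%len(r.points)]]
		if !seen[node] {
			seen[node] = true
			out = append(out, node)
		}
	}
	return out
}

func main() {
	ring := NewRing([]string{"store-a", "store-b", "store-c"}, 64)
	fmt.Println(ring.Destinations("so.it.is.my.metric", 2))
}
```

The same name always maps to the same pair of nodes, so any relay with the same node list routes a metric identically.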

Page 51: Metrics: where and how

carbon-relay.py

• twisted-based
• native

Page 52: Metrics: where and how

Start load testing
• c4.xlarge instance (4 CPU, 7.5 GB RAM)
• ~1 Gb LAN
• default parameters
• hashing
• 10 connections

Page 53: Metrics: where and how

WTF!

Page 54: Metrics: where and how

carbon-relay-ng
• golang-based
• web-panel
• live-updates
• aggregators
• spooling


Page 55: Metrics: where and how

<150 000 req/s

Page 56: Metrics: where and how

carbon-c-relay

• written in C
• advanced cluster management

Page 57: Metrics: where and how

from 100 000 to 1 600 000 req/s

Page 58: Metrics: where and how

1 400 000 flagman speed. Or not?

Page 59: Metrics: where and how

So… go-carbon + carbon-c-relay = ♡

Page 60: Metrics: where and how

BTW: influx, 130k req/s on cluster

Page 61: Metrics: where and how

influx

Page 62: Metrics: where and how

openTSDB
single instance + hbase cluster = up to 150k req/s

Page 63: Metrics: where and how

ALSO
• zipper:
  • https://github.com/grobian/carbonserver
  • https://github.com/grobian/carbonwriter
  • https://github.com/dgryski/carbonzipper
  • https://github.com/dgryski/carbonapi
  • https://github.com/dgryski/carbonmem
• https://github.com/jssjr/carbonate

Page 64: Metrics: where and how

plans
• Cyanite, retest
• newTS
• openTSDB tuning
• zipper tuning

Page 65: Metrics: where and how

feel free to ask
• Vsevolod Polyakov
• [email protected]
• skype: ctrlok1987
• github.com/ctrlok
• twitter.com/ctrlok
• slack: HangOps
• Gitter: dev_ua/devops
• skype: DevOps from Ukraine

