Overprov: A Tool for Cluster Overprovisioning Detection

Del Bao

Problem

ad_backend cpu.idle uswest2-prod

Problem (2)

bizfeed oldgen gc count a day

Problem (3)

generic cassandra byte_percentfree

What does the tool do?

Design Goals

• save cost in the long run

• based on simple rules

• eliminate false positives

• extensible

Code Structure

● run()

    for cluster_name in clusters:
        dt = detector.ClusterOverprovDetector(
            product,
            ecosystem,
            cluster_name,
            metric_list,
            start,
            stop,
            signalfx_auth_token,
        )
        dt.execute()

● metric_list

    metric_list_cass = [
        ModuleClass('overprov.analyzers.cpu_idle_analyzer', 'CpuIdleAnalyzer'),
        ModuleClass('overprov.analyzers.cass_gc_count_analyzer', 'CassGcCountAnalyzer'),
        ModuleClass('overprov.analyzers.cass_disk_free_analyzer', 'CassDiskFreeAnalyzer'),
    ]
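Each metric_list entry pairs a module path with a class name, which suggests the analyzers are imported by name at run time. A minimal sketch of that pattern follows; the ModuleClass definition, its field names, and the load_class helper are assumptions for illustration, not the tool's actual code, and standard-library classes stand in for the overprov analyzers.

```python
import importlib
from collections import namedtuple

# Hypothetical stand-in for the ModuleClass used in the slides:
# a (module path, class name) pair. The field names are assumptions.
ModuleClass = namedtuple('ModuleClass', ['module', 'cls'])


def load_class(entry):
    """Import entry.module and return the attribute named entry.cls."""
    module = importlib.import_module(entry.module)
    return getattr(module, entry.cls)


# Standard-library classes stand in for the overprov analyzers
# so that the sketch runs without the overprov package installed.
demo_list = [
    ModuleClass('collections', 'OrderedDict'),
    ModuleClass('fractions', 'Fraction'),
]
loaded = [load_class(entry) for entry in demo_list]
```

Keeping the list as plain (module, class) pairs means a new analyzer can be registered by adding one line, without touching the detector.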

You can extend it

• create your own analyzer

• pass in your own start and stop days
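The slides do not show the analyzer interface, so the following is only a sketch of what a custom analyzer could look like. BaseAnalyzer, MemFreeAnalyzer, the is_overprovisioned method, and the threshold value are all hypothetical names and numbers chosen for illustration.

```python
class BaseAnalyzer:
    """Hypothetical base class; the real overprov interface is not shown in the slides."""

    def __init__(self, threshold):
        self.threshold = threshold

    def is_overprovisioned(self, daily_values):
        raise NotImplementedError


class MemFreeAnalyzer(BaseAnalyzer):
    """Hypothetical analyzer: flags a cluster whose free-memory percentage
    stays above the threshold on every day in the window."""

    def is_overprovisioned(self, daily_values):
        # An empty window gives no evidence, so do not flag it.
        if not daily_values:
            return False
        return all(value > self.threshold for value in daily_values)


analyzer = MemFreeAnalyzer(threshold=70.0)
always_high = analyzer.is_overprovisioned([82.5, 79.0, 91.2])  # every day above 70
one_dip = analyzer.is_overprovisioned([82.5, 65.0, 91.2])      # one day dips below 70
```

Requiring the condition to hold on every day of the window is one simple way to keep a single quiet day from producing a false positive.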

Assumptions

• this is a static check, so daily or hourly resolution (e.g., p95) is fine

• the cluster is nearly well balanced, so the max/min across its hosts in a region represents the entire cluster
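The second assumption can be sketched as follows: per-host daily values are collapsed to a single cluster series by taking the minimum across hosts for each day, and the rule is then applied to that series. The host names, the numbers, the 80% threshold, and the helper name are made up for illustration; the values are assumed to be daily rollups (such as p95) of cpu.idle.

```python
def cluster_min_per_day(host_series):
    """Collapse per-host daily values into one series:
    the minimum across hosts for each day.

    zip() stops at the shortest host series, so hosts are
    assumed to cover the same days.
    """
    return [min(day_values) for day_values in zip(*host_series.values())]


# Made-up daily cpu.idle percentages for three hosts (illustrative only).
host_series = {
    'host-a': [91.0, 88.5, 93.0],
    'host-b': [89.0, 90.0, 92.5],
    'host-c': [90.5, 87.0, 94.0],
}

cluster_series = cluster_min_per_day(host_series)

IDLE_THRESHOLD = 80.0  # assumed value, not taken from the tool
overprovisioned = all(value > IDLE_THRESHOLD for value in cluster_series)
```

For an idle metric the minimum corresponds to the busiest host, so if even that host stays above the threshold, a balanced cluster as a whole has headroom.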

What it’s Not

• Fleetmiser

  – instantaneous autoscaling of the spot fleet for seagull clusters

  – a signal at a 10 min interval

• Paasta

  – similar to the above, only for paasta services

Demo

• virtualenv_run/bin/overprov -p cassandra -c ad_backend --start 60 --stop 30 -e prod -k ./api_token

• virtualenv_run/bin/overprov -p cassandra -c ad_backend --start 60 --stop 30 -e prod -k ./api_token --debug

• virtualenv_run/bin/overprov -p elasticsearch -c ads144 --start 60 --stop 30 -e prod -k ./api_token
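The flags in the demo commands could be parsed along these lines. This is not the tool's actual parser: only the flag spellings come from the slides, while the dest names, the integer types for --start and --stop, and the required settings are assumptions.

```python
import argparse


def build_parser():
    """Parser mirroring the flags seen in the demo commands (dest names are hypothetical)."""
    parser = argparse.ArgumentParser(prog='overprov')
    parser.add_argument('-p', dest='product', required=True)
    parser.add_argument('-c', dest='cluster', required=True)
    parser.add_argument('--start', type=int, required=True)
    parser.add_argument('--stop', type=int, required=True)
    parser.add_argument('-e', dest='ecosystem', required=True)
    parser.add_argument('-k', dest='token_file', required=True)
    parser.add_argument('--debug', action='store_true')
    return parser


# Parse the first demo command's arguments from a list instead of sys.argv.
args = build_parser().parse_args([
    '-p', 'cassandra', '-c', 'ad_backend',
    '--start', '60', '--stop', '30',
    '-e', 'prod', '-k', './api_token',
])
```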

Questions
