The State of Open Source Monitoring Tools
Michael Richardson (@m_richo)
Energized Work
What tools are we currently using to monitor and troubleshoot our systems?
• Nagios
• ssh + grep <something_bad> /some/random/log/file.log
• tail -f /some/random/log/file.log
• Others?
Nagios
Nagios – The lovers
Nagios Love-meter
0 = "Nagios shits me to tears"; 10 = "Sign me up to Nagios World Conference 2013!!!!"
Where are you on the scale?
Alternatives?
Yep, there are lots.
Some are better and some are worse.
Today let's check out:
• Graphite
• StatsD
• Logstash
• Sensu
Graphite
• Metric storage
• Complex graph creation
• http://graphite.wikidot.com
• Apache 2.0 license
• Send it the time-series data you are interested in graphing
Graphite
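Components
1. Web – Django web application that renders graphs
2. Whisper – fixed-size time-series database library
3. Carbon – daemons that receive metrics and write them to Whisper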
Graphite
• Everything stored in Graphite has a path with components delimited by dots, e.g.
servers.HOSTNAME.METRIC
applications.APPNAME.METRIC
servers.database01.memfree
applications.trading.loginattempts
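Carbon accepts data points over a simple plaintext protocol, one line per point: "<metric path> <value> <timestamp>", by default on TCP port 2003. A minimal sketch of building a dotted path and formatting/sending a point, assuming a Carbon listener on localhost:2003; the helper names (metric_path, plaintext_line, send_to_carbon) are hypothetical.

```python
import re
import socket
import time
from typing import Iterable, Optional

def metric_path(*components: str) -> str:
    """Join components into a dotted Graphite path.

    Dots, whitespace and slashes inside a component would create extra
    path levels, so they are replaced with underscores.
    """
    cleaned = [re.sub(r"[.\s/]+", "_", c.strip()) for c in components]
    return ".".join(c for c in cleaned if c)

def plaintext_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one data point in Carbon's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"

def send_to_carbon(lines: Iterable[str], host: str = "localhost", port: int = 2003) -> None:
    """Send pre-formatted plaintext lines to a Carbon listener over TCP (assumed address)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))

if __name__ == "__main__":
    path = metric_path("servers", "database01", "memfree")
    print(plaintext_line(path, 1024, 1357000000), end="")
    # prints: servers.database01.memfree 1024 1357000000
```

Nothing has to be declared in Graphite first: the first point sent for a new path creates it.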
Graphite
• No need to pre-define metric end-points
• Determine the granularity of your data up front, e.g. in /opt/graphite/conf/storage-schemas.conf:

[stats]
pattern = ^stats.*
retentions = 10:2160,60:10080,600:262974

[catchall]
priority = 0
pattern = ^.*
retentions = 30:86400,300:525600
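Each "seconds_per_point:points" pair in a retentions line defines one archive, and seconds per point multiplied by the number of points gives how far back that archive reaches. A small worked sketch of that arithmetic for the two schemas above; it only handles the plain numeric form used here, and the function names are hypothetical.

```python
from datetime import timedelta
from typing import List, Tuple

def parse_retentions(spec: str) -> List[Tuple[int, int]]:
    """Parse 'seconds_per_point:points,...' into (seconds_per_point, points) pairs."""
    archives = []
    for part in spec.split(","):
        seconds_per_point, points = part.strip().split(":")
        archives.append((int(seconds_per_point), int(points)))
    return archives

def retention_spans(spec: str) -> List[timedelta]:
    """How far back each archive reaches: seconds per point x number of points."""
    return [timedelta(seconds=s * n) for s, n in parse_retentions(spec)]

if __name__ == "__main__":
    for name, spec in [("stats", "10:2160,60:10080,600:262974"),
                       ("catchall", "30:86400,300:525600")]:
        for (s, n), span in zip(parse_retentions(spec), retention_spans(spec)):
            print(f"[{name}] {s}s per point x {n} points = {span}")
```

So [stats] keeps 10-second data for 6 hours, 1-minute data for 7 days and 10-minute data for about 5 years (1826 days, 5 hours), while [catchall] keeps 30-second data for 30 days and 5-minute data for 1825 days. Whisper files are pre-allocated from these settings when a metric is first created, which is why the granularity needs deciding up front: editing the config later does not resize existing files.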
Graphite
What should I graph/trend?
1. Application profiling data
2. Operational profiling data
3. Regression testing (releases)
Why should I graph/trend?
1. Trends can tell you when something is about to break…
2. …instead of hearing from your customers that it's broken.
3. Data can tell you when something is already broken but you don't yet know it (regression).
Source: Jason Dixon (@obfuscurity)
Graphite
Demo
Image source - http://joemiller.me/2011/11/05/correlating-puppet-changes-to-events-in-your-infrastructure/
StatsD
• Measure Anything, Measure Everything
• Created and released by Etsy
• Aggregates counters and timers
• http://github.com/etsy/statsd
StatsD
• Written in Node.js
• ~400 lines of JavaScript
• Listens for statistics (counters and timers) and sends aggregates to backend services (like Graphite)
• Simple
StatsD
Don’t like JavaScript or Node.js??
Google “statsd alternatives”…
20+ rewrites/clones for you, including Ruby, Python, Scala, Python+Twisted, Erlang, Clojure, C, Groovy
StatsD
Concepts
• Buckets (a name that translates to a Graphite end-point)
• Values
• Flush (default 10 seconds)
Counter metrics
successfullogins:1|c|@0.1   (sampled at 10%, so StatsD counts each packet as 10)
Timing metrics
apitimer:320|ms   (320 milliseconds)
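To make the counter and timer lines above concrete, here is a simplified sketch of what a StatsD-style daemon does with them during one flush interval: parse "bucket:value|type[|@rate]", scale sampled counters by 1/rate, and summarise timers before sending aggregates on to Graphite. It is an illustration, not Etsy's implementation; the output names (stats.counters.<bucket>.count, stats.timers.<bucket>.mean, ...) are assumptions, as the real namespace depends on StatsD version and configuration, and other metric types (gauges, sets) are ignored here.

```python
from collections import defaultdict

def parse_line(line):
    """Split 'bucket:value|type[|@sample_rate]' into its parts."""
    bucket, rest = line.strip().split(":", 1)
    fields = rest.split("|")
    value = float(fields[0])
    metric_type = fields[1]
    sample_rate = 1.0
    if len(fields) > 2 and fields[2].startswith("@"):
        sample_rate = float(fields[2][1:])
    return bucket, value, metric_type, sample_rate

def flush(lines, flush_interval=10.0):
    """Aggregate one flush interval's worth of lines into summary metrics."""
    counters = defaultdict(float)
    timers = defaultdict(list)
    for line in lines:
        bucket, value, metric_type, rate = parse_line(line)
        if metric_type == "c":
            # A client sampling at 10% sends ~1 packet per 10 events, so scale up.
            counters[bucket] += value / rate
        elif metric_type == "ms":
            timers[bucket].append(value)
    out = {}
    for bucket, total in counters.items():
        out[f"stats.counters.{bucket}.count"] = total
        out[f"stats.counters.{bucket}.rate"] = total / flush_interval  # per second
    for bucket, values in timers.items():
        out[f"stats.timers.{bucket}.lower"] = min(values)
        out[f"stats.timers.{bucket}.upper"] = max(values)
        out[f"stats.timers.{bucket}.mean"] = sum(values) / len(values)
        out[f"stats.timers.{bucket}.count"] = len(values)
    return out

if __name__ == "__main__":
    packets = ["successfullogins:1|c|@0.1", "successfullogins:1|c|@0.1",
               "apitimer:320|ms", "apitimer:180|ms"]
    for name, value in sorted(flush(packets).items()):
        print(name, value)
```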
StatsD
Counter examples
• Successful customer login attempts
• Failed customer login attempts
• Register a new customer
• Hit 3rd party API
StatsD
Timer examples
• How fast is our function blah()
• How fast is a database query
• How fast is our 3rd party API service
• How fast is our internet access
• How fast are our page response times
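Counters and timers like these are usually recorded from application code by a small client library. A minimal sketch of such a client, assuming a StatsD daemon on UDP localhost:8125 (StatsD's default port); the StatsClient class and its incr/timing/timer methods are hypothetical, and real client libraries differ in detail. UDP is fire-and-forget, so a missing or slow StatsD server never blocks the application.

```python
import random
import socket
import time
from contextlib import contextmanager

class StatsClient:
    """Tiny fire-and-forget StatsD client (hypothetical, for illustration)."""

    def __init__(self, host="localhost", port=8125, send=None):
        if send is None:
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            addr = (host, port)

            def send(data):
                sock.sendto(data, addr)  # one UDP datagram per metric
        self._send = send

    def _emit(self, payload, sample_rate=1.0):
        if sample_rate < 1.0:
            if random.random() >= sample_rate:
                return  # not sampled; StatsD scales the sampled packets back up
            payload += f"|@{sample_rate}"
        try:
            self._send(payload.encode("ascii"))
        except OSError:
            pass  # a missing StatsD server must never break the application

    def incr(self, bucket, count=1, sample_rate=1.0):
        """Counter, e.g. successful or failed customer logins."""
        self._emit(f"{bucket}:{count}|c", sample_rate)

    def timing(self, bucket, ms):
        """Timer value in whole milliseconds."""
        self._emit(f"{bucket}:{int(ms)}|ms")

    @contextmanager
    def timer(self, bucket):
        """Time the enclosed block and report it as a StatsD timer."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timing(bucket, (time.perf_counter() - start) * 1000)

if __name__ == "__main__":
    stats = StatsClient()
    stats.incr("successfullogins", sample_rate=0.1)
    with stats.timer("db.query"):
        time.sleep(0.05)  # stand-in for a real database query
```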
StatsD
demo
LogStash
• Tool for managing events and logs
• http://logstash.net
• https://github.com/logstash/logstash
• Apache 2.0 license
• Created by Jordan Sissel (@jordansissel)
LogStash
• Written in Ruby
• Built with JRuby and ships as a JAR file
LogStash
The Logstash agent is an event pipeline with 3 parts:
1. Inputs
2. Filters
3. Outputs
LogStash
1. Inputs – generate events
2. Filters – modify them
3. Outputs – ship them somewhere
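The inputs, filters and outputs model can be sketched as a chain of Python generators. This illustrates the concept only and is not Logstash's API; the event fields (message, type, @timestamp, tags) are assumptions loosely modelled on Logstash events, and drop_debug/tag_errors are hypothetical filters.

```python
import time

def lines_input(lines, event_type):
    """Input stage: turn raw lines into events (dicts)."""
    for line in lines:
        yield {"message": line.rstrip("\n"), "type": event_type,
               "@timestamp": time.time(), "tags": []}

def apply_filters(events, filters):
    """Filter stage: each filter returns the (possibly modified) event, or None to drop it."""
    for event in events:
        for f in filters:
            event = f(event)
            if event is None:
                break
        else:
            yield event

def drop_debug(event):
    return None if "DEBUG" in event["message"] else event

def tag_errors(event):
    if "ERROR" in event["message"]:
        event["tags"].append("error")
    return event

def run_pipeline(lines, outputs, filters=(drop_debug, tag_errors)):
    """Output stage: ship every surviving event to every output."""
    for event in apply_filters(lines_input(lines, "app"), filters):
        for output in outputs:
            output(event)

if __name__ == "__main__":
    run_pipeline(["INFO started\n", "DEBUG noisy\n", "ERROR disk full\n"], [print])
```

Because each stage is a generator, events stream through one at a time rather than being batched, which is roughly how a log shipper processes a tailed file.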
LogStash
Inputs include:
amqp, drupal_dblog, eventlog, exec, file, ganglia, gelf, gemfire, generator, heroku, irc, log4j, lumberjack, pipe, redis, relp, sqs, stdin, stomp, syslog, tcp, twitter, udp, xmpp, zenoss, zeromq
LogStash
Filters include:
alter, anonymize, checksum, csv, date, dns, environment, gelfify, geoip, grep, grok, grokdiscovery, json, kv, metrics, multiline, mutate, noop, split, syslog_pri, urldecode, xml, zeromq
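The grok filter turns unstructured lines into named fields using named regular-expression patterns, and Logstash ships a COMBINEDAPACHELOG pattern for Apache access logs. A rough Python equivalent, assuming Apache's combined log format; the regex below is a simplified stand-in written for this example, not the actual grok pattern, and the field names are assumed to resemble the ones grok produces.

```python
import re

# Simplified regex for Apache's combined log format (not the real grok pattern).
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def grok_apache(event):
    """Add parsed fields to the event, or tag it when the line does not match."""
    m = COMBINED.match(event["message"])
    if m is None:
        event.setdefault("tags", []).append("_grokparsefailure")
        return event
    event.update(m.groupdict())
    return event

if __name__ == "__main__":
    line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" '
            '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
    for key, value in grok_apache({"message": line}).items():
        print(f"{key}: {value}")
```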
LogStash
Outputs include:
amqp, boundary, circonus, cloudwatch, datadog, elasticsearch, elasticsearch_http, elasticsearch_river, email, exec, file, ganglia, gelf, gemfire, graphite, graphtastic, http, internal, irc, juggernaut, librato, loggly, lumberjack, metriccatcher, mongodb, nagios, nagios_nsca, null, opentsdb, pagerduty, pipe, redis, riak, riemann, sns, sqs, statsd, stdout, stomp, syslog, tcp, websocket, xmpp, zabbix, zeromq
LogStash
Typical setup
LogStash
Shipper alternatives?
• Syslog (rsyslog, syslog-ng)
• Lumberjack https://github.com/jordansissel/lumberjack
• Beaver https://github.com/josegonzalez/beaver
• Woodchuck https://github.com/danryan/woodchuck
LogStash
Kibana
• Web interface for viewing Logstash records stored in Elasticsearch
• http://kibana.org/
• http://github.com/rashidkpc/Kibana
• Search for records
• Stream records (near real time)
• Create RSS feeds based on search results
• Score and trend data
LogStash
Kibana – search data
Image source - http://kibana.org/
LogStash
Kibana – trend data
Image source - http://kibana.org/
LogStash
Demo (Syslog & Apache access logs)
LogStash
TIP: Go buy The Logstash Book by James Turnbull (@kartar): http://logstashbook.com/
It’s a great introduction to using Logstash.
Sensu
• https://github.com/sensu/sensu
• Creator: Sean Porter (@portertech)
• Ruby, RabbitMQ, Redis
• <1200 lines of code
• Omnibus installation packages
Sensu
Components
• Sensu-server
• Sensu-client
• Sensu-api
• Sensu-dashboard
Sensu
• Message-oriented architecture (messages are JSON objects)
• Described as a “monitoring router”
• Connects “check” scripts on Sensu clients to “handler” scripts on Sensu servers
Sensu
Checks can:
• Determine whether a service like Apache is up and running (via the check's exit code)
• Collect metrics like page views or database cache usage
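Sensu checks use the same exit-code convention as Nagios plugins (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and metric checks commonly print Graphite plaintext lines. A minimal sketch of both kinds of check, assuming a service on a TCP port (e.g. Apache on port 80); check_tcp, metrics_output, the message text and the thresholds are hypothetical.

```python
import socket
import sys
import time

# Nagios-plugin exit codes, which Sensu also uses for check results.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_tcp(host, port, timeout=3.0, warn_seconds=1.0):
    """Return (exit_code, message) for a simple TCP connect check."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except OSError as exc:
        return CRITICAL, f"CheckTCP CRITICAL: {host}:{port} unreachable ({exc})"
    if elapsed > warn_seconds:
        return WARNING, f"CheckTCP WARNING: {host}:{port} took {elapsed:.2f}s to connect"
    return OK, f"CheckTCP OK: {host}:{port} is accepting connections"

def metrics_output(scheme, values, timestamp=None):
    """Format metrics as Graphite plaintext lines, one per metric."""
    ts = int(time.time()) if timestamp is None else timestamp
    return "\n".join(f"{scheme}.{key} {value} {ts}" for key, value in sorted(values.items()))

def main(argv=None):
    """Print the check output and return its exit code."""
    argv = sys.argv[1:] if argv is None else argv
    host = argv[0] if argv else "localhost"
    try:
        port = int(argv[1]) if len(argv) > 1 else 80
    except ValueError:
        print(f"CheckTCP UNKNOWN: invalid port {argv[1]!r}")
        return UNKNOWN
    code, message = check_tcp(host, port)
    print(message)
    return code

# Installed as a Sensu check command, the script would finish with:
#     sys.exit(main())
```

Sensu only looks at the exit code and the printed output, so existing Nagios plugins can usually be reused unchanged as Sensu checks.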
Sensu
Check output is routed to one or more handlers, which determine what to do:
• Send alerts via email, PagerDuty, IRC, Twitter, Basecamp, XMPP, HipChat, Campfire, etc.
• Feed metrics to backend services like Graphite, Librato, OpenTSDB, etc.
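A pipe handler receives the event as a JSON object on STDIN and decides what to do with it. A minimal routing sketch, assuming an event shaped roughly like Sensu's ("client" with "name", "check" with "name", "status", "output" and optionally "type": "metric", plus "action"); the routing rules and destination names ("email", "pagerduty", "graphite") are hypothetical stand-ins for real integrations.

```python
import json
import sys

STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}

def route(event):
    """Return a list of (destination, payload) pairs for one Sensu event.

    Destinations are hypothetical names; a real handler would call the
    email, PagerDuty or Graphite integration instead of returning them.
    """
    check = event["check"]
    client = event["client"]["name"]
    output = check.get("output", "").strip()
    if check.get("type") == "metric":
        return [("graphite", output)]  # feed metrics to a backend
    status = check.get("status", 3)
    summary = f"{STATUS_NAMES.get(status, 'UNKNOWN')}: {client}/{check['name']}: {output}"
    if status == 0:
        # Only report OK results when they resolve an earlier problem.
        return [("email", summary)] if event.get("action") == "resolve" else []
    if status == 1:
        return [("email", summary)]
    return [("pagerduty", summary), ("email", summary)]  # critical or unknown

def main(stream=None):
    """Read one event as JSON (STDIN by default) and print where it would go."""
    event = json.load(stream if stream is not None else sys.stdin)
    for destination, payload in route(event):
        print(f"[{destination}] {payload}")

# As a Sensu pipe handler, the script would simply call main().
```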
Sensu
demo
Questions??
Thank you