Tips & Tricks for Reducing TTR

Jason Hand – DevOps Evangelist

Tips & Tricks to Reduce TTR for the Next Incident

@jasonhand

Time to Resolution (TTR)

•  The total amount of time taken to resolve an incident

•  MTTR – Mean Time To Resolution* – a summary over time: a measurement used to describe the most "typical" value in a set of values – the lower the better

*Resolve = Repair = Recover

•  Incident Lifecycle – Alerting – Triage – Investigation – Identification – Resolution – Documentation
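As a rough illustration of the two definitions above, the sketch below computes TTR per incident and then MTTR as the arithmetic mean. The incident timestamps and the `ttr_minutes` helper are hypothetical, made up for this example, and TTR is assumed to run from the moment the alert fires to the moment the incident is resolved.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: (alert fired, incident resolved).
incidents = [
    (datetime(2015, 3, 1, 2, 0), datetime(2015, 3, 1, 2, 45)),
    (datetime(2015, 3, 4, 14, 10), datetime(2015, 3, 4, 14, 40)),
    (datetime(2015, 3, 9, 23, 5), datetime(2015, 3, 10, 1, 20)),
]


def ttr_minutes(started, resolved):
    """Time to Resolution for one incident, in minutes."""
    return (resolved - started).total_seconds() / 60


# One TTR value per incident, then the mean across all of them (MTTR).
ttrs = [ttr_minutes(started, resolved) for started, resolved in incidents]
mttr = mean(ttrs)

print(f"TTR per incident (min): {ttrs}")
print(f"MTTR (min): {mttr}")
```

Tracking the same number over successive weeks or months is what makes "the lower the better" measurable.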

Alerting

A “zero time” alerting platform that finds people instantly can only really affect average TTR by a very small percentage

Notify on-call members

Victor’s Tips

“Include useful content & context in the alerts.”

“Use custom notifications to distinguish critical alerts.”
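One way to picture both tips is an alert that carries its own context and a notification text that marks critical alerts differently. The sketch below is only an illustration: the `Alert` fields, the `format_notification` helper, the host name, and the runbook URL are all hypothetical and do not represent any particular alerting product's API.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    # All field names are illustrative assumptions, not a real product schema.
    service: str
    host: str
    severity: str      # "critical", "warning", or "info"
    summary: str
    observed: str      # what was measured
    threshold: str     # what the check expected
    runbook_url: str = ""


def format_notification(alert):
    """Build a notification text that includes context, not just a name."""
    # Critical alerts get a distinct prefix so they stand out on a phone.
    prefix = "[CRITICAL] " if alert.severity == "critical" else ""
    lines = [
        f"{prefix}{alert.service} on {alert.host}: {alert.summary}",
        f"observed {alert.observed} (threshold {alert.threshold})",
    ]
    if alert.runbook_url:
        lines.append(f"runbook: {alert.runbook_url}")
    return "\n".join(lines)


alert = Alert(
    service="database",
    host="db01.example.com",
    severity="critical",
    summary="replication lag is growing",
    observed="lag 340s",
    threshold="60s",
    runbook_url="https://wiki.example.com/runbooks/replication-lag",
)
message = format_notification(alert)
print(message)
```

The point of the extra fields is that the on-call person can start triage from the notification itself instead of logging in just to find out what fired.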

Triage

Assign degrees of urgency to incidents

Victor’s Tips

“Get the right alerts to the right people through routing.”

“Establish a single source of truth for all activities of an incident.”
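Assigning urgency and routing can be sketched as a small lookup, assuming each alert already carries a service name and a severity. The team names, the `ROUTES` and `URGENCY` tables, and the `triage` function below are hypothetical; a real setup would take these rules from the team's own on-call configuration.

```python
# Hypothetical routing table: which on-call team owns which service.
ROUTES = {
    "database": "dba-oncall",
    "web": "web-oncall",
}
DEFAULT_TEAM = "ops-oncall"

# Hypothetical urgency levels: 1 is the most urgent.
URGENCY = {"critical": 1, "warning": 2, "info": 3}


def triage(alert):
    """Assign an urgency and an owning team to an incoming alert."""
    urgency = URGENCY.get(alert["severity"], 3)
    team = ROUTES.get(alert["service"], DEFAULT_TEAM)
    return {
        "team": team,
        "urgency": urgency,
        # Only the most urgent alerts page someone; the rest can wait in a queue.
        "page": urgency == 1,
    }


print(triage({"service": "database", "severity": "critical"}))
print(triage({"service": "cache", "severity": "info"}))
```

Unknown services fall back to a default team so that no alert is left without an owner.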

Investigation

•  Log in

•  Check the logs

•  Analyze metrics

•  Review wikis

•  Discuss w/ team

Victor’s Tips

“Collaborate & Share.”

“Connect with the right resources and team members.”

Identification

“Everything will be better if I fix this one thing.”

Victor’s Tips

“Provide quick access to accurate metrics & runbooks.”

Resolution

Self-documenting what teams do to solve the problem

Bidirectional integration with your favorite chat client and the VictorOps timeline

Team members performing system actions to fix the problem(s)

Victor’s Tips

“Be vocal & share what is taking place.”

Documentation

Write down and talk about what we did

Runbook

Victor’s Tips

“Conduct (blameless) post-mortems.”

Summary

“Conduct (blameless) post-mortems.”

“Be vocal & share what is taking place.”

“Provide quick access to accurate metrics & runbooks.”

“Collaborate & Share.”

“Connect with the right resources and team members.”

“Get the right alerts to the right people through routing.”

“Establish a single source of truth for all activities of an incident.”

“Include useful content & context in the alerts.”

“Use custom notifications to distinguish critical alerts.”

Thank You
