Antifragile, Microservices and DevOps - A Study

Anti-fragility, Microservices & DevOps - A Study

By William

Agenda

• The Principle of Anti-fragility
• Microservices Architecture
• The Principle of DevOps

Topic: What’s the Antonym of Fragile?

• Robust?
• Anti-fragile

Fragile

Shatters when exposed to even a small stressor.

Robust

The Problem of Robust

• Robust is just Fragile with a thicker skin…
• Encourages a defensive, static mindset
• Resistant to change?
• Vulnerable to “Black Swan” events…
  – Something we haven’t anticipated
  – A failure mode we can’t have foreseen
  – A cascade of errors that we did not plan for

Black Swans

Anti-fragile

When exposed to stress it gets stronger

Anti-fragile

Some things benefit from shocks…volatility, randomness, disorder, and stressors and love adventure, risk, and uncertainty… there is no word for the exact opposite of fragile. Let’s call it antifragile.

Nassim N. Taleb, “Antifragile: Things That Gain from Disorder”

Triple Prism of Fragile, Robust & Anti-fragile

                      Fragile               Robust                    Anti-fragile
Icon                  Glass                 Medieval Castle           DNA/Muscle
Methodology           “Spaghetti”           ITIL                      DevOps
Attitude to change    Fear Change           Resist Change             Embrace Change
Response to change    Break                 Repel                     Adapt
Rate of change        Ideally never!        Slow                      Rapid
Change initiated by   Needs CEO approval    Change Management Board   User-initiated (via automation)
Focuses on            Survival              Process                   Business Value

http://blog.devopsguys.com/2013/07/17/devops-antifragility-and-the-borg-collective/

Is the System in Your Company

• Fragile?
• Robust??
• Anti-fragile???

Anti-fragile Microservices Architecture

Microservices Architecture – A Case in Practice

Service Dependency

Single Dependency Delay Causing Blocking of User Request

All User Requests will be Blocked at Peak Hour (Cascading Failure)
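A back-of-the-envelope calculation shows why one slow dependency can block every request. The sketch below applies Little's Law (requests in flight = arrival rate × time per request). The pool size, request rate, and latencies are made-up illustrative numbers, not figures from the case described in the slides.

```python
def threads_in_use(requests_per_second: float, latency_seconds: float) -> float:
    """Little's Law: average number of requests in flight at steady state."""
    return requests_per_second * latency_seconds


def pool_exhausted(in_flight: float, pool_size: int) -> bool:
    """True when demand exceeds the shared request thread pool."""
    return in_flight > pool_size


POOL_SIZE = 200      # hypothetical size of the shared request thread pool
PEAK_RATE = 1000.0   # hypothetical peak traffic, in requests per second

healthy = threads_in_use(PEAK_RATE, 0.05)   # dependency answers in 50 ms
degraded = threads_in_use(PEAK_RATE, 2.0)   # same dependency slows to 2 s

print("healthy exhausted: ", pool_exhausted(healthy, POOL_SIZE))
print("degraded exhausted:", pool_exhausted(degraded, POOL_SIZE))
```

Under these assumed numbers the healthy case needs about 50 threads, while the degraded case needs about 2000, ten times the pool. Every thread ends up waiting on the slow dependency, so requests that never touch it are blocked as well.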

Circuit Breaker & Bulkhead Isolation Pattern

https://github.com/Netflix/Hystrix
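The two patterns can be illustrated with a minimal sketch. This is not the Hystrix API (Hystrix is a Java library); the class names `CircuitBreaker` and `Bulkhead`, their parameters, and the thresholds are hypothetical and chosen only for illustration. The breaker is not thread-safe as written.

```python
import threading
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; allows a trial
    call again once `reset_after` seconds have passed (hypothetical sketch)."""

    def __init__(self, max_failures=3, reset_after=5.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # fail fast, do not touch the dependency
            # otherwise half-open: let this one trial call through
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


class Bulkhead:
    """Caps concurrent calls to one dependency so it cannot use every thread."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, fallback):
        if not self._slots.acquire(blocking=False):
            return fallback()  # compartment full: reject instead of queueing
        try:
            return func()
        finally:
            self._slots.release()
```

The breaker stops calling a dependency that keeps failing, and the bulkhead limits how many callers can wait on it at once. Both return a fallback instead of letting the delay spread to the user request.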

Cross IDC Active - Active

[Diagram: two data centers, DC 1 and DC 2, behind GSLB. Each data center contains a DC Aware Gateway, an SOA Edge Service, an SOA Middle Tier Service, a Service Registry, and DC Aware Clients. Arrows are labeled Register, Lookup, Invoke, Peer Sync, and Fallback Invocation.]
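The "Fallback Invocation" label suggests that a DC-aware client prefers instances in its own data center and calls the other one only when the local call fails. A minimal sketch of that idea follows. The `DcAwareClient` class, the dictionary-based registry, and the endpoint functions are hypothetical stand-ins; the slides do not show the real implementation.

```python
class ServiceUnavailable(Exception):
    """Raised when no data center could serve the call."""


class DcAwareClient:
    def __init__(self, local_dc, registry):
        # registry: hypothetical mapping {service_name: {dc_name: callable}}
        self.local_dc = local_dc
        self.registry = registry

    def invoke(self, service, *args):
        instances = self.registry.get(service, {})  # "Lookup"
        # Try the local data center first, then the remaining ones.
        order = [self.local_dc] + sorted(
            dc for dc in instances if dc != self.local_dc
        )
        last_error = None
        for dc in order:
            endpoint = instances.get(dc)
            if endpoint is None:
                continue
            try:
                return endpoint(*args)  # "Invoke" or "Fallback Invocation"
            except Exception as exc:
                last_error = exc
        raise ServiceUnavailable(service) from last_error


def failing_local(order_id):
    raise ConnectionError("DC 1 instance is down")


def healthy_remote(order_id):
    return f"order {order_id} served from DC 2"


registry = {"orders": {"dc1": failing_local, "dc2": healthy_remote}}
client = DcAwareClient(local_dc="dc1", registry=registry)
print(client.invoke("orders", 42))
```

In this sketch the call succeeds through DC 2 even though the local instance is down, which is the behavior an active-active deployment relies on.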

Building Distributed Systems is Extremely Hard

• Even Harder to Test Sufficiently
  – Massive data sets and changing shape
  – Internet-scale traffic
  – Complex interaction and information flow
  – Asynchronous nature
  – 3rd party services
  – All while innovating and building features

Prohibitively expensive, if not impossible, for most large-scale systems.

There is another Way

• Assume everything will fail
• Cause failure to validate resiliency
• Test design assumptions by stressing them
• Don’t wait for random failure. Remove its uncertainty by forcing it periodically.
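The points above can be sketched as a small fault-injection drill: kill an instance on purpose, then check that the service still answers. Everything here is a toy in-memory model. The `Instance`, `Cluster`, and `resilience_drill` names are hypothetical, and no real infrastructure or tool is involved.

```python
import random


class Instance:
    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class Cluster:
    def __init__(self, names):
        self.instances = [Instance(n) for n in names]

    def handle(self, request):
        # Try each instance in turn; skip the ones that are down.
        for instance in self.instances:
            try:
                return instance.handle(request)
            except ConnectionError:
                continue
        raise ConnectionError("no healthy instance left")


def kill_random_instance(cluster, rng):
    """Force a failure instead of waiting for one to happen."""
    victim = rng.choice([i for i in cluster.instances if i.alive])
    victim.alive = False
    return victim


def resilience_drill(cluster, rng, rounds):
    """Kill one instance per round and verify the cluster still answers."""
    for _ in range(rounds):
        kill_random_instance(cluster, rng)
        cluster.handle("ping")  # raises if the design assumption was wrong


cluster = Cluster(["a", "b", "c"])
resilience_drill(cluster, random.Random(0), rounds=2)
print("instances still alive:", sum(i.alive for i in cluster.instances))
```

A drill that raises is the useful outcome: it shows, at a time of our choosing, that the system cannot survive the failure it was assumed to survive.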

What Netflix has Done – Embrace Chaos!

“One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.”

http://luckyrobot.com/netflix-chaos-monkey-keeps-movies-streaming/

http://www.codinghorror.com/blog/2011/04/working-with-the-chaos-monkey.html

Netflix Simian Army

Representative Anti-fragile Organization

The Netflix cloud architecture is anti-fragile… The Netflix culture is anti-fragile… Getting stronger through failure is the basis of anti-fragility. Avoiding failure at all costs… makes you brittle and vulnerable…

Adrian Cockcroft, “Looking back at 2012 with pointers to 2013”
http://perfcap.blogspot.com/2013/12/looking-back-at-2013-with-pointers-to.html

Architecture for Imperfection

A highly agile and highly available service constructed from ephemeral and often broken components. It is a service-oriented architecture built on micro-services, none of which are essential to the operation of the whole.

The software is written to run across three Amazon datacenters, and will tolerate the loss of any one. We can lose a third of our infrastructure without our customers noticing and calling customer services, it’s no idle claim, Netflix even tests this aspect of its infrastructure. A few weeks ago the team deliberately killed one of the three zones, knocking out 3000 servers in one fell swoop, just to prove that we could do it.

By Adrian Cockcroft, from “Netflix, HANA and the meaning of cloud”
http://diginomica.com/2013/05/13/netflix-hana-and-the-meaning-of-cloud/

Netflix Global Active – Active Cloud Architecture

http://awsmedia.s3.amazonaws.com/ARC305.pdf

What on Earth is DevOps

Devops means giving a sh*t about your job enough to not pass the buck.
Devops means giving a sh*t about your job enough to want to learn all the parts and not just your little world.
Developers need to understand infrastructure.
Operations people need to understand code.
- John E. Vincent (@Lusis)

http://blog.lusis.org/blog/2013/06/04/devops-the-title-match/

The First Way

Silo vs. System Thinking, focus on the end to end value flow.

The Second Way

System improvement via visibility, feedback and data driven decisions

The Third Way

• Embrace Change
• Be willing to Experiment
• Learn from your mistakes

Microservices Organizational Structure

Take Away

1. Obsessive protection of a system against extremely rare events makes it more fragile.

2. Monoculture is fragile, diversity is anti-fragile.

3. If it hurts, do it more often, and bring the pain forward.

4. To create an anti-fragile system, stress it continuously so that we are forced to simplify and automate.

Reading for System and Architectural Thinking – recommended by Adrian Cockcroft
