88
Corpus collapsum Partition tolerance of Galera in a noisy high load environment Highload++ 2014 Raghavendra Prabhu [email protected] Percona [email protected] randomsurfer wnohang.net rdprabhu ronin13

Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

  • Upload
    ontico

  • View
    271

  • Download
    0

Embed Size (px)

DESCRIPTION

Доклад Рагавендра Прабу на HighLoad++ 2014.

Citation preview

Page 1: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Corpus collapsumPartition tolerance of Galera in a noisy high load

environmentHighload++ 2014

Raghavendra Prabhu [email protected]

Percona [email protected] randomsurfer wnohang.net rdprabhu ronin13

Page 2: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

The Title?

Page 3: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Our Cluster

Page 4: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Split brain

Page 5: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Introduction

Seed quotes..

“ ’Network is reliable’ - a fallacy of the distributed system. ”“ A distributed system is one in which the failure of a computer you didn’t

even know existed can render your own computer unusable. ” - Leslie Lamport“ Never attribute to malice that which is adequately explained by stupidity.

” - Hanlon’s Razor“ Never attribute to Byzantine failure which can be explained by an ill

node(s) ” - Me

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 5 / 58

Page 6: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Introduction

Seed quotes..

“ ’Network is reliable’ - a fallacy of the distributed system. ”“ A distributed system is one in which the failure of a computer you didn’t

even know existed can render your own computer unusable. ” - Leslie Lamport“ Never attribute to malice that which is adequately explained by stupidity.

” - Hanlon’s Razor“ Never attribute to Byzantine failure which can be explained by an ill

node(s) ” - Me

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 5 / 58

Page 7: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Introduction

Seed quotes..

“ ’Network is reliable’ - a fallacy of the distributed system. ”“ A distributed system is one in which the failure of a computer you didn’t

even know existed can render your own computer unusable. ” - Leslie Lamport“ Never attribute to malice that which is adequately explained by stupidity.

” - Hanlon’s Razor“ Never attribute to Byzantine failure which can be explained by an ill

node(s) ” - Me

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 5 / 58

Page 8: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Introduction

Seed quotes..

“ ’Network is reliable’ - a fallacy of the distributed system. ”“ A distributed system is one in which the failure of a computer you didn’t

even know existed can render your own computer unusable. ” - Leslie Lamport“ Never attribute to malice that which is adequately explained by stupidity.

” - Hanlon’s Razor“ Never attribute to Byzantine failure which can be explained by an ill

node(s) ” - Me

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 5 / 58

Page 9: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

20000 feet view

Page 10: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Introduction

Actors

▶ Database - WSREP/PXC▶ Plugin - Galera▶ Traffic control

♦ Traffic Control - tc♦ NetEm

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 7 / 58

Page 11: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Introduction

Actors

▶ Database - WSREP/PXC▶ Plugin - Galera▶ Traffic control

♦ Traffic Control - tc♦ NetEm

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 7 / 58

Page 12: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Introduction

Actors

▶ Database - WSREP/PXC▶ Plugin - Galera▶ Traffic control

♦ Traffic Control - tc♦ NetEm

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 7 / 58

Page 13: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Introduction

Actors

▶ Containers - Docker▶ Load

♦ Generators - Sysbench, RQG▶ Network

♦ Dnsmasq♦ nsenter

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 8 / 58

Page 14: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Introduction

Actors

▶ Containers - Docker▶ Load

♦ Generators - Sysbench, RQG▶ Network

♦ Dnsmasq♦ nsenter

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 8 / 58

Page 15: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Introduction

Actors

▶ Jenkins♦ Build flow and CI

▶ Storage♦ Why

▶ “Others”

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 9 / 58

Page 16: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

But why

▶ The ’P’ in CAP▶ WAN scalability▶ Real Reason - fun!▶ Tolerance to latency variance

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 10 / 58

Page 17: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

But why

▶ The ’P’ in CAP▶ WAN scalability▶ Real Reason - fun!▶ Tolerance to latency variance

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 10 / 58

Page 18: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

But why

▶ The ’P’ in CAP▶ WAN scalability▶ Real Reason - fun!▶ Tolerance to latency variance

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 10 / 58

Page 19: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

But why

▶ The ’P’ in CAP▶ WAN scalability▶ Real Reason - fun!▶ Tolerance to latency variance

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 10 / 58

Page 20: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

But why

▶ Failures in warehouses.▶ Not quorum, but consensus.▶ Real world networks and synchronous replication

- Delay- Partition

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 11 / 58

Page 21: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Galera

Page 22: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Galera

▶ Data-centric approach▶ EVS▶ Causality and Synchronous▶ Latency

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 13 / 58

Page 23: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)
Page 24: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)
Page 25: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)
Page 26: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Where did it start

Page 27: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Where did it start

▶ Bug! https://bugs.launchpad.net/galera/+bug/1274192▶ Loss of PC▶ Crash▶ HA goal

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 18 / 58

Page 28: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

One can bring the whole down

Page 29: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

The Flow

Page 30: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic Flow

Jenkins Build images Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 21 / 58

Page 31: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic Flow

Jenkins Build images Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 21 / 58

Page 32: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic Flow

Jenkins Build images Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 21 / 58

Page 33: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic Flow

Jenkins Build images Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 21 / 58

Page 34: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic Flow

Jenkins Build images Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 21 / 58

Page 35: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic Flow

Jenkins Build images Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 21 / 58

Page 36: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic Flow

Jenkins Build images Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 21 / 58

Page 37: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic Flow

Jenkins Build images Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 21 / 58

Page 38: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic FlowRR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logsRaghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 22 / 58

Page 39: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic FlowRR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logsRaghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 22 / 58

Page 40: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic FlowRR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logsRaghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 22 / 58

Page 41: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic FlowRR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logsRaghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 22 / 58

Page 42: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic FlowRR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logsRaghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 22 / 58

Page 43: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic FlowRR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logsRaghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 22 / 58

Page 44: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic FlowRR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logsRaghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 22 / 58

Page 45: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Basic FlowRR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logsRaghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 22 / 58

Page 46: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Cluster Resilience

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 23 / 58

Page 47: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Parameters

▶ Sysbench▶ Segment▶ Reconciliation period▶ Loss nodes

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 24 / 58

Page 48: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Parameters

▶ Sysbench▶ Segment▶ Reconciliation period▶ Loss nodes

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 24 / 58

Page 49: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Parameters

▶ Sysbench▶ Segment▶ Reconciliation period▶ Loss nodes

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 24 / 58

Page 50: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Parameters

▶ Sysbench▶ Segment▶ Reconciliation period▶ Loss nodes

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 24 / 58

Page 51: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Parameters

▶ NetEm▶ Detach loss▶ Fsync▶ Shutdown

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 25 / 58

Page 52: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Parameters

▶ NetEm▶ Detach loss▶ Fsync▶ Shutdown

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 25 / 58

Page 53: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Parameters

▶ NetEm▶ Detach loss▶ Fsync▶ Shutdown

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 25 / 58

Page 54: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Parameters

▶ NetEm▶ Detach loss▶ Fsync▶ Shutdown

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 25 / 58

Page 55: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Containers!

Page 56: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Docker

▶ Why not virtualize♦ Occam♦ Namespaces

▶ Simplicity♦ Network♦ One application per node

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 27 / 58

Page 57: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Docker

▶ Portability- See same qualitative behavior that I do.

▶ Reproducibility- Makes it determinstic

▶ Configurable and CI- Byproducts

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 28 / 58

Page 58: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Docker

▶ QEMU and Docker▶ Scalability

♦ Performance♦ Feature

▶ Abstraction of channels

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 29 / 58

Page 59: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Container Networking

▶ Linking didn’t help▶ Dnsmasq to rescue!

♦ Hosts file and volumes♦ SIGHUP and refresh

▶ More elegant methodsSwarm

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 30 / 58

Page 60: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Noise

▶ Initial setup- Bridge- Egress only- IFB

▶ Present state▶ NetEm

- tc qdisc buckets- packet loss, delay, corruption, duplication, reordering- nsenter

▶ Future- Docker exec

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 31 / 58

Page 61: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Testing methods

Page 62: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Method I

▶ Qdisc is detached after load▶ Objective

- Time to recover of full cluster▶ Done with a larger subset

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 33 / 58

Page 63: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Method II

▶ Qdisc is kept till the end▶ Objective

- Formation of primary component▶ Comparatively smaller set

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 34 / 58

Page 64: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Observations

▶ Post sanity types- Why

▶ Which method is more pertinent▶ State transfer issues

- Beginning- During re-emergence

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 35 / 58

Page 65: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Observations

▶ Direct load to affected nodes▶ Logs

- journalctl- Streaming?

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 36 / 58

Page 66: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Other noises

▶ Aim▶ Fsync

- libeatmydata- Variance

▶ Correlation with network▶ How with Docker

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 37 / 58

Page 67: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

System Load

Page 68: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Load generation

▶ Sysbench- Generation- Reconnect on partition

▶ Sockets chosen- Load on affected nodes

▶ Distribution of Load- RR with socat- Native sysbench support

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 39 / 58

Page 69: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Load generation

▶ Nature of data/load- DDL

▶ RQG in future- Fuzz testing

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 40 / 58

Page 70: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

The Fix

Page 71: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Strike Out!

Page 72: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Eviction

▶ STONITH▶ Permanent eviction▶ ’N’ strikes & out!

- Timers - evs parameters- wsrep_evs_delayed and wsrep_evs_evict_list

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 43 / 58

Page 73: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Eviction

▶ Aim▶ Quorum required

- Why? - Not shoot each other - Non-PC nodes also.

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 44 / 58

Page 74: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Eviction

▶ Aim▶ Quorum required

- Why? - Not shoot each other - Non-PC nodes also.

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 44 / 58

Page 75: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Eviction

▶ EVS version and upgrade▶ TODO!

- Ingress only - Follow here.▶ Credits to Teemu Ollakka, Yan Zhang and Alex Yurchenko from codership.

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 45 / 58

Page 76: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

Coredumps with Docker

▶ Breakdown of abstraction▶ Lack of isolation▶ What was done

- Volumes- core_pattern & sysctl- suid and ulimit

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 46 / 58

Page 77: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Details

WAN Segments

▶ How they work▶ Random allocation▶ Joiner starvation▶ Simulates data center▶ Donor selection

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 47 / 58

Page 78: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Epilogue

The code

▶ Github: https://github.com/percona/pxc-docker▶ Jenkins: http://jenkins.percona.com/job/PXC-5.6-netem/

- Demo?▶ Contributions/testing welcome!▶ Dependencies

- Sysbench

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 48 / 58

Page 79: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Epilogue

Code: todo

▶ Docker automated builds▶ Orchestration▶ Docker

♦ Injection♦ Signal proxying

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 49 / 58

Page 80: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Epilogue

Code: todo

▶ Use Hoare’s channels - Go!▶ Run it bare - CoreOS▶ Overlay with etcd/fleet/libswarm

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 50 / 58

Page 81: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Future work

Page 82: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Epilogue

Future work

▶ Fault injection♦ Memory

- Poisoned memory♦ Disk

- libeatmydata- Opposite: laggard!- ENOSPC

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 52 / 58

Page 83: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Epilogue

Fault injection

▶ CPU- NUMA?- Hotplug

▶ More network- corruption, duplication, reordering, rate-limit- Better distribution- Other shaping

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 53 / 58

Page 84: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

More Chaos

Page 85: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Epilogue

Future work

▶ Disturb cluster more!- Membership changes* Manual eviction* Pull the cord!- Corrupt nodes

▶ Consistency voting

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 55 / 58

Page 86: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Epilogue

Further Reading

▶ Byzantine fault tolerance- Reaching agreement in presence of faults

▶ The Network is Reliable▶ NetEm▶ Latency: The New Web Performance Bottleneck▶ Galera▶ Auto eviction code▶ Don’t Settle for Eventual Consistency

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 56 / 58

Page 87: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Epilogue

About

▶ /me: Raghavendra Prabhu, Product Lead, Percona XtraDB Cluster, Percona.▶ Slides will be at slideshare and owncloud▶ Keybase.io: rdprabhu▶ About.me: raghavendra.prabhu▶ Keybase.io: rdprabhu▶ Presentation under CC BY-SA 4.0

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 57 / 58

Page 88: Corpus collapsum - устойчивость Galera к партиционированию, Raghavendra Prabhu (Percona)

Epilogue

Image Credits▶ http://galeracluster.com/documentation-webpages/▶ http://www.thelastdragontribute.com/40th-anniversary-death-of-bruce-lee/▶ https://upload.wikimedia.org/wikipedia/commons/6/60/Corpus_callosum.png▶ http://www.thebarrow.org/Neurological_Services/Epilepsy/204354▶ https://flic.kr/p/9J6GNu▶ https://secure.flickr.com/photos/brewbooks/7780990192▶ https://www.flickr.com/photos/kwerfeldein/2649294869▶ https://secure.flickr.com/photos/mindmob/51951632▶ https://secure.flickr.com/photos/arenamontanus/2227769907▶ https://www.flickr.com/photos/markop/477199204▶ http://galeracluster.com/wp-content/uploads/2013/10/galera_replication1.png▶ https://www.flickr.com/photos/gcwest/281385801▶ https://www.flickr.com/photos/opethdamna/360934079▶ http://digital-amphetamine.deviantart.com/art/Sky-82555664▶ http://highload.co/i/logo.png▶ https://flic.kr/p/xTT8n▶ https://www.flickr.com/photos/29233640@N07/13466208953▶ https://www.flickr.com/photos/bob_in_thailand/9782777742/

Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 58 / 58