Gustavo Barrancos (@gbarrancos) Rafael Ferreira (@rafaeldff) Cloud Reliability Patterns @ Nubank

Cloud Reliability Patterns

Page 1: Cloud Reliability Patterns

Gustavo Barrancos (@gbarrancos) Rafael Ferreira (@rafaeldff)

Cloud Reliability Patterns @ Nubank

Page 2: Cloud Reliability Patterns

“Nu Minimal Keynote Template” © Nu Bank - 05.01.2014 2

Complex systems usually operate in failure mode

John Gall

Page 3: Cloud Reliability Patterns

Growth

Page 4: Cloud Reliability Patterns

Microservices

Page 5: Cloud Reliability Patterns

Asynchronous Messaging

Page 6: Cloud Reliability Patterns

Asynchronous Messaging

• Decoupling
• Pub/Sub
• Fault Isolation
• Load Distribution
• Operability
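The decoupling and fault-isolation points above can be illustrated with a toy in-memory broker. This is only a sketch under stated assumptions: `Broker`, `subscribe`, `publish`, and `deliver_pending` are hypothetical names, and nothing here reflects Kafka's API or Nubank's setup. The producer enqueues and returns immediately; a failing subscriber is recorded without affecting the other subscribers or the producer.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class Broker:
    """Toy in-memory stand-in for a message broker (hypothetical, not Kafka)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._pending: Deque[Tuple[str, dict]] = deque()
        self.failures: List[Tuple[str, str, str]] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Pub/Sub: any number of consumers can attach to a topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Decoupling: the producer only enqueues; it never calls consumers.
        self._pending.append((topic, message))

    def deliver_pending(self) -> int:
        """Deliver queued messages; return the number of successful deliveries."""
        delivered = 0
        while self._pending:
            topic, message = self._pending.popleft()
            for handler in self._subscribers[topic]:
                try:
                    handler(message)
                    delivered += 1
                except Exception as exc:
                    # Fault isolation: record the failure and keep going.
                    name = getattr(handler, "__name__", repr(handler))
                    self.failures.append((topic, name, repr(exc)))
        return delivered
```

In this sketch delivery happens when `deliver_pending` is called; a real broker would deliver from separate consumer processes.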

Page 7: Cloud Reliability Patterns

Asynchronous Messaging

Concerns
• Complexity
• Harder to test
• Single Point of Failure

Page 8: Cloud Reliability Patterns

Asynchronous Messaging

Apache Kafka

Page 9: Cloud Reliability Patterns

Blue-Green Deployments

Page 10: Cloud Reliability Patterns

Blue-Green Deployments

“A release technique that reduces downtime and risk by running two identical production environments: Blue and Green”

https://docs.cloudfoundry.org/devguide/deploy-apps/blue-green.html
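The definition quoted above can be sketched as a router that points at one of two environments. This is a minimal illustration, assuming a hypothetical `Router` and `Environment`; the `healthy` argument stands in for whatever health check gates the switch, and none of this is an AWS or Cloud Foundry API. A new version goes to the idle environment, traffic moves only if that environment is healthy, and rollback is just pointing back.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    name: str
    version: str
    healthy: bool = True


class Router:
    """Hypothetical traffic switch between two identical environments."""

    def __init__(self, blue: Environment, green: Environment) -> None:
        self.envs = {"blue": blue, "green": green}
        self.live = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str, healthy: bool = True) -> bool:
        """Install `version` on the idle environment; switch only if healthy."""
        target = self.envs[self.idle]
        target.version = version
        target.healthy = healthy  # assumption: result of an external check
        if not target.healthy:
            return False  # live traffic is untouched
        self.live = self.idle
        return True

    def rollback(self) -> None:
        # The previous environment is still running the old version.
        self.live = self.idle
```

The previous environment is left running after a switch, which is what keeps rollback cheap.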

Page 11: Cloud Reliability Patterns

Blue-Green Deployments - AWS Autoscaling Groups

Elastic Load Balancer

Page 12: Cloud Reliability Patterns

Blue-Green Deployments - AWS Elastic Network Interface

Page 13: Cloud Reliability Patterns

Pervasive Monitoring

Page 14: Cloud Reliability Patterns

Pervasive Monitoring - Systems Health Metrics

Page 15: Cloud Reliability Patterns

Riemann
http://www.riemann.io

Page 16: Cloud Reliability Patterns

Pervasive Monitoring - Riemann

Page 17: Cloud Reliability Patterns

Pervasive Monitoring - Business Metrics

Page 18: Cloud Reliability Patterns

Pervasive Monitoring - Errors

Page 19: Cloud Reliability Patterns

Traceability

Page 20: Cloud Reliability Patterns

Traceability - log all I/O

2016-03-28T17:50:23.991Z [ACQUISITION] INFO - {:cid "DEFAULT.6DTVD.PUQPE.EQ36K.4XQGO", :log :out-request, :method :get, :uri "https://prod-customers.nubank.com.br/api/customers/21ede21-1a6b-44dc-b731-f58279edd421"}

2016-03-28T17:50:23.993Z [CUSTOMERS] INFO - {:cid "DEFAULT.6DTVD.PUQPE.EQ36K.4XQGO.WHBVH", :log :in-request, :method :get, :path "/api/customers/:id"}
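In the two log lines above, the cid logged by CUSTOMERS is the cid logged by ACQUISITION plus one extra segment (`.WHBVH`). A sketch of that idea follows. The generation scheme is an assumption inferred only from these two lines (five characters drawn from uppercase letters and digits, joined with dots); the function names `new_segment`, `child_cid`, and `is_descendant` are hypothetical.

```python
import random
import string

# Assumption: segments look like "4XQGO", i.e. uppercase letters and digits.
ALPHABET = string.ascii_uppercase + string.digits


def new_segment(length: int = 5) -> str:
    """Return a random segment to identify one hop in a call chain."""
    return "".join(random.choice(ALPHABET) for _ in range(length))


def child_cid(parent: str) -> str:
    """Extend the incoming correlation id before making a downstream call."""
    return f"{parent}.{new_segment()}"


def is_descendant(cid: str, ancestor: str) -> bool:
    """True if `cid` was derived from `ancestor` through one or more hops."""
    return cid.startswith(ancestor + ".")
```

With ids shaped this way, filtering logs by a cid prefix would return the request and everything it triggered downstream.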

Page 21: Cloud Reliability Patterns

Traceability - Correlation Id

Page 22: Cloud Reliability Patterns

Healthchecks

Page 23: Cloud Reliability Patterns

Healthchecks

Elastic Load Balancer

Page 24: Cloud Reliability Patterns

Healthchecks - Heartbeats

Elastic Load Balancer
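One common way to implement heartbeats is for each service to report periodically and for a monitor to flag any service that has gone quiet for too long. The sketch below assumes that design; `HeartbeatMonitor`, `beat`, and `unhealthy` are hypothetical names, timestamps are passed in explicitly to keep the example deterministic, and it does not describe how the Elastic Load Balancer performs its own checks.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Hypothetical monitor: a service is unhealthy once it stops reporting."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def beat(self, service: str, now: float) -> None:
        # Called by (or on behalf of) a service each time it reports in.
        self._last_seen[service] = now

    def unhealthy(self, now: float) -> List[str]:
        """Return services whose last heartbeat is older than the timeout."""
        return sorted(
            service
            for service, seen_at in self._last_seen.items()
            if now - seen_at > self.timeout_s
        )
```

The timeout should be a few multiples of the reporting interval so that one delayed beat does not raise a false alarm.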

Page 25: Cloud Reliability Patterns

Deadletters

Page 26: Cloud Reliability Patterns

Deadletters

Elastic Load Balancer
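The dead-letter idea can be sketched as a consumer that retries each message a bounded number of times and sets aside those that still fail, so that one bad message does not block the rest. This is an illustration under assumptions: `consume`, `max_attempts`, and the dead-letter record shape are hypothetical, and a real system would publish to a dedicated topic or queue rather than return a list.

```python
from typing import Callable, Iterable, List, Optional


def consume(
    messages: Iterable[dict],
    handler: Callable[[dict], None],
    max_attempts: int = 3,
) -> List[dict]:
    """Process messages; return the ones that failed every attempt."""
    dead_letters: List[dict] = []
    for message in messages:
        last_error: Optional[str] = None
        for _ in range(max_attempts):
            try:
                handler(message)
                break  # processed; move on to the next message
            except Exception as exc:
                last_error = repr(exc)
        else:
            # Every attempt failed: keep the message and the reason so it
            # can be inspected and replayed later.
            dead_letters.append(
                {"message": message, "error": last_error, "attempts": max_attempts}
            )
    return dead_letters
```

Keeping the original message together with the error is what makes later replay possible once the underlying bug is fixed.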

Page 27: Cloud Reliability Patterns

Circuit Breakers

Page 28: Cloud Reliability Patterns

Circuit Breakers

Elastic Load Balancer
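A circuit breaker wraps calls to a dependency and, after repeated failures, fails fast instead of calling it again until a cool-down has passed. The sketch below shows the usual closed / open / half-open cycle; `CircuitBreaker`, `CircuitOpenError`, and the default threshold and timeout are hypothetical choices for illustration, not taken from the talk or from any particular library.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"  # cool-down elapsed: allow one trial call
        return "open"

    def call(self, fn: Callable, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            tripped = self._failures >= self.failure_threshold
            if tripped or self._opened_at is not None:
                # Trip the breaker, or re-open it after a failed trial call.
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The `clock` parameter exists only so the cool-down can be tested without sleeping.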

Page 29: Cloud Reliability Patterns

Questions?

(we are hiring! http://nubank.workable.com)