Events, buses, and data integration in the complicated world of microservices / Valentine Gogichashvili (Zalando SE)


Events, buses, and data integration in the complicated world of microservices

Valentine Gogichashvili

Head of Data Engineering @ZalandoTech twitter: @valgog google+: +valgog email: [email protected]

15 countries

4 fulfillment centers

18+ million active customers

~3 billion € revenue

150,000+ products

10,000+ employees

135 million visits per month

One of Europe's largest online fashion retailers

Zalando Technology

BERLIN, DORTMUND, DUBLIN, HELSINKI, ERFURT, MÖNCHENGLADBACH, HAMBURG

Zalando Technology

1500+ Technologists

Rapidly growing international team

http://tech.zalando.de

Good old small world

Started as a tiny online shop

Prototyped on Magento (PHP)

Used MySQL as a database

Web Application → Backend → Database

REBOOT

5½ years ago

• Java

• macro service architecture with SOAP as RPC layer

• PostgreSQL

• Heavy usage of Stored Procedures

• 4 databases + 1 sharded database on 2 shards

• Python for tooling (e.g. code deploy automation)

REBOOT

A Java Web Frontend in front of several Java Backend "macro" services, each with its own PostgreSQL database (PostgreSQL 9.0 RC1 at the time).

REBOOT

• PostgreSQL

• Heavy usage of Stored Procedures

• clean transaction scope

• very clean data

• processing close to data

REBOOT

• PostgreSQL

• Java Sproc Wrapper

• complex type mapping

• transparent sharding
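The Java Sproc Wrapper lets backend code call PostgreSQL stored procedures as ordinary typed methods, handling complex type mapping and routing calls to the right shard. As a rough illustration of the underlying pattern (the stored procedure is the data API and owns the transaction scope), here is a minimal Python/psycopg2 sketch; it is not the Java Sproc Wrapper itself, and the schema and function names are hypothetical.

```python
# Minimal sketch of the "stored procedure as the data API" pattern.
# Not the Java Sproc Wrapper; order_api.create_order is a hypothetical sproc.
import psycopg2

def create_order(conn, customer_id, items):
    """Call a stored procedure that owns the whole transaction scope."""
    with conn:                      # one clean transaction around the call
        with conn.cursor() as cur:
            cur.execute(
                "SELECT order_id FROM order_api.create_order(%s, %s)",
                (customer_id, items),
            )
            return cur.fetchone()[0]

if __name__ == "__main__":
    # Connection details are placeholders.
    conn = psycopg2.connect("dbname=shop user=app")
    print(create_order(conn, 42, ["sku-1", "sku-2"]))
```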

Live long and prosper…

Very stable architecture that is still in use in the oldest (vintage) components.

We implemented everything ourselves, starting from warehouse and order management and finishing with the Web Shop and Mobile Applications.

Live long and prosper…

"I want to code in Scala/Clojure/Haskell because it is cool and compact"

"But nobody will be able to support your code if you leave the company, everybody should use Java, learn SQL and write Stored Procedures"

"I like the team and people but I am moving on to another company where I can use cooler technologies!"

Radical Agility

● Autonomy

● Purpose

● Mastery

Autonomous teams

● can choose own technology stack

● including persistence layer

● are responsible for operations

● should use isolated AWS accounts

Autonomy is not Anarchy

Supporting autonomy — STUPS

Supporting autonomy — Microservices

Each team (Team A, Team B) owns its own Business Logic and Data Storage.

Applications communicate using REST APIs over the public Internet.

Databases are hidden behind the walls of the AWS VPC.

Supporting autonomy — Microservices


Classical ETL process is impossible!

Data integration in the classical world

Three actors: the DBA, Business Intelligence, and the Developers.

Data integration in the classical world

The DBA runs an ETL process that extracts data from the application database (Business Logic + Database, owned by Dev) into a Data Warehouse (DWH) used by BI.

Data integration in the classical world

Classical ETL process: many applications (Business Logic + Database, owned by Dev teams), with the DBA extracting data from each database into the central Data Warehouse (DWH) for BI.

Data integration in the classical world

Classical ETL process

• Use-case specific — a lot of manual work

• Usually outputs data into a Data Warehouse

• well structured

• easy to use by the end user (SQL)

Data integration in the world of microservices

Many microservices, each with its own Business Logic and (usually) its own Database, exposing a REST API.

Data integration in the world of microservices

Queues to the rescue!

But will there be a unified bus?

What about form and schema?

Data integration in the world of microservices

Business Intelligence would have to pull data from every microservice (App A, App B, App C, App D) through its REST API.


Data integration in the world of microservices

Data-heavy services (ML and DDDM) face the same problem: they would have to pull data from every microservice (App A–D) through its REST API.

Nakadi Event Bus

• A secured HTTP API
This allows microservice teams to maintain service boundaries, and not directly depend on any specific message broker technology. Access to the API can be managed and secured using OAuth scopes;

• An event type registry
Events sent to Nakadi can be defined with a schema and managed via a registry. Events can be validated before they are distributed to consumers;

• Inbuilt event types
Nakadi has optional support for events describing business processes and data changes, using standard primitives for identity, timestamps, event types, and causality.
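To make the event type registry concrete, here is a hedged sketch of registering an event type with a JSON schema over Nakadi's HTTP API. The endpoint and field names follow Nakadi's public API, but the host, OAuth token, event type name and schema below are placeholders.

```python
# Hedged sketch: registering an event type in the Nakadi registry.
# Host, token and event type name are placeholders, not real values.
import json
import requests

NAKADI_URL = "https://nakadi.example.org"   # hypothetical endpoint
TOKEN = "my-oauth-token"                    # OAuth2 bearer token

event_type = {
    "name": "order.order-created",          # hypothetical event type
    "owning_application": "order-service",
    "category": "data",                     # a data-change event type
    "schema": {
        "type": "json_schema",
        # The JSON schema itself is passed as a string.
        "schema": json.dumps({
            "properties": {
                "order_number": {"type": "string"},
                "status": {"type": "string"},
            },
            "required": ["order_number"],
        }),
    },
}

resp = requests.post(
    f"{NAKADI_URL}/event-types",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=event_type,
)
resp.raise_for_status()   # incoming events are now validated against this schema
```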

Nakadi Event Bus

• Schema versioning
Enables schema evolution in a backwards-compatible way;

• Low latency event delivery
Once a publisher sends an event using a simple HTTP POST, consumers can be pushed to via a streaming HTTP connection, allowing near real-time event processing;

• Built on proven infrastructure
Nakadi uses Apache Kafka as its internal message broker and PostgreSQL as a backing database for storing metadata and schema information.
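Publishing, then, is just an HTTP POST of a batch of events to the event type's endpoint; for a data-change event Nakadi expects a metadata block (event id, timestamp) plus the data payload. A hedged sketch with placeholder host, token and payload:

```python
# Hedged sketch: publishing a data-change event batch with one HTTP POST.
# Host, token and payload values are placeholders.
import datetime
import uuid
import requests

NAKADI_URL = "https://nakadi.example.org"
TOKEN = "my-oauth-token"

event = {
    "metadata": {
        "eid": str(uuid.uuid4()),     # unique event id
        "occurred_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    },
    "data_type": "order",
    "data_op": "C",                   # C(reate), U(pdate), D(elete)
    "data": {"order_number": "10001", "status": "CREATED"},
}

resp = requests.post(
    f"{NAKADI_URL}/event-types/order.order-created/events",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=[event],                     # events are always sent as a batch
)
resp.raise_for_status()
```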

Nakadi Cluster

The Nakadi Broker combines:
• Schema validation
• Schema metadata store
• Event metadata enrichment
• ZooKeeper config store
• Management of offsets for the high-level API
• Messaging backend (Kafka now, Kinesis and others later)
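On the consuming side, clients read newline-delimited JSON batches from a streaming HTTP connection; each batch carries a cursor (partition and offset) so that the consumer, or Nakadi's high-level API, can track progress. A hedged low-level sketch with placeholder host and token:

```python
# Hedged sketch: consuming events over Nakadi's streaming HTTP connection
# (low-level API). Host, token and event type name are placeholders.
import json
import requests

NAKADI_URL = "https://nakadi.example.org"
TOKEN = "my-oauth-token"

with requests.get(
    f"{NAKADI_URL}/event-types/order.order-created/events",
    headers={"Authorization": f"Bearer {TOKEN}"},
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue                      # keep-alive chunk, nothing to process
        batch = json.loads(line)
        cursor = batch["cursor"]          # partition + offset for tracking progress
        for event in batch.get("events", []):
            print(cursor["partition"], cursor["offset"], event)
```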

Nakadi Event Bus — next steps

• Nakadi Clusters, physically separated per event type
• Nakadi UI
• Nakadi Schema Manager
• Nakadi Schema Crawler
• Nakadi Topology Manager

Bubuku Kafka AWS supervisor

• IP address discovery using AWS

• Exhibitor discovery using AWS load balancer name

• Rebalance partitions on different events

• React to Exhibitor topology changes

• Automatic Kafka restart if a broker is considered dead

• Graceful broker termination in case of supervisor stop

• Broker start/stop/restart synchronization across the cluster
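As one hedged illustration of the "restart Kafka if the broker is considered dead" point: a live Kafka broker keeps an ephemeral registration under /brokers/ids in ZooKeeper, so a supervisor can poll for that znode and restart the local broker when it disappears. This is not Bubuku's actual implementation; the broker id, ZooKeeper address and restart command below are placeholders.

```python
# Hedged sketch of one supervisor responsibility: restart Kafka when the
# local broker looks dead. Not Bubuku's real code; all values are placeholders.
import subprocess
import time
from kazoo.client import KazooClient

BROKER_ID = "1"
ZK_HOSTS = "zookeeper.example.org:2181"

def broker_registered(zk):
    # A live Kafka broker keeps an ephemeral znode under /brokers/ids/<id>.
    return zk.exists(f"/brokers/ids/{BROKER_ID}") is not None

def main():
    zk = KazooClient(hosts=ZK_HOSTS)
    zk.start()
    while True:
        if not broker_registered(zk):
            # Broker considered dead: restart the local Kafka service.
            subprocess.run(["systemctl", "restart", "kafka"], check=False)
        time.sleep(30)

if __name__ == "__main__":
    main()
```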