Keystone Event Processing Pipeline on a Dockerized Microservices Architecture
Zhenzhong Xu
About Me
Real-time Data Infrastructure – Netflix
Cloud Infrastructure - Microsoft
About Netflix
● 83M+ Subscribers
● 125M+ Streaming Hours / Day
● > 1/3 Peak NA Internet Traffic
● Thousands of Device Types
● Many Tens of Thousands of VMs
● 3 Active-Active Regions Across the World
Observe
Orient
Decide
Act
Innovation
Big Data
Culture
Cloud
CD
Microservices Ecosystem
Why an Event Processing Platform at Netflix?
● 500+ Billion events generated per day
● 1+ trillion events processed per day
○ >1 PB
○ 4M – 16M / sec
○ 13 GB – 43 GB / sec
● Message Payload: 3 KB – 10 MB
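As a rough sanity check on the figures above, the daily totals can be converted to average per-second rates. This is only a sketch: it treats the daily totals (over a trillion events, over 1 PB) as exact floors, assumes decimal units (1 PB = 10^15 bytes), and the variable and function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_per_second(total_per_day: float) -> float:
    """Average rate per second for a quantity spread evenly over one day."""
    return total_per_day / SECONDS_PER_DAY


# Assumptions: the slide gives lower bounds ("1+" trillion, ">1 PB");
# the bounds themselves are used here, with decimal petabytes.
events_per_day = 1e12
bytes_per_day = 1e15

avg_events_per_sec = average_per_second(events_per_day)
avg_bytes_per_sec = average_per_second(bytes_per_day)

print(f"{avg_events_per_sec / 1e6:.1f}M events/sec on average")
print(f"{avg_bytes_per_sec / 1e9:.1f} GB/sec on average")
```

Under these assumptions the averages come out to about 11.6M events/sec and 11.6 GB/sec. The event rate sits inside the quoted 4M – 16M range, and the byte rate sits just under the quoted low end, which is what a ">1 PB" floor would suggest.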
Data Driven Culture
● Realtime System Failure Detection
● A/B Testing
● Recommendation Algorithm
● Fraud Detection
● Distributed Tracing
● Log Querying
Paved Road in a Microservices Ecosystem
Microservices produce events
Storage service and batch/stream processing services
Event Processing Pipeline
Paved Road in a Microservices Ecosystem
Supports Batch & Streaming
Evolution of Netflix Keystone Pipeline
In the Old Days ...
Diagram components: Event Producers, EMR
About a year ago
Diagram components: Event Producer, Suro, Suro Proxy, Kafka, Suro Router, Consumer Kafka, Druid, Stream Consumers, EMR
Today
Diagram components: Event Producer, KSProxy, Fronting Kafka, Samza Router, Consumer Kafka, Stream Consumers, EMR, Control Plane, Self Service UI
Event flow
Keystone Pipeline As a Service
Diagram components: Event Producer, Fronting Kafka, Samza Router, Consumer Kafka, Stream Consumers, EMR, Control Plane, Self Service UI
What exactly is Keystone?
Keystone is ...
… a collection of microservices & components
Stream Processing Service
Elastic Pub/Sub Queue
Producer API
Control Plane
Consumer API
Self Service UI
Keystone is ...
… a single self-contained logical service
Event Processing Pipeline
Keystone is ...
… a self-scaling, multi-tenant service that embraces CI/CD
Keystone is ...
… a self-healing, cloud-failure-tolerant service that guarantees at-least-once delivery semantics
Let’s drill down ...
For the purposes of this talk, we’ll focus on...
Stream Processing Service
Elastic Pub/Sub Queue
Producer SDK
Control Plane
Consumer SDK
Self Service UI
Overview
Self Service UI
Routing Infrastructure
Diagram components: EC2 Instances, ZooKeeper (Instance Id assignment), ksnode, Jobs, Checkpointing Cluster, Server Group (Cluster), Store logs in S3
Routing Infrastructure + Checkpointing Cluster
0.9.1, Go, C language
Control Plane
Custom Cluster Orchestration and Scheduling Layer
Control Plane
• Decides container resources
• Schedules container placements
• Orchestrates cluster deployments
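To make "schedules container placements" concrete, here is a minimal first-fit placement sketch. It is not the Keystone control plane's actual algorithm, which the slides do not describe; the `Instance` type, the `place` function, and all resource numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Instance:
    """A host with remaining capacity (hypothetical model)."""
    name: str
    cpu_free: float
    mem_free_gb: float
    containers: List[str] = field(default_factory=list)


def place(containers: List[Tuple[str, float, float]],
          instances: List[Instance]) -> List[str]:
    """First-fit: put each container on the first instance with enough room.

    Each container is (name, cpu, mem_gb). Returns the names that did not fit.
    """
    unplaced = []
    for name, cpu, mem_gb in containers:
        for inst in instances:
            if inst.cpu_free >= cpu and inst.mem_free_gb >= mem_gb:
                inst.cpu_free -= cpu
                inst.mem_free_gb -= mem_gb
                inst.containers.append(name)
                break
        else:
            # No instance had enough free CPU and memory.
            unplaced.append(name)
    return unplaced


# Illustrative numbers only.
instances = [Instance("i-a", cpu_free=16, mem_free_gb=30),
             Instance("i-b", cpu_free=16, mem_free_gb=30)]
containers = [("router-1", 8, 10), ("router-2", 8, 10), ("router-3", 8, 10)]
unplaced = place(containers, instances)
```

First-fit is the simplest choice. A real scheduler would also have to weigh the concerns listed later in the deck, such as tenant isolation and cluster fragmentation.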
Design Decisions?
Distributed systems are all about trade-offs.
Container
● Process Isolation
● Fast Startup
Service Protocol
● Declarative
● Idempotent
● Reconciliation
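The three properties above can be sketched together: a caller declares the desired state, a reconciler computes the difference from the actual state, and repeating the call once the states match does nothing. This is an illustrative sketch only; the state shape (job name mapped to container count) and the function names are assumptions, not Keystone's protocol.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, int]  # (verb, job, container count)


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[Action]:
    """Return the actions that move `actual` toward `desired`."""
    actions = []
    for job, want in desired.items():
        have = actual.get(job, 0)
        if have < want:
            actions.append(("start", job, want - have))
        elif have > want:
            actions.append(("stop", job, have - want))
    # Anything running that is no longer declared gets stopped.
    for job, have in actual.items():
        if job not in desired and have > 0:
            actions.append(("stop", job, have))
    return actions


def apply_actions(actions: List[Action], actual: Dict[str, int]) -> Dict[str, int]:
    """Apply start/stop actions to the actual state in place."""
    for verb, job, count in actions:
        delta = count if verb == "start" else -count
        actual[job] = actual.get(job, 0) + delta
        if actual[job] == 0:
            del actual[job]
    return actual


desired = {"router-a": 3, "router-b": 1}   # declarative: what should exist
actual = {"router-a": 1, "router-c": 2}    # what currently exists

first_pass = reconcile(desired, actual)
apply_actions(first_pass, actual)
second_pass = reconcile(desired, actual)   # idempotent: nothing left to do
```

Because the request states an end result rather than a sequence of steps, a retried or duplicated request is harmless: the second pass yields no actions.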
State Management
● Stateless vs Stateful service
● Single source of truth
Scaling
● Self Scaling
● Partition boundary
● Idempotent operations
● Immutable server deployments
Delivery Semantics
● At-most-once
● At-least-once (best effort)
● Exactly-once
At-least-once under failure condition
● Checkpointing mechanism
● Optimize for writes
● Occasional reads
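A minimal sketch of how checkpointing yields at-least-once delivery: messages are delivered first and the offset is checkpointed afterwards, so a crash between the two causes re-delivery rather than loss. The `CheckpointStore` class is an in-memory stand-in for a checkpointing cluster, and every name and parameter here is hypothetical.

```python
class CheckpointStore:
    """In-memory stand-in for a checkpointing cluster (hypothetical)."""

    def __init__(self):
        self._offsets = {}

    def write(self, partition, offset):
        # Frequent operation: called as processing advances.
        self._offsets[partition] = offset

    def read(self, partition):
        # Occasional operation: only needed when a job (re)starts.
        return self._offsets.get(partition, 0)


def run(partition, log, store, sink, checkpoint_every=3, crash_at=None):
    """Deliver messages from the last checkpoint onward.

    The checkpoint holds the offset of the next message to read and is
    written only after delivery, so a crash can duplicate but not drop.
    """
    offset = store.read(partition)
    while offset < len(log):
        if crash_at is not None and offset == crash_at:
            raise RuntimeError("simulated crash")
        sink.append(log[offset])          # deliver first
        offset += 1
        if offset % checkpoint_every == 0:
            store.write(partition, offset)  # then checkpoint
    store.write(partition, offset)


store, sink = CheckpointStore(), []
log = ["a", "b", "c", "d", "e"]
try:
    run(0, log, store, sink, crash_at=4)  # crashes after "d", before checkpointing it
except RuntimeError:
    pass
run(0, log, store, sink)                  # restart resumes from offset 3
```

After the restart, "d" appears twice in `sink` and nothing is missing. Consumers of an at-least-once pipeline therefore need to tolerate duplicates.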
Multi-tenancy
● Isolation
● Heterogeneous
● Cluster fragmentation
Failure Recovery
• Back pressure
• Network blip
• Container level failure
• Instance level failure
• Zone level failure
• Cluster level failure - Kafka-Kong
• Regional failure - Chaos-Kong
Stream Processing Engine
• Discovery integration
• Custom wire format integration
• Samza: Per-partition serialized process loop
• Samza: Simple payload transformation
• Pluggable abstraction
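The last three bullets can be illustrated with a small sketch: each partition's messages are handled one at a time in order, and the transformation is a plain callable that can be swapped out. This is not the Samza API (Samza jobs are written against its Java interfaces); the function names and message shape are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List

Transform = Callable[[dict], dict]


def process_partition(messages: Iterable[dict], transform: Transform) -> List[dict]:
    """Handle one partition's messages strictly one at a time, in order."""
    routed = []
    for message in messages:
        routed.append(transform(message))
    return routed


def drop_debug_fields(message: dict) -> dict:
    """Example of a simple payload transformation (hypothetical)."""
    return {k: v for k, v in message.items() if not k.startswith("debug_")}


# Each partition is processed by its own serialized loop.
partitions: Dict[int, List[dict]] = {
    0: [{"id": 1, "debug_trace": "x"}, {"id": 2}],
    1: [{"id": 3}],
}
routed = {p: process_partition(msgs, drop_debug_fields)
          for p, msgs in partitions.items()}
```

Keeping the loop serialized within a partition preserves per-partition ordering, while parallelism comes from running many partitions side by side.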
Current Scale - Routing Service
● 14,000+ Docker containers
● 1,400+ EC2 c3.4xlarge instances
● 3 regions
Future Improvements
● Integrate with more sophisticated orchestration/scheduling/cluster management ecosystem
● Unlock value in real-time unbounded data streams
● Data Discovery
● Data Silos
Questions?