Krzysztof Dębski, 22nd October 2015
How to handle large amount of events
Geecon.cz 2015
Who am I
15 years as an IT professional
6 years as an Architect in Allegro Group
1 year as a Product Owner http://hermes.allegro.tech
@DebskiChris
Allegro Group
500+ people in IT
50+ independent teams
16 years on market
2 years after technical revolution
Agenda
Choose a tool
Distribute load
Be reliable
Focus on throughput or latency
Improve security
Think ahead
What is important when handling events
Choose a tool
Distribute load
Be reliable
Focus on throughput or latency
Improve security
Think ahead
Choose a tool
Kafka
[Diagram: a producer service and a consumer service connected to a Kafka broker, with Zookeeper alongside the broker]
Kafka topics
[Diagram: Producer_1 … Producer_n publish events to a topic; old events are removed]
Kafka partitions
[Diagram: Producer_1 … Producer_n publish events to Partition 0, Partition 1 and Partition 2]
Kafka partitions
[Diagram, built up over three slides: Broker 1 holds P0, Broker 2 holds P1 and Broker 3 holds P2; Consumers 1 to 5 are split between Consumer group 1 and Consumer group 2. The second slide adds Consumer 6; the third slide adds Broker 4 with P3.]
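The build-up in this diagram can be read as follows: within one consumer group each partition is read by a single consumer, so a group with more consumers than partitions leaves someone idle until a partition is added. Below is a minimal Python sketch of that idea. It assumes a plain round-robin assignment, which is only a stand-in for Kafka's real assignment strategies; `assign_partitions` and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer of a single group.

    Round-robin over the sorted consumer list; an illustrative stand-in,
    not Kafka's actual assignment strategy.
    """
    assignment = {consumer: [] for consumer in consumers}
    ordered = sorted(consumers)
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered[index % len(ordered)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical group of four consumers reading three, then four partitions.
group = ["consumer-2", "consumer-3", "consumer-4", "consumer-5"]
three = assign_partitions(["P0", "P1", "P2"], group)
four = assign_partitions(["P0", "P1", "P2", "P3"], group)
idle_before = [c for c, owned in three.items() if not owned]
idle_after = [c for c, owned in four.items() if not owned]
```

With three partitions one consumer of the group has nothing to read; adding the fourth partition gives it work.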
Kafka replicas
[Diagram: a producer service and a consumer service connected to three brokers, with Zookeeper alongside. The brokers hold partitions (P1, P0), (P2, P1) and (P0, P2), so every partition is stored on two brokers.]
Hermes
[Diagram: three Hermes Frontend instances and two Hermes Consumer instances; the connections are labelled "REST" and "REST, JMS"]
Distribute load
Kafka partitions
[Diagram: a producer publishes events to Partition 0, Partition 1 and Partition 2]
Default partitioning: round robin
In practice the producer binds to a single partition for 10 minutes
Kafka partitions
[Diagram: a producer routes events with keys Key_0, Key_1 and Key_2 to Partition 0, Partition 1 and Partition 2]
Default partitioning: key based
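Key-based partitioning boils down to hashing the key and taking the result modulo the partition count. The sketch below illustrates this in Python. The choice of CRC32 is an assumption made only to get a hash that is stable across runs; it is not the hash function Kafka uses, and `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash.

    CRC32 is used here only because it is deterministic across runs;
    it is an assumption, not the hash Kafka itself uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Events that share a key always land in the same partition,
# which preserves their relative order.
keys = ["Key_0", "Key_1", "Key_2", "Key_0"]
placement = [(key, partition_for(key, 3)) for key in keys]
```

The trade-off is that a very popular key concentrates its traffic on one partition, so load is only as even as the key distribution.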
Rebalancing leaders
[Diagram: Broker 1 holds P1 and P0, Broker 2 holds P2 and P1, Broker 3 holds P0 and P2]
Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000
Topic: test Partition: 0 Leader: 3 Replicas: 3, 1 ISR: 3, 1
Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2
Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3
Leader: leader broker ID
Replicas: brokers that should have partition copies
ISR: in-sync replicas
Rebalancing leaders
Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000
Topic: test Partition: 0 Leader: 1 Replicas: 3, 1 ISR: 1
Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2
Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2
Rebalancing leaders
Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000
Topic: test Partition: 0 Leader: 1 Replicas: 3, 1 ISR: 1, 3
Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2
Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3
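In the final listing broker 3 is back in the ISR, yet partition 0 is still led by broker 1, so broker 1 now leads two partitions. Kafka treats the first broker in the Replicas list as the preferred leader, so the imbalance can be spotted by comparing the two fields. The Python sketch below does that for the simplified listing format used on these slides, which is an assumption and differs slightly from the real tool's output; `non_preferred_leaders` is a hypothetical helper.

```python
import re

DESCRIBE_OUTPUT = """\
Topic: test Partition: 0 Leader: 1 Replicas: 3, 1 ISR: 1, 3
Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2
Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3
"""

# Matches one partition line in the slide's simplified format.
LINE = re.compile(
    r"Partition: (?P<partition>\d+) Leader: (?P<leader>\d+) "
    r"Replicas: (?P<replicas>[\d, ]+?) ISR: (?P<isr>[\d, ]+)"
)


def non_preferred_leaders(describe_output):
    """Return (partition, leader, preferred) where leader != first replica."""
    result = []
    for match in LINE.finditer(describe_output):
        replicas = [int(b) for b in match.group("replicas").split(",")]
        leader = int(match.group("leader"))
        preferred = replicas[0]
        if leader != preferred:
            result.append((int(match.group("partition")), leader, preferred))
    return result


moved = non_preferred_leaders(DESCRIBE_OUTPUT)
```

Here `moved` lists partition 0 as led by broker 1 while broker 3 is preferred, which is the situation a leader rebalance would correct.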
Be reliable
ACK levels
0 - don’t wait for the response
1 - only the leader has to acknowledge
-1 - all in-sync replicas have to acknowledge
Speed is highest at 0, safety is highest at -1
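The three ACK levels can be compared with a small model of what the producer waits for. This is a simplified Python sketch under stated assumptions: timeouts and retries are ignored, and `is_acknowledged` and its parameters are hypothetical names, not part of any Kafka client.

```python
def is_acknowledged(acks, leader_has_written, in_sync_followers_written,
                    in_sync_followers_total):
    """Decide whether the producer would see a publish as acknowledged.

    A simplified model of the three ACK levels; timeouts and retries
    are deliberately left out.
    """
    if acks == 0:
        return True  # fire and forget: nothing is awaited
    if acks == 1:
        return leader_has_written
    if acks == -1:
        return (leader_has_written
                and in_sync_followers_written == in_sync_followers_total)
    raise ValueError("acks must be 0, 1 or -1")


# The leader has the event, but one of two in-sync followers does not yet.
outcomes = {acks: is_acknowledged(acks, True, 1, 2) for acks in (0, 1, -1)}
```

In this situation levels 0 and 1 already report success, while -1 keeps the producer waiting until the second follower has caught up.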
Lost events
ERROR [Replica Manager on Broker 2]: Error when processing fetch request for partition [test,1] offset 10000 from consumer with correlation id 0. Possible cause:
Request for offset 10000 but we only have log segments in the range 8000 to 9000. (kafka.server.ReplicaManager)
Lost events
[Diagram, built up over four slides: a producer publishes with ACK = 1 and replication factor = 1 to Broker 1 and Broker 2, with Zookeeper alongside. Broker 1 shows "offset committed = 10000" and Broker 2 shows "offset committed = 9000"; the second slide adds replica.lag.max.messages = 2000; on the last slide a third label, "offset committed = 9000", appears next to Zookeeper.]
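One plausible reading of these slides is that a follower lagging 1000 messages still counts as in sync under a 2000-message limit, takes over when the leader fails, and then cannot serve an offset it never received. The Python sketch below replays that reading with the numbers from the slides. The `Replica` class and the helper functions are hypothetical and only model the arithmetic, not Kafka's replication protocol.

```python
class Replica:
    def __init__(self, broker_id, log_end_offset):
        self.broker_id = broker_id
        self.log_end_offset = log_end_offset  # next offset to be written


def in_sync(leader, follower, max_lag_messages):
    """A follower counts as in sync while its lag stays within the limit."""
    return leader.log_end_offset - follower.log_end_offset <= max_lag_messages


def fetch(new_leader, offset):
    """Mimic the broker's range check from the error message."""
    if offset > new_leader.log_end_offset:
        raise LookupError(
            f"Request for offset {offset} but log ends at "
            f"{new_leader.log_end_offset}"
        )
    return offset


leader = Replica(broker_id=1, log_end_offset=10000)
follower = Replica(broker_id=2, log_end_offset=9000)

# With a limit of 2000 messages the follower, 1000 behind, is still "in sync",
# so it may take over when the leader dies.
eligible = in_sync(leader, follower, max_lag_messages=2000)
lost_events = leader.log_end_offset - follower.log_end_offset

try:
    fetch(follower, 10000)  # a consumer resumes from its committed offset
    fetch_failed = False
except LookupError:
    fetch_failed = True
```

Under these assumptions 1000 events that the producer saw acknowledged are gone after the failover.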
Event identification
Hermes Frontend
KafkaBroker
POST {"event": "test"}
{"id": "58d7ff07-dd0e-4103-9b1f-55706f3049e6", "timestamp": 1430443071995, "data": {"event": "test"}}
HTTP 201 Created
Message-id: 58d7ff07-dd0e-4103-9b1f-55706f3049e6
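The envelope on this slide can be produced by wrapping the published payload with a generated id and a timestamp, and returning the id to the publisher. The following Python sketch shows one way to do that. It assumes a random UUID and a millisecond timestamp, matching the shape of the example; `wrap_event` is a hypothetical helper, not Hermes code.

```python
import json
import time
import uuid


def wrap_event(payload: dict) -> dict:
    """Wrap a published payload in an envelope with a unique id and timestamp."""
    return {
        "id": str(uuid.uuid4()),
        "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
        "data": payload,
    }


envelope = wrap_event({"event": "test"})
response_headers = {"Message-id": envelope["id"]}  # returned with HTTP 201
serialized = json.dumps(envelope)
```

Because the publisher receives the same id that travels with the event, a single event can later be traced from publication to delivery.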
Lost events
[Diagram: Producer, Hermes Frontend, Kafka Broker with Zookeeper, Hermes Consumer, Consumer; a Tracker collects publication data and delivery attempts]
Normal operation
[Diagram: Producer, Hermes Frontend, Kafka Broker with Zookeeper, Hermes Consumer, Consumer]
POST is answered with HTTP 201 Created
Abnormal situation
POST is answered with HTTP 202 Accepted
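The difference between the two responses can be sketched as a frontend that answers 201 when Kafka confirms the event and 202 when it does not. In this Python sketch the local buffering of unconfirmed events is an assumption about what "Accepted" implies, and `publish` and `send_to_kafka` are hypothetical names.

```python
from collections import deque


def publish(event, send_to_kafka, local_buffer):
    """Return the HTTP status a frontend could answer with.

    `send_to_kafka` is a hypothetical callable that returns True when the
    broker acknowledged the event in time and False otherwise.
    """
    if send_to_kafka(event):
        return 201  # Created: the event is stored in Kafka
    local_buffer.append(event)  # keep it for a later resend
    return 202  # Accepted: received, but not yet confirmed by Kafka


buffer = deque()
normal = publish({"event": "test"}, lambda e: True, buffer)
abnormal = publish({"event": "test"}, lambda e: False, buffer)
```

The publisher is never blocked by a slow broker, at the cost of a weaker guarantee for events answered with 202.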
Focus on throughput or latency
Throughput
Does the whole world stop when you stop?
Throughput
Co-ordinated omission
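Coordinated omission happens when a load generator waits for a slow response before sending the next request, so the requests that should have been sent during the stall are never measured. The Python sketch below contrasts the latency such a client records with the latency measured from each request's intended start time. The schedule and service times are made-up illustration values, and `latencies` is a hypothetical helper.

```python
def latencies(service_times, interval):
    """Compare naive and omission-corrected latencies for a closed-loop client.

    Requests are meant to start every `interval` seconds, but the client
    sends the next one only after the previous response arrived.
    """
    naive, corrected = [], []
    clock = 0.0
    for index, service_time in enumerate(service_times):
        intended_start = index * interval
        actual_start = max(intended_start, clock)
        clock = actual_start + service_time
        naive.append(service_time)                # what the client measured
        corrected.append(clock - intended_start)  # what a user would have waited
    return naive, corrected


# One 5-second stall among 1-second-spaced requests that normally take 0.1 s.
naive, corrected = latencies([0.1, 5.0, 0.1, 0.1], interval=1.0)
```

The naive series contains a single slow sample, while the corrected series shows that the two requests queued behind the stall were slow as well.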
Slow responses
[Chart: response time at the 75th, 99th and 99.9th percentiles]
Slow response vs. message size
[Chart: message size at the 75th, 99th and 99.9th percentiles]
Slow response and fixed message size
[Chart: response time at the 75th, 99th and 99.9th percentiles]
Kafka
[Charts: kernel 3.2.x, then kernel 3.2.x next to kernel >= 3.8.x]
Optimize message size
[Chart: message size, 99.9th percentile for all topics and 99.9th percentile for the biggest topic]
Optimize message size
JSON
human readable
big memory and network footprint
poor support for Hadoop
Optimize message size
JSON
Snappy
ERROR Error when sending message to topic t3 with key: 4 bytes, value: 100 bytes with error: The server experienced an unexpected error when processing the request (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
java: target/snappy-1.1.1/snappy.cc:423: char* snappy::internal::CompressFragment(const char*, size_t, char*, snappy::uint16*, int): Assertion `0 == memcmp(base, candidate, matched)' failed.
errors when publishing a large number of messages
Optimize message size
JSON
Snappy
Lz4
failed on distributed data
[Chart: compression ratio for a single topic and for multiple topics]
Optimize message size
JSON
Snappy
Lz4
Avro
small network footprint
Hadoop friendly
easy schema verification
Kafka Offset Monitor
Improve security
KAFKA-1682
Kafka <= 0.8.2
No security!
Kafka > 0.8.2
unix-like users, permissions, ACL
Manage your topics
cz.geecon.demo.basic
Group: cz.geecon.demo
Topic: basic
Improved security
Authentication and authorization interfaces provided
By default:
You can create any topic in your group
You can publish everywhere (in progress)
Group owner defines subscriptions
Think ahead
Consumer backoff
You can’t have exactly-once delivery
http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/
Improved offset management
[Diagram, built up over four slides: a producer publishes an event to Hermes; the Hermes consumer reads the event and keeps track of the committed offset and of its local unsent events]
Hermes consumer sends a new event to a service instance
Service instance answers with HTTP 503 Unavailable
Hermes consumer checks the TTL and adds the event to the queue
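The "Check TTL & Add to queue" step can be sketched as a decision taken after each delivery attempt. In this Python sketch the event fields `created_at` and `ttl`, the rule that every non-2xx response is retried, and the helper `handle_response` are all assumptions made for illustration; the slides only show the 503 case.

```python
from collections import deque


def handle_response(event, status, now, retry_queue):
    """Decide what to do with an event after a delivery attempt.

    `event` is a dict with hypothetical `created_at` and `ttl` fields
    (both in seconds).
    """
    if 200 <= status < 300:
        return "delivered"
    expired = now - event["created_at"] > event["ttl"]
    if expired:
        return "discarded"  # too old to be worth another attempt
    retry_queue.append(event)  # stays among the local unsent events
    return "queued"


queue = deque()
fresh = {"id": "a", "created_at": 100, "ttl": 60}
stale = {"id": "b", "created_at": 0, "ttl": 60}
results = [
    handle_response(fresh, 200, now=110, retry_queue=queue),
    handle_response(fresh, 503, now=110, retry_queue=queue),
    handle_response(stale, 503, now=110, retry_queue=queue),
]
```

Keeping undelivered events in a local queue lets the consumer move on with newer events instead of blocking on one unavailable service instance.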
Consumer backoff
100% adapt 1/s 1/min
Turn back the time
PUT /groups/{group}/topics/{topic}/subscriptions/{subscription}/retransmission -8h
Blog: http://allegro.tech
Twitter: @allegrotechblog
Twitter: @debskichris