Emre Akış
2
Outline
• Why do we use Apache Kafka?
• What is it?
• How does it work?
• Demo
• Ecosystem
3
Big Data
• Data doesn’t fit in one computer
• Welcome to distributed systems
4
(Near) Real-time Big Data & Analytics
• Events (e.g. clickstreams)
• Sensors
• Internet of Things (IoT)
• Data streams
5
Messaging Queues
FIFO
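A minimal sketch of FIFO (first in, first out) behaviour, using Python's standard `collections.deque`. The helper names `enqueue` and `dequeue` are illustrative only and are not part of any messaging product.

```python
from collections import deque

# In-memory FIFO queue: messages leave in the order they arrived.
queue = deque()

def enqueue(message):
    queue.append(message)       # add at the tail

def dequeue():
    return queue.popleft()      # remove from the head

for m in ["m1", "m2", "m3"]:
    enqueue(m)

first, second = dequeue(), dequeue()
print(first, second)            # the two oldest messages, oldest first
```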
6
Distributed Messaging Queues
• Scalable
• Reliable
• High throughput (read & write)
7
Why Apache Kafka?
• Clean and simple architecture
• Easy to use
• Easy to deploy
• High throughput
• Scalability
• High availability
• Persistence (for a configurable retention period)
8
Apache Kafka 101
• Distributed, partitioned, replicated commit log service.
• Provides the functionality of a messaging system.
9
Cluster
Language agnostic TCP protocol
Cluster => group of servers (brokers)
10
Topic
• Category or feed name to which messages are published.
• Partitioned log
• Each partition
– Ordered
– Immutable sequence
– Appended to
offset => sequential id number
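The partition model above can be sketched as an append-only list in which a message's offset is simply its position. This is a toy illustration under that assumption, not Kafka's on-disk format; the `Partition` class and its methods are hypothetical names.

```python
class Partition:
    """Ordered, append-only sequence of messages (illustrative only)."""

    def __init__(self):
        self._log = []

    def append(self, message):
        # The offset is the sequential id of the message within this partition.
        offset = len(self._log)
        self._log.append(message)
        return offset

    def read(self, offset, max_messages=10):
        # Reading never modifies the log; existing entries are not rewritten.
        return self._log[offset:offset + max_messages]

p = Partition()
offsets = [p.append(m) for m in ("a", "b", "c")]
tail = p.read(1)    # everything from offset 1 onward
```

A consumer only needs to remember the next offset to read in order to resume where it left off.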
11
Partition Distribution
• Distributed over servers in the cluster
• Replicated for fault tolerance (configurable)
• Each partition has a leader server (handles reads & writes)
• Others act as followers (replicate the leader)
• If the leader fails, one of the followers becomes the new leader
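The leader/follower idea can be sketched with a toy model. It assumes followers copy every write immediately and that the next surviving replica takes over on failure; the real election protocol is more involved. `ReplicatedPartition` and the broker names are hypothetical.

```python
class ReplicatedPartition:
    """Toy model of one partition replicated across several brokers."""

    def __init__(self, replicas):
        self.logs = {broker: [] for broker in replicas}
        self.leader = replicas[0]

    def write(self, message):
        # The leader takes the write; followers then copy it.
        self.logs[self.leader].append(message)
        for broker, log in self.logs.items():
            if broker != self.leader:
                log.append(message)

    def fail(self, broker):
        # Remove the failed broker; promote a follower if it was the leader.
        del self.logs[broker]
        if not self.logs:
            raise RuntimeError("no replicas left")
        if broker == self.leader:
            self.leader = next(iter(self.logs))

rp = ReplicatedPartition(["broker-1", "broker-2", "broker-3"])
rp.write("x")
rp.fail("broker-1")             # the leader goes down
new_leader = rp.leader
surviving_log = rp.logs[new_leader]
```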
12
Producer
• Decides which message goes to which partition
– Round-robin
– Semantic partitioning
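A minimal sketch of the two strategies: messages without a key are spread round-robin, while keyed messages are hashed so that the same key always lands in the same partition. The hash function (`zlib.crc32`), the partition count, and the name `choose_partition` are assumptions for illustration, not what any particular Kafka client uses.

```python
import itertools
import zlib

NUM_PARTITIONS = 3                                   # assumed partition count
_round_robin = itertools.cycle(range(NUM_PARTITIONS))

def choose_partition(key=None):
    if key is None:
        # Round-robin: balance load evenly across partitions.
        return next(_round_robin)
    # Semantic partitioning: derive the partition from the message key.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

spread = [choose_partition() for _ in range(4)]
same_key = (choose_partition("user-42"), choose_partition("user-42"))
```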
13
Consumer
• Queue vs. Publish/Subscribe
• Traditional queue ordering vs. per-partition ordering
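One way to picture queue versus publish/subscribe is through groups of consumers: consumers in the same group split the partitions between them (queue-like), while separate groups each receive every partition (publish/subscribe-like). The sketch below assumes a simple round-robin assignment; the function, consumer, and group names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

partitions = [0, 1, 2, 3]

# Queue-like: one group, two consumers share the work.
queue_like = assign_partitions(partitions, ["c1", "c2"])

# Publish/subscribe-like: two groups, each sees all partitions.
pubsub_like = {
    group: assign_partitions(partitions, [group + "-c1"])
    for group in ("analytics", "billing")
}
```

Because each partition is read by a single consumer within a group, ordering holds per partition rather than across the whole topic.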
14
Guarantees
• Messages sent by a producer to a partition are appended in the order they are sent.
• Consumers see messages in the order they are stored in the log.
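A small sketch of what the guarantee covers: order is preserved within each partition, not across partitions. The fixed key-to-partition mapping `route` is a hypothetical stand-in for whatever partitioning the producer uses.

```python
partitions = {0: [], 1: []}
route = {"k1": 0, "k2": 1}      # hypothetical key-to-partition mapping

sent = [("k1", "a1"), ("k2", "b1"), ("k1", "a2"), ("k2", "b2")]
for key, value in sent:
    # Each message is appended to the end of its partition's log.
    partitions[route[key]].append(value)

# A consumer of partition 0 sees its messages in the order they were sent.
seen_p0 = list(partitions[0])
seen_p1 = list(partitions[1])
```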
15
Demo
• Basic Command Line Tools
– Start a server
– Create a topic
– Send a message
– Start a consumer
– Multi-broker cluster
• Running a tool with no arguments displays usage information
16
Clients
• Java
• Python
• Ruby
• Go
• C/C++
• .NET
• Clojure
• Node.js
• Scala
• JRuby
• Perl
• Erlang
• PHP
• Rust
• HTTP REST
https://cwiki.apache.org/confluence/display/KAFKA/Clients
17
Administrative Tools
• Kafka Manager (powered by Yahoo)
• Kafkat: Command-line administration for Kafka brokers.
• Kafka Web Console: Displays information about your Kafka cluster, including which nodes are up and what topics they host data for.
• Kafka Offset Monitor: Displays the state of all consumers and how far behind the head of the stream they are.
18
Ecosystem
• Samza
• Spark Streaming
• Storm
https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
19
Use Cases
• Messaging
• Website activity tracking (at LinkedIn)
• Metrics
• Log aggregation
• Stream processing (with Storm or Samza)
• Event sourcing (state changes are logged in time order)
• Commit log (like a database transaction log, supported by log compaction)
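The core idea of log compaction can be sketched as keeping only the most recent record for each key. This is a simplification for illustration, not how the broker actually implements it; the `compact` function and the sample records are hypothetical.

```python
def compact(log):
    """Keep only the latest record per key, preserving log order."""
    # Later entries overwrite earlier ones, so each key maps to its last offset.
    latest_offset = {key: offset for offset, (key, _) in enumerate(log)}
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if latest_offset[key] == offset
    ]

log = [
    ("user-1", "alice@old"),
    ("user-2", "bob"),
    ("user-1", "alice@new"),    # supersedes the first record for user-1
]
compacted = compact(log)
```

Replaying the compacted log still yields the latest state for every key, which is what makes it usable as a commit log.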
20
Who uses it?
• Yahoo
• Netflix
• Spotify
• Uber
• Goldman Sachs
• Tumblr
• PayPal
• Box
• Airbnb
• Mozilla
• Cisco
• Etsy
• Foursquare
• StumbleUpon
• Coursera
• …
https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
21
Resources
• http://kafka.apache.org/
• https://cwiki.apache.org/confluence/display/KAFKA/Index
• https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
• http://www.confluent.io/blog
22
Q & A
23
About Me
• Twitter: @akisemre
• LinkedIn: https://tr.linkedin.com/in/emreakis