Fluentd and WebHDFS & what makes it possible to write out_webhdfs in 30min.
TAGOMORI Satoshi (@tagomoris)
NHN Japan
Fluentd meetup 3 (2012/11/08)
Thursday, November 8, 2012
@tagomoris
NHN Japan Corp (Web Service Division)
Fluentd committer, plugin developer
fluent-agent-lite, ...
Use cases of Fluentd
Monitoring, Notification and Visualization
growthforecast, notifier, ikachan, ....
Real-time aggregation
datacounter, numeric-counter, numeric-aggregator, ..
Real-time processing
parser, exec_filter, ....
Log Collection !
Fluentd as log collector
Many, many output plugins for various storage systems
file, file-alternative
mongo, couch, cassandra, redis, s3, ....
Hadoooooooooooooooooooooooooooooooooooooop
Fluentd with HDFS
To write data on HDFS:
Java native protocol: HDFSClient.java
hadoop fs -put
libhdfs and its binding (like scribed)
Cloudera Hoop (2011/07-)
+WebHDFS (Apache 1.0-), +HttpFs (Apache 2.0-)
fluent-plugin-webhdfs
Output plugin to write data into HDFS
Supports WebHDFS and HttpFs
First release: 2012/05/20 by tagomoris
v0.1.0 bundled within td-agent v1.1.10 (or later)
WebHDFS
HTTP REST API of HDFS
Clients communicate with both the NameNode and the DataNodes (like HDFSClient)
[Diagram: Client connects over HTTP to the NameNode and to each of three DataNodes]
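The REST calls behind this slide can be sketched with Ruby's standard library alone. This is a minimal illustration, not code from the plugin or the 'webhdfs' gem: the helper name `webhdfs_url` and the example file path are hypothetical. It relies only on the documented WebHDFS URL layout (`/webhdfs/v1/<path>?op=...`), where CREATE is sent as PUT and APPEND as POST, and the NameNode answers with a redirect to a DataNode that then receives the data.

```ruby
require 'uri'

# Hypothetical helper (not part of any gem): builds the URL for a WebHDFS
# operation on the NameNode's HTTP port.
def webhdfs_url(host, port, path, op, params = {})
  query = URI.encode_www_form({ 'op' => op }.merge(params))
  URI::HTTP.build(host: host, port: port, path: "/webhdfs/v1#{path}", query: query)
end

# Step 1 of a write: ask the NameNode. It replies with a 307 redirect whose
# Location header points at a DataNode; step 2 sends the data there.
create_url = webhdfs_url('hostname.of.namenode', 50070, '/hdfs/access.log',
                         'CREATE', 'overwrite' => 'false')  # HTTP PUT
append_url = webhdfs_url('hostname.of.namenode', 50070, '/hdfs/access.log',
                         'APPEND')                          # HTTP POST

puts create_url
puts append_url
```

Because the second step goes to whichever DataNode the redirect names, the client needs network access to every DataNode, which is the peer-to-peer topology drawn on the slide.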
HttpFs
Proxy server 'httpfs', provides REST API for HDFS
Same method set as WebHDFS (unlike Hoop)
Clients communicate with the httpfs server only
[Diagram: Client talks HTTP to the httpfs server, which talks the Java native protocol to the NameNode and three DataNodes]
WebHDFS or HttpFs
WebHDFS: Peer-to-Peer communication
Jetty based HTTP server
High throughput and stability
HttpFs: Proxied and Centralized communication
Tomcat based HTTP server
Simple network topology
Relatively low performance and SPOF
Configuration: WebHDFS
Use Apache 1.0.0 (or later), CDH3u5 or CDH4 (or later)
In Namenode/Datanode
  dfs.webhdfs.enabled=true
  dfs.support.append=true (only CDH3u5 ?)
  dfs.support.broken.append=true (only CDH3u5 ?)
In fluent-plugin-webhdfs (type webhdfs)
  host hostname.of.namenode
  port 50070
  path /hdfs/access.%Y%m%d_%H.${hostname}.log
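To see what the `path` setting resolves to, here is a small sketch of the expansion. It is an assumption about the general idea (strftime-style time slicing plus a `${hostname}` placeholder), not the plugin's actual implementation; the helper name `expand_path` and the hostname `web01` are made up for the example.

```ruby
require 'socket'

# Hypothetical helper: replace the ${hostname} placeholder, then apply the
# strftime-style time format found in the configured path.
def expand_path(template, time, hostname = Socket.gethostname)
  with_host = template.gsub('${hostname}', hostname)
  time.strftime(with_host)
end

path = expand_path('/hdfs/access.%Y%m%d_%H.${hostname}.log',
                   Time.utc(2012, 11, 8, 15, 4, 34), 'web01')
puts path  # => /hdfs/access.20121108_15.web01.log
```

With `%Y%m%d_%H` in the path, each node appends to one file per hour, and the hostname keeps files from different Fluentd nodes apart.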
WebHDFS in NHN Japan
BEFORE: 1400 Timeouts/day with Hoop
Tue Aug 14 15:04:34 2012 +0900 "fix to use webhdfs to write into hdfs"
"2012-08-14 15:08:18 +0900: starting fluentd-0.10.25"
Wed Aug 15 13:11:04 2012 +0900 "fix timeouts for busy AM2-5"
AFTER: 130 Timeouts from 08/16 to 11/07
1.2-1.5 TB/day from 10 fluentd nodes
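The before/after figures are easier to compare once they are normalized per day and per node. The arithmetic below is a rough sketch under stated assumptions: the 08/16 to 11/07 range is counted inclusively, "TB" is taken as 10^12 bytes, and traffic is assumed to be spread evenly across the 10 nodes and across the day.

```ruby
require 'date'

# Timeouts: 1400/day with Hoop vs. 130 in total from 08/16 to 11/07.
days = (Date.new(2012, 11, 7) - Date.new(2012, 8, 16)).to_i + 1  # inclusive
after_per_day = 130.0 / days
reduction = 1400.0 / after_per_day

# Throughput: 1.2-1.5 TB/day over 10 nodes, averaged over 86,400 seconds.
low_mb_per_sec  = 1.2e12 / 10 / 86_400 / 1e6
high_mb_per_sec = 1.5e12 / 10 / 86_400 / 1e6

puts format('%d days, %.2f timeouts/day (about %.0fx fewer)', days, after_per_day, reduction)
puts format('%.2f-%.2f MB/s average per node', low_mb_per_sec, high_mb_per_sec)
```

Under these assumptions the timeout rate drops from 1400 per day to roughly 1.5 per day, while each node sustains on the order of 1.4 to 1.7 MB/s on average.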
CONCLUSION 1
WebHDFS is good enough for:
continuous appending into log file
daily operations to move/remove/copy/head/tail via client libraries (and your scripts)
Fluentd and td-agent are good enough for:
log collector before Hadoop/HDFS
break
fluent-plugin-webhdfs commit log
Thu May 17 18:20:15 2012 on 'fluent-plugin-webhdfs'
"writing code": in fact, no lines of ruby code....
Sun May 20 19:01:26 2012 on 'xxxxx' (some commits)
Sun May 20 19:35:34 2012 on 'fluent-plugin-webhdfs'
"fix typo": tagged as v0.0.1
30min!?
fluent-plugin-webhdfs
120 lines (including blank lines and 'end')
65 lines of configurations
very few lines of actual code
WebHDFS operations by 'webhdfs' gem
Output formatting by 'PlainTextFormatterMixin'
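The "output formatting" part that the plugin delegates can be pictured as turning one event (tag, time, record) into one line of text. The sketch below is only an illustration of that idea: `format_line` is a hypothetical function, and the tab-separated "time, tag, JSON" layout is an assumption chosen for the example, not the documented behavior of PlainTextFormatterMixin.

```ruby
require 'json'
require 'time'

# Hypothetical formatter: joins an ISO 8601 timestamp, the tag, and the
# record serialized as JSON into a single newline-terminated line.
def format_line(tag, time, record, separator = "\t")
  [Time.at(time).utc.iso8601, tag, JSON.generate(record)].join(separator) + "\n"
end

event_time = Time.utc(2012, 11, 8, 15, 4, 34).to_i
line = format_line('access.web', event_time, { 'method' => 'GET', 'status' => 200 })
print line
```

Once formatting lives in a shared component like this, the plugin itself mostly has to hand the resulting lines to the WebHDFS client, which is why so little code remains.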
webhdfs gem commit log
Sun May 20 17:00:57 2012
(15 commits)
Sun May 20 19:01:26 2012
"v0.3: add WebHDFS::Client"
fluent-mixin-*
fluent-mixin-plaintextformatter
output text data formatter
webhdfs, file-alternative, hoop
fluent-mixin-config-placeholders
provide placeholders like '${hostname}', '${uuid}' in configurations
webhdfs, ping-message
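The mixin approach itself is plain Ruby: a module is written once and included by several plugin classes. The following sketch shows the mechanism with made-up names (`PlaceholderSketch`, `WebHdfsOutputSketch`, `PingMessageSketch`); it does not reproduce the API of fluent-mixin-config-placeholders, only the idea of sharing `${hostname}` and `${uuid}` expansion.

```ruby
require 'socket'
require 'securerandom'

# Hypothetical mixin: one shared implementation of placeholder expansion.
module PlaceholderSketch
  def expand_placeholders(value, hostname: Socket.gethostname)
    value.gsub('${hostname}', hostname)
         .gsub('${uuid}') { SecureRandom.uuid }  # fresh UUID per occurrence
  end
end

# Two unrelated (illustrative) plugin classes reuse it via include.
class WebHdfsOutputSketch
  include PlaceholderSketch
end

class PingMessageSketch
  include PlaceholderSketch
end

hdfs_path = WebHdfsOutputSketch.new.expand_placeholders('/hdfs/access.${hostname}.log', hostname: 'web01')
ping_id   = PingMessageSketch.new.expand_placeholders('ping.${uuid}')
puts hdfs_path  # => /hdfs/access.web01.log
puts ping_id
```

Both classes get identical placeholder syntax without duplicating code, which is the kind of unified behavior across plugins that shared mixin gems aim for.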
CONCLUSION 2
Output plugins have many (complex) problems:
communication, formatting, configuration formats, ...
We CAN/MUST depend on existing GEMS!
We SHOULD write fluent-mixin gems for other plugin developers!
many features and much code may be shared by many plugins
unified syntax/features over plugins
Questions?
Thanks!
photo: crouton
thanks to @kbysmnr