Fluentd and WebHDFS & what makes it possible to write out_webhdfs in 30min. TAGOMORI Satoshi (@tagomoris), NHN Japan. Fluentd meetup 3 (2012/11/08)

Page 1: Fluentd and WebHDFS

Fluentd and WebHDFS

& what makes it possible to write out_webhdfs in 30min.

TAGOMORI Satoshi (@tagomoris)

NHN Japan

Fluentd meetup 3 (2012/11/08)

Thursday, November 8, 2012

Page 2: Fluentd and WebHDFS

@tagomoris

NHN Japan Corp (Web Service Division)

Fluentd committer, plugin developer

fluent-agent-lite, ...

Page 3: Fluentd and WebHDFS

Use cases of Fluentd

Monitoring, Notification and Visualization

growthforecast, notifier, ikachan, ....

Real-time aggregation

datacounter, numeric-counter, numeric-aggregator, ..

Real-time processing

parser, exec_filter, ....

Page 4: Fluentd and WebHDFS

Log Collection!

Page 5: Fluentd and WebHDFS

Fluentd as log collector

Many, many output plugins for various storage systems

file, file-alternative

mongo, couch, cassandra, redis, s3, ....

Hadoooooooooooooooooooooooooooooooooooooop

Page 6: Fluentd and WebHDFS

Fluentd with HDFS

To write data on HDFS:

Java native protocol: HDFSClient.java

hadoop fs -put

libhdfs and its binding (like scribed)

Cloudera Hoop (since 2011/07)

+ WebHDFS (Apache Hadoop 1.0 and later), + HttpFs (Apache Hadoop 2.0 and later)

Page 7: Fluentd and WebHDFS

fluent-plugin-webhdfs

Output plugin to write data into HDFS

Supports WebHDFS and HttpFs

First release: 2012/05/20 by tagomoris

v0.1.0 bundled with td-agent v1.1.10 (or later)

Page 8: Fluentd and WebHDFS

WebHDFS

HTTP REST API of HDFS

Clients communicate with the NameNode and all of the DataNodes (like HDFSClient)

[Diagram: Client connects over HTTP to the NameNode and to each of three DataNodes]
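
The request flow on this slide can be sketched as follows. WebHDFS URLs have the form http://<host>:<port>/webhdfs/v1/<path>?op=...; for CREATE (HTTP PUT) and APPEND (HTTP POST) the client first asks the NameNode, receives a 307 redirect naming a DataNode, and then sends the data to that DataNode. The host names, the file path, and the helpers `webhdfs_url` and `redirect_target` are hypothetical; this is a minimal Python sketch that only builds URLs and does not contact a cluster.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL: http://host:port/webhdfs/v1<path>?op=<op>&..."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def redirect_target(status, headers):
    """Return the DataNode URL from a NameNode 307 response, else None."""
    if status == 307:
        return headers.get("Location")
    return None


# Step 1: POST (without data) to the NameNode; host and path are made up.
namenode_url = webhdfs_url("namenode.example.com", 50070, "/hdfs/access.log", "APPEND")
print(namenode_url)

# Step 2: the NameNode answers 307 with a Location header pointing at a
# DataNode. The response below is hand-written for illustration only.
fake_response = (307, {"Location": "http://datanode3.example.com:50075"
                                   "/webhdfs/v1/hdfs/access.log?op=APPEND"})
datanode_url = redirect_target(*fake_response)
print(datanode_url)  # the log data would be POSTed to this URL
```

Because the second request goes straight to whichever DataNode the NameNode picked, the client host needs HTTP access to every DataNode, not just the NameNode.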

Page 9: Fluentd and WebHDFS

HttpFs

Proxy server 'httpfs', provides a REST API for HDFS

Same method set as WebHDFS (unlike Hoop)

Clients communicate with the httpfs server only

[Diagram: Client connects over HTTP to the httpfs server, which talks to the NameNode and three DataNodes over the Java native protocol]

Page 10: Fluentd and WebHDFS

WebHDFS or HttpFs

WebHDFS: Peer-to-Peer communication

Jetty based HTTP server

High throughput and stability

HttpFs: Proxied and centralized communication

Tomcat based HTTP server

Simple network topology

Relatively low performance, and a single point of failure (SPOF)

Page 11: Fluentd and WebHDFS

Configuration: WebHDFS

Use Apache Hadoop 1.0.0 (or later), CDH3u5 or CDH4 (or later)

In Namenode/Datanode

dfs.webhdfs.enabled=true
dfs.support.append=true (only CDH3u5 ?)
dfs.support.broken.append=true (only CDH3u5 ?)

In fluent-plugin-webhdfs (type webhdfs)

host hostname.of.namenode
port 50070
path /hdfs/access.%Y%m%d_%H.${hostname}.log
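
To make the `path` setting concrete, here is a small Python sketch of how such a template could resolve to an actual file name. It assumes the `%Y%m%d_%H` part follows strftime semantics and that `${hostname}` is replaced literally with the local host name; the slides do not show the plugin's own expansion code, so the function `expand_path` and the host name `web01` are hypothetical.

```python
import socket
from datetime import datetime


def expand_path(template, when, hostname=None):
    """Expand strftime-style time fields, then the ${hostname} placeholder."""
    if hostname is None:
        hostname = socket.gethostname()
    # Time fields first, so a '%' in the host name cannot confuse strftime.
    return when.strftime(template).replace("${hostname}", hostname)


path = expand_path(
    "/hdfs/access.%Y%m%d_%H.${hostname}.log",
    datetime(2012, 11, 8, 15, 4),
    hostname="web01",  # made-up host name for the example
)
print(path)  # /hdfs/access.20121108_15.web01.log
```

With a template like this, each Fluentd node appends to its own file per hour, so two nodes never write to the same HDFS file.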

Page 12: Fluentd and WebHDFS

WebHDFS in NHN Japan

BEFORE: 1400 Timeouts/day with Hoop

Tue Aug 14 15:04:34 2012 +0900 "fix to use webhdfs to write into hdfs"
"2012-08-14 15:08:18 +0900: starting fluentd-0.10.25"
Wed Aug 15 13:11:04 2012 +0900 "fix timeouts for busy AM2-5"

AFTER: 130 Timeouts from 08/16 to 11/07

1.2-1.5 TB/day from 10 fluentd nodes
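
To put the before and after figures on the same scale, the arithmetic can be worked through as below. This is a rough sketch under stated assumptions: both dates are in 2012 and counted inclusively, TB means 10^12 bytes, and traffic is spread evenly over the 10 nodes. None of these assumptions is stated on the slide.

```python
from datetime import date

before_per_day = 1400  # timeouts/day with Hoop
after_total = 130      # timeouts over the whole WebHDFS period

# 08/16 to 11/07, assumed to be in 2012 and inclusive of both ends.
days = (date(2012, 11, 7) - date(2012, 8, 16)).days + 1
after_per_day = after_total / days
print(f"{days} days, {after_per_day:.2f} timeouts/day")  # 84 days, 1.55 timeouts/day
print(f"about {before_per_day / after_per_day:.0f}x fewer timeouts per day")

# 1.2-1.5 TB/day over 10 nodes, assuming an even split and decimal units.
nodes = 10
seconds_per_day = 86400
per_node_mb_s = [tb * 1e12 / nodes / seconds_per_day / 1e6 for tb in (1.2, 1.5)]
print(f"{per_node_mb_s[0]:.2f}-{per_node_mb_s[1]:.2f} MB/s per node on average")
# 1.39-1.74 MB/s per node on average
```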

Page 13: Fluentd and WebHDFS

CONCLUSION 1

WebHDFS is good enough for:

continuous appending to log files

daily operations to move/remove/copy/head/tail via client libraries (and your scripts)

Fluentd and td-agent are good enough as:

a log collector in front of Hadoop/HDFS

Page 14: Fluentd and WebHDFS

break

Page 15: Fluentd and WebHDFS

fluent-plugin-webhdfs commit log

Thu May 17 18:20:15 2012 on 'fluent-plugin-webhdfs'

"writing code": in fact, no lines of Ruby code....

Sun May 20 19:01:26 2012 on 'xxxxx' (some commits)

Sun May 20 19:35:34 2012 on 'fluent-plugin-webhdfs'

"fix typo": tagged as v0.0.1

Page 16: Fluentd and WebHDFS

30min!?

fluent-plugin-webhdfs

120 lines (including blank lines and 'end')

65 lines of configurations

very few lines of actual code

WebHDFS operations by 'webhdfs' gem

Output formatting by 'PlainTextFormatterMixin'

Page 17: Fluentd and WebHDFS

webhdfs gem commit log

Sun May 20 17:00:57 2012

(15 commits)

Sun May 20 19:01:26 2012

"v0.3: add WebHDFS::Client"

Page 18: Fluentd and WebHDFS

fluent-mixin-*

fluent-mixin-plaintextformatter

output text data formatter

webhdfs, file-alternative, hoop

fluent-mixin-config-placeholders

provides placeholders like '${hostname}', '${uuid}' in configurations

webhdfs, ping-message
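
The mixin idea can be illustrated with a short sketch. It is written in Python rather than Ruby, and every class, option, and method name in it is hypothetical; it is not the API of fluent-mixin-plaintextformatter. It only shows the pattern: the text formatting lives in one shared mixin, and each output plugin keeps only its own storage logic.

```python
import json


class PlainTextFormatterMixin:
    """Hypothetical shared formatter: one text line per (tag, time, record)."""

    field_separator = "\t"
    include_time = True
    include_tag = True

    def format_line(self, tag, time_str, record):
        parts = []
        if self.include_time:
            parts.append(time_str)
        if self.include_tag:
            parts.append(tag)
        parts.append(json.dumps(record, sort_keys=True))
        return self.field_separator.join(parts) + "\n"


class WebHDFSLikeOutput(PlainTextFormatterMixin):
    """Stand-in for an HDFS output; collects lines instead of appending."""

    def __init__(self):
        self.appended = []

    def write(self, tag, time_str, record):
        self.appended.append(self.format_line(tag, time_str, record))


class FileLikeOutput(PlainTextFormatterMixin):
    """Stand-in for a file output that reconfigures the shared formatter."""

    include_tag = False

    def __init__(self):
        self.lines = []

    def write(self, tag, time_str, record):
        self.lines.append(self.format_line(tag, time_str, record))


hdfs_out, file_out = WebHDFSLikeOutput(), FileLikeOutput()
for out in (hdfs_out, file_out):
    out.write("access.web", "2012-11-08T15:04:00", {"path": "/", "code": 200})
print(repr(hdfs_out.appended[0]))
print(repr(file_out.lines[0]))
```

Both stand-in plugins get identical formatting behaviour and the same option names for free, which is the "unified syntax/features" benefit that shared mixins aim for.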

Page 19: Fluentd and WebHDFS

CONCLUSION 2

Output plugins have many (complex) problems:

communication, formatting, configuration formats, ...

We CAN/MUST depend on existing GEMS!

We SHOULD write fluent-mixin gems for other plugin developers!

many features and much code can be shared across plugins

unified syntax/features across plugins

Page 20: Fluentd and WebHDFS

Questions?

Thanks!

photo: crouton

thanks to @kbysmnr
