Log Analysis Systems And Its Designs In LINE Corp. 2014 Early
2014/02/20 (Thu)
@tagomoris (TAGOMORI Satoshi), LINE Corp.
LINE Developer Meetup in Fukuoka #1


Page 2: Log Analysis System And its designs in LINE Corp. 2014 early

TAGOMORI Satoshi (@tagomoris)
LINE Corp., Development Support Team


Page 5: Log Analysis System And its designs in LINE Corp. 2014 early

Data Collection, Aggregation, Analytics, Visualization

Page 6: Log Analysis System And its designs in LINE Corp. 2014 early

See also:

"Huge-scale log aggregation at livedoor, supported by OSS" (「OSSで支えられるライブドアの巨大ログ集計」, 2012 Summer)
http://www.slideshare.net/tagomoris/oss-nhntech

"Log analysis system with Hadoop in livedoor 2013 Winter" (2013 early)
http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013

"Batch and Stream processing with SQL" (2013 Fall)
http://www.slideshare.net/tagomoris/batch-and-stream-processing-with-sql

Page 7: Log Analysis System And its designs in LINE Corp. 2014 early

Disclaimer:

This talk is about "a" log analysis system in LINE.

Page 8: Log Analysis System And its designs in LINE Corp. 2014 early

Do you like SQL?

Page 9: Log Analysis System And its designs in LINE Corp. 2014 early

System Overview (2014)

[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster (HDFS, MR), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI, Norikra, Presto Cluster.
Flow labels: STREAM, BATCH, SCHEDULED BATCH.

Page 10: Log Analysis System And its designs in LINE Corp. 2014 early

System Overview (2014)

[Diagram] The system overview annotated with implementation languages: Java, Node, Perl, Ruby.

Page 11: Log Analysis System And its designs in LINE Corp. 2014 early

System Overview (2014)

[Diagram] The system overview annotated with: SQL, fluentd.conf.

Page 12: Log Analysis System And its designs in LINE Corp. 2014 early

Who uses it?

Internet Messaging Service
Public Web Service
Game
Private Web Service (closed, person-to-person)
Internal Web Service (administrator only)
Data Analytics Service


Page 14: Log Analysis System And its designs in LINE Corp. 2014 early

Data analytics players

ADMINISTRATOR: Storages, Hadoop Cluster, Visualization Tools
PROGRAMMER: Raw Log Formats, Application Logs, Data Sizes, Data Semantics
SERVICE DIRECTOR, SALES, BOARD MEMBER, ...: Whatever Metrics They Want

Page 15: Log Analysis System And its designs in LINE Corp. 2014 early

Data analytics players (administrators, programmers, service directors, sales, board members, ...)

WE NEED A QUERY LANGUAGE THAT THEY ALL CAN RUN AND UNDERSTAND!

Page 16: Log Analysis System And its designs in LINE Corp. 2014 early

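[Diagram] System overview without Norikra and the Presto Cluster. Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster (HDFS, MR), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI.
Flow labels: STREAM, BATCH, SCHEDULED BATCH.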

Page 17: Log Analysis System And its designs in LINE Corp. 2014 early

[Diagram] System overview with Norikra and the Presto Cluster added.


Page 19: Log Analysis System And its designs in LINE Corp. 2014 early

SQL: Hive


Page 21: Log Analysis System And its designs in LINE Corp. 2014 early

Schema-less Stream Processing with SQL

Norikra


Page 23: Log Analysis System And its designs in LINE Corp. 2014 early

Software Stack

Hadoop: CDH 4.5.0 w/ JDK6 (WebHDFS, Hive, HiveServer)
Presto: 0.59 w/ JDK7
Shib: v0.3.0 w/ Node.js v0.10
Fluentd: v0.10.39 w/ Ruby 2.0.0, and many plugins
Norikra: v0.1.3 w/ JRuby 1.7.4


Page 25: Log Analysis System And its designs in LINE Corp. 2014 early

Batches and Streams

Hadoop is for batches
  High-performance batch processing is important
  HDFS has good performance
Stream log writing and calculations are also VERY VERY IMPORTANT
Hybrid System: Stream processing + Batch

Page 26: Log Analysis System And its designs in LINE Corp. 2014 early

Collect and deliver as STREAM
Calculate as BATCH

Page 27: Log Analysis System And its designs in LINE Corp. 2014 early

1st gen: First impl.

[Diagram] Components: Web Servers, Scribed, Archive Storage (scribed), Hadoop Cluster CDH3b2 (Hadoop Streaming), hiveserver, Shib.
Flow labels: STREAM, BATCH. Also labeled: LIBHDFS.

Page 28: Log Analysis System And its designs in LINE Corp. 2014 early

Hadoop and Hive

Filesystem (HDFS)

Processing Framework (Hadoop MapReduce)

Query Compiler: SQL -> MR (Hive)

Thrift API Server (HiveServer)

Old style Java (....)
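
The "SQL -> MR" idea can be illustrated with a small sketch. This is not how Hive compiles queries internally; it only shows, under that stated simplification, how a query such as SELECT age, COUNT(*) ... GROUP BY age maps onto a map step, a shuffle step, and a reduce step. The function names and sample records are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (age, 1) for each record, like a mapper for GROUP BY age."""
    for rec in records:
        yield rec["age"], 1


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values per key, like a reducer computing COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_age(records):
    """Conceptual equivalent of: SELECT age, COUNT(*) FROM records GROUP BY age."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical sample data, for illustration only.
sample = [
    {"name": "a", "age": 34},
    {"name": "b", "age": 33},
    {"name": "c", "age": 34},
]
print(count_by_age(sample))
```

On a real cluster the map and reduce steps run in parallel across many nodes, and the shuffle moves data over the network; the sketch runs them in one process only to show the shape of the computation.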


Page 29: Log Analysis System And its designs in LINE Corp. 2014 early

Shib

Web UI client for Hive
Query editor/executor + result viewer
HTTP JSON API gateway for Hive query execution
Node.js

Page 30: Log Analysis System And its designs in LINE Corp. 2014 early

2nd gen: +Fluentd

[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Hadoop Cluster CDH3u2 (Hive), Cloudera Hoop, Huahin Manager, hiveserver, Shib.
Flow labels: STREAM, BATCH.

Page 31: Log Analysis System And its designs in LINE Corp. 2014 early

Fluentd

Log collector
Apache-like configuration
Pluggable Input/Output/Buffer, with plugins on a public repository (rubygems.org)
Ruby 1.9 or later

Collect, and Store
  collect: fluent-agent-lite (perl)
  store: fluent-plugin-webhdfs

Page 32: Log Analysis System And its designs in LINE Corp. 2014 early

Collect and deliver as STREAM
Calculate as BATCH
Monitor as STREAM

Page 33: Log Analysis System And its designs in LINE Corp. 2014 early

3rd gen: +Monitoring

[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster CDH3u5 (Hive), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI.
Flow labels: STREAM, BATCH, SCHEDULED BATCH.

Page 34: Log Analysis System And its designs in LINE Corp. 2014 early

Fluentd plugins

Monitoring in real-time
  message num/size counting
  min, max, average and percentiles
Visualization and Notification
  Graph tools (GrowthForecast / Focuslight)
  IRC (or Mail, HipChat, ...)
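
The kind of numbers listed above (count, min, max, average, percentiles) can be sketched in a few lines. This is only an illustration of the arithmetic, not the code of any Fluentd plugin; the function name, the default percentile set, and the nearest-rank percentile method are assumptions made for the example.

```python
def summarize(values, percentiles=(50, 90, 95, 99)):
    """Summarize one monitoring interval of numeric samples (e.g. response times).

    Percentiles use the nearest-rank method on integer percentile values.
    """
    if not values:
        return {"count": 0}
    ordered = sorted(values)
    n = len(ordered)
    summary = {
        "count": n,
        "min": ordered[0],
        "max": ordered[-1],
        "avg": sum(ordered) / n,
    }
    for p in percentiles:
        # Nearest rank: ceil(p / 100 * n), computed with integers only.
        rank = max(1, (p * n + 99) // 100)
        summary[f"p{p}"] = ordered[rank - 1]
    return summary


# Hypothetical response times in milliseconds for one interval.
print(summarize([120, 80, 100, 300, 95]))
```

In a streaming setup a summary like this would be computed per interval and then sent on to a graph tool or a notification channel.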

Page 35: Log Analysis System And its designs in LINE Corp. 2014 early

4th gen: +HA (hadoop)

[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster CDH4 (HDFS, YARN), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI.
Flow labels: STREAM, BATCH, SCHEDULED BATCH.

Page 36: Log Analysis System And its designs in LINE Corp. 2014 early

Collect and deliver as STREAM
Calculate as BATCH
Monitor as STREAM
Calculate as STREAM on demand

Page 37: Log Analysis System And its designs in LINE Corp. 2014 early

5th gen: +Norikra

[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster (HDFS, MR), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI, Norikra.
Flow labels: STREAM, BATCH, SCHEDULED BATCH.

Page 38: Log Analysis System And its designs in LINE Corp. 2014 early

Norikra

SQL Query for Streams
Add/remove queries on demand (without restarts)
... and many features
HTTP JSON API
JRuby on JVM with Esper

Page 39: Log Analysis System And its designs in LINE Corp. 2014 early

Norikra Queries: (1)

SELECT name, age
FROM events
WHERE current="Fukuoka"

Input event:
{"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Fukuoka"}

Output:
{"name":"tagomoris","age":34}
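
To make the semantics of query (1) concrete, here is a minimal sketch that applies the same filter and projection to a single event in plain Python. It does not use Norikra or Esper; the function name is hypothetical and the event is the one shown on the slide.

```python
def run_query(event):
    """Apply: SELECT name, age FROM events WHERE current = "Fukuoka" to one event.

    Returns the projected event, or None when the WHERE clause does not match.
    """
    if event.get("current") != "Fukuoka":
        return None
    return {"name": event.get("name"), "age": event.get("age")}


event = {
    "name": "tagomoris",
    "age": 34,
    "address": "Tokyo",
    "corp": "LINE",
    "current": "Fukuoka",
}
print(run_query(event))
```

Fields the query does not mention (address, corp) are simply ignored, which is what lets events of different shapes share one stream.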

Page 40: Log Analysis System And its designs in LINE Corp. 2014 early

Norikra Queries: (2)

SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current="Fukuoka" GROUP BY age

Input event:
{"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Fukuoka"}

Output (every 5 mins):
{"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
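
Query (2) adds a 5-minute batch window and a GROUP BY. The sketch below imitates that behavior over a list of (timestamp, event) pairs in plain Python. It is a simplification, not Esper's implementation: windows here are aligned to multiples of 300 seconds and empty windows are skipped, both of which are assumptions of this example. The function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 5 * 60


def time_batch_counts(timed_events, window=WINDOW_SECONDS):
    """Count matching events per age within fixed, non-overlapping time windows.

    timed_events: iterable of (timestamp_in_seconds, event_dict).
    Returns one list of {"age": ..., "cnt": ...} rows per non-empty window,
    in window order, with rows sorted by age.
    """
    windows = defaultdict(Counter)
    for ts, event in timed_events:
        if event.get("current") != "Fukuoka":
            continue  # WHERE clause
        windows[int(ts // window)][event["age"]] += 1  # GROUP BY age
    return [
        [{"age": age, "cnt": cnt} for age, cnt in sorted(counts.items())]
        for _, counts in sorted(windows.items())
    ]


# Hypothetical events: four fall in the first window, two in the second.
events = [
    (0, {"age": 34, "current": "Fukuoka"}),
    (10, {"age": 34, "current": "Fukuoka"}),
    (20, {"age": 33, "current": "Fukuoka"}),
    (30, {"age": 34, "current": "Tokyo"}),
    (299, {"age": 34, "current": "Fukuoka"}),
    (300, {"age": 33, "current": "Fukuoka"}),
]
for rows in time_batch_counts(events):
    print(rows)
```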

Page 41: Log Analysis System And its designs in LINE Corp. 2014 early

Collect and deliver as STREAM
Calculate as BATCH
Monitor as STREAM
Calculate as STREAM on demand
Calculate as BATCH immediately on demand

Page 42: Log Analysis System And its designs in LINE Corp. 2014 early

5th gen: +Presto

[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster (HDFS, MR), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI, Norikra, Presto Cluster.
Flow labels: STREAM, BATCH, SCHEDULED BATCH.

Page 43: Log Analysis System And its designs in LINE Corp. 2014 early

Presto

Open sourced by Facebook on 2013/11/07
MPP Engine: Massively Parallel Processing Engine
  like Google BigQuery (Dremel), Cloudera Impala
  low-latency queries (not the main use case of Hive)
SQL
HTTP JSON API
Java 7 !

Page 44: Log Analysis System And its designs in LINE Corp. 2014 early

Shib v0.3.0: presto support

[Diagram] Components: User (browser), Analysis Batches, Service Admin Tools, Shib, HiveServer, HiveServer2, Presto.
Connection labels: HTTP JSON API (twice), THRIFT (twice).

Page 45: Log Analysis System And its designs in LINE Corp. 2014 early

Non-monolithic architecture

Many subsystems for many purposes
Add/Update/Replace per subsystem
High interoperability by RPC-based connections
Gateway can hide backend implementations
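
The last point, a gateway hiding backend implementations, can be sketched as follows. This is not Shib's code or API; the class, the engine names, and the stub backends are hypothetical and exist only to show how callers can stay unchanged while a backend is added or swapped.

```python
from typing import Callable, Dict, List

Rows = List[dict]
Backend = Callable[[str], Rows]


class QueryGateway:
    """Routes a query to a named engine so callers never talk to a backend directly."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, engine: str, backend: Backend) -> None:
        """Add or replace the backend behind an engine name."""
        self._backends[engine] = backend

    def execute(self, engine: str, sql: str) -> Rows:
        if engine not in self._backends:
            raise ValueError(f"unknown engine: {engine}")
        return self._backends[engine](sql)


# Hypothetical stand-ins for real clients (for example a Thrift or HTTP client).
def fake_hive(sql: str) -> Rows:
    return [{"engine": "hive", "sql": sql}]


def fake_presto(sql: str) -> Rows:
    return [{"engine": "presto", "sql": sql}]


gateway = QueryGateway()
gateway.register("hive", fake_hive)
gateway.register("presto", fake_presto)
print(gateway.execute("presto", "SELECT 1"))
```

Because callers only know the engine name and the SQL text, replacing one backend with another is a single register call on the gateway side.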

Page 46: Log Analysis System And its designs in LINE Corp. 2014 early

WHAT TO DO IS NOT WHAT WE WANT TO DO, BUT WHAT WE ARE WANTED TO DO.

Page 47: Log Analysis System And its designs in LINE Corp. 2014 early

THERE ARE MANY THINGS TO DO!

THANKS!

Page 48: Log Analysis System And its designs in LINE Corp. 2014 early

Software list:

http://fluentd.org/

http://prestodb.io/

http://norikra.github.io/

https://github.com/tagomoris/shib
