Log Analysis System And its designs in LINE Corp. 2014 early

DESCRIPTION

LINE developer meetup in fukuoka 1 #LINE_DM

Text of Log Analysis System And its designs in LINE Corp. 2014 early

  • Log Analysis Systems and Its Designs in LINE Corp., 2014 Early. 2014/02/20 (Thu). @tagomoris (TAGOMORI Satoshi), LINE Corp. LINE Developer Meetup in Fukuoka #1
  • TAGOMORI Satoshi (@tagomoris), LINE Corp., Development Support Team
  • Data Collecting, Aggregation, Analytics, Visualization
  • See also:
    OSS (2012 Summer) http://www.slideshare.net/tagomoris/oss-nhntech
    Log analysis system with Hadoop in livedoor 2013 Winter (2013 early) http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013
    Batch and Stream processing with SQL (2013 Fall) http://www.slideshare.net/tagomoris/batch-and-stream-processing-with-sql
  • Disclaimer: This talk is about a log analysis system in LINE.
  • SQL
  • System Overview (2014) [diagram]. Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, Norikra, webhdfs, Hadoop Cluster (HDFS, MR), hive server, Huahin Manager, Presto Cluster, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH.
  • System Overview (2014) [same diagram, annotated with implementation languages: Ruby, Java, Node, Perl]
  • System Overview (2014) [same diagram, annotated with fluentd.conf and SQL]
  • Who uses it? Internet Messaging Service; Public Web Service; Game; Private Web Service (closed, person-to-person); Internal Web Service (administrator only); Data Analytics Service
  • Data analytics players: PROGRAMMER, SERVICE DIRECTOR, SALES, ADMINISTRATOR, ..., BOARD MEMBER. Topics shown around them: raw log formats, application logs, data sizes, data semantics, whatever metrics they want, storages, Hadoop cluster, visualization tools.
  • Data analytics players (same slide, with the conclusion overlaid): WE NEED A QUERY LANGUAGE THAT THEY ALL CAN RUN AND UNDERSTAND!
  • [Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, webhdfs, Hadoop Cluster (HDFS, MR), hive server, Huahin Manager, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH.
  • [Same diagram, with Norikra and Presto Cluster added]
  • SQL: Hive
  • Norikra: Schema-less Stream Processing with SQL
  • Software Stack
    Hadoop: CDH 4.5.0 w/ JDK6 (WebHDFS, Hive, HiveServer)
    Presto: 0.59 w/ JDK7
    Shib: v0.3.0 w/ Node.js v0.10
    Fluentd: v0.10.39 w/ Ruby 2.0.0, and many plugins
    Norikra: v0.1.3 w/ JRuby 1.7.4
  • Batches and Streams
    Hadoop is for batches
    High-performance batch processing is important
    HDFS has good performance
    Stream log writing and calculations are also VERY VERY IMPORTANT
    Hybrid System: Stream processing + Batch
  • Collect and deliver as STREAM; calculate as BATCH
  • 1st gen: First impl. [diagram]. Components: Web Servers, Scribed, Archive Storage (scribed), Hadoop Cluster CDH3b2 (Hadoop Streaming), hive server, Shib. Flow labels: STREAM (LIBHDFS), BATCH.
  • Hadoop and Hive
    Filesystem (HDFS)
    Processing Framework (Hadoop MapReduce)
    Query Compiler: SQL -> MR (Hive)
    Thrift API Server (HiveServer)
    Old style Java (....)
  • Shib
    WebUI Client for Hive
    Query editor/executor + result viewer
    HTTP JSON API Gateway for Hive query execution
    Node.js
  • 2nd gen: +Fluentd [diagram]. Components: Web Servers, Archive Storage (scribed), Fluentd Cluster, Cloudera Hoop, Hadoop Cluster CDH3u2 (Hive), hive server, Huahin Manager, Shib. Flow labels: STREAM, BATCH.
  • Fluentd
    Log collector
    Apache-like configuration
    Pluggable Input/Output/Buffer, on a public plugin repository (rubygems.org)
    Ruby 1.9 or later
    Collect, and Store
    collect: fluent-agent-lite (perl)
    store: fluent-plugin-webhdfs
  • Collect and deliver as STREAM; Calculate as BATCH; Monitor as STREAM
  • 3rd gen: +Monitoring [diagram]. Components: Web Servers, Archive Storage (scribed), Fluentd Cluster, Fluentd Watchers, Notifications (IRC), Graph Tools, webhdfs, Hadoop Cluster CDH3u5 (Hive), hive server, Huahin Manager, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH.
  • Fluentd plugins
    Monitoring in real-time
    message num/size counting
    min, max, average and percentiles
    Visualization and Notification
    Graph tools (GrowthForecast / Focuslight)
    IRC (or Mail, HipChat, ...)
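The monitoring slide above lists message count/size and min, max, average and percentiles. As a rough illustration of what such a summary over one monitoring interval involves, here is a minimal Python sketch. It is not the code of any Fluentd plugin; the function names `percentile` and `summarize`, the nearest-rank percentile method, and the default percentile set are all assumptions made for this example.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty sorted list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(sizes, percentiles=(50, 90, 99)):
    """Summarize message sizes (in bytes) seen during one interval.

    Hypothetical helper: the field names are chosen for illustration only.
    """
    if not sizes:
        return {"num": 0}
    s = sorted(sizes)
    summary = {
        "num": len(s),            # message count
        "bytes": sum(s),          # total size
        "min": s[0],
        "max": s[-1],
        "avg": sum(s) / len(s),
    }
    for p in percentiles:
        summary[f"p{p}"] = percentile(s, p)
    return summary
```

A watcher process could emit one such summary per interval to a graph tool or a chat notification; how the real plugins window and flush their counters is not stated in the slides.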
  • 4th gen: +HA (hadoop) [diagram]. Components: Web Servers, Archive Storage (scribed), Fluentd Cluster, Fluentd Watchers, Notifications (IRC), Graph Tools, webhdfs, Hadoop Cluster CDH4 (HDFS, YARN), hive server, Huahin Manager, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH.
  • Calculate as STREAM on demand; Collect and deliver as STREAM; Calculate as BATCH; Monitor as STREAM
  • 5th gen: +Norikra [diagram]. Components: Web Servers, Archive Storage (scribed), Fluentd Cluster, Fluentd Watchers, Notifications (IRC), Graph Tools, Norikra, webhdfs, Hadoop Cluster (HDFS, MR), hive server, Huahin Manager, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH.
  • Norikra
    SQL Query for Streams
    Add/Remove on demand (without restarts)
    ... and many features
    HTTP JSON API
    JRuby on JVM with Esper
  • Norikra Queries: (1)
    Event: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Fukuoka"}
    Query: SELECT name, age FROM events WHERE current='Fukuoka'
    Output: {"name":"tagomoris","age":34}
  • Norikra Queries: (2)
    Event: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Fukuoka"}
    Query: SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) WHERE current='Fukuoka' GROUP BY age
    Output (every 5 mins): {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
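To make the semantics of the two example queries above concrete, here is a minimal plain-Python sketch that mimics them on an in-memory list of events. It does not use Norikra or Esper. The function names and the `(unix_seconds, event)` input shape are assumptions for this example, and the batches are aligned to multiples of the batch length, which is a simplification of how a real time-batch window is anchored.

```python
from collections import Counter


def select_name_age(events, current):
    """Mimics query (1): SELECT name, age FROM events WHERE current=<current>."""
    return [
        {"name": e["name"], "age": e["age"]}
        for e in events
        if e.get("current") == current
    ]


def count_by_age_per_batch(timed_events, current, batch_seconds=300):
    """Mimics query (2): COUNT(*) GROUP BY age over non-overlapping time batches.

    timed_events: iterable of (unix_seconds, event) pairs (assumed shape).
    Returns {batch_index: [{"age": ..., "cnt": ...}, ...]}.
    """
    batches = {}
    for ts, event in timed_events:
        if event.get("current") != current:
            continue  # the WHERE clause
        counter = batches.setdefault(int(ts // batch_seconds), Counter())
        counter[event["age"]] += 1  # GROUP BY age, COUNT(*)
    return {
        batch: [{"age": age, "cnt": cnt}
                for age, cnt in sorted(counter.items(), reverse=True)]
        for batch, counter in sorted(batches.items())
    }
```

The point of the stream version is that the query is registered once and results are emitted every batch interval, whereas this sketch computes all batches after the fact from a stored list.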
  • Calculate as STREAM on demand; Collect and deliver as STREAM; Calculate as BATCH; Monitor as STREAM; Calculate as BATCH immediately on demand
  • 5th gen: +Presto [diagram]. Components: Web Servers, Archive Storage (scribed), Fluentd Cluster, Fluentd Watchers, Notifications (IRC), Graph Tools, Norikra, webhdfs, Hadoop Cluster (HDFS, MR), hive server, Huahin Manager, Presto Cluster, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH.
  • Presto
    Open sourced by Facebook on 2013/11/07
    MPP Engine: Massively Parallel Processing Engine, like Google BigQuery (Dremel) and Cloudera Impala
    Short-latency queries (not the main use case of Hive)
    SQL
    HTTP JSON API
    Java 7!
  • Shib v0.3.0: Presto support [diagram]. Clients: User (browser), Analysis Batches, Service Admin Tools. Gateway: Shib (HTTP JSON API). Backends: HiveServer (THRIFT), HiveServer2 (THRIFT), Presto (HTTP JSON API).
  • Non-monolithic architecture
    Many subsystems for many purposes
    Add/Update/Replace per subsystem
    High interoperability through RPC-based connections
    A gateway can hide backend implementations
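The claim that a gateway can hide backend implementations can be sketched in a few lines of Python. This is not Shib's code (Shib is a Node.js application); the class `QueryGateway`, its methods, and the `fake_hive` / `fake_presto` stand-ins are hypothetical names invented for this illustration.

```python
class QueryGateway:
    """Routes a query to a named backend so callers never touch the engine directly."""

    def __init__(self):
        self._backends = {}  # engine name -> callable(sql) -> list of rows

    def register(self, engine, run_query):
        """Add or replace a backend without changing any caller."""
        self._backends[engine] = run_query

    def execute(self, engine, sql):
        if engine not in self._backends:
            raise ValueError(f"unknown engine: {engine}")
        return {"engine": engine, "rows": self._backends[engine](sql)}


# Hypothetical stand-ins for real clients (for example a Thrift or HTTP client).
def fake_hive(sql):
    return [["hive", sql]]


def fake_presto(sql):
    return [["presto", sql]]


gateway = QueryGateway()
gateway.register("hive", fake_hive)
gateway.register("presto", fake_presto)
```

Callers only ever see `execute(engine, sql)`, so a backend can be upgraded or swapped by calling `register` again, which is the per-subsystem replaceability the slide describes.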
  • WHAT TO DO IS NOT WHAT WE WANT TO DO, BUT WHAT WE ARE WANTED TO DO.
  • THERE IS A LOT TO DO! THANKS!
  • Software list:
    http://fluentd.org/
    http://prestodb.io/
    http://norikra.github.io/
    https://github.com/tagomoris/shib