Rakuten’s Journey with Splunk
- Evolution of Splunk as a Service
June 30, 2016
Keisuke Noda / Peng Yang
Rakuten, Inc.
About Us
• Name
  – Keisuke Noda
  – 野田啓介
• Position
  – Architect / Manager
  – Data Store Platform Group
• Background
  – Software engineer
  – Database administrator
About Us
• Name
  – Peng Yang (Larry)
  – 陽 鵬
• Position
  – Infra/Web engineer
  – Data Store Platform Group
• Background
  – Software engineer
  – AR/MR engineer
Founded: February 7, 1997
IPO: April 19, 2000 (JASDAQ Stock Exchange)
Office: Rakuten Crimson House (Tokyo, Japan)
Employees: 12,981 (as of Dec, 2015)
Capital: JPY 203,587 million (as of Dec, 2015)
About Company
• Self-made database monitoring system
• Legacy and complex system
[Diagram, Before: database status output → Batch Server (input the data into RDB) → RDBMS (store the data) → Web Application (visualize the data). Change labels on the diagram: "Add a column", "Modify codes".]
Why Splunk?
• Self-made database monitoring system
• One Splunk is simple
[Diagram, After: database status output → Splunk (input data / store data / visualize data). Callouts: "So Easy!!", "All in One!", "Cool Visuals!!"]
Then, Splunk began to be used in various groups…
Why Splunk?
. . . Splunk as a Service was born
• Splunking in various groups
• Many repetitive operations (license management, construction, operations, etc.)
One big platform will solve the problem. In addition, it may have many other benefits…
Why as a Service?
[Diagram: example of Rakuten's organization. Departments A to E, with groups such as Marketplace, Credit Card, E-money, Merchant, Security, Corporate IT, Server, Database, Network, …]
• Rakuten’s organization
• Many departments and groups
Service Overview
[Diagram: Admin = our group; Users = Network, Security, Credit Card, Corporate IT, …]
• Roles of Splunk as a Service
  • Admin
  • User
Service Overview
• No need to manage Infrastructure
• Easy to start Splunking instantly
• Charged at a metered rate
  • Input size
  • Storage size
[Diagram label: Rakuten Splunk as a Service (details in a later part)]
Service Overview: For User
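The metered charging above (input size plus storage size) can be sketched as follows. This is only an illustration: the unit prices, the `Usage` type, and the `monthly_charge` function are hypothetical, since the slides do not state actual rates or the billing period.

```python
from dataclasses import dataclass

# Hypothetical unit prices; the slides do not give the real rates.
INPUT_RATE_PER_GB = 100.0    # charge per GB of data ingested in the period
STORAGE_RATE_PER_GB = 10.0   # charge per GB of data retained in the period


@dataclass
class Usage:
    user: str
    input_gb: float    # volume ingested during the billing period
    storage_gb: float  # volume kept on disk during the billing period


def monthly_charge(usage: Usage) -> float:
    """Add up the two metered components named on the slide."""
    input_cost = usage.input_gb * INPUT_RATE_PER_GB
    storage_cost = usage.storage_gb * STORAGE_RATE_PER_GB
    return input_cost + storage_cost


if __name__ == "__main__":
    print(monthly_charge(Usage("network-team", input_gb=30, storage_gb=500)))
```

Charging on both dimensions means a user who keeps a small daily volume for years and a user who ingests heavily but keeps data for a day each pay for what they actually consume.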
• Environment
  • Private Cloud
    • High availability
    • On-time delivery
    • Flexibility
  • Physical servers for Indexers
    • Many-core
    • Large-capacity SSD
Service Design
Splunk as a service
• System configuration
  • v6.3.X (as of June 2016)
  • Using indexer cluster
  • Using SHC (Search Head Cluster)
• Full components
  • SHC, >10 Dedicated SHs
  • Cluster Master
  • >10 Indexers
  • Heavy Forwarders
  • Deployment Server
[Diagram: CM (Cluster Master), SHC, Dedicated SH, Indexer, Forwarder, Server, DS (Deployment Server)]
Service Design
• Other specifications
  • A Splunk account is created for each user
    • 1 user = 1 group, 1 service, or 1 project
  • Each user has his/her own App
  • Basically, a user can see only his/her own data
  • Users can choose the storage retention period, from 1 day to 6 years, for each input
  • Admin does not do backups
  • A Dedicated Search Head is available for users who need one
Service Design
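As a rough illustration of the per-input retention choice above: in Splunk, retention is set per index through `frozenTimePeriodInSecs` in indexes.conf. The sketch below assumes each input is given its own index (the slides do not say how inputs map to indexes), and the function name, index name, and the 365-day year are assumptions.

```python
SECONDS_PER_DAY = 86_400
MIN_DAYS = 1
MAX_DAYS = 6 * 365  # "6 years", ignoring leap days (assumption)


def retention_stanza(index_name: str, retention_days: int) -> str:
    """Render an indexes.conf stanza whose retention matches the user's choice."""
    if not MIN_DAYS <= retention_days <= MAX_DAYS:
        raise ValueError(f"retention must be {MIN_DAYS}..{MAX_DAYS} days")
    frozen_secs = retention_days * SECONDS_PER_DAY
    return "\n".join([
        f"[{index_name}]",
        f"homePath = $SPLUNK_DB/{index_name}/db",
        f"coldPath = $SPLUNK_DB/{index_name}/colddb",
        f"thawedPath = $SPLUNK_DB/{index_name}/thaweddb",
        # Data older than this many seconds is rolled to frozen.
        f"frozenTimePeriodInSecs = {frozen_secs}",
    ])


if __name__ == "__main__":
    # Hypothetical index for one user's input, kept for 30 days.
    print(retention_stanza("network_syslog", 30))
```

Validating the range in one place keeps user requests inside the 1 day to 6 years window before any configuration is written.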
• Database: real-time monitoring, troubleshooting, usage report, service KPI management
• Security: IDS real-time monitoring, fraud detection
• Private Cloud: real-time monitoring, resource management
• Application: real-time monitoring, service KPI management, performance management
• Storage: real-time monitoring, resource management, service KPI management
• Network: real-time monitoring, troubleshooting, trend analysis
• Server: real-time monitoring, troubleshooting, usage report
• More…
Use Cases
• Before
  • Logs were analyzed with the grep command
  • Handling an incident took more than 10 minutes
• After
  • Application access/error monitoring in real time
  • Incidents are addressed automatically
Application
Use Cases
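To make the Application before/after concrete, here is a minimal stand-in for what the manual grep step was looking for: counting server errors per minute in an access log and flagging busy minutes. It is not Splunk's mechanism and not the team's actual search; the Common Log Format, the 5xx rule, and the threshold are all assumptions.

```python
import re
from collections import Counter

# Assumes Apache/Nginx Common Log Format; the slides do not show the real format.
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "[^"]*" (?P<status>\d{3}) ')


def errors_per_minute(lines):
    """Count HTTP 5xx responses per minute."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("status").startswith("5"):
            # "30/Jun/2016:10:15:42 +0900" -> "30/Jun/2016:10:15"
            counts[match.group("ts")[:17]] += 1
    return counts


def minutes_over_threshold(lines, threshold=2):
    """Return the minutes whose 5xx count reaches the (hypothetical) threshold."""
    return sorted(m for m, c in errors_per_minute(lines).items() if c >= threshold)


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [30/Jun/2016:10:15:42 +0900] "GET /item HTTP/1.1" 500 123',
        '10.0.0.2 - - [30/Jun/2016:10:15:50 +0900] "GET /cart HTTP/1.1" 503 99',
        '10.0.0.3 - - [30/Jun/2016:10:16:01 +0900] "GET /top HTTP/1.1" 200 512',
    ]
    print(minutes_over_threshold(sample))
```

In the "After" state this kind of check runs continuously as a saved search with an alert, instead of being typed by hand during an incident.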
[Diagram: log sharing among users. The access log used by Application (real-time monitoring, performance management) is shared with Security (IDS real-time monitoring, fraud detection).]
Use Cases
• Before
  • Had difficulty obtaining access logs
  • Analysis took a lot of time…
• After
  • Can analyze logs easily on their own
  • Detect irregular access with in-depth algorithms
Security
Use Cases
• Users
  • Difficult to start using Splunk (small number of users)
  • No standard format for configuring .conf files (takes much time)
  • Difficult to manage current configurations (inconvenient management)
• Admins
  • Configurations are made manually for each user request (high man-hours)
  • Difficult to manage current configurations (hard to maintain)
A tool was needed to improve the situation
Why dotconf-assist?
• Users of dotconf-assist
  • User
  • Admin
• Application type
  • RESTful web application based on the Splunk API
• The features of dotconf-assist
  • (User) manage Splunk Inputs, Apps, Forwarders, Server Classes, Deployment requests, etc.
  • (Admin) manage Splunk account information, users' configurations, users' requests, etc.
What is dotconf-assist?
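The core idea, letting a user supply a few values and generating the .conf text in a standard format, can be sketched as below. This is not dotconf-assist's code (the project itself is a Ruby on Rails application); the function name and the example path, index, and sourcetype are hypothetical. Only the inputs.conf `monitor` stanza and its `index`, `sourcetype`, and `disabled` settings come from Splunk itself.

```python
def render_monitor_input(log_path: str, index: str, sourcetype: str) -> str:
    """Render an inputs.conf monitor stanza from the few values a user supplies."""
    if not log_path.startswith("/"):
        raise ValueError("log_path must be an absolute path")
    return "\n".join([
        # "monitor://" + "/var/..." gives the usual three slashes on Unix.
        f"[monitor://{log_path}]",
        f"index = {index}",
        f"sourcetype = {sourcetype}",
        "disabled = false",
    ])


if __name__ == "__main__":
    print(render_monitor_input("/var/log/app/access.log", "app_team", "access_combined"))
```

Generating every stanza from one template is what gives users a standard format and lets admins review requests instead of writing configurations by hand.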
[Diagram: workflow across Users' Servers, dotconf-assist, and Splunk Servers (DEV / STG / PROD).
User process: Sign up, Sign in, Set Server Class, Set App, Request Deployment, Install Forwarders, Search.
Admin process: Approve, Create Splunk Account, Approve Deployment, Manage Deployment.
Apps are deployed automatically.]
Workflow of dotconf-assist
Splunk Users                 Before                       After
Configurations               Send ticket to admin         Only input necessary values
Deployment request           Send ticket to admin         Simple clicks
Lead time to start Splunk    1 day                        <10 min

Splunk Admins                Before                       After
Handle users' requests       Create an account (>10 min)  1 click (5 sec)
                             Make input config (>5 min)   4-step click (10 sec)
Statistics information       View from multiple           View from one interface
(user, hosts, inputs…)       Splunk servers
Contributions of dotconf-assist
• GitHub
  • https://github.com/rakutentech/dotconf-assist
• Frameworks
  • Ruby on Rails, Bootstrap
• License
  • MIT License
• Policies
  • Free to use
  • Pull requests accepted
How to Access Source Code
• Expand users
• Upgrade to v6.4
• Enhance dotconf-assist
  • Improve usability
  • Visualize index size statistics for each input
  • Complete automation
• Re-Architect Log Management System in Rakuten
What is Next?
• Rakuten is using one big Splunk as a Service
  • Advantages for users
    • No need to manage infrastructure, license, or detailed configurations
    • Easy log sharing among users
  • Advantages for admins
    • Can manage operations and license efficiently
    • Have many satisfied users
• dotconf-assist improves Rakuten Splunk as a Service
  • Helped users start Splunking easily
  • Decreased man-hours for admins
Wrap up
• Tips for starting Splunk
  • Purpose is very important
    • Consider your business demands/problems
  • No need to modify log format
  • Collaborate with existing systems/tools
  • Take advantage of training and Q&A meetups held by Splunk engineers
Appendix - Splunk Tips
• Tips for managing Splunk
  • A newer Splunk version is better than an older one
  • High-end servers are much better for Indexers
  • Heavy forwarders are useful for splitting the workload of the indexing pipeline
  • Tags make access control for users easy
  • Use the DMC (Distributed Management Console) for monitoring
  • Use the Splunk API for better usability and reduced administration cost
Appendix - Splunk Tips
• Tips for using Splunk
  • Use alerts and automatically delivered reports and dashboards
  • Use embedded reports
  • Consult Splunk Answers
  • Share log data with other teams
  • Use the Splunk API for collaboration with existing systems
  • A dark background for dashboards is cool
  • Enjoy Splunk
Appendix - Splunk Tips
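As one way to picture the "Use the Splunk API" tips, the sketch below builds (but does not send) a request that creates a search job through Splunk's REST endpoint `/services/search/jobs` on the management port. The host name, credentials, and helper name are hypothetical, and Basic authentication is just one of the options a deployment might allow.

```python
import base64
import urllib.parse
import urllib.request


def build_search_request(base_url: str, username: str, password: str, spl: str):
    """Build a POST request that would create a Splunk search job."""
    # The REST API expects the query to start with "search" or a pipe.
    query = spl if spl.lstrip().startswith(("search ", "|")) else f"search {spl}"
    body = urllib.parse.urlencode({"search": query, "output_mode": "json"}).encode()
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/services/search/jobs",
        data=body,
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host and account; sending would use urllib.request.urlopen(req).
    req = build_search_request(
        "https://splunk.example.com:8089", "svc_user", "secret", "index=main error"
    )
    print(req.get_method(), req.full_url)
```

An existing ticketing or monitoring tool can call a helper like this to start searches itself, which is the kind of collaboration the tips refer to.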
• Rakuten is hiring
• http://global.rakuten.com/corp/careers/engineering/
Appendix - Hiring