Treasure Data: Hadoop meets Cloud with Multi-Tenancy
Kazuki Ohta, Founder and CTO at Treasure Data, Inc.
Hadoop User Group Japan (Hadoopユーザー会) [email protected]
@kzk_mover
Friday, April 5, 13
Who am I? Kazuki Ohta (太田一樹)
• @kzk_mover, [email protected]
Treasure Data, Inc.
• Chief Technology Officer (company founded July 2011)
Hadoop User Group Japan
• One of the founders
• Co-author of “Hadoop徹底入門” (Hadoop: A Thorough Introduction)
Open-Source Enthusiast
• Hadoop, memcached, jemalloc, MongoDB, uim, etc.
[Figure: database market map by data volume (lightweight RDBMS, enterprise RDBMS such as DB2, traditional data warehouse beyond 1 billion entries or 10 TB) and deployment (on-premise vs. cloud: Database-as-a-Service, Big Data-as-a-Service); market sizes shown: $10B and $34B. © 2012 Forrester Research, Inc. Reproduction Prohibited]
Treasure Data = Cloud + Big Data
What is the Problem?
Big Data? NoSQL?
Too Many Solutions
Too Many Variations (+ Ecosystem)
[Figure: Hadoop Versions, from http://marblejenka.blogspot.jp/2013/01/hadoop.html]
Current Big Data Solutions: ‘Feature Creep’
(http://en.wikipedia.org/wiki/Feature_creep)
We need Machete :)
Machete Design by James Lindenbaum, Heroku Co-Founder: http://www.youtube.com/watch?v=3BhDLm9jo5Y
EVERYTHING with ONE interface
Simple & Discoverable
‘Simplicity’ itself is a feature :)
by Anand Babu Periasamy, GlusterFS Co-Founder
Next Topic: Cloud?
http://www.saasblogs.com/saas/demystifying-the-cloud-where-do-saas-paas-and-other-acronyms-fit-in/
Battlefield of IaaS Vendors: SCM (Supply Chain Management)
[Figure: HW performance/price over time, on-premise vs. IaaS vendors; annotations: “Decrease with Moore’s Law”, “Battlefield: Supply Chain Management”]
In the near future, most hardware buyers will not be individual companies but cloud providers.
PaaS, SaaS: IT Is All About Operations
With PaaS, you offload your development operations function and have the PaaS provider handle the tools and components required to deploy and manage applications reliably. - EngineYard
More Sleep, More Value
PaaS/SaaS Battlefield: ‘Time’ Is Money
[Figure: customer value over time, starting at sign-up or PO; ideal expectation vs. reality (on-premise), where HW/SW selection, PoC, deployment and upgrades delay value, which then becomes obsolete over time]
Introduction to Treasure Data
Company Overview
[Photo: US team as of July 2012]
Company Overview
Silicon Valley-based Company
• All founders are Japanese: Hironobu Yoshikawa, Kazuki Ohta, Sadayuki Furuhashi
OSS Enthusiasts
• MessagePack, Fluentd, etc.
• Cloud native
Our 50+ Customers – Fortune Global 500 leaders and start-ups including:
250 billion records / month in Feb 2013
2 million jobs executed
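To put the February 2013 volume in perspective, here is a back-of-the-envelope conversion to an average ingest rate. It assumes the 250 billion records arrived evenly over the 28 days of February 2013; real traffic is burstier, so peak rates would be higher:

$$
\frac{250 \times 10^{9}\ \text{records}}{28 \times 86{,}400\ \text{s}} = \frac{2.5 \times 10^{11}\ \text{records}}{2{,}419{,}200\ \text{s}} \approx 1.03 \times 10^{5}\ \text{records/s}
$$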
Vision: Single Analytics Platform for the World
Investors
• Bill Tai
• Naren Gupta: Nexus Ventures; Director of Red Hat and TIBCO
• Othman Laraki: Former VP Growth at Twitter
• James Lindenbaum, Adam Wiggins, Orion Henry: Heroku Founders
• Anand Babu Periasamy, Hitesh Chellani: Gluster Founders
• Yukihiro “Matz” Matsumoto: Creator of Ruby
• Dan Scheinman: Director of Arista Networks
• + 10 more people
• and... Jerry Yang, Founder of Yahoo!, where Hadoop was invented :)
Check out today’s (2013/01/21) morning Nikkei newspaper (日経新聞)!
Treasure Data’s Philosophy and Architecture
Big Data Adoption Stages (by increasing intelligence sophistication)
Reporting (Treasure Data’s FOCUS, 80% of needs)
• Standard Reports: What happened?
• Ad-hoc Reports: Where?
• Drill Down Query: Where exactly?
• Alerts: Error?
Analytics
• Statistical Analysis: Why?
• Predictive Analysis: What’s a trend?
• Optimization: What’s the best?
Full Stack Support for Big Data Reporting
Our best-in-class architecture and operations team ensure the integrity and availability of your data.
Data from almost any source can be securely and reliably uploaded using td-agent in streaming or batch mode.
Our SQL, REST, JDBC, ODBC and command-line interfaces support all major query tools and approaches.
You can store gigabytes to petabytes of data efficiently and securely in our cloud-based columnar datastore.
Treasure Data = Collect + Store + Query
Example in AdTech: MobFox
1. Europe’s largest independent mobile ad exchange.
2. 20 billion impressions/month (circa Jan. 2013)
3. Serving ads for 15,000+ mobile apps (circa Jan. 2013)
4. Needed Big Data Analytics infrastructure ASAP.
Two Weeks From Start to Finish!
Our Value was Proven :)
[Figure: customer value over time, starting at sign-up or PO; reality (on-premise) is delayed by HW/SW selection, PoC, deployment and upgrades and becomes obsolete over time, while a simple interface brings value sooner (“Our Value: Save Time!”)]
Architecture Breakdown
Data Collection
• Increasing variety of data sources
• No single data schema
• Lack of a streaming data collection method
• 60% of Big Data project resources are consumed here
Data Store/Analytics
• Complexity remains in both traditional DWH and Hadoop (very slow time to market)
• Challenges in scaling data volume, and rising cost
Connectivity
• Must ensure connectivity with existing BI/visualization tools and apps via JDBC, REST and ODBC
1) Data Collection
• 60% of BI project resources are consumed here
• The most ‘underestimated’ and ‘unsexy’ part, but the MOST important
• Fluentd: an OSS, lightweight but robust log collector
• http://fluentd.org/
15:40~ Log analysis system with Hadoop in livedoor 2013, by Satoshi Tagomori @ NHN Japan
16:30~ いかにしてHadoopにデータを集めるか (How to Collect Data into Hadoop), by Sadayuki Furuhashi @ Treasure Data, Inc.
These talks will cover Fluentd :)
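Fluentd models each event as a tag, a timestamp and a record, and buffers events before forwarding them. The Python sketch below illustrates that tag-routed, buffered collection idea under simplifying assumptions: the `TagBuffer` class, its `chunk_limit` and its `flush` callback are hypothetical names made up for illustration, not Fluentd's API or implementation (Fluentd itself is written in Ruby and extended through plugins).

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, List, Optional, Tuple

# One event in Fluentd's data model: (tag, unix time, record).
Event = Tuple[str, int, dict]

@dataclass
class TagBuffer:
    """Hypothetical sketch of tag-routed, buffered log collection.

    Not Fluentd's API: events are grouped into one chunk per tag, and a chunk
    is handed to the `flush` callback once it holds `chunk_limit` events.
    """
    flush: Callable[[str, List[Event]], None]  # hypothetical sink, e.g. an uploader
    chunk_limit: int = 1000                     # illustrative chunk size
    chunks: DefaultDict[str, List[Event]] = field(
        default_factory=lambda: defaultdict(list))

    def emit(self, tag: str, record: dict, ts: Optional[int] = None) -> None:
        ts = int(time.time()) if ts is None else ts
        chunk = self.chunks[tag]
        chunk.append((tag, ts, record))
        if len(chunk) >= self.chunk_limit:
            self._flush_tag(tag)

    def _flush_tag(self, tag: str) -> None:
        chunk = self.chunks.pop(tag, [])
        if chunk:
            self.flush(tag, chunk)

    def close(self) -> None:
        """Flush partially filled chunks, e.g. at shutdown."""
        for tag in list(self.chunks):
            self._flush_tag(tag)

# Usage: record which tags were flushed and how many events each chunk held.
sent: List[Tuple[str, int]] = []
buf = TagBuffer(flush=lambda tag, chunk: sent.append((tag, len(chunk))), chunk_limit=2)
buf.emit("apache.access", {"path": "/"}, ts=1)
buf.emit("app.login", {"user": "u1"}, ts=2)
buf.emit("apache.access", {"path": "/about"}, ts=3)  # second event fills the chunk
buf.close()                                          # flushes the partial app.login chunk
print(sent)  # [('apache.access', 2), ('app.login', 1)]
```

Batching per tag keeps uploads coarse-grained, which is the point of buffering; a production collector would also flush on a timer and retry failed chunks.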
2) Data Store / Analytics - Columnar Storage
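A toy example helps show why a columnar layout suits reporting queries. This sketch assumes schema-less records (a record may omit a field, stored here as `None`); `to_columns` and `run_length_encode` are hypothetical helpers, not Treasure Data's storage format. An aggregate over one field touches only that field's column, and low-cardinality columns compress well with run-length encoding.

```python
from typing import Any, Dict, List, Tuple

def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per field (hypothetical helper).

    Records may have different fields; a missing value is stored as None so
    every column keeps the same length and position i always means row i.
    """
    keys = list(dict.fromkeys(k for row in rows for k in row))  # first-seen order
    return {k: [row.get(k) for row in rows] for k in keys}

def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    runs: List[Tuple[Any, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

rows = [
    {"time": 1, "country": "JP", "bytes": 512},
    {"time": 2, "country": "JP", "bytes": 128},
    {"time": 3, "country": "US", "bytes": 256},
    {"time": 4, "country": "US"},  # schema-less: this record has no "bytes"
]
cols = to_columns(rows)

# SUM(bytes) scans only the "bytes" column, skipping missing values.
total_bytes = sum(v for v in cols["bytes"] if v is not None)
print(total_bytes)                         # 896
print(run_length_encode(cols["country"]))  # [('JP', 2), ('US', 2)]
```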
3) Connectivity
[Figure: queries from BI apps, a web app and td-command go through the Query API (REST API, JDBC/ODBC driver) to the Query Processing Cluster over Treasure Data’s columnar storage; results can go to MySQL or Postgres]
Most Difficult Challenge: Multi-Tenancy
• All customers share the Hadoop clusters (4 data centers)
• Resource sharing (burst cores), rapid improvement, ease of upgrade
[Figure: a Global Scheduler receives job submissions and plan changes and performs on-demand resource allocation across datacenters A, B, C and D, each running a local FairScheduler]
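The two-level design in the figure can be approximated in a few lines. This is a simplified sketch, not Treasure Data's scheduler: it assumes a hypothetical global scheduler that places each job in the datacenter with the most spare slots (demand may exceed capacity, as with burst cores), and it models each local FairScheduler as plain max-min fair sharing by progressive filling. Hadoop's real FairScheduler additionally supports pools, weights, minimum shares and preemption, and plan changes are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Datacenter:
    name: str
    slots: int  # total task slots in this datacenter (illustrative unit)
    demands: Dict[str, int] = field(default_factory=dict)  # tenant -> requested slots

    def spare(self) -> int:
        # Negative when tenants have asked for more than the cluster holds.
        return self.slots - sum(self.demands.values())

def submit(dcs: List[Datacenter], tenant: str, slots: int) -> Datacenter:
    """Hypothetical global scheduler: place a job in the datacenter with the most spare slots."""
    dc = max(dcs, key=lambda d: d.spare())  # ties go to the first datacenter in the list
    dc.demands[tenant] = dc.demands.get(tenant, 0) + slots
    return dc

def max_min_fair_share(capacity: int, demands: Dict[str, int]) -> Dict[str, float]:
    """Local scheduler sketch: max-min fairness by progressive filling.

    Repeatedly split the remaining capacity equally among tenants whose demand
    is not yet met; no tenant ever receives more than it asked for.
    """
    alloc = {t: 0.0 for t in demands}
    remaining = float(capacity)
    unmet = {t for t, d in demands.items() if d > 0}
    while unmet and remaining > 1e-9:
        share = remaining / len(unmet)
        for t in sorted(unmet):  # sorted only for deterministic iteration
            give = min(share, demands[t] - alloc[t])
            alloc[t] += give
            remaining -= give
        unmet = {t for t in unmet if demands[t] - alloc[t] > 1e-9}
    return alloc

dcs = [Datacenter("A", 90), Datacenter("B", 60)]
for tenant, slots in [("t1", 10), ("t2", 50), ("t3", 60), ("t4", 40)]:
    submit(dcs, tenant, slots)

for dc in dcs:
    print(dc.name, max_min_fair_share(dc.slots, dc.demands))
# A {'t1': 10.0, 't2': 40.0, 't4': 40.0}
# B {'t3': 60.0}
```

In datacenter A the three tenants ask for 100 slots against 90; the small tenant is fully served and the two large ones split the remainder evenly, which is the behavior a fair scheduler aims for when burst demand exceeds capacity.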
Conclusion
Big Data is too complex
• Needs simplicity
• Machete vs. Swiss Army Knife (feature creep)
IT is changing
• The value of software itself is decreasing
• Operation is the key
Treasure Data = Cloud + Big Data
• Currently focusing on Big Data reporting
• Instant value with a simple interface
We’re hiring top talent, please contact me :)
Appendix
Big Data Market Growth
Big Data Revenue Breakdown (average of IDC, Gartner and Wikibon stats)
CAGR 38%
“More than half a billion dollars in venture capital has been invested in new big data technology.” - Dan Vesset, IDC
“In 2012… BI and Analytics are rated #1 priorities.” - Ravi Kalakota, Gartner
“Big Data is the new definitive source of competitive advantage across all industries.” - Jeff Kelly, Wikibon
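For reference, a compound annual growth rate (CAGR) relates a start value $V_0$ and an end value $V_n$ over $n$ years. The slide does not state the period the 38% figure covers, so the five-year horizon below is only an illustrative assumption:

$$
\text{CAGR} = \left(\frac{V_n}{V_0}\right)^{1/n} - 1
\quad\Longleftrightarrow\quad
V_n = V_0\,(1 + \text{CAGR})^{n}
$$

$$
(1 + 0.38)^{5} = 1.38^{5} \approx 5.00
$$

Sustaining 38% a year for five years would therefore roughly quintuple revenue.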
Big Data Situation
[Figure: customer value over time from sign-up or PO for Treasure Data, AWS (EMR, Redshift) and on-premise solutions (Software A, Software B), with obsolescence over time]
Treasure Data Service Architecture
[Figure: user data sources (Apache, apps, RDBMS, other data sources) load into the Treasure Data columnar data warehouse; a Query Processing Cluster behind the Query API serves JDBC, REST and MapReduce jobs, with Hive and Pig (to be supported), to td-command and BI apps]
Our Own Open Source Technologies
We are open source natives and proud of our heritage. We’ve contributed to Hibernate, Hadoop, Cassandra, Memcached, KDE and MongoDB, among others. Our product reflects our deep commitment to the open-source community and is built on top of open source software we’ve authored and open sourced.
• Fluentd: a popular data collector daemon written in Ruby, www.fluentd.org (leading users: SlideShare/LinkedIn, One Kings Lane)
• MessagePack: a fast, compact serializer, www.msgpack.org (leading users: Pinterest, Redis)
Substantial commitment (code, packaging, documentation, sponsorship)
Tech marketing, possible lead gen
Example in Web Industry
Example Use Case – MySQL to TD
Example Use Case – MySQL to TD
Big Data for the Rest of Us
www.treasure-data.com | @TreasureData