1 © Cloudera, Inc. All rights reserved.
Cloudera: Supporting the Growth of Taiwan's Big Data Industry
Kai X. Miao (苗凯翔), Vice President, Cloudera Corporation
Big Data Is Only Getting Bigger: Particularly Relevant in the Telecom Space
[Figure: Data Growth, 1980 to today. Structured data: 10%; complex data: 90%. Data sources: user profiles, usage data, mobile & devices, network, marketing & CRM, public & trade. 3rd Platform clients: rich user experiences, IoT clients.]
By 2020, the world's data will reach 40 ZB.
In 2012, the world had 2.8 ZB.
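The two figures above imply a compound annual growth rate. A minimal Python sketch of that arithmetic, assuming steady exponential growth between 2012 and 2020 (an assumption for illustration; the slide gives only the two endpoints, and the function name is hypothetical):

```python
def implied_annual_growth(start_zb: float, end_zb: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start_zb into end_zb."""
    return (end_zb / start_zb) ** (1 / years) - 1

# Endpoints taken from the slide: 2.8 ZB in 2012, 40 ZB by 2020.
rate = implied_annual_growth(2.8, 40.0, 2020 - 2012)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption the data volume grows by roughly 39% per year, which means it doubles a little more often than every two and a half years.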
Traditional Data Architecture Can't Handle Big Data
Instrumentation
Storage Grid (Original Raw Data)
Collection
ETL Compute Grid
BI Reports + Interactive Apps
RDBMS/EDW
• Can't explore original raw data
• Can't scale
• Sending data to graveyard
A Major Limitation of RDBMS/EDW
• A schema must be created before any data can be loaded
• An explicit load operation must take place, transforming the data into the database's internal structure
• New columns must be added explicitly before data for those columns can be added to the database
Schema-on-Write
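To make the limitation concrete, here is a minimal Python sketch contrasting schema-on-write with schema-on-read, where raw records are stored untouched and a schema is applied only at query time. The `SchemaOnWriteTable` class, the `read_with_schema` function, and the sample records are hypothetical, written for illustration only; they are not the API of any database or Cloudera product.

```python
import json

class SchemaOnWriteTable:
    """Toy table: columns are fixed before any row can be loaded."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []

    def add_column(self, name):
        # New columns must be declared explicitly before data can use them.
        self.columns.append(name)
        for row in self.rows:
            row.append(None)

    def load(self, record):
        # Explicit load step: reject anything that does not match the schema.
        unknown = set(record) - set(self.columns)
        if unknown:
            raise ValueError(f"columns not in schema: {sorted(unknown)}")
        self.rows.append([record.get(c) for c in self.columns])


def read_with_schema(raw_lines, fields):
    """Schema-on-read: keep raw lines as-is, apply a schema only when querying."""
    for line in raw_lines:
        record = json.loads(line)
        yield tuple(record.get(f) for f in fields)


table = SchemaOnWriteTable(["user", "bytes"])
table.load({"user": "a", "bytes": 10})
try:
    table.load({"user": "b", "bytes": 20, "device": "tablet"})
    rejected = False
except ValueError:
    rejected = True  # the new "device" field needs add_column() first

raw = ['{"user": "a", "bytes": 10}',
       '{"user": "b", "bytes": 20, "device": "tablet"}']
late_view = list(read_with_schema(raw, ["user", "device"]))
```

In the schema-on-write case the second record is rejected until the table is altered, while the schema-on-read path can expose the new `device` field later without reloading anything.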
Expanding Data Requires A New Approach
©2014 Cloudera, Inc. All rights reserved.
1980s: Bring Data to Compute
Process-centric businesses use:
• Structured data mainly
• Internal data only
• "Important" data only
Now: Bring Compute to Data
Information-centric businesses use all data: multi-structured, internal & external data of all types
[Figure: relative size & complexity of data and compute in each era]
Hadoop Changes the Way Data Is Processed
The Traditional Way
$30,000+ per TB
• Hard to scale
• Network is a bottleneck
• Only handles relational data
• Difficult to add new fields & data types
Expensive, proprietary, "reliable" servers; expensive software licenses
Data storage (SAN, NAS)
Network
Compute (RDBMS, EDW)
The Hadoop Way
$300 - $1,000 per TB
• Scales out forever
• No bottlenecks
• Easy to ingest any data
• Agile data access
Inexpensive PC servers; low-cost, open source software
Compute (CPU), Memory, Storage (Disk)
A Strong Track Record of Innovation
2008: CLOUDERA FOUNDED BY MIKE OLSON, AMR AWADALLAH & JEFF HAMMERBACHER
2009: HADOOP CREATOR DOUG CUTTING JOINS CLOUDERA
2009: CLOUDERA RELEASES CDH, THE FIRST COMMERCIAL APACHE HADOOP DISTRIBUTION
2010: CLOUDERA MANAGER, THE FIRST MANAGEMENT APPLICATION FOR HADOOP
2011: CLOUDERA REACHES 100 PRODUCTION CUSTOMERS
2011: CLOUDERA UNIVERSITY EXPANDS TO 140 COUNTRIES
2012: CLOUDERA ENTERPRISE 4, THE STANDARD FOR HADOOP IN THE ENTERPRISE
2012: CLOUDERA CONNECT REACHES 300 PARTNERS
2013: CLOUDERA IMPALA, CLOUDERA NAVIGATOR, CLOUDERA SEARCH
2013: TOM REILLY JOINS AS CEO
OVER 800 PARTNERS IN CLOUDERA CONNECT
2014: THE ENTERPRISE DATA HUB LAUNCHED
2014: SERIES F FUNDING WITH INTEL AS KEY PARTNER
OVER 900 PARTNERS IN CLOUDERA CONNECT
2014: CLOUDERA ENTERPRISE 5
[Product logos: CDH, Cloudera Manager, Cloudera Enterprise 4, Enterprise Data Hub, Cloudera Enterprise 5. Tagline: Ask Bigger Questions]
Cloudera Company Overview
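Founded: 2008, co-founded by former employees of [...]
Employees: more than 900
World-class technical support: 24x7 global staff; proactive and predictive support programs
Mission critical: thousands of enterprise users; several hundred paying customers
Broadest ecosystem: more than 1,400 business partners
Cloudera University: more than 100,000 people trained
Open source leader: Cloudera employees are industry-leading developers and contributors
Our partnership with Intel will enable us to open up the market successfully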
Open Source ✔ Scalable ✔ Flexible ✔ Cost-Effective ✔
Managed ✖ Open Architecture ✖ Secure and Governed ✖
CLOUDERA'S ENTERPRISE DATA HUB
3RD PARTY APPS
BATCH PROCESSING: MAPREDUCE
ANALYTIC SQL: IMPALA
SEARCH ENGINE: SOLR
MACHINE LEARNING: SPARK
STREAM PROCESSING: SPARK STREAMING
WORKLOAD MANAGEMENT: YARN
STORAGE FOR ANY TYPE OF DATA (UNIFIED, ELASTIC, RESILIENT, SECURE)
FILESYSTEM: HDFS
ONLINE NOSQL: HBASE
DATA MANAGEMENT: CLOUDERA NAVIGATOR
SYSTEM MANAGEMENT: CLOUDERA MANAGER
SENTRY
DBMS, SENSORS, LOGS
SQOOP, FLUME
The Modern Information Architecture
WEB/MOBILE APPLICATION
ENTERPRISE DATA WAREHOUSE
ENTERPRISE REPORTING
BI / ANALYTICS
DATA MODELING
DEVELOPER SDKs
CLOUDERA MANAGER
CLOUDERA NAVIGATOR
ENTERPRISE DATA HUB
Security Admins, System Admins, Engineers, Data Scientists, Analysts, Business Users
Customers & End Users
SYS LOGS, WEB LOGS, FILES, RDBMS
Customer Success Across Industries
• Financial & Business Services • Telecom • Technology • Healthcare Life Sciences • Media • Retail Consumer • Energy • Public Sector
Categories of Cloudera Big Data Use Cases
Customer 360-Degree Analysis
• Enhanced customer experience & support
• Personalization, targeted offerings, loyalty programs
• Sentiment analysis
Channel Optimization
• Campaign management
• Selection process optimization
Supply Chain Optimization
• Manufacturing process efficiency
• Supplier/merchant management
Risk Management
• Fraud detection
• Intrusion detection & digital forensics
Audit
• Regulatory compliance (retention, privacy)
• Usage analysis and mediation
• e-Discovery
Market Intelligence
• Competitive analysis
• Economic factor analysis
• Customer segmentation
Data Services
• Data as-a-product
• Data enriched with insights/inferences
Where Does Manufacturing Data Come From?
Devices & Sensors
• Device Readings • Device Performance • Device Diagnostics • Battery / Power Consumption • Software Logs • Environmental Interactions • R&D • Quality / Testing
Factory & Operations
• MES • Sensors • Video / Surveillance • Line Productivity • Machines • Staffing / Scheduling
Supply Chain & Inventory
• ERP • Supplier / Manufacturer • Orders / Receivables • Commodity Supplies / Prices
Marketing & CRM
• Transactions • Accounts • Warranties / Aftermarket • Customer Service Logs • Campaigns / Promotions • Website / SEO • Affiliates / Merchants • Surveys • Competitive Intelligence
Public & Trade
• Market Intelligence • Policy / Regulation • Demographic / Census • Psychographic • Inflation / Macroeconomic • Gas Prices • Labor Statistics • Social / Search • Public Health Data • Clinical Studies • Store Schematics • Journals / Editorial • Seismic / Speculation
Optimizing Operations: Chevron
Chevron is reducing the cost of sending deepwater drillships into the ocean by identifying oil reservoirs more precisely.
Challenge
• Reduce the cost of sending deepwater drillships out into the ocean ($1M/day)
• Do a better job of processing the vast amounts of data that can help identify reservoirs of oil (0.5 PB)
Solution
• Chevron gathers information in five dimensions: the x and y coordinates of both the wave's source and target, along with the time it was collected.
• Construct a picture of what the terrain looks like under the ocean floor
• The company uses CDH to sort that data.
Benefit
• The more data Chevron can collect, the better it can find pockets of oil and natural gas underground.
• Hadoop can do some of the seismic data processing less expensively: on average at about one tenth the cost of traditional technologies.
Automotive & Industrial
Background: Caterpillar. Caterpillar Inc. is headquartered in Illinois, USA. It is one of the world's largest manufacturers of construction machinery and mining equipment, gas engines, and industrial gas turbines, and also one of the world's largest diesel engine manufacturers.
Use Case: Proactive Quality Assurance. Build machine learning algorithms that identify production anomalies prior to field testing and find performance flaws that could not be identified in R&D.
Problem: Silos Limit Options. Legacy systems hold historical data from production line telemetry, factory surveillance and sensors, call centers, in-car telematics, etc. That data is useless if it is kept offline and in silos.
Solution: Anomaly Detection. Spark includes MLlib, a library of machine learning algorithms for large data, enabling clustering to identify outliers from typical production patterns.
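The idea of using clustering to identify outliers from typical production patterns can be sketched in a few lines. The following is a pure-Python stand-in using plain k-means, not Spark MLlib and not Caterpillar's actual pipeline; the telemetry readings, the starting centroids, and the distance threshold are all made up for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's k-means on tuples; returns the final centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return centroids

def outliers(points, centroids, threshold):
    """Flag points farther than `threshold` from every centroid."""
    return [p for p in points
            if min(math.dist(p, c) for c in centroids) > threshold]

# Hypothetical telemetry: (temperature, vibration) readings per unit.
readings = [(70.0, 1.0), (71.0, 1.1), (69.5, 0.9),
            (90.0, 3.0), (91.0, 3.1), (89.5, 2.9),
            (80.0, 9.0)]  # the last reading fits neither pattern
centers = kmeans(readings, centroids=[readings[0], readings[3]])
anomalies = outliers(readings, centers, threshold=5.0)
```

Here only the last reading ends up far from both cluster centers. In practice the threshold would have to be tuned on real data, and a distributed library would replace the toy `kmeans` loop.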
Telco Consumer Profile
Contact, Credit info, date of renewal
Device type: phone, mobile broadband, tablet
Data/Voice Usage and Top-up
App Preference, interests, usage
Usage trends: time of day, data amounts
Location
Website usage
Social Networks: likes/dislikes, profile info
Customer 360° View
Use Case: Actionable Sentiment Analysis. Isolate customer profiles to personalize the mix of plans, services, and offers based on the convergence of information from network, GPS, social, call centers, accounts, etc.
Problem: Can't Scale Beyond Silos. Current systems cannot integrate social, telemetric, and systems data in real time with historical data to tailor product mix and incentive plans to the user.
Solution: Calculate Anything. HBase is a real-time database accommodating complex historic data. Spark and Impala converge ETL, analytics, and reporting for on-demand modeling.
Where Is the Financial Services Data? Mapping and Consolidation Are the Tip of the Iceberg for Big Data
Retail Banking
• Bank Transactions • Customer Data • ATM Activity • Online Activity • Mobile Activity • Demographic / Census Data • Marketing / CRM • Social / Sentiment
Credit Cards & Payments
• Card Transactions • Customer Data • Online Activity • Demographic / Census Data • Marketing / CRM • Integration with Retailers / Loyalty • Social / Sentiment
Investment Banking
• Trade Data • Customer Data • Web Logs • Research / Publications • Market Data • Communications / Documentation
Insurance
• Claims / Policy Data • Customer Data • Demographic / Census Data • Weather Data • Vehicle Telemetry • Video / Surveillance • Sensors • Internet of Things
Services & SROs
• Trade Data • Communications / Documentation • Market Data • Research / Publications • Surveys
Customer Spotlight: Allstate
Combining 80+ years of data across all business units & all 50 states.
Challenge
Data silos spread across a company with 80+ years' history
• Analysis on 1 state takes 24 hours
• Can't analyze all 50 states at once
Solution
Universal data archive on Cloudera
• Supports storage, ETL, applied math
Benefit
Holistic analysis on all 50 states in 16 hours
• 75X faster performance
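The 75X figure is consistent with the other numbers on this slide. A short Python check, assuming (this is an interpretation, not something the slide states) that the baseline is analyzing the 50 states one after another at 24 hours each:

```python
STATES = 50
HOURS_PER_STATE_BEFORE = 24   # from the slide: one state took 24 hours
HOURS_ALL_STATES_AFTER = 16   # from the slide: all 50 states in 16 hours

# Assumption: before, states would have had to be analyzed one after another.
hours_before = STATES * HOURS_PER_STATE_BEFORE
speedup = hours_before / HOURS_ALL_STATES_AFTER
print(f"{hours_before} h -> {HOURS_ALL_STATES_AFTER} h: {speedup:.0f}x faster")
```

Under that reading, 1,200 hours of sequential analysis shrink to 16 hours, which is the 75X improvement quoted above.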
Thank you!