1
The Oracle Approach to the Pan-Big-Data Era
贺辉群 (David He)
Industry Solution Manager for Big Data
Oracle Enterprise Architecture
2
What Is Big Data?
3
What Is Big Data?
4
Definition of Big Data
Big Data: Techniques and
Technologies that Enable Enterprises
to Effectively and Economically
Analyze All of their Data
- IDC, Carl Olofson
5
Schema on Write vs. Schema on Read
Traditional “Schema on Write”
– Required data must first be identified and
modelled in a “schema”
– Data is integrated and loaded via ETL
– Value realized only after this is done
Big Data “Schema on Read”
– Required data captured in code for each
program accessing the data
– Data is integrated in code in map/reduce
framework
– Value realized faster
The trade-off: time to value from the data versus flexibility of data analysis
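The contrast above can be sketched in a few lines of Python. This is only an illustration: the event records, the field names and the `WRITE_SCHEMA` tuple are hypothetical and are not taken from the slides.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no upfront schema).
RAW_EVENTS = [
    '{"customer": "c1", "amount": 25.0, "channel": "atm"}',
    '{"customer": "c2", "amount": 40.5}',
    '{"customer": "c1", "amount": 10.0, "sentiment": 0.8}',
]

# Schema on write: the schema is fixed before loading, and records that do
# not fit it are rejected during the ETL step.
WRITE_SCHEMA = ("customer", "amount", "channel")


def load_with_schema(raw_lines, schema):
    """Return (rows that match the schema, raw lines that were rejected)."""
    table, rejected = [], []
    for line in raw_lines:
        record = json.loads(line)
        if set(record) == set(schema):
            table.append(tuple(record[col] for col in schema))
        else:
            rejected.append(line)
    return table, rejected


# Schema on read: each program decides at read time which fields it needs.
def total_amount_by_customer(raw_lines):
    totals = {}
    for line in raw_lines:
        record = json.loads(line)
        customer = record["customer"]
        totals[customer] = totals.get(customer, 0.0) + record["amount"]
    return totals


def average_sentiment(raw_lines):
    records = (json.loads(line) for line in raw_lines)
    scores = [r["sentiment"] for r in records if "sentiment" in r]
    return sum(scores) / len(scores) if scores else None
```

With schema on write, two of the three events never reach the table. With schema on read, both programs can use every record that carries the fields they care about, at the price of repeating the parsing logic in each program.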
6
Capability Assessment: Hadoop vs. RDBMS
[Radar chart, scale 0 to 5, comparing Hadoop on BDA with Oracle on Exa on the following axes]
Tooling maturity
Stringent Non-Functionals
ACID transactions
Security
Variety of data formats
Data sparsity
ETL complexity
Cost effectively store low value data
Ingestion rate
STP
7
Unified Data Analytics Platform
Unified Analytics API
SQL R MR
Unified Analytics Processing Platform
Hadoop RDBMS
IB
Management Framework and Tools
8
Big Data and Analytics
9
All the data
Better decisions
Faster execution
Big data analytics
10
Big Data Use Cases
Find unknown relationships
Correlate disparate result sets
Discover opportunities to reduce costs
11
Discovery and Insight
The Value of Not Requiring a Pre-defined Schema
Customers (System A)
Products (System C)
Orders (System D)
Social Media (System D)
Call Center (System E)
Global dimensions: common across all systems, e.g. Product ID, Period, Location, Themes, Customer ID
Global metrics: common across some systems, e.g. Cost, Count
Derived metrics: common across some systems, e.g. Sentiment Score, Avg Resolution Time, Customer Satisfaction
Unique dimensions or metrics: Customer type, Age, Profitability, Fidelity
Unique dimensions or metrics: Themes, Competitors, Klout
Support for jagged data: for diverse structures such as product specs
Table-free = a flexible, adaptive data analysis architecture that does not require over-engineering
12
Big Data Sparks New Insights
Correlations and patterns from
disparate, linked data sources yield
the greatest insights and
transformative opportunities - Gartner
13
Big Data and Analytics: Distinguishing Reporting from Analytics
Descriptive (Reporting): Dashboards; Hindsight; What happened?; Shows Results; Relational / OLAP
Predictive (Analytics): Visualization; Insight; What will happen?; Predicts Results; Hadoop / NoSQL
14
Three Big Data Differences
Scale Trumps Smarter
– No More Sampling
– Large Data Sets + Simple Algorithms > Samples + Complex Algorithms
Scale Trumps Better
– The Real World is Messy
– Large Data Sets w/ Bad Data > Small Data Sets w/ No Bad Data
Correlation is More Important Than Exactitude
– We Are After Trends, Not Values
– This is not for your billing system
Suggested Reading: Big Data: A Revolution That Will Transform How We Live, Work, and Think
by Viktor Mayer-Schönberger and Kenneth Cukier
This Requires a Change in How You Think!
15
Big Data in Financial Services
16
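Application Patterns for Big Data in Financial Services
IT optimization
– Manage and process data better, faster, more economically, and more sensibly
Big data analytics
– Analyze all data, regardless of scale, structure, or velocity
Business process transformation
– Use big data to improve existing operational processes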
17
IT Optimization
18
Big Data Usage Pattern: ETL and Batch Processing Workloads on Hadoop
Integrate
SQL
SQL
NoSQL
• Scalable
• Flexible
• Cost Effective
DW & BI
Analytics
Web
Mainframe
19
Objectives
Large US Regional Bank
Comply with regulations requiring more
data to support stress testing
Reduce IT costs & streamline processing
by eliminating duplicate data stores
Solution
Single, reliable BDA/Exadata-based ODS
supporting all downstream systems
Landing zone & archival repository for
both structured & unstructured data
Use Exadata as “19th” BDA node
Operational Data Store
Mainframe, RDBMS, more
BDA Exadata
• Agile business
model
• All data
• De-normalized & Partial-normalized
• Normalized
• Aggregate data
• EDW
Oracle Enterprise Manager
Oracle Data Integrator
Data Delivery
Master S1
Master S2
Master Sn
SOA/API
CRMS
Other
Results & Benefits
Fast access to 85% more data
Lower costs, simplified architecture
and fast time to value
20
Thomson Reuters
Objectives
Maximize cross-sell opportunities
Lower cost and complexity
Solution
Economically capture all customer activity
Testing 50M events/sec ingest rates into
the Oracle Big Data Appliance
Feeds Exadata EDW for customer
profitability & segmentation analysis
Rick King
Chief Operating Officer for Technology
Thomson Reuters
“Oracle's engineered systems… are geared
toward high performance big data delivery - and
that is exactly the type of work we do”
BDA Exadata Exalytics
EDW
Sandbox & DR
Event Capture
& Store
Interactive
Analytics
Research
Applications
Upsell/Cross Sell
21
Big Data Usage Pattern: Expand Data Warehouse with Granular Data Store
Marts
Data Warehouse
Σ Σ
Business
Intelligence
Archiving
• Online
• Scalable
• Flexible
• Cost Effective
Data Factory
22
End-to-end business information environment that provides accurate, transparent and timely information to shareholders, regulators and management
Objectives
Tier 1 Global Bank: New Information Management Architecture
Results & Benefits
Reduce complexity and risk of changes
Reduce cost of operation
Increased stability & performance
Solution
7 Exadata Racks
16 Node Hadoop Cluster – 33TB
Oracle Loader for Hadoop
23
Big Data Analytics
24
Ad-hoc
Big Data Usage Pattern: Scale-out Information Discovery
• Online
• Scalable
• Flexible
• Cost Effective
Data Factory
Continuous On-Demand
25
Enable customers to learn about stocks and increase buying confidence
Cultivate the advisor-client relationship online and acquire smaller clients
Objectives
Credit Suisse: Increased sales through instant access to information
Results & Benefits
Incremental sales for Bank based on this
application for 5 years.
Improved customer relationships
Solution
Information Discovery on pooled research
data sets in multiple unstructured formats
Oracle powers their internal application that
advisors utilize to quickly find information on
financial metrics
26
Big Data Usage Pattern: Instant Responses based on Historical Analysis
Business
Intelligence
• Online
• Scalable
• Flexible
• Cost Effective
Integrate
Event Decisions
27
Solution Architecture: Real-time Personalized Offers
Extract, Transform and Load
Front Office
Channel Systems
Call Center
Reporting / Analytics
Oracle Endeca / Oracle Business Intelligence /
Oracle R Enterprise / Oracle Exalytics
Customer
Database
Content
Recommendations
Oracle Big Data Appliance
Content Presentment / Disposition
ATM
Branch
Online
Mobile
Back Office Systems
External Data
Debit Card Transactions
Credit Card Transactions
Customer CRM Data
Reference Data
Clickstream Data
Card Merchant Data
Social Data
Oracle RTD / Oracle OEP / Oracle Fusion Middleware
Oracle Data Integrator
Cloudera / Oracle NoSQL / Big Data Connectors
28
Omni-Channel Offers with 360 View of Customer
Personalized Offers to Any Channel in Real-time
Real-time profile updates
Self-learning, closed loop model
Best-in-class modeling across
structured and unstructured data
Add new dimensions in your
recommendation process
One View of Customer
Deliver highly personalized offers to
any channel in real-time
Channel Systems
Call Center
ATM
Branch
Online
Mobile
29
New Wholesale Bank Initiatives
Provide Wholesale Merchant Offers & Mobile Payments
Based on Location and individual
customer preferences
Millions of Customers X Thousands
of Merchant Offers
Protect Payments and Drive
Wholesale Deposits
Become Trustee for your Customer’s
Commercial Identity
30
Real-time Location-Based Offers
Tier 1 Global Bank
Objectives
Customer profile enrichment with Big Data
Capture credit card POS and merchant data with
event processor
Determine geo location of POS and nearby bank
wholesale customers
Leverage real-time decision engine to generate
offer to mobile device
Solution
Increase revenue through real-time, location-based offers
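The location-matching step on this slide (locate the POS, find nearby wholesale customers, generate an offer) could look roughly like the sketch below. The `Merchant` record, the 1 km radius and the great-circle distance are illustrative assumptions, not the bank's actual implementation, which the slides attribute to an event processor and a real-time decision engine.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Merchant:
    """Hypothetical wholesale customer with a location and an offer."""
    name: str
    lat: float
    lon: float
    offer: str


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def offers_near(pos_lat, pos_lon, merchants, radius_km=1.0):
    """Return offers from merchants within radius_km of the POS, nearest first."""
    nearby = []
    for merchant in merchants:
        distance = haversine_km(pos_lat, pos_lon, merchant.lat, merchant.lon)
        if distance <= radius_km:
            nearby.append((distance, merchant))
    nearby.sort(key=lambda pair: pair[0])
    return [merchant.offer for _, merchant in nearby]
```

In a real deployment the linear scan would be replaced by a spatial index, and the offer would also depend on the customer profile rather than distance alone.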
31
Business Process Transformation
32
Oracle Financial Services Analytical Applications
Analytical Tools for Banking, Capital Markets and Insurance
Performance Management & Finance
Model Risk
Performance Management
Customer Insight
Governance & Compliance
Risk Management
Hedge Management IFRS 9 – IAS 32/39
ICAAP/Risk Appetite
Customer Profitability
Stress Testing
Loan Loss Forecasting Pricing Management
Risk Adjusted Performance
Know Your Customer
Risk Management
Operational Risk & Compliance Mgt. Regulatory Compliance (Financial Crime)
Customer Insight
Anti-Money Laundering
Trading Compliance Broker Compliance
Fraud Detection Operational Risk
Credit Risk
Institutional Performance
Retail Performance
Marketing
Customer Segmentation
Capital Management
Liquidity Risk
Economic Capital Advanced (Credit Risk)
Operational Risk Economic Capital
Balance Sheet Planning
Profitability
Asset Liability Management
Market Risk
Basel Regulatory Capital
Retail Portfolio Models and Pooling
Funds Transfer Pricing
Reconciliation
Channel Insight
Compliance Risk
Business Continuity Risk
Counterparty Risk
Audit
FSDF
33
OFSAA – Current Architecture
34
Oracle Big Data Solution
35
Oracle Big Data Solution
Stream Acquire – Organize – Analyze
Oracle BI Foundation Suite
Oracle Real-Time Decisions
Endeca Information Discovery
Decide
Oracle Event Processing
Oracle Big Data Connectors
Oracle Data Integrator
Oracle Advanced Analytics
Oracle Database
Oracle Spatial & Graph
Apache Flume
Oracle GoldenGate
Oracle NoSQL Database
Cloudera Hadoop
Oracle R Distribution
36
Why Make Big Data a Divided World?
VS
37
Unified Data Analytics Environment
Unified Analytics API
SQL R MR
Unified Analytics Processing Platform
Hadoop RDBMS
IB
Management Framework and Tools
38
Joint Data Analysis across Oracle and Hadoop Using SQL
SQL Analytics on ALL data
Expand the data pool for
analytics leveraging Hadoop
Stream Hadoop resident data
through Big Data Connectors
for SQL processing
Use the full power of Oracle
SQL on all data
Or use Oracle Loader for
Hadoop to integrate data in
Oracle Database
SQL
Hadoop Oracle Database
IB
39
Joint Data Analysis across Oracle and Hadoop Using R
R Analytics on ALL data
Expand the data pool for
analytics leveraging Hadoop
Improve scalability and
performance for R without
changes to your programs
Dynamically leverage Hadoop
through Big Data Connectors
to execute R analytics
R
Hadoop Oracle Database
IB
40
Unified Data Analytics Platform
Real-Time
Analytics
Thousands of
Users
Secure and
Available
All Data Online and Ready to Use
Large Scale
Systems
Cost Effective
41
Logical Architecture
42
Solution Architecture: Real-time Personalized Offers
43
Oracle Big Data Solution
Oracle BI Foundation Suite
Oracle Real-Time Decisions
Endeca Information Discovery
Decide
Oracle Advanced Analytics
Oracle Database
Oracle Spatial & Graph
Acquire – Organize – Analyze
Oracle Big Data Connectors
Oracle Data Integrator
Stream
Oracle Event Processing
Apache Flume
Oracle GoldenGate
Oracle NoSQL Database: scalable key-value store
Cloudera Hadoop: scalable, low-cost data storage and processing engine
Oracle R Distribution: statistical analysis framework
44
Hadoop
The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers
using simple programming models. Hadoop is designed to scale up from
single servers to thousands of machines, each offering local
computation and storage. Rather than rely on hardware to deliver high-
availability, the library itself is designed to detect and handle failures at
the application layer, so delivering a highly-available service on top of a
cluster of computers, each of which may be prone to failures.
Framework for distributed processing
Large Data Sets
Clusters of Computers
Simple Computing Models
Highly Available Service
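The "simple programming model" referred to above is MapReduce. The following is a single-process sketch of its three phases using word count as the example; it does not use the Hadoop API, and the function names are made up for illustration. On a real cluster the framework runs the map and reduce functions in parallel on many nodes and performs the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle/sort: group all values by key, ordered by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key."""
    return key, sum(values)


def run_job(splits):
    """Run map, shuffle and reduce in sequence over a list of input splits."""
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each map call sees only its own split and each reduce call sees only one key, the work can be spread across machines and a failed task can simply be rerun elsewhere.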
45
Big Data Appliance X3-2
Sun Oracle X3-2L Servers with per server:
• 2 * 8 Core Intel Xeon E5 Processors
• 64 GB Memory
• 36TB Disk space
Integrated Software:
• Oracle Linux
• Oracle Java JDK
• Cloudera Distribution of Apache Hadoop (CDH)
• Cloudera Manager
• Oracle R Distribution
• Oracle NoSQL Database
All integrated software (except NoSQL DB CE) is supported as part of Premier Support for Systems and Premier Support for
Operating Systems
46
Big Data Appliance Product Family
Starter Rack is fully cabled and
configured for growth, with 6 servers
In-Rack Expansion delivers 6 server
modular expansion block
Full Rack delivers optimal blend of
capacity and expansion options
Grow by adding racks – up to 18 racks
without additional switches
47
Divide Full Rack BDA in multiple clusters
Provide more flexible configurations for customers
Automatic reconfiguration when expanding the cluster
Flexible Configuration
6 Node Cluster
12 Node Cluster
Example Configuration
48
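Data stored redundantly in multiple copies
No NameNode single point of failure
Automatic NameNode failover
Metadata synchronized across multiple copies
Oracle Big Data Appliance High-Availability Solution
Cloudera CDH 4.1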
Active Name Node
Passive Name Node
49
Engineered for Quicker Time to Value at Lower Cost
http://www.oracle.com/us/corporate/analystreports/industries/esg-big-data-wp-1914112.pdf
ESG believes that a "buy" versus "do-it-yourself"
approach will yield roughly one-third faster time-
to-market benefit improvement...
[Bar chart: Time to Market (Weeks), Oracle Big Data Appliance vs. Build it yourself]
[Bar chart: Cost: Initial Infrastructure/Tasks, Oracle Big Data Appliance vs. Build it yourself]
[…] nearly 40% cost savings versus IT
architecting, designing, procuring, configuring, and
implementing its own big data infrastructure.
50
Mammoth: One-Command Installation and Configuration
BDA’s Single-Command Install, Patch and Upgrade Utility
– Distributes the binaries and installs all BDA software based on a set of
configuration specifications
– Sets all optimized parameters for OS, JVM, Hadoop and Oracle SW
– Applies (one-off) patches and Updates for:
OS, Kernel, JDK and Firmware (switch, HBA, disk controllers etc.)
Cloudera Software Stack and required components
Oracle NoSQL Database and Oracle Big Data Connectors
Specifically built by Oracle for BDA
51
Integrated Management Framework
Management Infrastructure Combines EM and Cloudera Manager
Quick view of Hardware and Software status
in Oracle Enterprise Manager
52
Oracle Audit Vault and Database Firewall
Databases: Relational Data
Hadoop: Non-Relational Data
Operating Systems
Audit Vault
One consolidated, secure repository for all audit data
Centralized platform for audit reporting, alerting and policy management
53
Oracle Big Data Appliance
Oracle Audit Vault monitoring enabled at or after
BDA installation
Capture all HDFS access and MapReduce
activity
– Who initiated the activity
– What data was accessed
– When did the activity take place
54
Kerberos Integration
Kerberos Pre-Configured upon Install
– Point at an external Key Distribution Center
– Install HA Key Distribution Center on the BDA
Strong authentication for
– All Hadoop services
– Oracle Big Data Connectors
Ensure that users are who they claim to be
Ensure authentication across the enterprise
Automatic Authentication
55
LDAP and Network Encryption
LDAP
– Link Kerberos to existing LDAP services
– Simplify permissions management for all Hadoop services
– Centrally manage permissions for the enterprise
Network Encryption [1]
– Ensure that data in-motion is protected
– Data moved within Hadoop jobs is encrypted
– Simple installation choice for Big Data Appliance
Secure Transmission, Integrated Authentication
[1] Currently planned for release 2.3.1
56
Oracle Loader for Hadoop
[Diagram: MAP tasks, SHUFFLE/SORT, REDUCE tasks, Oracle Loader for Hadoop, Oracle Data Warehouse]
Ultra-high-speed data loading
Uses Hadoop's parallelism to reduce the CPU load that loading places on the database
Online and offline modes
Continuous data input
Up to 15 TB/hour
57
Oracle Loader for Hadoop Supports Multiple Data Sources
Input sources:
Delimited text files
Hive tables
Oracle NoSQL Database
User written input format (various data sources)
[Diagram: MAP tasks, SHUFFLE/SORT, REDUCE tasks, Oracle Loader for Hadoop, Oracle Data Warehouse]
58
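Oracle SQL Connector for Hadoop
Direct, standard SQL access from Oracle Database to data on Hadoop
Access data on Hadoop through standard, full-featured SQL
Join data in the database with data on Hadoop in a single query
A lower-latency solution for accessing Hadoop data
[Diagram labels: DCH, External table, DCH, OSCH, SQL query, InfiniBand, HDFS client, HDFS, Oracle Database]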
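The idea of joining database rows with file-resident data through SQL can be illustrated with Python's built-in `sqlite3` module. This is only an analogy: the table names, the sample data and the copy into a temporary table are assumptions made for the sketch, whereas an Oracle external table reads the HDFS files in place and does not copy them.

```python
import csv
import io
import sqlite3

# Hypothetical content of a delimited file that would live on HDFS.
HDFS_FILE = "c1,login\nc2,purchase\nc1,purchase\n"


def events_per_segment(file_text):
    """Join a database table with file-resident events and count by segment."""
    conn = sqlite3.connect(":memory:")
    # Relational data that already lives in the database.
    conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, segment TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("c1", "retail"), ("c2", "wholesale")],
    )
    # Stand-in for the external table: expose the file's rows to SQL.
    conn.execute("CREATE TEMP TABLE events_ext (customer_id TEXT, event TEXT)")
    conn.executemany(
        "INSERT INTO events_ext VALUES (?, ?)",
        csv.reader(io.StringIO(file_text)),
    )
    rows = conn.execute(
        "SELECT c.segment, COUNT(*) "
        "FROM customers c JOIN events_ext e ON e.customer_id = c.id "
        "GROUP BY c.segment ORDER BY c.segment"
    ).fetchall()
    conn.close()
    return rows
```

The point of the pattern is that the query author writes ordinary SQL and does not need to know which side of the join is stored outside the database.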
59
Oracle Data Integrator
Simplify Map Reduce
Automatically generates
MapReduce code
High performance loads into
Data Warehouse leveraging
both OLH and OSCH
Manages the process across
platforms
OLH
&
OSCH
Oracle
Data
Integrator
60
Oracle Big Data Connectors
Integrating Hadoop, NoSQL, and RDBMS
[Diagram labels: Hive, HDFS, Datafile_part_1, Datafile_part_x, Oracle Database, Oracle SQL Connector for Hadoop, External table, SQL query, Aggregation, KVInputFormat external table, Oracle NoSQL Database, Hadoop, Oracle Data Integrator, Oracle Loader for Hadoop, Relational structured data]
• Oracle Loader for Hadoop
• Oracle SQL Connector for Hadoop
• Oracle Data Integrator Application for Hadoop
• Oracle R Connector for Hadoop
61
Oracle XQuery for Hadoop
Acquire – Organize – Analyze
Oracle Big Data Connectors
Oracle Data Integrator
Oracle Loader for Hadoop
OXH is a transformation engine for Big Data
XQuery language executed on the Map/Reduce framework
XQuery
for $ln in text:collection()
let $f := tokenize($ln)
where $f[1] = 'x'
return text:put($f[2])
Map/Reduce Execution Plan
M/R
M/R
M/R
M/R
Map/Reduce
Worker Nodes
HDFS
OXH
Engine
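For readers unfamiliar with XQuery, the query on this slide reads each line of text, splits it into fields, keeps the lines whose first field is 'x', and writes out the second field. A plain Python equivalent is sketched below. It assumes whitespace-separated fields and skips lines that have no second field; the function name is made up for illustration and nothing here is part of OXH.

```python
def select_second_field(lines, key="x"):
    """Return the second field of every line whose first field equals key."""
    selected = []
    for line in lines:
        fields = line.split()  # assumption: fields are separated by whitespace
        if len(fields) >= 2 and fields[0] == key:
            selected.append(fields[1])
    return selected
```

OXH compiles the declarative query into Map/Reduce jobs, so the filter runs in parallel across worker nodes instead of in a single loop like this one.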
62
Oracle XQuery for Hadoop
Ease of Use
Parallel distributed parsing of big XML files
Standard declarative transformation language
Comprehensive support for nested data structures
No schema setup required
Rich built-in function library
Extensible with user-defined Java functions
Multiple output destinations from a single query
Key Features
63
Oracle XQuery for Hadoop
Input / Output Data Formats
Input
HDFS
Oracle
NoSQL DB
Text
CSV
JSON
Avro
XML
Output
HDFS
Oracle
NoSQL DB
Text
CSV
JSON
Avro
Oracle NoSQL DB
XML
Oracle Database
Map/Reduce Job Counters
64
Oracle R Enterprise
Oracle R Enterprise brings R’s statistical functionality closer to the data
1. Eliminate R’s memory constraint by enabling R
to work directly & transparently on database objects
– Allows R to run on very large data sets
2. Architected for Enterprise production infrastructure
– Automatically exploits database parallelism without requiring parallel R
programming
– Build and immediately deploy
3. Oracle R leverages the latest R algorithms and packages
– R is an embedded component of the DBMS server
– R will run across your Hadoop cluster *
* Future feature
65
Oracle Advanced Analytics
Oracle Advanced Analytics extends Oracle Database
into a comprehensive analytical platform
– Predictive analytics, data mining, text mining, statistical analysis, advanced numerical computations
Scalable and parallel: analyze huge volumes of data
Tightly integrated with SQL: share results of analytics
throughout enterprise
Built for data analysts
66
Oracle NoSQL Database
Simple Key-Value Data Model
Horizontally Scalable
Highly Available
Simple administration
ACID Transactions at scale
Transparent load balancing
Elastic Configuration
Commercial grade software and support
Scalable, Highly Available, Key-Value Database
Application
Storage Nodes: Datacenter B
Storage Nodes: Datacenter A
Application
NoSQL DB Driver
Application
NoSQL DB Driver
Application