Microsoft's Vision for Data Warehousing
Outline
Future trend: appliances
New features in PDW AU3
Using the hub-and-spoke architecture
PDW & Big Data
Customer case studies
Comparison with other MPP products
Next-generation data warehouse platform features at a glance
[Chart: features customers plan to use that will actually grow over the next 3 years. Horizontal axis: decreasing usage to increasing usage (-50% to 100%). Vertical axis: narrow commitment to broad commitment.]
Features plotted: DBMS Built for Transactions, SMP, Centralized EDW, Analytics within EDW, Analytics Outside EDW, Blades in Racks, DBMS Built for DW, Server Virtualization, DW Bundles, Security, DW Appliance, Mixed Workloads, Data Federation, Columnar DBMS, Streaming Data, SOA, Low-Power Hardware, In-Memory DBMS, SaaS, Open Source OS, Open Source Reporting, Open Source Data Integration, Software Appliance, Public Cloud, Open Source DBMS, Advanced Analytics, Data Quality, HA for DW, Web Services, MPP, 64-bit, MDM, Real-time DW
Source: TDWI
Declining usage despite commitment
Flat growth, good/moderate commitment
Good growth, good commitment
Good growth, moderate commitment
Good growth, small commitment
Areas of strategic investment for Microsoft
Microsoft Confidential
Big Data
Understanding market demand: data warehouse storage use over the next three years
Source: TDWI Report – Next Generation DW
Data volume managed by the data warehouse (today / in 3 years)
Less than 1 TB: 41% / 17%
1 - 3 TB: 21% / 18%
3 - 10 TB: 19% / 25%
More than 10 TB: 17% / 34%
Don't know: 2% / 6%
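Performance: the ability to analyze large volumes of data.
Data keeps growing, so elastic scalability is the key consideration: from 10s of TBs, to 100s of TBs, to PBs.
The appliance model is widely adopted mainly because of the performance and elastic scalability of a balanced appliance (MPP solutions).
Jim Kobielus, Forrester Research
Appliances are the main trend for the next four years (2015: US$4 billion)
Cloud DW is a longer-term trend
Standalone servers decline slowly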
Source: MS internal analysis, DBSMIT Cloud Market Opportunity Forecast
[Chart: DW software license revenue, US$ billions, FY10 to FY15]
Legend: Public Cloud, Private Cloud, Appliances/RA, Traditional
Share ('15): 4.6%, 5.0%, 30.0%, 60.4%
CAGR: -0.3%, 26.2%, 7.1%
Data labels, FY10 to FY15: 7.9, 8, 8.2, 8.2, 8.1, 7.7 and 1.1, 1.5, 1.9, 2.4, 3, 3.8
Future trends in data warehousing
SQL Server 2012 data platform
BI Applications
Third-Party BI Applications
Reporting Services Reports
Excel Workbooks
PowerPivot Applications
SharePoint Dashboards and Scorecards
BI Platform: Analysis Services, Reporting Services
EIM Platform: Integration Services, Master Data Services, Data Quality Services
Hadoop
Windows/Azure
Data Sources
AppFabric
StreamInsight
Microsoft MDS / DQS solution
Build models quickly and easily in Excel
Powerful MDM capabilities: hierarchies, validation rules, versions, and workflows
De-duplication and matching through integration with Data Quality Services
DQS (Data Quality Services) future plans
Data quality in the Azure cloud
Data cleansing in Excel
DQS API
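Public data, client
• discover • analyze • create • transform • clean • curate • govern • secure • publish
• secure • govern • transform • publish
Visibility
Control
IT pros
SQL Server DQS + MDS solution
Enterprise information management and data warehousing
Relate internal and external data through models
Collaborative data enrichment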
Publisher information workers & developers: data analyst, Power BI user, data steward, data scientist, analytics developer, app developer
• discover • create • clean
Consumer information workers: business decision maker, business data manager
Web data
Public curated
Link to external data
Internal data
Publish internal data for users to explore (of data, data services and data models)
DW + EIM
Microsoft is a leader in data warehousing and business intelligence
[Gartner, Inc., Magic Quadrant for Data Warehouse Database Management Systems Magic Quadrant, Mark A. Beyer, Donald Feinberg, Merv Adrian, Roxane Edjlali, February 6, 2012. The Magic Quadrant is copyrighted 2012 by Gartner, Inc. and is reused with permission. The Magic Quadrant is a graphical representation of a marketplace at and for a specific time period. It depicts Gartner's analysis of how certain vendors measure against criteria for that marketplace, as defined by Gartner. Gartner does not endorse any vendor, product or service depicted in the Magic Quadrant, and does not advise technology users to select only those vendors placed in the "Leaders" quadrant. The Magic Quadrant is intended solely as a research tool, and is not meant to be a specific guide to action. Gartner disclaims all warranties, express or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
“Microsoft exhibits one of the best value propositions on the market with a low cost and a highly favorable price/performance ratio”
- Gartner, February 2012
[Figures: two Magic Quadrant charts as of February 2012, one for data warehousing and one for business intelligence, each with Microsoft marked. Quadrants: Challengers, Leaders, Niche players, Visionaries. Axes: completeness of vision and ability to execute.]
Reference Architectures
Fast Track for
End-to-end data warehousing solutions
Software
Dell Parallel Data Warehouse
HP Enterprise Data Warehouse
Dell Quickstart Data Warehouse
HP Business Data Warehouse
Appliances
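Comparing Fast Track and PDW
Item | Fast Track | PDW (Parallel Data Warehouse)
Version | 3.0 | 1.0 AU3
Data volume | 1 to 80 TB | 20 to 500 TB
Architecture | SMP (symmetric multiprocessing), 1 SQL node | MPP (massively parallel processing), 8 to 10 SQL nodes
High availability | Can be configured | Built in
Key points | Balances hardware and software to speed up performance and lower TCO; fast deployment | Same as Fast Track, plus massively parallel processing, shared nothing, easy expansion
Hardware vendors | HP, Dell, BULL, IBM | HP, Dell
Scalability | Scale-up | Scale-up and scale-out (up to about 500 TB)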
SQL Server PDW hardware architecture
[Diagram. Control Rack: Control Nodes (active/passive), Management Node, Landing Zone, Backup Node. Data Rack: Compute Nodes (SQL), Storage Nodes, Spare Compute Node. Interconnects: dual InfiniBand and dual Fibre Channel.]
SQL Server Parallel Data Warehouse: overall architecture
[Diagram. Control Rack: Control Node (client interface over JDBC, ODBC, OLE-DB, ADO.NET; PDW Engine; DMS Manager), Landing Zone Node (Bulk Data Loader, PDW Agent, ETL interface), Management Node (Active Directory, PDW Agent). Data Rack: Compute Nodes 1 to 10, each with DMS Core and PDW Agent. Legend: PDW service; DMS = Data Movement Service.]
PDW supports the SQL (native) client
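PDW core benefits
High scalability: 10s to 100s of TB
High performance: MPP (massively parallel processing) physically spreads the server workload
Appliance: hardware and software combined and optimized for sequential I/O
Low cost: true "shared nothing", where each node uses all of its own hardware resources independently
Openness: an open platform makes new hardware technology easy to adopt, with no lock-in to a single hardware vendor
Flexibility: user-created tables are either replicated or hash distributed across all nodes
Choice of deployment architecture: hub and spoke, or a central DW
High integration: fast, simple integration with Microsoft BI (SSIS, SSAS, SSRS)
Complete tool support: BI, ETL, MDM, and streaming data
Parallel loading: 1 rack at about 750 GB/hr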
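The choice between replicated and hash-distributed tables can be illustrated with a small sketch. This is not PDW's implementation: `node_for`, `hash_distribute`, and `replicate` are hypothetical names, CRC32 is only a stand-in for whatever hash function the appliance uses, the sample tables are invented, and nodes are modeled as dictionary keys.

```python
import zlib
from collections import defaultdict


def node_for(key, nodes):
    """Map a distribution key to a node number (CRC32 is an assumed stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % nodes


def hash_distribute(rows, key, nodes):
    """Place each row on exactly one node, chosen by hashing its distribution key."""
    placement = defaultdict(list)
    for row in rows:
        placement[node_for(row[key], nodes)].append(row)
    return dict(placement)


def replicate(rows, nodes):
    """Place a full copy of the table on every node."""
    return {node: list(rows) for node in range(nodes)}


# Hypothetical sample data: a larger fact table and a small dimension table.
orders = [{"order_id": i, "customer_id": i % 5} for i in range(20)]
regions = [{"region_id": 1, "name": "North"}, {"region_id": 2, "name": "South"}]

distributed = hash_distribute(orders, "customer_id", nodes=8)
replicated = replicate(regions, nodes=8)
```

In this model, rows that share a `customer_id` always land on the same node, so a join on that key can run locally on each node, while the small replicated table is available everywhere.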
PDW management
All appliance management goes through a single management interface (console): https://controlnodeipaddress
Console: Dashboard, Query activity, Load activity, Backup and restore, Active locks, Active sessions, Alerts, Appliance state
PDW Configuration Manager: Appliance topology, Services status, Network configuration, Privileges
Central configuration of SQL instances across each node
Integrated monitoring with System Center: the full Management Pack for System Center provides key monitoring and alerting capabilities
Viewing the creation of a distributed table through the PDW query management interface
Hub-and-spoke architecture use cases
Central DW Hub
OLAP
ETL Tools
OLTP
DB Cloud
SQL Data Service
Remote Table Copy
DW Loader
Sync Framework
SQL Azure
PDW
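Benefits of the hub-and-spoke architecture
All systems are connected over a high-speed network
Parallel database copy: up to 400 GB/min
Simple data mart ETL / ELT processes
Each department uses its own budget while the systems are centrally governed
Each business unit (BU) keeps its own analytical data mart, with no changes required
The existing business needs of each BU can be met immediately
When system capacity is saturated, there is no need to buy more capacity for each system separately
Stronger cross-business marketing capability
Administration and user workloads are separated, so that management and use do not share a single system
Can integrate SMP/MPP, OLTP/OLAP, and cloud systems
Systems scale independently, or subsystems can be created
Adding nodes (spokes) does not affect other users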
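As a rough illustration of what the 400 GB/min parallel copy figure means for a spoke refresh, the following sketch converts a data mart size into a minimum copy time. The function name `copy_minutes`, the 10 TB example size, and the 1 TB = 1024 GB convention are assumptions; the quoted rate is a peak ("up to") figure, so the result is a best case.

```python
def copy_minutes(size_tb, rate_gb_per_min=400.0, gb_per_tb=1024):
    """Best-case minutes to copy size_tb terabytes at a sustained rate.

    rate_gb_per_min defaults to the peak figure quoted on the slide;
    gb_per_tb=1024 is an assumed unit convention.
    """
    if rate_gb_per_min <= 0:
        raise ValueError("rate must be positive")
    return size_tb * gb_per_tb / rate_gb_per_min


# Hypothetical example: refreshing a 10 TB data mart from the hub.
minutes_for_10_tb = copy_minutes(10)
```

Real copies would take longer whenever the sustained rate falls below the peak.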
Example architecture plan: PDW / TERADATA
[Diagram: proposed architecture. Data sources: AS400/DB2, Oracle, DB2, Informix, Unix/Oracle. ETL area: staging, raw data, and aggregation, loaded daily, weekly, or hourly through SSIS and ELT. Warehouses: TERADATA and Parallel Data Warehouse. Data marts: DM1, DM2, DM3, DM4. Front end: SharePoint, BI applications, web app, and web portal for power users, casual users, and internal users. In the future: real-time/direct access, cloud BI platform, cloud services, new data and aggregation.]
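Real-time integration of multiple databases with SSIS CDC
[Diagram: sources on AS400, S/390, Unix, and Windows (DB2, DB2, Oracle, SQL) connect to the data center through Attunity CDC, CDC/SSIS, and SSIS.]
PDW uses CDC to synchronize and integrate Oracle, DB2 & MySQL in real time
Integration with SQL Server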
SQL Server PDW AU3 highlights: release themes
Integration with BI, analytics, and ETL tools
Higher performance
Broader functionality
Full alignment
Less work for the same results
Do the same work more efficiently
Native support for Analysis Services, Reporting Services, and PowerPivot
Lay the foundation for broad connectivity support
SQL Server PDW AU3 architecture: PDW AU3 architecture with Shell DB
1. User issues a query
2. Query is sent to the Shell through sp_showmemo_xml stored procedure
SQL Server performs parsing, binding, authorization
SQL optimizer generates execution alternatives
3. MEMO containing candidate plans, histograms, data types is generated
4. Parallel execution plan generated
5. Parallel plan executes on compute nodes
6. Result returned to the user
[Diagram: a SELECT arrives at the Control Node, which hosts the Shell Appliance (SQL Server), the MEMO, and the Engine Service. Plan steps go to three Compute Nodes (SQL Server), and the result is returned.]
PDW AU3 new features
MPP Cost Based Optimizer & DMS enhancements
• 10x faster query performance
• Cost-Based Optimization on MPP Engine
• Remove Data Type Conversions from DMS
½ Rack configurations
• Entry Level Capacity for Massive Parallel Processing
• Upgrade Capability
Manageability
• Seamless management of PDW
• Lower cost of Monitoring and Administration
Stored Procedures
• Enriched Programmability
• Subset of SMP based SQL functionality
Tabular Data Stream
• Enhanced Support for SQL Server Applications
• Common Connectivity
• Native Communication Protocol
T-SQL Enhancements
• More Expressive Functions in PDW T-SQL
Enhanced Integration
• Extends integration to 3rd party with SAS connector
International Language support
• Full International Language Support through increased Collation Types
AU3 Shell DB automatic cost-based optimization: operator trees for a simple query
SELECT * FROM orders JOIN lineitem ON (o_orderkey = l_orderkey)
JOIN part ON (l_partkey = p_partkey)
WHERE p_name LIKE '%smoke%';
PDW AU2 operator tree: O (o_o) joins LI (l_o) on (l_o = o_o); the result is shuffled on l_pk to join P (p_pk)
PDW AU3 operator tree: O (o_o) joins LI (l_o) on (l_o = o_o); P (p_pk) is broadcast for the join on (l_pk = p_pk)
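Why a cost-based optimizer might prefer broadcasting the filtered part table over shuffling the larger join result can be sketched with a toy data-movement model. This is a simplified assumption, not the PDW optimizer's cost model: it counts only bytes moved between nodes, assumes uniform hashing, and uses invented row counts and row sizes. The function names are hypothetical.

```python
def shuffle_cost(rows, row_bytes, nodes):
    """Expected bytes moved when rows are redistributed by hash on a new key.

    Assumes uniform hashing: a row already sits on its target node
    with probability 1/nodes, so (nodes - 1)/nodes of the data moves.
    """
    return rows * row_bytes * (nodes - 1) / nodes


def broadcast_cost(rows, row_bytes, nodes):
    """Bytes moved when a hash-distributed table is copied to every node.

    Each row has to be sent to the other nodes - 1 nodes.
    """
    return rows * row_bytes * (nodes - 1)


def cheaper_move(big_rows, big_row_bytes, small_rows, small_row_bytes, nodes):
    """Pick the movement strategy with fewer bytes on the wire."""
    s = shuffle_cost(big_rows, big_row_bytes, nodes)
    b = broadcast_cost(small_rows, small_row_bytes, nodes)
    return ("broadcast", b) if b < s else ("shuffle", s)


# Hypothetical sizes: the orders-lineitem join result versus the part rows
# that survive the p_name filter.
choice = cheaper_move(
    big_rows=6_000_000, big_row_bytes=200,
    small_rows=2_000, small_row_bytes=150,
    nodes=8,
)
```

With these invented numbers the selective filter leaves so few part rows that copying them to every node moves far less data than redistributing the join result, which is the kind of trade-off a cost-based optimizer can weigh.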
PDW integration: Informatica architecture
[Diagram: Informatica client, server, and repository. The ETL source reader and target writer use the source and target definitions for the source and target tables, and the bulk loader writes to PDW.]
Bulk Loader used for fast loads
PDW is fully supported as a source and target within Informatica
Currently supported BI tools
AU3 T-SQL compatibility allows for common access for multiple tools
Current support on PDW drivers includes MicroStrategy, SAP BusinessObjects, and Informatica
Other tools have 'mixed experience'
Cognos support requires: CURRENT_TIMESTAMP, @@DATEFIRST, SET OPTION …
Core connectivity enhancements planned for the next 2 releases
PDW integration: Hadoop Connector
[Diagram: HDFS connects through a SQOOP-based adapter, driven by a config file, to the PDW Landing Zone Node (Bulk Data Loader, PDW agent, dwsql).]
Bi-directional import/export interface
Existing PDW tools for loading / bulk loads
Delimited file support
Importing HDFS data into PDW for advanced analytics
[Diagram: sensor/RFID data, blogs, docs, and web data land in Hadoop and move through SQOOP into SQL Server PDW for interactive BI and data visualization. Users: application programmers, DBMS admins, Power BI users.]
Hadoop: integrating with PDW through SQOOP (export)
[Diagram, steps numbered 1 to 5 between Linux/Hadoop and Windows/PDW: SQOOP export with source (HDFS path) and target (PDW DB & table), using the PDW configuration file; the PDW Hadoop Connector reads HDFS data via mappers; an FTP server copies incoming data onto the Landing Zone; a Telnet server invokes 'DWLoader'. PDW side: Landing Zone, Control Node, Compute Nodes 1 to 8.]
PDW new connector for Hadoop SQOOP
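TB+: large-scale data customers in Taiwan
260 TB: aerial imagery of Taiwan, integrated with ESRI geographic information
6 TB: SQL Server 2008 on SAP; 2.5 million SAP dialog steps per day, average response time about 0.5 seconds
2 TB: 5+1 cluster, high availability; online banking with 7 x 24 access
12 TB: mobile data warehouse; grows by 40 GB per day, automated data loading
1.2 TB: call-record analysis platform; searches 15 billion rows and returns a 350-row result set in 4 seconds
30 TB: large data warehouse; 250 to 400 concurrent users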
Banking Core System in SQL
Shin Kong Bank, Bank of Taiwan, Fubon Bank
Core banking: branches / online banking
Chunghwa Post
Bank of America
First Premier Bank
Data Warehouse
Direct Edge from 4 hour 15 min
http://www.informationweek.com/news/software/info_management/229900133
PDW vs. TERADATA comparison
TERADATA: High TCO
• 2000 series starts at $32K/TB
• 6000 series starts at $57K/TB
• Expensive projects, few or no service partners
PDW: High value, low cost solution
• Starting at $1.6M list price
• And under $12K/TB
TERADATA: Incomplete BI solution
• No integrated BI tool
• Preferred BI & ETL tools very expensive
• Likes to build inflexible self-made ETL processes
PDW: Complete BI solution
• First class integration with Microsoft BI, ETL & MDM tools
• Integration with System Center
TERADATA: Vendor lock-in and inflexible solution
• 6000 series requires proprietary BYNET
• No hardware choice
• Advocates centralized DW architecture
PDW: Open and flexible
• No proprietary hardware
• No vendor lock-in
• Espouses distributed architecture
PDW vs. EXADATA comparison
EXADATA: Expensive solution
• Full Rack X2-8 starts at $9.9M list price
• And $35K per TB
PDW: High value, low cost solution
• Starting at $1.6M list price
• And under $12K/TB
EXADATA: Fragmented BI solution
• Customers have to choose from a myriad of tools for BI and ETL
• Lacks compelling self-service BI tools
PDW: Complete, integrated BI solution
• First class integration with Microsoft BI, ETL & MDM tools
• Compelling self-service BI
EXADATA: Vendor lock-in and inflexible solution
• Locks customers to Sun hardware
• And expensive contracts
PDW: Open and flexible
• No proprietary hardware
• No vendor lock-in
• Espouses distributed architecture
EXADATA: General purpose appliance
• Targets DW and OLTP but is not ideal for both
• Database servers use shared resources and an unbalanced design
• Result: performance issues at scale
PDW: Optimized solution
• PDW is fully engineered for high scale DW
• Uses true MPP with balanced architecture
PDW vs. NETEZZA comparison
IBM NETEZZA: Total solution is not so cheap
• Despite low appliance cost, Netezza's total cost is not so cheap
• Add expensive services from IBM Global Services
PDW: High value solution
• Through integration with SQL Server, MS DW offers a low cost solution
• Licensing discounts via EA and EAP
IBM NETEZZA: Netezza is a data mart
• Most of their customers use Netezza as a high performance data mart and rarely as a hub
• Netezza does not perform as well with mixed workloads
PDW: PDW is the EDW hub
• PDW is ideal as an EDW hub
• Offers high concurrency and mixed workloads
IBM NETEZZA: Vendor lock-in
• Requires proprietary FPGA card
• No hardware choice
• Locks customers to the IBM platform and services
PDW: Open and flexible
• No proprietary hardware
• No vendor lock-in
• Espouses distributed architecture
PDW vs. GREENPLUM comparison
EMC GREENPLUM: Unreliable service quality (1)
• EMC is struggling to staff EDW projects with qualified personnel. Some GP customers have complained about the poor level of service
• Greenplum has a limited number of trained SI partners
PDW: High quality service
• Microsoft offers a rich partner ecosystem behind PDW. This includes GSIs like Accenture-Avanade
• We also offer a strong CoE team from MCS and Premier Mission Critical Support to deliver high quality service to PDW customers
EMC GREENPLUM: Incomplete BI solution
• No integrated BI tool
• Requires analytical tools, e.g. SAS or SAP BOBJ
• Also relies on 3rd party ETL tools
PDW: Complete BI solution
• First class integration with Microsoft BI, ETL & MDM tools
• Integration with System Center
EMC GREENPLUM: Poor concurrency (1)
• Greenplum has real issues with concurrency
• It cannot run 100 queries simultaneously
PDW: High concurrency
• PDW offers high concurrency
• And supports simultaneous data loading and querying
1. Source: Customer feedback obtained from market research conducted in APAC
SQL Server PDW roadmap: what is coming next?
Timeline: 2011 to 2012, by quarter
Appliance Update 1 (shipped)
• Improved node manageability
• Better performance and reduced overhead
• OEM requests
Appliance Update 2 (shipped)
• Programmability: batches, control flow, variables, temp tables
• QDR InfiniBand switch
• Onboard Dell
Appliance Update 3 (shipped)
• Cost based optimizer
• Native SQL Server drivers, including JDBC
• Collations
• More expressive query language
• Data Movement Services performance
• SCOM pack
• Stored procedures (subset)
• Half-rack
• 3rd party integration (Informatica, MicroStrategy, Business Objects, HADOOP)
V-Next
• Columnar store index
• Stored procedures
• Integrated Authentication
• PowerView integration
• Workload management
• LZ/BU redundancy
• Windows 8
• SQL Server 2012
• Hardware refresh
Resources
Connect. Share. Discuss.
http://www.microsoft.com/taiwan/techdays2012/
Microsoft Certification & Training Resources
http://www.microsoft.com/learning/zh/tw/
Resources for IT Professionals
http://social.technet.microsoft.com/Forums/zh-tw/categories/
Resources for Developers
http://social.msdn.microsoft.com/Forums/zh-tw/categories/