Overview of the Microsoft Data Platform on Azure
Figure: Microsoft data platform on Azure, from data to action
• Data sources: apps; sensors and devices
• Information management: Event Hubs, Data Catalog, Data Factory
• Big data stores: SQL Data Warehouse, Data Lake Store
• Machine learning and analytics: HDInsight (managed Hadoop, Spark, Storm, and HBase clusters), Stream Analytics, Data Lake Analytics, Machine Learning
• Intelligence: dashboards and visualizations (Power BI), Cortana, Bot Framework, Cognitive Services
• Action: people, automated systems, apps (web, mobile, bots)
From data to decisions and actions: the path to insight
• Descriptive (reports): What happened?
• Diagnostic (interactive dashboards): Why did it happen?
• Predictive (machine learning): What will happen?
• Prescriptive (recommendations and automation): What should I do?
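To make the four questions concrete, here is a toy Python sketch on hypothetical monthly sales figures. The data, the function names, the least-squares trend used as the "predictive" step, and the decision rule in the "prescriptive" step are all illustrative assumptions; in practice these stages are served by reporting, dashboard, and machine learning services rather than hand-written code like this.

```python
from statistics import mean

# Hypothetical data: (month, region, units sold)
SALES = [
    (1, "north", 100), (1, "south", 80),
    (2, "north", 105), (2, "south", 70),
    (3, "north", 110), (3, "south", 60),
]


def descriptive(rows):
    """What happened? Total units sold per month."""
    totals = {}
    for month, _region, units in rows:
        totals[month] = totals.get(month, 0) + units
    return totals


def diagnostic(rows, first, last):
    """Why did it happen? Change in units per region between two months."""
    change = {}
    for month, region, units in rows:
        if month == first:
            change[region] = change.get(region, 0) - units
        elif month == last:
            change[region] = change.get(region, 0) + units
    return change


def predictive(totals):
    """What will happen? Fit a least-squares line and extrapolate one month ahead."""
    xs = sorted(totals)
    ys = [totals[x] for x in xs]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope * (xs[-1] + 1) + intercept


def prescriptive(forecast, target, change):
    """What should I do? Hypothetical rule: if below target, act on the weakest region."""
    if forecast >= target:
        return "no action"
    weakest = min(change, key=change.get)
    return f"boost sales in {weakest}"
```

With this data, monthly totals fall from 180 to 170, the diagnostic step shows the south region losing 20 units while the north gains 10, the trend forecasts 165 for month 4, and against a hypothetical target of 170 the rule recommends boosting sales in the south.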
Data sources: LOB (line-of-business) applications, social, devices, clickstream, sensors, video, web, relational
Azure Data Lake Store: a highly scalable, distributed, parallel file system in the cloud, designed specifically to work with a variety of big data analytics workloads
Analytics workloads on Azure Data Lake Store:
• HDInsight: batch (MapReduce), script (Pig), SQL (Hive), NoSQL (HBase), in-memory (Spark), predictive (R Server)
• ADL Analytics: batch (U-SQL)
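The batch MapReduce model listed first can be illustrated with the classic word count. This is a minimal in-process Python simulation of the map, shuffle, and reduce phases, assuming a small in-memory list of text lines; it is not the Hadoop API, which runs the same phases in parallel across cluster nodes and files.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over the input lines in a single process."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, `word_count(["Hive on Tez", "Spark and Hive"])` counts "hive" twice and every other word once. Because map calls are independent and reduce calls only see one key's values, a cluster can split both phases across many machines.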
About Azure HDInsight
Figure: Microsoft Hadoop stack, with Azure HDInsight providing analytics and machine learning on top of storage that is either local (HDFS) or in the cloud (Azure Blob / Azure Data Lake Store)
Azure HDInsight
Fully managed Hadoop and Spark for the cloud
• 100% open-source Hortonworks Data Platform
• Clusters up and running in minutes
• Supported by Microsoft with an industry-leading SLA
• Familiar BI tools for analysis
• Open-source notebooks for interactive data science
• 63% lower TCO than deploying Hadoop on-premises*
Hadoop and Spark as a Service on Azure
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
Security in HDInsight:
• Perimeter-level security: virtual network; network security (e.g. firewalls); gateway service
• Multi-user authentication: Kerberos; Azure Active Directory
• Authorization using Apache Ranger: Hive policies; HBase policies; file- and folder-level ACLs on ADLS
• Data security: encryption at rest, supported on both Azure Storage Blob and ADLS
HDInsight Case Studies
About HDInsight: Hive
Figure: HDInsight Hive stack, organized into platform, core SQL engine, and connectivity layers (legend: existing, in development, emerging)
• Platform: Hadoop cluster; Interactive Hive cluster (new); …
• Connectivity: SDK, PowerShell; JDBC, ODBC, Visual Studio, Hue, Ambari; …
• Scenarios: ad hoc queries; drill-down; BI tools (Tableau, Excel); continuous ingestion from operational databases; slowly changing dimensions; multidimensional analytics (MDX tools, Excel)
Demo: HDInsight cluster & Hive
Enterprise Data Warehouse Based on HDInsight Hive
Always-on (persistent) cluster vs. cluster as a service (on demand)
• Storage choice. Persistent: local HDFS, Azure Blob, Azure Data Lake Store. On demand: Azure Blob, Azure Data Lake Store.
• Job scheduling. Persistent: Oozie. On demand: Azure Data Factory.
• Data persistence after cluster deletion. Persistent: N/A. On demand: Azure Blob, Azure Data Lake Store.
• Metadata persistence after cluster deletion. Persistent: N/A. On demand: Azure SQL.
• Billing. Persistent: billed for the entire time the cluster is up. On demand: billed per job.
With an on-demand cluster you pay only for the time the cluster is actually used, and because data and metadata are persisted, the experience is as if the cluster had never been deleted.
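The billing row is easiest to weigh with a back-of-the-envelope calculation. The Python sketch below assumes a single flat hourly rate and a fixed provisioning time per on-demand job; both numbers are hypothetical placeholders, not Azure prices, and real clusters are priced per node and VM size.

```python
def persistent_cost(hours_up, rate_per_hour):
    """Always-on cluster: billed for every hour the cluster exists."""
    return hours_up * rate_per_hour


def on_demand_cost(job_hours, rate_per_hour, provisioning_hours=0.0):
    """On-demand cluster: created for each job and deleted afterwards.

    Each job pays for its own runtime plus the (hypothetical) time spent
    provisioning the cluster before the job can start.
    """
    return sum(h + provisioning_hours for h in job_hours) * rate_per_hour


def break_even_hours(hours_up, jobs, provisioning_hours):
    """Total job hours per period at which both models cost the same."""
    return hours_up - jobs * provisioning_hours
```

For example, at a hypothetical rate of 10 per hour, a cluster kept up for 30 days (720 hours) costs `persistent_cost(720, 10)` = 7200, while 30 daily two-hour jobs with half an hour of provisioning each cost `on_demand_cost([2.0] * 30, 10, 0.5)` = 750. On-demand stays cheaper until the jobs add up to about `break_even_hours(720, 30, 0.5)` = 705 hours of work per month.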
Hive Optimization Summary
• Scale out: choose from dozens of VM sizes and scale out the cluster to increase parallelism
• Execution engine: choose the Tez execution engine
• Partitioning: avoid reading all of the data by breaking it into pieces (partitions) that queries can skip
• ORC: a columnar format supported by Hive that also lets you use ACID transactions and LLAP
• Vectorization: lets Hive process rows in batches of 1024 at a time, which speeds up execution
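The Tez, partitioning, ORC, and vectorization items above map onto ordinary HiveQL settings and DDL. The Python sketch below renders such a script; the helper name, table name, columns, and partition column are hypothetical, while the `SET` keys, `PARTITIONED BY`, and `STORED AS ORC` are standard Hive syntax. Scaling out the cluster is done through cluster configuration, not HiveQL, so it does not appear here.

```python
# Hypothetical helper: renders a HiveQL script that applies the tuning options above.
# Table name, columns, and partition column are illustrative assumptions.

HIVE_SETTINGS = {
    # Run queries on Tez instead of classic MapReduce.
    "hive.execution.engine": "tez",
    # Vectorized execution: operators process batches of rows (1024 by default).
    "hive.vectorized.execution.enabled": "true",
    # Allow INSERT ... PARTITION (col) to create partitions from the data itself.
    "hive.exec.dynamic.partition": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
}


def hive_tuning_script(table, columns, partition_col):
    """Return SET statements plus a partitioned, ORC-backed CREATE TABLE."""
    lines = [f"SET {key}={value};" for key, value in HIVE_SETTINGS.items()]
    column_defs = ",\n  ".join(f"{name} {col_type}" for name, col_type in columns)
    lines.append(
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {column_defs}\n)\n"
        f"PARTITIONED BY ({partition_col} STRING)\n"
        "STORED AS ORC;"
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(hive_tuning_script("sales_orc", [("id", "BIGINT"), ("amount", "DOUBLE")], "dt"))
    # Filtering on the partition column lets Hive read only the matching partition:
    print("SELECT SUM(amount) FROM sales_orc WHERE dt = '2017-01-01';")
```

The partition column is declared only in `PARTITIONED BY`, not in the regular column list, because Hive stores it as a directory name rather than inside the ORC files.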
Demo: Query Authoring Tools
Demo: 100 GB query with Batch
Official Azure China website: https://www.azure.cn/ (latest product and solution information, technical documentation, and SDK downloads for Azure in China)
Azure Marketplace in China: https://market.azure.cn/
Azure 1 RMB trial (try Azure services right away): https://www.azure.cn/pricing/1rmb-trial-full/
Microsoft Cloud Technology WeChat official account; Azure Cloud Assistant mobile app
Developer Notes for Azure in China applications: https://www.azure.cn/dev-notes/ (an overview of the differences developers should note between global Azure and Azure in China)
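• Top-level project: Apache Kylin is the only Apache top-level open-source project from China; its core developers and contributors are all based in China
• Industry recognition: Kylin won InfoWorld's "Best Open Source Big Data Tool" award two years in a row, and this year received the award alongside Google TensorFlow
• User recognition: more than 100 large companies in China and abroad, across many industries, officially use Kylin as their big data analytics platform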
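Kylin's O(1) approach, based on precomputed cubes, makes query performance independent of data set size
Massive data, very high performance, very high concurrency
Large-scale data analysis with no coding required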
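The O(1) claim rests on precomputation: aggregates are computed ahead of time for every combination of dimensions (each combination is a "cuboid"), so a query becomes a lookup instead of a scan. The toy Python sketch below shows that idea with a single SUM measure and equality filters only; it is an assumption-laden illustration, not Kylin's implementation, which builds and stores cubes on the Hadoop cluster.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Precompute SUM(measure) for every subset (cuboid) of the dimensions.

    With n dimensions this builds 2**n cuboids: the work moves to build time.
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in dims)] += row[measure]
            cube[dims] = dict(totals)
    return cube


def query(cube, dimensions, filters):
    """Answer SUM(measure) WHERE dim = value AND ... with one dictionary lookup.

    The raw rows are never scanned, so the lookup cost does not grow with
    the number of source rows.
    """
    dims = tuple(d for d in dimensions if d in filters)
    key = tuple(filters[d] for d in dims)
    return cube[dims].get(key, 0)
```

The trade-off is visible in `build_cube`: the number of cuboids doubles with every added dimension, which is why cube design (choosing which dimensions and combinations to precompute) matters in practice.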
Figure: Kylin deployment on Azure Resource Manager
• Resource group containing a virtual network
• Kylin server
• Blob Storage
• HDInsight
▪ Azure: a mature cloud computing platform
▪ HDInsight: automatic scaling
▪ Power BI: self-service visual BI
▪ Apache Kylin: high performance + high concurrency + standard SQL