2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Modern
Applications
Connected Data Architecture
Transformational Use‐Cases
• Predictive Retail
• Factory Automation
• Connected Cars
• Predictive Analytics
• Artificial Intelligence
Data at Rest
Data in Motion
ACTIONABLE INTELLIGENCE
Digital Transformation fueled by Big Data Analytics and IoT
• Cloud and Data Center
• Powered by Open Source
The Shift to the
Connected Data Architecture
System‐centric User‐centric
Mainframe Client / Server Web and SaaS
IDMS Relational Database
Hadoop in the Data Center
Create and Manage Central Data Lakes
Support all Types of Data
Reduce Architecture Costs by 80% or More
Drive Transformational New Use Cases
Provide Flexible Processing and Access Methods
Hadoop in the Cloud
Fast On‐Ramp for New Users
Elastic Compute and Storage Capabilities
Eliminate Hardware Purchases
Facilitate Certain Modern Data Applications through Cloud Connectivity
Zero‐configuration access engine capabilities (HDInsight)
Transformational Applications Require Connected Data
[Diagram labels: Data Center, Cloud, Deep Historical Analysis, Stream Analytics, Machine Learning, Edge Analytics, Data at Rest, Data in Motion, Edge Data]
Our Focus: Enable Modern Applications on Connected Data Platforms
Continuous Insights: Deliver insights from ALL data, origin to rest
Enterprise Ready: Management, Security, Governance
Any Delivery Model: Data Center, Cloud, Hybrid
Open Innovation: Architecture, Community, Ecosystem
Actionable Intelligence from Connected Data Platforms
Capturing perishable insights from data in motion
Ensuring rich, historical insights on data at rest
Necessary for modern data applications
DATA AT REST
DATA IN MOTION
ACTIONABLE INTELLIGENCE
Modern Data Applications
Hortonworks DataFlow
Hortonworks Data Platform
Hortonworks Data Platform for Data at Rest
Powered by Open Enterprise Hadoop
Open
Interoperable
Ready
Central
Hortonworks Data Platform 2.5 Highlights
Dynamic Security: Apache Atlas + Ranger Integration
Enterprise Spark at Scale: Apache Zeppelin Notebook for Spark
Streamlined Operations: Apache Ambari
Interactive Query in Seconds: Hive with LLAP (Technical Preview)
Real‐Time Applications: Storm and HBase/Phoenix
Apache Atlas + Ranger ‐ More Powerful Together
Introducing Tag‐Based Security
Apache Atlas and Ranger Integration
Basic Tag policy – Access and entitlements can be based on attributes. As an
example: Personally Identifiable Information (PII) is a tag that can be
leveraged to protect sensitive personal data.
Geo‐based policy – Access policy based on location. As an example: A user
might be able to access data in North America, but may be restricted from
access in EMEA due to privacy compliance.
Time‐based policy – Access policy based on time windows. As an example:
A user might be able to access data only between 8AM – 5PM (common in
SOX regulations).
Prohibitions – Restrictions on combining two data sets which might each be in
compliance on their own, but not when combined. As an example: SSNs
and names.
Key Benefits:
New scalable metadata‐based security paradigm
Dynamic, real‐time policy
Automatic updates to changes in metadata
Centralized and simple to manage policy
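The four policy types on this slide can be illustrated with a minimal Python sketch. It is not the Apache Ranger or Atlas API: the `Request` and `TagPolicy` classes, the `is_allowed` function, and all field names are hypothetical and exist only to show how tag, location, time window, and prohibited tag combinations might combine into one access decision.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class Request:
    """A hypothetical access request; not a Ranger or Atlas object."""
    user: str
    region: str        # where the request originates, e.g. "NA" or "EMEA"
    at: time           # local time of the request
    tags: frozenset    # tags on the data sets the query touches


@dataclass
class TagPolicy:
    """Illustrative policy keyed on a metadata tag such as "PII"."""
    tag: str
    allowed_users: set = field(default_factory=set)
    allowed_regions: set = field(default_factory=set)  # empty set = any region
    window: tuple = None                               # (start, end); None = any time

    def permits(self, req: Request) -> bool:
        if self.tag not in req.tags:
            return True   # policy does not apply to this request
        if req.user not in self.allowed_users:
            return False  # basic tag policy
        if self.allowed_regions and req.region not in self.allowed_regions:
            return False  # geo-based policy
        if self.window and not (self.window[0] <= req.at <= self.window[1]):
            return False  # time-based policy
        return True


def is_allowed(req, policies, prohibited_combinations):
    """Deny if a prohibited tag combination is fully present, else check every policy."""
    for combo in prohibited_combinations:
        if combo <= req.tags:
            return False  # prohibition, e.g. SSNs combined with names
    return all(p.permits(req) for p in policies)
```

Because the decision depends only on tags, a data set that gains the PII tag is covered by the existing policy without any per‐table rule, which is the point of the "automatic updates to changes in metadata" benefit.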
Apache Atlas Powers Cross‐Component Data Lineage
As a part of HDP 2.5, users can track lineage across the following components using Atlas:
Apache Sqoop – Import from and export to relational databases, and an additional package that leverages Sqoop
Hive – Dataset lineage with entity versioning (including schema changes)
Apache Kafka / Storm – IoT event‐level processing, such as syslogs or sensor data
Falcon – Data lifecycle at Feed and Process entity level for replication and repeating workflows. Tracks periodicity, throttling, and eviction. ATLAS‐69 FALCON‐1570
Key Benefits:
Enterprises need open solutions, not a single‐app vendor
More native connectors than any other vendor
Hardened metadata infrastructure
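Cross‐component lineage can be pictured as a directed graph whose edges run from a source data set to the data sets derived from it. The following Python sketch assumes that model; it does not use the Atlas API, and the data set names and the `downstream` function are invented for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each key is a data set, each value lists the
# data sets derived directly from it. All names are invented.
LINEAGE = {
    "rdbms.orders": ["hive.orders_raw"],          # e.g. a Sqoop import
    "kafka.sensor_events": ["hive.sensor_raw"],   # e.g. a Storm topology
    "hive.orders_raw": ["hive.orders_clean"],
    "hive.sensor_raw": ["hive.sensor_daily"],
    "hive.orders_clean": ["hive.orders_report"],
}


def downstream(source, edges):
    """Return every data set reachable from `source`, in breadth-first order."""
    seen, order = {source}, []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream("rdbms.orders", LINEAGE))
```

Walking the same graph in the opposite direction would answer the reverse question: which sources contributed to a given report.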
Expanded Native Connector: Dataset Lineage
[Diagram labels: Sqoop, Teradata Connector, Apache Kafka, Custom Activity Reporter, Metadata Repository, RDBMS]
Apache Atlas Enables Business Catalog for Ease of Use
Organize data assets along business terms
– Authoritative: Hierarchical business Taxonomy Creation
– Agile modeling: Model Conceptual, Logical, Physical assets
– Definition and assignment of tags like PII (Personally Identifiable Information)
Comprehensive features for compliance
– Multiple user profiles including Data Steward and Business Analysts
– Object auditing to track “Who did it”
– Metadata Versioning to track “what did they do”
Faster Insight:
– Data Quality tab for profiling and sampling
– User Comments
Key Benefits:
Easy way to create business Taxonomy
Useful for multiple user types including Data Steward and Business Analysts
Comprehensive features for compliance
Business Catalog
Model and explore metadata via the new Business Catalog in Apache Atlas
Data Steward
Streamlining Operations, Three Phase Plan
Focused strategic investments in our core products give customers tooling to quickly understand the cluster’s health, how business users are using it, and where to focus efforts when issues arise.
⬢ Capabilities
– Phase 1: Advanced Performance & Health Metrics Dashboards – with Ambari 2.2.2
– Phase 2: Consolidated Cluster Activity Reporting – NEW! with SmartSense 1.3.0
– Phase 3: Centralized & Contextual Log Search – Tech Preview with Ambari 2.4.0
⬢ Core Technologies
– Apache Ambari
– Ambari Metrics System
– Apache Solr
– Hortonworks SmartSense
– Grafana
[Diagram labels: Grafana, Solr, Ambari Metrics System, Log Search, Dedicated UIs, SmartSense, Ambari]
Streamlined Operations
Phase 1: Advanced Metrics Visualization & Dashboarding
Goal: Quickly understand cluster health metrics and key performance indicators
⬢ Capabilities
– Centralized Dashboarding focusing on component Health & Performance
– Ad‐Hoc Graph Creation
⬢ Pre‐Built Dashboards
– HDFS
– YARN
– HBase
⬢ Core Technologies
– Ambari Metrics System
– Grafana
[Diagram labels: Grafana, Ambari Metrics System, Ambari]
Ambari now includes pre‐built
dashboards for visualizing
cluster health
Streamlined Operations
Phase 2: Consolidated Cluster Activity Reporting
Goal: Quickly visualize and report on how business users and tenants are using the cluster: the top 10 queues, users, and most time‐consuming jobs
⬢ Capabilities
– Top K Activity Reporting
– Chargeback
⬢ Services Covered
– YARN
– MapReduce
– Hive/Tez
– Spark
– HDFS
⬢ Core Technologies
– Hortonworks SmartSense
– Apache Zeppelin
[Diagram labels: Ambari Metrics System, Apache Zeppelin, SmartSense, Ambari]
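Top K activity reporting and chargeback both reduce to aggregating usage per user or per queue. The Python sketch below shows that aggregation under stated assumptions: the job records, the memory‐hours usage measure, and the flat billing rate are all invented, and nothing here reflects how SmartSense itself computes its reports.

```python
from collections import Counter

# Hypothetical job records: (user, queue, memory_gb_hours). Values are invented.
JOBS = [
    ("alice", "etl", 120.0),
    ("bob", "adhoc", 45.5),
    ("alice", "adhoc", 30.0),
    ("carol", "etl", 200.0),
    ("bob", "etl", 10.0),
]


def top_k(jobs, key_index, k):
    """Sum usage per user (key_index=0) or per queue (key_index=1); return the k largest."""
    totals = Counter()
    for job in jobs:
        totals[job[key_index]] += job[2]
    return totals.most_common(k)


def chargeback(jobs, rate_per_gb_hour):
    """Bill each user for total usage at a flat, assumed rate."""
    return {user: round(total * rate_per_gb_hour, 2)
            for user, total in top_k(jobs, 0, len(jobs))}


print(top_k(JOBS, 0, 2))   # two heaviest users
print(top_k(JOBS, 1, 1))   # busiest queue
print(chargeback(JOBS, 0.10))
```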
Activity Explorer: Cluster Utilization Reporting
Preview: Streamlined Operations Investments
Phase 3: Centralized & Contextual Log Search
Goal: When issues arise, quickly locate them across all HDP components
⬢ Capabilities
– Rapid Search of all HDP component logs
– Search across time ranges, log levels, and for keywords
⬢ Core Technologies:
– Apache Ambari
– Apache Solr
– Apache Ambari Log Search
[Diagram labels: Solr, Log Search, Ambari]
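The search capabilities listed on this slide (time ranges, log levels, keywords) amount to a conjunction of optional filters. The Python sketch below applies such filters to an in‐memory list; it is not the Ambari Log Search or Solr API, and the `LogEntry` fields and `search` function are hypothetical. A real deployment would run the equivalent query against an index rather than scan a list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEntry:
    """A hypothetical parsed log line; field names are illustrative."""
    timestamp: datetime
    component: str
    level: str
    message: str


def search(entries, start=None, end=None, levels=None, keyword=None):
    """Return entries matching every filter that is set (unset filters match all)."""
    results = []
    for e in entries:
        if start and e.timestamp < start:
            continue  # before the requested time range
        if end and e.timestamp > end:
            continue  # after the requested time range
        if levels and e.level not in levels:
            continue  # log level not requested
        if keyword and keyword.lower() not in e.message.lower():
            continue  # case-insensitive keyword miss
        results.append(e)
    return results
```

For example, `search(entries, levels={"ERROR"}, keyword="timeout")` would return only error‐level entries whose message mentions a timeout, regardless of component.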
Tune the log collection system with
Guided Smart Configurations
View a comprehensive inventory
of operational logs for each host
Hive 2 with LLAP Enables Interactive Query in Seconds
Developer Productivity: Interactive query in seconds
Ease of Use and Adoption: 100% compatible with Hive SQL
Enterprise Readiness: Linear scaling at terabyte volumes of data
Streamlined Operations: LLAP integration with Ambari with automated dashboards
Hive 2 with LLAP: Preliminary Numbers
[Chart: Hive 2.0 and LLAP, TPC‐DS at 10 TB scale, 18 nodes. Series: Hive 2.0‐Tez and LLAP. Queries shown: q3, q7, q12, q13, q19, q21, q26, q27, q42, q43, q45, q52, q55, q60, q73, q84, q89, q91, q98. Vertical axis: 0 to 80 (unit not shown). Minimum query time: Query 55, 2.38 s.]
Why Cloud?
Unlimited Elastic Scale
Ephemeral & Long‐Running
IT & Business Agility
No Upfront HW Costs
How Do We Approach The Cloud Market?
HYBRID SEGMENT
Today’s enterprise customers
CLOUD ONRAMP
New users via digital engagement or existing customers exploring cloud options
Azure HDInsight, HDP, and HDF
are our Premier offerings.
Customer journey to future state architecture,
cloud operation & consumption model.
Azure HDInsight is our Premier offering.
Focused offerings for AWS that enable us to
engage and position our Premier offerings.
Cloud‐first approach to product design, development, testing & delivery
Seamless Connected Data Architecture
across Cloud and Data Center.
Always‐on enterprise use cases are common.
Elasticity, Automation,
Pay as you Go, One‐Click Start.
Ephemeral use cases are a common starting point.
Outlook: Cloud and the Big Data Market
Public cloud adoption (AWS, Azure, Google) will continue to accelerate
Many customers will go Cloud First to simplify/speed adoption
Customers deploying in public cloud expect a pay‐as‐you‐go (PAYG) pricing model
– Hourly pricing is default; “reserved” optimizes annual spend; “spot” optimizes hourly spend
Customers are interested in running workloads in the cloud in addition to on‐premises clusters.
Customers are familiar with native cloud tooling.
This heightens the importance of product packaging and user experience tuned to the cloud.
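The trade‐off between hourly and reserved pricing mentioned above comes down to a break‐even calculation. The Python sketch below shows the arithmetic only; the function names are hypothetical and the rates in the example are invented, not taken from any cloud provider's price list.

```python
HOURS_PER_YEAR = 8760


def annual_cost(hourly_rate, hours_used, upfront=0.0):
    """Total yearly spend for one node under a given pricing model."""
    return upfront + hourly_rate * hours_used


def break_even_hours(on_demand_rate, reserved_rate, reserved_upfront):
    """Hours of use per year above which the reserved model is cheaper.

    Assumes on_demand_rate is strictly greater than reserved_rate.
    """
    return reserved_upfront / (on_demand_rate - reserved_rate)


# Invented example rates: 0.50 per hour on demand, versus 0.20 per hour
# plus 1000 upfront when reserved.
hours = break_even_hours(0.50, 0.20, 1000.0)
print(f"Reserved pays off above {hours:.0f} hours per year")
print(annual_cost(0.50, HOURS_PER_YEAR), annual_cost(0.20, HOURS_PER_YEAR, 1000.0))
```

Under these assumed numbers, an always‐on cluster favors reserved pricing, while an ephemeral cluster that runs a few hours a day stays cheaper on the hourly model.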
Cloud IaaS and Hadoop as a Service
Running Hadoop on Cloud IaaS
Using Hadoop as a Cloud Service
Public Cloud Service Providers
Microsoft Azure HDInsight
Powered by Hortonworks Data Platform
Seamless Access to the Public Cloud for Spark, Hive, HBase, and other mission‐critical workloads
Unmatched Economics combining HDInsight’s elasticity in the cloud with HDP’s cost efficiencies at scale
Enterprise Readiness with robust security, governance and operations in the cloud, powered by Hortonworks Data Platform
Connected Data Architecture with Azure HDInsight
Ideal Use Cases
Data Prep, Query, and Analysis
(Hadoop, Hive, Pig)
Iterative In‐Memory Analysis
(Spark)
Advanced Statistics, Modeling, Machine Learning
(R Server on Spark)
NoSQL Data Storage
(HBase)
Real‐time Event Processing (Storm)
HDInsight Cluster Types
CLOUD
DATA CENTER
HDF: Data Flow Management
Azure HDInsight: Cloud Data Processing
HDP: Enterprise Data Lake
Runs in more datacenters than anyone else
Azure doubling compute and storage every 6 months
Central US (Iowa)
West US (California)
East US (Virginia)
North Central US (Illinois)
South Central US (Texas)
Brazil South (Sao Paulo State)
West Europe (Netherlands)
China North * (Beijing)
China South * (Shanghai)
Japan East (Tokyo, Saitama)
Japan West (Osaka)
East Asia (Hong Kong)
SE Asia (Singapore)
Australia South East (Victoria)
Australia East (New South Wales)
India Central (Pune)
North Europe (Ireland)
East US 2 (Virginia)
Microsoft Azure HDInsight and Apache Projects in the Cloud
Standard Hadoop Projects for Hive, YARN, HDFS, MapReduce, Pig, Tez, Sqoop, Oozie, ZooKeeper, Mahout, Phoenix
Comprehensive List of Emerging Projects: Spark, Storm, HBase, and R
Ability to Add Projects: Add various projects to the cloud
[Diagram labels: YARN (Data Operating System), Operations, Security, Governance, Storage, Machine Learning, Batch, Streaming, Interactive, Search]
Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016
“Elasticity, Automation, And Pay‐As‐You‐Go Compel
Enterprise Adoption Of Hadoop In The Cloud”
Connected Data Architecture with HDC for AWS
CLOUD
DATA CENTER
HDF: Data Flow Management
HDC for AWS: Cloud Data Processing
HDP: Enterprise Data Lake
Ideal Use Cases
Data Science and Exploration
(Spark, Zeppelin)
ETL and Data Preparation
(Hive, Spark)
Analytics and Reporting
(Hive2 w/LLAP, Zeppelin)
TECH PREVIEW
Cluster Types
Hortonworks Data Cloud for AWS
TECH PREVIEW
Prescriptive On‐Demand Ephemeral Workloads
** Planned list of available Cluster Types
TECH PREVIEW
Why Hortonworks Cloud Solutions?
Choice of Cloud
Rich Set of Capabilities and Security
S3 Integrations on AWS (Tech Preview)
Award Winning Hadoop Expertise
Zero‐configuration access engine capabilities (HDInsight)
Connected Data Platforms Integrate Cloud and Data Center Deployments
[Diagram labels: Data Center, Cloud, Deep Historical Analysis, Stream Analytics, Machine Learning, Edge Analytics, Data at Rest, Data in Motion, Edge Data]