Hortonworks Inc. 2011-2015. All Rights Reserved
Hadoop Trends & Hadoop on EC2
Yifeng Jiang
Solutions Engineer, Hortonworks, Inc.
March 22, 2015
Yifeng Jiang
Solutions Engineer @ Hortonworks Japan
HBase book author
Twitter: @uprush
Hadoop Hadoop Hadoop on EC2 Deployment Options
Hadoop: Modern Data Architecture
Hadoop
Number of Issues Resolved
Number of Lines of Code Increased
http://ajisakaa.blogspot.jp
Open Leadership
Code Contributed in 2014 by Organization
http://ajisakaa.blogspot.jp
June 2011: Yahoo! Hadoop 24
December 2014: 600 Hadoop
Apache Project   Committers   PMC Members
Hadoop           27           21
Pig              5            5
Hive             18           6
Tez              16           15
HBase            6            4
Phoenix          4            4
Accumulo         2            2
Storm            3            2
Slider           11           11
Falcon           5            3
Flume            1            1
Sqoop            1            1
Ambari           36           28
Oozie            3            2
Zookeeper        2            1
Knox             13           3
Ranger           11           n/a
TOTAL            164          109
40075 2/3 Fortune 1000 100%
Hadoop Hortonworks
HDP (Hortonworks Data Platform)
Modern Data Architecture (MDA)
[Figure: Modern Data Architecture]
SOURCES: Existing Systems (ERP, CRM, SCM); Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured
ANALYTICS: Data Marts; Business Analytics; Visualization & Dashboards; Applications
Platform: HDFS (Hadoop Distributed File System); YARN: Data Operating System
Workloads and systems: Interactive, Real-Time, Batch, Partner ISV, MPP, EDW
Hortonworks Data Platform 2.2 Stack
HDP IS Apache Hadoop
There is ONE Enterprise Hadoop: everything else is a vendor derivation
Hortonworks Data Platform 2.2
[Figure: component version matrix across HDP releases]
Releases: HDP 2.0 (October 2013), HDP 2.1 (April 2014), HDP 2.2 (October 2014)
Components: Hadoop & YARN, Pig, Hive & HCatalog, HBase, Sqoop, Oozie, Zookeeper, Ambari, Storm, Flume, Knox, Phoenix, Accumulo, Falcon, Ranger, Spark, Kafka, Tez, Slider, Solr
Categories: Data Management, Data Access, Governance & Integration, Security, Operations
* version numbers are targets and subject to change at time of general availability in accordance with ASF release process
Hadoop: Hive, Ambari, Ranger, and more
HDFS: more Efficient Data Lake Storage
Tiered Storage: DataNode storage types DISK, SSD, RAM, ARCHIVAL
HDFS NFS Gateway: HDFS, NFS
Roadmap: Archival Tier GA
  o Erasure Coding
  o 3x to 1.4x
S3, Swift, SAN, Filers
Collection of tiered storages
All disks as a single storage
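
The 3x and 1.4x figures listed with Erasure Coding can be sanity-checked with simple arithmetic on storage overhead. The sketch below assumes a Reed-Solomon layout of 10 data blocks plus 4 parity blocks; that layout is an assumption for illustration and is not stated on the slide, and `overhead` is a hypothetical helper, not an HDFS command.

```shell
#!/bin/sh
# Raw bytes stored per byte of user data.
# Usage: overhead DATA_UNITS EXTRA_UNITS  (hypothetical helper, not an HDFS command)
overhead() {
  awk -v d="$1" -v p="$2" 'BEGIN { printf "%.1fx\n", (d + p) / d }'
}

# 3-way replication: 1 original block plus 2 extra copies.
overhead 1 2
# Assumed Reed-Solomon layout: 10 data blocks plus 4 parity blocks.
overhead 10 4
```

Under these assumptions the two calls print 3.0x and 1.4x; a different data/parity split would give a different ratio.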
YARN: extends Hadoop into Data OS
CPU Cgroup
YARN Node Label
[Figure: cluster of NodeManager (NM) nodes; nodes with Label: HBaseRegionServer run RS (HBase on Slider), other nodes run MR; YARN App, CS Queue: hbase]
Slider: more YARN Ready Engines
[Figure: YARN-ready engines on YARN: Data Operating System (Cluster Resource Management), over HDFS (Hadoop Distributed File System)]
Workloads and engines: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Kafka, Others (ISV Engines)
Tez: drawn under Pig, Hive, Cascading
Slider: drawn under Storm, HBase, Accumulo, Kafka
On YARN: HBase, Accumulo, Storm
SDK for 3rd-party ISVs
Hive: Enterprise SQL at Hadoop Scale
Insert, Update, Delete
  Roadmap: BEGIN, COMMIT, ROLLBACK
100
  ORC File
  Hive on Tez
  Cost Based Optimizer
  Roadmap: LLAP
Spark: Enterprise Ready Spark on HDP 2.2.3
Spark, Hadoop
Spark 1.2 GA
Spark on YARN
ORC
Hive on Spark
Spark with Ambari
Kerberos
?
HDP 2.2
RANGER
Ranger:
Ambari: Hadoop
Apache Ambari: Hadoop for Everyone, 100% Open Source
Hadoop on EC2 Deployment Options
Best Practices
HadoopHadoop: EMR
Hadoop on EC2
EBS
S3 HDP
EC2 Big and cheap
Hadoop
Big and cheap
12 cores: Dual Intel Xeon E5-2650v2 (8c) or E5-2660v2 (10c) Processors
128GB or 256GB RAM
12 SATA / NL-SAS drives, 1~4TB per drive
1 or 10GbE NIC
Hadoop, EC2
Big and cheap
Deploy:
I2, Hs1
HDFS Tiered Storage
YARN Node Label
[Figure: HDP Cluster built from i2.8xlarge and hs1.8xlarge instances]
Storage Policy: SSD & Hot
[Figure: HDP Cluster with three i2.8x nodes (SSD volumes) and three hs1.8x nodes (DISK volumes); replicas of DataSet A on the i2.8x nodes, replicas of DataSet B on the hs1.8x nodes]
SSD: All replicas on SSD. DataSet A (e.g., HBase)
Hot: All replicas on DISK. DataSet B (others)
Storage Policy:
Ambari HDFS Configuration Groups: I2, Hs1
Ambari Groups: DataNode dfs.datanode.data.dir
  I2 group: [SSD]/hadoop/hdfs/data1,[SSD]/hadoop/hdfs/data2,
  Hs1 group: [DISK]/hadoop/hdfs/data1,[DISK]/hadoop/hdfs/data2,
HDFS
Storage Policy
$ hdfs dfs -mkdir /hbase
$ hdfs dfsadmin -setStoragePolicy /hbase ALL_SSD
Set storage policy ALL_SSD on /hbase
$ hdfs dfsadmin -getStoragePolicy /ssd
The storage policy of /ssd:
BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
HBase on SSD (i2): /hbase set to ALL_SSD
Ambari Blueprint: Elastic Hadoop
Ambari Blueprints
The CloudFormation for Hadoop
Microsoft Azure, HDP
API
JSON API
Blueprint
Ambari Server Blueprint API API
IMPORT CLUSTER
INSTANTIATE
CLUSTER
EXPORT
Blueprint
GET /api/v1/clusters/mycluster?format=blueprint
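
The export request above can be issued with curl. This is a minimal sketch that assumes an Ambari Server reachable at localhost:8080 with admin:admin credentials (the values used in the re-create commands later in this deck); `blueprint_url` and `export_blueprint` are hypothetical helper names.

```shell
#!/bin/sh
# Assumed defaults: Ambari Server on localhost:8080, admin:admin credentials.
AMBARI_HOST=${AMBARI_HOST:-localhost:8080}
AMBARI_AUTH=${AMBARI_AUTH:-admin:admin}

# Build the export URL for a cluster name (hypothetical helper).
blueprint_url() {
  printf 'http://%s/api/v1/clusters/%s?format=blueprint\n' "$AMBARI_HOST" "$1"
}

# Save the exported blueprint of cluster $1 to file $2 (hypothetical helper).
export_blueprint() {
  curl -sS -u "$AMBARI_AUTH" -o "$2" "$(blueprint_url "$1")"
}
```

For example, `export_blueprint mycluster mycluster-blueprint.json` would write the exported blueprint to a local file that can be reviewed before it is reused.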
100
{
  "configurations" : [
    { "hdfs-site" : { "dfs.datanode.data.dir" : "/hadoop/1,/hadoop/2,/hadoop/3" } }
  ],
  "host_groups" : [
    { "name" : "master-host",
      "components" : [ { "name" : "NAMENODE" }, { "name" : "RESOURCEMANAGER" } ],
      "cardinality" : "1" },
    { "name" : "worker-host",
      "components" : [ { "name" : "DATANODE" }, { "name" : "NODEMANAGER" } ],
      "cardinality" : "1+" }
  ],
  "Blueprints" : {
    "blueprint_name" : "multi-node-hdfs-yarn",
    "stack_name" : "HDP",
    "stack_version" : "2.0"
  }
}
{
  "blueprint" : "multi-node-hdfs-yarn",
  "host_groups" : [
    { "name" : "master-host",
      "hosts" : [ { "fqdn" : "master001.ambari.apache.org" } ] },
    { "name" : "worker-host",
      "hosts" : [
        { "fqdn" : "worker001.ambari.apache.org" },
        { "fqdn" : "worker002.ambari.apache.org" },
        { "fqdn" : "worker099.ambari.apache.org" }
      ] }
  ]
}
1. POST -d @hakone-blueprint.json /api/v1/blueprints/hakone
2. POST -d @hosts.json /api/v1/clusters/hakone
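
Writing the host mapping by hand does not scale to a cluster of this size, so it can be generated. The sketch below is one possible way to do that, assuming the hostnames follow the worker001 ... worker099 pattern shown above and a single master; `gen_hosts` is a hypothetical helper, not part of Ambari.

```shell
#!/bin/sh
# Print a host mapping for one master and N workers.
# Usage: gen_hosts BLUEPRINT_NAME WORKER_COUNT  (hypothetical helper)
gen_hosts() {
  blueprint=$1
  workers=$2
  printf '{ "blueprint" : "%s", "host_groups" : [\n' "$blueprint"
  printf '  { "name" : "master-host", "hosts" : [ { "fqdn" : "master001.ambari.apache.org" } ] },\n'
  printf '  { "name" : "worker-host", "hosts" : [\n'
  i=1
  while [ "$i" -le "$workers" ]; do
    sep=','
    # The last entry must not carry a trailing comma, or the JSON is invalid.
    if [ "$i" -eq "$workers" ]; then sep=''; fi
    printf '    { "fqdn" : "worker%03d.ambari.apache.org" }%s\n' "$i" "$sep"
    i=$((i + 1))
  done
  printf '  ] }\n] }\n'
}

# Small example: a mapping with two workers, printed to stdout.
gen_hosts multi-node-hdfs-yarn 2
```

Running `gen_hosts multi-node-hdfs-yarn 99 > hosts.json` would produce a file with the master plus 99 workers, ready for step 2.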
Base AMI: Ambari Server, Ambari Agent
Ambari Server
Ambari Agent: AMI, EC2 Bootstrap, Ambari server IP
Spot
Blueprint API
# Ambari Reset (to clear previously installed clusters)
ambari-server stop
ambari-server reset
ambari-server start

# Launch ec2 spot instances
ec2-request-spot-instances

# re-create cluster
# (Ambari rejects POST requests that lack an X-Requested-By header)
curl -X POST -H "X-Requested-By: ambari" -d @hakone-blueprint.json -u admin:admin localhost:8080/api/v1/blueprints/hakone
curl -X POST -H "X-Requested-By: ambari" -d @hosts.json -u admin:admin localhost:8080/api/v1/clusters/hakone
HDP
Hadoop Trends and Hadoop on EC2
Hadoop (MDA)Hadoop Hadoop Hadoop Hadoop on EC2
Thank you
Yifeng Jiang, Solutions Engineer, Hortonworks
@uprush