  • Hortonworks Inc. 2011 2015. All Rights Reserved

    Hadoop Trends & Hadoop on EC2
    Yifeng Jiang
    Solutions Engineer, Hortonworks, Inc.
    March 22, 2015

    Yifeng Jiang

    Solutions Engineer @ Hortonworks Japan
    HBase book author
    Twitter: @uprush

    Hadoop
    Hadoop
    Hadoop on EC2 Deployment Options

    Hadoop
    Modern Data Architecture

    Hadoop

    Charts: Number of Issues Resolved; Increase in Lines of Code
    (source: http://ajisakaa.blogspot.jp)

    Open Leadership

    Code Contributed in 2014 by Organization (source: http://ajisakaa.blogspot.jp)

    June 2011: founded by 24 engineers from the Yahoo! Hadoop team

    December 2014: 600 Hadoop

    Apache Project    Committers    PMC Members
    Hadoop                    27             21
    Pig                        5              5
    Hive                      18              6
    Tez                       16             15
    HBase                      6              4
    Phoenix                    4              4
    Accumulo                   2              2
    Storm                      3              2
    Slider                    11             11
    Falcon                     5              3
    Flume                      1              1
    Sqoop                      1              1
    Ambari                    36             28
    Oozie                      3              2
    ZooKeeper                  2              1
    Knox                      13              3
    Ranger                    11            n/a
    TOTAL                    164            109

    40075 2/3 Fortune 1000 100%

    Hadoop Hortonworks

    HDP (Hortonworks Data Platform)

    Modern Data Architecture (MDA)

    [Diagram: HDP at the center of a Modern Data Architecture. Sources: Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured, and existing systems (ERP, CRM, SCM). HDP layer: HDFS (Hadoop Distributed File System) with YARN: Data Operating System running Interactive, Real-Time, Batch, and partner ISV workloads. Analytics: Data Marts, MPP, EDW, Business Analytics, Visualization & Dashboards, Applications.]

    Hortonworks Data Platform 2.2 Stack

    HDP IS Apache Hadoop
    There is ONE Enterprise Hadoop: everything else is a vendor derivation

    Hortonworks Data Platform 2.2

    [Table: component versions per release, HDP 2.0 (October 2013), HDP 2.1 (April 2014), and HDP 2.2 (October 2014), grouped into Data Management, Data Access, Governance & Integration, Security, and Operations. Components: Hadoop & YARN, Pig, Hive & HCatalog, HBase, Phoenix, Accumulo, Storm, Spark, Kafka, Solr, Tez, Slider, Sqoop, Flume, Falcon, Oozie, ZooKeeper, Ambari, Knox, Ranger. For example, Hadoop & YARN goes from 2.2.0 to 2.4.0 to 2.6.0, and Hive & HCatalog from 0.12.0 to 0.13.0 to 0.14.0.]

    * version numbers are targets and subject to change at time of general availability in accordance with ASF release process

    Hadoop
    Hive, Ambari, Ranger, and more

    HDFS: More Efficient Data Lake Storage

    Tiered Storage: DataNode storage types DISK, SSD, RAM, ARCHIVAL

    HDFS NFS Gateway: access HDFS as an NFS mount

    Roadmap: Archival Tier GA

    o Erasure Coding

    o Storage overhead: 3x to 1.4x
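The 3x to 1.4x figure compares the raw storage needed per logical byte under triple replication and under erasure coding. A minimal sketch of that arithmetic, assuming a Reed-Solomon layout with 10 data and 4 parity blocks, RS(10,4); the slide does not name the codec, and other layouts such as RS(6,3) give 1.5x instead.

```python
def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per logical byte with n-way replication."""
    return float(replicas)


def erasure_coding_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw bytes stored per logical byte with a Reed-Solomon RS(k, m) stripe."""
    return (data_blocks + parity_blocks) / data_blocks


if __name__ == "__main__":
    logical_tb = 100  # hypothetical data set size
    # 3 replicas survive the loss of 2 copies; RS(10,4) survives the loss of any 4 blocks.
    print(f"3x replication: {logical_tb * replication_overhead(3):.0f} TB raw")
    print(f"RS(10,4):       {logical_tb * erasure_coding_overhead(10, 4):.0f} TB raw")
```

The saving comes from storing parity instead of full copies, at the cost of reconstruction work when a block is lost.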

    [Diagram: HDFS moving from all disks as a single storage to a collection of tiered storages, alongside external stores such as S3, Swift, and SAN filers.]

    YARN: Extends Hadoop into a Data OS

    CPU isolation with cgroups

    YARN Node Labels

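    [Diagram: a YARN cluster of NodeManagers (NM). Nodes with the label HBaseRegionServer run HBase RegionServers (RS) as a YARN app through HBase on Slider, submitted to the hbase Capacity Scheduler (CS) queue; MapReduce (MR) containers also run in the cluster.]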
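One way to express this label-to-queue mapping is through Capacity Scheduler properties. A minimal sketch that generates them as a dict, assuming the node-label property names from the Apache Hadoop Capacity Scheduler documentation (verify them against the Hadoop version in use) and a hypothetical `hbase` queue that gets all of the `HBaseRegionServer` partition. The label itself would be registered and assigned separately, for example with `yarn rmadmin -addToClusterNodeLabels` and `yarn rmadmin -replaceLabelsOnNode`.

```python
def node_label_queue_properties(queue: str, label: str, label_capacity: int = 100) -> dict[str, str]:
    """Capacity Scheduler properties that let `queue` run on nodes labeled `label`.

    Property names are assumed from the Hadoop Capacity Scheduler node-label
    documentation; check them against the Hadoop version in use.
    """
    root = "yarn.scheduler.capacity.root"
    queue_path = f"{root}.{queue}"
    return {
        # The parent queue needs a share of the labeled partition to hand out.
        f"{root}.accessible-node-labels.{label}.capacity": "100",
        # Let the queue schedule containers on labeled nodes...
        f"{queue_path}.accessible-node-labels": label,
        # ...with this percentage of the labeled partition.
        f"{queue_path}.accessible-node-labels.{label}.capacity": str(label_capacity),
        # Requests from this queue without a label expression go to labeled nodes.
        f"{queue_path}.default-node-label-expression": label,
    }


if __name__ == "__main__":
    # Hypothetical queue/label pair matching the diagram: hbase -> HBaseRegionServer.
    for key, value in node_label_queue_properties("hbase", "HBaseRegionServer").items():
        print(f"{key}={value}")
```

Keeping the queue-to-label binding in the scheduler, rather than in each application, is what lets HBase on Slider land on the labeled nodes without changes to the app itself.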

    Slider: More YARN-Ready Engines

    [Diagram: engines on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System). Script (Pig), SQL (Hive), and Java/Scala (Cascading) run on Tez; Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Kafka, and other and ISV engines also run on YARN, several of them deployed via Slider.]

    Run HBase, Accumulo, and Storm on YARN; SDK for 3rd-party ISVs

    Hive: Enterprise SQL at Hadoop Scale

    Transactions: INSERT, UPDATE, DELETE (Roadmap: BEGIN, COMMIT, ROLLBACK)

    Performance: 100x faster with ORC File, Hive on Tez, and the Cost-Based Optimizer (Roadmap: LLAP)

    Spark: Enterprise Ready Spark on HDP 2.2.3

    SparkHadoop

    Spark 1.2 GA
    Spark on YARN
    ORC
    Hive on Spark
    Spark with Ambari

    Kerberos

    ?

    HDP 2.2

    RANGER

    Ranger:

    Ambari: Hadoop

    Apache Ambari: Hadoop for Everyone, 100% Open Source

    Hadoop on EC2 Deployment Options

    Best Practices

    HadoopHadoop: EMR

    Hadoop on EC2

    EBS

    S3 HDP

    EC2 Big and cheap

    Hadoop

    Big and cheap

    CPU: 12 cores; Dual Intel Xeon E5-2650v2 (8c) or E5-2660v2 (10c) processors

    RAM: 128GB or 256GB
    Disks: 12 SATA / NL-SAS drives, 1-4TB per drive
    Network: 1 or 10GbE NIC

    Hadoop EC2

    Big and cheap

    Deploy:

    I2 and Hs1 instances

    HDFS Tiered Storage + YARN Node Labels

    [Diagram: an HDP cluster mixing i2.8xlarge (SSD) and hs1.8xlarge (HDD) instances]

    Storage Policy: SSD & Hot

    [Diagram: an HDP cluster with three i2.8xlarge nodes (SSD volumes) and three hs1.8xlarge nodes (DISK volumes). SSD policy: all replicas on SSD, used for DataSet A (e.g., HBase). Hot policy: all replicas on DISK, used for DataSet B (others).]
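The policies on this slide decide which storage type each replica lands on. A simplified model of that decision, assuming the built-in HDFS policy definitions (HOT: DISK; ONE_SSD: first replica on SSD, the rest on DISK; ALL_SSD: SSD with DISK as the creation fallback) and ignoring racks, free space, and the NameNode's real block-placement code; the function and node names are hypothetical.

```python
from dataclasses import dataclass

# Simplified subset of the built-in HDFS storage policies.
# storage_types: preferred type per replica; the last entry repeats to fill
# the replication factor. Only creation fallbacks are modeled.
POLICIES: dict[str, tuple[list[str], list[str]]] = {
    "HOT": (["DISK"], []),
    "ONE_SSD": (["SSD", "DISK"], ["SSD", "DISK"]),
    "ALL_SSD": (["SSD"], ["DISK"]),
}


@dataclass(frozen=True)
class Node:
    name: str
    storage_types: frozenset[str]  # storage types configured on this DataNode


def place_replicas(policy: str, nodes: list[Node], replication: int = 3) -> list[tuple[str, str]]:
    """Return (node, storage type) for each replica, at most one replica per node."""
    storage_types, fallbacks = POLICIES[policy]
    # e.g. ONE_SSD with replication 3 -> [SSD, DISK, DISK]
    wanted = [storage_types[min(i, len(storage_types) - 1)] for i in range(replication)]
    placements: list[tuple[str, str]] = []
    used: set[str] = set()
    for preferred in wanted:
        for storage in [preferred, *fallbacks]:
            free = [n for n in nodes if n.name not in used and storage in n.storage_types]
            if free:
                used.add(free[0].name)
                placements.append((free[0].name, storage))
                break
        else:
            raise RuntimeError(f"no node can hold a replica under {policy}")
    return placements


# The cluster in the diagram: three i2.8xlarge (SSD) and three hs1.8xlarge (DISK).
cluster = [Node(f"i2-{i}", frozenset({"SSD"})) for i in (1, 2, 3)] + [
    Node(f"hs1-{i}", frozenset({"DISK"})) for i in (1, 2, 3)
]
print(place_replicas("ALL_SSD", cluster))  # DataSet A, e.g. /hbase
print(place_replicas("HOT", cluster))      # DataSet B
```

If fewer SSD nodes are available than replicas, the ALL_SSD fallback puts the remaining replicas on DISK rather than failing the write.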

    Storage Policy:

    Ambari HDFS Configuration Groups: I2, Hs1

    In each Ambari group, set the DataNode's dfs.datanode.data.dir:
      I2 group: [SSD]/hadoop/hdfs/data1,[SSD]/hadoop/hdfs/data2,
      Hs1 group: [DISK]/hadoop/hdfs/data1,[DISK]/hadoop/hdfs/data2,
    HDFS
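Each group's value follows the pattern [TYPE]/mount,[TYPE]/mount. A small helper that builds it, assuming one mount per instance-store volume under /hadoop/hdfs/dataN (the path pattern from the slide) and the volume counts of i2.8xlarge (8 SSD) and hs1.8xlarge (24 HDD); the helper name is hypothetical. Note that the HDFS storage type names are DISK, SSD, ARCHIVE, and RAM_DISK.

```python
# Hypothetical helper: build a dfs.datanode.data.dir value for one Ambari
# configuration group. Each directory is prefixed with its HDFS storage type.
VALID_TYPES = {"DISK", "SSD", "ARCHIVE", "RAM_DISK"}


def data_dir_value(storage_type: str, volumes: int, base: str = "/hadoop/hdfs/data") -> str:
    if storage_type not in VALID_TYPES:
        raise ValueError(f"unknown storage type: {storage_type}")
    return ",".join(f"[{storage_type}]{base}{i}" for i in range(1, volumes + 1))


# Assumed instance-store volume counts per instance type.
groups = {"I2 group": ("SSD", 8), "Hs1 group": ("DISK", 24)}
for group, (storage_type, volumes) in groups.items():
    print(f"{group}: {data_dir_value(storage_type, volumes)}")
```

Generating the list avoids typos across dozens of mount points, and the type check catches a prefix HDFS would not recognize.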

    Storage Policy

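    $ hdfs dfs -mkdir /hbase

    $ hdfs dfsadmin -setStoragePolicy /hbase ALL_SSD
    Set storage policy ALL_SSD on /hbase

    $ hdfs dfsadmin -getStoragePolicy /ssd
    The storage policy of /ssd:
    BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}

    To keep HBase data on the i2 SSDs, set the /hbase directory to ALL_SSD. Blocks written before the policy was set are not moved automatically; the HDFS Mover (hdfs mover) migrates them.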

    Ambari Blueprint: Elastic Hadoop

    Ambari Blueprints

    The CloudFormation for Hadoop

    Microsoft Azure HDP

    Blueprint API

    [Diagram: a JSON Blueprint is imported into Ambari Server through the Blueprint API (IMPORT), and a cluster is then created from it (INSTANTIATE).]

    [Diagram: an existing cluster is exported as a Blueprint (EXPORT).]

    GET /api/v1/clusters/mycluster?format=blueprint

    100

    {
      "configurations" : [
        { "hdfs-site" : { "dfs.datanode.data.dir" : "/hadoop/1,/hadoop/2,/hadoop/3" } }
      ],
      "host_groups" : [
        { "name" : "master-host",
          "components" : [ { "name" : "NAMENODE" }, { "name" : "RESOURCEMANAGER" } ],
          "cardinality" : "1" },
        { "name" : "worker-host",
          "components" : [ { "name" : "DATANODE" }, { "name" : "NODEMANAGER" } ],
          "cardinality" : "1+" }
      ],
      "Blueprints" : { "blueprint_name" : "multi-node-hdfs-yarn", "stack_name" : "HDP", "stack_version" : "2.0" }
    }

    {
      "blueprint" : "multi-node-hdfs-yarn",
      "host_groups" : [
        { "name" : "master-host", "hosts" : [ { "fqdn" : "master001.ambari.apache.org" } ] },
        { "name" : "worker-host", "hosts" : [
            { "fqdn" : "worker001.ambari.apache.org" },
            { "fqdn" : "worker002.ambari.apache.org" },
            { "fqdn" : "worker099.ambari.apache.org" } ] }
      ]
    }

    1. POST -d @hakone-blueprint.json /api/v1/blueprints/hakone

    2. POST -d @hosts.json /api/v1/clusters/hakone
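Before step 2, a client-side check can catch mismatches between the host mapping and the blueprint: host groups that do not exist, groups left without hosts, and host counts outside the declared cardinality. A simplified sketch, assuming cardinality strings of the form "N", "N+", or "N-M" (other forms such as "ALL" are not handled here); the function names are hypothetical, and Ambari performs its own validation regardless.

```python
def cardinality_allows(cardinality: str, count: int) -> bool:
    """Check a host count against "N", "N+", or "N-M" (other forms are not handled)."""
    if cardinality.endswith("+"):
        return count >= int(cardinality[:-1])
    if "-" in cardinality:
        low, high = (int(part) for part in cardinality.split("-", 1))
        return low <= count <= high
    return count == int(cardinality)


def check_cluster_template(blueprint: dict, template: dict) -> list[str]:
    """Return problems found in a cluster template; an empty list means none."""
    declared = {g["name"]: g.get("cardinality") for g in blueprint["host_groups"]}
    problems = []
    for group in template["host_groups"]:
        name, count = group["name"], len(group["hosts"])
        if name not in declared:
            problems.append(f"{name}: not defined in the blueprint")
        elif declared[name] and not cardinality_allows(declared[name], count):
            problems.append(f"{name}: {count} host(s) outside cardinality {declared[name]}")
    mapped = {g["name"] for g in template["host_groups"]}
    for name in sorted(declared.keys() - mapped):
        problems.append(f"{name}: no hosts mapped")
    return problems


# Host groups from the blueprint above and a 1 master + 99 worker template.
blueprint = {"host_groups": [
    {"name": "master-host", "cardinality": "1"},
    {"name": "worker-host", "cardinality": "1+"},
]}
template = {"host_groups": [
    {"name": "master-host", "hosts": [{"fqdn": "master001.ambari.apache.org"}]},
    {"name": "worker-host",
     "hosts": [{"fqdn": f"worker{i:03d}.ambari.apache.org"} for i in range(1, 100)]},
]}
print(check_cluster_template(blueprint, template))  # []
```

In practice the two dicts would come from `json.load` on the blueprint and hosts.json files.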

    Base AMI: Ambari Server, Ambari Agent
    Ambari Server AMI, Ambari Agent AMI
    EC2 bootstrap: Ambari server IP
    Spot instances
    Blueprint API
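At boot, each Ambari Agent started from the shared AMI has to be pointed at the current Ambari Server. A minimal sketch of that bootstrap step, assuming the agent reads the server address from the `hostname` key of the `[server]` section in /etc/ambari-agent/conf/ambari-agent.ini, and that the server IP reaches the instance through something like EC2 user data; the function name and the sample IP are hypothetical.

```python
import configparser
import io

AGENT_INI = "/etc/ambari-agent/conf/ambari-agent.ini"  # assumed default location


def point_agent_at_server(ini_text: str, server_ip: str) -> str:
    """Return ambari-agent.ini content with [server] hostname set to server_ip."""
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep key case as written
    config.read_string(ini_text)
    if not config.has_section("server"):
        config.add_section("server")
    config.set("server", "hostname", server_ip)
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


if __name__ == "__main__":
    sample = "[server]\nhostname=localhost\nurl_port=8440\nsecured_url_port=8441\n"
    print(point_agent_at_server(sample, "10.0.0.10"))
    # On a real node (as root): read AGENT_INI, write the result back,
    # then restart the agent with `ambari-agent restart`.
```

configparser does not preserve comments when it rewrites the file, so a production script might prefer a targeted line edit instead.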

    # Ambari reset (to clear previously installed clusters)
    ambari-server stop
    ambari-server reset
    ambari-server start

    # Launch EC2 spot instances
    ec2-request-spot-instances

    # Re-create the cluster (Ambari expects the X-Requested-By header on POST requests)
    curl -X POST -H "X-Requested-By: ambari" -d @hakone-blueprint.json -u admin:admin http://localhost:8080/api/v1/blueprints/hakone
    curl -X POST -H "X-Requested-By: ambari" -d @hosts.json -u admin:admin http://localhost:8080/api/v1/clusters/hakone
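The two re-create calls can also be scripted. A minimal sketch with Python's standard library, assuming Ambari on localhost:8080 with the admin:admin account used in the curl commands; it sends the X-Requested-By header that Ambari expects on POST requests. The function names are hypothetical, and nothing goes over the network until `send` is called.

```python
import base64
import urllib.request

AMBARI = "http://localhost:8080/api/v1"  # assumed server address


def ambari_post(path: str, body: bytes, user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build (but do not send) an authenticated POST against the Ambari REST API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{AMBARI}/{path}",
        data=body,
        method="POST",
        headers={"Authorization": f"Basic {token}", "X-Requested-By": "ambari"},
    )


def recreate_cluster_requests(blueprint_json: bytes, hosts_json: bytes, name: str = "hakone") -> list[urllib.request.Request]:
    # Order matters: register the blueprint first, then create the cluster from it.
    return [
        ambari_post(f"blueprints/{name}", blueprint_json),
        ambari_post(f"clusters/{name}", hosts_json),
    ]


def send(request: urllib.request.Request) -> int:
    """Send a request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status


# Usage (file names from the slide; Ambari must be reachable):
# with open("hakone-blueprint.json", "rb") as bp, open("hosts.json", "rb") as hosts:
#     for request in recreate_cluster_requests(bp.read(), hosts.read()):
#         print(request.full_url, send(request))
```

The cluster call returns once Ambari has accepted the request; installation continues in the background and can be followed in the Ambari UI.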

    HDP

    Hadoop Trends and Hadoop on EC2

    Hadoop (MDA) Hadoop Hadoop Hadoop Hadoop on EC2

    Thank you
    Yifeng Jiang, Solutions Engineer, Hortonworks
    @uprush