Spark and Hadoop, a Perfect Combination (Korean)


Transcript

  • Spark HDP, (Hortonworks Data Platform)

    Hortonworks Korea

    Hortonworks Inc. 2011-2015. All Rights Reserved

  • Hadoop?


  • 4ZB DATA

    MOBILE DEVICES

    HUMAN CONTENT

    INTERNET OF THINGS

    44ZB DATA

    Source: http://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm


  • Apache Hadoop

  • Hadoop

    App App App App App App

    HADOOP

  • Hadoop


  • Payment Tracking

    Call Analysis

    Machine Data

    Product Design

    Social Mapping

    Factory Yields

    Defect Detection

    M & A Due Diligence

    Proactive Repair

    Disaster Mitigation

    Investment Planning

    Next Product Recs

    Store Design

    Risk Modeling

    Ad Placement

    Inventory Predictions

    Sentiment Analysis

    Ad Placement

    Basket Analysis Segments

    Customer Support

    Supply Chain

    Cross-Sell

    Customer Retention

    Vendor Scorecards

    Optimize Inventories


  • Historical Records

    OPEX Reduction

    Mainframe Offloads

    Fraud Prevention

    Data as a Service

    Public Data Capture

    IT Hadoop . , ETL , .

    Digital Protection

    Device Data Ingest

    Rapid Reporting

    ETL

  • Hortonworks

    Social Mapping

    Payment Tracking

    Factory Yields

    Defect Detection

    Call Analysis

    Machine Data

    Product Design

    M & A Due Diligence

    Next Product Recs

    Store Design

    Risk Modeling

    Ad Placement

    Proactive Repair

    Disaster Mitigation

    Investment Planning

    Inventory Predictions

    Customer Support

    Sentiment Analysis

    Supply Chain

    Ad Placement

    Basket Analysis Segments

    Cross-Sell

    Customer Retention

    Vendor Scorecards

    Optimize Inventories

    OPEX Reduction

    Mainframe Offloads

    Historical Records

    Data as a Service

    Public Data Capture

    Fraud Prevention

    Device Data Ingest

    Rapid Reporting

    Digital Protection

    ETL

  • Hortonworks?

    Hadoop


  • Hortonworks Hadoop

    HORTONWORKS DATA PLATFORM

    YARN:

  • Hortonworks Apache

    Apache Hadoop 1/3,

    Hadoop

    Hadoop

    APACHE HADOOP COMMITTERS

  • Hortonworks

    Hortonworks Hadoop = Apache

    [Figure labels: STORAGE (x2); Project 1 through Project 6]

  • Hortonworks

    Hortonworks SmartSense

    Hortonworks SmartSense

  • Hortonworks

    Hortonworks Hadoop , 100 40%

    F100 75% F100 65% F100 55% F100 46% F100 40%

    Hortonworks

    2014 Forrester Wave

    The Forrester Wave: Big Data Hadoop Solutions

  • Hortonworks

    556 (2015 8 5 ) 2015 2 119 NASDAQ : HDP

    Hortonworks Data Platform

    , ,

    ,

    Hadoop

    2011

    Yahoo! 24 Hadoop , ,

    740+

    1350+

  • Hortonworks IT

    IT , ,

    Hortonworks

    Hadoop

    , ,

    2015 6 Shared Accounts of Hortonworks (A, I) (All Cut, n=35)

    Hortonworks, Big Data #1

    Microsoft, Hosting #2

    MongoDB, Warehousing #3

    Tableau, Big Data #4

    20

    Source: https://hortonworks.com/blog/cio-survey-hortonworks-data-platform-now-a-top-it-imperative/

  • Spark HDP,

    Spark on YARN,


  • API DataFrames, , SQL

    Hive Hadoop SQL Spark Hadoop

    , ,

    Hadoop

    Hortonworks Spark

    Storage

    YARN: Data Operating System

    Governance Security

    Operations

    Resource Management

  • YARN SLA ,

    - RDD HDFS

    SQL SparkSQL Hive , HS2; ORC

    Spark NoSQL RDDs for predicate pushdown HBase

    , , :

    Apache Zeppelin

    Spark Hadoop ?


  • Apache Atlas, Spark Apache Falcon

    Apache Ranger , Apache Ambari

    Linux, Windows,

    Spark Cloudbreak Ambari - Azure, AWS, GCP, OpenStack, Docker

    Spark Hadoop ?



  • CDO

  • Cloudbreak 1. 2. Spark blueprint

    3. HDP

    Microsoft Azure

  • Log in to launch.hortonworks.com, a self-service portal for launching HDP clusters in the cloud (cloudbreak.sequenceiq.com).

  • Select a cloud provider, then start creating your cluster.

  • Name the cluster, choose your region, and pick your blueprint; in this case, we want hdp-spark-cluster for our data science work.

  • We clicked "create cluster", and Cloudbreak is now provisioning our Spark environment on Azure.

  • We can now access Zeppelin, a data science notebook for Spark that's similar to the IPython notebook.

  • Let's look at our data. We can see eventType, whether the driver is certified, how many hours were driven, and weather data such as foggy, rainy, etc.

  • Let's start asking questions of our data, such as: does fatigue cause violations?

  • Let's view the data in a pie chart to see how violations look by hours driven.

  • How are violations impacted by fog?

  • Does location have an impact on incidents?
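The questions above (fatigue, fog, location) all come down to grouping events and comparing the violation rate per group. The following is a minimal sketch in plain Python, not the notebook code from the demo. Only eventType appears in the slides; the sample rows, the field names hoursDriven, isFoggy and isCertified, the 50-hour threshold, and the convention that an eventType of "Normal" means no violation are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows. Field names other than eventType are assumed.
events = [
    {"eventType": "Normal", "isCertified": True, "hoursDriven": 40, "isFoggy": False},
    {"eventType": "Normal", "isCertified": True, "hoursDriven": 45, "isFoggy": False},
    {"eventType": "Overspeed", "isCertified": False, "hoursDriven": 60, "isFoggy": False},
    {"eventType": "Lane Departure", "isCertified": False, "hoursDriven": 70, "isFoggy": True},
    {"eventType": "Normal", "isCertified": True, "hoursDriven": 55, "isFoggy": False},
    {"eventType": "Unsafe following distance", "isCertified": True, "hoursDriven": 42, "isFoggy": True},
]


def violation_rate_by(rows, key):
    """Return {group: share of rows in that group that are violations}.

    Assumption: any eventType other than "Normal" counts as a violation.
    """
    totals = defaultdict(int)
    violations = defaultdict(int)
    for row in rows:
        group = key(row)
        totals[group] += 1
        if row["eventType"] != "Normal":
            violations[group] += 1
    return {group: violations[group] / totals[group] for group in totals}


def hours_bucket(row):
    """Split drivers at an assumed 50-hour threshold."""
    return "over 50h" if row["hoursDriven"] > 50 else "50h or less"


by_hours = violation_rate_by(events, hours_bucket)
by_fog = violation_rate_by(events, lambda row: row["isFoggy"])

print(by_hours)
print(by_fog)
```

The same helper would answer the location question if it were given a key function that returns a location field, which the sample rows here do not include.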

  • OK, we've learned enough about the data and which features we want to include in our model, so we'll run a logistic regression on the training data.
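To make the logistic regression step concrete, here is a small sketch of what fitting such a model involves. It is not the code shown in the demo, which presumably relies on Spark's own machine learning library inside Zeppelin; this version uses plain Python and batch gradient descent so it runs without a cluster. The feature layout (scaled hours driven, foggy flag, certified flag), the training rows, and the hyperparameters are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of a violation for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_rows = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - learning_rate * g / n_rows for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_rows
    return weights, bias


# Hypothetical training data: [hours driven / 100, is foggy, is certified].
X = [
    [0.40, 0, 1], [0.45, 0, 1], [0.42, 0, 1], [0.50, 0, 1],
    [0.70, 1, 0], [0.65, 1, 0], [0.75, 0, 0], [0.60, 1, 1],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = violation, 0 = normal event

weights, bias = train_logistic_regression(X, y)

print("weights:", weights, "bias:", bias)
print("rested, clear, certified:", predict_proba(weights, bias, X[0]))
print("tired, foggy, uncertified:", predict_proba(weights, bias, X[4]))
```

The fitted weights and bias are the "model" that gets handed off in the next step: scoring a new event only needs `predict_proba`, which is why the model can later be applied outside the notebook.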

  • Let's run our code.

  • Let's look at our model. The next step is to hand the model off to the Enterprise Architect to integrate into our real-time application.

  • YARN

    HDFS

    BI

    (ActiveMQ)

    SQL NoSQL Use

    Model

  • Storm Spark


  • HDFS

    YARN

    (Storm)

    (Kafka)

    (Hive on Tez)

    (HBase)

    Prediction Bolt

    Spark Storm

    (Spark)

    Spark

  • Pig

    HDFS

    HCatalog

    DB

    Tableau

  • Spark Bolt

    : HDFS

    YARN

    Storm