コカ・コーライーストジャパン株式会社 (Coca-Cola East Japan Co., Ltd.)
From a single droplet to a full bottle: our journey to Hadoop at Coca-Cola East Japan
October 27, 2016
Damien Contreras (ダミアン コントレラ), Information Systems, Enterprise Architect & Innovation Project Manager

Coca-Cola East Japan - Hadoop Summit 2016




In This Session

• About Coca-Cola East Japan
• Hadoop Journey at CCEJ
• Hadoop Projects
• Hadoop for the Manufacturing Industry
• Hadoop for CCEJ: What's Next


About Coca-Cola East Japan

• Coca-Cola East Japan was established on Jul. 1, 2013 through the merger of four bottlers.
• On Apr. 1, 2015, it underwent further business integration with Sendai Coca-Cola Bottling Co., Ltd.
• Announced an MOU with Coca-Cola West on April 26, 2016 to proceed with discussions/review of business integration opportunities.
• Japan's largest Coca-Cola bottler, with an extensive local network, selling the most popular beverage brands in Japan.

Data as of December 2015


CCEJ Data Landscape

• DATA IN SILOS (Datamart, ERP, DWH, Staging, Mainframe, …)
• P2P INTERFACES (No ESB, Multiple ETL & Interface Servers)
• NO GOVERNANCE (Multiple data formats for the same business context, No Metadata Mgt.)
• BATCH ORIENTED (File, Scheduler, …)


Hadoop Journey: Genesis

July 2015
• Pilot phase
• 5 nodes
• Azure A1 → A4
• 100GB
• 70GB of RAM
• Team: 1 person

Stack diagram (layers: Analytics, System, Processing, Integration, Data source, Data restitution): CentOS, HDFS, MR, Yarn, Tez, Hive, Ambari, KNIME, WEKA; data source: flat files.


Hadoop Journey: Stability

November 2015
• Pilot phase
• 6 nodes
• Azure A4 → D & DS13
• 1TB of data
• 336GB of RAM
• Team: 2 persons

Stack diagram (layers: Analytics, System, Processing, Integration, Data source, Data restitution): CentOS, HDFS, MR, Yarn, Tez, Hive, Ranger, Active Directory, Ambari, NiFi, KNIME, Python Notebook, Zeppelin; data source: flat files.


Hadoop Journey: Production

March 2016
• 8 nodes
• Azure D/DS13
• 3TB of data
• 64 cores
• 448GB RAM
• Team: 2 people

Stack diagram (layers: Analytics, System, Processing, Integration, Data source, Data restitution): CentOS, HDFS, MR, Yarn, Tez, Hive, Spark, Ranger, Active Directory, Ambari, NiFi, KNIME, Python Notebook, Zeppelin, BW on Hana; data sources: flat files, web services.


Hadoop Eco-system at CCEJ

• 13 nodes
• 20TB
• 104 cores
• 728GB RAM
• 1000+ tables
• 3 production systems

Ecosystem diagram (layers: Analytics, System, Processing, Integration, Data source, Data restitution):
1 Analytics: past data, forecast data
2 Data Hub: aggregated data, visualization
3 Master Data: centralize, lineage, governance
Components: CentOS, HDFS, MR, Yarn, Tez, Hive, Spark, Sparkling Water, TensorFlow, Presto, AirPal, Drill, Ranger, Active Directory, Ambari, NiFi, Boomi, KNIME, Python Notebook, Zeppelin, MySQL, SAP ECC, BW on Hana, HTML report; data sources: flat files, web services.
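The cluster figures across the journey slides are consistent with identical nodes of a single size. As a quick cross-check, the sketch below derives the per-node size from the March 2016 totals (64 cores and 448GB RAM on 8 nodes) and scales it to 13 nodes. It assumes every node is the same VM size, which the slides imply but do not state; `ClusterSize`, `per_node`, and `scale` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSize:
    nodes: int
    cores: int
    ram_gb: int


def per_node(cluster: ClusterSize) -> tuple[int, int]:
    """Return (cores, RAM in GB) per node, assuming all nodes are identical."""
    if cluster.cores % cluster.nodes or cluster.ram_gb % cluster.nodes:
        raise ValueError("totals are not evenly divisible: nodes are not uniform")
    return cluster.cores // cluster.nodes, cluster.ram_gb // cluster.nodes


def scale(cluster: ClusterSize, nodes: int) -> ClusterSize:
    """Project the totals for a cluster of `nodes` nodes of the same size."""
    cores, ram_gb = per_node(cluster)
    return ClusterSize(nodes=nodes, cores=cores * nodes, ram_gb=ram_gb * nodes)


march_2016 = ClusterSize(nodes=8, cores=64, ram_gb=448)
print(per_node(march_2016))   # (8, 56)
print(scale(march_2016, 13))  # ClusterSize(nodes=13, cores=104, ram_gb=728)
```

The projection reproduces the 104 cores and 728GB RAM quoted for 13 nodes, and the November 2015 figure fits the same size (336GB of RAM across 6 nodes is 56GB each).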


Timeline

Timeline chart, May 2015 to October 2016. Items: Hadoop/NiFi Platform POC; VM Analytics POC; Forecast Implementation; VM Placement POC; Flow implementation; BW Report integration; SAP integration & MDM; Write-Off report.


20TB


Vending Replenishment: The Business Case

• High number of machines: 550,000 vending machines (VM), online/offline
• Number of SKUs per VM: 25 SKUs, hot & cold
• External factors: weather, city data, geolocation, events
• Vending routes: visit list per truck, logistics dependence

How to reduce the number of visits, optimize truck stock, and avoid out-of-stocks?
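The three goals pull against each other for every machine and SKU. The sketch below illustrates one possible decision rule; it is not CCEJ's forecast logic, which the slides do not describe. A machine is visited only if some SKU is forecast to sell out before the next opportunity to visit, and the picking list refills each SKU of a visited machine to capacity. The `Slot` model, the `days_to_next_opportunity` parameter, and the refill-to-capacity rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One SKU column in a vending machine (hypothetical model)."""
    sku: str
    stock: int             # units currently in the column
    capacity: int          # units the column can hold
    daily_forecast: float  # forecast units sold per day


def needs_visit(slots: list[Slot], days_to_next_opportunity: int) -> bool:
    """Visit now only if some SKU would sell out before the next chance to visit."""
    return any(
        s.daily_forecast * days_to_next_opportunity >= s.stock for s in slots
    )


def picking_list(slots: list[Slot]) -> dict[str, int]:
    """Units to load on the truck for this machine: refill each SKU to capacity."""
    return {s.sku: s.capacity - s.stock for s in slots if s.stock < s.capacity}


machine = [
    Slot("green-tea-hot", stock=3, capacity=20, daily_forecast=2.0),
    Slot("cola-cold", stock=18, capacity=30, daily_forecast=4.0),
]
if needs_visit(machine, days_to_next_opportunity=2):
    print(picking_list(machine))  # {'green-tea-hot': 17, 'cola-cold': 12}
```

Skipping machines that will not sell out reduces visits, summing picking lists over a route sizes the truck stock, and the sell-out test targets out-of-stocks; a real rule would also weigh route order and truck capacity.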


Vending Replenishment Forecast: The Project

The challenge:
• Deployment in 3 months
• 1½ hours to generate the forecast
• +20% accuracy versus the previous version
• 120 steps in the program

Daily process diagram (labels: online VM, offline VM, forecast generation, arbitration (yes/no), visit plan, picking list)

Hadoop has delivered:
• Feed 5GB+ of new data every day
• Process a high volume of data in memory (300GB+)
• Integrate data from different data sources
• Generate more complicated forecasts than legacy systems

14 million items
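The "14 million items" figure is consistent with the business case: one forecast per SKU per vending machine, assuming every machine carries the full 25 SKUs. Dividing by the 1½-hour window also gives the throughput the forecast run needs, assuming the window covers all items.

```python
vending_machines = 550_000      # "550,000 VM" from the business case
skus_per_machine = 25           # "25 SKUs" per VM, assumed uniform
forecast_window_s = 1.5 * 3600  # "1½ hours to generate the forecast"

items = vending_machines * skus_per_machine
print(f"{items:,} machine/SKU pairs")                   # 13,750,000 machine/SKU pairs
print(f"~{items / 1e6:.0f} million")                    # ~14 million
print(f"{items / forecast_window_s:,.0f} items/second")  # 2,546 items/second
```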


Staging: The Case of the "Write-Off" Report

Flow diagram (labels: x7 source systems, master data, Azure, Drill, web server, HTML interface, generate SQL query, JSON, verify & check, combine, report)

Challenges:
• Dataset harmonization (sales, billing, inventory)
• Data volume from source systems
• Complex computation logic
• Unclear functional requirements

Objectives:
• Aggregate a large number of datasets: 40+ flows, 4GB of data every day
• Single view of data, anywhere, for Finance, SC & Commercial
• Dynamic transactions vs. static data in Excel
• Reduce manual work to zero

Legend: comparison (=), aggregation (=+), enrichment (→), analytics (→), transformation/conversion (→)
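In the write-off flow, the HTML interface appears to generate a SQL query, run it on Drill, and render the JSON result. A minimal sketch of that round trip follows, using Apache Drill's REST endpoint (`POST /query.json` with a `queryType`/`query` JSON body, served on port 8047 by default). The host name, the `dfs.reports.write_off` table, its filter columns, and the whitelist-based query builder are hypothetical, and the response parsing assumes Drill's usual `rows` list of column-keyed objects.

```python
import json
import urllib.request

# Hypothetical host; 8047 is Drill's default web/REST port.
DRILL_URL = "http://drill-host:8047/query.json"
# Hypothetical report dimensions the HTML interface may filter on.
ALLOWED_FILTERS = {"company", "plant", "period"}


def build_query(filters: dict[str, str]) -> str:
    """Turn filters chosen in the HTML interface into a Drill SQL query."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    clauses = []
    for col, val in sorted(filters.items()):
        escaped = val.replace("'", "''")  # standard SQL quoting of string literals
        clauses.append(f"{col} = '{escaped}'")
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return f"SELECT * FROM dfs.reports.write_off{where}"


def build_request(sql: str) -> urllib.request.Request:
    """Prepare the POST request for Drill's REST API (not sent here)."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        DRILL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_from_response(payload: str) -> list[dict]:
    """Extract result rows from a Drill JSON response."""
    return json.loads(payload).get("rows", [])


sql = build_query({"company": "Company1", "period": "2016-09"})
print(sql)
# SELECT * FROM dfs.reports.write_off WHERE company = 'Company1' AND period = '2016-09'
req = build_request(sql)
# Sending it against a live Drill server:
# with urllib.request.urlopen(req) as resp:
#     rows = rows_from_response(resp.read().decode("utf-8"))
```

Restricting column names to a whitelist and quoting literal values keeps filters typed in the browser from being injected into the SQL.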


MDM: Centralization and Dispatch

Flow diagram (steps): 1 MDM creation, 2 MDM registration, 3 consistency check, 4 replicate data to external systems; components: rule engine, replication engine (event driven), MDM repository, lineage.

Challenges:
• Rule engine definition and implementation
• MDM on Hadoop & ESB integration
• MDM & SAP synchronization

Objectives:
• Single MDM repository
• Centralized bridge tables & mapping table
• Standardization of MDM across the data landscape
• Targeted distribution/replication of MDM to external systems

Realization:
• MySQL and Hadoop synchronization (300+ tables)
• Replication engine with ESB
• MDM tool: pilot with Customer Master
• Full go-live: April 2017
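The four MDM steps (creation, registration with lineage, rule-engine consistency check, event-driven replication to targeted external systems) can be sketched as a small pipeline. Everything below is hypothetical: the `MasterRecord` fields, the rule set, and the subscription table deciding which external systems receive which entity type. The slides do not describe the rule engine or replication engine at this level of detail.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MasterRecord:
    entity: str                  # e.g. "customer", "material" (hypothetical)
    key: str
    attributes: dict[str, str]
    lineage: list[str] = field(default_factory=list)


# Step 3: consistency rules as (description, predicate) pairs (hypothetical).
Rule = tuple[str, Callable[[MasterRecord], bool]]
RULES: list[Rule] = [
    ("key is not empty", lambda r: bool(r.key.strip())),
    ("name is present", lambda r: bool(r.attributes.get("name"))),
]

# Step 4: targeted distribution, entity type -> subscribed systems (hypothetical).
SUBSCRIPTIONS = {
    "customer": ["SAP ECC", "BW on Hana"],
    "material": ["SAP ECC"],
}


def register(record: MasterRecord, source: str) -> MasterRecord:
    """Step 2: register the record and note where it came from (lineage)."""
    record.lineage.append(f"registered from {source}")
    return record


def check(record: MasterRecord) -> list[str]:
    """Step 3: return the description of every rule the record violates."""
    return [desc for desc, ok in RULES if not ok(record)]


def on_change(record: MasterRecord, source: str) -> list[str]:
    """Event handler: register, validate, then return the replication targets."""
    register(record, source)
    failures = check(record)
    if failures:
        raise ValueError(f"{record.key}: {failures}")
    return SUBSCRIPTIONS.get(record.entity, [])


customer = MasterRecord("customer", "C-0001", {"name": "Shop A"})
print(on_change(customer, "MDM tool"))  # ['SAP ECC', 'BW on Hana']
```

Keeping the routing in a subscription table, rather than in each interface, is one way to get targeted distribution from a single repository.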


Use Case: SAP Integration / Sales Interface Report

Objectives:
• Leverage the most granular data already in Hadoop
• Leverage the processing power of Hadoop

Flow diagram (labels: Company 1, Company 2, Company 3; x9, x4 and x7 flows; vending sales data; legacy format data; CCEJ format data; x9 flows MD & bridge; bridge table & master combine; calculate; x9 output tables; Azure)

Challenges:
• Many data formats requiring complex data transformation
• Wide variety of data sources & technologies to transfer data
• Data mapping between systems

Realization:
• Data structure in Hadoop
• Logic for one type of sales channel implemented
• Full go-live: April 2017
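The sales interface combines company-specific (legacy format) sales data with bridge tables and master data to produce CCEJ-format output. A minimal sketch of the bridge-table step follows, assuming a bridge table keyed by (company, legacy product code). The codes, field names, and the choice to return unmapped rows as rejects are hypothetical.

```python
from collections import defaultdict

# Bridge table: (company, legacy product code) -> CCEJ product code (hypothetical values).
BRIDGE = {
    ("Company1", "A-100"): "CCEJ-0001",
    ("Company2", "7731"): "CCEJ-0001",
    ("Company2", "7740"): "CCEJ-0002",
}

# Master data keyed by CCEJ code (hypothetical).
MASTER = {
    "CCEJ-0001": {"name": "Cola 500ml"},
    "CCEJ-0002": {"name": "Green tea 500ml"},
}


def to_ccej_format(sales: list[dict]) -> tuple[dict[str, int], list[dict]]:
    """Map legacy sales rows to CCEJ codes and total the quantities per product.

    Rows without a bridge-table entry or master record are returned
    separately so the mapping can be fixed instead of silently dropping data.
    """
    totals: dict[str, int] = defaultdict(int)
    rejects = []
    for row in sales:
        code = BRIDGE.get((row["company"], row["product"]))
        if code is None or code not in MASTER:
            rejects.append(row)
            continue
        totals[code] += row["qty"]
    return dict(totals), rejects


sales = [
    {"company": "Company1", "product": "A-100", "qty": 10},
    {"company": "Company2", "product": "7731", "qty": 5},
    {"company": "Company2", "product": "7740", "qty": 3},
    {"company": "Company3", "product": "X", "qty": 1},
]
totals, rejects = to_ccej_format(sales)
print(totals)        # {'CCEJ-0001': 15, 'CCEJ-0002': 3}
print(len(rejects))  # 1
```

Two companies' different legacy codes end up on the same CCEJ product, which is the point of keeping the mapping in one bridge table.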


Hadoop: What's Next

• Increase data velocity & create a true data lake
• Improve data collection, quality, profiling, metadata & propose a catalog of curated data to end users
• Toward a data-driven decision process
• Develop support & operational excellence


I thank CCEJ management, who had the courage to believe in an Agile approach.

Thanks to my team members and comrades.


Your turn: let's share ideas & a Coke!

Damien Contreras
Email: [email protected]
DamienContreras
Twitter: @dvolute


The Inside of Hadoop


Integration Landscape Overview

Diagram columns: Acquisition → Transformation → Restitution
• Systems shown: SAP ECC, Boomi, Oracle, MySQL, flat files, FTP, other systems, NiFi Prod, Hadoop Prod, Drill, BW on Hana, HTML interface for power users
• Connectors shown: IDocs, JDBC, Hive JDBC, HTTP

Naming example (database Myflow-data):
• Daily files My_file_20161024.csv and My_file_20161025.csv land in partitions dt=20161024 and dt=20161025 of t_my_table_txt_p (external text tables)
• t_my_table_txt_p is combined with t_my_bridge_table_txt_p into t_my_report_orc_p (ORC tables)
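The naming example implies a convention: each daily file lands in a `dt=YYYYMMDD` partition of an external text table (`*_txt_p`), and curated results are written to ORC tables (`*_orc_p`). The sketch below derives the partition from a file name and emits the matching HiveQL. Only the table and file names come from the slide; the HDFS root path, the file-name regular expression, the `legacy_code`/`ccej_code`/`qty` columns, and the join are assumptions for illustration (the bridge table is assumed to hold one current mapping per legacy code).

```python
import re
from datetime import datetime

# Assumed file name convention: <base>_<YYYYMMDD>.csv
FILE_PATTERN = re.compile(r"^(?P<base>\w+)_(?P<dt>\d{8})\.csv$")


def partition_for(filename: str) -> str:
    """Return the dt partition value (YYYYMMDD) encoded in a daily file name."""
    m = FILE_PATTERN.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    datetime.strptime(m["dt"], "%Y%m%d")  # rejects impossible dates such as 20161332
    return m["dt"]


def add_partition_sql(table: str, filename: str, root: str) -> str:
    """HiveQL registering the file's directory as a partition of an external text table."""
    dt = partition_for(filename)
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{dt}') LOCATION '{root}/{table}/dt={dt}'"
    )


def load_orc_sql(dt: str) -> str:
    """HiveQL rebuilding one day of the ORC report table from the text and bridge tables.

    Hypothetical columns: the ORC table is assumed to hold (ccej_code, qty) plus dt.
    """
    return (
        f"INSERT OVERWRITE TABLE t_my_report_orc_p PARTITION (dt='{dt}') "
        "SELECT b.ccej_code, SUM(t.qty) "
        "FROM t_my_table_txt_p t "
        "JOIN t_my_bridge_table_txt_p b ON t.legacy_code = b.legacy_code "
        f"WHERE t.dt = '{dt}' "
        "GROUP BY b.ccej_code"
    )


print(add_partition_sql("t_my_table_txt_p", "My_file_20161024.csv", "/data/myflow"))
# ALTER TABLE t_my_table_txt_p ADD IF NOT EXISTS PARTITION (dt='20161024') LOCATION '/data/myflow/t_my_table_txt_p/dt=20161024'
print(load_orc_sql(partition_for("My_file_20161024.csv")))
```

`INSERT OVERWRITE` on a single partition makes a re-run for the same day idempotent, which suits flows that may be replayed after an error.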


Guidelines around NiFi Flows

Environments: Prod and Dev, on Azure
• Triggers: system source → NiFi listener
• Extraction: web call, JDBC
• Groups / flow: encryption, master data, transaction data, processing group
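With a listener as the trigger, the source system pushes an event to NiFi instead of NiFi polling it. Assuming the listener is a NiFi `ListenHTTP` processor (its base path defaults to `contentListener`; the port is whatever the flow configures), a source system could send a small JSON event like the one below. The host, port, and event fields are hypothetical; the `master`/`transaction` split mirrors the groups on the slide.

```python
import json
import urllib.request

# Hypothetical endpoint: a ListenHTTP processor on port 9090 with the default base path.
LISTENER_URL = "http://nifi-prod:9090/contentListener"


def trigger_request(flow: str, data_type: str, extracted_at: str) -> urllib.request.Request:
    """Build the POST a source system would send to start a NiFi flow."""
    if data_type not in {"master", "transaction"}:
        raise ValueError("data_type must be 'master' or 'transaction'")
    event = {"flow": flow, "type": data_type, "extracted_at": extracted_at}
    return urllib.request.Request(
        LISTENER_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = trigger_request("Myflow-data", "transaction", "2016-10-25T02:00:00")
# urllib.request.urlopen(req, timeout=30) would send it; the flow could then
# route the event to the master data or transaction data group by its "type".
```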


Guidelines around NiFi Flows: Error Handling / Flow

• Processor: on success, send data; on error, write to the error log
• Retry (every 5 mins): read from the error log, re-process, update the error log
• Flow types: master data, transaction data
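The error-handling guideline amounts to a dead-letter pattern: a failed record goes to an error log instead of blocking the flow, and a job that runs every 5 minutes reads the log, re-processes each entry, and updates it. A minimal in-memory sketch follows; `ErrorLog`, the attempt cap, and the status values are hypothetical, and in NiFi itself this would be built from processors and failure relationships rather than Python.

```python
from dataclasses import dataclass, field
from typing import Callable

RETRY_INTERVAL_S = 5 * 60  # the slide's "every 5 mins"
MAX_ATTEMPTS = 3           # hypothetical cap before an entry needs manual action


@dataclass
class ErrorEntry:
    record: dict
    attempts: int = 1
    status: str = "failed"  # "failed", "resolved" or "abandoned"


@dataclass
class ErrorLog:
    entries: list[ErrorEntry] = field(default_factory=list)

    def write(self, record: dict) -> None:
        self.entries.append(ErrorEntry(record))

    def pending(self) -> list[ErrorEntry]:
        return [e for e in self.entries if e.status == "failed"]


def process(record: dict, send: Callable[[dict], None], log: ErrorLog) -> None:
    """Main path: send the data; on error, write to the error log instead of failing."""
    try:
        send(record)
    except Exception:
        log.write(record)


def reprocess(log: ErrorLog, send: Callable[[dict], None]) -> None:
    """Scheduled every RETRY_INTERVAL_S: read the log, retry, update each entry."""
    for entry in log.pending():
        try:
            send(entry.record)
            entry.status = "resolved"
        except Exception:
            entry.attempts += 1
            if entry.attempts >= MAX_ATTEMPTS:
                entry.status = "abandoned"


target_up = False


def send(record: dict) -> None:
    """Stand-in for the real delivery step; fails while the target is down."""
    if not target_up:
        raise ConnectionError("target system unavailable")


log = ErrorLog()
process({"id": 1}, send, log)
print(len(log.pending()))     # 1
target_up = True
reprocess(log, send)
print(log.entries[0].status)  # resolved
```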


NiFi Enhancement: Example


Technical Architecture

Architecture diagram (Azure):
• Hadoop Production environment: Node0, Node1, Node2, Node3, Node4, Node5, Node6 …. Node11
• Hadoop Dev environment: Node0, Node1, Node2, Node3
• Other components shown across the Prod and Dev environments: AD, NiFi (several instances), RDBMS, FTP server, SAP ECC