© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Dickson Yue, Solutions Architect

20 May

Big Data as a Leading Discipline in Digital Marketing

Industry Track: E-commerce, Digital Media and Marketing

Big Data Analytics on AWS for Digital Marketing

CRM ERP

DBs

Log file

AWStats

days

MB

2002

Big bang

Before 2005

Hello world

Page/Event tracking

GA

hours

GB

SOLOMO

minutes - hours

TB

Before 2008

New customer service

New System monitoring

New QA

IoT

O2O

seconds – hours

PB

2016

Fast and big

Data-driven marketing

Insights

NEO

seconds – hours, future

Past

Retrospective report

Present

Dashboard, alert

Future

Prediction, insight

How to get started?

Collect Process Analyze

Store

Data Answers

Time to Answer (Latency)

Throughput

Cost

Batch Real-time Prediction

Report Alert Future

Amazon S3

Amazon Kinesis

Amazon DynamoDB

Amazon RDS (Aurora)

AWS Lambda

KCL Apps

Amazon EMR

Amazon Redshift

Amazon Machine Learning

Collect Process Analyze

Store

Data Collection and Storage

Data Processing

Event Processing

Data Analysis

Data Answers

Tracking

Clickstream, user retention

Data source

• Page

• Click event

• Web log

• Thing event

Use case

Answer

• User retention

• High spending customer navigation pattern

• User segmentation

• UX improvement

• What deal/ad to try next

Requirement

Ingest

• Scalability

• Raw data

• Low running cost

Analyze

• Full visibility

• Data without sampling

• Data from device in any form factor

• Flexibility

• Join with different datasets

JavaScript

(Snowplow)

AWS SDK

Logstash

Fluentd

Ingest Store

@ 30km/s a.k.a 300 rps

HTTP Post

Amazon S3

Storage

@ 100km/s
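The slide only shows trackers posting events over HTTP and raw data landing in Amazon S3. As a rough sketch of what "store the raw data" could look like, the snippet below serializes events unmodified as JSON lines and derives a date-partitioned object key. The key layout, the `raw/clickstream` prefix, the field names and the helper names are all assumptions made for illustration, not part of the talk.

```python
import json
from datetime import datetime, timezone


def s3_key_for(event_time, batch_id, prefix="raw/clickstream"):
    """Build a date-partitioned key so later jobs can scan a single day or hour.

    The prefix and the dt=/hour= layout are hypothetical choices.
    """
    t = event_time.astimezone(timezone.utc)
    return f"{prefix}/dt={t:%Y-%m-%d}/hour={t:%H}/{batch_id}.jsonl"


def to_json_lines(events):
    """Serialize events without dropping fields, one JSON object per line."""
    return "".join(json.dumps(e, sort_keys=True) + "\n" for e in events)


# Invented sample events covering two of the data sources on the slide.
events = [
    {"type": "page", "user": "u1", "url": "/deals", "ts": "2016-05-20T08:15:00Z"},
    {"type": "click", "user": "u1", "target": "buy-button", "ts": "2016-05-20T08:15:07Z"},
]
key = s3_key_for(datetime(2016, 5, 20, 8, 15, tzinfo=timezone.utc), "batch-0001")
body = to_json_lines(events)
```

Keeping the payload untouched at this stage matches the "Raw data" and "Data without sampling" requirements; any cleaning can happen later in the Process step.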

Ingest Store Process

JavaScript

(Snowplow)

AWS SDK

LOG4J

Flume

Fluentd

HTTP Post

AWS Lambda

Amazon S3

Amazon Kinesis

API Server Streaming

Buffer

24hrs-7days

Compute Storage

Store

Web

Servers

50k rps

@ 100km/s
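With Amazon Kinesis as the streaming buffer, a producer normally sends events in batches rather than one call per event. The sketch below groups events into batches of at most 500 records, the per-request limit of the Kinesis `PutRecords` API, and uses the user id as partition key. The event fields, the `user` key choice and the stream name in the comment are assumptions for illustration.

```python
import json

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_kinesis_records(events, key_field="user"):
    """Turn event dicts into PutRecords entries.

    Using the user id as partition key (an assumption) makes one user's
    events hash to the same shard.
    """
    return [
        {"Data": json.dumps(e).encode("utf-8"), "PartitionKey": str(e[key_field])}
        for e in events
    ]


def batches(records, size=MAX_RECORDS_PER_CALL):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Invented sample load: 1,200 click events from 7 users.
events = [{"user": f"u{i % 7}", "type": "click", "seq": i} for i in range(1200)]
record_batches = list(batches(to_kinesis_records(events)))
batch_sizes = [len(b) for b in record_batches]

# With boto3 installed and credentials configured, each batch could be sent as:
#   boto3.client("kinesis").put_records(StreamName="clickstream", Records=batch)
# ("clickstream" is a hypothetical stream name.)
```

A real producer would also inspect the `FailedRecordCount` in each response and retry the failed entries; that part is left out here.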

Ingest Store Process

JavaScript

(Snowplow)

AWS SDK

LOG4J

Flume

Fluentd

HTTP Post

AWS Lambda

Amazon S3

Amazon Kinesis

API Gateway

API Server Streaming

Buffer

24hrs-7days

Compute Storage

Store

50k rps

Amazon S3

Storage
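In the Process step, a Lambda function attached to a Kinesis stream receives records whose payload is base64-encoded under `Records[].kinesis.data`. The following is a minimal sketch of such a handler that simply counts events by type; the `type` field and the counting logic are assumptions standing in for whatever processing the real pipeline does.

```python
import base64
import json


def handler(event, context):
    """Decode the Kinesis records of one invocation and count events by type."""
    counts = {}
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        item = json.loads(payload)
        counts[item["type"]] = counts.get(item["type"], 0) + 1
    return counts


def fake_kinesis_event(items):
    """Build a local test event shaped like the one Lambda passes for Kinesis."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(i).encode()).decode()}}
            for i in items
        ]
    }


sample = fake_kinesis_event([{"type": "page"}, {"type": "click"}, {"type": "page"}])
result = handler(sample, None)
```

Because the handler is a plain function, it can be exercised locally with a synthetic event before it is deployed.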

Store Process

EMR

Redshift

Redshift

EMR

ETL

Visualize

JDBC

ODBC

JDBC

ODBC

Many storage layers to choose from

Amazon EMR connects to:

• Amazon DynamoDB: EMR-DynamoDB connector

• Amazon RDS: JDBC Data Source w/ Spark SQL

• Amazon Kinesis: Streaming data connectors

• Elasticsearch: Elasticsearch connector

• Amazon Redshift: Amazon Redshift Copy From HDFS

• Amazon S3: EMR File System (EMRFS)

Amazon S3

Store Process

EMR

Visualize

JDBC

ODBC

Redshift

Basket

CRM ERP

DBs

Log file

User retention and growth

Day-14 retention over time

N-day retention

[Figure: Daily Active Users (0 to 6,000) by Product Age (days 0 to 14), Product A vs. Product B]
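The slides name N-day retention without defining it. A common definition, assumed here, is the share of users who are active exactly N days after their first active day. The sketch below computes it from a per-user set of active dates; the data and the function name are invented for illustration.

```python
from datetime import date


def n_day_retention(activity, n):
    """Share of users who are active exactly n days after their first active day.

    `activity` maps a user id to the set of dates on which that user was active.
    """
    cohort = [days for days in activity.values() if days]
    if not cohort:
        return 0.0
    retained = 0
    for days in cohort:
        first = min(days)
        if any((d - first).days == n for d in days):
            retained += 1
    return retained / len(cohort)


# Invented sample: four users and the days they opened the product.
activity = {
    "u1": {date(2016, 5, 1), date(2016, 5, 2), date(2016, 5, 15)},
    "u2": {date(2016, 5, 1)},
    "u3": {date(2016, 5, 3), date(2016, 5, 17)},
    "u4": {date(2016, 5, 4), date(2016, 5, 5)},
}
day1 = n_day_retention(activity, 1)
day14 = n_day_retention(activity, 14)
```

In practice the cohort should exclude users whose first day is fewer than N days in the past, since they have not yet had the chance to return.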

High churn = wasted ad dollars

[Figure: dollar amount ($0 to $25,000) by Product age (days 1 to 14), Product A vs. Product B]

BUSINESS MEDIA operates more than 20 business-to-business companies with significant holdings in the automotive, electronic, medical and finance industries

MAGAZINES publishes 20 U.S. titles and close to 300 international editions

BROADCASTING comprises 31 television and two radio stations

NEWSPAPERS owns 15 daily and 34 weekly newspapers

Hearst includes over 200 businesses in over 100 countries around the world

Data Pipeline

Buzzing API

API Ready Data

Amazon Kinesis

S3 Storage

Node.JS App - Proxy

Users to Hearst Properties

Clickstream

Data Science Application

Amazon Redshift

ETL on EMR

LATENCY and THROUGHPUT figures shown on the pipeline:

• Latency 100 seconds, throughput 1 GB/day

• Latency 30 seconds, throughput 5 GB/day

• Latency 5 seconds, throughput 1 GB/day

• Latency milliseconds, throughput 100 GB/day

Models

Agg Data

Social listening

Social CRM, Chatbots

Data

Brand page activity

Post #hashtag

User profile

Use case

Answer

Campaign performance

Customer service automation

Building Chatbot

Logstash

AWS SDK

Ingest Store

Bot AWS SDK

App

Crawlers

AWS SDK

Amazon Kinesis

Process Store

Amazon S3

AWS Lambda

Elasticsearch

Analysts

AWS SDK

Motivation for listening to social media

• Customer is reporting a possible service issue

• Customer is making a feature request

• Customer is angry or unhappy

• Customer is asking a question

Why do we need machine learning for this?

The social media stream is high-volume, and most of the messages are not actionable by customer service (CS).
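To make the triage idea concrete, here is a toy multinomial naive Bayes classifier that sorts messages into the motivations listed earlier plus a "not actionable" class. This is only a stand-in: the talk does not say which model is used, and the training sentences, labels and function names below are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability, using add-one smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training messages; a real system would use labeled historical posts.
examples = [
    ("the app is down again", "service issue"),
    ("checkout page keeps crashing", "service issue"),
    ("please add dark mode", "feature request"),
    ("would love an export option", "feature request"),
    ("how do i reset my password", "question"),
    ("where can i find my invoice", "question"),
    ("great weather in taipei today", "not actionable"),
    ("watching the game tonight", "not actionable"),
]
model = train(examples)
label = classify(model, "the checkout page is down")
```

Only messages classified into an actionable class would then be routed to a notification or support queue, which keeps the high-volume remainder away from the customer service team.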

Logstash

AWS SDK

Ingest Store

Bot AWS SDK

App

Crawlers

AWS SDK

Amazon Kinesis

Process

AWS Lambda

Analysts

AWS SDK

Machine learning

Notification

Action

Support issue

Database

Feature request

Keep training the ML model with new data

Action

Amazon S3

AWS SDK

Ingest Store

Bot AWS SDK

Messenger

Amazon Kinesis

Process

AWS Lambda

Analysts

Machine learning

Action

Bot

App

Get prediction

Keep training the ML model with new data

Amazon S3

OI from Business view

with custom source

Operational intelligence (OI)

What is the view of my storefront?

What is your campaign performance now?

index=bakery-storefront

Refrigerator

POS

Door sensor

Water

Camera

Storefront

Kitchen

Lambda

SQS

AWS IoT

SQSPoller

HTTP Event Collector

Serverless Architecture
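The slide does not name the product behind the Http Event Collector. Assuming it is Splunk's HTTP Event Collector (suggested by the `index=bakery-storefront` search), the poller would wrap each sensor message in an envelope with `time`, `host`, `sourcetype`, `index` and `event` fields. The sketch below builds that request body only; the message fields and the helper names are hypothetical.

```python
import json


def to_hec_event(message, index="bakery-storefront", sourcetype="_json"):
    """Wrap one sensor reading in an HTTP Event Collector style envelope.

    The input field names (timestamp, device, reading) are assumptions.
    """
    return {
        "time": message["timestamp"],  # epoch seconds
        "host": message["device"],
        "sourcetype": sourcetype,
        "index": index,
        "event": message["reading"],
    }


def hec_body(messages):
    """Stack several event objects one after another in a single request body."""
    return "\n".join(json.dumps(to_hec_event(m)) for m in messages)


# Invented readings from two of the devices shown on the slide.
messages = [
    {"timestamp": 1463732100, "device": "door-sensor", "reading": {"open": True}},
    {"timestamp": 1463732105, "device": "refrigerator", "reading": {"temp_c": 4.5}},
]
body = hec_body(messages)
```

The body would then be sent with an HTTP POST and a collector token in the `Authorization` header; the endpoint and token are deployment-specific and omitted here.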

Sushiro: Real-time streaming from IoT & analysis

Real-time data ingested by Amazon Kinesis is analyzed in Amazon Redshift

• 380 stores stream live data from sushi plates

• Inventory information combined with consumption information in near real time

• Forecast demand by store, minimize food waste, and improve efficiencies

Amazon

Source DBs

3rd Party Data

Log Data

Reporting

Analysis

Processing

Data Lake

S3

Source of truth

Data

Shop sensors

POS transactions

Use case

Answer

Cross-sell and up-sell in real time
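One simple way to turn POS transactions into cross-sell suggestions is to count which items are bought together. The sketch below does that with plain co-occurrence counts. It illustrates the idea only, is not the method used in the talk, and the baskets and function names are invented.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    pairs = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
    return pairs


def suggest(pairs, item, top=3):
    """Return the items most often bought together with `item`."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [name for name, _ in scores.most_common(top)]


# Invented POS baskets.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "coffee"],
    ["coffee", "croissant"],
]
pairs = co_purchase_counts(baskets)
best = suggest(pairs, "bread", top=1)
```

For real-time use the pair counts would be updated incrementally as transactions stream in, rather than recomputed from the full history.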