© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Dickson Yue, Solutions Architect
20 May
Big Data as the Leading Discipline in Digital Marketing
Industry Track - E-commerce, Digital Media and Marketing
seconds – hours, future
Past
Retrospective report
Present
Dashboard, alert
Future
Prediction, insight
Collect Process Analyze
Store
Data Answers
Time to Answer (Latency)
Throughput
Cost
Batch Real-time Prediction
Report Alert Future
Amazon S3
Amazon Kinesis
Amazon DynamoDB
Amazon RDS (Aurora)
AWS Lambda
KCL Apps
Amazon EMR
Amazon Redshift
Amazon Machine Learning
Collect Process Analyze
Store
Data Collection and Storage
Data Processing
Event Processing
Data Analysis
Data Answers
Data source
• Page
• Click event
• Web log
• Thing event
Use case
Answer
• User retention
• High spending customer navigation pattern
• User segmentation
• UX improvement
• What deal/ad to try next
Requirement
Ingest
• Scalability
• Raw data
• Low running cost
Analyze
• Full visibility
• Data without sampling
• Data from device in any form factor
• Flexibility
• Join with different datasets
JavaScript
(Snowplow)
AWS SDK
Logstash
Fluentd
Ingest Store
@ 30km/s a.k.a 300 rps
HTTP Post
Amazon S3
Storage
@ 100km/s
Ingest Store Process
JavaScript
(Snowplow)
AWS SDK
LOG4J
Flume
Fluentd
HTTP Post
AWS Lambda
Amazon S3
Amazon Kinesis
API Server Streaming
Buffer
24hrs-7days
Compute Storage
Store
Web Servers
50k rps
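The "Process" stage in this diagram, a Lambda function reading from the Kinesis streaming buffer, can be sketched roughly as follows. Lambda delivers Kinesis records with base64-encoded payloads under `Records[].kinesis.data`. The JSON payload shape and the `event_type` field are assumptions made for illustration; they are not taken from the slides.

```python
import base64
import json
from collections import Counter


def decode_kinesis_records(event):
    """Decode the base64 payloads of a Kinesis batch into Python dicts."""
    decoded = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        try:
            item = json.loads(payload)
        except ValueError:
            # Skip malformed payloads instead of failing the whole batch.
            continue
        if isinstance(item, dict):
            decoded.append(item)
    return decoded


def handler(event, context):
    """Hypothetical Lambda entry point: count events per type in one batch.

    "event_type" is an assumed field name for the click/page events.
    """
    events = decode_kinesis_records(event)
    counts = Counter(e.get("event_type", "unknown") for e in events)
    return dict(counts)
```

A real function would write the aggregated or raw events on to Amazon S3 or another store; that step is left out here because it depends on the chosen storage layer.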
@ 100km/s
Ingest Store Process
JavaScript
(Snowplow)
AWS SDK
LOG4J
Flume
Fluentd
HTTP Post
AWS Lambda
Amazon S3
Amazon Kinesis
API Gateway
API Server Streaming
Buffer
24hrs-7days
Compute Storage
Store
50k rps
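To size the streaming buffer for a rate such as 50k rps, a back-of-the-envelope shard estimate can help. The sketch below relies on the per-shard write limits of Amazon Kinesis (1,000 records per second and 1 MB per second). The average record size is an assumption; the slides do not state one.

```python
import math

# Per-shard write limits for Amazon Kinesis streams.
SHARD_MAX_RECORDS_PER_SEC = 1000
# 1 MB/s, treated here as 1,000,000 bytes for a slightly conservative estimate.
SHARD_MAX_BYTES_PER_SEC = 1_000_000


def min_shards(records_per_sec, avg_record_bytes):
    """Return the smallest shard count that satisfies both write limits."""
    by_count = math.ceil(records_per_sec / SHARD_MAX_RECORDS_PER_SEC)
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / SHARD_MAX_BYTES_PER_SEC)
    return max(1, by_count, by_bytes)


# avg_record_bytes=500 is a hypothetical record size, not a figure from the talk.
shards_needed = min_shards(50_000, avg_record_bytes=500)
```

With small records the record-count limit dominates, so 50k rps needs at least 50 shards; once records average more than about 1 KB, the byte limit becomes the constraint instead.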
Many storage layers to choose from
Amazon DynamoDB
EMR-DynamoDB connector
Amazon RDS
Amazon Kinesis
Streaming data connectors
JDBC Data Source w/ Spark SQL
Elasticsearch connector
Amazon Redshift
Amazon Redshift COPY from HDFS
EMR File System (EMRFS)
Amazon S3
Amazon EMR
User retention and growth
[Chart: Daily Active Users (0 to 6,000) vs. Product Age (days, 0 to 14), for Product A and Product B]
High churn = wasted ad dollars
[Chart: dollar amount ($0 to $25,000) vs. Product age (days, 1 to 14), for Product A and Product B]
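One way to put numbers on the link between churn and ad spend is to compute a retention curve from the daily active user series and divide acquisition spend by the users still active on a given day. This is only a sketch: the function names and the sample figures are hypothetical and are not read off the charts.

```python
def retention_curve(daily_active_users):
    """Return each day's active users as a fraction of day-0 active users."""
    if not daily_active_users or daily_active_users[0] <= 0:
        raise ValueError("need a non-empty series with a positive day-0 value")
    baseline = daily_active_users[0]
    return [count / baseline for count in daily_active_users]


def cost_per_retained_user(ad_spend, daily_active_users, day):
    """Ad spend divided by the users still active on the given product day."""
    retained = daily_active_users[day]
    if retained == 0:
        return float("inf")  # every acquired user has churned
    return ad_spend / retained


# Hypothetical series for illustration only.
sample_dau = [1000, 500, 250]
sample_curve = retention_curve(sample_dau)
sample_cost = cost_per_retained_user(10_000.0, sample_dau, day=2)
```

The faster the curve drops, the higher the effective cost of each user who stays, which is the point the slide title makes.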
BUSINESS MEDIA operates more than 20 business-to-business companies with significant holdings in the automotive, electronic, medical and finance industries
MAGAZINES publishes 20 U.S. titles and close to 300 international editions
BROADCASTING comprises 31 television and two radio stations
NEWSPAPERS owns 15 daily and 34 weekly newspapers
Hearst includes over 200 businesses in over 100 countries around the world
Data Pipeline
Buzzing API
API
Ready
Data
Amazon Kinesis
S3 Storage
Node.js App - Proxy
Users to Hearst Properties
Clickstream
Data Science
Application
Amazon Redshift
ETL on EMR
100 seconds
1GB/day
30 seconds
5GB/day
5 seconds
1GB/day
Milliseconds
100GB/day
LATENCY
THROUGHPUT
Models
Agg Data
Data
Brand page activity
Post #hashtag
User profile
Use case
Answer
Campaign performance
Customer service automation
Building Chatbot
Logstash
AWS SDK
Ingest Store
Bot AWS SDK
App
Crawlers AWS SDK
Amazon Kinesis
Process Store
Amazon S3
AWS Lambda
Elasticsearch
Analysts
AWS SDK
Why do we need machine learning for this?
The social media stream is high-volume, and most of the messages are not actionable for customer service (CS)
Logstash
AWS SDK
Ingest Store
Bot AWS SDK
App
Crawlers AWS SDK
Amazon Kinesis
Process
AWS Lambda
Analysts
AWS SDK
Machine learning
Notification
Action
Support issue
Database
Feature request
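The routing step in this diagram, where each incoming message is labelled and only actionable ones are passed on, might look something like the sketch below. The classifier is a stand-in keyword rule rather than a trained model, and the label names, keywords, and function names are all hypothetical. In the pipeline shown, the prediction would come from the machine learning service instead.

```python
from typing import Callable, Dict, List, Optional


def keyword_classifier(message: str) -> Optional[str]:
    """Placeholder for the ML model: returns a label, or None if not actionable."""
    text = message.lower()
    if any(word in text for word in ("broken", "error", "refund", "help")):
        return "support_issue"
    if any(word in text for word in ("please add", "wish", "feature")):
        return "feature_request"
    return None


def route_messages(
    messages: List[str],
    classify: Callable[[str], Optional[str]] = keyword_classifier,
) -> Dict[str, List[str]]:
    """Group actionable messages by label and drop everything else."""
    routed: Dict[str, List[str]] = {"support_issue": [], "feature_request": []}
    for message in messages:
        label = classify(message)
        if label in routed:
            routed[label].append(message)
    return routed
```

Passing the classifier in as a parameter keeps the routing logic unchanged when the keyword rule is swapped for a real model endpoint.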
Keep training the ML model with new data
Action
Amazon S3
AWS SDK
Ingest Store
Bot AWS SDK
Messenger
Amazon Kinesis
Process
AWS Lambda
Analysts
Machine learning
Action
Bot
App
Get prediction
Keep training the ML model with new data
Amazon S3
Refrigerator
POS
Door sensor
Water
Camera
Storefront
Kitchen
Lambda
SQS
AWS IoT
SQSPoller
HTTP Event Collector
Serverless Architecture
Sushiro – Real-time streaming & analysis
Real-time data ingested by Amazon Kinesis is analyzed in Amazon Redshift
380 stores stream live data from sushi plates
Inventory information combined with consumption information near real-time
Forecast demand by store, minimize food waste, and improve efficiencies
Amazon