SmartNews TechNight Vol.5: AD Data Engineering in Practice: The Data System Behind SmartNews Ads

Page 1: SmartNews TechNight Vol.5: AD Data Engineering in Practice: The Data System Behind SmartNews Ads

Data Engineering in Practice: The DMP System Behind SmartNews Ads

Lan

Page 2

Who am I

• Lan

• Veteran hacker but new in AD world

• someone who can make a computer do what he wants—whether the computer wants to or not. (http://paulgraham.com/gba.html)

• ex-{Rakuten, GREE}

• Distributed systems, information retrieval, ML

Page 3

Today’s Talk

• DMP in SmartNews Ads

• #1. Prediction

• #2. Targeting

• Future Work & Summary

Page 4

DMP = Data Management Platform

Page 5

DMP in SmartNews Ads

• Private DMP (90%+ 1st-party data)

• Data collection, cleaning, aggregation

• ID Mapping

• User Profiling

• User Clustering

• CTR / CVR Prediction

• Lookalike

• Custom Audience

Page 6

DMP Clusters

[Architecture diagram. Components shown: AD delivery cluster, Video AD delivery cluster, AD tracker, Kinesis, AD log in S3, SmartNews log, DMP streaming, Hadoop (ML, analytics), audience data in DynamoDB / RDB, models & targeting.]

Small company but not small data

• Article Meta > 200K/day

• Article x {read, share, read_related …}

• Channel x {subscribe, preview, view, …}

• Push, Live, Weather, Setting, …

• Survey result

• Audience Data > 14M (~5M MAU)

• AD Meta

• AD History

• AD Conversions

• AD Optout

• Managed/Compressed Data > 130TB

• Lookalike seeds

• ~1TB Data for training CTR prediction model

• > 1M unique features

• User Demographics

• Device

• Locations

• …

Page 7

#1 Prediction

Page 8

Pick an AD to feed here

Page 9

Similar to Recommendation, but DIFFERENT

• optimization goal

• accuracy of the probability

Page 10

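More than Ranking

• When we run an AD auction

• eCPM (effective Cost per Mille) = CTR (Click Through Rate) × CPC (Cost per Click) × 1000, so ads are ranked by expected revenue per impression, CTR × CPC

• Suppose the true values are

• CTR_ad1 = 0.05 > CTR_ad2 = 0.04 > CTR_ad3 = 0.03

• CPC_ad1 = 10 JPY, CPC_ad2 = 13 JPY, CPC_ad3 = 20 JPY, so ad3 is the winner (0.03 × 20 = 0.6 JPY per impression)

• but if the predictions are pCTR_ad1 = 0.2 > pCTR_ad2 = 0.1 > pCTR_ad3 = 0.03, ad1 wins instead (0.2 × 10 = 2.0)

• then we lose 0.1 JPY of potential income per impression (ad1 really earns 0.05 × 10 = 0.5 JPY)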
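The arithmetic of this auction example can be checked with a few lines of Python. This is only a sketch using the numbers from the slide; the dictionary layout and function names are made up for illustration.

```python
# Numbers from the slide: true CTR, CPC in JPY, and the predicted CTR (pCTR).
ads = {
    "ad1": {"ctr": 0.05, "cpc": 10, "pctr": 0.20},
    "ad2": {"ctr": 0.04, "cpc": 13, "pctr": 0.10},
    "ad3": {"ctr": 0.03, "cpc": 20, "pctr": 0.03},
}

def expected_revenue(ctr, cpc):
    """Expected revenue per impression in JPY (eCPM is this value times 1000)."""
    return ctr * cpc

def pick_winner(ctr_key):
    """Ad with the highest expected revenue when ranked by the given CTR field."""
    return max(ads, key=lambda name: expected_revenue(ads[name][ctr_key], ads[name]["cpc"]))

def true_revenue(name):
    """What the ad really earns per impression, using its true CTR."""
    return expected_revenue(ads[name]["ctr"], ads[name]["cpc"])

ideal = pick_winner("ctr")    # auction with perfect CTR knowledge
actual = pick_winner("pctr")  # auction with the mispredicted CTR
loss = true_revenue(ideal) - true_revenue(actual)
print(ideal, actual, round(loss, 2))
```

Note that ad3's prediction is exact in this example; the loss comes only from overestimating ad1 and ad2, which is why the accuracy of the probability matters and not just the ranking.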

Page 11

The CTR (CVR) Prediction Problem

μ(a, u, c) = p(click | a, u, c)

(a, u, c presumably denote the ad, the user, and the context)

Page 12

CTR Prediction v1

• Train and score daily

• One GBDT (Gradient Boosting Decision Tree) model per AD campaign

• using ~1 month of data

• Hundreds of small batches inside Hadoop YARN

• Quick and simple

• developed in 1 month

• picks the best features for every campaign

• minutes to 1 hour for model training

• explainable tree models

• no need for AD features

• Same approach for CVR prediction (CPC / CVR = CPA (Cost Per Acquisition))

[Pipeline diagram. Labels shown: delivery result, user features, generate samples, YARN (several parallel sample / model / scoring batches), users, predictions.]

Page 13

Metrics

• NE (Normalized Cross-Entropy)

• the average log loss per impression using the predicted CTR, divided by the average log loss per impression if the model always predicted the background (average) CTR

• https://facebook.com//download/321355358042503/adkdd_2014_camera_ready_junfeng.pdf

• AUC (Area under the ROC curve, AUROC)

• measure ranking quality

• others: Precision/Recall, ECS (Effective Catalog Size), CTR / CVR / Sales, etc.
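A minimal pure-Python sketch of NE and AUC, following the definitions above. The labels and predictions are made-up toy data, and the pairwise AUC is only suitable for small inputs.

```python
import math

def log_loss(labels, preds, eps=1e-12):
    """Average log loss (cross-entropy) per impression."""
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

def normalized_entropy(labels, preds):
    """Model log loss divided by the log loss of always predicting the background CTR."""
    background_ctr = sum(labels) / len(labels)
    baseline = log_loss(labels, [background_ctr] * len(labels))
    return log_loss(labels, preds) / baseline

def auc(labels, preds):
    """Probability that a random click is scored above a random non-click (ties count 0.5)."""
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): 1 = click, 0 = no click.
labels = [1, 0, 0, 1, 0, 0, 0, 0]
preds = [0.6, 0.2, 0.1, 0.4, 0.3, 0.1, 0.2, 0.1]
scaled = [p / 10 for p in preds]  # same ranking, badly calibrated

print(auc(labels, preds), normalized_entropy(labels, preds))
print(auc(labels, scaled), normalized_entropy(labels, scaled))
```

Scaling every prediction down leaves the ranking, and therefore AUC, unchanged, while NE gets worse. AUC only measures ranking quality; NE also reflects how accurate the probabilities are.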

Page 14

Review of CTR Prediction v1

• Marked improvement, moderate AUC & NE

• But

• hard to do overall tuning

• hard to predict online (feature sets differ)

• latency for new campaigns

• relatively poor performance for new campaigns (cold start)

• loses the connections between campaigns, even for the same advertiser

• …

Page 15

CTR Prediction v2

• One simple model for all campaigns

• AD features added

• Dynamic feature extraction

• All computation distributed

• GBDT + Logistic Regression

• Train once per day, score twice
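The slide does not say how the GBDT and the logistic regression are combined. One well-known option, described in the Facebook paper linked on the Metrics slide, is to feed the index of the leaf that each tree assigns to an example into the logistic regression as one-hot features. A minimal sketch of that idea follows; the two hand-written trees stand in for a trained GBDT, and every feature name, split, and data point is hypothetical.

```python
import math

# Two hand-written trees standing in for a trained GBDT (hypothetical splits).
# Each tree returns the index of the leaf that the example x falls into.
def tree_0(x):
    if x["age"] < 30:
        return 0 if x["is_ios"] else 1
    return 2

def tree_1(x):
    return 0 if x["reads_per_day"] < 5 else 1

TREES = [(tree_0, 3), (tree_1, 2)]  # (tree function, number of leaves)

def leaf_one_hot(x):
    """Concatenate one one-hot vector per tree, marking the leaf that x reaches."""
    features = []
    for tree, n_leaves in TREES:
        one_hot = [0.0] * n_leaves
        one_hot[tree(x)] = 1.0
        features.extend(one_hot)
    return features

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, features):
    return sigmoid(sum(wi * fi for wi, fi in zip(w, features)) + b)

def train_lr(rows, labels, epochs=200, lr=0.5):
    """Plain SGD logistic regression on the leaf features."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = predict(w, b, x) - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Toy training data (hypothetical): heavy readers click, light readers do not.
train_users = [
    {"age": 25, "is_ios": True, "reads_per_day": 8},
    {"age": 22, "is_ios": False, "reads_per_day": 2},
    {"age": 41, "is_ios": True, "reads_per_day": 9},
    {"age": 37, "is_ios": False, "reads_per_day": 1},
]
clicks = [1, 0, 1, 0]
w, b = train_lr([leaf_one_hot(u) for u in train_users], clicks)

heavy_reader = {"age": 50, "is_ios": False, "reads_per_day": 10}
light_reader = {"age": 50, "is_ios": False, "reads_per_day": 1}
p_heavy = predict(w, b, leaf_one_hot(heavy_reader))
p_light = predict(w, b, leaf_one_hot(light_reader))
print(p_heavy, p_light)
```

In this arrangement the trees do the non-linear feature engineering and the linear model on top stays cheap to score.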

Page 16

About the Features

• >1M unique features, sparse

• GBDT provides great feature engineering

• (sometimes) feature engineering is a matter of intuition and trial and error

• demographic, device, location, reading interests…

• AD history is helpful

• Feature Hashing, Binarization & Discretization, …
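As an illustration of feature hashing combined with discretization and binarization, here is a minimal sketch. The bucket count, the age boundaries, and the profile fields are assumptions made for the example, not values from the talk.

```python
import hashlib

NUM_BUCKETS = 2 ** 20  # hypothetical size of the hashed feature space

def hash_feature(name, value, num_buckets=NUM_BUCKETS):
    """Map a (name, value) pair to a stable column index."""
    digest = hashlib.md5(f"{name}={value}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

def discretize_age(age):
    """Bucket a numeric age into a categorical range (hypothetical boundaries)."""
    for upper, label in [(20, "<20"), (30, "20-29"), (40, "30-39"), (50, "40-49")]:
        if age < upper:
            return label
    return "50+"

def to_sparse_vector(raw):
    """Turn a raw profile into a sorted list of active binary feature indices."""
    categorical = dict(raw)
    categorical["age"] = discretize_age(raw["age"])
    return sorted({hash_feature(k, v) for k, v in categorical.items()})

profile = {"gender": "female", "region": "Kansai", "device": "iOS", "age": 34}
vec = to_sparse_vector(profile)
print(vec)
```

`hashlib` is used instead of the built-in `hash()` because the latter is randomized per process for strings, which would make the indices differ between training and scoring jobs.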

Page 17

Performance improvement

Page 18

#2 Targeting

[Figure. People shown: Watabe, TamTam, Komiya, Takei, Ikeishi, Nagase, Lan. Interests shown: Niku (meat), Game, Beer, Snack, Costume, Gourmet, Princess.]

Page 19

It’s difficult comparing to

Page 20

Profiling User by Statistics and ML

• Gender Prediction (precision: 0.90+), Age Prediction, …

• News Channel / Source Preference

• AD Slot Preference

• …

Page 21

Standard Targeting

• Females in Kansai who subscribe to the Travel channel

Page 22

Lookalike Targeting

Page 23

Lookalike Targeting

• Our solution

• Solve it as a classification problem

• Seed users as positive samples

• All targeting candidates as negative samples (with random sampling)

• based on Spark MLlib Logistic Regression

• CVR up 30% to 50% compared to normal targeting
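The classification setup above can be sketched in a few lines. The talk uses Spark MLlib Logistic Regression; this example swaps in a tiny pure-Python logistic regression so that it runs on its own, and the feature columns, user IDs, and data are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_lr(rows, labels, epochs=100, lr=0.1):
    """Plain SGD logistic regression (stand-in for Spark MLlib in this sketch)."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = score(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def lookalike(seeds, candidates, n_negatives, top_k, rng):
    """Seeds are positives, a random sample of candidates are negatives;
    return the top_k candidates that the model scores most like the seeds."""
    negatives = rng.sample(list(candidates.values()), n_negatives)
    rows = list(seeds) + negatives
    labels = [1] * len(seeds) + [0] * len(negatives)
    w, b = train_lr(rows, labels)
    ranked = sorted(candidates, key=lambda uid: score(w, b, candidates[uid]), reverse=True)
    return ranked[:top_k]

# Hypothetical binary features: [reads_travel, reads_gourmet, reads_game]
seeds = [[1, 1, 0], [1, 1, 0], [1, 1, 0], [1, 1, 0]]
candidates = {
    "u1": [1, 1, 0],
    "u2": [0, 0, 1],
    "u3": [0, 0, 1],
    "u4": [1, 0, 1],
    "u5": [0, 0, 1],
    "u6": [0, 0, 1],
}
result = lookalike(seeds, candidates, n_negatives=3, top_k=2, rng=random.Random(42))
print(result)
```

Sampling negatives from the whole candidate pool means some true lookalikes may be labeled negative. As long as they are a small share of the sample, the model still scores them above the rest.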

Page 24

Article Keyword Targeting

• Reach UU (unique users) for a keyword is calculated in real time

• Only users who exceed a certain read-time threshold are included
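A minimal sketch of the reach calculation with a read-time threshold. The log layout and keyword index are hypothetical, and the sketch assumes the threshold applies to a user's total read time across articles carrying the keyword, which the slide does not specify.

```python
from collections import defaultdict

# Hypothetical read log: (user_id, article_id, seconds spent reading).
READ_LOG = [
    ("u1", "a1", 40),
    ("u1", "a2", 30),
    ("u2", "a1", 5),
    ("u3", "a3", 100),
    ("u4", "a1", 30),
]

# Hypothetical keyword index per article.
ARTICLE_KEYWORDS = {
    "a1": {"beer", "gourmet"},
    "a2": {"beer"},
    "a3": {"game"},
}

def reach_uu(keyword, min_read_seconds):
    """Unique users whose total read time on articles with the keyword
    exceeds the threshold."""
    total = defaultdict(float)
    for user_id, article_id, seconds in READ_LOG:
        if keyword in ARTICLE_KEYWORDS.get(article_id, set()):
            total[user_id] += seconds
    return {user for user, secs in total.items() if secs > min_read_seconds}

audience = reach_uu("beer", min_read_seconds=30)
print(len(audience), sorted(audience))
```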

Page 25

Custom Audience

[Flow diagram. Labels shown: your service / app / site sends any custom event (S2S request, web beacon, etc.) to the SmartNews AD tracker; event audience; Bloom filter object, updated every several minutes; SmartNews AD delivery cluster; AD targeting / delete targeting; lookalike; lookalike targeting.]
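The audience is held in a Bloom filter object, which answers "is this user in the audience?" in a compact, fixed-size structure at the cost of occasional false positives (never false negatives). A minimal sketch follows; the sizes, the hashing scheme, and the user IDs are assumptions for illustration, not the production design.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array with k hash positions per item."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k positions by salting the item with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

# Hypothetical audience built from tracked custom events.
audience = BloomFilter(num_bits=8192, num_hashes=4)
for user_id in ["user-001", "user-002", "user-003"]:
    audience.add(user_id)

print("user-002" in audience, "user-999" in audience)
```

Membership checks touch only a few bits, so the delivery side can test a user against many audiences cheaply. A plain Bloom filter cannot remove items, so refreshing an audience means rebuilding the object.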

Page 26

Future Work

Page 27

Targeting Audience by Interests

Page 28

Collect Negative Signals to Optimize UX

Page 29

Summary of My 1st SmartNews Year

• A challenging place. We're a startup, so we can move fast and break things.

• Learn from the industry leaders. Keep up the trial and error.

• Numbers don't lie. Don't trust your intuition over the numbers.

• But if you really doubt the numbers, look closely. There may be a hidden bug.