Upload
amela-harrell
View
53
Download
1
Embed Size (px)
DESCRIPTION
机器学习在互联网广告中的应用. 庄宝童. Agenda. 介绍 机器学习应用 Common utility Advertiser Publisher user 总结. 为什么需要互联网广告?. 流量(用户)是互联网 公司的重要资产 互联网内容免费模式,需要流量变现来维持运营 广告收入占比: Google : 95% (2012 , http ://investor.google.com/financial/tables.html ) Facebook : 83% ( 2011 ) Baidu :? Alibaba :? - PowerPoint PPT Presentation
Citation preview
为什么需要互联网广告?
• 流量(用户)是互联网公司的重要资产
• 互联网内容免费模式,需要流量变现来维持运营
• 广告收入占比:
– Google :95% (2012,http
://investor.google.com/financial/tables.html)
– Facebook:83% (2011)
– Baidu:?
– Alibaba:?
• 特点:效果量化可追踪,运营销售参与少,曝光成本低
• 对互联网广告公司而言,是一种理想的“印钞机”商业模式(吴军,《浪潮之巅》)
我们需要什么样的广告?Find the best match between a given
user in a given context and a suitable
advertisement
-- Andrei Broder and Dr.
Vanja 2011
Advertise
rs
Ad Network
Ads
Page
Pick best ads
User
Publisher
Response rates(click, conversion,ad-view)
Bids
Auction
Select argmax f(bid, rate)
Statisticalmodel
conversion
Players in the ecosystem
• Publisher’s utility : Revenue , user engagement• Advertiser ‘s utility : ROI• User’s utility : relevance
mechanism design• 合同定价 ( futures market ),CPM 或 CPT 计价
• 拍卖定价 (spot market)– GFP
– GSP
– VCG
• 计价方式– CPM (Cost per Mille-impressions): publisher 风险最小,如 yahoo , sina
的品牌广告
– CPC (Cost per Click) : publisher 和 advertiser 风险共担, google
adwords ,百度凤巢等大部分属于此类
– CPA (cost per Action) : advertiser 风险最小,如淘宝客。
CPC 的 ranking functions
• Bid ranking : bid
– 源于 goto.com (overture 前身,后被yahoo 收购)
• Revenue ranking : CTR * bid
– Google 首创
– 核心问题: CTR prediction
model
P(click | user, ad, context)
• ad : creative, bid-terms, landing page, campaign,
advertiser, format (text/image/video), size, etc.
• user : cookie, demo, geo, behavioral, activity
history
• context : query, publisher, page-content, session,
time
algorithms
• Logistic Regression + feature engineering (google,
yahoo, baidu, facebook , etc)
• Microsoft (Baysian Probit Regression)
• Google : boosting http
://users.soe.ucsc.edu/~niejiazhong/slides/
chandra.pdf
• Taobao (Mixture of Logistic Regression)
• trends : big data + nonlinear/feature learning
challenges
• Sparsity : use Natural hierarchies or
Auto-generated hierarchies
• Missing data
• Bias : position , ad category , etc
• Dynamical /seasonal effects
• Spam/noisy data
features
• Features:
– Click feedback features ( COEC )
– Query features
– Query-ad text matching features
• Preprocess:
– 离散化 分段
– 特征交叉
– 层次特征—处理稀疏性 ( variance bias trade-off)
– 特征平滑,变换
training
• 训练集• 正负样本分层采样 – imbalance training 问题
• Instances : 1B
• Features : 10B
• 分布式训练– MPI (baidu, taobao)
– map reduce (google)
Evaluation
• Offline evaluation
– MSE, MAE
– AUC
• Online A/B test
– 分层实验平台( google , Overlapping Experiment
Infrastructure: More, Better, Faster
Experimentation )
– 正态 / 二项分布样本的假设检验
Explore/Exploit
• 低 mean ,高方差的 ads 应该給予展示机会
• E.g. Consider 2 ads (same bids)
– Goal: Select most popular
– CTR1 ~ (mean=.01,var=.1), CTR2~
(mean=.05,var~0)
CTR
Pro
babi
lity
dens
ity Ad 2
Ad 1
E&E 常用算法• Upper confidence bound policy (UCB)
– Mean + uncertainty-estimate
– mean + k* sd(estimator)
• Thompson sampling
– 从 posterior 里随机采样,比较适合 Bayesian 类的算法
• 问题– 广告集合巨大, explore 代价过大
– 跟传统 Multi-Arms bandits 问题不太一样,广告集合是动态的,且每次会选择多个
Advertiser’s perspective
• Keyword selection
• Bid optimization
• Smart pricing
• Anti fraud
• Impression forecasting : time series
• Smooth delivery: allocation algorithms
CVR prediction
• 用途:– Smart pricing :外部流量千差万别,广告主没有精力也能力做分
媒体的出价,需要按照点击价值进行智能出价 ( Google , smart
pricing grows the pie) ,以保证广告主的 ROI
– DSP: real time bidding
– CPA 模式的 rank function : ctr * cvr * bid
• 做法:与 CTR 预估问题类似,但更困难– 转化数据获取困难,且更为稀疏
– 不同广告主的转化定义不一致
User’s perspective
• User fatigue
• User privacy
• Behavioral targeting / retargeting
• Query intent
• Low quality ads detection ( google,
detecting adversarial advertisements in the
wild)