
Applications of Machine Learning in Internet Advertising


庄宝童

Agenda

• Introduction

• Machine learning applications

– Common utility

– Advertiser

– Publisher

– User

• Summary

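Why do we need Internet advertising?

• Traffic (users) is a key asset of an Internet company

• Internet content follows a free model, so traffic has to be monetized to sustain operations

• Share of revenue that comes from advertising:

– Google: 95% (2012, http://investor.google.com/financial/tables.html)

– Facebook: 83% (2011)

– Baidu: ?

– Alibaba: ?

• Characteristics: results are quantifiable and trackable, little involvement from operations or sales staff is needed, and the cost per exposure is low

• For an Internet advertising company, it is an ideal "money-printing machine" business model (吴军, 《浪潮之巅》)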

What kind of advertising do we need?

Find the best match between a given user in a given context and a suitable advertisement

-- Andrei Broder and Dr. Vanja, 2011

[Figure: ad-serving flow. Labels: Advertisers, Ad Network, Ads, Page, User, Publisher, Bids, Auction, Statistical model, Response rates (click, conversion, ad-view), conversion, Pick best ads, Select argmax f(bid, rate)]

Players in the ecosystem

• Publisher's utility: revenue, user engagement

• Advertiser's utility: ROI

• User's utility: relevance

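Mechanism design

• Contract pricing (futures market), billed by CPM or CPT

• Auction pricing (spot market)

– GFP (generalized first price)

– GSP (generalized second price)

– VCG (Vickrey-Clarke-Groves)

• Billing models

– CPM (cost per mille, i.e. per thousand impressions): lowest risk for the publisher, e.g. brand advertising on Yahoo and Sina

– CPC (cost per click): risk shared between publisher and advertiser; Google AdWords, 百度凤巢 (Baidu Fengchao), and most others fall into this category

– CPA (cost per action): lowest risk for the advertiser, e.g. 淘宝客 (Taobaoke)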
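
As a rough illustration of the GFP and GSP pricing rules listed under auction pricing, the sketch below assumes ads are ranked purely by bid (no quality score) and that the last winner pays a hypothetical `reserve` price. The function names and the bids are made up for illustration and do not come from the slides.

```python
def gfp_prices(bids, slots):
    """Generalized first price: each winner pays its own bid."""
    ranked = sorted(bids, reverse=True)
    return ranked[:slots]


def gsp_prices(bids, slots, reserve=0.0):
    """Generalized second price: each winner pays the next-highest bid.

    `reserve` is an assumed floor paid when there is no lower bid.
    """
    ranked = sorted(bids, reverse=True)
    prices = []
    for i in range(min(slots, len(ranked))):
        next_bid = ranked[i + 1] if i + 1 < len(ranked) else reserve
        prices.append(next_bid)
    return prices


bids = [3.0, 1.0, 2.0]          # made-up per-click bids
first_price = gfp_prices(bids, slots=2)
second_price = gsp_prices(bids, slots=2)
```

With these bids the two winners pay their own bids under GFP and the bid ranked just below them under GSP, which is why GSP removes the incentive to keep shaving a bid down toward the next competitor.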

Ranking functions for CPC

• Bid ranking: bid

– Originated at goto.com (the predecessor of Overture, later acquired by Yahoo)

• Revenue ranking: CTR * bid

– Pioneered by Google

– Core problem: CTR prediction
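
The difference between the two ranking functions can be seen on a toy example. This is a minimal sketch: the `Ad` class, the ad names, and the bid and CTR values are all hypothetical, and the CTR is simply assumed to be given rather than predicted.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float  # advertiser's per-click bid
    ctr: float  # click-through rate, assumed known here


def rank_by_bid(ads):
    """Bid ranking: order ads by bid alone."""
    return sorted(ads, key=lambda a: a.bid, reverse=True)


def rank_by_revenue(ads):
    """Revenue ranking: order ads by CTR * bid (expected revenue per impression)."""
    return sorted(ads, key=lambda a: a.ctr * a.bid, reverse=True)


ads = [Ad("a", 2.0, 0.01), Ad("b", 1.0, 0.05), Ad("c", 1.5, 0.02)]
bid_order = [ad.name for ad in rank_by_bid(ads)]
revenue_order = [ad.name for ad in rank_by_revenue(ads)]
```

Ad "a" has the highest bid but the lowest CTR, so it ranks first under bid ranking and last under revenue ranking, which is why the quality of the CTR estimate matters so much.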

model

P(click | user, ad, context)

• ad: creative, bid-terms, landing page, campaign, advertiser, format (text/image/video), size, etc.

• user: cookie, demographics, geo, behavioral, activity history

• context: query, publisher, page content, session, time

algorithms

• Logistic regression + feature engineering (Google, Yahoo, Baidu, Facebook, etc.)

• Microsoft: Bayesian probit regression

• Google: boosting, http://users.soe.ucsc.edu/~niejiazhong/slides/chandra.pdf

• Taobao: mixture of logistic regression

• Trends: big data + nonlinear/feature learning
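
To make the first item concrete, here is a minimal sketch of logistic regression over sparse one-hot features of (user, ad, context), trained with plain stochastic gradient descent. It is not any company's production system: the field names, the toy log, the learning rate, and the helper names (`featurize`, `predict`, `sgd_update`) are all assumptions made for illustration.

```python
import math


def featurize(user, ad, context):
    """Turn categorical fields into sparse one-hot feature names."""
    return (
        [f"user.{k}={v}" for k, v in user.items()]
        + [f"ad.{k}={v}" for k, v in ad.items()]
        + [f"context.{k}={v}" for k, v in context.items()]
    )


def predict(weights, features):
    """P(click) = sigmoid(sum of the weights of the active features)."""
    z = sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))


def sgd_update(weights, features, clicked, lr=0.1):
    """One stochastic-gradient step on the log loss for a single impression."""
    error = predict(weights, features) - clicked
    for f in features:
        weights[f] = weights.get(f, 0.0) - lr * error


# Toy log (made-up data): same user and query, two ads, only ad A is clicked.
log = [
    (featurize({"geo": "bj"}, {"id": "A"}, {"query": "shoes"}), 1),
    (featurize({"geo": "bj"}, {"id": "B"}, {"query": "shoes"}), 0),
]
weights = {}
for _ in range(200):
    for features, clicked in log:
        sgd_update(weights, features, clicked)
```

Because the weights live in a dictionary keyed by feature name, only features that actually occur take up memory, which is the property that makes this model family workable at billions of features.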

challenges

• Sparsity: use natural hierarchies or auto-generated hierarchies

• Missing data

• Bias: position, ad category, etc.

• Dynamic/seasonal effects

• Spam/noisy data

features

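• Features:

– Click feedback features (COEC, clicks over expected clicks)

– Query features

– Query-ad text matching features

• Preprocessing:

– Discretization and binning

– Feature crossing

– Hierarchical features, to handle sparsity (bias-variance trade-off)

– Feature smoothing and transformation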
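
The feature and preprocessing steps above can be sketched in a few small helpers. Everything here is illustrative: the per-position reference CTRs, the bin boundaries, and the prior used for smoothing are assumed values, and the smoothing shown (shrinking a raw CTR toward a prior, for example the CTR of the parent node in a hierarchy) is one common choice rather than the method used in the slides.

```python
import bisect


def coec(clicks_by_pos, imps_by_pos, ref_ctr_by_pos):
    """Clicks over expected clicks: observed clicks divided by the clicks an
    average ad would have collected from the same impressions per position."""
    expected = sum(imps_by_pos[p] * ref_ctr_by_pos[p] for p in imps_by_pos)
    clicks = sum(clicks_by_pos.get(p, 0) for p in imps_by_pos)
    return clicks / expected if expected > 0 else 0.0


def smoothed_ctr(clicks, imps, prior_ctr, prior_strength):
    """Shrink a raw CTR toward a prior; sparse ads stay close to the prior."""
    return (clicks + prior_ctr * prior_strength) / (imps + prior_strength)


def bucketize(value, boundaries):
    """Discretize a continuous value into a bin index."""
    return bisect.bisect_right(boundaries, value)


def cross(feature_a, feature_b):
    """Feature crossing: combine two categorical features into one."""
    return f"{feature_a}&{feature_b}"


# Made-up statistics for one ad shown in positions 1 and 2.
score = coec({1: 10, 2: 2}, {1: 100, 2: 100}, {1: 0.05, 2: 0.01})
ctr_bin = bucketize(0.03, [0.01, 0.05, 0.1])
crossed = cross("query=shoes", "ad=A")
```

A COEC above 1 means the ad collected more clicks than its positions alone would explain, so it works as a position-normalized click feedback feature.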

training

• Training set

• Stratified sampling of positive and negative examples, to address the imbalanced-training problem

• Instances: 1B

• Features: 10B

• Distributed training

– MPI (Baidu, Taobao)

– MapReduce (Google)
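
One common way to handle the class imbalance mentioned above is to keep every click and only a fraction of the non-clicks, then correct the model's predictions for the sampling rate. The sketch below assumes that scheme. The sampling rate, the toy data, and the function names are hypothetical, and the slides do not say which sampling scheme was actually used.

```python
import random


def downsample_negatives(examples, rate, seed=0):
    """Keep all positives (label 1) and each negative with probability `rate`."""
    rng = random.Random(seed)
    return [(x, y) for x, y in examples if y == 1 or rng.random() < rate]


def recalibrate(p, rate):
    """Map a probability predicted on downsampled data back to the original
    distribution. Sampling negatives at `rate` divides the true odds by
    `rate`, so the correction multiplies the predicted odds by `rate`."""
    return p / (p + (1.0 - p) / rate)


# Made-up data: 10 clicks and 1000 non-clicks.
data = [({"id": i}, 1) for i in range(10)] + [({"id": i}, 0) for i in range(1000)]
sample = downsample_negatives(data, rate=0.1)
```

Without the recalibration step the predicted CTRs would be inflated, which matters when they are multiplied by bids in the ranking function.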

Evaluation

• Offline evaluation

– MSE, MAE

– AUC

• Online A/B test

– Layered experiment platform (Google, "Overlapping Experiment Infrastructure: More, Better, Faster Experimentation")

– Hypothesis tests for normally or binomially distributed samples
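
Two of the evaluation tools above are easy to sketch: AUC for offline evaluation, and a hypothesis test on click rates for an online A/B test. The code below is a small illustration under stated assumptions. The AUC is computed by brute-force pair counting, which is fine for a toy set but not for billions of instances, and the test is a standard pooled two-proportion z-test using the normal approximation to the binomial. The function names and numbers are made up.

```python
import math


def auc(labels, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def two_proportion_z(clicks_a, imps_a, clicks_b, imps_b):
    """Pooled two-proportion z-test for CTR(B) vs CTR(A).

    Returns (z, two-sided p-value).
    """
    p_a, p_b = clicks_a / imps_a, clicks_b / imps_b
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


toy_auc = auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
z, p_value = two_proportion_z(100, 10000, 200, 10000)
```

AUC only measures ranking quality, so it is usually read together with a calibration-sensitive metric such as MSE before a model goes to an online test.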

Practice

• Real-time computation and performance issues

– Simple, effective candidate-set selection

– Exact computation

• Online learning

Explore/Exploit

• Ads with a low mean but high variance should still be given a chance to be shown

• E.g. consider 2 ads (same bids)

– Goal: select the most popular

– CTR1 ~ (mean=.01, var=.1), CTR2 ~ (mean=.05, var~0)

[Figure: probability density of CTR for Ad 1 and Ad 2]

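Common E&E algorithms

• Upper confidence bound policy (UCB)

– Mean + uncertainty estimate

– mean + k * sd(estimator)

• Thompson sampling

– Draw a random sample from the posterior; a good fit for Bayesian-style algorithms

• Issues

– The set of ads is huge, so exploration is too costly

– Unlike the classical multi-armed bandit problem, the set of ads is dynamic and several ads are selected each time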
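
Both policies can be sketched for the simplest case of choosing one ad per impression from a fixed set, which sidesteps the issues just listed. The sketch assumes a Beta(1, 1) prior on each ad's CTR, so the posterior after `clicks` out of `imps` is Beta(clicks + 1, imps - clicks + 1). That prior, the value of `k`, the click counts, and the function names are assumptions for illustration only.

```python
import math
import random


def posterior(clicks, imps):
    """Beta posterior parameters under an assumed Beta(1, 1) prior."""
    return clicks + 1.0, imps - clicks + 1.0


def ucb_score(clicks, imps, k=2.0):
    """UCB: posterior mean + k * posterior standard deviation."""
    a, b = posterior(clicks, imps)
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean + k * sd


def thompson_sample(clicks, imps, rng):
    """Thompson sampling: draw one CTR value from the posterior."""
    a, b = posterior(clicks, imps)
    return rng.betavariate(a, b)


def pick(stats, score):
    """Choose the ad whose score is highest; `stats` maps ad -> (clicks, imps)."""
    return max(stats, key=lambda ad: score(*stats[ad]))


# Made-up counts: ad1 is barely explored, ad2 is well known with CTR near 5%.
stats = {"ad1": (1, 100), "ad2": (500, 10000)}
greedy_choice = pick(stats, lambda c, n: ucb_score(c, n, k=0.0))
ucb_choice = pick(stats, lambda c, n: ucb_score(c, n, k=3.0))
rng = random.Random(0)
thompson_choice = pick(stats, lambda c, n: thompson_sample(c, n, rng))
```

With `k=0` the policy is purely greedy and always shows ad2, while a larger `k` gives the uncertain ad1 its chance; Thompson sampling reaches a similar effect by occasionally drawing a high CTR for ad1.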

Advertiser’s perspective

• Keyword selection

• Bid optimization

• Smart pricing

• Anti-fraud

• Impression forecasting : time series

• Smooth delivery: allocation algorithms

CVR prediction

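• Uses:

– Smart pricing: external traffic varies enormously, and advertisers have neither the time nor the ability to bid separately per publisher, so bids need to be set automatically according to the value of a click (Google, smart pricing grows the pie) to protect the advertiser's ROI

– DSP: real-time bidding

– Rank function under the CPA model: ctr * cvr * bid

• Approach: similar to CTR prediction, but harder

– Conversion data is hard to obtain and even sparser

– Different advertisers define conversions differently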
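
The CPA rank function and the idea behind smart pricing can be written down in a few lines. The rank function follows the slide directly. The discount rule in `smart_priced_cpc` is only an assumed illustration (scale the click bid by the publisher's conversion rate relative to a baseline, never above the original bid) and is not a description of how Google or anyone else actually computes it. All numbers are made up.

```python
def cpa_rank_score(ctr, cvr, cpa_bid):
    """Expected revenue per impression when the advertiser pays per action."""
    return ctr * cvr * cpa_bid


def smart_priced_cpc(base_cpc_bid, publisher_cvr, baseline_cvr):
    """Hypothetical smart-pricing rule: discount the click bid on publishers
    whose clicks convert worse than the baseline, and never raise it."""
    if baseline_cvr <= 0:
        return base_cpc_bid
    factor = min(1.0, publisher_cvr / baseline_cvr)
    return base_cpc_bid * factor


score = cpa_rank_score(ctr=0.02, cvr=0.1, cpa_bid=50.0)
discounted = smart_priced_cpc(1.0, publisher_cvr=0.01, baseline_cvr=0.02)
unchanged = smart_priced_cpc(1.0, publisher_cvr=0.05, baseline_cvr=0.02)
```

Both functions depend on a conversion-rate estimate, which is why the sparsity and inconsistent definitions of conversion data noted above make them harder to get right than CTR-based ranking.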

User’s perspective

• User fatigue

• User privacy

• Behavioral targeting / retargeting

• Query intent

• Low-quality ads detection (Google, "Detecting Adversarial Advertisements in the Wild")

Publisher’s perspective

• Revenue

• User engagement

Thank you