How to Rush a Contest in 24 Hours

Page 1: How to Rush a Contest in 24 Hours

How to Rush a Contest in 24 Hours

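Institute of Automation, Chinese Academy of Sciences: 李勇保, 高珩

Institute of Computing Technology, Chinese Academy of Sciences: 孔东营 (avail..)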

Page 2: How to Rush a Contest in 24 Hours

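Essential tools
• A BT machine
• One bowl of instant noodles: the large-bucket size
• Two bottles of Red Bull: a white bottle and a red bottle
• A few toolkits: sklearn; libfm, pmf, omf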

Page 3: How to Rush a Contest in 24 Hours

Libfm (Factorization Machine Library)
• Steffen Rendle: http://libfm.org
• Classification or regression
• Learning methods:
– SGD
– ALS
– MCMC: easy to handle (no learning rate, no regularization)
• Input: libsvm format
• Not many iterations are needed
• Try different parameters to find the best setting
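As a rough illustration of the libsvm input format, the sketch below encodes (user_id, movie_id, score) triples with one indicator feature per user and per movie. The helper names, the sample ratings, and the file names are made up for this example; the command-line flags are the ones documented in the libFM manual, and the values chosen for them are arbitrary.

```python
def build_index(triples):
    """Assign each user and each movie its own feature column (hypothetical helper)."""
    index = {}
    for user, movie, _ in triples:
        for key in (("u", user), ("m", movie)):
            if key not in index:
                index[key] = len(index)
    return index


def to_libsvm(triples, index):
    """Return one 'target idx:value idx:value' line per rating."""
    lines = []
    for user, movie, score in triples:
        cols = sorted((index[("u", user)], index[("m", movie)]))
        feats = " ".join(f"{c}:1" for c in cols)
        lines.append(f"{score} {feats}")
    return lines


ratings = [(7, 101, 4), (7, 102, 5), (9, 101, 3)]  # made-up sample data
index = build_index(ratings)
lines = to_libsvm(ratings, index)

# Assumed invocation: regression task, MCMC learning, 8 latent factors.
# File names are placeholders; nothing is executed here.
libfm_cmd = [
    "./libFM", "-task", "r",
    "-train", "train.libfm", "-test", "test.libfm",
    "-dim", "1,1,8", "-iter", "100", "-method", "mcmc",
    "-init_stdev", "0.1", "-out", "pred.txt",
]
```

In practice the index would need to cover the users and movies of both the training and the test file, so that the two files share one column numbering.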

Page 4: How to Rush a Contest in 24 Hours

Features

Input files:
• training_set.txt: user_id, movie_id, score
• movie_tag.txt: movie_id, movie_tag
• user_history.txt: user_id, movie_id, session_id
• user_social.txt: num of followers

Derived features:
• score of user and movie: max, min, average, score_rate, never score
• show times of movie
• tags of user (filtered or weighted)
• match of user_tag
• tag_num of movie
• score of tags
• score of movie (based on tag)
• tags of movie (low weight)

Total: 3 million+ features (300w+)
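The per-user and per-movie score statistics (max, min, average) can be computed in one pass over the training triples. This is only a sketch: the function name and sample rows are invented, and score_rate and "never score" are left out because the slides do not define them.

```python
from collections import defaultdict


def score_stats(ratings, key_pos):
    """Summarize scores grouped by user (key_pos=0) or by movie (key_pos=1)."""
    groups = defaultdict(list)
    for row in ratings:
        groups[row[key_pos]].append(row[2])
    return {
        key: {"max": max(s), "min": min(s), "average": sum(s) / len(s)}
        for key, s in groups.items()
    }


# Made-up (user_id, movie_id, score) rows standing in for training_set.txt.
ratings = [(7, 101, 4), (7, 102, 5), (9, 101, 3)]
user_stats = score_stats(ratings, 0)
movie_stats = score_stats(ratings, 1)
```

Each statistic can then be appended as an extra real-valued column to the rows fed to the model.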

Page 5: How to Rush a Contest in 24 Hours

Result of Libfm

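• Using only user_id and movie_id, leaderboard score: 0.622
• Single libfm model, leaderboard score: 0.608
• Getting results from multiple models:
– Use different feature sets
– Vary the number of latent factors
– Vary the learning method (MCMC, SGD, ALS)
– The iteration count must not be too high
– With too many latent factors (>60), training is very slow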

Page 6: How to Rush a Contest in 24 Hours

SVDFeature
http://svdfeature.apexlab.org/ (Apex Lab, Shanghai Jiao Tong University)
• Learning method:
• Input: similar to the libsvm format
• Feature-based extensibility
• Added a global feature: movie-tag
• 3 results: 0.619, 0.620, 0.621

高珩

Page 7: How to Rush a Contest in 24 Hours

PMF (Probabilistic Matrix Factorization)
• Salakhutdinov et al.: http://www.cs.utoronto.ca/~amnih/papers/pmf.pdf
• Graphical model representation:
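For reference, the generative model of the PMF paper linked above can be written as follows. The notation is that paper's, not something recoverable from the slide: R_ij is the rating of user i for movie j, U_i and V_j are the latent vectors, and I_ij is 1 when the rating is observed and 0 otherwise.

```latex
p(R \mid U, V, \sigma^2) = \prod_{i=1}^{N} \prod_{j=1}^{M}
  \left[ \mathcal{N}\!\left(R_{ij} \mid U_i^{T} V_j, \sigma^2\right) \right]^{I_{ij}}

p(U \mid \sigma_U^2) = \prod_{i=1}^{N} \mathcal{N}\!\left(U_i \mid 0, \sigma_U^2 I\right),
\qquad
p(V \mid \sigma_V^2) = \prod_{j=1}^{M} \mathcal{N}\!\left(V_j \mid 0, \sigma_V^2 I\right)
```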

Page 8: How to Rush a Contest in 24 Hours

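PMF (Probabilistic Matrix Factorization)
• Objective function to solve:
• Learning method: gradient descent in U and V
• Model input: the training-set rating matrix and an initialized test-set rating matrix
• Model output: the predicted test-set rating matrix
• Choosing the latent dimension
– 2 results: 0.618, 0.619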
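A minimal sketch of gradient descent in U and V, assuming the usual regularized squared-error objective 0.5 * sum over observed (r_ij - U_i . V_j)^2 + 0.5 * lam * (|U|^2 + |V|^2). A single regularization weight lam is used for both factors, which is a simplification made here. The data, step size, and latent dimension are invented for illustration.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rmse(U, V, ratings):
    """Root-mean-square error over the given (user, movie, score) triples."""
    return (sum((dot(U[i], V[j]) - r) ** 2 for i, j, r in ratings) / len(ratings)) ** 0.5


def pmf_step(U, V, ratings, lam, lr):
    """One full-batch gradient step on the regularized squared error."""
    grad_u = [[lam * x for x in row] for row in U]  # gradient of the L2 penalty
    grad_v = [[lam * x for x in row] for row in V]
    for i, j, r in ratings:
        err = dot(U[i], V[j]) - r
        for k in range(len(U[i])):
            grad_u[i][k] += err * V[j][k]
            grad_v[j][k] += err * U[i][k]
    new_u = [[x - lr * g for x, g in zip(row, grow)] for row, grow in zip(U, grad_u)]
    new_v = [[x - lr * g for x, g in zip(row, grow)] for row, grow in zip(V, grad_v)]
    return new_u, new_v


rng = random.Random(0)
dim = 2  # latent dimension, chosen arbitrarily
train = [(0, 0, 4), (0, 1, 5), (1, 0, 3), (1, 2, 2)]  # made-up ratings
U = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(2)]
V = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(3)]

rmse_before = rmse(U, V, train)
for _ in range(2000):
    U, V = pmf_step(U, V, train, lam=0.05, lr=0.02)
rmse_after = rmse(U, V, train)

prediction = dot(U[0], V[2])  # score for a pair that is not in the training set
```

Rerunning the loop with a different dim is one way to obtain the several PMF results that feed the ensemble.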

Page 9: How to Rush a Contest in 24 Hours

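Ordinal Matrix Factorization
• Matrix Factorization

$$ y = u^{T} v $$

• Ordinal Regression

$$ r = \begin{cases} 1, & y \le \theta_1 \\ 2, & \theta_1 < y \le \theta_2 \\ \;\vdots \\ 5, & \theta_4 < y \end{cases} $$

where $\theta_1 < \theta_2 < \theta_3 < \theta_4$ denote the rating thresholds.

孔东营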
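The ordinal step can be sketched as a threshold lookup: the latent score y = u . v is mapped to the rating k whose interval contains it. The threshold values, the name theta, and the sample vectors below are assumptions made for this example.

```python
from bisect import bisect_left


def latent_score(u, v):
    """Matrix-factorization score y = u . v."""
    return sum(a * b for a, b in zip(u, v))


def to_rating(y, thresholds):
    """Return rating k such that theta_{k-1} < y <= theta_k (1-based)."""
    return bisect_left(thresholds, y) + 1


theta = [1.5, 2.5, 3.5, 4.5]  # hypothetical thresholds for ratings 1..5

y = latent_score([1.0, 2.0], [0.5, 1.5])
rating = to_rating(y, theta)
```

In a full model the thresholds would be learned together with u and v rather than fixed by hand.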

Page 10: How to Rush a Contest in 24 Hours

Ordinal Matrix Factorization

• Probabilistic Ordinal Regression
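One common probabilistic formulation is the cumulative logit model, P(r <= k | y) = sigmoid(theta_k - y). The sketch below assumes that link function; the slides do not say which one was used, and the thresholds are made up.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rating_probs(y, thresholds):
    """P(r = k | y) for k = 1..len(thresholds)+1 under a cumulative logit model."""
    cdf = [sigmoid(t - y) for t in thresholds] + [1.0]
    probs, prev = [], 0.0
    for c in cdf:
        probs.append(c - prev)  # P(r = k) = P(r <= k) - P(r <= k-1)
        prev = c
    return probs


theta = [1.5, 2.5, 3.5, 4.5]  # hypothetical thresholds
probs = rating_probs(3.0, theta)
expected_rating = sum(k * p for k, p in enumerate(probs, start=1))
```

The expected rating is often a better submission for a squared-error metric than the single most likely rating.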

Page 11: How to Rush a Contest in 24 Hours

Topic features
• PLSA
• Solved with the EM algorithm:
– E-step:
– M-step:
• PLSA topics are computed for user social and movie_user and used as features
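A compact sketch of the standard PLSA updates, with the E-step and M-step marked in comments. The count matrix is invented, rows play the role of "documents" (for example users) and columns the role of "words" (for example movies), and a tiny constant is added to the accumulators so that no distribution collapses to all zeros; that smoothing is a convenience of this example.

```python
import random


def normalize(row):
    total = sum(row)
    return [x / total for x in row]


def plsa(counts, n_topics, iters=50, seed=0):
    """counts[d][w] = n(d, w). Returns (P(z|d) per row, P(w|z) per topic)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(n_topics)]
    for _ in range(iters):
        new_zd = [[1e-12] * n_topics for _ in range(n_docs)]
        new_wz = [[1e-12] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(w | z) * P(z | d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)])
                # M-step accumulators: expected counts n(d, w) * P(z | d, w).
                for z in range(n_topics):
                    new_zd[d][z] += counts[d][w] * post[z]
                    new_wz[z][w] += counts[d][w] * post[z]
        # M-step: renormalize the expected counts into distributions.
        p_z_d = [normalize(row) for row in new_zd]
        p_w_z = [normalize(row) for row in new_wz]
    return p_z_d, p_w_z


counts = [  # made-up co-occurrence counts
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
]
doc_topics, topic_words = plsa(counts, n_topics=2)
```

Each row of doc_topics is a topic distribution that can be attached to the corresponding user or movie as dense features.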

Page 12: How to Rush a Contest in 24 Hours

Ensemble

• libfm
– 10 results: 0.608-0.622
• PMF
– 2 results: 0.618, 0.619 (user_id, movie_id)
• SVD
– 3 results: 0.619-0.621
• OMF
– 3 results: 0.614, 0.618, 0.616
• PLSA
– 1 result: 0.621
• LR
– 2 results: 0.642, 0.627
• SGD
– 1 result: 0.633

Choose different feature sets and numbers of latent factors

Ridge Regression

Random 10-fold cross-validation
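One way to read this slide is that the individual model outputs are blended with ridge regression and the blend is checked with random 10-fold cross-validation. The sketch below follows that reading on synthetic data: the three "models" are simulated as the true score plus noise, and the ridge solver is a small hand-written one so the example has no dependencies. All names and numbers are assumptions.

```python
import random


def ridge_fit(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [
        [sum(r[a] * r[b] for r in X) + (alpha if a == b else 0.0) for b in range(p)]
        + [sum(r[a] * t for r, t in zip(X, y))]
        for a in range(p)
    ]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def rmse(X, y, w):
    errs = [(sum(a * b for a, b in zip(row, w)) - t) ** 2 for row, t in zip(X, y)]
    return (sum(errs) / len(errs)) ** 0.5


def cv_rmse(X, y, alpha, k=10, seed=0):
    """Average held-out RMSE over k random folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    scores = []
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        w = ridge_fit([X[i] for i in train], [y[i] for i in train], alpha)
        scores.append(rmse([X[i] for i in fold], [y[i] for i in fold], w))
    return sum(scores) / k


rng = random.Random(1)
truth = [rng.choice([1, 2, 3, 4, 5]) for _ in range(200)]
# Three simulated model outputs with different noise levels.
preds = [[t + rng.gauss(0, s) for s in (0.6, 0.7, 0.8)] for t in truth]

single = [
    (sum((row[m] - t) ** 2 for row, t in zip(preds, truth)) / len(truth)) ** 0.5
    for m in range(3)
]
blend = cv_rmse(preds, truth, alpha=1.0)
```

Comparing blend with min(single) shows how much the combination gains over the best individual model on held-out folds.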