Page 1: Time is of the Essence : Improving Recency Ranking Using Twitter Data

Time is of the Essence: Improving Recency Ranking Using Twitter Data (WWW 2010)

Anlei Dong, Ruiqiang Zhang, Pranam Kolari, Jing Bai, Fernando Diaz, Yi Chang, Zhaohui Zheng

Presenter: ChinHui Chen (陳晉暉)

Author

• Anlei Dong
• Area: Yahoo! Search Sciences
• Location: Yahoo! Labs Silicon Valley

Agenda

• Introduction
• Motivation
• Method
• Experiment
• Discussion

Introduction

• Recency-sensitive queries
ex: earthquake -> results must be relevant and timely

• Problems:
1. Zero-recall problem
2. The user's need for relevant content is immediate

• Use a micro-blogging site

Motivation

• General web search algorithm:
– match signals (language model / VSM)
– query-independent signals (PageRank)

• But when we issue recency-sensitive queries:
– Fresh docs may have very few in-links.
– Fresh docs may have very few clicks.

Motivation (cont'd)

• So how do we rank fresh docs?

Methods

• Learning to rank
• What is it?
– Think of it as a black box
– Feed it the features of many query-doc pairs
– It trains a model
– For an unseen query, extract the features and the model can rank the docs

[Diagram: Feature -> Score -> Prediction]

Methods (cont'd)

• So the goal is:

Query: 章魚哥 (Paul the Octopus)
Ranking list: Regular URL, Fresh URL

Methods (cont'd)

• So the goal is:

Train (ground truth): Query: 章魚哥 (Paul the Octopus) -> Regular URL, Fresh URL -> extract features -> labels 3514, 4425
Predict (test): Query: a new query -> Regular URL, Fresh URL -> extract features -> labels ????, ????

Methods (cont'd)

• Therefore, the main steps:
1. Extract features.
2. Apply learning to rank.
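
The two steps above can be sketched as follows. This is a minimal pointwise sketch with a toy linear scorer fitted by gradient descent, not the ranking function used in the paper; the feature names (term_match, freshness), the grades, and the URL names are all invented for illustration.

```python
# Minimal pointwise learning-to-rank sketch (toy data, hypothetical feature names).

def train(pairs, feature_names, epochs=2000, lr=0.05):
    """Fit a linear scoring function to graded query-URL pairs by gradient descent."""
    weights = {name: 0.0 for name in feature_names}
    bias = 0.0
    for _ in range(epochs):
        for features, grade in pairs:
            predicted = bias + sum(weights[n] * features[n] for n in feature_names)
            error = predicted - grade
            bias -= lr * error
            for n in feature_names:
                weights[n] -= lr * error * features[n]
    return weights, bias


def score(model, features):
    """Score one query-URL pair with the trained linear model."""
    weights, bias = model
    return bias + sum(w * features[n] for n, w in weights.items())


def rank(model, candidates):
    """Sort (url, features) candidates by descending model score."""
    return sorted(candidates, key=lambda c: score(model, c[1]), reverse=True)


# Step 1: extract features for labeled query-URL pairs (values are made up).
FEATURES = ["term_match", "freshness"]
training_pairs = [
    ({"term_match": 0.9, "freshness": 0.8}, 4.0),
    ({"term_match": 0.8, "freshness": 0.1}, 2.0),
    ({"term_match": 0.2, "freshness": 0.9}, 1.0),
    ({"term_match": 0.1, "freshness": 0.1}, 0.0),
]

# Step 2: apply learning to rank, then rank the URLs of an unseen query.
model = train(training_pairs, FEATURES)
candidates = [
    ("url_stale", {"term_match": 0.85, "freshness": 0.05}),
    ("url_fresh", {"term_match": 0.85, "freshness": 0.90}),
    ("url_off_topic", {"term_match": 0.05, "freshness": 0.50}),
]
ranking = [url for url, _ in rank(model, candidates)]
```

Any learner that maps a feature vector to a score could replace the linear model here; only the extract-train-score-sort flow is the point.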

Methods – FeatureSet

• Content Features
– Functions of the content of the doc (ex. query term match, …)
• Aggregate Features
– A doc's long-term popularity and usage (in-link stats, clicks, PageRank, …)
• Twitter Features

Methods – FeatureSet (cont'd)

• Content Features
• Aggregate Features
• Twitter Features
– Textual Features
– Social Network Features
– Other Features

Methods – FeatureSet (cont'd)

• Twitter - Textual Features (q vs. URL)
Goal: represent each URL as text so that cosine similarity can be computed

M: m posts × w URLs
D: m posts × v words

Represent a URL by a combination of Twitter contents
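
A minimal sketch of this idea, assuming each URL is represented by the summed word counts of the posts that mention it (equivalent to multiplying the transpose of the post-by-URL matrix M with the post-by-word matrix D; that product is an assumption, since the slide's formula did not survive). The posts, URLs, and whitespace tokenizer are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def url_term_vectors(posts):
    """Represent each URL by the summed word counts of the posts mentioning it.

    posts: list of (post_text, [urls mentioned in that post]).
    """
    vectors = {}
    for text, urls in posts:
        words = Counter(tokenize(text))
        for url in urls:
            vectors.setdefault(url, Counter()).update(words)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy posts; the URLs are placeholders, not real tiny URLs.
posts = [
    ("earthquake hits the coast", ["http://example.com/a"]),
    ("big earthquake near the coast today", ["http://example.com/a"]),
    ("new phone released today", ["http://example.com/b"]),
]
vectors = url_term_vectors(posts)
query = Counter(tokenize("earthquake coast"))
sim_a = cosine(query, vectors["http://example.com/a"])
sim_b = cosine(query, vectors["http://example.com/b"])
```

Here the URL shared in the two earthquake posts gets a higher similarity to the query than the unrelated URL, which is the signal this textual feature is meant to capture.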

Methods – FeatureSet (cont'd)

• Twitter - Textual Features
Goal: terms that do not match should be penalized

PS:

Methods – FeatureSet (cont'd)

• Twitter - Textual Features
Goal: phrase matching

Methods – FeatureSet (cont'd)

• Summary - Textual Features (q vs. URL)

Methods – FeatureSet (cont'd)

• Twitter - Social Network Features

A user i posted URL j

Methods – FeatureSet (cont'd)

• Twitter - Other Features
– See the next page

Average stats of the users who issued the tiny URL.

The first user who issued the tiny URL.

The user with the highest Twitter score who issued the tiny URL.
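
A minimal sketch of how these three user-based features could be computed for one tiny URL. The per-user "Twitter score", the timestamps, and the feature names are hypothetical; the original feature table did not survive extraction, so this only illustrates the average / first user / best user pattern.

```python
def user_features_for_url(postings, user_score):
    """Aggregate user statistics for one tiny URL.

    postings: list of (timestamp, user) for every post that issued the URL.
    user_score: mapping user -> Twitter score (a hypothetical per-user statistic).
    """
    ordered = sorted(postings)  # earliest post first
    scores = [user_score[user] for _, user in ordered]
    return {
        "avg_user_score": sum(scores) / len(scores),  # average over all issuing users
        "first_user_score": scores[0],                # the first user who issued the URL
        "max_user_score": max(scores),                # the issuing user with the highest score
    }


# Toy data: three users posted the same tiny URL at different times.
user_score = {"alice": 0.9, "bob": 0.2, "carol": 0.4}
postings = [(1005, "alice"), (1001, "bob"), (1010, "carol")]
features = user_features_for_url(postings, user_score)
```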

Methods – FeatureSet (cont'd)

                                     Regular URL   Twitter URL
Content Features (ex. term info)     O             O
Aggregate Features (ex. PageRank)    O             poor
Twitter Features                     No            Yes

Methods – Ranking

• We have the feature set now.
• How does learning to rank work?

1. Build a relevance model (Training)
2. Predict the ranking (Prediction)

Methods – Ranking (cont'd)

• 1. Build Relevance Model:
– Train a model of whether a query and a URL are relevant.

– Straightforward:
• 1. Sample query-URL pairs (regular + Twitter) and label them.
• 2. Train a ranking function (RankSVM, RankBoost, GBrank, RankNet, …)

• But regular URLs vastly outnumber Twitter URLs (only Twitter URLs have Twitter features, so those features would be ignored)

Methods – Ranking (cont'd)

• 1. Build Relevance Model:
– Modified:

M represents a ranking function
D represents a data set
F represents a feature set
TRAIN-MLR(D, F): train a ranking function on D using F
PREDICT(D, M): score data set D using M
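
A minimal sketch of how TRAIN-MLR and PREDICT could be combined so that the Twitter features are not drowned out by the regular URLs. The slide's own algorithm figure did not survive extraction, so the composition below (separate models per data set, with the content-only score passed to the Twitter model as an extra feature) is an assumption, and the linear learner, feature names, and data are stand-ins.

```python
def train_mlr(data, features, epochs=500, lr=0.02):
    """TRAIN-MLR(D, F): fit a ranking function on data set D using feature set F.

    Stand-in learner: linear least squares via stochastic gradient descent.
    """
    weights = {f: 0.0 for f in features}
    bias = 0.0
    for _ in range(epochs):
        for row in data:
            error = bias + sum(weights[f] * row[f] for f in features) - row["label"]
            bias -= lr * error
            for f in features:
                weights[f] -= lr * error * row[f]
    return {"weights": weights, "bias": bias}


def predict(data, model):
    """PREDICT(D, M): score every row of data set D with ranking function M."""
    return [
        model["bias"] + sum(w * row[f] for f, w in model["weights"].items())
        for row in data
    ]


# Hypothetical feature sets F and toy data sets D.
F_CONTENT = ["term_match"]
F_AGGREGATE = ["pagerank"]
F_TWITTER = ["tweet_cosine", "user_authority"]

d_regular = [  # regular URLs: no Twitter features
    {"term_match": 0.9, "pagerank": 0.8, "label": 4.0},
    {"term_match": 0.7, "pagerank": 0.3, "label": 3.0},
    {"term_match": 0.4, "pagerank": 0.6, "label": 2.0},
    {"term_match": 0.1, "pagerank": 0.2, "label": 0.0},
]
d_twitter = [  # Twitter URLs: aggregate features are poor, so they are left out
    {"term_match": 0.8, "tweet_cosine": 0.9, "user_authority": 0.7, "label": 4.0},
    {"term_match": 0.6, "tweet_cosine": 0.2, "user_authority": 0.1, "label": 1.0},
    {"term_match": 0.2, "tweet_cosine": 0.1, "user_authority": 0.5, "label": 0.0},
]

m_regular = train_mlr(d_regular, F_CONTENT + F_AGGREGATE)
m_content = train_mlr(d_regular, F_CONTENT)

# Assumption: the content-only score becomes one extra feature of the Twitter model.
for row, content_score in zip(d_twitter, predict(d_twitter, m_content)):
    row["content_score"] = content_score
m_twitter = train_mlr(d_twitter, F_TWITTER + ["content_score"])
twitter_scores = predict(d_twitter, m_twitter)
```

Training the Twitter model only on Twitter URLs means its features are present in every training row instead of being missing for most of them.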

Methods – Ranking (cont'd)

• 2. Predict Ranking
– Straightforward:
1. Apply the models from step 1 to the regular/Twitter URLs.
2. Rank the URLs by sorting their scores.

[Diagram: Regular | Twitter]
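
The straightforward prediction step can be sketched as below. The two scoring functions are hypothetical stand-ins for the trained models (their weights are made up), and the sketch assumes the scores of the two models are comparable enough to be sorted in one list.

```python
def rank_urls(regular_urls, twitter_urls, score_regular, score_twitter):
    """Score each URL with the model for its type, then sort all URLs by score."""
    scored = [(score_regular(features), url) for url, features in regular_urls]
    scored += [(score_twitter(features), url) for url, features in twitter_urls]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]


# Stand-in scoring functions with invented weights (not trained models).
def score_regular(features):
    return 2.0 * features["term_match"] + 1.0 * features["pagerank"]


def score_twitter(features):
    return 2.0 * features["term_match"] + 1.5 * features["tweet_cosine"]


regular_urls = [
    ("regular_1", {"term_match": 0.9, "pagerank": 0.6}),
    ("regular_2", {"term_match": 0.4, "pagerank": 0.9}),
]
twitter_urls = [
    ("twitter_1", {"term_match": 0.8, "tweet_cosine": 0.9}),
]
ranking = rank_urls(regular_urls, twitter_urls, score_regular, score_twitter)
```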

Methods – Ranking (cont'd)

Experiments

• Dataset:
– Queries:
• only time-sensitive queries within one hour are considered.
– Regular URLs:
• URLs in the search engine index during that hour.
– Twitter URLs:
• URLs from the 9-hour period before the query time.

Experiments (cont'd)

• Label:
– query-URL pairs: perfect, excellent, good, fair, bad.
– documents: promote fresh ones, demote out-of-date ones.
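
A minimal sketch of how a recency judgment could adjust the relevance grade. The five grades come from the slide; the recency categories and the one-grade shift are assumptions made for illustration, not the labeling guideline used in the experiments.

```python
# Relevance grades from worst to best, as listed on the slide.
GRADES = ["bad", "fair", "good", "excellent", "perfect"]

# Assumption: a fresh doc moves up one grade, an out-of-date doc moves down one.
RECENCY_SHIFT = {"fresh": 1, "neutral": 0, "out_of_date": -1}


def adjust_grade(relevance, recency):
    """Promote or demote a relevance grade by recency, clamped to the grade scale."""
    index = GRADES.index(relevance) + RECENCY_SHIFT[recency]
    index = max(0, min(len(GRADES) - 1, index))
    return GRADES[index]


promoted = adjust_grade("good", "fresh")
demoted = adjust_grade("good", "out_of_date")
```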

Experiments (cont'd)

Experiments (cont'd)

Evaluation:

Experiments (cont'd)

• Q :

Experiments (cont'd)

M_regular = content + aggregate features
M_content = content features only
M_twitter = Twitter features only

                                     Regular   Twitter
Content Features (ex. term info)     O         O
Aggregate Features (ex. PageRank)    O         poor
Twitter Features                     No        Yes

• Result:

Experiments (cont'd)

• Feature Importance

Authority & activity of users

Q&A