Time is of the Essence : Improving Recency Ranking Using Twitter Data

Time is of the Essence : Improving Recency Ranking Using Twitter Data (WWW 2010)

Anlei Dong, Ruiqiang Zhang, Pranam Kolari, Jing Bai, Fernando Diaz, Yi Chang, Zhaohui Zheng

Presenter : ChinHui Chen ( 陳晉暉 )

Author

• Anlei Dong
• Area: Yahoo! Search Sciences
• Location: Yahoo! Labs Silicon Valley

Agenda

• Introduction
• Motivation
• Method
• Experiment
• Discussion

Introduction

• Recency Sensitive Queries
ex: earthquake -> relevant, timely

• Problem :
1. zero-recall problem
2. user’s need for relevant content is immediate

• Use Micro-Blogging site

Motivation

• General web search algorithm :
– match signals (Language/VSM Model)
– query-independent signals (PageRank)

• But when we issue recency sensitive queries
– Fresh docs may have very few in-links.
– Fresh docs may have very few clicks.

Motivation (cont'd)

• So how ?

Methods

• Learning to rank
• What ?
– Think of it as a black box
– Give it the features of a set of query-doc pairs
– It trains a model
– For an unseen query, extract the features and the model ranks the docs

Feature

Score

Prediction
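The black-box view above can be sketched in a few lines. This is a minimal pointwise illustration only, not the ranking function used in the paper: the linear scorer, the squared-error training loop, the two toy features (term match, freshness) and all names are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (feature vector, relevance label)


def train_pointwise(examples: List[Example], epochs: int = 500, lr: float = 0.05):
    """Fit a linear scorer w.x + b by stochastic gradient descent on squared error."""
    n = len(examples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def score(model, x: Sequence[float]) -> float:
    """Score one query-doc feature vector with the trained model."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def rank(model, docs):
    """docs: list of (doc_id, features). Return doc ids sorted by descending score."""
    ordered = sorted(docs, key=lambda d: score(model, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ordered]


# Toy training set (hypothetical): features are [term_match, freshness].
train = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0), ([0.0, 0.0], 0.0)]
model = train_pointwise(train)

# Unseen query: extract the same features for each candidate doc, then sort.
candidates = [("a", [0.0, 0.0]), ("b", [1.0, 1.0]), ("c", [1.0, 0.0])]
ranking = rank(model, candidates)
```

Training consumes labeled query-doc feature vectors, and prediction only needs the features of the new candidates.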

Methods (cont'd)

• So the goal is :

Query : 章魚哥
Ranking List

Regular URL

Fresh URL

Methods (cont'd)

• So the goal is :

Train :
Query : 章魚哥
Regular URL / Fresh URL -> extract features
Ground-truth labels : 3514 / 4425

Predict (test) :
Query : a new query
Regular URL / Fresh URL -> extract features
Scores : ???? / ????

Methods (cont'd)

• Therefore, the main steps :
1. Extract Features.
2. Apply Learning to rank.

Methods – FeatureSet

• Content Features
– Functions of the content of the doc (ex. query term match, …)
• Aggregate Features
– A doc’s long term popularity, usage (in-link stat, clicks, PageRank, …)
• Twitter Features

Methods – FeatureSet (cont'd)

• Content Features
• Aggregate Features
• Twitter Features
– Textual Features
– Social Network Features
– Other Features


• Twitter - Textual Features (q vs url)
Goal : represent a URL as text, so that its cosine similarity with the query can be computed

M : m posts × w URLs

D : m posts × v words

Represents a URL by combination of twitter contents
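A minimal sketch of this idea, assuming that "combination of twitter contents" means summing the word counts of every post that contains the URL (in matrix terms, the rows of MᵀD, with M treated as a binary post-URL incidence matrix). That reading, the whitespace tokenizer, the sample posts and all names are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def url_term_vectors(posts):
    """posts: list of (text, [urls]). Return url -> Counter of words from its posts."""
    vectors = defaultdict(Counter)
    for text, urls in posts:
        words = text.lower().split()
        for url in set(urls):  # binary incidence: count each post once per URL
            vectors[url].update(words)
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical posts: (post text, URLs mentioned in the post).
posts = [
    ("earthquake hits chile", ["u1"]),
    ("chile earthquake news", ["u1", "u2"]),
    ("cute cat video", ["u3"]),
]
vectors = url_term_vectors(posts)
query = Counter("chile earthquake".split())
sim_u1 = cosine(query, vectors["u1"])
sim_u3 = cosine(query, vectors["u3"])
```

A URL that never appears in a post about the query terms gets similarity 0, which is why the later slides add penalties for unmatched terms and phrase-level features.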


• Twitter - Textual Features
Goal : terms with no match should be penalized

PS:


• Twitter - Textual Features
Goal : phrase


• Summary - Textual Features (q vs url)


• Twitter - Social Network Features

A user i posted URL j


• Twitter - Other Features
– The next page

Avg stat of users who issued the tiny url.

First user who issued the tiny url.

Issued the tiny url with highest Twitter score


                                   Regular URL   Twitter URL
Content Features (ex. term info)   O             O
Aggregate Features (ex. PageRank)  O             poor
Twitter Features                   No            Have

Methods – Ranking

• We have the feature set now.
• How does Learning To Rank work ?

1. Build Relevance Model (Training)
2. Predict ranking (Prediction)

Methods – Ranking (cont'd)

• 1. Build Relevance Model :
– Train a model of whether a query and a url are relevant.

– Straightforward :
• 1. sample query-url pairs (regular + twitter), and label them.
• 2. train a ranking function (RankSVM, RankBoost, GBrank, RankNet, …)

• But … regular urls far outnumber twitter urls (only twitter urls have twitter features, so those features would be ignored)


• 1. Build Relevance Model :
– Modified :

M represents ranking function
D represents data set
F represents feature set
TRAIN-MLR(D, F) : train on D using F
PREDICT(D, M) : score data set D using M


• 2. Prediction Ranking
Straightforward :
1. apply the step 1 model to regular/twitter urls.
2. rank urls by sorting scores.

Regular Twitter
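The two prediction steps can be sketched as follows. The per-type scoring functions here are hypothetical stand-ins (simple weighted sums) for whatever model the training step produced, and the feature names and URLs are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
Model = Callable[[Features], float]


def predict_ranking(
    regular: List[Tuple[str, Features]],
    twitter: List[Tuple[str, Features]],
    model_regular: Model,
    model_twitter: Model,
) -> List[str]:
    """Score both URL types, then merge them into one list sorted by score."""
    scored = [(model_regular(f), url) for url, f in regular]
    scored += [(model_twitter(f), url) for url, f in twitter]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]


# Hypothetical stand-in models: regular URLs have aggregate features,
# Twitter URLs have Twitter features instead.
def model_regular(f: Features) -> float:
    return 0.5 * f["content"] + 0.5 * f["aggregate"]


def model_twitter(f: Features) -> float:
    return 0.5 * f["content"] + 0.5 * f["twitter"]


regular_urls = [
    ("r1", {"content": 0.9, "aggregate": 0.8}),
    ("r2", {"content": 0.2, "aggregate": 0.1}),
]
twitter_urls = [("t1", {"content": 0.7, "twitter": 0.9})]

merged = predict_ranking(regular_urls, twitter_urls, model_regular, model_twitter)
```

Merging by score only makes sense if the scores of the two URL types are comparable, which is the concern behind the modified training procedure.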


Experiments

• Dataset :
– Queries :
• only consider time-sensitive queries in one hour.

– Regular URL :
• in the search engine index during one hour.

– Twitter URL :
• 9-hour period before the query time.

Experiments (cont'd)

• Label :
– query-url pairs: perfect, excellent, good, fair, bad.
– documents : promote fresh ones, demote out-of-date ones.
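One way to picture the labeling rule is as a grade shift. The slide only says that fresh documents are promoted and out-of-date ones demoted; the one-grade step, the freshness label names and the function below are assumptions for illustration, not the judging guidelines used in the paper.

```python
# Relevance grades from the slide, ordered from worst to best.
GRADES = ["bad", "fair", "good", "excellent", "perfect"]


def adjust_grade(grade: str, freshness: str) -> str:
    """Shift a relevance grade by one step according to a (hypothetical) freshness label."""
    i = GRADES.index(grade)
    if freshness == "fresh":
        i += 1  # promote fresh documents
    elif freshness == "out_of_date":
        i -= 1  # demote out-of-date documents
    i = max(0, min(len(GRADES) - 1, i))  # stay inside the grade scale
    return GRADES[i]


promoted = adjust_grade("good", "fresh")
demoted = adjust_grade("good", "out_of_date")
```

Any other freshness value leaves the grade unchanged, and shifts are clamped at "bad" and "perfect".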

Experiments (cont'd)

Experiments (cont'd)
Evaluation :


• Q :

Experiments (cont'd)

Mregular = content + aggregate
Mcontent = content only
Mtwitter = twitter only

                                  Regular   Twitter
Content Feature (ex. term info)   O         O
Aggregate (ex. PageRank)          O         poor
Twitter Features                  No        Have

• Result:


• Feature Importance

Authority & activity of users

Q&A
