Time is of the Essence: Improving Recency Ranking Using Twitter Data (WWW 2010)
Anlei Dong, Ruiqiang Zhang, Pranam Kolari, Jing Bai, Fernando Diaz, Yi Chang, Zhaohui Zheng
Presenter: ChinHui Chen (陳晉暉)
Author
• Anlei Dong
• Area: Yahoo! Search Sciences
• Location: Yahoo! Labs Silicon Valley
Agenda
• Introduction
• Motivation
• Method
• Experiment
• Discussion
Introduction
• Recency-sensitive queries
– ex: earthquake -> results must be relevant and timely
• Problem:
– 1. Zero-recall problem
– 2. The user's need for relevant content is immediate
• Use a micro-blogging site
Motivation
• General web search algorithm:
– match signals (Language/VSM Model)
– query-independent signals (PageRank)
• But when we issue recency-sensitive queries:
– Fresh docs may have very few in-links.
– Fresh docs may have very few clicks.
Motivation (cont'd)
• So how?
Methods
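• Learning to rank
• What?
– Think of it as a black box.
– Feed it the features of many query-doc pairs.
– It trains a model.
– Later, for an unseen query, extract the features and the model can rank the docs.
[Diagram labels: Feature, Score, Prediction]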
Methods (cont'd)
• So the goal is:
– Query: 章魚哥 ("Squidward")
– Ranking list containing both Regular URLs and Fresh URLs
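Methods (cont'd)
• So the goal is:
[Diagram: Train vs. Predict]
– Train (ground truth): Query: 章魚哥 ("Squidward"). Features are extracted from its Regular URLs and Fresh URLs, and the labels are known (shown as 3514 and 4425 in the figure).
– Predict (test): Query: a new query. Features are extracted from its Regular URLs and Fresh URLs, and the scores are unknown (shown as ???? in the figure).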
Methods (cont'd)
• Therefore, the main steps:
– 1. Extract features.
– 2. Apply learning to rank.
Methods – FeatureSet
• Content Features
– Functions of the content of the doc (ex. query term match, …)
• Aggregate Features
– A doc's long-term popularity and usage (in-link stats, clicks, PageRank, …)
• Twitter Features
Methods – FeatureSet (cont'd)
• Content Features
• Aggregate Features
• Twitter Features
– Textual Features
– Social Network Features
– Other Features
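Methods – FeatureSet (cont'd)
• Twitter - Textual Features (q vs. URL)
– Goal: represent a URL as text, so that its cosine similarity with the query can be computed.
– M: m posts × w URLs
– D: m posts × v words
– A URL is represented by a combination of the contents of the Twitter posts.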
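A minimal sketch of this idea, assuming M is a binary "post mentions URL" matrix and D holds word counts per post, so that a URL's text vector is the sum of the word counts of the posts that mention it (the product M^T D). The toy tweets, the URL ids, and the helper names `url_term_vectors` and `cosine` are hypothetical and only illustrate the slide; they are not taken from the paper.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: each tweet is (text, list of URL ids it mentions).
tweets = [
    ("big earthquake hits chile", ["u1"]),
    ("chile earthquake photos", ["u1", "u2"]),
    ("new phone review", ["u3"]),
]

def url_term_vectors(tweets):
    """Sum the word counts of every tweet that mentions a URL (rows of M^T D)."""
    vectors = {}
    for text, urls in tweets:
        counts = Counter(text.split())
        for url in urls:
            vectors.setdefault(url, Counter()).update(counts)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vectors = url_term_vectors(tweets)
query = Counter("chile earthquake".split())
scores = {url: cosine(query, vec) for url, vec in vectors.items()}
```

With this toy data, the URL mentioned in both earthquake tweets scores highest for the query, and the unrelated URL scores zero.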
Methods – FeatureSet (cont'd)
• Twitter - Textual Features
– Goal: terms that do not match should be penalized.
PS:
Methods – FeatureSet (cont'd)
• Twitter - Textual Features
– Goal: phrase
Methods – FeatureSet (cont'd)
• Summary - Textual Features (q vs. URL)
Methods – FeatureSet (cont'd)
• Twitter - Social Network Features
– A user i posted URL j
Methods – FeatureSet (cont'd)
• Twitter - Other Features (listed on the next page)
– Average statistics of the users who issued the tiny URL.
– The first user who issued the tiny URL.
– The user with the highest Twitter score among those who issued the tiny URL.
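As a rough illustration of these three user-based aggregates, the sketch below assumes each post of a tiny URL is recorded as a (user, post time, user score) tuple. The record layout, the single numeric `user_score`, and the feature names are hypothetical stand-ins for whatever user statistics the authors actually use.

```python
from statistics import mean

# Hypothetical records for one tiny URL: (user, post_time, user_score).
posts = [
    ("alice", 10, 0.9),
    ("bob", 5, 0.2),
    ("carol", 7, 0.4),
]

def user_features(posts):
    """Aggregate user statistics over everyone who issued the tiny URL."""
    first = min(posts, key=lambda p: p[1])  # earliest poster
    top = max(posts, key=lambda p: p[2])    # poster with the highest score
    return {
        "avg_user_score": mean(p[2] for p in posts),
        "first_user_score": first[2],
        "top_user_score": top[2],
    }

features = user_features(posts)
```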
Methods – FeatureSet (cont'd)
                                    Regular URL   Twitter URL
Content Features (ex. term info)    Yes           Yes
Aggregate Features (ex. PageRank)   Yes           Poor
Twitter Features                    No            Yes
Methods – Ranking
• We have the feature set now.
• How does learning to rank work?
– 1. Build Relevance Model (Training)
– 2. Predict ranking (Prediction)
Methods – Ranking (cont'd)
• 1. Build Relevance Model:
– Train a model that predicts whether a query and a URL are relevant.
– Straightforward:
  • 1. Sample query-URL pairs (regular + Twitter) and label them.
  • 2. Train a ranking function (RankSVM, RankBoost, GBrank, RankNet, …)
• But … regular URLs far outnumber Twitter URLs (only Twitter URLs have Twitter features, so those features would be ignored).
Methods – Ranking (cont'd)
• 1. Build Relevance Model:
– Modified:
– M represents a ranking function
– D represents a data set
– F represents a feature set
– TRAIN-MLR(D, F): train on D using F
– PREDICT(D, M): score data set D using M
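To make the notation concrete, here is a minimal sketch of the two primitives only. It assumes D is a list of (feature dict, label) pairs and F is a list of feature names, and it swaps in a plain linear scorer trained by gradient descent for the real ranking learner (GBrank and similar). The feature names, labels, and hyperparameters are hypothetical, and how the modified procedure combines these calls is not shown here.

```python
def train_mlr(D, F, epochs=2000, lr=0.05):
    """TRAIN-MLR(D, F): fit a linear scorer on D using only the features in F.
    A linear model is a stand-in learner, not the method used by the authors."""
    w = {f: 0.0 for f in F}
    b = 0.0
    for _ in range(epochs):
        for x, y in D:
            err = b + sum(w[f] * x.get(f, 0.0) for f in F) - y
            b -= lr * err
            for f in F:
                w[f] -= lr * err * x.get(f, 0.0)
    return {"w": w, "b": b}

def predict(D, M):
    """PREDICT(D, M): score every example in D with model M."""
    return [M["b"] + sum(wf * x.get(f, 0.0) for f, wf in M["w"].items())
            for x, _ in D]

# Hypothetical Twitter-URL training data: (features, relevance label).
D_twitter = [
    ({"content_match": 1.0, "tweet_cosine": 1.0}, 3.0),
    ({"content_match": 0.0, "tweet_cosine": 1.0}, 2.0),
    ({"content_match": 1.0, "tweet_cosine": 0.0}, 1.0),
]
F_twitter = ["content_match", "tweet_cosine"]

M_twitter = train_mlr(D_twitter, F_twitter)
scores = predict(D_twitter, M_twitter)
```

Passing F explicitly makes it easy to train on a feature subset, which matters when some features exist only for one kind of URL.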
Methods – Ranking (cont'd)
• 2. Prediction Ranking
– Straightforward:
  • 1. Apply the model from step 1 to the regular and Twitter URLs.
  • 2. Rank the URLs by sorting their scores.
[Figure labels: Regular, Twitter]
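The sorting step can be sketched as follows, assuming both kinds of URL have already been scored and arrive as (url, score) pairs. The function name `rank_urls` and the example scores are hypothetical.

```python
def rank_urls(scored_regular, scored_twitter):
    """Merge (url, score) pairs from both sources and sort by descending score."""
    merged = scored_regular + scored_twitter
    return [url for url, _ in sorted(merged, key=lambda p: p[1], reverse=True)]

# Hypothetical scores for one query.
ranking = rank_urls([("r1", 0.4), ("r2", 0.9)], [("t1", 0.7)])
```

Here the Twitter URL t1 lands between the two regular URLs because only the scores decide the order.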
Methods – Ranking (cont'd)
Experiments
• Dataset:
– Queries: only time-sensitive queries issued within one hour are considered.
– Regular URLs: URLs in the search engine index during the one-hour period.
– Twitter URLs: URLs from the 9-hour period before the query time.
Experiments (cont'd)
• Label:
– Query-URL pairs: perfect, excellent, good, fair, bad.
– Documents: promote fresh ones, demote out-of-date ones.
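One way to picture the promote/demote rule is to shift the relevance grade by one step depending on freshness. The one-step shift, the freshness categories, and the function name `adjust_grade` are assumptions made for illustration; the slide does not state the exact rule.

```python
GRADES = ["bad", "fair", "good", "excellent", "perfect"]

def adjust_grade(grade, freshness):
    """Shift a grade up for fresh docs and down for out-of-date docs.
    The one-step shift is an assumed rule, clipped to the grade scale."""
    shift = {"fresh": 1, "normal": 0, "out_of_date": -1}[freshness]
    index = GRADES.index(grade) + shift
    return GRADES[max(0, min(len(GRADES) - 1, index))]

promoted = adjust_grade("good", "fresh")
demoted = adjust_grade("fair", "out_of_date")
```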
Experiments (cont'd)
Experiments (cont'd)
• Evaluation:
Experiments (cont'd)
• Q:
Experiments (cont'd)
• Mregular = content + aggregate
• Mcontent = content only
• Mtwitter = twitter only

                                    Regular   Twitter
Content Features (ex. term info)    Yes       Yes
Aggregate Features (ex. PageRank)   Yes       Poor
Twitter Features                    No        Yes

• Result:
Experiments (cont'd)
• Feature Importance
– Authority and activity of users
Q&A