Pengjie Ren, Zhumin Chen and Jun Ma
Information Retrieval Lab, Shandong University
Presenter: Pengjie Ren
November 18, 2013
Understanding Temporal Intent of User Query based on Time-based Query Classification
Outline
Why Temporal Intent Detection?
Query Temporal Pattern Taxonomy
Query Pattern Detection Framework
Experiment Results
Application
Conclusion and Future Work
Why Temporal Intent Detection?
Richard McCreadie (SIGIR 2013): Users tend to prefer rankings that integrate tweets or newswire articles soon after an event breaks, while blogs and Wikipedia pages become more useful over time.
Hideo Joho (WWW 2013):
48.2% seek information about the same day on which they perform the search;
32.7% look for past information;
8.1% look for future information;
10.9% say that their information needs have no specific temporal attributes.
Automatic temporal intent detection is therefore important for time-sensitive information retrieval, temporal diversity, etc.
Different Temporal Patterns Imply Different Temporal Intents
Kulkarni et al. (WSDM 2011) found several temporal patterns of queries by mining query logs.
However, they did not propose methods to identify those patterns automatically.
In this paper, we propose an approach to identify the different temporal patterns automatically.
[Figure: query frequency curves from Google Trends]
Query Temporal Pattern Taxonomy
[Figure: query frequency curves for the example queries "Java JDK", "Haiti Earthquake", "Christmas Present", and "Earthquake"]
Clearly, we can use spikes to detect query temporal patterns.
What is a Spike?
A spike is a run of consecutive points on the query frequency curve that bursts well above the rest of the curve. It generally represents an event.
Spikes are hard to detect effectively and precisely. In particular, we found that learning one cutoff line to identify all spikes is not effective.
[Figure: spikes labeled Southeast Asia earthquake, Pakistan earthquake, China earthquake, Haiti earthquake, Japan earthquake, and Virginia earthquake]
Query Pattern Detection Framework
Query Classification System (diagram components): Query Log, Query, Query frequency curves, Training Set, Preprocess, Feature Extraction, Classifier (SVM), Query Pattern.
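(1). Preprocess
According to time series analysis, any curve contains three components:
$$F_t = m_t + s_t + Y_t$$
where $m_t$ is the Trend Component, $s_t$ is the Seasonal Component, and $Y_t$ is the Random Component.
The Seasonal and Random Components are what we care about in this paper, so we should remove the Trend Component.
We use polynomial regression to model the Trend Component:
$$m_t = \mathbf{w}^{T}\mathbf{x} + \xi$$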
$$\xi \sim \mathrm{St}(\xi \mid 0, \lambda, \nu)$$
We use a Student-t distribution instead of a Gaussian distribution because we do not have exact training pairs $(X, m_t)$; we have to use $(X, F)$ instead. The $s_t$ and $Y_t$ components therefore act as noise during training, and the Student-t distribution is more robust to noise than the Gaussian distribution.
[Figure (from PRML): Student-t and Gaussian fits, with and without noise. Without noise, both work well.]
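Log likelihood loss function:
$$L(\mathbf{w}, \lambda, \nu) = \sum_{i=1}^{n} \log \mathrm{St}(f_i \mid \mathbf{w}^{T}\mathbf{x}_i, \lambda, \nu)$$
together with an $\ell_2$ regularization term $\lVert \mathbf{w} \rVert^2$ on the weights.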
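To make the robustness argument concrete, here is a minimal sketch of the Student-t log density in the PRML parameterization, next to the Gaussian one. The function names and the parameter values in the usage note are illustrative assumptions, not taken from the slides.

```python
import math


def student_t_logpdf(x, mu, lam, nu):
    """log St(x | mu, lam, nu): lam is the precision, nu the degrees of freedom."""
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        + 0.5 * math.log(lam / (math.pi * nu))
        - (nu + 1) / 2 * math.log(1 + lam * (x - mu) ** 2 / nu)
    )


def gaussian_logpdf(x, mu, lam):
    """log N(x | mu, 1/lam), with lam the precision."""
    return 0.5 * math.log(lam / (2 * math.pi)) - 0.5 * lam * (x - mu) ** 2


def log_likelihood(values, predictions, lam, nu):
    """Sum of Student-t log densities of observed values around the predictions."""
    return sum(student_t_logpdf(f, p, lam, nu) for f, p in zip(values, predictions))
```

A point far from the prediction (for example `x = 10` with `mu = 0`) lowers the Student-t log likelihood far less than the Gaussian one, so a few spike points pull the fitted trend much less.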
[Figure: Original Query Curve, its Trend Component, and the remaining Seasonal & Random Component]
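The trend-removal step can be sketched as follows. This is a minimal sketch that assumes an ordinary least-squares polynomial fit rather than the Student-t likelihood described on the slides; the function names, the default degree, and the rescaling of time to [0, 1] are illustrative choices.

```python
def fit_polynomial(values, degree=2):
    """Least-squares polynomial fit of values against time rescaled to [0, 1].

    Assumes len(values) is at least degree + 1 (and at least 2).
    """
    n = len(values)
    ts = [i / (n - 1) for i in range(n)]
    k = degree + 1
    # Normal equations (X^T X) w = X^T f for the design matrix X[i][j] = ts[i] ** j.
    a = [[sum(t ** (r + c) for t in ts) for c in range(k)] for r in range(k)]
    b = [sum((t ** r) * f for t, f in zip(ts, values)) for r in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (b[r] - sum(a[r][c] * w[c] for c in range(r + 1, k))) / a[r][r]
    return w, ts


def remove_trend(values, degree=2):
    """Return (trend, residual); the residual keeps the seasonal and random parts."""
    w, ts = fit_polynomial(values, degree)
    trend = [sum(wj * t ** j for j, wj in enumerate(w)) for t in ts]
    residual = [f - m for f, m in zip(values, trend)]
    return trend, residual
```

With a least-squares fit, a large spike drags the trend toward itself, which is the weakness the Student-t likelihood is meant to reduce.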
(2). Feature Extraction
For preprocessed query frequency curves, we define the following features.
Basic Features: Mean, Standard Deviation, MR (Max Rate), SR (Spike Rate)
Curve Distance Features: DQoT, DOQ, DAMQ, DPMQ
Regression Features: Cutoff, Spikes, PD (Periodic Deviation)
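MR (Max Rate)
$$\mathrm{MR} = \frac{f_M}{\sum_{t} f_t}$$
where $f_M = \max(\{f_1, f_2, \ldots, f_N\})$ is the peak value of the curve.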
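SR (Spike Rate)
$$\mathrm{SR} = \frac{f_M - \max(\{f_1, f_2, \ldots, f_N\} \setminus \{f_{M-m}, \ldots, f_{M-1}, f_M, f_{M+1}, \ldots, f_{M+m}\})}{\sum_{t} f_t}$$
$$f_M = \max(\{f_1, f_2, \ldots, f_N\})$$
$m$ is half the period of a spike.
[Figure: example curves labeled MQ, OQ, and QoT]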
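The two rate features can be sketched as below. This assumes one reading of the slide formulas: MR is the peak value divided by the sum of the curve, and SR is the peak minus the largest value outside the peak's spike window, divided by the same sum. The function names are hypothetical, and the curve is assumed to be non-negative with a positive sum.

```python
def max_rate(curve):
    """MR: share of the total query volume carried by the single highest point."""
    return max(curve) / sum(curve)


def spike_rate(curve, m):
    """SR: gap between the peak and the highest point outside its spike window.

    The spike window covers indices peak - m .. peak + m, where m is the
    half-width of the spike.
    """
    peak = max(range(len(curve)), key=lambda i: curve[i])
    outside = [f for i, f in enumerate(curve) if abs(i - peak) > m]
    runner_up = max(outside) if outside else 0.0
    return (curve[peak] - runner_up) / sum(curve)
```

Under this reading, a curve with one dominant spike gets a high SR, while a curve with two comparable spikes gets an SR near zero, which is what lets SR separate one-spike queries from multi-spike ones.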
SR (Spike Rate)
How to determine the value of m?
$$\{f_{M-i}, \ldots, f_{M-1}, f_M, f_{M+1}, \ldots, f_{M+j}\} \geq r f_M$$
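One way to read this slide is that the spike around the peak extends left and right for as long as neighbouring points stay at or above a fraction r of the peak value. The sketch below follows that assumption; the comparison direction, the default `r`, and the function name are all guesses for illustration, and how `m` is then derived from the two extents is not stated on the slide.

```python
def spike_extent(curve, r=0.5):
    """Return (i, j): how far the spike reaches left and right of the peak.

    A neighbour belongs to the spike while its value is at least r * peak value.
    """
    peak = max(range(len(curve)), key=lambda k: curve[k])
    threshold = r * curve[peak]
    left = peak
    while left > 0 and curve[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(curve) - 1 and curve[right + 1] >= threshold:
        right += 1
    return peak - left, right - peak
```

A caller could, for instance, take the larger of the two extents as the half-width `m`; that choice is also an assumption.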
Distance between Two Curves
$$\mathrm{Distance}(F_1, F_2) = \min_{\alpha, q} \frac{\lVert F_1 - \alpha F_{2(q)} \rVert}{\lVert F_1 \rVert}$$
$F_{i(q)}$: the time series $F_i$ shifted by $q$ time units.
$\lVert \cdot \rVert$: the $\ell_2$ norm.
This measure finds the optimal alignment (translation $q$) and the scaling coefficient $\alpha$ for matching the shapes of the two time series. The exact optimum is difficult to find, so in practice we try all possible shifts $q$ and keep the best one as an approximate solution.
Jaewon Yang and Jure Leskovec. Patterns of temporal variation in online media. WSDM, 2011.
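The shift-and-scale distance can be sketched as follows. For each candidate shift the best scaling coefficient is taken from the least-squares closed form, and the smallest distance over all shifts is kept. Zero padding at the curve boundaries, the `max_shift` parameter, and the function names are assumptions of this sketch; both curves are assumed to have the same length and the first one a non-zero norm.

```python
import math


def shift(series, q):
    """Shift a series right by q time units (left if q < 0), padding with zeros."""
    n = len(series)
    return [series[i - q] if 0 <= i - q < n else 0.0 for i in range(n)]


def norm(vector):
    """l2 norm."""
    return math.sqrt(sum(x * x for x in vector))


def curve_distance(f1, f2, max_shift=None):
    """Return (distance, best_shift) between two equal-length curves."""
    if max_shift is None:
        max_shift = len(f1) - 1
    best, best_q = float("inf"), 0
    for q in range(-max_shift, max_shift + 1):
        g = shift(f2, q)
        gg = sum(x * x for x in g)
        # For a fixed shift, the optimal scaling is the least-squares solution.
        alpha = sum(a * b for a, b in zip(f1, g)) / gg if gg > 0 else 0.0
        d = norm([a - alpha * b for a, b in zip(f1, g)]) / norm(f1)
        if d < best:
            best, best_q = d, q
    return best, best_q
```

Two curves with the same shape but different height and position get a distance near zero, which is the invariance the measure is designed for.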
DQoT DOQ DAMQ DPMQ
DQoT: Average distance from annotated QoT curves.
DOQ: Average distance from annotated OQ curves.
DAMQ: Average distance from annotated AMQ curves.
DPMQ: Average distance from annotated PMQ curves.
Similar to KNN, but much cheaper to compute.
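The four distance features amount to one average per annotated class. The sketch below takes the distance function as a parameter; the default `l2_distance` is only a stand-in without shifting or scaling, and the names and the dictionary layout of the annotated curves are illustrative assumptions.

```python
import math


def l2_distance(f1, f2):
    """Stand-in distance: ||f1 - f2|| / ||f1||, with no shifting or scaling."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))
    return diff / math.sqrt(sum(a * a for a in f1))


def distance_features(curve, annotated, distance=l2_distance):
    """Average distance from `curve` to the annotated curves of each class.

    `annotated` maps a class label (e.g. "QoT") to a list of curves.
    """
    return {
        "D" + label: sum(distance(curve, ref) for ref in refs) / len(refs)
        for label, refs in annotated.items()
    }
```

Unlike KNN, no sorting or neighbour search is needed: each feature is a single pass over one class's annotated curves.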
Cutoff, Spikes, PD
The 8 features above are combined to learn a cutoff line:
$$\mathrm{Cutoff} = W^{T}X$$
What about training data? The (F, Cutoff) pairs are not known, so we use the annotated (F, Pattern Category) pairs to approximate them. For example, because the curve in the figure is annotated as MQ, its cutoff value lies in the pink area.
Spikes: the number of spikes.
PD: measures periodicity.
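As a rough illustration of how a cutoff could turn into the Spikes feature, the sketch below computes the cutoff as a weighted sum of features and counts maximal runs of consecutive points above it. Treating a spike as a run strictly above the cutoff is an assumption of this sketch, and the function names are hypothetical.

```python
def cutoff_value(weights, features):
    """Cutoff = W^T X: linear combination of the feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def count_spikes(curve, cutoff):
    """Count maximal runs of consecutive points strictly above the cutoff."""
    spikes = 0
    inside = False
    for value in curve:
        above = value > cutoff
        if above and not inside:
            spikes += 1  # a new run starts here
        inside = above
    return spikes
```

With this reading, a curve annotated as a multi-spike query constrains the cutoff to a band in which the count is at least two.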
Experiment Results
5,000 queries from Query Track 07-09 of TREC.
Corresponding query frequency files from Google Trends.
Categories of these queries are manually annotated according to their frequency curves.
5-fold cross-validation.
Classification Performance Comparison for Different Query Categories
Query Class   QoT     OQ      AMQ     PMQ     Average
P             0.952   0.928   0.846   0.914   0.910
R             0.973   0.915   0.831   0.924   0.911
F1            0.962   0.922   0.838   0.919   0.910
[Figure panels: AMQ, PMQ, QoT, OQ]
Feature Effectiveness Analysis
Application – Temporal Diversity
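Because the temporal intent of a user query is uncertain, we should diversify the search results along the time dimension in order to cover more of the query's important time units.
$$P(S \mid q) = \sum_{t \in T} P(t \mid q) \sum_{z \in Z} P(z \mid q) \Bigl(1 - \prod_{d \in S} \bigl(1 - P(d \mid q, t, z)\bigr)\Bigr)$$
Temporal Intent Coverage: $P(t \mid q)$
Subtopic Coverage: $P(z \mid q)$
Novelty: $1 - \prod_{d \in S} (1 - P(d \mid q, t, z))$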
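The objective can be evaluated directly for a candidate result set. The sketch below computes the score for a set of documents and adds a simple greedy selection loop; the greedy procedure is an assumption (the slides do not say how the objective is optimized), and the dictionary-based probability tables and function names are illustrative.

```python
def diversity_score(selected, p_t, p_z, p_d):
    """P(S|q): temporal intent coverage x subtopic coverage x novelty.

    p_t maps time unit -> P(t|q), p_z maps subtopic -> P(z|q), and
    p_d maps (doc, time unit, subtopic) -> P(d|q,t,z); missing entries count as 0.
    """
    total = 0.0
    for t, pt in p_t.items():
        for z, pz in p_z.items():
            miss = 1.0  # probability that no selected document satisfies (t, z)
            for d in selected:
                miss *= 1.0 - p_d.get((d, t, z), 0.0)
            total += pt * pz * (1.0 - miss)
    return total


def greedy_select(candidates, k, p_t, p_z, p_d):
    """Pick k documents, each time adding the one that raises the score most."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda d: diversity_score(selected + [d], p_t, p_z, p_d),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a small example with two equally likely time units, the second pick goes to a document covering the uncovered time unit even if another document is individually stronger on the already covered one.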
Compared methods:
MMR (SIGIR '98)
xQuAD (WWW '10)
IA-Select (WSDM '09)
LM+T+D (SIGIR '13)
RM+T+S+D (our method)
Conclusion
We cast temporal intent detection as a classification problem.
We propose features that detect temporal intents effectively.
We apply the detected temporal intents to temporal diversity and achieve high performance.
Future Work
More effective features
The data sparsity problem for long queries
Thanks a lot for your attention!