Upload
jin-young-kim
View
5.167
Download
7
Embed Size (px)
Citation preview
HUMANE INFORMATION SEEKING:
GOING BEYOND THE IR WAY
JIN YOUNG KIM @ SNU DCC
1
Jin Young Kim
• Graduate of SNU EE / Business
• 5th Year Ph.D Student in UMass Computer Science
• Starting as a Applied Researcher at Microsoft Bing
2
Today’s Agenda
• A brief introduction of IR as a research area
• An example of how we design a retrieval model
• Other research projects and recent trends in IR
3
BACKGROUND
An Information Retrieval Primer
4
Information Retrieval?
• The study of how an automated system can enable
its users to access, interact with, and make sense of
information.
5
User
Query
DocumentVisit
IssueSurface
IR Research in Context
• Situated between human interface and
system / analytics research
• Aims at satisfying user’s information needs
• Based on large-scale system infrastructure & analytics
• Need for convergence research!
6
Information Retrieval
Large-scale
System Infra.
Large-scale
(Text)Analytic
s
End-user Interface
(UX / HCI / InfoViz)
Major Problems in IR
• Matching
• (Keyword) Search : query – document
• Personalized Search : (user+query) – document
• Contextual Advertising : (user+context) – advertisement
• Quality
• Authority/ Spam / Freshness
• Various ways to capture them
• Relevance Scoring
• Combination of matching and quality features
• Evaluation is critical for optimal performance
7
User
Query
DocumentVisit
IssueSurface
HUMANE INFORMATION
RETRIEVAL
Going Beyond the IR Way
8
9
You need the freedom of expression.You need someone who understands.
Information seeking requires a communication.
10
Information Seeking circa 2012
Search engine accepts keywords only.Search engine doesn’t understand you.
11
Toward Humane Information Seeking
Rich User Interactions
Rich User Modeling
Profile
Context
Behavior
Search
Browsing
Filtering
from Query to SessionRich User ModelingHCIR Way:
12
Action Response
Action Response
Action Response
USER SYSTEM
Interaction
History
Filtering / Browsing
Relevance Feedback
…
Filtering Conditions
Related Items
…
User
Model
Rich User InteractionIR Way:The
ProfileContextBehavior
HCIR = HCI + IR
13
The Rest of Talk…
Web Search
Personal SearchImproving search and browsing for known-item findingEvaluating interactions combining search and browsing
User modeling based on reading level and topicProviding non-intrusive recommendations for browsing
Book SearchAnalyzing interactions combining search and filtering
PERSONAL SEARCH
Retrieval And Evaluation Techniques
for Personal Information [Thesis]
14
Example: Desktop Search
15
Example: Search over Social Media
Ranking using Multiple
Document Types for
Desktop Search [SIGIR10]
Evaluating Search in
Personal Social Media
Collections [WSDM12]
Structured Document Retrieval: Background
• Field Operator / Advanced Search Interface
• User’s search terms are found in multiple fields
16
Understanding Re-finding Behavior in Naturalistic Email
Interaction Logs. Elsweiler, D, Harvey, M, Hacker., M [SIGIR'11]
Structured Document Retrieval: Models
• Document-based Retrieval Model
• Score each document as a whole
• Field-based Retrieval Model
• Combine evidences from each field
q1 q2 ... qm
Document-based Scoring Field-based Scoring
f1
f2
fn
...
q1 q2 ... qm
f1
f2
fn
...
f1
f2
fn
...
w1
w2
wn
w1
w2
wn
17
18
1
1
2
21
2
• Field Relevance
• Different field is important for different query-term
‘james’ is relevant
when it occurs in <to>
‘registration’ is relevant
when it occurs in <subject>
Improved Matching for Email SearchStructured Documents[CIKM09, ECIR09,12]
Estimating the Field Relevance
• If User Provides Feedback
• Relevant document provides sufficient information
• If No Feedback is Available
• Combine field-level term statistics from multiple sources
19
content
title
from/to
Relevant Docs
content
title
from/to
Collection
content
title
from/to
Top-k Docs
+ ≅
20
Retrieval Using the Field Relevance
• Comparison with Previous Work
• Ranking in the Field Relevance Model
q1 q2 ... qm
f1
f2
fn
...
f1
f2
fn
...
w1
w2
wn
w1
w2
wn
q1 q2 ... qm
f1
f2
fn
...
f1
f2
fn
...
P(F1|q1)
P(F2|q1)
P(Fn|q1)
P(F1|qm)
P(F2|qm)
P(Fn|qm)
Per-term Field Weight
Per-term Field Score
sum
multiply
• Retrieval Effectiveness (Metric: Mean Reciprocal Rank)
DQL BM25F MFLM FRM-C FRM-T FRM-R
TREC 54.2% 59.7% 60.1% 62.4% 66.8% 79.4%
IMDB 40.8% 52.4% 61.2% 63.7% 65.7% 70.4%
Monster 42.9% 27.9% 46.0% 54.2% 55.8% 71.6%
Evaluating the Field Relevance Model
21
40.0%
45.0%
50.0%
55.0%
60.0%
65.0%
70.0%
75.0%
80.0%
DQL BM25F MFLM FRM-C FRM-T FRM-R
TREC
IMDB
Monster
Fixed Field Weights Per-term Field Weights
Evaluation Challenges for Personal Search• Evaluation of Personal Search
• Each based on its own user study
• No comparative evaluation was performed yet
• Solution: Simulated Collections
• Crawl CS department webpages, docs and calendars
• Recruit department people for user study
• Collecting User Logs
• DocTrack: a human-computation search game
• Probabilistic User Model: a method for user simulation
22
[CIKM09,SIGIR10,CIKM11]
DocTrack Game
23
Find It!
Target Item
Summary so far…
• Query Modeling for Structured Documents
• Using the estimated field relevance improves the retrieval
• User’s feedback can help personalize the field relevance
• Evaluation Challenges in Personal Search
• Simulation of the search task using game-like structures
• Related work : ‘Find It If You Can’ [SIGIR11]
24
WEB SEARCH
Characterizing Web Content, User Interests, and
Search Behavior by Reading Level and Topic
25
[WSDM12]
Reading level distribution varies across major topical categories
User Modeling by Reading Level and Topic• Reading Level and Topic
• Reading Level: proficiency (comprehensibility)
• Topic: topical areas of interests
• Profile Construction
• Profile Applications
• Improving personalized search ranking
• Enabling expert content recommendation
P(R|d1) P(T|d1)P(R|d1) P(T|d1)
P(R|d1) P(T|d1) P(R,T|u)
Profile matching can predict user’s preference over search results• Metric
• % of user’s preferences predicted by profile matching
• Profile matching measured in KL-Divergence of RT profiles
• Results
• By the degree of focus in user profile
• By the distance metric between user and website
User Group #Clicks KLR(u,s) KLT(u,s) KLRT(u,s)
↑Focused 5,960 59.23% 60.79% 65.27%
147,195 52.25% 54.20% 54.41%
↓Diverse 197,733 52.75% 53.36% 53.63%
Comparing Expert vs. Non-expert URLs• Expert vs. Non-expert URLs taken from [White’09]
Higher Reading Level
Low
er To
pic
Div
ersity
Enabling Browsing for Web Search
• SurfCanyon®
• Recommend results
based on clicks
30
Initial results indicate that
recommendations are useful
for shopping domain.
[Work-in-progress]
BOOK SEARCH
Understanding Book Search Behavior on the Web
31
[Submitted to SIGIR12]
Understanding Book Search on the Web
• OpenLibrary
• User-contributed online digital library
• DataSet: 8M records from web server log
32
Comparison of Navigational Behavior
• Users entering directly show different behaviors from
users entering via web search engines
33
Users entering the site directly Users entering via Google
Comparison of Search Behavior
34
Rich interaction reduces the query lengthsFiltering induces more interactions than search
LOOKING ONWARD
35
Where’s the Future? – Social Search
• The New Bing Sidebar makes search a social activity.
36
Where’s the Future? – Semantic Search
• The New Google serves ‘knowledge’ as well as docs.
37
Where’s the Future? – Siri-like Agent
• The New Google serves ‘knowledge’ as well as docs.
38
Exciting Future is Awaiting US!
• Recommended Readings in IR:
• http://www.cs.rmit.edu.au/swirl12
39
Any
Questions
?
Selected Publications
• Structured Document Retrieval• A Probabilistic Retrieval Model for Semi-structured Data [ECIR09]
• A Field Relevance Model for Structured Document Retrieval [ECIR11]
• Personal Search• Retrieval Experiments using Pseudo-Desktop Collections [CIKM09]
• Ranking using Multiple Document Types in Desktop Search [SIGIR10]
• Building a Semantic Representation for Personal Information [CIKM10]
• Evaluating an Associative Browsing Model for Personal Info. [CIKM11]
• Evaluating Search in Personal Social Media Collections [WSDM12]
• Web / Book Search• Characterizing Web Content, User Interests, and Search Behavior by
Reading Level and Topic [WSDM12]
• Understanding Book Search Behavior on the Web [In submission to SIGIR12]
40
More at @lifidea, or
cs.umass.edu/~jykim
My Self-tracking Efforts
• Life-optimization Project (2002~2006)
• LiFiDeA Project (2011-2012)
41
OPTIONAL SLIDES
42
The Great Divide: IR vs. HCI
IR
• Query / Document
• Relevant Results
• Ranking / Suggestions
• Feature Engineering
• Batch Evaluation (TREC)
• SIGIR / CIKM / WSDM
HCI
• User / System
• User Value / Satisfaction
• Interface / Visualization
• Human-centered Design
• User Study
• CHI / UIST / CSCW
43
Can we learn from each other?
The Great Divide: IR vs. RecSys
IR
• Query / Document
• Reactive (given query)
• SIGIR / CIKM / WSDM
RecSys
• User / Item
• Proactive (push item)
• RecSys / KDD / UMAP
44
The Great Divide: IR in CS vs. LIS
IR in CS
• Focus on ranking &
relevance optimization
• Batch & quantitative
evaluation
• SIGIR / CIKM / WSDM
• UMass / CMU /
Glasgow
IR in LIS
• Focus on behavioral
study & understanding
• User study & qualitative
evaluation
• ASIS&T / JCDL
• UNC / Rutgers / UW
45
• What
• How
Problems & Techniques in IR
Data Format (documents, records and linked data) /
Size / Dynamics (static, dynamic, streaming)
User &
Domain
End User (web and library)
Business User (legal, medical and patent)
System Component (e.g., IBM Watson)
Needs Known-item vs. Exploratory Search
Recommendation
46
System Indexing and Retrieval
(Platforms for Big Data Handling)
Analytics Feature Extraction
Retrieval Model Tuning & Evaluation
Presentation User Interface
Information Visualization
More about the Matching Problem
• Finding Representations
• Term vector vs. Term distribution
• Topical category, Reading level, …
• Estimating Representations
• By counting terms
• Using automatic classifiers
• Calculating Matching Scores
• Cosine similarity vs. KL-divergence
• Combining multiple reps.
47
User
Query
DocumentVisit
IssueSurface