Considering a Faceted Search-based Model

Marti Hearst, UCB SIMS

hearst@sims.berkeley.edu

NAS CSTB DNS Meeting on Internet Navigation and the Domain Name System: Technical Alternatives and Policy Implications

July 12, 2001

Outline

The Klensin proposal: synopsis, issues, recommendations

UIs and faceted search

A Proposal

A Search-based access model for the DNS, IETF Internet-Draft by John Klensin: http://www.ietf.org/internet-drafts/draft-klensin-dns-search-00.txt

A multi-layer approach to naming: faceted descriptions are used to facilitate both flexible naming and inexact search

This talk: What does research tell us about the search issues?

Figure: Klensin’s proposal as a stack of layers, spanning search to lookup: free-text search (unregulated); faceted system (detailed, unregulated); faceted classification system (simple, regulated); DNS (unchanged).

Layer 2

IndustryCategory:Restaurant

Geolocation:Miami

Language:Spanish

NetworkLocation

Name:Jose’s Pizza

Layer 2: Faceted System (simple, regulated)

Inputs: search values for one or more facets

Outputs: appropriate DNS names and all tuples with matched facets

Allow for partial (fuzzy) match

Jose’s Pizza, Miami

Alberto’s Pizza, Miami

Jose’s Bistro, Miami

Jose’s Pizza, Saratoga

Joe’s Pizza, Miami

…
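The Layer 2 partial-match lookup can be sketched in Python. This is a minimal illustration, not the algorithm Klensin’s draft specifies: the registry contents, the `.example.com` names, the `fuzzy_lookup` function, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.5 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical Layer 2 registry: facet values paired with a DNS name.
REGISTRY = [
    ({"Name": "Jose's Pizza", "Geolocation": "Miami"}, "josespizza-miami.example.com"),
    ({"Name": "Alberto's Pizza", "Geolocation": "Miami"}, "albertos-miami.example.com"),
    ({"Name": "Jose's Bistro", "Geolocation": "Miami"}, "josesbistro-miami.example.com"),
    ({"Name": "Jose's Pizza", "Geolocation": "Saratoga"}, "josespizza-saratoga.example.com"),
    ({"Name": "Joe's Pizza", "Geolocation": "Miami"}, "joespizza-miami.example.com"),
    ({"Name": "Mario's Garage", "Geolocation": "Boston"}, "mariosgarage-boston.example.com"),
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(query, registry, threshold=0.5):
    """Return (score, facets, dns_name) for entries whose mean facet similarity
    to the query meets the threshold, best match first."""
    if not query:
        return []
    results = []
    for facets, dns_name in registry:
        # A facet the entry lacks is compared against "" and scores 0.
        scores = [similarity(value, facets.get(facet, "")) for facet, value in query.items()]
        score = sum(scores) / len(scores)
        if score >= threshold:
            results.append((score, facets, dns_name))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


for score, facets, dns_name in fuzzy_lookup({"Name": "Jose's Pizza", "Geolocation": "Miami"}, REGISTRY):
    print(f"{score:.2f}  {facets['Name']}, {facets['Geolocation']}  ->  {dns_name}")
```

Averaging across facets lets a partial match such as Jose’s Pizza, Saratoga still surface with a lower score, while an unrelated entry falls below the threshold.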


Layer 2 Selling Points

Allows sharing of name space among different (commercial) entities

Allows specification according to meaningful attributes

Layer 2 DNS Issues

How to guarantee uniqueness?

How to determine appropriate descriptors?

How to use in a hyperlink?

Requires a user interface for confirmation of correct choice
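The uniqueness and confirmation issues can be made concrete with a small sketch. Assumptions: uniqueness is enforced on the complete facet tuple (case-insensitively), a query may name any subset of facets, and any outcome other than a single match is handed back to the user interface for confirmation. The `Layer2Directory` class, its status strings, and the DNS names are hypothetical.

```python
class DuplicateDescriptorError(ValueError):
    """Raised when a facet tuple is already registered to another name."""


class Layer2Directory:
    def __init__(self) -> None:
        # frozenset of (facet, normalized value) pairs -> DNS name
        self._entries = {}

    @staticmethod
    def _key(facets):
        return frozenset((facet, value.strip().lower()) for facet, value in facets.items())

    def register(self, facets, dns_name):
        """Reject a descriptor whose full facet tuple is already taken."""
        key = self._key(facets)
        if key in self._entries:
            raise DuplicateDescriptorError(f"{facets} is already registered to {self._entries[key]}")
        self._entries[key] = dns_name

    def resolve(self, query):
        """Return ("unique" | "ambiguous" | "not_found", matching DNS names)."""
        wanted = self._key(query)
        matches = sorted(name for key, name in self._entries.items() if wanted <= key)
        if not matches:
            return "not_found", []
        if len(matches) == 1:
            return "unique", matches
        return "ambiguous", matches  # the UI must ask the user to confirm a choice


directory = Layer2Directory()
directory.register({"Name": "Jose's Pizza", "Geolocation": "Miami"}, "josespizza-miami.example.com")
directory.register({"Name": "Jose's Pizza", "Geolocation": "Saratoga"}, "josespizza-saratoga.example.com")
print(directory.resolve({"Name": "Jose's Pizza"}))
```

A query that names every facet resolves directly; a shorter one returns several candidates, which is where the confirmation step comes in.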

Layer 2 Descriptor Issues

Emphasis on geolocation may be problematic

May be too sparse:

SFMOMA

SFMOMA exhibits

SFMOMA exhibit on digital art called 101010

Layer 3: Faceted System (detailed, unregulated)

Not centrally coordinated (provided by commercial services)

More detailed facets

Allow for inheritance

Context-sensitive (e.g., restaurant has a menu attribute, auto repair has services, etc.)

Inputs: service-dependent

Outputs: layer 2 names
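A Layer 3 service might model context-sensitive attributes with a category hierarchy in which subcategories inherit their parent’s attributes. The sketch below assumes hypothetical categories (`Business`, `Restaurant`, `AutoRepair`), a hypothetical `layer3_search` function, and plain strings standing in for the Layer 2 names it returns; a real commercial service would define its own schema.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(eq=False)
class Category:
    name: str
    attributes: Set[str]
    parent: Optional["Category"] = None

    def all_attributes(self) -> Set[str]:
        """Own attributes plus everything inherited from ancestors."""
        inherited = self.parent.all_attributes() if self.parent else set()
        return inherited | self.attributes

    def is_a(self, other: "Category") -> bool:
        """True if this category is `other` or one of its descendants."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


# Hypothetical schema: which attributes exist depends on the kind of business.
BUSINESS = Category("Business", {"Name", "Geolocation", "Language"})
RESTAURANT = Category("Restaurant", {"Menu"}, BUSINESS)
AUTO_REPAIR = Category("AutoRepair", {"Services"}, BUSINESS)

# Hypothetical records; "layer2" stands in for the Layer 2 name returned.
RECORDS = [
    {"layer2": "Jose's Pizza, Miami", "category": RESTAURANT, "Menu": ["pizza", "empanadas"]},
    {"layer2": "Joe's Pizza, Miami", "category": RESTAURANT, "Menu": ["pizza"]},
    {"layer2": "Miami Auto Care, Miami", "category": AUTO_REPAIR, "Services": ["brakes", "tires"]},
]


def layer3_search(records, category, attribute, term):
    """Return Layer 2 names of records in `category` whose `attribute` mentions `term`."""
    if attribute not in category.all_attributes():
        raise KeyError(f"{category.name} has no attribute {attribute!r}")
    term = term.lower()
    return [
        r["layer2"]
        for r in records
        if r["category"].is_a(category) and any(term in v.lower() for v in r.get(attribute, []))
    ]


print(layer3_search(RECORDS, RESTAURANT, "Menu", "empanadas"))
```

Asking for `Menu` on an auto repair shop, or on a generic business, raises an error: the set of searchable attributes depends on the category.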

Layer 4: Free-text Search (unregulated)

Use standard search to find sites that discuss topics that relate to the query (as web search works today)

Relation to Web Search

Web search is perceived to work better today than two years ago. Why?

It finds appropriate starting points, also known as source selection

A search for “toyota” no longer returns “Tony’s Toyota pages” as the top-ranked hit

Before the web, source selection was a separate operation from free-text search

Also, queries tended to be longer

Web search engines could do source selection exclusively, but they do other things as well

Recommendations on Klensin Proposal

A promising, intriguing approach

One tweak: combine layers 2 and 3

Have a partly regulated portion and an open portion

This, however, is susceptible to spamming

Not clear if this should be regulated

General Pitfalls of Controlled Vocabularies

Difficult to get agreement on the set of labels

Difficult to assign labels consistently: granularity, salience/emphasis, context, connotations

New labels always appearing; old ones shift in meaning

Lay people won’t know the system

How to do it wrong: force into a hierarchy

Let’s try to find UCB


What is the problem?

Two deeply hierarchical facets: Region and Education

Forced in convoluted ways into one hierarchy with irregular cross-links
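Keeping the two facets separate avoids the problem: each item carries one path per facet, and a query constrains each facet independently, in either order, with no cross-links needed. The category paths and the `select` helper below are hypothetical illustrations, not the directory shown in the slides.

```python
from typing import Dict, Tuple

Path = Tuple[str, ...]

# Hypothetical items, each tagged with a path in two independent hierarchical facets.
ITEMS: Dict[str, Dict[str, Path]] = {
    "UC Berkeley": {"Region": ("US", "California", "Berkeley"),
                    "Education": ("Higher Education", "Universities", "Public")},
    "Stanford University": {"Region": ("US", "California", "Stanford"),
                            "Education": ("Higher Education", "Universities", "Private")},
    "Berkeley High School": {"Region": ("US", "California", "Berkeley"),
                             "Education": ("K-12", "High Schools")},
}


def under(path: Path, prefix: Path) -> bool:
    """True if `path` equals `prefix` or lies beneath it in the hierarchy."""
    return path[:len(prefix)] == prefix


def select(items, **prefixes: Path):
    """Items whose path in every named facet lies under the given prefix."""
    return sorted(
        name for name, facets in items.items()
        if all(f in facets and under(facets[f], p) for f, p in prefixes.items())
    )


print(select(ITEMS, Region=("US", "California", "Berkeley"), Education=("Higher Education",)))
```

Each facet stays a clean hierarchy of its own; finding UCB is just the intersection of a Region selection and an Education selection.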

Two Approaches

Statistical approaches map words into metadata terms

Create flexible user interfaces that progressively reveal appropriate subparts of the system (How to do so is a topic of our research.)
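One very simple statistical approach (a stand-in chosen for illustration, not a method the talk specifies) is to count how often each word co-occurs with each metadata term in a labeled collection, normalize each word’s counts, and suggest the terms with the highest summed scores for a query. The training examples and the `train` and `suggest_terms` names are hypothetical.

```python
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count how often each word co-occurs with each metadata term."""
    word_term = defaultdict(Counter)
    for text, terms in labeled_docs:
        for word in set(text.lower().split()):
            for term in terms:
                word_term[word][term] += 1
    return word_term


def suggest_terms(query, word_term, k=2):
    """Score terms by summing each query word's normalized co-occurrence counts."""
    scores = Counter()
    for word in query.lower().split():
        counts = word_term.get(word)
        if not counts:
            continue  # unseen word: no evidence either way
        total = sum(counts.values())
        for term, count in counts.items():
            scores[term] += count / total
    return [term for term, _ in scores.most_common(k)]


# Hypothetical labeled collection: (text, metadata terms).
DOCS = [
    ("spicy shrimp tacos with lime", {"Cuisine:Mexican", "Ingredient:Shrimp"}),
    ("beef tacos and salsa", {"Cuisine:Mexican", "Ingredient:Beef"}),
    ("shrimp pad thai", {"Cuisine:Thai", "Ingredient:Shrimp"}),
    ("green curry with chicken", {"Cuisine:Thai", "Ingredient:Chicken"}),
]

model = train(DOCS)
print(suggest_terms("shrimp tacos", model))
```

With a realistic collection this would need smoothing and stop-word handling; the point here is only that words can be mapped onto metadata terms from co-occurrence evidence.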

The Practice

Using descriptors “under the hood”

The limited empirical work indicates that combining free text + descriptors works best

Some e-commerce sites do this for finding products

Can sometimes match queries to standard information needs: “buy” + “palm”, “review” + “crouching tiger”, “berkeley” + “gap”
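Matching a query to a standard information need can be sketched as spotting a cue word and treating the remaining words as the object of the need. The cue table, the place list, and the `interpret` function are hypothetical; in particular, reading “berkeley” + “gap” as a local, store-finding need is an assumption made for illustration.

```python
# Hypothetical cue words for a few standard information needs.
NEED_CUES = {"buy": "purchase", "review": "reviews"}
# Hypothetical gazetteer; a place name with no other cue suggests a local need.
KNOWN_PLACES = {"berkeley", "miami"}


def interpret(query):
    """Split a query into an information need, its object, and an optional place."""
    need, place, rest = None, None, []
    for word in query.lower().split():
        if need is None and word in NEED_CUES:
            need = NEED_CUES[word]
        elif place is None and word in KNOWN_PLACES:
            place = word
        else:
            rest.append(word)
    if need is None and place is not None:
        need = "local"
    return {"need": need or "unknown", "object": " ".join(rest), "place": place}


for q in ["buy palm", "review crouching tiger", "berkeley gap"]:
    print(q, "->", interpret(q))
```

Once the need is identified, a site can apply the matching descriptors behind the scenes, for example restricting “palm” to products for sale.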

walmart.com uses metadata “under the hood”

The Promise

Using descriptors in the User Interface

Use faceted metadata for navigation

Query Previews

Tailored Search Forms

Tightly Combine Navigation & Search

Facets

Orthogonal sets of descriptors

Gets complicated when they are hierarchical

Example: recipes

Metadata Facets

Time/Date, Topic, Task, GeoRegion

Advantage: Great for Mixing and Matching

Faceted Recipe Metadata

Facets of a Recipe: Prepare, Cuisine, Ingredient, Dish
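Because the recipe facets are orthogonal, selections made in different facets narrow the result set independently, so the order in which a user picks them does not change the outcome. The recipes and their facet values below are invented for the sketch, and `refine` is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical recipes; each facet may hold several values per recipe.
RECIPES: Dict[str, Dict[str, Set[str]]] = {
    "Pad Thai": {"Cuisine": {"Thai"}, "Ingredient": {"Shrimp", "Noodles"},
                 "Prepare": {"Stir-fry"}, "Dish": {"Main"}},
    "Shrimp Tacos": {"Cuisine": {"Mexican"}, "Ingredient": {"Shrimp"},
                     "Prepare": {"Grill"}, "Dish": {"Main"}},
    "Green Curry": {"Cuisine": {"Thai"}, "Ingredient": {"Chicken"},
                    "Prepare": {"Simmer"}, "Dish": {"Main"}},
    "Mango Sticky Rice": {"Cuisine": {"Thai"}, "Ingredient": {"Mango", "Rice"},
                          "Prepare": {"Steam"}, "Dish": {"Dessert"}},
}


def refine(recipes, selections: List[Tuple[str, str]]) -> Set[str]:
    """Apply (facet, value) selections one after another; each step narrows the set."""
    current = set(recipes)
    for facet, value in selections:
        current = {name for name in current if value in recipes[name].get(facet, set())}
    return current


# Mixing and matching: the same two choices, made in either order.
print(refine(RECIPES, [("Cuisine", "Thai"), ("Ingredient", "Shrimp")]))
print(refine(RECIPES, [("Ingredient", "Shrimp"), ("Cuisine", "Thai")]))
```

Both calls yield the same set, which is what lets an interface offer every facet at every step instead of a fixed path.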

Sunset.com: not the right way

Dynamic Previews

Avoid empty result sets

Show the possible next steps

A way to seamlessly integrate related topics, user preferences (personalization), and context-sensitivity
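A dynamic preview can be computed from the current result set: for every facet the user has not yet constrained, count how many matching items carry each value. Values that would lead to an empty result never appear, because only values present in the current results are counted. The data and the `preview` function are hypothetical.

```python
from collections import Counter

# Hypothetical recipes with multi-valued facets.
RECIPES = {
    "Pad Thai": {"Cuisine": {"Thai"}, "Ingredient": {"Shrimp", "Noodles"}, "Dish": {"Main"}},
    "Green Curry": {"Cuisine": {"Thai"}, "Ingredient": {"Chicken"}, "Dish": {"Main"}},
    "Mango Sticky Rice": {"Cuisine": {"Thai"}, "Ingredient": {"Mango", "Rice"}, "Dish": {"Dessert"}},
    "Beef Tacos": {"Cuisine": {"Mexican"}, "Ingredient": {"Beef"}, "Dish": {"Main"}},
}


def preview(recipes, selected):
    """Return (number of matches, {facet: Counter(value -> matches)}) for the
    facets not yet in `selected`. Only values present among the current
    matches are counted, so no previewed choice leads to an empty result set."""
    matches = [
        facets for facets in recipes.values()
        if all(value in facets.get(facet, set()) for facet, value in selected.items())
    ]
    previews = {}
    for facets in matches:
        for facet, values in facets.items():
            if facet in selected:
                continue
            for value in values:
                previews.setdefault(facet, Counter())[value] += 1
    return len(matches), previews


count, next_steps = preview(RECIPES, {"Cuisine": "Thai"})
print(count, next_steps)
```

The counts double as the “# of documents” shown next to each possible next step.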

Metadata Usage in Epicurious

Can choose category types in any order

But categories are never more than one level deep

And a search can never use more than one instance of a category, even though items may be assigned more than one of each category type

Items (recipes) are dead-ends: they don’t link to “more like this”

Not fully integrated with search

Epicurious Metadata Usage

Problem: lacks integration with search

This is fixed in marthastewart.com

Advanced search is more specific than sunset.com’s; it also allows for disjunction, so it is less likely to return null results
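Disjunction within a facet combined with conjunction across facets can be sketched as follows. The recipes are hypothetical; `search` treats each facet’s selected values as alternatives (OR) and requires every constrained facet to match (AND).

```python
# Hypothetical recipes; an item may hold several values in one facet.
RECIPES = {
    "Pad Thai": {"Cuisine": {"Thai"}, "Ingredient": {"Shrimp", "Noodles"}},
    "Green Curry": {"Cuisine": {"Thai"}, "Ingredient": {"Chicken"}},
    "Beef Tacos": {"Cuisine": {"Mexican"}, "Ingredient": {"Beef"}},
}


def search(recipes, query):
    """query maps facet -> set of acceptable values.
    OR within a facet (any overlap), AND across facets (every facet must overlap)."""
    return sorted(
        name
        for name, facets in recipes.items()
        if all(facets.get(facet, set()) & wanted for facet, wanted in query.items())
    )


# Conjunction only: Thai recipes made with beef.
print(search(RECIPES, {"Cuisine": {"Thai"}, "Ingredient": {"Beef"}}))
# With disjunction: Thai recipes made with beef or shrimp.
print(search(RECIPES, {"Cuisine": {"Thai"}, "Ingredient": {"Beef", "Shrimp"}}))
```

The first query comes back empty; widening one facet with an alternative value recovers a result without dropping the other constraint.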

UIs for faceted metadata

Use dynamic previews

Allow user to select metadata in any order

At each step, show different types of relevant metadata, based on prior steps and personal history; include # of documents

Previews restricted to only those metadata types that might be helpful

Tightly integrate with keyword search
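Tight integration with keyword search can be sketched by applying keywords and facet selections together and then computing counts for the remaining facets. Assumptions: keywords must all appear as words in a hypothetical `title` field, each item has a single value per facet, and a facet is previewed only if it still has more than one value among the hits (one possible reading of “only those metadata types that might be helpful”). Personal history is not modeled, and `faceted_keyword_search` is a hypothetical name.

```python
from collections import Counter

# Hypothetical items: a free-text title plus one value per facet.
ITEMS = [
    {"title": "Shrimp pad thai", "Cuisine": "Thai", "Dish": "Main"},
    {"title": "Shrimp tacos", "Cuisine": "Mexican", "Dish": "Main"},
    {"title": "Shrimp cocktail", "Cuisine": "American", "Dish": "Appetizer"},
    {"title": "Green curry", "Cuisine": "Thai", "Dish": "Main"},
]
FACETS = ("Cuisine", "Dish")


def faceted_keyword_search(items, keywords, selected):
    """Filter by keywords (all must appear as title words) and by facet selections,
    then preview counts for unselected facets that can still narrow the hits."""
    words = keywords.lower().split()
    hits = [
        item for item in items
        if all(w in item["title"].lower().split() for w in words)
        and all(item.get(facet) == value for facet, value in selected.items())
    ]
    previews = {}
    for facet in FACETS:
        if facet in selected:
            continue
        counts = Counter(item[facet] for item in hits if facet in item)
        if len(counts) > 1:  # a single remaining value cannot narrow the results
            previews[facet] = dict(counts)
    return hits, previews


hits, previews = faceted_keyword_search(ITEMS, "shrimp", {"Dish": "Main"})
print([h["title"] for h in hits], previews)
```

Keyword search supplies the starting set, and the facet previews then show how that set can be sliced, so neither mode is a dead-end for the other.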

The Flamenco Research Project

Systematically determine what works for integrating metadata into search interfaces

Develop recommendations that reflect both the task structure and the richness of the information structure

http://bailando.sims.berkeley.edu/flamenco.html

Summary

Agreement on metadata descriptor assignment is difficult to achieve

Descriptors need to be constantly updated

Layer 2 is probably not rich enough

Assigning specifiers is quite different from searching for specified items

Fuzzy search can help, but it requires a UI for confirmation of correct choices

This will end up looking like a search service

Can make search more meaningful and task-based

Summary

Web search engines can do source selection, but:

Sometimes users do want source selection

But often, search hits based on the content of pages are closer to what users want

We need to be careful not to confuse source selection with content search