20th Oct 2008, Colin Bird, © 2008 IBM Corporation. Seek and ye might find: The universe is finite … or is it?

Page 1

Seek and ye might find

The universe is finite … or is it?

Page 2

Why this talk?

e-Research course

Extreme Blue

Retrievability issue

Ideas

Page 3

Retrievability in context

A big issue
IBM customers
Google but …

Information availability
Cliché
Presumption of need

Page 4

The endless cycle of idea and action, Endless invention, endless experiment, ... All our knowledge brings us nearer to our ignorance ... Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?

T. S. Eliot – The Rock (1934)

Page 5

Product information

Concepts

Understanding

(?)

Primarily task-based

IEHS

Page 6

Retrieval with IEHS

Navigation

Search [Help System]

Links

Page 7

Search and ye shall find …… or shall ye?

The hits contain the search terms, but are not necessarily about them.

[Venn diagram: Relevant Documents (RD), Hits (H) and their overlap, Relevant Hits (RH)]

Recall = RH / RD
Precision = RH / H
Minimise H - RH

Also important: a manageable number of hits!
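The two ratios on this slide can be computed directly from sets. The following is a minimal sketch, assuming hits and relevant documents are available as sets of topic identifiers; the function name and the sample identifiers are hypothetical and not part of IEHS.

```python
def precision_recall(hits, relevant_documents):
    """Return (precision, recall) for a set of hits (H) against the relevant set (RD)."""
    hits = set(hits)
    relevant_documents = set(relevant_documents)
    relevant_hits = hits & relevant_documents  # RH: hits that are actually relevant
    # Guard against empty sets so the ratios are always defined.
    precision = len(relevant_hits) / len(hits) if hits else 0.0
    recall = len(relevant_hits) / len(relevant_documents) if relevant_documents else 0.0
    return precision, recall


# Hypothetical topic identifiers, for illustration only.
hits = {"t1", "t2", "t3", "t4"}
relevant = {"t2", "t4", "t7"}

precision, recall = precision_recall(hits, relevant)
noise = len(hits - relevant)  # H - RH: hits the user has to wade through
```

In this made-up case half of the hits are relevant, while one relevant topic is never returned, which is the trade-off the slide asks us to manage.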

Page 8

Improving retrieval with IEHS

Facet browsing

Classified with a taxonomy

Page 9

Principles of faceted classification

Categories:
Fundamental to the domain
Often class hierarchies
Almost always mutually exclusive

Intersections
SWED …
Still TMI?

Why do we classify things?
What are we hoping for?

Page 10

Why do we …?

organise things

put things into categories

look for patterns

arrange things to fit our models

Page 11

“Classification, broadly defined, is the act of organising the universe of knowledge into some systematic order. It has been considered the most fundamental activity of the human mind.”

Lois Mai Chan (Library Science expert)

“People naturally group things into classes … By dealing with classes rather than individual things, we can organize our knowledge of the world in a concise and practical way. Faced with an escaped Rhinoceros, it's helpful to quickly think, ‘Dangerous wild animal - take cover’.”

Paul Englefield (IBM colleague)

Page 12

We classify …

To make information and knowledge:
Easier to find
Quicker to find

To model understanding
Linnaeus – classification of species
Synonyms

Characterize Categorize

Page 13

Scientists …

“Scientific method depends upon increasingly more sophisticated characterizations of subjects of the investigation.” [Wikipedia]

Organise Explain Predict Experiment

Page 14

Page 15

Making information easier to find

Narrow the browsing space:
Use metadata to describe what information is about
Exploit the metadata that others have provided

Classification can at least help to structure the information universe into manageable chunks.

Astronomers say the universe is finite, which is a comforting thought for those people who can't remember where they leave things.

Woody Allen

Page 16

Two forms of narrowing

Labelling: the system identifies the items that are about the subjects users are interested in

Filtering: the system withholds information that is deemed to be not relevant

Can employ both

Labelled: relevant

Filtered to omit content that is not relevant
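The difference between the two forms of narrowing can be sketched in a few lines. This is an illustration only, assuming each topic carries a set of subject metadata; the `Topic` class, the function names and the sample topics are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    title: str
    subjects: set = field(default_factory=set)


def filter_topics(topics, scope):
    """Filtering: withhold topics that share no subject with the declared scope."""
    return [t for t in topics if t.subjects & scope]


def label_topics(topics, interests):
    """Labelling: keep every topic, but flag those about a subject of interest."""
    return [(t, bool(t.subjects & interests)) for t in topics]


# Hypothetical help topics, for illustration only.
topics = [
    Topic("Installing the server", {"installation", "server"}),
    Topic("Tuning the client", {"performance", "client"}),
    Topic("Migrating server data", {"migration", "server"}),
]

# Both forms combined: filter first, then label what remains.
in_scope = filter_topics(topics, scope={"server"})
labelled = label_topics(in_scope, interests={"migration"})
```

Filtering shrinks what the user sees, whereas labelling leaves everything visible and only marks it, so a wrong filtering decision costs more than a wrong label.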

Page 17

Labelling: Classification vs Indexing

A isAbout X
Group similar items
Offer items for selection

B isRelevantTo Y
Distinguish similar items
Facilitate rapid access to a specific item of interest

Compare?

Contrast?

Scope of control

Indexing

Classification

Page 18

For classification to be effective,

We need the right:
Categories
Subjects
Devices for capturing the metadata

Page 19

Where do categories come from?

[Diagram: rows of items labelled "structure", with the question "How? Nature or Nurture?"]

Nature: self-organising?
or
Nurture: invented and imposed?

Page 20

Dewey Decimal Classification: 200: Religion

210 Natural theology

220 Bible

230 Christian theology

240 Christian moral & devotional theology

250 Christian orders & local church

260 Christian social theology

270 Christian church history

280 Christian sects & denominations

290 Other religions

Is this what you want in the 21st Century?

Adapted from Clay Shirky’s Writings

Page 21

Organising classification metadata

Controlled vocabulary
Consistent
Unambiguous

Structure
Levels of detail
Exploit relationships

Options:
Term list
Taxonomy
Ontology
Folksonomy – how does this fit in?

Page 22

Taxonomy

Information systems context: A classification of concepts into a hierarchical structure according to whether the concepts are more general or more specific

A biological definition: “A classification of living organisms into a hierarchical structure of species, genera, families etc.”
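The information-systems sense of a taxonomy, concepts ordered from more general to more specific, can be modelled as a simple parent map. The sketch below assumes a single broader term per subject; the subject names and function names are hypothetical examples, not taken from any IBM taxonomy.

```python
# Hypothetical taxonomy: each subject maps to its broader (more general) parent.
BROADER = {
    "relational databases": "databases",
    "databases": "software",
    "application servers": "software",
    "software": None,  # top of the hierarchy
}


def broader_terms(subject, broader=BROADER):
    """Return the chain of more general subjects, nearest first."""
    chain = []
    parent = broader.get(subject)
    while parent is not None:
        chain.append(parent)
        parent = broader.get(parent)
    return chain


def narrower_terms(subject, broader=BROADER):
    """Return the subjects directly below `subject`, sorted alphabetically."""
    return sorted(child for child, parent in broader.items() if parent == subject)


ancestors = broader_terms("relational databases")
children = narrower_terms("software")
```

Walking up the chain generalises a query, and walking down refines it, which is what "levels of detail" amounts to in practice.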

Page 23

Folksonomy [Social tagging]

Familiar representation
End-user participation
Share descriptions

Related terms (not synonyms)
Process vs Outcome
Vocabulary space

Collaborative tagging
Knowledge organisation and discovery
Community control

Page 24

Vocabulary control - for and against

Context: delivering online information
Declaration of coverage
Retrieval of topics about a subject

Precision vs Recall
Rigid structures vs Flexibility

Classification
Community organising
Content-based search

Page 25

Information classification in IBM context

Information centers (IEHS) and ibm.com
Taxonomy structure
Vocabulary control (across IBM)
Extension model
Governance process

…… is this enough?

Page 26

Towards an alliance

Both seek to enhance retrievability
Taxonomies provide a consistent and unambiguous structure.
Folksonomies involve real users, are initially uncontrolled, but the community exerts control over time

One regrettable and overly pessimistic distinction:

Classification imposes structure (and so freezes content)

Static

Community organising encourages end-user input

Dynamic

Page 27

Basis for an Extreme Blue project

Taxonomy IEHS

Facet browser

Collaborative tagging
Folksonomy

Added value:
• retrieval
• user insights

Retrievability issue

Page 28

Some alliance issues

Points of difference:
Consistency & Ambiguity
Control
Imposed structures versus end-user input

Usability, in a wide sense of the term
Scalability, particularly if the number of terms in the folksonomy significantly exceeds the number of subjects in the taxonomy
Does the approach scale to communities larger than a group of like-minded individuals?
What life-cycle model is appropriate for folksonomy-type metadata?
Relationships between social tagging and facet browsing, and other forms of information filtering, considering also the governance of those relationships.
Reliability of user tagging behaviour indicators when considering the information structure and architecture.

Page 29

Not alone …

Enhanced Tagging for Discovery (EnTag)
"combination and comparison of controlled and folksonomy approaches"
"attempting to get the best of both worlds"

TAXONOMY DIRECTED FOLKSONOMIES
Integrating user tagging and controlled vocabularies for Australian education networks

“How Semantic Tagging Increases Findability”
http://www.hedden-information.com/articles.htm

Page 30

The Extreme Blue Programme

Premier IBM Summer Vacation Scheme
Brightest & best students!
Real IBM incubator projects

Team based projects
Business & technical students
Mentors

12 Weeks, June – September
Culminates in EMEA Expo
Worldwide…

Page 31

Extreme Blue Worldwide…

Austin

San Jose

Raleigh

Dublin

Hursley

La Gaude

Amsterdam

Böblingen

Beijing

Bangalore

Toronto

Brazil

Page 32

Why Extreme Blue?

Projects

Patents

People

Media

Talent
Innovation
Proofs of concepts

Works for IBM
Could work for anyone
Requires commitment

Page 33

sTAGr

Maybe a few words about the origin of the name …

Page 34

What did sTAGr aim to do?

Formal and informal – both options
Insights from tags used and topics tagged
Dynamic vocabularies – explore issues

Individuals and groups have a significantly better prospect of locating the information they need if they contribute to the classification and organisation of the information

Page 35

sTAGr = social tagging in IEHS

Technical stuff
Back-end server to store the folksonomy
Tagging interfaces and “tag analysis”
Investigate the potential issues
Usability testing

Business student:
Internal marketing
Whitepaper

Page 36

Tagging UI locations

Page 37

Tag UI

[Screenshot callouts: Links to Tag Clouds; Sections Minimise; Help Page; Visible Ratings; See All Tags on Topic]

Page 38

Project Findings

100% Faster

Page 39

Résumé

Adding a tagging facility does improve retrievability, but …

Original justification included the phrase: “smarter and more responsive routes to information discovery”

Can the alliance of a folksonomy and a formal taxonomy further enhance retrievability?

Page 40

Tag “analysis”

Similarly-tagged topics and similar tags
Synonyms
Homographs

Structure from tag sets

Problems:
Unstable folksonomy
Autotagger distortion
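One plausible way to find similarly-tagged topics and similar tags is to compare sets with the Jaccard coefficient. This is a sketch under that assumption, not a description of how sTAGr performed its tag analysis; the folksonomy data, the threshold and all function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def invert(topic_tags):
    """Turn a topic -> tags mapping into a tag -> topics mapping."""
    tag_topics = {}
    for topic, tags in topic_tags.items():
        for tag in tags:
            tag_topics.setdefault(tag, set()).add(topic)
    return tag_topics


def similar_pairs(item_sets, threshold=0.5):
    """Return (item1, item2, score) for every pair scoring at or above the threshold."""
    pairs = []
    for first, second in combinations(sorted(item_sets), 2):
        score = jaccard(item_sets[first], item_sets[second])
        if score >= threshold:
            pairs.append((first, second, score))
    return pairs


# Hypothetical folksonomy: topic -> tags applied by users.
TOPIC_TAGS = {
    "install-guide": {"install", "setup", "server"},
    "quick-start": {"install", "setup"},
    "tuning": {"performance", "server"},
}

similar_topics = similar_pairs(TOPIC_TAGS)        # topics sharing most of their tags
similar_tags = similar_pairs(invert(TOPIC_TAGS))  # tags applied to the same topics
```

Tags that always land on the same topics, such as "install" and "setup" here, are only candidates for synonyms; a person still has to confirm them, and a homograph would need the opposite signal (one tag spread across unrelated topics).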

Page 41

Dynamic vocabularies

Combined interface

Informing the taxonomy

Informing the information architecture

How about the original ideas …

Page 42

In the crystal ball …

Potential developments from sTAGr:

1. Adaptive browsing, using facet browser interfaces that users personalise by combining their own tags with the subject entry points provided by the formal taxonomy

2. Dynamic vocabularies that present users with a merger of formal and informal classification terms, generated dynamically each time a given user accesses the information center

3. Batch analysis of user tagging behaviour to generate recommendations about information restructuring and subject coverage, thereby enabling the information itself to evolve according to user practice

4. … and a new name
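The second item, a dynamic vocabulary merging formal and informal terms for a given user, might look something like the sketch below. It is purely illustrative: the ordering rule (formal subjects first, then the user's own tags, then the most used community tags), the `limit` parameter and the sample data are assumptions, not sTAGr behaviour.

```python
def dynamic_vocabulary(taxonomy_subjects, user_tags, community_tags, limit=5):
    """Merge formal subjects with informal tags into one ordered term list.

    taxonomy_subjects: iterable of controlled subject names
    user_tags: list of the current user's own tags, in their preferred order
    community_tags: dict mapping each community tag to its usage count
    """
    formal = sorted(taxonomy_subjects)
    seen = {subject.lower() for subject in formal}
    informal = []
    # The user's own tags come first, then community tags by descending usage.
    ranked_community = sorted(community_tags, key=lambda t: (-community_tags[t], t))
    for tag in list(user_tags) + ranked_community:
        key = tag.lower()
        if key not in seen:  # skip tags that duplicate a formal subject or earlier tag
            seen.add(key)
            informal.append(tag)
    return [(s, "formal") for s in formal] + [(t, "informal") for t in informal[:limit]]


# Hypothetical data, regenerated each time this user opens the information center.
vocabulary = dynamic_vocabulary(
    taxonomy_subjects={"Installation", "Security"},
    user_tags=["firewall", "installation"],
    community_tags={"ssl": 7, "firewall": 3, "upgrade": 7},
)
```

Keeping the source of each term visible ("formal" or "informal") lets an interface present both without blurring which terms are controlled.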

Page 43

Choice: Combined interface

Dynamic vocabulary
Consistent
Unambiguous

User input
Relevance

Personalize

or

Exploit other users’ tags

Find entry point

Facet browser

Page 44

Informing the taxonomy

Controlled vocabulary, slow to change
Unambiguous definition, maybe scope note
MOAT - http://moat-project.org/

Populating facets
Extension taxonomies
Sparse classification: entry points
Tags become candidate subjects
Tag rating: collaborative selection
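The idea that well-rated tags become candidate subjects can be sketched as a simple selection rule. Everything here is an assumption made for illustration: the 1 to 5 rating scale, the vote and mean thresholds, the function name and the sample data.

```python
def candidate_subjects(tag_ratings, taxonomy_subjects, min_votes=3, min_mean=4.0):
    """Propose tags as candidate subjects based on collaborative ratings.

    tag_ratings: dict mapping a tag to the list of ratings users gave it
    taxonomy_subjects: subjects already present in the controlled vocabulary
    """
    known = {subject.lower() for subject in taxonomy_subjects}
    candidates = []
    for tag, ratings in tag_ratings.items():
        # Skip tags already covered by the taxonomy or rated by too few users.
        if tag.lower() in known or len(ratings) < min_votes:
            continue
        mean = sum(ratings) / len(ratings)
        if mean >= min_mean:
            candidates.append((tag, mean))
    # Best-rated candidates first; ties broken alphabetically.
    return sorted(candidates, key=lambda c: (-c[1], c[0]))


# Hypothetical ratings on an assumed 1-5 scale.
ratings = {
    "clustering": [5, 4, 5],
    "security": [5, 5, 5],   # already a formal subject
    "misc": [2, 3, 1],       # rated poorly
    "failover": [5, 5],      # too few votes so far
}
proposals = candidate_subjects(ratings, {"Security", "Installation"})
```

The output is a list of proposals only; the slow-to-change controlled vocabulary would still pass each one through its governance process before adopting it.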

Page 45

Tag “analysis” revisited

Informing the taxonomy
Eliciting emergent structure within the folksonomy
Identifying potential synonyms and homographs
Discovering relationships between tags and subjects in the taxonomy, exploiting the topic structure to do so

Informing the information architecture

Analysing user tagging behaviour to generate recommendations about the structure of the information space

Analysing user tagging behaviour to generate recommendations about the coverage of the taxonomy and the relevance of some terms within it

Page 46

Acknowledgements

Timothy Catt Tom Clabon David Rankine James Thompson

Cerys Giddings Scott Couper1

Loughborough University

University of Bristol

University of St. Andrews

University of York

IBM Technical mentor

IBM Business mentor

Page 47

?