20th Oct 2008 Colin Bird, © 2008 IBM Corporation 1
Seek and ye might find
The universe is finite … or is it?
Why this talk?
e-Research course
Extreme Blue
Retrievability issue
Ideas
Retrievability in context
A big issue
IBM customers
Google but …
Information availability
Cliché
Presumption of need
The endless cycle of idea and action,
Endless invention, endless experiment, ...
All our knowledge brings us nearer to our ignorance ...
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
T. S. Eliot – The Rock (1934)
Product information
Concepts
Understanding
(?)
Primarily task-based
IEHS
Retrieval with IEHS
Navigation
Search [Help System]
Links
Search and ye shall find …… or shall ye?
The hits contain the search terms, but are not necessarily about them.
Relevant Documents (RD)
Hits (H)
Relevant Hits (RH)
Recall = RH / RD
Precision = RH / H
Minimise H - RH
Also important: a manageable number of hits!
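The two ratios on this slide can be checked with a small worked example. The sketch below treats the relevant documents and the hits as sets of topic identifiers; the identifiers and the function name are invented for illustration and do not come from IEHS.

```python
def recall_and_precision(relevant_documents, hits):
    """Return (recall, precision) for one search, using the slide's symbols."""
    relevant_hits = relevant_documents & hits  # RH: hits that are actually relevant
    recall = len(relevant_hits) / len(relevant_documents) if relevant_documents else 0.0
    precision = len(relevant_hits) / len(hits) if hits else 0.0
    return recall, precision


# Hypothetical topic identifiers, invented for illustration.
relevant_documents = {"t1", "t2", "t3", "t4"}  # RD
hits = {"t2", "t3", "t9", "t10", "t11"}        # H

recall, precision = recall_and_precision(relevant_documents, hits)
false_hits = hits - relevant_documents         # H - RH, the set to minimise
print(f"recall={recall:.2f} precision={precision:.2f} false hits={len(false_hits)}")
```

Here two of the four relevant topics are found (recall 0.5) and two of the five hits are relevant (precision 0.4), which shows how a search can return a manageable number of hits and still miss half of what matters.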
Improving retrieval with IEHS
Facet browsing
Classified with a taxonomy
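As a rough illustration of facet browsing over classified topics, the sketch below narrows a topic list by intersecting facet selections. The topics, facet names, and values are hypothetical; the actual IEHS facet browser is not described in these slides.

```python
# Hypothetical help topics, each classified with one subject per facet.
TOPICS = {
    "Installing the server": {"product": "ServerX", "task": "install", "platform": "Linux"},
    "Installing the client": {"product": "ClientY", "task": "install", "platform": "Windows"},
    "Tuning the server": {"product": "ServerX", "task": "configure", "platform": "Linux"},
    "Server error codes": {"product": "ServerX", "task": "troubleshoot", "platform": "Windows"},
}


def browse(topics, **selected):
    """Return titles whose classification matches every selected facet value."""
    return sorted(
        title
        for title, facets in topics.items()
        if all(facets.get(facet) == value for facet, value in selected.items())
    )


def remaining_values(topics, titles, facet):
    """Count the values of one facet across the topics still in view."""
    counts = {}
    for title in titles:
        value = topics[title][facet]
        counts[value] = counts.get(value, 0) + 1
    return counts


in_view = browse(TOPICS, product="ServerX")
narrowed = browse(TOPICS, product="ServerX", platform="Linux")
print(in_view)
print(narrowed)
print(remaining_values(TOPICS, narrowed, "task"))
```

Each extra selection can only shrink the result set, which is what keeps the number of candidate topics manageable.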
Principles of faceted classification
Categories:
  Fundamental to the domain
  Often class hierarchies
  Almost always mutually exclusive
Intersections
SWED …
Still TMI?
Why do we classify things? What are we hoping for?
Why do we …?
organise things
put things into categories
look for patterns
arrange things to fit our models
“Classification, broadly defined, is the act of organising the universe of knowledge into some systematic order. It has been considered the most fundamental activity of the human mind.”
Lois Mai Chan (Library Science expert)
“People naturally group things into classes … By dealing with classes rather than individual things, we can organize our knowledge of the world in a concise and practical way. Faced with an escaped Rhinoceros, it's helpful to quickly think, ‘Dangerous wild animal - take cover’.”
Paul Englefield (IBM colleague)
We classify …
To make information and knowledge:
  Easier to find
  Quicker to find
To model understanding
  Linnaeus – classification of species
  Synonyms
Characterize
Categorize
Scientists …
“Scientific method depends upon increasingly more sophisticated characterizations of subjects of the investigation.” [Wikipedia]
Organise Explain Predict Experiment
Making information easier to find
Narrow the browsing space:
  Use metadata to describe what information is about
  Exploit the metadata that others have provided
Classification can at least help to structure the information universe into manageable chunks.
Astronomers say the universe is finite, which is a comforting thought for those people who can't remember where they leave things.
Woody Allen
Two forms of narrowing
Labelling: the system identifies the items that are about the subjects users are interested in
Filtering: the system withholds information that is deemed to be not relevant
Can employ both
Labelled: relevant
Filtered to omit content that is not relevant
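The difference between the two forms of narrowing, and how they combine, can be sketched as follows. The `Topic` class, the subject names, and both functions are assumptions made for illustration, not part of any IBM system.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    title: str
    subjects: set = field(default_factory=set)  # metadata: what the topic is about


def label(topics, interests):
    """Labelling: keep every topic, but flag those about a subject of interest."""
    return [(t.title, bool(t.subjects & interests)) for t in topics]


def filter_out(topics, excluded):
    """Filtering: withhold topics about subjects deemed not relevant."""
    return [t for t in topics if not (t.subjects & excluded)]


topics = [
    Topic("Backing up a database", {"backup", "database"}),
    Topic("Licence terms", {"legal"}),
    Topic("Restoring from tape", {"backup", "tape"}),
]

# Both forms combined: filter first, then label what is left.
visible = filter_out(topics, excluded={"legal"})
labelled = label(visible, interests={"database"})
print(labelled)
```

Labelling leaves the choice with the user, whereas filtering makes the choice on the user's behalf, so a wrongly filtered topic cannot be found at all.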
Labelling: Classification vs Indexing
A isAbout X
  Group similar items
  Offer items for selection
B isRelevantTo Y
  Distinguish similar items
  Facilitate rapid access to a specific item of interest
Compare?
Contrast?
Scope of control
Indexing
Classification
For classification to be effective, we need the right:
  Categories
  Subjects
  Devices for capturing the metadata
Where do categories come from?
structure
How? Nature or Nurture?
Nature: self-organising? or
Nurture: invented and imposed?
Dewey Decimal Classification:
200 Religion
210 Natural theology
220 Bible
230 Christian theology
240 Christian moral & devotional theology
250 Christian orders & local church
260 Christian social theology
270 Christian church history
280 Christian sects & denominations
290 Other religions
Is this what you want in the 21st Century?
Adapted from Clay Shirky’s Writings
Organising classification metadata
Controlled vocabulary
  Consistent
  Unambiguous
Structure
  Levels of detail
  Exploit relationships
Options:
  Term list
  Taxonomy
  Ontology
  Folksonomy – how does this fit in?
Taxonomy
Information systems context: A classification of concepts into a hierarchical structure according to whether the concepts are more general or more specific
A biological definition: “A classification of living organisms into a hierarchical structure of species, genera, families etc.”
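The information-systems definition of a taxonomy can be modelled very simply as a map from each concept to its broader concept. The sketch below assumes a single parent per concept and uses made-up concept names; real taxonomies may allow more than one broader term.

```python
# Hypothetical taxonomy: each concept maps to its single broader concept.
BROADER = {
    "Databases": "Software",
    "Relational databases": "Databases",
    "Backup and recovery": "Databases",
    "Operating systems": "Software",
}


def broader_chain(concept):
    """Walk from a concept up to the root, most specific first."""
    chain = [concept]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def narrower(concept):
    """List the concepts directly below the given one."""
    return sorted(child for child, parent in BROADER.items() if parent == concept)


def is_more_specific(concept, other):
    """True if `concept` sits somewhere beneath `other` in the hierarchy."""
    return other in broader_chain(concept)[1:]


print(broader_chain("Relational databases"))
print(narrower("Databases"))
print(is_more_specific("Backup and recovery", "Software"))
```

The more-general/more-specific relation is the only structure used here, which is what distinguishes a taxonomy from a flat term list.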
Folksonomy [Social tagging]
Familiar representation
End-user participation
Share descriptions
Related terms (not synonyms)
Process vs Outcome
Vocabulary space
Collaborative tagging
Knowledge organisation and discovery
Community control
Vocabulary control - for and against
Context: delivering online information
  Declaration of coverage
  Retrieval of topics about a subject
Precision vs Recall
Rigid structures vs Flexibility
Classification
Community organising
Content-based search
Information classification in IBM context
Information centers (IEHS) and ibm.com
Taxonomy structure
Vocabulary control (across IBM)
Extension model
Governance process
… is this enough?
Towards an alliance
Both seek to enhance retrievability
Taxonomies provide a consistent and unambiguous structure.
Folksonomies involve real users, are initially uncontrolled, but the community exerts control over time
One regrettable and overly pessimistic distinction:
  Classification imposes structure (and so freezes content): Static
  Community organising encourages end-user input: Dynamic
Basis for an Extreme Blue project
Taxonomy
IEHS
Facet browser
Collaborative tagging
Folksonomy
Added value:
• retrieval
• user insights
Retrievability issue
Some alliance issues
Points of difference:
  Consistency & Ambiguity
  Control
  Imposed structures versus end-user input
Usability, in a wide sense of the term
Scalability, particularly if the number of terms in the folksonomy significantly exceeds the number of subjects in the taxonomy
Does the approach scale to communities larger than a group of like-minded individuals?
What life-cycle model is appropriate for folksonomy-type metadata?
Relationships between social tagging and facet browsing, and other forms of information filtering, considering also the governance of those relationships.
Reliability of user tagging behaviour indicators when considering the information structure and architecture.
Not alone …
Enhanced Tagging for Discovery (EnTag)
  "combination and comparison of controlled and folksonomy approaches"
  "attempting to get the best of both worlds"
TAXONOMY DIRECTED FOLKSONOMIES
  Integrating user tagging and controlled vocabularies for Australian education networks
“How Semantic Tagging Increases Findability”
  http://www.hedden-information.com/articles.htm
The Extreme Blue Programme
Premier IBM Summer Vacation Scheme
Brightest & best students!
Real IBM incubator projects
Team based projects
Business & technical students
Mentors
12 Weeks, June – September
Culminates in EMEA Expo
Worldwide…
Extreme Blue Worldwide…
Austin
San Jose
Raleigh
Dublin
Hursley
La Gaude
Amsterdam
Böblingen
Beijing
Bangalore
Toronto
Brazil
Why Extreme Blue?
Projects
Patents
People
Media
Talent
Innovation
Proofs of concept
Works for IBM
Could work for anyone
Requires commitment
sTAGr
Maybe a few words about the origin of the name …
What did sTAGr aim to do?
Formal and informal – both options
Insights from tags used and topics tagged
Dynamic vocabularies – explore issues
Individuals and groups have a significantly better prospect of locating the information they need if they contribute to the classification and organisation of the information
sTAGr = social tagging in IEHS
Technical stuff
  Back-end server to store the folksonomy
  Tagging interfaces and “tag analysis”
  Investigate the potential issues
  Usability testing
Business student:
  Internal marketing
  Whitepaper
Tagging UI locations
Tag UI
Links to Tag Clouds
Sections Minimise
Help Page
Visible Ratings
See All Tags on Topic
Project Findings
100% Faster
Résumé
Adding a tagging facility does improve retrievability, but …
Original justification included the phrase: “smarter and more responsive routes to information discovery”
Can the alliance of a folksonomy and a formal taxonomy further enhance retrievability?
Tag “analysis”
Similarly-tagged topics and similar tags
  Synonyms
  Homographs
Structure from tag sets
Problems:
  Unstable folksonomy
  Autotagger distortion
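One plausible way to find similar tags, and hence candidate synonyms, is to compare the sets of topics each tag was applied to. The sketch below uses Jaccard similarity for that comparison; this is a swapped-in, generic technique, not necessarily what sTAGr's tag analysis did, and the tags and topic ids are invented.

```python
from itertools import combinations

# Hypothetical folksonomy: tag -> set of topic ids that users applied it to.
TAGGED = {
    "install": {"t1", "t2", "t3"},
    "setup": {"t1", "t2", "t4"},
    "backup": {"t5", "t6"},
    "restore": {"t6", "t7"},
}


def jaccard(a, b):
    """Overlap of two sets as |a & b| / |a | b|, a value between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_tags(tagged, threshold):
    """Return tag pairs whose tagged-topic sets overlap at or above the threshold."""
    pairs = []
    for tag_a, tag_b in combinations(sorted(tagged), 2):
        score = jaccard(tagged[tag_a], tagged[tag_b])
        if score >= threshold:
            pairs.append((tag_a, tag_b, round(score, 2)))
    return pairs


candidates = similar_tags(TAGGED, threshold=0.5)
print(candidates)
```

A high overlap only suggests a synonym; a person still has to confirm it. An unstable folksonomy or automatically generated tags would skew these scores, which matches the problems noted on the slide.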
Dynamic vocabularies
Combined interface
Informing the taxonomy
Informing the information architecture
How about the original ideas …
In the crystal ball …
Potential developments from sTAGr:
1. Adaptive browsing, using facet browser interfaces that users personalise by combining their own tags with the subject entry points provided by the formal taxonomy
2. Dynamic vocabularies that present users with a merger of formal and informal classification terms, generated dynamically each time a given user accesses the information center
3. Batch analysis of user tagging behaviour to generate recommendations about information restructuring and subject coverage, thereby enabling the information itself to evolve according to user practice
4. … and a new name
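The second development, a dynamic vocabulary, might be sketched as below: on each visit the formal subjects are merged with the tags the visiting user applies most often. The function name, the `(user, tag)` event format, and the `top_n` cut-off are all assumptions made for illustration.

```python
from collections import Counter


def dynamic_vocabulary(formal_subjects, tag_events, user, top_n=3):
    """Merge formal subjects with the tags this user applies most often."""
    own_tags = Counter(tag for who, tag in tag_events if who == user)
    vocabulary = [(subject, "formal") for subject in sorted(formal_subjects)]
    for tag, _count in own_tags.most_common(top_n):
        if tag not in formal_subjects:  # avoid listing a term twice
            vocabulary.append((tag, "informal"))
    return vocabulary


# Hypothetical data: formal taxonomy subjects and a log of (user, tag) events.
formal_subjects = {"backup", "security"}
tag_events = [
    ("ann", "failover"),
    ("ann", "failover"),
    ("ann", "backup"),
    ("bob", "clustering"),
]

vocabulary = dynamic_vocabulary(formal_subjects, tag_events, user="ann")
print(vocabulary)
```

Because the list is rebuilt per user and per visit, two users of the same information center would see different informal terms alongside the same formal ones.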
Choice: Combined interface
Dynamic vocabulary
Consistent
Unambiguous
User input
Relevance
Personalize or exploit other users’ tags
Find entry point
Facet browser
Informing the taxonomy
Controlled vocabulary, slow to change
Unambiguous definition, maybe scope note
MOAT - http://moat-project.org/
Populating facets
Extension taxonomies
Sparse classification: entry points
Tags become candidate subjects
Tag rating: collaborative selection
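The idea that tags become candidate subjects through collaborative rating can be sketched as a simple vote tally. The +1/-1 voting scheme, the threshold, and all names below are hypothetical; the slides do not specify how sTAGr rated tags.

```python
# Hypothetical controlled vocabulary and user ratings of tags (+1 / -1 votes).
TAXONOMY_SUBJECTS = {"installation", "backup", "security"}
TAG_VOTES = {
    "clustering": [1, 1, 1, -1],
    "backup": [1, 1],
    "misc": [1, -1, -1],
    "failover": [1, 1, 1],
}


def candidate_subjects(tag_votes, subjects, min_score=2):
    """Propose tags not yet in the taxonomy whose net rating reaches min_score."""
    proposals = []
    for tag, votes in tag_votes.items():
        score = sum(votes)
        if tag not in subjects and score >= min_score:
            proposals.append((tag, score))
    # Highest-rated first; ties broken alphabetically for a stable order.
    return sorted(proposals, key=lambda item: (-item[1], item[0]))


proposals = candidate_subjects(TAG_VOTES, TAXONOMY_SUBJECTS)
print(proposals)
```

The output is only a list of proposals; under a governed vocabulary, someone would still review each candidate and give it an unambiguous definition before adding it.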
Tag “analysis” revisited
Informing the taxonomy
  Eliciting emergent structure within the folksonomy
  Identifying potential synonyms and homographs
  Discovering relationships between tags and subjects in the taxonomy, exploiting the topic structure to do so
Informing the information architecture
  Analysing user tagging behaviour to generate recommendations about the structure of the information space
  Analysing user tagging behaviour to generate recommendations about the coverage of the taxonomy and the relevance of some terms within it
Acknowledgements
Timothy Catt Tom Clabon David Rankine James Thompson
Cerys Giddings Scott Couper1
Loughborough University
University of Bristol
University of St. Andrews
University of York
IBM Technical mentor
IBM Business mentor
?