Cornell CS 502 Metadata for the Web From Discovery to Description CS 502 – 20020226 Carl Lagoze – Cornell University


Page 1:

Cornell CS 502

Metadata for the Web: From Discovery to Description

CS 502 – 20020226

Carl Lagoze – Cornell University

Page 2:

Co-existing Cost/Functionality Levels

[Figure: axis labeled “Greater Functionality & Cost”]

Page 3:

Dublin Core Qualifiers

• From fuzzy buckets to more specific description

• Model of “graceful degradation”
  – Support both simplicity and specificity
  – Intra-domain and inter-domain semantics

Page 4:

Resource has property X

• Resource: implied subject

• has: implied verb

• property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date, ...)

• X: property value (an appropriate literal)

• [optional qualifier] (shown twice in the diagram): qualifiers (adjectives)

Page 5:

Varieties of qualifiers: Element Refinements

• Make the meaning of an element narrower or more specific.

• Narrowing implies an “is a” relationship
  – a “date created” is a “date”
  – an “is part of relation” is a “relation”

• If your software does not understand the qualifier, you can safely ignore it.

Page 6:

Varieties of Qualifiers: Value Encoding Schemes

• Says that the value is
  – a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)
  – a string formatted in a standard way (e.g., "2001-05-02" means May 2, not February 5)

• Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
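
A minimal sketch of how software might treat an encoding scheme, assuming a hypothetical helper `interpret_value` and a small set of scheme labels taken from these slides. A known date scheme yields a parsed date; an unknown scheme leaves the string untouched, so it stays usable for discovery.

```python
from datetime import date

# Scheme labels as they are spelled in these slides (an assumption, not a registry).
DATE_SCHEMES = {"ISO8601", "W3CDTF", "W3C-DTF"}


def interpret_value(value, scheme=None):
    """Return a date for known date schemes, otherwise the original string."""
    if scheme in DATE_SCHEMES:
        # YYYY-MM-DD: year, then month, then day.
        return date.fromisoformat(value)
    # Unknown or missing scheme: keep the literal as-is.
    return value


parsed = interpret_value("2001-05-02", "ISO8601")
print(parsed.month, parsed.day)
print(interpret_value("Languages -- Grammar", "LCSH"))
```

The second call shows the fallback: this sketch knows nothing about LCSH, yet the heading is still a searchable string.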

Page 7:

Resource has Date "2000-06-13" [element refinement: Revised] [encoding scheme: ISO8601]

Resource has Subject "Languages -- Grammar" [encoding scheme: LCSH]

Page 8:

Dumb-Down Principle for Qualifiers

• The fifteen elements should be usable and understandable with or without the qualifiers

• Qualifiers refine meaning (but may be harder to understand)

• Nouns can stand on their own without adjectives

• If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!

• “has a” relations break the model
  – E.g., a creator has a hair color
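
The dumb-down rule can be sketched as a small function, assuming qualified names follow the dotted `DC.Element.Refinement` form used in the HTML examples of this deck. The function name `dumb_down` and the error handling are illustrative choices, not part of any Dublin Core specification.

```python
# The fifteen Dublin Core elements, lowercased for comparison.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dumb_down(name):
    """Drop any refinement: 'DC.Date.Created' becomes 'DC.Date'."""
    parts = name.split(".")
    if len(parts) < 2 or parts[0] != "DC" or parts[1].lower() not in DC_ELEMENTS:
        raise ValueError(f"not a Dublin Core element name: {name!r}")
    # Keep only the prefix and the element; ignore the qualifier.
    return ".".join(parts[:2])


print(dumb_down("DC.Date.Created"))
print(dumb_down("DC.Title"))
```

A name such as `DC.Creator.HairColor` would also reduce to `DC.Creator`, which is why “has a” qualifiers are a problem: the remaining statement (the creator is a hair color) is no longer correct.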

Page 9:

Resource has Date "2000-06-13" [element refinement: Revised] [encoding scheme: ISO8601]

Resource has Subject "Languages -- Grammar" [encoding scheme: LCSH]

Test for “good” qualifiers: cover the qualifier and ask:
  – Does the statement still make sense?
  – Is it still correct?

Page 10:

“Incorrect” Qualification

Resource has subject [audience] “pre-schoolers”

Resource has creator [affiliation] “Cornell University”

Page 11:

Open questions in this model

• Are uncontrolled and unconstrained values really useful for discovery?

• Is it possible for an organization (DCMI) to control the evolution of a language?

• How can "simple discovery metadata" be combined with complex descriptions? Is there a notion of graceful degradation?

• Can DC serve as a lingua franca (mapping template) among more complex models?

Page 12:

Models for Deploying Metadata

• Embedded in the resource
  – Low deployment threshold
  – Limited flexibility, limited model

• Linked to from resource
  – Using XLink
  – Is there only one source of metadata?

• Independent resource referencing resource
  – Model of accessing the object through its surrogate

Page 13:

Syntax Alternatives: HTML

• Advantages:
  – Simple mechanism: META tags embedded in content
  – Widely deployed tools and knowledge

• Disadvantages:
  – Limited structural richness (won’t support hierarchical, tree-structured data or entity distinctions)

Page 14:

Dublin Core in HTML

• http://www.dublincore.org/documents/2000/08/15/dcq-html/

• HTML constructs
  – <link> to establish pseudo-namespace
  – <meta> for metadata statements

• name attribute for DC element (DC.element.ER)

• content attribute for element value

• scheme attribute for encoding scheme or controlled vocabulary

• lang attribute for language of element value

Page 15:

Dublin Core in HTML example

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Title" lang="es" content="negocio inusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Subject" content="bibliographic control web cataloging">
<meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">
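
A minimal sketch of reading such statements back out of a page, using Python's standard `html.parser`. The class name `DCMetaParser` and the dictionary keys are hypothetical, and the sample repeats only three of the tags from the example.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect <meta name="DC...."> statements from an HTML document."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if not name.startswith("DC."):
            return
        parts = name.split(".")
        self.statements.append({
            "element": parts[1],
            # Third component, if present, is the element refinement.
            "refinement": parts[2] if len(parts) > 2 else None,
            "value": attributes.get("content"),
            "scheme": attributes.get("scheme"),
            "lang": attributes.get("lang"),
        })


SAMPLE = """
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Title" lang="es" content="negocio inusual">
<meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23">
"""

parser = DCMetaParser()
parser.feed(SAMPLE)
for statement in parser.statements:
    print(statement)
```

Each statement is a flat record of element, optional refinement, value, scheme, and language. That flatness is the "limited structural richness" noted among the disadvantages of the HTML syntax.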

Page 16:

Unqualified Dublin Core in XML

http://www.dublincore.org/documents/2000/11/dcmes-xml/

<?xml version="1.0"?>

<!DOCTYPE rdf:RDF SYSTEM "http://dublincore.org/2000/12/01-dcmes-xml-dtd.dtd">

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

xmlns:dc="http://purl.org/dc/elements/1.1/">

<rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/">

<dc:title>Dave Beckett's Home Page</dc:title>

<dc:creator>Dave Beckett</dc:creator>

<dc:publisher>ILRT, University of Bristol</dc:publisher>

<dc:date>2000-06-06</dc:date>

</rdf:Description>

</rdf:RDF>
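
A minimal sketch of reading this record with Python's standard `xml.etree.ElementTree`. The helper name `read_dc` is hypothetical, and the sample omits the DOCTYPE line so that the example does not depend on an external DTD.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

RECORD = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/">
    <dc:title>Dave Beckett's Home Page</dc:title>
    <dc:creator>Dave Beckett</dc:creator>
    <dc:publisher>ILRT, University of Bristol</dc:publisher>
    <dc:date>2000-06-06</dc:date>
  </rdf:Description>
</rdf:RDF>"""


def read_dc(xml_text):
    """Map each described resource to its list of (element, value) pairs."""
    root = ET.fromstring(xml_text)
    records = {}
    dc_prefix = "{" + DC_NS + "}"
    for description in root.findall("{" + RDF_NS + "}Description"):
        about = description.get("{" + RDF_NS + "}about")
        pairs = []
        for child in description:
            # Keep only children in the Dublin Core namespace.
            if child.tag.startswith(dc_prefix):
                pairs.append((child.tag[len(dc_prefix):], child.text))
        records[about] = pairs
    return records


for resource, pairs in read_dc(RECORD).items():
    print(resource)
    for element, value in pairs:
        print("  ", element, "=", value)
```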

Page 17:

Example of Dublin Core Use

A map in the United States Library of Congress on-line American Memory Collection

Page 18:

Title

The name given to the resource

<meta name="DC.Title" content="Novi Belgii Novæque Angliæ: nec non partis Virginiæ tabula multis in locis emendata" lang="la">

Page 19:

Creator

An entity primarily responsible for making the content of the resource

<meta name="DC.Creator" content="Nicolaum Visscher">

Page 20:

Subject

The topic of the content of the resource

<meta name="DC.Subject" content="Middle Atlantic States" scheme="LCSH">
<meta name="DC.Subject" content="Maps" scheme="LCSH">
<meta name="DC.Subject" content="Early works to 1800" scheme="LCSH">

Page 21:

Description

An account of the content of the resource

<meta name="DC.Description.Abstract" content="An historical map showing the coast of New Jersey as perceived in the seventeenth century">

Page 22:

Publisher

An entity responsible for making the resource available

<meta name="DC.Publisher" content="Library of Congress, United States">

Page 23:

Contributor

An entity responsible for making contributions to the content of the resource.

<meta name="DC.Contributor" content="Historic Urban Plans">

Page 24:

Date

A date associated with an event in the lifecycle of the resource

<meta name="DC.Date.Created" content="1996-04-17" scheme="W3C-DTF">

Page 25:

Type

The nature or genre of the content of the resource

<meta name="DC.Type" content="image" scheme="DCMIType">

Page 26:

Format

The physical or digital manifestation of the resource

<meta name="DC.Format.Medium" content="image/gif" scheme="IMT">
<meta name="DC.Format.Extent" content="556K">

Page 27:

Identifier

An unambiguous reference to the resource in the current context

<meta name="DC.Identifier" content="http://loc.gov/coll1/img456.jpg" scheme="URI">

Page 28:

Source

A reference to a resource from which the present resource is derived.

<meta name="DC.Source" content="G3715 1685 .V5 1969 (LOC catalog #)">

Page 29:

Language

Language of the intellectual content of the object

<meta name="DC.Language" content="nl" scheme="ISO 639-2">

Page 30:

Relation

A reference to a related resource

<meta name="DC.Relation.isPartOf" content="http://lcweb2.loc.gov/ammem/gmdhtml/dsxpimg.html" scheme="URI">

Page 31:

Coverage

The extent or scope of the content of the resource

<meta name="DC.Coverage.Spatial" content="New Jersey" scheme="TGN">
<meta name="DC.Coverage.Temporal" content="1650" scheme="W3C-DTF">

Page 32:

Rights

Information about rights in and over the resource

<meta name="DC.Rights" content="http://www.loc.gov/rights_statement.htm">

Page 33:

Distributed Content: The Metadata Challenge

• From fixed, contained physical artifacts to fluid, distributed digital objects

• Need for basis of trust and authenticity in network environment

• Decentralization and specialization of resource description and need for mapping formalisms

Page 34:

Multi-entity nature of object description

[Figure labels: Photographer, Camera type, Software, Computer artist]

Page 35:

Understanding Metadata based on Query Capabilities

• Simple boolean tags?
  – Creator = “Tom Baker” and Title contains “Dublin Core”

• Agent, time, place questions?
  – Who was responsible for what and when and where?
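
The first kind of query can be sketched over flat attribute/value records. The records below are invented for illustration, and `matches` is a hypothetical helper.

```python
# Invented sample records; only the field names mirror Dublin Core elements.
RECORDS = [
    {"creator": "Tom Baker", "title": "A Grammar of Dublin Core"},
    {"creator": "Tom Baker", "title": "Notes on Cataloging"},
    {"creator": "Someone Else", "title": "Dublin Core in Practice"},
]


def matches(record, creator, title_contains):
    """Creator = <creator> AND Title contains <title_contains>."""
    return (record.get("creator") == creator
            and title_contains in record.get("title", ""))


hits = [r for r in RECORDS if matches(r, "Tom Baker", "Dublin Core")]
print(hits)
```

Flat records answer this kind of question well. They have no place to record who did what, when, and where, because every field hangs directly off the resource.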

Page 36:

Attribute/Value approaches to metadata…

Hamlet has a [playwright] creator Shakespeare

(Hamlet: subject; has a: implied verb; playwright: metadata adjective; creator: metadata noun; Shakespeare: literal)

“The playwright of Hamlet was Shakespeare”

R1 --dc:title--> “Hamlet”

R1 --dc:creator.playwright--> “Shakespeare”

Page 37:

…run into problems for richer descriptions…

Hamlet has a [birthplace] creator Stratford

“The playwright of Hamlet was Shakespeare, who was born in Stratford”

R1 --dc:creator.playwright--> “Shakespeare”

R1 --dc:creator.birthplace--> “Stratford”

Page 38:

…because of their failure to model entity distinctions

R1 --title--> “Hamlet”

R1 --creator--> R2

R2 --name--> “Shakespeare”

R2 --birthplace--> “Stratford”
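
A minimal sketch contrasting the two descriptions as lists of (subject, property, value) triples. The node names R1 and R2 and the property names come from the diagrams; the helper functions are hypothetical.

```python
# Flat description: every value hangs off the resource R1.
FLAT = [
    ("R1", "dc:creator.playwright", "Shakespeare"),
    ("R1", "dc:creator.birthplace", "Stratford"),
]

# Entity-aware description: the creator is its own node, R2.
GRAPH = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def objects(triples, subject, prop):
    """All values of a property for one subject."""
    return [o for s, p, o in triples if s == subject and p == prop]


def creator_birthplaces(triples, resource):
    """Follow resource -> creator -> (name, birthplace)."""
    return [
        (name, place)
        for creator in objects(triples, resource, "creator")
        for name in objects(triples, creator, "name")
        for place in objects(triples, creator, "birthplace")
    ]


# In FLAT, dropping the qualifier leaves "R1 has creator Stratford".
print(objects(FLAT, "R1", "dc:creator.birthplace"))
print(creator_birthplaces(GRAPH, "R1"))
```

In the flat form the birthplace is attached to the play, so ignoring the qualifier yields a wrong statement. In the graph form the birthplace belongs to R2, and the name and birthplace stay paired.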

Page 39:

Applying a Model-Centric Approach

• Formally define common entities and relationships underlying multiple metadata vocabularies

• Describe them (and their inter-relationships) in a simple logical model

• Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.

Page 40:

Events are key to understanding metadata relationships?

• Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles.

• Clarifying attachment points facilitates understanding and querying “who was responsible for what when”.

Page 41:

ABC/Harmony Event-aware metadata ontology

• Recognizing inherent lifecycle aspects of description (esp. of digital content)

• Modeling incorporates time (events and situations) as first-class objects
  – Supplies clear attachment points for agents, roles, existential properties

• Resource description as a “story-telling” activity
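
A rough sketch of events as first-class objects, not the ABC/Harmony vocabulary itself. The class and field names are hypothetical. The values come from the Anna Karenina example in this deck, and pairing the author with the 1877 creation and the translator with the 1978 translation is an assumption drawn from those field names.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str                                   # e.g. "creation", "translation"
    year: int                                   # when the event happened
    roles: dict = field(default_factory=dict)   # role -> agent name


@dataclass
class Resource:
    title: str
    events: list = field(default_factory=list)


def who_what_when(resource):
    """List (agent, role, event kind, year) for every event of a resource."""
    return [
        (agent, role, event.kind, event.year)
        for event in resource.events
        for role, agent in event.roles.items()
    ]


book = Resource("Anna Karenina", [
    Event("creation", 1877, {"author": "Leo Tolstoy"}),
    Event("translation", 1978, {"translator": "Margaret Wettlin"}),
])

for row in who_what_when(book):
    print(row)
```

Each agent, role, and date attaches to an event, not to the resource, which gives the "attachment points" the slide describes.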

Page 42:

Resource-centric Metadata

Title: Anna Karenina

Author: Leo Tolstoy

Illustrator: Orest Vereisky

Translator: Margaret Wettlin

Date Created: 1877

Date Translated: 1978

Description: Adultery & Depression

Birthplace: Moscow

Birthdate: 1828

?

Page 43:

[Figure: description graph for “Anna Karenina”. Labels in the figure: “Anna Karenina”, “Tragic adultery and the search for meaningful love”, “author”, “Leo Tolstoy”, “Moscow”, “1828”, “creation”, “1877”, “Russian”, “translation”, “1978”, “English”, “translator”, “Margaret Wettlin”, “illustrator”, “Orest Vereisky”]

Page 44:

Queries over complex descriptive graphs

• Ability to ask questions like “show me all the translations of War and Peace between 1980 and 1990”
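
A minimal sketch of such a query, assuming each event is stored as its own record. The `WorkEvent` type and `translations_between` helper are hypothetical. The War and Peace rows are invented placeholders, and the Anna Karenina row pairs the translator and translation date given earlier in the deck.

```python
from collections import namedtuple

WorkEvent = namedtuple("WorkEvent", ["work", "kind", "year", "agent"])

EVENTS = [
    WorkEvent("War and Peace", "translation", 1957, "Translator A"),  # invented
    WorkEvent("War and Peace", "translation", 1985, "Translator B"),  # invented
    WorkEvent("Anna Karenina", "translation", 1978, "Margaret Wettlin"),
]


def translations_between(events, work, start, end):
    """Translation events of one work with start <= year <= end."""
    return [
        e for e in events
        if e.work == work and e.kind == "translation" and start <= e.year <= end
    ]


for event in translations_between(EVENTS, "War and Peace", 1980, 1990):
    print(event)
```

The query is only expressible because the translation is an event with its own date; a single "Date" field on the resource could not distinguish creation from translation.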