DH101 2013/2014 course 6 - Semantic coding, RDF, CIDOC-CRM

Digital Humanities 101 - 2013/2014 - Course 6

Digital Humanities Laboratory

Frederic Kaplan

frederic.kaplan@epfl.ch

Semester 1 : Content of each course

• (1) 19.09 Introduction to the course / Live Tweeting and Collective note taking

• (2) 25.09 Introduction to Digital Humanities / Wordpress / First assignment

• (3) 2.10 Introduction to the Venice Time Machine project / Zotero

• 9.10 No course

• (4) 16.10 Digitization techniques / Deadline first assignment

• (5) 23.10 Datafication / Presentation of projects

• (6) 30.10 Semantic modelling / RDF / Deadline peer-reviewing of first assignment

• (7) 6.11 Pattern recognition / OCR / Semantic disambiguation

• (8) 13.11 Historical Geographical Information Systems, Procedural modelling / City Engine / Deadline Project selection

• (9) 20.11 Crowdsourcing / Wikipedia / OpenStreetMap

• (10) 27.11 Cultural heritage interfaces and visualisation / Museographic experiences

• 4.12 Group work on the projects

• 11.12 Oral exam / Presentation of projects / Deadline Project blog

• 18.12 Oral exam / Presentation of projects

Objective of today’s course

• Show you the beauty and make you feel the power of semantic coding

• Give you a quick idea of what is behind the following strange acronyms: RDF, URI, OWL, SPARQL, SWRL, CIDOC-CRM

• Motivate you to look deeper.

A short introduction to semantic coding

• Many good books exist. I recommend this one.

• I will reuse some of its examples in the following slides.

Doris Stockly

incanti.dhlab.ch

The simplest kind of dataset, one that everyone is familiar with, is tabular data (any data kept in a table, such as an Excel spreadsheet).

Data kept in a table is easy to display, sort, print and edit.

You might not even think of data in an Excel spreadsheet as modeled. But there are semantics in a data table. Where?

There are also obvious limitations with this kind of storage.

You cannot search for the routes that stay more than 2 days at Corfu. Sorting the columns does not capture the deeper meaning of the text we entered.

Relational databases are a solution. Many very mature products exist, like Oracle DB, MySQL and PostgreSQL. A relational database allows multiple tables to be joined in a standardized way.
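
A minimal sketch of such a join, using Python's built-in sqlite3 module rather than one of the products named above. The table names, column names and sample rows (two routes, their stops and the length of each stay) are invented for illustration and do not come from the course dataset.

```python
import sqlite3

def routes_staying_longer(conn, place, min_days):
    """Return ids of routes whose stop at `place` lasts more than `min_days` days."""
    rows = conn.execute(
        """
        SELECT r.route_id
        FROM routes AS r
        JOIN stops AS s ON s.route_id = r.route_id
        WHERE s.place = ? AND s.days > ?
        ORDER BY r.route_id
        """,
        (place, min_days),
    ).fetchall()
    return [row[0] for row in rows]

# Hypothetical schema: one table for routes, one for their stops.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE routes (route_id TEXT PRIMARY KEY, departure TEXT)")
conn.execute("CREATE TABLE stops (route_id TEXT, place TEXT, days INTEGER)")
conn.executemany("INSERT INTO routes VALUES (?, ?)",
                 [("R1", "Venice"), ("R2", "Venice")])
conn.executemany("INSERT INTO stops VALUES (?, ?, ?)",
                 [("R1", "Corfu", 3), ("R2", "Corfu", 1), ("R2", "Candia", 4)])

long_stays = routes_staying_longer(conn, "Corfu", 2)
```

Once the stay is stored as a number in its own column, the question "which routes stay more than 2 days at Corfu" becomes an ordinary query.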

But as our project evolves, we may need to reformat our tables. This is called schema migration, and it is a painful process.

For big databases, schemas can get incredibly complex.

Trying to normalize these databases in a single schema is a labor-intensive process.

How to make future-proof schemata

• With this mode of coding we can easily add new properties (price of the route, captain, etc.). The schema is future-proof.

• In addition, the data about the data (i.e. the metadata, the names of the columns) is now part of the data itself.

• This is ideal for projects in perpetual beta.
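
A small sketch of what such a future-proof storage could look like, assuming that "this mode of coding" means storing every cell of the table as a (row identifier, column name, value) row. The function names and the captain and price values are invented.

```python
# Each entry is (row identifier, column name, value).
statements = [
    ("R1", "departure", "Venice"),
    ("R1", "departure-date", "2.7.1422"),
    ("R2", "departure", "Venice"),
]

def add_property(store, subject, name, value):
    """Add a new property without altering any table definition."""
    store.append((subject, name, value))

def properties_of(store, subject):
    """Collect all properties recorded for one subject into a dictionary."""
    return {name: value for s, name, value in store if s == subject}

def column_names(store):
    """The former column names are now ordinary data that can be listed."""
    return sorted({name for _, name, _ in store})

# "captain" and "price" were never declared in advance (invented values).
add_property(statements, "R1", "captain", "C1")
add_property(statements, "R1", "price", "120")
```

No migration step was needed to record the captain or the price, and `column_names(statements)` shows that the metadata can be queried like any other data.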

And, most importantly, it makes a direct and simple connection with a well-developed research field: logic.

Indeed, this can be written in a different way

RDF statements

• (Subject Predicate Object)

• (R1 departure Venice)

• This is called an RDF statement, an atomic relation in a database

• (R1 departure-date 2.7.1422)

This is a graph

As RDF statements can be understood both as logic statements and as parts of a graph, one can use many tools and ideas from logic and graph theory to manipulate them.
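
To illustrate the graph reading, here is a minimal sketch that indexes statements as a directed graph and applies a classic graph algorithm (breadth-first search) to it. The statements about the stop, the captain and the birthplace are invented.

```python
from collections import defaultdict, deque

triples = [
    ("R1", "departure", "Venice"),
    ("R1", "stop", "Corfu"),
    ("R1", "captain", "C1"),
    ("C1", "birthplace", "Venice"),
]

def build_graph(statements):
    """Index statements as a directed graph: subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for s, p, o in statements:
        graph[s].append((p, o))
    return graph

def reachable(graph, start):
    """Breadth-first search: every node that can be reached from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(start)
    return seen

g = build_graph(triples)
```

Each subject and object becomes a node and each predicate a labelled edge, so `reachable(g, "R1")` collects everything connected to the route.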

• The nodes of the graph are called resources.

• When you want to coordinate multiple datasets, it can become increasingly difficult to guarantee unique and consistent identifiers for each node.

• The R1 that we use in our database may mean something else in another database.

• For naming resources, RDF uses URIs (Uniform Resource Identifiers) and an optional fragment identifier.

• You are probably familiar with URLs (Uniform Resource Locators), the strings used to specify how web pages are retrieved.

• URIs generalize this concept further by saying that anything, whether you can retrieve it electronically or not, can be uniquely identified in a similar way.

Since URIs can identify anything as a resource, the subject of an RDF statement can be a resource, the object can be a resource and, most importantly, predicates are always resources.

An example of a URIRef for a common RDF predicate

It is common in RDF to shorten URIs by assigning a namespace to the base URI and writing only the distinctive part of the identifier. The previous URI can be written in a shorter manner: rdf:type
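
The mechanism can be sketched as a simple prefix table. The rdf and rdfs entries are the namespaces published by the W3C; the ex namespace and the helper names are made up for this example.

```python
# Prefix table. "rdf" and "rdfs" are the W3C namespaces;
# "ex" is a hypothetical namespace for the maritime route example.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/routes#",
}

def expand(qname, prefixes=PREFIXES):
    """Turn a shortened name such as 'rdf:type' into its full URI."""
    prefix, _, local = qname.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix}")
    return prefixes[prefix] + local

def shorten(uri, prefixes=PREFIXES):
    """Inverse operation: replace a known base URI by its prefix."""
    for prefix, base in prefixes.items():
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    return uri
```

With this table, `expand("rdf:type")` gives back the full identifier, and `shorten` recovers the short form.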

Serialization

• While the data model that RDF uses is very simple, the serialized representation tends to get complicated when an RDF graph is saved in a file or sent over a network.

• Different serialization formats exist: N3, RDF/XML (the most frequently used), RDFa (RDF in attributes)
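
As an illustration of what serialization involves, here is a simplified sketch that writes statements in the style of N-Triples, a line-based W3C format that is not in the list above but is easy to produce by hand: one statement per line, URIs in angle brackets, literals in double quotes. The base URI is a made-up example, and datatypes, language tags and control-character escaping are left out.

```python
BASE = "http://example.org/routes#"  # hypothetical namespace

def term(value, is_literal=False):
    """Format one term: <URI> for resources, "text" for literals."""
    if is_literal:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{BASE}{value}>"

def to_ntriples(statements):
    """Serialize (subject, predicate, object, object_is_literal) tuples, one per line."""
    lines = []
    for s, p, o, literal in statements:
        lines.append(f"{term(s)} {term(p)} {term(o, literal)} .")
    return "\n".join(lines)

data = [
    ("R1", "departure", "Venice", False),
    ("R1", "departureDate", "2.7.1422", True),
]
text = to_ntriples(data)
```

Even in this minimal form the serializer has to distinguish resources from literal values and to escape special characters, which is where the complexity of the richer formats starts.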

Vocabularies

• A set of URIRefs is known as a vocabulary.

• We can design a specific vocabulary for our maritime route examples.

• There are also famous vocabularies like the RDF vocabulary (the set of URIRefs describing the RDF concepts, e.g. rdf:resource, rdf:type)

SPARQL

• Just as SQL provides a standard query language across relational databases, SPARQL provides a query language for RDF graphs (pronounced "sparkle").

• SPARQL queries attempt to match patterns in the graph and bind wildcard variables as they find solutions.

• Departure(?x1, Venice)

• Captain(?x1, ?x2), Gender(?x2, Women)

• Semantic coding is all about asking bigger questions.
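
The matching idea can be sketched in a few lines of plain Python. This is not SPARQL and not a SPARQL engine, only an illustration of how patterns with ?variables are matched against triples and how bindings are carried from one pattern to the next. The data and the function names are invented, and the data uses the value Woman where the query above writes Women.

```python
def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")

def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`; None on failure."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new

def query(patterns, triples):
    """Return every variable binding that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions

# Invented sample data.
triples = [
    ("R1", "departure", "Venice"),
    ("R2", "departure", "Venice"),
    ("R1", "captain", "C1"),
    ("R2", "captain", "C2"),
    ("C1", "gender", "Woman"),
    ("C2", "gender", "Man"),
]

from_venice = query([("?x1", "departure", "Venice")], triples)
women_captains = query(
    [("?x1", "captain", "?x2"), ("?x2", "gender", "Woman")], triples
)
```

The second query chains two patterns through the shared variable ?x2, which is what makes it possible to ask for the routes whose captain is a woman.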

• With RDF coding, we can also write rules to infer new triples

• If hasParent(?x1, ?x2) and hasBrother(?x2, ?x3) then hasUncle(?x1, ?x3)

• This is also a way of detecting possible incoherence in the knowledge coded in the triple store (e.g. actors doing things after their death)

• One standard language to do this is SWRL (Semantic Web Rule Language)
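
The two uses of rules mentioned above, inferring new triples and detecting incoherence, can be sketched in plain Python. This is not SWRL syntax; the rule is hard-coded, dates are reduced to years, and the sample names and function names are only illustrative.

```python
def infer_uncles(triples):
    """Apply: hasParent(x1, x2) and hasBrother(x2, x3) -> hasUncle(x1, x3)."""
    facts = set(triples)
    inferred = set()
    for x1, p1, x2 in facts:
        if p1 != "hasParent":
            continue
        for y2, p2, x3 in facts:
            if p2 == "hasBrother" and y2 == x2:
                inferred.add((x1, "hasUncle", x3))
    return inferred - facts

def acts_after_death(events, deaths):
    """Consistency rule: report (actor, year) pairs recorded after the actor's death."""
    return sorted(
        (actor, year) for actor, year in events
        if actor in deaths and year > deaths[actor]
    )

# Illustrative data.
family = [
    ("Marco", "hasParent", "Niccolo"),
    ("Niccolo", "hasBrother", "Maffeo"),
]
new_triples = infer_uncles(family)
suspicious = acts_after_death([("A1", 1430), ("A1", 1410)], {"A1": 1420})
```

The first function adds knowledge that was never entered explicitly; the second flags entries that cannot all be true at once.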

Ontologies

• An ontology provides a special vocabulary with which knowledge can be represented.

• This vocabulary allows us to specify which entities will be represented, how they can be grouped and what relationships connect them together.

• (Venice isa Place), (Corfu isa Place), (Place haslat latitude), (Place haslong longitude)

• Now, something very beautiful...

An ontology can be expressed as RDF triples and stored in a graph alongside the data it describes.

• OWL (Web Ontology Language) is an ontology language layered on top of RDF and RDFS

• Terminology statements

• ex:Bridge rdf:type rdfs:Class

• ex:Bridge rdfs:subClassOf ex:Place

• Assertion statements

• ex:Rialto rdf:type ex:Bridge

• ex:RialtoCons ex:broughtIntoExistence ex:Rialto
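
Because terminology and assertions live in the same graph, a program can combine them. The sketch below, written with the standard RDFS spelling rdfs:subClassOf, derives that the Rialto is also a Place. The function names are invented, and a real reasoner does considerably more than this.

```python
def superclasses(cls, triples):
    """All classes reachable from `cls` by following rdfs:subClassOf."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found

def types_of(resource, triples):
    """Asserted types plus those entailed by the class hierarchy."""
    asserted = {o for s, p, o in triples if s == resource and p == "rdf:type"}
    entailed = set(asserted)
    for cls in asserted:
        entailed |= superclasses(cls, triples)
    return entailed

graph = [
    # Terminology statements
    ("ex:Bridge", "rdf:type", "rdfs:Class"),
    ("ex:Bridge", "rdfs:subClassOf", "ex:Place"),
    # Assertion statements
    ("ex:Rialto", "rdf:type", "ex:Bridge"),
    ("ex:RialtoCons", "ex:broughtIntoExistence", "ex:Rialto"),
]
rialto_types = types_of("ex:Rialto", graph)
```

Nobody stated that the Rialto is a Place; it follows from one assertion and one terminology statement.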

It is relatively easy to create your own ontology using software like Protégé. But some ontologies aim at being universal.

CIDOC-CRM

• CIDOC-CRM is an ontology for cultural heritage.

• About 20 years of work.

• An ISO standard (ISO 21127).

• 100+ schema. Very stable.

• CIDOC-CRM is an attempt to formalise an underlying semantics common to many classifications. It includes very interesting ideas.

CIDOC-CRM : Events

• In CIDOC-CRM, the modelling is event-centric.

• The underlying idea is to model change, not state. Therefore, temporal entities play a central role.

• Instead of coding the birthdate of an actor, it is better to code the event of their birth.

Actors relate to things only via temporal entities and events.

CIDOC-CRM : Events

• The participation or presence of several non-temporal entities in an event e1 allows us to conclude that they were in the same time interval and space, even without knowledge of the particular time or space.

• They must have existed at that time. They cannot have been somewhere else at that time (with electronic communication, the space volume in which events occur can become very large).

• The events e0i of creation of each participant i happened before or at the time of e1. The events e2i of destruction (or vanishing) of each participant happened after or at the time of e1.
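
The ordering constraint between creation, participation and destruction can be checked mechanically. The sketch below reduces every event to a single year, which is a strong simplification, and uses invented participants and an invented event.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    created: int      # year of the creation event e0
    destroyed: int    # year of the destruction event e2

def inconsistencies(event_year, participants):
    """Names of participants that cannot have been present at an event in `event_year`."""
    return [
        p.name for p in participants
        if not (p.created <= event_year <= p.destroyed)
    ]

# Invented example: a treaty signing in 1420 with two actors and a desk.
present = [
    Participant("A1", 1380, 1440),
    Participant("A2", 1425, 1490),   # created after the event
    Participant("Desk", 1400, 1600),
]
problems = inconsistencies(1420, present)
```

Here A2 is reported because its creation event comes after the event it is said to have taken part in.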

CIDOC-CRM : Properties

• The property P11 had participant denotes active or passive involvement of Actors, whereas P12 occurred in the presence of also covers objects that were just there (e.g. a desk where a treaty was signed)

• The properties P92 brought into existence and P93 took out of existence delimit the existence of things which have a persistent existence.

CIDOC-CRM : Place

• CIDOC-CRM also includes a very interesting model for places. What is hard about places?

• The question "where is it?" can be answered in natural language by relation to two different kinds of entities: geometric areas or objects.

In France, in Athens, 39N 124E. Points given by spatial coordinates are typically understood as the centre of a wider, extended area.

On Mount St Helens, at the Rhine river.

On Queen Elizabeth (the ship), in my suitcase, at home.

CIDOC-CRM : Place

• According to the CIDOC-CRM, geometric areas (E53 Place) can only be defined relative to larger objects, including the surface of the Earth.

• Those objects may in turn be located at different places at different times (relative to a larger object).

• The cultural interest is in the relation to other things and not to an abstract absolute space. Absolute coordinates make little sense when the reference objects move.

• As historical information is incomplete and sparse, and many reference objects move, normalization of place information to absolute coordinates should not replace the primary information, which is typically relative.
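
A minimal sketch of relative place coding, assuming each place is stored with the larger reference object it is defined in. The place names and function names are invented, and time is ignored: a fuller model would attach a time interval to each relation.

```python
# Each place is defined relative to a larger reference object (invented names).
located_in = {
    "cabin": "galley G1",
    "galley G1": "harbour of Corfu",   # true only while the ship is moored there
    "harbour of Corfu": "Corfu",
    "Corfu": "surface of the Earth",
}

def reference_chain(place, relation):
    """Follow the 'located in' relation up to the outermost reference object."""
    chain = [place]
    while chain[-1] in relation:
        nxt = relation[chain[-1]]
        if nxt in chain:
            raise ValueError(f"circular location: {nxt}")
        chain.append(nxt)
    return chain

def move(relation, thing, new_reference):
    """Relocate an object; everything defined relative to it moves with it."""
    updated = dict(relation)
    updated[thing] = new_reference
    return updated

after_sailing = move(located_in, "galley G1", "harbour of Venice")
```

When the galley sails, the statement "the cabin is on galley G1" stays valid without any change, whereas absolute coordinates recorded for the cabin would have become wrong.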

CIDOC-CRM : Places

CIDOC-CRM : Influence

• Another problematic issue is the notion of influence. It is difficult to develop a systematic understanding of the different forms of influence and their mutual relations.

• Some are more physical, like using a mould or a tool. The influence of a mould on a produced object is strong and can often be verified on the object afterwards. The influence of a hammer is less specific.

• Similarly, making a copy of a painting has a strong influence on the product; copying the idea of a painting has a weak one. The latter is more an intellectual influence than a physical one.

• If a real influence existed, a temporal sequence can be deduced.

CIDOC-CRM

Summary : Guidelines for coding historical data

(1) Prefer events to properties. Actors do not have properties; they participate in events. Instead of coding the birthdate of an actor, it is better to code the event of their birth.

(2) Code date intervals instead of dates. This is much more flexible and makes it possible to detect inconsistencies.
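
A sketch of why intervals help, under the simplifying assumption that a date is known only as an earliest and a latest possible year. The class and method names are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    earliest: int   # earliest possible year
    latest: int     # latest possible year

    def intersect(self, other):
        """Years compatible with both pieces of evidence, or None if they conflict."""
        lo = max(self.earliest, other.earliest)
        hi = min(self.latest, other.latest)
        return Interval(lo, hi) if lo <= hi else None

    def surely_before(self, other):
        """True only if every year in self precedes every year in other."""
        return self.latest < other.earliest

source_a = Interval(1588, 1591)
source_b = Interval(1590, 1600)
source_c = Interval(1595, 1600)
```

Two sources that overlap refine each other (`source_a.intersect(source_b)` narrows the range), while an empty intersection, as between `source_a` and `source_c`, signals an inconsistency that a single coded date would have hidden.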

(3) Code places in a relative manner and not in an absolute manner. The cultural interest is in the relation to other things and not to an abstract absolute space. Absolute coordinates make little sense when the reference objects move.

All this is very beautiful, but is it sufficient to do the kind of historical modeling we want to do? We have an issue. Which one?

Metaknowledge : Knowledge about how knowledge is produced.

How can we encode metaknowledge

• Expressed knowledge (RDF triples) is not in the same space as resources (URIs). We can easily attach new information to a resource but not to a triple.

• It is not easy to represent metaknowledge such as the origin of a piece of information or the uncertainty linked with it.

• To overcome this issue we need to introduce two levels of knowledge and use a trick.

Reified RDF vs. Standard RDF

• An expressed RDF statement (RialtoReconstruction hasTimeSpan 1588-1591) can be transformed into 3 reified triples

• (s1 rdf:subject RialtoReconstruction)

• (s1 rdf:predicate hasTimeSpan)

• (s1 rdf:object 1588-1591)

• (s1 metardf:reliability 0.8)

• (s1 metardf:creator FredericKaplan)
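
The transformation can be written as a small function. This is a sketch: the function name and the way statement identifiers (s1, s2, ...) are generated are assumptions, and metardf: is simply the prefix used above, not a standard vocabulary.

```python
import itertools

_ids = itertools.count(1)

def reify(triple, **metadata):
    """Return the reified form of `triple` plus metadata attached to the statement node."""
    subject, predicate, obj = triple
    node = f"s{next(_ids)}"
    reified = [
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    ]
    # 'metardf:' follows the slide; it is not a standard vocabulary.
    reified += [(node, f"metardf:{key}", value) for key, value in metadata.items()]
    return node, reified

node, statements = reify(
    ("RialtoReconstruction", "hasTimeSpan", "1588-1591"),
    reliability=0.8,
    creator="FredericKaplan",
)
```

The statement has become a resource of its own (s1), so anything can now be said about it. The RDF vocabulary also defines rdf:Statement to type such a node; it is left out here to stay close to the three triples above.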

Possible historical spaces

• Now our RDF store includes both historical knowledge and knowledge about the creation of this historical knowledge.

• These kinds of metainformation can document all the construction phases (whether realized by humans or machines)

• With this approach, we can extract through queries the historical knowledge corresponding to some specific sources and thus create a possible historical reality.
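
Such an extraction can be sketched as a filter over reified statements. The two sources, the second time span and the reliability values below are invented, and the function name is an assumption.

```python
def possible_reality(reified, creators=None, min_reliability=0.0):
    """Rebuild plain triples from reified statements that pass the source filter."""
    by_node = {}
    for node, predicate, value in reified:
        by_node.setdefault(node, {})[predicate] = value
    selected = []
    for props in by_node.values():
        if creators is not None and props.get("metardf:creator") not in creators:
            continue
        if props.get("metardf:reliability", 0.0) < min_reliability:
            continue
        selected.append(
            (props["rdf:subject"], props["rdf:predicate"], props["rdf:object"])
        )
    return selected

# Two conflicting statements coming from two invented sources.
store = [
    ("s1", "rdf:subject", "RialtoReconstruction"),
    ("s1", "rdf:predicate", "hasTimeSpan"),
    ("s1", "rdf:object", "1588-1591"),
    ("s1", "metardf:reliability", 0.8),
    ("s1", "metardf:creator", "SourceA"),
    ("s2", "rdf:subject", "RialtoReconstruction"),
    ("s2", "rdf:predicate", "hasTimeSpan"),
    ("s2", "rdf:object", "1587-1590"),
    ("s2", "metardf:reliability", 0.4),
    ("s2", "metardf:creator", "SourceB"),
]
```

The store keeps both conflicting statements; each choice of sources or of a reliability threshold yields a different, fully documented reconstruction.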

Summary

Encoding metahistorical information

• We must not only model historical information, but also model each step of the construction of historical knowledge.

• There is a need for a semantic framework capable of coding historical information and meta-historical information.

• Coding meta-historical information implies documenting the choice of sources, the transcription phases and the interpretation processes, whether realized by humans or machines.

No unique global truth but fully documented possible historical reconstructions
