ISOcat and RELcat, two cooperating semantic registries

M. Windhouwer, I. Schuurman. ISOcat and RELcat, two cooperating semantic registries. At the 24th Meeting of Computational Linguistics in the Netherlands (CLIN 24), Leiden, The Netherlands, January 17, 2014.

www.isocat.org

ISOcat and RELcat: 2 cooperating Semantic Registries

Menzo Windhouwer
menzo.windhouwer@dans.knaw.nl
The Language Archive – DANS

Ineke Schuurman
ineke@ccl.kuleuven.be
KU Leuven, CLARIN-NL – Utrecht University

17 January 2014, CLIN 24

Outline

• The need for explicit semantics
  – ISOcat
• Mapping issues
  – Languages, theoretical frameworks
  – Granularity levels
  – RELcat
• CGN case study
• Conclusions and future work


Typological Database Nijmegen

TOP NOTION tds:Noun GROUPS{
  NOTION tdn:GrammaticalDistinctions
  LABEL "Grammatical distinctions for nouns."
  GROUPS {
    NOTION tdn:AgentNouns
    LABEL "Agent nouns."
    DESCRIPTION "Nouns can function as the agent of a clause."
    LINK TO CONCEPT agentRole
    GROUPS {
      NOTION tdn:v098_plusAffix
      LABEL "Agent nouns formed by verb stem plus affix."
      LINK TO CONCEPTS (agentRole, verbalMorphology, boundAffix)
      DESCRIPTION <p>Agent nouns are formed by a verb stem plus an affix, e.g. English <qv>walk-er</qv>.</p>
      NOTE AUTHOR IS "TDS" TYPE IS "original TDN label" "AGENT NOUNS ARE VERB STEM PLUS AFFIX"
      IS FIELD v098;
      ...

Explicit semantics!

Notes: TDN is not archived in TLA but curated in TDS, a previous project Menzo worked on, and is now archived at DANS; also, this is not a TDN punch card.

DOBES corpora

Explicit semantics!

Shared semantics!

ISOcat

• An open Data Category/Concept Registry where everyone can
  – find and select data categories/concepts
  – create new data categories/concepts
  – share data categories/concepts

• Each data category/concept has a Persistent Identifier which can be embedded in a resource (schema) to make the intended semantics (more) explicit
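
How embedded Persistent Identifiers make shared semantics visible can be sketched roughly as follows. This is only an illustration: the PID strings are placeholders on example.org (not real ISOcat identifiers), and the schemas, field names and the helper shared_fields are hypothetical.

```python
# Placeholder identifiers; real ISOcat PIDs are not reproduced here.
PID_POS = "https://example.org/datcat/part-of-speech"
PID_GENDER = "https://example.org/datcat/grammatical-gender"

# Two hypothetical resource schemas that name their fields differently but
# annotate each field with the PID of the data category it is meant to express.
SCHEMA_A = {"pos": PID_POS, "gen": PID_GENDER}
SCHEMA_B = {"woordsoort": PID_POS}


def shared_fields(schema_a, schema_b):
    """Pair up field names from two schemas that point to the same PID."""
    by_pid = {pid: name for name, pid in schema_b.items()}
    return {name: by_pid[pid] for name, pid in schema_a.items() if pid in by_pid}


SHARED = shared_fields(SCHEMA_A, SCHEMA_B)
```

Here SHARED pairs "pos" with "woordsoort" because both fields carry the same identifier, even though their local names differ.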


Mapping issues

• Interesting resources for a specific research question might
  – use very different theoretical frameworks, which might share few or no data categories/concepts
  – use coarser- or finer-grained data categories/concepts
• How to overcome these differences by mapping data categories/concepts to each other?


Some examples

• definite article (PoS)
  – EN: 1 (-)
  – FR: 2 (masc, fem)
  – NL: 2 (neuter, non-neuter)
  – DE: 3 (masc, fem, neuter)

Dutch ‘non-neuter’, for example, should be related to ‘masc’ and ‘fem’
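
The gender example can be sketched as a small lookup table. This is an illustration only: the per-language value sets restate the slide, while the names GENDER_VALUES, RELATED, related_values and comparable are hypothetical and not part of ISOcat or RELcat.

```python
# Definite-article gender distinctions per language, as listed on the slide.
GENDER_VALUES = {
    "EN": [],
    "FR": ["masc", "fem"],
    "NL": ["neuter", "non-neuter"],
    "DE": ["masc", "fem", "neuter"],
}

# Hypothetical relation table: a coarse value and the finer values it relates to.
RELATED = {"non-neuter": {"masc", "fem"}}


def related_values(value):
    """Return the value itself plus any finer-grained values related to it."""
    return {value} | RELATED.get(value, set())


def comparable(lang_a, lang_b):
    """Pair up values of two languages that are equal or related."""
    pairs = set()
    for a in GENDER_VALUES[lang_a]:
        for b in GENDER_VALUES[lang_b]:
            if b in related_values(a) or a in related_values(b):
                pairs.add((a, b))
    return pairs
```

With this table, comparing NL with DE pairs ‘non-neuter’ with both ‘masc’ and ‘fem’, which a plain equality check on value names would miss.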

Some examples

• Indirect object (syntax)
  – EN: indirect object
  – NL:
    • meewerkend voorwerp (1), or
    • meewerkend voorwerp (2) plus belanghebbend voorwerp
  – All translated as ‘indirect object’

=> 3 definitions of ‘indirect object’; the relations between them are to be shown!


Some examples

• Event (semantics)
  – ISO-TimeML: event and state, where ‘state’ is a type of event
  – Other theories (Kamp & Reyle, etc.): eventuality, with two subtypes: ‘event’ and ‘state’

Concepts ‘eventuality’, ‘event’ and ‘state’ are to be related

ISOcat internal issues

Data categories that are almost the same, apart from type, profile, language, …

Currently we insert a new data category (DC). The original one and the new one should, however, be marked as having a same-as relation.

RELcat

• A Relation Registry (under construction) to store
  – (almost) same-as relationships
  – subsumption relationships (isSuperClassOf, isSubClassOf)
  – mereology relationships (isPartOf, hasPart)
  – …
  between data categories/concepts
• The focus is on informal and possibly partial ontologies to be used for resource discovery
• Based on RDF triples
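
How such relations could support resource discovery can be sketched with plain Python tuples standing in for RDF triples. The predicate isSubClassOf is one of the relationship types listed above, and the concepts come from the earlier event example (the Kamp & Reyle view). The plain-string concept labels and the helpers superclasses and matches are assumptions made for this sketch, not RELcat's actual interface.

```python
# Relations as (subject, predicate, object) triples. Concept labels are plain
# strings standing in for persistent identifiers (an assumption).
TRIPLES = [
    ("event", "isSubClassOf", "eventuality"),
    ("state", "isSubClassOf", "eventuality"),
]


def superclasses(concept, triples):
    """Collect every concept reachable via isSubClassOf, transitively."""
    found = set()
    todo = [concept]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == "isSubClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found


def matches(query, annotation, triples):
    """A query matches an annotation that is the same concept or more specific."""
    return query == annotation or query in superclasses(annotation, triples)
```

A search for ‘eventuality’ would then also find resources annotated with ‘state’, while a search for ‘event’ would not.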

CGN case study

• Atomic building blocks of CGN tags are defined in ISOcat (still private)

• The EBNF schema of a CGN tag is stored in SCHEMAcat

• The subsumption relations in the value domains are stored in RELcat

• (almost) same-as relationships with other data categories/concepts are also stored in RELcat

CGN granularity mappings

• How to deal with (almost) same-as relationships that involve more than one atomic CGN data category/concept?
  – Example: N(SOORT) = Common Noun
• Based on the CGN EBNF this involves the following slots of the /CGN tag/
  – /PoS/ = /N/
  – /NTYPE/ = /SOORT/
• How to express this in RDF?

RELcat RDF mapping

• Data categories/concepts can function as subjects and objects in an RDF triple

• The predicate of an RDF triple is a RELcat relationship type

• Alternative: complex data categories as properties
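
How the triples for N(SOORT) = Common Noun might look can be sketched in Python with plain tuples. The predicates sameAs, isA, hasPart and hasValue are the edge labels used in the slides' diagrams; the exact shape of the graph, the anonymous nodes (names starting with "_:") and the helper functions are assumptions made for this sketch.

```python
# Hypothetical encoding of "N(SOORT) = Common Noun" as RDF-style triples.
# Names starting with "_:" are anonymous nodes invented for this sketch.
TRIPLES = {
    ("CommonNoun", "sameAs", "_:tag"),
    ("_:tag", "isA", "CGNtag"),
    ("_:tag", "hasPart", "_:pos"),
    ("_:tag", "hasPart", "_:ntype"),
    ("_:pos", "isA", "PoS"),
    ("_:pos", "hasValue", "N"),
    ("_:ntype", "isA", "NTYPE"),
    ("_:ntype", "hasValue", "SOORT"),
}


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def slot_values(concept):
    """Resolve a concept to the {slot: value} constraints of its tag pattern."""
    result = {}
    for tag in objects(concept, "sameAs"):
        for part in objects(tag, "hasPart"):
            for slot in objects(part, "isA"):
                for value in objects(part, "hasValue"):
                    result[slot] = value
    return result
```

Following sameAs from the Common Noun node and then hasPart, isA and hasValue recovers the two slot constraints /PoS/ = /N/ and /NTYPE/ = /SOORT/.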

N(SOORT) = Common Noun

[Figure: graph with the nodes Common Noun and CGN tag; edge labels: sameAs, isA]

N(SOORT) = Common Noun

[Figure: graph with the nodes Common Noun, CGN tag, PoS, NTYPE, N and SOORT; edge labels: sameAs, isA, hasPart, hasPotentialValue; annotations: "has more parts", "has more potential values"]

N(SOORT) = Common Noun

[Figure: graph with the nodes Common Noun, CGN tag, PoS, NTYPE, N and SOORT; edge labels: sameAs, isA, hasPart, hasValue, hasPotentialValue; annotations: "has more parts", "has more potential values"]

Cooperation between ISOcat and RELcat

• ISOcat: value domains of closed data categories
  – RELcat: hasPotentialValue (new relationship type)
• ISOcat: is-a relations between simple data categories
  – RELcat: subsumption relations
• SCHEMAcat: part-of relationships
  – RELcat: mereology relationships
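
The three correspondences above can be sketched as one conversion step. This is an assumption-laden illustration: the function relcat_triples and its input shapes are hypothetical, the values N and SOORT are the only domain values taken from the slides, and no is-a pairs are supplied because the slides give none.

```python
def relcat_triples(value_domains, is_a, schema_parts):
    """Turn ISOcat and SCHEMAcat information into RELcat-style triples.

    value_domains: {closed data category: [values]}   (ISOcat)
    is_a:          [(child, parent), ...]              (ISOcat)
    schema_parts:  {whole: [parts]}                    (SCHEMAcat)
    """
    triples = []
    for category, values in value_domains.items():
        triples += [(category, "hasPotentialValue", v) for v in values]
    triples += [(child, "isSubClassOf", parent) for child, parent in is_a]
    for whole, parts in schema_parts.items():
        triples += [(whole, "hasPart", part) for part in parts]
    return triples


# Only the values shown in the slides; the rest of each domain is left out.
EXAMPLE = relcat_triples(
    {"PoS": ["N"], "NTYPE": ["SOORT"]},
    [],
    {"CGN tag": ["PoS", "NTYPE"]},
)
```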

Conclusions and future work

• Simple mappings are easy
• Complex mappings easily get fairly complex
  – UI support?
  – DSL support?
  – Alternative RDF mapping?
• User front-end for RELcat
  – Integration of RELcat and ISOcat?

Other examples

• “JJR” -> “POS=adjective & degree=comparative”
• “Transitive” -> “thetavp=vp120 & synvps=[synNP] & caseAssigner=True”
• “VVIMP” -> “POS= verb & main verb & mood=imperative”
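
Mappings of this shape could be split into feature/value pairs along the following lines. The right-hand sides are copied from the examples above; the parser itself, including the choice to treat a bare term such as "main verb" as a flag set to True, is an assumption made for this sketch.

```python
def parse_mapping(expression):
    """Split 'a=b & c=d' into a dict; bare terms become flags set to True."""
    features = {}
    for term in expression.split("&"):
        term = term.strip()
        if not term:
            continue
        if "=" in term:
            name, value = term.split("=", 1)
            features[name.strip()] = value.strip()
        else:
            features[term] = True
    return features


MAPPINGS = {
    "JJR": "POS=adjective & degree=comparative",
    "Transitive": "thetavp=vp120 & synvps=[synNP] & caseAssigner=True",
    "VVIMP": "POS= verb & main verb & mood=imperative",
}

PARSED = {tag: parse_mapping(expr) for tag, expr in MAPPINGS.items()}
```

Each parsed dictionary shows how many atomic features a single tag bundles, which is the granularity problem that makes a simple one-to-one same-as relation insufficient.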