Page 1: ISOcat and RELcat, two cooperating semantic registries

www.isocat.org

ISOcat and RELcat: 2 cooperating Semantic Registries

Menzo

The Language Archive – DANS

Ineke

KU Leuven, CLARIN-NL – Utrecht University

17 January 2014, CLIN 24

Page 2

Outline

• The need for explicit semantics
  – ISOcat

• Mapping issues
  – Languages, theoretical frameworks
  – Granularity levels
  – RELcat

• CGN case study
• Conclusions and future work


ccl
Added ‘languages’ alongside ‘theoretical frameworks’; fixed a typo in ‘theoretical’
Page 3

Typological Database Nijmegen

TOP NOTION tds:Noun
GROUPS {
  NOTION tdn:GrammaticalDistinctions
  LABEL "Grammatical distinctions for nouns."
  GROUPS {
    NOTION tdn:AgentNouns
    LABEL "Agent nouns."
    DESCRIPTION "Nouns can function as the agent of a clause."
    LINK TO CONCEPT agentRole
    GROUPS {
      NOTION tdn:v098_plusAffix
      LABEL "Agent nouns formed by verb stem plus affix."
      LINK TO CONCEPTS (agentRole, verbalMorphology, boundAffix)
      DESCRIPTION <p>Agent nouns are formed by a verb stem plus an affix, e.g. English <qv>walk-er</qv>.</p>
      NOTE AUTHOR IS "TDS" TYPE IS "original TDN label" "AGENT NOUNS ARE VERB STEM PLUS AFFIX"
      IS FIELD v098;
      ...

Explicit semantics!

Notes: TDN is not archived in TLA, but curated in TDS, a previous project Menzo worked on, and is now archived at DANS; also, this is not a TDN punchcard.

Page 4

DOBES corpora

Explicit semantics!

Shared semantics!

Page 5

ISOcat

• An open Data Category/Concept Registry where everyone can
  – find and select data categories/concepts
  – create new data categories/concepts
  – share data categories/concepts

• Each data category/concept has a Persistent Identifier which can be embedded in a resource (schema) to make the intended semantics (more) explicit
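As a minimal sketch of what embedding a PID in a schema can look like, the Python snippet below attaches a data category reference to an element declaration in an XML Schema and reads the references back. It assumes a `dcr:datcat` attribute in the `http://www.isocat.org/ns/dcr` namespace as the annotation convention (treat this as an assumption), and the PID `http://www.isocat.org/datcat/DC-0000` is a hypothetical placeholder, not a real data category.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
# Assumption: data category references use a dcr:datcat attribute in this namespace.
DCR = "http://www.isocat.org/ns/dcr"
ET.register_namespace("xs", XS)
ET.register_namespace("dcr", DCR)
DATCAT = f"{{{DCR}}}datcat"


def annotate(schema, element_name, pid):
    """Attach a data category PID to a top-level xs:element declaration."""
    for decl in schema.findall(f"{{{XS}}}element"):
        if decl.get("name") == element_name:
            decl.set(DATCAT, pid)
            return
    raise KeyError(f"no element declaration named {element_name!r}")


def datcat_links(schema):
    """Map each annotated element declaration (at any depth) to its PID."""
    return {
        decl.get("name"): decl.get(DATCAT)
        for decl in schema.iter(f"{{{XS}}}element")
        if decl.get(DATCAT) is not None
    }


schema = ET.fromstring(
    f'<xs:schema xmlns:xs="{XS}">'
    '<xs:element name="pos" type="xs:string"/>'
    '<xs:element name="lemma" type="xs:string"/>'
    "</xs:schema>"
)
# Hypothetical PID, used for illustration only.
annotate(schema, "pos", "http://www.isocat.org/datcat/DC-0000")
print(datcat_links(schema))
print(ET.tostring(schema, encoding="unicode"))
```

A tool that knows this convention can then collect, for any schema, which elements carry which registered semantics.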


ccl
Removed capitals in the first bullets; plural changed to singular in the second bullet (was a mix)
Page 6

Mapping issues

• Interesting resources for a specific research question might
  – use very different theoretical frameworks, which might share few or no data categories/concepts
  – use coarser- or finer-grained data categories/concepts

• How to overcome these differences by mapping data categories/concepts to each other?

ccl
eachother => each other
Page 7

Some examples

• definite article (PoS)
  – EN: 1 (-)
  – FR: 2 (masc, fem)
  – NL: 2 (neuter, non-neuter)
  – DE: 3 (masc, fem, neuter)

Dutch ‘non-neuter’, for example, should be related to ‘masc’ and ‘fem’
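A minimal Python sketch of how such relations could be recorded and used to translate values between languages. The value sets come from the slide; the relation triples are illustrative assumptions (Dutch ‘non-neuter’ is modelled as subsuming ‘masc’ and ‘fem’ via `isSuperClassOf`, and Dutch ‘neuter’ as `sameAs` German ‘neuter’), and `map_value` is a hypothetical helper, not an ISOcat or RELcat function.

```python
# Gender distinctions of the definite article per language, as on the slide.
GENDER_VALUES = {
    "EN": set(),
    "FR": {"masc", "fem"},
    "NL": {"neuter", "non-neuter"},
    "DE": {"masc", "fem", "neuter"},
}

# Hypothetical cross-language relations as (subject, relation, object) triples.
RELATIONS = [
    (("NL", "non-neuter"), "isSuperClassOf", ("FR", "masc")),
    (("NL", "non-neuter"), "isSuperClassOf", ("FR", "fem")),
    (("NL", "non-neuter"), "isSuperClassOf", ("DE", "masc")),
    (("NL", "non-neuter"), "isSuperClassOf", ("DE", "fem")),
    (("NL", "neuter"), "sameAs", ("DE", "neuter")),
]


def map_value(src_lang, value, tgt_lang):
    """Target-language values that the source value is the same as or subsumes."""
    return {
        o_val
        for (s_lang, s_val), rel, (o_lang, o_val) in RELATIONS
        if (s_lang, s_val) == (src_lang, value)
        and o_lang == tgt_lang
        and rel in {"sameAs", "isSuperClassOf"}
        and o_val in GENDER_VALUES[tgt_lang]
    }


print(map_value("NL", "non-neuter", "DE"))
print(map_value("NL", "non-neuter", "EN"))
```

English yields an empty set, reflecting that it makes no gender distinction on the definite article.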

Page 8

Some examples

• Indirect object (syntax)
  – EN: indirect object
  – NL:
    • meewerkend voorwerp (1), or
    • meewerkend voorwerp (2) plus belanghebbend voorwerp
  – All translated as ‘indirect object’

=> 3 definitions of ‘indirect object’; the relations between them are to be shown!

ccl
‘synt’ now written out in full
Page 9

Some examples

• Event (semantics)
  – ISO-TimeML: event and state, where ‘state’ is a type of event
  – Other theories (Kamp & Reyle etc.): eventuality, with two subtypes: ‘event’ and ‘state’

Concepts ‘eventuality’, ‘event’ and ‘state’ are to be related

Page 10

ISOcat internal issues

Data categories that are almost the same, apart from type, profile, language, …

Currently we insert a new DC, but note that the original and the new one should be marked as having a same-as relation
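If such same-as links are treated as exact equivalences (an assumption: ‘almost same-as’ links would not compose this way), the near-duplicate DCs fall into equivalence classes, so a search for one DC can be expanded to all of its copies. The Python sketch below groups them with a union-find structure; the `DC-a`, `DC-b`, … identifiers are hypothetical.

```python
def same_as_classes(pairs):
    """Group identifiers connected by same-as links (union-find with path halving)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    classes = {}
    for x in parent:
        classes.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in classes.values())


# Hypothetical DCs: DC-b copies DC-a with another profile, DC-c copies DC-b in another language.
links = [("DC-a", "DC-b"), ("DC-b", "DC-c"), ("DC-x", "DC-y")]
print(same_as_classes(links))
```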

Page 11

RELcat

• A Relation Registry (under construction) to store
  – (almost) same-as relationships
  – subsumption relationships (isSuperClassOf, isSubClassOf)
  – mereology relationships (isPartOf, hasPart)
  – …
  between data categories/concepts

• The focus is on informal and possibly partial ontologies to be used for resource discovery

• Based on RDF triples
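To illustrate how subsumption triples can support resource discovery, the sketch below stores relations as plain (subject, predicate, object) tuples in Python and expands a query concept to everything it transitively subsumes, normalising `isSubClassOf` to its inverse `isSuperClassOf`. It reuses the event/state example from the earlier slide; the concept names are illustrative strings rather than ISOcat PIDs, and no real RDF library or actual RELcat vocabulary URI is used.

```python
from collections import defaultdict, deque

# Illustrative (subject, predicate, object) triples; names are not real PIDs.
TRIPLES = [
    # Kamp & Reyle style: eventuality with the subtypes event and state.
    ("eventuality", "isSuperClassOf", "event"),
    ("eventuality", "isSuperClassOf", "state"),
    # ISO-TimeML style: state is a type of event (stated in the inverse direction).
    ("timeml:state", "isSubClassOf", "timeml:event"),
]

INVERSE = {"isSubClassOf": "isSuperClassOf", "isPartOf": "hasPart"}


def narrower(triples):
    """Index each concept to its direct subclasses."""
    index = defaultdict(set)
    for s, p, o in triples:
        if p in INVERSE:  # rewrite inverse relations into one canonical direction
            s, p, o = o, INVERSE[p], s
        if p == "isSuperClassOf":
            index[s].add(o)
    return index


def expand(concept, triples):
    """Return the concept plus everything it transitively subsumes (breadth-first)."""
    index = narrower(triples)
    seen, queue = {concept}, deque([concept])
    while queue:
        for sub in index[queue.popleft()]:
            if sub not in seen:
                seen.add(sub)
                queue.append(sub)
    return seen


print(sorted(expand("eventuality", TRIPLES)))
print(sorted(expand("timeml:event", TRIPLES)))
```

Because the ontology may be partial, the expansion only ever widens a query along relations that were actually recorded.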

Page 12

CGN case study

• Atomic building blocks of CGN tags are defined in ISOcat (still private)

• The EBNF schema of a CGN tag is stored in SCHEMAcat

• The subsumption relations in the value domains are stored in RELcat

• (almost) same-as relationships with other data categories/concepts are also stored in RELcat


Page 13

CGN granularity mappings

• How to deal with (almost) same-as relationships that involve more than one atomic CGN data category/concept?
  – Example: N(SOORT) = Common Noun

• Based on the CGN EBNF this involves the following slots of the /CGN tag/:
  – /PoS/ = /N/
  – /NTYPE/ = /SOORT/

• How to express this in RDF?
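To make the slot decomposition concrete, a minimal Python sketch can split a tag such as N(SOORT) into the slots listed above. It assumes a simplified tag shape `POS(value,value,…)` and a hypothetical per-PoS slot order in which the first value of a noun tag fills NTYPE; the actual CGN EBNF stored in SCHEMAcat defines more slots and values than this.

```python
import re

# Hypothetical slot order per PoS; only the NTYPE slot of nouns is modelled here.
SLOT_ORDER = {"N": ["NTYPE"]}

TAG = re.compile(r"^(?P<pos>[A-Z]+)\((?P<values>[^()]*)\)$")


def tag_slots(tag):
    """Split a tag like 'N(SOORT)' into {'PoS': 'N', 'NTYPE': 'SOORT'}."""
    m = TAG.match(tag.strip())
    if m is None:
        raise ValueError(f"not a tag: {tag!r}")
    pos = m["pos"]
    values = [v.strip() for v in m["values"].split(",") if v.strip()]
    slots = SLOT_ORDER.get(pos, [])
    if len(values) > len(slots):
        raise ValueError(f"too many values for {pos}: {values}")
    return {"PoS": pos, **dict(zip(slots, values))}


# The mapping from the slide: Common Noun corresponds to this combination of slots.
MAPPINGS = {"Common Noun": tag_slots("N(SOORT)")}
print(MAPPINGS)
```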

Page 14

RELcat RDF mapping

• Data categories/concepts can function as subjects and objects in an RDF triple

• The predicate of an RDF triple is a RELcat relationship type

• Alternative: complex data categories as properties

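A minimal sketch of the first option, under stated assumptions: the Python snippet below writes N(SOORT) = Common Noun as N-Triples, with data categories as subjects and objects and relationship types as predicates. Because the complex tag has no identifier of its own, it is represented here by a blank node (`_:nsoort`) that hasPart two slot nodes, each with an isA and a hasValue; this is one possible encoding loosely following the diagrams, and every URI under `http://example.org/` is a hypothetical placeholder, not the ISOcat or RELcat vocabulary.

```python
BASE = "http://example.org/"  # hypothetical namespace, not the real ISOcat/RELcat URIs


def term(name):
    """Blank nodes ('_:x') stay as-is; other names become IRIs under BASE."""
    if name.startswith("_:"):
        return name
    return f"<{BASE}{name.replace(' ', '_')}>"


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


# One possible encoding of N(SOORT) = Common Noun; _:nsoort stands for the complex tag.
MAPPING = [
    ("_:nsoort", "sameAs", "Common Noun"),
    ("_:nsoort", "isA", "CGN tag"),
    ("_:nsoort", "hasPart", "_:pos"),
    ("_:nsoort", "hasPart", "_:ntype"),
    ("_:pos", "isA", "PoS"),
    ("_:pos", "hasValue", "N"),
    ("_:ntype", "isA", "NTYPE"),
    ("_:ntype", "hasValue", "SOORT"),
]

print(to_ntriples(MAPPING))
```

Even this small mapping needs eight triples, which is why the alternative of complex data categories as properties may be attractive.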

Page 15

N(SOORT) = Common Noun

[Diagram: nodes ‘Common Noun’ and ‘CGN tag’; edge labels sameAs and isA]

Page 16

N(SOORT) = Common Noun

[Diagram: nodes ‘Common Noun’, ‘CGN tag’, ‘PoS’, ‘NTYPE’, ‘N’ and ‘SOORT’; edge labels sameAs, isA, hasPart (×2, ‘has more parts’) and hasPotentialValue (×2, each ‘has more potential values’)]

Page 17

N(SOORT) = Common Noun

[Diagram: nodes ‘Common Noun’, ‘CGN tag’, ‘PoS’, ‘NTYPE’, ‘N’ and ‘SOORT’; edge labels sameAs, isA (×5), hasPart (×4, ‘has more parts’), hasValue (×2) and hasPotentialValue (×2, each ‘has more potential values’)]


Page 19

Cooperation between ISOcat and RELcat

• ISOcat: value domains of closed data categories
  – RELcat: hasPotentialValue (new relationship type)

• ISOcat: is-a relations between simple data categories
  – RELcat: subsumption relations

• SCHEMAcat: part-of relationships
  – RELcat: mereology relationships
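With value domains available as hasPotentialValue triples, a mapping can be checked mechanically: every value assigned to a slot should be one of that slot's potential values. The Python sketch below assumes that representation; the domains are truncated to the values mentioned on these slides (the real ones have more potential values), `invalid_values` is a hypothetical helper, and the value `XYZ` in the second call is made up to show a failure.

```python
# hasPotentialValue triples: closed data category -> allowed value (truncated domains).
POTENTIAL = {
    ("PoS", "hasPotentialValue", "N"),
    ("NTYPE", "hasPotentialValue", "SOORT"),
}


def invalid_values(assignments, potential=POTENTIAL):
    """Return the (slot, value) pairs that fall outside the slot's value domain."""
    allowed = {(s, o) for s, p, o in potential if p == "hasPotentialValue"}
    return [(slot, value) for slot, value in assignments if (slot, value) not in allowed]


print(invalid_values([("PoS", "N"), ("NTYPE", "SOORT")]))  # the N(SOORT) mapping
print(invalid_values([("PoS", "N"), ("NTYPE", "XYZ")]))    # hypothetical bad value
```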

Page 20

Conclusions and future work

• Simple mappings are easy
• Complex mappings quickly become fairly complex
  – UI support?
  – DSL support?
  – Alternative RDF mapping?

• User front-end for RELcat
  – Integration of RELcat and ISOcat?

Page 21

Other examples

• “JJR” -> “POS=adjective & degree=comparative”
• “Transitive” -> “thetavp=vp120 & synvps=[synNP] & caseAssigner=True”
• “VVIMP” -> “POS=verb & main verb & mood=imperative”
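Mappings of this shape hint at what DSL support could look like. As a hypothetical sketch (not an existing ISOcat or RELcat feature), the Python parser below reads such a rule into its source tag, a set of feature/value constraints for `key=value` terms, and a list of unvalued flags for bare terms such as ‘main verb’; a bracketed value like `[synNP]` is simply kept as a literal string.

```python
def parse_mapping(rule):
    """Parse 'TAG -> a=b & c & d=e' into (TAG, {feature: value}, [flags])."""
    tag, _, rhs = rule.partition("->")
    features, flags = {}, []
    for term in rhs.split("&"):
        term = term.strip()
        if not term:
            continue
        if "=" in term:
            key, _, value = term.partition("=")
            features[key.strip()] = value.strip()
        else:
            flags.append(term)  # bare term without a value, e.g. 'main verb'
    return tag.strip(), features, flags


RULES = [
    "JJR -> POS=adjective & degree=comparative",
    "Transitive -> thetavp=vp120 & synvps=[synNP] & caseAssigner=True",
    "VVIMP -> POS=verb & main verb & mood=imperative",
]

for rule in RULES:
    print(parse_mapping(rule))
```

Each parsed rule could then be turned into one hasPart/hasValue group per feature, along the lines of the N(SOORT) example.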