www.isocat.org
ISOcat and RELcat: two cooperating Semantic Registries
Menzo [email protected]
The Language Archive – DANS
Ineke [email protected]
KU Leuven, CLARIN-NL – Utrecht University
17 January 2014, CLIN 24
Outline
• The need for explicit semantics
  – ISOcat
• Mapping issues
  – Languages, theoretical frameworks
  – Granularity levels
  – RELcat
• CGN case study
• Conclusions and future work
Typological Database Nijmegen
TOP NOTION tds:Noun
GROUPS{
  NOTION tdn:GrammaticalDistinctions
  LABEL "Grammatical distinctions for nouns."
  GROUPS {
    NOTION tdn:AgentNouns
    LABEL "Agent nouns."
    DESCRIPTION "Nouns can function as the agent of a clause."
    LINK TO CONCEPT agentRole
    GROUPS {
      NOTION tdn:v098_plusAffix
      LABEL "Agent nouns formed by verb stem plus affix."
      LINK TO CONCEPTS (agentRole, verbalMorphology, boundAffix)
      DESCRIPTION <p>Agent nouns are formed by a verb stem plus an affix, e.g. English <qv>walk-er</qv>.</p>
      NOTE AUTHOR IS "TDS" TYPE IS "original TDN label" "AGENT NOUNS ARE VERB STEM PLUS AFFIX"
      IS FIELD v098;
      ...
Explicit semantics!
Notes: TDN is not archived in TLA, but curated in TDS, a previous project Menzo worked on, and now archived at DANS; also, this is not a TDN punchcard.
DOBES corpora
Explicit semantics!
Shared semantics!
ISOcat
• An open Data Category/Concept Registry where everyone can
  – find and select data categories/concepts
  – create new data categories/concepts
  – share data categories/concepts
• Each data category/concept has a Persistent Identifier which can be embedded in a resource (schema) to make the intended semantics (more) explicit
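As a minimal sketch of what embedding Persistent Identifiers in a resource schema could look like, the Python below attaches a PID to each local field name and uses the PIDs to find fields that two schemas share. The PID numbers (`DC-0001`, `DC-0002`), the field names, and the helpers `annotate` and `shared_semantics` are hypothetical placeholders, not real ISOcat entries or an ISOcat API; only the general `http://www.isocat.org/datcat/` prefix is assumed.

```python
# Hypothetical mapping from local schema field names to data category PIDs.
# The numeric identifiers are placeholders, not real ISOcat entries.
PID_PREFIX = "http://www.isocat.org/datcat/"

FIELD_TO_PID = {
    "pos": PID_PREFIX + "DC-0001",
    "gender": PID_PREFIX + "DC-0002",
}


def annotate(schema_fields):
    """Return (field, PID or None) pairs so unannotated fields stay visible."""
    return [(field, FIELD_TO_PID.get(field)) for field in schema_fields]


def shared_semantics(pid_map_a, pid_map_b):
    """Map fields of schema A to fields of schema B that carry the same PID."""
    fields_by_pid = {pid: field for field, pid in pid_map_b.items()}
    return {
        field: fields_by_pid[pid]
        for field, pid in pid_map_a.items()
        if pid in fields_by_pid
    }
```

Two schemas that name a field differently (say `pos` and `wordclass`) are recognised as sharing semantics as soon as both point to the same PID.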
Mapping issues
• Interesting resources for a specific research question might
  – use very different theoretical frameworks, which might share few or no data categories/concepts
  – use coarser- or finer-grained data categories/concepts
• How to overcome these differences by mapping data categories/concepts to each other?
Some examples
• definite article (PoS)
  – EN: 1 (-)
  – FR: 2 (masc, fem)
  – NL: 2 (neuter, non-neuter)
  – DE: 3 (masc, fem, neuter)
Dutch ‘non-neuter’, for example, should be related to ‘masc’ and ‘fem’
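The gender example can be sketched as a small lookup over subsumption pairs. Treating Dutch ‘non-neuter’ as a superclass of ‘masc’ and ‘fem’ is an assumption made for illustration (the slide only says the values should be related), and the `expand` helper is hypothetical.

```python
# Subsumption pairs (broader value, narrower value) for the article example.
# Treating 'non-neuter' as the superclass of 'masc' and 'fem' is an
# assumption made for illustration.
IS_SUPERCLASS_OF = {
    ("non-neuter", "masc"),
    ("non-neuter", "fem"),
}


def expand(value):
    """Return the value together with all narrower values it subsumes."""
    found = {value}
    frontier = [value]
    while frontier:
        current = frontier.pop()
        for broader, narrower in IS_SUPERCLASS_OF:
            if broader == current and narrower not in found:
                found.add(narrower)
                frontier.append(narrower)
    return found
```

A search for ‘non-neuter’ articles could then also match resources that only distinguish ‘masc’ and ‘fem’, while a search for ‘neuter’ is left unchanged.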
Some examples
• Indirect object (syntax)
  – EN: indirect object
  – NL:
    • meewerkend voorwerp (1), or
    • meewerkend voorwerp (2) plus belanghebbend voorwerp
  – All translated as ‘indirect object’
=> 3 definitions of ‘indirect object’; the relations between them need to be shown!
Some examples
• Event (semantics)
  – ISO-TimeML: event and state, where ‘state’ is a type of event
  – Other theories (Kamp & Reyle etc.): eventuality, with two subtypes: ‘event’ and ‘state’
Concepts ‘eventuality’, ‘event’ and ‘state’ are to be related
ISOcat internal issues
Data categories that are almost the same, apart from type, profile, language, …
Currently we insert a new data category (DC). Note, however, that the original and the new one should be marked as having a same-as relation.
RELcat
• A Relation Registry (under construction) to store
  – (almost) same-as relationships
  – subsumption relationships (isSuperClassOf, isSubClassOf)
  – mereology relationships (isPartOf, hasPart)
  – …
  between data categories/concepts
• The focus is on informal and possibly partial ontologies to be used for resource discovery
• Based on RDF triples
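The relationship types listed above come in inverse pairs, which a registry client could exploit when querying. The following is a minimal sketch in plain Python tuples rather than a real RDF store; the `with_inverses` helper is hypothetical, and treating same-as as symmetric is an assumption.

```python
# Inverse pairs taken from the relationship types named on the slide.
INVERSE = {
    "isSuperClassOf": "isSubClassOf",
    "isSubClassOf": "isSuperClassOf",
    "hasPart": "isPartOf",
    "isPartOf": "hasPart",
    "sameAs": "sameAs",  # assumption: same-as is treated as symmetric
}


def with_inverses(triples):
    """Add the inverse of every triple whose predicate has a known inverse."""
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in INVERSE:
            result.add((obj, INVERSE[predicate], subject))
    return result
```

Storing only one direction and deriving the other keeps the registry small while still letting a query start from either concept.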
CGN case study
• Atomic building blocks of CGN tags are defined in ISOcat (still private)
• The EBNF schema of a CGN tag is stored in SCHEMAcat
• The subsumption relations in the value domains are stored in RELcat
• (almost) same-as relationships with other data categories/concepts are also stored in RELcat
CGN granularity mappings
• How to deal with (almost) same-as relationships that involve more than one atomic CGN data category/concept?
  – Example: N(SOORT) = Common Noun
• Based on the CGN EBNF this involves the following slots of the /CGN tag/
  – /PoS/ = /N/
  – /NTYPE/ = /SOORT/
• How to express this in RDF?
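Before a tag such as N(SOORT) can be mapped, it has to be split into its slots. The sketch below does this with a regular expression; it is not the CGN EBNF stored in SCHEMAcat, and the `SLOT_NAMES` table and `parse_tag` helper are hypothetical. Only the slot names /PoS/ and /NTYPE/ come from the slide.

```python
import re

# Assumed slot order for nouns; only /PoS/ and /NTYPE/ come from the slide.
SLOT_NAMES = {"N": ["NTYPE"]}

# A part of speech followed by a parenthesised, comma-separated feature list.
TAG_PATTERN = re.compile(r"^(\w+)\(([^()]*)\)$")


def parse_tag(tag):
    """Turn 'N(SOORT)' into {'PoS': 'N', 'NTYPE': 'SOORT'}."""
    match = TAG_PATTERN.match(tag.strip())
    if match is None:
        raise ValueError("not a PoS(features) tag: " + tag)
    pos, feature_text = match.groups()
    features = [f.strip() for f in feature_text.split(",") if f.strip()]
    slots = {"PoS": pos}
    # Features beyond the known slot names are ignored in this sketch.
    for name, value in zip(SLOT_NAMES.get(pos, []), features):
        slots[name] = value
    return slots
```

Each slot/value pair in the result corresponds to one atomic data category, which is why a single same-as triple is not enough for the mapping.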
RELcat RDF mapping
• Data categories/concepts can function as subjects and objects in an RDF triple
• The predicate of an RDF triple is a RELcat relationship type
• Alternative: complex data categories as properties
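A simple mapping of this kind is one triple with two data categories as subject and object and a relationship type as predicate. The sketch below serializes such a triple as an N-Triples line; the data category numbers and the `http://example.org/relcat/` namespace are placeholders, not real ISOcat or RELcat identifiers, and `ntriple` is a hypothetical helper.

```python
# Placeholder URIs: neither the data category numbers nor the relation
# namespace are real ISOcat/RELcat identifiers.
DATCAT = "http://www.isocat.org/datcat/"
REL = "http://example.org/relcat/"  # hypothetical namespace for relationship types


def ntriple(subject, predicate, obj):
    """Serialize one triple of absolute URIs as an N-Triples line."""
    return "<{}> <{}> <{}> .".format(subject, predicate, obj)


# Subject and object are data categories; the predicate is a relationship type.
line = ntriple(DATCAT + "DC-0001", REL + "sameAs", DATCAT + "DC-0002")
```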
N(SOORT) = Common Noun
[Diagram: nodes "Common Noun" and "CGN tag"; edge labels sameAs and isA]
N(SOORT) = Common Noun
[Diagram: nodes "Common Noun", "CGN tag", PoS, NTYPE, N, SOORT; edge labels sameAs, isA, hasPart (2x), hasPotentialValue (2x); annotations "has more parts" and "has more potential values" (2x)]
N(SOORT) = Common Noun
[Diagram: nodes "Common Noun", "CGN tag", PoS, NTYPE, N, SOORT; edge labels sameAs, isA (5x), hasPart (4x), hasValue (2x), hasPotentialValue (2x); annotations "has more parts" and "has more potential values" (2x)]
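One possible reading of this diagram, written out as triples, is sketched below. Which nodes each edge connects cannot be recovered from the slide text, so the structure is an assumption: nodes starting with `_:` are anonymous instances (the concrete tag, its two slots, and their values), and the helpers `objects` and `values_are_licensed` are hypothetical.

```python
# One possible reading of the diagram; "_:" marks anonymous nodes.
# The endpoints of each edge are assumed, only the labels come from the slide.
TRIPLES = [
    ("_:tag", "isA", "CGN tag"),
    ("_:tag", "sameAs", "Common Noun"),
    ("_:tag", "hasPart", "_:pos"),
    ("_:tag", "hasPart", "_:ntype"),
    ("_:pos", "isA", "PoS"),
    ("_:ntype", "isA", "NTYPE"),
    ("_:pos", "hasValue", "_:n"),
    ("_:ntype", "hasValue", "_:soort"),
    ("_:n", "isA", "N"),
    ("_:soort", "isA", "SOORT"),
    ("CGN tag", "hasPart", "PoS"),
    ("CGN tag", "hasPart", "NTYPE"),
    ("PoS", "hasPotentialValue", "N"),
    ("NTYPE", "hasPotentialValue", "SOORT"),
]


def objects(subject, predicate):
    """All objects reachable from a subject via one predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def values_are_licensed():
    """Check that each slot instance only takes a value its slot type allows."""
    for slot, predicate, value in TRIPLES:
        if predicate != "hasValue":
            continue
        allowed = set()
        for slot_type in objects(slot, "isA"):
            allowed |= objects(slot_type, "hasPotentialValue")
        if not objects(value, "isA") <= allowed:
            return False
    return True
```

Under this reading, the complex mapping needs fourteen triples for a two-slot tag, which illustrates why complex mappings grow quickly.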
Cooperation between ISOcat and RELcat
• ISOcat: value domains of closed data categories
  – RELcat: hasPotentialValue (new relationship type)
• ISOcat: is-a relations between simple data categories
  – RELcat: subsumption relations
• SCHEMAcat: part-of relationships
  – RELcat: mereology relationships
Conclusions and future work
• Simple mappings are easy
• Complex mappings easily get fairly complex
  – UI support?
  – DSL support?
  – Alternative RDF mapping?
• User front-end for RELcat
  – Integration of RELcat and ISOcat?
Other examples
• “JJR” -> “POS=adjective & degree=comparative”
• “Transitive” -> “thetavp=vp120 & synvps=[synNP] & caseAssigner=True”
• “VVIMP” -> “POS= verb & main verb & mood=imperative”
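The right-hand sides of these examples already look like a small mapping language. As a rough sketch of what DSL support might start from, the hypothetical `parse_mapping` helper below splits such an expression into feature/value pairs; treating a term without `=` (such as `main verb`) as a boolean flag is an assumption.

```python
def parse_mapping(expression):
    """Split 'a=b & c' into a dict; terms without '=' become True flags."""
    features = {}
    for term in expression.split("&"):
        term = term.strip()
        if not term:
            continue
        if "=" in term:
            name, value = term.split("=", 1)
            features[name.strip()] = value.strip()
        else:
            # Assumption: a bare term is a flag that is simply present.
            features[term] = True
    return features
```

Each resulting pair would still have to be resolved to a data category and a value before it can be stored as a relation.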