M. Windhouwer, I. Schuurman. ISOcat and RELcat, two cooperating semantic registries. At the 24th Meeting of Computational Linguistics in the Netherlands (CLIN 24), Leiden, The Netherlands, January 17, 2014.
www.isocat.org
ISOcat and RELcat: 2 cooperating Semantic Registries
Menzo Windhouwer
The Language Archive – DANS
Ineke Schuurman
KU Leuven, CLARIN-NL – Utrecht University
17 January 2014, CLIN 24
Outline
• The need for explicit semantics
  – ISOcat
• Mapping issues
  – Languages, theoretical frameworks
  – Granularity levels
  – RELcat
• CGN case study
• Conclusions and future work
Typological Database Nijmegen
TOP NOTION tds:Noun
GROUPS {
  NOTION tdn:GrammaticalDistinctions
  LABEL "Grammatical distinctions for nouns."
  GROUPS {
    NOTION tdn:AgentNouns
    LABEL "Agent nouns."
    DESCRIPTION "Nouns can function as the agent of a clause."
    LINK TO CONCEPT agentRole
    GROUPS {
      NOTION tdn:v098_plusAffix
      LABEL "Agent nouns formed by verb stem plus affix."
      LINK TO CONCEPTS (agentRole, verbalMorphology, boundAffix)
      DESCRIPTION <p>Agent nouns are formed by a verb stem plus an affix, e.g. English <qv>walk-er</qv>.</p>
      NOTE AUTHOR IS "TDS" TYPE IS "original TDN label" "AGENT NOUNS ARE VERB STEM PLUS AFFIX"
      IS FIELD v098;
      ...
Explicit semantics!
Notes: TDN is not archived in TLA but curated in TDS, a previous project Menzo worked on, and is now archived at DANS; also, this is not a TDN punchcard.
DOBES corpora
Explicit semantics!
Shared semantics!
ISOcat
• An open Data Category/Concept Registry where everyone can
  – find and select data categories/concepts
  – create new data categories/concepts
  – share data categories/concepts
• Each data category/concept has a Persistent Identifier which can be embedded in a resource (schema) to make the intended semantics (more) explicit
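As an illustration of how embedded Persistent Identifiers make shared semantics detectable, here is a minimal Python sketch. The PIDs and field names are hypothetical placeholders, not real ISOcat identifiers, and the lookup is only one possible way a tool might compare two annotated schemas.

```python
# Hypothetical PIDs: placeholders, not real ISOcat identifiers.
POS_PID = "http://example.org/datcat/DC-1"
GENDER_PID = "http://example.org/datcat/DC-2"

# Two resource schemas that name their fields differently but annotate
# them with data category PIDs (field names are made up for this sketch).
schema_a = {"pos": POS_PID, "gender": GENDER_PID}
schema_b = {"woordsoort": POS_PID}


def shared_semantics(a, b):
    """Return sorted pairs of field names (one per schema) that carry the same PID."""
    fields_by_pid = {}
    for field, pid in b.items():
        fields_by_pid.setdefault(pid, []).append(field)
    return sorted(
        (field_a, field_b)
        for field_a, pid in a.items()
        for field_b in fields_by_pid.get(pid, [])
    )


print(shared_semantics(schema_a, schema_b))
```

The field names differ, but the shared PID shows that both fields are meant to carry the same data category.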
Mapping issues
• Interesting resources for a specific research question might
  – use very different theoretical frameworks, which might share few or no data categories/concepts
  – use coarser- or finer-grained data categories/concepts
• How to overcome these differences by mapping data categories/concepts to each other?
Some examples
• definite article (PoS)
  – EN: 1 (-)
  – FR: 2 (masc, fem)
  – NL: 2 (neuter, non-neuter)
  – DE: 3 (masc, fem, neuter)
Dutch ‘non-neuter’, for example, should be related to ‘masc’ and ‘fem’
Some examples
• Indirect object (syntax)
  – EN: indirect object
  – NL:
    • meewerkend voorwerp (1), or
    • meewerkend voorwerp (2) plus belanghebbend voorwerp
  – All translated as ‘indirect object’
=> 3 definitions of ‘indirect object’; the relations between them need to be shown!
Some examples
• Event (semantics)
  – ISO-TimeML: event and state, where ‘state’ is a type of event
  – Other theories (Kamp & Reyle etc.): eventuality, with two subtypes: ‘event’ and ‘state’
Concepts ‘eventuality’, ‘event’ and ‘state’ are to be related
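One way to picture how such relations could support resource discovery is the Python sketch below. The prefixes `timeml:` and `kr:`, the relation name `almostSameAs`, and the specific links chosen are all assumptions made for illustration; the slides only state that the three concepts need to be related, not how.

```python
# Assumed relations between the two frameworks (illustrative only).
RELATIONS = [
    ("timeml:state", "isSubClassOf", "timeml:event"),
    ("kr:event", "isSubClassOf", "kr:eventuality"),
    ("kr:state", "isSubClassOf", "kr:eventuality"),
    ("timeml:event", "almostSameAs", "kr:eventuality"),
    ("timeml:state", "almostSameAs", "kr:state"),
]


def expand(concept):
    """Collect the concept, its (almost) equivalents and all their subclasses."""
    found, todo = set(), [concept]
    while todo:
        current = todo.pop()
        if current in found:
            continue
        found.add(current)
        for subj, pred, obj in RELATIONS:
            if pred == "isSubClassOf" and obj == current:
                todo.append(subj)  # follow subsumption downwards
            elif pred == "almostSameAs" and current in (subj, obj):
                todo.append(obj if subj == current else subj)  # either direction
    return found


print(sorted(expand("kr:eventuality")))
print(sorted(expand("timeml:state")))
```

Under these assumed links, a search for the broad concept ‘eventuality’ reaches all five concepts, while a search for the narrower TimeML ‘state’ reaches only the two state concepts.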
ISOcat internal issues
Data categories that are almost the same, apart from type, profile, language, …
Currently we insert a new data category (DC). Note, however, that the original and the new one should be marked as having a same-as relation
RELcat
• A Relation Registry (under construction) to store
  – (almost) same-as relationships
  – subsumption relationships (isSuperClassOf, isSubClassOf)
  – mereology relationships (isPartOf, hasPart)
  – …
  between data categories/concepts
• The focus is on informal and possibly partial ontologies to be used for resource discovery
• Based on RDF triples
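The relationship types listed above come in inverse pairs, which a small Python sketch can make concrete. Triples are modelled as plain tuples; the `ex:` names are hypothetical, and treating `sameAs` as symmetric and materialising inverse triples are assumptions of this sketch, not documented RELcat behaviour.

```python
# Inverse pairs taken from the relationship types named on the slide;
# the symmetric treatment of sameAs is an assumption of this sketch.
INVERSE = {
    "isSuperClassOf": "isSubClassOf",
    "isSubClassOf": "isSuperClassOf",
    "isPartOf": "hasPart",
    "hasPart": "isPartOf",
    "sameAs": "sameAs",
}


def with_inverses(triples):
    """Return the given triples plus the inverse of every triple with a known inverse."""
    closed = set(triples)
    for subj, pred, obj in triples:
        if pred in INVERSE:
            closed.add((obj, INVERSE[pred], subj))
    return closed


# Hypothetical data categories used only to exercise the function.
stored = {
    ("ex:noun", "isSuperClassOf", "ex:commonNoun"),
    ("ex:tag", "hasPart", "ex:pos"),
}
closed = with_inverses(stored)
for triple in sorted(closed):
    print(triple)
```

Storing only one direction and deriving the other keeps the registry small while still letting a query start from either end of a relation.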
CGN case study
• Atomic building blocks of CGN tags are defined in ISOcat (still private)
• The EBNF schema of a CGN tag is stored in SCHEMAcat
• The subsumption relations in the value domains are stored in RELcat
• (almost) same-as relationships with other data categories/concepts are also stored in RELcat
CGN granularity mappings
• How to deal with (almost) same-as relationships that involve more than one atomic CGN data category/concept?
  – Example: N(SOORT) = Common Noun
• Based on the CGN EBNF this involves the following slots of the /CGN tag/
  – /PoS/ = /N/
  – /NTYPE/ = /SOORT/
• How to express this in RDF?
RELcat RDF mapping
• Data categories/concepts can function as subjects and objects in an RDF triple
• The predicate of an RDF triple is a RELcat relationship type
• Alternative: complex data categories as properties
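To make the first two rules concrete, the following Python sketch writes N(SOORT) = Common Noun as a set of triples and reads the slot values back. The node names (`cgn:...`, `ex:commonNoun`, the `_:` slot nodes) and the exact graph shape are assumptions pieced together from the relation names used in this talk (isA, hasPart, hasValue, sameAs); this is not the authors' actual RELcat encoding.

```python
# Data categories act as subjects and objects; relation types act as predicates.
# All identifiers are illustrative assumptions.
TRIPLES = {
    ("cgn:N(SOORT)", "isA", "cgn:tag"),
    ("cgn:N(SOORT)", "hasPart", "_:pos"),
    ("cgn:N(SOORT)", "hasPart", "_:ntype"),
    ("_:pos", "isA", "cgn:PoS"),
    ("_:pos", "hasValue", "cgn:N"),
    ("_:ntype", "isA", "cgn:NTYPE"),
    ("_:ntype", "hasValue", "cgn:SOORT"),
    ("cgn:N(SOORT)", "sameAs", "ex:commonNoun"),
}


def objects(subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def slot_values(tag):
    """Map each slot type of the tag to the value filled in for that slot."""
    result = {}
    for part in objects(tag, "hasPart"):
        slot_type = objects(part, "isA")[0]
        result[slot_type] = objects(part, "hasValue")[0]
    return result


print(slot_values("cgn:N(SOORT)"))
print(objects("cgn:N(SOORT)", "sameAs"))
```

Even this single tag needs eight triples, which illustrates how quickly a complex mapping grows compared with a simple one-to-one same-as link.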
N(SOORT) = Common Noun
Figure (slide 15): graph with nodes "CGN tag" and "Common Noun"; relation labels: isA, sameAs.
Figure (slide 16): graph with nodes "CGN tag", "PoS", "NTYPE", "N", "SOORT" and "Common Noun"; relation labels: isA, hasPart (the tag has more parts), hasPotentialValue (each slot has more potential values), sameAs.
Figure (slides 17 and 18): the same graph extended with hasValue and additional isA relations.
Cooperation between ISOcat and RELcat
• ISOcat: value domains of closed data categories
  – RELcat: hasPotentialValue (new relationship type)
• ISOcat: is-a relations between simple data categories
  – RELcat: subsumption relations
• SCHEMAcat: part-of relationships
  – RELcat: mereology relationships
Conclusions and future work
• Simple mappings are easy
• Complex mappings quickly become fairly complex
  – UI support?
  – DSL support?
  – Alternative RDF mapping?
• User front-end for RELcat
  – Integration of RELcat and ISOcat?
Other examples
• “JJR” -> “POS=adjective & degree=comparative”
• “Transitive” -> “thetavp=vp120 & synvps=[synNP] & caseAssigner=True”
• “VVIMP” -> “POS=verb & main verb & mood=imperative”
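The right-hand sides of these mappings are conjunctions of feature constraints, which could be handled by a small parser such as the Python sketch below. The parsing rules (split on "&", treat a part without "=" as a boolean flag, keep all values as strings) are assumptions of this sketch, not a format defined in the talk.

```python
def parse_mapping(expression):
    """Split 'a=b & c & d=e' into a feature dict; bare parts become True flags."""
    features = {}
    for part in expression.split("&"):
        part = part.strip()
        if "=" in part:
            key, value = part.split("=", 1)
            features[key.strip()] = value.strip()
        else:
            features[part] = True
    return features


# The three mappings from the slide, keyed by the coarse-grained tag.
MAPPINGS = {
    "JJR": "POS=adjective & degree=comparative",
    "Transitive": "thetavp=vp120 & synvps=[synNP] & caseAssigner=True",
    "VVIMP": "POS=verb & main verb & mood=imperative",
}

parsed = {tag: parse_mapping(expr) for tag, expr in MAPPINGS.items()}
for tag, features in parsed.items():
    print(tag, features)
```

Each coarse tag then expands into several atomic feature/value pairs, the same granularity problem as the N(SOORT) case.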