Verginica BARBU MITITELU
REȚEA SEMANTICO-DERIVAȚIONALĂ PENTRU LIMBA ROMÂNĂ (A SEMANTIC-DERIVATIONAL NETWORK FOR ROMANIAN)
Author: Verginica BARBU MITITELU
Scientific advisor: Acad. Grigore BRÂNCUŞ
Work carried out within the project „Valorificarea identităților culturale în procesele globale” ("Valuing Cultural Identities in Global Processes"), co-financed by the European Social Fund through the Sectoral Operational Programme Human Resources Development 2007-2013, financing contract no. POSDRU/89/1.5/S/59758. The titles and the intellectual and industrial property rights to the results obtained during the postdoctoral research fellowship belong to the Romanian Academy.
The views expressed in this work are those of the author and do not commit the European Commission or the Romanian Academy, the beneficiary of the project.
Free copy. Sale in Romania or abroad is prohibited.
Reproduction, even partial and on any medium, is permitted only with the prior consent of the Romanian Academy.
ISBN 978-973-167-173-4 Legal deposit: 2nd quarter 2013
Publisher: Editura Muzeului Național al Literaturii Române
Series: AULA MAGNA
Contents
INTRODUCTION
Background
Research aim
Research objectives
Book outline
CHAPTER 1
LEXICAL KNOWLEDGE
1.1. Definition
1.2. Factors for organizing the vocabulary
1.3. Formalisms for representing lexical knowledge
1.4. Semantic networks
Conclusions
CHAPTER 2
PRINCETON WORDNET AND THE ROMANIAN WORDNET
2.1. Princeton WordNet: content and organization
2.1.1. Grammatical categories
2.1.2. Grouping words depending on their grammatical category
2.1.3. Lexicalized concepts
2.1.4. Relations
2.1.5. Sense – form
2.1.6. Homonymy and polysemy
2.1.7. Glosses
2.1.8. Sense numbering
2.2. Wordnet-style lexicography
2.2.1. Methods for creating a wordnet
2.2.2. MultiWordNet
2.3. The Romanian wordnet
2.3.1. Methodology
2.3.2. Linguistic resources used
2.3.3. Implemented concepts
2.3.4. Instruments
2.3.5. Distinctive features
2.3.6. Quality assurance
2.3.7. Applications in which RoWN was used
2.3.8. Access
2.3.9. RoWN's value
2.3.10. Motivating the need to add derivational relations to the wordnet
Conclusions
CHAPTER 3
ENRICHING THE VOCABULARY
3.1. Means of enriching the vocabulary
3.2. The status of the word formation domain
3.2.1. The status of prefixation among the other means of word formation
3.2.2. The status of suffixation
3.3. Prefixes
3.3.1. Definition
3.3.2. Criteria for classifying prefixes
3.3.3. Prefixes versus phenomena at the beginning of words
3.4. Suffixes
3.4.1. Definition
3.4.2. Criteria for classifying suffixes
3.5. Pseudo-prefixes and pseudo-suffixes
3.5.1. Definition
3.5.2. Other terms
3.5.3. Pseudo-prefixes and prefixes
3.5.4. Pseudo-suffixes and suffixes
3.5.5. Pseudo-prefixes and pseudo-suffixes
3.6. Noun suffixation
3.6.1. Definition
3.6.2. The morphological class of the bases of suffixed noun derivatives
3.6.3. The semantic value of noun suffixes
3.6.4. The origin of noun suffixes
3.6.5. The structure of noun suffixes
3.6.6. The productivity of noun suffixes
3.7. Verbal suffixes
3.7.1. Definition
3.7.2. The morphological class of the bases of suffixed verbal derivatives
3.7.3. Semantic values of verbal suffixes
3.7.4. The origin of verbal suffixes
3.7.5. The structure of verbal suffixes
3.7.6. The verbal suffix Ø (immediate derivation)
3.7.7. The productivity of verbal suffixes
3.8. Adjective suffixes
3.8.1. Definition
3.8.2. The morphological class of the bases of suffixed adjective derivatives
3.8.3. The semantic value of adjective suffixes
3.8.4. The origin of adjective suffixes
3.8.5. The structure of adjective suffixes
3.8.6. The productivity of adjective suffixes
3.9. Adverb suffixes
3.9.1. Definition
3.9.2. The morphological class of the bases of suffixed adverb derivatives
3.9.3. The semantic value of adverb suffixes
3.9.4. The origin of adverb suffixes
3.9.5. The structure of adverb suffixes
3.9.6. The productivity of adverb suffixes
Conclusions
CHAPTER 4
CREATING THE AFFIX LISTS
4.1. Manual creation of the inventory of Romanian affixes
4.2. Resources and tools used for the automatic identification of suffixes in words
4.2.1. Linguistic resources
4.2.2. Description of the algorithm for identifying suffixes in words
4.3. Results and interpretation
Conclusions
CHAPTER 5
IDENTIFYING BASE-DERIVATIVE PAIRS IN ROWN
5.1. Other wordnets with derivational relations
5.2. The specificity of derivational relations
5.3. Linguistic resources used for identifying derivatives in RoWN
5.4. Methodology
5.5. Validation
5.5.1. Validation of prefixed words
5.5.2. Validation of suffixed words
5.6. Preparing the annotation
Conclusions
CHAPTER 6
MORPHO-SEMANTIC ANNOTATION OF BASE-DERIVATIVE PAIRS
6.1. Annotation principles
6.2. Possibilities for automating the annotation of derivational relations
6.3. Properties of derivational relations
6.4. Semantic labels
6.4.1. Semantic labels for prefixed words
6.4.2. Semantic labels for suffixed words
6.5. Remarks on the annotation
Conclusions
CHAPTER 7
RESULTS. STATISTICS. DISCUSSION
7.1. Affixes and their frequency
7.2. Data on derivatives
7.3. Comparison with EXPD
7.4. Data on the semantic labels
7.5. Density of relations in RoWN
CONCLUSIONS
Acknowledgements
BIBLIOGRAPHY
APPENDIX 1. List of Romanian affixes
APPENDIX 2. Romanian suffixes identified in RoWN (147). Possible combinations with parts of speech. The resulting derivatives, given by their part of speech
APPENDIX 3. Affixes identified in the manually validated derived words and the number of derivatives in which they were identified
APPENDIX 4. Semantic labels. Number of occurrences. Specific affixes. Number of occurrences
ADDENDA
ABSTRACT
SUMMARY
ADDENDA
Abstract
On 26 September 2012 (the European Day of Languages, established in 2001), META-NET, a network of excellence made up of 60 centers from 34 countries that aims to build the technological foundations of the European information society, published a report called Languages in the European Information Society (http://www.meta-net.eu/whitepapers/overview). The report presents the state of language technology for 30 European languages and explains the most urgent risks. Its results draw attention to the fact that digital support for 21 of the 30 languages is non-existent or weak. In the digital era we live in, it is vital that users be able to use their mother tongue on the Internet and that resources and tools exist for processing it.
According to the specialists' report, the situation for Romanian is as follows: in general, most domains of Natural Language Processing are covered (except for language generation, dialogue management systems and multimodal corpora); there is no treebank for Romanian; speech processing lags behind text processing; the number of resources is smaller than the number of processing tools; most tools are not freely available; the resources are of high quality and sustainable.
In this international and national context, our research aimed mainly at improving one of the fundamental resources developed and owned by the Research Institute for Artificial Intelligence of the Romanian Academy (RACAI), a member of META‐NET, that is the Romanian wordnet, by marking the derivational relations between existing words and by semantically labelling those relations whenever possible.
Applications in computational linguistics obtain better results when they use richer, higher-quality linguistic resources. The Romanian semantic network is used by RACAI in tasks such as word sense disambiguation, question answering, information extraction and machine translation, so we considered it appropriate to increase its quality by marking the relations between stems and derived words.
The semantic‐derivational network we envisaged contains word families with the following characteristics:
• Each family member is already in the Romanian wordnet, so we did not aim at quantitatively enriching this linguistic resource;
• Each family member is a semantically disambiguated word; this is a consequence of the fact that we rely on wordnet, which contains word senses, not words in its nodes;
• The family members are grouped in pairs between whose members there is a derivational relation which is defined both formally (one of the literals in the pair can be obtained from the other one by adding or deleting an affix) and semantically (between the two literals we can establish a semantic relation, which we add as a semantic label to the relation).
We followed an interdisciplinary perspective on derivational relations. We started from the linguistic data, which were adapted to serve the needs of language processing.
The research objectives have been:
• Creating a complete inventory of Romanian affixes;
• Marking derivational relations between semantically disambiguated words from the Romanian wordnet;
• Semantic labelling of the derivational relations, whenever possible.
Our research is practically oriented. Its most important result is the morpho‐semantic component of the linguistic resource that we wanted to enrich (the Romanian wordnet).
In addition, we pursued the following secondary objectives:
• Automatic identification of affixes in a lexicon;
• Creating an inventory of possible semantic values of Romanian affixes;
• Statistical data on affix productivity.
In the first chapter we define lexical knowledge as all the words in a language together with the information about them; we discuss the factors organizing the vocabulary (frequency, the stylistic-emotional factor, etymology, psychological and semantic factors) and the formalisms for representing lexical knowledge (frame systems, conceptual graphs, description logic, semantic networks).
The main characteristics of semantic networks are presented in the first chapter, while in the second we illustrate them with the Princeton WordNet (PWN), which we describe in detail. As one can already speak of a wordnet-style lexicography, we present the various methods for creating a wordnet (manual, automatic, translation, expand and merge methods), with special emphasis on the one adopted for the development of the Romanian wordnet (RoWN), i.e. the expand method. RoWN is also described in detail, and the points where it differs from PWN are identified and explained: non-lexicalized concepts are marked as empty synsets; the order of literals in synsets is random rather than based on their frequency in a corpus; sense numbering is nested, following the numbering in the electronic Explanatory Dictionary used for RoWN development. We also enumerate and briefly present the tasks in which RACAI has used RoWN: word sense disambiguation, question answering, machine translation and smaller tasks, such as query expansion, calculating the semantic similarity or distance between two words, and finding answers in a multilingual question answering system. From the perspective of these applications we argue in favour of adding derivational relations to RoWN, which can improve the results of the tasks presented above whenever they rely on the relations in the network.
A large chapter (the third one) is dedicated to the evolution of research on derivation in Romanian linguistics and to its current status. This research has a long tradition in the history of language study, culminating in the publication of the work Word formation in Romanian, three volumes of which have already appeared (one dealing with compounding, one with prefixation and another one with noun suffixation); work on the other types of suffixation is still in progress. This work offers both a diachronic and a synchronic analysis of Romanian affixes. Authors of older studies were more preoccupied with establishing the inventory of affixes and their origins, while more recent studies focus on the tendencies noticed in journalese after the 1989 revolution: prefixation is more productive than it used to be and certain affixes are preferred for creating new words.
We present the means of enriching the vocabulary: language-internal (derivation and compounding being the most productive), external (borrowings) and mixed (loan translation). The focus is on derivation. Romanian linguists differentiate between affixes and pseudo-affixes. The latter are words of Latin or Greek origin that can be attached to the beginning or end of a word to create new words. These are instances of compounding.
Derivation makes use of suffixes, prefixes, roots and stems. A root can combine with a suffix and/or a prefix to create a new, derived word. A stem can contain, besides the root, one or more affixes (suffixes and/or prefixes) and serves as the base for another derived word. Derivation can also involve substitution of affixes. Moreover, derivation sometimes shortens the word by cutting off its beginning (a prefix) or its ending (a suffix). This type of derivation is called regressive (back-formation).
Relying on the literature, we define prefixes and suffixes and present the criteria by which they can be classified. For prefixes, these are: structure (simple or complex), age (old or new), origin (inherited from Latin, borrowed from Slavic languages, from Greek, from Romance languages, or created in Romanian), semantics (various semantic values such as position, repetition, negation, etc.), productivity (productive or not) and usage (in the common language or restricted to some domains). For suffixes, they are: function (lexical or inflectional suffixes; the former create new words, while the latter help create new word forms), morphology (noun, verbal, adjective or adverb suffixes), structure (simple or complex) and circulation (in the general language, in certain domains of activity or dialectally). For each of the four morphological types of suffixes, we define the type, enumerate and illustrate the morphological classes of its stems, identify and exemplify its semantic values, classify the suffixes according to their origin and structure, and discuss their productivity.
This chapter is important for the whole research in three ways. First, by manually extracting information from the literature, we created an inventory of Romanian prefixes (83) and suffixes (482, grouped in four morphological classes: 260 noun suffixes, 104 adjective suffixes, 104 verbal suffixes, 14 adverb suffixes), which serves as a gold standard in the experiment presented in the next chapter. Second, the semantic values of affixes will be helpful later on in establishing the semantic labels for derivational relations. Third, stylistic remarks on affixes can prove useful for marking derived words from the perspective of subjectivity analysis.
In Chapter 4 we present an experiment in the automatic identification of suffixes in words. The linguistic resource used was a lexicon of words in their base form. For identifying suffixes, we used generalized suffix trees. The results (i.e. the potential suffixes) were compared with the manually identified suffixes. We were able to find a few suffixes that we had not found in the literature. However, ignoring semantics (and even phonetics) in such an experiment leads to many false positives.
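The idea behind this experiment can be illustrated with a much simpler stand-in. The sketch below does not build a generalized suffix tree; it only counts the word endings shared by several entries of a lexicon, which is enough to show why purely formal evidence over-generates. The function name, the length thresholds and the toy lexicon are assumptions made for this illustration and do not come from the book.

```python
from collections import Counter


def candidate_suffixes(lexicon, min_len=2, max_len=5, min_count=2):
    """Count word endings shared by at least `min_count` lexicon entries.

    A simplified stand-in for a suffix-tree-based search: no semantics and
    no phonetics are used, so frequent endings that are not real suffixes
    would be reported as well.
    """
    counts = Counter()
    for word in set(lexicon):
        # Leave at least two characters for a potential stem.
        longest = min(max_len, len(word) - 2)
        for k in range(min_len, longest + 1):
            counts[word[-k:]] += 1
    return {ending: n for ending, n in counts.items() if n >= min_count}


# Hypothetical toy lexicon of base forms.
toy_lexicon = ["muncitor", "vânzător", "cititor", "frumos", "bucuros", "casă"]
candidates = candidate_suffixes(toy_lexicon)
```

On this toy lexicon the endings "or", "tor", "itor" and "os" are each shared by at least two words. Deciding which of them are genuine suffixes, and with which boundaries, still requires comparison with a manually built inventory.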
Chapter 5 starts with a presentation of related work, namely the wordnets for which derivational relations have been marked: PWN and the Czech, Turkish, Estonian, Polish, Bulgarian and Serbian wordnets. We briefly describe their methodology and retain the remark made by most of the developers: derivational relations are language specific, while the associated semantic relations are not; they are valid cross-linguistically.
For identifying the pairs of stem and derived word in RoWN we used the list of simple literals (LL) in RoWN (31872) and the list of affixes (LA) (492). For each element in LL we checked the formula literal1 + affix = literal2, where literal1 ∈ LL, literal2 ∈ LL, literal1 ≠ literal2, and affix ∈ LA is either a prefix or a suffix. We found 16418 such pairs, the prefixed words representing only one fifth of the suffixed ones. These pairs were subject to automatic and then manual validation, and only 10442 of them were retained for the next step, the annotation.
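The formal check literal1 + affix = literal2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for RoWN: the function name, the tuple layout and the toy lists of literals and affixes are hypothetical, and only plain concatenation is tested, with no treatment of phonetic alternations.

```python
def find_pairs(literals, prefixes, suffixes):
    """Return sorted (base, affix, kind, derived) tuples where both words
    are in the list of literals and the derived word is formed by plain
    concatenation (prefix + base, or base + suffix)."""
    lexicon = set(literals)
    pairs = set()
    for derived in lexicon:
        for prefix in prefixes:
            # Skip empty affixes so that base != derived always holds.
            if prefix and derived.startswith(prefix):
                base = derived[len(prefix):]
                if base in lexicon:
                    pairs.add((base, prefix, "prefix", derived))
        for suffix in suffixes:
            if suffix and derived.endswith(suffix):
                base = derived[:-len(suffix)]
                if base in lexicon:
                    pairs.add((base, suffix, "suffix", derived))
    return sorted(pairs)


# Hypothetical toy data standing in for LL and LA.
toy_literals = ["face", "reface", "desface", "pădure", "pădurean", "rege"]
pairs = find_pairs(toy_literals, prefixes=["re", "des"], suffixes=["an"])
```

Here "rege" is not paired with anything, because stripping "re" leaves a string that is not a literal. Pairs that do pass this test are only formally related, which is why a validation step is still needed afterwards.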
In Chapter 6 we explain the annotation principles. Derivational relations are marked at the word sense level, because different senses of the derived word can establish different semantic relations with the different senses of the stem. So, the derivational relation is defined both formally and semantically. Moreover, we also include analyzable borrowings among our derived words, given their parallelism with derived words created later in the language.
The derivational relations are characterized by symmetry, transitivity and non‐reflexivity.
In the sixth chapter we also define and exemplify the semantic labels associated with the derivational relations. They were established with care to keep them general, to avoid duplicating relations already existing in RoWN, to limit them to a reasonable number and to ensure a multilingual character. We established 16 labels for prefixed words and 40 for suffixed ones.
The last chapter describes the annotation procedure: for each member of a pair of stem and derived word, all the synsets in which it occurs in RoWN were extracted. The Cartesian product of the two sets of synsets was calculated and each of its members was subject to manual annotation. 101729 such pairs were obtained. Looking at 55849 of them, we could manually annotate as derived pairs only 17061: they fulfill both conditions imposed on derivation, formal and semantic relatedness. The rest, although formally related, do not pass the semantic test, i.e. no semantic relation can be established between the two word senses.
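The Cartesian product step can be sketched in a few lines. The synset identifiers and the mapping from literals to synsets below are hypothetical placeholders; only the two totals used in the final line (55849 examined pairs, 17061 retained) are taken from the text.

```python
from itertools import product


def candidate_sense_pairs(base, derived, synsets_of):
    """Pair every synset of the base with every synset of the derived word.

    Each resulting pair is a candidate that still has to be accepted or
    rejected by a human annotator.
    """
    return list(product(synsets_of[base], synsets_of[derived]))


# Hypothetical sense inventory: three senses for the base, two for the derivative.
toy_synsets = {
    "face": ["face:1", "face:2", "face:3"],
    "reface": ["reface:1", "reface:2"],
}
sense_pairs = candidate_sense_pairs("face", "reface", toy_synsets)

# Share of the examined sense pairs that were kept as derived pairs.
retained_share = 17061 / 55849
```

A word pair with three and two senses already yields six candidates, which explains how 10442 validated word pairs grow into more than one hundred thousand sense pairs. Of the pairs examined, roughly 30% were retained.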
We could extract statistics about the most frequent affixes in RoWN, and they are consistent with the remarks made by linguists by mere observation of texts and dictionaries. This suggests that the methodology for selecting the concepts to be implemented in RoWN has led to the inclusion of a set of literals that are, at least derivationally, representative of the distribution of facts in the language.
A wordnet is a very valuable resource in great part due to the relations it contains. Thus, increasing their number improves the usefulness of the resource. Most of the relations RoWN contained were relations between words belonging to the same part of speech. However, two thirds of the derivational relations that have been added link words of different parts of speech. Increasing the density of relations in general, and the number of relations between words of different parts of speech in particular, contributes to the quality of RoWN and, consequently, to better results obtained by the applications using it.
The main contributions of this project are:
• Creation of an inventory of Romanian affixes;
• Synthesis of the semantic values of these affixes;
• Automatic identification of suffixes in derived words;
• Outline and implementation of a method for identifying derived words in RoWN;
• Enrichment of RoWN with derivational relations between literals, thus increasing their density, especially of relations between words with a different part of speech;
• Adding semantic labels to these derivational relations; while derivational relations are language specific and hold between literals (more precisely, between word senses), the semantic labels are valid cross-linguistically and hold between the synsets containing the literals in derivational relations; thus, semantic labels are similar to the semantic relations already in the wordnet and they can be transferred from one language into another, provided there exist aligned wordnets for those languages and barring language idiosyncrasies;
• Turning RoWN into a knowledge base useful in various applications.
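The transfer of semantic labels between aligned wordnets, mentioned in the list above, can be sketched as a lookup through shared synset identifiers. Everything in the example is an assumption made for illustration: the identifiers, the label name AGENT, the English literals and the function itself are hypothetical and are not taken from RoWN or PWN.

```python
def transfer_labels(labelled_relations, target_wordnet):
    """Project (source_synset, label, target_synset) triples onto another
    wordnet that uses the same synset identifiers.

    A relation is proposed only when both synsets are lexicalized in the
    target wordnet; the output is a list of candidates, not validated facts.
    """
    proposals = []
    for source_id, label, target_id in labelled_relations:
        source_literals = target_wordnet.get(source_id)
        target_literals = target_wordnet.get(target_id)
        if source_literals and target_literals:
            proposals.append((source_literals, label, target_literals))
    return proposals


# Hypothetical labelled relations between synset identifiers.
ro_relations = [("v-001", "AGENT", "n-002"), ("v-003", "AGENT", "n-004")]
# Hypothetical aligned wordnet; "n-004" is an empty (non-lexicalized) synset.
en_wordnet = {"v-001": ["teach"], "n-002": ["teacher"], "v-003": ["sell"], "n-004": []}
proposals = transfer_labels(ro_relations, en_wordnet)
```

Only the first relation is proposed for the target language, because the second one points to an empty synset. Since derivational relations themselves are language specific, each proposal would still have to be checked for a formal link between the literals involved.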
Summary
INTRODUCTION ................................................................................................ 13 Background.....................................................................................................13 Research aim...................................................................................................14 Research objectives ........................................................................................15 Book outline....................................................................................................15
CHAPTER 1
LEXICAL KNOWLEDGE ................................................................................... 17 1.1. Definition .................................................................................................17 1.2. Factors for organizing the vocabulary .................................................17 1.3. Formalisms for representing lexical knowledge ................................19 1.4. Semantic networks..................................................................................19 Conclusions.....................................................................................................21
CHAPTER 2
PRINCETON WORDNET AND THE ROMANIAN WORDNET................ 22 2.1. Princeton WordNet: content and organization...................................22
2.1.1. Grammatical categories.................................................................22 2.1.2. Grouping words depending on their grammatical category...22 2.1.3. Lexicalised concepts ......................................................................22 2.1.4. Relations ..........................................................................................23 2.1.5. Sense – form....................................................................................29 2.1.6. Homonymy and polysemy...........................................................30 2.1.7. Glosses .............................................................................................30 2.1.8. Sense numbering............................................................................30
2.2. Wordnet‐style lexicography ..................................................................30 2.2.1. Methods for creating a wordnet ..................................................31 2.2.2. MultiWordNet................................................................................32
185
2.3. The Romanian wordnet .......... 33
2.3.1. Methodology .......... 33
2.3.2. Linguistic resources used .......... 33
2.3.3. Implemented concepts .......... 34
2.3.4. Instruments .......... 34
2.3.5. Distinctive features .......... 36
2.3.6. Quality assurance .......... 38
2.3.7. Applications in which RoWN was used .......... 38
2.3.8. Access .......... 42
2.3.9. RoWN’s value .......... 42
2.3.10. Motivating the necessity for adding derivational relations to RoWN .......... 44
Conclusions .......... 45
CHAPTER 3
ENRICHING THE VOCABULARY .......... 47
3.1. Means of enriching the vocabulary .......... 47
3.2. The status of the word formation domain .......... 50
3.2.1. The status of prefixation among the other means of word formation........................................................................................53
3.2.2. The status of suffixation .......... 54
3.3. Prefixes .......... 54
3.3.1. Definition .......... 54
3.3.2. Criteria for classifying the prefixes .......... 54
3.3.3. Prefixes versus phenomena at the beginning of words .......... 60
3.4. Suffixes .......... 61
3.4.1. Definition .......... 61
3.4.2. Criteria for classifying the suffixes .......... 61
3.5. Pseudo‐prefixes and pseudo‐suffixes .......... 63
3.5.1. Definition .......... 63
3.5.2. Other terms .......... 63
3.5.3. Pseudo‐prefixes and prefixes .......... 63
3.5.4. Pseudo‐suffixes and suffixes .......... 65
3.5.5. Pseudo‐prefixes and pseudo‐suffixes .......... 66
3.6. Noun suffixes .......... 67
3.6.1. Definition .......... 67
3.6.2. The morphological classes of the stems of suffixed nouns .......... 67
3.6.3. The semantic values of noun suffixes .......... 67
3.6.4. The origins of noun suffixes .......... 71
3.6.5. The structure of noun suffixes .......... 72
3.6.6. The productivity of noun suffixes .......... 72
3.7. Verb suffixes .......... 73
3.7.1. Definition .......... 73
3.7.2. The morphological classes of the stems of suffixed verbs .......... 73
3.7.3. The semantic values of verb suffixes .......... 73
3.7.4. The origins of verb suffixes .......... 76
3.7.5. The structure of verb suffixes .......... 77
3.7.6. The zero verb suffix .......... 77
3.7.7. The productivity of verb suffixes .......... 77
3.8. Adjective suffixes .......... 78
3.8.1. Definition .......... 78
3.8.2. The morphological classes of the stems of suffixed adjectives .......... 78
3.8.3. The semantic values of adjective suffixes .......... 78
3.8.4. The origins of adjective suffixes .......... 80
3.8.5. The structure of adjective suffixes .......... 80
3.8.6. The productivity of adjective suffixes .......... 80
3.9. Adverb suffixes .......... 81
3.9.1. Definition .......... 81
3.9.2. The morphological classes of the stems of suffixed adverbs .......... 81
3.9.3. The semantic values of adverb suffixes .......... 81
3.9.4. The origins of adverb suffixes .......... 81
3.9.5. The structure of adverb suffixes .......... 81
3.9.6. The productivity of adverb suffixes .......... 82
Conclusions .......... 82
CHAPTER 4
CREATING THE LIST OF AFFIXES .......... 84
4.1. Creating the inventory of Romanian affixes manually .......... 85
4.2. Resources and instruments used for automatic identification of suffixes in derived words .......... 86
4.2.1. Linguistic resources .......... 86
4.2.2. The algorithm for identifying suffixes in derived words .......... 87
4.3. Results and interpretation .......... 90
Conclusions .......... 96
CHAPTER 5
IDENTIFYING STEM‐DERIVED WORD PAIRS IN ROWN .......... 98
5.1. Other wordnets with derivational relations .......... 98
5.2. Characteristics of derivational relations .......... 101
5.3. Linguistic resources used for identifying derived words in RoWN .......... 102
5.4. Methodology .......... 102
5.5. Validation .......... 104
5.5.1. Validating the prefixed words .......... 104
5.5.2. Validating the suffixed words .......... 105
5.6. Preparing the annotation .......... 106
Conclusions .......... 108
CHAPTER 6
MORPHO‐SEMANTIC ANNOTATION OF STEM‐DERIVED WORD PAIRS................................................................................................................... 109
6.1. Principles for annotation .......... 109
6.2. Ways of automating the annotation of derivational relations .......... 111
6.3. The properties of the derivational relations .......... 113
6.4. Semantic labels .......... 113
6.4.1. Semantic labels for prefixed words .......... 114
6.4.2. Semantic labels for suffixed words .......... 117
6.5. Remarks on annotation .......... 123
Conclusions .......... 126
CHAPTER 7
RESULTS. STATISTICS. DISCUSSIONS .......... 127
7.1. Affixes and their frequency .......... 127
7.2. Data on derived words .......... 128
7.3. Comparison with EXPD .......... 128
7.4. Data on the semantic labels .......... 129
7.5. Relation density in RoWN .......... 130
CONCLUSIONS................................................................................................. 131
ACKNOWLEDGEMENTS................................................................................ 134
BIBLIOGRAPHY ................................................................................................ 136
ANNEX 1. THE LIST OF ROMANIAN AFFIXES ......................................... 143
ANNEX 2. THE ROMANIAN SUFFIXES IDENTIFIED IN ROWN (147). COMBINATION POSSIBILITIES WITH PARTS OF SPEECH. THE PART OF SPEECH OF THE DERIVED WORDS........... 149
ANNEX 3. THE AFFIXES IDENTIFIED IN THE MANUALLY VALIDATED DERIVED WORDS AND THE NUMBER OF THE IDENTIFIED DERIVED WORDS................................................ 155
ANNEX 4. SEMANTIC LABELS. NUMBER OF OCCURRENCES. SPECIFIC AFFIXES. NUMBER OF OCCURRENCES ............................. 158