
HUMANITIES COMPUTING D: LEXICOGRAPHY AND COMPUTERS

Lexical semantics, thesauri, WordNet


LEXICAL SEMANTICS

In Lecture 2 we began discussing how contemporary dictionaries characterize the meaning of words.

In this lecture we look at those definitions in more detail, and at other kinds of dictionaries that try to characterize meanings more precisely: thesauri and WordNet.


TYPES OF DEFINITION IN A DICTIONARY

GENUS AND DIFFERENTIA: "stating the superordinate concept next to the definiendum together with at least one distinctive feature"

SYNONYMY

TYPICALITY

USAGE


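GENUS AND DIFFERENTIA

horse noun

1 a solid-hoofed plant-eating domesticated mammal with a flowing mane and tail, used for riding, racing, and to carry and pull loads

New Oxford Dictionary of English

GENUS: mammal

DIFFERENTIAE: solid-hoofed, plant-eating, domesticated; with a flowing mane and tail; used for riding, racing, and to carry and pull loads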


LIMITS OF DEFINITION BY GENUS & DIFFERENTIA (Lecture 2)

Putnam: 'beech' / 'elm'; 'diamond' / 'zircon'

Jackson: happen vs occur vs befall vs transpire

Everything Is Illuminated: 'harmonize' vs 'agree'


TYPES OF DEFINITION IN A DICTIONARY

GENUS AND DIFFERENTIA

SYNONYMY: many words, especially abstract ones, are hard to define analytically; in these cases synonyms are used

TYPICALITY

USAGE


DEFINITION BY SYNONYMY

miserable 1 very unhappy, wretched 2 causing misery 3 squalid 4 mean

unhappy 1 sad or depressed 2 unfortunate or wretched

wretched 1 miserable or unhappy 2 worthless

Collins Pocket English Dictionary (2000)

CIRCULARITY: each of the three words is defined in terms of the other two
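The circularity in the three entries above can be checked mechanically. The following is a minimal sketch, assuming each entry is reduced to the single words used to define it; the dictionary layout and the function name `find_cycle` are illustrative choices, not part of any dictionary software.

```python
# Each headword maps to the words used to define it
# (taken from the three Collins entries; "causing misery" is left out
# because it is a phrase, not a synonym).
DEFINITIONS = {
    "miserable": ["unhappy", "wretched", "squalid", "mean"],
    "unhappy": ["sad", "depressed", "unfortunate", "wretched"],
    "wretched": ["miserable", "unhappy", "worthless"],
}


def find_cycle(definitions, start):
    """Return a path start -> ... -> start if the definitions loop back, else None."""
    stack = [(start, [start])]
    seen = set()
    while stack:
        word, path = stack.pop()
        for synonym in definitions.get(word, []):
            if synonym == start:
                return path + [start]
            # Only words that are themselves headwords can continue the path.
            if synonym in definitions and synonym not in seen:
                seen.add(synonym)
                stack.append((synonym, path + [synonym]))
    return None


cycle = find_cycle(DEFINITIONS, "miserable")
print(" -> ".join(cycle))
```

A reader who does not already know one of the three words cannot break into such a loop, which is the practical cost of defining by synonymy.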


TYPES OF DEFINITION IN A DICTIONARY

GENUS AND DIFFERENTIA

SYNONYMY

TYPICALITY: the definition specifies what is "typical" of the referent

USAGE


DEFINITION BY TYPICALITY

day of rest a day set aside from normal activity, typically Sunday on religious grounds

measles an infectious viral disease causing fever and a red rash, typically occurring in childhood

Concise Oxford Dictionary


TYPES OF DEFINITION IN A DICTIONARY

GENUS AND DIFFERENTIA

SYNONYMY

TYPICALITY

USAGE: the definition explains how a word is used; typical especially of function words (articles, prepositions, etc.)


MEANING RELATIONS

Many of these definitions establish the meaning of a word through meaning relations with other words:

HYPONYMY: dog / animal

SYNONYMY: fool / idiot

ANTONYMY: right / wrong

MERONYMY: horse / mane


HYPONYMY

HYPONYMY is the relation between a subclass and a superclass:

CAR and VEHICLE

DOG and ANIMAL

BUNGALOW and HOUSE

Generally speaking, a hyponymy relation holds between X and Y whenever it is possible to substitute Y for X: That is an X -> That is a Y. E.g., That is a CAR -> That is a VEHICLE.

HYPERNYMY is the inverse relation
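The substitution test can be sketched over a toy taxonomy. This is a minimal illustration, assuming each word has at most one direct hypernym; the link from "house" to "building" is a hypothetical addition (not from the examples above) included only to show that the relation is transitive.

```python
# Direct hypernym links. The first three pairs are the examples given above;
# "house" -> "building" is a hypothetical extra link used to show transitivity.
HYPERNYM_OF = {
    "car": "vehicle",
    "dog": "animal",
    "bungalow": "house",
    "house": "building",
}


def is_hyponym(x, y, hypernym_of=HYPERNYM_OF):
    """True if y can be reached from x by following hypernym links upward."""
    current = x
    while current in hypernym_of:
        current = hypernym_of[current]
        if current == y:
            return True
    return False


def substitution_test(x, y):
    """Render the entailment licensed by hyponymy, or None if it does not hold."""
    if is_hyponym(x, y):
        return f"That is a {x.upper()} -> That is a {y.upper()}"
    return None


print(substitution_test("car", "vehicle"))
print(is_hyponym("vehicle", "car"))  # the entailment does not run the other way
```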


SYNONYMY

Two words are SYNONYMS if they have the same meaning in at least some contexts

E.g., PRICE and FARE; CHEAP and INEXPENSIVE; LAPTOP and NOTEBOOK; HOME and HOUSE

I'm looking for a CHEAP FLIGHT / INEXPENSIVE FLIGHT

From Roget's thesaurus: OBLITERATION, erasure, cancellation, deletion

But few words are truly synonymous in ALL contexts:

I wanna go HOME / ?? I wanna go HOUSE

The flight was CANCELLED / ?? OBLITERATED / ??? DELETED


ANTONYMY

The antonymy relation links lemmas with opposite meanings: right / wrong; small / large

Sometimes also 'extended' antonymy: right / left; dog / cat


ANTONYMY

artificial not real

conventional not spontaneous or sincere or original

vacant not occupied

Concise Oxford Dictionary, 9th edition


MERONYMY

The relation between the parts and the whole: mane / horse; wheel / car


MERONYMY IN DEFINITIONS

horse noun

1 a solid-hoofed plant-eating domesticated mammal with a flowing mane and tail, used for riding, racing, and to carry and pull loads

New Oxford Dictionary of English

HYPERNYM: mammal

PARTS: mane, tail


HOW MANY SENSES?

horse noun

1 a solid-hoofed plant-eating domesticated mammal with a flowing mane and tail, used for riding, racing, and to carry and pull loads

• Equus caballus, family Equidae (the horse family), descended from the wild Przewalski's horse. The horse family also includes the asses and zebras.

• An adult male horse; a stallion or gelding.

• A wild mammal of the horse family

2 a frame or structure on which something is mounted or supported, especially a sawhorse.

3 [mass noun] informal heroin

4 informal a unit of horsepower: the huge 63-horse 701-cc engine

5 Mining an obstruction in a vein

New Oxford Dictionary of English


HOW MANY SENSES?

horse n

1 a domesticated perissodactyl mammal, Equus caballus, used for draught work and riding: family Equidae

2 the adult male of this species; stallion.

3 wild horse. 3a a horse (Equus caballus) that has become feral. 3b another name for Przewalski's horse.

4a any other member of the family Equidae, such as the zebra or ass. 4b (as modifier): the horse family

5 (functioning as pl) horsemen, especially cavalry: a regiment of horse

6 Also called: buck. Gymnastics. a padded apparatus on legs, used for vaulting, etc

7 a narrow board supported by a pair of legs at each end, used as a frame for sawing or as a trestle, barrier, etc

8 a contrivance on which a person may ride and exercise

9 a slang word for heroin

10 Mining. a mass of rock within a vein of ore.

11 Nautical. a rod, rope or cable, fixed at the ends, along which something may slide by means of a thimble, shackle, or other fitting; traveller.

12 Chess. an informal name for knight.

13 Informal. short for horsepower.

14 (modifier) drawn by horse or horses: a horse cart.

Collins English Dictionary, 4th edition


HOMONYMY AND POLYSEMY

HOMONYMY: the meanings are clearly distinct (e.g., different etymologies)

    BANK

    Italian SCANNARE: 'to tear to pieces' / Italianization of TO SCAN

    Italian GRU: the bird (crane) / the machine for lifting weights (crane)

POLYSEMY: the meanings are related

    MOUTH

    Italian VERDE ('green'): 'having a certain colour' and 'rich in vegetation'


HOW MANY SENSES?

The 'lumpers' like to lump meanings together and leave the user to extract the nuance of meaning that corresponds to a particular context, whereas the 'splitters' prefer to enumerate differences of meaning in more detail; the distinction corresponds to that between summarizing and analysing.

Allen, R. Lumping and splitting. English Today, 16(4), 61-3


CRITERIA?

GRAMMATICAL

    Nominal vs verbal senses

    Transitive and intransitive uses (Hirst, 1987):

        Ross KEPT staring at Nadia's décolletage

        Nadia KEPT calm and made a cutting remark

        Ross wrote of his embarrassment in the diary that he KEPT.

COLLOCATIONS

    isometric, from CED4:

        (of a crystal or system of crystallization) having three mutually perpendicular equal axes

        (of a method of projecting a drawing in three dimensions) having the three axes equally inclined and all lines drawn to scale

ETYMOLOGY


PROBLEMS

Already mentioned: sense distinctions are not always easy to draw

Circularity

Relations are not used consistently


SEMANTICS & LEXICON: A SUMMARY

[Diagram: word-forms linked to a lexeme, and the lexeme linked to its senses]

WORD-FORMS: "eat", "eats", "ate", "eaten"

LEXEMES: EAT-LEX-1

SENSES: eat0600, eat0700


THE ORGANIZATION OF THE LEXICON

[Diagram: one word-form linked to several lexemes, each with its own senses]

WORD-FORMS: "stock"

LEXEMES: STOCK-LEX-1, STOCK-LEX-2, STOCK-LEX-3

SENSES: stock0100, stock0200, stock0600, stock0700, stock0900, stock1000
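The three levels in the two diagrams above (word-forms, lexemes, senses) map naturally onto a small data structure. The sketch below is only illustrative: the `Lexeme` class and `senses_of` function are hypothetical names, and the way the six "stock" senses are split across the three lexemes is an assumption, since the extracted diagram does not preserve which sense belongs to which lexeme.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    name: str
    senses: list = field(default_factory=list)


# One lexeme shared by four inflected word-forms (from the "eat" diagram).
EAT = Lexeme("EAT-LEX-1", ["eat0600", "eat0700"])

LEXICON = {
    "eat": [EAT],
    "eats": [EAT],
    "ate": [EAT],
    "eaten": [EAT],
    # One word-form shared by three homonymous lexemes.
    # ASSUMPTION: the pairing of senses with lexemes below is a guess.
    "stock": [
        Lexeme("STOCK-LEX-1", ["stock0100", "stock0200"]),
        Lexeme("STOCK-LEX-2", ["stock0600", "stock0700"]),
        Lexeme("STOCK-LEX-3", ["stock0900", "stock1000"]),
    ],
}


def senses_of(word_form):
    """Collect every sense reachable from a word-form through its lexemes."""
    return [sense for lexeme in LEXICON.get(word_form, []) for sense in lexeme.senses]


print(senses_of("ate"))
print(len(senses_of("stock")))
```

Keeping lexemes as an intermediate level is what lets inflection (many forms, one lexeme) and homonymy (one form, many lexemes) be represented separately.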


SYNONYMY

[Diagram: word-forms, lexemes and senses for "cheap" and "inexpensive"]

WORD-FORMS: "cheap", "inexpensive"

LEXEMES: CHEAP-LEX-1, CHEAP-LEX-2, INEXP-LEX-3

SENSES: cheap0100, …, cheapXXXX, inexp0900, inexpYYYY


DICTIONARIES ORGANIZED BY MEANING

Thesauri

WordNet


THESAURI

Dictionaries organized by topic appeared at the same time as those organized alphabetically (Ælfric: Glossary, ~1000)

The most famous thematic dictionary: Peter Mark Roget, Thesaurus of English Words and Phrases, first published in 1852


ROGET'S THESAURUS: CLASSES

I. ABSTRACT RELATIONS (sections: existence, relation, quantity, order, number, time, change, causation)

II. SPACE

III. MATTER

IV. INTELLECT

V. VOLITION

VI. AFFECTIONS


ROGET'S THESAURUS: SECTIONS & WORD SETS

I. ABSTRACT RELATIONS

    …

    IV. ORDER

        1. GENERAL: 58 Order, 59 Disorder, 60 Arrangement, 61 Derangement

        2. CONSECUTIVE: 62 Precedence, 63 Sequence, 64 Precursor, 65 Sequel, 66 Beginning, 67 End, 68 Middle
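The class / section / subsection / head hierarchy shown above can be sketched as nested dictionaries. This is a minimal illustration covering only the fragment listed on this slide; the names `ROGET` and `locate` are hypothetical.

```python
# Only the fragment shown on the slide; the rest of the thesaurus is omitted.
ROGET = {
    "I. ABSTRACT RELATIONS": {
        "IV. ORDER": {
            "1. GENERAL": {
                58: "Order", 59: "Disorder", 60: "Arrangement", 61: "Derangement",
            },
            "2. CONSECUTIVE": {
                62: "Precedence", 63: "Sequence", 64: "Precursor", 65: "Sequel",
                66: "Beginning", 67: "End", 68: "Middle",
            },
        },
    },
}


def locate(head):
    """Return (class, section, subsection, number) for a head word, or None."""
    for cls, sections in ROGET.items():
        for section, groups in sections.items():
            for group, heads in groups.items():
                for number, name in heads.items():
                    if name.lower() == head.lower():
                        return (cls, section, group, number)
    return None


print(locate("sequel"))
```

Unlike an alphabetical dictionary, the lookup result is a position in a conceptual hierarchy, so neighbouring heads (Precursor, Sequel) are related in meaning rather than in spelling.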


OTHER THESAURI

A THESAURUS OF OLD ENGLISH (Roberts, 1995)

HISTORICAL THESAURUS OF ENGLISH (Christian Kay)

LONGMAN DICTIONARY OF SCIENTIFIC USAGE


WORDNET

A lexical database created at Princeton

Freely available for research from the Princeton site: http://www.cogsci.princeton.edu/~wn/

Information about a variety of SEMANTIC RELATIONS

Three sub-databases (supported by psychological research as early as (Fillenbaum and Jones, 1965)):

    NOUNS

    VERBS

    ADJECTIVES and ADVERBS

Each database is organized around SYNSETS


SYNSETS

Senses (or 'lexicalized concepts') are represented in WordNet by the set of words that can be used in AT LEAST ONE CONTEXT to express that sense / lexicalized concept: the SYNSET

E.g.,

{chump, fish, fool, gull, mark, patsy, fall guy, sucker, shlemiel, soft touch, mug} (gloss: person who is gullible and easy to take advantage of)
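A synset can be modelled as a set of words plus a gloss, with an index from each word to all the synsets it belongs to. The sketch below is a toy model and not WordNet's own file format or API; the second synset for "fish" and its gloss are hypothetical, added only to show that one word may belong to several synsets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    words: frozenset
    gloss: str


# The synset quoted above.
GULLIBLE = Synset(
    frozenset({"chump", "fish", "fool", "gull", "mark", "patsy", "fall guy",
               "sucker", "shlemiel", "soft touch", "mug"}),
    "person who is gullible and easy to take advantage of",
)

# HYPOTHETICAL second synset (membership and gloss invented for illustration).
AQUATIC = Synset(frozenset({"fish"}), "an aquatic animal (hypothetical gloss)")


def build_index(synsets):
    """Map each word to the list of synsets (senses) that contain it."""
    index = defaultdict(list)
    for synset in synsets:
        for word in synset.words:
            index[word].append(synset)
    return index


INDEX = build_index([GULLIBLE, AQUATIC])


def are_synonyms(w1, w2, index=INDEX):
    """Two words are synonyms if at least one synset contains both."""
    return any(w2 in synset.words for synset in index.get(w1, []))


print(are_synonyms("fish", "sucker"))
print(len(INDEX["fish"]))  # number of senses of "fish" in this toy index
```

In this representation a word's senses are simply the synsets that contain it, which matches the "at least one context" definition: synonymy is a property of a shared sense, not of the two words as wholes.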


THE NOUN DATABASE

About 90,000 forms, 116,000 senses

Relations:

    hypernym      breakfast -> meal

    hyponym       meal -> lunch

    has-member    faculty -> professor

    member-of     copilot -> crew

    has-part      table -> leg

    part-of       course -> meal

    antonym       leader -> follower
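The noun relations come in inverse pairs (hypernym / hyponym, has-member / member-of, has-part / part-of), and antonymy is symmetric, so each stored link also answers the inverse question. A minimal sketch, assuming the relations are stored as simple word-level triples (WordNet itself links senses, not words); `RELATIONS`, `INVERSE` and `related` are illustrative names.

```python
# (relation, source, target) triples copied from the examples above.
RELATIONS = [
    ("hypernym", "breakfast", "meal"),
    ("hyponym", "meal", "lunch"),
    ("has-member", "faculty", "professor"),
    ("member-of", "copilot", "crew"),
    ("has-part", "table", "leg"),
    ("part-of", "course", "meal"),
    ("antonym", "leader", "follower"),
]

INVERSE = {
    "hypernym": "hyponym", "hyponym": "hypernym",
    "has-member": "member-of", "member-of": "has-member",
    "has-part": "part-of", "part-of": "has-part",
    "antonym": "antonym",
}


def related(word, relation):
    """Words linked to `word` by `relation`, reading stored links in both directions."""
    found = set()
    for rel, source, target in RELATIONS:
        if rel == relation and source == word:
            found.add(target)
        # A stored link source -rel-> target implies target -inverse(rel)-> source.
        if INVERSE[rel] == relation and target == word:
            found.add(source)
    return found


print(sorted(related("meal", "hyponym")))
print(related("leg", "part-of"))
```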


HYPERNYMY

2 senses of robin

Sense 1
robin, redbreast, robin redbreast, Old World robin, Erithacus rubecola -- (small Old World songbird with a reddish breast)
    => thrush -- (songbirds characteristically having brownish upper plumage with a spotted breast)
        => oscine, oscine bird -- (passerine bird having specialized vocal apparatus)
            => passerine, passeriform bird -- (perching birds mostly small and living near the ground with feet having 4 toes arranged to allow for gripping the perch; most are songbirds; hatchlings are helpless)
                => bird -- (warm-blooded egg-laying vertebrates characterized by feathers and forelimbs modified as wings)
                    => vertebrate, craniate -- (animals having a bony or cartilaginous skeleton with a segmented spinal column and a large brain enclosed in a skull or cranium)
                        => chordate -- (any animal of the phylum Chordata having a notochord or spinal column)
                            => animal, animate being, beast, brute, creature, fauna -- (a living organism characterized by voluntary movement)
                                => organism, being -- (a living thing that has (or can develop) the ability to act or function independently)
                                    => living thing, animate thing -- (a living (or once living) entity)
                                        => object, physical object --
                                            => entity, physical thing --
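The chain for sense 1 of robin can be rebuilt by repeatedly following the hypernym link until a root is reached. The sketch below hard-codes the chain from the output above, using only the first word of each synset; the names `HYPERNYM`, `hypernym_chain` and `render` are illustrative, and the rendering only imitates the indented layout without calling WordNet.

```python
# First word of each synset in the chain for sense 1 of "robin".
HYPERNYM = {
    "robin": "thrush",
    "thrush": "oscine",
    "oscine": "passerine",
    "passerine": "bird",
    "bird": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
    "animal": "organism",
    "organism": "living thing",
    "living thing": "object",
    "object": "entity",
}


def hypernym_chain(word):
    """Follow hypernym links from `word` up to the root of the hierarchy."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def render(chain, indent=4):
    """Lay the chain out with one extra indentation step per level."""
    lines = [chain[0]]
    for depth, node in enumerate(chain[1:], start=1):
        lines.append(" " * (indent * depth) + "=> " + node)
    return "\n".join(lines)


print(render(hypernym_chain("robin")))
```

The length of the chain gives the depth of a sense in the hierarchy, a quantity that path-based similarity measures over such taxonomies typically rely on.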


MERONYMY

wn beak -holon

Holonyms of noun beak

1 of 3 senses of beak

Sense 2

beak, bill, neb, nib

PART OF: bird


VERBS

About 10,000 forms, 20,000 senses

Relations between verb meanings:

    hypernym    fly -> travel

    troponym    walk -> stroll

    entails     snore -> sleep

    antonym     increase -> decrease


RELATIONS BETWEEN VERB MEANINGS

V1 ENTAILS V2 when 'Someone V1' (logically) entails 'Someone V2'; e.g., snore entails sleep

V1 is a TROPONYM of V2 when 'to do V1' is 'to do V2 in some manner'; e.g., limp is a troponym of walk
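The two verb relations can be combined in a small entailment check. This sketch rests on one assumption that the slide does not state explicitly: doing V1 "in some manner of V2" also entails doing V2, so a troponym entails the verb it specializes. The verbs are those used as examples in this lecture; all function and variable names are illustrative.

```python
# Plain entailments (V1 -> verbs it entails).
ENTAILS = {"snore": {"sleep"}}

# Troponym -> the more general verb whose manner it specifies.
TROPONYM_OF = {"limp": "walk", "stroll": "walk"}


def manner_chain(verb):
    """All more general verbs reached by following troponym links upward."""
    chain = []
    while verb in TROPONYM_OF:
        verb = TROPONYM_OF[verb]
        chain.append(verb)
    return chain


def entails(v1, v2):
    """True if 'Someone V1' entails 'Someone V2' under the assumptions above."""
    more_general = manner_chain(v1)
    # ASSUMPTION: a troponym entails every verb it is a manner of.
    if v2 in more_general:
        return True
    return any(v2 in ENTAILS.get(v, set()) for v in [v1] + more_general)


print(entails("snore", "sleep"))
print(entails("limp", "walk"))
print(entails("walk", "limp"))  # entailment is not symmetric
```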


ADJECTIVES & ADVERBS

About 20,000 adjective forms, 30,000 senses

4,000 adverbs, 5,600 senses

Relations:

    antonym (adjective)    heavy <-> light

    antonym (adverb)       quickly <-> slowly


HOW TO USE IT

Online: http://cogsci.princeton.edu/cgi-bin/webwn

Download it, then from the command line:

Get synonyms:

    wn bank -synsn

Get hypernyms:

    wn robin -hypen

Get antonyms (also for adjectives and verbs):

    wn right -antsa


THE LIMITS OF WORDNET

Coverage

    Words not in WordNet: crocidolite, spinoff (spin-off)

    Missing information: MERONYMY

    Context-dependent senses: slump, crash and bust are all synonyms in the WSJ corpus

The structure of WordNet

    Some information is encoded in complex ways (room, wall, floor)

But: MOVING TARGET!!


MERONYMY IN WORDNET: AN EXPERIMENT

100 bridging descriptions (BDs) in a mereological relation

Ran a script trying to find a direct link in WordNet (1.7) between one of the senses of the BD and one of the senses of any of the previous NPs

Results: in only 6 cases is there a direct lexical relation in WordNet between a BD and one of the CFs
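The kind of check described above can be sketched as follows. This is not the script used in the experiment: the sense inventory, the sense labels such as "beak#2" and the single part-of link are toy data chosen for illustration (the beak / bird link follows the holonym example earlier in the lecture, with a hypothetical sense number for bird), and nothing here queries WordNet.

```python
# HYPOTHETICAL sense inventory: word -> sense labels.
SENSES = {
    "beak": ["beak#1", "beak#2", "beak#3"],
    "bird": ["bird#1"],
    "wall": ["wall#1"],
    "house": ["house#1"],
}

# Direct part-of links between senses: (part sense, whole sense).
PART_OF = {("beak#2", "bird#1")}


def direct_link(bridging_noun, previous_nouns):
    """Return the first previous noun with a sense that directly contains
    a sense of the bridging description as a part, or None."""
    for noun in previous_nouns:
        for part in SENSES.get(bridging_noun, []):
            for whole in SENSES.get(noun, []):
                if (part, whole) in PART_OF:
                    return noun
    return None


print(direct_link("beak", ["house", "bird"]))
print(direct_link("wall", ["house"]))  # no direct link in this toy data
```

Because every sense of the bridging noun is tried against every sense of every candidate, a hit only shows that some pair of senses is linked; it does not disambiguate the words.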


John looked at the HOUSE. The WALL was crumbling.

[Figure: fragment of the WordNet noun hierarchy with the nodes ARTIFACT, HOUSING, BUILDING, HOUSE, HOME, ROOM, WALL and FLOOR, connected by IS-A and PART-OF links]
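When there is no direct part-of link between WALL and HOUSE, the relation may still be recoverable by combining part-of and is-a links. The sketch below shows one such search; the specific edges are a guess made for illustration and are not read off the figure or taken from WordNet, and the function names are hypothetical.

```python
# HYPOTHETICAL edges over the nodes named in the figure.
PART_OF = {"wall": "room", "floor": "room", "room": "building"}
IS_A = {
    "house": "building",
    "home": "housing",
    "building": "artifact",
    "housing": "artifact",
}


def climb(node, links):
    """All nodes reached from `node` by following `links` upward."""
    reached = []
    while node in links:
        node = links[node]
        reached.append(node)
    return reached


def indirect_part_of(part, whole):
    """True if `part` is (transitively) part of `whole` or of one of its is-a ancestors."""
    wholes = climb(part, PART_OF)           # e.g. wall -> room -> building
    kinds = [whole] + climb(whole, IS_A)    # e.g. house -> building -> artifact
    return any(w in kinds for w in wholes)


print(indirect_part_of("wall", "house"))
print(indirect_part_of("wall", "home"))
```

Under these assumed edges the bridging reference in "John looked at the HOUSE. The WALL was crumbling." resolves only after two part-of steps and one is-a step, which is the sense in which the information is encoded in a complex way.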


SOLUTION: LEXICAL ACQUISITION

Partial (add information to WordNet, especially for specialist domains)

Total (build a new lexicon from scratch)


READINGS

Jackson, ch. 8

C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998, ch. 1