ONTOLOGY LEARNING
A LITTLE BIT OF CONTEXT: THE LANGUAGE TECHNOLOGIES INSTITUTE
Carnegie Mellon University’s School of Computer Science:
1 undergraduate program
7 graduate departments (CSD, HCI, LTI, RI, SEI, MLD, ETC)
The Language Technologies Institute is a graduate department in the School of Computer Science
About 25 faculty
About 125 graduate students (~85 PhD, ~40 MS)
About 30 visitors, post-docs, staff programmers, …
© 2005 JAMIE CALLAN
A LITTLE BIT OF CONTEXT: THE LANGUAGE TECHNOLOGIES INSTITUTE
LTI courses and research focus on
Machine translation, especially high-accuracy MT
Natural language processing & computational linguistics
Information retrieval & text mining
Speech recognition & synthesis
Computer-assisted language learning & intelligent tutoring
Computational biology (“the language of the human genome”)
… and combinations of the above
Speech-to-speech MT
Open domain question answering
…
A LITTLE BIT ABOUT ME
My Research Interests
Text Mining
Information Retrieval
Natural Language Processing
Statistical Machine Learning
My Earlier Work
Question Answering
Multimedia Information Retrieval
Near-duplicate Detection
Opinion Detection
TODAY’S TALK
ONTOLOGY LEARNING
ROADMAP
Introduction
Subtasks in Ontology Learning
Human-Guided Ontology Learning
User Study
Metric-Based Ontology Learning
Experimental Results
Conclusions
INFORMATION RETRIEVAL TECHNOLOGIES
Web search engines have changed our lives
Google’s great achievement
But, have Search Engines fulfilled Information Needs?
Some, only some
What does search bring to us?
Overwhelming Information in Search Results
Tedious manual judgment still needed
FIND A GOOD KINDERGARTEN IN THE PITTSBURGH AREA
BUY A USED CAR IN THE PITTSBURGH AREA
IT WILL BE GREAT TO HAVE
A process to
Crawl related documents
Sort through relevant documents
Identify important concepts/topics
Organize materials
THIS IS EXACTLY THE TASK OF INFORMATION TRIAGE, OR PERSONAL ONTOLOGY LEARNING
INTRODUCTION
An ontology is a data model that represents a set of concepts within a domain and the set of pairwise relationships between those concepts.
EXAMPLE: A SIMPLE ONTOLOGY
Game Equipment (parent concept)
ball, table (child concepts)
EXAMPLE: WORDNET
EXAMPLE: ODP
INTRODUCTION
Ontology learning is the task of constructing a well-defined ontology given
a text corpus or
a set of concept terms
INTRODUCTION
Ontology offers a nice way to summarize the important topics in a domain/collection
Ontology facilitates knowledge sharing and reuse
Ontology offers relational associations for reasoning and inference
ROADMAP
Introduction
Subtasks in Ontology Learning
Human-Guided Ontology Learning
User Study
Metric-Based Ontology Learning
Experimental Results
Conclusions
SUBTASKS IN ONTOLOGY LEARNING
Concept Extraction
Synonym Detection
Relationship Formulation by Clustering
Cluster Labeling
CONCEPT EXTRACTION
Two Steps:
Noun N-gram and Named Entity Mining
Web-based Concept Filtering
NOUN N-GRAM MINING
I/PRP strongly/RB urge/VBP you/PRP to/TO cut/VB mercury/NN emissions/NNS from/IN power/NN plants/NNS by/IN 90/CD percent/NN by/IN 2008/CD ./.
Extracted Bi-grams
Mercury emissions
Power plants
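The bi-gram extraction above can be sketched as follows. This is a minimal illustration, assuming the input is already POS-tagged in `word/TAG` form and that a noun bi-gram is any two adjacent tokens with Penn Treebank noun tags; the function name `noun_bigrams` is made up for this sketch and is not from the talk.

```python
# Penn Treebank tags treated as nouns in this sketch.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def noun_bigrams(tagged_sentence):
    """Return adjacent noun-noun pairs from a 'word/TAG word/TAG ...' string."""
    tokens = []
    for item in tagged_sentence.split():
        # Split on the last slash so that './.' yields word '.' and tag '.'.
        word, _, tag = item.rpartition("/")
        tokens.append((word, tag))
    bigrams = []
    for (w1, t1), (w2, t2) in zip(tokens, tokens[1:]):
        if t1 in NOUN_TAGS and t2 in NOUN_TAGS:
            bigrams.append(f"{w1} {w2}")
    return bigrams

sentence = ("I/PRP strongly/RB urge/VBP you/PRP to/TO cut/VB mercury/NN "
            "emissions/NNS from/IN power/NN plants/NNS by/IN 90/CD percent/NN "
            "by/IN 2008/CD ./.")
print(noun_bigrams(sentence))  # ['mercury emissions', 'power plants']
```

Longer noun n-grams and named entities would need an extension of the same scan; they are left out here to keep the sketch short.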
CONCEPT FILTERING
Web-based POS error detection
Assumption: a valid concept appears more than a threshold number of times (4 in our case) among the first 10 Google snippets
Removes POS errors, e.g. protect/NN polar/NN bear/NN
Removes spelling errors, e.g. Pullution, polor bear
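A rough sketch of the snippet-count filter, under two assumptions that the slide does not spell out: the count is the number of snippets containing the candidate phrase, and the snippets have already been retrieved by some search client. No real search API is called here; the snippet strings below are made up for illustration.

```python
def passes_web_filter(candidate, snippets, threshold=4):
    """Keep a candidate if it occurs in more than `threshold` of the top 10 snippets."""
    phrase = candidate.lower()
    hits = sum(1 for s in snippets[:10] if phrase in s.lower())
    return hits > threshold

# Made-up snippets standing in for search results (illustration only).
snippets = ["Polar bear habitat is shrinking."] * 6 + ["Arctic sea ice news."] * 4

print(passes_web_filter("polar bear", snippets))  # True: 6 snippets contain it
print(passes_web_filter("polor bear", snippets))  # False: misspelling never appears
```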
CONCEPT EXTRACTION
SUBTASKS IN ONTOLOGY LEARNING
Concept Extraction
Synonym Detection
Relationship Formulation by Clustering
Cluster Labeling
CLUSTERING
Hierarchical Clustering
Different Strategies for Concepts at Different Abstraction Levels
EXAMPLE: A SIMPLE ONTOLOGY
Abstract Level: Game Equipment
Concrete Level: ball, table
BOTTOM-UP HIERARCHICAL CLUSTERING
Concept candidates are organized into groups based on the first sense of the head noun in WordNet
One of their common head nouns is selected as the parent concept for the group
Example: pollution subsumes water pollution and air pollution
Creates high-accuracy concept forests at the lower levels of the ontology
Start from Concrete Concepts
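The head-noun grouping can be approximated as below. This sketch makes simplifying assumptions that are not in the talk: the head noun is taken to be the last word of the phrase, and the WordNet first-sense lookup is skipped entirely, so two phrases are grouped only when their last words match exactly.

```python
from collections import defaultdict

def group_by_head_noun(concepts):
    """Group concepts by their last word and use that word as the parent."""
    groups = defaultdict(list)
    for concept in concepts:
        head = concept.split()[-1]  # assumption: head noun = last token
        groups[head].append(concept)
    forest = {}
    for head, members in groups.items():
        # Only build a fragment when at least two candidates share the head noun.
        if len(members) >= 2:
            forest[head] = [m for m in members if m != head]
    return forest

candidates = ["water pollution", "air pollution", "pollution", "polar bear"]
print(group_by_head_noun(candidates))
# {'pollution': ['water pollution', 'air pollution']}
```

Each key of the returned dictionary is the root of one ontology fragment; together the fragments form the forest that later steps try to connect.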
ONTOLOGY FRAGMENTS
Different fragments are grouped
CONTINUE TO BE BOTTOM-UP
Problem
Still a forest
Many concepts at the top level are not grouped
Solution: Clustering
In any clustering algorithm, we need a metric
Hard to know the metric to measure distance for those top-level nodes
HUMAN-GUIDED ONTOLOGY LEARNING
Learn what?
A distance metric function
Learn from what?
Concepts at lower levels, since they are highly accurate
User feedback
After learning, then what?
Apply the distance metric function to concepts at the higher level to get distance scores for them
Then use any clustering algorithm to group them based on the distance scores
TRAINING DATA FROM LOWER LEVELS
A set of concepts $x^{(i)}$ on the $i$-th level of the ontology hierarchy
Distance matrix $y^{(i)}$
The matrix entry corresponding to concepts $x^{(i)}_j$ and $x^{(i)}_k$ is $y^{(i)}_{jk} \in \{0, 1\}$:
$y^{(i)}_{jk} = 0$ if $x^{(i)}_j$ and $x^{(i)}_k$ are in the same group;
$y^{(i)}_{jk} = 1$ otherwise.
TRAINING DATA FROM LOWER LEVELS
$y^{(i)} = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{pmatrix}$
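As a small illustration, the 0/1 training matrix can be built directly from group assignments. The function name and the integer group ids are assumptions of this sketch; the rule itself (0 for the same group, 1 otherwise) is the one stated on the previous slide.

```python
def distance_matrix_from_groups(group_ids):
    """Entry [j][k] is 0 if concepts j and k share a group, else 1."""
    n = len(group_ids)
    return [[0 if group_ids[j] == group_ids[k] else 1 for k in range(n)]
            for j in range(n)]

# Four concepts: the first two in one group, the last two in another.
for row in distance_matrix_from_groups([0, 0, 1, 1]):
    print(row)
```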
LEARNING THE DISTANCE METRIC
Distance metric represented as a Mahalanobis distance
$\Phi(x^{(i)}_j, x^{(i)}_k)$ represents a set of pairwise underlying feature functions
A is a positive semi-definite matrix, the parameter we need to learn
Parameter estimation by Minimizing Squared Errors
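The slide's formula did not survive extraction, so the following is only a sketch under an assumed form: the distance is taken as $\sqrt{\Phi^\top A \Phi}$ for the pairwise feature vector $\Phi$, and the training loss is the sum of squared differences between the 0/1 targets and the predicted distances. The talk's exact parameterization may differ, and the feature values below are made up.

```python
import math

def mahalanobis(phi, A):
    """sqrt(phi^T A phi) for a feature vector phi and a square matrix A (assumed form)."""
    n = len(phi)
    quad = sum(phi[r] * A[r][c] * phi[c] for r in range(n) for c in range(n))
    return math.sqrt(max(quad, 0.0))  # guard against tiny negative round-off

def squared_error(pairs, A):
    """Sum of (target - predicted distance)^2 over (phi, target) training pairs."""
    return sum((y - mahalanobis(phi, A)) ** 2 for phi, y in pairs)

identity = [[1.0, 0.0], [0.0, 1.0]]
# Made-up pairwise feature vectors with their 0/1 targets from the training matrix.
training_pairs = [([0.0, 0.0], 0), ([3.0, 4.0], 1)]

print(mahalanobis([3.0, 4.0], identity))        # 5.0
print(squared_error(training_pairs, identity))  # 16.0
```

Learning then means searching for a positive semi-definite `A` that makes `squared_error` small; that search is what the solvers on the next slide are for, and it is not implemented here.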
SOLVE THE OPTIMIZATION PROBLEM
Optimization can be done by
Newton’s Method
Interior-Point Method
Any standard semidefinite programming (SDP) solver, e.g. SeDuMi, YALMIP
GENERATE DISTANCE SCORES
We have learned A!
For any pair of concepts at the higher level, $(x^{(i+1)}_l, x^{(i+1)}_m)$, the corresponding entry in the distance matrix $y^{(i+1)}$ is the learned distance evaluated on that pair
K-MEDOIDS CLUSTERING
Flat clustering at a level
Use one of the concepts as the cluster center
Estimate the number of clusters by Gap statistics [Tibshirani et al. 2000]
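A compact k-medoids sketch over a precomputed distance matrix. It assumes the number of clusters is already fixed by the initial medoids, so the gap-statistic estimate is not included; the alternating assign/update loop is one common variant and not necessarily the one used in the talk. The distance values are made up.

```python
def k_medoids(dist, medoids, max_iter=100):
    """Alternate between assigning points to the nearest medoid and re-picking medoids."""
    medoids = list(medoids)
    n = len(dist)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its closest medoid.
        labels = [min(range(len(medoids)), key=lambda c: dist[p][medoids[c]])
                  for p in range(n)]
        # Update step: the new medoid minimizes total distance to its cluster members.
        new_medoids = []
        for c in range(len(medoids)):
            members = [p for p in range(n) if labels[p] == c]
            if not members:
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][q] for q in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels

# Made-up symmetric distances between four concepts.
dist = [[0, 1, 4, 5],
        [1, 0, 3, 4],
        [4, 3, 0, 1],
        [5, 4, 1, 0]]
print(k_medoids(dist, medoids=[0, 1]))  # ([0, 2], [0, 0, 1, 1])
```

Because every cluster center is itself a concept, the medoid can double as a representative of the group, which is the point made on the slide.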
HUMAN COMPUTER INTERACTION
SUBTASKS IN ONTOLOGY LEARNING
Concept Extraction
Synonym Detection
Relationship Formulation by Clustering
Cluster Labeling
CLUSTER LABELING
Problem: Concepts are grouped together, but nameless
Solution: A web-based approach
Send a query formed by concatenating the child concepts to Google
Parse the top 10 snippets
The most frequent word is selected to be the parent of this group
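The labeling step might look like the sketch below. It assumes the snippets were fetched elsewhere (no search API is called), and it adds two filters the slide does not mention: stopwords and the child concepts' own words are skipped so they cannot win the count. The snippets are made up for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption of this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "are", "to", "for"}

def build_query(children):
    """Concatenate the child concepts into one query string."""
    return " ".join(children)

def label_cluster(children, snippets):
    """Pick the most frequent non-stopword, non-child word in the top 10 snippets."""
    child_words = {w for c in children for w in c.lower().split()}
    counts = Counter()
    for snippet in snippets[:10]:
        for word in re.findall(r"[a-z]+", snippet.lower()):
            if word not in STOPWORDS and word not in child_words:
                counts[word] += 1
    return counts.most_common(1)[0][0] if counts else None

children = ["mercury", "lead"]
# Made-up snippets standing in for the results of build_query(children).
snippets = ["Mercury and lead are toxic pollutants.",
            "Pollutants such as mercury harm fish.",
            "Lead pollutants in water."]
print(build_query(children))              # mercury lead
print(label_cluster(children, snippets))  # pollutants
```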
ROADMAP
Introduction
Subtasks in Ontology Learning
Human-Guided Ontology Learning
User Study
Metric-Based Ontology Learning
Experimental Results
Conclusions
USER STUDY
12 graduate students in political science at the University of Pittsburgh
Divided into two groups
Manual group: 4
Interactive group: 8
Task: Construct an ontology hierarchy for 4 datasets
Mercury
Polar bear
Wolf
Toxics Release Inventory (TRI)
90-minute limit, or until the user is satisfied
DATASETS
SOFTWARE USED FOR USER STUDY
QUALITY OF MANUAL VS. INTERACTIVE RUNS
Manual users show moderate agreement (0.4 to 0.6)
Interactive runs produce results of similar quality
Difference between manual and interactive runs is NOT statistically significant
COSTS OF MANUAL VS. INTERACTIVE RUNS
Interactive users make 40% fewer edits (statistically significant)
Interactive runs save 30-60 minutes per ontology
In interactive runs, a human spends 64% less time than in manual runs
CONTRIBUTIONS OF HUMAN-GUIDED ONTOLOGY LEARNING
Effectively combine the strengths of automatic systems and human knowledge
Combine many techniques into a unified framework
Pattern-based (concept mining)
Knowledge-based (use of WordNet)
Web-based (concept filtering and cluster naming)
Machine learning
A detailed independent user study
WHAT TO IMPROVE?
Is bottom-up the best way to go? Maybe not
Incremental clustering saves the most effort
We have used different techniques for concepts at different levels; how do we formally generalize this?
Model concept abstractness explicitly
We have tested on domain-specific corpora; how about more general-purpose corpora?
Can we reconstruct WordNet or ODP?
ROADMAP
Introduction
Subtasks in Ontology Learning
Human-Guided Ontology Learning
User Study
Metric-Based Ontology Learning
Experimental Results
Conclusions
CHALLENGES
Hard to find a good name for a new group in a bottom-up clustering framework
Formally model concept abstractness
Intelligently use different techniques for concepts at different abstraction levels
Flexibly incorporate heterogeneous features
State-of-the-art systems either use one type of semantic evidence to infer all relationships, or use one type of feature for a particular subtask
Solution: Incremental clustering
Solution: Learn statistical models for each abstraction level
Solution: Separate metric learning and ontology construction
A UNIFIED SOLUTION
Metric-based Ontology Learning
LET’S BEGIN WITH SOME IMPORTANT DEFINITIONS
An ontology is a data model
Concept Set
Relationship Set
Domain
MORE DEFINITIONS
A Full Ontology
[Figure: Game Equipment with ball and table]
MORE DEFINITIONS
A Partial Ontology
[Figure: Game Equipment with ball; table shown apart from the tree]
MORE DEFINITIONS
Ontology Metric
[Figure: an ontology whose edges carry weights 1.5, 2, 1 and 1; example pairwise distances d = 2, d = 1 and d = 4.5, with ball and table among the labeled nodes]
MORE DEFINITIONS
Information in an Ontology T
[Figure: the same weighted ontology and pairwise distances]
MORE DEFINITIONS
Information in a Level L
[Figure: pairwise distances d = 2, d = 1 and d = 1 between concepts on one level]
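The figures for these definitions were lost, so the sketch below rests on assumed definitions: the ontology metric between two concepts is taken to be the sum of edge weights on the tree path between them, and the information in an ontology is taken to be the sum of that distance over all concept pairs. The edge weights are hypothetical and are not the ones from the slide.

```python
from itertools import combinations

def path_to_root(node, parent):
    """List of (ancestor, cumulative edge weight from `node`), starting at `node`."""
    path, total = [(node, 0.0)], 0.0
    while node in parent:
        node, weight = parent[node]
        total += weight
        path.append((node, total))
    return path

def distance(a, b, parent):
    """Sum of edge weights along the tree path between a and b (assumed metric)."""
    up_a = dict(path_to_root(a, parent))
    for node, cost_b in path_to_root(b, parent):
        if node in up_a:  # first shared ancestor on b's way up
            return up_a[node] + cost_b
    raise ValueError("concepts are not in the same tree")

def information(nodes, parent):
    """Sum of pairwise distances over all concept pairs (assumed definition)."""
    return sum(distance(a, b, parent) for a, b in combinations(nodes, 2))

# child -> (parent, hypothetical edge weight)
parent = {"ball": ("game equipment", 1.0), "table": ("game equipment", 2.0)}
nodes = ["game equipment", "ball", "table"]

print(distance("ball", "table", parent))  # 3.0
print(information(nodes, parent))         # 6.0
```

Restricting `nodes` to the concepts on a single level would give the per-level quantity under the same assumptions.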
ASSUMPTIONS OF ONTOLOGY
Minimum Evolution Assumption: the optimal ontology is the one that introduces the least information change
ASSUMPTIONS OF ONTOLOGY
Minimum Evolution Assumption
[Figure sequence: the ontology grows step by step, from ball alone, to ball and table, to ball and table under Game Equipment]
ASSUMPTIONS OF ONTOLOGY
Abstractness Assumption: each abstraction level has its own information function
[Figure: the example ontology with Game Equipment, ball and table]
FORMAL FORMULATION OF ONTOLOGY LEARNING
The task of ontology learning is defined as
The construction of a full ontology T given a set of concepts C and an initial partial ontology T0
Keep adding concepts from C into T0
Note that T0 could be empty
Until a full ontology is formed
GOAL OF ONTOLOGY LEARNING
Find the optimal full ontology such that the information change since T0 is minimal
Note that this is by the Minimum Evolution Assumption
GET TO THE GOAL
Goal:
Since the optimal set of concepts is always C
Concepts are added incrementally
GET TO THE GOAL
Plug in definition of information change
Transform into a minimization problem
Minimum Evolution objective function
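The objective itself was an image and is not recoverable, so the following is a hedged sketch of what an incremental, minimum-evolution insertion step could look like. It assumes information is the sum of pairwise path distances in the tree, that a new concept is only ever attached as a leaf, and that `pair_distance` is some already-estimated metric between a candidate parent and the new concept. All names and numbers are hypothetical.

```python
from itertools import combinations

def tree_distance(a, b, parent):
    """Path length between a and b in a tree given as child -> (parent, weight)."""
    def climb(node):
        costs, total = {node: 0.0}, 0.0
        while node in parent:
            node, weight = parent[node]
            total += weight
            costs[node] = total
        return costs
    up_a, up_b = climb(a), climb(b)
    # With non-negative weights, the cheapest shared ancestor is the lowest one.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)

def information(nodes, parent):
    return sum(tree_distance(a, b, parent) for a, b in combinations(nodes, 2))

def insert_concept(concept, nodes, parent, pair_distance):
    """Attach `concept` under the node that changes total information the least."""
    before = information(nodes, parent)
    best = None
    for candidate in nodes:
        trial = dict(parent)
        trial[concept] = (candidate, pair_distance(candidate, concept))
        change = abs(information(nodes + [concept], trial) - before)
        if best is None or change < best[0]:
            best = (change, candidate)
    parent[concept] = (best[1], pair_distance(best[1], concept))
    nodes.append(concept)
    return best[1]

nodes = ["game equipment", "ball"]
parent = {"ball": ("game equipment", 1.0)}
# Made-up metric scores for attaching "table" under each existing node.
scores = {("game equipment", "table"): 1.0, ("ball", "table"): 3.0}
chosen = insert_concept("table", nodes, parent, lambda p, c: scores[(p, c)])
print(chosen)  # game equipment
```

Repeating this step for every concept in C, starting from T0, gives a greedy approximation of the full-ontology search described above.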
EXPLICITLY MODEL ABSTRACTNESS
Model Abstractness for each Level by Least Square Fit
Plug in definition of amount of information for an abstraction level
Abstractness objective function
MULTIPLE CRITERION OPTIMIZATION FUNCTION
Minimum Evolution objective function
Abstractness objective function
Scalarization variable
THE OPTIMIZATION ALGORITHM
ESTIMATING ONTOLOGY METRIC
Assume ontology metric is a linear interpolation of some underlying feature functions
Ridge Regression to estimate and predict the ontology metric
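A small closed-form ridge regression sketch for the metric estimate. It assumes each training example is a vector of feature-function values for a concept pair, with a known distance as the target, and it solves the normal equations in plain Python. The feature values and targets are made up; a numerical library would normally be used instead.

```python
def ridge_fit(X, y, lam=1.0):
    """Solve (X^T X + lam * I) w = X^T y by Gauss-Jordan elimination."""
    d = len(X[0])
    # Augmented system [X^T X + lam * I | X^T y], one row per feature.
    M = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(d)]
    for col in range(d):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(d):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][d] for i in range(d)]

def predict(w, features):
    """Predicted distance as a weighted sum of feature values."""
    return sum(wi * fi for wi, fi in zip(w, features))

# Made-up feature vectors for three concept pairs and their known distances.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 3.0, 5.0]
w = ridge_fit(X, y, lam=0.1)
print(w, predict(w, [1.0, 1.0]))
```

A positive `lam` keeps the system well conditioned; with `lam=0.0` the same code reduces to ordinary least squares and can fail if the features are collinear.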
FEATURES
Google KL-Divergence
Wikipedia KL-Divergence
Google Minipar Syntactic distance
Lexico-Syntactic Patterns
Term Co-occurrence
Word Length Difference
EVALUATION
Reconstruct subdirectories from WordNet and ODP
50 WordNet subdirectories are from 12 topics: gathering, professional, people, building, place, milk, meal, water, beverage, alcohol, dish and herb
50 ODP subdirectories are from 16 topics: computers, robotics, intranet, mobile computing, database, operating system, Linux, TeX, software, computer science, data communication, algorithms, data formats, security, multimedia and artificial intelligence
DATASETS
ONTOLOGY RECONSTRUCTION
An absolute gain of 10% compared to the state-of-the-art system developed at Stanford University (ACL 2006 Best Paper)
INTERACTION OF ABSTRACTION LEVELS AND FEATURES
Abstract concepts are sensitive to explicit modeling: good modeling of abstract concepts greatly boosts performance
The contributions of different features vary for abstract concepts; for concrete concepts they make little difference
Simple features (term co-occurrence, word length) work best
Combination of heterogeneous features works better than individual features
CONTRIBUTIONS OF METRIC-BASED ONTOLOGY LEARNING
Avoid and hence solve the problem of unknown group names
Tackle the problem of no control over concept abstractness
Experiments show that concepts at different abstraction levels behave differently and are sensitive to different features
Provide a solution to incorporate heterogeneous features
An absolute gain of 10% in precision on both WordNet and ODP over a state-of-the-art system
WE HAVE TALKED ABOUT
The Task of Information Triage and Personal Ontology Learning
Human-guided Ontology Learning
Metric-based Ontology Learning
AT THE BEGINNING, WE SAID:
IT WILL BE GREAT TO HAVE
A process to
Crawl related documents
Sort through relevant documents
Identify important concepts/topics
Organize materials
FIND A GOOD KINDERGARTEN IN THE PITTSBURGH AREA
Are we there yet?
THE KINDERGARTEN EXAMPLE
We are done with the organization!
However, does it support further inference for sound decision making?
Maybe not
Future work!
More future work
Model multiple relationships simultaneously
More efficient distance metric learning
THANK YOU AND QUESTIONS