Improving Text Categorization Bootstrapping via Unsupervised Learning
Presenter: Bo-Sheng Wang
Authors: Alfio Gliozzo, Ido Dagan
TSLP, 2009
Outline
• Motivation
• Objectives
• Methodology
• Evaluation
• Experiments
• Conclusions
• Comments
Motivation
• Supervised systems for text categorization require large amounts of hand-labeled texts
• Intensional Learning (IL) inherently suffers from a score scaling problem and has very little information about the intension of a category.
Objectives
• Investigate and address two specific weaknesses that inherently affect the IL schema, using two unsupervised techniques:
  – Latent Semantic Indexing (LSI)
  – Gaussian Mixture (GM) algorithm
Methodology-Latent Semantic Indexing
Vector Semantic Model
Methodology-Latent Semantic Indexing
Methodology-Latent Semantic Indexing
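A minimal sketch of the general LSI idea, not the authors' implementation: a seed term and some documents are projected onto the top-k left singular vectors of a toy term-by-document matrix, and cosine similarity is then computed in that latent space. The vocabulary, the four toy documents, k = 2, and the power-iteration routine (used here only to avoid external libraries) are all assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(u, v):
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom > 0 else 0.0

def top_left_singular_vectors(term_doc, k, iters=200):
    """Top-k left singular vectors of a term-by-document matrix A,
    found by power iteration with deflation on A * A^T."""
    n_terms = len(term_doc)
    gram = [[dot(term_doc[i], term_doc[j]) for j in range(n_terms)]
            for i in range(n_terms)]
    basis = []
    for idx in range(k):
        # Deterministic, strictly positive start vector (toy choice).
        v = normalize([1.0 + 0.1 * (i + idx) for i in range(n_terms)])
        for _ in range(iters):
            w = [dot(row, v) for row in gram]
            for u in basis:  # remove the directions already found
                proj = dot(w, u)
                w = [x - proj * y for x, y in zip(w, u)]
            v = normalize(w)
        basis.append(v)
    return basis

def to_latent(term_vector, basis):
    """Project a vector over terms onto the latent dimensions."""
    return [dot(u, term_vector) for u in basis]

# Hypothetical toy data: rows = terms, columns = documents.
terms = ["car", "engine", "automobile", "goal", "match"]
term_doc = [
    [1, 0, 0, 0],  # car
    [1, 1, 0, 0],  # engine
    [0, 1, 0, 0],  # automobile
    [0, 0, 1, 1],  # goal
    [0, 0, 1, 0],  # match
]
basis = top_left_singular_vectors(term_doc, k=2)

seed = [1, 0, 0, 0, 0]       # category seed: "car"
doc_auto = [0, 1, 1, 0, 0]   # "automobile engine": shares no term with the seed
doc_sport = [0, 0, 0, 1, 1]  # "goal match"

raw_sim = cosine(seed, doc_auto)
lsi_sim_auto = cosine(to_latent(seed, basis), to_latent(doc_auto, basis))
lsi_sim_sport = cosine(to_latent(seed, basis), to_latent(doc_sport, basis))
print(raw_sim, lsi_sim_auto, lsi_sim_sport)
```

On this toy data the seed and the "automobile engine" document have cosine 0 in the raw term space because they share no term, while in the latent space they are nearly parallel because "car" and "automobile" both co-occur with "engine". The sports document stays close to orthogonal to the seed in both spaces.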
Methodology-Gaussian Mixture Algorithm
• The paper proposes mapping the similarity values into class posterior probabilities using unsupervised estimation of Gaussian mixtures.
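As a minimal sketch of this idea, not the authors' implementation: fit a two-component one-dimensional Gaussian mixture to unlabeled similarity scores with EM, then read the posterior of the higher-mean component as P(category | score). The two-component setup (in category vs. not in category), the toy scores, and all function names are assumptions made for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_two_gaussians(scores, iters=100, min_var=1e-6):
    """Unsupervised EM fit of a two-component 1-D Gaussian mixture."""
    n = len(scores)
    overall_mean = sum(scores) / n
    var0 = max(sum((s - overall_mean) ** 2 for s in scores) / n, min_var)
    means = [min(scores), max(scores)]  # start the components at the extremes
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each score.
        resp = []
        for s in scores:
            p = [weights[k] * gaussian_pdf(s, means[k], variances[k])
                 for k in range(2)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            variances[k] = max(
                sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / nk,
                min_var,
            )
    return weights, means, variances

def posterior_in_category(score, weights, means, variances):
    """P(category | score), taking the higher-mean component as 'in category'."""
    pos = 0 if means[0] > means[1] else 1
    p = [weights[k] * gaussian_pdf(score, means[k], variances[k])
         for k in range(2)]
    total = sum(p)
    return p[pos] / total if total > 0 else 0.5

# Hypothetical similarity scores between one category and eleven documents.
scores = [0.05, 0.08, 0.10, 0.12, 0.15, 0.07, 0.11, 0.62, 0.70, 0.66, 0.74]
weights, means, variances = fit_two_gaussians(scores)
for s in (0.10, 0.70):
    print(s, posterior_in_category(s, weights, means, variances))
```

Because the posteriors lie in [0, 1] for every category, they can be compared across categories in a way that raw similarity scores cannot, which is how this kind of mapping speaks to the score scaling problem.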
Methodology-Gaussian Mixture Algorithm
Seeds
Evaluation-Impact of LSI Similarity and GM on IL Performance
Evaluation-Extensional vs. Intensional Learning
• A major point of comparison between Intensional Learning (IL) and Extensional Learning (EL) is the amount of supervision required to reach a given level of performance.
Experiments
Conclusions
• We obtained competitive performance using only the category names as initial seeds.
• The approach drastically reduces the number of seeds required while significantly improving performance.
Comments
• Advantages – Performance
• Disadvantage – Time
• Applications – Text Mining