Improving Text Categorization Bootstrapping via Unsupervised Learning. Presenter: Bo-Sheng Wang. Authors: Alfio Gliozzo, Ido Dagan. TSLP, 2009


Page 1: Improving Text Categorization Bootstrapping via Unsupervised Learning

Presenter: Bo-Sheng Wang
Authors: Alfio Gliozzo, Ido Dagan

TSLP, 2009


Outline

• Motivation
• Objectives
• Methodology
• Evaluation
• Experiments
• Conclusions
• Comments


Motivation

• Supervised systems for text categorization require large amounts of hand-labeled texts.

• Intensional Learning (IL) inherently suffers from a score scaling problem and starts from very little information about the intension of a category.


Objectives

• Investigate and improve on two specific weaknesses that inherently affect the IL schema, using:

– Latent Semantic Indexing (LSI)

– Gaussian Mixture (GM) algorithm


Methodology-Latent Semantic Indexing


Vector Semantic Model


Methodology-Latent Semantic Indexing


Methodology-Gaussian Mixture Algorithm

• This paper proposes mapping the similarity values into class posterior probabilities using unsupervised estimation of Gaussian mixtures.
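The mapping from similarity values to posterior probabilities can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the similarity scores for one category form a one-dimensional mixture of exactly two Gaussians (documents outside the category and documents inside it), fits that mixture with plain EM, and applies Bayes' rule. The function names, the initialization, and the toy scores are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(scores, iterations=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM (no labels needed).

    Assumption for this sketch: component 0 models out-of-category scores
    and component 1 models in-category scores (the higher mean).
    Returns (weights, means, variances).
    """
    n = len(scores)
    overall_mean = sum(scores) / n
    overall_var = max(sum((s - overall_mean) ** 2 for s in scores) / n, min_var)
    # Hypothetical initialization: start the means at the extreme scores.
    weights = [0.5, 0.5]
    means = [min(scores), max(scores)]
    variances = [overall_var, overall_var]

    for _ in range(iterations):
        # E-step: responsibility of each component for each score.
        resp = []
        for s in scores:
            p = [weights[k] * gaussian_pdf(s, means[k], variances[k]) for k in (0, 1)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component is empty; keep its previous parameters
            weights[k] = nk / n
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            var = sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / nk
            variances[k] = max(var, min_var)

    # Keep the convention that component 1 has the higher mean.
    if means[0] > means[1]:
        weights.reverse()
        means.reverse()
        variances.reverse()
    return weights, means, variances


def posterior_in_category(score, weights, means, variances):
    """Bayes' rule: P(in category | score) under the fitted mixture."""
    p = [weights[k] * gaussian_pdf(score, means[k], variances[k]) for k in (0, 1)]
    total = p[0] + p[1]
    return p[1] / total if total > 0 else 0.5


# Toy similarity scores for one category (made-up values for illustration).
scores = [0.05, 0.08, 0.10, 0.12, 0.07, 0.09, 0.80, 0.85, 0.90]
weights, means, variances = fit_two_gaussians(scores)
for s in (0.08, 0.85):
    print(s, round(posterior_in_category(s, weights, means, variances), 3))
```

Because each category gets its own fitted mixture, the resulting posteriors all lie on the same 0 to 1 scale, which is one way to read how this step addresses the score scaling problem noted in the motivation.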


Seeds


Evaluation-Impact of LSI Similarity and GM on IL Performance


Evaluation-Extensional vs. Intensional Learning

• A major point of comparison between IL and EL is the amount of supervision required to reach a given level of performance.


Experiments


Conclusions

• We obtained competitive performance using only the category names as initial seeds.

• The method drastically reduces the number of seeds required while significantly improving performance.


Comments

• Advantages – Performance

• Disadvantage – Time

• Applications – Text Mining
