南台科技大學 資訊工程系 Automatic Website Summarization by Image Content: A Case Study with Logo and Trademark Images. Evdoxios Baratis, Euripides G.M. Petrakis, Member, IEEE, and Evangelos Milios, Senior Member, IEEE. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 20, NO. 9, SEPTEMBER 2008. Date : 2009/10/29. Speaker : Chin-Yen Yang

南台科技大學 資訊工程系

Automatic Website Summarization by Image Content:

A Case Study with Logo and Trademark Images

Evdoxios Baratis, Euripides G.M. Petrakis, Member, IEEE, and Evangelos Milios, Senior Member, IEEE

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 20, NO. 9, SEPTEMBER 2008

Date : 2009/10/29

Speaker : Chin-Yen Yang

Outline

1. INTRODUCTION

2. IMAGE FEATURE EXTRACTION

3. PROPOSED METHOD

4. EXPERIMENTAL RESULTS

5. CONCLUSIONS

1. INTRODUCTION

We introduce the concept of image-based summarization

A fully automated image-based summarization approach is proposed

The evaluation of the method on corporate Websites is presented

1. INTRODUCTION (C.)

Logos and trademarks are important characteristic signs of corporate Websites

A recent contribution reports that logos and trademarks comprise 32.6 percent of the total number of images on the Web

2. IMAGE FEATURE EXTRACTION

Intensity histogram

Radial histogram

Angle histogram
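The slide only names the three histograms. As a minimal sketch of how they might be computed, assume a grayscale image stored as a list of rows of integers in 0..255, foreground pixels defined as those darker than a threshold, and radial and angle values measured from the foreground centroid. The function names, the bin count, and the thresholding rule are all assumptions made for illustration, not details taken from the paper.

```python
import math

def intensity_histogram(img, bins=8):
    """Normalized histogram of gray levels (0..255) over all pixels."""
    hist = [0.0] * bins
    count = 0
    for row in img:
        for value in row:
            hist[min(value * bins // 256, bins - 1)] += 1
            count += 1
    return [h / count for h in hist] if count else hist

def radial_and_angle_histograms(img, bins=8, threshold=128):
    """Histograms of distance and angle of foreground pixels around their centroid.

    Assumption: a pixel is foreground when its gray level is below `threshold`.
    """
    points = [(x, y) for y, row in enumerate(img)
              for x, value in enumerate(row) if value < threshold]
    radial = [0.0] * bins
    angle = [0.0] * bins
    if not points:
        return radial, angle
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    r_max = max(dists) or 1.0  # avoid division by zero for a single point
    for (x, y), d in zip(points, dists):
        radial[min(int(d / r_max * bins), bins - 1)] += 1
        theta = math.atan2(y - cy, x - cx) % (2 * math.pi)
        angle[min(int(theta / (2 * math.pi) * bins), bins - 1)] += 1
    n = len(points)
    return [r / n for r in radial], [a / n for a in angle]

# Example: a dark plus sign on a white 3x3 background.
plus = [[255, 0, 255],
        [0, 0, 0],
        [255, 0, 255]]
intensity = intensity_histogram(plus)
radial, angle = radial_and_angle_histograms(plus)
```

Normalizing each histogram to sum to 1 keeps the features comparable across images of different sizes.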

2. IMAGE FEATURE EXTRACTION (C.)

2.1 Image Representation

3 PROPOSED METHOD

3 PROPOSED METHOD (C.)

3.1 Image Information Extraction

1. Link information

2. Text information

This information is displayed together with images or can be used for searching the Web

Depth = (MaxDepth - LinkDepth + 1) / MaxDepth
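The link information can be turned into a numeric depth weight. The sketch below assumes one reading of the slide's depth formula, Depth = (MaxDepth - LinkDepth + 1) / MaxDepth, with the home page at LinkDepth 1 and MaxDepth the deepest level crawled. Both the formula reading and the function name `depth_weight` are assumptions, not statements from the paper.

```python
def depth_weight(link_depth, max_depth):
    """Weight in (0, 1] that is highest for images found close to the home page.

    Assumed form: (MaxDepth - LinkDepth + 1) / MaxDepth, with LinkDepth starting at 1.
    """
    if not 1 <= link_depth <= max_depth:
        raise ValueError("link_depth must lie between 1 and max_depth")
    return (max_depth - link_depth + 1) / max_depth

home_page = depth_weight(1, 4)     # image on the home page
deepest_page = depth_weight(4, 4)  # image on the deepest crawled page
```

Under this reading an image on the home page gets the full weight of 1, and the weight falls linearly as the page lies further from the home page.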

3 PROPOSED METHOD (C.)

3.2 Logo and Trademark Detection

Training the decision tree using histogram features outperforms training using raw histograms

3 PROPOSED METHOD (C.)

Similarity detection

Each pair of images is described by four attributes: three histogram intersections (one per histogram type) and the Euclidean distance between their vectors of moment invariants

The decision tree was pruned with a confidence value of 0.1 and achieved a 93.89 percent average classification accuracy
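The pairwise attributes fed to the decision tree can be sketched as follows. The example assumes each image is already described by three normalized histograms and a vector of moment invariants, stored in a dictionary with the hypothetical keys `intensity`, `radial`, `angle`, and `moments`. Histogram intersection is taken here as the sum of bin-wise minima, which is the common definition but is not spelled out on the slide.

```python
import math

def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; equals 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pair_attributes(img_a, img_b):
    """Four attributes for an image pair: three intersections and one distance."""
    return [
        histogram_intersection(img_a["intensity"], img_b["intensity"]),
        histogram_intersection(img_a["radial"], img_b["radial"]),
        histogram_intersection(img_a["angle"], img_b["angle"]),
        euclidean_distance(img_a["moments"], img_b["moments"]),
    ]

# Toy feature values, invented only to exercise the functions.
a = {"intensity": [0.5, 0.5], "radial": [1.0, 0.0],
     "angle": [0.25, 0.75], "moments": [0.0, 3.0]}
b = {"intensity": [0.5, 0.5], "radial": [0.0, 1.0],
     "angle": [0.75, 0.25], "moments": [4.0, 0.0]}
attrs = pair_attributes(a, b)
```

A classifier trained on such four-element vectors would then label each pair as similar or not similar.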

3 PROPOSED METHOD (C.)

3.3 Duplicate Logo and Trademark Detection

Image clustering

From each cluster, one image is selected to represent the cluster in the summary
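The slides do not say which clustering algorithm is used. One simple possibility, assumed here purely for illustration, is to treat every pair judged similar as an edge and to take connected components as clusters, using a union-find structure. The function name `cluster_images` and the use of integer image indices are hypothetical.

```python
def cluster_images(num_images, similar_pairs):
    """Group image indices into clusters given pairs judged similar.

    Assumption: similarity is treated as transitive, so clusters are the
    connected components of the similarity graph.
    """
    parent = list(range(num_images))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for i in range(num_images):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Images 0, 1, 2 are chained together; images 3 and 4 form a second cluster.
groups = cluster_images(5, [(0, 1), (1, 2), (3, 4)])
```

Transitive grouping can merge images that were never directly compared as similar, so a stricter linkage rule may be preferable when the pairwise classifier is noisy.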

3 PROPOSED METHOD (C.)

3.4 Logo and Trademark Ranking

Image Importance = Probability * Depth * Instances
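The ranking score is a plain product of three factors, which can be sketched directly. The example assumes that Probability is the classifier's logo or trademark probability, that Depth is a weight derived from link depth, and that Instances counts how often the image occurs on the site. The dictionary keys and the sample values are hypothetical.

```python
def image_importance(probability, depth, instances):
    """Image Importance = Probability * Depth * Instances."""
    return probability * depth * instances

def rank_images(images):
    """Sort image records by importance, most important first."""
    return sorted(
        images,
        key=lambda im: image_importance(im["probability"], im["depth"], im["instances"]),
        reverse=True,
    )

# Invented records used only to show the ranking.
candidates = [
    {"name": "banner.gif", "probability": 0.6, "depth": 0.5, "instances": 1},
    {"name": "logo.gif", "probability": 0.9, "depth": 1.0, "instances": 12},
    {"name": "partner.gif", "probability": 0.8, "depth": 0.25, "instances": 2},
]
ranked_names = [im["name"] for im in rank_images(candidates)]
```

Because the factors are multiplied, an image that scores near zero on any one of them ranks low no matter how high the other two are.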

3 PROPOSED METHOD (C.)

3.5 Image-Based Summarization

Cluster Importance = sum of Image Importance(i) over every image i in the cluster
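Building the summary from scored clusters can be sketched as below. The example assumes cluster importance is the sum of the importances of the images in the cluster, and that each cluster contributes its single most important image. Ordering the summary by cluster importance and the optional `top_k` cutoff are assumptions added for illustration.

```python
def cluster_importance(cluster):
    """Sum of image importances; `cluster` is a list of (image_id, importance)."""
    return sum(importance for _, importance in cluster)

def summarize(clusters, top_k=None):
    """Pick the most important image of each cluster, best clusters first."""
    ranked = sorted(clusters, key=cluster_importance, reverse=True)
    summary = [max(cluster, key=lambda item: item[1])[0]
               for cluster in ranked if cluster]
    return summary if top_k is None else summary[:top_k]

# Invented clusters of (image_id, importance) pairs.
clusters = [
    [("a", 1.0), ("b", 2.0)],
    [("c", 5.0)],
    [("d", 0.5), ("e", 0.25)],
]
full_summary = summarize(clusters)
short_summary = summarize(clusters, top_k=2)
```

Summing over the cluster favors images that recur across the site, since every duplicate adds to the total.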

4 EXPERIMENTAL RESULTS

4 EXPERIMENTAL RESULTS (C.)

5 CONCLUSIONS

The method summarizes a Website first by extracting images with a high probability of being logos or trademarks

Then by clustering similar images together and ranking the images in each cluster by importance

The most important image from each cluster is included in the summary

5 CONCLUSIONS (C.)

76 percent detection accuracy

85 percent classification accuracy

64 percent summarization accuracy

Future work includes experimentation with larger training data sets and more image types to improve the performance of the machine learning
