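Southern Taiwan University of Science and Technology, Department of Computer Science and Information Engineering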
Automatic Website Summarization by Image Content:
A Case Study with Logo and Trademark Images
Evdoxios Baratis, Euripides G.M. Petrakis, Member, IEEE, and Evangelos Milios, Senior Member, IEEE
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 20, NO. 9, SEPTEMBER 2008
Date : 2009/10/29
Speaker : Chin-Yen Yang
Outline
1. INTRODUCTION
2. IMAGE FEATURE EXTRACTION
3. PROPOSED METHOD
4. EXPERIMENTAL RESULTS
5. CONCLUSIONS
1. INTRODUCTION
We introduce the concept of image-based summarization
A fully automated image-based summarization approach is proposed
The evaluation of the method on corporate Websites is presented
1. INTRODUCTION (C.)
Logos and trademarks are important characteristic signs of corporate Websites
A recent contribution reports that logos and trademarks comprise 32.6 percent of the total number of images on the Web
2. IMAGE FEATURE EXTRACTION
Intensity histogram
Radial histogram
Angle histogram
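The three histograms can be illustrated with a minimal pure-Python sketch. The slide only names the features, so the definitions used here are assumptions: the intensity histogram bins raw pixel values, while the radial and angle histograms bin foreground pixels by their distance from, and angle around, the foreground centroid. The function names, the bin counts, and the `threshold` parameter are hypothetical.

```python
import math

def intensity_histogram(image, bins=8, max_value=255):
    """Normalized histogram of pixel intensities (image is a list of rows)."""
    counts = [0] * bins
    total = 0
    for row in image:
        for value in row:
            index = min(value * bins // (max_value + 1), bins - 1)
            counts[index] += 1
            total += 1
    return [c / total for c in counts]

def radial_and_angle_histograms(image, bins=4, threshold=0):
    """Bin foreground pixels by distance from, and angle around, their centroid.

    Assumption: a pixel is foreground when its value exceeds `threshold`.
    """
    pixels = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value > threshold]
    if not pixels:
        raise ValueError("image has no foreground pixels")
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    distances = [math.hypot(x - cx, y - cy) for x, y in pixels]
    max_distance = max(distances) or 1.0  # avoid dividing by zero
    radial = [0] * bins
    angle = [0] * bins
    for (x, y), distance in zip(pixels, distances):
        r_index = min(int(distance / max_distance * bins), bins - 1)
        theta = math.atan2(y - cy, x - cx) % (2 * math.pi)
        a_index = min(int(theta / (2 * math.pi) * bins), bins - 1)
        radial[r_index] += 1
        angle[a_index] += 1
    n = len(pixels)
    return [c / n for c in radial], [c / n for c in angle]
```

Normalizing each histogram so that its bins sum to 1 makes images of different sizes comparable.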
2. IMAGE FEATURE EXTRACTION (C.)
2.1 Image Representation
3 PROPOSED METHOD
3 PROPOSED METHOD (C.)
3.1 Image Information Extraction
1. Link information
2. Text Information
This information is displayed together with images or can be used for searching the Web
Depth is computed from LinkDepth and MaxDepth
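The depth weight depends on LinkDepth and MaxDepth. The sketch below assumes the form Depth = (MaxDepth - LinkDepth + 1) / MaxDepth, which is an assumption because the slide does not spell out the operators. Under this reading, with LinkDepth counted from 1 at the starting page, images on the starting page get a weight of 1 and deeper images get smaller but nonzero weights. The function name `depth_score` is hypothetical.

```python
def depth_score(link_depth, max_depth):
    """Depth weight of an image under the assumed formula.

    Assumption: link_depth is 1 for the starting page and max_depth for
    the deepest crawled page, so the result lies in (0, 1].
    """
    if not 1 <= link_depth <= max_depth:
        raise ValueError("link_depth must lie between 1 and max_depth")
    return (max_depth - link_depth + 1) / max_depth
```

For example, with a crawl limited to four levels, `depth_score(2, 4)` weights an image found one link away from the starting page at 0.75.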
3 PROPOSED METHOD (C.)
3.2 Logo and Trademark Detection
Training the decision tree using histogram features outperforms training using raw histograms
3 PROPOSED METHOD (C.)
Similarity detection
Three attributes corresponding to the three histogram intersections, and one attribute corresponding to the Euclidean distance between their vectors of moment invariants
The decision tree was pruned with a confidence value of 0.1 and achieved a 93.89 percent average classification accuracy
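The four attributes for a pair of images can be sketched as follows. This is an illustration only: it assumes the three intersections are taken over the intensity, radial, and angle histograms, and the dictionary keys and function names are hypothetical. The resulting vector is what a decision tree would receive as input; training the tree is not shown.

```python
import math

def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; equals 1.0 for identical normalized histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))

def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def similarity_attributes(features_a, features_b):
    """Build the 4-attribute vector for one pair of images.

    Each argument is a dict with the hypothetical keys
    'intensity', 'radial', 'angle' (histograms) and 'moments' (vector).
    """
    attributes = [
        histogram_intersection(features_a[name], features_b[name])
        for name in ("intensity", "radial", "angle")
    ]
    attributes.append(euclidean_distance(features_a["moments"], features_b["moments"]))
    return attributes
```

Larger intersections and a smaller distance both indicate more similar images, so a classifier has to learn thresholds in opposite directions for the two kinds of attribute.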
3 PROPOSED METHOD (C.)
Image clustering
3.3 Duplicate Logo and Trademark Detection
From each cluster, one image is selected to represent the cluster in the summary
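Grouping duplicates and picking one representative per cluster can be sketched as below. The slides do not say which clustering algorithm is used, so this sketch assumes a simple greedy scheme: each image joins the first cluster whose first member it is similar to. The `is_similar` predicate (standing in for the trained similarity classifier) and the `importance` scoring function are hypothetical and supplied by the caller.

```python
def cluster_images(images, is_similar):
    """Greedy grouping: compare each image with the first member of each cluster."""
    clusters = []
    for image in images:
        for cluster in clusters:
            if is_similar(image, cluster[0]):
                cluster.append(image)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append([image])
    return clusters

def representatives(clusters, importance):
    """Select the highest-scoring image of each cluster."""
    return [max(cluster, key=importance) for cluster in clusters]
```

A usage example with made-up image names, where two names count as similar when they share a first letter:

```python
scores = {"a1": 0.2, "a2": 0.9, "b1": 0.5}
groups = cluster_images(["a1", "b1", "a2"], lambda x, y: x[0] == y[0])
chosen = representatives(groups, scores.get)
```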
3 PROPOSED METHOD (C.)
3.4 Logo and Trademark Ranking
Ranking factors: Probability, Instances, Depth
Image Importance = Probability*Depth*Instances
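The importance formula is a plain product of the three factors. The sketch below assumes that Probability is the classifier's estimate that the image is a logo or trademark, that Depth is a weight in (0, 1], and that Instances counts how often the image occurs on the site; these interpretations and the function name are assumptions.

```python
def image_importance(probability, depth, instances):
    """Image Importance = Probability * Depth * Instances."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if instances < 0:
        raise ValueError("instances must be non-negative")
    return probability * depth * instances
```

Because the factors are multiplied, an image scores zero as soon as any one factor is zero, however large the others are.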
3 PROPOSED METHOD (C.)
3.5 Image-Based Summarization
Cluster Importance = Σ_{image i ∈ cluster} Image Importance_i
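Read as a sum over the images of a cluster, the cluster importance can be used to order clusters for the summary. The sketch below assumes that reading and uses a hypothetical layout in which each cluster name maps to the list of its images' importance scores.

```python
def cluster_importance(image_scores):
    """Sum of the importance scores of all images in one cluster."""
    return sum(image_scores)

def rank_clusters(clusters):
    """Return (cluster name, importance) pairs, most important first.

    `clusters` maps a cluster name to the list of its image scores.
    """
    ranked = [(name, cluster_importance(scores)) for name, scores in clusters.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

With this ordering, a cluster of many moderately important duplicates can outrank a cluster holding a single high-scoring image.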
4 EXPERIMENTAL RESULTS
4 EXPERIMENTAL RESULTS (C.)
5 CONCLUSIONS
Websites are summarized by first extracting images with a high probability of being logos or trademarks
Similar images are then clustered together, and the images in each cluster are ranked by importance
The most important image from each cluster is included in the summary
5 CONCLUSIONS (C.)
76 percent detection accuracy
85 percent classification accuracy
64 percent summarization accuracy
Future work includes experimentation with larger training data sets and more image types to improve the performance of the machine learning