Understanding and Predicting Interestingness of Videos
Yu-Gang Jiang, Yanran Wang, Rui Feng, Hanfang Yang, Yingbin Zheng, Xiangyang Xue
School of Computer Science, Fudan University, Shanghai, China
AAAI 2013, Bellevue, USA
Applications:
• Web Video Search
• Video Recommendation System
Related Work:
• There are a few studies on predicting the aesthetics and interestingness of images
Key Idea: build a computational model that, given two videos, predicts which one is more interesting.
Contributions:
• Conducted a pilot study on video interestingness
• Built two new datasets to support this study
• Evaluated a large number of features and obtained interesting observations
The problem:
Can a computational model automatically analyze video content and predict the interestingness of videos? We conduct a pilot study on this problem and demonstrate a simple method to identify the more interesting videos.
Two New Datasets
Flickr Dataset:
• Source: Flickr.com
• Video Type: Consumer Videos
• Video Number: 1200
• Categories: 15 (basketball, beach…)
• Duration: 20 hrs in total
• Label: Top 10% as interesting videos; bottom 10% as uninteresting
YouTube Dataset:
• Source: YouTube.com
• Video Type: Advertisements
• Video Number: 420
• Categories: 14 (food, drink…)
• Duration: 4.2 hrs in total
• Label: 10 human assessors compared video pairs
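The poster gives only the labelling rule for each dataset. Below is a minimal sketch of how such labels could be turned into training pairs. It assumes, hypothetically, that each Flickr category arrives as a list of video ids already ordered from most to least interesting, and that every top-10% video is paired with every bottom-10% video of the same category. The poster does not say how the 10 YouTube assessors' judgments are combined, so the majority vote with ties discarded is also an assumption. `flickr_pairs` and `majority_label` are hypothetical names.

```python
from itertools import product


def flickr_pairs(ranked_ids, frac=0.1):
    """Build (more_interesting, less_interesting) pairs for one category.

    ranked_ids: video ids ordered from most to least interesting
    (hypothetical input). The top `frac` are treated as interesting and
    the bottom `frac` as uninteresting, following the poster's 10% rule.
    """
    k = max(1, int(len(ranked_ids) * frac))
    if 2 * k > len(ranked_ids):
        raise ValueError("top and bottom groups would overlap")
    top, bottom = ranked_ids[:k], ranked_ids[-k:]
    # Assumption: every top video is more interesting than every bottom video.
    return list(product(top, bottom))


def majority_label(votes):
    """Aggregate assessor votes ('a' or 'b') for one video pair.

    Returns the majority choice, or None on a tie (assumed to be dropped).
    """
    a, b = votes.count("a"), votes.count("b")
    if a == b:
        return None
    return "a" if a > b else "b"
```

For a category of 20 videos, `flickr_pairs` keeps the top 2 and bottom 2 and returns their 4 cross pairs.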
Prediction & Evaluation
Computational Framework:
• Aim: train a model to compare the interestingness of two videos
• Features: visual, audio, and high-level attribute features
Prediction:
• Adopt Joachims’ Ranking SVM (Joachims 2003) to train prediction models
• For both datasets, we use 2/3 of the videos for training and 1/3 for testing
• Use kernel-level fusion with equal weights to fuse multiple features
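A Ranking SVM learns a scoring function from ordered pairs by requiring each more-interesting video to outscore its partner by a margin. The following is a simplified sketch, assuming a linear model trained by plain subgradient descent on the primal hinge-loss objective. Joachims' own solver works differently, and the poster's kernel-level fusion implies a kernelized model, which this sketch omits. `train_rank_svm`, `score`, and the `C`, `epochs`, and `lr` defaults are illustrative assumptions.

```python
import numpy as np


def train_rank_svm(X, pairs, C=1.0, epochs=200, lr=0.01):
    """Linear Ranking SVM via subgradient descent on

        0.5 * ||w||^2 + C * sum_{(i, j)} max(0, 1 - w . (x_i - x_j))

    X: (num_videos, num_features) feature matrix.
    pairs: (i, j) index pairs meaning video i is more interesting than j.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise transform: each ordered pair becomes one difference vector.
    diffs = np.array([X[i] - X[j] for i, j in pairs])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        # Pairs whose margin is below 1 contribute to the hinge loss.
        violated = diffs @ w < 1.0
        grad = w - C * diffs[violated].sum(axis=0)
        w = w - lr * grad
    return w


def score(w, X):
    """Interestingness score per video; higher means more interesting."""
    return np.asarray(X, dtype=float) @ w
```

To compare two videos, the model simply checks which one receives the higher score; sorting all videos by score gives a full ranking.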
Evaluation:
• Accuracy: the percentage of correctly ranked test video pairs
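The accuracy metric counts ordered test pairs. A minimal sketch, assuming each test pair (i, j) means video i was labelled more interesting than video j, and treating a tie in predicted scores as an error (the poster does not say how ties are handled). `pairwise_accuracy` is a hypothetical name.

```python
def pairwise_accuracy(scores, test_pairs):
    """Percentage of test pairs (i, j) ranked correctly, i.e. scores[i] > scores[j].

    Ties count as errors (an assumption).
    """
    if not test_pairs:
        raise ValueError("need at least one test pair")
    correct = sum(1 for i, j in test_pairs if scores[i] > scores[j])
    return 100.0 * correct / len(test_pairs)
```

With scores `[0.9, 0.2, 0.5]` and pairs `(0, 1), (0, 2), (1, 2)`, the first two pairs are ranked correctly and the third is not, giving about 66.7%.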
Framework (figure): multi-modal feature extraction (visual, audio, and high-level attribute features), multi-modal fusion, Ranking SVM, results (which of two videos is more interesting)
Multi-modal feature extraction:
• Visual features: Color Histogram, SIFT, HOG, SSIM, GIST
• Audio features: MFCC, Spectrogram SIFT, Audio-Six
• High-level attribute features: Classemes, Objectbank, Style
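The poster lists the features but not how they are computed. Below is a hedged sketch of two common building blocks, assuming frames have already been sampled from each video: a joint RGB color histogram pooled over the sampled frames, and a bag-of-words encoding that maps local descriptors (such as SIFT or HOG) to a normalized histogram over a codebook. The bin count, the codebook, and both function names are hypothetical. Extraction of the descriptors themselves (SIFT, HOG, MFCC, and so on) is not shown.

```python
import numpy as np


def color_histogram(frames, bins_per_channel=4):
    """Joint RGB histogram pooled over all sampled frames.

    frames: uint8 array of shape (num_frames, height, width, 3).
    Returns an L1-normalized vector of length bins_per_channel ** 3.
    """
    frames = np.asarray(frames)
    # Quantize each 0-255 channel value into equal-width bins.
    q = (frames.astype(np.int64) * bins_per_channel) // 256
    # Combine the three per-channel bins into a single joint bin index.
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3).astype(float)
    return hist / hist.sum()


def bag_of_words(descriptors, codebook):
    """Hard-assign local descriptors to their nearest codeword.

    Returns an L1-normalized histogram of codeword counts.
    """
    d = np.asarray(descriptors, dtype=float)
    c = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every descriptor to every codeword.
    dists = ((d[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(c)).astype(float)
    return hist / hist.sum()
```

Both functions produce fixed-length histograms per video, which suits the histogram kernels typically used with SVMs.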
Results
Visual Feature Results:
• Overall, the visual features achieve strong performance on both datasets
• Among the five features, SIFT and HOG are very effective, and their combination performs best
Audio Feature Results:
• The three audio features are effective and complementary; combining them gives the best performance
Attribute Feature Results:
• Attribute features do not work as well as expected; Style in particular performs poorly. This is an interesting observation, since Style has been reported to be effective for predicting image interestingness
Visual+Audio+Attribute Fusion Results:
• Fusing visual and audio features leads to substantial performance gains: 2.6% on Flickr and 5.4% on YouTube. Adding attribute features on top is much less effective
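Kernel-level fusion with equal weights can be read as averaging one kernel matrix per feature. Below is a minimal sketch, assuming histogram features compared with an exponential chi-square kernel. This kernel is a common choice for such features but is not confirmed by the poster. `chi2_kernel`, `fuse_kernels`, and `gamma` are illustrative names.

```python
import numpy as np


def chi2_kernel(A, B, gamma=1.0, eps=1e-10):
    """exp(-gamma * chi2(a, b)) for every row a of A and row b of B.

    chi2(a, b) = sum_k (a_k - b_k)^2 / (a_k + b_k); eps avoids division by zero.
    """
    A = np.asarray(A, dtype=float)[:, None, :]
    B = np.asarray(B, dtype=float)[None, :, :]
    chi2 = ((A - B) ** 2 / (A + B + eps)).sum(axis=2)
    return np.exp(-gamma * chi2)


def fuse_kernels(kernels):
    """Equal-weight kernel-level fusion: element-wise mean of per-feature kernels.

    All kernel matrices must be computed over the same videos, in the same order.
    """
    kernels = [np.asarray(K, dtype=float) for K in kernels]
    if len({K.shape for K in kernels}) != 1:
        raise ValueError("need one or more kernel matrices of identical shape")
    return sum(kernels) / len(kernels)
```

The fused matrix would then serve as the kernel of a kernelized Ranking SVM. Equal weights avoid tuning a separate weight for each feature, at the cost of letting a weak feature dilute strong ones, which may be why adding the attribute features helps little.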
Result charts (figure; accuracy in %, axis from 50 to 80, Flickr and YouTube):
• Visual: SIFT, HOG, SSIM, GIST, Color Histogram, SIFT+HOG, SIFT+HOG+SSIM, SIFT+HOG+GIST, SIFT+HOG+Color
• Attribute: Style, Classemes, Objectbank, Style+Classemes, Classemes+Objectbank
• Fusion: Visual (SIFT+HOG), Audio (MFCC+SS+Audio-Six), Attribute (Objectbank+Classeme), Visual+Audio, Visual+Audio+Attribute; fusion gains of 2.6% (Flickr) and 5.4% (YouTube)
Datasets are available at: www.yugangjiang.info/research/interestingness
Conclusion
We conducted a study on predicting video interestingness and built two new datasets. A large number of features were evaluated, leading to interesting observations:
• Visual and audio features are effective in predicting video interestingness
• A few features useful for image interestingness do not extend to the video domain (Style…)