Video summarization by graph optimization Lu Shi Oct. 7, 2003


Outline
  Introduction
  Goals
  Stage I: Candidate video shot selection
    Video segmentation
    Video feature detection
    Candidate video shots
  Stage II: Graph based video summary generation
    Dissimilarity function
    Spatial-temporal relation graph
    Optimization
  Experiments and Results
  Conclusion & Future Work

Introduction: Motivation

A huge volume of video data is distributed over the Web.

How can we help the user grasp the content of a video quickly?

When bandwidth is narrow, how should the video be presented to the user?

Applications
  Video skimming (dynamic)
  Static story board (static)

Goals: Criteria for a video summary

Conciseness: the video skimming should not exceed the given target length.

Comprehensive coverage: both the visual diversity and the temporal distribution of the original video should be covered.

Visual coherence: the video skimming should not be too jumpy.

Stage I: Candidate shot selection

Video segmentation
  A video shot is an unbroken sequence of images recorded continuously by a camera.
  The content of a video shot can be represented by key frames (e.g. the first and last frames).
  A video sequence is formed by a series of video shots.
  Video shots can be detected by various video segmentation methods.

Stage I: Candidate shot selection

Video segmentation
  Build a middle slice image by concatenating the center lines of the video frames.
  Calculate the minimal pixel difference between rows of the slice image.
  Apply filtering and thresholding.
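The three steps on this slide can be sketched roughly as follows. This is only an illustration, not the presenter's implementation: it assumes the slice image is a list of rows of grey values (one row per frame), that "minimal pixel difference" means the smallest mean absolute difference over a few horizontal shifts, and that the filtering step subtracts a local median. The names `row_difference`, `detect_shot_boundaries`, `max_shift`, `window` and the default threshold are all hypothetical.

```python
from statistics import median


def row_difference(row_a, row_b, max_shift=2):
    """Smallest mean absolute difference between two rows over small shifts.

    Trying a few horizontal shifts is an assumption made here so that slow
    camera motion does not look like a shot change.
    """
    best = float("inf")
    n = len(row_a)
    for shift in range(-max_shift, max_shift + 1):
        # Index range where both row_a[i] and row_b[i + shift] exist.
        lo, hi = max(0, -shift), min(n, n - shift)
        if hi <= lo:
            continue
        diff = sum(abs(row_a[i] - row_b[i + shift]) for i in range(lo, hi)) / (hi - lo)
        best = min(best, diff)
    return best


def detect_shot_boundaries(slice_image, threshold=30.0, max_shift=2, window=2):
    """Return the frame indices at which a new shot is assumed to start."""
    diffs = [
        row_difference(slice_image[t - 1], slice_image[t], max_shift)
        for t in range(1, len(slice_image))
    ]
    boundaries = []
    for idx, d in enumerate(diffs):
        # Filtering (assumed form): subtract the median of the neighbouring
        # differences so that steady motion does not trigger a boundary.
        neighbours = diffs[max(0, idx - window):idx] + diffs[idx + 1:idx + 1 + window]
        baseline = median(neighbours) if neighbours else 0.0
        # Thresholding: keep only peaks that stand out from the baseline.
        if d - baseline > threshold:
            boundaries.append(idx + 1)  # diffs[idx] compares frames idx and idx + 1
    return boundaries
```

For example, a slice image made of three dark rows followed by three bright rows yields a single boundary at frame 3.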

Stage I: Candidate shot selection

Video feature detection
  Face detection
  Voice and noise detection
  Audio volume
  Specific colors (fire, etc.)
  Text captions

These features indicate interesting content that should be considered for inclusion in the summary.

Stage I: Candidate shot selection

Select candidate shots
  Shots from which interesting features were extracted
  Any combination of the extracted features can be used
  Adjacent candidate shots can be merged into video shot clusters to increase visual coherence

Stage II: Graph modeling

Video shot pairwise dissimilarity function
  Visual (spatial) similarity: histogram correlation between key frames
  Temporal distance: the distance between shot center points
  Definition:

$$\mathrm{Dis}(sh_i, sh_j) = 1 - \mathrm{VisualSim}(sh_i, sh_j) \cdot e^{-k \cdot \mathrm{TemporalDis}(sh_i, sh_j)}$$

Stage II: Graph modeling

Video shot pairwise dissimilarity function
  Linear in the visual dissimilarity
  Exponential in the temporal distance, to approximate the user’s memory (k = 400 in the experiment)
  A similar definition is used for video clusters
  Definition:

$$\mathrm{Dis}(sh_i, sh_j) = 1 - \mathrm{VisualSim}(sh_i, sh_j) \cdot e^{-k \cdot \mathrm{TemporalDis}(sh_i, sh_j)}$$
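A minimal sketch of the pairwise dissimilarity, assuming the formula has the form 1 - VisualSim * exp(-k * TemporalDis). Several details are assumptions made only for illustration: key frames are represented by plain histograms (lists of bin counts), the correlation is clamped to [0, 1], and the temporal distance is the absolute difference of the shot center positions. The slides do not state the unit of the temporal distance, so `k` is passed explicitly rather than fixed at 400. The function names are hypothetical.

```python
import math


def histogram_correlation(hist_a, hist_b):
    """Pearson correlation of two equal-length histograms, clamped to [0, 1]."""
    n = len(hist_a)
    mean_a, mean_b = sum(hist_a) / n, sum(hist_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(hist_a, hist_b))
    var_a = sum((a - mean_a) ** 2 for a in hist_a)
    var_b = sum((b - mean_b) ** 2 for b in hist_b)
    if var_a == 0 or var_b == 0:
        # Flat histogram: correlation is undefined; treated as "not similar"
        # here (an assumption of this sketch).
        return 0.0
    return max(0.0, cov / math.sqrt(var_a * var_b))


def shot_dissimilarity(hist_i, hist_j, center_i, center_j, k):
    """Dis = 1 - VisualSim * exp(-k * TemporalDis), under the assumptions above."""
    visual_sim = histogram_correlation(hist_i, hist_j)
    temporal_dis = abs(center_i - center_j)
    return 1.0 - visual_sim * math.exp(-k * temporal_dis)
```

With this form, two visually identical shots at the same position have dissimilarity 0, and the value approaches 1 as the shots move apart in time or differ in appearance.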

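Stage II: Graph modeling

Video shot cluster pairwise dissimilarity function

Between one video shot and one video shot cluster:

$$\mathrm{Dis}(sh_i, sc_j) = \sum_{sh_x \in sc_j} \mathrm{Dis}(sh_i, sh_x) \cdot \frac{\mathrm{length}(sh_x)}{\mathrm{length}(sc_j)}$$

Between two shot clusters:

$$\mathrm{Dis}(sc_i, sc_j) = \sum_{sh_y \in sc_i} \mathrm{Dis}(sh_y, sc_j) \cdot \frac{\mathrm{length}(sh_y)}{\mathrm{length}(sc_i)}$$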
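The cluster dissimilarities can be read as length-weighted averages of shot-level dissimilarities. The sketch below illustrates that reading; it assumes that the length of a cluster is the sum of the lengths of its member shots, and it takes the shot-level `dis` and `length` functions as parameters. All function and parameter names are hypothetical.

```python
def shot_to_cluster_dis(shot, cluster, dis, length):
    """Length-weighted average of dis(shot, member) over the cluster members."""
    cluster_length = sum(length(member) for member in cluster)  # assumed definition
    return sum(dis(shot, member) * length(member) for member in cluster) / cluster_length


def cluster_to_cluster_dis(cluster_i, cluster_j, dis, length):
    """Length-weighted average of shot-to-cluster dissimilarities over cluster_i."""
    cluster_length = sum(length(member) for member in cluster_i)  # assumed definition
    return sum(
        shot_to_cluster_dis(member, cluster_j, dis, length) * length(member)
        for member in cluster_i
    ) / cluster_length
```

A longer shot therefore contributes more to the dissimilarity of its cluster than a shorter one.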

Stage II: Graph modeling

Model the candidate shot set as a directed graph G(V, E), which conveys both the spatial and the temporal properties of the video.
  A vertex v_i corresponds to a video shot; the weight on the vertex is the shot’s length.
  An edge e_ij corresponds to the dissimilarity between video shot i and shot j.
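One possible in-memory form of this graph is sketched below. It assumes that shots are indexed in temporal order and that edges only point forward in time (from shot i to a later shot j), which is an assumption of this sketch rather than something the slide states. The name `build_relation_graph` is hypothetical.

```python
def build_relation_graph(shot_lengths, dis):
    """Return (vertex_weight, edge_weight) for the candidate shot set.

    vertex_weight[i] is the length of shot i.
    edge_weight[(i, j)] is the dissimilarity between shot i and a later shot j.
    """
    n = len(shot_lengths)
    vertex_weight = list(shot_lengths)
    edge_weight = {
        (i, j): dis(i, j)
        for i in range(n)
        for j in range(i + 1, n)  # forward edges only (assumption)
    }
    return vertex_weight, edge_weight
```

For n candidate shots this stores n vertex weights and n(n-1)/2 edge weights.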

Stage II: Graph modeling

The real shot/cluster pairwise dissimilarity function

Stage II: Graph based video summary generation

Video skimming generation
  A target video skimming length SummaryLength is given.
  A path in the spatial-temporal relation graph corresponds to a set of video shots.
  The objective function is the length of the path.
  Find the longest path, with the constraint that the sum of the vertex weights on the path is within [SummaryLength - threshold, SummaryLength].

Stage II: Graph based video summary generation

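Optimal substructure
  We denote the state as (ThisShot, LeftSize).
  The optimal substructure is:

$$\mathrm{opt}(\mathit{ThisShot}, \mathit{LeftSize}) = \max_{\mathit{ThisShot}+1 \le \mathit{NextShot} \le \mathit{ShotNum}} \Bigl( \mathrm{opt}\bigl(\mathit{NextShot},\ \mathit{LeftSize} - \mathrm{length}(\mathit{NextShot})\bigr) + \mathrm{Dis}(sh_{\mathit{ThisShot}}, sh_{\mathit{NextShot}}) \Bigr)$$

  If LeftSize is too small, then opt(ThisShot, LeftSize) = 0.
  We can then use dynamic programming to find the best solution.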

Stage II: Graph based video summary generation

Dynamic programming
  Set opt(LastShot, 0..threshold) to 0.
  Set opt(LastShot, threshold+1..SummaryLength) to -X.
  Calculate opt(ThisShot, LeftSize) with the optimal substructure equation, for ThisShot from LastShot-1 down to 0.
  Get opt(0, SummaryLength), which is the longest path’s length, then trace back to find the path.
  Time complexity: $O(n^2 \cdot \mathit{SummaryLength})$, where n is the number of candidate shots
  Space complexity: $O(n \cdot \mathit{SummaryLength})$
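The dynamic programming steps above can be sketched as follows. This is an illustrative reading, not the presenter's code. It makes several assumptions: shot lengths and SummaryLength are integers, a path may stop at any shot once the unused length is at most `threshold`, infeasible states get minus infinity in place of the slide's -X, and every shot is tried as the first one instead of fixing shot 0 as the start. The function name `longest_constrained_path` is hypothetical.

```python
def longest_constrained_path(lengths, dis, summary_length, threshold):
    """Return (score, path) for the longest path whose total shot length
    lies within [summary_length - threshold, summary_length]."""
    n = len(lengths)
    neg_inf = float("-inf")
    # opt[s][left]: best score of a path that has just taken shot s and still
    # has `left` units of the target length unused.
    opt = [[neg_inf] * (summary_length + 1) for _ in range(n)]
    nxt = [[None] * (summary_length + 1) for _ in range(n)]

    for s in range(n - 1, -1, -1):
        for left in range(summary_length + 1):
            # Stopping here is allowed only if little enough length is unused.
            best, best_next = (0.0 if left <= threshold else neg_inf), None
            for t in range(s + 1, n):
                if lengths[t] <= left and opt[t][left - lengths[t]] != neg_inf:
                    cand = opt[t][left - lengths[t]] + dis(s, t)
                    if cand > best:
                        best, best_next = cand, t
            opt[s][left], nxt[s][left] = best, best_next

    # Try every shot as the first shot of the summary (assumption).
    best_score, start = neg_inf, None
    for s in range(n):
        left = summary_length - lengths[s]
        if left >= 0 and opt[s][left] > best_score:
            best_score, start = opt[s][left], s
    if start is None:
        return neg_inf, []

    # Trace back through the stored successors to recover the path.
    path, s, left = [], start, summary_length - lengths[start]
    while s is not None:
        path.append(s)
        t = nxt[s][left]
        if t is not None:
            left -= lengths[t]
        s = t
    return best_score, path
```

The table has n * (SummaryLength + 1) entries and each entry scans at most n successors, which matches the time and space complexity stated on the slide.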

Stage II: Graph based video summary generation

Video skimming generation
  The video skimming generated from video shots and video shot clusters is shown below (SummaryLength = 1500, video length = 11479).

Stage II: Graph based video summary generation

Static video story board generation
  The static video story board is generated from the key frames of the shots in the video skimming.

Stage II: Graph based video summary generation

Evaluation
  The generated video skimming captures both the visual diversity and the temporal coverage.
  A large-scale subjective test has not been carried out yet (does it make sense?).
  Quantitative objective evaluation remains a big problem.

Future work

Combine with video structure
  V-Toc (Video table of contents)
  Video shot groups
  Video scenes

Future work

Video structure: video shot group and video scene

Q & A

Thank you!
