Hyeonsoo, Kang Video Repeat Recognition and Mining by Visual Features



Page 1

Hyeonsoo, Kang

Video Repeat Recognition and Mining by Visual Features

Page 2

▫ Introduction

▫ Structure of the algorithm
  1. Known Video Repeat Recognition
  2. Unknown Video Repeat Recognition

▫ Results
  1. Known Video Repeat Recognition
  2. Unknown Video Repeat Recognition

Page 3

“repeat recognition?”

Page 4

Video repeats, which refer to copies of a video clip, ubiquitously exist in broadcast and web videos – TV commercials, station logos, program logos, etc.

“repeat recognition?”

Page 5

Benefit
Important for video content analysis and retrieval.

Applications: video syntactical segmentation, commercial monitoring, video copy detection, web video multiplicity estimation, video content summary, personalization, video compression, …

Challenge
Distortions – partial repeats, caption overlay; robust detection, searching efficiency, and also a learning issue.

Page 6

So what exactly are we going to do?

Page 7

Observations?

CNN news shots

Pages 8–13

[More example frames from the CNN news shots – Observations?]
Page 14

TIME AXIS

Page 15

Video repeat recognition approaches are chiefly twofold: (a) known video repeat recognition, and (b) unknown video repeat recognition.

Page 16

(a) Known video repeat recognition

Prior knowledge about the video repeats is known: construct a feature vector set and use a nearest-neighbor (NN) classifier to recognize copies of the prototype videos.

(b) Unknown video repeat recognition

Prior knowledge about the video repeats is unknown: detection, search, and learning issues arise.

Page 18

(a) Known video repeat recognition

1. Video feature extraction: frames are sampled every half second. For each sample frame, an RGB color histogram and a texture feature are calculated – the R, G, B channels are divided into 8 bins each, and the texture is computed as 13 components.

2. Cluster the color and texture spaces separately, hence we get …

Wait, but how big is the computation then? We have to consider both 8 × 8 × 8 × S and 13 × U features …

We don’t want to match videos in this massive space. Rather, we want to project the space into a smaller subspace.
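As an aside, the half-second sampling and the 8-bin-per-channel RGB histogram described above can be sketched in a few lines. This covers only the color part (the 13 texture components and the exact normalization are not specified in the slides, so they are omitted):

```python
import numpy as np

def rgb_histogram(frame, bins=8):
    """8x8x8 joint RGB histogram of a frame (H x W x 3, uint8),
    normalized to sum to 1."""
    # Map each channel value 0..255 into one of `bins` buckets.
    q = (frame.astype(np.uint32) * bins) // 256          # H x W x 3, values 0..7
    idx = q[..., 0] * bins * bins + q[..., 1] * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()

def sample_features(frames, fps=25):
    """Roughly one feature vector per half second, as in the slides."""
    step = fps // 2
    return np.stack([rgb_histogram(f) for f in frames[::step]])
```

Each sampled frame thus yields a 512-dimensional color vector before any clustering or projection.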

Page 22

(a) Known video repeat recognition

3. Use OPCA (maximize the Rayleigh quotient). Assume the signal and noise covariances are given; then maximizing the quotient becomes an eigenvector problem.
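A minimal sketch of the OPCA step, assuming a signal covariance C_signal (prototype variation) and a noise covariance C_noise (distortion): maximizing the Rayleigh quotient (nᵀ C_signal n) / (nᵀ C_noise n) reduces to a generalized eigenproblem, solved here by whitening with a Cholesky factor. The regularization term is an assumption added for numerical stability:

```python
import numpy as np

def opca_projection(C_signal, C_noise, k, reg=1e-6):
    """Top-k directions maximizing (n^T C_signal n) / (n^T C_noise n)."""
    d = C_signal.shape[0]
    # Whiten with respect to the noise covariance: C_noise = L L^T.
    L = np.linalg.cholesky(C_noise + reg * np.eye(d))
    Linv = np.linalg.inv(L)
    M = Linv @ C_signal @ Linv.T          # symmetric in the whitened space
    vals, U = np.linalg.eigh(M)           # eigenvalues in ascending order
    W = Linv.T @ U[:, ::-1][:, :k]        # back-transform the top-k directions
    return W                              # d x k projection matrix
```

Substituting n = L⁻ᵀu turns the quotient into (uᵀ M u) / (uᵀ u), so the ordinary symmetric eigensolver suffices.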

Page 23

(a) Known video repeat recognition

4. Once we have a smaller subspace, we want to recognize video copies. We will use the NN classifier to analyze efficiently. A test video is recognized as the closest prototype video if the distance (difference) is below a threshold.

Otherwise the test video does not belong to any prototype video.

In order to determine the threshold value, we need to analyze the video database. And remember, this is a known video repeat recognition problem, hence we know the statistical data about the database!

We first define three types of distance.
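The NN-with-threshold decision rule can be sketched as follows (a simplification: Euclidean distance on the projected feature vectors is assumed, since the slides do not specify the metric):

```python
import numpy as np

def recognize(query, prototypes, labels, threshold):
    """Return the label of the closest prototype feature vector if its
    distance is below `threshold`, else None (non-prototype video)."""
    dists = np.linalg.norm(prototypes - query, axis=1)
    i = int(np.argmin(dists))
    return labels[i] if dists[i] < threshold else None
```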

Page 27

(a) Known video repeat recognition

(1) The within-class distance between distorted prototype videos and the model database.

(2) The minimum between-class distance between a distorted prototype video and the database.

(3) The minimum distance between a non-prototype video and the database.

Page 28

(a) Known video repeat recognition

These distance types are useful because

(A) a distorted prototype video being classified as a non-prototype video,
(B) a distorted copy of one prototype video being recognized as another prototype video, and
(C) a non-prototype video being recognized as a prototype video

are the only cases of recognition errors.

Therefore the probability that a video q is wrongly classified is …

Page 30

(a) Known video repeat recognition

[Figure: the density functions of the three distance types]

Page 31

(a) Known video repeat recognition

[Continued]

Our good old math knowledge says that we need to differentiate the C1 error function once in order to find the minimum (maximum), using the properties of the density functions.
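In place of the analytic differentiation, an empirical version of the same idea can be sketched: estimate the error as a function of the threshold from sampled within-class and between-class distances, and pick the minimizer. The function name and the candidate grid are illustrative assumptions:

```python
import numpy as np

def pick_threshold(within, between, candidates=None):
    """Choose the distance threshold minimizing empirical recognition
    error: within-class distances above t (false rejects) plus
    between-class distances below t (false accepts)."""
    within, between = np.asarray(within), np.asarray(between)
    if candidates is None:
        candidates = np.linspace(0, max(within.max(), between.max()), 200)
    errors = [(within > t).mean() + (between < t).mean() for t in candidates]
    return candidates[int(np.argmin(errors))]
```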

Page 32

Experiments and Results

A prototype video database consisting of 1000 short video clips, 15 to 90 s long, most of which are commercials and film trailers.

[Video format]
- frame size 720×576, 25 fps
- distorted copies obtained by downsizing to 352×288, with frame rate reduction from 25 fps to 15 fps (this is a common distortion lying between broadcast video and web video copies)

[Video length]
- set to 10 s when computing the feature vectors

[The number of clusters]
- texture feature clusters: 5
- color feature clusters: 1

Then OPCA is adopted to compute the 64 subspace projections.

Page 33

Experiments and Results

Statistical analysis of the subspace …

Page 34

Experiments and Results

[Continued]

[Figures: histograms and fitted density functions of the three distance types]

So we set the threshold to 0.05 (or less is also possible, according to the equation).

Page 36

Experiments and Results

[Continued]

Minimum training error rates

Page 37

(a) Known video repeat recognition

Prior knowledge about the video repeats is known: construct a feature vector set and use a nearest-neighbor (NN) classifier to recognize copies of the prototype videos.

(b) Unknown video repeat recognition

Prior knowledge about the video repeats is unknown: detection, search, and learning issues arise.

Page 39

(b) Unknown video repeat recognition

Big picture: we employ two cascaded detectors. This is an unknown video repeat recognition problem, so we need to give the machine an algorithm to recognize repeats. Again we’ll use visual properties – here we employ the color fingerprint (Yang et al. [1]). The first detector discovers potential repeated clips, and the second one improves accuracy.

Page 40

(b) Unknown video repeat recognition

[Continued]

FIRST Stage

SECOND Stage

Page 41

(b) Unknown video repeat recognition

[Diagram: the video is segmented into units VU1, VU2, VU3, VU4, … by keyframes KF1, KF2, KF3, KF4, …]

Wait, but how do we find keyframes? Keyframe selection is based on color histogram difference. Suppose H1 and H0 are the color histograms of the current frame and the last keyframe, respectively; then the current frame is selected as a new keyframe if their histogram difference exceeds a threshold.

[Continued]
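The keyframe rule above can be sketched as follows, assuming an L1 histogram difference and a threshold tau (both are assumptions – the slides give the exact condition only as an equation image):

```python
import numpy as np

def select_keyframes(histograms, tau=0.3):
    """Indices of keyframes: a frame becomes a new keyframe when the L1
    difference between its color histogram and the last keyframe's
    exceeds tau."""
    keys = [0]                                # the first frame is a keyframe
    for i in range(1, len(histograms)):
        if np.abs(histograms[i] - histograms[keys[-1]]).sum() > tau:
            keys.append(i)
    return keys
```

Consecutive keyframes then delimit the video units VU1, VU2, … used by the first-stage detector.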

Page 43

(b) Unknown video repeat recognition

[Diagram: video units VU1, VU2, VU3, VU4, … with keyframes KF1, KF2, KF3, KF4, …]

[Continued]

And we average the K blending images – the color fingerprint is the ordered concatenation of these block features.

Page 45

(b) Unknown video repeat recognition

[Continued]

And we average the K blending images – the color fingerprint is the ordered concatenation of these block features.

Let R, G, B be the average color values of a block, and let their descending order be (V1, V2, V3); then the major color and minor color are determined by the following rules: …
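A rough sketch of the fingerprint construction, under stated assumptions: the blending image is taken as the per-pixel mean of the frames, and the major/minor colors are simply the channels ranked by block average. The paper's exact dominance rules (shown only as an image in the slides) are not reproduced:

```python
import numpy as np

def color_fingerprint(frames, M=8, N=8):
    """Average the frames into one blending image, split it into M x N
    blocks, and for each block emit major- and minor-color symbols
    (channel indices ranked by average value).  Length: M * N * 2."""
    blend = np.mean(np.stack(frames).astype(np.float64), axis=0)  # H x W x 3
    H, W, _ = blend.shape
    symbols = []
    for bi in range(M):
        for bj in range(N):
            block = blend[bi * H // M:(bi + 1) * H // M,
                          bj * W // N:(bj + 1) * W // N]
            order = np.argsort(block.reshape(-1, 3).mean(axis=0))[::-1]
            symbols.extend([int(order[0]), int(order[1])])  # major, minor
    return np.array(symbols)
```

With M = N = 8 this yields the 128-dimensional symbol vector mentioned on the next slide.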

Page 47

(b) Unknown video repeat recognition

[Continued]

If we divide the blending images into 8 × 8 blocks (M = N = 8), then the color feature is a 128-dimensional symbol vector!

To decrease the complexity of searching, we transform the data into a string representation using LSH (locality-sensitive hashing) and use unit-length filtering.

Now, for the actual algorithm by which the machine recognizes the repeats, we devise a similarity measure.

Page 49

(b) Unknown video repeat recognition

[Continued]

Given two video units vu_i and vu_j, their difference is defined in terms of their color fingerprints and lengths,

where F_i and F_j are the color fingerprint vectors of vu_i and vu_j, d(F_i, F_j) is the color fingerprint distance function, and len(·) is the length feature.
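Since the exact formula appears in the slides only as an equation image, the version below is a hypothetical combination consistent with the description – a fingerprint distance term plus a length term, with an assumed weight w:

```python
import numpy as np

def unit_difference(F_i, F_j, len_i, len_j, w=0.5):
    """Symbol-mismatch rate between the two color fingerprints plus a
    weighted relative difference of the unit lengths (illustrative)."""
    d_color = np.mean(np.asarray(F_i) != np.asarray(F_j))
    d_len = abs(len_i - len_j) / max(len_i, len_j)
    return d_color + w * d_len
```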

Page 50

(b) Unknown video repeat recognition

[Continued]

Second-stage matching is then conducted: we decide whether a repeat pair from the first stage is true or not from a condition on its similarity score,

where Score is the similarity value, L is the minimum length of the two clips in seconds, and a threshold is applied. Once a repeat pair is verified, its boundaries are extended until a dissimilar unit is encountered.

Here we use a soft threshold!
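One way to realize a soft threshold is to let the required similarity depend on the clip length L; the interpolation scheme and every parameter value below are illustrative assumptions, not the slides' actual rule:

```python
def verify_repeat(score, len_i, len_j, tau_short=0.9, tau_long=0.7, L0=10.0):
    """Soft-threshold sketch: the required similarity interpolates from
    tau_short for very short pairs down to tau_long once the shorter
    clip reaches L0 seconds."""
    L = min(len_i, len_j)                         # minimum length in seconds
    tau = tau_short - (tau_short - tau_long) * min(L, L0) / L0
    return score >= tau
```

The intuition: a short pair must match almost perfectly to count, while a long pair tolerates some local dissimilarity.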

Page 51

Experiments and Results

Test set: for news video we chose 30-minute CNN and ABC news clips from TRECVID – 12 days of each, 6 hours in total per channel.

By manually searching for short repeat clips, including program logos and commercials:
- 34 kinds of repeat clips with 186 instances in total were found in the CNN collection,
- while 35 kinds, 116 instances, were found in the ABC collection.

Page 52

Experiments and Results

Detector training: 3 hours of CNN news video are randomly chosen for training (six 30-minute videos from the video collections).

We then look at the resulting parameter values, plus the number of video units produced.

Page 53

Experiments and Results

After training, the detectors were tested on the remaining 3 hours of CNN video and 6 hours of ABC video.

Recall and precision on the CNN videos are 92.3% / 96%.

Recall and precision on the ABC videos are 90.1% / 90%.

The shortest correct repeat detected is 0.26 s long, and the longest is 75 s long.

Boundary accuracy was also measured: the smallest shift was 0 s (exact) and the largest was 16.4 s, with an average shift of 0.47 s. After the second stage of our algorithm, large shifts were reduced to 0–1 second.
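For reference, the recall and precision figures above follow the usual definitions:

```python
def recall_precision(n_detected_true, n_ground_truth, n_detected_total):
    """Recall = true detections / ground-truth repeats;
    precision = true detections / all detections."""
    recall = n_detected_true / n_ground_truth
    precision = n_detected_true / n_detected_total
    return recall, precision
```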

Page 54

Experiments and Results

Performance comparison of video segmentation methods

Page 55

CNN news video structure analysis by video repeats

Page 56

Summary

The video repeat recognition problem has many benefits, yet it is a challenging task. For recognizing video repeats, we divided the problem into two primary ones:

(1) known video repeat recognition, and
(2) unknown video repeat recognition.

For the first, we constructed a feature vector set and used a nearest-neighbor (NN) classifier to recognize copies of prototype videos.

For the second, we dealt with the detection, search, and learning issues that arise.

Our result is accurate and efficient, while being able to detect both short video repeats and long ones.

Page 57

Questions

1. In what applications can video repeat recognition be used?
Video syntactical segmentation, commercial monitoring, video copy detection, web video multiplicity estimation, video content summary, personalization, video compression, …

2. In the unknown repeat recognition, what methods did we use to decrease the complexity of searching?
We transform the data into a string representation using LSH (locality-sensitive hashing) and use unit-length filtering.

3. What techniques are used for efficiency of the algorithm in the known video repeat recognition, and why?
Subspace discriminative analysis by OPCA – because the dimensionality of the feature representation is too big.

Page 59

Bibliography

[1] Yang, Xianfeng, Qi Tian, and Ee-Chien Chang. "A Color Fingerprint of Video Shot for Content Identification." Proceedings of the 12th Annual ACM International Conference on Multimedia. ACM, 2004.

[2] Yang, Xianfeng, and Qi Tian. "Video Repeat Recognition and Mining by Visual Features." Video Search and Mining. Springer Berlin Heidelberg, 2010. 305–326.

Page 60

THANK YOU!

Page 61

Q & A