Hyeonsoo, Kang Video Repeat Recognition and Mining by Visual Features



Page 1

Hyeonsoo, Kang

Video Repeat Recognition and Mining by Visual Features

Page 2

▫ Introduction

▫ Structure of the algorithm
  1. Known Video Repeat Recognition
  2. Unknown Video Repeat Recognition

▫ Results
  1. Known Video Repeat Recognition
  2. Unknown Video Repeat Recognition

Page 3

“repeat recognition?”

Page 4

Video repeats, which refer to copies of a video clip, ubiquitously exist in broadcast and web videos – TV commercials, station logos, program logos, etc.

“repeat recognition?”

Page 5

Benefit
Important for video content analysis and retrieval.

Applications: video syntactical segmentation, commercial monitoring, video copy detection, web video multiplicity estimation, video content summary, personalization, video compression, …

Challenge
Distortions – partial repeats, caption overlay; robust detection, searching efficiency, and also a learning issue.

Page 6

So what exactly are we going to do?

Page 7

Observations?

CNN news shots

Pages 8–13

[More example frames from the CNN news shots – Observations?]
Page 14

TIME AXIS

Page 15

Video repeat recognition approaches are chiefly twofold: (a) known video repeat recognition, and (b) unknown video repeat recognition.

Page 16

(a) Known video repeat recognition

Prior knowledge about the video repeats is known: construct a feature vector set and use a nearest-neighbor (NN) classifier to recognize copies of the prototype videos.

(b) Unknown video repeat recognition

Prior knowledge about the video repeats is unknown: detection, search, and learning issues arise.

Page 18

(a) Known video repeat recognition

1. Video feature extraction: frames are sampled every half second. For each sample frame, an RGB color histogram and a texture feature are calculated – the R, G, B channels are divided into 8 bins each, and the texture is computed as 13 components.

2. Cluster the color and texture spaces separately, hence we get …

Wait, but how big is the computation then? We have to consider both 8 × 8 × 8 × S and 13 × U features …

We don’t want to match videos in this massive space. Rather, we want to project the space into a smaller subspace.
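As an aside, the half-second sampling and the 8-bin-per-channel RGB histogram described above can be sketched in a few lines. This covers only the color part (the 13 texture components and the exact normalization are not specified in the slides, so they are omitted):

```python
import numpy as np

def rgb_histogram(frame, bins=8):
    """8x8x8 joint RGB histogram of a frame (H x W x 3, uint8),
    normalized to sum to 1."""
    # Map each channel value 0..255 into one of `bins` buckets.
    q = (frame.astype(np.uint32) * bins) // 256          # H x W x 3, values 0..7
    idx = q[..., 0] * bins * bins + q[..., 1] * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()

def sample_features(frames, fps=25):
    """Roughly one feature vector per half second, as in the slides."""
    step = fps // 2
    return np.stack([rgb_histogram(f) for f in frames[::step]])
```

Each sampled frame thus yields a 512-dimensional color vector before any clustering or projection.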

Page 22

(a) Known video repeat recognition

3. Use OPCA (maximize the Rayleigh quotient). Assume the signal and noise covariances are given; then maximizing the quotient becomes an eigenvector problem.
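A minimal sketch of the OPCA step, assuming a signal covariance C_signal (prototype variation) and a noise covariance C_noise (distortion): maximizing the Rayleigh quotient (nᵀ C_signal n) / (nᵀ C_noise n) reduces to a generalized eigenproblem, solved here by whitening with a Cholesky factor. The regularization term is an assumption added for numerical stability:

```python
import numpy as np

def opca_projection(C_signal, C_noise, k, reg=1e-6):
    """Top-k directions maximizing (n^T C_signal n) / (n^T C_noise n)."""
    d = C_signal.shape[0]
    # Whiten with respect to the noise covariance: C_noise = L L^T.
    L = np.linalg.cholesky(C_noise + reg * np.eye(d))
    Linv = np.linalg.inv(L)
    M = Linv @ C_signal @ Linv.T          # symmetric in the whitened space
    vals, U = np.linalg.eigh(M)           # eigenvalues in ascending order
    W = Linv.T @ U[:, ::-1][:, :k]        # back-transform the top-k directions
    return W                              # d x k projection matrix
```

Substituting n = L⁻ᵀu turns the quotient into (uᵀ M u) / (uᵀ u), so the ordinary symmetric eigensolver suffices.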

Page 23

(a) Known video repeat recognition

4. Once we have a smaller subspace, we want to recognize video copies. We will use the NN classifier to analyze efficiently. A test video is recognized as the closest prototype video if the distance (difference) is below a threshold.

Otherwise the test video does not belong to any prototype video.

In order to determine the threshold value, we need to analyze the video database. And remember, this is a known video repeat recognition problem, hence we know the statistical data about the database!

We first define three types of distance.
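The NN-with-threshold decision rule can be sketched as follows (a simplification: Euclidean distance on the projected feature vectors is assumed, since the slides do not specify the metric):

```python
import numpy as np

def recognize(query, prototypes, labels, threshold):
    """Return the label of the closest prototype feature vector if its
    distance is below `threshold`, else None (non-prototype video)."""
    dists = np.linalg.norm(prototypes - query, axis=1)
    i = int(np.argmin(dists))
    return labels[i] if dists[i] < threshold else None
```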

Page 27

(a) Known video repeat recognition

(1) The within-class distance between distorted prototype videos and the model database.

(2) The minimum between-class distance between a distorted prototype video and the database.

(3) The minimum distance between a non-prototype video and the database.

Page 28

(a) Known video repeat recognition

These distance types are useful because

(A) a distorted prototype video being classified as a non-prototype video,
(B) a distorted copy of one prototype video being recognized as another prototype video, and
(C) a non-prototype video being recognized as a prototype video

are the only cases of recognition errors.

Therefore the probability that a video q is wrongly classified is …

Page 30

(a) Known video repeat recognition

[Figure: the density functions of the three distance types]

Page 31

(a) Known video repeat recognition

[Continued]

Our good old math knowledge says that we need to differentiate the C1 error function once in order to find the minimum (maximum), using the properties of the density functions.
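In place of the analytic differentiation, an empirical version of the same idea can be sketched: estimate the error as a function of the threshold from sampled within-class and between-class distances, and pick the minimizer. The function name and the candidate grid are illustrative assumptions:

```python
import numpy as np

def pick_threshold(within, between, candidates=None):
    """Choose the distance threshold minimizing empirical recognition
    error: within-class distances above t (false rejects) plus
    between-class distances below t (false accepts)."""
    within, between = np.asarray(within), np.asarray(between)
    if candidates is None:
        candidates = np.linspace(0, max(within.max(), between.max()), 200)
    errors = [(within > t).mean() + (between < t).mean() for t in candidates]
    return candidates[int(np.argmin(errors))]
```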

Page 32

Experiments and Results

A prototype video database consisting of 1000 short video clips, 15 to 90 s long, most of which are commercials and film trailers.

[Video format]
- frame size 720×576, 25 fps
- distorted copies obtained by downsizing to 352×288, with frame rate reduction from 25 fps to 15 fps (this is a common distortion lying between broadcast video and web video copies)

[Video length]
- set to 10 s when computing the feature vectors

[The number of clusters]
- texture feature clusters: 5
- color feature clusters: 1

Then OPCA is adopted to compute the 64 subspace projections.

Page 33

Experiments and Results

Statistical analysis of the subspace …

Page 34

Experiments and Results

[Continued]

[Figures: histograms and fitted density functions of the three distance types]

So we set the threshold to 0.05 (or less is also possible, according to the equation).

Page 36

Experiments and Results

[Continued]

Minimum training error rates

Page 37

(a) Known video repeat recognition

Prior knowledge about the video repeats is known: construct a feature vector set and use a nearest-neighbor (NN) classifier to recognize copies of the prototype videos.

(b) Unknown video repeat recognition

Prior knowledge about the video repeats is unknown: detection, search, and learning issues arise.

Page 39

(b) Unknown video repeat recognition

Big picture: we employ two cascaded detectors. This is an unknown video repeat recognition problem, so we need to give the machine an algorithm to recognize repeats. Again we’ll use visual properties – here we employ the color fingerprint (Yang et al. [1]). The first detector discovers potential repeated clips, and the second one improves accuracy.

Page 40

(b) Unknown video repeat recognition

[Continued]

FIRST Stage

SECOND Stage

Page 41

(b) Unknown video repeat recognition

[Diagram: the video is segmented into units VU1, VU2, VU3, VU4, … by keyframes KF1, KF2, KF3, KF4, …]

Wait, but how do we find keyframes? Keyframe selection is based on color histogram difference. Suppose H1 and H0 are the color histograms of the current frame and the last keyframe, respectively; then the current frame is selected as a new keyframe if their histogram difference exceeds a threshold.

[Continued]
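The keyframe rule above can be sketched as follows, assuming an L1 histogram difference and a threshold tau (both are assumptions – the slides give the exact condition only as an equation image):

```python
import numpy as np

def select_keyframes(histograms, tau=0.3):
    """Indices of keyframes: a frame becomes a new keyframe when the L1
    difference between its color histogram and the last keyframe's
    exceeds tau."""
    keys = [0]                                # the first frame is a keyframe
    for i in range(1, len(histograms)):
        if np.abs(histograms[i] - histograms[keys[-1]]).sum() > tau:
            keys.append(i)
    return keys
```

Consecutive keyframes then delimit the video units VU1, VU2, … used by the first-stage detector.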

Page 43

(b) Unknown video repeat recognition

[Diagram: video units VU1, VU2, VU3, VU4, … with keyframes KF1, KF2, KF3, KF4, …]

[Continued]

And we average the K blending images – the color fingerprint is the ordered concatenation of these block features.

Page 45

(b) Unknown video repeat recognition

[Continued]

And we average the K blending images – the color fingerprint is the ordered concatenation of these block features.

Let R, G, B be the average color values of a block, and let their descending order be (V1, V2, V3); then the major color and minor color are determined by the following rules: …
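A rough sketch of the fingerprint construction, under stated assumptions: the blending image is taken as the per-pixel mean of the frames, and the major/minor colors are simply the channels ranked by block average. The paper's exact dominance rules (shown only as an image in the slides) are not reproduced:

```python
import numpy as np

def color_fingerprint(frames, M=8, N=8):
    """Average the frames into one blending image, split it into M x N
    blocks, and for each block emit major- and minor-color symbols
    (channel indices ranked by average value).  Length: M * N * 2."""
    blend = np.mean(np.stack(frames).astype(np.float64), axis=0)  # H x W x 3
    H, W, _ = blend.shape
    symbols = []
    for bi in range(M):
        for bj in range(N):
            block = blend[bi * H // M:(bi + 1) * H // M,
                          bj * W // N:(bj + 1) * W // N]
            order = np.argsort(block.reshape(-1, 3).mean(axis=0))[::-1]
            symbols.extend([int(order[0]), int(order[1])])  # major, minor
    return np.array(symbols)
```

With M = N = 8 this yields the 128-dimensional symbol vector mentioned on the next slide.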

Page 47

(b) Unknown video repeat recognition

[Continued]

If we divide the blending images into 8 × 8 blocks (M = N = 8), then the color feature is a 128-dimensional symbol vector!

To decrease the complexity of searching, we transform the data into a string representation using LSH (locality-sensitive hashing) and use unit-length filtering.

Now, for the actual algorithm by which the machine recognizes the repeats, we devise a similarity measure.

Page 49

(b) Unknown video repeat recognition

[Continued]

Given two video units vu_i and vu_j, their difference is defined in terms of their color fingerprints and lengths,

where F_i and F_j are the color fingerprint vectors of vu_i and vu_j, d(F_i, F_j) is the color fingerprint distance function, and len(·) is the length feature.
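Since the exact formula appears in the slides only as an equation image, the version below is a hypothetical combination consistent with the description – a fingerprint distance term plus a length term, with an assumed weight w:

```python
import numpy as np

def unit_difference(F_i, F_j, len_i, len_j, w=0.5):
    """Symbol-mismatch rate between the two color fingerprints plus a
    weighted relative difference of the unit lengths (illustrative)."""
    d_color = np.mean(np.asarray(F_i) != np.asarray(F_j))
    d_len = abs(len_i - len_j) / max(len_i, len_j)
    return d_color + w * d_len
```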

Page 50

(b) Unknown video repeat recognition

[Continued]

Second-stage matching is then conducted: we decide whether a repeat pair from the first stage is true or not from a condition on its similarity score,

where Score is the similarity value, L is the minimum length of the two clips in seconds, and a threshold is applied. Once a repeat pair is verified, its boundaries are extended until a dissimilar unit is encountered.

Here we use a soft threshold!
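One way to realize a soft threshold is to let the required similarity depend on the clip length L; the interpolation scheme and every parameter value below are illustrative assumptions, not the slides' actual rule:

```python
def verify_repeat(score, len_i, len_j, tau_short=0.9, tau_long=0.7, L0=10.0):
    """Soft-threshold sketch: the required similarity interpolates from
    tau_short for very short pairs down to tau_long once the shorter
    clip reaches L0 seconds."""
    L = min(len_i, len_j)                         # minimum length in seconds
    tau = tau_short - (tau_short - tau_long) * min(L, L0) / L0
    return score >= tau
```

The intuition: a short pair must match almost perfectly to count, while a long pair tolerates some local dissimilarity.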

Page 51

Experiments and Results

Test set: for news video we chose 30-minute CNN and ABC news clips from TRECVID – 12 days of each, 6 hours in total per channel.

By manually searching for short repeat clips, including program logos and commercials:
- 34 kinds of repeat clips with 186 instances in total were found in the CNN collection,
- while 35 kinds, 116 instances, were found in the ABC collection.

Page 52

Experiments and Results

Detector training: 3 hours of CNN news video are randomly chosen for training (six 30-minute videos from the video collections).

We then look at the resulting parameter values, plus the number of video units produced.

Page 53

Experiments and Results

After training, the detectors were tested on the remaining 3 hours of CNN video and 6 hours of ABC video.

Recall and precision on the CNN videos are 92.3% / 96%.

Recall and precision on the ABC videos are 90.1% / 90%.

The shortest correct repeat detected is 0.26 s long, and the longest is 75 s long.

Boundary accuracy was also measured: the smallest shift was 0 s (exact) and the largest was 16.4 s, with an average shift of 0.47 s. After the second stage of our algorithm, large shifts were reduced to 0–1 second.
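For reference, the recall and precision figures above follow the usual definitions:

```python
def recall_precision(n_detected_true, n_ground_truth, n_detected_total):
    """Recall = true detections / ground-truth repeats;
    precision = true detections / all detections."""
    recall = n_detected_true / n_ground_truth
    precision = n_detected_true / n_detected_total
    return recall, precision
```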

Page 54

Experiments and Results

Performance comparison of video segmentation methods

Page 55

CNN news video structure analysis by video repeats

Page 56

Summary

The video repeat recognition problem has many benefits, yet it is a challenging task. For recognizing video repeats, we divided the problem into two primary ones:

(1) known video repeat recognition, and
(2) unknown video repeat recognition.

For the first, we constructed a feature vector set and used a nearest-neighbor (NN) classifier to recognize copies of prototype videos.

For the second, we dealt with the detection, search, and learning issues that arise.

Our result is accurate and efficient, while being able to detect both short video repeats and long ones.

Page 57

Questions

1. In what applications can video repeat recognition be used?
Video syntactical segmentation, commercial monitoring, video copy detection, web video multiplicity estimation, video content summary, personalization, video compression, …

2. In the unknown repeat recognition, what methods did we use to decrease the complexity of searching?
We transform the data into a string representation using LSH (locality-sensitive hashing) and use unit-length filtering.

3. What techniques are used for efficiency of the algorithm in the known video repeat recognition, and why?
Subspace discriminative analysis by OPCA – because the dimensionality of the feature representation is too big.

Page 59

Bibliography

[1] Yang, Xianfeng, Qi Tian, and Ee-Chien Chang. "A Color Fingerprint of Video Shot for Content Identification." Proceedings of the 12th Annual ACM International Conference on Multimedia. ACM, 2004.

[2] Yang, Xianfeng, and Qi Tian. "Video Repeat Recognition and Mining by Visual Features." Video Search and Mining. Springer Berlin Heidelberg, 2010. 305–326.

Page 60

THANK YOU!

Page 61

Q & A