Presenter: 陳冠州, 2012.09.26


Investigating keyframe Selection Methods in the Novel Domain of Passively Captured Visual Lifelogs


Outline

• Introduction
• Background and related work
• Keyframe approaches
• Experiment overview
• Results and discussion
• Conclusions

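1. Introduction

• Goals
  – Capture photos automatically
  – Retrieve personal activities
    • Web browsing, receiving email, conversations, participation in activities
  – Hardware: Microsoft SenseCam
    • A wearable digital camera that passively records daily experience
  – The record is a sequence of photos
    • Neighbouring photos have similar image content
    • The sequence can be split into discrete units, called events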

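1. Introduction (cont.)

  – A large volume of images
    • Grows by about 1,900 images per day
    • About 22 events per day
    • Picking out meaningful information to manage is challenging
    • Picking an image that represents each event (a keyframe) is important
    • The wearer can quickly review the captured content and judge whether it is highly relevant
    • The content produced every day must be processed efficiently
    • A large proportion of the photos are of poor quality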

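2. Background and Related Work

• Event segmentation
  – Lifelog events need to be delineated automatically
  – Event segmentation is harder for lifelogs than for video
    • Lifelog images are about 50 seconds apart, whereas video is a continuous stream of frames (so segmentation points are easy to detect)
  – Scene boundary detection
    • Not shot boundary detection
  – A scene usually consists of several shots
  – This is because events and activities carry inherent meaning
  – Applying this method directly is impractical, and its performance cannot be anticipated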

2. Background and Related Work (cont.)

  – Combining image features (content) with sensor readings (context) is recommended
    • F-measure value of 0.6237 (the closer to 1, the better)
    • F-measure = 2PR / (P + R)
    • P: precision, the proportion of the retrieved items that are relevant
    • R: recall, the proportion of all relevant items that are retrieved

http://botonnote.blogspot.tw/2011/10/retrieval-precision-recall-f-measure.html
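As a quick check of the formula, the sketch below computes the F-measure from a precision and a recall. The function name `f_measure` is made up for this example. Plugging in the precision (62.57%) and recall (62.17%) that these slides quote for the event segmentation method [5] gives about 0.6237, the value stated above.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall quoted in these slides for the segmentation method [5].
score = f_measure(0.6257, 0.6217)
print(round(score, 4))  # 0.6237
```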

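2. Background and Related Work (cont.)

• Keyframe selection
  – An event consists of multiple images
  – One image per event must be selected as its keyframe
  – Compare different keyframe selection methods
    • Baseline
    • Select the image in the event that is closest to the other images in that event
    • Select the image in the event that is closest to the other images in that event but most distinct from the images in other events
  – The latter two methods are more expensive to compute; GPU acceleration can be used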

3. Keyframe Approaches

• Before selecting keyframes, first separate the events
• The paper covers both traditional and novel techniques
• Event segmentation
  – First split the data into one large block per day
  – Describe every image using MPEG-7 descriptors and the SenseCam sensor values
    • MPEG-7 descriptors [1]
      – Colour layout
      – Colour structure
      – Scalable colour
      – Edge histogram

3. Keyframe Approaches (cont.)

  – Procedure for segmenting a whole day of images [5]
    • Divide an image into several large blocks, then compare it with the other images
    • Set a threshold; a large change in the visual or sensor values marks an event boundary
    • Remove event boundaries that are too close together
    • Performance of this method
      – Precision: 62.57%
      – Recall: 62.17%

[5] A. R. Doherty and A. F. Smeaton. Automatically segmenting lifelog data into events. In WIAMIS 2008 - 9th International Workshop on Image Analysis for Multimedia Interactive Services, 2008.

http://doras.dcu.ie/4651/1/Aiden_Doherty_WIAMIS_08.pdf
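The thresholding and clean-up steps can be sketched roughly as follows. This is not the implementation from [5]: the input `dissimilarity`, the `threshold` and `min_gap` parameters, and the rule of keeping the stronger of two nearby boundaries are all assumptions made for illustration.

```python
def find_event_boundaries(dissimilarity, threshold, min_gap):
    """Return indices i where an event boundary falls just before image i.

    dissimilarity[i] is a hypothetical score comparing image i with image
    i - 1 (index 0 is unused), e.g. a fused visual and sensor difference.
    """
    boundaries = []
    for i in range(1, len(dissimilarity)):
        if dissimilarity[i] <= threshold:
            continue  # change too small to be a boundary
        if boundaries and i - boundaries[-1] < min_gap:
            # Too close to the previous boundary: keep only the stronger one.
            if dissimilarity[i] > dissimilarity[boundaries[-1]]:
                boundaries[-1] = i
            continue
        boundaries.append(i)
    return boundaries


scores = [0.0, 0.1, 0.9, 0.8, 0.1, 0.2, 0.1, 0.7, 0.1]
print(find_event_boundaries(scores, threshold=0.5, min_gap=3))  # [2, 7]
```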





• Traditional Keyframe Selection Techniques
  – Three approaches (explained on later slides)
    • Middle image
    • Most representative of a given event
    • Most representative of a given event but also most different to the other events
  – Experiment
    • Uses the three approaches above
    • 101 events, 8,247 images
    • 1-5 Likert scale

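  – Exhaustive image-to-image comparison vs. comparison with the event's average image only
    • Select the image that is closest to all other images in the event. (Requires n x n comparisons, where n is the number of images in the event.)
      – Every comparison is computed afresh; select the image whose summed distance to the other images is smallest
    • Select the image that is closest to the average of all the other images in a given event. (Requires only n comparisons.)
      – Only the event average needs to be computed; pick the image closest to it
  • Conclusion
    – 1-5 Likert scale: 3.35 vs. 3.33
    – The difference is slight, but the former takes far longer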
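The two selection rules can be contrasted in a few lines. This is a minimal sketch, assuming each image is already described by a numeric feature vector and that Euclidean distance is used. As a simplification, the mean here includes the candidate image itself rather than only "the other images". The function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_to_all(features):
    """Exhaustive rule: index with the smallest summed distance to the rest (n x n)."""
    totals = [sum(euclidean(f, g) for g in features) for f in features]
    return totals.index(min(totals))


def closest_to_average(features):
    """Cheaper rule: index of the vector nearest the event's mean vector (n)."""
    n = len(features)
    mean = [sum(column) / n for column in zip(*features)]
    distances = [euclidean(f, mean) for f in features]
    return distances.index(min(distances))


# Hypothetical 2-D feature vectors for the four images of one event.
event = [[0.0, 0.0], [1.0, 1.0], [1.2, 0.9], [5.0, 5.0]]
print(closest_to_all(event), closest_to_average(event))
```

On this toy event the two rules pick neighbouring images (index 1 and index 2), which fits the slide's finding that the scores are close while the costs differ.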

3. Keyframe Approaches (cont.)

  – Cooper & Foote's alternative method vs. this paper's method
    • Select the image that is closest to all the other images in a given event, but most different to all the other images in the other events [4].
      – The first part needs N x N comparisons and the second part N x M, for N x (N+M) comparisons in total (N: number of images in one event; M: number of images in one day ≈ 1900)
    • Selection of the image that is most representative of an event by being closest to the average value of that event, but also what distinguishes it best from other events (by comparing against the average value of each of the other events).
      – For each event, first find the image closest to the average, which makes the comparison faster
      – This method needs only N x E comparisons (E: number of events in one day ≈ 22)

  • Conclusion
    – 3.01 vs. 3.14
    – This paper's method needs only 4% of the computation of Cooper & Foote's

[4] M. Cooper and J. Foote. Discriminative techniques for keyframe selection. In ICME 2005 - IEEE International Conference on Multimedia and Expo, 2005.

http://cecs.uci.edu/~papers/icme05/defevent/papers/cr1319.pdf
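A rough sketch of the average-based cross-event idea follows. The slides do not say how "close to its own event" and "different from other events" are combined, so the score used here (mean distance to the other events' averages minus distance to the image's own event average) is an assumption, as are the function names. Each image is compared against event averages only, which is where the N x E cost comes from.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(features):
    return [sum(column) / len(features) for column in zip(*features)]


def cross_event_keyframe(events, target):
    """Pick an image index from events[target] using event averages only.

    Assumed score: (mean distance to the other events' averages)
                   - (distance to the image's own event average).
    """
    means = [mean_vector(event) for event in events]
    others = [m for i, m in enumerate(means) if i != target]
    best_index, best_score = 0, -math.inf
    for index, image in enumerate(events[target]):
        own = euclidean(image, means[target])
        apart = sum(euclidean(image, m) for m in others) / len(others) if others else 0.0
        score = apart - own
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Hypothetical day with two events, each a list of 2-D feature vectors.
day = [
    [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],  # event 0
    [[1.0, 3.0], [1.0, 5.0]],              # event 1
]
print(cross_event_keyframe(day, target=0))
```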






  – Best vector distance metric
    • The preceding sections all rely on computing and comparing distances
    • The authors compared the following metrics
      – Histogram Intersection (Likert score 3.35)
      – Kullback-Leibler (3.30)
      – Manhattan (3.54)
      – Euclidean (3.59)
    • Euclidean approach performed best
  – Give more weight to images from the middle of the event
    • Choosing an image from the start or the end of an event is riskier
    • Likert score: 3.59 vs. 3.25
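For reference, the four metrics compared above can be written as below. These are textbook definitions, not necessarily the exact variants the authors used. Turning histogram intersection into a distance as 1 minus the overlap, and the `eps` smoothing in the Kullback-Leibler divergence, are assumptions of this sketch. The inputs are histograms that sum to 1.

```python
import math


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def histogram_intersection_distance(p, q):
    """1 minus the overlap of two histograms that each sum to 1."""
    return 1.0 - sum(min(a, b) for a, b in zip(p, q))


def kullback_leibler(p, q, eps=1e-12):
    """KL divergence D(p || q); eps (an assumption) avoids division by zero."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


# Two hypothetical three-bin histograms.
h1 = [0.5, 0.5, 0.0]
h2 = [0.25, 0.25, 0.5]
for metric in (manhattan, euclidean, histogram_intersection_distance, kullback_leibler):
    print(metric.__name__, metric(h1, h2))
```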

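  – Normalisation and data fusion
    • Normalisation
      – v' = (v – Min) / (Max – Min)
      – Normalises values to the range 0~1
    • Data fusion
      – Uses CombSUM
      – Sums the normalised scores from the different sources and ranks by the total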
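A minimal sketch of the two steps, assuming the usual min-max form of normalisation, (v - Min) / (Max - Min), so that scores land in 0~1. The sample scores and the names `contrast` and `sharpness` are invented for illustration.

```python
def min_max_normalise(values):
    """Scale values linearly into [0, 1]; a constant list maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def comb_sum(score_lists):
    """CombSUM: add up the normalised scores each source gives to each item."""
    normalised = [min_max_normalise(scores) for scores in score_lists]
    return [sum(column) for column in zip(*normalised)]


# Hypothetical raw scores for four images from two sources.
contrast = [10.0, 30.0, 20.0, 50.0]
sharpness = [0.2, 0.9, 0.4, 0.3]
fused = comb_sum([contrast, sharpness])
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
print(ranking)  # [1, 3, 2, 0]
```

Normalising first matters because the sources have different scales; without it the source with the largest raw values would dominate the sum.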

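• Image Quality Measures
  – Contrast measure
    • Convert the colour space (RGB→YUV) (*1)
    • Divide the whole image into 8x8 blocks
    • In each block, subtract the minimum Y (luminance) value from the maximum to get that block's contrast
    • Average the contrast of all blocks to get the contrast of the whole image

*1. YUV: a colour space in which Y is the luminance and U and V carry the chrominance (hue and saturation)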
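The contrast steps translate fairly directly into code. The sketch below assumes the image is a list of rows of (r, g, b) tuples and derives Y with the standard BT.601 weights. How partial blocks at the image border are handled is not stated in the slides, so here they are simply treated as smaller blocks.

```python
def luminance(rgb):
    """Y channel from RGB using the BT.601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def contrast_measure(image, block=8):
    """Mean over blocks of (max Y - min Y); image is rows of (r, g, b) tuples."""
    height, width = len(image), len(image[0])
    y = [[luminance(pixel) for pixel in row] for row in image]
    block_contrasts = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            values = [
                y[r][c]
                for r in range(top, min(top + block, height))
                for c in range(left, min(left + block, width))
            ]
            block_contrasts.append(max(values) - min(values))
    return sum(block_contrasts) / len(block_contrasts)


# 8x16 test image: the left block is flat grey, the right block mixes black and white.
flat = [(128, 128, 128)] * 8
mixed = [(0, 0, 0)] * 4 + [(255, 255, 255)] * 4
image = [flat + mixed for _ in range(8)]
print(contrast_measure(image))
```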

• YUV colour space

[Figure: an image shown in YUV and as its separate Y, U and V channels. Image source: http://nauful.com/pages/imagecompression.html]

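  – Colour variance
    • Measures how rich the colours are
    • Compute the distance from each pixel to 8 dominant colours in the colour space (*2)
    • Assign each pixel to the cluster of the dominant colour with the smallest distance (*3)
    • Compute the average variance of the dominant-colour clusters; empirically, the average variance is 20% above the threshold

*2. The eight dominant colours are black, white, red, green, blue, yellow, cyan and magenta
*3. Euclidean distance: d(p, q) = sqrt(Σ_i (p_i – q_i)^2)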
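One possible reading of the colour variance steps is sketched below. The slides do not define the variance precisely, so this sketch assumes RGB pixels and takes the variance of a cluster to be the mean squared distance of its pixels from the cluster's mean colour, averaged over non-empty clusters. The threshold comparison is left out.

```python
DOMINANT_COLOURS = [
    (0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 255, 0),
    (0, 0, 255), (255, 255, 0), (0, 255, 255), (255, 0, 255),
]  # black, white, red, green, blue, yellow, cyan, magenta (RGB assumed)


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def colour_variance(pixels):
    """Average cluster variance for a non-empty list of (r, g, b) pixels."""
    clusters = [[] for _ in DOMINANT_COLOURS]
    for pixel in pixels:
        # The nearest colour by squared distance is also nearest by Euclidean distance.
        distances = [squared_distance(pixel, c) for c in DOMINANT_COLOURS]
        clusters[distances.index(min(distances))].append(pixel)
    variances = []
    for members in clusters:
        if not members:
            continue
        mean = [sum(channel) / len(members) for channel in zip(*members)]
        variances.append(sum(squared_distance(p, mean) for p in members) / len(members))
    return sum(variances) / len(variances)


# Two near-black and two near-red pixels (made-up values).
sample = [(10, 0, 0), (0, 0, 0), (250, 0, 0), (255, 0, 0)]
print(colour_variance(sample))
```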

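  – Noise measure
    • To measure the amount of noise in the whole image, every pixel has to be examined
    • In each 3x3 block, compute the Euclidean distance between each pixel's value and the mean of the 3x3 block
    • If the distance of the centre pixel is the largest (compared with its 8 neighbours), that pixel is marked as noise
    • Finally, compute what percentage of all pixels are marked as noise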
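The noise steps can be sketched for a single-channel (grey) image, where the Euclidean distance between two values reduces to an absolute difference. Two details are assumptions: border pixels, which lack a full 3x3 neighbourhood, are skipped, and the centre must be strictly larger than every neighbour so that flat regions are not flagged.

```python
def noise_percentage(gray):
    """Percentage of interior pixels flagged as noise in a 2-D grey image."""
    height, width = len(gray), len(gray[0])
    flagged = 0
    examined = 0
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            window = [gray[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            mean = sum(window) / 9
            deviations = [abs(v - mean) for v in window]
            centre = deviations[4]  # index 4 is the middle of the 3x3 window
            neighbours = deviations[:4] + deviations[5:]
            examined += 1
            # Strictly larger than every neighbour (tie handling is an assumption).
            if centre > max(neighbours):
                flagged += 1
    return 100.0 * flagged / examined if examined else 0.0


# 5x5 flat image with one bright outlier in the middle.
gray = [[10] * 5 for _ in range(5)]
gray[2][2] = 200
print(noise_percentage(gray))
```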

  – Global sharpness [17]
    • Apply a vertical Sobel filter to find the edges
    • Scan every row of the image in turn
    • The start and end positions of an edge are defined as the positions of the local extrema closest to the edge
    • The edge width is the difference between the end and start positions, and this width is taken as the local blur measure for that edge
    • The blur measure of the whole image is the average of the local blur measures

[17] P. Marziliano, F. Dufaux, S. Winkler, and T. Ebrahimi. A no-reference perceptual blur metric. In Proceedings of the International Conference on Image Processing, vol. 3, 2002.

http://stefan.winklerbros.net/Publications/icip2002.pdf

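[Flowchart: blur measurement procedure]

1. Find the vertical edges.
2. Initialise: number of edges = 0, total blur measurement = 0.
3. Is there a vertical edge at the current position?
   – Yes: find the start and end positions of the edge (local extrema), compute the local blur (edge width), then set total blur = total blur + edge width and number of edges = number of edges + 1.
   – No: continue with step 4.
4. Is this the last pixel?
   – No: move to the next pixel and return to step 3.
   – Yes: blur measure = total blur / number of edges.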
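The flow above can be approximated in plain Python as follows. This is a simplified sketch rather than the metric of [17] as published: every pixel whose horizontal Sobel response exceeds a hypothetical `edge_threshold` counts as an edge location (no edge thinning), and the local extrema are found by walking along the row while the intensity keeps rising or falling.

```python
def sobel_x(gray, r, c):
    """Horizontal-gradient Sobel response, which highlights vertical edges."""
    return (
        (gray[r - 1][c + 1] + 2 * gray[r][c + 1] + gray[r + 1][c + 1])
        - (gray[r - 1][c - 1] + 2 * gray[r][c - 1] + gray[r + 1][c - 1])
    )


def edge_width(row, c, rising):
    """Distance between the local extrema on either side of column c."""
    sign = 1 if rising else -1
    left = c
    while left > 0 and sign * (row[left] - row[left - 1]) > 0:
        left -= 1
    right = c
    while right < len(row) - 1 and sign * (row[right + 1] - row[right]) > 0:
        right += 1
    return right - left


def blur_measure(gray, edge_threshold=100):
    """Average edge width over all detected edge pixels (0.0 if none found)."""
    total_blur = 0
    edge_count = 0
    for r in range(1, len(gray) - 1):
        for c in range(1, len(gray[0]) - 1):
            gradient = sobel_x(gray, r, c)
            if abs(gradient) < edge_threshold:
                continue  # no vertical edge at this position
            total_blur += edge_width(gray[r], c, rising=gradient > 0)
            edge_count += 1
    return total_blur / edge_count if edge_count else 0.0


sharp = [[0, 0, 0, 255, 255, 255]] * 3
blurred = [[0, 51, 102, 153, 204, 255]] * 3
print(blur_measure(sharp), blur_measure(blurred))  # 1.0 5.0
```

A wider edge gives a larger value, so a sharper image scores lower on this measure.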

• Selecting a quality approach
  – Comprehensive evaluation (mixing and matching the image quality measures)
    » Contrast, Colour Variance, Global Sharpness, Noise, Saliency, Accelerometer, Light Sensor
  – 8,248 images grouped into 101 events
  – Uses normalised low-level image features and/or sensor values
  – One annotator scored the results on a five-point Likert scale (*1)

(*1) Likert scale: 1. Strongly disagree 2. Disagree 3. Neither agree nor disagree 4. Agree 5. Strongly agree

3. Keyframe Approaches (cont.)

  – 11 quality approaches (Likert score in parentheses)
    1. Sensor Values (2.91)
       • Accelerometer Sensor, Light Sensor
    2. Basic Quality (2.21)
       • Blur, Noise, Colour Variance
    3. Weighted Approach 1 (2.72)
       • Blur (0.2), Noise (0.2), Colour Variance (0.6)
    4. All Quality Measures (3.67)
       • Blur, Noise, Colour Variance, Contrast, Salience
    5. All Quality & Sensor (3.40)
       • Accelerometer, Light, Blur, Noise, Colour Variance, Contrast, Salience

3. Keyframe Approaches (cont.)

    6. Combination Approach 1 (3.42)
       • Accelerometer Sensor, Noise, Colour Variance, Contrast, Salience
    7. Combination Approach 2 (3.72)
       • Blur, Noise, Colour Variance, Light Sensor, Salience
    8. Simple Approach 1 (3.72)
       • Contrast, Salience
    9. Simple Approach 2 (3.49)
       • Blur, Contrast, Salience
    10. Simple Approach 3 (3.67)
       • Blur, Colour Variance, Contrast, Salience
    11. Weighted Approach 2 (3.70)
       • Blur, Noise (0.25)
       • Contrast, Salience (0.75)
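To make the weighted variants concrete, the sketch below scores two images with the Weighted Approach 1 weights (Blur 0.2, Noise 0.2, Colour Variance 0.6). The per-image scores are invented. The sketch also assumes each measure has already been normalised to 0~1 and oriented so that a higher value means better quality.

```python
# Weights from Weighted Approach 1 in the list above.
WEIGHTS = {"blur": 0.2, "noise": 0.2, "colour_variance": 0.6}


def weighted_quality(scores, weights=WEIGHTS):
    """Weighted sum of already-normalised quality scores for one image."""
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical normalised scores (higher = better) for two images.
image_a = {"blur": 0.9, "noise": 0.8, "colour_variance": 0.2}
image_b = {"blur": 0.4, "noise": 0.5, "colour_variance": 0.7}
print(weighted_quality(image_a), weighted_quality(image_b))
```

With these made-up numbers image_b scores higher (about 0.60 against 0.46) because colour variance carries most of the weight.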

3. Keyframe Approaches (cont.)

  – Conclusions
    » Approaches 4, 7, 8, 10 and 11 give better results; 7 and 8 are the best
    » Replacing Blur with the Accelerometer sensor (4→6)
      • 3.67 vs. 3.42: worse
    » Replacing Contrast with the Light sensor (4→7)
      • 3.67 vs. 3.72: better
    » Approach 8 is faster to compute than approach 7

Figure 4: Comparison of performance as either better than, equal to or worse than the average performance of unique keyframes selected

3. Keyframe Approaches (cont.)

• Approaches for Investigation (keyframe selection)
  – Middle Image (Baseline)
    • Select middle image.
  – Within Event (stays within the event)
    • Select the image within the event that is closest to the average value of all images in the event.
  – Cross Event (looks beyond the event)
    • Select the image within the event that is closest to the average value of all the images in this event, but most different to the average value of all the images in the other events of that same day.
  – Image Quality
    • Select the image with the highest quality.

  – Within Event and Image Quality Fusion
    • Select the image that is most representative of the event, but which also has a good quality score.
  – Cross Event and Image Quality Fusion
    • Select the image that is most representative of the event, that also has a good image quality, and is finally distinguishable from the images in the other events.
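A rough sketch of the "Within Event and Image Quality Fusion" idea follows. It assumes representativeness is the inverted, min-max normalised distance to the event average and that the two scores are simply added. The slides do not give the exact fusion details, and the feature vectors and quality scores are made up.

```python
import math


def normalise(values):
    low, high = min(values), max(values)
    if high == low:
        return [0.0] * len(values)
    return [(v - low) / (high - low) for v in values]


def within_event_quality_keyframe(features, quality):
    """Index of the image maximising representativeness plus quality."""
    mean = [sum(column) / len(features) for column in zip(*features)]
    distance = [
        math.sqrt(sum((x - m) ** 2 for x, m in zip(f, mean))) for f in features
    ]
    # A smaller distance to the event average means more representative.
    representativeness = [1.0 - d for d in normalise(distance)]
    fused = [r + q for r, q in zip(representativeness, normalise(quality))]
    return fused.index(max(fused))


# Hypothetical event: image 2 is nearest the average, image 1 has the best quality.
features = [[0.0, 0.0], [1.0, 1.0], [1.5, 1.5], [4.0, 4.0]]
quality = [0.5, 0.9, 0.1, 0.6]
print(within_event_quality_keyframe(features, quality))
```

Here the fused choice is image 1, not image 2, which shows how a poor quality score can displace the most central image.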

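4. Experiment Overview

• Events are segmented with the segmentation method [5]
• Six keyframe selection methods
  – A single event may therefore yield up to six different keyframes
  – SenseCam users rate on a five-point Likert scale (1~5) how well the keyframe chosen by each of the six methods represents the event
  – If different methods select the same image, it only needs to be judged once
    • Consistency
    • Fewer judgements (5,597)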
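The "judge each image once" rule amounts to grouping the methods by the image they picked. In the sketch below the method names follow the six approaches in these slides, while the image ids and the helper name are invented.

```python
def unique_judgements(selections):
    """Group methods by the image they picked so each image is rated once.

    selections maps a method name to the image id it selected for one event.
    """
    by_image = {}
    for method, image_id in selections.items():
        by_image.setdefault(image_id, []).append(method)
    return by_image


# Hypothetical picks for a single event (image ids are made up).
event_picks = {
    "middle": "img_041",
    "within_event": "img_037",
    "cross_event": "img_037",
    "quality": "img_052",
    "within_event_quality": "img_037",
    "cross_event_quality": "img_052",
}
to_rate = unique_judgements(event_picks)
print(len(to_rate))  # 3 ratings instead of 6
```

One rating is then shared by every method that chose the same image, which keeps their scores consistent.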

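4. Experiment Overview (cont.)

  – Keyframe annotation tool
    • The annotation progress is shown at the top
    • The image to annotate is on the left; all images of that event are on the right
    • The five-point Likert scale rating controls are at the bottom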

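5. Results and Discussion

• Five users completed 13,410 keyframe judgement requests
• Image Quality combined with Within Event or Cross Event
  – The most effective approach (3.99)
  – 8.4% better than the baseline
  – The former (with Within Event) is more efficient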

5. Results and Discussion (cont.)

• Quality measures
  – Little effect for User 3
  – Very effective for Users 1 and 5
  – The performance of this approach varies, but it is effective when combined with other approaches

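• Overall daily average performance
  – "Within Event" or "Cross Event" combined with image quality wins for most events
    • In 80%, at least one approach is better than the baseline
    • The quality measure performs well overall, but that does not mean it performs well on every single day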

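• Difficulty in Selecting Correct Keyframe
  – An event may contain more than one activity that could serve as its keyframe, so choosing is hard
    • When annotating, users mark the interesting moments wherever possible
  – Events with a large amount of visual change
    • Example: an event in which the wearer walks from home to a nearby convenience store
      – It includes going downstairs, opening the door, walking along the road, approaching the store and arriving at the store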

• Selection of events with high visual change
  – MPEG-7 features are used to identify events whose images vary greatly

• Performance of approaches on events with high visual variability
  – Performance of the six approaches on events with high visual variability
    • Performance on high visual variability events is lower than on all events
    • The performance of image quality in this case

  – Although some approaches combine Quality with other methods, on high visual variability events Quality alone performs very well for Users 1 and 5

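6. Conclusions

• Selecting a suitable image as the keyframe is challenging
• As many as 40% of all images are of poor quality
• Traditional methods do not take image quality into account
• The only drawback is the higher computational cost
• Quality performs better than Middle (baseline) in 69.92% of cases
• 6.07% better than the overall average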
