
多媒體資料庫(New)3rd


Page 1: 多媒體資料庫(New)3rd

Multimedia Databases

Page 2: 多媒體資料庫(New)3rd

Outline
• Introduction
• Challenges of multimedia databases
• Multidimensional indexing techniques
• Document databases
• Image databases
• Audio databases
• Video databases

Page 3: 多媒體資料庫(New)3rd

Introduction
• Multimedia data vs. traditional databases
  – Data content
    • Traditional databases: data is stored as text, and an entity or object is typically described by several attributes
    • Multimedia databases: the media carry rich meaning whose content cannot be captured by a simple set of attributes
  – Data presentation
    • Traditional databases: text and tables
    • Multimedia databases: require richer visual and auditory presentation

Page 4: 多媒體資料庫(New)3rd

Introduction
• Example
  – Suppose pictures are stored and handled as in a traditional database
  • Queries that can be issued:
    – Find the pictures painted by XXX
    – Find the pictures painted by OOO between 1945 and 1955
  • Queries that cannot be handled:
    – Find pictures similar to this picture
    – Find pictures with a red car in the upper-left corner

Page 5: 多媒體資料庫(New)3rd

Introduction
• A multimedia database must provide
  – Efficient storage of multimedia data
  – Content-based queries (queries about the content of the media itself)
  – Diverse ways of presenting multimedia data

Page 6: 多媒體資料庫(New)3rd

Challenges of multimedia databases
• Handling large volumes of data
  – Multimedia data needs far more storage space than ordinary data
• Indexing multidimensional data
  – Fast search techniques
• Similarity computation
  – Fault-tolerant (approximate) queries
• Data presentation

Page 7: 多媒體資料庫(New)3rd

Multidimensional indexing techniques
• Returning query results quickly and correctly is a crucial problem
  – With large volumes of data, comparing records one by one takes too much time
  – Avoid record-by-record scanning
• Build an index on the data to speed up queries
• An index can be viewed as a classification guide: following its directions leads to the data relevant to a query.

Page 8: 多媒體資料庫(New)3rd

Multidimensional indexing techniques
• Common index structures in traditional databases
  – B+-tree
    • the most widely used index structure
  – Hashing
    • Static hashing
    • Dynamic hashing
  – Grid file
  – Bitmap index

Page 9: 多媒體資料庫(New)3rd

B+-tree overview
• A B+-tree is a tree structure satisfying the following properties:
  – It is balanced: all paths from the root to a leaf have the same length
  – Every node other than the root and the leaves has between ⌈n/2⌉ and n children
  – A leaf node holds between ⌈(n−1)/2⌉ and n−1 values

Page 10: 多媒體資料庫(New)3rd

B+-tree node structure

– Ki is a search-key value
– Pi is a pointer to a child node (for non-leaf nodes) or a pointer to a record (for leaf nodes)
• The search keys within a node are sorted: K1 < K2 < K3 < ... < Kn

Page 11: 多媒體資料庫(New)3rd

Leaf node structure

Properties of a leaf node:
• For i = 1, 2, ..., n−1, pointer Pi either points to a record with search-key value Ki, or to a bucket that contains only records with search-key value Ki
• Pn points to the next leaf node

Page 12: 多媒體資料庫(New)3rd

Non-leaf node structure
• All search keys in the subtree pointed to by Pi are less than Ki
• All search keys in the subtree pointed to by Pi are greater than or equal to Ki−1

Page 13: 多媒體資料庫(New)3rd

Example

Page 14: 多媒體資料庫(New)3rd

Discussion
• The B+-tree is very efficient for searching traditional tabular data and is widely used
• However
  – the B+-tree is a one-dimensional index structure
  – multimedia data is inherently multidimensional

Page 15: 多媒體資料庫(New)3rd

• Features of multimedia data
  – Documents
    • content
    • keywords
  – Images
    • dominant colors, contained objects, object sizes, color distribution, texture features, ...

Page 16: 多媒體資料庫(New)3rd

  – Music
    • tempo, chords, pitch, ...
  – Video
    • object trajectories, contained objects, colors, ...
• Content can serve as the query condition
  – Find the images that look like a given image
  – Find the songs containing a melody similar to a given fragment
  – Find the video clips of a motorcycle flying over a train

Page 17: 多媒體資料庫(New)3rd

• A multimedia object is described by multiple features, so it can be represented as multidimensional data
• However, B-trees and B+-trees
  – can index only one-dimensional data
  – are unsuitable for multidimensional data
• Building indexes on multidimensional data to speed up queries is therefore essential for multimedia search.

Page 18: 多媒體資料庫(New)3rd

Multidimensional index structures
• k-d tree
  – Stores k-dimensional data
  – Compares only one dimension of the data at each level
  – For the dimension compared at the level of a node N, every point in N's left subtree has a value in that dimension smaller than N's, and every point in the right subtree has a value greater than or equal to N's
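The level-by-level comparison rule above can be sketched in Python; the `Node` class and the depth-based choice of comparison dimension are illustrative assumptions, not code from the slides:

```python
# Minimal k-d tree sketch: at depth d, compare on dimension d mod k.
# Points smaller in that dimension go left; points >= go right.
class Node:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None

def insert(root, point, depth=0):
    """Insert a k-dimensional point, comparing one dimension per level."""
    if root is None:
        return Node(point)
    axis = depth % len(point)
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root

# Build the tree of the in-class exercise (k = 3).
points = [(30, 24, 58), (46, 78, 33), (20, 33, 15),
          (58, 40, 50), (40, 88, 56), (38, 54, 44)]
root = None
for p in points:
    root = insert(root, p)
```

Note that the shape of the resulting tree depends on the insertion order, which is exactly the skewing weakness discussed below.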

Page 19: 多媒體資料庫(New)3rd

• Example

Page 20: 多媒體資料庫(New)3rd
Page 21: 多媒體資料庫(New)3rd

– In-class exercise
  • Consider a k-d tree with k > 2
    – Try it yourself
    – Insert A(30,24,58), B(46,78,33), C(20,33,15), D(58,40,50), E(40,88,56), F(38,54,44) into a k-d tree
    – Use the k-d tree you built to find all points within distance 15 of X(34,50,46)
• How should deletion be handled?

Page 22: 多媒體資料庫(New)3rd

– Advantages
  • Simple
– Disadvantages
  • The height of the tree depends on the order in which data is inserted
  • It can easily degenerate into a skewed tree, making search very inefficient
  • Deletion is relatively complicated

Page 23: 多媒體資料庫(New)3rd

Multidimensional index structures
• MX-quadtree
  – The shape of the tree is independent of the number of points inserted and their order.
  – The designer must choose a value k in advance; once chosen, it cannot be changed.
  – The whole map is divided into 2^k × 2^k cells
  – Insertion and deletion are very simple

Page 24: 多媒體資料庫(New)3rd

• Example
  – Assume k = 2
  – The map is divided into 2^2 × 2^2 cells
  – Insert the four points A, B, C, D into the MX-quadtree

Page 25: 多媒體資料庫(New)3rd
Page 26: 多媒體資料庫(New)3rd

Multidimensional index structures
• R-tree
  – A balanced tree
  – Very useful for storing large amounts of data
  – Can greatly reduce disk accesses
  – An R-tree node has k pointers
  – Every node except the root and the leaves must contain between ⌈k/2⌉ and k non-empty pointers
    • this bounds the number of disk accesses

Page 27: 多媒體資料庫(New)3rd

– Leaf nodes contain the actual data
– Internal nodes contain the outlines of groups of data, represented as rectangles
  • given by the upper-left and lower-right corners
  • can be multidimensional
– Insertion and deletion involve node splitting and merging and are relatively complex.

Page 28: 多媒體資料庫(New)3rd

– Example
  • Eight objects in total
  • Two-dimensional space
  • Assume k = 3

Page 29: 多媒體資料庫(New)3rd

Insert p14

[figure: R-tree before the insertion, with root entries R1 and R2, internal rectangles R3, R4, R5 under R1 and R6, R7 under R2, and leaf-level pointers to data tuples p1 through p14]

Page 30: 多媒體資料庫(New)3rd

Inserting p14

[figure: the R-tree after p14 is added to the appropriate leaf rectangle]

Page 31: 多媒體資料庫(New)3rd

– Delete p2

[figure: the R-tree containing p1 through p14 before the deletion]

Page 32: 多媒體資料庫(New)3rd

• Find the MBR containing p2

[figure: the minimum bounding rectangle R3 containing p2 is located]

Page 33: 多媒體資料庫(New)3rd

– R3 no longer satisfies the R-tree definition (underflow)

[figure: after removing p2, rectangle R3 has too few entries]

Page 34: 多媒體資料庫(New)3rd

– Merge with a neighboring bounding rectangle

[figure: the underflowed rectangle R3 is merged with its neighbor R4]

Page 35: 多媒體資料庫(New)3rd

• Reorganize R3 and R4, updating each one's upper-left and lower-right corner values

[figure: the R-tree after R3 and R4 have been reorganized]

Page 36: 多媒體資料庫(New)3rd

Another simple multidimensional index that combines one-dimensional indexes

• The features of a multimedia object are multidimensional
  – Suppose an image is characterized by its average R (red), G (green), and B (blue) values
  – The feature space is then three-dimensional
  – Example

Page 37: 多媒體資料庫(New)3rd

• Assume the database contains ten pictures

[figure: pictures P1 through P5]

Page 38: 多媒體資料庫(New)3rd

[figure: pictures P6 through P10]

Page 39: 多媒體資料庫(New)3rd

• Consider the following three queries

[figure: query images Q1, Q2, Q3]

Page 40: 多媒體資料庫(New)3rd

• We obtain the following average R, G, B values

Picture ID (R, G, B) Picture ID (R, G, B)

P1 (0.102, 0.101, 0.086) P8 (0.318, 0.365, 0.561)

P2 (0.275, 0.251, 0.161) P9 (0.361, 0.302, 0.184)

P3 (0.627, 0.447, 0.302) P10 (0.451, 0.396, 0.400)

P4 (0.145, 0.153, 0.227) Q1 (0.478, 0.541, 0.753)

P5 (0.141, 0.137, 0.184) Q2 (0.302, 0.310, 0.416)

P6 (0.212, 0.200, 0.231) Q3 (0.302, 0.223, 0.161)

P7 (0.180, 0.180, 0.102)

Page 41: 多媒體資料庫(New)3rd

• Case 1: find the pictures within similarity distance 0.15 of query picture Q1
  – Q1 = (0.478, 0.541, 0.753), r = 0.15
  – In each dimension, find the database value nearest to the query:
    • (0.451, 0.447, 0.561)
    • (|0.478−0.451|, |0.541−0.447|, |0.753−0.561|) => (0.027, 0.094, 0.192)
  – Since 0.192 > 0.15, no data satisfies the query.

Page 42: 多媒體資料庫(New)3rd

• Case 2: find the pictures within distance 0.02 of query picture Q2
  – Q2 = (0.302, 0.310, 0.416), r = 0.02
  – Step 1: in each dimension, find the database value nearest to the query:
    • (0.318, 0.302, 0.400)
    • (|0.302−0.318|, |0.310−0.302|, |0.416−0.400|) => (0.016, 0.008, 0.016)
  – All of these are smaller than 0.02, so step 2 is required.

Page 43: 多媒體資料庫(New)3rd

– Step 2: check dimension by dimension, subtracting the error budget already consumed:
  • budget available in dimension 1: 0.02
  • budget available in dimension 2: (0.02² − 0.016²)^1/2 ≈ 0.012
  • budget available in dimension 3: (0.02² − 0.016² − 0.008²)^1/2 ≈ 0.009
  – This is smaller than the known minimum difference 0.016 in that dimension, so no data in the database satisfies the query.

Page 44: 多媒體資料庫(New)3rd

• Case 3: find the pictures within distance 0.05 of query picture Q3
  – Q3 = (0.302, 0.223, 0.161), r = 0.05
  – (|0.302−0.318|, |0.223−0.200|, |0.161−0.161|) => (0.016, 0.023, 0), so G > R > B
  – Step 1 cannot conclude that no matching data exists in the database

Page 45: 多媒體資料庫(New)3rd

– Search the index on G for the pictures in the range 0.223 ± 0.05 = [0.173, 0.273]
  • {P7, P6, P2}
– Search the index on R for the pictures in the range 0.302 ± 0.044 = [0.258, 0.346]
  • {P2, P8}
– Search the index on B for the pictures in the range 0.161 ± 0.041 = [0.120, 0.202]
  • {P2, P5, P9}
– Intersect the results: P2 satisfies the query
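The three cases can be reproduced with a small sketch. The picture table is the one from the slides; the function itself is only an illustration of the filter-and-intersect idea with a shrinking radius, not the authors' exact algorithm:

```python
# Per-dimension filtering: search each 1-D index with a shrinking radius
# (budget spent in earlier dimensions is subtracted), then intersect.
import math

pictures = {
    "P1": (0.102, 0.101, 0.086), "P2": (0.275, 0.251, 0.161),
    "P3": (0.627, 0.447, 0.302), "P4": (0.145, 0.153, 0.227),
    "P5": (0.141, 0.137, 0.184), "P6": (0.212, 0.200, 0.231),
    "P7": (0.180, 0.180, 0.102), "P8": (0.318, 0.365, 0.561),
    "P9": (0.361, 0.302, 0.184), "P10": (0.451, 0.396, 0.400),
}

def range_query(q, r):
    """Return picture ids within Euclidean distance r of query point q."""
    n = len(q)
    # Per-dimension difference to the nearest database value.
    deltas = [min(abs(v[i] - q[i]) for v in pictures.values()) for i in range(n)]
    # Visit dimensions in decreasing order of that difference.
    order = sorted(range(n), key=lambda i: -deltas[i])
    candidates = set(pictures)
    radius = r
    for step, dim in enumerate(order):
        if step > 0:  # shrink the radius by the budget already consumed
            radius = math.sqrt(max(radius**2 - deltas[order[step - 1]]**2, 0.0))
        candidates &= {p for p, v in pictures.items()
                       if abs(v[dim] - q[dim]) <= radius}
    # Final exact check on the surviving candidates.
    return {p for p in candidates if math.dist(pictures[p], q) <= r}
```

Running it on Q3 = (0.302, 0.223, 0.161) with r = 0.05 leaves only P2, and on Q1 with r = 0.15 it returns nothing, matching Cases 3 and 1.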

Page 46: 多媒體資料庫(New)3rd

Algorithms

Algorithm 1 Build_Multi_Index
/* input: a set of k points Pi in n-dimensional feature space */
/* assume Pi = (x1,i, x2,i, ..., xn,i) */
/* output: a multi-index MI = {I1, I2, ..., In} */
0: begin
1: for i = 1 to n
2: begin
3:   sort {xi,1, xi,2, ..., xi,k} and store the result in Ii
4: end
5: return {I1, I2, ..., In}
6: end

Page 47: 多媒體資料庫(New)3rd

Algorithm 2 Find_Difference_Vector
/* input: a query point Q = (q1, q2, ..., qn) in n-dimensional feature space */
/* a query radius r */
/* a multi-index set MI = {I1, I2, ..., In} */
/* output: a difference vector D = (d1, d2, ..., dn) */
0: begin
1: for i = 1 to n
2: begin
3:   search for the value nearest to qi in Ii
4:   let xi be that nearest value
5:   di = |qi − xi|
6:   if di > r
7:     return null
8:   endif
9: end
10: return D = (d1, d2, ..., dn)
11: end

Page 48: 多媒體資料庫(New)3rd

Algorithm 3 Candidate_Point_Retrieval
/* input: a query point Q = (q1, q2, ..., qn) in n-dimensional feature space */
/* a query radius r */
/* a multi-index set MI = {I1, I2, ..., In} */
/* a difference vector D = (d1, d2, ..., dn) */
/* output: the candidate point set in each dimension */
0: begin
1: sort D = (d1, d2, ..., dn) in decreasing order into D' = (d1', d2', ..., dn'),
2:   where di' = d_oi for 1 <= i <= n, and oi denotes the subscript in D of di'
3: let R = (r1, r2, ..., rn)
4: let r1 = r

Page 49: 多媒體資料庫(New)3rd

0: for i = 2 to n
1: begin
2:   ri = (ri−1² − d'i−1²)^1/2
3:   if ri² <= 0
4:     return null
5:   endif
6: end
7: for i = 1 to n
8:   search the range [q_oi − ri, q_oi + ri] in I_oi and store the candidate points in Si
9:   if NumOf(Si) = 0
10:    return null
11:  endif
12: return S1, S2, ..., Sn
13: end

Page 50: 多媒體資料庫(New)3rd

Algorithm 4 Candidate_Point_Set_Merging
/* input: the candidate point sets S1, S2, ..., Sn */
/* the searching order list O1, O2, ..., On generated by Algorithm 3 */
/* a query point Q = (q1, q2, ..., qn) in n-dimensional feature space */
/* a query radius r */
/* output: the result feature points and their similarity measures */
0: begin
1: for i = 1 to k   /* initialize two tables: Counter and Distance */
2: begin
3:   Counter[i] = 0    /* stores the number of dimensional conditions satisfied by Pi */
4:   Distance[i] = 0   /* stores the partially computed distance between Pi and Q */
5: end
6: for each candidate point Pi in S1
7: begin
8:   Counter[i] = 1
9:   Distance[i] = |q_O1 − x_O1,i|   /* q_O1 is the value of Q in the O1-th dimension */
10: end                              /* x_O1,i is the value of Pi in the O1-th dimension */

Page 51: 多媒體資料庫(New)3rd

0: for j = 2 to n
1: begin
2: for each candidate point Pi in Sj
3: begin
4:   if Counter[i] = j−1
5:     Distance[i] = (Distance[i]² + (q_Oj − x_Oj,i)²)^1/2
6:     if Distance[i] <= r
7:       Counter[i] = Counter[i] + 1   /* Pi satisfies the query condition in */
8:     endif                           /* the j-th dimension */
9:   endif
10: end
11: end
12: for each Pi with Counter[i] = n
13: begin
14:   output(Pi, Distance[i])
15: end
16: end

Page 52: 多媒體資料庫(New)3rd

Document databases
• Introduction
  – Analyzing document content
    • Synonymy
    • Polysemy
  – Evaluating search results
    • Precision
      – the probability that a retrieved document is relevant
    • Recall
      – the probability that a relevant document is retrieved

Page 53: 多媒體資料庫(New)3rd

Document databases
• Precision = card(retrieved results ∩ relevant documents) / card(retrieved results) × 100%
• Recall = card(retrieved results ∩ relevant documents) / card(relevant documents) × 100%

[figure: Venn diagram of all documents, the relevant documents, and the retrieved results]

Page 54: 多媒體資料庫(New)3rd

Document databases
• Example of computing precision/recall
  – Discuss the relationship between precision and recall

[figure: Venn diagram of all documents, the relevant documents, and the retrieved results, annotated with the counts 50, 15, 0, and 20]
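Under one reading of the counts in the figure (assumed here: 50 documents retrieved, 15 of them relevant, and 20 relevant documents not retrieved), the two measures work out as:

```python
# Precision/recall for an assumed reading of the Venn diagram above:
# 50 retrieved, 15 relevant among them, 20 relevant documents missed.
retrieved = 50
relevant_and_retrieved = 15
relevant_missed = 20

precision = relevant_and_retrieved / retrieved  # 15/50 = 0.3
recall = relevant_and_retrieved / (relevant_and_retrieved + relevant_missed)
```

Enlarging the answer set tends to raise recall while lowering precision, which is the trade-off the slide asks about.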

Page 55: 多媒體資料庫(New)3rd

Document databases
• Describing document content
  – Stop lists
    • words in a document that can be ignored, e.g., a, the, he, ...
  – Word stems
    • the different tenses, singular/plural forms, etc. of the same word
  – Frequency tables

Term/doc d1 d2 d3 d4

sex 1 0 0 0

drug 1 0 1 3

videotape 1 0 0 3

connection 0 0 0 2

slip 0 2 2 0

boat 0 1 0 0

Page 56: 多媒體資料庫(New)3rd

Document databases
• Query processing
  – Computing the relevance of a document
    • Term distance: the Euclidean distance ( Σ k=1..M (vec(i)k − vec(j)k)² )^1/2
    • Cosine distance: vec(i) · vec(j) / (|vec(i)| |vec(j)|)
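The cosine measure can be written directly from the formula; the vectors below are columns d1 and d3 of the frequency table above:

```python
# Cosine similarity between term-frequency vectors.
import math

def cosine(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm

d1 = [1, 1, 1, 0, 0, 0]  # sex, drug, videotape, connection, slip, boat
d3 = [0, 1, 0, 0, 2, 0]
sim = cosine(d1, d3)  # the shared term "drug" gives a small positive similarity
```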

Page 57: 多媒體資料庫(New)3rd

Document databases
– Query types
  • Find the documents containing certain terms
  • Find the documents containing certain terms but not certain others
  • Find the document closest to the query vector
  • Find the top k documents closest to the query vector
  • Find the documents within a given distance of the query vector

Page 58: 多媒體資料庫(New)3rd

Document databases
– Using indexes
  • R-tree
    – unsuitable as a high-dimensional index structure
  • TV-tree
    – similar to the R-tree, but each node considers only a subset of the dimensions
  • Inverted list
  • Signature files

Page 59: 多媒體資料庫(New)3rd

Document databases
• Inverted list
  – An inverted table organized by term
  – Using the frequency table above as an example:
    • sex: d1
    • drug: d1, d3, d4
    • videotape: d1, d4
  – Search examples and query types
    • and, or, not
    • cannot handle similarity queries
  – Drawback
    • large size
      – compression techniques help
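The inverted list above and its boolean queries amount to a dictionary of sets; a minimal sketch:

```python
# Inverted list built from the frequency table: each term maps to the set of
# documents containing it; and/or/not become set operations.
freq = {
    "sex":        {"d1": 1},
    "drug":       {"d1": 1, "d3": 1, "d4": 3},
    "videotape":  {"d1": 1, "d4": 3},
    "connection": {"d4": 2},
    "slip":       {"d2": 2, "d3": 2},
    "boat":       {"d2": 1},
}
inverted = {term: set(docs) for term, docs in freq.items()}

both = inverted["drug"] & inverted["videotape"]     # AND
either = inverted["sex"] | inverted["boat"]         # OR
without = inverted["drug"] - inverted["videotape"]  # AND NOT
```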

Page 60: 多媒體資料庫(New)3rd

Document databases
• Signature files
  – Each keyword has a corresponding code
  – The signature of a document is obtained by superimposing the codes of the keywords it contains
  – Search example
  – Query types that can be handled
    • And, Or
    • Not?
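Superimposing keyword codes is a bitwise OR; the 4-bit codes below are made up for illustration. The membership test also hints at why Not is problematic: superimposition can produce false positives, so a set bit never proves a term is absent.

```python
# Signature file sketch: each keyword has a (hypothetical) bit code; a document
# signature is the bitwise OR (superimposition) of its keywords' codes.
codes = {"sex": 0b0001, "drug": 0b0010, "videotape": 0b0100, "boat": 0b1000}

def signature(keywords):
    sig = 0
    for kw in keywords:
        sig |= codes[kw]
    return sig

def may_contain(doc_sig, query_keywords):
    """True if every bit of the query signature is set in the document
    signature; false positives are possible, false negatives are not."""
    q = signature(query_keywords)
    return doc_sig & q == q

doc_sig = signature(["drug", "videotape"])
```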

Page 61: 多媒體資料庫(New)3rd

Document databases
• Discussion
  – R-trees and TV-trees can handle similarity queries
  – Inverted indices and signature files cannot handle similarity queries; they can only handle queries for documents containing certain terms
  – Signature files are not suited to queries that exclude certain term(s)
    » give an example
  – R-trees are unsuitable for high-dimensional data

Page 62: 多媒體資料庫(New)3rd

Image databases
• Query examples
  – Example 1: find the pictures that look like this picture
  – Example 2: find all pictures with a red square in the upper-left corner and blue at the bottom
• Information that can represent an image
  – Information unrelated to the image content
    • author
    • completion time
    • place
    • etc.

Page 63: 多媒體資料庫(New)3rd

– Features related to the image content
  • Color distribution
    – can be represented by a color histogram
  • Texture
  • Contained objects
    – shape
    – color
    – size
    – position
  • Dominant colors
  • etc.

Page 64: 多媒體資料庫(New)3rd

Image database search
• Query by keyword: describe each image with textual attributes; an index can be built on each attribute, and queries can be issued in SQL
• Query by example (QBE): the user shows the system an example image, and the system determines the answers from each database image's similarity to the example
• Query types
  • Find the image closest to the query example
  • Find the top k images closest to the query example
  • Find the images within a given distance of the query example

Page 65: 多媒體資料庫(New)3rd

Image distance and similarity

1. Color Similarity

2. Texture Similarity

3. Shape Similarity

4. Object & Relationship similarity

Page 66: 多媒體資料庫(New)3rd

Color similarity
• Color proportions
  • e.g., R: 20%, G: 50%, B: 30%
• Color histogram

Dhist(I, Q) = (h(I) − h(Q))^T A (h(I) − h(Q))

A is a similarity matrix; colors that are very similar should have similarity values close to one.
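The quadratic-form distance above can be written out with plain lists; the 3-bin histograms and the identity choice of A are toy assumptions:

```python
# Dhist(I, Q) = z^T A z with z = h(I) - h(Q).
def histogram_distance(h_i, h_q, A):
    z = [a - b for a, b in zip(h_i, h_q)]
    n = len(z)
    return sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))

# With A = identity this reduces to squared Euclidean distance; off-diagonal
# entries of A would give credit to pairs of visually similar colors.
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
d = histogram_distance([0.2, 0.5, 0.3], [0.3, 0.5, 0.2], A)
```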

Page 67: 多媒體資料庫(New)3rd

• Color layout matching: compares each grid square of the query to the corresponding grid square of a potential matching image and combines the results into a single image distance:

d_gridded_color(I, Q) = Σ g d̂_color(C_I(g), C_Q(g))

where C_I(g) represents the color in grid square g of a database image I and C_Q(g) represents the color in the corresponding grid square g of the query image Q. Some suitable representations of color are:

1. Mean
2. Mean and standard deviation
3. Multi-bin histogram

Page 68: 多媒體資料庫(New)3rd

Texture similarity

• Pick and click: suppose T(I) is a texture description vector, a vector of numbers that summarizes the texture in a given image I (for example, Laws texture energy measures); the texture distance measure is then

d_pick_and_click(I, Q) = Min i∈I ||T(i) − T(Q)||²

• Texture layout:

d_gridded_texture(I, Q) = Σ g d̂_texture(T_I(g), T_Q(g))

Page 69: 多媒體資料庫(New)3rd

Shape similarity

1. Shape Histogram

2. Boundary Matching

3. Sketch Matching

Page 70: 多媒體資料庫(New)3rd

Image-A, Image-B, Image-C

• Judging by content, are the three images similar?
  – color information
  – position information

Implementation example

Page 71: 多媒體資料庫(New)3rd

Color models: RGB color space vs. HSV color space

[figure: RGB cube with corners Black=(0,0,0), Red=(1,0,0), Green=(0,1,0), Blue=(0,0,1), Yellow=(1,1,0), Cyan=(0,1,1), Magenta=(1,0,1), White=(1,1,1); HSV cone with hue H (Red at 0°, Green at 120°, Blue at 240°), saturation S, and value V]

Page 72: 多媒體資料庫(New)3rd

• Obtaining color and position information
  – Divide the image into a grid of cells
  – Find the representative color of each cell
  – Merge adjacent cells with the same color into larger blocks
  – The colors, positions, and shapes of the final large blocks are the key information used in later matching.

Page 73: 多媒體資料庫(New)3rd
Page 74: 多媒體資料庫(New)3rd

Similarity comparison
• Which factors might a user consider when deciding whether two images are similar?
  • color layout
  • color distribution
  • object position
  • object size
  • object shape

Page 75: 多媒體資料庫(New)3rd

Shape ratio:
  SimRatio(R1, R2) = 1.0 − |Ratio(R1) − Ratio(R2)| / Max(Ratio(R1), Ratio(R2))

Size:
  SimSize(R1, R2) = 1.0 − |Size(R1) − Size(R2)| / Max(Size(R1), Size(R2))

Similarity function:
  Sim_region(R1, R2) = W_ratio · SimRatio(R1, R2) + W_size · SimSize(R1, R2) + W_color · SimColor(R1, R2)
Page 76: 多媒體資料庫(New)3rd

SIZE_query = Σ i=1..n Size(Q(Ri))

Sim_image(Q, D) = Σ i=1..n ( Size(Q(Ri)) / SIZE_query ) · Max j=1..m Sim_region(Q(Ri), D(Rj))
Page 77: 多媒體資料庫(New)3rd

Example

Query image regions: A (size 13), B (size 10), C (size 5)

     D      E      F      G
A   0.92    –     0.81    –
B    –     0.65   0.59    –
C    –     0.61    –      –

SIZE = 13 + 10 + 5 = 28

Sim_image = (13/28) · 0.92 + (10/28) · 0.59 + (5/28) · 0.61 = 0.747

Page 78: 多媒體資料庫(New)3rd

Music databases

• Listen to the following pieces or fragments of music. Do you know the names of the songs? Music 1 2 3 4 5 6 7 8 9 10
• How did you recognize them? If we want a computer to do the same for us, how should it be designed?

Page 79: 多媒體資料庫(New)3rd

Musical features
• Static music information
  e.g., key signature, time signature
• Acoustical features
  e.g., loudness, pitch
• Thematic features
  e.g., melodies, rhythms, and chords, such as "sol-sol-sol-mi", "0.5-0.5-0.5-2", and "C-Am-Dm-G7"
• Structural features
  the two basic rules of classical music form: the hierarchical rule and the repetition rule

Page 80: 多媒體資料庫(New)3rd

Feature extraction
• Relative vs. absolute pitch (melody transposition)
  – Consider the problems caused by matching on absolute pitch
    • raising or lowering the key breaks the match
  – Beat extraction has the same problem
• Extract patterns from complete passages
• Sampling issues with multiple tracks

Page 81: 多媒體資料庫(New)3rd

Feature encoding
• After the features are extracted, encode them appropriately
  – robust to raising and lowering of pitch
  – robust to changes in tempo
  – music that sounds similar should be encoded into codes that are close to each other

Page 82: 多媒體資料庫(New)3rd

Example
• Represent a song by the important tones that recur in it
  – Hierarchical rule:
    music object → movements → sentences → phrases → figures
  – Repetition rule:
    e.g., "C6-Ab5-Ab5-C6" and "F6-C6-C6-Eb6"

Page 83: 多媒體資料庫(New)3rd

Repeating patterns: definition
• For a substring X of a sequence of notes S, if X appears more than once in S, we call X a repeating pattern of S. The repeating frequency of the repeating pattern X, denoted freq(X), is the number of appearances of X in S. The length of the repeating pattern X, denoted |X|, is the number of notes in X.

Page 84: 多媒體資料庫(New)3rd

Repeating patterns: example
• "C-D-E-F-C-D-E-C-D-E-F"

RP: repeating pattern
RPF: repeating pattern frequency

RP   C-D-E-F  C-D-E  D-E-F  C-D  D-E
RPF     2       3      2     3    3

RP     E-F      C      D     E    F
RPF     2       3      3     3    2
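The table can be checked with a brute-force sketch (an illustration only, not the correlative-matrix or string-join algorithms described later):

```python
# Enumerate all repeating patterns of a note sequence and filter the
# nontrivial ones (no longer repeating pattern with the same frequency
# contains them as a substring).
def repeating_patterns(notes):
    freq = {}
    n = len(notes)
    for i in range(n):
        for j in range(i + 1, n + 1):
            sub = tuple(notes[i:j])
            freq[sub] = freq.get(sub, 0) + 1
    return {p: f for p, f in freq.items() if f > 1}

def contains(whole, part):
    k = len(part)
    return any(whole[i:i + k] == part for i in range(len(whole) - k + 1))

def nontrivial(freq):
    return {p: f for p, f in freq.items()
            if not any(f == f2 and len(q) > len(p) and contains(q, p)
                       for q, f2 in freq.items())}

song = "C-D-E-F-C-D-E-C-D-E-F".split("-")
rps = repeating_patterns(song)
```

On this melody the sketch finds the 10 repeating patterns of the table, of which only "C-D-E-F" and "C-D-E" survive the nontrivial filter.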

Page 85: 多媒體資料庫(New)3rd

Repeating patterns
• Definition of nontrivial:
  A repeating pattern X is nontrivial if and only if there does not exist another repeating pattern Y such that freq(X) = freq(Y) and X is a substring of Y.
• Example
  Of the 10 RPs on the previous page, only "C-D-E-F" and "C-D-E" are nontrivial

Page 86: 多媒體資料庫(New)3rd

The Correlative Matrix (1)

• Phrase
• Melody string S = "C6-Ab5-Ab5-C6-C6-Ab5-Ab5-C6-Db5-C6-Bb5-C6"
• Repeating patterns

RPF  PL (pattern length)  RP
 2           4            C6-Ab5-Ab5-C6
 6           1            C6
 4           1            Ab5

Page 87: 多媒體資料庫(New)3rd

The Correlative-Matrix(2)

C6

Ab5 Ab5 C6 C6 Ab5 Ab5 C6 Db5 C6 Bb5 C6

C6 -- 1 1 1 1 1

Ab5 -- 1 2 1

Ab5 -- 1 3

C6 -- 1 4 1 1

C6 -- 1 1 1

Ab5 -- 1

Ab5 --

C6 -- 1 1

Db5 --

C6 -- 1

Bb5 --

C6 --

Construction of correlative matrix T12,12

Page 88: 多媒體資料庫(New)3rd

The Correlative Matrix (3)

• Find all RPs and their RFs.
  – Define a candidate set CS whose entries have the form (pattern, rep_count, sub_count)
  – CS starts empty; RPs are then computed from T and inserted into CS
  – The conditions combine (Ti,j = 1) or (Ti,j > 1) with (Ti+1,j+1 = 0) or (Ti+1,j+1 <> 0), so there are four cases

Page 89: 多媒體資料庫(New)3rd

The Correlative Matrix (4)
– Case 1: (Ti,j = 1) and (Ti+1,j+1 = 0)
  e.g., T1,4 = 1, T2,5 = 0: insert ("C6", 1, 0) into CS
– Case 2: (Ti,j = 1) and (Ti+1,j+1 <> 0)
  e.g., T1,5 = 1, T2,6 = 2: modify ("C6", 1, 0) into ("C6", 2, 1)
– Case 3: (Ti,j > 1) and (Ti+1,j+1 <> 0)
  e.g., T2,6 = 2, T3,7 = 3: insert ("C6-Ab5", 1, 1) and ("Ab5", 1, 1) into CS
– Case 4: (Ti,j > 1) and (Ti+1,j+1 = 0)
  e.g., T4,8 = 4, T5,9 = 0: insert ("C6-Ab5-Ab5-C6", 1, 0), ("Ab5-Ab5-C6", 1, 1), and ("Ab5-C6", 1, 1) into CS, and change ("C6", 6, 1) into ("C6", 7, 2)

Page 90: 多媒體資料庫(New)3rd

The Correlative Matrix (5)

– Computing the RF:
  rep_count = 0.5 · f · (f − 1), so f = (1 + SQRT(1 + 8 · rep_count)) / 2
  For example, ("C6", 15, 1) means the rep_count of C6 is 15, so f = (1 + SQRT(1 + 8×15)) / 2 = 6.
  Similarly, the RF of "Ab5" is 4, and the RF of "C6-Ab5-Ab5-C6" is 2.

Page 91: 多媒體資料庫(New)3rd

The String-Join Approach (1)

• Melody string "C-D-E-F-C-D-E-C-D-E-F"
• Step 1: find all RPs of length 1, recorded as {X, freq(X), (position1, position2, ...)}.
  In this example we find {"C", 3, (1,5,8)}, {"D", 3, (2,6,9)}, {"E", 3, (3,7,10)}, and {"F", 2, (4,11)}

Page 92: 多媒體資料庫(New)3rd

The String-Join Approach (2)

• RPs of length 2 can then be obtained from the RPs above by joining (denoted "∞").
  For example, to find "C-D": from {"C", 3, (1,5,8)} and {"D", 3, (2,6,9)} we can conclude that "C-D" also appears at (1,5,8), written
  {"C", 3, (1,5,8)} ∞ {"D", 3, (2,6,9)} = {"C-D", 3, (1,5,8)}

Page 93: 多媒體資料庫(New)3rd

The String-Join Approach (3)

• Similarly,
  {"D", 3, (2,6,9)} ∞ {"E", 3, (3,7,10)} = {"D-E", 3, (2,6,9)}
  {"E", 3, (3,7,10)} ∞ {"F", 2, (4,11)} = {"E-F", 2, (3,10)}
• Patterns of length 4 can be obtained by joining those of length 2, e.g.,
  {"C-D", 3, (1,5,8)} ∞ {"E-F", 2, (3,10)} = {"C-D-E-F", 2, (1,8)}

Page 94: 多媒體資料庫(New)3rd

The String-Join Approach (4)

• For length 3: since freq("C-D-E-F") = freq("E-F") = 2, not only "E-F" but also "D-E-F" is trivial (otherwise freq("E-F") would have to exceed 2). Moreover,
  {"C-D", 3, (1,5,8)} ∞ {"D-E", 3, (2,6,9)} = {"C-D-E", 3, (1,5,8)}
  and freq("C-D-E") > freq("C-D-E-F"), so "C-D-E" is nontrivial
• Finally, the nontrivial repeating patterns of this example are "C-D-E-F" and "C-D-E"

Page 95: 多媒體資料庫(New)3rd

Discussion
• Relative vs. absolute pitch (melody transposition)
• Extracting patterns from complete passages
• Conversion between different music formats
• Problem: important features that never repeat

Page 96: 多媒體資料庫(New)3rd

Video databases
• Content organization
  – Which parts of the content will users be interested in?
  – How should that content be stored so that query processing can be executed efficiently?
  – How should the query language be designed, and how does it differ from traditional SQL?
  – Can the content of a video be extracted automatically?

Page 97: 多媒體資料庫(New)3rd

Video content information
• Objects
  – Simple shape descriptions
    • can be automated
  – Meaningful object descriptions
    • people
      – leading actor, leading actress, ...
    • animals
      – pig, cat, dog, ...
    • inanimate objects
      – suitcase, key, ...
    • nearly impossible to automate

Page 98: 多媒體資料庫(New)3rd

• Activities
  – Simple descriptions
    • object trajectories
      – how to encode a trajectory into machine-comparable code is an important research topic
    • can be automated
  – Meaningful behavior descriptions
    • a car accident, man A handing a suitcase to woman B, ...
    • must be built on top of the simple descriptions
    • hard to automate fully

Page 99: 多媒體資料庫(New)3rd

Building video content information
• Two kinds of information
  – Static
    • treat a single frame as a picture
    • apply image-search techniques
  – Dynamic
    • treat consecutive frames as an action
    • object trajectories must be considered

Page 100: 多媒體資料庫(New)3rd

• Static information

Page 101: 多媒體資料庫(New)3rd

• Dynamic information

Page 102: 多媒體資料庫(New)3rd

Preface (cont'd)
• Motion trajectories

Page 103: 多媒體資料庫(New)3rd

• Video analysis
  – Shot
    • a video segment captured in a single continuous camera take
    • the basic unit from which a video is composed
    • the frames within a shot have similar content
      – a representative frame can be chosen from a shot to stand for the whole shot
  • Shot detection
    – detect shot boundaries from changes in the color distribution
      » automatable, but sudden color changes can produce false boundaries
    – current shot segmentation tools achieve a very high accuracy (>95%)

Page 104: 多媒體資料庫(New)3rd

– Scene
  • composed of multiple shots describing the same event
  • can serve as the unit of querying
– Object trajectories
  • the trajectory of a moving object can represent an event

Page 105: 多媒體資料庫(New)3rd

Concept-based queries
• Semantic queries can be issued
  – Find the pictures containing sky and sea
  – Find the videos with an airplane flying across the sky
• Use correlations among low-level feature values to discover the semantics of the media
  – Semi-automatic classification
  – Classification
  – Association pattern mining
• Concepts and semantic networks

Page 106: 多媒體資料庫(New)3rd

Classification

• Goal
  – predict the class of a data item
• Steps
  – Build a classification model
    • from a training set
  – Evaluate the accuracy of the model
    • on testing data
  – Predict classes for new data

Page 107: 多媒體資料庫(New)3rd

Training data:

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof    3    no
Mary  Assistant Prof    7    yes
Bill  Professor         2    yes
Jim   Associate Prof    7    yes
Dave  Assistant Prof    6    no
Anne  Associate Prof    3    no

The classification algorithm produces the classifier (model):

IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Page 108: 多媒體資料庫(New)3rd

The classifier is then applied to the testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof    2    no
Merlisa  Associate Prof    7    no
George   Professor         5    yes
Joseph   Assistant Prof    7    yes

Unseen data: (Jeff, Professor, 4). Tenured?

Page 109: 多媒體資料庫(New)3rd

Classification
• Algorithms

– Decision tree

– Bayesian Belief Networks

– k-nearest neighbor classifier

– case-based reasoning

– Genetic algorithm

– Rough set approach

– Fuzzy set approaches

– Neural Network

Page 110: 多媒體資料庫(New)3rd

Training data set:

age    income  student  credit_rating
<=30   high    no       fair
<=30   high    no       excellent
31…40  high    no       fair
>40    medium  no       fair
>40    low     yes      fair
>40    low     yes      excellent
31…40  low     yes      excellent
<=30   medium  no       fair
<=30   low     yes      fair
>40    medium  yes      fair
<=30   medium  yes      excellent
31…40  medium  no       excellent
31…40  high    yes      fair
>40    medium  no       excellent

Page 111: 多媒體資料庫(New)3rd

Decision tree

[figure: decision tree with root "age?"; branch <=30 leads to "student?" (no → no, yes → yes); branch 30..40 leads to yes; branch >40 leads to "credit rating?" (excellent → no, fair → yes)]

Page 112: 多媒體資料庫(New)3rd

Naïve Bayesian network: example

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

P(p) = 9/14
P(n) = 5/14

Page 113: 多媒體資料庫(New)3rd

outlook

P(sunny|p) = 2/9 P(sunny|n) = 3/5

P(overcast|p) = 4/9

P(overcast|n) = 0

P(rain|p) = 3/9 P(rain|n) = 2/5

temperature

P(hot|p) = 2/9 P(hot|n) = 2/5

P(mild|p) = 4/9 P(mild|n) = 2/5

P(cool|p) = 3/9 P(cool|n) = 1/5

humidity

P(high|p) = 3/9 P(high|n) = 4/5

P(normal|p) = 6/9 P(normal|n) = 2/5

windy

P(true|p) = 3/9 P(true|n) = 3/5

P(false|p) = 6/9 P(false|n) = 2/5

Page 114: 多媒體資料庫(New)3rd

Play-tennis example: classifying X• An unseen sample X = <rain, hot, high, false>

• P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9·2/9·3/9·6/9·9/14 = 0.010582

• P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5·2/5·4/5·2/5·5/14 = 0.018286

• Sample X is classified in class n
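The computation above can be checked with exact fractions:

```python
# Naive Bayes scores for X = <rain, hot, high, false>, using the class priors
# and conditional probabilities read off the tables above.
from fractions import Fraction as F

priors = {"p": F(9, 14), "n": F(5, 14)}
cond = {
    "p": {"rain": F(3, 9), "hot": F(2, 9), "high": F(3, 9), "false": F(6, 9)},
    "n": {"rain": F(2, 5), "hot": F(2, 5), "high": F(4, 5), "false": F(2, 5)},
}

x = ["rain", "hot", "high", "false"]
scores = {c: priors[c] for c in priors}
for c in scores:
    for attr in x:
        scores[c] *= cond[c][attr]

label = max(scores, key=scores.get)  # the class with the larger P(X|c)*P(c)
```

The scores reproduce 0.010582 for p and 0.018286 for n, so X is classified as n.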

Page 115: 多媒體資料庫(New)3rd

Bayesian belief networks

[figure: Bayesian network over the variables FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea]

The conditional probability table for the variable LungCancer:

       (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC       0.8      0.5       0.7       0.1
~LC      0.2      0.5       0.3       0.9

Page 116: 多媒體資料庫(New)3rd

Bayesian Belief Networks

Page 117: 多媒體資料庫(New)3rd

The k-nearest neighbor algorithm

[figure: a query point xq among + and − training examples; the class assigned is the majority class of its k nearest neighbors]

Page 118: 多媒體資料庫(New)3rd

Rough set approach

• Rough sets are used to approximately or "roughly" define equivalence classes
• A rough set for a given class C is approximated by two sets:
  – a lower approximation (certain to be in C)
  – an upper approximation (cannot be described as not belonging to C)

Page 119: 多媒體資料庫(New)3rd

Fuzzy set approach

Page 120: 多媒體資料庫(New)3rd

Association pattern mining

• Goal
  – find associations among items or objects
  – an association requires that
    • the items co-occur often enough (support)
    • the conditional probability of co-occurrence is large enough (confidence)
• Algorithms
  – Apriori algorithm
  – Lattice approach
  – FP-tree

Page 121: 多媒體資料庫(New)3rd

Mining association rules: example

Transaction ID  Items bought
2000            A, B, C
1000            A, C
4000            A, D
5000            B, E, F

Min. support 50%, min. confidence 50%

Frequent itemset  Support
{A}               75%
{B}               50%
{C}               50%
{A, C}            50%

For the rule A ⇒ C:
  support = support({A, C}) = 50%
  confidence = support({A, C}) / support({A}) = 66.6%

The Apriori principle: any subset of a frequent itemset must be frequent

Page 122: 多媒體資料庫(New)3rd

Apriori algorithm
• Join step: Ck is generated by joining Lk−1 with itself
• Prune step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
• Pseudo-code:
  Ck: candidate itemsets of size k
  Lk: frequent itemsets of size k

  L1 = {frequent items};
  for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
      increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
  end
  return ∪k Lk;
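A level-wise sketch of this pseudo-code, run on the four transactions of the earlier example (an illustration of the join/prune/count loop, not an optimized implementation):

```python
# Apriori sketch: join L(k-1) with itself, prune by the Apriori principle,
# and count support by scanning the transactions.
from itertools import combinations

transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
min_support = 0.5
n = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

# L1: frequent single items
items = sorted(set().union(*transactions))
L = [{frozenset([i]) for i in items if support({i}) >= min_support}]
while L[-1]:
    k = len(next(iter(L[-1]))) + 1
    # join step: unions of two frequent (k-1)-itemsets of size k
    candidates = {a | b for a in L[-1] for b in L[-1] if len(a | b) == k}
    # prune step: every (k-1)-subset must itself be frequent
    candidates = {c for c in candidates
                  if all(frozenset(s) in L[-1] for s in combinations(c, k - 1))}
    L.append({c for c in candidates if support(c) >= min_support})

frequent = set().union(*L)
```

On this data the result is {A}, {B}, {C}, and {A, C}, matching the frequent-itemset table of the example.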

Page 123: 多媒體資料庫(New)3rd

Example

Database D:
TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5

Scan D → C1: {1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3
L1: {1}: 2, {2}: 3, {3}: 3, {5}: 3

C2 (joined from L1): {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
Scan D → C2 counts: {1 2}: 1, {1 3}: 2, {1 5}: 1, {2 3}: 2, {2 5}: 3, {3 5}: 2
L2: {1 3}: 2, {2 3}: 2, {2 5}: 3, {3 5}: 2

C3: {2 3 5}
Scan D → {2 3 5}: 2, so L3 = {{2 3 5}}

Page 124: 多媒體資料庫(New)3rd

FP-tree algorithm

• Compress a large database into a compact data structure, the FP-tree
  – contains only the data relevant to mining association patterns
  – avoids expensive repeated database scans

Page 125: 多媒體資料庫(New)3rd

FP-tree construction

min_support = 0.5

TID  Items bought              (ordered) frequent items
100  {f, a, c, d, g, i, m, p}  {f, c, a, m, p}
200  {a, b, c, f, l, m, o}     {f, c, a, b, m}
300  {b, f, h, j, o}           {f, b}
400  {b, c, k, s, p}           {c, b, p}
500  {a, f, c, e, l, p, m, n}  {f, c, a, m, p}

Steps:

1. Scan the DB once and find the frequent 1-itemsets (single-item patterns)
2. Order the frequent items in descending order of frequency
3. Scan the DB again and construct the FP-tree

Page 126: 多媒體資料庫(New)3rd

[figure: FP-tree with root {} and paths f:4 → c:3 → a:3 → (m:2 → p:2, and b:1 → m:1), f:4 → b:1, and c:1 → b:1 → p:1]

Header table:
Item  frequency
f     4
c     4
a     3
b     3
m     3
p     3

Page 127: 多媒體資料庫(New)3rd

Main FP-tree mining steps

1) For each node in the FP-tree, build its conditional pattern base
2) For each conditional pattern base, build a conditional FP-tree
3) Repeat the steps above until the FP-tree contains only a single path

Page 128: 多媒體資料庫(New)3rd

Step 1: for each node in the FP-tree, build its conditional pattern base

Conditional pattern bases:

item  conditional pattern base
c     f:3
a     fc:3
b     fca:1, f:1, c:1
m     fca:2, fcab:1
p     fcam:2, cb:1

[figure: the FP-tree and header table from the construction slide]

Page 129: 多媒體資料庫(New)3rd

Step 2: for each conditional pattern base, build a conditional FP-tree

m-conditional pattern base: fca:2, fcab:1
m-conditional FP-tree: {} → f:3 → c:3 → a:3

All frequent patterns concerning m: m, fm, cm, am, fcm, fam, cam, fcam

[figure: the full FP-tree, with the paths contributing to m]

Page 130: 多媒體資料庫(New)3rd

Mining frequent patterns by creating conditional pattern bases

Item  Conditional pattern base     Conditional FP-tree
f     Empty                        Empty
c     {(f:3)}                      {(f:3)}|c
a     {(fc:3)}                     {(f:3, c:3)}|a
b     {(fca:1), (f:1), (c:1)}      Empty
m     {(fca:2), (fcab:1)}          {(f:3, c:3, a:3)}|m
p     {(fcam:2), (cb:1)}           {(c:3)}|p

Page 131: 多媒體資料庫(New)3rd

Step 3: recursively mine the conditional FP-tree

m-conditional FP-tree: {} → f:3 → c:3 → a:3
Conditional pattern base of "am": (fc:3) → am-conditional FP-tree: {} → f:3 → c:3
Conditional pattern base of "cm": (f:3) → cm-conditional FP-tree: {} → f:3
Conditional pattern base of "cam": (f:3) → cam-conditional FP-tree: {} → f:3

Page 132: 多媒體資料庫(New)3rd

Performance

[figure: run time (sec.) vs. support threshold (%), from 0 to 3%, comparing D1 FP-growth runtime with D1 Apriori runtime on data set T25I20D10K]

Page 133: 多媒體資料庫(New)3rd

Association pattern mining

• Traditional association pattern mining almost always looks for positive associations between items
• In multimedia applications
  – mutual-exclusion relationships are also very important
    • they can improve classification accuracy

Page 134: 多媒體資料庫(New)3rd

Concepts and semantic networks

• Concepts
  – the basic notion of knowledge representation
  – semantic notions of the objects in the world

Page 135: 多媒體資料庫(New)3rd

Concepts and semantic networks

– Relationships between concepts
  • multi-resolution

Page 136: 多媒體資料庫(New)3rd

Concepts and semantic networks

• Semantic network
  – nodes
    • objects, concepts, or states
  – links
    • relationships between the nodes

Page 137: 多媒體資料庫(New)3rd
Page 138: 多媒體資料庫(New)3rd

• References
  – V.S. Subrahmanian, Principles of Multimedia Database Systems, Morgan Kaufmann.
  – C.Y. Tsai, A.L.P. Chen and K. Essig, "Efficient Image Retrieval Approaches for Different Similarity Requirements", Proc. SPIE Conference on Storage and Retrieval for Image and Video Databases, 2000.
  – Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.

Page 139: 多媒體資料庫(New)3rd

Content-Based Interactivity

Page 140: 多媒體資料庫(New)3rd
Page 141: 多媒體資料庫(New)3rd

Paper study: topic 1

A Semantic Modeling Approach for Video Retrieval by Content

Edoardo Ardizzone and Mohand-Said Hacid, ICMCS 1999, July
Page 142: 多媒體資料庫(New)3rd

Introduction

• Using keywords or free text to describe the necessary semantic objects is not sufficient.
• The issues that need to be addressed are:
  1) the representation of video information
  2) the organization of this information
  3) user-friendly representation

Page 143: 多媒體資料庫(New)3rd

Introduction (cont.)

• We exploit two languages:
  1) one for defining the schema (i.e., the structure)
  2) the other for querying through the schema
• And two layers for representing a video's conceptual content:
  1) object layer
  2) schema layer

Page 144: 多媒體資料庫(New)3rd

Two layers for a video's conceptual content

• Object layer: collects the objects of interest, their descriptions, and the relations among them. Objects in a video sequence are represented as visual entities.
• Schema layer: intended to capture the structure and knowledge needed for video retrieval. Visual entities can be classified into a hierarchical structure.

Page 145: 多媒體資料庫(New)3rd

Schema language: example 1

Page 146: 多媒體資料庫(New)3rd

Query Language (QL)

• Querying a DB means retrieving stored objects that satisfy certain conditions or qualifications and hence are interesting to a user.
• In an OODB, classes are used to represent sets of objects.

Page 147: 多媒體資料庫(New)3rd

QL cont.

• Queries are represented as concepts in our abstract language.

• The syntax and semantics of a concept language for making queries

Page 148: 多媒體資料庫(New)3rd

QL example

• "Sequences of movies directed by Kevin Costner in which he is also an actor"

Sequence ⊓ film.directedBy.(name = "Kevin Costner") ⊓ film.(directedBy ⊓ actor)

with the intersection semantics (R ⊓ R')^I = { d | R^I(d) ∧ R'^I(d) }

Page 149: 多媒體資料庫(New)3rd

QL example

• "The set of movies whose directors are also producers of some films"

Film ⊓ (directedBy . producedBy)

with the semantics (R . R')^I = { (d, d') | R^I(d) ∧ R'^I(d') }

Page 150: 多媒體資料庫(New)3rd

Semantic annotation of sports video

• A video isn't just a sequence of images; it adds the temporal dimension.
• An approach for semantic annotation of sports videos that covers several different sports and even non-sports content

Page 151: 多媒體資料庫(New)3rd

Introduction: typical sequence of shots in sports video

Page 152: 多媒體資料庫(New)3rd
Page 153: 多媒體資料庫(New)3rd

Classifying visual shot features

Page 154: 多媒體資料庫(New)3rd

Implementation: classifying visual shot features (cont.)

Page 155: 多媒體資料庫(New)3rd

Conclusion

• There is growing interest in video databases and in dealing with access problems.
• One of the central problems in creating robust and scalable systems for manipulating video information lies in representing the video content

Page 156: 多媒體資料庫(New)3rd

Conclusion (cont.)

• This framework is appropriate for supporting conceptual and intensional queries
• It can perform exact as well as partial or fuzzy matching
• Some physical features: color, object shape, etc.

Page 157: 多媒體資料庫(New)3rd

Paper study: topic 2

Indexing Methods for Approximate String Matching

Gonzalo Navarro, Ricardo Baeza-Yates, Erkki Sutinen, and Jorma Tarhio, IEEE Data Engineering Bulletin, 2000
Page 158: 多媒體資料庫(New)3rd

Outline

• Introduction
• Basic concepts
• Neighborhood generation
• Partitioning into exact search
• Intermediate partitioning
• Summary

Page 159: 多媒體資料庫(New)3rd

Introduction

• Definition
  – Given a long text T1..n of length n and a comparatively short pattern P1..m of length m, both sequences over an alphabet Σ of size σ, find the text positions that match the pattern with at most k "errors".
• Applications
  – Retrieving musical passages similar to a sample
  – Finding DNA subsequences after possible mutations
  – Searching text in the presence of typing or spelling errors

Page 160: 多媒體資料庫(New)3rd

outline

• Introduction
• Basic concepts
• Neighborhood generation
• Partitioning into exact search
• Intermediate partitioning
• Summary

Page 161: 多媒體資料庫(New)3rd

Suffix trees

The suffixes of the text "gaaccgacct":
 1  g a a c c g a c c t
 2  a a c c g a c c t
 3  a c c g a c c t
 4  c c g a c c t
 5  c g a c c t
 6  g a c c t
 7  a c c t
 8  c c t
 9  c t
10  t

Weak point: large space requirement, about 9 times the text size.
Page 162: 多媒體資料庫(New)3rd

Suffix array

(Figure: the suffixes of the text "abracadabra$" in lexicographic order; the suffix array stores their starting positions.)

Requires less space, about 4 times the text size.
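The sorted-suffix idea can be sketched in Python (a naive O(n² log n) construction for illustration only; practical systems use O(n log n) or linear-time algorithms):

```python
def suffix_array(text):
    """Suffix array: starting positions of the suffixes of `text`,
    listed in lexicographic order of the suffixes."""
    return sorted(range(len(text)), key=lambda i: text[i:])

# Adjacent entries share prefixes, so all occurrences of a pattern occupy
# a contiguous range of the array, findable by binary search.
sa = suffix_array("abracadabra$")
```

Positions here are 0-based; the slide's suffix lists are 1-based.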

Page 163: 多媒體資料庫(New)3rd

Q-grams, Q-samples

TEXT:  a b r a c a d a b r a
       (positions 1–11)

INDEX (overlapping 4-grams with their positions):
  abra → 1, 8
  brac → 2
  raca → 3
  acad → 4
  cada → 5

Q-samples, unlike q-grams, do not overlap, and there may even be some space between each pair of samples.
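A minimal sketch of both index types (positions are 1-based to match the slide; the `gap` parameter of `qsamples` is an illustrative assumption):

```python
def qgram_index(text, q):
    """Index every overlapping q-gram of the text by its 1-based positions."""
    index = {}
    for i in range(len(text) - q + 1):
        index.setdefault(text[i:i + q], []).append(i + 1)
    return index

def qsamples(text, q, gap):
    """Non-overlapping q-samples taken every `gap` characters (gap >= q),
    so there may be unsampled space between consecutive samples."""
    return [(i + 1, text[i:i + q]) for i in range(0, len(text) - q + 1, gap)]

idx = qgram_index("abracadabra", 4)
# idx["abra"] == [1, 8], matching the slide's index
```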

Page 164: 多媒體資料庫(New)3rd

Edit distance

        S   U   R   G   E   R   Y
    0   1   2   3   4   5   6   7
S   1   0   1   2   3   4   5   6
U   2   1   0   1   2   3   4   5
R   3   2   1   0   1   2   3   4
V   4   3   2   1   1   2   3   4
E   5   4   3   2   2   1   2   3
Y   6   5   4   3   3   2   2   2

ed("SURVEY", "SURGERY") = 2 — the bottom-right cell is the final result.
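The matrix above is filled row by row by the standard dynamic program; a compact sketch:

```python
def edit_distance(a, b):
    """Levenshtein edit distance, computed row by row as in the slide's matrix."""
    prev = list(range(len(b) + 1))  # row 0: distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # first column: distance from each prefix of a to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[-1] + 1,                  # insertion
                           prev[j - 1] + (ca != cb)))    # substitution / match
        prev = cur
    return prev[-1]

edit_distance("SURVEY", "SURGERY")  # 2, the bottom-right cell
```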

Page 165: 多媒體資料庫(New)3rd

outline

• Introduction
• Basic concepts
• Neighborhood generation
• Partitioning into exact search
• Intermediate partitioning
• Summarization

Page 166: 多媒體資料庫(New)3rd

Neighborhood generation

Pattern: "abc" with 1 error

1-neighborhood: {*bc, a*c, ab*} ∪ {ab, ac, bc} ∪ {*abc, a*bc, abc*}

Searching the text  a b r a c a d a b r a  for these strings yields the results {abr}, {ac}, {abr}, …

The k-neighborhood (candidate set) can be quite large, so this approach works well only for small m and k.
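A minimal sketch of 1-neighborhood generation (the general k-error case applies this recursively; '*' on the slide stands for any alphabet character):

```python
def neighborhood_1(pattern, alphabet):
    """All strings at edit distance <= 1 from `pattern` (its 1-neighborhood):
    substitutions, deletions and insertions -- the three sets on the slide."""
    out = {pattern}
    for i in range(len(pattern)):
        out.add(pattern[:i] + pattern[i + 1:])              # deletion
        for c in alphabet:
            out.add(pattern[:i] + c + pattern[i + 1:])      # substitution
    for i in range(len(pattern) + 1):
        for c in alphabet:
            out.add(pattern[:i] + c + pattern[i:])          # insertion
    return out

n = neighborhood_1("abc", "abc")
# Each member is then searched exactly (e.g. in a suffix tree); the set grows
# roughly as O((m * sigma)^k), hence "small m and k" above.
```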

Page 167: 多媒體資料庫(New)3rd

outline

• Introduction
• Basic concepts
• Neighborhood generation
• Partitioning into exact search
• Intermediate partitioning
• Summarization

Page 168: 多媒體資料庫(New)3rd

Partitioning into exact search

Pattern: "abr" with 1 error, partitioned into (k + s) pieces: {a}, {br}

Filtration: exact search for the pieces in the text  a b r a c a d a b r a  yields candidate areas {abra}, {abra}, …, which are then verified to produce the results.

1. For a large error level, the text areas to verify cover almost all the text.
2. If s grows, the pieces get shorter and there are more matches to check, but the filter becomes stricter.
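The filter-then-verify idea can be sketched as follows (the even split and the verification window are illustrative assumptions, not the paper's exact procedure):

```python
def split_pattern(pattern, j):
    """Split the pattern into j nearly equal pieces."""
    n = len(pattern)
    cuts = [i * n // j for i in range(j + 1)]
    return [pattern[cuts[i]:cuts[i + 1]] for i in range(j)]

def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def approx_search(text, pattern, k):
    """Filtration: if the pattern matches with <= k errors, at least one of its
    k+1 pieces appears unchanged (pigeonhole). Exact piece hits yield candidate
    start positions, which are then verified with the edit distance."""
    pieces = split_pattern(pattern, k + 1)
    candidates, offset = set(), 0
    for piece in pieces:
        pos = text.find(piece)
        while pos != -1:
            # the approximate occurrence may start up to k positions off
            for s in range(max(0, pos - offset - k),
                           min(pos - offset + k, len(text)) + 1):
                candidates.add(s)
            pos = text.find(piece, pos + 1)
        offset += len(piece)
    hits = []
    for s in sorted(candidates):
        window = text[s:s + len(pattern) + k]
        if min(edit_distance(pattern, window[:L])
               for L in range(len(window) + 1)) <= k:
            hits.append(s)
    return hits
```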

Page 169: 多媒體資料庫(New)3rd

outline

• Introduction
• Basic concepts
• Neighborhood generation
• Partitioning into exact search
• Intermediate partitioning
• Summarization

Page 170: 多媒體資料庫(New)3rd

Intermediate Partitioning

Pattern: "abr" with 1 error, partitioned into j pieces (here j = 2): {a}, {br}

Each piece is searched with neighborhood generation, allowing ⌊k/j⌋ errors; the matches {abra}, {abra}, … in the text  a b r a c a d a b r a  are then verified to produce the results.

j = k + 1 reduces to partitioning into exact search.

Page 171: 多媒體資料庫(New)3rd

Intermediate Partitioning (cont.)

Pattern: "abr" with 1 error, kept as a single piece (j = 1): {abr}; j = 1 reduces to plain neighborhood generation:

{*abr, a*br, ab*r, abr*} ∪ {ab, br, ar} ∪ {ab*, *br, a*r}

Searching the text  a b r a c a d a b r a  yields candidates {abra}, {abra}, …, which are verified to produce the results.

Which j value to use? The search time decreases as j moves from 1 to k + 1, but the verification cost grows in the opposite direction.

Page 172: 多媒體資料庫(New)3rd

outline

• Introduction
• Basic concepts
• Neighborhood generation
• Partitioning into exact search
• Intermediate partitioning
• Summarization

Page 173: 多媒體資料庫(New)3rd

Summarization

Page 174: 多媒體資料庫(New)3rd

Paper study: topic 3

Lazy Users and Automatic Video Retrieval Tools in (the) Lowlands

The Lowlands Team

CWI1, TNO2, University of Amsterdam3, University of Twente4

The Netherlands

Jan Baan2, Alex van Ballegooij1, Jan Mark Geusenbroek3, Jurgen den Hartog2, Djoerd Hiemstra4,

Johan List1, Thijs Westerveld4, Ioannis Patras3, Stephan Raaijmakers2, Cees Snoek3, Leon Todoran3,

Jeroen Vendrig3, Arjen P. de Vries1 and Marcel Worring3.

Proceedings of the 10th Text REtrieval Conference (TREC), 2001

Page 175: 多媒體資料庫(New)3rd

Outline

• Introduction

• Detector-base processing

• Probabilistic multimedia retrieval

• Interactive experiment

• Lazy users

• Discussion

• Conclusion

Page 176: 多媒體資料庫(New)3rd

Basic key subject of Multimedia database

• Indexing
  – K-d tree, point quadtree, MX-quadtree, R-tree, suffix tree, TV-tree, …
  – Determined by the database designer.

• Similarity
  – No standard.
  – How similar is decided by the user.

Page 177: 多媒體資料庫(New)3rd

User is always Lazy!

• Fact 1: Most end users know nothing about "queries".

• Fact 2: What they want may be only a concept that they cannot clearly describe.

• Fact 3: Users like selection, not questioning.

Page 178: 多媒體資料庫(New)3rd

Introduction

• Uses two complementary automatic approaches:
  – Visual content
  – Transcript

• The experiment focuses on revealing relationships between:
  – Different modalities
  – The amount of human processing
  – The quality of results

Page 179: 多媒體資料庫(New)3rd

Introduction

Run | Description
--- | ---
1 | Detector-based, automatic
2 | Combined 1–3, automatic
3 | Transcript-based, automatic
4 | Query articulation, interactive
5 | Combined 1–4, interactive, by a lazy user

Page 180: 多媒體資料庫(New)3rd

Detector-based processing

Architecture of the automatic system

Page 181: 多媒體資料庫(New)3rd

Detector-based processing (cont.)

• Detectors for exact queries yield a yes/no answer, depending on whether a set of predicates is satisfied.

• Detectors for approximate queries yield a measure that expresses how similar the content is.

Page 182: 多媒體資料庫(New)3rd

Detector-based processing (cont.)

Analysis of the topic description → Selected detector → Query by example → Filter out irrelevant material → Final ranked results

Page 183: 多媒體資料庫(New)3rd

Detectors

• Camera technique detection
  – zoom, pan, tilt, …
• Face detector
  – no face, 1 face, 2 faces, … 5 faces, many faces
• Caption retrieval
  – Text segmentation, OCR, fuzzy string matching
• Monologue detection
  – Shot should contain speech
  – Shot should have a static or unknown camera technique
  – Shot should have a minimum length
• Detectors based on color-invariant features
  – Keyframes stored with color histograms

Page 184: 多媒體資料庫(New)3rd

Probabilistic multimedia retrieval

• We assume our documents are shots from video.

• Models of discrete signals (e.g., text)
  – Mixture of discrete probability measures

• Models of continuous signals (e.g., images)
  – Mixture of continuous probability measures

Page 185: 多媒體資料庫(New)3rd

• Using Bayes’ rule:

• If a query consists of several independent parts (e.g. a textual part Qt and a visual part Qv), the probabilities factorize

Probabilistic multimedia retrieval
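The equations on this slide did not survive conversion; a standard form consistent with the slide's description (a hedged reconstruction, with D a document/shot and Q the query) would be:

```latex
P(D \mid Q) \;=\; \frac{P(Q \mid D)\,P(D)}{P(Q)} \;\propto\; P(Q \mid D)\,P(D)
```

and, for a query with independent textual and visual parts $Q_t$ and $Q_v$:

```latex
P(D \mid Q_t, Q_v) \;\propto\; P(Q_t \mid D)\,P(Q_v \mid D)\,P(D)
```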

Page 186: 多媒體資料庫(New)3rd

Probabilistic multimedia retrieval

• Hierarchical data model of video

video

shots

scenes scenes

shots

frames frames

Page 187: 多媒體資料庫(New)3rd

Probabilistic multimedia retrieval

• Text retrieval
  – Uses the Sphinx3 speech recognition system from Carnegie Mellon University
  – Input: query keywords
  – Retrieval down to shot level

Page 188: 多媒體資料庫(New)3rd

Probabilistic multimedia retrieval

• Image retrieval
  – Retrieve the key frames of shots
  – Cut the key frame of each shot into blocks of 8 × 8 pixels
  – Transform each block with the Discrete Cosine Transform (DCT), as used in the JPEG compression standard
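A direct (slow) sketch of the 8×8 DCT-II used on JPEG blocks; production code uses fast separable transforms, so this is for illustration only:

```python
import math

def dct2_8x8(block):
    """2-D DCT-II of an 8x8 block (JPEG's transform), computed directly."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out

# For a constant block, all energy lands in the DC coefficient out[0][0].
flat = [[10.0] * 8 for _ in range(8)]
coeffs = dct2_8x8(flat)
```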

Page 189: 多媒體資料庫(New)3rd

Interactive experiments

• Topic lists.– http://www-nlpir.nist.gov/projects/t01v/topicsoverview.html

• Topic 33: White fort

• Topic 19: Lunar rover

• Topic 8: Jupiter

• Topic 25: Star Wars

Page 190: 多媒體資料庫(New)3rd

Topic 33: White fort

Using Run 1:

Any color-based technique worked out well for this query

Example known-item keyframe

Page 191: 多媒體資料庫(New)3rd

Topic 19: Lunar rover

Color-histogram example

Known-item keyframe

A color-based retrieval technique is not useful in this case.

By Run 4: allow the user to make their own world knowledge explicit: in scenes on the moon, the sky is black.

Page 192: 多媒體資料庫(New)3rd

Topic 8: Jupiter

Example

Some correct-answer keyframes

At first thought, this query may seem easy to solve, but the colors apparently differ between photos.

Three color histograms and their interrelationships (color sets) are used.

Page 193: 多媒體資料庫(New)3rd

Topic 25: Star Wars

Example

Some correct answers keyframes

Text retrieval: (if you know the name)

The first filter selects only those images that have a sufficient amount of golden content.

Secondly, a set of filters reduces the data set by selecting those images that contain the color sets shown.

R2D2, C3PO

Page 194: 多媒體資料庫(New)3rd

Lazy users

• Lazy users identify result sets instead of correct answers (so our interactive results are not 100% precision).

• The combination strategies used to construct run 5 consisted of:
  – Choose the run that looks best
  – Concatenate or interleave the top N from various runs
  – Continue with an automatic, seeded search strategy

Page 195: 多媒體資料庫(New)3rd

Discussion

• How video retrieval systems should be evaluated.

• The inhomogeneity of the topics
  – "sailboat on the beach" vs. "yacht on the sea"

• The low quality of the data
  – photos of Jupiter

• The evaluation measures used

Page 196: 多媒體資料庫(New)3rd

Conclusion

• Our evaluation demonstrates the importance of combining various techniques to analyze the multiple modalities.

• The optimal technique always depends on the query.

• User interaction is still required to decide upon a good strategy.

Page 197: 多媒體資料庫(New)3rd

Paper study : topic 4

VIDEO INDEXING BY MOTION ACTIVITY MAPS

Wei Zeng; Wen Gao; Debin Zhao;

Proceedings of the 2002 International Conference on Image Processing, Volume 1, 2002, pages 912–915

Page 198: 多媒體資料庫(New)3rd

Outline

• Introduction
  – Motion indexing
• Motion Activity Map (MAM)
• Definition of MAM
• Generation of MAM
• Organization of MAMs
• Experimental results
• Conclusion

Page 199: 多媒體資料庫(New)3rd

Introduction

• To find a video indexing technique that can extract crucial information from videos for efficient visual content-based queries.

• In order to foster content-based indexing and retrieval.

• Video indexing should be based on a good feature representation, such as motion features.

• Motion features depict the dynamic contents of video and enrich the semantics of videos, such as running and flying.

Page 200: 多媒體資料庫(New)3rd

Motion indexing

• Techniques and systems for motion indexing can be categorized into four types:
  – Feature-based approach
  – Trajectory-based approach
  – Semantic-based approach
  – Image-based approach

Page 201: 多媒體資料庫(New)3rd

Feature-based approach

• Computes the motion parameters of a predefined motion model.
• Has been adopted by MPEG-7 (still a draft).
• Example

Page 202: 多媒體資料庫(New)3rd
Page 203: 多媒體資料庫(New)3rd

Trajectory-based approach

• This approach is often chosen by object-based systems for indexing video.

Page 204: 多媒體資料庫(New)3rd
Page 205: 多媒體資料庫(New)3rd

Semantic-based approach

• Provides semantic events or actions of motion.
• Reference paper: "A Semantic Event-Detection Approach and Its Application to Detecting Hunts in Wildlife Video"

Page 206: 多媒體資料庫(New)3rd
Page 207: 多媒體資料庫(New)3rd

Image-based approach

• Gives synthesized pictures generated from the motion of the video.

• MAM is an image-based approach.

Page 208: 多媒體資料庫(New)3rd

Concepts of MAM (1)

• A motion activity map is an image that accumulates the motion activity on specific grids along the temporal axis of a video.

Page 209: 多媒體資料庫(New)3rd

Concepts of MAM (2)

• It is an image-based representation of the magnitude and spatial distribution of motion.

• One video clip can generate several MAMs, and all MAMs are organized into a hierarchical tree view according to the structure of the video.

Page 210: 多媒體資料庫(New)3rd

Definition of MAM (1)

• A motion activity map is an image synthesized from the motion vector field, and the motion vector field can be defined as the temporal function X(t), where t = t0, t1, …, tk:

  X(t) = v(i, j, t) = ( vx(i, j, t), vy(i, j, t) ),   (i, j) ∈ Ω

where vx(i, j, t) and vy(i, j, t) are the x-axis and y-axis components of the motion vector on grid (i, j).

Page 211: 多媒體資料庫(New)3rd

Definition of MAM (2)

• Based on the motion vector field X(t), the motion activity map (MAM) is computed as

  M(i, j) = Σ_{t = t0}^{tk} f( v(i, j, t) ),   (i, j) ∈ Ω

where f(v(i, j, t)) is the motion activity measure function on grid (i, j) and Ω is the grid set of the video.
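A minimal sketch of the accumulation above, using the vector magnitude as one plausible choice of the activity measure f (the paper's exact f is not specified here):

```python
def motion_activity_map(vector_fields, f=None):
    """Accumulate motion activity per grid cell over time:
        M(i, j) = sum over t of f(v(i, j, t))
    vector_fields: a list over time of 2-D grids of (vx, vy) motion vectors.
    f defaults to the vector magnitude, one plausible activity measure."""
    if f is None:
        f = lambda v: (v[0] ** 2 + v[1] ** 2) ** 0.5
    rows, cols = len(vector_fields[0]), len(vector_fields[0][0])
    M = [[0.0] * cols for _ in range(rows)]
    for field in vector_fields:
        for i in range(rows):
            for j in range(cols):
                M[i][j] += f(field[i][j])
    return M
```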

Page 212: 多媒體資料庫(New)3rd

Generation of MAM

Video → Temporal video segmentation → Motion vector field → MAM computing → MAM quantization → MAM spatial segmentation → Region-based MAMs

Demo video segmentation: "Hall"

Page 213: 多媒體資料庫(New)3rd

Organization of MAMs

• Video can be segmented into different shot levels such as shots and sub-shots, so there are a lot of MAMs corresponding to a video shot.

• All the MAMs of video can be organized into a hierarchical tree representing the structure of video.

Page 214: 多媒體資料庫(New)3rd

Organization of MAMs

Video → Temporal segmentation → MAM computing → Layered spatial segmentation → MAM database → MAM display → Interactive video retrieval

Page 215: 多媒體資料庫(New)3rd

Experimental results

(a) Key-frame-based MAM

(b) MAM

(c–f) Region representation of the MAM

Page 216: 多媒體資料庫(New)3rd

Conclusion

• Video → shot MAM → sub-shot 1 MAM1, sub-shot 2 MAM2 (MAMs follow the shot hierarchy).

• All MAMs can be segmented into region representations.

• To optimize the MAM-based representation, we mark each pixel of the MAM with a specific color according to its relative intensity.

Page 217: 多媒體資料庫(New)3rd

Paper study : topic 5

SOM-Based R*-Tree for Similarity Retrieval

Database Systems for Advanced Applications, 2001. Proceedings of the Seventh International Conference. Kun-Seok Oh, Yaokai Feng, Kunihiko Kaneko, Akifumi Makinouchi, Sang-Hyun Bae

Page 218: 多媒體資料庫(New)3rd

Outline

• Self-Organizing Maps (SOM)

• R*-Tree

• SOM-Based R*-Tree

• Experiments

• Conclusion

Page 219: 多媒體資料庫(New)3rd

Self-Organizing Maps (SOM)

What is SOM?
1. SOM provides a mapping from high-dimensional feature vectors onto a two-dimensional space.
2. The mapping preserves the topology of the feature vectors.
3. The map is called a topological feature map, and preserves the mutual relationships (similarity) in the feature space of the input data.
4. The vectors contained in each node of the topological feature map are usually called codebook vectors.

Page 220: 多媒體資料庫(New)3rd

• We use 100 neurons arranged in a 10×10 two-dimensional grid for the computer simulation. The input vectors used for testing are also two-dimensional, with a probability distribution that is uniform over {-1 ≤ x1 ≤ 1; -1 ≤ x2 ≤ 1}.

Self-Organizing Maps (SOM)

Page 221: 多媒體資料庫(New)3rd

Figure: self-organizing feature map of uniformly distributed data: (a) randomly initialized weight vectors; (b) weight vectors after 50 iterations; (c) weight vectors after 1,000 iterations; (d) weight vectors after 10,000 iterations.

Self-Organizing Maps (SOM)

Page 222: 多媒體資料庫(New)3rd

• The probability distribution of the neurons in the feature map does indeed reflect the probability distribution of the input vectors. Note, however, that the probability distribution of the data is not mapped linearly onto the feature map.

Three clusters of Gaussian-distributed data.

Self-Organizing Maps (SOM)

Page 223: 多媒體資料庫(New)3rd

Self-Organizing Maps (SOM)

SOM Algorithm
1. Initialize the map neurons.
2. Input a feature vector x.
3. Find the winner neuron (BMN: Best-Match Node).
4. Adjust the weights of all neurons.
5. Continue from step 2 until the weights no longer change.
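The steps above can be sketched as a minimal SOM trainer (the learning-rate and neighborhood schedules are illustrative assumptions, not the paper's settings):

```python
import math
import random

def train_som(data, rows=10, cols=10, iters=1000, lr0=0.5, radius0=3.0, seed=0):
    """Minimal SOM following the slide's steps: init, find the BMN,
    update the BMN and its map neighbors, repeat."""
    rnd = random.Random(seed)
    dim = len(data[0])
    # 1. initialize codebook vectors randomly
    w = [[[rnd.uniform(-1, 1) for _ in range(dim)] for _ in range(cols)]
         for _ in range(rows)]
    for t in range(iters):
        x = data[rnd.randrange(len(data))]          # 2. input a feature vector
        # 3. winner (best-match node): minimal Euclidean distance
        bi, bj = min(((i, j) for i in range(rows) for j in range(cols)),
                     key=lambda p: sum((w[p[0]][p[1]][d] - x[d]) ** 2
                                       for d in range(dim)))
        # 4. update all neurons, weighted by map distance to the winner
        lr = lr0 * (1 - t / iters)
        radius = 1.0 + radius0 * (1 - t / iters)
        for i in range(rows):
            for j in range(cols):
                h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[i][j][d] += lr * h * (x[d] - w[i][j][d])
    return w
```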

Page 224: 多媒體資料庫(New)3rd

R*-Tree

• The R*-tree improves the performance of the R-tree by modifying the insertion and split algorithms and by introducing a forced-reinsertion mechanism.

• The R*-tree is proposed as an index structure for spatial data such as geographical and CAD data

Page 225: 多媒體資料庫(New)3rd

R*-Tree

• Each internal node contains an array of (p, rect) entries, where p is a pointer to a child node of this internal node, and rect is the minimum bounding rectangle (MBR) of the child node pointed to by p.

• Each leaf node contains an array of (OID, rect) entries for spatial objects, where OID is an object identifier and rect is the MBR of the object identified by OID.
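The MBR bookkeeping behind these entries can be sketched as follows (a generic R-tree-family sketch, not the paper's code):

```python
class Rect:
    """Minimum bounding rectangle (MBR), as stored in R-tree family entries."""
    def __init__(self, xmin, ymin, xmax, ymax):
        self.xmin, self.ymin, self.xmax, self.ymax = xmin, ymin, xmax, ymax

    def union(self, other):
        """Smallest rectangle covering both MBRs."""
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))

    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def enlargement(self, other):
        """Area increase needed to cover `other` -- the criterion the R-tree
        family uses when choosing a subtree for insertion."""
        return self.union(other).area() - self.area()

    def intersects(self, other):
        """Overlap test used when descending the tree during search."""
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)
```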

Page 226: 多媒體資料庫(New)3rd

R*-Tree (cont.): space of point data

Page 227: 多媒體資料庫(New)3rd

R*-Tree (cont.): tree access structure

Page 228: 多媒體資料庫(New)3rd

SOM-Based R*-Tree

1. Clustering similar images
  – We first generate the topological feature map using the SOM, then generate the BMIL (best-matching-image list) by computing the distance between each feature vector and the codebook vectors of the topological feature map.
  – The BMN (best-match node: the node with minimum distance) is chosen from the map nodes:

      BMN = arg min_i ‖ FV − CBV_i ‖

  – Next, the weight vectors are updated.

Page 229: 多媒體資料庫(New)3rd

SOM-Based R*-Tree (cont.)

Page 230: 多媒體資料庫(New)3rd

SOM-Based R*-Tree (cont.)

2. Construction
  – In order to construct the R*-tree, we select a CBV (codebook vector) from the topological feature map as an entry.
  – If it is an empty node, we select the next codebook vector; otherwise we determine the leaf node into which to insert the codebook vector.
  – A leaf of the SOM-based R*-tree has the following structure:

      L : (E1, …, Ei, …, Ep)   (m ≤ p ≤ M)
      Ei : (OID, rect)

Page 231: 多媒體資料庫(New)3rd

Experiments

• We performed experiments to compare the SOM-based R*-tree with a normal SOM and a normal R*-tree.
• Image database: 40,000 artificial/natural images (stored on a local disk)
• Image size: 128 × 128 pixels
• Performed on a Compaq Deskpro (OS: FreeBSD) with 128 MB RAM

Page 232: 多媒體資料庫(New)3rd

Experiments (cont.)

• Feature extraction:
  – Use Haar wavelets to compute the feature vector
  – Color space: YIQ (NTSC transmission primaries)
  – Each element of the feature vector represents an average of 32×32 pixels of the original image
  – The color feature vector has 48 dimensions (4×4×3, where 3 is the number of YIQ channels)
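A sketch of the described 48-dimensional color feature (the YIQ conversion uses the standard NTSC coefficients; the block ordering and [0, 1] pixel scale are illustrative assumptions):

```python
def color_feature(rgb_image):
    """48-dimensional color feature: convert a 128x128 RGB image to YIQ and
    average each channel over 32x32 blocks, giving 4 * 4 * 3 values.
    rgb_image: 128x128 grid of (r, g, b) tuples with components in [0, 1]."""
    feat = []
    for by in range(4):
        for bx in range(4):
            sums = [0.0, 0.0, 0.0]
            for y in range(by * 32, (by + 1) * 32):
                for x in range(bx * 32, (bx + 1) * 32):
                    r, g, b = rgb_image[y][x]
                    # NTSC RGB -> YIQ transform
                    sums[0] += 0.299 * r + 0.587 * g + 0.114 * b
                    sums[1] += 0.596 * r - 0.274 * g - 0.322 * b
                    sums[2] += 0.211 * r - 0.523 * g + 0.312 * b
            feat.extend(s / (32 * 32) for s in sums)
    return feat
```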

Page 233: 多媒體資料庫(New)3rd

Experiments (cont.)• Construction of the SOM-based R*-tree

Page 234: 多媒體資料庫(New)3rd

Experiments (cont.)

Page 235: 多媒體資料庫(New)3rd

Experiments (cont.)• We experimented with four types of searches:

(I) normal SOM including empty nodes

(II) normal SOM with eliminated empty nodes

(III) normal R*-tree

(IV) SOM-based R*-tree with eliminated empty nodes

Page 236: 多媒體資料庫(New)3rd

Experiments (cont.)

(1) Retrieval from SOM with empty nodes
(2) Retrieval from SOM without empty nodes

Page 237: 多媒體資料庫(New)3rd

Experiments (cont.)

Page 238: 多媒體資料庫(New)3rd

Conclusion

• For high-dimensional data, we use a topological feature map and a best-matching-image list (BMIL) obtained via the learning of a SOM.

• In an experiment, we performed a similarity search using real image data and compared the performance of the SOM-based R*-tree with a normal SOM and an R*-tree, based on retrieval time cost.