Multimedia Databases
Outline
• Introduction
• Challenges for multimedia databases
• Multidimensional indexing techniques
• Text databases
• Image databases
• Audio databases
• Video databases
Introduction
• Multimedia data versus traditional databases
  – Data content
    • Traditional databases
      – Stored as text; an entity or object is usually described by several attributes
    • Multimedia databases
      – Media rich in meaning; the content cannot be described simply by a set of attributes
  – Data presentation
    • Traditional databases
      – Text, tables
    • Multimedia databases
      – Require richer visual and auditory presentation
Introduction
• Example
  – Store pictures the way a traditional database would
    • Queries that can be issued
      – Find the pictures painted by XXX
      – Find the pictures painted by OOO between 1945 and 1955
    • Queries that cannot be handled
      – Find the pictures similar to this picture
      – Find the pictures with a red car in the upper-left corner
Introduction
• A multimedia database must provide
  – Efficient storage of multimedia data
  – Content-based querying
    • Queries related to the content of the media itself
  – Diverse presentation of multimedia data
Challenges for multimedia databases
• Handling large volumes of data
  – Multimedia data needs far more storage space than ordinary data
• Indexing multidimensional data
  – Fast search techniques
• Computing similarity
  – Fault-tolerant queries
• Data presentation
Multidimensional indexing techniques
• Returning the user's query results quickly and correctly is a crucial problem
  – With large volumes of data, record-by-record comparison takes too long
  – Avoid record-by-record scanning and comparison
• Build indexes on the data to speed up queries
• An index can be seen as a classification guide: following the index leads to the data relevant to a query.
Multidimensional indexing techniques
• Index structures common in traditional databases
  – B+-tree
    • The most widely used index structure
  – Hash
    • Static hashing
    • Dynamic hashing
  – Grid file
  – Bitmap index
B+-tree overview
• A B+-tree is a tree structure satisfying the following properties
  – It is a balanced tree: every path from the root to a leaf has the same length
  – Every node other than the root and the leaves has between ⌈n/2⌉ and n children
  – A leaf node holds between ⌈(n-1)/2⌉ and n-1 search values
B+-tree node structure
  – Ki is a search value
  – Pi is a pointer to a child node (for nonleaf nodes) or a pointer to data (for leaf nodes)
• The search values within a node are sorted: K1 < K2 < K3 < ... < Kn
Leaf node structure and properties
• For i = 1, 2, ..., n-1, pointer Pi either points to a data record with search value Ki, or to a bucket whose records all have search value Ki
• Pn points to the next leaf node
Nonleaf node structure
• All search values in the subtree pointed to by Pi are less than Ki
• All search values in the subtree pointed to by Pi are greater than or equal to Ki-1
Example
Discussion
• The B+-tree is very efficient for searching traditional tabular databases and is widely used
• However
  – The B+-tree is a one-dimensional index structure
  – Multimedia data has different characteristics
• Features of multimedia data
  – Documents
    • Content
    • Keywords
  – Pictures
    • Dominant colors
    • Contained objects
    • Object sizes
    • Color distribution
    • Texture features…
  – Music
    • Beat
    • Chords
    • Pitch…
  – Films
    • Object movement trajectories
    • Contained objects
    • Colors…
• Content information can serve as the query condition
  – Find pictures similar to a given picture
  – Find songs containing a melody similar to a given one
  – Find film clips in which a motorcycle flies over a train
• A multimedia object is described by multiple features and can therefore be represented as multidimensional data
• However, B-trees and B+-trees
  – Can only index one-dimensional data
  – Are unsuitable for multidimensional data
• Building indexes on multidimensional data to speed up queries is therefore vital for multimedia search.
Multidimensional index structures
• k-d tree
  – Stores k-dimensional data
  – Each level compares only one dimension
  – For the dimension compared at the level of node N, all data in N's left subtree have a smaller value in that dimension than N, while all data in N's right subtree have a value greater than or equal to N's
• Example
  – In-class exercise
    • Consider the k-d tree when k > 2
  – Try it yourself
    • Insert A(30,24,58), B(46,78,33), C(20,33,15), D(58,40,50), E(40,88,56), F(38,54,44) into a k-d tree
    • Use the k-d tree you built to find the points within distance 15 of X(34,50,46)
• How should deletion be handled?
  – Advantages
    • Simple
  – Disadvantages
    • The height of the tree depends on the insertion order
    • The tree can easily become skewed
      – Search performance then degrades badly
    • Deletion is relatively complicated
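The insertion and range-search rules above can be sketched in Python. This is a minimal illustration, not a production k-d tree: nodes are plain dicts, the split dimension cycles with depth, and left means strictly smaller in the split dimension, exactly as the slide states. The points are the ones from the exercise.

```python
import math

def insert(node, point, depth=0):
    # Each level compares only one dimension, cycling through the k dimensions.
    if node is None:
        return {"point": point, "left": None, "right": None}
    axis = depth % len(point)
    side = "left" if point[axis] < node["point"][axis] else "right"
    node[side] = insert(node[side], point, depth + 1)
    return node

def range_search(node, query, radius, depth=0, found=None):
    # Collect all points within Euclidean distance `radius` of `query`,
    # pruning subtrees that cannot intersect the query ball.
    if found is None:
        found = []
    if node is None:
        return found
    axis = depth % len(query)
    if math.dist(node["point"], query) <= radius:   # math.dist needs Python 3.8+
        found.append(node["point"])
    diff = query[axis] - node["point"][axis]
    near, far = ("left", "right") if diff < 0 else ("right", "left")
    range_search(node[near], query, radius, depth + 1, found)
    if abs(diff) <= radius:   # the splitting plane intersects the query ball
        range_search(node[far], query, radius, depth + 1, found)
    return found

root = None
for p in [(30,24,58), (46,78,33), (20,33,15), (58,40,50), (40,88,56), (38,54,44)]:
    root = insert(root, p)
print(range_search(root, (34, 50, 46), 15))   # only F=(38,54,44) is within 15
```

Running the exercise query shows that only F(38,54,44), at distance 6 from X, qualifies.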
Multidimensional index structures
• MX-quadtree
  – The shape of the tree is independent of the number of points and of their insertion order.
  – The designer must choose a value k; once chosen, k cannot be changed.
  – The map is divided into 2^k × 2^k grid cells
  – Deletion and insertion are very simple
• Example
  – Suppose k = 2
  – The map is divided into 2^2 × 2^2 = 16 grid cells
  – Insert the four points A, B, C, D into the MX-quadtree
Multidimensional index structures
• R-tree
  – A balanced tree
  – Very useful for storing large amounts of data
  – Can greatly reduce the number of disk accesses
  – An R-tree node holds k pointers
  – Every node other than the root and the leaves must contain between ⌈k/2⌉ and k non-empty pointers
    • This bounds the number of disk accesses
  – Leaf nodes contain the actual data
  – Internal nodes describe groups of data by their outline, represented as a rectangle
    • Stored as the upper-left and lower-right corners
    • Can be multidimensional
  – Insertion and deletion involve node splitting and merging, and are relatively complex.
  – Example
    • Eight objects in total
    • Two-dimensional space
    • Suppose k = 3
– Inserting p14
  (Figure: the R-tree before and after inserting p14; internal nodes R1–R7 hold minimum bounding rectangles, and leaf entries point to data tuples p1–p14)
– Deleting p2
  • Find the MBR that contains p2 (here R3)
  • After removing p2, R3 no longer satisfies the R-tree definition (underflow)
  • Merge with a neighboring bounding rectangle
  • Reorganize R3 and R4, updating each rectangle's upper-left and lower-right corner values
  (Figure: the R-tree at each step of deleting p2)
A simple multidimensional index built by combining per-dimension one-dimensional indexes
• The features of a multimedia object are multidimensional
  – Suppose we use a picture's average R (red), G (green), B (blue) values as its features
  – The feature space is then three-dimensional
  – Example
    • Suppose the database contains ten pictures
(Figure: the ten database pictures P1–P10)
• There are three queries Q1, Q2, Q3
• We obtain the following average (R, G, B) values
Picture ID (R, G, B) Picture ID (R, G, B)
P1 (0.102, 0.101, 0.086) P8 (0.318, 0.365, 0.561)
P2 (0.275, 0.251, 0.161) P9 (0.361, 0.302, 0.184)
P3 (0.627, 0.447, 0.302) P10 (0.451, 0.396, 0.400)
P4 (0.145, 0.153, 0.227) Q1 (0.478, 0.541, 0.753)
P5 (0.141, 0.137, 0.184) Q2 (0.302, 0.310, 0.416)
P6 (0.212, 0.200, 0.231) Q3 (0.302, 0.223, 0.161)
P7 (0.180, 0.180, 0.102)
• Case 1: suppose we want the pictures within distance 0.15 of query picture Q1
  – Q1 = (0.478, 0.541, 0.753), r = 0.15
  – In each dimension, find the database value nearest to the query
    • (0.451, 0.447, 0.561)
    • (|0.478-0.451|, |0.541-0.447|, |0.753-0.561|) => (0.027, 0.094, 0.192)
    • Since 0.192 > 0.15, no data can satisfy the query
• Case 2: suppose we want the pictures within distance 0.02 of query picture Q2
  – Q2 = (0.302, 0.310, 0.416), r = 0.02
  – Step 1: in each dimension, find the database value nearest to the query
    • (0.318, 0.302, 0.400)
    • (|0.302-0.318|, |0.310-0.302|, |0.416-0.400|) => (0.016, 0.008, 0.016), all smaller than 0.02, so we must proceed to step 2
  – Step 2: check dimension by dimension, subtracting the error budget already used from the allowed error
    • Usable error in dimension 1: 0.02
    • Usable error in dimension 2: (0.02² - 0.016²)^(1/2) = 0.012
    • Usable error in dimension 3: (0.02² - 0.016² - 0.008²)^(1/2) ≈ 0.009, which is smaller than the known per-dimension difference 0.016, so no data in the database can satisfy the query
• Case 3: suppose we want the pictures within distance 0.05 of query picture Q3
  – Q3 = (0.302, 0.223, 0.161), r = 0.05
  – (|0.302-0.318|, |0.223-0.200|, |0.161-0.161|) => (0.016, 0.023, 0), so G > R > B
  – Step 1 cannot rule out a match in the database
  – Search the index on G for pictures in the range 0.223 ± 0.05 = [0.173, 0.273]
    • {P7, P6, P2}
  – Search the index on R for pictures in the range 0.302 ± 0.044 = [0.258, 0.346]
    • {P2, P8}
  – Search the index on B for pictures in the range 0.161 ± 0.041 = [0.120, 0.202]
    • {P2, P5, P9}
  – Intersecting the results, P2 is the data satisfying the query
Algorithms
Algorithm 1 Build_Multi_Index
/* input: a set of k points Pi in n-dimensional feature space */
/* assume Pi = (x1,i, x2,i, ..., xn,i) */
/* output: a multi-index MI = {I1, I2, ..., In} */
0: begin
1:   for i = 1 to n
2:   begin
3:     sort {xi,1, xi,2, ..., xi,k} and store the result in Ii
4:   end
5:   return {I1, I2, ..., In}
6: end
Algorithm 2 Find_Difference_Vector
/* input: a query point Q = (q1, q2, ..., qn) in n-dimensional feature space */
/* a query radius r */
/* a multi-index MI = {I1, I2, ..., In} */
/* output: a difference vector δ = (δ1, δ2, ..., δn) */
0: begin
1:   for i = 1 to n
2:   begin
3:     search the nearest value to qi in Ii
4:     let xi be the nearest value
5:     δi = |qi - xi|
6:     if δi ≥ r
7:       return null
8:     endif
9:   end
10:  return δ = (δ1, δ2, ..., δn)
11: end
Algorithm 3 Candidate_Point_Retrieval
/* input: a query point Q = (q1, q2, ..., qn) in n-dimensional feature space */
/* a query radius r */
/* a multi-index MI = {I1, I2, ..., In} */
/* a difference vector δ = (δ1, δ2, ..., δn) */
/* output: the candidate point set in each dimension */
0: begin
1:   sort δ = (δ1, δ2, ..., δn) into δ' = (δ1', δ2', ..., δn'),
2:     where δi' = δoi for 1 ≤ i ≤ n, and oi denotes the subscript in δ of δi'
3:   let R = (r1, r2, ..., rn)
4:   let r1 = r
5:   for i = 2 to n
6:   begin
7:     ri = (ri-1² - δi-1'²)^(1/2)
8:     if ri² ≤ 0
9:       return null
10:    endif
11:  end
12:  for i = 1 to n
13:    search qoi ± ri in Ioi and store the candidate points in Si
14:    if NumOf(Si) = 0
15:      return null
16:    endif
17:  return S1, S2, ..., Sn
18: end
Algorithm 4 Candidate_Point_Set_Merging
/* input: the candidate point sets S1, S2, ..., Sn */
/* the searching order list O1, O2, ..., On generated by Algorithm 3 */
/* a query point Q = (q1, q2, ..., qn) in n-dimensional feature space */
/* a query radius r */
/* output: the result feature points and their similarity measures */
0: begin
1:   for i = 1 to k        /* initialize two tables: Counter and Distance */
2:   begin
3:     Counter[i] = 0      /* number of dimensional conditions satisfied by Pi */
4:     Distance[i] = 0     /* partial computed distance between Pi and Q */
5:   end
6:   for each candidate point Pi in S1
7:   begin
8:     Counter[i] = 1
9:     Distance[i] = |qO1 - xO1,i|   /* qO1 is the value of Q in the O1-th dimension */
10:  end                             /* xO1,i is the value of Pi in the O1-th dimension */
11:  for j = 2 to n
12:  begin
13:    for each candidate point Pi in Sj
14:    begin
15:      if Counter[i] = j-1
16:        Distance[i] = (Distance[i]² + (qOj - xOj,i)²)^(1/2)
17:        if Distance[i] ≤ r
18:          Counter[i] = Counter[i] + 1   /* Pi satisfies the query condition in the j-th dimension */
19:        endif
20:      endif
21:    end
22:  end
23:  for each Pi with Counter[i] = n
24:  begin
25:    output(Pi, Distance[i])
26:  end
27: end
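Algorithms 1–3 above can be sketched compactly in Python on the RGB data from the example. This is a hedged illustration, not the paper's exact implementation: a final exact-distance check stands in for the incremental merging of Algorithm 4, and helper names such as `nearest_gap` are inventions of this sketch.

```python
from bisect import bisect_left, bisect_right
from math import sqrt, dist

points = {  # average (R, G, B) per picture, from the example table
    "P1": (0.102, 0.101, 0.086), "P2": (0.275, 0.251, 0.161),
    "P3": (0.627, 0.447, 0.302), "P4": (0.145, 0.153, 0.227),
    "P5": (0.141, 0.137, 0.184), "P6": (0.212, 0.200, 0.231),
    "P7": (0.180, 0.180, 0.102), "P8": (0.318, 0.365, 0.561),
    "P9": (0.361, 0.302, 0.184), "P10": (0.451, 0.396, 0.400),
}
n = 3

# Algorithm 1: one sorted index per dimension.
index = [sorted((p[i], name) for name, p in points.items()) for i in range(n)]

def nearest_gap(i, q):
    # Distance from q to the closest indexed value in dimension i.
    vals = [v for v, _ in index[i]]
    j = bisect_left(vals, q)
    return min(abs(q - v) for v in vals[max(j - 1, 0):j + 1])

def search(q, r):
    # Algorithm 2: if some dimension's nearest value is already >= r away,
    # no point can lie within radius r.
    delta = [nearest_gap(i, q[i]) for i in range(n)]
    if any(d >= r for d in delta):
        return set()
    # Algorithm 3: scan dimensions in decreasing-delta order, shrinking the
    # usable radius: r_i = sqrt(r_{i-1}^2 - (previous dimension's delta)^2).
    order = sorted(range(n), key=lambda i: -delta[i])
    result, radius = None, r
    for step, i in enumerate(order):
        if step > 0:
            radius = sqrt(max(radius**2 - delta[order[step - 1]]**2, 0.0))
        vals = [v for v, _ in index[i]]
        lo, hi = bisect_left(vals, q[i] - radius), bisect_right(vals, q[i] + radius)
        cand = {name for _, name in index[i][lo:hi]}
        result = cand if result is None else result & cand
    # Exact verification of the surviving candidates (Algorithm 4's role).
    return {name for name in result if dist(points[name], q) <= r}

print(search((0.302, 0.223, 0.161), 0.05))   # Case 3: {'P2'}
```

Tracing Case 3 reproduces the slide's intermediate sets ({P7, P6, P2}, then {P2, P8}, then {P2, P5, P9}) and the final answer P2, while Cases 1 and 2 return the empty set.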
Text databases
• Introduction
  – Analyzing document content
    • Synonymy
    • Polysemy
  – Evaluating search results
    • Precision
      – The probability that a retrieved document is relevant
    • Recall
      – The probability that a relevant document is retrieved
Text databases
• Precision = 100 × card(retrieved results ∩ relevant documents) / card(retrieved results)
• Recall = 100 × card(retrieved results ∩ relevant documents) / card(relevant documents)
(Figure: Venn diagram of all documents, the relevant documents, and the retrieved results)
Text databases
• A precision/recall calculation example
  – Discuss the relationship between precision and recall
(Figure: Venn diagram over all documents showing the relevant documents and the retrieved results, with region sizes labeled 50, 15, 0 and 20)
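The two measures defined above are a one-line computation once the retrieved and relevant sets are known. The document-id ranges below are hypothetical numbers chosen for illustration: 20 retrieved documents, 15 of them relevant, 50 relevant documents in the whole collection.

```python
def precision_recall(retrieved, relevant):
    # precision: fraction of retrieved documents that are relevant
    # recall:    fraction of relevant documents that were retrieved
    hit = len(retrieved & relevant)
    return 100 * hit / len(retrieved), 100 * hit / len(relevant)

retrieved = set(range(100, 120))   # 20 retrieved documents
relevant = set(range(105, 155))    # 50 relevant documents; 15 were retrieved
print(precision_recall(retrieved, relevant))   # (75.0, 30.0)
```

Note the tension the slide asks about: retrieving more documents can only raise recall, but usually lowers precision.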
Text databases
• Describing document content
  – Stop lists
    • Words that can be ignored in a document, e.g. a, the, he…
  – Word stems
    • The different tenses, plural forms, etc. of the same word
  – Frequency tables

Term/document d1 d2 d3 d4
sex 1 0 0 0
drug 1 0 1 3
videotape 1 0 0 3
connection 0 0 0 2
slip 0 2 2 0
boat 0 1 0 0
Text databases
• Query processing
  – Computing document relevance
    • Term distance: d(di, dj) = ( Σ_{t=1..M} (vec(di)_t - vec(dj)_t)² )^(1/2)
    • Cosine distance: cos(di, dj) = (vec(di) · vec(dj)) / (|vec(di)| |vec(dj)|)
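Both relevance measures are straightforward over term-frequency vectors. A minimal sketch, using the vectors the frequency table above implies for d1 and d4 (dimension order: sex, drug, videotape, connection, slip, boat):

```python
import math

def term_distance(v, w):
    # Euclidean distance between two term-frequency vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))

def cosine(v, w):
    # Cosine similarity: (v . w) / (|v| |w|).
    dot = sum(a * b for a, b in zip(v, w))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norm

d1 = (1, 1, 1, 0, 0, 0)   # sex:1, drug:1, videotape:1
d4 = (0, 3, 3, 2, 0, 0)   # drug:3, videotape:3, connection:2
print(round(cosine(d1, d4), 3))   # 0.739
```

Cosine similarity ignores document length, which is why it is usually preferred over raw term distance for ranking.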
Text databases
– Query types
  • Find documents containing certain terms
  • Find documents containing certain terms but not certain others
  • Find the document closest to the query vector
  • Find the top-k documents closest to the query vector
  • Find documents within a given distance of the query vector
Text databases
– Using indexes
  • R-tree
    – Not suitable as a high-dimensional index structure
  • TV-tree
    – Similar to the R-tree, but each node considers only a subset of the dimensions
  • Inverted list
  • Signature files
Text databases
• Inverted list
  – A term-oriented inverted table
  – Using the frequency table above as an example:
    • sex: d1
    • drug: d1, d3, d4
    • videotape: d1, d4
  – Search examples and query types
    • and, or, not
    • Cannot handle similarity queries
  – Drawback
    • Large size
      – Compression techniques help
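An inverted list and its Boolean query types reduce to set operations on posting lists. A toy sketch; the exact term sets of d3 and d4 beyond the posting lists shown above are assumptions for illustration:

```python
from collections import defaultdict

docs = {  # toy corpus shaped like the frequency-table documents
    "d1": {"sex", "drug", "videotape"},
    "d3": {"drug", "slip"},
    "d4": {"drug", "videotape", "connection"},
}

inverted = defaultdict(set)        # term -> posting list (set of doc ids)
for doc, terms in docs.items():
    for t in terms:
        inverted[t].add(doc)

and_q = inverted["drug"] & inverted["videotape"]   # AND: intersect postings
or_q = inverted["drug"] | inverted["sex"]          # OR: union of postings
not_q = set(docs) - inverted["videotape"]          # NOT: complement
print(sorted(and_q))   # ['d1', 'd4']
```

Note there is no natural place here for a similarity score, which is exactly the limitation the slide points out.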
Text databases
• Signature files
  – Each keyword has a corresponding code
  – The signature of a document is obtained by superimposing (bitwise OR-ing) the codes of the keywords it contains
  – Search example
  – Query types handled
    • And, Or
    • Not?
Text databases
• Discussion
  – R-trees and TV-trees can handle similarity queries
  – Inverted indices and signature files cannot handle similarity queries; they only handle queries for documents containing certain terms
  – Signature files are not suitable for queries that exclude certain term(s)
    » Give an example to illustrate why
  – R-trees are not suitable for high-dimensional data
Image databases
• Example queries
  – Example 1: find the pictures similar to this picture
  – Example 2: find all pictures with a red square in the upper-left corner and blue at the bottom
• Information that can represent an image
  – Information unrelated to image content
    • Author
    • Time of completion
    • Place of completion
    • etc.
  – Content-related features
    • Color distribution
      – Can be represented by a color histogram
    • Texture
    • Contained objects
      – Shape
      – Color
      – Size
      – Position
    • Dominant colors
    • Etc.
• Query By Keyword
  • Each image is described by textual attributes; an index can be built on each attribute, and queries can be issued in SQL
• Query By Example (QBE)
  • The user shows the system an example picture, and the system decides what to return based on the similarity of each database picture to the example
• Query types
  • Find the image closest to the query example
  • Find the top-k images closest to the query example
  • Find the images within a given distance of the query example
Image database search
Image distance and similarity
1. Color similarity
2. Texture similarity
3. Shape similarity
4. Object & relationship similarity
Color similarity
• Color proportions
  • Ex: R: 20%, G: 50%, B: 30%
• Color histogram
  Dhist(I,Q) = (h(I) - h(Q))^T A (h(I) - h(Q))
  A is a similarity matrix; colors that are very similar should have similarity values close to one.
• Color layout matching: compares each grid square of the query to the corresponding grid square of a potential matching image and combines the results into a single image distance
Color layout
  d_gridded_color(I,Q) = Σ_g d̂_color(C_I(g), C_Q(g))
where C_I(g) represents the color in grid square g of a database image I and C_Q(g) represents the color in the corresponding grid square g of the query image Q. Some suitable representations of color are:
1. Mean
2. Mean and standard deviation
3. Multi-bin histogram
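The quadratic-form histogram distance Dhist(I,Q) = (h(I) - h(Q))^T A (h(I) - h(Q)) defined above is easy to sketch. The 3-bin histograms and the similarity matrix A below are hypothetical values chosen only to show the effect of cross-bin similarity:

```python
def hist_distance(hI, hQ, A):
    # Dhist(I,Q) = (h(I) - h(Q))^T A (h(I) - h(Q))
    d = [a - b for a, b in zip(hI, hQ)]
    n = len(d)
    return sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))

hI = (0.6, 0.3, 0.1)                       # hypothetical 3-bin histograms
hQ = (0.3, 0.4, 0.3)
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[1, 0.5, 0], [0.5, 1, 0.5], [0, 0.5, 1]]   # adjacent bins partly similar

print(round(hist_distance(hI, hQ, identity), 2))  # 0.14: plain squared Euclidean
print(round(hist_distance(hI, hQ, A), 2))         # 0.13: cross-bin terms lower it
```

With A = identity the measure degenerates to squared Euclidean distance; off-diagonal similarity entries make perceptually close colors partially interchangeable.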
Texture similarity
• Pick and click
  Suppose T(I) is a texture description vector, a vector of numbers that summarizes the texture in a given image I (for example, Laws texture energy measures); then the texture distance measure is defined by
  d_pick_and_click(I,Q) = min_{i∈I} || T(i) - T(Q) ||²
• Texture layout
  d_gridded_texture(I,Q) = Σ_g d̂_texture(T_I(g), T_Q(g))
Shape similarity
1. Shape histogram
2. Boundary matching
3. Sketch matching
(Figure: Image-A, Image-B, Image-C)
• Judged by their content, are the three images similar?
  – Color information
  – Position information
Implementation example
Color models
(Figure: the RGB color cube with corners Black=(0,0,0), Red=(1,0,0), Green=(0,1,0), Blue=(0,0,1), Yellow=(1,1,0), Cyan=(0,1,1), Magenta=(1,0,1), White=(1,1,1), versus the HSV color space with hue H at Red 0°, Green 120°, Blue 240°, plus saturation S and value V)
RGB color space vs. HSV color space
• Obtaining color and position information
  – Cut the image into grid cells
  – Find the representative color of each cell
  – Merge adjacent cells with the same color into larger blocks
  – The color, position and shape of the final large blocks are the key information used in later matching.
Similarity comparison
• Which factors might a user consider when judging whether two images are similar?
  • Color layout
  • Color distribution
  • Object position
  • Object size
  • Object shape
Shape ratio:
  Sim_ratio(R1,R2) = 1.0 - |Ratio(R1) - Ratio(R2)| / Max(Ratio(R1), Ratio(R2))
Size:
  Sim_size(R1,R2) = 1.0 - |Size(R1) - Size(R2)| / Max(Size(R1), Size(R2))
Similarity function:
  Sim_region(R1,R2) = W_ratio · Sim_ratio(R1,R2) + W_size · Sim_size(R1,R2) + W_color · Sim_color(R1,R2)
Total query size:
  SIZE = Σ_{i=1..n} Size(QR_i)
Image similarity:
  Sim_image(Q,D) = Σ_{i=1..n} (Size(QR_i) / SIZE) · Sim_region(QR_i, DR_j),  j ∈ {1, ..., m}
  where DR_j is the database region matched to query region QR_i.
Example
• Query image regions: A (size 13), B (size 10), C (size 5); SIZE = 13 + 10 + 5 = 28
• Region similarities against database regions D, E, F, G:

       D     E     F     G
  A  0.92    –   0.81    –
  B    –   0.65  0.59    –
  C    –   0.61    –     –

  Sim_image = (13/28)·0.92 + (10/28)·0.59 + (5/28)·0.61 = 0.747
  (each database region is matched to at most one query region)
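The arithmetic of the example can be checked directly; each query region contributes the Sim_region score of its matched database region, weighted by its share of the total query size.

```python
SIZE = 13 + 10 + 5   # total size of the query regions A, B and C

# per query region: (its size, Sim_region of its matched database region)
matches = {"A": (13, 0.92), "B": (10, 0.59), "C": (5, 0.61)}

sim_image = sum(size / SIZE * sim for size, sim in matches.values())
print(round(sim_image, 3))   # 0.747, as on the slide
```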
• Listen to the following pieces or fragments of music: do you know the names of the songs? Music 1 2 3 4 5 6 7 8 9 10
• How did you recognize the song? If we want a computer to do the same thing for us, how should we design it?
Music databases
Features of music
• Static music information
  e.g. key signature, time signature
• Acoustical features
  e.g. loudness, pitch
• Thematic features
  e.g. melodies, rhythms and chords, such as "sol-sol-sol-mi", "0.5-0.5-0.5-2" and "C-Am-Dm-G7"
• Structural features
  Two basic rules of classical music form: the hierarchical rule and the repetition rule
Feature sampling
• Relative vs. absolute pitch: melody transposition
  – Consider the problems caused by matching on absolute pitch
    • Raising or lowering the key breaks the match
  – Beat sampling has the same problem
    • Extract patterns from complete passages
• Sampling from multiple tracks
Feature encoding
• After extracting the features, encode them appropriately
  – Robust to key transposition
  – Robust to tempo changes
  – Music that sounds similar should map to codes that are close to each other
Example
• Represent a song by the important, repeatedly occurring melodies in it
  – Hierarchical rule: music object → movements → sentences → phrases → figures
  – Repetition rule: e.g. "C6-Ab5-Ab5-C6" and "F6-C6-C6-Eb6"
Repeating patterns: definition
• For a substring X of a sequence of notes S, if X appears more than once in S, we call X a repeating pattern of S. The repeating frequency of the repeating pattern X, denoted freq(X), is the number of appearances of X in S. The length of the repeating pattern X, denoted |X|, is the number of notes in X.
Repeating patterns: example
• "C-D-E-F-C-D-E-C-D-E-F"
RP: repeating pattern; RPF: repeating pattern frequency

  RP   C-D-E-F  C-D-E  D-E-F  C-D  D-E
  RPF     2       3      2     3    3
  RP     E-F      C      D     E    F
  RPF     2       3      3     3    2

Repeating patterns
• Definition of "nontrivial"
  A repeating pattern X is nontrivial if and only if there does not exist another repeating pattern Y such that freq(X) = freq(Y) and X is a substring of Y.
• Example
  Of the 10 RPs above, only "C-D-E-F" and "C-D-E" are nontrivial.
The correlative matrix (1)
• Phrase
• Melody string S = "C6-Ab5-Ab5-C6-C6-Ab5-Ab5-C6-Db5-C6-Bb5-C6"
• Repeating patterns

  RPF  PL (pattern length)  RP
   2          4             C6-Ab5-Ab5-C6
   6          1             C6
   4          1             Ab5
The correlative matrix (2)
(Figure: construction of the 12×12 correlative matrix T for S = "C6-Ab5-Ab5-C6-C6-Ab5-Ab5-C6-Db5-C6-Bb5-C6"; an entry Ti,j extends the diagonal run Ti-1,j-1 when the notes at positions i and j match, so diagonal runs such as 1, 2, 3, 4 mark the repeated substring "C6-Ab5-Ab5-C6")
The correlative matrix (3)
• Find all RPs and their repeating frequencies
  – Define a candidate set CS whose entries have the form (pattern, rep_count, sub_count)
  – CS starts empty; RPs are then computed from T and inserted into CS
  – Since the conditions combine (Ti,j = 1) or (Ti,j > 1) with (T(i+1),(j+1) = 0) or (T(i+1),(j+1) <> 0), there are four cases
The correlative matrix (4)
  – Case 1: (Ti,j = 1) and (T(i+1),(j+1) = 0)
    e.g. T1,4 = 1, T2,5 = 0: insert ("C6", 1, 0) into CS
  – Case 2: (Ti,j = 1) and (T(i+1),(j+1) <> 0)
    e.g. T1,5 = 1, T2,6 = 2: modify ("C6", 1, 0) into ("C6", 2, 1)
  – Case 3: (Ti,j > 1) and (T(i+1),(j+1) <> 0)
    e.g. T2,6 = 2, T3,7 = 3: insert ("C6-Ab5", 1, 1) and ("Ab5", 1, 1) into CS
  – Case 4: (Ti,j > 1) and (T(i+1),(j+1) = 0)
    e.g. T4,8 = 4, T5,9 = 0: insert ("C6-Ab5-Ab5-C6", 1, 0), ("Ab5-Ab5-C6", 1, 1) and ("Ab5-C6", 1, 1) into CS, and change ("C6", 6, 1) into ("C6", 7, 2)
The correlative matrix (5)
  – Computing the repeating frequency RF
    rep_count = 0.5 f (f - 1), i.e. f = (1 + SQRT(1 + 8 · rep_count)) / 2
    For example, ("C6", 15, 1) means the rep_count of C6 is 15, so f = (1 + SQRT(1 + 8·15)) / 2 = 6
    Similarly, the RF of "Ab5" is 4 and the RF of "C6-Ab5-Ab5-C6" is 2.
The string-join approach (1)
• Melody string "C-D-E-F-C-D-E-C-D-E-F"
• Step 1: find all RPs of length 1, recorded as {X, freq(X), (position1, position2, …)}
  Here we find {"C", 3, (1,5,8)}, {"D", 3, (2,6,9)}, {"E", 3, (3,7,10)} and {"F", 2, (4,11)}
The string-join approach (2)
• RPs of length 2 are then obtained by joining (written "⋈") the RPs above.
  For example, to find "C-D": given {"C", 3, (1,5,8)} and {"D", 3, (2,6,9)}, we know "C-D" also occurs at (1,5,8), written {"C", 3, (1,5,8)} ⋈ {"D", 3, (2,6,9)} = {"C-D", 3, (1,5,8)}
The string-join approach (3)
• Likewise
  {"D", 3, (2,6,9)} ⋈ {"E", 3, (3,7,10)} = {"D-E", 3, (2,6,9)}
  {"E", 3, (3,7,10)} ⋈ {"F", 2, (4,11)} = {"E-F", 2, (3,10)}
• RPs of length 4 are obtained by joining RPs of length 2, e.g.
  {"C-D", 3, (1,5,8)} ⋈ {"E-F", 2, (3,10)} = {"C-D-E-F", 2, (1,8)}
The string-join approach (4)
• For length 3: since freq("C-D-E-F") = freq("E-F") = 2, not only "E-F" but also "D-E-F" is trivial (otherwise freq("E-F") would have to exceed 2). And
  {"C-D", 3, (1,5,8)} ⋈ {"D-E", 3, (2,6,9)} = {"C-D-E", 3, (1,5,8)},
  with freq("C-D-E") > freq("C-D-E-F"), so "C-D-E" is nontrivial
• Finally, the nontrivial repeating patterns of this example are "C-D-E-F" and "C-D-E"
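The result above can be checked with a brute-force sketch that counts every repeating substring and then filters out trivial ones per the definition. This is not the string-join optimization itself, just a small verification of the same definitions:

```python
from collections import defaultdict

def nontrivial_repeating_patterns(notes):
    # Count every substring's appearances; keep pattern X unless some other
    # repeating pattern Y contains X with freq(X) == freq(Y) (X is trivial).
    freq = defaultdict(int)
    n = len(notes)
    for i in range(n):
        for j in range(i + 1, n + 1):
            freq[tuple(notes[i:j])] += 1
    rps = {p: f for p, f in freq.items() if f > 1}   # repeating patterns

    def is_sub(x, y):
        return any(y[i:i + len(x)] == x for i in range(len(y) - len(x) + 1))

    return {
        "-".join(p)
        for p, f in rps.items()
        if not any(p != q and f == g and is_sub(p, q) for q, g in rps.items())
    }

melody = "C-D-E-F-C-D-E-C-D-E-F".split("-")
print(sorted(nontrivial_repeating_patterns(melody)))   # ['C-D-E', 'C-D-E-F']
```

The quadratic enumeration is fine for short phrases; the string-join approach exists precisely to avoid it on long melody strings.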
Discussion
• Relative vs. absolute pitch: melody transposition
• Extracting patterns from complete passages
• Conversion between different music formats
• Problem: features that are important but never repeat
Video databases
• Content organization
  – Which parts of the content will users be interested in?
  – How should that content be stored so that queries can be processed efficiently?
  – How should the query language be designed, and how does it differ from traditional SQL?
  – Can the content of a film be extracted automatically?
Video content information
• Objects
  – Descriptions of simple shapes
    • Can be automated
  – Descriptions of meaningful objects
    • People: leading actor, leading actress…
    • Animals: pig, cat, dog…
    • Inanimate objects: suitcase, key…
    • Almost impossible to automate
• Activities
  – Simple descriptions
    • Object movement trajectories
      – How to encode a trajectory into machine-comparable code is an important research topic
    • Can be automated
  – Meaningful behavior descriptions
    • A car accident, man A handing a suitcase to woman B…
    • Must be built on top of the simple descriptions
    • Hard to automate fully
Constructing video content information
• Two kinds of information
  – Static
    • Treat a frame as a picture
    • Apply image retrieval techniques
  – Dynamic
    • Treat consecutive frames as an action
    • Object movement trajectories must be considered
• Static information
• Dynamic information
Preface (cont'd)
• Movement trajectories
• Film analysis
  – Shot
    • A film segment captured by a single continuous camera take
    • The building block of a film
    • Frames within the same shot have similar content
      – A representative frame can be chosen from a shot to stand for the whole shot
  • Shot detection
    – Detect shot boundaries from changes in the color distribution
      » Can be automated, but abrupt color changes may cause false boundaries
    – Current shot segmentation tools achieve very high accuracy (>95%)
  – Scene
    • Composed of multiple shots describing the same event
    • Can serve as the unit of querying
  – Object movement trajectories
    • The trajectory of a moving object can represent an event
Concept-based querying
• Semantic queries can be issued
  – Find the pictures containing sky and sea
  – Find the videos in which an airplane flies across the sky
• Use correlations of low-level feature values to discover the semantics of the media
  – Semi-automatic classification
  – Classification
  – Association pattern mining
• Concepts and semantic networks
Classification
• Goal
  – Predict the class of a data item
• Steps
  – Build a classification model
    • From a training set
  – Evaluate the model's accuracy
    • On testing data
  – Predict classes for new data
TrainingData
NAME RANK YEARS TENUREDMike Assistant Prof 3 noMary Assistant Prof 7 yesBill Professor 2 yesJim Associate Prof 7 yesDave Assistant Prof 6 noAnne Associate Prof 3 no
ClassificationAlgorithms
IF rank = ‘professor’OR years > 6THEN tenured = ‘yes’
Classifier(Model)
Classifier
TestingData
NAME RANK YEARS TENUREDTom Assistant Prof 2 noMerlisa Associate Prof 7 noGeorge Professor 5 yesJoseph Assistant Prof 7 yes
Unseen Data
(Jeff, Professor, 4)
Tenured?
Classification
• Algorithms
  – Decision tree
  – Bayesian belief networks
  – k-nearest neighbor classifier
  – Case-based reasoning
  – Genetic algorithms
  – Rough set approach
  – Fuzzy set approaches
  – Neural networks
Training data set
age income student credit_rating<=30 high no fair<=30 high no excellent31…40 high no fair>40 medium no fair>40 low yes fair>40 low yes excellent31…40 low yes excellent<=30 medium no fair<=30 low yes fair>40 medium yes fair<=30 medium yes excellent31…40 medium no excellent31…40 high yes fair>40 medium no excellent
Decision tree
  age?
  ├─ <=30   → student?
  │           ├─ no  → no
  │           └─ yes → yes
  ├─ 31..40 → yes
  └─ >40    → credit rating?
              ├─ excellent → no
              └─ fair      → yes
Naïve Bayesian network: example

  Outlook   Temperature  Humidity  Windy  Class
  sunny     hot          high      false  N
  sunny     hot          high      true   N
  overcast  hot          high      false  P
  rain      mild         high      false  P
  rain      cool         normal    false  P
  rain      cool         normal    true   N
  overcast  cool         normal    true   P
  sunny     mild         high      false  N
  sunny     cool         normal    false  P
  rain      mild         normal    false  P
  sunny     mild         normal    true   P
  overcast  mild         high      true   P
  overcast  hot          normal    false  P
  rain      mild         high      true   N
P(p) = 9/14
P(n) = 5/14
outlook
P(sunny|p) = 2/9 P(sunny|n) = 3/5
P(overcast|p) = 4/9
P(overcast|n) = 0
P(rain|p) = 3/9 P(rain|n) = 2/5
temperature
P(hot|p) = 2/9 P(hot|n) = 2/5
P(mild|p) = 4/9 P(mild|n) = 2/5
P(cool|p) = 3/9 P(cool|n) = 1/5
humidity
P(high|p) = 3/9 P(high|n) = 4/5
P(normal|p) = 6/9 P(normal|n) = 2/5
windy
P(true|p) = 3/9 P(true|n) = 3/5
P(false|p) = 6/9 P(false|n) = 2/5
Play-tennis example: classifying X• An unseen sample X = <rain, hot, high, false>
• P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9·2/9·3/9·6/9·9/14 = 0.010582
• P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5·2/5·4/5·2/5·5/14 = 0.018286
• Sample X is classified in class n
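The play-tennis classification above follows directly from counting. A minimal naive Bayes sketch over the same table, reproducing the slide's numbers:

```python
data = [  # (outlook, temperature, humidity, windy, class), from the table
    ("sunny", "hot", "high", False, "N"), ("sunny", "hot", "high", True, "N"),
    ("overcast", "hot", "high", False, "P"), ("rain", "mild", "high", False, "P"),
    ("rain", "cool", "normal", False, "P"), ("rain", "cool", "normal", True, "N"),
    ("overcast", "cool", "normal", True, "P"), ("sunny", "mild", "high", False, "N"),
    ("sunny", "cool", "normal", False, "P"), ("rain", "mild", "normal", False, "P"),
    ("sunny", "mild", "normal", True, "P"), ("overcast", "mild", "high", True, "P"),
    ("overcast", "hot", "normal", False, "P"), ("rain", "mild", "high", True, "N"),
]

def classify(x):
    # score(c) = P(c) * prod_i P(x_i | c), all probabilities estimated by counts
    scores = {}
    for c in ("P", "N"):
        rows = [r for r in data if r[-1] == c]
        s = len(rows) / len(data)
        for i, v in enumerate(x):
            s *= sum(1 for r in rows if r[i] == v) / len(rows)
        scores[c] = s
    return max(scores, key=scores.get), scores

label, scores = classify(("rain", "hot", "high", False))
print(label)   # 'N': 0.018286 for n beats 0.010582 for p, as on the slide
```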
Bayesian belief networks
(Figure: belief network over the variables FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay and Dyspnea)

The conditional probability table for the variable LungCancer:

        (FH,S)  (FH,~S)  (~FH,S)  (~FH,~S)
  LC     0.8     0.5      0.7      0.1
  ~LC    0.2     0.5      0.3      0.9
The k-nearest neighbor algorithm
(Figure: a query point xq among labeled '+' and '-' samples; the k nearest neighbors vote on xq's class)
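The idea behind the figure can be sketched in a few lines: the k nearest labeled samples vote on the class of xq. The sample coordinates below are hypothetical values standing in for the figure's '+' and '-' clusters:

```python
from collections import Counter
from math import dist

def knn_classify(samples, xq, k=5):
    # Vote among the k labeled samples nearest to the query point xq.
    nearest = sorted(samples, key=lambda s: dist(s[0], xq))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

samples = [((1, 1), "+"), ((2, 1), "+"), ((1, 2), "+"),
           ((5, 5), "-"), ((6, 5), "-"), ((5, 6), "-"), ((6, 6), "-")]
print(knn_classify(samples, (2, 2), k=3))   # '+'
```

Choosing k odd avoids ties in two-class problems; distance-weighted voting is a common refinement.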
Rough set approach
• Rough sets are used to approximately, or "roughly", define equivalence classes
• A rough set for a given class C is approximated by two sets:
  – a lower approximation (certain to be in C)
  – an upper approximation (cannot be described as not belonging to C)
Fuzzy set approach
Association pattern mining
• Goal
  – Find associations between items or objects
  – An association requires
    • the items to co-occur often enough (support)
    • the conditional probability of co-occurrence to be high enough (confidence)
• Algorithms
  – Apriori algorithm
  – Lattice approach
  – FP-tree
Mining association rules: example

  Transaction ID  Items bought
  2000            A, B, C
  1000            A, C
  4000            A, D
  5000            B, E, F

Min. support 50%, min. confidence 50%

  Frequent itemset  Support
  {A}               75%
  {B}               50%
  {C}               50%
  {A,C}             50%

For rule A => C:
  support = support({A,C}) = 50%
  confidence = support({A,C}) / support({A}) = 66.6%
The Apriori principle: any subset of a frequent itemset must be frequent
Apriori algorithm
• Join step: Ck is generated by joining Lk-1 with itself
• Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
• Pseudo-code:
  Ck: candidate itemsets of size k
  Lk: frequent itemsets of size k

  L1 = {frequent items};
  for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
      increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
  end
  return ∪k Lk;
Example

  Database D:
    TID  Items
    100  1 3 4
    200  2 3 5
    300  1 2 3 5
    400  2 5

  Scan D → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
  L1: {1}:2, {2}:3, {3}:3, {5}:3

  C2: {1 2} {1 3} {1 5} {2 3} {2 5} {3 5}
  Scan D → {1 2}:1, {1 3}:2, {1 5}:1, {2 3}:2, {2 5}:3, {3 5}:2
  L2: {1 3}:2, {2 3}:2, {2 5}:3, {3 5}:2

  C3: {2 3 5}; scan D → {2 3 5}:2, so L3 = { {2 3 5} }
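The join/prune loop above fits in a short Python sketch, run on the same four-transaction database with min_support = 2 transactions (50%):

```python
from itertools import combinations

db = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}
min_sup = 2   # 50% of 4 transactions

def apriori(db, min_sup):
    items = sorted({i for t in db.values() for i in t})
    L = [frozenset([i]) for i in items
         if sum(i in t for t in db.values()) >= min_sup]
    all_frequent, k = list(L), 2
    while L:
        # join step: size-k candidates from the frequent (k-1)-itemsets
        cands = {a | b for a in L for b in L if len(a | b) == k}
        # prune step: every (k-1)-subset must itself be frequent
        cands = {c for c in cands
                 if all(frozenset(s) in L for s in combinations(c, k - 1))}
        L = [c for c in cands if sum(c <= t for t in db.values()) >= min_sup]
        all_frequent += L
        k += 1
    return all_frequent

freq = apriori(db, min_sup)
print(sorted(map(sorted, freq), key=len))   # L1, L2 and L3 = {2, 3, 5}
```

The prune step is exactly the Apriori principle at work: {1, 2, 3} is never counted because its subset {1, 2} is already infrequent.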
FP-tree algorithm
• Compress a large database into a compact data structure, the FP-tree
  – Contains only the data needed for mining association patterns
  – Avoids costly repeated database scans
FP-tree construction
min_support = 0.5

  TID  Items bought              (ordered) frequent items
  100  {f, a, c, d, g, i, m, p}  {f, c, a, m, p}
  200  {a, b, c, f, l, m, o}     {f, c, a, b, m}
  300  {b, f, h, j, o}           {f, b}
  400  {b, c, k, s, p}           {c, b, p}
  500  {a, f, c, e, l, p, m, n}  {f, c, a, m, p}

Steps:
1. Scan the DB once and find the frequent 1-itemsets (single-item patterns)
2. Order frequent items in descending frequency order
3. Scan the DB again and construct the FP-tree
(Figure: the resulting FP-tree — root {} with path f:4 → c:3 → a:3 branching into m:2 → p:2 and b:1 → m:1, a second branch f:4 → b:1, and a separate path c:1 → b:1 → p:1; a header table lists item frequencies f:4, c:4, a:3, b:3, m:3, p:3 with head-of-node links)
Main FP-tree mining process
1) For each node (item) in the FP-tree, build its conditional pattern base
2) For each conditional pattern base, build the conditional FP-tree
3) Repeat the steps above until the FP-tree contains only a single path
Step 1: build the conditional pattern base for each node in the FP-tree
Conditional pattern bases
item cond. pattern base
c f:3
a fc:3
b fca:1, f:1, c:1
m fca:2, fcab:1
p fcam:2, cb:1
Step 2: build the conditional FP-tree for each conditional pattern base
  All frequent patterns concerning m:
    m, fm, cm, am, fcm, fam, cam, fcam
  m-conditional pattern base: fca:2, fcab:1
  m-conditional FP-tree: {} → f:3 → c:3 → a:3
  (Figure: the full FP-tree with the m-conditional FP-tree extracted from it)
Mining frequent patterns by creating conditional pattern bases

  Item  Conditional pattern base   Conditional FP-tree
  c     {(f:3)}                    {(f:3)}|c
  a     {(fc:3)}                   {(f:3, c:3)}|a
  b     {(fca:1), (f:1), (c:1)}    Empty
  m     {(fca:2), (fcab:1)}        {(f:3, c:3, a:3)}|m
  p     {(fcam:2), (cb:1)}         {(c:3)}|p
  f     Empty                      Empty
Step 3: recursively mine the conditional FP-tree
  m-conditional FP-tree: {} → f:3 → c:3 → a:3
  Cond. pattern base of "am": (fc:3) → am-conditional FP-tree: {} → f:3 → c:3
  Cond. pattern base of "cm": (f:3) → cm-conditional FP-tree: {} → f:3
  Cond. pattern base of "cam": (f:3) → cam-conditional FP-tree: {} → f:3
Performance analysis
(Figure: run time (sec.) versus support threshold (%) on data set T25I20D10K; FP-growth's run time stays low across thresholds, while Apriori's grows sharply as the support threshold decreases)
Association pattern mining
• Traditional association pattern mining almost always looks for associations between items
• In multimedia applications
  – Mutual-exclusion relationships are also very important
    • They can improve classification accuracy
Concepts and semantic networks
• Concepts
  – The basic notion of knowledge representation
  – Semantic notions of the objects in the world
  – Relationships between concepts
    • Multi-resolution
• Semantic network
  – Nodes
    • Objects, concepts or states
  – Links
    • Relationships between nodes
• References
  – V.S. Subrahmanian, Principles of Multimedia Database Systems, Morgan Kaufmann.
  – C.Y. Tsai, A.L.P. Chen and K. Essig, "Efficient Image Retrieval Approaches for Different Similarity Requirements", Proc. SPIE Conference on Storage and Retrieval for Image and Video Databases, 2000.
  – Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
Content-based interactivity
Paper study: topic 1
A Semantic Modeling Approach for Video Retrieval by Content
Edoardo Ardizzone, Mohand-Said Hacid, ICMCS 1999, July
Introduction
• Using keywords or free text to describe the necessary semantic objects is not sufficient.
• The issues that need to be addressed are:
  1) the representation of video information
  2) the organization of this information
  3) a user-friendly representation
Introduction (cont.)
• We exploit two languages:
  1) one for defining the schema (i.e. the structure)
  2) the other for querying through the schema
• And two layers for representing a video's conceptual content:
  1) the object layer
  2) the schema layer
Two layers for a video's conceptual content
• Object layer: collects objects of interest, their descriptions and the relations among them. Objects in a video sequence are represented as visual entities.
• Schema layer: intended to capture the structure and knowledge needed for video retrieval. Visual entities can be classified into a hierarchical structure.
Schema language: example 1
Query language (QL)
• Querying a DB means retrieving stored objects that satisfy certain conditions or qualifications, and hence are interesting to a user.
• In an OODB, classes are used to represent sets of objects.
QL (cont.)
• Queries are represented as concepts in our abstract language.
• The syntax and semantics of a concept language for making queries
QL example
• "Sequences of movies directed by Kevin Costner in which he is also an actor"
  (Formula: the query as a concept expression over Sequence, film, directedBy, actor and the name "Kevin Costner", together with its set-theoretic semantics)
QL example
• "The set of movies whose directors are also producers of some films"
  Film ⊓ (directedBy . producedBy)
  (Formula: the corresponding set-theoretic semantics over pairs (d, d'))
Semantic annotation of sports video
• A video isn't just a sequence of images: it adds the temporal dimension.
• An approach for semantic annotation of sports videos that covers several different sports and even non-sports content
Introduction --Typical sequence of shots in sports video
Classifying visual shot features
Implementation--Classifying visual shot features
(cont.)
Conclusion
• There is growing interest in video databases and in dealing with access problems.
• One of the central problems in creating robust and scalable systems for manipulating video information lies in representing video content.
Conclusion (cont.)
• This framework is appropriate for supporting conceptual and intensional queries
• It is able to perform exact as well as partial or fuzzy matching
• Some physical features: color, objects' shapes, etc.
Paper study: topic 2
Indexing methods for approximate string matching
IEEE Data Engineering Bulletin, 2000
Gonzalo Navarro, Ricardo Baeza-Yates, Erkki Sutinen, Jorma Tarhio
outline
• Introduction• Basic concepts• Neighborhood generation• Partitioning into exact search• Intermediate partitioning• summarization
Introduction
• Definition
  – Given a long text T1…n of length n and a comparatively short pattern P1…m of length m, both sequences over an alphabet Σ of size σ, find the text positions that match the pattern with at most k "errors".
• Applications
  – Retrieving musical passages similar to a sample
  – Finding DNA subsequences after possible mutations
  – Searching text under the presence of typing or spelling errors
outline
• Introduction• Basic concepts• Neighborhood generation• Partitioning into exact search• Intermediate partitioning• summarization
Suffix trees
  Suffixes of "gaaccgacct":
   1 gaaccgacct
   2 aaccgacct
   3 accgacct
   4 ccgacct
   5 cgacct
   6 gacct
   7 acct
   8 cct
   9 ct
  10 t
  Weak point: large space requirement, about 9 times the text size.
Suffix arrays
  (Figure: the suffixes sorted lexicographically to form the suffix array)
  Require less space, about 4 times the text size.
Q-grams, Q-samples
  TEXT: a b r a c a d a b r a   (positions 1–11)
  INDEX of 4-grams: abra → 1, 8; brac → 2; raca → 3; acad → 4; cada → 5
  Q-samples, unlike q-grams, do not overlap, and there may even be some space between each pair of samples.
Edit distance: ed("SURVEY", "SURGERY")

        S  U  R  G  E  R  Y
     0  1  2  3  4  5  6  7
  S  1  0  1  2  3  4  5  6
  U  2  1  0  1  2  3  4  5
  R  3  2  1  0  1  2  3  4
  V  4  3  2  1  1  2  3  4
  E  5  4  3  2  2  1  2  3
  Y  6  5  4  3  3  2  2  2

  The final result (bottom-right cell) is ed("SURVEY", "SURGERY") = 2.
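The table is filled by the classic dynamic program, where each cell is the minimum over a deletion, an insertion, and a (possibly free) substitution. A minimal sketch:

```python
def edit_distance(a, b):
    # d[i][j] = minimum number of errors to turn a[:i] into b[:j]
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                     # deletion
                          d[i][j - 1] + 1,                     # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitution
    return d[len(a)][len(b)]

print(edit_distance("SURVEY", "SURGERY"))   # 2, the table's bottom-right cell
```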
outline
• Introduction• Basic concepts• Neighborhood generation• Partitioning into exact search• Intermediate partitioning• summarization
Neighborhood generation
  Pattern: "abc" with 1 error
  Candidates (the k-neighborhood): {*bc, a*c, ab*} ∪ {ab, ac, bc} ∪ {*abc, a*bc, abc*}
  Text: a b r a c a d a b r a → search for exact occurrences of each candidate
  The k-neighborhood (candidate set) could be quite large, so this approach works well for small m and k.
outline
• Introduction• Basic concepts• Neighborhood generation• Partitioning into exact search• Intermediate partitioning• summarization
Partitioning into exact search
  Pattern: "abr" with 1 error; partition the pattern into (k+s) pieces, e.g. {a}, {br}
  Text: a b r a c a d a b r a
  Exact search of the pieces (filtration) finds candidate areas, e.g. {abra}, {abra}, …, and each candidate area is then verified.
  1. For large error levels, the text areas to verify cover almost all of the text.
  2. If s grows, the pieces get shorter and there are more matches to check, but the filter becomes stricter.
outline
• Introduction• Basic concepts• Neighborhood generation• Partitioning into exact search• Intermediate partitioning• summarization
Intermediate partitioning
  Pattern: "abr" with 1 error, partitioned into j pieces; each piece is searched with neighborhood generation allowing floor(k/j) errors, and the candidate areas, e.g. {abra}, {abra}, …, are then verified against the text (a b r a c a d a b r a).
  • j = 2: pieces {a}, {br} (j = k+1 reduces to partitioning into exact search)
  • j = 1: piece {abr}, searched with its 1-error neighborhood
    {*abr, a*br, ab*r, abr*} ∪ {ab, br, ar} ∪ {ab*, *br, a*r}
    (j = 1 reduces to plain neighborhood generation)
  Which j value to use? The search time decreases as j moves from 1 to k+1, but the verification cost grows in the opposite direction.
outline
• Introduction• Basic concepts• Neighborhood generation• Partitioning into exact search• Intermediate partitioning• summarization
summarization
Paper study: topic 3
Lazy Users and Automatic Video Retrieval Tools in (the) Lowlands
The Lowlands Team
CWI (1), TNO (2), University of Amsterdam (3), University of Twente (4), The Netherlands
Jan Baan (2), Alex van Ballegooij (1), Jan Mark Geusebroek (3), Jurgen den Hartog (2), Djoerd Hiemstra (4), Johan List (1), Thijs Westerveld (4), Ioannis Patras (3), Stephan Raaijmakers (2), Cees Snoek (3), Leon Todoran (3), Jeroen Vendrig (3), Arjen P. de Vries (1) and Marcel Worring (3)
Proceedings of the 10th Text REtrieval Conference (TREC), 2001
Outline
• Introduction
• Detector-base processing
• Probabilistic multimedia retrieval
• Interactive experiment
• Lazy users
• Discussion
• Conclusion
Basic key subjects of multimedia databases
• Indexing
  – k-d tree, point quadtree, MX-quadtree, R-tree, suffix tree, TV-tree…
  – Determined by the database designer.
• Similarity
  – No standard.
  – How similar is similar enough is decided by the user.
The user is always lazy!
• Fact 1:
  – Almost no end user knows anything about "queries".
• Fact 2:
  – What they want may be only a concept that they cannot clearly describe.
• Fact 3:
  – Users like selection, not questioning.
Introduction
• Use two complementary automatic approaches:
  – Visual content
  – Transcript
• The experiments focus on revealing relationships between:
  – Different modalities
  – The amount of human processing
  – The quality of results
Introduction

  Run  Description
  1    Detector-based, automatic
  2    Combined 1-3, automatic
  3    Transcript-based, automatic
  4    Query articulation, interactive
  5    Combined 1-4, interactive, by a lazy user
Detector-based processing
(Figure: architecture of the automatic system)
• Detectors for exact queries yield a yes/no answer depending on whether a set of predicates is satisfied.
• Detectors for approximate queries yield a measure that expresses how similar the material is.
Detector-based processing (cont.)
  Selected detectors → analysis of the topic description → query by example → filter out irrelevant material → final ranked results
Detectors
• Camera technique detection
  – zoom, pan, tilt…
• Face detector
  – no face, 1 face, 2 faces… 5 faces, many faces
• Caption retrieval
  – Text segmentation, OCR, fuzzy string matching
• Monologue detection
  – The shot should contain speech
  – The shot should have a static or unknown camera technique
  – The shot should have a minimum length
• Detectors based on color-invariant features
  – Keyframes stored with color histograms
Probabilistic multimedia retrieval
• We assume our documents are shots from video.
• Models of discrete signals(i.e. text).– Mixture of discrete probability measures
• Models of continuous signals(i.e. image).– Mixture of continuous probability measures
• Using Bayes’ rule:
• If a query consists of several independent parts (e.g. a textual Qt and visual part Qv)
Probabilistic multimedia retrieval
Probabilistic multimedia retrieval
• Hierarchical data model of video
video
shots
scenes scenes
shots
frames frames
Probabilistic multimedia retrieval
• Text retrieval
– Using the Sphinx3 speech recognition system from Carnegie Mellon University
– Input: query keywords
– Retrieval at the shot level
Probabilistic multimedia retrieval
• Image retrieval
– Retrieves the key frames of shots
– Cuts the key frames of each shot into blocks of 8 x 8 pixels
– Performs the Discrete Cosine Transform (DCT), as used in the JPEG compression standard
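The image-retrieval steps above (8 x 8 blocks, DCT as in JPEG) can be sketched as follows. Which coefficients are kept per block is an assumption here; the paper does not specify this on the slide:

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix, the transform used by JPEG."""
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def block_dct_features(gray: np.ndarray, keep: int = 3) -> np.ndarray:
    """Cut a grayscale key frame into 8x8 blocks and DCT each block.
    Keeps the top-left `keep` x `keep` (low-frequency) coefficients per block."""
    C = dct_matrix(8)
    h, w = gray.shape
    feats = []
    for y in range(0, h - h % 8, 8):
        for x in range(0, w - w % 8, 8):
            block = gray[y:y + 8, x:x + 8].astype(float)
            D = C @ block @ C.T          # 2-D DCT-II of the block
            feats.append(D[:keep, :keep].ravel())
    return np.concatenate(feats)
```

For a constant block, only the DC coefficient D[0,0] is non-zero, which is a quick sanity check on the transform.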
Interactive experiments
• Topic lists.– http://www-nlpir.nist.gov/projects/t01v/topicsoverview.html
• Topic 33: White fort
• Topic 19: Lunar rover
• Topic 8: Jupiter
• Topic 25: Star Wars
Topic 33: White fort
Using Run 1:
Any color-based technique worked out well for this query
Example known-item keyframe
Topic 19: Lunar rover
Color-histogramExample
Known-item keyframe
Color-based retrieval technique is not useful in this case
By Run 4 :
Allows the user to make their own world knowledge explicit: in scenes on the moon, the sky is black.
Topic 8: Jupiter
Example
Some correct answers keyframes
At first thought, this query may seem easy to solve.
But it is apparent that the colors differ across photos.
Using three color histograms and their interrelationships.
Color-sets
Topic 25: Star Wars
Example
Some correct answers keyframes
Text retrieval: (if you know the name)
The first filter selects only those images that have sufficient amount of golden content.
Secondly, a set of filters reduces the data set by selecting those images that contain the color sets shown.
R2D2, C3PO
Lazy users
• Lazy users identify result sets instead of correct answers (so our interactive results are not 100% precise).
• The combination strategies used to construct run 5 consisted of:
– Choose the run that looks best
– Concatenate or interleave the top-N from various runs
– Continue with an automatic, seeded search strategy
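The middle combination strategy, interleaving the top-N from various runs, is simple to sketch. The round-robin order and duplicate handling are my assumptions; the slide does not specify them:

```python
def interleave_runs(runs, n):
    """Round-robin interleave ranked result lists, dropping duplicates,
    to produce a combined top-n list from several retrieval runs."""
    out, seen = [], set()
    for rank in range(max(len(r) for r in runs)):
        for r in runs:
            if rank < len(r) and r[rank] not in seen:
                seen.add(r[rank])
                out.append(r[rank])
                if len(out) == n:
                    return out
    return out
```

Interleaving lets a high-precision run and a high-recall run both contribute near the top of the merged list.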
Discussion
• How video retrieval systems should be evaluated.
• The inhomogeneity of the topics– “sailboat on the beach” vs. “yacht on the sea”
• The low quality of the data– photos of Jupiter
• The evaluation measures used
Conclusion
• Our evaluation demonstrates the importance of combining various techniques to analyze the multiple modalities.
• The optimal technique always depends on the query.
• User interaction is still required to decide upon a good strategy.
Paper study : topic 4
VIDEO INDEXING BY MOTION ACTIVITY MAPS
Wei Zeng; Wen Gao; Debin Zhao;
Proceedings of the 2002 International Conference on Image Processing, Volume 1, 2002, pp. 912-915.
Outline
• Introduction
– Motion indexing
• Motion Activity Map (MAM)
• Definition of MAM
• Generation of MAM
• Organization of MAMs
• Experimental results
• Conclusion
Introduction
• To find a video indexing technique which can extract crucial information from videos for efficient visual content-based queries.
• In order to foster content-based indexing and retrieval.
• Video indexing should be based on good feature representations, such as motion features.
• Motion features depict the dynamic content of video and enrich the semantics of videos, e.g. running and flying.
Motion indexing
• Techniques and systems for motion indexing can be categorized into four types:
– Feature-based approach
– Trajectory-based approach
– Semantic-based approach
– Image-based approach
Feature-based approach:
• Computes the motion parameters of a predefined motion model.
• Has been adopted by MPEG-7 (still a draft at the time)
• Example
Trajectory-based approach:
• This approach is often chosen by object-based systems for indexing video.
Semantic-based approach
• Provides semantic events or actions of motion.
• Reference paper: "A Semantic Event-Detection Approach and Its Application to Detecting Hunts in Wildlife Video"
Image-based approach
• Gives synthesized pictures generated from the motion of the video.
• MAM is an image-based approach.
Concepts of MAM(1)
• The motion activity map is an image that accumulates the motion activity on specific grid cells along the temporal axis of a video.
(Figure: motion activity on grid j accumulated along the temporal axis t_i)
Concepts of MAM(2)
• It is an image-based representation of the magnitude and spatial distribution of motion.
• One video clip can generate several MAMs, and all MAMs are organized into a hierarchical tree view according to the structure of the video.
Definition of MAM (1)
• The motion activity map is an image synthesized from the motion vector field, which can be defined as the temporal function X(t), where t = t0, t1, …, tk:

X(t) = { v(i, j, t) | (i, j) ∈ Γ },   v(i, j, t) = ( vx(i, j, t), vy(i, j, t) )

where vx(i, j, t) and vy(i, j, t) are the x-axis and y-axis components of the motion vector on grid (i, j), and Γ is the grid set of the video.
Definition of MAM (2)
• Based on the motion vector field X(t), the motion activity map (MAM) is computed as

M(i, j) = Σ_{t = t0}^{tk} f( v(i, j, t) ),   (i, j) ∈ Γ

where f(v(i, j, t)) is the motion activity measure function on grid (i, j) and Γ is the grid set of the video.
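The MAM definition above accumulates a per-grid activity measure over time. A minimal sketch, assuming f is the motion-vector magnitude (one plausible choice; the paper leaves f as a parameter):

```python
import numpy as np

def motion_activity_map(vector_fields):
    """MAM: accumulate a motion-activity measure f(v(i,j,t)) on each
    grid cell (i, j) over time.  `vector_fields` is an iterable of
    (H, W, 2) arrays holding (vx, vy) for every grid cell at time t."""
    mam = None
    for vf in vector_fields:
        mag = np.hypot(vf[..., 0], vf[..., 1])   # f(v) = |v|
        mam = mag if mam is None else mam + mag
    return mam
```

Cells crossed by moving objects accumulate large values, so the resulting image summarizes where and how strongly motion occurred in the shot.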
Generation of MAM

Video → Temporal video segmentation → Motion vector field → MAM computing → MAM quantization → MAM → MAM spatial segmentation → Region-based MAMs
Demo: video segmentation (Hall sequence)
Organization of MAMs
• A video can be segmented into different shot levels, such as shots and sub-shots, so many MAMs correspond to one video.
• All the MAMs of a video can be organized into a hierarchical tree representing the structure of the video.
Organization of MAMs
Interactive Video Retrieval

Video → Temporal segmentation → MAM computing → Layered spatial segmentation → MAM database → MAM display
Experimental results
(a) Key-frame-based MAM
(b) MAM
(c-f) Region representations of the MAM
Conclusion
• Video → shot → MAM; sub-shot 1 → MAM 1, sub-shot 2 → MAM 2.
• All MAMs can be segmented into region representations.
• To optimize the MAM-based representation, we mark each pixel of the MAM with a specific color according to its relative intensity.
Paper study : topic 5
SOM-Based R*-Tree for Similarity Retrieval
Kun-Seok Oh, Yaokai Feng, Kunihiko Kaneko, Akifumi Makinouchi, Sang-Hyun Bae
Proceedings of the Seventh International Conference on Database Systems for Advanced Applications, 2001
Outline
• Self-Organizing Maps (SOM)
• R*-Tree
• SOM-Based R*-Tree
• Experiments
• Conclusion
Self-Organizing Maps (SOM)
What is a SOM?
1. A SOM provides a mapping from high-dimensional feature vectors onto a two-dimensional space.
2. The mapping preserves the topology of the feature vectors.
3. The map is called the topological feature map; it preserves the mutual relationships (similarity) of the input data in feature space.
4. The vectors contained in each node of the topological feature map are usually called codebook vectors.
• We simulate with 100 neurons arranged in a 10×10 two-dimensional grid. The test input vectors are also two-dimensional, with a probability distribution that is uniform over {−1 ≤ x1 ≤ 1; −1 ≤ x2 ≤ 1}.
Self-Organizing Maps (SOM)
Figure: self-organizing feature map of uniformly distributed data: (a) randomly initialized weight vectors; (b) weight vectors after 50 iterations; (c) weight vectors after 1,000 iterations; (d) weight vectors after 10,000 iterations.
Self-Organizing Maps (SOM)
• The probability distribution of the neurons in the feature map does reflect the probability distribution of the input vectors. Note, however, that the distribution of the data is not reflected linearly in the map.
Data from three Gaussian clusters.
Self-Organizing Maps (SOM)
Self-Organizing Maps (SOM)
SOM Algorithm
1. Initialize the map neurons.
2. Input a feature vector x.
3. Find the winner neuron (BMN: Best-Match Node).
4. Adjust all neurons' weights.
5. Continue from step 2 until no more adjustment occurs.
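The five steps above can be sketched as a minimal SOM trainer. The learning-rate and neighborhood decay schedules are illustrative assumptions; the slides do not specify them:

```python
import numpy as np

def train_som(data, grid=(10, 10), epochs=1000, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal SOM following the slide's steps: init the map, present an
    input vector, find the Best-Match Node, pull the BMN and its grid
    neighbours toward the input, and repeat."""
    rng = np.random.default_rng(seed)
    h, w = grid
    dim = data.shape[1]
    weights = rng.uniform(-1, 1, size=(h, w, dim))          # step 1: init
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                  indexing="ij"), axis=-1)  # grid positions
    for t in range(epochs):
        x = data[rng.integers(len(data))]                   # step 2: input
        d = np.linalg.norm(weights - x, axis=-1)
        bmn = np.unravel_index(np.argmin(d), d.shape)       # step 3: BMN
        frac = t / epochs
        lr = lr0 * (1 - frac)                               # decaying rate
        sigma = sigma0 * (1 - frac) + 0.5                   # shrinking radius
        gdist2 = ((coords - np.array(bmn)) ** 2).sum(-1)
        nb = np.exp(-gdist2 / (2 * sigma ** 2))             # neighbourhood
        weights += lr * nb[..., None] * (x - weights)       # step 4: update
    return weights
```

Because each update is a convex pull toward an input vector, codebook vectors stay inside the data's range while spreading out to mirror its distribution, which is the behavior the uniform-distribution figure illustrates.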
R*-Tree
• The R*-tree improves the performance of the R-tree by modifying the insertion and split algorithms and introducing a forced-reinsertion mechanism.
• The R*-tree was proposed as an index structure for spatial data, such as geographical and CAD data.
R*-Tree
• Each internal node contains an array of (p, MBR) entries, where p is a pointer to a child node of this internal node, and MBR is the minimum bounding rectangle of the child node pointed to by p.
• Each leaf node contains an array of (OID, MBR) entries for spatial objects, where OID is an object identifier and MBR is the minimum bounding rectangle of the object identified by OID.
R*-Tree (cont.)Space of point data
R*-Tree (cont.)Tree access structure
SOM-Based R*-Tree
1. Clustering similar images
– We first generate the topological feature map using the SOM, then generate the BMIL by computing the distance between each feature vector and the codebook vectors of the topological feature map.
– The BMN (best-match node: the node with minimum distance) is chosen from the map nodes:

BMN = argmin_i || FV − CBV_i ||

– Next, the weight vectors are updated.
SOM-Based R*-Tree (cont.)
SOM-Based R*-Tree (cont.)
2. Construction
– In order to construct the R*-tree, we select a CBV (codebook vector) from the topological feature map as an entry.
– If it is an empty node, we select the next codebook vector; otherwise we determine the leaf node into which the codebook vector is inserted.
– A leaf of the SOM-based R*-tree has the following structure:

L: ( E_1, …, E_i, …, E_p )  with  m ≤ p ≤ M,   E_i: ( OID, MBR )
Experiments
• We performed experiments to compare the SOM-based R*-tree with a normal SOM and a normal R*-tree.
• Image database: 40,000 artificial/natural images (stored on local disk)
• Image size: 128×128 pixels
• Performed on: COMPAQ Deskpro (OS: FreeBSD) with 128 MB RAM
Experiments (cont.)
• Feature extraction:
– Uses Haar wavelets to compute the feature vector
– Color space: YIQ space (NTSC transmission primaries)
– Each element of the feature vector represents an average over 32×32 pixels of the original image
– The color feature vector has 48 dimensions (4×4×3, where 3 is the three channels of YIQ space)
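The 48-dimensional color feature described above can be sketched directly. Only the block-averaging step (which is the Haar low-pass part) is shown, and the block layout is an assumption:

```python
import numpy as np

# RGB -> YIQ conversion matrix (NTSC transmission primaries)
RGB2YIQ = np.array([[0.299,  0.587,  0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523,  0.312]])

def color_feature(rgb: np.ndarray) -> np.ndarray:
    """48-d color feature: convert a 128x128 RGB image to YIQ, then
    average each 32x32 block, giving a 4 x 4 x 3 = 48-value vector."""
    yiq = rgb.astype(float) @ RGB2YIQ.T
    blocks = yiq.reshape(4, 32, 4, 32, 3).mean(axis=(1, 3))  # (4, 4, 3)
    return blocks.ravel()                                    # 48 dims
```

Averaging 32×32 pixel blocks keeps coarse color layout while discarding fine detail, which suits similarity search on whole images.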
Experiments (cont.)
• Construction of the SOM-based R*-tree
Experiments (cont.)
Experiments (cont.)
• We experimented with four types of searches:
(I) normal SOM including empty nodes
(II) normal SOM with eliminated empty nodes
(III) normal R*-tree
(IV) SOM-based R*-tree with eliminated empty nodes
Experiments (cont.)
(1) Retrieval from the SOM with empty nodes
(2) Retrieval from the SOM without empty nodes
Experiments (cont.)
Conclusion
• For high-dimensional data, we use a topological feature map and a best-matching-image-list (BMIL) obtained via the learning of a SOM.
• In an experiment, we performed a similarity search using real image data and compared the performance of the SOM-based R*-tree with a normal SOM and R*-tree, based on retrieval time cost.