Semantic video classification based on subtitles and domain terminologies

SEMANTIC VIDEO CLASSIFICATION BASED ON SUBTITLES AND DOMAIN TERMINOLOGIES

FROM: KAMC '07, 1ST INTERNATIONAL WORKSHOP ON KNOWLEDGE ACQUISITION FROM MULTIMEDIA CONTENT

EDITOR: POLYXENI KATSIOULI, VASSILEIOS TSETSOS, STATHES HADJIEFTHYMIADES

Presenter: 蘇鼎文

Advisor: 林熙禎

MOTIVATION

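A new education revolution: students in China can take courses from well-known American universities at home without spending a cent

MOOCs: a new education revolution

Free online education services: Coursera already has 7 million registered students, more than the combined number of university students in the UK and France.

One third of Coursera users come from developing economies.

What is a MOOC? Massive Open Online Course

Rooted in the educational philosophy of open educational resources

Focuses on making e-learning easier for students to access and more sustainable to run

Resources are freely accessible

No limit on the number of students

Advantages of MOOCs

Learning online requires only an Internet connection

Free to share, critique, and browse

Flexible courses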

Free!!

Challenges of MOOCs

Learners easily become confused or lose direction

Requires a self-managed attitude toward learning

Guided Learning

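The paper "Video-sharing educational tool applied to the teaching in renewable energy subjects" showed experimentally that a video learning system can improve students' learning ability and motivation

However, having experts add the videos by hand is time-consuming and cannot be automated

Could YouTube's massive video library help?

Methods for automatic video classification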

Text MetaData

Title, Description, Tags

Entity Extraction from consistent text

A/V Features

Audio and Video signal classification

ideal for games

Less ideal for general content

Video Context

Entities from context

Comments

Web embeds

User engagement

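Problems

For educational videos on YouTube, the text metadata is usually too sparse

Image and audio processing is harder and computationally more expensive

Is there another text-based approach that offers a better solution?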

Subtitle


Abstract

An unsupervised approach to classify video content by analyzing the corresponding subtitles

Based on WordNet and WordNet Domains

Apply natural language processing techniques on video subtitles

INTRODUCTION

semantic information from multimedia content

multimedia databases gain more and more popularity

a critical and challenging topic

explore efficient ways to index their content based on its features and semantics

Subtitles carry information through natural language sentences

may not be able to detect all video semantics, but have several benefits:

more lightweight process than video and audio processing

high-level semantics are more closely related to human language

RELATED WORK

Semantic Video Indexing and Summarization Using Subtitles

partitions the script in segments

represents each one as a term frequency inverse document frequency (TF-IDF) vector

video retrieval and summarization are described through the application of machine learning techniques
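The TF-IDF representation mentioned above can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and the common weighting tf × log(N / df); the cited work may use a different variant, and the function name `tfidf_vectors` and the sample segments are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(segments):
    """Return one {term: weight} dict per segment, with weight = tf * log(N / df)."""
    tokenized = [seg.lower().split() for seg in segments]
    n = len(tokenized)
    # Document frequency: in how many segments each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors

# Hypothetical subtitle segments, for illustration only.
segments = ["the lion hunts the zebra", "the army crossed the river", "the lion sleeps"]
vectors = tfidf_vectors(segments)
```

A term that occurs in every segment (here "the") gets weight 0, while a term unique to one segment gets the highest weight in that segment.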

MUMIS project

use of natural language processing techniques for indexing and searching multimedia content

an approach based on an XML-encoded ontology is applied separately to textual sources of different types and in different languages

combines the annotations extracted from such sources into one integrated, formal description of their content

Semantic principal video shot classification via mixture Gaussian

a framework for semantic classification of educational surgery videos, two phases:

1. video content characterization via principal video shots

2. video classification through a mixture Gaussian model

Content-based Video Classification Using Support Vector Machines

based on low-level features such as color, shape and motion

use a Support Vector Machine (SVM) classifier

to classify them in one of the following class labels: “cartoons”, “commercials”, “cricket”, “football” and “tennis”

Text Classification

Decision trees are one of the most important and successful machine learning techniques

leaves represent classifications

branches correspond to the combinations of attributes that lead to those classifications

In this paper, we compare the proposed method for classification with a decision tree classifier

WORDNET AND WORDNET DOMAINS

WordNet

a large dictionary (or lexical database)

English nouns, verbs, adjectives and adverbs are grouped into sets of synonyms called “synsets”

A synset contains a group of synonymous words or collocations

vs. traditional dictionaries

Traditional dictionaries are arranged alphabetically

WordNet is arranged semantically

EX:

noun synset {base, alkali}

noun synset {basis, base, foundation, fundament, groundwork, cornerstone}

verb synset {establish, base, ground, found}.

semantic relations

Most synsets are connected to other synsets through a number of semantic relations

noun synsets are related through hypernymy (generalization), hyponymy (specialization), holonymy (whole of), and meronymy (part of) relations

semantic relations: Example

artefact: root synset

motorVehicle is a hypernym of motorcar, and motorcar is a hyponym of motorVehicle
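The hypernym and hyponym relations can be pictured as links in a small graph. The sketch below uses a hand-made, abridged fragment (not real WordNet data); the dictionary `HYPERNYM` and both function names are assumptions made for illustration.

```python
# Toy hypernym links (child -> parent). Abridged and hand-made: the real
# WordNet hierarchy has more levels between these synsets.
HYPERNYM = {
    "motorcar": "motorVehicle",
    "motorVehicle": "artefact",
}

def hypernym_chain(synset):
    """Follow hypernym (generalization) links up to the root."""
    chain = []
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        chain.append(synset)
    return chain

def hyponyms(synset):
    """Direct hyponyms (specializations) of a synset."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == synset)

chain = hypernym_chain("motorcar")
children = hyponyms("motorVehicle")
```

Walking upward from motorcar reaches motorVehicle and then the root artefact; walking downward from motorVehicle returns motorcar.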

WordNet domains

augmenting WordNet with domain labels

WordNet synsets are annotated with approximately 200 domain labels

If none of the domain labels is adequate for a specific synset, the label Factotum is assigned to it (almost 35% of synsets)

Example

Fig. 1. Some senses of the word "plant" with their corresponding domains
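A minimal sketch of the domain lookup with the Factotum fallback, assuming a plain dictionary in place of the WordNet Domains resource. The sense identifiers and the labels assigned to them are hypothetical and are not taken from Fig. 1.

```python
# Hypothetical sense identifiers and domain labels, for illustration only.
SYNSET_DOMAINS = {
    "plant#factory": ["industry"],
    "plant#flora": ["botany"],
    "plant#generic": [],  # no adequate domain label
}

def domains_of(synset_id):
    """Return the domain labels of a synset, or Factotum when none applies."""
    return SYNSET_DOMAINS.get(synset_id) or ["factotum"]

flora_domains = domains_of("plant#flora")
generic_domains = domains_of("plant#generic")
```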

SCHEME

Step 1: Text Preprocessing

subtitles are segmented into sentences

POS tagger is applied to the words of each phrase

stop words are removed as they carry no semantics and do not contribute to the understanding of the main text concepts
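The preprocessing step can be sketched as below. This is a simplified illustration: sentences are split with a regular expression, the stop-word list is a small hand-picked set, and POS tagging is left out because it needs an external tagger. The names `STOP_WORDS` and `preprocess` are assumptions.

```python
import re

# Small hand-picked stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "on", "in", "and", "is", "are", "to", "he", "it"}

def preprocess(subtitle_text):
    """Split subtitles into sentences and drop stop words from each one."""
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", subtitle_text)
        if s.strip()
    ]
    return [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]

result = preprocess("He sat on the bank of the river. The river is wide!")
```

For the two sample sentences, only the content words sat, bank, river and river, wide remain.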

Step 2: Keyword Extraction

identify and select only the most important and relevant subtitle words for further classifying the video

implemented the TextRank algorithm

The number of keywords extracted is based on the size of the text

TextRank

completely unsupervised graph-based ranking model

keywords extraction or text summarization

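Works on a voting principle: each word casts a vote for its neighbors, and the weight of that vote depends on how many votes the voter itself has received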

derived from Google’s PageRank algorithm
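A minimal sketch of the voting idea, assuming an unweighted, undirected co-occurrence graph and the PageRank update with damping factor 0.85. The window size, iteration count, and the name `textrank_keywords` are assumptions; the slides do not specify how the authors configured their implementation.

```python
from collections import defaultdict

def textrank_keywords(words, window=2, damping=0.85, iterations=50, top_n=3):
    """Rank words by a PageRank-style score on their co-occurrence graph."""
    # Words that fall inside the same window of `window` consecutive words
    # become neighbours (window=2 links adjacent words only).
    neighbours = defaultdict(set)
    for i, w in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != w:
                neighbours[w].add(other)
                neighbours[other].add(w)
    score = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        # Each word splits its current score evenly among its neighbours.
        score = {
            w: (1 - damping)
            + damping * sum(score[n] / len(neighbours[n]) for n in neighbours[w])
            for w in neighbours
        }
    return sorted(score, key=score.get, reverse=True)[:top_n]

# Hypothetical word sequence after preprocessing.
words = "river bank river water river flood river delta".split()
keywords = textrank_keywords(words)
```

In this toy sequence "river" neighbours every other word, so it collects the most votes and ranks first.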

Step 3: Word Sense Disambiguation

Most words in natural language are characterized by polysemy

Ex:

BANK

bank (financial institution)

sloping riverbank

WSD algorithm

adaptation of Lesk’s algorithm for WSD

Lesk’s algorithm:

based on glosses found in traditional dictionaries

assigns a word the sense whose gloss shares the largest number of words with the glosses of the other words in the context

Extended Lesk algorithm

using WordNet to include the related words’ glosses

through semantic relations, e.g. hyponymy and hypernymy

Related words are easier to find among hypernyms or hyponyms
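The gloss-overlap idea can be sketched as follows. This is a simplified illustration, not the authors' adaptation: glosses are hand-written and abridged, only the context words are extended with related words, and the names `GLOSSES`, `RELATED`, and `lesk_scores` are hypothetical.

```python
STOP = {"a", "of", "the", "that", "and", "be"}

# Hand-written, abridged glosses for illustration; not copied from WordNet.
GLOSSES = {
    "bank#finance": "institution that accepts deposits and lends money",
    "bank#river": "sloping land beside a body of water",
    "river": "large natural stream of water",
    "sit": "be seated",
    "stream": "natural body of running water flowing on land",
}
RELATED = {"river": ["stream"]}  # e.g. a hypernym of the context word

def bag(text):
    return {w for w in text.lower().split() if w not in STOP}

def lesk_scores(senses, context, related=None):
    """Count gloss words shared between each sense and the context glosses."""
    context_bag = set()
    for word in context:
        context_bag |= bag(GLOSSES.get(word, ""))
        for rel in (related or {}).get(word, []):  # extended variant only
            context_bag |= bag(GLOSSES.get(rel, ""))
    return {s: len(bag(GLOSSES[s]) & context_bag) for s in senses}

senses = ["bank#finance", "bank#river"]
plain = lesk_scores(senses, ["sit", "river"])
extended = lesk_scores(senses, ["sit", "river"], RELATED)
best = max(extended, key=extended.get)
```

With the plain variant the river sense of "bank" shares one gloss word with the context; adding the gloss of the related word "stream" raises the overlap to three, which makes the choice more robust.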

Example

he sat on the bank of the river

Lesk’s algorithm (context words)

sit

river

Extended version (related words added)

stream, watercourse

lounge

sprawl

Step 4: WordNet Domains Extraction

derive the domains which these synsets correspond to

calculate the occurrence score of each domain label and sort them in decreasing order.

extract the WordNet domains with the highest occurrence score
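The occurrence scoring in Step 4 can be sketched with a counter. This assumes the score of a domain is simply the number of disambiguated synsets labeled with it, and that ties are broken alphabetically; both are assumptions, as is the name `rank_domains`.

```python
from collections import Counter

def rank_domains(synset_domains):
    """Count domain occurrences and sort them in decreasing order of score."""
    counts = Counter(d for domains in synset_domains for d in domains)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical domain labels of the synsets chosen for four keywords.
ranked = rank_domains([["geography"], ["geography", "history"], ["animals"], ["geography"]])
video_domains = [domain for domain, _ in ranked]  # the list called Wv in the slides
```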

[Figure: each keyword is mapped to a synset and each synset to a WordNet domain (Domain X, Domain X, Domain Y, Domain Z in the example); sorting the domains by occurrence score gives the video's domain list Wv = (Dx, Dy, Dz)]

Step 5: Definition of correspondences between category labels and WordNet domains

choose the most appropriate class label

First, we looked up in WordNet the senses related to each category label

obtained the WordNet domains that correspond to the senses of each category

calculated for each category the occurrence score of each of the derived domains
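A minimal sketch of Step 5, assuming two lookup tables stand in for WordNet and WordNet Domains. The sense identifiers, the domains attached to them, and the function name `category_domains` are hypothetical.

```python
from collections import Counter

# Hypothetical lookup tables standing in for WordNet and WordNet Domains.
CATEGORY_SENSES = {"animals": ["animal#1", "animal#2", "animal#3"]}
SENSE_DOMAINS = {
    "animal#1": ["animals"],
    "animal#2": ["animals", "biology"],
    "animal#3": ["zoology"],
}

def category_domains(label):
    """Ranked domain list of a category label (called Dc' in the slides)."""
    counts = Counter(
        d for sense in CATEGORY_SENSES.get(label, []) for d in SENSE_DOMAINS.get(sense, [])
    )
    return [d for d, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

animals_domains = category_domains("animals")
```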

[Figure: a category label c is mapped to its WordNet senses and each sense to a domain (Dx, Dx, Dy, Dz in the example); the resulting domain set Dc, sorted by occurrence score, gives the ranked list Dc']

Step 6: Category label assignment

compares the top-ranked WordNet domains of each category (Step 5)

with the video’s set of WordNet domains (Step 4)

in order to assign a category label to the video entity

Equation (1)

C is the set of all the category labels

D is the set of all the WordNet domains that correspond to each category label

$D = \bigcup_{c \in C} \{D_c'\}$

[Figure: D collects the ranked domain list Dc' of every category. In the slide's example: c1 = (Dx, Dy, Dz), c2 = (Da, Dc, Db), c3 = (Dx, Dy, Db), cN = (Dc, Dy)]

Equation (2)

check which category c ∈ C satisfies $D_c'[0] = W_v[0]$

classify video v under that category c

If more than one category is a candidate, compare the second elements, and so on
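The assignment rule can be sketched as below. The function name `assign_category` and the domain lists are illustrative assumptions, and returning every remaining candidate when a tie cannot be resolved is a design choice of this sketch, not something the slides specify.

```python
def assign_category(wv, dc):
    """Pick the categories whose ranked domain list agrees with the video's.

    wv: ranked domain list of the video (Wv).
    dc: {category: ranked domain list} (the Dc' of each category).
    """
    # Equation (2): keep the categories whose top domain equals the video's.
    candidates = [c for c in dc if wv and dc[c] and dc[c][0] == wv[0]]
    i = 1
    # Tie-break: compare the second elements, then the third, and so on.
    while len(candidates) > 1 and i < len(wv):
        narrowed = [c for c in candidates if i < len(dc[c]) and dc[c][i] == wv[i]]
        if not narrowed:
            break  # nothing matches at this position; keep the current tie
        candidates = narrowed
        i += 1
    return candidates

# Illustrative ranked domain lists.
dc = {
    "c1": ["Dx", "Dy", "Dz"],
    "c2": ["Da", "Dc", "Db"],
    "c3": ["Dx", "Dy", "Db"],
    "cN": ["Dc", "Dy"],
}
label = assign_category(["Dx", "Dy", "Dz"], dc)
```

Here c1 and c3 both match on the first and second elements, and the third element Dz leaves only c1.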

[Figure: worked example of Equation (2). The video's ranked domains are Wv = (Dx, Dy, Dz). Comparing first elements against the category lists c1 = (Dx, Dy, Dz), c2 = (Da, Dc, Db), c3 = (Dx, Dy, Db), cN = (Dc, Dy) leaves the candidate set Cv = {c1, c3}; the second elements (Dy) still match both; the third element Dz matches only c1, so the video is classified under c1]

EXPERIMENT

Experiment on documentary

36 documentaries, with general documentary types as categories

Geography, History, Animals, Politics…

documentaries are easier to classify:

usually restricted to a specific domain

contain narrative

statistical information

approximately 44% of all the WordNet domains extracted from each video are assigned the label ‘Factotum’

Evaluation

Classification accuracy is the proportion of the classifier’s category assignments that agree with the user’s assignments

the Recall and F-measure performance measures were used to evaluate the classification results for each individual category
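These measures can be computed as in the sketch below. It uses the standard definitions, with the F-measure as the harmonic mean of precision and recall; the labels are made-up sample data, not the experiment's results, and the name `evaluate` is an assumption.

```python
def evaluate(predicted, actual):
    """Return overall accuracy and per-category precision, recall, F-measure."""
    pairs = list(zip(predicted, actual))
    accuracy = sum(p == a for p, a in pairs) / len(pairs)
    per_category = {}
    for c in set(actual) | set(predicted):
        tp = sum(p == c and a == c for p, a in pairs)
        fp = sum(p == c and a != c for p, a in pairs)
        fn = sum(p != c and a == c for p, a in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (
            2 * precision * recall / (precision + recall) if precision + recall else 0.0
        )
        per_category[c] = {"precision": precision, "recall": recall, "f_measure": f_measure}
    return accuracy, per_category

# Made-up labels for four videos.
actual = ["history", "history", "animals", "geography"]
predicted = ["history", "animals", "animals", "geography"]
accuracy, per_category = evaluate(predicted, actual)
```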

Domains and category

Comparison

results were compared to those obtained from the decision tree classifier J4.8 of the WEKA tool

the results obtained are very promising: the proposed method achieved an accuracy of 69.4%

a gap relative to J4.8 is expected, since the proposed method is unsupervised

POLYSEMA Platform

the work has been carried out in the context of the POLYSEMA project

the project develops an end-to-end platform for interactive TV services by exploiting the metadata of the broadcast transmission

the present work is part of the activity “Development of semantics extraction techniques for automatic annotation of audiovisual content”

Three kinds of techniques are currently investigated:

video summarization

domain ontology learning

video classification

CONCLUSION

Look back

an innovative method for unsupervised classification of video content

applying natural language processing techniques on their subtitles

promising experimental results using documentaries, especially given the fact that no training phase is required.

Improvement

video segments & Subtitle Segments

Compare to other text classification algorithms (mainly unsupervised approaches)

define more knowledge domains that are closer to movie classification

keywords extraction algorithm

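Comment

Subtitle-based text mining mostly relies on entity extraction; more recent work also uses MWH (multi-wing harmoniums) and analysis of entities' temporal features

As an unsupervised method, the mapping between categories and domain labels is built statically, so adjusting it dynamically is probably not easy

The method currently assumes a single topic per video, but one video may cover more than one topic. Automating video segmentation may not be easy; is there a way to detect topic shifting?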

Comment

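Online educational resources keep appearing but are usually hard for ordinary people to reach; an integrated system is missing.

If we can understand the semantics of videos, we may be able to build useful applications, for example helping students find supplementary learning materials.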