
Automatic Key Term Extraction and Summarization from Spoken Course Lectures


Page 1

Speaker: Yun-Nung Chen 陳縕儂
Advisor: Prof. Lin-Shan Lee 李琳山

National Taiwan University

Automatic Key Term Extraction and Summarization from Spoken Course Lectures (課程錄音之自動關鍵用語擷取及摘要)

Page 2

Introduction

Target: extract key terms and summaries from course lectures

Page 3

Introduction

Key Term
- Indexing and retrieval
- The relations between key terms and segments of documents

Summary
- Efficiently understand the document

Both are related to document understanding and semantics of the document; both are "information extraction".

Page 4

Automatic Key Term Extraction


Page 5

Definition
- Key term:
  - Higher term frequency
  - Core content
  - Two types:
    - Keyword, e.g. "語音" (speech)
    - Key phrase, e.g. "語言模型" (language model)


Page 6

Automatic Key Term Extraction

[System flow: original spoken documents (speech signal) from an archive of spoken documents → ASR → ASR transcriptions → Branching Entropy → Feature Extraction → Learning Methods: 1) AdaBoost, 2) Neural Network]


Page 9

Phrase Identification

First, branching entropy is applied to the ASR transcriptions to identify phrases, which become the candidate terms.

Page 10

Key Term Extraction

Then, learning methods extract key terms from the candidates using a set of features, producing the key term list (e.g., "entropy", "acoustic model", ...).


Page 12

Branching Entropy

How to decide the boundary of a phrase?

[Figure: occurrences of "hidden Markov model" in the corpus with varying context words such as "represent", "is", "can", "of", "in"]

- Inside the phrase: the next word is highly predictable.


Page 14

Branching Entropy

How to decide the boundary of a phrase?
- Inside the phrase, few distinct words can follow, so the continuation is predictable.
- At the boundary of the phrase, many different words can follow.
- Branching entropy is defined over these continuations to detect possible boundaries.

Page 15

Branching Entropy

Definition of Right Branching Entropy
- Probability of x_i given X (a word that can follow the string X), estimated from corpus counts: P(x_i | X) = f(X x_i) / f(X)
- Right branching entropy for X: H_r(X) = -Σ_i P(x_i | X) log P(x_i | X)

Page 16

Branching Entropy

Decision of the Right Boundary
- Find the right boundary located between X and x_i where the right branching entropy rises, i.e., where many different words can follow X; an increase in H_r marks a likely phrase boundary.
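To make the definition concrete, the computation can be sketched as follows. This is a minimal Python illustration, not the thesis implementation (which builds these statistics with a PAT tree, as noted later); the rising-entropy test in `is_right_boundary` is the standard heuristic and is an assumption here.

```python
from collections import defaultdict
from math import log

def right_branching_entropy(tokens, max_len):
    """Estimate H_r(X) for every prefix X of length 1..max_len by
    counting which words follow X in the tokenized corpus."""
    followers = defaultdict(lambda: defaultdict(int))
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n):
            followers[tuple(tokens[i:i + n])][tokens[i + n]] += 1
    entropy = {}
    for prefix, nexts in followers.items():
        total = sum(nexts.values())
        # H_r(X) = -sum_i P(x_i|X) log P(x_i|X), with P(x_i|X) = f(X x_i)/f(X)
        entropy[prefix] = -sum((c / total) * log(c / total)
                               for c in nexts.values())
    return entropy

def is_right_boundary(entropy, prefix, extended):
    """Assumed criterion: hypothesize a right boundary between X and x_i
    when the right branching entropy rises for the extended string."""
    return entropy.get(extended, 0.0) > entropy.get(prefix, 0.0)
```

For the running example, one would expect H_r to drop while moving through "hidden Markov ..." and to rise again after "model", suggesting a boundary there.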


Page 19

Branching Entropy

The detected boundaries define the phrase candidates; a PAT tree is used to implement the branching entropy computation efficiently.

Page 20

Feature Extraction

Prosodic, lexical, and semantic features are then extracted for each candidate term.

Page 21

Feature Extraction: Prosodic Features

For each candidate term, prosodic features are computed at its first occurrence. Speakers tend to use longer duration, higher pitch, and higher energy to emphasize key terms.

Duration (I-IV): normalized duration (max, min, mean, range); the duration of each phone (e.g., phone "a") is normalized by the average duration of that phone, giving four values per term
Pitch (I-IV): F0 (max, min, mean, range); higher pitch may represent significant information
Energy (I-IV): energy (max, min, mean, range); higher energy emphasizes important information


Page 26

Feature Extraction: Lexical Features

Well-known lexical features are used for each candidate term:

TF: term frequency
IDF: inverse document frequency
TFIDF: TF × IDF
PoS: the part-of-speech tag
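As an illustration of these lexical features, here is a minimal sketch (hypothetical helper name, not code from the thesis) that computes TF, IDF, and TFIDF for each term of a tokenized document collection; the PoS tag would come from a separate tagger.

```python
import math
from collections import Counter

def lexical_features(docs):
    """docs: list of tokenized documents. Returns, per document, a map
    from term to its (TF, IDF, TFIDF) triple."""
    n_docs = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    features = []
    for doc in docs:
        tf = Counter(doc)               # raw term frequency
        features.append({
            t: (tf[t],
                math.log(n_docs / df[t]),
                tf[t] * math.log(n_docs / df[t]))
            for t in tf
        })
    return features
```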

Page 27

Feature Extraction: Semantic Features

Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Probability
- Key terms tend to focus on limited topics.

[Figure: PLSA graphical model connecting documents D_i to terms t_j through latent topics T_k, with parameters P(T_k | D_i) and P(t_j | T_k); D_i: documents, T_k: latent topics, t_j: terms]

Page 28

Feature Extraction: Semantic Features

Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Probability
- The latent topic probability describes a probability distribution over topics: a key term concentrates on a few topics, while a non-key term spreads more evenly.

LTP (I-III): Latent Topic Probability (mean, variance, standard deviation)

Page 29

Feature Extraction: Semantic Features

Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Significance
- Latent Topic Significance is a within-topic to out-of-topic ratio: the term's within-topic frequency divided by its out-of-topic frequency.

LTS (I-III): Latent Topic Significance (mean, variance, standard deviation)


Page 31

Feature Extraction: Semantic Features

Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Entropy
- Latent Topic Entropy (LTE) is the term's entropy over latent topics: a key term focuses on limited topics and has lower LTE, while a non-key term spreads over topics and has higher LTE.

LTE: term entropy over latent topics

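The LTP and LTE features above can be derived directly from the PLSA posteriors. Below is a minimal NumPy sketch under the assumption that `p_topic_given_term` holds the P(T_k | t_j) values as a topics-by-terms matrix; LTS would additionally require the within-topic and out-of-topic frequency counts.

```python
import numpy as np

def semantic_features(p_topic_given_term):
    """p_topic_given_term: array of shape (n_topics, n_terms) holding
    P(T_k | t_j). Returns the LTP I-III and LTE features per term."""
    p = np.clip(p_topic_given_term, 1e-12, 1.0)
    # LTP I-III: statistics of each term's topic distribution; the
    # variance/std capture how focused the term is on a few topics
    ltp = (p.mean(axis=0), p.var(axis=0), p.std(axis=0))
    # LTE: entropy of each term's topic distribution; key terms
    # concentrate on few topics, so their LTE is lower
    lte = -(p * np.log(p)).sum(axis=0)
    return ltp, lte
```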

Page 33

Learning Methods

Supervised approaches are applied to the extracted features to decide which candidate terms are key terms.

Page 34

Learning Methods
- Adaptive Boosting (AdaBoost)
- Neural Network

Both automatically adjust the weights of the features to train a classifier.
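A sketch of the two learning methods using scikit-learn (an assumed stand-in; the thesis does not specify its toolkit): each candidate term is a feature vector with label 1 if it is a reference key term.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier

def train_key_term_classifiers(X, y):
    """X: candidate-term feature matrix (prosodic + lexical + semantic);
    y: 1 for key terms, 0 otherwise. Hyperparameters are illustrative."""
    ada = AdaBoostClassifier(n_estimators=100).fit(X, y)
    nn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
    return ada, nn
```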

Page 35

Experiments: Automatic Key Term Extraction

Page 36

Experiments

Corpus
- NTU lecture corpus
- Mandarin Chinese embedded with English words
- Single speaker
- 45.2 hours

ASR System
- Bilingual acoustic model with model adaptation [1]
- Language model with adaptation using random forests [2]

Character accuracy (%): Mandarin 78.15, English 53.44, Overall 76.26

[1] Ching-Feng Yeh, "Bilingual Code-Mixed Acoustic Modeling by Unit Mapping and Model Recovery," Master Thesis, 2011.
[2] Chao-Yu Huang, "Language Model Adaptation for Mandarin-English Code-Mixed Lectures Using Word Classes and Random Forests," Master Thesis, 2011.

Page 37

Experiments

Reference Key Terms
- Annotations from 61 students who had taken the course
- If an annotator labeled 150 key terms, each of them received a score of 1/150, and all other terms a score of 0
- Terms are ranked by the sum of the scores given by all annotators
- The top N terms are chosen from the list, where N is the average number of key terms
- N = 154 key terms: 59 key phrases and 95 keywords

Evaluation
- 3-fold cross validation


Page 38

Experiments: Feature Effectiveness
- Neural network for keywords, on ASR transcriptions

F-measure (%) by feature set (Pr: prosodic, Lx: lexical, Sm: semantic):
Pr 20.78, Lx 42.86, Sm 35.63, Pr+Lx 48.15, Pr+Lx+Sm 56.55

- Each feature set alone gives an F1 between roughly 20% and 42%
- Prosodic features and lexical features are additive
- All three feature sets are useful together

Page 39

Experiments: Overall Performance (Keywords & Key Phrases)

F-measure (%) on ASR transcriptions (each score covers key phrases and keywords):
N-Gram + TFIDF (baseline): 23.44
Branching Entropy + TFIDF: 52.60
Branching Entropy + AdaBoost: 57.68
Branching Entropy + Neural Network: 62.70

- Branching entropy performs well

Page 40

Experiments: Overall Performance (Keywords & Key Phrases)

F-measure (%), ASR vs. manual transcriptions:
N-Gram + TFIDF (baseline): ASR 23.44, Manual 32.19
Branching Entropy + TFIDF: ASR 52.60, Manual 55.84
Branching Entropy + AdaBoost: ASR 57.68, Manual 62.39
Branching Entropy + Neural Network: ASR 62.70, Manual 67.31

- Performance on manual transcriptions is slightly better than on ASR transcriptions
- Supervised learning with the neural network gives the best results

Page 41

Automatic Summarization


Page 42

Introduction
- Extractive summary: the important sentences in the document
- Computing the importance of sentences: statistical measure, linguistic measure, confidence score, N-gram score, grammatical structure score
- Ranking sentences by importance and deciding the ratio of the summary

This work proposes a better statistical measure of a term.

Page 43

Statistical Measure of a Term

LTE-Based Statistical Measure (baseline)

Key-Term-Based Statistical Measure
- Considers only key terms (t_i ∈ key) over the latent topics T_{k-1}, T_k, T_{k+1}, ...
- Each term is weighted by its LTS
- Key terms can represent the core content of the document, and their latent topic probabilities can be estimated more accurately

Page 44

Importance of the Sentence
- Original importance: computed from the LTE-based or the key-term-based statistical measure
- New importance: combines the original importance with the similarity to other sentences, since sentences similar to more sentences should get higher importance

Page 45

Random Walk on a Graph
- Idea: sentences similar to more important sentences should be more important; nodes connected to more high-scoring nodes should get higher scores
- Graph construction:
  - Node: a sentence in the document
  - Edge: weighted by the similarity between nodes
- Node score: interpolation of two scores, the normalized original score r(S_i) of sentence S_i and the scores propagated from its neighbors according to the edge weights p(j, i); with F_k(S_i) the score of S_i in the k-th iteration, this reads (assuming an interpolation weight α):
  F_{k+1}(S_i) = (1 - α) r(S_i) + α Σ_j p(j, i) F_k(S_j)

Page 46

Random Walk on a Graph
- Topical similarity between sentences:
  - The edge weight sim(S_i, S_j) from sentence i to sentence j is computed from the latent topic probabilities of the sentences
  - Latent Topic Significance is used in the computation

[Figure: terms t_i, t_j, t_k of sentences S_i and S_j connected through latent topics T_{k-1}, T_k, T_{k+1}, weighted by LTS]

Page 47

Random Walk on a Graph
- Scores of sentences:
  - Converged equation: at convergence, F = (1 - α) r + α P F, where P collects the edge weights p(j, i); this can be written in matrix form as F = P' F, with P' folding the interpolation into a single matrix
  - Solution: the dominant eigenvector of P'
  - The converged scores are integrated with the original importance
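The iteration and its eigenvector solution can be sketched as follows: a minimal NumPy illustration assuming a precomputed sentence-similarity matrix and an interpolation weight alpha (the thesis obtains the converged scores as the dominant eigenvector of P'; power iteration reaches the same fixed point).

```python
import numpy as np

def random_walk_scores(sim, orig, alpha=0.85, n_iter=100):
    """sim: (n, n) sentence-similarity matrix with nonzero column sums;
    orig: original importance scores of the n sentences."""
    p = sim / sim.sum(axis=0, keepdims=True)   # p[i, j] = p(j, i), columns sum to 1
    r = orig / orig.sum()                      # normalized original scores
    f = np.full(len(r), 1.0 / len(r))          # uniform initialization
    for _ in range(n_iter):                    # power iteration:
        f = (1 - alpha) * r + alpha * (p @ f)  # F <- (1 - alpha) r + alpha P F
    return f
```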

Page 48

Experiments: Automatic Summarization

Page 49

Experiments
- Same corpus and ASR system: the NTU lecture corpus
- Reference summaries:
  - Two human-produced reference summaries for each document
  - Sentences ranked from "the most important" to "of average importance"
- Evaluation metrics: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L (longest common subsequence, LCS)
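For reference, ROUGE-N recall can be sketched in a few lines (a simplified illustration, not the official ROUGE toolkit; ROUGE-L instead uses the longest common subsequence):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N recall: clipped n-gram overlap between a
    candidate summary and a single reference, both as token lists."""
    def grams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand, ref = grams(candidate), grams(reference)
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(sum(ref.values()), 1)
```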

Page 50

Evaluation

[Figure: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L scores on ASR transcriptions at 10%, 20%, and 30% summarization ratios, comparing the LTE-based (LTE) and key-term-based (Key) statistical measures]

- The key-term-based statistical measure is helpful.

Page 51

Evaluation

[Figure: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L scores on ASR transcriptions at 10%, 20%, and 30% ratios, comparing LTE vs. LTE + random walk (RW) and Key vs. Key + RW]

- Random walk helps the LTE-based statistical measure.
- Random walk also helps the key-term-based statistical measure.
- Topical similarity can compensate for recognition errors.

Page 52

Evaluation

[Figure: the same comparison (LTE, LTE + RW, Key, Key + RW) at 10%, 20%, and 30% ratios on both ASR and manual transcriptions]

- The key-term-based statistical measure and the random walk with topical similarity are both useful for summarization.

Page 53

Conclusions


Page 54

Conclusions

Automatic Key Term Extraction
- The performance can be improved by:
  - Identifying phrases by branching entropy
  - Using prosodic, lexical, and semantic features together

Automatic Summarization
- The performance can be improved by:
  - The key-term-based statistical measure
  - Random walk with topical similarity, which compensates for recognition errors, gives higher scores to sentences topically similar to more important sentences, and considers all sentences in the document

Page 55

Thanks for your attention! Q & A

Published Papers:
[1] Yun-Nung Chen, Yu Huang, Sheng-Yi Kong, and Lin-Shan Lee, "Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features," in Proceedings of SLT, 2010.
[2] Yun-Nung Chen, Yu Huang, Ching-Feng Yeh, and Lin-Shan Lee, "Spoken Lecture Summarization by Random Walk over a Graph Constructed with Automatically Extracted Key Terms," in Proceedings of InterSpeech, 2011.
