
Automatic Key Term Extraction and Summarization from Spoken Course Lectures


Page 1

Speaker: Yun-Nung Chen 陳縕儂
Advisor: Prof. Lin-Shan Lee 李琳山

National Taiwan University

Automatic Key Term Extraction and Summarization from Spoken Course Lectures (課程錄音之自動關鍵用語擷取及摘要)

Page 2

Introduction

Target: extract key terms and summaries from course lectures

Page 3

Introduction

Key Term
- Indexing and retrieval
- The relations between key terms and segments of documents

Summary
- Efficiently understand the document

Both are related to document understanding and semantics of the document; both are "information extraction".

Page 4

Automatic Key Term Extraction


Page 5

Definition
- Key term:
  - Higher term frequency
  - Core content
  - Two types:
    - Keyword, e.g. "語音" (speech)
    - Key phrase, e.g. "語言模型" (language model)


Page 6

Automatic Key Term Extraction

[System flow: original spoken documents (speech signal) from an archive of spoken documents → ASR → ASR transcriptions → Branching Entropy → Feature Extraction → Learning Methods: 1) AdaBoost, 2) Neural Network]


Page 9

Phrase Identification

First, branching entropy is applied to the ASR transcriptions to identify phrases, which become the candidate terms.

Page 10

Key Term Extraction

Then, learning methods extract key terms from the candidates using a set of features, producing the key term list (e.g., "entropy", "acoustic model", ...).


Page 12

Branching Entropy

How to decide the boundary of a phrase?

[Figure: occurrences of "hidden Markov model" in the corpus with varying context words such as "represent", "is", "can", "of", "in"]

- Inside the phrase: the next word is highly predictable.


Page 14

Branching Entropy

How to decide the boundary of a phrase?
- Inside the phrase, few distinct words can follow, so the continuation is predictable.
- At the boundary of the phrase, many different words can follow.
- Branching entropy is defined over these continuations to detect possible boundaries.

Page 15

Branching Entropy

Definition of Right Branching Entropy
- Probability of x_i given X (a word that can follow the string X), estimated from corpus counts: P(x_i | X) = f(X x_i) / f(X)
- Right branching entropy for X: H_r(X) = -Σ_i P(x_i | X) log P(x_i | X)

Page 16

Branching Entropy

Decision of the Right Boundary
- Find the right boundary located between X and x_i where the right branching entropy rises, i.e., where many different words can follow X; an increase in H_r marks a likely phrase boundary.
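To make the definition concrete, the computation can be sketched as follows. This is a minimal Python illustration, not the thesis implementation (which builds these statistics with a PAT tree, as noted later); the rising-entropy test in `is_right_boundary` is the standard heuristic and is an assumption here.

```python
from collections import defaultdict
from math import log

def right_branching_entropy(tokens, max_len):
    """Estimate H_r(X) for every prefix X of length 1..max_len by
    counting which words follow X in the tokenized corpus."""
    followers = defaultdict(lambda: defaultdict(int))
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n):
            followers[tuple(tokens[i:i + n])][tokens[i + n]] += 1
    entropy = {}
    for prefix, nexts in followers.items():
        total = sum(nexts.values())
        # H_r(X) = -sum_i P(x_i|X) log P(x_i|X), with P(x_i|X) = f(X x_i)/f(X)
        entropy[prefix] = -sum((c / total) * log(c / total)
                               for c in nexts.values())
    return entropy

def is_right_boundary(entropy, prefix, extended):
    """Assumed criterion: hypothesize a right boundary between X and x_i
    when the right branching entropy rises for the extended string."""
    return entropy.get(extended, 0.0) > entropy.get(prefix, 0.0)
```

For the running example, one would expect H_r to drop while moving through "hidden Markov ..." and to rise again after "model", suggesting a boundary there.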


Page 19

Branching Entropy

The detected boundaries define the phrase candidates; a PAT tree is used to implement the branching entropy computation efficiently.

Page 20

Feature Extraction

Prosodic, lexical, and semantic features are then extracted for each candidate term.

Page 21

Feature Extraction: Prosodic Features

For each candidate term, prosodic features are computed at its first occurrence. Speakers tend to use longer duration, higher pitch, and higher energy to emphasize key terms.

Duration (I-IV): normalized duration (max, min, mean, range); the duration of each phone (e.g., phone "a") is normalized by the average duration of that phone, giving four values per term
Pitch (I-IV): F0 (max, min, mean, range); higher pitch may represent significant information
Energy (I-IV): energy (max, min, mean, range); higher energy emphasizes important information


Page 26

Feature Extraction: Lexical Features

Well-known lexical features are used for each candidate term:

TF: term frequency
IDF: inverse document frequency
TFIDF: TF × IDF
PoS: the part-of-speech tag
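As an illustration of these lexical features, here is a minimal sketch (hypothetical helper name, not code from the thesis) that computes TF, IDF, and TFIDF for each term of a tokenized document collection; the PoS tag would come from a separate tagger.

```python
import math
from collections import Counter

def lexical_features(docs):
    """docs: list of tokenized documents. Returns, per document, a map
    from term to its (TF, IDF, TFIDF) triple."""
    n_docs = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    features = []
    for doc in docs:
        tf = Counter(doc)               # raw term frequency
        features.append({
            t: (tf[t],
                math.log(n_docs / df[t]),
                tf[t] * math.log(n_docs / df[t]))
            for t in tf
        })
    return features
```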

Page 27

Feature Extraction: Semantic Features

Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Probability
- Key terms tend to focus on limited topics.

[Figure: PLSA graphical model connecting documents D_i to terms t_j through latent topics T_k, with parameters P(T_k | D_i) and P(t_j | T_k); D_i: documents, T_k: latent topics, t_j: terms]

Page 28

Feature Extraction: Semantic Features

Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Probability
- The latent topic probability describes a probability distribution over topics: a key term concentrates on a few topics, while a non-key term spreads more evenly.

LTP (I-III): Latent Topic Probability (mean, variance, standard deviation)

Page 29

Feature Extraction: Semantic Features

Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Significance
- Latent Topic Significance is a within-topic to out-of-topic ratio: the term's within-topic frequency divided by its out-of-topic frequency.

LTS (I-III): Latent Topic Significance (mean, variance, standard deviation)


Page 31

Feature Extraction: Semantic Features

Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Entropy
- Latent Topic Entropy (LTE) is the term's entropy over latent topics: a key term focuses on limited topics and has lower LTE, while a non-key term spreads over topics and has higher LTE.

LTE: term entropy over latent topics

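The LTP and LTE features above can be derived directly from the PLSA posteriors. Below is a minimal NumPy sketch under the assumption that `p_topic_given_term` holds the P(T_k | t_j) values as a topics-by-terms matrix; LTS would additionally require the within-topic and out-of-topic frequency counts.

```python
import numpy as np

def semantic_features(p_topic_given_term):
    """p_topic_given_term: array of shape (n_topics, n_terms) holding
    P(T_k | t_j). Returns the LTP I-III and LTE features per term."""
    p = np.clip(p_topic_given_term, 1e-12, 1.0)
    # LTP I-III: statistics of each term's topic distribution; the
    # variance/std capture how focused the term is on a few topics
    ltp = (p.mean(axis=0), p.var(axis=0), p.std(axis=0))
    # LTE: entropy of each term's topic distribution; key terms
    # concentrate on few topics, so their LTE is lower
    lte = -(p * np.log(p)).sum(axis=0)
    return ltp, lte
```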

Page 33

Learning Methods

Supervised approaches are applied to the extracted features to decide which candidate terms are key terms.

Page 34

Learning Methods
- Adaptive Boosting (AdaBoost)
- Neural Network

Both automatically adjust the weights of the features to train a classifier.
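A sketch of the two learning methods using scikit-learn (an assumed stand-in; the thesis does not specify its toolkit): each candidate term is a feature vector with label 1 if it is a reference key term.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier

def train_key_term_classifiers(X, y):
    """X: candidate-term feature matrix (prosodic + lexical + semantic);
    y: 1 for key terms, 0 otherwise. Hyperparameters are illustrative."""
    ada = AdaBoostClassifier(n_estimators=100).fit(X, y)
    nn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
    return ada, nn
```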

Page 35

Experiments: Automatic Key Term Extraction

Page 36

Experiments

Corpus
- NTU lecture corpus
- Mandarin Chinese embedded with English words
- Single speaker
- 45.2 hours

ASR System
- Bilingual acoustic model with model adaptation [1]
- Language model with adaptation using random forests [2]

Character accuracy (%): Mandarin 78.15, English 53.44, Overall 76.26

[1] Ching-Feng Yeh, "Bilingual Code-Mixed Acoustic Modeling by Unit Mapping and Model Recovery," Master Thesis, 2011.
[2] Chao-Yu Huang, "Language Model Adaptation for Mandarin-English Code-Mixed Lectures Using Word Classes and Random Forests," Master Thesis, 2011.

Page 37

Experiments

Reference Key Terms
- Annotations from 61 students who had taken the course
- If an annotator labeled 150 key terms, each of them received a score of 1/150, and all other terms a score of 0
- Terms are ranked by the sum of the scores given by all annotators
- The top N terms are chosen from the list, where N is the average number of key terms
- N = 154 key terms: 59 key phrases and 95 keywords

Evaluation
- 3-fold cross validation


Page 38

Experiments: Feature Effectiveness
- Neural network for keywords, on ASR transcriptions

F-measure (%) by feature set (Pr: prosodic, Lx: lexical, Sm: semantic):
Pr 20.78, Lx 42.86, Sm 35.63, Pr+Lx 48.15, Pr+Lx+Sm 56.55

- Each feature set alone gives an F1 between roughly 20% and 42%
- Prosodic features and lexical features are additive
- All three feature sets are useful together

Page 39

Experiments: Overall Performance (Keywords & Key Phrases)

F-measure (%) on ASR transcriptions (each score covers key phrases and keywords):
N-Gram + TFIDF (baseline): 23.44
Branching Entropy + TFIDF: 52.60
Branching Entropy + AdaBoost: 57.68
Branching Entropy + Neural Network: 62.70

- Branching entropy performs well

Page 40

Experiments: Overall Performance (Keywords & Key Phrases)

F-measure (%), ASR vs. manual transcriptions:
N-Gram + TFIDF (baseline): ASR 23.44, Manual 32.19
Branching Entropy + TFIDF: ASR 52.60, Manual 55.84
Branching Entropy + AdaBoost: ASR 57.68, Manual 62.39
Branching Entropy + Neural Network: ASR 62.70, Manual 67.31

- Performance on manual transcriptions is slightly better than on ASR transcriptions
- Supervised learning with the neural network gives the best results

Page 41

Automatic Summarization


Page 42

Introduction
- Extractive summary: the important sentences in the document
- Computing the importance of sentences: statistical measure, linguistic measure, confidence score, N-gram score, grammatical structure score
- Ranking sentences by importance and deciding the ratio of the summary

This work proposes a better statistical measure of a term.

Page 43

Statistical Measure of a Term

LTE-Based Statistical Measure (baseline)

Key-Term-Based Statistical Measure
- Considers only key terms (t_i ∈ key) over the latent topics T_{k-1}, T_k, T_{k+1}, ...
- Each term is weighted by its LTS
- Key terms can represent the core content of the document, and their latent topic probabilities can be estimated more accurately

Page 44

Importance of the Sentence
- Original importance: computed from the LTE-based or the key-term-based statistical measure
- New importance: combines the original importance with the similarity to other sentences, since sentences similar to more sentences should get higher importance

Page 45

Random Walk on a Graph
- Idea: sentences similar to more important sentences should be more important; nodes connected to more high-scoring nodes should get higher scores
- Graph construction:
  - Node: a sentence in the document
  - Edge: weighted by the similarity between nodes
- Node score: interpolation of two scores, the normalized original score r(S_i) of sentence S_i and the scores propagated from its neighbors according to the edge weights p(j, i); with F_k(S_i) the score of S_i in the k-th iteration, this reads (assuming an interpolation weight α):
  F_{k+1}(S_i) = (1 - α) r(S_i) + α Σ_j p(j, i) F_k(S_j)

Page 46

Random Walk on a Graph
- Topical similarity between sentences:
  - The edge weight sim(S_i, S_j) from sentence i to sentence j is computed from the latent topic probabilities of the sentences
  - Latent Topic Significance is used in the computation

[Figure: terms t_i, t_j, t_k of sentences S_i and S_j connected through latent topics T_{k-1}, T_k, T_{k+1}, weighted by LTS]

Page 47

Random Walk on a Graph
- Scores of sentences:
  - Converged equation: at convergence, F = (1 - α) r + α P F, where P collects the edge weights p(j, i); this can be written in matrix form as F = P' F, with P' folding the interpolation into a single matrix
  - Solution: the dominant eigenvector of P'
  - The converged scores are integrated with the original importance
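The iteration and its eigenvector solution can be sketched as follows: a minimal NumPy illustration assuming a precomputed sentence-similarity matrix and an interpolation weight alpha (the thesis obtains the converged scores as the dominant eigenvector of P'; power iteration reaches the same fixed point).

```python
import numpy as np

def random_walk_scores(sim, orig, alpha=0.85, n_iter=100):
    """sim: (n, n) sentence-similarity matrix with nonzero column sums;
    orig: original importance scores of the n sentences."""
    p = sim / sim.sum(axis=0, keepdims=True)   # p[i, j] = p(j, i), columns sum to 1
    r = orig / orig.sum()                      # normalized original scores
    f = np.full(len(r), 1.0 / len(r))          # uniform initialization
    for _ in range(n_iter):                    # power iteration:
        f = (1 - alpha) * r + alpha * (p @ f)  # F <- (1 - alpha) r + alpha P F
    return f
```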

Page 48

Experiments: Automatic Summarization

Page 49

Experiments
- Same corpus and ASR system: the NTU lecture corpus
- Reference summaries:
  - Two human-produced reference summaries for each document
  - Sentences ranked from "the most important" to "of average importance"
- Evaluation metrics: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L (longest common subsequence, LCS)
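For reference, ROUGE-N recall can be sketched in a few lines (a simplified illustration, not the official ROUGE toolkit; ROUGE-L instead uses the longest common subsequence):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N recall: clipped n-gram overlap between a
    candidate summary and a single reference, both as token lists."""
    def grams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand, ref = grams(candidate), grams(reference)
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(sum(ref.values()), 1)
```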

Page 50

Evaluation

[Figure: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L scores on ASR transcriptions at 10%, 20%, and 30% summarization ratios, comparing the LTE-based (LTE) and key-term-based (Key) statistical measures]

- The key-term-based statistical measure is helpful.

Page 51

Evaluation

[Figure: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L scores on ASR transcriptions at 10%, 20%, and 30% ratios, comparing LTE vs. LTE + random walk (RW) and Key vs. Key + RW]

- Random walk helps the LTE-based statistical measure.
- Random walk also helps the key-term-based statistical measure.
- Topical similarity can compensate for recognition errors.

Page 52

Evaluation

[Figure: the same comparison (LTE, LTE + RW, Key, Key + RW) at 10%, 20%, and 30% ratios on both ASR and manual transcriptions]

- The key-term-based statistical measure and the random walk with topical similarity are both useful for summarization.

Page 53

Conclusions


Page 54

Conclusions

Automatic Key Term Extraction
- The performance can be improved by:
  - Identifying phrases by branching entropy
  - Using prosodic, lexical, and semantic features together

Automatic Summarization
- The performance can be improved by:
  - The key-term-based statistical measure
  - Random walk with topical similarity, which compensates for recognition errors, gives higher scores to sentences topically similar to more important sentences, and considers all sentences in the document

Page 55

Thanks for your attention! Q & A

Published Papers:
[1] Yun-Nung Chen, Yu Huang, Sheng-Yi Kong, and Lin-Shan Lee, "Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features," in Proceedings of SLT, 2010.
[2] Yun-Nung Chen, Yu Huang, Ching-Feng Yeh, and Lin-Shan Lee, "Spoken Lecture Summarization by Random Walk over a Graph Constructed with Automatically Extracted Key Terms," in Proceedings of InterSpeech, 2011.
