
Discourse Prosodic Attributes, Boundary Information and Prosodic Highlight

Speaker: Jr-Feng Huang

PI: Chiu-yu Tseng

Phonetics Lab, Institute of Linguistics, Academia Sinica, Taipei, Taiwan

Subproject 5: "Research on Prosodic Attributes and Speech Event Detection"

2011/07/12 Jr-Feng Huang, NGASR 2011 Summer Workshop

Outline

• Research Direction
• Introduction
• Speech materials
• Discourse Prosodic Attributes
• Analysis of prosodic boundary
• Analysis of prosodic highlight
• Findings so far


Research Direction

• Argument
  – In addition to prosody from segmental, lexical, phonological and syntactic levels, discourse prosody is also an intrinsic part of naturally occurring speech which the human ear is sensitive to, and which cannot be pinned down from analysis of sentence prosody, nor entirely by corresponding text transcription. (Tseng, Interspeech 2010)

• Prosody model
  – Discourse structure (DS)
    • Serving to group phrases and utterances to form speech paragraphs and spoken discourse
  – Information structure (IS)
    • Serving to realize information weighting in continuous speech

[Figure: "Abundant Information" diagram relating the segmental, lexical, syntactic and phonological levels, the acoustic parameters duration, F0 and amplitude, and discourse structure and information structure]


Introduction

• Cues of prosody model
  – Discourse structure → Prosodic boundaries
  – Information structure → Prosodic highlight (perceived emphasis)
• Goals:
  – Acoustic attributes and discriminative analysis for prosodic boundaries across genres (Tseng et al, 2008, 2009)
  – Seeing how perceived prosodic highlights can be explained by systematic patterns of genre, discourse structure, information weighting and acoustic manifestations (Tseng et al, 2011)


Speech Materials--Taiwan Mandarin

• Read speech
  – Plain text of 26 discourse pieces by M051 and F051 (CNA) (about 45 and 46 minutes, 160MB)
  – 34 simulated pieces of weather broadcast by M054 and F054 (WB) (about 23 and 27 minutes, 95MB)

• Spontaneous speech
  – NTU DSP lecture by LSL (one male speaker, about 30 minutes) (SpnL/LEC)


Annotations

• Preprocessing
  – Automatic segmental labeling using the HTK, manually spot-checked for phone boundaries.
  – Manual labeling of perceived prosodic boundaries by HPG protocols.
  – Manual labeling of perceived focus and prominence (prosodic highlight)


Annotation Rationale

• Labeling Perceived Boundary Breaks

Break  Definition                Characteristics
B1     normal syllabic boundary  No identifiable pauses.
B2     prosodic word boundary    Followed by a slight change of tone of voice.
B3     prosodic phrase boundary  A clearly perceived pause.
B4     breath group boundary     Clearly heard change of breath.
B5     prosodic group boundary   Final lengthening followed by a complete stop before new paragraph, with change of break.

• Labeling Perceived Prosodic Highlight (emphasis, accent)

Level  Definition
E0     unstressed portions marked by reduced pitch, volume and/or segment reduction
E1     normal pitch, volume with no segmental reduction
E2     higher pitch or louder volume irrespective of speaker's tone of voice or intention
E3     higher pitch or louder volume marked by speaker's tone of voice or intention


Annotations Examples

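[Figure: annotated example utterance "以自有品牌建立起國際品牌形象" ("building an international brand image with one's own brand") with three label layers: phone boundary layer, perceived prosodic boundary layer, perceived prosodic highlight layer]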


Acoustic Features and Methodology

Acoustic features
• Vowel-based F0
• Syllable-based duration
• Vowel-based intensity

Methodology
• Multiple regression model (Tseng et al, 2005)


\[ x_i = \mathrm{factor}_{1,i} + \mathrm{factor}_{2,i} + \dots \]

[Figure: layered regression diagram. Intrinsic attributes are modeled at the SYL layer; high-layer information is modeled at the PW, PPh and BG layers; each layer passes its residues to the next layer]
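The layered idea in the diagram can be illustrated with a minimal sketch, assuming an additive form in which each layer predicts a per-category mean and passes its residue to the next layer. The function name, the toy durations and the labels are hypothetical and are not taken from Tseng et al (2005).

```python
from collections import defaultdict

def fit_layer(values, labels):
    """Predict each value by the mean of its label group; return (means, residues)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, lab in zip(values, labels):
        sums[lab] += v
        counts[lab] += 1
    means = {lab: sums[lab] / counts[lab] for lab in sums}
    residues = [v - means[lab] for v, lab in zip(values, labels)]
    return means, residues

# Hypothetical toy data: syllable durations (ms), one intrinsic label
# (syllable type) and one higher-layer label (position in the prosodic word).
durations = [180.0, 220.0, 200.0, 260.0, 170.0, 230.0]
syl_type = ["ba", "ma", "ba", "ma", "ba", "ma"]
pw_position = ["initial", "initial", "final", "final", "initial", "final"]

# SYL layer: intrinsic attributes explain part of the duration.
syl_means, syl_residues = fit_layer(durations, syl_type)
# PW layer: what the SYL layer left unexplained is modeled by PW position.
pw_means, pw_residues = fit_layer(syl_residues, pw_position)
```

In this toy data the PW layer assigns a positive contribution to PW-final syllables and a negative one to PW-initial syllables, and whatever remains in `pw_residues` would be handed to the next layer in the same way.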

Discourse Prosodic Attributes

• Examples: 3-PPh paragraph (Tseng et al, 2010)

[Figure: normalized F0, normalized duration and normalized intensity by syllable position at the PW, PPh and PG layers, with panels for PG Initial, PG Medial and PG Final]


Prosodic Boundary

• Phrases are not only major and minor phrases
• Acoustic realization of prosodic boundaries
  – Pre-boundary
    • F0 lowering
    • Duration lengthening
    • Intensity decay
  – Boundary pause
  – Post-boundary
    • F0 reset
    • Duration shortening
    • Intensity jump
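One way to quantify these pre- and post-boundary effects is a neighborhood contrast: the value of the syllable just after a boundary minus the value of the syllable just before it. The following is a minimal sketch under that assumption; the function name and the feature values are hypothetical.

```python
def boundary_contrast(values, boundary_index):
    """Post-boundary value minus pre-boundary value.

    `boundary_index` is the index of the first syllable after the boundary.
    """
    if not 0 < boundary_index < len(values):
        raise ValueError("boundary must fall between two syllables")
    return values[boundary_index] - values[boundary_index - 1]

# Hypothetical normalized per-syllable features around one boundary
# located between the third and fourth syllable (boundary_index = 3).
f0 = [0.4, 0.1, -0.6, 0.8, 0.5]
duration = [-0.2, 0.0, 0.9, -0.5, -0.1]
intensity = [0.3, 0.0, -0.7, 0.6, 0.2]

contrasts = {
    "f0": boundary_contrast(f0, 3),                # positive: F0 reset
    "duration": boundary_contrast(duration, 3),    # negative: shortening after lengthening
    "intensity": boundary_contrast(intensity, 3),  # positive: intensity jump
}
```

A positive F0 and intensity contrast together with a negative duration contrast matches the pattern listed above; none of these values depends on the length of the pause itself.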


How Reliable Is Pause Duration? (1/2)

• Across genres, speakers and language
  – systematic pattern by pause duration, i.e. B3<B4<B5

Corpus         B3 (μ/σ)   B4 (μ/σ)   B5 (μ/σ)
RS_CNA_M051P   249/207    520/124    621/113
RS_CNA_F051P   229/140    339/172    394/237
RS_WB_M054     165/145    490/123    555/166
SpnL_LSL       423/429    739/299    1153/498

Pause duration (ms), mean μ / standard deviation σ, by break (B3, B4 and B5) and genre: read speech (RS) CNA and weather broadcast (WB); spontaneous speech (SpnL).
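The ordering claim and the spread of the pause durations can be checked directly from the table values. This is a small sketch that only re-uses the numbers above; the helper names are hypothetical, and the coefficient of variation (σ/μ) is used here as one convenient measure of relative spread, not necessarily the one used in the original analysis.

```python
# Pause duration (ms) as (mean, standard deviation), copied from the table above.
pause_stats = {
    "RS_CNA_M051P": {"B3": (249, 207), "B4": (520, 124), "B5": (621, 113)},
    "RS_CNA_F051P": {"B3": (229, 140), "B4": (339, 172), "B5": (394, 237)},
    "RS_WB_M054": {"B3": (165, 145), "B4": (490, 123), "B5": (555, 166)},
    "SpnL_LSL": {"B3": (423, 429), "B4": (739, 299), "B5": (1153, 498)},
}

def means_are_ordered(stats):
    """True if mean pause duration satisfies B3 < B4 < B5."""
    return stats["B3"][0] < stats["B4"][0] < stats["B5"][0]

def coefficient_of_variation(mean, sd):
    """Standard deviation relative to the mean (unitless)."""
    return sd / mean

ordered = {name: means_are_ordered(s) for name, s in pause_stats.items()}
cv = {
    name: {b: round(coefficient_of_variation(*s[b]), 2) for b in ("B3", "B4", "B5")}
    for name, s in pause_stats.items()
}
```

With these figures the means are ordered B3 < B4 < B5 in every row, while the relative spread is largest for B3 in every row (about 0.83 for RS_CNA_M051P and about 1.01 for SpnL_LSL).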


How Reliable Is Pause Duration? (2/2)

• B3 (PPh) boundaries vary a great deal
• Pause duration: not reliable
• How is the PPh boundary B3 perceived?
  – (Tseng et al, 2009)

[Figure: distribution of pause duration of discourse boundary breaks B2, B3 and B4 in read speech (RS) CNA for speakers F051P (left) and M051P (right); horizontal axis from -0.4 to 6, vertical axis from 0% to 8%]


Comparison of Discourse Boundary Discrimination (Tseng et al, 2009)

• Cross-feature Comparison by Corpus

[Figure: cross-feature comparison of mean value by corpus (LEC, CNA_F051 and CNA_M051 from top to bottom; the horizontal axis represents indexes of feature type; the vertical axis denotes mean value of each feature), with an additional panel labeled "Discrimination: LEC"]


Analysis of Perceived Emphasis Annotations (1/3)

• Distribution of Perceived Emphasis

[Figure: distribution of perceived emphasis, with a panel labeled "Combined Emphasis (E2+E3)"]

Analysis of Perceived Emphasis Annotations (2/3)

• Perceived Emphasis Scale
  – Not only perceived emphasis but syntax constraint


Analysis of Perceived Emphasis Annotations (3/3)

• Distribution of Perceived Emphasis by phrase boundaries
  – LEC: post-boundary = pre-boundary
  – CNA: post-boundary > pre-boundary
  – WB: post-boundary < pre-boundary


Emphasis Loading

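• Why?
  – Estimate information weighting in continuous speech

• Methodology
  – Normalize length of PPh
  – Estimation

\[
\mathrm{Score} =
\begin{cases}
1, & \text{if label} = \mathrm{E1} \\
2, & \text{if label} = \mathrm{E2} \\
3, & \text{if label} = \mathrm{E3}
\end{cases}
\]

[Figure: PPhs of different lengths (for example 5 and 7 syllables) mapped onto a normalized PPh of N positions]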
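The two steps, scoring the labels and normalizing PPh length, can be sketched as follows. This is only an illustration under stated assumptions: the score for E0 is not given on the slide and is set to 0 here, linear interpolation is an assumed way of normalizing length, and the function names and label sequences are hypothetical.

```python
# Scores for E1 to E3 follow the slide; the score for E0 is NOT given there
# and is set to 0 here purely as an assumption.
SCORE = {"E0": 0, "E1": 1, "E2": 2, "E3": 3}

def normalize_length(scores, n):
    """Resample a per-syllable score sequence to n relative positions.

    Uses linear interpolation between neighbouring syllables (an assumed
    normalization; the slide only states that PPh length is normalized).
    """
    if n < 2 or not scores:
        raise ValueError("need n >= 2 and a non-empty PPh")
    if len(scores) == 1:
        return [float(scores[0])] * n
    out = []
    for k in range(n):
        pos = k * (len(scores) - 1) / (n - 1)  # fractional syllable index
        lo = int(pos)
        hi = min(lo + 1, len(scores) - 1)
        frac = pos - lo
        out.append(scores[lo] * (1 - frac) + scores[hi] * frac)
    return out

def emphasis_loading(pphs, n=5):
    """Average normalized score at each relative position over all PPhs."""
    rows = [normalize_length([SCORE[lab] for lab in pph], n) for pph in pphs]
    return [sum(col) / len(rows) for col in zip(*rows)]

# Hypothetical labels for a 5-syllable and a 7-syllable PPh.
pphs = [
    ["E1", "E2", "E1", "E1", "E0"],
    ["E3", "E1", "E1", "E2", "E1", "E1", "E0"],
]
loading = emphasis_loading(pphs, n=5)
```

Because both PPhs are mapped to the same number of relative positions, their scores can be averaged position by position, which is what makes a comparison by relative syllable position possible.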


Results of Emphasis Loading

• Within PPh by Relative Syllable Position

• Within BG and PG by Relative PPh Position

[Figure: emphasis loading plots with panels for Initial PPh, Medial PPh and Final PPh]


Acoustic Characteristics of Prosodic Highlights (1/2)

• Emphasis vs. no-emphasis without considering PPh-positions

[Figure: mean values of acoustic correlates by emphasis/no-emphasis and genres]

• Significant acoustic factors by genres
  – LEC
    • Duration
    • Average F0 (F-ratio=846)
    • F0 range
    • Intensity (F-ratio=873)
  – CNA
    • Average F0 (F-ratio=492)
    • Intensity (F-ratio=364)
  – WB
    • Intensity (F-ratio=196)
    • Duration (F-ratio=170)
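For readers unfamiliar with F-ratios, the sketch below computes a one-way ANOVA F-ratio (between-group mean square over within-group mean square) for one acoustic feature. It assumes the reported values come from this kind of test, which the slides do not state; the function name and the sample values are hypothetical.

```python
def f_ratio(groups):
    """One-way ANOVA F-ratio: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical normalized intensity values for emphasized vs. unemphasized syllables.
emphasized = [0.9, 1.1, 1.0, 1.2]
unemphasized = [0.1, -0.1, 0.0, 0.2]
f_intensity = f_ratio([emphasized, unemphasized])
```

A larger F-ratio means the emphasis and no-emphasis groups differ more relative to the variation inside each group, which is why the factors above can be compared by their F-ratios within a genre.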


Acoustic Characteristics of Perceived Highlights (2/2)

• Emphasis vs. no-emphasis with considering PPh-positions

[Figure: panels for PPh-Initial, PPh-Medial and PPh-Final]

• LEC
  – Duration, Average F0, F0 range and Intensity by all PPh positions
• CNA
  – Average F0 and Intensity by all PPh positions
  – Duration in PPh-Medial position only
• WB
  – Intensity by all PPh positions
  – Duration in PPh-Medial position only


Analysis of Perceived Emphasis by Decision Tree Toolkit

• Why? Evaluating the most significant factors for classification

• Methodology:

• Results:

[Figure: decision trees for CNA, WB and LEC]
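The toolkit used is not named on the slide. As a rough illustration of how a decision tree surfaces the most significant factor, the sketch below ranks features by the information gain of their best single threshold split, the criterion an entropy-based tree applies at its root. All names and data values are hypothetical.

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def best_split_gain(values, labels):
    """Largest information gain over all thresholds on one feature."""
    base = entropy(labels)
    best = 0.0
    # The largest value is skipped so that the right-hand side is never empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        best = max(best, base - weighted)
    return best

# Hypothetical syllables: 1 = perceived emphasis, 0 = no emphasis.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "intensity": [0.9, 0.8, 0.7, 0.2, 0.1, 0.3],  # separates the classes cleanly
    "duration": [0.5, 0.1, 0.6, 0.4, 0.2, 0.7],   # overlaps between classes
}
ranking = sorted(features, key=lambda f: best_split_gain(features[f], labels), reverse=True)
```

In this toy data intensity ranks first because a single threshold separates the two classes completely, so it would be chosen for the root split.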


Discourse Pattern of Emph vs. No-Emph: CNA

[Figure: normalized duration, normalized F0 and normalized intensity by syllable position in the Initial, Medial and Final PPh for CNA, shown in two sets of panels, one of them labeled "Removing emphasis effect"]

Discourse Pattern of Emph vs. No-Emph: WB

[Figure: normalized duration, normalized F0 and normalized intensity by syllable position in the Initial, Medial and Final PPh for WB, shown in two sets of panels, one of them labeled "Removing emphasis effect"]

Discourse Pattern of Emph vs. No-Emph: LEC

[Figure: normalized duration, normalized F0 and normalized intensity by syllable position in the Initial, Medial and Final PPh for LEC, shown in two sets of panels, one of them labeled "Removing emphasis effect"]

Findings

• Prosodic boundary
  – Pause duration could be random
  – Boundary neighborhood contrast is more significant.
• Prosodic highlights
  – Speech mode (genre) related
  – Independent of discourse structure
  – Underlying linguistic structures can be derived
• Future directions
  – Speech technology development could benefit from more understanding of information structure in relation to prosodic highlight.
