Discourse Prosodic Attributes, Boundary Information and Prosodic Highlight
Speaker: Jr-Feng Huang
PI: Chiu-yu Tseng
Phonetics Lab, Institute of Linguistics, Academia Sinica, Taipei, Taiwan
Subproject 5: "Research on Prosodic Attributes and Speech Event Detection"
2011/07/12 Jr-Feng Huang, NGASR 2011 Summer Workshop
Outline
• Research Direction
• Introduction
• Speech materials
• Discourse Prosodic Attributes
• Analysis of prosodic boundary
• Analysis of prosodic highlight
• Findings so far
Research Direction
• Argument
  – In addition to prosody from segmental, lexical, phonological and syntactic levels; discourse prosody is also an intrinsic part of naturally occurring speech which the human ear is sensitive to, and which cannot be pinned down from analysis of sentence prosody, nor entirely by corresponding text transcription. (Tseng, Interspeech 2010)
• Prosody model
  – Discourse structure (DS)
    • Serving to group phrases and utterances to form speech paragraphs and spoken discourse
  – Information structure (IS)
    • Serving to realize information weighting in continuous speech
[Figure: "Abundant Information". Labels: Segmental, Lexical, Phonological, Syntactic, Discourse Structure, Information Structure; acoustic correlates: Duration, F0, Amplitude.]
Introduction
• Cues of prosody model
  – Discourse structure → Prosodic boundaries
  – Information structure → Prosodic highlight (perceived emphasis)
• Goals:
  – Acoustic attributes and discriminative analysis for prosodic boundaries across genres (Tseng et al., 2008, 2009)
  – Seeing how perceived prosodic highlights can be explained by systematic patterns across genre, discourse structure, information weighting and acoustic manifestations (Tseng et al., 2011)
Speech Materials--Taiwan Mandarin
• Read speech
  – Plain text of 26 discourse pieces by M051 and F051 (CNA) (about 45 and 46 minutes, 160 MB)
  – 34 simulated pieces of weather broadcast by M054 and F054 (WB) (about 23 and 27 minutes, 95 MB)
• Spontaneous speech
  – NTU DSP lecture by LSL (one male speaker, about 30 minutes) (SpnL/LEC)
Annotations
• Preprocessing
  – Automatic segmental labeling using HTK, manually spot-checked for phone boundaries.
  – Manual labeling of perceived prosodic boundaries by HPG protocols.
  – Manual labeling of perceived focus and prominence (prosodic highlight).
Annotation Rationale
• Labeling Perceived Boundary Breaks

Break  Definition                Characteristics
B1     normal syllabic boundary  No identifiable pauses
B2     prosodic word boundary    Before a slight change of tone of voice
B3     prosodic phrase boundary  A clearly perceived pause
B4     breath group boundary     Clearly heard change of breath
B5     prosodic group boundary   Final lengthening followed by a complete stop before a new paragraph, with change of break

• Labeling Perceived Prosodic Highlight (emphasis, accent)

Level  Definition
E0     unstressed portions marked by reduced pitch, volume and/or segment reduction
E1     normal pitch and volume, with no segmental reduction
E2     higher pitch or louder volume, irrespective of speaker's tone of voice or intention
E3     higher pitch or louder volume, marked by speaker's tone of voice or intention
Annotations Examples
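[Figure: annotated example with three label layers: phone boundary layer, perceived prosodic boundary layer, and perceived prosodic highlight layer. Example utterance: "以自有品牌建立起國際品牌形象" ("building up an international brand image with its own brand").]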
Acoustic Features and Methodology
• Acoustic features
  – Vowel-based F0
  – Syllable-based duration
  – Vowel-based intensity
• Methodology
  – Multiple regression model (Tseng et al., 2005)
$$x_i = \mathrm{factor}_{1,i} + \mathrm{factor}_{2,i} + \dots$$
[Figure: layered prosodic hierarchy (BG, PPh, PW, SYL) annotated with "Intrinsic attributes" and "High layer information"; "Residues" are marked at each layer.]
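A minimal sketch of how a layer-by-layer regression with residues might look, for illustration only. It assumes each layer uses a single categorical factor (position within the unit), in which case the least-squares prediction is simply the per-category mean. It also assumes that the residues of one layer are the input to the next, as the "Residues" labels in the figure suggest. The data values, the factor names and the helper `fit_layer` are hypothetical and are not taken from Tseng et al. (2005).

```python
from collections import defaultdict
from statistics import mean


def fit_layer(values, factor):
    """Least-squares fit of values on one categorical factor.

    With a single categorical predictor the fitted value is the mean
    of each category, so no matrix algebra is needed.
    Returns (predictions, residues).
    """
    groups = defaultdict(list)
    for v, f in zip(values, factor):
        groups[f].append(v)
    level_mean = {f: mean(vs) for f, vs in groups.items()}
    predictions = [level_mean[f] for f in factor]
    residues = [v - p for v, p in zip(values, predictions)]
    return predictions, residues


# Hypothetical normalized syllable durations for 8 syllables.
duration = [0.9, 1.1, 1.0, 1.4, 0.8, 1.0, 1.1, 1.7]
# Hypothetical factors: position of the syllable in its PW,
# and position of that PW in its PPh.
pos_in_pw = [1, 2, 1, 2, 1, 2, 1, 2]
pw_in_pph = ["initial", "initial", "final", "final",
             "initial", "initial", "final", "final"]

# Lower layer first; its residues become the input of the next layer
# (an assumption about the layering, see the lead-in).
pw_pred, pw_res = fit_layer(duration, pos_in_pw)
pph_pred, pph_res = fit_layer(pw_res, pw_in_pph)

# The layer contributions plus the final residue add back up
# to the observed value.
rebuilt = [a + b + c for a, b, c in zip(pw_pred, pph_pred, pph_res)]
```

In this sketch each observed value decomposes as PW contribution + PPh contribution + remaining residue, which mirrors the additive form of the regression equation.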
Discourse Prosodic Attributes
• Examples: 3-PPh paragraph (Tseng et al, 2010)
[Figure: normalized F0, normalized duration and normalized intensity by syllable position at the PW, PPh and PG layers; PG-layer panels are split into PG Initial, PG Medial and PG Final.]
Prosodic Boundary
• Phrases are not only major and minor phrases
• Acoustic realization of prosodic boundaries
  – Pre-boundary
    • F0 lowering
    • Duration lengthening
    • Intensity decay
  – Boundary pause
  – Post-boundary
    • F0 reset
    • Duration shortening
    • Intensity jump
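The pre- and post-boundary cues listed above can be turned into a simple contrast measure. The sketch below is illustrative only: the `Syllable` record, the functions `boundary_contrast` and `matches_boundary_pattern`, and all numeric values are hypothetical. It compares the syllable just before a boundary with the one just after it.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    f0: float         # mean vowel F0 (Hz)
    duration: float   # syllable duration (ms)
    intensity: float  # mean vowel intensity (dB)


def boundary_contrast(pre, post):
    """Post-boundary minus pre-boundary value of each feature."""
    return {
        "f0": post.f0 - pre.f0,
        "duration": post.duration - pre.duration,
        "intensity": post.intensity - pre.intensity,
    }


def matches_boundary_pattern(contrast):
    """True when the contrast has the signs listed on the slide:
    F0 reset (up), duration shortening (down), intensity jump (up)."""
    return (contrast["f0"] > 0
            and contrast["duration"] < 0
            and contrast["intensity"] > 0)


# Hypothetical pre- and post-boundary syllables.
pre = Syllable(f0=110.0, duration=260.0, intensity=58.0)
post = Syllable(f0=150.0, duration=180.0, intensity=66.0)
contrast = boundary_contrast(pre, post)
```

A contrast of this kind depends on the syllables on either side of the boundary and not on the length of the pause between them.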
How Reliable Is Pause Duration ? (1/2)
• Across genres, speakers and language
  – Systematic pattern by pause duration, i.e. B3 < B4 < B5

μ / σ (ms)     B3         B4         B5
RS_CNA_M051P   249 / 207  520 / 124  621 / 113
RS_CNA_F051P   229 / 140  339 / 172  394 / 237
RS_WB_M054     165 / 145  490 / 123  555 / 166
SpnL_LSL       423 / 429  739 / 299  1153 / 498

Pause duration (ms) by break (B3, B4 and B5) and genre: read speech (RS) CNA and weather broadcast (WB); spontaneous speech (SpnL).
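The pause-duration figures above can be checked mechanically. The sketch below copies the means and standard deviations from the table and tests two things: whether the means follow B3 < B4 < B5, and whether adjacent break levels overlap within one standard deviation. The one-sigma overlap test is an illustrative heuristic introduced here, not a measure used in the talk, and the function names are hypothetical.

```python
# (mean, standard deviation) of pause duration in ms, copied from the table.
PAUSE_MS = {
    "RS_CNA_M051P": {"B3": (249, 207), "B4": (520, 124), "B5": (621, 113)},
    "RS_CNA_F051P": {"B3": (229, 140), "B4": (339, 172), "B5": (394, 237)},
    "RS_WB_M054":   {"B3": (165, 145), "B4": (490, 123), "B5": (555, 166)},
    "SpnL_LSL":     {"B3": (423, 429), "B4": (739, 299), "B5": (1153, 498)},
}


def means_ordered(stats):
    """True when the mean pause duration increases from B3 to B5."""
    b3, b4, b5 = (stats[b][0] for b in ("B3", "B4", "B5"))
    return b3 < b4 < b5


def one_sigma_overlap(stats, lower, upper):
    """True when mean + sigma of the lower break level reaches
    mean - sigma of the upper break level (illustrative heuristic)."""
    lo_mean, lo_sd = stats[lower]
    up_mean, up_sd = stats[upper]
    return lo_mean + lo_sd >= up_mean - up_sd


ordered = {name: means_ordered(s) for name, s in PAUSE_MS.items()}
b3_b4_overlap = {name: one_sigma_overlap(s, "B3", "B4")
                 for name, s in PAUSE_MS.items()}
b4_b5_overlap = {name: one_sigma_overlap(s, "B4", "B5")
                 for name, s in PAUSE_MS.items()}
```

Under this heuristic the means are ordered for every speaker, yet most adjacent break levels overlap, which illustrates why a mean ordering alone does not make pause duration a dependable cue for individual boundaries.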
How Reliable Is Pause Duration ? (2/2)
• B3 (PPh) boundaries vary a great deal
• Pause duration is not reliable
• How is the PPh boundary B3 perceived?
  – (Tseng et al., 2009)
[Figure: distribution of pause duration of discourse boundary breaks B2, B3 and B4 in read speech (RS) CNA for speakers F051P (left) and M051P (right). Horizontal axis ticks run from -0.4 to 6; vertical axis from 0% to 8%.]
Comparison of Discourse Boundary Discrimination (Tseng et al, 2009)
• Cross-feature comparison by corpus
[Figure: cross-feature comparison of mean value by corpus (LEC, CNA_F051 and CNA_M051 from top to bottom). The horizontal axis represents indexes of feature type; the vertical axis denotes the mean value of each feature.]
Discrimination: LEC
Analysis of Perceived Emphasis Annotations (1/3)
• Distribution of Perceived Emphasis
Combined Emphasis (E2+E3)
Analysis of Perceived Emphasis Annotations (2/3)
• Perceived Emphasis Scale
  – Not only perceived emphasis but also syntactic constraint
Analysis of Perceived Emphasis Annotations (3/3)
• Distribution of Perceived Emphasis by phrase boundaries
  – LEC: post-boundary = pre-boundary
  – CNA: post-boundary > pre-boundary
  – WB: post-boundary < pre-boundary
Emphasis Loading
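• Why?
  – Estimate information weighting in continuous speech
• Methodology
  – Normalize length of PPh
  – Estimation

$$\text{Score} = \begin{cases} 1, & \text{if label} = \text{E1} \\ 2, & \text{if label} = \text{E2} \\ 3, & \text{if label} = \text{E3} \end{cases}$$

[Figure: PPhs of different lengths (shown with 5 and 7 syllables) mapped onto a normalized PPh of length N.]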
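The scoring rule (E1 → 1, E2 → 2, E3 → 3) and the PPh length normalization described above can be sketched as follows. Two points are assumptions made for this example and are not stated on the slide. E0 is scored as 0. Syllable k of an n-syllable PPh is assigned to bin floor(k·N/n), which is one simple way to normalize length. The function name `emphasis_loading` and the label sequences are hypothetical.

```python
# Scores for E1..E3 follow the slide; the score for E0 is an assumption.
SCORE = {"E0": 0, "E1": 1, "E2": 2, "E3": 3}


def emphasis_loading(pphs, n_bins):
    """Mean emphasis score per normalized position.

    pphs   : list of PPhs, each a list of per-syllable emphasis labels
    n_bins : length N of the normalized PPh
    Syllable k (0-based) of an n-syllable PPh falls in bin k * N // n
    (an assumed normalization scheme). Empty bins yield None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for labels in pphs:
        n = len(labels)
        for k, label in enumerate(labels):
            b = k * n_bins // n
            sums[b] += SCORE[label]
            counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical labels for a 5-syllable and a 7-syllable PPh.
pphs = [
    ["E1", "E1", "E2", "E1", "E3"],
    ["E1", "E0", "E1", "E2", "E1", "E1", "E3"],
]
loading = emphasis_loading(pphs, n_bins=5)
```

Averaging per bin lets PPhs of different lengths contribute to the same relative positions, so loading can be compared across phrase-initial, medial and final regions.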
Results of Emphasis Loading
• Within PPh by Relative Syllable Position
• Within BG and PG by Relative PPh Position
[Figure: emphasis loading by relative position; panels for Initial PPh, Medial PPh and Final PPh.]
Acoustic Characteristics of Prosodic Highlights (1/2)
• Emphasis vs. no-emphasis without considering PPh positions
Mean values of acoustic correlates by emphasis/no-emphasis and genres
• Significant acoustic factors by genres
  – LEC
    • Duration
    • Average F0 (F-ratio=846)
    • F0 range
    • Intensity (F-ratio=873)
  – CNA
    • Average F0 (F-ratio=492)
    • Intensity (F-ratio=364)
  – WB
    • Intensity (F-ratio=196)
    • Duration (F-ratio=170)
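For readers unfamiliar with the F-ratio, the sketch below computes it for two groups, assuming the values above come from a one-way ANOVA comparing emphasized and non-emphasized units. The slides do not name the test, so this is an assumption. The intensity values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F: between-group mean square divided by
    within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k = len(groups)          # number of groups
    n = len(all_values)      # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical vowel intensities (dB).
emph = [62.0, 64.0, 66.0]
no_emph = [58.0, 60.0, 62.0]
f_value = f_ratio([emph, no_emph])
```

A larger F-ratio means the difference between group means is large relative to the spread inside each group, which is why the factors with the highest F-ratios are read as the strongest correlates of emphasis in each genre.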
Acoustic Characteristics of Perceived Highlights (2/2)
• Emphasis vs. no-emphasis, considering PPh positions (PPh-Initial, PPh-Medial, PPh-Final)
  – LEC (by all PPh positions)
    • Duration
    • Average F0
    • F0 range
    • Intensity
  – CNA
    • Average F0 and Intensity by all PPh positions
    • Duration in PPh-Medial position only
  – WB
    • Intensity by all PPh positions
    • Duration in PPh-Medial position only
Analysis of Perceived Emphasis by Decision Tree Toolkit
• Why? Evaluating the most significant factors for classification
• Methodology:
• Results:
[Figure: decision trees for CNA, WB and LEC.]
Discourse Pattern of Emph vs. No-Emph: CNA
[Figure: normalized duration, normalized F0 and normalized intensity by syllable position in Initial, Medial and Final PPh, for CNA; two sets of panels.]
Removing emphasis effect
Discourse Pattern of Emph vs. Non-Emph: WB
[Figure: normalized duration, normalized F0 and normalized intensity by syllable position in Initial, Medial and Final PPh, for WB; two sets of panels.]
Removing emphasis effect
Discourse Pattern of Emph vs. Non-Emph: LEC
[Figure: normalized duration, normalized F0 and normalized intensity by syllable position in Initial, Medial and Final PPh, for LEC; two sets of panels.]
Removing emphasis effect
Findings
• Prosodic boundary
  – Pause duration could be random
  – Boundary neighborhood contrast is more significant
• Prosodic highlights
  – Speech mode (genre) related
  – Independent of discourse structure
  – Underlying linguistic structures can be derived
• Future directions
  – Speech technology development could benefit from more understanding of information structure in relation to prosodic highlight.