Korean Phonology Society (한국음운론학회) May 2, 2015
Spring Joint Conference (춘계공동학술대회) HUFS
A model of lexically-conditioned variation
Jongho Jun
(Seoul National University)
0. Roadmap
Free variation
Lexical variation
Mixed variation
A toy example
Proposal
Learning simulation
Conclusion
Free variation
1. Free variation: A single word has more than one phonetic form.
Examples
a. English t/d deletion
i. los[t] ~ los[ ] (books)
ii. wes[t] ~ wes[ ] (side)
b. Korean n-insertion
i. /com-jak/ [comjak] ~ [comnjak] ‘mothball’
ii. /pɛk-jəl/ [pɛkjəl] ~ [pɛŋnjəl] ‘white heat’
iii. /hoth-ipul/ [hotipul] ~ [honnipul] ‘unlined comforter’
And many others
2. An optimal model of free variation should explain the following:
• Multiple outputs
• Frequency matching
Neither of these can be explained in Standard OT.
3. t/d-deletion in some dialects of American English (from Coetzee & Pater 2008 ms)
Deletion rate (before a consonant as in ‘los[t] books’)
Chicano English 46%
Jamaican English 85%
Philadelphia English 100%
4. Probabilistic OT (or OT-like) theories
• Partially Ordered Constraints (POC: Kiparsky 1993; Anttila 1997 et seq)
• Stochastic OT (Boersma 1997, 1998; Boersma & Hayes 2001)
• Noisy Harmonic Grammar (Boersma & Pater to appear)
• Maximum Entropy Grammar (Goldwater & Johnson 2003)
In discussing how to explain phonological variation, the present study is mainly
concerned with Stochastic OT.
(i) Stochastic OT is adopted by Zuraw (2010) who proposes a very efficient model of
(lexical) variation.
(ii) My model of variation relies heavily on her work.
5. Revising standard OT (in Stochastic OT)
a. The constraint ranking needs to be variable so that multiple outputs can be generated.
An OT analysis of English t/d-deletion
*CT: ‘No t/d occurs after a consonant.’ (Coetzee & Pater 2011)
ranking outcome
(i) *CT > MAX deleting (e.g. los[ ])
(ii) *CT < MAX non-deleting (e.g. los[t])
b. The relative frequency of possible rankings (or grammars) may match that of the
observed variants. Thus, the grammar can do frequency matching.
Chicano English Grammar
relative frequency ranking outcome
(i) 46% *CT > MAX deleting (e.g. los[ ])
(ii) 54% *CT < MAX non-deleting (e.g. los[t])
6. Stochastic OT
a. Constraint evaluation: no difference from standard OT
b. A difference from standard OT: speakers memorize not the constraint ranking itself but
a “score” (ranking value) for each constraint.
E.g. *CT (102) vs. MAX (100)
c. Each time the grammar is used to evaluate a candidate set, the ranking values are
converted to a corresponding ranking.
*CT (102) vs. MAX (100) → *CT > MAX
7. Noisy evaluation: Stochastic OT
a. Before transforming the ranking values into a ranking, each one is perturbed by adding
a (+/-) number, taken from a normal distribution.
b. Due to this noisy evaluation, the constraint ranking may be variable.
E.g. *CT (102-2) vs. MAX (100+1) → *CT < MAX (the opposite ranking)
Multiple outputs can be generated.
c. The distribution of outputs may differ depending on the distance in ranking values
between the conflicting constraints.
e.g.  *CT   MAX   Probability of ranking reversal   t/d deletion
      100   80    low                               more likely
      100   99    high                              less likely
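The relation in (7c) between the distance in ranking values and the chance of a ranking reversal can be sketched numerically. This is only an illustration: the evaluation-noise standard deviation of 2.0 is an assumption (the handout does not state it), and the function names `prob_outranks` and `gap_for_rate` are hypothetical. With independent Gaussian noise on both constraints, the difference of the two noisy values is itself Gaussian with standard deviation sqrt(2) times the noise.

```python
from statistics import NormalDist

NOISE_SD = 2.0  # assumed evaluation noise; not given in the handout


def prob_outranks(value_a, value_b, noise_sd=NOISE_SD):
    """P(a outranks b) when each ranking value gets independent Gaussian noise."""
    difference = NormalDist(0.0, noise_sd * 2 ** 0.5)
    return difference.cdf(value_a - value_b)


def gap_for_rate(rate, noise_sd=NOISE_SD):
    """Ranking-value gap (a minus b) at which a outranks b with probability `rate`."""
    return NormalDist(0.0, noise_sd * 2 ** 0.5).inv_cdf(rate)


# (6b): *CT (102) vs. MAX (100); deletion occurs whenever *CT outranks MAX.
print("P(*CT > MAX), 102 vs. 100:", round(prob_outranks(102, 100), 3))

# (7c): reversal means MAX ends up above *CT.
print("P(reversal), 100 vs. 80:", 1 - prob_outranks(100, 80))
print("P(reversal), 100 vs. 99:", round(1 - prob_outranks(100, 99), 3))

# Gap between *CT and MAX that would yield the deletion rates in (3).
for dialect, rate in [("Chicano English", 0.46), ("Jamaican English", 0.85)]:
    print(dialect, "gap:", round(gap_for_rate(rate), 2))
```

A categorical rate such as the 100% of Philadelphia English corresponds to a gap large enough that reversals practically never occur, so it has no finite solution under this sketch.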
Lexical variation
8. Characteristic patterns of lexical variation
• The pronunciation of a word is fixed.
• Phonologically similar words have different pronunciation patterns.
9. Example: Nasal substitution in Tagalog (Zuraw 2010)
• A prefix-final nasal fuses with a stem-initial obstruent.
[+nasal] + {p/b, t/d, k/g} → [m, n, ŋ]
e.g. maŋ + bigáj → mamigáj
• Words are divided into two groups, substituting and non-substituting.
                       stem                   combined with prefix (maŋ-/paŋ-)
substituting      i.   mag-bigáj ‘give’       ma-migáj ‘to distribute’
                  ii.  buháj ‘life’           ma-muháj ‘to live’
                  iii. pighatɪ́ʔ ‘grief’       pa-mi-mighatɪ́ʔ ‘being in grief’
non-substituting  iv.  buháj ‘life’           pam-buháj ‘vivifying’
                  v.   poʔók ‘district’       pam-poʔók ‘local’
                  vi.  dinɪ́g ‘audible’        pan-dinɪ́g ‘sense of hearing’
10. An optimal model of lexical variation should explain the following:
• The fixed pronunciation of a real word
• Frequency matching
11. Law of frequency matching (Hayes et al. 2009: 826)
Speakers of languages with variable lexical patterns respond stochastically when tested on
such patterns. Their responses aggregately match the lexical frequencies.
12. Frequency matching in Tagalog nasal substitution (Zuraw 2010)
• Nasal substitution in existing words
More likely with stems beginning with voiceless obstruents (voicing effect)
Dictionary data (# of words)
stem-initial   substituting   non-substituting
/p/            253 (96%)      10 (4%)
/b/            177 (64%)      100 (36%)
• Nasal substitution in novel words
In an acceptability judgment test, the results in general reflected lexical
frequencies: p-initial > b-initial (in substitution rate)
Tagalog speakers know the distribution of nasal substitution.
13. Previous OT models of lexical variation
• Zuraw (2010)
• Lexically-indexed constraints (Pater 2000, Coetzee & Pater 2011)
14. Zuraw’s (2010) model: the fixed phonetic form of an existing word.
a. Morphologically complex forms of existing Tagalog words can be stored as such in the
lexicon.
b. UR = SR at least with respect to nasal substitution
conventional UR Proposed UR SR
i. /maŋ-bigáj/ /ma-migáj/ [mamigáj] ‘to distribute’
ii. /paŋ-poʔók/ /pam-poʔók/ [pampoʔók] ‘local’
c. The faithful realization of URs is guaranteed by high-ranking faithfulness constraints.
INTEGRITY-IO: /ma-migáj/ → [mamigáj], not *[mambigáj]
UNIFORMITY-IO: /pam1-p2oʔók/ → [pam1p2oʔók], not *[pam1,2oʔók]
15. Zuraw’s (2010) model: Generalization to novel words
a. Novel words do not have lexical entries. Novel stems are newly combined with
relevant prefixes.
b. The prefixes triggering nasal substitution end in a floating nasal feature, not segment.
/pa[+nas]1-p2…/ → [pam2…]
                     |
                  [+nas]1
c. Nasal substitution may occur, not violating high-ranking UNIFORMITY-IO.
16. Ranking values (from Zuraw 2010, Table 3; definitions added)
112.213 INTEGRITY-IO “No breaking”
112.176 UNIFORMITY-IO “No coalescence”
102.799 *NC̥ “No sequence of a nasal and a voiceless obstruent”
… …
100.038 NOCODA “No Coda consonants”
… …
99.962 *[m “No stem initial [m]”
a. Faithfulness constraints INTEGRITY-IO and UNIFORMITY-IO have highest values. They
guarantee the faithful realization of the listed variant of each lexical item.
b. The remaining lower-ranked constraints (“subterranean” grammar in Zuraw’s
terminology) determine the output variants of novel words.
c. They can encode the frequency of nasal substitution.
17. A crucial aspect of Zuraw’s proposal (p. 419)
…the ranking of the “subterranean” markedness constraints can be learned despite training
data in which all words are pronounced faithfully…
18. cf. An alternative to lexical variation: lexically-indexed faithfulness constraints (Pater 2000;
Coetzee & Pater 2011)
a. Tagalog words are subdivided into two groups depending on whether they undergo
nasal substitution or not.
b. Stems like /bigáj/ belong to a substituting group (Sub), whereas stems like /poʔók/ belong to a
non-substituting group (Non-sub).
c. Relevant faithfulness constraints are group-specific: UNIFORMITY-{Sub} vs.
UNIFORMITY-{Non-sub}.
d. The following ranking can successfully explain the fact that each prefixed word in
Tagalog has a fixed variant.
UNIFORMITY-{Non-sub} >> NOCODA >> UNIFORMITY-{Sub}
e. But, it is hard to explain not only generalization to novel words but also frequency
matching.
Mixed variation
19. Most previous studies on phonological variation were concerned with either free or lexical
variation, somewhat idealizing the observed variation.
a. Free variation: A given rule applies to every target word with equal probability.
b. Lexical variation: For each (potential target) word, a given rule is either always or never
applied.
• At least some cases of free variation turn out to be lexically-conditioned.
A given rule applies to each target word with a probability specific to it.
20. Mixed variation: English t/d-deletion (Coetzee & Pater 2011)
Deletion preference differs depending on the word.
Large differences between individual words:
e.g. mos(t) > hos(t), yeas(t) > feas(t)
Usage frequency does not explain all of the between-word differences.
It seems unavoidable that the likelihood of participation in a variable process is
conditioned to some extent by lexical idiosyncrasy. (Coetzee & Pater 2011)
21. Mixed variation: Korean n-insertion (Jun 2014)
a. Optional n-insertion in Korean: /n/ is optionally inserted at the juncture of two
morphemes when the first morpheme ends with a consonant and the following
morpheme begins with a high front vocoid.
/com-jak/ [comjak] ~ [comnjak] ‘mothball’
b. N-insertion applies to different words with different frequency.
Insertion rates vary greatly across words with /-jak/ ‘medicine’ :
Word Insertion rate (%)
i. /tok-jak/ 0 ‘poison’
ii. /an-jak/ 4.5 ‘eye drops’
iii. /sɛŋ-jak-hak/ 22.7 ‘pharmacognosy’
iv. /com-jak/ 52.3 ‘mothball’
v. /tuthoŋ-jak/ 68.2 ‘headache pill’
vi. /al-jak/ 86.4 ‘pill’
Moreover, the probability of application is relatively fixed for each existing word.
For 43 words adopted in both Kook et al. (2005) and Jun’s (2014) survey, the
correlation was rather high (r (43) = .783, p = 5.625e-10).
This suggests that Seoul Korean speakers know word-specific rates of n-insertion.
22. Frequency matching in Seoul Korean n-insertion (Jun 2014)
a. N-insertion of existing Korean words
e.g. Velar nasal effect: n-insertion is less likely after /ŋ/ than other sonorants.
/…ŋ+j…/ […ŋnj…] < /…m+j…/ […mnj…]
Insertion rate (an acceptability judgment test on existing words)
after /ŋ/ after /m, n, l/
0.38 0.49
b. N-insertion in novel words (e.g., king/some/ten/tall+jucenol)
In an acceptability judgment test on novel words, the results reflected the
relative frequencies of existing words.
after /ŋ/ after /m, n, l/
0.15 0.31
Korean speakers know the relative rate of n-insertion in existing Korean words.
23. An optimal model of mixed variation should be able to explain the following:
• Word-specific rates of existing words: Each existing word has its own rate of
application of a given rule.
• Generalization to novel words
• Frequency matching: The distribution of variants of novel words approximates the
distribution of existing words in aggregate.
24. Previous approaches to lexical variation can hardly explain mixed variation without
modification.
• Lexically-indexed constraints approach (Pater 2000, Coetzee & Pater 2011) with
noisy-evaluation can explain the word-specific application rate of a given process
for existing words.
But, it’s unclear how the grammar with lexically-indexed constraints can
generalize to novel words, while frequency-matching the aggregate distribution
of the process in existing words.
• In Zuraw’s (2010) model, it is difficult to explain the fixed rate of application for each
existing word.
Given that a process applies to most of the target words (though with different
frequency), both variants, undergoing and not undergoing the process, need to be
listed as such in the lexicon.
But, such lexical listing is not sufficient to differentiate words with different rates
of rule application. What needs to be specified for each lexical item is not whether
the rule applies or not, but how often it applies.
A toy example
25. I constructed a toy example of mixed variation on the basis of the real Tagalog nasal
substitution pattern (Zuraw 2010).
a. A mini-lexicon with 4 words
2 /p/-initial stems
2 /b/-initial stems
b. Nasal substitution applies to each word with different frequency.
c. Voicing effect: Average nasal substitution rate is higher with /p/-initial stems than with
/b/-initial stems, although not all words with /p/-initial stems show higher substitution
rates than those with /b/-initial stems.
d. Two dialects: One dialect shows a higher nasal substitution rate on average than the other,
although not all words in the former have higher substitution rates than those in the
latter.
26. Nasal substitution rates of the two dialects (on average)
Zagalog I Zagalog II
All 4 words 70% 50%
2 p-initial stems 80% 60%
2 b-initial stems 60% 40%
27. Data of the two dialects
      conventional              Probability
Word  UR            SR          Zagalog I   Zagalog II
A     /paŋ+pa/      [pampa]     0           0.2
                    [pama]      1           0.8
B     /paŋ+pe/      [pampe]     0.4         0.6
                    [pame]      0.6         0.4
C     /paŋ+bi/      [pambi]     0.2         0.4
                    [pami]      0.8         0.6
D     /paŋ+bu/      [pambu]     0.6         0.8
                    [pamu]      0.4         0.2
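As a quick check, the dialect-level and stem-class averages can be recomputed directly from the per-word probabilities in (27). The sketch below takes the probability of the substituted variant ([pama], [pame], [pami], [pamu]) for each word; the names `RATES` and `average_rate` are hypothetical.

```python
from statistics import mean

# Probability of the substituted variant per word, read off (27).
RATES = {
    "A": ("p", {"Zagalog I": 1.0, "Zagalog II": 0.8}),
    "B": ("p", {"Zagalog I": 0.6, "Zagalog II": 0.4}),
    "C": ("b", {"Zagalog I": 0.8, "Zagalog II": 0.6}),
    "D": ("b", {"Zagalog I": 0.4, "Zagalog II": 0.2}),
}


def average_rate(dialect, stem_initial=None):
    """Mean substitution rate, optionally restricted to /p/- or /b/-initial stems."""
    return mean(
        rates[dialect]
        for initial, rates in RATES.values()
        if stem_initial in (None, initial)
    )


for dialect in ("Zagalog I", "Zagalog II"):
    summary = {
        "all": round(average_rate(dialect), 2),
        "p-initial": round(average_rate(dialect, "p"), 2),
        "b-initial": round(average_rate(dialect, "b"), 2),
    }
    print(dialect, summary)
```

The per-word values also show the overlap described in (25c): word B (/p/-initial) substitutes less often than word C (/b/-initial) in both dialects.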
28. Predictions: Generalization to novel words and frequency matching
Nasal substitution applies to novel words more frequently in Zagalog I than in Zagalog
II.
Nasal substitution applies more frequently to novel words with p-initial stems than to those
with b-initial stems in both dialects.
29. An optimal model should be able to explain …
Word-specific rates of nasal substitution in existing words of Zagalog I, II
Generalization to novel words
Frequency matching
Proposal
30. Multiple lexical listing: All attested surface forms of an existing word, morphologically
complex or not, are listed in the lexicon (cf. Zuraw 2010, Pater et al. 2012).
Zagalog lexicon
Word Forms listed in the lexicon Cf. Surface forms
A /pampa/ [pampa]
/pama/ [pama]
B /pampe/ [pampe]
/pame/ [pame]
C /pambi/ [pambi]
/pami/ [pami]
D /pambu/ [pambu]
/pamu/ [pamu]
31. UR constraints are defined for forms listed in the lexicon.
They require that listed forms be chosen as URs (Pater et al. 2012).
Each variant form listed in the lexicon has its own UR constraint whose ranking value
may reflect speakers’ preference for the given variant, explaining the word-specific rate
of rule application.
UR constraints in Zagalog
UR constraints shorthand
a. i. /pampa/ is the UR of word A. PAMPA
ii. /pama/ is the UR of word A. PAMA
b. i. /pampe/ … word B PAMPE
ii. /pame/ … word B PAME
c. i. /pambi/ … word C PAMBI
ii. /pami/ … word C PAMI
d. i. /pambu/ … word D PAMBU
ii. /pamu/ … word D PAMU
32. Once a listed form is chosen as UR, it may surface as such through high-ranking
faithfulness constraints for forms listed in the lexicon.
Faithfulness constraints for forms listed in the lexicon
INTEGRITY.LIST No breaking for the forms listed in the lexicon.
UNIFORMITY.LIST No coalescence for the forms listed in the lexicon.
A priority in the ranking is given to these special faithfulness constraints, compared to
general faithfulness constraints like INTEGRITY and UNIFORMITY.
Notice that these special constraints are not active for novel words which do not have
corresponding forms listed in the lexicon.
Thus, their surface realization would be subject to the interaction of lower-ranked
constraints including UNIFORMITY and INTEGRITY.
33. Three-way distinction in the representation: word/morpheme vs. UR vs. SR
For candidates with different URs of the same word/morpheme to compete, the
word/morpheme-UR-SR 3-way distinction (Pater et al. 2012) is adopted in candidate
evaluation.
Two words A, B in Zagalog dialects
Word/morpheme UR SR
A pampa pampa
A pama pama
B pampe pampe
B pame pame
UR constraints are responsible for the mapping from word/morpheme to UR.
Faithfulness constraints are responsible for the mapping from UR to SR.
34. How to explain generalization to novel words and frequency matching:
Nasal substitution may generalize to novel words through the interaction of
markedness and general faithfulness constraints which are ranked below faithfulness
constraints for lexically listed forms (like Zuraw’s 2010 “subterranean” grammar).
The ranking values of the lower-ranked constraints encode the aggregate relative
frequency of substitution among existing words.
35. A total constraint set for the analysis of Zagalog nasal substitution
a. Faithfulness for listed items INTEGRITY.LIST
UNIFORMITY.LIST
b. UR PAMPA
PAMA
PAMPE
PAME
PAMBI
PAMI
PAMBU
PAMU
c. Markedness *NC̥
NOCODA
*[m
d. Faithfulness INTEGRITY
UNIFORMITY
Constraints in (a,b) are responsible for nasal (non-)substitution among existing words.
Constraints in (c,d) are responsible for nasal (non-)substitution among novel words.
Learning Simulation
36. The Gradual Learning Algorithm (GLA) Learner for Stochastic OT in OTSoft (Hayes,
Tesar & Zuraw 2013)
37. Options adopted in the simulation
Initial ranking values
Faithfulness constraints for listed forms: 120
the rest (UR, markedness and general faithfulness): 100
Parameters in the simulation
Number of times to go through forms: 5,000,000 cycles
Initial plasticity: 0.02; Final plasticity: 0.002
Number of times to test grammar: 10,000
38. Two sets of grammar learning and testing: Zagalog I, II.
Training data: Word/morpheme-UR-SR forms of four words with frequencies
proportional to the distributions shown above.
Testing data: two novel words with no frequency (one p-initial and one b-initial stem).
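The learning step can be illustrated with a small stand-alone sketch of a GLA-style learner for Stochastic OT, trained on the Zagalog I distribution. It is not the OTSoft implementation. It uses the symmetric update usually associated with the GLA (on an error, demote the constraints violated only by the observed form and promote those violated only by the learner's output). Several things are assumptions made so that the example runs quickly: the evaluation noise of 2.0, the linear plasticity decay, and the scaled-down settings (40,000 cycles, plasticity 0.2 to 0.02 rather than the 5,000,000 cycles and 0.02 to 0.002 listed in (37)). The violation sets follow the constraint definitions in (31), (32) and (35), `*NC` stands for *NC̥, and all function names are hypothetical.

```python
import random

# word: (UR constraint of the cluster form, UR constraint of the substituted
# form, voiceless stem-initial obstruent?, Zagalog I substitution frequency)
WORDS = {
    "A": ("PAMPA", "PAMA", True, 100),
    "B": ("PAMPE", "PAME", True, 60),
    "C": ("PAMBI", "PAMI", False, 80),
    "D": ("PAMBU", "PAMU", False, 40),
}


def candidates(ur_cluster, ur_subst, voiceless):
    """Four UR.SR candidates as (substituted?, set of violated constraints)."""
    marks = {"NOCODA", "*NC"} if voiceless else {"NOCODA"}
    return [
        (False, marks | {ur_subst}),                           # e.g. pampa.pampa
        (True, {"*[m", ur_subst, "UNIF", "UNIF.LIST"}),        # e.g. pampa.pama
        (False, marks | {ur_cluster, "INTEG", "INTEG.LIST"}),  # e.g. pama.pampa
        (True, {"*[m", ur_cluster}),                           # e.g. pama.pama
    ]


def initial_grammar():
    """Initial ranking values as in (37): 120 for .LIST faithfulness, 100 elsewhere."""
    grammar = {"INTEG.LIST": 120.0, "UNIF.LIST": 120.0}
    for name in ("*NC", "NOCODA", "*[m", "INTEG", "UNIF"):
        grammar[name] = 100.0
    for ur_cluster, ur_subst, _, _ in WORDS.values():
        grammar[ur_cluster] = grammar[ur_subst] = 100.0
    return grammar


def evaluate(grammar, cands, rng, noise_sd=2.0):
    """Noisy evaluation: perturb values, rank constraints, pick the OT winner."""
    noisy = {c: v + rng.gauss(0.0, noise_sd) for c, v in grammar.items()}
    order = sorted(noisy, key=noisy.get, reverse=True)
    return min(cands, key=lambda cand: [c in cand[1] for c in order])


def train(cycles=40000, start=0.2, end=0.02, seed=1):
    rng = random.Random(seed)
    grammar = initial_grammar()
    words = list(WORDS)
    for t in range(cycles):
        plasticity = start + (end - start) * t / cycles  # assumed linear decay
        ur_cluster, ur_subst, voiceless, freq = WORDS[rng.choice(words)]
        cands = candidates(ur_cluster, ur_subst, voiceless)
        # Observed data are always faithful UR.SR pairs, as in (39).
        datum = cands[3] if rng.random() * 100 < freq else cands[0]
        output = evaluate(grammar, cands, rng)
        if output != datum:
            for c in datum[1] - output[1]:
                grammar[c] -= plasticity  # demote: violated only by the datum
            for c in output[1] - datum[1]:
                grammar[c] += plasticity  # promote: violated only by the error
    return grammar


def substitution_rate(grammar, word, trials=2000, seed=2):
    rng = random.Random(seed)
    ur_cluster, ur_subst, voiceless, _ = WORDS[word]
    cands = candidates(ur_cluster, ur_subst, voiceless)
    return sum(evaluate(grammar, cands, rng)[0] for _ in range(trials)) / trials


grammar = train()
rates = {word: substitution_rate(grammar, word) for word in WORDS}
print({word: round(rate, 2) for word, rate in rates.items()})
```

Because observed forms never violate the .LIST constraints, this learner can only ever promote them, which is one way to see why they stay at the top of the hierarchy.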
39. Training and testing data given to the learner (Zagalog I)
Word     UR.SR          freq  *NC̥  NOCODA  *[m  PAMPA  PAMA  …  INTEG  UNIF  INTEG.LIST  UNIF.LIST
A        pampa.pampa    0     1    1                   1
         pampa.pama     0                  1           1               1                 1
         pama.pampa     0     1    1            1               1            1
         pama.pama      100                1    1
B        pampe.pampe    40    1    1
         pampe.pame     0                  1                           1                 1
         pame.pampe     0     1    1                            1            1
         pame.pame      60                 1
C        pambi.pambi    20         1
         pambi.pami     0                  1                           1                 1
         pami.pambi     0          1                            1            1
         pami.pami      80                 1
D        pambu.pambu    60         1
         pambu.pamu     0                  1                           1                 1
         pamu.pambu     0          1                            1            1
         pamu.pamu      40                 1
Novel 1  paŋ+po.pampo         1    1
         paŋ+po.pamo                       1                           1
Novel 2  paŋ+bo.pambo              1
         paŋ+bo.pamo                       1                           1
40. Training and testing data (Zagalog II): same as the above except for the frequency.
Word  UR.SR        Frequency
A     pampa.pampa  20
      pama.pama    80
B     pampe.pampe  60
      pame.pame    40
C     pambi.pambi  40
      pami.pami    60
D     pambu.pambu  80
      pamu.pamu    20
41. Simulation result: Ranking values learned
Zagalog I Zagalog II
120 INTEGRITY.LIST 120 INTEGRITY.LIST
120 UNIFORMITY.LIST 120 UNIFORMITY.LIST
108.1 PAMA 101.86 PAMA
104.3 *NC̥ 101.74 PAMBU
103.8 PAMPE 101.33 PAMPE
102.58 PAMBU 100.93 PAMI
101.81 NOCODA 100.53 *NC̥
100.09 PAMI 100.27 *[m
100 INTEGRITY 100 INTEGRITY
100 UNIFORMITY 100 UNIFORMITY
99.91 PAMBI 99.73 NOCODA
98.19 *[m 99.07 PAMBI
97.42 PAMU 98.67 PAME
96.2 PAME 98.26 PAMU
91.9 PAMPA 98.14 PAMPA
a. In Zagalog I (where nasal substitution rate is high, 70% on average), NOCODA, a
constraint triggering nasal substitution, (101.81) has a higher ranking value than
*[m, one blocking substitution (98.19).
b. In Zagalog II (where nasal substitution rate is medium, 50% on average), NOCODA
(99.73) has a slightly lower ranking value than *[m (100.27).
42. Nasal substitution rates predicted by the learned grammars
                  Conventional  Zagalog I             Zagalog II
                  UR            training  prediction  training  prediction
Existing word  A  /paŋ+pa/      1         1           0.8       0.81
               B  /paŋ+pe/      0.6       0.61        0.4       0.41
               C  /paŋ+bi/      0.8       0.78        0.6       0.62
               D  /paŋ+bu/      0.4       0.39        0.2       0.2
Novel word     1  /paŋ+po/      n/a       0.95        n/a       0.5
               2  /paŋ+bo/      n/a       0.70        n/a       0.27
(n/a = not available)
a. Word-specific substitution rate: The substitution rates for existing words in both
Zagalog dialects were successfully reproduced in the simulation.
b. Frequency matching:
The average substitution rate for novel words is higher in Zagalog I (0.83) than in
Zagalog II (0.39). These testing results are consistent with the higher aggregate
rate of substitution in Zagalog I lexicon.
Voicing effect: In both dialects, nasal substitution is more likely with /p/-initial
novel stems (i.e., /po/) than /b/-initial stems (i.e., /bo/). This testing result is
consistent with the voicing effect in the lexicon.
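The predictions in (42) can be approximately re-derived by running noisy evaluation over the ranking values in (41). The following is a rough Monte Carlo sketch, not the OTSoft testing routine: the evaluation noise of 2.0 is an assumption, the violation sets are read off the constraint definitions and the tableau in (39), `*NC` stands for *NC̥, and the function names are hypothetical. Sampling error of about one percentage point is to be expected at 4,000 trials.

```python
import random

# Ranking values copied from (41).
RANKING_VALUES = {
    "Zagalog I": {
        "INTEG.LIST": 120, "UNIF.LIST": 120, "PAMA": 108.1, "*NC": 104.3,
        "PAMPE": 103.8, "PAMBU": 102.58, "NOCODA": 101.81, "PAMI": 100.09,
        "INTEG": 100, "UNIF": 100, "PAMBI": 99.91, "*[m": 98.19,
        "PAMU": 97.42, "PAME": 96.2, "PAMPA": 91.9,
    },
    "Zagalog II": {
        "INTEG.LIST": 120, "UNIF.LIST": 120, "PAMA": 101.86, "PAMBU": 101.74,
        "PAMPE": 101.33, "PAMI": 100.93, "*NC": 100.53, "*[m": 100.27,
        "INTEG": 100, "UNIF": 100, "NOCODA": 99.73, "PAMBI": 99.07,
        "PAME": 98.67, "PAMU": 98.26, "PAMPA": 98.14,
    },
}

# word: (UR constraint of cluster form, UR constraint of substituted form, voiceless?)
EXISTING = {
    "A": ("PAMPA", "PAMA", True),
    "B": ("PAMPE", "PAME", True),
    "C": ("PAMBI", "PAMI", False),
    "D": ("PAMBU", "PAMU", False),
}
NOVEL = {"po": True, "bo": False}  # novel stem: voiceless initial obstruent?


def cluster_marks(voiceless):
    """Markedness violations of an unsubstituted [pamp...] or [pamb...] form."""
    return {"NOCODA", "*NC"} if voiceless else {"NOCODA"}


def existing_candidates(ur_cluster, ur_subst, voiceless):
    """The four UR.SR candidates of an existing word: (substituted?, violations)."""
    marks = cluster_marks(voiceless)
    return [
        (False, marks | {ur_subst}),
        (True, {"*[m", ur_subst, "UNIF", "UNIF.LIST"}),
        (False, marks | {ur_cluster, "INTEG", "INTEG.LIST"}),
        (True, {"*[m", ur_cluster}),
    ]


def novel_candidates(voiceless):
    """Novel words have no listed forms, so only general constraints apply."""
    return [(False, cluster_marks(voiceless)), (True, {"*[m", "UNIF"})]


def substitution_rate(values, cands, rng, trials=4000, noise_sd=2.0):
    substituted = 0
    for _ in range(trials):
        noisy = {c: v + rng.gauss(0.0, noise_sd) for c, v in values.items()}
        order = sorted(noisy, key=noisy.get, reverse=True)
        winner = min(cands, key=lambda cand: [c in cand[1] for c in order])
        substituted += winner[0]
    return substituted / trials


def predict(seed=0):
    rng = random.Random(seed)
    table = {}
    for dialect, values in RANKING_VALUES.items():
        for word, spec in EXISTING.items():
            table[dialect, word] = substitution_rate(values, existing_candidates(*spec), rng)
        for stem, voiceless in NOVEL.items():
            table[dialect, stem] = substitution_rate(values, novel_candidates(voiceless), rng)
    return table


predictions = predict()
for (dialect, item), rate in predictions.items():
    print(f"{dialect:10s} {item:2s} {rate:.2f}")
```

For a /b/-initial novel stem, substitution wins only when NOCODA outranks both *[m and UNIFORMITY after noise is added, which is why the small differences among those three values in Zagalog II translate into a rate well below one half.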
Conclusion
43. Building crucially on Zuraw (2010) and Pater et al (2012), I have proposed a model of
lexically-conditioned variation.
- Mechanisms adopted to explain the characteristic patterns of mixed variation:
Word-specific rate of rule application
Lexical listing of attested variants of morphologically complex words
UR constraints
(High-ranking) faithfulness constraints for forms listed in the lexicon
Generalization to novel words and frequency matching
Lower-ranked markedness and general faithfulness constraints.
Notice that the ranking values of these lower-ranked constraints for
generalization and frequency matching can be learned although no words in the
training set show actual substitution (as in Zuraw 2010).
44. With a toy example based on Tagalog nasal substitution, I have illustrated how the proposed
model can be learned and capture the patterns of mixed variation.
45. What to do next: Apply the proposal to real language data like Korean n-insertion and
English t/d-deletion.
References (selected)
Boersma, Paul. (1997) How we learn variation, optionality, and probability. Proceedings of the Institute of
Phonetic Sciences of the University of Amsterdam 21. 43-58. [Available on Rutgers Optimality
Archive, ROA-221.]
Boersma, Paul. (1998) Functional Phonology: Formalizing the Interaction between Articulatory and Perceptual
Drives. The Hague: Holland Academic Graphics. [Doctoral dissertation, University of Amsterdam.]
Boersma, Paul & Bruce Hayes. (2001) Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry,
32: 45-86.
Coetzee, Andries & Joe Pater. (2011) The place of variation in phonological theory. In Goldsmith et al. (eds.)
The Handbook of Phonological Theory, 2nd edition. Malden, MA and Oxford, UK: Blackwell, 401-434.
Hayes, B., B. Tesar, & K. Zuraw (2013) OTSoft 2.3.2, software package,
http://www.linguistics.ucla.edu/people/hayes/otsoft/.
Jun, Jongho (2014) Phonological variation in Seoul Korean n-insertion. Handout presented at The 45th annual
meeting of the North East Linguistic Society, MIT, October 31 - November 2, 2014.
Pater, Joe, Robert Staubs, Karen Jesney and Brian Smith. (2012) Learning probabilities over underlying
representations. In the Proceedings of the Twelfth Meeting of the ACL-SIGMORPHON:
Computational Research in Phonetics, Phonology, and Morphology. 62-71.
Zuraw, Kie. (2010) A model of lexical variation and the grammar with application to Tagalog nasal
substitution. NLLT 28.2: 417-472.