
Phonology-Morphology Circle of Korea (한국음운론학회), May 2, 2015

Spring Joint Conference (춘계공동학술대회), HUFS

A model of lexically-conditioned variation

Jongho Jun

(Seoul National University)

0. Roadmap

Free variation

Lexical variation

Mixed variation

A toy example

Proposal

Learning simulation

Conclusion

Free variation

1. Free variation: A single word has more than one phonetic form.

Examples

a. English t/d deletion

i. los[t] ~ los[ ] (books)

ii. wes[t] ~ wes[ ] (side)

b. Korean n-insertion

i. /com-jak/ → [comjak] ~ [comnjak] ‘mothball’

ii. /pɛk-jəl/ → [pɛkjəl] ~ [pɛŋnjəl] ‘white heat’

iii. /hoth-ipul/ → [hotipul] ~ [honnipul] ‘unlined comforter’

And many others

2. An optimal model of free variation should explain the following:

• Multiple outputs

• Frequency matching

Neither of these can be explained in Standard OT.

3. t/d-deletion in some dialects of American English (from Coetzee & Pater 2008 ms)

Deletion rate (before a consonant as in ‘los[t] books’)

Chicano English 46%

Jamaican English 85%

Philadelphia English 100%

4. Probabilistic OT (or OT-like) theories

• Partially Ordered Constraints (POC: Kiparsky 1993; Anttila 1997 et seq)

• Stochastic OT (Boersma 1997, 1998; Boersma & Hayes 2001)

• Noisy Harmonic Grammar (Boersma & Pater to appear)

• Maximum Entropy Grammar (Goldwater & Johnson 2003)

In discussing how to explain phonological variation, the present study is mainly

concerned with Stochastic OT.

(i) Stochastic OT is adopted by Zuraw (2010) who proposes a very efficient model of

(lexical) variation.

(ii) My model of variation relies heavily on her work.

5. Revising standard OT (in Stochastic OT)

a. The constraint ranking needs to be variable so that multiple outputs can be generated.

An OT analysis of English t/d-deletion

*CT: ‘No t/d occurs after a consonant.’ (Coetzee & Pater 2011)

ranking outcome

(i) *CT > MAX deleting (e.g. los[ ])

(ii) *CT < MAX non-deleting (e.g. los[t])

b. The relative frequency of possible rankings (or grammars) may match that of the

observed variants. Thus, the grammar can do frequency matching.

Chicano English Grammar

relative frequency ranking outcome

(i) 46% *CT > MAX deleting (e.g. los[ ])

(ii) 54% *CT < MAX non-deleting (e.g. los[t])

6. Stochastic OT

a. Constraint evaluation: no difference from standard OT

b. A difference from standard OT: speakers memorize not the constraint ranking itself but a numerical “score” (ranking value) for each constraint.

E.g. *CT (102) vs. MAX (100)

c. Each time the grammar is used to evaluate a candidate set, the ranking values are

converted to a corresponding ranking.

*CT (102) vs. MAX (100) → *CT > MAX

7. Noisy evaluation: Stochastic OT

a. Before transforming the ranking values into a ranking, each one is perturbed by adding

a (+/-) number, taken from a normal distribution.

b. Due to this noisy evaluation, the constraint ranking may be variable.

E.g. *CT (102 - 2 = 100) vs. MAX (100 + 1 = 101) → *CT < MAX: the opposite ranking. Multiple outputs can thus be generated.

c. The distribution of outputs may differ depending on the distance in ranking values

between the conflicting constraints.

e.g.   *CT    MAX    Probability of ranking reversal    t/d deletion
       100    80     low                                more likely
       100    99     high                               less likely
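The link between the distance in ranking values and the probability of reversal can be made concrete with a short calculation. The sketch below assumes that each ranking value receives independent Gaussian noise with standard deviation 2 (a value customary in Stochastic OT work such as Boersma & Hayes 2001; the handout does not state it), so the difference between two noisy values is normal with standard deviation 2√2. The function name and the noise parameter are illustrative.

```python
import math

NOISE_SD = 2.0  # ASSUMPTION: evaluation noise; not given in the handout


def prob_outranks(value_a: float, value_b: float, sd: float = NOISE_SD) -> float:
    """Probability that constraint A outranks constraint B at evaluation time.

    Both ranking values get independent N(0, sd^2) noise, so the difference
    of the noisy values is N(value_a - value_b, 2 * sd^2).
    """
    z = (value_a - value_b) / (sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# *CT vs. MAX for the value pairs used in this section
for ct, mx in [(102, 100), (100, 80), (100, 99)]:
    p_delete = prob_outranks(ct, mx)  # *CT > MAX yields deletion
    print(f"*CT={ct}, MAX={mx}: P(deletion)={p_delete:.3f}, "
          f"P(reversal)={1 - p_delete:.3f}")
```

Under these assumptions a 20-point gap makes reversal practically impossible, while a 1-point gap leaves the two rankings close to equally likely.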

Lexical variation

8. Characteristic patterns of lexical variation

• The pronunciation of a word is fixed.

• Phonologically similar words have different pronunciation patterns.

9. Example: Nasal substitution in Tagalog (Zuraw 2010)

• A prefix-final nasal fuses with a stem-initial obstruent.

[+nasal] + {p/b, t/d, k/g} → [m, n, ŋ]

e.g. maŋ + bigáj → mamigáj

• Words are divided into two groups, substituting and non-substituting.

                       stem                  combined with prefix (maŋ/paŋ-)
substituting      i.   mag-bigáj ‘give’      ma-migáj ‘to distribute’
                  ii.  buháj ‘life’          ma-muháj ‘to live’
                  iii. pighatɪ́ʔ ‘grief’      pa-mi-mighatɪ́ʔ ‘being in grief’
non-substituting  iv.  buháj ‘life’          pam-buháj ‘vivifying’
                  v.   poʔók ‘district’      pam-poʔók ‘local’
                  vi.  dinɪ́g ‘audible’       pan-dinɪ́g ‘sense of hearing’

10. An optimal model of lexical variation should explain the following:

• The fixed pronunciation of a real word

• Frequency matching

11. Law of frequency matching (Hayes et al. 2009: 826)

Speakers of languages with variable lexical patterns respond stochastically when tested on

such patterns. Their responses aggregately match the lexical frequencies.

12. Frequency matching in Tagalog nasal substitution (Zuraw 2010)

• Nasal substitution in existing words

More likely with stems beginning with voiceless obstruents (voicing effect)

Dictionary data (# of words)

stem-initial    substituting    non-substituting
/p/             253 (96%)       10 (4%)
/b/             177 (64%)       100 (36%)

• Nasal substitution in novel words

In an acceptability judgment test, the results in general reflected lexical frequencies: p-initial > b-initial (in substitution rate)

Tagalog speakers know the distribution of nasal substitution.
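The lexical rates that speakers are claimed to match follow directly from the dictionary counts. A minimal sketch of that arithmetic, using only the four counts given above (variable and function names are illustrative):

```python
# Dictionary counts from the table above: (substituting, non-substituting)
counts = {"/p/": (253, 10), "/b/": (177, 100)}


def substitution_rate(substituting: int, non_substituting: int) -> float:
    """Share of stems with a given initial obstruent that undergo substitution."""
    return substituting / (substituting + non_substituting)


rates = {stem: substitution_rate(*c) for stem, c in counts.items()}
for stem, rate in rates.items():
    print(f"{stem}-initial stems: {rate:.1%} substituting")
```

Frequency matching then amounts to the claim that acceptance rates for novel /p/- and /b/-initial stems track these two proportions.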

13. Previous OT models of lexical variation

• Zuraw (2010)

• Lexically-indexed constraints (Pater 2000, Coetzee & Pater 2011)

14. Zuraw’s (2010) model: the fixed phonetic form of an existing word.

a. Morphologically complex forms of existing Tagalog words can be stored as such in the

lexicon.

b. UR = SR at least with respect to nasal substitution

      conventional UR    proposed UR     SR
i.    /maŋ-bigáj/        /ma-migáj/      [mamigáj]     ‘to distribute’
ii.   /paŋ-poʔók/        /pam-poʔók/     [pampoʔók]    ‘local’

c. The faithful realization of URs is guaranteed by high-ranking faithfulness constraints.

INTEGRITY-IO: /ma-migáj/ → [mamigáj], not *[mambigáj]

UNIFORMITY-IO: /pam1-p2oʔók/ → [pam1p2oʔók], not *[pam1,2oʔók]

15. Zuraw’s (2010) model: Generalization to novel words

a. Novel words do not have lexical entries of their own. Novel stems are newly combined with the relevant prefixes.

b. The prefixes triggering nasal substitution end in a floating nasal feature, not segment.

/pa[+nas]1-p2…/ → [pam2…]
                      |
                   [+nas]1

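c. Because what merges with the stem-initial obstruent is a floating feature rather than a segment, nasal substitution may occur without violating high-ranking UNIFORMITY-IO.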

16. Ranking values (from Zuraw 2010, Table 3; definitions added)

112.213 INTEGRITY-IO “No breaking”

112.176 UNIFORMITY-IO “No coalescence”

102.799 *NC̥ “No sequence of a nasal and a voiceless obstruent”

… …

100.038 NOCODA “No Coda consonants”

… …

99.962 *[m “No stem initial [m]”

a. The faithfulness constraints INTEGRITY-IO and UNIFORMITY-IO have the highest values. They guarantee the faithful realization of the listed variant of each lexical item.

b. The remaining lower-ranked constraints (“subterranean” grammar in Zuraw’s

terminology) determine the output variants of novel words.

c. They can encode the frequency of nasal substitution.

17. A crucial aspect of Zuraw’s proposal (p. 419)

…the ranking of the “subterranean” markedness constraints can be learned despite training

data in which all words are pronounced faithfully…

18. cf. An alternative to lexical variation: lexically-indexed faithfulness constraints (Pater 2000;

Coetzee & Pater 2011)

a. Tagalog words are subdivided into two groups depending on whether they undergo

nasal substitution or not.

b. Stems like /bigáj/ belong to a substituting group (Sub) whereas stems like /poʔók/ a

non-substituting group (Non-sub).

c. Relevant faithfulness constraints are group-specific: UNIFORMITY-{Sub} vs.

UNIFORMITY-{Non-sub}.

d. The following ranking can successfully explain the fact that each prefixed word in

Tagalog has a fixed variant.

UNIFORMITY-{Non-sub} >> NOCODA >> UNIFORMITY-{Sub}

e. But, it is hard to explain not only generalization to novel words but also frequency

matching.
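The ranking in (d) can be checked with a categorical (noise-free) OT evaluation. The sketch below is illustrative only: the two-candidate sets, the unsubstituted form "mambigáj" and the substituted form "pamoʔók" are assumptions made for the example, and the class label on each stem is the hypothetical lexical index.

```python
# Ranking from (d), highest-ranked constraint first
RANKING = ["UNIFORMITY-{Non-sub}", "NOCODA", "UNIFORMITY-{Sub}"]


def candidates(unsub_form: str, subst_form: str, lexical_class: str) -> dict:
    """Violation profiles of the two candidates for one prefixed word.

    lexical_class is "Sub" or "Non-sub", the index carried by the stem.
    """
    return {
        unsub_form: {"NOCODA": 1},                           # nasal stays in coda
        subst_form: {f"UNIFORMITY-{{{lexical_class}}}": 1},  # nasal and obstruent fuse
    }


def optimal(cands: dict, ranking: list) -> str:
    """Standard OT evaluation: compare violation vectors in ranking order."""
    return min(cands, key=lambda c: [cands[c].get(k, 0) for k in ranking])


winner_sub = optimal(candidates("mambigáj", "mamigáj", "Sub"), RANKING)
winner_non = optimal(candidates("pampoʔók", "pamoʔók", "Non-sub"), RANKING)
print(winner_sub, winner_non)
```

Each indexed stem receives exactly one output, which is the fixed-variant result in (d). A novel stem carries no index, so neither UNIFORMITY constraint applies to it; that gap is the difficulty noted in (e).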

Mixed variation

19. Most previous studies on phonological variation were concerned with either free or lexical

variation, somewhat idealizing the observed variation.

a. Free variation: A given rule applies to every target word with equal probability.

b. Lexical variation: For each (potential target) word, a given rule is either always or never

applied.

• At least some cases of free variation turn out to be lexically-conditioned.

A given rule applies to each target word with a probability specific to it.

20. Mixed variation: English t/d-deletion (Coetzee & Pater 2011)

Deletion preference differs depending on the word.

Large differences between individual words:

e.g. mos(t) > hos(t), yeas(t) > feas(t)

Usage frequency does not explain all of the between-word differences.

It seems unavoidable that the likelihood of participation in a variable process is conditioned to some extent by lexical idiosyncrasy. (Coetzee & Pater 2011)

21. Mixed variation: Korean n-insertion (Jun 2014)

a. Optional n-insertion in Korean: /n/ is optionally inserted at the juncture of two

morphemes when the first morpheme ends with a consonant and the following

morpheme begins with a high front vocoid.

/com-jak/ → [comjak] ~ [comnjak] ‘mothball’

b. N-insertion applies to different words with different frequency.

Insertion rates vary greatly across words with /-jak/ ‘medicine’ :

      Word             Insertion rate (%)    Gloss
i.    /tok-jak/        0                     ‘poison’
ii.   /an-jak/         4.5                   ‘eye drops’
iii.  /sɛŋ-jak-hak/    22.7                  ‘pharmacognosy’
iv.   /com-jak/        52.3                  ‘mothball’
v.    /tuthoŋ-jak/     68.2                  ‘headache pill’
vi.   /al-jak/         86.4                  ‘pill’

Moreover, the probability of application is relatively fixed for each existing word.

For the 43 words included in both Kook et al.’s (2005) survey and Jun’s (2014) survey, the insertion rates were highly correlated (r (43) = .783, p = 5.625e-10).

This suggests that Seoul Korean speakers know word-specific rates of n-insertion.

22. Frequency matching in Seoul Korean n-insertion (Jun 2014)

a. N-insertion of existing Korean words

e.g. Velar nasal effect: n-insertion is less likely after /ŋ/ than after other sonorants.

/…ŋ+j…/ → […ŋnj…] < /…m+j…/ → […mnj…]

Insertion rate (an acceptability judgment test on existing words)

after /ŋ/ after /m, n, l/

0.38 0.49

b. N-insertion in novel words (e.g., king/some/ten/tall+jucenol)

In an acceptability judgment test on novel words, the results reflected the

relative frequencies of existing words.

after /ŋ/ after /m, n, l/

0.15 0.31

Korean speakers know the relative rate of n-insertion in existing Korean words.

23. An optimal model of mixed variation should be able to explain the following:

• Word-specific rates of existing words: Each existing word has its own rate of

application of a given rule.

• Generalization to novel words

• Frequency matching: The distribution of variants of novel words approximates the

distribution of existing words in aggregate.

24. Previous approaches to lexical variation can hardly explain mixed variation without modification.

• Lexically-indexed constraints approach (Pater 2000, Coetzee & Pater 2011) with

noisy-evaluation can explain the word-specific application rate of a given process

for existing words.

But, it’s unclear how the grammar with lexically-indexed constraints can

generalize to novel words, while frequency-matching the aggregate distribution

of the process in existing words.

• In Zuraw’s (2010) model, it is difficult to explain the fixed rate of application for each

existing word.

Given that a process applies to most of the target words (though with different

frequency), both variants, undergoing and not undergoing the process, need to be

listed as such in the lexicon.

But, such lexical listing is not sufficient to differentiate words with different rates

of rule application. What needs to be specified for each lexical item is not whether

the rule applies or not, but how often it applies.

A toy example

25. I constructed a toy example of mixed variation on the basis of the real Tagalog nasal

substitution pattern (Zuraw 2010).

a. A mini-lexicon with 4 words

2 /p/-initial stems

2 /b/-initial stems

b. Nasal substitution applies to each word with different frequency.

c. Voicing effect: Average nasal substitution rate is higher with /p/-initial stems than with

/b/-initial stems, although not all words with /p/-initial stems show higher substitution

rates than those with /b/-initial stems.

d. Two dialects: One dialect shows a higher average nasal substitution rate than the other, although not all words in the former have higher substitution rates than those in the latter.

26. Nasal substitution rates of the two dialects (on average)

                     Zagalog I    Zagalog II
All 4 words          70%          50%
2 p-initial stems    80%          60%
2 b-initial stems    60%          40%

27. Data of the two dialects

Word    conventional UR    SR         Probability
                                      Zagalog I    Zagalog II
A       /paŋ+pa/           [pampa]    0            0.2
                           [pama]     1            0.8
B       /paŋ+pe/           [pampe]    0.4          0.6
                           [pame]     0.6          0.4
C       /paŋ+bi/           [pambi]    0.2          0.4
                           [pami]     0.8          0.6
D       /paŋ+bu/           [pambu]    0.6          0.8
                           [pamu]     0.4          0.2
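The aggregate rates implied by these per-word probabilities can be recomputed directly. The sketch below takes the probability of the substituted variant for each word from the table above and averages it over all four words and over each stem class; the names are illustrative.

```python
# Probability of the substituted variant ([pama], [pame], [pami], [pamu])
sub_prob = {
    "Zagalog I":  {"A": 1.0, "B": 0.6, "C": 0.8, "D": 0.4},
    "Zagalog II": {"A": 0.8, "B": 0.4, "C": 0.6, "D": 0.2},
}
ALL_WORDS = ("A", "B", "C", "D")
P_INITIAL = ("A", "B")  # /p/-initial stems
B_INITIAL = ("C", "D")  # /b/-initial stems


def mean_rate(dialect: str, words: tuple) -> float:
    """Unweighted mean substitution rate over the given words."""
    probs = sub_prob[dialect]
    return sum(probs[w] for w in words) / len(words)


for dialect in sub_prob:
    print(dialect,
          f"all={mean_rate(dialect, ALL_WORDS):.2f}",
          f"p-initial={mean_rate(dialect, P_INITIAL):.2f}",
          f"b-initial={mean_rate(dialect, B_INITIAL):.2f}")
```

The means are unweighted, that is, each word counts equally, which is the sense in which the toy lexicon encodes both the dialect difference and the voicing effect.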

28. Predictions: Generalization to novel words and frequency matching

Nasal substitution applies to novel words more frequently in Zagalog I than in Zagalog

II.

Nasal substitution applies to novel words with p-initial stems more frequently than to those with b-initial stems in both dialects.

29. An optimal model should be able to explain …

Word-specific rates of nasal substitution in existing words of Zagalog I, II

Generalization to novel words

Frequency matching

Proposal

30. Multiple lexical listing: All attested surface forms of an existing word, morphologically

complex or not, are listed in the lexicon (cf. Zuraw 2010, Pater et al. 2012).

Zagalog lexicon

Word Forms listed in the lexicon Cf. Surface forms

A /pampa/ [pampa]

/pama/ [pama]

B /pampe/ [pampe]

/pame/ [pame]

C /pambi/ [pambi]

/pami/ [pami]

D /pambu/ [pambu]

/pamu/ [pamu]

31. UR constraints are defined for forms listed in the lexicon.

They require that listed forms be chosen as URs (Pater et al. 2012).

Each variant form listed in the lexicon has its own UR constraint whose ranking value

may reflect speakers’ preference for the given variant, explaining the word-specific rate

of rule application.

UR constraints in Zagalog

UR constraints shorthand

a. i. /pampa/ is the UR of word A. PAMPA

ii. /pama/ is the UR of word A. PAMA

b. i. /pampe/ … word B PAMPE

ii. /pame/ … word B PAME

c. i. /pambi/ … word C PAMBI

ii. /pami/ … word C PAMI

d. i. /pambu/ … word D PAMBU

ii. /pamu/ … word D PAMU

32. Once a listed form is chosen as UR, it may surface as such through high-ranking

faithfulness constraints for forms listed in the lexicon.

Faithfulness constraints for forms listed in the lexicon

INTEGRITY.LIST No breaking for the forms listed in the lexicon.

UNIFORMITY.LIST No coalescence for the forms listed in the lexicon.

A priority in the ranking is given to these special faithfulness constraints, compared to

general faithfulness constraints like INTEGRITY and UNIFORMITY.

Notice that these special constraints are not active for novel words which do not have

corresponding forms listed in the lexicon.

Thus, their surface realization would be subject to the interaction of lower-ranked

constraints including UNIFORMITY and INTEGRITY.

33. Three-way distinction in the representation: word/morpheme vs. UR vs. SR

For candidates with different URs of the same word/morpheme to compete, the

word/morpheme-UR-SR 3-way distinction (Pater et al. 2012) is adopted in candidate

evaluation.

Two words A, B in Zagalog dialects

Word/morpheme UR SR

A pampa pampa

A pama pama

B pampe pampe

B pame pame

UR constraints are responsible for the mapping from word/morpheme to UR.

Faithfulness constraints are responsible for the mapping from UR to SR.
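The three-way representation and the way UR constraints assign violations can be sketched as a small data structure. This is an illustration under simplifying assumptions, not the formalization of Pater et al. (2012): the class and function names are invented for the example, and UR-to-SR faithfulness is reduced to string identity.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    word: str  # lexical item, e.g. "A"
    ur: str    # underlying form chosen for it
    sr: str    # surface form


# Listed forms per word; each listed form has its own UR constraint
LEXICON = {"A": ["pampa", "pama"], "B": ["pampe", "pame"]}


def ur_violations(cand: Candidate) -> dict:
    """Word -> UR mapping: a candidate violates every UR constraint of its
    word except the one naming the UR it actually uses."""
    return {form.upper(): int(form != cand.ur) for form in LEXICON[cand.word]}


def is_faithful(cand: Candidate) -> bool:
    """UR -> SR mapping, simplified to identity of the two strings."""
    return cand.ur == cand.sr


for cand in (Candidate("A", "pampa", "pampa"), Candidate("A", "pama", "pama")):
    print(cand, ur_violations(cand), is_faithful(cand))
```

Because the two candidates for word A differ in which UR constraint they violate, the relative ranking values of PAMPA and PAMA can carry the word-specific preference.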

34. How to explain generalization to novel words and frequency matching:

Nasal substitution may generalize to novel words through the interaction of

markedness and general faithfulness constraints which are ranked below faithfulness

constraints for lexically listed forms (like Zuraw’s 2010 “subterranean” grammar).

The ranking values of the lower-ranked constraints encode the aggregate relative

frequency of substitution among existing words.

35. A total constraint set for the analysis of Zagalog nasal substitution

a. Faithfulness for listed items INTEGRITY.LIST

UNIFORMITY.LIST

b. UR PAMPA

PAMA

PAMPE

PAME

PAMBI

PAMI

PAMBU

PAMU

c. Markedness *NC̥

NOCODA

*[m

d. Faithfulness INTEGRITY

UNIFORMITY

Constraints in (a,b) are responsible for nasal (non-)substitution among existing words.

Constraints in (c,d) are responsible for nasal (non-)substitution among novel words.

Learning Simulation

36. The Gradual Learning Algorithm (GLA) Learner for Stochastic OT in OTSoft (Hayes,

Tesar & Zuraw 2013)

37. Options adopted in the simulation

Initial ranking values

Faithfulness constraints for listed forms: 120

the rest (UR, markedness and general faithfulness): 100

Parameters in the simulation

Number of times to go through forms: 5,000,000 cycles

Initial plasticity: 0.02; Final plasticity: 0.002

Number of times to test grammar: 10,000
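The role of plasticity can be illustrated with a single error-driven update. The sketch below follows the symmetric update described for the Gradual Learning Algorithm in Boersma & Hayes (2001); it is not OTSoft's code. The function name and the example datum are illustrative, only the constraints relevant to word B are included, and plasticity is held at the initial value 0.02 rather than decreased toward 0.002.

```python
def gla_update(values: dict, observed: dict, produced: dict,
               plasticity: float) -> dict:
    """One update after the learner's output differs from the training datum.

    observed / produced map constraint names to violation counts of the
    training datum and of the learner's own (mismatching) output.
    """
    new = dict(values)
    for c in values:
        diff = observed.get(c, 0) - produced.get(c, 0)
        if diff > 0:    # constraint penalises the observed form: demote
            new[c] -= plasticity
        elif diff < 0:  # constraint penalises the learner's error: promote
            new[c] += plasticity
    return new


# Initial ranking values as in the options above (120 for .LIST, 100 elsewhere)
initial = {
    "INTEGRITY.LIST": 120.0, "UNIFORMITY.LIST": 120.0,
    "*NC": 100.0, "NOCODA": 100.0, "*[m": 100.0,   # "*NC" stands for *NC̥
    "PAMPE": 100.0, "PAME": 100.0,
    "INTEGRITY": 100.0, "UNIFORMITY": 100.0,
}
observed = {"*[m": 1, "PAMPE": 1}              # datum: pame.pame
produced = {"*NC": 1, "NOCODA": 1, "PAME": 1}  # learner said: pampe.pampe
updated = gla_update(initial, observed, produced, plasticity=0.02)
print(updated)
```

Repeated over many training cycles, such small steps move the UR and markedness constraints apart until the grammar's output frequencies approximate those in the training data.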

38. Two sets of grammar learning and testing: Zagalog I, II.

Training data: Word/morpheme-UR-SR forms of four words with frequencies

proportional to the distributions shown above.

Testing data: two novel words with no frequency (one p-initial and one b-initial stem).

39. Training and testing data given to the learner (Zagalog I)

Word/     UR.SR            freq  *NC̥   NOCODA  *[m  PAMPA  PAMA  …  INTEG  UNIF  INTEG.LIST  UNIF.LIST
morpheme
A         pampa.pampa         0  1     1                   1
          pampa.pama          0                1           1               1                 1
          pama.pampa          0  1     1            1               1            1
          pama.pama         100                1    1
B         pampe.pampe        40  1     1
          pampe.pame          0                1                           1                 1
          pame.pampe          0  1     1                            1            1
          pame.pame          60                1
C         pambi.pambi        20        1
          pambi.pami          0                1                           1                 1
          pami.pambi          0        1                            1            1
          pami.pami          80                1
D         pambu.pambu        60        1
          pambu.pamu          0                1                           1                 1
          pamu.pambu          0        1                            1            1
          pamu.pamu          40                1
Novel 1   paŋ+po.pampo           1     1
          paŋ+po.pamo                          1                           1
Novel 2   paŋ+bo.pambo                 1
          paŋ+bo.pamo                          1                           1

40. Training and testing data (Zagalog II): same as the above except for the frequency.

Word UR.SR Frequency

A pampa. pampa 20

pama. pama 80

B pampe. pampe 60

pame. pame 40

C pambi. pambi 40

pami. pami 60

D pambu. pambu 80

pamu. pamu 20

41. Simulation result: Ranking values learned

Zagalog I Zagalog II

120 INTEGRITY.LIST 120 INTEGRITY.LIST

120 UNIFORMITY.LIST 120 UNIFORMITY.LIST

108.1 PAMA 101.86 PAMA

104.3 *NC̥ 101.74 PAMBU

103.8 PAMPE 101.33 PAMPE

102.58 PAMBU 100.93 PAMI

101.81 NOCODA 100.53 *NC̥

100.09 PAMI 100.27 *[m

100 INTEGRITY 100 INTEGRITY

100 UNIFORMITY 100 UNIFORMITY

99.91 PAMBI 99.73 NOCODA

98.19 *[m 99.07 PAMBI

97.42 PAMU 98.67 PAME

96.2 PAME 98.26 PAMU

91.9 PAMPA 98.14 PAMPA

a. In Zagalog I (where nasal substitution rate is high, 70% on average), NOCODA, a

constraint triggering nasal substitution, (101.81) has a higher ranking value than

*[m, one blocking substitution (98.19).

b. In Zagalog II (where nasal substitution rate is medium, 50% on average), NOCODA

(99.73) has a slightly lower ranking value than *[m (100.27).

42. Nasal substitution rates predicted by the learned grammars

                    Conventional   Zagalog I               Zagalog II
                    UR             training   prediction   training   prediction
Existing word   A   /paŋ+pa/       1          1            0.8        0.81
                B   /paŋ+pe/       0.6        0.61         0.4        0.41
                C   /paŋ+bi/       0.8        0.78         0.6        0.62
                D   /paŋ+bu/       0.4        0.39         0.2        0.2
Novel word      1   /paŋ+po/       n/a        0.95         n/a        0.5
                2   /paŋ+bo/       n/a        0.70         n/a        0.27

(n/a = not available)

a. Word-specific substitution rate: The substitution rates for existing words in both

Zagalog dialects were successfully reproduced in the simulation.

b. Frequency matching:

The average substitution rate for novel words is higher in Zagalog I (0.83) than in Zagalog II (0.39). These testing results are consistent with the higher aggregate rate of substitution in the Zagalog I lexicon.

Voicing effect: In both dialects, nasal substitution is more likely with /p/-initial

novel stems (i.e., /po/) than /b/-initial stems (i.e., /bo/). This testing result is

consistent with the voicing effect in the lexicon.
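How a learned grammar yields such rates can be approximated by sampling. The sketch below is not OTSoft; it applies noisy evaluation to the Zagalog I ranking values reported in the simulation result above. Three things are assumptions: the evaluation noise (standard deviation 2, which the handout does not state), the violation profiles (reconstructed here from the constraint definitions), and all function and variable names.

```python
import random

NOISE_SD = 2.0  # ASSUMPTION: evaluation noise; not stated in the handout

# Ranking values learned for Zagalog I ("*NC" stands for *NC̥)
VALUES = {
    "INTEGRITY.LIST": 120, "UNIFORMITY.LIST": 120, "PAMA": 108.1,
    "*NC": 104.3, "PAMPE": 103.8, "PAMBU": 102.58, "NOCODA": 101.81,
    "PAMI": 100.09, "INTEGRITY": 100, "UNIFORMITY": 100, "PAMBI": 99.91,
    "*[m": 98.19, "PAMU": 97.42, "PAME": 96.2, "PAMPA": 91.9,
}

# name: (unsubstituted form, substituted form, voiceless stem?, listed?)
WORDS = {
    "A": ("pampa", "pama", True, True),
    "B": ("pampe", "pame", True, True),
    "C": ("pambi", "pami", False, True),
    "D": ("pambu", "pamu", False, True),
    "novel /po/": ("pampo", "pamo", True, False),
    "novel /bo/": ("pambo", "pamo", False, False),
}


def candidates(unsub, sub, voiceless, listed):
    """Map each (UR, SR) candidate to the set of constraints it violates."""
    marked = {"NOCODA"} | ({"*NC"} if voiceless else set())
    if not listed:  # novel word: no UR or .LIST constraints apply
        return {("input", unsub): marked,
                ("input", sub): {"*[m", "UNIFORMITY"}}
    ur_unsub, ur_sub = unsub.upper(), sub.upper()  # UR constraint names
    return {
        (unsub, unsub): marked | {ur_sub},
        (unsub, sub): {"*[m", ur_sub, "UNIFORMITY", "UNIFORMITY.LIST"},
        (sub, unsub): marked | {ur_unsub, "INTEGRITY", "INTEGRITY.LIST"},
        (sub, sub): {"*[m", ur_unsub},
    }


def evaluate(cands, rng):
    """One noisy evaluation: perturb values, rank, pick the OT winner."""
    noisy = {c: v + rng.gauss(0.0, NOISE_SD) for c, v in VALUES.items()}
    order = sorted(noisy, key=noisy.get, reverse=True)
    return min(cands, key=lambda k: [c in cands[k] for c in order])


def substitution_rate(unsub, sub, voiceless, listed, trials=5000, seed=0):
    """Share of evaluations whose winning SR is the substituted form."""
    rng = random.Random(seed)
    cands = candidates(unsub, sub, voiceless, listed)
    hits = sum(evaluate(cands, rng)[1] == sub for _ in range(trials))
    return hits / trials


rates = {name: substitution_rate(*spec) for name, spec in WORDS.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}")
```

If the assumptions hold, the sampled rates should fall close to the Zagalog I prediction column; substituting the Zagalog II ranking values into VALUES gives the corresponding check for the other dialect.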

Conclusion

43. Building crucially on Zuraw (2010) and Pater et al. (2012), I have proposed a model of

lexically-conditioned variation.

- Mechanisms adopted to explain the characteristic patterns of mixed variation:

Word-specific rate of rule application

Lexical listing of attested variants of morphologically complex words

UR constraints

(High-ranking) faithfulness constraints for forms listed in the lexicon

Generalization to novel words and frequency matching

Lower-ranked markedness and general faithfulness constraints.

Notice that the ranking values of these lower-ranked constraints for generalization and frequency matching can be learned although no words in the training set show actual substitution (as in Zuraw 2010).

44. With a toy example based on Tagalog nasal substitution, I have illustrated how the proposed

model can be learned and capture the patterns of mixed variation.

45. What to do next: Apply the proposal to real language data like Korean n-insertion and

English t/d-deletion.

References (selected)

Boersma, Paul. (1997) How we learn variation, optionality, and probability. Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam 21. 43-58. [Available on Rutgers Optimality Archive, ROA-221.]

Boersma, Paul. (1998) Functional Phonology: Formalizing the Interaction between Articulatory and Perceptual

Drives. The Hague: Holland Academic Graphics. [Doctoral dissertation, University of Amsterdam.]

Boersma, Paul & Bruce Hayes. (2001) Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry,

32: 45-86.

Coetzee, Andries & Joe Pater. (2011) The place of variation in phonological theory. In Goldsmith et al. (eds.)

The Handbook of Phonological Theory, 2nd edition. Malden, MA and Oxford, UK: Blackwell, 401-434.

Hayes, B., B. Tesar, & K. Zuraw (2013) OTSoft 2.3.2, software package,

http://www.linguistics.ucla.edu/people/hayes/otsoft/.

Jun, Jongho (2014) Phonological variation in Seoul Korean n-insertion. Handout presented at The 45th annual

meeting of the North East Linguistic Society, MIT, October 31 - November 2, 2014.

Pater, Joe, Robert Staubs, Karen Jesney and Brian Smith. (2012) Learning probabilities over underlying representations. In Proceedings of the Twelfth Meeting of the ACL-SIGMORPHON: Computational Research in Phonetics, Phonology, and Morphology. 62-71. http://aclweb.org/anthology-new/W/W12/W12-2308.pdf

Zuraw, Kie. (2010) A model of lexical variation and the grammar with application to Tagalog nasal substitution. NLLT 28.2: 417-472. http://www.linguistics.ucla.edu/people/zuraw/dnldpprs/NasSub_NLLT2.pdf
