Natural Language Processing
Lecture 5: Language Models and Smoothing
Language Modeling
• Is this sentence good?
– This is a pen
– Pen this is a
• Help choose between options, help score options
– 他向记者介绍了发言的主要内容 (the Chinese source; candidate translations below)
– He briefed to reporters on the chief contents of the statement
– He briefed reporters on the chief contents of the statement
– He briefed to reporters on the main contents of the statement
– He briefed reporters on the main contents of the statement
One-Slide Review of Probability Terminology
• Random variables take different values, depending on chance.
• Notation:
– p(X = x) is the probability that r.v. X takes value x
– p(x) is shorthand for the same
– p(X) is the distribution over values X can take (a function)
• Joint probability: p(X = x, Y = y)
– Independence
– Chain rule
• Conditional probability: p(X = x | Y = y)
Unigram Model
• Every word in Σ is assigned some probability.
• Random variables W1, W2, ... (one per word).
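As a minimal sketch (helper names and the toy corpus are invented, not from the slides), a unigram model is trained by counting and scores a sentence as a product of independent per-word probabilities:

```python
# A minimal sketch (hypothetical names, toy corpus): train by counting,
# score a sentence as the product of per-word probabilities.
import math
from collections import Counter

def train_unigram(corpus):
    """Relative-frequency estimate: p(w) = count(w) / total tokens."""
    counts = Counter(w for sent in corpus for w in sent)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def unigram_prob(model, sentence):
    """p(W1..Wn) = product of p(Wi); every word is independent."""
    p = 1.0
    for w in sentence:
        p *= model.get(w, 0.0)  # unseen words get probability 0 (see smoothing later)
    return p

model = train_unigram([["this", "is", "a", "pen"], ["this", "is", "good"]])
# A unigram model cannot tell "This is a pen" from "Pen this is a":
assert math.isclose(unigram_prob(model, ["this", "is", "a", "pen"]),
                    unigram_prob(model, ["pen", "this", "is", "a"]))
```

Word order never enters the score, which is exactly why a unigram model cannot prefer the grammatical sentence from the opening slide over its scrambled version.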
Part of a Unigram Distribution
[rank 1]
p(the) = 0.038
p(of) = 0.023
p(and) = 0.021
p(to) = 0.017
p(is) = 0.013
p(a) = 0.012
p(in) = 0.012
p(for) = 0.009
...
[rank 1001]
p(joint) = 0.00014
p(relatively) = 0.00014
p(plot) = 0.00014
p(DEL1SUBSEQ) = 0.00014
p(rule) = 0.00014
p(62.0) = 0.00014
p(9.1) = 0.00014
p(evaluated) = 0.00014
...
Unigram Model as a Generator
first, from less the This different 2004), out which goal 19.2 Model their It ~(i?1), given 0.62 these (x0; match 1 schedules. x 60 1998. under by Notices of stated CFG 120 be 100 a location accuracy If models note 21.8 each 0 WP that the that Novák. to functions; to [0, to different values, model 65 cases. said - 24.94 sentences not that 2 In to clustering each K&M 100 Boldface X))] applied; In 104 S. grammar as (Section contrastive thesis, the machines table -5.66 trials: An the textual (family applications. We have for models 40.1 no 156 expected are neighborhood
Full History Model
• Every word in Σ is assigned some probability, conditioned on every history.
Bill Clinton's unusually direct comment Wednesday on the possible role of race in the election was in keeping with the Clintons' bid to portray Obama, who is aiming to become the first black U.S. president, as the clear favorite, thereby lessening the potential fallout if Hillary Clinton does not win in South Carolina.
N-Gram Model
• Every word in Σ is assigned some probability, conditioned on a fixed-length history (n – 1 words).
Bigram Model as a Generator
e. (A.33) (A.34) A.5 Models are also been completely surpassed in performance on drafts of online algorithms can achieve far more so while substantially improved using CE. 4.4.1 MLE as a Case of CE 71 26.34 23.1 57.8 K&M 42.4 62.7 40.9 44 43 90.7 100.0 100.0 100.0 15.1 30.9 18.0 21.2 60.1 undirected evaluations directed DEL1 TRANS1 neighborhood. This continues, with supervised init., semisupervised MLE with the METU-Sabancı Treebank 195 ADJA ADJD ADV APPR APPRART APPO APZR ART CARD FM ITJ KOUI KOUS KON KOKOM NN NN NN IN JJ NN Their problem is y x. The evaluation offers the hypothesized link grammar with a Gaussian
Trigram Model as a Generator
top(xI ,right,B). (A.39) vine0(X, I) rconstit0(I 1, I). (A.40) vine(n). (A.41) These equations were presented in both cases; these scores into a probability distribution is even smaller (r = 0.05). This is exactly EM. During DA, is gradually relaxed. This approach could be efficiently used in previous chapters) before training (test) K&M Zero Local random models Figure 4.12: Directed accuracy on all six languages. Importantly, these papers achieved state-of-the-art results on their tasks and unlabeled data and the verbs are allowed (for instance) to select the cardinality of discrete structures, like matchings on weighted graphs (McDonald et al., 1993) (35 tag types, 3.39 bits). The Bulgarian,
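The samples on the last few slides were produced by repeatedly drawing the next word from the model's conditional distribution. A hedged sketch of such a generator for the bigram case (function names and the toy corpus are invented for illustration):

```python
# Sketch of an n-gram text generator (bigram case). Sentence padding with
# <s>/</s> guarantees every observed word has an observed continuation.
import random
from collections import Counter, defaultdict

def train_bigram(corpus):
    """counts[prev][w] = how often w followed prev (with sentence padding)."""
    counts = defaultdict(Counter)
    for sent in corpus:
        for prev, w in zip(["<s>"] + sent, sent + ["</s>"]):
            counts[prev][w] += 1
    return counts

def generate(counts, max_len=20, seed=None):
    """Sample words one at a time from p(w | previous word)."""
    rng = random.Random(seed)
    word, out = "<s>", []
    while len(out) < max_len:
        nxt = counts[word]
        word = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        if word == "</s>":
            break
        out.append(word)
    return out

corpus = [["he", "briefed", "reporters"], ["he", "briefed", "them"]]
print(" ".join(generate(train_bigram(corpus), seed=0)))
```

With a real corpus the output looks like the bigram sample above: locally plausible word pairs, little global coherence.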
What’s in a word?
• Is punctuation a word?
– Does knowing the last “word” is a “,” help?
• In speech:
– I do uh main- mainly business processing
– Is “uh” a word?
For Thought
• Do N-Gram models “know” English?
• Unknown words
• N-gram models and finite-state automata
Starting and Stopping
Unigram model:
...
Bigram model:
...
Trigram model:
...
Evaluation
Which model is better?
• Can I get a number about how good my model is for a test set?
• What is P(test set | model)?
• We measure this by perplexity
• Perplexity is the inverse probability of the test set, normalized by the number of words
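Concretely, perplexity is P(test set) raised to the power −1/N, where N is the number of words; equivalently, 2 raised to the average negative log2 probability per word. A minimal sketch (the helper name is invented), computed in log space to avoid underflow:

```python
# Perplexity = P(test)^(-1/N) = 2^(average negative log2 probability per word).
import math

def perplexity(log2_probs):
    """log2_probs: log2 p(w_i | context) for every word of the test set."""
    avg_neg_ll = -sum(log2_probs) / len(log2_probs)
    return 2 ** avg_neg_ll

# A model that gives every word probability 1/4 has perplexity 4 --
# the "average branching factor" reading of perplexity:
assert perplexity([math.log2(0.25)] * 100) == 4.0
```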
Perplexity
Perplexity of different models
• Better models have lower perplexity
– WSJ: Unigram 962; Bigram 170; Trigram 109
• Different tasks have different perplexity
– WSJ (109) vs. Bus Information Queries (~25)
• The higher the conditional probability, the lower the perplexity
• Perplexity is the average branching factor
What about open-class words?
• What is the probability of unseen words?
– (Naïve answer is 0.0)
• But that’s not what you want
– Test set will usually include words not in training
• What is the probability of
– P(Nebuchadnezzur | son of)
LM smoothing
• Laplace or add-one smoothing
– Add one to all counts
– Or add “epsilon” to all counts
– You still need to know all your vocabulary
• Have an OOV word in your vocabulary
– The probability of seeing an unseen word
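A hedged sketch of add-λ smoothing for unigrams (λ = 1 gives Laplace/add-one), with an explicit OOV entry; the function name and toy data are invented:

```python
# Add-lambda smoothing: p(w) = (count(w) + lam) / (total + lam * V).
# An <OOV> entry reserves probability mass for unseen words -- but note
# you still need to fix the vocabulary (and hence V) in advance.
from collections import Counter

def add_lambda_unigram(corpus, lam=1.0):
    counts = Counter(w for sent in corpus for w in sent)
    counts["<OOV>"] += 0                 # explicit out-of-vocabulary entry
    total = sum(counts.values())
    V = len(counts)
    def p(w):
        key = w if w in counts else "<OOV>"
        return (counts[key] + lam) / (total + lam * V)
    return p

p = add_lambda_unigram([["a", "a", "b"]])
assert p("zzz") > 0                      # unseen words now get probability mass
assert abs(p("a") - (2 + 1) / (3 + 3)) < 1e-12
```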
Good-Turing Smoothing
• Good (1953), from Turing.
– Use the count of things you’ve seen once to estimate the count of things you’ve never seen.
• Calculate the frequency of frequencies of N-grams
– Count of N-grams that appear 1 time
– Count of N-grams that appear 2 times
– Count of N-grams that appear 3 times
– …
– Estimate: c* = (c + 1) · N_{c+1} / N_c
• Change the counts a little so we get a better estimate for count 0
Good-Turing’s Discounted Counts

     AP Newswire bigrams           Berkeley Restaurants bigrams    Smith Thesis bigrams
c    N_c             c*            N_c         c*                  N_c      c*
0    74,671,100,000  0.0000270     2,081,496   0.002553            x        38,048 / x
1    2,018,046       0.446         5,315       0.533960            38,048   0.21147
2    449,721         1.26          1,419       1.357294            4,032    1.05071
3    188,933         2.24          642         2.373832            1,409    2.12633
4    105,668         3.24          381         4.081365            749      2.63685
5    68,379          4.22          311         3.781350            395      3.91899
6    48,190          5.19          196         4.500000            258      4.42248
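The adjusted counts in the table can be reproduced directly from c* = (c + 1) · N_{c+1} / N_c. A small check against the AP Newswire column (the frequency-of-frequencies dictionary below just restates the table; the function name is invented):

```python
# Good-Turing adjusted counts: c* = (c + 1) * N_{c+1} / N_c.
def good_turing_counts(freq_of_freqs, max_c):
    """freq_of_freqs: {c: N_c}. Returns {c: c*} for c = 0..max_c."""
    return {c: (c + 1) * freq_of_freqs.get(c + 1, 0) / freq_of_freqs[c]
            for c in range(max_c + 1)}

ap = {0: 74_671_100_000, 1: 2_018_046, 2: 449_721, 3: 188_933,
      4: 105_668, 5: 68_379, 6: 48_190}
cstar = good_turing_counts(ap, 5)
assert round(cstar[1], 3) == 0.446   # matches the c = 1 row of the table
```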
Backoff
• If no trigram, use bigram
• If no bigram, use unigram
• If no unigram … smooth the unigrams
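A hedged sketch of this trigram → bigram → unigram fallback in the "stupid" backoff style (a fixed penalty alpha at each step instead of proper discounting; all counts and names here are toy examples):

```python
# Stupid backoff: use the highest-order estimate available, multiplying by a
# fixed penalty alpha for each level we had to back off.
def stupid_backoff(trigrams, bigrams, unigrams, total, alpha=0.4):
    def score(w, u=None, v=None):
        if u is not None and v is not None and (u, v, w) in trigrams:
            return trigrams[(u, v, w)] / bigrams[(u, v)]        # trigram seen
        if v is not None and (v, w) in bigrams:
            return alpha * bigrams[(v, w)] / unigrams[v]        # back off once
        return alpha * alpha * unigrams.get(w, 0) / total       # back off twice
    return score

uni = {"a": 2, "b": 2, "c": 1}
bi = {("a", "b"): 2, ("b", "c"): 1}
tri = {("a", "b", "c"): 1}
s = stupid_backoff(tri, bi, uni, total=5)
assert s("c", "a", "b") == 1 / 2          # trigram seen: use it directly
assert s("c", None, "b") == 0.4 * 1 / 2   # no trigram: penalized bigram
```

Note that these scores are not normalized probabilities (hence "stupid"); proper schemes like Katz backoff redistribute discounted mass instead.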
Estimating p(w | history)
• Relative frequencies (count & normalize)
• Transform the counts:
– Laplace / “add one” / “add λ”
– Good-Turing discounting
• Interpolate or “back off”:
– With Good-Turing discounting: Katz backoff
– “Stupid” backoff
– Absolute discounting: Kneser-Ney
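Interpolation, the main alternative to backoff above, deserves a one-line sketch: instead of falling back only when a count is missing, mix all orders with weights λ that sum to 1 (the weights here are arbitrary; in practice they are tuned on held-out data):

```python
# Linear interpolation of trigram, bigram, and unigram estimates.
def interpolate(p3, p2, p1, lambdas=(0.6, 0.3, 0.1)):
    l3, l2, l1 = lambdas
    assert abs(l3 + l2 + l1 - 1.0) < 1e-9   # weights must sum to 1
    return l3 * p3 + l2 * p2 + l1 * p1

# Unlike backoff, every order contributes even when the trigram was seen:
assert interpolate(0.5, 0.2, 0.1) == 0.6 * 0.5 + 0.3 * 0.2 + 0.1 * 0.1
```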