HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY (Trường Đại học Bách Khoa Hà Nội)
SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY

UNDERGRADUATE GRADUATION THESIS
MAJOR: INFORMATION TECHNOLOGY

APPLYING THE HIDDEN MARKOV MODEL TO RESOLVE AMBIGUITY IN LINK GRAMMAR

Student: Phạm Hùng, class KHMT (Computer Science), K52
Supervisor: ThS. Nguyễn Thị Thu Hương

HÀ NỘI 6-2012

DoAn HungP Final


Student: Phạm Hùng, K52, class KHMT (Computer Science)

GRADUATION THESIS ASSIGNMENT SHEET

1. Student information
Full name: Phạm Hùng
Contact phone: 01699534074   Email: [email protected]
Class: Computer Science K52, Faculty of Information Technology   Program: full-time undergraduate
The thesis was carried out at: Hanoi University of Science and Technology
Thesis period: from 1/3/2012 to 31/5/2012

2. Purpose of the thesis
Applying the hidden Markov model to resolve ambiguity in link grammar.

3. Specific tasks of the thesis
- Study the context-free grammar model and link grammar.
- Study the hidden Markov model and apply several of its algorithms to resolve ambiguity in link grammar parsing.

4. Student's declaration
I, Phạm Hùng, declare that this thesis is my own research, carried out under the supervision of ThS Nguyễn Thị Thu Hương. The results presented in the thesis are truthful and are not copied in full from any other work.
Hà Nội, 26 May 2012
Thesis author: Phạm Hùng

5. Supervisor's confirmation of the degree of completion of the thesis and permission to defend
Hà Nội, 2012
Supervisor: ThS Nguyễn Thị Thu Hương

SUMMARY OF THE GRADUATION THESIS

Parsing is a central problem in natural language processing. It is the process of analyzing a sequence of words to determine its syntactic structure with respect to a given grammar. Under the context-free grammar model, an ambiguous grammar produces several different parse trees for a single input sentence, so the question is which parse tree to return as the result of the analysis. The approach taken in this thesis builds on the probabilistic context-free grammar model, applied to link grammar, to assign a probability to each parse tree obtained. The most suitable parse tree can then be selected.

The first chapter introduces the statistical approach to parsing, including the notions of context-free grammar, probabilistic context-free grammar and the parsing problem. The second chapter covers parsing under the link grammar model.
That chapter introduces link grammar (Link Grammar), the definition of disjunctive form, and the construction of connector expressions. The third chapter presents the hidden Markov model and its algorithms. The final chapter applies the inside probability algorithm and the Viterbi algorithm to disambiguation in link grammar parsing.

ACKNOWLEDGEMENTS

First of all, I would like to express my sincere and deep gratitude to the teachers of Hanoi University of Science and Technology in general, and to the teachers of the School of Information and Communication Technology and the Department of Computer Science in particular, who have taught me with dedication and passed on valuable knowledge and experience during my 5 years of study and training at the university.

I would like to thank ThS Nguyễn Thị Thu Hương, lecturer at the Department of Computer Science, School of Information and Communication Technology, Hanoi University of Science and Technology, who wholeheartedly helped, guided and instructed me throughout the work on this thesis.

Finally, I would like to sincerely thank my family and friends, who cared for me, encouraged me, offered suggestions and helped me during my studies, my research and the completion of this thesis.

Hà Nội, 26 May 2012
Phạm Hùng

TABLE OF CONTENTS

GRADUATION THESIS ASSIGNMENT SHEET
SUMMARY OF THE GRADUATION THESIS
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1: THE PROBABILISTIC CONTEXT-FREE GRAMMAR MODEL AND THE PARSING PROBLEM
  1.1. Grammars
  1.2. Context-free grammars
    1.2.1 Derivations and languages
    1.2.2 Derivation trees
    1.2.3. Ambiguous grammars
  1.4. Probabilistic context-free grammars (PCFG)
    1.4.1 Definition
    1.4.2 An example of a probabilistic context-free grammar
  1.5. The parsing problem for context-free grammars
CHAPTER 2: THE LINK GRAMMAR MODEL
  2.1 Introduction to link grammar
  2.2. Definitions and notation
    2.2.1. Simple link grammars
    2.2.2. Disjunctive form
  2.3. A link grammar for Vietnamese
    2.3.1. Nouns and noun phrases
    2.3.2. Verbs and verb phrases
    2.3.3. Adjectives and adjective phrases
    2.3.4. Links between noun phrases, verb phrases and adjective phrases
    2.3.5. Other sentence structures
    2.3.6. Coordinating conjunctions
CHAPTER 3: THE HIDDEN MARKOV MODEL IN LINK GRAMMAR
  3.1. The hidden Markov model
    3.1.1. The forward algorithm
    3.1.2. The backward algorithm
    3.1.3. The Viterbi algorithm
  3.2. The inside-outside algorithm
    3.2.1. Theoretical basis
    3.2.2. The inside-outside algorithm for probabilistic context-free grammars
CHAPTER 4: APPLICATION TO LINK GRAMMAR PARSING
  4.1. The parsing problem
  4.2. Disambiguation in context-free grammar parsing
    4.2.1 The inside probability algorithm for the recognition problem
    4.2.2. The Viterbi algorithm for finding the best parse of a sentence
  4.3. Disambiguation in link grammar parsing
    4.3.1. The recognition problem for link grammar
    4.3.2. Finding the best tree for a sentence in link grammar
  4.4. Dependence on the treebank
CONCLUSION
REFERENCES

LIST OF FIGURES

Figure 1.1 An example of a syntax tree
Figure 1.2 An example of a derivation tree of a grammar
Figure 1.3 Joining derivation trees
Figure 1.4 Different derivation trees for the same input string
Figure 1.5 Parse tree 1
Figure 1.6 Parse tree 2
Figure 2.1 Linking requirements
Figure 2.2 A grammatical sentence
Figure 2.3 An ungrammatical sentence
Figure 2.4 Linkage for the sentence "the cat chased a snake"
Figure 2.5 Linkage for the sentence "the big snake the black cat chased bit Mary"
Figure 3.1 The hidden Markov model
Figure 3.2 The Markov model according to the formal definition
Figure 3.3 Parse tree of the sentence "Stuart loves his thesis and Barbara does too"
Figure 3.4 Computing the inside probability
Figure 3.5 Definition of the outside probability
Figure 3.6 Computing the outside probability
Figure 4.1. Computing the inside probability in the context-free grammar model
Figure 4.2 The recognition algorithm
Figure 4.3. Rule probabilities of an English PCFG, estimated fairly accurately

LIST OF TABLES

Table 1: An example of a dictionary
Table 2: A more complex dictionary
Table 3: The noun and its modifiers
Table 4: Modifiers preceding the verb
LIST OF ABBREVIATIONS

Abbreviation   Full form                             Meaning
CYK            Cocke-Younger-Kasami                  The CYK algorithm
CNF            Chomsky normal form                   Chomsky normal form
VPLK           Văn phạm liên kết                     Link grammar
PCFG           Probabilistic context-free grammar    Probabilistic context-free grammar
HMM            Hidden Markov Model                   Hidden Markov model

CHAPTER 1: THE PROBABILISTIC CONTEXT-FREE GRAMMAR MODEL AND THE PARSING PROBLEM

Parsing is a very important stage of natural language processing. Given an input sequence of words (the output of tokenization, that is, of a word segmentation program), the parser decides whether the sequence is syntactically correct and, if it is, produces an analysis that exposes its grammatical structure. In other words, parsing (syntactic analysis) is the process of analyzing an input string in order to produce its grammatical structure with respect to some grammar. This structure is usually given as a syntax tree, because a tree makes the dependencies between the components of the word sequence directly visible.

This part of the thesis introduces the grammar used throughout the thesis, the probabilistic context-free grammar. To make that notion clear, we first need the basic notions of a grammar and of a context-free grammar. Chapter 1 then discusses the statistical approach to the parsing problem.

Figure 1.1 An example of a syntax tree

1.1. Grammars

A grammar can be pictured as an "automatic device" able to generate a set of words over a given alphabet. Each word is generated after a finite number of applications of the rules of the grammar.

A grammar G is an ordered tuple of 4 components, G = <Σ, Δ, S, P>, where:
- Σ is an alphabet, called the basic alphabet (or terminal alphabet); each of its elements is called a terminal symbol or basic symbol.
- Δ is an alphabet with Δ ∩ Σ = ∅, called the auxiliary alphabet (or non-terminal alphabet); each of its elements is called a non-terminal symbol or auxiliary symbol.
- S ∈ Δ is called the start symbol or axiom.
- P is the set of production rules of the form α → β, where α is called the left-hand side and β the right-hand side of the rule, with α, β ∈ (Σ ∪ Δ)* and α containing at least one non-terminal symbol:
  P = { α → β | α = α'Aα'', with A ∈ Δ and α', α'', β ∈ (Σ ∪ Δ)* }

For example, with Σ = {0, 1} and Δ = {S, A, B}, the rules S → 0S1A, 0AB → 1A1B, A → ε, ... are valid because their left-hand sides always contain at least one auxiliary symbol from Δ.
Rules such as 0 → A or 01 → 0B, on the other hand, are not valid.

An example of a grammar G = <Σ, Δ, S, P>:
Σ = {tôi, anh, chị, ăn, uống, cơm, phở, sữa, café},
Δ = {<sentence>, <subject>, <predicate>, <verb1>, <verb2>, <noun1>, <noun2>},
S = <sentence>,
P = { <sentence> → <subject><predicate>, <subject> → tôi, <subject> → anh, <subject> → chị, <predicate> → <verb1><noun1>, <predicate> → <verb2><noun2>, <verb1> → ăn, <verb2> → uống, <noun1> → cơm, <noun1> → phở, <noun2> → sữa, <noun2> → café }.

1.2. Context-free grammars

A grammar G = <Σ, Δ, S, P> whose rules all have the form A → ω, where A ∈ Δ and ω ∈ (Σ ∪ Δ)*, is called a type-2 grammar or context-free grammar.

A context-free grammar (CFG) is a system of four components, written G(Σ, Δ, S, P), where:
- Δ is a finite set of variables (non-terminal symbols).
- Σ is a finite set of terminal symbols, with Δ ∩ Σ = ∅.
- P is a finite set of productions, each of the form A → α, where A is a variable and α is a string of symbols from (Σ ∪ Δ)*.
- S is a distinguished variable called the start symbol of the grammar.

An example of a context-free grammar: the grammar G({a, b}, {S, A, B}, S, P), where P consists of the productions
S → AB
A → aA
A → a
B → bB
B → b

Notational conventions:
- The capital letters A, B, C, D, E, ... and S denote variables (S is usually the start symbol).
- The lowercase letters a, b, c, d, e, ..., digits and a few other symbols denote terminal symbols.

1.2.1 Derivations and languages

To define the language generated by a CFG G(Σ, Δ, S, P), we introduce the notion of derivation. We first define two relations, ⇒G and ⇒*G, between strings in (Σ ∪ Δ)*. If A → β is a production of the grammar and α, γ are any two strings in (Σ ∪ Δ)*, then αAγ ⇒G αβγ; we also say that the production A → β is applied to the string αAγ to obtain the string αβγ, that is, αAγ directly derives αβγ in G. Two strings are related by ⇒G if the second is obtained from the first by applying some production.

Let α1, α2, ..., αm be strings in (Σ ∪ Δ)* with m ≥ 1 such that
α1 ⇒G α2, α2 ⇒G α3, ..., αm-1 ⇒G αm.
We then write α1 ⇒*G αm and say that α1 derives αm in G.

The language generated by a context-free grammar is the set of all terminal strings the grammar generates. For a grammar G(Σ, Δ, S, P) we define
L(G) = { w | w ∈ Σ* and S ⇒*G w }
That is, a string belongs to L(G) if:
1) it consists only of terminal symbols;
2) it is derived from the start symbol S.

A language L is called context-free (CFL) if it is L(G) for some CFG G. A string α of terminals and variables is called a sentential form of G if S ⇒* α.
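To make the definition of L(G) concrete, the sketch below enumerates the short strings of L(G) by breadth-first leftmost derivation. It assumes the example grammar of this section is S → AB, A → aA | a, B → bB | b; the function name `language` and the length bound `max_len` are illustrative choices, not part of the thesis.

```python
from collections import deque

# Assumed reading of the example grammar: S -> AB, A -> aA | a, B -> bB | b
RULES = {"S": ["AB"], "A": ["aA", "a"], "B": ["bB", "b"]}


def language(rules, start="S", max_len=4):
    """Return all terminal strings of length <= max_len derivable from start.

    Symbols are single characters; a character is a variable if it is a key
    of `rules`. No rule has an empty right-hand side, so sentential forms
    never shrink and forms longer than max_len can be discarded safely.
    """
    seen, words = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # position of the leftmost variable in the sentential form, if any
        idx = next((i for i, c in enumerate(form) if c in rules), None)
        if idx is None:
            words.add(form)  # only terminals left: a string of L(G)
            continue
        for rhs in rules[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(words, key=lambda w: (len(w), w))


print(language(RULES))  # ['ab', 'aab', 'abb', 'aaab', 'aabb', 'abbb']
```

Under that assumption every generated string is a block of a's followed by a block of b's, which is what the two conditions on L(G) predict for this grammar.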
Two grammars G1 and G2 are called equivalent if L(G1) = L(G2).

1.2.2 Derivation trees

To visualize how strings are generated by a context-free grammar, a derivation is usually drawn as a tree. Formally, let G(Σ, Δ, S, P) be a grammar. A derivation tree (or parse tree) of G is defined as follows:
- Every node (vertex) has a label, which is a symbol in (Σ ∪ Δ ∪ {ε}).
- The root is labeled with the start symbol S.
- If an interior node has label A, then A ∈ Δ.
- If node n has label A and the nodes n1, n2, ..., nk are the children of n from left to right, with labels X1, X2, ..., Xk respectively, then A → X1X2...Xk is a production in P.
- If a node n is labeled with the empty word ε, then n must be a leaf and the only child of its parent.

Consider the grammar G({a, b}, {S, A}, S, P), where P consists of:
S → aAS | a
A → SbA | SS | ba
A derivation tree of this grammar is shown in the following figure.

Figure 1.2 An example of a derivation tree of a grammar

Node 1 has label S and its children are, in order, a, A, S (note that S → aAS is a production). Likewise, node 3 has label A and its children are S, b, A (from the production A → SbA). Nodes 4 and 5 both have label S and a single child labeled a (production S → a). Finally, node 7 has label A and children b, a (production A → ba).

Reading the leaves of a derivation tree from left to right gives a sentential form of G. We call this string the yield of the derivation tree. For the tree above we have S ⇒* aabbaa by the derivation
S ⇒ aAS ⇒ aSbAS ⇒ aabAS ⇒ aabbaS ⇒ aabbaa

The relationship between derivations and derivation trees is as follows: if G(Σ, Δ, S, P) is a context-free grammar, then S ⇒* α if and only if there is a derivation tree of the grammar with yield α.

In the example above, the first step of the derivation is S → aAS. Following the later steps, the variable A is replaced by SbA, then becomes abA and finally abba, which is the yield of the tree T2 (the A-tree). The variable S is replaced by a, which is the yield of the tree T3 (the S-tree). Joining them gives the derivation tree whose yield is the string aabbaa, as shown below.

Figure 1.3 Joining derivation trees

1.2.3. Ambiguous grammars

A context-free grammar G is called ambiguous if there exists a string w in L(G) that has at least two different derivation trees.
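Whether a particular string is ambiguous can be checked by counting its leftmost derivations, since each leftmost derivation corresponds to exactly one derivation tree. The following is a minimal sketch under stated assumptions: symbols are single characters, no right-hand side is empty, and there are no unit cycles. `count_parse_trees` and the grammar `TOY_AMBIGUOUS` are hypothetical names introduced only for illustration; the second grammar is not from the thesis.

```python
def count_parse_trees(rules, target, start="S"):
    """Count the leftmost derivations of `target` (one per derivation tree).

    Assumes single-character symbols, no empty right-hand sides and no unit
    cycles, so sentential forms never shrink and the search terminates.
    """
    def expand(form):
        # position of the leftmost variable, if any
        idx = next((i for i, c in enumerate(form) if c in rules), None)
        if idx is None:
            return 1 if form == target else 0
        # prune forms that are too long or whose terminal prefix already differs
        if len(form) > len(target) or form[:idx] != target[:idx]:
            return 0
        return sum(
            expand(form[:idx] + rhs + form[idx + 1:]) for rhs in rules[form[idx]]
        )

    return expand(start)


# Grammar of Section 1.2.2: S -> aAS | a, A -> SbA | SS | ba
SECTION_1_2_2 = {"S": ["aAS", "a"], "A": ["SbA", "SS", "ba"]}
# Hypothetical ambiguous grammar (not from the thesis): E -> E+E | a
TOY_AMBIGUOUS = {"E": ["E+E", "a"]}

print(count_parse_trees(SECTION_1_2_2, "aabbaa"))            # 1
print(count_parse_trees(TOY_AMBIGUOUS, "a+a+a", start="E"))  # 2
```

The string aabbaa from Section 1.2.2 has a single tree, while a+a+a has two under the toy grammar, which is exactly the situation the definition of ambiguity describes.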
Consider the grammar G(Σ, Δ, S, P) with
Σ = {astronomers, ears, saw, stars, telescopes, with}
Δ = {S, NP, VP, PP, P, V}
where S is the start symbol and P is the rule set. This grammar yields 2 different derivation trees for the same input sentence, "astronomers saw stars with ears".

Figure 1.4 Different derivation trees for the same input string

1.4. Probabilistic context-free grammars (PCFG)

1.4.1 Definition

A probabilistic context-free grammar is a 4-tuple G = {E, Δ, S, R}:
- Δ: the set of non-terminal symbols (variables).
- E: the set of terminal symbols (disjoint from Δ).
- R: the set of rules, or productions, of the form A → β, where A is a non-terminal symbol and β is a finite string of symbols from the infinite set (E ∪ Δ)*.
- S: the start symbol.
- In addition, every rule in R has an associated probability P: a number in the interval [0, 1] that represents the probability P(β | A) of the rule A → β.

This simple probabilistic model is used to resolve ambiguity. Among all parse trees of a sentence S (the trees whose yield is S), the tree selected is the one with the highest probability. The probability of a parse tree T for a sentence S is the product of the probabilities of all rules used in T, that is, of the n rules LHS_i → RHS_i (LHS: left-hand side, RHS: right-hand side) used to expand the n interior nodes of the tree:

$$P(T, S) = \prod_{i=1}^{n} P(\mathrm{RHS}_i \mid \mathrm{LHS}_i)$$

The selected tree is the most probable one:

$$\hat{T}(S) = \arg\max_{T} P(T \mid S) = \arg\max_{T} \frac{P(T, S)}{P(S)} = \arg\max_{T} P(T, S)$$

The probability of a sentence W is the sum of the probabilities of all its possible trees:

$$P(W) = \sum_{T} P(T, W)$$

The probability of each rule A → β is estimated from a treebank by maximum likelihood estimation:

$$P(\beta \mid A) = \frac{C(\beta \mid A)}{C(A)}$$

where
- C(β | A) is the number of rules of the form A → β;
- C(A) is the number of rules in R that have the non-terminal symbol A on the left-hand side.

Therefore:

$$\sum_{\beta} P(\beta \mid A) = 1$$
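The relative-frequency estimate can be sketched in a few lines. The rule occurrences below are invented purely for illustration (they do not come from any treebank used in the thesis), and `estimate_rule_probabilities` is a hypothetical helper name.

```python
from collections import Counter


def estimate_rule_probabilities(rule_occurrences):
    """Relative-frequency estimate P(beta | A) = C(A -> beta) / C(A).

    `rule_occurrences` is a list of (lhs, rhs) pairs, one pair for every
    time a rule is used in the trees of the treebank.
    """
    rule_counts = Counter(rule_occurrences)
    lhs_counts = Counter(lhs for lhs, _ in rule_occurrences)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Hypothetical rule occurrences read off a tiny, made-up treebank
occurrences = [("VP", "V NP")] * 3 + [("VP", "VP PP")] + [("PP", "P NP")] * 2
probs = estimate_rule_probabilities(occurrences)

print(probs[("VP", "V NP")])   # 0.75
print(probs[("VP", "VP PP")])  # 0.25
print(probs[("PP", "P NP")])   # 1.0
```

By construction the estimates for rules sharing a left-hand side sum to 1, which is the normalization property stated above.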

That is, for every non-terminal symbol A, the probabilities of all rules of the form A → β in the rule set R sum to 1.

The rules of the context-free grammar are usually brought into Chomsky normal form. Chomsky normal form requires every rule to have the form A → BC or A → β, where A, B, C are non-terminal symbols in Δ and β is a terminal symbol in E.

1.4.2 An example of a probabilistic context-free grammar

Consider a context-free grammar G = {E, Δ, S, R}, where
- Δ: the set of non-terminal symbols, Δ = {S, NP, VP, PP, N, V, P}
- E: the set of terminal symbols (disjoint from Δ), E = {Tôi, xe, nhìn, chiếc, với, ống nhòm}
- R = {S → NP VP, NP → Tôi, NP → N N, N → chiếc, NP → NP PP, N → xe, N → ống nhòm, VP → V NP, VP → VP PP, PP → P NP, V → nhìn, P → với}
- S: the start symbol.

The rules in R satisfy the property

$$\sum_{\beta} P(\beta \mid A) = 1$$

Accordingly, we assign a probability to each rule:

P(S → NP VP) = 1
P(NP → Tôi) = 1/3
P(NP → N N) = 1/3
P(NP → NP PP) = 1/3
P(N → chiếc) = 1/3
P(N → xe) = 1/3
P(N → ống nhòm) = 1/3
P(VP → V NP) = 1/2
P(VP → VP PP) = 1/2
P(PP → P NP) = 1
P(V → nhìn) = 1
P(P → với) = 1

Consider the sentence "Tôi nhìn chiếc xe với chiếc ống nhòm" ("I look at the car with the binoculars"). Because of ambiguity, this sentence has 2 parse trees.

Figure 1.5 Parse tree 1

For tree T1, the probability of the tree is the product of the probabilities of all rules used in it:

$$P(T_1) = \prod_{(A \to \beta) \in T_1} P(\beta \mid A)$$

Figure 1.6 Parse tree 2

For tree T2, the probability is likewise the product of the probabilities of all rules used in it:

$$P(T_2) = \prod_{(A \to \beta) \in T_2} P(\beta \mid A)$$
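The two products can be evaluated directly from the rule probabilities of Section 1.4.2. The sketch below builds both attachments of the prepositional phrase "với chiếc ống nhòm" as nested tuples and multiplies the rule probabilities. Which attachment corresponds to Figure 1.5 and which to Figure 1.6 is an assumption here; the tree encoding and the names `RULE_PROB`, `tree_probability`, `vp_attachment` and `np_attachment` are illustrative only.

```python
from fractions import Fraction

third, half, one = Fraction(1, 3), Fraction(1, 2), Fraction(1)

# Rule probabilities from Section 1.4.2, keyed by (left-hand side, right-hand side)
RULE_PROB = {
    ("S", ("NP", "VP")): one,
    ("NP", ("Tôi",)): third,
    ("NP", ("N", "N")): third,
    ("NP", ("NP", "PP")): third,
    ("N", ("chiếc",)): third,
    ("N", ("xe",)): third,
    ("N", ("ống nhòm",)): third,
    ("VP", ("V", "NP")): half,
    ("VP", ("VP", "PP")): half,
    ("PP", ("P", "NP")): one,
    ("V", ("nhìn",)): one,
    ("P", ("với",)): one,
}


def tree_probability(tree):
    """Multiply the probabilities of every rule used in a tree.

    A tree is a tuple (label, child, ...); a child is either another tree
    or a plain string standing for a terminal word.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = RULE_PROB[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_probability(child)
    return prob


car = ("NP", ("N", "chiếc"), ("N", "xe"))
binoculars = ("NP", ("N", "chiếc"), ("N", "ống nhòm"))
pp = ("PP", ("P", "với"), binoculars)

# Assumed reading: the PP attaches to the verb phrase (look ... with binoculars)
vp_attachment = ("S", ("NP", "Tôi"), ("VP", ("VP", ("V", "nhìn"), car), pp))
# Assumed reading: the PP attaches to the noun phrase (the car with binoculars)
np_attachment = ("S", ("NP", "Tôi"), ("VP", ("V", "nhìn"), ("NP", car, pp)))

print(tree_probability(vp_attachment))  # 1/8748
print(tree_probability(np_attachment))  # 1/13122
```

With these rule probabilities the verb-phrase attachment is the more probable tree, so if T1 denotes the tree with the higher probability it would be the verb-phrase attachment.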

We observe that P(T1) > P(T2), so the parse tree T1 is chosen: T1 is the better and more acceptable parse tree than T2.

1.5. The parsing problem for context-free grammars

The parser receives a sequence of tokens from the lexical analyzer and verifies that this sequence can be generated by the grammar by building a parse tree for it. Parsing checks whether the input string is recognized by the grammar under consideration. If it is, the parser builds a parse tree according to the rules of the grammar. If the parse tree is unique, the choice is trivial. Because of ambiguity (which has many causes), however, we generally obtain more than one parse tree.

There are many approaches to the parsing problem, for example:
- in terms of method: top-down parsing, bottom-up parsing, the CYK algorithm, the Earley algorithm;
- in terms of grammar: probabilistic context-free grammar, link grammar (Link Grammar).

With the methods listed above, when a sentence has several analyses, that is, several derivation trees, we cannot tell which parse tree is the most accurate. Choosing the parse tree matters a great deal, because knowing the grammatical relations between the words of the sequence helps later processing steps, such as semantic processing, to achieve good results. The larger the grammar, the more often such ambiguity occurs. The basic models do not take the co-occurrence of words into account and therefore cannot handle ambiguous cases. The parser has to select the most plausible syntax tree on the basis of semantic and contextual information. For this reason, a number of parsing methods based on statistical probability have been proposed to solve the problem.

In this thesis we use link grammar (VPLK) combined with a statistical, probability-based model to select a suitable parse tree. Link grammar (Link Grammar) [1] was introduced by Daniel D.K. Sleator and Davy Temperley in 1991. Compared with earlier methods, link grammar makes modeling natural language easier and more intuitive.

CHAPTER 2: THE LINK GRAMMAR MODEL

2.1 Introduction to link grammar

Most sentences of natural languages have the property that if arcs are drawn connecting each pair of words that relate to each other, the arcs do not cross. This phenomenon, which we call planarity, is the basis of link grammars, a formal language system.
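The no-crossing property is easy to test once each arc is written as a pair of word positions. The following sketch assumes that encoding; `links_cross` and `is_planar` are hypothetical helper names, and the two sets of arcs are made-up examples over word positions 0 to 4.

```python
from itertools import combinations


def links_cross(first, second):
    """Return True if two arcs drawn above the sentence would intersect.

    Each arc is a pair of word positions; two arcs cross exactly when one
    endpoint of the second lies strictly inside the first and the other
    lies strictly outside it.
    """
    (i, j), (k, l) = sorted(first), sorted(second)
    return i < k < j < l or k < i < l < j


def is_planar(links):
    """Return True if no two arcs in `links` cross."""
    return not any(links_cross(a, b) for a, b in combinations(links, 2))


print(is_planar([(0, 1), (1, 2), (2, 4), (3, 4)]))  # True
print(is_planar([(0, 2), (1, 3)]))                  # False
```

Arcs that merely share an endpoint, or that nest inside one another, do not count as crossing; only interleaved arcs do.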
A link grammar consists of a set of words (the terminal symbols of the grammar), each of which has a linking requirement. A sequence of words is a sentence of the language defined by the grammar if there exists a way to draw arcs (which we call links) among the words so as to satisfy the following conditions:
- Planarity: the links do not cross (when drawn above the words).
- Connectivity: the links suffice to connect all the words of the sequence together.
- Satisfaction: the links satisfy the linking requirements of each word in the sequence.

The linking requirements of each word are stored in a dictionary. To illustrate linking requirements, the following diagram shows a simple dictionary for the words a, the, cat, snake, Mary, ran and chased. The linking requirement of each word is represented by the diagram above the word.

Figure 2.1 Linking requirements

Each of the intricately shaped labeled boxes is a connector. A connector is satisfied by plugging it into a compatible connector (as indicated by its shape). If the mating end of a connector is drawn facing to the right, then its mate must be to its right and facing to the left. Exactly one of the connectors attached to a given black dot must be satisfied (the others, if any, must not be used). Thus cat requires a D connector to its left, and either an O connector to its left or an S connector to its right. Plugging a pair of connectors together corresponds to drawing a link between that pair of words.

The following diagram shows how the linking requirements are satisfied in the sentence "The cat chased a snake".

Figure 2.2 A grammatical sentence

(The unused connectors have been suppressed here.) It is easy to see that "Mary chased the cat" and "the cat ran" are also sentences of this grammar. The sequence of words "the Mary chased cat" is not in this language. Any attempt to satisfy the linking requirements leads to a violation of one of the three rules, that is, to an ungrammatical sentence. Here is one such attempt:

Figure 2.3 An ungrammatical sentence

Similarly, "ran Mary" and "cat ran chased" are not in this language.

A set of links that proves that a sequence of words is in the language of a link grammar is called a linkage. From now on we use simpler diagrams to illustrate linkages. Here is the simplified form of the diagram showing that "the cat chased a snake" is part of this language:

Figure 2.4 Linkage for the sentence "the cat chased a snake"

There is a succinct, computer-readable notation for expressing the dictionary of linking requirements. The following dictionary encodes the linking requirements of the example above.
Words          Formula
a the          D+
snake cat      D- & (O- or S+)
Mary           O- or S+
ran            S-
chased         S- & O+

Table 1: An example of a dictionary

The linking requirement of each word is expressed as a formula involving the operators & and or, parentheses and connector names. The + or - suffix on a connector name indicates the direction (relative to the word being defined) in which the matching connector (if any) must lie. The & of two formulas is satisfied by satisfying both formulas. The or of two formulas requires that exactly one of the two formulas be satisfied. The order of the arguments of an & operator is significant: the farther left a connector is in the expression, the nearer the word to which it connects must be. Thus, when cat is used as an object, its determiner (to which it is connected with its D- connector) must be closer than the verb (to which it is connected with its O- connector).

The following dictionary describes a more complex link grammar. (The notation {exp} means that the expression exp is optional, and @A- means that one or more connectors of type A may be attached here.)

Words              Formula
a the              D+
snake cat          {@A-} & D- & {B+} & (O- or S+)
chased bit         S- & (O+ or B-)
ran                S-
big green black    A+
Mary               O- or S+

Table 2: A more complex dictionary

The sentence "the big snake the black cat chased bit Mary" is in the language defined by this grammar, and it has the following linkage:

Figure 2.5 Linkage for the sentence "the big snake the black cat chased bit Mary"

In this case (as in the previous one) there is a unique linkage that satisfies all the linking requirements.

2.2. Definitions and notation

2.2.1. Simple link grammars

The dictionary of a link grammar consists of a collection of entries, each of which defines the linking requirements of one or more words. These requirements are specified by a formula of connectors combined with the binary operators & and or. Without loss of generality we may assume that a connector is simply a character string ending in + or -.

When a link connects to a word, it is associated with one of the connectors in the formula of that word, and it is said to satisfy that connector. No two links may satisfy the same connector. The connectors at the two ends of a link must have names that match, and the one on the left must end in + and the one on the right must end in -. In a simple link grammar, two connectors match if and only if their strings are the same (not counting the final + or -). A more general form of matching is introduced later.

The connectors satisfied by the links must serve to satisfy the whole formula. We define what it means to satisfy a formula recursively. To satisfy the & of two formulas, both formulas must be satisfied.
To satisfy the or of two formulas, one of the two formulas must be satisfied, and no connector of the other formula may be satisfied. For convenience we sometimes use the empty formula "( )", which is satisfied by being connected to no links.

A sequence of words is a sentence of the language defined by the grammar if there exists a way to draw links among the words so as to satisfy the formula of each word and the following rules:
- Planarity: the links are drawn above the sentence and do not cross.
- Connectivity: the links suffice to connect all the words of the sequence together.
- Ordering: when the connectors of a formula are traversed from left to right, the words to which they connect proceed from near to far. In other words, consider a word and two links connecting that word to words on its left. The link connecting to the nearer word (the shorter link) must satisfy a connector that appears to the left (in the formula) of the connector satisfied by the other link. Similarly, a link to the right must satisfy a connector that appears to the left (in the formula) of the connector of any longer link to the right.
- Exclusion: no two links may connect the same pair of words.

2.2.2. Disjunctive form

Expressing a link grammar dictionary with formulas is convenient for building grammars of natural languages, but it is cumbersome for the mathematical analysis of link grammars and for describing link grammar parsing algorithms. We therefore introduce another way of expressing a link grammar, called disjunctive form.

In disjunctive form, each word of the grammar has a set of disjuncts associated with it. Each disjunct corresponds to one particular way of satisfying the requirements of a word. A disjunct consists of two ordered lists of connector names: the left list and the right list. The left list contains the connectors that link to the left of the current word (those ending in -), and the right list contains the connectors that link to the right of the current word (those ending in +). A disjunct is denoted:
((L1, L2, ..., Lm) (Rn, Rn-1, ..., R1))
where L1, L2, ..., Lm are the connectors that must link to the left and Rn, Rn-1, ..., R1 are the connectors that must link to the right. The number of connectors in either list may be 0. The trailing + or - may be omitted from the connector names in a disjunct, since the direction is implicit in the form of the disjunct.

To satisfy the linking requirements of a word, one of its disjuncts must be satisfied (and no link may attach to any other disjunct). To satisfy a disjunct, all of its connectors must be satisfied by appropriate links.
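The disjuncts of a word can be obtained by listing every way in which its formula can be satisfied. The sketch below does this for the formula of snake and cat in the second dictionary, {@A-} & D- & {B+} & (O- or S+), under the simplifying assumption that the multi-connector marker @ is ignored, so that {@A-} is treated as an optional single A-. The nested-tuple encoding of formulas and the names `optional`, `expand` and `to_disjunct` are illustrative, not part of the thesis or of any link grammar library.

```python
from itertools import product

EMPTY = ("and",)  # the empty formula "( )": satisfied by using no connector


def optional(formula):
    """{formula} is shorthand for (formula or ( ))."""
    return ("or", formula, EMPTY)


def expand(formula):
    """List every way to satisfy a formula, as tuples of connectors in formula order."""
    if isinstance(formula, str):  # a single connector such as "D-"
        return [(formula,)]
    op, *parts = formula
    if op == "or":  # exactly one alternative is used
        return [seq for part in parts for seq in expand(part)]
    # "and": pick one way for every part and concatenate them left to right
    return [sum(combo, ()) for combo in product(*(expand(p) for p in parts))]


def to_disjunct(connectors):
    """Split a connector sequence into the (left list, right list) of a disjunct.

    The left list keeps formula order (L1 ... Lm, nearest first); the right
    list is written farthest first (Rn ... R1), as in the notation above.
    """
    left = tuple(c[:-1] for c in connectors if c.endswith("-"))
    right = tuple(c[:-1] for c in connectors if c.endswith("+"))[::-1]
    return left, right


cat_formula = ("and", optional("A-"), "D-", optional("B+"), ("or", "O-", "S+"))
disjuncts = [to_disjunct(seq) for seq in expand(cat_formula)]
for left, right in disjuncts:
    print(left, right)
```

The formula expands to eight disjuncts, two choices for each of the optional A- and B+ and two for the final or.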
The words to which L1, L2, ... are linked lie to the left of the current word and are monotonically increasing in distance from the current word. The words to which R1, R2, ... are linked lie to the right of the current word and are monotonically increasing in distance from the current word.

It is easy to translate a link grammar in disjunctive form into one in standard form. This is done simply by rewriting each disjunct as
(L1 & L2 & ... & Lm & R1 & R2 & ... & Rn)
and combining all the disjuncts with the or operator to make an appropriate formula.

It is also easy to translate a formula into a set of disjuncts. This is done by enumerating all the ways in which the formula can be satisfied. For example, the formula
(A- or ( )) & D- & (B+ or ( )) & (O- or S+)
corresponds to the following 8 disjuncts:
((A,D) (S,B))
((A,D,O) (B))
((A,D) (S))
((A,D,O) ( ))
((D) (S,B))
((D,O) (B))
((D) (S))
((D,O) ( ))

2.3. A link grammar for Vietnamese

This section presents a link grammar for Vietnamese based on the syntactic characteristics of the language. The grammar discussed here is not identical to, nor as complete as, the grammar described in the dictionary used by the program. What follows only describes how the link grammar for Vietnamese is built and developed.

2.3.1. Nouns and noun phrases

Nouns and noun phrases are basic components of the sentence. They denote the things, events and objects the sentence talks about. Because of this important role, a noun can occupy several positions in the sentence (subject, object). Here we consider the noun together with its modifiers:

Position:   -3       -2           -1     head noun    1 to 6
Example:    Tất cả   những, các   cái    giường       ip, sắt, của tôi, mà tôi làm, ấy

Table 3: The noun and its modifiers

(-1), (-2), (-3) are the modifiers that precede the noun.
(1), (2), (3), (4), (5), (6) are the modifiers that follow the noun.
The common task of the modifiers is to supplement the meaning of the head noun.

2.3.1.1. The front part of the noun phrase

Third modifier (-1)
The noun has the connector TDT3- and the word cái has the connector TDT3+:
bàn, ghế: {TDT3-}
cái: TDT3+
With these rules, the following phrases are accepted:
- cái bàn, cái ghế

Second modifier (-2)
The noun has the connector TDT2-:
bàn, ghế: {TDT2-}
Combined with the connector built above, this gives:
bàn, ghế: {TDT3-} & {TDT2-}
Note that the connector TDT2 stands to the right of the connector TDT3 because the words of group 2 stand farther to the left of the noun than the words of group 3.
The words of group 2 have the connector TDT2+:
những, các, mọi, mỗi, từng: TDT2+
Thus the following phrases are accepted:
- những cái bàn, từng cái ghế

Numerals before the noun
The noun has the connector ST_DT-.
The numeral has the connector ST_DT+.
Note that a numeral standing before the noun cannot co-occur with the words of group 2 (những, các, mọi, mỗi, ...), so we have the connectors:
bàn, ghế: {TDT3-} & {TDT2- or ST_DT-}
một, hai, ba, bốn: ST_DT+
The following phrases are accepted:
- ba cái bàn, hai cái ghế
and the following phrases are not accepted:
* những ba cái bàn
* mỗi bốn cái ghế

First modifier (-3)
The noun has the connector TDT1-:
bàn, ghế: {TDT3-} & {TDT2- or ST_DT-} & {TDT1-}
The words in the group of first modifiers have the connector TDT1+:
tất cả, tất thảy, toàn bộ, toàn thể: TDT1+
These rules let us accept:
- tất cả những cái bàn
- toàn bộ mọi cái ghế
- tất cả ba cái bàn
- toàn bộ bốn cái ghế

2.3.1.2. Components after the noun

First post-noun component (1)
A secondary noun standing in the first position after the head noun has the connector SDT1-. The head noun gets the additional connector SDT1+ to link to this modifier. When a noun uses the connector SDT1-, it cannot link to the pre-noun and post-noun modifiers described above, so we have:
bàn, ghế, giường, lò xo: SDT1- or ({TDT3-} & {TDT2- or ST_DT-} & {TDT1-} & {SDT1+})
This rule accepts such a phrase and avoids the ambiguity in which the modifiers before and after the first noun are recognized as modifying the secondary noun that follows it (in the example, the word cái is not linked to lò xo).

Third post-noun component (3)
This modifier combines with the head noun through the relational words bằng or về, following the pattern N1 + bằng/về + N2. For this case we build the rules as follows.
The noun gets the additional connector SDT3+:
bàn, ghế, giường, lò xo: SDT1- or ({TDT3-} & {TDT2- or ST_DT-} & {TDT1-} & {SDT1+} & {SDT3+})
The words bằng and về have the connector SDT3-. To connect these relational words to the secondary noun that follows, we use a link named GT_DT, which connects a preposition to the noun after it.
The relational words therefore get the additional connector GT_DT+:
bằng, về: SDT3- & GT_DT+
The noun gets the additional connector GT_DT-:
bàn, ghế, giường, lò xo: SDT1- or ({TDT3-} & {TDT2- or ST_DT-} & {TDT1-} & {SDT1+} & {SDT3+} & {GT_DT-})

Fourth post-noun component (4)
This modifier combines with the head through the relational words của or ở: N1 + của/ở + N2. The noun gets the additional connector SDT4+:
bàn, ghế, giường, lò xo: SDT1- or ({TDT3-} & {TDT2- or ST_DT-} & {TDT1-} & {SDT1+} & {SDT3+} & {SDT4+} & {GT_DT-})
The relational words của and ở have the connectors SDT4- and GT_DT+:
của, ở: SDT4- & GT_DT+
When modifiers 3 and 4 occur together, there is an ambiguity: in the first reading the word của modifies gỗ, and in the second reading the word của modifies bàn.

Sixth post-noun component (6)
The noun has the connector SDT6+:
bàn, ghế, giường, lò xo: SDT1- or ({TDT3-} & {TDT2- or ST_DT-} & {TDT1-} & {SDT1+} & {SDT3+} & {SDT4+} & {SDT6+} & {GT_DT-})
The words of this group have the connector SDT6-:
ấy, đấy, kia, này: SDT6-

2.3.1.3. Collective nouns

Collective nouns have the same connectors as above, except for the connectors TDT3- (cái), TDT2- (những, mọi, mỗi, ...) and ST_DT- (một, hai, ba, ...):
đất, núi non, sông biển, trăng sao: SDT1- or ({TDT1-} & {SDT1+} & {SDT3+} & {SDT4+} & {SDT6+} & {GT_DT-})

2.3.2. Verbs and verb phrases

2.3.2.1. The front part of the verb phrase

Position -4: group cũng, còn, vẫn
Position -3: group đã, đang, sẽ, vừa, mới
Position -2: group không, chưa, chẳng; group rất, hơi, khá, khí
Position -1: group thường, hay, năng
Head: the central verb
In place of the groups above: group đừng, chớ

Table 4: Modifiers preceding the verb

Group 1 (-4)
The verb has the connector TT4-:
đi, đứng, chạy, làm: {TT4-}
The words of this group have the connector TT4+:
cũng, còn, vẫn, cứ: TT4+

Group 2 (-3)
The verb gets the additional connector TT3-:
đi, đứng, chạy, làm: {TT3-} & {TT4-}
The words of this group have the connector TT3+:
đã, đang, sẽ, vừa, mới, sắp, sắp sửa: TT3+

Group 5 (-2)
The verb gets the additional connector TT2_1-:
đi, đứng, chạy, làm: {TT2_1-} & {TT3-} & {TT4-}
The words of this group have the connector TT2_1+:
không, chẳng, chưa: TT2_1+
Note: this link is named TT2_1 rather than TT2 to distinguish it from a link of group 4, which occupies the same position and is described below.

Group 4 (-2)
The verb gets the additional connector TT2_2-. This connector cannot co-occur with the connector TT2_1-.
Only some verbs can link to the words of this group:

    mong, mong chờ, tiếc, hối tiếc, sợ, lo lắng, nhớ, mơ, ước, quên: {TT2_1- or TT2_2-} & {TT3-} & {TT4-}

The words of this group have the connector TT2_2+:

    rất, hơi, khá: TT2_2+

Group 3 (-1)

The verb gains the connector TT1-:

    đi, đứng, chạy, làm: {TT1-} & {TT2_1- or TT2_2-} & {TT3-} & {TT4-}

The words of this group have the connector TT1+:

    thường, hay: TT1+

Group 6

The words of group 6 express commands and advice. A verb that combines with a word of this group cannot combine with any of the pre-modifiers above. The verb gains the connector TT5-, which stands in an "or" relation to the verb's other pre-modifier connectors:

    đi, đứng, chạy, làm: ({TT1-} & {TT2_1- or TT2_2-} & {TT3-} & {TT4-}) or {TT5-}
    đừng, chớ: TT5+

2.3.2.2. The rear part of the verb phrase

The words "lắm" and "quá" usually follow certain verbs. These are the same verbs that can combine with "rất", "hơi" and "khá". These verbs have the connector ST+:

    mong, mong chờ, tiếc, hối tiếc, sợ, lo lắng, nhớ, mơ, ước, quên: {TT2_1- or TT2_2-} & {TT3-} & {TT4-} & {ST+}
    lắm, quá: ST-

2.3.3. Adjectives and adjective phrases

The front and rear parts of the adjective phrase are quite similar to those of the verb phrase. However, adjectives do not occur with "đừng" and "chớ". The modifiers that precede and follow verbs therefore gain corresponding connectors for adjectives:

    cũng, còn, vẫn, cứ: TT4+ or TTT4+
    đã, đang, sẽ, vừa, mới, sắp, sắp sửa: TT3+ or TTT3+
    không, chẳng, chưa: TT2_1+ or TTT2_1+
    rất, hơi, khá: TT2_2+ or TTT2_2+
    thường, hay: TT1+ or TTT1+
    lắm, quá: ST- or STT-

The connectors of the adjective are:

    tốt, đẹp, đỏ, xanh: {TTT1-} & {TTT2_1- or TTT2_2-} & {TTT3-} & {TTT4-} & {STT+}

2.3.4. Links between noun phrases, verb phrases and adjective phrases

This section describes the links that join noun phrases, verb phrases and adjective phrases into sentences. For uniformity, "noun" ("verb", "adjective") in this section also stands for "noun phrase" ("verb phrase", "adjective phrase").

2.3.4.1. Links between nouns and verbs

In the simple sentence pattern C(N) + V, the predicate can be a verb that describes the action of the noun preceding it.
The noun gains the connector DT_T+ and the verb has the connector DT_T-:

    tôi, bạn, bàn, ghế, giường, lò xo: SDT1- or ({TDT3-} & {TDT2- or ST_DT-} & {TDT1-} & {SDT1+} & {SDT3+} & {SDT4+} & {SDT6+} & {GT_DT- or DT_T+})
    đi, đứng, chạy, làm: (({TT1-} & {TT2_1- or TT2_2-} & {TT3-} & {TT4-}) or {TT5-}) & {DT_T-}

A noun can also follow a verb as the direct object of the action. The verb therefore gains the connector T_DT+ and the noun the connector T_DT-. Only verbs that can combine with a following noun have this connector; verbs that do not require an object (e.g. khóc, cười, ngủ, ...) do not:

    tôi, bạn, bàn, ghế, giường, lò xo: SDT1- or ({TDT3-} & {TDT2- or ST_DT-} & {TDT1-} & {SDT1+} & {SDT3+} & {SDT4+} & {SDT6+} & {GT_DT- or T_DT- or DT_T+})
    học, làm, ăn, đọc, viết, nghe, gặp, thấy: (({TT1-} & {TT2_1- or TT2_2-} & {TT3-} & {TT4-}) or {TT5-}) & {DT_T-} & {T_DT+}

A noun and a verb can also be linked through a preposition, in the pattern V + preposition + N. For this link we add the connector T_GT+ to the verb and the connector T_GT- to the prepositions. The preposition connects to the noun through the GT_DT link built above:

    học, làm, ăn, đọc, viết, nghe, gặp, thấy: (({TT1-} & {TT2_1- or TT2_2-} & {TT3-} & {TT4-}) or {TT5-}) & {DT_T-} & {T_DT+} & {T_GT+}
    bằng, về: (SDT3- or T_GT-) & GT_DT+
    ở, dưới, trước, sau: (T_GT- or SDT4-) & GT_DT+
    từ, vào, theo, bởi, cùng: T_GT- & GT_DT+

On the verb, T_GT+ is joined to T_DT+ with "&". This makes it possible to recognize sentences of the form V + N1 + preposition + N2.

2.3.4.2. Links between nouns and adjectives

Consider a simple sentence of the form C(N) + V(A), in which an adjective serves as the predicate and modifies the subject noun. We add the connector DT_TT+ to the noun and DT_TT- to the adjective:

    tôi, bạn, bàn, ghế, giường, lò xo: SDT1- or ({TDT3-} & {TDT2- or ST_DT-} & {TDT1-} & {SDT1+} & {SDT3+} & {SDT4+} & {SDT6+} & {GT_DT- or T_DT- or DT_T+ or DT_TT+})
    tốt, đẹp, đỏ, xanh: {TTT1-} & {TTT2_1- or TTT2_2-} & {TTT3-} & {TTT4-} & {DT_TT-} & {STT+}

2.3.4.3. Links between verbs and adjectives

Adjectives can follow a verb and modify it. They express the state or manner of the verb.
The verb has the connector T_TT+ and the adjective has the connector T_TT-:

    học, làm, ăn, đọc, viết, nghe, gặp, thấy: (({TT1-} & {TT2_1- or TT2_2-} & {TT3-} & {TT4-}) or {TT5-}) & {DT_T-} & {T_DT+} & {T_TT+} & {T_GT+}
    tốt, đẹp, đỏ, xanh: {TTT1-} & {TTT2_1- or TTT2_2-} & {TTT3-} & {TTT4-} & {DT_TT- or T_TT-} & {STT+}

On the verb, T_TT+ is joined with "&" and placed to the right of T_DT+. This makes it possible to recognize the sentence structure N + V + A.

2.3.4.4. Links between verbs and verbs

Some verbs, such as the modal verbs (phải, dám, nên, định, ...), require a verb directly after them. To these verbs we add the connector T_T+. To the remaining, non-modal verbs we add the connector T_T-:

    phải, dám, nỡ, nên, có thể, định: (({TT1-} & {TT2_1- or TT2_2-} & {TT3-} & {TT4-}) or {TT5-}) & {DT_T-} & T_T+ & {T_TT+}
    học, làm, ăn, đọc, viết, nghe, gặp, thấy: (({TT1-} & {TT2_1- or TT2_2-} & {TT3-} & {TT4-}) or {TT5-}) & {DT_T- or T_T-} & {T_DT+} & {T_TT+} & {T_GT+}

Causative verbs can also be followed by a verb, but this is not obligatory. Example: Thầy giáo đề nghị (học sinh) giữ trật tự.

    ra lệnh, bắt, bắt buộc, ép, nài ép, hỏi, đòi hỏi, cấm, cho phép, yêu cầu, đề nghị: (({TT1-} & {TT2_1- or TT2_2-} & {TT3-} & {TT4-}) or {TT5-}) & {DT_T-} & {T_DT+} & {T_T+}

Note: modal verbs and causative verbs differ from the other verbs in several respects, so their connection rules differ as well. This is discussed in more detail in later sections.

Verbs that denote beginning and ending may or may not require a following verb. These verbs can themselves follow another verb, so they have the connector T_T-:

    bắt đầu, tiếp tục, thôi, dừng, kết thúc: (({TT1-} & {TT2_1-} & {TT3-} & {TT4-}) or {TT5-}) & {DT_T- or T_T-} & {T_DT+ or T_T+} & {T_TT+} & {T_GT+}

2.3.5. Other sentence structures

Besides the simple C + V sentence structure built above, we construct linking rules for other sentence structures.

2.3.5.1. Comparative structures

Superlative

In Vietnamese, the superlative of an adjective is usually formed with "nhất". We therefore add the connector TT_SS+ to adjectives, and the word "nhất" has the connector TT_SS-:

    tốt, đẹp, đỏ, xanh: {TTT1-} & {TTT2_1- or TTT2_2-} & {TTT3-} & {TTT4-} & {DT_TT-} & {STT+ or TT_SS+}
    nhất: TT_SS-

TT_SS+ stands in an "or" relation to STT+ because a superlative cannot co-occur with "quá" or "lắm".
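The braces and the "or" operator in these rules decide which connector combinations a word may use at once. As a minimal sketch (not the thesis's parser), the following Python code expands a rule into its list of alternatives, often called disjuncts. The names `tokenize` and `expand` are hypothetical, the sketch assumes that "&" binds more tightly than "or", and it ignores the left/right ordering that a real link-grammar parser also tracks.

```python
import re
from itertools import product

# Connector names (e.g. TTT1-, STT+), the operators "&" and "or", and brackets.
TOKEN = re.compile(r"\s*(or\b|&|[(){}]|[A-Za-z0-9_]+[+-])")

def tokenize(formula):
    tokens, pos = [], 0
    formula = formula.strip()
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"unexpected text at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def expand(formula):
    """Return every connector combination (disjunct) the formula allows."""
    tokens = tokenize(formula)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "or":
            pos += 1
            result = result + parse_and()  # "or": either side may be used
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            right = parse_atom()
            result = [a + b for a, b in product(result, right)]  # "&": both sides
        return result

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token in ("(", "{"):
            inner = parse_or()
            closing = ")" if token == "(" else "}"
            if pos >= len(tokens) or tokens[pos] != closing:
                raise ValueError(f"missing {closing}")
            pos += 1
            # Braces mark an optional part: the empty combination is also allowed.
            return inner if token == "(" else [()] + inner
        return [(token,)]

    disjuncts = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return disjuncts

# The adjective rule with the superlative connector added.
adjective = ("{TTT1-} & {TTT2_1- or TTT2_2-} & {TTT3-} & {TTT4-} & "
             "{DT_TT-} & {STT+ or TT_SS+}")
disjuncts = expand(adjective)
```

Because STT+ and TT_SS+ sit inside the same "or" group, no expanded alternative contains both, which is how the rule rules out "nhất" together with "quá" or "lắm".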
Comparative with hơn / kém

The usual comparative structure with "hơn" (more) or "kém" (less) is C(N1) + V(A hơn/kém N2). The adjective already has the connector TT_SS+. The words "hơn" and "kém" also have the connector TT_SS-, and gain the connector SS_DT+ to link to the noun that follows. Nouns gain the connector SS_DT-:

    hơn, kém: TT_SS- & SS_DT+
    tôi, bạn, bàn, ghế, giường, lò xo: SDT1- or ({TDT3-} & {TDT2- or ST_DT-} & {TDT1-} & {SDT1+} & {SDT3+} & {SDT4+} & {SDT6+} & {GT_DT- or SS_DT- or T_DT- or DT_T+ or DT_TT+})

2.3.5.2. Questions

There are many kinds of questions. In questions about reasons, the interrogative words "tại sao" and "vì sao" always stand at the beginning of the sentence. They have the connector THT+:

    tại sao, vì sao: THT+

In some other questions the interrogative words always stand at the end of the sentence. They have the connector THS-:

    đâu, thế nào, như thế nào, phải không: THS-

In yet other questions the interrogative word can stand at either the beginning or the end of the sentence. These words have the connector THT+ or THS-:

    khi nào, bao giờ: THT+ or THS-

These interrogative words link to the verb of the sentence. Verbs therefore have the matching connectors THT- and THS+:

    học, làm, ăn, đọc, viết, nghe, gặp, thấy: (({TT1-} & {TT2_1- or TT2_2-} & {TT3-} & {TT4-}) or {TT5-}) & {DT_T- or T_T-} & {T_DT+} & {T_TT+} & {T_GT+} & {THT- or THS+}

2.3.5.3. Complex sentences

This section describes how links are built for complex sentences. A complex sentence in Vietnamese may contain conjunctions such as bởi vì, nhưng, tuy nhiên, ..., or it may contain no conjunction at all (the clauses are separated only by commas). Linking the first kind is simpler because the conjunctions can be used directly. The second kind is harder because of the ambiguity of the comma. We have so far handled only the first kind; the second remains an open problem.

Every conjunction has the connector CL+, which links it to the clause that follows:

    tuy nhiên, nhưng, nên, cho nên, bởi vì, vì: CL+

The CL link attaches to the predicate of the clause. We attach the conjunction to the predicate rather than to the subject because a clause after a conjunction usually has a predicate but sometimes has no subject. Verbs and adjectives therefore gain the connector CL-.

Most conjunctions can stand between two clauses. To link such a conjunction to the preceding clause, we add the connector EV- to it. Like CL+, EV- attaches to the predicate of the preceding clause.
    tuy nhiên, nhưng, nên, cho nên, bởi vì, vì: CL+ & EV-

Some conjunctions (bởi vì, vì, sau khi, ...) can also stand at the beginning of the first clause. In that case we use the connector CO+ instead of EV- to link the conjunction to the second clause (CL+ is kept to link it to the clause immediately after the conjunction). A comma may also stand between the two clauses. To link to this comma, the conjunctions need an additional connector PH+, which is optional:

    bởi vì, vì, dù: CL+ & {PH+} & (EV- or CO+)

When a complex sentence has a conjunction at the head of each of its two clauses (bởi vì ... nên ..., mặc dù ... tuy nhiên ...), we join the two conjunctions with the link QHT. The conjunction at the head of the sentence has the connector QHT+ and the conjunction at the head of the second clause has the connector QHT-. Each conjunction still links to the clause after it through CL+. Combined with the connectors built above, this gives:

    tuy nhiên, nhưng, nên, cho nên, bởi vì, vì: CL+ & (EV- or QHT-)
    bởi vì, vì, dù: CL+ & {PH+} & (EV- or (CO+ or QHT+))

2.3.6. Coordinating conjunctions

Coordinating conjunctions such as "và" (and) pose a problem for link grammar. In the sentence "Hùng và Long làm và gửi món quà này", for example, there must be links from each of the words "Hùng" and "Long" to each of the words "làm" and "gửi", but these links would cross. We propose a way of handling the conjunction "và" and apply it in our program.

The words that "và" joins form an "and"-linked list L (called an "and" list below). For example, in "bút, vở và sách này là của tôi", the phrase "bút, vở và sách" is an "and" list, and the words "bút", "vở" and "sách" are its elements.

If the whole list is replaced by any one of its elements, the result is a sentence of the link grammar. For the sentence above, the following are all sentences of the link grammar: "bút này là của tôi", "vở này là của tôi", "sách này là của tôi".

The elements of the list also play the same syntactic role in these sentences. In other words, the links from each element to the rest of its sentence are the same, and the links within the rest of the sentence do not change.

For example, consider the sentence "Tôi viết chương trình và hoàn thành báo cáo trong 3 tháng". The phrases "viết chương trình" and "hoàn thành báo cáo" are the elements of the "and" list. The sentence can be split into two sentences: "Tôi viết chương trình trong 3 tháng" and "Tôi hoàn thành báo cáo trong 3 tháng". Note that in both sentences the element is joined to the rest of the sentence by the same links (DT_T and T_GT):

[Figure: linkages of the two split sentences]

There is a problem with this definition: it places no requirement on how the words inside each element of the "and" list are linked to one another.
As a result, many ungrammatical sentences are still accepted, and grammatical sentences receive a very large number of linkages. For example:

[Figure: linkage diagram]

with this linkage the following sentence is accepted: "Tôi thích bánh mì và ko ch lắm."

To solve this problem we add a constraint on the elements of an "and" list: each element of the list must be joined to the rest of the sentence through exactly one of its words.

This solution is not always correct. For example:
- Tôi tặng một bó hoa cho mẹ và một món quà cho cô.

This sentence is judged ungrammatical, because the word "tặng" links to both "một bó hoa" and "cho mẹ", so "một bó hoa cho mẹ" cannot be an element of an "and" list. This phenomenon is common in Vietnamese and therefore remains an open problem.

CHAPTER 3: HIDDEN MARKOV MODELS IN LINK GRAMMAR

Chapter 3 covers the hidden Markov model (HMM) and the inside-outside algorithm. These concepts underpin the construction of the parsing and disambiguation system for link grammar.

3.1. The hidden Markov model

Hidden Markov models are widely used in science, engineering and many other fields (speech recognition, machine translation, bioinformatics, finance, economics, social science).

Definition: a hidden Markov model (HMM) is a variant of a finite-state machine that has a set of hidden states Q, an output alphabet (observations) O, transition probabilities A, output (emission) probabilities B, and initial state probabilities $\Pi$. The current state is not observable. Instead, each state produces an output with a certain probability given by B. The states Q and the outputs O are usually understood, so an HMM is described as the triple $(A, B, \Pi)$.

Figure 3.1 Hidden Markov model

Formal definition

Hidden states: $Q = \{q_i\}$, $i = 1, \dots, N$.

Transition probabilities: $A = \{a_{ij} = P(q_j \text{ at } t+1 \mid q_i \text{ at } t)\}$, where $P(a \mid b)$ is the conditional probability of a given b, $t = 1, \dots, T$ is time, and $q_i \in Q$. Informally, A gives the probability that the next state is $q_j$ when the current state is $q_i$.

Observations (symbols): $O = \{o_k\}$, $k = 1, \dots, M$.

Emission probabilities: $B = \{b_{ik} = b_i(o_k) = P(o_k \mid q_i)\}$, where $o_k \in O$. Informally, B gives the probability that the output is $o_k$ when the current state is $q_i$.

Initial probabilities: $\Pi = \{p_i = P(q_i \text{ at } t = 1)\}$.

The model is characterized by the complete parameter set $\Lambda = \{A, B, \Pi\}$.

Figure 3.2 Markov model according to the formal definition

Hidden Markov models are used to solve three basic problems:
- Given the model parameters, compute the probability of a particular output sequence. Solved by the forward-backward algorithm.
- Given the model parameters, find the most likely sequence of hidden states that could have produced a given output sequence. Solved by the Viterbi algorithm.
- Given an output sequence, find the most likely set of state transition and output probabilities. Solved by the Baum-Welch algorithm.

In the context-free grammar model, the rule probabilities are treated as state transition probabilities, the nonterminal symbols of the rules correspond to the hidden states, and the terminal symbols correspond to the outputs (the observed data). To resolve ambiguity when choosing a suitable parse tree, we therefore use the forward-backward algorithm to obtain the probabilities of the parse trees for the input word sequence. The words of the input sequence play the role of the observed output data. In addition, the Viterbi algorithm finds, from the input sequence, the most likely set of hidden states that can generate it. Finding the rules that generate the sequence with the highest probability clearly helps a great deal in choosing a suitable parse tree. The rest of this section explains the forward-backward and Viterbi algorithms in more detail.

3.1.1. The forward algorithm

Let $\alpha_t(i)$ be the probability of the partial observation sequence $O_t = \{o(1), o(2), \dots, o(t)\}$ produced by all possible state sequences that end in the i-th state:

$$\alpha_t(i) = P(o(1), o(2), \dots, o(t),\; q(t) = q_i)$$

The unconditional probability of the partial observation sequence is then the sum of $\alpha_t(i)$ over all N states.

The forward algorithm is a recursive algorithm that computes $\alpha_t(i)$ for partial observation sequences of increasing length t. First, the probability of a single-symbol sequence is computed as the product of the probability that the initial state is state i and the probability of emitting the symbol o(1) from state i. The recursion is then applied. Suppose $\alpha_t(i)$ has been computed for some t.
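To compute $\alpha_{t+1}(j)$, we multiply each $\alpha_t(i)$ by the transition probability from state i to state j, sum these products over all states i, and multiply the result by the probability of emitting o(t+1) from state j. Repeating this process yields $\alpha_T(i)$, and summing over all states gives the required probability.

Formal description

Initialization:

$$\alpha_1(i) = p_i\, b_i(o(1)), \quad i = 1, \dots, N$$

Recursion:

$$\alpha_{t+1}(j) = \Big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Big]\, b_j(o(t+1)), \quad j = 1, \dots, N,\; t = 1, \dots, T-1$$

Termination:

$$P(o(1), o(2), \dots, o(T)) = \sum_{i=1}^{N} \alpha_T(i)$$

This is the probability of emitting the sequence o(1) ... o(T) over all possible final states.

3.1.2. The backward algorithm

In a similar way we define the symmetric backward variable $\beta_t(i)$ as the conditional probability of the partial observation sequence from o(t+1) to the end of the sequence, produced by state sequences that start in the i-th state:

$$\beta_t(i) = P(o(t+1), o(t+2), \dots, o(T) \mid q(t) = q_i)$$

The backward algorithm computes the backward variables recursively by moving backwards through the observation sequence. The forward algorithm is typically used to compute the probability that an HMM emits an observation sequence, but both algorithms are useful for finding the optimal state sequence and for estimating the HMM parameters.

Formal description

Initialization:

$$\beta_T(i) = 1, \quad i = 1, \dots, N$$

By the definition above, $\beta_T(i)$ does not exist; this is a formal extension that lets the recursion start at t = T.

Recursion:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o(t+1))\, \beta_{t+1}(j), \quad i = 1, \dots, N,\; t = T-1, T-2, \dots, 1$$

Termination:

$$P(o(1), o(2), \dots, o(T)) = \sum_{i=1}^{N} p_i\, b_i(o(1))\, \beta_1(i)$$

The forward and backward algorithms give the same result, $P(O) = P(o(1), o(2), \dots, o(T))$.

3.1.3. The Viterbi algorithm

The Viterbi algorithm selects the best state sequence, that is, the one that maximizes the likelihood of the state sequence for a given observation sequence.

Let $\delta_t(i)$ be the highest probability of a state sequence of length t that ends in state i and produces the first t observations under the model:

$$\delta_t(i) = \max_{q(1), \dots, q(t-1)} P(q(1), q(2), \dots, q(t-1),\; q(t) = q_i,\; o(1), o(2), \dots, o(t))$$

The Viterbi algorithm is a dynamic programming algorithm that follows the same scheme as the forward algorithm, with two differences:
1. It uses maximization instead of summation in the recursion and termination steps.
2. It keeps track of the arguments that maximize $\delta_t(i)$ for each t and i, storing them in a matrix $\psi$ of size N by T. This matrix is used to recover the optimal state sequence in the backtracking step.

Initialization:

$$\delta_1(i) = p_i\, b_i(o(1)), \quad \psi_1(i) = 0, \quad i = 1, \dots, N$$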
Recursion:

$$\delta_t(j) = \max_i \big[\delta_{t-1}(i)\, a_{ij}\big]\, b_j(o(t)), \quad \psi_t(j) = \arg\max_i \big[\delta_{t-1}(i)\, a_{ij}\big]$$

Termination:

$$p^* = \max_i \big[\delta_T(i)\big], \quad q^*_T = \arg\max_i \big[\delta_T(i)\big]$$

Backtracking of the state sequence:

$$q^*_t = \psi_{t+1}(q^*_{t+1}), \quad t = T-1, T-2, \dots, 1$$

3.2. The inside-outside algorithm

3.2.1. Theoretical basis

Given a sentence such as "Stuart loves his thesis, and Barbara does too", how can a human or a computer understand its meaning? The steps involved can be described as follows.

First, the sentence is analyzed: it is split into words, the words are tagged, and the sentence is parsed to obtain its structure. The sentence is divided into smaller clauses (here the word "and" separates it into two clauses), and the clauses are then analyzed into phrases, here noun phrases (NP) and verb phrases (VP).

Figure 3.3 Parse tree of the sentence "Stuart loves his thesis and Barbara does too"

After parsing, the next task is semantic analysis, which turns the sentence into logical representations. From the input sentence, semantic analysis produces readings such as: "Stuart loves someone's thesis and Barbara loves someone else's thesis", "Stuart loves his own thesis and Barbara loves Stuart's thesis too" and "Stuart loves someone's thesis and Barbara loves that thesis too".

After semantic analysis we must choose one of the logical readings. This requires a judgment, and the choice is made on the basis of probability. Here we may choose the reading in which the thesis is Stuart's and nobody else's.

The final result of this language processing is assessed by whether the sentence is understood the way a person understands it. Achieving that is very hard because of all the problems mentioned above: parsing, semantic analysis, and so on. We concentrate on the first problem, parsing, which is also regarded as an important preprocessing step in natural language processing.

There are two approaches to parsing: rule-based and statistical. The two can be applied together to obtain better results.
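The statistical approach followed in the rest of this chapter reuses the dynamic-programming pattern of the HMM recursions in Section 3.1. As a reminder of that pattern, here is a minimal Python sketch of the forward and Viterbi recursions. The function names, the list-based parameter layout (`pi[i]`, `A[i][j]`, `B[i][o]`) and the two-state toy model are assumptions made for illustration; states, observations and time steps are numbered from 0 here rather than from 1.

```python
def forward(pi, A, B, obs):
    """P(obs) by the forward recursion: sum over all state sequences."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]            # initialization
    for o in obs[1:]:                                           # recursion
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    return sum(alpha)                                           # termination

def viterbi(pi, A, B, obs):
    """Most likely state sequence and its joint probability with obs."""
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]            # initialization
    psi = []                                                    # back-pointers
    for o in obs[1:]:                                           # recursion
        back = [max(range(N), key=lambda i: delta[i] * A[i][j])
                for j in range(N)]
        delta = [delta[back[j]] * A[back[j]][j] * B[j][o] for j in range(N)]
        psi.append(back)
    best = max(range(N), key=lambda i: delta[i])                # termination
    path = [best]
    for back in reversed(psi):                                  # backtracking
        path.append(back[path[-1]])
    path.reverse()
    return delta[best], path

# Hypothetical toy model: 2 hidden states, 2 output symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1]

total = forward(pi, A, B, obs)
best_prob, best_path = viterbi(pi, A, B, obs)
```

The only structural difference between the two functions is the replacement of the sum by a maximum plus the stored back-pointers, which is the same relationship that holds between computing the total probability of a sentence and finding its single best parse.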
The statistical approach requires a collection of sentences with known structures, called a treebank. Every sentence in the collection has been parsed, and its parse is the result a human would expect (the reference result). The collection is used to train probabilistic models or to check the correctness of a probabilistic model. In this thesis we study two kinds of probability used in statistical language processing: the inside and outside probabilities. The algorithm was introduced by James K. Baker in 1979.

Each sentence is split into words, and the words are the terminal symbols of the grammar. According to the parse tree of the example above, there are also nonterminal symbols such as NP, VP, N and PN.

The simplest inside probability is the probability that a nonterminal generates terminals. For example, here we have NP → his thesis. Suppose that about one noun phrase (NP) in ten thousand is "his thesis". Then:

    Inside(his thesis, NP) = 0.0001

A phrase such as "his thesis" can only be a noun phrase. Linguistic ambiguity, however, can arise in any phrase. For example, the word "loves" can be a verb or a noun. Each case has its own inside probability:

    Inside(loves, verb) = 0.1
    Inside(loves, noun) = 0.01

The outside probability, in contrast, is determined by the words of the sentence outside the phrase under consideration. Here it is the probability of the phrase "loves" being connected to its surrounding phrases "Stuart" and "his thesis". Suppose this probability is 0.001 when "loves" is a verb and 0.0001 when it is a noun:

    Outside(Stuart ____ his thesis, verb) = 0.001
    Outside(Stuart ____ his thesis, noun) = 0.0001

Multiplying the inside probability by the outside probability gives the probability of the sentence with the corresponding structure. The probability of "Stuart loves his thesis" is 0.1 × 0.001 = 0.0001 when "loves" is a verb and 0.01 × 0.0001 = 0.000001 when "loves" is a noun. The sum over all possible readings of "loves" is 0.0001 + 0.000001 = 0.000101. This yields the following conditional probabilities:

    P(loves is a verb | Stuart loves his thesis) = 0.0001 / 0.000101 ≈ 0.99
    P(loves is a noun | Stuart loves his thesis) = 0.000001 / 0.000101 ≈ 0.01

3.2.2. The inside-outside algorithm for probabilistic context-free grammars

As mentioned above, a probabilistic context-free grammar G(X, V, S, R) is a context-free grammar whose rules carry probabilities. For example, the rule $A \to \alpha$ has the probability $P(A \to \alpha)$. Here:

V: the set of nonterminal symbols (variables)
X: the set of terminal symbols (disjoint from V)
S: the start symbol
R: the set of rules, converted to Chomsky normal form

The rules have only the forms

$$i \to j\,k, \qquad i \to m$$

where i, j, k are nonterminal symbols in V and m is a terminal symbol in X. The probabilities P take values in [0, 1], and for every nonterminal i:

$$\sum_{j,k} a[i,j,k] + \sum_{m} b[i,m] = 1$$

Here a[i,j,k] is the probability that nonterminal i is rewritten as the pair of nonterminals j and k, and b[i,m] is the probability that nonterminal i is rewritten as the terminal m. Let the sentence to be parsed be $O = O_1, O_2, O_3, \dots, O_T$.

The inside-outside algorithm assumes that the source is modelled by a context-free hidden Markov process. The algorithm allows the grammar to have an arbitrary degree of ambiguity. Here a[i,j,k] corresponds to the probability of a transition from state i to j and k, b[i,m] corresponds to the probability of emitting a terminal symbol, and the sentence O corresponds to the observed data.

When applying a probabilistic context-free grammar there are two main problems: recognition and training. The recognition problem is to compute the probability that the start symbol S generates the observation sequence O in the grammar G:

$$P(S \Rightarrow^* O \mid G)$$

The symbol $\Rightarrow^*$ denotes a derivation consisting of one or more steps.

The training problem is to determine a set of rules for the grammar G given the training sequences $O^{(1)}, O^{(2)}, \dots, O^{(Q)}$.

By analogy with the forward ($\alpha$) and backward ($\beta$) probabilities of the HMM algorithms, we define the inside (e) and outside (f) probabilities to support the analysis of probabilistic context-free grammars. They are computed as follows.

Figure 3.4 Computing the inside probability

e(s,t,i) is the probability that nonterminal i generates the segment O(s), ..., O(t):

$$e(s,t,i) = P(i \Rightarrow^* O(s) \dots O(t) \mid G)$$

1) If s = t:

$$e(s,s,i) = P(i \to O(s) \mid G) = b[i, O(s)]$$

2) If s ≠ t, a rule i → jk is applied, so (see Figure 3.4):

$$e(s,t,i) = \sum_{j,k} \sum_{r=s}^{t-1} a[i,j,k]\; e(s,r,j)\; e(r+1,t,k)$$

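Each term of the inside probability e(s,t,i) is the probability that nonterminal i generates the symbols j and k, that j generates the segment from s to r, and that k generates the segment from r+1 to t. The inside probability e(s,t,i) is the sum of the probabilities of all ways of generating the string $O_s \dots O_t$ from nonterminal i.

Figure 3.5 Definition of the outside probability

The outside probability f(s,t,i) is the probability of generating i together with the string O(1) ... O(s-1) on its left and the string O(t+1) ... O(T) on its right:

$$f(s,t,i) = P(S \Rightarrow^* O(1) \dots O(s-1),\; i,\; O(t+1) \dots O(T) \mid G)$$

In other words, the outside probability is the total probability that the start symbol S generates nonterminal i and the part of the sentence outside the span s ... t.

Nonterminal i appears on the right-hand side of two kinds of derivation, $j \to i\,k$ and $j \to k\,i$ (see Figure 3.6).

Figure 3.6 Computing the outside probability

$$f(s,t,i) = \sum_{j,k} \Big[ \sum_{r=1}^{s-1} f(r,t,j)\; a[j,k,i]\; e(r,s-1,k) + \sum_{r=t+1}^{T} f(s,r,j)\; a[j,i,k]\; e(t+1,r,k) \Big]$$

and

$$f(1,T,i) = \begin{cases} 1 & \text{if } i = S \\ 0 & \text{otherwise} \end{cases}$$

The inside probability e is computed bottom-up, whereas the outside probability f is computed top-down. From the inside and outside probabilities we obtain the probability that the sentence is generated with nonterminal i covering O(s) ... O(t):

$$P(S \Rightarrow^* O,\; i \Rightarrow^* O(s) \dots O(t) \mid G) = e(s,t,i)\; f(s,t,i)$$

With $O = O_1, O_2, O_3, \dots, O_T$, s = 1 and t = T, we have:

$$P(S \Rightarrow^* O \mid G) = e(1,T,S)\; f(1,T,S) = e(1,T,S)$$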
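The two recursions of Section 3.2.2 can be sketched in Python as follows. This is a minimal illustration, not the thesis's implementation: the function names, the dictionary layout (`binary[(i, j, k)]` for a[i,j,k] and `lexical[(i, m)]` for b[i,m]) and the one-rule toy grammar are assumptions, and positions are numbered from 0 instead of 1.

```python
from collections import defaultdict

def inside(words, binary, lexical):
    """e[s, t, i]: probability that nonterminal i derives words[s..t]."""
    T = len(words)
    e = defaultdict(float)
    for s, w in enumerate(words):                  # case s = t: rules i -> m
        for (i, m), b in lexical.items():
            if m == w:
                e[s, s, i] = b
    for span in range(2, T + 1):                   # bottom-up over span length
        for s in range(T - span + 1):
            t = s + span - 1
            for (i, j, k), a in binary.items():    # rules i -> j k
                e[s, t, i] += sum(a * e[s, r, j] * e[r + 1, t, k]
                                  for r in range(s, t))
    return e

def outside(words, binary, e, start):
    """f[s, t, i]: probability of everything outside words[s..t] around i."""
    T = len(words)
    f = defaultdict(float)
    f[0, T - 1, start] = 1.0                       # only S covers the sentence
    for span in range(T - 1, 0, -1):               # top-down: long spans first
        for s in range(T - span + 1):
            t = s + span - 1
            for (j, left, right), a in binary.items():
                # i is the right child; its left sibling covers r..s-1
                for r in range(s):
                    f[s, t, right] += f[r, t, j] * a * e[r, s - 1, left]
                # i is the left child; its right sibling covers t+1..r
                for r in range(t + 1, T):
                    f[s, t, left] += f[s, r, j] * a * e[t + 1, r, right]
    return f

# Hypothetical ambiguous toy grammar: S -> S S (0.4), S -> "a" (0.6).
words = ["a", "a", "a"]
binary = {("S", "S", "S"): 0.4}
lexical = {("S", "a"): 0.6}

e = inside(words, binary, lexical)
f = outside(words, binary, e, "S")
sentence_prob = e[0, len(words) - 1, "S"]
```

For this toy grammar the sentence has two parse trees, and `sentence_prob` is the sum of their probabilities. Because a grammar in Chomsky normal form covers every word with exactly one nonterminal, summing e·f over all nonterminals at any single position should reproduce the same value, which gives a simple consistency check on the two tables.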

CHAPTER 4: APPLICATION TO LINK GRAMMAR PARSING

4.1. The parsing problem

Parsing is a basic problem in natural language processing. Solving it means solving two main subproblems:
- Is the input string recognized? (the recognition problem)
- Find the best parse tree for the sentence.

With the Markov model algorithms, the recognition subproblem is solved by computing the inside probability. In theory, if the inside probability is greater than 0, the input string is recognized. Finding the best parse tree uses the Viterbi algorithm, which computes the highest probability of the productions used to generate the training string. Once the productions used to generate the string are known, finding the unique, suitable parse tree is not difficult.

4.2. Disambiguation in context-free grammar parsing

The disambiguation problem in parsing can be stated as follows: given a sentence with several different parses, identify the most correct one.

When parsing with a context-free grammar, ambiguity is resolved with a probabilistic context-free grammar, in which every production is stored together with its probability. If we simply take the correct parse tree to be the tree with the highest probability, we run into the following problems:
- A sentence can have a very large number of parses. According to Manning et al., 1999, the number of parses of a sentence can grow exponentially with its length, so computing the probability of every parse and picking the most probable one is not feasible.
- In the simplest case, the probability of a production $\alpha \to \beta$ can be computed by the formula

$$P(\alpha \to \beta \mid \alpha) = \frac{\mathrm{Count}(\alpha \to \beta)}{\sum_{\gamma} \mathrm{Count}(\alpha \to \gamma)} = \frac{\mathrm{Count}(\alpha \to \beta)}{\mathrm{Count}(\alpha)}$$

This formula is accurate only when the treebank is large enough. If no treebank is available, one can take a corpus and parse it to obtain the counts. The question then is which parse to choose when a sentence has several, so we face a chicken-and-egg problem.

The inside-outside algorithm helps to solve this problem. It was proposed by Baker (1979) together with the notions of inside and outside probability. We first consider the algorithm for probabilistic context-free grammars.

4.2.1 Computing the inside probability to solve the recognition problem

The inside-outside algorithm is essentially an application of the forward-backward algorithms of the hidden Markov model (HMM). Before going into detail, we fix some notation:
- The nonterminal symbols of the grammar are $\{N^1, \dots, N^n\}$. The start symbol is $N^1$.
- The terminal symbols of the grammar are $\{w^1, \dots, w^V\}$.
- The sentence to be parsed is $w_1 \dots w_m$.
- $w_{pq}$ is the part of the sentence from the p-th word to the q-th word.
- $N^j_{pq}$ denotes the nonterminal $N^j$ generating the words from position p to position q of the sentence.
- $\alpha_j(p, q)$ is the outside probability.
- $\beta_j(p, q)$ is the inside probability.
- $\to$ describes a production and also denotes a single derivation (rewriting) step.
- $N^j \Rightarrow^* w_a \dots w_b$ can also be written $\mathrm{yield}(N^j) = w_a \dots w_b$.

As in the HMM, parameter matrices must be defined. For a context-free grammar they are:

$$a[i,j,k] = P(N^i \to N^j N^k \mid G)$$
$$b[i,m] = P(N^i \to w^m \mid G)$$

Without loss of generality, the context-free grammar G is assumed to be in Chomsky normal form. The following condition holds:

$$\sum_{j,k} a[i,j,k] + \sum_{m} b[i,m] = 1$$

This constraint means that every nonterminal generates either two nonterminals or one terminal (because the grammar is in Chomsky normal form).

$\beta_j(p, q)$ is the probability that the nonterminal $N^j$ generates the observation (the word sequence) $w_p, \dots, w_q$:

$$\beta_j(p, q) = P(w_{pq} \mid N^j_{pq}, G)$$

With the inside algorithm, the probability of a correct sentence (one generated by the grammar) is therefore

$$P(w_{1m} \mid G) = P(N^1 \Rightarrow^* w_{1m} \mid G) = \beta_1(1, m)$$

If $\beta_1(1, m) > 0$, the input sentence $w_1 \dots w_m$ is recognized.

Recursive procedure for computing $\beta_j(p, q)$

1) Base case: compute $\beta_j(k, k)$. This is exactly the probability of the rule $N^j \to w_k$:

$$\beta_j(k, k) = P(w_k \mid N^j_{kk}, G) = P(N^j \to w_k \mid G)$$

2) Induction: compute $\beta_j(p, q)$, p