
The K-Nearest Neighbors Algorithm

1. Introduction to the K-Nearest Neighbors algorithm:

The K-Nearest Neighbors algorithm (KNN) is very widely used in data mining. KNN classifies an object based on the distance between that object and every object in the training data: the object is assigned a class according to its k nearest neighbors. K is a positive integer chosen before the algorithm runs. Euclidean distance is the usual measure of distance between objects.

2. The KNN algorithm for classification is described as follows:

- Choose the value of the parameter K (the number of nearest neighbors).
- Compute the distance between the object to classify and every object in the training data (Euclidean or Cosine distance is typically used).
- Sort the distances in ascending order and identify the k nearest neighbors of the object to classify.
- Collect the classes of the k nearest neighbors just identified.
- Assign the object the majority class among those nearest neighbors.

3. Application to text classification:

Idea: To classify a new document, the algorithm computes the distance (Euclidean or Cosine) from every document in the training set to the new document in order to find the k nearest documents (called the k neighbors), then uses these distances to weight every topic. The weight of a topic is the sum of these values over all documents among the k neighbors that share that topic; a topic that does not appear among the k neighbors has weight 0. The topics are then sorted by weight in descending order, and the topics with the highest weights are chosen as the topics of the document being classified.

Here the distance between two documents is expressed through their similarity: the larger the similarity value of two documents, the closer they are to each other.

Example: using the Cosine formula to compute the similarity between two documents:

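Document A: "Tôi là học sinh." (I am a pupil.)
Document B: "Tôi là sinh viên." (I am a university student.)
Document C: "Tôi là giáo viên." (I am a teacher.)

Representing the documents as vectors:

              Tôi   là   học   sinh   viên   giáo
Document A     1     1    1     1      0      0
Document B     1     1    0     1      1      0
Document C     1     1    0     0      1      1

Vector A = (1,1,1,1,0,0)
Vector B = (1,1,0,1,1,0)
Vector C = (1,1,0,0,1,1)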
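The similarity values for these three vectors can be checked with a few lines of Python. This is a minimal sketch that assumes the standard cosine similarity, cos(X, Y) = (X · Y) / (|X| |Y|); the function name `cosine_similarity` and the choice to return 0 for an all-zero vector are illustrative assumptions, not part of the original text.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumption: an all-zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_x * norm_y)


# Vectors from the example above (columns: Tôi, là, học, sinh, viên, giáo).
A = (1, 1, 1, 1, 0, 0)
B = (1, 1, 0, 1, 1, 0)
C = (1, 1, 0, 0, 1, 1)

sim_ab = cosine_similarity(A, B)  # 3 shared words / (2 * 2)
sim_ac = cosine_similarity(A, C)  # 2 shared words / (2 * 2)
sim_bc = cosine_similarity(B, C)  # 3 shared words / (2 * 2)

print(sim_ab, sim_ac, sim_bc)  # prints: 0.75 0.5 0.75
```

Each document contains four distinct words, so every vector has length 2 and the similarity reduces to the number of shared words divided by 4.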

This shows that document A is more similar to document B than to document C.

Implementation guide:

Most algorithms have two phases, training and classification. KNN, however, does not build a model from the training set of already-labeled documents, so it needs no training phase (the "training phase" of KNN amounts to labeling the documents in the training set by grouping documents with similar feature vectors into the same group).

Feature vector of a document: a vector whose number of dimensions equals the number of features in the whole data set, where the features are pairwise distinct. A component is 1 if the document contains the corresponding feature and 0 otherwise.

Input:
- The feature vector of the document to classify.
- The feature vectors of the documents in the training set (an M x N matrix, where M is the number of feature vectors in the training set and N is the number of features per vector).
- The label/class of each feature vector in the training set.

Output: the label/class of the document to classify.

The classification process consists of the following steps:
- Choose the value of the parameter K (the number of nearest neighbors). The choice of K affects the classification result, and a suitable value depends on the training set (the number of samples, and whether the samples cover all cases).
- Iterate over the documents in the training set (each represented by its feature vector) and compute the similarity between each one and the document to classify.
- Once the array of similarities between the document to classify and the training documents is available, sort it in descending order (note that this is similarity: the larger the similarity, the smaller the distance) and take the first k documents in the array, that is, the k documents closest to the document to classify.
- Initialize an array A whose length equals the number of classes, to store the number of documents in each class.
- Iterate over the k documents, count the documents in each class, and store the counts in A.
- Scan A, find the class with the most documents, and choose it as the class of the new document.

4. References:

[1] Nguyễn Trần Thiên Thanh, Trần Khải Hoàng, "Tìm hiểu các hướng tiếp cận bài toán phân loại văn bản và xây dựng phần mềm phân loại tin tức báo điện tử" (A study of approaches to text classification and the construction of software for classifying online news), Bachelor's thesis in Informatics, 2005.
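The classification steps listed in the implementation guide of section 3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the names `cosine_similarity` and `knn_classify`, the toy training vectors, and the labels "student" and "teacher" are all hypothetical, and a `Counter` stands in for the per-class count array A. Ties between classes are broken in favor of the class whose nearest member comes first in the sorted list.

```python
import math
from collections import Counter


def cosine_similarity(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def knn_classify(query, train_vectors, train_labels, k):
    """Return the majority label among the k training vectors most similar to query."""
    if len(train_vectors) != len(train_labels):
        raise ValueError("each training vector needs exactly one label")
    if not 1 <= k <= len(train_vectors):
        raise ValueError("k must be between 1 and the number of training vectors")

    # Compute the similarity between the query and every training document.
    scored = [
        (cosine_similarity(query, vector), label)
        for vector, label in zip(train_vectors, train_labels)
    ]

    # Sort by similarity in descending order and keep the first k documents.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    nearest = scored[:k]

    # Count the documents per class (the role of array A) and pick the largest count.
    counts = Counter(label for _, label in nearest)
    return counts.most_common(1)[0][0]


# Hypothetical toy training set: 6-dimensional binary feature vectors with made-up labels.
train_vectors = [
    (1, 1, 1, 1, 0, 0),
    (1, 1, 0, 1, 1, 0),
    (1, 1, 0, 0, 1, 1),
    (0, 1, 0, 0, 1, 1),
]
train_labels = ["student", "student", "teacher", "teacher"]

query = (1, 0, 1, 1, 0, 0)
predicted = knn_classify(query, train_vectors, train_labels, k=3)
print(predicted)  # prints: student
```

With k = 3 the three most similar training vectors carry the labels "student", "student" and "teacher", so the majority vote returns "student". Because the sort is on similarity rather than distance, it runs in descending order, as the guide points out.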

