Reference: distance graphs

DESCRIPTION

Reference on distance graphs of text

Citation preview

  • Distance graphs of text and some applications. Hà Quang Thụy, Knowledge Technology Laboratory (KTLab), University of Engineering and Technology, Vietnam National University, Hanoi. 31 May 2014.

  • Contents: distance graphs and applications; the normalized Google distance and applications; social informatics.

  • Distance graphs: introduction. Source paper: Charu C. Aggarwal, Peixiang Zhao (2013). Towards graphical models for text processing. Knowl. Inf. Syst. 36(1): 1-21.

    Charu C. Aggarwal: Research Scientist, IBM T. J. Watson Research Center in Yorktown Heights. BSc IIT Kanpur (1993), PhD MIT (1996). Awards: IBM Corporate (2003), IBM Outstanding Innovation (2008), IBM Research Division (2008), IBM Outstanding Technical Achievement (2009). Associate editor of the journals ACM TKDD, Data Mining and Knowledge Discovery, ACM SIGKDD Explorations, and Knowledge and Information Systems. DBLP (http://www.informatik.uni-trier.de/~ley/pers/hd/a/Aggarwal:Charu_C=): 60 journal papers, 135 conference papers, 2 books.

    Peixiang Zhao: Assistant Professor, Florida State Univ. at Tallahassee. BSc (2001), MSc (2004), PhD (2007) HK, PhD (2012) UIUC. DBLP (http://www.informatik.uni-trier.de/~ley/pers/hd/z/Zhao:Peixiang.html): 4 journal papers, 16 conference papers.

  • Distance graphs: definition. The statement here differs slightly from the one in the paper.

    Let the corpus be C = {documents of the application domain} and V = {meaningful words in C}. For example, V = {words in C} \ {stop words}.

    For a document D, the distance graph of order k of D over C is the graph G(C, D, k) = (N(C), A(D, k)), where N(C) is the node set and A(D, k) is the arc set.

    N(C) = {node v: v ∈ V and v occurs in D}. Each such v appears exactly once in N(C). An element of N(C) is called node i or word i.

    D' is obtained from D by removing every word that is not in V while keeping the order of the remaining words.

    The arc set A(D, k) contains a directed arc (i, j) from node i to node j if word i precedes word j by at most k positions in D'. The arc (i, j) has weight m if word i precedes word j by at most k positions exactly m times in D'.

  • Distance graphs: example from the paper. V = {English words} \ {stop words}.

    D is taken from the nursery rhyme "Mary had a little lamb": "Mary had a little lamb, little lamb, little lamb, Mary had a little lamb, its fleece was white as snow."

    D' = "Mary little lamb, little lamb, little lamb, Mary little lamb, fleece white snow."

    Distance graphs of order 0, 1 and 2: at order 0 each word is connected only to itself (self-loops); going from order k to order k+1 adds arcs and increases weights.
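    The construction in this example can be sketched in a few lines of Python. This is a minimal sketch, not the paper's implementation: the function name `distance_graph`, the stop-word list, and the choice to store arcs in a `Counter` keyed by `(word_i, word_j)` are all assumptions made for illustration.

```python
from collections import Counter

def distance_graph(words, k):
    """Return {(i, j): weight} for the order-k distance graph of a word list.

    The weight of arc (i, j) is the number of position pairs (p, q) with
    words[p] == i, words[q] == j and 0 <= q - p <= k, so order 0 yields
    only self-loops.
    """
    arcs = Counter()
    for p, w in enumerate(words):
        for q in range(p, min(p + k, len(words) - 1) + 1):
            arcs[(w, words[q])] += 1
    return arcs

# Assumed stop-word list, just large enough for this rhyme.
STOP_WORDS = {"had", "a", "its", "was", "as"}
text = ("Mary had a little lamb, little lamb, little lamb, "
        "Mary had a little lamb, its fleece was white as snow")
tokens = [t.strip(",.").lower() for t in text.split()]
reduced = [t for t in tokens if t not in STOP_WORDS]  # plays the role of D'

g0 = distance_graph(reduced, 0)
g1 = distance_graph(reduced, 1)
g2 = distance_graph(reduced, 2)
print(g1[("little", "lamb")], g1[("lamb", "little")], g2[("lamb", "little")])
```

    Keeping the graph as a flat arc-to-weight mapping makes later steps (subgraph tests, intersections with a clique) simple dictionary operations.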

  • Distance graphs: properties.

    Sparsity. Let f(D) be the number of meaningful words in D counted with multiplicity, and n(D) the number of distinct words in D, which is the number of nodes |N(C)| of the graph. Then n(D)*(k+1) - k*(k-1)/2 ≤ |A(D,k)| ≤ f(D)*(k+1). The proof is in the paper.

    Planarity for documents with distinct words. For documents that contain only distinct words, the distance graphs of order at most 2 are planar graphs.

    Monotonicity. If D1 is a subsegment of D2, then G(C, D1, k) is a subgraph of G(C, D2, k). The proof is in the paper.

    Note: the converse does not always hold. G(C, D1, k) being a subgraph of G(C, D2, k) does not imply that D1 is a subsegment of D2, which reflects the complexity of the word structure that a distance graph captures. Monotonicity is extremely useful for retrieval by exact text segment: graph-based information retrieval determines a superset of the set of documents sought, more effectively than a vector space representation with a keyword index.
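    Monotonicity and the failure of its converse can be checked on a toy document. The sketch below assumes that "subgraph" means every arc of the smaller graph occurs in the larger one with at least the same weight; the helper names and the documents `d2`, `d1_segment`, `d1_other` are hypothetical.

```python
from collections import Counter

def distance_graph(words, k):
    """Arc-to-weight mapping of the order-k distance graph (self-loops included)."""
    arcs = Counter()
    for p, w in enumerate(words):
        for q in range(p, min(p + k, len(words) - 1) + 1):
            arcs[(w, words[q])] += 1
    return arcs

def is_subgraph(g_small, g_big):
    """True if each arc of g_small appears in g_big with at least its weight."""
    return all(g_big[arc] >= weight for arc, weight in g_small.items())

def is_subsegment(small, big):
    """True if the word list `small` occurs contiguously inside `big`."""
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))

d2 = "a b c a".split()
d1_segment = "b c a".split()  # a contiguous piece of d2
d1_other = "c a b".split()    # not a contiguous piece of d2

k = 1
g2 = distance_graph(d2, k)
print(is_subsegment(d1_segment, d2), is_subgraph(distance_graph(d1_segment, k), g2))
print(is_subsegment(d1_other, d2), is_subgraph(distance_graph(d1_other, k), g2))
```

    The second document passes the subgraph test without being a subsegment, which is why a graph-based filter returns a superset of the exact matches and still needs a final verification step.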

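  • Distance graphs: properties (continued).

    Preservation of common segments. If D1 and D2 share a common string F, then G(C, D1, k) and G(C, D2, k) share the subgraph G(C, F, k). This follows directly from monotonicity.

    Searching for documents that contain a segment about a topic. Assumption: a topic is characterized by a set S of m closely connected keywords, from which a bidirectional directed clique over these nodes (words) is built. In a bidirectional directed clique every pair of nodes is joined by arcs in both directions (a complete graph) and every node of the clique has a self-loop. The combined frequency of the arc-wise intersection of the clique with G(C, D, k) tells how many times the corresponding keywords occur close together in D, that is, the local behavior of the topic.

    Occurrence property of the bidirectional clique. Let F1 be a bidirectional clique with m nodes and let D be a document in C. Let E be the arc-wise intersection of the arcs of G(C, D, k) with those of F1, and let q be the sum of the frequencies of the arcs in E. Then q is the number of times the keywords at the nodes of F1 occur within distance at most k of one another in the document.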
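    The quantity q from the clique property can be computed directly from the arc-to-weight mapping. A minimal sketch, assuming the clique includes one self-loop per keyword; `clique_frequency` is a hypothetical helper name and the word list is the reduced rhyme from the earlier example.

```python
from collections import Counter
from itertools import product

def distance_graph(words, k):
    """Arc-to-weight mapping of the order-k distance graph (self-loops included)."""
    arcs = Counter()
    for p, w in enumerate(words):
        for q in range(p, min(p + k, len(words) - 1) + 1):
            arcs[(w, words[q])] += 1
    return arcs

def clique_frequency(graph, keywords):
    """Sum of the weights of graph arcs that also belong to the bidirectional
    clique over `keywords` (all ordered pairs, self-loops included)."""
    clique = set(product(keywords, repeat=2))
    return sum(weight for arc, weight in graph.items() if arc in clique)

words = "mary little lamb little lamb little lamb mary little lamb fleece white snow".split()
q_close = clique_frequency(distance_graph(words, 1), {"little", "lamb"})
q_far = clique_frequency(distance_graph(words, 1), {"fleece", "snow"})
print(q_close, q_far)
```

    A large q relative to the keyword counts suggests the keywords co-occur inside a compact segment; with order 1 the pair "fleece", "snow" contributes only its two self-loops because the words are two positions apart.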

  • Distance graphs: separating different topics.

    Goal: identify segments related to different topics. S1 and S2 are the keyword sets of two different topics, F1 and F2 are the two cliques corresponding to S1 and S2, and F12 is the clique over the nodes of S1 ∪ S2.

    Let E1(D), E2(D) and E12(D) be the arc-wise intersections of G(C, D, k) with F1, F2 and F12. E12(D) is a superset of the arcs in E1(D) ∪ E2(D).

    The topics are local when the arc frequencies in E1(D) and E2(D) are large but the arc frequencies in E12(D) - (E1(D) ∪ E2(D)) are small.

    Topic locality problem: find the documents D for which the arc frequency of E1(D) ∪ E2(D) is greater than s1 and the arc frequency of E12(D) - (E1(D) ∪ E2(D)) is less than s2.
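    The locality test can be sketched as two sums over the arc-to-weight mapping. This sketch assumes disjoint keyword sets and uses hypothetical names (`topic_frequencies`, `is_topic_local`) and toy documents; the thresholds play the roles of s1 and s2.

```python
from collections import Counter
from itertools import product

def distance_graph(words, k):
    """Arc-to-weight mapping of the order-k distance graph (self-loops included)."""
    arcs = Counter()
    for p, w in enumerate(words):
        for q in range(p, min(p + k, len(words) - 1) + 1):
            arcs[(w, words[q])] += 1
    return arcs

def topic_frequencies(graph, s1, s2):
    """Return (within, cross): total weight on arcs inside F1 or F2, and total
    weight on arcs of F12 that join a keyword of S1 with a keyword of S2."""
    f1 = set(product(s1, repeat=2))
    f2 = set(product(s2, repeat=2))
    f12 = set(product(set(s1) | set(s2), repeat=2))
    inside = f1 | f2
    within = sum(w for arc, w in graph.items() if arc in inside)
    cross = sum(w for arc, w in graph.items() if arc in f12 - inside)
    return within, cross

def is_topic_local(graph, s1, s2, min_within, max_cross):
    within, cross = topic_frequencies(graph, s1, s2)
    return within > min_within and cross < max_cross

animals, sky = {"cat", "dog"}, {"sun", "moon"}
separated = "cat dog cat dog filler filler sun moon sun moon".split()
interleaved = "cat sun dog moon cat sun dog moon".split()
print(topic_frequencies(distance_graph(separated, 2), animals, sky))
print(topic_frequencies(distance_graph(interleaved, 2), animals, sky))
```

    In the separated document all weight falls inside the two topic cliques, while the interleaved document puts substantial weight on arcs that cross between the topics.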

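  • Distance graphs: the undirected variant.

    Definition. The undirected distance graph of order k of a document D over C is the graph G(C, D, k) = (N(D), A(D, k)), where N(D) is as in the directed case and A(D, k) is the arc set built as in the directed case but covering both directions (words before and words after).

    Example: the undirected distance graph of order 2 of the document in the previous example. The undirected distance graph is obtained by turning the directed arcs into undirected edges.

    The undirected graph keeps the distance information and discards the ordering information.

    Applications of the undirected distance graph are not discussed, but (i) it is easy to implement and convenient for data mining; (ii)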
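    One way to realize the undirected variant is to key each edge by the sorted word pair, so that the two directions are merged. Summing the weights of the two directions is an assumption of this sketch, as is the function name `undirected_distance_graph`.

```python
from collections import Counter

def undirected_distance_graph(words, k):
    """Edge-to-weight mapping where (i, j) and (j, i) share one key.

    Each position pair at distance 0..k adds 1 to the edge between the two
    words, regardless of which word comes first.
    """
    edges = Counter()
    for p, w in enumerate(words):
        for q in range(p, min(p + k, len(words) - 1) + 1):
            edges[tuple(sorted((w, words[q])))] += 1
    return edges

words = "mary little lamb little lamb little lamb mary little lamb fleece white snow".split()
u2 = undirected_distance_graph(words, 2)
print(u2[("lamb", "little")], u2[("lamb", "mary")])
```

    Because keys are sorted tuples, lookups must use the alphabetically ordered pair, for example `("lamb", "little")` rather than `("little", "lamb")`.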

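  • Distance graphs: use in data mining.

    Two ways to apply them. (1) Keep existing techniques and replace the bag-of-words representation with the distance graph representation: easy to implement. (2) Use structural data mining and management techniques: distance graphs interact more easily with structure mining methods.

    Computational cost. The graph is about 4-5 times the size of the existing representation. This can slow processing down, but not too heavily.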

  • Distance graphs: data mining applications.

    Clustering: iterative or hierarchical clustering algorithms, seed-based algorithms, the EM algorithm.

    Classification: naive Bayes classification, k-nearest-neighbor or centroid classification, rule-based classification.

    Indexing and retrieval of entire structural fragments: exact search (cp), approximate search.

    Frequent subgraph mining.

    Plagiarism detection: GA and GB are the distance graphs of two documents, and MCG(GA, GB) is the maximum common subgraph of the two documents.
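    Because each word labels exactly one node, a common subgraph of two distance graphs can be approximated by intersecting their arc sets. The sketch below is an assumption-laden illustration, not the method of the paper: it takes the minimum weight on shared arcs and normalizes by the heavier graph, and the names `common_subgraph` and `overlap_score` are hypothetical.

```python
from collections import Counter

def distance_graph(words, k):
    """Arc-to-weight mapping of the order-k distance graph (self-loops included)."""
    arcs = Counter()
    for p, w in enumerate(words):
        for q in range(p, min(p + k, len(words) - 1) + 1):
            arcs[(w, words[q])] += 1
    return arcs

def common_subgraph(ga, gb):
    """Arcs present in both graphs, each with the smaller of its two weights."""
    return ga & gb  # Counter intersection keeps the minimum count per key

def overlap_score(ga, gb):
    """Weight of the common subgraph divided by the weight of the heavier graph."""
    largest = max(sum(ga.values()), sum(gb.values()))
    return sum(common_subgraph(ga, gb).values()) / largest if largest else 0.0

original = "graph models capture word order".split()
suspect = "graph models capture term order".split()
unrelated = "completely different sentence here".split()

g_orig = distance_graph(original, 1)
print(overlap_score(g_orig, distance_graph(suspect, 1)))
print(overlap_score(g_orig, distance_graph(unrelated, 1)))
```

    A score near 1 flags a pair of documents for closer inspection; the threshold to use would have to be tuned on real data.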

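  • Distance graphs: some discussion.

    Distance. Why is the distance computed after stop words are removed? Should the distance perhaps be computed with the stop words kept in place?

    Application to pattern search in event logs. Actions play the role of keywords. Build the distance graphs. Sequential patterns: clustering. Ordered patterns: frequent subgraph mining.

    Application to text processing tasks. Text summarization: represent sentences and documents as distance graphs, then compute importance and the similarity of two sentences. Replace indexed nodes with topics.

    Application to multi-label, multi-instance text classification. Represent documents as distance graphs and apply the topic locality property.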

  • Application: mining patterns from event logs. Two challenges of process mining:

    C2. Dealing with complex event logs having diverse characteristics.

    C4. Dealing with concept drift.

    ~ Dealing with very large event logs.

    Some research references

    [Aalst13] Wil M. P. van der Aalst (2013). A General Divide and Conquer Approach for Process Mining. FedCSIS 2013: 1-10.

    [BA12a] R. P. Jagadeesh Chandra Bose, Wil M. P. van der Aalst (2012). Process diagnostics using trace alignment: Opportunities, issues, and challenges. Inf. Syst. 37(2): 117-141.

    [BAZP11] R. P. Jagadeesh Chandra Bose, Wil M. P. van der Aalst, Indre Zliobaite and Mykola Pechenizkiy (2011). Handling Concept Drift in Process Mining. CAiSE 2011: 391-405.

    [Bose12] R. P. Jagadeesh Chandra Bose (2012). Process Mining in the Large: Preprocessing, Discovery, and Diagnostics. PhD Thesis, Eindhoven University of Technology, The Netherlands.

    [Manifesto12] Wil van der Aalst et al. (2012). Process Mining Manifesto. BPM 2011 Workshops (Part I, LNBIP 99), pp. 169-194.

  • Pattern mining: event abstraction (Abstractions of Events). The event data in process traces are too specific and/or span several levels of abstraction. Strings of concrete actions are mapped to actions that are closer to the process.

    Source: [Bose12].

  • Pattern mining: trace clustering (Trace Clustering). Groups traces that are similar to one another.

    Source: [Bose12].

  • Pattern mining: process evolution (Concept Drift). Processes change over time. Different business process life cycles.

    Source: [Bose12].

  • Business process abstraction

    [Smir11] Sergey Smirnov (2011). Business Process Model Abstraction. PhD Thesis, The University of Potsdam.

  • 2. The normalized Google distance and applications.

    Related papers:

    Rudi Cilibrasi, Paul M. B. Vitányi (2004). The Google Similarity Distance Automatic Meaning Discovery Using Google. CoRR abs/cs/0412098.

    Rudi Cilibrasi, Paul M. B. Vitányi (2007). The Google Similarity Distance. IEEE Trans. Knowl. Data Eng. 19(3): 370-383. Has 1036 citations on Google Scholar.

    Paul M. B. Vitányi (2012). Information Distance: New Developments. CoRR abs/1201.1221.

    Andrew R. Cohen, Paul M. B. Vitányi (2013). Normalized Google Distance of Multisets with Applications. CoRR abs/1308.3177.

    The authors:

    Paul M. B. Vitányi: DBLP lists 76 journal papers, 69 conference papers, 69 informal publications. http://www.informatik.uni-trier.de/~ley/pers/hd/v/Vit=aacute=nyi:Paul_M=_B=.html

    Rudi Cilibrasi: 4 conference papers, 6 conference papers, 9 informal publications. http://www.informatik.uni-trier.de/~ley/pers/hd/c/Cilibrasi:Rudi.html

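  • The normalized Google distance: rationale.

    An object can be given literally: the literal ACGT genome of a mouse, or the literal text of War and Peace by Lev Tolstoy. An object can be given by its name: "the ACGT genome of a mouse" or "the text of War and Peace by Lev Tolstoy". Some objects are known only by name, such as "home" or "red", where the letters themselves say nothing about the meaning.

    Domain knowledge is therefore used to measure similarity indirectly. This is common, for example at TAC: the two tracks of TAC 2014 (http://www.nist.gov/tac/) are Knowledge Base Population (KBP) and Biomedical Summarization (BiomedSumm).

    Normalized information distance. The information distance E(x, y) of two strings x and y is defined from K(x), K(y) and K(x,y), the Kolmogorov complexities, that is, the bit lengths of the shortest programs that output the strings x, y and xy. E(x, y) is a true distance: it satisfies the three properties of a metric (identity, symmetry, triangle inequality). The normalized information distance rescales E(x, y) by the larger of the two complexities.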
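    The two formulas on this slide did not survive extraction. A reconstruction, assuming the form used by Cilibrasi and Vitányi in the papers cited above (equalities hold up to small additive logarithmic terms), would be:

```latex
% Information distance between strings x and y
E(x, y) = K(x, y) - \min\{K(x), K(y)\}

% Normalized information distance (NID is a label assumed here)
\mathrm{NID}(x, y) = \frac{K(x, y) - \min\{K(x), K(y)\}}{\max\{K(x), K(y)\}}
```

    Dividing by the larger complexity keeps the value roughly in the range 0 to 1, so pairs of short strings and pairs of long strings become comparable.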

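  • The normalized Google distance: from compression to Google.

    Normalized compression distance. The normalized information distance is not computable (uncomputable), so an existing data compression program is used in place of K. For a compressor C, let C(x) be the compressed length of x. The normalized compression distance is

    NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}

    Normalized Google distance:

    NGD(x, y) = (G(x, y) - min{G(x), G(y)}) / max{G(x), G(y)}

    G(x) and G(x, y) are the Google codes of x and of the pair (x, y), where x = {web pages containing the string x} and x ∩ y = {web pages containing both strings}.

    The Google code: G(x, y) = log(1 / g(x, y)) and G(x) = G(x, x), where g(x, y) = |x ∩ y| / N and N is a normalizing constant defined in the paper.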
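    Both distances are easy to try out. The sketch below is illustrative only: it assumes `zlib` as the compressor C, and for the Google distance it takes page counts f(x), f(y), f(x, y) and a total N as plain numbers instead of querying a search engine. Substituting G = log(N / f) into the NGD formula gives the count-based expression used in `ngd`. All example counts are hypothetical.

```python
import math
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance with zlib standing in for C."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google distance from page counts.

    fx, fy: number of pages containing each term; fxy: pages containing both;
    n: the normalizing total. Equivalent to
    (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy)).
    """
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

repetitive = b"the quick brown fox jumps over the lazy dog " * 20
varied = bytes(range(256)) * 4
print(ncd(repetitive, repetitive), ncd(repetitive, varied))

# Hypothetical page counts, chosen so the arithmetic is easy to follow.
print(ngd(10_000, 1_000, 100, 1_000_000))
```

    Terms that always appear together give a distance of 0, and the distance grows as the joint count shrinks relative to the individual counts.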

  • THANK YOU. KT-SISLAB