7/22/2019 L2-Gioi Thieu WEKA
Data Mining
Hanoi University of Science and Technology
School of Information and Communication Technology
Academic year 2012-2013
Course contents:
Introduction to data mining
Introduction to the WEKA tool
Data preprocessing
Association rule mining
Clustering techniques
Collaborative filtering
WEKA: Introduction
WEKA is a software tool written in Java for machine learning and data mining
Main features:
A set of data preprocessing tools, machine learning algorithms, …
A graphical user interface (including data visualization features)
An environment for comparing machine learning and data mining algorithms
Available for download at: …
WEKA: Main environments
Simple CLI
A simple command-line interface
Explorer (we will mainly use this environment!)
An environment for exploring data
Experimenter
An environment for running experiments and performing statistical tests between machine learning models
KnowledgeFlow
An environment that lets you design the steps (components) of an experiment through graphical drag-and-drop interaction
WEKA: The Explorer environment
WEKA: The Explorer environment
Preprocess
To select and modify (process) the working data
Classify
To train and test machine learning models (classification, or regression/prediction)
Cluster
To learn groups from the data (clustering)
Associate
To discover association rules in the data
Select attributes
To identify and select the most relevant (important) attributes of the data
Visualize
To view (display) interactive 2-D plots of the data
WEKA: Dataset format
WEKA works natively with plain-text files in the ARFF format
Example of a dataset:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny, 85, 85, FALSE, no
overcast, 83, 86, FALSE, yes
In this example:
@relation gives the name of the dataset
Attributes declared with a {…} list of values are nominal; attributes declared real are numeric
play is the class attribute (the last attribute)
The lines after @data are the instances (examples)
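To make the ARFF layout concrete, here is a minimal Python sketch of a reader for the subset of ARFF used in the weather example. It is an illustration only, not WEKA's loader: the function name parse_arff is hypothetical, and the sketch assumes nominal attributes written as {...}, numeric attributes declared as real, numeric or integer, comma-separated data rows and % comments. Quoted values, missing values (?), sparse rows, strings and dates are not handled.

```python
def parse_arff(text):
    """Parse a small subset of ARFF; returns (relation, attributes, rows).

    attributes is a list of (name, kind) pairs, where kind is either the
    string "numeric" or the list of allowed nominal values.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                raise ValueError(f"expected {len(attributes)} values: {line!r}")
            row = []
            for (name, kind), value in zip(attributes, values):
                if kind == "numeric":
                    row.append(float(value))
                elif value in kind:
                    row.append(value)  # legal nominal value
                else:
                    raise ValueError(f"{value!r} is not a legal value of {name}")
            rows.append(row)
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            name, kind = line.split(None, 2)[1:]
            if kind.startswith("{"):
                nominal = [v.strip() for v in kind.strip("{} ").split(",")]
                attributes.append((name, nominal))
            elif kind.lower() in ("real", "numeric", "integer"):
                attributes.append((name, "numeric"))
            else:
                raise ValueError(f"unsupported attribute type: {kind}")
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


WEATHER = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny, 85, 85, FALSE, no
overcast, 83, 86, FALSE, yes
"""

relation, attributes, rows = parse_arff(WEATHER)
print(relation)                           # weather
print([name for name, _ in attributes])   # ['outlook', 'temperature', 'humidity', 'windy', 'play']
print(rows[1])                            # ['overcast', 83.0, 86.0, 'FALSE', 'yes']
```

Because the attribute declarations come first, each data value can be checked against its declared type as it is read; that is why a value outside a nominal list is rejected.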
WEKA Explorer: Data preprocessing
Data can be imported from a file in ARFF or CSV format
Data can also be read from a URL, or from a database via JDBC
WEKA's data preprocessing tools are called filters, for example:
Discretization
Re-sampling
Attribute selection
Transforming and combining attributes
Let's look at the WEKA Explorer interface
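As an illustration of what a discretization filter does, the sketch below implements equal-width binning, one common unsupervised scheme: the range of a numeric attribute is cut into k intervals of the same width, and each value is replaced by the index of its interval. This is a standalone Python sketch, not WEKA's filter code; the function name equal_width_bins and the sample temperatures are hypothetical.

```python
def equal_width_bins(values, k):
    """Replace each numeric value by the index (0..k-1) of its equal-width interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant attribute: everything falls in one bin
    width = (hi - lo) / k
    # min(..., k - 1) keeps the maximum value in the last bin instead of bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]


temperatures = [64, 65, 68, 70, 72, 75, 80, 83, 85]  # hypothetical readings
print(equal_width_bins(temperatures, 3))  # [0, 0, 0, 0, 1, 1, 2, 2, 2]
```

Here the range 64 to 85 is split into three intervals of width 7. Equal-frequency binning, which places roughly the same number of instances in each bin, is a common alternative when values are unevenly spread.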
WEKA Explorer: Classifiers (1)
WEKA's classifiers correspond to models that predict nominal quantities (classification) or numeric quantities (regression/prediction)
Supported techniques include:
Naïve Bayes classifier and Bayesian networks
Decision trees
Instance-based classifiers
Support vector machines
Neural networks
Let's look at the WEKA Explorer interface
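Of the classifier families listed, the instance-based one is the easiest to sketch. The Python code below is a hypothetical 1-nearest-neighbour classifier: it stores the training instances and labels a query with the class of the closest one under Euclidean distance. It assumes purely numeric attributes. Practical implementations usually normalize attribute ranges first; this sketch does not, so attributes with large values dominate the distance. The function name and the data are made up for illustration.

```python
import math


def nearest_neighbour(train, query):
    """Return the label of the training instance closest to `query`.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)  # requires Python 3.8+
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical two-attribute training data with classes "A" and "B".
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((5.0, 5.0), "B"), ((6.0, 5.5), "B")]
print(nearest_neighbour(train, (1.2, 1.1)))  # A
print(nearest_neighbour(train, (5.5, 5.2)))  # B
```

"Learning" here is just storing the instances; all the work happens at prediction time, which is what distinguishes instance-based classifiers from models such as decision trees.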
WEKA Explorer: Classifiers (2)
Select a classifier
Select the test options
Use training set. The learned classifier is evaluated on the same data it was trained on
Supplied test set. A separate dataset, different from the training set, is used for evaluation
Cross-validation. The dataset is divided into k folds of approximately equal size; the classifier is trained k times, each time on k - 1 folds, and evaluated on the remaining fold
Percentage split. Specifies what percentage of the dataset is used for training; the remaining instances are held out for evaluation
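The cross-validation and percentage-split options can be sketched as index bookkeeping. The Python code below is a simplified illustration under stated assumptions, not WEKA's evaluation code: folds are formed by shuffling with a fixed seed and dealing indices round-robin (no stratification by class), and the percentage is taken to be the share of instances used for training. All function names are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=1):
    """Shuffle 0..n-1 with a fixed seed, then deal the indices into k near-equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validation_splits(n, k, seed=1):
    """Yield (train, test) index lists; every fold serves as the test set exactly once."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def percentage_split(n, train_percent, seed=1):
    """Shuffle, keep the first train_percent% of instances for training, test on the rest."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_percent / 100)
    return indices[:cut], indices[cut:]


# 14 instances, 10 folds: four folds get 2 test instances, six get 1.
splits = list(cross_validation_splits(14, 10))
print([len(test) for _, test in splits])  # [2, 2, 2, 2, 1, 1, 1, 1, 1, 1]

train_idx, test_idx = percentage_split(14, 66)
print(len(train_idx), len(test_idx))      # 9 5
```

Fixing the seed makes the folds and the split reproducible from run to run, which is the purpose of the Random seed for XVal / % Split option.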
WEKA Explorer: Classifiers (3)
More options
Output model. Displays the learned classifier model
Output per-class stats. Displays precision/recall statistics for each class
Output entropy evaluation measures. Displays entropy-based evaluation measures
Output confusion matrix. Displays the confusion matrix of the learned classifier
Store predictions for visualization. The classifier's predictions are kept in memory so that they can be visualized later
Output predictions. Displays detailed predictions for the test set
Cost-sensitive evaluation. The classifier's errors are weighted according to a specified cost matrix
Random seed for XVal / % Split. Specifies the random seed used when randomly selecting instances for the test set (cross-validation folds or percentage split)
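Cost-sensitive evaluation can be illustrated by weighting each cell of a confusion matrix by the corresponding cell of a cost matrix. The Python sketch below assumes that rows are actual classes and columns are predicted classes in both matrices; the function name total_cost, the matrices and their values are hypothetical.

```python
def total_cost(confusion, cost):
    """Sum confusion[i][j] * cost[i][j] over all (actual i, predicted j) cells."""
    return sum(
        count * weight
        for conf_row, cost_row in zip(confusion, cost)
        for count, weight in zip(conf_row, cost_row)
    )


# Hypothetical 2-class results (rows = actual class, columns = predicted class).
confusion = [[7, 2],
             [1, 4]]
# Hypothetical costs: correct predictions cost 0; predicting class 1 for an
# actual class-0 instance costs 5, the opposite mistake costs 1.
cost = [[0, 5],
        [1, 0]]

total = total_cost(confusion, cost)
average = total / sum(map(sum, confusion))
print(total, round(average, 3))  # 11 0.786
```

With a plain 0/1 cost matrix (0 on the diagonal, 1 elsewhere) the total cost is simply the number of misclassified instances. Unequal costs let one kind of mistake count more than the other.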
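WEKA Explorer: Classifiers (4)
Classifier output displays the important information:
Run information. The options of the learning scheme, the name of the dataset, the number of instances, the attributes, and the test method
Classifier model (full training set). A textual representation of the classifier learned from the full training set
Predictions on test data. Detailed information about the classifier's predictions on the test set
Summary. Statistics on how accurately the classifier predicts the class, under the chosen test method
Detailed Accuracy By Class. Detailed information about the classifier's accuracy for each class
Confusion Matrix. Each entry gives the number of test instances of one actual class that were assigned to one predicted class; diagonal entries count correctly classified instances, off-diagonal entries count misclassified ones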
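The Summary, Detailed Accuracy By Class and Confusion Matrix sections are all derived from the pairs (actual class, predicted class). The Python sketch below shows that derivation for accuracy and for per-class precision, recall and F-measure. It is a hypothetical illustration with made-up predictions, not WEKA's output code, and it uses the convention that rows are actual classes and columns are predicted classes.

```python
def confusion_matrix(actual, predicted, labels):
    """Count instances: rows = actual class, columns = predicted class (order of `labels`)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def per_class_stats(matrix, labels):
    """Return {label: (precision, recall, f_measure)} computed from the matrix."""
    stats = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                                # correctly classified as `label`
        predicted_total = sum(row[i] for row in matrix)  # column sum
        actual_total = sum(matrix[i])                    # row sum
        precision = tp / predicted_total if predicted_total else 0.0
        recall = tp / actual_total if actual_total else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        stats[label] = (precision, recall, f)
    return stats


labels = ["yes", "no"]
actual = ["yes", "yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "yes", "no", "yes", "yes", "no"]

m = confusion_matrix(actual, predicted, labels)
accuracy = sum(m[i][i] for i in range(len(labels))) / len(actual)
print(m)                                  # [[3, 1], [1, 2]]
print(round(accuracy, 3))                 # 0.714
print(per_class_stats(m, labels)["yes"])  # (0.75, 0.75, 0.75)
```

Precision divides by a column sum (everything predicted as the class), recall by a row sum (everything that actually is the class), so both come straight from the matrix.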
WEKA Explorer: Classifiers (5)
The Result list provides several useful functions:
Save model. Saves the learned classifier model to a binary file
Load model. Loads a previously learned model from a binary file
Re-evaluate model on current test set. Evaluates a previously learned model (classifier) on the current test set
Visualize classifier errors. Displays a plot window showing the classification results
Correctly classified instances are shown as crosses (x), while misclassified instances are shown as squares (□)
WEKA Explorer: Cluster builders (1)
WEKA's cluster builders correspond to models that find groups of similar instances in a dataset
Expectation maximization (EM)
k-Means ...
The clustering results can be visualized and compared with the actual clusters (classes)
Let's look at the WEKA Explorer interface
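k-Means, one of the cluster builders listed, is short enough to sketch. The Python code below is plain Lloyd-style k-means under simple assumptions: numeric tuples, Euclidean distance, and initial centroids drawn at random from the data with a fixed seed. It is an illustration, not WEKA's implementation, and the function name k_means and the points are hypothetical.

```python
import math
import random


def k_means(points, k, iterations=100, seed=1):
    """Plain k-means (Lloyd's algorithm) on a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, assignment = k_means(points, k=2)
print(assignment)  # the first three points share one cluster, the last three the other
print(centroids)
```

Cluster numbers are arbitrary, so which group is called 0 depends on the random initialization; only the grouping itself is meaningful.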
WEKA Explorer: Cluster builders (2)
Select a cluster builder
Select the cluster mode
Use training set. The learned clusters are tested on the training set
Supplied test set. A different dataset is used to test the learned clusters
Percentage split. Specifies how the original dataset is split to build the test set
Classes to clusters evaluation. Compares the learned clusters with the specified classes
Store clusters for visualization
Stores the clusters in memory so that they can be visualized later
Ignore attributes
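One cluster mode compares the learned clusters with known classes. The idea can be sketched as follows: the class attribute is set aside during clustering, each cluster is then labelled with a class, and instances whose class differs from their cluster's label count as errors. The Python code below labels each cluster by a simple majority vote; this is a simplifying assumption for illustration and WEKA's own mapping procedure may differ. The function name and data are hypothetical.

```python
from collections import Counter


def classes_to_clusters(cluster_ids, class_labels):
    """Label each cluster with its majority class; count instances that disagree."""
    counts = {}
    for cluster, label in zip(cluster_ids, class_labels):
        counts.setdefault(cluster, Counter())[label] += 1
    mapping = {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}
    errors = sum(1 for c, label in zip(cluster_ids, class_labels) if mapping[c] != label)
    return mapping, errors


clusters = [0, 0, 0, 1, 1, 1, 1]                         # hypothetical clusterer output
classes = ["yes", "yes", "no", "no", "no", "no", "yes"]  # hypothetical true classes
mapping, errors = classes_to_clusters(clusters, classes)
print(mapping, errors)  # {0: 'yes', 1: 'no'} 2
```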
WEKA Explorer: Association rules
Select an association rule mining model (algorithm)
Associator output
Run information. The options of the association rule mining model, the name of the dataset, the number of instances, and the attributes
Associator model (full training set). A textual representation of the set of discovered association rules
Minimum support and minimum confidence
Sizes of the large (frequent) itemsets
List of the association rules found
Let's look at the WEKA Explorer interface
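The two thresholds in the associator output have simple definitions: the support of an itemset is the fraction of transactions containing all of its items, and the confidence of a rule A → B is support(A ∪ B) / support(A). The Python sketch below computes both and finds frequent itemsets by brute-force enumeration. This is deliberately simpler than Apriori, which prunes candidate itemsets level by level; the function names and the small transaction list are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(A -> B) = support(A | B) / support(A); assumes support(A) > 0."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute force: every itemset of up to max_size items with support >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                result[combo] = s
    return result


transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support({"bread", "milk"}, transactions))                 # 0.5
print(round(confidence({"bread"}, {"milk"}, transactions), 3))  # 0.667
print(frequent_itemsets(transactions, min_support=0.5))
```

Raising the minimum support shrinks the set of frequent itemsets, and raising the minimum confidence filters the rules built from them.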
WEKA Explorer: Attribute selection
Used to identify which attributes are the most important
In WEKA, an attribute selection method consists of 2 parts:
Attribute Evaluator. Determines how the relevance of the attributes is assessed
E.g.: correlation-based, wrapper, information gain, chi-squared, ...
Search Method. Determines the method (order) in which attributes are examined
E.g.: best-first, random, exhaustive, ranking, ...
Let's look at the WEKA Explorer interface
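Information gain, one of the attribute evaluators listed, combined with a ranking search can be sketched in a few lines: each attribute is scored by how much knowing its value reduces the entropy of the class, and attributes are sorted by that score. The Python code below is a standalone illustration for nominal attributes only; the function names and the tiny dataset are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(attribute_values, labels):
    """H(class) minus the weighted entropy of the class within each attribute value."""
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


play = ["no", "no", "yes", "yes"]
attributes = {                 # hypothetical nominal attributes
    "A": ["x", "x", "y", "y"],  # separates the classes perfectly
    "B": ["x", "y", "x", "y"],  # carries no information about the class
    "C": ["x", "x", "x", "y"],  # partly informative
}
scores = {name: information_gain(values, play) for name, values in attributes.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['A', 'C', 'B']
```

Ranking scores each attribute independently, so it is fast but ignores redundancy between attributes; subset evaluators such as correlation-based or wrapper methods combined with a search like best-first address that.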
WEKA Explorer: Data visualization
Data visualization is essential in practice: it helps determine how difficult the learning problem is
WEKA can visualize:
Each individual attribute (1-D visualization)
Pairs of attributes (2-D visualization)
Different class values (labels) are shown in different colors
The Jitter slider makes the plot clearer when too many instances (points) are concentrated around one position on the plot
Zoom in/out (by increasing/decreasing the values of PlotSize and PointSize)
Let's look at the WEKA Explorer interface
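The Jitter slider addresses overplotting: when many instances share the same coordinates (common with nominal attributes), a small random offset is added to each plotted point so that the points become visible as separate marks. The Python sketch below shows the idea with uniform noise; the function name jitter, the noise distribution and the amount are assumptions for illustration, and only the plotted positions change, not the data.

```python
import random


def jitter(points, amount, seed=None):
    """Return copies of `points` with uniform noise in [-amount, amount] added to each coordinate."""
    rng = random.Random(seed)
    return [tuple(x + rng.uniform(-amount, amount) for x in p) for p in points]


points = [(1.0, 1.0)] * 5  # five instances plotted at exactly the same spot
spread = jitter(points, 0.2, seed=42)
print(spread)  # five nearby but separate positions around (1.0, 1.0)
```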