yung-ting-chen
http://l.pulipuli.info/nccu/17-tm
2017/3/31
2
3
~~
(1)...
P4
P1
P2 /
P3
P4
P5
4
5
P4 P6
6
Weka
https://www.youtube.com/watch?v=-W3pnicVgn0
7
8
1.
2.
3.
4.
5.
6.
9
http://l.pulipuli.info/nccu/17-tm
10
Weka 3.8
()
CSV to ARFF
ARFF to CSV
http://l.pulipuli.info/nccu/17-tm
https://docs.google.com/document/d/1FMJz4rWNGuJnSVEwJG5vFx_0lpLk0VqNTCO0oEHokPs/pub
https://docs.google.com/spreadsheets/
http://pulipulichen.github.io/jieba-js/weka_csv_arff.html
http://pulipulichen.github.io/jieba-js/weka_arff_csv.html
11
Part 1.
12
13
14
cm g
2.98 0.74
3.5 0.76
15
16
1.
2.
3.
? !
17
1
? !
46
26
: 8
: 2
2
,,,,,,
18
2
? !
2
2
19
3
? !
20
3
? !
5
6
5 1 1 1 1 1 0 0
6 1 1 1 0 0 1 1
1 = term present; 0 = term absent
21
https://www.wikiwand.com/zh-tw/%E5%90%91%E9%87%8F%E7%A9%BA%E9%96%93%E6%A8%A1%E5%9E%8B
Gerard Salton
d1d2
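The vector space model treats each document as a vector of term indicators, so two documents can be compared numerically. The sketch below uses the binary vectors of documents 5 and 6 from the table above. Cosine similarity is an assumption here: it is one common measure for this model, and the slides do not say which measure they use. The function name is illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length term vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares no terms with anything
    return dot / (norm_a * norm_b)

# Binary term vectors for documents 5 and 6, copied from the table
# (1 = term present, 0 = term absent).
d5 = [1, 1, 1, 1, 1, 0, 0]
d6 = [1, 1, 1, 0, 0, 1, 1]

similarity = cosine_similarity(d5, d6)
print(round(similarity, 3))  # 0.6
```

The two documents share three of their five terms each, which gives 3 / (sqrt(5) * sqrt(5)) = 0.6.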
22
Part 2.
23
1
2
3
24
25
26
https://github.com/fxsjy/jieba
Python
PHP, Java, Node.js, .NET (C#), C++, R
27
Jieba
https://www.slideshare.net/ssuser4568b0/jieba
1.
2. Trie
3. DAG
4.
5.
HMM, Viterbi
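The dictionary and DAG steps named above can be sketched as follows. This is not Jieba's code: the dictionary `FREQ`, its frequencies, and the function names are hypothetical, the Trie is replaced by plain substring lookups for brevity, and the HMM/Viterbi step for unknown words is left out (unknown characters simply become single-character tokens).

```python
import math

# Hypothetical toy dictionary: word -> frequency (not Jieba's real data).
FREQ = {"文": 10, "字": 10, "文字": 50, "探": 5, "勘": 5, "探勘": 40,
        "很": 30, "有": 30, "趣": 5, "有趣": 40}

def build_dag(text, freq):
    """For each start index, list every end index that forms a dictionary word."""
    dag = {}
    for start in range(len(text)):
        ends = [end for end in range(start, len(text))
                if text[start:end + 1] in freq]
        dag[start] = ends or [start]  # unknown character: single-char token
    return dag

def segment(text, freq):
    """Pick the path through the DAG with the highest total log-probability."""
    log_total = math.log(sum(freq.values()))
    dag = build_dag(text, freq)
    n = len(text)
    best = {n: (0.0, n)}  # position -> (best score from here, end of first word)
    for start in range(n - 1, -1, -1):  # dynamic programming, right to left
        best[start] = max(
            (math.log(freq.get(text[start:end + 1], 1)) - log_total
             + best[end + 1][0], end)
            for end in dag[start]
        )
    words, pos = [], 0
    while pos < n:
        end = best[pos][1]
        words.append(text[pos:end + 1])
        pos = end + 1
    return words

print(segment("文字探勘很有趣", FREQ))  # ['文字', '探勘', '很', '有趣']
```

Longer dictionary words win here because one word with a moderate frequency is more probable than the product of two single-character probabilities.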
28
Jieba-JS
https://goo.gl/YrSTn9
29
30
31
CSV to ARFF
1. CSV
[]
2. CSV to ARFF
3. ARFF
LibreOffice Calc, Google Sheets
Unicode
32
Microsoft Office
Big5
1. CSV
33
1. CSV
document
class
class: ?
http://l.pulipuli.info/nccu/17-tm
34
1. CSV
CSV
http://l.pulipuli.info/nccu/17-tm
https://docs.google.com/spreadsheets/d/1jaHDl0692t5OHzRlE4YOXJiRKTgv5xQajl_wqWGlQUY/export?format=csv
1. CSV
35
http://l.pulipuli.info/nccu/17-tm
36
2. CSV to ARFF
CSV to ARFF
http://l.pulipuli.info/nccu/17-tm
https://pulipulichen.github.io/jieba-js/weka_csv_arff.html
37
2. CSV to ARFF
1. CSV
2.
3.
38
3. ARFF&
train
test
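The CSV to ARFF step can be approximated offline with a short script. This is a sketch under assumptions, not the converter linked above: it only relies on the `document` and `class` column names and the `?` marker for unknown classes that the slides show, it performs no word segmentation, and the function name, relation name, and sample rows are hypothetical.

```python
import csv
import io

def csv_to_arff(csv_text, class_labels=None, relation="documents"):
    """Convert CSV text with `document` and `class` columns into ARFF text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if class_labels is None:
        # Collect labels from the data, ignoring "?" (class unknown).
        class_labels = sorted({r["class"] for r in rows if r["class"] != "?"})
    lines = [
        f"@relation {relation}",
        "",
        "@attribute document string",
        "@attribute class {" + ",".join(class_labels) + "}",
        "",
        "@data",
    ]
    for r in rows:
        # Escape backslashes and single quotes inside the quoted string value.
        text = r["document"].replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"'{text}',{r['class']}")
    return "\n".join(lines) + "\n"

# Hypothetical sample: two labeled rows and one row to be predicted.
sample = "document,class\nweka is fun,pos\nthis is dull,neg\nnew text,?\n"
arff = csv_to_arff(sample)
print(arff)
```

When a test file contains only `?` rows, pass the training labels through `class_labels` so that both ARFF files declare the same class attribute.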
39
Weka
weka.filters.unsupervised.attribute.StringToWordVector
http://l.pulipuli.info/nccu/17-tm
40
1. CSV
2. CSV to ARFF
3. ARFF
train
test
http://l.pulipuli.info/nccu/17-tm
41
Part 3.
42
Weka
Java
Windows, Mac OS, Linux
43
Weka
http://l.pulipuli.info/nccu/17-tm
Weka 3.8.1
http://l.pulipuli.info/nccu/17-tm
https://docs.google.com/document/d/1FMJz4rWNGuJnSVEwJG5vFx_0lpLk0VqNTCO0oEHokPs/pub
Weka (1/2)
Weka
C:\Program Files\Weka-3-8
RunWeka.ini
>
44
45
Weka (2/2)
fileEncoding=Cp1252
fileEncoding=utf-8
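The encoding change can also be scripted. The sketch below only rewrites the `fileEncoding` line in the text of RunWeka.ini; the function name is hypothetical, and reading or writing the real file under C:\Program Files may require administrator rights.

```python
import re

def set_utf8_encoding(ini_text):
    """Return RunWeka.ini content with the fileEncoding line set to utf-8."""
    # Match the whole setting line but leave the line ending untouched.
    return re.sub(r"(?m)^fileEncoding=[^\r\n]*", "fileEncoding=utf-8", ini_text)

# Hypothetical excerpt; the surrounding keys are placeholders.
before = "someKey=1\nfileEncoding=Cp1252\notherKey=2\n"
after = set_utf8_encoding(before)
print(after)
```

Restart Weka after saving the file so the new setting is picked up.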
46
&
1
2
3
4
1-1.
1-2.
1-3. ()
47
1.
train
test
48
2. &
2-1. Weka Explorer
2-2.
2-3. Meta
2-4. NaiveBayes
2-5.
StringToWordVector
2-6.
49
2-1. Weka Explorer
Weka 3.8 Explorer
50
2-2. (1/2)
Open file
train
51
2-2. (2/2)
6
document class
52
2-3. Meta
53
2-3. Meta
weka.classifiers.meta.FilteredClassifier
54
2-4. NaiveBayes
weka.classifiers.bayes.NaiveBayes
55
2-5. ()StringToWordVector
weka.filters.unsupervised.attribute.StringToWordVector
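The idea behind wrapping a word-vector filter and a classifier together is that the vocabulary is learned from the training documents only and then reused unchanged on new documents. The sketch below illustrates that idea with a swapped-in technique: a multinomial Naive Bayes with Laplace smoothing, which is not Weka's `NaiveBayes` implementation. The class name and the toy documents are hypothetical, and tokens are assumed to be separated by spaces already.

```python
import math
from collections import Counter

class BagOfWordsNaiveBayes:
    """Word counting plus Naive Bayes, fitted together on training data."""

    def fit(self, documents, labels):
        # The vocabulary comes from training documents only.
        self.vocab = {w for doc in documents for w in doc.split()}
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(doc.split())
        return self

    def predict(self, document):
        # Words never seen in training are dropped, as a fitted filter would do.
        words = [w for w in document.split() if w in self.vocab]
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # Laplace smoothing
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

# Hypothetical toy training set.
train_docs = ["good fun film", "fun good", "bad dull film", "dull bad"]
train_labels = ["pos", "pos", "neg", "neg"]
model = BagOfWordsNaiveBayes().fit(train_docs, train_labels)
print(model.predict("good film"))    # pos
print(model.predict("dull unseen"))  # neg
```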
56
57
2-6.
class
58
2. &
2-1. Weka Explorer
2-2.
2-3. Meta
2-4. NaiveBayes
2-5.
StringToWordVector
2-6.
59
3.
3-1.
3-2.
3-3.
Cross-validation: 610
(6)
60
3-1.
1~6
61
6
()
1
2
3
4
5
6
1.
2.
4/6=66.7%
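With six documents and six folds, each document is held out once while the other five are used for training. The sketch below shows that split and the accuracy arithmetic. The per-document outcomes in `correct` are hypothetical and chosen only to reproduce the 4/6 figure; the slides do not say which documents were misclassified. Weka's Explorer also shuffles and stratifies the data before splitting, which this sketch skips.

```python
def k_fold_splits(n_instances, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_instances))
    for fold in range(k):
        test = indices[fold::k]  # every k-th instance, starting at `fold`
        train = [i for i in indices if i not in test]
        yield train, test

splits = list(k_fold_splits(6, 6))
for train, test in splits:
    print("train:", train, "test:", test)

# Hypothetical outcome per held-out document (True = classified correctly).
correct = [True, True, False, True, False, True]
accuracy = sum(correct) / len(correct)
print(f"{accuracy:.1%}")  # 66.7%
```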
62
3-2.
Start
63
3-3.
Correctly Classified
Instances:
66.7%
64
...
66.7%
31
3-1.
3-2.
3-3.
65
3.
66
4.
4-1.
4-2.
4-3.
4-4.
4-5. ARFF to CSV
4-6.
4-1. (1/2)
67
Supplied test set: [Set...]
4-1. (2/2)
68
Open file
test
class
69
4-2. (1/2)
More options...
Output predictions:
[Choose] CSV
70
4-2. (2/2)
outputDistribution: True
CSV
71
4-3.
Start
72
4-4. ARFF
Save result buffer
result.txt
73
4-5. ARFF to CSV (1/3)
http://l.pulipuli.info/nccu/17-tm
ARFF to CSV
http://l.pulipuli.info/nccu/17-tm
http://pulipulichen.github.io/jieba-js/weka_arff_csv.html
74
4-5. ARFF to CSV (2/3)
test
result.txt
75
4-5. ARFF to CSV (3/3)
.csv
76
4-6. (1/4) Google
http://l.pulipuli.info/nccu/17-tm
http://l.pulipuli.info/nccu/17-tm
https://drive.google.com/drive
77
4-6. (2/4)
.csv
78
4-6. (3/4)
79
4-6. (4/4)
80
document class
predicted
class
pro_dis:
pro_dis:
? *1 0
? *0.619 0.381
1896
? *1 0
? 0.001 *0.999
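In the rows above, the asterisk appears to mark the class with the highest probability, which is the predicted class. A minimal sketch for reading such rows, assuming each probability arrives as a separate cell; the function name is hypothetical and the values are the ones listed above.

```python
def parse_distribution(cells):
    """Return (predicted_index, probabilities) for cells like ['*0.619', '0.381']."""
    probs = [float(cell.lstrip("*")) for cell in cells]
    starred = [i for i, cell in enumerate(cells) if cell.startswith("*")]
    if starred:
        predicted = starred[0]
    else:
        # No asterisk found: fall back to the largest probability.
        predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted, probs

rows = [["*1", "0"], ["*0.619", "0.381"], ["*1", "0"], ["0.001", "*0.999"]]
parsed = [parse_distribution(row) for row in rows]
for predicted, probs in parsed:
    print(predicted, probs)
```

A row such as 0.619 against 0.381 is a far less certain prediction than 1 against 0, so the distribution is worth checking before trusting a label.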
4-1.
4-2.
4-3.
4-4.
4-5. ARFF to CSV
4-6.
81
4.
82
Part 4.
83
84
Information Gain
()
1100% 260%40%
166% 33% 2100% 3100%
()
2 1
2 1
2 3
1 1
1 2
2 2
2 2
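Information gain measures how much knowing whether a term occurs reduces uncertainty about the class. The sketch below computes it for one binary term. The six labels and the term occurrences are hypothetical, since the counts on the slide did not survive extraction, and the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(has_term, labels):
    """Entropy of the classes minus the weighted entropy after splitting on the term."""
    gain = entropy(labels)
    for value in (True, False):
        subset = [lab for flag, lab in zip(has_term, labels) if flag == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

# Hypothetical data: six documents, three per class; the term occurs in two
# documents, both of class A.
labels = ["A", "A", "A", "B", "B", "B"]
has_term = [True, True, False, False, False, False]

gain = information_gain(has_term, labels)
print(round(gain, 3))  # 0.459
```

A term that occurs equally often in both classes would score 0, so ranking by this value puts the most class-specific terms first.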
85
86
1.
2.:
StringToWordVector
3.Class
4.
InfoGainAttributeEval
Ranker
5.
87
2. (1/2)
weka.filters.unsupervised.attribute.StringToWordVector
88
2. (2/2)
89
3. Class
class
90
4.
weka.attributeSelection.InfoGainAttributeEval
weka.attributeSelection.Ranker
91
5. (1/2)
Start
92
5. (2/2)
Selected attributes
93
Ranked attributes:
0.459 9
0.459 49
0.459 46
document class
28 15
80
1891
3D
94
document class
28 15
80
1891
3D
111
1.
2.:
StringToWordVector
3.Class
4.
InfoGainAttributeEval
Ranker
5.
95
96
Part 5.
97
1.
2.
3.
4. TF-IDF
5.
6.() ()
7.
98
1891
1891
99
tokenizer: CharacterNGramTokenizer
-max 1
-min 1
StringToWordVector
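With both limits set to 1, a character n-gram tokenizer turns every character into its own token, which avoids the need for word segmentation. A minimal sketch of the idea follows; the function is hypothetical, it is not Weka's `CharacterNGramTokenizer`, and the order in which Weka emits tokens may differ.

```python
def character_ngrams(text, n_min=1, n_max=1):
    """Return all character n-grams of text with n_min <= n <= n_max."""
    tokens = []
    for n in range(n_min, n_max + 1):
        tokens.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return tokens

print(character_ngrams("文字探勘"))        # ['文', '字', '探', '勘']
print(character_ngrams("文字探勘", 1, 2))  # adds '文字', '字探', '探勘'
```

Raising the maximum to 2 adds two-character tokens, which capture some word-level information at the cost of a larger vocabulary.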
100
101
(1/2)
102
(2/2)
103
? !
1 1 1 1 1 8
1 1 1 1 1 9
104
()
28 15
1 1 1 1 1
1 2 2 1 1
105
TF-IDF
...
1 0 1
0 1 1
TF-IDF
...
2 0 1
0 2 1
106
TF-IDF
=
=
107
TF-IDF
=
2
2
*2
108
TF-IDF
(1/2)
=
*3
109
TF-IDF
(2/2)
=
(
)
*()
110
Weka
2-5: IDFTransform
TFTransform
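The two small tables above (counts 1 0 1 / 0 1 1 becoming 2 0 1 / 0 2 1) are consistent with weighting each count by N / df, where N is the number of documents and df is the number of documents containing the term. That reading is an assumption inferred from the numbers. Weka's `IDFTransform` option is documented as using a logarithm of that ratio, so its output will differ; the `use_log` switch below shows that variant. Function and variable names are hypothetical.

```python
import math

def tf_idf(matrix, use_log=False):
    """Weight a document-by-term count matrix by inverse document frequency."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    # Document frequency: in how many documents does each term occur?
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    weighted = []
    for row in matrix:
        new_row = []
        for j, tf in enumerate(row):
            if df[j] == 0:
                new_row.append(0.0)  # term occurs nowhere
                continue
            ratio = n_docs / df[j]
            idf = math.log(ratio) if use_log else ratio
            new_row.append(tf * idf)
        weighted.append(new_row)
    return weighted

counts = [[1, 0, 1],
          [0, 1, 1]]
print(tf_idf(counts))                # [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0]]
print(tf_idf(counts, use_log=True))  # the shared third term drops to 0.0
```

In both variants a term that appears in every document gets the lowest weight, because it does not help tell the documents apart.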
111
weka.classifiers.bayes.NaiveBayes
weka.classifiers.functions.Logistic
weka.classifiers.trees.J48
weka.classifiers.functions.SMO
weka.classifiers.functions.MultilayerPerceptron
weka.classifiers.functions.NeuralNetwork
Weka
112
Weka
:
WEKA
2015
ISBN: 978-986-379-067-9
113
510.25474 368
() ()
114
97 23
document
class
...
....
document
class
... 97
23
115
doNotOperateOnPerClassBasis: True
StringToWordVector
116
weka.classifiers.bayes.NaiveBayes
weka.classifiers.functions.MultilayerPerceptron
/
117
&
1
2
3
4 Weka
118
Part 6.
119
P4
P6
120
P6194.87%
()
121
?
122
123
http://l.pulipuli.info/nccu/17-tm
http://blog.pulipuli.info/