Weka: Practical Use
Antonín Pavelka

Weka: Introduction
- a system for data analysis and predictive modeling
- University of Waikato, New Zealand
- 1993: Tcl/Tk, C, Makefiles
- 1997: decision to move to pure Java
- integrated into
  - RapidMiner
  - Pentaho (a business intelligence system)
- GNU General Public License

Ways to use Weka
- graphical interface
  - Explorer: individual tasks at a click
  - Experimenter: systematic comparison
  - Knowledge Flow: tasks arranged as a flow
- command line
- Java API

Demo: graphical interface ...

... command line ...
    java -classpath weka.jar weka.classifiers.bayes.NaiveBayes -t data/iris.arff

... Java API

1. Attribute-Relation File Format (ARFF)

ARFF file
    @relation spambase
    % spam, non-spam
    @attribute word_freq_make real
    @attribute 'char_freq_#' real
    @attribute {spam, ham}
    @data
    0,0.64,0.64,spam
    0.21,0.28,0.5,spam
    0.06,0,0.71,ham

Missing values
    4.4,?,1.5,?,Tolkien

Strings
    @attribute LCC string
    @attribute LCSH string
    @data
    AG5, 'Encyclopedias and dictionaries.;Twentieth century.'

Time
    @ATTRIBUTE timestamp DATE "yyyy-MM-dd HH:mm:ss"
    @DATA
    "2001-04-03 12:12:12"
    "2001-05-03 12:59:55"

Sparse format
    dense:
    0, X, 0, Y, "class A"
    0, 0, W, 0, "class B"
    sparse:
    {1 X, 3 Y, 4 "class A"}
    {2 W, 4 "class B"}
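
The sparse notation above lists only the non-zero values, each prefixed by its 0-based attribute index. A minimal Python sketch of that conversion follows; the helper name `to_sparse_row` is hypothetical, and the quoting rule (quote any value containing a space) is a simplification chosen for this example.

```python
def to_sparse_row(values):
    """Convert one dense ARFF data row (a list of string tokens) to sparse notation."""
    parts = []
    for index, value in enumerate(values):  # attribute indices start at 0
        if value == "0":                    # zero values are left out
            continue
        if " " in value:                    # simplified quoting rule (assumption)
            value = '"' + value + '"'
        parts.append(f"{index} {value}")
    return "{" + ", ".join(parts) + "}"


dense_rows = [
    ["0", "X", "0", "Y", "class A"],
    ["0", "0", "W", "0", "class B"],
]
sparse_rows = [to_sparse_row(row) for row in dense_rows]
for line in sparse_rows:
    print(line)
# {1 X, 3 Y, 4 "class A"}
# {2 W, 4 "class B"}
```

The two printed lines match the sparse rows shown on the slide.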

2. Data preprocessing

Histograms
- useful numeric attribute
- suspicious numeric attribute
- binary target attribute
- 20-valued attribute

Filters
- Remove -V -R 1-5,8 (-V = invert the selection, keep only these attributes)
- Discretize
  - some algorithms do not work with numeric values
  - speed-up
  - sometimes also higher accuracy
- resampling
- filling in missing attribute values, removing missing values
- Obfuscator
- Principal Component Analysis, Partial Least Squares
- AttributeSelection
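
To make the meaning of `-R 1-5,8` and `-V` concrete, here is a small Python sketch of the same selection logic applied to one row. It is an illustration, not Weka's implementation: the function names are hypothetical, and range keywords such as `first` or `last` are not handled.

```python
def parse_range(spec):
    """Expand a 1-based range list such as '1-5,8' into a sorted list of indices."""
    indices = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            indices.update(range(int(first), int(last) + 1))
        else:
            indices.add(int(part))
    return sorted(indices)


def remove_attributes(row, spec, invert=False):
    """Drop the listed attributes; with invert=True keep only the listed ones."""
    selected = set(parse_range(spec))
    kept = []
    for position, value in enumerate(row, start=1):  # positions are 1-based
        listed = position in selected
        if listed == invert:
            kept.append(value)
    return kept


row = list("abcdefghij")  # ten hypothetical attribute values
print(remove_attributes(row, "1-5,8"))               # ['f', 'g', 'i', 'j']
print(remove_attributes(row, "1-5,8", invert=True))  # ['a', 'b', 'c', 'd', 'e', 'h']
```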

StringToWordVector
    @attribute text string
    @attribute class {class1,class2,class3}
    @data
    '\n\t\n\t\tDumbek\'s Rand'
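
The idea behind StringToWordVector is to turn each string into a vector with one entry per word. The Python sketch below shows a simplified version that records word presence (1 or 0). The delimiter set, the function names and the second document are assumptions made for illustration; the actual filter has many more options.

```python
DELIMITERS = " \r\n\t.,;:'\"()?!"  # illustrative choice of delimiters


def tokenize(text):
    """Split text into tokens by replacing every delimiter with a space."""
    table = str.maketrans({ch: " " for ch in DELIMITERS})
    return text.translate(table).split()


def to_word_vectors(documents):
    """Return the sorted vocabulary and one presence vector per document."""
    vocabulary = sorted({token for doc in documents for token in tokenize(doc)})
    vectors = []
    for doc in documents:
        present = set(tokenize(doc))
        vectors.append([1 if word in present else 0 for word in vocabulary])
    return vocabulary, vectors


# First string taken from the slide, second one is made up.
documents = ["\n\t\n\t\tDumbek's Rand", "Rand and more Rand"]
vocabulary, vectors = to_word_vectors(documents)
print(vocabulary)  # ['Dumbek', 'Rand', 'and', 'more', 's']
print(vectors)     # [[1, 1, 0, 0, 1], [0, 1, 1, 1, 0]]
```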

Classification: algorithms 1
- NaiveBayes, BayesNet, Averaged One-Dependence Estimators (AODE)
- SMO, SMOreg, LibSVM

StringKernel
    @attribute name string
    @attribute class {female, male}
    @data
    Midori,female
    Koichi,male

- 291 female and 385 male names (13 unisex names removed)
- first run: Q2 = 63 %

Additional SVM parameters and their optimization
    meta.CVParameterSelection -P "C 0.5 50000.0 5.0" ...
    Cross-validation Parameter: '-C' ranged from 0.5 to 50000.0 with 5.0 steps
    Classifier Options: -C 12500.375 ...
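
The reported value `-C 12500.375` is what one would expect if the five candidate values are spaced evenly between 0.5 and 50000.0, endpoints included. The Python sketch below reproduces that grid under this assumption; `parameter_grid` is a hypothetical helper, not a Weka function.

```python
def parameter_grid(lower, upper, steps):
    """Evenly spaced candidate values from lower to upper, both included."""
    width = (upper - lower) / (steps - 1)
    return [lower + i * width for i in range(steps)]


grid = parameter_grid(0.5, 50000.0, 5)
print(grid)  # [0.5, 12500.375, 25000.25, 37500.125, 50000.0]
```

The second candidate, 12500.375, is the one selected in the output above. A linear grid over such a wide range is coarse, so a narrower second search around the winner may be worth trying.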

Without confidence prediction

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.887    0.192    0.777      0.887   0.828      0.847     female
               0.808    0.113    0.904      0.808   0.853      0.847     male
Weighted Avg.  0.842    0.147    0.849      0.842   0.842      0.847

Confidence prediction by logistic regression

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.835    0.148    0.81       0.835   0.822      0.921     female
               0.852    0.165    0.872      0.852   0.862      0.921     male
Weighted Avg.  0.845    0.158    0.846      0.845   0.845      0.921
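
The "Weighted Avg." row weights each class by its number of instances. The Python sketch below checks this for the table without confidence prediction, assuming the class sizes are the 291 female and 385 male names mentioned on the StringKernel slide. The per-class inputs are already rounded to three decimals, so agreement is only expected to about that precision.

```python
class_sizes = {"female": 291, "male": 385}  # assumed from the StringKernel slide

# Per-class values copied from the table without confidence prediction.
tp_rate = {"female": 0.887, "male": 0.808}
fp_rate = {"female": 0.192, "male": 0.113}
precision = {"female": 0.777, "male": 0.904}


def weighted_average(per_class, sizes):
    """Average of per-class values weighted by class size."""
    total = sum(sizes.values())
    return sum(per_class[name] * sizes[name] for name in sizes) / total


print(round(weighted_average(tp_rate, class_sizes), 3))    # 0.842
print(round(weighted_average(fp_rate, class_sizes), 3))    # 0.147
print(round(weighted_average(precision, class_sizes), 3))  # 0.849
```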

Predicted confidence and the ROC curve

Classification: algorithms 2
- MultilayerPerceptron
  - validation set
  - slow
- LinearRegression
- PLSClassifier: Partial Least Squares regression
- trees
  - J48, RandomForest, ...
- meta
  - boosting, bagging, ...
  - ClassificationViaRegression
  - AttributeSelectedClassifier
  - CostSensitiveClassifier

Weighting errors
    TP Rate: 0.81 -> 0.915

meta.CostSensitiveClassifier
    % Rows Columns
    2 2
    % Matrix elements
    0 2
    1 0

The cost of a misclassified P is twice the cost of a misclassified N.
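
One way a cost matrix can influence predictions is to pick the class with the lowest expected cost instead of the most probable class. The Python sketch below illustrates this with the 2x2 matrix above. It assumes rows are actual classes and columns are predicted classes, with P as class 0 and N as class 1; the function names are hypothetical. Reweighting the training instances is another option and is not shown here.

```python
# Rows = actual class, columns = predicted class (assumed: P = 0, N = 1).
COST = [
    [0, 2],  # actual P: predicting N costs 2
    [1, 0],  # actual N: predicting P costs 1
]


def expected_costs(probabilities, cost=COST):
    """Expected cost of each possible prediction given class probabilities."""
    n = len(cost)
    return [
        sum(probabilities[actual] * cost[actual][predicted] for actual in range(n))
        for predicted in range(n)
    ]


def min_cost_class(probabilities, cost=COST):
    """Index of the prediction with the lowest expected cost."""
    costs = expected_costs(probabilities, cost)
    return costs.index(min(costs))


# P is less probable (0.4) but is still predicted, because missing it costs more.
print(min_cost_class([0.4, 0.6]))  # 0
print(min_cost_class([0.3, 0.7]))  # 1
```

With this matrix, P is predicted whenever its probability exceeds 1/3, which is why the TP rate for P goes up at the price of more false positives.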

Attribute selection

Evaluation method
- for single attributes
  - ChiSquaredAttributeEval
  - SVMAttributeEval
- for subsets
  - CfsSubsetEval
  - WrapperSubsetEval, ClassifierSubsetEval

Search method
- for single attributes
  - Ranker
- for subsets
  - BestFirst
  - GeneticSearch

Dimensionality reduction with a filter
- Principal Component Analysis, Partial Least Squares

Experimenter

Knowledge Flow

Resources

Books
- WEKA Manual for Version 3-7-0
- Data Mining: Practical Machine Learning Tools and Techniques

Web
- http://www.cs.waikato.ac.nz/ml/weka/
- http://weka.wikispaces.com/
- http://wekadocs.com/
- http://www.hakank.org/weka/

Running Weka
    ssh -X lethe
    module add java
    create a working directory (mkdir , cd )
    wget loschmidt.chemi.muni.cz/~tonda/w.zip
    unzip w.zip
    java -Xmx256m -jar weka.jar

Task 1: Explorer, J48 and SMO
- start Weka twice and open the Explorer in each instance
- in both, open spambase.arff and go to the Classify tab
- in Test options, More options, set Output predictions to Plain text
- in the first instance
  - choose the classifier trees.J48
  - click the field to the right of the Choose button and set
    useLaplace: True
  - run 10-fold cross-validation
- in the second instance
  - choose the classifier functions.SMO
  - click the field to the right of the Choose button and set
    buildLogisticModels: True
    numFolds: 10
  - run 10-fold cross-validation
Compare the speed and accuracy of the two algorithms. Estimate how useful the predicted confidence of each result is (=== Predictions on test data ===, column prediction).

Task 2: Knowledge Flow, ROC curves
- start Knowledge Flow
- open spam_roc.kf
- set ArffLoader to spambase.arff
- right-click ArffLoader, Start loading
- compare the ROC curves of NaiveBayes and BayesNet (right-click the upper Model Performance Chart, Show chart)
- compare the ROC curves of BayesNet and AODE. Clicking a point on the curve displays its values. What percentage of spam do we identify if we are willing to tolerate 4 % of ham ending up in the spam folder (spam = class 1, X axis: FPR = FP / N, Y axis: TPR = TP / P)?
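
The question in Task 2 amounts to reading the highest TPR on the ROC curve whose FPR does not exceed 0.04. The Python sketch below builds ROC points from classifier scores and reads off that value. The scores and labels are made up for illustration, tied scores are not treated specially, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by lowering the threshold one instance at a time."""
    positives = sum(labels)              # P: number of spam instances (label 1)
    negatives = len(labels) - positives  # N: number of ham instances (label 0)
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _score, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))  # FPR = FP/N, TPR = TP/P
    return points


def best_tpr_at(points, max_fpr):
    """Highest TPR among the points whose FPR stays within max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


# Made-up spam scores (higher = more spam-like) and true labels (1 = spam).
scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
points = roc_points(scores, labels)
print(best_tpr_at(points, 0.04))  # 0.6
```

On this toy data, tolerating at most 4 % of ham in the spam folder means accepting no false positives at all, which catches 60 % of the spam. On spambase.arff the answer has to be read from the actual curve.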

Task 3: Experimenter, comparing classifiers
- start the Experimenter
- click the New button
- Result destination: set the path and choose a name for the new ARFF file
- add the dataset spam_discretized.arff
- add the algorithms bayes.AODE, trees.J48, trees.RandomForest
- start the computation in the Run tab
- once it finishes, switch to the Analyse tab
- click Experiment and then Perform Test
Is the accuracy of any of the methods on this dataset statistically significantly better at the 0.05 level? What about Area_under_ROC?
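
To give a feel for the significance test behind Task 3, here is a Python sketch of a corrected resampled paired t-statistic (the Nadeau and Bengio correction) computed from per-fold accuracy differences between two classifiers. Whether the Experimenter is configured to use exactly this test is an assumption, the differences below are made up, and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def corrected_t(differences, test_train_ratio):
    """Corrected resampled t-statistic for paired per-fold differences.

    test_train_ratio is (test set size) / (training set size), for example
    1/9 for 10-fold cross-validation. A ratio of 0 gives the ordinary
    paired t-statistic.
    """
    k = len(differences)
    scale = (1.0 / k + test_train_ratio) * variance(differences)
    return mean(differences) / math.sqrt(scale)


# Made-up accuracy differences between two classifiers on four folds.
differences = [0.02, 0.01, 0.03, 0.02]
print(round(corrected_t(differences, 1 / 9), 2))  # 4.08
print(round(corrected_t(differences, 0.0), 2))    # 4.9
```

The correction enlarges the variance estimate because cross-validation folds share training data, so the corrected statistic is smaller than the ordinary one. The statistic is then compared with the critical value of the t distribution with k - 1 degrees of freedom at the chosen level.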