R
RapidMiner
KSK Analytics Inc.
www.ksk-anl.com
KSK
Copyright KSK Analytics Inc. All rights reserved
Data Analysis for Everyone!
Web KSK
Big Data, AI, DB
BI/BA
DB
Tera, Peta, DB, DWH/ETL
2006
2007 Pentaho
2008
2010 Infobright
2011 RapidMiner
2012 Jedox
2013
2014 Cloudera, Revolution R, NYSOL
2015
2016 TensorFlow
2017 IoT
(MBA) Pentaho BI, RapidMiner
BI/BA 8, DB 10 (TensorFlow, Hadoop, Spark, etc.)
RapidMiner, EMC, EMCDSA, MBA
Pentaho BI, BI/BA
25
OR (2005, 2008, 2015), (2006, 2013), NYSOL
(TOEIC 900+), 3, MBA, 4
2
~
12
~
SCM
ETL
2, etc.
Spark
200TByte
Japan Partner KSK Analytics KSK Analytics Inc.
2016/10 KSK
RapidMiner
KDnuggets 1, Big Data Landscape, Machine Learning
http://mattturck.com/2016/02/01/big-data-landscape/
RapidMiner
RapidMiner
RapidMiner
Excel, BI
IT
RapidMiner IT
RapidMiner
60 employees worldwide
30,000+ downloads/mo.
35,000+ active deployments, with 250,000+ users
500+ customers in over 50 countries
100+ active developers
2001 RapidMiner
Advanced Analytics Platform
https://www.gartner.com/doc/reprints?id=1-2YC9GD6&ct=160209&st=sb
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
N=2,895
RapidMiner
http://mattturck.com/2016/02/01/big-data-landscape/
RapidMiner
RapidMiner
RapidMiner
10,000
1,000
1
Decision Tree
551166
RapidMiner
55
RapidMiner
SQL Join)
(If
Z
Data
etc
integer: 23-511,024,768
real: 11.23-0.0001
binominal: 2 values, e.g. true/false, yes/no
polynominal: 3 or more values
text
date: 2014/12/23
time: 17:59
date_time: 2014/12/23 17:59
file_path
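The value types listed above decide how each attribute is treated during analysis. As a rough illustration only, the sketch below guesses one of these type names for a column of raw strings. The function names and parsing rules (date format %Y/%m/%d, time format %H:%M, at most two distinct values for binominal) are hypothetical assumptions chosen to match the example values above. This is not RapidMiner's own type-guessing logic; it does not handle thousands separators such as 1,024,768 and does not distinguish text or file_path from polynominal.

```python
from datetime import datetime

# Hypothetical parsing rules chosen to match the example values above
# (2014/12/23, 17:59, 2014/12/23 17:59). Not RapidMiner's own logic.
DATE_FORMATS = [
    ("date_time", "%Y/%m/%d %H:%M"),
    ("date", "%Y/%m/%d"),
    ("time", "%H:%M"),
]


def _all_parse(values, parse):
    """Return True if `parse` raises no ValueError for any value."""
    try:
        for v in values:
            parse(v)
    except ValueError:
        return False
    return True


def infer_value_type(values):
    """Guess a RapidMiner-style value type name for a column of strings."""
    for type_name, fmt in DATE_FORMATS:
        if _all_parse(values, lambda v, fmt=fmt: datetime.strptime(v, fmt)):
            return type_name
    if _all_parse(values, int):
        return "integer"
    if _all_parse(values, float):
        return "real"
    # Everything else is treated as nominal: at most two distinct values
    # becomes binominal, more than two becomes polynominal.
    return "binominal" if len(set(values)) <= 2 else "polynominal"


print(infer_value_type(["23", "-5", "1024768"]))  # integer
print(infer_value_type(["yes", "no", "yes"]))     # binominal
print(infer_value_type(["2014/12/23 17:59"]))     # date_time
```

Date formats are checked before numbers so that a value like 17:59 is not mistaken for text; integers are checked before reals so that whole numbers are not reported as real.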
(120)
Decision Tree, k-NN, SVM, Gradient Boosted Trees
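Of the algorithms named above, k-NN is the easiest to show in a few lines. The sketch below is a minimal, generic k-nearest-neighbours classifier (Euclidean distance, majority vote over the k closest training examples). The function name knn_predict and the toy data are hypothetical; this illustrates the technique and is not RapidMiner's k-NN operator.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbours.

    train: list of (feature_tuple, label) pairs (hypothetical toy format).
    """
    # Rank training examples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Count the labels of the k closest examples and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters labelled "A" and "B".
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_predict(train, (1.1, 1.0)))  # A
print(knn_predict(train, (5.0, 5.1)))  # B
```

If two labels tie in the vote, Counter.most_common returns the one first encountered among the nearest neighbours; choosing an odd k avoids ties for two-class problems.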
Modeling Operators (RapidMiner v7.2)
Predictive (61): Default Model, K-NN, Naive Bayes, Naive Bayes (Kernel), Decision Tree, Decision Tree (Multiway), Decision Tree (Weight-Based), ID3, CHAID, Decision Stump, Random Tree, Random Forest, Gradient Boosted Trees, Rule Induction, Single Rule Induction, Single Rule Induction (Single Attribute), Subgroup Discovery, Tree to Rules, Neural Net, AutoMLP, Perceptron, Deep Learning, Linear Regression, Polynomial Regression, Vector Linear Regression, Local Polynomial Regression, Seemingly Unrelated Regression, Gaussian Process, Relevance Vector Machine, Generalized Linear Model, Logistic Regression (SVM), Logistic Regression (Evolutionary), Logistic Regression, Support Vector Machine, Support Vector Machine (LibSVM), Support Vector Machine (Linear), Support Vector Machine (Evolutionary), Support Vector Machine (PSO), Fast Large Margin, Hyper Hyper, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Regularized Discriminant Analysis, Vote, Polynomial by Binomial Classification, Hierarchical Classification, Classification by Regression, Additive Regression, Relative Regression, Transformed Regression, Bayesian Boosting, Subgroup Discovery (Meta), AdaBoost, Bagging, Stacking, MetaCost, Find Threshold (Meta), Update Model, Group Models, Ungroup Models, Create Formula
Segmentation (13): K-Means, K-Means (Kernel), k-Means (fast), X-Means, K-Medoids, DBSCAN, Expectation Maximization Clustering, Support Vector Clustering, Random Clustering, Agglomerative Clustering, Top Down Clustering, Flatten Clustering, Extract Cluster Prototypes
Associations (6): FP-Growth, Create Association Rules, Apply Association Rules, Generalized Sequential Patterns, Item Sets to Data, Unify Item Sets
Correlations (8): Correlation Matrix, Covariance Matrix, ANOVA Matrix, Grouped ANOVA, Transition Matrix, Transition Graph, Mutual Information Matrix, Rainflow Matrix
Similarities (4): Data to Similarity, Data to Similarity Data, Similarity to Data, Cross Distances
Feature Weights (17): Weight by Information Gain, Weight by Information Gain Ratio, Weight by Rule, Weight by Value Average, Weight by Deviation, Weight by Correlation, Weight by Chi Squared Statistic, Weight by Gini Index, Weight by Tree Importance, Weight by Uncertainty, Weight by Relief, Weight by SVM, Weight by PCA, Weight by Component Model, Weight by User Specification, Data to Weights, Weights to Data
Optimization (20): Optimize Parameters (Grid), Optimize Parameters (Quadratic), Optimize Parameters (Evolutionary), Set Parameters, Clone Parameters, Forward Selection, Backward Elimination, Optimize Selection, Optimize Selection (Brute Force), Optimize Selection (Weight-Guided), Optimize Selection (Evolutionary), Optimize by Generation (Evolutionary Aggregation), Optimize by Generation (GGA), Optimize by Generation (AGA), Optimize by Generation (YAGGA), Optimize by Generation (YAGGA2), Optimize Weights (Forward), Optimize Weights (Backward), Optimize Weights (Evolutionary), Optimize Weights (PSO)
R/Python
400
RapidMiner
10923
/
10923
/
:
60
60 Accuracy: 74.02% +/- 9.84%
6 Accuracy: 85.57% +/- 6.18% (10%)
(1,000)
(100)
(1,000)
(1,000)
(1000100)NG
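The accuracy figures above are given as a mean plus or minus a spread, the usual way cross-validation results are summarised. The sketch below shows how such a line can be produced from per-fold accuracies. It assumes the +/- value is the population standard deviation of the fold accuracies (statistics.pstdev); this document does not state which formula was used, so treat that choice, the function name summarize_cv and the fold values as hypothetical.

```python
import statistics


def summarize_cv(fold_accuracies):
    """Format per-fold accuracies as 'Accuracy: mean% +/- std%'.

    Assumption: the +/- value is the population standard deviation of the
    fold accuracies; the exact formula is not given in this document.
    """
    mean = statistics.mean(fold_accuracies)
    std = statistics.pstdev(fold_accuracies)
    return f"Accuracy: {mean:.2%} +/- {std:.2%}"


# Hypothetical fold accuracies from a 4-fold run (illustrative numbers only).
print(summarize_cv([0.8, 0.9, 0.8, 0.9]))  # Accuracy: 85.00% +/- 5.00%
```

A smaller spread, as in the 6-attribute result above compared with the 60-attribute one, means the model's accuracy varied less from fold to fold.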
RapidMiner/(FreeDL)
Appendix
SNS
RapidMiner
RapidMiner Studio (Named User License), RapidMiner Server / Studio (Instance License), Studio / Studio
RapidMiner Studio (Named User License): Free | Small | Medium | Large
Data rows: 10,000 | 100,000 | 1,000,000 | Unlimited
Logical processors: 1 | 2 | 4 | Unlimited
- | Included | Included | Included
(3) Free (1) Free
(Small, Medium) Radoop

RapidMiner Server (Instance License): Free | Small | Medium | Large
Memory (GB): 2 | 16 | 64 | Unlimited
Logical processors: 1 | 4 | 8 | Unlimited
Web service calls: 1,000/24hrs | Unlimited | Unlimited | Unlimited
- | Included | Included | Included
(3) Free (1) Free
-Pt1(2)Pt2(2)
Pt1 Pt2 Pt1Pt2
() Pt1Pt2
225,000 225,000
(http://www.rapidminer.jp/event)
studio
AB
RapidMiner Server
(Hadoop)
Radoop (Hadoop): Apache Hadoop, Hive, Apache Mahout
Hadoop
RapidMiner Radoop
RapidMiner Studio (Windows/Mac/Linux)
2GHz 2 CPU, GB, GB, 1280×1024
3GHz 4 CPU, 16GB, 100GB
OS (64-bit): Windows 7, Windows 8, Windows 8.1, Windows 10, Linux, Mac OS X 10.8
Java (64-bit): Java 8
RapidMiner Server
2GHz 2 CPU, GB, GB, 1280×1024
3GHz 4 CPU, 32GB, 1TB, 100GB
OS (64-bit): Windows Server 2008 R2, Windows Server 2012, Windows Server 2012 R2, Windows 10, Linux
Java (64-bit): Oracle Java 8
MySQL, Microsoft SQL Server, Oracle, PostgreSQL
Oracle, Microsoft SQL Server, MySQL, PostgreSQL, Teradata, HP Vertica, IBM Netezza
NoSQL: MongoDB, Cassandra, Apache Solr, Splunk (read only)
Dropbox, Amazon S3, Salesforce, Twitter (read only), Mozenda (read only), Zapier (write only)
CSV - Comma Separated Value
MDB/ACCDB - Microsoft Access database
XLS/XLSX - Microsoft Excel spreadsheet (97-2003, 2007-2013)
XML - Extensible Markup Language
ARFF/XRFF - Weka file formats
DBF - dBASE Database File format (read only)
SAV - IBM SPSS file format (read only)
SAS - SAS file format up to v9.2 (read only)
DTA - Stata file format (read only)
QVX - QlikView data eXchange (write only)
Pt(2)
RapidMiner, CRISP-DM, RapidMiner
k-NN
Pt(2)
RapidMiner Free: http://www.rapidminer.jp/download
RapidMiner: http://www.rapidminer.jp/blog
KSK: http://www.ksk-anl.com/event
KSK
www.ksk-anl.com [email protected]
RapidMiner
www.rapidminer.jp