A must-see for anyone who has stumbled with R! An introduction to RapidMiner, the drag-and-drop analytics tool favored by data scientists, and to our analytics services. KSK Analytics Inc. www.ksk-anl.com

clouderaworldtokyo.com/session-download/C3-RapidMiner...

  • R

    RapidMiner

    KSK Analytics Inc

    www.ksk-anl.com

  • KSK

    2

  • KSK

    Copyright KSK Analytics Inc. All rights reserved

    3

    Data Analysis for Everyone !

    Web KSK

  • KSK

    4

    Big Data, AI, DB

    /

    BI, BA

    DB

    Tera/Peta DB, DWH/ETL

  • KSK

    5

    2006
    2007 Pentaho
    2008
    2010 Infobright
    2011 RapidMiner
    2012 Jedox
    2013
    2014 Cloudera, Revolution R, NYSOL
    2015
    2016 TensorFlow
    2017 IoT

  • KSK

    6

    (MBA)PentahoBIRapidMiner

    BIBA 8DB 10 (TensorFlow Hadoop Spark etc)

    RapidMinerEMCEMCDSAMBA

    PentahoBIBI/BA

    25

    OR(2005, 2008, 2015)(2006, 2013)NYSOL

    (TOEIC900over) 3MBA 4

  • 7

    ()

  • 2

    ~

    8

  • 12

    ~

    SCM

    9

  • ETL

    2etc

    10

  • 11

    Spark

    200TByte

  • Japan Partner KSK Analytics KSK Analytics Inc.

    201610KSK

  • Japan Partner KSK Analytics 13

    RapidMiner()

    KDnuggets1 Big Data LandscapeMachine Learning

    http://mattturck.com/2016/02/01/big-data-landscape/

  • Japan Partner KSK Analytics 14

    RapidMiner

    RapidMiner

  • Japan Partner KSK Analytics 15

    RapidMiner

    ()

    ()

    ExcelBI

    IT

    RapidMiner IT

  • Japan Partner KSK Analytics 16

    RapidMiner

    60 employees worldwide

    30,000+ downloads/mo.

    35,000+ active deployments, with 250,000+ users

    500+ customers in over 50 countries

    100+ active developers

    2001 RapidMiner

  • Japan Partner KSK Analytics KSK Analytics Inc. 17

    Advanced Analytics Platform

    https://www.gartner.com/doc/reprints?id=1-2YC9GD6&ct=160209&st=sb

  • Japan Partner KSK Analytics KSK Analytics Inc. 18

    http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html

    N=2,895

    RapidMiner

  • Japan Partner KSK Analytics KSK Analytics Inc. 19

    http://mattturck.com/2016/02/01/big-data-landscape/

  • Japan Partner KSK Analytics KSK Analytics Inc. 20

    RapidMiner

  • Japan Partner KSK Analytics KSK Analytics Inc. 21

    RapidMiner

  • Japan Partner KSK Analytics

    RapidMiner

    10,000

    1,000

    22

  • Japan Partner KSK Analytics

    23

    1

  • Japan Partner KSK Analytics KSK Analytics Inc. 24

  • Japan Partner KSK Analytics KSK Analytics Inc. 25

    Decision Tree

    551166

    RapidMiner

    55

  • Japan Partner KSK Analytics 26

    -

  • Japan Partner KSK Analytics KSK Analytics Inc. 27

    RapidMiner

  • Japan Partner KSK Analytics KSK Analytics Inc. 28

    ()

    SQL Join)

    (If

    Z

    Data

    etc

    integer: 23, -5, 11,024,768
    real: 11.23, -0.0001
    binominal: 2 values (true/false, yes/no)
    polynominal: 3 or more values
    text
    date: 2014/12/23
    time: 17:59
    date_time: 2014/12/23 17:59
    file_path
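The value types listed on this slide (integer, real, binominal, polynominal, date, time, date_time) can be illustrated with a small type-guessing routine. This is only a sketch under stated assumptions, not RapidMiner's own logic: the function name `infer_value_type` is hypothetical, the date and time patterns are taken from the slide's examples (2014/12/23 and 17:59), and the text and file_path types are not distinguished here.

```python
from datetime import datetime


def infer_value_type(values):
    """Guess a RapidMiner-style value type for a column of raw strings.

    Hypothetical helper for illustration; not part of RapidMiner.
    """
    if not values:
        raise ValueError("need at least one value")

    def all_parse(parse):
        # True only if every value parses without a ValueError.
        try:
            for v in values:
                parse(v)
            return True
        except ValueError:
            return False

    # Whole numbers, allowing thousands separators such as 11,024,768.
    if all_parse(lambda v: int(v.replace(",", ""))):
        return "integer"
    if all_parse(float):
        return "real"

    # Assumed formats, based on the slide's examples.
    patterns = (
        ("date_time", "%Y/%m/%d %H:%M"),
        ("date", "%Y/%m/%d"),
        ("time", "%H:%M"),
    )
    for name, fmt in patterns:
        if all_parse(lambda v, fmt=fmt: datetime.strptime(v, fmt)):
            return name

    # Nominal data: at most two distinct values is binominal, more is polynominal.
    return "binominal" if len(set(values)) <= 2 else "polynominal"


print(infer_value_type(["23", "-5", "11,024,768"]))
print(infer_value_type(["yes", "no", "yes"]))
```

The checks run from the most specific type to the most general, so a column of whole numbers is reported as integer rather than real.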

  • Japan Partner KSK Analytics

    (120)()

  • Japan Partner KSK Analytics 34

    Decision Tree, k-NN, SVM, Gradient Boosted Trees

  • Japan Partner KSK Analytics 35

    Modeling Operators (RapidMiner v7.2)

    Predictive (61): Default Model, K-NN, Naive Bayes, Naive Bayes (Kernel), Decision Tree, Decision Tree (Multiway), Decision Tree (Weight-Based), ID3, CHAID, Decision Stump, Random Tree, Random Forest, Gradient Boosted Trees, Rule Induction, Single Rule Induction, Single Rule Induction (Single Attribute), Subgroup Discovery, Tree to Rules, Neural Net, AutoMLP, Perceptron, Deep Learning, Linear Regression, Polynomial Regression, Vector Linear Regression, Local Polynomial Regression, Seemingly Unrelated Regression, Gaussian Process, Relevance Vector Machine, Generalized Linear Model, Logistic Regression (SVM), Logistic Regression (Evolutionary), Logistic Regression, Support Vector Machine, Support Vector Machine (LibSVM), Support Vector Machine (Linear), Support Vector Machine (Evolutionary), Support Vector Machine (PSO), Fast Large Margin, Hyper Hyper, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Regularized Discriminant Analysis, Vote, Polynomial by Binomial Classification, Hierarchical Classification, Classification by Regression, Additive Regression, Relative Regression, Transformed Regression, Bayesian Boosting, Subgroup Discovery (Meta), AdaBoost, Bagging, Stacking, MetaCost, Find Threshold (Meta), Update Model, Group Models, Ungroup Models, Create Formula

    Segmentation (13): K-Means, K-Means (Kernel), k-Means (fast), X-Means, K-Medoids, DBSCAN, Expectation Maximization Clustering, Support Vector Clustering, Random Clustering, Agglomerative Clustering, Top Down Clustering, Flatten Clustering, Extract Cluster Prototypes

    Associations (6): FP-Growth, Create Association Rules, Apply Association Rules, Generalized Sequential Patterns, Item Sets to Data, Unify Item Sets

    Correlations (8): Correlation Matrix, Covariance Matrix, ANOVA Matrix, Grouped ANOVA, Transition Matrix, Transition Graph, Mutual Information Matrix, Rainflow Matrix

    Similarities (4): Data to Similarity, Data to Similarity Data, Similarity to Data, Cross Distances

    Feature Weights (17): Weight by Information Gain, Weight by Information Gain Ratio, Weight by Rule, Weight by Value Average, Weight by Deviation, Weight by Correlation, Weight by Chi Squared Statistic, Weight by Gini Index, Weight by Tree Importance, Weight by Uncertainty, Weight by Relief, Weight by SVM, Weight by PCA, Weight by Component Model, Weight by User Specification, Data to Weights, Weights to Data

    Optimization (20): Optimize Parameters (Grid), Optimize Parameters (Quadratic), Optimize Parameters (Evolutionary), Set Parameters, Clone Parameters, Forward Selection, Backward Elimination, Optimize Selection, Optimize Selection (Brute Force), Optimize Selection (Weight-Guided), Optimize Selection (Evolutionary), Optimize by Generation (Evolutionary Aggregation), Optimize by Generation (GGA), Optimize by Generation (AGA), Optimize by Generation (YAGGA), Optimize by Generation (YAGGA2), Optimize Weights (Forward), Optimize Weights (Backward), Optimize Weights (Evolutionary), Optimize Weights (PSO)

  • Japan Partner KSK Analytics KSK Analytics Inc.

    R/Python

    400

  • Japan Partner KSK Analytics KSK Analytics Inc.

    RapidMiner

  • Japan Partner KSK Analytics KSK Analytics Inc. 38

  • Japan Partner KSK Analytics

    10923

    39

    /

  • Japan Partner KSK Analytics

    10923

    40

    /

    :

  • Japan Partner KSK Analytics

    60

    41

    60 Accuracy: 74.02% +/- 9.84%

    6 Accuracy: 85.57% +/- 6.18% (10%)

    (1,000)

    (100)

    (1,000)

    (1,000)

    (1000100)NG

    ()()
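The accuracy figures on this slide (74.02% +/- 9.84% and 85.57% +/- 6.18%) have the shape of a validation summary: a mean accuracy with a spread over repeated evaluations, such as the folds of a cross-validation. The sketch below shows one way a summary in that form could be computed. It assumes the spread is the sample standard deviation of the per-fold accuracies, which the slide does not state; the function name `summarize_accuracy` and the fold values are hypothetical and are not data from the presentation.

```python
from statistics import mean, stdev


def summarize_accuracy(fold_accuracies):
    """Format per-fold accuracies (fractions in [0, 1]) as 'mean +/- spread'."""
    if len(fold_accuracies) < 2:
        raise ValueError("need at least two folds to compute a spread")
    avg = mean(fold_accuracies)
    # Assumption: the spread is the sample standard deviation across folds.
    spread = stdev(fold_accuracies)
    return f"Accuracy: {avg:.2%} +/- {spread:.2%}"


# Hypothetical fold results, for illustration only.
example_folds = [0.8, 0.9, 0.7, 0.8]
print(summarize_accuracy(example_folds))
```

A smaller spread alongside a higher mean, as in the second figure on the slide, suggests the model is both more accurate and more stable across the evaluated subsets.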

  • Japan Partner KSK Analytics KSK Analytics Inc. 42

    RapidMiner/(FreeDL)

    Appendix

  • Japan Partner KSK Analytics

    KSK Analytics Inc.

  • Japan Partner KSK Analytics

    KSK Analytics Inc.

    SNS

  • Japan Partner KSK Analytics KSK Analytics Inc. 49

    RapidMiner

    []RapidMiner Studio(Named User License) RapidMiner ServerStudio(Instance License) StudioStudio

  • Japan Partner KSK Analytics KSK Analytics Inc. 50

    RapidMiner - Named User License

                 Studio Free   Studio Small   Studio Medium   Studio Large
                 10,000        100,000        1,000,000       Unlimited
                 1             2              4               Unlimited
                 -             Included       Included        Included

    (3) Free (1) Free

    (: Small, Medium) Radoop

    - Instance License

                 Server Free   Server Small   Server Medium   Server Large
    (GB)         2             16             64              Unlimited
                 1             4              8               Unlimited
    Web          1,000/24hrs   Unlimited      Unlimited       Unlimited
                 -             Included       Included        Included

    (3) Free (1) Free

    -Pt1(2)Pt2(2)

    Pt1 Pt2 Pt1Pt2

    () Pt1Pt2

    225,000 225,000

    (http://www.rapidminer.jp/event)

  • Japan Partner KSK Analytics KSK Analytics Inc.

    studio

    AB

    RapidMiner Server

  • Japan Partner KSK Analytics KSK Analytics Inc. 52

    (Hadoop)

    Radoop (Hadoop): Apache Hadoop, Hive, Apache Mahout

    Hadoop

    RapidMiner Radoop

  • Japan Partner KSK Analytics KSK Analytics Inc. 53

    RapidMiner Studio (Windows/Mac/Linux)

    2GHz 2 CPU, GB, GB, 1280x1024

    3GHz 4 CPU, 16GB, 100GB

    OS (64bit): Windows 7, Windows 8, Windows 8.1, Windows 10, Linux, MacOS X 10.8

    Java (64bit): Java 8

  • Japan Partner KSK Analytics KSK Analytics Inc. 54

    RapidMiner Server

    2GHz 2 CPU, GB, GB, 1280x1024

    3GHz 4 CPU, 32GB, 1TB, 100GB

    OS (64bit): Windows Server 2008 R2, Windows Server 2012, Windows Server 2012 R2, Windows 10, Linux

    Java (64bit): Oracle Java 8

    MySQL, Microsoft SQL Server, Oracle, PostgreSQL

  • Japan Partner KSK Analytics KSK Analytics Inc. 55

    Oracle, Microsoft SQL Server, MySQL, PostgreSQL, Teradata, HP Vertica, IBM Netezza

    NoSQL: MongoDB, Cassandra, Apache Solr, Splunk (read only)

    Dropbox, Amazon S3, Salesforce, Twitter (read only), Mozenda (read only), Zapier (write only)

    CSV - Comma Separated Value
    MDB/ACCDB - Microsoft Access database
    XLS/XLSX - Microsoft Excel spreadsheet (97-2003, 2007-2013)
    XML - Extensible Markup Language
    ARFF/XRFF - Weka file formats
    DBF - dBASE Database File format (read only)
    SAV - IBM SPSS file format (read only)
    SAS - SAS file format up to v9.2 (read only)
    DTA - Stata file format (read only)
    QVX - QlikView data eXchange (write only)

    ()

  • Japan Partner KSK Analytics KSK Analytics Inc. 56

    Pt(2)

    RapidMinerCRISP-DMRapidMiner

    k-NN

  • Japan Partner KSK Analytics KSK Analytics Inc. 57

    Pt(2)

  • Japan Partner KSK Analytics KSK Analytics Inc. 58

    RapidMiner Free: http://www.rapidminer.jp/download

    RapidMiner: http://www.rapidminer.jp/blog

    KSK: http://www.ksk-anl.com/event

  • Japan Partner KSK Analytics KSK Analytics Inc.

    KSK

    www.ksk-anl.com

    RapidMiner

    www.rapidminer.jp