
DATA MINING Cloud Decision Platform: CDMS Smart Score II


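DATA MINING Cloud Decision Platform CDMS Smart Score II, based on R.
Professor 謝邦昌 and 劉思喆
  Professor, Department of Statistics and Information Science and Graduate Institute of Applied Statistics, Fu Jen Catholic University
  Visiting Professor, School of Statistics, Renmin University of China & Department of Planning and Statistics, Xiamen University
  Doctoral Supervisor, School of Statistics, Capital University of Economics and Business & School of Statistics, Central University of Finance and Economics
  Visiting Professor, School of Statistics and Management, Shanghai University of Finance and Economics & School of Statistics, Southwestern University of Finance and Economics
  Visiting Professor, School of Statistics, Xi'an University of Finance and Economics & School of Statistics, Tianjin University of Finance and Economics
  Visiting Professor, School of Statistics, Shandong Economic University & Guangxi University of Finance and Economics & Xinjiang University of Finance and Economics
  Honorary Chairman, Chinese Data Mining Society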


Outline
  Background, Motivation & Purpose
  Related Work: R + iSmartScore & Microsoft Cloud Computing for BI
  CDMS Smart Score II
  Experimental Results
  Conclusions

vs. IBM Watson
  [Slides 3-5: the accompanying text and figures did not survive extraction.]

Background
  Google, Amazon, IBM, Microsoft, Yahoo
  1980, 2010, IT

Background
  CDMS Smart Score II
  DATA MINING (Data Mining), (Cloud Intelligence)

Background
  DATA MINING, CRISP-DM

Background
  CRISP-DM, DATA MINING

  Tool                    License
  IBM Intelligent Miner   450
  SAS Enterprise Miner    500
  SPSS Modeler (IBM)      500
  Microsoft SQL Server    100

Motivation
  DATA MINING

Motivation
  DATA MINING, License 100 500
  (Noise), (Null Value), (Wrong Value), (Outlier)

Motivation
  DATA MINING

Motivation
  150, DATA MINING, 2~3, DATA MINING, 88

Motivation
  DATA MINING: 1,000; 600; 400; 3:2
  DATA MINING (Imbalanced Class Distribution Problem)

Motivation
  (Majority Class), (Minority Class): 16%, 84%

Motivation
  DATA MINING, Client-Server, Client, Server

Purpose
  DATA MINING - CDMS Smart Score II, DATA MINING
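
The imbalanced class distribution problem named above can be illustrated with a small sketch. It assumes, for illustration only, 1,000 records split 16% minority and 84% majority; the record count and the always-predict-majority "model" are hypothetical and not taken from the slides. The point is that plain accuracy looks high while the minority class is never found.

```python
# Hypothetical data: 1,000 records, 16% minority class (label 1), 84% majority (label 0).
labels = [1] * 160 + [0] * 840

# A degenerate "model" that always predicts the majority class.
predictions = [0] * len(labels)

# Accuracy: share of records whose prediction matches the label.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the minority class: share of true minority cases the model finds.
true_positives = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
minority_recall = true_positives / sum(labels)

print(f"accuracy: {accuracy:.0%}")
print(f"minority recall: {minority_recall:.0%}")
```

This is why the later slides evaluate models with precision, recall and F-measure rather than accuracy alone.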

DATA MINING
  AppServ: MySQL, PHP, Apache

Cloud Computing
  (Cloud Computing)

CDMS Smart Score II
  DATA MINING, (Browser), DATA MINING, (Client), leeys

CDMS Smart Score II
  SOP
  70%, 30%; F-measure, AUC, Gini

CDMS Smart Score II
  leeys, 6, DATA MINING
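
The slide lists AUC and Gini among the evaluation measures. As a minimal sketch, AUC can be computed as the fraction of (positive, negative) pairs that the model ranks correctly, and the Gini coefficient commonly used in scoring work is 2 * AUC - 1. The scores and labels below are hypothetical, and whether CDMS Smart Score II computes the measures this way is not stated in the slides.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def gini(scores, labels):
    """Gini coefficient derived from AUC."""
    return 2.0 * auc(scores, labels) - 1.0


# Hypothetical model scores and true labels (1 = positive class).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

auc_value = auc(scores, labels)
gini_value = gini(scores, labels)
print(f"AUC = {auc_value:.4f}, Gini = {gini_value:.4f}")
```

The pairwise form is quadratic in the number of records, which is fine for a sketch but not for large datasets.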

CDMS Smart Score II
  (Wizard), DATA MINING, 1, 1~2, leeys, (Small Loan)

CDMS Smart Score II
  leeys, DATA MINING, (Data Cleansing), OK & Next, (Null Value Imputation)

CDMS Smart Score II
  (Report), (Importance), (Null Value), (Outlier)

CDMS Smart Score II
  (Report Sharing)

CDMS Smart Score II
  leeys, (Small Loan), sjyen, (Report Sharing)

CDMS Smart Score II
  Anytime, Anywhere

Related Work: R + iSmartScore

Related Work: R + iSmartScore
  2006, ISmartSoft Inc., R + iSmartScore, DATA MINING, 500

Related Work: R + iSmartScore
  R + iSmartScore, DATA MINING, (6)

Microsoft SQL Server 2008 Cloud Computing

Related Work: Microsoft CC
  2010, DATA MINING (http://www.sqlserverdatamining.com/cloud/)
  SQL Server 2005, DATA MINING, R + iSmartScore

CDMS Smart Score II
  DATA MINING - CDMS Smart Score II

Knowledge Discovery in Database
  Data Warehousing (DB & SSIS), SSAS, SSRS

CDMS Smart Score II
  (Classification), (Prediction), (Clustering), (Association)

System Architecture for Classification
  Upload Dataset / Database Connection
  Member Login
  Flat File / Database
  Member Database
  Attribute Selection
  Partition Dataset
  Target Attribute
  Data Cleansing
  Null Value Imputation
  Data Profiling
  Data Coding
  Derived Attributes
  Data Filtering
  Attribute Importance
  Statistics
  Train Model
  Threshold Optimization
  Test Model
  Reporting
  Single Score
  Batch Score

CDMS Smart Score - Member Login
  http://120.125.85.66/mining/index.php
  CDMS Smart Score II (P.S. test test)

CDMS Smart Score - Upload Dataset
  .csv, Microsoft SQL Server 2008

CDMS Smart Score - Target Attribute

CDMS Smart Score - Attribute Selection

CDMS Smart Score - Partition Dataset
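
A minimal sketch of the partition step, assuming the 70% / 30% training and test proportions mentioned earlier in the deck. The split here is stratified by the target attribute so that both partitions keep the same class ratio; that choice, the function name `stratified_split`, and the sample records are assumptions for illustration, since the slides do not say how the platform partitions data.

```python
import random


def stratified_split(rows, label_of, train_fraction=0.7, seed=0):
    """Split rows into train/test while preserving the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)

    train, test = [], []
    for members in by_class.values():
        members = members[:]          # copy so the caller's list is untouched
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Hypothetical dataset: 1,000 records, 160 of them in the positive class.
rows = [{"id": i, "target": 1 if i < 160 else 0} for i in range(1000)]
train, test = stratified_split(rows, lambda r: r["target"])

print(len(train), len(test))
print(sum(r["target"] for r in train), sum(r["target"] for r in test))
```

Fixing the seed keeps the partition reproducible between runs.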

CDMS Smart Score - Data Cleansing

CDMS Smart Score - Null Value Imputation
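
The slides do not say which imputation rule the platform applies. One common convention, sketched here as an assumption, fills numeric attributes with the mean and categorical attributes with the most frequent value. The column values are hypothetical; the attribute names are borrowed from the Statistics slides.

```python
from statistics import mean, mode


def impute(column):
    """Replace None with the mean (numeric column) or the mode (categorical)."""
    present = [v for v in column if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = mode(present)
    return [fill if v is None else v for v in column]


# Hypothetical columns with missing entries.
num_dependents = [1, None, 2, 3, None]
checking_status = ["<0", "no checking", None, "<0"]

filled_numeric = impute(num_dependents)
filled_categorical = impute(checking_status)
print(filled_numeric)
print(filled_categorical)
```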

CDMS Smart Score - Data Profiling

CDMS Smart Score - Data Coding
  (Arbitrary Assignment), (Discretization)
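
The two coding options named on the slide can be sketched as follows. This reads "Arbitrary Assignment" as mapping each category to an arbitrary integer code and "Discretization" as binning a numeric attribute into intervals; both readings, the equal-width binning rule, and the sample values are assumptions rather than the platform's documented behaviour.

```python
def arbitrary_assignment(values):
    """Map each distinct category to an integer code in order of first appearance."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return [codes[v] for v in values], codes


def equal_width_bins(values, n_bins):
    """Assign each numeric value to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        index = int((v - lo) / width) if width > 0 else 0
        bins.append(min(index, n_bins - 1))   # the maximum falls in the last bin
    return bins


# Hypothetical attribute values.
coded, mapping = arbitrary_assignment(["own", "rent", "own", "free"])
binned = equal_width_bins([6, 12, 24, 36, 48], n_bins=3)
print(coded, mapping)
print(binned)
```

Integer codes assigned this way carry no order, so they suit tree models better than models that treat the code as a magnitude.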

CDMS Smart Score - Derived Attributes

CDMS Smart Score - Data Filtering

CDMS Smart Score - Attribute Importance

CDMS Smart Score - Statistics
  checking_status

CDMS Smart Score - Statistics
  num_dependents

CDMS Smart Score - Train Model (NN Training Result)
  4

CDMS Smart Score - Train Model (LR Training Result)

CDMS Smart Score - Train Model (DT Training Result)

CDMS Smart Score - Threshold Optimization
  F-measure, Benefit

CDMS Smart Score - Threshold Optimization (F-Measure)
  F-measure, 4
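
Threshold optimization by F-measure can be sketched as a search over candidate cut-offs: each record scoring at or above the threshold is predicted positive, and the threshold with the highest F-measure wins. The scores, labels and candidate grid below are hypothetical, and the platform's actual search procedure is not described in the slides.

```python
def f_measure_at(scores, labels, threshold):
    """F-measure when records scoring >= threshold are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest F-measure."""
    return max(candidates, key=lambda t: f_measure_at(scores, labels, t))


# Hypothetical model scores and true labels (1 = positive class).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
candidates = [0.25, 0.5, 0.75]

chosen = best_threshold(scores, labels, candidates)
print(chosen, f_measure_at(scores, labels, chosen))
```

In practice the threshold would be tuned on the training or validation partition and then checked on the test partition.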

CDMS Smart Score - Confusion Matrix
  Type I Error, Type II Error

CDMS Smart Score - Confusion Matrix
  F-, F-
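
A short sketch of the confusion matrix and the two error types. It uses the usual statistical convention, assumed here because the slide text did not survive: a Type I error is a false positive and a Type II error is a false negative. The actual and predicted labels are hypothetical.

```python
def confusion_matrix(actual, predicted):
    """Count the four outcomes for binary labels (1 = positive)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1      # Type I error
        elif a == 1 and p == 0:
            counts["FN"] += 1      # Type II error
        else:
            counts["TN"] += 1
    return counts


# Hypothetical labels for ten records.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

cm = confusion_matrix(actual, predicted)
type_i_rate = cm["FP"] / (cm["FP"] + cm["TN"])    # false positive rate
type_ii_rate = cm["FN"] / (cm["FN"] + cm["TP"])   # false negative rate
print(cm, type_i_rate, type_ii_rate)
```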

CDMS Smart Score - Lift Chart
  Ideal Area, Model Area, Score =
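
The quantities behind a lift chart can be sketched as follows: records are ranked by model score, the cumulative share of positives captured is tracked, and lift at a given depth compares the positive rate in the top slice with the overall rate. The data are hypothetical, and this sketch does not attempt to reproduce the slide's "Score =" formula, which was not recoverable.

```python
def cumulative_gains(scores, labels):
    """Points (share of records contacted, share of positives captured)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    points, captured = [], 0
    for i, (_, y) in enumerate(ranked, start=1):
        captured += y
        points.append((i / len(ranked), captured / total_positives))
    return points


def lift_at(scores, labels, depth):
    """Positive rate in the top `depth` share of records over the base rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = max(1, round(len(ranked) * depth))
    top_rate = sum(y for _, y in ranked[:n]) / n
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical scores and labels for ten records (three positives).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

gains = cumulative_gains(scores, labels)
lift_top_30 = lift_at(scores, labels, 0.3)
print(gains[2], lift_top_30)
```

A lift above 1 at a given depth means the model concentrates positives in its top-ranked records better than random selection would.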

CDMS Smart Score - Precision vs. Recall
  (Precision), (Recall)

CDMS Smart Score - Threshold Optimization (Benefit)
  Benefit
  Confusion Matrix, Model Benefit, Lift Chart, Precision vs. Recall, Profit Chart
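
Benefit-based threshold optimization replaces the F-measure with a monetary payoff. In this sketch each true positive earns a gain and each false positive incurs a cost, and the threshold with the highest total benefit is selected. The payoff values, scores and candidate thresholds are all hypothetical; the slides do not give the platform's benefit definition.

```python
def total_benefit(scores, labels, threshold, gain_tp, cost_fp):
    """Sum the payoffs of every record predicted positive at this threshold."""
    benefit = 0.0
    for s, y in zip(scores, labels):
        if s >= threshold:
            benefit += gain_tp if y == 1 else -cost_fp
    return benefit


def best_threshold_by_benefit(scores, labels, candidates, gain_tp, cost_fp):
    """Return the candidate threshold that maximizes total benefit."""
    return max(
        candidates,
        key=lambda t: total_benefit(scores, labels, t, gain_tp, cost_fp),
    )


# Hypothetical scores, labels and payoffs.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
candidates = [0.25, 0.5, 0.75]
GAIN_TP, COST_FP = 100.0, 150.0

chosen = best_threshold_by_benefit(scores, labels, candidates, GAIN_TP, COST_FP)
print(chosen, total_benefit(scores, labels, chosen, GAIN_TP, COST_FP))
```

When false positives are expensive, as in this example, the benefit criterion tends to favour a stricter threshold.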

CDMS Smart Score - Model Benefit
  Model Benefit, Ideal Benefit, Model Benefit Ratio =

CDMS Smart Score - Profit Chart
  Neural Network, 30%

CDMS Smart Score - Profit Chart
  Neural Network, Naive Bayes, 30%

CDMS Smart Score - Profit Chart
  NB, NN, LR, 11.05, DT

CDMS Smart Score - Test Model

CDMS Smart Score - Single Score
  78.384%

CDMS Smart Score - Deployment
  http://120.125.85.122/SScore.aspx?id=AguaBOVw1z

CDMS Smart Score - Batch Score
  duration, 1 (good), 1 (bad), Predicted Results
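
Single and batch scoring can be sketched with a logistic regression model: one record is turned into a probability, and batch scoring applies the same function to every record and attaches a predicted class. The coefficient, intercept, threshold and input records are hypothetical, and the "good" / "bad" labelling is an assumption based on the labels visible on the slide.

```python
import math


def single_score(record, weights, bias):
    """Probability of the 'bad' class from a logistic regression model."""
    z = bias + sum(weights[name] * record[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


def batch_score(records, weights, bias, threshold=0.5):
    """Score every record and attach a predicted class."""
    results = []
    for record in records:
        p = single_score(record, weights, bias)
        results.append({"score": p, "predicted": "bad" if p >= threshold else "good"})
    return results


# Hypothetical model: a single attribute, `duration`, with made-up coefficients.
WEIGHTS = {"duration": 0.05}
BIAS = -1.5

records = [{"duration": 6}, {"duration": 48}]
results = batch_score(records, WEIGHTS, BIAS)
for record, result in zip(records, results):
    print(record, f"{result['score']:.3%}", result["predicted"])
```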

CDMS Smart Score - Reporting

CDMS Smart Score vs. SPSS Clementine: Car Insurance Dataset

Clementine            Precision   Recall   F-measure
Decision Tree         55.56%      6.33%    11.36%
Neural Network        N/A         0.00%    N/A

CDMS Smart Score      Precision   Recall   F-measure
Naive Bayesian        33.86%      81.01%   47.76%
Neural Network        34.11%      55.70%   42.31%
Logistic Regression   35.63%      78.48%   49.01%

... SPSS Clementine!

CDMS Smart Score vs. SPSS Clementine: Car Insurance Dataset, 5%

Clementine            Precision   Recall   F-measure
Decision Tree         62.50%      6.33%    11.49%
Neural Network        39.29%      13.92%   20.56%

CDMS Smart Score      Precision   Recall   F-measure
Naive Bayesian        36.84%      70.89%   48.49%
Neural Network        36.08%      72.15%   48.10%
Logistic Regression   35.71%      75.95%   48.58%

5% ... SPSS Clementine!

CDMS Smart Score vs. Intelligent Miner: Card Application Dataset

Intelligent Miner     Precision   Recall   F-measure
Decision Tree         50.00%      55.56%   52.63%
Neural Network        27.27%      83.33%   41.00%

CDMS Smart Score      Precision   Recall   F-measure
Naive Bayesian        37.31%      69.44%   48.54%
Neural Network        35.21%      69.44%   46.73%
Logistic Regression   30.88%      58.33%   40.39%

... IBM Intelligent Miner!

CDMS Smart Score vs. Intelligent Miner: Card Application Dataset, 5%

Intelligent Miner     Precision   Recall   F-measure
Decision Tree         36%         42%      38%
Neural Network        35%         64%      45%

CDMS Smart Score      Precision   Recall   F-measure
Naive Bayesian        32.47%      69.44%   44.25%
Neural Network        36.07%      61.11%   45.36%
Logistic Regression   35.48%      61.11%   44.90%

5% ... IBM Intelligent Miner!

CDMS Smart Score vs. SQL Server 2005: Small Loan Dataset

SQL Server 2005       Precision   Recall   F-measure
Decision Tree         20.00%      6.45%    9.76%
Neural Network        N/A         0.00%    N/A

CDMS Smart Score      Precision   Recall   F-measure
Naive Bayesian        34.29%      38.71%   36.36%
Neural Network        32.65%      51.61%   40.00%
Logistic Regression   20.00%      70.97%   31.21%

... Microsoft SQL Server 2005!

CDMS Smart Score vs. SQL Server 2005: Small Loan Dataset, 5%

SQL Server 2005       Precision   Recall   F-measure
Decision Tree         N/A         0.00%    N/A
Neural Network        N/A         0.00%    N/A

CDMS Smart Score      Precision   Recall   F-measure
Naive Bayesian        24.14%      45.16%   31.46%
Neural Network        28.57%      45.16%   35.00%
Logistic Regression   16.26%      64.52%   25.97%

5% ... Microsoft SQL Server 2005!
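
Every results table reports precision, recall and F-measure. As a quick check, the F-measure is the harmonic mean of precision and recall, F = 2PR / (P + R), and the sketch below recomputes it for a few rows taken from the tables. Because the published precision and recall values are already rounded, the recomputed figures can differ from the tables in the last digit. The function name `f_measure` and the row labels are introduced here for illustration.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) rows copied from the tables.
reported_rows = {
    "Clementine Decision Tree, Car Insurance": (0.5556, 0.0633, 0.1136),
    "CDMS Logistic Regression, Car Insurance": (0.3563, 0.7848, 0.4901),
    "CDMS Neural Network, Small Loan": (0.3265, 0.5161, 0.4000),
}

recomputed = {
    name: f_measure(p, r) for name, (p, r, _) in reported_rows.items()
}
for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.2%}, reported {reported_rows[name][2]:.2%}")
```

A row with N/A precision corresponds to a model that predicted no positives at all, so its recall is 0.00% and its F-measure is undefined.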
