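Data Mining Cloud Decision Platform CDMS Smart Score II: Based on R

Prof. 謝邦昌 and 劉思喆

Professor, Department of Statistics and Information Science and Graduate Institute of Applied Statistics, Fu Jen Catholic University
Visiting Professor, School of Statistics, Renmin University of China, and Department of Planning and Statistics, Xiamen University
Doctoral Supervisor, School of Statistics, Capital University of Economics and Business, and School of Statistics, Central University of Finance and Economics
Visiting Professor, School of Statistics and Management, Shanghai University of Finance and Economics, and School of Statistics, Southwestern University of Finance and Economics
Visiting Professor, School of Statistics, Xi'an University of Finance and Economics, and School of Statistics, Tianjin University of Finance and Economics
Visiting Professor, School of Statistics, Shandong Economic University; Guangxi University of Finance and Economics; and Xinjiang University of Finance and Economics
Honorary Chairman, Chinese Data Mining Association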
Outline
- Background, Motivation & Purpose
- Related Work: R + iSmartScore & Microsoft Cloud Computing for BI
- CDMS Smart Score II
- Experimental Results
- Conclusions
[Slides: IBM Watson]
Background: Google, Amazon, IBM, Microsoft, Yahoo; 1980, 2010; IT
Background: CDMS Smart Score II; data mining; cloud intelligence
Background: data mining; CRISP-DM
Background: CRISP-DM; data mining software
- IBM Intelligent Miner: License 450
- SAS Enterprise Miner: License 500
- SPSS Modeler (IBM): License 500
- Microsoft SQL Server: License 100
Motivation: data mining
Motivation: data mining; License 100 to 500; noise, null values, wrong values and outliers
Motivation: data mining
Motivation: data mining (150; 2~3; 88)
Motivation: the imbalanced class distribution problem in data mining (1,000 = 600 + 400, a 3:2 ratio)
Motivation: majority class vs. minority class (84% vs. 16%)
Motivation: client-server data mining
Purpose: data mining with CDMS Smart Score II
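The class-imbalance point can be made concrete with a small sketch. Assuming a binary target where the minority class is 16% of the records (the hypothetical data below is invented to match that split), a trivial model that always predicts the majority class scores 84% accuracy while catching none of the minority cases, which is one reason measures such as F-measure, AUC and Gini are preferred over plain accuracy on such data.

```python
from collections import Counter

def class_distribution(labels):
    """Return each class's share of the labels as a fraction."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

def majority_baseline(labels, positive=1):
    """Accuracy and recall of a model that always predicts the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    predictions = [majority] * len(labels)
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    positives = sum(1 for y in labels if y == positive)
    hits = sum(1 for p, y in zip(predictions, labels) if p == y == positive)
    recall = hits / positives if positives else 0.0
    return accuracy, recall

# Hypothetical data: 1,000 records, 160 minority (1) and 840 majority (0).
labels = [1] * 160 + [0] * 840
print(class_distribution(labels))  # {1: 0.16, 0: 0.84}
print(majority_baseline(labels))   # (0.84, 0.0)
```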
Data mining
AppServ: MySQL, PHP, Apache
Cloud computing
CDMS Smart Score II
CDMS Smart Score II: data mining via the browser; data mining on the client; user leeys
SOP (11, 12, 13): 70% / 30%; F-measure, AUC, Gini
CDMS Smart Score II: user leeys; 6; data mining
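The SOP lists AUC and Gini among its evaluation measures. As a minimal sketch (not the platform's own code), AUC can be computed as the fraction of (positive, negative) pairs in which the positive record gets the higher score, with ties counting half; the Gini coefficient common in credit scoring is then Gini = 2 × AUC − 1. The scores and labels below are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def gini(scores, labels):
    """Gini coefficient derived from AUC: 2 * AUC - 1."""
    return 2.0 * auc(scores, labels) - 1.0

scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
print(auc(scores, labels))   # 0.8333333333333334 (5 of 6 pairs ranked correctly)
print(gini(scores, labels))
```

The pairwise loop is quadratic, which is fine for illustration; a rank-based formula scales better on large test sets.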
CDMS Smart Score II: a wizard-based data mining interface (1; 1~2); user leeys, Small Loan dataset
CDMS Smart Score II: user leeys; Data Cleansing; OK & Next; Null Value Imputation
CDMS Smart Score II: Report (Importance; Null Value; Outlier)
CDMS Smart Score II: Report Sharing
CDMS Smart Score II
CDMS Smart Score II: Report Sharing between users leeys and sjyen (Small Loan)
CDMS Smart Score II: anytime, anywhere
Related Work: R + iSmartScore
Related Work: R + iSmartScore (ISmartSoft Inc., 2006); data mining; 500
Related Work: R + iSmartScore; data mining (6)
Microsoft SQL Server 2008 Cloud Computing
Related Work: Microsoft CC (2010): data mining in the cloud (http://www.sqlserverdatamining.com/cloud/); SQL Server 2005 data mining; R + iSmartScore
CDMS Smart Score II: data mining
Knowledge Discovery in Databases: Data Warehousing (DB & SSIS), SSAS, SSRS
CDMS Smart Score II: Classification, Prediction, Clustering, Association
CDMS Smart Score II
System Architecture for Classification
Diagram components: Upload Dataset / Database Connection, Member Login, Flat File / Database, Member Database, Attribute Selection, Partition Dataset, Target Attribute, Data Cleansing, Null Value Imputation, Data Profiling, Data Coding, Derived Attributes, Data Filtering, Attribute Importance, Statistics, Train Model, Threshold Optimization, Test Model, Reporting, Single Score, Batch Score
CDMS Smart Score - Member Login: http://120.125.85.66/mining/index.php
CDMS Smart Score II (P.S. test test)
CDMS Smart Score - Upload Dataset: .csv, Microsoft SQL Server 2008
CDMS Smart Score - Target Attribute
CDMS Smart Score - Attribute Selection
CDMS Smart Score - Partition Dataset
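The SOP mentions a 70% / 30% split, which presumably divides the records into training and testing sets. A minimal sketch of such a random split, assuming a fixed seed for reproducibility; whether the platform stratifies by the target attribute is not stated, so this version does not.

```python
import random

def partition(records, train_fraction=0.7, seed=42):
    """Randomly split records into (train, test) with the given training fraction."""
    shuffled = list(records)              # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train, test = partition(range(1000))
print(len(train), len(test))  # 700 300
```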
CDMS Smart Score - Data Cleansing
CDMS Smart Score - Null Value Imputation
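The deck does not say which imputation rule the platform applies. As a hedged sketch, one common choice is mean imputation for numeric attributes and mode (most frequent value) imputation for categorical ones; here `None` is assumed to mark a null value, and the sample values are hypothetical.

```python
from collections import Counter
from statistics import mean

def impute(column):
    """Fill None values: mean for numeric columns, most frequent value otherwise."""
    present = [v for v in column if v is not None]
    if not present:
        return list(column)  # nothing to learn from; leave the column unchanged
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in column]

print(impute([1000, None, 3000]))           # the gap becomes the mean, 2000
print(impute(["A11", None, "A11", "A12"]))  # ['A11', 'A11', 'A11', 'A12']
```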
CDMS Smart Score - Data Profiling
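Data profiling summarizes each attribute before modeling, and the report slides mention null values and outliers. A minimal sketch for one numeric column, assuming `None` marks a null and using the 1.5 × IQR rule to flag outliers (an assumption; the platform's outlier rule is not given). The loan durations are hypothetical.

```python
from statistics import mean, quantiles

def profile(column):
    """Summarize a numeric column: missing count, min, max, mean, and IQR outliers."""
    present = [v for v in column if v is not None]
    q1, _, q3 = quantiles(present, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "missing": len(column) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "outliers": [v for v in present if v < low or v > high],
    }

durations = [12, 18, 24, 6, None, 30, 24, 15, 120]  # hypothetical durations (months)
print(profile(durations))
```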
CDMS Smart Score - Data Coding: Arbitrary Assignment, Discretization
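The two coding methods named on the slide can be sketched as follows, under stated assumptions: arbitrary assignment gives each distinct category an integer code (here, in order of first appearance), and discretization is shown as equal-width binning into `k` bins, one common variant; the platform may bin differently. The category and age values are hypothetical.

```python
def arbitrary_assignment(values):
    """Map each distinct category to an integer code in order of first appearance."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return [codes[v] for v in values], codes

def equal_width_bins(values, k):
    """Discretize numeric values into k equal-width bins labeled 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # guard against all-equal values
    return [min(int((v - lo) / width), k - 1) for v in values]

coded, mapping = arbitrary_assignment(["rent", "own", "free", "own"])
print(coded, mapping)                              # [0, 1, 2, 1] {'rent': 0, 'own': 1, 'free': 2}
print(equal_width_bins([19, 25, 33, 47, 75], 3))   # [0, 0, 0, 1, 2]
```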
CDMS Smart Score - Derived Attributes
CDMS Smart Score - Data Filtering
CDMS Smart Score - Attribute Importance
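The importance measure behind this slide is not stated. As one hypothetical illustration, information gain ranks a categorical attribute by how much it reduces the entropy of the target attribute; the attribute values and good/bad labels below are invented.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(attribute, target):
    """Reduction in target entropy after splitting on a categorical attribute."""
    groups = defaultdict(list)
    for a, y in zip(attribute, target):
        groups[a].append(y)
    remainder = sum(len(g) / len(target) * entropy(g) for g in groups.values())
    return entropy(target) - remainder

target = ["good", "good", "bad", "bad"]
print(information_gain(["A", "A", "B", "B"], target))  # 1.0 (perfect split)
print(information_gain(["A", "B", "A", "B"], target))  # 0.0 (no information)
```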
CDMS Smart Score - Statistics: checking_status
CDMS Smart Score - Statistics: num_dependents
CDMS Smart Score - Train Model (NN Training Result): 4
CDMS Smart Score - Train Model (LR Training Result)
CDMS Smart Score - Train Model (DT Training Result)
CDMS Smart Score - Threshold Optimization: F-measure, Benefit
CDMS Smart Score - Threshold Optimization (F-measure): F-measure; 4
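F-measure threshold optimization can be sketched as a sweep: try each candidate cutoff on the model's scores and keep the one with the highest F-measure (the harmonic mean of precision and recall). This assumes a record is predicted positive when its score is at or above the cutoff and uses each distinct score as a candidate; both are illustrative choices, and the data is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def best_threshold_by_f(scores, labels):
    """Try each distinct score as the cutoff (score >= t -> positive); keep the best F."""
    best_t, best_f = None, -1.0
    positives = sum(labels)
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / positives if positives else 0.0
        f = f_measure(precision, recall)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

scores = [0.95, 0.85, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(best_threshold_by_f(scores, labels))  # (0.6, 0.75)
```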
CDMS Smart Score - Confusion Matrix: Type I Error, Type II Error
CDMS Smart Score - Confusion Matrix: F-measure
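A confusion matrix counts the four outcomes of a binary prediction. Following the usual statistical convention, with 1 as the positive class, a Type I error is a false positive and a Type II error is a false negative; which class the platform treats as positive is not stated, and the labels below are hypothetical.

```python
def confusion_matrix(predicted, actual):
    """Count TP, FP, FN, TN for binary labels (1 = positive)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

cm = confusion_matrix([1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 1, 0])
type_i = cm["FP"]   # Type I error: predicted positive, actually negative
type_ii = cm["FN"]  # Type II error: predicted negative, actually positive
precision = cm["TP"] / (cm["TP"] + cm["FP"])
recall = cm["TP"] / (cm["TP"] + cm["FN"])
print(cm, type_i, type_ii)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2} 1 1
```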
CDMS Smart Score - Lift Chart: Ideal Area, Model Area, Score =
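The slide's Score formula is cut off, so the following only sketches the ingredients it names. A cumulative gains (lift) curve plots the fraction of positives captured against the fraction of records targeted in descending score order; the model area and the ideal area (all positives ranked first) come from the trapezoidal rule. Combining them as (model − 0.5) / (ideal − 0.5), the accuracy ratio, is an assumption here, not the slide's formula. Scores and labels are hypothetical.

```python
def gains_curve(scores, labels):
    """Cumulative gains points (fraction targeted, fraction of positives captured)."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)]
    total_pos = sum(ranked)
    points, captured = [(0.0, 0.0)], 0
    for i, y in enumerate(ranked, start=1):
        captured += y
        points.append((i / len(ranked), captured / total_pos))
    return points

def area(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
model_area = area(gains_curve(scores, labels))
ideal_area = area(gains_curve(labels, labels))  # perfect ranking: positives first
# Hypothetical summary: share of the ideal improvement over random (area 0.5) achieved.
ratio = (model_area - 0.5) / (ideal_area - 0.5)
print(round(model_area, 3), round(ideal_area, 3), round(ratio, 3))  # 0.675 0.8 0.583
```

For binary targets this accuracy ratio equals the Gini coefficient, 2 × AUC − 1.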
CDMS Smart Score - Precision vs. Recall
CDMS Smart Score - Threshold Optimization (Benefit)
Outputs: Confusion Matrix, Model Benefit, Lift Chart, Precision vs. Recall, Profit Chart
CDMS Smart Score - Model Benefit: Model Benefit, Ideal Benefit, Model Benefit Ratio =
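Benefit-based threshold optimization can be sketched with a hypothetical payoff: each true positive earns `GAIN` and each false positive costs `COST` (both values invented). The sketch picks the cutoff with the highest payoff (Model Benefit), takes Ideal Benefit as the payoff of a perfect classifier, and divides the two; because the slide's Model Benefit Ratio formula is truncated, that ratio is an assumption.

```python
def benefit(scores, labels, threshold, gain, cost):
    """Payoff of flagging score >= threshold: gain per TP minus cost per FP."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp * gain - fp * cost

def best_threshold_by_benefit(scores, labels, gain, cost):
    """Return (threshold, benefit) maximizing the payoff over the observed scores."""
    candidates = ((t, benefit(scores, labels, t, gain, cost)) for t in sorted(set(scores)))
    return max(candidates, key=lambda pair: pair[1])

scores = [0.95, 0.85, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
GAIN, COST = 100, 40  # hypothetical gain per true positive, cost per false positive
t, model_benefit = best_threshold_by_benefit(scores, labels, GAIN, COST)
ideal_benefit = sum(labels) * GAIN  # perfect model: every positive caught, no false alarms
print(t, model_benefit, ideal_benefit, model_benefit / ideal_benefit)  # 0.2 280 400 0.7
```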
CDMS Smart Score - Profit Chart: Neural Network, 30%
CDMS Smart Score - Profit Chart: Neural Network, Naive Bayes, 30%
CDMS Smart Score - Profit Chart: NB, NN, LR; 11.05; DT
CDMS Smart Score - Test Model
CDMS Smart Score - Single Score: 78.384%
CDMS Smart Score - Deployment
http://120.125.85.122/SScore.aspx?id=AguaBOVw1z
CDMS Smart Score - Batch Score
duration; 1 (good); 1 (bad); Predicted Results
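Batch scoring applies a trained model to every record of a new file and attaches a score and a predicted class. As a hedged sketch, assume a logistic-regression model whose score is the probability of the "bad" class; the intercept, the weight on `duration` and the 0.5 cutoff are invented for illustration.

```python
from math import exp

def logistic_score(record, intercept, weights):
    """P(class = 'bad') under a logistic model: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(weights[name] * record[name] for name in weights)
    return 1.0 / (1.0 + exp(-z))

def batch_score(records, intercept, weights, threshold=0.5):
    """Score each record and attach a predicted label using the cutoff."""
    scored = []
    for record in records:
        p = logistic_score(record, intercept, weights)
        label = "bad" if p >= threshold else "good"
        scored.append({**record, "score": p, "predicted": label})
    return scored

# Hypothetical model: intercept and weight on `duration` (months) are invented.
INTERCEPT, WEIGHTS = -3.0, {"duration": 0.1}
batch = [{"id": 1, "duration": 12}, {"id": 2, "duration": 48}]
for row in batch_score(batch, INTERCEPT, WEIGHTS):
    print(row["id"], round(row["score"], 3), row["predicted"])
# prints: 1 0.142 good
#         2 0.858 bad
```

Single scoring is the same computation on one record, e.g. `logistic_score({"duration": 24}, INTERCEPT, WEIGHTS)`.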
CDMS Smart Score - Reporting
CDMS Smart Score vs. SPSS Clementine: Car Insurance Dataset

SPSS Clementine      Precision  Recall  F-measure
Decision Tree        55.56%     6.33%   11.36%
Neural Network       N/A        0.00%   N/A

CDMS Smart Score     Precision  Recall  F-measure
Naive Bayesian       33.86%     81.01%  47.76%
Neural Network       34.11%     55.70%  42.31%
Logistic Regression  35.63%     78.48%  49.01%

SPSS Clementine!

CDMS Smart Score vs. SPSS Clementine: Car Insurance Dataset, 5%

SPSS Clementine      Precision  Recall  F-measure
Decision Tree        62.50%     6.33%   11.49%
Neural Network       39.29%     13.92%  20.56%

CDMS Smart Score     Precision  Recall  F-measure
Naive Bayesian       36.84%     70.89%  48.49%
Neural Network       36.08%     72.15%  48.10%
Logistic Regression  35.71%     75.95%  48.58%

5% SPSS Clementine!

CDMS Smart Score vs. Intelligent Miner: Card Application Dataset

Intelligent Miner    Precision  Recall  F-measure
Decision Tree        50.00%     55.56%  52.63%
Neural Network       27.27%     83.33%  41.00%

CDMS Smart Score     Precision  Recall  F-measure
Naive Bayesian       37.31%     69.44%  48.54%
Neural Network       35.21%     69.44%  46.73%
Logistic Regression  30.88%     58.33%  40.39%

IBM Intelligent Miner!

CDMS Smart Score vs. Intelligent Miner: Card Application Dataset, 5%

Intelligent Miner    Precision  Recall  F-measure
Decision Tree        36%        42%     38%
Neural Network       35%        64%     45%

CDMS Smart Score     Precision  Recall  F-measure
Naive Bayesian       32.47%     69.44%  44.25%
Neural Network       36.07%     61.11%  45.36%
Logistic Regression  35.48%     61.11%  44.90%

5% IBM Intelligent Miner!

CDMS Smart Score vs. SQL Server 2005: Small Loan Dataset

SQL Server 2005      Precision  Recall  F-measure
Decision Tree        20.00%     6.45%   9.76%
Neural Network       N/A        0.00%   N/A

CDMS Smart Score     Precision  Recall  F-measure
Naive Bayesian       34.29%     38.71%  36.36%
Neural Network       32.65%     51.61%  40.00%
Logistic Regression  20.00%     70.97%  31.21%

Microsoft SQL Server 2005!

CDMS Smart Score vs. SQL Server 2005: Small Loan Dataset, 5%

SQL Server 2005      Precision  Recall  F-measure
Decision Tree        N/A        0.00%   N/A
Neural Network       N/A        0.00%   N/A

CDMS Smart Score     Precision  Recall  F-measure
Naive Bayesian       24.14%     45.16%  31.46%
Neural Network       28.57%     45.16%  35.00%
Logistic Regression  16.26%     64.52%  25.97%

5% Microsoft SQL Server 2005!
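F-measure in these comparisons is the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below recomputes it for a few rows from their reported precision and recall; because those inputs are already rounded to two decimals, the recomputed value can differ from the reported one in the last digit.

```python
def f_measure(precision, recall):
    """F-measure (F1): harmonic mean of precision and recall, both in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (row, reported precision %, recall %, F-measure %) taken from the comparison tables
rows = [
    ("Clementine DT, Car Insurance", 55.56, 6.33, 11.36),
    ("CDMS LR, Car Insurance", 35.63, 78.48, 49.01),
    ("CDMS NB, Card Application", 37.31, 69.44, 48.54),
    ("SQL Server DT, Small Loan", 20.00, 6.45, 9.76),
]
for name, p, r, reported in rows:
    print(f"{name}: recomputed {f_measure(p, r):.2f}% vs reported {reported:.2f}%")
```

Applying the same check to the Intelligent Miner neural network row (27.27% precision, 83.33% recall) gives about 41.1%, not the 41.00% shown in the table.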