
Welcome to the Microsoft SQL Data Mining / Data Warehouse Technical Seminar



Page 1

Welcome to the Microsoft SQL Data Mining / Data Warehouse Technical Seminar

Page 2

Today's Agenda

• Overview of Microsoft SQL Data Mining Technology: 左洪, Microsoft
• Data Warehousing in Telecom: 贝志城, 明天高科
• Data Mining in CRM: 王立军, 中圣公司
• 灵通 IT Service Maintenance Management Service System: 邹雄文, 广州灵通

Page 3

Introduction to Data Mining with SQL Server 2000

左洪
Senior Product Marketing Manager
Microsoft (China) Co., Ltd.

Page 4

Agenda

• What is Data Mining
• The Data Mining Market
• OLE DB for Data Mining
• Overview of the Data Mining Features in SQL Server 2000
• Q&A

Page 5

What Is Data Mining?

Page 6

What is DM?

• A process of data exploration and analysis using automatic or semi-automatic means
  • "Exploring data": scanning samples of known facts about "cases"
  • "Knowledge": clusters, rules, decision trees, equations, association rules…
• Once the "knowledge" is extracted, it:
  • Can be browsed
    • Provides very useful insight into the behavior of the cases
  • Can be used to predict values of other cases
    • Can serve as a key element in closed-loop analysis

Page 7

What drives high school students to attend college?

Page 8

The deciding factors for high school students to attend college are…

All Students: Attend College 55% Yes, 45% No
  IQ?
  • IQ = High: Attend College 79% Yes, 11% No
      Wealth?
      • Wealth = True: Attend College 94% Yes, 6% No
      • Wealth = False: Attend College 69% Yes, 21% No
  • IQ = Low: Attend College 45% Yes, 55% No
      ParentsEncourage?
      • ParentsEncourage = Yes: Attend College 70% Yes, 30% No
      • ParentsEncourage = No: Attend College 31% Yes, 69% No
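The tree on this slide can be read as a lookup procedure: start at the root, follow the branch that matches the student, and report the percentage at the node you end on. A minimal Python sketch follows. The nested-dictionary layout, the name `predict_yes_percent`, and the placement of the Wealth split under IQ = High and the ParentsEncourage split under IQ = Low are assumptions made for illustration; the percentages are copied from the slide.

```python
# Hypothetical encoding of the slide's decision tree.
# "yes" is the "Attend College: Yes" percentage shown at each node.
TREE = {
    "yes": 55,                      # All Students
    "split": "IQ",
    "children": {
        "High": {
            "yes": 79,
            "split": "Wealth",
            "children": {True: {"yes": 94}, False: {"yes": 69}},
        },
        "Low": {
            "yes": 45,
            "split": "ParentsEncourage",
            "children": {"Yes": {"yes": 70}, "No": {"yes": 31}},
        },
    },
}


def predict_yes_percent(node, student):
    """Walk from the root to a leaf and return the leaf's 'Yes' percentage."""
    while "split" in node:
        node = node["children"][student[node["split"]]]
    return node["yes"]


wealthy_high_iq = predict_yes_percent(TREE, {"IQ": "High", "Wealth": True})
discouraged_low_iq = predict_yes_percent(TREE, {"IQ": "Low", "ParentsEncourage": "No"})
```

Each student only needs the attributes that lie on their own path, which is why the second lookup works without a Wealth value.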

Page 9

Business-Oriented DM Problems

• Targeted ads: "What banner should I display to this visitor?"
• Cross-sells: "What other products is this customer likely to buy?"
• Fraud detection: "Is this insurance claim a fraud?"
• Churn analysis: "Which customers are likely to churn?"
• Risk management: "Should I approve the loan to this customer?"
• …

Page 10

Page 11

Http://www.tunes.com

Page 12

Mining Process - Illustrated

[Diagram]
Training: Training Data + Mining Model -> DM Engine -> Mining Model
Prediction: Mining Model + Data To Predict -> DM Engine -> Predicted Data

Page 13

The Data Mining Market

Page 14

The $$$: Y2000 Market Size

• DM Tools Market: $250M
  • 40%: license fees
  • 60%: consulting

* Gartner

Page 15

The Players

• Leading vendors
  • SAS
  • SPSS
  • IBM
• Hundreds of smaller vendors offering DM algorithms…
• Oracle: Thinking Machines acquisition

Page 16

The Products

• End-to-end Data Mining tools
  • Extraction, cleansing, loading, modeling, algorithms (dozens), analyst's workbench, reporting, charting…
• The customer is the power analyst
  • A PhD in statistics is usually required…
• Closed tools with no standard API
  • Total vendor lock-in
  • Limited integration with applications
• DM is an "outsider" in the data warehouse
• Extensive consulting required
• Skyrocketing prices
  • $60K+ for a single-user license

Page 17

What the analysts say…

• "Stand-alone Data Mining Is Dead" - Forrester
• "The demise of [stand alone] data mining" - Gartner

Page 18

The Microsoft Approach

Page 19

DataPro Users Survey 1999-2001

"Data mining will be the fastest-growing BI technology…"

Page 20

The $$$: 2000 Market Size

• DM Applications Market Size: $1.5B

* IDC

Page 21

SQL Server 2000 - The Analysis Platform

• SQL 2000 provides a complete Analysis Platform
  • Not an isolated, stand-alone DM product
• Platform means:
  • The infrastructure for applications
    • Not an application by itself
  • Integrated vision for all technologies and tools
  • Standards-based APIs (OLE DB for DM)
  • Extensible
  • Scalable

Page 22

Data Flow

[Diagram: data flow among OLTP, DW, OLAP, DM, DM Apps, and Reports & Analysis]

Page 23

Analysis Services 2000 - Architecture

[Diagram with the following labeled components: Manager UI; DSO; Analysis Server; Client; OLE DB OLAP; DM; OLAP Engine; OLAP Engine (local); DM Engine; DM Engine (local); DMM; DM Wizards; DM DTS Task; Ext. (twice)]

Page 24

OLE DB for Data Mining…

Page 25

Why OLE DB for DM?

Make DM a mass-market technology by:
• Leveraging existing technologies and knowledge
  • SQL and OLE DB
  • Common industry-wide concepts and data presentation
• Changing the DM market perception from "proprietary" to "open"
• Increasing the number of players:
  • Reduce the cost and risk of becoming a consumer: one tool works with multiple providers
  • Reduce the cost and risk of becoming a provider: focus on expertise and find many partners to complement the offering
• Dramatically increasing the number of DM developers

Page 26

Integration With RDBMS

• Customers would like to
  • Build DM models from within their RDBMS
  • Train the models directly off their relational tables
  • Perform predictions as relational queries (tables in, tables out)
  • Feel that DM is a native part of their database
• Therefore…
  • Data mining models are relational objects
  • All operations on the models are relational
  • The language used is SQL (with extensions)
• The effect: every DBA and VB developer can become a DM developer

Page 27

Creating a Data Mining Model (DMM)

Page 28

Identifying the "Cases"

• DM algorithms analyze "cases"
• The "case" is the entity being categorized and classified
• Examples
  • Customer credit risk analysis: Case = Customer
  • Product profitability analysis: Case = Product
  • Promotion success analysis: Case = Promotion
• Each case encapsulates all we know about the entity

Page 29

A Simple Set of Cases

StudentID | Gender | Parent Income | IQ  | Encouragement  | College Plans
1         | Male   | 23400         | 120 | Not Encouraged | No
2         | Female | 79200         | 90  | Encouraged     | Yes
3         | Male   | 42000         | 105 | Not Encouraged | Yes

Page 30

More Complicated Cases

Cust ID | Age | Marital Status | IQ | Favorite Movies: Title | Favorite Movies: Score
1       | 35  | M              | 2  | Star Wars              | 8
        |     |                |    | Toy Story              | 9
        |     |                |    | Terminator             | 7
2       | 20  | S              | 3  | Star Wars              | 7
        |     |                |    | Braveheart             | 7
        |     |                |    | The Matrix             | 10
3       | 57  | M              | 2  | Sixth Sense            | 9
        |     |                |    | Casablanca             | 10
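A case with a nested table is easiest to picture as one record that owns a list of child rows. The Python sketch below models the customers on this slide that way; the class names `MovieRating` and `CustomerCase` and the helper `flatten` are hypothetical, and only a subset of the slide's columns is carried over.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MovieRating:
    title: str
    score: int


@dataclass
class CustomerCase:
    cust_id: int
    age: int
    marital_status: str
    # The nested table: zero or more rows that belong to this one case.
    favorite_movies: List[MovieRating] = field(default_factory=list)


cases = [
    CustomerCase(1, 35, "M", [MovieRating("Star Wars", 8),
                              MovieRating("Toy Story", 9),
                              MovieRating("Terminator", 7)]),
    CustomerCase(2, 20, "S", [MovieRating("Star Wars", 7),
                              MovieRating("Braveheart", 7),
                              MovieRating("The Matrix", 10)]),
    CustomerCase(3, 57, "M", [MovieRating("Sixth Sense", 9),
                              MovieRating("Casablanca", 10)]),
]


def flatten(case_list: List[CustomerCase]) -> List[Tuple[int, str, int]]:
    """Expand nested rows into (cust_id, title, score) tuples, one per movie."""
    return [(c.cust_id, m.title, m.score)
            for c in case_list
            for m in c.favorite_movies]
```

Flattening yields eight rows for three customers, which illustrates why the case, not the row, is the unit the algorithm analyzes.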

Page 31

A DMM is a Table!

• A DMM structure is defined as a table
  • Training a DMM means inserting data into the table
  • Predicting from a DMM means querying the table
• All information describing the case is contained in columns

Page 32

Creating a Mining Model

    CREATE MINING MODEL [Plans Prediction]
    (
        StudentID LONG KEY,
        Gender TEXT DISCRETE,
        ParentIncome LONG CONTINUOUS,
        IQ DOUBLE CONTINUOUS,
        Encouragement TEXT DISCRETE,
        CollegePlans TEXT DISCRETE PREDICT
    )
    USING Microsoft_Decision_Trees
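The statement on this slide is essentially a list of typed columns plus an algorithm name. One way to see that is to generate the text from such a list, as in the Python sketch below. The helper `create_mining_model_statement` and the tuple layout are hypothetical; the code only formats a string and does not connect to Analysis Services.

```python
# Column definitions copied from the slide: (name, data type, content flags).
COLUMNS = [
    ("StudentID", "LONG", "KEY"),
    ("Gender", "TEXT", "DISCRETE"),
    ("ParentIncome", "LONG", "CONTINUOUS"),
    ("IQ", "DOUBLE", "CONTINUOUS"),
    ("Encouragement", "TEXT", "DISCRETE"),
    ("CollegePlans", "TEXT", "DISCRETE PREDICT"),
]


def create_mining_model_statement(name, columns, algorithm):
    """Render a CREATE MINING MODEL statement in the layout used on the slide."""
    body = ",\n".join(
        f"    {col} {dtype} {flags}" for col, dtype, flags in columns
    )
    return f"CREATE MINING MODEL [{name}]\n(\n{body}\n)\nUSING {algorithm}"


statement = create_mining_model_statement(
    "Plans Prediction", COLUMNS, "Microsoft_Decision_Trees"
)
```

Printing `statement` reproduces the slide's definition, with `CollegePlans` as the only column flagged `PREDICT`.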

Page 33

Creating a mining model with nested table

    CREATE MINING MODEL MoviePrediction
    (
        CustomerId LONG KEY,
        Age LONG CONTINUOUS,
        Gender TEXT DISCRETE,
        Education TEXT DISCRETE,
        MovieList TABLE PREDICT (
            MovieName TEXT KEY
        )
    )
    USING Microsoft_Decision_Trees

Page 34

Training a DMM

Page 35

Training a DMM

• Training a DMM means passing it data for which the attributes to be predicted are known
  • Multiple passes are handled internally by the provider!
• Use an INSERT INTO statement
• The DMM will not persist the inserted data
• Instead it will analyze the given cases and build the DMM content (decision tree, segmentation model, association rules)

    INSERT [INTO] <mining model name>
    [(columns list)]
    <source data query>
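The point that a DMM analyzes inserted cases instead of storing them can be illustrated with a toy model that keeps only running counts. This is a sketch under stated assumptions, not how the Microsoft provider works: the class `TinyMiningModel`, its methods, and the sample rows are all hypothetical, and the "algorithm" is a simple most-frequent-outcome count.

```python
from collections import Counter, defaultdict


class TinyMiningModel:
    """Hypothetical stand-in for a DMM: keeps learned counts, never the cases."""

    def __init__(self, input_col, predict_col):
        self.input_col = input_col
        self.predict_col = predict_col
        # The learned "content": outcome counts per input value.
        self.content = defaultdict(Counter)

    def insert_into(self, source_rows):
        """Consume the source rows once, updating the counts row by row."""
        for row in source_rows:
            self.content[row[self.input_col]][row[self.predict_col]] += 1

    def predict(self, value):
        """Return the most frequent outcome seen for this input value."""
        outcomes = self.content.get(value)
        return outcomes.most_common(1)[0][0] if outcomes else None


def source_data_query():
    """Illustrative rows, yielded one at a time like a query result."""
    yield {"Encouragement": "Encouraged", "CollegePlans": "Yes"}
    yield {"Encouragement": "Encouraged", "CollegePlans": "Yes"}
    yield {"Encouragement": "Not Encouraged", "CollegePlans": "No"}
    yield {"Encouragement": "Not Encouraged", "CollegePlans": "No"}
    yield {"Encouragement": "Not Encouraged", "CollegePlans": "Yes"}


model = TinyMiningModel("Encouragement", "CollegePlans")
model.insert_into(source_data_query())
```

After training, the generator is exhausted and the model holds two small counters, so nothing remains of the five input rows except the pattern they produced.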

Page 36

INSERT INTO

    INSERT INTO [Plans Prediction]
        (StudentID, Gender, ParentIncome, IQ, Encouragement, CollegePlans)
    SELECT
        [StudentID], [Gender], [ParentIncome], [IQ], [Encouragement], [CollegePlans]
    FROM [CollegePlans]

Page 37

When Insert Into Is Done…

• The DMM is trained
  • The model can be retrained
• Content (rules, trees, formulas) can be explored
  • OLE DB schema rowset
  • SELECT * FROM <dmm>.CONTENT
  • XML string (PMML)
• Prediction queries can be executed

Page 38

Predictions

Page 39

What are Predictions?

• Predictions apply the rules of a trained model to a new set of data in order to estimate missing attributes or values
• Predictions = queries
  • The syntax is SQL-like
  • The output is a rowset
• In order to predict you need:
  • An input data set
  • A trained DMM
  • Binding (mapping) information between the input data and the DMM
  • A specification of what to predict

Page 40

The Truth Table Concept

Gender | Parent Income | IQ | Encouragement  | College Plans | Probability
Male   | 20000         | 85 | Not Encouraged | No            | 85%
Male   | 20000         | 85 | Not Encouraged | Yes           | 15%
Male   | 20000         | 85 | Encouraged     | No            | 60%
Male   | 20000         | 85 | Encouraged     | Yes           | 40%
Male   | 20000         | 90 | Not Encouraged | No            | 80%
Male   | 20000         | 90 | Not Encouraged | Yes           | 20%
Male   | 20000         | 90 | Encouraged     | No            | 58%
…

Page 41

Prediction

Gender | Parent Income | IQ | Encouragement  | College Plans | Probability
Male   | 20000         | 85 | Not Encouraged | No            | 85%
Male   | 20000         | 85 | Not Encouraged | Yes           | 15%
Male   | 20000         | 85 | Encouraged     | No            | 60%
Male   | 20000         | 85 | Encouraged     | Yes           | 40%
Male   | 20000         | 90 | Not Encouraged | No            | 80%
Male   | 20000         | 90 | Not Encouraged | Yes           | 20%
Male   | 20000         | 90 | Encouraged     | No            | 58%
Male   | 20000         | 90 | Encouraged     | Yes           | 42%
Male   | 20000         | 95 | Not Encouraged | No            | 78%
Male   | 20000         | 95 | Not Encouraged | Yes           | 22%
Male   | 20000         | 95 | Encouraged     | No            | 45%

It's a JOIN!

StudentID | Gender | Parent Income | IQ  | Encouragement
1         | Male   | 43000         | 85  | Not Encouraged
2         | Male   | 20000         | 135 | Not Encouraged
3         | Female | 25000         | 105 | Encouraged
4         | Male   | 96000         | 100 | Encouraged
5         | Female | 56000         | 125 | Not Encouraged
6         | Female | 46000         | 90  | Not Encouraged
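The "it's a JOIN" idea can be sketched in Python by matching each input row against the truth table on the shared attribute columns and keeping the most probable outcome. The truth-table rows below are copied from the slide. The function `prediction_join` and the two input students (IDs 101 and 102) are hypothetical, chosen so that they match rows the slide actually shows, since the slide's own input students fall outside the visible part of the table.

```python
# (Gender, ParentIncome, IQ, Encouragement, CollegePlans, Probability)
TRUTH_TABLE = [
    ("Male", 20000, 85, "Not Encouraged", "No", 0.85),
    ("Male", 20000, 85, "Not Encouraged", "Yes", 0.15),
    ("Male", 20000, 85, "Encouraged", "No", 0.60),
    ("Male", 20000, 85, "Encouraged", "Yes", 0.40),
    ("Male", 20000, 90, "Not Encouraged", "No", 0.80),
    ("Male", 20000, 90, "Not Encouraged", "Yes", 0.20),
    ("Male", 20000, 90, "Encouraged", "No", 0.58),
    ("Male", 20000, 90, "Encouraged", "Yes", 0.42),
]


def prediction_join(truth_table, input_rows):
    """For each input row, return (StudentID, most likely plan, probability)."""
    results = []
    for student_id, *attrs in input_rows:
        # Join condition: every attribute column must be equal.
        matches = [(plan, p) for *key, plan, p in truth_table if key == attrs]
        best = max(matches, key=lambda m: m[1]) if matches else (None, None)
        results.append((student_id, *best))
    return results


# Hypothetical input rows: (StudentID, Gender, ParentIncome, IQ, Encouragement)
new_students = [
    (101, "Male", 20000, 85, "Not Encouraged"),
    (102, "Male", 20000, 90, "Encouraged"),
]
predictions = prediction_join(TRUTH_TABLE, new_students)
```

A real provider does not materialize such a table; the sketch only shows why the prediction operation has the shape of a join between the model and the input data.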

Page 42

The Prediction Query Syntax

    SELECT <columns to return or predict>
    FROM
        <dmm> PREDICTION JOIN
        <input data set>
        ON <dmm column> = <dmm input column>…

Page 43

Example

    SELECT [New Students].[StudentID],
        [Plans Prediction].[CollegePlans],
        PredictProbability([CollegePlans])
    FROM
        [Plans Prediction] PREDICTION JOIN
        [New Students]
        ON [Plans Prediction].[Gender] =
            [New Students].[Gender] AND
            [Plans Prediction].[IQ] =
            [New Students].[IQ] AND ...

Page 44

OLE DB DM Sample Provider with Source

• All required OLE DB objects, such as session, command, and rowset
• The OLE DB for Data Mining syntax parser
• Tokenization of input data
• Query processing engine
• A sample Naïve Bayes algorithm
• Model persistence in XML and binary formats
• Available at www.microsoft.com/data/oledb/DMResKit.htm

Page 45

Integrated OLAP and DM Analysis

Page 46

Why Use DM with OLAP

• Relational DM is designed for:
  • Reports of patterns
  • Batch predictions fed into an OLTP system
  • Real-time singleton prediction in an operational environment
• OLAP is designed for interactive analysis by a knowledge worker
  • Consistent and convenient navigational model
  • Pre-aggregations of OLAP allow faster performance

Page 47

Understanding DM Content: Decision Trees

All Customers: Credit Risk 65% Good, 35% Bad
  Debt?
  • Debt = Low: Credit Risk 89% Good, 11% Bad
      Employment Type?
      • ET = Salaried: Credit Risk 94% Good, 6% Bad
      • ET = Self Employed: Credit Risk 79% Good, 21% Bad
  • Debt = High: Credit Risk 45% Good, 55% Bad
      Education?
      • Education = College: Credit Risk 70% Good, 30% Bad
      • Education = High School: Credit Risk 31% Good, 69% Bad

Customers with high debt and a college education:

    Filter([Individual Customers].Members,
        Customers.CurrentMember.Properties("Debt") = "High"
        And Customers.CurrentMember.Properties("Education") = "College")

Customers with low debt who are self-employed:

    Filter([Individual Customers].Members,
        Customers.CurrentMember.Properties("Debt") = "Low"
        And Customers.CurrentMember.Properties("Employment Type") = "Self Employed")
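Each tree node corresponds to a filter over the customer members: a member belongs to the node when all of its properties match the conditions on the path from the root. The Python sketch below mirrors that idea with plain dictionaries. The function `filter_members` and the four customer records are hypothetical, since the slide gives only aggregate percentages.

```python
def filter_members(members, criteria):
    """Keep the members whose properties match every key/value in criteria."""
    return [m for m in members
            if all(m.get(prop) == value for prop, value in criteria.items())]


# Hypothetical customers, one per leaf-like combination of properties.
customers = [
    {"Name": "C1", "Debt": "High", "Education": "College",
     "Employment Type": "Salaried"},
    {"Name": "C2", "Debt": "High", "Education": "High School",
     "Employment Type": "Self Employed"},
    {"Name": "C3", "Debt": "Low", "Education": "College",
     "Employment Type": "Self Employed"},
    {"Name": "C4", "Debt": "Low", "Education": "High School",
     "Employment Type": "Salaried"},
]

# The two node filters shown on the slide.
high_debt_college = filter_members(
    customers, {"Debt": "High", "Education": "College"})
low_debt_self_employed = filter_members(
    customers, {"Debt": "Low", "Employment Type": "Self Employed"})
```

Passing an empty criteria dictionary returns every customer, which corresponds to the root node, All Customers.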

Page 48

…Equivalent DM Dimension

DM dimension member                                  | Custom Roll-up     | Credit Risk
All Customers                                        | -                  | Good = 65%, Bad = 35%
  Customers with low debt                            | Aggregate(Filter(… | Good = 89%, Bad = 11%
    Customers with low debt and self employed        | Aggregate(Filter(… | Good = 79%, Bad = 21%
    Customers with low debt and salaried             | Aggregate(Filter(… | Good = 94%, Bad = 6%
  Customers with high debt                           | Aggregate(Filter(… | Good = 45%, Bad = 55%
    Customers with high debt and college education   | Aggregate(Filter(… | Good = 70%, Bad = 30%
    Customers with high debt and high school education | Aggregate(Filter(… | Good = 31%, Bad = 69%

Page 49

Tree = Dimension

• Every node on the tree is a dimension member
  • The node statistics are the member properties
• All members are calculated
  • The formula aggregates the case dimension members that apply to this node
  • The MDX is generated by the DM algorithm
• Analysis Services will automatically generate the calculated dimension based on the DM content, and also a virtual cube
• Applies to
  • Classification (decision trees)
  • Segmentation (clusters)

Page 50

Browsing the Virtual Cube

Pivot the DM dimension:

                         | WA   | OR   | CA
All Customers            | 3200 | 2500 | 8000
Customers with low debt  | 2320 | 1503 | 4300
Customers with high debt | 880  | 997  | 4700
Customers … college      | 320  | 450  | 2310
Customers … high school  | 560  | 547  | 2390

Callout: Credit Risk: 70% Good, 30% Bad

Page 51

Predictions

• You might want to view predictions for each case
• For example:
  • What is the expected profitability of a product?
  • What is the credit risk of a specific customer?
  • What are the products this customer is likely to buy?
• All of those predictions are available through MDX calculated members
  • Singleton query is created automatically

Page 52

Prediction Calculated Member

    Measures.[Probability of High Credit Risk]:
    PREDICT(Customers.CurrentMember,
        “Credit Risk Model”,
        “PredictionProbability(
            PredictionHistogram(“Credit Risk”),
            ‘High’)”
    )

Page 53

Predictions Example

                 | Probability of High Credit Risk | Probability of Low Credit Risk
Joe Smith        | 73%                             | 27%
John Dow         | 68%                             | 32%
William Clington | 45%                             | 55%
Robert Maxwell   | 98%                             | 2%
Denis Rodman     | 81%                             | 19%

Page 54

Questions?

E-Mail: [email protected]
http://www.microsoft.com/china/sql