October 2-3, 2015, İSTANBUL, Boğaziçi University
Prof. Dr. M. Erdal Balaban
Istanbul University
Faculty of Business Administration
Avcılar, Istanbul - TURKEY
Project Management in Data Mining
PRESENTATION OUTLINE
What is Data Mining?
Data Mining Environment
Decision Making Process
CRISP-DM Methodology
Phases of Data Mining Process
Flowchart of Data Mining Process (Proposal)
Conclusions
2/17 October 2, 2015
What is Data Mining?
“Data mining is the process of discovering useful patterns and trends in large data sets.” (Larose, 2014).
Data mining makes a difference in many areas, including health care, banking, finance, insurance, telecommunications, manufacturing, retail, market research, and the public sector.
Data Mining Environment
[Figure: Data Mining at the centre of related disciplines: Database Technology, Statistics, Machine Learning, Information Science, Visualization, Other Disciplines]
Decision Making Process
DATA → INFORMATION → KNOWLEDGE → DECISIONS → ACTION
CRISP-DM Methodology
CRISP-DM focuses data mining on rapid model development and deployment to optimize decisions.
CRoss-Industry Standard Process for Data Mining (Shearer, 2000)
CRISP-DM
The Cross-Industry Standard Process for Data Mining (CRISP-DM) is the dominant data-mining process framework. It is an open standard that anyone may use. Its reference model organizes the process into six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.
Business Understanding
Determine Business Objectives: Background; Business Objectives; Business Success Criteria
Assess Situation: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
Determine Data Mining Goals: Data Mining Goals; Data Mining Success Criteria
Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques

Data Understanding
Collect Initial Data: Initial Data Collection Report
Describe Data: Data Description Report
Explore Data: Data Exploration Report
Verify Data Quality: Data Quality Report

Data Preparation (phase outputs: Data Set; Data Set Description)
Select Data: Rationale for Inclusion/Exclusion
Clean Data: Data Cleaning Report
Construct Data: Derived Attributes; Generated Records
Integrate Data: Merged Data
Format Data: Reformatted Data

Modeling
Select Modeling Technique: Modeling Technique; Modeling Assumptions
Generate Test Design: Test Design
Build Model: Parameter Settings; Models; Model Description
Assess Model: Model Assessment; Revised Parameter Settings

Evaluation
Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
Review Process: Review of Process
Determine Next Steps: List of Possible Actions; Decision

Deployment
Plan Deployment: Deployment Plan
Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
Produce Final Report: Final Report; Final Presentation
Review Project: Experience Documentation

Tasks and outputs of the CRISP-DM reference model (each task is followed by its outputs)
[Figure: Data Mining Phases (Proposal Flowchart). Define Project → Data Gathering (Data Sources) → Data Understanding & Data Selection → Data Preprocessing (Data Preparation; crucial phase!) → Dataset → Supervised Learning? Yes: Training Dataset and Test Dataset, Classification Methods; No: Clustering Methods or Association Rules → Selecting Algorithm & Model Building → Measuring Model Performance / Evaluate Model (Evaluation of Model Performance; crucial phase!; High / Low) → Model Implementation → Knowledge Representation & Decision]
Planning for a data mining project
Produce a project plan: list the stages of the project together with their durations, the resources they require, and the relations between them.
1. Define the project.
2. Prepare the data for data mining modeling.
3. Separate the data into training and testing parts for performance evaluation.
4. Apply alternative algorithms to build models and evaluate the models' performance.
5. Implement the model to generate knowledge and make a decision before taking action.
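As a minimal sketch of such a plan, the Python below stores each stage with a duration, the resources it needs, and the stages it depends on, then computes the earliest start and finish day of every stage with a simple forward pass over the dependency graph. The stage names loosely follow the list above; the durations, resource names, and the extra "Assess tools and techniques" stage are hypothetical placeholders, not figures from this presentation.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: stage -> (duration in days, resources, prerequisite stages).
# Durations and resources are made-up placeholders for illustration only.
PLAN = {
    "Define project": (5, ["project manager", "analyst"], []),
    "Assess tools and techniques": (3, ["data scientist"], ["Define project"]),
    "Prepare data": (15, ["analyst", "database admin"], ["Define project"]),
    "Split data": (1, ["data scientist"], ["Prepare data"]),
    "Build and evaluate models": (10, ["data scientist"],
                                  ["Split data", "Assess tools and techniques"]),
    "Implement model": (7, ["IT staff", "data scientist"], ["Build and evaluate models"]),
}

def schedule(plan):
    """Forward pass: earliest (start, finish) day of each stage, given its prerequisites."""
    graph = {stage: set(deps) for stage, (_, _, deps) in plan.items()}
    times = {}
    # static_order() yields every stage after all of its prerequisites.
    for stage in TopologicalSorter(graph).static_order():
        duration, _, deps = plan[stage]
        start = max((times[d][1] for d in deps), default=0)
        times[stage] = (start, start + duration)
    return times

if __name__ == "__main__":
    for stage, (start, finish) in schedule(PLAN).items():
        resources = ", ".join(PLAN[stage][1])
        print(f"{stage:28s} day {start:2d} to {finish:2d}  ({resources})")
```

With these made-up numbers the parallel tool assessment does not delay model building, because data preparation takes longer; the whole plan ends on day 38. Resource levelling or a critical-path report would be natural extensions but are left out of this sketch.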
Define project
Understand the project objectives and requirements in the first phase of data mining
List the assumptions made by the project and the constraints on it
Construct a cost-benefit analysis for the project
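A cost-benefit analysis can be as simple as comparing total expected benefits with total costs. The sketch below assumes the project's costs and benefits are estimated per period (hypothetical inputs) and optionally discounts them at a per-period rate; discounting is a common choice in such analyses, not something the slide prescribes. It returns the net benefit and the benefit-cost ratio.

```python
def present_value(flows, rate):
    """Sum of per-period amounts, discounted at `rate` per period (period 0 = today)."""
    return sum(amount / (1 + rate) ** t for t, amount in enumerate(flows))

def cost_benefit(costs, benefits, rate=0.0):
    """Return (net benefit, benefit-cost ratio) of a project's estimated cash flows."""
    pv_costs = present_value(costs, rate)
    pv_benefits = present_value(benefits, rate)
    return pv_benefits - pv_costs, pv_benefits / pv_costs

if __name__ == "__main__":
    # Hypothetical estimates: build cost up front, running costs and benefits afterwards.
    costs = [50_000, 10_000, 10_000]
    benefits = [0, 40_000, 60_000]
    for rate in (0.0, 0.1):
        net, ratio = cost_benefit(costs, benefits, rate)
        print(f"rate={rate:.0%}: net benefit {net:,.0f}, benefit-cost ratio {ratio:.2f}")
```

A positive net benefit (equivalently, a ratio above 1) suggests the project pays for itself under the stated assumptions; discounting lowers both figures because the benefits arrive later than most of the costs.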
Prepare data for data mining
Collect the data (or datasets), then select, explore, clean, reformat, and transform the data.
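The preparation steps can be sketched in plain Python on a handful of in-memory records. Everything here is an assumption for illustration: the attribute names, cleaning by simply dropping incomplete records (imputation is a common alternative), and min-max scaling as the transformation. The "collect" step is represented by a hard-coded list standing in for data read from the organization's sources.

```python
# Hypothetical raw records, as they might arrive from a CSV export (all values are text).
RAW = [
    {"id": "1", "age": "34", "income": "42000", "region": "north", "notes": "vip"},
    {"id": "2", "age": "", "income": "38000", "region": "south", "notes": ""},
    {"id": "3", "age": "51", "income": "61000", "region": "north", "notes": ""},
    {"id": "4", "age": "29", "income": "35000", "region": "east", "notes": "new"},
]
NUMERIC = ["age", "income"]

def select(rows, columns):
    """Select data: keep only the attributes relevant to the mining goal."""
    return [{c: r[c] for c in columns} for r in rows]

def explore(rows):
    """Explore data: count missing (empty) values per attribute."""
    return {c: sum(1 for r in rows if r[c] == "") for c in rows[0]}

def clean(rows):
    """Clean data: here, drop every record that has a missing value."""
    return [r for r in rows if all(v != "" for v in r.values())]

def reformat(rows, numeric):
    """Reformat data: convert numeric attributes from text to float."""
    return [{c: float(v) if c in numeric else v for c, v in r.items()} for r in rows]

def transform(rows, numeric):
    """Transform data: min-max scale each numeric attribute to the range [0, 1]."""
    out = [dict(r) for r in rows]
    for c in numeric:
        lo, hi = min(r[c] for r in rows), max(r[c] for r in rows)
        for r in out:
            r[c] = (r[c] - lo) / (hi - lo) if hi > lo else 0.0
    return out

def prepare(raw):
    """Run the preparation steps in order on the collected records."""
    rows = select(raw, ["age", "income", "region"])
    print("missing values per attribute:", explore(rows))
    return transform(reformat(clean(rows), NUMERIC), NUMERIC)

if __name__ == "__main__":
    for row in prepare(RAW):
        print(row)
```

Keeping each step as a separate function mirrors the task list and makes it easy to document each step's output, for example a data cleaning report listing the dropped records.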
Separate the dataset for performance evaluation
Select the evaluation method: hold-out, cross-validation (k-fold CV), or bootstrapping
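A minimal sketch of the three evaluation methods follows, working on record indices so it does not depend on how the data is stored. The function names, the 30% hold-out fraction, k = 5, and the fixed seeds are illustrative choices, not values from the presentation. For bootstrapping, the records never drawn into the training sample (the "out-of-bag" records) serve as the test set, which is one common variant.

```python
import random

def holdout(n, test_fraction=0.3, seed=0):
    """Hold-out: a single random split into training and test indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]

def k_fold(n, k=5, seed=0):
    """k-fold CV: split indices into k disjoint folds; each fold is the test set once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def bootstrap(n, seed=0):
    """Bootstrapping: draw n indices with replacement for training;
    indices that were never drawn (out-of-bag) form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test

if __name__ == "__main__":
    print("hold-out:", holdout(10))
    for train, test in k_fold(10, k=5):
        print("fold test set:", test)
    print("bootstrap:", bootstrap(10))
```

On average about 36.8% of the records (the limit of (1 - 1/n)^n, i.e. 1/e) are left out of a bootstrap sample, so the out-of-bag test set stays reasonably large even though the training sample contains n records.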
Apply alternative algorithms and select the best model
There are several techniques for the same data mining problem type, and some techniques have specific requirements on the form of the data.
Classification algorithms: k-Nearest Neighbours (kNN), Naive Bayes, Logistic Regression, Decision Trees, Support Vector Machines, Artificial Neural Networks (ANNs)
Clustering algorithms
Association algorithms
The generated models that meet the selected criteria become approved models.
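To illustrate comparing alternative algorithms and approving the models that meet a success criterion, the sketch below scores three simple classifiers with 5-fold cross-validation on a tiny made-up dataset. Only kNN comes from the list above; the majority-class baseline and the nearest-centroid classifier are stand-ins chosen because they fit in a few lines of standard-library Python, and the 0.8 accuracy threshold is a hypothetical business success criterion. In a real project, library implementations of the listed algorithms would take their place.

```python
import math
from collections import Counter

# Tiny, made-up two-class dataset: two well-separated clusters of 2-D points.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

def majority(train_X, train_y, x):
    """Baseline: ignore x and predict the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]

def nearest_centroid(train_X, train_y, x):
    """Predict the class whose mean point (centroid) is closest to x."""
    best, best_d = None, math.inf
    for label in set(train_y):
        pts = [p for p, t in zip(train_X, train_y) if t == label]
        centroid = [sum(coords) / len(pts) for coords in zip(*pts)]
        d = math.dist(centroid, x)
        if d < best_d:
            best, best_d = label, d
    return best

def knn(train_X, train_y, x, k=3):
    """k-nearest neighbours: majority vote among the k closest training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cv_accuracy(predict, X, y, folds=5):
    """Mean accuracy over k folds; fold i holds every folds-th record starting at i."""
    scores = []
    for i in range(folds):
        test = set(range(i, len(X), folds))
        tr_X = [X[j] for j in range(len(X)) if j not in test]
        tr_y = [y[j] for j in range(len(X)) if j not in test]
        hits = sum(predict(tr_X, tr_y, X[j]) == y[j] for j in test)
        scores.append(hits / len(test))
    return sum(scores) / folds

CANDIDATES = {"majority baseline": majority, "nearest centroid": nearest_centroid, "3-NN": knn}
SUCCESS_CRITERION = 0.8  # hypothetical minimum accuracy agreed during project definition

scores = {name: cv_accuracy(f, X, y) for name, f in CANDIDATES.items()}
approved = {name: s for name, s in scores.items() if s >= SUCCESS_CRITERION}
best = max(approved, key=approved.get)

if __name__ == "__main__":
    print("CV accuracy:", scores)
    print("approved models:", sorted(approved), "| selected:", best)
```

On these toy clusters the baseline scores 0.5 while nearest centroid and 3-NN both reach 1.0, so only the latter two become approved models. Keeping a baseline in the comparison is useful because any approved model should clearly beat it.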
Implement the model to make a decision
Creating the model is generally not the end of the project, even if the purpose of the model is to increase knowledge of the data.
Apply the model within the organization's decision-making process and then put it into action.
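One way to place a model inside a decision-making process is to wrap its output in an explicit decision rule. The sketch below assumes the model returns an estimated probability that some event occurs (for example, a customer leaving) and acts only when the expected benefit of acting exceeds its cost; the scoring function, the benefit, and the cost are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DecisionRule:
    """Turn a model's probability estimate into an act / do-not-act decision."""
    score: Callable[[dict], float]  # model: record -> estimated probability of the event
    benefit: float                  # hypothetical value gained by acting on a true case
    cost: float                     # hypothetical cost of acting on any case

    def decide(self, record: dict) -> bool:
        # Act only if the expected gain of acting exceeds its cost.
        return self.score(record) * self.benefit > self.cost

def toy_churn_model(record: dict) -> float:
    """Hypothetical stand-in for a trained model: more complaints, higher risk."""
    return min(1.0, record["complaints"] / 5)

if __name__ == "__main__":
    rule = DecisionRule(score=toy_churn_model, benefit=500.0, cost=100.0)
    customers = [{"id": 1, "complaints": 0}, {"id": 2, "complaints": 3}]
    for c in customers:
        action = "contact customer" if rule.decide(c) else "no action"
        print(c["id"], action)
```

With these numbers the rule acts whenever the estimated probability exceeds cost / benefit = 0.2. Keeping the business figures outside the model means they can be revised without retraining.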
CONCLUSIONS
1. Data mining techniques are important for discovering knowledge that is meaningful and valuable for decision making.
2. A project management approach is important for successful data mining.
3. Each phase of the data mining process is important, but the most important phases are data preparation before modeling and evaluation of model performance after modeling. These crucial phases are usually disregarded or skipped in practice.
4. All phases and sub-operations should be planned and scheduled using project management methods for successful data mining.
Thank you very much for your attention.
Are there any questions and suggestions?