October 2-3, 2015, İSTANBUL Boğaziçi University Prof.Dr. M.Erdal Balaban Istanbul University Faculty of Business Administration Avcılar, Istanbul - TURKEY Project Management in Data Mining


October 2-3, 2015, İSTANBUL, Boğaziçi University

Prof. Dr. M. Erdal Balaban

Istanbul University

Faculty of Business Administration

Avcılar, Istanbul - TURKEY

Project Management in Data Mining


PRESENTATION OUTLINE

What is Data Mining?

Data Mining Environment

Decision Making Process

CRISP-DM Methodology

Phases of Data Mining Process

Flowchart of Data Mining Process (Proposal)

Conclusions

October 2, 2015


What is Data Mining?

“Data mining is the process of discovering useful patterns and trends in large data sets.” (Larose, 2014).

Data mining makes a difference in many areas: health care, banking, finance, insurance, telecommunications, manufacturing, retail, market research, and the public sector.


Data Mining Environment

[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Information Science, Visualizations, and Other Disciplines.]


Decision Making Process

DATA → INFORMATION → KNOWLEDGE → DECISIONS → ACTION


CRISP-DM Methodology

CRISP-DM focuses data mining on rapid model development and deployment to optimize decisions.

CRoss-Industry Standard Process for Data Mining (Shearer, 2000)


CRISP-DM

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is the dominant data-mining process framework. It's an open standard; anyone may use it. The following list describes the various phases of the process.


Tasks and outputs of the CRISP-DM reference model (each task is followed by its outputs)

Phase 1: Business Understanding

Determine Business Objectives: Background; Business Objectives; Business Success Criteria

Assess Situation: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits

Determine Data Mining Goals: Data Mining Goals; Data Mining Success Criteria

Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques

Phase 2: Data Understanding

Collect Initial Data: Initial Data Collection Report

Describe Data: Data Description Report

Explore Data: Data Exploration Report

Verify Data Quality: Data Quality Report

Phase 3: Data Preparation

Phase outputs: Data Set; Data Set Description

Select Data: Rationale for Inclusion/Exclusion

Clean Data: Data Cleaning Report

Construct Data: Derived Attributes; Generated Records

Integrate Data: Merged Data

Format Data: Reformatted Data

Phase 4: Modeling

Select Modeling Technique: Modeling Technique; Modeling Assumptions

Generate Test Design: Test Design

Build Model: Parameter Settings; Models; Model Description

Assess Model: Model Assessment; Revised Parameter Settings

Phase 5: Evaluation

Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models

Review Process: Review of Process

Determine Next Steps: List of Possible Actions; Decision

Phase 6: Deployment

Plan Deployment: Deployment Plan

Plan Monitoring and Maintenance: Monitoring and Maintenance Plan

Produce Final Report: Final Report; Final Presentation

Review Project: Experience Documentation


Data Mining Phases (Proposal Flowchart)

[Flowchart. Stages: Define Project; Data Gathering (Data Sources); Data Understanding & Data Selection; Data Preprocessing; Dataset; Supervised Learning? (Yes: Classification Methods; No: Clustering Methods or Association Rules); Training Dataset and Test Dataset; Selecting Algorithm & Model Building; Measuring Model Performance; Evaluate Model (High / Low); Model Implementation; Knowledge Representation & Decision. Group labels: Data Preparation; Evaluation of Model Performance. Two stages are marked "Crucial Phase!".]


Planning for a data mining project

Produce project plan: list the stages in the project, together with their duration, resources required, and relations.

Define the project

Prepare data for data mining modeling

Separate data into training and testing parts for performance evaluation

Apply alternative algorithms to build models and evaluate their performance

Implement the model to generate knowledge and make a decision before action
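A project plan of this kind (stages, durations, and the relations between them) can be sketched as a small dependency table. The example below is only an illustration: the stage names follow the slide, but the durations, the dependency links, and the names `PLAN` and `schedule` are assumptions made up for this sketch. It computes the earliest start and finish day of each stage.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plan: stage -> (duration in days, stages it depends on).
# Durations and dependencies are invented for illustration only.
PLAN = {
    "Define the project": (5, []),
    "Prepare data": (15, ["Define the project"]),
    "Separate training and test data": (1, ["Prepare data"]),
    "Build and evaluate models": (10, ["Separate training and test data"]),
    "Implement the model": (5, ["Build and evaluate models"]),
}

def schedule(plan):
    """Return {stage: (earliest_start, earliest_finish)} in days from project start."""
    # TopologicalSorter takes a mapping node -> predecessors and
    # yields every stage after all of the stages it depends on.
    graph = {stage: deps for stage, (_, deps) in plan.items()}
    times = {}
    for stage in TopologicalSorter(graph).static_order():
        duration, deps = plan[stage]
        # A stage can start once its latest-finishing dependency is done.
        start = max((times[d][1] for d in deps), default=0)
        times[stage] = (start, start + duration)
    return times

if __name__ == "__main__":
    for stage, (start, finish) in schedule(PLAN).items():
        print(f"{stage}: day {start} to day {finish}")
```

Keeping the relations explicit makes it easy to see which stages delay the whole project when they slip.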


Define project

Understand the project objectives and requirements in the first phase of data mining

List the assumptions made by the project and the constraints on the project

Construct a cost-benefit analysis for the project


Prepare data for data mining

Collect the data (or datasets), Select data, Explore data, Clean the data, Reformat data, Transform data.
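As a rough illustration of the select, clean, reformat, and transform steps, the sketch below uses only the Python standard library. The records in `RAW`, the field names, and the specific choices of mean imputation and min-max scaling are assumptions for this example, not techniques prescribed by the slides.

```python
from statistics import mean

# Hypothetical raw records; field names and values are invented for illustration.
RAW = [
    {"id": "1", "age": "34", "income": "52000", "note": "a"},
    {"id": "2", "age": "", "income": "61000", "note": "b"},
    {"id": "3", "age": "45", "income": "48000", "note": "c"},
]

def prepare(rows, fields=("age", "income")):
    """Select fields, fill missing values with the column mean, scale to [0, 1]."""
    # Select and reformat: keep the chosen fields and parse strings to floats
    # (None marks a missing value).
    table = [[float(r[f]) if r[f] != "" else None for f in fields] for r in rows]
    prepared = [[] for _ in table]
    for col in range(len(fields)):
        values = [row[col] for row in table]
        known = [v for v in values if v is not None]
        fill = mean(known)  # Clean: impute missing entries with the column mean.
        filled = [fill if v is None else v for v in values]
        lo, hi = min(filled), max(filled)
        span = (hi - lo) or 1.0  # Avoid division by zero for constant columns.
        for i, v in enumerate(filled):
            prepared[i].append((v - lo) / span)  # Transform: min-max scaling.
    return prepared

if __name__ == "__main__":
    for row in prepare(RAW):
        print(row)
```

In a real project each of these choices (which fields to keep, how to treat missing values, how to scale) would be recorded in the corresponding data preparation reports.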


Separate the dataset for performance evaluation

Select the evaluation method: hold-out, cross-validation (k-fold CV), or bootstrapping.
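The three evaluation methods differ only in how the record indices are divided into training and test parts. The following is a minimal sketch using the standard library; the function names, the default test fraction, and the fixed seeds are assumptions chosen for illustration.

```python
import random

def hold_out(n, test_fraction=0.3, seed=0):
    """Shuffle indices 0..n-1 once and split them into (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]

def k_fold(n, k=5, seed=0):
    """Yield k (train, test) pairs; every index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def bootstrap(n, seed=0):
    """Draw n training indices with replacement; indices never drawn form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test

if __name__ == "__main__":
    print(hold_out(10))
    for train, test in k_fold(10, k=5):
        print(sorted(test))
    print(bootstrap(10))
```

Hold-out is the cheapest option, k-fold uses every record for testing exactly once, and bootstrapping is often considered when the dataset is small.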


Apply alternative algorithms and select the best model

There are several techniques for the same data mining problem type. Some techniques have specific requirements on the form of data.

Classification algorithms: k-Nearest Neighbour (kNN), Naive Bayes, Logistic Regression, Decision Trees, Support Vector Machines, Artificial Neural Networks (ANNs)

Clustering algorithms

Association algorithms

The generated models that meet the selected criteria become approved models.
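The idea of trying alternative algorithms and approving those that meet a criterion can be sketched as follows. This is an illustration only: the toy data, the candidate set (a majority-class baseline and kNN with two values of k), the accuracy measure, and the 0.9 threshold are all assumptions, not values from the slides.

```python
from collections import Counter
import math

def knn_predict(train, point, k):
    """Classify `point` by majority vote among its k nearest training examples."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def majority_predict(train, point):
    """Baseline: always predict the most frequent training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]

def accuracy(predict, train, test):
    """Fraction of test examples whose label is predicted correctly."""
    hits = sum(predict(train, x) == y for x, y in test)
    return hits / len(test)

# Hypothetical toy data: two well-separated groups of points in the plane.
TRAIN = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"), ((6, 6), "B")]
TEST = [((1, 1), "A"), ((0, 2), "A"), ((5, 4), "B"), ((7, 6), "B")]

CANDIDATES = {
    "majority baseline": majority_predict,
    "kNN (k=1)": lambda tr, x: knn_predict(tr, x, 1),
    "kNN (k=3)": lambda tr, x: knn_predict(tr, x, 3),
}

def approve(candidates, train, test, min_accuracy=0.9):
    """Return {name: accuracy} for the candidates that meet the selected criterion."""
    scores = {name: accuracy(fn, train, test) for name, fn in candidates.items()}
    return {name: s for name, s in scores.items() if s >= min_accuracy}

if __name__ == "__main__":
    print(approve(CANDIDATES, TRAIN, TEST))
```

Including a trivial baseline among the candidates helps show whether a more complex technique actually adds value on the test data.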


Implement the model to make a decision

Creation of the model is generally not the end of the project, even if the purpose of the model is only to increase knowledge of the data.

Apply the model within the organization's decision-making process and then put it into use.


CONCLUSIONS

1. Data mining techniques are important for discovering knowledge that is meaningful and valuable for decision making.

2. A project management approach is important for successful data mining.

3. Each phase of the data mining process is important, but the most important phases are data preparation before modeling and evaluation of model performance after modeling. These crucial phases are usually disregarded or skipped in practice.

4. All phases and sub-operations should be planned and scheduled using project management methods for successful data mining.


Thank you very much for your attention.

Are there any questions or suggestions?

