An Excel-based Data Mining Tool

Chapter 4

4.1 The iData Analyzer

Figure 4.1 The iDA system architecture

(Diagram components: Large Dataset, Interface, PreProcessor, Heuristic Agent, Mining Technique, Neural Networks, Generate Rules, RuleMaker, Rules, Report Generator, Excel Sheets, Explanation)

Figure 4.2 A successful installation

4.2 ESX: A Multipurpose Tool for Data Mining

Root Level: the root node

Concept Level: C1, C2, . . ., Cn

Instance Level: I11, I12, . . ., I1j (under C1); I21, I22, . . ., I2k (under C2); . . .; In1, In2, . . ., Inl (under Cn)

Figure 4.3 An ESX concept hierarchy

4.3 iDAV Format for Data Mining

Table 4.1 • Credit Card Promotion Database: iDAV Format

Income   Magazine   Watch      Life Insurance  Credit Card
Range    Promotion  Promotion  Promotion       Insurance    Sex     Age
C        C          C          C               C            C       R
I        I          I          I               I            I       I
40–50K   Yes        No         No              No           Male    45
30–40K   Yes        Yes        Yes             No           Female  40
40–50K   No         No         No              No           Male    42
30–40K   Yes        Yes        Yes             Yes          Male    43
50–60K   Yes        No         Yes             No           Female  38
20–30K   No         No         No              No           Female  55
30–40K   Yes        No         Yes             Yes          Male    35
20–30K   No         Yes        No              No           Male    27
30–40K   Yes        No         No              No           Male    43
30–40K   Yes        Yes        Yes             No           Female  41
40–50K   No         Yes        Yes             No           Female  43
20–30K   No         Yes        Yes             No           Male    29
50–60K   Yes        Yes        Yes             No           Female  39
40–50K   No         Yes        No              No           Male    55
20–30K   No         No         Yes             Yes          Female  19

Table 4.2 • Values for Attribute Usage

Character  Usage
I          The attribute is used as an input attribute.
U          The attribute is not used.
D          The attribute is not used for classification or clustering, but attribute value summary information is displayed in all output reports.
O          The attribute is used as an output attribute. For supervised learning with ESX, exactly one categorical attribute is selected as the output attribute.
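
The layout of Tables 4.1 and 4.2 can be sketched in code. The example below is a minimal illustration, not part of iDA: it assumes an iDAV sheet is a list of rows whose first three rows hold attribute names, type codes, and usage codes, and it reads `C` as categorical and `R` as real-valued, as Table 4.1 suggests. The names `Attribute` and `parse_idav` are hypothetical.

```python
from dataclasses import dataclass

# Usage codes from Table 4.2; type codes as they appear in Table 4.1.
# Reading C as categorical and R as real-valued is an assumption.
USAGE_CODES = {"I", "U", "D", "O"}
TYPE_CODES = {"C", "R"}


@dataclass
class Attribute:
    name: str
    type_code: str   # "C" or "R"
    usage_code: str  # "I", "U", "D" or "O"


def parse_idav(rows):
    """Split iDAV rows into attribute metadata and typed data rows."""
    names, types, usages, *data = rows
    if not len(names) == len(types) == len(usages):
        raise ValueError("the three header rows must have equal width")
    attributes = [Attribute(n, t, u) for n, t, u in zip(names, types, usages)]
    for a in attributes:
        if a.type_code not in TYPE_CODES or a.usage_code not in USAGE_CODES:
            raise ValueError(f"bad type or usage code for {a.name!r}")
    outputs = [a for a in attributes if a.usage_code == "O"]
    # Supervised ESX sessions use exactly one categorical output attribute.
    if len(outputs) > 1 or any(a.type_code != "C" for a in outputs):
        raise ValueError("expected at most one categorical output attribute")
    # Convert real-valued columns to float; leave categorical values as text.
    typed = [
        [float(v) if a.type_code == "R" else v for a, v in zip(attributes, row)]
        for row in data
    ]
    return attributes, typed


# Header rows plus the first two instances of Table 4.1.
sheet = [
    ["Income Range", "Magazine Promotion", "Watch Promotion",
     "Life Insurance Promotion", "Credit Card Insurance", "Sex", "Age"],
    ["C", "C", "C", "C", "C", "C", "R"],
    ["I", "I", "I", "I", "I", "I", "I"],
    ["40-50K", "Yes", "No", "No", "No", "Male", "45"],
    ["30-40K", "Yes", "Yes", "Yes", "No", "Female", "40"],
]
attributes, data = parse_idav(sheet)
inputs = [a.name for a in attributes if a.usage_code == "I"]
```

With every usage code set to `I`, as in Table 4.1, all seven attributes are inputs, which matches an unsupervised clustering session. Changing one categorical column's code to `O` would describe a supervised session.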

4.4 A Five-step Approach for Unsupervised Clustering

Step 1: Enter the Data to be Mined

Step 2: Perform a Data Mining Session

Step 3: Read and Interpret Summary Results

Step 4: Read and Interpret Individual Class Results

Step 5: Visualize Individual Class Rules

Step 1: Enter the Data to be Mined

Figure 4.4 The Credit Card Promotion Database

Step 2: Perform a Data Mining Session

Figure 4.5 Unsupervised settings for ESX

Figure 4.6 RuleMaker options

Step 3: Read and Interpret Summary Results

• Class Resemblance Scores

• Domain Resemblance Score

• Domain Predictability

Figure 4.8 Summary statistics for the Acme credit card promotion database

Figure 4.9 Statistics for numerical attributes and common categorical attribute values

Step 4: Read and Interpret Individual Class Results

• Class Predictability is a within-class measure.

• Class Predictiveness is a between-class measure.
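
The two measures can be illustrated with a short sketch. It assumes the usual reading: predictability is the share of a class's instances that have a given attribute value, and predictiveness is the share of instances with that value that fall in the class. The function names are hypothetical, and the five instances are the first five rows of Table 4.1, with Life Insurance Promotion standing in for the class label purely for illustration.

```python
def predictability(rows, class_attr, cls, attr, value):
    """Within-class measure: fraction of cls instances with attr == value."""
    in_class = [r for r in rows if r[class_attr] == cls]
    if not in_class:
        return 0.0
    return sum(r[attr] == value for r in in_class) / len(in_class)


def predictiveness(rows, class_attr, cls, attr, value):
    """Between-class measure: fraction of attr == value instances in cls."""
    with_value = [r for r in rows if r[attr] == value]
    if not with_value:
        return 0.0
    return sum(r[class_attr] == cls for r in with_value) / len(with_value)


# First five instances of Table 4.1 (subset of columns).
rows = [
    {"magazine": "Yes", "watch": "No",  "sex": "Male",   "life": "No"},
    {"magazine": "Yes", "watch": "Yes", "sex": "Female", "life": "Yes"},
    {"magazine": "No",  "watch": "No",  "sex": "Male",   "life": "No"},
    {"magazine": "Yes", "watch": "Yes", "sex": "Male",   "life": "Yes"},
    {"magazine": "Yes", "watch": "No",  "sex": "Female", "life": "Yes"},
]

p_within = predictability(rows, "life", "Yes", "sex", "Female")
p_between = predictiveness(rows, "life", "Yes", "sex", "Female")
print(f"predictability={p_within:.2f} predictiveness={p_between:.2f}")
```

Under this reading, a value with predictability 1.0 appears in every instance of the class (necessary), and a value with predictiveness 1.0 appears only in that class (sufficient).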

Figure 4.10 Class 3 summary results

Figure 4.11 Necessary and sufficient attribute values for Class 3

Step 5: Visualize Individual Class Rules

Figure 4.7 Rules for the credit card promotion database

4.5 A Six-Step Approach for Supervised Learning

Step 1: Choose an Output Attribute

Step 2: Perform the Mining Session

Step 3: Read and Interpret Summary Results

Step 4: Read and Interpret Test Set Results

Step 5: Read and Interpret Class Results

Step 6: Visualize and Interpret Class Rules

Figure 4.12 Test set instance classification

Read and Interpret Test Set Results

4.6 Techniques for Generating Rules

Simple Procedure for Creating Best Set of Covering Rules

1. Choose an attribute that best differentiates the domain instances.
2. Use the attribute to further subdivide instances into classes.
3. For each subclass created in step 2:
   3.1 If the instances in the subclass satisfy a predefined criterion, then generate a defining rule for the subclass.
   3.2 If the subclass does not satisfy the predefined criterion, then repeat step 1 for that subclass.
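
The procedure above can be sketched recursively. This is a minimal illustration under stated assumptions, not RuleMaker's implementation: "best differentiates" is taken to mean the attribute whose split labels the most instances correctly by majority vote, and the "predefined criterion" is a purity threshold (`min_purity`). All function names are hypothetical, and the data are the first five rows of Table 4.1.

```python
from collections import Counter, defaultdict


def majority(rows, target):
    """Return the most common target label and its count."""
    return Counter(r[target] for r in rows).most_common(1)[0]


def split(rows, attr):
    """Group instances by their value for attr."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r)
    return groups


def best_attribute(rows, attrs, target):
    # Assumed score: instances labelled correctly by per-subclass majority.
    def score(attr):
        return sum(majority(g, target)[1] for g in split(rows, attr).values())
    return max(attrs, key=score)


def covering_rules(rows, attrs, target, min_purity=1.0, conditions=()):
    """Return rules as (conditions, label); conditions are (attr, value) pairs."""
    if not attrs:
        return []  # no attributes left: the subclass stays without a rule
    attr = best_attribute(rows, attrs, target)               # step 1
    rules = []
    for value, group in split(rows, attr).items():           # step 2
        conds = conditions + ((attr, value),)
        label, count = majority(group, target)
        if count / len(group) >= min_purity:                 # step 3.1
            rules.append((conds, label))
        else:                                                # step 3.2
            rest = [a for a in attrs if a != attr]
            rules.extend(covering_rules(group, rest, target, min_purity, conds))
    return rules


rows = [
    {"magazine": "Yes", "watch": "No",  "sex": "Male",   "life": "No"},
    {"magazine": "Yes", "watch": "Yes", "sex": "Female", "life": "Yes"},
    {"magazine": "No",  "watch": "No",  "sex": "Male",   "life": "No"},
    {"magazine": "Yes", "watch": "Yes", "sex": "Male",   "life": "Yes"},
    {"magazine": "Yes", "watch": "No",  "sex": "Female", "life": "Yes"},
]
rules = covering_rules(rows, ["magazine", "watch", "sex"], "life")
for conds, label in rules:
    print(" AND ".join(f"{a} = {v}" for a, v in conds), "=> life =", label)
```

Lowering `min_purity` stops the recursion earlier and yields shorter, less exact rules, which is the trade-off the predefined criterion controls.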

4.6 Techniques for Generating Rules (RuleMaker)

1. Define the scope of the rules.

2. Choose the instances.

3. Set the minimum rule correctness.

4. Define the minimum rule coverage.

5. Choose an attribute significance value.
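
The correctness and coverage thresholds can be illustrated with a small filter. The sketch assumes that rule correctness is the fraction of instances matched by the rule that belong to the rule's class, and that rule coverage is the fraction of the class's instances that the rule matches. Whether RuleMaker computes them exactly this way is an assumption, and `rule_stats` and `keep_rule` are hypothetical names.

```python
def rule_stats(rows, conditions, target, label):
    """Return (correctness, coverage) of the rule 'conditions => target = label'."""
    covered = [r for r in rows if all(r[a] == v for a, v in conditions)]
    hits = sum(r[target] == label for r in covered)
    class_size = sum(r[target] == label for r in rows)
    correctness = hits / len(covered) if covered else 0.0
    coverage = hits / class_size if class_size else 0.0
    return correctness, coverage


def keep_rule(rows, conditions, target, label, min_correctness, min_coverage):
    """Apply the two minimum thresholds to a candidate rule."""
    correctness, coverage = rule_stats(rows, conditions, target, label)
    return correctness >= min_correctness and coverage >= min_coverage


# First five instances of Table 4.1 (subset of columns).
rows = [
    {"magazine": "Yes", "watch": "No",  "sex": "Male",   "life": "No"},
    {"magazine": "Yes", "watch": "Yes", "sex": "Female", "life": "Yes"},
    {"magazine": "No",  "watch": "No",  "sex": "Male",   "life": "No"},
    {"magazine": "Yes", "watch": "Yes", "sex": "Male",   "life": "Yes"},
    {"magazine": "Yes", "watch": "No",  "sex": "Female", "life": "Yes"},
]

watch_rule = [("watch", "Yes")]   # watch = Yes => life = Yes
male_rule = [("sex", "Male")]     # sex = Male  => life = No
kept_watch = keep_rule(rows, watch_rule, "life", "Yes", 0.8, 0.5)
kept_male = keep_rule(rows, male_rule, "life", "No", 0.8, 0.5)
```

Raising the minimum correctness prunes rules that misclassify covered instances, while raising the minimum coverage prunes rules that describe only a small part of a class.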

4.7 Instance Typicality

• Typicality is the average similarity of an instance to all other instances within its class.

• Identify prototypical and outlier instances.

• Select a best set of training instances.

• Used to compute individual instance classification confidence scores.
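
The definition of typicality can be made concrete with a short sketch. The similarity measure here is an assumption: the fraction of attributes on which two instances agree (simple matching), since the similarity function ESX uses is not given in these slides. The names `similarity` and `typicality_scores` are hypothetical, and the three instances are rows 2, 4, and 5 of Table 4.1 treated as one class.

```python
def similarity(a, b, attrs):
    """Assumed measure: fraction of attributes with matching values."""
    return sum(a[x] == b[x] for x in attrs) / len(attrs)


def typicality_scores(members, attrs):
    """Average similarity of each instance to all other class members."""
    if len(members) < 2:
        raise ValueError("typicality needs at least two instances in the class")
    scores = []
    for i, inst in enumerate(members):
        others = members[:i] + members[i + 1:]
        scores.append(sum(similarity(inst, o, attrs) for o in others) / len(others))
    return scores


attrs = ["magazine", "watch", "sex"]
members = [
    {"magazine": "Yes", "watch": "Yes", "sex": "Female"},
    {"magazine": "Yes", "watch": "Yes", "sex": "Male"},
    {"magazine": "Yes", "watch": "No",  "sex": "Female"},
]
scores = typicality_scores(members, attrs)
most_typical = max(range(len(members)), key=scores.__getitem__)
```

Sorting a class by these scores puts prototypical instances first and likely outliers last, which is how the scores could support choosing training instances.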

Figure 4.13 Instance typicality

4.8 Special Considerations and Features

• Avoid Mining Delays – at some point copy the original data into another Excel sheet

• The Quick Mine Feature – recommended when the dataset contains more than 2000 instances

• Erroneous and Missing Data – check for blank lines, data beyond the last column, and invalid characters
