
  • Intro to Data Mining

    Worapoj Kreesuradej, Ph.D., Associate Professor

    Data Mining & Data Exploration Laboratory (DME Lab), Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang
    Web: www.it.kmitl.ac.th/dme

    Grading
    - Assignment: 40%
    - Midterm exam: 30%
    - Final exam: 30%


    What is Data Mining?
    - Definition: Data mining is the process of extracting previously unknown, valid, and actionable information from large databases and then using that information to make crucial business decisions.
    - Alternative name: Knowledge Discovery in Databases (KDD)

    Evolution of Database Technology
    - 1960s: Data collection, database creation, and network DBMS
    - 1970s: Relational data model and relational DBMS implementation
    - 1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.), and application-oriented DBMS (spatial, scientific, engineering, etc.)
    - 1990s-2000s: Data mining and data warehousing, multimedia databases, and Web databases

  • [Figure: Intelligence Enterprise. Components shown: custom applications, packaged applications, ERP, client data, data warehouse, data mining, OLAP, and query & reporting tools.]

    [Figure: Increasing business impact, from bottom to top: data sources (paper, files, information providers, database systems, OLTP); data warehouses / data marts; statistical analysis, querying and reporting; data exploration (OLAP); data mining (information discovery).]

    Potential Applications of Data Mining
    - Market analysis and management
      - Purchasing patterns over time
      - Cross-selling
      - Customer profiling
      - Direct mail campaigns
      - Market segmentation
    - Risk analysis and management
      - Forecasting
      - Credit scoring for loan application processing
      - Profiling attrition (churn management)
    - Fraud detection and management
      - Money laundering: detecting suspicious money transactions
      - Detecting inappropriate medical treatments
    - Web mining
      - Web usage mining
      - Web content mining: automatic classification of Web documents
      - Web structure mining
    - Text mining
      - Dividing documents into groups
      - Document feature extraction

    [Figure: Text mining, structured data vs. unstructured data.]
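Document feature extraction can be sketched as a bag-of-words transformation, one common approach assumed here for illustration: each document becomes a row of term counts, so unstructured text turns into a structured document-term table. The function names and sample documents are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_frequency_vectors(docs):
    """Turn unstructured documents into a structured document-term matrix."""
    tokenized = [tokenize(d) for d in docs]
    # Vocabulary: every distinct term across all documents, in sorted order.
    vocabulary = sorted({w for toks in tokenized for w in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        # One column per vocabulary term; Counter returns 0 for absent terms.
        matrix.append([counts[w] for w in vocabulary])
    return vocabulary, matrix

# Hypothetical documents.
docs = [
    "Data mining finds patterns in data.",
    "Text mining extracts features from text.",
]
vocab, rows = term_frequency_vectors(docs)
print(vocab)
for row in rows:
    print(row)
```

Each row can then be fed to the same clustering or classification algorithms used for ordinary structured records.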

    Data mining process

    [Figure: The data mining process. Stages shown: business objective; selection (databases to target data); data preparation (target data to preprocessed data); data mining; pattern evaluation.]

  • Data mining process
    - Business objectives determination: identify the business problem or opportunity.
    - Data selection: identify all internal and external sources of information and select the subset of the data needed for the data mining application.

    Data mining process
    - Data preprocessing: the goal is to ensure the quality of the selected data. Typical tasks include checking that the data set is current, sampling the data, converting units, reconciling representation formats, and detecting missing values.
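A minimal Python sketch of three of these preprocessing tasks (detecting missing values, converting units, and sampling), assuming records arrive as dictionaries with `None` marking a missing value. The field names, records, and helper functions are hypothetical.

```python
import random

# Hypothetical selected records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "income_k": 52.0, "height_in": 70},
    {"id": 2, "age": None, "income_k": 61.5, "height_in": 65},
    {"id": 3, "age": 45, "income_k": None, "height_in": 72},
    {"id": 4, "age": 29, "income_k": 38.0, "height_in": 68},
]

def find_missing(rows):
    """Return (record id, field name) pairs whose value is missing."""
    return [(r["id"], k) for r in rows for k, v in r.items() if v is None]

def convert_units(rows):
    """Add a height_cm field converted from inches (1 in = 2.54 cm)."""
    return [{**r, "height_cm": round(r["height_in"] * 2.54, 1)} for r in rows]

def draw_sample(rows, k, seed=0):
    """Draw a reproducible simple random sample of k records (no replacement)."""
    return random.Random(seed).sample(rows, k)

print(find_missing(records))   # [(2, 'age'), (3, 'income_k')]
cleaned = convert_units(records)
print(cleaned[0]["height_cm"])  # 177.8
print(len(draw_sample(cleaned, 2)))  # 2
```

What to do with the missing values found (drop the record, fill with a default, or estimate them) depends on the business objective and is left open here.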

    Data mining process
    - Data transformation: the goal is to transform the data to suit the intended analysis and the data formats required by the data mining algorithms, many of which have particular requirements.
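Two widely used transformations, chosen here as assumed examples rather than taken from the slide, are sketched below: min-max scaling, which rescales a numeric column into a fixed range, and one-hot encoding, which turns a categorical column into 0/1 indicator columns for algorithms that accept only numbers. The function names and sample values are hypothetical.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale numeric values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]

def one_hot(values):
    """Encode a categorical column as one 0/1 indicator column per category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

incomes = [20_000, 35_000, 50_000]
print(min_max_scale(incomes))  # [0.0, 0.5, 1.0]
cats, encoded = one_hot(["Professor", "Assistant Prof", "Professor"])
print(cats)     # ['Assistant Prof', 'Professor']
print(encoded)  # [[0, 1], [1, 0], [0, 1]]
```

Scaling matters for distance-based methods such as K-means, where a column measured in large units would otherwise dominate the distance.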

    Data mining process
    - Data mining: select a modeling technique. Data mining operations include:
      - Predictive modeling
      - Database segmentation
      - Link analysis
      - Visualization

  • Database Segmentation (clustering)
    - Partitioning a database into segments of similar records, that is, records that share a number of properties.
    - Models: K-means, Kohonen neural networks

    [Figure: Database segmentation, records plotted by age vs. annual income and grouped into segments.]
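A minimal sketch of database segmentation with basic K-means, one of the models listed above. The customers are hypothetical (age, annual income in $1000s) pairs chosen to echo the figure's axes; the function names and the random initialization from existing records are assumptions of this sketch.

```python
import math
import random

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means: alternately assign points to the nearest centroid and
    move each centroid to the mean of the points assigned to it."""
    centroids = random.Random(seed).sample(points, k)  # k records as initial centroids
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Mean of each coordinate over the cluster's members.
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Hypothetical customers as (age, annual income in $1000s).
customers = [(25, 30), (27, 32), (24, 28), (52, 95), (55, 100), (50, 90)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

Because age and income use different units, real data would usually be rescaled in the data transformation step before clustering; here the two columns happen to have comparable ranges.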

    Predictive Modeling
    - Finding models for future prediction:
      - Classification: predicts categorical class labels
      - Prediction: models continuous-valued functions
    - Models: decision trees, neural networks
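Prediction of a continuous value can be sketched with simple linear regression (ordinary least squares with one predictor). Regression is not named on the slide; it is used here only as an assumed, easy-to-follow example of modeling a continuous-valued function. The data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x with a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx           # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical data: years of experience vs. annual salary (in $1000s).
years = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]
a, b = fit_line(years, salary)
print(a, b)       # 25.0 5.0
print(a + b * 6)  # predicted salary for 6 years: 55.0
```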

    Example of Predictive Modeling: Classification

    [Figure: A classifier is built from training data and then applied to unseen data, e.g. (Jeff, Professor, 4): Tenured?]

    NAME      RANK            YEARS  TENURED
    Tom       Assistant Prof  2      no
    Merlisa   Associate Prof  7      no
    George    Professor       5      yes
    Joseph    Assistant Prof  7      yes

  • Link Analysis
    - Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
    - Model: Apriori algorithm
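The Apriori algorithm named above can be sketched as follows: find frequent single items, then repeatedly join frequent (k-1)-itemsets into k-itemset candidates, prune any candidate that has an infrequent subset, and count the support of the rest. This is a minimal, unoptimized sketch; the shopping baskets and the minimum support count of 3 are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset that appears in at
    least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:  # level 1: count individual items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Count step: support of each surviving candidate.
        frequent = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                frequent[c] = support
        result.update(frequent)
        k += 1
    return result

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
itemsets = apriori(baskets, min_support=3)
for itemset, support in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The pruning step is what makes Apriori efficient: a candidate such as {bread, diapers, beer} is discarded without scanning the data, because its subset {bread, beer} is already known to be infrequent.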

    [Figure: Visualization of link analysis software.]

    [Figures: Visualization examples.]

  • [Figure: Visualization of a decision tree in MineSet 3.0.]

    Data mining process
    - Analysis of results: interpret and evaluate the output from data mining. Have we found something that is interesting, valid, and actionable?
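One way to check whether a classification result is valid is to measure it on held-out records that were not used to build the model. The sketch below computes accuracy, precision, and recall, which are common measures assumed here for illustration; the labels and the `evaluate` helper are hypothetical.

```python
def evaluate(actual, predicted, positive="yes"):
    """Compare a model's predictions with the known labels of held-out records."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    correct = sum(1 for a, p in pairs if a == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical held-out labels and a model's predictions for them.
actual = ["yes", "no", "yes", "no", "yes"]
predicted = ["yes", "yes", "no", "no", "yes"]
metrics = evaluate(actual, predicted)
print(metrics)
```

These numbers speak to validity only; whether a result is interesting and actionable still has to be judged against the business objective.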

    Data mining process
    - Assimilation of knowledge: the objective is to put into action the new, valid, and actionable information obtained in the previous process steps.

    [Figure: Effort required for each data mining process step.]

  • Methodology for data mining
    - CRISP-DM: Cross Industry Standard Process for Data Mining
    - A consortium of data miners from various industries: manufacturing, marketing, and government

    Examples of Data Mining Systems
    - IBM Intelligent Miner
      - A wide range of data mining algorithms
      - Scalable mining algorithms
      - Toolkits: neural network algorithms, statistical methods, data preparation, and data visualization tools
      - Tight integration with IBM's DB2 relational database system
    - SAS Enterprise Miner
      - A variety of statistical analysis tools
      - Data warehouse tools and multiple data mining algorithms
    - Clementine (from SPSS)
      - Multiple data mining algorithms and advanced statistics
    - SQL Server 2005
      - Multiple data mining modules: discovery-driven OLAP analysis, association, classification, and clustering
      - Tight integration with the SQL Server relational database system
    - Oracle Data Miner
      - Multiple data mining modules: discovery-driven OLAP analysis, association, classification, and clustering
    - Weka (open source software)
      - Multiple data mining modules: association, classification, visualization, and clustering
    - DBMiner (DBMiner Technology Inc.)
      - Multiple data mining modules: discovery-driven OLAP analysis, association, classification, and clustering
      - Efficient association and sequential-pattern mining functions, and a visual classification tool
      - Mining of both relational databases and data warehouses

    Trends in Data Mining
    - Application exploration
      - Development of application-specific data mining systems
      - Invisible data mining (mining as a built-in function)
    - Scalable data mining methods
      - Constraint-based mining: use of constraints to guide data mining systems in their search for interesting patterns
    - Integration of data mining with database systems, data warehouse systems, and Web database systems
    - Standardization of data mining languages
      - A standard will facilitate systematic development, improve interoperability, and promote the education and use of data mining systems in industry and society
      - PMML (Predictive Model Markup Language), OLE DB for Data Mining, JDM API (Java Data Mining API)
    - Web mining
    - Multimedia mining

    Data Mining: Confluence of Multiple Disciplines

    [Figure: Data mining at the confluence of database technology, statistics, machine learning, information science, visualization, and other disciplines.]

    Thank you!