Intro to Data Mining

Worapoj Kreesuradej, Ph.D.
Associate Professor
Data Mining & Data Exploration Laboratory (DME Lab)
Faculty of Information Technology
King Mongkut's Institute of Technology Ladkrabang
Web: www.it.kmitl.ac.th/dme
Email: [email protected], [email protected]
Grading
- Assignment: 40%
- Midterm exam: 30%
- Final exam: 30%
What is Data Mining?
- Definition: Data mining is the process of extracting previously unknown, valid, and actionable information from large databases and then using that information to make crucial business decisions.
- Alternative name: Knowledge Discovery in Databases (KDD)
Evolution of Database Technology
- 1960s: Data collection, database creation, and network DBMS
- 1970s: Relational data model and relational DBMS implementation
- 1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.), and application-oriented DBMS (spatial, scientific, engineering, etc.)
- 1990s-2000s: Data mining and data warehousing, multimedia databases, and Web databases
[Figure: Intelligence Enterprise. Components shown: ERP, packaged application, custom applications, client data, data warehouse, data mining, OLAP, and query & reporting tools.]
[Figure: Increasing business impact, shown as layers from highest to lowest impact]
- Data Mining: information discovery
- Data Exploration: OLAP; statistical analysis, querying and reporting
- Data Warehouses / Data Marts
- Data Sources: paper, files, information providers, database systems, OLTP
Potential Applications of Data Mining
- Market analysis and management
  - Purchasing patterns over time
  - Cross-selling
  - Customer profiling
  - Direct mail campaigns
  - Market segmentation
- Risk analysis and management
  - Forecasting
  - Credit scoring for loan application processing
  - Profiling attrition (churn management)
- Fraud detection and management
  - Money laundering: detecting suspicious money transactions
  - Detecting inappropriate medical treatments
- Web mining
  - Web usage mining
  - Web content mining: automatic classification of Web documents
  - Web structure mining
- Text mining
  - Dividing documents into groups
  - Document feature extraction
[Figure: Structured data vs. unstructured data]
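Document feature extraction for text mining can be sketched as a bag-of-words representation: each document becomes a vector of term counts over a shared vocabulary, which clustering or classification can then use. This is a minimal illustration, not a method taken from the slides; the regular-expression tokenizer and the two sample documents are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Return the shared, sorted vocabulary and one term-count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for terms that do not occur in a document.
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors


# Hypothetical documents, not taken from the slides.
docs = [
    "Data mining finds patterns in data.",
    "Text mining groups documents.",
]
vocab, vecs = bag_of_words(docs)
print(vocab)  # ['data', 'documents', 'finds', 'groups', 'in', 'mining', 'patterns', 'text']
print(vecs)   # [[2, 0, 1, 0, 1, 1, 1, 0], [0, 1, 0, 1, 0, 1, 0, 1]]
```

Real text-mining pipelines usually add stop-word removal and term weighting such as TF-IDF; raw counts keep the example short.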
Data mining process
[Figure: Data mining process. Stages shown: business objective, databases, selection, target data, data preparation, preprocessed data, data mining, pattern evaluation.]
Business objectives determination: Identify the business problem or opportunity.

Data selection: Identify all internal and external sources of information, and select the subset of the data that is needed for the data mining application.
Data preprocessing: The goal of data preprocessing is to ensure the quality of the selected data. Tasks on the current data set include sampling the data, converting units, checking representation formats, and detecting missing values.
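Three of the preprocessing tasks named above (detecting missing values, unit conversion, and sampling) can be sketched in plain Python. This is a minimal sketch under assumptions: the record layout, the field names (`age`, `income`, `unit`), and the mixed `USD`/`kUSD` income units are hypothetical.

```python
import random

# Hypothetical customer records: income was recorded in two units, and some values are missing.
records = [
    {"id": 1, "age": 34, "income": 52000, "unit": "USD"},
    {"id": 2, "age": None, "income": 61.5, "unit": "kUSD"},
    {"id": 3, "age": 45, "income": 48000, "unit": "USD"},
    {"id": 4, "age": 29, "income": None, "unit": "USD"},
]


def find_missing(rows, fields):
    """Detect missing values: return (id, field) for every field that has no value."""
    return [(r["id"], f) for r in rows for f in fields if r[f] is None]


def to_usd(row):
    """Unit conversion: express income in USD regardless of the unit it was recorded in."""
    converted = dict(row)  # copy so the original record is left untouched
    if converted["income"] is not None and converted["unit"] == "kUSD":
        converted["income"] = converted["income"] * 1000
    converted["unit"] = "USD"
    return converted


def sample(rows, k, seed=0):
    """Simple random sample without replacement, seeded so the result is reproducible."""
    return random.Random(seed).sample(rows, k)


clean = [to_usd(r) for r in records]
print(find_missing(clean, ["age", "income"]))  # [(2, 'age'), (4, 'income')]
print([r["income"] for r in clean])            # [52000, 61500.0, 48000, None]
print(len(sample(clean, 2)))                   # 2
```

Detecting missing values only flags them; whether to drop, impute, or keep such records is a separate decision that depends on the business objective.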
Data transformation: The goal is to transform the data to suit the intended analysis and the data formats required by the data mining algorithms, many of which have particular requirements.
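Two common transformations illustrate what "suiting the algorithm's requirements" can mean: min-max scaling puts numeric attributes on a common [0, 1] range (useful for distance-based methods such as K-means), and one-hot encoding turns a categorical attribute into 0/1 columns for algorithms that accept only numbers. The choice of these two transformations and the sample values are assumptions for illustration.

```python
def min_max(values):
    """Rescale numbers linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode a categorical column as 0/1 indicator columns, one per sorted category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


ages = [20, 35, 50]
print(min_max(ages))  # [0.0, 0.5, 1.0]
print(one_hot(["Professor", "Assistant Prof", "Professor"]))
# (['Assistant Prof', 'Professor'], [[0, 1], [1, 0], [0, 1]])
```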
Data mining: Select the modeling technique. The data mining operations are:
- Predictive modeling
- Database segmentation
- Link analysis
- Visualization
Database Segmentation (Clustering)
- Partitioning a database into segments of similar records, that is, records that share a number of properties
- Models: K-means, Kohonen neural networks
[Figure: Database segmentation example: records grouped into segments by age and annual income]
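K-means, one of the segmentation models listed above, alternates two steps: assign each record to the nearest centroid, then move each centroid to the mean of its records, until no record changes segment. Below is a minimal sketch of Lloyd's K-means in plain Python on hypothetical (age, annual income in thousands) records, echoing the age/income figure; the data, k = 2, and initializing with the first k records are assumptions.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Lloyd's K-means on a list of numeric tuples; returns (labels, centroids)."""
    # Initialization (an assumption for reproducibility): use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each record joins the segment with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no record changed segment
        labels = new_labels
        # Update step: move each centroid to the mean of its member records.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty segment keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical (age, annual income in thousands) records.
records = [(25, 30), (27, 35), (30, 32), (52, 90), (55, 95), (60, 88)]
labels, centroids = kmeans(records, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Income is given in thousands here so that both attributes have similar ranges; with raw incomes the Euclidean distance would be dominated by income, which is why attributes are usually normalized before clustering.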
Predictive Modeling
- Finding models for future prediction
  - Classification: predicts categorical class labels
  - Prediction: models continuous-valued functions
- Models: decision trees, neural networks
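To make "models continuous-valued functions" concrete, the sketch below fits a straight line by ordinary least squares. Note that this swaps in simple linear regression, which the slide does not list (it names decision trees and neural networks), because it is the shortest correct example of predicting a continuous value. The data are hypothetical and perfectly linear so the result can be checked by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)       # 1.0 2.0
print(a + b * 5)  # 11.0, the predicted value for x = 5
```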
Example of Predictive Modeling: Classification
[Figure: Training data is fed to a classifier, which is then applied to unseen data: (Jeff, Professor, 4). Tenured?]

Training data:
NAME      RANK             YEARS   TENURED
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes
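The classification example can be sketched with a decision stump, that is, a decision tree with a single test. The learner below tries one equality test per rank and one threshold per gap between observed years, labels each branch by majority vote, and keeps the test that fits the most training rows. It uses only the four rows from the table; the stump learner itself and its tie-breaking rule are assumptions, not the slide's classifier.

```python
from collections import Counter

# The four labelled rows from the table: (name, rank, years, tenured)
TRAINING = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def majority(labels):
    """Return (most common label, its count); ties go to the label seen first."""
    if not labels:
        return None, 0
    return Counter(labels).most_common(1)[0]


def candidate_tests(rows):
    """Yield (description, test): one equality test per rank, one threshold per gap in years."""
    for rank in sorted({row[1] for row in rows}):
        yield f"rank == {rank!r}", (lambda row, rank=rank: row[1] == rank)
    years = sorted({row[2] for row in rows})
    for lo, hi in zip(years, years[1:]):
        threshold = (lo + hi) / 2
        yield f"years > {threshold}", (lambda row, t=threshold: row[2] > t)


def fit_stump(rows):
    """Choose the test whose two majority-vote leaves classify the most training rows correctly."""
    best = None
    for description, test in candidate_tests(rows):
        true_label, true_hits = majority([row[3] for row in rows if test(row)])
        false_label, false_hits = majority([row[3] for row in rows if not test(row)])
        hits = true_hits + false_hits
        if best is None or hits > best["hits"]:  # strict ">" keeps the first test on ties
            best = {"test": test, "description": description, "hits": hits,
                    "if_true": true_label, "if_false": false_label}
    return best


def predict(stump, row):
    """Apply the learned test and return the label of the matching leaf."""
    return stump["if_true"] if stump["test"](row) else stump["if_false"]


stump = fit_stump(TRAINING)
print(stump["description"], stump["hits"], "of", len(TRAINING))  # rank == 'Associate Prof' 3 of 4
print("Jeff tenured?", predict(stump, ("Jeff", "Professor", 4, None)))  # Jeff tenured? yes
```

Three tests tie at 3 of 4 correct (`rank == 'Associate Prof'`, `rank == 'Professor'`, `years > 3.5`); the first is kept, and all three answer "yes" for Jeff. Four rows are far too few for a trustworthy model, so the output only illustrates the train-then-predict workflow.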
Link Analysis
- Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories
- Model: Apriori algorithm
[Figure: Visualization of link analysis software]
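The Apriori algorithm named above finds frequent itemsets level by level: frequent k-itemsets are combined into (k+1)-item candidates, and any candidate with an infrequent k-item subset is pruned before counting, because every subset of a frequent itemset must itself be frequent. A minimal sketch follows; the market-basket transactions and the minimum support count of 3 are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset contained in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support counting: keep candidates that appear in enough transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    frequent = count({frozenset([item]) for t in transactions for item in t})
    result = dict(frequent)
    k = 1
    while frequent:
        # Join step: unions of two frequent k-itemsets that have exactly k + 1 items.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k + 1}
        # Prune step (Apriori property): drop candidates with an infrequent k-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(transactions, min_support=3)
for itemset, n in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The join here simply unions pairs of frequent itemsets, which is less efficient than the prefix-based join usually described for Apriori but yields the same candidates once pruning is applied. In this data {bread, milk, diapers} survives pruning but occurs in only 2 transactions, so no 3-itemset is frequent.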
Visualization
[Figure: Visualization of a decision tree in MineSet 3.0]

Data mining process
Analysis of results: Interpret and evaluate the output from data mining. Have we found something that is interesting, valid, and actionable?
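One part of this question, whether a predictive result is valid, is often checked by comparing the model's predictions with known outcomes on held-out data that was not used for training. The sketch below computes accuracy, precision, and recall for a two-class problem; the labels are hypothetical, and judging whether a result is interesting or actionable remains a business decision that no metric captures.

```python
def evaluate(actual, predicted, positive="yes"):
    """Compare predictions with known outcomes on held-out records."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    correct = sum(1 for a, p in pairs if a == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical held-out outcomes and model predictions.
actual = ["yes", "no", "yes", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]
scores = evaluate(actual, predicted)
print(scores["accuracy"])  # 0.6
```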
Assimilation of knowledge: The objective is to put into action the new, valid, and actionable information obtained in the previous process steps.
[Figure: Effort required for each data mining process step]
Methodology for Data Mining
- CRISP-DM: Cross Industry Standard Process for Data Mining
- Created by a consortium of data miners from various industries: manufacturing, marketing, and government
Examples of Data Mining Systems
- IBM Intelligent Miner
  - A wide range of data mining algorithms
  - Scalable mining algorithms
  - Toolkits: neural network algorithms, statistical methods, data preparation, and data visualization tools
  - Tight integration with IBM's DB2 relational database system
- SAS Enterprise Miner
  - A variety of statistical analysis tools
  - Data warehouse tools and multiple data mining algorithms
- Clementine (from SPSS)
  - Multiple data mining algorithms and advanced statistics
- SQL Server 2005
  - Multiple data mining modules: discovery-driven OLAP analysis, association, classification, and clustering
  - Tight integration with the SQL Server relational database system
- Oracle Data Miner
  - Multiple data mining modules: discovery-driven OLAP analysis, association, classification, and clustering
- Weka (open source software)
  - Multiple data mining modules: association, classification, visualization, and clustering
- DBMiner (DBMiner Technology Inc.)
  - Multiple data mining modules: discovery-driven OLAP analysis, association, classification, and clustering
  - Efficient association and sequential-pattern mining functions, and a visual classification tool
  - Mining of both relational databases and data warehouses
Trends in Data Mining
- Application exploration
  - Development of application-specific data mining systems
  - Invisible data mining (mining as a built-in function)
- Scalable data mining methods
  - Constraint-based mining: the use of constraints to guide data mining systems in their search for interesting patterns
- Integration of data mining with database systems, data warehouse systems, and Web database systems
- Standardization of data mining languages
  - A standard will facilitate systematic development, improve interoperability, and promote the education and use of data mining systems in industry and society
  - Examples: PMML (Predictive Model Markup Language), OLE DB for Data Mining, JDM API (Java Data Mining API)
- Web mining
- Multimedia mining
Data Mining: Confluence of Multiple Disciplines
[Figure: Data mining at the center of database technology, statistics, machine learning, information science, visualization, and other disciplines]
Thank you!