25
Data Mining 資料探勘 1 1002DM01 MI4 Thu. 9,10 (16:10-18:00) B513 Introduction to Data Mining Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management , Tamkang University 淡江大學 資訊管理學系 http://mail. tku.edu.tw/myday/ 2012-02-16

資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

Data Mining資料探勘

1

1002DM01MI4

Thu. 9,10 (16:10-18:00) B513

Introduction to Data Mining

Min-Yuh Day戴敏育

Assistant Professor專任助理教授

Dept. of Information Management, Tamkang University淡江大學資訊管理學系

http://mail. tku.edu.tw/myday/2012-02-16

Page 2: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

課程資訊

• 課程名稱:資料探勘 (Data Mining)

• 授課教師:戴敏育 (Min-Yuh Day)

• 開課系級:資管四 (MI4)

• 開課資料:選修 單學期 2學分

• 上課時間:週四 9,10 (Thu 16:10-18:00)

• 上課教室:B513

2

Page 3: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

Data Mining at the Intersection of Many Disciplines

Stat

istics

Management Science & Information Systems

Artificial Intelligence

Databases

Pattern Recognition

MachineLearning

MathematicalModeling

DATAMINING

Source: Turban et al. (2011), Decision Support and Business Intelligence Systems 3

Page 4: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

Knowledge Discovery (KDD) Process

Data Cleaning

Data Integration

Databases

Data Warehouse

Task-relevant Data

Selection

Data Mining

Pattern Evaluation

4Source: Han & Kamber (2006)

Data mining: core of knowledge discovery process

Page 5: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

Data WarehouseData Mining and Business Intelligence

Increasing potentialto supportbusiness decisions End User

BusinessAnalyst

DataAnalyst

DBA

DecisionMaking

Data Presentation

Visualization Techniques

Data MiningInformation Discovery

Data ExplorationStatistical Summary, Querying, and Reporting

Data Preprocessing/Integration, Data Warehouses

Data SourcesPaper, Files, Web documents, Scientific experiments, Database Systems

5Source: Han & Kamber (2006)

Page 6: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

Business Pressures–Responses–Support Model

Source: Turban et al. (2011), Decision Support and Business Intelligence Systems 6

Page 7: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

課程簡介• 本課程介紹資料探勘 (Data Mining) 的基礎概念及應用技術。

• 課程內容包括

– 資料探勘導論、

– 關連分析、

– 分類與預測、

– 分群分析、

– 資料探勘個案分析與實作、

– 文字探勘與網頁探勘、

– 社會網路分析、

– 意見分析。7

Page 8: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

Course Introduction• This course introduces the fundamental concepts and

applications technology of data mining. • Topics include

– Introduction to Data Mining, – Association Analysis, – Classification and Prediction, – Cluster Analysis, – Case Study and Implementation of Data Mining, – Text and Web Mining, – Social Network Analysis, – Opinion Mining

8

Page 9: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

課程目標

• 瞭解及應用資料探勘基本概念與技術。

9

Page 10: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

週次 日期 內容(Subject/Topics)1 101/02/16 資料探勘導論 (Introduction to Data Mining)2 101/02/23 關連分析 (Association Analysis)3 101/03/01 分類與預測 (Classification and Prediction)4 101/03/08 分群分析 (Cluster Analysis)5 101/03/15 個案分析與實作一 (分群分析):

Banking Segmentation (Cluster Analysis – KMeans)6 101/03/22 個案分析與實作二 (關連分析):

Web Site Usage Associations ( Association Analysis)7 101/03/29 個案分析與實作三 (決策樹、模型評估):

Enrollment Management Case Study (Decision Tree, Model Evaluation)

課程大綱 (Syllabus)

10

Page 11: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

週次 日期 內容(Subject/Topics)8 101/04/05 教學行政觀摩日 (--No Class--)9 101/04/12 期中報告 (Midterm Presentation)10 101/04/19 期中考試週

11 101/04/26 個案分析與實作四 (迴歸分析、類神經網路):Credit Risk Case Study (Regression Analysis, Artificial Neural Network)

12 101/05/03 文字探勘與網頁探勘 (Text and Web Mining)13 101/05/10 社會網路分析、意見分析

(Social Network Analysis, Opinion Mining)14 101/05/17 期末專題報告 (Term Project Presentation)15 101/05/24 畢業考試週

課程大綱 (Syllabus)

11

Page 12: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

教學目標之教學方法與評量方法

• 教學目標

– 瞭解及應用資料探勘基本概念與技術。

• 教學方法

– 講述、討論、模擬、實作、問題解決

• 評量方法

– 實作、報告、上課表現

12

Page 13: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

Objective

• Students will be able to understand and apply the fundamental concepts and technology of data mining.

13

Page 14: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

教材課本• 講義 (Slides)• 參考書籍

– Decision Support and Business Intelligence Systems, Ninth Edition, Efraim Turban, Ramesh Sharda, Dursun Delen, 2011, Pearson

– Applied Analytics Using SAS Enterprise Mining, Jim Georges, Jeff Thompson and Chip Wells, 2010, SAS

– Data Mining: Practical Machine Learning Tools and Techniques, Third Edition, Ian H. Witten, Eibe Frank, and Mark A. Hall, Morgan Kaufmann, 2011.

– Data Mining: Concepts and Techniques, Second Edition, Jiawei Han and Micheline Kamber, 2006, Elsevier

– 決策支援與企業智慧系統,九版,Efraim Turban 等著,李昇暾審定,2011,華泰

– 資料探勘:概念與方法,王派洲譯,2008,滄海

– 資料探勘,曾憲雄、蔡秀滿、蘇東興、曾秋蓉、王慶堯,2008,旗標

– SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,碁峯

– 資料採掘:理論與實務規劃手冊,孫惠民,2007,松崗

– Web 資料採掘技術經典,孫惠民,2008,松崗14

Page 15: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

作業與學期成績計算方式

• 批改作業篇數

– 2篇(Team Term Project)

• 學期成績計算方式

– 期中評量:30 % (期中報告) – 期末評量:30 % (期末報告)– 其他(課堂參與及報告討論表現): 40 %

15

Page 16: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

Team Term Project• Term Project Topics

– Data mining – Business Intelligence– Text mining – Web mining – Social Network Analysis

• 3-5 人為一組

– 分組名單於 2012.02.23 (四) 課程下課時繳交

– 由班代統一收集協調分組名單

16

Page 17: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

A Taxonomy for Data Mining TasksData Mining

Prediction

Classification

Regression

Clustering

Association

Link analysis

Sequence analysis

Learning Method Popular Algorithms

Supervised

Supervised

Supervised

Unsupervised

Unsupervised

Unsupervised

Unsupervised

Decision trees, ANN/MLP, SVM, Rough sets, Genetic Algorithms

Linear/Nonlinear Regression, Regression trees, ANN/MLP, SVM

Expectation Maximization, Apriory Algorithm, Graph-based Matching

Apriory Algorithm, FP-Growth technique

K-means, ANN/SOM

Outlier analysis Unsupervised K-means, Expectation Maximization (EM)

Apriory, OneR, ZeroR, Eclat

Classification and Regression Trees, ANN, SVM, Genetic Algorithms

Source: Turban et al. (2011), Decision Support and Business Intelligence Systems 17

Page 18: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

The Evolution of BI Capabilities

Source: Turban et al. (2011), Decision Support and Business Intelligence Systems 18

Page 19: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

A High-Level Architecture of BI

Source: Turban et al. (2011), Decision Support and Business Intelligence Systems 19

Page 20: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

Social Network Analysis

20Source: http://www.fmsasg.com/SocialNetworkAnalysis/

Page 25: 資料探勘 - Tamkang Universitymail.tku.edu.tw/myday/teaching/1002/dm/1002dm01_data_mining.pdf · – SQL Server 2008 R2 資料採礦與商業智慧,謝邦昌、鄭宇庭、蘇志雄,2011,

Contact Information戴敏育博士 (Min-Yuh Day, Ph.D.)

專任助理教授淡江大學資訊管理學系

電話:02-26215656 #2347傳真:02-26209737研究室:i716 (覺生綜合大樓)地址: 25137 新北市淡水區英專路151號Email: [email protected]網址:http://mail.tku.edu.tw/myday/

25