Basic Data Mining Techniques


Contents

• Query Tools

• Statistical Techniques

• Visualization Techniques

• Case-Based Learning (K-Nearest Neighbor)

Query Tools and Statistical Techniques

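• Customers are a telecom company's greatest asset
• Customer behavior is recorded in the call records of the switch
• Understanding customer behavior has become a trend among telecom companies
– Case: selling telephone lines
• Offering more telephone lines to companies whose current lines are already saturated is a continuing business opportunity
– When will a customer need additional lines?

Selling Telephone Lines

Call records of the switch

Convert call durations into categories

Sort by time and count the number of busy lines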
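The last step, sorting the call records by time and counting busy lines, can be sketched as a simple sweep over start and end events. This is only an illustration: the record layout `(start_second, duration_seconds)`, the function name `peak_busy_lines`, and the sample calls are assumptions, not taken from the slides.

```python
from typing import List, Tuple


def peak_busy_lines(calls: List[Tuple[int, int]]) -> int:
    """Return the largest number of lines busy at the same time.

    Each call is (start_second, duration_seconds); this layout is assumed.
    """
    events = []
    for start, duration in calls:
        events.append((start, 1))               # a line becomes busy
        events.append((start + duration, -1))   # the line is released
    # Sort by time; at equal times a release (-1) sorts before a new call (+1).
    events.sort()
    busy = 0
    peak = 0
    for _, delta in events:
        busy += delta
        peak = max(peak, busy)
    return peak


# Hypothetical call records for one company, in seconds.
calls = [(0, 60), (30, 60), (45, 30), (100, 20)]
print(peak_busy_lines(calls))  # 3
```

A company whose peak repeatedly reaches the number of lines it owns would be a candidate for the sales case above.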

Naive Predictions


Music Magazine

Visualization Techniques (Scatter Diagram)

Distance between Data Points

K-Nearest Neighbor

• Records that are close to each other live in each other's neighborhood
– Customers of the same type (cluster) will show the same behavior
– Do as your neighbors do
– Not really a learning technique
– Disadvantages:
• Inefficiency
• It is difficult to see that k-nearest neighbor performs better than naïve prediction

Result of the K-Nearest Neighbor Process

Figure values (labels lost): 67.1%, 70.2%, 55.3%, 85.4%, 91.9%

Movie Recommendation

K-Nearest Neighbors for 0*3*6

• C1: 1 0 0 1 0 0 1

• M1: 0 1 1 1 0 0 1

• Distance = 3 or Similarity = 4

• C1: 1 0 0 1 0 0 1

• M2: 0 1 1 1 0 1 1

• Distance = 4 or Similarity = 3
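The two comparisons above are consistent with counting the positions where the binary vectors differ (Hamming distance) and taking similarity as the number of matching positions. A minimal sketch under that assumption; the function names are illustrative:

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def similarity(a, b):
    """Number of positions at which the two vectors agree."""
    return len(a) - hamming_distance(a, b)


c1 = [1, 0, 0, 1, 0, 0, 1]
m1 = [0, 1, 1, 1, 0, 0, 1]
m2 = [0, 1, 1, 1, 0, 1, 1]

print(hamming_distance(c1, m1), similarity(c1, m1))  # 3 4
print(hamming_distance(c1, m2), similarity(c1, m2))  # 4 3
```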

Similarity of M1 through M26:

Member  Similarity    Member  Similarity    Member  Similarity    Member  Similarity
M1      4             M8      3             M15     4             M22     2
M2      3             M9      4             M16     6             M23     4
M3      6             M10     4             M17     4             M24     4
M4      5             M11     3             M18     5             M25     6
M5      4             M12     5             M19     6             M26     4
M6      4             M13     7             M20     7
M7      5             M14     6             M21     3

If Similarity_Threshold is 6, then 7 neighbors (M3, M13, M14, M16, M19, M20, M25) are selected, namely those whose similarity is at least 6.
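Selecting neighbors by threshold amounts to a filter over the similarity scores. The sketch below uses the 26 values listed on this slide; the dictionary layout and variable names are assumptions for illustration.

```python
# Similarity scores for M1..M26 as listed on the slide.
scores = [4, 3, 6, 5, 4, 4, 5, 3, 4, 4, 3, 5, 7,
          6, 4, 6, 4, 5, 6, 7, 3, 2, 4, 4, 6, 4]
similarity_by_member = {f"M{i}": s for i, s in enumerate(scores, start=1)}

SIMILARITY_THRESHOLD = 6

# Keep every member whose similarity reaches the threshold.
neighbors = [member for member, s in similarity_by_member.items()
             if s >= SIMILARITY_THRESHOLD]

print(neighbors)
# ['M3', 'M13', 'M14', 'M16', 'M19', 'M20', 'M25']
```

Raising the threshold to 7 would leave only M13 and M20, so the threshold trades the number of neighbors against how closely they match.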

Summarize these 7 Neighbors

Liked movies (movie IDs):

• Neighbor 1: 111 134 388 262 261 266 268 012 260 184 238 091 104 142 038
• Neighbor 2: 240 256 290 441 442 442 510 518 518 520 522 001 005 016 184
• Neighbor 3: none
• Neighbor 4: 402 193 228 179 227 111 204 364
• Neighbor 5: 280
• Neighbor 6: 193
• Neighbor 7: 186 189 193 214 239 179 227 263 240

Liked Movies for 0*3*6

• Count = 3: Crouching Tiger, Hidden Dragon (臥虎藏龍, ID 193)
• Count = 2: Rush Hour (尖峰時刻, ID 184)
• Count = 2: Snake Eyes (蛇眼, ID 240)
• Count = 2: Life Is Beautiful (美麗人生, ID 442)
• Count = 2: The Blair Witch Project (厄夜叢林, ID 518)
• Count = 2: The Truman Show (楚門的世界, ID 111)
• Count = 2: Enemy of the State (全民公敵, ID 179)
• Count = 2: The Mummy (神鬼傳奇, ID 227)
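The counts in this list can be reproduced by tallying every movie ID across the seven neighbors' lists, counting repeated entries as they appear. A minimal sketch; the minimum count of 2 is an assumption inferred from the list, which shows only counts of 3 and 2.

```python
from collections import Counter

# Movie IDs liked by each selected neighbor, as listed on the slide.
# IDs are kept as strings so leading zeros (e.g. "012") are preserved.
liked = {
    "Neighbor 1": "111 134 388 262 261 266 268 012 260 184 238 091 104 142 038".split(),
    "Neighbor 2": "240 256 290 441 442 442 510 518 518 520 522 001 005 016 184".split(),
    "Neighbor 3": [],
    "Neighbor 4": "402 193 228 179 227 111 204 364".split(),
    "Neighbor 5": ["280"],
    "Neighbor 6": ["193"],
    "Neighbor 7": "186 189 193 214 239 179 227 263 240".split(),
}

MIN_COUNT = 2  # assumed cutoff; not stated in the source

counts = Counter(movie for movies in liked.values() for movie in movies)
recommended = [(movie, n) for movie, n in counts.most_common() if n >= MIN_COUNT]

for movie, n in recommended:
    print(f"Count = {n:02d} Movie ID = {movie}")
```

Note that IDs 442 and 518 reach a count of 2 only because they appear twice in Neighbor 2's list; deduplicating each list first would drop them.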

Data Mining Tool & Query Tool

• Suppose a large database contains millions of records that describe customers' purchases
– Who bought which product on what date?
– What is the average turnover in July?
– What is an optimal segmentation of clients?
– What are the most important trends in customer behavior?
• If you know exactly what you are looking for, use a query tool
• If you know only vaguely what you are looking for, use a data mining tool
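The first two questions are the kind a query tool answers directly, because the question fully specifies the result. The sketch below uses Python's `sqlite3` with an in-memory database; the `purchases` table, its columns, the sample rows, and the reading of "average turnover" as the mean purchase amount are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "customer TEXT, product TEXT, purchase_date TEXT, amount REAL)"
)
# Hypothetical sample rows; dates use ISO format so SQLite can parse them.
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("Ann", "magazine", "2001-07-03", 10.0),
        ("Bob", "book", "2001-07-15", 30.0),
        ("Ann", "book", "2001-08-01", 25.0),
    ],
)

# Who bought which product on what date?
who_bought = conn.execute(
    "SELECT customer, product, purchase_date FROM purchases "
    "ORDER BY purchase_date"
).fetchall()

# What is the average turnover in July? (assumed: mean purchase amount)
july_average = conn.execute(
    "SELECT AVG(amount) FROM purchases "
    "WHERE strftime('%m', purchase_date) = '07'"
).fetchone()[0]

print(who_bought)
print(july_average)  # 20.0
conn.close()
```

The last two questions (optimal segmentation, most important trends) have no such direct query, which is where a data mining tool comes in.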
