Contents
• Query Tools
• Statistical Techniques
• Visualization Techniques
• Case-Based Learning (K-Nearest Neighbor)
Query Tools and Statistical Techniques
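• Customers are a telecom company's greatest asset
• Customer behavior is captured in the call records of the switches
• Understanding customer behavior has become a trend among telecom companies
– Case: selling telephone lines
  • Offering additional telephone lines to companies whose current lines are saturated is a recurring business opportunity
– When will a customer need additional lines?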
K-Nearest Neighbor
• Records that are close to each other live in each other's neighborhood
– Customers of the same type (cluster) will show the same behavior
– Do as your neighbors do
– Not really a learning technique
– Disadvantages:
  • Inefficiency
  • It is hard to establish that k-nearest neighbor performs better than naive prediction
K-Nearest Neighbors for 0*3*6
• C1: 1 0 0 1 0 0 1
• M1: 0 1 1 1 0 0 1
• Distance = 3 or Similarity = 4
• C1: 1 0 0 1 0 0 1
• M2: 0 1 1 1 0 1 1
• Distance = 4 or Similarity = 3
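The two comparisons above can be reproduced with a short sketch. It assumes the distance is the number of positions where the two 0/1 vectors differ (a Hamming distance) and the similarity is the number of positions where they agree; the slides do not name the measure, but this reading matches both results. The function names are illustrative.

```python
def hamming_distance(a, b):
    """Count the positions where two equal-length 0/1 vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def similarity(a, b):
    """Count the positions where the two vectors agree."""
    return len(a) - hamming_distance(a, b)


# Vectors copied from the slide.
c1 = [1, 0, 0, 1, 0, 0, 1]
m1 = [0, 1, 1, 1, 0, 0, 1]
m2 = [0, 1, 1, 1, 0, 1, 1]

print(hamming_distance(c1, m1), similarity(c1, m1))  # 3 4
print(hamming_distance(c1, m2), similarity(c1, m2))  # 4 3
```

With seven attributes, distance and similarity always sum to 7, so either one can be used to rank neighbors.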
K-Nearest Neighbors for 0*3*6
ID   Sim    ID   Sim    ID   Sim    ID   Sim
M1   4      M8   3      M15  4      M22  2
M2   3      M9   4      M16  6      M23  4
M3   6      M10  4      M17  4      M24  4
M4   5      M11  3      M18  5      M25  6
M5   4      M12  5      M19  6      M26  4
M6   4      M13  7      M20  7
M7   5      M14  6      M21  3
If Similarity_Threshold is 6, then the 7 neighbors with a similarity of at least 6 (M3, M13, M14, M16, M19, M20, M25) are selected.
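The selection step can be sketched as a simple filter over the similarity scores. The scores are copied from the slide for M1 through M26; the function name `select_neighbors` and the "greater than or equal to" reading of the threshold are assumptions, chosen because they reproduce the 7 neighbors listed.

```python
# Similarity scores for M1..M26, in order, copied from the slide.
scores = [4, 3, 6, 5, 4, 4, 5, 3, 4, 4, 3, 5, 7,
          6, 4, 6, 4, 5, 6, 7, 3, 2, 4, 4, 6, 4]
similarities = {f"M{i}": s for i, s in enumerate(scores, start=1)}


def select_neighbors(sims, threshold):
    """Keep every candidate whose similarity reaches the threshold (assumed >=)."""
    return [name for name, s in sims.items() if s >= threshold]


neighbors = select_neighbors(similarities, threshold=6)
print(neighbors)
# ['M3', 'M13', 'M14', 'M16', 'M19', 'M20', 'M25']
```

Note that this variant selects by threshold rather than by a fixed k, so the number of neighbors depends on the data.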
Summarize these 7 Neighbors
• Neighbor 1:
– 111 134 388 262 261 266 268 012 260 184 238 091 104 142 038
• Neighbor 2:
– 240 256 290 441 442 442 510 518 518 520 522 001 005 016 184
• Neighbor 3:
– none
• Neighbor 4:
– 402 193 228 179 227 111 204 364
• Neighbor 5:
– 280
• Neighbor 6:
– 193
• Neighbor 7:
– 186 189 193 214 239 179 227 263 240
Like Movies for 0*3*6
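• Count = 03 Movie = Crouching Tiger, Hidden Dragon (臥虎藏龍, 193)
• Count = 02 Movie = Rush Hour (尖峰時刻, 184)
• Count = 02 Movie = Snake Eyes (蛇眼, 240)
• Count = 02 Movie = Life Is Beautiful (美麗人生, 442)
• Count = 02 Movie = The Blair Witch Project (厄夜叢林, 518)
• Count = 02 Movie = The Truman Show (楚門的世界, 111)
• Count = 02 Movie = Enemy of the State (全民公敵, 179)
• Count = 02 Movie = The Mummy (神鬼傳奇, 227)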
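The counts above can be reproduced by tallying every movie ID across the 7 neighbors' lists. This is a minimal sketch: the IDs are copied from the slide and kept as strings to preserve leading zeros, while the minimum count of 2 is an assumption inferred from the movies that appear in the result. The tally counts raw occurrences, so an ID repeated within one neighbor's list (442 and 518 in neighbor 2) counts twice, which is what the slide's numbers imply.

```python
from collections import Counter

# Movie IDs per neighbor, copied from the slide (neighbor 3 has none).
neighbor_movies = [
    "111 134 388 262 261 266 268 012 260 184 238 091 104 142 038".split(),
    "240 256 290 441 442 442 510 518 518 520 522 001 005 016 184".split(),
    [],
    "402 193 228 179 227 111 204 364".split(),
    ["280"],
    ["193"],
    "186 189 193 214 239 179 227 263 240".split(),
]

# Tally every occurrence of every movie ID across all neighbors.
counts = Counter(movie for movies in neighbor_movies for movie in movies)

# Keep movies seen at least twice (assumed cut-off, not stated on the slide).
min_count = 2
liked = {movie: n for movie, n in counts.items() if n >= min_count}

for movie, n in sorted(liked.items(), key=lambda item: -item[1]):
    print(f"Count = {n:02d} Movie = {movie}")
```

Counting each neighbor at most once per movie (for example by converting each list to a set first) would be a reasonable alternative, but it would drop 442 and 518 from this result.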
Data Mining Tool & Query Tool
• Suppose a large database containing millions of records that describe customers' purchases
– Who bought which product on what date?
– What is the average turnover in July?
– What is an optimal segmentation of clients?
– What are the most important trends in customer behavior?
• If you know exactly what you are looking for, use a query tool
• If you know only vaguely what you are looking for, use a data mining tool
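The first two questions are the kind a query tool answers directly, because the question can be stated exactly. The sketch below uses Python's built-in `sqlite3` module with a tiny in-memory table; the table name, column names, and the three rows are entirely hypothetical, and "turnover" is assumed to mean the purchase amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "customer TEXT, product TEXT, purchase_date TEXT, amount REAL)"
)
# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("Alice", "phone line", "2001-07-03", 120.0),
        ("Bob", "modem", "2001-07-15", 80.0),
        ("Alice", "modem", "2001-08-01", 100.0),
    ],
)

# Who bought which product on what date?
who_bought = conn.execute(
    "SELECT customer, product, purchase_date "
    "FROM purchases ORDER BY purchase_date"
).fetchall()

# What is the average turnover in July?
july_average = conn.execute(
    "SELECT AVG(amount) FROM purchases "
    "WHERE strftime('%m', purchase_date) = '07'"
).fetchone()[0]

print(who_bought)
print(july_average)  # 100.0
```

The last two questions (optimal segmentation, most important trends) cannot be written as a single query like these, which is where a data mining tool comes in.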