A Classification Approach for Movie Recommender System

Page 1: A Classification Approach for Movie Recommender System

A Classification Approach for Movie Recommender System

Advisor: Prof. 黃三益

Students: M964020007 黃于珊, M964020011 李界寬, M964020022 程尚文

Page 2: A Classification Approach for Movie Recommender System

Agenda

Introduction
◦ Motivation and background
◦ Determination of data set
The Data Mining Procedure
Conclusion and Limitation

Page 3: A Classification Approach for Movie Recommender System

INTRODUCTION

1. MOTIVATION AND BACKGROUND
2. DETERMINATION OF DATA SET

Page 4: A Classification Approach for Movie Recommender System

Motivation and background

The dataset comes from GroupLens
◦ (Research lab in the Department of Computer Science and Engineering at the University of Minnesota; http://www.grouplens.org/)

Online movie recommender system: MovieLens (http://www.movielens.org/)
◦ After signing up and rating several randomly selected movies, a member receives five movie recommendations from the site, each with a prediction of how much the user will like that movie.

We all love movies

Find the rule

Page 5: A Classification Approach for Movie Recommender System

Determination of data set

We use one of the two datasets that MovieLens currently provides.
◦ It contains 1,682 movies, 943 users, and 100,000 ratings.
◦ It offers a sample large enough for us to build and test models properly.
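To get a feel for the scale of this dataset, the three counts above can be turned into a few summary figures. This is a minimal sketch: only the three counts come from the slide, and the variable names are illustrative.

```python
# Counts quoted on the slide for the MovieLens dataset.
n_movies = 1682
n_users = 943
n_ratings = 100_000

# Number of possible (user, movie) pairs and the share that is actually rated.
possible_pairs = n_users * n_movies
density = n_ratings / possible_pairs

# Average activity per user and per movie.
ratings_per_user = n_ratings / n_users
ratings_per_movie = n_ratings / n_movies

print(f"possible pairs:    {possible_pairs:,}")
print(f"density:           {density:.2%}")
print(f"ratings per user:  {ratings_per_user:.1f}")
print(f"ratings per movie: {ratings_per_movie:.1f}")
```

Only about 6% of all user-movie pairs carry a rating, so the data is sparse even though each user contributes roughly a hundred ratings on average.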

Page 6: A Classification Approach for Movie Recommender System

The Data Mining Procedure

1. DATA MINING PROCEDURE: 10 STEPS
2. CONCLUSION AND LIMITATION

Page 7: A Classification Approach for Movie Recommender System

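Step 1. Translate the business problem into a data mining problem

Movies come in many genres and in large numbers. How can a viewer quickly find the movies that match their own preferences?
◦ Movie recommender system
◦ Shorten search time
◦ Find the Rule

Which kinds of movies do users of a given age, occupation, gender, and so on prefer?
◦ Potential customers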

Page 8: A Classification Approach for Movie Recommender System

Step 2. Select appropriate data

Online movie recommender system: MovieLens

(Research lab in the Department of Computer Science and Engineering at the University of Minnesota; http://www.grouplens.org/)

The data consists of the movie ratings given by members who joined the site, together with those members' personal information.

The dataset provided contains 1,682 movies, 943 users, and 100,000 ratings.

Page 9: A Classification Approach for Movie Recommender System

Step 3. Get to know the data (1/2)

The data has already been cleaned up. The following users were removed:
◦ users who had fewer than 20 ratings
◦ users who did not have complete demographic information
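The two cleaning criteria can be sketched as a simple filter. The dataset had already been cleaned before the authors used it, so this is only an illustration of the rule on made-up toy records; the record layout and the function name `clean` are assumptions, not part of the original data files.

```python
from collections import Counter

MIN_RATINGS = 20  # threshold stated on the slide


def clean(users, ratings, min_ratings=MIN_RATINGS):
    """Keep users with complete demographics and at least min_ratings ratings.

    users   : dict user_id -> dict with keys "age", "gender", "occupation"
              (hypothetical layout; a missing value is None)
    ratings : list of (user_id, movie_id, rating) tuples
    """
    counts = Counter(user_id for user_id, _, _ in ratings)
    kept = {
        uid: info
        for uid, info in users.items()
        if counts[uid] >= min_ratings
        and all(info.get(k) is not None for k in ("age", "gender", "occupation"))
    }
    kept_ratings = [r for r in ratings if r[0] in kept]
    return kept, kept_ratings


# Toy data: user 1 qualifies, user 2 has too few ratings,
# user 3 has no occupation recorded.
users = {
    1: {"age": 25, "gender": "F", "occupation": 2},
    2: {"age": 35, "gender": "M", "occupation": 1},
    3: {"age": 18, "gender": "M", "occupation": None},
}
ratings = (
    [(1, m, 4) for m in range(20)]
    + [(2, m, 3) for m in range(5)]
    + [(3, m, 5) for m in range(30)]
)
kept_users, kept_ratings = clean(users, ratings)
print(sorted(kept_users), len(kept_ratings))
```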

Page 10: A Classification Approach for Movie Recommender System

Step 3. Get to know the data (2/2)

| Attribute name | Description | Domain |
|---|---|---|
| Age | User's age | 1: "Under 18"; 18: "18-24"; 25: "25-34"; 35: "35-44"; 45: "45-49"; 50: "50-55"; 56: "56+" |
| Gender | User's gender | "M" for male, "F" for female |
| Occupation | User's occupation | 0: "other" or not specified; 1: "academic/educator"; 2: "artist"; 3: "clerical/admin"; 4: "college/grad student"; and so on |
| Movie Kind | Movie genre | Action, Adventure, Animation, Children's, Comedy, Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, Musical, Mystery, Romance, Sci-Fi, Thriller, War, Western |
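The coded attribute values can be decoded with plain lookup tables. The sketch below copies the codes listed on the slide; the occupation list is cut off there ("and so on"), so only the five codes shown are included, and the helper name `describe_user` is hypothetical.

```python
# Code tables copied from the slide. The occupation list is truncated there,
# so only the five codes that are shown are included.
AGE_GROUPS = {1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
              45: "45-49", 50: "50-55", 56: "56+"}
GENDERS = {"M": "male", "F": "female"}
OCCUPATIONS = {0: "other or not specified", 1: "academic/educator",
               2: "artist", 3: "clerical/admin", 4: "college/grad student"}


def describe_user(age_code, gender_code, occupation_code):
    """Return human-readable labels for one user's coded attributes."""
    return {
        "age": AGE_GROUPS[age_code],
        "gender": GENDERS[gender_code],
        # Codes beyond those listed on the slide are reported as unknown.
        "occupation": OCCUPATIONS.get(
            occupation_code, f"unknown code {occupation_code}"
        ),
    }


print(describe_user(25, "F", 4))
```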

Page 11: A Classification Approach for Movie Recommender System

Step 4. Create a model set

• Data Source
– MovieLens (The GroupLens Research Project at the University of Minnesota)
• Data Characteristics:
– 100,000 ratings (1-5) from 943 users on 1,682 movies
– Each user has rated at least 20 movies
– Collected over a seven-month period from September 19th, 1997 through April 22nd, 1998
– With complete demographic information

Page 12: A Classification Approach for Movie Recommender System

Step 5. Fix problems with the data

Variables with too many values
◦ Movie kind
◦ Occupation
◦ We do not consider variables such as ZipCode and rating

Page 13: A Classification Approach for Movie Recommender System

Step 6. Transform data to bring information to the surface

We skip this step because transforming the data into different formats would not be useful here.

Page 14: A Classification Approach for Movie Recommender System

Step 7. Build models

Data mining tool:
◦ Weka Explorer 3.4.12

Classifier
◦ Decision tree methods
◦ Using the C4.5 algorithm

Performs well on both accuracy and speed
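C4.5 chooses each split by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below shows only that criterion for nominal attributes; it is not Weka's implementation and leaves out the rest of C4.5 (continuous attributes, missing values, pruning). The toy records are made up and are not taken from the MovieLens data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting rows (a list of dicts) on a nominal attribute."""
    total = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Made-up toy records, NOT taken from the MovieLens data.
rows = [
    {"gender": "M", "age": 18, "kind": "Thriller"},
    {"gender": "M", "age": 25, "kind": "Thriller"},
    {"gender": "M", "age": 35, "kind": "War"},
    {"gender": "F", "age": 18, "kind": "Romance"},
    {"gender": "F", "age": 25, "kind": "Romance"},
    {"gender": "F", "age": 35, "kind": "Thriller"},
]
for attribute in ("gender", "age"):
    print(attribute, round(gain_ratio(rows, attribute, "kind"), 3))
```

On these toy records the split on gender scores higher than the split on age, so a C4.5-style tree would test gender first.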

Page 15: A Classification Approach for Movie Recommender System

Weka: the software

Page 16: A Classification Approach for Movie Recommender System

Step 8. Assess Model

Confusion Matrix

Table 1. Confusion Matrix of Classifier C4.5 from Training Set

| The Kind of Movie | Romance | Thriller | War |
|---|---|---|---|
| Romance | 2,576 | 7,465 | 38 |
| Thriller | 1,742 | 15,643 | 53 |
| War | 1,095 | 6,428 | 90 |

Page 17: A Classification Approach for Movie Recommender System

Step 8. Assess Model

Detailed Accuracy

Table 2. Detailed Accuracy of Classifier C4.5 from Training Set

| Class | TP Rate | FP Rate | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| Romance | 0.256 | 0.113 | 0.476 | 0.256 | 0.333 |
| Thriller | 0.897 | 0.785 | 0.53 | 0.897 | 0.666 |
| War | 0.012 | 0.003 | 0.497 | 0.012 | 0.023 |
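The per-class figures in Table 2 can be recomputed from the confusion matrix in Table 1. The sketch below assumes that each row of the matrix is the actual class and each column the predicted class; under that reading the results agree with Table 2 to three decimals. The function name is illustrative.

```python
# Confusion matrix from Table 1. Rows are assumed to be the actual class
# and columns the predicted class.
classes = ["Romance", "Thriller", "War"]
matrix = [
    [2576, 7465, 38],
    [1742, 15643, 53],
    [1095, 6428, 90],
]

total = sum(sum(row) for row in matrix)


def per_class_metrics(k):
    """TP rate (recall), FP rate, precision and F-measure for class index k."""
    tp = matrix[k][k]
    actual = sum(matrix[k])                    # instances whose true class is k
    predicted = sum(row[k] for row in matrix)  # instances predicted as k
    fp = predicted - tp
    recall = tp / actual
    fp_rate = fp / (total - actual)
    precision = tp / predicted
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, fp_rate, precision, f_measure


for k, name in enumerate(classes):
    recall, fp_rate, precision, f_measure = per_class_metrics(k)
    print(f"{name:<9} {recall:.3f} {fp_rate:.3f} {precision:.3f} {f_measure:.3f}")
```

The pattern is easy to read off: most Romance and War instances are predicted as Thriller, which explains both the high TP rate and the high FP rate of the Thriller class.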

Page 18: A Classification Approach for Movie Recommender System

Step 8. Assess Model

Other Information

Table 3. The Results of Classifier C4.5 from Training Set

| Measure | Value |
|---|---|
| Correctly Classified Instances | 18,309 (52.1178%) |
| Incorrectly Classified Instances | 16,821 (47.8822%) |
| Kappa statistic | 0.1089 |
| Mean absolute error | 0.4023 |
| Root mean squared error | 0.4485 |
| Relative absolute error | 96.6655% |
| Root relative squared error | 98.3189% |
| Total Number of Instances | 35,130 |
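The accuracy and kappa statistic in Table 3 follow directly from the confusion matrix in Table 1. The sketch below recomputes them, again assuming rows are actual classes and columns are predicted classes, and adds one figure that is not on the slides: the accuracy of always predicting the most frequent class. The error measures (mean absolute error and so on) need the predicted class probabilities and cannot be rebuilt from the matrix alone.

```python
# Confusion matrix from Table 1 (rows assumed actual, columns predicted).
classes = ["Romance", "Thriller", "War"]
matrix = [
    [2576, 7465, 38],
    [1742, 15643, 53],
    [1095, 6428, 90],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(classes)))
accuracy = correct / total

# Cohen's kappa: observed agreement corrected for the agreement expected by
# chance, given the row (actual) and column (predicted) totals.
row_totals = [sum(row) for row in matrix]
col_totals = [sum(row[j] for row in matrix) for j in range(len(classes))]
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
kappa = (accuracy - expected) / (1 - expected)

# Baseline for comparison (not on the slides): always predict the largest class.
baseline = max(row_totals) / total

print(f"correct:  {correct} of {total}")
print(f"accuracy: {accuracy:.4%}")
print(f"kappa:    {kappa:.4f}")
print(f"majority-class baseline: {baseline:.4%}")
```

Under these assumptions the tree is only about 2.5 percentage points more accurate than always answering "Thriller", which is consistent with the low kappa value.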

Page 19: A Classification Approach for Movie Recommender System

Step 8. Assess Model

Decision Tree
◦ Number of Leaves: 118
◦ Size of the tree: 216

Page 20: A Classification Approach for Movie Recommender System
Page 21: A Classification Approach for Movie Recommender System

Step 9. Deploy Model

The model is difficult to deploy because
◦ Our computing resources are insufficient
◦ It is difficult to implement

Page 22: A Classification Approach for Movie Recommender System

Conclusion and Limitation

Classification Approach: C4.5 → Decision Tree

Data Set: 35,130 records

Limitation
◦ Our hardware and software could not handle mining more data, which would be needed to find more interesting and more complete rules.

Page 23: A Classification Approach for Movie Recommender System

Thanks For Your Attention.