A KBB-like Reference Pricing System: Using Machine Learning
Team Altima: Hao Zhu, Yingqi Yang
Product
A KBB-like Reference Pricing System
The end product could be integrated with online rental listings (apartments, rooms, houses, etc.) to give people looking for rental housing a reference point for rent negotiation.
Rental Details   Asking Rent   Reference Price   % Above Reference
……               1000          800               25%
Business Model
• Build our own service by scraping online rental listings and applying this system
• Partner with online rental listing providers such as Craigslist and offer this system as a value-added service
• Promote this system to similar web services, such as eBay auctions, to predict closing prices
Approach
Data Set
Source
• Seattle apartment/house rent prices, downloaded from GitHub
Size
• Total of 2313 entries from Nov. 2014
• Training/Validation: 75/25
Attributes
• Response: Price
• 14 predictors (Number of Bedrooms, Room Size, Listing Title, ……)
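The 75/25 training/validation split can be sketched as below. This is a minimal illustration that assumes a simple random shuffle; the slides do not say how the split was drawn, and `split_train_validation`, the seed, and the placeholder `listings` are hypothetical.

```python
import random

def split_train_validation(rows, train_fraction=0.75, seed=0):
    """Shuffle rows and split them into training and validation lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder for the 2313 listings; real rows would hold price and predictors.
listings = list(range(2313))
train, validation = split_train_validation(listings)
print(len(train), len(validation))
```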
Data Exploration
Outliers: Price < 600 or Price > 3100
[Figure: Rent Price Distribution]
[Figure: Rent Price Histogram]
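The outlier rule can be expressed as a small filter. This sketch assumes flagged listings are dropped before modeling, which the slide does not state; `is_outlier` and the sample prices are made up for illustration.

```python
LOW_PRICE = 600    # lower cutoff from the slide
HIGH_PRICE = 3100  # upper cutoff from the slide

def is_outlier(price):
    """Return True when a rent price falls outside the stated cutoffs."""
    return price < LOW_PRICE or price > HIGH_PRICE

prices = [550, 600, 1800, 3100, 3500]  # made-up prices, not from the dataset
kept = [p for p in prices if not is_outlier(p)]
print(kept)
```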
Text Mining on Listing Title Variable
Model Selection
(1) K Nearest Neighbors – Regression Model (KNN)
An algorithm that stores all available cases and predicts the numerical target based on a similarity measure (e.g., a distance function).
Numeric Variables: Euclidean Distance
Categorical Variables: Hamming Distance
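The two distance measures named here can be written in a few lines. This is a generic sketch of the standard definitions, not the team's Alteryx workflow; the function names are illustrative.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming_distance(a, b):
    """Number of positions at which two categorical vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))
print(hamming_distance(("98104",), ("98115",)))
```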
     Size   Beds   Zip Code   Price
1    1710   4      98115      2500
2    2200   2      98199      2895
3    1420   2      98117      2150

Step 1: Standardize Data Set

     Size    Beds    98104  98115  98117  Price
1    0.564   0.212   0      1      0      2500
2    0.731   0.091   1      0      0      2895
3    0.465   0.091   0      0      1      2150
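One common way to carry out this step is min-max scaling for numeric columns and one-hot encoding for zip codes. The slide does not give the scaling formula, and the table values appear to be scaled against the full dataset, so this sketch (with hypothetical helper names) will not reproduce them from three rows alone.

```python
def min_max_scale(values):
    """Rescale numbers to the range [0, 1] using the observed min and max."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # avoid dividing by zero
    return [(v - low) / (high - low) for v in values]

def one_hot(value, categories):
    """Encode a categorical value as a list of 0/1 indicator columns."""
    return [1 if value == c else 0 for c in categories]

sizes = [1710, 2200, 1420]                 # Size column from the sample rows
zip_columns = ["98104", "98115", "98117"]  # indicator columns from the slide
scaled_sizes = min_max_scale(sizes)
encoded_zip = one_hot("98115", zip_columns)
print(scaled_sizes, encoded_zip)
```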
Step 2: Give Reasonable Weights

Variable   Size   Bath   Bed   Zip Code   ……
Weight     5      4      3     2          ……
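Applying the weights amounts to multiplying each standardized column by its weight. The sketch below uses the Size and Beds values from the Step 1 table; the results agree with the Size and Beds values in the Step 3 table up to rounding. Only Size and Beds are weighted here, since the zip-code indicator columns appear as plain 0/1 values in Step 3; `apply_weights` is a hypothetical helper.

```python
WEIGHTS = {"Size": 5, "Beds": 3}  # weights from the slide

# Standardized Size and Beds values from the Step 1 table
standardized = [
    {"Size": 0.564, "Beds": 0.212},
    {"Size": 0.731, "Beds": 0.091},
    {"Size": 0.465, "Beds": 0.091},
]

def apply_weights(row, weights):
    """Multiply each standardized value by the weight of its variable."""
    return {name: round(value * weights[name], 3) for name, value in row.items()}

weighted = [apply_weights(row, WEIGHTS) for row in standardized]
for row in weighted:
    print(row)
```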
Step 3: Calculate Distance
Forecast: Size 2.053, Beds 0.273, Zip Code 98104, Price ?

     Size    Beds    98104  98115  98117  Price  Distance
1    2.819   0.636   0      1      0      2500   0.8
2    3.654   0.273   1      0      0      2895   1.6
3    2.325   0.273   0      0      1      2150   0.3

K = 1: Price = 2150
K = 2: Price = (2150 + 2500) / 2 = 2325
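Given the distances, the forecast is the average price of the K closest listings. A minimal sketch using the prices and distances from the table (`knn_predict` is an illustrative name):

```python
def knn_predict(neighbors, k):
    """Average the prices of the k neighbors with the smallest distances."""
    nearest = sorted(neighbors, key=lambda n: n[1])[:k]
    return sum(price for price, _ in nearest) / k

# (price, distance) pairs for listings 1, 2 and 3
neighbors = [(2500, 0.8), (2895, 1.6), (2150, 0.3)]

print(knn_predict(neighbors, 1))  # 2150.0
print(knn_predict(neighbors, 2))  # 2325.0
```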
(2) Other Models
• Decision Tree Model
• Forest Model
• Spline Model
• Support Vector Machine Model (SVM)
Model Comparison
Model Name             MAPE      RMSE
KNN Regression Model   0.17963   20.53814
Decision Tree Model    0.15522   334.49524
Forest Model           0.12895   287.84426
Spline Model           0.16774   408.67882
SVM Model *            0.15726   336.83526
* Not able to implement in Alteryx Designer; used R to develop instead
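For reference, the two metrics in the table follow the usual definitions, sketched below. MAPE is returned as a fraction, which seems consistent with the magnitudes reported; the sample numbers are made up and are not from the team's validation set.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, expressed as a fraction."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the prices."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [1000, 2000]     # made-up rents
predicted = [1100, 1800]  # made-up predictions
print(mape(actual, predicted), rmse(actual, predicted))
```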
Result: Ensemble Model
Demo
Baths  Beds  Size  Zip Code  Price  Reference Price  % Above Reference
1      1     828   98121     2,055  2,038            0.01
1      2     900   98117     1,800  1,700            0.06
1      1     583   98121     2,395  1,395            0.72
1      1     577   98121     1,398  1,595            -0.12
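The last column is the asking price relative to the reference price, minus one. A short check against the demo rows (`pct_above_reference` is an illustrative name):

```python
def pct_above_reference(asking, reference):
    """Fraction by which the asking price exceeds the reference price."""
    return asking / reference - 1

# (asking price, reference price) pairs from the demo table
demo_rows = [(2055, 2038), (1800, 1700), (2395, 1395), (1398, 1595)]
ratios = [round(pct_above_reference(a, r), 2) for a, r in demo_rows]
print(ratios)  # [0.01, 0.06, 0.72, -0.12]
```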
Model Improvement
• Use a larger dataset to build a more robust model
• Add attributes such as availability of a pool, security guard, etc.
• Include the contents of the listings in text mining
• Distinguish between houses and apartments
• Add a time component to the model to handle trend and seasonality in rent prices
• Do more research on the variables to get better weights for the KNN Regression Model
Q & A
Appendix
K Nearest Neighbors – Regression Model (KNN)
Step 3: Calculate Distance
Forecast: Size 2.053, Beds 0.273, Zip Code 98104, Price ?

     Size    Beds    98104  98115  98117  Price  Distance
1    2.819   0.636   0      1      0      2500   0.8
2    3.654   0.273   1      0      0      2895   1.6
3    2.325   0.273   0      0      1      2150   0.3

$D_1 = \sqrt{(2.053 - 2.819)^2 + (0.273 - 0.636)^2 + 1}$
$D_2 = \sqrt{(2.053 - 3.654)^2 + (0.273 - 0.273)^2 + 0}$

K = 1: F_Price = 2150
K = 2: F_Price = (2150 + 2500) / 2
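The two distance expressions can be evaluated directly. The sketch below computes them as written, with the last term under the root treated as a 0/1 zip-code mismatch (a reading of the formulas, not something the slides spell out). As written they come to roughly 1.31 and 1.60; the 0.8 listed for listing 1 in the table is closer to the value without the zip-code term (about 0.85), so how that term enters the tabulated distances remains unclear.

```python
import math

FORECAST = (2.053, 0.273)  # weighted Size and Beds of the listing to price

def distance(listing, zip_mismatch):
    """Euclidean distance on Size and Beds plus a 0/1 zip-code mismatch term."""
    squared = sum((f - x) ** 2 for f, x in zip(FORECAST, listing))
    return math.sqrt(squared + zip_mismatch)

d1 = distance((2.819, 0.636), 1)           # listing 1, different zip code
d2 = distance((3.654, 0.273), 0)           # listing 2, same zip code
d1_numeric = distance((2.819, 0.636), 0)   # listing 1 without the zip term
print(round(d1, 3), round(d2, 3), round(d1_numeric, 3))
```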