AMSTERDAM-AIRBNB PRICE PREDICTION
Author
Namrata Paul and Soumyajit Behera
Last Updated
5 years ago
License
Creative Commons CC BY 4.0
Abstract
This project is about prediction of pricing of rentals in Amsterdam airbnb using KNN regression.
\documentclass{article}
\usepackage{arxiv}
\usepackage{float}
\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc} % use 8-bit T1 fonts
\usepackage{hyperref} % hyperlinks
\usepackage{url} % simple URL typesetting
\usepackage{booktabs} % professional-quality tables
\usepackage{amsfonts} % blackboard math symbols
\usepackage{nicefrac} % compact symbols for 1/2, etc.
\usepackage{microtype} % microtypography
\usepackage{lipsum}
\usepackage{graphicx}
\title{AMSTERDAM-AIRBNB PRICE PREDICTION }
\author{
Namrata Paul\\
Department of Mathematics\\
Birla Institute of Technology, Mesra \\
Ranchi 835215, Jharkhand, India \\
\texttt{nampaul1999@gmail.com} \\
%% examples of more authors
\And
Soumyajit Behera\\
Department of Mathematics \\
Birla Institute of Technology, Mesra \\
Ranchi 835215, Jharkhand, India \\
\texttt{ashish454570@gmail.com} \\
\AND
Ankit Tewari \\
Artificial Intelligence Engineer \\
Knowledge Engineering and Machine Learning Group \\
\texttt{ankit.tewari@estudiant.upc.edu} \\
}
\begin{document}
\maketitle
\begin{abstract}
\ The price of rental rooms on Airbnb varies according to different factors, so there is a need for a system that predicts the price from the features of a listing. Price prediction can help the developer determine the price at which hosts can rent out their property.
The factors that influence the price are the number of people the rental accommodates, the number of bathrooms and bedrooms, the number of guests, the minimum night stay, and the location of the room, that is, whether it is near the city centre or far away. The aim of this research is to predict the price of properties available on Airbnb.
\end{abstract}
% keywords can be removed
\keywords{Price Prediction \and KNN regression}
\section{Introduction}
\ The data analysed in this project describes Airbnb listings in Amsterdam. Airbnb is an American online marketplace and hospitality service brokerage company.
Members can use the service to arrange or offer lodging, primarily homestays, or tourism experiences.
Setting a reasonable price for a rental property on Airbnb is challenging: hosts need to determine the optimal nightly price, and customers want to find a rental that matches their preferences at the offered price.
\ The aim of this project is to develop a price prediction model using KNN regression to tackle this problem. The features that affect the price are used to predict it: the number of people the rental accommodates, the number of bathrooms and bedrooms, the number of guests included, the minimum night stay, and the distance of the rental from the city centre.
\section{Dataset}
\ In this project we take data on Airbnb listings in Amsterdam and try to predict the price of a stay at each listing.
\ The source of the dataset is \url{https://www.kaggle.com/adityadeshpande23/amsterdam-airbnb}.
\ The data contains 15181 listings and 13 columns, including the price, which is what we are trying to predict. Each row in the dataset is a specific listing that is available for renting on Airbnb in Amsterdam. Here are some of the more important columns:
\begin{itemize}
\item accommodates: Number of guests the rental can accommodate.
\item bathrooms: Number of bathrooms included in the rental.
\item bedrooms: Number of bedrooms included in the rental.
\item guests included: Number of guests included in the booking.
\item minimum nights: Minimum number of nights a guest can stay at the rental.
\item number of reviews: Number of reviews the previous guests have left.
\item d centre: Distance of the rental from the city centre.
\end{itemize}
Some columns will not be used as features, such as latitude, longitude, instant bookable f, instant bookable t, room type Entire home/apt, room type Private room, and room type Shared room. So we are left with 7 columns to process and consider as features.
\ No documentation was available for this data, so we assume that, since all the listings are in Amsterdam, the price we are trying to predict is the general price for the minimum night stay of the listing in USD, not for specific dates or seasons and not including additional fees, e.g., cleaning and Airbnb service fees.
\begin{figure}[H]
\centering
\includegraphics[width=15cm]{pairplt.png}
\caption{Pairplot}
\label{fig:my_label}
\end{figure}
\section{Methods and Experiments}
\label{sec:headings}
\subsection{KNN Regression}
\ KNN regression is applied to the dataset with all numerical features as model inputs. The model is implemented using the scikit-learn library, which provides a range of supervised learning algorithms via a consistent interface in Python.
\subsection{Scikit-learn}
\ So far, the functions to train the k-nearest neighbor models were written from scratch. Scikit-learn replaces them with a workflow that consists of four main steps:
\begin{itemize}
\item Instantiate the specific machine learning model that we want to use.
\item Fit the model to the training data.
\item Use the model to make predictions.
\item Check the accuracy of the predictions.
\end{itemize}
\ We then use the rows in the training set to predict the price value for the rows in the test set.
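\ The four steps above can be sketched in Python as follows. This is only a minimal illustration and not the code used for the experiments: the data is synthetic, the price rule is hypothetical, and the choices of $k=5$, the split ratio and the random seeds are assumptions made so that the example runs on its own.
\begin{verbatim}
```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the listings table (assumption: the real
# data would be loaded from the Kaggle CSV instead).
rng = np.random.default_rng(0)
n_listings = 500
X = np.column_stack([
    rng.integers(1, 9, n_listings),      # accommodates
    rng.integers(1, 4, n_listings),      # bathrooms
    rng.integers(1, 5, n_listings),      # bedrooms
    rng.uniform(0.0, 10.0, n_listings),  # distance from the centre
]).astype(float)
# Hypothetical price rule plus noise, used only to obtain a target.
y = (40 + 25 * X[:, 0] + 15 * X[:, 2] - 5 * X[:, 3]
     + rng.normal(0, 10, n_listings))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 1. Instantiate the model (k = 5 is an arbitrary illustrative value).
knn = KNeighborsRegressor(n_neighbors=5)
# 2. Fit the model to the training data.
knn.fit(X_train, y_train)
# 3. Use the model to make predictions for the test rows.
y_pred = knn.predict(X_test)
# 4. Check the accuracy of the predictions.
test_r2 = r2_score(y_test, y_pred)
print(f"test R2 = {test_r2:.3f}")
```
\end{verbatim}
The same four calls apply to the real listings once the feature columns and the price column have been extracted from the dataset.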
\subsection{KNeighborsRegressor}
\ The \texttt{sklearn.neighbors} module provides functionality for unsupervised and supervised neighbors-based learning methods. Supervised neighbors-based learning comes in two flavours: classification for data with discrete labels and regression for data with continuous labels.
In this project \texttt{KNeighborsRegressor} is used, which implements learning based on the $k$ nearest neighbors of each query point, where $k$ is a user-defined integer value.
The principle behind nearest neighbors methods is to find a predefined number of training samples closest in distance to the new point and to predict the label from these.
In this project the number of samples is a user-defined constant.
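\ As an illustration of this principle, the prediction for a new point $x$ can be written as the average of the targets of its $k$ nearest training samples. The notation below ($\hat{y}$, $N_k(x)$, $d$, $p$) is introduced here for illustration only, and the formula assumes the scikit-learn defaults of uniform weights and Euclidean distance:
\begin{equation}
\hat{y}(x) = \frac{1}{k} \sum_{i \in N_k(x)} y_i , \qquad d(x, x_i) = \sqrt{\sum_{j=1}^{p} (x_j - x_{ij})^2}
\end{equation}
Here $N_k(x)$ denotes the indices of the $k$ training samples $x_i$ with the smallest distance $d(x, x_i)$, $y_i$ is the price of training sample $i$, and $p$ is the number of features.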
\subsection{Evaluate the model using r2\_score}
\ The model developed in this research is evaluated using $R^2$. $R^2$ is a statistical measure that represents the proportion of the variance of a dependent variable that is explained by the independent variable(s) in a regression model. The higher the $R^2$, the better the model fits the data.
\begin{equation}
R^2 = \frac{TSS - RSS}{TSS}
\end{equation}
\ TSS is the total sum of squares, i.e. the sum of the squared differences of each observation from the overall mean.
RSS is the residual sum of squares, i.e. the sum of the squared differences between each observation and its predicted value.
\ \texttt{sklearn.metrics.r2\_score} is used to calculate $R^2$ for the test set and the training set. $R^2$ provides an indication of the goodness of fit of a set of predictions to the actual values.
A value of 1 corresponds to a perfect fit and a value of 0 to a model that does no better than always predicting the mean; the score can also be negative when the predictions are worse than that.
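\ To make the definition concrete, the two sums can be written out explicitly. The symbols $y_i$ (actual price), $\hat{y}_i$ (predicted price) and $\bar{y}$ (mean actual price) are notation introduced here, and the three prices in the worked example are hypothetical values, not taken from the dataset:
\begin{equation}
TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2 , \qquad RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\end{equation}
For actual prices $(100, 150, 200)$ and predicted prices $(110, 140, 210)$ we have $\bar{y} = 150$, so $TSS = 50^2 + 0^2 + 50^2 = 5000$ and $RSS = 10^2 + 10^2 + 10^2 = 300$, which gives
\begin{equation}
\frac{TSS - RSS}{TSS} = \frac{5000 - 300}{5000} = 0.94
\end{equation}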
\section{Results}
\label{sec:others}
\ In this project we have tried to predict the rental prices of Airbnb listings in Amsterdam using KNN regression.
\begin{figure}[H]
\centering
\includegraphics[width=15cm]{traink.png}
\caption{Training set r2\_score vs.\ number of neighbors}
\label{fig:train_r2}
\end{figure}
\ Figure~\ref{fig:train_r2} shows the relationship between the training set score and the number of neighbors. The r2\_score decreases rapidly as the value of $k$ increases.
\
See Table~\ref{tab:table}.
\begin{table}
\caption{r2\_score on the training set for each value of $k$}
\centering
\begin{tabular}{ll}
\toprule
Value of $k$ & r2\_score \\
\midrule
1 & 1.0 \\
2 & 0.75197 \\
3 & 0.66335 \\
4 & 0.61151 \\
5 & 0.58466 \\
6 & 0.56340 \\
7 & 0.54481 \\
8 & 0.53381 \\
9 & 0.52504 \\
10 & 0.51789 \\
\bottomrule
\end{tabular}
\label{tab:table}
\end{table}
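\ Table~\ref{tab:table} shows the r2\_score for values of $k$ up to 10. The r2\_score is highest for $k=1$. This is expected on the training set, because with $k=1$ each training point is its own nearest neighbor.
We used this $k$ to predict the target value in the test set and to evaluate the r2\_score for the test set, which is found to be 0.017. The large gap between the training and test scores indicates that the model with $k=1$ overfits the training data.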
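\ One possible way to examine the effect of $k$ is to compute the score on both the training set and a held-out set for each value. The sketch below is an illustration on synthetic data with a hypothetical price rule, not a rerun of the experiment above; the variable names, the split ratio and the random seed are assumptions.
\begin{verbatim}
```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic features and a hypothetical noisy price (assumption).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(400, 3))
y = 50 + 20 * X[:, 0] - 8 * X[:, 2] + rng.normal(0, 15, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

# Record (training R2, held-out R2) for k = 1, ..., 10.
scores = {}
for k in range(1, 11):
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    train_r2 = r2_score(y_train, model.predict(X_train))
    test_r2 = r2_score(y_test, model.predict(X_test))
    scores[k] = (train_r2, test_r2)
    print(f"k={k:2d}  train R2={train_r2:.3f}  test R2={test_r2:.3f}")

# Pick the k with the highest held-out score, not the training score.
best_k = max(scores, key=lambda k: scores[k][1])
print("k with the highest held-out R2:", best_k)
```
\end{verbatim}
With $k=1$ the training score is 1.0 by construction, so the training score alone cannot guide the choice of $k$. In practice a separate validation split or cross-validation would be preferable to selecting $k$ on the final test set.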
\begin{figure}[H]
\centering
\includegraphics[width=15cm]{avsp.png}
\caption{Actual value vs.\ predicted value}
\label{fig:actual_vs_pred}
\end{figure}
Figure~\ref{fig:actual_vs_pred} shows the accuracy of the predicted values. The curve of the predicted values approximately follows the curve of the actual values.
\section{Conclusions}
We have predicted the price of rental properties available on Airbnb in Amsterdam using features that include the number of bedrooms, bathrooms, reviews and guests, the number of people accommodated, and the distance from the city centre. There are many possible extensions of this project. New variables can be added to help increase the accuracy of the model. Seasonality could be the next big influence to add: some months clearly have more travellers than others, but our model does not account for that yet. In future we can predict the price on a week-to-week or monthly basis.
\section{Acknowledgement}
\bibliographystyle{unsrt}
%\bibliography{references} %%% Remove comment to use the external .bib file (using bibtex).
%%% and comment out the ``thebibliography'' section.
This is our first project, carried out under the guidance of our mentor Ankit Tewari, who showed us how to use the KNN algorithm to predict the target value. He advised us and helped us to overcome the obstacles and problems we faced while completing the task.
%%% Comment out this section when you \bibliography{references} is enabled.
\begin{thebibliography}{1}
\bibitem{kour2014real}
G. James, D. Witten, T. Hastie, and R. Tibshirani. \emph{An Introduction to Statistical Learning: with Applications in R}. Springer Texts in Statistics.
\bibitem{kour2014fast}
Notebook: \url{https://www.kaggle.com/nampaul/practice}
\bibitem{necromuralist}
\url{https://necromuralist.github.io/machine-learning-studies/posts/knn-regresion/}
\bibitem{hadash2018estimate}
\url{https://www.analyticsvidhya.com/blog/2018/08/k-nearest-neighbor-introduction-regression-python/}
\end{thebibliography}
\end{document}