Our Hotel Price Prediction project predicts the price of a hotel from its listing. The model takes images, text reviews and descriptions, ratings in several categories, and location as inputs, and outputs a price prediction. The goal is a tool that quickly combines the information spread across many reviews, numerical fields, and images to tell users what a hotel's price should be, so they can judge whether a hotel is overpriced, fairly priced, or a good deal. We trained the model with supervised learning: a fine-tuned pretrained text transformer encodes the text data, a fine-tuned pretrained CNN encodes the image data, and the embeddings from both networks are fed into a fully connected network along with the numerical data.
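The fusion step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the class name `FusionRegressor`, the embedding sizes (768 for text, 512 for images), the number of numerical features, and the hidden width are all assumptions, and random tensors stand in for the outputs of the text transformer and the CNN.

```python
import torch
from torch import nn


class FusionRegressor(nn.Module):
    """Hypothetical late-fusion head: concatenates embeddings and numeric features."""

    def __init__(self, text_dim=768, image_dim=512, num_numeric=8, hidden_dim=256):
        super().__init__()
        # Fully connected network over the concatenated inputs (sizes are assumed).
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + image_dim + num_numeric, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, text_emb, image_emb, numeric):
        # Join the three inputs along the feature dimension.
        fused = torch.cat([text_emb, image_emb, numeric], dim=1)
        # One price prediction per listing.
        return self.mlp(fused).squeeze(-1)


model = FusionRegressor()
batch = 4
text_emb = torch.randn(batch, 768)    # stand-in for transformer embeddings
image_emb = torch.randn(batch, 512)   # stand-in for CNN embeddings
numeric = torch.randn(batch, 8)       # stand-in for ratings, location, etc.
prices = torch.rand(batch) * 300      # made-up target prices

pred = model(text_emb, image_emb, numeric)
loss = nn.functional.mse_loss(pred, prices)  # supervised regression loss
loss.backward()
```

Mean squared error is used here only as a common default for regression; the loss actually used in training may differ.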
In addition to the price prediction model, we explored training a causal model to learn the latent variable "quality": the intrinsic value of a hotel, independent of everything else. We explored an autoencoder-like architecture and analyzed different approaches to extracting a latent quality variable that is not directly expressed by any feature in hotel listing data.
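As a rough illustration of what "autoencoder-like" can mean here, the sketch below compresses a listing's features into a single scalar and reconstructs the features from it. It is an assumption-laden toy, not the architecture in causal_model.ipynb: the class name `QualityAutoencoder`, the one-dimensional bottleneck, the layer sizes, and the random input features are all hypothetical.

```python
import torch
from torch import nn


class QualityAutoencoder(nn.Module):
    """Hypothetical autoencoder whose 1-D bottleneck plays the role of "quality"."""

    def __init__(self, num_features=16, hidden_dim=32):
        super().__init__()
        # Encoder: listing features -> single latent scalar.
        self.encoder = nn.Sequential(
            nn.Linear(num_features, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )
        # Decoder: latent scalar -> reconstructed listing features.
        self.decoder = nn.Sequential(
            nn.Linear(1, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_features),
        )

    def forward(self, x):
        quality = self.encoder(x)
        return self.decoder(quality), quality


features = torch.randn(8, 16)  # made-up listing features
autoencoder = QualityAutoencoder()
reconstruction, quality = autoencoder(features)
recon_loss = nn.functional.mse_loss(reconstruction, features)
```

A plain reconstruction loss does not by itself guarantee that the bottleneck captures quality rather than some other dominant factor, which is one reason different approaches to the problem are worth comparing.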
To get a local copy up and running, follow these steps.
Required packages
- pip
pip install torch torchvision torchaudio transformers datasets pandas numpy matplotlib seaborn scikit-learn
- Clone the repo
git clone https://github.com/john-zhang-uoft/hotel_price_prediction
- Install Python packages
pip install torch torchvision torchaudio transformers datasets pandas numpy matplotlib seaborn scikit-learn
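After installing, a quick check like the one below can confirm that the packages listed above are importable. It is an optional convenience sketch, not part of the repo; the helper name `missing_modules` is hypothetical. Note that scikit-learn is imported as `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages in the pip command (scikit-learn -> sklearn).
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "transformers", "datasets",
    "pandas", "numpy", "matplotlib", "seaborn", "sklearn",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```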
causal_model.ipynb trains the quality encoding model.
multi-modal_network.ipynb trains the final multi-modal price prediction network.
extract_json.ipynb unpacks the JSON data.
final_dataset.ipynb constructs the final dataset after filtering images.
image_heuristics.ipynb computes the predictive heuristics we used to simplify the task of regressing on images.
We would like to thank Professor Michael Guerzhoy and our TA, Parsa Farinneya, for their excellent teaching and help this semester.