In this project, I analyze data from the Airbnb website to predict multiple continuous rating scores. Two regression models, Ridge and K-Nearest Neighbors (KNN), are applied to the extracted features to predict the scores. Finally, the feature importance in each model is presented to show which features are most related to the rating scores. See the report for details.
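As a rough illustration of the KNN half of this setup, the sketch below predicts several continuous scores at once by averaging the targets of the k nearest training points, then measures the mean squared error. It is a dependency-free toy, not the code in regression_model.py: the function names `knn_predict` and `mse`, the Euclidean distance, and the data are all assumptions made for the example.

```python
import math

def knn_predict(train_X, train_Y, query, k):
    """Predict a vector of scores as the mean of the k nearest training targets."""
    # Rank training points by Euclidean distance to the query (assumed metric).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    n_targets = len(train_Y[0])
    # Average each target column separately over the selected neighbours.
    return [sum(train_Y[i][t] for i in nearest) / len(nearest) for t in range(n_targets)]

def mse(y_true, y_pred):
    """Mean squared error averaged over all samples and all targets."""
    errors = [
        (a - b) ** 2
        for row_true, row_pred in zip(y_true, y_pred)
        for a, b in zip(row_true, row_pred)
    ]
    return sum(errors) / len(errors)

# Made-up toy data: 2 features per listing, 2 rating scores per listing.
train_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0]]
train_Y = [[4.0, 4.5], [4.2, 4.7], [4.4, 4.3], [2.0, 2.5]]

pred = knn_predict(train_X, train_Y, [0.1, 0.1], k=3)
error = mse([[4.0, 4.5]], [pred])
```

With k=3 the distant fourth listing is ignored, so the prediction is the average of the first three target rows. Larger k smooths the prediction, and smaller k follows individual neighbours more closely.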
#finalProjects is the training code folder
#data is the data folder
#choose_hyperparameters.py : choose the values of C and K
#correct spelling.py : correct misspellings in the comments
#extract_tf-idf_features.py : extract TF-IDF features
#merge2vec.py : merge listings.csv and reviews.csv
#plot_importance.py : plot the feature importance as a bar chart
#plot_the_mse.py : plot the MSE for different features and models
#process_reviews.py
#process_listings.py
#process_description.py
#regression_model.py : train the model
#train_word2vec.py : train the word2vec model and extract word2vec features for sentences
#translate_false_info.py : translate the texts
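
To give a feel for the TF-IDF features mentioned in the list above, here is a minimal, dependency-free sketch. It assumes one common textbook variant (term frequency as count divided by document length, inverse document frequency as ln(N / df)) and whitespace tokenization. The script in this repository may use a library implementation with different smoothing or normalization, so the function name `tfidf` and the sample reviews are purely hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Made-up review snippets, for illustration only.
reviews = ["great location great host", "clean room", "great room"]
vectors = tfidf(reviews)
```

In this variant a term that occurs in every document gets a weight of zero, and rarer terms such as "location" outweigh more widespread ones such as "great" even when the latter is repeated.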