RentalGPT Dash App

Selected as Plotly’s Top ChatGPT & Generative AI Project

This project covers the data analysis, data visualization, and Dash application development behind "RentalGPT", an interactive, user-friendly dashboard that serves multiple stakeholders. Built on the "Two Sigma Connect: Rental Listing Inquiries" dataset from Kaggle, it provides in-depth data analysis, interactive data visualization, and an app with predictive analytics and a virtual assistant.

Data Science Life Cycle

Business Understanding & Data Acquisition

The datasets are available on the Kaggle competition page [1].

  • train.json: the training set.
  • images_sample.zip: listing images organized by listing_id (a sample of 100 listings)
  • Kaggle-renthop.7z: listing images organized by listing_id. Total size: 78.5 GB compressed.

Note: this project treats the training set as the full dataset, since the goal is not to win the competition but to build a tool for data analysis and exploration.
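As a rough sketch of getting train.json into a row-per-listing form, the snippet below assumes the file is column-oriented JSON (each column name maps to a dictionary of row key to value). That layout, the helper names, and the column names in the usage note are assumptions for illustration, not taken from the project's code.

```python
import json


def records_from_columns(columns):
    """Turn {column: {row_key: value}} into a list of per-listing dicts."""
    if not columns:
        return []
    # Assumption: every column shares the row keys of the first column.
    row_keys = list(next(iter(columns.values())).keys())
    return [
        {name: values.get(key) for name, values in columns.items()}
        for key in row_keys
    ]


def load_listings(path):
    """Read a column-oriented JSON file (hypothetical layout) from disk."""
    with open(path, encoding="utf-8") as fh:
        return records_from_columns(json.load(fh))
```

Calling `load_listings("train.json")` would then give one dictionary per listing, from which fields such as `price` or `interest_level` can be read if they are present in the file.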

Data Processing & Feature Engineering

Features are built with the following techniques:

  • Tabular feature extraction.
  • Sentiment extraction via a pretrained BERT model from Hugging Face (benchmarked on the SST dataset).
  • Image feature extraction via YOLOv5 (PyTorch).
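The sentiment step could look roughly like the sketch below. It assumes the Hugging Face `transformers` sentiment-analysis pipeline and the public SST-2 DistilBERT checkpoint; the README does not name the exact checkpoint, so the model name and both helper functions here are illustrative assumptions.

```python
def signed_sentiment(result):
    """Map {'label': 'POSITIVE' or 'NEGATIVE', 'score': p} to a value in [-1, 1]."""
    sign = 1.0 if result["label"].upper() == "POSITIVE" else -1.0
    return sign * float(result["score"])


def score_descriptions(
    texts,
    model_name="distilbert-base-uncased-finetuned-sst-2-english",  # assumed checkpoint
):
    """Return one signed sentiment score per listing description."""
    # Imported lazily: transformers is a heavy dependency and the first call
    # downloads model weights.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis", model=model_name)
    # truncation=True keeps long descriptions within the model's input limit.
    results = classifier(list(texts), truncation=True)
    return [signed_sentiment(r) for r in results]
```

The signed score can be stored as a single numeric column alongside the tabular features, which keeps it easy to feed into the downstream models.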

Notebooks

EDA

Explore the dataset and reveal underlying information through static plots of the processed features. The analysis covers:

  • Outlier detection and removal, PCA.
  • Statistical tests: Kolmogorov-Smirnov (K-S), Shapiro-Wilk, D'Agostino's K^2.
  • Plots: bar plot, count plot, pie chart, distribution plot, pair plot, heatmap, histogram with KDE, QQ plot, KDE, regression plot (scatter with regression line), boxen plot, area plot, violin plot, joint plot (KDE and scatter), rug plot, 3D plot, contour plot, cluster map, hexbin, strip plot, swarm plot, subplots.
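As an illustration of the outlier-removal step, here is a minimal sketch using the interquartile-range (IQR) rule. The README does not say which outlier rule the notebooks use, so the IQR rule, the 1.5 multiplier, and the sample prices are all assumptions.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Made-up monthly rents with one implausible entry at the end.
prices = [1500, 2000, 2200, 2500, 2800, 3000, 3200, 3500, 4000, 4490000]
cleaned = remove_outliers(prices)  # the last entry falls outside the fences
```

The same fences can be applied per column before running PCA or the normality tests, so that a handful of extreme listings does not dominate the results.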

Notebooks, Codebase

ML Modeling

The models are scikit-learn's SVM, Decision Tree, Random Forest, MLP, and KNN, applied to four prediction tasks:

  • Type 1: Predict the interest level from the user's tabular input alone (no image-derived features).
  • Type 2: Predict the interest level from the user's tabular input plus the image.
  • Type 3: Predict the price from the user's tabular input alone (no image-derived features).
  • Type 4: Predict the price from the user's tabular input plus the image.
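A comparison of the five model families on an interest-level task might be sketched as follows. The synthetic data stands in for the real features, and the hyperparameters, scaling choices, and function names are assumptions rather than the project's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """One estimator per model family; scale-sensitive models get a scaler."""
    return {
        "svm": make_pipeline(StandardScaler(), SVC()),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "mlp": make_pipeline(
            StandardScaler(), MLPClassifier(max_iter=1000, random_state=seed)
        ),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    }


def compare_models(X, y, cv=3):
    """Mean cross-validated accuracy for each model family."""
    return {
        name: float(cross_val_score(model, X, y, cv=cv).mean())
        for name, model in build_models().items()
    }


# Synthetic stand-in for tabular features with three interest levels.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
scores = compare_models(X, y)
```

For the price tasks (Types 3 and 4), the same structure would apply with the regressor counterparts (`SVR`, `DecisionTreeRegressor`, `RandomForestRegressor`, `MLPRegressor`, `KNeighborsRegressor`); for Types 2 and 4, image-derived columns would be appended to `X`.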

Notebooks

Dash App

Codebase

How to Run

Instructions to run Dash app can be found here.

Instructions to run EDA scripts can be found here.

Pre-run notebooks with results can be found here.

Instructions to run HuggingFace Chatbot with Dash template can be found here.

Instructions to run Interest-Level Prediction with Dash template can be found here.

Demo

Overview Page

Apartments Listing Page

Data Analysis Page

Data Visualization Page

Interest Level Prediction Page

Rental Cost Prediction Page

Virtual Assistant Page

Presentation & Report

Presentation

Report

References

[1] “Two Sigma Connect: Rental Listing Inquiries | Kaggle,” Kaggle.com, 2023. https://www.kaggle.com/competitions/two-sigma-connect-rental-listing-inquiries/data?select=train.json.zip (accessed Nov. 23, 2023).

[2] Hugging Chat API

[3] HugChat Chatbot with Streamlit Blog

[4] HugChat Chatbot with Streamlit Code

[5] OpenAssistant LLaMA 30B SFT 6

[6] HuggingChat - New Open Source Alternative to ChatGPT

[7] Open Assistant

[8] Dash with ChatGPT Code

[9] HugChat API Repository