Nanyang Technological University CZ4034 Information Retrieval Group Project

M2-Luminous/CZ4043-Information-Retrieval

 
 

Information Retrieval Project

This repository contains the 2023 CZ4034 Information Retrieval project by Group 31.

Application Structure

The application has several moving parts, summarised as follows:

  • Web app: The React client through which the user interacts with the search engine (/web-app). To reduce complexity, the web app talks directly to both the backend and the indexing engine, since Solr can return results as JSON.
  • Indexing Engine: The Solr server that indexes the data and serves search queries, returning the relevant documents.
  • Backend: A Flask server that runs the crawler script, which fetches new data on request. It scrapes specific sites such as Cars.com for additional data (/backend).
  • ML Model: A RoBERTa model that was fine-tuned on large sentiment datasets and then further fine-tuned by us on the cars dataset. This follows the principle of task-specific training followed by domain-specific training, as highlighted in this paper (/classification).
  • Caddy: A reverse proxy used to avoid CORS errors.
  • Docker: Used to run all the applications.
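
Because Solr can return JSON directly, a client mostly needs to build a query URL. The Python sketch below shows roughly what such a request could look like. The /solr/ prefix and the info_retrieval collection name come from this README, and /select is Solr's standard query handler. The function name, the base URL, and the rows default are illustrative assumptions, not the project's actual code.

```python
from urllib.parse import urlencode

# Collection name taken from the setup steps in this README.
COLLECTION = "info_retrieval"


def build_select_url(query, base_url="http://localhost", rows=10):
    """Build a Solr select URL that asks for JSON output.

    base_url is a hypothetical address for the Caddy proxy; adjust it
    to wherever the stack is actually exposed.
    """
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/solr/{COLLECTION}/select?{params}"


print(build_select_url("reliable family car"))
```

Any HTTP client can then fetch the resulting URL and parse the response body as JSON.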

A visual representation is as follows:

flowchart LR
    A["fa:fa-user Client"] --> B["fa:fa-network-wired Caddy"]
    subgraph Docker
    B -->|/solr/*| C[(Solr)]
    B -->|/api/*| D["fa:fa-server Flask"]
    B -->|/*| E["fa:fa-react React"]
    C[(Solr)] --> Zoo
    end
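
The routing in the flowchart can be read as a simple prefix match on the request path. The Python sketch below only illustrates that decision; it is not Caddy's configuration, and the example path /api/crawl is a hypothetical endpoint used for demonstration.

```python
def route(path):
    """Return the upstream a request path would be forwarded to.

    Mirrors the three edges in the flowchart: /solr/* goes to Solr,
    /api/* goes to Flask, and everything else goes to React.
    """
    if path.startswith("/solr/"):
        return "solr"
    if path.startswith("/api/"):
        return "flask"
    return "react"


# "/api/crawl" is a made-up path, used only to exercise the /api/* branch.
for p in ("/solr/info_retrieval/select", "/api/crawl", "/index.html"):
    print(p, "->", route(p))
```

Serving all three services from a single origin in this way is what lets the browser avoid cross-origin requests.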

Requirements

You will need Docker to run the application. To start the containers in the background, run

docker compose up -d

To stop the containers, run

docker compose stop

To stop and remove the containers and their network, run

docker compose down

Initial Setup

  1. Start the containers with the docker compose up -d command above.
  2. Navigate to the Solr server.
  3. Create a new collection called info_retrieval.
  4. Upload the data.csv file found in the root directory to this collection:
    1. Go to the collection.
    2. Go to the Documents section.
    3. Change Document type to File upload.
    4. Select the CSV file and then press submit.
  5. Navigate to the homepage of the app.
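
Step 4 can likely also be scripted, since Solr's update handler accepts CSV data posted with the application/csv content type. The sketch below prepares such a request without sending it. The host and port are assumptions (8983 is Solr's default port, but the compose file may expose Solr differently), the function name is hypothetical, and the sample columns are made up for illustration.

```python
import urllib.request

# Assumed address: 8983 is Solr's default port; the compose file may differ.
SOLR_URL = "http://localhost:8983/solr/info_retrieval/update?commit=true"


def build_csv_upload(csv_bytes, url=SOLR_URL):
    """Prepare (but do not send) a POST that indexes CSV rows in Solr."""
    return urllib.request.Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "application/csv"},
        method="POST",
    )


# Hypothetical sample rows; the real columns are whatever data.csv contains.
sample = b"id,title\n1,Example review\n"
request = build_csv_upload(sample)
print(request.get_method(), request.full_url)
```

To index the real file, read data.csv in binary mode, pass its contents to build_csv_upload, and hand the result to urllib.request.urlopen once the containers are running.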

Explanation of Files

  • data.csv: The fully processed data that is to be loaded into the Solr server for consumption.
  • docker-compose.yml: The compose file to spin up all the project containers.
