Information Retrieval Project

This repository contains the 2023 CZ4043 project for Group 31.

Application Structure

The application has several moving parts, summarised as follows:

Web app: The client, written in React, through which the user interacts with the search engine (/web-app). To reduce complexity, the web app communicates directly with both the backend and the indexing engine, since the engine can output data as JSON directly.

Indexing Engine: A Solr server that indexes the data so that it can be searched and the relevant information retrieved.

Backend: A Flask server that runs the crawler script, which fetches new data when requested. It scrapes specific sites such as Cars.com for additional data (/backend).

ML Model: A RoBERTa model that was fine-tuned on large sentiment datasets and then further fine-tuned by us on the cars dataset. This follows the principle of task-specific training followed by domain-specific training, as highlighted in this paper (/classification).

Caddy: A reverse proxy used to avoid CORS errors.

Docker: Used to run all the applications.
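Because Solr returns JSON from its standard select handler, a client can query it without an intermediate service. The following is a minimal Python sketch of that request, not the web app's actual code (which is written in React). The base URL is an assumption; the collection name info_retrieval comes from Initial Setup, and the /solr/ prefix from the Caddy route.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Caddy is reachable on localhost and forwards /solr/* to Solr.
BASE_URL = "http://localhost"
COLLECTION = "info_retrieval"  # collection name used in Initial Setup


def build_select_url(query, rows=10, base_url=BASE_URL, collection=COLLECTION):
    """Build a URL for Solr's standard /select handler with JSON output."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/solr/{collection}/select?{params}"


def extract_docs(payload):
    """Return the matching documents from a decoded Solr select response."""
    return payload.get("response", {}).get("docs", [])


def search(query, rows=10):
    """Run the query against a live Solr server and return its documents."""
    with urlopen(build_select_url(query, rows)) as resp:
        return extract_docs(json.load(resp))
```

Calling search("toyota") needs the containers to be running; build_select_url and extract_docs can be tried on their own without a server.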

A visual representation is as follows:

flowchart LR
    A["fa:fa-user Client"] --> B["fa:fa-network-wired Caddy"]
    subgraph Docker
    B -->|/solr/*| C[(Solr)]
    B -->|/api/*| D["fa:fa-server Flask"]
    B -->|/*| E["fa:fa-react React"]
    C[(Solr)] --> Zoo
    end
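
The routing in the flowchart amounts to prefix matching on the request path, with the React app as the catch-all. The sketch below illustrates that logic in Python; it is not the Caddy configuration itself, and the upstream names are labels taken from the diagram.

```python
# Route table copied from the flowchart; upstream names are labels only.
ROUTES = [
    ("/solr/", "Solr"),
    ("/api/", "Flask"),
    ("/", "React"),  # catch-all, must stay last
]


def pick_upstream(path):
    """Return the service that would receive a request for this path."""
    for prefix, upstream in ROUTES:
        if path.startswith(prefix):
            return upstream
    raise ValueError(f"not an absolute path: {path!r}")
```

Since every service sits behind the same origin, the browser never makes a cross-origin request, which is how the proxy avoids CORS errors.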

Requirements

You will need Docker to run the application. To start the containers run

docker compose up -d

To stop the containers run

docker compose stop

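To stop and remove the containers and the project network run the following (images are kept unless the --rmi option is passed)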

docker compose down

Initial Setup

1. Start the containers with the Docker command above.
2. Navigate to the Solr Server.
3. Create a new collection called info_retrieval.
4. Upload the data.csv file found at the root directory to this collection:
   1. Go to the collection.
   2. Go to the Documents section.
   3. Change Document type to File upload.
   4. Select the CSV file and then press submit.
5. Navigate to the Homepage of the app.
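
The upload step can also be scripted instead of using the Solr admin UI. Below is a minimal Python sketch that posts the file to Solr's update handler with a CSV content type. The base URL and the working directory containing data.csv are assumptions, and the collection must already exist.

```python
from urllib.request import Request, urlopen

# Assumptions: Caddy listens on localhost and proxies /solr/* to Solr;
# data.csv sits in the current working directory.
BASE_URL = "http://localhost"
COLLECTION = "info_retrieval"


def build_upload_request(csv_bytes, base_url=BASE_URL, collection=COLLECTION):
    """Prepare a POST to Solr's /update handler that indexes CSV and commits."""
    url = f"{base_url}/solr/{collection}/update?commit=true"
    return Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "application/csv"},
        method="POST",
    )


def upload_csv(path="data.csv"):
    """Send the CSV file to Solr and return the HTTP status code."""
    with open(path, "rb") as fh:
        request = build_upload_request(fh.read())
    with urlopen(request) as resp:
        return resp.status
```

Running upload_csv() from the repository root after the containers are up should have the same effect as steps 4.1 to 4.4; the commit=true parameter makes the documents searchable as soon as the request completes.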

Explanation of Files

data.csv: The fully processed data that is to be loaded into the Solr server for consumption.

docker-compose.yml: The compose file to spin up all the project containers.

