Pradigi Hiring Task

A Q&A chatbot for the website.

Setup:

Set `.env` from `.env.example`

MISTRAL_API_KEY=MistralAI API Key
QDRANT_API_KEY=Qdrant API Key (only if hosted using qdrant cloud)
QDRANT_URL=Qdrant URL
QDRANT_CN=Qdrant Collection Name
PROJ_BASE_URL=https://pratham.org/
JINA_RR_API_KEY=API Key for Jina AI Reranker v2 [optional]
FASTEMBED_MODEL=Embedding model name for fastembed (One of: 1. jinaai/jina-embeddings-v2-small-en or 2. BAAI/bge-small-en-v1.5)
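
The variables above can be validated once at startup, before ingestion or chat begins. The following is a minimal sketch, not the repository's actual code: `load_settings`, `REQUIRED_VARS`, `OPTIONAL_VARS`, and `ALLOWED_EMBEDDING_MODELS` are hypothetical names, and it assumes the `.env` values have already been loaded into the process environment. Treating `PROJ_BASE_URL` as required is also an assumption; the two API keys marked optional above are treated as optional here.

```python
import os

# Hypothetical names for illustration; the project may organize this differently.
REQUIRED_VARS = ("MISTRAL_API_KEY", "QDRANT_URL", "QDRANT_CN", "PROJ_BASE_URL", "FASTEMBED_MODEL")
OPTIONAL_VARS = ("QDRANT_API_KEY", "JINA_RR_API_KEY")
ALLOWED_EMBEDDING_MODELS = ("jinaai/jina-embeddings-v2-small-en", "BAAI/bge-small-en-v1.5")


def load_settings(env=None):
    """Return a dict of settings, raising ValueError on missing or invalid values."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))
    if env["FASTEMBED_MODEL"] not in ALLOWED_EMBEDDING_MODELS:
        raise ValueError("Unsupported FASTEMBED_MODEL: " + env["FASTEMBED_MODEL"])
    settings = {name: env[name] for name in REQUIRED_VARS}
    # Optional keys are included only when set to a non-empty value.
    settings.update({name: env[name] for name in OPTIONAL_VARS if env.get(name)})
    return settings
```

Failing fast on a missing key or an unsupported embedding model avoids discovering the problem partway through a long ingestion run.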

Run Locally

In a Python 3.12 environment shell of your choice, run the following commands:

(venv) python -m pip install pre-commit pip-tools
(venv) pip-sync requirements/requirements.txt

Run locally in dev mode:

(venv) pip-sync requirements/requirements.txt requirements/requirements-dev.txt

Ingestion:

To ingest data (from pre-defined sources), run the following:

(venv) python -m src.ingest

Note: It is recommended to run this command in a tmux session, since ingestion can take a long time depending on the chunk size and embedding model used.
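
To see why chunk size affects ingestion time, consider how many chunks (and therefore how many vectors to embed and store) a document produces. The sketch below is an illustration only, not the logic in `src.ingest`: `chunk_text` is a hypothetical helper that splits on characters with a fixed overlap, whereas the real pipeline may split on tokens or sentences.

```python
def chunk_text(text, chunk_size, overlap=0):
    """Split text into character windows of chunk_size; each window overlaps the previous by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


document = "x" * 10_000
for size in (256, 512, 1024):
    # Smaller chunks mean more vectors to embed and upload.
    print(size, len(chunk_text(document, size, overlap=32)))
```

Under these assumptions, halving the chunk size roughly doubles the number of chunks, so ingestion time grows accordingly.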

Chat UI:

To start the chat UI, run the following:

(venv) python -m src.chat

Then open a new tab in your favourite browser, and enter http://localhost:7860/ into the address bar.

To re-rank or not to:

While running in dev mode, you can also use the Jina AI Reranker model to filter the chunks of text passed to generation. We observe that this approach is effective only when the number of vectors retrieved (top_k) is 50 or greater.

To use the reranker:

1. Set `JINA_RR_API_KEY` in `.env`.
2. Uncomment the following lines in `chat.py`:
   a. line 37
   b. line 43
3. Change the `similarity_top_k` argument at line 42 from 10 to 50 or greater. A higher value means a higher cost per request.

For most of our use cases, top_k = 10 is a good default and does not require the reranker API.
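
The trade-off above follows from how two-stage retrieval works: a cheap vector search returns a wide candidate set, and a more expensive reranker rescores it and keeps only the best few. The sketch below illustrates that flow only; it does not call the Jina AI API or reproduce `chat.py`. `retrieve_then_rerank`, `toy_vector_search`, `toy_rerank_score`, and `CORPUS` are hypothetical stand-ins.

```python
def retrieve_then_rerank(query, search, rerank_score, top_k=50, top_n=5):
    """Fetch top_k candidates with `search`, rescore them, and keep the best top_n."""
    candidates = search(query, top_k)
    # sorted() is stable, so ties keep their original retrieval order.
    ranked = sorted(candidates, key=lambda chunk: rerank_score(query, chunk), reverse=True)
    return ranked[:top_n]


CORPUS = [
    "Opening hours and contact details.",
    "The annual report covers learning outcomes.",
    "How to volunteer at a local centre.",
    "Learning outcomes are measured by the annual survey.",
]


def toy_vector_search(query, top_k):
    # Stand-in for a vector store query: returns the first top_k chunks.
    return CORPUS[:top_k]


def toy_rerank_score(query, chunk):
    # Stand-in for a reranker: counts query words that appear in the chunk.
    words = set(query.lower().split())
    return sum(1 for word in words if word in chunk.lower())


best = retrieve_then_rerank(
    "learning outcomes survey", toy_vector_search, toy_rerank_score, top_k=4, top_n=2
)
print(best)
```

With a small `top_k` the reranker has few candidates to reorder, which is consistent with the observation that it only pays off for wide searches.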

