College Q&A Chatbot

This project is a Q&A chatbot that answers questions about the Global Academy of Technology (GAT) using a combination of large language models (LLMs), text embeddings, retrieval-augmented generation (RAG), and prompt engineering. It accepts both text and audio input and bases its answers on the conversation history and a set of preloaded documents.

Features

- Text and Audio Input: Accepts user queries via text input or voice recording.
- Retrieval-Augmented Generation (RAG): Enhances responses using relevant information retrieved from preloaded documents.
- Context-Aware Responses: Uses conversation history to provide coherent and contextually appropriate answers.
- Streamlit Interface: User-friendly interface built with Streamlit, with options for follow-up mode and search depth.

Installation

Clone the Repository

git clone https://github.com/mahadev0811/CollegeChatbot.git
cd CollegeChatbot

Create a Virtual Environment

python -m venv venv
source venv/bin/activate   # On Windows use `venv\Scripts\activate`

Install Dependencies
```
pip install -r requirements.txt
```
Configure API Key
- Create a config.json file in the root directory with your Google API key:
```
{
  "google_api_key": "YOUR_GOOGLE_API_KEY"
}
```
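
As an illustration of how a script might read this file, here is a minimal sketch. The helper name `load_api_key` is hypothetical and is not taken from st_app.py; only the `google_api_key` field comes from the config format above.

```python
import json
from pathlib import Path


def load_api_key(config_path="config.json"):
    """Return the Google API key stored in the JSON config file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    key = config.get("google_api_key", "")
    # Reject a missing key or the unedited placeholder from the template.
    if not key or key == "YOUR_GOOGLE_API_KEY":
        raise ValueError(f"Set 'google_api_key' in {config_path}")
    return key
```

Failing early on the placeholder value gives a clearer error than a rejected API request later on.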

Usage

Run the Application
```
streamlit run st_app.py
```
Interact with the Chatbot
- Use the text input box or the audio recorder to submit your queries.
- Adjust settings using the sidebar options:
  - Follow-up Questions Mode: Toggle to use conversation history for responses.
  - Search Depth: Adjust the number of paragraphs to search in the documents for relevant information.
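
To make the two sidebar settings concrete, here is a rough sketch of how they could feed a RAG pipeline. It assumes that search depth means "keep the top-k paragraphs by cosine similarity" and that follow-up mode adds earlier turns to the prompt. The function names, the prompt layout, and the similarity measure are all illustrative assumptions, not code from st_app.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_paragraphs(query_vec, paragraph_vecs, paragraphs, search_depth):
    """Return the `search_depth` paragraphs most similar to the query."""
    scored = sorted(
        zip(paragraphs, paragraph_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:search_depth]]


def build_prompt(question, context, history, follow_up_mode):
    """Assemble the LLM prompt; history is included only in follow-up mode."""
    parts = ["Context:\n" + "\n\n".join(context)]
    if follow_up_mode and history:
        turns = [f"User: {q}\nAssistant: {a}" for q, a in history]
        parts.append("Conversation so far:\n" + "\n".join(turns))
    parts.append("Question: " + question)
    return "\n\n".join(parts)
```

Under these assumptions, a larger search depth gives the model more context at the cost of a longer prompt, and turning follow-up mode off makes each question independent of earlier ones.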

Project Structure

- st_app.py: Main application script.
- embeddings_generator.py: Script for generating embeddings from the data file.
- webscrapper.ipynb: Jupyter notebook for scraping text data from URLs to generate the raw data file.
- config.json: Configuration file for API keys.
- requirements.txt: List of required Python packages.
- data_generation/gat_raw.txt: Raw scraped data containing information about GAT.
- data_generation/gat_refined.txt: Manually reviewed and edited version of the raw data.
- gat_embeddings.pkl: Precomputed embeddings for the preloaded document.

Generating Data

To generate the initial raw data file (gat_raw.txt), use the webscrapper.ipynb notebook. This notebook scrapes text content from given URLs and formats it appropriately.
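
For readers who want a feel for this step without opening the notebook, the following is a minimal standard-library sketch of extracting page text. It is not the notebook's code: the names `TextExtractor`, `html_to_text`, and `scrape` are hypothetical, and the notebook may use different libraries and formatting rules.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html):
    """Return the page text with one blank line between text chunks."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.chunks)


def scrape(url):
    """Download a page and return its extracted text."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return html_to_text(response.read().decode(charset, errors="replace"))
```

Joining chunks with a blank line produces text that is already separated by double newlines, which matches the paragraph format the embedding step expects.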

Generating Embeddings

To generate embeddings from your data file, use the embeddings_generator.py script. It reads a text file containing the data, embeds each paragraph with a FlagEmbedding model, and saves the embeddings as a pickle file.

Running the Embedding Generator

Prepare Your Data File

Ensure your data file (e.g., data_generation/gat_refined.txt) is in the correct format, with paragraphs separated by double newlines (\n\n).
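
A quick way to check that a data file splits as intended is sketched below. The function `split_paragraphs` is a hypothetical helper; the script's own parsing may be stricter (for example, splitting on exactly `\n\n`).

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs on blank lines, dropping empty chunks."""
    # Normalize Windows line endings so "\r\n\r\n" also counts as a separator.
    text = text.replace("\r\n", "\n")
    # Two or more consecutive newlines mark a paragraph boundary.
    chunks = re.split(r"\n{2,}", text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]
```

Running this on the data file and printing the number of paragraphs is an easy sanity check before generating embeddings.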

Run the Script

Run the embeddings_generator.py script with the path to your data file as an argument:
```
python embeddings_generator.py --data_file data_generation/gat_refined.txt
```
The script will generate embeddings for the paragraphs in the data file and save them as a pickle file (gat_embeddings.pkl).
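
The save-and-load round trip can be sketched as follows. Everything here is an assumption made for illustration: the layout of the pickle (a dict with `paragraphs` and `embeddings` keys) is a guess at a reasonable format, and `toy_embed` is a stand-in that keeps the example runnable without the FlagEmbedding model.

```python
import pickle


def toy_embed(paragraph):
    """Placeholder embedding: NOT the FlagEmbedding model, just letter counts."""
    letters = [c for c in paragraph.lower() if c.isalpha()]
    vowels = sum(c in "aeiou" for c in letters)
    return [float(len(letters)), float(vowels)]


def save_embeddings(paragraphs, out_path):
    """Embed each paragraph and pickle the texts and vectors together."""
    payload = {
        "paragraphs": paragraphs,
        "embeddings": [toy_embed(p) for p in paragraphs],
    }
    with open(out_path, "wb") as f:
        pickle.dump(payload, f)


def load_embeddings(path):
    """Load the pickled payload; only unpickle files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Storing the paragraph texts next to their vectors means the application can map a retrieved vector back to its text without re-reading the data file. Because unpickling can execute arbitrary code, pickle files from untrusted sources should not be loaded.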

Screenshots

This video shows the chatbot in action, answering questions about GAT:

recordings.mp4

Credits

- FlagEmbedding: Open-source embedding library used for encoding queries.
- Streamlit: Open-source app framework for ML and data science projects.
- Hugging Face Transformers: Library for state-of-the-art NLP models.
- Google Cloud Speech-to-Text API: Service for converting speech into text.
- Google Generative AI: Used for generating responses.

License

This project is licensed under the MIT License - see the LICENSE file for details.
