Note: This project is part of the BGSW (Bosch) GenAI Hackathon.
Pitch Deck
Cargo is an AI-powered multimodal RAG chatbot that helps automotive companies work with their extensive user manuals. Users can query PDF files that include images, text, and tables.
- Q&A with multiple PDFs.
- Analyses texts, tables and images in given documents.
- LLM integration for intelligent responses.
- Multimodal search and retrieval.
- Search using images.
- LangChain: simplifies the integration of language models into applications, facilitating complex natural language processing tasks.
- Pinecone: offers a vector database that allows efficient similarity search and retrieval for high-dimensional data.
- Vertex AI: Machine learning platform by Google that gives access to Large Language Models.
- Google AI Studio: platform that gives access to a text embedding model and Gemini.
- Unstructured: core library for partitioning, cleaning, and chunking document types for LLM applications.
- Cloudinary: provides a secure and comprehensive API for easily uploading media files.
- Streamlit: allows for the rapid development of interactive web applications with minimal coding effort.
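To make the "similarity search" idea behind the vector database concrete, here is a minimal sketch in plain Python. It does not use the Pinecone client and is not how Cargo is implemented; the function names and the toy three-dimensional vectors are hypothetical, chosen only to show how a query embedding is ranked against stored embeddings by cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
toy_index = {
    "brake-section": [0.9, 0.1, 0.0],
    "engine-section": [0.1, 0.9, 0.0],
    "warranty-table": [0.0, 0.1, 0.9],
}

result = top_k([1.0, 0.2, 0.0], toy_index, k=2)
print(result)
```

A managed vector database performs the same kind of ranking, but with approximate indexes so that it stays fast across millions of vectors.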
- Fork the repo
- Clone the repo to your local machine
git clone https://github.com/codedmachine111/cargo.git
- Change current directory
cd cargo
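- Install the latest version of Python, then create and activate a virtual environment (the activation path shown is for Windows; on macOS or Linux, run source venv/bin/activate instead):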
python -m venv venv
./venv/Scripts/activate
- Google Cloud Platform setup:
- Login to Google Cloud Platform and create a new project.
- Go to the project dashboard.
- Navigate to IAM & Admin > Service Accounts.
- Click Create Service Account.
- Grant the necessary permissions to this service account (e.g., Vertex AI User).
- Click on your newly created service account.
- Create a new key (JSON), rename it to secret.json, and copy it to the root directory of the project.
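Before starting the app, it can help to confirm that secret.json really is a service account key. The sketch below is an optional sanity check, not part of Cargo itself; the helper names are hypothetical, and the field list covers only a few of the fields that Google service account key files contain.

```python
import json

# A few fields present in Google service account JSON keys.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def key_problems(key_data):
    """Return a list of human-readable problems found in a parsed key file."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key_data]
    if key_data.get("type") not in (None, "service_account"):
        problems.append("type is not 'service_account'")
    return problems


def load_key(path="secret.json"):
    """Read and parse the key file from the project root."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, running `print(key_problems(load_key()))` from the project root prints an empty list when the expected fields are present.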
- Create a .env file in the root directory of the project and add:
PINECONE_API_KEY="YOUR-API-KEY"
GOOGLE_API_KEY="YOUR-API-KEY"
PROJECT_ID="YOUR-GOOGLE-CLOUD-PROJECT-ID"
CLOUDINARY_NAME="YOUR-CLOUDINARY-CLOUD-NAME"
CLOUDINARY_API_KEY="YOUR-CLOUDINARY-API-KEY"
CLOUDINARY_API_SECRET="YOUR-CLOUDINARY-API-SECRET"
Get your Google API key from Google AI Studio and your Pinecone API key from the Pinecone console.
For Cloudinary, create a new account, then navigate to Media Library -> Settings -> API keys to find your credentials. Your cloud name is displayed at the top left of the console.
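A quick way to catch a missing or placeholder credential is to parse the .env file and compare it against the six keys listed above. This is a minimal sketch with a hand-rolled parser and hypothetical helper names; it is not how Cargo loads its settings, and a library such as python-dotenv handles more edge cases.

```python
REQUIRED_KEYS = (
    "PINECONE_API_KEY",
    "GOOGLE_API_KEY",
    "PROJECT_ID",
    "CLOUDINARY_NAME",
    "CLOUDINARY_API_KEY",
    "CLOUDINARY_API_SECRET",
)


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """List required keys that are absent, empty, or still a YOUR-... placeholder."""
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("YOUR-")
    ]
```

For example, `missing_keys(parse_env_text(open(".env").read()))` returns an empty list once every value has been filled in.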
- Install Tesseract OCR on your machine.
For Windows:
- Download tesseract exe from here.
- Install the .exe file in C:\Program Files\Tesseract-OCR.
- Add a new path to the system environment variables:
- In the Windows search bar, search for “Environment Variables” and open “Edit the system environment variables.”
- Next, in the “System Properties” window, click on the “Environment Variables” button.
- Under “System variables,” find the “Path” variable, select it, and click the “Edit” button.
- Click the “New” button and add the path to the Tesseract installation directory: C:\Program Files\Tesseract-OCR
- Then, click “OK” to save the changes.
Note: Follow this guide if you are on macOS or Linux.
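After installing Tesseract and updating the Path variable, you can check from Python whether the executable is discoverable. This is a small sketch using only the standard library; the helper name `locate_tool` is hypothetical, and you need to open a new terminal after editing environment variables for the change to be visible.

```python
import shutil


def locate_tool(name="tesseract"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


path = locate_tool()
if path is None:
    print("tesseract was not found on PATH; recheck the installation steps")
else:
    print(f"tesseract found at {path}")
```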
- Install all dependencies:
pip install -r requirements.txt
- Start the app:
streamlit run main.py