The bot enables you to explore the web, condense information, and search through documents.
- Bot client - React App
- Bot Server - Written in Python
- @minchat/react-chat-ui
- Bing APIs SDK for Python
- AnyScale API
- ChromaDB (in-memory)
- OpenAI (an API key must be set)
- Run the client locally:
```bash
cd client
npm start
```
- Run the server locally (using launch.json is recommended):
```bash
python3 main.py
```
- General Internet Search
- File Summarization
- Search within a document
- Semantic search on multiple documents
- The Client App initiates the query.
- The query is transmitted to the backend, which queries the Bing API.
- The backend subsequently generates a map of URLs and snippets, sending it to OpenAI/Llama for summarization.
- The summarized response is then relayed back to the client app.
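A minimal sketch of this flow, calling the Bing Web Search REST endpoint directly rather than through the SDK; the model name, environment variable names, and prompt wording are assumptions:

```python
import os
import requests
from openai import OpenAI

def web_summary(query: str) -> str:
    # Query Bing Web Search (v7 REST API) for the top results.
    resp = requests.get(
        "https://api.bing.microsoft.com/v7.0/search",
        headers={"Ocp-Apim-Subscription-Key": os.environ["BING_API_KEY"]},
        params={"q": query, "count": 5},
    )
    resp.raise_for_status()
    # Build the map of URLs to snippets described above.
    snippets = {p["url"]: p["snippet"] for p in resp.json()["webPages"]["value"]}
    # Hand the map to the LLM for summarization.
    prompt = "Summarize these search results:\n" + "\n".join(
        f"{url}: {text}" for url, text in snippets.items()
    )
    llm = OpenAI()  # expects OPENAI_API_KEY in the environment
    out = llm.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return out.choices[0].message.content
```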
Just-Another-Bot.mp4
The three predominant methods of summarization are:
- Directly extracting the data and forwarding it to the LLM for summarization. While suitable for smaller documents, this approach becomes costly and runs into token limits for larger documents.
- Employing Langchain MapReduce to divide the document into segments, sending each part to the LLM for summarization, and then consolidating the responses before sending them back to the LLM for the final summary. Although easy to implement, this method exhibits high latency, since it requires multiple invocations of the LLM.
- Utilizing K-means vector clustering, the approach implemented in this project involves the following steps:
  a. Segmenting the document into sections (paragraphs were used as sections in this case).
  b. Vectorizing each section.
  c. Employing K-means clustering to label and cluster the sections, with the number of clusters varying based on the document type.
  d. Identifying the representative chunk from each cluster.
  e. Combining all the representative chunks from the different clusters.
  f. Sending the consolidated representative chunks to the LLM for the final summary.
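A minimal sketch of this approach, assuming OpenAI embeddings and scikit-learn's KMeans; the embedding model and default cluster count are illustrative:

```python
import numpy as np
from openai import OpenAI
from sklearn.cluster import KMeans

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def embed(sections: list[str]) -> np.ndarray:
    # Step b: vectorize each section.
    resp = client.embeddings.create(model="text-embedding-3-small", input=sections)
    return np.array([item.embedding for item in resp.data])

def representative_chunks(sections: list[str], n_clusters: int = 5) -> list[str]:
    vectors = embed(sections)
    # Step c: label and cluster the sections.
    km = KMeans(n_clusters=n_clusters).fit(vectors)
    picks = []
    for label in range(n_clusters):
        members = np.where(km.labels_ == label)[0]
        # Step d: the representative chunk is the section closest to the centroid.
        dists = np.linalg.norm(vectors[members] - km.cluster_centers_[label], axis=1)
        picks.append(members[np.argmin(dists)])
    # Step e: combine the representatives, preserving document order.
    return [sections[i] for i in sorted(picks)]
```

The combined chunks then go to the LLM for the final summary (step f).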
- The user submits a document in the application and requests summarization.
- The document undergoes parsing and is stored temporarily.
- The backend subsequently executes steps 3.a to 3.f outlined in the preceding section.
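A hedged sketch of this flow, assuming a FastAPI route; the route name and the `parse_into_paragraphs` and `final_summary` helpers are hypothetical stand-ins for the project's actual parser and LLM call:

```python
import tempfile
from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/summarize")  # hypothetical route
async def summarize(file: UploadFile):
    # Stage the uploaded document in a temporary file for parsing.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(await file.read())
    sections = parse_into_paragraphs(tmp.name)  # hypothetical parser
    chunks = representative_chunks(sections)    # steps 3.a to 3.e above
    return {"summary": final_summary(chunks)}   # hypothetical LLM call, step 3.f
```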
Summarization.mp4
- The user uploads a document in the application and submits a list of questions.
- The document undergoes parsing and is stored temporarily.
- The backend extracts text from the document and generates embeddings.
- The document is clustered using K-Means clustering.
- The representative chunks for each cluster are sent to the LLM for summarization.
- The user's input query and the summary are sent to the LLM to get the final answer, which is then sent back to the user.
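A minimal sketch of the final step, assuming the OpenAI chat API; the model name and prompt wording are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def answer(question: str, summary: str) -> str:
    # Combine the user's query with the document summary for the final answer.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": "Answer using only the provided summary."},
            {"role": "user", "content": f"Summary:\n{summary}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```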
Search-document.mp4
- The user navigates to the Settings page and uploads a directory of files.
- File events are created in the SQLite Events table.
- The server processes the files and does the following:
  a. Chunking - 100 chunks per file
  b. Associate metadata with each chunk
  c. Generate and store embeddings in ChromaDB
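A minimal sketch of steps a to c, assuming ChromaDB's in-memory client and its default embedding function; the collection name and metadata keys are illustrative:

```python
import chromadb

client = chromadb.Client()  # in-memory instance, per the tech stack
collection = client.get_or_create_collection("documents")

def index_file(path: str, chunks: list[str]) -> None:
    # Steps b and c: attach metadata to each chunk and store embeddings.
    # ChromaDB computes the embeddings with its default embedding function here.
    collection.add(
        documents=chunks,
        metadatas=[{"source": path, "chunk": i} for i in range(len(chunks))],
        ids=[f"{path}-{i}" for i in range(len(chunks))],
    )

# Semantic search across all indexed files:
# collection.query(query_texts=["..."], n_results=5)
```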
```bash
source venv/bin/activate
pip freeze | xargs pip uninstall -y
pip install llama-index
```
- Better exception handling, especially in main.py and the handlers package
- Implement an intent classifier for query vs. command using Instructor (see the sketch below)
- Improve the deployment strategy
- Apply a stylesheet to the Settings page
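A hedged sketch of what the Instructor-based intent classifier might look like; the intent labels and model name are assumptions:

```python
from enum import Enum
import instructor
from openai import OpenAI
from pydantic import BaseModel

class Intent(str, Enum):
    QUERY = "query"      # assumed label
    COMMAND = "command"  # assumed label

class Classification(BaseModel):
    intent: Intent

# Instructor patches the OpenAI client to return validated Pydantic models.
client = instructor.from_openai(OpenAI())

def classify(text: str) -> Intent:
    result = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        response_model=Classification,
        messages=[{"role": "user", "content": f"Classify this user input: {text}"}],
    )
    return result.intent
```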