The bot enables you to explore the web, condense information, and search through documents.
- Bot client - React App
- Bot Server - Written in Python
- @minchat/react-chat-ui
- Bing APIs SDK for Python
- AnyScale API
- ChromaDB (in-memory)
- OpenAI (an API key must be set)
- Run the client locally:
```bash
cd client
npm start
```
- Run the server locally (using launch.json is recommended):
```bash
python3 main.py
```
- General Internet Search
- File Summarization
- Search within a document
- Semantic search on multiple documents
- The Client App initiates the query.
- The query is transmitted to the backend, which queries the Bing API.
- The backend subsequently generates a map of URLs and snippets, sending it to OpenAI/Llama for summarization.
- The summarized response is then relayed back to the client app.
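A minimal sketch of this flow, calling the Bing Web Search REST endpoint directly rather than through the SDK; the model name, environment variable names, and prompt wording are assumptions:

```python
import os
import requests
from openai import OpenAI

def web_summary(query: str) -> str:
    # Query Bing Web Search (v7 REST API) for the top results.
    resp = requests.get(
        "https://api.bing.microsoft.com/v7.0/search",
        headers={"Ocp-Apim-Subscription-Key": os.environ["BING_API_KEY"]},
        params={"q": query, "count": 5},
    )
    resp.raise_for_status()
    # Build the map of URLs to snippets described above.
    snippets = {p["url"]: p["snippet"] for p in resp.json()["webPages"]["value"]}
    # Hand the map to the LLM for summarization.
    prompt = "Summarize these search results:\n" + "\n".join(
        f"{url}: {text}" for url, text in snippets.items()
    )
    llm = OpenAI()  # expects OPENAI_API_KEY in the environment
    out = llm.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return out.choices[0].message.content
```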
Just-Another-Bot.mp4
The three predominant methods of summarization are:
- Directly extracting the data and forwarding it to the LLM for summarization. While suitable for smaller documents, this approach becomes costly and runs into token limits for larger documents.
- Employing Langchain MapReduce to divide the document into segments, sending each part to the LLM for summarization, and then consolidating the responses before sending them back to the LLM for the final summary. Although easy to implement, this method exhibits high latency, since it requires multiple invocations of the LLM.
- Utilizing K-means vector clustering, the approach implemented in this project involves the following steps:
  a. Segmenting the document into sections (paragraphs were used as sections in this case).
  b. Vectorizing each section.
  c. Employing K-means clustering to label and cluster the sections, with the number of clusters varying based on the document type.
  d. Identifying the representative chunk from each cluster.
  e. Combining all the representative chunks from the different clusters.
  f. Sending the consolidated representative chunks to the LLM for the final summary.
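A minimal sketch of this approach, assuming OpenAI embeddings and scikit-learn's KMeans; the embedding model and default cluster count are illustrative:

```python
import numpy as np
from openai import OpenAI
from sklearn.cluster import KMeans

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def embed(sections: list[str]) -> np.ndarray:
    # Step b: vectorize each section.
    resp = client.embeddings.create(model="text-embedding-3-small", input=sections)
    return np.array([item.embedding for item in resp.data])

def representative_chunks(sections: list[str], n_clusters: int = 5) -> list[str]:
    vectors = embed(sections)
    # Step c: label and cluster the sections.
    km = KMeans(n_clusters=n_clusters).fit(vectors)
    picks = []
    for label in range(n_clusters):
        members = np.where(km.labels_ == label)[0]
        # Step d: the representative chunk is the section closest to the centroid.
        dists = np.linalg.norm(vectors[members] - km.cluster_centers_[label], axis=1)
        picks.append(members[np.argmin(dists)])
    # Step e: combine the representatives, preserving document order.
    return [sections[i] for i in sorted(picks)]
```

The combined chunks then go to the LLM for the final summary (step f).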
- The user submits a document in the application and requests summarization.
- The document undergoes parsing and is stored temporarily.
- The backend subsequently executes steps 3.a to 3.f outlined in the preceding section.
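A hedged sketch of this flow, assuming a FastAPI route; the route name and the `parse_into_paragraphs` and `final_summary` helpers are hypothetical stand-ins for the project's actual parser and LLM call:

```python
import tempfile
from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/summarize")  # hypothetical route
async def summarize(file: UploadFile):
    # Stage the uploaded document in a temporary file for parsing.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(await file.read())
    sections = parse_into_paragraphs(tmp.name)  # hypothetical parser
    chunks = representative_chunks(sections)    # steps 3.a to 3.e above
    return {"summary": final_summary(chunks)}   # hypothetical LLM call, step 3.f
```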
Summarization.mp4
- The user uploads a document in the application and submits a list of questions.
- The document undergoes parsing and is stored temporarily.
- The backend extracts text from the document and generates embeddings.
- The document is clustered using K-Means clustering.
- The representative chunks for each cluster are sent to the LLM for summarization.
- The user's input query and the summary are sent to the LLM to get the final answer, which is then sent back to the user.
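A minimal sketch of the final step, assuming the OpenAI chat API; the model name and prompt wording are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def answer(question: str, summary: str) -> str:
    # Combine the user's query with the document summary for the final answer.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": "Answer using only the provided summary."},
            {"role": "user", "content": f"Summary:\n{summary}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```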
Search-document.mp4
- The user navigates to the Settings page and uploads a directory of files.
- File events are created in the SQLite Events table.
- The server processes the files and does the following:
  a. Chunking - 100 chunks per file
  b. Associate metadata with each chunk
  c. Generate and store embeddings in ChromaDB
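A minimal sketch of steps a to c, assuming ChromaDB's in-memory client and its default embedding function; the collection name and metadata keys are illustrative:

```python
import chromadb

client = chromadb.Client()  # in-memory instance, per the tech stack
collection = client.get_or_create_collection("documents")

def index_file(path: str, chunks: list[str]) -> None:
    # Steps b and c: attach metadata to each chunk and store embeddings.
    # ChromaDB computes the embeddings with its default embedding function here.
    collection.add(
        documents=chunks,
        metadatas=[{"source": path, "chunk": i} for i in range(len(chunks))],
        ids=[f"{path}-{i}" for i in range(len(chunks))],
    )

# Semantic search across all indexed files:
# collection.query(query_texts=["..."], n_results=5)
```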
```bash
source venv/bin/activate
pip freeze | xargs pip uninstall -y
pip install llama-index
```
- Better exception handling, especially in main.py and the handlers package
- Implement an intent classifier for query vs. command using Instructor (see the sketch below)
- Improve the deployment strategy
- Apply a stylesheet to the Settings page
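A hedged sketch of what the Instructor-based intent classifier might look like; the intent labels and model name are assumptions:

```python
from enum import Enum
import instructor
from openai import OpenAI
from pydantic import BaseModel

class Intent(str, Enum):
    QUERY = "query"      # assumed label
    COMMAND = "command"  # assumed label

class Classification(BaseModel):
    intent: Intent

# Instructor patches the OpenAI client to return validated Pydantic models.
client = instructor.from_openai(OpenAI())

def classify(text: str) -> Intent:
    result = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        response_model=Classification,
        messages=[{"role": "user", "content": f"Classify this user input: {text}"}],
    )
    return result.intent
```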