Chain server needs restart after initial file upload #61

Open
MattFeinberg opened this issue Nov 15, 2024 · 2 comments
Comments

@MattFeinberg
Collaborator

On a fresh clone and first run of NIM Anywhere, the first time you upload files to the knowledge base, the chain server cannot pick up the new documents until it is restarted.

Steps to reproduce:

Remove any instances of NIM Anywhere from your workbench (this may not be necessary, but it is what I've been doing to keep things simple when reproducing the error)

Remove the Milvus Docker volume created by NIM Anywhere

Clone the NIM Anywhere project and go through the startup/config steps

When it comes time to turn applications on, turn them on in the following order:

Milvus
Redis
Chain Server
Chat frontend
Jupyter Lab

Then, go to the Jupyter notebook and run through the steps to upload files.

I don't think it reliably uploads the same files every time, so copy one of the uploaded file names from the Jupyter notebook output and find that file in the /data/dataset folder. Open the file and find the name of the article's author. Try to pick an article written by a single person rather than a group.

You can ask either the chatbot or the chain server directly (make sure you check the "use knowledge base" checkbox):

"Tell me about an article in the context by AUTHOR_NAME_HERE"

You should notice that the chatbot does not correctly identify the article.

Then restart the chain server and try again. You should see that the chatbot either names the article directly or, in some cases, just describes what it is about.

@rmkraus
Collaborator

rmkraus commented Nov 17, 2024

I've run through a couple of permutations and here is what I've found.

If the Milvus collection has not been populated when the chain server starts, then you currently have to restart langchain after data is added to the collection. This seems to be mostly because the collection's tables cannot be (easily) initialized until an embedded vector is provided. When the data gets added, the tables get created and the collection works, but langchain never goes back to check.
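To illustrate the "never goes back to check" behavior, here is a minimal sketch of a retriever wrapper that re-checks on each query instead of only once at startup. This is not the project's code: `LazyRetriever`, `has_data`, and `build` are hypothetical names, and the callables stand in for whatever the chain server actually uses to inspect the collection and construct its retriever.

```python
from typing import Callable, List, Optional

# A retriever here is just a function from a query to matching documents.
Retriever = Callable[[str], List[str]]


class LazyRetriever:
    """Hypothetical wrapper: builds the real retriever only once data exists."""

    def __init__(self, has_data: Callable[[], bool], build: Callable[[], Retriever]) -> None:
        self._has_data = has_data   # assumed check, e.g. "is the collection populated?"
        self._build = build         # assumed factory for the real retriever
        self._retriever: Optional[Retriever] = None

    def retrieve(self, query: str) -> List[str]:
        if self._retriever is None:
            # Re-check on every call until the collection has been populated.
            if not self._has_data():
                return []
            self._retriever = self._build()
        return self._retriever(query)
```

With a wrapper like this, a query issued before any upload returns nothing, and the first query after an upload builds the retriever without a restart.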

I think the best fix is to add a GUI for uploading documents and have the upload function trigger a reload of the chain whenever it uploads data into an empty collection.
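A rough sketch of that upload-then-reload idea, under stated assumptions: `InMemoryCollection` is a hypothetical stand-in for the Milvus collection (not the real client API), and `reload_chain` is a placeholder for whatever mechanism ends up reloading the chain.

```python
from typing import Callable, List


class InMemoryCollection:
    """Hypothetical stand-in for the Milvus collection; not the real client."""

    def __init__(self) -> None:
        self.rows: List[str] = []

    def is_empty(self) -> bool:
        return not self.rows

    def insert(self, docs: List[str]) -> None:
        self.rows.extend(docs)


def upload_documents(
    collection: InMemoryCollection,
    docs: List[str],
    reload_chain: Callable[[], None],
) -> bool:
    """Insert docs and reload the chain only if the collection was empty before.

    Returns True when a reload was triggered.
    """
    was_empty = collection.is_empty()
    collection.insert(docs)
    if was_empty and docs:
        # First data in a previously empty collection: the chain must be rebuilt.
        reload_chain()
        return True
    return False
```

Later uploads into an already populated collection skip the reload, so the chain is only rebuilt in the case that currently requires a manual restart.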

@MattFeinberg
Collaborator Author

I agree. I'll take a stab at implementing this.
