
[Backend] Support embedding model #242

Merged
3 commits merged into moxin-org:dev on Sep 9, 2024

Conversation

L-jasmine (Collaborator)

No description provided.

Signed-off-by: csh <458761603@qq.com>
@L-jasmine requested review from juntao and jmbejar, September 1, 2024 16:00
@L-jasmine (Collaborator, Author)

@juntao @jmbejar

If the model-card repository contains an embedding.json file, the corresponding embedding model will be downloaded, and both the embedding and chat models will be loaded.

moxin-org/model-cards#8
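The gating described above could be sketched roughly as follows. This is a hypothetical illustration, not Moxin's actual code: the function name and the assumption that `embedding.json` sits at the root of the cloned model-card repo are mine.

```rust
use std::fs;
use std::path::Path;

// Hypothetical check mirroring the behavior described above: the
// embedding model is only fetched when the cloned model-card repo
// contains an embedding.json file. Names and layout are illustrative.
fn has_embedding_config(repo_dir: &Path) -> bool {
    repo_dir.join("embedding.json").is_file()
}

fn main() {
    let dir = std::env::temp_dir().join("model-card-demo");
    fs::create_dir_all(&dir).unwrap();
    fs::write(dir.join("embedding.json"), "{}").unwrap();
    assert!(has_embedding_config(&dir));
    println!("embedding config detected");
}
```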

@L-jasmine (Collaborator, Author)

https://github.com/moxin-org/moxin/pull/242/files#diff-d406d942d2259b118601d9d68766930f7b8a6bf5d6700fb22011065649d4024fR207-R208

We can set the environment variable MODEL_CARDS_REPO at compile time to control which model-card repository the backend should clone.
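A minimal sketch of such a compile-time override, assuming the variable is read with `option_env!` (the default URL below is an assumption for illustration):

```rust
// Fallback repository URL; an assumption for this sketch.
const DEFAULT_REPO: &str = "https://github.com/moxin-org/model-cards";

fn model_cards_repo() -> &'static str {
    // option_env! resolves at compile time: the build environment,
    // not the runtime environment, decides which repo gets cloned.
    option_env!("MODEL_CARDS_REPO").unwrap_or(DEFAULT_REPO)
}

fn main() {
    println!("cloning model cards from {}", model_cards_repo());
}
```

Building with `MODEL_CARDS_REPO=https://example.com/my-cards cargo build` would bake the alternate URL into the binary.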

@juntao (Collaborator) commented Sep 1, 2024

Thanks. Please see my comment on the model card repo on the startup parameters for the nomic-embed model.

Also, when does the moxin app download the embedding model? What if the embedding model is not yet fully downloaded when the user starts to chat?

@jmbejar (Collaborator) commented Sep 2, 2024

> Thanks. Please see my comment on the model card repo on the startup parameters for the nomic-embed model.
>
> Also, when does the moxin app download the embedding model? What if the embedding model is not yet fully downloaded when the user starts to chat?

The PR's approach is to download the embedding model at Moxin's startup, which takes many seconds on my machine and could be slower for other people.

Maybe a good first step is to move this download to a background task and let chat interactions use it whenever it becomes available. Of course, we would need some UI indication to tell the user whether text embedding is in use, but we can do that in a separate PR.

@jmbejar (Collaborator) commented Sep 2, 2024

@L-jasmine I tested this PR, but the embedding json file was not downloaded in my first attempts. Only when I removed the downloaded model-cards repo did the app recreate (clone) it and download the embedding model. So I'm not sure existing users will get the embedding model downloaded.

It looks like the "git pull" (in the model-cards repo folder) is not working for some reason.
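A clone-or-update helper of the kind this comment implies is missing might look like the sketch below. The function name and the `--ff-only` flag are my assumptions, not Moxin's actual code; the point is that an existing checkout must actually be pulled, or newly added files such as embedding.json never appear locally.

```shell
# Hypothetical helper: update an existing model-cards checkout,
# or clone it fresh when no checkout exists yet.
sync_model_cards() {
    repo_url="$1"
    dest="$2"
    if [ -d "$dest/.git" ]; then
        # Existing checkout: fast-forward to pick up new files.
        git -C "$dest" pull --ff-only
    else
        git clone "$repo_url" "$dest"
    fi
}
```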

@jmbejar (Collaborator) commented Sep 5, 2024

@juntao @L-jasmine I'm not merging this yet because I feel the issues raised in the comments are worth addressing.

However, I'm curious what you think. Is it feasible (and worthwhile) to download the embedding model on a separate thread so it does not block application startup, or should we just merge and try to address it on the frontend somehow?

BTW @L-jasmine, let me know if you were able to reproduce the other issue, where you already have a local copy of the model-cards repo but it fails to pull the new json file.

@juntao (Collaborator) commented Sep 5, 2024

Yeah. Let's download it in a separate thread. We could prevent the user from chatting before this download completes. Thanks. @L-jasmine
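The plan agreed on here, download on a worker thread and block chat until it finishes, can be sketched as below. All names (`download_embedding_model`, `spawn_embedding_download`) are assumptions for illustration, not Moxin's actual API.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Placeholder for the real download logic (an assumption here).
fn download_embedding_model() {}

// Spawn the download on a worker thread; return a shared readiness flag.
fn spawn_embedding_download() -> Arc<AtomicBool> {
    let ready = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&ready);
    thread::spawn(move || {
        download_embedding_model();
        flag.store(true, Ordering::Release);
    });
    ready
}

fn main() {
    let ready = spawn_embedding_download();
    // The chat path would refuse (or queue) requests until this is true:
    while !ready.load(Ordering::Acquire) {
        thread::yield_now();
    }
    println!("embedding model ready; chat unblocked");
}
```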

Signed-off-by: csh <458761603@qq.com>
Signed-off-by: csh <458761603@qq.com>
@jmbejar (Collaborator) commented Sep 9, 2024

@L-jasmine Tested again. Here are the results:

1- The git pull issue is resolved 👍

2- I'm hitting an issue with the background download of the embedding model.
I tested having a chat conversation while the embedding model was being downloaded, but later I realized I need to reload the model manually (from the UI) to have the embedding included in the chat section.

It may require some thought on the UI side to properly handle the first time the embedding model is downloaded and how the user is informed about it. But let's handle that in another PR.

Accepting and merging now

@jmbejar jmbejar merged commit 6714258 into moxin-org:dev Sep 9, 2024
5 checks passed
@L-jasmine (Collaborator, Author)

> @L-jasmine Tested again. Here are the results:
>
> 1- The git pull issue is resolved 👍
>
> 2- I'm hitting an issue with the background download of the embedding model. I tested having a chat conversation while the embedding model was being downloaded, but later I realized I need to reload the model manually (from the UI) to have the embedding included in the chat section.
>
> It may require some thought on the UI side to properly handle the first time the embedding model is downloaded and how the user is informed about it. But let's handle that in another PR.
>
> Accepting and merging now

Yes. I currently have no way to notify the frontend that the embedding model has finished downloading, so I can only load it during a reload.
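One way to close that gap would be a backend-to-frontend event channel, as in the sketch below. The event type and names are assumptions for illustration; how the real frontend consumes such events is outside this PR.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical event the backend could emit when the download finishes.
enum BackendEvent {
    EmbeddingModelReady,
}

fn main() {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // ... download the embedding model here ...
        tx.send(BackendEvent::EmbeddingModelReady).unwrap();
    });
    // The frontend's event loop would pick this up and load the
    // embedding model without requiring a manual reload:
    match rx.recv().unwrap() {
        BackendEvent::EmbeddingModelReady => println!("embedding model ready"),
    }
}
```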
