[Feature] Dynamic Model Loading and Model Endpoint in FastAPI #17
Comments
Potentially related:
@abetlen requested a list of prompt formats for various models:

Alpaca:
Vicuna: as discussed in ggml-org/llama.cpp#302 (comment)
Koala: source: https://github.com/young-geng/EasyLM/blob/main/docs/koala.md
Open Assistant: (no llama.cpp support yet)
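For reference, a hedged sketch of the single-turn prompt layouts commonly documented for these models; the exact wording varies between model releases, and the `{user_message}` placeholder is illustrative rather than taken from this thread:

```python
# Illustrative single-turn prompt layouts commonly documented for these models.
# Exact wording differs between releases; {user_message} is a placeholder.

ALPACA_PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{user_message}\n\n"
    "### Response:\n"
)

VICUNA_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant.\n\n"
    "USER: {user_message}\n"
    "ASSISTANT:"
)

KOALA_PROMPT = "BEGINNING OF CONVERSATION: USER: {user_message} GPT:"

OPEN_ASSISTANT_PROMPT = "<|prompter|>{user_message}<|endoftext|><|assistant|>"
```

Each of these is ultimately just a string template, which is what makes a per-model lookup (discussed further down the thread) feasible.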
Here's something that seemed interesting from Vicuna that I just saw. I can definitely see the challenge of trying to adapt to all these different input formats. It seemed like an extensible format that might help; not sure where you currently are on it.
@jmtatsch @MillionthOdin16 thank you! I still have a few questions on the best way to implement this, appreciate any input. The basic features would allow you to:

The part I'm still scratching my head on is the chat models. I guess the solution would be to have some way to specify these pre-defined models and custom prompt serialisation functions for each.
Hi! That way you can override the prompt generation from the outside, and you could provide a list of model-specific implementations to handle the message history and prompt generation on a per-model basis :)
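A minimal sketch of that idea, assuming OpenAI-style chat messages (dicts with `role` and `content`); the registry, decorator, and the example Vicuna-style serializer below are illustrative names, not the project's actual API:

```python
from typing import Callable, Dict, List

# A message in the OpenAI-style chat format: {"role": "user", "content": "..."}.
Message = Dict[str, str]
# A prompt serializer turns a message history into a model-specific prompt string.
PromptSerializer = Callable[[List[Message]], str]

# Hypothetical registry of per-model prompt serializers.
PROMPT_SERIALIZERS: Dict[str, PromptSerializer] = {}


def register_serializer(name: str):
    """Register a prompt serializer under a model/format name."""
    def decorator(fn: PromptSerializer) -> PromptSerializer:
        PROMPT_SERIALIZERS[name] = fn
        return fn
    return decorator


@register_serializer("vicuna")
def vicuna_prompt(messages: List[Message]) -> str:
    # Illustrative Vicuna-style serialization: USER/ASSISTANT turns.
    parts = []
    for m in messages:
        if m["role"] == "user":
            parts.append(f"USER: {m['content']}")
        elif m["role"] == "assistant":
            parts.append(f"ASSISTANT: {m['content']}")
        else:  # treat system messages as a plain preamble
            parts.append(m["content"])
    parts.append("ASSISTANT:")
    return "\n".join(parts)


def build_prompt(model_format: str, messages: List[Message]) -> str:
    """Look up the serializer for the requested format and apply it."""
    return PROMPT_SERIALIZERS[model_format](messages)
```

Registering serializers by name keeps the server route code model-agnostic: supporting a new chat model only requires adding one function.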
Implemented in #931
I'd like to propose a future feature I think would add useful flexibility for users of the completions/embeddings API. I'm suggesting the ability to dynamically load models based on calls to the FastAPI endpoint. The concept is as follows:

- Scan a default location for models (e.g., a models folder within the project) and allow users to specify an additional model folder if needed.
- Support a GET request to the /v1/engines endpoint, which would return a list of models and their statuses.

This dynamic model loading feature would align with the behavior of the OpenAI spec for models and model status. It would offer users the flexibility to easily choose and use different models without having to make manual changes to the project or configs.
This is a suggestion for later, but I wanted to raise it now so we can plan for it if we do decide to implement it.
Let me know your thoughts :)
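A minimal sketch of what this could look like, assuming a local models/ folder of GGML .bin files, a single model cached in memory at a time, and illustrative route and field names (not the project's actual implementation):

```python
from pathlib import Path
from typing import List, Optional

from fastapi import FastAPI, HTTPException
from llama_cpp import Llama

app = FastAPI()

MODELS_DIR = Path("models")        # assumed default model folder
_loaded_model: Optional[Llama] = None
_loaded_name: Optional[str] = None


def available_models() -> List[str]:
    """List model files found in the models directory."""
    return sorted(p.name for p in MODELS_DIR.glob("*.bin"))


@app.get("/v1/engines")
def list_engines():
    """Return the available models and whether each one is currently loaded."""
    return {
        "object": "list",
        "data": [
            {"id": name, "object": "engine", "ready": name == _loaded_name}
            for name in available_models()
        ],
    }


def get_model(name: str) -> Llama:
    """Load the requested model on demand, replacing any previously loaded one."""
    global _loaded_model, _loaded_name
    if name not in available_models():
        raise HTTPException(status_code=404, detail=f"unknown model: {name}")
    if name != _loaded_name:
        _loaded_model = Llama(model_path=str(MODELS_DIR / name))
        _loaded_name = name
    return _loaded_model
```

A completion route could then call get_model(request.model) before generating, so switching models is just a matter of naming a different file in the request.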