[Discussion][Umbrella] ModelAdapter Issues #700

Open
1 of 5 tasks
kerthcet opened this issue Feb 18, 2025 · 1 comment
Labels
area/lora kind/enhancement New feature or request priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@kerthcet
Collaborator

kerthcet commented Feb 18, 2025

Tracking some issues I have with ModelAdapter. Some are big concerns and some are small; they are listed here for discussion so I can help with the project.
I will update them based on the latest design.

  • 1. modelAdapterStatus.Instances records all the Pod names as a list. Will the list length explode if we have lots of Pods?
  • 2. We load/unload the adapter during reconciliation via HTTP requests, which might be too heavy for the controller, especially when thousands of adapters reconcile at the same time. Maybe use an agent?
  • 3. We validate the CRD in the controller; this should be delegated to a webhook. See separate issue: [Umbrella] Add webhook for validation #710
  • 4. We have a scheduler in the controller, which should usually be a separate component, but I think it's OK as a start.
  • 5. TODO: when removing a modelAdapter, we only unload from the 0-index instance rather than from the whole list.
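
To make item 5 concrete, here is a minimal Go sketch of unloading from every recorded instance instead of only the first one. It is not taken from the controller code: `unloadFunc`, `unloadFromAll`, `errDemo`, and the pod names are hypothetical, and the callback stands in for whatever HTTP call the controller really makes.

```go
package main

import (
	"errors"
	"fmt"
)

// unloadFunc is a hypothetical callback that unloads the adapter from one
// instance. It stands in for the real HTTP request, which is not shown here.
type unloadFunc func(instance string) error

// errDemo is a placeholder failure used only by the demo in main.
var errDemo = errors.New("connection refused")

// unloadFromAll calls unload for every instance in the list, not just index 0.
// It keeps going after a failure and returns all failures joined together,
// or nil when every unload succeeded.
func unloadFromAll(instances []string, unload unloadFunc) error {
	var errs []error
	for _, inst := range instances {
		if err := unload(inst); err != nil {
			errs = append(errs, fmt.Errorf("unload from %s: %w", inst, err))
		}
	}
	return errors.Join(errs...)
}

func main() {
	instances := []string{"pod-a", "pod-b", "pod-c"}
	var visited []string
	err := unloadFromAll(instances, func(inst string) error {
		visited = append(visited, inst)
		if inst == "pod-b" {
			return errDemo // simulate one unreachable Pod
		}
		return nil
	})
	fmt.Println("visited:", visited)
	fmt.Println("error:", err)
}
```

Collecting the errors rather than returning on the first one means a single unreachable Pod does not leave the adapter loaded on the remaining Pods; the reconciler can requeue on a non-nil result. `errors.Join` requires Go 1.20 or later.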
@Jeffwan
Collaborator

Jeffwan commented Feb 18, 2025

  1. There are some basic assumptions about the usage. Basically, LoRA will serve the "high density" use case, so I don't expect a LoRA to be scheduled across multiple instances most of the time. In that case, the instances list won't be long. If a LoRA has multiple replicas and they are hot, we should merge the LoRAs. (In the public proposal, there's a field called dynamic merge that is designed for this case.)

  2. Load/unload via HTTP request is not that elegant, but I am not aware of other means. Do you have suggestions? Do you mean handing it over to an agent, and letting the agent reconcile the object and send the requests? I think that sounds good. We were a little hesitant to introduce an agent earlier. At that time, we were considering a host-level agent to manage the model artifacts rather than an AI runtime agent. We can consider refactoring this part. Let's have an offline discussion.

  3. Totally agree. Validating in the controller was meant to simplify the deployment and avoid the webhook.

  4. We can have more discussion later. In our internal system, it plays multiple roles, including but not limited to scheduling, descheduling, and rebalancing. We can consider maintaining a simplified version for LoRA.

  5. Multiple replicas are not supported yet. Tracking issue: Support multiple Lora adapter replicas #129
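
As a rough illustration of the agent idea in point 2, the agent's reconcile step could come down to comparing the adapters that should be on an instance with the ones actually loaded, then issuing load/unload calls for the difference. The Go sketch below shows only that comparison. `diffAdapters` and the adapter names are hypothetical, and nothing here reflects an agreed design.

```go
package main

import (
	"fmt"
	"sort"
)

// diffAdapters compares the desired adapter names for one instance with the
// names currently loaded on it. It returns the adapters to load (desired but
// not loaded) and the adapters to unload (loaded but no longer desired).
// Both results are sorted so the output is deterministic.
func diffAdapters(desired, loaded []string) (toLoad, toUnload []string) {
	want := make(map[string]bool, len(desired))
	for _, name := range desired {
		want[name] = true
	}
	have := make(map[string]bool, len(loaded))
	for _, name := range loaded {
		have[name] = true
	}
	for name := range want {
		if !have[name] {
			toLoad = append(toLoad, name)
		}
	}
	for name := range have {
		if !want[name] {
			toUnload = append(toUnload, name)
		}
	}
	sort.Strings(toLoad)
	sort.Strings(toUnload)
	return toLoad, toUnload
}

func main() {
	// Hypothetical adapter names, for illustration only.
	desired := []string{"lora-a", "lora-b"}
	loaded := []string{"lora-b", "lora-c"}
	toLoad, toUnload := diffAdapters(desired, loaded)
	fmt.Println("load:", toLoad)
	fmt.Println("unload:", toUnload)
}
```

With this split, the controller would only write the desired state, and each agent would send requests for its own instance, which keeps the HTTP traffic out of the controller's reconcile loop.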

@Jeffwan Jeffwan added kind/enhancement New feature or request area/lora priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Feb 18, 2025