[Discussion][Umbrella] ModelAdapter Issues #700

kerthcet · 2025-02-18T07:52:44Z

This issue tracks some concerns I have about ModelAdapter. Some are major and some are minor; I'm raising them for discussion so I can help with the project.
I will update the list as the design evolves.

1. modelAdapterStatus.Instances records every Pod name as a list. Could the list grow without bound if we have lots of Pods?
2. We load and unload the adapter via HTTP requests during reconciliation, which might be too heavy for the controller, especially when thousands of adapters reconcile at the same time. Maybe an agent should handle this?
3. We validate the CRD in the controller; this should be delegated to a webhook. See the separate issue: [Umbrella] Add webhook for validation #710
4. We have a scheduler in the controller, which would usually be a separate component, but I think that is OK as a start.
5. TODO: when removing a ModelAdapter, we unload only the instance at index 0 rather than the whole list.


Jeffwan · 2025-02-18T17:34:26Z

1. There are some basic assumptions about usage here. LoRA serves the "high density" use case, so I don't expect a LoRA adapter to be scheduled across multiple instances most of the time. In that case, the instances list won't be long. If a LoRA adapter has multiple replicas and they are hot, we should merge the LoRAs. (The public proposal has a field called dynamic merge that is designed for this case.)
2. Loading and unloading via HTTP requests is not that elegant, but I am not aware of other means. Do you have suggestions? Do you mean handing it over to an agent, and letting the agent reconcile the object and send the requests? That sounds good to me. We were a little hesitant to introduce an agent early on. At that time we were considering a host-level agent to manage model artifacts rather than an AI runtime agent. We can consider refactoring this part. Let's have an offline discussion.
3. Totally agree. Validating in the controller was meant to simplify deployment and avoid the webhook.
4. We can discuss this more later. In our internal system, the scheduler plays multiple roles, including but not limited to scheduling, descheduling, and rebalancing. We can consider maintaining a simplified version for LoRA.
5. Multiple replicas are not supported yet. Tracking issue: Support multiple Lora adapter replicas #129

Jeffwan added the kind/enhancement, area/lora, and priority/critical-urgent labels on Feb 18, 2025
