This step is in Error state with this message: pods "test-ewe2j-t22kv-1073397324" is forbidden: failed quota: kf-resource-quota: must specify cpu,memory
It seems Kale submits containers without resource limits, which KF blocks. Is there any workaround for this? We are using Kale 0.6 and KF 1.1.
I found a tricky workaround: when you save the pipeline.yaml, you can add something like the following to titanic-ml.kale.py to set resource limits on the pod:

    _kale_step_limits = {'nvidia.com/gpu': '1'}
    for _kale_k, _kale_v in _kale_step_limits.items():
        _kale_loaddata_task.container.add_resource_limit(_kale_k, _kale_v)
This example sets a GPU limit, but the same approach can be adapted for CPU and memory as well.
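For CPU and memory, a minimal sketch of the same idea could look like the following. I have not run this against Kale 0.6, so treat it as an assumption-laden example: `apply_step_resources` is a hypothetical helper (not part of Kale or KFP), the resource values are placeholders, and it assumes the generated `_kale_*_task` objects are KFP v1 `ContainerOp`s whose `container` exposes `add_resource_limit` and `add_resource_request`.

```python
def apply_step_resources(task, limits, requests=None):
    """Set resource limits (and optionally requests) on one pipeline step.

    `task` is assumed to be a KFP v1 ContainerOp, like the `_kale_*_task`
    objects in the Kale-generated script. This helper is hypothetical.
    """
    for name, value in limits.items():
        task.container.add_resource_limit(name, value)
    for name, value in (requests or {}).items():
        task.container.add_resource_request(name, value)
    return task


# Placeholder values; pick numbers that fit inside your profile's quota.
step_limits = {"cpu": "1", "memory": "2Gi"}
step_requests = {"cpu": "500m", "memory": "1Gi"}

# In the generated script this would be called once per step, e.g.:
# apply_step_resources(_kale_loaddata_task, step_limits, step_requests)
```

Every step in the pipeline would need the same call, since the quota check applies to each pod separately.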
Generally, we can add those limits directly in the pipeline files, as described in kubeflow/pipelines#5695.
But I guess the idea behind Kale is that users don't have to mess with the code in the YAML or .py file. If somebody can point me to the place in the code where I can modify the YAML for the executed pod, I can try to make a patch for it.
But as you can see, it supports GPU resources, so why can't the same be applied to CPU and memory? That just does not make any sense. Maybe it was not meant for multi-user scenarios. So we can only fix it by modifying the source code.
When using Kale with a Kubeflow profile that has CPU or memory restrictions (https://www.kubeflow.org/docs/components/multi-tenancy/getting-started/#manual-profile-creation), the first step fails after the pipeline run with:
This step is in Error state with this message: pods "test-ewe2j-t22kv-1073397324" is forbidden: failed quota: kf-resource-quota: must specify cpu,memory