Compatibility with profile resource restrictions on Kubeflow #355

Open
szymek116 opened this issue Jun 9, 2021 · 3 comments

Comments

@szymek116

When using Kale with a Kubeflow profile that has CPU or memory restrictions (https://www.kubeflow.org/docs/components/multi-tenancy/getting-started/#manual-profile-creation), the first step of a pipeline run fails with:

This step is in Error state with this message: pods "test-ewe2j-t22kv-1073397324" is forbidden: failed quota: kf-resource-quota: must specify cpu,memory

It seems Kale submits the container without resource limits, which the Kubeflow resource quota then blocks. Is there any workaround for this? We are using Kale 0.6 and KF 1.1.
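
The error message suggests that the namespace quota tracks cpu and memory, so a pod that omits those values is rejected. The rule can be sketched roughly as below. This is a simplified model for illustration, not the actual Kubernetes admission logic; the function name `missing_quota_resources` and the quota values are hypothetical.

```python
def missing_quota_resources(quota_hard, container_resources):
    """Return the compute resources a quota tracks that the container omits.

    Simplified model: `cpu` and `memory` in a quota are treated like
    `requests.cpu` and `requests.memory`, and a limit also counts as a
    request because Kubernetes defaults an unset request to the limit.
    """
    limits = container_resources.get('limits', {})
    requests = container_resources.get('requests', {})
    missing = []
    for key in quota_hard:
        kind, _, name = key.rpartition('.')
        if name not in ('cpu', 'memory'):
            continue  # this sketch ignores storage and object-count quotas
        if kind == 'limits':
            satisfied = name in limits
        else:  # bare name or 'requests' prefix
            satisfied = name in requests or name in limits
        if not satisfied:
            missing.append(key)
    return missing


# Hypothetical quota, similar in shape to a profile's resource quota spec.
quota = {'cpu': '2', 'memory': '2Gi'}

# A container with no resources section, as in the failing step.
print(missing_quota_resources(quota, {}))  # → ['cpu', 'memory']

# The same container once limits are set.
print(missing_quota_resources(quota, {'limits': {'cpu': '1', 'memory': '1Gi'}}))  # → []
```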

@brness

brness commented Aug 13, 2021

I found a workaround: when you save the pipeline.yaml, you can add something like the following to titanic-ml.kale.py to set resource limits on the pod.
_kale_step_limits = {'nvidia.com/gpu': '1'}
for _kale_k, _kale_v in _kale_step_limits.items():
    _kale_loaddata_task.container.add_resource_limit(_kale_k, _kale_v)
This example sets a GPU limit, and the same approach can be adapted for CPU and memory.

@szymek116
Author

Generally, we can add those limits directly in the pipeline files, as described in kubeflow/pipelines#5695.

But I guess the idea behind Kale is that users don't have to mess with code in the YAML or .py file. If somebody can point me to the place in the code where I can modify the YAML for the executed pod, I can try to make a patch for it.
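
Until such a patch exists, one way to add the values to a compiled pipeline file without editing each step by hand is to post-process it. The sketch below assumes the compiled pipeline is an Argo Workflow manifest, already loaded into a Python dict, whose step templates live under `spec.templates[*].container`. The function name `add_default_resources` and the default amounts are hypothetical. Loading and saving the file, for example with a YAML library, is left out.

```python
import copy

# Hypothetical defaults; adjust them to the profile's quota.
DEFAULT_RESOURCES = {
    'limits': {'cpu': '1', 'memory': '2Gi'},
    'requests': {'cpu': '500m', 'memory': '1Gi'},
}


def add_default_resources(workflow, resources):
    """Return a copy of the workflow with default resources on every container.

    Values a step already sets are kept; only missing ones are filled in.
    """
    patched = copy.deepcopy(workflow)
    for template in patched.get('spec', {}).get('templates', []):
        container = template.get('container')
        if container is None:
            continue  # DAG or steps templates carry no container
        container_resources = container.setdefault('resources', {})
        for section in ('limits', 'requests'):
            target = container_resources.setdefault(section, {})
            for name, value in resources.get(section, {}).items():
                target.setdefault(name, value)
    return patched
```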

@brness

brness commented Sep 24, 2021

But as you can see, Kale supports GPU resources, so why can't the same be applied to CPU and memory? That just does not make any sense. Maybe it was not meant for multi-user scenarios. So we can only fix it by modifying the source code.
