-
Notifications
You must be signed in to change notification settings - Fork 45
fix: Support custom images in ClusterTrainingRuntime for container backend #140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
002ffd5 to
07d86bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Fiona-Waters!
/lgtm
| raise ValueError(f"Runtime {name} from {source} 'node' must specify containers[0].image") | ||
|
|
||
| # Extract the container image | ||
| image = containers[0].get("image") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might not be that important in practice, but maybe more robust to select the container named "node".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've updated to select the container named "node". PTAL.
…ckend Signed-off-by: Fiona Waters <fiwaters6@gmail.com>
07d86bf to
e061aa4
Compare
Pull Request Test Coverage Report for Build 19070400017Details
💛 - Coveralls |
|
Thanks @Fiona-Waters! /lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: astefanutti The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| name: str | ||
| trainer: RuntimeTrainer | ||
| pretrained_model: Optional[str] = None | ||
| image: Optional[str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@astefanutti @Fiona-Waters, shall we move this image under RuntimeTrainer ?
Since we also have initializer in the Runtime.
Also, we might need to update Kubernetes backend to also populate this field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich right that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can look at creating a follow on PR to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you already got there :-)
What this PR does / why we need it:
While testing use of a custom image in the container backend I noticed that the image was not being used and instead the default one was being picked up. This PR will fix this and ensure that the TrainingRuntimeSource will correctly use the image specified in the users ClusterTrainingRuntime yaml.
cc @andreyvelich @astefanutti @kramaranya
Which issue(s) this PR fixes (optional, in
Fixes #<issue number>, #<issue number>, ...format, will close the issue(s) when PR gets merged):Fixes #
Checklist: