feat(trainer): Support namespaced TrainingRuntime in the SDK #130
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
24f00a7 to
1740535
/ok-to-test |
Signed-off-by: Moeed Shaik <shaikmoeed@gmail.com>
Signed-off-by: Moeed Shaik <shaikmoeed@gmail.com>
8f0b6d5 to
de2ad1b
Signed-off-by: Moeed Shaik <shaikmoeed@gmail.com>
|
Thank you @shaikmoeed for this! |
|
|
||
| def get_runtime(self, name: str) -> types.Runtime: | ||
| """Get the the Runtime object""" | ||
| """Get the the Runtime object prefer namespaced, fall-back to cluster-scoped""" |
| """Get the the Runtime object prefer namespaced, fall-back to cluster-scoped""" | |
| """Get the Runtime object prefer namespaced, fall-back to cluster-scoped""" |
The same change applies to each occurrence here.
| ) | ||
|
|
||
| cluster_runtime_list = models.TrainerV1alpha1ClusterTrainingRuntimeList.from_dict( | ||
| cluster_thread.get(constants.DEFAULT_TIMEOUT) |
| cluster_thread.get(constants.DEFAULT_TIMEOUT) | |
| cluster_thread.get(common_constants.DEFAULT_TIMEOUT) |
| def create_training_runtime( | ||
| name: str, | ||
| namespace: str = "default", | ||
| ) -> models.TrainerV1alpha1TrainingRuntime: | ||
| """Create a mock namespaced TrainingRuntime object (not cluster-scoped).""" | ||
| return models.TrainerV1alpha1TrainingRuntime( | ||
| apiVersion=constants.API_VERSION, | ||
| kind="TrainingRuntime", | ||
| metadata=models.IoK8sApimachineryPkgApisMetaV1ObjectMeta( | ||
| name=name, | ||
| namespace=namespace, | ||
| labels={constants.RUNTIME_FRAMEWORK_LABEL: name}, | ||
| ), | ||
| spec=models.TrainerV1alpha1TrainingRuntimeSpec( | ||
| mlPolicy=models.TrainerV1alpha1MLPolicy( | ||
| torch=models.TrainerV1alpha1TorchMLPolicySource( | ||
| numProcPerNode=models.IoK8sApimachineryPkgUtilIntstrIntOrString(2) | ||
| ), | ||
| numNodes=2, | ||
| ), | ||
| template=models.TrainerV1alpha1JobSetTemplateSpec( | ||
| metadata=models.IoK8sApimachineryPkgApisMetaV1ObjectMeta( | ||
| name=name, | ||
| namespace=namespace, | ||
| ), | ||
| spec=models.JobsetV1alpha2JobSetSpec(replicatedJobs=[get_replicated_job()]), | ||
| ), | ||
| ), | ||
| ) | ||
Did you mean to create this in kubernetes/backend_test.py?
This is not a test function, and I believe it should be added to the TrainerClient and propagated to the different backends.
What this PR does / why we need it:
Add support for listing and getting namespaced TrainingRuntimes.
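For reviewers, here is a minimal sketch of the lookup order described in the updated `get_runtime` docstring (prefer namespaced, fall back to cluster-scoped). It is illustrative only and does not reproduce the SDK code: the `Runtime` dataclass, `RuntimeNotFoundError`, and the two in-memory dictionaries are hypothetical stand-ins for the objects the backend would fetch from the Kubernetes API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Runtime:
    """Hypothetical stand-in for a runtime; namespace=None means cluster-scoped."""
    name: str
    namespace: Optional[str]


class RuntimeNotFoundError(Exception):
    """Hypothetical error raised when neither scope has the runtime."""


def get_runtime(
    name: str,
    namespace: str,
    namespaced: Dict[Tuple[str, str], Runtime],
    cluster_scoped: Dict[str, Runtime],
) -> Runtime:
    """Return the namespaced runtime if present, else the cluster-scoped one."""
    # 1. Prefer a runtime defined in the caller's namespace.
    runtime = namespaced.get((namespace, name))
    if runtime is not None:
        return runtime

    # 2. Fall back to a cluster-scoped runtime with the same name.
    runtime = cluster_scoped.get(name)
    if runtime is not None:
        return runtime

    # 3. Neither scope has it: report both places that were checked.
    raise RuntimeNotFoundError(
        f"Runtime {name!r} not found in namespace {namespace!r} or at cluster scope"
    )


if __name__ == "__main__":
    namespaced = {("team-a", "torch"): Runtime("torch", "team-a")}
    cluster_scoped = {"torch": Runtime("torch", None)}

    # The namespaced runtime shadows the cluster-scoped one in team-a.
    print(get_runtime("torch", "team-a", namespaced, cluster_scoped))
    # team-b has no namespaced runtime, so the cluster-scoped one is returned.
    print(get_runtime("torch", "team-b", namespaced, cluster_scoped))
```

With this ordering, a namespaced runtime shadows a cluster-scoped runtime of the same name, so it may be worth confirming that this is the intended behavior.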
Which issue(s) this PR fixes:
Fixes #88
Checklist: