Add Python SDK for Kubeflow Training Operator #1420

Merged
merged 9 commits into from
Oct 3, 2021
Changes from 2 commits
44 changes: 44 additions & 0 deletions hack/python-sdk/post_gen.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
#!/usr/bin/env python

"""
This script is used for updating generated SDK files.
"""

import os
import fileinput
import re

# (regex pattern, replacement) pairs applied in order to every line.
__replacements = [
    ("import kubeflow.training", "from kubeflow.training.models import *"),
    # Raw string so that "\/" reaches the regex engine unchanged.
    (r"kubeflow.training.models.v1\/.*.v1.", "V1")
]

# Resolves to <repo root>/sdk/python relative to this script's location.
sdk_dir = os.path.abspath(os.path.join(__file__, "../../..", "sdk/python"))


def main():
    fix_test_files()


def fix_test_files() -> None:
    """
    Fix invalid model imports in generated model tests
    """
    test_folder_dir = os.path.join(sdk_dir, "test/models")
    test_files = os.listdir(test_folder_dir)
    for test_file in test_files:
        print(test_file)
        # With inplace=True, print() inside the loop writes back into the file.
        with fileinput.FileInput(os.path.join(test_folder_dir, test_file), inplace=True) as file:
            for line in file:
                print(_apply_regex(line), end='')


def _apply_regex(input_str: str) -> str:
    for pattern, replacement in __replacements:
        input_str = re.sub(pattern, replacement, input_str)
    return input_str


if __name__ == '__main__':
    main()
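
To see what the two substitutions in `__replacements` do, the following minimal sketch applies them to in-memory strings instead of rewriting files. The sample input lines are hypothetical: the text actually emitted by the SDK generator is not shown in this diff, so the second sample was written only so that it matches the pattern. `REPLACEMENTS` and `apply_replacements` are illustrative names, not part of the PR.

```python
import re

# Same (pattern, replacement) pairs as in post_gen.py.
REPLACEMENTS = [
    ("import kubeflow.training", "from kubeflow.training.models import *"),
    (r"kubeflow.training.models.v1\/.*.v1.", "V1"),
]


def apply_replacements(line: str) -> str:
    """Apply every substitution in order, mirroring _apply_regex."""
    for pattern, replacement in REPLACEMENTS:
        line = re.sub(pattern, replacement, line)
    return line


# Hypothetical lines, written only to exercise the two patterns.
samples = [
    "import kubeflow.training\n",
    "model = kubeflow.training.models.v1/job_condition.v1.JobCondition()\n",
]
for sample in samples:
    print(apply_replacements(sample), end="")
```

Because the dots in both patterns are unescaped, each `.` matches any character, so the patterns are looser than they look. That is probably harmless for generated files, but it is worth keeping in mind if the script is ever pointed at hand-written code.
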
31 changes: 27 additions & 4 deletions sdk/python/README.md
@@ -1,5 +1,5 @@
-# Kubeflow TFJob SDK
-Python SDK for TF-Operator
+# Kubeflow Training SDK
+Python SDK for Training Operator

## Requirements.

@@ -9,12 +9,12 @@ Python 2.7 and 3.5+
### pip install

```sh
-pip install kubeflow-tfjob
+pip install kubeflow-training
```

Then import the package:
```python
-from kubeflow import tfjob
+from kubeflow import training
```

### Setuptools
@@ -46,14 +46,37 @@ Class | Method | Description
[TFJobClient](docs/TFJobClient.md) | [is_job_succeeded](docs/TFJobClient.md#is_job_succeeded) | Check if the TFJob status is Succeeded |
[TFJobClient](docs/TFJobClient.md) | [get_pod_names](docs/TFJobClient.md#get_pod_names) | Get pod names of TFJob |
[TFJobClient](docs/TFJobClient.md) | [get_logs](docs/TFJobClient.md#get_logs) | Get training logs of the TFJob |
[PyTorchJobClient](docs/PyTorchJobClient.md) | [create](docs/PyTorchJobClient.md#create) | Create PyTorchJob|
Member

I believe the Client docs were deleted by the generator. Should we modify our script so that it does not delete PyTorchJobClient.md and docs/TFJobClient.md during the SDK generator run?

Member Author

Thanks for catching that, added the client docs back.

[PyTorchJobClient](docs/PyTorchJobClient.md) | [get](docs/PyTorchJobClient.md#get) | Get the specified PyTorchJob or all PyTorchJob in the namespace |
[PyTorchJobClient](docs/PyTorchJobClient.md) | [patch](docs/PyTorchJobClient.md#patch) | Patch the specified PyTorchJob|
[PyTorchJobClient](docs/PyTorchJobClient.md) | [delete](docs/PyTorchJobClient.md#delete) | Delete the specified PyTorchJob |
[PyTorchJobClient](docs/PyTorchJobClient.md) | [wait_for_job](docs/PyTorchJobClient.md#wait_for_job) | Wait for the specified job to finish |
[PyTorchJobClient](docs/PyTorchJobClient.md) | [wait_for_condition](docs/PyTorchJobClient.md#wait_for_condition) | Waits until any of the specified conditions occur |
[PyTorchJobClient](docs/PyTorchJobClient.md) | [get_job_status](docs/PyTorchJobClient.md#get_job_status) | Get the PyTorchJob status|
[PyTorchJobClient](docs/PyTorchJobClient.md) | [is_job_running](docs/PyTorchJobClient.md#is_job_running) | Check if the PyTorchJob running |
[PyTorchJobClient](docs/PyTorchJobClient.md) | [is_job_succeeded](docs/PyTorchJobClient.md#is_job_succeeded) | Check if the PyTorchJob Succeeded |
[PyTorchJobClient](docs/PyTorchJobClient.md) | [get_pod_names](docs/PyTorchJobClient.md#get_pod_names) | Get pod names of PyTorchJob |
[PyTorchJobClient](docs/PyTorchJobClient.md)| [get_logs](docs/PyTorchJobClient.md#get_logs) | Get training logs of the PyTorchJob |
## Documentation For Models

## Documentation For Models

- [V1JobCondition](docs/V1JobCondition.md)
- [V1JobStatus](docs/V1JobStatus.md)
- [V1MXJob](docs/V1MXJob.md)
- [V1MXJobList](docs/V1MXJobList.md)
- [V1MXJobSpec](docs/V1MXJobSpec.md)
- [V1PyTorchJob](docs/V1PyTorchJob.md)
- [V1PyTorchJobList](docs/V1PyTorchJobList.md)
- [V1PyTorchJobSpec](docs/V1PyTorchJobSpec.md)
- [V1ReplicaSpec](docs/V1ReplicaSpec.md)
- [V1ReplicaStatus](docs/V1ReplicaStatus.md)
- [V1RunPolicy](docs/V1RunPolicy.md)
- [V1SchedulingPolicy](docs/V1SchedulingPolicy.md)
- [V1TFJob](docs/V1TFJob.md)
- [V1TFJobList](docs/V1TFJobList.md)
- [V1TFJobSpec](docs/V1TFJobSpec.md)
- [V1XGBoostJob](docs/V1XGBoostJob.md)
- [V1XGBoostJobList](docs/V1XGBoostJobList.md)
- [V1XGBoostJobSpec](docs/V1XGBoostJobSpec.md)

5 changes: 3 additions & 2 deletions sdk/python/docs/V1JobCondition.md
@@ -1,10 +1,11 @@
# V1JobCondition

+JobCondition describes the state of the job at a certain point.
## Properties
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
-**last_transition_time** | [**V1Time**](V1Time.md) | Last time the condition transitioned from one status to another. | [optional]
-**last_update_time** | [**V1Time**](V1Time.md) | The last time this condition was updated. | [optional]
+**last_transition_time** | [**K8sIoApimachineryPkgApisMetaV1Time**](K8sIoApimachineryPkgApisMetaV1Time.md) | | [optional]
+**last_update_time** | [**K8sIoApimachineryPkgApisMetaV1Time**](K8sIoApimachineryPkgApisMetaV1Time.md) | | [optional]
**message** | **str** | A human readable message indicating details about the transition. | [optional]
**reason** | **str** | The reason for the condition's last transition. | [optional]
**status** | **str** | Status of the condition, one of True, False, Unknown. |
7 changes: 4 additions & 3 deletions sdk/python/docs/V1JobStatus.md
@@ -1,13 +1,14 @@
# V1JobStatus

+JobStatus represents the current observed state of the training Job.
## Properties
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
-**completion_time** | [**V1Time**](V1Time.md) | Represents time when the job was completed. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC. | [optional]
+**completion_time** | [**K8sIoApimachineryPkgApisMetaV1Time**](K8sIoApimachineryPkgApisMetaV1Time.md) | | [optional]
**conditions** | [**list[V1JobCondition]**](V1JobCondition.md) | Conditions is an array of current observed job conditions. |
-**last_reconcile_time** | [**V1Time**](V1Time.md) | Represents last time when the job was reconciled. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC. | [optional]
+**last_reconcile_time** | [**K8sIoApimachineryPkgApisMetaV1Time**](K8sIoApimachineryPkgApisMetaV1Time.md) | | [optional]
**replica_statuses** | [**dict(str, V1ReplicaStatus)**](V1ReplicaStatus.md) | ReplicaStatuses is map of ReplicaType and ReplicaStatus, specifies the status of each replica. |
-**start_time** | [**V1Time**](V1Time.md) | Represents time when the job was acknowledged by the job controller. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC. | [optional]
+**start_time** | [**K8sIoApimachineryPkgApisMetaV1Time**](K8sIoApimachineryPkgApisMetaV1Time.md) | | [optional]

[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)
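
As a rough illustration of how the `conditions` array can be read, the sketch below checks a job status that has been serialized to a plain dict. It is not the SDK's implementation of `is_job_succeeded`. The `type` key and the `reason` strings are assumptions, since the hunk above is truncated before the remaining `V1JobCondition` fields, and `has_condition` and `example_status` are hypothetical names.

```python
from typing import Any, Dict


def has_condition(job_status: Dict[str, Any], condition_type: str) -> bool:
    """Return True if a condition of the given type has status "True"."""
    for condition in job_status.get("conditions", []):
        if condition.get("type") == condition_type and condition.get("status") == "True":
            return True
    return False


# Hypothetical payload shaped like a V1JobStatus serialized to a dict.
example_status = {
    "conditions": [
        {"type": "Created", "status": "True", "reason": "JobCreated"},
        {"type": "Running", "status": "False", "reason": "JobRunning"},
        {"type": "Succeeded", "status": "True", "reason": "JobSucceeded"},
    ],
}

print(has_condition(example_status, "Succeeded"))
print(has_condition(example_status, "Failed"))
```

The check compares `status` against the string `"True"` because the table documents it as one of True, False, Unknown rather than as a boolean.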

15 changes: 15 additions & 0 deletions sdk/python/docs/V1MXJob.md
@@ -0,0 +1,15 @@
# V1MXJob

MXJob is the Schema for the mxjobs API
## Properties
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**api_version** | **str** | APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources | [optional]
**kind** | **str** | Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | [optional]
**metadata** | [**K8sIoApimachineryPkgApisMetaV1ObjectMeta**](K8sIoApimachineryPkgApisMetaV1ObjectMeta.md) | | [optional]
**spec** | [**V1MXJobSpec**](V1MXJobSpec.md) | | [optional]
**status** | [**V1JobStatus**](V1JobStatus.md) | | [optional]

[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)


14 changes: 14 additions & 0 deletions sdk/python/docs/V1MXJobList.md
@@ -0,0 +1,14 @@
# V1MXJobList

MXJobList contains a list of MXJob
## Properties
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**api_version** | **str** | APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources | [optional]
**items** | [**list[V1MXJob]**](V1MXJob.md) | |
**kind** | **str** | Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | [optional]
**metadata** | [**K8sIoApimachineryPkgApisMetaV1ListMeta**](K8sIoApimachineryPkgApisMetaV1ListMeta.md) | | [optional]

[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)


13 changes: 13 additions & 0 deletions sdk/python/docs/V1MXJobSpec.md
@@ -0,0 +1,13 @@
# V1MXJobSpec

MXJobSpec defines the desired state of MXJob
## Properties
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**job_mode** | **str** | JobMode specify the kind of MXjob to do. Different mode may have different MXReplicaSpecs request |
**mx_replica_specs** | [**dict(str, V1ReplicaSpec)**](V1ReplicaSpec.md) | MXReplicaSpecs is map of common.ReplicaType and common.ReplicaSpec specifies the MX replicas to run. For example, { \"Scheduler\": common.ReplicaSpec, \"Server\": common.ReplicaSpec, \"Worker\": common.ReplicaSpec, } |
**run_policy** | [**V1RunPolicy**](V1RunPolicy.md) | |

[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)


15 changes: 15 additions & 0 deletions sdk/python/docs/V1PyTorchJob.md
@@ -0,0 +1,15 @@
# V1PyTorchJob

PyTorchJob Represents a PyTorchJob resource.
## Properties
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**api_version** | **str** | APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources | [optional]
**kind** | **str** | Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | [optional]
**metadata** | [**K8sIoApimachineryPkgApisMetaV1ObjectMeta**](K8sIoApimachineryPkgApisMetaV1ObjectMeta.md) | | [optional]
**spec** | [**V1PyTorchJobSpec**](V1PyTorchJobSpec.md) | | [optional]
**status** | [**V1JobStatus**](V1JobStatus.md) | | [optional]

[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)


14 changes: 14 additions & 0 deletions sdk/python/docs/V1PyTorchJobList.md
@@ -0,0 +1,14 @@
# V1PyTorchJobList

PyTorchJobList is a list of PyTorchJobs.
## Properties
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**api_version** | **str** | APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources | [optional]
**items** | [**list[V1PyTorchJob]**](V1PyTorchJob.md) | List of PyTorchJobs. |
**kind** | **str** | Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | [optional]
**metadata** | [**K8sIoApimachineryPkgApisMetaV1ListMeta**](K8sIoApimachineryPkgApisMetaV1ListMeta.md) | | [optional]

[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)


12 changes: 12 additions & 0 deletions sdk/python/docs/V1PyTorchJobSpec.md
@@ -0,0 +1,12 @@
# V1PyTorchJobSpec

PyTorchJobSpec is a desired state description of the PyTorchJob.
## Properties
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**pytorch_replica_specs** | [**dict(str, V1ReplicaSpec)**](V1ReplicaSpec.md) | A map of PyTorchReplicaType (type) to ReplicaSpec (value). Specifies the PyTorch cluster configuration. For example, { \"Master\": PyTorchReplicaSpec, \"Worker\": PyTorchReplicaSpec, } |
**run_policy** | [**V1RunPolicy**](V1RunPolicy.md) | |

[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)
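
To make the `pytorch_replica_specs` map concrete, here is a minimal sketch of a PyTorchJob written as a plain Python dict, using the camelCase keys of the serialized resource rather than the generated model classes. The job name, namespace, container image, and the `replica_spec` helper are hypothetical, and the exact key spellings should be checked against the CRD before use.

```python
import json


def replica_spec(replicas: int) -> dict:
    """Build one ReplicaSpec entry (hypothetical helper, illustrative values)."""
    return {
        "replicas": replicas,
        "restartPolicy": "OnFailure",
        "template": {
            "spec": {
                "containers": [
                    # The image is a placeholder, not a real training image.
                    {"name": "pytorch", "image": "example.com/train:latest"}
                ]
            }
        },
    }


pytorch_job = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "PyTorchJob",
    "metadata": {"name": "example-pytorchjob", "namespace": "default"},
    "spec": {
        "runPolicy": {"cleanPodPolicy": "Running"},
        # Keys are replica types; values are ReplicaSpec objects.
        "pytorchReplicaSpecs": {
            "Master": replica_spec(1),
            "Worker": replica_spec(2),
        },
    },
}

print(json.dumps(pytorch_job, indent=2))
```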


3 changes: 2 additions & 1 deletion sdk/python/docs/V1ReplicaSpec.md
@@ -1,11 +1,12 @@
# V1ReplicaSpec

+ReplicaSpec is a description of the replica
## Properties
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**replicas** | **int** | Replicas is the desired number of replicas of the given template. If unspecified, defaults to 1. | [optional]
**restart_policy** | **str** | Restart policy for all replicas within the job. One of Always, OnFailure, Never and ExitCode. Default to Never. | [optional]
-**template** | [**V1PodTemplateSpec**](https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1PodTemplateSpec.md) | Template is the object that describes the pod that will be created for this replica. RestartPolicy in PodTemplateSpec will be overide by RestartPolicy in ReplicaSpec | [optional]
+**template** | [**K8sIoApiCoreV1PodTemplateSpec**](K8sIoApiCoreV1PodTemplateSpec.md) | | [optional]

[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)

1 change: 1 addition & 0 deletions sdk/python/docs/V1ReplicaStatus.md
@@ -1,5 +1,6 @@
# V1ReplicaStatus

+ReplicaStatus represents the current observed state of the replica.
## Properties
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
15 changes: 15 additions & 0 deletions sdk/python/docs/V1RunPolicy.md
@@ -0,0 +1,15 @@
# V1RunPolicy

RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
## Properties
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**active_deadline_seconds** | **int** | Specifies the duration in seconds relative to the startTime that the job may be active before the system tries to terminate it; value must be positive integer. | [optional]
**backoff_limit** | **int** | Optional number of retries before marking this job failed. | [optional]
**clean_pod_policy** | **str** | CleanPodPolicy defines the policy to kill pods after the job completes. Default to Running. | [optional]
**scheduling_policy** | [**V1SchedulingPolicy**](V1SchedulingPolicy.md) | | [optional]
**ttl_seconds_after_finished** | **int** | TTLSecondsAfterFinished is the TTL to clean up jobs. It may take extra ReconcilePeriod seconds for the cleanup, since reconcile gets called periodically. Default to infinite. | [optional]

[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)
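
The defaults and constraints in the table above can be summarized in a small stand-in class. This is a sketch only: `RunPolicySketch` is a hypothetical dataclass, not the generated `V1RunPolicy`, `None` is used here to model the "infinite" TTL default, and the camelCase keys in `to_dict` are assumed to match the serialized resource.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunPolicySketch:
    """Hypothetical stand-in for V1RunPolicy; not the generated class."""

    active_deadline_seconds: Optional[int] = None
    backoff_limit: Optional[int] = None
    clean_pod_policy: str = "Running"  # documented default
    ttl_seconds_after_finished: Optional[int] = None  # None models "infinite"

    def __post_init__(self) -> None:
        # The table requires a positive integer when the deadline is set.
        deadline = self.active_deadline_seconds
        if deadline is not None and deadline <= 0:
            raise ValueError("active_deadline_seconds must be a positive integer")

    def to_dict(self) -> dict:
        """Serialize only the fields that were set, using camelCase keys."""
        fields = {
            "activeDeadlineSeconds": self.active_deadline_seconds,
            "backoffLimit": self.backoff_limit,
            "cleanPodPolicy": self.clean_pod_policy,
            "ttlSecondsAfterFinished": self.ttl_seconds_after_finished,
        }
        return {key: value for key, value in fields.items() if value is not None}


print(RunPolicySketch(backoff_limit=3).to_dict())
```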


14 changes: 14 additions & 0 deletions sdk/python/docs/V1SchedulingPolicy.md
@@ -0,0 +1,14 @@
# V1SchedulingPolicy

SchedulingPolicy encapsulates various scheduling policies of the distributed training job, for example `minAvailable` for gang-scheduling.
## Properties
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**min_available** | **int** | | [optional]
**min_resources** | [**dict(str, K8sIoApimachineryPkgApiResourceQuantity)**](K8sIoApimachineryPkgApiResourceQuantity.md) | | [optional]
**priority_class** | **str** | | [optional]
**queue** | **str** | | [optional]

[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)


11 changes: 6 additions & 5 deletions sdk/python/docs/V1TFJob.md
@@ -1,13 +1,14 @@
# V1TFJob

+TFJob represents a TFJob resource.
## Properties
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
-**api_version** | **str** | APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#resources | [optional]
-**kind** | **str** | Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kinds | [optional]
-**metadata** | [**V1ObjectMeta**](https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1ObjectMeta.md) | Standard Kubernetes object's metadata. | [optional]
-**spec** | [**V1TFJobSpec**](V1TFJobSpec.md) | Specification of the desired state of the TFJob. | [optional]
-**status** | [**V1JobStatus**](V1JobStatus.md) | Most recently observed status of the TFJob. Read-only (modified by the system). | [optional]
+**api_version** | **str** | APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources | [optional]
+**kind** | **str** | Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | [optional]
+**metadata** | [**K8sIoApimachineryPkgApisMetaV1ObjectMeta**](K8sIoApimachineryPkgApisMetaV1ObjectMeta.md) | | [optional]
+**spec** | [**V1TFJobSpec**](V1TFJobSpec.md) | | [optional]
+**status** | [**V1JobStatus**](V1JobStatus.md) | | [optional]

[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)
