Skip to content

Commit

Permalink
Add Python SDK for Kubeflow Training Operator (#1420)
Browse files Browse the repository at this point in the history
* Add Python SDK for Kubeflow Training Operator

* Update example notebooks

* Update SDK generation tooling and docs

* Re-generate SDK

* Allow to specify container name in 'get_logs' methods

* Generalize job labels

* Check if attribute exists

* Address code review comments
  • Loading branch information
alembiewski authored Oct 3, 2021
1 parent 4aac6d1 commit 6523d8d
Show file tree
Hide file tree
Showing 102 changed files with 7,438 additions and 2,164 deletions.
9 changes: 9 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,15 @@ For users who prefer to use original tensorflow controllers, please checkout v1.
kubectl apply -k "github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=v1.2.0"
```

### Python SDK for Kubeflow Training Operator

Training Operator provides Python SDK for the custom resources. More docs are available in [sdk/python](sdk/python) folder.

Use `pip install` command to install the latest release of the SDK:
```
pip install kubeflow-training
```

## Quick Start

Please refer to the [quick-start-v1.md](docs/quick-start-v1.md) and [Kubeflow Training User Guide](https://www.kubeflow.org/docs/guides/components/tftraining/) for more information.
Expand Down
15 changes: 15 additions & 0 deletions docs/development/developer_guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,21 @@ kubectl create -f ./tf_job_mnist.yaml

On ubuntu the default go package appears to be gccgo-go which has problems see [issue](https://github.com/golang/go/issues/15429) golang-go package is also really old so install from golang tarballs instead.

## Generate Python SDK

To generate Python SDK for the operator, run:
```
./hack/python-sdk/gen-sdk.sh
```
This command will re-generate the api and model files together with the documentation and model tests.
The following files/folders in `sdk/python` are auto-generated and should not be modified directly:
```
docs
kubeflow/training/models
kubeflow/training/*.py
test/*.py
```

## Code Style

### Python
Expand Down
34 changes: 22 additions & 12 deletions hack/python-sdk/gen-sdk.sh
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
#!/usr/bin/env bash

# Copyright 2019 The Kubeflow Authors.
# Copyright 2021 The Kubeflow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
Expand All @@ -18,10 +18,12 @@ set -o errexit
set -o nounset
set -o pipefail

repo_root="$(realpath "$(dirname "$0")/../..")"

SWAGGER_JAR_URL="https://repo1.maven.org/maven2/org/openapitools/openapi-generator-cli/4.3.1/openapi-generator-cli-4.3.1.jar"
SWAGGER_CODEGEN_JAR="hack/python-sdk/openapi-generator-cli.jar"
SWAGGER_CODEGEN_CONF="hack/python-sdk/swagger_config.json"
SDK_OUTPUT_PATH="/tmp/sdk/python"
SWAGGER_CODEGEN_JAR="${repo_root}/hack/python-sdk/openapi-generator-cli.jar"
SWAGGER_CODEGEN_CONF="${repo_root}/hack/python-sdk/swagger_config.json"
SDK_OUTPUT_PATH="${repo_root}/sdk/python"
FRAMEWORKS=(tensorflow pytorch mxnet xgboost)
VERSION=1.3.0

Expand All @@ -32,14 +34,16 @@ fi
echo "Generating OpenAPI specification ..."
echo "./hack/update-codegen.sh already help us generate openapi specs ..."

echo "Downloading the swagger-codegen JAR package ..."
wget -O ${SWAGGER_CODEGEN_JAR} ${SWAGGER_JAR_URL}
if [[ ! -f "$SWAGGER_CODEGEN_JAR" ]]; then
echo "Downloading the swagger-codegen JAR package ..."
wget -O "${SWAGGER_CODEGEN_JAR}" ${SWAGGER_JAR_URL}
fi


for FRAMEWORK in ${FRAMEWORKS[@]}; do
SWAGGER_CODEGEN_FILE="pkg/apis/${FRAMEWORK}/v1/swagger.json"
echo "Generating swagger file for ${FRAMEWORK} ..."
go run hack/python-sdk/main.go ${FRAMEWORK} ${VERSION} > ${SWAGGER_CODEGEN_FILE}
go run "${repo_root}"/hack/python-sdk/main.go "${FRAMEWORK}" ${VERSION} > "${SWAGGER_CODEGEN_FILE}"
done

echo "Merging swagger files from different frameworks into one"
Expand All @@ -50,11 +54,17 @@ chmod +x /tmp/swagger

# it will report warning like 'v1.SchedulingPolicy' already exists in primary or higher priority mixin, skipping
# error code is not 0 but t's acceptable.
/tmp/swagger mixin pkg/apis/tensorflow/v1/swagger.json pkg/apis/pytorch/v1/swagger.json pkg/apis/mxnet/v1/swagger.json pkg/apis/xgboost/v1/swagger.json \
--output hack/python-sdk/swagger.json --quiet || true
/tmp/swagger mixin "${repo_root}"/pkg/apis/tensorflow/v1/swagger.json "${repo_root}"/pkg/apis/pytorch/v1/swagger.json \
"${repo_root}"/pkg/apis/mxnet/v1/swagger.json "${repo_root}"/pkg/apis/xgboost/v1/swagger.json \
--output "${repo_root}"/hack/python-sdk/swagger.json --quiet || true

echo "Generating Python SDK for ${FRAMEWORK} ..."
java -jar ${SWAGGER_CODEGEN_JAR} generate -i hack/python-sdk/swagger.json -g python -o ${SDK_OUTPUT_PATH} -c ${SWAGGER_CODEGEN_CONF}
echo "Removing previously generated files ..."
rm -rf "${SDK_OUTPUT_PATH}"/docs/V1*.md "${SDK_OUTPUT_PATH}"/kubeflow/training/models "${SDK_OUTPUT_PATH}"/kubeflow/training/*.py "${SDK_OUTPUT_PATH}"/test/*.py
echo "Generating Python SDK for Training Operator ..."
java -jar "${SWAGGER_CODEGEN_JAR}" generate -i "${repo_root}"/hack/python-sdk/swagger.json -g python -o "${SDK_OUTPUT_PATH}" -c "${SWAGGER_CODEGEN_CONF}"

echo "Kubeflow Training Operator Python SDK is generated successfully to folder ${SDK_OUTPUT_PATH}/."
rm /tmp/swagger
rm /tmp/swagger

echo "Running post-generation script ..."
"${repo_root}"/hack/python-sdk/post_gen.py
3 changes: 3 additions & 0 deletions hack/python-sdk/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,9 @@ func main() {
func swaggify(name, framework string) string {
name = strings.Replace(name, fmt.Sprintf("github.com/kubeflow/tf-operator/pkg/apis/%s/", framework), "", -1)
name = strings.Replace(name, "github.com/kubeflow/common/pkg/apis/common/", "", -1)
name = strings.Replace(name, "k8s.io/api/core/", "", -1)
name = strings.Replace(name, "k8s.io/apimachinery/pkg/apis/meta/", "", -1)
name = strings.Replace(name, "k8s.io/apimachinery/pkg/api/resource", "", -1)
name = strings.Replace(name, "/", ".", -1)
return name
}
65 changes: 65 additions & 0 deletions hack/python-sdk/post_gen.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
#!/usr/bin/env python

# Copyright 2021 The Kubeflow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
This script is used for updating generated SDK files.
"""

import os
import fileinput
import re

__replacements = [
("import kubeflow.training", "from kubeflow.training.models import *"),
("kubeflow.training.models.v1\/.*.v1.", "V1")
]

sdk_dir = os.path.abspath(os.path.join(__file__, "../../..", "sdk/python"))


def main():
fix_test_files()
add_imports()


def fix_test_files() -> None:
"""
Fix invalid model imports in generated model tests
"""
test_folder_dir = os.path.join(sdk_dir, "test")
test_files = os.listdir(test_folder_dir)
for test_file in test_files:
print(f"Precessing file {test_file}")
if test_file.endswith(".py"):
with fileinput.FileInput(os.path.join(test_folder_dir, test_file), inplace=True) as file:
for line in file:
print(_apply_regex(line), end='')


def add_imports() -> None:
with open(os.path.join(sdk_dir, "kubeflow/training/__init__.py"), "a") as init_file:
init_file.write("from kubeflow.training.api.tf_job_client import TFJobClient\n")
init_file.write("from kubeflow.training.api.py_torch_job_client import PyTorchJobClient\n")


def _apply_regex(input_str: str) -> str:
for pattern, replacement in __replacements:
input_str = re.sub(pattern, replacement, input_str)
return input_str


if __name__ == '__main__':
main()
32 changes: 16 additions & 16 deletions hack/python-sdk/swagger.json
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,11 @@
"properties": {
"lastTransitionTime": {
"description": "Last time the condition transitioned from one status to another.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"lastUpdateTime": {
"description": "The last time this condition was updated.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"message": {
"description": "A human readable message indicating details about the transition.",
Expand Down Expand Up @@ -51,7 +51,7 @@
"properties": {
"completionTime": {
"description": "Represents time when the job was completed. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"conditions": {
"description": "Conditions is an array of current observed job conditions.",
Expand All @@ -62,7 +62,7 @@
},
"lastReconcileTime": {
"description": "Represents last time when the job was reconciled. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"replicaStatuses": {
"description": "ReplicaStatuses is map of ReplicaType and ReplicaStatus, specifies the status of each replica.",
Expand All @@ -73,7 +73,7 @@
},
"startTime": {
"description": "Represents time when the job was acknowledged by the job controller. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
}
}
},
Expand All @@ -90,7 +90,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ObjectMeta"
"$ref": "#/definitions/v1.ObjectMeta"
},
"spec": {
"$ref": "#/definitions/v1.MXJobSpec"
Expand Down Expand Up @@ -122,7 +122,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ListMeta"
"$ref": "#/definitions/v1.ListMeta"
}
}
},
Expand Down Expand Up @@ -169,7 +169,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ObjectMeta"
"$ref": "#/definitions/v1.ObjectMeta"
},
"spec": {
"description": "Specification of the desired state of the PyTorchJob.",
Expand Down Expand Up @@ -205,7 +205,7 @@
},
"metadata": {
"description": "Standard list metadata.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ListMeta"
"$ref": "#/definitions/v1.ListMeta"
}
}
},
Expand Down Expand Up @@ -245,7 +245,7 @@
},
"template": {
"description": "Template is the object that describes the pod that will be created for this replica. RestartPolicy in PodTemplateSpec will be overide by RestartPolicy in ReplicaSpec",
"$ref": "#/definitions/k8s.io.api.core.v1.PodTemplateSpec"
"$ref": "#/definitions/v1.PodTemplateSpec"
}
}
},
Expand Down Expand Up @@ -310,7 +310,7 @@
"minResources": {
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.api.resource.Quantity"
"$ref": "#/definitions/.Quantity"
}
},
"priorityClass": {
Expand All @@ -334,7 +334,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ObjectMeta"
"$ref": "#/definitions/v1.ObjectMeta"
},
"spec": {
"description": "Specification of the desired state of the TFJob.",
Expand Down Expand Up @@ -370,7 +370,7 @@
},
"metadata": {
"description": "Standard list metadata.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ListMeta"
"$ref": "#/definitions/v1.ListMeta"
}
}
},
Expand Down Expand Up @@ -416,7 +416,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ObjectMeta"
"$ref": "#/definitions/v1.ObjectMeta"
},
"spec": {
"$ref": "#/definitions/v1.XGBoostJobSpec"
Expand Down Expand Up @@ -448,7 +448,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ListMeta"
"$ref": "#/definitions/v1.ListMeta"
}
}
},
Expand All @@ -473,4 +473,4 @@
}
}
}
}
}
18 changes: 9 additions & 9 deletions pkg/apis/mxnet/v1/swagger.json
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,11 @@
"properties": {
"lastTransitionTime": {
"description": "Last time the condition transitioned from one status to another.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"lastUpdateTime": {
"description": "The last time this condition was updated.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"message": {
"description": "A human readable message indicating details about the transition.",
Expand Down Expand Up @@ -51,7 +51,7 @@
"properties": {
"completionTime": {
"description": "Represents time when the job was completed. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"conditions": {
"description": "Conditions is an array of current observed job conditions.",
Expand All @@ -62,7 +62,7 @@
},
"lastReconcileTime": {
"description": "Represents last time when the job was reconciled. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"replicaStatuses": {
"description": "ReplicaStatuses is map of ReplicaType and ReplicaStatus, specifies the status of each replica.",
Expand All @@ -73,7 +73,7 @@
},
"startTime": {
"description": "Represents time when the job was acknowledged by the job controller. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
}
}
},
Expand All @@ -90,7 +90,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ObjectMeta"
"$ref": "#/definitions/v1.ObjectMeta"
},
"spec": {
"$ref": "#/definitions/v1.MXJobSpec"
Expand Down Expand Up @@ -122,7 +122,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ListMeta"
"$ref": "#/definitions/v1.ListMeta"
}
}
},
Expand Down Expand Up @@ -171,7 +171,7 @@
},
"template": {
"description": "Template is the object that describes the pod that will be created for this replica. RestartPolicy in PodTemplateSpec will be overide by RestartPolicy in ReplicaSpec",
"$ref": "#/definitions/k8s.io.api.core.v1.PodTemplateSpec"
"$ref": "#/definitions/v1.PodTemplateSpec"
}
}
},
Expand Down Expand Up @@ -236,7 +236,7 @@
"minResources": {
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.api.resource.Quantity"
"$ref": "#/definitions/.Quantity"
}
},
"priorityClass": {
Expand Down
Loading

0 comments on commit 6523d8d

Please sign in to comment.