Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Python SDK for Kubeflow Training Operator #1420

Merged
merged 9 commits into from
Oct 3, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,15 @@ For users who prefer to use original tensorflow controllers, please checkout v1.
kubectl apply -k "github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=v1.2.0"
```

### Python SDK for Kubeflow Training Operator

Training Operator provides Python SDK for the custom resources. More docs are available in [sdk/python](sdk/python) folder.

Use `pip install` command to install the latest release of the SDK:
```
pip install kubeflow-training
```

## Quick Start

Please refer to the [quick-start-v1.md](docs/quick-start-v1.md) and [Kubeflow Training User Guide](https://www.kubeflow.org/docs/guides/components/tftraining/) for more information.
Expand Down
15 changes: 15 additions & 0 deletions docs/development/developer_guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,21 @@ kubectl create -f ./tf_job_mnist.yaml

On ubuntu the default go package appears to be gccgo-go which has problems see [issue](https://github.com/golang/go/issues/15429) golang-go package is also really old so install from golang tarballs instead.

## Generate Python SDK

To generate Python SDK for the operator, run:
```
./hack/python-sdk/gen-sdk.sh
```
This command will re-generate the api and model files together with the documentation and model tests.
The following files/folders in `sdk/python` are auto-generated and should not be modified directly:
Jeffwan marked this conversation as resolved.
Show resolved Hide resolved
```
docs
kubeflow/training/models
kubeflow/training/*.py
test/*.py
```

## Code Style

### Python
Expand Down
34 changes: 22 additions & 12 deletions hack/python-sdk/gen-sdk.sh
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
#!/usr/bin/env bash

# Copyright 2019 The Kubeflow Authors.
# Copyright 2021 The Kubeflow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
Expand All @@ -18,10 +18,12 @@ set -o errexit
set -o nounset
set -o pipefail

repo_root="$(realpath "$(dirname "$0")/../..")"

SWAGGER_JAR_URL="https://repo1.maven.org/maven2/org/openapitools/openapi-generator-cli/4.3.1/openapi-generator-cli-4.3.1.jar"
SWAGGER_CODEGEN_JAR="hack/python-sdk/openapi-generator-cli.jar"
SWAGGER_CODEGEN_CONF="hack/python-sdk/swagger_config.json"
SDK_OUTPUT_PATH="/tmp/sdk/python"
SWAGGER_CODEGEN_JAR="${repo_root}/hack/python-sdk/openapi-generator-cli.jar"
SWAGGER_CODEGEN_CONF="${repo_root}/hack/python-sdk/swagger_config.json"
SDK_OUTPUT_PATH="${repo_root}/sdk/python"
FRAMEWORKS=(tensorflow pytorch mxnet xgboost)
VERSION=1.3.0

Expand All @@ -32,14 +34,16 @@ fi
echo "Generating OpenAPI specification ..."
echo "./hack/update-codegen.sh already help us generate openapi specs ..."

echo "Downloading the swagger-codegen JAR package ..."
wget -O ${SWAGGER_CODEGEN_JAR} ${SWAGGER_JAR_URL}
if [[ ! -f "$SWAGGER_CODEGEN_JAR" ]]; then
echo "Downloading the swagger-codegen JAR package ..."
wget -O "${SWAGGER_CODEGEN_JAR}" ${SWAGGER_JAR_URL}
fi


for FRAMEWORK in ${FRAMEWORKS[@]}; do
SWAGGER_CODEGEN_FILE="pkg/apis/${FRAMEWORK}/v1/swagger.json"
echo "Generating swagger file for ${FRAMEWORK} ..."
go run hack/python-sdk/main.go ${FRAMEWORK} ${VERSION} > ${SWAGGER_CODEGEN_FILE}
go run "${repo_root}"/hack/python-sdk/main.go "${FRAMEWORK}" ${VERSION} > "${SWAGGER_CODEGEN_FILE}"
done

echo "Merging swagger files from different frameworks into one"
Expand All @@ -50,11 +54,17 @@ chmod +x /tmp/swagger

# it will report warning like 'v1.SchedulingPolicy' already exists in primary or higher priority mixin, skipping
# error code is not 0 but t's acceptable.
/tmp/swagger mixin pkg/apis/tensorflow/v1/swagger.json pkg/apis/pytorch/v1/swagger.json pkg/apis/mxnet/v1/swagger.json pkg/apis/xgboost/v1/swagger.json \
--output hack/python-sdk/swagger.json --quiet || true
/tmp/swagger mixin "${repo_root}"/pkg/apis/tensorflow/v1/swagger.json "${repo_root}"/pkg/apis/pytorch/v1/swagger.json \
"${repo_root}"/pkg/apis/mxnet/v1/swagger.json "${repo_root}"/pkg/apis/xgboost/v1/swagger.json \
--output "${repo_root}"/hack/python-sdk/swagger.json --quiet || true

echo "Generating Python SDK for ${FRAMEWORK} ..."
java -jar ${SWAGGER_CODEGEN_JAR} generate -i hack/python-sdk/swagger.json -g python -o ${SDK_OUTPUT_PATH} -c ${SWAGGER_CODEGEN_CONF}
echo "Removing previously generated files ..."
rm -rf "${SDK_OUTPUT_PATH}"/docs/V1*.md "${SDK_OUTPUT_PATH}"/kubeflow/training/models "${SDK_OUTPUT_PATH}"/kubeflow/training/*.py "${SDK_OUTPUT_PATH}"/test/*.py
echo "Generating Python SDK for Training Operator ..."
java -jar "${SWAGGER_CODEGEN_JAR}" generate -i "${repo_root}"/hack/python-sdk/swagger.json -g python -o "${SDK_OUTPUT_PATH}" -c "${SWAGGER_CODEGEN_CONF}"

echo "Kubeflow Training Operator Python SDK is generated successfully to folder ${SDK_OUTPUT_PATH}/."
rm /tmp/swagger
rm /tmp/swagger

echo "Running post-generation script ..."
"${repo_root}"/hack/python-sdk/post_gen.py
3 changes: 3 additions & 0 deletions hack/python-sdk/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,9 @@ func main() {
func swaggify(name, framework string) string {
name = strings.Replace(name, fmt.Sprintf("github.com/kubeflow/tf-operator/pkg/apis/%s/", framework), "", -1)
name = strings.Replace(name, "github.com/kubeflow/common/pkg/apis/common/", "", -1)
name = strings.Replace(name, "k8s.io/api/core/", "", -1)
name = strings.Replace(name, "k8s.io/apimachinery/pkg/apis/meta/", "", -1)
name = strings.Replace(name, "k8s.io/apimachinery/pkg/api/resource", "", -1)
name = strings.Replace(name, "/", ".", -1)
return name
}
65 changes: 65 additions & 0 deletions hack/python-sdk/post_gen.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
#!/usr/bin/env python
Jeffwan marked this conversation as resolved.
Show resolved Hide resolved

# Copyright 2021 The Kubeflow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
This script is used for updating generated SDK files.
"""

import os
import fileinput
import re

__replacements = [
("import kubeflow.training", "from kubeflow.training.models import *"),
("kubeflow.training.models.v1\/.*.v1.", "V1")
]

sdk_dir = os.path.abspath(os.path.join(__file__, "../../..", "sdk/python"))


def main():
fix_test_files()
add_imports()


def fix_test_files() -> None:
"""
Fix invalid model imports in generated model tests
"""
test_folder_dir = os.path.join(sdk_dir, "test")
test_files = os.listdir(test_folder_dir)
for test_file in test_files:
print(f"Precessing file {test_file}")
if test_file.endswith(".py"):
with fileinput.FileInput(os.path.join(test_folder_dir, test_file), inplace=True) as file:
for line in file:
print(_apply_regex(line), end='')


def add_imports() -> None:
with open(os.path.join(sdk_dir, "kubeflow/training/__init__.py"), "a") as init_file:
init_file.write("from kubeflow.training.api.tf_job_client import TFJobClient\n")
init_file.write("from kubeflow.training.api.py_torch_job_client import PyTorchJobClient\n")


def _apply_regex(input_str: str) -> str:
for pattern, replacement in __replacements:
input_str = re.sub(pattern, replacement, input_str)
return input_str


if __name__ == '__main__':
main()
32 changes: 16 additions & 16 deletions hack/python-sdk/swagger.json
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,11 @@
"properties": {
"lastTransitionTime": {
"description": "Last time the condition transitioned from one status to another.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"lastUpdateTime": {
"description": "The last time this condition was updated.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"message": {
"description": "A human readable message indicating details about the transition.",
Expand Down Expand Up @@ -51,7 +51,7 @@
"properties": {
"completionTime": {
"description": "Represents time when the job was completed. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"conditions": {
"description": "Conditions is an array of current observed job conditions.",
Expand All @@ -62,7 +62,7 @@
},
"lastReconcileTime": {
"description": "Represents last time when the job was reconciled. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
Jeffwan marked this conversation as resolved.
Show resolved Hide resolved
},
"replicaStatuses": {
"description": "ReplicaStatuses is map of ReplicaType and ReplicaStatus, specifies the status of each replica.",
Expand All @@ -73,7 +73,7 @@
},
"startTime": {
"description": "Represents time when the job was acknowledged by the job controller. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
}
}
},
Expand All @@ -90,7 +90,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ObjectMeta"
"$ref": "#/definitions/v1.ObjectMeta"
},
"spec": {
"$ref": "#/definitions/v1.MXJobSpec"
Expand Down Expand Up @@ -122,7 +122,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ListMeta"
"$ref": "#/definitions/v1.ListMeta"
}
}
},
Expand Down Expand Up @@ -169,7 +169,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ObjectMeta"
"$ref": "#/definitions/v1.ObjectMeta"
},
"spec": {
"description": "Specification of the desired state of the PyTorchJob.",
Expand Down Expand Up @@ -205,7 +205,7 @@
},
"metadata": {
"description": "Standard list metadata.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ListMeta"
"$ref": "#/definitions/v1.ListMeta"
}
}
},
Expand Down Expand Up @@ -245,7 +245,7 @@
},
"template": {
"description": "Template is the object that describes the pod that will be created for this replica. RestartPolicy in PodTemplateSpec will be overide by RestartPolicy in ReplicaSpec",
"$ref": "#/definitions/k8s.io.api.core.v1.PodTemplateSpec"
"$ref": "#/definitions/v1.PodTemplateSpec"
}
}
},
Expand Down Expand Up @@ -310,7 +310,7 @@
"minResources": {
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.api.resource.Quantity"
"$ref": "#/definitions/.Quantity"
}
},
"priorityClass": {
Expand All @@ -334,7 +334,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ObjectMeta"
"$ref": "#/definitions/v1.ObjectMeta"
},
"spec": {
"description": "Specification of the desired state of the TFJob.",
Expand Down Expand Up @@ -370,7 +370,7 @@
},
"metadata": {
"description": "Standard list metadata.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ListMeta"
"$ref": "#/definitions/v1.ListMeta"
}
}
},
Expand Down Expand Up @@ -416,7 +416,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ObjectMeta"
"$ref": "#/definitions/v1.ObjectMeta"
},
"spec": {
"$ref": "#/definitions/v1.XGBoostJobSpec"
Expand Down Expand Up @@ -448,7 +448,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ListMeta"
"$ref": "#/definitions/v1.ListMeta"
}
}
},
Expand All @@ -473,4 +473,4 @@
}
}
}
}
}
18 changes: 9 additions & 9 deletions pkg/apis/mxnet/v1/swagger.json
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,11 @@
"properties": {
"lastTransitionTime": {
"description": "Last time the condition transitioned from one status to another.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"lastUpdateTime": {
"description": "The last time this condition was updated.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"message": {
"description": "A human readable message indicating details about the transition.",
Expand Down Expand Up @@ -51,7 +51,7 @@
"properties": {
"completionTime": {
"description": "Represents time when the job was completed. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"conditions": {
"description": "Conditions is an array of current observed job conditions.",
Expand All @@ -62,7 +62,7 @@
},
"lastReconcileTime": {
"description": "Represents last time when the job was reconciled. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
},
"replicaStatuses": {
"description": "ReplicaStatuses is map of ReplicaType and ReplicaStatus, specifies the status of each replica.",
Expand All @@ -73,7 +73,7 @@
},
"startTime": {
"description": "Represents time when the job was acknowledged by the job controller. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.",
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.Time"
"$ref": "#/definitions/v1.Time"
}
}
},
Expand All @@ -90,7 +90,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ObjectMeta"
"$ref": "#/definitions/v1.ObjectMeta"
},
"spec": {
"$ref": "#/definitions/v1.MXJobSpec"
Expand Down Expand Up @@ -122,7 +122,7 @@
"type": "string"
},
"metadata": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.apis.meta.v1.ListMeta"
"$ref": "#/definitions/v1.ListMeta"
}
}
},
Expand Down Expand Up @@ -171,7 +171,7 @@
},
"template": {
"description": "Template is the object that describes the pod that will be created for this replica. RestartPolicy in PodTemplateSpec will be overide by RestartPolicy in ReplicaSpec",
"$ref": "#/definitions/k8s.io.api.core.v1.PodTemplateSpec"
"$ref": "#/definitions/v1.PodTemplateSpec"
}
}
},
Expand Down Expand Up @@ -236,7 +236,7 @@
"minResources": {
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/k8s.io.apimachinery.pkg.api.resource.Quantity"
"$ref": "#/definitions/.Quantity"
}
},
"priorityClass": {
Expand Down
Loading