[GSoC] Add unit tests for tune API #2423

Open: wants to merge 22 commits into base: master
Changes from 7 commits
12 changes: 11 additions & 1 deletion .github/workflows/test-python.yaml
@@ -21,7 +21,17 @@ jobs:
       - name: Setup Python
         uses: actions/setup-python@v5
         with:
-          python-version: 3.11
+          python-version: '3.10'
+
+      - name: Install Katib SDK
+        shell: bash
+        run: pip install --prefer-binary -e sdk/python/v1beta1
+
+      - name: Install Training Operator SDK
+        shell: bash
+        run: |
+          pip install git+https://github.com/kubeflow/training-operator.git@v1.8-branch#subdirectory=sdk/python
+          pip install peft==0.3.0 datasets==2.15.0 transformers==4.38.0

       - name: Run Python test
         run: make pytest
1 change: 1 addition & 0 deletions Makefile
@@ -172,6 +172,7 @@ pytest: prepare-pytest prepare-pytest-testdata
 	pytest ./test/unit/v1beta1/suggestion --ignore=./test/unit/v1beta1/suggestion/test_skopt_service.py
 	pytest ./test/unit/v1beta1/earlystopping
 	pytest ./test/unit/v1beta1/metricscollector
+	pytest ./test/unit/v1beta1/tune-api
 	cp ./pkg/apis/manager/v1beta1/python/api_pb2.py ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2.py
 	cp ./pkg/apis/manager/v1beta1/python/api_pb2_grpc.py ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2_grpc.py
 	sed -i "s/api_pb2/kubeflow\.katib\.katib_api_pb2/g" ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2_grpc.py
18 changes: 15 additions & 3 deletions sdk/python/v1beta1/kubeflow/katib/api/katib_client.py
@@ -416,6 +416,9 @@ class name in this argument.

         # If users choose to use a custom objective function.
         if objective is not None:
+            if not base_image or not parameters:
+                raise ValueError("One of the required parameters is None")
+
             # Add metrics collector to the Katib Experiment.
             # Up to now, we only support parameter `kind`, of which default value
             # is `StdOut`, to specify the kind of metrics collector.
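Review note: the guard in this hunk rejects any falsy value for `base_image` or `parameters` (so `""` and `{}` as well as `None`) whenever a custom objective is supplied. A minimal sketch of how that behavior could be exercised in isolation follows. The helper names `validate_custom_objective_args` and `raises_value_error` are hypothetical stand-ins written for illustration and are not part of the Katib SDK; only the condition and the error message are copied from the diff.

```python
from typing import Any, Callable, Dict, Optional


def validate_custom_objective_args(
    objective: Optional[Callable[..., Any]],
    base_image: Optional[str],
    parameters: Optional[Dict[str, Any]],
) -> None:
    """Hypothetical stand-in for the check added to tune() in this hunk."""
    if objective is not None:
        # Same condition as the diff: None, "" and {} are all rejected.
        if not base_image or not parameters:
            raise ValueError("One of the required parameters is None")


def raises_value_error(**kwargs: Any) -> bool:
    """Return True if the guard raises ValueError for the given arguments."""
    try:
        validate_custom_objective_args(**kwargs)
    except ValueError:
        return True
    return False


def dummy_objective(params: Dict[str, Any]) -> None:
    """Placeholder objective; its body is irrelevant to the guard."""
    return None
```

In a real test the same cases would more likely be written with `pytest.raises(ValueError)` against `KatibClient.tune` itself, with the Kubernetes client mocked out.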
@@ -633,6 +636,8 @@ class name in this argument.
                     model_provider_parameters.model_uri,
                     "--transformer_type",
                     model_provider_parameters.transformer_type.__name__,
+                    "--num_labels",
+                    str(model_provider_parameters.num_labels),
                     "--model_dir",
                     VOLUME_PATH_MODEL,
                     "--dataset_dir",
@@ -643,7 +648,11 @@ class name in this argument.
                     f"'{training_args}'",
                 ],
                 volume_mounts=[STORAGE_INITIALIZER_VOLUME_MOUNT],
-                resources=resources_per_trial.resources_per_worker,
+                resources=(
+                    resources_per_trial.resources_per_worker
+                    if resources_per_trial
+                    else None
+                ),
             )

             # Create the worker and the master pod.
@@ -677,7 +686,10 @@ class name in this argument.
                 ),
             )

-            if resources_per_trial.num_procs_per_worker:
+            if (
+                resources_per_trial is not None
+                and resources_per_trial.num_procs_per_worker
+            ):
                 pytorchjob.spec.nproc_per_node = str(
                     resources_per_trial.num_procs_per_worker
                 )
@@ -689,7 +701,7 @@ class name in this argument.
                 )
             )

-            if resources_per_trial.num_workers > 1:
+            if resources_per_trial is not None and resources_per_trial.num_workers > 1:
                 pytorchjob.spec.pytorch_replica_specs["Worker"] = (
                     training_models.KubeflowOrgV1ReplicaSpec(
                         replicas=resources_per_trial.num_workers - 1,
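Review note: the last three hunks in this file all make `resources_per_trial` optional, so a `None` value no longer causes an `AttributeError`. The sketch below gathers the three guards in one place to show the intended outcome. `TrialResources` and `plan_pytorchjob` are hypothetical names invented for this illustration; the real code builds a PyTorchJob spec through the Training Operator SDK, which is not imported here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TrialResources:
    """Hypothetical stand-in for the `resources_per_trial` argument."""

    num_workers: int = 1
    num_procs_per_worker: Optional[int] = None
    resources_per_worker: Optional[Dict[str, Any]] = None


def plan_pytorchjob(resources_per_trial: Optional[TrialResources]) -> Dict[str, Any]:
    """Apply the same three None-safe checks as the diff, on a plain dict."""
    plan: Dict[str, Any] = {
        # Container resources fall back to None when nothing was passed.
        "resources": (
            resources_per_trial.resources_per_worker if resources_per_trial else None
        ),
        "nproc_per_node": None,
        "worker_replicas": 0,
    }

    # nproc_per_node is only set when a per-worker process count is given.
    if resources_per_trial is not None and resources_per_trial.num_procs_per_worker:
        plan["nproc_per_node"] = str(resources_per_trial.num_procs_per_worker)

    # One replica is the master, so workers are num_workers - 1.
    if resources_per_trial is not None and resources_per_trial.num_workers > 1:
        plan["worker_replicas"] = resources_per_trial.num_workers - 1

    return plan
```

Calling `plan_pytorchjob(None)` leaves every field at its default, which is the case the new unit tests need in order to call `tune()` without specifying resources.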