feat: Hyperparameter Optimization APIs in Kubeflow SDK #124
Merged
17 commits
65e4dea Init commit (andreyvelich)
2bd1540 Create optimize() API (andreyvelich)
778c6a9 Set retain=True for Experiment (andreyvelich)
f7f7aba Fix location to Trainer utils (andreyvelich)
192299c Implement get_job, list_jobs, and delete_job APIs (andreyvelich)
bf0b93a Add metrics and parameters to Trial object (andreyvelich)
cdec3b9 Clarify message for objective (andreyvelich)
5e7d131 Move TrainJobTemplate to the Trainer types (andreyvelich)
55a89fa Rename CRD to CR (andreyvelich)
14c1497 Fix serialization of TrainJob (andreyvelich)
1353fc9 Rename ExecutionBackend to RuntimeBackend (andreyvelich)
a1bcab9 Export GridSearch (andreyvelich)
50d743f Add OptimizationJob constant (andreyvelich)
57c0a40 Change to BaseAlgorithm (andreyvelich)
6ca385e Keep func_args for Trainer (andreyvelich)
85c63f4 Use PyPI package for Katib models (andreyvelich)
a044087 Update lock file (andreyvelich)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -33,6 +33,7 @@ jobs: | ||
| ci | ||
| docs | ||
| examples | ||
| optimizer | ||
| scripts | ||
| test | ||
| trainer | ||
Empty file.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| # Copyright 2025 The Kubeflow Authors. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| # The default Kubernetes namespace. | ||
| DEFAULT_NAMESPACE = "default" | ||
|
|
||
| # How long to wait in seconds for requests to the Kubernetes API Server. | ||
| DEFAULT_TIMEOUT = 120 | ||
|
|
||
| # Unknown indicates that the value can't be identified. | ||
| UNKNOWN = "Unknown" |
File renamed without changes.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| # Copyright 2025 The Kubeflow Authors. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| import os | ||
| from typing import Optional | ||
|
|
||
| from kubernetes import config | ||
|
|
||
| from kubeflow.common import constants | ||
|
|
||
|
|
||
| def is_running_in_k8s() -> bool: | ||
| return os.path.isdir("/var/run/secrets/kubernetes.io/") | ||
|
|
||
|
|
||
| def get_default_target_namespace(context: Optional[str] = None) -> str: | ||
| if not is_running_in_k8s(): | ||
| try: | ||
| all_contexts, current_context = config.list_kube_config_contexts() | ||
| # If context is set, we should get namespace from it. | ||
| if context: | ||
| for c in all_contexts: | ||
| if isinstance(c, dict) and c.get("name") == context: | ||
| return c["context"]["namespace"] | ||
| # Otherwise, try to get namespace from the current context. | ||
| return current_context["context"]["namespace"] | ||
| except Exception: | ||
| return constants.DEFAULT_NAMESPACE | ||
| with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace") as f: | ||
| return f.readline() |
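The namespace lookup above can be sketched as a pure function, which makes the fallback order easier to see. This is a minimal sketch, not the SDK's code: `resolve_namespace` is a hypothetical name, and the context entries are assumed to have the shape returned by `config.list_kube_config_contexts()` (a `name` key plus a nested `context` dict).

```python
from typing import Any, Optional

# Mirrors DEFAULT_NAMESPACE from the constants file in this diff.
DEFAULT_NAMESPACE = "default"


def resolve_namespace(
    all_contexts: list[dict[str, Any]],
    current_context: dict[str, Any],
    context: Optional[str] = None,
) -> str:
    """Pick a namespace from kubeconfig-style context entries (hypothetical helper)."""
    try:
        # If a context name is given, prefer the namespace of the matching entry.
        if context:
            for c in all_contexts:
                if isinstance(c, dict) and c.get("name") == context:
                    return c["context"]["namespace"]
        # Otherwise (or if no entry matched), use the current context.
        return current_context["context"]["namespace"]
    except (KeyError, TypeError):
        # A context without a namespace falls back to the default.
        return DEFAULT_NAMESPACE


contexts = [
    {"name": "dev", "context": {"cluster": "c1", "user": "u1", "namespace": "team-a"}},
    {"name": "prod", "context": {"cluster": "c2", "user": "u2"}},
]
print(resolve_namespace(contexts, contexts[0]))                  # team-a
print(resolve_namespace(contexts, contexts[0], context="prod"))  # default
```

Note that a context name that matches no entry silently falls through to the current context, which is also what the function in the diff does.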
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| # Copyright 2025 The Kubeflow Authors. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| # Import common types. | ||
| from kubeflow.common.types import KubernetesBackendConfig | ||
|
|
||
| # Import the Kubeflow Optimizer client. | ||
| from kubeflow.optimizer.api.optimizer_client import OptimizerClient | ||
|
|
||
| # Import the Kubeflow Optimizer types. | ||
| from kubeflow.optimizer.types.algorithm_types import GridSearch, RandomSearch | ||
| from kubeflow.optimizer.types.optimization_types import Objective, OptimizationJob, TrialConfig | ||
| from kubeflow.optimizer.types.search_types import Search | ||
|
|
||
| # Import the Kubeflow Trainer types. | ||
| from kubeflow.trainer.types.types import TrainJobTemplate | ||
|
|
||
| __all__ = [ | ||
| "GridSearch", | ||
| "KubernetesBackendConfig", | ||
| "Objective", | ||
| "OptimizationJob", | ||
| "OptimizerClient", | ||
| "RandomSearch", | ||
| "Search", | ||
| "TrainJobTemplate", | ||
| "TrialConfig", | ||
| ] | ||
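Since the package re-exports its public names through `__all__`, a small check can catch a name that was imported but not exported. The sketch below is an assumption-laden illustration: `missing_exports` is a hypothetical helper, and `demo` is a stand-in module built in memory, not the real `kubeflow.optimizer` package.

```python
import types


def missing_exports(module: types.ModuleType) -> list[str]:
    """Return public attributes that are not listed in the module's __all__."""
    exported = set(getattr(module, "__all__", []))
    public = [
        name
        for name, value in vars(module).items()
        # Skip private/dunder names and imported submodules.
        if not name.startswith("_") and not isinstance(value, types.ModuleType)
    ]
    return sorted(name for name in public if name not in exported)


# Hypothetical stand-in for a package __init__ with one forgotten export.
demo = types.ModuleType("demo_optimizer")
demo.GridSearch = type("GridSearch", (), {})
demo.RandomSearch = type("RandomSearch", (), {})
demo.__all__ = ["RandomSearch"]

print(missing_exports(demo))  # ['GridSearch']
```

A check like this could run in a unit test against the real package, at the cost of flagging any helper that is intentionally left out of `__all__`.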
Empty file.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,126 @@ | ||
| # Copyright 2025 The Kubeflow Authors. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| import logging | ||
| from typing import Any, Optional | ||
|
|
||
| from kubeflow.common.types import KubernetesBackendConfig | ||
| from kubeflow.optimizer.backends.kubernetes.backend import KubernetesBackend | ||
| from kubeflow.optimizer.types.algorithm_types import BaseAlgorithm | ||
| from kubeflow.optimizer.types.optimization_types import Objective, OptimizationJob, TrialConfig | ||
| from kubeflow.trainer.types.types import TrainJobTemplate | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class OptimizerClient: | ||
| def __init__( | ||
| self, | ||
| backend_config: Optional[KubernetesBackendConfig] = None, | ||
| ): | ||
| """Initialize a Kubeflow Optimizer client. | ||
|
|
||
| Args: | ||
| backend_config: Backend configuration. Pass a KubernetesBackendConfig, or None to use | ||
| the default KubernetesBackendConfig. | ||
|
|
||
| Raises: | ||
| ValueError: Invalid backend configuration. | ||
|
|
||
| """ | ||
| # Set the default backend config. | ||
| if not backend_config: | ||
| backend_config = KubernetesBackendConfig() | ||
|
Contributor (review comment): nit: for consistency, shall we match the trainer and use the same import style?
    if not backend_config:
        backend_config = common_types.KubernetesBackendConfig()
Member, Author (reply): Let me go the other way around, though. |
||
|
|
||
| if isinstance(backend_config, KubernetesBackendConfig): | ||
| self.backend = KubernetesBackend(backend_config) | ||
| else: | ||
| raise ValueError(f"Invalid backend config '{backend_config}'") | ||
|
|
||
| def optimize( | ||
| self, | ||
| trial_template: TrainJobTemplate, | ||
| *, | ||
| trial_config: Optional[TrialConfig] = None, | ||
| search_space: dict[str, Any], | ||
| objectives: Optional[list[Objective]] = None, | ||
| algorithm: Optional[BaseAlgorithm] = None, | ||
| ) -> str: | ||
| """Create an OptimizationJob for hyperparameter tuning. | ||
|
|
||
| Args: | ||
| trial_template: The TrainJob template defining the training script. | ||
| trial_config: Optional configuration to run Trials. | ||
| objectives: List of objectives to optimize. | ||
| search_space: Dictionary mapping parameter names to Search specifications using | ||
| Search.uniform(), Search.loguniform(), Search.choice(), etc. | ||
| algorithm: The optimization algorithm to use. Defaults to RandomSearch. | ||
|
|
||
| Returns: | ||
| The unique name of the Experiment that has been generated. | ||
|
|
||
| Raises: | ||
| ValueError: Input arguments are invalid. | ||
| TimeoutError: Timeout to create Experiment. | ||
| RuntimeError: Failed to create Experiment. | ||
| """ | ||
| return self.backend.optimize( | ||
| trial_template=trial_template, | ||
| trial_config=trial_config, | ||
| objectives=objectives, | ||
| search_space=search_space, | ||
| algorithm=algorithm, | ||
| ) | ||
|
|
||
| def list_jobs(self) -> list[OptimizationJob]: | ||
| """List the created OptimizationJobs. | ||
|
|
||
| Returns: | ||
| List of created OptimizationJobs. If no OptimizationJobs exist, | ||
| an empty list is returned. | ||
|
|
||
| Raises: | ||
| TimeoutError: Timeout to list OptimizationJobs. | ||
| RuntimeError: Failed to list OptimizationJobs. | ||
| """ | ||
|
|
||
| return self.backend.list_jobs() | ||
|
|
||
| def get_job(self, name: str) -> OptimizationJob: | ||
| """Get the OptimizationJob object. | ||
|
|
||
| Args: | ||
| name: Name of the OptimizationJob. | ||
|
|
||
| Returns: | ||
| An OptimizationJob object. | ||
|
|
||
| Raises: | ||
| TimeoutError: Timeout to get an OptimizationJob. | ||
| RuntimeError: Failed to get an OptimizationJob. | ||
| """ | ||
|
|
||
| return self.backend.get_job(name=name) | ||
|
|
||
| def delete_job(self, name: str): | ||
| """Delete the OptimizationJob. | ||
|
|
||
| Args: | ||
| name: Name of the OptimizationJob. | ||
|
|
||
| Raises: | ||
| TimeoutError: Timeout to delete OptimizationJob. | ||
| RuntimeError: Failed to delete OptimizationJob. | ||
| """ | ||
| return self.backend.delete_job(name=name) | ||
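The client above is a thin wrapper: it validates the config type, builds a backend, and forwards every call to it, with `optimize()` taking everything after the template as keyword-only arguments. The following is a minimal sketch of that pattern using in-memory stand-ins. `SketchBackendConfig`, `SketchBackend`, and `SketchClient` are hypothetical names, and the real backend creates resources on a Kubernetes cluster rather than in a dict.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchBackendConfig:
    """Hypothetical stand-in for KubernetesBackendConfig."""

    namespace: str = "default"


class SketchBackend:
    """Hypothetical in-memory backend; the real one talks to the Kubernetes API."""

    def __init__(self, cfg: SketchBackendConfig):
        self.cfg = cfg
        self.jobs: dict[str, dict[str, Any]] = {}
        self._created = 0

    def optimize(self, *, trial_template: Any, search_space: dict[str, Any]) -> str:
        # Generate a name and remember the request instead of calling a cluster.
        self._created += 1
        name = f"job-{self._created}"
        self.jobs[name] = {"template": trial_template, "search_space": search_space}
        return name

    def list_jobs(self) -> list[str]:
        return list(self.jobs)

    def delete_job(self, name: str) -> None:
        del self.jobs[name]


class SketchClient:
    def __init__(self, backend_config: Optional[SketchBackendConfig] = None):
        # Fall back to a default config, then dispatch on the config type.
        if not backend_config:
            backend_config = SketchBackendConfig()
        if isinstance(backend_config, SketchBackendConfig):
            self.backend = SketchBackend(backend_config)
        else:
            raise ValueError(f"Invalid backend config '{backend_config}'")

    def optimize(self, trial_template: Any, *, search_space: dict[str, Any]) -> str:
        # The bare * makes search_space keyword-only, as in the diff.
        return self.backend.optimize(trial_template=trial_template, search_space=search_space)

    def list_jobs(self) -> list[str]:
        return self.backend.list_jobs()

    def delete_job(self, name: str) -> None:
        self.backend.delete_job(name=name)


client = SketchClient()
name = client.optimize("train-script", search_space={"lr": [0.01, 0.1]})
print(name, client.list_jobs())
```

Dispatching on the config type keeps the public client stable if other backends are added later; passing `search_space` positionally raises a `TypeError` in this sketch.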
2 changes: 1 addition & 1 deletion
kubeflow/trainer/utils/__init__.py → kubeflow/optimizer/backends/__init__.py
Shall we add GridSearch here?
Good catch!