Pipeline and Katib Integration #331
Comments
Maybe we can add an example of a pipeline embedding a Katib StudyJob. I think putting the example in the pipelines project is better.
/assign
An example of using Pipelines to orchestrate hyperparameter tuning would be great. You can take a look at the TFJob launcher to get some sense of what a component for Katib might look like. There is an open issue to figure out best practices with respect to integrating pipelines with K8s resources; see kubeflow/pipelines#677
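Following the TFJob launcher pattern, a Katib launcher component would essentially build an Experiment manifest and submit it to the cluster, then wait for completion. A rough, hypothetical sketch of the manifest-building half (field names follow the Katib v1beta1 Experiment CRD; the trial container spec and the actual submit call are elided, and the function name and defaults are illustrative, not a published API):

```python
def make_katib_experiment(name, namespace="kubeflow",
                          max_trials=12, objective_metric="accuracy"):
    """Build a Katib v1beta1 Experiment manifest as a plain dict.

    A launcher component (analogous to the TFJob launcher) would submit
    this manifest to the cluster and block until the experiment finishes.
    """
    return {
        "apiVersion": "kubeflow.org/v1beta1",
        "kind": "Experiment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "objective": {
                "type": "maximize",
                "objectiveMetricName": objective_metric,
            },
            "algorithm": {"algorithmName": "random"},
            "maxTrialCount": max_trials,
            "parameters": [
                {"name": "lr", "parameterType": "double",
                 "feasibleSpace": {"min": "0.001", "max": "0.1"}},
            ],
            "trialTemplate": {
                # The trial would run the same training image the
                # pipeline's training component uses; container spec elided.
                "trialSpec": {"apiVersion": "batch/v1", "kind": "Job"},
            },
        },
    }
```

The launcher would then pass the manifest to the Kubernetes API and surface the experiment's result as a component output.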
/close
@hougangliu: Closing this issue in response to the `/close` above.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cc @animeshsingh
What is the current status for integrating Katib into a Kubeflow pipeline? I have a train pipeline that involves components for (a) fetching data (b) preprocessing the data and (c) training. I would like to do hyperparameter tuning over the train component. From a first look at the Katib documentation it appears not to have a native integration with Pipelines: you specify a container/command that does the training and fire up your katib experiment, but it doesn't appear that it can itself be a component without wrapping it somehow. It seems my main options are:
The first two seem overly complicated. The third approach seems the most natural, but to some extent undermines the point of using a pipeline in the first place, since the pipeline only strings together data downloading and preprocessing. It would be good to have a pipeline where the input is some data source and the final output is the best model from a hyperparameter tuning experiment.
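The desired data flow (fetch → preprocess → tune → best model) can at least be prototyped by treating the tuning step as an ordinary component that consumes the preprocessed data and emits the best parameters. A toy, stdlib-only sketch of that shape, where every step body is a stand-in for the real component:

```python
def fetch_data():
    # Stand-in for the real data-fetching component.
    return list(range(1, 21))

def preprocess(data):
    # Stand-in: scale raw values into [0, 1].
    top = max(data)
    return [x / top for x in data]

def train(data, lr):
    # Stand-in training step: returns a "loss", lower is better.
    # A real component would train a model and report a metric to Katib.
    return abs(lr - 0.01) + 1.0 / (1 + len(data))

def tune(data, candidates):
    # Stand-in for the tuning step: in a real pipeline this would submit
    # a Katib Experiment and wait for the optimal trial instead of
    # looping over candidates locally.
    results = {lr: train(data, lr) for lr in candidates}
    best_lr = min(results, key=results.get)
    return best_lr, results[best_lr]

data = preprocess(fetch_data())
best_lr, best_loss = tune(data, [0.001, 0.01, 0.1])
```

The open question in this thread is exactly how to replace the local `tune` loop with a Katib Experiment while keeping the upstream steps as pipeline components.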
@oadams I think the first option is possible and not that complicated; there is an example of such an approach at https://github.com/kubeflow/katib/blob/master/examples/v1beta1/argo/argo-workflow.yaml. It would allow tuning any part of the pipeline, or multiple parts at the same time, which is the most flexible approach IMO. However, I struggle to make it work using the Python DSL. I'm not sure it was designed to work that way, but technically there shouldn't be a problem making it work.
Katib is used for hyperparameter tuning, and Pipelines is used for end-to-end ML workflows. A pipeline may need parameters produced by Katib to improve efficiency, for example retrieving the best parameters found by an experiment. Is there any documentation or best practice for how a pipeline can get the parameters generated by Katib?
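Once an Experiment finishes, Katib reports the winning trial under `status.currentOptimalTrial`, so a downstream pipeline step can read the best parameters from there (e.g. from `kubectl get experiment <name> -o json`). A minimal stdlib sketch of parsing that status, assuming the v1beta1 status layout:

```python
def best_parameters(experiment):
    """Extract the optimal parameter assignments from a finished Katib
    Experiment, given the Experiment object as a dict."""
    optimal = experiment["status"]["currentOptimalTrial"]
    return {p["name"]: p["value"]
            for p in optimal["parameterAssignments"]}

# Example fragment in the shape a v1beta1 Experiment status reports
# (trial name and values are made up for illustration):
experiment = {
    "status": {
        "currentOptimalTrial": {
            "bestTrialName": "tune-train-abc123",
            "parameterAssignments": [
                {"name": "lr", "value": "0.013"},
                {"name": "num-layers", "value": "3"},
            ],
        }
    }
}
```

The returned mapping can then be passed as ordinary arguments to the pipeline's training or deployment components.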
FYI @hougangliu @jinchihe