katib: Add per component packages #1141

Merged
6 commits merged on May 6, 2020

@discordianfish
Member

discordianfish commented Apr 29, 2020

Description of your changes:
katib: Add per component packages

To support katib deployments with either a bundled MySQL or an external
database, split the katib-controller package into individual components.

As suggested by @jlewi, this preserves backwards compatibility by
creating new packages that reference the existing manifests.

Checklist:

  • Unit tests have been rebuilt:
    1. cd manifests/tests
    2. make generate-changed-only
    3. make test

@discordianfish
Member Author

Can't reproduce the error locally. It would be great if we could get kubeflow/kfctl#264 merged so kfctl returns why it "couldn't generate kustomization file for component katib-controller-standalone"

@jlewi: That being said, is this in general how you imagine it?

@jlewi
Contributor

jlewi commented Apr 29, 2020

Thanks @discordianfish. A couple suggestions

  • Preserve backwards compatibility so we can avoid having to update all the KFDefs in this PR.

    • You can create new packages that pull in the YAML resources from their current location
    • As you've pointed out, I think the logic kfctl uses to create kustomize files is a bit brittle and hard to understand
    • So the pattern I've been following is to leave the existing structure intact (i.e. keep base, as I think the logic depends on it)
    • Define new kustomize packages that list the same ".yaml" files as resources to avoid duplication
    • Once all the KFDefs have switched over to not relying on kustomize files being created by kfctl,
      we can clean up all the individual applications.
  • I've been thinking about how to structure our kustomize packages. I might suggest a layout like the following

    katib/components
        /katib-db-manager
        /katib-controller
        /katib-db-mysql
    katib/installs
        /katib-standalone
        /katib-external-db
    
    • The idea is to distinguish between install packages and components/modules with the goal of
      making it easier for end-users to find the packages they want.
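
As an illustration of this layout, an install package could simply list the component packages it needs as resources. The following is a minimal sketch, not the file from this PR: the file location and the relative paths are assumptions derived from the directory tree above, and it assumes no base/overlays nesting inside the install directory.

```yaml
# Hypothetical katib/installs/katib-external-db/kustomization.yaml.
# Paths are assumptions derived from the proposed layout.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Components shared by every install.
  - ../../components/katib-controller
  - ../../components/katib-db-manager
  # katib-db-mysql is deliberately left out here; a katib-standalone
  # install would add ../../components/katib-db-mysql to this list.
```

Because the install only references directories, the component manifests stay in one place and each install differs only in which components it lists.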

@discordianfish discordianfish force-pushed the katib-controller-split branch from 9b81109 to 46c9044 Compare April 30, 2020 13:08
@andreyvelich
Member

  • I've been thinking about how to structure our kustomize packages. I might suggest a layout like the following

    katib/components
        /katib-db-manager
        /katib-controller
        /katib-db-mysql
    katib/installs
        /katib-standalone
        /katib-external-db
    
    • The idea is to distinguish between install packages and components/modules with the goal of
      making it easier for end-users to find the packages they want.

This design lgtm /cc @johnugeorge @gaocegege.

@jlewi What do you think about katib-crds: https://github.com/kubeflow/manifests/tree/master/katib/katib-crds ?
Should it be in an independent folder? Do we need an additional overlay with Application for the Katib CRD? Or can we add it to the /katib/components/ folder too?

@andreyvelich
Member

@discordianfish Thanks for doing this!
I will take a look.

@discordianfish
Member Author

@jlewi / @andreyvelich Like this?

The remaining question is how to test this if it's not referenced by any KFDef yet. Should I create a new one?

@discordianfish discordianfish changed the title from "[WIP] Split katib-controller components" to "[WIP] katib: Add per component packages" Apr 30, 2020
app.kubernetes.io/name: katib-controller
app.kubernetes.io/part-of: kubeflow
configurations:
- ../../katib-controller/v3/params.yaml
Member

Member Author

Yeah, I wasn't sure how to deal with the v3 stuff. I've changed it.

- ../../../components/katib-db-manager
patchesStrategicMerge:
- katib-db-manager-deployment.yaml
secretGenerator:
Member

Member Author

It contains secrets, so I used the secretGenerator. We could use a ConfigMap for everything except the password, but I usually prefer keeping everything in a Secret in such cases.
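
A minimal sketch of that choice, for illustration only: every value below is a placeholder, and the key names are taken from the review comments in this thread rather than from the merged file.

```yaml
# Hypothetical fragment of an install package's kustomization.yaml.
# All values are placeholders; do not commit real credentials.
secretGenerator:
  - name: katib-mysql-secrets
    literals:
      - KATIB_MYSQL_DB_DATABASE=katib
      - KATIB_MYSQL_DB_HOST=mysql.example.com
      - KATIB_MYSQL_DB_PORT=3306
      - DB_USER=katib
      - DB_PASSWORD=change-me
```

By default kustomize appends a content hash to the generated Secret's name and rewrites references to it in the resources it manages, so secretKeyRef entries can keep using the plain name.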

    secretKeyRef:
      name: katib-mysql-secrets
      key: KATIB_MYSQL_DB_DATABASE
- name: KATIB_MYSQL_DB_HOST
Member

Also, you can specify KATIB_MYSQL_DB_PORT, take a look here: https://www.kubeflow.org/docs/components/hyperparameter-tuning/env-variables/#katib-db-manager

valueFrom:
  secretKeyRef:
    name: katib-mysql-secrets
    key: DB_USER
Member

Should it be DB_PASSWORD?

valueFrom:
  secretKeyRef:
    name: katib-mysql-secrets
    key: MYSQL_HOST
Member

Maybe name it KATIB_MYSQL_DB_HOST to be consistent?
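
Putting the naming suggestions from this review together, the patch might look roughly like the sketch below. The Deployment and container names are assumptions; the Secret name and keys follow the comments above, with each key named after the environment variable it feeds.

```yaml
# Hypothetical strategic-merge patch (katib-db-manager-deployment.yaml).
# Deployment and container names are assumed, not taken from the PR.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: katib-db-manager
spec:
  template:
    spec:
      containers:
        - name: katib-db-manager
          env:
            - name: KATIB_MYSQL_DB_HOST
              valueFrom:
                secretKeyRef:
                  name: katib-mysql-secrets
                  key: KATIB_MYSQL_DB_HOST
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: katib-mysql-secrets
                  key: DB_USER
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: katib-mysql-secrets
                  key: DB_PASSWORD
```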

@discordianfish
Member Author

@andreyvelich ok, addressed your comments. Any thoughts about tests?

@jlewi
Contributor

jlewi commented Apr 30, 2020

Just my opinion, so take it or leave it, but I might suggest that instead of using base and overlays in installs, you structure it as

installs:
   katib-external-db
   katib-standalone/   <no base subdirectory>
   katib-ibm

Nesting things in "/base" and "/overlays" just seems like it makes things harder to find.

Just my 2 cents though.

For the tests, add the directory whose kustomizations you want to test to search-dirs. In this case that's probably the "installs" directory or just specific subdirectories.

The test will generate and check in the expected YAML. This way during PR review inspecting the diff of the expected output will tell the reviewer whether anything changed.

@discordianfish
Member Author

@jlewi What do you think about Katib crds? Should it not be part of /katib/components also?

As requested, this PR doesn't move any existing manifests, so I don't think there is any TODO for the CRDs. Besides that, @jlewi already said it looks good from his POV. It would be great if we could minimize the back and forth where possible, given we're all in different timezones.

@andreyvelich
Member

@discordianfish Alright, overall /lgtm.
Thank you for doing this!

/cc @johnugeorge @gaocegege

@andreyvelich
Member

/lgtm

@jlewi
Contributor

jlewi commented May 5, 2020

/approved

Thanks @discordianfish!

@andreyvelich could you please create an OWNERS file under /katib so you can approve changes?

@gaocegege
Member

/lgtm

@discordianfish
Member Author

@jlewi I've just tried to test this on our cluster (by using this branch and katib-controller's repoRef/path set to katib/installs/katib-external-db in kfctl.yaml) but it fails with:

INFO[0000] Processing application: tensorboard           filename="kustomize/kustomize.go:495"
INFO[0000] Processing application: katib-crds            filename="kustomize/kustomize.go:495"
INFO[0000] Processing application: katib-controller      filename="kustomize/kustomize.go:495"
WARN[0000] Cannot get kustomization from /etc/kubeflow/kustomize/katib-controller/base: open /etc/kubeflow/kustomize/katib-controller/base/kustomization.yaml: no such file or directory  filename="kustomize/kustomize.go:714"
Error: failed to build kfApp from URI /etc/kubeflow/kfctl.yaml: couldn't generate KfApp:  (kubeflow.error): Code 500 with message: kfApp Generate failed for kustomize: Kustomize generate failed:  (kubeflow.error): Code 500 with message: couldn't generate kustomization file for component katib-controller: open /etc/kubeflow/kustomize/katib-controller/base/katib-db-manager-deployment.yaml: no such file or directory

I believe that's because of kfctl's "unrolling" of the manifests. I'm not sure about the specifics, but it copies the manifests from the source to /etc/kubeflow/manifests and flattens the structure, so if I specify katib/installs/katib-external-db, it ends up in /etc/kubeflow/manifests/katib-controller and the resource references end up going nowhere. So I think to make the changes here work, we need to change kfctl as well. Or am I missing something and this is supposed to work already?

@andreyvelich
Member

I think that may be because of kfctl's design. @jlewi Will we use this structure for kfctl in the current version, or will we use it only for the v3 version?

@jlewi
Contributor

jlewi commented May 5, 2020

/approve

@discordianfish and @andreyvelich I added logic to short-circuit the code in kfctl that creates kustomize files.
https://github.com/kubeflow/kfctl/blob/c9afc939f04420ea822e0aa09bd9bd28c8a71b73/pkg/kfapp/kustomize/kustomize.go#L519

I think there are two cases we want to support

  1. Legacy - Where kustomization.yaml is generated by kfctl
  2. No magic - kustomization.yaml isn't generated by kfctl; the application in KFDef points to a valid kustomize package.

In particular, I don't think we want to mix the two; i.e. use a mix of kustomize packages.

The logic to decide which mode to use is defined here
https://github.com/kubeflow/kfctl/blob/c9afc939f04420ea822e0aa09bd9bd28c8a71b73/pkg/kfconfig/types.go#L849

It's a bit of a hack. We look for an application with a specific name. This is intended to be a "stack", which is a kustomize package that combines all the kustomize applications you want to install.
Here's the stack for GCP.
https://github.com/kubeflow/manifests/tree/master/stacks/gcp

And associated KFDef.
https://github.com/kubeflow/manifests/blob/master/stacks/examples/kfctl_gcp_stacks.experimental.yaml

So we would add the katib package to the stack kustomization.yaml and not to the KFDef.
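
In other words, the katib install becomes one more entry in the stack's resource list. A minimal sketch, assuming a stack directory two levels below the repository root (as stacks/gcp is); the stack name is hypothetical and the other applications in the stack are omitted.

```yaml
# Hypothetical stacks/example/kustomization.yaml.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # The other applications that make up the stack would be listed here.
  # The katib install package from this PR, referenced as a directory:
  - ../../katib/installs/katib-external-db
```

The KFDef then only needs to point at the stack, so swapping katib-external-db for katib-standalone is a one-line change in this file.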

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi


@k8s-ci-robot k8s-ci-robot merged commit e172590 into kubeflow:master May 6, 2020
@andreyvelich
Member

Thank you for your clarification @jlewi, I agree with this approach.

@discordianfish
Member Author

@jlewi So to use this, I need to migrate my kfdef to a stacks manifest? In general, I agree that this is better than having this custom kfdef but it's not trivial to migrate this.

To migrate, I'd like to generate all manifests both from the existing kfdef and from the new stack manifests.

I assume for the new stack based approach I can just run kustomize build, but for the old kfdef based approach I can't find a way to generate the manifests without applying them. kfctl build just generates the kustomize manifests, not the final rendered manifests. Is there a way to generate the manifests without applying them?

@jlewi
Contributor

jlewi commented May 8, 2020

@discordianfish looks like we are going to have to roll this back because of test breakages. #1159 is tracking rolling it forward.

I need to migrate my kfdef to a stacks manifest? In general, I agree that this is better than having this custom kfdef but it's not trivial to migrate this.

I guess that depends on your KFDef. There's an example KFDef for stacks here
https://github.com/kubeflow/manifests/blob/master/stacks/examples/kfctl_gcp_stacks.experimental.yaml

I wouldn't expect that to be too different from other KFDefs (E.g. just remove the GCP plugin).

I assume for the new stack based approach I can just run kustomize build, but for the old kfdef based approach I can't find a way to generate the manifests without applying them. kfctl build

Not quite. kfctl build is still creating the kustomization packages and not actually hydrating the manifests.

We don't currently have a command to recurse over all the directories in "${KFAPP}/kustomize".
You could write your favorite shell script to do it. That's the approach we are taking on GCP.
https://github.com/kubeflow/gcp-blueprints/blob/6f16dce8033c71dfe92d26075c62529a74d3097e/kubeflow/Makefile#L69

Or you could try running

cd ${KFAPP}/kustomize
kustomize create --autodetect

I think that will create a kustomization.yaml file that combines all the subdirectories but I haven't tried it.

I didn't go that route because

  1. To me an uber kustomization.yaml file implies that all the packages could potentially be in the same namespace but that would actually break things
  2. Order matters in how the packages are applied and I'm not sure if an uber kustomize file preserves order.
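
For reference, a hand-written combined file of the kind described above could look like the sketch below. It is not the output of kustomize create --autodetect, which was not tried here; the directory names are taken from the kfctl log earlier in this thread, and it assumes each directory is itself a valid kustomize package.

```yaml
# Hypothetical ${KFAPP}/kustomize/kustomization.yaml.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - katib-crds
  - katib-controller
  - tensorboard
# Adding a top-level `namespace:` field here would set that namespace on
# every namespaced resource from every package above, which is the first
# concern listed.
```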

k8s-ci-robot pushed a commit that referenced this pull request May 8, 2020
discordianfish added a commit to discordianfish/kubeflow-manifests that referenced this pull request May 11, 2020
This was reverted in a03973b due to
test failures. This re-introduces the packages with the issues fixed.

See kubeflow#1141 for more context.
k8s-ci-robot pushed a commit that referenced this pull request May 13, 2020
This was reverted in a03973b due to
test failures. This re-introduces the packages with the issues fixed.

See #1141 for more context.