Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Multi user] How do we release KFP multi user in Kubeflow? #3645

Closed
Bobgy opened this issue Apr 29, 2020 · 19 comments
Closed

[Multi user] How do we release KFP multi user in Kubeflow? #3645

Bobgy opened this issue Apr 29, 2020 · 19 comments
Assignees
Labels
area/release cuj/multi-user status/triaged Whether the issue has been explicitly triaged

Comments

@Bobgy
Copy link
Contributor

Bobgy commented Apr 29, 2020

Part of #1223

/cc @jlewi @chensun @IronPan

Options:

  • As a fork based on kubeflow 1.0.2
  • Merge to kubeflow 1.0 branch and release as 1.0.3
  • Wait for kubeflow 1.1 (any ETA?)

Note, it's only for gcp-iap deployment initially.

@Bobgy
Copy link
Contributor Author

Bobgy commented Apr 29, 2020

@jlewi I think we want a way to release it soon (maybe as experimental first).

So we can also go with release as an experimental fork from kubeflow 1.0.2 and wait for either 1.0.3 or 1.1.

What do you think?

@Bobgy Bobgy self-assigned this Apr 29, 2020
@Bobgy Bobgy added status/triaged Whether the issue has been explicitly triaged cuj/multi-user area/release labels Apr 29, 2020
@jlewi
Copy link
Contributor

jlewi commented Apr 29, 2020

@Bobgy Why do we need a fork? If its on master can't people just deploy from master?

@jlewi
Copy link
Contributor

jlewi commented Apr 29, 2020

@Bobgy is all the relevant code/fixes on master?

What is the path forward for turning on ISTIO in the kubeflow namespace? kubeflow/kfctl#296

Does the KFP manifest(https://github.com/kubeflow/manifests/tree/master/pipeline) need to be updated with multiuser changes?

@Bobgy
Copy link
Contributor Author

Bobgy commented Apr 30, 2020

@jlewi I verified using namespace resource works (I guess I misconfigured sth when I tried it before), so we can just use kfctl 1.0.2.

All relevant code/fixes are on KFP master and my forked manifest branch: https://github.com/Bobgy/manifests/pull/3/files.

(it can be deployed normally following https://www.kubeflow.org/docs/gke/deploy/deploy-cli/ and use CONFIG_URI="https://raw.githubusercontent.com/Bobgy/manifests/kfp-multi-user/kfdef/kfctl_gcp_iap.yaml" instead

@Bobgy
Copy link
Contributor Author

Bobgy commented Apr 30, 2020

I prefer that, in this release, only KFP is kind of experimental using the multi user mode, but all other components are stable. Therefore, I think KF 1.0.2 might be a good candidate as a base.

What do you think?

in the mean time, I can try to merge forked changes to Kubeflow master

@jlewi
Copy link
Contributor

jlewi commented May 1, 2020

@Bobgy my suggestion would be to get things working on master. I don't know that we want to big changes onto release branches. Either way though getting it checked in working on master is a precursor so lets do that first.

@Bobgy
Copy link
Contributor Author

Bobgy commented May 1, 2020

I think we'd want to release a relatively stable version soon for community to try out KFP multi user mode in a forked branch.

In the mean time, I can try to start sending PRs to master.

@jlewi
Copy link
Contributor

jlewi commented May 4, 2020

Would back-porting this onto 1.0 branch be consistent with semantic versioning?
If we backport this onto 1.0 do we risk blocking our ability to release minor patch fixes to 1.0?

@Bobgy
Copy link
Contributor Author

Bobgy commented May 5, 2020

I see, that makes sense to me.

Shall we

  • target 1.1 release like before
  • provide a quick try-out version based on forking 1.0.2 unofficially
    ?

@jlewi
Copy link
Contributor

jlewi commented May 6, 2020

My suggestion would be to get it working on master. Once we have it working on master we can either add a git tag or possibly create a new release branch. I'd probably recommend we bump the minor version; e.g 1.1.0-alpha1.

@Bobgy
Copy link
Contributor Author

Bobgy commented May 6, 2020

@jlewi Sounds good.

After rethinking over it, I agree there's not much value added if the early access fork is based on 1.0.2, because KFP itself is also experimental.

I'll try to merge to kubeflow master first

@jlewi
Copy link
Contributor

jlewi commented May 7, 2020

@Bobgy +1
It would be good to get the multiuser support integrated into the GCP blueprint for blueprint users. That should be fairly straightforward. Blueprints should be easier to upgrade so it will be easier to roll out changes to customers using the blueprints.

I think the only changes would be

  1. Add pipelines to the stacks kustomization file

  2. Replace the namespaces package in the blueprint (this [file])(https://github.com/kubeflow/gcp-blueprints/blob/master/kubeflow/instance/kustomize/namespaces/namespaces.yaml) with the one you added to kubeflow/manifests so the blueprint is in sync

Per: GoogleCloudPlatform/kubeflow-distribution#5 I'm setting up autodeployments for gcp blueprints. That should make it easy for us to verify and ultimately add tests to verify it is all working.

@Bobgy
Copy link
Contributor Author

Bobgy commented May 8, 2020

@jlewi Thanks for the suggestion! Because I have limited capacity working on these, I will get these changes merged to kubeflow/manifest master first before trying out the gcp blue print.

@yanniszark
Copy link
Contributor

@Bobgy thanks a lot for confirming.
It seems that the user-identity parsing code is already present in a few places and probably more in the future.

While we (and other users) can test multi-user pipelines on their on-prem installations, it's very difficult to dig through every part of the code and make sure the headers are configurable.
The best time to incorporate these options is when the code is first written.

Is there a plan to make the headers configurable like the rest of Kubeflow's web apps? (Jupyter Web App, CentralDashboard, etc).
That would allow non-GCP users of Kubeflow to use multi-user pipelines.

@jlewi we had similar issues with web apps being GCP-only in v0.6 and we contributed changes throughout the code after the fact. After v0.6, we had agreed to use the established headers, so we wouldn't have to do it again. How can we make sure that new code is written in a way that it doesn't have an artificial dependence on GCP?

@jlewi
Copy link
Contributor

jlewi commented May 8, 2020

we had agreed to use the established headers
Do you mean we agreed to use environment variables to make the headers programmable? I recall discussing 2 options

  1. Using ISTIO to transform the headers to some standard values
  2. Making the headers programmatic for each application.

I thought we mostly went with the second option?

@yanniszark I think the best way would be to come with ways to make it easy for application developers to follow best practices. Ideally, that would mean finding some way to handle it centrally so people don't have to think about it or creating reusable libraries.

Tests would also help. If application developers write tests for the feature (e.g. multiuser pipelines) and we have CI setup for different platforms then that test would fail on other platforms and provide appropriate signal.

Finally documenting it and promoting it is also helpful.

@Bobgy
Copy link
Contributor Author

Bobgy commented May 8, 2020

Multi user is a very complex feature, I don't feel there's any problem if we prototype with just one platform first. It was good enough we agreed on the high-level design in the beginning and guaranteed it should work with all platforms. Now, it's pretty trivial to port the feature.

There are only been two places we are using it: https://github.com/kubeflow/pipelines/search?q=x-goog-authenticated-user-email&unscoped_q=x-goog-authenticated-user-email

@yanniszark @jlewi Can you share what options for other apps look like for configuring headers? I'm not sure they have enough visibility for other developers.

@yanniszark
Copy link
Contributor

Do you mean we agreed to use environment variables to make the headers programmable?

@jlewi
Yes, exactly, that was bad phrasing on my part.
Since in v0.6 we agreed to make them configurable (through envvars, cli args or config files), I would expect new applications integrating with Kubeflow's multi-tenancy to follow that approach.

@yanniszark @jlewi Can you share what options for other apps look like for configuring headers? I'm not sure they have enough visibility for other developers.

Absolutely!
Here are some examples:

@Bobgy I agree that expectations of Web Apps in terms of auth should be clear.
To help with making the concepts more clear and streamline implementations of future components, I have started a design doc that gathers all the current knowledge that is spread through the code in one place:
https://docs.google.com/document/d/11Xi-I2OqJvUuy_Zg0NskMF9GmworRDJhlu68poDgK5c/edit#bookmark=id.d3wqgeictsp4
I expect we will also discuss it further in Tuesday's Kubeflow community meeting.

The KUBEFLOW_USERID_HEADER and KUBEFLOW_USERID_PREFIX options would be a great start to get Pipelines working in all platforms.

@Bobgy
Copy link
Contributor Author

Bobgy commented May 13, 2020

/close
As we have reached agreement, further progress will be tracked in #3693

@k8s-ci-robot
Copy link
Contributor

@Bobgy: Closing this issue.

In response to this:

/close
As we have reached agreement, further progress will be tracked in #3693

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/release cuj/multi-user status/triaged Whether the issue has been explicitly triaged
Projects
None yet
Development

No branches or pull requests

4 participants