Distributions: Readiness for 1.3 Kubeflow Release #1798

Closed
5 tasks done
yanniszark opened this issue Apr 5, 2021 · 38 comments

@yanniszark
Contributor

yanniszark commented Apr 5, 2021

Problem Statement

We have a 6-step plan for releasing Kubeflow 1.3: #1777
This issue is for the 6th step: distributions use instructions from wg-manifests for how to use kustomizations and create distributions for 1.3.

| Distribution | Owners | Readiness for 1.3 |
| --- | --- | --- |
| Arrikto EKF | @kubeflow/arrikto | ✔️ |
| Arrikto MiniKF | @kubeflow/arrikto | ✔️ |
| Azure | @kubeflow/azure | |
| AWS | @kubeflow/aws | ✔️ |
| Charmed Kubeflow | @RFMVasconcelos | ✔️ |
| Google Cloud | @kubeflow/google | ✔️ |
| IBM | @kubeflow/ibm | ✔️ |
| Kubeflow on MicroK8s | @RFMVasconcelos | ✔️ |
| Kubeflow Operator | @kubeflow/red-hat | ✔️ |
| Kubeflow with Argo CD | @davidspek | ✔️ |
| kfctl_istio_k8s | | Support dropped, remove from docs |
| kfctl_istio_dex | | Support dropped, remove from docs |
| Openshift | @kubeflow/red-hat | ✔️ |

Distributions should have owners and update their process/docs for 1.3 installations.

Distributions can use the instructions for installing Kubeflow 1.3 components and common services, provided by wg-manifests, to perform their integration:
https://github.com/kubeflow/manifests/tree/v1.3.0-rc.0#readme

I would like to ask all distribution owners to check in on this issue and confirm that they will support their distributions for Kubeflow 1.3.

Current Issues:

@davidspek
Contributor

Guess you can add Kubeflow On-Prem/ArgoFlow/Argo CD to this list with https://github.com/kubeflow-onprem/ArgoFlow and status Ready.
/cc @jtfogarty @tbaums @RFMVasconcelos

@nakfour
Member

nakfour commented Apr 5, 2021

@yanniszark we plan to update the Openshift distribution, however not sure if one week will be enough, it will depend on the issues we run into.

@rui-vas

rui-vas commented Apr 6, 2021

Hi Yannis,

Thank you for leading this so effectively :)

For reference, I am only a small part of the Canonical team, @knkski @evilnick and @DomFleischmann are the stars :) I will create @kubeflow/canonical next week for simplicity in the future.

@yanniszark
Contributor Author

> Guess you can add Kubeflow On-Prem/ArgoFlow/Argo CD to this list with https://github.com/kubeflow-onprem/ArgoFlow and status Ready.

@davidspek sounds awesome! I'd love to include the Argoflow distribution, but I want to ask first. Distributions are usually backed by vendors, who have a commercial interest in supporting and maintaining them. This ensures a good user experience and sufficient support. Who will be the owners of Argoflow? Is Argoflow something that its owners plan to maintain and support throughout the release and in later releases?

> @yanniszark we plan to update the Openshift distribution, however not sure if one week will be enough, it will depend on the issues we run into.

Thanks for the reply @nakfour. I understand that issues may come up during testing, so let's communicate frequently on the status and decide accordingly. Could you also inform me if Red Hat is planning to support the Kubeflow Operator distribution? I believe Red Hat is the main stakeholder there, correct?

@yanniszark
Contributor Author

From Arrikto's side, we are dropping support for the kfctl_istio_dex distribution, so we should completely remove it from the docs.
With the instructions provided by WG-Manifests, a user can deploy all Kubeflow components and common services, including Istio and Dex, using standard kustomize and kubectl. So, existing users should be able to just use these instructions instead.
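
As a rough illustration of the kustomize-and-kubectl flow described above, the following Python sketch pipes `kustomize build` into `kubectl apply -f -` and retries on failure, since a first pass can fail while CRDs are still being registered. The `example` directory refers to the one in the manifests repository; the function names, retry count, and delay are assumptions made for illustration and are not part of any official tooling.

```python
import subprocess
import time
from typing import Callable, List


def build_commands(kustomize_dir: str) -> List[List[str]]:
    """Return the two commands that are piped together."""
    return [
        ["kustomize", "build", kustomize_dir],
        ["kubectl", "apply", "-f", "-"],
    ]


def apply_once(kustomize_dir: str) -> bool:
    """Run `kustomize build <dir> | kubectl apply -f -` once; True on success."""
    build_cmd, apply_cmd = build_commands(kustomize_dir)
    built = subprocess.run(build_cmd, capture_output=True)
    if built.returncode != 0:
        return False
    # Feed the rendered YAML to kubectl on stdin.
    applied = subprocess.run(apply_cmd, input=built.stdout)
    return applied.returncode == 0


def apply_with_retry(
    kustomize_dir: str,
    attempt: Callable[[str], bool] = apply_once,
    retries: int = 10,               # assumed value, tune as needed
    delay_seconds: float = 10.0,     # assumed value, tune as needed
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry the apply step until it succeeds or the retries run out."""
    for attempt_number in range(1, retries + 1):
        if attempt(kustomize_dir):
            return True
        if attempt_number < retries:
            sleep(delay_seconds)
    return False
```

Calling `apply_with_retry("example")` from a checkout of the manifests repository would attempt the full installation, provided `kustomize` and `kubectl` are on the PATH and point at the intended cluster.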

@davidspek
Contributor

davidspek commented Apr 6, 2021

@yanniszark ArgoFlow is just using the upstream manifests (sometimes with a fix I have implemented), so the maintenance comes down to the upstream manifests and Argo CD itself working. Basically, it is automating and simplifying the steps from the README in this repository. I don't think a distribution needs to be backed by a vendor; there can also be a community distribution. Having a commercial vendor doesn't necessarily mean there will be a good user experience, as I've seen many unanswered issues related to vendor distributions and their KfDef files.

I will be actively maintaining the ArgoFlow repository for this release, so you can list me as the owner. I think the on-prem working group wants to pick it up as well at some point.

@nakfour
Member

nakfour commented Apr 6, 2021

@yanniszark yes, we are using the KF Operator for our KF distribution in Open Data Hub on Openshift. We are interested in taking over the Kubeflow Operator distribution, but not at the moment. Maybe we can discuss this after KF 1.3 is released.

@davidspek
Contributor

davidspek commented Apr 6, 2021

@yanniszark I think it should be: Kubeflow with Argo CD. Also, status can be set to ready as it was built specifically for 1.3.

@berndverst
Member

FYI OIDC Auth Service Manifest is broken. PR here: #1805

@davidspek
Contributor

davidspek commented Apr 7, 2021

Cross-posting my comment here as well as it directly affects distributions.

@yanniszark After discussing with @Bobgy: the new Jupyter Web App, which allows spawning VSCode and RStudio notebook servers, uses their respective logos, and this might be a blocking issue that leads distributions to not include it. I'll be looking into a mitigation route for distributions to avoid potential problems here, but they will need to be notified about this.

@Bobgy Also suggested to bring up that RStudio is licensed under the AGPL license, so the example image with RStudio might also be something distributions need to look into if they can include this or not. The above mitigation will also address this problem, but it is something distributions might want to look into separately.

Mitigation:
The PR that solves this problem was just merged in kubeflow/kubeflow#5823. All references to the trademarks for VSCode and RStudio have been removed from the UI and code. A new ConfigMap was added to the deployment, which allows users to set whatever SVG logos (for the Spawner UI) and icons (for the index page) they want for each Notebook Server Type.

@nakfour
Member

nakfour commented Apr 7, 2021

@yanniszark bumped into first issue here #1810

@Bobgy
Contributor

Bobgy commented Apr 8, 2021

The use of namePrefix in upstream manifests (kubeflow/dashboard#40) makes it harder to patch resources, because the name we are expected to use when patching becomes confusing: sometimes we need the prefix and sometimes not.
It is a common inconvenience for us, and I'd like to hear what others think about it.

@Bobgy
Contributor

Bobgy commented Apr 8, 2021

@yanniszark we identified the root cause of kubeflow/kubeflow#5813. It should affect any distribution using profile plugins (GCP and AWS), so I think it's a blocking issue we need to resolve before the release.

@davidspek
Contributor

The fix for the trademark issue described in my previous comment has been created in kubeflow/kubeflow#5823.

@Bobgy
Contributor

Bobgy commented Apr 10, 2021

Update for GCP: after resolving #1798 (comment), @zijianjoy and I got KFP and notebooks multi-user mode working on GCP. We are looking into other Kubeflow applications.

@karlschriek
Contributor

@kubeflow/aws I would be happy to also test out the AWS distribution and help with any issues. The current code in https://github.com/kubeflow/manifests/tree/master/distributions/stacks/aws looks fairly old. Is there something more recent somewhere?

@yanniszark
Contributor Author

@kubeflow/cisco any update on the kfctl_istio_k8s distribution?

@andreyvelich
Member

> @kubeflow/cisco any update on the kfctl_istio_k8s distribution?

From our side we will use the default Dex + OIDC installation for Kubeflow 1.3.
I think we can deprecate kfctl_istio_k8s and kfctl_istio_dex.
cc @ramdootp @amsaha

@nakfour
Member

nakfour commented Apr 13, 2021

For OCP we ran into this issue: kubeflow/kubeflow#5803. We have a workaround, as described in the comment.

@davidspek
Contributor

@yanniszark @PatrickXYS Seems like there is an issue with the AwsIamForServiceAccount plugin for the profile controller: kubeflow/kubeflow#5812

@yanniszark
Contributor Author

@davidspek I took a look at the issue and I believe it's a duplicate of kubeflow/kubeflow#5813, which we have fixed.

@nakfour
Member

nakfour commented Apr 16, 2021

@yanniszark an update on the OCP distribution: we are about 80% done, and I don't think we will be done by Monday. I wanted to see if we can delay the KF 1.3 release, since it looks like most distributions on the list above are still not ready. If not, do we have a target date for the KF 1.3.1 tag so we can have our code tagged?
Also, for the next release, I wonder if we can do a two-tier release: one for KF and one a couple of weeks later for distributions. Just a thought, since with all the issues and a lot of components to test, it takes longer.
Thanks

@rui-vas

rui-vas commented Apr 16, 2021

[Green light] - Charmed Kubeflow distribution: We have no pending issues with the manifests and expect to release our distribution within our typical two-week timeframe from upstream release to distribution release. @knkski and @DomFleischmann are leading this. cc @yanniszark @castrojo

@moficodes
Contributor

@yanniszark For IBM release for IKS I am about 95% done. Just a couple of minor changes and clean up. Should be done in a few hours.

@PatrickXYS
Member

@yanniszark From the AWS side, we're pretty good, 90% done. Should be able to finish the PR by today or over the weekend.

@Bobgy
Contributor

Bobgy commented Apr 19, 2021

@yanniszark I'm sending a PR to update the KFP doc in the manifests root README to resolve some confusion: #1851
Also, I think we should update the KFP manifest version in the repo to 1.5.0-rc.3. Does the Manifests WG want to do that, or should I create a PR? I'm curious whether you built any script to automate this.

@yanniszark
Contributor Author

Thanks @Bobgy! I merged the README PR, thanks for taking the time to create that one.
For upgrading the kfp manifests to 1.5.0-rc.2, I'd love to but I think we are too close to the release to do that. A lot of distributions would need to rebase and redo their testing, pushing the release further. I think we should put it in 1.3.1, along with other important changes like upgrading cert-manager and Knative.
What do you think?

@nakfour
Member

nakfour commented Apr 19, 2021

@yanniszark OCP KF 1.3 distribution is ready, just pending review and merge of #1811
Also the operator at the moment does not need any specific changes to get KF 1.3 installed.

@PatrickXYS
Member

@yanniszark The AWS EKS 1.3 manifest is ready, and I'll find someone to help review as well. #1832

@Bobgy
Contributor

Bobgy commented Apr 20, 2021

> Thanks @Bobgy! I merged the README PR, thanks for taking the time to create that one.
> For upgrading the kfp manifests to 1.5.0-rc.2, I'd love to but I think we are too close to the release to do that. A lot of distributions would need to rebase and redo their testing, pushing the release further. I think we should put it in 1.3.1, along with other important changes like upgrading cert-manager and Knative.
> What do you think?

UPDATE: I just released KFP 1.5.0. It's based on the same commit as 1.5.0-rc.3, so I'd suggest using KFP 1.5.0 as the final release version.

The difference between KFP 1.5.0-rc.2 and 1.5.0-rc.3 is very minimal, most of the commits are either components or sdk. The only real changes are: kubeflow/pipelines#5446, kubeflow/pipelines#5408, kubeflow/pipelines#5424 (5424 is an important bug fix).

So I'd say it's basically a drop-in replacement for KFP 1.5.0-rc.2, and we do not need to worry about re-integration.
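
To keep the release candidates and the final tag straight, here is a small Python sketch of how such version strings order, assuming the tags follow Semantic Versioning precedence (a pre-release such as `rc.3` sorts before the final release). The `version_key` helper is hypothetical, written only for this illustration, and it ignores build metadata.

```python
from typing import Tuple


def version_key(version: str) -> Tuple:
    """Sort key approximating Semantic Versioning precedence (no build metadata)."""
    if version.startswith("v"):
        version = version[1:]  # tolerate tags such as v1.3.0-rc.1
    core, _, pre = version.partition("-")
    numbers = tuple(int(part) for part in core.split("."))
    if not pre:
        # A final release ranks above any of its pre-releases.
        return (numbers, 1, ())
    identifiers = tuple(
        # Numeric identifiers compare as numbers and rank below alphanumeric ones.
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in pre.split(".")
    )
    return (numbers, 0, identifiers)


tags = ["1.5.0", "1.5.0-rc.3", "1.5.0-rc.2"]
ordered = sorted(tags, key=version_key)
print(ordered)
```

Under these assumptions the release candidates sort in numeric order and the final 1.5.0 tag comes last.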

@yanniszark sorry, I keep getting confused about our timeline. From previous emails, I thought the current release would be 1.3.0-rc.1. When is 1.3.0 planned?

@Bobgy
Contributor

Bobgy commented Apr 20, 2021

> @Bobgy Also suggested to bring up that RStudio is licensed under the AGPL license, so the example image with RStudio might also be something distributions need to look into if they can include this or not. The above mitigation will also address this problem, but it is something distributions might want to look into separately.
>
> in #1798 (comment)

I've been consulting with Google lawyers about RStudio being AGPL and haven't got a conclusive answer yet. The GCP distribution might consider disabling RStudio support altogether. We are also confirming whether https://github.com/kubeflow/kubeflow/tree/master/components/example-notebook-servers/rstudio should be considered AGPL as well, which might break kubeflow/kubeflow's license declaration of Apache 2.0.

@moficodes
Contributor

IBM Distribution is ready #1823
Waiting for review and merge

@yanniszark
Contributor Author

yanniszark commented Apr 20, 2021

Thanks @Bobgy! I took a look at kubeflow/pipelines#5424 and it seems that it's very similar to a recent bug we fixed in KFP. I viewed the manifests diff, deployed them and tested them and all seems good with no incompatibilities. Thus, I will make a PR to include 1.5.0, since it has an important bugfix. Please take a look at: #1859

> @yanniszark sorry I keep getting confused of our timeline, from previous emails, I thought current release will be 1.3.0-rc.1. When is 1.3.0 planned?

We released 1.3.0-rc.1 over the weekend, as per the email I had sent to the list saying that I would move forward with an rc1. I didn't send a separate email after the actual cut, which it now seems I should have, so perhaps this is why there was confusion. The plan is to cut 1.3.0 today.

@berndverst
Member

There is insufficient information that explains which manifests or overlays must be used for multi-user and which must be used for single user Kubeflow.

Additionally, dependencies between components from apps and components from common aren't clear.

Further, the example provided by @yanniszark (https://github.com/kubeflow/manifests/blob/master/example/kustomization.yaml) deploys both OIDC auth service and a Dex Istio Overlay. Why? I thought people use either Dex or OIDC Auth Service (https://github.com/arrikto/oidc-authservice)? Is there a dependency between these? Can I safely remove the Dex overlay and OIDC Auth Service should still work?

I really need these things documented and explained properly before I (or anyone else contributing in their free time) can complete the Azure distribution.

#1873

@davidspek
Contributor

@berndverst Indeed, there is no information regarding single-user deployments. Regarding the OIDC authservice and Dex, they are both necessary. The OIDC authservice is the OIDC client, while Dex is the OIDC provider. Dex could be replaced with another OIDC provider (Keycloak or AWS Cognito, for example) by changing the OIDC authservice configuration.
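
To make the client/provider split concrete, here is a hedged Python sketch that models the authservice settings as key/value pairs and swaps the provider. The setting names (`OIDC_PROVIDER`, `OIDC_AUTH_URL`, `OIDC_SCOPES`, `REDIRECT_URL`) and the Dex defaults are assumptions based on how the OIDC authservice is commonly configured, so check them against the manifests you deploy. The Keycloak URL in the usage note and the helper names are purely hypothetical.

```python
from typing import Dict, Optional

# Assumed defaults that point the OIDC client (authservice) at an in-cluster Dex.
DEX_PARAMS: Dict[str, str] = {
    "OIDC_PROVIDER": "http://dex.auth.svc.cluster.local:5556/dex",
    "OIDC_AUTH_URL": "/dex/auth",
    "OIDC_SCOPES": "profile email groups",
    "REDIRECT_URL": "/login/oidc",
}


def with_provider(
    params: Dict[str, str],
    provider_url: str,
    auth_url: Optional[str] = None,
) -> Dict[str, str]:
    """Return a copy of the settings that targets a different OIDC provider."""
    updated = dict(params)
    updated["OIDC_PROVIDER"] = provider_url
    if auth_url is None:
        # Assumption: without an explicit auth URL, the client relies on
        # the provider's discovery document instead.
        updated.pop("OIDC_AUTH_URL", None)
    else:
        updated["OIDC_AUTH_URL"] = auth_url
    return updated


def render_env(params: Dict[str, str]) -> str:
    """Render the settings as KEY=value lines, sorted for stable output."""
    return "\n".join(f"{key}={value}" for key, value in sorted(params.items()))
```

For example, `render_env(with_provider(DEX_PARAMS, "https://keycloak.example.com/auth/realms/kubeflow"))` yields settings that keep the authservice as the client while a Keycloak realm stands in for Dex; the Dex manifests would then no longer be needed for login.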

@zijianjoy
Contributor

Update: Kubeflow v1.3 on Google Cloud is available. Documentation is also updated: https://www.kubeflow.org/docs/distributions/gke/deploy/overview/

@stale

stale bot commented Aug 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in one week if no further activity occurs. Thank you for your contributions.

@stale

stale bot commented Oct 2, 2021

This issue has been closed due to inactivity.

@stale stale bot closed this as completed Oct 2, 2021