Kubeflow on GCP - Support managed storage #4356
Having data in storage outside of the cluster makes it more accessible. We've seen several users having problems trying to get access to this data; the latest one is #4327. When the data is inside the cluster, MLMD is less useful: it stores the URIs, but they cannot be accessed directly.
I thought Kubeflow on GCP has an option to enable Cloud SQL? What makes GCS/Cloud SQL incompatible with the full Kubeflow 1.0.2 installation? I thought it would be as simple as applying the cloudsqlproxy and minio patches, but I haven't tried. Regarding importance, I also think managed storage via Cloud SQL and GCS is fairly important for us (and IMO, should probably just be the default on GCP). One thing that managed storage simplifies is lifecycle management, scaling, and high availability of cluster data.
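(For anyone wondering what those patches roughly amount to: below is a minimal kustomization sketch in the spirit of the KFP standalone GCP overlay. The overlay paths and config keys are assumptions for illustration and may not match the actual manifests.)

```yaml
# Illustrative kustomization only -- paths and keys are assumptions; check
# manifests/kustomize/env/gcp in kubeflow/pipelines for the real layout.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow

resources:
- ../../base                           # core KFP components
- ../../third-party/cloudsql-proxy     # routes MySQL traffic to Cloud SQL
- ../../third-party/minio-gcs-gateway  # Minio acting as a gateway to GCS

configMapGenerator:
- name: pipeline-install-config        # assumed ConfigMap name
  behavior: merge
  literals:
  - bucketName=my-kfp-artifacts        # example GCS bucket for artifacts
  - dbHost=cloudsqlproxy               # API server talks to the proxy Service
```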
What is the relative importance of these features for you? How would you rank the following: "GCS without CloudSQL", "CloudSQL without GCS", "GCS with CloudSQL"?
I think I'd rank them:
I think this is obvious: having a stateless cluster reduces the operational burden and complexity significantly, in addition to the reasons you brought up. For the second and third choices, I think it's tough, but ultimately the operational burden of supporting an in-cluster hosted DB backed by a persistent disk outweighs the benefit of externalizing artifact storage on GCS. For the most part, we do that anyway, but having it abstracted via Minio is a nice-to-have.
Another +1 on support for managed storage, which we are using via AI Platform Pipelines.
Thanks for the detailed feedback! Forgot to ask one more question: what do you think about making managed storage the default option on GCP?
Interesting. I would have expected the opposite: with artifacts in open GCS instead of an in-cluster PD, users are free to tinker with them, break the immutability guarantees, and have untracked side-channel data access. With a more black-box store, they can only work with the intermediate data via the system channels.
I'm surprised to find this topic on GitHub, as I thought the newest version supported managed storage out-of-the-box, especially because there is a KFP link related to managed storage configuration: Right now we're facing issues with migrating to another Kubeflow cluster; we wanted to persist all pipeline runs (experiments), but that seems to be a really laborious task with the current setup (in-cluster PDs and MySQL). I totally agree with @parthmishra - those are basically my feelings around this topic:
@Bobgy
To answer some of the questions: yes, KFP standalone https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/env/gcp supports managed storage on GCP. For full Kubeflow, the 1.1 release timeline was fairly tight; I was only able to get KFP multi-user support in, but didn't manage to tackle managed storage (although it isn't a lot of work - I should be able to reuse the standalone configuration and then write some documentation). I created this issue to get some early feedback, and considering the positive feedback, I'll see if I can prioritize making managed storage an option in Kubeflow 1.1.1. Starting from the next Kubeflow minor release, we can consider changing it to the default.
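(Concretely, the Cloud SQL half of the standalone configuration boils down to a proxy Deployment pointed at a managed instance, roughly like the sketch below. The image tag and instance name are placeholders, not the exact values in the manifests.)

```yaml
# Sketch of a Cloud SQL proxy Deployment with placeholder names; the real
# manifest lives in the KFP standalone env/gcp kustomization.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudsqlproxy
  namespace: kubeflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cloudsqlproxy
  template:
    metadata:
      labels:
        app: cloudsqlproxy
    spec:
      containers:
      - name: cloudsqlproxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.25.0
        command: ["/cloud_sql_proxy"]
        args:
        # <project>:<region>:<instance> identifies the managed Cloud SQL instance.
        - "-instances=my-project:us-central1:kfp-mysql=tcp:0.0.0.0:3306"
        ports:
        - containerPort: 3306
```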
FYI, I created another issue for supporting upgrading Kubeflow 1.0.2 to Kubeflow 1.1.0 while keeping data: #4346. It's agnostic to managed storage: because KFP switched from shared mode to multi-user mode in Kubeflow 1.1.0, a DB migration will be required to keep your data. It looks like some of you in this thread will be interested in that topic too, so I'm mentioning it here.
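(Not an official migration path -- just a sketch of one way to snapshot the in-cluster database to GCS before attempting an upgrade or a move to managed storage. The image, Service name, database name, and bucket are assumptions; it also assumes the pod has write access to the bucket, e.g. via Workload Identity.)

```yaml
# One-off backup Job -- a sketch, not an official migration tool.
apiVersion: batch/v1
kind: Job
metadata:
  name: kfp-mysql-backup
  namespace: kubeflow
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: backup
        image: gcr.io/google.com/cloudsdktool/cloud-sdk:slim  # provides gsutil
        command: ["bash", "-c"]
        args:
        - |
          apt-get update && apt-get install -y default-mysql-client
          # 'mysql' / 'mlpipeline' are the default in-cluster Service and DB
          # names in KFP; adjust both if your deployment differs.
          mysqldump -h mysql -P 3306 -u root mlpipeline \
            | gsutil cp - gs://my-kfp-backups/mlpipeline-$(date +%F).sql
```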
Hi Yuan, I assume you mean this ticket is for the full Kubeflow installation, not Kubeflow Pipelines (standalone and hosted/CAIP Pipelines).
@rmgogogo Yes, this issue is for the full Kubeflow installation.
Hi @jlewi, I joined the "Kubeflow Community Pipelines Meeting" today, and was hoping we could add CloudSQL and GCS support for the new Kubeflow v1.1.* deployment. We are getting ready to deploy in the next few weeks, and it would be awesome if we could get this feature out. We think it would make Kubeflow more stable and easier to maintain in production. Otherwise, we'll go ahead and deploy the standard version. ~ Royer
@RoyerRamirez I'm not working on this issue. I would suggest working with the KFP folks and the folks mentioned in this issue to see about prioritizing it if you think it's important.
I've got enough feedback and will work on this next.
Hi, just a few findings from my end when deploying with Cloud SQL and GCS. First, some context: I am deploying using the following custom 'stack' on GKE 1.16 with Istio 1.5.10:
I am facing one issue: I am getting the following error in the API server: `client_manager.go:353] Failed to check if Minio bucket exists. Error: Access Denied`. I am able to port-forward to the Minio service normally and I can see that it has all the relevant access. When I jump into a curl container and try to curl the Minio endpoint I get an RBAC access denied, and I imagine the same is happening to the API server's requests to Minio. EDIT: My second issue was caused by firewall rules preventing the webhook from running.
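(For anyone hitting the same symptom: an in-mesh "RBAC: access denied" response usually comes from an Istio authorization policy rather than Minio itself, while the API server's "Access Denied" can also mean the Minio/GCS gateway lacks permission on the bucket. A sketch of an AuthorizationPolicy that lets the pipeline API server reach the gateway is below; the label and service-account name are assumptions and need to match your manifests.)

```yaml
# Istio AuthorizationPolicy sketch -- selector label and service-account name
# are assumptions; match them to your actual Minio/GCS gateway Deployment.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-ml-pipeline-to-minio
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: minio                 # assumed label on the gateway pods
  rules:
  - from:
    - source:
        principals:
        # mTLS identity of the KFP API server's service account
        - cluster.local/ns/kubeflow/sa/ml-pipeline
```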
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/lifecycle frozen
Referenced commit: feat(kfp): use managed storage -- GCS and CloudSQL. Fixes kubeflow/pipelines#4356 (#259)
* feat(kfp): use managed storage -- GCS and CloudSQL
* update
* Revert "update" (reverts commit bf1f212)
* remove isSet
* fix bucket access permission
* rm unused cnrm/pipelines package
* feat(kfp): resource name includes KF_NAME & separate cloudsql-name & bucket-name setters
* reset values
Google Cloud managed storage services (GCS and Cloud SQL) make it easier for users to manage, back up, and restore KFP data.
They are not currently supported for Kubeflow on GCP; we'd like some user feedback before supporting them.
Please provide your feedback if this is important to you.
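(For a concrete picture of what the managed backing stores look like, they can be declared with Config Connector roughly as follows. The names, tier, and locations are illustrative placeholders, not the values any official manifest uses.)

```yaml
# Illustrative Config Connector resources for the managed backing stores.
apiVersion: sql.cnrm.cloud.google.com/v1beta1
kind: SQLInstance
metadata:
  name: kfp-mysql                    # placeholder instance name
  namespace: kubeflow
spec:
  databaseVersion: MYSQL_5_7
  region: us-central1
  settings:
    tier: db-n1-standard-1
---
apiVersion: storage.cnrm.cloud.google.com/v1beta1
kind: StorageBucket
metadata:
  name: my-project-kfp-artifacts     # placeholder bucket name
  namespace: kubeflow
spec:
  location: US
```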