[Multi User] Support separate metadata for each namespace #4790

Open
Jeffwan opened this issue Nov 20, 2020 · 14 comments

Comments

@Jeffwan
Member

Jeffwan commented Nov 20, 2020

Part of #1223; since we closed that issue, we need a separate one to track this feature.

Supporting separate metadata for each namespace helps users see only the artifacts/executions related to them.

Currently, MLMD doesn't have a user/namespace concept to isolate metadata per user. A workaround we can move forward with is to aggregate artifacts/executions by the existing experiments and runs in the user's namespace. This will result in a number of MLMD queries, and I am not sure about the performance, especially at large scale.
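As a rough sketch of that workaround, assuming the generated ml_metadata grpc-web promise client (the import paths and the "run" context type/name convention are illustrative assumptions, not the actual KFP schema), the aggregation would look roughly like this:

    import {MetadataStoreServicePromiseClient} from 'ml_metadata/proto/metadata_store_service_grpc_web_pb';
    import {
      GetArtifactsByContextRequest,
      GetContextByTypeAndNameRequest,
    } from 'ml_metadata/proto/metadata_store_service_pb';
    import {Artifact} from 'ml_metadata/proto/metadata_store_pb';

    // Hypothetical helper: collect the artifacts for the runs that belong to a
    // user's namespace. The run ids would come from the KFP API server, which
    // already knows which runs live in which namespace.
    async function getArtifactsForNamespaceRuns(
      client: MetadataStoreServicePromiseClient,
      runIds: string[],
    ): Promise<Artifact[]> {
      const artifacts: Artifact[] = [];
      for (const runId of runIds) {
        // Assumption: each run is recorded in MLMD as a context whose name is
        // the run id; the type name below is illustrative only.
        const ctxRequest = new GetContextByTypeAndNameRequest();
        ctxRequest.setTypeName('run');
        ctxRequest.setContextName(runId);
        const context = (await client.getContextByTypeAndName(ctxRequest)).getContext();
        if (!context) {
          continue;
        }
        const artRequest = new GetArtifactsByContextRequest();
        artRequest.setContextId(context.getId());
        const response = await client.getArtifactsByContext(artRequest);
        artifacts.push(...response.getArtifactsList());
      }
      // Two RPCs per run is where the performance concern above comes from.
      return artifacts;
    }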

Thumbs up if this is something you need.

/kind feature

@Jeffwan
Member Author

Jeffwan commented Nov 20, 2020

I didn't find an existing issue to track this story. If there's one, please let me know.

@numerology

I remember there's a very related one in the TFX repo: tensorflow/tfx#2618

I am assuming this one is talking about supporting multi-tenancy the k8s-native way (via namespaces), while that one is more about built-in multi-tenancy support in MLMD itself.

@Jeffwan
Member Author

Jeffwan commented Nov 20, 2020

@numerology Yeah, if MLMD can add support for multi-tenancy, that would be great. The Pipelines project can then make the corresponding changes.

I am assuming this one is talking about supporting multi-tenancy the k8s-native way (via namespaces), while that one is more about built-in multi-tenancy support in MLMD itself.

Yeah, that's true. If MLMD doesn't plan to support it, we can still use the workaround of aggregating metadata at the namespace level.

@arllanos

@Jeffwan @numerology @Bobgy

Let me mention some points I think can be considered along with this issue, related to the artifacts list page. I have not checked the executions page yet.

  • Issue 1: data retrieval in ArtifactsList.tsx hangs with a large number of artifacts and can be optimized. The following endpoint call is not necessary (it seems it was added when ml_metadata did not yet return the creation time of the artifact):

    creationTime: await getArtifactCreationTime(artifactId, this.api.metadataStoreService),

    We implemented this optimization internally at PwC, and the page no longer hangs even with a large number of artifacts (see the sketch after this list).

    If we put our optimization upstream, there are a couple of options:

    Option 1: Do nothing else, i.e. keep pagination disabled and keep filtering and sorting client-side until mlmd supports server-side filtering (with predicates) and sorting.
    The drawback is that although artifacts will be rendered, it can be slow depending on the number of artifacts in mlmd. A test with 35,000 artifacts in mlmd took ~50 seconds to load all artifacts and ~15 seconds to filter/sort.

    Option 2: Enable pagination using the server-side feature already available in mlmd. We tried this internally and paginated data is rendered almost immediately.
    However, sort/filter in the frontend client only act on the data fetched into memory, which is just the portion corresponding to the current page.
    A solution where ArtifactList uses pagination, sorting, and filtering on the mlmd side is, I think, blocked until mlmd supports filtering with predicates and more flexible sorting (sorting in mlmd is limited to creation-time/update-time/id).

  • Issue 2: There is a ParentContext feature available in mlmd v1.0.0. Have you checked into this? Maybe it can be used to achieve separation per namespace.
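For reference, a minimal sketch of what the creation-time optimization plus server-side pagination could look like, assuming the generated ml_metadata grpc-web promise client (the import paths, the page size, and the row shape are illustrative assumptions, not the actual ArtifactList code):

    import {MetadataStoreServicePromiseClient} from 'ml_metadata/proto/metadata_store_service_grpc_web_pb';
    import {GetArtifactsRequest} from 'ml_metadata/proto/metadata_store_service_pb';
    import {ListOperationOptions} from 'ml_metadata/proto/metadata_store_pb';

    // Fetch a single page of artifacts and read the creation time directly
    // from the Artifact proto, instead of issuing one getArtifactCreationTime()
    // call per row.
    async function listArtifactsPage(
      client: MetadataStoreServicePromiseClient,
      pageToken?: string,
    ) {
      const options = new ListOperationOptions();
      options.setMaxResultSize(100); // example page size
      if (pageToken) {
        options.setNextPageToken(pageToken);
      }
      const request = new GetArtifactsRequest();
      request.setOptions(options);

      const response = await client.getArtifacts(request);
      const rows = response.getArtifactsList().map(artifact => ({
        id: artifact.getId(),
        // create_time_since_epoch is populated by current MLMD versions, so
        // the extra per-artifact round trip can be dropped.
        creationTime: new Date(artifact.getCreateTimeSinceEpoch()).toISOString(),
      }));
      return {rows, nextPageToken: response.getNextPageToken()};
    }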

I'd appreciate your thoughts and comments.

CC/ @maganaluis

@Bobgy
Contributor

Bobgy commented Aug 23, 2021

For anyone interested: for your namespace separation requirements, do you want the metadata DB to be

  1. one instance per namespace, or
  2. a shared instance per cluster that uses a namespace context to filter items in a given namespace?

With 1, we can build access control using Istio Authorization.
With 2, IIUC, Istio Authorization would need to parse the requests and understand which namespace they query. That's probably not possible right now, given the requests are gRPC, not HTTP.
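To make option 2 a bit more concrete, here is a rough sketch (an assumption, not an agreed design) of what a shared instance with a per-namespace context could look like, using the generated ml_metadata grpc-web client and a hypothetical "kubeflow.Namespace" context type:

    import {MetadataStoreServicePromiseClient} from 'ml_metadata/proto/metadata_store_service_grpc_web_pb';
    import {
      GetArtifactsByContextRequest,
      GetContextByTypeAndNameRequest,
      PutAttributionsAndAssociationsRequest,
    } from 'ml_metadata/proto/metadata_store_service_pb';
    import {Attribution} from 'ml_metadata/proto/metadata_store_pb';

    // Write side: attribute newly recorded artifacts to the context of the
    // namespace the pipeline ran in.
    async function attributeToNamespace(
      client: MetadataStoreServicePromiseClient,
      namespaceContextId: number,
      artifactIds: number[],
    ): Promise<void> {
      const request = new PutAttributionsAndAssociationsRequest();
      for (const artifactId of artifactIds) {
        const attribution = new Attribution();
        attribution.setContextId(namespaceContextId);
        attribution.setArtifactId(artifactId);
        request.addAttributions(attribution);
      }
      await client.putAttributionsAndAssociations(request);
    }

    // Read side: the UI (or an authorizing proxy) only queries artifacts
    // attributed to the caller's namespace context.
    async function getArtifactsInNamespace(
      client: MetadataStoreServicePromiseClient,
      namespace: string,
    ) {
      const ctxRequest = new GetContextByTypeAndNameRequest();
      ctxRequest.setTypeName('kubeflow.Namespace'); // hypothetical convention
      ctxRequest.setContextName(namespace);
      const context = (await client.getContextByTypeAndName(ctxRequest)).getContext();
      if (!context) {
        return [];
      }
      const artRequest = new GetArtifactsByContextRequest();
      artRequest.setContextId(context.getId());
      return (await client.getArtifactsByContext(artRequest)).getArtifactsList();
    }

The open question above still applies: enforcing such a filter at the Istio layer would require the proxy to understand which namespace a given gRPC request targets.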

@juliusvonkohout
Member

@Bobgy has there been any progress or decision on this issue?

@juliusvonkohout
Member

juliusvonkohout commented Apr 12, 2022

For anyone interested: for your namespace separation requirements, do you want the metadata DB to be

1. one instance per namespace, or

2. a shared instance per cluster that uses a namespace context to filter items in a given namespace?

With 1, we can build access control using Istio Authorization. With 2, IIUC, Istio Authorization would need to parse the requests and understand which namespace they query. That's probably not possible right now, given the requests are gRPC, not HTTP.

@chensun @zijianjoy

Maybe we should use a proxy, as is done for katib-mysql and katib-db-manager.
google/ml-metadata#141 suggests that we just have to add a namespace/profile/user column and filter by it.

@juliusvonkohout
Member

juliusvonkohout commented May 5, 2022

@Bobgy @zijianjoy Istio should support gRPC filtering now: istio/istio#25193 (comment)

@ca-scribner would you be interested in implementing this Envoy filter? I am still busy with the MinIO work.

@zijianjoy
Collaborator

zijianjoy commented Jul 19, 2023

Following up on this item:

I am leaning towards creating one MLMD instance per namespace. This is because we should consider the data lifecycle of the MLMD information: when a namespace is deleted, we should have a way to easily clean up the data related to that namespace. This is not easy with a single MLMD instance today, because the delete operation is not supported by design: google/ml-metadata#38. Thus starting with the separation from the beginning is my current preference.

That said, I am aware that one MLMD instance per namespace probably means resource usage overhead for a cluster with many namespaces, so we should consider using something like the Horizontal Pod Autoscaler: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/. The problem we are facing here is similar to the artifact-ui scalability problem: #9555

@juliusvonkohout
Member

@zijianjoy It's great that this is being tackled. We need it anyway as a CNCF graduation requirement: the CNCF will do a security assessment, and this is a clear security violation.

I think the per-namespace artifact visualization server should be removed anyway, since it is deprecated, and the artifact proxy is obsolete as well, as explained in #9555 (comment).

That means you can already have zero-overhead namespaces today if you drop the old garbage. I know of Kubeflow installations with several hundred namespaces, so this is a real problem customers are facing. I can create a PR to make that the default and fix the security issue I found a few years ago, using code from @thesuperzapper: #8406 (comment)

In the long term I would propose switching to MLflow, since that seems to be the industry standard, but if that is not possible due to Google policies, we should consider something with a minimal footprint, maybe Knative serverless per namespace.
Nevertheless, I still prefer a single MLMD instance for the time being, to keep supporting zero-overhead Kubeflow namespaces while we find a proper long-term solution, which would not be MLMD.

@rimolive
Member

rimolive commented Nov 8, 2023

Following up on this item:

I am leaning towards creating one MLMD instance per namespace. This is because we should consider the data lifecycle of the MLMD information: when a namespace is deleted, we should have a way to easily clean up the data related to that namespace. This is not easy with a single MLMD instance today, because the delete operation is not supported by design: google/ml-metadata#38. Thus starting with the separation from the beginning is my current preference.

That said, I am aware that one MLMD instance per namespace probably means resource usage overhead for a cluster with many namespaces, so we should consider using something like the Horizontal Pod Autoscaler: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/. The problem we are facing here is similar to the artifact-ui scalability problem: #9555

One MLMD instance per namespace is bad from a governance perspective. What if I want to track all assets produced by the company, like a catalog? That would require querying multiple MLMD API servers. There should instead be a way to prevent unwanted access through Istio, creating a solution that does not depend on the MLMD developers to implement.
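To illustrate that concern, a sketch (under the one-instance-per-namespace assumption, with a hypothetical per-namespace service address) of what a company-wide catalog query would have to do:

    import {MetadataStoreServicePromiseClient} from 'ml_metadata/proto/metadata_store_service_grpc_web_pb';
    import {GetArtifactsRequest} from 'ml_metadata/proto/metadata_store_service_pb';
    import {Artifact} from 'ml_metadata/proto/metadata_store_pb';

    // Hypothetical endpoint scheme; the real address would depend on how the
    // per-namespace MLMD instances were actually deployed.
    function clientForNamespace(namespace: string): MetadataStoreServicePromiseClient {
      return new MetadataStoreServicePromiseClient(
        `http://metadata-envoy.${namespace}.svc.cluster.local:9090`,
      );
    }

    // A cluster-wide catalog has to fan out to every namespace's MLMD instance
    // and merge the results itself.
    async function catalogAllArtifacts(namespaces: string[]): Promise<Artifact[]> {
      const responses = await Promise.all(
        namespaces.map(ns => clientForNamespace(ns).getArtifacts(new GetArtifactsRequest())),
      );
      return responses.flatMap(response => response.getArtifactsList());
    }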


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale label on Jun 24, 2024
@juliusvonkohout
Member

/lifecycle frozen

@google-oss-prow google-oss-prow bot removed the lifecycle/stale label on Jun 24, 2024