[SDK] Create API to get Trial metrics from Katib DB #2022

Closed
andreyvelich opened this issue Nov 18, 2022 · 10 comments

@andreyvelich
Member

/kind feature
/area sdk

Our Katib Python SDK doesn't have an API to get Trial metrics from the Katib DB.
Currently, users can see Trial metrics only in the Katib UI.
We should let users query metrics through the Katib SDK using the GetObservationLog gRPC API.
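
As a rough illustration of what such an SDK helper might return, here is a minimal sketch that groups a Trial's observation log by metric name. It does not call the real Katib gRPC stub: `ObservationLogSource`, `InMemorySource`, and `get_trial_metrics` are hypothetical names, and the `Metric`/`MetricLog` fields are only assumed to mirror the messages in `api.proto`.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


@dataclass
class Metric:
    name: str
    value: str  # assumed to be stored as a string, as in api.proto


@dataclass
class MetricLog:
    time_stamp: str
    metric: Metric


class ObservationLogSource(Protocol):
    """Hypothetical stand-in for a DB Manager gRPC client."""

    def get_observation_log(self, trial_name: str) -> List[MetricLog]: ...


def get_trial_metrics(
    source: ObservationLogSource, trial_name: str
) -> Dict[str, List[Tuple[str, float]]]:
    """Group a Trial's metric logs by metric name as (timestamp, value) pairs."""
    grouped: Dict[str, List[Tuple[str, float]]] = {}
    for log in source.get_observation_log(trial_name):
        try:
            value = float(log.metric.value)
        except ValueError:
            continue  # skip entries whose value is not numeric
        grouped.setdefault(log.metric.name, []).append((log.time_stamp, value))
    return grouped


class InMemorySource:
    """Fake source used only to exercise the sketch without a cluster."""

    def __init__(self, logs: Dict[str, List[MetricLog]]):
        self._logs = logs

    def get_observation_log(self, trial_name: str) -> List[MetricLog]:
        return self._logs.get(trial_name, [])
```

A real implementation would replace `InMemorySource` with a thin wrapper around the generated gRPC stub; the grouping logic would stay the same.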

From a security perspective, a user can call this gRPC API from any namespace and for any Experiment, since our DB Manager doesn't have any auth checks, right?
Should we investigate how to improve user isolation for Katib (the "multi-user mode" feature)?
One solution could be to use Istio to allow traffic only from the appropriate user, as @apo-ger mentioned here: #1983 (comment).

What do you think @johnugeorge @gaocegege @tenzen-y @anencore94 @kimwnasptd @apo-ger ?


Love this feature? Give it a 👍 We prioritize the features with the most 👍

@kimwnasptd
Member

@andreyvelich that's a great feature!

Regarding the authn/authz part, I think this discussion will revolve around having programmatic client support for the DB Manager API Server. This is the same as how KFP allows Pods from other namespaces to use its API Server to perform CRUD tasks: kubeflow/pipelines#5138.

And this is done by:

  1. Allowing everyone to talk to the DB Manager, but without setting the kubeflow-userid header (to avoid impersonation).
  2. The DB Manager will drop any requests that are not authenticated.
  3. In-cluster Pods that need to talk to the DB Manager will need to provide an audience-scoped ServiceAccount token.
  4. The DB Manager will need to validate the token.
  5. The DB Manager will then extract the identity (ServiceAccount name) from that token and perform a SubjectAccessReview.
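
The steps above can be sketched as a single authorization function. This is only an illustration under stated assumptions, not how KFP or Katib implements it: `authorize`, `Decision`, `validate_token`, and `review_access` are hypothetical names, the two callables stand in for the Kubernetes TokenReview and SubjectAccessReview calls, and rejecting any request that carries a `kubeflow-userid` header is one reading of step 1.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class Decision:
    allowed: bool
    reason: str


# Hypothetical hooks: a real deployment would back these with the
# Kubernetes TokenReview and SubjectAccessReview APIs.
TokenValidator = Callable[[str], Optional[str]]   # token -> identity, or None
AccessReviewer = Callable[[str, str, str], bool]  # (identity, verb, namespace)


def authorize(
    headers: Mapping[str, str],
    verb: str,
    namespace: str,
    validate_token: TokenValidator,
    review_access: AccessReviewer,
) -> Decision:
    # Step 1: refuse caller-supplied identity headers to avoid impersonation.
    if any(key.lower() == "kubeflow-userid" for key in headers):
        return Decision(False, "identity header not accepted")

    # Step 2: drop unauthenticated requests (keys assumed lowercase,
    # as in gRPC metadata).
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return Decision(False, "unauthenticated")

    # Steps 3 and 4: validate the audience-scoped ServiceAccount token.
    identity = validate_token(token)
    if identity is None:
        return Decision(False, "invalid token")

    # Step 5: check that the extracted identity may act in the namespace.
    if not review_access(identity, verb, namespace):
        return Decision(False, "forbidden")
    return Decision(True, identity)
```

Injecting the two checks as callables keeps the flow testable without a cluster and leaves the choice of Kubernetes client open.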

Then there's also the discussion on how to use ServiceAccount tokens from outside the cluster, but that is a next step once the in-cluster behavior above is working.

@andreyvelich
Member Author

This is the same as how KFP allows Pods from other namespaces to use its API Server to perform CRUD tasks: kubeflow/pipelines#5138.

Thanks for sharing this @kimwnasptd. At the recent Kubeflow Summit, we also got questions about whether the Katib SDK will have the same auth: https://kubeflow.slack.com/archives/C046YTDRABW/p1666199636566639.
Also, maybe we should authenticate all gRPC calls in Katib using a ServiceAccount token, as you suggested, and/or route all gRPC requests through a proxy (e.g. a Katib API Server) that verifies them.

I guess that, currently, users can call GetSuggestions, or any other gRPC API that we have, in any Kubernetes namespace where a Katib Experiment is running (the same problem as with Trial metrics).

I think we should have a broader discussion in the Kubeflow community about how to apply the same security best practices across our components (e.g. Pipelines, Katib).

cc @kubeflow/wg-training-leads @tenzen-y @anencore94

@johnugeorge
Member

@kimwnasptd @andreyvelich We need to think about external access as well for this feature. If it works only for in-cluster requests, it will not add much value to the SDK.
Adding to #2022 (comment), we haven't discussed the right design for SDKs in KF. Each project handles it in a different way (Pipelines, KServe, etc.).

@gaocegege
Member

We need to think about external access as well for this feature. If it works only for in-cluster requests, it will not add much value to the SDK.

I think so.

@tenzen-y
Member

Also, maybe we should authenticate all gRPC calls in Katib using a ServiceAccount token, as you suggested, and/or route all gRPC requests through a proxy (e.g. a Katib API Server) that verifies them.

@andreyvelich That sounds good.

BTW, I have a question.
In the case of gRPC calls between Katib components (e.g. metrics-collector <-> katib-db-manager), will the gRPC request go through the Katib API server for authentication? Or will it still access the katib-db-manager directly, as it does today?

@andreyvelich
Member Author

In the case of gRPC calls between Katib components (e.g. metrics-collector <-> katib-db-manager), will the gRPC request go through the Katib API server for authentication? Or will it still access the katib-db-manager directly, as it does today?

@tenzen-y Since the Metrics Collector is running on the user profile side, I guess we should have some sort of authentication.
From my understanding, any Kubeflow user can currently call our gRPC APIs to report, delete, or get any logs from the DB: https://github.com/kubeflow/katib/blob/master/pkg/apis/manager/v1beta1/api.proto#L18-L28.

I will start with a simple API to get Trial metrics from the DB using the SDK. We can think about proper auth in follow-up discussions.
/assign @andreyvelich

@tenzen-y
Member

tenzen-y commented Dec 2, 2022

@andreyvelich Thanks for clarifying!

@anencore94
Member

https://docs.google.com/document/d/1TRUKUY1zCCMdgF-nJ7QtzRwifsoQop0V8UnRo-GWlpI/edit?disco=AAAAknO9PlM

To answer the question above, @andreyvelich: I've seen many companies build their own UI pages using several Kubeflow APIs, including Kubeflow Notebooks, Pipelines, and Katib. So if there were an HTTP server for Katib, many clients, including their own SDKs and UIs, could use those APIs much more easily.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@andreyvelich
Member Author

Let's close this issue. We can track multi-user support for the Katib DB Manager in separate issues.
