feat(backend): Added multi-user pipelines API. Fixes #4197 #4835
Conversation
Hi @maganaluis. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Awesome 👍👍
/cc @yanniszark @elikatsis @Jeffwan @IronPan @chensun |
Did a quick high-level review.
- The backend behavior LGTM.
- I think we need some frontend design for how users may view/choose shared pipelines from the UI
EDIT: I'll follow up with more detailed review once most people agree with the high-level behavior.
@Bobgy thanks for the ping, I'll take a look this week
@maganaluis I took a quick look at the backend work. First of all, great work!
I have the following high-level comments:
- I think the RBAC permissions are not aligned with the design outlined here: Multi-User Authorization: Add support for K8s RBAC via SubjectAccessReview #3513 (comment). More specifically, the design specifies that versions are a subresource of pipelines, but I see the checks on versions use the pipeline resource. In addition, several of the verbs used are wrong (e.g., checking for the list verb on a create handler). Could you go over the new endpoints and ensure they conform to the design?
- I think that I saw the ListPipelines call return both namespaced and non-namespaced Pipelines. I could be wrong on this. Could you confirm that the ListPipelines API call only lists pipelines for the specified namespace?
- What should we do about the separation of namespaced and non-namespaced pipelines? Should we differentiate between them in the authorization layer (e.g., Pipeline vs ClusterPipelines)? cc @Bobgy
Again, thanks for the great effort on this! 😄
cc @elikatsis to also take a look
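To make the first review point concrete: per the #3513 design, an authorization check for a pipeline-version endpoint should target versions as a subresource of pipelines, with the verb matching the handler. The sketch below illustrates the shape of the SubjectAccessReview attributes; the struct is defined locally to stay self-contained (in KFP it would be `k8s.io/api/authorization/v1.ResourceAttributes`), and the group/resource strings are assumptions, not verified KFP constants.

```go
package main

import "fmt"

// ResourceAttributes mirrors the relevant fields of
// k8s.io/api/authorization/v1.ResourceAttributes, defined locally so the
// sketch compiles on its own.
type ResourceAttributes struct {
	Namespace   string
	Verb        string
	Group       string
	Resource    string
	Subresource string
}

// versionResourceAttributes builds the attributes for a pipeline-version
// endpoint: versions are a *subresource* of pipelines, and the verb must
// match the handler (e.g. "create" for a create handler, not "list").
func versionResourceAttributes(namespace, verb string) *ResourceAttributes {
	return &ResourceAttributes{
		Namespace:   namespace,
		Verb:        verb, // create / get / list / delete, per handler
		Group:       "pipelines.kubeflow.org", // assumed group name
		Resource:    "pipelines",
		Subresource: "versions",
	}
}

func main() {
	a := versionResourceAttributes("user-ns", "create")
	fmt.Printf("%s %s/%s in %s\n", a.Verb, a.Resource, a.Subresource, a.Namespace)
}
```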
@Bobgy @yanniszark I think returning only namespaced Pipelines makes sense; I will remove the shared capabilities for now, while still keeping the
Thanks for covering both high level and low level problems!
I can see he did it intentionally: if the API returns both, then we wouldn't need to adjust the UI to make both types of pipelines discoverable. This is good for saving some initial cost. Furthermore, this is backward-compatible behavior that allows upgrading without breaking any SDK/UI client code. I believe we need some further discussion on how we can introduce the default behavior change. Shall we add a request field to switch the behavior, or add an API server configuration?
I think we can discuss this after this PR, because this can be a progressive improvement. Overall, I'd want to scope this PR down by focusing on MVP changes to introduce pipeline separation. We can improve further on demand in follow-ups.
Returning pipelines only for the specified namespace is also backwards compatible. IMO, when the API call specifies a namespace, it doesn't make sense semantically to return things that are not in that namespace. It's a violation of the filter that the user has clearly specified. @maganaluis please let me know when the PR is ready for another pass :)
@yanniszark I agree, let me go over the code one more time and re-test it from my side. Just a quick question, given we'll be making changes to the API, do I need to make modifications to the KFP SDK? Should this be a separate PR?
I added the name_namespace index; thanks for the review, this was quite a big bug. :) The current code will only display the Pipelines for the namespace being queried, and the "Shared" check-mark has been removed. I've already tested locally in Minikube. The only issue is that the Examples will not load for the users anymore, because these are loaded in the "" (Public) namespace.
You should regenerate python SDK with script: https://github.com/kubeflow/pipelines/blob/master/backend/api/build_kfp_server_api_python_package.sh. |
Let me explain the end-to-end user journey for backward compatibility:
One thing that will be breaking if the pipeline endpoint only returns namespaced pipelines is that:
Reiterating my goal here again: I'd want a user of KFP to be able to upgrade to a new version while keeping all the existing functionality working, including KFP UI and KFP SDK usages. All the information that was available should still be available. After going over these scenarios, I think the minimal-effort fix is to: The flag defaults to false, because that's the desired long-term behavior of only showing namespaced pipelines. Other use cases should have no problem with the new behavior:
This is for discussion; what do you think? I think it'll be better to wait until we all agree on this topic before starting to code.
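The flag idea discussed above could look roughly like the sketch below: an opt-in switch that, when enabled, makes namespaced List calls also return shared (non-namespaced) pipelines for backward compatibility. The flag name and the in-memory "database" are hypothetical, purely for illustration; the PR discussion had not settled on a concrete name or mechanism.

```go
package main

import (
	"flag"
	"fmt"
)

// listPipelines illustrates the proposed switch: when sharedVisible is true,
// a namespaced List call also returns pipelines from the shared ("")
// namespace, preserving the old behavior; when false, only the requested
// namespace is listed (the desired long-term default).
func listPipelines(all map[string][]string, namespace string, sharedVisible bool) []string {
	out := append([]string{}, all[namespace]...)
	if sharedVisible && namespace != "" {
		out = append(out, all[""]...) // "" holds shared/non-namespaced pipelines
	}
	return out
}

func main() {
	// Hypothetical flag name, defaulting to false per the discussion above.
	sharedVisible := flag.Bool("shared-pipelines-visible", false,
		"also return shared (non-namespaced) pipelines on namespaced List calls")
	flag.Parse()

	db := map[string][]string{
		"":        {"example-xgboost"}, // shared examples live in the "" namespace
		"user-ns": {"my-pipeline"},
	}
	fmt.Println(listPipelines(db, "user-ns", *sharedVisible))
}
```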
Force-pushed from b15ff17 to a3f0dda.
@Bobgy As suggested, I removed all the UI changes, and ensured backwards compatibility.
I tested the namespaced functionality, and this also works as expected. If we are not going to make any changes on the UI, does it make sense to include the Python API changes or should we remove it as well?
@maganaluis Thanks! Let me take a look. It's recommended to keep the Python API changes, because they should reflect the latest API status.
/lgtm Thank you again for the continued efforts! /hold
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Bobgy, maganaluis The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Thanks @Bobgy @maganaluis, I'll try to take a look today or tomorrow.
@maganaluis thanks for the great work on this. I mainly have a few nits in the code and some questions around SQL indexes and migrations.
response = db.Model(&model.Pipeline{}).RemoveIndex("Name")
if response.Error != nil {
	glog.Fatalf("Failed to drop unique key on pipeline name. Error: %s", response.Error)
}
Should the migration code be wrapped in a transaction (remove index + add new)? This is in order to not leave the database in an invalid state in case the migration fails to complete.
Also, is this code idempotent? Meaning, what would happen if the "Name" index doesn't exist? Would it fail? Maybe gorm does it automatically if the (name, namespace) index is declared on the Go struct (in AutoMigrate)?
That's my understanding as well, and yes, I also believe the operation is idempotent; I remember testing this locally. I added a comment here on how it would look if KFP switches to the gorm main fork:
#5125
I think it's best to leave the operations separate; that's the pattern currently being used with this fork. We tested this from Kubeflow 1.1 onward and are using the code I wrote in production, which has this logic. We did not observe any issues, and we also tested it on the in-cluster MySQL that Kubeflow comes with and on an Azure MySQL.
if err != nil {
	return "", util.Wrap(err, "Failed to get namespace from version ID")
}
pipeline, err := r.GetPipeline(pipelineVersion.PipelineId)
Maybe reuse GetNamespaceFromPipelineID here?
refKey := filterContext.ReferenceKey
if refKey == nil {
	// In single user mode, apply filter with empty namespace for backward compatibility.
	filterContext = &common.FilterContext{
		ReferenceKey: &common.ReferenceKey{Type: common.Namespace, ID: ""},
	}
}
if refKey != nil && refKey.Type != common.Namespace {
	return nil, util.NewInvalidInputError("Invalid resource references for pipelines. ListPipelines requires filtering by namespace.")
}
if refKey != nil && refKey.Type == common.Namespace {
	namespace := refKey.ID
	resourceAttributes := &authorizationv1.ResourceAttributes{
		Namespace: namespace,
		Verb:      common.RbacResourceVerbList,
	}
	if err = s.CanAccessPipeline(ctx, "", resourceAttributes); err != nil {
		return nil, util.Wrap(err, "Failed to authorize with API resource references")
	}
}
nit: I would rewrite that as:
refKey := filterContext.ReferenceKey
// Validate first
if refKey != nil && refKey.Type != common.Namespace {
	return nil, util.NewInvalidInputError("Invalid resource references for pipelines. ListPipelines requires filtering by namespace.")
}
if refKey == nil {
	// In single user mode, apply filter with empty namespace for backward compatibility.
	filterContext = &common.FilterContext{
		ReferenceKey: &common.ReferenceKey{Type: common.Namespace, ID: ""},
	}
}
namespace := filterContext.ReferenceKey.ID // read via filterContext so the nil-refKey default applies
resourceAttributes := &authorizationv1.ResourceAttributes{
	Namespace: namespace,
	Verb:      common.RbacResourceVerbList,
}
if err = s.CanAccessPipeline(ctx, "", resourceAttributes); err != nil {
	return nil, util.Wrap(err, "Failed to authorize with API resource references")
}
because it eliminates an if clause and IMO makes it less complex. Your call though :)
*/
refKey := filterContext.ReferenceKey
if refKey == nil {
	// In single user mode, apply filter with empty namespace for backward compatibility.
This is for both multi-user and single-user at this moment, right? It's for general backwards compatibility.
Right, but you'll actually want to keep this code here going forward; otherwise you'll run into issues with KFP standalone.
backend/src/apiserver/server/util.go
func GetPipelineNamespace(queryString string) (string, error) {
	pipelineNamespace, err := url.QueryUnescape(queryString)
	if err != nil {
		return "", util.NewInvalidInputErrorWithDetails(err, "Pipeline namespace in the query string has invalid format.")
	}
	return pipelineNamespace, nil
}
nit: Since this function is only used in the upload server, maybe keep it there?
Makes sense. Done.
if filterContext.ReferenceKey != nil && filterContext.ReferenceKey.Type == common.Namespace {
	glog.Info("Using Namespace to filter the query")
	query = query.Where(
		sq.Eq{"pipelines.Status": model.PipelineReady,
			"pipelines.Namespace": filterContext.ReferenceKey.ID},
	)
} else {
	query = query.Where(
		sq.Eq{"pipelines.Status": model.PipelineReady},
	)
}
Is this if/else needed? From what I understand, the filterContext always contains a namespace reference key, because the ListPipelines endpoint sets it to the default value if it's not present.
We still need to check this; otherwise you'll get a nil pointer dereference while checking for the ID. Same in the ListPipelines method. It's mostly to keep backwards compatibility with KFP standalone.
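The nil guard described above can be sketched as follows, with local stand-in types (names assumed to mirror the KFP `common` package): in KFP standalone, `ReferenceKey` is nil, so reading `.ID` without the check would panic.

```go
package main

import "fmt"

// Stand-ins for the KFP common.ReferenceKey / common.FilterContext types.
type ReferenceKey struct {
	Type string
	ID   string
}

type FilterContext struct {
	ReferenceKey *ReferenceKey
}

// namespaceFilter returns the namespace to filter on, or ok=false when no
// namespace reference is present (KFP standalone leaves ReferenceKey nil).
// Checking ReferenceKey != nil before reading .ID is what avoids the nil
// pointer dereference discussed above.
func namespaceFilter(fc *FilterContext) (namespace string, ok bool) {
	if fc.ReferenceKey != nil && fc.ReferenceKey.Type == "Namespace" {
		return fc.ReferenceKey.ID, true
	}
	return "", false
}

func main() {
	fmt.Println(namespaceFilter(&FilterContext{})) // standalone: no reference key
	fmt.Println(namespaceFilter(&FilterContext{ReferenceKey: &ReferenceKey{Type: "Namespace", ID: "user-ns"}}))
}
```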
New changes are detected. LGTM label has been removed.
Commits: corrected typo; updating code based on review; fixes for pipelines server; reverting this back
@Bobgy @yanniszark Thank you for the reviews. It's good to go from my side.
/lgtm Thank you a lot again @maganaluis @yanniszark! Let's get this going! If there are any problems, we can always come back and fix them.
/unhold
Added namespaced pipelines, with UI and API changes, as well as the ability to share pipelines.
Fixes:
#4197
Description of your changes:
Authors:
@arllanos @maganaluis