feat(backend): Added multi-user pipelines API. Fixes #4197 #4835

Merged
merged 2 commits into from
Feb 26, 2021

Conversation

maganaluis
Contributor

Added namespaced pipelines, with UI and API changes, as well as the ability to share pipelines.

Fixes:
#4197

Description of your changes:

  • Added a new field in Pipelines table for namespace.
  • Uploaded Pipelines are by default namespaced.
  • Ability to share Pipelines by selecting "shared" check-mark in the UI.
  • Authorization via SubjectAccessReview for Pipelines, PipelinesVersions, and Upload Pipelines endpoints.

Authors:
@arllanos @maganaluis

@k8s-ci-robot
Contributor

Hi @maganaluis. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Bobgy
Contributor

Bobgy commented Nov 28, 2020

Awesome 👍👍
/ok-to-test
I will take a deeper look next week

@Bobgy
Contributor

Bobgy commented Nov 30, 2020

/cc @yanniszark @elikatsis @Jeffwan @IronPan @chensun
CC all stakeholders

Contributor

@Bobgy Bobgy left a comment

Did a quick high-level review.

  • The backend behavior LGTM.
  • I think we need some frontend design for how users may view/choose shared pipelines in the UI.

EDIT: I'll follow up with more detailed review once most people agree with the high-level behavior.

backend/api/pipeline.proto Outdated Show resolved Hide resolved
backend/api/pipeline.proto Outdated Show resolved Hide resolved
frontend/src/pages/PipelineList.tsx Outdated Show resolved Hide resolved
@yanniszark
Contributor

@Bobgy thanks for the ping, I'll take a look this week

backend/api/pipeline.proto Outdated Show resolved Hide resolved
Contributor

@yanniszark yanniszark left a comment

@maganaluis I took a quick look at the backend work. First of all, great work!
I have the following high-level comments:

  • I think the RBAC permissions are not aligned with the design outlined here: Multi-User Authorization: Add support for K8s RBAC via SubjectAccessReview #3513 (comment). More specifically, the design specifies that versions are a subresource of pipelines, but I see the checks on versions use the pipeline resource. In addition, several of the verbs used are wrong (e.g., checking for the list verb on a create handler). Could you go over the new endpoints and ensure they conform to the design?

  • I think that I saw the ListPipelines call return both namespaced and non-namespaced Pipelines. I could be wrong on this. Could you confirm that the List Pipelines API call only lists pipelines for the specified namespace?

  • What should we do about the separation of namespaced and non-namespaced pipelines? Should we differentiate between them in the authorization layer? (e.g., Pipeline vs ClusterPipelines). cc @Bobgy
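
To make the first point concrete, here is a minimal sketch of a handler-to-RBAC table in which versions are a subresource of pipelines and each handler checks the verb matching its own action. The `ResourceAttributes` struct is a local stand-in for the Kubernetes authorization/v1 type, and the handler names, verb strings, and the `versions` subresource name are assumptions for illustration, not the design's exact values.

```go
package main

import "fmt"

// ResourceAttributes is a local stand-in for the Kubernetes
// authorization/v1 type of the same name; only the fields used here are kept.
type ResourceAttributes struct {
	Namespace   string
	Verb        string
	Resource    string
	Subresource string
}

// attributesFor returns the access check a handler would request.
// Handler names and the "versions" subresource are hypothetical.
func attributesFor(handler, namespace string) (ResourceAttributes, error) {
	table := map[string]ResourceAttributes{
		"CreatePipeline":        {Verb: "create", Resource: "pipelines"},
		"GetPipeline":           {Verb: "get", Resource: "pipelines"},
		"ListPipelines":         {Verb: "list", Resource: "pipelines"},
		"DeletePipeline":        {Verb: "delete", Resource: "pipelines"},
		"CreatePipelineVersion": {Verb: "create", Resource: "pipelines", Subresource: "versions"},
		"ListPipelineVersions":  {Verb: "list", Resource: "pipelines", Subresource: "versions"},
	}
	attrs, ok := table[handler]
	if !ok {
		return ResourceAttributes{}, fmt.Errorf("no RBAC mapping for handler %q", handler)
	}
	attrs.Namespace = namespace
	return attrs, nil
}

func main() {
	attrs, err := attributesFor("CreatePipelineVersion", "team-a")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", attrs)
}
```

Keeping the mapping in one table makes a mismatch such as a `list` verb on a create handler easy to spot in review.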

Again, thanks for the great effort on this! 😄
cc @elikatsis to also take a look

backend/src/apiserver/model/pipeline.go Show resolved Hide resolved
backend/src/apiserver/server/pipeline_server.go Outdated Show resolved Hide resolved
backend/src/apiserver/storage/pipeline_store.go Outdated Show resolved Hide resolved
backend/src/apiserver/server/pipeline_server.go Outdated Show resolved Hide resolved
backend/src/apiserver/server/pipeline_server.go Outdated Show resolved Hide resolved
backend/src/apiserver/server/pipeline_upload_server.go Outdated Show resolved Hide resolved
@maganaluis
Contributor Author

@Bobgy @yanniszark I think returning only namespaced Pipelines makes sense, so I will remove the sharing capability for now, while still keeping the empty string "" as the default in the namespace field so that this capability can be added in the future. My main goal is to secure the Pipelines, so I'll go with whatever is simpler.

@Bobgy
Contributor

Bobgy commented Dec 5, 2020

Thanks for covering both high level and low level problems!

  • I think that I saw the ListPipelines call return both namespaced and non-namespaced Pipelines. I could be wrong on this. Could you confirm that the List Pipelines API call only lists pipelines for the specified namespace?

I can see he did it intentionally: if the API returns both, then we don't need to adjust the UI to make both types of pipelines discoverable. This is good for saving some initial cost. Furthermore, this is backward-compatible behavior that allows upgrading without breaking any SDK/UI client code.

I believe we need some further discussion on how we can introduce the default behavior change. Shall we add a request field to switch the behavior, or add an API server configuration?

  • What should we do about the separation of namespaced and non-namespaced pipelines? Should we differentiate between them in the authorization layer? (e.g., Pipeline vs ClusterPipelines). cc @Bobgy

Again, thanks for the great effort on this! 😄
cc @elikatsis to also take a look

I think we can discuss this after this PR, because this can be a progressive improvement.

Overall, I'd say I want to scope down on this PR by focusing on MVP changes to introduce pipeline separation. We can improve further on demand in following ups.

@yanniszark
Contributor

yanniszark commented Dec 11, 2020

I can see he did it intentionally: if the API returns both, then we don't need to adjust the UI to make both types of pipelines discoverable. This is good for saving some initial cost. Furthermore, this is backward-compatible behavior that allows upgrading without breaking any SDK/UI client code.

Returning pipelines only for the specified namespace is also backwards compatible. IMO, when the API call specifies a namespace, it doesn't make sense semantically to return things that are not in that namespace. It's a violation of the filter that the user has clearly specified.

@maganaluis please let me know when the PR is ready for another pass :)

@maganaluis
Contributor Author

@yanniszark I agree, let me go over the code one more time and re-test it from my side. Just a quick question, given we'll be making changes to the API, do I need to make modifications to the KFP SDK? Should this be a separate PR?

@maganaluis
Contributor Author

@yanniszark @Bobgy

I added the name_namespace index; thanks for the review, this was quite a big bug. :)
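
As a rough illustration of what a composite (name, namespace) uniqueness constraint buys, here is a minimal in-memory sketch. The `store` type and its methods are hypothetical stand-ins, not the KFP pipeline store; the point is only that the same pipeline name may exist in two namespaces but not twice in one.

```go
package main

import (
	"errors"
	"fmt"
)

// pipelineKey mirrors the idea of a unique index over (namespace, name).
type pipelineKey struct {
	Namespace string
	Name      string
}

// store is a hypothetical in-memory stand-in for the pipelines table.
type store struct {
	byKey map[pipelineKey]string // key -> pipeline ID
}

var errDuplicate = errors.New("pipeline name already exists in this namespace")

func newStore() *store {
	return &store{byKey: make(map[pipelineKey]string)}
}

// create rejects a second pipeline with the same name in the same namespace.
func (s *store) create(namespace, name, id string) error {
	k := pipelineKey{Namespace: namespace, Name: name}
	if _, exists := s.byKey[k]; exists {
		return errDuplicate
	}
	s.byKey[k] = id
	return nil
}

func main() {
	s := newStore()
	fmt.Println(s.create("team-a", "train", "1"))
	fmt.Println(s.create("team-b", "train", "2"))
	fmt.Println(s.create("team-a", "train", "3"))
}
```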

The current code will only display the Pipelines for the namespace being queried, and the "Shared" check-mark has been removed. I've already tested locally in MiniKube.

The only issue is that the Examples will not load for the users anymore, because these are being loaded in the "" (Public) namespace.

@Bobgy
Contributor

Bobgy commented Dec 15, 2020

Just a quick question, given we'll be making changes to the API, do I need to make modifications to the KFP SDK? Should this be a separate PR?

You should regenerate python SDK with script: https://github.com/kubeflow/pipelines/blob/master/backend/api/build_kfp_server_api_python_package.sh.
Adding any other helpers/fields in kfp/_client.py can be a separate PR.

@Bobgy
Contributor

Bobgy commented Dec 15, 2020

Returning pipelines only for the specified namespace is also backwards compatible. IMO, when the API call specifies a namespace, it doesn't make sense semantically to return things that are not in that namespace. It's a violation of the filter that the user has clearly specified.

Let me explain the end-to-end user journey for backward compatibility:

  1. User installs current KFP with multi-user mode enabled, but pipelines are shared
  2. User uses KFP, uploads some pipelines, and builds some automation around KFP using the KFP SDK and API.
  3. User upgrades KFP to a version in which pipelines are separated
  4. I'd hope at this stage, all the existing KFP usages can still give the user what they had back.
    4.1 The user should be able to open KFP UI and see all the shared pipelines
    4.2 The user should be able to use KFP SDK to query all the shared pipelines
    4.3 It's OK if the user uploads more namespaced pipelines, but they will not be returned by queries for shared pipelines

One thing that will be breaking if pipeline endpoint only returns namespaced pipelines is that:

  • we will change KFP UI to query pipelines in one namespace only
  • so when the user opens KFP UI for a namespace, it will show nothing

Reiterating my goal here: I'd want a user using KFP to be able to upgrade to a new version while keeping all the existing functionality working, including KFP UI and KFP SDK usages. All the information that was available should still be available.

And after going over these scenarios, I think the minimal effort fix is to:
Add a special flag -- "includeShared" (you might find a better name for it) -- when querying KFP pipeline endpoints

The flag defaults to false, because that's the desired long term behavior of only showing namespaced pipelines
When KFP UI lists pipelines, it should query the namespace with includeShared=true, so that all previously shown pipelines still show up, while namespaced pipelines should also show up.

Other use-cases should have no problem with the new behavior:

  • if a create pipeline request does not specify a namespace, it's a shared pipeline as before
  • if a create pipeline request contains a namespace, it's a namespaced pipeline
  • if a list pipeline request does not specify a namespace, it lists all shared pipelines as before
  • if a list pipeline request contains a namespace, it lists namespaced pipelines
  • if a list pipeline request contains a namespace and includeShared=true, it lists both namespaced pipelines and shared pipelines (this will initially only be used by KFP UI to keep the UI UX backward compatible; we can figure out a long-term path for this flag later)
  • if a create run request specifies a pipeline, we need to verify the user has access permission to this pipeline: either it's shared, or the user has get access to pipelines in this namespace
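
The listing rules above can be sketched as a small filter. This is only an illustration of the proposed semantics under the assumption that an empty namespace marks a shared pipeline; the `Pipeline` struct and `listPipelines` function are hypothetical and not the API server's code.

```go
package main

import "fmt"

// Pipeline is a hypothetical, trimmed-down model; an empty Namespace
// marks a shared pipeline.
type Pipeline struct {
	Name      string
	Namespace string
}

// listPipelines applies the proposed rules: an empty namespace lists shared
// pipelines, a non-empty one lists that namespace only, and includeShared
// adds the shared pipelines to a namespaced listing.
func listPipelines(all []Pipeline, namespace string, includeShared bool) []Pipeline {
	var out []Pipeline
	for _, p := range all {
		switch {
		case p.Namespace == namespace:
			out = append(out, p)
		case includeShared && p.Namespace == "":
			out = append(out, p)
		}
	}
	return out
}

func main() {
	all := []Pipeline{
		{Name: "shared-demo", Namespace: ""},
		{Name: "train", Namespace: "team-a"},
		{Name: "eval", Namespace: "team-b"},
	}
	fmt.Println(listPipelines(all, "", false))
	fmt.Println(listPipelines(all, "team-a", false))
	fmt.Println(listPipelines(all, "team-a", true))
}
```

With the flag defaulting to false, a namespaced query never returns anything outside that namespace unless the caller opts in.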

This is for discussion, what do you think? I think it'll be better to wait until we all agree on this topic before starting coding.

@maganaluis
Contributor Author

@Bobgy As suggested, I removed all the UI changes, and ensured backwards compatibility.

1. User does not provide resource reference
2. User provides resource reference to public namespace ""
3. User provides resource reference to namespace x

I tested the namespaced functionality, and this also works as expected. If we are not going to make any changes on the UI, does it make sense to include the Python API changes or should we remove it as well?

@Bobgy
Contributor

Bobgy commented Feb 22, 2021

@maganaluis Thanks! Let me take a look.

It's recommended to keep the Python API changes, because they should reflect the latest API status.

@Bobgy
Contributor

Bobgy commented Feb 22, 2021

/lgtm
/approve

Thank you again for the continued efforts!
I think this is good to go.

/hold
To give @StefanoFioravanzo @yanniszark @capri-xiyue a last chance to review.

@google-oss-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Bobgy, maganaluis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yanniszark
Contributor

Thanks @Bobgy @maganaluis, I'll try to take a look today or tomorrow.

Contributor

@yanniszark yanniszark left a comment

@maganaluis thanks for the great work on this. I mainly have a few nits in the code and some questions around SQL indexes and migrations.

backend/api/pipeline.proto Show resolved Hide resolved
backend/src/apiserver/model/pipeline.go Show resolved Hide resolved
Comment on lines +229 to +233
response = db.Model(&model.Pipeline{}).RemoveIndex("Name")
if response.Error != nil {
glog.Fatalf("Failed to drop unique key on pipeline name. Error: %s", response.Error)
}

Contributor

Should the migration code be wrapped in a transaction (remove index + add new)?
This is in order to not leave the database in an invalid state, in case the migration fails to complete.
Also, is this code idempotent? Meaning, what would happen if the "Name" index doesn't exist? Would it fail?

Maybe gorm does it automatically if the (name, namespace) index is declared on the Go struct (in AutoMigrate)?

Contributor Author

That's my understanding as well, and yes, I also believe the operation is idempotent; I remember testing this locally. I added a comment here on how it would look if KFP switches to the gorm main fork.
#5125

I think it's best to leave the operations separate; that's the pattern currently being used with this fork.

We tested this starting from Kubeflow 1.1, and we are using the code I wrote, which has this logic, in production. We did not observe any issues, and we also tested it on the in-cluster MySQL that Kubeflow comes with and on an Azure MySQL.

if err != nil {
return "", util.Wrap(err, "Failed to get namespace from versionId ID")
}
pipeline, err := r.GetPipeline(pipelineVersion.PipelineId)
Contributor

Maybe reuse GetNamespaceFromPipelineID here?

Comment on lines 203 to 222
refKey := filterContext.ReferenceKey
if refKey == nil {
// In single user mode, apply filter with empty namespace for backward compatibility.
filterContext = &common.FilterContext{
ReferenceKey: &common.ReferenceKey{Type: common.Namespace, ID: ""},
}
}
if refKey != nil && refKey.Type != common.Namespace {
return nil, util.NewInvalidInputError("Invalid resource references for pipelines. ListPipelines requires filtering by namespace.")
}
if refKey != nil && refKey.Type == common.Namespace {
namespace := refKey.ID
resourceAttributes := &authorizationv1.ResourceAttributes{
Namespace: namespace,
Verb: common.RbacResourceVerbList,
}
if err = s.CanAccessPipeline(ctx, "", resourceAttributes); err != nil {
return nil, util.Wrap(err, "Failed to authorize with API resource references")
}
}

Contributor

@yanniszark yanniszark Feb 24, 2021

nit: I would rewrite that as:

	refKey := filterContext.ReferenceKey

        // Validate first
	if refKey != nil && refKey.Type != common.Namespace {
		return nil, util.NewInvalidInputError("Invalid resource references for pipelines. ListPipelines requires filtering by namespace.")
	}

	if refKey == nil {
		// In single user mode, apply filter with empty namespace for backward compatibility.
		filterContext = &common.FilterContext{
			ReferenceKey: &common.ReferenceKey{Type: common.Namespace, ID: ""},
		}
	}

	namespace := refKey.ID
	resourceAttributes := &authorizationv1.ResourceAttributes{
		Namespace: namespace,
		Verb:      common.RbacResourceVerbList,
	}
	if err = s.CanAccessPipeline(ctx, "", resourceAttributes); err != nil {
		return nil, util.Wrap(err, "Failed to authorize with API resource references")
	}

because it eliminates an if clause and imo makes it less complex. Your call though :)
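
One detail to watch in a flattened version like this: after the default is applied to filterContext, the local refKey is still nil, so reading refKey.ID needs a guard. A minimal sketch of a nil-safe variant follows, using local stand-in types rather than the real `common` package. The `resolveNamespace` helper and its `needsAuth` return value are hypothetical; skipping the authorization check when no reference key was supplied mirrors the original three-branch code.

```go
package main

import (
	"errors"
	"fmt"
)

// ReferenceKey and FilterContext are local stand-ins for the types in the
// API server's common package.
type ReferenceKey struct {
	Type string
	ID   string
}

type FilterContext struct {
	ReferenceKey *ReferenceKey
}

const namespaceType = "Namespace"

var errInvalidRef = errors.New("ListPipelines requires filtering by namespace")

// resolveNamespace validates the reference key first, then applies the
// empty-namespace default. It reports whether an authorization check is
// needed, which is only the case when the caller supplied a namespace.
func resolveNamespace(fc *FilterContext) (namespace string, needsAuth bool, err error) {
	refKey := fc.ReferenceKey
	if refKey != nil && refKey.Type != namespaceType {
		return "", false, errInvalidRef
	}
	if refKey == nil {
		// No reference key: default to the empty (shared) namespace and
		// return early so that refKey is never dereferenced.
		fc.ReferenceKey = &ReferenceKey{Type: namespaceType, ID: ""}
		return "", false, nil
	}
	return refKey.ID, true, nil
}

func main() {
	ns, needsAuth, err := resolveNamespace(&FilterContext{})
	fmt.Printf("namespace=%q needsAuth=%v err=%v\n", ns, needsAuth, err)

	withNS := &FilterContext{ReferenceKey: &ReferenceKey{Type: namespaceType, ID: "team-a"}}
	ns, needsAuth, err = resolveNamespace(withNS)
	fmt.Printf("namespace=%q needsAuth=%v err=%v\n", ns, needsAuth, err)
}
```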

*/
refKey := filterContext.ReferenceKey
if refKey == nil {
// In single user mode, apply filter with empty namespace for backward compatibility.
Contributor

This is for both multi-user and single-user at this moment right? It's for general backwards-compatibility.

Contributor Author

Right, but you'll actually want to keep this code here going forward. Otherwise you'll run into issues with KFP standalone.

Comment on lines 50 to 57
func GetPipelineNamespace(queryString string) (string, error) {
pipelineNamespace, err := url.QueryUnescape(queryString)
if err != nil {
return "", util.NewInvalidInputErrorWithDetails(err, "Pipeline namespace in the query string has invalid format.")
}
return pipelineNamespace, nil
}
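
For readers unfamiliar with `url.QueryUnescape`, here is a small standalone sketch of how a helper like this behaves on a few inputs. It uses only the standard library; the lowercase `getPipelineNamespace` name and the `fmt.Errorf` wrapping are substitutions for illustration, since the real helper wraps errors with the project's `util` package.

```go
package main

import (
	"fmt"
	"net/url"
)

// getPipelineNamespace decodes a percent-encoded namespace taken from a
// query string and reports malformed escapes as an error.
func getPipelineNamespace(queryString string) (string, error) {
	ns, err := url.QueryUnescape(queryString)
	if err != nil {
		return "", fmt.Errorf("pipeline namespace in the query string has invalid format: %w", err)
	}
	return ns, nil
}

func main() {
	// Plain value, percent-encoded hyphen, and a malformed escape.
	for _, raw := range []string{"team-a", "team%2Da", "%zz"} {
		ns, err := getPipelineNamespace(raw)
		fmt.Printf("%q -> %q, err=%v\n", raw, ns, err)
	}
}
```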

Contributor

nit: Since this function is only used in the upload server, maybe keep it there?

Contributor Author

Makes sense. Done.

Comment on lines 95 to 105
if filterContext.ReferenceKey != nil && filterContext.ReferenceKey.Type == common.Namespace {
glog.Info("Using Namespace to filter the query")
query = query.Where(
sq.Eq{"pipelines.Status": model.PipelineReady,
"pipelines.Namespace": filterContext.ReferenceKey.ID},
)
} else {
query = query.Where(
sq.Eq{"pipelines.Status": model.PipelineReady},
)
}
Contributor

Is this if/else needed? From what I understand, the filterContext always contains a namespace reference key, because the ListPipelines endpoint sets it to the default value if it's not present.

Contributor Author

@maganaluis maganaluis Feb 26, 2021

We still need to check this, otherwise you'll get a nil pointer while checking for the ID. The same applies to the List Pipelines method. It's mostly due to keeping backwards compatibility with KFP standalone.
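
A minimal sketch of the guard being described, with stand-in types and a plain map in place of the squirrel query builder. The `whereClause` helper and the "READY" status value are placeholders for illustration, not the store's actual code.

```go
package main

import "fmt"

// ReferenceKey and FilterContext are local stand-ins for the types in the
// API server's common package.
type ReferenceKey struct {
	Type string
	ID   string
}

type FilterContext struct {
	ReferenceKey *ReferenceKey
}

// whereClause builds the equality conditions for the list query. The
// namespace condition is added only when a namespace reference key is
// present, so a nil ReferenceKey is never dereferenced.
func whereClause(fc *FilterContext) map[string]interface{} {
	where := map[string]interface{}{"pipelines.Status": "READY"} // placeholder status value
	if fc != nil && fc.ReferenceKey != nil && fc.ReferenceKey.Type == "Namespace" {
		where["pipelines.Namespace"] = fc.ReferenceKey.ID
	}
	return where
}

func main() {
	fmt.Println(whereClause(&FilterContext{}))
	fmt.Println(whereClause(&FilterContext{ReferenceKey: &ReferenceKey{Type: "Namespace", ID: "team-a"}}))
}
```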

@k8s-ci-robot k8s-ci-robot removed the lgtm label Feb 26, 2021
@google-oss-robot

New changes are detected. LGTM label has been removed.

corrected typo


updating code based on review


fixes for pipelines server


reverting this back
@maganaluis
Contributor Author

@Bobgy @yanniszark Thank you for the reviews. It's good to go from my side.

@Bobgy
Contributor

Bobgy commented Feb 26, 2021

/LGTM

Thank you a lot again @maganaluis @yanniszark!
I think a lot of people are looking forward to this.

Let's get this going! If there are any problems, we can always come back to fix.

@Bobgy
Contributor

Bobgy commented Feb 26, 2021

/unhold

@google-oss-robot google-oss-robot merged commit 5df2801 into kubeflow:master Feb 26, 2021
@Bobgy Bobgy changed the title feat(backend): Added multi-user pipelines (UI + API); Fixes #4197 feat(backend): Added multi-user pipelines API; Fixes #4197 Jul 7, 2021
@Bobgy Bobgy changed the title feat(backend): Added multi-user pipelines API; Fixes #4197 feat(backend): Added multi-user pipelines API. Fixes #4197 Jul 7, 2021