
WIP namespace isolation for artifacts and pipeline definitions #7406

Closed
wants to merge 54 commits

Conversation

@juliusvonkohout (Member) commented Mar 11, 2022

Description of your changes:
Fixes #4649 by using one bucket per user/namespace
Fixes #5084, as seen in the screenshot below. The KFP Python SDK should also get an update
Fixes #4197

The detailed proposal is in the first failed approach #7219 (comment) and in this proposal
KFP Namespace isolation for pipeline definition and execution.docx

@ca-scribner @jacobmalmberg please test. You have to do the user and bucket management yourself. I have a working version with user, policy, and bucket management, but this is the first step.

@gabornyerges will make it mergeable.

[screenshot]

@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign zijianjoy after the PR has been reviewed.
You can assign the PR to them by writing /assign @zijianjoy in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@juliusvonkohout juliusvonkohout changed the title WIP fix:(backend) namespace isolation for artifacts and pipeline definitions WIP namespace isolation for artifacts and pipeline definitions Mar 31, 2022
@juliusvonkohout (Member Author)

@Bobgy @zijianjoy @StefanoFioravanzo @chensun I would really like to know from you whether this has a chance of being merged. It really separates the users and would make v1 pipelines enterprise-ready from a security perspective.

@juliusvonkohout (Member Author)

@gabornyerges #7447 is a pull request that also implements the frontend part and passes the unit tests. We could also wait for #4197 (comment) and merge this here without the frontend changes.

@gabornyerges

I will look into both PRs.

@grobbie commented Apr 1, 2022

FWIW I'd like to see the bucket-per-user-namespace parts merged regardless of which frontend implementation PR gets selected, and it looks like you've done a huge amount of work here. For me the Arrikto approach to the frontend is okay, although maybe overkill, since the feedback we've had from users is overwhelmingly that they want pipelines isolated per user namespace. Anyway, I've made my PR a draft for now.

Looking at the failed e2e test: it looks like you didn't run npm run lint on the frontend?

 TS2339: Property 'namespace' does not exist on type 'typeof QUERY_PARAMS'.
Step #2 - "frontend":     129 |     const urlParser = new URLParser(props);
Step #2 - "frontend":     130 |     const pipelineId = urlParser.get(QUERY_PARAMS.pipelineId);
Step #2 - "frontend":   > 131 |     const namespace = urlParser.get(QUERY_PARAMS.namespace) || undefined;

@google-oss-prow google-oss-prow bot added size/XXL and removed size/XL labels Apr 1, 2022
@juliusvonkohout (Member Author) commented Apr 19, 2022

@arllanos @maganaluis this might be relevant for you too, since you started working on it last spring https://docs.google.com/document/d/1DVHbT1RIv_VaIzWz77YCBYlBX7WEMW61AAm-ZHdWZFU/edit?usp=sharing

@google-oss-prow

@juliusvonkohout: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                        Commit   Details  Required  Rerun command
kubeflow-pipeline-frontend-test  7241afc  link     true      /test kubeflow-pipeline-frontend-test
kubeflow-pipeline-e2e-test       7241afc  link     true      /test kubeflow-pipeline-e2e-test
kubeflow-pipeline-upgrade-test   7241afc  link     true      /test kubeflow-pipeline-upgrade-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@juliusvonkohout (Member Author) commented May 5, 2022

Due to newly discovered functionality in Argo, we will not use multiple buckets but just multiple folders and the following IAM policy. Furthermore, AWS for example has a limit of 100 buckets, so we should definitely stick with one bucket.

One bucket, multiple folders: namespace isolation

  1. Change the workflow-controller configmap by adding {{workflow.namespace}} to the keyFormat: "private-artifacts/{{workflow.namespace}}/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}"
     We do not need the per-namespace artifactrepositories configmaps anymore, but users can still use them to override the default artifact storage defined above. (A sketch of how this key expands follows below.)
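
A minimal sketch (illustration only, not part of the PR; the namespace and workflow names are invented) of how that keyFormat template expands at runtime. The point is that the namespace becomes the first folder under private-artifacts/, which is exactly what the per-user policy in step 2 matches on:

    # Sketch only: mimics the substitution the workflow-controller performs
    # on keyFormat. All names below are made up for illustration.
    key_format = ("private-artifacts/{namespace}/{name}/"
                  "{year}/{month}/{day}/{pod_name}")

    key = key_format.format(
        namespace="team-a",                       # {{workflow.namespace}}
        name="my-pipeline-x7k2d",                 # {{workflow.name}}
        year="2022", month="05", day="05",        # {{workflow.creationTimestamp.*}}
        pod_name="my-pipeline-x7k2d-1234567890",  # {{pod.name}}
    )
    print(key)
    # -> private-artifacts/team-a/my-pipeline-x7k2d/2022/05/05/my-pipeline-x7k2d-1234567890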

  2. We still need to create a MinIO user in sync.py and save the individual credentials to the per-namespace secret. But we now need only one policy, although it must still be attached to every user, so only some small changes in sync.py are needed. We do not need to create additional buckets anymore:

"Resource": [
    "arn:aws:s3:::%s"  % shared_bucket_name,
    "arn:aws:s3:::%s/private-artifacts/${aws:username}/*"  % shared_bucket_name, # private artifacts
    "arn:aws:s3:::%s/private/${aws:username}/*"  % shared_bucket_name, # private storage
    "arn:aws:s3:::%s/shared/*"  % shared_bucket_name # shared storage for collaboration

We exclude
    "arn:aws:s3:::%s/pipelines/*"  % shared_bucket_name, # shared pipelines 
because the the apiserver, not the user should manage pipeline  definitions on minio.
So changing environment variable https://github.com/kubeflow/pipelines/blob/49cdb42c129dce653b542d4323d7087f67097e2e/backend/src/apiserver/client_manager.go#L43 and splitting  the storage into shared and private pipelines becomes unecessary
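
For context, here is a sketch of how the full policy document around that Resource list might be assembled in sync.py. Only the Resource ARNs above come from this proposal; the Action lists, the s3:prefix condition, and the bucket name are assumptions for illustration, and creating the MinIO user and attaching the policy would still happen elsewhere in sync.py:

    import json

    shared_bucket_name = "mlpipeline"  # assumed default bucket name

    # Hypothetical full policy document; MinIO substitutes ${aws:username}
    # with the per-namespace user name at evaluation time.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing the bucket is allowed, but only under the user's
                # own prefixes and the shared folder.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": ["arn:aws:s3:::%s" % shared_bucket_name],
                "Condition": {
                    "StringLike": {
                        "s3:prefix": [
                            "private-artifacts/${aws:username}/*",
                            "private/${aws:username}/*",
                            "shared/*",
                        ]
                    }
                },
            },
            {
                # Object access to the private and shared folders.
                # "pipelines/" is deliberately absent: the apiserver,
                # not the user, manages pipeline definitions.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [
                    "arn:aws:s3:::%s/private-artifacts/${aws:username}/*" % shared_bucket_name,
                    "arn:aws:s3:::%s/private/${aws:username}/*" % shared_bucket_name,
                    "arn:aws:s3:::%s/shared/*" % shared_bucket_name,
                ],
            },
        ],
    }

    print(json.dumps(policy, indent=2))
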
  3. We can revert to the upstream images for apiserver, persistenceagent etc. Only the ml-pipeline-ui changes from WIP namespace isolation for artifacts and pipeline definitions #7406 must be kept for managing namespaced pipeline definitions in the UI. The pipeline-ui from PR 7406 does not need to be changed, since it uses the apiserver properly according to Gabor's changes in frontend/src/pages/PipelineList.tsx.

  4. Independent of folders vs. buckets, we need to add two small things from my comments in https://docs.google.com/document/d/19OfU-hIsY4xBKA6b_F3dYIbSgrLMrVshucUSsyqotQ4/edit#heading=h.qi0ol2nggxlt for a bit of extra security in the ReadArtifact call

    func (s *RunServer) ReadArtifact(ctx context.Context, request *api.ReadArtifactRequest) (*api.ReadArtifactResponse, error) {

    and the ReportRunMetrics call

    func (s *RunServer) ReportRunMetrics(ctx context.Context, request *api.ReportRunMetricsRequest) (*api.ReportRunMetricsResponse, error) {

    Please take a look at

    func (s *RunServer) canAccessRun(ctx context.Context, runId string, resourceAttributes *authorizationv1.ResourceAttributes) error {
        if common.IsMultiUserMode() == false {
            // Skip authz if not multi-user mode.
            return nil
        }
        if len(runId) > 0 {
            // Resolve the run to fill in the namespace and name
            // if the caller did not provide them.
            runDetail, err := s.resourceManager.GetRun(runId)
            if err != nil {
                return util.Wrap(err, "Failed to authorize with the experiment ID.")
            }
            if len(resourceAttributes.Namespace) == 0 {
                if len(runDetail.Namespace) == 0 {
                    return util.NewInternalServerError(
                        errors.New("Empty namespace"),
                        "The run doesn't have a valid namespace.",
                    )
                }
                resourceAttributes.Namespace = runDetail.Namespace
            }
            if len(resourceAttributes.Name) == 0 {
                resourceAttributes.Name = runDetail.Name
            }
        }
        // Authorize the request against the pipelines RBAC resource for runs.
        resourceAttributes.Group = common.RbacPipelinesGroup
        resourceAttributes.Version = common.RbacPipelinesVersion
        resourceAttributes.Resource = common.RbacResourceTypeRuns
        err := isAuthorized(s.resourceManager, ctx, resourceAttributes)
        if err != nil {
            return util.Wrap(err, "Failed to authorize with API resource references")
        }
        return nil
    }

    and

    func isAuthorized(resourceManager *resource.ResourceManager, ctx context.Context, resourceAttributes *authorizationv1.ResourceAttributes) error {

    We could also wait for the VMware developers to do it.

@juliusvonkohout (Member Author)

Closed in favor of #7725.
