
feat(backend): add namespace & prefix scoped credentials to kfp-launcher config for object store paths #10625

Merged: 8 commits merged into kubeflow:master on Apr 16, 2024

Conversation

@HumairAK (Contributor) commented Mar 27, 2024

Description of your changes:

Resolves most of #9689 and #10651.
The mlpipeline endpoint hardcoding is not addressed here.

This PR extends the kfp-launcher config to support different credentials for s3/minio/gcs across different path prefixes. The config is parsed in the driver and passed to the executor via an MLMD context custom property.

A fleshed-out kfp-launcher ConfigMap can look like this:

kind: ConfigMap
apiVersion: v1
metadata:
  name: kfp-launcher
  namespace: kubeflow
data:
  defaultPipelineRoot: "s3://my-s3-bucket-1/pipelines"
  providers: |
    <provider_config>

where the <provider_config> can look like:

## ==========================
## configs for `s3://` paths
## ==========================
s3:
  ## default configs, if not overridden
  default:
    endpoint: "s3.amazonaws.com"
    region: "us-west-2"
    disableSSL: false
    credentials:
      fromEnv: false
      secretRef:
        secretName: "my-s3-secret-1"
        accessKeyKey: "AWS_ACCESS_KEY_ID"
        secretKeyKey: "AWS_SECRET_ACCESS_KEY"

  ## configs for specific bucket + prefix combinations
  overrides:

    ## configs for `s3://my-s3-bucket-2/SOME/KEY/PREFIX`
    - bucketName: "my-s3-bucket-2"
      keyPrefix: "SOME/KEY/PREFIX"
      #endpoint: "s3.amazonaws.com"
      #region: "us-west-1"
      #disableSSL: false
      credentials:
        #fromEnv: false
        secretRef:
          secretName: "my-s3-secret-2"
          accessKeyKey: "AWS_ACCESS_KEY_ID"
          secretKeyKey: "AWS_SECRET_ACCESS_KEY"

## ==========================
## configs for `minio://` paths
## ==========================
minio:
  ...
  (same structure as `s3`)
  ...

## ==========================
## for `gs://` paths
## ==========================
gs:

  ## default configs, if not overridden
  default:
    credentials:
      fromEnv: false
      secretRef:
        secretName: "my-gcs-secret-1"
        tokenKey: "service_account.json"

  ## configs for specific bucket + prefix combinations
  overrides:

    ## select for `gs://my-gcp-bucket-1/SOME/KEY/PREFIX`
    - bucketName: "my-gcp-bucket-1"
      keyPrefix: "SOME/KEY/PREFIX"
      credentials:
        #fromEnv: false
        secretRef:
          secretName: "my-gcs-secret-2"
          tokenKey: "service_account.json"
    
    ## select for `gs://my-gcp-bucket-1/SOME/OTHER/KEY/PREFIX`
    - bucketName: "my-gcp-bucket-1"
      keyPrefix: "SOME/OTHER/KEY/PREFIX"
      credentials:
        #fromEnv: false
        secretRef:
          secretName: "my-gcs-secret-3"
          tokenKey: "service_account.json"

In this example, a defaultPipelineRoot of 'gs://my-gcp-bucket-1/SOME/OTHER/KEY/PREFIX' would select the first gs override with the matching keyPrefix; in this case that would be:

  ## select for `gs://my-gcp-bucket-1/SOME/OTHER/KEY/PREFIX`
  - bucketName: "my-gcp-bucket-1"
    keyPrefix: "SOME/OTHER/KEY/PREFIX"
    credentials:
      #fromEnv: false
      secretRef:
        secretName: "my-gcs-secret-3"
        tokenKey: "service_account.json"
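For illustration, the selection logic amounts to scanning a provider's overrides in order and taking the first entry whose bucket matches and whose key prefix is a prefix of the pipeline root's key. A minimal Go sketch (the type and function names here are illustrative assumptions, not the PR's actual code):

```go
package objectstore

import "strings"

// BucketOverride is a hypothetical stand-in for one entry under a
// provider's `overrides` list in the kfp-launcher provider config.
type BucketOverride struct {
	BucketName string
	KeyPrefix  string
	// credentials etc. omitted for brevity
}

// matchOverride returns the first override whose bucket name equals the
// pipeline root's bucket and whose KeyPrefix prefixes the root's key,
// or nil if none match (callers then fall back to the provider default).
func matchOverride(overrides []BucketOverride, bucket, key string) *BucketOverride {
	for i := range overrides {
		o := &overrides[i]
		if o.BucketName == bucket && strings.HasPrefix(key, o.KeyPrefix) {
			return o
		}
	}
	return nil
}
```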

See the unit tests in env_test.go for the various edge cases; please feel free to identify others that ought to be covered. The testdata in the test cases is pulled from here.

This config is parsed in the driver; when matching credentials are identified, the information is stored as an MLMD context property (stringified JSON) named store_session_info. Examples of what this property looks like:

For GS buckets:

{
  "Provider": "gs",
  "Params": {
    "fromEnv": "false",
    "secretName": "gcs-secret",
    "tokenKey": "credentials.json"
  }
}

For AWS S3 buckets (MinIO and other S3-compatible stores are identical):

{
  "Provider": "s3",
  "Params": {
    "accessKeyKey": "customaccessKey",
    "disableSSL": "false",
    "endpoint": "s3.amazonaws.com",
    "fromEnv": "false",
    "region": "us-east-2",
    "secretKeyKey": "customsecretKey",
    "secretName": "s3secret"
  }
}

If fromEnv is set to true, then the secret information is omitted from Params.

Doing this in the driver allows us to fetch the ConfigMap once per pipeline execution and then retrieve the session info from the context custom properties.
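As a rough sketch of that round trip (the struct and property names mirror the JSON shown above; the exact types in the PR may differ):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SessionInfo mirrors the store_session_info examples above. Params is an
// unstructured map because each provider needs different keys.
type SessionInfo struct {
	Provider string            `json:"Provider"`
	Params   map[string]string `json:"Params"`
}

func main() {
	// Driver side: marshal and store the string as the MLMD context
	// custom property named "store_session_info".
	info := SessionInfo{
		Provider: "s3",
		Params: map[string]string{
			"fromEnv":    "false",
			"endpoint":   "s3.amazonaws.com",
			"region":     "us-east-2",
			"secretName": "s3secret",
		},
	}
	raw, err := json.Marshal(info)
	if err != nil {
		panic(err)
	}

	// Executor side: read the property back and unmarshal it.
	var parsed SessionInfo
	if err := json.Unmarshal(raw, &parsed); err != nil {
		panic(err)
	}
	fmt.Println(parsed.Provider, parsed.Params["endpoint"])
}
```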

Considerations

  • I've tried to make it so that the original behavior remains unchanged: if providers is never specified in kfp-launcher, then users should not notice a difference, regardless of whether they are using the default minio, gcs, etc.
  • I've separated the object store configuration out from the object store "operational" logic.
  • A new interface was added to handle config parsing for multiple-provider support in the kfp-launcher config (see the sketch after this list). I think a good follow-up would be to also split objectstore.go into a "storemanager" interface that specifies all the blob operations (upload/download/etc.); for s3/minio/gcs this would be the same, but it would be useful when extending support to other store formats, making it easier to add additional providers.
  • The decision to give store_session_info a dynamic "Params" structure was due to different providers having different credential/config requirements; keeping it unstructured gives us a bit more flexibility here.
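As a rough picture of that interface (the name and signature here are assumptions for illustration; see the PR's objectstore package for the real definition):

```go
package objectstore

// SessionInfo mirrors the store_session_info shape described above.
type SessionInfo struct {
	Provider string
	Params   map[string]string
}

// SessionProvider sketches the kind of interface the PR introduces: each
// provider section (s3, minio, gs) of the kfp-launcher config resolves the
// session info for a given bucket and key prefix, falling back to its
// `default` block when no override matches.
type SessionProvider interface {
	ProvideSessionInfo(bucketName, keyPrefix string) (SessionInfo, error)
}
```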


@thesuperzapper (Member) commented Mar 27, 2024

Because any of the configs might need to be overwritten for a specific bucket + prefix combination (including endpoint, region, and credentials), it might make more sense to use a structure like:

defaultPipelineRoot: "s3://my-s3-bucket-1/pipelines"
providers:

  ## ==========================
  ## configs for `s3://` paths
  ## ==========================
  s3:

    ## default configs, if not overridden
    default:
      endpoint: "s3.amazonaws.com"
      region: "us-west-2"
      disableSSL: false
      credentials:
        fromEnv: false
        secretRef:
          secretName: "my-s3-secret-1"
          accessKeyKey: "AWS_ACCESS_KEY_ID"
          secretKeyKey: "AWS_SECRET_ACCESS_KEY"

    ## configs for specific bucket + prefix combinations
    overrides:

      ## configs for `s3://my-s3-bucket-2/SOME/KEY/PREFIX`
      - bucketName: "my-s3-bucket-2"
        keyPrefix: "SOME/KEY/PREFIX"
        #endpoint: "s3.amazonaws.com"
        #region: "us-west-1"
        #disableSSL: false
        credentials:
          #fromEnv: false
          secretRef:
            secretName: "my-s3-secret-2"
            accessKeyKey: "AWS_ACCESS_KEY_ID"
            secretKeyKey: "AWS_SECRET_ACCESS_KEY"

  ## ==========================
  ## configs for `minio://` paths
  ## ==========================
  minio:
    ...
    (same structure as `s3`)
    ...

  ## ==========================
  ## for `gcs://` paths
  ## ==========================
  gcs:

    ## default configs, if not overridden
    default:
      credentials:
        fromEnv: false
        secretRef:
          secretName: "my-gcs-secret-1"
          tokenKey: "service_account.json"

    ## configs for specific bucket + prefix combinations
    overrides:

      ## configs for `gcs://my-gcp-bucket-1/SOME/KEY/PREFIX`
      - bucketName: "my-gcp-bucket-1"
        keyPrefix: "SOME/KEY/PREFIX"
        credentials:
          #fromEnv: false
          secretRef:
            secretName: "my-gcs-secret-2"
            tokenKey: "service_account.json"

@thesuperzapper (Member) commented Mar 27, 2024

Also, I haven't thoroughly tested, but my initial thoughts are:

  • I think you may have broken support for AWS IRSA for S3:
    • IRSA is EKS's Pod ServiceAccount-based authentication for AWS endpoints.
    • Effectively, you just need an option where you run blob.OpenBucket(...) for S3 without providing an access_key + secret_key, so it pulls auth from the environment.
    • The way I usually handle this is with a fromEnv flag which, if true, will ignore the secretRef (see the Go sketch at the end of this list).
  • I think you may have broken the existing support for GCS buckets:
    • Obviously, one can use the GCS XML API with HMAC keys (access_key + secret_key pairs) and pretend that it's a "minio://" endpoint.
    • But previously, the code would open the bucket with blob.OpenBucket(...) for GCS using the native gs:// prefix, which tells the gocloud.dev Go library to use the Application Default Credentials to retrieve GCP access credentials.
    • It is also important to note that GCP does not use access_key + secret_key pairs for authentication, so the structure of the GCS config should be different.
  • If we allow multiple buckets of each "type" (s3, etc.), the front-end will also need its logic improved, so it can select from multiple credentials (when previewing the contents of artifacts in the runs page):
    • The main code which does this is here:
      export function getArtifactsHandler({
        artifactsConfigs,
        useParameter,
        tryExtract,
      }: {
        artifactsConfigs: {
          aws: AWSConfigs;
          http: HttpConfigs;
          minio: MinioConfigs;
          allowedDomain: string;
        };
        tryExtract: boolean;
        useParameter: boolean;
      }): Handler {
        const { aws, http, minio, allowedDomain } = artifactsConfigs;
        return async (req, res) => {
          const source = useParameter ? req.params.source : req.query.source;
          const bucket = useParameter ? req.params.bucket : req.query.bucket;
          const key = useParameter ? req.params[0] : req.query.key;
          const { peek = 0 } = req.query as Partial<ArtifactsQueryStrings>;
          if (!source) {
            res.status(500).send('Storage source is missing from artifact request');
            return;
          }
          if (!bucket) {
            res.status(500).send('Storage bucket is missing from artifact request');
            return;
          }
          if (!key) {
            res.status(500).send('Storage key is missing from artifact request');
            return;
          }
          console.log(`Getting storage artifact at: ${source}: ${bucket}/${key}`);
          switch (source) {
            case 'gcs':
              getGCSArtifactHandler({ bucket, key }, peek)(req, res);
              break;
            case 'minio':
              getMinioArtifactHandler(
                {
                  bucket,
                  client: new MinioClient(minio),
                  key,
                  tryExtract,
                },
                peek,
              )(req, res);
              break;
            case 's3':
              getMinioArtifactHandler(
                {
                  bucket,
                  client: await createMinioClient(aws),
                  key,
                },
                peek,
              )(req, res);
              break;
            case 'http':
            case 'https':
              getHttpArtifactsHandler(
                allowedDomain,
                getHttpUrl(source, http.baseUrl || '', bucket, key),
                http.auth,
                peek,
              )(req, res);
              break;
            case 'volume':
              await getVolumeArtifactsHandler(
                {
                  bucket,
                  key,
                },
                peek,
              )(req, res);
              break;
            default:
              res.status(500).send('Unknown storage source');
              return;
          }
        };
      }
    • Note, the user makes requests to the ml-pipeline-ui pods in the kubeflow namespace, but their requests are actually proxied to the ml-pipeline-ui-artifacts Pod in the profile namespace selected by the ?namespace= query parameter, by this code:
      export function getArtifactsProxyHandler({
        enabled,
        allowedDomain,
        namespacedServiceGetter,
      }: {
        enabled: boolean;
        allowedDomain: string;
        namespacedServiceGetter: NamespacedServiceGetter;
      }): Handler {
        if (!enabled) {
          return (req, res, next) => next();
        }
        return proxy(
          (_pathname, req) => {
            // only proxy requests with namespace query parameter
            return !!getNamespaceFromUrl(req.url || '');
          },
          {
            changeOrigin: true,
            onProxyReq: proxyReq => {
              console.log('Proxied artifact request: ', proxyReq.path);
            },
            pathRewrite: (pathStr, req) => {
              const url = new URL(pathStr || '', DUMMY_BASE_PATH);
              url.searchParams.delete(QUERIES.NAMESPACE);
              return url.pathname + url.search;
            },
            router: req => {
              const namespace = getNamespaceFromUrl(req.url || '');
              if (!namespace) {
                console.log(`namespace query param expected in ${req.url}.`);
                throw new Error(`namespace query param expected.`);
              }
              const urlStr = namespacedServiceGetter(namespace!);
              if (!isAllowedDomain(urlStr, allowedDomain)) {
                console.log(`Domain is not allowed.`);
                throw new Error(`Domain is not allowed.`);
              }
              return namespacedServiceGetter(namespace!);
            },
            target: '/artifacts',
            headers: HACK_FIX_HPM_PARTIAL_RESPONSE_HEADERS,
          },
        );
      }
    • That is, the user makes requests to either of the following paths when browsing the artifacts of past runs (it's a bit strange, but sometimes it's "path-based" and other times it's "query-based"):
      • https://kubeflow.example.com/pipeline/artifacts/{s3,minio}/{bucket_name}/{object_key}?namespace={profile}
      • https://kubeflow.example.com/pipeline/artifacts/get?source={s3,minio}&namespace={profile}&bucket={bucket_name}&key={object_key}
    • Note, because the frontend uses the minio-js package (which only supports S3 object stores), artifacts on gs:// don't currently work for the previews.
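To make the fromEnv point above concrete, here is a minimal Go sketch of opening an S3 bucket either with explicit keys or with the ambient environment auth (e.g. IRSA). The helper and its signature are illustrative assumptions, not KFP's actual code:

```go
package main

import (
	"context"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"gocloud.dev/blob"
	"gocloud.dev/blob/s3blob"
)

// openS3Bucket is a hypothetical helper. When fromEnv is true, no static
// credentials are set, so the AWS SDK falls back to the environment's
// credential chain (env vars, shared config, or IRSA web identity).
func openS3Bucket(ctx context.Context, bucket, region, accessKey, secretKey string, fromEnv bool) (*blob.Bucket, error) {
	cfg := aws.NewConfig().WithRegion(region)
	if !fromEnv {
		cfg = cfg.WithCredentials(credentials.NewStaticCredentials(accessKey, secretKey, ""))
	}
	sess, err := session.NewSession(cfg)
	if err != nil {
		return nil, err
	}
	return s3blob.OpenBucket(ctx, sess, bucket, nil)
}

func main() {
	// e.g. under IRSA, open the bucket with no static keys at all:
	_, _ = openS3Bucket(context.Background(), "my-bucket", "us-west-2", "", "", true)
}
```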

@HumairAK (Contributor, Author) commented Mar 27, 2024

Thanks @thesuperzapper for the feedback.
May I suggest inline comments, so we can have separate discussion threads on the different points being made here? Lest it get confusing.

But to quickly address some of your points:

> it might make more sense to use a structure like:

Yeah, this is one of the considerations listed above. I support this idea; I'll make the amendments.

> I think you may have broken support for AWS IRSA for S3:

In both instances being pointed out, if there are no providers then it should default to the old behavior of calling OpenBucket without any creds being passed in, so it shouldn't break anything. If it does, please point it out and I'll address it.

However, it seems there is indeed no support for IRSA with the new extension to the config. Your fromEnv suggestion sounds good to me, so I'll go ahead and try that out.

> I think you may have broken the existing support for GCS buckets:

Similar to the above, I think the default behavior should work. But the config option is clearly not configured with the right fields (I'll have to do my research on GCS here; admittedly I'm not as familiar with this).

In the community call, @zijianjoy also suggested separating out the configs and catering them specifically to each provider (for example, region for minio here doesn't make much sense). Will update this.

> front-end will also need to have its logic improved

I'll take a look at this as well, thanks!

@thesuperzapper (Member) commented

@HumairAK MinIO actually does need to allow region, as you can have MINIO_SITE_REGION set on your MinIO instance.

But an even more important reason is that the minio:// one will probably be used as a catch-all for s3-compatible object stores which may do different things with region.

@HumairAK (Contributor, Author) commented Mar 28, 2024

@thesuperzapper

> @HumairAK MinIO actually does need to allow region, as you can have MINIO_SITE_REGION set on your MinIO instance.

Interesting, I did not know this; I'll include it as optional for minio then.

> But an even more important reason is that the minio:// one will probably be used as a catch-all for s3-compatible object stores which may do different things with region.

I would have expected the s3 configuration to be used for any s3-compatible object store, and not be specifically tied to AWS.

@HumairAK (Contributor, Author) commented Apr 3, 2024

/test kubeflow-pipelines-samples-v2

@rimolive (Member) commented Apr 9, 2024

/lgtm

@rimolive (Member) commented Apr 9, 2024

/assign @Tomcli @chensun

@HumairAK (Contributor, Author) commented Apr 11, 2024

@rimolive this PR is currently missing the UI component, so it's not ready for merge (unless we want to do that in a follow-up PR).

Until then, I've kept the WIP prefix.

@rimolive (Member) commented

@HumairAK But the PR is a backend feature, right? For the frontend side, you should have a second PR with the feat(frontend) prefix.

This looks good for a backend PR, so I don't see a problem with merging it.

@HumairAK changed the title from "WIP: feat(backend): add namespace scoped credentials to kfp-launcher config for object store paths" to "feat(backend): add namespace & prefix scoped credentials to kfp-launcher config for object store paths" on Apr 11, 2024
@Tomcli (Member) left a comment

/lgtm
Just some minor comments on the naming convention. Otherwise it looks good to me. From the e2e test, I think it didn't break any of the existing frontend features, right?

Review threads on backend/src/v2/metadata/client.go (outdated) and backend/src/v2/objectstore/object_store.go, both resolved.
@thesuperzapper (Member) commented

@HumairAK I just want to confirm whether you are actually ready to have this PR reviewed.

Are there any remaining tasks from #10625 (comment) or our other discussions?

@HumairAK (Contributor, Author) commented Apr 11, 2024

@thesuperzapper the backend portion is ready for review. Based on the discussion above, I will follow up with a different PR to provide the frontend support for this.

It should cover everything in your comment, save for the UI bit.

@thesuperzapper (Member) commented Apr 11, 2024

@HumairAK I want to make sure we don't break the "query parameter" approach introduced by #10319, which allows bucket roots to look like this:

s3://{bucket_name}/v2/artifacts/{profile_name}?region={bucket_region}&endpoint={endpoint}&disableSSL={not_use_ssl}&s3ForcePathStyle=true

(See https://gocloud.dev/howto/blob/#s3-compatible for the syntax)

I am not sure what the best way to deal with this is. Perhaps we should just ignore pipeline roots that have query parameters set, and revert to using blob.OpenBucket(ctx, config.bucketURL()) in ./backend/src/v2/objectstore/object_store.go for those cases.

Also, the query parameter approach right now only works for s3://. I was planning to translate minio:// bucket roots (which have query parameters) to s3:// internally before running blob.OpenBucket(ctx, config.bucketURL()). This will allow KFP to correctly identify the bucket as MinIO, so the frontend doesn't try to reach out to s3:// just because the user was forced to use it to set the endpoint query parameter.

We might as well do that in this PR also.
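For illustration, that translation could be as simple as rewriting the scheme before handing the URL to gocloud.dev's URL opener. A sketch, assuming the query parameters already follow the gocloud.dev s3blob syntax (the helper name is hypothetical):

```go
package main

import (
	"context"
	"fmt"
	"net/url"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/s3blob" // registers the s3:// URL opener
)

// openAsS3 rewrites a minio:// bucket root to s3:// before calling
// blob.OpenBucket, so gocloud.dev treats it as an S3-compatible store while
// the rest of KFP can keep tracking the original minio scheme.
func openAsS3(ctx context.Context, root string) (*blob.Bucket, error) {
	u, err := url.Parse(root)
	if err != nil {
		return nil, err
	}
	if u.Scheme == "minio" {
		u.Scheme = "s3"
	}
	return blob.OpenBucket(ctx, u.String())
}

func main() {
	root := "minio://mlpipeline/v2/artifacts?endpoint=minio-service.kubeflow:9000&disableSSL=true&s3ForcePathStyle=true"
	bkt, err := openAsS3(context.Background(), root)
	if err != nil {
		fmt.Println("open failed (expected without a live endpoint):", err)
		return
	}
	defer bkt.Close()
}
```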

@HumairAK (Contributor, Author) commented Apr 12, 2024

@thesuperzapper
I made some additional fixes for handling query parameters, and for the default cases where there is no provider config and the standard kfp-launcher is used with only a defaultPipelineRoot set.

Tested query parameters with MinIO and AWS S3, and it worked for me. Also tested with GCS buckets, with fromEnv = true and GOOGLE_APPLICATION_CREDENTIALS set via the SDK (as well as with creds configured via the launcher config).

You can easily test this PR by building the launcher/driver and setting V2_LAUNCHER_IMAGE and V2_DRIVER_IMAGE in the KFP server pod, respectively. (I'm also on the Kubeflow & CNCF Slack if you need faster comms.)

Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
Instead of only reading the kfp-launcher config when a custom pipeline root is specified, the root DAG will now always read the kfp-launcher config to search for a matching bucket, if such a configuration is provided in kfp-launcher.

Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
Provides a structured configuration for bucket providers, whereby users can specify credentials for different providers and path prefixes. A new interface for providing sessions is introduced, which should be implemented for any new provider configuration support.

Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
Utilizes blob provider specific constructors to open s3, minio, gcs accordingly. If a sessioninfo is provided (via kfp-launcher config) then the associated secret is fetched for each case to gain credentials. If fromEnv is provided, then the standard url opener is used. Also separates out config fields and operations to a separate file.

Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
retrieves the session info (if provided via kfp-launcher) and utilizes it for opening the provider's associated bucket

Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
also added some additional code comments clarifying store cred variable usage

Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
as well as update validation logic for provider config, and fix tests
accordingly.

Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>

@HumairAK: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| kubeflow-pipelines-samples-v2 | b5d38ce | link | false | /test kubeflow-pipelines-samples-v2 |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@rimolive (Member) commented

/lgtm

cc @chensun

@google-oss-prow bot added the lgtm label Apr 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chensun, Tomcli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow bot merged commit 5e0f9b1 into kubeflow:master Apr 16, 2024
5 of 6 checks passed
@sanchesoon commented

Hi all,
What do you think about adding S3ForcePathStyle as a config field, like region and the others?

@HumairAK (Contributor, Author) commented May 9, 2024

@sanchesoon do you mean as a field for the kfp-launcher? Can you raise another issue for that?

@sanchesoon commented May 9, 2024

@HumairAK yes, you are right.
I have created #10814.
BTW, thanks a lot for your work on this feature; the kfp-launcher config params are very helpful.
Also, can this feature (#10814) be picked up?

@HumairAK (Contributor, Author) commented

No problem, and thanks for the feedback!
I have taken the issue; the change should be fairly straightforward. Once I get some bandwidth, I'll add this.
