Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate clickable artifact url for s3 URI #3531

Merged
merged 4 commits into from
May 31, 2020

Conversation

Jeffwan
Copy link
Member

@Jeffwan Jeffwan commented Apr 17, 2020

This is a follow up PR of #3530, resolves some issues of #3405

link

We need to have this PR to generate clickable artifact url for s3 URI.

It's working now. Click link will automatically download file. (we may consider to redirect user to AWS S3 console later, this is good enough now.)

image

Signed-off-by: Jiaxin Shan seedjeffwan@gmail.com

/cc @eterna2 @Ark-kun @Bobgy @gautamkmr

@kubeflow-bot
Copy link

This change is Reviewable

@Ark-kun
Copy link
Contributor

Ark-kun commented Apr 17, 2020

Is there a standard S3 URI format that includes endpoint? It seems to be lost when translating the whole Argo S3 artifact structure to a single URI.

@Ark-kun
Copy link
Contributor

Ark-kun commented Apr 17, 2020

What is we always use s3://<endpoint>/bucket/key format in both Metadata Writer and the UX? (or 's3_with_endpoint://...').
This way we can retain all data in the single URI for Minio, GCS and AWS.

@Bobgy
Copy link
Contributor

Bobgy commented Apr 17, 2020

Despite the question from @Ark-kun, this looks good to me
/retest
/lgtm

@k8s-ci-robot
Copy link
Contributor

@Bobgy: The /retest command does not accept any targets.
The following commands are available to trigger jobs:

  • /test kubeflow-pipeline-frontend-test
  • /test kubeflow-pipeline-backend-test
  • /test kubeflow-pipeline-e2e-test
  • /test kubeflow-pipeline-sample-test
  • /test kubeflow-pipeline-integration-test
  • /test kubeflow-pipeline-mkp-test
  • /test kubeflow-pipeline-upgrade-test

Use /test all to run all jobs.

In response to this:

Despite the question from @Ark-kun, this looks good to me
/retest
/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@eterna2
Copy link
Contributor

eterna2 commented Apr 17, 2020

@Ark-kun

I think we might need to discuss this.

Not sure about metadata, but for frontend, the uri format has always been

  • storage://bucket/folder/obj

No endpoint info encoded inside, as we assume the endpoints corresponds to the respective storage medium configured in the frontend node server.

Currently, the uri is just a visual text, because the actual link is an API call with the query strings, storage, bucket, and key.

@Ark-kun
Copy link
Contributor

Ark-kun commented Apr 17, 2020

storage, bucket, and key.

Is this enough to refer to a specific object? I thought that with Amazon S3, every bucket lives in a specific regions and there are region-specific endpoints (e.g. s3.us-east-1.amazonaws.com or s3.eu-west-3.amazonaws.com) as opposed to Google's single S3 endpoint storage.googleapis.com. Currently Amazon still supports the legacy global endpoint s3.amazonaws.com, but they have plans to deprecate it.

Is this correct?

@Jeffwan
Copy link
Member Author

Jeffwan commented Apr 17, 2020

What is we always use s3:///bucket/key format in both Metadata Writer and the UX? (or 's3_with_endpoint://...').

@Ark-kun

I think that would be perfect if minio client supports this. Since user will fill endpoint information anyway, we don't need extra change to support different backends. Do you have any reference on this path support? s3://<endpoint>/bucket/key This doesn't seem a valid format in native S3. Currently, these are supported. There're some other

  • Legacy Global Endpoint s3.amazonaws.com
  • Regional s3.<region>.amazonaws.com
  • Virtual Hosted Style Access https://bucket-name.s3.Region.amazonaws.com/key name
  • Path-Style Access https://s3.Region.amazonaws.com/bucket-name/key name

Yes, s3.amazonaws.com will be deprecated sometime.

storage, bucket, and key.
Is this enough to refer to a specific object?

I think yes. S3 bucket is region specific but bucket names are globally unique.

@Jeffwan
Copy link
Member Author

Jeffwan commented Apr 17, 2020

@Bobgy Seems test is kind of flaky, it complains version issues. Do you know the problem? Is it because some dependencies don't pin to a specific version? Any thing I can do to resolve them? I see similar issues in other PR as well

@Ark-kun
Copy link
Contributor

Ark-kun commented Apr 17, 2020

@Bobgy Seems test is kind of flaky, it complains version issues. Do you know the problem? Is it because some dependencies don't pin to a specific version? Any thing I can do to resolve them? I see similar issues in other PR as well

Python's "ERROR" messages about versions are not really errors.

The are some problems with presubmits in the last couple of days not related to this PR.

@Ark-kun
Copy link
Contributor

Ark-kun commented Apr 17, 2020

Here is the full structure of Argo's S3 artifact locations.

        s3:
          endpoint: s3.<region>.amazonaws.com
          bucket: my-bucket
          key: path/in/bucket/hello_world.txt.tgz
          accessKeySecret:
            name: my-s3-credentials
            key: accessKey
          secretKeySecret:
            name: my-s3-credentials
            key: secretKey

We need to make sure we're not losing any important information when converting to URI.

Can we have a situation where different runs have artifacts in different AWS regions (or even in GCS)? Do we need to preserve that information?

@goswamig
Copy link
Contributor

@Ark-kun

Can we have a situation where different runs have artifacts in different AWS regions

We specify the region in config-map workflow-controller-configmap and ml-pipeline-config (not yet merged), so unless until you are changing these configmap different run will point to same region.

@Ark-kun
Copy link
Contributor

Ark-kun commented Apr 17, 2020

@Ark-kun

Can we have a situation where different runs have artifacts in different AWS regions

We specify the region in config-map workflow-controller-configmap and ml-pipeline-config (not yet merged), so unless until you are changing these configmap different run will point to same region.

Does this mean that if the configmap is ever changed, all previously created artifacts become broken?

Copy link
Contributor

@goswamig goswamig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@Jeffwan
Copy link
Member Author

Jeffwan commented Apr 22, 2020

/retest

@Jeffwan Jeffwan force-pushed the generate_s3_artifact_url branch from 12c1360 to 8a4c073 Compare May 1, 2020 21:36
@k8s-ci-robot k8s-ci-robot removed the lgtm label May 1, 2020
@Bobgy
Copy link
Contributor

Bobgy commented May 5, 2020

@Jeffwan Can you fix the unit tests?

@Jeffwan Jeffwan force-pushed the generate_s3_artifact_url branch from 8a4c073 to ee7a47e Compare May 8, 2020 05:07
@Jeffwan
Copy link
Member Author

Jeffwan commented May 8, 2020

Checking formatting...
src/components/ArtifactLink.tsx
src/lib/Utils.test.ts
Code style issues found in the above file(s). Forgot to run Prettier?

Looks like it's a formatter issue. I use prettier@1.19.1 which is the same version as CI to format the code.

@Jeffwan Jeffwan force-pushed the generate_s3_artifact_url branch from 9a1708e to edaa980 Compare May 8, 2020 11:17
@Jeffwan
Copy link
Member Author

Jeffwan commented May 8, 2020

@Bobgy I fix the issue in this PR and please have another check

@Ark-kun
Copy link
Contributor

Ark-kun commented May 9, 2020

Please check out #3725

@Jeffwan Jeffwan force-pushed the generate_s3_artifact_url branch from edaa980 to c952943 Compare May 28, 2020 00:01
@Ark-kun
Copy link
Contributor

Ark-kun commented May 28, 2020

/lgtm

@Bobgy
Copy link
Contributor

Bobgy commented May 28, 2020

I didn't take much attention on the data discussion, if you already reached a conclusion. I have no problem with this PR.
/lgtm
/approve
/hold
for test fixes
/retest

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Bobgy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Ark-kun
Copy link
Contributor

Ark-kun commented May 31, 2020

All tests now pass.
/hold cancel

@k8s-ci-robot k8s-ci-robot merged commit 3cae116 into kubeflow:master May 31, 2020
@Jeffwan Jeffwan deleted the generate_s3_artifact_url branch May 31, 2020 21:43
RedbackThomson pushed a commit to RedbackThomson/pipelines that referenced this pull request Jun 17, 2020
* Generate clickable artifact url for s3 URI

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>

* Format code using prettier@1.19.1

* Fix unit test failure

* Use encoded string in bucket url
Jeffwan added a commit to Jeffwan/pipelines that referenced this pull request Dec 9, 2020
* Generate clickable artifact url for s3 URI

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>

* Format code using prettier@1.19.1

* Fix unit test failure

* Use encoded string in bucket url
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants