Skip to content

Clean up AWS credentials #2592

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 18, 2024
Merged

Clean up AWS credentials #2592

merged 4 commits into from
Jan 18, 2024

Conversation

huydhn
Copy link
Contributor

@huydhn huydhn commented Jan 14, 2024

(Replace #2589)

The credential has been revoked to be replaced by OIDC pytorch/test-infra#4865

@gs-olive
Copy link
Collaborator

Hi @huydhn - it seems that all of the tests are failing with Error: Unable to find an artifact with the name: pytorch_tensorrt__3.8__, as here. I noticed that in earlier CI runs (example), the name of the artifact was named pytorch_tensorrt__3.8 - do you know of any recent test-infra PRs which would have changed the artifact naming scheme?

@huydhn
Copy link
Contributor Author

huydhn commented Jan 16, 2024

Hi @huydhn - it seems that all of the tests are failing with Error: Unable to find an artifact with the name: pytorch_tensorrt__3.8__, as here. I noticed that in earlier CI runs (example), the name of the artifact was named pytorch_tensorrt__3.8 - do you know of any recent test-infra PRs which would have changed the artifact naming scheme?

I think I know the reason, pytorch/test-infra#4877 renames the artifacts so that it can be uniquely identified. Sorry for missing this and let me push a fix.

@huydhn huydhn requested a review from gs-olive January 16, 2024 22:49
@gs-olive
Copy link
Collaborator

gs-olive commented Jan 16, 2024

Hi - thanks for the commit. It seems the same error appears, unfortunately. Another thing I noticed is that in the older CI runs, there are only two jobs within the build-package step, as here, but now there are four, as here. It looks like the artifact name, as below, is not correct:

uses: actions/download-artifact@v3
with:
name: ${{ env.ARTIFACT_NAME }}
path: /opt/torch-tensorrt-builds/

@huydhn
Copy link
Contributor Author

huydhn commented Jan 16, 2024

Well, this is a bit tricky to test this on PR because the job is setup to use pytorch/tensorrt/.github/workflows/linux-test.yml@main from main, not from my PR, i.e. https://github.com/pytorch/TensorRT/blob/main/.github/workflows/build-test.yml#L67, so the change here won't be reflected till it's landed. It's a chicken-and-egg issue.

I could change that for testing though, once the CI signals are green, I can switch back to main before landing the change.

@huydhn
Copy link
Contributor Author

huydhn commented Jan 16, 2024

but now there are four, as here

This is expected because the job has been broken up into 2 parts, building and uploading artifacts. The latter has access to the credentials and its access is protected.

@huydhn
Copy link
Contributor Author

huydhn commented Jan 17, 2024

@gs-olive The test round seems to work correctly https://github.com/pytorch/TensorRT/actions/runs/7548790822, so landing this should be safe (I will switch the linux-test.yml back to @main before landing).

I'm rerunning some failures there https://github.com/pytorch/TensorRT/actions/runs/7548790822/job/20552955754 just to be sure, but they look like pre-existing failures and unrelated to this change?

@gs-olive
Copy link
Collaborator

This looks good! The test failures are unrelated to this PR.

Copy link
Collaborator

@gs-olive gs-olive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, pending the switch back to main on the workflow.

@gs-olive
Copy link
Collaborator

Hi @huydhn - I am noticing the following credential error on our Docker workflow: CI link. This workflow uses the login action as here:

registry: ${{ env.DOCKER_REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

Does this also need to be updated in light of the new authentication changes?

@huydhn
Copy link
Contributor Author

huydhn commented Jan 23, 2024

I think you're right that an update is need to grant the job the permission to push to ghcr.io. From https://docs.github.com/en/actions/using-jobs/assigning-permissions-to-jobs, I think we need to add the follow snippet to the job:

permissions:
  packages: write

@gs-olive
Copy link
Collaborator

Ok thank you for the suggestion - just added it here: #2619

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants