
Adaptive short-lived token expiration handling #307

Open
webknjaz opened this issue Nov 30, 2024 · 12 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@webknjaz (Member)

So it happened that a short-lived token in a yarl release started expiring before all the uploads managed to complete.

This resulted in having to yank two releases: https://pypi.org/project/yarl/1.18.2/#history

The last failing run started uploading but didn't complete within 15 minutes, eventually hitting the token expiration: https://github.com/aio-libs/yarl/actions/runs/12091091291/job/33719944175#step:9:729

The last successful upload had the step with this action running for 4 minutes: https://github.com/aio-libs/yarl/actions/runs/11954909206/job/33329106474#step:9:768

@woodruffw I don't know what exactly caused the slowness, but I think we should maybe re-request that token every couple of dists when the cumulative size of the uploads is large. I suppose that adding the attestations doubles the number of uploads, which might also affect the overall upload time.
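
A minimal sketch of the kind of refresh loop being proposed here, assuming batching by cumulative size; the helper names (`upload_with_refresh`, `mint_token`) and the 50 MiB threshold are illustrative, not how the action currently works:

```python
# Hypothetical sketch only: split the dists into batches by cumulative size
# and re-mint the short-lived token before each batch.  `mint_token` stands
# in for the action's existing OIDC exchange and is supplied by the caller.
import os
import pathlib
import subprocess
from typing import Callable

MAX_BATCH_BYTES = 50 * 1024 * 1024  # refresh threshold; the exact number is a guess


def upload_with_refresh(dist_dir: str, mint_token: Callable[[], str]) -> None:
    batch: list[pathlib.Path] = []
    batch_bytes = 0
    for dist in sorted(pathlib.Path(dist_dir).iterdir()):
        batch.append(dist)
        batch_bytes += dist.stat().st_size
        if batch_bytes >= MAX_BATCH_BYTES:
            _twine_upload(batch, mint_token())
            batch, batch_bytes = [], 0
    if batch:
        _twine_upload(batch, mint_token())


def _twine_upload(batch: list[pathlib.Path], token: str) -> None:
    # Each twine invocation gets a freshly minted token via the environment.
    subprocess.run(
        ["twine", "upload", "--non-interactive", *map(str, batch)],
        env={**os.environ, "TWINE_USERNAME": "__token__", "TWINE_PASSWORD": token},
        check=True,
    )
```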

@webknjaz added the bug (Something isn't working), enhancement (New feature or request), and help wanted (Extra attention is needed) labels on Nov 30, 2024
@webknjaz moved this to 🤔 Brain 🧠 dump 💡 in 📅 Procrastinating in public on Nov 30, 2024
@woodruffw (Member)

Huh, this is interesting -- FWICT all of yarl's distributions appear to be pretty small (all are under 1MB?), so I can't imagine why it would take more than a minute or two to upload them. Maybe GHA itself had a network hiccup?

(The attestations add to that size, but each is ~5KB so that should mostly be a rounding error.)

Re-requesting the token when the cumulative size is large makes sense to me; another option would be to do it on a timer, i.e. re-request if any given dist takes more than ~7 minutes to upload. I'll think a bit about how to rearchitect things here to enable that (maybe calling twine multiple times rather than one-shot?)

@webknjaz (Member, Author)

Yeah, I was thinking about calling twine multiple times and doing the OIDC dance before each call. Since Twine doesn't have a public API, we can't really integrate more granularly...
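
For reference, the OIDC "dance" on a GitHub Actions runner looks roughly like the sketch below, assuming `permissions: id-token: write`. The endpoint and field names are written from memory, so treat them as assumptions rather than the action's exact implementation:

```python
# Rough sketch of minting a short-lived PyPI token from the runner's OIDC
# identity; not the action's actual code, and field names may need checking.
import os

import requests


def mint_pypi_token() -> str:
    # 1. Ask the runner for an OIDC JWT scoped to the PyPI audience.
    resp = requests.get(
        f"{os.environ['ACTIONS_ID_TOKEN_REQUEST_URL']}&audience=pypi",
        headers={
            "Authorization": f"bearer {os.environ['ACTIONS_ID_TOKEN_REQUEST_TOKEN']}",
        },
        timeout=30,
    )
    resp.raise_for_status()
    oidc_jwt = resp.json()["value"]

    # 2. Exchange the JWT for a short-lived (~15-minute) PyPI API token.
    resp = requests.post(
        "https://pypi.org/_/oidc/mint-token",
        json={"token": oidc_jwt},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["token"]
```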

@woodruffw (Member)

True -- this might also be the right time for me to revisit pypa/twine#999, since the technical blockers for that on PyPI's side are now resolved. Having the OIDC exchange baked directly into twine could potentially make it easier to refresh the token automatically.

@webknjaz (Member, Author)

Maybe… However, given the release responsiveness over there, I can see that causing more problems.

@woodruffw (Member)

> Maybe… However, given the release responsiveness over there, I can see that causing more problems.

I'm one of the maintainers now, so it should be a bit faster 😅 -- but I can appreciate not wanting it to become a blocker. We already have all of the code here in the action, so I can certainly start by looking at things directly here.

@QuLogic commented Nov 30, 2024

I just hit this in Matplotlib (https://github.com/matplotlib/matplotlib/actions/runs/12095099184/job/33728735029). I don't know why the upload is so slow; when I was uploading the files myself, before Trusted Publishing became a thing, it took half the time for all of them.

Four wheels (for the lesser-used PyPy, at least) and the sdist are missing. Is there anything to be done other than yanking the release? From the log, it appears that the action generated the attestations before starting the upload, but I don't know whether there's some component on the runner that needs to be uploaded to PyPI as well, or whether I can just upload the wheels myself anyway.

I'm not sure how many people are relying on attestations, since this is the first release to have them. The files are also attested on GitHub, since we ran that step before the PyPI upload (though I know people have suggested reversing these operations).

@woodruffw (Member)

> From the log, it appears that the action generated the attestations before starting the upload, but I don't know whether there's some component on the runner that needs to be uploaded to PyPI as well, or whether I can just upload the wheels myself anyway.

You can upload the rest manually, but they won't have attestations; that means your release will only be partially attested.

Another option would be to re-run your CI for the same release, but pass skip-existing: true. That should gracefully handle the files that have already been uploaded and include attestations for the remaining files.
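
A minimal workflow sketch of that re-run approach; the job layout, environment, and artifact name are illustrative, and `skip-existing: true` is the relevant input:

```yaml
publish:
  runs-on: ubuntu-latest
  environment: pypi  # assumed environment name
  permissions:
    id-token: write  # required for trusted publishing
  steps:
    - uses: actions/download-artifact@v4
      with:
        name: dist  # hypothetical artifact holding the already-built dists
        path: dist/
    - uses: pypa/gh-action-pypi-publish@release/v1
      with:
        skip-existing: true  # tolerate files that already made it to PyPI
```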

(I'm sorry for the pain here -- I'm still not 100% sure why CI jobs are suddenly exceeding the 15 minute token window. I'll be able to look into it more today.)

@QuLogic commented Dec 1, 2024

> You can upload the rest manually, but they won't have attestations; that means your release will only be partially attested.

OK, I've done so, and will just note the missing attestations in the release announcement.

Note that when I first tried, I got a 502 Bad Request error with twine 4.0.2. When I realized twine was outdated, I installed the newest version and tried again. This time the upload did work, but there was a pause of about 10-15 seconds after each file, and one of them got a 502 response, which twine claimed to retry. I then re-ran it and all 5 missing items claimed to upload again; checking on PyPI, they really are there now.

Anyway, I'm not trying to report a bug against twine or warehouse here, just noting that the pause after each uploaded file and the sporadic 502 responses may be contributing to the slow uploads.

@woodruffw (Member)

Thanks @QuLogic, that's helpful context. This does sound like it might in fact be a PyPI backend issue -- twine shouldn't really be getting 502s if the package/upload is otherwise well formed.

Out of curiosity, do you know what metadata version you're producing? I'm wondering if the recent metadata 2.3/2.4 problems with twine might also be related here.

@QuLogic commented Dec 3, 2024

No, I believe it's version 2.1.

@webknjaz (Member, Author) commented Dec 5, 2024

@woodruffw FYI: whatever was causing the slow uploads to PyPI stopped causing problems on Sunday, and a few releases within aio-libs have been successful since.

Still, I wouldn't abandon the idea of a per-dist token refresh, so the issue remains open.

@webknjaz removed the bug (Something isn't working) label on Dec 10, 2024
@QuLogic commented Dec 13, 2024

Our latest release also worked today: https://github.com/matplotlib/matplotlib/actions/runs/12309652164/job/34358661026
