w/ GitLab Registry: "invalid status code from registry 404" #17999
Comments
Relevant quote from @mtrmac in the discussion:
I think it would also be useful to be quite loud about it (warning-level logs at least).
I'm planning on raising a matching issue on GitLab's issue tracker when I'm back in the office next week to get their input. It's worth noting, though, that site-wide on GitLab, any time there's an auth / permission restriction (e.g. trying to view a private project you don't have access to), the server responds with 404. I think this is a valid response choice from a security perspective, in that the server can't "leak" the existence of a resource you don't have permission to view. That might be why we get a 404 here, if it's some kind of rate limiting on the auth endpoint?
I've raised https://gitlab.com/gitlab-org/gitlab/-/issues/404326 to request some feedback from the GitLab side.
That concept of pretending something doesn’t exist is fine, but the server still needs to follow the other constraints of the protocol. E.g., that other ticket shows that the server returns a 404 on … What is the client to do with that information? We could, reasonably, and consistently with that concept, set up this operation to treat 404 (“does not exist”) the same as 403 (unauthorized), but 403 is explicitly one of the results where we don’t retry, because retrying wouldn’t help and could hurt (by locking the user out after repeated authentication failures). If the server is encountering some kind of outage / downtime / overload, I think it should indicate an outage / downtime / overload; that reveals nothing about the authentication rules to the client, but it allows the client to make a sensible retry decision.
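To make the retry argument concrete, here is a minimal Go sketch of a client-side policy that refuses to retry on 401/403 (and on a 404 treated the same as 403) but does retry on status codes that signal overload or an outage. The function name shouldRetryTokenRequest and the exact set of codes are assumptions for illustration, not the actual behavior of containers/image.

```go
package main

import (
	"fmt"
	"net/http"
)

// shouldRetryTokenRequest is a hypothetical helper illustrating the policy
// discussed above; it is not the containers/image implementation.
func shouldRetryTokenRequest(status int) bool {
	switch status {
	case http.StatusUnauthorized, http.StatusForbidden:
		// Retrying rejected credentials cannot help and may lock the user out.
		return false
	case http.StatusNotFound:
		// A 404 may deliberately hide "not allowed"; treat it like a 403.
		return false
	case http.StatusTooManyRequests,
		http.StatusBadGateway,
		http.StatusServiceUnavailable,
		http.StatusGatewayTimeout:
		// Explicit overload / outage signals: a retry (with backoff) is reasonable.
		return true
	default:
		// Other 5xx responses: treat as server-side trouble and retry.
		return status >= 500
	}
}

func main() {
	for _, s := range []int{403, 404, 429, 503} {
		fmt.Printf("%d retry=%v\n", s, shouldRetryTokenRequest(s))
	}
}
```

The point of the sketch is that the client can only make a good decision if the server picks a status code from the "retryable" group when it is merely overloaded.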
I don't think I have seen this error again after reporting it, although it did occur multiple times before. The only thing I can imagine is that since then, I've regularly updated both GitLab and Podman to the latest version. It might make sense to track which Podman and GitLab versions people see this issue with. @mtrmac Regarding
Historically, there was https://gitlab.com/gitlab-org/gitlab/-/issues/215715, where a repo is expected to be auto-created on the first upload, but if it is accessed while it is being auto-created (in this case, with parallel uploads of multiple layers), some of the attempts may fail. I’m not at all sure this is related in any way, but it is a bit suggestive, in that failures encountered while setting up a process might not show up afterwards. The “quite loud warning-level logs” idea was suggested as something to include in the future auto-retry behavior of Podman, so that there is a trace of a thing failing. It wouldn’t directly help with this situation, especially if the failures no longer occur. In this case, I think the
@mtrmac This is an amazing find. I never considered it, but yes: when I reported this issue, that was the first push to that image repository, and after deleting an image repository and pushing to it again, I see this issue again. One reason I never considered it is that the failure doesn't happen at the beginning of the push transaction, but rather at the end. In any case, I can confirm that for me, too, this happens when repositories are auto-created.
This happens to me almost every time I push to a new image registry. The first push fails, but GitLab creates the registry successfully, and the second push works fine. I can't reproduce it with
My workaround is to just try the push twice. So in my .gitlab-ci.yml, I replaced
with
I haven't seen Docker attempt 6 simultaneous auth requests for the JWT bearer token prior to the image push. A
I'm pushing to an authenticated registry, so I'm already doing a … I assumed the 404 was because the registry wasn't found (because it wasn't created yet), not because the auth actually failed.
Technically, the authentication/authorization step is failing. But that doesn’t rule out the possibility that a single authentication request would have succeeded, while a series of concurrent ones triggers a failure. The code could, possibly, track authentication requests in flight, and wait for an existing one to succeed instead of starting a parallel one. I don’t immediately know whether it would avoid this problem; it’s anyway the polite thing to do, against servers that could be rate-limiting authentication attempts (especially if the user provided incorrect credentials). It might even be a slight performance improvement.
Interesting, I had only seen a 404 in the batch of JWT auth requests to GitLab, not in the requests to the registry.
A friendly reminder that this issue had no activity for 30 days.
FYI, the ticket I raised at GitLab for this issue has been triaged as severity: major, so hopefully we'll get some investigation from their end soon. There was also a potential workaround posted that involves building to a different tag name, then retagging to the desired tag, though it doesn't make much sense to me, as I wouldn't expect it to change the registry API calls at all:
I have a draft implementation in containers/image#1968. Could someone who can reliably reproduce this GitLab failure test a Podman build with that change?
@mtrmac I can reproduce this; if you or someone else has a build available, I'll give it a try.
If anyone has ideas or information regarding reproducing the issue (even just a description of a CI setup that reproduces it), GitLab is trying to investigate it here: https://gitlab.com/gitlab-org/gitlab/-/issues/404326#note_1587264776
Discussed in #16842
Originally posted by 2xB October 6, 2022
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
When pushing to a GitLab Container Registry, I sometimes randomly get

Error: trying to reuse blob sha256:... at destination: Requesting bearer token: invalid status code from registry 404 (Not Found)

Weirdly enough, when I upload multiple images from the same base image, I get this error message once for every image I try to push; the second push for each image then works.
Steps to reproduce the issue:
1. Have a GitLab project with a GitLab Runner that can build and push Podman images (e.g. via Shell executor or custom executor)
2. Use it very often
3. Sometimes (rarely) get this error during podman push ...
Describe the results you received:
Error: trying to reuse blob sha256:... at destination: Requesting bearer token: invalid status code from registry 404 (Not Found)
Describe the results you expected:
Successfully pushing the image to the GitLab Container Registry
Additional information you deem important (e.g. issue happens only occasionally):
I have a full log of podman --log-level=debug push ... from a time it failed. I probably can't post the full log, but if there's something to check in that log, please tell me!
Output of podman version:
Output of podman info:
Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):