Build fails with "connection reset by peer" error from GCR from time to time #298
I ran into the same issue, also with GCR, while storing my cache in the GitLab container registry.
@SunnyMagadan, did you find out what's going on, or did you find a solution?
@materemias Unfortunately, I haven't found the root cause. It fails in about 20% of cases, which gets pretty annoying when it fails several attempts in a row.
Thanks for your reply, @SunnyMagadan. I see the same 1-in-5 pattern with this
@materemias Thank you for the information regarding the
But still, I don't know why this happens with a local file cache.
@tonistiigi Additionally, I occasionally see errors while writing to the cache:
#30 writing layer sha256:fda5213ffaf8f7d66af79c8560552fd2eb22190b0c0948cf04161261309218a6
ERRO[1013] (*service).Write failed error="rpc error: code = Canceled desc = context canceled" expected="sha256:ee005815d8d30f65f4eb5e606e989f3209ddc42eb31a662387e7fe5367a31f3c" ref="sha256:ee005815d8d30f65f4eb5e606e989f3209ddc42eb31a662387e7fe5367a31f3c" total=16062
2020/06/17 12:37:39 http2: server connection error from localhost: connection error: PROTOCOL_ERROR
#30 writing layer sha256:fda5213ffaf8f7d66af79c8560552fd2eb22190b0c0948cf04161261309218a6 0.4s done
#30 writing config sha256:9480e032752cce4d9a34f2ac45fc3a5764cb412693760bb120859ac40acd7c05 done
#30 writing manifest sha256:ee005815d8d30f65f4eb5e606e989f3209ddc42eb31a662387e7fe5367a31f3c done
#30 DONE 172.5s
These errors don't fail the build, but I wonder whether the newly written cache can be considered valid after them. Could this cache lead to subsequent build failures?
It's unclear which process these logs come from. They are not part of the cache write, as the step prefix is missing. It's unlikely that this causes any cache corruption.
I believe we're encountering this issue as well; our circumstances are quite similar and we're seeing the same error message (
Though there are both
We do not seem to be encountering the After looking at a few other past issues here and in I also tried passing
I provided a few of the preceding lines because the rest of the log shows numerous
Not sure if this is of any use, but I thought I'd provide it in case it is. Please let me know if there is any additional context or other details I can provide, or additional debugging steps I can try. Thanks in advance to all for your efforts and time. 🙂
We also hit these in about 0.1% of test runs in our CI. https://prow.istio.io/view/gs/istio-prow/logs/integ-security-multicluster-tests_istio_postsubmit/1347563831991209984 is an example run with logs. Another one:
https://prow.istio.io/view/gs/istio-prow/pr-logs/pull/istio_istio/29815/integ-helm-tests_istio/1345971686867996672
Interestingly, only the first contains "http2: server connection error from localhost: connection error: PROTOCOL_ERROR". Is this just a transient network failure that can be retried?
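Since restarting the build usually clears the error, one stopgap while the root cause is unknown is to retry only when the output contains one of the transient error strings reported in this thread. The following is a minimal Python sketch under stated assumptions: the `run_with_retry` and `is_transient` helpers, the pattern list, and the backoff delays are hypothetical and not part of buildx or Docker. It just reruns an arbitrary command with exponential backoff.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence

# Error strings quoted in this thread for failures that went away on restart.
# Treating them as transient (safe to retry) is an assumption, not buildx behavior.
TRANSIENT_PATTERNS = (
    "connection reset by peer",
    "PROTOCOL_ERROR",
)

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def is_transient(output: str, patterns: Sequence[str] = TRANSIENT_PATTERNS) -> bool:
    """Return True if the command output contains a known transient error."""
    return any(p in output for p in patterns)


def _default_runner(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    # Capture stdout/stderr as text so they can be scanned for error strings.
    return subprocess.run(list(cmd), capture_output=True, text=True)


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = 3,
    base_delay: float = 5.0,
    runner: Optional[Runner] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> subprocess.CompletedProcess:
    """Run cmd, retrying with exponential backoff only on transient errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    run = runner or _default_runner
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return result
        output = (result.stdout or "") + (result.stderr or "")
        # Give up immediately on non-transient errors or on the last attempt.
        if not is_transient(output) or attempt == attempts:
            return result
        # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
        sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

For example, `run_with_retry(["docker", "buildx", "build", "."])` reruns a plain build up to three times. The actual build command from the original report was not preserved, so substitute your own. Matching on specific output strings keeps genuine build errors failing fast instead of being retried, but a retry only hides the connection reset; it does not explain it.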
👋 FYI: after switching to Docker Hub, the connection issue disappeared completely. I have no idea what the real difference between them is or why buildx doesn't play well with GCR, but it helped us.
I am also getting this issue. I run multiple builds in parallel on GitHub and use the GitHub Actions cache.
Thanks, @imranzunzani. I am trying this and will let you know the results.
@jdelucaa, could you share your findings on this issue's page?
Unfortunately, I haven't been able to test it properly yet. As soon as we set network=host, our frontend builds started to fail.
Strange! What kind of SSL errors?
Sometimes we get handshake failures when we run too many builds in parallel, so I'm not sure whether it was just a coincidence, but this is the error we consistently got after switching to network=host:
It looks like a non-blocking/multiplexing issue related to cipher agreement in SSL.
Hello 👋
We have CI builds running with buildx, and from time to time they fail with an error like:
The issue usually disappears after restarting the build, but sometimes it takes several restarts in a row.
The IP address 173.194.214.128 belongs to Google Container Registry, so it looks like GCR resets the connection for some reason, but the root cause of this behavior is unclear.
The build command:
The output:
Also, I've noticed another error in the output:
It seems suspicious and at some point might trigger the connection error too.
In what cases is a layer considered locked/unavailable?
I'm not sure whether it's related to the cache, but in our case each build operates independently of the others, so the cache is local and not shared with other instances that could potentially be locking some files.