Build fails with connection reset by peer error from GCR from time to time #298

Closed
SunnyMagadan opened this issue May 20, 2020 · 18 comments
@SunnyMagadan

Hello 👋

We have CI builds running with buildx and from time to time it may fail with an error like:

failed to solve: rpc error: code = Unknown desc = failed to copy: read tcp 172.17.0.2:47038->173.194.214.128:443: read: connection reset by peer

The issue usually disappears after restarting the build, but sometimes requires several restarts in a row.
The IP address 173.194.214.128 belongs to Google Container Registry (GCR), so it looks like GCR resets the connection for some reason, but it's unclear what the root cause of this behavior could be.
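
Since a plain re-run usually gets past it, our stopgap is simply to retry the build in CI. A rough sketch of what such a wrapper could look like (the retry count is arbitrary, and "$@" stands for the build arguments shown below):

#!/bin/sh
# Stopgap sketch: retry the buildx invocation a few times before giving up.
for attempt in 1 2 3; do
  docker buildx build "$@" && exit 0
  echo "buildx build failed (attempt $attempt), retrying..." >&2
done
exit 1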

The build command:

docker buildx build \
  --tag huddle/app:latest \
  --progress=plain \
  --cache-to=type=local,dest=/home/ciuser/docker-cache-new,mode=max \
  --cache-from=type=local,src=/home/ciuser/docker-cache \
  --output type=docker,dest=image.tar .

The output:

#1 [internal] booting buildkit
#1 pulling image moby/buildkit:buildx-stable-1
#1 pulling image moby/buildkit:buildx-stable-1 34.3s done
#1 creating container buildx_buildkit_containerdriver0
#1 creating container buildx_buildkit_containerdriver0 2.2s done
#1 DONE 36.5s

#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile:
#2 transferring dockerfile: 1.22kB done
#2 DONE 0.0s

#3 [internal] load .dockerignore
#3 transferring context: 301B done
#3 DONE 0.1s

#6 [internal] load metadata for gcr.io/***********/huddle-ruby_bundle:node1...
#6 ...

#4 [internal] load metadata for gcr.io/***********/huddle-base:node12
#4 DONE 2.0s

#5 [internal] load metadata for gcr.io/***********/huddle-node_modules:node...
#5 ...

#6 [internal] load metadata for gcr.io/***********/huddle-ruby_bundle:node1...
#6 DONE 2.5s

#5 [internal] load metadata for gcr.io/***********/huddle-node_modules:node...
#5 DONE 2.9s

#8 [builder  1/11] FROM gcr.io/***********/huddle-base:node12@sha256:2b1c8d...
#8 resolve gcr.io/***********/huddle-base:node12@sha256:2b1c8dfcbebebebcec002a4cd34dd963082801eb694c3e1fe3f2581e53fee56a done
#8 DONE 0.0s

#15 [ruby_bundle 1/3] FROM gcr.io/***********/huddle-ruby_bundle:node12@sha2...
#15 resolve gcr.io/***********/huddle-ruby_bundle:node12@sha256:5ef1bc45cfe33c463f0ee2154bf6d72b607f5d8f7eda66440397f5f15692b4bb done
#15 DONE 0.0s

#19 [node_modules 1/5] FROM gcr.io/***********/huddle-node_modules:node12@sh...
#19 resolve gcr.io/***********/huddle-node_modules:node12@sha256:067f8e6e954233b16e42bd7c1cb3d92f0931c6e523cdb86efdd7763062692cb9 done
#19 DONE 0.0s

#7 importing cache manifest from local:8362393208701384237
#7 DONE 0.0s

#9 [internal] load build context
#9 ...

#25 FROM gcr.io/***********/huddle-webpacker_cache:node12
#25 resolve gcr.io/***********/huddle-webpacker_cache:node12 1.2s done
#25 DONE 1.2s

#9 [internal] load build context
#9 transferring context: 22.74MB 2.0s
#9 transferring context: 79.36MB 3.3s done
#9 DONE 3.4s

#10 [builder  2/11] COPY config/application.yml.example config/application.y...
#10 CACHED

#11 [builder  3/11] COPY config/database.yml.example config/database.yml
#11 pulling sha256:4e9545d8d85d670275c1486d9c2f012afc52240092bd26dfe42496b9a9b7afdc
#11 pulling sha256:fda5213ffaf8f7d66af79c8560552fd2eb22190b0c0948cf04161261309218a6
#11 pulling sha256:53e643ceb3e6bbcd62ccc5c10384845a1a1d1f40d02d5d82e237c7541952eba4
#11 pulling sha256:68ced04f60ab5c7a5f1d0b0b4e7572c5a4c8cce44866513d30d9df1a15277d6b
#11 pulling sha256:6558aeaccec55261e0e83509eee2a9f40001be9755765eb02adbc2c97fc38056
#11 pulling sha256:c55a61b57e3f6be04fbb54f4f49599cf38837e4be329c9010a4ef6977bdbd99a
#11 pulling sha256:3e56979ca8f121c1cc4b85f2c0bff0eb766aa71ada5964ad7a1ce801ae915f5d
#11 pulling sha256:a486fb9a30dac6e55b779c44e1dec51d7f68bdbd30b2251d8a5d0f75e979b372
#11 pulling sha256:94fc2147f42ce6526f83f6b98c755ac582e28322e88764fd24e884839fa88026
#11 pulling sha256:fa4541c61250ed97cb558d4a38ac75e68ca5a45db2445eadbf6884015ed6bc75
#11 pulling sha256:5828078197d9ca0ccb1866b35e5cf55db743ee41e1bd0fc2c35fcd52c66642e3
#11 pulling sha256:ced87f0436e304a3b30b7a59d9bb888ff4bc80525481fcf62eb92dc0a3253f35
#11 pulling sha256:cfb8cb55736d5b374b69aa3b0083eab4e1de98f62818997b9f0ea02728c23e41
#11 pulling sha256:3e6b2d08248106f9d122198a88a668fdd9e8ada3c9bd1a4bbbc6340fdb576143
#11 pulling sha256:19a1f0ac494cff5c808712974edd35e7a56e84f1c4ba8d5d3155464c8d9c8fb7
#11 pulling sha256:e04aacc8eab89f37083192c3617ab80e99e2ed1fb9da4eba1dde3c787a0fc16b
#11 pulling sha256:fa5a03d00e6eb736eea0d99a226fb263da01d0f4cb0da1c3a8dfded5b60b747f
#11 pulling sha256:6bdc372469a19461e9efa0448e2f1e82dabe22384bce757ef8fb6e6d142096ad
#11 pulling sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1
#11 pulling sha256:e5057d6574d605b9b1d175168547a99677b123e8d2a56cc3922c8467fff24dcd
#11 pulling sha256:4c13842975af8a901c654f55df7131c0bdae023ad4233ad9f7e7a1179f0da377
#11 pulling sha256:4e9545d8d85d670275c1486d9c2f012afc52240092bd26dfe42496b9a9b7afdc 0.2s done
#11 pulling sha256:3e56979ca8f121c1cc4b85f2c0bff0eb766aa71ada5964ad7a1ce801ae915f5d 0.3s done
#11 pulling sha256:94fc2147f42ce6526f83f6b98c755ac582e28322e88764fd24e884839fa88026 0.3s done
#11 pulling sha256:fa4541c61250ed97cb558d4a38ac75e68ca5a45db2445eadbf6884015ed6bc75 0.3s done
#11 pulling sha256:5828078197d9ca0ccb1866b35e5cf55db743ee41e1bd0fc2c35fcd52c66642e3 0.3s done
#11 pulling sha256:3e6b2d08248106f9d122198a88a668fdd9e8ada3c9bd1a4bbbc6340fdb576143 0.3s done
#11 pulling sha256:6bdc372469a19461e9efa0448e2f1e82dabe22384bce757ef8fb6e6d142096ad 0.3s done
#11 pulling sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1 0.3s done
#11 pulling sha256:e5057d6574d605b9b1d175168547a99677b123e8d2a56cc3922c8467fff24dcd 0.3s done
#11 pulling sha256:4c13842975af8a901c654f55df7131c0bdae023ad4233ad9f7e7a1179f0da377 0.3s done
#11 pulling sha256:cfb8cb55736d5b374b69aa3b0083eab4e1de98f62818997b9f0ea02728c23e41 1.8s done
#11 pulling sha256:6558aeaccec55261e0e83509eee2a9f40001be9755765eb02adbc2c97fc38056 4.4s done
#11 pulling sha256:c55a61b57e3f6be04fbb54f4f49599cf38837e4be329c9010a4ef6977bdbd99a 5.1s done
#11 pulling sha256:a486fb9a30dac6e55b779c44e1dec51d7f68bdbd30b2251d8a5d0f75e979b372 5.2s done
#11 ...

#19 [node_modules 1/5] FROM gcr.io/***********/huddle-node_modules:node12@sh...
#19 sha256:c671bd3c17d46c01e7809e987e46204ffe41b52903b5d3e0a75a2a0289c29e06 0B / 24.17MB 4.5s
#19 sha256:ffc0b4e8606888cb95487a6250450b11be01594e0d0a250f164d6585378c03c1 7.75kB / 7.75kB done
#19 sha256:3597249f82e0db6e21ebfbab396ec1edb5638dafe802cc4582f9522c71829750 0B / 589.99MB 4.5s
#19 sha256:675fa60367503292034ae80912a94382dae8a0e9c757a4a1afd04957566ca4fb 498.39kB / 2.77MB 4.5s
#19 sha256:426a612bcf733c66fa5f2d59b59549dc685e18b5a52f465f9dfa1f1d6f45412c 0B / 285B 4.5s
#19 sha256:54fec2fa59d0a0de9cd2dec9850b36c43de451f1fd1c0a5bf8f1cf26a61a5da4 0B / 27.10MB 4.5s
#19 sha256:e36a8161fff2c2327c7204a96b834cb42b8f9769b44d94f50ed6e77f8e87c34b 342.12kB / 342.12kB 4.2s done
#19 sha256:067f8e6e954233b16e42bd7c1cb3d92f0931c6e523cdb86efdd7763062692cb9 2.42kB / 2.42kB done
#19 sha256:a410070425f6b4c60a46e50dc7daddf626fcc95a11683d6a081c27b56939b76b 0B / 30.07MB 4.5s
#19 sha256:5c3320d2c3eceabc02f11650ff46c51234b4fee38813b8c8d66677b2243b2d28 4.16kB / 4.16kB 4.2s done
#19 sha256:bb952d552f449071f5244b3144ede7d05b4911820f93663285c8e3cc6cc53a04 275B / 275B 4.2s done
#19 sha256:bd5306adc7da0938d57d3948587edfb00251f2bc47a3319a7ae9aed713c0303c 175.10kB / 175.10kB 4.2s done
#19 ERROR: failed to copy: read tcp 172.17.0.2:47038->173.194.214.128:443: read: connection reset by peer

#15 [ruby_bundle 1/3] FROM gcr.io/***********/huddle-ruby_bundle:node12@sha2...
#15 sha256:68ced04f60ab5c7a5f1d0b0b4e7572c5a4c8cce44866513d30d9df1a15277d6b 27.09MB / 27.09MB 3.6s done
#15 sha256:3e56979ca8f121c1cc4b85f2c0bff0eb766aa71ada5964ad7a1ce801ae915f5d 201B / 201B done
#15 sha256:fa4541c61250ed97cb558d4a38ac75e68ca5a45db2445eadbf6884015ed6bc75 143B / 143B done
#15 sha256:6b4bdd2b10af8d142a4b0afb5dd8aba3c43f16659b5a5d747d96ca8c89aac6ed
#15 sha256:a486fb9a30dac6e55b779c44e1dec51d7f68bdbd30b2251d8a5d0f75e979b372 21.45MB / 21.45MB 3.0s done
#15 sha256:616fc16f72d5f1a4df946be7bb766575bbde33114b31a4337d77a43920df27dc 6.53kB / 6.53kB done
#15 sha256:c55a61b57e3f6be04fbb54f4f49599cf38837e4be329c9010a4ef6977bdbd99a 12.54MB / 12.54MB 2.9s done
#15 sha256:ced87f0436e304a3b30b7a59d9bb888ff4bc80525481fcf62eb92dc0a3253f35
#15 sha256:cfb8cb55736d5b374b69aa3b0083eab4e1de98f62818997b9f0ea02728c23e41 7.30MB / 7.30MB done
#15 sha256:5ef1bc45cfe33c463f0ee2154bf6d72b607f5d8f7eda66440397f5f15692b4bb 2.21kB / 2.21kB done
#15 sha256:8684892534b14d5570bce0dbd7e9edecf2659fb0977c2b1bd77eb4788580b1fc 9.93kB / 9.93kB 4.1s done
#15 ...

#25 FROM gcr.io/***********/huddle-webpacker_cache:node12
#25 DONE 7.8s

#11 [builder  3/11] COPY config/database.yml.example config/database.yml
#11 pulling sha256:fda5213ffaf8f7d66af79c8560552fd2eb22190b0c0948cf04161261309218a6 5.8s done
#11 pulling sha256:68ced04f60ab5c7a5f1d0b0b4e7572c5a4c8cce44866513d30d9df1a15277d6b 5.9s done
#11 pulling sha256:19a1f0ac494cff5c808712974edd35e7a56e84f1c4ba8d5d3155464c8d9c8fb7 5.9s done
#11 pulling sha256:e04aacc8eab89f37083192c3617ab80e99e2ed1fb9da4eba1dde3c787a0fc16b 6.0s done
#11 pulling sha256:53e643ceb3e6bbcd62ccc5c10384845a1a1d1f40d02d5d82e237c7541952eba4 7.8s done
#11 pulling sha256:ced87f0436e304a3b30b7a59d9bb888ff4bc80525481fcf62eb92dc0a3253f35 7.8s done
#11 pulling sha256:fa5a03d00e6eb736eea0d99a226fb263da01d0f4cb0da1c3a8dfded5b60b747f 7.8s done
#11 CANCELED

#15 [ruby_bundle 1/3] FROM gcr.io/***********/huddle-ruby_bundle:node12@sha2...
#15 ERROR: ref buildkit/1/layer-sha256:ced87f0436e304a3b30b7a59d9bb888ff4bc80525481fcf62eb92dc0a3253f35 locked: unavailable

#25 FROM gcr.io/***********/huddle-webpacker_cache:node12
#25 sha256:52b5db3ddac124b8309593995c205ea3fcb8c430e1ad1c6540aeab0f4d6bb8e3 528B / 528B done
#25 CANCELED
------
 > [ruby_bundle 1/3] FROM gcr.io/***********/huddle-ruby_bundle:node12@sha256:5ef1bc45cfe33c463f0ee2154bf6d72b607f5d8f7eda66440397f5f15692b4bb:
------
------
 > [node_modules 1/5] FROM gcr.io/***********/huddle-node_modules:node12@sha256:067f8e6e954233b16e42bd7c1cb3d92f0931c6e523cdb86efdd7763062692cb9:
------
failed to solve: rpc error: code = Unknown desc = failed to copy: read tcp 172.17.0.2:47038->173.194.214.128:443: read: connection reset by peer

Exited with code exit status 1

Also, I've noticed another error in the output:

#15 ERROR: ref buildkit/1/layer-sha256:ced87f0436e304a3b30b7a59d9bb888ff4bc80525481fcf62eb92dc0a3253f35 locked: unavailable

It seems suspicious and might at some point trigger the connection error too.
In which cases is a layer considered locked/unavailable?
Not sure if it's related to the cache, but in our case each build operates independently from the others, so the cache is local and not shared with other instances that could potentially be locking some files.

@tonistiigi
Member

In which cases is a layer considered locked/unavailable?

@dmcgowan

@materemias

I'm hitting the same issue, also with GCR, while storing my cache in the GitLab container registry.

@materemias

@SunnyMagadan were you able to find out what's going on, or did you find a solution?

@SunnyMagadan
Author

@materemias Unfortunately, I haven't found the root cause. It fails in about 20% of cases, and it can be pretty annoying when the build doesn't work after several attempts in a row.
I checked the GCR access logs for anything suspicious, but they didn't show any issues with accessing the files; all looks good there. At the moment I don't know whether there is a problem with GCR itself. Have you tried Docker Hub instead?
@tonistiigi Could you advise how to get more details about the error, please? Are there any options that could help?

@materemias

Thanks for your reply @SunnyMagadan. I see the same 1-out-of-5 pattern with this connection reset by peer TCP error. The difference between our setups is that I use cache-to and cache-from with type=registry. Since GCR doesn't allow cache layers to be stored, I tried the GitLab container registry (which is also hosted on GCP, hence the same connection reset error).
I have also tried docker.io as a potential cache target; so far it seems to be working (at least it stores and uses the layers). I'll come back later with my experiences.
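
For reference, the registry-cache variant I'm running looks roughly like this (the image refs are placeholders for our GitLab registry paths):

docker buildx build \
  --tag registry.gitlab.com/<group>/<project>/app:latest \
  --cache-from=type=registry,ref=registry.gitlab.com/<group>/<project>/buildcache \
  --cache-to=type=registry,ref=registry.gitlab.com/<group>/<project>/buildcache,mode=max \
  --push .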

@SunnyMagadan
Author

@materemias Thank you for the information regarding docker.io. I might consider trying it instead. 🤔
As for storing the cache in GCR, I found an article about buildx with the following note:

As a side note, we use DockerHub as the cache repository instead of Google Container Registry which stores our application images. This is because at the time of writing, Google Container Registry does not seem to support the cache manifest format application/vnd.buildkit.cacheconfig.v0 and returns Bad Request 400 when trying to push a build cache. So we fell back on using a private repo on DockerHub for now and it works perfectly.

https://medium.com/titansoft-engineering/docker-build-cache-sharing-on-multi-hosts-with-buildkit-and-buildx-eb8f7005918e

But still, I don't know why this happens with a local file cache.

@tonistiigi Additionally, I see errors while writing to the cache from time to time:

#30 writing layer sha256:fda5213ffaf8f7d66af79c8560552fd2eb22190b0c0948cf04161261309218a6
ERRO[1013] (*service).Write failed                       error="rpc error: code = Canceled desc = context canceled" expected="sha256:ee005815d8d30f65f4eb5e606e989f3209ddc42eb31a662387e7fe5367a31f3c" ref="sha256:ee005815d8d30f65f4eb5e606e989f3209ddc42eb31a662387e7fe5367a31f3c" total=16062
2020/06/17 12:37:39 http2: server connection error from localhost: connection error: PROTOCOL_ERROR
#30 writing layer sha256:fda5213ffaf8f7d66af79c8560552fd2eb22190b0c0948cf04161261309218a6 0.4s done
#30 writing config sha256:9480e032752cce4d9a34f2ac45fc3a5764cb412693760bb120859ac40acd7c05 done
#30 writing manifest sha256:ee005815d8d30f65f4eb5e606e989f3209ddc42eb31a662387e7fe5367a31f3c done
#30 DONE 172.5s

These errors don't fail the build, but I wonder whether the newly written cache can still be considered valid after them. Might this cache lead to subsequent build failures?

@tonistiigi
Member

see errors while writing to the cache from time to time:

It's unclear what process these logs are coming from. They are not part of the cache write, as the prefix is missing. It's unlikely that this causes any cache corruption.

@bendemaree

I believe we're encountering this issue as well; our circumstances are quite similar and we're seeing the same error message (connection reset by peer). We're using buildx v0.4.1 with the docker-container driver; this is also a CI environment in our case, so this is all under Kubernetes, using DinD (specifically docker:19.03.9-dind). The Dockerfile is relatively small:

FROM docker.io/cypress/browsers:node12.6.0-chrome77
COPY --from=gcr.io/... <src> <target>
COPY --from=gcr.io/... <src> <target>

Though there are both docker.io and gcr.io-hosted images involved, the failures appear to always refer to the gcr.io image:

 #5 sha256:72c7047d2f78f06ec055d39d316765aa51484304bcd0f6046dcebd02f343bc02
 #5 sha256:7aaf66ba1796667c0ea3ce30b042412f6bfd684a844ee824d660179be1310106 196B / 196B 0.9s done
 #5 CANCELED
 ------
  > FROM gcr.io/...:
 ------
 failed to solve: rpc error: code = Unknown desc = failed to copy: read tcp 172.17.0.2:57938->172.217.212.128:443: read: connection reset by peer

We do not seem to be encountering the locked: unavailable or context canceled errors, however.

After looking at a few other past issues here and in buildkit's tracker, I thought I'd try enabling some debug flags to see if there's anything that looks suspect. Setting GODEBUG=http2debug=1 when invoking docker buildx bake did not seem to produce much of interest (happy to share those logs if useful, though).

I also tried passing --buildkitd-flags '--debug' to docker buildx create, then dumping the builder container's logs after a failure. I'm happy to share as much of the logs as would be useful; here's what the very end looks like, though (I've replaced some identifiers with ..., and added newlines for clarity):

time="2020-06-29T14:37:54Z" level=debug msg="do request" digest="sha256:e8a766f5fc07c788360b94ec016d5e84eadfc9dc704e7da6d2e7d2c562c5b926" mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip request.header.accept="application/vnd.docker.image.rootfs.diff.tar.gzip, */*" request.header.user-agent=containerd/1.3.0+unknown request.method=GET size=334415752 url="https://gcr.io/v2/.../blobs/sha256:e8a766f5fc07c788360b94ec016d5e84eadfc9dc704e7da6d2e7d2c562c5b926"

time="2020-06-29T14:37:54Z" level=debug msg="do request" digest="sha256:87bc9957c38cc4056dc0749b509984c50b202c19d92f8985e58ba48bf632b035" mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip request.header.accept="application/vnd.docker.image.rootfs.diff.tar.gzip, */*" request.header.user-agent=containerd/1.3.0+unknown request.method=GET size=907024 url="https://gcr.io/v2/.../blobs/sha256:87bc9957c38cc4056dc0749b509984c50b202c19d92f8985e58ba48bf632b035"

time="2020-06-29T14:37:54Z" level=debug msg="fetch response received" digest="sha256:7f49758756f564fb22c430e284efb10fdaffcab3c058e2e95933a1b862efe3aa" mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.cache-control="private, max-age=0" response.header.content-length=1263195 response.header.content-type=application/octet-stream response.header.date="Mon, 29 Jun 2020 14:37:54 GMT" response.header.etag="\"3a15a10fd8e39ff7acb85ab3c09e9a7c\"" response.header.expires="Mon, 29 Jun 2020 14:37:54 GMT" response.header.last-modified="Fri, 20 Dec 2019 15:13:00 GMT" response.header.server=UploadServer response.header.x-goog-generation=1576854780023714 response.header.x-goog-hash="crc32c=c0krUQ==" response.header.x-goog-hash.1="md5=OhWhD9jjn/esuFqzwJ6afA==" response.header.x-goog-metageneration=1 response.header.x-goog-storage-class=STANDARD response.header.x-goog-stored-content-encoding=identity response.header.x-goog-stored-content-length=1263195 response.header.x-guploader-uploadid=AAANsUkL8Q_mkFvErZ5d8ptDTZO2nIeIdXq3yxlZJQWSvIlyyxqwgdRUOX9QkoCXegRKAkuRie7TukWabZrbzNQDvJU response.status="200 OK" size=1263195 url="https://gcr.io/v2/.../blobs/sha256:7f49758756f564fb22c430e284efb10fdaffcab3c058e2e95933a1b862efe3aa"

time="2020-06-29T14:37:55Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: read tcp 172.17.0.2:57938->172.217.212.128:443: read: connection reset by peer
failed to copy
github.com/containerd/containerd/content.Copy
    /src/vendor/github.com/containerd/containerd/content/helpers.go:148
github.com/containerd/containerd/remotes.fetch
    /src/vendor/github.com/containerd/containerd/remotes/handlers.go:133
github.com/containerd/containerd/remotes.FetchHandler.func1
    /src/vendor/github.com/containerd/containerd/remotes/handlers.go:95
github.com/containerd/containerd/images.HandlerFunc.Handle
    /src/vendor/github.com/containerd/containerd/images/handlers.go:55
github.com/containerd/containerd/images.Handlers.func1
    /src/vendor/github.com/containerd/containerd/images/handlers.go:65
github.com/containerd/containerd/images.HandlerFunc.Handle
    /src/vendor/github.com/containerd/containerd/images/handlers.go:55
github.com/containerd/containerd/images.Dispatch.func1
    /src/vendor/github.com/containerd/containerd/images/handlers.go:134
golang.org/x/sync/errgroup.(*Group).Go.func1
    /src/vendor/golang.org/x/sync/errgroup/errgroup.go:57
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1357
github.com/moby/buildkit/solver.(*edge).execOp
    /src/solver/edge.go:869
github.com/moby/buildkit/solver/internal/pipe.NewWithFunction.func2
    /src/solver/internal/pipe/pipe.go:82
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1357"

I provided a few of the preceding lines because the rest of the log shows numerous 401 responses from gcr.io; my assumption is that those come from the client probing which auth strategy to use, since there are also plenty of successful requests/responses like the ones above.

Not sure if this is of any use, but thought I'd provide it in case it is. Please let me know if there is any additional context or other details I can provide, or additional debugging steps I can try. Thanks in advance to all for your efforts and time. 🙂
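
For completeness, the debug setup described above boils down to roughly the following (the builder name and bake target are placeholders):

# Create a docker-container builder with buildkitd debug logging enabled.
docker buildx create --name debugbuilder --driver docker-container --buildkitd-flags '--debug' --use

# Run the build with Go's HTTP/2 client debug output enabled.
GODEBUG=http2debug=1 docker buildx bake <target> --progress=plain

# After a failure, dump the buildkitd logs from the builder container.
docker logs buildx_buildkit_debugbuilder0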

@howardjohn

We also hit this in ~0.1% of test runs in our CI. https://prow.istio.io/view/gs/istio-prow/logs/integ-security-multicluster-tests_istio_postsubmit/1347563831991209984 is an example run with logs.

Another one: https://prow.istio.io/view/gs/istio-prow/pr-logs/pull/istio_istio/29815/integ-helm-tests_istio/1345971686867996672

Interestingly, only the first has "http2: server connection error from localhost: connection error: PROTOCOL_ERROR"

Is this just a transient network failure that can be retried?

@SunnyMagadan
Author

👋 FYI: after switching to Docker Hub, the connection issue disappeared completely. I have no idea what the real difference between them is, or why buildx doesn't play well with GCR, but it helped us.

@jdelucaa

I am also getting this issue. I run multiple builds in parallel on GitHub, using the GitHub Actions cache (cache-to: type=gha,mode=max) and the docker/build-push-action, and I see these connection reset by peer errors very often.

@imranzunzani

@jdelucaa - Maybe this could be of help to you.

@jdelucaa

@jdelucaa - Maybe this could be of help to you.

Thanks @imranzunzani, I am trying this and will let you know the results.

@imranzunzani

@jdelucaa - Could you mention your findings on the issue page?

@jdelucaa

jdelucaa commented Jul 5, 2023

@jdelucaa - Could you mention your findings on the issue page?

Unfortunately, I could not test it properly yet. As soon as we set network=host, our frontend builds started to fail.
We fetch some NPM packages from a private registry and started getting SSL errors.

@imranzunzani

imranzunzani commented Jul 5, 2023

Unfortunately, I could not test it properly yet. As soon as we set network=host, our frontend builds started to fail. We fetch some NPM packages from a private registry and started getting SSL errors.

Strange! What kind of SSL errors?

@jdelucaa

jdelucaa commented Jul 6, 2023

Unfortunately, I could not test it properly yet. As soon as we set network=host, our frontend builds started to fail. We fetch some NPM packages from a private registry and started getting SSL errors.

Strange! What kind of SSL errors?

Sometimes we get handshake failures when we run too many builds in parallel, so I'm not sure whether it was just a coincidence, but this is the error we consistently got after switching to network=host:

8.920 error An unexpected error occurred: "https://my-private-registry/my-package: write EPROTO 139779399866304:error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:../deps/openssl/openssl/ssl/record/rec_layer_s3.c:1565:SSL alert number 40

@imranzunzani

Sometimes we get handshake failures when we run too many builds in parallel, so I'm not sure whether it was just a coincidence, but this is the error we consistently got after switching to network=host:

8.920 error An unexpected error occurred: "https://my-private-registry/my-package: write EPROTO 139779399866304:error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:../deps/openssl/openssl/ssl/record/rec_layer_s3.c:1565:SSL alert number 40

It seems like a non-blocking / multiplexing issue related to cipher agreement in SSL.
See: Stack Overflow, GitHub.
It doesn't look related to network=host; you can confirm that by checking whether it still happens without network=host.
It might be due to falling back to a lower TLS version after negotiating a higher one, as per this.
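
If you want to verify that, something like the following should show which TLS versions and ciphers the registry actually negotiates (the hostname is a placeholder matching the error above):

# Probe the private registry with specific TLS versions (sketch; replace the hostname).
openssl s_client -connect my-private-registry:443 -tls1_2 </dev/null | grep -E 'Protocol|Cipher'
openssl s_client -connect my-private-registry:443 -tls1_3 </dev/null | grep -E 'Protocol|Cipher'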
