podman build: stream dropped, unexpected failure #10154
Two more failures in my next rerun (plus one more flake which I'm just going to stick my head in the sand about)
Update: although it mostly happens in the This bug makes it impossible to merge #9887
Another one, in bud-http-Dockerfile test
Here's a reproducer (assumes $ cat Containerfile
FROM php:7.2
COPY --from foo:bar a b
$ while :;do x=$(../bin/podman-remote build --force-rm=false -t foo . 2>&1); if [[ $x =~ stream ]]; then echo $x; break;fi;done
STEP 1: FROM php:7.2
STEP 2: COPY --from foo:bar a b
Error: stream dropped, unexpected failure
Might take 3 minutes, might take 15. The [UPDATE: nothing useful on server end, with
Update: #10295 did not, alas, fix the problem. Here is a failed CI run from my buildah-bud-under-podman-remote PR. I was also able to reproduce the problem locally on my laptop, using the reproducer above. (I added Curiously... I ran the server with Hint: if you run with a verbose log-level, redirect server output to a logfile in shm. With output going to my rxvt, I wasn't able to reproduce the failure. |
@jwhonce, @rhatdan, I know nothing at all about Go streams ... but is EOF really considered an error? Is it possible for EOF to trigger in conjunction with a successful read? And, more to the point, WDYT of:
--- a/pkg/bindings/images/build.go
+++ b/pkg/bindings/images/build.go
@@ -353,7 +353,7 @@ func Build(ctx context.Context, containerFiles []string, options entities.BuildO
}
if err := dec.Decode(&s); err != nil {
if errors.Is(err, io.EOF) {
- if mErr == nil && id == "" {
+ if mErr == nil && s.Error == "" && id == "" {
mErr = errors.New("stream dropped, unexpected failure")
}
break
With that, I've been unable to trigger the error on my laptop, and it even passed in CI, which has not happened in weeks. I don't know what unintended consequences that might have, though.
Okay... never mind. I tried the following:
diff --git a/pkg/bindings/images/build.go b/pkg/bindings/images/build.go
index f5e7c0c98..44434810d 100644
--- a/pkg/bindings/images/build.go
+++ b/pkg/bindings/images/build.go
@@ -354,6 +354,7 @@ func Build(ctx context.Context, containerFiles []string, options entities.BuildO
if err := dec.Decode(&s); err != nil {
if errors.Is(err, io.EOF) {
if mErr == nil && id == "" {
+ logrus.Errorf("got here: s.Error=%q", s.Error)
mErr = errors.New("stream dropped, unexpected failure")
}
break
...and ran my reproducer, which quickly failed with:
$ while :;do x=$(../bin/podman-remote build --force-rm=false -t foo . 2>&1); if [[ $x =~ stream ]]; then echo $x; beep;break;fi;buildah rm -a >/dev/null;done
STEP 1: FROM php:7.2
STEP 2: COPY --from foo:bar a b
time="2021-05-11T10:06:35-06:00" level=error msg="got here: s.Error=\"\""
Error: stream dropped, unexpected failure
Therefore: adding
New record: seven failures in one CI run.
PR #9887 (buildah-bud tests under podman-remote) has been catching a large number of bugs in both podman and buildah. It would be really great if we could merge that and run it in CI, instead of manually by me, but we can't, because of this flake. Is there any chance someone could take a fresh look?
A friendly reminder that this issue had no activity for 30 days.
Oh well. I was slightly hopeful that #10916 would fix this, but nope.
Oh, this is really bad. We're now seeing this flake in regular CI (sys remote ubuntu-2104 root host).
...and, looking back through flake logs, another one also in ubuntu-2104 root.
And another one in our regular CI, this time ubuntu-2010. @containers/podman-maintainers this is becoming a serious problem. I don't care if we can't merge #9887, but the flake is now affecting us in real-world CI.
I am still dizzy from my 2nd shot but bookmarked it. Will take a look at it with a fresh brain on Monday. Thank you for the ping, @edsantiago!
Hit the flake also in #11049
Address a number of issues in the streaming logic in remote build, most importantly an error in using buffered channels on the server side.
The pattern below does not guarantee that the channel is entirely read before the context fires.
    for {
        select {
        case <- bufferedChannel:
            ...
        case <- ctx.Done():
            ...
        }
    }
Fixes: containers#10154
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Looks like an unintended consequence of #10034. I'm seeing a lot of these failures in my buildah-bud-tests-under-podman-remote testing (#9887):
Logs, all from fedora34-beta:
Four of the five failures are in tests that expect rc=125; but one of them is in a test that expects rc=0.
This is based on master @ 476c76f as of yesterday afternoon (Apr 26)