podman-exec tty resize: stty: standard input #10710

Open
edsantiago opened this issue Jun 17, 2021 · 39 comments
@edsantiago
Member

Fallout from #10683 (tty resize). The test even flaked in that very same PR:

# # podman exec -it mystty stty size
# stty: standard input

# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: stty under podman exec reads the correct dimensions
# #| expected: '41 62'
# #|   actual: 'stty: standard input
'
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

sys: podman detects correct tty size

...and it just triggered in a new PR, also ubuntu-2104.

@edsantiago edsantiago added the flakes Flakes from Continuous Integration label Jun 17, 2021
@edsantiago
Member Author

Yikes - the #10688 flake was a triple:

@Luap99
Member

Luap99 commented Jun 17, 2021

@edsantiago Are you able to reproduce locally?

@edsantiago
Member Author

I have not been able to reproduce, because I'm plagued by #10701 on my laptop.

@mheon
Member

mheon commented Jun 17, 2021

I'm struggling to see how this could happen - the resize is now happening in a way that can't race.

@edsantiago
Member Author

I just reproduced on f34, podman-3.2.1-1.fc34 rootless. Passed on retry.

@edsantiago
Member Author

Reproduced on same f34 system as above, this time in podman-remote rootless. Does the warning message help?

   $ podman-remote run -it --name mystty quay.io/libpod/testimage:20210427 stty size
   stty: standard input
   time="2021-06-17T16:38:39-04:00" level=warning msg="failed to resize TTY: can only resize created or running containers: container state improper"
   #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
   #|     FAIL: stty under podman reads the correct dimensions
   #| expected: '55 28'
'  #|   actual: 'stty: standard input
   #|         > 'time="2021-06-17T16:38:39-04:00" level=warning msg="failed to resize TTY: can only resize created or running containers: container state improper"'
   #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@mheon
Member

mheon commented Jun 17, 2021

OK, so the resize is happening too soon, presumably? That at least makes sense.

@edsantiago
Member Author

Recommendation: try wait_for_ready.

@rhatdan
Member

rhatdan commented Jun 18, 2021

@Luap99 PTAL

@Luap99
Member

Luap99 commented Jun 18, 2021

Reproduced on same f34 system as above, this time in podman-remote rootless. Does the warning message help?

   $ podman-remote run -it --name mystty quay.io/libpod/testimage:20210427 stty size
   stty: standard input
   time="2021-06-17T16:38:39-04:00" level=warning msg="failed to resize TTY: can only resize created or running containers: container state improper"
   #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
   #|     FAIL: stty under podman reads the correct dimensions
   #| expected: '55 28'
'  #|   actual: 'stty: standard input
   #|         > 'time="2021-06-17T16:38:39-04:00" level=warning msg="failed to resize TTY: can only resize created or running containers: container state improper"'
   #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Well, you managed to reproduce this issue with podman-remote run, so this is not the same problem as podman exec.

@mheon I believe the race is between podman and conmon. The resize call will simply write to a named pipe and after that podman will write to another pipe to signal conmon that it should start the process. I can imagine a case where conmon reads the start pipe before the resize pipe.

Conmon never signals podman whether the resize even worked.
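
A toy model of the race described above, offered as a sketch only: it is not conmon's or podman's actual code, and the names `monitor`, `simulate`, and `DEFAULT_SIZE` are made up for illustration. Two independent pipes carry the resize request and the start signal. Even though the client writes the resize first, nothing forces the reader to handle it first once both pipes are readable.

```python
import os
import select

DEFAULT_SIZE = (0, 0)  # hypothetical placeholder meaning "no size applied yet"


def monitor(resize_r, start_r, handle_order):
    """Toy monitor: return the tty size in effect when the start signal is handled."""
    size = DEFAULT_SIZE
    pending = {resize_r, start_r}
    while pending:
        readable, _, _ = select.select(list(pending), [], [])
        # handle_order models which pipe the monitor happens to service first
        for fd in handle_order:
            if fd not in readable or fd not in pending:
                continue
            data = os.read(fd, 64)
            pending.discard(fd)
            if fd == resize_r:
                rows, cols = data.split()
                size = (int(rows), int(cols))
            else:
                # start signal: the process is launched with whatever size is known now
                return size
    return size


def simulate(start_first):
    """Write resize, then start, and report the size seen at start time."""
    resize_r, resize_w = os.pipe()
    start_r, start_w = os.pipe()
    try:
        os.write(resize_w, b"41 62")  # the client writes the resize request first...
        os.write(start_w, b"1")       # ...and only then the start signal
        order = [start_r, resize_r] if start_first else [resize_r, start_r]
        return monitor(resize_r, start_r, order)
    finally:
        for fd in (resize_r, resize_w, start_r, start_w):
            os.close(fd)


if __name__ == "__main__":
    print("resize handled first:", simulate(start_first=False))
    print("start handled first: ", simulate(start_first=True))
```

In this model the write order on the client side has no influence on the result; only the order in which the reader services the two pipes does.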

@mheon
Member

mheon commented Jun 18, 2021

Could we also be looking at an internal latency in Conmon, where the process is not immediately ready after the start pipe is read?

Regardless, this sounds like a reasonable conclusion. I don't see an easy way around it, though - Conmon doesn't tell us that the start signal was received, either, so there's no easy way to synchronize.

@vrothberg
Member

Any progress on that? It's quite a consistent and stubborn flake in the main branch at the moment.

vrothberg added a commit to vrothberg/libpod that referenced this issue Jun 23, 2021
As discussed in containers#10710, the additional checks for podman-exec added by
commit 666f555 are extremely flaky and appear in nearly every PR
I have seen this week.

Let's temporarily disable the checks and reenable them once containers#10710 is
fixed.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
@vrothberg
Member

I opened #10758 to disable the flaky tests. We need to make sure to reenable the tests once the underlying issue is fixed (I made sure to drop a comment).

I usually feel strongly against disabling tests, but the flakes are just too frequent.

mheon pushed a commit to mheon/libpod that referenced this issue Jun 24, 2021
@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Jul 25, 2021

@edsantiago @vrothberg Any update on this issue?

@edsantiago
Member Author

The multiarch testing group is still seeing this. Podman 3.2.3 on s390x; I don't know if it's Fedora or RHEL. If I read the log correctly, August 19.

@vrothberg
Member

Let's wait for the results on v3.3.0. We increased the sizes of the signal buffers in the hope of resolving such issues, and that change made it into v3.3.0.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Sep 24, 2021

Well it is a month later, so I am going to assume this is fixed. Reopen if it happens again.

@edsantiago
Member Author

I've got a 1minutetip VM right now that's reproducing it super-easily. Ping me for access.

The "remote" connection (in comments above) was me grasping at straws. I don't know if this merits a new issue. My gut tells me no, that the common factor is "stty is broken", not whether it's via exec or remote.

@Luap99
Member

Luap99 commented Dec 8, 2021

My comment is still valid.

@github-actions

github-actions bot commented Jan 8, 2022

A friendly reminder that this issue had no activity for 30 days.

@edsantiago
Member Author

I can no longer reproduce this: not with the podman run reproducer above (Dec 7), and not with exec either. Not on f35 with compiled podman, nor on Rawhide with 4.0rc6.

Is it time to reenable the commented-out exec ... stty test, and see what happens?

edsantiago added a commit to edsantiago/libpod that referenced this issue Feb 23, 2022
Ref: containers#10710, a nasty and frequent flake. I can no longer
reproduce the failure on f35 or Rawhide, so let's take
the risk of reenabling the test.

Signed-off-by: Ed Santiago <santiago@redhat.com>
@edsantiago
Member Author

Flaked in the very PR that was going to reintroduce the test. Problem still exists, even if I can't reproduce it on my laptop or a Rawhide VM.

@Luap99
Member

Luap99 commented Feb 24, 2022

I guess this can be fixed with conmon-rs in the future.

@github-actions

A friendly reminder that this issue had no activity for 30 days.


@edsantiago
Member Author

After a long absence, this is back. rawhide rootless:

not ok 416 podman detects correct tty size
...
$ podman run -it --name mystty quay.io/libpod/testimage:20221018 stty size
stty: standard input

#/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#|     FAIL: stty under podman run reads the correct dimensions
#| expected: '31 32
'
#|   actual: 'stty: standard input
'
#\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@Luap99
Member

Luap99 commented May 10, 2023

Yeah, at this point I don't think this is ever going to be fixed. We would need a bidirectional channel to get a success response back from conmon and only then start the container. I guess conmon-rs is your best hope here.
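
A minimal sketch of the acknowledgement idea, under stated assumptions: the extra `ack` pipe, the `monitor` thread, and `start_with_size` are all hypothetical and do not correspond to any existing conmon or podman interface. The client writes the resize, blocks until the monitor confirms it was applied, and only then writes the start signal. The monitor deliberately checks the start pipe first, the worst case for ordering, and the size is still correct.

```python
import os
import select
import threading


def monitor(resize_r, start_r, ack_w, result):
    """Hypothetical monitor that acknowledges every resize it applies."""
    size = None
    while True:
        readable, _, _ = select.select([start_r, resize_r], [], [])
        if start_r in readable:
            # Worst case for ordering: the start pipe is always serviced first.
            os.read(start_r, 1)
            result["size_at_start"] = size
            return
        if resize_r in readable:
            rows, cols = os.read(resize_r, 64).split()
            size = (int(rows), int(cols))
            os.write(ack_w, b"k")  # success response back to the client


def start_with_size(rows, cols):
    """Send resize, wait for the ack, then send start; return the size seen at start."""
    resize_r, resize_w = os.pipe()
    start_r, start_w = os.pipe()
    ack_r, ack_w = os.pipe()
    result = {}
    worker = threading.Thread(
        target=monitor, args=(resize_r, start_r, ack_w, result), daemon=True
    )
    worker.start()
    try:
        os.write(resize_w, f"{rows} {cols}".encode())
        if os.read(ack_r, 1) != b"k":  # block until the resize is confirmed
            raise RuntimeError("resize was not acknowledged")
        os.write(start_w, b"1")        # safe now: the size is already applied
        worker.join()
    finally:
        for fd in (resize_r, resize_w, start_r, start_w, ack_r, ack_w):
            os.close(fd)
    return result["size_at_start"]


if __name__ == "__main__":
    print(start_with_size(41, 62))
```

The cost of this design is one extra round trip before the container starts, which is the price of making the ordering explicit instead of relying on how the reader happens to poll.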

edsantiago added a commit to edsantiago/libpod that referenced this issue Oct 31, 2023
I've seen the stty flake (containers#10710) twice in one day. Time to
add a retry.

Signed-off-by: Ed Santiago <santiago@redhat.com>
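
For illustration, a retry of this kind could look roughly like the sketch below. It is not the change made in that commit: `stty_size_with_retry`, `run_stty`, and the `attempts` and `delay` parameters are assumptions introduced here. The idea is to accept the output only when it looks like `ROWS COLS` and to try again otherwise.

```python
import re
import time

# A valid `stty size` result is two integers separated by a space, e.g. "41 62".
SIZE_RE = re.compile(r"^\d+ \d+$")


def stty_size_with_retry(run_stty, attempts=3, delay=0.0):
    """Call run_stty() until it returns something that looks like a tty size.

    run_stty is any callable returning the command's output as a string
    (hypothetical; in a real test it would wrap the podman invocation).
    """
    last = ""
    for attempt in range(attempts):
        last = run_stty().strip()
        if SIZE_RE.match(last):
            return last
        if attempt + 1 < attempts:
            time.sleep(delay)  # brief pause before the next try
    raise RuntimeError(
        f"no valid tty size after {attempts} attempts; last output: {last!r}"
    )


if __name__ == "__main__":
    # Simulated flake: the first call fails the way the logs in this issue show.
    outputs = iter(["stty: standard input\r\n", "31 32\r\n"])
    print(stty_size_with_retry(lambda: next(outputs)))
```

A retry hides the flake from CI without removing the underlying race, so it fits as a stopgap while the synchronization problem remains open.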