Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable and fix AuditOn. #17687

Merged
merged 4 commits into from
Nov 14, 2022
Merged

Enable and fix AuditOn. #17687

merged 4 commits into from
Nov 14, 2022

Conversation

jakule
Copy link
Contributor

@jakule jakule commented Oct 21, 2022

This change should re-enable and fix the AuditOn test. Read my comment for an explanation of each related change.

@jakule jakule force-pushed the jakule/auditon-fix-3 branch from 40fe59a to 8ac9976 Compare November 9, 2022 01:53
@jakule jakule force-pushed the jakule/auditon-fix-3 branch from 4d9a647 to 01a0527 Compare November 11, 2022 20:52
@jakule jakule changed the title [WIP] Reduce AuditOn flakiness. Reduce AuditOn flakiness. Nov 11, 2022
@@ -51,7 +51,8 @@ func NewFixture(t *testing.T) *Fixture {
require.NoError(t, err)

// Find AllocatePortsNum free listening ports to use.
fixture.Me, _ = user.Current()
fixture.Me, err = user.Current()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to the fix itself, but it bit me at some point, so I decided to fix it.


myTerm.Type("\aecho hi\n\r\aexit\n\r\a")
// let's type "echo hi" followed by "enter" and then "exit" + "enter":
myTerm.Type("echo hi\n\rexit\n\r")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\a suspends the terminal for a second

time.Sleep(time.Second)

I don't see the point in waiting.

@@ -304,13 +304,6 @@ func RunCommand() (errw io.Writer, code int, err error) {
if err != nil {
return errorWriter, teleport.RemoteCommandFailure, trace.Wrap(err)
}
defer func() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to close TTY and PTY before we close the process. This cleanup logic was introduced a few months ago #13491
After this function returns, the process ends anyway, so all file descriptors will be closed anyways.

@@ -990,6 +990,12 @@ func (s *session) startInteractive(ctx context.Context, ch ssh.Channel, scx *Ser
scx.Errorf("Received error waiting for the interactive session %v to finish: %v.", s.id, err)
}

if result != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this block above the select. The result should be returned before we finish the session. Otherwise, we may miss the exit code.

@@ -560,6 +571,7 @@ func (t *remoteTerminal) PID() int {
}

func (t *remoteTerminal) Close() error {
t.wg.Wait()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maybe controversial, but this t.wg is not being used anywhere, and from what I see we should wait on it here (the same as terminal).

@jakule jakule marked this pull request as ready for review November 11, 2022 22:16
@github-actions github-actions bot requested review from espadolini and mdwn November 11, 2022 22:16
var err error
defer t.closeTTY()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're closing the TTY in two different places anyway. Closing it here causes the return code to be "skipped".

@jakule jakule requested a review from zmb3 November 11, 2022 22:18
@jakule jakule changed the title Reduce AuditOn flakiness. Enable and fix AuditOn. Nov 11, 2022
Copy link
Collaborator

@zmb3 zmb3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me. Did you want to keep the debug logging enabled or remove it?

@github-actions github-actions bot removed the request for review from mdwn November 14, 2022 08:50
@jakule
Copy link
Contributor Author

jakule commented Nov 14, 2022

Looks fine to me. Did you want to keep the debug logging enabled or remove it?

@zmb3 Are you asking about https://github.com/gravitational/teleport/pull/17687/files#diff-a4192e941574d6233edf26747166f0112babc45504defb4a22dcb211522c605fR272-R273
I'd like to have it, at least for now. We can always remove it when we find it annoying.

@jakule jakule enabled auto-merge (squash) November 14, 2022 22:31
@jakule jakule merged commit 08863c4 into master Nov 14, 2022
jakule added a commit that referenced this pull request Nov 17, 2022
This change re-enables the AuditOn system test and fixes the TTY connection between the Teleport parent and child process. It should allow the child to send the error code to the parent, which should fix the test.
jakule added a commit that referenced this pull request Nov 22, 2022
This change re-enables the AuditOn system test and fixes the TTY connection between the Teleport parent and child process. It should allow the child to send the error code to the parent, which should fix the test.

Backport of #17687
rosstimothy added a commit that referenced this pull request Aug 9, 2024
#17687 attempted to fix flakiness of TestIntegrations/AuditOn by
sending an exit-status request _prior_ to consuming all output from
the PTY. While this made the test more reliable, it created a
scenario that allowed for a session to be completed without all of
the data from the PTY being consumed by the client. This condition
was hit by running an ansible playbook that output 1MB to stdout.

The reason TestIntegrations/AuditOn was flaky is because the
exit-status request was not received at times. The mechanism used
to send that request requires sending the result over a channel
and the request to be sent by another goroutine. That provides an
opportunity for the request on the channel to be processed after
the underlying ssh connection has been closed.

To resolve the issue of missing output, the change in order of
operations from #17687 was reverted and the exit-status request
is now being sent directly in the same goroutine that waits
for the session to end instead. This change now causes the exit-status
to be sent later in time, which in the real world should not be
noticed, however, some time dependent tests needed to have their
timeout for sessions completing bumped.
rosstimothy added a commit that referenced this pull request Aug 9, 2024
#17687 attempted to fix flakiness of TestIntegrations/AuditOn by
sending an exit-status request _prior_ to consuming all output from
the PTY. While this made the test more reliable, it created a
scenario that allowed for a session to be completed without all of
the data from the PTY being consumed by the client. This condition
was hit by running an ansible playbook that output 1MB to stdout.

The reason TestIntegrations/AuditOn was flaky is because the
exit-status request was not received at times. The mechanism used
to send that request requires sending the result over a channel
and the request to be sent by another goroutine. That provides an
opportunity for the request on the channel to be processed after
the underlying ssh connection has been closed.

To resolve the issue of missing output, the change in order of
operations from #17687 was reverted and the exit-status request
is now being sent directly in the same goroutine that waits
for the session to end instead. This change now causes the exit-status
to be sent later in time, which in the real world should not be
noticed, however, some time dependent tests needed to have their
timeout for sessions completing bumped.
rosstimothy added a commit that referenced this pull request Aug 9, 2024
#17687 attempted to fix flakiness of TestIntegrations/AuditOn by
sending an exit-status request _prior_ to consuming all output from
the PTY. While this made the test more reliable, it created a
scenario that allowed for a session to be completed without all of
the data from the PTY being consumed by the client. This condition
was hit by running an ansible playbook that output 1MB to stdout.

The reason TestIntegrations/AuditOn was flaky is because the
exit-status request was not received at times. The mechanism used
to send that request requires sending the result over a channel
and the request to be sent by another goroutine. That provides an
opportunity for the request on the channel to be processed after
the underlying ssh connection has been closed.

To resolve the issue of missing output, the change in order of
operations from #17687 was reverted and the exit-status request
is now being sent directly in the same goroutine that waits
for the session to end instead. This change now causes the exit-status
to be sent later in time, which in the real world should not be
noticed, however, some time dependent tests needed to have their
timeout for sessions completing bumped.
github-merge-queue bot pushed a commit that referenced this pull request Aug 12, 2024
#17687 attempted to fix flakiness of TestIntegrations/AuditOn by
sending an exit-status request _prior_ to consuming all output from
the PTY. While this made the test more reliable, it created a
scenario that allowed for a session to be completed without all of
the data from the PTY being consumed by the client. This condition
was hit by running an ansible playbook that output 1MB to stdout.

The reason TestIntegrations/AuditOn was flaky is because the
exit-status request was not received at times. The mechanism used
to send that request requires sending the result over a channel
and the request to be sent by another goroutine. That provides an
opportunity for the request on the channel to be processed after
the underlying ssh connection has been closed.

To resolve the issue of missing output, the change in order of
operations from #17687 was reverted and the exit-status request
is now being sent directly in the same goroutine that waits
for the session to end instead. This change now causes the exit-status
to be sent later in time, which in the real world should not be
noticed, however, some time dependent tests needed to have their
timeout for sessions completing bumped.
github-actions bot pushed a commit that referenced this pull request Aug 12, 2024
#17687 attempted to fix flakiness of TestIntegrations/AuditOn by
sending an exit-status request _prior_ to consuming all output from
the PTY. While this made the test more reliable, it created a
scenario that allowed for a session to be completed without all of
the data from the PTY being consumed by the client. This condition
was hit by running an ansible playbook that output 1MB to stdout.

The reason TestIntegrations/AuditOn was flaky is because the
exit-status request was not received at times. The mechanism used
to send that request requires sending the result over a channel
and the request to be sent by another goroutine. That provides an
opportunity for the request on the channel to be processed after
the underlying ssh connection has been closed.

To resolve the issue of missing output, the change in order of
operations from #17687 was reverted and the exit-status request
is now being sent directly in the same goroutine that waits
for the session to end instead. This change now causes the exit-status
to be sent later in time, which in the real world should not be
noticed, however, some time dependent tests needed to have their
timeout for sessions completing bumped.
github-actions bot pushed a commit that referenced this pull request Aug 12, 2024
#17687 attempted to fix flakiness of TestIntegrations/AuditOn by
sending an exit-status request _prior_ to consuming all output from
the PTY. While this made the test more reliable, it created a
scenario that allowed for a session to be completed without all of
the data from the PTY being consumed by the client. This condition
was hit by running an ansible playbook that output 1MB to stdout.

The reason TestIntegrations/AuditOn was flaky is because the
exit-status request was not received at times. The mechanism used
to send that request requires sending the result over a channel
and the request to be sent by another goroutine. That provides an
opportunity for the request on the channel to be processed after
the underlying ssh connection has been closed.

To resolve the issue of missing output, the change in order of
operations from #17687 was reverted and the exit-status request
is now being sent directly in the same goroutine that waits
for the session to end instead. This change now causes the exit-status
to be sent later in time, which in the real world should not be
noticed, however, some time dependent tests needed to have their
timeout for sessions completing bumped.
github-actions bot pushed a commit that referenced this pull request Aug 12, 2024
#17687 attempted to fix flakiness of TestIntegrations/AuditOn by
sending an exit-status request _prior_ to consuming all output from
the PTY. While this made the test more reliable, it created a
scenario that allowed for a session to be completed without all of
the data from the PTY being consumed by the client. This condition
was hit by running an ansible playbook that output 1MB to stdout.

The reason TestIntegrations/AuditOn was flaky is because the
exit-status request was not received at times. The mechanism used
to send that request requires sending the result over a channel
and the request to be sent by another goroutine. That provides an
opportunity for the request on the channel to be processed after
the underlying ssh connection has been closed.

To resolve the issue of missing output, the change in order of
operations from #17687 was reverted and the exit-status request
is now being sent directly in the same goroutine that waits
for the session to end instead. This change now causes the exit-status
to be sent later in time, which in the real world should not be
noticed, however, some time dependent tests needed to have their
timeout for sessions completing bumped.
github-merge-queue bot pushed a commit that referenced this pull request Aug 12, 2024
#17687 attempted to fix flakiness of TestIntegrations/AuditOn by
sending an exit-status request _prior_ to consuming all output from
the PTY. While this made the test more reliable, it created a
scenario that allowed for a session to be completed without all of
the data from the PTY being consumed by the client. This condition
was hit by running an ansible playbook that output 1MB to stdout.

The reason TestIntegrations/AuditOn was flaky is because the
exit-status request was not received at times. The mechanism used
to send that request requires sending the result over a channel
and the request to be sent by another goroutine. That provides an
opportunity for the request on the channel to be processed after
the underlying ssh connection has been closed.

To resolve the issue of missing output, the change in order of
operations from #17687 was reverted and the exit-status request
is now being sent directly in the same goroutine that waits
for the session to end instead. This change now causes the exit-status
to be sent later in time, which in the real world should not be
noticed, however, some time dependent tests needed to have their
timeout for sessions completing bumped.
rosstimothy added a commit that referenced this pull request Aug 13, 2024
#17687 attempted to fix flakiness of TestIntegrations/AuditOn by
sending an exit-status request _prior_ to consuming all output from
the PTY. While this made the test more reliable, it created a
scenario that allowed for a session to be completed without all of
the data from the PTY being consumed by the client. This condition
was hit by running an ansible playbook that output 1MB to stdout.

The reason TestIntegrations/AuditOn was flaky is because the
exit-status request was not received at times. The mechanism used
to send that request requires sending the result over a channel
and the request to be sent by another goroutine. That provides an
opportunity for the request on the channel to be processed after
the underlying ssh connection has been closed.

To resolve the issue of missing output, the change in order of
operations from #17687 was reverted and the exit-status request
is now being sent directly in the same goroutine that waits
for the session to end instead. This change now causes the exit-status
to be sent later in time, which in the real world should not be
noticed, however, some time dependent tests needed to have their
timeout for sessions completing bumped.
rosstimothy added a commit that referenced this pull request Aug 13, 2024
#17687 attempted to fix flakiness of TestIntegrations/AuditOn by
sending an exit-status request _prior_ to consuming all output from
the PTY. While this made the test more reliable, it created a
scenario that allowed for a session to be completed without all of
the data from the PTY being consumed by the client. This condition
was hit by running an ansible playbook that output 1MB to stdout.

The reason TestIntegrations/AuditOn was flaky is because the
exit-status request was not received at times. The mechanism used
to send that request requires sending the result over a channel
and the request to be sent by another goroutine. That provides an
opportunity for the request on the channel to be processed after
the underlying ssh connection has been closed.

To resolve the issue of missing output, the change in order of
operations from #17687 was reverted and the exit-status request
is now being sent directly in the same goroutine that waits
for the session to end instead. This change now causes the exit-status
to be sent later in time, which in the real world should not be
noticed, however, some time dependent tests needed to have their
timeout for sessions completing bumped.
github-merge-queue bot pushed a commit that referenced this pull request Aug 13, 2024
* Prevent exiting a session prior to output being consumed

#17687 attempted to fix flakiness of TestIntegrations/AuditOn by
sending an exit-status request _prior_ to consuming all output from
the PTY. While this made the test more reliable, it created a
scenario that allowed for a session to be completed without all of
the data from the PTY being consumed by the client. This condition
was hit by running an ansible playbook that output 1MB to stdout.

The reason TestIntegrations/AuditOn was flaky is because the
exit-status request was not received at times. The mechanism used
to send that request requires sending the result over a channel
and the request to be sent by another goroutine. That provides an
opportunity for the request on the channel to be processed after
the underlying ssh connection has been closed.

To resolve the issue of missing output, the change in order of
operations from #17687 was reverted and the exit-status request
is now being sent directly in the same goroutine that waits
for the session to end instead. This change now causes the exit-status
to be sent later in time, which in the real world should not be
noticed, however, some time dependent tests needed to have their
timeout for sessions completing bumped.

* Fix SSH sessions recorded on proxy not being fully closed (#41434)

* fix(srv): SSH remote sessions resources not being closed correctly

* refactor(srv): code review suggestions

* test(srv): move t.Helper to the correct function

* chore(srv): typo

* chore(srv): typo

---------

Co-authored-by: Gabriel Corado <gabriel.oliveira@goteleport.com>
rosstimothy added a commit that referenced this pull request Aug 20, 2024
#17687 attempted to fix flakiness of TestIntegrations/AuditOn by
sending an exit-status request _prior_ to consuming all output from
the PTY. While this made the test more reliable, it created a
scenario that allowed for a session to be completed without all of
the data from the PTY being consumed by the client. This condition
was hit by running an ansible playbook that output 1MB to stdout.

The reason TestIntegrations/AuditOn was flaky is because the
exit-status request was not received at times. The mechanism used
to send that request requires sending the result over a channel
and the request to be sent by another goroutine. That provides an
opportunity for the request on the channel to be processed after
the underlying ssh connection has been closed.

To resolve the issue of missing output, the change in order of
operations from #17687 was reverted and the exit-status request
is now being sent directly in the same goroutine that waits
for the session to end instead. This change now causes the exit-status
to be sent later in time, which in the real world should not be
noticed, however, some time dependent tests needed to have their
timeout for sessions completing bumped.
rosstimothy added a commit that referenced this pull request Aug 28, 2024
#17687 attempted to fix flakiness of TestIntegrations/AuditOn by
sending an exit-status request _prior_ to consuming all output from
the PTY. While this made the test more reliable, it created a
scenario that allowed for a session to be completed without all of
the data from the PTY being consumed by the client. This condition
was hit by running an ansible playbook that output 1MB to stdout.

The reason TestIntegrations/AuditOn was flaky is because the
exit-status request was not received at times. The mechanism used
to send that request requires sending the result over a channel
and the request to be sent by another goroutine. That provides an
opportunity for the request on the channel to be processed after
the underlying ssh connection has been closed.

To resolve the issue of missing output, the change in order of
operations from #17687 was reverted and the exit-status request
is now being sent directly in the same goroutine that waits
for the session to end instead. This change now causes the exit-status
to be sent later in time, which in the real world should not be
noticed, however, some time dependent tests needed to have their
timeout for sessions completing bumped.
github-merge-queue bot pushed a commit that referenced this pull request Aug 28, 2024
* Prevent exiting a session prior to output being consumed

#17687 attempted to fix flakiness of TestIntegrations/AuditOn by
sending an exit-status request _prior_ to consuming all output from
the PTY. While this made the test more reliable, it created a
scenario that allowed for a session to be completed without all of
the data from the PTY being consumed by the client. This condition
was hit by running an ansible playbook that output 1MB to stdout.

The reason TestIntegrations/AuditOn was flaky is because the
exit-status request was not received at times. The mechanism used
to send that request requires sending the result over a channel
and the request to be sent by another goroutine. That provides an
opportunity for the request on the channel to be processed after
the underlying ssh connection has been closed.

To resolve the issue of missing output, the change in order of
operations from #17687 was reverted and the exit-status request
is now being sent directly in the same goroutine that waits
for the session to end instead. This change now causes the exit-status
to be sent later in time, which in the real world should not be
noticed, however, some time dependent tests needed to have their
timeout for sessions completing bumped.

* Fix SSH sessions recorded on proxy not being fully closed (#41434)

* fix(srv): SSH remote sessions resources not being closed correctly

* refactor(srv): code review suggestions

* test(srv): move t.Helper to the correct function

* chore(srv): typo

* chore(srv): typo

---------

Co-authored-by: Gabriel Corado <gabriel.oliveira@goteleport.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants