Enable and fix AuditOn. #17687

jakule · 2022-10-21T20:34:49Z

This change should re-enable and fix the AuditOn test. See my review comments below for an explanation of each related change.

jakule · 2022-11-11T22:05:53Z

integration/helpers/fixture.go

@@ -51,7 +51,8 @@ func NewFixture(t *testing.T) *Fixture {
 	require.NoError(t, err)

 	// Find AllocatePortsNum free listening ports to use.
-	fixture.Me, _ = user.Current()
+	fixture.Me, err = user.Current()


Unrelated to the fix itself, but it bit me at some point, so I decided to fix it.

jakule · 2022-11-11T22:08:33Z

integration/integration_test.go

-
-			myTerm.Type("\aecho hi\n\r\aexit\n\r\a")
+			// let's type "echo hi" followed by "enter" and then "exit" + "enter":
+			myTerm.Type("echo hi\n\rexit\n\r")


\a suspends the terminal for a second (teleport/integration/terminal_test.go, line 79 in 1686a71):

	time.Sleep(time.Second)

I don't see the point in waiting.

jakule · 2022-11-11T22:11:32Z

lib/srv/reexec.go

@@ -304,13 +304,6 @@ func RunCommand() (errw io.Writer, code int, err error) {
 	if err != nil {
 		return errorWriter, teleport.RemoteCommandFailure, trace.Wrap(err)
 	}
-	defer func() {


We don't want to close the TTY and PTY before the process ends. This cleanup logic was introduced a few months ago in #13491.
After this function returns, the process ends anyway, so all file descriptors will be closed regardless.

jakule · 2022-11-11T22:13:26Z

lib/srv/sess.go

@@ -990,6 +990,12 @@ func (s *session) startInteractive(ctx context.Context, ch ssh.Channel, scx *Ser
 			scx.Errorf("Received error waiting for the interactive session %v to finish: %v.", s.id, err)
 		}

+		if result != nil {


I moved this block above the select. The result should be returned before we finish the session. Otherwise, we may miss the exit code.
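The ordering described here can be sketched roughly as follows. All names (execResult, finish, broadcast, ptyDone) are hypothetical stand-ins rather than the identifiers used in lib/srv/sess.go, and a plain channel receive stands in for the select mentioned above.

```go
package main

import "fmt"

// execResult is a hypothetical stand-in for the value produced when the
// child process exits.
type execResult struct{ Code int }

// finish waits for the process, forwards its result if there is one, and only
// then blocks until the PTY copy has completed.
func finish(wait func() *execResult, broadcast func(execResult), ptyDone <-chan struct{}) {
	result := wait()
	if result != nil {
		broadcast(*result) // report the exit code before finishing the session
	}
	<-ptyDone // simplified: the real code is described as using a select
}

func main() {
	var events []string

	ptyDone := make(chan struct{})
	close(ptyDone) // pretend the PTY copy has already finished

	finish(
		func() *execResult { return &execResult{Code: 3} },
		func(r execResult) { events = append(events, fmt.Sprintf("exit code %d", r.Code)) },
		ptyDone,
	)
	events = append(events, "session finished")

	fmt.Println(events)
}
```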

jakule · 2022-11-11T22:14:30Z

lib/srv/term.go

@@ -560,6 +571,7 @@ func (t *remoteTerminal) PID() int {
 }

 func (t *remoteTerminal) Close() error {
+	t.wg.Wait()


This may be controversial, but t.wg is not being used anywhere, and from what I see we should wait on it here (the same as terminal).

jakule · 2022-11-11T22:17:20Z

lib/srv/term.go

 	var err error
-	defer t.closeTTY()


We're closing the TTY in two different places anyway. Closing it here causes the return code to be "skipped".

zmb3

Looks fine to me. Did you want to keep the debug logging enabled or remove it?

jakule · 2022-11-14T22:28:50Z

> Looks fine to me. Did you want to keep the debug logging enabled or remove it?

@zmb3 Are you asking about https://github.com/gravitational/teleport/pull/17687/files#diff-a4192e941574d6233edf26747166f0112babc45504defb4a22dcb211522c605fR272-R273
I'd like to have it, at least for now. We can always remove it when we find it annoying.

This change re-enables the AuditOn system test and fixes the TTY connection between the Teleport parent and child process. It should allow the child to send the error code to the parent, which should fix the test.

Backport of #17687.

#17687 attempted to fix the flakiness of TestIntegrations/AuditOn by sending the exit-status request _prior_ to consuming all output from the PTY. While this made the test more reliable, it allowed a session to complete before the client had consumed all of the data from the PTY. This condition was hit by running an Ansible playbook that wrote 1MB to stdout.

TestIntegrations/AuditOn was flaky because the exit-status request was sometimes not received. The mechanism used to send that request passes the result over a channel, and the request itself is sent by another goroutine. That leaves an opportunity for the result on the channel to be processed after the underlying SSH connection has been closed.

To resolve the missing output, the change in the order of operations from #17687 was reverted, and the exit-status request is now sent directly from the same goroutine that waits for the session to end. As a result, the exit-status is sent later, which should not be noticeable in real-world use. However, some time-dependent tests needed their timeouts for session completion increased.

* Prevent exiting a session prior to output being consumed
* Fix SSH sessions recorded on proxy not being fully closed (#41434)
* fix(srv): SSH remote sessions resources not being closed correctly
* refactor(srv): code review suggestions
* test(srv): move t.Helper to the correct function
* chore(srv): typo
* chore(srv): typo

Co-authored-by: Gabriel Corado <gabriel.oliveira@goteleport.com>

jakule force-pushed the jakule/auditon-fix-3 branch from 40fe59a to 8ac9976 Compare November 9, 2022 01:53

jakule added 3 commits November 11, 2022 15:34

Reduce AuditOn flakiness.

919b788

Commit missing code.

03519e5

Do not close TTY too early.

01a0527

jakule force-pushed the jakule/auditon-fix-3 branch from 4d9a647 to 01a0527 Compare November 11, 2022 20:52

jakule changed the title ~~[WIP] Reduce AuditOn flakiness.~~ Reduce AuditOn flakiness. Nov 11, 2022

jakule marked this pull request as ready for review November 11, 2022 22:16

github-actions bot requested review from espadolini and mdwn November 11, 2022 22:16

jakule requested a review from zmb3 November 11, 2022 22:18

jakule changed the title ~~Reduce AuditOn flakiness.~~ Enable and fix AuditOn. Nov 11, 2022

zmb3 approved these changes Nov 11, 2022

espadolini approved these changes Nov 14, 2022

github-actions bot removed the request for review from mdwn November 14, 2022 08:50

Merge branch 'master' into jakule/auditon-fix-3

8e9867c

jakule enabled auto-merge (squash) November 14, 2022 22:31

jakule merged commit 08863c4 into master Nov 14, 2022

This was referenced Nov 17, 2022

[v11] Enable and fix AuditOn. #18574

Merged

TestIntegrations/AuditOn flakiness #17224

Open

rosstimothy mentioned this pull request Aug 9, 2024

Prevent exiting a session prior to output being consumed #45223

Merged
