securecomms: TestSshProxyReverse test is flakey #2153

Open
stevenhorsman opened this issue Nov 13, 2024 · 0 comments · May be fixed by #2154

stevenhorsman commented Nov 13, 2024

In the last few weeks I have often seen the CAA build job fail, e.g. https://github.com/confidential-containers/cloud-api-adaptor/actions/runs/11819549925/job/32929722455?pr=2146, due to an issue in the TestSshProxyReverse test:

=== RUN   TestSshProxyReverse
2024/11/13 14:44:54 [secure-comms] Attestation phase: peer reported phase Attestation
2024/11/13 14:44:54 [secure-comms] Attestation phase: peer reported phase Attestation
2024/11/13 14:44:54 [secure-comms] Inbound listening to port 7011 in namespace 
2024/11/13 14:44:54 [secure-comms] Attestation phase: AddInbound: XYZ
HttpClient start : http://127.0.0.1:7011/
HttpClient sending req: http://127.0.0.1:7011/
2024/11/13 14:44:54 [secure-comms] Attestation phase: Inbound accept: XYZ
2024/11/13 14:44:54 [secure-comms] Attestation phase: NewSshPeer - peer requested a tunnel channel for XYZ
2024/11/13 14:44:54 [secure-comms] Attestation phase: NewInboundInstance OpenChannel opening tunnel for: XYZ
2024/11/13 14:44:54 [secure-comms] Outbound XYZ accept dial address 127.0.0.1:7001 err: dial tcp 127.0.0.1:7001: connect: connection refused - closing channel
2024/11/13 14:44:54 [secure-comms] Attestation phase: Inbound XYZ channelReqs closed
HttpClient http://127.0.0.1:7011/ Error Get "http://127.0.0.1:7011/": EOF
    sshproxy_test.go:230: Failed - not successful
2024/11/13 14:44:54 [secure-comms] Attestation phase: peer reported it is upgrading to Kubernetes phase
2024/11/13 14:44:54 [secure-comms] Attestation phase: peer done by >>> Test Finish <<<
2024/11/13 14:44:54 [secure-comms] Attestation phase: peer done by >>> chans closed <<<
--- FAIL: TestSshProxyReverse (1.63s)

and in https://github.com/confidential-containers/cloud-api-adaptor/actions/runs/11819549925/job/32930561772?pr=2146

=== RUN   TestSshProxyWithNamespace
2024/11/13 14:58:28 [secure-comms] Attestation phase: peer reported phase Attestation
2024/11/13 14:58:28 [secure-comms] Attestation phase: peer reported phase Attestation
2024/11/13 14:58:28 [secure-comms] Inbound listening to port 7010 in namespace a64934bc-7bb7-4e83-8ad7-794edd3a1f55
2024/11/13 14:58:28 [secure-comms] Attestation phase: AddInbound: ABC
HttpClient start : http://127.0.0.1:7010/ in namepspace: /run/netns/a64934bc-7bb7-4e83-8ad7-794edd3a1f55
HttpClient dialing req: 127.0.0.1:7010 in namepspace: /run/netns/a64934bc-7bb7-4e83-8ad7-794edd3a1f55
2024/11/13 14:58:28 [secure-comms] Attestation phase: Inbound accept: ABC
2024/11/13 14:58:28 [secure-comms] Attestation phase: NewInboundInstance OpenChannel opening tunnel for: ABC
2024/11/13 14:58:28 [secure-comms] Attestation phase: NewSshPeer - peer requested a tunnel channel for ABC
2024/11/13 14:58:28 [secure-comms] Outbound ABC accept dial address 127.0.0.1:7020 err: dial tcp 127.0.0.1:7020: connect: connection refused - closing channel
2024/11/13 14:58:28 [secure-comms] Attestation phase: Inbound ABC channelReqs closed
HttpClient http://127.0.0.1:7010/ Get Error Get "http://127.0.0.1:7010/": EOF
    sshproxy_test.go:187: Failed - not successful
--- FAIL: TestSshProxyWithNamespace (2.23s)

These failures are starting to slow down PRs, as multiple re-tests are sometimes needed, so we need to get these tests fixed.

stevenhorsman added the bug (Something isn't working) label Nov 13, 2024
davidhadas pushed a commit to davidhadas/cloud-api-adaptor that referenced this issue Nov 14, 2024
Fixes: confidential-containers#2153

Separate ListenAndServe() of test servers to perform Listen() in the
same thread as used by the clients that rely on the server port to be
open.

Move to reuse tuntest network namespace test code rather than
maintaining separate network namespace test code for SecureComms.

Signed-off-by: David Hadas <davidh@il.ibm.com>
davidhadas linked a pull request Nov 14, 2024 that will close this issue