Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestHSMMigrate flakiness #14172

Closed
ibeckermayer opened this issue Jul 7, 2022 · 17 comments · Fixed by #32911 or #37241
Closed

TestHSMMigrate flakiness #14172

ibeckermayer opened this issue Jul 7, 2022 · 17 comments · Fixed by #32911 or #37241
Assignees

Comments

@ibeckermayer
Copy link
Contributor

Failure

Link(s) to logs

Relevant snippet

        	Messages:   	error while closing auth2 during test cleanup
        	Test:       	TestHSMMigrate
        	            	context canceled
        	Error:      	Received unexpected error:
        	            				hsm_test.go:826
        	            				hsm_test.go:811
        	            				testing.go:864
        	            				panic.go:500
        	            				testing.go:1433
        	            				testing.go:1203
        	            				testing.go:1034
        	            				hsm_test.go:76
        	            				testing.go:864
        	            				panic.go:500
        	            				testing.go:1187
        	            				testing.go:1203
        	            				testing.go:1034
        	Error Trace:	hsm_test.go:76
    hsm_test.go:76: 
@nklaassen
Copy link
Contributor

fwiw the test cleanup error is a red herring, that happens when the test has already failed for another reason

    hsm_test.go:811: 
        	Error Trace:	hsm_test.go:811
        	            				hsm_test.go:826
        	Error:      	Received unexpected error:
        	            	timed out waiting for proxy to be ready
        	            		context deadline exceeded
        	Test:       	TestHSMMigrate

I've finally managed to repro this off of CI and working on it now

@nklaassen
Copy link
Contributor

failures where the log includes is not in dotted-tri format should be fixed by #14178

@nklaassen
Copy link
Contributor

This is hopefully fixed now, it looks like we had no hits this week. Going to close this issue, please ping me directly if you see a failure.

@zmb3
Copy link
Collaborator

zmb3 commented Aug 14, 2023

@zmb3 zmb3 reopened this Aug 14, 2023
@zmb3
Copy link
Collaborator

zmb3 commented Aug 23, 2023

https://github.com/gravitational/teleport/actions/runs/5954247952/job/16150371290?pr=30916

    hsm_test.go:78: 
        	Error Trace:	/__w/teleport/teleport/integration/hsm/hsm_test.go:78
        	            				/opt/go/src/testing/testing.go:1150
        	            				/opt/go/src/testing/testing.go:1328
        	            				/opt/go/src/testing/testing.go:1570
        	            				/opt/go/src/runtime/panic.go:540
        	            				/opt/go/src/testing/testing.go:980
        	            				/__w/teleport/teleport/integration/hsm/hsm_test.go:836
        	            				/__w/teleport/teleport/integration/hsm/hsm_test.go:879
        	Error:      	Received unexpected error:
        	            	context canceled, context canceled
        	Test:       	TestHSMMigrate
        	Messages:   	error while closing auth2 during test cleanup

@Tener
Copy link
Contributor

Tener commented Aug 25, 2023

    hsm_test.go:78: 
        	Error Trace:	/__w/teleport/teleport/integration/hsm/hsm_test.go:78
        	            				/opt/go/src/testing/testing.go:1150
        	            				/opt/go/src/testing/testing.go:1328
        	            				/opt/go/src/testing/testing.go:1312
        	            				/opt/go/src/runtime/panic.go:540
        	            				/opt/go/src/testing/testing.go:980
        	            				/__w/teleport/teleport/integration/hsm/hsm_test.go:78
        	            				/opt/go/src/testing/testing.go:1150
        	            				/opt/go/src/testing/testing.go:1328
        	            				/opt/go/src/testing/testing.go:1570
        	            				/opt/go/src/runtime/panic.go:540
        	            				/opt/go/src/testing/testing.go:980
        	            				/__w/teleport/teleport/integration/hsm/hsm_test.go:820
        	            				/__w/teleport/teleport/integration/hsm/hsm_test.go:879
        	Error:      	Received unexpected error:
        	            	context canceled, context canceled, context canceled
        	Test:       	TestHSMMigrate
        	Messages:   	error while closing auth1 during test cleanup

--- FAIL: TestHSMMigrate (480.08s)

@zmb3
Copy link
Collaborator

zmb3 commented Aug 29, 2023

@GavinFrazar
Copy link
Contributor

https://github.com/gravitational/teleport/actions/runs/6239221136/job/16936696327

    hsm_test.go:836: 
        	Error Trace:	/__w/teleport/teleport/integration/hsm/hsm_test.go:836
        	            				/__w/teleport/teleport/integration/hsm/hsm_test.go:879
        	Error:      	Received unexpected error:
        	            	timed out waiting for proxy to be ready
        	            		context deadline exceeded
        	Test:       	TestHSMMigrate

@jakule
Copy link
Contributor

jakule commented Sep 27, 2023

Another one https://github.com/gravitational/teleport/actions/runs/6331609086/job/17196519623?pr=32681

    hsm_test.go:828: 
        	Error Trace:	/__w/teleport/teleport/integration/hsm/hsm_test.go:828
        	            				/__w/teleport/teleport/integration/hsm/hsm_test.go:851
        	Error:      	Received unexpected error:
        	            	timed out waiting for proxy to be ready
        	            		context deadline exceeded
        	Test:       	TestHSMMigrate
time="2023-09-27T21:37:20Z" level=warning msg="Failed to forward connection: context is closing." trace.component=loadbalancer trace.fields="map[listen:8ce2fe8a89f0:0]"
{"caller":"service/supervisor.go:388","component":"proc:17","event":"TeleportExit","level":"debug","message":"Broadcasting event.","pid":"34558.17","timestamp":"2023-09-27T21:37:20Z"}
{"caller":"service/connect.go:699","component":"proc:17","level":"warning","message":"Sync rotation state cycle failed. Retrying in ~10s","pid":"34558.17","timestamp":"2023-09-27T21:37:20Z"}
{"caller":"service/supervisor.go:286","component":"proc:17","error":"context canceled","level":"warning","message":"Teleport process has exited with error.","pid":"34558.17","service":"auth.server_info","timestamp":"2023-09-27T21:37:20Z"}
{"caller":"service/supervisor.go:286","component":"proc:16","error":"context canceled","level":"warning","message":"Teleport process has exited with error.","pid":"34558.16","service":"auth.server_info","timestamp":"2023-09-27T21:37:20Z"}
{"caller":"service/connect.go:699","component":"proc:16","level":"warning","message":"Sync rotation state cycle failed. Retrying in ~10s","pid":"34558.16","timestamp":"2023-09-27T21:37:20Z"}
{"caller":"service/supervisor.go:388","component":"proc:16","event":"TeleportExit","level":"debug","message":"Broadcasting event.","pid":"34558.16","timestamp":"2023-09-27T21:37:20Z"}
{"caller":"service/supervisor.go:388","component":"proc:17","event":"TeleportExit","level":"debug","message":"Broadcasting event.","pid":"34558.17","timestamp":"2023-09-27T21:37:20Z"}
{"caller":"service/supervisor.go:257","component":"proc:17","level":"debug","message":"Service is completed and removed.","pid":"34558.17","service":"auth.heartbeat","timestamp":"2023-09-27T21:37:20Z"}
    hsm_test.go:78: 
        	Error Trace:	/__w/teleport/teleport/integration/hsm/hsm_test.go:78
        	            				/opt/go/src/testing/testing.go:1169
        	            				/opt/go/src/testing/testing.go:1347
        	            				/opt/go/src/testing/testing.go:1589
        	            				/opt/go/src/runtime/panic.go:541
        	            				/opt/go/src/testing/testing.go:999
        	            				/__w/teleport/teleport/integration/hsm/hsm_test.go:828
        	            				/__w/teleport/teleport/integration/hsm/hsm_test.go:851
        	Error:      	Received unexpected error:
        	            	context canceled, context canceled, context canceled
        	Test:       	TestHSMMigrate
        	Messages:   	error while closing auth1 during test cleanup

@zmb3
Copy link
Collaborator

zmb3 commented Jan 19, 2024

FYI @nklaassen saw this one again today:

        	Error Trace:	/__w/teleport/teleport/integration/hsm/helpers.go:64
        	            				/opt/go/src/testing/testing.go:1169
        	            				/opt/go/src/testing/testing.go:1347
        	            				/opt/go/src/testing/testing.go:1331
        	            				/opt/go/src/runtime/panic.go:541
        	            				/opt/go/src/testing/testing.go:999
        	            				/__w/teleport/teleport/integration/hsm/helpers.go:64
        	            				/opt/go/src/testing/testing.go:1169
        	            				/opt/go/src/testing/testing.go:1347
        	            				/opt/go/src/testing/testing.go:1589
        	            				/opt/go/src/runtime/panic.go:541
        	            				/opt/go/src/testing/testing.go:999
        	            				/__w/teleport/teleport/integration/hsm/hsm_test.go:544
        	            				/__w/teleport/teleport/integration/hsm/hsm_test.go:595
        	Error:      	Received unexpected error:
        	            	context canceled, context canceled, context canceled
        	Test:       	TestHSMMigrate
        	Messages:   	error while closing auth1 during test cleanup

https://github.com/gravitational/teleport/actions/runs/7588652574/job/20671669318

@zmb3 zmb3 reopened this Jan 19, 2024
@nklaassen
Copy link
Contributor

that cleanup error is a red herring, the original failure is:

    hsm_test.go:544: 
        	Error Trace:	/__w/teleport/teleport/integration/hsm/hsm_test.go:544
        	            				/__w/teleport/teleport/integration/hsm/hsm_test.go:595
        	Error:      	Received unexpected error:
        	            	timed out waiting for proxy gen 5 to be ready
        	            		context deadline exceeded
        	Test:       	TestHSMMigrate

I'll look into it

@zmb3
Copy link
Collaborator

zmb3 commented Jan 23, 2024

Another: https://github.com/gravitational/teleport/actions/runs/7629026500/job/20781482241

    hsm_test.go:570: 
        	Error Trace:	/__w/teleport/teleport/integration/hsm/hsm_test.go:570
        	            				/__w/teleport/teleport/integration/hsm/hsm_test.go:619
        	Error:      	Received unexpected error:
        	            	timed out waiting for proxy gen 7 to be ready
        	            		context deadline exceeded
        	Test:       	TestHSMMigrate

@zmb3
Copy link
Collaborator

zmb3 commented Jan 23, 2024

Brutal: https://github.com/gravitational/teleport/actions/runs/7629852042/job/20784246262

@ptgott
Copy link
Contributor

ptgott commented Jan 25, 2024

=== Failed
=== FAIL: integration/hsm  (0.00s)
panic: test timed out after 30m0s
running tests:
	TestHSMMigrate (29m35s)

https://github.com/gravitational/teleport/actions/runs/7658978297/job/20873085411#step:6:3993

@zmb3
Copy link
Collaborator

zmb3 commented Feb 20, 2024

https://github.com/gravitational/teleport/actions/runs/7975999279/job/21775537866

hsm_test.go:570: 
        	Error Trace:	/__w/teleport/teleport/integration/hsm/hsm_test.go:570
        	Error:      	"[{{cluster_alert  v1 {ca-key-types/1e83a6cf-5b0d-4c21-ab5d-06333b984cd6   map[teleport.internal/alert-on-login:yes teleport.internal/alert-verb-permit:cert_authority:update] 2024-02-20 03:44:50.750241474 +0000 UTC %!s(int64=0)  {}  %!s(int32=0)} {}  %!s(int32=0)} {MEDIUM Auth Service 1e83a6cf-5b0d-4c21-ab5d-06333b984cd6 is configured to use PKCS#11 HSM keys created by 1e83a6cf-5b0d-4c21-ab5d-06333b984cd6, but the following CAs do not contain any keys of that type: [oidc_idp host user db db_client openssh jwt saml_idp]. The Auth Service will continue signing certificates with raw software keys. These CAs must be rotated to begin using the configured key type. See https://goteleport.com/docs/management/operations/ca-rotation/ 2024-02-20 15:40:58.2[2468](https://github.com/gravitational/teleport/actions/runs/7975999279/job/21775537866#step:6:2469)2224 +0000 UTC {}  %!s(int32=0)} {}  %!s(int32=0)}]" should have 2 item(s), but has 1
        	Test:       	TestHSMMigrate

@zmb3 zmb3 reopened this Feb 20, 2024
@ravicious
Copy link
Member

https://github.com/gravitational/teleport/actions/runs/9954455637/job/27500124402#step:6:4081

    hsm_test.go:461: 
        	Error Trace:	/__w/teleport/teleport/integration/hsm/hsm_test.go:461
        	Error:      	Received unexpected error:
        	            	initialization failed
        	            		context deadline exceeded
        	Test:       	TestHSMMigrate
{"level":"warn","ts":"2024-07-16T09:47:06.240098Z","logger":"etcd-client","caller":"v3@v3.5.14/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001a74b40/etcd0:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp: lookup etcd0 on 127.0.0.11:53: server misbehaving\""}
    hsm_test.go:91: 
        	Error Trace:	/__w/teleport/teleport/integration/hsm/hsm_test.go:91
        	            				/opt/go/src/testing/testing.go:1175
        	            				/opt/go/src/testing/testing.go:1353
        	            				/opt/go/src/testing/testing.go:1683
        	            				/opt/go/src/runtime/panic.go:626
        	            				/opt/go/src/testing/testing.go:1005
        	            				/__w/teleport/teleport/integration/hsm/hsm_test.go:461
        	Error:      	Received unexpected error:
        	            	context deadline exceeded
        	Test:       	TestHSMMigrate

@nklaassen
Copy link
Contributor

edoardo refactored this test to no longer depend on etcd, i'll close this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants