Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

race detector makes TestGenerateKey flaky in pkg/server/plugin/keymanager/awskms #2714

Closed
zmt opened this issue Jan 31, 2022 · 3 comments · Fixed by #2855
Closed

race detector makes TestGenerateKey flaky in pkg/server/plugin/keymanager/awskms #2714

zmt opened this issue Jan 31, 2022 · 3 comments · Fixed by #2855
Labels
help wanted Issues with this label are ready to start work but are in need of someone to do it

Comments

@zmt
Copy link
Contributor

zmt commented Jan 31, 2022

  • Version: 2067119
  • Platform: Linux [redacted] 5.4.129+ #1 SMP Wed Aug 18 19:58:18 PDT 2021 x86_64 GNU/Linux
  • Subsystem: AWSKMS Keymanager Plugin

The TestGenerateKey in pkg/server/plugin/keymanager/awskms fails at an unacceptable rate when run under the race detector. No specific data race is detected, but the additional overhead of enabling the race detector seems to trigger failures in some time-dependent tests. It tends to fail in a few ways:

Example 1:

% go test -race -count 100 -failfast -run TestGenerateKey ./pkg/server/plugin/keymanager/awskms
--- FAIL: TestGenerateKey (0.40s)
    --- FAIL: TestGenerateKey/invalid_arn_error (0.03s)
        logs.go:46:
            	Error Trace:	logs.go:46
            	            				awskms_test.go:751
            	Error:      	Not equal:
            	            	expected: []spiretest.LogEntry{spiretest.LogEntry{Level:0x2, Message:"Failed to schedule key deletion", Data:logrus.Fields{"key_arn":"arn:aws:kms:region:1234:key/abcd-fghi", "reason":"Invalid ARN"}}}
            	            	actual  : []spiretest.LogEntry{spiretest.LogEntry{Level:0x5, Message:"Key enqueued for deletion", Data:logrus.Fields{"key_arn":"arn:aws:kms:region:1234:key/abcd-fghi"}}}

            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -2,7 +2,6 @@
            	            	  (spiretest.LogEntry) {
            	            	-  Level: (logrus.Level) 2,
            	            	-  Message: (string) (len=31) "Failed to schedule key deletion",
            	            	-  Data: (logrus.Fields) (len=2) {
            	            	-   (string) (len=7) "key_arn": (string) (len=37) "arn:aws:kms:region:1234:key/abcd-fghi",
            	            	-   (string) (len=6) "reason": (string) (len=11) "Invalid ARN"
            	            	+  Level: (logrus.Level) 5,
            	            	+  Message: (string) (len=25) "Key enqueued for deletion",
            	            	+  Data: (logrus.Fields) (len=1) {
            	            	+   (string) (len=7) "key_arn": (string) (len=37) "arn:aws:kms:region:1234:key/abcd-fghi"
            	            	   }
            	Test:       	TestGenerateKey/invalid_arn_error
            	Messages:   	unexpected logs
FAIL
FAIL	github.com/spiffe/spire/pkg/server/plugin/keymanager/awskms	20.032s
FAIL

Example 2:

% go test -race -count 100 -failfast -run TestGenerateKey ./pkg/server/plugin/keymanager/awskms
--- FAIL: TestGenerateKey (0.46s)
    --- FAIL: TestGenerateKey/schedule_key_deletion_error (0.04s)
        logs.go:46:
            	Error Trace:	logs.go:46
            	            				awskms_test.go:751
            	Error:      	Not equal:
            	            	expected: []spiretest.LogEntry{spiretest.LogEntry{Level:0x2, Message:"It was not possible to schedule key for deletion", Data:logrus.Fields{"key_arn":"arn:aws:kms:region:1234:key/abcd-fghi", "reason":"schedule key deletion error"}}, spiretest.LogEntry{Level:0x5, Message:"Key re-enqueued for deletion", Data:logrus.Fields{"key_arn":"arn:aws:kms:region:1234:key/abcd-fghi"}}}
            	            	actual  : []spiretest.LogEntry{spiretest.LogEntry{Level:0x5, Message:"Key enqueued for deletion", Data:logrus.Fields{"key_arn":"arn:aws:kms:region:1234:key/abcd-fghi"}}, spiretest.LogEntry{Level:0x5, Message:"Key re-enqueued for deletion", Data:logrus.Fields{"key_arn":"arn:aws:kms:region:1234:key/abcd-fghi"}}}

            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -2,7 +2,6 @@
            	            	  (spiretest.LogEntry) {
            	            	-  Level: (logrus.Level) 2,
            	            	-  Message: (string) (len=48) "It was not possible to schedule key for deletion",
            	            	-  Data: (logrus.Fields) (len=2) {
            	            	-   (string) (len=7) "key_arn": (string) (len=37) "arn:aws:kms:region:1234:key/abcd-fghi",
            	            	-   (string) (len=6) "reason": (string) (len=27) "schedule key deletion error"
            	            	+  Level: (logrus.Level) 5,
            	            	+  Message: (string) (len=25) "Key enqueued for deletion",
            	            	+  Data: (logrus.Fields) (len=1) {
            	            	+   (string) (len=7) "key_arn": (string) (len=37) "arn:aws:kms:region:1234:key/abcd-fghi"
            	            	   }
            	Test:       	TestGenerateKey/schedule_key_deletion_error
            	Messages:   	unexpected logs
FAIL
FAIL	github.com/spiffe/spire/pkg/server/plugin/keymanager/awskms	25.001s
FAIL

These do not occur when race detector is not enabled:

% go test -count 100 -failfast -run TestGenerateKey ./pkg/server/plugin/keymanager/awskms
ok  	github.com/spiffe/spire/pkg/server/plugin/keymanager/awskms	3.630s
@zmt
Copy link
Contributor Author

zmt commented Jan 31, 2022

Also see #2379 & #2705.

@zmt
Copy link
Contributor Author

zmt commented Mar 11, 2022

I can no longer reproduce this flakiness. I can make it fail by exceeding the goroutine limit in the race detector (which in turn is a limit in LLVM. See golang/go#38184):

% go test -race -failfast -count=100 -run=TestGenerateKey ./pkg/server/plugin/keymanager/awskms
race: limit on 8128 simultaneously alive goroutines is exceeded, dying
FAIL	github.com/spiffe/spire/pkg/server/plugin/keymanager/awskms	40.133s
FAIL
% go test -race -failfast -count=75 -run=TestGenerateKey ./pkg/server/plugin/keymanager/awskms
ok  	github.com/spiffe/spire/pkg/server/plugin/keymanager/awskms	29.251s

@zmt zmt closed this as completed Mar 11, 2022
@rturner3
Copy link
Collaborator

This failed in a different way for me in a CI run with race detection enabled:

--- FAIL: TestGenerateKey (0.15s)
    --- FAIL: TestGenerateKey/invalid_key_state_error (0.01s)
        logs.go:46: 
            	Error Trace:	logs.go:46
            	            				awskms_test.go:751
            	Error:      	Not equal: 
            	            	expected: []spiretest.LogEntry{spiretest.LogEntry{Level:0x2, Message:"Failed to schedule key deletion", Data:logrus.Fields{"key_arn":"arn:aws:kms:region:1234:key/abcd-fghi", "reason":"Key was on invalid state for deletion"}}}
            	            	actual  : []spiretest.LogEntry{spiretest.LogEntry{Level:0x5, Message:"Key enqueued for deletion", Data:logrus.Fields{"key_arn":"arn:aws:kms:region:1234:key/abcd-fghi"}}}
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -2,7 +2,6 @@
            	            	  (spiretest.LogEntry) {
            	            	-  Level: (logrus.Level) 2,
            	            	-  Message: (string) (len=31) "Failed to schedule key deletion",
            	            	-  Data: (logrus.Fields) (len=2) {
            	            	-   (string) (len=7) "key_arn": (string) (len=37) "arn:aws:kms:region:1234:key/abcd-fghi",
            	            	-   (string) (len=6) "reason": (string) (len=37) "Key was on invalid state for deletion"
            	            	+  Level: (logrus.Level) 5,
            	            	+  Message: (string) (len=25) "Key enqueued for deletion",
            	            	+  Data: (logrus.Fields) (len=1) {
            	            	+   (string) (len=7) "key_arn": (string) (len=37) "arn:aws:kms:region:1234:key/abcd-fghi"
            	            	   }
            	Test:       	TestGenerateKey/invalid_key_state_error
            	Messages:   	unexpected logs
FAIL
FAIL	github.com/spiffe/spire/pkg/server/plugin/keymanager/awskms	2.031s

Reopening the issue since this test still seems to be racy.

@rturner3 rturner3 reopened this Mar 15, 2022
@rturner3 rturner3 added the help wanted Issues with this label are ready to start work but are in need of someone to do it label Mar 15, 2022
azdagron added a commit to azdagron/spire that referenced this issue Mar 16, 2022
Also fixes a bug in the AssertLogsContainEntries that did not normalize
the log entry data fields.

Fixes: spiffe#2714

Signed-off-by: Andrew Harding <aharding@vmware.com>
azdagron added a commit that referenced this issue Mar 16, 2022
Also fixes a bug in the AssertLogsContainEntries that did not normalize
the log entry data fields.

Fixes: #2714

Signed-off-by: Andrew Harding <aharding@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Issues with this label are ready to start work but are in need of someone to do it
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants