-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify tags related functions and some test files to fix flaky assertion errors #51
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kyriechen96 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
9499fff
to
f5b2d69
Compare
aclName: open-access | ||
numShards: 1 | ||
numReplicasPerShard: 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be added whole idea is test should run once you fix the spec that has issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the root reason is that test account is full now. Deleting some clusters in account.
nodeType: cache.t4g.small | ||
aclName: open-access | ||
numShards: 2 | ||
numShards: 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think these changes are needed. cache
is not valid node type for memorydb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally I agree with you, but this causes node limit exceed. It doesn't have this issue before, but this happens recently. It may cause by some code changes on service side. The modification makes test pass now, so we need to investigate the root cause.
@@ -58,12 +58,12 @@ steps: | |||
userNames: | |||
- userone$RANDOM_SUFFIX | |||
- usertwo$RANDOM_SUFFIX | |||
wait: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait:
status:
conditions:
ACK.ResourceSynced:
status: "True"
timeout: 180
Why would hard coding the value give advantage over status change wait? I prefer the previous approach as it helps us catch any issue with status change flipping back and forth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous approach causes random assertion error. My assumption is there is something wrong with previous approach, and I modified to the current approach to check whether it works. And I think we need to investigate the root cause.
f5b2d69
to
696a07a
Compare
02c45db
to
c5c60a0
Compare
@kyriechen96 Please see the error from the tests:
The issue isn't timeouts. The issue seems to be a missing ACL dependency in one of the tests I think? |
We had a discussion of it. This is happened during condition check for ACL creation. I'm fixing it now. |
c5c60a0
to
6a8b7ed
Compare
/retest |
@kyriechen96: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
bba90b9
to
bc6af83
Compare
068e1d0
to
79629a0
Compare
I'm not super familiar with the memorydb snapshotting process. Would you mind explaining this a little more:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff Kyrie!
if hasSameKey { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that you wanna ignore duplicated keys +++
condMsgCurrentlyDeleting = "cluster currently being deleted" | ||
condMsgNoDeleteWhileUpdating = "cluster is being updated. cannot delete" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make messages a bit consistent? cluster currently being updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two variables are not related to flaky tests. I didn't modify or create them in this PR, so I will mark it and change it in a new PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please. The english grammar for these two messages should be improved. cluster is currently being deleted
and cluster is currently being updated, cannot delete
are my suggestions
@@ -15,6 +15,7 @@ package parameter_group | |||
|
|||
import ( | |||
"context" | |||
ackutil "github.com/aws-controllers-k8s/runtime/pkg/util" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it was automatically added there. I will change the order manually.
func equalStrings(a, b *string) bool { | ||
if a == nil { | ||
return b == nil || *b == "" | ||
} | ||
return (*a == "" && b == nil) || *a == *b | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might not panic in the tags case, it could with other fields. Correct implementation here: aws-controllers-k8s/community#1654
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix it for all resource hooks in the new PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to just copy the code from that issue into all of your equalStrings
functions. It would be better if we address it now than leave it for another PR, where we might forget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it for all hook files.
svcsdk "github.com/aws/aws-sdk-go/service/memorydb" | ||
corev1 "k8s.io/api/core/v1" | ||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
|
||
svcapitypes "github.com/aws-controllers-k8s/memorydb-controller/apis/v1alpha1" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
pkg/resource/subnet_group/hooks.go
Outdated
@@ -15,6 +15,7 @@ package subnet_group | |||
|
|||
import ( | |||
"context" | |||
ackutil "github.com/aws-controllers-k8s/runtime/pkg/util" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed import orders for all hooks.
01c8527
to
a38db76
Compare
a38db76
to
f00537c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ready to go now! Running the soak tests will be the real determination of whether all of these tests are solid, but for now this looks good.
I've left one nit that's applicable to all of the <resource>Active
hook methods.
@@ -55,6 +59,14 @@ func (rm *resourceManager) validateUserNeedsUpdate( | |||
return nil, nil | |||
} | |||
|
|||
// userActive returns true when the status of the given User is set to `active` | |||
func (rm *resourceManager) userActive( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: methods returning booleans should start with an article. For example isUserActive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kyriechen96 wants to address this in a seperate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job on this.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: A-Hilaly, kyriechen96, nmvk, RedbackThomson The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…ion errors (aws-controllers-k8s#51) Issue #, if available: MemoryDB Controller has some flaky tests that randomly show assertion error. Root causes: 1. Snapshot test files didn't clean resources which were created from previous failed tests, so eventually test account reached node limit. 2. Delta compare considers order matters, so it considers two tag arrays are different even though they have same elements with different orders. Function of updating tags didn't include this corner case, and it caused infinite update. 3. When creating resources, some resources take a while to finish creation. Function of getting tags got error because it couldn't find resource, and this error covered previous ResourceNotFound error. Hence, reconciler didn't call requeue, because it didn't recognize the new error. 4. Cluster update validation test file has a step to update and validate Snapshot Window (daily time range to auto snapshot). If this step is executing within the time range of new Snapshot Window, update would succeed (ResourceSynced returns true) because two resources match, but cluster starts snapshotting after it's Snapshot Window is updated to the current time. Hence, for next step to update other fields of cluster, new fields couldn't be updated on time, because the status of cluster is snapshotting and cluster cannot be updated. Description of changes: 1. Fix updateTags function and it covers situation of equal tag arrays. (It still wastes time to call sdkUpdate over and over. It is better to modify delta comparison of tags in code-generator. New PR is created (aws-controllers-k8s/community#1658). 2. Change updateTags functions from parameter group and subnet group to match the updateTags functions in other resources. 3. Add a status check for getTags. It is only called when status of resource (cluster, snapshot, acl, user) is active. 4. Increase wait time for the next update step after Snapshot Window update. The new wait time is enough to cover snapshotting process. 5. Use same method to create cluster for snapshot for snapshot_validate_tags.yaml test file. 6. Correct user names and delete unused resources in test files. 7. Modify some test IDs, step IDs, and descriptions which are not appropriate that makes debugging really hard. 8. Fix import order of hook files. 9. Modify equalStrings function. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Issue #, if available:
MemoryDB Controller has some flaky tests that randomly show assertion error.
Root causes:
Description of changes:
acktags.Tag
structs community#1658).By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.