-
Notifications
You must be signed in to change notification settings - Fork 282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Abstract waitForInit to minimize duplication and improve test reliability #1935
Abstract waitForInit to minimize duplication and improve test reliability #1935
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1935 +/- ##
=========================================
Coverage 61.00% 61.01%
- Complexity 3233 3234 +1
=========================================
Files 256 256
Lines 18088 18088
Branches 3224 3224
=========================================
+ Hits 11035 11036 +1
+ Misses 5470 5468 -2
- Partials 1583 1584 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great step forward, I think by making this fix a little more generic we could see the stability of the entire test suite go up instead of just these couple of tests.
//Wait for the security plugin to load roles. | ||
public void waitForInit(Client client) throws Exception { | ||
int maxRetries = 3; | ||
Optional<Exception> retainedException = Optional.empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its cleaner without an optional, for your consideration:
private void waitForHealthyCluster(final Client client) {
final int maxRetries = 3;
for (int attempt = 0; true; attempt++) {
try {
client.admin().cluster().health(new ClusterHealthRequest()).actionGet();
return;
} catch (final Exception ex) {
if(attempt >= maxRetries) {
throw new RuntimeException(ex);
}
}
try {
Thread.sleep(500);
} catch (Exception ex) { /* ignored */}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of the RuntimeException here. Will update.
@@ -135,6 +138,31 @@ protected void setupGenericNodes(List<Settings> nodeOverride, List<Boolean> sslO | |||
clusterConfiguration); | |||
} | |||
|
|||
//Wait for the security plugin to load roles. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use javadoc comment style, /** ... */
@@ -135,6 +138,31 @@ protected void setupGenericNodes(List<Settings> nodeOverride, List<Boolean> sslO | |||
clusterConfiguration); | |||
} | |||
|
|||
//Wait for the security plugin to load roles. | |||
public void waitForInit(Client client) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the throws
cause, this function is easier to use without it
Optional<Exception> retainedException = Optional.empty(); | ||
for (int i = 0; i < maxRetries; i++) { | ||
try { | ||
client.admin().cluster().health(new ClusterHealthRequest()).actionGet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't a value returned that should be checked, could this value be something like not healthy/not up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opted to keep the existing behavior of waitForInit
, but I think there's room for improvement. While making updates to the function I was retaining all thrown exceptions and noticed that on failing tests I would receive a no permissions for [cluster:monitor/health]...
. I believe the implementation did this intentionally because its indicative of security plugin initialization even if it does not explicitly check for status == UP
.
How do you think we should handle missing permissions here?
@@ -67,18 +66,6 @@ public Collection<Object> createComponents(Client client, ClusterService cluster | |||
} | |||
} | |||
|
|||
//Wait for the security plugin to load roles. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use that new method inside of SingleCluster to after initalized(...)
is called? This would effectively make sure all tests start only when the local cluster is fully operational.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In one of the previous iterations I moved this call up to be done directly after initialize
for all tests that use SingleClusterTest
. It introduced test failures, but that was before I came to realize the permissions issue with the healthcheck call in waitForInit
. I'll try moving this up again directly after initialize
to be applicable to all tests.
@@ -41,6 +42,9 @@ public TenantInfoActionTest(){ | |||
public void testTenantInfoAPIAccess() throws Exception { | |||
Settings settings = Settings.builder().put(ConfigConstants.SECURITY_UNSUPPORTED_RESTAPI_ALLOW_SECURITYCONFIG_MODIFICATION, true).build(); | |||
setup(settings); | |||
try (Client tc = getClient()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this should be baked into setup(...)
or deeper, if you call setup I would expect a setup up and ready to use server. What do you think?
0b90eb4
to
a110f18
Compare
…ration load timeout to 10s Signed-off-by: Craig Perkins <cwperx@amazon.com>
86af4c4
to
187a2ef
Compare
@@ -363,13 +363,13 @@ public Map<CType, SecurityDynamicConfiguration<?>> getConfigurationsFromIndex(Co | |||
} else { | |||
LOGGER.debug("security index exists and was created with ES 7 (new layout)"); | |||
} | |||
retVal.putAll(validate(cl.load(configTypes.toArray(new CType[0]), 5, TimeUnit.SECONDS, acceptInvalid), configTypes.size())); | |||
retVal.putAll(validate(cl.load(configTypes.toArray(new CType[0]), 10, TimeUnit.SECONDS, acceptInvalid), configTypes.size())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity, why did this value change to 10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the source of the flakiness in TenantInfoActionTest. We would often receive build output like this:
[2022-07-13T18:01:15,983][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure No shard available for [org.opensearch.action.get.MultiGetShardRequest@143292d5] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,983][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure No shard available for [org.opensearch.action.get.MultiGetShardRequest@143292d5] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,983][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure No shard available for [org.opensearch.action.get.MultiGetShardRequest@143292d5] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,983][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure No shard available for [org.opensearch.action.get.MultiGetShardRequest@143292d5] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,983][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure No shard available for [org.opensearch.action.get.MultiGetShardRequest@143292d5] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,983][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure No shard available for [org.opensearch.action.get.MultiGetShardRequest@143292d5] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,983][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure No shard available for [org.opensearch.action.get.MultiGetShardRequest@143292d5] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,983][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure No shard available for [org.opensearch.action.get.MultiGetShardRequest@143292d5] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,983][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure No shard available for [org.opensearch.action.get.MultiGetShardRequest@143292d5] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,983][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure No shard available for [org.opensearch.action.get.MultiGetShardRequest@143292d5] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,984][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812] Node not connected retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,984][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812] Node not connected retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,984][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812] Node not connected retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,984][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812] Node not connected retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,984][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812] Node not connected retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,984][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812] Node not connected retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,984][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812] Node not connected retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,984][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812] Node not connected retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,984][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812] Node not connected retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:15,984][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812] Node not connected retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:16,343][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812][indices:data/read/mget[shard][s]] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:16,343][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812][indices:data/read/mget[shard][s]] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:16,343][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812][indices:data/read/mget[shard][s]] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:16,343][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812][indices:data/read/mget[shard][s]] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:16,343][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812][indices:data/read/mget[shard][s]] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:16,343][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812][indices:data/read/mget[shard][s]] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:16,343][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812][indices:data/read/mget[shard][s]] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:16,343][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812][indices:data/read/mget[shard][s]] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:16,343][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812][indices:data/read/mget[shard][s]] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
[2022-07-13T18:01:16,343][ERROR][org.opensearch.security.configuration.ConfigurationLoaderSecurity7] Failure [node_utest_n29_fnull_t458578506714_num1][127.0.0.1:6812][indices:data/read/mget[shard][s]] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
**[2022-07-13T18:01:20,215][ERROR][org.opensearch.security.rest.TenantInfoAction] OpenSearchException[java.util.concurrent.TimeoutException: Timeout after 5SECONDS while retrieving configuration for [ROLESMAPPING](index=.opendistro_security)]; nested: TimeoutException[Timeout after 5SECONDS while retrieving configuration for [ROLESMAPPING](index=.opendistro_security)];**
>>>> TenantInfoActionTest testTenantInfoAPIUpdate FAILED due to java.lang.AssertionError: expected:<200> but was:<500>
See the second to last line for the timeout. By increasing the timeout gives a bit more of a buffer for the configuration load to complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay...got it.
@@ -232,7 +216,8 @@ public void testReplication() throws Exception { | |||
} | |||
|
|||
try (Node node = new PluginAwareNode(false, tcSettings, Netty4Plugin.class, OpenSearchSecurityPlugin.class, MockReplicationPlugin.class).start()) { | |||
waitOrThrow(node.client(), "hr-masking"); | |||
waitForInit(node.client()); | |||
node.client().execute(MockReplicationAction.INSTANCE, new MockReplicationRequest("hr-masking")).actionGet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These calls can be extracted into in a separate method:
void mockReplicationRequest(Client client, String replicationFor) {
waitForInit(node.client());
node.client().execute(MockReplicationAction.INSTANCE, new MockReplicationRequest(replicationFor)).actionGet();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I brought back waitOrThrow
and opted to change the implementation of waitOrThrow
to use waitForInit
to cut down our duplication.
Signed-off-by: Craig Perkins <cwperx@amazon.com>
* Abstract waitForInit to AbstractSecurityUnitTest and increase configuration load timeout to 10s * Use waitForInit inside waitOrThrow for CCReplicationTest Signed-off-by: Craig Perkins <cwperx@amazon.com> (cherry picked from commit cb24f6e)
* Abstract waitForInit to AbstractSecurityUnitTest and increase configuration load timeout to 10s * Use waitForInit inside waitOrThrow for CCReplicationTest Signed-off-by: Craig Perkins <cwperx@amazon.com> (cherry picked from commit cb24f6e) Co-authored-by: Craig Perkins <cwperx@amazon.com>
* Abstract waitForInit to AbstractSecurityUnitTest and increase configuration load timeout to 10s * Use waitForInit inside waitOrThrow for CCReplicationTest Signed-off-by: Craig Perkins <cwperx@amazon.com> Signed-off-by: Stephen Crawford <steecraw@amazon.com>
Windows build and test support for 1.3 - Use most recent format of CI workflows from main - Backport #2206 - Supply workarounds for JDK8 incompatible APIs for Encoding / Pattern matching (Thanks @cwperks!) - Backport only freeport logic from #1638 - Backport #1758 - All updates to TestAuditlogImpl.java from main - #2180 - #1935 - #1920 - #1914 - #1829 - And Targeted test updates for ComplianceAuditlogTest and BasicAuditlogTest to keep up with TestAuditlogImpl.java updates Signed-off-by: Peter Nied <petern@amazon.com> Signed-off-by: Stephen Crawford <steecraw@amazon.com> Signed-off-by: Stephen Crawford <65832608+scrawfor99@users.noreply.github.com> Co-authored-by: Stephen Crawford <65832608+scrawfor99@users.noreply.github.com>
Description
waitForInit
was copied in few different tests classes. This PR abstracts it intoSingleClusterTests
and changes it from being recursive to iterative with amaxRetries
of 3. (i.e. waiting a max of 1.5 seconds for security plugin initialization).While abstracting the function I noticed that some of the tests its applied to correctly identify that the plugin is initialized, but swallow an error that the user making the health request does not have the correct permissions. This PR keeps the same behavior.
This also adds
waitForInit
inside of tests inTenantInfoActionTest
Maintenance, Test Reliability
Reduce code duplication and improve reliability
Issues Resolved
Test reliability is impacting CI workflows #1917
Testing
[Please provide details of testing done: unit testing, integration testing and manual testing]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.