Fix #7972 SearchBackpressureIT flaky tests #8063

Merged (18 commits) on Jun 22, 2023

Conversation

stephen-crawford
Contributor

@stephen-crawford stephen-crawford commented Jun 14, 2023

Description

This PR is the second attempt at fixing the flaky tests in **/SearchBackpressureIT. The two tests that exercise high CPU usage were prone to failure due to thread concurrency issues: the assignment of the CancellableTask values was not synchronized across the nodes.

To resolve the issue, I added a small static class, as suggested by @ketanv3, and stored it in an atomic reference through the SetOnce class. This allows the class variables to be assigned atomically without requiring a synchronized block around the cancel(reason) method.
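
The set-once idea described above can be sketched with the JDK alone. This is a minimal illustration, not the code merged in this PR: `SketchCancellableTask`, `CancelledInfo`, `onCancelledCalls`, and `CancelRaceDemo` are hypothetical names, and `AtomicReference.compareAndSet` stands in for the SetOnce class so the block has no external dependencies.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a cancellable task; not the actual OpenSearch class.
class SketchCancellableTask {

    // Immutable holder, so the reason and the timestamp are published together.
    static final class CancelledInfo {
        final String reason;
        final long startTimeNanos;

        CancelledInfo(String reason) {
            this.reason = reason;
            this.startTimeNanos = System.nanoTime();
        }
    }

    // Stays null until the first successful cancel and never changes afterwards.
    private final AtomicReference<CancelledInfo> cancelledInfo = new AtomicReference<>();

    // Counts how often the (assumed) onCancelled() hook would have fired.
    final AtomicInteger onCancelledCalls = new AtomicInteger();

    // Returns true only for the single caller that wins the race.
    boolean cancel(String reason) {
        if (cancelledInfo.compareAndSet(null, new CancelledInfo(reason))) {
            onCancelledCalls.incrementAndGet();
            return true;
        }
        return false;
    }

    boolean isCancelled() {
        return cancelledInfo.get() != null;
    }

    String getReasonCancelled() {
        CancelledInfo info = cancelledInfo.get();
        return info == null ? null : info.reason;
    }
}

class CancelRaceDemo {

    // Fires `threads` concurrent cancel calls and returns how many of them won.
    static int raceCancels(SketchCancellableTask task, int threads) {
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final String reason = "reason-" + i;
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // line all threads up before racing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (task.cancel(reason)) {
                    winners.incrementAndGet();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return winners.get();
    }

    public static void main(String[] args) {
        SketchCancellableTask task = new SketchCancellableTask();
        int winners = raceCancels(task, 8);
        System.out.println("winners=" + winners + ", reason=" + task.getReasonCancelled());
    }
}
```

Because the reason and the timestamp live in one immutable object behind a single reference, a reader sees either no cancellation or a complete one, and no lock is needed around `cancel(reason)`.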

After the change, I ran the test runner 10 times with no failures, so the issue appears to be fixed.

One note about this test setup: I am not sure we want tests in this vein. The way the tests are written, we end up with non-deterministic behavior. For instance, we set the HIGH_CPU tests to run 1000 iterations, but that is an arbitrary choice of iteration count. On a more or less powerful processor this could produce different results. This is especially true across different architectures, which could have stack unwinding or other features.
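
To illustrate the concern, here is a rough sketch contrasting a fixed iteration count with a workload that runs until it is told to stop. It is not taken from SearchBackpressureIT: `SketchWorkload`, `spinFixed`, `spinUntil`, and `runUntilStopped` are hypothetical names, and the stop flag is an assumed stand-in for whatever signal a real test would use (for example, task cancellation).

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical workload helpers; not part of the OpenSearch test code.
class SketchWorkload {

    // Fixed-count variant: the result is deterministic, but how long it
    // occupies the CPU depends entirely on the machine running it.
    static long spinFixed(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += i * 31L;
        }
        return acc;
    }

    // Flag-driven variant: keeps working until the caller sets the flag,
    // so the test, not the hardware, decides when the work ends.
    static long spinUntil(AtomicBoolean stop) {
        long rounds = 0;
        do {
            rounds++;
        } while (!stop.get());
        return rounds;
    }

    // Runs the flag-driven variant on a worker thread, then stops it.
    static long runUntilStopped() {
        AtomicBoolean stop = new AtomicBoolean(false);
        long[] result = new long[1];
        Thread worker = new Thread(() -> result[0] = spinUntil(stop));
        worker.start();
        stop.set(true); // in a real test this would follow the assertion of interest
        try {
            worker.join(); // join makes the worker's write to result visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }
}
```

With the flag-driven shape, a faster or slower processor changes how many rounds complete but not whether the workload is still running when the test checks it.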

Related Issues

Resolves #7972

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Stephen Crawford <steecraw@amazon.com>
Signed-off-by: Stephen Crawford <steecraw@amazon.com>
Signed-off-by: Stephen Crawford <steecraw@amazon.com>

@stephen-crawford
Contributor Author

Failure does not seem related:


FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':test:fixtures:krb5kdc-fixture:composeBuild'.
> Exit-code 1 when calling /usr/bin/docker-compose, stdout: Step 1/9 : FROM ubuntu:14.04
  14.04: Pulling from library/ubuntu
  Digest: sha256:64483f3496c1373bfd55348e88694d1c4d0c9b660dee6bfef5e12f43b9933b30
  Status: Downloaded newer image for ubuntu:14.04
   ---> 13b66b487594
  Step 2/9 : RUN apt update -y
   ---> Running in 0ebbcd3251a0
  WARNING: apt does not have a stable CLI interface yet. Use with caution in scripts.
  

@owaiskazi19
Member

@scrawfor99 can you rebase your branch with latest main and push again?

Signed-off-by: Stephen Crawford <steecraw@amazon.com>
stephen-crawford and others added 2 commits June 16, 2023 14:33
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationIT.testScrollCreatedOnReplica

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testIndexDeletionDuringSnapshotCreationInQueue
      1 org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.index.ShardIndexingPressureIT.testShardIndexingPressureTrackingDuringBulkWrites

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      3 org.opensearch.remotestore.RemoteStoreRefreshListenerIT.testRemoteRefreshRetryOnFailure
      2 org.opensearch.cluster.service.MasterServiceTests.classMethod
      2 org.opensearch.cluster.service.MasterServiceTests.classMethod
      1 org.opensearch.cluster.service.MasterServiceTests.testThrottlingForMultipleTaskTypes
      1 org.opensearch.cluster.service.MasterServiceTests.testThrottlingForMultipleTaskTypes

@stephen-crawford
Contributor Author

@dbwiddis, @reta, or @owaiskazi19 would any of you be able to merge this? Thank you.

.idea/vcs.xml (outdated review thread, resolved)
Signed-off-by: Stephen Crawford <steecraw@amazon.com>
Signed-off-by: Stephen Crawford <steecraw@amazon.com>
@owaiskazi19
Member

@scrawfor99 do you think we can document the test setup you used to reproduce the flaky tests somewhere? I'm not sure of the right place, probably TESTING.md. Let's see if folks agree.

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testDropPrimaryDuringReplication

@stephen-crawford
Contributor Author

Hi @owaiskazi19, I can definitely add something, but to be honest it was not very scientific... I use IntelliJ, so I just ran the tests a handful of times with the IDE testing framework. @ketanv3 provided quite a bit of insight here as someone who understands the use case and the components around these tests much better than I do.

@stephen-crawford
Contributor Author

stephen-crawford commented Jun 21, 2023

@dbwiddis @reta, think we could get this merged? :)

@reta reta merged commit 63dc6aa into opensearch-project:main Jun 22, 2023
@reta reta added the backport 2.x Backport to 2.x branch label Jun 22, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jun 22, 2023
* fix thread issue

Signed-off-by: Stephen Crawford <steecraw@amazon.com>

* fix thread issue

Signed-off-by: Stephen Crawford <steecraw@amazon.com>

* Fix thresholds

Signed-off-by: Stephen Crawford <steecraw@amazon.com>

* Swap to object based

Signed-off-by: Stephen Crawford <steecraw@amazon.com>

* Spotless

Signed-off-by: Stephen Crawford <steecraw@amazon.com>

* Swap to preserve nulls

Signed-off-by: Stephen Crawford <steecraw@amazon.com>

* Spotless

Signed-off-by: Stephen Crawford <steecraw@amazon.com>

* Resolve npe

Signed-off-by: Stephen Crawford <steecraw@amazon.com>

* remove final declerations

Signed-off-by: Stephen Crawford <steecraw@amazon.com>

* spotless

Signed-off-by: Stephen Crawford <steecraw@amazon.com>

* add annotations

Signed-off-by: Stephen Crawford <steecraw@amazon.com>

* push to rerun tests

Signed-off-by: Stephen Crawford <steecraw@amazon.com>

* Fix idea

Signed-off-by: Stephen Crawford <steecraw@amazon.com>

* Fix idea

Signed-off-by: Stephen Crawford <steecraw@amazon.com>

---------

Signed-off-by: Stephen Crawford <steecraw@amazon.com>
(cherry picked from commit 63dc6aa)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@stephen-crawford stephen-crawford deleted the backpressureFlaky branch June 22, 2023 13:08
reta pushed a commit that referenced this pull request Jun 22, 2023
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
gaiksaya pushed a commit to gaiksaya/OpenSearch that referenced this pull request Jun 26, 2023
…rch-project#8063) (opensearch-project#8217)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
imRishN pushed a commit to imRishN/OpenSearch that referenced this pull request Jun 27, 2023
…rch-project#8063)

Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…rch-project#8063)

Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Labels
backport 2.x Backport to 2.x branch skip-changelog
4 participants