@atris
Contributor

atris commented Aug 14, 2025

The test had a race condition where threads were calling start() twice
on the CyclicBarrier - once inside getStats() and once from the main
thread. This caused inconsistent synchronization and intermittent failures.

Fixed by separating barrier synchronization from stats retrieval. Threads
now explicitly call start() to wait on the barrier, then call getStatsOnly()
to retrieve stats without additional synchronization.
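A minimal sketch of the pattern described above, assuming the barrier's parties are the worker threads only. The class `StatsBarrierSketch`, the counter, and `runWorkers` are hypothetical stand-ins for illustration; only the names `start()` and `getStatsOnly()` come from this PR, and their real implementations in the test may differ.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the test helper; not the actual OpenSearch test code.
class StatsBarrierSketch {
    private final CyclicBarrier barrier;
    private final AtomicInteger statsReads = new AtomicInteger();

    StatsBarrierSketch(int parties) {
        this.barrier = new CyclicBarrier(parties);
    }

    // Synchronization only: blocks until all parties have arrived.
    void start() throws InterruptedException, BrokenBarrierException {
        barrier.await();
    }

    // Retrieval only: never touches the barrier, so it cannot add an extra await.
    int getStatsOnly() {
        return statsReads.incrementAndGet(); // placeholder for the real stats lookup
    }

    // Runs the given number of workers and returns how many stats reads completed.
    static int runWorkers(int workers) {
        StatsBarrierSketch sketch = new StatsBarrierSketch(workers);
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    sketch.start();        // exactly one await per thread
                    sketch.getStatsOnly(); // then read, with no second await
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sketch.statsReads.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(6)); // prints 6
    }
}
```

Keeping the await in `start()` and out of the retrieval method means the number of barrier arrivals per thread is visible at the call site, which is the property the fix relies on.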

Fixes #18682

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

❌ Gradle check result for 90e2d75: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@andrross
Member

@atris Can you rebase to get the fix for the failing tests?

  The test had a race condition where threads were calling start() twice
  on the CyclicBarrier - once inside getStats() and once from the main
  thread. This caused inconsistent synchronization and intermittent failures.

  Fixed by separating barrier synchronization from stats retrieval. Threads
  now explicitly call start() to wait on the barrier, then call getStatsOnly()
  to retrieve stats without additional synchronization.

  Fixes opensearch-project#18682

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

  #The test used a CyclicBarrier with 7 parties (6 threads + main thread)
  #which created unnecessary complexity. The main thread's participation in
  #the barrier was not needed and could cause timing issues.

  #Fixed by having only the 6 worker threads participate in the barrier for
  #synchronization. The main thread now simply starts the threads and waits
  #for them to complete via join(). This simplifies the synchronization model
  #and eliminates potential race conditions.

  #Fixes opensearch-project#18682
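
The synchronization model in this commit message can be sketched as follows, assuming 6 worker threads as stated above. The class name `WorkerOnlyBarrierSketch` and the `released` counter are hypothetical and exist only for illustration; this is not the code from the test itself.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: only the workers are barrier parties; main just joins.
class WorkerOnlyBarrierSketch {
    static final int WORKERS = 6;

    // Returns how many workers got past the barrier.
    static int runOnce() {
        // Parties == WORKERS (not WORKERS + 1): the main thread never awaits.
        CyclicBarrier barrier = new CyclicBarrier(WORKERS);
        AtomicInteger released = new AtomicInteger();
        Thread[] workers = new Thread[WORKERS];
        for (int i = 0; i < WORKERS; i++) {
            workers[i] = new Thread(() -> {
                try {
                    barrier.await();            // all workers are released together
                    released.incrementAndGet(); // work that should start concurrently
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        // The main thread waits for completion via join() instead of the barrier.
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return released.get();
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 6
    }
}
```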
@atris
Contributor Author

atris commented Aug 14, 2025

> @atris Can you rebase to get the fix for the failing tests?

Done, thank you!

@github-actions
Contributor

❌ Gradle check result for 57be3b6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 57be3b6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@jainankitk
Contributor

Flaky test:

[Test Result](https://build.ci.opensearch.org/job/gradle-check/62411/testReport/) (1 failure / -3)

    [org.opensearch.remotestore.multipart.RemoteStoreMultipartCoreTestCase.testNoSearchIdleForAnyReplicaCount](https://build.ci.opensearch.org/job/gradle-check/62411/testReport/junit/org.opensearch.remotestore.multipart/RemoteStoreMultipartCoreTestCase/testNoSearchIdleForAnyReplicaCount/)

@github-actions
Contributor

❌ Gradle check result for 57be3b6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

✅ Gradle check result for 57be3b6: SUCCESS

@codecov

codecov bot commented Aug 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.83%. Comparing base (14491cc) to head (57be3b6).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #19076      +/-   ##
============================================
- Coverage     72.98%   72.83%   -0.16%     
+ Complexity    69482    69402      -80     
============================================
  Files          5647     5647              
  Lines        319137   319137              
  Branches      46163    46163              
============================================
- Hits         232907   232428     -479     
- Misses        67391    67897     +506     
+ Partials      18839    18812      -27     


jainankitk merged commit 1273c5a into opensearch-project:main Aug 15, 2025
34 of 40 checks passed
RajatGupta02 pushed a commit to RajatGupta02/OpenSearch that referenced this pull request Aug 18, 2025
…rch-project#19076)

karenyrx pushed a commit to karenyrx/OpenSearch that referenced this pull request Aug 21, 2025
…rch-project#19076)

atris added a commit to atris/OpenSearch that referenced this pull request Aug 28, 2025
…rch-project#19076)

kh3ra pushed a commit to kh3ra/OpenSearch that referenced this pull request Sep 5, 2025
…rch-project#19076)

vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025
…rch-project#19076)

Labels

autocut
flaky-test (Random test failure that succeeds on second run)
skip-changelog
>test-failure (Test failure from CI, local build, etc.)

Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for CompletionStatsCacheTests

3 participants