-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Fix flaky AggregationProfilerIT.testTopHitsAggregationFetchProfiling #19138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The test was failing intermittently when concurrent segment search was enabled because the profiling structure can vary. With concurrent execution, profiled children may be empty or have different structure compared to sequential execution. Modified assertions to check for FetchSourcePhase children only when the children list is non-empty, making the test resilient to both execution modes. Fixes opensearch-project#19070 Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #19138 +/- ##
============================================
- Coverage 72.91% 72.83% -0.09%
+ Complexity 69478 69433 -45
============================================
Files 5648 5648
Lines 319272 319272
Branches 46183 46183
============================================
- Hits 232811 232555 -256
- Misses 67641 67941 +300
+ Partials 18820 18776 -44 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| if (!topHitsFetch1.getProfiledChildren().isEmpty()) { | ||
| // Verify at least one child is FetchSourcePhase when children exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering in what cases topHitsFetch1.getProfiledChildren() could be empty? While the structure could be different, but why could it be empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The children can be empty due to a race condition in profile collection with concurrent segment search.
When concurrent search is enabled, FlatFetchProfileTree uses thread IDs to track phases (element + "_" + Thread.currentThread().getId()). This can cause sub-phases to not link correctly to their parent in the profile tree due to timing issues.
The fetch phase still executes FetchSourcePhase correctly, but the profiling infrastructure doesn't always capture the parent-child relationship.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fetch phase still executes FetchSourcePhase correctly, but the profiling infrastructure doesn't always capture the parent-child relationship.
This seems more like a bug to me, than a flaky test. I am wondering if this issue is fixable and we should instead have a github issue to address it?
ConcurrentQueryProfileBreakdown has associateCollectorToLeaves method to build the relationship correctly instead of relying on the threadId. Maybe we can use the same logic here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, after spending another hour or so, I just realized that the maps need to be synchronized to avoid running into this issue #19164. After enabling concurrent implementations, I could get through more than 1500 executions without running into failure. Previously, I could barely hit 200 executions
Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
|
❌ Gradle check result for 16d2522: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
This change may not be needed after #19164
|
Sorry I missed this. No, post that, we do not need this. Thank you for the PR @jainankitk |
Thanks @atris for the confirmation. We can close this PR for now! |
The test was failing intermittently when concurrent segment search was
enabled because the profiling structure can vary. With concurrent execution,
profiled children may be empty or have different structure compared to
sequential execution.
Modified assertions to check for FetchSourcePhase children only when
the children list is non-empty, making the test resilient to both
execution modes.
Fixes #19070
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.