-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Fix bug caused by empty composites #30343
Conversation
When dealing with filtering, a composite aggregation might return empty buckets (which have been filtered) which gets sent as is to the client. Unfortunately this interprets the response as no more data instead of retrying. This now has changed and the listener keeps retrying until either the query has ended or data passes the filter. Fix elastic#30292 SQL: Fix bug related to HAVING filtering
// retry | ||
if (shouldRetryDueToEmptyPage(r)) { | ||
CompositeAggregationCursor.updateCompositeAfterKey(r, search.source()); | ||
client.search(search, this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nik9000 is this okay for chaining listeners, in particular the return
for the current call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general chaining listeners is fine. I don't tend to use the return
early style in listeners because I feel like they should look symmetric. They just feel better to me when they do. I use the early return
style everywhere else though.
The only concern with chaining listeners like this is that some APIs are synchronous sometimes and if you chain on a synchronous API you'll get a stack overflow. The initial phase of search has this problem. I don't know if you have that problem here but I'd dig through the code to make super sure you don't.
updateCompositeAfterKey(r, query); | ||
CompositeAggsRowSet rowSet = new CompositeAggsRowSet(extractors, r, limit, serializeQuery(query), indices); | ||
listener.onResponse(rowSet); | ||
} catch (Exception ex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need to catch (Exception
here because the caller already does and will forward any exceptions you throw to onFailure
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While updating the code, I was reminded why I added it in the first place - serializeQuery
throws an IOException
which needs to be caught so it's easy to just pass it to the listener.
I cannot use wrap
directly because I need to pass the listener instance to it again...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand with wrap
. I think it'd be a little more clear if you just caught the IOException
and sent it to onFailure
but what you have will work as well.
// retry | ||
if (shouldRetryDueToEmptyPage(r)) { | ||
CompositeAggregationCursor.updateCompositeAfterKey(r, search.source()); | ||
client.search(search, this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general chaining listeners is fine. I don't tend to use the return
early style in listeners because I feel like they should look symmetric. They just feel better to me when they do. I use the early return
style everywhere else though.
The only concern with chaining listeners like this is that some APIs are synchronous sometimes and if you chain on a synchronous API you'll get a stack overflow. The initial phase of search has this problem. I don't know if you have that problem here but I'd dig through the code to make super sure you don't.
// retry | ||
if (CompositeAggregationCursor.shouldRetryDueToEmptyPage(response)) { | ||
CompositeAggregationCursor.updateCompositeAfterKey(response, request.source()); | ||
client.search(request, this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same deal here. Make sure it isn't going to stack overflow. It probably won't be we should make sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've ran the test suite with the troublesome seed and without (plus ran the test separately with the page set to 1 - this is what triggered the bug). Anything else I should watch out for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You pretty much have to read the code to make sure it won't happen. With search it only came up because dozens of shards for the search were on the local node. With this it'd come up if the composite agg needed to be retried hundreds of times and the API doesn't otherwise go async.
I'd dig through the client implementation. I think this is safe because you are being called on the listener thread pool and the search is running on the search threadpool but it is worth double checking.
if (CompositeAggregationCursor.shouldRetryDueToEmptyPage(response)) { | ||
CompositeAggregationCursor.updateCompositeAfterKey(response, request.source()); | ||
client.search(request, this); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the return
in this case because it bails out from the listening method. the if/else might be nested inside another block (like here) and some code might be hanging towards the end.
Pinging @elastic/es-search-aggs |
When dealing with filtering, a composite aggregation might return empty buckets (which have been filtered) which gets sent as is to the client. Unfortunately this interprets the response as no more data instead of retrying. This now has changed and the listener keeps retrying until either the query has ended or data passes the filter. Fix #30292
When dealing with filtering, a composite aggregation might return empty buckets (which have been filtered) which gets sent as is to the client. Unfortunately this interprets the response as no more data instead of retrying. This now has changed and the listener keeps retrying until either the query has ended or data passes the filter. Fix #30292
* master: Set the new lucene version for 6.4.0 [ML][TEST] Clean up jobs in ModelPlotIT Upgrade to 7.4.0-snapshot-1ed95c097b (#30357) Watcher: Ensure trigger service pauses execution (#30363) [DOCS] Added coming qualifiers in changelog [DOCS] Commented out empty sections in the changelog to fix the doc build. (#30372) Security: reduce garbage during index resolution (#30180) Make RepositoriesMetaData contents unmodifiable (#30361) Change quad tree max levels to 29. Closes #21191 (#29663) Test: use trial license in qa tests with security [ML] Add integration test for model plots (#30359) SQL: Fix bug caused by empty composites (#30343) [ML] Account for gaps in data counts after job is reopened (#30294) InternalEngineTests.testConcurrentOutOfOrderDocsOnReplica should use two documents (#30121) Change signature of Get Repositories Response (#30333) Tests: Use different watch ids per test in smoke test (#30331) [Docs] Add term query with normalizer example Adds Eclipse config for xpack licence headers (#30299) Watcher: Make start/stop cycle more predictable and synchronous (#30118) [test] add debug logging for packaging test [DOCS] Removed X-Pack Breaking Changes [DOCS] Fixes link to TLS LDAP info Update versions for start_trial after backport (#30218) Packaging: Set elasticsearch user to have non-existent homedir (#29007) [DOCS] Fixes broken links to bootstrap user (#30349) Fix NPE when CumulativeSum agg encounters null/empty bucket (#29641) Make licensing FIPS-140 compliant (#30251) [DOCS] Reorganizes authentication details in Stack Overview (#30280) Network: Remove http.enabled setting (#29601) Fix merging logic of Suggester Options (#29514) [DOCS] Adds LDAP realm configuration details (#30214) [DOCS] Adds native realm configuration details (#30215) ReplicationTracker.markAllocationIdAsInSync may hang if allocation is cancelled (#30316) [DOCS] Enables edit links for X-Pack pages (#30278) Packaging: Unmark systemd service file as a config file (#29004) SQL: Reduce number of ranges generated for comparisons (#30267) Tests: Simplify VersionUtils released version splitting (#30322) Cancelling a peer recovery on the source can leak a primary permit (#30318) Added changelog entry for deb prerelease version change (#30184) Convert server javadoc to html5 (#30279) Create default ES_TMPDIR on Windows (#30325) [Docs] Clarify `fuzzy_like_this` redirect (#30183) Post backport of #29658. Fix docs of the `_ignored` meta field. Remove MapperService#types(). (#29617) Remove useless version checks in REST tests. (#30165) Add a new `_ignored` meta field. (#29658) Move repository-azure fixture test to QA project (#30253) # Conflicts: # buildSrc/version.properties # server/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
* origin/ccr: (166 commits) Introduce soft-deletes retention policy based on global checkpoint (elastic#30335) Enable MockHttpTransport in ShardChangsIT Remove old sha files from dated Lucene snapshot Update InternalEngine tests on ccr side for elastic#30121 Set the new lucene version for 6.4.0 [ML][TEST] Clean up jobs in ModelPlotIT Upgrade to 7.4.0-snapshot-1ed95c097b (elastic#30357) Watcher: Ensure trigger service pauses execution (elastic#30363) [DOCS] Added coming qualifiers in changelog [DOCS] Commented out empty sections in the changelog to fix the doc build. (elastic#30372) Security: reduce garbage during index resolution (elastic#30180) Make RepositoriesMetaData contents unmodifiable (elastic#30361) Change quad tree max levels to 29. Closes elastic#21191 (elastic#29663) Test: use trial license in qa tests with security [ML] Add integration test for model plots (elastic#30359) SQL: Fix bug caused by empty composites (elastic#30343) [ML] Account for gaps in data counts after job is reopened (elastic#30294) InternalEngineTests.testConcurrentOutOfOrderDocsOnReplica should use two documents (elastic#30121) Change signature of Get Repositories Response (elastic#30333) Tests: Use different watch ids per test in smoke test (elastic#30331) ...
* origin/ccr: (127 commits) Introduce soft-deletes retention policy based on global checkpoint (elastic#30335) Enable MockHttpTransport in ShardChangsIT Remove old sha files from dated Lucene snapshot Update InternalEngine tests on ccr side for elastic#30121 Set the new lucene version for 6.4.0 [ML][TEST] Clean up jobs in ModelPlotIT Upgrade to 7.4.0-snapshot-1ed95c097b (elastic#30357) Watcher: Ensure trigger service pauses execution (elastic#30363) [DOCS] Added coming qualifiers in changelog [DOCS] Commented out empty sections in the changelog to fix the doc build. (elastic#30372) Security: reduce garbage during index resolution (elastic#30180) Make RepositoriesMetaData contents unmodifiable (elastic#30361) Change quad tree max levels to 29. Closes elastic#21191 (elastic#29663) Test: use trial license in qa tests with security [ML] Add integration test for model plots (elastic#30359) SQL: Fix bug caused by empty composites (elastic#30343) [ML] Account for gaps in data counts after job is reopened (elastic#30294) InternalEngineTests.testConcurrentOutOfOrderDocsOnReplica should use two documents (elastic#30121) Change signature of Get Repositories Response (elastic#30333) Tests: Use different watch ids per test in smoke test (elastic#30331) ...
* origin/ccr: (166 commits) Introduce soft-deletes retention policy based on global checkpoint (elastic#30335) Enable MockHttpTransport in ShardChangsIT Remove old sha files from dated Lucene snapshot Update InternalEngine tests on ccr side for elastic#30121 Set the new lucene version for 6.4.0 [ML][TEST] Clean up jobs in ModelPlotIT Upgrade to 7.4.0-snapshot-1ed95c097b (elastic#30357) Watcher: Ensure trigger service pauses execution (elastic#30363) [DOCS] Added coming qualifiers in changelog [DOCS] Commented out empty sections in the changelog to fix the doc build. (elastic#30372) Security: reduce garbage during index resolution (elastic#30180) Make RepositoriesMetaData contents unmodifiable (elastic#30361) Change quad tree max levels to 29. Closes elastic#21191 (elastic#29663) Test: use trial license in qa tests with security [ML] Add integration test for model plots (elastic#30359) SQL: Fix bug caused by empty composites (elastic#30343) [ML] Account for gaps in data counts after job is reopened (elastic#30294) InternalEngineTests.testConcurrentOutOfOrderDocsOnReplica should use two documents (elastic#30121) Change signature of Get Repositories Response (elastic#30333) Tests: Use different watch ids per test in smoke test (elastic#30331) ...
* 6.x: Stop forking javac (#30462) Fix tribe tests Docs: Use task_id in examples of tasks (#30436) Security: Rename IndexLifecycleManager to SecurityIndexManager (#30442) Packaging: Set elasticsearch user to have non-existent homedir (#29007) [Docs] Fix typo in cardinality-aggregation.asciidoc (#30434) Avoid NPE in `more_like_this` when field has zero tokens (#30365) Build: Switch to building javadoc with html5 (#30440) Add a quick tour of the project to CONTRIBUTING (#30187) Add stricter geohash parsing (#30376) Reindex: Use request flavored methods (#30317) Silence SplitIndexIT.testSplitIndexPrimaryTerm test failure. (#30432) Auto-expand replicas when adding or removing nodes (#30423) Silence IndexUpgradeIT test failures. (#30430) Fix line length violation in cache tests Add failing test for core cache deadlock [DOCS] convert forcemerge snippet Update forcemerge.asciidoc (#30113) Added zentity to the list of API extension plugins (#29143) Fix the search request default operation behavior doc (#29302) (#29405) Watcher: Mark watcher as started only after loading watches (#30403) Correct wording in log message (#30336) Do not fail snapshot when deleting a missing snapshotted file (#30332) AwaitsFix testCreateShrinkIndexToN DOCS: Correct mapping tags in put-template api DOCS: Fix broken link in the put index template api Add put index template api to high level rest client (#30400) [Docs] Add snippets for POS stop tags default value Remove entry inadvertently picked into changelog Move respect accept header on no handler to 6.3.1 Respect accept header on no handler (#30383) [Test] Add analysis-nori plugin to the vagrant tests [Rollup] Validate timezone in range queries (#30338) [Docs] Fix bad link [Docs] Fix end of section in the korean plugin docs add the Korean nori plugin to the change logs Expose the Lucene Korean analyzer module in a plugin (#30397) Docs: remove transport_client from CCS role example (#30263) Test: remove cluster permission from CCS user (#30262) Watcher: Remove unneeded index deletion in tests fix docs branch version fix lucene snapshot version Upgrade to 7.4.0-snapshot-1ed95c097b (#30357) [ML][TEST] Clean up jobs in ModelPlotIT Watcher: Ensure trigger service pauses execution (#30363) [DOCS] Fixes ordering of changelog sections [DOCS] Commented out empty sections in the changelog to fix the doc build. (#30372) Make RepositoriesMetaData contents unmodifiable (#30361) Change signature of Get Repositories Response (#30333) 6.x Backport: Terms query validate bug (#30319) InternalEngineTests.testConcurrentOutOfOrderDocsOnReplica should use two documents (#30121) Security: reduce garbage during index resolution (#30180) Test: use trial license in qa tests with security [ML] Add integration test for model plots (#30359) SQL: Fix bug caused by empty composites (#30343) [ML] Account for gaps in data counts after job is reopened (#30294) [ML] Refactor DataStreamDiagnostics to use array (#30129) Make licensing FIPS-140 compliant (#30251) Do not load global state when deleting a snapshot (#29278) Don't load global state when only restoring indices (#29239) Tests: Use different watch ids per test in smoke test (#30331) Watcher: Make start/stop cycle more predictable and synchronous (#30118) [Docs] Add term query with normalizer example Adds Eclipse config for xpack licence headers (#30299) Fix message content in users tool (#30293) [DOCS] Removed X-Pack breaking changes page [DOCS] Added security breaking change [DOCS] Fixes link to TLS LDAP info [DOCS] Merges X-Pack release notes into changelog (#30350) [DOCS] Fixes broken links to bootstrap user (#30349) [Docs] Remove errant changelog line Fix NPE when CumulativeSum agg encounters null/empty bucket (#29641) [DOCS] Reorganizes authentication details in Stack Overview (#30280) Tests: Simplify VersionUtils released version splitting (#30322) Fix merging logic of Suggester Options (#29514) ReplicationTracker.markAllocationIdAsInSync may hang if allocation is cancelled (#30316) [DOCS] Adds LDAP realm configuration details (#30214) [DOCS] Adds native realm configuration details (#30215) Disable SSL on testing old BWC nodes (#30337) [DOCS] Enables edit links for X-Pack pages Cancelling a peer recovery on the source can leak a primary permit (#30318) SQL: Reduce number of ranges generated for comparisons (#30267) [DOCS] Adds links to changelog sections Convert server javadoc to html5 (#30279) REST Client: Add Request object flavored methods (#29623) Create default ES_TMPDIR on Windows (#30325) [Docs] Clarify `fuzzy_like_this` redirect (#30183) Fix docs of the `_ignored` meta field. Add a new `_ignored` meta field. (#29658) Move repository-azure fixture test to QA project (#30253)
When dealing with filtering, a composite aggregation might return empty
buckets (which have been filtered) which gets sent as is to the client.
Unfortunately this interprets the response as no more data instead of
retrying.
This now has changed and the listener keeps retrying until either the
query has ended or data passes the filter.
Fix #30292