Allow searches with specific reader contexts #53989

dnhatn · 2020-03-23T15:35:27Z

This commit integrates searches with reader contexts so we can perform multiple searches with specific point-in-time readers.

…tic#53661) `DoubleValuesSource` is the type-safe replacement for `ValueSource` in Lucene core. Most of Elasticsearch has already moved to it, but lang-expressions still uses the old `ValueSource`. This commit migrates lang-expressions as well.

…ic#53981) The test in CloseWhileRelocatingShardsIT recently failed three times while waiting for the initial indices to become green. The execution logs from elastic#53544 show that the failures occur at the very beginning of the test and when the WindowsFS file system is picked (WindowsFS is known to slow down tests). This commit simply increases the timeout for the first ensureGreen() to 60 seconds. If the test continues to fail, we might try a larger timeout or disable WindowsFS for this test. Closes elastic#53544

elasticmachine · 2020-03-23T15:35:29Z

Pinging @elastic/es-search (:Search/Search)

…stic#53873) This commit changes the `pre_filter_shard_size` default from 128 to unspecified. This allows heuristics based on the request and the target indices to decide whether the can match phase should run. When unspecified, this PR runs the can match phase automatically if one of these conditions is met:

* The request targets more than 128 shards.
* The request contains read-only indices.
* The primary sort of the query targets an indexed field.

Users can opt out of this behavior by setting `pre_filter_shard_size` to a static value. Closes elastic#39835
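The decision rule above can be sketched as a single predicate. This is a simplified illustration, not the actual Elasticsearch code. `CanMatchHeuristic`, `SearchShape`, and `shouldRunCanMatch` are hypothetical names. The sketch also assumes that a static `pre_filter_shard_size` means "run the can match phase only when the number of target shards exceeds that value".

```java
/** Hypothetical sketch of the can-match decision; not Elasticsearch code. */
final class CanMatchHeuristic {
    /** Shard-count threshold taken from the commit message. */
    static final int SHARD_THRESHOLD = 128;

    /**
     * Facts the coordinating node is assumed to know about a request.
     * preFilterShardSize is null when the user left it unspecified.
     */
    record SearchShape(int targetShards,
                       boolean hasReadOnlyIndices,
                       boolean primarySortOnIndexedField,
                       Integer preFilterShardSize) { }

    static boolean shouldRunCanMatch(SearchShape s) {
        if (s.preFilterShardSize() != null) {
            // Opt-out: a static value replaces the heuristics (assumed semantics).
            return s.targetShards() > s.preFilterShardSize();
        }
        // Unspecified: run the can match phase if any of the conditions holds.
        return s.targetShards() > SHARD_THRESHOLD
                || s.hasReadOnlyIndices()
                || s.primarySortOnIndexedField();
    }
}
```

With the default left unspecified, a 10-shard search over read-only indices would still run the can match phase. The same search with `pre_filter_shard_size` set to 50 would skip it.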

This moves the pipeline aggregation validation from the data node to the coordinating node so that we can eventually stop sending pipeline aggregations to the data nodes entirely. In fact, it moves validation into the "request validation" stage, so multiple errors can be accumulated and sent back to the requester for the entire request. We can't always take advantage of that, but it'll be nice for folks not to have to play whack-a-mole with validation. This is implemented by replacing `PipelineAggregationBuilder#validate` with:

```
protected abstract void validate(ValidationContext context);
```

The `ValidationContext` handles the accumulation of validation failures, provides access to the aggregation's siblings, and implements a few validation utility methods.
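A minimal sketch of a context object that accumulates failures instead of throwing on the first one. `ValidationContext` here is a hypothetical stand-in for the class named above, and its real methods may differ. `requireSibling` and `MaxBucketSketch` are illustrative names. The buckets_path check is simplified to the first path element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the ValidationContext described above. */
final class ValidationContext {
    private final Set<String> siblings;                 // names of sibling aggregations
    private final List<String> errors = new ArrayList<>();

    ValidationContext(Set<String> siblings) {
        this.siblings = siblings;
    }

    Set<String> siblings() {
        return siblings;
    }

    /** Records a failure instead of throwing, so all problems are reported together. */
    void addValidationError(String error) {
        errors.add(error);
    }

    /** Illustrative utility: the first element of buckets_path must name a sibling. */
    void requireSibling(String aggName, String bucketsPath) {
        String first = bucketsPath.split(">", 2)[0];
        if (!siblings.contains(first)) {
            addValidationError("[" + aggName + "] buckets_path refers to unknown aggregation [" + first + "]");
        }
    }

    List<String> errors() {
        return List.copyOf(errors);
    }
}

/** Hypothetical pipeline aggregation builder using the new validate signature. */
abstract class PipelineAggSketch {
    protected final String name;

    PipelineAggSketch(String name) {
        this.name = name;
    }

    protected abstract void validate(ValidationContext context);
}

final class MaxBucketSketch extends PipelineAggSketch {
    private final String bucketsPath;

    MaxBucketSketch(String name, String bucketsPath) {
        super(name);
        this.bucketsPath = bucketsPath;
    }

    @Override
    protected void validate(ValidationContext context) {
        context.requireSibling(name, bucketsPath);
    }
}
```

Validating several pipeline aggregations against one context leaves every failure in `errors()`. The requester can then receive them all in a single response rather than one at a time.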

Elasticsearch has a number of different BytesReference implementations. These implementations can each implement the interface in different ways, with subtly different behavior and performance characteristics. On the other hand, the JVM only represents bytes as an array or a direct byte buffer. This commit deletes the specialized Netty implementations and moves to a generic ByteBuffer reference type. This will allow us to focus on standardizing performance and behavior around a smaller number of implementations that can be used by all components in Elasticsearch.
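To illustrate why one ByteBuffer-backed type can serve every caller, here is a minimal sketch. `ByteBufferReference` is a hypothetical name, and it exposes only a tiny, assumed subset of what a real `BytesReference` offers. The point is that the same code handles both heap (array-backed) and direct buffers.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical minimal byte reference backed by a ByteBuffer.
 * Works the same for heap (array-backed) and direct buffers.
 */
final class ByteBufferReference {
    private final ByteBuffer buffer;

    ByteBufferReference(ByteBuffer source) {
        // slice() gives an independent view whose index 0 is source's position.
        this.buffer = source.slice();
    }

    int length() {
        return buffer.remaining();
    }

    /** Absolute read; does not move the buffer's position. */
    byte get(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length());
        }
        return buffer.get(index);
    }

    /** Zero-copy view of the range [from, from + len). */
    ByteBufferReference slice(int from, int len) {
        if (from < 0 || len < 0 || from + len > length()) {
            throw new IndexOutOfBoundsException("from " + from + ", len " + len);
        }
        ByteBuffer view = buffer.duplicate();
        view.position(from);
        view.limit(from + len);
        return new ByteBufferReference(view);
    }

    boolean isDirect() {
        return buffer.isDirect();
    }
}
```

Because `slice()` and `duplicate()` share the underlying memory, sub-ranges cost no copying regardless of where the bytes live.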

X-Pack license state contains a helper method to determine whether security is disabled due to license-level defaults. Most code needs to know whether security is enabled, not disabled. This method exists so that security being explicitly disabled can be distinguished from the license level defaulting to disabled. However, when security is explicitly disabled, the handlers in question are never registered, so they never observe an explicitly disabled state. Thus a single method can be shared to determine whether security is enabled.

Today the keystore add command can only add a single setting/value pair per invocation. Adding many settings therefore incurs the JVM startup cost many times, which can be expensive in some environments. This commit teaches the keystore add command to accept multiple settings in a single invocation.

Today the keystore add-file command can only add a single setting/file pair per invocation. Adding many settings therefore incurs the JVM startup cost many times, which can be expensive in some environments. This commit teaches the keystore add-file command to accept multiple settings in a single invocation.

Changes ThreadPool's schedule method to run the scheduled task in the context of the thread that scheduled it. This is the more sensible default for this method, and it eliminates a range of bugs where the current thread context is mistakenly dropped. Closes elastic#17143
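The behavior described above can be sketched with a plain `ThreadLocal` and a `ScheduledExecutorService`: capture the scheduler's context at schedule time and restore it when the task runs. `SimpleThreadContext` and `ContextPreservingScheduler` are hypothetical simplifications, not Elasticsearch's `ThreadContext` or `ThreadPool` APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical simplified thread context: a per-thread map of headers. */
final class SimpleThreadContext {
    private static final ThreadLocal<Map<String, String>> HEADERS =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) {
        HEADERS.get().put(key, value);
    }

    static String get(String key) {
        return HEADERS.get().get(key);
    }

    /** Captures the caller's context now and restores it around the task later. */
    static Runnable preserveContext(Runnable task) {
        Map<String, String> captured = new HashMap<>(HEADERS.get());
        return () -> {
            Map<String, String> previous = HEADERS.get();
            HEADERS.set(new HashMap<>(captured));
            try {
                task.run();
            } finally {
                HEADERS.set(previous); // do not leak the caller's context into the pool thread
            }
        };
    }
}

/** Hypothetical scheduler that always runs tasks in the scheduling thread's context. */
final class ContextPreservingScheduler {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    ScheduledFuture<?> schedule(Runnable task, long delay, TimeUnit unit) {
        return executor.schedule(SimpleThreadContext.preserveContext(task), delay, unit);
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

Without `preserveContext`, a task running on the pool thread would see an empty header map. That is the kind of dropped-context bug the commit describes.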

This commit introduces a new API that manages point-in-times in x-pack basic. An Elasticsearch pit (point in time) is a lightweight view into the state of the data as it existed when the pit was initiated. By default, a search request executes against the most recent point in time. In some cases, it is preferable to perform multiple search requests using the same point in time. For example, if refreshes happen between search_after requests, the results of those requests might not be consistent, because changes that happen between searches are only visible to the more recent point in time.

A point in time must be opened before it can be used in search requests. The `keep_alive` parameter tells Elasticsearch how long it should keep a point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes an `id`, which should be passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
  "query": {
    "match" : {
      "title" : "elasticsearch"
    }
  },
  "pit": {
    "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
    "keep_alive": "1m"
  }
}
```

Point-in-times are automatically closed when the `keep_alive` has elapsed. However, keeping point-in-times open has a cost, so they should be closed as soon as they are no longer used in search requests.

```
DELETE /_pit
{
  "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <jimczi@apache.org>
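The keep-alive lifecycle described above can be sketched as a small registry. This is an illustrative assumption, not the x-pack implementation. `PointInTimeRegistry` is hypothetical, and its ids are random UUIDs rather than the encoded ids shown above. It assumes each search that passes `keep_alive` resets the expiry. The clock is injected so expiry can be exercised without waiting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of point-in-time bookkeeping: open with a keep_alive,
 * extend on each use, close explicitly or drop once the keep_alive elapses.
 */
final class PointInTimeRegistry {
    private final LongSupplier clockMillis;
    private final Map<String, Long> expiresAt = new HashMap<>();

    PointInTimeRegistry(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Opens a point in time and returns its id. */
    String open(long keepAliveMillis) {
        String id = UUID.randomUUID().toString();
        expiresAt.put(id, clockMillis.getAsLong() + keepAliveMillis);
        return id;
    }

    /** A search using the point in time; assumed to reset its expiry. */
    boolean use(String id, long keepAliveMillis) {
        reap();
        if (!expiresAt.containsKey(id)) {
            return false; // unknown, closed, or expired
        }
        expiresAt.put(id, clockMillis.getAsLong() + keepAliveMillis);
        return true;
    }

    /** Explicit close, which frees resources before the keep_alive elapses. */
    boolean close(String id) {
        return expiresAt.remove(id) != null;
    }

    /** Drops every point in time whose keep_alive has elapsed. */
    void reap() {
        long now = clockMillis.getAsLong();
        expiresAt.values().removeIf(deadline -> deadline <= now);
    }

    int openCount() {
        reap();
        return expiresAt.size();
    }
}
```

A real implementation would presumably reap expired contexts periodically rather than on every call. It is done inline here to keep the sketch small.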

original-brownbear and others added 6 commits March 23, 2020 14:24

Better Incrementality for Snapshots of Unchanged Shards (elastic#52182)

87c910b

Use sequence numbers and the force merge UUID to determine whether a shard has changed before falling back to comparing files, so that snapshots remain incremental after a primary fail-over.
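A minimal sketch of the unchanged-shard check. It assumes a shard's state can be summarized by its max sequence number, local checkpoint, and force merge UUID. The `ShardState` record and `unchangedSinceLastSnapshot` method are hypothetical, not Elasticsearch APIs. Presumably these values can match across a primary fail-over even when the new primary's segment files differ. A match would then let the snapshot reuse the previous shard snapshot instead of re-uploading files.

```java
import java.util.Objects;

/** Hypothetical sketch of the unchanged-shard check; not Elasticsearch code. */
final class ShardSnapshotCheck {
    /** Assumed summary of a shard's state at snapshot time. */
    record ShardState(long maxSeqNo, long localCheckpoint, String forceMergeUuid) { }

    /**
     * Returns true when the previous shard snapshot can be reused as-is;
     * false means "fall back to comparing files".
     */
    static boolean unchangedSinceLastSnapshot(ShardState previous, ShardState current) {
        if (previous == null) {
            return false; // first snapshot of this shard: nothing to compare against
        }
        // No gaps: every operation up to maxSeqNo is present on both sides...
        boolean fullyProcessed = previous.localCheckpoint() == previous.maxSeqNo()
                && current.localCheckpoint() == current.maxSeqNo();
        // ...and no new operation has been indexed since the last snapshot.
        boolean sameOperations = previous.maxSeqNo() == current.maxSeqNo();
        // A force merge changes the files without adding operations, so track it separately.
        boolean sameForceMerge = Objects.equals(previous.forceMergeUuid(), current.forceMergeUuid());
        return fullyProcessed && sameOperations && sameForceMerge;
    }
}
```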

/_cat/shards support path stats (elastic#53461)

face375

* _cat/shards support path stats
* fix some style case
* fix some style case
* fix rest-api-spec cat.shards error
* fix rest-api-spec cat.shards bwc error

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Add async_search get and delete APIs to HLRC (elastic#53828)

3ceb60b

This commit adds the "_async_search" get and delete APIs to the AsyncSearchClient in the High Level Rest Client. Relates to elastic#49091

Integrate search with reader contexts

33612b0

dnhatn added >feature :Search/Search labels Mar 23, 2020

dnhatn requested a review from jimczi March 23, 2020 15:35

mark-vieira and others added 20 commits March 23, 2020 08:47

Don't include HLRC on downstream classpath twice (elastic#53983)

975d4f5

Try to save memory on aggregations (elastic#53793)

1ca52fc

This delays deserializing the aggregation responses until *right* before we merge the objects.
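One way to delay deserialization is to hold each shard's result in its serialized form and decode it only inside the reduce step. The sketch below assumes that approach. `DelayedSum` is hypothetical, and a single `long` stands in for a real aggregation tree.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

/** Hypothetical sketch: keep a shard result serialized until merge time. */
final class DelayedSum {
    private final byte[] serialized; // compact wire form held while other shards respond

    private DelayedSum(byte[] serialized) {
        this.serialized = serialized;
    }

    static DelayedSum fromWire(byte[] bytes) {
        return new DelayedSum(bytes); // no decoding here
    }

    static byte[] toWire(long sum) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(sum);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deserialization happens only here, right before the merge. */
    long expand() {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
            return in.readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Merges all shard results, expanding each one just in time. */
    static long reduce(List<DelayedSum> shardResults) {
        long total = 0;
        for (DelayedSum result : shardResults) {
            total += result.expand();
        }
        return total;
    }
}
```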

fix yaml test

b186153

Revert "Introduce system index APIs for Kibana (elastic#52385)" (elastic#53912)

8264bdd

This reverts commit 4c0e8f1. It should be re-added once elastic#53909 is addressed.

[DOCS] Clarify routing enforcement in docs (elastic#53945)

34feb3c

Removes a mention of the `_doc` mapping type that's no longer applicable now that mapping types are removed/deprecated.

[ML] Fix typo in outlier detection timing stats (elastic#53988)

78f473d

The field holding the timing stats was mistakenly called `timings_stats`.

[ML] Delete DF analytics stats upon job deletion (elastic#53933)

7665993

Since a data frame analytics job may have associated docs in the .ml-stats-* indices, when the job is deleted we should delete those docs too.

[DOCS] Add generated_dest_index to preview transform API (elastic#53905)

bad7580

[DOCS] Adds data nanos transform limitation (elastic#53826)

0ea4324

check keep alive limit for scroll requests

65a4fea

stylecheck

9439e22

[DOCS] Fixes formatting in transform overview (elastic#53900)

ea33795

[DOCS] link fix (elastic#53973)

de1229c

Fix bad link in top_metrics.

Verify that the field is aggregatable before attempting cardinality aggregation (elastic#53874)

f783670

fix hlrc test

f6a11b8

Add logging and enable testQueryRewrite (elastic#53809)

20d861c

This re-enables IndicesRequestCacheIT.testQueryRewrite and enables logging for it. Relates to elastic#32827

Add nori_number token filter in analysis-nori (elastic#53583)

8d4ff29

This change adds the `nori_number` token filter. It also adds a `discard_punctuation` option in nori_tokenizer that should be used in conjunction with the new filter.

Re-enable bwc tests disabled from elastic#53912 (elastic#54001)

ffbb558

lcawl and others added 19 commits March 25, 2020 12:35

[DOCS] Augments cat transforms API (elastic#53776)

6fceef7

Co-Authored-By: Benjamin Trent <ben.w.trent@gmail.com>

Reenable BWC after backporting elastic#53730 (elastic#54230)

53c6278

Updates a few versions in serialization because we didn't make the 7.7.0 release train.

Remove the cluster.remote.connect setting (elastic#54175)

513985e

In Elasticsearch 7.7.0, the setting cluster.remote.connect was deprecated. In this commit, we remove the setting permanently in favor of the setting node.remote_cluster_client.

Upgrade to Gradle 6.3 (elastic#53499)

6c29bf3

Define lifecycle tasks for running different types of packaging tests (elastic#54134)

fd54cb0

Improve stability of SamlServiceProviderIndexTests (elastic#54166)

c07bc3d

This test assumed cluster events would be processed quickly, which is not always true.

Fix boolean enabled logic in xpack usage test

8db51cd

This was a bug in elastic#54043, where the logic for security being enabled needs to be combined with it not being explicitly disabled.

Fix wildcard imports

f943100

Silly IntelliJ config is being overridden at the moment...

Check authentication type using enum instead of string (elastic#54145)

e244a3b

Avoid string comparison when we can use safer enums. This refactor is a follow-up to elastic#52178. Resolves: elastic#52511

Complete keystore CLI options documentation (elastic#54242)

c120388

The documentation was missing the long option for the force option, and the short option for the stdin option. This commit addresses this by adding these to the documentation.

Retry in SnapshotIT Snapshot Abort (elastic#54195)

cdf2725

Retry here to work around the possible race between snapshot finalization and deletion. Closes elastic#53509

[DOCS] Adds feature importance mapping subsection to inference processor docs (elastic#54190)

a65e95e

Async search: rename REST parameters (elastic#54198)

1c48214

This commit renames wait_for_completion to wait_for_completion_timeout in submit async search and get async search. It also renames clean_on_completion to keep_on_completion and inverts its behaviour. Closes elastic#54069

Merge branch 'master' into pr/53989

80265eb

jimczi merged commit 8913369 into elastic:reader-context Mar 26, 2020

dnhatn deleted the search-after branch March 26, 2020 15:04

dnhatn mentioned this pull request May 9, 2020

Introduce search context - point in time view of indices #56480

Closed

dnhatn mentioned this pull request Aug 12, 2020

Introduce point in time APIs in x-pack basic #61062

Merged

dnhatn mentioned this pull request Sep 2, 2020

Introduce point in time APIs in x-pack basic #61872

Closed
