Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow searches with specific reader contexts #53989

Merged
merged 155 commits into from
Mar 26, 2020

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Mar 23, 2020

This commit integrates searches with reader contexts so we can perform multiple searches with specific point-in-time readers.

original-brownbear and others added 6 commits March 23, 2020 14:24
Use sequence numbers and force merge UUID to determine whether a shard has changed or not instead before falling back to comparing files to get incremental snapshots on primary fail-over.
* _cat/shards support path stats

* fix some style case

* fix some style case

* fix rest-api-spec cat.shards error

* fix rest-api-spec cat.shards bwc error

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This commit adds the "_async_searhc" get and delete APIs to the
AsyncSearchClient in the High Level Rest Client.

Relates to elastic#49091
…tic#53661)

DoubleValuesSource is the type-safe replacement for ValueSource in the lucene
core. Most of elasticsearch has moved to use these, but lang-expressions is still
using the old version. This commit migrates lang-expressions as well.
…ic#53981)

The test in CloseWhileRelocatingShardsIT failed recently 
multiple times (3) when waiting for initial indices to be 
become green. Looking at the execution logs from  elastic#53544
 it appears at the very beginning of the test and when 
the WindowsFS file system is picked up (which is known 
to slow down tests).

This commit simply increases the timeout for the first 
ensureGreen() to 60 seconds. If the test continues to fail, 
we might want to test a larger timeout or disable 
WindowsFS for this test.

Closes elastic#53544
@dnhatn dnhatn added >feature :Search/Search Search-related issues that do not fall into other categories labels Mar 23, 2020
@dnhatn dnhatn requested a review from jimczi March 23, 2020 15:35
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

mark-vieira and others added 20 commits March 23, 2020 08:47
This delays deserializing the aggregation response try until *right*
before we merge the objects.
Removes a mention of the `_doc` mapping type that's
no longer applicable now that mapping types are
removed/deprecated.
The field holding the timing stats was mistakenly called
`timings_stats`.
Since a data frame analytics job may have associated docs
in the .ml-stats-* indices, when the job is deleted we
should delete those docs too.
Fix bad link in top_metrics.
This adds reenables IndicesRequestCacheIT.testQueryRewrite
and enables logging for it.

Relates to elastic#32827
…stic#53873)

This commit changes the pre_filter_shard_size default from 128 to unspecified.
This allows to apply heuristics based on the request and the target indices when deciding
whether the can match phase should run or not. When unspecified, this pr runs the can match phase
automatically if one of these conditions is met:
  * The request targets more than 128 shards.
  * The request contains read-only indices.
  * The primary sort of the query targets an indexed field.
Users can opt-out from this behavior by setting the `pre_filter_shard_size` to a static value.

Closes elastic#39835
This change adds the `nori_number` token filter.
It also adds a `discard_punctuation` option in nori_tokenizer that should be used in conjunction with the new filter.
This moves the pipeline aggregation validation from the data node to the
coordinating node so that we, eventually, can stop sending pipeline
aggregations to the data nodes entirely. In fact, it moves it into the
"request validation" stage so multiple errors can be accumulated and
sent back to the requester for the entire request. We can't always take
advantage of that, but it'll be nice for folks not to have to play
whack-a-mole with validation.

This is implemented by replacing `PipelineAggretionBuilder#validate`
with:
```
protected abstract void validate(ValidationContext context);
```

The `ValidationContext` handles the accumulation of validation failures,
provides access to the aggregation's siblings, and implements a few
validation utility methods.
lcawl and others added 19 commits March 25, 2020 12:35
Co-Authored-By: Benjamin Trent <ben.w.trent@gmail.com>
Updates a few versions in serialization because we didn't make the 7.7.0
release train.
In Elasticsearch 7.7.0, the setting cluster.remote.connect was
deprecated. In this commit, we remote the setting permanently in favor
of setting node.remote_cluster_client.
Elasticsearch has a number of different BytesReference implementations.
These implementations can all implement the interface in different ways
with subtly different behavior and performance characteristics. On the
other-hand, the JVM only represents bytes as an array or a direct byte
buffer. This commit deletes the specialized Netty implementations and
moves to using a generic ByteBuffer reference type. This will allow us
to focus on standardizing performance and behave around a smaller number
of implementations that can be used by all components in Elasticsearch.
Xpack license state contains a helper method to determine whether
security is disabled due to license level defaults. Most code needs to
know whether security is enabled, not disabled, but this method exists
so that the security being explicitly disabled can be distinguished from
licence level defaulting to disabled. However, in the case that security
is explicitly disabled, the handlers in question are never registered,
so security is implicitly not disabled explicitly, and thus we can share
a single method to know whether licensing is enabled.
This test assumed cluster events would be processed quickly which is
not always true
This was a bug in elastic#54043, where the logic for security being enabled
needs to be combined with it not being explicitly disabled.
Silly intellij config is being overriden at the moment...
Today the keystore add command can only handle adding a single
setting/value pair in a single invocation. This incurs the startup costs
of the JVM many times, which in some environments can be expensive. This
commit teaches the add keystore command to accept adding multiple
settings in a single invocation.
Avoid string comparison when we can use safter enums. 
This refactor is a follow up for elastic#52178.

Resolves: elastic#52511
The documentation was missing the long option for the force option, and
the short option for the stdin option. This commit addresses this by
adding these to the documentation.
Today the keystore add-file command can only handle adding a single
setting/file pair in a single invocation. This incurs the startup costs
of the JVM many times, which in some environments can be expensive. This
commit teaches the add-file keystore command to accept adding multiple
settings in a single invocation.
Retry here to work around the possible race between snapshot finalization
and deletion.

Closes elastic#53509
This commit renames wait_for_completion to wait_for_completion_timeout in submit async search and get async search.
Also it renames clean_on_completion to keep_on_completion and turns around its behaviour.

Closes elastic#54069
Changes ThreadPool's schedule method to run the schedule task in the context of the thread
that scheduled the task.

This is the more sensible default for this method, and eliminates a range of bugs where the
current thread context is mistakenly dropped.

Closes elastic#17143
@jimczi jimczi merged commit 8913369 into elastic:reader-context Mar 26, 2020
@dnhatn dnhatn deleted the search-after branch March 26, 2020 15:04
dnhatn added a commit that referenced this pull request Aug 25, 2020
This commit introduces a new API that manages point-in-times in x-pack 
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.

A point in time must be opened before being used in search requests. The 
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes a `id`, which should be 
passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
            "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
    }
}
```

Point-in-times are automatically closed when the `keep_alive` is 
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

```
DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <jimczi@apache.org>
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Sep 10, 2020
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.

A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
            "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
    }
}
```

Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

```
DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: elastic#52741
- Allow searches with a specific reader context: elastic#53989
- Add the ability to acquire readers in IndexShard: elastic#54966

Relates elastic#46523
Relates elastic#26472

Co-authored-by: Jim Ferenczi <jimczi@apache.org>
dnhatn added a commit that referenced this pull request Sep 10, 2020
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.

A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
            "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
    }
}
```

Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

```
DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <jimczi@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.