Expose timestamp field type on coordinator node #65873

DaveCTurner · 2020-12-04T10:26:57Z

Today a coordinating node does not have (easy) access to the mappings
for the indices for the searches it wishes to coordinate. This means it
can't properly interpret a timestamp range filter in a query and must
involve a copy of every shard in at least the can_match phase. It
therefore cannot cope with cases when shards are temporarily not started
even if those shards are irrelevant to the search.

This commit captures the mapping of the @timestamp field for indices
which expose a timestamp range in their index metadata.
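
To illustrate the idea in the description, here is a minimal sketch of how a coordinating node could use a per-index timestamp range to decide which indices a range query needs to touch at all. This is not the code in this PR: `TimestampRangeSkipSketch`, `TimestampRange` and `indicesToSearch` are hypothetical names, and timestamps are simplified to epoch milliseconds that have already been parsed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TimestampRangeSkipSketch {

    /** Hypothetical inclusive [min, max] range of @timestamp values held by an index, in epoch millis. */
    static final class TimestampRange {
        final long minMillis;
        final long maxMillis;

        TimestampRange(long minMillis, long maxMillis) {
            this.minMillis = minMillis;
            this.maxMillis = maxMillis;
        }

        /** True if this range overlaps the inclusive query range [fromMillis, toMillis]. */
        boolean intersects(long fromMillis, long toMillis) {
            return minMillis <= toMillis && fromMillis <= maxMillis;
        }
    }

    /**
     * Returns the indices that may contain hits for the query range.
     * An index whose range is unknown (null) is always kept, so skipping
     * only happens when the coordinator is sure the index is irrelevant.
     */
    static List<String> indicesToSearch(Map<String, TimestampRange> rangesByIndex, long fromMillis, long toMillis) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, TimestampRange> entry : rangesByIndex.entrySet()) {
            TimestampRange range = entry.getValue();
            if (range == null || range.intersects(fromMillis, toMillis)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, TimestampRange> ranges = new LinkedHashMap<>();
        ranges.put("logs-2020-11", new TimestampRange(0L, 99L));
        ranges.put("logs-2020-12", new TimestampRange(100L, 199L));
        ranges.put("logs-unknown", null);
        System.out.println(indicesToSearch(ranges, 150L, 250L));
    }
}
```

In this sketch an unavailable shard of `logs-2020-11` would not matter for the query, because the index is dropped before any shard is contacted.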

elasticmachine · 2020-12-04T10:27:00Z

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner · 2020-12-04T10:42:19Z

server/src/main/java/org/elasticsearch/indices/TimestampFieldMapperService.java

+    @Nullable
+    public DateFieldMapper.DateFieldType getTimestampFieldType(Index index) {
+        final PlainActionFuture<DateFieldMapper.DateFieldType> future = fieldTypesByIndex.get(index);
+        if (future == null || future.isDone() == false) {


There remains a question of whether we should block here or not (and if so, for how long).

On reflection I think we shouldn't block. Returning null sooner will allow the search coordination to proceed normally, ignoring any timestamp filter and deferring any skipping to the individual shards. This means we'll see shard failures if the coordinating node falls behind on extracting these mappings AND some of the shards are unassigned, which is hopefully rare.

As a follow-up we could in theory add another more patient getter to support a workflow that goes:

1. we call getTimestampFieldType which returns null
2. some shards are unavailable for the can_match phase
3. we call getTimestampFieldTypePatiently to see whether those shard failures can be ignored or not

I think it makes sense to return null and proceed regularly if the mapping isn't available yet. This should be rare enough not to cause too much trouble.
I guess the most problematic scenario is when a node joins and has to parse a lot of mappings, right?

Right, although in that case there's no particular reason to expect shards to be unavailable.
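
For readers following the thread, the two getters discussed above can be sketched roughly as follows. This is an assumption-laden illustration rather than the PR's implementation: it uses the JDK's `CompletableFuture` in place of `PlainActionFuture`, and `NonBlockingLookupSketch`, `register`, `getIfKnown` and `getPatiently` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class NonBlockingLookupSketch<K, V> {

    // One future per key; it is completed once the value has been computed elsewhere.
    private final Map<K, CompletableFuture<V>> futuresByKey = new ConcurrentHashMap<>();

    /** Registers a key and returns the future that the producer should complete. */
    CompletableFuture<V> register(K key) {
        return futuresByKey.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    /** Never blocks: returns null if the key is unknown, still pending, or failed. */
    V getIfKnown(K key) {
        CompletableFuture<V> future = futuresByKey.get(key);
        if (future == null || future.isDone() == false || future.isCompletedExceptionally()) {
            return null;
        }
        return future.getNow(null);
    }

    /** Hypothetical "patient" variant: waits up to the timeout, then gives up and returns null. */
    V getPatiently(K key, long timeout, TimeUnit unit) {
        CompletableFuture<V> future = futuresByKey.get(key);
        if (future == null) {
            return null;
        }
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException | ExecutionException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    public static void main(String[] args) {
        NonBlockingLookupSketch<String, String> lookup = new NonBlockingLookupSketch<>();
        CompletableFuture<String> future = lookup.register("my-index");
        System.out.println(lookup.getIfKnown("my-index")); // still pending
        future.complete("date");
        System.out.println(lookup.getIfKnown("my-index")); // now available
    }
}
```

A caller would try `getIfKnown` first and only fall back to `getPatiently` when shard failures make the answer matter.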

fcofdez

LGTM, just some minor comments in the test 👍

fcofdez · 2020-12-04T10:53:11Z

...rozen-indices/src/internalClusterTest/java/org/elasticsearch/index/engine/FrozenIndexIT.java

+                timestampFieldTypeFuture.onResponse(timestampFieldType);
+            });
+            assertTrue(timestampFieldTypeFuture.isDone());
+            assertThat(timestampFieldTypeFuture.get().dateTimeFormatter().locale().toString(), equalTo(locale));


Maybe we can add an assertion that checks that DateFieldMapper.DateFieldType#parse works with the original timestamp string?

Ok let me try and remember the month names in French to give this assertion some teeth 🇫🇷

Done in 79ef7b0. Remembering the names wasn't the hard bit, it was working out that in French we write month names lower-case, with a trailing `.`, and sometimes use more than 3 letters.
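
As a rough illustration of the kind of check discussed here, the sketch below round-trips a date through a French-locale formatter using only `java.time`. It does not use `DateFieldMapper`, and the class name, the pattern `dd MMM uuuu` and the sample date are assumptions made for the example. No expected text is shown because the exact month abbreviations may depend on the JDK's locale data.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class LocalisedDateRoundTripSketch {

    // Abbreviated month names are taken from the French locale data of the running JDK.
    static final DateTimeFormatter FRENCH = DateTimeFormatter.ofPattern("dd MMM uuuu", Locale.FRENCH);

    static String format(LocalDate date) {
        return FRENCH.format(date);
    }

    static LocalDate parse(String text) {
        return LocalDate.parse(text, FRENCH);
    }

    public static void main(String[] args) {
        LocalDate original = LocalDate.of(2020, 12, 4);
        String text = format(original);
        // Parsing the formatter's own output should give back the original date.
        System.out.println(text + " -> " + parse(text));
    }
}
```

Round-tripping through the same formatter avoids hard-coding a month spelling that one JDK accepts and another rejects.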

fcofdez · 2020-12-04T10:55:06Z

server/src/main/java/org/elasticsearch/indices/TimestampFieldMapperService.java

+    @Nullable
+    public DateFieldMapper.DateFieldType getTimestampFieldType(Index index) {
+        final PlainActionFuture<DateFieldMapper.DateFieldType> future = fieldTypesByIndex.get(index);
+        if (future == null || future.isDone() == false) {


I think it makes sense to return null and proceed regularly if the mapping isn't available yet. This should be rare enough not to cause too much trouble.
I guess the most problematic scenario is when a node joins and has to parse a lot of mappings, right?

DaveCTurner · 2020-12-04T14:09:00Z

@elasticmachine please run elasticsearch-ci/bwc

DaveCTurner · 2020-12-04T15:21:25Z

@elasticmachine please run elasticsearch-ci/2

DaveCTurner · 2020-12-04T17:02:13Z

The backport failed in CI: https://gradle-enterprise.elastic.co/s/s3kleye5lwxds/console-log?task=:x-pack:plugin:frozen-indices:internalClusterTest

No idea why yet but I've reverted it from 7.x for now.

This reverts commit a0e5f9b.

Backport of #65873 to 7.x

DaveCTurner · 2020-12-07T10:44:11Z

I reinstated the backport with a few JDK8-specific tweaks in #65925.

DaveCTurner added the >enhancement, :Distributed Coordination/Snapshot/Restore, v8.0.0 and v7.11.0 labels Dec 4, 2020

DaveCTurner requested a review from fcofdez December 4, 2020 10:26

elasticmachine added the Team:Distributed (Obsolete) label Dec 4, 2020

Don't block, just return null if not known

d44afc0

DaveCTurner commented Dec 4, 2020

fcofdez approved these changes Dec 4, 2020

Ensure that localised dates can really be parsed by the mapper we got

79ef7b0

Merge branch 'master' into 2020-12-03-extract-timestamp-field-type

693a9a6

DaveCTurner merged commit e1e1974 into elastic:master Dec 4, 2020

DaveCTurner deleted the 2020-12-03-extract-timestamp-field-type branch December 4, 2020 16:11

DaveCTurner added the backport pending label Dec 4, 2020

DaveCTurner added a commit that referenced this pull request Dec 4, 2020

Revert "Expose timestamp field type on coordinator node (#65873)"

5cc3297

DaveCTurner mentioned this pull request Dec 7, 2020

Expose timestamp field type on coordinator node #65925

Merged

DaveCTurner removed the backport pending label Dec 7, 2020

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
