-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't skip shards in coord rewrite if timestamp is an alias #117271
base: main
Are you sure you want to change the base?
Conversation
The coordinator rewrite has logic to skip indices if the provided date range filter is not within the min and max range of all of its shards. This mechanism is enabled for event.ingested and @timestamp fields, against searchable snapshots. We have basic checks that such fields need to be of date field type, yet if they are defined as alias of a date field, their range will be empty, which indicates that the shards are empty, and the coord rewrite logic resolves the alias and ends up skipping shards that may have matching docs. This commit adds an explicit check that declares the range UNKNOWN instead of EMPTY in these circumstances. The same check is also performed in the coord rewrite logic, so that shards are no longer skipped by mistake.
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
Hi @javanna, I've created a changelog YAML for you. |
@@ -2274,8 +2274,8 @@ private ShardLongFieldRange determineShardLongFieldRange(String fieldName) { | |||
return ShardLongFieldRange.UNKNOWN; // no mapper service, no idea if the field even exists | |||
} | |||
final MappedFieldType mappedFieldType = mapperService().fieldType(fieldName); | |||
if (mappedFieldType instanceof DateFieldMapper.DateFieldType == false) { | |||
return ShardLongFieldRange.UNKNOWN; // field missing or not a date | |||
if (mappedFieldType instanceof DateFieldMapper.DateFieldType == false || mappedFieldType.name().equals(fieldName) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this fix is technically not necessary: the change below on the search side is enough to fix the issue, because we no longer look for the range if timestamp is an alias. Yet it seems wrong to mark the range EMPTY in that case, hence I think it makes sense to correct this side too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DaveCTurner do you think that this introduces possible incompatibility with previous versions, given that for the same situation, newer indices will have UNKNOWN while older indices will have EMPTY in their range info? I believe this is ok as long as the search side is also fixed to do the same check. But I don't know deeply the update mechanism here: will the range info even get updated for older indices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the index metadata contains a range with org.elasticsearch.index.shard.IndexLongFieldRange#isComplete()
being true
, it shouldn't change again, and it should reach that state after starting all the shards in the index the first time. But I know this area has moved around quite a bit since I last touched it and there was some confusion about EMPTY
vs UNKNOWN
in the past (e.g. around #106252) whose resolution I didn't follow, so I can't be sure that it does exactly what I think it should do :)
@@ -438,6 +438,7 @@ public String getWriteableName() { | |||
protected MappedFieldType.Relation getRelation(final CoordinatorRewriteContext coordinatorRewriteContext) { | |||
final MappedFieldType fieldType = coordinatorRewriteContext.getFieldType(fieldName); | |||
if (fieldType instanceof final DateFieldMapper.DateFieldType dateFieldType) { | |||
assert fieldName.equals(fieldType.name()); | |||
IndexLongFieldRange fieldRange = coordinatorRewriteContext.getFieldRange(fieldName); | |||
if (fieldRange.isComplete() == false || fieldRange == IndexLongFieldRange.EMPTY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diff does not include it, but the core of the issue is that we return DISJOINT for an EMPTY range, hence we skip. I wonder if we should rather change this: if we have incomplete data should we rather let the query run instead of skipping and taking the risk of not returning matching data like in this case? On the other hand if EMPTY reliably means empty shard, it makes sense to skip. Not too sure about the isComplete part here.
Also, I cannot think of other situations where timestamp comes back as the right field type and checks all the boxes for coord rewrite, yet perhaps we were not able to store the range in the cluster state, because min and max were not available. In that case we'd end up skipping. It may be safer to not skip instead. It would have saved us from this bug at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this the relevant discussion about that issue from a PR I had that was closed not merged:
#110873 (review)
#110873 (comment)
#110873 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And Armin had the same question here: #111419 (comment)
But I never answered him on the PR. I think I referred him to the above discussion outside the PR. I didn't want to change functionality more than necessary since the changes around event.ingested were so complex and I was worried of subtle bugs.
But now might be a time to raise a GH issue around this and get another discussion going.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Production code changes look good to me.
I think there is another test to be written though?
if (mappedFieldType instanceof DateFieldMapper.DateFieldType == false) { | ||
return ShardLongFieldRange.UNKNOWN; // field missing or not a date | ||
if (mappedFieldType instanceof DateFieldMapper.DateFieldType == false || mappedFieldType.name().equals(fieldName) == false) { | ||
return ShardLongFieldRange.UNKNOWN; // field missing or not a date or an alias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code comment is slightly ambiguous "not a date or an alias" could be read to mean "not an alias", so would be clearer to say:
// field is missing, an alias or not a date field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT but ++ to Michael's wording :)
assertThat(searchResponse.getTotalShards(), equalTo(2)); | ||
assertThat(searchResponse.getHits().getTotalHits().value(), equalTo(20L)); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm misunderstanding this test, but I would have thought the test you want to write is one where with @timestamp
the shards are skipped where as with the alias the shards are not skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe that test can be done in server/src/test/java/org/elasticsearch/action/search/CanMatchPreFilterSearchPhaseTests.java ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think that the tests for the skipping of @timestamp
and event.ingested
are already written, as that isn't the functionality I am adding. What my change does is fixing that when one of the two is an alias, shards are not incorrectly skipped.
@@ -438,6 +438,7 @@ public String getWriteableName() { | |||
protected MappedFieldType.Relation getRelation(final CoordinatorRewriteContext coordinatorRewriteContext) { | |||
final MappedFieldType fieldType = coordinatorRewriteContext.getFieldType(fieldName); | |||
if (fieldType instanceof final DateFieldMapper.DateFieldType dateFieldType) { | |||
assert fieldName.equals(fieldType.name()); | |||
IndexLongFieldRange fieldRange = coordinatorRewriteContext.getFieldRange(fieldName); | |||
if (fieldRange.isComplete() == false || fieldRange == IndexLongFieldRange.EMPTY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this the relevant discussion about that issue from a PR I had that was closed not merged:
#110873 (review)
#110873 (comment)
#110873 (comment)
@@ -438,6 +438,7 @@ public String getWriteableName() { | |||
protected MappedFieldType.Relation getRelation(final CoordinatorRewriteContext coordinatorRewriteContext) { | |||
final MappedFieldType fieldType = coordinatorRewriteContext.getFieldType(fieldName); | |||
if (fieldType instanceof final DateFieldMapper.DateFieldType dateFieldType) { | |||
assert fieldName.equals(fieldType.name()); | |||
IndexLongFieldRange fieldRange = coordinatorRewriteContext.getFieldRange(fieldName); | |||
if (fieldRange.isComplete() == false || fieldRange == IndexLongFieldRange.EMPTY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And Armin had the same question here: #111419 (comment)
But I never answered him on the PR. I think I referred him to the above discussion outside the PR. I didn't want to change functionality more than necessary since the changes around event.ingested were so complex and I was worried of subtle bugs.
But now might be a time to raise a GH issue around this and get another discussion going.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks Luca :)
I do agree though, we should have a discussion on the rewrite-to-none-if-not-yet-available situation in some form. The current behavior feels incredibly confusing to me. If a searchable snapshot shard is ready and covered by the shard resolution logic in the coordinator we shouldn't have some secondary layer that hacks in skipping, the currently implementation has already cost multiple people a lot more time than reasonable to wrap their heads around.
The coordinator rewrite has logic to skip indices if the provided date range filter is not within the min and max range of all of its shards. This mechanism is enabled for event.ingested and @timestamp fields, against searchable snapshots.
We have basic checks that such fields need to be of date field type, yet if they are defined as alias of a date field, their range will be empty, which indicates that the shards are empty, and the coord rewrite logic resolves the alias and ends up skipping shards that may have matching docs.
This commit adds an explicit check that declares the range UNKNOWN instead of EMPTY in these circumstances. The same check is also performed in the coord rewrite logic, so that shards are no longer skipped by mistake.