-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure that fields copied using copy_to are not present in synthetic source #112625
Conversation
fb4e59a
to
d07d1e1
Compare
d07d1e1
to
d5435ad
Compare
private static final FieldMapper NO_OP_FIELDMAPPER = new FieldMapper( | ||
"no-op", | ||
new MappedFieldType("no-op", false, false, false, TextSearchInfo.NONE, Collections.emptyMap()) { | ||
private static FieldMapper noopFieldMapper(String path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed to correctly implement fullPath()
that is now used in context.isCopyToDestinationField
check.
@@ -207,7 +213,7 @@ protected DocumentParserContext( | |||
parent, | |||
dynamic, | |||
new HashSet<>(), | |||
new HashSet<>(), | |||
new HashSet<>(mappingLookup.fieldTypesLookup().getCopyToDestinationFields()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one existing user of this DocumentParser#postProcessDynamicArrayMapping
but i think the intention of that code is the same as ours. It needs to have logic based on the fact that field is used as a copy_to destination. Previously that logic kicked in only if such a field was processed in this document already. With this, it will kick in the edge case of copy_to destination having actual values in the document. That seems to align with the intention of the code.
cc @kderusso
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is fine to only kick in if copy_to
has values. Thank you for the ping.
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
&& context.sourceKeepModeFromIndexSettings() == Mapper.SourceKeepMode.ARRAYS; | ||
boolean dynamicRuntimeContext = context.dynamic() == ObjectMapper.Dynamic.RUNTIME; | ||
if (objectRequiresStoringSource || fieldWithFallbackSyntheticSource || dynamicRuntimeContext || fieldWithStoredArraySource) { | ||
boolean copyToFieldHasValuesInDocument = context.isWithinCopyTo() == false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: String fullPath = context.path().pathAsText(arrayFieldName);
and use below.
server/src/main/java/org/elasticsearch/index/mapper/DocumentParserContext.java
Show resolved
Hide resolved
@@ -103,19 +117,22 @@ static void decodeAndWrite(XContentBuilder b, BytesRef r) throws IOException { | |||
* @throws IOException | |||
*/ | |||
static void writeMerged(XContentBuilder b, String fieldName, List<BytesRef> encodedParts) throws IOException { | |||
if (encodedParts.isEmpty()) { | |||
var partsWithData = encodedParts.stream().filter(XContentDataHelper::isDataPresent).toList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc stream ops are fairly slow, consider replacing with regular list processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably be optimistic and first check if there's an empty element, then create the copy on a second pass if one is found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote this but i don't have a feel if this is any faster. It skips allocating stream object i guess.
if (context.dynamic() == ObjectMapper.Dynamic.RUNTIME && context.canAddIgnoredField()) { | ||
try { | ||
context.addIgnoredField( | ||
IgnoredSourceFieldMapper.NameValue.fromContext(context, path, XContentDataHelper.encodeToken(context.parser())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Freakin path, again. That should fix issues with runtime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specific to this change. It may also help with #111916 eventually.
k: ["5", "6"] | ||
copy: ["7", "8"] | ||
- match: { hits.hits.3.fields.copy: ["5", "6", "7", "8"] } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a test where copy_to is applied within a context where the source has been stored already, e.g. in an array that has been stored.
Nice, do we need to add coverage in randomized testing too? |
if (encodedParts.isEmpty()) { | ||
var partsWithData = encodedParts.stream().filter(XContentDataHelper::isDataPresent).toList(); | ||
|
||
if (partsWithData.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace partsWithData
with encodedParts
and move the early exit above?
@@ -158,6 +175,10 @@ static void writeMerged(XContentBuilder b, String fieldName, List<BytesRef> enco | |||
b.endArray(); | |||
} | |||
|
|||
public static boolean isDataPresent(BytesRef encoded) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change this logic to check for empty and rename to isEmpty
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that better? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All collections work with the concept of "being empty" and have an isEmpty
and I think also that usually writing logic like "if it is empty then exit or return" is easier to understand...no data, nothing to do. Of course it is not better from a functional perspective but I see it more "coherent" with the rest.
For instance encodedParts
and partsWithData
above use isEmpty
.
I see that you use isDataPresent
in the stream filter
which works on the opposite of isEmpty
...so that is probably the reason why you did it like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, filter
is the reason.
@@ -1214,3 +1224,231 @@ fallback synthetic_source for text field: | |||
hits.hits.0._source: | |||
text: [ "world", "hello", "world" ] | |||
|
|||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also have a reindex test now that we have copy_to
working in synthetic source?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add it in a separate PR.
- match: { hits.hits.1.fields.copy: ["5", "6", "7", "8"] } | ||
|
||
--- | ||
synthetic_source with copy_to field from dynamic template having values in source: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we check that using copy_to
results in error when used with object-like fields? Note: it is not just object
and nested
but also things like range
or geo-like. I think this is important for BWC...we need to keep the error-behaviour.
@elasticmachine update branch |
💚 Backport successful
|
This is a follow-up change to #112294 that fixes and edge case not covered there.