Ensure that fields copied using copy_to are not present in synthetic source #112625

lkts · 2024-09-06T21:49:35Z

This is a follow-up change to #112294 that fixes an edge case not covered there.

…source

lkts · 2024-09-10T21:11:38Z

server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java

-    private static final FieldMapper NO_OP_FIELDMAPPER = new FieldMapper(
-        "no-op",
-        new MappedFieldType("no-op", false, false, false, TextSearchInfo.NONE, Collections.emptyMap()) {
+    private static FieldMapper noopFieldMapper(String path) {


This is needed to correctly implement fullPath(), which is now used in the context.isCopyToDestinationField check.

lkts · 2024-09-10T22:01:12Z

server/src/main/java/org/elasticsearch/index/mapper/DocumentParserContext.java

@@ -207,7 +213,7 @@ protected DocumentParserContext(
            parent,
            dynamic,
            new HashSet<>(),
-            new HashSet<>(),
+            new HashSet<>(mappingLookup.fieldTypesLookup().getCopyToDestinationFields()),


There is one existing user of this, DocumentParser#postProcessDynamicArrayMapping, but I think the intention of that code is the same as ours. It needs logic based on the fact that a field is used as a copy_to destination. Previously that logic kicked in only if such a field had already been processed in this document. With this change, it will also kick in for the edge case of a copy_to destination having actual values in the document. That seems to align with the intention of the code.
cc @kderusso

Yes, this is fine to only kick in if copy_to has values. Thank you for the ping.
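For readers following the thread, here is a minimal sketch of the behavioral difference being discussed. It is not the Elasticsearch implementation: the class CopyToContextSketch and the method recordCopyTo are hypothetical stand-ins, and only the name isCopyToDestinationField comes from the discussion above. It assumes the context keeps a plain set of destination field names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for the parsing context.
final class CopyToContextSketch {
    private final Set<String> copyToFields;

    // "Before" corresponds to passing an empty set; "after" corresponds to
    // passing the destinations already known from the mapping.
    CopyToContextSketch(Set<String> knownDestinations) {
        this.copyToFields = new HashSet<>(knownDestinations);
    }

    // Hypothetical hook called when a copy_to is actually processed in a document.
    void recordCopyTo(String destination) {
        copyToFields.add(destination);
    }

    boolean isCopyToDestinationField(String fullPath) {
        return copyToFields.contains(fullPath);
    }

    public static void main(String[] args) {
        // Empty seed: the destination is unknown until a copy has been processed.
        CopyToContextSketch before = new CopyToContextSketch(Set.of());
        System.out.println(before.isCopyToDestinationField("copy"));
        before.recordCopyTo("copy");
        System.out.println(before.isCopyToDestinationField("copy"));

        // Seeded from the mapping: the destination is known from the start,
        // even if its values appear in the document before any copy happens.
        CopyToContextSketch after = new CopyToContextSketch(Set.of("copy"));
        System.out.println(after.isCopyToDestinationField("copy"));
    }
}
```

With the seeded set, a check on the destination field no longer depends on the order in which fields show up in the document.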

elasticsearchmachine · 2024-09-10T22:02:47Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

kkrik-es · 2024-09-11T06:00:44Z

server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java

                && context.sourceKeepModeFromIndexSettings() == Mapper.SourceKeepMode.ARRAYS;
            boolean dynamicRuntimeContext = context.dynamic() == ObjectMapper.Dynamic.RUNTIME;
-            if (objectRequiresStoringSource || fieldWithFallbackSyntheticSource || dynamicRuntimeContext || fieldWithStoredArraySource) {
+            boolean copyToFieldHasValuesInDocument = context.isWithinCopyTo() == false


Nit: String fullPath = context.path().pathAsText(arrayFieldName); and use below.

server/src/main/java/org/elasticsearch/index/mapper/DocumentParserContext.java

kkrik-es · 2024-09-11T06:13:14Z

server/src/main/java/org/elasticsearch/index/mapper/XContentDataHelper.java

@@ -103,19 +117,22 @@ static void decodeAndWrite(XContentBuilder b, BytesRef r) throws IOException {
     * @throws IOException
     */
    static void writeMerged(XContentBuilder b, String fieldName, List<BytesRef> encodedParts) throws IOException {
-        if (encodedParts.isEmpty()) {
+        var partsWithData = encodedParts.stream().filter(XContentDataHelper::isDataPresent).toList();


iirc stream ops are fairly slow, consider replacing with regular list processing.

You can probably be optimistic and first check if there's an empty element, then create the copy on a second pass if one is found.

I rewrote this, but I don't have a feel for whether it is any faster. It skips allocating a stream object, I guess.
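A rough sketch of the optimistic two-pass approach suggested above, for illustration only. It is not the code that was merged: byte[] stands in for BytesRef, and the isDataPresent shown here (non-empty array) is a hypothetical placeholder for the real check.

```java
import java.util.ArrayList;
import java.util.List;

final class OptimisticFilter {
    // Hypothetical placeholder for XContentDataHelper.isDataPresent.
    static boolean isDataPresent(byte[] encoded) {
        return encoded.length > 0;
    }

    static List<byte[]> partsWithData(List<byte[]> encodedParts) {
        // First pass: find the first element without data, if any.
        int firstEmpty = -1;
        for (int i = 0; i < encodedParts.size(); i++) {
            if (isDataPresent(encodedParts.get(i)) == false) {
                firstEmpty = i;
                break;
            }
        }
        // Common case: nothing to drop, so return the input without copying.
        if (firstEmpty < 0) {
            return encodedParts;
        }
        // Second pass: copy only the elements that carry data.
        List<byte[]> result = new ArrayList<>(encodedParts.size() - 1);
        // Everything before firstEmpty was already checked in the first pass.
        result.addAll(encodedParts.subList(0, firstEmpty));
        for (int i = firstEmpty + 1; i < encodedParts.size(); i++) {
            byte[] part = encodedParts.get(i);
            if (isDataPresent(part)) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<byte[]> mixed = List.of(new byte[] { 1 }, new byte[0], new byte[] { 3 });
        System.out.println(partsWithData(mixed).size());
    }
}
```

Whether this beats the stream version in practice would need a benchmark; the sketch only shows that the no-empty-element case can avoid any allocation.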

kkrik-es · 2024-09-11T06:21:10Z

server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java

+                if (context.dynamic() == ObjectMapper.Dynamic.RUNTIME && context.canAddIgnoredField()) {
+                    try {
+                        context.addIgnoredField(
+                            IgnoredSourceFieldMapper.NameValue.fromContext(context, path, XContentDataHelper.encodeToken(context.parser()))


Freakin path, again. That should fix issues with runtime?

This is specific to this change. It may also help with #111916 eventually.

kkrik-es · 2024-09-11T06:26:13Z

...pi-spec/src/yamlRestTest/resources/rest-api-spec/test/indices.create/20_synthetic_source.yml

+        k: ["5", "6"]
+        copy: ["7", "8"]
+  - match: { hits.hits.3.fields.copy: ["5", "6", "7", "8"] }
+


Maybe add a test where copy_to is applied within a context where the source has been stored already, e.g. in an array that has been stored.

kkrik-es · 2024-09-11T06:27:18Z

Nice, do we need to add coverage in randomized testing too?

salvatore-campagna · 2024-09-11T06:48:48Z

server/src/main/java/org/elasticsearch/index/mapper/XContentDataHelper.java

-        if (encodedParts.isEmpty()) {
+        var partsWithData = encodedParts.stream().filter(XContentDataHelper::isDataPresent).toList();
+
+        if (partsWithData.isEmpty()) {


replace partsWithData with encodedParts and move the early exit above?

salvatore-campagna · 2024-09-11T06:50:02Z

server/src/main/java/org/elasticsearch/index/mapper/XContentDataHelper.java

@@ -158,6 +175,10 @@ static void writeMerged(XContentBuilder b, String fieldName, List<BytesRef> enco
        b.endArray();
    }

+    public static boolean isDataPresent(BytesRef encoded) {


Change this logic to check for empty and rename to isEmpty?

Why is that better? :)

All collections work with the concept of "being empty" and have an isEmpty method. I also think that logic like "if it is empty, then exit or return" is usually easier to understand: no data, nothing to do. Of course it is not better from a functional perspective, but I find it more coherent with the rest.

For instance encodedParts and partsWithData above use isEmpty.

I see that you use isDataPresent in the stream filter which works on the opposite of isEmpty...so that is probably the reason why you did it like this.

Yes, filter is the reason.
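As a side note on the naming question, both spellings can feed a stream filter. The sketch below is only an illustration with String standing in for the encoded value; the class EmptyPredicateDemo and its helper methods are hypothetical, while Predicate.not is part of java.util.function since Java 11.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class EmptyPredicateDemo {
    // Hypothetical "is empty" style check.
    static boolean isEmpty(String encoded) {
        return encoded.isEmpty();
    }

    // Hypothetical "is data present" style check, the logical opposite.
    static boolean isDataPresent(String encoded) {
        return isEmpty(encoded) == false;
    }

    // The positive predicate plugs directly into filter as a method reference.
    static List<String> keepWithMethodRef(List<String> parts) {
        return parts.stream().filter(EmptyPredicateDemo::isDataPresent).collect(Collectors.toList());
    }

    // The negative predicate needs an explicit negation to be used in filter.
    static List<String> keepWithNegation(List<String> parts) {
        return parts.stream().filter(Predicate.not(EmptyPredicateDemo::isEmpty)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "", "b");
        System.out.println(keepWithMethodRef(parts));
        System.out.println(keepWithNegation(parts));
    }
}
```

Both variants keep the same elements, so the choice is about readability at the call site rather than behavior.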

salvatore-campagna · 2024-09-11T06:51:24Z

...pi-spec/src/yamlRestTest/resources/rest-api-spec/test/indices.create/20_synthetic_source.yml

@@ -1214,3 +1224,231 @@ fallback synthetic_source for text field:
      hits.hits.0._source:
        text: [ "world", "hello", "world" ]

+---


Should we also have a reindex test now that we have copy_to working in synthetic source?

I'll add it in a separate PR.

salvatore-campagna · 2024-09-11T06:53:33Z

...pi-spec/src/yamlRestTest/resources/rest-api-spec/test/indices.create/20_synthetic_source.yml

+  - match: { hits.hits.1.fields.copy: ["5", "6", "7", "8"] }
+
+---
+synthetic_source with copy_to field from dynamic template having values in source:


Can we check that using copy_to results in error when used with object-like fields? Note: it is not just object and nested but also things like range or geo-like. I think this is important for BWC...we need to keep the error-behaviour.

lkts · 2024-09-11T19:41:37Z

@elasticmachine update branch

…source (elastic#112625)

elasticsearchmachine · 2024-09-12T19:20:01Z

💚 Backport successful

Status	Branch	Result
✅	8.x

…source (#112625) (#112835)

elasticsearchmachine added the v8.16.0 label Sep 6, 2024

lkts force-pushed the copy_to_handling_ignore_source branch 2 times, most recently from fb4e59a to d07d1e1 Compare September 10, 2024 21:09

lkts changed the title ~~Copy to handling ignore source~~ Ensure that fields copied using copy_to are not present in synthetic source Sep 10, 2024

Ensure that fields copied using copy_to are not present in synthetic …

d5435ad

…source

lkts force-pushed the copy_to_handling_ignore_source branch from d07d1e1 to d5435ad Compare September 10, 2024 21:10

lkts commented Sep 10, 2024


Handle dynamic templates

0768846

lkts commented Sep 10, 2024


lkts requested review from kkrik-es and martijnvg September 10, 2024 22:01

lkts marked this pull request as ready for review September 10, 2024 22:02

elasticsearchmachine added the needs:triage Requires assignment of a team area label label Sep 10, 2024

lkts added >non-issue :StorageEngine/Mapping The storage related side of mappings labels Sep 10, 2024

elasticsearchmachine added Team:StorageEngine and removed needs:triage Requires assignment of a team area label labels Sep 10, 2024

kkrik-es reviewed Sep 11, 2024


server/src/main/java/org/elasticsearch/index/mapper/DocumentParserContext.java (resolved)

kkrik-es reviewed Sep 11, 2024


salvatore-campagna reviewed Sep 11, 2024


mark-vieira added v9.0.0 and removed v8.16.0 labels Sep 11, 2024

lkts added auto-backport-and-merge v8.16.0 labels Sep 11, 2024

Address feedback

99a6f1d

elasticmachine and others added 3 commits September 11, 2024 21:41

Merge branch 'main' into copy_to_handling_ignore_source

d35ab5f

Add coverage in randomized tests

6125c54

style

3d24302

kkrik-es approved these changes Sep 12, 2024


lkts added 2 commits September 12, 2024 10:29

Add new feature

1acc1e2

style

e357f72

lkts merged commit 44c9271 into elastic:main Sep 12, 2024
15 checks passed

lkts deleted the copy_to_handling_ignore_source branch September 12, 2024 19:18

lkts added a commit to lkts/elasticsearch that referenced this pull request Sep 12, 2024

Ensure that fields copied using copy_to are not present in synthetic …

9272f2d

…source (elastic#112625)

lkts mentioned this pull request Sep 12, 2024

[8.x] Ensure that fields copied using copy_to are not present in synthetic source (#112625) #112835

Merged

elasticsearchmachine pushed a commit that referenced this pull request Sep 12, 2024

Ensure that fields copied using copy_to are not present in synthetic …

12b4900

…source (#112625) (#112835)

