Fix ignore_malformed behaviour for unsigned long fields #110045

salvatore-campagna · 2024-06-21T13:24:10Z

When an object is supplied as the value of a field whose type is unsigned_long, we need to handle
that value through our ignore_malformed handling strategy. The IllegalStateException
happens because the parser does not handle an object value being supplied, and the try/catch only
deals with IllegalArgumentException when adding the malformed value.

Resolves #109705
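
A minimal sketch of the failure mode described above, written in plain Java rather than against the real mapper code. `CatchScopeSketch`, `parseValue`, `index`, and the exception message are hypothetical stand-ins. The only point illustrated is that a `catch (IllegalArgumentException e)` clause does not intercept an `IllegalStateException`, so the ignore_malformed branch is never reached when the failure is of that type.

```java
import java.util.List;
import java.util.Map;

final class CatchScopeSketch {
    // Hypothetical stand-in for field value parsing: object values fail with
    // IllegalStateException, other malformed input fails with
    // NumberFormatException (a subclass of IllegalArgumentException).
    static long parseValue(Object raw) {
        if (raw instanceof Map) {
            throw new IllegalStateException("object value is not supported here");
        }
        return Long.parseUnsignedLong(String.valueOf(raw));
    }

    // Mirrors the shape of the problem: only IllegalArgumentException is caught,
    // so an IllegalStateException propagates even when ignoreMalformed is true.
    static String index(String field, Object raw, boolean ignoreMalformed, List<String> ignoredFields) {
        try {
            parseValue(raw);
            return "indexed";
        } catch (IllegalArgumentException e) {
            if (ignoreMalformed) {
                ignoredFields.add(field);
                return "ignored";
            }
            throw e;
        }
    }
}
```

In this sketch a string such as `"foo"` is recorded as ignored, while a `Map` value escapes the handler entirely, which is the gap this PR targets.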

elasticsearchmachine · 2024-06-21T13:24:35Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

elasticsearchmachine · 2024-06-21T13:27:19Z

Hi @salvatore-campagna, I've created a changelog YAML for you.

salvatore-campagna · 2024-06-21T20:54:10Z

@elasticsearchmachine test this please

salvatore-campagna · 2024-06-25T15:25:18Z

...n/mapper-unsigned-long/src/yamlRestTest/resources/rest-api-spec/test/80_ignore_malformed.yml

+        refresh: true
+        index: test-stored
+        id: "5"
+        body: { "ul_ignored": [1, { "key": "foo", "value": "bar" }, 3], "ul_not_ignored": 4000 }


This is the case where I see different behaviour between stored and synthetic source.

This is an operation against test-stored, which does not exist at this time, right? So this goes into the dynamic fields code path.

Yes, I fixed it yesterday but didn't push since I was doing some more tests. That is the reason why I saw dynamic mapping kick in.

salvatore-campagna · 2024-06-26T07:51:28Z

Note that the YAML test, including the error messages and exceptions, has the same behaviour we have in main right now, before merging this PR. So this PR does not actually change the behaviour of the response when an indexing (parsing) error takes place.

lkts · 2024-06-26T17:57:16Z

docs/changelog/110045.yaml

@@ -0,0 +1,6 @@
+pr: 110045
+summary: Parsing objects for unsigned long fields


Let's align this with PR title.

Will change this before I merge.

lkts · 2024-06-26T19:28:33Z

...nsigned-long/src/main/java/org/elasticsearch/xpack/unsignedlong/UnsignedLongFieldMapper.java

+                        context.addIgnoredField(mappedFieldType.name());
+                        if (isSourceSynthetic) {
+                            context.doc().add(IgnoreMalformedStoredValues.storedField(fullPath(), context.parser()));
+                        }


Are we missing a return here? Right now it will still throw an exception with ignore malformed enabled.

I took a closer look and there is a problem: when the source is not synthetic, we do not advance the parser to the end of the object and fail later with a parsing exception (found extra data after parsing: END_OBJECT). I think this is actually a reason that we have the parser.currentToken().isValue() check when handling ignore_malformed.

It looks like my expectations in #109705 are incorrect and this works as designed: "value fields" like numbers don't take objects as inputs even with ignore_malformed. If we wanted to change that we would probably need a wider group decision. What do you think?

I do not expect non-object fields (like keyword or integer) to accept object-like values, but then we need to agree on how to handle them when it comes to ignore_malformed. I see most of our code does not parse objects for anything other than object-like types. It looks like ignore_malformed is more for things like numbers that are not numbers, out-of-range values, and so on. I don't think we can catch all kinds of parsing issues.

On the other hand, I think the purpose of ignore_malformed is to avoid documents being rejected because of a malformed field value. So the right behaviour is probably to parse the object to its end, to avoid the assertion failure later, and then add the field to the ignored fields and store its value.
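
The outcome proposed in this thread can be sketched roughly as follows. This is an illustration only, assuming a hypothetical `Doc` holder and `handleMalformed` helper; it is not the mapper's actual code. It follows the shape of the diff hunk quoted above: the field is always added to the ignored fields when ignore_malformed is enabled, and the raw value is kept in a stored field only when the source is synthetic.

```java
import java.util.ArrayList;
import java.util.List;

final class MalformedOutcomeSketch {
    // Hypothetical document holder; the real code works with a parser context.
    static final class Doc {
        final List<String> ignoredFields = new ArrayList<>();
        final List<String> storedMalformedValues = new ArrayList<>();
    }

    // Hypothetical helper: decides what happens to a malformed value.
    static void handleMalformed(Doc doc, String field, String rawValue,
                                boolean ignoreMalformed, boolean syntheticSource) {
        if (!ignoreMalformed) {
            // Without ignore_malformed the document is rejected.
            throw new IllegalArgumentException("failed to parse field [" + field + "]");
        }
        // With ignore_malformed the field is recorded as ignored...
        doc.ignoredFields.add(field);
        // ...and the raw value is only stored when the source must be rebuilt from fields.
        if (syntheticSource) {
            doc.storedMalformedValues.add(rawValue);
        }
    }
}
```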

lkts · 2024-06-26T19:46:00Z

...n/mapper-unsigned-long/src/yamlRestTest/resources/rest-api-spec/test/80_ignore_malformed.yml

+        refresh: true
+        index: test-stored
+        id: "5"
+        body: { "ul_ignored": [1, { "key": "foo", "value": "bar" }, 3], "ul_not_ignored": 4000 }


This is an operation against test-stored, which does not exist at this time, right? So this goes into the dynamic fields code path.

salvatore-campagna · 2024-06-27T12:28:13Z

@martijnvg @lkts with the last commit I have implemented it the way I think it should work. If a value that is not an unsigned long is provided, the behaviour at indexing time changes according to whether ignore_malformed is true or false. With synthetic source, what changes instead is that a stored field is used to save the field value. As a result, when reconstructing the document the only difference is in the synthetic source limitations that we are aware of. I also had to add a call to parser.skipChildren() to avoid an invalid parser state, which we check with assert token.isValue(); in DocumentParser.
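
To illustrate why skipping children matters, here is a small self-contained sketch of a token cursor. `SkipChildrenSketch`, `Token`, and `TokenCursor` are hypothetical and only imitate the general contract of a streaming parser's skip-children operation: when positioned on an opening token, advance to the matching closing token. It assumes a well-formed token stream and is not the Elasticsearch parser.

```java
import java.util.List;

final class SkipChildrenSketch {
    enum Token { START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY, FIELD_NAME, VALUE }

    // Hypothetical cursor over a pre-tokenized, well-formed document.
    static final class TokenCursor {
        private final List<Token> tokens;
        private int pos;

        TokenCursor(List<Token> tokens, int start) {
            this.tokens = tokens;
            this.pos = start;
        }

        Token current() {
            return tokens.get(pos);
        }

        Token next() {
            pos++;
            return tokens.get(pos);
        }

        // If positioned on START_OBJECT or START_ARRAY, advance to the matching
        // END_OBJECT or END_ARRAY; otherwise do nothing.
        void skipChildren() {
            Token t = current();
            if (t != Token.START_OBJECT && t != Token.START_ARRAY) {
                return;
            }
            int depth = 1;
            while (depth > 0) {
                Token n = next();
                if (n == Token.START_OBJECT || n == Token.START_ARRAY) {
                    depth++;
                } else if (n == Token.END_OBJECT || n == Token.END_ARRAY) {
                    depth--;
                }
            }
        }
    }
}
```

For the test body `[1, { "key": "foo", "value": "bar" }, 3]`, a cursor left on the object's opening token would hand the caller field names and nested values where a plain value is expected. After skipping, the cursor sits on the object's closing token and the next token is the value `3`.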

martijnvg

lgtm

lkts · 2024-06-27T17:45:59Z

The approach LGTM too. There may be some overlap here with #12366.

fix: parsing objects for unsigned long fields

3efacc8

salvatore-campagna added the :StorageEngine/Logs You know, for Logs label Jun 21, 2024

salvatore-campagna requested review from martijnvg and lkts June 21, 2024 13:24

salvatore-campagna self-assigned this Jun 21, 2024

elasticsearchmachine added the v8.15.0 label Jun 21, 2024

elasticsearchmachine added the Team:StorageEngine label Jun 21, 2024

fix: remove duplicate match_all query

3b555ab

salvatore-campagna added the >bug label Jun 21, 2024

Update docs/changelog/110045.yaml

a2d7efb

salvatore-campagna added 4 commits June 21, 2024 15:29

dry: extract method

aac3aa8

nit: error message

e6cf588

fix: use isValue

96c49e4

test: add a few more malformed tests

f21bce8

salvatore-campagna added the test-full-bwc Trigger full BWC version matrix tests label Jun 21, 2024

salvatore-campagna and others added 5 commits June 21, 2024 22:56

test: check no ignored field

d515a8c

fix: hits array idnex

212ce37

fix: _ignored null check instead of length

9b56f72

Merge branch 'main' into fix/109705-unsigned-long-ignore-malformed

12865e6

fix: compiler error after conflict resolution

7df4159

salvatore-campagna changed the title ~~Parsing objects for unsigned long fields~~ Ignore malformed objects values for unsigned long fields Jun 22, 2024

salvatore-campagna changed the title ~~Ignore malformed objects values for unsigned long fields~~ Ignore malformed object values for unsigned long fields Jun 22, 2024

salvatore-campagna added 5 commits June 22, 2024 22:56

fix: keep existing failure behavior if not malformed or not stored so…

30bc13a

…urce

test: synthetic source and stored source

b306187

fix: assign parsing result

2d0fda2

fix: catch merge object with non-object

f4009f5

fix: error handling while parsing unsigned longs

e31c841

salvatore-campagna commented Jun 25, 2024

fix: uncomment different exception

15669d3

lkts reviewed Jun 26, 2024

salvatore-campagna added 2 commits June 26, 2024 22:39

fix: index and error

17cad30

fix: ignore_malformed for unsigned long

cecca7b

salvatore-campagna changed the title ~~Ignore malformed object values for unsigned long fields~~ Fix ignore_malformed behaviour for unsigned long fields Jun 27, 2024

fix: changelog summary

4a411f7

martijnvg approved these changes Jun 27, 2024

elasticsearchmachine added v8.16.0 and removed v8.15.0 labels Jul 4, 2024

mark-vieira added v9.0.0 and removed v8.16.0 labels Sep 11, 2024

salvatore-campagna closed this Oct 3, 2024
