Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ignore_malformed behaviour for unsigned long fields #110045

Conversation

salvatore-campagna
Copy link
Contributor

@salvatore-campagna salvatore-campagna commented Jun 21, 2024

When an object is supplied as a value for a field whose type is unsigned_log we need to trigger
handling of the field value using our ignore malformed handling strategy. The IllegalStateException
happens because the parser does not handle an object value being supplied and the try/catch only
deals with IllegalArgumentException when adding the malformed value.

Resolves #109705

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @salvatore-campagna, I've created a changelog YAML for you.

@salvatore-campagna salvatore-campagna added the test-full-bwc Trigger full BWC version matrix tests label Jun 21, 2024
@salvatore-campagna
Copy link
Contributor Author

@elasticsearchmachine test this please

@salvatore-campagna salvatore-campagna changed the title Parsing objects for unsigned long fields Ignore malformed objects values for unsigned long fields Jun 22, 2024
@salvatore-campagna salvatore-campagna changed the title Ignore malformed objects values for unsigned long fields Ignore malformed object values for unsigned long fields Jun 22, 2024
refresh: true
index: test-stored
id: "5"
body: { "ul_ignored": [1, { "key": "foo", "value": "bar" }, 3], "ul_not_ignored": 4000 }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the case where I see different behaviour between stored and synthetic source.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This in an operation against test-stored which does not exist at this time right. So this is going into dynamic fields code path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I fixed it yesterday but didn't push since I was doing some more tests...that is the reason why I saw dynamic mapping kick in...

@salvatore-campagna
Copy link
Contributor Author

Note that the yaml test including the error messages and exceptions has the same behaviour we have in main right now before merging this PR. SO this PR is not actually changing the bahaviour when it comes to the response when an indexing error (parsing) takes place.

@@ -0,0 +1,6 @@
pr: 110045
summary: Parsing objects for unsigned long fields
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's align this with PR title.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change this before I merge.

context.addIgnoredField(mappedFieldType.name());
if (isSourceSynthetic) {
context.doc().add(IgnoreMalformedStoredValues.storedField(fullPath(), context.parser()));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we missing a return here? Right now it will still throw an exception with ignore malformed enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a closer look and there is a problem that when the source is not synthetic we'll not advance the parser to the end of the object and fail later with parsing exception (found extra data after parsing: END_OBJECT). I think this is actually a reason that we have parser.currentToken().isValue() check when handling ignore_malformed.

It looks like my expectations in #109705 are incorrect and this works as designed - "value fields" like numbers don't take objects as inputs even with ignore_malformed. If we wanted to change that we probably need a wider group decision. What do you think?

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not expect non-object fields (line keyword or integer) to accept object-like values...but then we need to agree on how to handle them when it comes to ignore_malformed I see most of our code does not parse objects for things other than object-like types. It looks like ignore_malformed is more for things like numbers which are not numbers or maybe out of range values and so on...I don't think we can catch all kind of parsing issues.

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other end, anyway, I think the purpose of ignore_malformed is to avoid documents being rejected because of a a malformed field value. So, probably, the right behaviour would be to just parse the object until the end so to avoid the assertion failure later but then add the field to ignored fields and store its value.

refresh: true
index: test-stored
id: "5"
body: { "ul_ignored": [1, { "key": "foo", "value": "bar" }, 3], "ul_not_ignored": 4000 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This in an operation against test-stored which does not exist at this time right. So this is going into dynamic fields code path.

@salvatore-campagna
Copy link
Contributor Author

salvatore-campagna commented Jun 27, 2024

@martijnvg @lkts with the last commit I have done it how it should be in my opinion. If a value that is not an unsigned long is provided the behaviour changes according to ignore_malformed being true or false at indexing time. When it comes to synthetic source, instead what changes is the usage of the stored field to save the field value. As a result, when reconstructing the document the only difference is in synthetic source limitations that we are aware of. Also I had to add a call to parser.skipChildren() to avoid and invalid state of the parser that we assert doing assert token.isValue(); in DocumentParser.

@salvatore-campagna salvatore-campagna changed the title Ignore malformed object values for unsigned long fields Fix ignore_malformed behaviour for unsigned long fields Jun 27, 2024
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@lkts
Copy link
Contributor

lkts commented Jun 27, 2024

The approach LGTM too. There may be some overlap here with #12366.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :StorageEngine/Logs You know, for Logs Team:StorageEngine test-full-bwc Trigger full BWC version matrix tests v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unsigned_long field type fails to index specific malformed data even with ignore_malformed enabled
5 participants