Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ignore_malformed behaviour for unsigned long fields #110045

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
3efacc8
fix: parsing objects for unsigned long fields
salvatore-campagna Jun 21, 2024
3b555ab
fix: remove duplicate match_all query
salvatore-campagna Jun 21, 2024
a2d7efb
Update docs/changelog/110045.yaml
salvatore-campagna Jun 21, 2024
aac3aa8
dry: extract method
salvatore-campagna Jun 21, 2024
e6cf588
nit: error message
salvatore-campagna Jun 21, 2024
96c49e4
fix: use isValue
salvatore-campagna Jun 21, 2024
f21bce8
test: add a few more malformed tests
salvatore-campagna Jun 21, 2024
d515a8c
test: check no ignored field
salvatore-campagna Jun 21, 2024
212ce37
fix: hits array idnex
salvatore-campagna Jun 21, 2024
9b56f72
fix: _ignored null check instead of length
salvatore-campagna Jun 21, 2024
12865e6
Merge branch 'main' into fix/109705-unsigned-long-ignore-malformed
salvatore-campagna Jun 22, 2024
7df4159
fix: compiler error after conflict resolution
salvatore-campagna Jun 22, 2024
30bc13a
fix: keep existing failure behavior if not malformed or not stored so…
salvatore-campagna Jun 22, 2024
b306187
test: synthetic source and stored source
salvatore-campagna Jun 22, 2024
2d0fda2
fix: assign parsing result
salvatore-campagna Jun 24, 2024
f4009f5
fix: catch merge object with non-object
salvatore-campagna Jun 24, 2024
e31c841
fix: error handling while parsing unsigned longs
salvatore-campagna Jun 25, 2024
15669d3
fix: uncomment different exception
salvatore-campagna Jun 25, 2024
17cad30
fix: index and error
salvatore-campagna Jun 26, 2024
cecca7b
fix: ignore_malformed for unsigned long
salvatore-campagna Jun 27, 2024
4a411f7
fix: changelog summary
salvatore-campagna Jun 27, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions docs/changelog/110045.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
pr: 110045
summary: fix ignore_malformed behaviour for unsigned long fields
area: Logs
type: bug
issues:
- 109705
Original file line number Diff line number Diff line change
Expand Up @@ -626,7 +626,18 @@ protected void parseCreateField(DocumentParserContext context) throws IOExceptio
numericValue = null;
} else {
try {
if (parser.currentToken() == XContentParser.Token.VALUE_NUMBER) {
if (parser.currentToken().isValue() == false) {
if (ignoreMalformed.value()) {
context.addIgnoredField(mappedFieldType.name());
if (isSourceSynthetic) {
context.doc().add(IgnoreMalformedStoredValues.storedField(fullPath(), context.parser()));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we missing a return here? Right now it will still throw an exception with ignore malformed enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a closer look and there is a problem that when the source is not synthetic we'll not advance the parser to the end of the object and fail later with parsing exception (found extra data after parsing: END_OBJECT). I think this is actually a reason that we have parser.currentToken().isValue() check when handling ignore_malformed.

It looks like my expectations in #109705 are incorrect and this works as designed - "value fields" like numbers don't take objects as inputs even with ignore_malformed. If we wanted to change that we probably need a wider group decision. What do you think?

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not expect non-object fields (line keyword or integer) to accept object-like values...but then we need to agree on how to handle them when it comes to ignore_malformed I see most of our code does not parse objects for things other than object-like types. It looks like ignore_malformed is more for things like numbers which are not numbers or maybe out of range values and so on...I don't think we can catch all kind of parsing issues.

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other end, anyway, I think the purpose of ignore_malformed is to avoid documents being rejected because of a a malformed field value. So, probably, the right behaviour would be to just parse the object until the end so to avoid the assertion failure later but then add the field to ignored fields and store its value.

// NOTE: make sure we parse till the end to avoid the parser ends up in an illegal state
parser.skipChildren();
return;
}
numericValue = parseUnsignedLong(parser.text());
} else if (parser.currentToken() == XContentParser.Token.VALUE_NUMBER) {
numericValue = parseUnsignedLong(parser.numberValue());
} else {
numericValue = parseUnsignedLong(parser.text());
Expand All @@ -635,7 +646,6 @@ protected void parseCreateField(DocumentParserContext context) throws IOExceptio
if (ignoreMalformed.value() && parser.currentToken().isValue()) {
context.addIgnoredField(mappedFieldType.name());
if (isSourceSynthetic) {
// Save a copy of the field so synthetic source can load it
context.doc().add(IgnoreMalformedStoredValues.storedField(fullPath(), context.parser()));
}
return;
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,262 @@
---
"unsigned_long with ignore_malformed in synthetic source":
- do:
indices.create:
index: test-synthetic
body:
mappings:
_source:
mode: synthetic
properties:
ul_ignored:
type: unsigned_long
ignore_malformed: true
ul_not_ignored:
type: unsigned_long
ignore_malformed: false

- do:
catch: bad_request
index:
refresh: true
index: test-synthetic
id: "1"
body: { "ul_ignored": 2000, "ul_not_ignored": { "key": "bar", value: 200 } }

- match: { error.root_cause.0.type: "document_parsing_exception" }

- do:
index:
refresh: true
index: test-synthetic
id: "2"
body: { "ul_ignored": { "key": "foo", "value": 100 }, "ul_not_ignored": 1000 }

- do:
index:
refresh: true
index: test-synthetic
id: "3"
body: { "ul_ignored": [ 100, 200, 300 ], "ul_not_ignored": 3000 }

- do:
index:
refresh: true
index: test-synthetic
id: "4"
body: { "ul_ignored": [ { "key": "a", "value": 100 }, { "key": "b", "value": 200 } ], "ul_not_ignored": 4000 }

- do:
index:
refresh: true
index: test-synthetic
id: "5"
body: { "ul_ignored": [1, { "key": "foo", "value": "bar" }, 3], "ul_not_ignored": 4000 }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the case where I see different behaviour between stored and synthetic source.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This in an operation against test-stored which does not exist at this time right. So this is going into dynamic fields code path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I fixed it yesterday but didn't push since I was doing some more tests...that is the reason why I saw dynamic mapping kick in...


- do:
index:
refresh: true
index: test-synthetic
id: "6"
body: { "ul_ignored": [ 1, "foo", 3, "bar" ], "ul_not_ignored": 4000 }

- do:
index:
refresh: true
index: test-synthetic
id: "7"
body: { "ul_not_ignored": 7000 }

- do:
index:
refresh: true
index: test-synthetic
id: "8"
body: { "ul_ignored": "", "ul_not_ignored": 8000 }

- do:
index:
refresh: true
index: test-synthetic
id: "9"
body: { "ul_ignored": " ", "ul_not_ignored": 9000 }

- do:
search:
index: test-synthetic
body:
query:
match_all: { }

- match: { hits.total.value: 8 }

- match: { hits.hits.0._id: "2" }
- match: { hits.hits.0._source.ul_ignored: { "key": "foo", "value": 100 } }
- match: { hits.hits.0._source.ul_not_ignored: 1000 }
- match: { hits.hits.0._ignored.0: "ul_ignored" }

- match: { hits.hits.1._id: "3" }
- match: { hits.hits.1._source.ul_ignored: [ 100, 200, 300 ] }
- match: { hits.hits.1._source.ul_not_ignored: 3000 }
- match: { hits.hits.1._ignored: null }

- match: { hits.hits.2._id: "4" }
- match: { hits.hits.2._source.ul_ignored: [ { "key": "a", "value": 100 }, { "key": "b", "value": 200 } ] }
- match: { hits.hits.2._source.ul_not_ignored: 4000 }
- match: { hits.hits.2._ignored.0: "ul_ignored" }

- match: { hits.hits.3._id: "5" }
# NOTE: values sorting is different due to synthetic source using doc values
- match: { hits.hits.3._source.ul_ignored: [ 1, 3, { "key": "foo", "value": "bar" } ] }
- match: { hits.hits.3._source.ul_not_ignored: 4000 }
- match: { hits.hits.3._ignored.0: "ul_ignored" }

- match: { hits.hits.4._id: "6" }
# NOTE: values sorting is different due to synthetic source using doc values
- match: { hits.hits.4._source.ul_ignored: [ 1, 3, "foo", "bar" ] }
- match: { hits.hits.4._source.ul_not_ignored: 4000 }
- match: { hits.hits.4._ignored.0: "ul_ignored" }

- match: { hits.hits.5._id: "7" }
- match: { hits.hits.5._source.ul_ignored: null }
- match: { hits.hits.5._source.ul_not_ignored: 7000 }

- match: { hits.hits.6._id: "8" }
# NOTE: empty string treated as null value
- match: { hits.hits.6._source.ul_ignored: null }
- match: { hits.hits.6._source.ul_not_ignored: 8000 }

- match: { hits.hits.7._id: "9" }
- match: { hits.hits.7._source.ul_ignored: " " }
- match: { hits.hits.7._source.ul_not_ignored: 9000 }
- match: { hits.hits.7._ignored.0: "ul_ignored" }

---
"unsigned_long with ignore_malformed with stored source":
- do:
indices.create:
index: test-stored
body:
mappings:
_source:
mode: stored
properties:
ul_ignored:
type: unsigned_long
ignore_malformed: true
ul_not_ignored:
type: unsigned_long
ignore_malformed: false

- do:
catch: bad_request
index:
refresh: true
index: test-stored
id: "1"
body: { "ul_ignored": 2000, "ul_not_ignored": { "key": "bar", value: 200 } }

- match: { error.root_cause.0.type: "document_parsing_exception" }

- do:
index:
refresh: true
index: test-stored
id: "2"
body: { "ul_ignored": { "key": "foo", "value": 100 }, "ul_not_ignored": 1000 }

- do:
index:
refresh: true
index: test-stored
id: "3"
body: { "ul_ignored": [ 100, 200, 300 ], "ul_not_ignored": 3000 }

- do:
index:
refresh: true
index: test-stored
id: "4"
body: { "ul_ignored": [ { "key": "a", "value": 100 }, { "key": "b", "value": 200 } ], "ul_not_ignored": 4000 }

- do:
index:
refresh: true
index: test-stored
id: "5"
body: { "ul_ignored": [1, { "key": "foo", "value": "bar" }, 3], "ul_not_ignored": 4000 }

- do:
index:
refresh: true
index: test-stored
id: "6"
body: { "ul_ignored": [ 1, "foo", 3, "bar" ], "ul_not_ignored": 4000 }

- do:
index:
refresh: true
index: test-stored
id: "7"
body: { "ul_not_ignored": 7000 }

- do:
index:
refresh: true
index: test-stored
id: "8"
body: { "ul_ignored": "", "ul_not_ignored": 8000 }

- do:
index:
refresh: true
index: test-stored
id: "9"
body: { "ul_ignored": " ", "ul_not_ignored": 9000 }

- do:
search:
index: test-stored
body:
query:
match_all: { }

- match: { hits.total.value: 8 }

- match: { hits.hits.0._id: "2" }
- match: { hits.hits.0._source.ul_ignored: { "key": "foo", "value": 100 } }
- match: { hits.hits.0._source.ul_not_ignored: 1000 }
- match: { hits.hits.0._ignored.0: "ul_ignored" }

- match: { hits.hits.1._id: "3" }
- match: { hits.hits.1._source.ul_ignored: [ 100, 200, 300 ] }
- match: { hits.hits.1._source.ul_not_ignored: 3000 }
- match: { hits.hits.1._ignored: null }

- match: { hits.hits.2._id: "4" }
- match: { hits.hits.2._source.ul_ignored: [ { "key": "a", "value": 100 }, { "key": "b", "value": 200 } ] }
- match: { hits.hits.2._source.ul_not_ignored: 4000 }
- match: { hits.hits.2._ignored.0: "ul_ignored" }

- match: { hits.hits.3._id: "5" }
- match: { hits.hits.3._source.ul_ignored: [ 1, { "key": "foo", "value": "bar" }, 3 ] }
- match: { hits.hits.3._source.ul_not_ignored: 4000 }
- match: { hits.hits.3._ignored.0: "ul_ignored" }

- match: { hits.hits.4._id: "6" }
- match: { hits.hits.4._source.ul_ignored: [ 1, "foo", 3, "bar" ] }
- match: { hits.hits.4._source.ul_not_ignored: 4000 }
- match: { hits.hits.4._ignored.0: "ul_ignored" }

- match: { hits.hits.5._id: "7" }
- match: { hits.hits.5._source.ul_ignored: null }
- match: { hits.hits.5._source.ul_not_ignored: 7000 }

- match: { hits.hits.6._id: "8" }
- match: { hits.hits.6._source.ul_ignored: "" }
- match: { hits.hits.6._source.ul_not_ignored: 8000 }

- match: { hits.hits.7._id: "9" }
- match: { hits.hits.7._source.ul_ignored: " " }
- match: { hits.hits.7._source.ul_not_ignored: 9000 }
- match: { hits.hits.7._ignored.0: "ul_ignored" }