-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ignore_malformed behaviour for unsigned long fields #110045
Changes from all commits
3efacc8
3b555ab
a2d7efb
aac3aa8
e6cf588
96c49e4
f21bce8
d515a8c
212ce37
9b56f72
12865e6
7df4159
30bc13a
b306187
2d0fda2
f4009f5
e31c841
15669d3
17cad30
cecca7b
4a411f7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
pr: 110045 | ||
summary: fix ignore_malformed behaviour for unsigned long fields | ||
area: Logs | ||
type: bug | ||
issues: | ||
- 109705 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,262 @@ | ||
--- | ||
"unsigned_long with ignore_malformed in synthetic source": | ||
- do: | ||
indices.create: | ||
index: test-synthetic | ||
body: | ||
mappings: | ||
_source: | ||
mode: synthetic | ||
properties: | ||
ul_ignored: | ||
type: unsigned_long | ||
ignore_malformed: true | ||
ul_not_ignored: | ||
type: unsigned_long | ||
ignore_malformed: false | ||
|
||
- do: | ||
catch: bad_request | ||
index: | ||
refresh: true | ||
index: test-synthetic | ||
id: "1" | ||
body: { "ul_ignored": 2000, "ul_not_ignored": { "key": "bar", value: 200 } } | ||
|
||
- match: { error.root_cause.0.type: "document_parsing_exception" } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-synthetic | ||
id: "2" | ||
body: { "ul_ignored": { "key": "foo", "value": 100 }, "ul_not_ignored": 1000 } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-synthetic | ||
id: "3" | ||
body: { "ul_ignored": [ 100, 200, 300 ], "ul_not_ignored": 3000 } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-synthetic | ||
id: "4" | ||
body: { "ul_ignored": [ { "key": "a", "value": 100 }, { "key": "b", "value": 200 } ], "ul_not_ignored": 4000 } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-synthetic | ||
id: "5" | ||
body: { "ul_ignored": [1, { "key": "foo", "value": "bar" }, 3], "ul_not_ignored": 4000 } | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the case where I see different behaviour between stored and synthetic source. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This in an operation against There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes I fixed it yesterday but didn't push since I was doing some more tests...that is the reason why I saw dynamic mapping kick in... |
||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-synthetic | ||
id: "6" | ||
body: { "ul_ignored": [ 1, "foo", 3, "bar" ], "ul_not_ignored": 4000 } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-synthetic | ||
id: "7" | ||
body: { "ul_not_ignored": 7000 } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-synthetic | ||
id: "8" | ||
body: { "ul_ignored": "", "ul_not_ignored": 8000 } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-synthetic | ||
id: "9" | ||
body: { "ul_ignored": " ", "ul_not_ignored": 9000 } | ||
|
||
- do: | ||
search: | ||
index: test-synthetic | ||
body: | ||
query: | ||
match_all: { } | ||
|
||
- match: { hits.total.value: 8 } | ||
|
||
- match: { hits.hits.0._id: "2" } | ||
- match: { hits.hits.0._source.ul_ignored: { "key": "foo", "value": 100 } } | ||
- match: { hits.hits.0._source.ul_not_ignored: 1000 } | ||
- match: { hits.hits.0._ignored.0: "ul_ignored" } | ||
|
||
- match: { hits.hits.1._id: "3" } | ||
- match: { hits.hits.1._source.ul_ignored: [ 100, 200, 300 ] } | ||
- match: { hits.hits.1._source.ul_not_ignored: 3000 } | ||
- match: { hits.hits.1._ignored: null } | ||
|
||
- match: { hits.hits.2._id: "4" } | ||
- match: { hits.hits.2._source.ul_ignored: [ { "key": "a", "value": 100 }, { "key": "b", "value": 200 } ] } | ||
- match: { hits.hits.2._source.ul_not_ignored: 4000 } | ||
- match: { hits.hits.2._ignored.0: "ul_ignored" } | ||
|
||
- match: { hits.hits.3._id: "5" } | ||
# NOTE: values sorting is different due to synthetic source using doc values | ||
- match: { hits.hits.3._source.ul_ignored: [ 1, 3, { "key": "foo", "value": "bar" } ] } | ||
- match: { hits.hits.3._source.ul_not_ignored: 4000 } | ||
- match: { hits.hits.3._ignored.0: "ul_ignored" } | ||
|
||
- match: { hits.hits.4._id: "6" } | ||
# NOTE: values sorting is different due to synthetic source using doc values | ||
- match: { hits.hits.4._source.ul_ignored: [ 1, 3, "foo", "bar" ] } | ||
- match: { hits.hits.4._source.ul_not_ignored: 4000 } | ||
- match: { hits.hits.4._ignored.0: "ul_ignored" } | ||
|
||
- match: { hits.hits.5._id: "7" } | ||
- match: { hits.hits.5._source.ul_ignored: null } | ||
- match: { hits.hits.5._source.ul_not_ignored: 7000 } | ||
|
||
- match: { hits.hits.6._id: "8" } | ||
# NOTE: empty string treated as null value | ||
- match: { hits.hits.6._source.ul_ignored: null } | ||
- match: { hits.hits.6._source.ul_not_ignored: 8000 } | ||
|
||
- match: { hits.hits.7._id: "9" } | ||
- match: { hits.hits.7._source.ul_ignored: " " } | ||
- match: { hits.hits.7._source.ul_not_ignored: 9000 } | ||
- match: { hits.hits.7._ignored.0: "ul_ignored" } | ||
|
||
--- | ||
"unsigned_long with ignore_malformed with stored source": | ||
- do: | ||
indices.create: | ||
index: test-stored | ||
body: | ||
mappings: | ||
_source: | ||
mode: stored | ||
properties: | ||
ul_ignored: | ||
type: unsigned_long | ||
ignore_malformed: true | ||
ul_not_ignored: | ||
type: unsigned_long | ||
ignore_malformed: false | ||
|
||
- do: | ||
catch: bad_request | ||
index: | ||
refresh: true | ||
index: test-stored | ||
id: "1" | ||
body: { "ul_ignored": 2000, "ul_not_ignored": { "key": "bar", value: 200 } } | ||
|
||
- match: { error.root_cause.0.type: "document_parsing_exception" } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-stored | ||
id: "2" | ||
body: { "ul_ignored": { "key": "foo", "value": 100 }, "ul_not_ignored": 1000 } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-stored | ||
id: "3" | ||
body: { "ul_ignored": [ 100, 200, 300 ], "ul_not_ignored": 3000 } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-stored | ||
id: "4" | ||
body: { "ul_ignored": [ { "key": "a", "value": 100 }, { "key": "b", "value": 200 } ], "ul_not_ignored": 4000 } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-stored | ||
id: "5" | ||
body: { "ul_ignored": [1, { "key": "foo", "value": "bar" }, 3], "ul_not_ignored": 4000 } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-stored | ||
id: "6" | ||
body: { "ul_ignored": [ 1, "foo", 3, "bar" ], "ul_not_ignored": 4000 } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-stored | ||
id: "7" | ||
body: { "ul_not_ignored": 7000 } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-stored | ||
id: "8" | ||
body: { "ul_ignored": "", "ul_not_ignored": 8000 } | ||
|
||
- do: | ||
index: | ||
refresh: true | ||
index: test-stored | ||
id: "9" | ||
body: { "ul_ignored": " ", "ul_not_ignored": 9000 } | ||
|
||
- do: | ||
search: | ||
index: test-stored | ||
body: | ||
query: | ||
match_all: { } | ||
|
||
- match: { hits.total.value: 8 } | ||
|
||
- match: { hits.hits.0._id: "2" } | ||
- match: { hits.hits.0._source.ul_ignored: { "key": "foo", "value": 100 } } | ||
- match: { hits.hits.0._source.ul_not_ignored: 1000 } | ||
- match: { hits.hits.0._ignored.0: "ul_ignored" } | ||
|
||
- match: { hits.hits.1._id: "3" } | ||
- match: { hits.hits.1._source.ul_ignored: [ 100, 200, 300 ] } | ||
- match: { hits.hits.1._source.ul_not_ignored: 3000 } | ||
- match: { hits.hits.1._ignored: null } | ||
|
||
- match: { hits.hits.2._id: "4" } | ||
- match: { hits.hits.2._source.ul_ignored: [ { "key": "a", "value": 100 }, { "key": "b", "value": 200 } ] } | ||
- match: { hits.hits.2._source.ul_not_ignored: 4000 } | ||
- match: { hits.hits.2._ignored.0: "ul_ignored" } | ||
|
||
- match: { hits.hits.3._id: "5" } | ||
- match: { hits.hits.3._source.ul_ignored: [ 1, { "key": "foo", "value": "bar" }, 3 ] } | ||
- match: { hits.hits.3._source.ul_not_ignored: 4000 } | ||
- match: { hits.hits.3._ignored.0: "ul_ignored" } | ||
|
||
- match: { hits.hits.4._id: "6" } | ||
- match: { hits.hits.4._source.ul_ignored: [ 1, "foo", 3, "bar" ] } | ||
- match: { hits.hits.4._source.ul_not_ignored: 4000 } | ||
- match: { hits.hits.4._ignored.0: "ul_ignored" } | ||
|
||
- match: { hits.hits.5._id: "7" } | ||
- match: { hits.hits.5._source.ul_ignored: null } | ||
- match: { hits.hits.5._source.ul_not_ignored: 7000 } | ||
|
||
- match: { hits.hits.6._id: "8" } | ||
- match: { hits.hits.6._source.ul_ignored: "" } | ||
- match: { hits.hits.6._source.ul_not_ignored: 8000 } | ||
|
||
- match: { hits.hits.7._id: "9" } | ||
- match: { hits.hits.7._source.ul_ignored: " " } | ||
- match: { hits.hits.7._source.ul_not_ignored: 9000 } | ||
- match: { hits.hits.7._ignored.0: "ul_ignored" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing a return here? Right now it will still throw an exception with ignore malformed enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a closer look and there is a problem that when the source is not synthetic we'll not advance the parser to the end of the object and fail later with parsing exception (
found extra data after parsing: END_OBJECT
). I think this is actually a reason that we haveparser.currentToken().isValue()
check when handlingignore_malformed
.It looks like my expectations in #109705 are incorrect and this works as designed - "value fields" like numbers don't take objects as inputs even with
ignore_malformed
. If we wanted to change that we probably need a wider group decision. What do you think?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not expect non-object fields (line keyword or integer) to accept object-like values...but then we need to agree on how to handle them when it comes to
ignore_malformed
I see most of our code does not parse objects for things other than object-like types. It looks likeignore_malformed
is more for things like numbers which are not numbers or maybe out of range values and so on...I don't think we can catch all kind of parsing issues.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other end, anyway, I think the purpose of
ignore_malformed
is to avoid documents being rejected because of a a malformed field value. So, probably, the right behaviour would be to just parse the object until the end so to avoid the assertion failure later but then add the field to ignored fields and store its value.