Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Replace unsupported data extraction with unsupported field #99506

Closed
costin opened this issue Sep 13, 2023 · 2 comments · Fixed by #99588
Closed

ESQL: Replace unsupported data extraction with unsupported field #99506

costin opened this issue Sep 13, 2023 · 2 comments · Fixed by #99588
Assignees
Labels

Comments

@costin
Copy link
Member

costin commented Sep 13, 2023

Description

#99213 and #99183 indicate that there are cases where the data extraction fails which causes the query to fail.
Instead of blowing up the exception, we need to mark the column/value as unsupported and move on.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine elasticsearchmachine added the Team:QL (Deprecated) Meta label for query languages team label Sep 13, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

elasticsearchmachine pushed a commit that referenced this issue Sep 19, 2023
Fixes #99506

This is an attempt to avoid runtime exceptions when dealing with fields
that have no `doc_values` support (eg. that are not indexed) and then
cannot be extracted. In this scenario, we currently support extraction
of keyword fields and of text fields when source is present (ie. not
with synthetic source), but for all the other types, ESQL fails with an
exception.

This PR adds a last resort defense against these errors, returning a
default value when actual values cannot be extracted, and avoiding
runtime failures.

There is a significant change here that impacts unsupported fields:
before this PR, unsupported field values where returned as
`"<unsupported>"`. We cannot use this value also in this case, because
at this stage the data type is well defined already, with all the
constraints in terms of block types, so we cannot always return a
BytesRef (eg. for numeric, that has different blocks, or IP, that
requires a specific string format). To avoid multiple ways of returning
invalid values, this PR uniforms it returning `null` in all cases AND
emitting a warning regarding the unsupported field.

This has a few advantages: - `null` is valid for all types - it doesn't
overlap with valid values (eg. `"<unsupported>"` is a valid value for a
KEYWORD field) - it's the only alternative for types where defining a
value for unsupported values is practically impossible, like for
numerics - keeps the result clean, moving the report of the problem to
the right place, that is warnings

There is an alternative to this approach, that is to try to intercept
the problem during the query analysis/resolution phase. It could be more
elegant, but we risk to miss some cases and still have to catch errors
during the physical/extraction/execution phase, so we'll probably need
this anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants