Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: syntax error on date math if date period variable is at the end of the expression #99293

Open
dej611 opened this issue Sep 7, 2023 · 12 comments
Labels
:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@dej611
Copy link
Contributor

dej611 commented Sep 7, 2023

Elasticsearch Version

8.11.0

Installed Plugins

No response

Java Version

bundled

OS Version

MacOS

Problem Description

Testing date math in ES|QL in Kibana after #98870 and noticed that the date math feature kicks in only when the date period variable is not at the end of the statement.

Steps to Reproduce

This works:

row n = now() | eval then = n - 1 year + 2 minutes

as in the PR works ✅

but the same expression with a different order errors out:

row n = now() | eval then = - 1 year - 2 minutes + n

❌ Errors: Unexpected error from Elasticsearch: verification_exception - Found 1 problem line 1:29: [-] has arguments with incompatible types [DATE_PERIOD] and [TIME_DURATION]

This one works instead ✅ :

row n = now() | eval then = - 1 year + n - 2 minutes

Logs (if relevant)

No response

@dej611 dej611 added >bug needs:triage Requires assignment of a team area label :Analytics/ES|QL AKA ESQL labels Sep 7, 2023
@elasticsearchmachine elasticsearchmachine added Team:QL (Deprecated) Meta label for query languages team and removed needs:triage Requires assignment of a team area label labels Sep 7, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

@alex-spies
Copy link
Contributor

I had a super quick look because I was curious.

The exception is triggered in org/elasticsearch/xpack/esql/expression/predicate/operator/arithmetic/DateTimeArithmeticOperation.java, in resolveType. The way date math currently works is, I think, that we can always add a duration to a datetime, but we cannot add two durations. The same holds for subtractions.

The following fails for the same reason, and does not contain subtractions/negatives:

row n = now() | eval then = 1 year + 2 minutes + n

It does work just fine, though, if we set parentheses:

row n = now() | eval then = 1 year + (2 minutes + n)

Resolving this issue may require us to allow adding/subtracting time periods and only disallowing time periods as the final output of an expression instead.

@alex-spies
Copy link
Contributor

I made a bit of progress on this one.

The problem already occurs when using exclusively date periods, e.g. years:

row n = now() | eval then = 1 year + 2 year + n

The problem is that this is equivalent to

row n = now() | eval then = (1 year + 2 year) + n

and the sum 1year + 2year should be folded into 3 year even before execution. However, the folding in DateTimeArithmeticOperation is the default folding, which falls back to using toEvaluator. toEvaluator, however, is only implemented for datetime +/- time_duration and datetime +/- date_period.

I think this issue can be solved in two steps:

  • Override fold to enable adding/subtracting time_duration and date_period literal expressions.
  • Generalize DateTimeArithmeticOperation to support adding/subtracting combinations of time durations and date periods to datetimes.

@alex-spies
Copy link
Contributor

alex-spies commented Sep 11, 2023

Created a draft PR that already addresses adding/subtracting years before adding them to a datetime: #99432

@bpintea if you think the general direction of this is fine, I think I can take care of this issue.

@bpintea
Copy link
Contributor

bpintea commented Sep 13, 2023

@alex-spies an alternative approach might be to canonicalize ADD, having the DATETIME on the left and the temporal amount(s) on the right. This should work with the current folding approach of just chaining evaluators.

@alex-spies
Copy link
Contributor

Thanks @bpintea , that's a good point and would work as well.

On the other hand, wouldn't we normally want to canonicalize in a way that allows for the most folding possible? Before the evaluator is even triggered, I think our constant folding optimization rule should do as much work as possible, and ideally we should only be left with just one date math ADD or SUB (or two, if we consider both date periods and time durations), right?

This requires that we can fold periods/durations in the first place, so I think we'll have to do something about the fold method one way or another, no?

@bpintea
Copy link
Contributor

bpintea commented Sep 13, 2023

I think we'll have to do something about the fold method one way or another, no?

Yes, indeed! We can have the proposed solution as an intermediary one until something more generic is in place. But was thinking that if it's intermediary, a quicker fix might be to just canonicalize. Not having strong feelings about it, though.

@alex-spies
Copy link
Contributor

Alright, PR #99432 enables part of this functionality: expressions like 1week + 1day + now() as well as 1minute + 1 hour + now() (also with minus signs wherever) are covered fine.

What's missing after that are mixed expressions, where both periods (days, months etc.) and durations (hours, minutes etc.) occur and are added to each other before being added to a datetime. This includes the initial expression that started this issue: - 1 year - 2 minutes + n (which is equivalent to (-1year -2minutes) + n).

I'll try to summarize the possible approaches.

Approach 1: Add another unrepresentable data type

We could declare another data type that is comprised of a period and a duration; 1year + 1hour would then be a valid value for this data type. By allowing arithmetics with this data type and datetimes, expressions like (1year + 1hour) + now() would become valid.

Pros

  • Probably easy-ish to implement in the current state.
  • Maybe we can consolidate both durations and periods into the new data type, simplifying some logic.

Cons

  • Since this is a new data type, it will probably proliferate into lots of places, and potentially become even somewhat user-facing (at least in error messages).
  • Removing the new data type later for whatever reason might be complicated or impossible.

Approach 2: Rely on correct parentheses/canonicalization of expressions

@bpintea suggested that canonicalizing expressions would solve this issue. This relates to issue #99502, which suggests canonicalizing arithmetic expressions (though for other reasons). By exploiting (or imposing) associativity and commutativity of +, an expression like (1year + 1hour) + now() could be transformed into (now() + 1year) + 1hour, so that no sums involving both periods and durations will ever have to be evaluated.

More generally, we could canonicalize any date math expression by grouping all period summands and all duration summands, so that e.g.

1 year + 1 hour + 1 month + 1 second + now() + 1 millisecond

becomes

(now() + (1year + 1month)) + (1hour + 1millisecond)

Pros

  • Also improves performance, as constant folding will turn any date math expression into an operation that involves at most two sums. (((now() + 1year) + 1month) -2year)-2month would just become (now() -1year)-1month.
  • This effectively imposes associativity on date math expressions. Note that
    (2023-04-30 + (1 month + 2day)) -(1month +2day) == 2023-04-29
    
    because of the order of execution. With canonicalization, the expression above will just become 2023-04-30 + (0month + 0day) instead of creating an unexpected result.

Cons

  • Relying on canonicalization of an expression to be able to evaluate it, at all, is more complex than just writing another evaluator (or constant folding code).
  • Since canonicalization also contributes to performance, this might lead to complex situations and harder to maintain code.

@alex-spies
Copy link
Contributor

Since both possible approaches are either impactful or (potentially) complex, I think we need to postpone the full solution to this issue for a bit.

alex-spies added a commit that referenced this issue Sep 20, 2023
- Enable date math expressions like (1year + 1month) + now() and (1hour
  + 1second) + now().                          
- To achieve this, enable folding of + and - for duration/period literals.
- Provide user-friendly error messages for date time arithmetic
  overflows, and for illegal expressions like eval x = 1week + 1month
  whose type is not representable.

Partially solves #99293.
@costin costin added this to the 8.12 milestone Nov 16, 2023
@alex-spies alex-spies removed this from the 8.12 milestone Nov 21, 2023
@wchaparro wchaparro added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 2, 2024
@elasticsearchmachine elasticsearchmachine removed the Team:QL (Deprecated) Meta label for query languages team label Jan 2, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@alex-spies alex-spies removed their assignment Jul 4, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@bpintea bpintea removed their assignment Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

No branches or pull requests

6 participants