Optimize SQL expression comparison by caching hash codes #34149

ranma42 · 2024-07-03T12:43:22Z

In the query pipeline, expressions are sometimes compared for (deep) equality.
This is currently based on a recursive visit of the subtree, which can be very costly if it is repeated while visiting the expression (it is quite easy to construct expression trees that require O(n^2) operations when visited).

This could be improved by computing a hash that filters out most of the inequalities, as suggested in #34133 (comment).
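
To make the idea concrete, here is a minimal sketch of a hash used as a cheap inequality filter. It is written in Python for brevity rather than C#, and the `Node` class and its fields are hypothetical; it is not the EF Core `SqlExpression` API, and it assumes nodes are never mutated after construction.

```python
class Node:
    """Hypothetical immutable expression node; not the EF Core SqlExpression API."""

    __slots__ = ("kind", "children", "_hash")

    def __init__(self, kind, *children):
        self.kind = kind
        self.children = tuple(children)
        # Children already carry cached hashes, so this costs O(len(children)),
        # not O(size of the subtree).
        self._hash = hash((kind, tuple(c._hash for c in self.children)))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if self is other:
            return True
        if not isinstance(other, Node):
            return NotImplemented
        if self._hash != other._hash:
            # Differing hashes prove inequality without any recursion.
            return False
        # Equal hashes may still be a collision, so fall back to the deep comparison.
        return self.kind == other.kind and self.children == other.children


a = Node("AND", Node("col:x"), Node("col:y"))
b = Node("AND", Node("col:x"), Node("col:y"))
c = Node("AND", Node("col:x"), Node("col:z"))
```

Here `a == b` still walks both trees, because equal hashes do not prove equality, while `a != c` is normally decided by a single integer comparison at the root.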

ranma42 · 2024-07-03T12:44:19Z

Side note: another case that could be interesting is the negation of an expression. Having it "ready to use" would make some predicate-related optimizations faster/cheaper.

roji · 2024-07-03T15:33:11Z

Blocked on making our entire SqlExpression tree immutable (#32927); SelectExpression is currently mutable, and since it can be contained inside most SqlExpressions (thanks @ranma42), we can't cache the hashcode.

roji · 2024-07-04T06:47:17Z

Blocked on making our entire SqlExpression tree immutable (#32927); SelectExpression is currently mutable, and since it can be contained inside most SqlExpressions (#34133 (comment)), we can't cache the hashcode.

Though on second thought, SelectExpression should never be mutable when it's already composed upon...

roji · 2024-08-22T09:19:01Z

@ranma42 just to continue on my comment just above, I think this should be safe to implement even in the current bits - any case in which a mutable SelectExpression is contained within another expression (except ShapedQueryExpression) should be a violation of our current invariants: only the top-level select in the tree may be mutable.

I still absolutely want to make SelectExpression fully immutable (#32927), but that's quite a big task, and I don't think it needs to block this optimization here.
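
As an illustration of why mutability matters here, the following sketch shows a cached hash going stale after a mutation. It is Python rather than C#, and `MutableSelect` is a hypothetical stand-in, not EF Core's `SelectExpression`.

```python
class MutableSelect:
    """Hypothetical stand-in for a mutable node; not EF Core's SelectExpression."""

    def __init__(self, projections):
        self.projections = list(projections)
        self._cached_hash = None

    def structural_hash(self):
        # Recomputed from the current state on every call.
        return hash(tuple(self.projections))

    def cached_hash(self):
        # Computed once and then reused; only valid if the state never changes.
        if self._cached_hash is None:
            self._cached_hash = self.structural_hash()
        return self._cached_hash


select = MutableSelect(["a", "b"])
before = select.cached_hash()
select.projections.append("c")  # mutation after the hash was cached
stale = select.cached_hash() != select.structural_hash()
```

After the append, the cached value no longer describes the node, so `stale` is true. If only the top-level select can be mutated, a node nested inside another expression never ends up in this state.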

roji · 2024-08-22T09:22:17Z

Note also #19859, which is about not exhaustively calculating hash codes for the entire tree, but just some shallower subset; this would further improve our performance here.
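
A rough sketch of what a shallower hash could look like, assuming a cutoff expressed as a number of levels. It is Python rather than C#; the tuple-based tree and the `shallow_hash` helper are hypothetical and not taken from #19859.

```python
def shallow_hash(node, depth=2):
    """Hash only the top `depth` levels of a (kind, children) tuple tree."""
    kind, children = node
    if depth <= 1:
        # At the cutoff, mix in only the child count, not the children themselves.
        return hash((kind, len(children)))
    return hash((kind, tuple(shallow_hash(c, depth - 1) for c in children)))


leaf_x, leaf_y = ("x", ()), ("y", ())
t1 = ("AND", (("=", (leaf_x, ("1", ()))), leaf_y))
t2 = ("AND", (("=", (leaf_x, ("2", ()))), leaf_y))
t3 = ("OR", (("=", (leaf_x, ("1", ()))), leaf_y))
```

`t1` and `t2` differ only below the cutoff, so they share a shallow hash and equality still needs the deep comparison. `t3` differs at the root and is normally rejected by the hash alone. The trade-off is cheaper hashing against more collisions on trees that differ only deep down.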

ranma42 · 2024-08-22T09:45:21Z

> @ranma42 just to continue on my comment just above, I think this should be safe to implement even in the current bits - any case in which a mutable SelectExpression is contained within another expression (except ShapedQueryExpression) should be a violation of our current invariants: only the top-level select in the tree may be mutable.
>
> I still absolutely want to make SelectExpression fully immutable (#32927), but that's quite a big task, and I don't think it needs to block this optimization here.

If that is the case, I believe I can try and implement this. What would be the best way to evaluate the performance? (would a micro-benchmark make sense?)

roji · 2024-08-22T11:30:42Z

Sure, an ad-hoc BenchmarkDotNet benchmark could work - not sure we need to commit it etc. Though this is one of the cases where the benefits depend on the tree depth/complexity which you choose to benchmark, which is a bit arbitrary... I think it's OK to do this improvement in any case?
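
To show how the benefit scales with depth without relying on timings, here is a deterministic operation count. It is a Python sketch, not a BenchmarkDotNet benchmark; the tuple-based tree and the helper names `chain` and `count_work` are hypothetical. It assumes the worst case, where every node re-hashes its whole subtree when visited.

```python
def chain(depth):
    """Build a left-deep chain of ("AND", (subtree, leaf)) nodes."""
    leaf = ("leaf", ())
    node = leaf
    for _ in range(depth):
        node = ("AND", (node, leaf))
    return node


def count_work(node):
    """Return (subtree_size, uncached_work) for a (kind, children) tuple tree.

    subtree_size is the work with cached hashes (each node touched once);
    uncached_work sums the subtree size over every node in the tree.
    """
    size, work = 1, 0
    for child in node[1]:
        child_size, child_work = count_work(child)
        size += child_size
        work += child_work
    return size, work + size


for depth in (10, 100):
    size, uncached = count_work(chain(depth))
    print(depth, size, uncached)
```

For depth 10 this gives 21 node visits with caching against 131 without. For depth 100 it gives 201 against 10301, so the gap grows quadratically with the depth chosen for the benchmark.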

ranma42 added the customer-reported label Jul 3, 2024

ranma42 mentioned this issue Jul 3, 2024

Simplify AND and OR #34133

Merged

roji added area-perf area-query labels Jul 3, 2024

roji added this to the Backlog milestone Jul 3, 2024

roji added the blocked label Jul 3, 2024

roji mentioned this issue Jul 4, 2024

Consider introducing nullability on SqlExpression #33889

Open

roji removed the blocked label Jul 4, 2024

roji changed the title ~~Comparing expressions can be slow~~ Optimize SQL expression comparison by caching hash codes Aug 22, 2024

This was referenced Aug 22, 2024

Query: Support for BETWEEN ... AND ... #12634

Open

Transform multiple ORs into a single SQL IN #34507

Open

roji mentioned this issue Aug 22, 2024

Compilation perf: consider caching hash codes for immutable query expression node types #19860

Closed

roji mentioned this issue Aug 29, 2024

Apply early SqlExpression optimizations (e.g. x AND true -> x) in VisitChildren #34556

Open

ranma42 mentioned this issue Dec 15, 2024

Translate to NULLIF #35327

Open
