Optimize SQL expression comparison by caching hash codes #34149

Open
ranma42 opened this issue Jul 3, 2024 · 7 comments

ranma42 (Contributor) commented Jul 3, 2024

In the query pipeline, expressions are sometimes compared for (deep) equality.
This is currently implemented as a recursive visit of the subtree, which can be very costly when done repeatedly while visiting the expression: it is quite easy to construct expression trees that require O(n^2) operations to visit.

This could be improved by caching a hash code that cheaply rules out most unequal pairs, as suggested in #34133 (comment).
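A minimal sketch of the idea, written in Python rather than .NET: an immutable node computes its structural hash once at construction from its children's already-cached hashes, and equality uses that hash to reject most unequal pairs in O(1) before falling back to the recursive comparison. `SqlExpr`, its `op` string and `children` tuple are hypothetical stand-ins, not the real SqlExpression API.

```python
from __future__ import annotations


class SqlExpr:
    """Hypothetical immutable expression node (not the real SqlExpression type).

    Nodes are never mutated after construction, so the structural hash can be
    computed once, bottom-up, from the children's already-cached hashes.
    """

    __slots__ = ("op", "children", "_hash")

    def __init__(self, op: str, *children: SqlExpr) -> None:
        self.op = op
        self.children = children
        # O(number of children): no recursive walk, the children's hashes are cached.
        self._hash = hash((op, tuple(child._hash for child in children)))

    def __hash__(self) -> int:
        return self._hash

    def __eq__(self, other: object) -> bool:
        if self is other:
            return True  # same instance: trivially equal
        if not isinstance(other, SqlExpr):
            return NotImplemented
        if self._hash != other._hash:
            return False  # different hashes prove inequality without a tree walk
        # Equal hashes may be a collision, so confirm with the deep comparison.
        return self.op == other.op and self.children == other.children
```

Comparing a tree against a structurally different one then usually ends after a single integer comparison; only trees that really are equal (or whose hashes collide) pay for the full recursive walk.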

ranma42 (Contributor, Author) commented Jul 3, 2024

Side note: another value that could be worth caching is the negation of an expression. Having it "ready to use" would make some predicate-related optimizations cheaper.
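The negation could be memoized in a similar way. A minimal Python sketch, assuming an immutable predicate node; `Predicate`, `negate` and the textual `NOT (...)` form are hypothetical, and a real implementation would more likely produce a simplified negated expression than wrap the original in NOT.

```python
from __future__ import annotations

from typing import Optional


class Predicate:
    """Hypothetical immutable boolean predicate that memoizes its negation."""

    def __init__(self, sql: str, negation: Optional[Predicate] = None) -> None:
        self.sql = sql
        # Memoization slot: filled lazily, never changes the predicate's meaning.
        self._negation = negation

    def negate(self) -> Predicate:
        if self._negation is None:
            # Build the negated node once; it links back to self, so negating
            # it again returns this exact object instead of NOT (NOT (...)).
            self._negation = Predicate(f"NOT ({self.sql})", negation=self)
        return self._negation
```

The lazily filled slot is the only state that changes, and it never alters what the predicate means, so it fits a tree that is otherwise treated as immutable.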

roji added this to the Backlog milestone on Jul 3, 2024

roji (Member) commented Jul 3, 2024

Blocked on making our entire SqlExpression tree immutable (#32927); SelectExpression is currently mutable, and since it can be contained inside most SqlExpressions (thanks @ranma42), we can't cache the hashcode.
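To illustrate why a mutable SelectExpression blocks caching: if a parent computes its hash from a child that is later mutated in place, the cached value goes stale. A small Python sketch; `MutableSelect` and `ExistsExpr` are hypothetical stand-ins, not the real types.

```python
from __future__ import annotations


class MutableSelect:
    """Hypothetical stand-in for a SelectExpression that can still be mutated."""

    def __init__(self, table: str) -> None:
        self.table = table
        self.predicates: list[str] = []

    def add_predicate(self, predicate: str) -> None:
        self.predicates.append(predicate)  # in-place mutation

    def structural_hash(self) -> int:
        return hash((self.table, tuple(self.predicates)))


class ExistsExpr:
    """Hypothetical parent node that caches its hash at construction time."""

    def __init__(self, subquery: MutableSelect) -> None:
        self.subquery = subquery
        self.cached_hash = hash(("EXISTS", subquery.structural_hash()))

    def current_hash(self) -> int:
        # What the hash should be, given the subquery's present state.
        return hash(("EXISTS", self.subquery.structural_hash()))
```

After `add_predicate`, the parent still reports its old hash, so a hash-first equality check could wrongly reject a structurally equal tree built from the mutated state before the deep comparison ever runs.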

roji (Member) commented Jul 4, 2024

> Blocked on making our entire SqlExpression tree immutable (#32927); SelectExpression is currently mutable, and since it can be contained inside most SqlExpressions (#34133 (comment)), we can't cache the hashcode.

Though on second thought, SelectExpression should never be mutable when it's already composed upon...

roji changed the title from "Comparing expressions can be slow" to "Optimize SQL expression comparison by caching hash codes" on Aug 22, 2024

roji (Member) commented Aug 22, 2024

@ranma42 just to continue on my comment just above, I think this should be safe to implement even in the current bits: any case in which a mutable SelectExpression is contained within another expression (except ShapedQueryExpression) should be a violation of our current invariants, since only the top-level select in the tree may be mutable.

I still absolutely want to make SelectExpression fully immutable (#32927), but that's quite a big task, and I don't think it needs to block this optimization here.
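The invariant described above could, for instance, be checked in debug builds with a tree walk that allows a mutable select only at the top level, or directly under a top-level ShapedQueryExpression. A Python sketch under that assumption; `Node`, its `mutable` flag, `invariant_violations` and the `"ShapedQuery"` kind are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical expression node; `mutable` marks a select still being built."""

    kind: str
    children: list[Node] = field(default_factory=list)
    mutable: bool = False


def invariant_violations(node: Node, top_level: bool = True) -> list[Node]:
    """Return the mutable nodes that are nested inside another expression."""
    found = [node] if node.mutable and not top_level else []
    # A top-level ShapedQuery wrapper passes top-level status on to its children;
    # any other parent makes its children nested.
    children_top_level = top_level and node.kind == "ShapedQuery"
    for child in node.children:
        found.extend(invariant_violations(child, children_top_level))
    return found
```

If this walk never reports anything, every node below the top-level select is effectively frozen, which is what makes caching its hash safe.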

roji (Member) commented Aug 22, 2024

Note also #19859, which is about calculating hash codes over just a shallower subset of the tree rather than exhaustively over the entire tree; this would further improve our performance here.
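A hash restricted to the top few levels of the tree remains a valid equality filter, since equal trees still get equal hashes, while the work per hash stays bounded regardless of tree size. A Python sketch of that general idea; `Expr`, `shallow_hash` and the `max_depth` cutoff are hypothetical and not taken from #19859.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class Expr:
    """Hypothetical immutable expression node."""

    op: str
    children: tuple[Expr, ...] = ()


def shallow_hash(expr: Expr, max_depth: int = 2) -> int:
    """Hash only the top `max_depth` levels below `expr`.

    Deterministic, so structurally equal trees always hash equally; differences
    deeper than `max_depth` are left to the full structural comparison.
    """
    if max_depth == 0:
        # Stop descending: summarize the node by its operator and arity only.
        return hash((expr.op, len(expr.children)))
    return hash((expr.op, tuple(shallow_hash(c, max_depth - 1) for c in expr.children)))
```

The trade-off is more collisions for trees that differ only deep down (such as two predicates differing in a nested constant), which then fall through to the full comparison.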

ranma42 (Contributor, Author) commented Aug 22, 2024

> @ranma42 just to continue on my comment just above, I think this should be safe to implement even in the current bits - any case in which a mutable SelectExpression is contained within another expression (except ShapedQueryExpression) should be a violation of our current invariants: only the top-level select in the tree may be mutable.
>
> I still absolutely want to make SelectExpression fully immutable (#32927), but that's quite a big task, and I don't think it needs to block this optimization here.

If that is the case, I can try to implement this. What would be the best way to evaluate the performance? Would a micro-benchmark make sense?

roji (Member) commented Aug 22, 2024

Sure, an ad-hoc BenchmarkDotNet benchmark could work; I'm not sure we need to commit it. Though this is one of the cases where the benefits depend on the tree depth/complexity you choose to benchmark, which is a bit arbitrary... I think it's OK to do this improvement in any case?
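Alongside wall-clock timings, counting how many node pairs the equality check examines gives a machine-independent measure and shows directly how the cost grows with depth. A Python sketch; the unary `NOT` chains and the visitor that compares two trees level by level are a hypothetical worst case, not taken from the query pipeline.

```python
from __future__ import annotations

from typing import Optional


class Node:
    """Hypothetical immutable unary node with a structural hash cached at construction."""

    def __init__(self, op: str, child: Optional[Node] = None) -> None:
        self.op = op
        self.child = child
        self.hash = hash((op, child.hash if child is not None else None))


def deep_equal(a: Optional[Node], b: Optional[Node], use_hash: bool, counter: list[int]) -> bool:
    """Structural equality; counter[0] is incremented for every node pair examined."""
    while a is not None and b is not None:
        if use_hash and a.hash != b.hash:
            return False  # rejected in O(1) thanks to the cached hash
        counter[0] += 1
        if a.op != b.op:
            return False
        a, b = a.child, b.child
    return a is b  # equal only if both chains ended together


def chain(depth: int, leaf: str) -> Node:
    """Build NOT(NOT(...(leaf)...)) with `depth` NOT nodes."""
    node = Node(leaf)
    for _ in range(depth):
        node = Node("NOT", node)
    return node


def pairs_examined(depth: int, use_hash: bool) -> int:
    """Compare two chains that differ only at the leaf, once per level, as a visitor might."""
    a, b = chain(depth, "x"), chain(depth, "y")
    counter = [0]
    while a is not None and b is not None:
        deep_equal(a, b, use_hash, counter)
        a, b = a.child, b.child
    return counter[0]


for depth in (10, 100, 1000):
    print(depth, pairs_examined(depth, use_hash=False), pairs_examined(depth, use_hash=True))
```

Without the hash, the count for depth d is (d+1)(d+2)/2, i.e. quadratic; with cached hashes each comparison is rejected at its first node (barring hash collisions), so the count stays near zero. The chosen depth largely determines the measured gain, which matches the concern above about benchmark shape being somewhat arbitrary.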
