-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize SQL expression comparison by caching hash codes #34149
Comments
Side note: another case that could be interesting is the negation of an expression. Having it "ready to use" would make some predicate-related optimizations faster/cheaper. |
Blocked on making our entire SqlExpression tree immutable (#32927); SelectExpression is currently mutable, and since it can be contained inside most SqlExpressions (thanks @ranma42), we can't cache the hashcode. |
Though on second thought, SelectExpression should never be mutable when it's already composed upon... |
@ranma42 just to continue on my comment just above, I think this should be safe to implement even in the current bits - any case in which a mutable SelectExpression is contained within another expression (except ShapedQueryExpression) should be a violation of our current invariants: only the top-level select in the tree may be mutable. I still absolutely want to make SelectExpression fully immutable (#32927), but that's quite a big task, and I don't think it needs to block this optimization here. |
Note also #19859, which is about not exhaustively calculating hash codes for the entire tree, but just some shallower subset; this would further improve our performance here. |
If that is the case, I believe I can try and implement this. What would be the best way to evaluate the performance? (would a micro-benchmark make sense?) |
Sure, an ad-hoc BenchmarkDotNet benchmark could work - not sure we need to commit it etc. Though this is one of the cases where the benefits depend on the tree depth/complexity which you choose to benchmark, which is a bit arbitrary... I think it's OK to do this improvement in any case? |
In the query pipeline sometimes expressions are compared for (deep) equality.
This currently is based on a recursive visit of the subtree, which can be very costly if done multiple times while visiting the expression (it is quite easy to construct expression trees that require O(n^2) operations when visited).
This could be improved by computing a hash that filters out most of the inequalities as suggested in #34133 (comment)
The text was updated successfully, but these errors were encountered: