-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle IN optimizer transformation of ordering to constant (take 2) #16422
Conversation
Our InExpressionValuesExpanding visitor can turn an an IN expression to a constant (i.e. when the values list is empty), which is unsupported in orderings in SqlServer. Detect this and turn these expressions to a simple constant expression detectable by the SQL generator, which will generate (SELECT 1). Fixes #15713
switch (node) | ||
{ | ||
case ColumnExpression _: | ||
case SqlParameterExpression _: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SqlServer does not allow ordering by parameter either. This may work in current tests but seems like a hacky solution.
declare @p1 int;
set @p1 = 1;
select 1
from Cities
order by
case
when 1 = 1 then @p1
else 1
end
Msg 1008, Level 16, State 1, Line 8
The SELECT item identified by the ORDER BY number 1 contains a variable as part of the expression identifying a column position. Variables are only allowed when ordering by an expression referencing a column name.
Let's do it right way by simplifying expression. Or we can leave the PR unmerged for now and revisit it later closer to preview8 deadline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so: ORDER BY @p
doesn't work because it's interpreted as a column position (not supported), but ORDER BY @p + 1
does work, presumably because it's interpreted as a constant instead?
😖
In any case, thinking about it some more, I think the previous logic in #16380 may have actually been more correct. Ordering by COUNT(*) is very similar to ordering by a constant - if an ordering doesn't reference the current row in any way, it is basically meaningless, isn't it? In fact, QuerySqlGenerator.GenerateOrderings already elides SqlParameterExpression just like SqlConstantExpression. So why not have the same logic here, and always simplify an ordering that contains a ParameterExpression (but no ColumnExpression) to (SELECT 1)
?
(We should try hard to avoid simplifying the expression unless absolutely necessary, as this will be both non-trivial to write and also have to run in runtime).
Note that COUNT() over a related entity (e.g. OrderBy(c => c.Orders.Count()
) does make sense - unlike COUNT() over the actual entity which is a constant. This should work properly because the join will introduce ColumnExpressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ordering by COUNT(*) is very similar to ordering by a constant - if an ordering doesn't reference the current row in any way, it is basically meaningless, isn't it?
Not when you have a Group by clause.
(We should try hard to avoid simplifying the expression unless absolutely necessary, as this will be both non-trivial to write and also have to run in runtime).
Let's not do premature optimization here. I would prefer to be on the side of being correct than being fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not when you have a Group by clause.
Good point - but I don't think it's a fundamental thing here. An OrderBy() after GroupBy() still must refer to the columns of the table in some way - and that can be our trigger for not reducing it to constant. I just did a quick test, and this actually works well with the current state of this PR, since OrderBy() is generated with an SqlFragment for the star in COUNT(*):
var list = new List<string>();
return AssertQueryScalar<Order>(isAsync, os => os
.GroupBy(o => o.CustomerID)
.OrderBy(g => (list.Contains(g.Key) ? 1 : 0) + g.Count())
.Select(g => g.Count()));
SELECT COUNT(*)
FROM [Orders] AS [o]
GROUP BY [o].[CustomerID]
ORDER BY CASE
WHEN CAST(1 AS bit) = CAST(0 AS bit) THEN 1
ELSE 0
END + COUNT(*)
So unless I'm mistaken, the approach of looking for references to the row's columns should be OK - let me know if you see another problematic case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no good logical reason why SqlFragmentExpression is allow-able like ColumnExpression. Yes it works for COUNT(*)
but I fail to identify why it is any general case.
Unless, we have very detailed understanding of what is supported and what is not then a generalized visitor based pattern matching cannot be concluded to be accurate. Otherwise it is just bunch of hacks put together to make tests (or scenarios we know) pass. What if someone creates SqlFragment with value 1 inside. This escapes it but still invalid SQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that we don't know, and will never be able to know, what is inside an SqlFragment - no matter how complete an analysis we do. So either we:
- Assume that it is a constant, in which case we will sometimes elide the whole ordering. This silently produces a wrong ordering.
- Assume that it may not be a constant, in which case SqlServer will error if it is a constant.
The only theoretical alternative is if we start to somehow transfer additional information with the SqlFragment, which allows us to understand what's in it. That simply seems unjustified for just these edge cases.
To my mind that is good logical reasoning (I actually excepted SqlFragment before looking at COUNT(*)
) - I don't agree this can be called a "bunch of hacks".
protected override Expression VisitExtension(Expression extensionExpression) | ||
{ | ||
var visited = base.VisitExtension(extensionExpression); | ||
if (_innerExpressionChangedToConstant && visited is OrderingExpression orderingExpression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few design things (no need to address right now)
- Instead of storing state you can just check if
visited.Equals(extensionExpression)
. If it is changed, it will be different and reconstructed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here was to limit the check to only the case where when we rewrite the expression into a constant - not all changes inside the ordering mean that we need to perform this (relatively expensive) check. This might be a somewhat unneeded optimization, let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly un-needed. We can also do reference equality due to immutability and reconstruction. It should give same result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be misunderstanding, but once again there are rewritings that this visitor performs which do not require our check. For example, if there's a null in the values list, the visitor adds null semantics - if we do a reference equality check it will show as modified, although there's no need to check whether it was converted to constant.
It may not be important to skip these cases perf-wise, but then again the stateful flag approach seems easy and safe enough, and allows us to check only when we know we've performed a "dangerous" rewrite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
States are dangerous. No need of premature optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have state in a lot of visitors around the pipeline, this is outside of the compilation phase (so perf-sensitive), and the statefulness is extremely limited.
But if you're deadset against it I can remove it.
src/EFCore.Relational/Query/Pipeline/InExpressionValuesExpandingExpressionVisitor.cs
Outdated
Show resolved
Hide resolved
Also stop treating parameters as non-constant
@smitpatel have pushed some fixed based on your comments. Let's decide what we want to do with the open questions above. |
I suggest putting this PR on hold. I am not convinced that this solution is accurate in all case. We need detailed understanding of what works and what does not and how to exclude non-working case. I don't have time to think more about this right now. I can revisit this in last week of preview8 deadline. |
@smitpatel no problem for putting this PR on hold. Have added my comments anyway (but answer only when you have time). Also, we're now dealing with correctness for what looks like pretty rare edge cases, it may be fine to not handle everything. |
Not handling pretty rare edge cases is fine when they are exception to normal rules. Here we don't even have proper set of rules. I am not willing to take bunch of hacks to make certain tests pass. If you believe that this scenario is important enough to make it work in 3.0 then do exact pattern match for the specific structure you are looking into without using ExpressionVisitor. Otherwise backlog. |
PR is not right fix and associated issue is in backlog. |
Our InExpressionValuesExpanding visitor can turn an an IN expression to a constant (i.e. when the values list is empty), which is unsupported in orderings in SqlServer. Detect this and turn these expressions to a simple constant expression detectable by the SQL generator, which will generate (SELECT 1).
Fixes #15713
#16380 was a previous reverted attempt.