Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle IN optimizer transformation of ordering to constant (take 2) #16422

Closed
wants to merge 2 commits into from

Conversation

roji
Copy link
Member

@roji roji commented Jul 3, 2019

Our InExpressionValuesExpanding visitor can turn an an IN expression to a constant (i.e. when the values list is empty), which is unsupported in orderings in SqlServer. Detect this and turn these expressions to a simple constant expression detectable by the SQL generator, which will generate (SELECT 1).

Fixes #15713

#16380 was a previous reverted attempt.

Our InExpressionValuesExpanding visitor can turn an an IN expression
to a constant (i.e. when the values list is empty), which is
unsupported in orderings in SqlServer. Detect this and turn these
expressions to a simple constant expression detectable by the
SQL generator, which will generate (SELECT 1).

Fixes #15713
switch (node)
{
case ColumnExpression _:
case SqlParameterExpression _:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SqlServer does not allow ordering by parameter either. This may work in current tests but seems like a hacky solution.

declare @p1 int;
set @p1 = 1;

select 1 
from Cities
order by 
case 
    when 1 = 1 then @p1
	else 1
end
Msg 1008, Level 16, State 1, Line 8
The SELECT item identified by the ORDER BY number 1 contains a variable as part of the expression identifying a column position. Variables are only allowed when ordering by an expression referencing a column name.

Let's do it right way by simplifying expression. Or we can leave the PR unmerged for now and revisit it later closer to preview8 deadline.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so: ORDER BY @p doesn't work because it's interpreted as a column position (not supported), but ORDER BY @p + 1 does work, presumably because it's interpreted as a constant instead?

😖

In any case, thinking about it some more, I think the previous logic in #16380 may have actually been more correct. Ordering by COUNT(*) is very similar to ordering by a constant - if an ordering doesn't reference the current row in any way, it is basically meaningless, isn't it? In fact, QuerySqlGenerator.GenerateOrderings already elides SqlParameterExpression just like SqlConstantExpression. So why not have the same logic here, and always simplify an ordering that contains a ParameterExpression (but no ColumnExpression) to (SELECT 1)?

(We should try hard to avoid simplifying the expression unless absolutely necessary, as this will be both non-trivial to write and also have to run in runtime).

Note that COUNT() over a related entity (e.g. OrderBy(c => c.Orders.Count()) does make sense - unlike COUNT() over the actual entity which is a constant. This should work properly because the join will introduce ColumnExpressions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering by COUNT(*) is very similar to ordering by a constant - if an ordering doesn't reference the current row in any way, it is basically meaningless, isn't it?

Not when you have a Group by clause.

(We should try hard to avoid simplifying the expression unless absolutely necessary, as this will be both non-trivial to write and also have to run in runtime).

Let's not do premature optimization here. I would prefer to be on the side of being correct than being fast.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not when you have a Group by clause.

Good point - but I don't think it's a fundamental thing here. An OrderBy() after GroupBy() still must refer to the columns of the table in some way - and that can be our trigger for not reducing it to constant. I just did a quick test, and this actually works well with the current state of this PR, since OrderBy() is generated with an SqlFragment for the star in COUNT(*):

var list = new List<string>();

return AssertQueryScalar<Order>(isAsync, os => os
    .GroupBy(o => o.CustomerID)
    .OrderBy(g => (list.Contains(g.Key) ? 1 : 0) + g.Count())
    .Select(g => g.Count()));
SELECT COUNT(*)
FROM [Orders] AS [o]
GROUP BY [o].[CustomerID]
ORDER BY CASE
    WHEN CAST(1 AS bit) = CAST(0 AS bit) THEN 1
    ELSE 0
END + COUNT(*)

So unless I'm mistaken, the approach of looking for references to the row's columns should be OK - let me know if you see another problematic case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no good logical reason why SqlFragmentExpression is allow-able like ColumnExpression. Yes it works for COUNT(*) but I fail to identify why it is any general case.
Unless, we have very detailed understanding of what is supported and what is not then a generalized visitor based pattern matching cannot be concluded to be accurate. Otherwise it is just bunch of hacks put together to make tests (or scenarios we know) pass. What if someone creates SqlFragment with value 1 inside. This escapes it but still invalid SQL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is that we don't know, and will never be able to know, what is inside an SqlFragment - no matter how complete an analysis we do. So either we:

  1. Assume that it is a constant, in which case we will sometimes elide the whole ordering. This silently produces a wrong ordering.
  2. Assume that it may not be a constant, in which case SqlServer will error if it is a constant.

The only theoretical alternative is if we start to somehow transfer additional information with the SqlFragment, which allows us to understand what's in it. That simply seems unjustified for just these edge cases.

To my mind that is good logical reasoning (I actually excepted SqlFragment before looking at COUNT(*)) - I don't agree this can be called a "bunch of hacks".

protected override Expression VisitExtension(Expression extensionExpression)
{
var visited = base.VisitExtension(extensionExpression);
if (_innerExpressionChangedToConstant && visited is OrderingExpression orderingExpression)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few design things (no need to address right now)

  • Instead of storing state you can just check if visited.Equals(extensionExpression). If it is changed, it will be different and reconstructed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here was to limit the check to only the case where when we rewrite the expression into a constant - not all changes inside the ordering mean that we need to perform this (relatively expensive) check. This might be a somewhat unneeded optimization, let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly un-needed. We can also do reference equality due to immutability and reconstruction. It should give same result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding, but once again there are rewritings that this visitor performs which do not require our check. For example, if there's a null in the values list, the visitor adds null semantics - if we do a reference equality check it will show as modified, although there's no need to check whether it was converted to constant.

It may not be important to skip these cases perf-wise, but then again the stateful flag approach seems easy and safe enough, and allows us to check only when we know we've performed a "dangerous" rewrite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

States are dangerous. No need of premature optimization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have state in a lot of visitors around the pipeline, this is outside of the compilation phase (so perf-sensitive), and the statefulness is extremely limited.

But if you're deadset against it I can remove it.

Also stop treating parameters as non-constant
@roji
Copy link
Member Author

roji commented Jul 4, 2019

@smitpatel have pushed some fixed based on your comments. Let's decide what we want to do with the open questions above.

@smitpatel
Copy link
Contributor

I suggest putting this PR on hold. I am not convinced that this solution is accurate in all case. We need detailed understanding of what works and what does not and how to exclude non-working case.
Even simplifying expression could be more accurate than this but not 100% accurate. So that is not final solution either.

I don't have time to think more about this right now. I can revisit this in last week of preview8 deadline.

@roji
Copy link
Member Author

roji commented Jul 5, 2019

@smitpatel no problem for putting this PR on hold. Have added my comments anyway (but answer only when you have time).

Also, we're now dealing with correctness for what looks like pretty rare edge cases, it may be fine to not handle everything.

@smitpatel
Copy link
Contributor

Also, we're now dealing with correctness for what looks like pretty rare edge cases

Not handling pretty rare edge cases is fine when they are exception to normal rules. Here we don't even have proper set of rules. I am not willing to take bunch of hacks to make certain tests pass. If you believe that this scenario is important enough to make it work in 3.0 then do exact pattern match for the specific structure you are looking into without using ExpressionVisitor. Otherwise backlog.

@smitpatel
Copy link
Contributor

PR is not right fix and associated issue is in backlog.

@smitpatel smitpatel closed this Sep 9, 2019
@smitpatel smitpatel deleted the OrderBy1 branch September 9, 2019 17:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants