-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove last ORDER BY for collection joins #24360
Conversation
Hmm, composite keys :) |
This seems like it could be done for queries with identity resolution, but queries without it would break as the shaper depends on the ordering for detecting entity boundaries in the resultset. Separating out to #24377. |
Re composite keys... Assuming Blogs with Posts, where Post has a composite key, loading with this PR as is gives: SELECT [b].[Id], [b].[Name], [p].[Id1], [p].[Id2], [p].[BlogId]
FROM [Blogs] AS [b]
LEFT JOIN [Post] AS [p] ON [b].[Id] = [p].[BlogId]
ORDER BY [b].[Id], [p].[Id1] -- no [p].[Id2] So only one of the composite key's columns has been chopped off. Getting rid of the other one is trickier: the inner select expressions However, unless I'm missing something, there shouldn't be an issue with leaving this unneeded ordering - it's only a further missed optimization for the composite key scenario, which we can always handle later if we think it's valuable. |
db61e87
to
a783d59
Compare
All tests are passing - including on PostgreSQL (which is notorious for ordering things differently). Note that ordering of related entities is now no longer deterministic, so entity asserters may need to apply ordering (see the example in OwnedQueryTestBase.cs, without which the PG test failed). Any further ideas?? This looks correct/safe to me, but maybe @smitpatel @maumar have some other ideas. |
@@ -1645,7 +1662,7 @@ static Expression RemoveConvert(Expression expression) | |||
{ | |||
var updatedColumn = identifier.Column.MakeNullable(); | |||
_childIdentifiers.Add((updatedColumn, identifier.Comparer)); | |||
AppendOrdering(new OrderingExpression(updatedColumn, ascending: true)); | |||
AppendOrdering(new OrderingExpression(updatedColumn, ascending: true), isPending: true); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to create an AppendPendingOrdering(List<OrderingExpression> orderings)
maybe instead, to solve the issue with composite keys? Then when another ordering is added (pending or non-pending), the whole previous pending list is applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm misunderstanding the code, when SelectExpression.ApplyCollectionJoin is called, innerSelectExpression
has already has a single list of all identifier columns included in it; if it's a subquery over 5 different tables, than it contains a flattened list which doesn't make any distinction between which column corresponds to which table (and so which columns belong together as a composite key). Of course, this could be fixed with further refactoring - but optimizing further for the composite key case seems a bit less important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a tree of collection joins, and an inner expression has at least one join, then the ordering requirements for that inner expression will already have been applied a few lines above through innerOrderingExpressions
, if I'm not mistaken. At least this happens in for example Multi_level_include_one_to_many_optional_and_one_to_many_optional_produces_valid_sql
. So I guess my approach will work...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this case:
db.Set<Level1>().Include(l1 => l1.Level2s).ThenInclude(l2 => l2.Level3s)
.Include(l1 => l1.AnotherLevel2s);
In this case the order by items must be identifiers of Level1, Level2, Level3.
I tried to find a test case in the test suite for the above Include pattern, but couldn't find one. Maybe consider add such a test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's missing, it would definitely be good to add it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Emill am looking at this again for 6.0. Re our conversation above, I do seem to be seeing a problem, see test ComplexNavigationsCollectionsQuerySqlServerTest.IncludeCollection8 as an example. The PR causes the following changes in the SQL:
-ORDER BY [l].[Id], [t].[Id], [t].[Id0], [t].[Id1]");
+ORDER BY [l].[Id]");
This is because all three identifier orderings of the inner query end up as pending orderings in the outer query, and never actually get added. innerOrderingExpressions is empty for this case - there are no orderings in the inner query, just identifiers that need to be ordered by.
Assuming I haven't missed anything trivial and given that the 6.0 feature deadline is approaching fast, I think it would be OK to merge #24360 for non-composite key orderings and revisit composite ones later - tell me what you think.
The code I'm working on is in https://github.com/roji/efcore/tree/ThatLastOrderBy (based off of your #24401)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following offline conversation with @Emill, am closing this for now - we'll remove only the last ordering for 6.0, and pursue going further for composite keys in the future.
Just an FYI, I did a simple performance test in PostgreSQL using Even though PostgreSQL adds an incremental sort (the outer elements are traversed through an index scan) when using two order by items, and the inner sort will in this case sort only 2 elements per outer element, it still adds quite a significant overhead. It would be interesting to know if MySQL also uses incremental sort or if it uses a full sort. In any case, I hope MySQL will be able to optimize away the sort altogether and just perform an index scan on the outer table to fulfill the order by, when using only one order by item. |
@Emill thanks for the perf measurement - it's good to get further confirmation that this helps. |
I implemented the approach I described at https://github.com/Emill/efcore/tree/ThatLastOrderBy, and it passes all the tests (except for that the sql queries don't match the expected, of course). |
@Emill I've looked at the code again, and I think you're right. Basically, this code copies all the orderings from the inner select expression to the outer, so this code adding pending orderings ends up only copying the inner pending orderings to the outer, as pending orderings. Unless I'm mistaken, this means that all identifiers coming from a single table are preserved as a single pending group as they bubble up the outer queries. Would you like to submit a PR against my PR branch, including the SQL baselines (assuming the SQL changes make sense of course 🤣)? If not, let me know and I can work on it. |
Ah, I think what I wrote was already written by you in #24360 (comment) :) |
Ok made a PR at #24401. Please go ahead and fix the SQL in the tests. The test case I talked about above shows an important fact that we must add pending orderings for the first join in the outer expression. Since the inner orderings that are already applied miss the last column, this last column will be finally added when the second outer join is processed, when it applies the pending ordering from previous join. |
Also consider cases where the collection itself has user specified orderings so we don't mess up orderings there. |
Good point. I saw some tests where this was the case, i.e. where an inner expression was ordered by some text column. It seems this is covered since the orderings added for the collection shapers to work are always added last, after the user-defined orderings, for every level. So something like |
FYI we'll be looking at increasing test coverage for composite key scenarios before merging this, so it might take a bit. |
As per #24360 (comment), rebased and retested this (SqlServer/Sqlite/Npgsql). @smitpatel @maumar your move. |
Split out the further optimization for composite keys to #25481. |
{ | ||
Check.NotNull(orderingExpression, nameof(orderingExpression)); | ||
|
||
if (orderingExpression.Equals(_pendingOrdering)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs to de-dupe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this should apply right away. The purpose of pending ordering to remember last ordering we would actually end up putting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this should apply right away.
Also was not sure. But yeah, it seems pretty sure that trying to append an ordering with pending=true when the same ordering is already pending should just do nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs to de-dupe
Looking at this again, I'm not sure this is needed...
- If X is in _pendingOrdering, it already passed dedup to get in there in the last round.
- If anything else was appended since (pending or not), X would be appended to the actual orderings at that point.
- If X comes in exactly when X is pending (nothing in between):
- If the 2nd X comes in with pending=true, we need to append
- If the 2nd X also comes in pending, I'm not sure. However, that seems edge-casey, and appending when we don't have to isn't a bug, just a missed optimization...
Am I missing something? Do you have a specific scenario in mind?
If pending ordering is first class in SelectExpression and we add ordering for inner level collections also, why there is no changes to SQL in inner level. All SQL changes are very last ordering at the end of SQL. Wouldn't say I like this implementation. Given we have a code path to apply collections as join, we should try to put avoid appending ordering part of that implementation. |
This reverts commit 5a37a7c.
Remember we only chop the last ordering which comes out of identifiers in an inner query. I did a quick survey and the ORDER BYs that I'm seeing in inner queries are almost always explicit user orderings coming before skip/take/whatever (and other unrelated stuff) - we don't want to touch those. Is there any scenario where we would have orderings in a subquery which come from a identifiers in a collection join?
Am open to change! Can you point me to where you think this should live? |
@smitpatel made the pending ordering logic self-contained in ApplyProjection as discussed, take a look and tell me if you want any other changes. (BTW it might be possible to make other orderings added by ApplyProjection pending - I gave it a quick shot but got some weird test failures, so deferring). Re the test, we can wait and see what @maumar comes up with! |
OK @smitpatel, I think I have a better understanding now - hopefully this looks good. |
Hello @roji! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
Apologies, while this PR appears ready to be merged, I've been configured to only merge when all checks have explicitly passed. The following integrations have not reported any progress on their checks and are blocking auto-merge:
These integrations are possibly never going to report a check, and unblocking auto-merge likely requires a human being to update my configuration to exempt these integrations from requiring a passing check. Give feedback on thisFrom the bot dev teamWe've tried to tune the bot such that it posts a comment like this only when auto-merge is blocked for exceptional, non-intuitive reasons. When the bot's auto-merge capability is properly configured, auto-merge should operate as you would intuitively expect and you should not see any spurious comments. Please reach out to us at fabricbotservices@microsoft.com to provide feedback if you believe you're seeing this comment appear spuriously. Please note that we usually are unable to update your bot configuration on your team's behalf, but we're happy to help you identify your bot admin. |
Thanks for the guidance @smitpatel |
Remind me to never use auto-merge again (because of the squash). |
Here's a draft attempt, probably wrong in various ways 🤣 I've tested and modified the baselines in ComplexNavigationsQuerySqlServerTest only for now, if it's a good start, I can continue.
The approach is simple - just have a pending ordering which gets applied only when another ordering is about to be applied (we already like pending things so much).
After implementing the above, I realized something - can't we drop all orderings on any collection joins which doesn't have a deeper join referring back to it? For example, in the following:
Since b and c are both at the same level (dependents of a), don't we only need to order on a.id in this query?
Fixes #19828