-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document behavior of AsSplitQuery() when subquery has unstable ordering #3242
Comments
@angelaki EF Core should be ensuring a unique ordering when generating SQL for a split query, so it sounds like this is a bug. Please attach a small, runnable project or post a small, runnable code listing that reproduces what you are seeing so that we can investigate. |
What should have been the expected behavior? Should it get ordered by Id last by default? I'll try my very best, but isolating this issue seams pretty tricky. |
EF Core split query will add ordering by PK automatically. It is appended after the ordering specified by the user so that user gets results in the order they asked. So Split queries always have a stable and unique-ly identifying ordering. So the issue you are describing doesn't really occur for split query in theory. So you will need to provide a full repro which shows that split query is not working correctly due to aforementioned reason. It is likely that you are running into a different issue. |
Sorry, I'm pretty busy these days, but I'll try to isolate the issue next weekend. It's a querable pushed through multiple services, getting dynamicly ordered by expressions etc.. |
EF Team Triage: Closing this issue as the requested additional details have not been provided and we have been unable to reproduce it. BTW this is a canned response and may have info or details that do not directly apply to this particular issue. While we'd like to spend the time to uniquely address every incoming issue, we get a lot traffic on the EF projects and that is not practical. To ensure we maximize the time we have to work on fixing bugs, implementing new features, etc. we use canned responses for common triage decisions. |
Sorry it took me a while to isolate the issue - it actually was pretty easy. Here you go:
generates the following SQL:
As you can see the INNER SQL is sorted wrong! Furthermore I noticed:
But that's ok for me. Intended behavior? |
@angelaki what exactly is wrong with the inner SQL? It seems to sort by Start as you've requested in the LINQ query, no?
Take/Skip does not do reverse ordering - try using these operators in LINQ to Objects (without EF Core) to see the expected behavior. If you think the EF-generated SQL is wrong with regards to that, please open an issue with the exact SQL you're seeing and why it doesn't seem right.
Here too, I'm guessing your expectations may not be accurate. Take/Skip without a preceding OrderBy are problematic, since in SQL table/subquery rows have no determinate order; so what actual rows get selected is completely non-deterministic. Once again, if you suspect a real issue, please open a separate issue with a full LINQ query and the generated SQL. |
Thank you for the explanation about these two behaviors! 👍 Back to my initial issue: As you can see the outer SQL is ordered by Now that I use It sure still works if I manually order by |
Reopening to let @smitpatel take a look. |
The SQL generated by EF Core is accurate. We generate OrderBy-Apply Skip/Take and then add ordering we require so result set is pre-defined. Now the ordering being non-unique is not entirely issue as long as Skip/Take are not crossing duplicate column set. e.g. If you are sorting by City and values are New York, New York, Seattle, Seattle and you do Skip(0).Take(3) then which row with city value Seattle will be picked is non-deterministic. If you do Take(2) then subquery may not have exact ordering but it will select same rows and then outer query will order them uniquely. If your main result set is non-deterministic because of the combination of OrderBy/Skip/Take specified then you will get whatever results server gets. Neither EF Core can predict that it will be problematic (since it is data dependent), nor EF Core can do anything fix is because it may change expected results and add additional (at times significant) perf. |
Yes, I totally agree with you. But in my case |
When the issue happens only when you run query multiple times then it would only happens when using AsSplitQuery. Again, EF Core cannot do anything here. |
No, AsSplitQuery runs two queries! But the 2nd one does not contain the records of the first one! Just like the outer query gets the primary key as order criteria attached, the inner one needs it, too. Do you need more code to reproduce it? I just skipped pasting the second query getting executed by one EFCore action. |
No we don't need more code. We are saying we cannot do anything here as you, owner of the data, need to make sure that running your query are not generating non-deterministic result set. The issue can also be reproduced by applying same set of OrderBy/Skip/Take and then selecting only certain columns using anonymous type projection. Then your own 2 queries will give different result sets. At the end of the day, it is data-dependent non-deterministic query by user and EF Core cannot do anything about it. |
But you are also adding order criterias for regular queries, why don't you for SubSelects? Statements running just fine as |
I already written above. First we don't know that user-specified ordering is problematic. Second, adding additional ordering lowers perf. We cannot automatically add ordering and cause perf penalty for everyone for the sake of a non-deterministic query. |
Ah, ok. But why do you for the outer query? I'd expect a full order extension or none. But the current behavior is just dangerous in my opinion. Ordering the inner by my criteria (Start) only but the outer by mine (Start) plus the PK (Id) seams inconsistent. |
We add orderings we need to generate results. And we only add orderings at very outer-level unless there are nested collections being projected. So if your subquery is 4 levels deep inside the same thing doesn't work. There are also multiple issues in this repo filed to remove those orderings. We also have PR in progress to remove last ordering because very last ordering is not needed. So sorry, your argument doesn't hold. At the very least the core issue remains that you have paging operation which generates non-deterministic results. You are free to add additional ordering to make it deterministic. |
I agree with you if no automatic ordering is done at all - and if it gets removed i'm fine. Right now 99,99% of the queries are automatically ordered and the very few that really need it (paged split queries) aren't. Maybe also only paged split queries could get ordered automatically. Right now it seams like an issue - just like @ajcvickers said: EF Core should be ensuring a unique ordering when generating SQL for a split query, so it sounds like this is a bug |
Which we did at outer level. It is subquery results which are different. Subquery is generated based on user provided query. |
Sorry, but I don't really get it. Doing it at outer level isn't even necessary, it's more a nice gimmick, while not sorting the inner one breaks the functionallity. Speaking in performance terms is sorting the outer one applied in 99,99% of the queries for no reason while sorting the inner one would be necessary. Anyways, how about a |
|
@angelaki for sorting at the outer level, there are various issues where this is already discussed - dotnet/efcore#19571 may help understand why it's needed. |
Ok, I got the point. Seams like you had some trouble with these sorting things already. And I agree, it's not a bug, it's just not a feature. Maybe an extension method, opt-in or what ever could be helpful for ordered paged queries - but guess that'd make things too complicated. I agree with you: the user needs to know what he's doing. Edit: maybe one idea. This issue is only |
Back again, since my solution has proven me wrong: I really love the idea of my services returning IQueryables of entities other users can consume. I cannot expect them to know how the entities need to be ordered for the query to work. They can reorder the entities the way they need them breaking the inner query. Imho EFCore needs to be able to return a queryable that can be reordered without breaking it's logic (and not even noticing it)! I guess an extension method that forces all queries (inner and outer) to be ordered by the entities PK would be a great solution. Sure opt-in. |
The issue is the user written query which has non-deterministic results due to ordering on non-unique element and then applying paging. If you write a query which doesn't generate non-deterministic results then that IQueryable will work just fine for all scenarios including scenarios where consumers of IQueryable specifies different ordering since the inner subquery results are fixed at that point. This is not issue for consumers, this is issue for the producer of the LINQ query. |
Maybe I'm mistake here, so I'll describe you my scenario: I have a service that returns an IQueryable<Appointment> GetAppointments()
=> _dbContext.Appointments.AsSplitQuery().Include(a => a.Participants).Where(a => !a.IsDeleted && a.Participants.Any(p => p.UserId == currentUserId)).OrderBy(a => a.Start).ThenBy(a => a.Id); Now the user consuming my service wants to order them by I can tell the users to not reorder the unmaterialized results but that wouldn't allow us to take benefits of streaming, server side ordering etc. |
Doing ordering on non-unique column and applying paging generates wrong results. At this point since all that code is written by consumer, they are writing bad LINQ query, so they get non-deterministic results. It doesn't hide anything from provider perspective. |
Yes, guess you're right. But since this problem only occurs with |
The issue occurs outside of AsSplitQuery also. If you run 2 queries which uses same kind of subquery but with non-deterministic results and try to stich or match them on client side (basically what AsSplitQuery does under the hood), it will still give wrong results. So point falls into that non-deterministic results are just bad regardless so such query shouldn't be written even if it doesn't give error since there is no correct results. |
Ok, you're absolutely right. But still: how about an opt-in extension method to ensure the order for inner and outer queries independent on the use sortings? But I'm already noticing ... you don't like the idea of it 😉 |
As you know - we don't like the idea of it because it doesn't educate users to write proper queries. Instead it introduce a hack for users so that they can continue writing bad queries and EF Core correcting under the hood. And an opt-in extension method means they still have to learn which method to use. So if we are educating users then why not send them in right direction rather than in a hack direction. So we will update docs to mention this so users write nicer queries. |
In split query, and also a stronger note in the pagination page. Closes dotnet#3242
In split query, and also a stronger note in the pagination page. Closes dotnet#3242
I'm sorry I cannot fully reproduce this error yet, but as I already described in a just created stackoverflow (https://stackoverflow.com/questions/67169707/azure-sql-queries-with-same-order-returning-different-resultsets) Azure SQL somehow seams to have problem keeping the order for some queries. This sure breaks the logic of your "AsSplitQuery()" so that I needed to remove it completely and live with the performance decrease.
Is this maybe an already known misbehavior? Is there something I can do to make it work? Anyways, I thought you should know this.
Edit: I now know that SQL Server is not guaranteed to keep the order between queries for not unique columns! The order is determined based on indexes, columns etc.. But what to do in these cases? Guess I'm not the last one to fall in this pit. I now append a
.ThenBy(e => e.UniqueIdentifier)
everytime a order is attached. But it's a pretty technical reason causing this issue and will sure confuse several persons out there!I don't know how complex this could be under the hood, but would it be possible to always order by entities PK last if any / no orders are used? Guess it's getting pretty tricky right now but it is a terrible pitfall right now!
The text was updated successfully, but these errors were encountered: