Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document behavior of AsSplitQuery() when subquery has unstable ordering #3242

Closed
angelaki opened this issue Apr 19, 2021 · 32 comments · Fixed by #3685
Closed

Document behavior of AsSplitQuery() when subquery has unstable ordering #3242

angelaki opened this issue Apr 19, 2021 · 32 comments · Fixed by #3685
Assignees
Milestone

Comments

@angelaki
Copy link

I'm sorry I cannot fully reproduce this error yet, but as I already described in a just created stackoverflow (https://stackoverflow.com/questions/67169707/azure-sql-queries-with-same-order-returning-different-resultsets) Azure SQL somehow seams to have problem keeping the order for some queries. This sure breaks the logic of your "AsSplitQuery()" so that I needed to remove it completely and live with the performance decrease.

Is this maybe an already known misbehavior? Is there something I can do to make it work? Anyways, I thought you should know this.

Edit: I now know that SQL Server is not guaranteed to keep the order between queries for not unique columns! The order is determined based on indexes, columns etc.. But what to do in these cases? Guess I'm not the last one to fall in this pit. I now append a .ThenBy(e => e.UniqueIdentifier) everytime a order is attached. But it's a pretty technical reason causing this issue and will sure confuse several persons out there!

I don't know how complex this could be under the hood, but would it be possible to always order by entities PK last if any / no orders are used? Guess it's getting pretty tricky right now but it is a terrible pitfall right now!

@ajcvickers
Copy link
Member

@angelaki EF Core should be ensuring a unique ordering when generating SQL for a split query, so it sounds like this is a bug. Please attach a small, runnable project or post a small, runnable code listing that reproduces what you are seeing so that we can investigate.

@angelaki
Copy link
Author

What should have been the expected behavior? Should it get ordered by Id last by default? I'll try my very best, but isolating this issue seams pretty tricky.

@smitpatel
Copy link
Contributor

EF Core split query will add ordering by PK automatically. It is appended after the ordering specified by the user so that user gets results in the order they asked. So Split queries always have a stable and unique-ly identifying ordering. So the issue you are describing doesn't really occur for split query in theory. So you will need to provide a full repro which shows that split query is not working correctly due to aforementioned reason. It is likely that you are running into a different issue.

@angelaki
Copy link
Author

angelaki commented Apr 25, 2021

Sorry, I'm pretty busy these days, but I'll try to isolate the issue next weekend. It's a querable pushed through multiple services, getting dynamicly ordered by expressions etc..

@ajcvickers
Copy link
Member

EF Team Triage: Closing this issue as the requested additional details have not been provided and we have been unable to reproduce it.

BTW this is a canned response and may have info or details that do not directly apply to this particular issue. While we'd like to spend the time to uniquely address every incoming issue, we get a lot traffic on the EF projects and that is not practical. To ensure we maximize the time we have to work on fixing bugs, implementing new features, etc. we use canned responses for common triage decisions.

@angelaki
Copy link
Author

angelaki commented May 5, 2021

Sorry it took me a while to isolate the issue - it actually was pretty easy. Here you go:

public class Appointment
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public DateTime? Start { get; set; }
    public ICollection<AppointmentParticipant> Participants { get; set; }
}

public class AppointmentParticipant
{
    public Guid UserId { get; set; }
    public Guid AppointmentId { get; set; }
    public Appointment Appointment { get; set; }
}

_dbContext.Appointments.Include(p => p.Participants).AsSplitQuery().OrderBy(a => a.Start).Skip(10).Take(10).ToArrayAsync();

generates the following SQL:

SELECT [t].[Id], [t].[Name], [t].[Start]
FROM (
    SELECT [a].[Id], [a].[Name], [a].[Start]
    FROM [Appointment] AS [a]
    ORDER BY [a].[Start]
    OFFSET @__p_0 ROWS FETCH NEXT @__p_0 ROWS ONLY
) AS [t]
ORDER BY [t].[Start], [t].[Id]

As you can see the INNER SQL is sorted wrong! Furthermore I noticed:

  • .Take(X).Skip(X) (reverse Order) generates wrong SQLs
  • Order after .Take(X).Skip(X) generates wrong SQLs

But that's ok for me. Intended behavior?

@roji
Copy link
Member

roji commented May 5, 2021

@angelaki what exactly is wrong with the inner SQL? It seems to sort by Start as you've requested in the LINQ query, no?

.Take(X).Skip(X) (reverse Order) generates wrong SQLs

Take/Skip does not do reverse ordering - try using these operators in LINQ to Objects (without EF Core) to see the expected behavior. If you think the EF-generated SQL is wrong with regards to that, please open an issue with the exact SQL you're seeing and why it doesn't seem right.

Order after .Take(X).Skip(X) generates wrong SQLs

Here too, I'm guessing your expectations may not be accurate. Take/Skip without a preceding OrderBy are problematic, since in SQL table/subquery rows have no determinate order; so what actual rows get selected is completely non-deterministic. Once again, if you suspect a real issue, please open a separate issue with a full LINQ query and the generated SQL.

@angelaki
Copy link
Author

angelaki commented May 5, 2021

Thank you for the explanation about these two behaviors! 👍

Back to my initial issue:

As you can see the outer SQL is ordered by Start and Id to ensure the data is always returned in the same order. Id gets added by some EFCore magic I guess - I never actually sorted by it. Since the Inner Query gets only ordered by Start (what is not unique) its order is not deterministic (at least not enough, since it changes depending on selected fields [SQL Server optimization etc.]).

Now that I use .AsSplitQuery() the second query will try to get all the participants the first query returned. It will use the same subquery, bit since no columns are actually selected this time the order changes and the SplitQuery return the participants from different appointments!

It sure still works if I manually order by Start and Id, but as @ajcvickers and @smitpatel mentioned this should work "out of the box"

@roji
Copy link
Member

roji commented May 5, 2021

Reopening to let @smitpatel take a look.

@roji roji reopened this May 5, 2021
@smitpatel
Copy link
Contributor

smitpatel commented May 5, 2021

The SQL generated by EF Core is accurate. We generate OrderBy-Apply Skip/Take and then add ordering we require so result set is pre-defined.

Now the ordering being non-unique is not entirely issue as long as Skip/Take are not crossing duplicate column set. e.g. If you are sorting by City and values are New York, New York, Seattle, Seattle and you do Skip(0).Take(3) then which row with city value Seattle will be picked is non-deterministic. If you do Take(2) then subquery may not have exact ordering but it will select same rows and then outer query will order them uniquely.

If your main result set is non-deterministic because of the combination of OrderBy/Skip/Take specified then you will get whatever results server gets. Neither EF Core can predict that it will be problematic (since it is data dependent), nor EF Core can do anything fix is because it may change expected results and add additional (at times significant) perf.

@angelaki
Copy link
Author

angelaki commented May 5, 2021

Yes, I totally agree with you. But in my case AsSingleQuery still works while AsSplitQuery is broken since the reload (appointment participants) targets different rows!

@smitpatel
Copy link
Contributor

When the issue happens only when you run query multiple times then it would only happens when using AsSplitQuery. Again, EF Core cannot do anything here.

@angelaki
Copy link
Author

angelaki commented May 5, 2021

No, AsSplitQuery runs two queries! But the 2nd one does not contain the records of the first one! Just like the outer query gets the primary key as order criteria attached, the inner one needs it, too.

Do you need more code to reproduce it? I just skipped pasting the second query getting executed by one EFCore action.

@smitpatel
Copy link
Contributor

No we don't need more code. We are saying we cannot do anything here as you, owner of the data, need to make sure that running your query are not generating non-deterministic result set. The issue can also be reproduced by applying same set of OrderBy/Skip/Take and then selecting only certain columns using anonymous type projection. Then your own 2 queries will give different result sets.

At the end of the day, it is data-dependent non-deterministic query by user and EF Core cannot do anything about it.

@angelaki
Copy link
Author

angelaki commented May 5, 2021

But you are also adding order criterias for regular queries, why don't you for SubSelects?

Statements running just fine as AsSingleQuery and returning wrong results as AsSplitQuery are pretty dangerous / hard to find for coders. And it could be solved by applying the order to the inner query just like you do for the outer.

@smitpatel
Copy link
Contributor

why don't you for SubSelects?

I already written above. First we don't know that user-specified ordering is problematic. Second, adding additional ordering lowers perf. We cannot automatically add ordering and cause perf penalty for everyone for the sake of a non-deterministic query.

@angelaki
Copy link
Author

angelaki commented May 5, 2021

Ah, ok. But why do you for the outer query? I'd expect a full order extension or none. But the current behavior is just dangerous in my opinion.

Ordering the inner by my criteria (Start) only but the outer by mine (Start) plus the PK (Id) seams inconsistent.

@smitpatel
Copy link
Contributor

We add orderings we need to generate results. And we only add orderings at very outer-level unless there are nested collections being projected. So if your subquery is 4 levels deep inside the same thing doesn't work. There are also multiple issues in this repo filed to remove those orderings. We also have PR in progress to remove last ordering because very last ordering is not needed. So sorry, your argument doesn't hold. At the very least the core issue remains that you have paging operation which generates non-deterministic results. You are free to add additional ordering to make it deterministic.

@angelaki
Copy link
Author

angelaki commented May 5, 2021

I agree with you if no automatic ordering is done at all - and if it gets removed i'm fine. Right now 99,99% of the queries are automatically ordered and the very few that really need it (paged split queries) aren't.

Maybe also only paged split queries could get ordered automatically. Right now it seams like an issue - just like @ajcvickers said: EF Core should be ensuring a unique ordering when generating SQL for a split query, so it sounds like this is a bug

@smitpatel
Copy link
Contributor

EF Core should be ensuring a unique ordering when generating SQL for a split query

Which we did at outer level. It is subquery results which are different. Subquery is generated based on user provided query.

@angelaki
Copy link
Author

angelaki commented May 6, 2021

Sorry, but I don't really get it. Doing it at outer level isn't even necessary, it's more a nice gimmick, while not sorting the inner one breaks the functionallity. Speaking in performance terms is sorting the outer one applied in 99,99% of the queries for no reason while sorting the inner one would be necessary.

Anyways, how about a .ThenByKey() order extension method? This way the user could easy and dynamically apply a PK ordering to entities he's not aware of?

@smitpatel
Copy link
Contributor

  • Sorting on outer level is not a gimmick. It is required ordering when streaming results. Expression tree doesn't tell you if it is streaming query or buffering query.
  • Sorting in inner level may break the functionality but it is purely data dependent and the query user wrote and not something EF Core ever know if it breaks functionality or not.
  • There is no additional method needed. You can use ThenBy after order by in the query to sort by PK and EF Core will sort inner subquery with it for you.
  • Dynamic query generation also works with ThenBy since the type of results are deterministic & not a random type, PK properties should be known to user. And user can use EF.Property call to generate any mapped property access in EF Core linq query.

@roji
Copy link
Member

roji commented May 6, 2021

@angelaki for sorting at the outer level, there are various issues where this is already discussed - dotnet/efcore#19571 may help understand why it's needed.

@angelaki
Copy link
Author

angelaki commented May 6, 2021

Ok, I got the point. Seams like you had some trouble with these sorting things already. And I agree, it's not a bug, it's just not a feature. Maybe an extension method, opt-in or what ever could be helpful for ordered paged queries - but guess that'd make things too complicated. I agree with you: the user needs to know what he's doing.

Edit: maybe one idea. This issue is only .Include() related I guess - and in my specific case .Include() with .AsSplitQuery(). How about setting a flag the indicates the included queries as "order ensured" (for outer orders and for AsSplitQuery for the inner ones, too), with an option to disable the auto sorting? But maybe ... We're back in making things too complicated? Guess that order thing already cost you some nerves ...

@angelaki
Copy link
Author

angelaki commented May 9, 2021

Back again, since my solution has proven me wrong: I really love the idea of my services returning IQueryables of entities other users can consume. I cannot expect them to know how the entities need to be ordered for the query to work. They can reorder the entities the way they need them breaking the inner query.

Imho EFCore needs to be able to return a queryable that can be reordered without breaking it's logic (and not even noticing it)! I guess an extension method that forces all queries (inner and outer) to be ordered by the entities PK would be a great solution. Sure opt-in.

@ajcvickers ajcvickers changed the title AsSplitQuery()-Behavior causing issues with Azure Document behavior of AsSplitQuery() when subquery has unstable ordering May 10, 2021
@ajcvickers ajcvickers transferred this issue from dotnet/efcore May 10, 2021
@smitpatel
Copy link
Contributor

Imho EFCore needs to be able to return a queryable that can be reordered without breaking it's logic

The issue is the user written query which has non-deterministic results due to ordering on non-unique element and then applying paging. If you write a query which doesn't generate non-deterministic results then that IQueryable will work just fine for all scenarios including scenarios where consumers of IQueryable specifies different ordering since the inner subquery results are fixed at that point. This is not issue for consumers, this is issue for the producer of the LINQ query.

@angelaki
Copy link
Author

angelaki commented May 10, 2021

Maybe I'm mistake here, so I'll describe you my scenario:

I have a service that returns an IQueryable<Appointment> that can be consumed by other users. The service automatically filteres the appointments that are deleted, the user can't see etc. and orders them correctly. For example this could look like

IQueryable<Appointment> GetAppointments()
    => _dbContext.Appointments.AsSplitQuery().Include(a => a.Participants).Where(a => !a.IsDeleted && a.Participants.Any(p => p.UserId == currentUserId)).OrderBy(a => a.Start).ThenBy(a => a.Id);

Now the user consuming my service wants to order them by Name, so he executes service.GetAppointments().OrderBy(a => a.Name).Skip(50).Take(50). and the order is removed. The query still executes but the participants are missing due to the non-deterministic results.

I can tell the users to not reorder the unmaterialized results but that wouldn't allow us to take benefits of streaming, server side ordering etc.

@smitpatel
Copy link
Contributor

Doing ordering on non-unique column and applying paging generates wrong results. At this point since all that code is written by consumer, they are writing bad LINQ query, so they get non-deterministic results. It doesn't hide anything from provider perspective.

@angelaki
Copy link
Author

Yes, guess you're right. But since this problem only occurs with AsSplitQuery and is hard for users to understand why not just add an opt-in overload that will ensure deterministic results for the AsSplitQuery method? I think that would be a nice solution.

@smitpatel
Copy link
Contributor

The issue occurs outside of AsSplitQuery also. If you run 2 queries which uses same kind of subquery but with non-deterministic results and try to stich or match them on client side (basically what AsSplitQuery does under the hood), it will still give wrong results. So point falls into that non-deterministic results are just bad regardless so such query shouldn't be written even if it doesn't give error since there is no correct results.

@angelaki
Copy link
Author

Ok, you're absolutely right. But still: how about an opt-in extension method to ensure the order for inner and outer queries independent on the use sortings? But I'm already noticing ... you don't like the idea of it 😉

@smitpatel
Copy link
Contributor

As you know - we don't like the idea of it because it doesn't educate users to write proper queries. Instead it introduce a hack for users so that they can continue writing bad queries and EF Core correcting under the hood. And an opt-in extension method means they still have to learn which method to use. So if we are educating users then why not send them in right direction rather than in a hack direction. So we will update docs to mention this so users write nicer queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants