Investigate single query related entity loading and orderings #29171

roji · 2022-09-20T19:03:40Z

When loading related entities in single query mode, our query pipeline currently injects orderings which make all related rows be grouped together. Our shaper relies on this ordering for assigning the related dependents to their correct principal. This issue is about investigating removing those orderings, and using client-side dictionary (or identity) lookups to find the principal instead.

Orderings generally impact query planning in a significant way, and require the database to do a lot of work. We've received quite a few user reports about these orderings; we still need to investigate this thoroughly, but it makes sense that the orderings would regress perf significantly in various scenarios.
In general, we should strive to remove as much load from the database as possible, even at the cost of running slower at the client, since the database tier is far harder to scale than the application tier. The orderings effectively do the opposite, pushing more work down to the database.
One argument in favor of ordering is that it allows EF to stream the results, since all rows related to a principal are grouped together. If we remove the orderings, EF can't return a single principal before it consumes all rows, since there may be another dependent row at the end.
However, orderings prevent the database from streaming results back; we're basically pushing the buffering back to the server, increasing memory requirements there (as above, we should be doing the opposite and unloading the database).
This also negatively affects the latency of results, as the database starts sending rows back later. Removing the orderings would allow the database to return rows earlier, which is important.
We've raised the possibility of using identity resolution as a substitute for the orderings (but only if the queries are identified as buffering in some way). Depending on the perf impact, I think we should consider always doing this, either for AsTracking and NoTrackingWithIdentity, or possibly even for AsNoTracking (which makes it pretty much the same as NoTrackingWithIdentity). Note that we need to investigate why NoTrackingWithIdentity isn't efficient at the moment (#28579).
We may want to do the same for final GroupBy, which also injects orderings (#19929).

Thanks @NinoFloris for the conversation around this.

Issues on this:

#19571: previous issue where removing the orderings was discussed.
#20076: discusses distinguishing between buffering and streaming queries, and proposes removing the orderings only for queries which are both buffering and tracking (via identity resolution).
#19828: issue for removing only the last ordering (done)
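
To make the trade-off above concrete, here is a minimal, self-contained sketch of the two strategies over an in-memory row sequence. It is not EF's actual shaper; `Row`, `Blog`, `Post` and `ShaperSketch` are hypothetical names made up for illustration, and the rows stand in for what a single-query JOIN would return.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flattened row from a single-query LEFT JOIN: one row per (blog, post) pair.
// PostId is null when a blog has no posts.
public sealed record Row(int BlogId, string BlogName, int? PostId, string? PostTitle);

public sealed class Blog
{
    public int Id { get; init; }
    public string Name { get; init; } = "";
    public List<Post> Posts { get; } = new();
}

public sealed class Post
{
    public int Id { get; init; }
    public string Title { get; init; } = "";
}

public static class ShaperSketch
{
    // Ordering-based: relies on all rows of a blog being adjacent.
    // A blog is yielded as soon as the key changes, so results can stream.
    public static IEnumerable<Blog> StreamOrdered(IEnumerable<Row> rows)
    {
        Blog? current = null;
        foreach (var row in rows)
        {
            if (current is null || current.Id != row.BlogId)
            {
                if (current is not null) yield return current;
                current = new Blog { Id = row.BlogId, Name = row.BlogName };
            }
            if (row.PostId is int postId)
                current.Posts.Add(new Post { Id = postId, Title = row.PostTitle ?? "" });
        }
        if (current is not null) yield return current;
    }

    // Lookup-based: rows may arrive in any order, but no blog is known to be
    // complete until every row has been read, so the client buffers everything.
    public static List<Blog> BufferWithLookup(IEnumerable<Row> rows)
    {
        var byId = new Dictionary<int, Blog>();
        var result = new List<Blog>();
        foreach (var row in rows)
        {
            if (!byId.TryGetValue(row.BlogId, out var blog))
            {
                blog = new Blog { Id = row.BlogId, Name = row.BlogName };
                byId.Add(row.BlogId, blog);
                result.Add(blog);
            }
            if (row.PostId is int postId)
                blog.Posts.Add(new Post { Id = postId, Title = row.PostTitle ?? "" });
        }
        return result;
    }
}
```

Feeding unordered rows (blog 1, blog 2, blog 1) to `StreamOrdered` produces blog 1 twice, which is why the ordering is required today. `BufferWithLookup` handles the same input correctly, at the cost of holding the whole result in client memory.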

ajcvickers · 2022-09-21T09:09:15Z

We've raised the possibility of using identity resolution as a substitute for the orderings (but only if the queries are identified as buffering in some way). Depending on the perf impact, I think we should consider always doing this, in any case. This would effectively mean that NoTracking queries become NoTrackingWithIdentity.

One of the uses for AsNoTracking is to stream a result set that won't fit in memory. We shouldn't prevent this, even if it becomes opt-in.

roji · 2022-09-21T09:20:09Z

I'm wondering if it makes sense to allow users to stream huge results in EF, when doing so forces the database to buffer those same results (which is what the orderings do)... We're basically just pushing the buffering (=huge memory requirements) onto the database, no?

It's true that databases may have ways to better deal with this, e.g. use disk space as temporary storage while ordering the results. I certainly wouldn't want this to be anything close to a default behavior though, as perf there probably becomes truly disastrous.

I wonder if for the huge resultset scenario - which hopefully should be rare - it might not be better for users to deal with this by breaking their single LINQ query into two, separately processing principals and dependents. I'd certainly want to at least aggressively guide users towards considering this, rather than just slapping on AsNoTracking to make things "just work", at the relatively hidden cost of increasing database memory usage (and CPU).

In any case, we definitely need to carefully consider what we do here, as changes here may be "breaking" (in the sense that client memory requirements may suddenly change in a very significant way).

(added needs-design)

ajcvickers · 2022-09-21T09:31:06Z

If a no-tracking query against a single table without ordering still streams efficiently, then that might be enough.

roji · 2022-09-21T09:35:25Z

Yeah, there's indeed no reason to buffer anywhere if only a single table is involved.

roji · 2022-09-21T09:45:45Z

One possible pattern to efficiently stream with relationships is the following (or some variation):

var posts = ctx.Posts
    .Select(p => new
    {
        Post = new Post { Id = p.Id, Title = p.Title },
        Blog = new Blog { Id = p.Blog.Id, Name = p.Blog.Name }
    })
    .ToList();

The point is to avoid the fixup; assuming single query is used, each row contains all the information needed to materialize a result, and so should be able to stream. That could be acceptable as a user workaround for the scenario of streaming huge result sets with relationships.

roji · 2022-10-17T14:22:28Z

A couple more notes on this...

First, when you have a single collection include (Blogs.Include(b => b.Posts)), an ORDER BY on the principal ID probably has little overhead, since there's an index (the primary key). However, if you add another nested include (Blogs.Include(b => b.Posts).ThenInclude(p => p.Comments)), we have the second ordering on the Post ID. At this point an index cannot be used any more, and so removing the ordering becomes quite important.

Second, we think it's still important to support streaming (as opposed to buffering) for when results are huge. We should keep in mind that it's not great to achieve this at the expense of forcing the database to buffer, by adding orderings. Possibilities here include:

- Reversing the query, starting from the dependent, and avoiding navigation fixup (see the earlier comment in #29171).
- Switching tracking queries to the new mode (without orderings), and leaving no-tracking as the streaming mode.
    - This requires us to maintain both techniques (more complexity), and makes no-tracking even more problematic from a perf standpoint.
    - Since using the state manager imposes overhead, for cases where tracking isn't needed, AsNoTrackingWithIdentity becomes the new "best perf" option, at least for single query.
    - We'd have to make sure AsNoTrackingWithIdentity performs well (see Look into the performance of AsNoTrackingWithIdentity #28579), and educate people to use it. Many people just use AsNoTracking "to make stuff faster".

stevendarby · 2022-11-23T03:32:57Z

Please consider non-entity queries (i.e. projecting to a DTO), which are inherently non-tracking and where I suspect identity resolution won't work as a method of buffering either. So buffering at some other level may be required to support this. We do a lot of projection with collections and have retries enabled, so we are already buffering; dropping the ORDER BY in this scenario would be great!

stevendarby · 2022-11-23T03:54:40Z

Related question: included reference navigations get added to the order by - is this even needed? Could removing those be a simple win with few changes to streaming/buffering required? Can raise a separate issue if preferred.

roji · 2022-11-23T07:07:44Z

included reference navigations get added to the order by - is this even needed? Could removing those be a simple win with few changes to streaming/buffering required?

Interesting - yes, please open a separate issue; that sounds like a possibly cheap optimization.

roji · 2023-07-07T16:29:23Z

@hisuwh ok, thanks for confirming. This is definitely high up on my want list for 9.

stevendarby · 2023-07-07T16:41:40Z

@hisuwh one thing to be mindful of is that without the order by, you might start to see results appearing immediately in SSMS as it streams the results back, giving the impression the query has finished, when it might not have. Be sure to note the actual query time. With an order by there can be more buffering before it starts to stream results.

hisuwh · 2023-07-07T18:45:40Z

@roji ok great. Is the EF version tied to the .NET version? We've only just upgraded to .NET 6 so we won't be moving to 9 any time soon.

@stevendarby this query isn't returning any rows so I don't think that will matter

roji · 2023-07-07T19:26:30Z

one thing to be mindful of is that without the order by, you might start to see results appearing immediately in SSMS as it streams the results back [...]

It's worth mentioning that this in itself is a reason to remove those ORDER BYs. In other words, if EF forces the database to start delivering results much later (by asking for ordered results), then that in itself negatively impacts result latency (regardless of the overall time taken to process the query etc.).

eero-dev · 2023-12-18T05:29:59Z

Just adding my 2 cents since I had to deal with such problems in the past few weeks.

We have experienced multiple performance problems due to resource-constrained SQL servers when deploying our application on-premises (i.e. few resources and a slow IO subsystem; there's nothing we can do about it, as we don't manage the SQL servers in question). By loading the entities separately without includes, we shaved a significant chunk of processing off the SQL server. We don't really care about the order until the data hits the application side, or unless we are doing pagination or selecting the top 1.

Note: sharing only the ORDER BY part of the query plan for NDA reasons.
For example, here we are loading sensor data with a single join to get the sensor; loading time with ORDER BY id was a whopping 0.677s.

After separating the query into two queries (load data + load sensor information separately): 0.022s, a 30x improvement, and CPU time was significantly lower.

This might be an extreme case, but it was one we had to deal with.
If the tables in question only have around 10 rows, it might not be worth doing this by hand, but for anything larger it seems like we have to load them manually.

Good reading on the subject from a credible source:
https://www.brentozar.com/archive/2019/10/how-to-think-like-the-sql-server-engine-adding-an-order-by/

roji · 2023-12-18T09:20:22Z

@eero-dev thanks. Your data unfortunately doesn't add much given that it's very partial and doesn't expose actual queries and plans being performed; we're definitely aware that ORDER BY can have quite a cost, and this issue is about removing that specifically when loading related entities e.g. with Include, where EF itself is the one adding the ORDER BY for its own purposes.

The link is definitely useful - Brent Ozar is always a great source of info!

eero-dev · 2023-12-18T09:49:56Z

@eero-dev thanks. Your data unfortunately doesn't add much given that it's very partial and doesn't expose actual queries and plans being performed; we're definitely aware that ORDER BY can have quite a cost, and this issue is about removing that specifically when loading related entities e.g. with Include, where EF itself is the one adding the ORDER BY for its own purposes.

The link is definitely useful - Brent Ozar is always a great source of info!

Sorry for being unable to share any concrete data, as it is confidential. I can provide a repro if needed, though, which shows that even a single Include with the ORDER BY is quite costly.

roji · 2024-02-20T09:02:24Z

Note: consider removing the orderings for split query as well, not just for single query.

drdamour · 2025-02-18T19:15:03Z

You are probably not looking for more use cases... but I did recently hit this. The issue (which, it turns out, is quite common in the DBs we work with) is that a "linker" table (i.e. a table that only exists to hold FKs to two other tables, as a means of representing a multi<->multi relationship) had a surrogate key that was NOT included in the available unique indexes.

Because of this, a key lookup against the clustered index became required for any query involving this entity type/table, and since it sat in a tight inner loop, this led to bad overall query and app performance. Since it was a vendor-managed DB, adding the surrogate to the end of the index (or including it) isn't a viable option. It was a bummer that this column was only needed for the implicit EF Core ORDER BY to satisfy the shaper.

It's been quite difficult to work around this without resorting to raw sql.

Webreaper · 2025-02-25T12:27:53Z

We're being impacted by this too: a query that should take about 200-300ms without these orderings is taking 3-5s (yes, seconds). We very explicitly add our own ordering, but the unnecessary additional sort orders added by EF break all of our indexes (and they also produce very weird effects with AsSplitQuery, as a related issue found). If we add AsSplitQuery, the query goes from 3-5s to 25s or more in some cases.

Until this is resolved properly, does anyone have a simple (or even complicated 😄) solution to strip these unnecessary Order bys from the EF query? Wondering if there's some post-generation, pre-execution hook we can wire up to strip them out until there's a more fundamental fix within EF?

We're on .NET and EF 9.0. We don't care about streaming because we're paginating based on our own sorting, so we return at most 100 records.

roji · 2025-02-26T09:50:43Z

@Webreaper you cannot strip the orderings from the query, since EF (currently) depends on them in order to properly materialize the results (i.e. the rows are expected to come back in the appropriate ordering, and if they don't you'll get bad results).

One workaround you can use is to avoid Include() altogether, but rather use Join(), at which point you're expressing the SQL query more directly and won't get the extra orderings:

_ = await context.Blogs
    .Join(context.Posts, b => b.Id, p => p.BlogId, (b, p) => new { b, p })
    .ToListAsync();

Webreaper · 2025-02-26T10:08:27Z

Thanks, that's really useful. I don't think I've ever actually used EFCore's Join directly before - we can look into it.

For now we've mitigated it by sort of hard-coding our own split query. So we do:

var resultIds = await context.Things
                             .Include( ... )
                             .ThenInclude( ... )
                             .Where( ... )
                             .OrderBy( ... )
                             .Select( x => x.Id )
                             .ToListAsync();

var actualResults = await GetTheThingsFromIDs( resultIds );

Because there's only a single column of IDs in the resultset, EF doesn't add any extraneous ORDER BY clauses, and we were already hitting the DB twice anyway (for unrelated reasons) so this isn't any less efficient, and seems to run much faster.
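
One detail to watch with a two-step approach like this: rows fetched by a list of IDs are not guaranteed to come back in the order of that list, so the sort order from the first query generally has to be re-applied on the client. A minimal sketch follows, assuming the second step returns fully loaded entities; `OrderingSketch.InOrderOf` is a hypothetical helper, not part of EF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderingSketch
{
    // Re-orders 'items' to match the order of 'orderedIds'.
    // IDs with no matching item are skipped; duplicate keys in 'items'
    // make ToDictionary throw, which is fine for primary keys.
    public static List<T> InOrderOf<T, TKey>(
        IReadOnlyList<TKey> orderedIds,
        IEnumerable<T> items,
        Func<T, TKey> keySelector)
        where TKey : notnull
    {
        var byId = items.ToDictionary(keySelector);
        var result = new List<T>(orderedIds.Count);
        foreach (var id in orderedIds)
        {
            if (byId.TryGetValue(id, out var item))
                result.Add(item);
        }
        return result;
    }
}
```

With the snippet above, this would be called as something like `OrderingSketch.InOrderOf(resultIds, actualResults, x => x.Id)`, where the exact shape of `actualResults` is an assumption.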

I'd ask you to please prioritise getting this fixed in EF 10; it's already been bumped for 7, 8 and 9, and it seems critical enough that it shouldn't be a 4-year fix. I'd bet money this exact issue is behind a lot of the people who claim "EF performance is rubbish" - but in this case the claim is justified.

roji added type-enhancement area-perf customer-reported area-query consider-for-next-release labels Sep 20, 2022

roji added the needs-design label Sep 21, 2022

ajcvickers added this to the Backlog milestone Sep 21, 2022

ajcvickers added consider-for-current-release shay-loves-labels and removed consider-for-next-release labels Oct 20, 2022

ajcvickers removed the shay-loves-labels label Nov 18, 2022

roji mentioned this issue Nov 23, 2022

Consider removing reference navigation keys from order by #29662

Open

ajcvickers assigned roji Dec 2, 2022

roji mentioned this issue Jan 8, 2023

encourage WHERE even if splitting dotnet/EntityFramework.Docs#4044

Closed

ajcvickers added the area-groupby label Jan 29, 2023

ajcvickers mentioned this issue Jan 29, 2023

Improve GroupBy support #30173

Open

53 tasks

roji mentioned this issue Apr 4, 2023

Generate table joins instead of subquery joins #17622

Open

roji added the propose-punt label Jun 16, 2023

ajcvickers mentioned this issue Sep 25, 2023

AsSplitQuery : Remove LEFT JOIN in child query when split the query and join self #31700

Closed

This was referenced Oct 18, 2023

Prevent Entity Framework adding ORDER BY when using Include #20397

Closed

Adjust EF behavior based on query buffering vs. streaming #20076

Open

roji mentioned this issue Feb 2, 2024

Execute multiple LINQ queries in a single round-trip (aka Expose batching read API to users) #10879

Open

roji mentioned this issue Feb 20, 2024

Multiple Result Sets for SplitQuery #33124

Closed

roji mentioned this issue Apr 6, 2024

Concatenation in projection throws "System.InvalidOperationException: Unable to translate a collection subquery in a projection" #33410

Closed

roji mentioned this issue Sep 23, 2024

Consider ensuring that split query ordering is deterministic, either warning/throwing if not or injecting the primary key properties #34722

Closed

roji mentioned this issue Oct 12, 2024

Final GroupBy on an entity type adds orderings for all of its properties #34895

Closed

maumar mentioned this issue Oct 25, 2024

When is "order by" generated in the underlying sql command text #34971

Closed
