-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Significant Query Slowdown When Using Multiple Joins Due To Changes In 3.0 #18022
Comments
This also happens with me: #18017 It seems it's happening to quite a few people. I know my query sucks, as do you in OP, but it would be a shame to have to revert an upgrade due this. |
@KevinMallinson I'm not surprised, it's a pretty significant change. I was hoping when I first read about it, that there would be some way to opt-out of this new behaviour, but unfortunately, that doesn't seem to be the case. |
Some observations:
|
@divega Having an API or extension to use the previous behavior would be good for those who don't have the bandwidth to rewrite large queries like this one. I personally would like to rewrite mine since it's inefficient and cumbersome to read, but as it stands I'll have to remain on 2.2.6 for a while longer. I'm sure others are in the same boat. As far as refactoring it goes, I seldom use explicit loading, so excuse my unfamiliarity with it, but would I be able to use that in tandem with No Tracking to be able to obtain a copy of the entity that can have its keys changed before saving it again? |
@dragnilar you just made me realize there are two interesting points I didn't think of:
|
Now I remember now that I needed the fixup behaviour, which was why I couldn't use explicit loading in the first place for this particular query. I tried out what you suggested: Running the query through EF with tracking enabled has it finish successfully. It takes about ~25 seconds on average. Running the generated SQL (I obtained it via the profiler) in SSMS takes almost 2 minutes to finish. The generated SQL is pretty large in this case, but it's not as large as when I have tracking turned off. |
I am very glad of the new behavior. Many times I had written a query expecting it to run in a single call, when it makes multiple roundtrips to the database to retrieve what I know to be very small amounts of data. Now it is in my hands as a developer whether to make one call to SQL for all the joins, or make separate requests to the database to retrieve all the pertinent information. Rewriting a few complex queries seems a small price to pay for the benefits gained in the majority of queries, at least certainly going forwards. Ideally, we could provide SQL-compilation hints to the query, where it would run those queries separately. Or maybe opt-in behavior for the query as a whole. However, I never found EF 2.2 to produce very efficient queries in this scenario, so I would hope for more optimized behavior. |
@Shane32 I agree it's better that it adhire to what is expected but at the time it was the seemingly recommended approach based on the documentation provided and what other resources recommended for the cloning scenario. Im fine with rewriting it, but as I said in the OP, I'd rather explore all options first before committing to a rewrite. I'm not sure what others will say, for me it's a pain, but not a super huge impact on my team since my code base uses pretty simple queries for all other operations. |
Please, provide an API because this behavior change is very problematic. We need to rewrite a lot of our queries in EF Core 3. |
I would like to chime in and say that we are also having this issue, and is very problematic for us. An |
Same here, seeing how the multiple queries returned their own subset of results then were consolidated in the framework that resulted in 100s of results. My query is fairly basic, it's really just: Get me products by brand, their images, their options and those option's values. Example: To my surprise, when I opened SQL Inspector when I started getting timeouts, I found out it was returning 150,000+ results when running the command manually. |
For anyone wondering how to 'hack' your way around this to get 2.2 speed on 3.0, follow the idea on this comment. Basically, you define an initial query to filter your results to the entity you want, and then create dummy variables with .includes to include the related entities you need. Important: I have only tested if this returns the correct data, nothing else. If you need to do something more than reading/displaying, make sure you test first. For example, I went from this: var equipment = await _context.Equipment.Include(x => x.Group)
.Include(x => x.SystemInfo).ThenInclude(x => x.SystemUsers)
.Include(x => x.SystemInfo).ThenInclude(x => x.Frameworks)
.Include(x => x.SystemInfo).ThenInclude(x => x.VideoCards)
.Include(x => x.SystemInfo).ThenInclude(x => x.StorageDrives)
.Include(x => x.SystemInfo).ThenInclude(x => x.Software)
.Include(x => x.SystemInfo).ThenInclude(x => x.NetworkAdapters)
.Include(x => x.SystemInfo).ThenInclude(x => x.Printers)
.Include(x => x.Parts).ThenInclude(x => x.ChildrenParts)
.Include(x => x.Parts).ThenInclude(x => x.ParentParts)
.Include(x => x.Parts).ThenInclude(x => x.Vendor)
.Include(x => x.MaintenanceHours)
.Include(x => x.Attachments)
.Include(x => x.Notes)
.Include(x => x.Status)
.Include(x => x.Area)
.Include(x => x.EquipmentType)
.Include(x => x.Department)
.Include(x => x.PMaintenance)
.Include(x => x.Request)
.FirstOrDefaultAsync(x => x.EquipmentId == id); to this: var equip = _context.Equipment.Include(x => x.Group).Where(x => x.EquipmentId == id);
var equipment = await equip.Include(x => x.MaintenanceHours)
.Include(x => x.Attachments)
.Include(x => x.Notes)
.Include(x => x.Status)
.Include(x => x.Area)
.Include(x => x.EquipmentType)
.Include(x => x.Department)
.Include(x => x.PMaintenance)
.Include(x => x.Request).FirstOrDefaultAsync();
var d2 = await equip.Include(x => x.SystemInfo).ThenInclude(x => x.SystemUsers).FirstOrDefaultAsync();
var d3 = await equip.Include(x => x.SystemInfo).ThenInclude(x => x.Frameworks).FirstOrDefaultAsync();
var d4 = await equip.Include(x => x.SystemInfo).ThenInclude(x => x.VideoCards).FirstOrDefaultAsync();
var d5 = await equip.Include(x => x.SystemInfo).ThenInclude(x => x.StorageDrives).FirstOrDefaultAsync();
var d6 = await equip.Include(x => x.SystemInfo).ThenInclude(x => x.Software).FirstOrDefaultAsync();
var d7 = await equip.Include(x => x.SystemInfo).ThenInclude(x => x.NetworkAdapters).FirstOrDefaultAsync();
var d8 = await equip.Include(x => x.SystemInfo).ThenInclude(x => x.Printers).FirstOrDefaultAsync();
var d9 = await equip.Include(x => x.Parts).ThenInclude(x => x.ChildrenParts).FirstOrDefaultAsync();
var d10 = await equip.Include(x => x.Parts).ThenInclude(x => x.ParentParts).FirstOrDefaultAsync();
var d11 = await equip.Include(x => x.Parts).ThenInclude(x => x.Vendor).FirstOrDefaultAsync(); You should only need to do this for the entities that need a |
@erikmf12 not sure if this is a full work around. I think we tried something like that and ran into a problem with the entity not having been fixed up since we need it to be non tracked so we can reset the primary keys before recomitting the entity back to the database as a copy of the original. Currently we're considering just writing a manual clone. |
@dragnilar True, I have not tested anything besides making sure all the data is included. In my case, I just need to display the data on a detail page. |
Figured I would reference the original change request for this feature on here so it would be tracked back on that request. |
@dragnilar Correct, my workaround only works with tracked entities. You should be able to write a snippet to untrack ALL entities from a dbcontext, if that works for you. Something like this: (not sure if this is a EF6 or EF Core snippet) public void DetachAllEntities()
{
var changedEntriesCopy = this.ChangeTracker.Entries()
.Where(e => e.State == EntityState.Added ||
e.State == EntityState.Modified ||
e.State == EntityState.Deleted)
.ToList();
foreach (var entry in changedEntriesCopy)
entry.State = EntityState.Detached;
} If that works for you, you can retrieve the data with tracking, then detach all tracking, then use your existing code to clone the data. You could probably also write code to inspect any entity object, and utilize the IModel to recursively iterate through all of its navigation members, creating a list of all related entities, then detach them all. Should be relatively easy to do. With a little extra work, it could reset primary keys and foreign keys, which may be necessary as well. |
Edit: I was probably being too polite. In actuality, we did try this first when we saw what the problem was, since it was an old trick we all had learned back in our EF6 days. Unfortunately it did not work due to a bug (see other comments I made). When we finally found out about the bug and the recommended work around though, the whole thing worked. Hence, the advice was appreciated though. However your remark about IModel did not mean anything to me. Also, I could not find any significant documentation of it besides some vague statements over at MSDN. 😑 It sounds cool but I didn't want to get wrapped up in chasing something blindly; alas I left it alone. |
Triage decision:
|
@smitpatel I think the last point would probably provide the best flexibility, but I'll understand if your team decides against it. In the least, the documentation definitely needs to be updated so others can avoid running into this situation. |
I think this is quickly going to become the most frequent issue/question with EF Core 3.x once people to start to transition and have the WTF moment, and indeed whilst my EAV model is mandated by a legacy system and is a questionable design choice, this new restriction affects even the most simple scenarios - the canonical Blog/Tags/Comments/Upvotes. If we are really restricted to realistically 2 or 3 to .Include 1-to-many relations and/or carry the cognitive burden of computing the avg number of related rows to power of number of .Include so we don't stream too much redundant information then I can't see how EF Core is any better than the micro ORMs for the query side. I understand the argument that splitting the queries can potentially carry consistency risks that aren't clear when not used with the right transaction modes and so wasn't perfect fit, however it was pragmatic and this strive for the right choice seems like a strive for purity rather than what the consumers of EF Core 3.0 desired. |
@sitepodmatt Indeed, it was a pretty nasty surprise to see things like this with EF Core 3.0. As I implied, I am in a position where I can go either way and see both sides. And also for what it is worth, me and my team did notice certain areas perform better with 3.0 versus 2.6. Unfortunately, with the number of breaking changes (such as this) and the other issues that have cropped up on here over the past couple of days, we have decided to table upgrading to EF 3.0 for the time being. Also, I think for the sake of backwards compatibility, they should have left in a hook or some type of flag on the context that you could toggle to use multi-query mode vs single query mode. I believe that would have been more pragmatic. I almost wonder if @roji was being semi-prophetic when he said this on #12098:
I doubt "everyone" lost with this design change. But at this stage I think it's too early to say definitively what was the net gain/loss. I am under the impression that most devs are not rushing to upgrade to Core 3.0, so it will take some time. |
Just realize that there are a lot of devs, like me, that felt that EF Core 1.x / 2.x was a big step backwards from EF6 where in EF Core 2.x there is no way to issue certain queries without them being broken up into multiple queries. I can provide samples of seemingly simple queries on EF Core 2.x that would run a subquery individually for EVERY result in the primary table, totally breaking performance! I've had to write and rewrite queries many times to get EF 2.x to produce the desired SQL. For us, EF 3.0 is a pleasant relief, and while it would be nice if there was a 'multi-query mode' for EF 3.0, I'd rather take the 3.0 behavior over the 2.x behavior any day. Ideally, I'd like to see a parameter added to Include to indicate multi-query behavior (e.g. |
It is possible as written in the gist. |
Do you mean the opening post? Because afaik EF Plus needs tracking enabled. |
In #20892 we are getting the For EfCore 3.1 I'm using this as a workaround to fix this using System;
using System.Data.Common;
using System.Linq;
using System.Reflection;
using Microsoft.Data.SqlClient;
using Microsoft.EntityFrameworkCore.Storage;
using Microsoft.EntityFrameworkCore.Storage.Internal;
namespace Test.Application.Data.EfCoreOverrides
{
// register this as IRelationalCommandBuilderFactory in the service provider
internal class ClonableRelationalCommandBuilderFactory : RelationalCommandBuilderFactory
{
public ClonableRelationalCommandBuilderFactory(
RelationalCommandBuilderDependencies dependencies)
: base(dependencies)
{
}
public override IRelationalCommandBuilder Create()
{
return new ClonableRelationalCommandBuilder(Dependencies);
}
}
[System.Diagnostics.CodeAnalysis.SuppressMessage("Usage", "EF1001:Internal EF Core API usage.", Justification = "<Pending>")]
internal class ClonableRelationalCommandBuilder : RelationalCommandBuilder
{
private static readonly FieldInfo _parameterFieldInfo = typeof(RawRelationalParameter)
.GetField("_parameter", BindingFlags.Instance | BindingFlags.NonPublic);
public ClonableRelationalCommandBuilder(
RelationalCommandBuilderDependencies dependencies) : base(dependencies)
{
}
public override IRelationalCommandBuilder AddParameter(IRelationalParameter parameter)
{
if (parameter is CompositeRelationalParameter compositParameter)
{
var clonedParameters = new CompositeRelationalParameter(compositParameter.InvariantName, compositParameter.RelationalParameters
.Select(x =>
{
if (x is RawRelationalParameter rawParameter)
{
var dbParameter = (DbParameter)_parameterFieldInfo.GetValue(rawParameter);
return new ClonableRawRelationalParameter(x.InvariantName, dbParameter);
}
return x;
})
.ToList());
return base.AddParameter(clonedParameters);
}
return base.AddParameter(parameter);
}
}
[System.Diagnostics.CodeAnalysis.SuppressMessage("Usage", "EF1001:Internal EF Core API usage.", Justification = "<Pending>")]
internal class ClonableRawRelationalParameter : RawRelationalParameter
{
private static readonly FieldInfo _parentFieldInfo = typeof(SqlParameter)
.GetField("_parent", BindingFlags.Instance | BindingFlags.NonPublic);
public ClonableRawRelationalParameter(string invariantName, DbParameter parameter) : base(invariantName, parameter)
{
}
public override void AddDbParameter(DbCommand command, object value)
{
if (value is SqlParameter
&& !(_parentFieldInfo?.GetValue(value) is null))
{
base.AddDbParameter(command, (DbParameter)((ICloneable)value).Clone());
return;
}
base.AddDbParameter(command, value);
}
}
} |
I think the new issue-number is wrong, should be #22483 |
Yes, my 2 key is faulty. |
As a side effect of this workaround I have noticed that the List<> properties are now null, where earlier they would be just empty Lists. This means I need to check against null in many other places where i would just do an Any() |
I'm surprised to see so few mentions of the increased memory usage caused by the "single query / no automatic splitting" feature in EF Core 3. I've just hunted down a tenfold memory increase after moving to EF Core 3.1.10 (from 2.2.6) and the cause was that the query was no longer automatically split into 5 simpler queries. This has a huge impact. FYI, the EF Core 2 query looked like: var company = await _dbContext.Companies
.Include(c => c.Sites)
.Include(c => c.Categories)
.Include(c => c.Assets).ThenInclude(a => a.Site)
.Include(c => c.Assets).ThenInclude(a => a.Tasks)
.AsTracking()
.SingleOrDefaultAsync(c => c.Id == companyId); |
@dstj Please open a new issue and attach a small, runnable project or post a small, runnable code listing that reproduces what you are seeing so that we can investigate. |
Closing this discussion issue since 5.0 has now shipped with split-query support for most scenarios. Remaining work is tracked by #21234. |
FYI, we had a chance to test this today with the original query and 5.0 did resolve the performance problem (we had to enable Split Query mode obviously). I am noticing side effects of #22283, but we have our work around with EF Plus for now. Thanks for addressing this and keeping the 🦄 magic going! 👍 |
Updated the pending item issue number to #21234 |
This change is broken logic. I'm now getting multiple parent records loaded because I want to include some child elements. That makes no sense at all. The change seems to be motivated by performance but at the expense of the logic? The fix is to put back many queries negating any performance improvement anyway. This is a terrible change. Giving it a fix name "split-query" just hides the fact that it's actually broken logic. I don't want split-query I want proper parent loading. I'm upgrading to netcore 3.1 rather than 5. |
It's not clear to me what your problem is, but upgrading from 2.1 to 3.1 probably won't give you the outcome you want. Split query was introduced in 5.0 to alleviate the problem of the query being resolved in a single SQL statement (as in the older EF 6). |
"I'm now getting multiple parent records loaded because I want to include some child elements." I use .Include/.ThenInclude which was working very well in 2.1. The problem is the join syntax in 3.1 now loads many parents as it joins to the child tables rather than independently applying any where expression. It's different, incorrect, logic. Previously the join did not exist and child elements were queried independently as they should be for a many-one relationship. Otherwise you get duplicated parent data (performance) and too many records loaded, (bad logic). You shouldn't be using a join to simulate many queries with similar where clauses - it breaks the relational model. My options seem to be to implement my own independent queries on 3.1 or upgrade to 5.0 and use "split-query" (which should be the default anyway). |
The problem you describe is exactly why split query was introduced, so you would be best served by upgrading to 5.0 instead of 3.1 (unless you have a blocker). Enabling split queries globally allows you to make that the default. |
Thanks for the advice. I have made progress with net core 3.1 and EF 5.0 with single queries. It's looking much better now so very much appreciated. One remaining problem, views. We use views in our solution in the same way as table with .Include/.ThenInclude to control loading of child objects to the view. That doesn't appear to work for views now (we used to have views as tables in the content with .ToTable). I have tried .ToTable & ToView in the Context but no luck. Can you confirm if this is another know issue or intended behaviour for some reason? Is there a work around? |
Glad that you've been able to make progress. Take a look at Keyless Entity Types for views. If you need further help, I'd recommend asking on Stack Overflow as you'll get much better help there than on a unrelated closed issue. |
Note:
Preview6 introduces
AsSplitQuery
API to load collections using multiple queries. Details of the API has been posted #20892 (comment)A way to rewrite the query to avoid large result set & mimick previous version behavior is posted https://gist.github.com/smitpatel/d4cb3619e5b33e8d9ea24d3f2a88333a
Note: Per @roji's request, I am opening this in response to the comments on #17455
I have a very large query that contains about 57 includes that is being used to copy down a large entity so that it can be modified and cloned.
With EF 2.2.6, this large query ran successfully in about 1-3 seconds (variable). With the changes in 3.0 (all includes create one entire SQL statement with joins), the query takes significantly longer and always times out with the default execution timeout settings.
Steps to reproduce
Use a (IMO nasty) Linq query similar to the one as follows:
Results:
With EF Core 2.2.6 - I can see in the output via the SQL Server Profiler that EF is breaking up the LINQ statement into smaller queries. The overall process takes about 1-3 seconds.
With EF Core 3.0 - I can see in the output via the SQL Server Profiler that EF is emitting one massive query with lots and lots of joins. The overall process always times out with the default execution timeout setting.
At this point, I am open to the notion that this query needs to be either re-written or the process needs to be changed for handling the cloning. I would still like to hear if there are any workarounds, findings that this is a bug or other suggestions to avoid having to devote a significant amount of time on rewriting.
Edit
For now we worked around this by splitting the query up manually using EF Plus' "Includes Optimized" method and then looping through the change tracker to set all of the entities as untracked so we can then reset their keys so that the graph can be comitted as a clone (this gave me a flashback to my EF 6 days).
Note: The model changed somewhat between the time this issue was first encountered and now due to user requests and other factors. I should also note that the system is now in production and users are pretty happy with the performance.
My team was struggling with this at first due to the fact that EF was at first wiping out the entities when we detatched them due to an issue on #18982. Using the work around that was posted there allowed for things to work. The overall performance is actually better due to the fact there isn't any client side evaluation. However, I would still prefer if the behavior of Includes Optimized (which is pretty much what EF 2.x did, splitting the query) was something that came out of the box with EF Core. I also do not like it how this entire scenario can no longer be done with a no tracking query (or so it seems), as it was possible to do it before with EF Core 2.x.
Further technical details
EF Core version: 3.0.0
Database provider: Microsoft.EntityFramework.SqlServer
Target framework: .Net Core 3.0
Operating system: Windows 10 x64 / Windows Server 2016
IDE: Visual Studio 2019 Pro (16.3)
The text was updated successfully, but these errors were encountered: