-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document the increased memory requirements associated with connection resiliency #2902
Comments
@dstj can you confirm that the split query feature re-introduced in the 5.0 release makes the issue disappear? This seems like it could simply be a natural result of the cartesian explosion that single query can trigger... @smitpatel is it possible we materialize duplicated data? If not it could also be happening undernear in SqlClient or something. In any case, a repro would be useful for further investigation. |
@roji I'll try that as well and get back to you. I assume it will. 🤞 It does look like a LOT of duplicated data is created in memory. Memory profiling showed me about 1.2 million copies of a class that should have been there like 20-25 times... I broke my request into two requests and memory usage went down to ~125 MB (similar to EF Core 2) New version: var company = await _dbContext.Companies
.Include(c => c.Sites)
.Include(c => c.Categories)
.SingleOrDefaultAsync(c => c.Id == companyId);
var assets = await _dbContext.Assets
.Include(a => a.Tasks)
.AsTracking()
.Where(c => c.CompanyId == companyId)
.ToListAsync(); Obviously, this 1.3 GB used is dependent on the actual amount the data fetched from the DB, but clearly, there's a problem with the underlying single-query logic. At the very least, an Error or Warning should make its way in the logs. |
It may be because of identity resolution rather than split query Try using |
@smitpatel I thought that as well, but the original query (the problematic one) is |
I misread that as AsNoTracking. My bad. |
In that case, there should not be more than one copy of each entity with given key. Further even if single query brings a lot more data from server, duplicated data is not materialized again. Or even read from the DataReader when not needed. So it increasing client side memory is sus. |
No worries, I should have mentioned that I do I'll try to get a small repro done tonight. Hopefully, it won't be too hard to do and will help pinpointing the root cause. 🤞 |
@roji Result of the first test: EF Core 5 with Without it, same huge memory usage than EF Core 3, but with this warning in the logs:
(Now, onto the small repro attempt...) |
@roji @smitpatel Alright, some more info, not yet a repro that I can share, but a good step in the right direction. It seems related to the Here's my test case: /* Test description:
see https://github.com/dotnet/efcore/issues/23461
*/
public class Tests
{
public const string ConnectionString = "Server=127.0.0.1;Port=5432;Database=pym;User Id=postgres;Password=postgres";
[Test]
public async Task DoTheTest()
{
var companyId = Guid.Parse("94d6c7e8-15f8-411a-a8bd-2e8dfb0da8ab");
var retryOptions = new DbContextOptionsBuilder<MyDbContext>()
.UseNpgsql(ConnectionString, cfg => { cfg.EnableRetryOnFailure(); });
var noRetryOptions = new DbContextOptionsBuilder<MyDbContext>()
.UseNpgsql(ConnectionString);
await DoRequestAsync(noRetryOptions.Options, "NO RETRY 1", companyId);
await DoRequestAsync(retryOptions.Options, "W/ RETRY", companyId);
}
private static async Task DoRequestAsync(DbContextOptions<MyDbContext> dbContextOptions, string method, Guid companyId)
{
using (var context = new MyDbContext(dbContextOptions)) {
Console.WriteLine("{0} - BEFORE REQUEST - Memory Used {1:F2} MB", method, MemorySetHelper.GetPrivateMemoryInMb());
var company = await context.Companies
.Include(c => c.Sites)
.Include(c => c.Categories)
.Include(c => c.Assets).ThenInclude(a => a.Site)
.Include(c => c.Assets).ThenInclude(a => a.Tasks)
.AsTracking()
.SingleOrDefaultAsync(c => c.Id == companyId);
Console.WriteLine("{0} - AFTER REQUEST - Memory Used {1:F2} MB", method, MemorySetHelper.GetPrivateMemoryInMb());
}
}
} And the results:
|
@dstj when a retrying execution strategy is used, EF Core buffers the entire resultset in memory before starting to process any results (this way, if any error happens during the loading of the resultset, the query can be safely retried). If the resultset is huge, this leads to increased memory usage, although that memory is released for garbage collection as soon as the query is finished. This isn't really related to either split/single query, or tracking/no-tracking. |
@roji Hmm, so you're saying this 65MB to 1.3GB (about 2100%) memory usage is, essentially, a side-effect of a feature? I had no idea. I had read this doc page about Connection Resiliency and simply wanted any transient network error to be retried (during connection mostly). There's no mention of the buffering of the resultset on that page, nor any possible immensely negative impact on memory consumption. I know I certainly didn't expect it. While not really related, the "Single query" breaking change in EF Core 3 certainly shoved that one into the light for me though ;) |
Yes, one could think of it that way - and I agree that the buffering aspect of retrying strategies should be documented better. Note when using split queries, the results from the earlier queries must also be buffered (regardless of retrying strategies or not), unless MARS is used with SQL Server (while MARS comes with its own set of problems). At least that part is documented. However, stepping back... While it's true that connection resiliency increases memory usage, it's important to understand that even without connection resiliency, that 1.3GB is being transferred across the network. That is, without resiliency there's memory impact (since EF does streaming instead of buffering), but there's very much a problematic perf phenomenon, which you likely want to avoid. The root cause of having a 1.3GB resultset is the cartesian explosion - since split queries make it disappear - and so that's the main point here, not the resiliency buffering effect. To summarize, avoid transferring huge resultsets as a general rule; although these have an adverse effect on memory only when resiliency is activated, they are problematic perf-wise for other reasons as well. PS If what you're after is only connection resiliency and are using SqlClient, you can enable that at the SqlClient level, without EF core doing anything (see the SqlClient docs). |
Thanks @roji for your thorough explanation. Much appreciated. Good point about the network transfer! Yes, the cartesian product is definitely bad and should be avoided. Automatic query splitting hid it (avoided it) from me in EF Core 2. That makes me think, maybe a feature suggestion here: just as we had a setting to throw on Client-side evaluations in EF Core 2, there could be a setting to throw on cartesian products detection unless a Thanks again. Update
I should have just skipped EF Core 3 😉 |
Yep, we warn on potentially bad cartesian explosion in 5.0, and give you the option to throw as well. I've transferred this to our docs repo to add proper documentation that resiliency causes buffering. |
I've just hunted down a tenfold memory increase after moving to EF Core 3.1.10 (from 2.2.6) and the cause was that the query was no longer automatically split into 5 simpler queries by EF Core, i.e. the single query / no automatic splitting feature of EF Core 3.
EF Core 2 - ~120 MB used

EF Core 3 - ~1.3 GB used

This has a huge impact.
FYI, the EF Core 2 query looked like:
Note: I'll try to create a small project to reproduce the issue a bit later.
--
Originally posted by @dstj in dotnet/efcore#18022 (comment)
The text was updated successfully, but these errors were encountered: