
Memory leak #15173

Closed
blarsern opened this issue Mar 27, 2019 · 42 comments · Fixed by #16046
Labels: closed-fixed (The issue has been fixed and is/will be included in the release indicated by the issue milestone), customer-reported, type-bug

@blarsern

We have been experiencing massive memory leaks in all our microservices after upgrading to .NET Core 2.2.3 and EF Core 2.2.3 (from 2.0).

We have narrowed it down to the simplest app that still leaks: a new .NET Core 2.2 Web API with one context, one table, the context registered in Startup, and one simple controller with one simple context query.
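
For reference, the controller in question looks roughly like this (a reconstruction from the query and stack traces quoted later in this thread; the attribute routing and the way fromDate is derived from the id are assumptions):

    // Reconstruction only; route template and the fromDate derivation are assumptions.
    using System;
    using System.Linq;
    using System.Threading.Tasks;
    using Microsoft.AspNetCore.Mvc;
    using Microsoft.EntityFrameworkCore;

    [Route("api/[controller]")]
    [ApiController]
    public class MemTestController : ControllerBase
    {
        private readonly MemtestDbContext Context;

        public MemTestController(MemtestDbContext context) => Context = context;

        [HttpGet("test/{id}")]
        public async Task<ActionResult<int>> Test(string id)
        {
            var fromDate = DateTime.UtcNow.AddDays(-30); // assumption: some date filter derived per request
            var entities = await Context.MemTestItems
                .Where(x => x.Created >= fromDate)
                .ToListAsync();
            return entities.Count;
        }
    }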

Steps to reproduce

Download this: GitHub MemTestRepo
Fix the connection string.
Create a web app in Azure.
Set ASPNETCORE_ENVIRONMENT to Development.
Enable Always On for the web app.
Create the DB.
Deploy to Azure.

Now we run a custom load test with 20 clients and 100 requests per client, repeated 4 times (8k requests in total), against this endpoint: api/memtest/test/1111111

At the red line: 8k requests (24k requests in total).
image

The memory keeps rising until Azure restarts it
(after OutOfMemory exceptions):
image

Another one:
image

If we remove the context from the controller and the AddDbContext call in Startup, and instead just allocate a simple 1 MB array with no DB access,
it looks like this after 32k requests:
image

Another one:
image

What are we missing?
Is this an Azure problem or an EF problem?
I can't believe we are the only ones experiencing this. :)

We have memory dumps etc., but it's not so easy to figure out what's causing this.

@khalidabuhakmeh

My first guess would be object tracking, based on the controller code.

var entities = await Context.MemTestItems.Where(x => x.Created >= fromDate).ToListAsync();

Have you tried using AsNoTracking to see if it makes a difference?
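
For reference, the no-tracking variant of that query would be (a sketch based on the line above):

    // Same query as above, but with change tracking disabled so the returned
    // entities are not retained by the context's change tracker.
    var entities = await Context.MemTestItems
        .AsNoTracking()
        .Where(x => x.Created >= fromDate)
        .ToListAsync();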

@khalidabuhakmeh

Also looking at the Startup file, I notice that Entity Framework is registered using the ServiceCollection. If the context is not registered as transient, then it may be surviving past the request lifecycle. This means the same context could be holding on to the same objects across requests.

I recommend trying to call Dispose on the context to see if it resolves the issue or breaks something. If it does resolve it, that leads me to think you are using the same context across multiple requests.

@smitpatel
Contributor

If the context is not registered as transient, then it may be surviving past the request lifecycle.

The best default is Scoped. You don't have to register the context as transient.
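
For context, AddDbContext uses a scoped lifetime by default, and an explicit lifetime can be passed if you want to test the transient theory. A minimal sketch, assuming the repro's MemtestDbContext, a Startup Configuration property, and a "Default" connection string name:

    public void ConfigureServices(IServiceCollection services)
    {
        // Default: one MemtestDbContext per request scope, disposed when the scope ends.
        services.AddDbContext<MemtestDbContext>(options =>
            options.UseSqlServer(Configuration.GetConnectionString("Default")));

        // Alternative (use instead of the call above, not in addition to it):
        // force a transient lifetime so every resolution gets its own context.
        // services.AddDbContext<MemtestDbContext>(
        //     options => options.UseSqlServer(Configuration.GetConnectionString("Default")),
        //     contextLifetime: ServiceLifetime.Transient);
    }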

@yyjdelete

yyjdelete commented Mar 28, 2019

Try setting ServerGarbageCollection to false in your .csproj? The server GC may be too lazy to recycle memory in some cases.

  <PropertyGroup>
    <ServerGarbageCollection>false</ServerGarbageCollection>
  </PropertyGroup>

@blarsern
Author

Tried one run with both AsNoTracking and forcing Transient on the AddDbContext.
Same result:
image

I doubt turning off server garbage collection has any impact; if so, the GC is clearly broken ;)
But I can always try it.

@blarsern
Author

Tried with this:

    <ServerGarbageCollection>false</ServerGarbageCollection>
    <ConcurrentGarbageCollection>true</ConcurrentGarbageCollection>

Well, it certainly uses less memory overall, but memory still increases over time without any requests.

image

@larserikfinholt

We (@blarsern is my colleague) have just tried the sample on another Azure subscription (but with the same app plan, S1), and we get the following result:
image
We've applied the same load test.
As you can see, we're using less memory, and the private bytes are not increasing.
We've also tried comparing the resource groups and can't spot any difference.

Please note that the error is not present when we remove the call to EF. This indicates that the error is somehow related to the combination of EF and Azure subscriptions.

@ajcvickers
Contributor

@blarsern Does this only happen with ASPNETCORE_ENVIRONMENT set to Development?

@larserikfinholt

@ajcvickers No, this problem was first found in production, so I would not think it is related to ASPNETCORE_ENVIRONMENT = Development

@larserikfinholt

When we create a new WebSite in one of our existing three AppPlans:
image

When we create a new WebSite in a new AppPlan, everything looks OK:
image

Any ideas why EF is causing the website to leak memory when used in our existing AppPlans?

@ajcvickers
Contributor

@blarsern @larserikfinholt Another couple of suggestions for things to check:

  • Do you have AppInsights or other diagnostics enabled both when it repros and when it doesn't?
  • Is the exact same patch version of EF Core being used in both places?

@blarsern
Author

AppInsights is disabled in both places.

It's the same project being deployed, so it's the same patch version, 2.2.3.

@ajcvickers

@divega
Contributor

divega commented Mar 30, 2019

@blarsern Could you create a version of the repro that only uses ADO.NET APIs (e.g. SqlConnection, SqlCommand, SqlDataReader) to execute the same SQL and run it against the app service plans that show the symptoms?

This could help narrow things down.
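
For anyone trying this, an ADO.NET-only version of the query might look roughly like the following (a sketch; the table and column names are assumptions based on the EF query above):

    using System;
    using System.Data.SqlClient;
    using System.Threading.Tasks;

    public static class AdoNetMemTest
    {
        // Hypothetical ADO.NET-only variant of the test query, bypassing EF entirely.
        public static async Task<int> CountItemsAsync(string connectionString, DateTime fromDate)
        {
            var count = 0;
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(
                "SELECT Created FROM MemTestItems WHERE Created >= @fromDate", connection))
            {
                command.Parameters.AddWithValue("@fromDate", fromDate);
                await connection.OpenAsync();
                using (var reader = await command.ExecuteReaderAsync())
                {
                    while (await reader.ReadAsync())
                    {
                        count++; // materialize rows only; no change tracking, no EF internal service provider
                    }
                }
            }
            return count;
        }
    }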

@davidfowl
Member

Can you share the dump?

@blarsern
Author

Memory dumps:
https://drive.google.com/open?id=1RqX-CmWUgh2XSpD9uv93hZfjPrDHWPpF

The first one is some time after 8k requests.
The second is 25 minutes later, with no requests in that period.
The third is after an additional 8k requests.

@davidfowl

@blarsern
Author

OK, tried using a SqlDataReader:
It stabilized at 46 MB after 32k requests.
image

Then switched back to the DbContext, 24k requests:
And it starts to increase on its own:
image

@divega

@divega divega added this to the 3.0.0 milestone Apr 1, 2019
@divega divega added the type-bug label Apr 2, 2019
@jaseaman

jaseaman commented Apr 7, 2019

@blarsern You are not the only one experiencing this; I was on my way here today to open an issue about it.

We see similar results on AWS ECS, with an increase in memory usage on every query request. It reaches its peak allocated memory and the task is then force-terminated. We run Docker containers with microsoft/dotnet:2.1-aspnetcore-runtime with a 200 MB soft limit and a 400 MB hard limit.
Using 3.0.0-preview3.19153.1.

We have tried the following :

  • ServerGarbageCollection to false
  • AsNoTracking on all queries

Are there any temporary workarounds for this before 3.0? While it is not completely critical (the services restart themselves), it is causing massive headaches and concern under load.

Left long enough, the GC appears to release the memory, but only after a substantial amount of time, and even running synchronous requests will cause the memory to 'overflow'.

The only thing I can think of is an MVC filter keeping the context alive, but the context is registered with the default scoped lifetime, so it should be disposed of after every request?

@ErikEJ
Contributor

ErikEJ commented Apr 7, 2019

@roji did you identify the root cause?

@roji
Member

roji commented Apr 7, 2019

@ErikEJ I haven't had any time to look at this yet (busy times), but plan to in the coming days...

@blarsern
Author

Just a follow-up: we have a support case going on with case ID 119031526000920.

During analysis of the dumps they found a huge number of diagnostic listeners which were not released.

If I send 500 requests to the test project locally and take a dump, I get this in VS:
image

If I remove the DbContext from the controller I get this (also after 500 requests):
image

If I send 40,000 requests, I get 40,000 diagnostic listeners.

I have tried turning logging off with this code:
    .ConfigureLogging((hostingContext, logging) =>
    {
        logging.ClearProviders();
        logging.SetMinimumLevel(LogLevel.None);
        logging.AddFilter("Default", LogLevel.None);
        logging.AddFilter("System", LogLevel.None);
        logging.AddFilter("Microsoft", LogLevel.None);
        logging.AddFilter(DbLoggerCategory.Database.Name, LogLevel.None);
        logging.AddFilter(DbLoggerCategory.Database.Connection.Name, LogLevel.None);
        logging.AddFilter(DbLoggerCategory.Database.Transaction.Name, LogLevel.None);
        logging.AddFilter(DbLoggerCategory.Database.Command.Name, LogLevel.None);
        //logging.AddConfiguration(hostingContext.Configuration.GetSection("Logging"));
        //logging.AddConsole(options => options.IncludeScopes = false);
        //logging.AddDebug();
    })

It seems like there is no way to turn off the diagnostic listener?
I also tried to turn it off in web.config, but this is ignored by .NET Core.
How can I turn it off completely?

This leak is one per request, and it's reproducible locally.
Our problem in Azure with the self-increasing leak is something else, I think, but we need to get rid of this
diagnostic listener problem before we can move on, I guess.

Can you look into why this is leaking?
@ajcvickers @divega @ErikEJ

@davidfowl
Member

Do you get this leak if you turn off IntelliTrace in Visual Studio?

@roji
Member

roji commented May 22, 2019

@blarsern thanks for the added info. Can you confirm whether the leak also occurs only in one specific Azure app plan, as mentioned in #15173 (comment)?

@blarsern
Author

@davidfowl
If I turn off IntelliTrace, 5 of the objects disappear, but the diagnostic listener remains:
image

And thank you for suggesting this, because I recently installed a new VM with Windows Server 2019 / SQL 2019 / VS 2019 and got severe performance issues with my load tester app.
On the old VM I averaged 80 ms over 2000 requests (VS Professional 2017).
On the new VM I average 1000 ms over 2000 requests (VS Enterprise 2019).
It turns out to be IntelliTrace that slows everything down.
So I guess IntelliTrace will be off from now on, thanks ;)

@roji
I'm pretty sure the working app service plan is also leaking diagnostic listeners.
I'm no expert in memory dumps, so if you want to look into it you can look at these dumps:
https://drive.google.com/file/d/1D0pwX-Yho5iECxuphvnVazmM4ctKql2P/view?usp=sharing
There are about 10k requests in the second dump.

Note: "working app service plan" means there is no self-increasing memory leak.
But it probably still has diagnostic listener leaks.

@ajcvickers
Contributor

@blarsern This could be related to how logging is configured with EF. Do you have any calls to UseLoggerFactory in your code? Do you create any DiagnosticListener instances in your code?

@blarsern
Author

@ajcvickers
If you look at the test project, it is just standard logging. No diagnostic listener.

And my post from yesterday shows the code that was supposed to turn off logging completely.

@ajcvickers
Contributor

@blarsern Is the link above (GitHub MemTestRepo) still the code to repro the listener leak?

@blarsern
Author

@roji
Claiming that this doesn't leak is beyond my understanding.

Explain this test then, 1 million requests:
image

Result, memory usage:
48 MB Initial
72 MB After 500k requests
113 MB After 500k additional requests

So after 1 million requests, I have 1 million diagnostic listeners in memory, and the memory has increased by 65 MB.

The leak isn't massive, but it leaks alright.

Anyway, can this problem be fixed? On what time schedule?
My support case is halted because of this problem; it's going nowhere.

@yyjdelete

yyjdelete commented Jun 11, 2019

@roji @blarsern
DiagnosticListener is IDisposable and adds itself to the static field DiagnosticListener.s_allListeners (a linked list) in its constructor; it is only removed from that list when IDisposable.Dispose() is called.
https://github.com/dotnet/corefx/blob/43ca5e2aee8c4ada93ab30fc3fd34bb8f4c6224d/src/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/DiagnosticListener.cs#L145-L146

So the TryAddSingleton here may leak an instance of DiagnosticListener, either if TryAdd returns false here (i.e. it is called a second time, which does not seem to happen in this case),
https://github.com/aspnet/EntityFrameworkCore/blob/2daec6ae17d415af47a606feb0d9f8cfe21296b8/src/EFCore/Infrastructure/EntityFrameworkServicesBuilder.cs#L303-L304
or if Build() and Dispose() are never called for the ServiceCollection (which is the case here).
https://github.com/aspnet/EntityFrameworkCore/blob/0ba65b8dfc9294b714b6855eef340bdfbfb8d64c/src/EFCore/DbContextOptionsBuilder.cs#L71
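
To make the lifecycle concrete, here is a minimal standalone console sketch (not from the repro) showing that a DiagnosticListener stays reachable from the static AllListeners list until it is disposed:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    class AllListenersObserver : IObserver<DiagnosticListener>
    {
        public List<string> Seen { get; } = new List<string>();
        public void OnNext(DiagnosticListener listener) => Seen.Add(listener.Name);
        public void OnCompleted() { }
        public void OnError(Exception error) { }
    }

    class Program
    {
        static void Main()
        {
            // Never disposed: even without any local reference, the listener stays
            // rooted by the static list, which is what happens per request in this issue.
            new DiagnosticListener("Leaky.Listener");

            // Disposed: removed from the static list again.
            using (new DiagnosticListener("Disposed.Listener")) { }

            // Subscribing to AllListeners reports every listener that is still alive.
            var observer = new AllListenersObserver();
            DiagnosticListener.AllListeners.Subscribe(observer);

            // Prints the leaked listener but not the disposed one.
            Console.WriteLine(string.Join(", ", observer.Seen));
        }
    }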

StackTrace:


The first (only once):

   at Microsoft.EntityFrameworkCore.Infrastructure.EntityFrameworkServicesBuilder.TryAddCoreServices()
   at Microsoft.EntityFrameworkCore.Infrastructure.EntityFrameworkRelationalServicesBuilder.TryAddCoreServices()
   at Microsoft.Extensions.DependencyInjection.SqlServerServiceCollectionExtensions.AddEntityFrameworkSqlServer(IServiceCollection serviceCollection)
   at Microsoft.EntityFrameworkCore.SqlServer.Infrastructure.Internal.SqlServerOptionsExtension.ApplyServices(IServiceCollection services)
   at Microsoft.EntityFrameworkCore.Internal.ServiceProviderCache.ApplyServices(IDbContextOptions options, ServiceCollection services)
   at Microsoft.EntityFrameworkCore.Internal.ServiceProviderCache.<>c__DisplayClass4_0.<GetOrAdd>b__2(Int64 k)
   at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
   at Microsoft.EntityFrameworkCore.Internal.ServiceProviderCache.GetOrAdd(IDbContextOptions options, Boolean providerRequired)
   at Microsoft.EntityFrameworkCore.DbContext..ctor(DbContextOptions options)
   at WebAppMemTest.MemtestDbContext..ctor(DbContextOptions`1 options) in ***\WebAppMemTest\MemtestDbContext.cs:line 17
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.Reflection.ConstructorInfo.Invoke(Object[] parameters)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitConstructor(ConstructorCallSite constructorCallSite, ServiceProviderEngineScope scope)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(IServiceCallSite callSite, TArgument argument)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitScoped(ScopedCallSite scopedCallSite, ServiceProviderEngineScope scope)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(IServiceCallSite callSite, TArgument argument)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.Resolve(IServiceCallSite callSite, ServiceProviderEngineScope scope)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.DynamicServiceProviderEngine.<>c__DisplayClass1_0.<RealizeService>b__0(ServiceProviderEngineScope scope)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngine.GetService(Type serviceType, ServiceProviderEngineScope serviceProviderEngineScope)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope.GetService(Type serviceType)
   at Microsoft.Extensions.DependencyInjection.ActivatorUtilities.GetService(IServiceProvider sp, Type type, Type requiredBy, Boolean isDefaultParameterRequired)
   at lambda_method(Closure , IServiceProvider , Object[] )
   ...

All later (repeated for every request):

   at Microsoft.EntityFrameworkCore.Infrastructure.EntityFrameworkServicesBuilder.TryAddCoreServices()
   at Microsoft.EntityFrameworkCore.Infrastructure.EntityFrameworkRelationalServicesBuilder.TryAddCoreServices()
   at Microsoft.Extensions.DependencyInjection.SqlServerServiceCollectionExtensions.AddEntityFrameworkSqlServer(IServiceCollection serviceCollection)
   at Microsoft.EntityFrameworkCore.SqlServer.Infrastructure.Internal.SqlServerOptionsExtension.ApplyServices(IServiceCollection services)
   at Microsoft.EntityFrameworkCore.DbContextOptionsBuilder.<>c.<get_IsConfigured>b__6_0(IDbContextOptionsExtension e)
   at System.Linq.Enumerable.Any[TSource](IEnumerable`1 source, Func`2 predicate)
   at Microsoft.EntityFrameworkCore.DbContextOptionsBuilder.get_IsConfigured()
   at WebAppMemTest.MemtestDbContext.OnConfiguring(DbContextOptionsBuilder options) in **\WebAppMemTest\MemtestDbContext.cs:line 23
   at Microsoft.EntityFrameworkCore.DbContext.get_InternalServiceProvider()
   at Microsoft.EntityFrameworkCore.DbContext.get_DbContextDependencies()
   at Microsoft.EntityFrameworkCore.DbContext.get_Model()
   at Microsoft.EntityFrameworkCore.Internal.InternalDbSet`1.get_EntityType()
   at Microsoft.EntityFrameworkCore.Internal.InternalDbSet`1.CheckState()
   at Microsoft.EntityFrameworkCore.Internal.InternalDbSet`1.get_EntityQueryable()
   at Microsoft.EntityFrameworkCore.Internal.InternalDbSet`1.System.Linq.IQueryable.get_Provider()
   at System.Linq.Queryable.Where[TSource](IQueryable`1 source, Expression`1 predicate)
   at WebAppMemTest.Controllers.MemTestController.Test(String id) in **\WebAppMemTest\Controllers\\MemTestController.cs:line 26
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.Start[TStateMachine](TStateMachine& stateMachine)
   at WebAppMemTest.Controllers.MemTestController.Test(String id)
   at lambda_method(Closure , Object , Object[] )
   ...

@ajcvickers
Contributor

@yyjdelete That constructor should not be called every time a context instance is created. It should usually be called once per application.

The reason is that EF should only build its internal service provider once, and that singleton service should then be re-used. Repeated creation most often happens when something pathologically causes the internal service provider to be re-built for every context instance. However, the repro code posted does not do this, at least as far as I can tell.

So it's not that line of code that is in and of itself wrong; it's whatever is causing the instance to be created multiple times. That's assuming this is even the line of code that is creating these instances; given that this is not an EF type, it could be something else creating them.
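
As an illustration of the pathological pattern described above (explicitly not what the repro appears to do): anything that makes the options differ per instance, for example a brand-new ILoggerFactory for every context, causes EF to rebuild its internal service provider, and with it a new DiagnosticListener, for each DbContext. A sketch, with placeholder connection strings:

    using Microsoft.EntityFrameworkCore;
    using Microsoft.Extensions.Logging;

    // BAD (illustration only): a new ILoggerFactory per context means a different
    // internal service-provider cache key each time, so EF rebuilds its internal
    // provider for every DbContext instance.
    public class LeakyContext : DbContext
    {
        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
            => optionsBuilder
                .UseLoggerFactory(new LoggerFactory())
                .UseSqlServer("<connection string>");
    }

    // BETTER: share a single factory across all context instances so the internal
    // service provider is built once and cached.
    public class SharedLoggerContext : DbContext
    {
        private static readonly ILoggerFactory SharedLoggerFactory = new LoggerFactory();

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
            => optionsBuilder
                .UseLoggerFactory(SharedLoggerFactory)
                .UseSqlServer("<connection string>");
    }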

@yyjdelete

@ajcvickers
Added the stack trace in
#15173 (comment)

@ajcvickers
Contributor

@yyjdelete I am not able to reproduce this. I see that code run on the first request, and then not again. In fact, with the repro code provided, IsConfigured should always return true (and does so in my tests), so I'm not sure what you have done to make it return false; maybe you're not running the code as posted?

@roji
Member

roji commented Jun 11, 2019

@blarsern

Claiming that this doesn't leak is beyond my understanding.

To be clear, I am not claiming that you don't have a leak. I'm only saying that I've managed to see the spurious instances of DiagnosticListener being allocated, but in my repro they were collected by the GC.

I intend to sit down and dive further into this tomorrow morning, so please hang tight for a bit more time. In the meantime, can you please confirm that you're still seeing this only on one specific Azure subscription, and are unable to reproduce the leak anywhere else? (I asked this above in #15173 (comment) but never got an answer.)

@blarsern
Author

@roji
The diagnostic listener leak has nothing to do with the app service plan leak.
It's reproducible locally with the provided repo.

The problem is that my support case regarding the app service plan is somewhat halted, because we need to get the diagnostic listener leak fixed first.

Hopefully there is some crazy magic going on when the diagnostic listener leaks in Azure, so fixing this will also fix our main problem with the self-increasing memory leak.
But I have my doubts...

@yyjdelete

@ajcvickers
Just the same repro, https://github.com/larserikfinholt/WebAppMemTest, only changing the connection string to use LocalDB, creating the database with dotnet ef database update, and setting a breakpoint at DiagnosticListener.ctor/Dispose.

IsConfigured does return true, but the allocation of the DiagnosticListener happens inside IsConfigured itself.
Microsoft.EntityFrameworkCore.DbContextOptionsBuilder.IsConfigured calls something like Microsoft.EntityFrameworkCore.SqlServer.Infrastructure.Internal.SqlServerOptionsExtension.ApplyServices(IServiceCollection) (or SqliteOptionsExtension, InMemoryOptionsExtension, etc.), which then calls Microsoft.Extensions.DependencyInjection.SqlServerServiceCollectionExtensions.AddEntityFrameworkSqlServer(IServiceCollection), which then calls Microsoft.EntityFrameworkCore.Infrastructure.EntityFrameworkRelationalServicesBuilder.TryAddCoreServices() (or EntityFrameworkServicesBuilder), which creates a new instance of DiagnosticListener without calling Dispose() on it.
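
In other words, the trigger is the common OnConfiguring guard in the repro's context, reconstructed here from the stack trace above (the fallback connection string is an assumption):

    using Microsoft.EntityFrameworkCore;

    public class MemtestDbContext : DbContext
    {
        public MemtestDbContext(DbContextOptions<MemtestDbContext> options)
            : base(options)
        {
        }

        protected override void OnConfiguring(DbContextOptionsBuilder options)
        {
            // Evaluated on every context instantiation, i.e. on every request.
            // Pre-fix, each IsConfigured call ran ApplyServices() on the options
            // extensions and allocated a DiagnosticListener that was never disposed.
            if (!options.IsConfigured)
            {
                options.UseSqlServer("<fallback connection string>");
            }
        }
    }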

roji added a commit to roji/efcore that referenced this issue Jun 12, 2019
When IsConfigured is called, it applies all services. This caused a
DiagnosticListener to get instantiated on each DbContext instantiation,
and since it wasn't disposed it caused a leak.

Fixes dotnet#15173
@roji
Member

roji commented Jun 12, 2019

OK, I've looked into this and I can confirm that essentially what @yyjdelete wrote above is correct - thanks for your analysis. Some additional details:

Commit 188dbf4 changed the DbContextOptionsBuilder.IsConfigured property to call ApplyServices() on all options extensions. This is generally safe, as the various TryAdd() invocations don't actually do anything. But the DiagnosticListener is specifically added with the overload that accepts an instance, rather than a factory lambda, causing spurious allocations. In addition, as the instance is never disposed and undisposed listeners are referenced in DiagnosticListener.s_allListeners, this is a memory leak.
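
The difference in isolation, as a simplified sketch (not the actual EF registration code; the listener name is hypothetical):

    using System.Diagnostics;
    using Microsoft.Extensions.DependencyInjection;
    using Microsoft.Extensions.DependencyInjection.Extensions;

    var services = new ServiceCollection();

    // Instance overload: the DiagnosticListener is allocated up-front on every call,
    // even when TryAddSingleton is a no-op because the service is already registered.
    services.TryAddSingleton(new DiagnosticListener("Hypothetical.Name"));

    // Factory overload: nothing is allocated until the service is actually resolved
    // from a built provider, so repeated registration passes allocate nothing.
    services.TryAddSingleton<DiagnosticListener>(sp => new DiagnosticListener("Hypothetical.Name"));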

I've opened #16046 to fix the allocation issue (@ajcvickers, I have also added the consider-for-servicing label; hopefully that's the right process). The workaround for now is to avoid using IsConfigured; @blarsern, this should allow you to see whether your original Azure issue still happens without waiting for anything.

Note that regardless of the leak, ApplyServices() also has a perf cost - opened #16045 to track.
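
For a context that is always configured through AddDbContext, that workaround amounts to dropping the OnConfiguring/IsConfigured guard entirely; a minimal sketch, assuming the repro's setup (MemTestItem is the entity type assumed from the query):

    using Microsoft.EntityFrameworkCore;

    public class MemtestDbContext : DbContext
    {
        public MemtestDbContext(DbContextOptions<MemtestDbContext> options)
            : base(options)
        {
        }

        public DbSet<MemTestItem> MemTestItems { get; set; }

        // No OnConfiguring override and no IsConfigured check: the options supplied
        // by the DI container are used as-is, so ApplyServices() is not re-run on
        // every context instantiation.
    }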

roji added a commit that referenced this issue Jun 12, 2019
When IsConfigured is called, it applies all services. This caused a
DiagnosticListener to get instantiated on each DbContext instantiation,
and since it wasn't disposed it caused a leak.

Fixes #15173
roji added a commit that referenced this issue Jun 12, 2019
When IsConfigured is called, it applies all services. This caused a
DiagnosticListener to get instantiated on each DbContext instantiation,
and since it wasn't disposed it caused a leak.

Fixes #15173

(cherry picked from commit 904ad9c)
@roji
Member

roji commented Jun 12, 2019

PR for 2.2: #16047

ajcvickers pushed a commit that referenced this issue Jun 13, 2019
When IsConfigured is called, it applies all services. This caused a
DiagnosticListener to get instantiated on each DbContext instantiation,
and since it wasn't disposed it caused a leak.

Fixes #15173

(cherry picked from commit 904ad9c)
@blarsern
Author

I can confirm that not calling IsConfigured solved the DiagnosticListener leak.

So, a new test in Azure:
After 300k requests there was no self-increasing memory leak, which was not really expected:
image

So I switched back to calling IsConfigured, and I got this result:
image
Suddenly the problem with increasing memory was gone.

After checking our other app service plans / app services, it looks like some changes were made in Azure around June 1st:
image

We have not touched the environment at all, no new deployments etc.
Azure does not yet run 2.2.6, so the DiagnosticListener problem should still be there.

Silently fixed, I guess ;)

@roji
Member

roji commented Jun 14, 2019

@blarsern thanks for confirming! And thanks also @yyjdelete, your analysis and proposed solution were spot on.

@ajcvickers
Contributor

@yyjdelete Agree with @roji. You were pointing at the root cause the whole time; I just didn't see it. Great work and many thanks for finding this!
