
Large unmanaged memory growth (leak?) when upgrading from .NET 6 to 8 #95922

Closed
SamWilliamsGS opened this issue Dec 12, 2023 · 191 comments
Labels: area-GC-coreclr, tenet-performance (Performance related issue)
Milestone: 9.0.0

Comments

@SamWilliamsGS

Description

We have a few different services hosted on kubernetes running on .NET. When we try to upgrade from .NET 6 to .NET 8, we see a steep but constant increase in memory usage, almost all in unmanaged memory. It seems to level off at around four times the memory usage in .NET 6, ignoring imposed memory limits, then continues to creep up more slowly depending on workload. So far we haven't seen an upper bound on the amount of unmanaged memory being leaked(?) here. Reproducing the problem in a minimal way has not been possible so far but we do have lots of data gathered about it. 🙂

Configuration

.NET 8, from the docker image mcr.microsoft.com/dotnet/aspnet:8.0, running on x86-64 machines on AWS EC2.

Regression?

Yes, see data below. This issue does not occur on .NET 6, only on 8. We think it might be part of the GC changes from .NET 6 to 7. Give us a shout and we can try to narrow this down by running it on .NET 7.

Data

Initially we switched from .NET 6 to .NET 8 and monitored memory usage with Prometheus metrics. This is what the memory usage graphs look like. On .NET 6, memory usage remained consistently around ~160MB, but as soon as we deployed the .NET 8 upgrade the memory increased without limit; both pods reached the 512MB limit we'd imposed and were restarted once, at around 15:30. Once we reverted to .NET 6, things went back to normal.
image

We then tried increasing the available memory from 512MB to 1GB and re-deployed .NET 8. It increased rapidly as before, then levelled off at about 650MB and stayed that way until midnight. Service load increases drastically around that time, and the memory grew again to about 950MB, where it stayed relatively level until the service was unwittingly redeployed by a coworker. At that point we reverted to .NET 6, and memory went back to the lower level. I think it would have passed the 1GB limit after another midnight workload, but we haven't tested that again (yet).
image

After trying and failing to reproduce the issue in local containers, we re-deployed .NET 8 and attached the JetBrains dotMemory profiler to work out what was happening. This is the profile we collected, showing the unmanaged memory increases. Interestingly, the amount of managed memory actually goes down over time, with GCs becoming more frequent, presumably because .NET knows the available memory is running low as the total approaches 1GB. There also seem to be some circumstances where .NET will not allocate from unmanaged memory, since the spikes near the left-hand side mirror each other for managed and unmanaged. We had to stop the profile before reaching the memory limit, since kubernetes would have restarted the pod and the profile would have been lost.
image
And the prometheus memory usage graph, for completeness (one pod is higher than the other because it was running the dotMemory profiler this time, and drops because of detaching the profiler):
image

Analysis

The only issue we could find that looked similar was this one, which also affects aspnet services running in kubernetes moving to .NET 7: #92490. As it's memory related we suspect this might be to do with the GC changes going from .NET 6 to 7. We haven't been able to get a clean repro (or any repro outside our hosted environments) yet, but please let us know if there's anything we can do to help narrow this down. 🙂

@SamWilliamsGS SamWilliamsGS added the tenet-performance Performance related issue label Dec 12, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Dec 12, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Dec 12, 2023
@vcsjones vcsjones added area-GC-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Dec 12, 2023
@ghost

ghost commented Dec 12, 2023

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.


@MichalPetryka
Contributor

We think it might be part of the GC changes from .NET 6 to 7

Does setting export DOTNET_GCName=libclrgc.so (this reverts to the old GC) fix this issue?
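For anyone wanting to try this in a container, a minimal sketch (the app entrypoint is a placeholder, and dotnet-counters is only needed if you want to watch the numbers):

# Sketch: revert to the pre-.NET 7 (segment-based) standalone GC and watch memory.
# "MyApp.dll" is a placeholder for your own service entrypoint.
export DOTNET_GCName=libclrgc.so        # libclrgc.so ships next to libcoreclr.so
dotnet MyApp.dll &

# In another shell, compare GC heap size vs. working set over time:
dotnet-counters monitor --process-id "$(pidof dotnet)" --counters System.Runtime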

@mangod9
Member

mangod9 commented Dec 12, 2023

It could also be related to this issue if it's continuous memory growth: #95362. Are you able to collect some GCCollectOnly traces so we can diagnose further?
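On Linux, one way to collect something equivalent is dotnet-trace's low-overhead GC profile; a sketch, assuming the global tool can be installed in the pod and <pid> is the target process:

# Sketch: low-overhead, GC-only trace of a running process (id and path are placeholders).
dotnet tool install -g dotnet-trace
dotnet-trace collect --process-id <pid> --profile gc-collect -o /tmp/gc-trace.nettrace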

@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Dec 12, 2023
@mangod9 mangod9 added this to the 9.0.0 milestone Dec 12, 2023
@taylorjonl
Contributor

taylorjonl commented Dec 13, 2023

We are experiencing the same issue and are running on 7.0.4. From the graph below you can see that the native memory goes up while the GC heap size stays flat: image
Then the pods recycle, either from a release or from manual intervention. I will attempt rolling back to the old GC, but it is peak season so management may not allow it. We are also in a security-hardened environment, so running any profilers and/or diagnostic tools would likely take an act of congress, so I am eager to see how your testing goes since it appears you are able to do more profiling.

@SamWilliamsGS
Author

Sorry for the slow response here. Good to hear we're not the only ones seeing this @taylorjonl!

@MichalPetryka we tried the old GC setting but unfortunately no dice, the memory graphs look the same as before 😢. We'll try out more of the suggestions here after new year but I'm on holiday until then, so this issue might go quiet for a bit. Thanks everyone for the help so far!
image

@MichalPetryka
Contributor

Maybe it's W^X or #95362, as mentioned before. Can you try export DOTNET_EnableWriteXorExecute=0?

@janvorli
Member

W^X should not cause unbounded native memory growth.
There are other sources of native growth I have seen while debugging similar issues for customers in the past few months:

It is also possible that the native memory leak is caused by a tiny GC memory leak: a case where a tiny managed object holds a large block of native memory alive. You would not see such a leak on the GC memory graph. That is also a possible cause related to OpenSSL, where the runtime uses SafeHandle-derived types to reference possibly large data structures allocated by OpenSSL, such as client certificates. I've seen cases where a certificate chain was up to 1GB in size.

To try to figure out the culprit, it would be helpful to take a dump of the running process at a point when it has already consumed a large amount of memory and then investigate it using a debugger with the SOS plugin or the dotnet-dump analyze command. I can provide more details on that.
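For anyone following along, a rough sketch of collecting and opening such a dump with dotnet-dump (the process id and output path are placeholders):

# Sketch: capture a full-memory dump of the leaking process, then inspect it with SOS.
dotnet tool install -g dotnet-dump
dotnet-dump collect --process-id <pid> --type Full -o /tmp/leak.dmp

dotnet-dump analyze /tmp/leak.dmp
#   > eeheap -gc                        # GC heap layout and sizes
#   > dumpheap -stat                    # managed objects grouped by type
#   > dumpheap -stat -type SafeHandle   # SafeHandle-derived objects that may hold native memory alive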

Also, if you'd be able to create a repro that you'd be able to share with me, I'd be happy to look into it myself.

@SamWilliamsGS
Author

It could also be related to this issue if it's continuous memory growth: #95362. Are you able to collect some GCCollectOnly traces so we can diagnose further?

@mangod9 just to double check, since this is happening on Linux, would perfcollect with the -gccollectonly flag as described here work the same way? 🙂
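(For reference, this is roughly what we'd run, going by the perfcollect docs; the trace name is arbitrary:)

# Sketch, per the perfcollect docs: GC-only collection (run as root inside the pod).
curl -OL https://aka.ms/perfcollect
chmod +x perfcollect
./perfcollect install                     # one-time install of perf/LTTng prerequisites
./perfcollect collect gcTrace -gccollectonly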

@SamWilliamsGS
Author

Maybe it's W^X or #95362, as mentioned before. Can you try export DOTNET_EnableWriteXorExecute=0?

@MichalPetryka we gave disabling W^X a go today and unfortunately it made no difference. Is there a nightly docker image we could use to try out the TLS fix? 🙂 (I tried looking at the docs but got a bit mixed up with how backporting works in this repo, sorry!)

@am11
Member

am11 commented Jan 2, 2024

Is there a nightly docker image we could use to try out the TLS fix?

For .NET 9 daily build testing, the install script can be used as follows:

VERSION=9
DEST="$HOME/.dotnet$VERSION"

# recreate destination directory
rm -rf "$DEST"
mkdir "$DEST"

# download and install
curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --quality daily --channel "$VERSION.0" --install-dir "$DEST"

# add nuget feed
cat > "$HOME/.nuget/NuGet/NuGet.Config" <<EOF
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" protocolVersion="3" />
    <add key="dotnet$VERSION" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet$VERSION/nuget/v3/index.json" />
  </packageSources>
</configuration>
EOF

PATH="$DEST":$PATH
DOTNET_ROOT="$DEST"
export PATH DOTNET_ROOT
# dotnet --info
# dotnet publish ..
# etc.

After changing net8.0 to net9.0 in your *.csproj files, rebuild your docker image. After testing, revert these changes and rebuild the docker image again (to bring back net8.0).

@jtsalva

jtsalva commented Jan 3, 2024

We're also seeing the exact same issue

@janvorli do you still need any memory dumps? Happy to share privately

@janvorli
Member

janvorli commented Jan 3, 2024

@jtsalva that would be great! My email address is my github username at microsoft.com

@danmoseley
Member

@am11 I wonder whether that docker file might be useful to have in the docs in this repo.

@SamWilliamsGS
Author

[snip]

To try to figure out the culprit, it would be helpful to take a dump of the running process at a point when it has already consumed a large amount of memory and then investigate it using a debugger with the SOS plugin or the dotnet-dump analyze command. I can provide more details on that.

Also, if you'd be able to create a repro that you'd be able to share with me, I'd be happy to look into it myself.

Thank you very much for the detailed response @janvorli, it's really appreciated 🙂 We have a profile from JetBrains' dotMemory and a few snapshots taken with dotTrace. Would those be helpful to you in lieu of a dotnet-dump? (We can probably also get the latter if that's better to have; we'll need to get clearance to send anything to you regardless.) I've had difficulty getting a clean repro for this since it only seems to happen when it's hosted on kubernetes, but I'll keep working on it 😄

@janvorli
Member

janvorli commented Jan 3, 2024

@SamWilliamsGS I need a dump that contains the whole process memory. I am not sure whether the dotMemory profile contains that or not. In case you have sensitive data in the dump, I can just explain how to look at the interesting stuff in the dump and you can do it yourself.

@SamWilliamsGS
Author

For .NET 9 daily build testing, the install script can be used as follows:

Thanks @am11. Looking at #95362, it looks like the fix was backported to .NET 8; is that correct? And does that mean we'd expect to see the fix already in the standard .NET 8 docker images at this point? 🙂

@martincostello
Member

The backport PR (#95439) has a milestone of 8.0.2, so I don't think you'll see it in non-daily Docker images for .NET 8 until February's servicing release.

@dlxeon

dlxeon commented Jan 4, 2024

@martincostello I'm curious: is there any place where information about the next planned servicing release date and its fixes is published?

So far I have only found this page, which refers to "Patch Tuesday" every month: https://dotnet.microsoft.com/en-us/platform/support/policy/dotnet-core#servicing
The GitHub milestone doesn't have a planned release date either: https://github.com/dotnet/runtime/milestone/133

We've updated some of our services to .NET 8 and faced similar unmanaged memory leaks.

@martincostello
Member

I'm afraid I don't know the definitive answer (I'm not a member of the .NET team); I just know from prior experience that there's typically a release on the second Tuesday of every month, to coincide with Patch Tuesday. What makes it into those releases is often just detective work from looking at what's going on in GitHub PRs/issues, as releases with security fixes don't get worked on in public.

@nhart12

nhart12 commented Jan 5, 2024

Has anyone gained further insight into this? I'm also experiencing very similar issues and am working on getting some dumps and traces now to analyze further. It seems to be impacting some of our kubernetes (k3s) services deployed to edge locations with tight memory constraints. Previously these services were on .NET 6 and would be fairly stable with limited memory (ranges of 128 MiB - 256 MiB). Since uplifting them to .NET 8 we are experiencing higher base memory usage plus OOMKilling going on quite frequently, as memory seems to consistently grow over time with just k8s probes/health-checks running. Enabling DATAS and GCConserve = 9 does seem to greatly improve things, but I still have tests that fail that used to pass on .NET 6. The tests in question all do some batch operations that require more memory than normal load, and with the higher usage in .NET 8 they just cause the pod to get OOMKilled.
It's hard to narrow down the exact changes, as every .NET uplift also brings countless dependency uplifts. Most of the services in question also went from EF Core 6 to EF Core 8. I have tried some of the nightly docker images for aspnet 8 but am still seeing these tests fail when doing larger batch operations, so there must be some additional regression causing a higher memory footprint.
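For comparison, this is roughly how those two settings can be expressed as environment variables (assuming "GCConserve = 9" refers to the GCConserveMemory knob; the values may need tuning per workload):

# Sketch: enable DATAS and a fairly aggressive ConserveMemory level via env vars.
export DOTNET_GCDynamicAdaptationMode=1   # DATAS
export DOTNET_GCConserveMemory=9          # 0-9; higher trades throughput for a smaller heap
dotnet MyApp.dll                          # placeholder entrypoint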

@mangod9
Member

mangod9 commented Jan 5, 2024

as memory seems to consistently grow over time

Can you quantify the amount of growth? It could be related to #95439, as suggested above.

@nhart12

nhart12 commented Jan 5, 2024

With DATAS enabled it doesn't seem to grow (or possibly it just has more aggressive GC to compensate for the unmanaged leak?), but memory usage is simply higher. I'll try to collect some metrics next week. I'll likely have to revert some pods to .NET 6 to get some baselines and compare, as we weren't watching it as closely until the OOMs.
Without DATAS enabled, one of our pods (which normally averages ~100 MiB) would slowly, over a few hours, trickle up more and more memory until getting killed at 128 MiB.

Wouldn't the nightly aspnet images have the TLS leak fix in them?

@Leonardo-Ferreira

I actually had the same problem, and I found out that it was related to the Azure EventHub SDK... one of the guys was instantiating the EventHubProducerClient, sending 1 event, and DISPOSING it. Nevertheless, a leak is there. When we started reusing the client, the problem resolved.

@denislohachev1991

@Leonardo-Ferreira Thank you for your attention and time, but as far as I know we do not use the Azure EventHub SDK

@rzikm
Member

rzikm commented Jul 4, 2024

@denislohachev1991 it is hard to glean any useful information from the screenshot you shared. Can you share the trace/report file and say which tool it can be opened in? Also, knowing more about the application (e.g. how long the data collection was running, how much traffic it served) would be helpful when examining the trace.

@rzikm
Member

rzikm commented Jul 4, 2024

I recently noticed that static file downloads were very slow using http3 and I decided to disable it, leaving only h1 and h2, and both the download and memory problems were resolved.

@Leandropintogit, regarding HTTP/3, do you know what the target server was? We are running HTTP/3 benchmarks and are aware of some performance gaps compared to HTTP/2, but it should still be very usable. Since we wanted to focus a bit on HTTP/QUIC perf for .NET 9, we might want to investigate.

@Leandropintogit

I recently noticed that static file downloads were very slow using http3 and I decided to disable it, leaving only h1 and h2, and both the download and memory problems were resolved.

@Leandropintogit, regarding HTTP/3, do you know what the target server was? We are running HTTP/3 benchmarks and are aware of some performance gaps compared to HTTP/2, but it should still be very usable. Since we wanted to focus a bit on HTTP/QUIC perf for .NET 9, we might want to investigate.

What do you mean target server?

My setup:
.NET 8.0.6 with Kestrel
Debian Docker image running on Kubernetes at Google Cloud
8 vCPUs
32 GB RAM

RPS +/- 300

@denislohachev1991

@rzikm How can I share the trace file with you? I used the Heaptrack GUI.
Screenshot_9
This is a simple site with a bot configured that simply requests the home page every 5 minutes. It also performs several background tasks, such as fetching emails from the database and sending them. But this site is configured for testing, so there is no data to process for the background tasks. All static files are stored on S3 and requested via the AWS SDK for .NET.

@rzikm
Member

rzikm commented Jul 8, 2024

What do you mean target server?

@Leandropintogit I mean if you know what HTTP/3 implementation the other server is using (specifically, if it is running .NET as well); basically, enough information that I can attempt to replicate your observations and investigate them.

@rzikm
Member

rzikm commented Jul 8, 2024

@denislohachev1991

Looking at the second screenshot, I am not 100% sure we're looking at a memory leak. The heaptrack tool works by tracking malloc/free calls, and then it reports all unfreed memory as leaks. But if you terminate the trace collection while the malloc'd memory is actively being used, then it will still get reported as a leak (i.e. it is a false positive). Based on the description of the workload, the numbers seem appropriate to me and may simply represent the steady state of the application.

To identify something as a leak with greater confidence, you need to either

  • see a disproportionate amount of memory being allocated and not freed (we're talking tens or hundreds of MB)
  • observe the suspected allocations long-term and see their number steadily increasing over time

@denislohachev1991

@rzikm I'm not sure if this is due to a leak or if this is normal behavior. We have several instances of an application that, over time (about a month or more), consume all the server memory. I was recently looking through the code and found several places where resources were not freed. After that, I started monitoring the test application: from ~200 MB at launch, it grows to ~450 MB after one day of running. But even these numbers are very different from running the application on a Windows server, where the application consumes ~200 MB. That's why I assumed the issue was related to a memory leak.

@rzikm
Member

rzikm commented Jul 8, 2024

I started monitoring the test application: from ~200 MB at launch, it grows to ~450 MB after one day of running.

Yep, that is a good indication of a leak. It would be good to run it with heaptrack for long enough for these 100+ MB to show up in the report; that will make it easier to isolate the leak from the rest of the live memory. I suggest using dotnet-symbol on all .so files in the application directory (assuming a self-contained publish of the app) to download symbols (this will show better call stacks in heaptrack).

Another possible issue you may be hitting is #101552 (comment); see the linked comment for diagnosis steps and a possible workaround.
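A sketch of that kind of long run (the binary names are placeholders; heaptrack needs to be installed in the image):

# Sketch: run the app under heaptrack for a long period, then open the report.
heaptrack dotnet MyApp.dll                   # writes heaptrack.dotnet.<pid>.zst on exit
# ...or attach to an already-running process:
heaptrack --pid "$(pidof dotnet)"
# afterwards, inspect the result:
heaptrack_gui heaptrack.dotnet.*.zst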

@janvorli
Member

janvorli commented Jul 8, 2024

@denislohachev1991 could you please get symbols for the .NET shared libraries, like libcoreclr.so etc.? Without the symbols, we cannot see where the allocations were coming from. You can fetch the symbol files using the dotnet-symbol tool: just call it on the related .so file and it will fetch the corresponding .so.dbg file into the same directory where the library is located. heaptrack should then be able to pick them up. You can use wildcards to fetch symbols for all the libxxxx.so files in the dotnet runtime location.
You can install dotnet-symbol using the dotnet tool install -g dotnet-symbol command.
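Spelled out (the publish path is a placeholder):

# Sketch: fetch native debug symbols so heaptrack can resolve runtime frames.
dotnet tool install -g dotnet-symbol
# Downloads the matching .so.dbg files next to each native library:
dotnet-symbol --symbols /path/to/publish/*.so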

@denislohachev1991

@janvorli Hello. I did as advised; here are my settings for self-contained publishing of the application.
Publish
After that, I got the symbols for all *.so files.
Symbols
I transferred all the files to the Linux server and launched the application under heaptrack. Now I'm waiting for the application to run for a longer period. After that, I can provide you with the heaptrack.dotnet.13121.zst file if you need it. I also want to collect a dump using dotnet-dump collect after the application has been running overnight. Thanks for your time.

@denislohachev1991

denislohachev1991 commented Jul 9, 2024

After the application had been running for about 2 hours I got the following heaptrack results.
Summary
CallTree
Sort
I don't know if this will be useful. Our system works like this: we have one server as a load balancer; nginx is installed on it and the certificates are stored there. We also have two servers where the application itself runs under Kestrel. Nginx works as a proxy.

@rzikm
Member

rzikm commented Jul 9, 2024

This shows the same thing as your earlier report; those 37 MB "leaked" can very well be live memory.

To be able to see anything useful, we need a report where we can see the 200 MB increase you mentioned in your previous message:

I started monitoring the test application: from ~200 MB at launch, it grows to ~450 MB after one day of running.

Can you try running the collection for one day or more?

@janvorli
Member

janvorli commented Jul 9, 2024

@denislohachev1991 on glibc-based Linux distros, each thread consumes 8MB of memory for its stack by default. It looks like most of the memory in your log comes from that. You can try lowering that size, e.g. to 1.5MB, by setting DOTNET_DefaultStackSize=0x180000, and see if that reduces the memory size significantly. I would try that with your app running for ~2 hours, as you did for the previous results, so that it is comparable.
Also, as @rzikm said, it would be great to let your app run longer after that, until the consumption reaches the high numbers you were seeing before.

@denislohachev1991

@janvorli As you advised, I set DOTNET_DefaultStackSize=0x180000 and started the application.
Screenshot_15
I'll watch how it consumes memory after these changes. I also launched another instance of the application under heaptrack for long-term monitoring.

@Leandropintogit

What do you mean target server?

@Leandropintogit I mean if you know what HTTP/3 implementation the other server is using (specifically, if it is running .NET as well); basically, enough information that I can attempt to replicate your observations and investigate them.

Hi
There is no other server. Just Kestrel listening on port 443 using the H1/H2/H3 protocols.

Before
listenOptions.Protocols = HttpProtocols.Http1AndHttp2AndHttp3

After
listenOptions.Protocols = HttpProtocols.Http1AndHttp2;

Tested using Chrome and Edge

@denislohachev1991

I set the DOTNET_DefaultStackSize=0x180000 variable and monitored the application after that. The change had virtually no effect on the application's behavior. I also collected a heaptrack trace after 21 hours of running the application. Having examined it, I do not see significant changes from the previous ones.
Heap
Heap1
Heap2
Therefore, it seems to me that this is normal application behavior, since an application on Linux consumes significantly more memory compared to Windows; this is just my assumption and I could be wrong.

@janvorli
Member

@denislohachev1991 could you please share the heaptrack log with me so that I can drill into it in more detail? It is strange that the env variable didn't have any effect.

@darthShadow

Slightly OT: isn't DOTNET_DefaultStackSize supposed to be specified without the leading 0x, since it's implicitly hexadecimal?
At least that's what I understood from previous comments in this and other threads, but I'm not sure whether the leading 0x is ignored anyway.

@janvorli
Member

@darthShadow it doesn't matter, both ways work. We use strtoul to convert the env var contents to a number, and it accepts an optional 0x prefix. See https://en.cppreference.com/w/cpp/string/byte/strtoul.

@janvorli
Member

@denislohachev1991 I've looked at the dump you shared with me. It seems there was no permanent growth in memory consumption over time; there are a few spikes, but the memory consumption stays about the same. Looking at the bottom-up tab in the heaptrack GUI, around 25MB comes from OpenSSL and about 14.5MB from the coreclr ClrMalloc, which is used by the C++ new and C malloc implementations. On Windows, HTTPS communication doesn't use OpenSSL and, IIRC, the memory consumed by it is not attributed to a specific process, so you won't see it in the working set of the process.
Overall, there seems to be nothing wrong.

@denislohachev1991

@janvorli Thank you for your work and time spent.

@yaseen22

yaseen22 commented Aug 9, 2024

Thanks for this great thread.
It gave us a lot of valuable insights.

We had the same issue: memory growth of our Kubernetes pods after migrating from .NET 6 to .NET 8.
What worked for us was switching to the Alpine Linux distribution instead of the Debian one.

We tried adding the DOTNET_DefaultStackSize=0x180000 configuration to our Debian images, but it didn't work.

Could you explain the root cause in some detail: why does lowering the default stack size, or using Alpine Linux (which has a lower default stack size, from my understanding), help fix the issue?

@nhart12

nhart12 commented Aug 9, 2024

Thanks for this great thread. It gave us a lot of valuable insights.

We had the same issue: memory growth of our Kubernetes pods after migrating from .NET 6 to .NET 8. What worked for us was switching to the Alpine Linux distribution instead of the Debian one.

We tried adding the DOTNET_DefaultStackSize=0x180000 configuration to our Debian images, but it didn't work.

Could you explain the root cause in some detail: why does lowering the default stack size, or using Alpine Linux (which has a lower default stack size, from my understanding), help fix the issue?

If you are on the latest patch of .NET 8, which contains this fix: #100502,
you are likely just seeing a difference due to glibc vs. musl. You can play around with tunables for your application's needs on Debian via:
https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html

Lowering MALLOC_ARENA_MAX or MALLOC_TRIM_THRESHOLD_ will likely get you to similar memory utilization as Alpine.
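As a sketch, either the classic environment variables or the newer GLIBC_TUNABLES form can be set in the Debian image (the values below are illustrative starting points, not recommendations):

# Sketch: limit glibc malloc arenas and return freed memory to the OS sooner.
export MALLOC_ARENA_MAX=2
export MALLOC_TRIM_THRESHOLD_=131072      # bytes; note the trailing underscore

# Equivalent via the newer tunables interface:
export GLIBC_TUNABLES="glibc.malloc.arena_max=2:glibc.malloc.trim_threshold=131072"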

@krishnap80

I am facing the same issue in a .NET 6 API as well. Has any solution been identified?

@mangod9
Member

mangod9 commented Sep 1, 2024

Hey @krishnap80, most of the discussion on this issue has been around .NET 8. Since this issue has been closed, I would suggest creating a new issue with details about your specific scenario. Ideally, please try to move to .NET 8 too, since 6 will soon be out of support. Thanks.

@krishnap80

krishnap80 commented Sep 4, 2024 via email

@github-actions github-actions bot locked and limited conversation to collaborators Oct 6, 2024