
Dotnet core consuming lot of memory? #79287

Closed · 1 task done
smartaquarius10 opened this issue Dec 6, 2022 · 33 comments

@smartaquarius10

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

Team,

We are running 27 .NET Core based pods in our Azure Kubernetes Service (AKS) environment. Memory usage was fine while we were on .NET Core 3.x, but since the day we migrated to .NET 6.0.9, the memory consumption of the dotnet processes across the nodes has been huge.

As shown in the screenshot below, I ran the top command on one AKS node and sorted the processes in decreasing order of memory. All of the dotnet processes sort to the top. The same situation exists on all 8 nodes in our AKS node pool.

[screenshot: top output on an AKS node sorted by memory, with dotnet processes at the top]

Could you please suggest some tips and strategies to reduce the memory consumption of .NET?

Please help. Thank you.

Expected Behavior

Less memory consumption

Steps To Reproduce

  • Build the application on .NET 6.0.9
  • SSH into the Kubernetes nodes
  • Run the top command
  • Press Shift+M to sort by memory (see the sketch below)
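
A rough sketch of those steps (the SSH user and node address are placeholders; adjust for however you reach your AKS nodes):

    ssh azureuser@<node-ip>   # placeholder user/address
    top                       # press Shift+M inside top to sort by resident memory (RES)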

Exceptions (if any)

NA

.NET Version

6.0.9

Anything else?

No response

@davidfowl davidfowl transferred this issue from dotnet/aspnetcore Dec 6, 2022
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@davidfowl
Member

cc @Maoni0 @mangod9

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Dec 6, 2022
@ghost

ghost commented Dec 6, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Author: smartaquarius10
Assignees: -
Labels: area-GC-coreclr, untriaged
Milestone: -

@davidfowl
Member

cc @richlander

@Maoni0
Member

Maoni0 commented Dec 6, 2022

the first step is always to capture a trace to see why your memory grew. if you could also share a trace for 3.1 for comparison purposes, that would be best. you can use the dotnet trace tool -

dotnet trace collect -p <pid> -o <outputpath with .nettrace extension> --profile gc-collect --duration <in hh:mm:ss format>

this should include the history leading up to the memory growth, so if you can repro this fairly quickly, you can just start the command right before you start your process. this trace contains no PII.

how to capture a trace in containers is described here.

please share the resulting trace.
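
For illustration, one way to run that collection from outside the pod is with kubectl exec (the pod name and container name are placeholders, and this assumes dotnet-trace is already available inside the image):

    kubectl exec -it <pod-name> -c <container-name> -- \
        dotnet-trace collect -p 1 --profile gc-collect -o /tmp/myservice.nettrace --duration 00:10:00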

@smartaquarius10
Author

@Maoni0, thank you so much for helping. Please correct me if I'm wrong: you want .NET traces from one of our custom applications running as containers in the cluster, on both the .NET 6 and .NET 3.1 builds. Is that correct? If yes, what should the duration be?

Because the screenshot I shared shows the dotnet runtime processes running on the Azure Kubernetes nodes after we migrated all our pods to .NET 6.

Kind Regards,
Tanul

@Maoni0
Member

Maoni0 commented Dec 7, 2022

@smartaquarius10 you are welcome. what I meant was: if it's possible for you to run the same workload on both 3.1 and 6.0 and capture a trace for each, that would give us a fair comparison. you mentioned you have 27 .NET based pods; I am not sure whether they all run the same thing, and/or whether all 27 pods started seeing the memory increase. but let's say one of your 27 pods always runs task X and you observed its memory usage increase the most when you upgraded from 3.1 to 6. would it be possible to run that pod (with the same workload) on both 3.1 and 6? if it's too difficult to revert it back to 3.1, that's fine; a trace on 6.0 alone would still be helpful. it would help us understand your 6.0 memory behavior, but we can't answer things like "why did it consume less memory on 3.1" without a 3.1 trace.

when you upgrade from one major version to the next, a ton of things change in the libraries and the runtime (and in the runtime, aside from GC changes, of course a ton of other things have changed too). so we can't tell whether the higher memory usage is because there's simply higher demand for memory from the libraries and the rest of the runtime, or because of changes in the GC.
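
As an aside (not something prescribed in this thread), one quick way to see how much of the process memory is managed heap versus everything else is dotnet-counters; the PID is a placeholder and the tool is assumed to be available in the container:

    dotnet-counters monitor -p 1 --counters System.Runtime
    # compare "GC Heap Size" with "Working Set" while the process runs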

@smartaquarius10
Author

smartaquarius10 commented Dec 8, 2022

@Maoni0 , Good morning. Hope you are doing well.

I have created a new repository and uploaded a zip file with the traces. The duration is 10 seconds.

As I'm unsure about the security implications, I created a private repo and gave you full access to it. It's awaiting your confirmation. I'd be grateful if you could approve the request and download the traces from here.

Thank you so much once again for helping us. Really appreciate it.

Kind Regards,
Tanul

@Maoni0
Member

Maoni0 commented Dec 8, 2022

hi Tanul, thanks for your traces. unfortunately they didn't capture any GC events. the 3.1 trace shows this in the EventStats view in PerfView -

[screenshot: PerfView EventStats view for the 3.1 trace, showing no GC events]

could you please share the command line you used, when you started it, and how long you ran it for?

also CC-ing @mangod9 as I'll be OOF starting tomorrow.

@smartaquarius10
Author

smartaquarius10 commented Dec 9, 2022

@Maoni0, hey, thank you so much for analyzing the traces. This is the procedure I followed:

  • As Alpine is the base image of the pods, I downloaded the dotnet-trace tool from this website

  • Logged in to the pods running the .NET 3 and .NET 6 builds and then executed this command:

    dotnet-trace collect -p 1 -o myservice.nettrace --profile gc-collect --duration 00:00:10

  • The PID of the process is 1

Enjoy your holiday and have a great weekend 😃

@mangod9, hope you're doing well. I have given you access to the repository as well. It's awaiting your confirmation 😃

Kind Regards,
Tanul

@smartaquarius10
Author

@mangod9, hello Manish, good morning. Could you please help us with this issue? Would be grateful.

@mangod9
Member

mangod9 commented Dec 13, 2022

hi @smartaquarius10, could you please add @cshung to the traces as well? thanks

@smartaquarius10
Author

smartaquarius10 commented Dec 13, 2022

@mangod9, Done.
@cshung needs to approve the request. Thank you 😃

@cshung
Member

cshung commented Dec 14, 2022

I have seen the traces and I agree with @Maoni0's assessment. The traces don't have any GC-related events in them and are not useful for our investigation.

The command line looks correct, though, so I wonder why that happened - maybe a GC is not happening within the 10 seconds?

Generally, a gc-collect trace is very lightweight; it emits only a handful of events per GC, so you can leave it on for a much longer period of time than just 10 seconds. Can you try leaving it on for an extended period of time? Ideally, it would be nice to capture traces side by side while the heap size grows, so we can investigate what caused the growth.
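
For example, a longer collection might look like this (the output name is a placeholder, and the one-hour duration is just an illustrative window; pick whatever period covers the growth):

    dotnet-trace collect -p 1 -o myservice-long.nettrace --profile gc-collect --duration 01:00:00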

@smartaquarius10
Author

smartaquarius10 commented Dec 15, 2022

@Maoni0, @cshung, hope you're doing well. Sure, I have uploaded traces for different durations to the repository.

Hope it works. Thank you. Take care 😃

Kind Regards,
Tanul

@cshung
Member

cshung commented Dec 20, 2022

Hmm, I suspect something might be wrong with our trace capture tools (or maybe the trace capturing process); I don't know what it is yet.

In the .NET 6 traces we are capturing a few GC finalizer events, meaning GC is happening, but we aren't capturing any statistics about the GC heap, which doesn't look right.

@smartaquarius10
Author

@cshung, oh ok. Is there any other tool available to capture these traces?

@cshung
Member

cshung commented Dec 22, 2022

I spent some time today trying to figure out what is going on with dotnet-trace, and it appears the tool works fine for me on both Windows and Linux. Here are some of my experiment details.

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System;
using System.Reflection;
using System.Runtime;
using System.Diagnostics;

namespace CoreLab
{
    internal static class Program
    {
        private static void Main()
        {
            // This ensures I have time to attach dotnet-trace to it.
            Console.ReadLine();
            // Here is how I can trigger the FitBucket event
            GC.Collect(
                /* generation = */ 1,
                /* mode       = */ GCCollectionMode.Forced,
                /* blocking   = */true);

            // Here is how I can trigger the GCLOHCompact event
            GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
            GC.Collect();
        }
    }
}

I ran this program under the latest dotnet runtime that I built out of main, and then I attached dotnet-trace to it with these arguments:

dotnet-trace collect --profile=gc-collect -p <process-id>

For dotnet-trace, I used the one in the dotnet/diagnostics repo as of main today.

With that, I am able to generate a trace with the appropriate events:

[screenshot: PerfView showing the expected GC events captured from the test program]

This works on both Windows and Linux (Ubuntu). I have a hard time getting this to work on Alpine, so I haven't tested Alpine specifically, mostly just because I am unfamiliar with that platform myself.

If you can try the experiment I outlined above on Alpine (you can use whatever build of the runtime you want, no need to build it yourself) and see whether dotnet-trace can capture GC events when you know for sure that a GC does happen, that would be great.
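
A rough sketch of that check inside an Alpine-based container, assuming the single-file linux-musl-x64 build of dotnet-trace described in the diagnostics docs (the download URL and paths are assumptions, please verify them against the docs):

    wget -O /tmp/dotnet-trace https://aka.ms/dotnet-trace/linux-musl-x64
    chmod +x /tmp/dotnet-trace
    /tmp/dotnet-trace collect -p <process-id> --profile gc-collect -o /tmp/forced-gc.nettrace
    # then press Enter in the test program's console so the forced GC.Collect calls run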

@smartaquarius10
Author

@Maoni0 @cshung, hey, wishing you a very happy new year. Hope you've enjoyed it.

@cshung, I ran the same command to get the traces. Do you want me to add this code to the main function and then collect the traces?

@cshung
Member

cshung commented Jan 3, 2023

@smartaquarius10, yes, please.

I don't know your app, but with the C# code I showed there must be a GC because I forced one, so we can certainly rule out the possibility that a GC didn't happen while you were capturing the trace.

@smartaquarius10
Author

@cshung, Here are the traces with GC code.

@cshung
Member

cshung commented Jan 8, 2023

The latest traces are good, and the GC events are there, showing the tool is working just fine.
It remains puzzling why the earlier traces showed a finalizer event without any GC events.

@smartaquarius10
Author

Hello @Maoni0 , @cshung,

Hope you are doing well.

Once you get any updates, please let me know. Would be grateful for your help. Thank you :)

Kind Regards,
Tanul

@Maoni0
Member

Maoni0 commented Jan 17, 2023

hi @smartaquarius10, I've added your case to the mem-doc in the first FAQ entry, "I didn't change my code at all, why am I seeing a regression in memory when I upgrade my .NET version?". could you tell me if that's helpful?

@richlander
Member

@smartaquarius10 -- We can also set up a call to work through some of this in real time together. You can contact me at rlander@ms if you want to discuss that.

@smartaquarius10
Author

@Maoni0 , Thank you so much. Will go through that.

@richlander, thanks a lot for helping. I will go through the details Maoni shared and ping you on Teams after that. Thank you so much once again; really appreciate all the help and support :)

Kind Regards,
Tanul

@smartaquarius10
Author

@Maoni0 @richlander @cshung ,
Hey, hope you are doing great. Just a quick question: does this high memory consumption with dotnet have any connection with cgroups v2, because of which Microsoft upgraded Azure Kubernetes to Ubuntu 22 in version 1.25.x? Here are the details:

Azure/AKS#3443 (comment)

@Maoni0
Member

Maoni0 commented Mar 7, 2023

the cgroup v2 support is in 6.0: https://github.com/dotnet/runtime/blob/release/6.0/src/coreclr/pal/src/misc/cgroup.cpp#L39.

do you happen to have a dump? if so, you could check whether the hard limit is set; it's gc_heap::heap_hard_limit.

@smartaquarius10
Author

@Maoni0, thank you for the prompt reply. Sorry, I don't know how to get that 😢 Could you please guide me through the process of collecting the dump or checking these values? Would be grateful.

Thank you.

Kind Regards,
Tanul

@Maoni0
Member

Maoni0 commented Mar 8, 2023

I searched for ".net core dump" on Bing and this is the 2nd link that came up; can you check whether it is helpful? https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dumps If not, we should improve our docs.
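
For illustration, a minimal sketch of collecting and opening a dump of the containerized process based on that doc (PID 1 and the output path are placeholders; the SOS commands shown give general heap statistics rather than gc_heap::heap_hard_limit directly):

    dotnet-dump collect -p 1 -o /tmp/app.dmp
    dotnet-dump analyze /tmp/app.dmp
    # inside the analyze prompt:
    #   eeheap -gc        GC heap sizes per generation
    #   dumpheap -stat    object counts and sizes by type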

@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Jul 24, 2023
@mangod9 mangod9 added this to the Future milestone Jul 24, 2023
@XuwenWang

Any update on this issue? I'm experiencing the same thing: roughly a 40% increase in memory footprint after migrating from .NET 3.1 to 6.0.
I'm fairly sure there are no memory leaks; the usage doesn't go up constantly, it stays at a stable level.

@smartaquarius10
Author

@XuwenWang If it's running in AKS, maybe this issue can help.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 22, 2023