
Dotnet core consuming lot of memory? #79287

Closed · 1 task done
smartaquarius10 opened this issue Dec 6, 2022 · 33 comments

@smartaquarius10

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

Team,

We are running 27 .NET Core based pods in our Azure Kubernetes Service (AKS) environment. Memory usage was fine while we were on .NET Core 3.x, but since the day we migrated to .NET 6.0.9, the memory consumption of the dotnet processes across the nodes has been huge.

As shown in the screenshot below, I ran the top command on one AKS node and sorted the processes in decreasing order of memory. All of the dotnet processes sort to the top. The same situation exists on all 8 nodes in our AKS node pool.

[screenshot: top output on an AKS node sorted by memory, with dotnet processes at the top]

Could you please suggest some tips and strategies to reduce the memory consumption of .NET?

Please help. Thank you.

Expected Behavior

Less memory consumption

Steps To Reproduce

  • Build the application on .NET 6.0.9
  • SSH into the Kubernetes nodes
  • Run the top command
  • Press Shift+M to sort by memory (see the sketch below)
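
A rough sketch of those steps (the SSH user and node address are placeholders; adjust for however you reach your AKS nodes):

    ssh azureuser@<node-ip>   # placeholder user/address
    top                       # press Shift+M inside top to sort by resident memory (RES)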

Exceptions (if any)

NA

.NET Version

6.0.9

Anything else?

No response

@davidfowl davidfowl transferred this issue from dotnet/aspnetcore Dec 6, 2022
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@davidfowl
Member

cc @Maoni0 @mangod9

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Dec 6, 2022
@ghost

ghost commented Dec 6, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Author: smartaquarius10
Assignees: -
Labels: area-GC-coreclr, untriaged
Milestone: -

@davidfowl
Member

cc @richlander

@Maoni0
Member

Maoni0 commented Dec 6, 2022

the first step is always to capture a trace to see why your memory grew. if you could also share a trace for 3.1 for comparison purposes, that would be best. you can use the dotnet trace tool -

dotnet trace collect -p <pid> -o <outputpath with .nettrace extension> --profile gc-collect --duration <in hh:mm:ss format>

this should include the history leading up to the memory growth, so if you can repro this fairly quickly, you can just start the command right before you start your process. this trace contains no PII.

how to capture a trace in containers is described here.

please share the resulting trace.
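
For illustration, one way to run that collection from outside the pod is with kubectl exec (the pod name and container name are placeholders, and this assumes dotnet-trace is already available inside the image):

    kubectl exec -it <pod-name> -c <container-name> -- \
        dotnet-trace collect -p 1 --profile gc-collect -o /tmp/myservice.nettrace --duration 00:10:00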

@smartaquarius10
Author

@Maoni0, thank you so much for helping. Please correct me if I'm wrong: you want .NET traces from one of our custom applications running as containers in the cluster, on both the .NET 6 and .NET 3.1 builds. Is that correct? If yes, what should the duration be?

Because the screenshot I shared shows the dotnet runtime processes running on the Azure Kubernetes nodes after we migrated all our pods to .NET 6.

Kind Regards,
Tanul

@Maoni0
Member

Maoni0 commented Dec 7, 2022

@smartaquarius10 you are welcome. what I meant was: if it's possible for you to run the same workload on both 3.1 and 6.0 and capture a trace for each, that would give us a fair comparison. you mentioned you have 27 .NET based pods; I am not sure whether they all run the same thing, and/or whether all 27 pods started seeing the memory increase. but let's say one of your 27 pods always runs task X and you observed its memory usage increase the most when you upgraded from 3.1 to 6. would it be possible to run that pod (with the same workload) on both 3.1 and 6? if it's too difficult to revert it back to 3.1, that's fine; a trace on 6.0 alone would still be helpful. it would help us understand your 6.0 memory behavior, but we can't answer things like "why did it consume less memory on 3.1" without a 3.1 trace.

when you upgrade from one major version to the next, a ton of things change in the libraries and the runtime (and in the runtime, aside from GC changes, of course a ton of other things have changed too). so we can't tell whether the higher memory usage is because there's simply higher demand for memory from the libraries and the rest of the runtime, or because of changes in the GC.
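
As an aside (not something prescribed in this thread), one quick way to see how much of the process memory is managed heap versus everything else is dotnet-counters; the PID is a placeholder and the tool is assumed to be available in the container:

    dotnet-counters monitor -p 1 --counters System.Runtime
    # compare "GC Heap Size" with "Working Set" while the process runs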

@smartaquarius10
Author

smartaquarius10 commented Dec 8, 2022

@Maoni0 , Good morning. Hope you are doing well.

I have created a new repository and uploaded a zip file with the traces. The duration is 10 seconds.

As I'm unsure about the security implications, I created a private repo and gave you full access to it. It's awaiting your confirmation. I'd be grateful if you could approve the request and download the traces from here.

Thank you so much once again for helping us. Really appreciate it.

Kind Regards,
Tanul

@Maoni0
Member

Maoni0 commented Dec 8, 2022

hi Tanul, thanks for your traces. unfortunately they didn't capture any GC events. the 3.1 trace shows this in the EventStats view in PerfView -

[screenshot: PerfView EventStats view for the 3.1 trace, showing no GC events]

could you please share the command line you used, when you started it, and how long you ran it for?

also CC-ing @mangod9 as I'll be OOF starting tomorrow.

@smartaquarius10
Author

smartaquarius10 commented Dec 9, 2022

@Maoni0, hey, thank you so much for analyzing the traces. This is the procedure I followed:

  • As Alpine is the base image of the pods, I downloaded the dotnet-trace tool from this website

  • Logged in to the pods running the .NET 3 and .NET 6 builds and then executed this command:

    dotnet-trace collect -p 1 -o myservice.nettrace --profile gc-collect --duration 00:00:10

  • The PID of the process is 1

Enjoy your holiday and have a great weekend 😃

@mangod9, hope you're doing well. I have given you access to the repository as well. It's awaiting your confirmation 😃

Kind Regards,
Tanul

@smartaquarius10
Author

@mangod9, hello Manish, good morning. Could you please help us with this issue? Would be grateful.

@mangod9
Member

mangod9 commented Dec 13, 2022

hi @smartaquarius10, could you please add @cshung to the traces as well? thanks

@smartaquarius10
Author

smartaquarius10 commented Dec 13, 2022

@mangod9, Done.
@cshung needs to approve the request. Thank you 😃

@cshung
Member

cshung commented Dec 14, 2022

I have seen the traces and I agree with @Maoni0's assessment. The traces don't have any GC-related events in them and are not useful for our investigation.

The command line looks correct, though, so I wonder why that happened - maybe a GC is not happening within the 10 seconds?

Generally, a gc-collect trace is very lightweight; it emits only a handful of events per GC, so you can leave it on for a much longer period of time than just 10 seconds. Can you try leaving it on for an extended period of time? Ideally, it would be nice to capture traces side by side while the heap size grows, so we can investigate what caused the growth.
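
For example, a longer collection might look like this (the output name is a placeholder, and the one-hour duration is just an illustrative window; pick whatever period covers the growth):

    dotnet-trace collect -p 1 -o myservice-long.nettrace --profile gc-collect --duration 01:00:00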

@smartaquarius10
Author

smartaquarius10 commented Dec 15, 2022

@Maoni0, @cshung, hope you're doing well. Sure, I have uploaded traces for different durations to the repository.

Hope it works. Thank you. Take care 😃

Kind Regards,
Tanul

@cshung
Member

cshung commented Dec 20, 2022

Hmm, I suspect something might be wrong with our trace capture tools (or maybe the trace capturing process); I don't know what it is yet.

In the .NET 6 traces we are capturing a few GC finalizer events, meaning GC is happening, but we aren't capturing any statistics about the GC heap, which doesn't look right.

@smartaquarius10
Author

@cshung, oh ok. Is there any other tool available to capture these traces?

@cshung
Member

cshung commented Dec 22, 2022

I spent some time today trying to figure out what is going on with dotnet-trace, and it appears the tool works fine for me on both Windows and Linux. Here are some of my experiment details.

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System;
using System.Reflection;
using System.Runtime;
using System.Diagnostics;

namespace CoreLab
{
    internal static class Program
    {
        private static void Main()
        {
            // This ensures I have time to attach dotnet-trace to it.
            Console.ReadLine();
            // Here is how I can trigger the FitBucket event
            GC.Collect(
                /* generation = */ 1,
                /* mode       = */ GCCollectionMode.Forced,
                /* blocking   = */true);

            // Here is how I can trigger the GCLOHCompact event
            GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
            GC.Collect();
        }
    }
}

I ran this program under the latest dotnet runtime that I built out of main, and then I attached dotnet-trace to it with these arguments:

dotnet-trace collect --profile=gc-collect -p <process-id>

For dotnet-trace, I used the one in the dotnet/diagnostics repo as of main today.

With that, I am able to generate a trace with the appropriate events:

[screenshot: PerfView showing the expected GC events captured from the test program]

This works on both Windows and Linux (Ubuntu). I have a hard time getting this to work on Alpine, so I haven't tested Alpine specifically, mostly just because I am unfamiliar with that platform myself.

If you can try the experiment I outlined above on Alpine (you can use whatever build of the runtime you want, no need to build it yourself) and see whether dotnet-trace can capture GC events when you know for sure that a GC does happen, that would be great.
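
A rough sketch of that check inside an Alpine-based container, assuming the single-file linux-musl-x64 build of dotnet-trace described in the diagnostics docs (the download URL and paths are assumptions, please verify them against the docs):

    wget -O /tmp/dotnet-trace https://aka.ms/dotnet-trace/linux-musl-x64
    chmod +x /tmp/dotnet-trace
    /tmp/dotnet-trace collect -p <process-id> --profile gc-collect -o /tmp/forced-gc.nettrace
    # then press Enter in the test program's console so the forced GC.Collect calls run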

@smartaquarius10
Author

@Maoni0 @cshung, hey, wishing you a very happy new year. Hope you've enjoyed it.

@cshung, I ran the same command to get the traces. Do you want me to add this code to the main function and then collect the traces?

@cshung
Member

cshung commented Jan 3, 2023

@smartaquarius10, yes, please.

I don't know your app, but with the C# code I showed there must be a GC because I forced one, so we can certainly rule out the possibility that a GC didn't happen while you were capturing the trace.

@smartaquarius10
Author

@cshung, Here are the traces with GC code.

@cshung
Member

cshung commented Jan 8, 2023

The latest traces are good, and the GC events are there, showing the tool is working just fine.
It remains puzzling why the earlier traces showed a finalizer event without any GC events.

@smartaquarius10
Author

Hello @Maoni0 , @cshung,

Hope you are doing well.

Once you get any updates, please let me know. Would be grateful for your help. Thank you :)

Kind Regards,
Tanul

@Maoni0
Member

Maoni0 commented Jan 17, 2023

hi @smartaquarius10, I've added your case to the mem-doc in the first FAQ entry, "I didn't change my code at all, why am I seeing a regression in memory when I upgrade my .NET version?". could you tell me if that's helpful?

@richlander
Member

@smartaquarius10 -- We can also set up a call to work through some of this in real time together. You can contact me at rlander@ms if you want to discuss that.

@smartaquarius10
Author

@Maoni0 , Thank you so much. Will go through that.

@richlander, thanks a lot for helping. I will go through the details Maoni shared and ping you on Teams after that. Thank you so much once again; really appreciate all the help and support :)

Kind Regards,
Tanul

@smartaquarius10
Author

@Maoni0 @richlander @cshung ,
Hey, hope you are doing great. Just a quick question: does this high memory consumption with dotnet have any connection with cgroups v2, because of which Microsoft upgraded Azure Kubernetes to Ubuntu 22 in version 1.25.x? Here are the details:

Azure/AKS#3443 (comment)

@Maoni0
Member

Maoni0 commented Mar 7, 2023

the cgroup v2 support is in 6.0: https://github.com/dotnet/runtime/blob/release/6.0/src/coreclr/pal/src/misc/cgroup.cpp#L39.

do you happen to have a dump? if so, you could check whether the hard limit is set; it's gc_heap::heap_hard_limit.

@smartaquarius10
Author

@Maoni0, thank you for the prompt reply. Sorry, I don't know how to get that 😢 Could you please guide me through the process of collecting the dump or checking these values? Would be grateful.

Thank you.

Kind Regards,
Tanul

@Maoni0
Member

Maoni0 commented Mar 8, 2023

I searched for ".net core dump" on Bing and this is the 2nd link that came up; can you check whether it is helpful? https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dumps If not, we should improve our docs.
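
For illustration, a minimal sketch of collecting and opening a dump of the containerized process based on that doc (PID 1 and the output path are placeholders; the SOS commands shown give general heap statistics rather than gc_heap::heap_hard_limit directly):

    dotnet-dump collect -p 1 -o /tmp/app.dmp
    dotnet-dump analyze /tmp/app.dmp
    # inside the analyze prompt:
    #   eeheap -gc        GC heap sizes per generation
    #   dumpheap -stat    object counts and sizes by type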

@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Jul 24, 2023
@mangod9 mangod9 added this to the Future milestone Jul 24, 2023
@XuwenWang

Any update on this issue? I'm experiencing the same thing: roughly a 40% increase in memory footprint after migrating from .NET 3.1 to 6.0.
I'm fairly sure there are no memory leaks; the usage doesn't go up constantly, it stays at a stable level.

@smartaquarius10
Author

@XuwenWang If it's running in AKS, maybe this issue can help.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 22, 2023