
cgroup limits are not detected in docker #8777

Closed
tmds opened this issue Aug 19, 2017 · 19 comments

@tmds
Member

tmds commented Aug 19, 2017

Triggered by the discussion at aspnet/KestrelHttpServer#1260, I'm having a look at the cgroup code in coreclr.

The first problem I encountered is that the code doesn't properly parse mountinfo when the mount point contains a dash. PR: dotnet/coreclr#13488

The next problem is that the limits aren't found.
The code looks up a mount point in the /proc/self/mountinfo file and then combines it with a path it finds under /proc/self/cgroup.

I ran a container on rhel7.4+docker and one on minishift (a local instance of OpenShift, which uses Kubernetes as its orchestrator), and the limits are not found at that combined location.

Instead they are present directly under the mount path.

For example:
mountinfo:

396 394 0:21 /kubepods/besteffort/pod9a18ffb8-8513-11e7-b26e-7e29fbe2a5a3/d28e0087cf8f3f0429f755d60b0de415b20fcf76736ded7bab6e30e7b739ee36 /sys/fs/cgroup/cpu ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpu

cgroup:

2:cpu:/kubepods/besteffort/pod9a18ffb8-8513-11e7-b26e-7e29fbe2a5a3/d28e0087cf8f3f0429f755d60b0de415b20fcf76736ded7bab6e30e7b739ee36

$ ls /sys/fs/cgroup/cpu
cgroup.clone_children  cgroup.procs  cpu.cfs_period_us  cpu.cfs_quota_us  cpu.rt_period_us  cpu.rt_runtime_us  cpu.shares  cpu.stat  notify_on_release  tasks
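
To make the lookup concrete, here is a minimal C++ sketch of the resolution described above (illustrative only, not the actual coreclr code; the helper names are made up): take the mount point of the cpu controller from /proc/self/mountinfo, take the relative path from /proc/self/cgroup, and join the two. In the Docker/Kubernetes case shown above the joined path does not exist inside the container, while the limit files do exist directly under the mount point.

```cpp
// Sketch of the lookup described above (not coreclr's implementation).
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <sys/stat.h>

// Find the mount point whose filesystem type is "cgroup" and whose
// super options contain the given controller (e.g. "cpu").
static std::string FindCgroupMountPoint(const std::string& controller)
{
    std::ifstream mountinfo("/proc/self/mountinfo");
    std::string line;
    while (std::getline(mountinfo, line))
    {
        // Fields after the " - " separator: fstype, source, super options.
        size_t sep = line.find(" - ");
        if (sep == std::string::npos)
            continue;
        std::istringstream post(line.substr(sep + 3));
        std::string fstype, source, options;
        post >> fstype >> source >> options;
        if (fstype != "cgroup" || options.find(controller) == std::string::npos)
            continue;
        // Field 5 of the pre-separator part is the mount point.
        std::istringstream pre(line.substr(0, sep));
        std::string field, mountPoint;
        for (int i = 0; i < 5 && pre >> field; i++)
            mountPoint = field;
        return mountPoint;
    }
    return "";
}

// Find the path recorded for the controller in /proc/self/cgroup,
// e.g. "/kubepods/besteffort/pod.../d28e..." in the example above.
static std::string FindCgroupRelativePath(const std::string& controller)
{
    std::ifstream cgroup("/proc/self/cgroup");
    std::string line;
    while (std::getline(cgroup, line))
    {
        // Format: hierarchy-ID:controller-list:relative-path
        size_t first = line.find(':');
        size_t second = line.find(':', first + 1);
        if (first == std::string::npos || second == std::string::npos)
            continue;
        std::string controllers = line.substr(first + 1, second - first - 1);
        if (controllers.find(controller) != std::string::npos)
            return line.substr(second + 1);
    }
    return "";
}

int main()
{
    std::string mountPoint = FindCgroupMountPoint("cpu");
    std::string relative = FindCgroupRelativePath("cpu");
    std::string joined = mountPoint + relative;

    struct stat st;
    if (stat(joined.c_str(), &st) == 0)
        std::cout << "limits under: " << joined << "\n";
    else
        // The Docker/Kubernetes case reported here: the joined path does not
        // exist in the container, so fall back to the mount point itself.
        std::cout << "limits under: " << mountPoint << "\n";
    return 0;
}
```

With the example above, the joined path /sys/fs/cgroup/cpu/kubepods/besteffort/... does not exist inside the container, so only the fallback to the mount point finds the cpu.* files.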

cc @janvorli @benaadams @markvincze @rahku @gboucher90

@tmds
Member Author

tmds commented Aug 19, 2017

Cc @Drawaes too

@Drawaes
Contributor

Drawaes commented Aug 19, 2017

Good find :)

@benaadams
Member

TL;DR: in some situations the GC doesn't pick up the container memory limits, which can lead to OOM exceptions and restarts.

/cc @gkhanna79 @Maoni0

Would this need to be part of a 2.0.x servicing release when fixed? Containers are a mainstream ASP.NET Core deployment scenario.

/cc @DamianEdwards @davidfowl @kendrahavens

@Drawaes
Contributor

Drawaes commented Aug 19, 2017

Yeah, I would backport the patch if it is causing OOMs in containers in some environments. (my useful 2p)

@benaadams
Member

Looks like someone else might be having a container issue: dotnet/core#871

@Maoni0
Member

Maoni0 commented Aug 21, 2017

@janvorli will you be looking at this?

@tmds
Member Author

tmds commented Aug 21, 2017

@janvorli I'll update the PR so it handles the docker limits too.

@tmds
Member Author

tmds commented Aug 21, 2017

PR was updated: dotnet/coreclr#13488

@tmds tmds changed the title from "cgroup implementation" to "cgroup limits are not detected in docker" Aug 21, 2017
@tmds
Member Author

tmds commented Sep 13, 2017

Implemented on master (dotnet/coreclr#13488) and on the 2.0 branch (dotnet/coreclr#13895).

@tmds tmds closed this as completed Sep 13, 2017
@mapitman

Any ideas on when or if we will see this as part of a .NET Core 2.0.x release?

@jkotas
Member

jkotas commented Oct 16, 2017

It should show up in .NET Core 2.0.3 update.

@ligue

ligue commented Oct 29, 2017

I confirm that our microservices deployed in AWS ECS Docker instances (.NET Core 2.0 runtimes) are affected by this.

@emanuelbalea

Does anyone know which package to try this out with for Docker? I tried the 2.0.3 servicing release, but that still has the issue, unless I have to manually specify the ServerGarbageCollection flag (which I did not try).

@tmds
Member Author

tmds commented Nov 10, 2017

@emanuelbalea I will have a look. How are you testing this?

@emanuelbalea

I was using the nightly Docker image and testing on AWS EC2 Container Service with an existing microservice, running siege against it to simulate real load. If there's anything I'm missing, I'll be happy to try it.

@tmds
Member Author

tmds commented Nov 10, 2017

I think memory-wise things are OK. The GC should pick up the Docker limit and containers shouldn't run out of memory. @emanuelbalea, are you running out of memory?
ProcessorCount-wise things are not OK: Environment.ProcessorCount is not returning the limited CPU count.
There seem to be two implementations of ProcessorCount:

CC @janvorli @stephentoub
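
For the ProcessorCount side, the relevant values live in the cpu controller files shown in the `ls /sys/fs/cgroup/cpu` output earlier. A rough sketch of how a CPU limit could be derived from them, assuming a cgroup v1 cpu controller mounted at /sys/fs/cgroup/cpu (ReadLong is a made-up helper, and this is not how coreclr computes it):

```cpp
// Derive an effective CPU count from the CFS quota and period.
// A quota of -1 means "no limit".
#include <fstream>
#include <iostream>

// Hypothetical helper: read a single integer from a cgroup file.
static long ReadLong(const char* path)
{
    std::ifstream f(path);
    long value = -1;
    f >> value;
    return value;
}

int main()
{
    long quota  = ReadLong("/sys/fs/cgroup/cpu/cpu.cfs_quota_us");
    long period = ReadLong("/sys/fs/cgroup/cpu/cpu.cfs_period_us");

    if (quota > 0 && period > 0)
    {
        // Round up: e.g. quota=150000, period=100000 -> 2 CPUs.
        long cpus = (quota + period - 1) / period;
        std::cout << "cpu limit: " << cpus << "\n";
    }
    else
    {
        std::cout << "no cpu quota set\n";
    }
    return 0;
}
```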

@emanuelbalea

@tmds yes, it was just using more and more RAM until the host filled up and ultimately crashed. Sounds like no GC was being done.

Has the usage of COMPlus_gcServer changed? We used to have to set it to 0 for GC to actually run before.

@tmds
Member Author

tmds commented Nov 10, 2017

> @tmds yes, it was just using more and more RAM until the host filled up and ultimately crashed. Sounds like no GC was being done.
> Has the usage of COMPlus_gcServer changed? We used to have to set it to 0 for GC to actually run before.

From what I see, the Docker limits are properly parsed. I guess the GC also doesn't use them in all places. I don't know how to debug this further myself.
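
As a sanity check for the memory side, the limit the runtime should be seeing can be read straight from the cgroup filesystem inside the container. A minimal sketch, assuming a cgroup v1 memory controller mounted at the usual /sys/fs/cgroup/memory path:

```cpp
// Print the cgroup v1 memory limit visible inside the container.
// For an unlimited container this is a very large number rather than an error.
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream f("/sys/fs/cgroup/memory/memory.limit_in_bytes");
    unsigned long long limit = 0;
    if (f >> limit)
        std::cout << "memory.limit_in_bytes: " << limit << "\n";
    else
        std::cout << "could not read the cgroup memory limit\n";
    return 0;
}
```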

@tmds tmds reopened this Nov 10, 2017
@tmds
Member Author

tmds commented Nov 11, 2017

@tmds tmds closed this as completed Nov 11, 2017
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 20, 2020