
Fix the initial segment size computation #35802

Closed
wants to merge 1 commit

Conversation

cshung
Member

@cshung cshung commented May 4, 2020

I think I have found a bug while trying to enable large pages on the low_memory_container perf test case.

The test case limits physical memory to 600 MB. The logic in GCHeap::Initialize() sets heap_hard_limit to 75% of the total physical memory (or 20 MB), which comes to 450 MB in our case.

Later on, get_segment_size_hard_limit() computes the segment size from the hard limit value. Ignoring alignment, it roughly divides that memory by the number of heaps. In my case, the test case is configured with 6 heaps, so the segsize is around 75 MB.

Later on, the segsize is also used as the large object heap segment size and the pinned object heap segment size, so overall we are requesting 6 heaps x (segsize + largesegsize + pinsegsize) = 6 x 3 x 75 MB = 1350 MB, which fails because we only have 600 MB.

It looks like get_segment_size_hard_limit() should further divide by 3 to account for the segment size being used three times per heap. Am I correct?

@ghost

ghost commented May 4, 2020

Tagging subscribers to this area: @Maoni0
Notify danmosemsft if you want to be subscribed.

@cshung cshung requested a review from Maoni0 May 4, 2020 17:18
@Maoni0
Member

Maoni0 commented May 5, 2020

there are 2 things -

  • it is by design that when you specify hardlimit X with large pages enabled, we will commit 2X, because we don't know whether the X will be in SOH or LOH.
  • we know that hardlimit doesn't work with POH - I think @VSadov has a work item to make this work. Also, I think counting POH as 1X is overkill.

@mjsabby
Contributor

mjsabby commented May 5, 2020

Will POH be fixed to work with hard limit in 5.0? We now run with a hard limit all the time for most of our programs and would like to use POH in 5.0.

@Maoni0
Member

Maoni0 commented May 5, 2020

If you need it we can certainly make this a higher priority. Can you describe what kind of POH usage you'd likely have?

@mjsabby
Contributor

mjsabby commented May 5, 2020

There are already other things we're working on together that are higher priority, so I don't want to randomize that if this is a lot of work.

Basically, we'd move our gen2 buffers for async I/O into POH.

@cshung
Member Author

cshung commented May 5, 2020

there are 2 things -

  • it is by design that when you specify hardlimit X with large pages enabled, we will commit 2X, because we don't know whether the X will be in SOH or LOH.
  • we know that hardlimit doesn't work with POH - I think @VSadov has a work item to make this work. Also, I think counting POH as 1X is overkill.

I might be confused about what hard limit means. I thought it meant the GC would not use more than the hard limit (as a naive user would expect). But we are committing twice as much by design?

Would it make sense, for the container scenario, to not default the hard limit to 75%? As is, by default in a container, we would be committing 150% of the available memory and fail right away.

@VSadov
Member

VSadov commented May 5, 2020

As is, by default in a container, we would be committing 150% of the available memory and fail right away.

It sounds like it could not possibly work, but I think this configuration (large pages + hard limit) was working before, although I am not 100% sure.
I mean, is it the extra generous commit for POH that causes the failures, or was this a problem before that?

Is this Windows only?

@cshung
Member Author

cshung commented May 5, 2020

I mean, is it the extra generous commit for POH that causes the failures, or was this a problem before that?

The problem probably predates POH. It probably 'worked' because the hard limit was explicitly set to a value lower than the default, knowing the GC behavior.

Before POH, the commit is 2 times the hard limit = 2 times of 75% of available memory = 150% of available memory.

After POH, the commit is 3 times the hard limit = 3 times 75% of available memory = 225% of available memory.

In theory, neither would work by default. I haven't had a chance to try it on a build without POH yet.

Is this Windows only?

I don't know yet. I reproduced the problem on Windows using the run_in_job harness. I will try it out on Linux to see what happens there.

@Maoni0
Member

Maoni0 commented May 5, 2020

large pages + hardlimit is very different from just hardlimit, because with large pages you cannot reserve first and then commit only what you need. With just hardlimit, we will reserve more but will not commit more than what you specify as the limit. But with large pages, since we don't have that option, we commit 2x the limit you specify.

I'm saying I don't think we should commit 3x with the addition of POH. But we need to think about the scenarios and experiment to see what's a good amount to commit, or whether we should expose yet another config for this.

@Maoni0
Member

Maoni0 commented May 5, 2020

In theory, neither would work by default. I haven't had a chance to try it on a build without POH yet.

This does work - Bing has been using it for quite some time now. It just works for their specific scenario because this was built specifically for them. As I mentioned, we have not tested large page support in a general setting.

@mjsabby
Contributor

mjsabby commented May 5, 2020

Oh wait, I think this is a blocker for us. What the original message says is that now with LOH we're doing a 3X commit, which is something we can't afford. So we'll need to fix this so that POH either shares the memory or gets a setting we can set to a different amount.

@Maoni0
Member

Maoni0 commented May 5, 2020

Yeah, that's what I'm saying; we should not do a 3X commit (I presume you mean POH, not LOH).

@cshung
Member Author

cshung commented May 21, 2020

This is superseded by #36731, with more PRs to come, to provide a per-object-heap hard limit.

@cshung cshung closed this May 21, 2020
@cshung cshung deleted the dev/andrewau/init-size-fix branch May 21, 2020 17:31
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
5 participants