
Fix the initial segment size computation #35802

Closed
wants to merge 1 commit

Conversation

cshung
Member

@cshung cshung commented May 4, 2020

I think I have found a bug while trying to enable large pages on the low_memory_container perf test case.

The test case limits physical memory to 600 MB. The logic in GCHeap::Initialize() sets heap_hard_limit to 75% of the total physical memory (or 20 MB), which comes to 450 MB in our case.

Later on, get_segment_size_hard_limit() computes the segment size from the hard limit value. Ignoring alignment, it roughly divides that memory by the number of heaps. In my case, the test case is configured with 6 heaps, so the segsize is around 75 MB.

Later on, the segsize is also used as the large object heap segment size and the pinned object heap segment size, so overall we are requesting 6 heaps x (segsize + largesegsize + pinsegsize) = 6 x 3 x 75 MB = 1350 MB, which fails because we only have 600 MB.

It looks like get_segment_size_hard_limit() should further divide by 3 to account for the segment size being used three times per heap. Am I correct?

@ghost

ghost commented May 4, 2020

Tagging subscribers to this area: @Maoni0
Notify danmosemsft if you want to be subscribed.

@cshung cshung requested a review from Maoni0 May 4, 2020 17:18
@Maoni0
Member

Maoni0 commented May 5, 2020

there are 2 things -

  • it is by design that when you specify hardlimit X with large pages enabled, we will commit 2X, because we don't know whether the X will be in SOH or LOH.
  • we know that hardlimit doesn't work with POH - I think @VSadov has a work item to make this work. Also, I think counting POH as 1X is overkill.

@mjsabby
Contributor

mjsabby commented May 5, 2020

Will POH be fixed to work with hard limit in 5.0? We now run with a hard limit all the time for most of our programs and would like to use POH in 5.0.

@Maoni0
Member

Maoni0 commented May 5, 2020

If you need it we can certainly make this a higher priority. Can you describe what kind of POH usage you'd likely have?

@mjsabby
Contributor

mjsabby commented May 5, 2020

There are already other things we're working on together that are higher priority, so I don't want to randomize that if this is a lot of work.

Basically, we'd move our gen2 buffers for async I/O into POH.

@cshung
Member Author

cshung commented May 5, 2020

there are 2 things -

  • it is by design that when you specify hardlimit X with large pages enabled, we will commit 2X, because we don't know whether the X will be in SOH or LOH.
  • we know that hardlimit doesn't work with POH - I think @VSadov has a work item to make this work. Also, I think counting POH as 1X is overkill.

I might be confused about what hard limit means. I thought it meant the GC would not use more than the hard limit (as a naive user would expect). But we are committing twice as much by design?

Would it make sense, for the container scenario, to not default the hard limit to 75%? As is, by default in a container, we would be committing 150% of the available memory and fail right away.

@VSadov
Member

VSadov commented May 5, 2020

As is, by default in a container, we would be committing 150% of the available memory and fail right away.

It sounds like it could not possibly work, but I think this configuration (large pages + hard limit) was working before, although I am not 100% sure.
I mean, is it the extra generous commit for POH that causes the failures, or was this a problem before that?

Is this Windows only?

@cshung
Member Author

cshung commented May 5, 2020

I mean, is it the extra generous commit for POH that causes the failures, or was this a problem before that?

The problem probably predates POH. It probably 'worked' because the hard limit was explicitly set to a value lower than the default, knowing the GC behavior.

Before POH, the commit is 2 times the hard limit = 2 times of 75% of available memory = 150% of available memory.

After POH, the commit is 3 times the hard limit = 3 times 75% of available memory = 225% of available memory.

In theory, neither would work by default. I haven't had a chance to try it on a build without POH yet.

Is this Windows only?

I don't know yet. I reproduced the problem on Windows using the run_in_job harness. I will try it out on Linux to see what happens there.

@Maoni0
Member

Maoni0 commented May 5, 2020

large pages + hardlimit is very different from just hardlimit, because with large pages you cannot reserve first and then commit only what you need. With just hardlimit, we will reserve more but will not commit more than what you specify as the limit. But with large pages, since we don't have that option, we commit 2x the limit you specify.

I'm saying I don't think we should commit 3x with the addition of POH. But we need to think about the scenarios and experiment to see what's a good amount to commit, or whether we should expose yet another config for this.

@Maoni0
Member

Maoni0 commented May 5, 2020

In theory, neither would work by default. I haven't had a chance to try it on a build without POH yet.

This does work - Bing has been using it for quite some time now. It just works for their specific scenario because this was built specifically for them. As I mentioned, we have not tested large page support in a general setting.

@mjsabby
Contributor

mjsabby commented May 5, 2020

Oh wait, I think this is a blocker for us. What the original message says is that now with LOH we're doing a 3X commit, which is something we can't afford. So we'll need to fix this so that POH either shares the memory or gets a setting we can set to a different amount.

@Maoni0
Member

Maoni0 commented May 5, 2020

Yeah, that's what I'm saying; we should not do a 3X commit (I presume you mean POH, not LOH).

@cshung
Member Author

cshung commented May 21, 2020

This is superseded by #36731, with more PRs to come, to provide a per-object-heap hard limit.

@cshung cshung closed this May 21, 2020
@cshung cshung deleted the dev/andrewau/init-size-fix branch May 21, 2020 17:31
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
5 participants