-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Additional parameter validation for StartNoGCRegion #14088
Conversation
|
also cc @shimingsg |
ae5551b to
fc85a1d
Compare
|
@dotnet-bot test Windows_NT x64 corefx_baseline |
src/gc/gc.cpp
Outdated
| // requested sizes of 0 make no sense. | ||
| if (total_size == 0) | ||
| { | ||
| status = start_no_gc_too_large; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems weird to me. Why are we reusing the 'too large' status for when size is 0? Is it a point-in-time thing that will be addressed soon? This is what happens in StartNoGCRegionWorker when this 'too large' case occurs:
if (status == StartNoGCRegionStatus.AmountTooLarge)
throw new ArgumentOutOfRangeException(nameof(totalSize),
"totalSize is too large. For more information about setting the maximum size, see \"Latency Modes\" in http://go.microsoft.com/fwlink/?LinkId=522706");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep... Ideally we'd surface this as a new error all the way up to managed code. i'm not sure what the compat implications of this are, which is why I did this. If I have the freedom to do this across our various runtimes, I'd be happy to do so - I'm just not sure if I do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose as long as StartNoGCRegionStatus is private, there's no problem, and this is technically already a breaking change already...
src/gc/gc.cpp
Outdated
| } | ||
|
|
||
| assert(total_size != 0); | ||
| total_size = (size_t)((float)total_size * 1.05); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't strictly part of your change, but I looked at the surrounding code and I'm uncomfortable with the casting going on. total_size comes in as a uint64_t, but then we do these float/size_t casts. The loss of precision and the use of size_t can lead to unexpected behavior, especially on 32-bit platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very true. It's unfortunate that we are stuck with managed long parameters on GC.TryStartNoGCRegion. We should probably bail if:
- total_size >
UINTPTR_MAX - loh_size >
UINTPTR_MAX - total_size is less than the original total_size, after the float mul and cast
- loh_size is less than the original total_size, after the float mul and cast
The float cast itself should be fine, I would think - we don't particularly care about precision here, we just want to ballpark 5% more than what was given to us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh... I just looked at the spec and downcasting from a float that won't fit into a size_t is undefined behavior, so I guess it won't be fine.
|
cc @timwang0 |
|
The latest commit moves some parameter validation to managed code in order to throw better exceptions when validation fails, to address some of @adityamandaleeka's feedback. |
|
Regression tests: dotnet/corefx#24201 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
src/gc/gc.cpp
Outdated
| size_t allocation_no_gc_soh = 0; | ||
| size_t size_per_heap = 0; | ||
| int soh_align_const = get_alignment_constant (TRUE); | ||
| size_t max_soh_allocated = (soh_segment_size - segment_info_size - eph_gen_starts_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit (and up to you): unnecessary parens
src/gc/gc.cpp
Outdated
|
|
||
| assert(total_size != 0); | ||
| // requested sizes that would cause 1.05 * total_size to not fit in a size_t | ||
| // make no sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
src/gc/gc.cpp
Outdated
| num_heaps = n_heaps; | ||
| #endif //MULTIPLE_HEAPS | ||
| size_t total_allowed_soh_allocation = max_soh_allocated * num_heaps; | ||
| const size_t max_alloc_scale = static_cast<size_t>(SIZE_T_MAX / 1.05f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd make 1.05 a const too since it's used a couple of times below.
|
We know exactly the max size for SOH alloc and for LOH alloc we know we can't go over total_physical_mem. so instead of SIZE_T_MAX you could use these values which are more meaningful (and we know they are smaller than SIZE_T_MAX. So we don't need to return start_no_gc_large from multiple places which makes the code easier to follow. what do you think of this: assert (total_size > 0);
if (loh_size_known)
{
assert (loh_size > 0);
assert (total_size >= loh_size);
loh_size = loh_size;
allocation_no_gc_loh = (size_t)loh_size;
allocation_no_gc_soh = (size_t)(total_size - loh_size);
}
else
{
allocation_no_gc_soh = (size_t)total_size;
allocation_no_gc_loh = (size_t)total_size;
}
int soh_align_const = get_alignment_constant (TRUE);
size_t max_soh_allocated = (soh_segment_size - segment_info_size - eph_gen_starts_size);
const float scale_factor = 1.05f;
int num_heaps = 1;
#ifdef MULTIPLE_HEAPS
num_heaps = n_heaps;
#endif //MULTIPLE_HEAPS
size_t total_allowed_soh_allocation = max_soh_allocated * num_heaps;
size_t total_allowed_loh_allocation = total_physical_mem;
size_t total_allowed_soh_alloc_scaled = (allocation_no_gc_soh > 0) ? ((float)(max_soh_allocated * num_heaps) / scale_factor) : 0;
size_t total_allowed_loh_alloc_scaled = (allocation_no_gc_loh > 0) ? ((float)total_physical_mem / scale_factor) : 0;
if ((allocation_no_gc_soh > total_allowed_soh_alloc_scaled) ||
(allocation_no_gc_loh > total_allowed_loh_alloc_scaled))
{
status = start_no_gc_too_large;
goto done;
}
if (allocation_no_gc_soh > 0)
{
allocation_no_gc_soh = (size_t)((float)allocation_no_gc_soh * scale_factor);
// Doing a min to compensate for possible precision problems
allocation_no_gc_soh = min (allocation_no_gc_soh, total_allowed_soh_alloc_scaled);
}
if (allocation_no_gc_soh > 0)
{
allocation_no_gc_loh = (size_t)((float)allocation_no_gc_loh * scale_factor);
allocation_no_gc_loh = min (allocation_no_gc_loh, total_allowed_loh_alloc_scaled);
} |
|
Looks good to me, but isn't it still the case that |
|
oh I was thinking that total_physical_mem was the min of the actual physical mem and virtual mem on 32-bit (the change you made for VS folks) but I realized that's not in core. could port that to core and then this case would be covered? |
|
I can do that! I haven't yet checked-in that change but I can certainly port it here. It is possible that the divergence between CoreCLR/CoreRT due to standalone GC may make this difficult, though - the implementations of GCToOSInterface are different enough now that the change may need to be ported in a separate way for each runtime. I'll take a look at this today. |
|
Ahh yes, since we are in the transitioning stage, please feel free to pick a temporary solution (eg use SIZE_T_MAX for 32-bit and put a note to remove it when you have ported the localGC/VS change over) that doesn't incur too much inconvenience. |
|
OK, will do! |
|
Latest commit incorporates Maoni's code sample. It also tries to operating using |
|
|
||
| uint64_t total_allowed_soh_allocation = max_soh_allocated * num_heaps; | ||
| // In theory, the upper limit here is the physical memory of the machine, not | ||
| // SIZE_T_MAX. This is not true today because total_physical_mem can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please put a "LocalGC TODO" here so you can find it more easily later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do!
src/gc/gc.cpp
Outdated
| if (allocation_no_gc_soh != 0) | ||
| { | ||
| current_no_gc_region_info.soh_allocation_size = allocation_no_gc_soh; | ||
| assert(allocation_no_gc_soh <= SIZE_T_MAX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(allocation_no_gc_soh <= SIZE_T_MAX); [](start = 8, length = 43)
you already asserted total_allowed is <= SIZE_T_MAX so I don't see why they are checked again here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just extra-defensive coding - no reason in particular. I do see that I typo'd one of the asserts, though, on line 15972, so if they're not useful I'll just delete them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I would remove them.
src/gc/gc.cpp
Outdated
| } | ||
|
|
||
|
|
||
| if (allocation_no_gc_soh > total_allowed_soh_allocation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking this branch is dead now and can be removed? allocation_no_gc_soh <= total_allowed_soh_alloc_scaled is known to be true and later allocation_no_gc_soh is defined (through the min) to be at most total_allowed_soh_alloc_scaled so allocation_no_gc_soh > total_allowed_soh_allocation is transitively always false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you are right - it's dead and should be removed.
…ons on parameter validation errors
… failure case. Also keep data types as uint64_t as long as possible to avoid size_t overflow
1. Remove useless asserts 2. Remove a dead code path 3. Add a LocalGC TODO to comment
6336f10 to
3ae2d45
Compare
|
Thanks for the reviews! Just rebased against master - this should be ready to go if there aren't any more code review comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Thanks everyone! 😄 |
GC.TryStartNoGCRegiontoday behaves unexpectedly in corner cases:If asked to start a no GC region with a total budget of 0,
GC.TryStartNoGCRegiondoes not return an error but also doesn't do anything other than set the No GC region status (i.e. it doesn't setsoh_allocation_no_gc), so it won't attempt to avoid GCs and it'll exit the no GC region next time a GC occurs through the normal means.If asked to start a no GC region with an LOH budget greater than the the total budget, a subtraction operation underflows resulting in a SOH allocation budget that is extremely large, way larger than an ephemeral segment, which will cause the GC to report that the budget requested is too large to accommodate. If your LOH budget is large enough, you'll underflow back into the range of an ephemeral segment, so a no GC region will start successfully - but the SOH allocation budget will be completely bogus.
To address these two problems, this PR adds some additional argument validation to avoid getting into these two states. This PR will be accompanied with an additional PR to CoreFX to add additional unit tests for these corner cases.
Addresses https://github.com/dotnet/coreclr/issues/13736
cc @sergiy-k @Maoni0 @adityamandaleeka