Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@swgillespie
Copy link

GC.TryStartNoGCRegion today behaves unexpectedly in corner cases:

  1. If asked to start a no GC region with a total budget of 0, GC.TryStartNoGCRegion does not return an error but also doesn't do anything other than set the No GC region status (i.e. it doesn't set soh_allocation_no_gc), so it won't attempt to avoid GCs and it'll exit the no GC region next time a GC occurs through the normal means.

  2. If asked to start a no GC region with an LOH budget greater than the the total budget, a subtraction operation underflows resulting in a SOH allocation budget that is extremely large, way larger than an ephemeral segment, which will cause the GC to report that the budget requested is too large to accommodate. If your LOH budget is large enough, you'll underflow back into the range of an ephemeral segment, so a no GC region will start successfully - but the SOH allocation budget will be completely bogus.

To address these two problems, this PR adds some additional argument validation to avoid getting into these two states. This PR will be accompanied with an additional PR to CoreFX to add additional unit tests for these corner cases.

Addresses https://github.com/dotnet/coreclr/issues/13736

cc @sergiy-k @Maoni0 @adityamandaleeka

@swgillespie
Copy link
Author

also cc @shimingsg

@swgillespie
Copy link
Author

@dotnet-bot test Windows_NT x64 corefx_baseline

src/gc/gc.cpp Outdated
// requested sizes of 0 make no sense.
if (total_size == 0)
{
status = start_no_gc_too_large;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems weird to me. Why are we reusing the 'too large' status for when size is 0? Is it a point-in-time thing that will be addressed soon? This is what happens in StartNoGCRegionWorker when this 'too large' case occurs:

if (status == StartNoGCRegionStatus.AmountTooLarge)
                throw new ArgumentOutOfRangeException(nameof(totalSize),
                    "totalSize is too large. For more information about setting the maximum size, see \"Latency Modes\" in http://go.microsoft.com/fwlink/?LinkId=522706");

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep... Ideally we'd surface this as a new error all the way up to managed code. i'm not sure what the compat implications of this are, which is why I did this. If I have the freedom to do this across our various runtimes, I'd be happy to do so - I'm just not sure if I do.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose as long as StartNoGCRegionStatus is private, there's no problem, and this is technically already a breaking change already...

src/gc/gc.cpp Outdated
}

assert(total_size != 0);
total_size = (size_t)((float)total_size * 1.05);
Copy link
Member

@adityamandaleeka adityamandaleeka Sep 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't strictly part of your change, but I looked at the surrounding code and I'm uncomfortable with the casting going on. total_size comes in as a uint64_t, but then we do these float/size_t casts. The loss of precision and the use of size_t can lead to unexpected behavior, especially on 32-bit platforms.

Copy link
Author

@swgillespie swgillespie Sep 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very true. It's unfortunate that we are stuck with managed long parameters on GC.TryStartNoGCRegion. We should probably bail if:

  • total_size > UINTPTR_MAX
  • loh_size > UINTPTR_MAX
  • total_size is less than the original total_size, after the float mul and cast
  • loh_size is less than the original total_size, after the float mul and cast

The float cast itself should be fine, I would think - we don't particularly care about precision here, we just want to ballpark 5% more than what was given to us.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh... I just looked at the spec and downcasting from a float that won't fit into a size_t is undefined behavior, so I guess it won't be fine.

@shimingsg
Copy link

cc @timwang0

@swgillespie
Copy link
Author

The latest commit moves some parameter validation to managed code in order to throw better exceptions when validation fails, to address some of @adityamandaleeka's feedback.

@swgillespie
Copy link
Author

Regression tests: dotnet/corefx#24201

Copy link
Member

@adityamandaleeka adityamandaleeka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

src/gc/gc.cpp Outdated
size_t allocation_no_gc_soh = 0;
size_t size_per_heap = 0;
int soh_align_const = get_alignment_constant (TRUE);
size_t max_soh_allocated = (soh_segment_size - segment_info_size - eph_gen_starts_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit (and up to you): unnecessary parens

src/gc/gc.cpp Outdated

assert(total_size != 0);
// requested sizes that would cause 1.05 * total_size to not fit in a size_t
// make no sense.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

src/gc/gc.cpp Outdated
num_heaps = n_heaps;
#endif //MULTIPLE_HEAPS
size_t total_allowed_soh_allocation = max_soh_allocated * num_heaps;
const size_t max_alloc_scale = static_cast<size_t>(SIZE_T_MAX / 1.05f);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd make 1.05 a const too since it's used a couple of times below.

@Maoni0
Copy link
Member

Maoni0 commented Sep 25, 2017

We know exactly the max size for SOH alloc and for LOH alloc we know we can't go over total_physical_mem. so instead of SIZE_T_MAX you could use these values which are more meaningful (and we know they are smaller than SIZE_T_MAX. So we don't need to return start_no_gc_large from multiple places which makes the code easier to follow. what do you think of this:

    assert (total_size > 0);

    if (loh_size_known)
    {
        assert (loh_size > 0);
        assert (total_size >= loh_size);
        loh_size = loh_size;
        allocation_no_gc_loh = (size_t)loh_size;
        allocation_no_gc_soh = (size_t)(total_size - loh_size);
    }
    else
    {
        allocation_no_gc_soh = (size_t)total_size;
        allocation_no_gc_loh = (size_t)total_size;
    }

    int soh_align_const = get_alignment_constant (TRUE);
    size_t max_soh_allocated = (soh_segment_size - segment_info_size - eph_gen_starts_size);

    const float scale_factor = 1.05f;

    int num_heaps = 1;
#ifdef MULTIPLE_HEAPS
    num_heaps = n_heaps;
#endif //MULTIPLE_HEAPS
    size_t total_allowed_soh_allocation = max_soh_allocated * num_heaps;
    size_t total_allowed_loh_allocation = total_physical_mem;
    size_t total_allowed_soh_alloc_scaled = (allocation_no_gc_soh > 0) ? ((float)(max_soh_allocated * num_heaps) / scale_factor) : 0;
    size_t total_allowed_loh_alloc_scaled = (allocation_no_gc_loh > 0) ? ((float)total_physical_mem / scale_factor) : 0;

    if ((allocation_no_gc_soh > total_allowed_soh_alloc_scaled) ||
        (allocation_no_gc_loh > total_allowed_loh_alloc_scaled))
    {
        status = start_no_gc_too_large;
        goto done;
    }

    if (allocation_no_gc_soh > 0)
    {
        allocation_no_gc_soh = (size_t)((float)allocation_no_gc_soh * scale_factor);
        // Doing a min to compensate for possible precision problems
        allocation_no_gc_soh = min (allocation_no_gc_soh, total_allowed_soh_alloc_scaled);
    }

    if (allocation_no_gc_soh > 0)
    {
        allocation_no_gc_loh = (size_t)((float)allocation_no_gc_loh * scale_factor);
        allocation_no_gc_loh = min (allocation_no_gc_loh, total_allowed_loh_alloc_scaled);
    }

@swgillespie
Copy link
Author

Looks good to me, but isn't it still the case that total_physical_mem can be larger than SIZE_T_MAX, if running as a 32-bit process on a machine with more than 4gb ram?

@Maoni0
Copy link
Member

Maoni0 commented Sep 25, 2017

oh I was thinking that total_physical_mem was the min of the actual physical mem and virtual mem on 32-bit (the change you made for VS folks) but I realized that's not in core. could port that to core and then this case would be covered?

@swgillespie
Copy link
Author

I can do that! I haven't yet checked-in that change but I can certainly port it here. It is possible that the divergence between CoreCLR/CoreRT due to standalone GC may make this difficult, though - the implementations of GCToOSInterface are different enough now that the change may need to be ported in a separate way for each runtime. I'll take a look at this today.

@Maoni0
Copy link
Member

Maoni0 commented Sep 25, 2017

Ahh yes, since we are in the transitioning stage, please feel free to pick a temporary solution (eg use SIZE_T_MAX for 32-bit and put a note to remove it when you have ported the localGC/VS change over) that doesn't incur too much inconvenience.

@swgillespie
Copy link
Author

OK, will do!

@swgillespie
Copy link
Author

Latest commit incorporates Maoni's code sample. It also tries to operating using uint64_t for as long as possible before casting to size_t once all parameters have been validated, to avoid accidental overflows of size_t. This passes all of the tests in the linked CoreFX pull on both x86 and x64.


uint64_t total_allowed_soh_allocation = max_soh_allocated * num_heaps;
// In theory, the upper limit here is the physical memory of the machine, not
// SIZE_T_MAX. This is not true today because total_physical_mem can be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please put a "LocalGC TODO" here so you can find it more easily later?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do!

src/gc/gc.cpp Outdated
if (allocation_no_gc_soh != 0)
{
current_no_gc_region_info.soh_allocation_size = allocation_no_gc_soh;
assert(allocation_no_gc_soh <= SIZE_T_MAX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert(allocation_no_gc_soh <= SIZE_T_MAX); [](start = 8, length = 43)

you already asserted total_allowed is <= SIZE_T_MAX so I don't see why they are checked again here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just extra-defensive coding - no reason in particular. I do see that I typo'd one of the asserts, though, on line 15972, so if they're not useful I'll just delete them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I would remove them.

src/gc/gc.cpp Outdated
}


if (allocation_no_gc_soh > total_allowed_soh_allocation)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking this branch is dead now and can be removed? allocation_no_gc_soh <= total_allowed_soh_alloc_scaled is known to be true and later allocation_no_gc_soh is defined (through the min) to be at most total_allowed_soh_alloc_scaled so allocation_no_gc_soh > total_allowed_soh_allocation is transitively always false.

Copy link
Member

@Maoni0 Maoni0 Sep 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right - it's dead and should be removed.

@swgillespie
Copy link
Author

Thanks for the reviews! Just rebased against master - this should be ready to go if there aren't any more code review comments.

Copy link
Member

@adityamandaleeka adityamandaleeka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@Maoni0 Maoni0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@swgillespie swgillespie merged commit 00e7aa0 into dotnet:master Sep 27, 2017
@swgillespie
Copy link
Author

Thanks everyone! 😄

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants