[API Proposal]: NoGC callback #66039

Closed
cshung opened this issue Mar 1, 2022 · 47 comments · Fixed by #82045
Labels
api-approved API was approved in API review, it can be implemented api-suggestion Early API idea and discussion, it is NOT ready for implementation area-GC-coreclr
Milestone

Comments

@cshung
Member

cshung commented Mar 1, 2022

Background and motivation

When people use the TryStartNoGCRegion method, they are generally trying to prevent a GC from happening. However, this holds only as long as allocation stays within the totalSize specified upfront.

Ensuring the application does not allocate more than totalSize is difficult, if not impossible. Right now, when totalSize is exceeded, the GC exits the NoGCRegion and performs a collection. This is okay for some applications, but for others it might not be.

API Proposal

Customers are asking for a callback when that happens, letting them decide whether to:

  1. Simply terminate the process with an OutOfMemoryException,
  2. Allocate more memory and let the process continue, or
  3. Perform a GC and let the process continue.

We had some discussions (see below) to explore the actual use case and implementation constraints, and we discovered a key conflict between them.

From the requirements perspective, we would like to have a callback when the memory is exhausted so that we can gracefully terminate the game without a GC, but

From the implementation perspective, we cannot serve a callback that could potentially allocate more memory without a GC while we have already exhausted our pre-allocated memory.

To resolve this conflict, we observe that firing the callback exactly at the point of exhaustion is not necessary from the requirements side. As long as a callback is issued so that the game can gracefully terminate, that is good enough.

We also observe that nobody needs the process to be terminated right away.

Therefore, I revised the proposal as follows:

API Usage

```c#
void OnBeforeEndNoGCRegion()
{
  /*
   * This callback will be invoked on the finalizer thread.
   * At this point, you have used 256M of memory since you started the
   * NoGCRegion; you have 4M to go before the NoGCRegion really ends.
   *
   * Since this is invoked on the finalizer thread, we expect minimal work here
   * to notify the game to terminate. The game loop thread will probably read
   * a flag and start showing goodbye.
   */
}

GC.TryStartNoGCRegion(260 * 1024 * 1024);
GC.RegisterAllocationCallback(256 * 1024 * 1024, OnBeforeEndNoGCRegion);
/*
 * Ideally, the work here should not use more than 256M. If that's the case, the code will
 * perform a GC when EndNoGCRegion is invoked, and that's the best case.
 *
 * However, if the usage exceeds 256M, the callback will be invoked.
 *
 * If the usage exceeds 260M, a GC will be performed automatically and the NoGCRegion
 * will be terminated automatically.
 */
GC.EndNoGCRegion(); // And the callback will be detached automatically.
```

Alternative Designs

One alternative design is to expose an enum describing what to do when the NoGCRegion is exhausted. The options are to terminate the NoGCRegion, to commit more memory, or to fail fast. We decided this is not good enough because what customers really want is customizable logic to show some screen and end the game gracefully, and none of those options allows them to do that safely.

Another alternative design is to add parameters to TryStartNoGCRegion. That would work, but there are two reasons why we wanted separate APIs for starting the NoGCRegion and registering the callback.

  1. There are already a few overloads of GC.TryStartNoGCRegion, and adding more parameters makes it look really complicated, and
  2. We envision extending this API to support scenarios other than NoGCRegions. As of the time of writing, there is no plan to implement that yet, but having this API shape allows us to do so when we want.

Risks

By asking the caller to provide two limits, the caller must estimate how much memory is necessary for serving the callback, which circles back to the initial problem that it is not easy to estimate memory usage. The key idea here is that the callback is supposed to be something simple; as @WenceyWang suggested below, it will simply switch a flag and use some preallocated resources. That should hopefully ease the problem.

Implementation-wise, we will reserve up to the larger limit when beginning the NoGCRegion. Therefore, there is no risk in issuing the callback without a GC. When we reach the larger limit, we will terminate the NoGCRegion regardless. That way we maintain our implementation reliability.

@cshung cshung added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Mar 1, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added area-GC-coreclr untriaged New issue has not been triaged by the area owner labels Mar 1, 2022
@ghost

ghost commented Mar 1, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

When people use the TryStartNoGCRegion method, they are generally trying to prevent a GC from happening. However, this holds only as long as allocation stays within the totalSize specified upfront.

Ensuring the application does not allocate more than totalSize is difficult, if not impossible. Right now, when totalSize is exceeded, the GC exits the NoGCRegion and performs a collection. This is okay for some applications, but for others it might not be.

API Proposal

Customers are asking for a callback when that happens, letting them decide whether to:

  1. Simply terminate the process with an OutOfMemoryException,
  2. Allocate more memory and let the process continue, or
  3. Perform a GC and let the process continue.

To do that, we can add a mode parameter to the GC.TryStartNoGCRegion API to let the customer specify the behavior they want. This optional mode parameter will be available on all overloads of the method.

The mode parameter should be an enum of type NoGCRegionMode with these values:

```c#
/**
 * This mode specifies what the GC should do when we run out of totalBytes within the NoGCRegion
 */
enum NoGCRegionMode
{
    /**
     * This means when we run out of the totalBytes allocated earlier, we will perform a GC.
     * This is the default behavior.
     */
    GC = 0,
    /**
     * This means when we run out of the totalBytes allocated earlier, we will try to allocate more memory
     * from the operating system and let the application run without further GC.
     *
     * This is not always feasible - with segments, if we run out of segment space, we will fail;
     * with regions, if we run out of physical memory, we will fail as well.
     */
    ContinueAllocating = 1,
    /**
     * This means we will let the application fail fast by throwing an OutOfMemoryException
     */
    Throws = 2,
}
```

API Usage

```c#
GC.TryStartNoGCRegion(12345678, NoGCRegionMode.ContinueAllocating);
```
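For context, here is a hedged sketch of how the proposed mode parameter might be used end to end. NoGCRegionMode and this TryStartNoGCRegion overload are the hypothetical additions from this proposal (they do not exist in the runtime), and RunGameLevel and the budget are illustrative:

```c#
// Sketch only: NoGCRegionMode and this TryStartNoGCRegion overload are the
// hypothetical additions proposed in this issue; they do not exist yet.
const long budget = 256L * 1024 * 1024;

if (GC.TryStartNoGCRegion(budget, NoGCRegionMode.ContinueAllocating))
{
    try
    {
        RunGameLevel(); // allocation-heavy work that should see no GC pause
    }
    finally
    {
        // The region may already have been exited by the GC (or, with
        // ContinueAllocating, by a failed OS allocation), so check first.
        if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
            GC.EndNoGCRegion();
    }
}
```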

Alternative Designs

Since the user asked for a callback, it would be natural to provide a managed callback when that happens. We decided not to do that because it probably won't do what the user wants. Most likely, the user would like to trim some caches and give the application space to continue running without incurring a GC. We considered several implementation possibilities (e.g. serving a callback on allocation, throwing an OOM that the user catches to trim caches, etc.), and none of them would achieve what the user wants. That is why we decided against the callback and offer the mode option instead.

Risks

No response

Author: cshung
Assignees: -
Labels:

api-suggestion, area-GC-coreclr, untriaged

Milestone: -

@cshung
Copy link
Member Author

cshung commented Mar 3, 2022

@lucabol, this is meant to address #11733.
@WenceyWang, this is meant to address #37667.

@lucabol

lucabol commented Mar 3, 2022

Yep, we have three requests now for basically the same thing. Let's hope it gets in.

@Maoni0
Member

Maoni0 commented Mar 3, 2022

this is the API proposal from the GC team :) please do let us know if the proposed API addresses your concerns. we'll give it some time for discussion then take it to the API review.

@Maoni0 Maoni0 removed the untriaged New issue has not been triaged by the area owner label Mar 3, 2022
@Maoni0 Maoni0 added this to the 7.0.0 milestone Mar 3, 2022
@lucabol

lucabol commented Mar 3, 2022

Sorry @cshung @Maoni0. I didn't realize this was the proposal :-)

This is a good proposal. Can you give more details on why exactly the callback wouldn't work? As it is the more general solution, it would be good to know why not.

BTW: could the user catch the OutOfMemoryException to achieve something similar to the callback?

@cshung
Member Author

cshung commented Mar 3, 2022

Can you give more details on why exactly the callback wouldn't work?

Let's consider case 1, where we attempt to save the current execution by allocating more memory. Remember, we have discovered that we have already run out of preallocated memory, so we really cannot run more code before actually allocating more. Even if we do allocate more just to serve the callback, and assuming the amount fits the callback's needs, where would the callback be called? On the current thread? It would have a weird stack where an allocation suddenly calls another method. On a thread pool thread? Then what do we do with the current thread? It (and any other thread that might allocate) cannot proceed. What if we run out of memory again during the execution of the callback? This turns into a lot of technical issues that we simply don't have answers to.

Now let's consider case 2, where we give up the current execution and throw an OOM on the spot. One could catch the OOM, but what can that exception handler do? First of all, it cannot allocate more memory (it would fail right away again), which excludes a lot of useful things a user would like to do. One might be able to set certain references to null (e.g. global caches) in an attempt to free up memory, but even then, that memory won't be usable until another GC kicks in. Last but not least, an OOM can be thrown virtually anywhere. After you catch the OOM, are you sure you can recover from all possible partial states the process was left in before the OOM happened? That would be a challenge for many people.

It is much easier to hard-code these options and let the native runtime handle them. Much of the limitation above comes from the fact that processing the situation cannot involve allocating more managed memory, and this can easily be side-stepped by doing it in the native runtime. Of course, we lose generality, but do we really need it? Let's flip the question around: if you had the callback, what would you do with it, given you know all these limitations?

@lucabol

lucabol commented Mar 3, 2022

In the most common case, I think, we can consider running out of totalSize as a programmer error (aka a bug). Hence, we don't need to bring the program back into a running state, but just perform whatever application-specific 'bug reporting' is needed and shut down.

I think that is solved by catching the OOM exception. That was the original intent of the callback. So I think I am good with it.

@WenceyWang

WenceyWang commented Mar 3, 2022

I think this is good, but we should clarify what ContinueAllocating does. What does totalBytes mean if I use ContinueAllocating? Will the GC try to allocate a block of totalBytes again?

And what if the allocation fails? GC or OOM?

Or we could have two enum values: ContinueAllocatingUntilThrow, which throws OOM when allocation fails, and ContinueAllocatingThenGC, which performs a GC when allocation fails?

@jkotas
Member

jkotas commented Mar 3, 2022

Now one could catch the OOM, but what can that exception handler do?

JITing the methods that run as a side effect of executing the exception handlers would likely cause some managed allocations too. For example, the JIT allocates string literals on the GC heap. These allocations would keep failing too.

I think that the NoGCRegionMode.Throws option is effectively equivalent to terminating the process ungracefully. It is unlikely that the application would be able to run much managed code to react to the OutOfMemoryException.

If we want to allow the application to react to running out of no GC region, I think a callback that is delivered on finalizer or threadpool thread in combination with allowing more allocation to succeed is the best option.

@cshung
Member Author

cshung commented Mar 3, 2022

Or we could have two enum values: ContinueAllocatingUntilThrow, which throws OOM when allocation fails, and ContinueAllocatingThenGC, which performs a GC when allocation fails?

I was thinking about ContinueAllocating behaving like ContinueAllocatingUntilThrow; that is, when the application wants more memory than totalBytes, it will ask the operating system for more memory (*) to serve the request without raising a GC. At a certain point, it will reach a limit where we cannot do that anymore (**); it will then raise an OOM, and the application crashes just like with a typical OOM.

We don't have a ContinueAllocatingThenGC option, but we have a GC option, which behaves like GCThenContinueAllocating. When the application wants more memory than totalBytes, it will do a GC on the spot, effectively ending the NoGCRegion, and then let the application go on. This is the default option and is also the behavior we have today.

@WenceyWang, what does ContinueAllocatingThenGC mean for you? Is that a mode you need that is not available here?

(*) Exactly how much we ask the operating system for to serve the request is an unimportant implementation detail. All we need is to make sure the allocation succeeds, and therefore it is not necessary to expose that implementation detail.

(**) Without actually getting our hands on the implementation of this proposal, I am not sure exactly what that limit means. In the case of segments (which is the current mode), we cannot extend gen0 beyond a segment, so the ContinueAllocating mode would probably stop when we reach the segment size. We might be able to do something to get around that, but we haven't investigated that possibility yet. In the case of regions, we should be able to allocate as much as we have physical memory.

@WenceyWang

WenceyWang commented Mar 3, 2022

what does ContinueAllocatingThenGC mean for you?

I think there's only a little difference between the two modes for me.

If we use ContinueAllocatingUntilThrow, we can catch the OOM, display a UI about the situation, and maybe start a GC and try to do some error reporting or load the most recent checkpoint.

If we have ContinueAllocatingThenGC and the player just doesn't have enough memory in that segment, there will be a GC pause. Maybe it will get the player killed in-game, but maybe not, and the player won't lose their progress and may still have a good time instead of hitting GC pauses frequently.

It's good enough for me to see ContinueAllocatingUntilThrow make it into .NET 7 if ContinueAllocatingThenGC cannot be implemented soon.

ps. I am reading https://devblogs.microsoft.com/dotnet/put-a-dpad-on-that-gc/ now

@cshung
Member Author

cshung commented Mar 4, 2022

we can catch the OOM

While there is nothing stopping you from catching an OOM exception, the recommendation is that you don't. If you wanted to display UI, you will need memory, and you don't have any, so you will have another OOM again. It just won't do what you wanted, might as well let it crash so that the system will handle it for you (e.g. talk to Watson on Windows, create a core dump on Linux, ...)

@jkotas
Member

jkotas commented Mar 4, 2022

While there is nothing stopping you from catching an OOM exception, the recommendation is that you don't. If you wanted to display UI, you will need memory, and you don't have any, so you will have another OOM again. It just won't do what you wanted, might as well let it crash

Exactly. The problem is that code out there has finally blocks and other exception handling for cleanup, etc. This code is also likely going to try to allocate memory somewhere; this allocation is going to fail again, and the cycle will repeat. It is not unusual for these situations to end up with a stack overflow or another hard-to-diagnose failure.

The write up above says "This means we will let the application fail fast by throwing an OutOfMemoryException". Fail fast (as in Environment.FailFast) is not the same as throwing OutOfMemoryException. Fail fast terminates the process without giving it any chance to react. OutOfMemoryException is a regular exception that can be handled. Could you please clarify which one of these two behaviors you plan to implement?

@WenceyWang

WenceyWang commented Mar 6, 2022

It's kinda common in a game to load these fail-safe assets into memory on the loading screen. Displaying an error UI just means changing a bool flag somewhere; the rendering thread will do its job of changing what to draw to the swapchain. This is done by unmanaged code, and I don't think it needs memory allocation. What we'll do in managed code is just catch the OOM, change a flag in a pinned object, call GC.EndNoGCRegion, and control flow will continue to the next round of the main loop; then we can load a fast-resume checkpoint or show the menu, just like when the game process starts.
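The flag-flipping pattern described here could be sketched roughly as follows. Catching OutOfMemoryException is discouraged elsewhere in this thread, and all names below are illustrative, not real game code:

```c#
// Rough sketch of the pattern described above; assumes the handler itself
// allocates nothing. All names are illustrative.
static volatile bool s_showErrorScreen; // read by the unmanaged render thread

static void GameLoopIteration()
{
    try
    {
        SimulateFrame(); // may exhaust the NoGCRegion budget and throw OOM
    }
    catch (OutOfMemoryException)
    {
        s_showErrorScreen = true; // no allocation: just flip the flag
        if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
            GC.EndNoGCRegion();   // leave the region so a GC can run
    }
}
```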

I am sorry, I didn't realize that 'fail fast by throwing OOM' meant Environment.FailFast(). I know throwing an OOM means a new OutOfMemoryException() somewhere, but can I catch it once we are no longer in the NoGCRegion?
I don't think Environment.FailFast() is really necessary if possible, as there's a way to handle it.

If only FailFast() is possible, we will have to start a watchdog process for this, and it will be much more painful, as all these unmanaged assets will also need to be created again.

@cshung
Member Author

cshung commented Mar 7, 2022

Fail fast terminates the process without giving it any chance to react. OutOfMemoryException is a regular exception that can be handled. Could you please clarify which one of these two behaviors you plan to implement?

At the interface level, GCHeap::Alloc returns NULL if we run out of memory. My proposal was that when we run out of memory within the NoGCRegion with NoGCRegionMode set to Throws, we will return NULL at that point.

Right now, in this particular case, the VM checks for that NULL and throws an OutOfMemoryException, which we advise not to catch.

Suppose the VM could check if it is in a NoGCRegion and throw a different exception instead. Then, without violating our usual advice, that exception can be caught. Also, suppose we terminate the NoGCRegion before returning NULL; the GC would then be able to perform a GC and obtain memory to serve further allocation requests.

This is very optimistic, the true challenge is whether or not the VM is ready everywhere it allocates from the GC heap, which I am not sure about.

@jkotas
Member

jkotas commented Mar 7, 2022

This is very optimistic, the true challenge is whether or not the VM is ready everywhere it allocates from the GC heap,

It is not just the VM. The problem affects all .NET libraries out there. The problem with injecting OutOfMemoryException in random places is that it has a good chance of leaving the process in an irrecoverable state, or the exception will be ignored and your catch handler won't get notified. A simple example is if this happens inside a static constructor: the static constructor is going to be marked as failed and it won't ever be rerun.

Also, suppose we terminate the NoGCRegion before returning NULL, the GC will be able to perform GC and obtain memory to serve further allocation requests.

If you are going to automatically turn off the NoGCRegion anyway, I think it is better to let the allocation succeed and deliver the notification about running out of no-GC-region budget as a callback. The callback can be passed in as an argument to the TryStartNoGCRegion method.

@cshung
Member Author

cshung commented Mar 7, 2022

I think it is better to let the allocation succeed

But we do have a mode of letting the allocation succeed: the ContinueAllocating mode. Looks like what is needed is an additional callback when the continue-allocating mode is triggered?

@WenceyWang

I think it is better to let the allocation succeed

But we do have a mode of letting the allocation succeed: the ContinueAllocating mode. Looks like what is needed is an additional callback when the continue-allocating mode is triggered?

OK. As I imagine it, there are 4 modes:

GC
ContinueAllocatingThenGC
ContinueAllocatingUntilThrow
Throws 

And a callback with event info about the status of ContinueAllocating, which happens:

  1. after a continued allocation occurs;
  2. while in ContinueAllocatingThenGC mode, after a failed allocation, the end of the NoGCRegion, and a GC.

And I am also happy for the current proposal to catch the merge window if there's no time.

@cshung
Member Author

cshung commented Mar 8, 2022

The second case is still problematic; in essence, we cannot do a callback when we really don't have memory.

Let me make sure we understand each other by going through a simple example. To keep things simple, let's use tiny numbers.

Let's say right now we have 100 bytes of memory. You have used 20 bytes already, and you just started a NoGC region with 10 bytes.

So you allocate object #1, succeed (remaining 7 bytes)
So you allocate object #2, succeed (remaining 4 bytes)
So you allocate object #3, succeed (remaining 1 byte)
So you allocate object #4, ...

The default behavior today is that your 4th allocation will automatically leave the NoGCRegion and perform a GC, and this proposal is about providing alternatives for that.

If you chose ContinueAllocating, the GC will ask the operating system for more memory, let's say 15 bytes. This number is implementation-dependent, but now your allocation #4 will succeed. (*)

So you allocate object #4, succeed (remaining 13 bytes)
So you allocate object #5, succeed (remaining 10 bytes)
So you allocate object #6, succeed (remaining 7 bytes)
So you allocate object #7, succeed (remaining 4 bytes)
So you allocate object #8, succeed (remaining 1 byte)
So you allocate object #9, ...

At this point, we will continue to ask the operating system for even more memory, until eventually you reach the system's limit.

So you allocate object #n, succeed (remaining 1 byte)
So you allocate object #n+1, ...

At this point, GC can no longer allocate, and it will fail the allocation with OOM (**)

We advised you not to, but if you catch the OOM, the OOM handler may continue allocating, leading to allocating object #n+2; you get an OOM again, and the program still crashes because now you are crashing within your catch block.

Firing a callback at (**) time has exactly the same effect: your callback will allocate memory, and worse, the OOM in the callback will cause another callback, and you will eventually run out of stack too.

Another possibility is that we do a GC anyway at (**) time. Doing a GC is probably better than a crash.

Our suggestion is that we offer you a callback at (*). The idea is that you still have 70 bytes at that point, more than sufficient for you to gracefully terminate the game, show your goodbye screen, do your logging, whatever your game needs.

To make that work, your NoGCRegion setting must leave some room for maneuver. Let's say your callback needs 10 bytes: if you set your NoGCRegion to preallocate 75 bytes, only 5 bytes remain for your callback, and the callback will fail with an OutOfMemoryException, which is no longer recoverable.

If you want to do a GC instead, you can simply terminate the NoGCRegion in the callback; this will end the NoGCRegion and perform a GC on the spot. That covers the case for ContinueAllocatingThenGC, so there is no need for a separate mode for it.

If you chose Throws, then allocation #4 will fail, and the application will crash.
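The "end the region from the callback" idea above can be written as a short sketch. RegisterAllocationCallback is the hypothetical API from the revised proposal at the top of this issue, and the sizes are toy numbers:

```c#
// Sketch; RegisterAllocationCallback is the hypothetical API from the
// revised proposal. Sizes are toy numbers in the spirit of this example.
void OnBudgetExceeded()
{
    // Emulates ContinueAllocatingThenGC: ending the region here performs a
    // GC on the spot instead of waiting for the hard (larger) limit.
    GC.EndNoGCRegion();
}

GC.TryStartNoGCRegion(80);                           // larger (hard) limit
GC.RegisterAllocationCallback(70, OnBudgetExceeded); // smaller (soft) limit
```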

@jkotas
Member

jkotas commented Mar 8, 2022

I think we should automatically terminate the NoGCRegion before issuing the out-of-space callback. The whole point of NoGCRegion is to maintain certain SLAs. The moment you run out of space, the SLAs are not going to be maintained anyway, so trying to artificially stay in the NoGCRegion just makes it difficult to recover from the situation.

@cshung
Member Author

cshung commented Mar 8, 2022

I think we should automatically terminate the NoGCRegion before issuing the out-of-space callback.

But that is something we already do, the default behavior is to terminate the NoGCRegion once the out-of-space situation happens.

@WenceyWang

Our suggestion is that we offer you a callback at (*). The idea is that you still have 70 bytes at that point, more than sufficient for you to gracefully terminate the game, show your goodbye screen, do your logging, whatever your game needs.

OK, this is fair enough for me to have a callback while still in NoGCRegion.

@jkotas
Member

jkotas commented Mar 8, 2022

But that is something we already do, the default behavior is to terminate the NoGCRegion once the out-of-space situation happens.

Agree. I am arguing that it is the right behavior and we should continue doing that even for the callback.

I think managed callbacks that run with NoGCRegion still in place would be fragile and unreliable.

@cshung
Member Author

cshung commented Mar 8, 2022

OK, this is fair enough for me to have a callback while still in NoGCRegion.

and

I think managed callbacks that run with NoGCRegion still in place would be fragile and unreliable.

are contradictory ...

@WenceyWang

WenceyWang commented Mar 9, 2022

I think the idea from both @jkotas and me is that we have both ContinueAllocatingThenGC and a callback, thus:

The GC will continue to try to allocate more after the original totalBytes (or the previously added memory) is used up, while still staying in the NoGCRegion.
If the allocation succeeds, invoke the callback.
If the allocation fails, exit the NoGCRegion, perform a GC, and invoke the callback.
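In pseudocode, the semantics sketched in this comment could look like the following allocator slow-path logic. Every name here is hypothetical, not actual runtime code:

```c#
// Hypothetical pseudocode for the allocator slow path described above.
object AllocateSlowPath(long size)
{
    if (InNoGCRegion() && BudgetExhausted())
    {
        if (TryCommitMoreFromOS(size))
        {
            // Allocation succeeded: stay in the region, notify the app.
            QueueCallback(NoGCRegionStatus.ContinuedAllocating);
            return AllocateFromHeap(size);
        }
        // Allocation failed: exit the region, collect, then notify.
        ExitNoGCRegion();
        CollectGarbage();
        QueueCallback(NoGCRegionStatus.ExitedAfterGC);
    }
    return AllocateFromHeap(size);
}
```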

@cshung
Member Author

cshung commented Mar 23, 2022

@WenceyWang, @GSPP, @jkotas

The discussion in this thread is enlightening. I think I have a much better understanding of the problem space, so I have revised the proposal (in the very first comment of this thread). Please take a look and see if the new version addresses all the concerns.

@jkotas
Member

jkotas commented Mar 23, 2022

Which thread is the callback invoked on in the proposed design?

@cshung
Member Author

cshung commented Mar 23, 2022

Which thread is the callback invoked on in the proposed design?

The finalizer thread.

@jkotas
Member

jkotas commented Mar 23, 2022

What is going to happen if the app exhausts the reserve before the callback gets delivered on the finalizer thread?

@cshung
Member Author

cshung commented Mar 23, 2022

What is going to happen if the app exhausts the reserve before the callback gets delivered on the finalizer thread?

In normal cases, that should not happen. No GC occurred earlier, so the finalizer should have nothing to do.

But it may happen, for example, if some buggy finalizer code deadlocks the finalizer thread, or for other reasons that I am not aware of.

In any case (i.e. whether the callback was delivered or not, completed or not), when all the reserves are exhausted (i.e. the larger limit), we will do a GC and exit the NoGCRegion automatically, just as it was without this change.

@jkotas
Member

jkotas commented Mar 23, 2022

In normal cases, that should not happen. No GC occurred earlier, so the finalizer should have nothing to do.

If the app is allocating at a high rate, it can easily eat through the reserve just in the time window it takes the thread scheduler to invoke the callback on the finalizer thread.

Would it make sense to decouple this callback from the no-GC regions? The callback is basically a notification that a certain amount of memory has been allocated. It can be completely independent of the no-GC regions.

@cshung
Member Author

cshung commented Mar 23, 2022

@jkotas, I am wondering if we have a misunderstanding about the scenario and the feature.

As I understand it, the NoGCRegion feature is meant for games that have already preallocated most of their resources and would like the gameplay to be smooth. So my expectation is that the callback is meant for edge cases, and allocating at a high rate during a NoGCRegion is something that should not happen.

That was what I meant by "in normal cases" above.

Do you have an alternative scenario in mind? I am wondering if you could share it. It is hard to think about decoupling the callback from the NoGCRegion without knowing what problem that solves. From a caller's perspective, it might become harder to use, and from an implementation's perspective, it incurs a cost of checking whether or not we need to issue a callback for every allocation that reaches GCHeap, not just the NoGCRegion ones. But if we do have a compelling scenario that a general callback might help, then implementing it once for both might make sense.

@jkotas
Member

jkotas commented Mar 23, 2022

Yes, we use scenarios to motivate APIs. Once we arrive at a set of APIs that addresses the scenario, we look at what these APIs actually guarantee and whether the design can be simplified or generalized while providing the same guarantees.

The callback in the proposed API is only guaranteed to be delivered after a certain amount of memory gets allocated. There is no guarantee that the callback gets delivered before the no-GC region expires.

This suggests that it is unnecessary to couple the callback with no-GC regions. I am sure somebody would find uses for a generalized callback like that which are unrelated to the original motivating scenario.

@jkotas
Member

jkotas commented Mar 23, 2022

From a caller's perspective, it might become harder to use,

We have 4 overloads of the TryStartNoGCRegion method already. Do we need to add callback overloads for each of them? It gets complex to pick the right overload to use from so many different ones.

it incurs a cost of checking whether or not we need to issue a callback for every allocation that reaches GCHeap, not just the NoGCRegion ones.

Yes, this check would be on slow path through the allocator. I do not see it as a problem.

@cshung
Member Author

cshung commented Mar 24, 2022

I questioned the scenario mainly because lifting the restriction to the NoGCRegion may make the design and implementation more complicated. Here are some design-level questions that we won't have in the NoGCRegion case.

  • Do we allow multiple callbacks? In the NoGCRegion case, the answer is simply no. At any time you are either in or out of a NoGCRegion, so if we restrict registration to the point when we enter a NoGCRegion, at any time we can have at most one.

  • What if a GC happens? Do I continue the accounting? Do I drop the collected bytes? How would I know how many of the collected bytes correspond to the bytes allocated since the callback was registered? Do I drop the callback in case of a GC? In the NoGCRegion case, the answer is simply that GC doesn't happen.

  • Does the callback fire multiple times? For the NoGC case, it is simple: the callback is associated with the NoGC region, and the NoGCRegion allocated bytes cross the limit exactly once. If it isn't tied to a region, it is unclear. Do we detach the callback once it has fired, or do we restart the counter?

Without a scenario, these questions have no good answers.

And I agree, the number of overloads is concerning.

@WenceyWang
Copy link

WenceyWang commented Mar 24, 2022

This new change is good enough for me.

But on the topic of API design, I think I can say a bit, as there might be a huge gap between you framework developers and game developers.

I talked with some of my indie game developer friends a few days ago. They are mainly designers, not CS majors. When I told them the GC might cause pauses, they asked me: how do we get rid of this thing?

They have no idea about GC at all. When I told them there's GC.TryStartNoGCRegion, they asked me what to put as the argument.
It was hard for me to explain the whole thing. They later told me they used a for loop to search for a value that is large enough and works. But I still think they have no idea when they might lose the NoGCRegion. They may also have no idea that the NoGCRegion will exit automatically.

So I think the version of the whole NoGCRegion API that would be easy for them is simple, something like

StartNoGCRegion(Action exiting);

EndNoGCRegion();

GC will try to do everything they don't know about to keep the NoGCRegion, and if GC thinks there's no way to preserve it, the callback is called, once. And after the callback is invoked, regardless of whether the callback has actually run or returned, a GC can happen if any new allocation is needed.

Also, I think this is the idea behind ContinueAllocatingThenGC.

And still thank you for the enhancement.

@jkotas
Copy link
Member

jkotas commented Mar 24, 2022

StartNoGCRegion(Action exiting);

This looks very close to what the existing GC.WaitForFullGCApproach API does.
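For context, a rough usage sketch of that existing notification API (the threshold values of 10 are arbitrary; the real parameters accept values from 1 to 99):

```csharp
// Sketch of the existing full-GC notification pattern.
// Thresholds (1-99) control how early the approach notice fires.
GC.RegisterForFullGCNotification(maxGenerationThreshold: 10,
                                 largeObjectHeapThreshold: 10);

// Usually polled from a dedicated monitoring thread.
if (GC.WaitForFullGCApproach() == GCNotificationStatus.Succeeded)
{
    // A full, blocking GC is imminent: pause work, drain queues, etc.
}
if (GC.WaitForFullGCComplete() == GCNotificationStatus.Succeeded)
{
    // The collection finished; resume normal work.
}

GC.CancelFullGCNotification();
```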

@cshung cshung added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Apr 13, 2022
@terrajobst
Copy link
Contributor

terrajobst commented Apr 21, 2022

Video

  • We should rename the method to make it clear that it's tied to the no-GC region feature
  • The API should throw if not currently in a no-GC region
  • The callback is automatically removed when the no-GC region ends
namespace System;

public partial class GC
{
    public static void RegisterNoGCRegionCallback(long totalSize, Action callback);
}
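A usage sketch of the approved shape (my reading of the discussion: totalSize is the allocation threshold within the region at which the callback fires, and RunFrameLoop is a hypothetical workload):

```csharp
// Sketch composing the approved callback with the existing
// no-GC region API (per the notes above, registration throws
// when no no-GC region is in progress).
if (GC.TryStartNoGCRegion(64 * 1024 * 1024))
{
    // Ask to be notified once 48 MB of the 64 MB budget is allocated.
    GC.RegisterNoGCRegionCallback(48 * 1024 * 1024,
        () => Console.WriteLine("No-GC budget nearly exhausted"));
    try
    {
        RunFrameLoop();  // hypothetical latency-sensitive work
    }
    finally
    {
        GC.EndNoGCRegion();  // the callback is removed automatically
    }
}
```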

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Apr 21, 2022
@cshung cshung removed this from the 7.0.0 milestone Jun 1, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jun 1, 2022
@cshung cshung added this to the Future milestone Jun 1, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Jun 1, 2022
@lucabol
Copy link

lucabol commented Feb 2, 2023

Just wondering what the status of this one is. I opened the first one 4 years ago, there was a long discussion on March/April of last year culminating with an API proposal that was approved. It looked promising at that point.

But then it was put into the Future milestone, which looks less promising :-).

So, what is the status? Do we have agreement on the API and are we just waiting for it to be prioritized for development, or have we encountered unforeseen conceptual issues?

@Maoni0
Copy link
Member

Maoni0 commented Feb 2, 2023

@cshung is actually currently working on this. I think he should have something you can try soon (if you want to try a daily 8.0 build).

@lucabol
Copy link

lucabol commented Feb 2, 2023

Great! Happy to try it out. Let me know when/how.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 27, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 21, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 20, 2023