[release/7.0] Fixing a bug that causes us to mistakenly demote a gen2 region to gen0 #83341
Conversation
We are seeing a gen0 region that's almost fully occupied (< 24 bytes free) by one giant plug (i.e., no free objects at all in the region). This causes allocate_in_condemned_generations to go into an infinite loop, because in ephemeral generations we expect short plugs, i.e., we should be able to allocate a min free object in front of each plug. Normally we can, because when we allocate objects in gen0 we make sure to break up the allocation contexts with min free objects, and when we compact into gen1 we form short plugs. We get into this situation when all of the following conditions are true:

+ We did a gen2 compacting GC that generated a pinned plug in a gen2 region almost as big as the whole region. My guess for why there's this giant pinned plug is that the gen2 region was already really compact, so when we called allocate_in_condemned_generations on the non-pinned plugs next to some pinned plugs in it, we discovered we couldn't move the non-pinned plugs anyway, so we artificially pinned them and formed a giant pinned plug. During this GC those objects were no longer pinned, so we have one giant non-pinned plug.
+ This gen2 region is the last region with pinned plugs.
+ This gen2 region hasn't been consumed by allocate_in_condemned_generations yet, so it is processed by process_remaining_regions.

Then in process_remaining_regions we set the plan_gen_num for that gen2 region to 0, because we do set_region_plan_gen_num_sip (current_region, current_plan_gen_num); instead of going through the demotion logic to decide whether we should demote this region or not.
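To make the planning decision concrete, here is a small standalone C++ sketch of the idea behind the fix: instead of unconditionally planning the last pinned region into the current (gen0) plan generation, run a demotion check first. This is not the gc.cpp implementation; `region_model`, `should_demote`, and the sizes used are hypothetical illustrations, and only the function names quoted above (process_remaining_regions, set_region_plan_gen_num_sip, allocate_in_condemned_generations) come from the real code.

```cpp
// Illustrative model only -- not the actual GC code.
#include <cstddef>
#include <cstdio>

struct region_model
{
    size_t size;          // total region size in bytes
    size_t largest_plug;  // size of the single giant plug in the region
    int    plan_gen_num;  // generation this region is planned into
};

// Hypothetical stand-in for the demotion criteria: only plan the region into
// an ephemeral generation if it still has room for the min free object we
// expect to allocate in front of each plug there.
bool should_demote (const region_model& r, size_t min_obj_size)
{
    return (r.size - r.largest_plug) >= min_obj_size;
}

int main ()
{
    const size_t min_obj_size = 24;  // the "< 24 bytes free" threshold from the report
    region_model gen2_region { 4 * 1024 * 1024, 4 * 1024 * 1024 - 16, 2 };

    // Before the fix (conceptually): plan_gen_num was forced to 0 for this
    // region. After the fix: the demotion decision keeps it in gen2 because
    // there is no room for a min free object in front of the giant plug.
    gen2_region.plan_gen_num = should_demote (gen2_region, min_obj_size) ? 0 : 2;

    printf ("planned generation: %d\n", gen2_region.plan_gen_num);  // prints 2
    return 0;
}
```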
Tagging subscribers to this area: @dotnet/gc. Issue details: backport of #82413 to release/7.0; see the full description below.
Approved. We will take it for consideration in 7.0.x.
I'm retargeting this PR. Under the new process, repo maintainers are now allowed to merge their own servicing PRs as long as they meet the requirements:
The new process is described here: runtime/docs/project/library-servicing.md. The infra team will be actively monitoring servicing PRs to ensure all requirements are met and to help with any issues. Let me know if you have any questions.
Reminder: April 10th is the last day to merge backport PRs to ensure they get included in the May release. PR owners are now in charge of merging their own PRs.
Backport of #82413 to release/7.0
/cc @mangod9 @Maoni0
Customer Impact
An infinite loop in the GC. Multiple customers have reported this issue; it happens when a gen0 region is almost fully occupied (< 24 bytes free) by one giant plug (i.e., no free objects at all in the region). This causes allocate_in_condemned_generations to go into an infinite loop, because in ephemeral generations we expect short plugs, i.e., we should be able to allocate a min free object in front of each plug. Normally we can, because when we allocate objects in gen0 we make sure to break up the allocation contexts with min free objects, and when we compact into gen1 we form short plugs.
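As a rough illustration of why the loop never terminates, here is a minimal standalone C++ model of the short-plug expectation described above. Everything in it is a hypothetical simplification, not the real allocate_in_condemned_generations; only the 24-byte min free object figure comes from the report.

```cpp
// Minimal model of the short-plug invariant in ephemeral generations:
// every plug should leave room for a min free object in front of it.
#include <cstddef>
#include <cstdio>

// In this model, a plug can only be planned into an ephemeral generation if
// the region still has at least min_obj_size bytes free for the free object
// we expect to allocate in front of the plug.
bool fits_min_free_object (size_t free_in_region, size_t min_obj_size)
{
    return free_in_region >= min_obj_size;
}

int main ()
{
    const size_t min_obj_size   = 24;  // matches the "< 24 bytes free" threshold above
    const size_t free_in_region = 16;  // almost fully occupied region: one giant plug

    // The real allocator keeps searching for a spot that satisfies this
    // invariant; when no such spot can exist, that search never terminates.
    printf ("min free object fits: %s\n",
            fits_min_free_object (free_in_region, min_obj_size) ? "yes" : "no");
    return 0;
}
```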
Reported by multiple customers in #80073. While validating, we also found that the repro triggers another failure, which is fixed by #83342.
Testing
Local functional testing; GC stress testing; also validated by a customer who hit the issue.
Risk
Low risk. This change causes a region that happened to end up in gen0 by accident to actually go through our demotion logic, which we already run for all other regions anyway.
IMPORTANT: Is this backport for a servicing release? If so and this change touches code that ships in a NuGet package, please make certain that you have added any necessary package authoring and gotten it explicitly reviewed.