Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8236926: Concurrently uncommit memory in G1 #1141

Closed
wants to merge 19 commits into from

Conversation

kstefanj
Copy link
Contributor

@kstefanj kstefanj commented Nov 10, 2020

Please review this change that implements concurrent uncommit for G1.

Summary
G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task G1 Uncommit Region Task. The new task will uncommit memory in chunks of regions to avoid starving out other tasks.

The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS).

Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock Uncommit_lock is added.

One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization.

Logging
To track the concurrent uncommit in logs a few additional messages have been added. There are no new info messages, but for gc+heap there are two new debug messages and one trace:

[7,468s][debug][gc,heap        ] GC(32) Regions ready for uncommit: 1873
...
[7,509s][trace][gc,heap        ] Concurrent Uncommit: 256M, 32 regions, 11,173ms
[7,522s][trace][gc,heap        ] Concurrent Uncommit: 256M, 32 regions, 12,599ms
...
[9,691s][debug][gc,heap        ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms

The trace is printed for each invocation of the task while the debug message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated.

On gc+heap+region there are new logs to see how ranges of regions transition between different states:

[6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024)
[6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280)
[6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536)
[6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792)
[6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651)
[6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96)
[6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099)

Testing
Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot.

I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Testing

Linux aarch64 Linux arm Linux ppc64le Linux s390x Linux x64 Linux x86 Windows x64 macOS x64
Build ⏳ (1/1 running) ⏳ (1/1 running) ⏳ (1/1 running) ⏳ (1/1 running) ✔️ (6/6 passed) ✔️ (2/2 passed) ⏳ (2/2 running) ✔️ (2/2 passed)
Test (tier1) ⏳ (7/9 running) ⏳ (8/9 running) ⏳ (8/9 running)

Issue

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/1141/head:pull/1141
$ git checkout pull/1141

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 10, 2020

👋 Welcome back sjohanss! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Nov 10, 2020

@kstefanj The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Nov 10, 2020
@kstefanj kstefanj marked this pull request as ready for review November 10, 2020 10:52
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 10, 2020
@mlbridge
Copy link

mlbridge bot commented Nov 10, 2020

Webrevs

Copy link
Member

@albertnetymk albertnetymk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description provides a very good summary. Furthermore, it would be nice to have an overview of the algorithm: how the additional concurrency managed, what resources are protected by the lock, etc. Ideally, this should be in the comments, making references to corresponding classes/variables.

I've also done a performance run and as expected there are not significant changes.

How about pauses? One of the motivations of this PR is to remove pauses introduced by synchronous uncommit, right?

@@ -2659,6 +2663,12 @@ void G1CollectedHeap::gc_epilogue(bool full) {
_collection_pause_end = Ticks::now();
}

void G1CollectedHeap::uncommit_heap_if_necessary() {
if (hrm()->has_inactive_regions()) {
G1UncommitRegionTask::activate();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since activate is already used for regions, I would suggest another word; simple run is enough. Additionally, I don't think using an enum is necessary, a plain bool is_running should do.

enum class TaskState { active, inactive };
TaskState _state;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, for a short while I had the state running as well, but now just having a simple bool is enough now. I still call the state _active since I think that is more accurate. Also change the name to run().

@@ -182,13 +174,30 @@ HeapRegion* HeapRegionManager::new_heap_region(uint hrm_index) {
return g1h->new_heap_region(hrm_index, mr);
}

void HeapRegionManager::expand(uint start, uint num_regions, WorkGang* pretouch_gang) {
guarantee(num_regions > 0, "No point in calling this for zero regions");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same guarantee is already in commit_regions; I think an assert is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it, as you say we check it the first thing we do in commit_regions.

_allocated_heapregions_length = MAX2(_allocated_heapregions_length, i + 1);
}

if (G1CollectedHeap::heap()->hr_printer()->is_active()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if is not needed, right? commit(hr) does such check already. There are a few other cases of the same kind.

Copy link
Contributor Author

@kstefanj kstefanj Nov 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit unfortunate, but this is_active() is checking if the G1HRPrinter is active and should print stuff. So it is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misunderstood Alberts comment here. He is correct that the check inside commit(hr) is enough, updated this and a few other similar cases.

Comment on lines 271 to 272
guarantee(num_regions >= 1, "Need to specify at least one region to uncommit, tried to uncommit zero regions at %u", start);
guarantee(length() >= num_regions, "pre-condition");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure why guarantee here but assert in reactivate_regions. I don't see a strong reason to use guarantee here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to assert, the reason for them being different is that the code in deactivate_regions is old and just moved into this function. Also changed condition to > 0 like we have most other places.

void HeapRegionManager::reactivate_regions(uint start, uint num_regions) {
assert(num_regions > 0, "No point in calling this for zero regions");

clear_auxiliary_data_structures(start, num_regions);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this name is too general. Even after reading its body and the associated comments, I don't get what "data structures" are cleared.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me the comments and implementation is pretty descriptive, but I did update the comment for signal_mapping_changed a bit to make it more explicit. Also added a few lines to explain what will be cleared by each mapper.

I agree that this is not a perfect name but I think the naming is in line with how we refer to these structures elsewhere in the code.

@@ -107,6 +90,9 @@ class HeapRegionManager: public CHeapObj<mtGC> {
// Pass down commit calls to the VirtualSpace.
void commit_regions(uint index, size_t num_regions = 1, WorkGang* pretouch_gang = NULL);

// Initialize the HeapRegions in the range and put them on the free list.
void initialize_regions(uint start, uint num_regions);

// Notify other data structures about change in the heap layout.
void update_committed_space(HeapWord* old_end, HeapWord* new_end);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update_committed_space is not implemented, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, filed JDK-8256323 for this.


// The number of regions committed in the heap.
uint _num_committed;
// Map to keep track of which regions are in use.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the variable suggests it's tracking what regions are committed and that's all. After reading G1CommittedRegionMap, I believe the class name and the var name are quite misleading; it tracks the state of mem backing up all regions, committed+mapped, committed, to_be_uncommitted, committed. I don't have any good alternatives, but the comments could surely be expanded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is hard and I agree this isn't perfect but the union of the two bitmaps in G1CommittedRegionMap do track what is committed, so it's not completely wrong.

I did update the comment here a bit.

Comment on lines 134 to 137
void activate_regions(uint index, uint num_regions = 1);
void deactivate_regions(uint start, size_t num_regions);
void reactivate_regions(uint start, uint num_regions);
void uncommit_regions(uint start, uint num_regions);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why uint vs size_t? Why = 1 for some but not others? If such inconsistency is intentional, it should be documented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for size_t for deactivate_regions is that we have a size_t in shrink_at which calls it. But for consistency here I will change it to uint and add a cast in shrink_at.

For the default values, on expand it is historic but can now be removed. Good catch, and for inactive I don't have any good excuse =)

assert(_state == TaskState::active, "Must be active");

// Each execution is limited to uncommit at most 256M worth of regions.
static const uint region_limit = (uint) (256 * M / G1HeapRegionSize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 256M? Better include some motivation in the comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

256M is just a "reasonable" limit that I picked to get short enough invocations. I updated the comment a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion I have is to make this a static constant in the class declaration not dig it up somewhere in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the 256 * M out to be a class constant, leaving the region limit here since it is not known compile time.

Comment on lines +64 to +67
void active_set_range(uint start, uint end);
void active_clear_range(uint start, uint end);
void inactive_set_range(uint start, uint end);
void inactive_clear_range(uint start, uint end);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading their implementation, I see that Uncommit_lock must be held on call them. I think it's best to mention this precondition in the comments, and explain why this lock is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comments a bit and referred to guarantee_mt_safty_* for more details.

@openjdk
Copy link

openjdk bot commented Nov 13, 2020

@kstefanj this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout 8236926-ccu
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Nov 13, 2020
@kstefanj
Copy link
Contributor Author

Thanks @albertnetymk for your comments, this update does not include any updates for those. This instead fixes a race in the low level committing code for G1 where it was assumed only one thread can commit and uncommit regions at a time. This is no longer the case after this change. It is true for a single region, but adjacent regions are allowed to be committed/uncommitted in parallel. This can for example happen if a humongous allocation happens during concurrent uncommit.

The fix is to use parallel versions of the bitmap operations and for the G1RegionsSmallerThanCommitSizeMapper add a lock to prevent parallel updates for this mapper. This is needed because multiple regions can share a single underlying OS page, so we need to make sure those updates are atomic on a page level.

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Nov 13, 2020
Copy link
Contributor Author

@kstefanj kstefanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments Albert.

@@ -2659,6 +2663,12 @@ void G1CollectedHeap::gc_epilogue(bool full) {
_collection_pause_end = Ticks::now();
}

void G1CollectedHeap::uncommit_heap_if_necessary() {
if (hrm()->has_inactive_regions()) {
G1UncommitRegionTask::activate();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, for a short while I had the state running as well, but now just having a simple bool is enough now. I still call the state _active since I think that is more accurate. Also change the name to run().

Comment on lines +64 to +67
void active_set_range(uint start, uint end);
void active_clear_range(uint start, uint end);
void inactive_set_range(uint start, uint end);
void inactive_clear_range(uint start, uint end);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comments a bit and referred to guarantee_mt_safty_* for more details.

assert(_state == TaskState::active, "Must be active");

// Each execution is limited to uncommit at most 256M worth of regions.
static const uint region_limit = (uint) (256 * M / G1HeapRegionSize);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

256M is just a "reasonable" limit that I picked to get short enough invocations. I updated the comment a bit.

@@ -182,13 +174,30 @@ HeapRegion* HeapRegionManager::new_heap_region(uint hrm_index) {
return g1h->new_heap_region(hrm_index, mr);
}

void HeapRegionManager::expand(uint start, uint num_regions, WorkGang* pretouch_gang) {
guarantee(num_regions > 0, "No point in calling this for zero regions");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it, as you say we check it the first thing we do in commit_regions.

_allocated_heapregions_length = MAX2(_allocated_heapregions_length, i + 1);
}

if (G1CollectedHeap::heap()->hr_printer()->is_active()) {
Copy link
Contributor Author

@kstefanj kstefanj Nov 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit unfortunate, but this is_active() is checking if the G1HRPrinter is active and should print stuff. So it is needed.

Comment on lines 271 to 272
guarantee(num_regions >= 1, "Need to specify at least one region to uncommit, tried to uncommit zero regions at %u", start);
guarantee(length() >= num_regions, "pre-condition");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to assert, the reason for them being different is that the code in deactivate_regions is old and just moved into this function. Also changed condition to > 0 like we have most other places.

HeapRegionRange range = _committed_map.next_inactive_range(offset);
// No more regions available for uncommit
if (range.length() == 0) {
return uncommitted;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uncommitted can be 0 here so we can't add that assert. There is a chance that between we add the uncommit-task (which calls this function) and that we grab the lock, someone else might have used the inactive regions to expand the heap again. Update the comment a bit, I hope it makes it easier to follow.


// The number of regions committed in the heap.
uint _num_committed;
// Map to keep track of which regions are in use.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is hard and I agree this isn't perfect but the union of the two bitmaps in G1CommittedRegionMap do track what is committed, so it's not completely wrong.

I did update the comment here a bit.

@@ -107,6 +90,9 @@ class HeapRegionManager: public CHeapObj<mtGC> {
// Pass down commit calls to the VirtualSpace.
void commit_regions(uint index, size_t num_regions = 1, WorkGang* pretouch_gang = NULL);

// Initialize the HeapRegions in the range and put them on the free list.
void initialize_regions(uint start, uint num_regions);

// Notify other data structures about change in the heap layout.
void update_committed_space(HeapWord* old_end, HeapWord* new_end);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, filed JDK-8256323 for this.

Comment on lines 134 to 137
void activate_regions(uint index, uint num_regions = 1);
void deactivate_regions(uint start, size_t num_regions);
void reactivate_regions(uint start, uint num_regions);
void uncommit_regions(uint start, uint num_regions);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for size_t for deactivate_regions is that we have a size_t in shrink_at which calls it. But for consistency here I will change it to uint and add a cast in shrink_at.

For the default values, on expand it is historic but can now be removed. Good catch, and for inactive I don't have any good excuse =)

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Nov 16, 2020
@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Nov 16, 2020
Copy link
Member

@albertnetymk albertnetymk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the revision.

HeapRegionRange next_inactive_range(uint offset) const;
// Finds the next range of committable regions starting at offset.
// This function must only be called when no inactive regions are
// present and can be used to active more regions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/active/activate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -563,6 +563,11 @@ class G1CollectedHeap : public CollectedHeap {

void resize_heap_if_necessary();

// Check if there is memory to uncommit and if so schedule a task to do it.
void uncommit_heap_if_necessary();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if the method were called uncommit_regions_if_necessary() as this method does not uncommit the heap but just uncomittable regions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I agree that is more accurate.

@@ -563,6 +563,11 @@ class G1CollectedHeap : public CollectedHeap {

void resize_heap_if_necessary();

// Check if there is memory to uncommit and if so schedule a task to do it.
void uncommit_heap_if_necessary();
uint uncommit_regions(uint region_limit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment like "// Immediately uncommits uncommittable regions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

log_debug(gc, ergo, heap)("Attempt heap shrinking (archive regions). Total size: " SIZE_FORMAT "B",
HeapRegion::GrainWords * HeapWordSize * shrink_count);
// Explicit uncommit.
_hrm.uncommit_inactive_regions((uint) shrink_count);
Copy link
Contributor

@tschatzl tschatzl Nov 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let the code use the G1CollectedHeap::uncommit_regions() helper here to limit the references to _hrm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, fixed.

@@ -743,7 +744,7 @@ void G1CollectedHeap::dealloc_archive_regions(MemRegion* ranges, size_t count) {
HeapWord* prev_last_addr = NULL;
HeapRegion* prev_last_region = NULL;
size_t size_used = 0;
size_t uncommitted_regions = 0;
size_t shrink_count = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code may as well define shrink_count as uint as the only use seems to cast it to uint anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I left it size_t since everything else in this function uses size_t, but uint is a better fit.

virtual void commit_regions(uint start_idx, size_t num_regions, WorkGang* pretouch_gang) {
guarantee(uncommitted_range(start_idx, num_regions),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this (and the one in uncommit_regions) should be guarantees.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with guarantee since this is what's used in G1PageBasedVirtualSpace for similar checks. Errors like this might be more likely in release builds.

// those resources are in sync:
// - G1RegionToSpaceMapper::_region_commit_map;
// - G1PageBasedVirtualSpace::_committed (_storage.commit())
Mutex _lock;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good comment! :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -133,6 +133,9 @@ class G1ServiceThread: public ConcurrentGCThread {
// Register a task with the service thread and schedule it. If
// no delay is specified the task is scheduled to run directly.
void register_task(G1ServiceTask* task, jlong delay = 0);
// Notify a change to the service thread. Used to stop either
// stop the service or to force check for new tasks.
void notify();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first "stop" is too much

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

uint length() const { return _end - _start; }
};

class G1CommittedRegionMap : public CHeapObj<mtGC> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a diagram of the region states and their transitions here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

}
G1CollectedHeap::heap()->hr_printer()->commit(hr);
}
activate_regions(start, num_regions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this place there will be two messages from the HRPrinter for every region:

  1. a COMMIT message and
  2. an ACTIVATE message

This is a bit confusing as in my understanding (that's why I asked for a region state diagram in the G1CommittedRegionMap which may as well be put in HeapRegionManager) the (typical) flow of states are Uncommitted->Committed/Active->Committed/Inactive->Uncommitted.
As mentioned, I'm not sure if it is a good idea to send two separate messages here; better rename the "Active" message to "Commit-Active" (and "Inactive" to "Commit-Inactive") instead imho, even if it's quite long (and drop HRPrinter::commit() completely)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that it will generate two messages when committing a previously uncommitted region, but I still think it is valuable to separate them since we can also have the state change Active->Inactive->Active. In this case the transition from Inactive to Active will not include a commit, but rather making inactive regions active again. Just seeing a "Commit-Active" message in this case would not be as clear as seeing "Active" that is not immediately preceded by "Commit".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I looked at the G1HRPrinter again, and indeed it prints the action, not the region state, so this is a fair point.

Copy link
Contributor Author

@kstefanj kstefanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review Thomas. Addressed most of your concerns, but some things I left as is.

@@ -743,7 +744,7 @@ void G1CollectedHeap::dealloc_archive_regions(MemRegion* ranges, size_t count) {
HeapWord* prev_last_addr = NULL;
HeapRegion* prev_last_region = NULL;
size_t size_used = 0;
size_t uncommitted_regions = 0;
size_t shrink_count = 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I left it size_t since everything else in this function uses size_t, but uint is a better fit.

log_debug(gc, ergo, heap)("Attempt heap shrinking (archive regions). Total size: " SIZE_FORMAT "B",
HeapRegion::GrainWords * HeapWordSize * shrink_count);
// Explicit uncommit.
_hrm.uncommit_inactive_regions((uint) shrink_count);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, fixed.

@@ -1305,6 +1309,7 @@ void G1CollectedHeap::shrink_helper(size_t shrink_bytes) {
log_debug(gc, ergo, heap)("Shrink the heap. requested shrinking amount: " SIZE_FORMAT "B aligned shrinking amount: " SIZE_FORMAT "B attempted shrinking amount: " SIZE_FORMAT "B",
shrink_bytes, aligned_shrink_bytes, shrunk_bytes);
if (num_regions_removed > 0) {
log_debug(gc, heap)("Regions ready for uncommit: %u", num_regions_removed);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound good.

@@ -563,6 +563,11 @@ class G1CollectedHeap : public CollectedHeap {

void resize_heap_if_necessary();

// Check if there is memory to uncommit and if so schedule a task to do it.
void uncommit_heap_if_necessary();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I agree that is more accurate.

@@ -563,6 +563,11 @@ class G1CollectedHeap : public CollectedHeap {

void resize_heap_if_necessary();

// Check if there is memory to uncommit and if so schedule a task to do it.
void uncommit_heap_if_necessary();
uint uncommit_regions(uint region_limit);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (g1h->has_uncommittable_regions()) {
// No delay, reason to reschedule rather then to loop is to allow
// other tasks to run without waiting for a full uncommit cycle.
schedule(0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because here we know that the service thread is running and if we schedule with a 0 delay, there is no risk of it going to sleep before we run again. Another task might be up next, but this task will eventually run before the service thread can go to sleep.

There is room for improvement on how we schedule tasks from another thread. I changed enqueue to call a new public function on the G1ServiceThread called schedule_task, which calls schedule(task, delay) and then notify(). The function schedule(task, delay) is what previously was named schedule_task. This is what will be called when someone does task->schedule(delay) as well, so it is a bit more unified.

// set to active only from a safepoint and it is set to false
// while running on the service thread joined with the suspendible
// thread set.
bool _active;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, since I moved the 256M constant into the class, the chunking info is now in a comment for this constant. For the state I added the info around usage and moved the implementation detail into set_active().

void clear_summary();

public:
static void run();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like enqueue, changed.

@@ -133,6 +133,9 @@ class G1ServiceThread: public ConcurrentGCThread {
// Register a task with the service thread and schedule it. If
// no delay is specified the task is scheduled to run directly.
void register_task(G1ServiceTask* task, jlong delay = 0);
// Notify a change to the service thread. Used to stop either
// stop the service or to force check for new tasks.
void notify();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

}
G1CollectedHeap::heap()->hr_printer()->commit(hr);
}
activate_regions(start, num_regions);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that it will generate two messages when committing a previously uncommitted region, but I still think it is valuable to separate them since we can also have the state change Active->Inactive->Active. In this case the transition from Inactive to Active will not include a commit, but rather making inactive regions active again. Just seeing a "Commit-Active" message in this case would not be as clear as seeing "Active" that is not immediately preceded by "Commit".

@kstefanj
Copy link
Contributor Author

Testing on the latest changes looks good.

Copy link
Contributor

@tschatzl tschatzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good, thanks, except for some remaining minor nit and some question.

@@ -210,11 +210,10 @@ void G1ServiceThread::register_task(G1ServiceTask* task, jlong delay) {

// Notify the service thread that there is a new task, thread might
// be waiting and the newly added task might be first in the list.
MonitorLocker ml(&_monitor, Mutex::_no_safepoint_check_flag);
ml.notify();
notify();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call schedule_task() here because the two calls are just that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, yes since we are already calling schedule_task() this call to notify is not needed.

if (g1h->has_uncommittable_regions()) {
// No delay, reason to reschedule rather then to loop is to allow
// other tasks to run without waiting for a full uncommit cycle.
schedule(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is mainly, why not notify the task to wake up when scheduling a new task in all cases. I understand the reason for the zero delay.

Not seeing the problem of doing so:

  • tasks that need to run (or are overdue) are automatically run before this task as they have a time-to-run < current time, and so this task is scheduled after

  • The extra notification for schedule_task() does not seem to hurt, at most it wakes up the service thread to do the next scheduled task (which afaiu other tasks if they are already due).

I.e. the "optimization" here to not notify the service thread seems to be superfluous. Or maybe the notification could be suppressed automatically if schedule_task() is called in execute (G1ServiceThread can check fairly easily if it is currently running a task)

I am concerned about users of the API to needlessly have to decide whether they should call schedule() or schedule_task() as they have different effect.

Maybe schedule() could just call schedule_task().

(That might be a pre-existing issue of using schedule vs. schedule_task(), so feel free to say it's out of scope. :) )

Copy link
Contributor Author

@kstefanj kstefanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Thomas for your review.

if (g1h->has_uncommittable_regions()) {
// No delay, reason to reschedule rather then to loop is to allow
// other tasks to run without waiting for a full uncommit cycle.
schedule(0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed the assert and also update the comments a bit.

Copy link
Contributor

@tschatzl tschatzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm. Ship it.

@openjdk
Copy link

openjdk bot commented Nov 19, 2020

@kstefanj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8236926: Concurrently uncommit memory in G1

Reviewed-by: ayang, tschatzl

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 21 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 19, 2020
@kstefanj
Copy link
Contributor Author

Thanks for the reviews @albertnetymk and @tschatzl!
/integrate

@openjdk openjdk bot closed this Nov 19, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 19, 2020
@openjdk
Copy link

openjdk bot commented Nov 19, 2020

@kstefanj Since your change was applied there have been 21 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

Pushed as commit b8244b6.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk-notifier bot referenced this pull request Nov 19, 2020
@kstefanj kstefanj deleted the 8236926-ccu branch May 18, 2021 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants