This repository was archived by the owner on Oct 12, 2022. It is now read-only.
dlang/druntime (public archive)


SAOC: fork() based garbage collector #2604

Closed
FraMecca wants to merge 49 commits into dlang:master from FraMecca:saoc

Conversation

@FraMecca

Full benchmarks here:
https://gist.github.com/FraMecca/ee50c9f4d1dc16d115dcddce8000cf9f
As with the threaded version, the other benchmark programs from druntime are not indicative.

Benchmark results for vdparser_extended:

fork enabled, average of 20 runs

Elapsed (wall clock) time (h:mm:ss or m:ss)      12.6355 seconds
GC summary                                     1454.2 MB
Grand total GC time                             186.5 ms
Max Pause Time                                   65.45 ms
Number of collections                             9 
Total GC prep time                               20.550 ms
Total mark time                                 106.25 ms
Total sweep time                                 58.80 ms

standard, average of 20 runs:

Elapsed (wall clock) time (h:mm:ss or m:ss)      14.363 seconds
GC summary                                     1355.0 MB
Grand total GC time                            2629.30 ms
Max Pause Time                                  969.85 ms
Number of collections                            14 
Total GC prep time                               24.950 ms
Total mark time                                2560.05 ms
Total sweep time                                 43.350 ms

Benchmark results for Dustmite --force ../../druntime/src/ 'grep -nR HAVE_FORK':

fork enabled, average of 20 runs:

Elapsed (wall clock) time (h:mm:ss or m:ss)     18.04 seconds
GC summary                                     477 MB
Grand total GC time                             69.60 ms
Max Pause Time                                  24.50 ms
Number of collections                            6
Total GC prep time                               2.05 ms
Total mark time                                 38.85 ms
Total sweep time                                27.350 ms

standard, average of 20 runs:

Elapsed (wall clock) time (h:mm:ss or m:ss)     18.1285  seconds
GC summary                                     477 MB
Grand total GC time                            475.45 ms
Max Pause Time                                 260.75 ms
Number of collections                            9
Total GC prep time                               2 ms
Total mark time                                452.35 ms
Total sweep time                                19.85 ms

Francesco Mecca added 4 commits May 11, 2019 15:01
This is everything necessary to write the core of the logic for the
concurrent mark.
Also changed some function calls to reflect their signatures
@dlang-bot
Contributor

dlang-bot commented May 11, 2019

Thanks for your pull request and interest in making D better, @FraMecca! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + druntime#2604"

@wilzbach
Contributor

Remove the trailing whitespace for the CI to pass:

src/gc/impl/conservative/gc.d:2700:						break; // bogus
src/gc/impl/conservative/gc.d:2701:					default: // the parent
src/gc/impl/conservative/gc.d:2723:							disableFork();
posix.mak:348: recipe for target 'checkwhitespace' failed

Member

@rainers rainers left a comment


Good stuff! See comments for some nits/questions.


printf("GC options are specified as white space separated assignments:
disable:0|1 - start disabled (%d)
fork:0|1 - set fork behaviour (disabled by default) (%d)
Member


Please align the dashes (-).

Member


The user will probably not care how it's implemented (i.e. that it uses fork), but rather selects the GC to be concurrent. So maybe the option should be the latter. Alternatively, it could also just be another GC named concurrent, but how would that combine with precise?


I think it is not crazy to expect that other concurrent GC implementations could come in the future, especially since fork doesn't exist on Windows and the like, so maybe it is more future-proof to keep the option specific. This will also hint to the user that the OS where it is used must support fork for it to work :)

@@ -43,6 +43,7 @@ import core.gc.gcinterface;
import rt.util.container.treap;
Member


I think you should define a version (e.g. COLLECT_FORKED) that is set only for supporting platforms. Most code should only be enabled with that version.

Member


Or use static if (HaveFork).
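The gating discussed here can be illustrated with a minimal C sketch; the HAVE_FORK macro and gc_mode function below are hypothetical names for illustration, not druntime code.

```c
/* Minimal C sketch of the compile-time gating discussed above: build the
 * fork-collector paths only on platforms that have fork(), analogous to
 * version(COLLECT_FORK) / static if (HaveFork) in D.
 * HAVE_FORK and gc_mode are hypothetical names, not druntime identifiers. */
#if defined(__unix__) || defined(__APPLE__)
#  define HAVE_FORK 1
#else
#  define HAVE_FORK 0
#endif

static const char *gc_mode(void)
{
#if HAVE_FORK
    return "concurrent (fork) collector available";
#else
    return "stop-the-world collector only";
#endif
}
```

The point of a single version/macro is that unsupported platforms never even compile the fork-specific code, instead of failing at run time.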


// minimize() should be called only after a call to fullcollect
// terminates with a sweep
if (doMinimize || lowMem)
Member


Can doMinimize be converted to a function parameter instead of a field? It's unlikely to actually find completely unused pools, so it's probably not too bad if it is skipped sometimes.

Author


I don't think it can, because it needs to be set when:

  • lowMem
  • tryAllocNewPool fails

We could avoid minimizing after a failed tryAllocNewPool and call minimize() only on lowMem (at least for the forking GC).

Member


Maybe it just needs a better name, e.g. minimizeAfterNextCollection.

freebits.alloc(nbits);
freebits.setRange(0, nbits);
}
freebits.setAll();
Member


This is already done with the freebits.setRange(0, nbits); call above.

Author


I get a sigsegv if I replace that call with freebits.setRange(0, nbits);
setAll does this: memset(data, 0xFF, nwords * wordtype.sizeof)

Member


freebits is only allocated for SmallObjectPool, but setAll is called unconditionally. I suspect setRange doesn't work as expected with nbits == 0.

Member


Remove this line or replace the freebits.setRange call above with it.

Author


Why do you want me to remove that line?
It leaks without it, while setRange generates a sigsegv because it does a different thing.
I can put it behind an else statement so that it gets called for small pools only.

Member

@rainers rainers Jun 18, 2019


freebits is empty in the else case, so setAll does the same as setRange for small object pools.
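The point can be illustrated with a small C sketch (hypothetical names, not the druntime GCBits code): a setAll implemented as a memset over the allocated words is naturally a no-op for an empty bitset, which is why it is safe to call unconditionally, whereas range-based code computing word/bit indices from (0, nbits) must handle nbits == 0 explicitly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C sketch, not the actual druntime GCBits code:
 * set_all writes every allocated word, so on an empty bitset
 * (nwords == 0, data == NULL) it touches nothing at all. */
typedef struct {
    uint64_t *data;   /* NULL when no bits were allocated */
    size_t    nwords; /* number of 64-bit words backing the bitset */
} bitset;

static void set_all(bitset *b)
{
    if (b->nwords)    /* guard keeps memset(NULL, ..., 0) out of UB territory */
        memset(b->data, 0xFF, b->nwords * sizeof(uint64_t));
}
```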

@FraMecca
Author

I cannot reproduce this. Maybe it is related to FreeBSD

hospital.d seems another candidate that I have seen in a number of test failures. But there are others in druntime unittest and elsewhere, too.

Maybe try using atomic operations for the mark bits as discussed, i.e. mark.setLocked and mark!(precise, true).

Parallel marking is enabled by default, so the CI tests should run using markParallel, which uses mark.setLocked.

In any case, I can't reproduce the failures on my machine so I don't know how to proceed with debugging CI issues.
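A rough C11 sketch of the difference at stake (hypothetical names; the actual code is D's GCBits with mark.set / mark.setLocked): a plain |= on a shared mark word is a non-atomic read-modify-write and can lose bits under parallel marking, while an atomic fetch-or cannot.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical C11 sketch, not the druntime D code: illustrates why a
 * plain read-modify-write on a shared mark word races under parallel
 * marking while an atomic fetch-or ("setLocked"-style) does not. */

/* racy: two threads doing this concurrently on the same word can lose a
 * bit, because the load and the store are separate steps */
static void set_plain(uint64_t *word, unsigned bit)
{
    *word |= (uint64_t)1 << bit;
}

/* atomic: the whole read-modify-write is one indivisible operation;
 * returns nonzero if the bit was already set */
static int set_locked(_Atomic uint64_t *word, unsigned bit)
{
    uint64_t mask = (uint64_t)1 << bit;
    return (atomic_fetch_or(word, mask) & mask) != 0;
}
```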

@rainers
Member

rainers commented Jul 12, 2019

parallel is enabled by default so CI test should run using markParallel that uses mark.setLocked.

But the mark.set calls in concurrent allocations are not atomic.

@rainers
Member

rainers commented Jul 13, 2019

In any case, I can't reproduce the failures on my machine so I don't know how to proceed with debugging CI issues.

I've set up a VM with Ubuntu 19 and observed that the std.stdio unittest is running into a ForkError on termination. Maybe this has to do with the unittest also using fork to delete temporary files.

The GC can ignore ECHILD because that means the child termination was
already caught by the user (with a call to wait(0))
@FraMecca
Author

FraMecca commented Jul 15, 2019

In any case, I can't reproduce the failures on my machine so I don't know how to proceed with debugging CI issues.

I've setup a VM with Ubuntu 19 and observe that the std.stdio unittest is running into a ForkError on termination. Maybe this has to do with the unittest also using fork to delete temporary files.

This prompted me to open a PR to Phobos (dlang/phobos#7111), but also to recognize that the user may erroneously catch the GC's child termination, so we should ignore ECHILD and acknowledge the end of the concurrent mark phase: 6af3c10
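The handling described above can be sketched in C (hypothetical helper name; the real change is in druntime's D source): if waitpid on the forked mark process fails with ECHILD, the child was already reaped elsewhere, so the GC can treat the mark phase as finished rather than raising a ForkError.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical C sketch of the logic described above, not the actual
 * druntime code: reap the forked mark child, treating ECHILD as success
 * because it means the child was already reaped (e.g. by a user-level
 * wait(0)). Returns 0 when the mark phase can be considered finished. */
static int reap_mark_child(pid_t child)
{
    int status;
    pid_t r = waitpid(child, &status, 0);
    if (r == child)
        return 0;                 /* normal termination of the mark child */
    if (r == (pid_t)-1 && errno == ECHILD)
        return 0;                 /* already reaped by the user: not an error */
    return -1;                    /* genuine failure */
}
```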

@rainers
Member

rainers commented Jul 17, 2019

I can now reproduce failures by executing ./run.d -f runnable/hospital.d a couple of times in dmd/test.

The failures seem to go away if I disable either forking or parallel marking. Changing the single call to pool.mark.set during allocation to setLocked didn't help.

fork:0|1 - set fork behaviour (%d)
profile:0|1|2 - enable profiling with summary when terminating program (%d)
gc:".ptr, disable, profile);
gc:".ptr, disable, profile, fork);
Member


That's the wrong order now: fork is in the middle of the format string but last in the argument list.

}

mark.Dtor();
mark.Dtor(config.fork);
Member


Nit: this is an unconditional call, while mark.alloc uses version(COLLECT_FORK). They should be consistent; probably simpler to leave the evaluation of the share parameter to GCbits and remove the version condition for mark.alloc.

@rainers
Member

rainers commented Jul 20, 2019

I made a couple more experiments with hospital.d:

  • using static linkage instead of shared seems to reduce the probability of a failure (2 out of 100 runs instead of about 10 out of 100).
  • using fewer cores for parallel marking also reduces the chances of failure.
  • adding logging during collection (COLLECT_PRINTF) shows differences between runs, but that is to be expected given the concurrency.
  • adding logging to the allocation (PRINTF) seems to make the issue go away. I suspect that's because the slowdown causes less concurrent accesses.
  • using setLocked instead of set doesn't make a notable difference.

BTW: the failure shows as a small difference in the results, not a crash. Correct run:

Patients: 102515
Time:     33730654
Visits:   106371

Failed run:

Patients: 102515
Time:     33730469
Visits:   106370

@rainers
Member

rainers commented Jul 20, 2019

If this passes and we disable fork in the default configuration, it would be good to add some test coverage by running the unittests with the forking GC. See test/gc/Makefile for similar runs.

@wilzbach
Contributor

Ping @FraMecca. Any chance you can bring this to the finish line?

@FraMecca
Author

@wilzbach I am still working from time to time on understanding the source of the bug.
Right now I can't allocate much time to it; if you need to close it, do it, and I'll reopen when I have a solution.

If there is something I can do in the meantime let me know.

@ambyjkl

ambyjkl commented May 11, 2020

Happy Birthday to this PR :(. @FraMecca any luck with the bug?

@dlang-bot dlang-bot added Needs Rebase needs a `git rebase` performed Needs Work stalled labels May 27, 2021
dlang-bot pushed a commit that referenced this pull request Aug 1, 2021
#2604 rebased - fork() based garbage collector

Signed-off-by: Razvan Nitu <RazvanN7@users.noreply.github.com>
Merged-on-behalf-of: unknown
@RazvanN7
Contributor

RazvanN7 commented Aug 31, 2021

I'm going to close this since #3514 was merged.

@RazvanN7 RazvanN7 closed this Aug 31, 2021
@thewilsonator
Contributor

#2604 is this PR... you mean #3514

@RazvanN7
Contributor

RazvanN7 commented Aug 31, 2021

Yes, I misread the link description. Thanks!
