This repository was archived by the owner on Oct 12, 2022. It is now read-only.
dlang/druntime (public archive)


SAOC: fork() based garbage collector #2604

Closed
FraMecca wants to merge 49 commits into dlang:master from FraMecca:saoc

Conversation

@FraMecca

Full benchmarks here:
https://gist.github.com/FraMecca/ee50c9f4d1dc16d115dcddce8000cf9f
As with the threaded version, the other benchmark programs from druntime are not indicative.

Benchmark results for vdparser_extended:

fork enabled, average of 20 runs

Elapsed (wall clock) time (h:mm:ss or m:ss)      12.6355 seconds
GC summary                                     1454.2 MB
Grand total GC time                             186.5 ms
Max Pause Time                                   65.45 ms
Number of collections                             9 
Total GC prep time                               20.550 ms
Total mark time                                 106.25 ms
Total sweep time                                 58.80 ms

standard, average of 20 runs:

Elapsed (wall clock) time (h:mm:ss or m:ss)      14.363 seconds
GC summary                                     1355.0 MB
Grand total GC time                            2629.30 ms
Max Pause Time                                  969.85 ms
Number of collections                            14 
Total GC prep time                               24.950 ms
Total mark time                                2560.05 ms
Total sweep time                                 43.350 ms

Benchmark results for Dustmite --force ../../druntime/src/ 'grep -nR HAVE_FORK':

fork enabled, average of 20 runs:

Elapsed (wall clock) time (h:mm:ss or m:ss)     18.04 seconds
GC summary                                     477 MB
Grand total GC time                             69.60 ms
Max Pause Time                                  24.50 ms
Number of collections                            6
Total GC prep time                               2.05 ms
Total mark time                                 38.85 ms
Total sweep time                                27.350 ms

standard, average of 20 runs:

Elapsed (wall clock) time (h:mm:ss or m:ss)     18.1285  seconds
GC summary                                     477 MB
Grand total GC time                            475.45 ms
Max Pause Time                                 260.75 ms
Number of collections                            9
Total GC prep time                               2 ms
Total mark time                                452.35 ms
Total sweep time                                19.85 ms

Francesco Mecca added 4 commits May 11, 2019 15:01
This is everything necessary to write the core of the logic for the
concurrent mark.
Also changed some function calls to reflect their signatures
@dlang-bot
Contributor

dlang-bot commented May 11, 2019

Thanks for your pull request and interest in making D better, @FraMecca! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + druntime#2604"

@wilzbach
Contributor

Remove the trailing whitespace for the CI to pass:

src/gc/impl/conservative/gc.d:2700:						break; // bogus
src/gc/impl/conservative/gc.d:2701:					default: // the parent
src/gc/impl/conservative/gc.d:2723:							disableFork();
posix.mak:348: recipe for target 'checkwhitespace' failed

Member

@rainers rainers left a comment


Good stuff! See comments for some nits/questions.


printf("GC options are specified as white space separated assignments:
disable:0|1 - start disabled (%d)
fork:0|1 - set fork behaviour (disabled by default) (%d)
Member


Please align the dashes (-).

Member


The user will probably not care how it's implemented (i.e. that it uses fork), but rather selects the GC to be concurrent. So maybe the option should be the latter. Alternatively, it could also just be another GC named concurrent, but how would that combine with precise?


I think it is not crazy to expect that other concurrent GC implementations could come in the future, especially since fork doesn't exist on Windows and the like, so maybe it is more future-proof to keep the option specific. This will also hint to the user that the OS where it is used must support fork for it to work :)

@@ -43,6 +43,7 @@ import core.gc.gcinterface;
import rt.util.container.treap;
Member


I think you should define a version (e.g. COLLECT_FORKED) that is set only for supporting platforms. Most code should only be enabled with that version.

Member


Or use static if (HaveFork).
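The gating discussed here can be illustrated with a minimal C sketch; the HAVE_FORK macro and gc_mode function below are hypothetical names for illustration, not druntime code.

```c
/* Minimal C sketch of the compile-time gating discussed above: build the
 * fork-collector paths only on platforms that have fork(), analogous to
 * version(COLLECT_FORK) / static if (HaveFork) in D.
 * HAVE_FORK and gc_mode are hypothetical names, not druntime identifiers. */
#if defined(__unix__) || defined(__APPLE__)
#  define HAVE_FORK 1
#else
#  define HAVE_FORK 0
#endif

static const char *gc_mode(void)
{
#if HAVE_FORK
    return "concurrent (fork) collector available";
#else
    return "stop-the-world collector only";
#endif
}
```

The point of a single version/macro is that unsupported platforms never even compile the fork-specific code, instead of failing at run time.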


// minimize() should be called only after a call to fullcollect
// terminates with a sweep
if (doMinimize || lowMem)
Member


Can doMinimize be converted to a function parameter instead of a field? It's unlikely to actually find completely unused pools, so it's probably not too bad if it is skipped sometimes.

Author


I don't think it can, because it needs to be set when:

  • lowMem
  • tryAllocNewPool fails

We could avoid minimizing after a failed tryAllocNewPool and call minimize() only on lowMem (at least for the forking GC).

Member


Maybe it just needs a better name, e.g. minimizeAfterNextCollection.

freebits.alloc(nbits);
freebits.setRange(0, nbits);
}
freebits.setAll();
Member


This is already done with the freebits.setRange(0, nbits); call above.

Author


I get a sigsegv if I replace that call with freebits.setRange(0, nbits);
setAll does this: memset(data, 0xFF, nwords * wordtype.sizeof)

Member


freebits is only allocated for SmallObjectPool, but setAll is called unconditionally. I suspect setRange doesn't work as expected with nbits == 0.

Member


Remove this line or replace the freebits.setRange call above with it.

Author


Why do you want me to remove that line?
It leaks without it, while setRange generates a sigsegv because it does a different thing.
I can put it behind an else statement so that it gets called for small pools only.

Member

@rainers rainers Jun 18, 2019


freebits is empty in the else case, so setAll does the same as setRange for small object pools.
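The point can be illustrated with a small C sketch (hypothetical names, not the druntime GCBits code): a setAll implemented as a memset over the allocated words is naturally a no-op for an empty bitset, which is why it is safe to call unconditionally, whereas range-based code computing word/bit indices from (0, nbits) must handle nbits == 0 explicitly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C sketch, not the actual druntime GCBits code:
 * set_all writes every allocated word, so on an empty bitset
 * (nwords == 0, data == NULL) it touches nothing at all. */
typedef struct {
    uint64_t *data;   /* NULL when no bits were allocated */
    size_t    nwords; /* number of 64-bit words backing the bitset */
} bitset;

static void set_all(bitset *b)
{
    if (b->nwords)    /* guard keeps memset(NULL, ..., 0) out of UB territory */
        memset(b->data, 0xFF, b->nwords * sizeof(uint64_t));
}
```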

@FraMecca
Author

I cannot reproduce this. Maybe it is related to FreeBSD

hospital.d seems another candidate that I have seen in a number of test failures. But there are others in druntime unittest and elsewhere, too.

Maybe try using atomic operations for the mark bits as discussed, i.e. mark.setLocked and mark!(precise, true).

Parallel marking is enabled by default, so the CI tests should run using markParallel, which uses mark.setLocked.

In any case, I can't reproduce the failures on my machine so I don't know how to proceed with debugging CI issues.
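A rough C11 sketch of the difference at stake (hypothetical names; the actual code is D's GCBits with mark.set / mark.setLocked): a plain |= on a shared mark word is a non-atomic read-modify-write and can lose bits under parallel marking, while an atomic fetch-or cannot.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical C11 sketch, not the druntime D code: illustrates why a
 * plain read-modify-write on a shared mark word races under parallel
 * marking while an atomic fetch-or ("setLocked"-style) does not. */

/* racy: two threads doing this concurrently on the same word can lose a
 * bit, because the load and the store are separate steps */
static void set_plain(uint64_t *word, unsigned bit)
{
    *word |= (uint64_t)1 << bit;
}

/* atomic: the whole read-modify-write is one indivisible operation;
 * returns nonzero if the bit was already set */
static int set_locked(_Atomic uint64_t *word, unsigned bit)
{
    uint64_t mask = (uint64_t)1 << bit;
    return (atomic_fetch_or(word, mask) & mask) != 0;
}
```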

@rainers
Member

rainers commented Jul 12, 2019

parallel is enabled by default so CI test should run using markParallel that uses mark.setLocked.

But the mark.set calls in concurrent allocations are not atomic.

@rainers
Member

rainers commented Jul 13, 2019

In any case, I can't reproduce the failures on my machine so I don't know how to proceed with debugging CI issues.

I've set up a VM with Ubuntu 19 and observed that the std.stdio unittest is running into a ForkError on termination. Maybe this has to do with the unittest also using fork to delete temporary files.

The GC can ignore ECHILD because that means the child termination was
already caught by the user (with a call to wait(0))
@FraMecca
Author

FraMecca commented Jul 15, 2019

In any case, I can't reproduce the failures on my machine so I don't know how to proceed with debugging CI issues.

I've setup a VM with Ubuntu 19 and observe that the std.stdio unittest is running into a ForkError on termination. Maybe this has to do with the unittest also using fork to delete temporary files.

This prompted me to open a PR to Phobos (dlang/phobos#7111), but also to recognize that the user may erroneously catch the GC's child termination, so we should ignore ECHILD and acknowledge the end of the concurrent mark phase: 6af3c10
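The handling described above can be sketched in C (hypothetical helper name; the real change is in druntime's D source): if waitpid on the forked mark process fails with ECHILD, the child was already reaped elsewhere, so the GC can treat the mark phase as finished rather than raising a ForkError.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical C sketch of the logic described above, not the actual
 * druntime code: reap the forked mark child, treating ECHILD as success
 * because it means the child was already reaped (e.g. by a user-level
 * wait(0)). Returns 0 when the mark phase can be considered finished. */
static int reap_mark_child(pid_t child)
{
    int status;
    pid_t r = waitpid(child, &status, 0);
    if (r == child)
        return 0;                 /* normal termination of the mark child */
    if (r == (pid_t)-1 && errno == ECHILD)
        return 0;                 /* already reaped by the user: not an error */
    return -1;                    /* genuine failure */
}
```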

@rainers
Member

rainers commented Jul 17, 2019

I can now reproduce failures by executing ./run.d -f runnable/hospital.d a couple of times in dmd/test.

The failures seem to go away if I disable either forking or parallel marking. Changing the single call to pool.mark.set during allocation to setLocked didn't help.

fork:0|1 - set fork behaviour (%d)
profile:0|1|2 - enable profiling with summary when terminating program (%d)
gc:".ptr, disable, profile);
gc:".ptr, disable, profile, fork);
Member


That's the wrong order now: fork is in the middle of the format string but last in the argument list.

}

mark.Dtor();
mark.Dtor(config.fork);
Member


Nit: this is an unconditional call, while mark.alloc uses version(COLLECT_FORK). They should be consistent; probably simpler to leave the evaluation of the share parameter to GCbits and remove the version condition for mark.alloc.

@rainers
Member

rainers commented Jul 20, 2019

I made a couple more experiments with hospital.d:

  • using static linkage instead of shared seems to reduce the probability of a failure (2 out of 100 runs instead of about 10 out of 100).
  • using fewer cores for parallel marking also reduces the chances of failure.
  • adding logging during collection (COLLECT_PRINTF) shows differences between runs, but that is to be expected given the concurrency.
  • adding logging to the allocation (PRINTF) seems to make the issue go away. I suspect that's because the slowdown causes less concurrent accesses.
  • using setLocked instead of set doesn't make a notable difference.

BTW: the failure shows as a small difference in the results, not a crash. Correct run:

Patients: 102515
Time:     33730654
Visits:   106371

Failed run:

Patients: 102515
Time:     33730469
Visits:   106370

@rainers
Member

rainers commented Jul 20, 2019

If this passes and we disable fork in the default configuration, it would be good to add some test coverage by running the unittests with the forking GC. See test/gc/Makefile for similar runs.

@wilzbach
Contributor

Ping @FraMecca. Any chance you can bring this to the finish line?

@FraMecca
Author

@wilzbach I am still working from time to time on understanding the source of the bug.
Right now I can't allocate much time to it; if you need to close it, do it, and I'll reopen when I have a solution.

If there is something I can do in the meantime let me know.

@ambyjkl

ambyjkl commented May 11, 2020

Happy Birthday to this PR :(. @FraMecca any luck with the bug?

@dlang-bot dlang-bot added Needs Rebase needs a `git rebase` performed Needs Work stalled labels May 27, 2021
dlang-bot pushed a commit that referenced this pull request Aug 1, 2021
#2604 rebased - fork() based garbage collector

Signed-off-by: Razvan Nitu <RazvanN7@users.noreply.github.com>
Merged-on-behalf-of: unknown
@RazvanN7
Contributor

RazvanN7 commented Aug 31, 2021

I'm going to close this since #3514 was merged.

@RazvanN7 RazvanN7 closed this Aug 31, 2021
@thewilsonator
Contributor

#2604 is this PR... you mean #3514

@RazvanN7
Contributor

RazvanN7 commented Aug 31, 2021

Yes, I misread the link description. Thanks!
