#2604 rebased - fork() based garbage collector by rainers · Pull Request #3514 · dlang/druntime

rainers · 2021-07-13T06:38:42Z

Attempt to rebase #2604 to see whether it still fails

dlang-bot · 2021-07-13T06:38:45Z

Thanks for your pull request, @rainers!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + druntime#3514"

RazvanN7 · 2021-07-13T09:05:07Z

Thanks for doing this, @rainers! Fingers crossed! Things look good at the moment.

ibuclaw · 2021-07-13T18:54:54Z

https://sourceware.org/bugzilla/show_bug.cgi?id=4737

On 2.34 we now have _Fork (c32c868ab8b2b), which is an async-signal-safe fork() replacement added by Austin Group issue 62 [1]. The Austin defect also dropped the async-signal-safe requirement for fork, so currently there is no plan to try to make fork() async-signal-safe.

rainers · 2021-07-13T19:42:06Z

@ibuclaw Thanks for the link. It seems the fork issue might actually be causing the failures (most seem to be deadlocking and are terminated by timeout).

libc 2.34 doesn't seem to be released yet and might take some time before being available in major distros. Do you think it would be possible now to emulate what _Fork does, i.e. do the appropriate syscall (at least on linux)?

ibuclaw · 2021-07-13T20:01:23Z

> @ibuclaw Thanks for the link. It seems the fork issue might actually be causing the failures (most seem to be deadlocking and are terminated by timeout).
>
> libc 2.34 doesn't seem to be released yet and might take some time before being available in major distros. Do you think it would be possible now to emulate what _Fork does, i.e. do the appropriate syscall (at least on linux)?

Possibly https://github.com/bminor/glibc/blob/08cbcd4dbc686bb38ec3093aff2f919fbff5ec17/sysdeps/unix/sysv/linux/arch-fork.h

rainers · 2021-07-16T21:13:26Z

> Possibly https://github.com/bminor/glibc/blob/08cbcd4dbc686bb38ec3093aff2f919fbff5ec17/sysdeps/unix/sysv/linux/arch-fork.h

I have tried to imitate that for Linux/x86_64. Let's see if this platform passes now.

rainers · 2021-07-18T14:29:12Z

@FraMecca Hi. In an attempt to resurrect your efforts I rebased your PR and squashed your commits here. I hope that is ok with you. Should I rather push to your PR #2604 instead?

rainers · 2021-07-20T16:36:36Z

This has passed all tests with the forking GC enabled, so should be ready for review.

- Forking was only enabled by default to exercise it in all tests; it should be disabled by default.
- Parallel marking is disabled when forking. In principle it might be interesting to do this in the forked process, but that means the additional threads have to be started every time, which is not optimal. In addition, they would have to be limited somehow, as using all CPUs for marking defeats the purpose of the concurrent GC, i.e. keeping the application running.

Running the GC benchmarks (in a VM with 4 cores on an i5-2865U), I get

MIN conalloc         1.557 s,   153 MB,   24 GC  148 ms, Pauses   39 ms <    2 ms
MIN conappend        0.977 s,    91 MB,   34 GC   78 ms, Pauses   38 ms <    2 ms
MIN concpu           1.458 s,   181 MB,   23 GC  132 ms, Pauses   34 ms <    3 ms
MIN conmsg           4.930 s,   176 MB,    7 GC   21 ms, Pauses   11 ms <    2 ms
MIN dlist            6.088 s,   145 MB,    8 GC   19 ms, Pauses    7 ms <    1 ms
MIN huge_single      0.007 s,  1504 MB,    2 GC    0 ms, Pauses    0 ms <    0 ms
MIN rand_large       4.610 s,  1113 MB, 1751 GC  865 ms, Pauses  238 ms <    0 ms
MIN rand_small      24.385 s,  2443 MB,   46 GC 1018 ms, Pauses  481 ms <   23 ms
MIN slist            6.173 s,   145 MB,    8 GC   18 ms, Pauses    7 ms <    1 ms
MIN testgc3          3.196 s,   425 MB,    5 GC   19 ms, Pauses    7 ms <    5 ms
MIN tree1            1.382 s,   117 MB,    8 GC    9 ms, Pauses    5 ms <    1 ms
MIN tree2            4.362 s,    92 MB,    7 GC    6 ms, Pauses    3 ms <    0 ms
MIN vdparser         7.434 s,   145 MB,    8 GC   23 ms, Pauses    7 ms <    1 ms
MIN words            3.303 s,   386 MB,    3 GC    1 ms, Pauses    0 ms <    0 ms

The last value on each line is the most interesting one: the maximum pause time. Compare this with the non-forking GC (with parallel marking disabled):

MIN conalloc         1.657 s,    12 MB, 1085 GC  448 ms, Pauses  442 ms <    1 ms
MIN conappend        0.857 s,    12 MB,  707 GC  281 ms, Pauses  276 ms <    2 ms
MIN concpu           2.033 s,    12 MB, 1166 GC  808 ms, Pauses  803 ms <    3 ms
MIN conmsg           4.088 s,    51 MB,   31 GC   96 ms, Pauses   90 ms <    5 ms
MIN dlist            5.792 s,    22 MB,   35 GC  160 ms, Pauses  158 ms <    5 ms
MIN huge_single      0.008 s,  1504 MB,    2 GC    0 ms, Pauses    0 ms <    0 ms
MIN rand_large       5.346 s,   171 MB, 8027 GC 2143 ms, Pauses 1690 ms <    1 ms
MIN rand_small      11.547 s,    12 MB, 7637 GC 2318 ms, Pauses 1924 ms <    1 ms
MIN slist            5.712 s,    12 MB,  101 GC  131 ms, Pauses  128 ms <    2 ms
MIN testgc3          2.988 s,   258 MB,   10 GC  232 ms, Pauses  222 ms <   47 ms
MIN tree1            1.203 s,     5 MB,   64 GC   27 ms, Pauses   26 ms <    1 ms
MIN tree2            4.238 s,     1 MB,  216 GC   48 ms, Pauses   47 ms <    0 ms
MIN vdparser         6.746 s,    70 MB,   13 GC  121 ms, Pauses  117 ms <   17 ms
MIN words            3.157 s,   358 MB,    5 GC   11 ms, Pauses    9 ms <    8 ms

For reference, these are the results with parallel marking:

MIN conalloc         1.299 s,    12 MB,  680 GC  183 ms, Pauses  179 ms <    2 ms
MIN conappend        0.817 s,    12 MB,  746 GC  211 ms, Pauses  206 ms <    1 ms
MIN concpu           1.953 s,    12 MB, 1281 GC  675 ms, Pauses  669 ms <   12 ms
MIN conmsg           3.975 s,    22 MB,   84 GC  102 ms, Pauses   97 ms <    3 ms
MIN dlist            5.750 s,    22 MB,   35 GC  277 ms, Pauses  274 ms <   13 ms
MIN huge_single      0.007 s,  1504 MB,    2 GC    0 ms, Pauses    0 ms <    0 ms
MIN rand_large       4.614 s,   171 MB, 7858 GC 1438 ms, Pauses  952 ms <    1 ms
MIN rand_small      11.495 s,    12 MB, 7637 GC 2131 ms, Pauses 1710 ms <    0 ms
MIN slist            5.784 s,    12 MB,  101 GC  198 ms, Pauses  196 ms <    3 ms
MIN testgc3          3.031 s,   258 MB,   10 GC  223 ms, Pauses  212 ms <   37 ms
MIN tree1            1.226 s,     5 MB,   64 GC   28 ms, Pauses   25 ms <    1 ms
MIN tree2            4.158 s,     1 MB,  216 GC   30 ms, Pauses   28 ms <    0 ms
MIN vdparser         6.762 s,    70 MB,   13 GC   79 ms, Pauses   74 ms <   10 ms
MIN words            3.115 s,   358 MB,    5 GC   16 ms, Pauses   14 ms <   13 ms
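As a rough way to read the three tables side by side, here is a small Python sketch that compares the "Pauses" column for a few benchmarks. The numbers are copied from the tables above; the names `RESULTS` and `total_pause_ratio` are made up for this illustration, and it assumes the first "Pauses" value is the summed pause time and the value after `<` is the maximum pause.

```python
# (total pause ms, max pause ms) per configuration, copied from the tables above.
# "serial" = non-forking GC with parallel marking disabled.
RESULTS = {
    "conalloc": {"fork": (39, 2), "serial": (442, 1), "parallel": (179, 2)},
    "concpu": {"fork": (34, 3), "serial": (803, 3), "parallel": (669, 12)},
    "testgc3": {"fork": (7, 5), "serial": (222, 47), "parallel": (212, 37)},
    "vdparser": {"fork": (7, 1), "serial": (117, 17), "parallel": (74, 10)},
}


def total_pause_ratio(name, baseline="serial"):
    """Return baseline total pause divided by the forking GC's total pause."""
    return RESULTS[name][baseline][0] / RESULTS[name]["fork"][0]


for name in RESULTS:
    print(
        f"{name:10s} total pause is {total_pause_ratio(name):5.1f}x lower than serial, "
        f"{total_pause_ratio(name, 'parallel'):5.1f}x lower than parallel"
    )
```

Only four benchmarks are included to keep the sketch short; the same comparison can be made for any other row.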

Observations:

- The forking GC very much reduces pause time.
- It can need quite a bit more memory, as allocation continues while a concurrent collection is running. Benchmarks can be pretty bad in this regard if allocating is all they do.
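The two observations follow from how a fork-based collection behaves: the child works on a copy-on-write snapshot of the heap taken at fork time, while the parent keeps running and allocating. The toy Python sketch below illustrates only that snapshot behaviour on a POSIX system. It is not the druntime implementation; `mark_in_child` is a hypothetical name, summing a list stands in for marking, and the pipe is just this sketch's way of returning a result.

```python
import os
import struct


def mark_in_child(heap):
    """Fork, let the child "mark" (here: sum) a snapshot of heap, and
    return (result seen by the child, result seen by the parent)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: sees the parent's memory as it was at the moment of fork().
        os.close(read_fd)
        try:
            os.write(write_fd, struct.pack("q", sum(heap)))
        finally:
            os._exit(0)  # leave without running the parent's cleanup code

    # Parent: keeps running and "allocating" while the child works.
    os.close(write_fd)
    heap.append(100)  # not visible to the child's snapshot
    payload = os.read(read_fd, 8)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return struct.unpack("q", payload)[0], sum(heap)
```

Calling `mark_in_child([1, 2, 3])` returns the child's view of the list before the append and the parent's view after it. Whatever the parent allocates during the collection cannot be reclaimed until a later cycle, which is one way to read the higher memory figures in the forking table.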

kinke · 2021-07-24T12:35:02Z

fork:

MIN rand_small 24.385 s, 2443 MB, 46 GC 1018 ms, Pauses 481 ms < 23 ms

parallel marking:

MIN rand_small 11.495 s, 12 MB, 7637 GC 2131 ms, Pauses 1710 ms < 0 ms

Does this mean the forking GC increases the runtime by > 100% and requires more than 200x as much RAM?!

rainers · 2021-07-24T13:04:04Z

> Does this mean the forking GC increases the runtime by > 100% and requires more than 200x as much RAM?!

I guess this is an extreme example made worse by running in a VM. All the test does is allocate random chunks of memory. That's a pattern that isn't handled very well, as there is no throttling of the allocation rate.

GC time is quite a bit less than without forking, but overall run time is usually worse. I suspect this is the penalty for COW, which is probably exaggerated by the VM. It would be nice if someone could test this on a native Linux system.
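To put numbers on the rand_small comparison quoted above, here is a small Python sketch that parses the two benchmark lines and divides the forking figures by the parallel-marking ones. The regular expression and the names `LINE`, `parse` and `ratios` are assumptions made for this illustration; they are not part of the benchmark runner.

```python
import re

# The two rand_small lines quoted above.
FORK = "MIN rand_small 24.385 s, 2443 MB, 46 GC 1018 ms, Pauses 481 ms < 23 ms"
PARALLEL = "MIN rand_small 11.495 s, 12 MB, 7637 GC 2131 ms, Pauses 1710 ms < 0 ms"

# Matches one line of the benchmark output shown in this thread.
LINE = re.compile(
    r"MIN\s+(?P<name>\S+)\s+(?P<run_s>[\d.]+)\s+s,\s+(?P<mem_mb>\d+)\s+MB,"
    r"\s+(?P<collections>\d+)\s+GC\s+(?P<gc_ms>\d+)\s+ms,"
    r"\s+Pauses\s+(?P<pause_ms>\d+)\s+ms\s+<\s+(?P<max_pause_ms>\d+)\s+ms"
)


def parse(line):
    """Return (benchmark name, {field: value}) for one result line."""
    match = LINE.search(line)
    if match is None:
        raise ValueError(f"not a benchmark line: {line!r}")
    fields = match.groupdict()
    name = fields.pop("name")
    return name, {key: float(value) for key, value in fields.items()}


def ratios(fork_line, other_line):
    """Divide each forking figure by the other run's figure, skipping zeros."""
    _, fork = parse(fork_line)
    _, other = parse(other_line)
    return {key: fork[key] / other[key] for key in fork if other[key] != 0}


for key, value in ratios(FORK, PARALLEL).items():
    print(f"{key:12s} {value:8.2f}x")
```

For these two lines the run time ratio comes out a little above 2 and the memory ratio a little above 200, while total GC time and total pause time are both well below 1.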

rainers · 2021-07-25T06:53:49Z

Hi @llucax, you might be interested in this, too.
I've (hopefully) worked around the deadlock issue by using clone() instead of fork(). At least tests are passing rather consistently for a week now.

llucax · 2021-07-25T10:22:26Z

> Hi @llucax, you might be interested in this, too.
> I've (hopefully) worked around the deadlock issue by using clone() instead of fork(). At least tests are passing rather consistently for a week now.

Hi @rainers, thanks! As @ibuclaw said, I recently got a notification about the new _Fork and automatically thought about this :)
I had the feeling there was a reason why clone wasn't a real solution, but now I can't remember why (I think some other interactions with libc), but if tests are running fine, I guess this is really good news!

I'll have a look when I have some time, at least at the clone part; the rest, if it's what @FraMecca did, I already reviewed during SAOC.

llucax · 2021-07-25T10:22:41Z

> I guess this is an extreme example made worse by running in a VM. All the test does is allocating random chunks of memory. That's a pattern that isn't handled very well as there is no throttling of the allocation rate.
>
> GC time is quite a bit less than without forking, but overall run time is usually worse. I suspect this is the penalty for COW, which is probably exaggerated by the VM. It would be nice if someone could test this on a native linux system.

I want to add that in my experiments (which are almost 11 years old 😱) I noticed that in real programs (actually I only tested one real program) the total runtime was lower, which was unexpected. The COW and other overhead should make the total runtime worse, but forking also adds concurrency for free, at least for single-threaded programs (so this benefit might only be present in such programs). If, instead of waiting for the GC to finish, the program keeps doing the calculations it is supposed to do, it will finish earlier. This is what I observed with this particular program and with some other benchmarks that follow this pattern. You can see it in the benchmarks above: conalloc and concpu have better runtimes than with the non-forking GC, and concpu still beats the parallel marking version.

So which GC to use might depend a lot on your particular program, there is no silver bullet. Hopefully the default GC should be good enough for the majority of cases, but when you are bumping into walls, you certainly might benefit from trying different GC configurations.

For reference, this is the 11-year-old blog post where I did the first GC benchmarks of the fully concurrent GC; dil is the real program, a D parser/compiler:
https://llucax.com/blog/blog/post/-1a4bdfba

RazvanN7 · 2021-08-01T10:54:39Z

Everyone OK with merging this?

rainers · 2021-08-01T12:50:09Z

It now needs a rebase after merging #3523 though...

rainers · 2021-08-01T17:15:51Z

> It now needs a rebase after merging #3523 though...

Done.

llucax · 2021-08-01T21:03:12Z

Amazing!!! Thanks @rainers and @FraMecca for doing this! This gives closure to a piece of work I did with a lot of love and dedication, already 11 years ago and makes it finally available for a wider audience. I really hope it is useful :)

rainers · 2021-08-02T05:57:00Z

Thanks to @llucax and @FraMecca for doing most of the work here.

The precise GC took only about 7 years to land in master, but I'm also not certain how much it is actually used outside of Visual D ;-)

- it doesn't test any specific code generation, but rather puts pressure on the GC
- it has been added to druntime in dlang/druntime#3514 as it uncovered issues for the concurrent GC
- it uses phobos

rainers requested review from CyberShadow, DmitryOlshansky, MartinNowak, WalterBright, andralex, jmdavis, schveiguy and wilzbach as code owners July 13, 2021 06:38

rainers force-pushed the saoc branch from 7d3e3ba to 3dfaea3 Compare July 13, 2021 07:19

rainers force-pushed the saoc branch 2 times, most recently from d52d679 to f784ff8 Compare July 18, 2021 14:26

rainers force-pushed the saoc branch 6 times, most recently from bce649e to 869303d Compare July 24, 2021 17:05

dlang-bot added Needs Rebase needs a `git rebase` performed Needs Work labels Aug 1, 2021

Francesco Mecca and others added 8 commits August 1, 2021 19:11

GC: add concurrent collection based on fork()

1c44260

use atomic operation when accessing shared mark data

9a9a945

use clone() instead of fork()

dfc155f

add hospital.d from the dmd test suite as an explicit test for the forking GC

46cfa1b

disable parallel scanning in the concurrent GC, the threads are created in the wrong process

4cdc0e7

add changelog

10ff8a2

try using __fork() on macos

51f0caa

disable forking by default, add gc unittests with forking GC

44442b4

rainers force-pushed the saoc branch from 370ea3d to 44442b4 Compare August 1, 2021 17:14

dlang-bot removed Needs Rebase needs a `git rebase` performed Needs Work labels Aug 1, 2021

RazvanN7 added auto-merge-squash and removed 72h no objection -> merge The PR will be merged if there are no objections raised. labels Aug 1, 2021

dlang-bot merged commit 0cfc798 into dlang:master Aug 1, 2021

rainers mentioned this pull request Aug 15, 2021

[cleanup] remove hospital.d from compiler test suite dlang/dmd#12979

Merged

thewilsonator mentioned this pull request Aug 31, 2021

SAOC: fork() based garbage collector #2604

Closed

This was referenced Sep 9, 2025

druntime: Only use async-signal safe fork in GC dlang/dmd#21831

Merged

druntime: fork based garbage collector and async safety dlang/dmd#21834

Closed

7 participants
