This repository was archived by the owner on Oct 12, 2022. It is now read-only.
dlang/druntime (public archive)

#2604 rebased - fork() based garbage collector #3514

Merged
dlang-bot merged 8 commits into dlang:master from rainers:saoc on Aug 1, 2021

@rainers
Member

@rainers rainers commented Jul 13, 2021

Attempt to rebase #2604 to see whether it still fails

@dlang-bot
Contributor

Thanks for your pull request, @rainers!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment set up, you can use Digger to test this PR:

dub run digger -- build "master + druntime#3514"

@RazvanN7
Contributor

Thanks for doing this, @rainers! Fingers crossed! Things look good so far.

@ibuclaw
Member

ibuclaw commented Jul 13, 2021

https://sourceware.org/bugzilla/show_bug.cgi?id=4737

In glibc 2.34 we now have _Fork (c32c868ab8b2b), which is an async-signal-safe fork() replacement added by Austin Group issue 62 [1]. The Austin defect also dropped the async-signal-safe requirement for fork, so currently there is no plan to try to make fork() async-signal-safe.
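For context on why fork() is not async-signal-safe: fork() runs the handlers registered with pthread_atfork() around creating the child, while glibc's _Fork() is documented to skip them (and not to reset internal locks). If such a handler, or libc's own fork bookkeeping, needs a lock held by a thread that the GC has already suspended, the fork can hang, which is one plausible explanation for test runs that time out. The following is a minimal sketch, assuming Linux/glibc and POSIX threads, that only demonstrates that fork() executes the prepare and parent handlers; all names in it are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Counters bumped by the atfork handlers, as seen by the parent. */
static int prepare_calls = 0;
static int parent_calls = 0;

/* Runs in the parent just before the child is created. */
static void on_prepare(void) { prepare_calls++; }
/* Runs in the parent just after the child is created. */
static void on_parent(void) { parent_calls++; }

/* Registers the handlers, forks once and returns the child's exit status
   (or -1 on error). Afterwards the counters show that fork() ran them. */
int fork_with_handlers(void)
{
    if (pthread_atfork(on_prepare, on_parent, NULL) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(7);                 /* child: only async-signal-safe calls */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling fork_with_handlers() once should return 7 and leave both counters at 1; with _Fork() the handlers would not run at all.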

@rainers
Member Author

rainers commented Jul 13, 2021

@ibuclaw Thanks for the link. It seems the fork issue might actually be causing the failures (most seem to be deadlocking and are terminated by timeout).

glibc 2.34 doesn't seem to have been released yet, and it might take some time before it is available in major distros. Do you think it would be possible to emulate what _Fork does now, i.e. do the appropriate syscall directly (at least on Linux)?

@ibuclaw
Member

ibuclaw commented Jul 13, 2021

Possibly https://github.com/bminor/glibc/blob/08cbcd4dbc686bb38ec3093aff2f919fbff5ec17/sysdeps/unix/sysv/linux/arch-fork.h

@rainers
Member Author

rainers commented Jul 16, 2021

I have tried to imitate that for Linux/x86_64. Let's see if this platform passes now.
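The linked arch-fork.h creates the child by issuing the clone system call directly, bypassing the fork() wrapper. A minimal sketch of that idea, assuming Linux on x86_64 (where the raw argument order is flags, stack, parent TID, child TID, TLS); raw_fork() and run_raw_child() are hypothetical names, and this is not necessarily what the PR itself does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork-like process creation through the raw clone system call, so glibc's
   fork() wrapper (and its pthread_atfork handlers) is not involved.
   With a zero stack pointer the child continues on its own copy-on-write
   copy of the caller's stack, just like after fork(). */
static pid_t raw_fork(void)
{
    return (pid_t)syscall(SYS_clone, (long)SIGCHLD, 0L, 0L, 0L, 0L);
}

/* Creates a child with raw_fork(), lets it exit with `code`, and returns
   the exit status observed by the parent (or -1 on error). */
int run_raw_child(int code)
{
    pid_t pid = raw_fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);              /* child: stick to async-signal-safe calls */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

On x86_64 Linux, run_raw_child(42) should return 42. As far as I can tell, glibc's own arch_fork() additionally passes CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID so the child's thread descriptor records its new TID; this sketch passes only SIGCHLD, so the child should avoid pthread functions and do nothing but async-signal-safe work.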

@rainers rainers force-pushed the saoc branch 2 times, most recently from d52d679 to f784ff8 on July 18, 2021 14:26
@rainers
Member Author

rainers commented Jul 18, 2021

@FraMecca Hi. In an attempt to resurrect your efforts I rebased your PR and squashed your commits here. I hope that is ok with you. Or should I push to your PR #2604 instead?

@rainers
Member Author

rainers commented Jul 20, 2021

This has passed all tests with the forking GC enabled, so it should be ready for review.

  • Forking is only enabled by default here so that all tests exercise it; it should be disabled by default.
  • Parallel marking is disabled when forking. In principle it might be interesting to mark in parallel inside the forked process, but that means the additional threads have to be started for every collection, which is not optimal. In addition, their number would have to be limited somehow, as using all CPUs for marking defeats the purpose of the concurrent GC, i.e. keeping the application running.
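With marking done in the forked process, the application only has to check now and then whether the child has finished. A minimal sketch of that polling pattern, assuming POSIX fork() and waitpid() with WNOHANG; collection_done(), mutate_while_marking() and sleep_ms() are hypothetical names, the child merely sleeps in place of real marking, and druntime is not confirmed to work exactly this way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sleeps for roughly `ms` milliseconds (stand-in for real work). */
static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Returns 1 if the marking child finished successfully, 0 if it is still
   running, -1 on error. Never blocks, so the mutator keeps going. */
static int collection_done(pid_t pid)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == 0)
        return 0;                         /* child still marking */
    if (r != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : -1;
}

/* Starts a stand-in "marking" child and keeps doing mutator work until it
   is done. Returns the units of work completed (>= 1), or -1 on error. */
long mutate_while_marking(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        sleep_ms(20);                     /* pretend to mark for ~20 ms */
        _exit(0);
    }
    long work = 0;
    int done;
    do {
        work++;                           /* application keeps running */
        sleep_ms(1);
    } while ((done = collection_done(pid)) == 0);
    return done == 1 ? work : -1;
}
```

The returned count is simply how much "application" work overlapped the collection; in a stop-the-world collector that work would have waited.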

Running the GC benchmarks (in a VM with 4 cores on an i5-2865U), I get

MIN conalloc         1.557 s,   153 MB,   24 GC  148 ms, Pauses   39 ms <    2 ms
MIN conappend        0.977 s,    91 MB,   34 GC   78 ms, Pauses   38 ms <    2 ms
MIN concpu           1.458 s,   181 MB,   23 GC  132 ms, Pauses   34 ms <    3 ms
MIN conmsg           4.930 s,   176 MB,    7 GC   21 ms, Pauses   11 ms <    2 ms
MIN dlist            6.088 s,   145 MB,    8 GC   19 ms, Pauses    7 ms <    1 ms
MIN huge_single      0.007 s,  1504 MB,    2 GC    0 ms, Pauses    0 ms <    0 ms
MIN rand_large       4.610 s,  1113 MB, 1751 GC  865 ms, Pauses  238 ms <    0 ms
MIN rand_small      24.385 s,  2443 MB,   46 GC 1018 ms, Pauses  481 ms <   23 ms
MIN slist            6.173 s,   145 MB,    8 GC   18 ms, Pauses    7 ms <    1 ms
MIN testgc3          3.196 s,   425 MB,    5 GC   19 ms, Pauses    7 ms <    5 ms
MIN tree1            1.382 s,   117 MB,    8 GC    9 ms, Pauses    5 ms <    1 ms
MIN tree2            4.362 s,    92 MB,    7 GC    6 ms, Pauses    3 ms <    0 ms
MIN vdparser         7.434 s,   145 MB,    8 GC   23 ms, Pauses    7 ms <    1 ms
MIN words            3.303 s,   386 MB,    3 GC    1 ms, Pauses    0 ms <    0 ms

The last value on each line is the most interesting one: the maximum pause time. Compare this with the non-forking GC (with parallel marking disabled):

MIN conalloc         1.657 s,    12 MB, 1085 GC  448 ms, Pauses  442 ms <    1 ms
MIN conappend        0.857 s,    12 MB,  707 GC  281 ms, Pauses  276 ms <    2 ms
MIN concpu           2.033 s,    12 MB, 1166 GC  808 ms, Pauses  803 ms <    3 ms
MIN conmsg           4.088 s,    51 MB,   31 GC   96 ms, Pauses   90 ms <    5 ms
MIN dlist            5.792 s,    22 MB,   35 GC  160 ms, Pauses  158 ms <    5 ms
MIN huge_single      0.008 s,  1504 MB,    2 GC    0 ms, Pauses    0 ms <    0 ms
MIN rand_large       5.346 s,   171 MB, 8027 GC 2143 ms, Pauses 1690 ms <    1 ms
MIN rand_small      11.547 s,    12 MB, 7637 GC 2318 ms, Pauses 1924 ms <    1 ms
MIN slist            5.712 s,    12 MB,  101 GC  131 ms, Pauses  128 ms <    2 ms
MIN testgc3          2.988 s,   258 MB,   10 GC  232 ms, Pauses  222 ms <   47 ms
MIN tree1            1.203 s,     5 MB,   64 GC   27 ms, Pauses   26 ms <    1 ms
MIN tree2            4.238 s,     1 MB,  216 GC   48 ms, Pauses   47 ms <    0 ms
MIN vdparser         6.746 s,    70 MB,   13 GC  121 ms, Pauses  117 ms <   17 ms
MIN words            3.157 s,   358 MB,    5 GC   11 ms, Pauses    9 ms <    8 ms

For reference, these are the results with parallel marking:

MIN conalloc         1.299 s,    12 MB,  680 GC  183 ms, Pauses  179 ms <    2 ms
MIN conappend        0.817 s,    12 MB,  746 GC  211 ms, Pauses  206 ms <    1 ms
MIN concpu           1.953 s,    12 MB, 1281 GC  675 ms, Pauses  669 ms <   12 ms
MIN conmsg           3.975 s,    22 MB,   84 GC  102 ms, Pauses   97 ms <    3 ms
MIN dlist            5.750 s,    22 MB,   35 GC  277 ms, Pauses  274 ms <   13 ms
MIN huge_single      0.007 s,  1504 MB,    2 GC    0 ms, Pauses    0 ms <    0 ms
MIN rand_large       4.614 s,   171 MB, 7858 GC 1438 ms, Pauses  952 ms <    1 ms
MIN rand_small      11.495 s,    12 MB, 7637 GC 2131 ms, Pauses 1710 ms <    0 ms
MIN slist            5.784 s,    12 MB,  101 GC  198 ms, Pauses  196 ms <    3 ms
MIN testgc3          3.031 s,   258 MB,   10 GC  223 ms, Pauses  212 ms <   37 ms
MIN tree1            1.226 s,     5 MB,   64 GC   28 ms, Pauses   25 ms <    1 ms
MIN tree2            4.158 s,     1 MB,  216 GC   30 ms, Pauses   28 ms <    0 ms
MIN vdparser         6.762 s,    70 MB,   13 GC   79 ms, Pauses   74 ms <   10 ms
MIN words            3.115 s,   358 MB,    5 GC   16 ms, Pauses   14 ms <   13 ms

Observations:

  • The forking GC greatly reduces pause time.
  • It can need quite a bit more memory, as allocation continues while a concurrent collection is running. Benchmarks can be pretty bad in this regard if allocating is all they do.
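To put a number on the first observation, the total pause times ("Pauses" column) from the three runs above can be summed. This is only arithmetic on the posted numbers, with the arrays copied from the tables in benchmark order; pause_sum() and count_lower() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Total pause time in ms per benchmark, copied from the three runs above in
   the order conalloc, conappend, concpu, conmsg, dlist, huge_single,
   rand_large, rand_small, slist, testgc3, tree1, tree2, vdparser, words. */
static const int fork_pauses[]     = { 39,  38,  34, 11,   7, 0,  238,  481,   7,   7,  5,  3,   7,  0 };
static const int nofork_pauses[]   = {442, 276, 803, 90, 158, 0, 1690, 1924, 128, 222, 26, 47, 117,  9 };
static const int parallel_pauses[] = {179, 206, 669, 97, 274, 0,  952, 1710, 196, 212, 25, 28,  74, 14 };

#define NBENCH (sizeof fork_pauses / sizeof fork_pauses[0])

/* Sums the per-benchmark pause times. */
static int pause_sum(const int *ms, size_t n)
{
    assert(ms != NULL);
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += ms[i];
    return total;
}

/* Counts benchmarks where run `a` paused strictly less than run `b`. */
static size_t count_lower(const int *a, const int *b, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] < b[i])
            count++;
    return count;
}
```

The sums are 877 ms with forking, 5932 ms without forking and 4636 ms with parallel marking, so forking cuts the summed pause time to roughly 15% of the non-forking GC, and it is lower in 13 of the 14 benchmarks (huge_single pauses 0 ms in every run). The maximum pause (last column) is not uniformly lower, though: rand_small reports < 23 ms with forking versus < 1 ms without.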

@kinke
Contributor

kinke commented Jul 24, 2021

fork:

MIN rand_small 24.385 s, 2443 MB, 46 GC 1018 ms, Pauses 481 ms < 23 ms

parallel marking:

MIN rand_small 11.495 s, 12 MB, 7637 GC 2131 ms, Pauses 1710 ms < 0 ms

Does this mean the forking GC increases the runtime by > 100% and requires more than 200x as much RAM?!

@rainers
Member Author

rainers commented Jul 24, 2021

I guess this is an extreme example made worse by running in a VM. All the test does is allocate random chunks of memory. That's a pattern that isn't handled very well, as there is no throttling of the allocation rate.

GC time is quite a bit less than without forking, but overall run time is usually worse. I suspect this is the penalty for COW, which is probably exaggerated by the VM. It would be nice if someone could test this on a native Linux system.
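The COW penalty comes from the kernel copying every page the application writes to while the forked child still shares it; the same mechanism is what gives the child a consistent snapshot to mark. A toy sketch, assuming POSIX fork() and pipe(), where a static array stands in for the heap and snapshot_mark() is a hypothetical name; it is not druntime's marking code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NOBJ 1024

/* Toy "heap": 1 marks a live object, 0 a free slot. */
static unsigned char heap[NOBJ];

/* Forks a child that counts live slots in its copy-on-write snapshot while
   the parent keeps mutating its own copy. Returns the child's count (or -1
   on error) and stores the parent's post-mutation count in *parent_count. */
int snapshot_mark(int *parent_count)
{
    assert(parent_count != NULL);
    for (size_t i = 0; i < NOBJ; i++)
        heap[i] = (i % 2 == 0);              /* 512 live objects */

    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                          /* child: scan the snapshot */
        close(fds[0]);
        int marked = 0;
        for (size_t i = 0; i < NOBJ; i++)
            marked += heap[i];
        ssize_t w = write(fds[1], &marked, sizeof marked);
        _exit(w == (ssize_t)sizeof marked ? 0 : 1);
    }

    close(fds[1]);
    /* Mutator keeps running: the first write to a shared page makes the
       kernel copy it, so these writes never reach the child's snapshot. */
    for (size_t i = 0; i < NOBJ; i++)
        heap[i] = 1;
    int count = 0;
    for (size_t i = 0; i < NOBJ; i++)
        count += heap[i];
    *parent_count = count;

    int marked = -1;
    ssize_t r = read(fds[0], &marked, sizeof marked);
    close(fds[0]);
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0 || r != (ssize_t)sizeof marked)
        return -1;
    return marked;
}
```

The child reports 512 live slots even though the parent has set all 1024 by the time it reads the pipe. Every page the parent touched in that window had to be duplicated, which is where the extra run time and memory in the benchmarks can come from.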

@rainers rainers force-pushed the saoc branch 6 times, most recently from bce649e to 869303d on July 24, 2021 17:05
@rainers
Member Author

rainers commented Jul 25, 2021

Hi @llucax, you might be interested in this, too.
I've (hopefully) worked around the deadlock issue by using clone() instead of fork(). At least the tests have been passing rather consistently for a week now.

@llucax

llucax commented Jul 25, 2021

Hi @rainers, thanks! As @ibuclaw said, I recently got a notification about the new _Fork and automatically thought about this :)
I had the feeling there was a reason why clone wasn't a real solution, but now I can't remember why (I think some other interaction with libc). If tests are running fine, I guess this is really good news!

I'll have a look when I have some time, at least at the clone part; the rest is what @FraMecca did, and I already reviewed it during SAOC.

@llucax

llucax commented Jul 25, 2021

Regarding @rainers' note that overall run time is usually worse with forking: in my experiments (which are almost 11 years old 😱), I noticed that in real programs (actually I only tested one real program) the total runtime was lower, which was unexpected. COW and the other overhead should make the total runtime worse, but forking also adds concurrency for free, at least for single-threaded programs (so this benefit might only show up in such programs). If, instead of waiting for the GC to finish, the program keeps doing the computation it is supposed to do, it finishes earlier. This is what I observed with that particular program, and with some benchmarks that follow the same pattern: in the numbers above, conalloc and concpu run faster than with the non-forking GC, and concpu even beats the parallel-marking version.

So which GC to use may depend a lot on your particular program; there is no silver bullet. Hopefully the default GC is good enough for the majority of cases, but when you run into walls, you might well benefit from trying different GC configurations.

For reference, this is the 11-year-old blog post where I did the first GC benchmarks of the fully concurrent GC; dil, the real program, is a D parser/compiler:
https://llucax.com/blog/blog/post/-1a4bdfba

@RazvanN7
Contributor

RazvanN7 commented Aug 1, 2021

Everyone OK with merging this?

@rainers
Member Author

rainers commented Aug 1, 2021

It now needs a rebase after merging #3523 though...

@dlang-bot dlang-bot added the Needs Rebase (needs a `git rebase` performed) and Needs Work labels Aug 1, 2021
@dlang-bot dlang-bot removed the Needs Rebase (needs a `git rebase` performed) and Needs Work labels Aug 1, 2021
@rainers
Member Author

rainers commented Aug 1, 2021

Done.

@RazvanN7 RazvanN7 added the auto-merge-squash label and removed the "72h no objection -> merge" label (the PR will be merged if there are no objections raised) Aug 1, 2021
@dlang-bot dlang-bot merged commit 0cfc798 into dlang:master Aug 1, 2021
@llucax

llucax commented Aug 1, 2021

Amazing!!! Thanks @rainers and @FraMecca for doing this! This brings closure to a piece of work I did with a lot of love and dedication 11 years ago, and finally makes it available to a wider audience. I really hope it is useful :)

@rainers
Member Author

rainers commented Aug 2, 2021

Thanks to @llucax and @FraMecca for doing most of the work here.

The precise GC took only about 7 years to land in master, but I'm also not certain how much it is actually used outside of Visual D ;-)

rainers added a commit to rainers/dmd that referenced this pull request Aug 15, 2021
- it doesn't test any specific code generation, but rather puts pressure on the GC
- it has been added to druntime in dlang/druntime#3514 as it uncovered issues for the concurrent GC
- it uses phobos
ibuclaw pushed a commit to dlang/dmd that referenced this pull request Aug 15, 2021
UplinkCoder pushed a commit to UplinkCoder/dmd that referenced this pull request Aug 19, 2021
kinke pushed a commit to ldc-developers/dmd-testsuite that referenced this pull request Sep 6, 2021