Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable GC during coroutine execution + test #7725

Merged
merged 6 commits into from
Mar 8, 2023

Conversation

yorickvP
Copy link
Contributor

Motivation

Regression test for #7679

Context

Without @roberth's boehmgc-coroutine-sp-fallback patch, nix mostly works but sometimes segfaults. This PR provides a test that should catch this bug.

Checklist for maintainers

Maintainers: tick if completed or explain if not relevant

  • agreed on idea
  • agreed on implementation strategy
  • tests, as appropriate
    • functional tests - tests/**.sh
    • unit tests - src/*/tests
    • integration tests - tests/nixos/*
  • documentation in the manual
  • code and comments are self-explanatory
  • commit message explains why the change was made
  • new feature or bug fix: updated release notes

@yorickvP
Copy link
Contributor Author

I'll investigate macos tomorrow.

#include "serialise.hh"


#define guard_gc(x) GC_register_finalizer((void*)x, finalizer, x##_collected, nullptr, nullptr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be guarded with #if HAVE_BOEHMGC?

@roberth
Copy link
Member

roberth commented Jan 31, 2023

I'll investigate macos tomorrow.

I think the patch for the darwin c file never worked. Its regular stack code is different, so the linux patch doesn't translate to it, and the alt stack that was patched is quite irrelevant; used very shortly during signal handling only.

@yorickvP
Copy link
Contributor Author

yorickvP commented Feb 3, 2023

I've resorted to disabling GC before entering a coroutine on mac OS. I hope that's an acceptable workaround for now.

@yorickvP yorickvP force-pushed the check-coro-gc branch 2 times, most recently from f32e765 to e8f5588 Compare February 7, 2023 14:34
Copy link
Member

@thufschmitt thufschmitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few local suggestions, but looks good overall 👍

Note that this isn't really correct in that you could have interleaved coroutines, in which case the first to finish will restart the GC while the other is still running. But it's good-enough for now (we could certainly do something with a counter or a smart use of a shared_ptr, but let's not over-think it).

public:
DisableGC() {};
virtual ~DisableGC() {};
static std::shared_ptr<DisableGC> (*create)();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static std::shared_ptr<DisableGC> (*create)();
static std::function<std::shared_ptr<DisableGC>()> create;

Doesn't change too much, but let's make this more idiomatic C++

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't want std::function here. It's quite an expensive wrapper and allows more weird stuff (like functors and std::bind and lambda captures), so using an old function pointer is probably more typesafe for now.

See https://stackoverflow.com/a/9054802

/* Disabling GC when entering a coroutine (on macos).
::create is to avoid boehm gc dependency in libutil.
*/
class DisableGC {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we maybe make this a slightly more generic CoroutineContext or something like that? Just so that

  • this layer doesn't have to bother about the GC
  • Rather than having all these #if __APPLE__ in serialise.cc we can just unconditionally call CoroutineContext.create() and put this #if inside the constructor/destructor of the BoehmDisableGC class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I do want to keep it obvious what it's doing and why, though. I've added the disablegc call inside the coroutine definition now.
I considered having a generic hook system (with std::list and everything) so anywhere in the code can add arbitrary coroutine context hooks, but this seems overkill, slower and less clear.

src/libexpr/tests/coro-gc.cc Show resolved Hide resolved
@yorickvP
Copy link
Contributor Author

GC_enable/disable actually does the counting itself. Boehm only enables GC when the amount of calls to enable and disable are equal.

@yorickvP yorickvP requested review from thufschmitt and removed request for edolstra February 10, 2023 15:19
@roberth
Copy link
Member

roberth commented Feb 12, 2023

Before overengineering this workaround, we might consider fixing the problem, which is detecting whether the stack pointer is in the given stack and marking the all the stack sections of the segmented darwin stack if it is not. That'd be the equivalent of the linux solution which works quite well.

That's not to mention that the linux solution isn't the final solution either, but we Nix maintainers don't have the bandwidth to make architectural changes to bdwgc.
If you or anyone finds this kind of thing a nice challenge, I could provide a bit more context to the problem, which is currently spread between a number of issues here and on the ivmai/bdwgc repo.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-44/25546/1

@roberth roberth self-assigned this Feb 26, 2023
@yorickvP
Copy link
Contributor Author

yorickvP commented Mar 1, 2023

I've added a #define to the patch and disable GC in a coroutine when this is not active. However, the tests now succeed when the patch is not applied, which was the entire point of this PR.

@yorickvP
Copy link
Contributor Author

yorickvP commented Mar 1, 2023

I would suggest a ./configure warning if gc.h is missing this define, but also, I don't want to touch autoconf today. Maybe tomorrow.

@roberth
Copy link
Member

roberth commented Mar 2, 2023

However, the tests now succeed when the patch is not applied, which was the entire point of this PR.

That's fine, the main requirement is that it doesn't crash. I don't think we have very long running coroutines, unless you're filtering a huge source in terms of number of assessed files, so the likelihood seems quite small that we'll request more memory instead of collecting, for more than once or twice in a row.
Maybe we should leave a lvlTalkative log message so that it doesn't go entirely unnoticed to someone who's troubleshooting their eval. If we have some numbers that say that a GC would be desirable while blocked, we could log a warning, but that might be difficult and result in overengineering and/or useless warnings.

@roberth
Copy link
Member

roberth commented Mar 2, 2023

Maybe we should

... not over-engineer this, because we may be able to implement the coroutine integration properly in the foreseeable future.

@yorickvP
Copy link
Contributor Author

yorickvP commented Mar 3, 2023

Alright, added this message.

@roberth roberth changed the title tests/coro-gc: create test for boehm stack patch Disable GC during coroutine execution + test Mar 7, 2023
@thufschmitt thufschmitt merged commit 4a6244d into NixOS:master Mar 8, 2023
@edolstra
Copy link
Member

edolstra commented Mar 8, 2023

This is causing some test failures: https://hydra.nixos.org/build/211782650, https://hydra.nixos.org/build/211782591

@yorickvP
Copy link
Contributor Author

yorickvP commented Mar 8, 2023

Ah, great, musl and i686. Hmm. Please revert and I'll debug that.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-45/26397/1

thufschmitt pushed a commit to tweag/nix that referenced this pull request Apr 7, 2023
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-team-report-2022-10-2023-03/27486/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

6 participants