Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git-tempfile clobbers signal handlers installed by tokio::signal #336

Closed
alexjg opened this issue Feb 16, 2022 · 10 comments
Closed

git-tempfile clobbers signal handlers installed by tokio::signal #336

alexjg opened this issue Feb 16, 2022 · 10 comments

Comments

@alexjg
Copy link

alexjg commented Feb 16, 2022

Given the following code:

use git_tempfile::{ContainingDirectory, AutoRemove};

#[tokio::main]
async fn main() {

    install_signal_handlers();
    let _tempfile = git_tempfile::new("thetemp", ContainingDirectory::Exists, AutoRemove::Tempfile).unwrap();
    futures::future::pending::<()>().await;
}

fn install_signal_handlers() {
    use tokio::signal::unix::{signal, SignalKind};

    let mut sigterm = signal(SignalKind::terminate()).unwrap();
    let mut sigint = signal(SignalKind::interrupt()).unwrap();

    tokio::spawn(async move {
        futures::future::select(Box::pin(sigterm.recv()), Box::pin(sigint.recv())).await;
        println!("got sigterm or sigint");
        std::process::exit(5);
    });
}

I would expect "got sigterm or sigint" to be printed on exit - and indeed that's what happens if I omit the _tempfile = .. line. With the tempfile however the signal handler doesn't get called.

I can't tell if this is a bug in git-tempfile, signal_hook or tokio::signal.

Note I'm using the following Cargo.toml

[package]
name = "tempfile-signals"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
git-tempfile = "^1.0.6"
tokio = { version = "1.2", features = [ "macros", "process", "signal", "time", "rt-multi-thread" ] }
futures = "^0.3.21"
@alexjg
Copy link
Author

alexjg commented Feb 16, 2022

On further investigation it looks like calling git_tempfile::force_setup(SignalHandlerMode::None) solves the problem.

@alexjg alexjg closed this as completed Feb 16, 2022
@Byron Byron reopened this Feb 16, 2022
@Byron
Copy link
Member

Byron commented Feb 17, 2022

I don't know if reopening the issue is the right choice, but it's meant to indicate that I am looking at it.

The goal of this accidentally surprising default behaviour is to assure tempfiles are deleted on interrupt automatically without the need for configuration. Meddling with signals from a library is dangerous business and this issue is a good example for that, it's definitely something I'd like to tackle.

Using git_tempfile::force_setup(SignalHandlerMode::None) seems to solve the issue, but it also means that tempfiles remain on interrupt, making this crate barely any more useful than the tempfile crate it relies on.

Can you try using git_tempfile::force_setup(SignalHandlerMode::DeleteTempfilesOnTermination) instead?

I think what's happening here by default is the following:

  • tokio installs its signal handlers
  • git-tempfile installs its signal handlers, signal-hook will call it right after the one tokio installed.
  • on signal, and since the default behaviour of git-tempfile is DeleteTempfilesOnTerminationAndRestoreDefaultBehaviour, when called it will emulate the default behaviour which is to abort the process. That's why one won't get to see the println!(…) - the process doesn't get to continue as the signal handler aborts.

The reason for this default behaviour is to avoid installing a handler that won't abort, which is also surprising in its own right.

The only way to not make it surprising is to clearly document that an initialization call is needed before creating any tempfile, and to avoid automatic signal handler setup.

Byron added a commit that referenced this issue Feb 17, 2022
The previous behaviour is meant to be convenient for the casual
user even though it
ends up being surprising when used in applications that install
their own signal handlers and need more control over how the program
shuts down.

This is now fixed by **requiring an explicit `setup()`** call before
the first tempfile is created, which makes it a breaking change.
Byron added a commit that referenced this issue Feb 17, 2022
@Byron
Copy link
Member

Byron commented Feb 17, 2022

About 6 months after the first stable release, it's time for the first breaking change to fix this issue.

Here is the changelog.

@Byron Byron closed this as completed Feb 17, 2022
@kim
Copy link
Contributor

kim commented Feb 17, 2022

Can you make it so other gitoxide crates which are using git-tempfile can be set up to not use signal handlers? Otherwise the fix doesn't really fix anything.

@kim
Copy link
Contributor

kim commented Feb 17, 2022

What I mean is: a feature flag or something which guarantees that no signal handling is happening at all.

@Byron
Copy link
Member

Byron commented Feb 17, 2022

Staying on the topic of not having automatically registered signal handlers interfere with an application, the latest released git-repository v0.14 already acts unsurprisingly by requiring the application to initialize handlers explicitly. In this case, git-tempfile v1.0 is integrated as well. This behavior goes back to git-repository v0.7, with prior versions not depending on signal-hook at all.

As of 90b1c42 only git-tempfile and git-repository depend on signal-hook and both require explicit setup calls to register them. There is no reason to change that back to the previous behavior either.

Otherwise the fix doesn't really fix anything.

You may be referring to another issue that isn't clear to me.

What I mean is: a feature flag or something which guarantees that no signal handling is happening at all.

Despite trying I have trouble making a strong case for it. The one I could come up with is applications wanting to use a different signal handling crate than signal-hook, which appears to be dominant with over 10m downloads. Another one could be to not use signal handlers at all, which is too undesirable as such applications would definitely leave locks when asked to shutdown. Maybe there can be an application that is locked in to an older version of signal-hook which won't compile alongside the latest one used here, and when that issue arises I'd be happy to help.

@kim
Copy link
Contributor

kim commented Feb 17, 2022

only git-tempfile and git-repository depend on signal-hook

According to cargo tree, we're pulling in git-tempfile through git-pack, git-ref, git-protocol and git-odb.

The signal handling code in git-tempfile is obviously unsound, and I would prefer if it was possible to statically prevent it from being entered, without the backdoor of users being able to manipulate a global variable.

@Byron
Copy link
Member

Byron commented Feb 18, 2022

By this definition of soundness it would probably be in both of our interest to fix the 'obviously unsound' way of handling signals in case you want to elaborate on that. Even if signal-handling would end up being removed entirely from this crate it would certainly be good to know how to prevent this cause of unsoundness in future.

For reference, here is how signals are registered in git-tempfile, and here is a similar call in signal-hook itself. This is the only use of unsafe to call the unsafe functions in signal-hook's low_level module. Lazy assures the code is called exactly once.

[..] without the backdoor of users being able to manipulate a global variable.

If users with access to global variables (or memory?) within a process are part of the threat model, isn't it time to wave the white flag? Maybe tracing shouldn't be used either?

@kim
Copy link
Contributor

kim commented Feb 18, 2022

You don't seem to be familiar with signal-safety(7), otherwise you wouldn't compare a logging library to code which can deadlock inside a signal handler. Since this behaviour can be triggered by anything in the dependency graph which happens to call setup first, any invariants we may have about our code can be violated easily (as demonstrated by this issue). This taints every library upstream of git-tempfile. Signal handling belongs to the application layer, as close to main as possible.

Together with your general resistance to feedback and having your code audited by someone else before it gets released publicly, I fear that this means gitoxide is just not something one would want in their dependency graph.

@Byron
Copy link
Member

Byron commented Feb 19, 2022

You don't seem to be familiar with signal-safety(7), otherwise you wouldn't compare a logging library to code which can deadlock inside a signal handler.

Until know I wasn't aware that the issue is about the claim the handler can deadlock, but about users being able to manipulate a global variable..

Based on the conversation here a fix was provided in 81ed5f5 which was released in v1.0.6. Unfortunately there was never any hard evidence that it actually would deadlock so it's certainly no more than a claim that it's fixed either. Is this what you mean by code which can deadlock inside a signal handler? The certainty of it is astounding.

To get some data, I created a stress test in 076c4ff, made it part of CI and tried my best to find a deadlock. Unfortunately, even on v1.0.5 which should still be able to deadlock, the program came up with nothing. Fortunately, the same is true for the most recent version which contains the fix.

For some numbers, here is the output of the program:

OK: survived 120s without deadlock with 20463 tempfiles created, lock obtained 811235865 times, cleanup handler ran 63049 times

Since this behaviour can be triggered by anything in the dependency graph which happens to call setup first, any invariants we may have about our code can be violated easily (as demonstrated by this issue). This taints every library upstream of git-tempfile. Signal handling belongs to the application layer, as close to main as possible.

That's so helpful and is definitely not an issue I saw thus far, but I can see it now. A library calling setup(<cleanup and abort>/Default::default()) will be an issue (and only that). Thanks for elaborating. That would have been great to have right after the first comment ending on Otherwise the fix doesn't really fix anything..

In order to prevent that from happening, there seems to be only one way that fixes it for good: the removal of the code that installs signal handlers entirely. Otherwise there is no way to prevent upstream library crates to turn on the signal-hook feature initially suggested and install handlers in a way that aborts. I couldn't agree more that such code should only run in the application crate. There will be a follow up to this to make additional breaking changes when the next release window opens.

Together with your general resistance to feedback […]

Feedback must be actionable, and to me that requires a shared understanding of what the issue is. My replies on a formerly closed issue show that I intend to get the information I need to make it actionable and improve gitoxide in the process. Thus I whole-heartedly reject this claim.

[…] and having your code audited by someone else before it gets released publicly […]

I am not opposed to reviews and have moved to a PR based workflow a while ago to make comments on what's about to land possible. Since there seems to be an expectation to conduct audits that prevent releases which is new to me, may I suggest to open a discussion so we sort out a workflow for that?


Generally I don't like the tone conversations with you end up having with you @kim, and while acknowledging that it could all be me I truly hope this can change towards more constructive and efficient interactions. Right now they are making me emotionally unhappy, leave a bitter taste, and seem inefficient.

A pattern I see is that it starts out with requests along vague claims I can't follow. Here it's Otherwise the fix doesn't really fix anything. that just makes me wonder what that thing is that isn't fixed. What follows is a vague suggestion for taking action, a feature flag or something, which would help to unlock value which guarantees that no signal handling is happening at all.

After I follow up to understand why that should be, I am met with yet another claim like: The signal handling code in git-tempfile is obviously unsound, and trying to find out why that is leads to no answer but more claims like You don't seem to be familiar with signal-safety(7) and and me displaying a general resistance to feedback.

Even though the interaction finally unearthed a very valid concern that I am happy to address, it was awfully hard and frustrating to get to, costing time and nerves probably not only on my end.

What would help me is if suggestions or requests would be accompanied by their motivation. Maybe assuming best intention and mistakes without directing blame can help with that and keeping that neutral.

What is it that I could do differently?

If we can't figure this out, then I do presume it's for the best to indeed cull gitoxide crates from your dependency tree to remove any reason of interacting with each other in a text-only setting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants