Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

appender: fix WorkerGuard not waiting for writer destruction #1713

Merged
merged 7 commits into from
Nov 13, 2021

Conversation

trtt
Copy link
Contributor

@trtt trtt commented Nov 10, 2021

Motivation

Can be though of as a continuation to #1120 and #1125.

Example with problematic racy behavior:

use std::io::Write;

struct TestDrop<T: Write>(T);

impl<T: Write> Drop for TestDrop<T> {
    fn drop(&mut self) {
        println!("Dropped");
    }
}

impl<T: Write> Write for TestDrop<T> {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        self.0.write(buf)
    }
    fn flush(&mut self) -> std::io::Result<()> {
        self.0.flush()
    }
}

fn main() {
    let writer = TestDrop(std::io::stdout());
    let (non_blocking, _guard) = tracing_appender::non_blocking(writer);
    tracing_subscriber::fmt().with_writer(non_blocking).init();
}

Running this test case in a loop with while ./test | grep Dropped; do done, it can be seen that sometimes writer (TestDrop) is not dropped and the message is not printed.
I suppose that proper destruction of non-blocking writer should properly destroy underlying writer.

Solution

Solution involves joining Worker thread (that owns writer) after waiting for it to almost finish avoiding potential deadlock (see #1120 (comment))

@trtt trtt requested a review from a team as a code owner November 10, 2021 10:51
tracing-appender/src/non_blocking.rs Outdated Show resolved Hide resolved
@@ -277,7 +277,16 @@ impl Drop for WorkerGuard {
// when the `Worker` calls `recv()` on a zero-capacity channel. Use `send_timeout`
// so that drop is not blocked indefinitely.
// TODO: Make timeout configurable.
let _ = self.shutdown.send_timeout((), Duration::from_millis(1000));
match self.shutdown.send_timeout((), Duration::from_millis(1000)) {
Err(SendTimeoutError::Disconnected(_)) => (),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we get the Disconnected error, this means that the Receiver side of the channel has already been dropped...but that doesn't necessarily mean that the worker thread has terminated yet. should we still call join on the JoinHandle in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we get the Disconnected error, this means that the Receiver side of the channel has already been dropped...but that doesn't necessarily mean that the worker thread has terminated yet.

right, this can be racy also

If Disconnected can occur only during worker thread (which owns Receiver) destruction, I think it is safe to use join here.
Went ahead and merged Disconnected with Ok case.

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me overall! i commented on a couple of additional things that might be worth thinking about.

Comment on lines -84 to -86
if let Err(e) = self.writer.flush() {
eprintln!("Failed to flush. Error: {}", e);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm assuming the flush here was removed because we will already always flush on shutdown in Worker::work()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's why I felt safe to remove it

but the actual reason was that it could block (or introduce undefined delay)(#1125 (comment)) when joining the worker. To be honest, I would've just joined the thread, but the IO argument does make sense and I just followed with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realised that within this logic it might be a good idea to drop writer as well: 45da512. There could be some IO there in destructor.

tracing-appender/src/non_blocking.rs Outdated Show resolved Hide resolved
@hawkw hawkw merged commit b439705 into tokio-rs:master Nov 13, 2021
davidbarsky pushed a commit that referenced this pull request Nov 17, 2023
Can be though of as a continuation to #1120 and #1125.

Example with problematic racy behavior:
```
use std::io::Write;

struct TestDrop<T: Write>(T);

impl<T: Write> Drop for TestDrop<T> {
    fn drop(&mut self) {
        println!("Dropped");
    }
}

impl<T: Write> Write for TestDrop<T> {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        self.0.write(buf)
    }
    fn flush(&mut self) -> std::io::Result<()> {
        self.0.flush()
    }
}

fn main() {
    let writer = TestDrop(std::io::stdout());
    let (non_blocking, _guard) = tracing_appender::non_blocking(writer);
    tracing_subscriber::fmt().with_writer(non_blocking).init();
}
```

Running this test case in a loop with `while ./test | grep Dropped; do
done`, it can be seen that sometimes writer (`TestDrop`) is not dropped
and the message is not printed. I suppose that proper destruction of
non-blocking writer should properly destroy underlying writer.

Solution involves joining `Worker` thread (that owns writer) after
waiting for it to almost finish avoiding potential deadlock (see
#1120 (comment))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants