appender: fix WorkerGuard not waiting for writer destruction #1713

trtt · 2021-11-10T10:51:54Z

Motivation

Can be thought of as a continuation of #1120 and #1125.

Example with problematic racy behavior:

use std::io::Write;

struct TestDrop<T: Write>(T);

impl<T: Write> Drop for TestDrop<T> {
    fn drop(&mut self) {
        println!("Dropped");
    }
}

impl<T: Write> Write for TestDrop<T> {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        self.0.write(buf)
    }
    fn flush(&mut self) -> std::io::Result<()> {
        self.0.flush()
    }
}

fn main() {
    let writer = TestDrop(std::io::stdout());
    let (non_blocking, _guard) = tracing_appender::non_blocking(writer);
    tracing_subscriber::fmt().with_writer(non_blocking).init();
}

Running this test case in a loop with while ./test | grep Dropped; do done shows that the writer (TestDrop) is sometimes not dropped, so the message is not printed.
I believe that properly destroying the non-blocking writer should also destroy the underlying writer.

Solution

The solution is to join the Worker thread (which owns the writer) after waiting for it to almost finish, which avoids a potential deadlock (see #1120 (comment)).
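To make the idea concrete, here is a minimal sketch of the join-on-drop pattern using only the standard library. It is not the code from this PR: `Guard`, `Msg`, `FlagOnDrop`, `spawn_worker` and `writer_dropped_after_guard` are hypothetical names, and a blocking `send` on a std rendezvous channel stands in for the `send_timeout` call on the zero-capacity shutdown channel that the real `WorkerGuard` uses.

```rust
use std::io::{self, Write};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, sync_channel, Sender, SyncSender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical writer that records its own destruction in a shared flag.
struct FlagOnDrop(Arc<AtomicBool>);

impl Write for FlagOnDrop {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        Ok(buf.len()) // discard the bytes; only the drop matters here
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl Drop for FlagOnDrop {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

enum Msg {
    Line(Vec<u8>),
    Shutdown,
}

/// Hypothetical stand-in for `WorkerGuard`.
struct Guard {
    sender: Sender<Msg>,
    shutdown: SyncSender<()>,
    handle: Option<JoinHandle<()>>,
}

impl Drop for Guard {
    fn drop(&mut self) {
        let _ = self.sender.send(Msg::Shutdown);
        // Rendezvous: returns once the worker reaches `recv()`, or with an
        // error if the worker has already gone away. (Assumption: the real
        // code bounds this wait with a timeout; this sketch does not.)
        let _ = self.shutdown.send(());
        // Joining is what guarantees the writer has been dropped.
        if let Some(handle) = self.handle.take() {
            if handle.join().is_err() {
                eprintln!("worker thread panicked");
            }
        }
    }
}

fn spawn_worker<W: Write + Send + 'static>(mut writer: W) -> Guard {
    let (sender, receiver) = channel::<Msg>();
    let (shutdown, on_shutdown) = sync_channel::<()>(0);
    let handle = thread::spawn(move || {
        // Write lines until a shutdown message arrives or the sender is gone.
        while let Ok(Msg::Line(bytes)) = receiver.recv() {
            let _ = writer.write_all(&bytes);
        }
        let _ = writer.flush();
        let _ = on_shutdown.recv();
        // `writer` is dropped here, when the closure returns.
    });
    Guard { sender, shutdown, handle: Some(handle) }
}

/// Drops the guard and reports whether the writer was dropped by then.
fn writer_dropped_after_guard() -> bool {
    let dropped = Arc::new(AtomicBool::new(false));
    let guard = spawn_worker(FlagOnDrop(Arc::clone(&dropped)));
    let _ = guard.sender.send(Msg::Line(b"hello\n".to_vec()));
    drop(guard);
    dropped.load(Ordering::SeqCst)
}

fn main() {
    println!("writer dropped: {}", writer_dropped_after_guard());
}
```

Without the `join` call, the flag could still be unset when `drop(guard)` returns, because the rendezvous only proves that the worker reached `recv()`, not that its closure has returned.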

hawkw · 2021-11-11T21:26:48Z

tracing-appender/src/non_blocking.rs

@@ -277,7 +277,16 @@ impl Drop for WorkerGuard {
 // when the `Worker` calls `recv()` on a zero-capacity channel. Use `send_timeout`
 // so that drop is not blocked indefinitely.
 // TODO: Make timeout configurable.
- let _ = self.shutdown.send_timeout((), Duration::from_millis(1000));
+ match self.shutdown.send_timeout((), Duration::from_millis(1000)) {
+ Err(SendTimeoutError::Disconnected(_)) => (),


if we get the Disconnected error, this means that the Receiver side of the channel has already been dropped...but that doesn't necessarily mean that the worker thread has terminated yet. should we still call join on the JoinHandle in this case?

> if we get the Disconnected error, this means that the Receiver side of the channel has already been dropped...but that doesn't necessarily mean that the worker thread has terminated yet.

right, this can also be racy

If Disconnected can occur only while the worker thread (which owns the Receiver) is being torn down, I think it is safe to call join here.
I went ahead and merged the Disconnected case with the Ok case.
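A rough sketch of that branching, using only the standard library, might look as follows. std's `SyncSender` has no `send_timeout`, so the hypothetical helper `send_with_timeout` polls `try_send` instead. The `Rendezvous` enum, `shut_down` and `demo` are likewise illustrative names, not part of tracing-appender.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Hypothetical outcome of the bounded rendezvous with the worker.
#[derive(Debug, PartialEq)]
enum Rendezvous {
    Delivered,
    Disconnected,
    TimedOut,
}

/// Polling stand-in for a `send_timeout` on a zero-capacity channel.
fn send_with_timeout(tx: &SyncSender<()>, timeout: Duration) -> Rendezvous {
    let deadline = Instant::now() + timeout;
    loop {
        match tx.try_send(()) {
            Ok(()) => return Rendezvous::Delivered,
            Err(TrySendError::Disconnected(())) => return Rendezvous::Disconnected,
            Err(TrySendError::Full(())) if Instant::now() >= deadline => {
                return Rendezvous::TimedOut
            }
            // No receiver is waiting yet: back off briefly and retry.
            Err(TrySendError::Full(())) => thread::sleep(Duration::from_millis(1)),
        }
    }
}

/// Returns `true` if the worker thread was joined.
fn shut_down(tx: SyncSender<()>, handle: JoinHandle<()>, timeout: Duration) -> bool {
    match send_with_timeout(&tx, timeout) {
        // Disconnected is treated like Ok: the receiver is gone, so the
        // worker is exiting, and `join` waits out the remaining window.
        Rendezvous::Delivered | Rendezvous::Disconnected => {
            if handle.join().is_err() {
                eprintln!("worker thread panicked");
            }
            true
        }
        // The worker never reached the rendezvous; joining could block forever.
        Rendezvous::TimedOut => {
            eprintln!("worker did not shut down in time; not joining");
            false
        }
    }
}

/// Runs three hypothetical workers: cooperative, already exiting, and stuck.
fn demo() -> (bool, bool, bool) {
    let timeout = Duration::from_millis(1000);

    let (tx, rx) = sync_channel::<()>(0);
    let cooperative = thread::spawn(move || {
        let _ = rx.recv();
    });
    let a = shut_down(tx, cooperative, timeout);

    let (tx, rx) = sync_channel::<()>(0);
    let exiting = thread::spawn(move || drop(rx));
    let b = shut_down(tx, exiting, timeout);

    let (tx, rx) = sync_channel::<()>(0);
    let stuck = thread::spawn(move || {
        thread::sleep(Duration::from_millis(500));
        drop(rx);
    });
    let c = shut_down(tx, stuck, Duration::from_millis(50));

    (a, b, c)
}

fn main() {
    println!("{:?}", demo());
}
```

The second worker covers the case raised in the review: the receiver may already be dropped while the thread is still running, and `join` simply waits for the thread to finish.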

Co-authored-by: Eliza Weisman <eliza@buoyant.io>

hawkw

looks good to me overall! i commented on a couple of additional things that might be worth thinking about.

hawkw · 2021-11-12T17:33:13Z

tracing-appender/src/worker.rs

- if let Err(e) = self.writer.flush() {
- eprintln!("Failed to flush. Error: {}", e);
- }


i'm assuming the flush here was removed because we will already always flush on shutdown in Worker::work()?

that's why I felt safe to remove it

but the actual reason was that it could block (or introduce an undefined delay) when joining the worker (#1125 (comment)). To be honest, I would have just joined the thread, but the IO argument does make sense, so I went along with it.

I realised that, following this logic, it might be a good idea to drop the writer as well: 45da512. There could be some IO in its destructor.
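The ordering argument can be illustrated with a small standard-library sketch. `SlowWriter`, `Log` and `shutdown_order` are hypothetical names, and the sleep merely simulates IO in a destructor. If the worker drops the writer before the rendezvous, the potentially slow work falls inside the window that the guard's bounded wait covers, and the later join has almost nothing left to wait for.

```rust
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type Log = Arc<Mutex<Vec<&'static str>>>;

/// Hypothetical writer whose destructor performs (simulated) slow IO.
struct SlowWriter(Log);

impl Drop for SlowWriter {
    fn drop(&mut self) {
        thread::sleep(Duration::from_millis(20)); // pretend to flush to disk
        self.0.lock().unwrap().push("writer dropped");
    }
}

/// Records the order of shutdown events as seen through a shared log.
fn shutdown_order() -> Vec<&'static str> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let writer = SlowWriter(Arc::clone(&log));
    let (shutdown, on_shutdown) = sync_channel::<()>(0);

    let worker = thread::spawn(move || {
        // All potentially slow work, destructor included, happens
        // before the rendezvous with the guard.
        drop(writer);
        let _ = on_shutdown.recv();
    });

    // Guard side: rendezvous first, then join.
    let _ = shutdown.send(());
    log.lock().unwrap().push("rendezvous complete");
    let _ = worker.join();
    log.lock().unwrap().push("worker joined");

    let events = log.lock().unwrap().clone();
    events
}

fn main() {
    for event in shutdown_order() {
        println!("{}", event);
    }
}
```

Because the rendezvous cannot complete until the worker reaches `recv()`, the log always shows the writer being dropped before the rendezvous completes.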

appender: fix WorkerGuard not waiting for writer destruction

a6ae3e3

trtt requested a review from a team as a code owner November 10, 2021 10:51

hawkw reviewed Nov 11, 2021

trtt and others added 3 commits November 12, 2021 15:00

refactor(suggestion): better error

ac42497

Co-authored-by: Eliza Weisman <eliza@buoyant.io>

refactor: another error message

fb1f941

join worker thread in disconnected case also

1e1741f

hawkw approved these changes Nov 12, 2021

trtt and others added 3 commits November 13, 2021 19:52

report error on failed join

13fec12

drop writer before joining worker thread

45da512

Merge branch 'master' into fix/workerdrop

f004b13

hawkw merged commit b439705 into tokio-rs:master Nov 13, 2021
