
GC Bug #117

Closed

macrocan opened this issue Apr 15, 2021 · 12 comments · Fixed by #156

Comments

macrocan commented Apr 15, 2021

First, consider this case:

    let data = b"hello world";
    stream.write_all(data).await.expect("C write msg 1");
    stream.write_all(data).await.expect("C write msg 2");
    // msg3
    stream.write_all(data).await.expect("C write msg 3");

    // drop it immediately
    drop(stream);

The stream is dropped immediately after writing the messages. As a result, msg3 is never received by the reader.
After debugging, we found that the GC sends a RST frame before msg3.

// garbage_collect()

    if let Some(f) = frame {
        log::trace!("{}: sending: {}", self.id, f.header());
        self.socket.get_mut().send(&f).await.or(Err(ConnectionError::Closed))?
    }

The GC sends the RST frame directly to the reader while msg3 is still queued in the stream receiver.
If the GC instead sent the RST frame through the stream receiver, the bug would be fixed.

// garbage_collect()

    if let Some(f) = frame {
        log::trace!("{}/{}: sending: {}", self.id, stream_id, f.header());
        // send to message process queue to process(/send) orderly
        let cmd = StreamCommand::SendFrame(f.left());
        if self.stream_sender.try_send(cmd).is_err() {
            continue;
        }
    }
macrocan mentioned this issue Apr 15, 2021
mxinden (Member) commented Apr 21, 2021

Yes, indeed, there is a race condition here. Users of yamux have to call poll_close on a stream before dropping it to make sure all messages are sent before closing the stream with the remote.

Let me know in case that works for you.
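
A minimal sketch of that pattern, assuming the stream implements futures' `AsyncWrite` (as yamux streams do), with `AsyncWriteExt::close` driving `poll_close`; the wrapper function is illustrative only:

    use futures::io::{AsyncWrite, AsyncWriteExt};

    // Write a few messages and explicitly close the stream before it is
    // dropped, so all queued frames reach the remote before the close.
    async fn send_and_close<S: AsyncWrite + Unpin>(mut stream: S) -> std::io::Result<()> {
        let data = b"hello world";
        stream.write_all(data).await?;
        stream.write_all(data).await?;
        stream.write_all(data).await?;

        // Drive `poll_close` to completion before the stream is dropped,
        // rather than letting the drop race with still-pending frames.
        stream.close().await?;
        Ok(())
    }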

macrocan (Author) commented:

> Yes, indeed, there is a race condition here. Users of yamux have to call poll_close on a stream before dropping it to make sure all messages are sent before closing the stream with the remote.
>
> Let me know in case that works for you.

For the user's convenience, it seems the GC could close the stream automatically when it is dropped.

tomaka (Member) commented Apr 23, 2021

By sending RST to the remote, you inform it that the messages it has sent on the substream might not have been received and processed. Sending a close message and immediately discarding the substream would go against that idea.

macrocan (Author) commented:

> By sending RST to the remote, you inform it that the messages it has sent on the substream might not have been received and processed. Sending a close message and immediately discarding the substream would go against that idea.

My mistake. When a stream is dropped, the GC should reset it only after all queued messages have been sent.

tomaka (Member) commented Apr 25, 2021

It's not about sending remaining messages, but about receiving the messages that the remote might be sending to you. In your specific situation, you know that the remote doesn't send anything, but that's not the case in general.

macrocan (Author) commented:

> It's not about sending remaining messages, but about receiving the messages that the remote might be sending to you. In your specific situation, you know that the remote doesn't send anything, but that's not the case in general.

You are right; we should send an ACK alongside the RST frame if we have received data from the remote.
I think the GC can do that as well.

folex commented Apr 6, 2022

Is this bug still here?

I'm getting lots of unexpected EOFs when sending big messages between two nodes; could it be related? The EOFs start to happen when the message size exceeds the send_split_size parameter.

UPD: After several days of debugging, it turned out I wasn't calling socket.close().await in OutboundUpgrade.

mxinden (Member) commented Apr 6, 2022

> Is this bug still here?

I am not aware of it.

> UPD: After several days of debugging, it turned out I wasn't calling socket.close().await in OutboundUpgrade.

Do I understand correctly that this resolves the above-mentioned EOFs on large messages? 🎉

folex commented Apr 7, 2022

> Do I understand correctly that this resolves the above-mentioned EOFs on large messages?

Yes! :)

While we're at it, could you kindly elaborate on why the requirement to call socket.close() appeared and why there was no such requirement before?

I'm looking at e.g. rust-ipfs c7a79980bd and I see that they somehow knew they should add socket.close().

I want to learn how they knew about this requirement; was it communicated or documented somewhere?

Many thanks for your great work!

mxinden (Member) commented Apr 8, 2022

> While we're at it, could you kindly elaborate on why the requirement to call socket.close() appeared and why there was no such requirement before?
>
> I'm looking at e.g. rust-ipfs c7a79980bd and I see that they somehow knew they should add socket.close().
>
> I want to learn how they knew about this requirement; was it communicated or documented somewhere?

Puuh, I am not sure to be honest. Maybe @koivunej knows more.

> Many thanks for your great work!

❤️

koivunej commented Apr 9, 2022

Thanks for the ping!

Regarding the linked socket.close().await: I am pretty sure it was due to a related change to write_length_prefixed in rust-libp2p, and the need for it was hinted at in the rust-libp2p change as a required migration ... I'll try to find it; sadly, none of this was recorded in comments or in the commit message. I would, however, not take rust-ipfs as an example, @folex; I haven't had time to work on it for a long while now.

I'll update if/when I find more rationale for that close.

OK, I think I found it: originally, write_one was used in that particular place, and it performed the close dance as well, so the linked commit was just about restoring the old behaviour without the deprecated rust-libp2p code. The related libp2p commit is libp2p/rust-libp2p@c1ef4bf.
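
A hedged sketch of that pattern, assuming `write_length_prefixed` from libp2p-core (the exact module path varies across libp2p versions); the function name is an illustrative stand-in for the outbound upgrade's future, not the actual trait method:

    use futures::io::{AsyncWrite, AsyncWriteExt};
    use libp2p_core::upgrade::write_length_prefixed; // assumed path; varies by libp2p version

    // Length-prefix the message and then close the substream explicitly,
    // which is the part `write_one` used to do implicitly ("the close dance").
    async fn send_request<S>(mut socket: S, request: Vec<u8>) -> std::io::Result<()>
    where
        S: AsyncWrite + Unpin,
    {
        write_length_prefixed(&mut socket, &request).await?;
        socket.close().await?;
        Ok(())
    }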

However, to repeat, I cannot comment on whether that close should be there, as my efforts have recently focused on understanding the latest Windows test failures, which started with the latest rust-libp2p upgrade. The bitswap protocol we have is nothing more than a draft; I'll try to remember to add some disclaimers around it as well.

thomaseizinger (Contributor) commented:

This bug can still be triggered if a user calls flush() first and then drops the stream because, at the moment, poll_flush is a no-op. I have a fix for this in #156.
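
For illustration, the still-racy pattern looks roughly like this (a sketch; `stream` stands for a yamux substream implementing `AsyncWrite`):

    use futures::io::{AsyncWrite, AsyncWriteExt};

    // Before #156, `poll_flush` was a no-op, so `flush()` resolved immediately
    // and frames could still be queued when the stream was dropped.
    async fn flush_then_drop<S: AsyncWrite + Unpin>(mut stream: S) -> std::io::Result<()> {
        stream.write_all(b"hello world").await?;
        stream.flush().await?; // did not guarantee the frame had left the internal queue
        drop(stream);          // the closing frame could then race with the pending data frame
        Ok(())
    }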

mxinden pushed a commit that referenced this issue on May 24, 2023 (#156):

Currently, we have a `garbage_collect` function that checks whether any of our
streams have been dropped. This can cause a race condition where the channel
between a `Stream` and the `Connection` still has pending frames for a stream
but dropping a stream causes us to already send a `FIN` flag for the stream.

We fix this by maintaining a single channel for each stream. When a stream gets
dropped, the `Receiver` becomes disconnected. We use this information to queue
the correct frame (`FIN` vs `RST`) into the buffer. At this point, all previous
frames have already been processed and the race condition is thus not present.

Additionally, this also allows us to implement `Stream::poll_flush` by
forwarding to the underlying `Sender`. Note that, at present, this only
checks whether there is _space_ in the channel, not whether the items have been
emitted by the `Receiver`.

We have a PR upstream that might fix this:
rust-lang/futures-rs#2746

Fixes: #117.
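
A minimal sketch of the channel behaviour this fix relies on, using `futures::channel::mpsc` directly rather than yamux's actual types: items queued before the sender is dropped are still delivered, and the receiver only observes the disconnect afterwards.

    use futures::{channel::mpsc, executor::block_on, SinkExt, StreamExt};

    fn main() {
        block_on(async {
            let (mut tx, mut rx) = mpsc::channel::<&'static str>(8);
            tx.send("msg1").await.unwrap();
            tx.send("msg2").await.unwrap();
            tx.send("msg3").await.unwrap();
            drop(tx); // the stream side goes away here

            // All previously queued items are still delivered...
            while let Some(msg) = rx.next().await {
                println!("delivered: {}", msg);
            }
            // ...and only now does the receiver see the disconnect, which is
            // the point at which the closing frame (FIN vs RST) can be queued.
            println!("sender dropped, channel drained");
        });
    }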