
swarm/connection: Enforce limit on inbound substreams via StreamMuxer #2861

Merged: 46 commits from inline-handler-wrapper into master on Sep 21, 2022

Conversation

@thomaseizinger (Contributor) commented Aug 31, 2022

Description

Inlines the HandlerWrapper component into Connection, which allows us to enforce the limit on inbound streams directly at the StreamMuxer level by not calling poll_inbound whenever we would exceed the limit. Depending on the muxer implementation, this may enforce backpressure all the way to the remote node.
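
For illustration, here is a minimal, self-contained sketch of that gating idea. The names (Muxer, Connection, negotiating_in, max_negotiating_inbound_streams, poll_new_inbound) are stand-ins for illustration, not the actual rust-libp2p definitions:

use std::collections::VecDeque;
use std::task::{Context, Poll};

// Hypothetical stand-ins; the real types live in libp2p-core and libp2p-swarm.
struct Substream;
struct Muxer;

impl Muxer {
    // Stand-in for the muxer's poll_inbound.
    fn poll_inbound(&mut self, _cx: &mut Context<'_>) -> Poll<std::io::Result<Substream>> {
        Poll::Pending
    }
}

struct Connection {
    muxer: Muxer,
    negotiating_in: VecDeque<Substream>,
    max_negotiating_inbound_streams: usize,
}

impl Connection {
    // Only ask the muxer for a new inbound stream while there is capacity to
    // negotiate it. Not calling poll_inbound leaves further streams with the
    // muxer, which, depending on its implementation, propagates backpressure
    // to the remote.
    fn poll_new_inbound(&mut self, cx: &mut Context<'_>) {
        if self.negotiating_in.len() < self.max_negotiating_inbound_streams {
            if let Poll::Ready(Ok(substream)) = self.muxer.poll_inbound(cx) {
                self.negotiating_in.push_back(substream);
            }
        }
    }
}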

Due to this backpressure mechanism, a remote may now be slow in accepting new outbound streams. To prevent unbounded growth of "pending" streams, we change the implementation such that the "upgrade timeout" for outbound streams starts ticking from the moment the stream is requested, instead of only once the stream has been opened and the upgrade begins.
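
A rough sketch of that change in timer semantics, again with hypothetical names (SubstreamRequest, upgrade_timeout) rather than the real swarm internals:

use std::time::{Duration, Instant};

// The deadline is recorded when the outbound stream is requested, so time
// spent waiting for a (possibly backpressured) remote counts against the
// upgrade timeout.
struct SubstreamRequest<Info> {
    info: Info,
    deadline: Instant,
}

impl<Info> SubstreamRequest<Info> {
    fn new(info: Info, upgrade_timeout: Duration) -> Self {
        SubstreamRequest {
            info,
            // Previously the timer only started once the stream had been
            // opened and the upgrade began; here it starts at request time.
            deadline: Instant::now() + upgrade_timeout,
        }
    }

    fn timed_out(&self) -> bool {
        Instant::now() >= self.deadline
    }
}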

Fixes #2796.

Links to any relevant issues

Open tasks

  • Find out why we had to adjust the Connection::poll implementation
  • Start substream upgrade timeout at the time of substream request and not when the upgrade starts
  • Add changelog entry to yamux & bump version

Open Questions

  • How many streams should we buffer at most in libp2p-yamux::Yamux?
  • Should the change to libp2p-yamux be in a separate PR?

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@thomaseizinger changed the title from "swarm/connection: Inline HandlerWrapper" to "swarm/connection: Enforce limit on inbound substreams via StreamMuxer" on Sep 2, 2022
@mxinden (Member) left a comment


Wonderful to see. I think there is more work needed before we can land this though.

@mxinden (Member) commented Sep 5, 2022

Looking at the many simplifications that @thomaseizinger and @elenaf9 landed in the last months, I already foresee the day where rust-libp2p is a single struct only 😄

@thomaseizinger (Contributor, Author) commented Sep 15, 2022

This is ready for a full review @mxinden @elenaf9! PR description is up to date, please have a look.

@mxinden (Member) left a comment


Thanks for the follow-ups on this, Thomas!

muxers/yamux/src/lib.rs:

if this.inbound_stream_buffer.len() >= MAX_BUFFERED_INBOUND_STREAMS {
log::warn!("dropping {inbound_stream} because buffer is full");
drop(inbound_stream);
@mxinden (Member):

Just want to make sure we are making an informed decision here. Instead of tail-dropping, we could just as well do head-dropping.

I would have to put more thought into this before forming an opinion. That said, tail-dropping maintains the status quo, so we can consider it a non-change.

@thomaseizinger (Contributor, Author):

With this change, the timeout for opening a new stream already begins when the request is queued. I believe head-dropping would negatively affect the number of successfully negotiated streams because we might drop a stream that was about to be processed by the local node. A dropped stream will trigger a ConnectionHandler::inject_dial_upgrade_error with the provided "open info" on the remote node.

In a scenario where a node opens 3 streams that are all pending in this buffer, I think it would be confusing if stream 1 fails but 2 and 3 succeed. It feels more natural to process them in FIFO order.
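
To make the two policies concrete, here is a small sketch of tail-dropping versus head-dropping over a VecDeque buffer. Stream and the two helper functions are stand-ins for illustration (assuming the log crate, as in the snippet above), not the actual libp2p-yamux code:

use std::collections::VecDeque;

const MAX_BUFFERED_INBOUND_STREAMS: usize = 25;

// Stand-in for the muxer's stream type.
struct Stream;

// Tail-drop (what this PR does): the newly arrived stream is discarded when
// the buffer is full, so already-buffered streams keep their place in line (FIFO).
fn buffer_tail_drop(buffer: &mut VecDeque<Stream>, inbound_stream: Stream) {
    if buffer.len() >= MAX_BUFFERED_INBOUND_STREAMS {
        log::warn!("dropping inbound stream because buffer is full");
        drop(inbound_stream);
        return;
    }
    buffer.push_back(inbound_stream);
}

// Head-drop (the discussed alternative): the oldest buffered stream is evicted
// to make room, which risks dropping a stream the local node was about to process.
fn buffer_head_drop(buffer: &mut VecDeque<Stream>, inbound_stream: Stream) {
    if buffer.len() >= MAX_BUFFERED_INBOUND_STREAMS {
        let _evicted = buffer.pop_front();
        log::warn!("evicting oldest buffered inbound stream");
    }
    buffer.push_back(inbound_stream);
}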

}

const MAX_BUFFERED_INBOUND_STREAMS: usize = 25;
@mxinden (Member):

If I am not mistaken, the additional buffer of 25 streams increases the overall number of possible inflight inbound streams, correct? In other words, while we previously only supported max_negotiating_inbound_streams before dropping streams, we now allow up to max_negotiating_inbound_streams + 25.

Is this correct?

@thomaseizinger (Contributor, Author):

Yep, that sounds correct to me! Like I said, this number is completely arbitrary and we can change it to something else :)
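
As a back-of-the-envelope illustration of that bound (the concrete default for max_negotiating_inbound_streams lives in libp2p-swarm and is not restated here):

const MAX_BUFFERED_INBOUND_STREAMS: usize = 25;

// Streams currently being negotiated by the ConnectionHandler plus streams
// parked in the yamux-level buffer.
fn max_inflight_inbound(max_negotiating_inbound_streams: usize) -> usize {
    max_negotiating_inbound_streams + MAX_BUFFERED_INBOUND_STREAMS
}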

@elenaf9 (Contributor) left a comment

Great work!

> I had to adjust the implementation of Connection::poll for the autonat (???) tests to work again. I've debugged it for a bit but I can't explain why. Any pointers are welcome.

I can look into that (I am the author of those tests). Those tests expect a strict order in which events should occur, which is maybe not ideal. On the other hand, they prove that the order in which we poll our state machines does make a difference, and this can help decide which order makes the most sense.

@thomaseizinger (Contributor, Author):

> I had to adjust the implementation of Connection::poll for the autonat (???) tests to work again. I've debugged it for a bit but I can't explain why. Any pointers are welcome.

> I can look into that (I am the author of those tests). Those tests expect a strict order in which events should occur, which is maybe not ideal. On the other hand, they prove that the order in which we poll our state machines does make a difference, and this can help decide which order makes the most sense.

That is outdated and has since been fixed. No adjustment was needed, I just had a stupid bug :D

@mxinden (Member) left a comment

🙏

@mxinden (Member) commented Sep 21, 2022

Interoperability failures are due to libp2p/test-plans#41. Merging here.

@mxinden merged commit a4d1e58 into master on Sep 21, 2022
@thomaseizinger deleted the inline-handler-wrapper branch on September 21, 2022 at 14:21
Development

Successfully merging this pull request may close these issues.

Enforce back-pressure on incoming substreams using new StreamMuxer interface