Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(swarm)!: don't share event buffer for established connections #3188

Merged
merged 11 commits into from
Jan 19, 2023

Conversation

thomaseizinger
Copy link
Contributor

@thomaseizinger thomaseizinger commented Dec 2, 2022

Description

Currently, we only have a single channel for all established connections. This requires us to construct the channel ahead of time, before we even have a connection. As it turns out, sharing this buffer across all connections actually has downsides. In particular, this means a single, very busy connection can starve others by filling up this buffer, forcing other connections to wait until they can emit an event.

Notes

Depends-On: #3187

Links to any relevant issues

Open Questions

  • Does this need a changelog entry?

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

The task for a pending connection only ever sends one event into this
channel: Either a success or a failure. Cloning a sender adds one
slot to the capacity of the channel. Hence, we can start this capacity
at 0 and have the `cloning` of the `Sender` take care of properly
increasing the capacity.
@thomaseizinger thomaseizinger changed the base branch from master to 3186-no-buffer-size-pending-connections December 2, 2022 02:20
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, having a channel per connection sounds reasonable to me.

I am not yet sure I agree with the underlying motivation, namely whether we should get rid of SwarmBuilder. Happy to proceed here though.

swarm/src/connection/pool.rs Outdated Show resolved Hide resolved
@thomaseizinger
Copy link
Contributor Author

I am not yet sure I agree with the underlying motivation, namely whether we should get rid of SwarmBuilder.

Do you mind posting your concerns/arguments to #3186?

Copy link
Contributor

@elenaf9 elenaf9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether we should simply release this as a patch.
The default for this value was (and remains) 7.
Users that are dealing with a large number of connections per node probably changed this value in their implementation to a much higher number (e.g. iroh to 256).
Buffering 256 events in total vs 256 events per connection is imo a significant difference.

I would actually prefer to do an explicit breaking change here by removing/ renaming connection_event_buffer_size and adding per_connection_event_buffer_size to ensure that users take note of this change.

swarm/src/connection/pool.rs Outdated Show resolved Hide resolved
@thomaseizinger
Copy link
Contributor Author

Explicit breaking changes can be a useful tool but I think it might be a bit exaggerated here. What problems other than more memory usage do you see that might arise from this?

Also, breaking changes in libp2p-swarm and libp2p-core are such a pain to pull through the workspace that I am honestly not willing to do it for such a (IMO) minor thing. Happy to do it if we are already in the middle of a release cycle with breaking changes in it.

@dignifiedquire
Copy link
Member

From a user perspective I think, as long as this is clearly communicated in the release notes/changelog, it is fine to not make this a breaking change.

@dignifiedquire
Copy link
Member

Generally I am glad to see this change, as I suspect we actually ran into this issue, where a protocol emitting a lot of events per connection would overflow the rest of the system.

@mergify
Copy link
Contributor

mergify bot commented Dec 14, 2022

This pull request has merge conflicts. Could you please resolve them @thomaseizinger? 🙏

Base automatically changed from 3186-no-buffer-size-pending-connections to master December 19, 2022 05:25
@elenaf9
Copy link
Contributor

elenaf9 commented Dec 20, 2022

Explicit breaking changes can be a useful tool but I think it might be a bit exaggerated here. What problems other than more memory usage do you see that might arise from this?

Also, breaking changes in libp2p-swarm and libp2p-core are such a pain to pull through the workspace that I am honestly not willing to do it for such a (IMO) minor thing. Happy to do it if we are already in the middle of a release cycle with breaking changes in it.

From a user perspective I think, as long as this is clearly communicated in the release notes/changelog, it is fine to not make this a breaking change.

I don't feel strongly about it; maybe I am being overly careful. I am currently on vacation and don't have time to re-review.
Feel free to move forward here without the explicit breaking change.

elenaf9
elenaf9 previously approved these changes Dec 20, 2022
@mergify
Copy link
Contributor

mergify bot commented Dec 23, 2022

This pull request has merge conflicts. Could you please resolve them @thomaseizinger? 🙏

@mxinden
Copy link
Member

mxinden commented Jan 2, 2023

Generally I am glad to see this change, as I suspect we actually ran into this issue, where a protocol emitting a lot of events per connection would overflow the rest of the system.

@dignifiedquire note that this does not introduce a limit per protocol, but per connection. Thus, in case your protocol was previously reaching the limit and thus harming other protocols, it might still be doing so with this change, just a little delayed and per connection.

@mxinden
Copy link
Member

mxinden commented Jan 2, 2023

I am not sure whether we should simply release this as a patch. The default for this value was (and remains) 7. Users that are dealing with a large number of connections per node probably changed this value in their implementation to a much higher number (e.g. iroh to 256). Buffering 256 events in total vs 256 events per connection is imo a significant difference.

I would actually prefer to do an explicit breaking change here by removing/ renaming connection_event_buffer_size and adding per_connection_event_buffer_size to ensure that users take note of this change.

I agree with @elenaf9. I do think this is a significant change for folks in low-resource or high-connection environments and thus I think this change in functionality warrants a breaking change.

What problems other than more memory usage do you see that might arise from this?

Higher memory usage can be used as an attack, e.g. forcing the process to run out of memory. This is especially relevant as this is the size of the channel from the ConnectionHandler to the NetworkBehaviour and not the other way around. The former is potentially controlled by an attacker, in the worst case with large-sized messages.

@thomaseizinger
Copy link
Contributor Author

I am not sure whether we should simply release this as a patch. The default for this value was (and remains) 7. Users that are dealing with a large number of connections per node probably changed this value in their implementation to a much higher number (e.g. iroh to 256). Buffering 256 events in total vs 256 events per connection is imo a significant difference.
I would actually prefer to do an explicit breaking change here by removing/ renaming connection_event_buffer_size and adding per_connection_event_buffer_size to ensure that users take note of this change.

I agree with @elenaf9. I do think this is a significant change for folks in low-resource or high-connection environments and thus I think this change in functionality warrants a breaking change.

Should it come with a deprecation warning first then? i.e. should we maintain the old functionality in parallel so users can gradually migrate?

@thomaseizinger
Copy link
Contributor Author

Also, are we fine with users who are sticking to the default value potentially not noticing this?

@mxinden
Copy link
Member

mxinden commented Jan 16, 2023

Should it come with a deprecation warning first then? i.e. should we maintain the old functionality in parallel so users can gradually migrate?

I am assuming (intuition) that maintaining both introduces unreasonable complexity. Would you agree?

I am fine with a hard breaking change without deprecation. It does not require a large change on the user side. I consider this an expert-option, only set by people deeply familiar with libp2p.

Also, are we fine with users who are sticking to the default value potentially not noticing this?

I consider both the former and the new default value sane. Thus I don't think further notifications beyond the changelog are needed.

@thomaseizinger thomaseizinger changed the title refactor(swarm): don't share event buffer between established connections refactor(swarm)!: don't share event buffer between established connections Jan 17, 2023
@thomaseizinger
Copy link
Contributor Author

Ready for re-review.

@thomaseizinger thomaseizinger changed the title refactor(swarm)!: don't share event buffer between established connections refactor(swarm)!: don't share event buffer for established connections Jan 17, 2023
@thomaseizinger thomaseizinger dismissed elenaf9’s stale review January 17, 2023 10:55

Code has changed.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow-ups!

@mergify mergify bot merged commit b5a3f81 into master Jan 19, 2023
@mergify mergify bot deleted the 3186-one-channel-per-connection branch January 19, 2023 22:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants