
feat(kad): introduce AsyncBehaviour #5294

Open · wants to merge 2 commits into master from feat/async-kad

Conversation

stormshield-frb (Contributor)

Description

The more our software does, the more complicated it becomes to track Kademlia queries. We found ourselves needing distinct HashMaps to keep track of every query we issue so that we could link each one back to where and why it was triggered. This became very messy, which is why we implemented a wrapper around kad::Behaviour with the goal of simplifying the tracking of Kademlia queries.

This wrapper is pretty simple and has one method for each of the possible Kademlia queries (bootstrap, get_closest_peers, etc.), each suffixed with _async (bootstrap_async, get_closest_peers_async, etc.). Instead of returning a QueryId, these methods return a typed UnboundedQueryResultReceiver that receives every Event::OutboundQueryProgressed corresponding to that QueryId. The only purpose of the wrapper is to map each OutboundQueryProgressed to the corresponding sender and send it.

This makes it very easy for developers to track their queries and keep correct state in their application-specific code.
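
For illustration, a minimal usage sketch based on the description above (the method name and receiver semantics come from this PR; the receiver is assumed to implement Stream, and a tokio runtime is assumed for spawning):

```rust
use futures::StreamExt;
use libp2p::{kad, PeerId, Swarm};

// Hypothetical setup: a Swarm built with the wrapper behaviour, driven
// by its own event loop elsewhere.
fn start_query(
    swarm: &mut Swarm<kad::AsyncBehaviour<kad::store::MemoryStore>>,
    target: PeerId,
) {
    // Instead of a bare QueryId, the `_async` variant hands back a typed
    // receiver scoped to this single query.
    let mut progress_rx = swarm.behaviour_mut().get_closest_peers_async(target);

    // Consume the results wherever convenient; the swarm keeps being
    // polled by its own event loop in the meantime.
    tokio::spawn(async move {
        // Every `Event::OutboundQueryProgressed` for that QueryId arrives
        // here, so no QueryId -> context HashMap is needed in app code.
        while let Some(step) = progress_rx.next().await {
            println!("query progressed: {step:?}");
        }
    });
}
```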

Notes & open questions

I chose to use an unbounded channel because the transmitted data is already in memory: we obtain it when receiving an OutboundQueryProgressed. That is why I don't think there is a particular risk of running out of memory. If there is, I can rework my wrapper to work with bounded channels.

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@drHuangMHT (Contributor) commented Apr 5, 2024

I think it's OK to drop some messages, though; they are not that important.
I like the idea of directing query results to the querier, but the event system is designed so that everyone can listen to any event emitted by anyone. However, I have run into situations where swarm events overwhelmed consumers and forced me to implement backpressure, so this is to be discussed.

@jxs (Member) left a comment

Hi François, thanks for starting this! I think it is a great idea for Kademlia to offer async/await-friendly primitives. I left some notes to help move this forward.

/// with an [`Event::OutboundQueryProgressed`] like nothing happened.
///
/// For more information, see [`Behaviour`].
pub struct AsyncBehaviour<TStore> {
@jxs (Member):

What do you think of following the design introduced with stream, i.e. introducing a cloneable Control handle which implements the methods? That way we don't need to keep a reference to the Swarm to call them.

stormshield-frb (Contributor, Author):

Yeah, it would probably be great!

I'm thinking about it, but I really don't see how I could implement it, since I need to capture the events emitted by the kad::Behaviour. Do you have an idea?

stormshield-frb (Contributor, Author):

@jxs, I've thought a little bit about this. The only way I see to make a Control work for the kad::Behaviour is to have the entire Behaviour behind an Arc<Mutex<>>. I don't know if that is a good or bad idea.

Will the "control" semantic become something standard for all the behaviours? If so, why not. But if not, I don't know how end users will understand having some behaviours use a control and others not.

I'd really like to also have your opinion on this, @guillaumemichel, when you have the time.

Contributor:

I am discovering how Control works and I think it would make sense in this case. What would be the arguments against using it?

@elenaf9 (Contributor) commented Nov 21, 2024
> @jxs, I've thought a little bit about this. The only way I see to make a Control work for the kad::Behaviour is to have the entire Behaviour behind an Arc<Mutex<>>. I don't know if that is a good or bad idea.

I don't think you need an Arc/Mutex.
You could split the current AsyncBehaviour logic into two parts, the Control part and the NetworkBehaviour part, that communicate through an additional mpsc channel (the "command-channel").

  • Control would have all the async variants of the Kademlia behaviour (get_closest_peers_async, get_providers_async, etc.).
    There, it would create the mpsc channel for the results, and then send a tuple (Command::GetProviders {..}, mpsc::Sender<AsyncQueryResult>) through the "command-channel" to the AsyncBehaviour.
  • AsyncBehaviour would wrap the Kademlia behaviour and impl NetworkBehaviour. In its poll_next loop it would poll the receiving side of the command-channel and handle each incoming command by calling the matching function on the wrapped Kademlia behaviour and storing the Sender in the query_result_senders hashmap. poll_next would also still have the already existing logic for intercepting ToSwarm::GenerateEvent events from the inner behaviour and forwarding the results.

Does that make sense? @jxs, was that roughly what you had in mind?
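
To make the proposed split concrete, here is a rough sketch (all names such as Command, AsyncQueryResult, and query_result_senders are illustrative, and the NetworkBehaviour impl with its poll loop is elided):

```rust
use std::collections::HashMap;

use futures::{channel::mpsc, SinkExt};
use libp2p::{
    kad::{self, store::RecordStore, QueryId, RecordKey},
    PeerId,
};

/// Illustrative command set; the real PR may shape this differently.
enum Command {
    GetClosestPeers { peer: PeerId },
    GetProviders { key: RecordKey },
}

/// Placeholder for the per-query event type the wrapper forwards.
type AsyncQueryResult = kad::Event;

/// Cloneable handle: needs no reference to the Swarm, only the sending
/// side of the command-channel.
#[derive(Clone)]
pub struct Control {
    commands: mpsc::Sender<(Command, mpsc::Sender<AsyncQueryResult>)>,
}

impl Control {
    pub async fn get_closest_peers_async(
        &mut self,
        peer: PeerId,
    ) -> mpsc::Receiver<AsyncQueryResult> {
        let (tx, rx) = mpsc::channel(16);
        // The behaviour side starts the query and stores `tx` under the
        // returned QueryId; results then flow back through `rx`.
        let _ = self
            .commands
            .send((Command::GetClosestPeers { peer }, tx))
            .await;
        rx
    }
}

/// Wraps the Kademlia behaviour. The NetworkBehaviour impl (elided)
/// polls `command_rx` and intercepts OutboundQueryProgressed events.
pub struct AsyncBehaviour<TStore> {
    inner: kad::Behaviour<TStore>,
    command_rx: mpsc::Receiver<(Command, mpsc::Sender<AsyncQueryResult>)>,
    query_result_senders: HashMap<QueryId, mpsc::Sender<AsyncQueryResult>>,
}

impl<TStore: RecordStore + Send + 'static> AsyncBehaviour<TStore> {
    /// Called from the poll loop for every command received.
    fn handle_command(&mut self, cmd: Command, tx: mpsc::Sender<AsyncQueryResult>) {
        let id = match cmd {
            Command::GetClosestPeers { peer } => self.inner.get_closest_peers(peer),
            Command::GetProviders { key } => self.inner.get_providers(key),
        };
        self.query_result_senders.insert(id, tx);
    }
}
```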

Resolved (outdated) review threads: protocols/kad/src/async_behaviour.rs, swarm/src/behaviour.rs
stormshield-frb force-pushed the feat/async-kad branch 2 times, most recently from 48cf62a to 83480f9, on April 9, 2024
stormshield-frb (Contributor, Author):

@jxs, as I mentioned in this PR's description (in the section "Notes & open questions"), I used unbounded channels, thinking this could not cause a memory issue beyond what already exists, since all the sent events are already in memory. Still, clippy does not agree with me and the CI fails. What do you prefer? Should I add an allow(clippy::disallowed_methods), or should I try to use bounded channels?

I really don't have an opinion on this.

@dariusc93 (Member):

> @jxs, as I mentioned in this PR's description (in the section "Notes & open questions"), I used unbounded channels, thinking this could not cause a memory issue beyond what already exists, since all the sent events are already in memory. Still, clippy does not agree with me and the CI fails. What do you prefer? Should I add an allow(clippy::disallowed_methods), or should I try to use bounded channels?
>
> I really don't have an opinion on this.

It would be preferable to use bounded channels so we don't introduce that vector.

mergify bot commented Apr 19, 2024

This pull request has merge conflicts. Could you please resolve them @stormshield-frb? 🙏

@guillaumemichel (Contributor):

Sorry for the late comment. I like the idea of getting async updates from a query in an easier manner. I would also prefer to avoid unbounded channels.
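
To make the bounded alternative concrete, a minimal sketch (the capacity value is hypothetical): with try_send on the behaviour side, a slow or dropped consumer loses events instead of growing an unbounded buffer, which also fits drHuangMHT's point that dropping some messages is acceptable.

```rust
use futures::channel::mpsc;
use libp2p::kad::Event;

/// Hypothetical capacity; choosing the right bound is part of the open question.
const QUERY_RESULT_BUFFER: usize = 16;

/// Bounded replacement for the per-query result channel.
fn new_result_channel() -> (mpsc::Sender<Event>, mpsc::Receiver<Event>) {
    mpsc::channel(QUERY_RESULT_BUFFER)
}

/// Behaviour side: `try_send` never blocks the poll loop; if the buffer
/// is full or the receiver is gone, the event is simply dropped.
fn forward(tx: &mut mpsc::Sender<Event>, event: Event) {
    let _ = tx.try_send(event);
}
```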

Comment on lines +107 to +108
// This query was either not triggered by the user or the receiver has been dropped and removed
// so we simply forward it back up to the swarm like nothing happened.
Contributor:

The query could also have been triggered through the non-async methods of the inner kad Behaviour that we deref to, right?
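
For context, the forwarding fallback under discussion could look roughly like this (a sketch with the current unbounded channels; `senders` stands in for the PR's map of per-query senders). Queries started through the Deref'd inner methods are never registered in the map, so they always take the fallback path, as noted above:

```rust
use std::collections::HashMap;

use futures::channel::mpsc;
use libp2p::kad::{Event, QueryId};

/// Route a progressed-query event to its registered receiver, or hand it
/// back so the caller can forward it to the swarm unchanged.
fn route_event(
    senders: &mut HashMap<QueryId, mpsc::UnboundedSender<Event>>,
    id: QueryId,
    event: Event,
) -> Option<Event> {
    if let Some(tx) = senders.get(&id) {
        match tx.unbounded_send(event) {
            Ok(()) => return None,
            Err(err) => {
                // Receiver dropped: stop tracking and fall back to forwarding.
                senders.remove(&id);
                return Some(err.into_inner());
            }
        }
    }
    // Not triggered via an `_async` method (or already cleaned up).
    Some(event)
}
```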

