feat(p2p): ensure time synchronization in the network #2255

Merged
merged 25 commits into from
Dec 23, 2024


@onur-ozkan (Member) commented Oct 28, 2024

Implementation Details

Once a connection is established with a peer, a validation check is immediately performed on the peer's clock. If the difference between the peer's time and the local time exceeds the maximum allowed gap, the connection is terminated. This ensures that the clocks of all peers operating in the network agree to within that cap. The current cap is 30 seconds and can be changed if it doesn't fit the requirements.
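A minimal sketch of the clock check, assuming both timestamps are UTC seconds since the Unix epoch and that a gap exactly equal to the cap is still accepted. The constant name matches the one used in the debug log later in this PR; the helper names is_within_time_gap and now_utc_secs are hypothetical.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Maximum allowed clock difference, in seconds (30 s, per the PR description).
const MAX_TIME_GAP_FOR_CONNECTED_PEER: u64 = 30;

/// Hypothetical helper: `true` if the peer's clock is close enough to ours.
/// `abs_diff` makes the check symmetric, so peers ahead of or behind us are treated alike.
fn is_within_time_gap(local_ts: u64, peer_ts: u64) -> bool {
    local_ts.abs_diff(peer_ts) <= MAX_TIME_GAP_FOR_CONNECTED_PEER
}

/// Current UTC time as seconds since the Unix epoch.
fn now_utc_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_secs()
}

fn main() {
    let local = now_utc_secs();
    assert!(is_within_time_gap(local, local + 10)); // 10 s ahead: keep the connection
    assert!(!is_within_time_gap(local, local - 45)); // 45 s behind: disconnect
    println!("time gap checks passed");
}
```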

Before each connection attempt, the peer is temporarily added to the RECENTLY_DIALED_PEERS map for 5 minutes. This prevents repeated connection attempts to peers that are unavailable or out of sync, which reduces unnecessary reconnection overhead.
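The 5-minute dial cooldown can be sketched as a map from address to last-dial time. This is an illustration only: it uses a plain HashMap<String, Instant> in place of the TimedMap<StdClock, Multiaddr, ()> that appears in the review below, and the type RecentlyDialed and method check_and_mark_dialed are hypothetical names.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// How long a dialed address stays in the map (5 minutes, per the PR description).
const DIAL_COOLDOWN: Duration = Duration::from_secs(5 * 60);

/// Hypothetical stand-in for RECENTLY_DIALED_PEERS: address -> time of last dial.
struct RecentlyDialed {
    entries: HashMap<String, Instant>,
}

impl RecentlyDialed {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Returns `true` and records the attempt if `addr` may be dialed now;
    /// returns `false` if it was already dialed within the cooldown window.
    fn check_and_mark_dialed(&mut self, addr: &str, now: Instant) -> bool {
        // Drop expired entries so the map does not grow without bound.
        self.entries
            .retain(|_, dialed_at| now.duration_since(*dialed_at) < DIAL_COOLDOWN);
        if self.entries.contains_key(addr) {
            return false;
        }
        self.entries.insert(addr.to_owned(), now);
        true
    }
}

fn main() {
    let addr = "/ip4/203.0.113.7/tcp/4001"; // example address
    let mut dialed = RecentlyDialed::new();
    let t0 = Instant::now();
    assert!(dialed.check_and_mark_dialed(addr, t0));
    assert!(!dialed.check_and_mark_dialed(addr, t0 + Duration::from_secs(60)));
    assert!(dialed.check_and_mark_dialed(addr, t0 + DIAL_COOLDOWN));
}
```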

                      +---------------------+
                      |  Connection Attempt |
                      |       On a Peer     |
                      +---------------------+
                                |
                                v
                   +------------------------+
                   |   Is already dialed    |
                   |       recently?        |
                   +------------------------+
                      |                    |
                      Yes                  No
                      |                    |
                      v                    v
        +---------------------+       +-----------------------+
        |   Skip Connection   |       |  Make the Connection  |
        +---------------------+       +-----------------------+
                                             |
                                             v
                            +---------------------------------+
                            |     Check Peer Time Validity    |
                            +---------------------------------+
                                    |              |
                               Time Valid     Time Invalid
                                    |              |
                                    v              v
                                    |       +-------------------------+
                                    |       | Disconnect from Peer    |
                                    |       +-------------------------+
                                    |
                                    |
                                    v
                +------------------------------------+
                |       Validation completed         |
                +------------------------------------+
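The flowchart can be read as a small decision function. In this sketch the names AttemptOutcome and connection_attempt are hypothetical, and both checks are reduced to inputs; the point it illustrates is the ordering, where the time check runs only after the dial check passes and the connection is made.

```rust
/// Possible outcomes of one connection attempt, mirroring the flowchart.
#[derive(Debug, PartialEq)]
enum AttemptOutcome {
    /// Peer was dialed recently; no connection is made.
    Skipped,
    /// Connected, but the peer's clock was out of range, so we disconnected.
    Disconnected,
    /// Connected and the peer's clock passed validation.
    Validated,
}

/// `dialed_recently` is the result of the pre-dial check; `peer_time_valid` is
/// only invoked after connecting, so a skipped peer is never asked for its time.
fn connection_attempt(dialed_recently: bool, peer_time_valid: impl FnOnce() -> bool) -> AttemptOutcome {
    if dialed_recently {
        return AttemptOutcome::Skipped;
    }
    // "Make the Connection" would happen here.
    if peer_time_valid() {
        AttemptOutcome::Validated
    } else {
        AttemptOutcome::Disconnected
    }
}

fn main() {
    assert_eq!(connection_attempt(true, || unreachable!()), AttemptOutcome::Skipped);
    assert_eq!(connection_attempt(false, || true), AttemptOutcome::Validated);
    assert_eq!(connection_attempt(false, || false), AttemptOutcome::Disconnected);
}
```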

Breaking Changes

1. A new request-response payload (GetPeerUtcTimestamp) has been added. This request is sent on every connection attempt, so all seed nodes and GUI applications in the network must be updated to this version; in other words, this version is incompatible with any older version.

2. The NetworkInfoRequest has been renamed to PeerInfoRequest. This most likely affects the encoded payload of GetMm2Version (not yet verified).
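To make the two changes concrete, here is a hypothetical shape for the renamed request type and a responder. Only the names PeerInfoRequest, GetMm2Version, and GetPeerUtcTimestamp come from this PR; the variant layout, the PeerInfoResponse type, and handle_request are assumptions. The sketch shows the message shape only and says nothing about whether the rename changes the wire encoding.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical request type after the rename.
#[derive(Debug)]
enum PeerInfoRequest {
    GetMm2Version,
    GetPeerUtcTimestamp,
}

/// Hypothetical response type.
#[derive(Debug, PartialEq)]
enum PeerInfoResponse {
    Mm2Version(String),
    UtcTimestamp(u64),
}

/// Answers a request the way a responding node might.
fn handle_request(req: &PeerInfoRequest, version: &str) -> PeerInfoResponse {
    match req {
        PeerInfoRequest::GetMm2Version => PeerInfoResponse::Mm2Version(version.to_owned()),
        PeerInfoRequest::GetPeerUtcTimestamp => {
            let secs = SystemTime::now()
                .duration_since(UNIX_EPOCH)
                .expect("system clock is before the Unix epoch")
                .as_secs();
            PeerInfoResponse::UtcTimestamp(secs)
        }
    }
}

fn main() {
    let resp = handle_request(&PeerInfoRequest::GetMm2Version, "2.x");
    assert_eq!(resp, PeerInfoResponse::Mm2Version("2.x".to_owned()));
    match handle_request(&PeerInfoRequest::GetPeerUtcTimestamp, "2.x") {
        PeerInfoResponse::UtcTimestamp(ts) => assert!(ts > 0),
        other => panic!("unexpected response: {other:?}"),
    }
}
```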

Resolves #1115 and #1683

Blocked by #2256

Signed-off-by: onur-ozkan <work@onurozkan.dev>
Comment on lines 17 to 20
/// TODO: This should be called `PeerInfoRequest` instead. However, renaming it
/// will introduce a breaking change in the network and is not worth it. Do this
/// renaming when there is already a breaking change in the release.
NetworkInfo(network_info::NetworkInfoRequest),
Member Author

This PR already introduces a breaking change, so it's the perfect time to rename this.

Collaborator

One way to avoid breaking changes is to have a timeout on the request for peer timestamp and allow them to connect if there is no response from the peer. We later remove this when most peers update. Another way is to get the mm2 version and do the timestamp request depending on the version. What do you think?
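The timeout-based fallback suggested here can be sketched as follows. It swaps in std::sync::mpsc and recv_timeout for whatever async channel the real code uses; the names decide, Decision, and MAX_TIME_GAP_SECS are hypothetical, and the timeout values are arbitrary. A peer that never answers is treated as an older node and kept.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

const MAX_TIME_GAP_SECS: u64 = 30;

/// What to do with a freshly connected peer.
#[derive(Debug, PartialEq)]
enum Decision {
    Keep,
    Disconnect,
}

/// Wait up to `timeout` for the peer's timestamp; if none arrives (e.g. an older
/// node that doesn't know the request), keep the connection instead of dropping it.
fn decide(rx: &mpsc::Receiver<u64>, local_ts: u64, timeout: Duration) -> Decision {
    match rx.recv_timeout(timeout) {
        Ok(peer_ts) if local_ts.abs_diff(peer_ts) <= MAX_TIME_GAP_SECS => Decision::Keep,
        Ok(_) => Decision::Disconnect,
        // No answer, or the responder went away: treat it as a legacy peer.
        Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => Decision::Keep,
    }
}

fn main() {
    // Up-to-date peer whose clock is 100 s off.
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || tx.send(1_000_100).unwrap());
    assert_eq!(decide(&rx, 1_000_000, Duration::from_secs(1)), Decision::Disconnect);

    // Legacy peer that never answers (`_tx` keeps the sender alive).
    let (_tx, rx) = mpsc::channel::<u64>();
    assert_eq!(decide(&rx, 1_000_000, Duration::from_millis(50)), Decision::Keep);
}
```

The second approach mentioned (gating the request on the peer's mm2 version) would avoid waiting out a timeout for legacy peers, at the cost of an extra version round-trip.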

Member Author

One way to avoid breaking changes is to have a timeout on the request for peer timestamp and allow them to connect if there is no response from the peer. We later remove this when most peers update.

This sounds like a good plan!

Another way is to get the mm2 version and do the timestamp request depending on the version. What do you think?

I think the first idea is better, but then I'd need to revert the peer-info renaming part. What about bundling these changes in a separate branch and then releasing multiple breaking changes at once (like removing the mm2 binaries and some other stuff)?

Collaborator

What about bundling these changes in some branch, and then releasing multiple breaking changes at once (like removing mm2 binaries and some other stuff) ?

I am fine with this or the first approach.

@onur-ozkan marked this pull request as ready for review November 5, 2024 05:22
Collaborator

@shamardy left a comment

Thanks for the PR! I have a few suggestions in addition to these two comments: #2255 (comment), #2255 (comment)

mm2src/mm2_p2p/src/behaviours/atomicdex.rs (6 outdated review threads, resolved)
mm2src/mm2_main/src/lp_stats.rs (outdated review thread, resolved)
Comment on lines 185 to 189
/// Determines if a dial attempt to the remote should be made.
///
/// Returns `false` if a dial attempt to the given address has already been made,
/// in which case the caller must skip the dial attempt.
fn pre_dial_check(recently_dialed_peers: &mut MutexGuard<TimedMap<StdClock, Multiaddr, ()>>, addr: &Multiaddr) -> bool {
Collaborator

Neither the doc comment nor the function name signals that this function modifies the timed map (that is only visible from the mutability of the passed map). Let's rename it to something like check_and_mark_dialed.

Also, no need for the MutexGuard annotation here.

Member Author

Neither the doc comment nor the func name signify how this function changes the timedmap (only visible from the mutability of the passed map)

"only visible from the mutability of the passed map" that's visible everywhere, both in the function signature and on the caller side!

I struggle to see how check_and_mark_dialed describes the function better than pre_dial_check but whatever, I am okay with it.

Also, no need for the MutexGuard annotation here.

It documents the background context of how the caller uses this map.
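On the MutexGuard question: a function that takes a plain `&mut T` can still be called with a guard, because deref coercion turns `&mut MutexGuard<T>` into `&mut T` at the call site. The sketch below is hypothetical (a HashSet<String> instead of the timed map, so no expiry) and only shows the trade-off: the plain signature works with or without a lock, while the guard-typed signature documents that the caller holds one.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical signature without the `MutexGuard` wrapper: it takes the map itself.
/// Returns `true` (and records the address) if it had not been dialed yet.
fn check_and_mark_dialed(dialed: &mut HashSet<String>, addr: &str) -> bool {
    // `HashSet::insert` returns `false` if the value was already present.
    dialed.insert(addr.to_owned())
}

fn main() {
    let recently_dialed = Mutex::new(HashSet::<String>::new());
    let mut guard = recently_dialed.lock().unwrap();
    // `&mut guard` is `&mut MutexGuard<HashSet<String>>`; deref coercion turns it
    // into `&mut HashSet<String>`, so the caller can still pass the guard directly.
    assert!(check_and_mark_dialed(&mut guard, "/ip4/203.0.113.7/tcp/4001"));
    assert!(!check_and_mark_dialed(&mut guard, "/ip4/203.0.113.7/tcp/4001"));
}
```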

mm2src/mm2_p2p/src/behaviours/atomicdex.rs (2 outdated review threads, resolved)
@shamardy removed the P2 label Nov 15, 2024
@onur-ozkan (Member Author)

Is there any blocker for this PR, @shamardy?

@onur-ozkan (Member Author)

This no longer causes a breaking change.

@onur-ozkan (Member Author)

Resolved conflicts

Collaborator

@mariocynicys left a comment

Non-blocking comments inline (so as not to block this further).
LGTM

debug!(
"Peer '{peer}' is within the acceptable time gap ({MAX_TIME_GAP_FOR_CONNECTED_PEER} seconds); time difference is {diff} seconds."
);
response_tx.send(None).await.unwrap();
Collaborator

This send is redundant. What about not sending anything here, since the other end of the channel doesn't really use the None for anything?

Member Author

@onur-ozkan Dec 23, 2024

If you don't send anything, the request/process will block until it times out. The channel is used for request-response style communication here.

Collaborator

Please take a second look at this. The response_tx is for the timestamp checker, not a response to some node (I didn't really get what you mean).

Member Author

Ah, I thought something was awaiting the result of this channel.

error!("Unexpected response `{other:?}` from peer `{peer}`");
// TODO: Ideally, we should send `Some(peer)` to end the connection,
// but we don't want to cause a breaking change yet.
response_tx.send(None).await.unwrap();
Collaborator

Same for this None.

@@ -733,11 +818,27 @@ fn start_gossipsub(
}
}

while let Poll::Ready(Some(Some(peer_id))) = timestamp_rx.poll_next_unpin(cx) {
Collaborator

And we can make this channel transmit PeerIDs instead of Option<PeerID>s.
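A sketch of the suggested simplification, where the checker sends only the peers that must be disconnected and sends nothing otherwise. It swaps in std::sync::mpsc (drained with try_iter) for the async channel polled with poll_next_unpin in the hunk above, and PeerId here is a String alias rather than libp2p's type; check_peer and drain_disconnects are hypothetical names.

```rust
use std::sync::mpsc;

/// Stand-in for libp2p's `PeerId` (assumption: a plain string for this sketch).
type PeerId = String;

const MAX_TIME_GAP_FOR_CONNECTED_PEER: u64 = 30;

/// Timestamp checker: only peers outside the acceptable gap are sent.
fn check_peer(tx: &mpsc::Sender<PeerId>, peer: &str, local_ts: u64, peer_ts: u64) {
    if local_ts.abs_diff(peer_ts) > MAX_TIME_GAP_FOR_CONNECTED_PEER {
        // The receiver may already be gone during shutdown; ignore that case.
        let _ = tx.send(peer.to_owned());
    }
}

/// Event-loop side: drain whatever is queued without blocking
/// (analogous to looping on `poll_next_unpin` while it yields `Ready`).
fn drain_disconnects(rx: &mpsc::Receiver<PeerId>) -> Vec<PeerId> {
    rx.try_iter().collect()
}

fn main() {
    let (tx, rx) = mpsc::channel();
    check_peer(&tx, "peer-a", 1_000, 1_010); // 10 s gap: nothing sent
    check_peer(&tx, "peer-b", 1_000, 1_120); // 120 s gap: queued for disconnect
    assert_eq!(drain_disconnects(&rx), vec!["peer-b".to_string()]);
}
```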

@onur-ozkan merged commit 87be260 into dev Dec 23, 2024
18 of 23 checks passed
@onur-ozkan deleted the time-synced-network branch December 23, 2024 12:06
@dimxy (Collaborator) commented Dec 23, 2024

BTW, I tried to run the tests in the mm2_p2p crate with cargo test -p mm2_p2p but got "could not find application in the crate root" errors (on the previous commit the same command ran fine).

@onur-ozkan (Member Author)

BTW I tried to run tests in the mm2_p2p module with command cargo test -p mm2_p2p but received errors could not find application in the crate root (however in the previous commit this command ran okay)

That's weird lol. The crate path can't be resolved when targeting mm2_p2p specifically, but it does build when building the whole workspace (which is what we do on CI by default). We can fix that in a follow-up PR.

@dimxy (Collaborator) commented Dec 23, 2024

BTW I tried to run tests in the mm2_p2p module with command cargo test -p mm2_p2p but received errors could not find application in the crate root (however in the previous commit this command ran okay)

That's weird lol. The crate path can't be resolved when targeting mm2_p2p specifically but it does build when building whole workspace (which we do on CI by default). We can fix that as a follow-up PR.

cargo test -p mm2_p2p --features "application" is needed
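One plausible explanation of the error, assuming mm2_p2p gates a module behind the application feature named in the command above: without the feature, any ungated path into that module fails to resolve. The module contents and the describe function are hypothetical, and the actual fix in #2302 may differ.

```rust
// Hypothetical layout: a module that only exists when the `application`
// Cargo feature is enabled (declared under `[features]` in Cargo.toml).
#[cfg(feature = "application")]
pub mod application {
    pub fn run() -> &'static str {
        "application feature enabled"
    }
}

// An ungated reference such as `crate::application::run()` fails to compile
// without the feature, with an error like "could not find `application` in the
// crate root". Gating the caller (or its test) the same way avoids that.
#[cfg(feature = "application")]
fn describe() -> &'static str {
    crate::application::run()
}

#[cfg(not(feature = "application"))]
fn describe() -> &'static str {
    "application feature disabled"
}

fn main() {
    println!("{}", describe());
}
```

Building the whole workspace can hide this kind of error, because Cargo unifies features across workspace members, so another crate enabling application turns it on for mm2_p2p as well.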

@onur-ozkan
Copy link
Member Author

I'm aware; it's already fixed in #2302.

6 participants