Fix premature disconnections from seeds #5057

Merged

Conversation


@chimp1984 chimp1984 commented Jan 6, 2021

Fixes issues with disconnections while a peer is still in the initial data request phase.

We have several Request/Response cycles at startup:

  1. PreliminaryGetDataRequest -> GetDataResponse
  2. GetUpdatedDataRequest -> GetDataResponse
  3. GetBlocksRequest -> GetBlocksResponse
  4. GetDaoStateHashesRequest -> GetDaoStateHashesResponse
  5. GetBlindVoteStateHashesRequest -> GetBlindVoteStateHashesResponse
  6. GetProposalStateHashesRequest -> GetProposalStateHashesResponse

We set the PeerType to INITIAL_DATA_REQUEST, but we reset it to PEER as soon as the seed sent the GetDataResponse for the GetUpdatedDataRequest. This was too early, because the PeerManager's connection management disconnects low-priority connections (PEER) if there are too many connections.
This led to the problem that the seed closed the connection before the peer could start the GetBlocksRequest.
The last GetDataResponse was also at risk of not arriving, as the connection might have been closed too quickly.

We changed that model so that we now count the requests and responses of this bootstrap phase, and once we reach the expected number we assume we are done and reset the PeerType to PEER. As we cannot fully rely on that, we also start a timer that resets it after 4 minutes.
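
As a rough illustration of that counting-plus-timer idea, a minimal sketch could look like the following. The class name, method names and the expected count of 6 are assumptions for illustration only, not the actual ConnectionState code:

```java
import java.util.Timer;
import java.util.TimerTask;

// Illustrative sketch only: track initial-data requests/responses per connection and
// fall back to a timer so the connection is always downgraded to PEER eventually.
class ConnectionStateSketch {
    // Hypothetical constants; the PR text mentions 6 request/response cycles and a 4 minute fallback.
    private static final int EXPECTED_REQUESTS = 6;
    private static final long MAX_INITIAL_DATA_EXCHANGE_MILLIS = 4 * 60 * 1000;

    enum PeerType { PEER, INITIAL_DATA_EXCHANGE }

    private PeerType peerType = PeerType.PEER;
    private int numRequests = 0;
    private int numResponses = 0;
    private Timer resetTimer;

    void onInitialDataRequest() {
        peerType = PeerType.INITIAL_DATA_EXCHANGE;
        numRequests++;
        startResetTimerIfNeeded();
        maybeResetToPeer();
    }

    void onInitialDataResponse() {
        peerType = PeerType.INITIAL_DATA_EXCHANGE;
        numResponses++;
        startResetTimerIfNeeded();
        maybeResetToPeer();
    }

    private void maybeResetToPeer() {
        // Once we saw the expected number of requests and responses we assume
        // bootstrapping is done and downgrade the connection to low priority.
        if (numRequests >= EXPECTED_REQUESTS && numResponses >= EXPECTED_REQUESTS) {
            resetToPeer();
        }
    }

    private void startResetTimerIfNeeded() {
        // As we cannot fully rely on the counters, reset after 4 minutes in any case.
        if (resetTimer == null) {
            resetTimer = new Timer(true);
            resetTimer.schedule(new TimerTask() {
                @Override
                public void run() {
                    resetToPeer();
                }
            }, MAX_INITIAL_DATA_EXCHANGE_MILLIS);
        }
    }

    private void resetToPeer() {
        peerType = PeerType.PEER;
        if (resetTimer != null) {
            resetTimer.cancel();
            resetTimer = null;
        }
    }
}
```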
When we exceed 20 connections at a normal peer, or 34 at seed nodes (with maxConnections 20), we start to disconnect peers with type INITIAL_DATA_EXCHANGE as well. We sort by the oldest date at which we sent or received a message, which increases the chance that the disconnected peer is already done with bootstrapping.
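
A minimal sketch of that disconnection rule, assuming a hypothetical initialDataExchangeTrigger threshold and connection interface (not the actual PeerManager API):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: once the total number of connections exceeds the trigger,
// disconnect INITIAL_DATA_EXCHANGE peers with the oldest last-message timestamp first,
// as those are the most likely to be done with bootstrapping.
class InitialDataExchangeTriggerSketch {
    interface Conn {
        boolean isInitialDataExchange();
        long getLastInitialDataExchangeMessageTimeStamp();
        void shutDown();
    }

    void checkInitialDataExchangeConnections(List<Conn> allConnections, int initialDataExchangeTrigger) {
        if (allConnections.size() <= initialDataExchangeTrigger)
            return;

        List<Conn> candidates = allConnections.stream()
                .filter(Conn::isInitialDataExchange)
                .sorted(Comparator.comparingLong(Conn::getLastInitialDataExchangeMessageTimeStamp))
                .collect(Collectors.toList());

        int numToDisconnect = allConnections.size() - initialDataExchangeTrigger;
        candidates.stream()
                .limit(numToDisconnect)
                .forEach(Conn::shutDown);
    }
}
```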

We also removed the removeSuperfluousSeedNodes method, which was a bit dangerous from the app side: it was protected only by the allowDisconnectSeedNodes flag, which was handled by the GetBlockHandlers, so there was a risk that we disconnect before we can request the StateHashes.

We have to take care that seed nodes do not become too weakly connected to healthy nodes, as they would then miss broadcast messages if all their connections were occupied by bootstrapping nodes. I doubt that the changes here have any impact on that behaviour, but as it is pretty complex we have to be careful with deployment.

I will test it on a small group of seed nodes first and add some more statistics to see which types of connections the seed has.

Remove unnecessary setPeerType calls. ConnectionState handles that now.
Only PeerManager sets isSeedNode, as we do not have the required dependency in ConnectionState.
At INITIAL_DATA_EXCHANGE we sort by lastInitialDataExchangeMessageTimeStamp.
In the 3rd attempt we filter for INITIAL_DATA_EXCHANGE peers. Before, we excluded 2 types, and as PEER had already been filtered out earlier we would only look up SEED_NODE. This was only called by non-seed nodes.
Rename maxConnectionsPeer to outBoundPeerTrigger
Rename maxConnectionsNonDirect to initialDataExchangeTrigger

Update comment.
We handle the connections by INITIAL_DATA_EXCHANGE, which covers the seed nodes as well. Having a parallel routine is risky and makes things more complex.
We handle it in ConnectionState by counting requests and responses and adding a timer.
@chimp1984 chimp1984 marked this pull request as draft January 6, 2021 03:00

cbeams commented Jan 6, 2021

Concept ACK. Described changes make sense. I've only scanned the code changes themselves, though, so not a real review. Thanks for turning this around so quickly, @chimp1984!

Improve statistics logging
chimp1984 added a commit to chimp1984/bisq that referenced this pull request Jan 6, 2021
It reports "Wanted but not invoked:" but when debugging into it, it is called. So it seems to be some mock setup issue.
Do check for closed socket after blocking read.
Move throttle code after blocking read.
@chimp1984 chimp1984 marked this pull request as ready for review January 6, 2021 21:52
@chimp1984

I deployed it to 3 seed nodes. Observing how they behave before extending to other seeds.

I added new statistics logging per connection. That helps to analyse the peer management and see where the bottlenecks are.

Jan-06 21:49:33.738 [SeedNodeMain] INFO  b.n.p.p.PeerManager: Connection statistics: 

Connection 1
Age: 1 minute, 45.838 seconds
Peer: [Seed node] sn2bisqad7ncazupgbd3dcedqh5ptirgwofw63djwpdtftwhddo75oid.onion:8000 
Type: PEER 
Direction: Outbound
UID: e2e209db-b50e-4caa-972d-cced1afb90f8
Time since last message: 0.800 seconds
Time for response: [PreliminaryGetDataRequest/Response: 18.532 seconds, GetDaoStateHashesRequest/Response: 0.550 seconds, GetProposalStateHashesRequest/Response: 0.684 seconds, GetBlindVoteStateHashesRequest/Response: 0.621 seconds, GetUpdatedDataRequest/Response: 1.549 seconds]
Sent data: 665.76 kB; {PreliminaryGetDataRequest=1, AddDataMessage=6, GetDaoStateHashesRequest=1, GetProposalStateHashesRequest=1, GetBlindVoteStateHashesRequest=1, RefreshOfferMessage=13, GetUpdatedDataRequest=1, GetPeersRequest=1, BundleOfEnvelopes=7}
Received data: 5.068 MB; {AddDataMessage=2, GetDaoStateHashesResponse=1, GetPeersResponse=1, GetDataResponse=2, RefreshOfferMessage=11, GetProposalStateHashesResponse=1, RemoveDataMessage=1, GetBlindVoteStateHashesResponse=1}
CPU time spent on sending messages: 0.179 seconds
CPU time spent on receiving messages: 3.171 seconds

Connection 2
Age: 1 minute, 45.427 seconds
Peer: [Seed node] wizseedscybbttk4bmb2lzvbuk2jtect37lcpva4l3twktmkzemwbead.onion:8000 
Type: INITIAL_DATA_EXCHANGE 
Direction: Outbound
UID: f65a92c4-45b7-497f-a79c-107393b1287d
Time since last message: 0.701 seconds
Time for response: [PreliminaryGetDataRequest/Response: 19.199 seconds, GetDaoStateHashesRequest/Response: 0.704 seconds, GetProposalStateHashesRequest/Response: 0.834 seconds, GetBlindVoteStateHashesRequest/Response: 0.690 seconds]
Sent data: 337.351 kB; {PreliminaryGetDataRequest=1, AddDataMessage=11, GetDaoStateHashesRequest=1, GetProposalStateHashesRequest=1, GetBlindVoteStateHashesRequest=1, RefreshOfferMessage=17, RemoveDataMessage=1, RemoveMailboxDataMessage=4, BundleOfEnvelopes=10}
Received data: 5.052 MB; {GetDaoStateHashesResponse=1, GetDataResponse=1, GetProposalStateHashesResponse=1, GetBlindVoteStateHashesResponse=1}
CPU time spent on sending messages: 0.090 seconds
CPU time spent on receiving messages: 0.457 seconds

Connection 3
Age: 1 minute, 38.621 seconds
Peer: devinsn2teu33efff62bnvwbxmfgbfjlgqsu3ad4b4fudx3a725eqnyd.onion:8000 
Type: INITIAL_DATA_EXCHANGE 
Direction: Inbound
UID: 81eb9315-5186-4fd6-a153-f9597883e2e6
Time since last message: 2.445 seconds
Time for response: [GetDaoStateHashesRequest awaiting response... , GetProposalStateHashesRequest awaiting response... , GetBlindVoteStateHashesRequest awaiting response... , GetUpdatedDataRequest/Response: 0.008 seconds]
Sent data: 1.989 kB; {GetDataResponse=1, RefreshOfferMessage=1, RemoveDataMessage=1}
Received data: 369.567 kB; {AddDataMessage=8, GetDaoStateHashesRequest=1, GetProposalStateHashesRequest=1, GetBlindVoteStateHashesRequest=1, RefreshOfferMessage=25, GetUpdatedDataRequest=1}
CPU time spent on sending messages: 0.011 seconds
CPU time spent on receiving messages: 0.147 seconds


This can happen from the main shutdown routine when a timeout gets triggered.
sqrrm previously approved these changes Jan 7, 2021

@sqrrm sqrrm left a comment

utACK

Will hold off with merging until test nodes have run for a bit


ripcurlx commented Jan 7, 2021

@chimp1984 Regarding the Codacy complaint. Do you want to keep the extra if statement to improve readability?

@chimp1984

@chimp1984 Regarding the Codacy complaint. Do you want to keep the extra if statement to improve readability?

Yes, I prefer to keep it as it is.

@sqrrm sqrrm left a comment

utACK
