Lighthouse node get instantly banned by peers on restart due to excessive concurrent requests #6106

Closed
jimmygchen opened this issue Jul 16, 2024 · 2 comments
Assignees: ackintosh
Labels: bug (Something isn't working), das (Data Availability Sampling), Networking

Comments

@jimmygchen
Member

Description

It looks like the restarting node sent out too many requests, exceeded the number of inbound_substreams allowed, and got insta-banned by its Lighthouse peers (HandlerRejected shows up in the logs):

self.events_out.push(HandlerEvent::Err(HandlerErr::Inbound {
    id: self.current_inbound_substream_id,
    proto: req.versioned_protocol().protocol(),
    error: RPCError::HandlerRejected,
}));
return self.shutdown(None);
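
To make the rejection path easier to picture, here is a minimal toy model of the receiving side. It is a sketch only, not Lighthouse's actual handler: `ToyHandler`, `Outcome`, and the exact `>=` comparison are assumptions made for illustration, and the constant simply reuses the 32 mentioned in this issue.

```rust
/// Assumed limit, taken from the "32" quoted in this issue.
const MAX_INBOUND_SUBSTREAMS: usize = 32;

#[derive(Debug, PartialEq)]
enum Outcome {
    Accepted,
    /// Stands in for RPCError::HandlerRejected in the snippet above.
    Rejected,
}

/// Hypothetical, simplified stand-in for a per-connection RPC handler.
struct ToyHandler {
    inbound_in_flight: usize,
    shut_down: bool,
}

impl ToyHandler {
    fn new() -> Self {
        ToyHandler { inbound_in_flight: 0, shut_down: false }
    }

    /// Called for every new inbound request on this connection.
    fn on_inbound_request(&mut self) -> Outcome {
        if self.shut_down || self.inbound_in_flight >= MAX_INBOUND_SUBSTREAMS {
            // Mirrors the snippet above: report the error, then shut down.
            self.shut_down = true;
            return Outcome::Rejected;
        }
        self.inbound_in_flight += 1;
        Outcome::Accepted
    }

    /// Called once the response for an inbound request has been sent.
    fn on_response_sent(&mut self) {
        self.inbound_in_flight = self.inbound_in_flight.saturating_sub(1);
    }
}
```

In this model, a burst of lookups that arrives faster than responses are sent trips the limit on the 33rd in-flight request, which matches the "Handler rejected the request" line in the logs.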

After a restart, the beacon node usually has to perform some block lookups to sync to the latest head. It seems we've exceeded the max number of concurrent requests allowed per peer (32, defined here).
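
One possible mitigation on the sending side is to cap in-flight requests per peer and queue the rest. The sketch below illustrates that idea under stated assumptions; it is not necessarily what the eventual fix does. `OutboundLimiter`, its method names, and the use of a `String` as the peer key are all hypothetical, and the cap of 32 is taken from the figure above.

```rust
use std::collections::{HashMap, VecDeque};

/// Assumed cap, matching the per-peer limit of 32 mentioned above.
const MAX_CONCURRENT_PER_PEER: usize = 32;

/// Hypothetical sender-side limiter: tracks in-flight requests per peer
/// and queues anything beyond the cap instead of sending it.
struct OutboundLimiter<R> {
    in_flight: HashMap<String, usize>,
    queued: HashMap<String, VecDeque<R>>,
}

impl<R> OutboundLimiter<R> {
    fn new() -> Self {
        OutboundLimiter { in_flight: HashMap::new(), queued: HashMap::new() }
    }

    /// Returns Some(req) if the request may be sent now;
    /// otherwise stores it in the peer's queue and returns None.
    fn try_send(&mut self, peer: &str, req: R) -> Option<R> {
        let n = self.in_flight.entry(peer.to_string()).or_insert(0);
        if *n < MAX_CONCURRENT_PER_PEER {
            *n += 1;
            Some(req)
        } else {
            self.queued.entry(peer.to_string()).or_default().push_back(req);
            None
        }
    }

    /// Call when a request to `peer` completes (response or error).
    /// Returns the next queued request for that peer, if any, which
    /// the caller should now send.
    fn on_complete(&mut self, peer: &str) -> Option<R> {
        let n = self.in_flight.get_mut(peer)?;
        *n = n.saturating_sub(1);
        let next = self.queued.get_mut(peer).and_then(|q| q.pop_front());
        if next.is_some() {
            // The dequeued request takes over the freed slot.
            *n += 1;
        }
        next
    }
}
```

With something like this in front of the RPC layer, a restarting node would drain its lookup backlog at most 32 requests at a time per peer, so the remote handler's limit would not be hit by a single burst.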

logs:

Jul 15 07:04:45.903 DEBG Received DataColumnsByRoot Request, returned: 6, request: [(0xd9a57b85fd527477dbd8188a16c1949d0849f5dba5cb364194c8183a0a82696e, [81, 18, 34, 82, 46, 110])], peer: 16Uiu2HAmHMG5brx3krT1rbsFFPYGkNKTRxoRbmSqQHQbbiPqNqwq, module: network::network_beacon_processor::rpc_methods:425
Jul 15 07:04:46.376 DEBG RPC Error, direction: Incoming, score: 0.000008898132403756511, peer_id: 16Uiu2HAmHMG5brx3krT1rbsFFPYGkNKTRxoRbmSqQHQbbiPqNqwq, client: Lighthouse: version: v5.2.1-84fdf0f+, os_version: x86_64-linux, err: Handler rejected the request, protocol: data_column_sidecars_by_root, service: libp2p, module: lighthouse_network::peer_manager:489
Jul 15 07:04:46.376 DEBG Peer has been banned, score: -100.00, peer_id: 16Uiu2HAmHMG5brx3krT1rbsFFPYGkNKTRxoRbmSqQHQbbiPqNqwq, service: libp2p, module: lighthouse_network::peer_manager::peerdb:1097
Jul 15 07:04:46.376 DEBG Peer Manager disconnecting peer, reason: Bad Score, peer_id: 16Uiu2HAmHMG5brx3krT1rbsFFPYGkNKTRxoRbmSqQHQbbiPqNqwq, service: libp2p, module: lighthouse_network::service:1726

Steps to reproduce

  1. Start a local testnet with the network_params_das_local.yaml config
  2. Stop one of the Lighthouse nodes and restart it immediately
cd scripts/local_testnet
./start_local_testnet.sh -n ./network_params_das_local.yaml

# restart a LH node
kurtosis service stop local-testnet cl-3-lighthouse-geth
kurtosis service start local-testnet cl-3-lighthouse-geth

@jimmygchen added the bug, das, and Networking labels on Jul 16, 2024
@ackintosh self-assigned this on Jul 16, 2024
@ackintosh
Member

The sequence diagram below shows what was happening when I reproduced the issue.

[image: sequence diagram]

@jimmygchen
Member Author

Completed in #6256. Thanks @ackintosh!
