"Severe Protocol Error" reported by nodes running older substrate/libp2p after portion of network updates #541

Open
drewstone opened this issue Jan 25, 2021 · 6 comments
Labels
I10-unconfirmed Issue might be valid, but it's not yet known.

Comments

@drewstone
Contributor

drewstone commented Jan 25, 2021

We recently began updating nodes on our testnet to our new Edgeware client. For some time the network was split into two groups that remained in sync, though peers could be identified as running "old" or "new" software. Once we began updating the remaining old nodes under our management, the old nodes that were left all dropped from their peer lists. The culprit seems to be a "Severe protocol error" like the following, reported by an old node:

Handler(PeerId("12D3KooWMVj5bwGuB8e3BB7ikvmJGXEEC8MvXWffPeC7NCHpwkRH")) => Severe protocol error: Upgrade(Select(ProtocolError(InvalidMessage)))
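To see which peers trigger this error and how often, a node's log can be tallied by peer ID. The following is a minimal sketch, assuming the log lines contain the same `Handler(PeerId("...")) => Severe protocol error: ...` shape as the line above; the function name `tally_severe_errors` and the sample input are illustrative, not part of any Substrate tooling.

```python
import re
from collections import Counter

# Matches lines shaped like the one quoted above. Any log prefix
# (timestamp, log target) is not shown in the issue, so it is ignored.
SEVERE = re.compile(
    r'Handler\(PeerId\("([^"]+)"\)\) => Severe protocol error: (.+)$'
)


def tally_severe_errors(lines):
    """Count severe protocol errors per (peer id, error text) pair."""
    counts = Counter()
    for line in lines:
        match = SEVERE.search(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Hypothetical input: the line from this issue plus an unrelated line.
sample = [
    'Handler(PeerId("12D3KooWMVj5bwGuB8e3BB7ikvmJGXEEC8MvXWffPeC7NCHpwkRH")) '
    '=> Severe protocol error: Upgrade(Select(ProtocolError(InvalidMessage)))',
    'some unrelated log line',
]
counts = tally_severe_errors(sample)
for (peer, error), n in counts.items():
    print(peer, error, n)
```

Cross-referencing the resulting peer IDs with the known "old" and "new" nodes would show whether the error only appears between mismatched versions.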

The libp2p versions used by our old and new client releases are:

3.0.8 (old):
name = "libp2p"
version = "0.19.1"
name = "libp2p-core"
version = "0.19.2"

3.1.0 (old):
name = "libp2p"
version = "0.28.1"
name = "libp2p-core"
version = "0.22.1"

3.2.0 (new):
name = "libp2p"
version = "0.33.0"
name = "libp2p-core"
version = "0.26.0"
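The excerpts above use the `name = "..."` / `version = "..."` layout of a Cargo lockfile, so the same comparison can be repeated for any release. A minimal sketch, assuming the input is Cargo.lock-style text; the helper name `libp2p_versions` and the sample string are illustrative only.

```python
def libp2p_versions(lock_text):
    """Map each libp2p package name to the list of versions found.

    Expects Cargo.lock-style text where every `name = "..."` line is
    followed by the matching `version = "..."` line.
    """
    versions = {}
    name = None
    for raw in lock_text.splitlines():
        line = raw.strip()
        if line.startswith("name = "):
            name = line.split("=", 1)[1].strip().strip('"')
        elif line.startswith("version = ") and name is not None:
            if name == "libp2p" or name.startswith("libp2p-"):
                version = line.split("=", 1)[1].strip().strip('"')
                # A lockfile may pin several versions of one package.
                versions.setdefault(name, []).append(version)
            name = None
    return versions


# Sample mirroring the 3.2.0 excerpt above, plus one unrelated package.
sample_lock = '''
[[package]]
name = "libp2p"
version = "0.33.0"

[[package]]
name = "libp2p-core"
version = "0.26.0"

[[package]]
name = "log"
version = "0.4.11"
'''
found = libp2p_versions(sample_lock)
print(found)
```

Running this against the lockfile of each release tag gives a side-by-side view of which libp2p versions old and new nodes are built with.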

It seems that the new client software is not backwards compatible. We're looking for insight to help our investigation, as well as any ideas for mitigation. We plan to upgrade our network soon, but we don't want to cause a large number of slashable events if validators fail to update in time and are then dropped by their peers because of a protocol mismatch.

Linking @jnaviask.

@drewstone
Contributor Author

In previous versions we carried over the legacy protocol commits that left Substrate around the middle of last year. We are not carrying them over in this update, as we are trying to move off the legacy protocol entirely. Was this handled smoothly for Kusama/Polkadot, and are there recommendations for how we should handle it for Edgeware?

@drewstone
Contributor Author

One of our testnet validators, who had fallen behind while running 3.1.0 and then updated the node to 3.2.0, shared this set of logs taken AFTER updating: https://pastebin.com/F3Xc3RMA. Sharing in case it is helpful.

@tomaka
Contributor

tomaka commented Jan 26, 2021

> One of our validators on the testnet who had fallen behind running 3.1.0, and then subsequently updated the node to 3.2.0 shared this set of logs AFTER updating: https://pastebin.com/F3Xc3RMA. Sharing in case this is also helpful.

You need to add this line when initializing the service: https://github.com/paritytech/polkadot/blob/44b756b0323759e426a793ce18c9d6fe2d1f576b/node/service/src/lib.rs#L573
It is now required in order to "connect" the networking to the GRANDPA voter.

@tomaka
Contributor

tomaka commented Jan 26, 2021

The so-called legacy protocol is still in the Substrate code base, and is still supposed to work. In theory, even a Substrate from mid-2018 should be able to connect to a recent Substrate. In practice, however, we don't test it, and it might not actually work.

I see that your "old" version is from August. That should in theory make it possible for old nodes to connect to new nodes, but not the other way around.

@drewstone
Contributor Author

@tomaka is there a flag accessible to the node operator, like there used to be? What we saw is that old nodes were rejecting new nodes with a protocol error, right @jnaviask?

altonen transferred this issue from paritytech/substrate Aug 24, 2023
the-right-joyce added the I10-unconfirmed label and removed the J2-unconfirmed label Aug 25, 2023
claravanstaden pushed a commit to Snowfork/polkadot-sdk that referenced this issue Dec 8, 2023
helin6 pushed a commit to boolnetwork/polkadot-sdk that referenced this issue Feb 5, 2024
bkchr pushed a commit that referenced this issue Apr 10, 2024
Projects
Status: Backlog 🗒
4 participants