Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate peregrine syncing issue when using the polkadot-parachain binary #4787

Open
serban300 opened this issue Jun 13, 2024 · 15 comments
Open
Assignees
Labels
T0-node This PR/Issue is related to the topic “node”.

Comments

@serban300
Copy link
Contributor

The polkadot-parachain bin should be able to run custom parachains that satisfy some assumptions, mainly using the aura consensus.

peregrine should be such a parachain but it doesn't work with polkadot-parachain. More details here: https://forum.polkadot.network/t/polkadot-parachain-omni-node-gathering-ideas-and-feedback/7823/18

@serban300 serban300 added the T0-node This PR/Issue is related to the topic “node”. label Jun 13, 2024
@serban300
Copy link
Contributor Author

serban300 commented Jun 13, 2024

I'm seeing a lot of errors that go like:

2024-06-13 12:47:28.738 DEBUG tokio-runtime-worker sync: [Parachain] Substream opened for 12D3KooWALJtiCZzcUPVsCa5f5egGfQyFhPY67kKosDw95bJqK7M, handshake [4, 145, 221, 91, 0, 0, 0, 0, 0, 117, 190, 65, 105, 246, 208, 200, 8, 87, 59, 131, 14, 183, 20, 248, 133, 39, 127, 124, 43, 225, 152, 61, 108, 70, 116, 173, 12, 114, 110, 252, 129, 160, 198, 227, 186, 195, 130, 179, 22, 166, 139, 202, 113, 65, 175, 31, 186, 80, 114, 7, 89, 76, 118, 16, 118, 132, 124, 227, 88, 174, 237, 204, 33]    
2024-06-13 12:47:28.738 TRACE tokio-runtime-worker sync: [Parachain] New peer 12D3KooWALJtiCZzcUPVsCa5f5egGfQyFhPY67kKosDw95bJqK7M [4, 145, 221, 91, 0, 0, 0, 0, 0, 117, 190, 65, 105, 246, 208, 200, 8, 87, 59, 131, 14, 183, 20, 248, 133, 39, 127, 124, 43, 225, 152, 61, 108, 70, 116, 173, 12, 114, 110, 252, 129, 160, 198, 227, 186, 195, 130, 179, 22, 166, 139, 202, 113, 65, 175, 31, 186, 80, 114, 7, 89, 76, 118, 16, 118, 132, 124, 227, 88, 174, 237, 204, 33]    
2024-06-13 12:47:28.738 TRACE tokio-runtime-worker sync: [Parachain] Validate handshake for 12D3KooWALJtiCZzcUPVsCa5f5egGfQyFhPY67kKosDw95bJqK7M    
2024-06-13 12:47:28.738 DEBUG tokio-runtime-worker sync: [Parachain] Failed to decode handshake for 12D3KooWALJtiCZzcUPVsCa5f5egGfQyFhPY67kKosDw95bJqK7M: Error { cause: None, desc: "Input buffer has still data left after decoding!" }    
2024-06-13 12:47:28.738 DEBUG tokio-runtime-worker sync: [Parachain] `SyncingEngine` rejected 12D3KooWALJtiCZzcUPVsCa5f5egGfQyFhPY67kKosDw95bJqK7M    
2024-06-13 12:47:28.738 DEBUG tokio-runtime-worker sync: [Parachain] 12D3KooWALJtiCZzcUPVsCa5f5egGfQyFhPY67kKosDw95bJqK7M does not exist in `SyncingEngine`

Not sure if they are related to the issue so far

@serban300
Copy link
Contributor Author

The problem is that peregrine uses pub type BlockNumber = u64 while polkadot-parachain uses u32

I recompiled polkadot-parachain with BlockNumber = u32 and peregrine syncs. I didn't do a full sync, but at least it's syncing:

[Parachain] ⚙️  Syncing 164.2 bps, target=#6020845 (3 peers), best: #50380 (0xed2a…13e1), finalized #0 (0xa0c6…cc21), ⬇ 405.7kiB/s ⬆ 0.3kiB/s 

@Polkadot-Forum
Copy link

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/polkadot-parachain-omni-node-gathering-ideas-and-feedback/7823/20

@ntn-x2
Copy link

ntn-x2 commented Jun 13, 2024

We've hit this issue of a fixed BlockNumber definition in a lot of different places, last in the XCM emulator tests as well. Please do let me know if I can support in any way to get this fixed, as we would love to get rid of our node binary, eventually.

@kianenigma
Copy link
Contributor

I think foremost you should migrate the blocknumber in your current node and runtime to be u32, and then solving the rest would be easier.

Within the runtime, you'd have to see all the places where you might have stored a block number in the state, and migrate it over.

Although, assuming no blocknumber is above u32::max, decoding a u64 into a u32 is the same.

@bkchr
Copy link
Member

bkchr commented Jun 13, 2024

Although, assuming no blocknumber is above u32::max, decoding a u64 into a u32 is the same.

This is not true. U32 is only 4 byte, while u64 is 8 byte. This can very easily break decoding of other stuff.

@ntn-x2
Copy link

ntn-x2 commented Jun 13, 2024

I think foremost you should migrate the blocknumber in your current node and runtime to be u32, and then solving the rest would be easier.

But why is that? Why can't we simply use a u64? The requirement for the BlockNumber trait is

pub trait BlockNumber:
	Member
	+ MaybeSerializeDeserialize
	+ MaybeFromStr
	+ Debug
	+ sp_std::hash::Hash
	+ Copy
	+ MaybeDisplay
	+ AtLeast32BitUnsigned
	+ Into<U256>
	+ TryFrom<U256>
	+ Default
	+ TypeInfo
	+ MaxEncodedLen
	+ FullCodec

@kianenigma
Copy link
Contributor

But why is that? Why can't we simply use a u64

I suspect the early versions of the omni-node will not be configurable will use u32 block number as it is more commmonly used.

@kianenigma
Copy link
Contributor

Although, assuming no blocknumber is above u32::max, decoding a u64 into a u32 is the same.

This is not true. U32 is only 4 byte, while u64 is 8 byte. This can very easily break decoding of other stuff.

True, my advice is very much a shutgun as it only applies if you decode and not decode_all, but I cannot know if all underlying code is using the former.

@ntn-x2
Copy link

ntn-x2 commented Jun 18, 2024

@kianenigma I see, so perhaps we just identified one more requirement for the upcoming omni-node 😄

@bkchr
Copy link
Member

bkchr commented Jun 21, 2024

True, my advice is very much a shutgun as it only applies if you decode and not decode_all, but I cannot know if all underlying code is using the former.

struct Something<BlockNumber> {
     number: BlockNumber,
     whatever: Vec<u8>,
}

If you change the block number there, the decoding will be completely broken. This is not related to decode/decode_all.

But why is that? Why can't we simply use a u64? The requirement for the BlockNumber trait is

The point being that every interaction between the node and the runtime, plus any usage of block number in the node requires to know the type of the blocknumber. It is generic over the block number, but only generic at compile time and not generic at runtime.

@ntn-x2
Copy link

ntn-x2 commented Jun 24, 2024

The point being that every interaction between the node and the runtime, plus any usage of block number in the node requires to know the type of the blocknumber. It is generic over the block number, but only generic at compile time and not generic at runtime.

@bkchr I am not sure I understand 100% what you are saying, sorry 😀 Are you saying that it's not possible to know at compile time which block number type to use for a given runtime? Or does your point apply to a generic node that must run all runtimes, regardless of their block number type?

@bkchr
Copy link
Member

bkchr commented Jun 24, 2024

Or does your point apply to a generic node that must run all runtimes, regardless of their block number type?

My comment applies to this.

github-merge-queue bot pushed a commit that referenced this issue Aug 28, 2024
Related to #4787

The main changes in this PR are the following:
- making the NodeSpec logic generic on the Block type
- adding an omni-node variant with u64 block number

Apart from this, the PR also moves some of the logic in `service.rs` to
the `common` subfolder

The omni-node variant with u64 block number is not used yet. We have to
either expose the option in the CLI or to read the block number from the
chain spec somehow. Will do it in a future PR.
github-merge-queue bot pushed a commit that referenced this issue Aug 28, 2024
Related to #4787

The main changes in this PR are the following:
- making the NodeSpec logic generic on the Block type
- adding an omni-node variant with u64 block number

Apart from this, the PR also moves some of the logic in `service.rs` to
the `common` subfolder

The omni-node variant with u64 block number is not used yet. We have to
either expose the option in the CLI or to read the block number from the
chain spec somehow. Will do it in a future PR.
@kianenigma
Copy link
Contributor

Last step before we can close this on our side: KILTprotocol/kilt-node#716

@ntn-x2
Copy link

ntn-x2 commented Sep 3, 2024

Amazing. We will check it out. Thanks a lot @kianenigma!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T0-node This PR/Issue is related to the topic “node”.
Projects
Status: Milestone 0
Development

No branches or pull requests

5 participants