Parachain / Collator networking design proposal #1348
Comments
One major concern seems to be the pre-validation function. Cumulus at the moment allows any collator to author a block, so there is nothing to check in the pre-validation stage. Furthermore, it would be possible to register parachain Wasm which has no pre-validation check even if Cumulus didn't create those kinds of blobs. So we have to account for this base-case of an empty pre-validation check regardless. Assuming that we have to contend with this base-case of no pre-validation, the whitelisting and flow-control logic will still prove useful for defending against collators whose goal is to waste bandwidth. But we may just need to impose high bandwidth and RAM requirements for validators and their sentries.
@mxinden please file and link issues in this repo, libp2p, and Substrate for implementation work.
What does the full validation function do? Just sign the whole block directly or something? Then what is the fork-choice rule for this type of naive authorship?
The full validation function of chains is responsible for executing the transactions and per-block state transition logic and ensuring that they all proceed successfully. The fork-choice rule that we have used for Cumulus is "follow Polkadot". Polkadot has a fork-choice rule, and we determine the viable leaves of the parachain by inspecting those which have been backed in Polkadot blocks. Lastly, we also consider parachain blocks that have been seconded by a validator (but not backed) to be accepted by the fork-choice rule. See paritytech/cumulus#7 (comment)
Pre-PoVs should be header-sized so that we save a lot of bandwidth by checking them first. Their existence should indeed be optional, but they should improve networking performance by preventing spam of full PoV blocks when they are used. Ideally we would gossip these between full nodes, so that for parachains that use Aura or Sassafras, where only one collator can produce the block, we can receive the pre-PoV from a peer we are connected to and know exactly who we need to connect to to get the PoV block. We really do need PoS or PoW before we deploy parachains seriously, as they are required to deal with censorship attacks on parachains. I still have no clue how to prevent censorship on parathreads.
In regards to proposal (1):
Validating data will be in a component above. Achieving the above is in progress, but in no way done (e.g. paritytech/substrate#4552, paritytech/substrate#4631, paritytech/substrate#4661, paritytech/substrate#4691, paritytech/substrate#4766, paritytech/substrate#4767, paritytech/substrate#5042, paritytech/substrate#5390, paritytech/substrate#5669, paritytech/substrate#5748, paritytech/substrate#5983 - these are only the ones I was involved in; there are plenty more). In regards to (3):
The heuristics on IP CIDRs sound neat and good to me. Introducing these into Substrate's peerset should be a small-scoped change.
Can we define the word "interface"?
My opinion on (0) is waiting for an answer to the aforementioned question. About (1) and (2): About (3):
Unfortunately, the design of the peerset makes this extremely difficult. I'd strongly welcome any refactoring of the peerset, or even the complete removal of the peerset concept, but that is far too big an effort.
The word "interface" here means the same thing as it does for the already-existing concept of the validation-function interface.
In the long run we do need bandwidth limitation for both incoming and outgoing traffic, for the detailed security reasons I described in the OP. I don't understand what you think the difference between "limiting the rate of processing incoming gossip messages" vs "limiting the incoming bandwidth" is. BTW, limiting the overall rate is insufficient for security; for this purpose we want per-peer control over the limit.
I would appreciate something more detailed than "it is the same as this" without any link or further explanation.
The point is that defining this in detail is not critical for understanding the current proposal; besides other parts of substrate / polkadot already use the concept, so I don't see why I am being asked to define something everyone else is already using. |
Notes from meeting, will follow up with more details:
@tomaka Hey, the pre-validation interface is about adding an extra function to the Parachain Wasm that can do some light checks on a small piece of data that precedes the larger PoV that satisfies the full validation function. The effect on the networking side is to reduce the amount of data that we have to buffer or accept from somewhat untrusted collator peers.
Thanks. This is unrelated to the low-level networking and can be removed from this issue as off-topic, correct?
Ah right, now your intention is clear. This is somewhat related to networking, as it helps to set a more reasonable upper bound on the size of messages we have to buffer from each peer, whereas with the regular full-validation function there is an upper bound of 10MB per PoV block. The exact details are TBD; as I mentioned in a more recent comment above, we know of 2 options.
I realised that I had a bit of tunnel-vision here. These types of attacks are only really possible under a push-based protocol where peers are free to send us anything they want. They are much harder under a pull-based protocol where peers advertise hashes to us (or other small pieces of metadata, e.g. perhaps small proofs), and then we request the larger bodies explicitly, which gives us control over not receiving redundant stuff. In other words, in a pull-based protocol, efficiency should be close to 100%, assuming the size of the advertisements is negligible compared to the data body, and we put in place some basic protections against advertisement spam such as only allowing <X outstanding advertisements (see the sketch below). OTOH, pull has other downsides including increased latency; choosing push vs pull is a discussion for another issue. Regardless, I do feel that an efficiency-measurement mechanism as proposed by (1) is still useful for other purposes, including enabling further performance analysis, and possibly even preventing advertisement-spam in the pull protocol. However under a pull model it certainly becomes a less urgent priority than (2) or (3).
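To make the advertisement-spam protection concrete, here is a minimal Rust sketch of the "<X outstanding advertisements" idea: a peer may only have a bounded number of advertised-but-not-yet-fetched items before further advertisements are ignored. The struct and field names are illustrative, not part of any existing Substrate/Polkadot API.

```rust
use std::collections::HashMap;

/// Per-peer cap on advertisements that have not yet been pulled.
struct AdvertLimiter {
    outstanding: HashMap<u64, usize>, // peer id -> advertisements awaiting fetch
    max_outstanding: usize,           // the "X" from the discussion above
}

impl AdvertLimiter {
    fn allow_advertisement(&mut self, peer: u64) -> bool {
        let count = self.outstanding.entry(peer).or_insert(0);
        if *count >= self.max_outstanding {
            return false; // ignore (or penalise) the spammy advertisement
        }
        *count += 1;
        true
    }

    /// Called once the advertised body is fetched, or the advertisement expires.
    fn on_fetched_or_expired(&mut self, peer: u64) {
        if let Some(count) = self.outstanding.get_mut(&peer) {
            *count = count.saturating_sub(1);
        }
    }
}
```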
Using pull over push when communicating with untrusted entities (Collators) sounds reasonable to me. Would these advertisements still be sent over direct connections, as in Collators connecting directly to Validators, or would these advertisements be gossiped over a separate gossip overlay? When a Validator behind a Sentry deems an advertisement valid, would they tell their Sentry node to fetch the corresponding PoV from the Collator?
Advertisements would be sent directly, modulo a sentry acting as a proxy in the middle, which I'll specify in more detail soon. OTOH, validators will need to connect to each other to pass PoV blocks between each other (after first receiving them from a collator); this would be a separate gossip mini-network from the relay chain gossip network. I'll specify that in more detail as well.
I wrote up more details on these points, as well as some framing text, in the HackMD - @rphmeier would you mind editing the OP?
@infinity0 Updated |
Thinking about it further, sophisticated bandwidth-wasting attacks are still possible under a pull-based scheme, just different kinds. For example, send 99% of the supposed data, then disconnect. You can always say "my ISP had a dodgy connection", so we cannot simply ban all peers who drop their connections. But then their IP address won't be affected by the IP-address mechanism mentioned in (3), and they are free to reconnect, competing with honest collators. The only reliable way to defend against these types of attacks is to measure the thing we're ultimately interested in, i.e. throughput, and blacklist the very worst ones on a heuristic basis. OTOH, the most recent update added the ability for the pre-validation function to specify authorised collators, who are trusted to distribute the data without attempting these types of bandwidth attacks (and if they do attempt it, they are attacking themselves). @AlistairStewart and I believe that parachains, including PoW and other permissionless chains, should typically be able to do this - there has to be something that makes a valid block hard to forge or generate in arbitrary amounts, and this something ought to be re-usable to authenticate a certificate authorising a particular collator for distribution. Nevertheless it's unclear at the present time whether this can deter all cases of bandwidth-wasting attacks across all possible parachains, so (1) is still something to keep in mind as a general solution, even though it involves more low-level work.
We actually set this to be a simple majority (>1/2) in practice, as it reduces the maximum number of groups that can be compromised at once. Either way, I find it beyond the scope of the issue as stated.
(text courtesy of @infinity0)
Parachain networking
This is a high-level proposal sketch for everyone's convenient reference; details will be added to https://github.com/w3f/research/tree/infinity0/paranet, including more justification, explanation, and background.
It should be implementable more or less incrementally, e.g. by going through the points below in order. Perhaps (4) can be done in a very basic form before (1).
Protocol workflow
Parachain networking forms the first main part of how parachains get their blocks finalised on the Polkadot relay chain. This happens in a few stages:
We go into these in more detail below. (2) is the most significant part, as it involves communication across a trust boundary - under the Polkadot model, from the perspective of the relay chain, validators are mostly-trusted since they are staked, whereas collators are entirely untrusted and even unauthenticated.
Collators selecting validators
Collators are expected to be full nodes of the relay chain, so they have easy access to relay chain data - specifically, which validators are assigned to their parachain at the current block.
In order to help load-balancing, the collator should shuffle this set using their own transport (TLS or QUIC) public key as a seed. Then they can try connecting to each validator in this order, stopping when the first validator accepts the connection.
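As an illustration of the shuffling idea, here is a minimal Rust sketch of deriving a deterministic connection order from the collator's own transport public key. The PRNG, the seed derivation, and the `ValidatorId` type are placeholders; a real implementation would hash the actual TLS/QUIC key and use the node's existing identity types.

```rust
/// Placeholder for whatever identifies a validator on the network.
type ValidatorId = u64;

/// Tiny splitmix64 PRNG - purely illustrative, any seeded PRNG would do.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Fisher-Yates shuffle of the assigned validator set, seeded from the
/// collator's public key bytes (the fold is a stand-in for a proper hash).
fn shuffled_validators(mut validators: Vec<ValidatorId>, pubkey: &[u8]) -> Vec<ValidatorId> {
    let seed = pubkey.iter().fold(0u64, |acc, &b| acc.rotate_left(8) ^ b as u64);
    let mut rng = SplitMix64(seed);
    for i in (1..validators.len()).rev() {
        let j = (rng.next() % (i as u64 + 1)) as usize;
        validators.swap(i, j);
    }
    validators
}

fn main() {
    let assigned = vec![10, 11, 12, 13, 14];
    let order = shuffled_validators(assigned, b"collator-quic-public-key");
    // Try connecting to validators in `order`, stopping at the first accept.
    println!("connection order: {:?}", order);
}
```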
For honest collators that choose their public key randomly, this will distribute these collators evenly across the set of validators. (Malicious collators that attempt to overwhelm a single validator are dealt with in the section below.)
Collator-validator communication
This section describes collator-validator direct communication, largely from the perspective of validators attempting to defend against potentially-malicious collators since that is the hard part.
(An honest collator being serviced by a malicious validator is a problem, but it is largely mitigated by rotating the validator groups; our 2/3-honest assumption over the validators means that the effect of a malicious validator only lasts for a short time against any parachain.)
We need a pre-validation interface, a.k.a. incremental-validation interface - see #441. This would be in addition to the existing (full) validation function interface for parachains.
This enables validators to receive PoV blocks from collators in smaller pieces. Otherwise each validator must buffer up to 30MB of potentially-bogus data from every collator peer it is servicing; or more, if they want to allow for the possibility of multiple competing PoV blocks. With this mechanism available, we can buffer much less data. This is the most urgent immediate priority.
See below for more details.
Even with an incremental-validation function as in (0), collator peers can perform bandwidth-wasting attacks by sending us valid but redundant data, which can result in a parachain losing e.g. 2/3, 3/4, etc. of its potential throughput. These attacks are hard to detect directly, since an attacker can always make a plausible-deniability defence: "I didn't know you already had the data from someone else".
To defeat these attacks, each validator should measure the proportion of non-redundant valid data it gets from each peer. If any peer remains in the bottom X% of peers efficiency-wise, for longer than Y time, then we will disconnect them and accept a connection from a new stranger peer. (X and Y should be chosen so that the resulting churn does not negatively affect performance too much, in the common case where there is no attack.)
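A rough sketch of what such an efficiency measurement could look like, treating "useful" as shorthand for non-redundant, valid bytes and using an opaque numeric peer id; all names and the exact eviction policy are illustrative, not a concrete design.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Per-peer throughput bookkeeping: how many received bytes turned out to be
/// new, valid data versus redundant or bogus data.
#[derive(Default)]
struct PeerStats {
    useful_bytes: u64,
    total_bytes: u64,
    below_threshold_since: Option<Instant>,
}

impl PeerStats {
    fn efficiency(&self) -> f64 {
        if self.total_bytes == 0 { 1.0 } else { self.useful_bytes as f64 / self.total_bytes as f64 }
    }
}

struct EfficiencyTracker {
    peers: HashMap<u64, PeerStats>, // keyed by an opaque peer id
    grace: Duration,                // the "Y time" from the proposal
}

impl EfficiencyTracker {
    /// Record a received piece; a piece that arrived unvalidated and is
    /// validated later can simply be re-reported as useful at that point.
    fn record(&mut self, peer: u64, bytes: u64, useful: bool) {
        let stats = self.peers.entry(peer).or_default();
        stats.total_bytes += bytes;
        if useful {
            stats.useful_bytes += bytes;
        }
    }

    /// Peers that have stayed in the bottom `bottom_fraction` of the
    /// efficiency ranking for longer than `grace` are eviction candidates.
    fn eviction_candidates(&mut self, bottom_fraction: f64, now: Instant) -> Vec<u64> {
        let mut ranked: Vec<(u64, f64)> =
            self.peers.iter().map(|(&id, s)| (id, s.efficiency())).collect();
        ranked.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
        let cutoff = ((ranked.len() as f64) * bottom_fraction).ceil() as usize;
        let bottom: Vec<u64> = ranked.iter().take(cutoff).map(|(id, _)| *id).collect();

        let mut evict = Vec::new();
        for (&id, stats) in self.peers.iter_mut() {
            if bottom.contains(&id) {
                let since = *stats.below_threshold_since.get_or_insert(now);
                if now.duration_since(since) > self.grace {
                    evict.push(id);
                }
            } else {
                stats.below_threshold_since = None;
            }
        }
        evict
    }
}
```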
Thus attackers are forced to compete with genuine users in terms of the actual end performance that the application cares about - efficient use of bandwidth, i.e. throughput. This is more direct than "reputation scores" with vague semantics, and hopefully more effective.
As an implementation note, received pieces may switch status after being received (e.g. be initially unvalidated, then validated later), so the measurement mechanism needs to account for this.
As a future addition, we can reserve more buffer space for unvalidated data, for peers that have historically been more efficient. One can think of this as analogous to a "credit rating".
TODO: the above applies to a push-based protocol only. It is much harder under a pull-based protocol.
Even with good bandwidth measurement as in (1), attackers can easily generate a new identity and a new IP address (e.g. within an IPv6 block), and reconnect to us again, sending us more bogus data and wasting our bandwidth.
To protect ourselves against this scenario, we want good bandwidth control in addition to measurement. For example, 80% of our bandwidth can be reserved for the top X peers efficiency-wise. Then, newly-connected peers with no efficiency score, can only waste 20% of our bandwidth.
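For instance, a trivial budget split along these lines might look like the following sketch; the 80/20 split is just the example figure from above, and the helper is hypothetical rather than an existing API.

```rust
/// Split an overall bandwidth budget so that established, efficient peers keep
/// most of it and newcomers can only consume the remainder.
fn per_peer_budget(total_bytes_per_sec: u64, top_peers: usize, new_peers: usize) -> (u64, u64) {
    let reserved = total_bytes_per_sec * 80 / 100; // e.g. 80% for the top X peers
    let open = total_bytes_per_sec - reserved;     // at most 20% for strangers
    let top_share = if top_peers == 0 { 0 } else { reserved / top_peers as u64 };
    let new_share = if new_peers == 0 { 0 } else { open / new_peers as u64 };
    (top_share, new_share)
}
```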
Even with good bandwidth control as in (2), attackers can DoS other collators by competing for a validator's attention in accepting new incoming connections. We can only defend against this via heuristics, and the most obvious piece of information is the source IP address. (For example, Bitcoin does not connect to two peers that share the same /16.)
For parachain networking, if any peer sends us data that is eventually invalidated, their IP address and violation-time is recorded on a blacklist. Since IPv6 addresses are easy to generate, this blacklist affects not only those specific addresses, but is used to generate a "heat map", and then we prefer to accept new incoming connections from cooler parts of the heat map. Violations further back in time contribute less to the heat map, since IP address allocations change over time.
Initially we can start with something very simple, and make this more sophisticated / flexible later. We also need to figure out how to make this work concretely; the standard C TCP API function `accept(2)` does not let the caller selectively choose which pending incoming connection to accept based on address, but we can see if QUIC can provide us with such an API.
The security justification is heuristic - an attacker is likely to control a clustered set of IP addresses, rather than being evenly distributed across the whole IP address space. Of course it also pollutes genuine users operating under similar IP addresses; however if no other addresses want service then we will still accept connections from the affected address ranges. Thus the heuristic is based on competition from unaffected IP address ranges, rather than being a hard block.
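A possible shape for the heat map is sketched below, assuming /16 buckets for IPv4 and /48 buckets for IPv6 and an exponential decay with a configurable half-life; all of these parameters are illustrative. When several pending incoming connections compete, we would prefer the one whose source address currently has the lowest heat.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// A rough "heat map" over IP prefixes: a violation heats its whole prefix
/// bucket, and heat decays with age so old allocations are forgiven.
struct HeatMap {
    violations: HashMap<Vec<u8>, Vec<Instant>>,
    half_life: Duration,
}

impl HeatMap {
    fn bucket(addr: IpAddr) -> Vec<u8> {
        match addr {
            IpAddr::V4(v4) => v4.octets()[..2].to_vec(), // /16
            IpAddr::V6(v6) => v6.octets()[..6].to_vec(), // /48
        }
    }

    /// Record that data from this address was eventually invalidated.
    fn record_violation(&mut self, addr: IpAddr, when: Instant) {
        self.violations.entry(Self::bucket(addr)).or_default().push(when);
    }

    /// Heat of the bucket an address falls into; older violations count less.
    fn heat(&self, addr: IpAddr, now: Instant) -> f64 {
        self.violations
            .get(&Self::bucket(addr))
            .map(|times| {
                times
                    .iter()
                    .map(|t| {
                        let age = now.duration_since(*t).as_secs_f64();
                        0.5f64.powf(age / self.half_life.as_secs_f64())
                    })
                    .sum::<f64>()
            })
            .unwrap_or(0.0)
    }
}
```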
As time goes on, parachain validation groups rotate. To help the new group bootstrap to a good set of peers initially, the old group tells the new group which peers they believe were the best efficiency-wise - acting as a whitelist.
This whitelist is only used by the new group to select their initial collator peers; after that the new group tracks efficiency and blacklist as above, i.e. by their own observations without input from the old group. [*] Generally speaking, reputation systems that rely too much on information from others, can themselves be abused more easily.
Validators can tell each other about their whitelists and blacklists; this can be used to guide the acceptance of new incoming connections, including load-balancing - for example we don't want to accept a collator that is already being served by another validator.
Since the implementation of this depends on all of the above, the details of this are left open for future elaboration, bearing in mind the point [*] above.
Validator-validator communication
Since each PoV block needs a minimum number of attestations from validators, this part helps achieve that in a reasonable amount of time. (Otherwise, the parachain collators must send the same PoV block to multiple validators directly, which may be a bandwidth burden for smaller parachains.) It also adds some protection from DoS attacks against the parachain, where malicious collators compete with honest collators for attention from the validators - if at least one honest collator sends a PoV block, the validator servicing it will pass it on to the others for attestation.
This is done via a mini-overlay network over the parachain validators, structured as a d-regular random graph, generated deterministically via some seed material from the relay chain that is specific to the parachain. Whenever a validator successfully validates a PoV block, it is forwarded along these links to any other neighbour peers that do not already have the same PoV block.
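One simple deterministic construction that every validator in the group can compute locally from the shared seed: place the group members on a ring in a seeded random order and connect each one to its d/2 successors. This is only an illustrative sketch of how such a graph could be derived, not necessarily the construction the protocol would mandate.

```rust
use std::collections::BTreeSet;

/// Derive a d-regular overlay for `n_validators` nodes from a shared seed.
/// Requires an even degree; the seeded PRNG is the same toy splitmix64
/// used in the collator shuffling example above.
fn overlay_neighbours(n_validators: usize, degree: usize, seed: u64) -> Vec<BTreeSet<usize>> {
    assert!(degree % 2 == 0 && degree / 2 < n_validators / 2);

    // Seeded Fisher-Yates ring order.
    let mut order: Vec<usize> = (0..n_validators).collect();
    let mut state = seed;
    let mut next = || {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    };
    for i in (1..order.len()).rev() {
        let j = (next() % (i as u64 + 1)) as usize;
        order.swap(i, j);
    }

    // Connect each ring position to its degree/2 successors (undirected),
    // giving every validator exactly `degree` neighbours.
    let mut neighbours = vec![BTreeSet::new(); n_validators];
    for pos in 0..n_validators {
        for step in 1..=(degree / 2) {
            let a = order[pos];
            let b = order[(pos + step) % n_validators];
            neighbours[a].insert(b);
            neighbours[b].insert(a);
        }
    }
    neighbours
}
```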
As a future addition, this network can be used for metadata broadcasts along the lines of "I have successfully validated PoV block X". Other validators when seeing this, can then favour receiving X over other PoV blocks, helping to speed up the attestation process by all preferring to receive and validate the same block, rather than different blocks at the same time.
TODO: what happens during a validator rotation? transferral of whitelists and blacklists
Passing to the relay chain
The parachain networking component is not responsible for resolving forks; however to ensure we don't overload the block production protocol with too many forks, we introduce a special type of attestation called a "proposal" that each validator is only supposed to make one of. (If they make more than one, this is grounds for slashing.)
The first PoV that a validator receives and validates, they sign a proposal for, and forward this to the relay chain gossip network.
Any subsequent PoVs that a validator receives and validates, they sign a regular attestation for, and forward this to the relay chain gossip network.
The block production protocol looks to receive a minimum quorum of attestations for each PoV block. Based on a trade-off between security and network unreliability, we set the quorum to be 2/3 of the validator set - note this is unrelated to the 2/3 consensus thresholds. TODO: think about & justify this number a bit more.
Sentry nodes
As described elsewhere, sentry nodes are essentially proxies for their single private validator, largely for anti-DoS purposes. Note that the term "private validator" is structural rather than security-related - the limited privacy is easily broken with a modest amount of resources, so should not be relied on.
In order to support the above proposal for parachain networking, sentry nodes must perform some additional functions beyond dumb proxying, described below.
Generally, the sentry node must proxy the data transfer of the PoV block - from either a collator or another validator, to the private validator recipient. This is conceptually quite straightforward; though care should be taken to ensure that backpressure from the recipient is passed through to the sender.
If we choose a pull-based protocol with advertisements: the sentry node has to remember which collator issued which advertisement, so it can forward the pull request from its private validator to the correct collator.
If we choose a push-based protocol with multi-acks: the sentry node doesn't have to remember anything; it broadcasts the multi-ack from its private validator, to all connected collators.
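For the pull-based option, the sentry-side bookkeeping could be as small as a map from advertisement to originating collator, roughly as sketched below (with placeholder types for PoV hashes and peer ids); in the push-based option this table is simply not needed.

```rust
use std::collections::HashMap;

/// Stand-ins for the real hash and peer identity types.
type PovHash = [u8; 32];
type PeerId = u64;

#[derive(Default)]
struct AdvertisementTable {
    origins: HashMap<PovHash, PeerId>,
}

impl AdvertisementTable {
    /// Called when a collator advertises a PoV; the advertisement itself is
    /// forwarded to the private validator unchanged.
    fn on_advertisement(&mut self, hash: PovHash, from: PeerId) {
        self.origins.entry(hash).or_insert(from);
    }

    /// Called when the private validator asks to pull; returns the collator
    /// the request should be forwarded to, if the advertisement is still known.
    fn route_pull(&self, hash: &PovHash) -> Option<PeerId> {
        self.origins.get(hash).copied()
    }
}
```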
Additionally, since we want validators to connect to each other, we would like the private validator to be able to control its sentries' peers. If we do not have this ability, then the multiple sentries of a private validator must co-ordinate between each other in order to avoid overloading (all connecting to) the same neighbour validator (or one of its sentry nodes). It is easier for the private validator to make this decision itself, and tell one of its sentry nodes to make the outgoing connection.
Pre-validation
A pre-validation function is defined by the parachain. Given:
together occupying no more than $reasonable KB (TBD), it returns true iff:
When a validator receives such data, it runs this function. If it returns true, this gives the collator the right to then send the larger PoV block to the validator. This provides some protection against DoS attacks by collators that send a large amount of data pretending to be a PoV block which does not then pass the full-validation function.
Security is based on the assumption (to be satisfied by the parachain) that the header is hard to create - e.g. a PoW or proving membership of a PoS staking set. If a parachain defines a weak pre-validation function, this will allow their parachain validators to be DoSed by malicious collators. So it is in the interests of the parachain to define a strong pre-validation function.
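To make the shape of this interface concrete, here is a hypothetical sketch of the hook and its size bound; the actual interface is being designed in #441 and may well differ, and all names and the size constant here are placeholders.

```rust
/// The "$reasonable KB" bound from above (TBD); enforced before calling into
/// the parachain Wasm at all.
const MAX_PRE_VALIDATION_BYTES: usize = 4 * 1024;

/// Hypothetical per-parachain pre-validation hook: given the parent head,
/// a small header-like blob, and the collator's claimed identity, perform a
/// cheap check (e.g. PoW difficulty, PoS authorship proof).
trait PreValidate {
    /// Returns true iff the candidate header plausibly extends `parent_head`
    /// and authorises this collator to send the full PoV block.
    fn pre_validate(&self, parent_head: &[u8], candidate_header: &[u8], collator_id: &[u8]) -> bool;
}

/// Validator-side gate: only accept the header if it is small enough and the
/// parachain-defined pre-validation function accepts it.
fn accept_header<P: PreValidate>(pvf: &P, parent: &[u8], header: &[u8], collator: &[u8]) -> bool {
    header.len() <= MAX_PRE_VALIDATION_BYTES && pvf.pre_validate(parent, header, collator)
}
```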
Future additions
When implementing the above, please bear in mind the long-term ideas below, so as to make them not too awkward to add later.
Some way to prioritise between different proposers, for parachains that have that concept. For example, the pre-validation function could return an explicit priority number for the header; or we could have an additional comparison function over pairs of headers as an implicit priority ordering.
Censorship attacks remain possible, with or without this comparison function. e.g. bribe validators to choose their preferred collator, ignoring the priority.
Incremental validation, allowing collators to send different pieces of the same PoV block simultaneously. Some parts of this concept overlap with A&V erasure-coded pieces, and we can probably re-use a bunch of logic from there. One difficulty is that A&V erasure-coded pieces include some information not known to collators, such as some state from the relay chain.