paraInclusion is massively overestimating its weight #849

Closed
ggwpez opened this issue Mar 23, 2022 · 15 comments · Fixed by #5082

Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. T1-FRAME This PR/Issue is related to core FRAME, the framework.

Comments

@ggwpez
Member

ggwpez commented Mar 23, 2022

Take a look at some recent Polkadot blocks. Even empty blocks consume ~500ms of weight.
Benchmarking such blocks shows that only a fraction (~4%) of the weight is actually needed.
That means paraInclusion is massively overestimating its weight cost.
@shawntabrizi noted that we could think about refunding weight here.
[Screenshot from 2022-03-23 15-57-10]

Output from benchmark-block on reference hardware with production profile and wasm executor:

```
Block 9556370 with     2 tx used   4.29% of its weight (    21,509,600 of    501,617,735 ns)
Block 9556371 with     3 tx used   4.08% of its weight (    20,643,956 of    505,412,849 ns)
Block 9556372 with     2 tx used   3.94% of its weight (    19,750,293 of    501,617,735 ns)
Block 9556373 with     2 tx used   3.98% of its weight (    20,368,847 of    511,839,008 ns)
Block 9556374 with     3 tx used   3.97% of its weight (    19,920,982 of    501,891,126 ns)
Block 9556375 with     3 tx used   3.98% of its weight (    20,114,849 of    505,370,681 ns)
```
@shawntabrizi
Member

I think we can probably solve this with a weight refund. My guess is this overestimate comes from the fact that we must assume a lot of things: how many parachains there are, how many are upgrading, how many are sending messages, etc.

If we can track those real numbers and stick it back into the benchmark function, then we can refund any unused weight.
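
For illustration, a minimal sketch of what such a refund could look like in FRAME (this is not the actual paras_inherent code; `enter_max`, `actual_weight_for`, and `ParachainsInherentData` here are hypothetical stand-ins): the dispatchable charges its worst case up front and returns the weight it really consumed, and FRAME refunds the difference.

```rust
use frame_support::pallet_prelude::*;
use frame_system::pallet_prelude::*;

// Sketch of a call inside a `#[frame_support::pallet]` module.
#[pallet::call]
impl<T: Config> Pallet<T> {
    // Charge the worst case up front; the true cost is only known after we
    // have seen how many parachains, upgrades, and messages were involved.
    #[pallet::weight(T::WeightInfo::enter_max())]
    pub fn enter(
        origin: OriginFor<T>,
        data: ParachainsInherentData, // stand-in for the real inherent data type
    ) -> DispatchResultWithPostInfo {
        ensure_none(origin)?;

        // ... process bitfields, backed candidates, disputes, counting the
        // real work done ...
        let actual = actual_weight_for(&data); // hypothetical helper

        // Returning `Some(actual)` makes FRAME refund the difference between
        // the pre-declared worst case and the actual weight.
        Ok(Some(actual).into())
    }
}
```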

@burdges

burdges commented Mar 28, 2022

It depends on what weight means too: does off-chain work count? We only do availability when backing a parachain block, but paraInclusion triggers approval checks, with roughly 40-50 nodes downloading and checking the parachain block. Also, approval checks shall become the dominant source of rewards for validators.

@ggwpez
Member Author

ggwpez commented Mar 29, 2022

I'm not sure I understand your question. The weight should reflect the time it takes validators to import the block.
If you want to specifically incentivize something, you can modify the fee per weight for that.

Otherwise we get the problem we have right now: blocks that are 25% full without much in them. @burdges

@burdges

burdges commented Mar 29, 2022

Alright cool. We de facto rate limit approvals, etc. by the number of availability cores anyways, which makes sense.

@rphmeier
Contributor

Yeah @burdges this only concerns the on-chain weight, not the implicit weight of coordinated off-chain stuff.

@kianenigma
Contributor

I think this should be tackled in the not too distant future. Perhaps someone from the FRAME team can give you a hand (as Zeke did last year), especially if someone knowledgeable is willing to help out.

@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023
@the-right-joyce the-right-joyce added I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. and removed I10-optimisation labels Aug 25, 2023
@eskimor
Member

eskimor commented Oct 8, 2023

Indeed. We need a better way of tracking weight. The current top-level benchmarking of enter does not work at all: there are too many parameters to consider, including internal state and effects, and the benchmarking system works poorly with many parameters.

We might need to go back to (semi-)manual tracking, e.g. benchmarking individual building blocks and combining them. This would significantly reduce the number of parameters needed for any single benchmark and might make "proper" weight tracking actually feasible and manageable.
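
For illustration, a minimal sketch of that composition, assuming hypothetical building-block benchmarks (`enter_empty`, `bitfield`, `backed_candidate`, and `dispute` are illustrative names, not the real bench functions):

```rust
use frame_support::weights::Weight;

// Hypothetical results of benchmarking each building block in isolation.
trait WeightInfo {
    fn enter_empty() -> Weight;      // fixed cost of an empty inherent
    fn bitfield() -> Weight;         // marginal cost per bitfield
    fn backed_candidate() -> Weight; // marginal cost per backed candidate
    fn dispute() -> Weight;          // marginal cost per dispute statement set
}

// Combine the building blocks linearly instead of benchmarking the whole
// `enter` call over every combination of parameters.
fn enter_weight<W: WeightInfo>(bitfields: u64, backed: u64, disputes: u64) -> Weight {
    W::enter_empty()
        .saturating_add(W::bitfield().saturating_mul(bitfields))
        .saturating_add(W::backed_candidate().saturating_mul(backed))
        .saturating_add(W::dispute().saturating_mul(disputes))
}
```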

@burdges

burdges commented Oct 8, 2023

Why was this so heavy in the first place? Just on-chain messages? If so, we could bill for inclusion (cheap) and message count (expensive).

@eskimor
Member

eskimor commented Oct 8, 2023

The way it is calculated is just severely messed up, combined with the benchmarking system doing weird things when you have many parameters. We can chat if you are interested.

claravanstaden pushed a commit to Snowfork/polkadot-sdk that referenced this issue Dec 8, 2023
@kianenigma kianenigma added the T1-FRAME This PR/Issue is related to core FRAME, the framework. label Apr 5, 2024
@kianenigma
Contributor

related: #959

bkchr pushed a commit that referenced this issue Apr 10, 2024
@sandreim
Contributor

This needs to be fixed soon: at 500 parachain validators on Kusama, we see more than 70% weight usage.

[Screenshot 2024-07-13 at 14:29:42]

@eskimor
Member

eskimor commented Jul 13, 2024

@ordian you have been looking into this and related issues already, right?

@ordian
Member

ordian commented Jul 24, 2024

Overestimation

Looking at https://www.polkadot-weigher.com/history, we see the current block weight numbers for Kusama are around 75%:
[Screenshot 2024-07-24 at 14:23:29]
A typical block on Kusama has around 500 bitfields (with around 20 candidates included) and around 20 candidates backed in its ParaInherent, which consumes almost all of the weight.

Similarly, Polkadot's weight is around 42% with 300 bitfields and 20 backed candidates per block.

Importing these blocks in practice takes much less time, suggesting the benchmarks overestimate by more than 10x.

Back-of-the-envelope

Looking at the bench numbers for Kusama, we have:

| type     | weight             | rocksdb | paritydb |
|----------|--------------------|---------|----------|
| bitfield | 480µs + 27r + 17w  | 2.855ms | 1.546ms  |
| backed   | 1409µs + 30r + 16w | 3.759ms | 2.449ms  |

The weights differ significantly depending on whether the db weights are set to the RocksDB or the ParityDB backend (currently it's RocksDB).

Given that a typical block will contain n_validators bitfields and n_cores backed candidates, the bottleneck seems to be the bitfields (we expect n_validators = 1k and n_cores = 200 in the not so distant future). Also, as the db backend difference shows, the bottleneck is db ops, not CPU (ParityDB weights are lower than RocksDB's).

Causes

Currently, the way we calculate the weight of processing a bitfield or a backed candidate is by running the whole enter function.
There are two main problems with this approach:

  1. The weight of processing an empty inherent is non-trivial and is included in the cost of each bitfield/backed candidate.
  2. The cost of processing n bitfields != n * the cost of processing the bitfields for 1 candidate. For example, the number of signatures to check depends only on the number of validators, not on the number of included candidates, and some of the DB reads are shared between all candidates. So the benchmark for bitfields seems to be incorrect.

Short-term fix

A proposed short-term fix is to address problem 1 and to switch to ParityDB weights, as that should be the default.

The estimated weights with the proposed fix are below (note that the bench weights remain the same, but we subtract the weight of the empty parainherent when processing enter):

| type     | weight             | rocksdb | paritydb |
|----------|--------------------|---------|----------|
| empty    | 121µs + 15r + 5w   | 0.996ms | 0.491ms  |
| bitfield | 359µs + 12r + 12w  | 1.859ms | 1.058ms  |
| backed   | 1288µs + 15r + 11w | 2.763ms | 1.958ms  |
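
For illustration, a minimal sketch of the subtraction, assuming simplified bench names (`enter_empty`, `enter_bitfields`, `enter_backed_candidates` here are illustrative; see #5082 for the actual change):

```rust
use frame_support::weights::Weight;

// Hypothetical bench results, mirroring the table above.
trait WeightInfo {
    fn enter_empty() -> Weight;             // empty inherent, measured once
    fn enter_bitfields() -> Weight;         // enter with one bitfield's worth of work
    fn enter_backed_candidates() -> Weight; // enter with one backed candidate
}

// Subtract the empty-inherent overhead once instead of counting it per item.
fn paras_inherent_weight<W: WeightInfo>(bitfields: u64, backed: u64) -> Weight {
    let empty = W::enter_empty();
    let per_bitfield = W::enter_bitfields().saturating_sub(empty);
    let per_backed = W::enter_backed_candidates().saturating_sub(empty);

    empty
        .saturating_add(per_bitfield.saturating_mul(bitfields))
        .saturating_add(per_backed.saturating_mul(backed))
}
```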

Given these numbers, current Kusama blocks should be estimated to weigh 1.859ms * 500 + 20 * 2.763ms = 0.985s = 49%, shaving roughly 25 percentage points off the current total. With the switch to ParityDB db weights it would be 0.568s = 28%, another 21 points off.

With these (projected) numbers we should be able to scale to 1k validators with 200 (async backing) cores, with the para inherent weight being 1.5s = 75%.

See #5082 which addresses point 1.

We can optimize this further by addressing point 2 in a follow-up PR (being worked on).

Long-term fix

Instead of running the whole enter function to estimate the cost of processing individual bitfields/backed candidates, it would be better to refactor the code into sections that process disputes, bitfields, and backed candidates separately, and to benchmark each specific subsection/function on its own. Luckily, such a refactoring was already deemed necessary for other reasons and is also being worked on.
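
For illustration, a minimal sketch of the shape such a refactoring could take (all names here are hypothetical): each stage becomes its own function with its own benchmark, and enter just sums them.

```rust
use frame_support::weights::Weight;

// Hypothetical stand-ins for the real inherent data types.
struct Disputes;
struct Bitfields;
struct BackedCandidates;

// Each stage is small enough to benchmark on its own, with few parameters.
fn process_disputes(_d: &Disputes) -> Weight { Weight::zero() }
fn process_bitfields(_b: &Bitfields) -> Weight { Weight::zero() }
fn process_backed_candidates(_c: &BackedCandidates) -> Weight { Weight::zero() }

fn enter(d: &Disputes, b: &Bitfields, c: &BackedCandidates) -> Weight {
    process_disputes(d)
        .saturating_add(process_bitfields(b))
        .saturating_add(process_backed_candidates(c))
}
```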

Conclusion

With a simple fix (#5082), ParaInherent weight overestimation does not block scaling to 1k validators and 200 cores.

github-merge-queue bot pushed a commit that referenced this issue Aug 29, 2024
closes #849

## Context

For the background on this and the long-term fix, see
#849 (comment).

## Changes

* The weight files are renamed from `runtime_(parachains|common).*` to `polkadot_runtime_(parachains|common).*`. The reason is the renaming introduced in #4633: the weight files generated by the new weight command now include the `polkadot_` prefix.
* The WeightInfo for `paras_inherent` now includes `enter_empty`, which measures the cost of processing an empty parachains inherent. This cost is subtracted dynamically when calculating the other weights (so the other weights themselves remain the same).

## Benefits

See
#849 (comment),
but TL;DR is that we are not blocked on weights for scaling the number
of validators and cores further.

Resolved questions:
- [x] why are the new benchmarks for westend doing fewer db IOPS?
Is it due to the polkadot-sdk update (db IOPS diff),
or is the bench setup no longer valid?

https://github.com/polkadot-fellows/runtimes/blob/7723274a2c5cbb10213379271094d5180716ca7d/relay/polkadot/src/weights/runtime_parachains_paras_inherent.rs#L131-L196
Answer: see background section of #5270 

TODOs:
- [x] Rerun benchmarks for Rococo and Westend
- [x] PRDoc

---------

Co-authored-by: command-bot <>
@github-project-automation github-project-automation bot moved this from Backlog to Completed in parachains team board Aug 29, 2024
@alexggh
Contributor

alexggh commented Sep 12, 2024

Hi,

I just observed that on Kusama the paraInherent weight halved after we enacted Update Kusama v1.3.0: it went from around 73% before the enactment to around 35% after. The number of candidates does not seem to have changed, so I guess the weights changed.

That's good news, but was that expected?

@ordian
Member

ordian commented Sep 12, 2024

Yes, that's expected. See #5082 (comment)
