Improve base weights consistency and make sure they're never zero #11806
Conversation
The steps make sense to me, and seem to not break any assumptions for the benchmarking pipeline overall.
Doesn't this step assume that the first benchmark run is the absolute minimum weight the extrinsic could have? I don't think that is always true; for example, I could create a benchmark which runs over [10, 100], but where 0 is actually a valid starting point with less weight than 10. This might just be bad benchmark design, but it's a consideration here.
Great! I was very interested in seeing whether this makes sense to you before I continue with it.
Well, yeah, it just picks the minimum of all of the runs. Perhaps not ideal, but it's the simplest and most straightforward way of fixing this that I could think of. It is a good point that someone might only benchmark for, say, 10, but have the extrinsic accept a 0. That said:
If such benchmarks are not widespread, I think it should suffice to just update our docs regarding benchmarking to make explicit how the base weights are calculated, and in the (hopefully?) rare case it happens, to just eat the cost.
We have quite a lot of benchmarks with components that do not start at `0`. You can find them in the Polkadot repo with a regex search.
Maybe we can get around this by only applying your intercept-shift logic if the linear regression reports a negative intercept?
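For illustration, a minimal sketch of that conditional shift (the names and shape are hypothetical, not the actual benchmarking code):

```rust
// Hypothetical sketch: fall back to the smallest observed time only when
// the regression produces a nonsensical (negative) intercept.
fn base_weight(intercept: f64, min_observed_time: f64) -> f64 {
    if intercept < 0.0 {
        // A negative intercept can't be a real execution time.
        min_observed_time
    } else {
        intercept
    }
}
```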
So what was the rationale for not running those benchmarks with zero values?
Hmm... well, in some cases that could work, but imagine that the linear regression returns an intercept of, say, a few nanoseconds - it's technically positive, so the shift would never trigger, yet the base weight would still be absurdly small.

There's one more problem I've found when digging through the numbers: the slopes can also be zero for some benchmarks! Behold, the benchmark for `remark`. The dots are real samples from the benchmark, while the line on the bottom is the cost calculated using the weights. This happens because the slope here is so small that it gets truncated to zero at the precision with which the weights are emitted.

So, uh, this is pretty bad. Weight calculation shouldn't randomly break down like this.
In many cases, the 0 value makes no sense. For example, creating a membership group with 0 users - the underlying runtime code would probably reject this extrinsic. Regarding the slope issue, you must note that, looking at the raw, unaltered data, it is basically a zero slope: https://www.shawntabrizi.com/substrate-graph-benchmarks/old/?p=system&e=remark Yes, the benchmarking process is truncating really small slopes to zero, but these effects are also nearly zero when compared to the underlying weights here. I'm not against ideas which can bubble these things up accurately, but we should also not say that it is "breaking". The weights we are generating, even for things like remark, are good for what they are used for.
Somehow I also see a linear timing. On reference hardware,

`cargo r --release --features=runtime-benchmarks --bin substrate -- benchmark pallet --dev --steps=50 --repeat=20 --pallet="frame_system" '--extrinsic=remark' --execution=wasm --wasm-execution=compiled`

produces timing values that increase from 5.4µs at b=0 to 1_435µs at the maximum b=3932150.
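Back-of-the-envelope, those figures imply a per-unit slope well below a nanosecond, which is exactly the kind of value that truncates to zero. A quick sketch using the numbers quoted above (illustrative arithmetic only, not the actual analysis code):

```rust
fn main() {
    // Quoted above: ~5.4µs at b = 0 and ~1_435µs at b = 3_932_150.
    let growth_ns: u64 = 1_435_000 - 5_400; // total growth, ≈ 1_429_600 ns
    let range: u64 = 3_932_150;

    // At nanosecond precision the integer slope truncates to zero...
    assert_eq!(growth_ns / range, 0);

    // ...but at picosecond precision it survives as ~363 ps per unit.
    assert_eq!(growth_ns * 1_000 / range, 363);
}
```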
So can we assume that if a parameter of zero doesn't make sense and the extrinsic would reject it, then in normal use it wouldn't be called with such a parameter anyway? (Or at least wouldn't be called often if the caller is non-malicious.) In that case, does it really matter that the base weight would be higher than necessary? I know that in general there is no such guarantee, but it'd make sense to me to basically require that extrinsics are benchmarked with the full range of expected inputs to ensure optimal weights; if you don't benchmark with the full range you might get slightly higher weights, but you won't risk opening yourself up to a potential DoS attack.
Sorry, I might have used too strong a word here. My point was more that it is a subtle landmine that someone could step on without realizing it, since whether this is a problem or not will depend on the exact range the input component can have.
Yes, I was looking at the raw, unaltered data. (: And if you only look at the first 16k of the input component it is indeed basically zero and doesn't matter, but the benchmark goes up to 4 million (as @ggwpez confirmed, thanks!), where it seems to start to matter. Again, whether this is actually a problem in practice for this particular extrinsic I haven't checked, but even if it isn't, that doesn't change the fact that it is a potential subtle correctness issue that might bite someone somewhere without them realizing. So I see two ways we could fix this: either always round up the weights so they can never be zero, or increase the precision with which they're emitted.
@koute either solution sounds good to me. I am still mostly in the camp that the current system is "good enough" - we have plenty of full integration benchmarks which show that they are quite accurate (and in the cases they are not perfect, always pessimistic) - but happy to have you find these further improvements that can be made :)
Okay, I've increased the precision with which the weights are returned: the analysis code now returns the extrinsic times already premultiplied by 1000, instead of multiplying them in the writer. From my side I think this should be good to go. As a sanity check, here is an example diff comparing weight outputs before and after this PR:

@@ -62,27 +62,27 @@
impl<T: frame_system::Config> WeightInfo for SubstrateWeight<T> {
// Storage: Recovery Proxy (r:1 w:0)
fn as_recovered() -> Weight {
- (10_050_000 as Weight)
+ (10_110_000 as Weight)
.saturating_add(T::DbWeight::get().reads(1 as Weight))
}
// Storage: Recovery Proxy (r:0 w:1)
fn set_recovered() -> Weight {
- (20_610_000 as Weight)
+ (20_140_000 as Weight)
.saturating_add(T::DbWeight::get().writes(1 as Weight))
}
// Storage: Recovery Recoverable (r:1 w:1)
/// The range of component `n` is `[1, 9]`.
fn create_recovery(n: u32, ) -> Weight {
- (41_123_000 as Weight)
- // Standard Error: 14_000
- .saturating_add((155_000 as Weight).saturating_mul(n as Weight))
+ (39_920_000 as Weight)
+ // Standard Error: 5_800
+ .saturating_add((182_163 as Weight).saturating_mul(n as Weight))
.saturating_add(T::DbWeight::get().reads(1 as Weight))
.saturating_add(T::DbWeight::get().writes(1 as Weight))
}
// Storage: Recovery Recoverable (r:1 w:0)
// Storage: Recovery ActiveRecoveries (r:1 w:1)
fn initiate_recovery() -> Weight {
- (48_000_000 as Weight)
+ (47_270_000 as Weight)
.saturating_add(T::DbWeight::get().reads(2 as Weight))
.saturating_add(T::DbWeight::get().writes(1 as Weight))
}
@@ -90,9 +90,9 @@
// Storage: Recovery ActiveRecoveries (r:1 w:1)
/// The range of component `n` is `[1, 9]`.
fn vouch_recovery(n: u32, ) -> Weight {
- (31_502_000 as Weight)
- // Standard Error: 15_000
- .saturating_add((271_000 as Weight).saturating_mul(n as Weight))
+ (30_050_000 as Weight)
+ // Standard Error: 7_087
+ .saturating_add((334_068 as Weight).saturating_mul(n as Weight))
.saturating_add(T::DbWeight::get().reads(2 as Weight))
.saturating_add(T::DbWeight::get().writes(1 as Weight))
}
@@ -101,9 +101,9 @@
// Storage: Recovery Proxy (r:1 w:1)
/// The range of component `n` is `[1, 9]`.
fn claim_recovery(n: u32, ) -> Weight {
- (41_489_000 as Weight)
- // Standard Error: 16_000
- .saturating_add((192_000 as Weight).saturating_mul(n as Weight))
+ (40_270_000 as Weight)
+ // Standard Error: 6_915
+ .saturating_add((229_951 as Weight).saturating_mul(n as Weight))
.saturating_add(T::DbWeight::get().reads(3 as Weight))
.saturating_add(T::DbWeight::get().writes(1 as Weight))
}
@@ -111,9 +111,9 @@
// Storage: System Account (r:1 w:1)
/// The range of component `n` is `[1, 9]`.
fn close_recovery(n: u32, ) -> Weight {
- (47_169_000 as Weight)
- // Standard Error: 16_000
- .saturating_add((240_000 as Weight).saturating_mul(n as Weight))
+ (46_000_000 as Weight)
+ // Standard Error: 8_005
+ .saturating_add((296_745 as Weight).saturating_mul(n as Weight))
.saturating_add(T::DbWeight::get().reads(2 as Weight))
.saturating_add(T::DbWeight::get().writes(2 as Weight))
}
@@ -121,15 +121,15 @@
// Storage: Recovery Recoverable (r:1 w:1)
/// The range of component `n` is `[1, 9]`.
fn remove_recovery(n: u32, ) -> Weight {
- (44_697_000 as Weight)
- // Standard Error: 16_000
- .saturating_add((196_000 as Weight).saturating_mul(n as Weight))
+ (43_940_000 as Weight)
+ // Standard Error: 9_009
+ .saturating_add((169_804 as Weight).saturating_mul(n as Weight))
.saturating_add(T::DbWeight::get().reads(2 as Weight))
.saturating_add(T::DbWeight::get().writes(1 as Weight))
}
// Storage: Recovery Proxy (r:1 w:1)
fn cancel_recovered() -> Weight {
- (17_090_000 as Weight)
+ (17_141_000 as Weight)
.saturating_add(T::DbWeight::get().reads(1 as Weight))
.saturating_add(T::DbWeight::get().writes(1 as Weight))
}
@@ -139,27 +139,27 @@
impl WeightInfo for () {
// Storage: Recovery Proxy (r:1 w:0)
fn as_recovered() -> Weight {
- (10_050_000 as Weight)
+ (10_110_000 as Weight)
.saturating_add(RocksDbWeight::get().reads(1 as Weight))
}
// Storage: Recovery Proxy (r:0 w:1)
fn set_recovered() -> Weight {
- (20_610_000 as Weight)
+ (20_140_000 as Weight)
.saturating_add(RocksDbWeight::get().writes(1 as Weight))
}
// Storage: Recovery Recoverable (r:1 w:1)
/// The range of component `n` is `[1, 9]`.
fn create_recovery(n: u32, ) -> Weight {
- (41_123_000 as Weight)
- // Standard Error: 14_000
- .saturating_add((155_000 as Weight).saturating_mul(n as Weight))
+ (39_920_000 as Weight)
+ // Standard Error: 5_800
+ .saturating_add((182_163 as Weight).saturating_mul(n as Weight))
.saturating_add(RocksDbWeight::get().reads(1 as Weight))
.saturating_add(RocksDbWeight::get().writes(1 as Weight))
}
// Storage: Recovery Recoverable (r:1 w:0)
// Storage: Recovery ActiveRecoveries (r:1 w:1)
fn initiate_recovery() -> Weight {
- (48_000_000 as Weight)
+ (47_270_000 as Weight)
.saturating_add(RocksDbWeight::get().reads(2 as Weight))
.saturating_add(RocksDbWeight::get().writes(1 as Weight))
}
@@ -167,9 +167,9 @@
// Storage: Recovery ActiveRecoveries (r:1 w:1)
/// The range of component `n` is `[1, 9]`.
fn vouch_recovery(n: u32, ) -> Weight {
- (31_502_000 as Weight)
- // Standard Error: 15_000
- .saturating_add((271_000 as Weight).saturating_mul(n as Weight))
+ (30_050_000 as Weight)
+ // Standard Error: 7_087
+ .saturating_add((334_068 as Weight).saturating_mul(n as Weight))
.saturating_add(RocksDbWeight::get().reads(2 as Weight))
.saturating_add(RocksDbWeight::get().writes(1 as Weight))
}
@@ -178,9 +178,9 @@
// Storage: Recovery Proxy (r:1 w:1)
/// The range of component `n` is `[1, 9]`.
fn claim_recovery(n: u32, ) -> Weight {
- (41_489_000 as Weight)
- // Standard Error: 16_000
- .saturating_add((192_000 as Weight).saturating_mul(n as Weight))
+ (40_270_000 as Weight)
+ // Standard Error: 6_915
+ .saturating_add((229_951 as Weight).saturating_mul(n as Weight))
.saturating_add(RocksDbWeight::get().reads(3 as Weight))
.saturating_add(RocksDbWeight::get().writes(1 as Weight))
}
@@ -188,9 +188,9 @@
// Storage: System Account (r:1 w:1)
/// The range of component `n` is `[1, 9]`.
fn close_recovery(n: u32, ) -> Weight {
- (47_169_000 as Weight)
- // Standard Error: 16_000
- .saturating_add((240_000 as Weight).saturating_mul(n as Weight))
+ (46_000_000 as Weight)
+ // Standard Error: 8_005
+ .saturating_add((296_745 as Weight).saturating_mul(n as Weight))
.saturating_add(RocksDbWeight::get().reads(2 as Weight))
.saturating_add(RocksDbWeight::get().writes(2 as Weight))
}
@@ -198,15 +198,15 @@
// Storage: Recovery Recoverable (r:1 w:1)
/// The range of component `n` is `[1, 9]`.
fn remove_recovery(n: u32, ) -> Weight {
- (44_697_000 as Weight)
- // Standard Error: 16_000
- .saturating_add((196_000 as Weight).saturating_mul(n as Weight))
+ (43_940_000 as Weight)
+ // Standard Error: 9_009
+ .saturating_add((169_804 as Weight).saturating_mul(n as Weight))
.saturating_add(RocksDbWeight::get().reads(2 as Weight))
.saturating_add(RocksDbWeight::get().writes(1 as Weight))
}
// Storage: Recovery Proxy (r:1 w:1)
fn cancel_recovered() -> Weight {
- (17_090_000 as Weight)
+ (17_141_000 as Weight)
.saturating_add(RocksDbWeight::get().reads(1 as Weight))
.saturating_add(RocksDbWeight::get().writes(1 as Weight))
}

And here's an example diff of what the benchmarking prints out on the stdout:
@koute can you bump the version of the crate? Otherwise looking good, will review.
Sure. Do you mean the version of …?
Thanks!
The precision increase to pico-seconds is good, thanks.
bot rebase
Rebased
Let's try it with the bench bot and some pallets.

/cmd queue -c bench-bot $ pallet dev pallet_contracts
@ggwpez https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/1805446 was started for your command.
frame/contracts/src/weights.rs
}
/// The range of component `r` is `[0, 50]`.
fn instr_select(r: u32, ) -> Weight {
	Weight::from_ref_time(73_844_000 as u64)
		// Standard Error: 1_000
		.saturating_add(Weight::from_ref_time(1_773_000 as u64).saturating_mul(r as u64))
Looks like it's increasing quite a bit… but if the math was wrong before, then we have no choice.
It's a shame we can't see the raw extrinsic times, since we can't be sure whether this increase is due to the different way the regression is done or whether we've simply hit the issue from #12096. It'd be nice to have the bot always pass --json-file= to the benchmarking CLI and always upload those files in the artifacts.

Considering that the majority of the weights got bumped, I wouldn't be surprised if the increases are unrelated to this PR.
Some bench bot does that, not sure which one 😅 Maybe the Polkadot release weight-update.

But yeah, it's hopefully the increase that we spotted in the other MR. Then let's go!
bot merge

Waiting for commit status.
Improve base weights consistency and make sure they're never zero (paritytech#11806)

* Improve base weights consistency and make sure they're never zero
* Switch back to `linregress` crate; add back estimation errors
* Move the test into `mod tests`
* Use all of the samples instead of only the first one
* Use full precision extrinsic base weights and slopes
* Add an extra comment
* ".git/.scripts/bench-bot.sh" pallet dev pallet_contracts

Co-authored-by: parity-processbot <>
Co-authored-by: Shawn Tabrizi <shawntabrizi@gmail.com>
Fixes #11804
Here's my attempt at fixing the way we generate our base weights. The fix is relatively straightforward: pick the minimum execution time observed across all of the benchmark's runs and use that as the base weight (so it can never be zero), then shift the fitted regression line so that it starts at y=0.
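A rough sketch of those steps, assuming samples of `(component_value, time)` pairs (the function and its shape are made up for illustration; the real logic lives in the benchmarking analysis code):

```rust
/// Illustrative sketch: returns (base_weight, slope) for one component.
fn fit(samples: &[(u64, u64)]) -> (u64, f64) {
    // The base weight is the minimum time seen across *all* runs, so
    // unlike a regression intercept it can never be zero or negative.
    let base = samples.iter().map(|&(_, time)| time).min().expect("no samples");

    // Shift every sample down by the base and fit a least-squares line
    // through the origin: slope = sum(x * y) / sum(x^2).
    let num: f64 = samples
        .iter()
        .map(|&(x, time)| x as f64 * (time - base) as f64)
        .sum();
    let den: f64 = samples.iter().map(|&(x, _)| (x as f64) * (x as f64)).sum();
    let slope = if den == 0.0 { 0.0 } else { num / den };

    (base, slope)
}
```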
Here's a sample of how the benchmark results look before this PR:

As you can see, the base weights are all over the place, and we even have one zero here.
And here's how they look after this PR:
This PR also increases the precision of the component weights by a factor of 1000; this helps in cases where the range of inputs is very wide and prevents the component weights from becoming zero due to insufficient precision.
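To make that concrete, here's what the factor of 1000 changes for a wide-range input like `remark`'s, using the figures quoted earlier in the thread (illustrative arithmetic, not the actual generated code):

```rust
fn main() {
    let base_ns: u64 = 5_400; // measured time at b = 0
    let b_max: u64 = 3_932_150; // maximum value of the component

    // With the slope truncated to 0 ns the estimated cost never grows
    // with the input: weight = base + 0 * b.
    let old_estimate_ns = base_ns;

    // With the slope kept at ~363 ps per unit it tracks the real ~1_435µs:
    let new_estimate_ns = base_ns + 363 * b_max / 1_000;

    assert_eq!(old_estimate_ns, 5_400);
    assert_eq!(new_estimate_ns, 1_432_770);
}
```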
There might be better ways of doing this. This PR is also unfinished and should not be merged as-is - it obviously needs to be properly tested to make sure this isn't subtly broken in some way; some tests also still need fixing, and the code to print out errors needs to be added back. I'm opening this mostly to discuss whether we actually want to fix it in this way before I invest more time into it.

Update: I don't see any outstanding issues here, so the PR should be good to go.