`frame-omni-bencher` fix for `--repeat` > 1 #5083

bkontur · 2024-07-19T16:05:04Z

Summary

I encountered a problem with the fellows' repository when I tried to regenerate weights for runtime using --steps 50 --repeat 20 and frame-omni-bencher here. I faced a strange issue with several benchmarks, and after some debugging, I discovered that e.g. pallet_xcm::SafeXcmVersion was removed during the second run. To address this, I had to apply a hotfix which sets the version every time, because the SafeXcmVersion from genesis was ignored in subsequent runs, works only for the first run.

Testing

I added several temporary CI pipelines with --repeat=2 setup for frame-omni-bencher.
Locally, see quick-benchmarks-omni-repeat-2-leave-intent and quick-benchmarks-omni-repeat-2-leave-intent how to run and simulate.

From local debugging, I see this strange behavior:

pallet_collator_selection::leave_intent and --steps=2 --repeat=1 works OK:

[2024-07-19T15:48:36Z ERROR bender::leave_intent] eligible_collators: 2
[2024-07-19T15:48:36Z ERROR bender::leave_intent] eligible_collators: 2
[2024-07-19T15:48:36Z ERROR bender::leave_intent] eligible_collators: 2
[2024-07-19T15:48:36Z ERROR bender::leave_intent] eligible_collators: 2
[2024-07-19T15:48:36Z ERROR bender::leave_intent] eligible_collators: 2
[2024-07-19T15:48:36Z ERROR bender::leave_intent] eligible_collators: 2
Pallet: "pallet_collator_selection", Extrinsic: "leave_intent", Lowest values: [], Highest values: [], Steps: 2, Repeat: 1

pallet_collator_selection::leave_intent and --steps=2 --repeat=2 fails:

[2024-07-19T15:45:59Z ERROR bender::leave_intent] eligible_collators: 2
[2024-07-19T15:45:59Z ERROR bender::leave_intent] eligible_collators: 2
[2024-07-19T15:45:59Z ERROR bender::leave_intent] eligible_collators: 0
The following 1 benchmarks failed:
- pallet_collator_selection::leave_intent
Error: Input("1 benchmarks failed")

Solution?

I suspect that genesis state is not re-initialized correctly after for second repeat - see suddenly eligible_collators: 0, but I will continue with debugging.

I think it could be related to the:

let (genesis_storage, genesis_changes) =
			self.genesis_storage::<Hasher, ExtraHostFunctions>(&chain_spec)?;

and

        /// Produce a genesis storage and genesis changes.
	///
	/// It would be easier to only return one type, but there is no easy way to convert them.
	// TODO: Re-write `BenchmarkingState` to not be such a clusterfuck and only accept
	// `OverlayedChanges` instead of a mix between `OverlayedChanges` and `State`. But this can only
	// be done once we deprecated and removed the legacy interface :(
	fn genesis_storage<H: Hash, F: HostFunctions>(

bkontur · 2024-07-19T16:09:39Z

cc: @ggwpez I will investigate later, but do you have any hints or thoughts?

ggwpez · 2024-07-19T16:30:56Z

Could be that somewhere a line like this is missing where it should reset the state:

polkadot-sdk/substrate/utils/frame/benchmarking-cli/src/pallet/command.rs

Line 347 in 9543d31

let mut changes = genesis_changes.clone();

One way to check this would be to print the storage root hashes in every benchmark and assert that they are the same.
sp_io::storage::root or something can be used for that.

ggwpez · 2024-07-19T18:31:28Z

It reproduces locally, i will take a look.

.gitlab/pipeline/test.yml

bkontur · 2024-07-19T22:59:55Z

It reproduces locally, i will take a look.

@ggwpez looks like your fix works, e.g. this passed: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6758635 or https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6758774

bkontur · 2024-07-22T10:17:48Z

@ggwpez testing with fellows:

frame-omni-bencher from actual master with --repeat 2:

./frame-omni-bencher v1 benchmark pallet --runtime  ./runtimes/target/debug/wbuild/bridge-hub-kusama-runtime/bridge_hub_kusama_runtime.wasm --all --steps 2 --repeat 2
./frame-omni-bencher v1 benchmark pallet --runtime  ./runtimes/target/debug/wbuild/bridge-hub-polkadot-runtime/bridge_hub_polkadot_runtime.wasm --all --steps 2 --repeat 2
./frame-omni-bencher v1 benchmark pallet --runtime  ./runtimes/target/debug/wbuild/coretime-kusama-runtime/coretime_kusama_runtime.wasm --all --steps 2 --repeat 2

...

Failed on the first one:


The following 27 benchmarks failed:
- pallet_collator_selection::leave_intent
- pallet_xcm::force_subscribe_version_notify
- pallet_xcm::force_unsubscribe_version_notify
- pallet_xcm::send
- pallet_xcm::teleport_assets
- pallet_xcm::transfer_assets
- pallet_xcm_benchmarks::fungible::deposit_reserve_asset
- pallet_xcm_benchmarks::fungible::initiate_reserve_withdraw
- pallet_xcm_benchmarks::fungible::initiate_teleport
- pallet_xcm_benchmarks::fungible::transfer_reserve_asset
- pallet_xcm_benchmarks::generic::query_pallet
- pallet_xcm_benchmarks::generic::report_error
- pallet_xcm_benchmarks::generic::report_holding
- pallet_xcm_benchmarks::generic::report_transact_status
- pallet_xcm_benchmarks::generic::subscribe_version
- pallet_xcm_benchmarks::generic::unsubscribe_version
- snowbridge_pallet_inbound_queue::submit
- snowbridge_pallet_system::create_agent
- snowbridge_pallet_system::create_channel
- snowbridge_pallet_system::force_transfer_native_from_agent
- snowbridge_pallet_system::force_update_channel
- snowbridge_pallet_system::set_operating_mode
- snowbridge_pallet_system::set_pricing_parameters
- snowbridge_pallet_system::set_token_transfer_fees
- snowbridge_pallet_system::transfer_native_from_agent
- snowbridge_pallet_system::update_channel
- snowbridge_pallet_system::upgrade
Error: Input("27 benchmarks failed")

frame-omni-bencher from actual bko-frame-omni-bencher-fix with --repeat 2:

./frame-omni-bencher v1 benchmark pallet --runtime  ./runtimes/target/debug/wbuild/bridge-hub-kusama-runtime/bridge_hub_kusama_runtime.wasm --all --steps 2 --repeat 2
./frame-omni-bencher v1 benchmark pallet --runtime  ./runtimes/target/debug/wbuild/bridge-hub-polkadot-runtime/bridge_hub_polkadot_runtime.wasm --all --steps 2 --repeat 2
./frame-omni-bencher v1 benchmark pallet --runtime  ./runtimes/target/debug/wbuild/coretime-kusama-runtime/coretime_kusama_runtime.wasm --all --steps 2 --repeat 2


...

Passed all.

…pec-builder` stuff to `get_preset` (#379) This PR introduces a new CI pipeline for checking and running runtime benchmarks as part of the test pipelines. This means we will now know if any changes break the benchmarks. The pipeline is based on downloading `frame-omni-bencher` (building it in-place is too expensive for every matrix run) and running it with a minimal setup: `--steps 2 --repeat 1`. Another part of this PR splits the default genesis setups from `chain-spec-generator` and moves them to the `sp_genesis_builder::GenesisBuilder::get_preset` runtime API implementation for every runtime, which is required by `frame-omni-bencher`. Closes: #197 Relates to: #298 Relates to: #324   - [X] Does not require a CHANGELOG entry ## Future works When [this issue](paritytech/polkadot-sdk#5083) is fixed, I will rewrite [weight-generation.md](https://github.com/polkadot-fellows/runtimes/blob/main/docs/weight-generation.md). The new version will be significantly simplified, with instructions to simply download `frame-omni-bencher`, build the runtime WASM, and run it—no other steps will be necessary. ``` 2024-07-18 14:45:51 Using the chain spec instead of the runtime to generate the genesis state is deprecated. Please remove the `--chain`/`--dev`/`--local` argument, point `--runtime` to your runtime blob and set `--genesis-builder=runtime`. This warning may become a hard error any time after December 2024. ``` --------- Co-authored-by: Bastian Köcher <git@kchr.de>

- Part of paritytech/ci_cd#1006 - Closes: paritytech/ci_cd#1010 - Related: #4405 - Possibly affecting how frame-omni-bencher works on different runtimes: #5083 Currently works in parallel with gitlab short benchmarks. Triggered only by adding `GHA-migration` label to assure smooth transition (kind of feature-flag). Later when tested on random PRs we'll remove the gitlab and turn on by default these tests --------- Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

paritytech-cicd-pr · 2024-08-12T11:12:18Z

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable 2/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6967866

- Part of paritytech/ci_cd#1006 - Closes: paritytech/ci_cd#1010 - Related: paritytech#4405 - Possibly affecting how frame-omni-bencher works on different runtimes: paritytech#5083 Currently works in parallel with gitlab short benchmarks. Triggered only by adding `GHA-migration` label to assure smooth transition (kind of feature-flag). Later when tested on random PRs we'll remove the gitlab and turn on by default these tests --------- Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io> Fix changes_to_storage Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io> Better error messages Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io> genesis-preset flag Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io> Bench runner fix genesis storage Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io> clippy

bkontur · 2024-08-16T08:54:55Z

@ggwpez I ran into a different issue and I think everything leads to the benchmarking code path that uses OverlayedChanges with connection to the StateMachine<BenchmarkingState e.g. here.

Originally, I found the issue when doing --repeat 2 (see description), when second iteration had different genesis which was missing data from OverlayedChanges.

But now, I have a different issue, when it looks like the storage is at some point flushed or something is removed.
Let's take it step by step:

With this PR#5327 I am refactoring testnet runtimes to support get_preset feature, so basically just moving chain-spec genesis generation to the runtime.
So the only result in this PR is that, polkadot-parachain generates chain spec with genesis default from runtime file genesis_config_presets.rs, so really no functional or whatever change. And the short-benchmarks with chain-spec works for BridgeHubRococo.
Everything is ok so far and I have correct get_preset in the BridgeHubRococo runtime.
So now I enabled new short-benchmarks CI that uses frame-omni-bencher (requires get_preset) in the PR#4405 based on 5327

But the short benchmarks with frame-omni-bencher for BridgeHubRococo fails e.g. here see full log here, error:

2024-08-13T09:01:50.3787830Z assertion failed: T::is_message_successfully_dispatched(setup.last_nonce())

So I cherry-picked your fix for --repeat 1 from this PR to the PR#4405 based on 5327
And now the short benchmarks with frame-omni-bencher passed for BridgeHubRococo here with full log here.

Based on what I tried and see, I assume that there is some "bug" somewhere when using StateMachine with BenchmarkingState and OverlayedChanges (maybe, something at some point clears/flushes the storage or something like that).
So even if we go with your working fix - changes_to_storage, there will be still some bug underhood.

I don't have time to investigate it further now, but when I come back from vacation in two weeks, I will continue.
If we want to do a hotfix till that time, these two commits are the actual hotfix:
eb04569
0628716

- Part of https://github.com/paritytech/ci_cd/issues/1006 - Closes: https://github.com/paritytech/ci_cd/issues/1010 - Related: paritytech#4405 - Possibly affecting how frame-omni-bencher works on different runtimes: paritytech#5083 Currently works in parallel with gitlab short benchmarks. Triggered only by adding `GHA-migration` label to assure smooth transition (kind of feature-flag). Later when tested on random PRs we'll remove the gitlab and turn on by default these tests --------- Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io> Fix changes_to_storage Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io> Better error messages Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io> genesis-preset flag Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io> Bench runner fix genesis storage Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io> clippy

bkontur added the T12-benchmarks This PR/Issue is related to benchmarking and weights. label Jul 19, 2024

bkontur self-assigned this Jul 19, 2024

bkontur mentioned this pull request Jul 19, 2024

Short benchmarking in CI with frame-omni-bencher + extract chain-spec-builder stuff to get_preset polkadot-fellows/runtimes#379

Merged

1 task

ggwpez reviewed Jul 19, 2024

View reviewed changes

.gitlab/pipeline/test.yml Outdated Show resolved Hide resolved

ggwpez reviewed Jul 19, 2024

View reviewed changes

.gitlab/pipeline/test.yml Outdated Show resolved Hide resolved

bkontur force-pushed the bko-frame-omni-bencher-fix branch from cc826f3 to 00403dc Compare July 19, 2024 22:36

This was referenced Aug 6, 2024

frame-omni-bencher short checks #5267

Closed

frame-omni-bencher short checks #5268

Merged

bkontur and others added 15 commits August 12, 2024 12:35

Change --repeat 1 to --repeat 2 for `frame-omni-bencher for CI

aaf4aca

Set get_preset for AssetHubRococo

b64d99d

WIP:DNM - tmp debugs

f6ee848

clippy

45934bc

typo

44f28e1

typo

c8d5dd6

Bench runner fix genesis storage

b85a2a2

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

Fix changes_to_storage

619e1e6

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

Better error messages

673577b

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

genesis-preset flag

42089c8

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

fmt

59349d0

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

Add prdoc

7762b5e

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

Set get_preset for AssetHubRococo

b9f4f3a

Bench runner fix genesis storage

e02774e

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

clippy

e0dc270

bkontur force-pushed the bko-frame-omni-bencher-fix branch from fea3954 to e0dc270 Compare August 12, 2024 10:56

This was referenced Aug 28, 2024

Runtime genesis generation AstarNetwork/Astar#1341

Merged

[CI] Integrate Runtime Omni Bencher AstarNetwork/Astar#1316

Closed

kianenigma mentioned this pull request Sep 3, 2024

polkadot-omni-node: Fully remove benchmark sub-command #5564

Open

This was referenced Sep 11, 2024

use RuntimeGenesisConfig in genesis config presets polkadot-fellows/runtimes#451

Open

Clean up genesis initialization in benchmarking-cli #5694

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`frame-omni-bencher` fix for `--repeat` > 1 #5083

`frame-omni-bencher` fix for `--repeat` > 1 #5083

bkontur commented Jul 19, 2024

bkontur commented Jul 19, 2024

ggwpez commented Jul 19, 2024 •

edited

Loading

ggwpez commented Jul 19, 2024

bkontur commented Jul 19, 2024

bkontur commented Jul 22, 2024

paritytech-cicd-pr commented Aug 12, 2024

bkontur commented Aug 16, 2024

frame-omni-bencher fix for --repeat > 1 #5083

Are you sure you want to change the base?

frame-omni-bencher fix for --repeat > 1 #5083

Conversation

bkontur commented Jul 19, 2024

Summary

Testing

Solution?

bkontur commented Jul 19, 2024

ggwpez commented Jul 19, 2024 • edited Loading

ggwpez commented Jul 19, 2024

bkontur commented Jul 19, 2024

bkontur commented Jul 22, 2024

paritytech-cicd-pr commented Aug 12, 2024

bkontur commented Aug 16, 2024

`frame-omni-bencher` fix for `--repeat` > 1 #5083

`frame-omni-bencher` fix for `--repeat` > 1 #5083

ggwpez commented Jul 19, 2024 •

edited

Loading