
fork-aware-tx-pool: add heavy load tests based on zombienet #7257

Draft · wants to merge 28 commits into base: master

Conversation

@iulianbarbu (Contributor) commented Jan 20, 2025

Description

Builds towards addressing #5497 by creating zombienet-sdk code infrastructure that can spin up commonly used networks, as described in the fork-aware transaction pool testing setup added in #7100. It will be used for developing tests against such networks, and also for spawning them on demand locally through tooling developed in follow-ups.

Integration

Node/runtime developers can run tests based on the zombienet-sdk infrastructure, which spins up frequently used networks for analyzing the behavior of various node components, such as the fork-aware transaction pool.

Review Notes

  • This is work in progress; I'll follow up with the expected behavior that should be showcased for some of the scenarios described in #5497 (fatxpool: add heavy load test suites).
  • The zombienet-sdk networks setup will be extracted into its own module/crate and reused by a CLI tool developed in a follow-up (possibly named zn-spawner). That tool would simplify starting the networks regularly used for testing the fork-aware tx pool (at least), managing their logs, and integrating conveniently with tools that observe behavior during tests, such as tx-test-tool: https://github.com/michalkucharczyk/tx-test-tool/. The end result would look as if someone were using zombienet-cli in a specific way (with given DSLs), with parts of these DSL files also being customizable. More and more might end up being customizable, which would make this no different from zombienet-sdk, but having this zn-sdk glue code would still be useful because it establishes standard ways of writing ZN-based tests and verifying network behavior.

@iulianbarbu added the R0-silent label (Changes should not be mentioned in any release notes) on Jan 20, 2025
@iulianbarbu self-assigned this on Jan 20, 2025
@iulianbarbu changed the title from "Ib zn test fatp" to "fork-aware-tx-pool: add heavy load tests based on zombienet" on Jan 20, 2025
@michalkucharczyk (Contributor) left a comment:
So far looks good.

I was thinking about the CLI. Maybe we don't need it after all? Instead, we could use the following:

$ cargo test --test stand_alone -- --exact run_single_collator_network

which would just:

  • spawn the network,
  • print out the location of the executed binaries (this one is important to me, to be absolutely sure that I don't run tests on old binaries),
  • print out the location of logs file,
  • ... or maybe just print the zombienet summary (with ports, params, etc.),
  • wait forever

Then one could use any tooling to just send transactions to this network.
We could start with this and see how it goes. We could do the next iteration from here, and not over-complicate it from the beginning. In that way we would use the same config for manual testing and for pre-defined test suites.

One more idea for controlling parameters would be using environment variables (not needed in the first step). This provides flexibility; it is less convenient than CLI args, but much easier to implement.

export TXPOOLTESTS_POOL_LIMIT=1000
$ cargo test --test stand_alone -- --exact run_single_collator_network

We can still have all the integration tests in a different test module, reusing the same network configurations as those spawned in the stand_alone module, for example:

$ cargo test --test integration -- --exact single_collator_network__single_account_1M_txs

stand_alone tests would be excluded from the plain cargo test command (as they never terminate on their own).

Any thoughts on this?
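
To make this concrete, a minimal sketch of what such a stand_alone test could look like; the spawn_single_collator_network helper and the env-var handling are hypothetical placeholders for the glue code this PR introduces, not existing APIs:

#[tokio::test]
async fn run_single_collator_network() -> Result<(), anyhow::Error> {
    // Optional pool-limit override via an env var, as proposed above
    // (the variable name is illustrative).
    let pool_limit: u32 = std::env::var("TXPOOLTESTS_POOL_LIMIT")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(1000);

    // Hypothetical helper from this PR's infra: spawns the network natively
    // through zombienet-sdk and returns the network handle.
    let network = spawn_single_collator_network(pool_limit).await?;

    // Print a summary so it is obvious which binaries, logs and ports are in use.
    println!("spawned network: {network:?}");
    println!("PATH: {}", std::env::var("PATH").unwrap_or_default());

    // Wait forever so the network stays up for manual testing; this test is
    // excluded from plain `cargo test` runs.
    std::future::pending::<()>().await;
    Ok(())
}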


#[async_trait::async_trait]
impl Network for Limits30Network {
    fn ensure_bins_on_path(&self) -> bool {
Contributor:
Hi, sorry to chime in. This is already checked by zombienet-sdk internally (for each cmd to execute and the workers).

Contributor:
Is zombienet-sdk capable of printing full executable paths? (I know, I am a bit paranoid on this 😅)

@iulianbarbu (author):
@pepoviola can you point me to where we're doing these checks in zombienet-sdk?

@iulianbarbu (author) commented Feb 10, 2025:

So far the logs zombienet emits don't show the full binary paths for the node binaries it executes. @michalkucharczyk, does it help if the logs contain a dump of the $PATH variable? I'm thinking we can file a feature request for zn-sdk logs to show that when configuring the network.

At the same time, @pepoviola, is there a way to pass a log file path to zn-sdk so that the log records it emits can be tailed and easily analyzed later, instead of being written directly to stdout (which, depending on the terminal settings, can be limited and polluted with other output)?

@pepoviola (Contributor):
Hi @iulianbarbu / @michalkucharczyk, I'm working on a small CLI to spawn from TOML, and we can already load TOMLs. Do you think that could be handy here?
Thanks!

@michalkucharczyk (Contributor):

> Hi @iulianbarbu / @michalkucharczyk, I'm working on a small CLI to spawn from TOML, and we can already load TOMLs. Do you think that could be handy here? Thanks!

My 3 cents:
Our goal here is to have an abstraction that allows running a test suite against a predefined network, and also running exactly the same predefined network for manual tests.

We actually want to spawn the network programmatically, so I am not sure the CLI will be helpful here. But having some API in zombienet that would accept a TOML file and spawn the network could be helpful, especially when it comes to customization: instead of playing with CLI args or environment variables, as I proposed in my previous comment, we could just edit the TOML file.

On the other hand, it seems that using zn-sdk is not that difficult.

@iulianbarbu what is your opinion?

@iulianbarbu (author):

Responding to the last messages where @pepoviola chimed in:

> having some API in zombienet that would accept a TOML file and spawn the network could be helpful, especially when it comes to customization: instead of playing with CLI args or environment variables, as I proposed in my previous comment, we could just edit the TOML file.

+1 to this idea, @michalkucharczyk. I personally prefer Rust and zn-sdk for the testsuite, while for manual runs, if we were able to import the TOMLs directly with zn-sdk and also had the option to use them with a CLI, we would have the best of both worlds. It would just be a preference for how we'd like to do the manual testing: we can still run the testsuite locally by changing things within the Rust tests, but if we just want to run the network and then do other things against it, we'd have the CLI as well.

> we can already load TOMLs.

yup, thanks @pepoviola for confirming this offline. For reference: https://docs.rs/zombienet-sdk/latest/zombienet_sdk/struct.NetworkConfig.html#method.load_from_toml.
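
For illustration, a rough sketch of loading a TOML and spawning the network natively with zn-sdk (the file name is a placeholder, and the exact error types may differ):

use zombienet_sdk::{NetworkConfig, NetworkConfigExt};

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    // Load the network definition from a TOML file (path is a placeholder).
    let config = NetworkConfig::load_from_toml("zombienet.toml")?;
    // Spawn the network on the local machine via the native provider.
    let _network = config.spawn_native().await?;
    // Keep the process (and therefore the network) alive for manual testing.
    std::future::pending::<()>().await;
    Ok(())
}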

> I'm working on a small CLI to spawn from TOML

@pepoviola, how different would it be from the existing zombienet CLI, and why do we need another one?

@alindima (Contributor):

We don't want to commit the chainspecs to the repo, right? A better way is to generate them using build scripts, as we do here, for example: https://github.com/paritytech/polkadot-sdk/blob/master/polkadot/zombienet-sdk-tests/build.rs

@michalkucharczyk (Contributor) commented Jan 23, 2025:

> We don't want to commit the chainspecs to the repo, right? A better way is to generate them using build scripts, as we do here, for example: https://github.com/paritytech/polkadot-sdk/blob/master/polkadot/zombienet-sdk-tests/build.rs

Good point. The chain spec can be avoided with this change: #6267

It should be enough to define the dev accounts in the genesis patch (which in turn can be given in the zombienet TOML file).

However, I'm not sure we need an extra build.rs step. My guess is that it is not needed.

@iulianbarbu (author) commented Jan 23, 2025:

I think we need to mention the path to the runtime as well when mentioning the patch. That's a variable path, but we can assume it is target/release/wbuild/..., which should be fine for 99% of the cases (?). When loading the network with zn-sdk from zombienet.toml, we must mention the runtime path in the TOML file explicitly since there is no API for making changes after obtaining the network config.

> However, I'm not sure we need an extra build.rs step. My guess is that it is not needed.

I also can't see how a build.rs would help when using zombienet-sdk with TOML loading.

@michalkucharczyk (Contributor) commented Jan 24, 2025:

> I think we need to mention the path to the runtime as well when mentioning the patch.

Maybe we don't need the runtime path?
For runtimes that are already embedded in the polkadot-parachain binary, it should be enough to just give the name of the runtime. This is already done in many TOMLs across the codebase (see the sketch below).

We could skip yap, which is kind of experimental (or add it in a second phase / follow-up).
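
A rough sketch of what such a TOML could look like, selecting the parachain runtime by chain name instead of a runtime path (chain names, ids and node names below are illustrative):

[relaychain]
chain = "rococo-local"
default_command = "polkadot"

  [[relaychain.nodes]]
  name = "alice"

  [[relaychain.nodes]]
  name = "bob"

[[parachains]]
id = 2000
# The runtime is selected by chain name (already embedded in polkadot-parachain),
# so no explicit runtime path is needed.
chain = "asset-hub-rococo-local"

  [parachains.collator]
  name = "asset-hub-collator"
  command = "polkadot-parachain"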

@iulianbarbu (author):

Oh yeah, I guess asset-hub-* is covered. Leaving out experimental runtimes sounds good.

Cargo.toml: outdated review comments (resolved)
@michalkucharczyk linked an issue on Jan 29, 2025 that may be closed by this pull request
iulianbarbu and others added 3 commits January 31, 2025 22:20
let handle1 = tokio::spawn(async move {
    cmd_lib::run_cmd!(RUST_LOG=info ttxt tx --chain=sub --ws=$ws from-single-account --account 0 --count 5 --from $future_start)
});
tokio::time::sleep(Duration::from_secs(5)).await;
Contributor:
I know it is kind of temporary, but ideally we would avoid timeouts. They cause a lot of trouble in CI.

I see two options:

  • we could ask nodes whether they have enough txs in the pool; unfortunately, we don't have an API for this. See the discussion around #7138 (fatxpool: add pending_extrinsics_len RPC call).
  • but we could also add some status getter to the ttxt API and check whether txs were sent (or validated); it could be something like get_validate_count. This seems to be the mid-term way to go.
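
As a stop-gap until either of those exists, a rough sketch of polling the existing author_pendingExtrinsics RPC instead of sleeping for a fixed time (the WebSocket URL, poll interval and expected count are placeholders, and the jsonrpsee wiring may differ):

use jsonrpsee::{core::client::ClientT, rpc_params, ws_client::WsClientBuilder};
use std::time::Duration;

// Poll the node's tx pool until it reports at least `expected` pending
// extrinsics. `author_pendingExtrinsics` returns the full (hex-encoded) list,
// so this is only a stop-gap until a count-only RPC (#7138) is available.
async fn wait_for_pending_txs(ws_url: &str, expected: usize) -> anyhow::Result<()> {
    let client = WsClientBuilder::default().build(ws_url).await?;
    loop {
        let pending: Vec<String> = client
            .request("author_pendingExtrinsics", rpc_params![])
            .await?;
        if pending.len() >= expected {
            return Ok(());
        }
        // Short poll interval; placeholder value.
        tokio::time::sleep(Duration::from_millis(500)).await;
    }
}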

@iulianbarbu (author) commented Feb 10, 2025:
I agree we should avoid the sleep, because of CI and the non-deterministic way the spawned tasks might be polled behind the scenes by the tokio runtime.

What you suggested sounds useful, but maybe we can implement it for more advanced scenarios. I would avoid complicating this initial scenario, because I think we should be fine assessing future and ready txs sent in parallel in high numbers. You suggested offline that we can use from-many-accounts; to be more concrete, I would use an account range for ttxt between 0..99, with future txs starting from nonce 100 with a count of 100 (so 10k future txs), and ready txs for the same account range starting from nonce 0 with a count of 100 (so another 10k).

@iulianbarbu (author):

I implemented the above scenario and it works based on this PR: michalkucharczyk/tx-test-tool#22. #7257 is now waiting on the tx-test-tool PR to be merged and on ttxt being published as a lib on crates.io, after which the path dependency on ttxt will be replaced with a crates.io one.

Labels: R0-silent (Changes should not be mentioned in any release notes)
Projects: None yet
Development: Successfully merging this pull request may close these issues: fatxpool: add heavy load testsuits
5 participants