Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bridges + Zombienet tests: improvements and fixes #2796

Closed
4 of 9 tasks
svyatonik opened this issue Jan 17, 2024 · 5 comments · Fixed by paritytech/polkadot-sdk#3071
Closed
4 of 9 tasks

Bridges + Zombienet tests: improvements and fixes #2796

svyatonik opened this issue Jan 17, 2024 · 5 comments · Fixed by paritytech/polkadot-sdk#3071
Assignees
Labels
A-feat New feature or request
Milestone

Comments

@svyatonik
Copy link
Contributor

svyatonik commented Jan 17, 2024

Since there's a lot of in-progress improvements and issues with existing zombienet bridges tests, I'm filing this issue to track the progress + avoid losing the code.

Fixes:

  • an issue when tests are running inside the merge queue. The fix is ready: Try to fix bridges+zombienet tests when running in a merge queue polkadot-sdk#2940. Need a review + approval from CI team;
  • recent XCM changes (XCM WeightTrader: Swap Fee Asset for Native Asset polkadot-sdk#1845) have broke our existing test . Now we need to create a liquidity pool and mint wrapped tokens (possibly over the bridge? currently checking if that is possible No, we need to either have TakeFirstAssetTrader (which is absent on Rococo/Westend AHs) or have existing pools - so we need to create a pool manually) before anything else;
  • locally I'm getting the [Relaychain] XCM version is unknown for destination: Location { parents: 0, interior: X1([Parachain(1000)]) } error when starting tests. Could be some mess with binary versions though. UPD: some mess indeed;
  • https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4952771 - Timeout(600), "running cmd: /home/nonroot/bridges-polkadot-sdk/bridges/zombienet/scripts/invoke-script.sh with args init-bridge-hub-westend-local within 600 secs". Timeout 10m to initialize?

Improvements:

@svyatonik svyatonik added the A-feat New feature or request label Jan 17, 2024
@svyatonik svyatonik added this to the Nice To Have milestone Jan 17, 2024
@svyatonik svyatonik self-assigned this Jan 17, 2024
@bkontur
Copy link
Contributor

bkontur commented Jan 17, 2024

XCM version is unknown for destination: Location could be related to the merged XCMv4 maybe, I could take a look

@svyatonik
Copy link
Contributor Author

XCM version is unknown for destination: Location could be related to the merged XCMv4 maybe, I could take a look

Don't do that if you have something else, unless you really want to :) Because - if that'll block me from fixing the test, I'll need to fix it anyway. Also as I said - it could be something with my local binaries, because on CI I saw some other issue

@bkontur
Copy link
Contributor

bkontur commented Jan 17, 2024

... I could take a look

ok, sure :), so correction: I could take a look, when I won't I have other stuff and still not fixed :D :D

@svyatonik
Copy link
Contributor Author

... I could take a look

ok, sure :), so correction: I could take a look, when I won't I have other stuff and still not fixed :D :D

Actually - it was a mess with binaries, so no need to check that. I'm having the same xcm::execute: [Parachain] !!! ERROR: TooExpensive locally as on the CI. Sorry for the noise @bkontur

@svyatonik
Copy link
Contributor Author

A brief history of me trying to resurrect at least our first zombienet test for Rococo <> Westend bridge. When we have introduced first test on CI, it was running fine (see e.g. https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4904168, https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4905383, ...). Then removal of fees in sufficient payments support for sufficient tokens has broke it (it was cool that we have detected that using zombienet tests on CI) and after it has been fixed, tests have been running for a while (though they've turned red sometimes) until async backing has been enabled for . That has completely broken our zombienet tests (and also manual scripts) - it looks like a regular hardware is no longer enough to run 6 relay nodes and 8 parachain nodes. So we hit the payments "Timeout hit" much often now (we've seen that before even for Rialto<>Millau, but it has never caused any critical issues).

As a result, when running locally I see, e.g. for my last run, 136 for asset hub chain that is 128 blocks long. There are 109 timeouts in relay chain logs for the same run. And there are reorgs between non-sibling blocks (e.g. Reorg on #98,0xb6c4…0cf6 to #99,0x587d…2b7a). Also e.g. parachain may import 6+ parachain blocks at the same height. E.g.:

2024-01-26 11:17:54.033  INFO tokio-runtime-worker substrate: [Parachain] ✨ Imported #118 (0xdb8f…4582)    
2024-01-26 11:17:54.042  INFO tokio-runtime-worker substrate: [Parachain] ✨ Imported #118 (0x0374…91f0)    
2024-01-26 11:18:00.038  INFO tokio-runtime-worker substrate: [Parachain] ✨ Imported #118 (0x317f…0115)    
2024-01-26 11:18:06.024  INFO tokio-runtime-worker substrate: [Parachain] ✨ Imported #118 (0x4372…6069)    
2024-01-26 11:18:12.032  INFO tokio-runtime-worker substrate: [Parachain] ✨ Imported #118 (0xdae3…f015)  

All this causes havoc in tx pool - the same transaction may be included in several blocks. When blocks are retracted, transaction is revalidated and it may happen it is revalidated on the block that already has this transaction, so it gets dropped (Stale).

There were a couple of issues that I have never expected to see - like relay chain nodes were trying to submit equivocations (iirc GRANDPA). I thought when all validators are honest, it couldn't happen, even though we are running too many processes.

I've spent some time trying to fix that (see paritytech/polkadot-sdk#2982), but I think now it is time to remove our test from CI. we may readd it later if they'll be working better than now.

github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this issue Jan 26, 2024
closes paritytech/parity-bridges-common#2796

This partially reverts the #2439 - there are some changes (unrelated to
CI) that we still want to keep. The reason of that removal is that with
async backing enabled for Rococo AH (and for other chains in the near
future), we see a lot of issues there (because we run `14` nodes +
additional standalone process within a same container and it causes a
lot of timeouts). There's no way known to me to fix it right now, so
we're removing those tests hopefully temporarily to keep CI green
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this issue Jan 26, 2024
closes paritytech/parity-bridges-common#2796

This partially reverts the #2439 - there are some changes (unrelated to
CI) that we still want to keep. The reason of that removal is that with
async backing enabled for Rococo AH (and for other chains in the near
future), we see a lot of issues there (because we run `14` nodes +
additional standalone process within a same container and it causes a
lot of timeouts). There's no way known to me to fix it right now, so
we're removing those tests hopefully temporarily to keep CI green
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this issue Jan 26, 2024
closes paritytech/parity-bridges-common#2796

This partially reverts the #2439 - there are some changes (unrelated to
CI) that we still want to keep. The reason of that removal is that with
async backing enabled for Rococo AH (and for other chains in the near
future), we see a lot of issues there (because we run `14` nodes +
additional standalone process within a same container and it causes a
lot of timeouts). There's no way known to me to fix it right now, so
we're removing those tests hopefully temporarily to keep CI green
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-feat New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants