CI: Reduce our monthly minutes usage #1366

Closed

jsdw opened this issue Jan 15, 2024 · 4 comments

jsdw (Collaborator) commented Jan 15, 2024

We need to reduce our monthly CI usage so that we don't exceed an org-imposed (I think) cap on monthly CI minutes. Currently, Subxt is the worst offender.

The 45-minute timeout added to our tests should already help with this a lot (and perhaps we can reduce it further to, say, 30 minutes; we'd have to see! We don't want potentially successful jobs timing out).
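For reference, this is just the job-level `timeout-minutes` setting in GitHub Actions; a minimal sketch (the job name, step contents and exact value are illustrative, not the actual workflow config):

```yaml
jobs:
  tests:
    runs-on: ubuntu-latest
    # Hard cap on the job's runtime: a hung test run gets cancelled after 30
    # minutes rather than burning minutes up to the 6-hour default job limit.
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - run: cargo test --workspace
```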

Further, once we fix the unstable backend test timeouts (@lexnv is working on this; my hope is that we can merge a fix in the next couple of days), we should reduce our usage some more.

Next, let's see whether the faster ubuntu-8-core etc. runners (which we can assume bill a multiple of minutes based on how many multiples of 4 cores they have, e.g. 8 cores would be 2x the minutes) are actually worth it where they're used, and revert to slower machines if the total time isn't decreased significantly. We need to find a good balance here between test speed and minute usage.
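As an illustrative sketch (the job name is a placeholder, and the exact large-runner labels depend on what the org has configured), swapping runner sizes is just the `runs-on` line, so it should be cheap to experiment job by job:

```yaml
jobs:
  integration-tests:
    # Default GitHub-hosted Linux runner:
    # runs-on: ubuntu-latest
    #
    # Larger runner: bills more per minute (roughly in proportion to core count,
    # per the rates quoted below), so it only reduces total spend if it cuts
    # wall-clock time by at least the same factor.
    runs-on: ubuntu-8-core
    steps:
      - uses: actions/checkout@v4
      - run: cargo test --workspace
```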

Perhaps we could also optimise our CI pipeline so that we run the fast checks (fmt and clippy, for instance, but check how long these take to decide) in one block first, and then the rest in another block (thanks for the idea @tadeohepperle). This would catch the fairly frequent, basic fmt/clippy failures and prevent excess minutes from being used until they're fixed up, only running the long tests once the basics are all good.
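A sketch of that gating using job-level `needs:` (job names and commands are illustrative):

```yaml
jobs:
  fmt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo fmt --all -- --check

  clippy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo clippy --all-targets -- -D warnings

  # The expensive test jobs only start once the cheap checks have passed, so a
  # formatting or lint failure costs a couple of minutes rather than a full run.
  tests:
    needs: [fmt, clippy]
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - run: cargo test --workspace
```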

Related issues:

jsdw (Collaborator, Author) commented Jan 15, 2024

For reference, per-minute rates for runners are as follows (https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#per-minute-rates):

Operating system    vCPUs     Per-minute rate (USD)
Linux               2         $0.008
Linux               4         $0.016
Linux               8         $0.032
Linux               16        $0.064
Linux               32        $0.128
Linux               64        $0.256
Windows             2         $0.016
Windows             8         $0.064
Windows             16        $0.128
Windows             32        $0.256
Windows             64        $0.512
macOS               3 or 4    $0.08
macOS               12        $0.12
macOS               6 (M1)    $0.16

I believe that if you don't specify a runner, you get the 2-core Linux one (I thought until now that the 4-core one was the default). So everything else is essentially a multiple of this cost, based on the core count.
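To make the multiple concrete: a job that takes 60 minutes on the default 2-core Linux runner bills 60 × $0.008 = $0.48, while the same 60 minutes on an 8-core Linux runner bills 60 × $0.032 = $1.92, so the bigger machine only works out cheaper if it finishes the same work in under a quarter of the time.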

Addendum: when we use the basic runner, it is included in the free monthly minutes that we get as an org (which we may exceed in many months anyway, but I'd need to check again).

lexnv (Collaborator) commented Jan 17, 2024

We could further reduce CI time by sharing a single substrate-node across tests, although we should consider this at a later time since the other proposed ideas are easier to implement.

Currently, each test spawns a substrate node and submits a few transactions and RPC calls (some with side effects, others not). Further, some tests wait for the local substrate node to produce a few blocks.

By sharing the substrate node, we eliminate the overhead of spawning the node and waiting for blocks.
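If we did go this way, one way to approximate it at the workflow level would be something like the rough sketch below (the node binary path, the readiness sleep and the `SHARED_NODE_URL` variable are all hypothetical placeholders, and the test suite would also need changes to connect to an existing node rather than spawning its own):

```yaml
jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Start one dev node for the whole job and point every test at it, instead
      # of each test spawning (and waiting for) its own node.
      - run: |
          ./substrate-node --dev &
          sleep 10   # crude stand-in for "wait until the node is producing blocks"
          SHARED_NODE_URL=ws://127.0.0.1:9944 cargo test --workspace
```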

We could still have individually valid tests that interfere with each other through side effects, for example:

  • test 1: alice sends 1 DOT to bob; check that alice's account nonce == 2
  • test 2: alice submits a TX, which increments the account nonce; expect alice's account nonce == 2

The interference happens if test 1 runs to completion before test 2: test 2 assumed that alice's account started with nonce == 1.

To mitigate this, we could initialize the substrate node with multiple test accounts, and each test would use a different account provided by our testing backend. We may need to add a custom testing RPC call to populate accounts since, last time I checked, the initial accounts are seeded from polkadot-sdk code.

Would love to get some feedback on this 🙏 What do you guys think?

That said, I believe most of the CI time is related to my PRs investigating the timeout issues (lightclient and unstable backend), so we should indeed expect CI consumption to return to normal once that debugging stops.

jsdw (Collaborator, Author) commented Jan 17, 2024

Just because of the complexity and the fear of weird interactions between tests, I'd treat sharing a substrate-node across tests as a bit of a last resort really! I am hopeful that we can get our CI down to a good level with smaller timeouts, fixing the issues where we actually hit them, and running clippy+fmt+whatever first to fail early in the common case :)

jsdw (Collaborator, Author) commented Jan 19, 2024

I think we've addressed these for now, so I'll close this and we can revisit it when we get feedback on our usage going forward.

jsdw closed this as completed Jan 19, 2024