
Idea: reduce resource consumption during GitHub Actions workflows #1340

Open
jayaddison opened this issue Oct 24, 2024 · 4 comments

@jayaddison (Collaborator)

At the moment, each of our individual GitHub Actions workflow jobs - of which we generally launch at least 3x5 (unit tests: 3 operating systems, 5 Python versions) - performs a checkout of the complete git repository, and subsequently performs build actions from there.

This seems wasteful for various reasons:

  • Most of the commit history -- and particularly the tree content from that history -- is not required during each individual CI task. In fact, it would be a problem if any particular workflow job accessed any of that history (with the possible exception of the publish workflows, which have a recently-added conditional (GitHub Actions: releases: confirm commit exists on release branch #1312) to confirm that their commit ref is on a release branch).
  • Each of the workflows performs repetitive, duplicate work -- CPU, network and disk space are all consumed for what should otherwise be identical behaviour (modulo any small differences in operating-system behaviour).

Could we do better?

  • Perhaps we could have a single pre-workflow task that uses git archive to produce an artifact that each of the subsequent workflows builds from? (A sketch of that producer job follows this list.)
  • Do we even need to place this into an archive file, or could it simply be a read-only directory layout that the other workflows read from?
  • Do we need integrity checks to allow each workflow to confirm that the archived source matches the commit ref it is building from, and/or matches what the other workflows received?
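To make the first idea concrete: a minimal sketch of the producer job, assuming actions/checkout and actions/upload-artifact (the job name, artifact name and action versions here are illustrative, not anything decided):

```yaml
jobs:
  export-source:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Export the tree for this commit, without .git history
        run: |
          git archive --format=tar.gz --output=source.tar.gz "$GITHUB_SHA"
          # Record a checksum so that downstream jobs can verify integrity.
          sha256sum source.tar.gz > source.tar.gz.sha256
      - uses: actions/upload-artifact@v4
        with:
          name: source-${{ github.sha }}
          path: |
            source.tar.gz
            source.tar.gz.sha256
```

The checkout still happens here, but once per workflow run rather than once per job.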

I think that an effective solution here might be able to reduce the total duration of continuous integration workflows for a single commit by ~2 minutes (10 seconds for each of 15+ jobs == more than two minutes, although I'm also going to assume that we'd lose some time because each of the tasks would depend upon the initial export-from-git task).

@jayaddison (Collaborator, Author)

Partially-related:

  • Build logs: I've begun clearing out some stale (pre-October) build logs, because they do contribute towards GitHub account billing. The interface for this isn't great, as each build log has to be deleted individually. The query I'm using is -branch:main -branch:v14 created:<2024-10-01 (not branch main, not branch v14, and workflow created before 2024-10-01). I accidentally deleted a couple of recent workflows because the web page UI that I'm using for this (the only one I'm currently aware of) loses the query after each delete action. (See the automation sketch after this list.)
  • ci: suppress static-value warnings during GitHub Actions unit test workflow #1335 - reduce redundant build-log output (storage resource, primarily)
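As a sketch of what automating that cleanup could look like, using the gh CLI instead of the web UI (the workflow name, schedule, 90-day cutoff and 500-run page size are illustrative assumptions):

```yaml
name: prune-workflow-runs
on:
  schedule:
    - cron: '0 3 * * 0'  # weekly, Sunday 03:00 UTC
  workflow_dispatch: {}

permissions:
  actions: write  # required to delete workflow runs

jobs:
  prune:
    runs-on: ubuntu-latest
    steps:
      - name: Delete old runs from non-release branches
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          cutoff="$(date -u -d '90 days ago' +%Y-%m-%dT%H:%M:%SZ)"
          gh run list --repo "$GITHUB_REPOSITORY" --limit 500 \
            --json databaseId,createdAt,headBranch \
            --jq ".[] | select(.headBranch != \"main\" and .headBranch != \"v14\" and .createdAt < \"$cutoff\") | .databaseId" |
          while read -r run_id; do
            gh api -X DELETE "repos/$GITHUB_REPOSITORY/actions/runs/$run_id"
          done
```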

@jayaddison (Collaborator, Author)

> Build logs: I've begun clearing out some stale (pre-October) build logs, because they do contribute towards GitHub account billing. The interface for this isn't great, as each build log has to be deleted individually. The query I'm using is -branch:main -branch:v14 created:<2024-10-01 (not branch main, not branch v14, and workflow created before 2024-10-01). I accidentally deleted a couple of recent workflows because the web page UI that I'm using for this (the only one I'm currently aware of) loses the query after each delete action.

This is probably mostly-pointless; narrowing it to workflow:unittests could make some sense, though (hundreds of logs, as opposed to thousands), bearing in mind that the current UI deletion workflow requires probably a few seconds per item. Also, the file sizes are probably relatively insignificant compared to the size of the pip caches added recently (#1323).

@jayaddison (Collaborator, Author) commented Oct 24, 2024

> This is probably mostly-pointless; narrowing it to workflow:unittests could make some sense, though (hundreds of logs, as opposed to thousands), bearing in mind that the current UI deletion workflow requires probably a few seconds per item. Also, the file sizes are probably relatively insignificant compared to the size of the pip caches added recently (#1323).

Ok, I'm going to pause: it seems that the content of the logs for many of these entries cannot be viewed in the web UI, despite there being an option to 'delete all logs' associated with each item.

Basically, I'd like to figure out whether we can upper-bound our usage of log storage space, because I don't want this project to incur unnecessary resource expenditure. My sense is that over the past few weeks we've increased resource usage in various ways, but I don't currently have good statistics for that (despite which, I can think of some ways to reduce evident usage - e.g. the duration of test workflows, and removal of duplicate work).
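One upper bound does already exist: the repository settings include an 'Artifact and log retention' period (Settings -> Actions -> General), and individual artifact uploads can opt into a shorter window via retention-days. A sketch, assuming actions/upload-artifact (the artifact name and path are hypothetical):

```yaml
      - uses: actions/upload-artifact@v4
        with:
          name: build-output  # hypothetical artifact name
          path: dist/         # hypothetical path
          # Expire after a week instead of the repository default (up to 90 days).
          retention-days: 7
```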

@jayaddison (Collaborator, Author)

> Perhaps we could have a single pre-workflow task that uses git archive to produce an artifact that each of the subsequent workflows builds from?

The relevant documentation for this style of approach is here: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/storing-and-sharing-data-from-a-workflow#passing-data-between-jobs-in-a-workflow - note again that there remains an open question about integrity-checking the resulting data.
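For illustration, a downstream job under that approach might look roughly like this (again assuming the hypothetical export-source producer job sketched above). The sha256 check confirms that every consumer received the same bytes the producer uploaded; verifying the archive against the commit ref itself would be harder, since git archive output isn't guaranteed to be byte-identical across git versions:

```yaml
  unit-tests:
    needs: export-source
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: source-${{ github.sha }}
      - name: Verify and unpack the shared source archive
        run: |
          sha256sum --check source.tar.gz.sha256
          mkdir source && tar -xzf source.tar.gz -C source
      # ...build/test steps would run against ./source from here...
```

(The Windows jobs would need a bash shell or equivalent steps for the verification.)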
