Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cirrus: Improve caching effectiveness #553

Merged
merged 1 commit into from
Jan 16, 2023

Conversation

cevich
Copy link
Member

@cevich cevich commented Jan 12, 2023

Prior to this commit, the cache maintained for the main branch has grown
to over 16gigabytes. Attempts to upload it are causing frequent branch-
level CI failures due to being larger than the maximum allowable.
Attempt to fix this by introducing a cache-grooming script and altering
the cache-keys to track with dependency specification and build settings.
Content for the grooming script comes mainly from three places:

  1. Cargo documentation
  2. The rust-cache github action (typescript) source code
  3. This cargo home in CI section

For branch-level cache, the grooming script will completely clobber it's
cache at loosely-random intervals determined by the Cirrus Build ID.
This is intended to prevent the cache from growing indefinitely. It's an
non-ideal solution in a space full of very limited possibilities.

@cevich
Copy link
Member Author

cevich commented Jan 12, 2023

For comparison:

About one minute of time was saved by re-using the targets cache. However, downloading the cache took 43 seconds 😢 I'm doing something wrong.

@cevich cevich force-pushed the groom_rust_cache branch 2 times, most recently from 7243dc7 to e213870 Compare January 12, 2023 22:07
@cevich
Copy link
Member Author

cevich commented Jan 12, 2023

okay, my aging memory recalls now. The super beneficial cache to maintain is the Cargo cache. That's the real time-saver when it comes to rebuilding stuffs. I've added some new perpetration commands for it to the grooming script.

@Luap99
Copy link
Member

Luap99 commented Jan 13, 2023

Mhh the timings do not look right,
looking at a good PR (#549) I except the unit and validate task to be much faster.

As I understand the build task should take longer since it has to compile from scratch but the validate and unit test task should use the build cache from the build task. This looks way to slow now.

@cevich
Copy link
Member Author

cevich commented Jan 13, 2023

@Luap99 All tasks subsequent to build have readonly cache. It's possible I'm grooming "too much". For sure the cargo-cache is a much bigger compile time-saver than the target-cache. Which is odd, since it's the target-cache we're having size problems with. The Most recent cold-cache run for cargo-cache resulted in a 3x increase in build time. Followup (warm-cache) run is underway now.

In anycase, I'll take a look at the other tasks shortly, thanks for pointing them out.

@Luap99
Copy link
Member

Luap99 commented Jan 13, 2023

Timings look much better now, although I wonder why aarch64 still looks much slower compared to the older PRs. But maybe this is just a one off.

Also I found rust-lang/cargo#5026, which explains why target dir just keeps increasing. Basically every time as the rust toolchain is updated we need clean that dir in the future.

@cevich
Copy link
Member Author

cevich commented Jan 13, 2023

But maybe this is just a one off.

The way cache is setup now, it's keyed on a pr-by-pr basis. For branches, it's keyed on the branch-name. So the first run in a PR or new branch, cache will be cold. Then subsequent re-runs (in the same PR) will benefit from a warm cache. I did it this way to thwart accidental or intentional pollution/poisoning from one PR to another.

If you think it's safe for PRs to share cache, I can make that change. I just defaulted to "most-safe" as a matter of habit.

Also I found rust-lang/cargo#5026,

Oh interesting, yep that's got to be the problem. It also explains why the github action goes to such lengths as reading config files and checking mtimes. I didn't know there was a cargo clean command. I'll use that in my "butcher's knife" hack instead of trying to manually remove files...

Prior to this commit, the cache maintained for the main branch has grown
to over 16gigabytes.  Attempts to upload it are causing frequent branch-
level CI failures due to being larger than the maximum allowable.
Attempt to fix this by introducing a cache-grooming script and altering
the cache-keys to track with dependency specification and build settings.
Content for the grooming script comes mainly from three places:

1. [Cargo documentation](https://doc.rust-lang.org/nightly/cargo/guide/build-cache.html)
2. The rust-cache github action [(typescript) source code](https://github.com/Swatinem/rust-cache/tree/master/src)
3. This [cargo home in CI section](https://doc.rust-lang.org/nightly/cargo/guide/cargo-home.html#caching-the-cargo-home-in-ci)

For branch-level cache, the grooming script will completely clobber it's
cache at loosely-random intervals determined by the Cirrus Build ID.
This is intended to prevent the cache from growing indefinitely.  It's an
non-ideal solution in a space full of very limited possibilities.

Signed-off-by: Chris Evich <cevich@redhat.com>
@cevich
Copy link
Member Author

cevich commented Jan 13, 2023

Okay, I think this PR is ready now. I tried to document what's happening as best as I could.

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@flouthoc flouthoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
/lgtm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 16, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cevich, flouthoc, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Luap99,cevich,flouthoc]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 018056a into containers:main Jan 16, 2023
cevich added a commit to cevich/aardvark-dns that referenced this pull request Jan 16, 2023
Port of containers/netavark#553

Signed-off-by: Chris Evich <cevich@redhat.com>
cevich added a commit to cevich/aardvark-dns that referenced this pull request Jan 16, 2023
Port of containers/netavark#553

Signed-off-by: Chris Evich <cevich@redhat.com>
cevich added a commit to cevich/netavark-dhcp-proxy that referenced this pull request Jan 16, 2023
Port of containers/netavark#553

Signed-off-by: Chris Evich <cevich@redhat.com>
cevich added a commit to cevich/netavark-dhcp-proxy that referenced this pull request Jan 16, 2023
Port of containers/netavark#553

Signed-off-by: Chris Evich <cevich@redhat.com>
cevich added a commit to cevich/netavark-dhcp-proxy that referenced this pull request Jan 16, 2023
Port of containers/netavark#553

Signed-off-by: Chris Evich <cevich@redhat.com>
cevich added a commit to cevich/netavark-dhcp-proxy that referenced this pull request Jan 16, 2023
Port of containers/netavark#553

Signed-off-by: Chris Evich <cevich@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants