cklin
Contributor

@cklin cklin commented Sep 4, 2025

This PR adds automation ID to the overlay-base database cache key so that we properly distinguish different analyses in the same repo for the same language.

Since I am changing the cache key format, I also moved the CodeQL bundle version to the end of the cache restore key, in case we want to remove it from the restore key sometime in the future.

Note that I chose to leave CACHE_VERSION unchanged because the old and the new cache keys are sufficiently different that there should be no risk of confusion.

Changes in this PR have been validated in an internal test repository.

Risk assessment

For internal use only. Please select the risk level of this change:

  • Low risk: Changes are fully under feature flags, or have been fully tested and validated in pre-production environments and are highly observable, or are documentation or test only.

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Consider adding a changelog entry for this change.
  • Confirm the readme and docs have been updated if necessary.

This commit adds automation ID to the overlay-base database cache key so
that we properly distinguish different analyses in the same repo for the
same language.

Since I am changing the cache key format, I also moved the CodeQL bundle
version to the end of the cache restore key, in case we want to remove
it from the restore key sometime in the future.

Note that I chose to leave CACHE_VERSION unchanged because the old and
the new cache keys are sufficiently different that there should be no
risk of confusion.
@cklin cklin marked this pull request as ready for review September 4, 2025 21:52
@cklin cklin requested a review from a team as a code owner September 4, 2025 21:52
@Copilot Copilot AI review requested due to automatic review settings September 4, 2025 21:52
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR enhances the overlay-base database caching mechanism by adding automation ID to the cache key to properly distinguish different analyses in the same repository for the same language. The change restructures the cache key format and includes hashing of additional components while moving the CodeQL bundle version to the end for future flexibility.

  • Adds automation ID to cache key components for better analysis differentiation
  • Introduces a hashing mechanism for cache key components to maintain manageable key length
  • Restructures the cache key format and makes getCacheRestoreKey async
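
The key layout summarized above can be sketched roughly as follows. This is a minimal illustration, not the code from `src/overlay-database-utils.ts`: the constant values, the `buildRestoreKeyPrefix` and `hashComponents` names, the 16-character hash truncation, and the example automation IDs are all assumptions made for the sketch.

```typescript
import { createHash } from "crypto";

// Hypothetical values: the real prefix and CACHE_VERSION are not shown in this thread.
const CACHE_PREFIX = "example-overlay-base-database";
const CACHE_VERSION = 1;

// Components that get hashed into the key. Only automationID is named in the PR.
interface CacheKeyComponents {
  automationID: string;
}

// Hash the components so the key stays short no matter how long they are.
// The 16-character truncation is an arbitrary choice for this sketch.
function hashComponents(components: CacheKeyComponents): string {
  const componentsJson = JSON.stringify(components);
  return createHash("sha256").update(componentsJson).digest("hex").slice(0, 16);
}

// Build the restore key prefix, placing the CodeQL bundle version last.
function buildRestoreKeyPrefix(
  languages: string[],
  automationID: string,
  codeQlVersion: string,
): string {
  const languagePart = [...languages].sort().join("_");
  const componentsHash = hashComponents({ automationID });
  return `${CACHE_PREFIX}-${CACHE_VERSION}-${componentsHash}-${languagePart}-${codeQlVersion}-`;
}

// Two analyses of the same language in the same repo, with different automation IDs.
const prefixA = buildRestoreKeyPrefix(["javascript"], ".github/workflows/a.yml:analyze/", "2.23.0");
const prefixB = buildRestoreKeyPrefix(["javascript"], ".github/workflows/b.yml:analyze/", "2.23.0");
console.log(prefixA !== prefixB);
```

Keeping the bundle version as the final segment means it could later be dropped from the prefix without disturbing the segments before it.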

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/overlay-database-utils.ts | Implements the core changes: adds automation ID to cache key, converts getCacheRestoreKey to async, and introduces component hashing |
| src/overlay-database-utils.test.ts | Updates tests to mock getAutomationID function for the new async cache key generation |
| lib/init-action.js | Generated JavaScript compilation output reflecting the TypeScript changes |
| lib/analyze-action.js | Generated JavaScript compilation output reflecting the TypeScript changes |

@cklin cklin requested a review from mbg September 4, 2025 22:06
mbg
mbg previously approved these changes Sep 5, 2025
Member

@mbg mbg left a comment

This is great, thank you for proactively tackling this problem! I agree with your reasoning for keeping the cache version the same as well.

I only have a few minor, non-blocking comments.

  const sha = await getCommitOid(checkoutPath);
- return `${getCacheRestoreKey(config, codeQlVersion)}${sha}`;
+ const restoreKey = await getCacheRestoreKey(config, codeQlVersion);
+ return `${restoreKey}${sha}`;
Member

One thing I am not sure I realised until now is that restoreCache is happy to use the primary cache key as a prefix for restoring a cache. I think I was previously under the assumption that only the partial restore keys (if any) were prefix-matched. It might be worth adding a comment for this somewhere, since it might otherwise be non-obvious how restoring the cache can work if sha is included here, but not when restoring the cache.
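
To make the prefix-matching behaviour concrete, here is a toy in-memory model of the lookup. It is not the `@actions/cache` implementation: the `CacheEntry` type, the `lookup` function, the example keys, and the "most recently created entry wins" tie-break are assumptions for illustration.

```typescript
// Toy model of a stored cache entry.
interface CacheEntry {
  key: string;
  createdAt: number; // larger means newer
}

// Try an exact match on the primary key first, then treat the primary key
// and each restore key, in order, as a prefix. The newest prefix match wins.
function lookup(
  entries: CacheEntry[],
  primaryKey: string,
  restoreKeys: string[] = [],
): CacheEntry | undefined {
  const exact = entries.find((entry) => entry.key === primaryKey);
  if (exact) {
    return exact;
  }
  for (const prefix of [primaryKey, ...restoreKeys]) {
    const matches = entries
      .filter((entry) => entry.key.startsWith(prefix))
      .sort((a, b) => b.createdAt - a.createdAt);
    if (matches.length > 0) {
      return matches[0];
    }
  }
  return undefined;
}

// Entries were saved under "<prefix><sha>"; restoring passes only the prefix.
const stored: CacheEntry[] = [
  { key: "example-prefix-aaa111", createdAt: 1 },
  { key: "example-prefix-bbb222", createdAt: 2 },
];
const hit = lookup(stored, "example-prefix-");
const miss = lookup(stored, "other-prefix-");
console.log(hit?.key, miss);
```

In this model, passing the SHA-less prefix as the primary key is enough to find an entry saved with a SHA suffix, and no separate restore keys are needed.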

Contributor Author

I thought about it for quite a while. What is an effective way to highlight the prefix matching used for cache restore?

I ended up making three changes:

  • Rename functions and variables to highlight the difference between "save key" and "restore key"
  • Append "Prefix" to the restore key function and variable names to highlight the fact that they are key prefixes
  • Add comment that the save key consists of the restore key prefix followed by the checkout SHA

Hopefully that will make things clearer!
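
The relationship between the two keys might look like this in outline. The `buildSaveKey` name and the example values are hypothetical, since the renamed functions are not shown in this thread.

```typescript
// Hypothetical helper: the thread does not show the renamed save-key function.
function buildSaveKey(restoreKeyPrefix: string, checkoutSha: string): string {
  // Save key = restore key prefix followed by the SHA of the checked-out commit.
  return `${restoreKeyPrefix}${checkoutSha}`;
}

// Illustrative values only.
const restoreKeyPrefix = "example-prefix-1-abc123-javascript-2.23.0-";
const saveKey = buildSaveKey(restoreKeyPrefix, "0123456789abcdef0123456789abcdef01234567");
console.log(saveKey.startsWith(restoreKeyPrefix));
```

Because every save key begins with the restore key prefix, a restore that passes only the prefix can match a cache saved at any commit.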

Member

Thank you for taking the time to think about this and come up with a few changes to make this clearer! Out of those three changes, I think appending "Prefix" to the names has made the most difference, since it more clearly communicates that it is intentionally a prefix of the cache key.

I probably would have liked a comment (e.g. on getCacheRestoreKeyPrefix) noting that the primary key is prefix-matched and that omitting sha therefore works fine. The subtlety here is that restoreCache has separate parameters for the "primary key" (a string, which you use for the cache key prefix) and the "restore keys" (a string array, which you don't use right now). Although semantically they seem to work as if the primary key could just be the first element of restore keys, they are kept separate by both the cache library and the Action. The existing comment for getCacheRestoreKeyPrefix talks about "restore keys" in "Actions cache supports using multiple restore keys", which can be read as referring to the "restore keys" parameter rather than the "primary key" one, so it is not clear from this that the "primary key" can also be a prefix.

This is somewhat pedantic and definitely not blocking for this PR!

This commit updates componentsJson computation to call JSON.stringify()
without the replacer array and documents why the result is stable.
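
The commit message does not spell out its reasoning, so the following sketch only illustrates two language-level facts that seem relevant: a replacer array limits the output to the listed property names, and without a replacer, non-integer string keys are serialized in insertion order. The `components` object and its values are made up for the example.

```typescript
// Illustrative object; "extra" stands in for any component added later.
const components = { automationID: "example-id", extra: "example-extra" };

// With a replacer array, properties not named in the array are omitted.
const withReplacer = JSON.stringify(components, ["automationID"]);

// Without a replacer, every own enumerable property is serialized,
// in insertion order for non-integer string keys.
const withoutReplacer = JSON.stringify(components);

// An object built the same way serializes to the same string.
const rebuilt = JSON.stringify({ automationID: "example-id", extra: "example-extra" });

console.log(withReplacer);
console.log(withoutReplacer === rebuilt);
```

As long as the components object is always built from the same literal with plain string keys, its serialized form, and therefore any hash derived from it, stays the same from run to run.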
@cklin cklin merged commit 1c6bc38 into main Sep 8, 2025
278 checks passed
@cklin cklin deleted the cklin/overlay-db-automation-id branch September 8, 2025 13:33
@github-actions github-actions bot mentioned this pull request Sep 9, 2025
8 tasks