Overlay: use restoreCache() timeout #3125
Pull Request Overview
This PR refactors the overlay-base database download mechanism to use the native timeout capabilities of the restoreCache() function instead of a custom timeout wrapper. The change replaces waitForResultWithTimeLimit() with direct usage of actionsCache.restoreCache() and its segmentTimeoutInMs option to provide more effective timeout control for download operations.
Key changes:
- Remove custom timeout wrapper in favor of native cache API timeout option
- Configure segment-level timeout for more granular download control
- Clean up unused timeout constant while preserving it for other cache operations
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/overlay-database-utils.ts | Replace waitForResultWithTimeLimit() wrapper with direct restoreCache() call using segmentTimeoutInMs option |
| lib/init-action.js | Generated JavaScript code reflecting the TypeScript changes, including variable renaming to avoid conflicts |
LGTM with one minor concern about download speeds outside the cloud.
src/overlay-database-utils.ts
Outdated
{
  // Azure SDK download (which is the default) uses 128MB segments.
  // Setting segmentTimeoutInMs to 3000 translates to segment download
  // speed of about 40 MB/s, which should be achievable unless the
Isn't 40 MB/s on the high end for most residential connections?
What if this is somehow run from outside Actions? For instance, for a local test?
I don't think you would generally run this locally. For tests, we'd mock the behaviour. Beyond that, I haven't checked, but I'd imagine that the API endpoints used by the Actions cache are also only accessible from hosted runners, or at least require some suitable auth token. Setting that up locally is unlikely to ever be worth it. And even if we wanted to, it would be easy to adjust this value for local testing if it's too high.
40MB/s is indeed very high for residential connections. I don't think I could reach this download speed from my connection at home.
That said, I am still inclined to keep this setting, at least for now.
- I agree with Michael that we are not very likely to run this code locally.
- Using overlay analysis is a performance optimization, and there is a point where download would take up so much time that the entire workflow ends up being slower end-to-end. That break-even point will probably differ from one repository to another, but 40MB/s seems like a good figure to start with.
As we gain more experience, I am definitely open to adjusting this limit as appropriate.
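The 40 MB/s figure discussed above follows from dividing the segment size by the segment timeout. A minimal sketch of that arithmetic, where the helper name minThroughputMBps is hypothetical and not part of the PR:

```typescript
// Hypothetical helper (not from the PR): the sustained download speed, in MB/s,
// needed to finish one segment before its timeout expires.
function minThroughputMBps(segmentSizeMB: number, segmentTimeoutMs: number): number {
  if (segmentSizeMB <= 0 || segmentTimeoutMs <= 0) {
    throw new RangeError("segment size and timeout must be positive");
  }
  return segmentSizeMB / (segmentTimeoutMs / 1000);
}

// 128 MB segments with a 3000 ms timeout, as in the code comment under review.
const requiredMBps = minThroughputMBps(128, 3000);
console.log(requiredMBps.toFixed(1)); // 42.7
```

Raising the timeout lowers the required speed proportionally, which is the knob to turn if the limit later proves too strict.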
This seems reasonable to me, if you're happy with it.
In terms of the motivating issue, I suppose this is a change in objective? Rather than an overall timeout (which didn't work), there's now a timeout for individual segments (which does work, but there's no overall timeout)? So you might spend more time than MAX_CACHE_OPERATION_MS downloading the overlay cache if it's large, but no more than 3000ms per segment.
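To make the point about the missing overall timeout concrete, here is a rough sketch of how total download time scales with archive size when only a per-segment limit applies. It assumes segments are fetched one after another with no retries, which may not match how the cache client actually schedules downloads. The function name is hypothetical.

```typescript
// Hypothetical helper (not from the PR): upper bound on total download time
// when each segment may take up to segmentTimeoutMs.
// Assumes sequential segment downloads and no retries.
function worstCaseSegmentedDownloadMs(
  archiveSizeMB: number,
  segmentSizeMB: number,
  segmentTimeoutMs: number,
): number {
  if (archiveSizeMB < 0 || segmentSizeMB <= 0 || segmentTimeoutMs <= 0) {
    throw new RangeError("sizes and timeout must be positive");
  }
  const segments = Math.ceil(archiveSizeMB / segmentSizeMB);
  return segments * segmentTimeoutMs;
}

// A 1024 MB archive in 128 MB segments with a 3000 ms per-segment timeout.
console.log(worstCaseSegmentedDownloadMs(1024, 128, 3000)); // 24000
```

Under these assumptions the bound grows linearly with the archive size, so a large enough database can exceed any fixed overall budget even though every individual segment stays within its limit.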
src/overlay-database-utils.ts
Outdated
actionsCache.restoreCache([dbLocation], cacheRestoreKeyPrefix),
() => {
  logger.info("Timed out downloading overlay-base database from cache");
const foundKey = await actionsCache.restoreCache(
Is there any harm in keeping this wrapped in waitForResultWithTimeLimit with a suitable time limit?
No, there is no harm, and there would be benefits (such as protecting against hangs outside segment downloads).
I put the waitForResultWithTimeLimit() wrapper back, with a (longer) 10-minute time limit.
Thanks for the suggestion!
This commit changes overlay-base database download to pass the segmentTimeoutInMs option to restoreCache(), so that restoreCache() itself can properly abort slow downloads. The waitForResultWithTimeLimit() wrapper around restoreCache() remains as a second line of defense, but with a higher 10-minute time limit, to guard against cache restore hangs outside segment downloads.
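The two layers described in the commit message can be sketched roughly as follows. This is an illustration, not the code from the PR: the body of waitForResultWithTimeLimit() is a guess based on how it is called in the diff, the RestoreFn type is a stand-in for actionsCache.restoreCache() so the example runs without the cache library, and the path and key prefix are placeholders.

```typescript
// Assumed shape of the wrapper: resolve to undefined and call onTimeout
// if the promise does not settle within timeoutMs.
function waitForResultWithTimeLimit<T>(
  timeoutMs: number,
  promise: Promise<T>,
  onTimeout: () => void,
): Promise<T | undefined> {
  return new Promise<T | undefined>((resolve, reject) => {
    const timer = setTimeout(() => {
      onTimeout();
      resolve(undefined);
    }, timeoutMs);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Stand-in for actionsCache.restoreCache(); only the arguments used here are modelled.
type RestoreFn = (
  paths: string[],
  primaryKey: string,
  restoreKeys?: string[],
  options?: { segmentTimeoutInMs?: number },
) => Promise<string | undefined>;

// Inner layer: per-segment timeout passed to the restore call.
// Outer layer: overall time limit guarding against hangs outside segment downloads.
function downloadOverlayBase(
  restore: RestoreFn,
  outerLimitMs: number,
  log: (msg: string) => void,
): Promise<string | undefined> {
  return waitForResultWithTimeLimit(
    outerLimitMs,
    restore(["/placeholder/db/location"], "placeholder-key-prefix", undefined, {
      segmentTimeoutInMs: 3000,
    }),
    () => log("Timed out downloading overlay-base database from cache"),
  );
}
```

In the PR the outer limit is 10 minutes (600000 ms). Note that a wrapper of this shape only stops waiting. It does not cancel the underlying restore, which is why the per-segment timeout is still needed to actually abort a slow download.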
489021a to 80273e2
This PR changes overlay-base database download to pass the segmentTimeoutInMs option to restoreCache(), so that restoreCache() itself can properly abort slow downloads.

The waitForResultWithTimeLimit() wrapper around restoreCache() remains as a second line of defense, but with a higher 10-minute time limit, to guard against cache restore hangs outside segment downloads.

See linked issue for rationale and alternatives considered.