This repository has been archived by the owner on Nov 1, 2023. It is now read-only.

Add timeout to setup scripts #1659

Merged
ranweiler merged 5 commits into microsoft:main from setup-script-timeout
Mar 1, 2022

@ranweiler ranweiler commented Feb 16, 2022

Add a setup script-specific timeout of 59 minutes. This is just under the service-side NODE_EXPIRATION_TIME, after which the service garbage collects nodes whose setup scripts are stuck or taking too long.

Ideally, the timeout would be user-configurable within some range. As-is, we'd only ever want to increase it, and doing so would require dynamically updating the service-side limit (which we'd probably want to revert to something short after the setup script is done).

With this change, the high-level cause of the timeout is clear, instead of the closest error being something indirect, like "node reimaged during task execution".
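For illustration only, here is a minimal Python sketch of the general idea: run the setup script under a fixed deadline and surface an explicit timeout error. This is not the code merged in this PR. The names `run_setup_script`, `SetupScriptTimeout`, and `SETUP_SCRIPT_TIMEOUT_SECONDS` are hypothetical; only the 59-minute value and the "setup script timed out" wording come from the description above.

```python
import subprocess
import sys

# Assumption: 59 minutes, mirroring the value stated in this PR description.
SETUP_SCRIPT_TIMEOUT_SECONDS = 59 * 60


class SetupScriptTimeout(Exception):
    """Hypothetical error type for a setup script that exceeded its deadline."""


def run_setup_script(argv, timeout=SETUP_SCRIPT_TIMEOUT_SECONDS):
    """Run a setup command and raise an explicit error if it times out."""
    try:
        # On timeout, subprocess.run kills the child process and raises
        # subprocess.TimeoutExpired. Grandchild processes are not handled here.
        return subprocess.run(argv, timeout=timeout, check=True)
    except subprocess.TimeoutExpired as err:
        # Report the high-level cause directly instead of leaving the caller
        # to infer it from a later, indirect failure.
        raise SetupScriptTimeout(
            f"setup script timed out after {timeout} seconds"
        ) from err


if __name__ == "__main__":
    # Stand-in for a stuck setup script, with a short timeout for the demo.
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    try:
        run_setup_script(stuck, timeout=1)
    except SetupScriptTimeout as err:
        print(f"error: {err}")
```

Keeping the client-side deadline below the service-side limit means the explicit error is reported before the node is garbage collected.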

Tested by creating a Linux libfuzzer basic job with a setup.sh that invokes yes >/dev/null. All task VMs got stuck in setting_up, then failed with an explicit error of "setup script timed out".

Closes #1658.

@ranweiler ranweiler merged commit 0a6b589 into microsoft:main Mar 1, 2022
@ranweiler ranweiler deleted the setup-script-timeout branch March 1, 2022 07:00
@ghost ghost locked as resolved and limited conversation to collaborators Mar 31, 2022
Development

Successfully merging this pull request may close these issues.

Task setup scripts should run with a timeout