Allow tasks to destroy themselves #289
Do you have a good list of the particular resources that cause this issue? My two cents: allow users to predefine those resources (VPC, security group rules, etc.) so TPI doesn't need to clean them up, and users get that extra control.
Alternatively, why not add an explicit cleanup step at the end of workflows?

```yaml
on: workflow_dispatch
jobs:
  create:
    runs-on: ubuntu-latest
    steps:
      - uses: iterative/setup-cml@v1
      - run: cml runner create ${{ github.run_id }}
  reproduce:
    needs: create
    runs-on: self-hosted
    steps:
      # Check out the repository so that `dvc repro` can find the pipeline
      - uses: actions/checkout@v3
      - uses: iterative/setup-dvc@v1
      - run: dvc repro
  delete:
    # Run the cleanup even if `reproduce` fails or is cancelled
    if: always()
    needs: reproduce
    runs-on: ubuntu-latest
    steps:
      - uses: iterative/setup-cml@v1
      - run: cml runner delete ${{ github.run_id }}
```

On GitHub Actions, this can even be automated by using the `post` step of the setup action.
Not quite: the `post` step is defined at the setup-action level and runs on the same host as the job. So it would either delete the instance right after creating it, or run on the instance itself and hit the same problem.
Oh, my! 🤦🏼‍♂️ Yes, you're absolutely right.
As opinionated as requiring separate …
Definitely, and we should also explore webhook-based scaling solutions like the ones described at https://docs.github.com/es/actions/hosting-your-own-runners/autoscaling-with-self-hosted-runners
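As a rough illustration of the webhook-based approach, the Python sketch below maps a single `workflow_job` webhook delivery to a scaling decision. It assumes the payload carries the `action` and `workflow_job.labels` fields that GitHub sends for that event; `decide_action` and the `"cml"` label are hypothetical names, and actually launching or terminating an instance is left out.

```python
from typing import Any, Dict, Optional


def decide_action(payload: Dict[str, Any], runner_label: str = "cml") -> Optional[str]:
    """Map one `workflow_job` webhook payload to a scaling decision.

    Hypothetical helper: returns "launch" when a job that needs our runner
    label is queued, "terminate" when such a job has completed, else None.
    """
    job = payload.get("workflow_job", {})
    if runner_label not in job.get("labels", []):
        # The job targets some other runner; nothing for us to do.
        return None
    action = payload.get("action")
    if action == "queued":
        return "launch"
    if action == "completed":
        # The handler runs outside the instance, so tearing it down
        # cannot cut off the handler's own network connection.
        return "terminate"
    return None


print(decide_action({"action": "queued", "workflow_job": {"labels": ["self-hosted", "cml"]}}))  # launch
```

The appeal of this design is that the component doing the cleanup never lives on the instance being destroyed, which sidesteps the self-destruction problem entirely.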
With the current implementation, instances can't destroy all of their supporting resources because of interdependencies. For example, once an instance deletes its own security group, it loses network connectivity and can no longer issue the API calls needed to delete anything else.
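To make the interdependency concrete, here is a small Python sketch that computes a safe deletion order with a topological sort: a resource can only be deleted once everything that depends on it is gone. The resource names and the dependency map are hypothetical, loosely modeled on an AWS-style setup.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def deletion_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which resources can be deleted safely.

    `depends_on[r]` is the set of resources that `r` needs to exist.
    A topological order lists dependencies first (creation order), so
    reversing it deletes dependents before their dependencies.
    """
    creation_order = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(creation_order))


# Hypothetical resources for a single task.
resources = {
    "instance": {"security_group", "key_pair"},
    "security_group": {"vpc"},
    "key_pair": set(),
    "vpc": set(),
}

order = deletion_order(resources)
# The instance comes first: once it is deleted, nothing running on it can
# go on to delete the security group or VPC, so that work must run elsewhere.
print(order)
```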
Possible solutions include:

- Using cloud-native templates like AWS CloudFormation, Google Cloud Deployment Manager and Azure Resource Manager templates to let the providers destroy everything.
- Leaving cheap or free resources in the cloud, and running a garbage collector on every invocation to delete resources left over from past tasks.
- Requiring users to explicitly delete resources after each task. This approach is convenient with the launch/harvest lifecycle, but not with the CML runner.
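The garbage-collector option could look roughly like the Python sketch below: on every invocation, find resources tagged with a task ID that no longer has a live instance, and mark them for deletion. The `Resource` type, the `task` tag, and how live tasks are discovered are all hypothetical; a real implementation would query the cloud provider's API and delete the results in dependency order.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Resource:
    """Hypothetical view of a cloud resource and its tags."""
    id: str
    kind: str
    tags: Dict[str, str] = field(default_factory=dict)


def orphaned(resources: Iterable[Resource], live_tasks: Set[str]) -> List[Resource]:
    """Return resources left behind by tasks that no longer exist.

    Resources without a `task` tag are never selected, so anything the user
    created by hand (for example a shared VPC) is left alone.
    """
    return [
        r for r in resources
        if "task" in r.tags and r.tags["task"] not in live_tasks
    ]


inventory = [
    Resource("sg-1", "security_group", {"task": "old"}),
    Resource("sg-2", "security_group", {"task": "running"}),
    Resource("vpc-1", "vpc"),  # user-managed, untagged
]
print([r.id for r in orphaned(inventory, live_tasks={"running"})])  # ['sg-1']
```

Skipping untagged resources also fits the earlier suggestion of letting users predefine shared resources that TPI never has to clean up.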