CML AMIs #162

Closed
DavidGOrtega opened this issue Jul 16, 2020 · 7 comments

Comments

@DavidGOrtega
Contributor

We are currently based on Docker; however, for cloud runners we could also work with AMIs.
These images would bring the following benefits:

  • Faster machine bootstrapping, since the GPU drivers and other dependencies would already be installed
  • No need to connect to Docker Hub (hub.docker)
  • For spot instances, easier access to the dispose mechanism
@elleobrien
Contributor

The first point seems like a major potential optimization. Pushing and pulling big Docker images and installing libraries (for deep learning especially) every time seems to be a significant bottleneck in the cloud GPU case; check out the timestamps in the Action logs.

An AMI that requires environment configuration only once is potentially a huge time saver in the long run of a project.

@courentin
Contributor

That's a very good idea. Do you have any ideas for an implementation?
I'd like to help!

@DavidGOrtega
Contributor Author

@courentin thanks for your offer!
We are actually working this out. The idea is definitely to generate an image with the NVIDIA drivers and the Dockerfile contents using Packer. The missing part would be adding a small binary to start the runners.

@courentin If you have experience with Packer, let's do it!

@DavidGOrtega
Contributor Author

DavidGOrtega commented Feb 23, 2021

This is now a real thing: with the upcoming release of CML 0.3.0, AWS CML AMIs are available, based on Ubuntu 18.04.
Azure images are still missing.

@DavidGOrtega added the cml-runner Subcommand label Feb 23, 2021
@0x2b3bfa0
Member

Currently we are recreating the instance environment from scratch each time we run a continuous integration job on Azure, because (apparently) we can't share instance images with a wider audience through the Azure Marketplace without having the Microsoft Partner qualification. (?)

The current workaround implies significantly longer total job execution times due to the increased instance configuration times, so being able to publish prebuilt CML base images to the Azure Marketplace, as we're already doing for AWS machine images, could be a nice improvement for end users.

@0x2b3bfa0
Member

0x2b3bfa0 commented May 19, 2021

I was wondering if we could move terraform-provider-iterative/cml to this repository. It would imply decoupling the region list from the provider code, but if we check the region list dynamically to solve iterative/terraform-provider-iterative#99, everything would be much clearer.

Related to #538: generating both the machine images and the container images from the same repository could improve maintainability.

@0x2b3bfa0
Member

Closing in favor of iterative/terraform-provider-iterative#196
