A quick hack to run Stable Diffusion on an Azure GPU Spot Instance.
This is an Azure Resource Manager template that automatically deploys a GPU-enabled spot instance running Ubuntu 20.04.
The template defaults to deploying NV6 Series VMs (`Standard_NV6`, `Standard_NV6_Promo` or, if you can get them, `Standard_NV6ads_A10_v5`) with the smallest possible managed SSD disk size (P4, 32GB). It also deploys (and mounts) an Azure File Share on the machine with (very) permissive access at `/srv`, which makes it quite easy to keep copies of your work between VM instantiations.
You will need to set a `HUGGINGFACE_TOKEN` environment variable when running the `Makefile`, and the machine will reboot after installing almost everything (it will automatically install `GFPGAN` and other auxiliary models when you run `webui.sh --listen` the first time).
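As a minimal sketch of the token setup, the snippet below fails early when the variable is missing. `require_token` is a hypothetical helper (it is not part of this repo), the token value is a placeholder, and which `make` target actually reads the token is not stated here, so exporting it for the whole shell session is the cautious choice.

```shell
# Hypothetical helper (not part of this repo): refuse to continue
# when HUGGINGFACE_TOKEN is unset or empty.
require_token() {
    if [ -z "${HUGGINGFACE_TOKEN:-}" ]; then
        echo "HUGGINGFACE_TOKEN is not set" >&2
        return 1
    fi
}

# Usage (placeholder token; the target shown is only an example):
#   export HUGGINGFACE_TOKEN="hf_your_token_here"
#   require_token && make params
```

Exporting the variable once means every later `make` invocation in the same shell sees it.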
I was getting a little bored with the notebook workflow in Google Colab and wanted access to a more persistent GPU setup without breaking the bank (hence spot instances, which I can run on demand in my personal subscription).
- Automatically set up Tailscale with `--authkey` to remove need for Gradio
- Built-in auto-shutdown (easy to set via the portal, but I will be adding it to the template)
- Experimental imaginAIry installation (just use `experimental.yaml` instead of `cloud-init.yaml`)
- Set up AUTOMATIC1111's pretty amazing Web UI
- change instance type to `Spot` for lower cost (also, removed availability set and changed SKU to be non-`_Promo`)
- Install NVIDIA drivers and CUDA toolkit
- remove unused packages from `cloud-config`
- remove unnecessary commands from `Makefile`
- remove unnecessary files from repo and trim history
- fork from `azure-k3s-cluster`, new `README`
Go to the Azure Resource Graph Explorer and enter this query to find the cheapest SKU/location combo for spot instances:
SpotResources
| where type =~ 'microsoft.compute/skuspotpricehistory/ostype/location'
| where sku.name in~ ('Standard_NV6','Standard_NV6ads_A10_v5')
| where properties.osType =~ 'linux'
| where location in~ ('westeurope','northeurope','eastus','eastus2')
| project skuName = tostring(sku.name), osType = tostring(properties.osType), location, latestSpotPriceUSD = todouble(properties.spotPrices[0].priceUSD)
| order by latestSpotPriceUSD asc
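If you would rather stay in a terminal, the same query can be run through the Azure CLI. This is a sketch that assumes the `resource-graph` CLI extension is installed (`az extension add --name resource-graph`) and that you are already logged in; `spot_price_query` is a hypothetical helper that just fills in the SKU and region lists.

```shell
# Hypothetical helper: print the spot price query for the given
# SKU list and region list (each a comma-separated list of quoted names).
spot_price_query() {
    skus="$1"
    locations="$2"
    cat <<EOF
SpotResources
| where type =~ 'microsoft.compute/skuspotpricehistory/ostype/location'
| where sku.name in~ ($skus)
| where properties.osType =~ 'linux'
| where location in~ ($locations)
| project skuName = tostring(sku.name), osType = tostring(properties.osType), location, latestSpotPriceUSD = todouble(properties.spotPrices[0].priceUSD)
| order by latestSpotPriceUSD asc
EOF
}

# Usage (needs `az login` and the resource-graph extension):
#   az graph query -q "$(spot_price_query "'Standard_NV6','Standard_NV6ads_A10_v5'" "'westeurope','northeurope'")" --output table
```

Keeping the SKU and region lists as arguments makes it easy to widen the search without editing the query text.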
- `make keys` - generates an SSH key for provisioning
- `make deploy-storage` - deploys shared storage
- `make params` - generates ARM template parameters
- `make deploy-compute` - deploys the VM
- `make view-deployment` - views deployment status
- `make watch-deployment` - watches deployment progress
- `make ssh` - opens an SSH session to `master0` and sets up TCP forwarding to `localhost`
- `make tail-cloud-init` - opens an SSH session and tails the `cloud-init` log
- `make list-endpoints` - lists DNS aliases
- `make destroy-environment` - destroys the entire environment (should not be the default)
- `make destroy-compute` - destroys only the compute resources (should be the default if you want to save costs)
- `make destroy-storage` - destroys the storage (should be avoided)
az login
make keys
make deploy-storage
make params
make deploy-compute
make view-deployment
# Go to the Azure portal and check the deployment progress
# Clean up after we're done working for the day, to save costs (preserves storage)
make destroy-compute
# Clean up the whole thing (destroys storage as well)
make destroy-environment
Azure Cloud Shell (which includes all the below in `bash` mode) or:
- Python 3
- The Azure CLI (`pip install -U -r requirements.txt` will install it)
- GNU `make` (you can just read through the `Makefile` and type the commands yourself)
Although it is possible to run SKUs like `Standard_NV6ads_A10_v5` as spot instances, this should be considered experimental.
Pro Tip: You can set `STORAGE_ACCOUNT_GROUP` and `STORAGE_ACCOUNT_NAME` inside an `.env` file if you want to use a pre-existing storage account. As long as you use `make` to do everything, the value will be automatically overridden.
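For illustration, an `.env` file along these lines would point the deployment at an existing storage account. Both values are made-up placeholders, and the plain `KEY=value` form (no quotes, no spaces around `=`) is an assumption about how the `Makefile` reads the file.

```shell
# .env (placeholder values; replace with your own resource group and account)
STORAGE_ACCOUNT_GROUP=my-existing-resource-group
STORAGE_ACCOUNT_NAME=myexistingstorage
```

With this in place you would presumably skip `make deploy-storage`, since the account already exists.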
Keep in mind that this is not meant to be used as a production service.