azure-stable-diffusion

A quick hack to run Stable Diffusion on an Azure GPU Spot Instance.

What

This is an Azure Resource Manager template that automatically deploys a GPU-enabled spot instance running Ubuntu 20.04.

The template defaults to deploying NV6 Series VMs (Standard_NV6, Standard_NV6_Promo or, if you can get them, Standard_NV6ads_A10_v5) with the smallest possible managed SSD disk size (P4, 32 GB). It also deploys an Azure File Share and mounts it on the machine at /srv with (very) permissive access, which makes it easy to keep copies of your work between VM instantiations.

You will need to set a HUGGINGFACE_TOKEN environment variable when running the Makefile. The machine reboots after installing almost everything; GFPGAN and other auxiliary models are installed automatically the first time you run webui.sh --listen.
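Since a missing token is only discovered partway through provisioning, it can help to check for it before calling make. The following is a minimal sketch; `require_hf_token` is a hypothetical helper that is not part of this repository, and the token value in the comment is a placeholder.

```shell
# Hypothetical helper (not part of this repo): fail early if the token
# that the Makefile expects is missing from the environment.
require_hf_token() {
    if [ -z "${HUGGINGFACE_TOKEN:-}" ]; then
        echo "HUGGINGFACE_TOKEN is not set" >&2
        return 1
    fi
}

# Typical use (placeholder token, substitute your own):
#   export HUGGINGFACE_TOKEN="hf_xxx"
#   require_hf_token && make <target>
```

The check only verifies that the variable is non-empty; it does not validate the token against Hugging Face.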

Why

I was getting a little bored with the notebook workflow in Google Colab and wanted access to a more persistent GPU setup without breaking the bank (hence spot instances, which I can run on demand in my personal subscription).

Roadmap

  • Automatically set up Tailscale with --authkey to remove the need for Gradio
  • Built-in auto-shutdown (easy to set via the portal, but I will be adding it to the template)
  • Experimental imaginAIry installation (just use experimental.yaml instead of cloud-init.yaml)
  • Set up AUTOMATIC1111's pretty amazing Web UI
  • Change instance type to Spot for lower cost (also removed availability set and changed SKU to be non-_Promo)
  • Install NVIDIA drivers and CUDA toolkit
  • Remove unused packages from cloud-config
  • Remove unnecessary commands from Makefile
  • Remove unnecessary files from repo and trim history
  • Fork from azure-k3s-cluster, new README

Cheapest Spots

Go to the Azure Resource Graph Explorer and enter this query to find the cheapest SKU/location combo for spot instances:

SpotResources 
| where type =~ 'microsoft.compute/skuspotpricehistory/ostype/location' 
| where sku.name in~ ('Standard_NV6','Standard_NV6ads_A10_v5') 
| where properties.osType =~ 'linux' 
| where location in~ ('westeurope','northeurope','eastus','eastus2') 
| project skuName = tostring(sku.name), osType = tostring(properties.osType), location, latestSpotPriceUSD = todouble(properties.spotPrices[0].priceUSD) 
| order by latestSpotPriceUSD asc 
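If you would rather stay on the command line, the same query can likely be run through the Azure CLI. This is a sketch under a few assumptions: the `resource-graph` CLI extension is installed (`az extension add --name resource-graph`), you are already logged in, and `cheapest_spots` is a hypothetical wrapper name, not something this repository provides.

```shell
# Same query as above, stored in a variable so it can be passed to the CLI.
SPOT_QUERY="SpotResources
| where type =~ 'microsoft.compute/skuspotpricehistory/ostype/location'
| where sku.name in~ ('Standard_NV6','Standard_NV6ads_A10_v5')
| where properties.osType =~ 'linux'
| where location in~ ('westeurope','northeurope','eastus','eastus2')
| project skuName = tostring(sku.name), osType = tostring(properties.osType), location, latestSpotPriceUSD = todouble(properties.spotPrices[0].priceUSD)
| order by latestSpotPriceUSD asc"

# Hypothetical wrapper: assumes the resource-graph extension and an active
# az login session. Nothing is executed until the function is called.
cheapest_spots() {
    az graph query -q "$SPOT_QUERY" --output table
}
```

Calling `cheapest_spots` after `az login` should list SKU/location pairs ordered by price; adjust the SKU and location lists in the query to match what your subscription can actually deploy.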

Makefile commands

  • make keys - generates an SSH key for provisioning
  • make deploy-storage - deploys shared storage
  • make params - generates ARM template parameters
  • make deploy-compute - deploys VM
  • make view-deployment - shows deployment status
  • make watch-deployment - watches deployment progress
  • make ssh - opens an SSH session to master0 and sets up TCP forwarding to localhost
  • make tail-cloud-init - opens an SSH session and tails the cloud-init log
  • make list-endpoints - lists DNS aliases
  • make destroy-environment - destroys the entire environment (should not be the default)
  • make destroy-compute - destroys only the compute resources (should be the default if you want to save costs)
  • make destroy-storage - destroys the storage (should be avoided)

Recommended Sequence

az login
make keys
make deploy-storage
make params
make deploy-compute
make view-deployment
# Go to the Azure portal and check the deployment progress

# Clean up after we're done working for the day, to save costs (preserves storage)
make destroy-compute

# Clean up the whole thing (destroys storage as well)
make destroy-environment

Requirements

Azure Cloud Shell (which includes all of the below in bash mode) or:

  • Python 3
  • The Azure CLI (pip install -U -r requirements.txt will install it)
  • GNU make (you can just read through the Makefile and type the commands yourself)

NVIDIA Support

Although it is possible to run SKUs like Standard_NV6ads_A10_v5 as spot instances, this should be considered experimental.

Deployment Notes

Pro Tip: You can set STORAGE_ACCOUNT_GROUP and STORAGE_ACCOUNT_NAME inside an .env file if you want to use a pre-existing storage account. As long as you use make to do everything, the values from .env will automatically override the defaults.
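As an illustration, an .env file for this purpose might look like the fragment below. Both values are placeholders, and the plain KEY=VALUE layout is an assumption about how the Makefile reads the file.

```shell
# .env (placeholder values; substitute your own resource group and account)
STORAGE_ACCOUNT_GROUP=my-existing-rg
STORAGE_ACCOUNT_NAME=myexistingstorage
```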

Disclaimers

Keep in mind that this is not meant to be used as a production service.