Support multi-attach volumes #1467

Open
Tracked by #1782
r4victor opened this issue Jul 30, 2024 · 0 comments

Currently, dstack does not let users create volumes that are guaranteed to attach to multiple instances. Multi-attach can work implicitly on backends that enable it by default, such as RunPod, but users cannot request it. The proposal is to let users specify multi-attach as a requirement in the volume configuration:

type: volume
name: my-multi-attach-volume
backend: aws
region: eu-central-1
size: 500GB
multiattach: true

dstack would then create a volume of an appropriate type and set it up so that attaching the volume to multiple instances in read-write mode is guaranteed to work.
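As a rough illustration of how the proposed field could be parsed and validated, here is a minimal sketch. `VolumeConfig`, `parse_size_gb`, and `parse_volume` are hypothetical names made up for this example and are not dstack's actual classes or functions; the only thing taken from the proposal is the shape of the configuration and the `multiattach` key, which is assumed to default to `false` so existing configurations keep working.

```python
from dataclasses import dataclass


@dataclass
class VolumeConfig:
    """Hypothetical model of the volume configuration (not dstack's real class)."""
    name: str
    backend: str
    region: str
    size_gb: int
    multiattach: bool = False  # proposed field; assumed to default to current behavior


def parse_size_gb(size: str) -> int:
    """Parse a size such as '500GB' into whole gigabytes (only the GB suffix is handled here)."""
    s = size.strip().upper()
    if not s.endswith("GB"):
        raise ValueError(f"unsupported size: {size!r}")
    return int(s[:-2])


def parse_volume(raw: dict) -> VolumeConfig:
    """Build a VolumeConfig from an already-loaded YAML mapping."""
    if raw.get("type") != "volume":
        raise ValueError("expected 'type: volume'")
    multiattach = raw.get("multiattach", False)
    if not isinstance(multiattach, bool):
        raise ValueError("multiattach must be true or false")
    return VolumeConfig(
        name=raw["name"],
        backend=raw["backend"],
        region=raw["region"],
        size_gb=parse_size_gb(str(raw["size"])),
        multiattach=multiattach,
    )
```

Rejecting non-boolean values up front means a typo such as `multiattach: "yes"` fails at configuration time rather than silently producing a single-attach volume.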

Backend support for multi-attach

Multi-attach is supported in most clouds that provide network storage. Block storage services (EBS, GCP Disks, etc.) have limited multi-attach capabilities and require cluster management software such as Pacemaker and a cluster file system such as GFS2. Regular file systems (XFS, EXT4) on a multi-attached block volume may lead to data or file system corruption, even with read-only access. Major clouds offer managed network file systems as a more general-purpose storage option for multi-attach (AWS EFS, AWS FSx for Lustre, GCP Filestore, Azure NetApp Files). EFS is inferior to EBS performance-wise, but others, such as GCP Filestore, are comparable with block storage. dstack can eventually support both multi-attach block storage and network file systems, but network file systems seem more suitable as the default implementation for multi-attach volumes.

  • AWS. Supports EBS Multi-Attach for io1 (only in three regions) and io2 (in all regions). Offers EFS (general-purpose NFS) and Amazon FSx for Lustre (high-performance).
  • GCP. Persistent Disk can be attached to multiple VMs in read-only mode. Read-write multi-attach is supported, but for at most two VMs. Filestore (NFS) offers a versatile alternative to Persistent Disk with comparable performance. It's the only viable read-write storage for multi-device TPU Pods.
  • Azure. Multi-attach is supported via Shared Disks. Azure NetApp Files is an NFS service comparable to Filestore.
  • OCI. Block Volumes can be attached to multiple instances. OCI File Storage is an NFS service comparable to Filestore.
  • RunPod. Supports multi-attach by default and is already available in dstack.
  • Lambda. Supports multi-attach by default.
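
The survey above can be condensed into a small lookup that prefers a network file system where one exists. This is only a sketch of one possible selection policy: `Kind`, `MULTIATTACH_OPTIONS`, and `pick_storage` are hypothetical names, the option strings are descriptive labels transcribed from the list rather than API identifiers, and the ordering simply reflects the suggestion that network file systems make a better default.

```python
from enum import Enum


class Kind(Enum):
    BLOCK = "block"            # multi-attach block storage; needs a cluster file system
    NETWORK_FS = "network_fs"  # managed network file system
    NATIVE = "native"          # backend volumes are multi-attach by default


# Labels transcribed from the backend list above (illustrative, not API names).
MULTIATTACH_OPTIONS = {
    "aws": [("EFS", Kind.NETWORK_FS), ("FSx for Lustre", Kind.NETWORK_FS),
            ("EBS Multi-Attach (io1/io2)", Kind.BLOCK)],
    "gcp": [("Filestore", Kind.NETWORK_FS), ("Persistent Disk", Kind.BLOCK)],
    "azure": [("Azure NetApp Files", Kind.NETWORK_FS), ("Shared Disks", Kind.BLOCK)],
    "oci": [("OCI File Storage", Kind.NETWORK_FS), ("Block Volumes", Kind.BLOCK)],
    "runpod": [("default volume", Kind.NATIVE)],
    "lambda": [("default volume", Kind.NATIVE)],
}


def pick_storage(backend: str, prefer: Kind = Kind.NETWORK_FS) -> str:
    """Return the first option of the preferred kind, or the first listed option otherwise."""
    options = MULTIATTACH_OPTIONS.get(backend)
    if not options:
        raise ValueError(f"no known multi-attach storage for backend {backend!r}")
    for name, kind in options:
        if kind is Kind.NATIVE or kind is prefer:
            return name
    return options[0][0]
```

For example, `pick_storage("aws")` returns `"EFS"`, while `pick_storage("gcp", prefer=Kind.BLOCK)` returns `"Persistent Disk"`.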