Support multi-attach volumes #1467

Open
@r4victor

Currently, dstack does not let users create volumes that are guaranteed to attach to multiple instances. Multi-attach works only implicitly, on backends that enable it by default, such as RunPod. The proposal is to let users specify multi-attach as a requirement in the volume configuration:

type: volume
name: my-multi-attach-volume
backend: aws
region: eu-central-1
size: 500GB
multiattach: true

dstack would then create a volume of an appropriate type and set it up so that attaching the volume to multiple instances in read-write mode is guaranteed to work.
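As a rough illustration of how the proposed field could be validated, here is a minimal sketch. It is not dstack's actual configuration model: `VolumeConfig` and `parse_volume_config` are hypothetical names, and the only assumption taken from the proposal is that `multiattach` is an optional boolean that defaults to false.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeConfig:
    """Hypothetical model of the volume configuration shown above."""
    name: str
    backend: str
    region: str
    size: str
    multiattach: bool = False  # assumed default: multi-attach is not required


def parse_volume_config(raw: dict) -> VolumeConfig:
    """Validate an already-loaded configuration mapping (hypothetical helper)."""
    if raw.get("type") != "volume":
        raise ValueError("expected 'type: volume'")
    multiattach = raw.get("multiattach", False)
    if not isinstance(multiattach, bool):
        raise ValueError("'multiattach' must be true or false")
    try:
        return VolumeConfig(
            name=raw["name"],
            backend=raw["backend"],
            region=raw["region"],
            size=str(raw["size"]),
            multiattach=multiattach,
        )
    except KeyError as exc:
        raise ValueError(f"missing required field: {exc.args[0]}") from None


# The example configuration from the proposal, as a plain mapping.
config = parse_volume_config({
    "type": "volume",
    "name": "my-multi-attach-volume",
    "backend": "aws",
    "region": "eu-central-1",
    "size": "500GB",
    "multiattach": True,
})
```

Treating the field as a hard requirement means a backend that cannot honor it should reject the configuration rather than silently create a single-attach volume.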

Backend support for multi-attach

Multi-attach is supported in most clouds that provide network storage. Block storage services (EBS, GCP Disks, etc.) have limited multi-attach capabilities and require cluster management software such as Pacemaker and a cluster file system such as GFS2. Using a regular file system (XFS, ext4) on a multi-attached block volume may lead to data or file system corruption, even with read-only access.

Major clouds also offer managed network file systems as a more general-purpose storage option for multi-attach (AWS EFS, AWS FSx for Lustre, GCP Filestore, Azure NetApp Files). EFS is inferior to EBS performance-wise, but others such as GCP Filestore are comparable to block storage. dstack can eventually support both multi-attach block storage and network file systems, but network file systems seem more suitable as the default implementation for multi-attach volumes.

  • AWS. Supports EBS Multi-Attach for io1 volumes (in only three regions) and io2 volumes (in all regions). Offers EFS (general-purpose NFS) and Amazon FSx for Lustre (high-performance).
  • GCP. Persistent Disk can be attached to multiple VMs in read-only mode. Read-write multi-attach is supported, but for at most two VMs. Filestore (NFS) offers a versatile alternative to Persistent Disk with comparable performance. It is the only viable read-write storage for multi-device TPU Pods.
  • Azure. Multi-attach is supported via Shared Disks. Azure NetApp Files is an NFS service comparable to Filestore.
  • OCI. Block Volumes can be attached to multiple instances. OCI File Storage is an NFS service comparable to Filestore.
  • RunPod. Supports multi-attach by default, and this is already available in dstack.
  • Lambda. Supports multi-attach by default.
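
The backend list above can be sketched as a lookup table that prefers network file systems, in line with the suggested default. This is only an illustration: `MultiAttachOption`, `MULTI_ATTACH_OPTIONS`, and `pick_multi_attach_storage` are hypothetical names rather than dstack code, and the table encodes nothing beyond what the list states (a limit of `None` means no limit is mentioned here).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StorageKind(Enum):
    NETWORK_FS = "network_fs"  # managed network file system
    BLOCK = "block"            # multi-attach block storage (needs a cluster FS)
    DEFAULT = "default"        # backend volumes are multi-attach by default


@dataclass(frozen=True)
class MultiAttachOption:
    service: str
    kind: StorageKind
    max_rw_instances: Optional[int] = None  # None: no limit stated in the list


# Options per backend, ordered by preference (network file systems first).
MULTI_ATTACH_OPTIONS = {
    "aws": [
        MultiAttachOption("EFS", StorageKind.NETWORK_FS),
        MultiAttachOption("FSx for Lustre", StorageKind.NETWORK_FS),
        MultiAttachOption("EBS Multi-Attach (io2)", StorageKind.BLOCK),
    ],
    "gcp": [
        MultiAttachOption("Filestore", StorageKind.NETWORK_FS),
        MultiAttachOption("Persistent Disk", StorageKind.BLOCK, max_rw_instances=2),
    ],
    "azure": [
        MultiAttachOption("Azure NetApp Files", StorageKind.NETWORK_FS),
        MultiAttachOption("Shared Disks", StorageKind.BLOCK),
    ],
    "oci": [
        MultiAttachOption("OCI File Storage", StorageKind.NETWORK_FS),
        MultiAttachOption("Block Volumes", StorageKind.BLOCK),
    ],
    "runpod": [MultiAttachOption("RunPod volume", StorageKind.DEFAULT)],
    "lambda": [MultiAttachOption("Lambda volume", StorageKind.DEFAULT)],
}


def pick_multi_attach_storage(backend: str, instances: int) -> MultiAttachOption:
    """Return the first preferred option that can serve `instances` writers."""
    options = MULTI_ATTACH_OPTIONS.get(backend)
    if options is None:
        raise ValueError(f"no known multi-attach storage for backend {backend!r}")
    for option in options:
        if option.max_rw_instances is None or instances <= option.max_rw_instances:
            return option
    raise ValueError(f"{backend!r} cannot attach a volume to {instances} instances")


print(pick_multi_attach_storage("gcp", 8).service)
```

With this ordering, block storage would only be chosen if a user explicitly asked for it or if the preferred network file system were removed from the table.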
