Support drive anti-affinity for volumes
Some optimal setups require volumes to be allocated on unique disks, i.e., no more than one volume per disk.

By default, the volume scheduling algorithm chooses the drive with the most free capacity. This can end up allocating more than one volume per disk.

This PR provides a way to support such optimal setups through storage class
parameters. Using a storage class with the `directpv.min.io/volume-claim-id: XXX`
parameter enables unique allocation for PVCs. This unique claim id enforces
one-to-one cardinality between drives and volumes.
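For illustration, a storage class carrying this parameter might look like the following. This is a sketch based on the template in docs/tools/create-storage-class.sh; the class name and claim id value are made-up examples:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: directpv-unique
parameters:
  fstype: xfs
  directpv.min.io/volume-claim-id: 555e99eb-e255-4407-83e3-fc443bf20f86
provisioner: directpv-min-io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```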
Praveenrajmani committed Sep 13, 2023
1 parent c26a1c1 commit fcc663d
Showing 9 changed files with 209 additions and 53 deletions.
15 changes: 10 additions & 5 deletions docs/tools/create-storage-class.sh
@@ -18,32 +18,37 @@

set -e -C -o pipefail

declare NAME DRIVE_LABEL
declare NAME DRIVE_LABELS LABELS

function init() {
if [[ $# -ne 2 ]]; then
cat <<EOF
USAGE:
create-storage-class.sh <NAME> <DRIVE-LABEL>
create-storage-class.sh <NAME> <DRIVE-LABELS,...>
ARGUMENTS:
NAME new storage class name.
DRIVE-LABEL drive labels to be attached.
DRIVE-LABELS comma-separated drive labels to be attached.
EXAMPLE:
# Create new storage class 'fast-tier-storage' with drive labels 'directpv.min.io/tier: fast'
$ create-storage-class.sh fast-tier-storage 'directpv.min.io/tier: fast'
# Create new storage class with more than one drive label
$ create-storage-class.sh fast-tier-unique 'directpv.min.io/tier: fast,directpv.min.io/volume-claim-id: xxx'
EOF
exit 255
fi

NAME="$1"
DRIVE_LABEL="$2"
DRIVE_LABELS="$2"

if ! which kubectl >/dev/null 2>&1; then
echo "kubectl not found; please install"
exit 255
fi

LABELS=$(echo "$DRIVE_LABELS" | sed -e $'s/,/\\\n /g')
}

function main() {
@@ -67,7 +72,7 @@ metadata:
  name: ${NAME}
parameters:
  fstype: xfs
  ${DRIVE_LABEL}
  ${LABELS}
provisioner: directpv-min-io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
90 changes: 62 additions & 28 deletions docs/volume-scheduling.md
@@ -5,18 +5,19 @@
## Drive selection algorithm

The DirectPV CSI controller selects a suitable drive for a `CreateVolume` request as follows:
1. The filesystem type and/or access-tier in the request is validated. DirectPV supports the `xfs` filesystem only.
2. Each `DirectPVDrive` CRD object is checked for whether it already contains the requested volume. If present, the first drive containing the volume is selected.
3. If no `DirectPVDrive` CRD object has the requested volume, drives are matched
   a. by requested capacity
   b. by access-tier, if requested
   c. by topology constraints, if requested
4. If a volume claim id is set in the request parameters, drives that already carry that claim id are filtered out.
5. After steps (3) and (4), if more than one drive matches, the drive with the maximum free capacity is picked.
6. If step (5) yields more than one drive, one is selected at random.
7. Finally, the selected drive is updated with the requested volume information.
8. If no drive is selected, an appropriate error is returned.
9. On any error in the above steps, Kubernetes retries the request.
10. If parallel requests select the same drive, step (7) succeeds for one request and Kubernetes retries the rest.
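Steps (4) through (6) above can be sketched as follows. This is a hypothetical, simplified standalone program, not the controller code: the real implementation works on `DirectPVDrive` objects, and the tie-break here takes the first candidate instead of a random one:

```go
package main

import "fmt"

// Drive is a simplified stand-in for a DirectPVDrive.
type Drive struct {
	Name     string
	Free     int64
	ClaimIDs map[string]bool // claim ids already allocated on this drive
}

// pickDrive drops drives that already carry the claim id, keeps the
// maximum-free-capacity drives, and returns the first of them.
func pickDrive(drives []Drive, claimID string) *Drive {
	var candidates []Drive
	for _, d := range drives {
		if claimID != "" && d.ClaimIDs[claimID] {
			continue // anti-affinity: one volume per drive per claim id
		}
		candidates = append(candidates, d)
	}
	var best []Drive
	maxFree := int64(-1)
	for _, d := range candidates {
		switch {
		case d.Free > maxFree:
			maxFree, best = d.Free, []Drive{d}
		case d.Free == maxFree:
			best = append(best, d)
		}
	}
	if len(best) == 0 {
		return nil
	}
	return &best[0]
}

func main() {
	drives := []Drive{
		{Name: "d1", Free: 100, ClaimIDs: map[string]bool{"app-x": true}},
		{Name: "d2", Free: 50},
	}
	// Without a claim id, d1 wins on free capacity; with claim id "app-x",
	// d1 is eliminated and d2 is chosen.
	fmt.Println(pickDrive(drives, "").Name, pickDrive(drives, "app-x").Name)
}
```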

```text
(Flowchart of the drive selection algorithm. Summary: for each drive, match
by capacity, then by access-tier if requested, then by topology constraints
if requested, appending matches to the candidate list; eliminate candidates
that already carry the volume claim id; if nothing matched, return an error;
if exactly one drive matched, return it; otherwise filter candidates by
maximum free capacity and return the single remaining drive, or a randomly
selected one among ties.)
```

@@ -92,3 +99,30 @@ spec:
      storage: 8Mi
EOF

### Unique drive selection

By default, DirectPV allocates drives for volumes based on the free capacity present on the drives, so the drive with the most free capacity is selected for a volume provisioning request. For setups that require unique drive allocation, where a drive must not hold more than one of the volumes, use the steps below to enable unique allocation of drives.

* Create a new storage class with the `directpv.min.io/volume-claim-id: <any-uid>` parameter using the [create-storage-class.sh script](../tools/create-storage-class.sh). For example:

```sh
# NOTE: The claim id must be 47 characters or less and must be a valid label value.
#
# Create new storage class 'directpv-optimal' with the label 'directpv.min.io/volume-claim-id: 555e99eb-e255-4407-83e3-fc443bf20f86'
$ create-storage-class.sh directpv-optimal 'directpv.min.io/volume-claim-id: 555e99eb-e255-4407-83e3-fc443bf20f86'
```
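The 47-character limit follows from Kubernetes label syntax: the name segment of a label key is limited to 63 characters, and the claim id is appended to the `volume-claim-id-` prefix (16 characters) to form the per-drive label key. A quick sketch of the arithmetic:

```go
package main

import "fmt"

// maxClaimIDLen computes the longest claim id that still yields a valid
// Kubernetes label key when appended to the drive-label prefix.
func maxClaimIDLen() int {
	const k8sLabelNameMax = 63        // limit for a label key's name segment
	const prefix = "volume-claim-id-" // name segment of the drive-label prefix
	return k8sLabelNameMax - len(prefix)
}

func main() {
	fmt.Println(maxClaimIDLen()) // 63 - 16 = 47
}
```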

* Use the newly created storage class in [volume provisioning](./volume-provisioning.md). For example:

```yaml
volumeClaimTemplates:
  - metadata:
      name: minio-data-1
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 16Mi
      storageClassName: directpv-optimal
```
9 changes: 9 additions & 0 deletions pkg/apis/directpv.min.io/types/label.go
@@ -79,6 +79,15 @@ const (

// SuspendLabelKey denotes if the volume is suspended.
SuspendLabelKey LabelKey = consts.GroupName + "/suspend"

// VolumeClaimIDLabelKey label key to denote the unique allocation of drives for volumes
VolumeClaimIDLabelKey LabelKey = consts.GroupName + "/volume-claim-id"

// VolumeClaimIDLabelKeyPrefix label key prefix for volume claim id to be set on the drive
VolumeClaimIDLabelKeyPrefix = consts.GroupName + "/volume-claim-id-"

// ClaimIDLabelKey label key to denote the claim id of the volumes
ClaimIDLabelKey LabelKey = consts.GroupName + "/claim-id"
)

// LabelValue is a type definition for label value
16 changes: 16 additions & 0 deletions pkg/apis/directpv.min.io/v1beta1/drive.go
@@ -17,6 +17,7 @@
package v1beta1

import (
"strconv"
"strings"

"github.com/minio/directpv/pkg/apis/directpv.min.io/types"
@@ -221,6 +222,21 @@ func (drive DirectPVDrive) GetNodeID() types.NodeID {
return types.NodeID(drive.getLabel(types.NodeLabelKey))
}

// HasVolumeClaimID checks if the provided volume claim id is set on the drive.
func (drive *DirectPVDrive) HasVolumeClaimID(claimID string) bool {
	return drive.GetLabels()[types.VolumeClaimIDLabelKeyPrefix+claimID] == strconv.FormatBool(true)
}

// SetVolumeClaimID sets the provided claim id on the drive.
func (drive *DirectPVDrive) SetVolumeClaimID(claimID string) {
	drive.SetLabel(types.LabelKey(types.VolumeClaimIDLabelKeyPrefix+claimID), types.LabelValue(strconv.FormatBool(true)))
}

// RemoveVolumeClaimID removes the volume claim id label.
func (drive *DirectPVDrive) RemoveVolumeClaimID(claimID string) {
	drive.RemoveLabel(types.LabelKey(types.VolumeClaimIDLabelKeyPrefix + claimID))
}
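As a standalone illustration of the helpers above (a hypothetical `fakeDrive` type, not part of the codebase): each claim id is stored as a boolean-valued label under its own key, so one drive can record many claim ids independently:

```go
package main

import (
	"fmt"
	"strconv"
)

const volumeClaimIDPrefix = "directpv.min.io/volume-claim-id-"

// fakeDrive mimics a labeled Kubernetes object for demonstration purposes.
type fakeDrive struct{ labels map[string]string }

// SetVolumeClaimID records the claim id as a "true"-valued label.
func (d *fakeDrive) SetVolumeClaimID(id string) {
	if d.labels == nil {
		d.labels = map[string]string{}
	}
	d.labels[volumeClaimIDPrefix+id] = strconv.FormatBool(true)
}

// HasVolumeClaimID reports whether the claim id label is present and true.
func (d *fakeDrive) HasVolumeClaimID(id string) bool {
	return d.labels[volumeClaimIDPrefix+id] == strconv.FormatBool(true)
}

// RemoveVolumeClaimID deletes the claim id label.
func (d *fakeDrive) RemoveVolumeClaimID(id string) {
	delete(d.labels, volumeClaimIDPrefix+id)
}

func main() {
	var d fakeDrive
	d.SetVolumeClaimID("claim-a")
	fmt.Println(d.HasVolumeClaimID("claim-a"), d.HasVolumeClaimID("claim-b"))
	d.RemoveVolumeClaimID("claim-a")
	fmt.Println(d.HasVolumeClaimID("claim-a"))
}
```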

// SetLabel sets label to this drive.
func (drive *DirectPVDrive) SetLabel(key types.LabelKey, value types.LabelValue) bool {
values := drive.GetLabels()
10 changes: 10 additions & 0 deletions pkg/apis/directpv.min.io/v1beta1/volume.go
@@ -293,6 +293,16 @@ func (volume DirectPVVolume) IsSuspended() bool {
return string(volume.getLabel(types.SuspendLabelKey)) == strconv.FormatBool(true)
}

// SetClaimID sets the provided claim id on the volume.
func (volume *DirectPVVolume) SetClaimID(claimID string) {
	volume.SetLabel(types.ClaimIDLabelKey, types.LabelValue(claimID))
}

// GetClaimID gets the claim id set on the volume.
func (volume *DirectPVVolume) GetClaimID() string {
	return string(volume.getLabel(types.ClaimIDLabelKey))
}

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// DirectPVVolumeList denotes list of volumes.
15 changes: 14 additions & 1 deletion pkg/csi/controller/server.go
@@ -145,11 +145,18 @@ func (c *Server) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest)
return nil, status.Errorf(codes.InvalidArgument, "unsupported filesystem type %v for volume %v", req.GetVolumeCapabilities()[0].GetMount().GetFsType(), name)
}

var volumeClaimID string
for key, value := range req.GetParameters() {
if key == string(directpvtypes.AccessTierLabelKey) {
switch key {
case string(directpvtypes.AccessTierLabelKey):
if _, err := directpvtypes.StringsToAccessTiers(value); err != nil {
return nil, status.Errorf(codes.InvalidArgument, "unknown access-tier %v for volume %v; %v", value, name, err)
}
case string(directpvtypes.VolumeClaimIDLabelKey):
if err := validVolumeClaimID(value); err != nil {
return nil, status.Errorf(codes.InvalidArgument, "invalid %v value; %v", directpvtypes.VolumeClaimIDLabelKey, err)
}
volumeClaimID = value
}
}

@@ -177,6 +184,9 @@ func (c *Server) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest)
drive.GetDriveName(),
size,
)
if volumeClaimID != "" {
newVolume.SetClaimID(volumeClaimID)
}

if _, err := client.VolumeClient().Create(ctx, newVolume, metav1.CreateOptions{}); err != nil {
if !errors.IsAlreadyExists(err) {
@@ -206,6 +216,9 @@ }
}

if drive.AddVolumeFinalizer(req.GetName()) {
if volumeClaimID != "" {
drive.SetVolumeClaimID(volumeClaimID)
}
drive.Status.FreeCapacity -= size
drive.Status.AllocatedCapacity += size

31 changes: 31 additions & 0 deletions pkg/csi/controller/utils.go
@@ -65,6 +65,7 @@ func matchDrive(drive *types.Drive, req *csi.CreateVolumeRequest) bool {
if len(accessTiers) > 0 && drive.GetAccessTier() != accessTiers[0] {
return false
}
case string(directpvtypes.VolumeClaimIDLabelKey):
default:
if labels[key] != value {
return false
@@ -136,6 +137,11 @@ func selectDrive(ctx context.Context, req *csi.CreateVolumeRequest) (*types.Driv
return nil, status.Error(codes.FailedPrecondition, "no drive found")
}

drives = filterByVolumeClaimID(req, drives)
if len(drives) == 0 {
return nil, status.Error(codes.ResourceExhausted, "no unique drive found for the provided claim id")
}

maxFreeCapacity := int64(-1)
var maxFreeCapacityDrives []types.Drive
for _, drive := range drives {
@@ -159,3 +165,28 @@

return &maxFreeCapacityDrives[n.Int64()], nil
}

func validVolumeClaimID(claimID string) error {
	_, err := directpvtypes.NewLabelKey(directpvtypes.VolumeClaimIDLabelKeyPrefix + claimID)
	if err == nil {
		_, err = directpvtypes.NewLabelValue(claimID)
	}
	return err
}

func filterByVolumeClaimID(req *csi.CreateVolumeRequest, drives []types.Drive) []types.Drive {
	for key, value := range req.GetParameters() {
		if key == string(directpvtypes.VolumeClaimIDLabelKey) && value != "" {
			var uniqueDrives []types.Drive
			for _, drive := range drives {
				if drive.HasVolumeClaimID(value) {
					// Do not allocate another volume with this claim id
					continue
				}
				uniqueDrives = append(uniqueDrives, drive)
			}
			return uniqueDrives
		}
	}
	return drives
}