Fix calculation of tenant usage status #2244

ramondeklein · 2024-07-26T12:45:25Z

This PR should fix the reported tenant usage status and was implemented after the discussion about how to report / display usage information in EOS.

Current situation

The following usage is currently reported for the dev-1 cluster:

Capacity:      136419790848 (127 GiB)
Raw Capacity:  171798691840 (160 GiB)
Raw Usage:      35378900992 (32.95 GiB)
Usage:          35378900992 (32.95 GiB)

I would have expected the following definitions, but those are not all true:

🟢 Raw Capacity is the physical capacity of all disks.
🟢 Raw Usage is the amount of data that is actually stored on disk (including erasure coding overhead).
🔴 Capacity is the usable capacity for clients of the cluster (the raw capacity minus the EC overhead, so it is always less than the raw capacity). However, the reported value is actually the number of physical bytes left on the disks.
🔴 Usage is the number of bytes used by the objects. It would have been fine if this factored in the block overhead of objects, but it is currently the amount of data stored on disk, so it is identical to raw usage.
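
The claim that the reported capacity is really free space can be checked against the dev-1 numbers above. The following is only a small sketch; the constant and function names are made up for illustration and are not taken from the operator code:

```go
package main

import "fmt"

const gib = 1 << 30

// Values reported for the dev-1 cluster (copied from the listing above).
const (
	reportedCapacity    int64 = 136419790848
	reportedRawCapacity int64 = 171798691840
	reportedRawUsage    int64 = 35378900992
)

// freeBytes returns the number of physical bytes left on disk.
func freeBytes(rawCapacity, rawUsage int64) int64 {
	return rawCapacity - rawUsage
}

func main() {
	free := freeBytes(reportedRawCapacity, reportedRawUsage)
	// If the reported capacity equals the free space, it cannot also be
	// the usable (parity-adjusted) capacity.
	fmt.Println("capacity == raw capacity - raw usage:", free == reportedCapacity)
	fmt.Printf("free space: %.2f GiB\n", float64(free)/gib)
}
```

The reported capacity matches raw capacity minus raw usage exactly, which is consistent with it being the sum of free bytes rather than a parity-adjusted value.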

Some more details on how the current values are obtained...

🟢 Raw capacity

The raw capacity is calculated as 8 * 10 * 2 GiB, so 160 GiB is perfectly fine. These are all values taken from the Tenant resource. The raw capacity is the total size of all disks combined.

🟢 Raw usage

The raw usage is calculated by adding up the used space of every drive in the cluster (retrieved via statfs per drive). This results in the total used disk space (aligned to block size).

🔴 Capacity and 🔴 Usage

The usage and capacity in the tenant usage status is calculated using the following code (source):

storageInfo, err := adminClnt.StorageInfo(srvInfoCtx)
if err != nil {
	// show the error and continue
	klog.Infof("'%s/%s' Failed to get storage info: %v", tenant.Namespace, tenant.Name, err)
	return tenant, nil
}

// Add back "Usable Capacity" & "Internal" values in Tenant Status and in the UI
// How much is available: "Usable Capacity" in UI comes from "tenant.Status.Usage.Capacity"
// How much is used: "Internal" in UI comes from "tenant.Status.Usage.Usage"
UsedSpace := int64(0)      // How much is used
AvailableSpace := int64(0) // How much is available
for _, disk := range storageInfo.Disks {
	UsedSpace = UsedSpace + int64(disk.UsedSpace)
	AvailableSpace = AvailableSpace + int64(disk.AvailableSpace)
}
tenant.Status.Usage.Usage = UsedSpace
tenant.Status.Usage.Capacity = AvailableSpace

So:

capacity is sum(disk.AvailableSpace), where disk.AvailableSpace maps to info.Free (source) and represents the number of free bytes left.
usage is sum(disk.UsedSpace), where disk.UsedSpace maps to info.Used that is actually info.Total - info.Free, so it represents the number of physical bytes that are actually being used.

New situation

The raw capacity and usage are still the same. The only difference is that the raw capacity is now determined using calls to statfs to align with the other information (previously it was based on the tenant specification). This will normally result in the same value if your storage provider respects the PVC sizes. Note that Kind uses Rancher's local path storage provider, which just mounts the host drive. That will result in incorrect reports, but Kind clusters are for development only, so this shouldn't affect normal operation.

The net capacity will now report the storage that is usable by clients of MinIO, so the net capacity will always be less than the raw capacity. With EC:2 on a 4 disk system, it will be 50% of the raw capacity. With EC:3 on a 10 disk system, it will be 70% of the raw capacity.
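
The two percentages follow directly from the data-to-total disk ratio. A minimal sketch, assuming a single erasure set that spans all disks (the function name is hypothetical, not from the operator):

```go
package main

import "fmt"

// efficiency returns the fraction of raw capacity that is usable by clients,
// assuming one erasure set spanning `disks` drives with `parity` parity shards.
// It returns 0 for inputs that make no sense.
func efficiency(disks, parity int) float64 {
	if disks <= 0 || parity < 0 || parity >= disks {
		return 0
	}
	return float64(disks-parity) / float64(disks)
}

func main() {
	fmt.Printf("EC:2 on 4 disks:  %.0f%%\n", efficiency(4, 2)*100)
	fmt.Printf("EC:3 on 10 disks: %.0f%%\n", efficiency(10, 3)*100)
}
```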

The net usage also factors in the parity of the pools, so it will also be less than the raw usage. Note that the following equation always holds:

efficiency = (data disks / (data + parity disks)) = (net capacity / raw capacity) = (net usage / raw usage)
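
As a rough illustration of how net values could be derived per pool and then summed, here is a sketch. The `pool` type and its fields are hypothetical and do not mirror the operator's or MinIO's actual data structures; the numbers in `main` are made up:

```go
package main

import "fmt"

// pool is a hypothetical per-pool summary used only for this example.
type pool struct {
	DataDisks   int
	ParityDisks int
	RawCapacity int64 // bytes, summed over the pool's disks
	RawUsage    int64 // bytes, summed over the pool's disks
}

// net scales a raw byte count by data / (data + parity) using integer math.
func (p pool) net(raw int64) int64 {
	total := int64(p.DataDisks + p.ParityDisks)
	if total == 0 {
		return 0
	}
	return raw * int64(p.DataDisks) / total
}

// netTotals sums the parity-adjusted capacity and usage over all pools.
func netTotals(pools []pool) (capacity, usage int64) {
	for _, p := range pools {
		capacity += p.net(p.RawCapacity)
		usage += p.net(p.RawUsage)
	}
	return capacity, usage
}

func main() {
	pools := []pool{
		{DataDisks: 2, ParityDisks: 2, RawCapacity: 400, RawUsage: 100},
		{DataDisks: 7, ParityDisks: 3, RawCapacity: 1000, RawUsage: 500},
	}
	capacity, usage := netTotals(pools)
	fmt.Println("net capacity:", capacity, "net usage:", usage)
}
```

Scaling per pool rather than once for the whole tenant matters when pools have different parity settings.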

There are two flaws with this calculation, but they probably won't matter in practice:

The parity can be changed for a pool, so data written under the old parity is scaled with the new ratio. This doesn't happen a lot, but it may give skewed results.
The reported net usage may differ from the sum of all object sizes. The raw usage always rounds each object's size up to the next block boundary. Because net usage is derived from the raw usage, the total may be higher than the sum of all object sizes.
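
The block rounding effect can be illustrated with a small sketch. The 4096-byte block size and the object sizes are assumptions chosen for the example, and the sketch ignores erasure coding and metadata overhead entirely:

```go
package main

import "fmt"

// roundUpToBlock rounds size up to the next multiple of blockSize.
func roundUpToBlock(size, blockSize int64) int64 {
	if blockSize <= 0 {
		return size
	}
	return (size + blockSize - 1) / blockSize * blockSize
}

func main() {
	const blockSize = 4096 // assumed filesystem block size
	sizes := []int64{1, 4096, 5000}

	var logical, onDisk int64
	for _, s := range sizes {
		logical += s
		onDisk += roundUpToBlock(s, blockSize)
	}
	// onDisk is never smaller than logical, so a usage figure derived
	// from on-disk bytes can exceed the sum of the object sizes.
	fmt.Println("sum of object sizes:", logical, "bytes on disk:", onDisk)
}
```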

ramondeklein · 2024-07-26T14:45:47Z

I'm not sure why the test-tenant-hotfix-update (1.22.x, ubuntu-latest) action fails. I don't think it's related to these changes.

ramondeklein · 2024-08-05T16:32:37Z

@cniackz Can you review if you have bandwidth?

Fix calculation of tenant usage status

34fb753

ramondeklein requested review from dvaldivia, cniackz, jiuker, harshavardhana and cesnietor July 26, 2024 12:51

ramondeklein self-assigned this Jul 26, 2024

ramondeklein added the bug fix label Jul 26, 2024

jiuker approved these changes Jul 29, 2024

harshavardhana approved these changes Aug 6, 2024

harshavardhana merged commit ccade59 into minio:master Aug 6, 2024
21 checks passed
