Fix calculation of tenant usage status #2244

Merged
merged 1 commit into from
Aug 6, 2024

@ramondeklein ramondeklein commented Jul 26, 2024

This PR should fix the reported tenant usage status and was implemented after the discussion about how to report / display usage information in EOS.

Current situation

The following usage is currently reported for the dev-1 cluster:

Capacity:      136419790848 (127 GiB)
Raw Capacity:  171798691840 (160 GiB)
Raw Usage:      35378900992 (32.95 GiB)
Usage:          35378900992 (32.95 GiB)

I would have expected the following definitions, but those are not all true:

🟢 Raw Capacity is the physical capacity of all disks.
🟢 Raw Usage is the amount of data that is actually stored on disk (including erasure coding overhead).
🔴 Capacity is the capacity usable by clients of the cluster (the raw capacity minus the EC overhead, so it is always less than the raw capacity). However, the reported value is actually the number of physical bytes left on the disks.
🔴 Usage is the number of bytes used by objects. It would have been fine if this factored in the block overhead of objects, but it is currently the amount of data stored on disk, so it is identical to raw usage.

Some more details on how the current values are obtained...

🟢 Raw capacity

The raw capacity is calculated as 8 * 10 * 2 GiB, so 160 GiB is perfectly fine. These are all values taken from the Tenant resource. The raw capacity is the total size of all disks combined.
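As a quick check of the arithmetic above, here is a minimal sketch in Go. What the factors 8 and 10 stand for is not stated here, so the names `factorA` and `factorB` are placeholders, not fields of the Tenant resource.

```go
package main

import "fmt"

const giB = uint64(1) << 30 // 1 GiB in bytes

// rawCapacity multiplies the three factors from the calculation above.
// factorA and factorB are placeholder names for the 8 and the 10.
func rawCapacity(factorA, factorB, volumeSize uint64) uint64 {
	return factorA * factorB * volumeSize
}

func main() {
	raw := rawCapacity(8, 10, 2*giB)
	fmt.Printf("%d bytes (%d GiB)\n", raw, raw/giB)
}
```

This yields 171798691840 bytes, which matches the reported Raw Capacity of 160 GiB.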

🟢 Raw usage

The raw usage is calculated by adding all the used space (retrieved via statfs per drive in the cluster). This results in the total used disk space (aligned to block size).

🔴 Capacity and 🔴 Usage

The usage and capacity in the tenant usage status are calculated using the following code (source):

storageInfo, err := adminClnt.StorageInfo(srvInfoCtx)
if err != nil {
	// show the error and continue
	klog.Infof("'%s/%s' Failed to get storage info: %v", tenant.Namespace, tenant.Name, err)
	return tenant, nil
}

// Add back "Usable Capacity" & "Internal" values in Tenant Status and in the UI
// How much is available: "Usable Capacity" in UI comes from "tenant.Status.Usage.Capacity"
// How much is used: "Internal" in UI comes from "tenant.Status.Usage.Usage"
UsedSpace := int64(0)      // How much is used
AvailableSpace := int64(0) // How much is available
for _, disk := range storageInfo.Disks {
	UsedSpace = UsedSpace + int64(disk.UsedSpace)
	AvailableSpace = AvailableSpace + int64(disk.AvailableSpace)
}
tenant.Status.Usage.Usage = UsedSpace
tenant.Status.Usage.Capacity = AvailableSpace

So:

  • capacity is sum(disk.AvailableSpace), where disk.AvailableSpace maps to info.Free (source) and represents the number of free bytes left.
  • usage is sum(disk.UsedSpace), where disk.UsedSpace maps to info.Used that is actually info.Total - info.Free, so it represents the number of physical bytes that are actually being used.
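To illustrate why the reported numbers look the way they do, the sketch below replays the summation on the dev-1 totals. The `diskInfo` type is a hypothetical stand-in for the per-disk data returned by the admin client, and a single aggregate entry is used instead of the real per-disk list.

```go
package main

import "fmt"

// diskInfo is a hypothetical stand-in for the per-disk fields used above.
type diskInfo struct {
	UsedSpace      uint64 // total - free, in bytes
	AvailableSpace uint64 // free bytes
}

// currentStatus mirrors the existing logic: "capacity" is the sum of free
// bytes and "usage" is the sum of used bytes.
func currentStatus(disks []diskInfo) (capacity, usage int64) {
	for _, d := range disks {
		usage += int64(d.UsedSpace)
		capacity += int64(d.AvailableSpace)
	}
	return capacity, usage
}

func main() {
	// One aggregate entry carrying the dev-1 totals (an assumption for brevity).
	disks := []diskInfo{{UsedSpace: 35378900992, AvailableSpace: 136419790848}}
	capacity, usage := currentStatus(disks)
	fmt.Println("capacity:", capacity)
	fmt.Println("usage:   ", usage)
	fmt.Println("sum:     ", capacity+usage)
}
```

The sum is 171798691840, exactly the raw capacity, which shows that the current Capacity is just the free space and Usage is just the raw usage.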

New situation

The raw capacity and usage are still the same. The only difference is that the raw capacity is now determined using calls to statfs to align with the other information (previously it was based on the tenant specification). This will normally result in the same value if your storage provider respects the PVC sizes. Note that Kind uses Rancher's local path storage provider, which just mounts the host drive. That will result in incorrect reports, but Kind clusters are for development only, so this shouldn't affect normal operation.

The net capacity now reports the storage usable by clients of MinIO, so the net capacity will always be less than the raw capacity. With EC:2 on a 4-disk system it will be 50% of the raw capacity; with EC:3 on a 10-disk system it will be 70% of the raw capacity.
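The two percentages follow directly from the ratio of data disks to total disks. A minimal sketch, where the function name `efficiency` is made up for illustration and is not taken from the operator code:

```go
package main

import "fmt"

// efficiency returns the fraction of raw capacity that is usable for object
// data, given the total number of disks and the number of parity disks.
func efficiency(disks, parity int) float64 {
	return float64(disks-parity) / float64(disks)
}

func main() {
	fmt.Printf("EC:2 on 4 disks:  %.0f%%\n", efficiency(4, 2)*100)
	fmt.Printf("EC:3 on 10 disks: %.0f%%\n", efficiency(10, 3)*100)
}
```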

The net usage also factors in the parity of the pools, so it will also be less than the raw usage. Note that the following equation always holds:

efficiency = (data disks / (data + parity disks)) = (net capacity / raw capacity) = (net usage / raw usage)
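As a sketch of how the net values relate to the raw ones, the snippet below scales a raw byte count by data / (data + parity). The helper name `netFromRaw` and the choice of 7 data plus 3 parity disks are assumptions for illustration; the actual layout of dev-1 is not stated here.

```go
package main

import "fmt"

// netFromRaw scales a raw byte count by data/(data+parity), rounding down.
// Splitting into quotient and remainder avoids overflowing uint64 for very
// large raw values.
func netFromRaw(raw, data, parity uint64) uint64 {
	total := data + parity
	return raw/total*data + raw%total*data/total
}

func main() {
	const data, parity = 7, 3 // hypothetical EC:3 on 10 disks
	rawCapacity := uint64(171798691840)
	rawUsage := uint64(35378900992)
	fmt.Println("net capacity:", netFromRaw(rawCapacity, data, parity))
	fmt.Println("net usage:   ", netFromRaw(rawUsage, data, parity))
}
```

Because both net values are derived with the same factor, the ratio of net to raw is the same for capacity and for usage.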

There are two flaws with this calculation, but they probably won't matter in practice:

  • The parity of a pool can be changed, so data written under the old parity is scaled with the wrong factor. This doesn't happen often, but it may give skewed results.
  • The reported net usage may differ from the sum of all object sizes. The raw usage always rounds each object's size up to the next block boundary. Because net usage is derived from the raw usage, the total may be higher than the sum of all object sizes.
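The second flaw can be seen in a simplified sketch. It assumes a 4096-byte block size and ignores erasure coding and metadata entirely, so it only shows the rounding effect, not how MinIO actually lays out objects.

```go
package main

import "fmt"

// roundUp rounds size up to the next multiple of blockSize.
func roundUp(size, blockSize uint64) uint64 {
	return (size + blockSize - 1) / blockSize * blockSize
}

func main() {
	const blockSize = 4096 // assumed filesystem block size
	objectSizes := []uint64{1, 100, 4096, 5000}

	var logical, onDisk uint64
	for _, s := range objectSizes {
		logical += s
		onDisk += roundUp(s, blockSize)
	}
	fmt.Println("sum of object sizes:", logical)
	fmt.Println("block-aligned usage:", onDisk)
}
```

Small objects inflate the block-aligned total the most, so the gap is largest for workloads with many tiny objects.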

@ramondeklein
Contributor Author

I'm not sure why the test-tenant-hotfix-update (1.22.x, ubuntu-latest) action fails. I don't think it's related to these changes.

@ramondeklein ramondeklein self-assigned this Jul 26, 2024
@ramondeklein
Contributor Author

@cniackz Can you review if you have bandwidth?

@harshavardhana harshavardhana merged commit ccade59 into minio:master Aug 6, 2024
21 checks passed