Fix calculation of tenant usage status #2244
Merged
This PR should fix the reported tenant usage status and was implemented after the discussion about how to report / display usage information in EOS.
Current situation
The following usage is currently reported for the `dev-1` cluster:
I would have expected the following definitions, but not all of them hold:
🟢 Raw Capacity is the physical capacity of all disks.
🟢 Raw Usage is the amount of data that is actually stored on disk (including erasure coding overhead).
🔴 Capacity is the usable capacity for clients of the cluster (it subtracts the EC overhead from the raw capacity, so it is always less than the raw capacity). However, it is actually the number of physical bytes left on the disk.
🔴 Usage is the number of bytes used by the objects. It would have been fine if this factored in the block overhead of objects, but it's currently the amount of data that is stored on disk, so it is identical to raw usage.
Some more details on how the current values are obtained...
🟢 Raw capacity
The raw capacity is calculated as 8 * 10 * 2 GiB, so 160 GiB is perfectly fine. These are all values taken from the `Tenant` resource. The raw capacity is the total size of all disks combined.
🟢 Raw usage
The raw usage is calculated by adding all the used space (retrieved via `statfs`) per drive in the cluster. This results in the total used disk space (aligned to block-size).
🔴 Capacity and 🔴 Usage
The `usage` and `capacity` in the tenant usage status are calculated using the following code (source):
So:
`capacity` is `sum(disk.AvailableSpace)`, where `disk.AvailableSpace` maps to `info.Free` (source) and represents the number of free bytes left.
`usage` is `sum(disk.UsedSpace)`, where `disk.UsedSpace` maps to `info.Used`, which is actually `info.Total - info.Free`, so it represents the number of physical bytes that are actually being used.
New situation
The raw capacity and usage are still the same. The only difference is that the raw capacity is now determined using calls to `statfs` to align with the other information (previously it was based on the tenant specification). This will normally result in the same value if your storage provider respects the PVC sizes. Note that Kind uses Rancher's local path storage provider, which just mounts the host drive. That will result in incorrect reports, but Kind clusters are for development only, so this shouldn't affect normal operation.
The net capacity now reports the storage usable by clients of MinIO, so the net capacity will always be less than the raw capacity. With EC:2 on a 4-disk system, it will be 50% of the raw capacity. With EC:3 on a 10-disk system, it will be 70% of the raw capacity.
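To make the percentages above concrete, here is a minimal sketch of the arithmetic. It is not the code from this PR: `netCapacity` is a hypothetical helper, and it assumes a single pool in which every erasure set spans all disks, so the usable fraction is simply `(disks - parity) / disks`.

```go
package main

import "fmt"

// netCapacity returns the capacity usable by clients, given the raw
// capacity in bytes, the number of disks in the erasure set, and the
// parity count (the N in EC:N).
// Hypothetical helper for illustration only; not taken from the PR.
func netCapacity(rawBytes, disks, parity uint64) uint64 {
	if disks == 0 || parity >= disks {
		return 0 // no data shards left, so nothing is usable
	}
	// Multiply before dividing so integer division does not drop precision.
	// Very large raw capacities could overflow uint64 here; ignored in this sketch.
	return rawBytes * (disks - parity) / disks
}

func main() {
	const GiB = uint64(1) << 30
	raw := 160 * GiB // the raw capacity from the example above

	fmt.Println(netCapacity(raw, 4, 2) / GiB)  // EC:2 on 4 disks: 80 (50%)
	fmt.Println(netCapacity(raw, 10, 3) / GiB) // EC:3 on 10 disks: 112 (70%)
}
```

A real cluster with multiple pools or several erasure sets per pool would need this applied per pool and summed, which the sketch leaves out.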
The net usage also factors in the parity of the pools, so it will also be less than the raw usage. Note that the following equation will always be true:
There are two flaws with this calculation, but they probably won't matter in practice: