Behavior of ext4 on ZVOL (allocated with overcommit) when out of space on pool occurs. #2458
Comments
There is a twofold solution to the problem Pavel set forth. Firstly, we could add an ad-hoc callback to struct backing_dev_info, for example block_device_full(). zvol could then register its zvol_device_full() method there to return "true" when the zfs pool is close to depletion, and ext4 running on the zvol could call that method in the context of syscalls and return ENOSPC when the method returns "true". In fact, the block_device_full() callback could be called in exactly the same places where ext4 checks disk quota.

Secondly, if zvol cannot process an incoming request due to pool depletion, it could put the request into a separate queue and schedule periodic checks for free space (e.g. every few seconds), rather than failing the request immediately with EIO. Later, when the system administrator frees some disk space, those requests could be retried. In this approach, end users would observe that some apps are frozen for a while, maybe even a long while, but they would have a chance to proceed smoothly after some disk space is freed.
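To make the first half of this proposal concrete, here is a minimal user-space sketch of the callback idea. Every name in it is hypothetical (block_device_full, zvol_device_full, and the reserve threshold come from this proposal, not from any existing kernel or ZFS API), and the real hook would of course live in the kernel's struct backing_dev_info:

```c
/* User-space model of the proposed "device full" callback.
 * All names here are hypothetical; they mirror the proposal above,
 * not the mainline kernel or ZFS. */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for struct backing_dev_info with the proposed hook. */
struct backing_dev_info {
	bool (*block_device_full)(void *dev_private);
	void *dev_private;
};

/* zvol side: report "full" when pool free space drops below a reserve. */
struct zvol_state {
	unsigned long long pool_free_bytes;
	unsigned long long reserve_bytes;
};

static bool zvol_device_full(void *dev_private)
{
	struct zvol_state *zv = dev_private;
	return zv->pool_free_bytes <= zv->reserve_bytes;
}

/* ext4 side: consult the hook where quota is checked and fail early. */
static int ext4_check_bd_full(struct backing_dev_info *bdi)
{
	if (bdi->block_device_full && bdi->block_device_full(bdi->dev_private))
		return -ENOSPC;	/* fail the syscall before issuing the write */
	return 0;
}

int main(void)
{
	struct zvol_state zv = { .pool_free_bytes = 1ULL << 20,
				 .reserve_bytes   = 64ULL << 20 };
	struct backing_dev_info bdi = { .block_device_full = zvol_device_full,
					.dev_private = &zv };

	printf("write allowed? %s\n",
	       ext4_check_bd_full(&bdi) == 0 ? "yes" : "no (ENOSPC)");
	return 0;
}
```

With a hook like this wired into the write path, a nearly full pool would surface as ENOSPC from the syscall rather than as an IO error on a failed bio.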
Hello again! You can see code examples for this approach from the ext4 side here https://github.com/pavel-odintsov/openvz_rhel6_kernel_mirror/blob/f310e3d87d79ed06262852d770d5c140ae59422f/fs/ext4/ext4.h#L2180 (function check_bd_full) and from the block-device side here https://github.com/pavel-odintsov/openvz_rhel6_kernel_mirror/blob/f310e3d87d79ed06262852d770d5c140ae59422f/drivers/block/ploop/dev.c#L3495 (function ploop_bd_full). This code was written for the OpenVZ project (for the RHEL 2.6.32 kernel) and is licensed under GPLv2, so you can use it without any problems.
@pavel-odintsov If the number of outstanding IOs on zvols exceeded 32, then reordering could have occurred that might have worsened things. #2484 should address that.

@mpatlasov What makes you think that EIO was returned? The ext4 source code will print that message for any kind of IO error, including ENOSPC: http://lxr.free-electrons.com/source/fs/ext4/page-io.c#L298 We return ENOSPC on datasets when we run out of space. It would be surprising if we returned EIO on a zvol given that the internal transaction paths are the same. A cursory look through the code suggests that this is indeed the case. Specifically, zvol_write() calls dmu_tx_assign(), which should return ENOSPC if we lack sufficient space to complete the transaction.

That being said, I suspect that your idea of retrying requests would make the problem worse by allowing ext4 to believe things are fine when they are not. We are already doing that to some extent because of how zvol transaction processing works when all 32 zvol taskq threads are busy. That allows reordering that could affect the flushes ext4 uses as part of its journalling. #2484 should address that by changing the zvol code to block submitters until ZFS has decided on the fate of each request. That way the ext4 code does not proceed to do other things thinking that a request is in flight, when those other things could actually occur before the first request.

Beyond that, we could look into implementing hooks to signal consumers like ext4 to stop writing when the pool is full, such as marking the device congested or using the device-full callback OpenVZ implemented.
@pavel-odintsov My suggestion for this particular case is to mount ext4 with the …
@behlendorf, your suggestion works really well in this case. We tested it thoroughly and found no issues! Thank you!
@pavel-odintsov Thanks for the follow-up. I'm glad that solution worked for you. As I mentioned above, I don't think this is something we should try to tackle in ZFS, so I'm closing out this issue.
Hello!
This issue is about the behavior of ext4 on a ZVOL (allocated with overcommit) when the pool runs out of space.
I created a mirrored pool of 220 GB:
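The original command was lost from this archive; based on the pool name visible in the device paths below (vzpool) and the stated 220 GB mirror, it would have looked roughly like this, with placeholder disk names:

```
zpool create vzpool mirror /dev/sdb /dev/sdc
```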
And created a 300 GB zvol on top of this pool:
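Again the exact command is missing; a 300 GB volume on a 220 GB pool implies a sparse (overcommitted) zvol, which zfs creates with the -s flag:

```
zfs create -s -V 300G vzpool/ct101
```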
And formatted it as ext4:
```
parted -s /dev/vzpool/ct101 mklabel gpt
parted /dev/vzpool/ct101 "mkpart primary 1 -1"
mkfs.ext4 /dev/vzpool/ct101-part1
```
After this I tried to emulate the «out of space on pool» condition using the following commands:
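Those commands were also lost from the archive; a typical way to force this condition (the mount point here is hypothetical) is simply to write more data than the pool can hold:

```
mount /dev/vzpool/ct101-part1 /mnt/ct101
# ~234 GB of writes, more than the 220 GB pool can hold
dd if=/dev/zero of=/mnt/ct101/fill bs=1M count=240000
```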
When I had written about 230 GB to this partition, I got the following errors in dmesg:
And all space on the pool was depleted:
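The original output was not preserved; depletion of this kind would show up via commands such as:

```
zpool list vzpool
zfs get available,used,referenced vzpool
```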
And the ext4 filesystem was totally corrupted:
But enlarging the pool or running fsck.ext4 can fix this.
Yep, I read about the «unexpected behavior» of overcommit in man zfs. But it's a really useful feature for allocating many volumes whose combined size exceeds the total disk space. It's widely used in LVM.
Can you add some warnings to dmesg from ZFS about pool depletion, and return a «no space anymore» error to ext4 earlier than it actually occurs? And maybe it is possible to return some specific error code to ext4 so it can safely save its data and stop operations on this device.