Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a mechanism to wait for delete queue to drain #9707

Merged
merged 10 commits into from
Apr 1, 2020

Conversation

pcd1193182
Copy link
Contributor

Motivation and Context

When doing redacted send/recv, many workflows involve deleting files that contain sensitive data. Because of the way zfs handles file deletions, snapshots taken quickly after a rm operation can sometimes still contain the file in question, especially if the file is very large. This can result in issues for redacted send/recv users who expect the deleted files to be redacted in the send streams, and not appear in their clones.

Description

This PR duplicates much of the zpool wait related logic into a zfs wait command. I would be open to a proposal to combine a bunch of this logic if people would prefer that, though some changes would need to be made to correctly dispatch the waiting in the kernel into the appropriate dsl_dir_wait or spa_wait as needed.

How Has This Been Tested?

New zfs-test added, and tested manually with files being held open by file descriptors. Verified that if there is nothing in the delete queue, the command returns immediately.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@pcd1193182 pcd1193182 added Type: Feature Feature request or new feature Status: Code Review Needed Ready for review and testing labels Dec 10, 2019
module/zfs/dsl_dir.c Outdated Show resolved Hide resolved
module/zfs/zfs_ioctl.c Show resolved Hide resolved
man/man8/zfs-wait.8 Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Dec 10, 2019

Codecov Report

Merging #9707 into master will decrease coverage by 14%.
The diff coverage is 25%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9707      +/-   ##
==========================================
- Coverage      79%      66%     -14%     
==========================================
  Files         384      363      -21     
  Lines      121788   117604    -4184     
==========================================
- Hits        96808    77558   -19250     
- Misses      24980    40046   +15066
Flag Coverage Δ
#kernel 68% <40%> (-11%) ⬇️
#user 47% <5%> (-20%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 25df8fb...dcbb78a. Read the comment docs.

module/zfs/dsl_dir.c Outdated Show resolved Hide resolved
module/zfs/dsl_dir.c Show resolved Hide resolved
module/zfs/dsl_dir.c Show resolved Hide resolved
module/zfs/dsl_dir.c Show resolved Hide resolved
module/os/linux/zfs/zfs_vfsops.c Outdated Show resolved Hide resolved
man/man8/zfs-wait.8 Outdated Show resolved Hide resolved
module/zfs/zfs_ioctl.c Outdated Show resolved Hide resolved
module/zfs/dsl_dataset.c Outdated Show resolved Hide resolved
module/zfs/dsl_dir.c Outdated Show resolved Hide resolved
module/zfs/dsl_dir.c Outdated Show resolved Hide resolved
module/zfs/dsl_destroy.c Outdated Show resolved Hide resolved
@pcd1193182 pcd1193182 force-pushed the dq_wait branch 4 times, most recently from 7a08d34 to 892dd68 Compare February 26, 2020 16:54
@pcd1193182 pcd1193182 force-pushed the dq_wait branch 2 times, most recently from 8e7307d to ae6f413 Compare March 6, 2020 18:44
include/sys/fs/zfs.h Outdated Show resolved Hide resolved
include/sys/fs/zfs.h Outdated Show resolved Hide resolved
module/zfs/dsl_dir.c Show resolved Hide resolved
module/zfs/dsl_dir.c Outdated Show resolved Hide resolved
module/zfs/zfs_ioctl.c Outdated Show resolved Hide resolved
module/zfs/zfs_ioctl.c Show resolved Hide resolved
@behlendorf behlendorf requested a review from jgallag88 March 12, 2020 17:59
@behlendorf
Copy link
Contributor

From my perspective this is ready to go. But if possible I'd like to get a second approving review.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Mar 17, 2020
@ahrens
Copy link
Member

ahrens commented Mar 17, 2020

@jgallag88 and I will take another look.

module/os/linux/zfs/zfs_vfsops.c Outdated Show resolved Hide resolved
include/sys/dsl_dir.h Outdated Show resolved Hide resolved
module/zfs/dsl_dataset.c Outdated Show resolved Hide resolved
module/zfs/dsl_dataset.c Outdated Show resolved Hide resolved
mutex_enter(&dd->dd_activity_lock);
dd->dd_activity_count++;

dsl_dataset_long_hold(ds, FTAG);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a long hold? dsl_dir_cancel_waiters waits for all the threads doing zfs_ioc_wait_fs to wake up and finish using the dataset, which seems like enough to allow the dataset to be safely destroyed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could probably rewrite this correctly to work without having long-holds for each waiter, but I don't know if there would be a significant benefit? I would want to go through each case where we look at the longholds and fix them up to include the waiters count, which we basically get for free by just having long-holds for the waiters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"long holds" are normally used to prevent the dataset from being destroyed (but allow other manipulation of the dataset, e.g. renaming), but in this case we are not preventing the destruction but rather forcing destruction (and unmounting) to cancel the waiters. The long-hold ensures that the dataset_t / dsl_dir_t do not go away while we are waiting, without having a "real" hold (which must be short-lived because the pool config lock blocks spa_sync). An alternative mechanism for keeping the dataset around could be developed but this seemed simpler. Do we have a comment explaining all that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a lot of that is summed up in the comment that explains what long_holds are, though the specific case for the activity holds isn't written down anywhere. I'll add a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We get a long-hold here so that the dsl_dataset_t and dsl_dir_t
aren't evicted while we're waiting.

My thinking is: any thread that wants to evict/destroy a dsl_dataset_t needs to wake up any waiters that exist, which it does by calling dsl_dir_cancel_waiters. By the time dsl_dir_cancel_waiters has returned, the threads that were doing the waiting have woken up and finished using the dsl_dataset_t and dsl_dir_t, and the dataset can be safely destroyed, even without having taken a long hold. Let me know if I'm missing something.

I don't think this is a huge deal either way, but I thought it might be simpler to not have to compare the number of long holds to the numbers of waiters when we are checking whether we can unmount / destroy a dataset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is correct, we could safely switch to that approach if needed. I'm just not sure if it is enough better to be worth reworking a change that is otherwise ready to go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I'm OK with this as is.

module/zfs/dsl_dataset.c Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Mar 19, 2020

Codecov Report

Merging #9707 into master will decrease coverage by 0.12%.
The diff coverage is 76.77%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9707      +/-   ##
==========================================
- Coverage   79.28%   79.15%   -0.13%     
==========================================
  Files         385      385              
  Lines      122467   122616     +149     
==========================================
- Hits        97094    97057      -37     
- Misses      25373    25559     +186     
Flag Coverage Δ
#kernel 79.63% <81.44%> (+0.01%) ⬆️
#user 65.96% <54.54%> (+0.75%) ⬆️
Impacted Files Coverage Δ
include/sys/dsl_dir.h 100.00% <ø> (ø)
module/zfs/dsl_dir.c 88.75% <68.42%> (-1.10%) ⬇️
cmd/zfs/zfs_main.c 82.57% <69.44%> (-0.13%) ⬇️
lib/libzfs/libzfs_dataset.c 76.39% <71.42%> (-0.07%) ⬇️
module/zfs/dsl_dataset.c 92.59% <77.77%> (-0.09%) ⬇️
module/zfs/zfs_ioctl.c 86.31% <88.00%> (+0.01%) ⬆️
lib/libzfs_core/libzfs_core.c 86.45% <100.00%> (+0.27%) ⬆️
module/os/linux/zfs/zfs_dir.c 82.91% <100.00%> (-0.31%) ⬇️
module/os/linux/zfs/zfs_vfsops.c 75.75% <100.00%> (+0.02%) ⬆️
module/zfs/dsl_destroy.c 93.60% <100.00%> (+0.20%) ⬆️
... and 71 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 652bdc9...2aed9fc. Read the comment docs.

@behlendorf
Copy link
Contributor

@pcd1193182 would you mind rebasing this to resolve the conflict.

@behlendorf
Copy link
Contributor

@pcd1193182 there appears to be a locking problem in the updated PR which needs to be resolved. Aside from that this looks good.

[ 3660.685503] VERIFY(dsl_pool_config_held(ds->ds_dir->dd_pool)) failed
[ 3660.692369] PANIC at dmu_objset.c:648:dmu_objset_from_ds()
[ 3660.698814] Showing stack for process 23044
[ 3660.704305] CPU: 0 PID: 23044 Comm: zfs Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.18.1.el7.x86_64 #1
[ 3660.715582] Hardware name: Amazon EC2 m5d.large/, BIOS 1.0 10/16/2017
[ 3660.721736] Call Trace:
[ 3660.725989]  [<ffffffff89f7b416>] dump_stack+0x19/0x1b
[ 3660.809423]  [<ffffffffc06ba54b>] spl_dumpstack+0x2b/0x30 [spl]
[ 3660.815267]  [<ffffffffc06ba619>] spl_panic+0xc9/0x110 [spl]
[ 3660.907000]  [<ffffffffc07f585b>] dmu_objset_from_ds+0x22b/0x260 [zfs]
[ 3660.913129]  [<ffffffffc0840cdd>] dsl_dir_activity_in_progress+0x5d/0x260 [zfs]
[ 3660.938156]  [<ffffffffc0846042>] dsl_dir_wait+0x82/0xe0 [zfs]
[ 3660.944041]  [<ffffffffc08f10a4>] zfs_ioc_wait_fs+0x164/0x240 [zfs]
[ 3660.949960]  [<ffffffffc08f9554>] zfsdev_ioctl_common+0x1a4/0x5b0 [zfs]
[ 3660.962239]  [<ffffffffc09374d4>] zfsdev_ioctl+0x54/0xf0 [zfs]
[ 3660.967950]  [<ffffffff89a5fde0>] do_vfs_ioctl+0x3a0/0x5a0
[ 3660.973500]  [<ffffffff89a60081>] SyS_ioctl+0xa1/0xc0
[ 3660.978901]  [<ffffffff89f8dede>] system_call_fastpath+0x25/0x2a

Signed-off-by: Paul Dagnelie <pcd@delphix.com>
@behlendorf behlendorf added Status: Revision Needed Changes are required for the PR to be accepted and removed Status: Accepted Ready to integrate (reviewed, tested) labels Mar 31, 2020
@behlendorf
Copy link
Contributor

Updated to "Revision Needed" until the VERIFY failure mentioned above is resolved.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Revision Needed Changes are required for the PR to be accepted labels Apr 1, 2020
@behlendorf behlendorf merged commit 5a42ef0 into openzfs:master Apr 1, 2020
@behlendorf
Copy link
Contributor

Thanks, merged.

jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
Add a mechanism to wait for delete queue to drain.

When doing redacted send/recv, many workflows involve deleting files 
that contain sensitive data. Because of the way zfs handles file 
deletions, snapshots taken quickly after a rm operation can sometimes 
still contain the file in question, especially if the file is very 
large. This can result in issues for redacted send/recv users who 
expect the deleted files to be redacted in the send streams, and not 
appear in their clones.

This change duplicates much of the zpool wait related logic into a 
zfs wait command, which can be used to wait until the internal
deleteq has been drained.  Additional wait activities may be added 
in the future. 

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#9707
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested) Type: Feature Feature request or new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants