Make txg_wait_synced conditional in zfsvfs_teardown, for FreeBSD #16268
Conversation
This applies the same change in openzfs#9115 to FreeBSD. This was actually the old behavior in FreeBSD 12; it only regressed when FreeBSD support was added to OpenZFS. As far as I can tell, the timeline went like this:

* Illumos's zfsvfs_teardown used an unconditional txg_wait_synced
* Illumos added the dirty data check [^4]
* FreeBSD merged in Illumos's conditional check [^3]
* OpenZFS forked from Illumos
* OpenZFS removed the dirty data check in openzfs#7795 [^5]
* @mattmacy forked the OpenZFS repo and began to add FreeBSD support
* OpenZFS PR openzfs#9115 [^1] recreated the same dirty data check that Illumos used, in slightly different form. At this point the OpenZFS repo did not yet have multi-OS support.
* Matt Macy merged in FreeBSD support in openzfs#8987 [^2], but it was based on slightly outdated OpenZFS code.

In my local testing, this vastly improves the reboot speed of a server with a large pool that has 1000 datasets and is resilvering an HDD.

[^1]: openzfs#9115
[^2]: openzfs#8987
[^3]: freebsd/freebsd-src@10b9d77
[^4]: illumos/illumos-gate@5aaeed5
[^5]: openzfs#7795

Sponsored by: Axcient
Signed-off-by: Alan Somers <asomers@gmail.com>
I haven't looked too deeply, but looking at the original panic in #7753, I suspect there may be a race between the last dnode of the dataset being synced in dmu_objset_sync_dnodes() and userquota_updates_task() actually completing in another task.
@amotin can you give a little more detail? The Linux codepath in master:

```c
// module/os/linux/zfs/zfs_vfsops.c:zfsvfs_teardown()
/*
 * Evict cached data. We must write out any dirty data before
 * disowning the dataset.
 */
objset_t *os = zfsvfs->z_os;
boolean_t os_dirty = B_FALSE;
for (int t = 0; t < TXG_SIZE; t++) {
    if (dmu_objset_is_dirty(os, t)) {
        os_dirty = B_TRUE;
        break;
    }
}
if (!zfs_is_readonly(zfsvfs) && os_dirty) {
    txg_wait_synced(dmu_objset_pool(zfsvfs->z_os), 0);
}
dmu_objset_evict_dbufs(zfsvfs->z_os);
dsl_dir_t *dd = os->os_dsl_dataset->ds_dir;
dsl_dir_cancel_waiters(dd);
```

This PR:

```c
// module/os/freebsd/zfs/zfs_vfsops.c:zfsvfs_teardown()
/*
 * Evict cached data. We must write out any dirty data before
 * disowning the dataset.
 */
objset_t *os = zfsvfs->z_os;
boolean_t os_dirty = B_FALSE;
for (int t = 0; t < TXG_SIZE; t++) {
    if (dmu_objset_is_dirty(os, t)) {
        os_dirty = B_TRUE;
        break;
    }
}
if (!zfs_is_readonly(zfsvfs) && os_dirty)
    txg_wait_synced(dmu_objset_pool(zfsvfs->z_os), 0);
dmu_objset_evict_dbufs(zfsvfs->z_os);
dd = zfsvfs->z_os->os_dsl_dataset->ds_dir;
dsl_dir_cancel_waiters(dd);
```
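For contrast, here is a minimal sketch of what the FreeBSD path looked like before this change, where the wait was unconditional. It is paraphrased from the PR description rather than quoted from the old source, so treat it as illustrative only:

```c
/*
 * Pre-change behavior (sketch): always block until the pool syncs out
 * the current txg before evicting dbufs, even when this objset has no
 * dirty data at all. During a long scan this wait dominates teardown.
 */
if (!zfs_is_readonly(zfsvfs))
    txg_wait_synced(dmu_objset_pool(zfsvfs->z_os), 0);
dmu_objset_evict_dbufs(zfsvfs->z_os);
```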
@tonyhutter dmu_objset_is_dirty() checks whether the objset has any elements in its os_dirty_dnodes multilist. Entries are deleted from there by dmu_objset_sync_dnodes() before the dnode is actually synced and moved into os_synced_dnodes, so the last dnode may not yet be synced at that point, having barely started. I am not sure that flushing dbufs from under dnode_sync() is really dangerous, since they should still have references, but I guess they may not be evicted, at the very least.

On top of that, dsl_pool_sync() calls dmu_objset_sync_done() for all datasets only after they have all synced their dnodes, which even more likely has not completed yet. Meanwhile, dmu_objset_sync_done() may call userquota_updates_task(), which as far as I can see may dirty a few more dnodes. dnode_sync() for those dnodes is explicitly called later by dmu_objset_sync(), which again makes me think about flushing dbufs under it. After that the meta-dnode is also synced, which may also have a bunch of dbufs.

TL;DR: I am not sure it is legal to call dmu_objset_is_dirty() outside of syncing context, and while I am not sure it is fatal in this case, it seems to me it may end up not evicting some of the dbufs from the cache, as was desired. I also wonder whether this code is expected to guarantee that the dataset is synced in case of a crash, if unmount is expected to provide such guarantees.
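To make the ordering concern concrete, here is a small userspace toy model. It is not ZFS code; the types and counters are invented stand-ins, with "dirty" playing the role of os_dirty_dnodes and "syncing" the role of os_synced_dnodes. It shows how a check that inspects only the dirty list can report "clean" while sync work for the objset is still in flight:

```c
#include <stdbool.h>
#include <stdio.h>

#define TXG_SIZE 4

struct toy_objset {
    int dirty[TXG_SIZE];   /* dnodes waiting to be picked up by sync */
    int syncing[TXG_SIZE]; /* dnodes picked up but not yet on disk */
};

/* Same shape as the teardown check above: it looks only at "dirty". */
static bool toy_objset_is_dirty(const struct toy_objset *os, int txg)
{
    return (os->dirty[txg & (TXG_SIZE - 1)] != 0);
}

int main(void)
{
    struct toy_objset os = {{0}, {0}};
    int txg = 7;

    /*
     * The sync thread picks up the last dnode: it leaves the dirty
     * list immediately, but its data has not been written out yet.
     */
    os.dirty[txg & (TXG_SIZE - 1)] = 0;
    os.syncing[txg & (TXG_SIZE - 1)] = 1;

    /* Teardown path: decide whether to wait, as the patched code does. */
    bool os_dirty = false;
    for (int t = 0; t < TXG_SIZE; t++) {
        if (toy_objset_is_dirty(&os, t)) {
            os_dirty = true;
            break;
        }
    }

    printf("dirty check says %s, but %d dnode(s) are still syncing\n",
        os_dirty ? "dirty" : "clean",
        os.syncing[txg & (TXG_SIZE - 1)]);
    return (0);
}
```

The point of the model is only the ordering: "not on the dirty list" is not the same as "already on disk", which is why skipping txg_wait_synced() based on that check alone is the part being questioned here.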
As a quick thought, I wonder if …
This applies the same change in openzfs#9115 to FreeBSD. In my local testing, this vastly improves the reboot speed of a server with a large pool that has 1000 datasets and is resilvering an HDD.
Sponsored by: Axcient
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes openzfs#16268
Motivation and Context
Without this change, unmount speed is very slow, especially when a scan is ongoing.
Description
Copy the same changes as in #9115 to FreeBSD.
How Has This Been Tested?
Tested locally on a server with about 240 HDDs, 1000 datasets, and an ongoing resilver. Unmount speed went from 1-3 datasets per 10 seconds to "too fast to measure".
Footnotes
https://github.com/illumos/illumos-gate/commit/5aaeed5c617553c4cec6328c1f4c19079a5a495a
https://github.com/freebsd/freebsd-src/commit/10b9d77bf1ccf2f3affafa6261692cb92cf7e992
https://github.com/openzfs/zfs/pull/7795
https://github.com/openzfs/zfs/pull/9115
https://github.com/openzfs/zfs/pull/8987