Change rangelock handling in FreeBSD's zfs_getpages() #16643

Closed
wants to merge 2 commits

72 changes: 54 additions & 18 deletions module/os/freebsd/zfs/zfs_vnops_os.c
@@ -452,8 +452,10 @@ mappedread_sf(znode_t *zp, int nbytes, zfs_uio_t *uio)
 				if (!vm_page_wired(pp) && pp->valid == 0 &&
 				    vm_page_busy_tryupgrade(pp))
 					vm_page_free(pp);
-				else
+				else {
 					vm_page_deactivate_noreuse(pp);
+					vm_page_sunbusy(pp);
+				}
 				zfs_vmobject_wunlock(obj);
 			}
 		} else {
@@ -3928,6 +3930,7 @@ zfs_getpages(struct vnode *vp, vm_page_t *ma, int count, int *rbehind,
 	if (zfs_enter_verify_zp(zfsvfs, zp, FTAG) != 0)
 		return (zfs_vm_pagerret_error);
 
+	object = ma[0]->object;
 	start = IDX_TO_OFF(ma[0]->pindex);
 	end = IDX_TO_OFF(ma[count - 1]->pindex + 1);
 
@@ -3936,33 +3939,47 @@ zfs_getpages(struct vnode *vp, vm_page_t *ma, int count, int *rbehind,
 	 * Note that we need to handle the case of the block size growing.
 	 */
 	for (;;) {
+		uint64_t len;
+
 		blksz = zp->z_blksz;
+		len = roundup(end, blksz) - rounddown(start, blksz);
+
 		lr = zfs_rangelock_tryenter(&zp->z_rangelock,
-		    rounddown(start, blksz),
-		    roundup(end, blksz) - rounddown(start, blksz), RL_READER);
+		    rounddown(start, blksz), len, RL_READER);
 		if (lr == NULL) {
-			if (rahead != NULL) {
-				*rahead = 0;
-				rahead = NULL;
-			}
-			if (rbehind != NULL) {
-				*rbehind = 0;
-				rbehind = NULL;
+			/*
+			 * Avoid a deadlock with update_pages(). We need to
+			 * hold the range lock when copying from the DMU, so
+			 * give up the busy lock to allow update_pages() to
+			 * proceed. We might need to allocate new pages, which
+			 * isn't quite right since this allocation isn't subject
+			 * to the page fault handler's OOM logic, but this is
+			 * the best we can do for now.
+			 */
+			for (int i = 0; i < count; i++) {
+				ASSERT(vm_page_none_valid(ma[i]));
+				vm_page_xunbusy(ma[i]);
 			}
-			break;
+
+			lr = zfs_rangelock_enter(&zp->z_rangelock,
+			    rounddown(start, blksz), len, RL_READER);
+
+			zfs_vmobject_wlock(object);
+			(void) vm_page_grab_pages(object, OFF_TO_IDX(start),
+			    VM_ALLOC_NORMAL | VM_ALLOC_WAITOK | VM_ALLOC_ZERO,
+			    ma, count);
+			zfs_vmobject_wunlock(object);
 		}
 		if (blksz == zp->z_blksz)
 			break;
 		zfs_rangelock_exit(lr);
 	}
 
-	object = ma[0]->object;
 	zfs_vmobject_wlock(object);
 	obj_size = object->un_pager.vnp.vnp_size;
 	zfs_vmobject_wunlock(object);
 	if (IDX_TO_OFF(ma[count - 1]->pindex) >= obj_size) {
-		if (lr != NULL)
-			zfs_rangelock_exit(lr);
+		zfs_rangelock_exit(lr);
 		zfs_exit(zfsvfs, FTAG);
 		return (zfs_vm_pagerret_bad);
 	}
@@ -3987,11 +4004,30 @@ zfs_getpages(struct vnode *vp, vm_page_t *ma, int count, int *rbehind,
 	 * ZFS will panic if we request DMU to read beyond the end of the last
 	 * allocated block.
 	 */
-	error = dmu_read_pages(zfsvfs->z_os, zp->z_id, ma, count, &pgsin_b,
-	    &pgsin_a, MIN(end, obj_size) - (end - PAGE_SIZE));
+	for (int i = 0; i < count; i++) {
+		int count1, j, last_size;
 
-	if (lr != NULL)
-		zfs_rangelock_exit(lr);
+		if (vm_page_any_valid(ma[i])) {
+			ASSERT(vm_page_all_valid(ma[i]));
+			continue;
+		}
+		for (j = i + 1; j < count; j++) {
+			if (vm_page_any_valid(ma[j])) {
+				ASSERT(vm_page_all_valid(ma[j]));
+				break;
+			}
+		}
+		count1 = j - i;
+		last_size = j == count ?
+		    MIN(end, obj_size) - (end - PAGE_SIZE) : PAGE_SIZE;
+		error = dmu_read_pages(zfsvfs->z_os, zp->z_id, &ma[i], count1,
+		    i == 0 ? &pgsin_b : NULL, j == count ? &pgsin_a : NULL,
+		    last_size);
Comment on lines +4020 to +4025

@amotin (Member) commented on Nov 13, 2024:

I wonder: once we read several (count1) pages at once, shall we also increment i by the same number? I guess it may be fine thanks to the vm_page_any_valid() continue above, and it seems to be cheap, but do we need those iterations at all?

But more serious: dmu_read_pages() does not seem to expect NULL for its rbehind and rahead arguments, requiring pointers to zeroes instead. @markjdb I think this may end in a NULL pointer dereference in whatever scenario this page-skipping algorithm handles.
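
For illustration only, a minimal sketch of the kind of guard being discussed, assuming the dmu_read_pages() signature used in this diff; the dummy_b/dummy_a locals are hypothetical and this is not necessarily how #16758 resolves it:

```c
/*
 * Hypothetical sketch: for a run that neither starts at ma[0] nor ends at
 * ma[count - 1], point rbehind/rahead at zero-initialized locals instead of
 * passing NULL, since (per the comment above) dmu_read_pages() expects
 * pointers to zeroes when no readbehind/readahead is wanted.
 */
int dummy_b = 0, dummy_a = 0;

error = dmu_read_pages(zfsvfs->z_os, zp->z_id, &ma[i], count1,
    i == 0 ? &pgsin_b : &dummy_b,
    j == count ? &pgsin_a : &dummy_a,
    last_size);
```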

A contributor commented:

Looking over this again, I am seeing the same thing. I overlooked this originally.

The PR author (Contributor) replied:

Somehow I convinced myself that a null pointer was OK, probably because zfs_getpages() itself permits null readahead/readbehind pointers.

I submitted a PR to fix this; thanks for taking another look: #16758
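
For context, the NULL-tolerant pattern being referred to is visible in the lines this PR removes from zfs_getpages() itself; condensed here purely as a reference sketch:

```c
/* zfs_getpages() accepts NULL rbehind/rahead pointers from its caller. */
if (rahead != NULL) {
	*rahead = 0;
	rahead = NULL;
}
if (rbehind != NULL) {
	*rbehind = 0;
	rbehind = NULL;
}
```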

+		if (error != 0)
+			break;
+	}
+
+	zfs_rangelock_exit(lr);
 	ZFS_ACCESSTIME_STAMP(zfsvfs, zp);
 
 	dataset_kstats_update_read_kstats(&zfsvfs->z_kstat, count*PAGE_SIZE);