-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change rangelock handling in FreeBSD's zfs_getpages() #16643
Conversation
I'm not familiar with the FreeBSD VM layer but it looks like the From the "vm2: serial console" log, https://github.com/openzfs/zfs/actions/runs/11315921698/job/31476777580?pr=16643
|
Thanks, I think I see what's going on there. I didn't see that panic when I ran the ZTS locally, but if I understand the problem correctly, it's due to a race. |
mappedread_sf() may allocate pages; if it fails to populate a page can't free it, it needs to ensure that it's placed into a page queue, otherwise it can't be reclaimed until the vnode is destroyed. I think this is quite unlikely to happen in practice, it was noticed by code inspection. Signed-off-by: Mark Johnston <markj@FreeBSD.org>
I believe the latest version addresses this, and I didn't observe any panics while running the ZTS locally. |
As a deadlock avoidance measure, zfs_getpages() would only try to acquire a rangelock, falling back to a single-page read if this was not possible. However, this is incompatible with direct I/O. Instead, release the busy lock before trying to acquire the rangelock in blocking mode. This means that it's possible for the page to be replaced, so we have to re-lookup. Signed-off-by: Mark Johnston <markj@FreeBSD.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to me. I am not an expert in FreeBSD, but acquiring the range lock unconditionally in zfs_getpages()
was the goal. It also is now passing the CI tests for FreeBSD for the dio_mmap
test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__
As a deadlock avoidance measure, zfs_getpages() would only try to acquire a rangelock, falling back to a single-page read if this was not possible. However, this is incompatible with direct I/O. Instead, release the busy lock before trying to acquire the rangelock in blocking mode. This means that it's possible for the page to be replaced, so we have to re-lookup. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #16643
count1 = j - i; | ||
last_size = j == count ? | ||
MIN(end, obj_size) - (end - PAGE_SIZE) : PAGE_SIZE; | ||
error = dmu_read_pages(zfsvfs->z_os, zp->z_id, &ma[i], count1, | ||
i == 0 ? &pgsin_b : NULL, j == count ? &pgsin_a : NULL, | ||
last_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder, once we read several (count1
) pages at once, shall we also increment i
by the same number? I guess it may be fine via vm_page_any_valid()
continue
above, and it seems to be cheap, but do we need those iterations at all?
But more serious: dmu_read_pages()
seems to not expect NULL for rbehind
and rahead
arguments, requiring pointer to zeroes instead. @markjdb I think this may end in NULL pointer dereference in whatever scenario this page skipping algorithm handles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking over this again, I am seeing the same thing. I overlooked this originally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow I convinced myself that a null pointer was ok. Probably because zfs_getpages() itself permits null readahead/behind pointers.
I submitted a PR to fix this, thanks for taking another look: #16758
mappedread_sf() may allocate pages; if it fails to populate a page can't free it, it needs to ensure that it's placed into a page queue, otherwise it can't be reclaimed until the vnode is destroyed. I think this is quite unlikely to happen in practice, it was noticed by code inspection. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes openzfs#16643
As a deadlock avoidance measure, zfs_getpages() would only try to acquire a rangelock, falling back to a single-page read if this was not possible. However, this is incompatible with direct I/O. Instead, release the busy lock before trying to acquire the rangelock in blocking mode. This means that it's possible for the page to be replaced, so we have to re-lookup. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes openzfs#16643
mappedread_sf() may allocate pages; if it fails to populate a page can't free it, it needs to ensure that it's placed into a page queue, otherwise it can't be reclaimed until the vnode is destroyed. I think this is quite unlikely to happen in practice, it was noticed by code inspection. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes openzfs#16643
As a deadlock avoidance measure, zfs_getpages() would only try to acquire a rangelock, falling back to a single-page read if this was not possible. However, this is incompatible with direct I/O. Instead, release the busy lock before trying to acquire the rangelock in blocking mode. This means that it's possible for the page to be replaced, so we have to re-lookup. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes openzfs#16643
Unconditionally hold the rangelock in zfs_getpages().
Motivation and Context
This is reportedly required for direct I/O support on FreeBSD.
Description
This change modifies zfs_getpages() to uncondtionally acquire the rangelock. To avoid a deadlock, we may need to drop a page busy lock and allocate a new page later on.
How Has This Been Tested?
Smoke testing on FreeBSD.
Types of changes
Checklist:
Signed-off-by
.