Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change rangelock handling in FreeBSD's zfs_getpages() #16643

Closed
wants to merge 2 commits into from

Conversation

markjdb
Copy link
Contributor

@markjdb markjdb commented Oct 13, 2024

Unconditionally hold the rangelock in zfs_getpages().

Motivation and Context

This is reportedly required for direct I/O support on FreeBSD.

Description

This change modifies zfs_getpages() to uncondtionally acquire the rangelock. To avoid a deadlock, we may need to drop a page busy lock and allocate a new page later on.

How Has This Been Tested?

Smoke testing on FreeBSD.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 14, 2024
@behlendorf
Copy link
Contributor

I'm not familiar with the FreeBSD VM layer but it looks like the mmap_mixed test case manage to trip an assert.

From the "vm2: serial console" log, https://github.com/openzfs/zfs/actions/runs/11315921698/job/31476777580?pr=16643

  ZTS run /usr/local/share/zfs/zfs-tests/tests/functional/mmap/mmap_mixed
panic: VERIFY(vm_page_none_valid(m)) failed
  
  cpuid = 1
  time = 1728878112
  KDB: stack backtrace:
  #0 0xffffffff80b9002d at kdb_backtrace+0x5d
  #1 0xffffffff80b43132 at vpanic+0x132
  #2 0xffffffff82841b6a at spl_panic+0x3a
  #3 0xffffffff8284845b at dmu_read_pages+0x7fb
  #4 0xffffffff82865527 at zfs_freebsd_getpages+0x2e7
  #5 0xffffffff80eddf81 at vnode_pager_getpages+0x41
  #6 0xffffffff80ed45c2 at vm_pager_get_pages+0x22
  #7 0xffffffff80eb18df at vm_fault+0x5ef
  #8 0xffffffff80eb11db at vm_fault_trap+0x6b
  #9 0xffffffff8100ca39 at trap_pfault+0x1d9
  #10 0xffffffff8100bfb2 at trap+0x442
  #11 0xffffffff80fe3828 at calltrap+0x8

@markjdb
Copy link
Contributor Author

markjdb commented Oct 14, 2024

I'm not familiar with the FreeBSD VM layer but it looks like the mmap_mixed test case manage to trip an assert.

From the "vm2: serial console" log, https://github.com/openzfs/zfs/actions/runs/11315921698/job/31476777580?pr=16643

  ZTS run /usr/local/share/zfs/zfs-tests/tests/functional/mmap/mmap_mixed
panic: VERIFY(vm_page_none_valid(m)) failed
  
  cpuid = 1
  time = 1728878112
  KDB: stack backtrace:
  #0 0xffffffff80b9002d at kdb_backtrace+0x5d
  #1 0xffffffff80b43132 at vpanic+0x132
  #2 0xffffffff82841b6a at spl_panic+0x3a
  #3 0xffffffff8284845b at dmu_read_pages+0x7fb
  #4 0xffffffff82865527 at zfs_freebsd_getpages+0x2e7
  #5 0xffffffff80eddf81 at vnode_pager_getpages+0x41
  #6 0xffffffff80ed45c2 at vm_pager_get_pages+0x22
  #7 0xffffffff80eb18df at vm_fault+0x5ef
  #8 0xffffffff80eb11db at vm_fault_trap+0x6b
  #9 0xffffffff8100ca39 at trap_pfault+0x1d9
  #10 0xffffffff8100bfb2 at trap+0x442
  #11 0xffffffff80fe3828 at calltrap+0x8

Thanks, I think I see what's going on there. I didn't see that panic when I ran the ZTS locally, but if I understand the problem correctly, it's due to a race.

@behlendorf behlendorf added the Status: Work in Progress Not yet ready for general review label Oct 20, 2024
mappedread_sf() may allocate pages; if it fails to populate a page
can't free it, it needs to ensure that it's placed into a page queue,
otherwise it can't be reclaimed until the vnode is destroyed.

I think this is quite unlikely to happen in practice, it was noticed by
code inspection.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
@markjdb
Copy link
Contributor Author

markjdb commented Nov 8, 2024

I'm not familiar with the FreeBSD VM layer but it looks like the mmap_mixed test case manage to trip an assert.
From the "vm2: serial console" log, https://github.com/openzfs/zfs/actions/runs/11315921698/job/31476777580?pr=16643

  ZTS run /usr/local/share/zfs/zfs-tests/tests/functional/mmap/mmap_mixed
panic: VERIFY(vm_page_none_valid(m)) failed
  
  cpuid = 1
  time = 1728878112
  KDB: stack backtrace:
  #0 0xffffffff80b9002d at kdb_backtrace+0x5d
  #1 0xffffffff80b43132 at vpanic+0x132
  #2 0xffffffff82841b6a at spl_panic+0x3a
  #3 0xffffffff8284845b at dmu_read_pages+0x7fb
  #4 0xffffffff82865527 at zfs_freebsd_getpages+0x2e7
  #5 0xffffffff80eddf81 at vnode_pager_getpages+0x41
  #6 0xffffffff80ed45c2 at vm_pager_get_pages+0x22
  #7 0xffffffff80eb18df at vm_fault+0x5ef
  #8 0xffffffff80eb11db at vm_fault_trap+0x6b
  #9 0xffffffff8100ca39 at trap_pfault+0x1d9
  #10 0xffffffff8100bfb2 at trap+0x442
  #11 0xffffffff80fe3828 at calltrap+0x8

Thanks, I think I see what's going on there. I didn't see that panic when I ran the ZTS locally, but if I understand the problem correctly, it's due to a race.

I believe the latest version addresses this, and I didn't observe any panics while running the ZTS locally.

As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Copy link
Contributor

@bwatkinson bwatkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me. I am not an expert in FreeBSD, but acquiring the range lock unconditionally in zfs_getpages() was the goal. It also is now passing the CI tests for FreeBSD for the dio_mmap test case.

@bwatkinson bwatkinson requested a review from amotin November 13, 2024 00:50
@amotin amotin removed the Status: Work in Progress Not yet ready for general review label Nov 13, 2024
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Nov 13, 2024
behlendorf pushed a commit that referenced this pull request Nov 13, 2024
As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16643
Comment on lines +4020 to +4025
count1 = j - i;
last_size = j == count ?
MIN(end, obj_size) - (end - PAGE_SIZE) : PAGE_SIZE;
error = dmu_read_pages(zfsvfs->z_os, zp->z_id, &ma[i], count1,
i == 0 ? &pgsin_b : NULL, j == count ? &pgsin_a : NULL,
last_size);
Copy link
Member

@amotin amotin Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, once we read several (count1) pages at once, shall we also increment i by the same number? I guess it may be fine via vm_page_any_valid() continue above, and it seems to be cheap, but do we need those iterations at all?

But more serious: dmu_read_pages() seems to not expect NULL for rbehind and rahead arguments, requiring pointer to zeroes instead. @markjdb I think this may end in NULL pointer dereference in whatever scenario this page skipping algorithm handles.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking over this again, I am seeing the same thing. I overlooked this originally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow I convinced myself that a null pointer was ok. Probably because zfs_getpages() itself permits null readahead/behind pointers.

I submitted a PR to fix this, thanks for taking another look: #16758

behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Nov 15, 2024
mappedread_sf() may allocate pages; if it fails to populate a page
can't free it, it needs to ensure that it's placed into a page queue,
otherwise it can't be reclaimed until the vnode is destroyed.

I think this is quite unlikely to happen in practice, it was noticed by
code inspection.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes openzfs#16643
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Nov 15, 2024
As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes openzfs#16643
ixhamza pushed a commit to truenas/zfs that referenced this pull request Dec 2, 2024
mappedread_sf() may allocate pages; if it fails to populate a page
can't free it, it needs to ensure that it's placed into a page queue,
otherwise it can't be reclaimed until the vnode is destroyed.

I think this is quite unlikely to happen in practice, it was noticed by
code inspection.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes openzfs#16643
ixhamza pushed a commit to truenas/zfs that referenced this pull request Dec 2, 2024
As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes openzfs#16643
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants