Creating a loop device over ZFS file system causes NULL pointer exception when direct=always is set #16956

Closed
ixhamza opened this issue Jan 16, 2025 · 0 comments · Fixed by #17006
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

ixhamza commented Jan 16, 2025

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu 24.04.1 LTS |
| Distribution Version | 24.04 |
| Kernel Version | 6.8.0-51-generic |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.3.0-1 |

Describe the problem you're observing

As the title states, creating a loop device over a ZFS file system with direct=always set causes a NULL pointer dereference in the kernel. Alternatively, setting direct=always just after the loop device is created and then reading a single page also triggers a kernel panic.
Note: This is also reproducible on the 5.15 and 6.12 kernels.

Describe how to reproduce the problem

sudo truncate -s 2G /tmp/f1
sudo rm -rf /mnt/tank
sudo zpool create tank /tmp/f1 -O mountpoint=/mnt/tank -O direct=always
sudo truncate -s 1G /mnt/tank/temp_file
sudo losetup /dev/loop19 /mnt/tank/temp_file

Alternatively,

sudo truncate -s 2G /tmp/f1
sudo rm -rf /mnt/tank
sudo zpool create tank /tmp/f1 -O mountpoint=/mnt/tank
sudo truncate -s 1G /mnt/tank/temp_file
sudo losetup /dev/loop19 /mnt/tank/temp_file
sudo zfs set direct=always tank
sudo dd if=/dev/loop19 bs=4k count=1

Include any warning/errors/backtraces from the system logs

[  867.662333] BUG: kernel NULL pointer dereference, address: 00000000000000b0
[  867.662354] #PF: supervisor write access in kernel mode
[  867.662361] #PF: error_code(0x0002) - not-present page
[  867.662367] PGD 2c548b067 P4D 2c548b067 PUD 0 
[  867.662380] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  867.662389] CPU: 16 PID: 351 Comm: kworker/u40:7 Tainted: P           OE      6.8.0-51-generic #52-Ubuntu
[  867.662399] Hardware name: Micro-Star International Co., Ltd. MS-7D96/MAG B760 TOMAHAWK WIFI DDR4 (MS-7D96), BIOS 1.70 10/26/2023
[  867.662405] Workqueue: loop19 loop_rootcg_workfn
[  867.662421] RIP: 0010:down_read_killable+0x1e/0xe0
[  867.662437] Code: 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 2e c0 ff ff 65 ff 05 df f2 9e 65 be 00 01 00 00 <f0> 48 0f c1 33 48 81 c6 00 01 00 00 78 65 48 b8 07 00 00 00 00 00
[  867.662445] RSP: 0018:ffffade5c086f9f8 EFLAGS: 00010282
[  867.662452] RAX: 0000000000000000 RBX: 00000000000000b0 RCX: 0000000000290001
[  867.662458] RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000000
[  867.662463] RBP: ffffade5c086fa00 R08: 00000000000000b0 R09: 0000000000290001
[  867.662468] R10: ffff9467db256700 R11: 0000000000000001 R12: 0000000000000000
[  867.662473] R13: ffffade5c086faa4 R14: 0000000000000000 R15: ffffade5c086fd28
[  867.662478] FS:  0000000000000000(0000) GS:ffff946f1f600000(0000) knlGS:0000000000000000
[  867.662485] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  867.662490] CR2: 00000000000000b0 CR3: 0000000269352004 CR4: 0000000000f70ef0
[  867.662496] PKRU: 55555554
[  867.662500] Call Trace:
[  867.662505]  <TASK>
[  867.662515]  ? show_regs+0x6d/0x80
[  867.662528]  ? __die+0x24/0x80
[  867.662536]  ? page_fault_oops+0x99/0x1b0
[  867.662548]  ? do_user_addr_fault+0x2e9/0x670
[  867.662557]  ? exc_page_fault+0x83/0x1b0
[  867.662567]  ? asm_exc_page_fault+0x27/0x30
[  867.662582]  ? down_read_killable+0x1e/0xe0
[  867.662592]  ? down_read_killable+0x12/0xe0
[  867.662602]  __gup_longterm_locked+0x46e/0x980
[  867.662620]  ? spl_kvmalloc+0x7a/0xb0 [spl]
[  867.662663]  pin_user_pages_unlocked+0x7a/0xc0
[  867.662677]  zfs_uio_get_dio_pages_alloc+0xc7/0x270 [zfs]
[  867.663332]  zfs_setup_direct+0xda/0x180 [zfs]
[  867.663993]  zfs_read+0x153/0x610 [zfs]
[  867.664705]  zpl_iter_read+0xfd/0x1b0 [zfs]
[  867.665360]  do_iter_readv_writev+0x196/0x1d0
[  867.665377]  vfs_iter_read+0xac/0x150
[  867.665384]  lo_read_simple+0x11d/0x1f0
[  867.665395]  do_req_filebacked+0x196/0x1a0
[  867.665404]  loop_process_work+0xb9/0x3a0
[  867.665413]  loop_rootcg_workfn+0x1b/0x30
[  867.665420]  process_one_work+0x175/0x350
[  867.665434]  worker_thread+0x306/0x440
[  867.665446]  ? __pfx_worker_thread+0x10/0x10
[  867.665457]  kthread+0xef/0x120
[  867.665467]  ? __pfx_kthread+0x10/0x10
[  867.665476]  ret_from_fork+0x44/0x70
[  867.665485]  ? __pfx_kthread+0x10/0x10
[  867.665494]  ret_from_fork_asm+0x1b/0x30
[  867.665507]  </TASK>
@ixhamza ixhamza added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jan 16, 2025
bwatkinson added a commit to bwatkinson/zfs that referenced this issue Jan 29, 2025
Originally openzfs#16856 updated Linux Direct I/O requests to use the new
pin_user_pages API. However, it was an oversight that this PR only
handled iov_iters of type ITER_IOVEC and ITER_UBUF. Other iov_iter
types may try to use the pin_user_pages API if it is available. This
can lead to panics as the iov_iter is not being iterated over correctly
in zfs_uio_pin_user_pages().

Unfortunately, generic iov_iter APIs that call pin_user_pages_fast() are
protected as GPL only. Rather than update zfs_uio_pin_user_pages() to
account for all iov_iter types, we can simply call
zfs_uio_get_dio_pages_iov_iter() if the iov_iter type is not ITER_IOVEC
or ITER_UBUF. zfs_uio_get_dio_pages_iov_iter() uses the
iov_iter_get_pages() calls, which can handle any iov_iter type.

In the future it might be worth using the exposed iov_iter iterator
functions that are included in the header iov_iter.h since v6.7. These
functions allow for any iov_iter type to be iterated over and advanced
while applying a step function during iteration. This could possibly be
leveraged in zfs_uio_pin_user_pages().

A new ZFS test case was added to test that an ITER_BVEC is handled
correctly using this new code path. This test case was provided through
issue openzfs#16956.

Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes openzfs#16956
bwatkinson added a commit to bwatkinson/zfs that referenced this issue Jan 30, 2025