
Block cloning conditionally destroy ARC buffer #16337

Merged
merged 1 commit into openzfs:master on Aug 2, 2024

Conversation

bwatkinson
Contributor

@bwatkinson bwatkinson commented Jul 10, 2024

dmu_buf_will_clone() calls arc_buf_destroy() if there is an ARC buffer associated with the dbuf. However, this can only be done conditionally. If the previous dirty record's dr_data is pointed at db_buf, then destroying it can lead to a NULL pointer dereference when syncing out the previous dirty record.

This updates dmu_buf_will_clone() to only call arc_buf_destroy() if the previous dirty record's dr_data is not pointing to db_buf. The block clone will still set the dbuf's db_buf and db_data to NULL, but this will not cause any issues as any previous dirty record's dr_data will still be pointing at the ARC buffer.

Updated dmu_buf_will_clone() to conditionally call arc_buf_destroy().

Motivation and Context

dmu_buf_will_clone() always called arc_buf_destroy() if there was an ARC buffer associated with the dbuf. However, this can lead to a NULL pointer dereference, which can occur when a previous dirty record is being synced and its dr_data is pointing at the ARC buffer also pointed to by db_buf.

Description

This updates dmu_buf_will_clone() to only call arc_buf_destroy() if the previous dirty record's dr_data is not pointing to db_buf. The block clone will still set the dbuf's db_buf and db_data to NULL, but this will not cause any issues as any previous dirty record's dr_data will still be pointing at the ARC buffer.

How Has This Been Tested?

Ran the ZTS tests with the bclone and block_cloning tags for 5 iterations without any issues.

Testing was done with kernel 4.18.0-408.el8.x86_64.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@bwatkinson bwatkinson mentioned this pull request Jul 10, 2024
17 tasks
Review comment on module/zfs/dbuf.c (outdated, resolved)
@bwatkinson bwatkinson force-pushed the bclone_arc_buf_destroy branch from 6a55f0c to 330b0e7 Compare July 11, 2024 01:19
@bwatkinson bwatkinson force-pushed the bclone_arc_buf_destroy branch from 330b0e7 to e463c2b Compare July 11, 2024 20:14
@bwatkinson bwatkinson requested a review from amotin July 11, 2024 20:16
Member

@amotin amotin left a comment


It still looks OK to me. I am not sure I like running a heavy I/O test for 3 minutes each time to test for one scenario though. Tests already take too much time. But I'll leave it to somebody who is closer to the tests area to comment. I wonder, since we know where the problem was, could we try to craft something more specific to trigger it?

@bwatkinson
Contributor Author

bwatkinson commented Jul 12, 2024

It still looks OK to me. I am not sure I like running a heavy I/O test for 3 minutes each time to test for one scenario though. Tests already take too much time. But I'll leave it to somebody who is closer to the tests area to comment. I wonder, since we know where the problem was, could we try to craft something more specific to trigger it?

No disagreement from me that running a test for 3 mins is not great... I have direct I/O tests that could also trigger this, but oftentimes it would take running them for 100+ iterations to get it to hit. It is hard to get things to trigger with timing issues like this. This is also why I asked @ixhamza for a test case; the one he supplied us was a good case that could easily duplicate the issue. We might be able to craft something for sure. I just wonder if we will get into the same dilemma of it being such a timing-dependent thing to trigger that we would still wind up having to run multiple iterations of it. Open to ideas though for a better reproducer if we can craft one.

@tonyhutter
Contributor

Looks like the new test case timed out on FreeBSD 13:

Test: /usr/local/share/zfs/zfs-tests/tests/functional/block_cloning/block_cloning_overwrites (run as root) [10:00] [KILLED]

@bwatkinson bwatkinson force-pushed the bclone_arc_buf_destroy branch from e463c2b to a7ba93c Compare July 18, 2024 20:35
@bwatkinson
Contributor Author

Looks like the new test case timed out on FreeBSD 13:

Test: /usr/local/share/zfs/zfs-tests/tests/functional/block_cloning/block_cloning_overwrites (run as root) [10:00] [KILLED]

Yeah, looking at the output, the test never made it past doing the first sync of the pool. I played around with this a bit in a FreeBSD VM with limited resources to mimic the CI runners. The vast majority of the time was spent just doing the initial dd command. I wound up tweaking the block_cloning_overwrites test a bit so it is less intense. I removed setting the dataset record sizes to 4k and reduced the overall file size down to 128M from the original 1G file that was written. This functionally should be exactly the same as the example @ixhamza provided us, but less CPU intensive on the CI runners. Let's see if this now works better with the test runs.

@tonyhutter
Contributor

Looking at the test results it seems like for every ~20 clonefiles/sync:

clonefile -f /testpool/testfs1/file1 /testpool/testfs2/file2

...there's 1 dd:

dd if=/dev/urandom of=/testpool/testfs2/file2 bs=1M count=128

is that ok?

dmu_buf_will_clone() calls arc_buf_destroy() if there is an ARC buffer
associated with the dbuf. However, this can only be done conditionally.
If the previous dirty record's dr_data is pointed at db_buf then
destroying it can lead to a NULL pointer dereference when syncing out
the previous dirty record.

This updates dmu_buf_will_clone() to only call arc_buf_destroy() if the
previous dirty record's dr_data is not pointing to db_buf. The block
clone will still set the dbuf's db_buf and db_data to NULL, but this
will not cause any issues as any previous dirty record's dr_data will
still be pointing at the ARC buffer.

Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
@bwatkinson bwatkinson force-pushed the bclone_arc_buf_destroy branch from a7ba93c to d3bb628 Compare July 26, 2024 19:18
@bwatkinson
Contributor Author

Looking at the test results it seems like for every ~20 clonefiles/sync:

clonefile -f /testpool/testfs1/file1 /testpool/testfs2/file2

...there's 1 dd:

dd if=/dev/urandom of=/testpool/testfs2/file2 bs=1M count=128

is that ok?

I did some local testing as well on a node I have, and I was seeing about the same thing. I just decided to remove the test case. It has been locally tested with the original test case by @ixhamza to show that, without this patch, the error occurs. We might be able to come up with a test case in the future that would be better at stressing this, but I don't think we should hold off merging a fix for a NULL pointer dereference based on a test case.

@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Aug 1, 2024
@behlendorf behlendorf merged commit c8184d7 into openzfs:master Aug 2, 2024
24 checks passed
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
dmu_buf_will_clone() calls arc_buf_destroy() if there is an associated
ARC buffer with the dbuf. However, this can only be done conditionally.
If the previous dirty record's dr_data is pointed at db_buf then
destroying it can lead to a NULL pointer dereference when syncing out
the previous dirty record.

This updates dmu_buf_will_clone() to only call arc_buf_destroy() if the
previous dirty record's dr_data is not pointing to db_buf. The block
clone will still set the dbuf's db_buf and db_data to NULL, but this
will not cause any issues as any previous dirty record's dr_data will
still be pointing at the ARC buffer.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes openzfs#16337