Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a tunable to disable BRT support. #15529

Merged
merged 3 commits into from
Nov 16, 2023

Conversation

rincebrain
Copy link
Contributor

Motivation and Context

#15526 #15513 #15485 #15464

This feature is not stable and should not be enabled by default.
Separate from that, we should offer people an option to not eat their data if they took zpool status's advice and upgraded their pool to enable it.

I know Brian said in #14935 (comment) that he didn't want the additional complexity, but given that people are reporting "mv causes a kernel panic" and "compiling Go on ZFS creates corrupt binaries", we should probably stop the bleeding now and clean it up later.

Description

I grabbed the zfs_bclone_enabled patch from freebsd/freebsd-src@068913e4ba3dd, hand-applied it because it wouldn't apply cleanly, and then shoved similar stanzas into the Linux VFS code.

How Has This Been Tested?

Isn't that what the CI is for?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

man/man4/zfs.4 Outdated
@@ -1154,6 +1154,11 @@ Selecting any option other than
results in vector instructions
from the respective CPU instruction set being used.
.
.It Sy zfs_blake3_impl Ns = Ns Sy 0 Ns | Ns 1 Pq int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy-pasted, fwiw

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I'm not sure how I overlooked that. Thank you.

Copy the disable parameter that FreeBSD implemented, and extend it to
work on Linux as well, until we're sure this is stable.

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
@rincebrain
Copy link
Contributor Author

I'm expecting the test suite to fail on getting EOPNOTSUPP unexpectedly, maybe, but I figured debating whether this should default to enabled or disabled was mostly going to be the crux of this discussion, and that could start without having to modify tests to do that.

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we disable it, it will newer get ready. Lets work on issues rather than murdering features. I do not object the tunable, but I do the default. On FreeBSD main it was enabled later after the quoted commit.

@rincebrain
Copy link
Contributor Author

I don't particularly mind the default being 1 in master, but I would suggest it be 0 when I immediately request a cherrypick into 2.2, because at the moment, I've been advising people to avoid 2.2 in large part because of this.

I agree that murdering features by default is how you get untested codepaths and bitrot (hi hole_birth), but "mv/cp can cause kernel panics or corruption" is not a reasonable baseline of stable for 2.2 to "shake the bugs out", IMO.

@amotin
Copy link
Member

amotin commented Nov 15, 2023

I would suggest it be 0 when I immediately request a cherrypick into 2.2

I would not object it as a temporary measure for 2.2.1, but still would like to see active work on the solution to get it back in 2.2.2.

@tonyhutter
Copy link
Contributor

To suppress the block_cloning build failures (completely untested!):

diff --git a/tests/zfs-tests/include/tunables.cfg b/tests/zfs-tests/include/tunables.cfg
index fb861f1a2..4661bfa47 100644
--- a/tests/zfs-tests/include/tunables.cfg
+++ b/tests/zfs-tests/include/tunables.cfg
@@ -93,6 +93,7 @@ VOL_INHIBIT_DEV                       UNSUPPORTED                     zvol_inhibit_dev
 VOL_MODE                       vol.mode                        zvol_volmode
 VOL_RECURSIVE                  vol.recursive                   UNSUPPORTED
 VOL_USE_BLK_MQ                 UNSUPPORTED                     zvol_use_blk_mq
+BCLONE_ENABLED                 UNSUPPORTED                     zfs_bclone_enabled
 XATTR_COMPAT                   xattr_compat                    zfs_xattr_compat
 ZEVENT_LEN_MAX                 zevent.len_max                  zfs_zevent_len_max
 ZEVENT_RETAIN_MAX              zevent.retain_max               zfs_zevent_retain_max
diff --git a/tests/zfs-tests/tests/functional/block_cloning/cleanup.ksh b/tests/zfs-tests/tests/functional/block_cloning/cleanup.ksh
index 7ac13adb6..2453f126d 100755
--- a/tests/zfs-tests/tests/functional/block_cloning/cleanup.ksh
+++ b/tests/zfs-tests/tests/functional/block_cloning/cleanup.ksh
@@ -31,4 +31,8 @@ verify_runnable "global"
 
 default_cleanup_noexit
 
+if tunable_exists BCLONE_ENABLED ; then
+       log_must set_tunable32 BCLONE_ENABLED 0
+fi
+
 log_pass
diff --git a/tests/zfs-tests/tests/functional/block_cloning/setup.ksh b/tests/zfs-tests/tests/functional/block_cloning/setup.ksh
index 512f5a064..feec24ec2 100755
--- a/tests/zfs-tests/tests/functional/block_cloning/setup.ksh
+++ b/tests/zfs-tests/tests/functional/block_cloning/setup.ksh
@@ -33,4 +33,8 @@ fi
 
 verify_runnable "global"
 
+if tunable_exists BCLONE_ENABLED ; then
+    log_must set_tunable32 BCLONE_ENABLED 1
+fi
+
 log_pass

@amotin
Copy link
Member

amotin commented Nov 15, 2023

@tonyhutter The cleanup should somehow restore the original value, not disable it.

@behlendorf
Copy link
Contributor

I would not object it as a temporary measure for 2.2.1, but still would like to see active work on the solution to get it back in 2.2.2.

I agree, until these issues are resolved we should temporarily disable this feature in 2.2.1, but also target re-enabling it in 2.2.2.

@rincebrain
Copy link
Contributor Author

I don't know if I'd try to target it coming back in 2.2.2, it seems like it would merit a lot of testing to re-enable by default.

To be clear, this is not intended as a criticism of anyone involved's hard work - I know very well how hard this is as a problem, and have a lot of use cases for this feature personally, but I think we need to be very conservative about re-enabling it again after having this rocky an initial release.

@amotin
Copy link
Member

amotin commented Nov 15, 2023

@rincebrain How exactly do you see it have "a lot of testing" without being enabled? Its being enabled in master for months now until this ... storm.

@rincebrain
Copy link
Contributor Author

rincebrain commented Nov 15, 2023

Usually, one adds tests to the codebase with a new feature before including it in a stable release.

Presently, there's a few tests that Klara added months after the feature was integrated, that are all Linux-only as of this writing, and only exercise a limited number of things each, and as remarked above, FreeBSD had this feature disabled in its development releases until a few weeks ago, and it doesn't exercise the feature at all on FBSD stable trees because of the lack of any interface to doso. Linux also only triggers this codepath in very new systems, comparatively, by default (FICLONE/copy_file_range by default was coreutils 9.0, in September 2021, so you'd need Debian 12 (June 2023), Ubuntu 23.04 (April 2023), Fedora of some flavor, RHEL 9 (May 2022), or a rolling release distro to see this without explicit testing...)

So a lot of people who might be testing this on their "stable" systems would never have seen the feature tested in casual use, and gotten -EOPNOTSUPP if they tried it explicitly either from no support at all or from trying to use FICLONE to cross dataset boundaries.

@robn
Copy link
Member

robn commented Nov 15, 2023

For whatever its worth, I softly lean towards "disable indefinitely", but I hate it. All other things being equal, its better to not have the feature than to have it out there damaging data, and reputation along with it.

But, I don't know how we establish confidence in it without a hell of a lot of time spent, and I know that that time is not easy for any of us to come by. But at that this point we probably have no choice; the damage is done, now its about how we recover.

If we think we can spend that time before 2.2.2, great. From what I've seen when chasing down assertions (eg #15139), its very difficult to understand and make test cases for these extreme edge cases, so I'm not expecting it to be easy. So I have my doubts that it will be fixed quickly. But yes, we should try, and we should turn it off until then.

@oromenahar
Copy link
Contributor

@rincebrain How exactly do you see it have "a lot of testing" without being enabled? Its being enabled in master for months now until this ... storm.

@amotin yes the base code (like the brt.c) for all of this was in the main branch but the interface to the linux wasn't wired up. It's there for maybe 3 month now (#15050). I used a quick and fast implementation of myself a few month before #15050 landed. I don't think that many people tested it and used it for any testing environment.

I would suggest to disable this pretty fast. Don't like to disable it, but losing data and having people not upgrading isn't good as well. I don't upgrade any production servers right now, because of for example an open #15139 in the 2.2 release. Maybe we can leave it enabled on any 2.2.x-rcs? But I don't think it's a good idea. The final release will differ a lot when disabling it in a rc but not in final.

@amotin
Copy link
Member

amotin commented Nov 15, 2023

I am not arguing that. Lets disable it in 2.2.1 (assuming it really helps) and consider re-enabling later once we fix all we'll be able to find. I am only saying that disable in master would be counter-productive.

@oromenahar
Copy link
Contributor

oromenahar commented Nov 15, 2023

I am not arguing that. Lets disable it in 2.2.1 and consider re-enabling later once we fix all we'll be able to find. I am only saying that disable in master would be counter-productive.

totally understandable, for that reason I thought about disabling it in a final release but not in master or rcs.

@tonyhutter
Copy link
Contributor

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
@rincebrain
Copy link
Contributor Author

Thanks, I just pushed it with the minor change of adding the FreeBSD tunable version, since that's not unsupported there.

I'll push it again with a 1 default after the tests run clean, but making sure those pass seems like a good idea to do now, rather than when this should be a noop and then trying to get a cherrypick of it and fix things there...

Comment on lines 92 to 94
int zfs_bclone_enabled;
SYSCTL_INT(_vfs_zfs, OID_AUTO, bclone_enabled, CTLFLAG_RWTUN,
&zfs_bclone_enabled, 1, "Enable block cloning");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 6th argument is not a default value, it is used for constants reporting. It is a popular mistake. Please do this instead:

Suggested change
int zfs_bclone_enabled;
SYSCTL_INT(_vfs_zfs, OID_AUTO, bclone_enabled, CTLFLAG_RWTUN,
&zfs_bclone_enabled, 1, "Enable block cloning");
int zfs_bclone_enabled = 1;
SYSCTL_INT(_vfs_zfs, OID_AUTO, bclone_enabled, CTLFLAG_RWTUN,
&zfs_bclone_enabled, 0, "Enable block cloning");

SYSCTL_INT(_vfs_zfs, OID_AUTO, bclone_enabled, CTLFLAG_RWTUN,
&zfs_bclone_enabled, 0, "Enable block cloning");


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This matches current code in FreeBSD, except the extra empty line here.

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Nov 16, 2023
@behlendorf behlendorf removed the Status: Code Review Needed Ready for review and testing label Nov 16, 2023
@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Nov 16, 2023
@behlendorf behlendorf merged commit 03e9caa into openzfs:master Nov 16, 2023
20 of 26 checks passed
@behlendorf
Copy link
Contributor

Merged, so this can be cherry picked for 2.2.1 and BRT disabled by default in that branch until the open issues are resolved. The buildbot CI failures here were unrelated.

@tonyhutter
Copy link
Contributor

I just pulled this into 2.2.1 and then added a follow-up commit to set zfs_bclone_enabled = 0 by default on Linux/FreeBSD.

@amotin
Copy link
Member

amotin commented Nov 19, 2023

#15543 should hopefully fix #15513 .

ixhamza pushed a commit to truenas/zfs that referenced this pull request Nov 20, 2023
Copy the disable parameter that FreeBSD implemented, and extend it to
work on Linux as well, until we're sure this is stable.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#15529
ixhamza pushed a commit to truenas/zfs that referenced this pull request Nov 20, 2023
Disable block cloning by default to mitigate possible data corruption
(see openzfs#15529 and openzfs#15526).

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
@FL140
Copy link

FL140 commented Nov 22, 2023

All other things being equal, its better to not have the feature than to have it out there damaging data, and reputation along with it.

But, I don't know how we establish confidence in it without a hell of a lot of time spent, and I know that that time is not easy for any of us to come by. But at that this point we probably have no choice; the damage is done, now its about how we recover.

Being one of the persons hit by this badly that is exactly the point. I and a lot of people I know choose ZFS because of things like checksums and error detectability. So while I get that that this kind of bug can happen and it is nasty, testing is a very important part of software development and reading the comments above I get the feeling that was not done accordingly to the potential harm we see now.

As a long term user and advocate of ZFS reading statements as above that give the impression of "let's test it by rolling it out in big distros" really doesn't help in keeping the confidence that ZFS is the filesystem I am recommending to everyone I know (Which I did until today and hope I can also do in the future.)

So please take that as serious as it is and test the new version in depth before releasing it to a wider audience, corrupting real world systems as easily as with this bug (a simple cp which is NO corner case) is no joke and please help those who are affected like me by providing measurements to at least detect, as good as possible, which files are corrupted (as it seams this is not so simple).

@FL140
Copy link

FL140 commented Nov 22, 2023

@rincebrain 1st thank's for implementing this! As written above I am also hit by this bug and the question is how to do the best damage control now and don't corrupt the data further. Specifically:

  1. As I understand it the best option would be NOT to enable the feature@block_cloning on the pool in the first place which I did as recommended when upgrading my distro to Ubuntu 23.10.
  2. I can see the feature is enabled in the pool, but I can't disable it with zpool set feature@block_cloning=disabled POOLNAME. As it throws the error cannot set property for 'POOLNAME': property 'feature@block_cloning' can only be set to 'disabled' at creation time. If I understand it correctly this patch gives me the option to disable the feature now, is this correct?
  3. If so what is the best way to get this patch on my system (Ubuntu 23.10)? Waiting for a fix in Ubuntu is not an option as this also effects my productive system I am currently working on and I need to stop corruption ASAP. So can anyone please give instructions on what to do to stop this mess?
  4. No need for an exact howto, but things like which versions exactly to use and how to compile those so it can be used in the Linux distros (there are no binaries or did I miss something?) would be highly appreciated, thanks!

@rincebrain
Copy link
Contributor Author

The tunable doesn't let you disable the feature once it's enabled on the pool, it just means the code will act like it's disabled no matter what the pool setting is when the tunable is enabled.

Ask Ubuntu to incorporate it or build it yourself. There are, in fact, explanations of how to do this.

Frederick888 pushed a commit to Frederick888/zfs that referenced this pull request Nov 23, 2023
Copy the disable parameter that FreeBSD implemented, and extend it to
work on Linux as well, until we're sure this is stable.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#15529
Frederick888 pushed a commit to Frederick888/zfs that referenced this pull request Nov 23, 2023
Disable block cloning by default to mitigate possible data corruption
(see openzfs#15529 and openzfs#15526).

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
amarshall added a commit to amarshall/nixpkgs that referenced this pull request Nov 23, 2023
Due to openzfs/zfs#15533, we do not want to
update to 2.2.1, but we *do* want patches from 2.2.1 to address block
cloning issues (see openzfs/zfs#15529).

The first patch does not apply cleanly due to a trivial conflict in the
context, so vendor the patch.
@grahamperrin
Copy link
Contributor

#15529 (comment)

… the impression of "let's test it by rolling it out in big distros" …

#15529 (review)

… On FreeBSD main it was enabled later after the quoted commit.

Thanks, and for additional clarity:

grahamperrin@mowa219-gjp4-freebsd-14-zfs-vm:~ % pkg -vv | grep -e url -e enabled
    url             : "pkg+https://pkg.freebsd.org/FreeBSD:14:amd64/latest",
    enabled         : yes,
    url             : "pkg+https://pkg.freebsd.org/FreeBSD:14:amd64/base_release_0",
    enabled         : yes,
grahamperrin@mowa219-gjp4-freebsd-14-zfs-vm:~ % freebsd-version -kru
14.0-RELEASE
14.0-RELEASE
14.0-RELEASE
grahamperrin@mowa219-gjp4-freebsd-14-zfs-vm:~ % sysctl vfs.zfs.bclone_enabled
vfs.zfs.bclone_enabled: 0
grahamperrin@mowa219-gjp4-freebsd-14-zfs-vm:~ % sysctl -d vfs.zfs.bclone_enabled
vfs.zfs.bclone_enabled: Enable block cloning
grahamperrin@mowa219-gjp4-freebsd-14-zfs-vm:~ % 

MajesticFaucet pushed a commit to MajesticFaucet/zfs that referenced this pull request Nov 25, 2023
Copy the disable parameter that FreeBSD implemented, and extend it to
work on Linux as well, until we're sure this is stable.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#15529
MajesticFaucet pushed a commit to MajesticFaucet/zfs that referenced this pull request Nov 25, 2023
Disable block cloning by default to mitigate possible data corruption
(see openzfs#15529 and openzfs#15526).

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
Copy the disable parameter that FreeBSD implemented, and extend it to
work on Linux as well, until we're sure this is stable.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#15529
behlendorf added a commit to behlendorf/zfs that referenced this pull request Jan 19, 2024
If block cloning is disabled by default then enable it when running
the bclone tests.  Follow up to openzfs#15529.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
behlendorf added a commit to behlendorf/zfs that referenced this pull request Jan 19, 2024
If block cloning is disabled by default then enable it when running
the bclone tests.  Follow up to openzfs#15529.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
behlendorf added a commit that referenced this pull request Jan 23, 2024
If block cloning is disabled by default then enable it when running
the bclone tests.  Follow up to #15529.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15796
behlendorf added a commit that referenced this pull request Jan 23, 2024
If block cloning is disabled by default then enable it when running
the bclone tests.  Follow up to #15529.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15796
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Mar 13, 2024
If block cloning is disabled by default then enable it when running
the bclone tests.  Follow up to openzfs#15529.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#15796
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Mar 13, 2024
If block cloning is disabled by default then enable it when running
the bclone tests.  Follow up to openzfs#15529.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#15796
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants