I/O error on mounting encrypt fs after upgrading #13709

xofyarg · 2022-07-30T22:51:55Z

System information

Type	Version/Name
Distribution Name	debian
Distribution Version	buster, bookworm
Kernel Version	upgrade from 5.10 to 5.18
Architecture	amd64
OpenZFS Version	upgrade from 2.0.1 to 2.1.5

Describe the problem you're observing

After upgrading the ZFS software, without upgrading the pool, trying to mount an encrypted fs gives an I/O error. Downgrading ZFS and running a scrub cleared the error. Not sure if this is expected and users are supposed to upgrade the pool first.

# zpool import tank
# zpool status
  pool: tank
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
	The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(7) for details.
...
errors: No known data errors

# zfs load-key tank/backup
Enter passphrase for 'tank/backup':

# zfs mount tank/backup/root
cannot mount 'tank/backup/root': Input/output error


# zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
...
errors: Permanent errors have been detected in the following files:

        tank/backup/root:<0x0>

rincebrain · 2022-07-31T06:02:37Z

Going forward should only require explicitly enabling new things in order to use existing functionality in the event of show-stopping bugs, AFAIK, and none of those should really apply here.

I'm not quite sure what you'd be hitting here - there have been a few fixes to encryption bugs since 2.0.1, but none of them should (AFAIK) reduce the number of things that can be unlocked, and I suspect the existing issue is still there if you go back to 2.1.5, the error just got removed from the log because it was a decryption error and scrub won't find those. (You could confirm it's a decryption error by looking in /proc/spl/kstat/kcf/NONAME_provider_stats after attempting it, I believe...)

I'd be curious, if you do go back to 2.1.5 and it goes boom again, whether you could try doing echo 1 | sudo tee /sys/module/zfs/parameters/zfs_flags, attempt the mount again, and then look in /proc/spl/kstat/zfs/dbgmsg and see what the last dozen or two entries are. (And if none of those are new with the unlock attempt or relevant, you could try echo 512 | ..., but that's way more loquacious.)

xofyarg · 2022-07-31T08:29:06Z

Going forward should only require explicitly enabling new things in order to use existing functionality in the event of show-stopping bugs, AFAIK, and none of those should really apply here.

Agreed. The new feature doesn't seem to be correlated:

# zpool upgrade
...
POOL  FEATURE
---------------
tank
      draid

Here is the output from the commands you suggested above; some duplicated entries were trimmed from the verbose dbgmsg. Let me know if you need more.

# echo 1 > /sys/module/zfs/parameters/zfs_flags

# zfs mount -o ro tank/backup/root
cannot mount 'tank/backup/root': Input/output error

# cat /proc/spl/kstat/kcf/NONAME_provider_stats
2 1 0x01 4 1088 28793720433 353409269345
name                            type data
kcf_ops_total                   4    10
kcf_ops_passed                  4    10
kcf_ops_failed                  4    0
kcf_ops_returned_busy           4    0

# tail /proc/spl/kstat/zfs/dbgmsg
...
1659254370   spa_history.c:294:spa_history_log_sync(): command: zpool import tank
1659254519   spa_history.c:330:spa_history_log_sync(): ioctl load-key
1659254519   spa_history.c:294:spa_history_log_sync(): command: zfs load-key tank/backup

# echo 1 > /sys/module/zfs/parameters/zfs_flags
# zfs mount -o ro tank/backup/root
cannot mount 'tank/backup/root': Input/output error

# cat /proc/spl/kstat/zfs/dbgmsg
...
1659254703   zap.c:769:fzap_checksize(): error 22
1659254706   zap_leaf.c:508:zap_entry_read_name(): error 75
1659254706   zap_micro.c:982:zap_lookup_impl(): error 2
1659254706   zap_micro.c:985:zap_lookup_impl(): error 75


# echo 512 > /sys/module/zfs/parameters/zfs_flags
# zfs mount -o ro tank/backup/root; cat /proc/spl/kstat/zfs/dbgmsg
1659254703   zap.c:769:fzap_checksize(): error 22
1659254706   zap_leaf.c:508:zap_entry_read_name(): error 75
1659254706   zap_micro.c:982:zap_lookup_impl(): error 2
1659254706   zap_micro.c:985:zap_lookup_impl(): error 75
1659254706   zap_micro.c:1145:zap_length(): error 2
1659254706   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1659254706   zap_leaf.c:424:zap_leaf_lookup(): error 2
1659254706   zap_leaf.c:508:zap_entry_read_name(): error 75
1659254706   zap_micro.c:1610:zap_cursor_retrieve(): error 2
1659254706   zap_leaf.c:487:zap_entry_read(): error 75
1659254706   zap_leaf.c:468:zap_leaf_lookup_closest(): error 2
1659254706   zap_micro.c:1610:zap_cursor_retrieve(): error 2
1659254706   dsl_prop.c:55:dodefault(): error 2
1659254706   dsl_dataset.c:796:dsl_dataset_hold_flags(): error 2
1659254706   vdev_removal.c:2342:spa_removal_get_stats(): error 2
1659254706   spa_checkpoint.c:167:spa_checkpoint_get_stats(): error 1026
1659254706   zap_micro.c:982:zap_lookup_impl(): error 2
1659254706   zfeature.c:239:feature_get_refcount(): error 95
1659254706   zap_micro.c:1610:zap_cursor_retrieve(): error 2
1659254706   zap_leaf.c:487:zap_entry_read(): error 75
1659254706   zap_leaf.c:468:zap_leaf_lookup_closest(): error 2
1659254706   zap_micro.c:1610:zap_cursor_retrieve(): error 2
1659254706   dsl_crypt.c:619:spa_keystore_dsl_key_hold_impl(): error 2
1659254706   dsl_crypt.c:2700:spa_do_crypt_objset_mac_abd(): error 52
1659254706   arc.c:2177:arc_untransform(): error 5
1659254708   zap.c:769:fzap_checksize(): error 22
1659254708   zap_leaf.c:508:zap_entry_read_name(): error 75
1659254708   dnode.c:1486:dnode_hold_impl(): error 28
1659254708   zap_leaf.c:508:zap_entry_read_name(): error 75
1659254708   dbuf.c:2953:dbuf_findbp(): error 2
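
The numbers in the dbgmsg output above are plain errno values, so a small filter can make the interesting entries easier to spot. The following is only a sketch: the function name `summarize_dbgmsg_errors` is made up for illustration, the table covers only the Linux errno values that appear in this log, and the note that 52 (EBADE) is what OpenZFS on Linux uses for ECKSUM reflects my understanding rather than anything stated in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read dbgmsg text on stdin and print "function errno-name"
# for every line ending in "error N". Unknown numbers are printed as-is.
summarize_dbgmsg_errors() {
    awk '
        BEGIN {
            # Linux errno values only; not an exhaustive table.
            name[2]  = "ENOENT"
            name[5]  = "EIO"
            name[22] = "EINVAL"
            name[28] = "ENOSPC"
            name[52] = "EBADE/ECKSUM"   # assumption: ECKSUM maps to EBADE on Linux
            name[75] = "EOVERFLOW"
            name[95] = "EOPNOTSUPP"
        }
        # Expected shape: 1659254706   arc.c:2177:arc_untransform(): error 5
        NF >= 3 && $(NF-1) == "error" && $NF ~ /^[0-9]+$/ {
            split($2, loc, ":")          # loc[3] is the function name
            label = ($NF in name) ? name[$NF] : $NF
            print loc[3], label
        }
    '
}
```

Run as `summarize_dbgmsg_errors < /proc/spl/kstat/zfs/dbgmsg`, this would turn the `dsl_crypt.c` and `arc.c` lines above into `spa_do_crypt_objset_mac_abd() EBADE/ECKSUM` followed by `arc_untransform() EIO`, which is the MAC check failing and then surfacing as the I/O error seen at mount time.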

rincebrain · 2022-07-31T08:51:58Z

What does "zfs get keystatus tank/backup/root" say? ...oh, actually, I have an idea now. If you do a zfs send -w of tank/backup/root on 2.1.5 and recv it at tank/backup/root2 or something, does that one mount? If so...let me see if I can find the commit... #12981 e257bd4 But I'm surprised if that mounts pre-fix and not post...hm.

danig-1 · 2022-07-31T13:57:26Z

After upgrading from Ubuntu Server 20.04 LTS to 22.04 LTS, same problem happened here.

Solution:

Send and receive:
zfs send --raw -v pool/original-encrypted@timestamp | zfs receive -v pool/new-encrypted
Destroy old filesystem with snapshots:
zfs destroy -r pool/original-encrypted
Rename new to old, so that it will be found at the old location:
zfs rename pool/new-encrypted pool/original-encrypted
Mount:
zfs mount -l pool/original-encrypted
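
The steps above can be wrapped in a small function. This is a sketch under assumptions, not a tested procedure: the function name `resend_dataset` and the `DRY_RUN` switch are made up for illustration, and unlike the list above it tries to mount the new copy before destroying the original, which is my own reordering for safety.

```shell
#!/bin/sh
# Hypothetical wrapper around the raw send/receive swap.
# With DRY_RUN set to a non-empty value it only prints the commands.
resend_dataset() {
    src=$1     # e.g. pool/original-encrypted
    snap=$2    # name of an existing snapshot of $src, without the @
    tmp=$3     # e.g. pool/new-encrypted (must not exist yet)

    if [ -n "${DRY_RUN:-}" ]; then
        echo "zfs send --raw -v $src@$snap | zfs receive -v $tmp"
        echo "zfs mount -l $tmp"
        echo "zfs destroy -r $src"
        echo "zfs rename $tmp $src"
        return 0
    fi

    zfs send --raw -v "$src@$snap" | zfs receive -v "$tmp" || return 1
    # Only destroy the original once the received copy actually mounts.
    zfs mount -l "$tmp" || return 1
    zfs destroy -r "$src" || return 1
    zfs rename "$tmp" "$src"
}
```

Setting `DRY_RUN=1` and calling `resend_dataset pool/original-encrypted timestamp pool/new-encrypted` prints the four commands so they can be reviewed first. Anything written to the original after the snapshot was taken is not carried over, and the pool needs enough free space for a second copy.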

xofyarg · 2022-07-31T17:36:53Z

What does "zfs get keystatus tank/backup/root" say?

The key was available.

If you do a zfs send -w ... does that one mount

That one mounts, even though it was created after mounting the source failed (the same as @danig-1 mentioned above). Unfortunately I'm not able to use this workaround at the moment because I lack the storage to back up the backup. I wonder whether zpool upgrade tank would simply work in this case without any side effects. Have you tried it, @danig-1?

rincebrain · 2022-07-31T17:41:59Z

zpool upgrade won't save you, no. I keep meaning to try writing a zhack subcommand to trigger this recalculation without an outright send/recv requirement, but haven't had enough need to do it yet. It shouldn't be hard...

danig-1 · 2022-07-31T18:03:44Z

No, I have not tried that.

xofyarg · 2022-07-31T22:42:36Z

Regarding the linked issue, I don't remember ever sending the dataset raw before (those were sent incrementally, unencrypted). But simply creating an encrypted filesystem on 2.0.1 and then mounting it on 2.1.5 doesn't reproduce the problem I have.

Actually, for the above workaround, I don't have to raw send the filesystem, zfs send tank/backup/root | zfs recv ... just works. Maybe this has a slightly different root cause.

rincebrain · 2022-07-31T23:08:41Z

I mean, if the destination isn't encrypted, then it won't encounter a bug with decrypting it on mount, no.

xofyarg · 2022-07-31T23:14:46Z

Sorry, I didn't mention the destination was under the same encryptionroot, so the created filesystem was encrypted.

rincebrain · 2022-07-31T23:30:42Z

Was it received under 2.0.x, or 2.1.5?

Because 2.1.x shouldn't result in a filesystem that has this problem even if you encrypt on receive, no.

xofyarg · 2022-08-01T00:31:02Z

It was 2.1.5. IIUC, the filesystem created by the old version (under some weird conditions) is somehow incompatible with the current version. And the issue only affects "mounting"; the data can still be retrieved (after loading the key) and sent out. Not sure if it's related to the user accounting flag, as I don't have any quota enabled. So unless there's a hacky way to mangle the filesystem internally, creating a new one would be the only solution.

rincebrain · 2022-08-01T00:45:01Z

The specific problem is, IIRC, that certain metadata (the accounting metadata) got generated without a MAC on it. So things before this problem was fixed shrug and don't care, and things after it was fixed go "oh damn there's no MAC" and you get a decryption error. But since send/recv doesn't include the accounting metadata (it's generated on recv for a variety of reasons), you can just do a send|recv and it'll recalculate on recv and move on with life. I don't think it would be too troublesome to implement either a debug command to force it to recalculate Now or to fall back to doing so on failure to decrypt this specific data, it's just nobody has done so.

xofyarg · 2022-08-01T20:12:02Z

Quick update: ran into an issue likely related to #11679. Otherwise, the newly created filesystems by send | recv look good.

clhedrick · 2022-08-04T19:16:05Z

I'm seeing the same issue. I just updated from Ubuntu 20 to 22, and I get Input Output error trying to mount encrypted file systems. zpool status shows lots of files with errors but not disk problems.

Fortunately I had mostly moved away from encryption after a previous disaster. So I don't absolutely have to have these file systems. But it would be useful.

My root is encrypted. Fortunately it mounts. Some of the things under it that don't mount inherit keys from the root. Some have their own. All keys are shown as available. There's nothing interesting in syslog or kern.log.

Should I destroy the file systems? Or would that be more dangerous? I looked at the other entries here, and I don't see any suggestions of a way to recover. Many of these file systems don't have quotas, and weren't received by zfs receive. They were created locally using rsync. I'm reluctant to do zfs send | receive on them, because that has its own problems. We lost a pool on another server while doing a zfs send from it.

rincebrain · 2022-08-04T19:26:12Z

It's very simple to recover, you just zfs send -w | zfs recv, and the result will work.

And you don't have to have set a quota for it to be calculating the quota metadata needed to enforce one.

...how, exactly, did you lose a pool from using zfs send?

clhedrick · 2022-08-05T14:40:43Z

Here's how we lost a pool due to zfs send: #13703. I assume that was #11679, though I was asked to report it separately.

The problem with recovering using zfs send -w | receive is that it risks triggering the same problem. I really, really don't want to have to restore or recreate 300 TB of data. (The problem isn't the number of bytes, but the fact that it's a billion files.)

I deleted all the encrypted file systems, except one. Unfortunately the root file system is encrypted. I don't know of any way to replace it. It was the one encrypted file system that we could mount. However at the moment I'm mounting everything under it but not the root. I've gotten paranoid about encryption.

rincebrain · 2022-08-05T15:30:49Z

You cannot replace the rootfs of a pool, no, that's one reason it's commonly suggested not to use it for storing anything, just as a source for property inheritance.

As Brian said in 13703, though, now OpenZFS should just burp and note an error on encountering the problem you saw, not hard panic. (I've also not seen, to my recollection, any encryption bugs that could cause it to incorrectly set the checksum algo there, so that might be unrelated to encryption? Unclear.)

So if you're running a newer version, at least that failure mode shouldn't happen.

I haven't tested this, and I don't know that I'll have time to do the experiment for a while, but as I said above, I think you could probably theoretically do something like "if we fail at decoding this specific MAC on this special case, throw it out and trigger recalculating" or "fall back to decoding via the incorrect prior encoding method", possibly guarded by a flag.

Whether you would trust a not very aggressively tested patch to do something like that over send/recv is an open and complicated question to consider, though.

clhedrick · 2022-08-05T15:34:11Z

I didn't use the root for anything. It's not currently mounted. But it was encrypted so that everything would inherit the encryption. I can't unencrypt it even though it's empty.

The system I'm reporting this for is different from the one where I reported #13703. I am running a current version of ZFS on it. I'm going to upgrade the rest of our systems later this month.

mprasil · 2022-08-27T22:57:01Z

I have encountered a similar issue. I moved my drives from an Ubuntu 20.04 system to an Ubuntu 22.04 system. On the newer system some filesystems in the pool didn't mount and showed as corrupted:

errors: Permanent errors have been detected in the following files:

        data/enc/users:<0x0>
        data/enc/system:<0x0>

I stopped right there and moved the drives back. The zpool status -v still shows the same filesystems as corrupted, (assuming these errors are somewhere in the log?) but they are mounting okay and as far as I can tell there aren't any corrupted files. I'm running scrub at the moment to know for sure. (Edit: Scrub finished, repaired 0B and errors disappeared, which seems to confirm that there were no actual errors)

All of the failing filesystems are encrypted. But I also have some other filesystems encrypted with the same key in the same pool that mounted on the new system without any problems.
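
To see at a glance which datasets fall into the failing group, the property listing can be filtered. A minimal sketch, assuming the tab-separated output of `zfs list -H -o name,encryption,keystatus,mounted`; the function name `unmounted_encrypted` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical filter: reads "name<TAB>encryption<TAB>keystatus<TAB>mounted"
# rows on stdin and prints the names of datasets that are encrypted, have
# their key loaded, and are still not mounted.
unmounted_encrypted() {
    awk -F '\t' '$2 != "off" && $3 == "available" && $4 == "no" { print $1 }'
}
```

Usage would be along the lines of `zfs list -H -t filesystem -o name,encryption,keystatus,mounted -r data | unmounted_encrypted`. Datasets that are deliberately left unmounted (for example with canmount=off) would show up too, so the result is a list of candidates rather than a diagnosis.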

#13763 seems to be also related. There are some folks mentioning upgrade from 20.04, which makes me wonder if that version doesn't have some bug around encryption.

#11688 also seems similar and shows that some people saw this bug disappear with ZFS upgrade to v2.1.4

My versions are as follows:

# Ubuntu 20.04:
zfs-0.8.3-1ubuntu12.14
zfs-kmod-0.8.3-1ubuntu12.14

# Ubuntu 22.04:
zfs-2.1.4-0ubuntu0.1
zfs-kmod-2.1.4-0ubuntu0.1

So using the v2.1.4 didn't help me.

jonryk · 2022-08-30T15:58:54Z

@rincebrain - Cheers! I can confirm that (at least for me, and for the one dataset I tested) your suggestion of receiving the dataset under a different name seemed to work fine, but my pool doesn't have enough room to hold duplicates of all the datasets... - Do you have a suggestion for how to trigger this recalculation without retransmitting / duplicating a huge dataset in my pool?

rincebrain · 2022-08-30T16:57:46Z

I can try to work up something to forcibly trigger it, but I'm wary of sharing minimally tested patches for encryption issues after a few rounds of attempts...so I guess I'd say, assuming I do find time to try working something up in the next week or so, use it on a dataset that you did have room to make a duplicate of, just in case?

Or, if the idea of testing novel patches does not appeal, then no, I have no suggestions.

mprasil · 2022-08-30T17:38:22Z

@jonryk this might not work in all cases, but perhaps you can create a new, empty dataset and move the files between them? Something like rsync -ax --remove-source-files /old/dataset/ /new/dataset/ should do the trick.

That way one dataset will be gradually bigger as the files are moved while the other will be gradually smaller. This should not require much more space in the pool. (assuming no snapshots referencing the moved files in the old dataset) When done, you can zfs set the mountpoint of the datasets so that the files are mounted on the same path if that matters to you.

Obviously the downside is, that the files will be split between two datasets while this is in progress, so whatever services are running on top of this might need to be stopped.

jonryk · 2022-08-30T23:50:38Z

Thanks a lot, @rincebrain! One more "trick" I discovered: I was able to revive / recalculate a couple of datasets for which I had previously backed up a recent incremental snapshot.
What I did was to first zfs destroy the snapshot in question, and then zfs recv it back from the file (using the -F option) - seemed to work for some datasets.
For the biggest dataset, however the "trick" didn't work (at least not so far), although that same snapshot-file was successfully received by an offsite, emergency backup, "old version" zpool - sitting on a very remote server, with a very limited network connection. When I attempted to import on the updated, local server, however, I got the following:
cannot receive incremental stream: incremental send stream requires -L (--large-block), to match previous receive.
(I guess my remote server is NOT set up with large blocks.)

jonryk · 2022-08-31T00:26:01Z

@mprasil: Thanks, but since in my case the datasets cannot be opened, and the files within them cannot be accessed, your suggestion is not applicable. The only solution so far, in my case, seems to be to trigger a "recalculation", as suggested by @rincebrain, which (until now) only seems possible through a zfs recv. I have successfully "revived" a couple of the datasets this way so far, but the biggest dataset poses a challenge...

rincebrain · 2022-08-31T03:56:25Z

I suppose you could see if a "zfs rollback [latest snapshot]", or "zfs clone [latest snapshot] some/where", or the like might be a terrible workaround - or taking a new snapshot, then a second new snapshot, then do a send -wi newsnap1 newsnap2 to a file, then trigger a rollback from newsnap2 to newsnap1, and if that doesn't recalculate, receive the noop send you wrote to a file earlier...

jonryk · 2022-08-31T07:41:29Z

HAHA - fantastic, @rincebrain - my HERO!
("zfs rollback" or "zfs clone" didn't help, seems a zfs recv must be applied to force the recalculation.)

The working solution was the last one you suggested:

Make two new snapshots on the dataset
zfs send the incremental to a file
Roll back the dataset to the first of the two snaps
Receive the incremental from the file to the dataset

Voila - the datasets can all be mounted and all seems OK - Much obliged!

(Strange thing is I thought I tried this before, but that the zfs send failed - perhaps I didn't use the "-w" (RAW) option? - This time I ran zfs send with the "-Lwi" options, to make absolutely sure, although I guess the "-wi" options would suffice, and it all worked like a charm!)
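
The four steps above can be sketched as a shell function. This is an illustration, not a vetted tool: the function name `recalc_via_noop_recv`, the snapshot names `recalc1`/`recalc2`, and the `DRY_RUN` switch are all made up here, and the sketch uses `-w -i` only (the run described above also passed `-L`).

```shell
#!/bin/sh
# Hypothetical helper for the snapshot / raw incremental send / rollback /
# receive sequence. With DRY_RUN set to a non-empty value it only prints
# the commands instead of running them.
recalc_via_noop_recv() {
    ds=$1        # affected dataset, e.g. tank/backup/root
    stream=$2    # scratch file that will hold the small incremental stream
    first="$ds@recalc1"
    second="$ds@recalc2"

    if [ -n "${DRY_RUN:-}" ]; then
        echo "zfs snapshot $first"
        echo "zfs snapshot $second"
        echo "zfs send -w -i $first $second > $stream"
        echo "zfs rollback -r $first"
        echo "zfs receive $ds < $stream"
        return 0
    fi

    zfs snapshot "$first" || return 1
    zfs snapshot "$second" || return 1
    # Raw incremental between two back-to-back snapshots: close to a no-op.
    zfs send -w -i "$first" "$second" > "$stream" || return 1
    # -r is needed because the second snapshot is newer than the target.
    zfs rollback -r "$first" || return 1
    zfs receive "$ds" < "$stream"
}
```

Because the two snapshots are taken back to back, the stream file stays small, which is what makes this usable on pools without room for a full duplicate. Running it with `DRY_RUN=1` first shows exactly what would be executed.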

In openzfs#13709, as in openzfs#11294 before it, it turns out that 63a2645 still had the same failure mode as when it was first landed as d1d4769, and fails to unlock certain datasets that formerly worked.

Rather than reverting it again, let's add handling to just throw out the accounting metadata that failed to unlock when that happens, as well as a test with a pre-broken pool image to ensure that we never get bitten by this again.

Fixes: openzfs#13709
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com> Closes #14163 commit b1eec00904a22bd6600a2853709ca50faa56ea24 Author: Richard Yao <richard.yao@alumni.stonybrook.edu> Date: Thu Nov 10 09:09:35 2022 -0500 Cleanup: Suppress Coverity dereference before/after NULL check reports f224eddf922a33ca4b86d83148e9e6fa155fc290 began dereferencing a NULL checked pointer in zpl_vap_init(), which made Coverity complain because either the dereference is unsafe or the NULL check is unnecessary. Upon inspection, this pointer is guaranteed to never be NULL because it is from the Linux kernel VFS. The calls into ZFS simply would not make sense if this pointer were NULL, so the NULL check is unnecessary. Reported-by: Coverity (CID 1527260) Reported-by: Coverity (CID 1527262) Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Reviewed-by: Youzhong Yang <yyang@mathworks.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14170 commit 9e2be2dfbde6c41ff53d71f3669cb6b9909c5a40 Author: Richard Yao <richard.yao@alumni.stonybrook.edu> Date: Thu Nov 10 09:01:58 2022 -0500 Fix potential NULL pointer dereference regression 945b407486a0072ec7dd117a0bde2f72d52eb445 neglected to `NULL` check `tx->tx_objset`, which is already done in the function. This upset Coverity, which complained about a "dereference after null check". Upon inspection, it was found that whenever `dmu_tx_create_dd()` is called followed by `dmu_tx_assign()`, such as in `dsl_sync_task_common()`, `tx->tx_objset` will be `NULL`. 
Reported-by: Coverity (CID 1527261) Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Reviewed-by: Youzhong Yang <yyang@mathworks.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14170 commit 16f0fdadddcc7562ddf475f496a434b9ac94b0f7 Author: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Date: Thu Nov 10 22:37:12 2022 +0100 Allow to control failfast Linux defaults to setting "failfast" on BIOs, so that the OS will not retry IOs that fail, and instead report the error to ZFS. In some cases, such as errors reported by the HBA driver, not the device itself, we would wish to retry rather than generating vdev errors in ZFS. This new property allows that. This introduces a per vdev option to disable the failfast option. This also introduces a global module parameter to define the failfast mask value. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Sponsored-by: Seagate Technology LLC Submitted-by: Klara, Inc. Closes #14056 commit 945b407486a0072ec7dd117a0bde2f72d52eb445 Author: Mariusz Zaborski <oshogbo@vexillium.org> Date: Tue Nov 8 21:40:22 2022 +0100 quota: disable quota check for ZVOL The quota for ZVOLs is set to the size of the volume. When the quota reaches the maximum, there isn't an excellent way to check if the new writers are overwriting the data or if they are inserting a new one. Because of that, when we reach the maximum quota, we wait till txg is flushed. This is causing a significant fluctuation in bandwidth. In the case of ZVOL, the quota is enforced by the volsize, so we can omit it. This commit adds a sysctl that controls whether the quota mechanism should be enforced or not. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Sponsored-by: Zededa Inc. 
Sponsored-by: Klara Inc. Closes #13838 commit e197bb24f1857c823b44c2175b2318c472d79731 Author: Alan Somers <asomers@gmail.com> Date: Tue Nov 8 13:38:08 2022 -0700 Optionally skip zil_close during zvol_create_minor_impl If there were no zil entries to replay, skip zil_close. zil_close waits for a transaction to sync. That can take several seconds, for example during pool import of a resilvering pool. Skipping zil_close can cut the time for "zpool import" from 2 hours to 45 seconds on a resilvering pool with a thousand zvols. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Sponsored-by: Axcient Closes #13999 Closes #14015 commit f224eddf922a33ca4b86d83148e9e6fa155fc290 Author: youzhongyang <youzhong@gmail.com> Date: Tue Nov 8 13:28:56 2022 -0500 Support idmapped mount in user namespace Linux 5.17 commit torvalds/linux@5dfbfe71e enables "the idmapping infrastructure to support idmapped mounts of filesystems mounted with an idmapping". Update the OpenZFS accordingly to improve the idmapped mount support. This pull request contains the following changes: - xattr setter functions are fixed to take mnt_ns argument. Without this, cp -p would fail for an idmapped mount in a user namespace. - idmap_util is enhanced/fixed for its use in a user ns context. - One test case added to test idmapped mount in a user ns. Reviewed-by: Christian Brauner <christian@brauner.io> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Youzhong Yang <yyang@mathworks.com> Closes #14097 commit 109731cd73c56c378b4c71732b9b9d3504a7a7e1 Author: Damian Szuberski <szuberskidamian@gmail.com> Date: Wed Nov 9 04:16:01 2022 +1000 dsl_prop_known_index(): check for invalid prop Resolve UBSAN array-index-out-of-bounds error in zprop_desc_t. 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes #14142 Closes #14147 commit 41715771b5de07cbfcb1f7b75f324e824dfa1728 Author: Mohamed Tawfik <m_tawfik@aucegypt.edu> Date: Tue Nov 8 20:08:21 2022 +0200 Adds the `-p` option to `zfs holds` This allows for printing a machine-readable, accurate to the second, hold creation time in the form of a unix epoch timestamp. Additionally, updates relevant documentation and man pages accordingly. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mohamed Tawfik <m_tawfik@aucegypt.edu> Closes #13690 Closes #14152 commit ecbf02791f921b39594719ea103ae66ed2fce095 Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Fri Oct 28 00:55:45 2022 +0100 freebsd: simplify MD isa_defs.h Most of this file was a pile of defines, apparently from Solaris that controlled nothing in the source tree. A few things controlled the definition of unused types or macros which I have removed. Considerable further cleanup is possible including removal of architectures FreeBSD never supported. This file should likely converge with the Linux version to the extent possible. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14127 commit e3ba8eb12ef80a102a3f208a5a8d43eee3d21931 Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Fri Oct 28 00:41:53 2022 +0100 freebsd: trim dkio.h to the minimum Only DKIOCFLUSHWRITECACHE is required. 
Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14127 commit 20b867f5f716fedab675f5eac395e7e1ea54572d Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Thu Oct 27 22:45:44 2022 +0100 freebsd: add ifdefs around legacy ioctl support Require that ZFS_LEGACY_SUPPORT be defined for legacy ioctl support to be built. For now, define it in zfs_ioctl_compat.h so support is always built. This will allow systems that need never support pre-openzfs tools a mechanism to remove support at build time. This code should be removed once the need for tool compatibility is gone. No functional change at this time. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14127 commit 6c89cffc2cccbca82314bf276d31512f9dc4f6ec Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Thu Oct 27 22:28:55 2022 +0100 freebsd: remove no-op vn_renamepath() vn_renamepath() is a Solaris-ism that was defined away in the FreeBSD port. Now that the only use is in the FreeBSD zfs_vnops_os.c, drop it entirely. 
Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14127 commit 270b1b5fa75adc54d5af5794a885d05120f83640 Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Thu Oct 27 22:24:42 2022 +0100 freebsd: remove unused vn_rename() Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14127 commit c23738c70eb86a7f04f93292caef2ed977047608 Author: Ameer Hamza <106930537+ixhamza@users.noreply.github.com> Date: Fri Nov 4 23:33:47 2022 +0500 zed: Prevent special vdev to be replaced by hot spare Special vdevs should not be replaced by a hot spare. Log vdevs already support this, extending the functionality for special vdevs. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14129 commit 73b8f700b68dc1c537781b2bee0f06c2b6d09418 Author: Alexander Lobakin <alobakin@pm.me> Date: Sun Oct 16 23:41:39 2022 +0200 icp: fix all !ENDBR objtool warnings in x86 Asm code Currently, only Blake3 x86 Asm code has signs of being ENDBR-aware. At least, under certain conditions it includes some header file and uses some custom macro from there. Linux has its own NOENDBR since several releases ago. It's defined in the same <asm/linkage.h>, so currently <sys/asm_linkage.h> already is provided with it. Let's unify those two into one %ENDBR macro. At first, check if it's present already. If so -- use Linux kernel version. Otherwise, try to go that second way and use %_CET_ENDBR from <cet.h> if available. 
If not, fall back to just an empty definition. This fixes a couple more 'relocations to !ENDBR' across the module. And now that we always have the latest/actual ENDBR definition, use it at the entrance of the few corresponding functions that objtool still complains about. This matches the way how it's used in the upstream x86 core Asm code. Reviewed-by: Attila Fülöp <attila@fueloep.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Closes #14035 commit 61cca6fa0506d41e5c794b293bedd982265fc1b2 Author: Alexander Lobakin <alobakin@pm.me> Date: Sun Oct 16 23:23:44 2022 +0200 icp: fix rodata being marked as text in x86 Asm code objtool properly complains that it can't decode some of the instructions from ICP x86 Asm code. As mentioned in the Makefile, where those object files were excluded from objtool check (but they can still be visible under IBT and LTO), those are just constants, not code. In that case, they must be placed in .rodata, so they won't be marked as "allocatable, executable" (ax) in ELF headers and this effectively prevents objtool from trying to decode this data. That reveals a whole bunch of other issues in ICP Asm code, as previously objtool was bailing out after that warning message. 
Reviewed-by: Attila Fülöp <attila@fueloep.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Closes #14035 commit b844489ec0e35b0a9b3cda5ba72bf29334f81081 Author: Alexander Lobakin <alobakin@pm.me> Date: Sun Oct 16 16:53:22 2022 +0200 icp: properly fix all RETs in x86_64 Asm code Commit 43569ee37420 ("Fix objtool: missing int3 after ret warning") addressed replacing all `ret`s in x86 asm code to a macro in the Linux kernel in order to enable SLS. That was done by copying the upstream macro definitions and fixed objtool complaints. Since then, several more mitigations were introduced, including Rethunk. It requires to have a jump to one of the thunks in order to work, so the RET macro was changed again. And, as ZFS code didn't use the mainline definition, but copied it, this is currently missing. Objtool reminds about it from time to time (Clang 16, CONFIG_RETHUNK=y): fs/zfs/lua/zlua.o: warning: objtool: setjmp+0x25: 'naked' return found in RETHUNK build fs/zfs/lua/zlua.o: warning: objtool: longjmp+0x27: 'naked' return found in RETHUNK build Do it the following way: * if we're building under Linux, unconditionally include <linux/linkage.h> in the related files. It is available in x86 sources since even pre-2.6 times, so doesn't need any conftests; * then, if RET macro is available, it will be used directly, so that we will always have the version actual to the kernel we build; * if there's no such macro, we define it as a simple `ret`, as it was on pre-SLS times. This ensures we always have the up-to-date definition with no need to update it manually, and at the same time is safe for the whole variety of kernels ZFS module supports. 
Then, there's a couple more "naked" rets left in the code, they're just defined as: .byte 0xf3,0xc3 In fact, this is just: rep ret `rep ret` instead of just `ret` seems to mitigate performance issues on some old AMD processors and most likely makes no sense as of today. Anyways, address those rets, so that they will be protected with Rethunk and SLS. Include <sys/asm_linkage.h> here which now always has RET definition and replace those constructs with just RET. This wipes the last couple of places with unpatched rets objtool's been complaining about. Reviewed-by: Attila Fülöp <attila@fueloep.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Closes #14035 commit 993ee7a00670667f97d990aa5e38eb5cf5effc37 Author: Richard Yao <richard.yao@alumni.stonybrook.edu> Date: Fri Nov 4 14:06:14 2022 -0400 FreeBSD: Fix out of bounds read in zfs_ioctl_ozfs_to_legacy() There is an off by 1 error in the check. Fortunately, this function does not appear to be used in kernel space, despite being compiled as part of the kernel module. However, it is used in userspace. Callers of lzc_ioctl_fd() likely will crash if they attempt to use the unimplemented request number. This was reported by FreeBSD's coverity scan. 
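To illustrate the class of bug that the zfs_ioctl_ozfs_to_legacy() commit above describes, here is a minimal sketch of an off-by-one bounds check in front of a table lookup. It is not the actual OpenZFS code: the table, its size, and every `demo_` name are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Hypothetical translation table with DEMO_NREQ valid request numbers. */
#define	DEMO_NREQ	4
static const int demo_legacy_map[DEMO_NREQ] = { 10, 11, 12, 13 };

/*
 * Buggy check: "<=" also accepts request == DEMO_NREQ, which is one
 * element past the end of the table.
 */
static int
demo_check_buggy(size_t request)
{
	return (request <= DEMO_NREQ);
}

/* Fixed check: only indices 0 .. DEMO_NREQ - 1 are accepted. */
static int
demo_check_fixed(size_t request)
{
	return (request < DEMO_NREQ);
}

/* Translate a request number, returning -1 for anything out of range. */
static int
demo_translate(size_t request)
{
	if (!demo_check_fixed(request))
		return (-1);
	return (demo_legacy_map[request]);
}
```

With the buggy check, a caller passing `DEMO_NREQ` would index one slot past the array; with the fixed check, `demo_translate(DEMO_NREQ)` is rejected before the table is touched.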
Reported-by: Coverity (CID 1432059) Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14135 commit f66ffe68787f9675ad7cce7644a1f81f28a86939 Author: Serapheim Dimitropoulos <serapheim@delphix.com> Date: Thu Nov 3 15:02:46 2022 -0700 Expose zfs_vdev_open_timeout_ms as a tunable Some of our customers have been occasionally hitting zfs import failures in Linux because udevd doesn't create the by-id symbolic links in time for zpool import to use them. The main issue is that the systemd-udev-settle.service that zfs-import-cache.service and other services depend on is racy. There is also an openzfs issue filed (see https://github.com/openzfs/zfs/issues/10891) outlining the problem and potential solutions. With the proper solutions being significant in terms of complexity and the priority of the issue being low for the time being, this patch exposes `zfs_vdev_open_timeout_ms` as a tunable so people that are experiencing this issue often can increase it as a workaround. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #14133 commit 595d3ac2ed61331124feda2cf5787c3dd4c7ae09 Author: Allan Jude <allan@klarasystems.com> Date: Thu Nov 3 14:53:24 2022 -0400 Allow mounting snapshots in .zfs/snapshot as a regular user Rather than doing a terrible credential swapping hack, we just check that the thing being mounted is a snapshot, and the mountpoint is the zfsctl directory, then we allow it. 
If the mount attempt is from inside a jail, on an unjailed dataset (mounted from the host, not by the jail), the ability to mount the snapshot is controlled by a new per-jail parameter: zfs.mount_snapshot Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Allan Jude <allan@klarasystems.com> Sponsored-by: Modirum MDPay Sponsored-by: Klara Inc. Closes #13758 commit 11e3416ae78d09380c523b703fad8dee145658d5 Author: Richard Yao <richard.yao@alumni.stonybrook.edu> Date: Thu Nov 3 13:47:48 2022 -0400 Cleanup: Remove branches that always evaluate the same way Coverity reported that the ASSERT in taskq_create() is always true and the `*offp > MAXOFFSET_T` check in zfs_file_seek() is always false. We delete them as cleanup. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14130 commit 1e1ce10e5579a530606060f095f2f139916621fe Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Tue Nov 1 20:45:36 2022 +0000 Remove an unused variable Clang-16 detects this set-but-unused variable which is assigned and incremented, but never referenced otherwise. 
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14125 commit abb42dc5e1d5073ac72d9294fa78ab2203406b1c Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Tue Nov 1 20:43:32 2022 +0000 Make 1-bit bitfields unsigned This fixes -Wsingle-bit-bitfield-constant-conversion warning from clang-16 like: lib/libzfs/libzfs_dataset.c:4529:19: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion] flags.nounmount = B_TRUE; ^ ~~~~~~ Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14125 commit f47f6a055d0c282593fe701bcaa968225ba9d1fc Author: Richard Yao <richard.yao@alumni.stonybrook.edu> Date: Thu Nov 3 12:58:14 2022 -0400 Address warnings about possible division by zero from clangsa * The complaint in ztest_replay_write() is only possible if something went horribly wrong. An assertion will silence this and if it goes off, we will know that something is wrong. * The complaint in spa_estimate_metaslabs_to_flush() is not impossible, but seems very unlikely. We resolve this by passing the value from the `MIN()` that does not go to infinity when the variable is zero. There was a third report from Clang's scan-build, but that was a definite false positive and disappeared when checked again through Clang's static analyzer with Z3 refutation via CodeChecker. 
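As an aside on the "Make 1-bit bitfields unsigned" commit quoted above, the following sketch shows why clang-16 warns. The struct and function names are hypothetical and are not taken from libzfs; the signed read-back value is what GCC and Clang produce, since the conversion is implementation-defined in C.

```c
/* Hypothetical flag structs; only the signedness of the field differs. */
struct demo_flags_signed {
	signed int nounmount : 1;	/* representable values: 0 and -1 */
};

struct demo_flags_unsigned {
	unsigned int nounmount : 1;	/* representable values: 0 and 1 */
};

/*
 * Store a boolean into the signed field and read it back.
 * On GCC and Clang a stored 1 reads back as -1, which is the
 * "changes value from 1 to -1" part of the quoted diagnostic.
 */
static int
demo_read_back_signed(int v)
{
	struct demo_flags_signed f = { 0 };

	f.nounmount = (v != 0);
	return (f.nounmount);
}

/* Same round trip through the unsigned field: 1 stays 1. */
static int
demo_read_back_unsigned(int v)
{
	struct demo_flags_unsigned f = { 0 };

	f.nounmount = (v != 0);
	return (f.nounmount);
}
```

Code that only tests the flag for truthiness behaves the same either way, but any comparison against `B_TRUE` or `1` would silently fail with the signed field, which is presumably why the fields were made unsigned rather than the warning being disabled.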
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14124 commit 27d29946be5e555d8659d6ebdeda6ae771ada5d6 Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Thu Nov 3 09:57:05 2022 -0700 libuutil: deobfuscate internal pointers uu_avl and uu_list stored internal next/prev pointers and parent pointers (unused) obfuscated (byte swapped) to hide them from a long forgotten leak checker (No one at the 2022 OpenZFS developers meeting could recall the history.) This would break on CHERI systems and adds no obvious value. Rename the members, use proper types rather than uintptr_t, and eliminate the related macros. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14126 commit 211ec1b9fde303968d42e49553c666f74638d2ec Author: Attila Fülöp <attila@fueloep.org> Date: Thu Nov 3 17:55:13 2022 +0100 Deny receiving into encrypted datasets if the keys are not loaded Commit 68ddc06b611854560fefa377437eb3c9480e084b introduced support for receiving unencrypted datasets as children of encrypted ones but unfortunately got the logic upside down. This resulted in failing to deny receives of incremental sends into encrypted datasets without their keys loaded. If receiving a filesystem, the receive was done into a newly created unencrypted child dataset of the target. In case of volumes the receive made the target volume undeletable since a dataset was created below it, which we obviously can't handle. Incremental streams with embedded blocks are affected as well. We fix the broken logic to properly deny receives in such cases. 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #13598 Closes #14055 Closes #14119 commit 84477e148dccf4665067c0d39006f31bb073cc9e Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Thu Oct 27 23:39:06 2022 +0100 lua: cast through uintptr_t when return a pointer Don't assume size_t can carry pointer provenance and use uintptr_t (identical on all current platforms) instead. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14131 commit b9041e1f27b7b29b27ac3b873c7ba2922bccca01 Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Thu Oct 27 23:28:03 2022 +0100 Use intptr_t when storing an integer in a pointer Cast the integer type to (u)intptr_t before casting to "void *". In CHERI C/C++ we warn on bare casts from integers to pointers to catch attempts to create pointers out of thin air. We allow the warning to be suppressed with a suitable cast through (u)intptr_t. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14131 commit 877790001e74b6c3b2955e4b7a8c685385e77654 Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Thu Oct 27 23:25:42 2022 +0100 recvd_props_mode: use a uintptr_t to stash nvlists Avoid assuming that a uint64_t can hold a pointer. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14131 commit 250b2bac78102f707dc105450f25d91e5fab481e Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Thu Oct 27 23:20:05 2022 +0100 zfs_onexit_add_cb: make action_handle point to a uintptr_t Avoid assuming that a uint64_t can hold a pointer and reduce the number of casts in the process. 
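The pattern behind this run of CHERI-related commits can be sketched roughly as follows. This is only an illustration of casting through `(u)intptr_t`; the `demo_` helpers are hypothetical and do not correspond to functions in the OpenZFS tree.

```c
#include <stdint.h>

/*
 * Stash a small integer in a "void *" cookie. The cast goes through
 * intptr_t instead of casting the int to a pointer directly.
 */
static void *
demo_int_to_cookie(int value)
{
	return ((void *)(intptr_t)value);
}

/* Recover the integer that was stashed in the cookie. */
static int
demo_cookie_to_int(void *cookie)
{
	return ((int)(intptr_t)cookie);
}

/*
 * Stash a pointer in an integer handle. uintptr_t is the integer type
 * intended to round-trip a pointer; uint64_t and size_t carry no such
 * guarantee, even though they happen to be wide enough on common
 * 64-bit platforms.
 */
static uintptr_t
demo_ptr_to_handle(void *p)
{
	return ((uintptr_t)p);
}

/* Turn the handle back into the original pointer. */
static void *
demo_handle_to_ptr(uintptr_t handle)
{
	return ((void *)handle);
}
```

On mainstream platforms the generated code is the same as with `uint64_t` or a bare cast; the change matters on targets such as CHERI, where pointers are wider than 64 bits and carry extra metadata.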
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14131 commit d96303cb0787bf7217aacd51074e00d820a98700 Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Thu Oct 27 23:04:17 2022 +0100 acl: use uintptr_t for ace walker cookies Avoid assuming that a pointer can fit in a uint64_t and use uintptr_t instead. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14131 commit 7309e94239a456de043c590ae85027e932c86f62 Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Fri Oct 28 17:36:43 2022 +0100 linux isa_defs.h: Don't define _ALIGNMENT_REQUIRED Nothing consumes this definition so stop defining it. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14128 commit 5229071ba1e6c5dbba277e50306d2ad38f417947 Author: Brooks Davis <brooks@one-eyed-alien.net> Date: Fri Oct 28 00:58:41 2022 +0100 Improve RISC-V support Check __riscv_xlen == 64 rather than _LP64 and define _LP64 if missing. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14128 commit da3d2666728ed21707bd66182c4077f4adcd61aa Author: Richard Yao <richard.yao@alumni.stonybrook.edu> Date: Tue Nov 1 16:58:17 2022 -0400 FreeBSD: Fix regression from kmem_scnprintf() in libzfs kmem_scnprintf() is only available in libzpool. Recent buildbot issues with showing FreeBSD results kept us from seeing this before 97143b9d314d54409244f3995576d8cc8c1ebf0a was merged. The code has been changed to sanitize the output from `kmem_scnprintf()`. 
Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14111 commit fdc59cf56356858c00b9f06fd9fe11ab60ad7790 Author: Vince van Oosten <techhazard@codeforyouand.me> Date: Sun Oct 23 11:11:58 2022 +0200 include overrides for zfs snapshot/rollback bootfs.service Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me> Closes #14075 Closes #14076 commit 59ca6e2ad0b40a67d83cddae8e33d95e8957ad06 Author: Vince van Oosten <techhazard@codeforyouand.me> Date: Sun Oct 23 11:11:18 2022 +0200 include overrides for zfs-import.target Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me> Closes #14075 Closes #14076 commit b10f73f78eb223dd799a87474c537a69113edee1 Author: Vince van Oosten <techhazard@codeforyouand.me> Date: Sun Oct 23 10:55:46 2022 +0200 include systemd overrides to zfs-dracut module If a user that uses systemd and dracut wants to override certain settings, they typically use `systemctl edit [unit]` or place a file in `/etc/systemd/system/[unit].d/override.conf` directly. The zfs-dracut module did not include those overrides however, so this did not have any effect at boot time. For zfs-import-scan.service and zfs-import-cache.service, overrides are now included in the dracut initramfs image. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me> Closes #14075 Closes #14076 commit 748b9d5bda935d126eeb62acab86c95e8b2ccac3 Author: Ryan Moeller <ryan@iXsystems.com> Date: Tue Nov 1 15:19:32 2022 -0400 zil: Relax assertion in zil_parse Rather than panic debug builds when we fail to parse a whole ZIL, let's instead improve the logging of errors and continue like in a release build. 
Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #14116 commit 95055c2ce2a51b5285091d928c8481d02796ea72 Author: youzhongyang <youzhong@gmail.com> Date: Tue Nov 1 15:08:37 2022 -0400 ZTS: rsend_009_pos.ksh is destructive on zfs-on-root system Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Youzhong Yang <yyang@mathworks.com> Closes #14113 commit dcce0dc5f009e8a3ec6dc48f5fc99abc4d74200f Author: Richard Yao <richard.yao@alumni.stonybrook.edu> Date: Mon Oct 31 13:01:04 2022 -0400 Fix oversights from 4170ae4e 4170ae4ea600fea6ac9daa8b145960c9de3915fc was intended to tackle TOCTOU race conditions reported by CodeQL, but as an oversight, a file descriptor was not closed and some comments were not updated. Interestingly, CodeQL did not complain about the file descriptor leak, so there is room for improvement in how we configure it to try to detect this issue so that we get early warning about this. In addition, an optimization opportunity was missed by mistake in lib/libshare/os/linux/smb.c, which prevented us from truly closing the TOCTOU race. This was also caught by Coverity. Reported-by: Coverity (CID 1524424) Reported-by: Coverity (CID 1526804) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14109 commit b37d495e04ed6fc0012b2eccfff80af9e8887422 Author: Allan Jude <allan@klarasystems.com> Date: Sat Oct 29 16:08:54 2022 -0400 Avoid null pointer dereference in dsl_fs_ss_limit_check() Check for cr == NULL before dereferencing it in dsl_enforce_ds_ss_limits() to lookup the zone/jail ID. 
Reported-by: Coverity (CID 1210459) Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com> Closes #14103 commit 97143b9d314d54409244f3995576d8cc8c1ebf0a Author: Richard Yao <richard.yao@alumni.stonybrook.edu> Date: Thu Oct 27 14:16:04 2022 -0400 Introduce kmem_scnprintf() `snprintf()` is meant to protect against buffer overflows, but operating on the buffer using its return value, possibly by calling it again, can cause a buffer overflow, because it will return how many characters it would have written if it had enough space even when it did not. In a number of places, we repeatedly call snprintf() by successively incrementing a buffer offset and decrementing a buffer length, by its return value. This is a potentially unsafe usage of `snprintf()` whenever the buffer length is reached. CodeQL complained about this. To fix this, we introduce `kmem_scnprintf()`, which will return 0 when the buffer is zero or the number of written characters, minus 1 to exclude the NULL character, when the buffer was too small. In all other cases, it behaves like snprintf(). The name is inspired by the Linux and XNU kernels' `scnprintf()`. The implementation was written before I thought to look at `scnprintf()` and had a good name for it, but it turned out to have identical semantics to the Linux kernel version. That lead to the name, `kmem_scnprintf()`. CodeQL only catches this issue in loops, so repeated use of snprintf() outside of a loop was not caught. As a result, a thorough audit of the codebase was done to examine all instances of `snprintf()` usage for potential problems and a few were caught. Fixes for them are included in this patch. Unfortunately, ZED is one of the places where `snprintf()` is potentially used incorrectly. 
Since using `kmem_scnprintf()` in it would require changing how it is linked, we modify its usage to make it safe, no matter what buffer length is used. In addition, there was a bug in the use of the return value where the NULL format character was not being written by pwrite(). That has been fixed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14098 commit 2e08df84d8649439e5e9ed39ea13d4b755ee97c9 Author: Richard Yao <richard.yao@alumni.stonybrook.edu> Date: Thu Oct 27 15:41:39 2022 -0400 Cleanup dump_bookmarks() Assertions are meant to check assumptions, but the way that this assertion is written does not check an assumption, since it is provably always true. Removing the assertion will cause a compiler warning (made into an error by -Werror) about printing up to 512 bytes to a 256-byte buffer, so instead, we change the assertion to verify the assumption that we never do a snprintf() that is truncated to avoid overrunning the 256-byte buffer. This was caught by an audit of the codebase to look for misuse of `snprintf()` after CodeQL reported that we had misused `snprintf()`. An explanation of how snprintf() can be misused is here: https://www.redhat.com/en/blog/trouble-snprintf This particular instance did not misuse `snprintf()`, but it was caught by the audit anyway. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14098 commit d71d69326116756e69b2d7bee4582f00de27ec72 Author: Richard Yao <richard.yao@alumni.stonybrook.edu> Date: Thu Oct 27 12:45:26 2022 -0400 Fix too few arguments to formatting function CodeQL reported that when the VERIFY3U condition is false, we do not pass enough arguments to `spl_panic()`. This is because the format string from `snprintf()` was concatenated into the format string for `spl_panic()`, which causes us to have an unexpected format specifier. 
A CodeQL developer suggested fixing the macro to have a `%s` format string that takes a stringified RIGHT argument, which would fix this. However, upon inspection, the VERIFY3U check was never necessary in the first place, so we remove it in favor of just calling `snprintf()`. Lastly, it is interesting that every other static analyzer run on the codebase did not catch this, including some that made an effort to catch such things. Presumably, all of them relied on header annotations, which we have not yet done on `spl_panic()`. CodeQL apparently is able to track the flow of arguments on their way to annotated functions, which llowed it to catch this when others did not. A future patch that I have in development should annotate `spl_panic()`, so the others will catch this too. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14098 commit 4170ae4ea600fea6ac9daa8b145960c9de3915fc Author: Richard Yao <richard.yao@alumni.stonybrook.edu> Date: Thu Oct 27 11:03:48 2022 -0400 Fix TOCTOU race conditions reported by CodeQL and Coverity CodeQL and Coverity both complained about: * lib/libshare/os/linux/smb.c * tests/zfs-tests/cmd/mmapwrite.c * twice * tests/zfs-tests/tests/functional/tmpfile/tmpfile_002_pos.c * tests/zfs-tests/tests/functional/tmpfile/tmpfile_stat_mode.c * coverity had a second complaint that CodeQL did not have * tests/zfs-tests/cmd/suid_write_to_file.c * Coverity had two complaints and CodeQL had one complaint, both differed. The CodeQL complaint is about the main point of the test, so it is not fixable without a hack involving `fork()`. The issues reported by CodeQL are fixed, with the exception of the last one, which is deemed to be a false positive that is too much trouble to wrokaround. The issues reported by Coverity were only fixed if CodeQL complained about them. 
There were issues reported by Coverity in a number of other files that were not reported by CodeQL, but fixing the CodeQL complaints is considered a priority since we want to integrate it into a github workflow, so the remaining Coverity complaints are left for future work. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14098 commit 82ad2a06ac4e379fa67ff69901a1a70c86fd8f01 Author: Brian Behlendorf <behlendorf1@llnl.gov> Date: Fri Oct 28 13:25:37 2022 -0700 Revert "Cleanup: Delete dead code from send_merge_thread()" This reverts commit fb823de9f due to a regression. It is in fact possible for the range->eos_marker to be false on error. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #14042 Closes #14104 commit 5f0a48c7c95d938e4cb0ae3ee864241b324853b7 Author: Rob N ★ <robn@despairlabs.com> Date: Sat Oct 29 05:46:44 2022 +1100 debug: fix output from VERIFY0 assertion The previous version reported all the right info, but the VERIFY3 name made a little more confusing when looking for the matching location in the source code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Rob N ★ <robn@despairlabs.com> Closes #14099 commit 8af08a69cda63e6d7983fc2f32f9fed4155b95be Author: Mariusz Zaborski <oshogbo@vexillium.org> Date: Fri Oct 28 20:44:18 2022 +0200 quota: extend quota for dataset This patch relax the quota limitation for dataset by around 3%. What this means is that user can write more data then the quota is set to. However thanks to that we can get more stable bandwidth, in case when we are overwriting data in-place, and not consuming any additional space. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org> Sponsored-by: Zededa Inc. 
Sponsored-by: Klara Inc. Closes #13839 commit dc56c673e3b0d206f1d3fca66fdf5f6a46dbc4b2 Author: shodanshok <g.danti@assyoma.it> Date: Fri Oct 28 19:21:54 2022 +0200 Fix ARC target collapse when zfs_arc_meta_limit_percent=100 Reclaim metadata when arc_available_memory < 0 even if meta_used is not bigger than arc_meta_limit. As described in https://github.com/openzfs/zfs/issues/14054 if zfs_arc_meta_limit_percent=100 then ARC target can collapse to arc_min due to arc_purge not freeing any metadata. This patch lets arc_prune to do its work when arc_available_memory is negative even if meta_used is not bigger than arc_meta_limit, avoiding ARC target collapse. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gionatan Danti <g.danti@assyoma.it> Closes #14054 Closes #14093 commit 7822b50f548e6ca73faa6f0d2de029e981be1d73 Author: vaclavskala <33496485+vaclavskala@users.noreply.github.com> Date: Fri Oct 28 19:16:31 2022 +0200 Propagate extent_bytes change to autotrim thread The autotrim thread only reads zfs_trim_extent_bytes_min and zfs_trim_extent_bytes_max variable only on thread start. We should check for parameter changes during thread execution to allow parameter changes take effect without needing to disable then restart the autotrim. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Václav Skála <skala@vshosting.cz> Closes #14077 commit dbf6108b4df92341eea40d0b41792ac16eabc514 Author: Aleksa Sarai <cyphar@cyphar.com> Date: Sat Jun 22 10:35:11 2019 +1000 zfs_rename: support RENAME_* flags Implement support for Linux's RENAME_* flags (for renameat2). Aside from being quite useful for userspace (providing race-free ways to exchange paths and implement mv --no-clobber), they are used by overlayfs and are thus required in order to use overlayfs-on-ZFS. 
In order for us to represent the new renameat2(2) flags in the ZIL, we create two new transaction types for the two flags which need transactional-level support (RENAME_EXCHANGE and RENAME_WHITEOUT). RENAME_NOREPLACE does not need any ZIL support because we know that if the operation succeeded before creating the ZIL entry, there was no file to be clobbered and thus it can be treated as a regular TX_RENAME. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Closes #12209 Closes #14070 commit e015d6cc0b60d4675c9b6d2433eed2c8ef0863e8 Author: Aleksa Sarai <cyphar@cyphar.com> Date: Fri Apr 26 23:23:07 2019 +1000 zfs_rename: restructure to have cleaner fallbacks This is in preparation for RENAME_EXCHANGE and RENAME_WHITEOUT support for ZoL, but the changes here allow for far nicer fallbacks than the previous implementation (the source and target are re-linked in case of the final link failing). In addition, a small cleanup was done for the "target exists but is a different type" codepath so that it's more understandable. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Closes #12209 Closes #14070 commit 7b3ba296543724611c12c52c18e85a1028f8f19e Author: Aleksa Sarai <cyphar@cyphar.com> Date: Wed May 18 20:29:33 2022 +1000 debug: add VERIFY_{IMPLY,EQUIV} variants This allows for much cleaner VERIFY-level assertions. 
Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Closes #14070 commit 86db35c447aa3f4cc848497d78d54ec9c985d1ed Author: Pavel Snajdr <snajpa@snajpa.net> Date: Thu Dec 5 01:52:27 2019 +0100 Remove zpl_revalidate: fix snapshot rollback Open files, which aren't present in the snapshot, which is being roll-backed to, need to disappear from the visible VFS image of the dataset. Kernel provides d_drop function to drop invalid entry from the dcache, but inode can be referenced by dentry multiple dentries. The introduced zpl_d_drop_aliases function walks and invalidates all aliases of an inode. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pavel Snajdr <snajpa@snajpa.net> Closes #9600 Closes #14070

In openzfs#13709, as in openzfs#11294 before it, it turns out that 63a2645 still had the same failure mode as when it was first landed as d1d4769, and fails to unlock certain datasets that formerly worked. Rather than reverting it again, let's add handling to just throw out the accounting metadata that failed to unlock when that happens, as well as a test with a pre-broken pool image to ensure that we never get bitten by this again. Fixes: openzfs#13709 Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov>

wdoekes · 2023-01-30T16:26:48Z

FYI: relevant bug report at Ubuntu
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1987190

Thanks for the patch, @rincebrain! I can confirm that it works.

lucidBrot · 2023-02-10T16:27:01Z

I just encountered the same problem - the do-release-upgrade had somehow failed, leaving me with only a tty and a dead screen session (that it had autocreated). In there, the output of zfs --version showed mismatching versions between the kmod and the command itself, but I rebooted anyway. Got dropped to initramfs because it failed to mount my encrypted zfs on root despite having loaded the correct key, with the message about corruption and Input/Output error. Interestingly, the versions of zfs --version in the initramfs were both the new one now.

Following @mat128 's steps worked!

There were errors after I tried to exit the initramfs but forcing a reboot using the power button did boot me into the gui again. Thanks!

lucidBrot · 2023-02-13T16:37:23Z

Potentially Relevant: When the new zfs version tried to access the old zfs filesystem with missing MAC (or whatever exactly the problem was that was discussed here), it had set a warning on the pool. Even though everything is fine now, the warning is still there:

generic@motorbrot:/tmp$ sudo zpool status tank -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Mon Feb 13 16:55:08 2023
	385G scanned at 1.43G/s, 72.7G issued at 276M/s, 510G total
	0B repaired, 14.25% done, 00:27:04 to go
config:

	NAME         STATE     READ WRITE CKSUM
	tank         ONLINE       0     0     0
	  nvme0n1p9  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        tank/enc/ds1/u18:<0x0>

I am currently running zpool scrub tank, hoping that this will clear out the warning - zpool clear tank did not clear it. The zpool scrub tank indeed did clear the warning.
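
To see whether a scrub really shrinks that list, the entries can be pulled out of the `zpool status -v` output and compared before and after. A minimal sketch, assuming the output layout shown above; `permanent_errors` is a hypothetical helper name, not something ZFS ships:

```bash
# Print the entries listed under "Permanent errors have been detected..."
# in `zpool status -v` output read from stdin, one entry per line.
# Assumption: the list runs until the next "pool:" line or end of input.
permanent_errors() {
    awk '
        /^ *pool:/ { in_list = 0 }
        /Permanent errors have been detected/ { in_list = 1; next }
        in_list && NF { sub(/^[ \t]+/, ""); print }
    '
}
```

Usage would look like `sudo zpool status -v tank | permanent_errors`, run once before and once after the scrub. An empty result means no entries were listed.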

lucidBrot · 2023-02-16T18:01:51Z

I just realized that despite my zfs-on-root booting correctly, I am no longer able to access any of the older snapshots, since mounting them gives an I/O error.

Is there a known workaround to mount the old snapshots anyway, despite them having been created by an older version of zfs, without copying all the data to a new pool?

rincebrain · 2023-02-16T18:08:11Z

Older snapshots should work after #14161; if they don't, that's a bug, and I should figure out what to do about it.

lucidBrot · 2023-02-16T18:21:19Z

Oh, I see. Thanks for that information!

I am on ubuntu 22.04

zfs --version
zfs-2.1.5-1ubuntu6-22.04.1
zfs-kmod-2.1.5-1ubuntu6-22.04.1

Considering that the 2.1.5 release on the github releases page is from Jun 22, 2022 and the PR you reference was merged on Nov 15, 2022, I'll assume that your fix does work and is simply not yet in the latest version on ubuntu.

wdoekes · 2023-02-17T08:01:32Z

@lucidBrot: could you take a look at the Ubuntu bug at https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1987190 (mentioned in #13709 (comment)) and:

check the workaround
click that it affects you too

I checked the 2.1.5-1ubuntu6-22.04.1 changelog, and this is indeed not fixed there.

lucidBrot · 2023-02-18T17:01:46Z

Thanks @wdoekes for the link! I clicked that it affects me too and posted a comment there as well. However, I no longer need a reply from you despite having asked:

Does it still make sense to apply your patch, or would I rather figure out how to install zfs directly from the source from github?

It took me a day but I have now managed to install zfs 2.1.9 from source and my laptop still boots. The only thing missing is that I no longer have zfs-zed, but since I never used that or understood what it was for, that should be alright I guess.

@rincebrain I have verified: the same snapshot that I was no longer able to access in my earlier testing is now accessible again. So your fix is working indeed. Thank you for your work!

bigtonylewis · 2023-02-26T07:51:17Z

I have this bug and @mat128's solution worked, but didn't survive a reboot. After rebooting, I had to redo the snapshot workaround again.

Is there a way to fix the problem permanently?

lucidBrot · 2023-02-26T08:37:32Z

@bigtonylewis It worked for me after a reboot still ... just only for the new snapshots. (until I decided to build the latest zfs version myself, that made everything work again fully)

Is your zfs --version in the initramfs newer than your zfs version in the booted OS?

bigtonylewis · 2023-02-26T10:33:35Z

@lucidBrot There's no ZFS in initrd; I don't use ZFS for OS filesystems, just for data. I just confirmed this with lsinitramfs.

I do note though that there is a difference in versions between the binary and the kernel module:

# zfs --version
zfs-2.1.5-1ubuntu6~22.04.1
zfs-kmod-2.1.4-0ubuntu0.1

Could that be a factor? The module comes from the kernel image (MD5 confirms it is the same as per the kernel package)
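
A quick way to spot this kind of mismatch is to compare the two lines that `zfs --version` prints. A minimal sketch: `zfs_version_check` is a hypothetical helper, it assumes the `zfs-<version>` / `zfs-kmod-<version>` line format shown above, and it compares only the upstream part before the first `-`, so differing distro suffixes are ignored:

```bash
# Read `zfs --version` output on stdin and compare userland vs. kernel module.
# Prints "match <ver>", "mismatch userland=<ver> kmod=<ver>", or "unknown".
zfs_version_check() {
    local user="" kmod="" line
    while IFS= read -r line; do
        case "$line" in
            zfs-kmod-*) kmod="${line#zfs-kmod-}" ;;   # kernel module line
            zfs-*)      user="${line#zfs-}" ;;        # userland tools line
        esac
    done
    # Keep only the upstream version (e.g. 2.1.5), dropping the packaging suffix.
    user="${user%%-*}"
    kmod="${kmod%%-*}"
    if [ -z "$user" ] || [ -z "$kmod" ]; then
        echo "unknown"
        return 2
    fi
    if [ "$user" = "$kmod" ]; then
        echo "match $user"
    else
        echo "mismatch userland=$user kmod=$kmod"
        return 1
    fi
}
```

Usage would be `zfs --version | zfs_version_check`. A mismatch is only a hint worth investigating; this thread does not establish that it causes the I/O error.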

lucidBrot · 2023-02-26T22:04:16Z

Ah, well @bigtonylewis I don't see why a reboot should have any effect then, if you aren't booting from it. I'm no expert either though.

Do note that when I finished upgrading to ubuntu 22.04 I had consistent versioning here. So ... maybe you can apt upgrade it? ( You can check that with sudo apt update && apt list --upgradeable )

rptb1 · 2023-10-10T12:21:37Z

I have recently had this same issue, and have written about it to zfs-discussion and earlier at serverfault. I'm recording links here. I will study this thread and follow up in all locations with my results.

In the meantime, please tell me if anyone has suggestions for getting debugging output, backtraces, dumps etc. from my current pool that might help debug this issue for other people before I make it go away.

Thanks.

rincebrain · 2023-10-10T13:08:02Z

Just install a newer version than Ubuntu is shipping, or get them to ship the patch that fixes your problem. There's no real debugging needed unless you're running the fix and it didn't help.

wdoekes · 2023-10-10T13:59:15Z

@rptb1: Did you check the bug report at launchpad, and click "Affects me too"?

Get patch against 2.1.5: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1987190/+attachment/5704587/+files/zfs-dkms-2.1.5-1-fix-zero-mac-io-error.patch
Follow steps described here: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1987190/comments/4

That should fix it for Jammy.

I did notice occasional errors in zpool status with (unpatched) 2.1.5-1ubuntu6~22.04.1 but they were fixed up by a zpool scrub initiated by zed:

errors: 7 data errors, use '-v' for a list

Permanent errors have been detected in the following files:

        <0x20f17>:<0x0>
        tank/someserver@planb-20221101T2223Z:<0x0>
        ...

But the autoscrub made them go away before I could look into it some more:

  scan: scrub repaired 0B in 1 days 13:42:09 with 0 errors on Mon Oct  9 14:06:11 2023
...
errors: No known data errors

On this host I had mounted every encrypted fileset when running the patch, and then went back to the unpatched Ubuntu module. I did not expect to see any new errors after that. But.. if this is the same issue, I guess I would have to mount all the old snapshots, not just the filesets, before I can go back to the unpatched version.

Update:

I guess I would have to mount all the old snapshots, not just the filesets, before I can go back to the unpatched version.

Yes. Looks like that was the cause of the later errors. Running a patched version now, and I'm not adding more snapshots to the zpool error list.

(I can probably forego the mounting, and simply ls in .zfs/snapshot/* in all filesets.)
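
The `ls` idea could be sketched roughly like this. `touch_snapshots` is a hypothetical helper name, and the sketch assumes that listing a directory under `.zfs/snapshot` triggers the snapshot automount and that `ls` exits non-zero when the snapshot cannot be read; I have not verified the second assumption on an affected pool:

```bash
# For each dataset mountpoint given as an argument, list every directory under
# .zfs/snapshot and report whether `ls` succeeded on it.
touch_snapshots() {
    local mnt snap
    for mnt in "$@"; do
        for snap in "$mnt"/.zfs/snapshot/*/; do
            # An unmatched glob stays literal; skip anything that is not a directory.
            [ -d "$snap" ] || continue
            if ls "$snap" >/dev/null 2>&1; then
                echo "ok   ${snap%/}"
            else
                echo "FAIL ${snap%/}"
            fi
        done
    done
}
```

One way to feed it would be `touch_snapshots $(zfs list -H -o mountpoint -t filesystem | grep '^/')`, which breaks on mountpoints containing whitespace and includes datasets that are not currently mounted.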

wdoekes · 2023-10-11T10:41:29Z

Update:

Alas: the errors on the snapshots keep reappearing when using the unpatched version. Even though I did mount them with a patched module once.

It can be quickly observed when looking in .zfs/snapshot:

# ls -l .zfs/snapshot
total 513
drwxrwxrwx 1 root  root 0 Oct 11 12:26 2021-07-01-last-on-old-swift
drwxrwxrwx 1 root  root 0 Oct 11 12:26 planb-20220101T0014Z
drwxrwxrwx 1 root  root 0 Oct 11 12:26 planb-20221101T0017Z
drwxrwxrwx 1 root  root 0 Oct 11 12:26 planb-20221201T0009Z
drwxrwxrwx 1 root  root 0 Oct 11 12:26 planb-20230101T0014Z  <-- these and above are unreadable without patch
drwxr-xr-x 3 planb root 4 Feb  1  2023 planb-20230201T0021Z
drwxr-xr-x 3 planb root 4 Mar  1  2023 planb-20230301T0016Z
drwxr-xr-x 3 planb root 4 Apr  1  2023 planb-20230401T0016Z
drwxr-xr-x 3 planb root 4 May  1 02:17 planb-20230501T0017Z
...

It would be nice if we can get these persistently fixed so they work without the patch. Not sure if that is possible though 🤷

rptb1 · 2023-10-12T04:38:30Z

Thanks for the suggestions and the pointer to the patch. I'm not keen to run with a non-standard LTS system for all sorts of reasons, so I went with using zfs send/receive to reconstruct the damaged filesystems. I've posted details to zfs-discuss. tl;dr I did ZFS replication send from the old pool in Ubuntu 20 to receive in a fresh pool on Ubuntu 22, then sent the filesystems back to the old pool in Ubuntu 22. This won't work for everyone -- there was quite a bit of server downtime.

xofyarg added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jul 30, 2022

xofyarg mentioned this issue Aug 1, 2022

ZFS on Linux null pointer dereference #11679

Closed

jonryk mentioned this issue Aug 30, 2022

permanent errors after upgrading ZFS #13763

Open

rincebrain mentioned this issue Nov 8, 2022

Handle and detect #13709's unlock regression #14161

Merged

13 tasks

0n-s mentioned this issue Nov 10, 2022

Immutable data corruption(?) after hitting #13709 #14166

Open

tonyhutter closed this as completed in 2163cde Nov 15, 2022

rincebrain mentioned this issue Nov 25, 2022

PANIC at dmu_recv.c on receiving snapshot to encrypted file system #12732

Open

rincebrain mentioned this issue Nov 26, 2023

Input/output error in recent snapshot; three times on same host now #15474

Open
