A manually set zpool 'compatibility' property is not preserved over system reboots #12261
Comments
@siebenmann I agree, not having the property persist would be very strange behavior. It's definitely intended to be persistent. I did a little bit of local testing using the master branch but wasn't able to reproduce the issue. The property remained set after an import/export cycle, and also after a reboot. This is just a guess, but is it possible you were using mismatched user utilities and kernel modules? Perhaps there's a case there which was overlooked. |
Based on the timestamps on my machines, I believe that it was not a mismatched kernel/userland. I normally have a coherent user and kernel package set, because I only upgrade the ZFS packages to the current git tip shortly before I install a new kernel and reboot. I won't be able to reboot machines and do other more disruptive testing until some time tomorrow or Wednesday. My pool on one machine has been upgraded and so may not reproduce this, but two pools on my other machine are as yet un-upgraded. |
In case it's useful, on the system that hasn't been updated,
|
That's very strange. Before performing an upgrade on another pool you may want to consider creating a checkpoint first. This property isn't that different from other ZFS properties, so I don't have a great explanation for what might have happened here. @colmbuckley by chance did you happen to observe anything like this in your testing? Or perhaps have an idea what caused the reported behavior? |
That is indeed very strange. There's no difference (at least, no intentional difference) between how the @siebenmann What versions were you upgrading from/to in your previous comment? Is it possible that the property was lost during the package/module upgrade rather than the reboot? (Mind you, I don't know what mechanism would cause that either, but it feels like it might be slightly more likely than over the reboot.) If this is the case, then (very handwavey working theory) we might look to confirm that newly-set property values can survive a module unload/reload without an explicit pool export/import. I'll poke at this shortly. |
Not able to reproduce here (Debian buster, freshly-installed ZFS from git master head) - the Digging in the source, I don't see anything which would cause this property to be stored or behave any differently from any other. If you do get the opportunity to test on another system, could you please try setting an additional property as well as
|
Well, I can reproduce something on Fedora 34 with 5.12.11-300 and ba91311.
(those export -Fs were automatic when I rebooted the host) Since this didn't reproduce for me just from doing edit: Nah, no reboot needed.
|
Hey Rich -
That's interesting - and strange. What happens if you set the "comment"
property at the same time?
I can't remember whether I needed to do anything special to validate the
property on import (don't *think* so), but I can't really tell what's going
on here.
Colm
On Tue 22 Jun 2021, 14:15 Rich Ercolani wrote:
Well, I can reproduce *something* on Fedora 34 with 5.12.11-300 and ba91311.
$ sudo zpool get all test | grep compat
test compatibility zol-0.8 local
$ sudo zpool history test
History for 'test':
2021-06-22.09:05:12 zpool create test /apool -o compatibility=zol-0.8
2021-06-22.09:05:45 zpool export test -F
2021-06-22.09:07:29 zpool import -c /etc/zfs/zpool.cache -aN
2021-06-22.09:07:44 zpool set compatibility=openzfs-2.0-linux test
2021-06-22.09:07:48 zpool export test -F
2021-06-22.09:11:42 zpool import -c /etc/zfs/zpool.cache -aN
(those export -Fs were automatic when I rebooted the host)
Since this didn't reproduce for me just from doing zpool import, I'm
going to guess the -c is important? (The reboots or -aN might be too, but
that would be unfortunate.)
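A minimal sketch of the sequence in the history log above, wrapped in a shell function so it can be pointed at a throwaway pool. The function name `compat_roundtrip` and its arguments are hypothetical; the `zpool` invocations are the ones that appear in the log, and the default cache file path is the one from the log. It needs to run as root, and because `zpool import -c ... -aN` imports every pool listed in the cache file, it is only suitable for a test machine.

```sh
#!/bin/sh
# Sketch only: replays set -> hard-force export -> cachefile import and
# reports whether 'compatibility' survived. compat_roundtrip is a
# hypothetical helper name, not part of OpenZFS.
compat_roundtrip() {
    pool=$1
    cachefile=${2:-/etc/zfs/zpool.cache}   # path taken from the log above

    zpool set compatibility=openzfs-2.0-linux "$pool" || return 2
    before=$(zpool get -H -o value compatibility "$pool") || return 2

    # -F is the undocumented hard-force export seen in the history.
    zpool export -F "$pool" || return 2
    zpool import -c "$cachefile" -aN || return 2

    after=$(zpool get -H -o value compatibility "$pool") || return 2
    echo "before=$before after=$after"

    # Exit status 0 if the value survived, 1 if it was lost.
    [ "$before" = "$after" ]
}
```

Something like `compat_roundtrip test && echo kept || echo lost` would then give a quick yes/no answer for the pool named `test`.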
|
|
I was previously running 2.1.99-314_gff3175040, which was the current git tip when I built and installed it. Since I have live ZFS pools, I don't believe that my kernel modules were unloaded and reloaded during package update and there's no sign of this in the kernel logs. I've now reproduced this with a test virtual machine that's currently running 2.1.99-314_gff3175040; I booted the VM, set the
There's no export during the reboot, and I don't believe I normally see any. |
This would seem to indicate that it's not just the I'm not familiar with the |
Neither am I, really, I just plundered it from From a quick look at the source, it's undocumented in many places, and labeled "hardforce" where it's processed. A line in spa.c above
which lines up with the observed behavior of being able to do a edit: it also doesn't seem to be all recently-set properties - when I tried setting autoreplace, it persisted as one would expect, even though the other two did not (though that was from a non-default to default value). even more edit:
|
My suspicion here is that the Compare and contrast:
The act of setting the
So I think we're seeing a strange interaction between the undocumented This needs the attention of someone with a little more broad-based understanding of the system logic than I currently have, but it doesn't seem to be related specifically to the |
(A question for the OP, or anyone who knows about Fedora's inner workings; does it routinely do |
I suspect this is to blame (or the equivalent line in mount-zfs.sh.in), but has been the case for 5 years. And if I remove the |
I don't have the |
My suspicion is that this is a long-dormant bug (under some circumstances, certain pool properties are not written synchronously to the cachefile) which has only been noticed now because It might be that Fedora has been doing unclean shutdowns for years and nobody ever noticed before. I think this issue might be better described as “Some zpool properties are lost across a |
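One way to check whether a system has been doing hard-force exports at shutdown is to scan the pool history for them. This is a rough sketch, assuming history lines have the `timestamp zpool export ...` shape shown in the log earlier in this thread; `list_force_exports` is a hypothetical helper name, and it only recognises `-F` given as a separate argument.

```sh
#!/bin/sh
# Sketch only: read `zpool history` output on stdin and print the
# timestamp of every export that was run with a standalone -F flag.
# list_force_exports is a hypothetical helper, not an OpenZFS tool.
list_force_exports() {
    awk '$2 == "zpool" && $3 == "export" {
        for (i = 4; i <= NF; i++) {
            if ($i == "-F") {
                print $1      # first field is the timestamp
                break
            }
        }
    }'
}
```

Usage would look like `zpool history test | list_force_exports`; an empty result suggests the pool has only ever been exported cleanly (or not exported at all).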
I see something interesting at line 6427 of
which I suspect is the root of the problem here. Pending config changes will not be written to the cachefile on a hard-force export. It's not clear (yet) to me why the act of changing the zpool properties like I suspect there's a 'dirty' flag which should be set by the |
Best guess; this is in |
Thanks for digging in to this. You're right, the issue is in The problem here is that the
diff --git a/module/zfs/spa.c b/module/zfs/spa.c
index 26995575a..e2dcaa902 100644
--- a/module/zfs/spa.c
+++ b/module/zfs/spa.c
@@ -8725,19 +8725,6 @@ spa_sync_props(void *arg, dmu_tx_t *tx)
spa_history_log_internal(spa, "set", tx,
"%s=%s", nvpair_name(elem), strval);
break;
- case ZPOOL_PROP_COMPATIBILITY:
- strval = fnvpair_value_string(elem);
- if (spa->spa_compatibility != NULL)
- spa_strfree(spa->spa_compatibility);
- spa->spa_compatibility = spa_strdup(strval);
- /*
- * Dirty the configuration on vdevs as above.
- */
- if (tx->tx_txg != TXG_INITIAL)
- vdev_config_dirty(spa->spa_root_vdev);
- spa_history_log_internal(spa, "set", tx,
- "%s=%s", nvpair_name(elem), strval);
- break;
default:
/*
@@ -8804,6 +8791,11 @@ spa_sync_props(void *arg, dmu_tx_t *tx)
case ZPOOL_PROP_MULTIHOST:
spa->spa_multihost = intval;
break;
+ case ZPOOL_PROP_COMPATIBILITY:
+ strval = fnvpair_value_string(elem);
+ if (spa->spa_compatibility != NULL)
+ spa_strfree(spa->spa_compatibility);
+ spa->spa_compatibility = spa_strdup(strval);
default:
break;
}
And then updating @colmbuckley would you mind taking a crack at this fix? While this code still isn't in a tagged release we may still want to add some compatibility code to first check the |
Gah; just bad luck that I picked the one exceptional property to base my extension on. I'll have a look at this during the week, and will think about how we might recover from people using pre-production code with the incorrect location. |
Well upon second thought maybe this actually is a better location than with the other properties. By storing it in the configuration object this information is more easily available without needing to fully import the pool. That might be handy to get an idea of what level of compatibility is required. I've gone ahead and opened PR #12276 which fixes this by simply making sure the cache file gets updated as well when the @colmbuckley if you've got a few minutes to take a look please do. |
What, you mean all the time I spent yesterday learning about how the ZAP works was wasted?!?!! I was going to suggest that, if we move But I don't have any objections to including them in config instead - I see some benefit to having properties which only affect userland being there, so that they can be inspected before a full import. Will take a look at yours now. |
Not necessarily! My initial inclination was to do exactly what you're proposing. However, after a little reflection keeping both properties in the configuration didn't seem so unreasonable. Plus it avoided the (minimal) complexity of needing to check two locations and maybe even has some advantages. So I opted to make the smaller fix for now. I wouldn't be against moving both the |
I've merged #12276, which is really the minimal change needed to resolve the immediate bug. |
Unlike most other properties the 'compatibility' property is stored in the pool config object and not the DMU_OT_POOL_PROPS object. This had the advantage that the compatibility information is available without needing to fully import the pool (it can be read with zdb). However, this means we need to make sure to update both the copy of the config in the MOS and the cache file. This wasn't being done.
This commit adds a call to spa_async_request() to ensure the copy of the config in the cache file gets updated as well as the one stored in the pool. This same change is made for the 'comment' property which suffers from the same inconsistency.
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Colm Buckley <colm@tuatha.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12261
Closes #12276
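Since the commit message notes that the value can be read with zdb, a stale cache file could in principle be spotted by comparing the cached value with the live one. The sketch below is only a text filter and rests on assumptions that are not confirmed in this thread: that `zdb -C -U <cachefile>` prints one unindented `poolname:` line per pool followed by indented `key: 'value'` lines, and that the key is spelled `compatibility`. `cached_compat` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch only: pull the 'compatibility' value for one pool out of a
# config dump read from stdin. The input layout is an assumption:
#   poolname:
#       key: 'value'
# cached_compat is a hypothetical helper, not an OpenZFS tool.
cached_compat() {
    awk -v pool="$1" -v q="'" '
        # An unindented line ending in ":" starts a new pool section.
        /^[^ \t].*:$/ { inpool = ($0 == (pool ":")); next }
        inpool && $1 == "compatibility:" {
            v = $2
            gsub(q, "", v)    # strip the surrounding single quotes
            print v
            exit
        }'
}

# Possible use (as root), comparing the cached and live values by eye:
#   zdb -C -U /etc/zfs/zpool.cache | cached_compat test
#   zpool get -H -o value compatibility test
```

An empty result from the filter only means no such key was found for that pool in the dump; whether an unset property is omitted from the config or written out explicitly is not something this thread establishes.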
System information
Describe the problem you're observing
The zpool compatibility property documented in the zpoolprops manual page appears to be how zpool create creates pools with only certain features enabled and how zpool upgrade implements limited feature upgrades. The latter property makes it quite interesting to set on existing old pools to limit the degree that zpool upgrade will upgrade them. However, experimentation says that a manually set compatibility property does not appear to be preserved over a reboot. If I set eg zpool set compatibility=openzfs-2.0-linux <pool>, zpool get compatibility will show it set at that point, but after a system reboot zpool get compatibility reports that the value is back to off.
Further, zpool history shows the compatibility being set, but it's gone now:
There is no mention of this behavior in the manual pages, and it would be highly unusual for a pool property setting to be recorded in the pool history and then discarded. If the pool must have some required feature in order for this to work, it should at least be mentioned in the manpage, and better yet zpool set compatibility=... should either fail outright or give an error message.
Describe how to reproduce the problem
Set compatibility to some valid value on an old pool without it that has pending updates. Do zpool get compatibility to observe that it's reported as set, and zpool history to see that it's in the history. Reboot, and check zpool get compatibility. If you're running git tip or a development version and your pool has not yet been upgraded, you can further verify this by setting compatibility to openzfs-2.0-linux, which excludes draid, rebooting, and doing a zpool upgrade, which will now enable draid on you.