Curious zvol behavior oddities in GH Actions since 20211004.1 #12644
Comments
The culprit appears to be the upgrade of cloud-init in the new image. You can see that removing the udev rule lets things run to completion here. If that seems like too heavy-handed a solution, you could just shove an exclusion rule for it into the GH workflows instead. I've reported this upstream here. @behlendorf Not that #12663 is a bad idea, but something like this could be less invasive, perhaps.
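(For illustration, a minimal sketch of what such an exclusion could look like as a plain shell step; this is not the exact change that landed, and the rule filename is an assumption to be checked against whatever cloud-init actually installs on the runner image.)

```sh
#!/bin/sh -eu
# Sketch: mask cloud-init's hotplug udev rule on the runner so zvol
# add/remove events stop being held open by the hook.
# NOTE: the filename below is an assumption; check /etc/udev/rules.d and
# /lib/udev/rules.d on the image for the rule cloud-init really installs.
rule=10-cloud-init-hook-hotplug.rules

# A same-named entry in /etc/udev/rules.d overrides a packaged rule, and a
# symlink to /dev/null masks it entirely.
sudo ln -sf /dev/null "/etc/udev/rules.d/$rule"

# Reload udev so the mask takes effect without a reboot.
sudo udevadm control --reload
```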
That explains a lot. Thanks for getting to the real root cause of this change in behavior. Hopefully, upstream will not only fix this in 21.4 but also make the change in 18.04 and 20.04. My feeling is we should do both. Could you open a PR for your less heavy-handed workaround?
Oh sure, I'm absolutely not suggesting not going forward with that, and if I came across as doing that, I'm sorry, that's a failure on my part - just that it might make sense, if this turns out to involve whack-a-mole beyond the initial 100% failures, to do something like this. They did say it'll be fixed in the next release and have a PR out to make it opt-in rather than always-on, though there's no promise that the various cloud providers won't all opt in by default... Do you want the workaround just for the GH workflows, or in the wider package install script? Normally I'd be aggressively advocating for the latter "just in case", but since they said they're fixing it in the next release, and cloud-init is basically only installed on cloud providers, it feels like such a short-lived workaround might be fine to put in the former, in this one case.
Why don't we just make the change for now in the GH workflows, since it's something we're expecting will be fixed? It's also hopefully something very few people would ever encounter, since I'd like to think creating and immediately destroying a volume is a pretty uncommon activity.
cloud-init added a hook which triggers on every device add/rm event, which results in holding open devices for a while after they're created/destroyed. So let's shove an exclusion rule for that into the GH workflows until it gets fixed.

Closes: openzfs#12644
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
cloud-init added a hook which triggers on every device add/rm event, which results in holding open devices for a while after they're created/destroyed. So let's shove an exclusion rule for that into the GH workflows until it gets fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#12644
Closes openzfs#12669
System information
Describe the problem you're observing
Since GitHub rolled out the 20211004.1 images, a bunch of tests have started failing ~100% of the time (some of them aren't 100%, but many are):
Having looked into them, basically all of them are of the form (zfs set reservation / zfs destroy -r / zfs promote / ...) or zpool export / zpool destroy / zfs destroy / etc failing (writing more space than the pool should allow is one additional example).

Curiously, the kernel/userland doesn't have any flamingly obvious differences, dmesg from working/nonworking runs doesn't have obviously consistent differences, config.log doesn't have any obvious differences, and it doesn't reproduce on the same kernel in a stock Ubuntu 18.04/20.04 VM, so...?
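(For context, the create-then-immediately-destroy pattern those tests exercise can be sketched roughly as below; pool and dataset names are placeholders, not the test suite's own.)

```sh
#!/bin/sh -eu
# Rough sketch of the failing pattern, with placeholder names.
truncate -s 1G /tmp/zpool-backing.img
sudo zpool create testpool /tmp/zpool-backing.img

# Create a zvol and destroy it right away; if something else (here, the
# cloud-init hotplug hook) still holds the device node open, the destroy
# can fail with "dataset is busy".
sudo zfs create -V 100M testpool/vol0
sudo zfs destroy testpool/vol0

sudo zpool destroy testpool
rm -f /tmp/zpool-backing.img
```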
I'm iterating on a PR that applies udev_waits and zpool syncs like there's no tomorrow, which cleans these up, but I got to the zfs_destroy failures, and those seem like they might be a bit stranger... since it seems to be incorrectly zfs destroy-ing...
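(As a sketch of what that kind of settling looks like, assuming plain udevadm settle and zpool sync rather than whatever helpers the test suite wraps them in:)

```sh
# Illustrative only: wait out udev and flush the pool between steps so a
# just-created zvol isn't still being processed when the test moves on.
sudo zfs create -V 100M testpool/vol0

sudo udevadm settle          # let udev finish handling the add event
sudo zpool sync testpool     # flush pending transactions before the next step

sudo zfs destroy testpool/vol0
sudo udevadm settle          # and again after the remove event
```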
Describe how to reproduce the problem
Just run a GitHub Actions workflow with Ubuntu 18.04/20.04 now, since you can't run the old image versions. :D
Include any warning/errors/backtraces from the system logs
Curiously, this one fails somewhere that shouldn't have any outstanding references, and yet...
An example from reservation_014_pos (no ZFS code changes, but I modified reservation_014_pos to set zfs_flags to 512, echo the various env variables used before running zfs set quota, and try setting it twice):
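(The log output from that run isn't reproduced here. As a rough sketch of the debugging setup described, with placeholder variable names rather than the ones reservation_014_pos actually uses:)

```sh
# Enable extra ZFS internal debugging (512 is the flag value mentioned above).
echo 512 | sudo tee /sys/module/zfs/parameters/zfs_flags

# Show the environment the test sees, then try the set twice, as described.
env | sort
sudo zfs set quota="$quota_size" "$TESTPOOL/$TESTFS" \
  || sudo zfs set quota="$quota_size" "$TESTPOOL/$TESTFS"

# Inspect the kernel-side debug log for the failure.
sudo tail -n 50 /proc/spl/kstat/zfs/dbgmsg
```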