ZTS: Standardize use of destroy_dataset in cleanup #12663
When cleaning up a test case standardize on using the convention:

    datasetexists $ds && destroy_dataset $ds <flags>

By using 'destroy_dataset' instead of 'log_must zfs destroy' we ensure that the destroy is retried in the event that a ZFS volume is busy. This helps ensure tests are fully cleaned up and prevents false positive test failures on Linux.

Note that all of the tests which used 'zfs destroy' in cleanup have been updated even if they don't use volumes. This was done to clearly establish the expected convention.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
tests/zfs-tests/tests/functional/cli_root/zfs_rename/zfs_rename.kshlib
This patch seems to have missed
And the copy-paste jobs in the other snapused, which explains those continued failures. history_002_pos is failing on the destroys in: zfs/tests/zfs-tests/tests/functional/history/history_002_pos.ksh Lines 182 to 197 in ec64fdb
So I'd probably just iostat/setup failed because zfs_list/cleanup failed, and that failed because zfs/tests/zfs-tests/tests/functional/cli_root/zfs_get/zfs_get_list_d.kshlib Lines 79 to 82 in ec64fdb
needs the same love as everything else in this PR. zfs_unload-key_all is still failing on trying to unload-key -a while the zvol is still open. If "retry a couple times" is the order of the day, perhaps a s/ |
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Thanks for taking a look. I've added a commit to the PR to address the latest failures we saw in the CI. That should improve things but I'd still like to run this PR through the Ubuntu builders a few times to make sure it reliably passes. It wouldn't shock me if I missed some test cases since I only updated those which did a "datasetexists && zfs destroy". It looks like that's why I missed the "snapused" tests.
Of course. I'm really curious to know how you reproduced it locally, because I tried Ubuntu 18.04/20.04 under Hyper-V, VirtualBox, and KVM, and they all were perfectly happy with life backed by SSD or spinning disks.
Somewhat to my surprise I was able to easily reproduce the issue using an EC2 t2.xlarge instance and Ubuntu 18.04.
...that just raises further questions, since the other testbots on AWS seem content with their lives. Huh. Maybe the intersection of recent kernel and VM setup? But Fedora 33 should be new enough...hm. I also worry about whether busywaiting like this is the wrong fix, if this worked consistently everywhere before and is breaking consistently in only some places now - like the need for an explicit |
At least on Linux it's always been the case that some What is odd, is that we're suddenly seeing this all the time in some environments. Specifically we're racing with Well the additional |
I have no burning desire to drop it, I just picked on it because I recalled needing changes in that test and it wasn't a simple matter of I think, more than anything, it just bothers me a lot that it's suddenly failing consistently on some platforms and acting like nothing changed on others, and it's not particularly evident what's changed, because it might break other expectations. |
Yes, I completely agree. I'm not happy about needing this change, but making the test suite a little more resilient to this kind of known behavior seemed like the least terrible option.
Thanks for taking this on.
tests/zfs-tests/tests/functional/cli_root/zfs_create/zfs_create_004_pos.ksh
This was missed in the first pass of changes but caught by the CI. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
zfs_load-key/zfs_load-key_all:
Same cause, different command? I think it's good to tackle it in a separate PR.
This issue may also occur when unloading keys. We made this same fix to zfs_unload-key_all.ksh so do it here as well. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Right, same cause different command. I went ahead and updated this PR to handle it since we'd already made the same change to |
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Edit the workaround in zfs-tests-*.yml to print if it successfully edited the rules file, and add explicit cleanup calls in a couple tests that have occasionally failed in ways that look like more fun from openzfs#12663 even with all this. Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
When cleaning up a test case standardize on using the convention:

    datasetexists $ds && destroy_dataset $ds <flags>

By using 'destroy_dataset' instead of 'log_must zfs destroy' we ensure that the destroy is retried in the event that a ZFS volume is busy. This helps ensure tests are fully cleaned up and prevents false positive test failures on Linux.

Note that all of the tests which used 'zfs destroy' in cleanup have been updated even if they don't use volumes. This was done to clearly establish the expected convention.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#12663
Motivation and Context
After the Ubuntu 18.04 and 20.04 CI builder VMs were last updated, we're
reliably seeing instances where ZFS volumes are active (still open) when
'zfs destroy' is run. This isn't unexpected, since processes like 'blkid'
will open the device when it's first created. The fix for this is to retry
on busy in the ZTS for Linux. This had been done previously but wasn't done
exhaustively in all places. This change is intended to address those remaining
cases by systematically updating the cleanup functions to use
'destroy_dataset', which does retry.
Description
When cleaning up a test case standardize on using the convention:
    datasetexists $ds && destroy_dataset $ds <flags>
By using 'destroy_dataset' instead of 'log_must zfs destroy' we ensure
that the destroy is retried in the event that a ZFS volume is busy.
This helps ensure tests are fully cleaned up and prevents false
positive test failures on Linux.
Note that all of the tests which used 'zfs destroy' in cleanup have
been updated even if they don't use volumes. This was done to
clearly establish the expected convention.
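To illustrate the retry-on-busy idea behind this convention, here is a minimal sketch in plain shell. It is not the ZTS implementation of 'destroy_dataset': the helper name retry_on_busy, its arguments, and the fake_destroy stand-in are all hypothetical, chosen only so the example runs without a pool. The stand-in fails twice, as a destroy might while something like 'blkid' still holds a new volume open, and then succeeds.

```shell
#!/bin/sh
# Hypothetical helper, NOT the ZTS destroy_dataset implementation:
# run a command up to $1 times, sleeping $2 seconds between failed attempts.
retry_on_busy() {
	max_tries=$1
	delay=$2
	shift 2
	try=1
	while [ "$try" -le "$max_tries" ]; do
		# Stop as soon as the command succeeds.
		"$@" && return 0
		try=$((try + 1))
		# Only sleep if another attempt will follow.
		[ "$try" -le "$max_tries" ] && sleep "$delay"
	done
	return 1
}

# Stand-in for 'zfs destroy' (assumption for the demo): it reports
# failure on its first two calls and succeeds on the third.
calls=0
fake_destroy() {
	calls=$((calls + 1))
	[ "$calls" -ge 3 ]
}

retry_on_busy 5 0 fake_destroy tank/vol
status=$?
echo "status=$status calls=$calls"   # prints "status=0 calls=3"
```

A single 'log_must zfs destroy' in cleanup would have failed the test on the first busy attempt. Bounding the number of attempts keeps a volume that is genuinely stuck from hanging the cleanup indefinitely.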
How Has This Been Tested?
Locally ran the majority of the test suite on Ubuntu 20.04, on which I
was able to reproduce this issue. With the change applied, the
tests which previously failed are passing.