Skip to content
This repository has been archived by the owner on May 14, 2024. It is now read-only.

Generic sanity check for all collections #96

Closed
felixfontein opened this issue Apr 27, 2022 · 29 comments
Closed

Generic sanity check for all collections #96

felixfontein opened this issue Apr 27, 2022 · 29 comments

Comments

@felixfontein
Copy link
Contributor

Summary

As ansible-community/ansible-build-data#114 showed we should really check the existing collections including in the Ansible package. Some ideas:

  1. Set up some nightly CI that runs some basic sanity checks (like ansible-test sanity --docker -v) on all collections. Let's specify one or two of the stable branches for that.
  2. Give all collections that fail this some time to fix it. If they don't, say, in two months, let's deprecate them with planned removal from Ansible 7. (Doing this for Ansible 6 is too close IMO.)
  3. From then on, warn in advance once new stable branches are added to this CI (and remove old ones so that at most 1-2 of them are active at the same time), with the same rules: if a collection fails CI and doesn't fix it in a certain amount of time (with a new release), they will be deprecated and removed in the next major version of Ansible.

What do you think?

Also it would be great to do some manual review (check whether CI is set up, look at least at some of the conditions from https://github.com/ansible-collections/overview/blob/main/collection_requirements.rst, ...), but that is very time consuming, and since we already struggle with reviewing new collections, I doubt we will find enough time to do this.

@mariolenz
Copy link
Contributor

Sounds reasonable. But I have a question about this:

  1. Give all collections that fail this some time to fix it. If they don't, say, in two months, let's deprecate them with planned removal from Ansible 7. (Doing this for Ansible 6 is too close IMO.)

If a collection takes, say, two and a half months to fix it, would we still remove it from Ansible 7? Or would we "un-deprecate" it? Could we even do it, I mean how would we add to the changelog: We've deprecated this collection and wanted to remove it from Ansible 7 but we've changed our minds.

@felixfontein
Copy link
Contributor Author

I guess we can still un-deprecate it if we're happy with the result (and the explanation why it took so long). I guess this would work similarly to the document we're voting on in #90, i.e. we'd have a procedure described for this with possible conditions.

@mariolenz
Copy link
Contributor

I guess we can still un-deprecate it if we're happy with the result (and the explanation why it took so long).

But how would we document this in the changelog? There's deprecated_features but not undeprecated_features or something similar, is there?

@felixfontein
Copy link
Contributor Author

I'd probably use major_changes, since both shows up in the porting guide. We can also remove the existing changelog entry if we want to.

@Andersson007
Copy link
Contributor

All the ideas in the description sound good to me
+1

@MarkusTeufelberger
Copy link

Do all collections actually have sanity tests? Especially if they are only containing roles, I'm uncertain if these are even doing much, they seem very geared towards Python code in plugins from a quick look.

@felixfontein
Copy link
Contributor Author

@MarkusTeufelberger the ones included in Ansible are required to run (and pass) sanity checks (though it's not clear how many actually do - fortinet.fortios is an example of a collection which obviously didn't run them before release). ansible-test's sanity checks don't do much for collections that only contain roles, but I think it's still a good idea to run them (it will also be rather quick), because a) the tests also check meta/runtime.yml (which you can have without plugins/modules) and changelog fragments (in case you use them), and b) hopefully sooner or later there will be a sanity check for role argument specs.

@dmsimard
Copy link

dmsimard commented May 2, 2022

FYI I have a WIP PR in which I provide an implementation to (optionally) run sanity tests on all the collections included the package as part of the build process: ansible-community/antsibull-build#419

@dmsimard
Copy link

dmsimard commented May 3, 2022

I've ran a comprehensive sanity test on every collection included in 6.0.0a2 (minus pylint) and found that 67 collections (out of 98) are currently failing sanity tests, data is here: https://gist.github.com/dmsimard/8a0526917398b19410581568947ee0dc

I'm not entirely positive if part of it could be due to ansible-test somehow not picking up the ignore files and need to double check but I figured I would at least share my preliminary findings for the time being and will update them if need be.

@mariolenz
Copy link
Contributor

Thanks @dmsimard, great work!

I'm having a look at the output for community.vmware. vca_fw and vca_nat fail the tests:

ERROR: Found 2 validate-modules issue(s) which need to be resolved:
ERROR: plugins/modules/vca_fw.py:0:0: no-default-for-required-parameter: DOCUMENTATION.options.fw_rules: Argument is marked as required but specifies a default. Arguments with a default should not be marked as required for dictionary value @ data['options']['fw_rules']. Got {'description': ['A list of firewall rules to be added to the gateway, Please see examples on valid entries'], 'required': True, 'default': False}
ERROR: plugins/modules/vca_nat.py:0:0: no-default-for-required-parameter: DOCUMENTATION.options.nat_rules: Argument is marked as required but specifies a default. Arguments with a default should not be marked as required for dictionary value @ data['options']['nat_rules']. Got {'description': ['A list of rules to be added to the gateway, Please see examples on valid entries'], 'required': True, 'default': False}
ERROR: The 2 sanity test(s) listed below (out of 43) failed. See error output above for details.

They're deprecated, so we didn't want to invest any more time in them. I thought this would ignore the errors:

https://github.com/ansible-collections/community.vmware/blob/2ce7a1313af74bdc96e9f476330c808f25c16033/tests/sanity/ignore-2.13.txt#L1-L12

Interestingly, vca_vapp doesn't fail the tests although we had to ignore some things there, too. That's a bit weird.

Then there are a lot of this:

DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives

I wonder why we don't see this in our Zuul jobs (see here for a recent CI run).

Hmmm, we've already replaced LooseVersion from distutils.version with the one from ansible.module_utils.compat.version in ansible-collections/community.vmware#1165. Looks like we have to do the same for StrictVersion.

Nevertheless, the question remains: Why don't we see this error in the Zuul jobs?

@felixfontein
Copy link
Contributor Author

The problems in community.general and community.network are very likely due to symlinks being converted to actual files by the build process (ansible-community/antsibull-build#218). (Also for c.g the wrong version is used, that's fixed in ansible-community/antsibull-build#421.)

@felixfontein
Copy link
Contributor Author

@mariolenz you don't see these errors in Zuul since you don't test against stable-2.13. You only test against devel (and the errors are correctly ignored in the ignore-2.14.txt), but then you test against milestone, which is a very out of date version of stable-2.13 (for example it does not have ansible/ansible#77268 which was merged on March 17th, which is what produces these errors). It basically is the state of the devel branch on 2022-02-14.

Hmmm, we've already replaced LooseVersion from distutils.version with the one from ansible.module_utils.compat.version in ansible-collections/community.vmware#1165. Looks like we have to do the same for StrictVersion.

Since the whole of distutils is deprecated: yes, you do. LooseVersion is the most commonly used thing from distutils.version, but everything else from distutils also has to be removed. StrictVersion can be 'removed' the same way as LooseVersion.

@mariolenz
Copy link
Contributor

you don't see these errors in Zuul since you don't test against stable-2.13. You only test against devel (and the errors are correctly ignored in the ignore-2.14.txt), but then you test against milestone, which is a very out of date version of stable-2.13

Thanks, that explains it. And now I remember: I couldn't ignore those in ignore-2.13.txt because milestone is so old that it doesn't know about them.

Since the whole of distutils is deprecated: yes, you do. LooseVersion is the most commonly used thing from distutils.version, but everything else from distutils also has to be removed.

I think distutils.version.StrictVersion is the only thing left from distutils.

StrictVersion can be 'removed' the same way as LooseVersion.

ansible-collections/community.vmware#1306

@mariolenz
Copy link
Contributor

you test against milestone, which is a very out of date version of stable-2.13. It basically is the state of the devel branch on 2022-02-14.

Shouldn't the milestone branch have been advanced 2022-05-02?

@mariolenz
Copy link
Contributor

I've ran a comprehensive sanity test on every collection included in 6.0.0a2 (minus pylint) and found that 67 collections (out of 98) are currently failing sanity tests

Maybe some collections test against the milestone branch, not against stable-2.13. We're doing exactly this in community.vmware, and it looks like this is part of the problem.

Now that the milestone branch has been advanced (ansible-collections/news-for-maintainers#16), we hopefully will see some improvements here because they run into problems in their CI.

@dmsimard
Copy link

dmsimard commented May 6, 2022

I've finished creating standalone issues in collection repositories where we picked up errors from sanity tests.

@felixfontein pointed out in ansible-collections/community.dns#100 that running tests from the release tarball is not ideal because of a symlink issue described in ansible-community/antsibull-build#218 so I will address that by the time we run tests again.

@Andersson007
Copy link
Contributor

Sanity tests can be actually fixed like in ansible-collections/community.rabbitmq#121 but not released yet.

  • The question is if they should appear on Galaxy and there's nothing else to ship (no bugfixes, major/minor change), what maintainers should do? Would it be OK to consider such fixes bugfixes and to release a patch release containing only test fixes?
  • If so, should we recommend maintainers (at least, in the collection requirements) to release their collections on the spot after fixing sanity tests?

@mariolenz
Copy link
Contributor

Sanity tests can be actually fixed like in ansible-collections/community.rabbitmq#121 but not released yet.

* The question is if they should appear on Galaxy and there's nothing else to ship (no bugfixes, major/minor change), what maintainers should do? Would it be OK to consider such fixes bugfixes and to release a patch release containing only test fixes?

I should say: Yes.

* If so, should we recommend maintainers (at least, in the collection requirements) to release their collections on the spot after fixing sanity tests?

Generally: Yes. But I think "on the spot" might be a bit extreme, as soon as possible and preferably befor the next Ansible release would be sufficient, wouldn't it?

@felixfontein
Copy link
Contributor Author

Generally: Yes. But I think "on the spot" might be a bit extreme, as soon as possible and preferably befor the next Ansible release would be sufficient, wouldn't it?

I fully agree!

@dmsimard
Copy link

dmsimard commented May 13, 2022

A quick recap and then an update:

I've ran a comprehensive sanity test on every collection included in 6.0.0a2 (minus pylint) and found that 67 collections (out of 98) are currently failing sanity tests, data is here: https://gist.github.com/dmsimard/8a0526917398b19410581568947ee0dc

I'm not entirely positive if part of it could be due to ansible-test somehow not picking up the ignore files and need to double check but I figured I would at least share my preliminary findings for the time being and will update them if need be.

It turns out there was something wrong:

@felixfontein pointed out in ansible-collections/community.dns#100 that running tests from the release tarball is not ideal because of a symlink issue described in ansible-community/antsibull#218 so I will address that by the time we run tests again.

I've re-run sanity tests for every packaged collection, this time installed with ansible-galaxy collection install -r galaxy-requirements.yaml provided by the release of 6.0.0a2.

This time there were 38 failures out of 98 with many now passing and a few now failing (edit: this was mistakenly with 2.12, see edit at the end):
diff

I'll update the issues I've created and I've sent a WIP PR to ansible-build-data (changing the approach from within antsibull) so we can test this on a regular basis: ansible-community/ansible-build-data#122

Edit: While troubleshooting, I realized I mistakenly ran the tests with "ansible-galaxy collection install" from ansible-core 2.12 instead of 2.13.0rc1 🤦

It looks like that may be why there were less failures because it's back up to 59 now:
Screenshot from 2022-05-12 23-52-12

I apologize for the confusion, even though the comparison between 2.12 and 2.13 may be interesting.

@mariolenz
Copy link
Contributor

@dmsimard The community.dns issue is already closed, and I don't see that you've opened one in community.general. So it looks like these issues are left to be closed since the tests don't fail anymore:

Is it OK for you if I do?

@dmsimard
Copy link

@mariolenz your help would be appreciated, sure !

I'd still like to understand why it is that these particular collections are failing from the tarball (and not from galaxy collection install) and whether it is fixable or not, though.

My initial objective was to test the package itself as it is shipped so it's unfortunate that we get different results but we have to start somewhere and it may be a better investment to focus on the collections that are failing when installing with galaxy.

We can always circle back to the packaging later.

@mariolenz
Copy link
Contributor

@dmsimard I can't close those issues due to missing permissions :-/ But I've added a comment that they can be closed because of a false positive from our side.

@dmsimard
Copy link

@dmsimard I can't close those issues due to missing permissions :-/ But I've added a comment that they can be closed because of a false positive from our side.

I've closed the remaining ones, thanks !

@dmsimard
Copy link

Worked through some of the collections and updated linked issues above (will edit this list for a first batch of updates to keep the noise down):

  • amazon.aws issues are different from the release tarball vs ansible-galaxy collection install. Fixes are WIP.
  • chocolatey.chocolatey issues are the same as the release tarball, maintainers are responsive.
  • community.digitalocean issues are different from the release tarball vs ansible-galaxy collection install. Fixes are WIP.
  • hetzner.hcloud issues are different from the release tarball vs ansible-galaxy collection install. It has new failures with 2.13 that don't reproduce with 2.12, has a responsive maintainer.
  • kubernetes.core issues are different from the release tarball vs ansible-galaxy collection install. Has not yet been acknowledged.

@dmsimard
Copy link

I haven't gotten around to re-visiting every collection issue yet but in the meantime, 6.0.0a3 is upon us and I've run sanity tests again with the latest releases of collections.

We're down from 59 failures to 51 which is progress (full data here):
Screenshot from 2022-05-17 18-27-50

@mariolenz
Copy link
Contributor

mariolenz commented May 18, 2022

@dmsimard Most of the issues in the collections that don't fail any more have been closed already. I have added a comment that the tests seem to succeed in 6.0.0a3 to the two or three remaining.

vmware.vmware_rest is new, but I think the tests are fixed but not released yet: ansible-collections/vmware.vmware_rest#332

edit:

Still open:

openstack-mirroring pushed a commit to openstack/ansible-collections-openstack that referenced this issue Jul 27, 2022
distutils is deprecated in 3.10: https://peps.python.org/pep-0632/
Ansible requires it to be replaced[1]

[1] ansible-community/community-topics#96
ansible-collections/news-for-maintainers#18

Change-Id: I2bae37f206319e8f9ace468f5b94f6be643b6a3c
openstack-mirroring pushed a commit to openstack/ansible-collections-openstack that referenced this issue Sep 28, 2022
distutils is deprecated in 3.10: https://peps.python.org/pep-0632/
Ansible requires it to be replaced[1]

[1] ansible-community/community-topics#96
ansible-collections/news-for-maintainers#18

Change-Id: I2bae37f206319e8f9ace468f5b94f6be643b6a3c
(cherry picked from commit ccbbc31)
@mariolenz
Copy link
Contributor

@felixfontein Please close this issue if done, or open a new forum topic and then close the issue with a pointer to the new discussion: Community-topics: Archiving the repo

@felixfontein
Copy link
Contributor Author

I guess the place to continue the discussion is https://forum.ansible.com/t/testing-collections-within-the-ansible-package/2455. Thanks everyone!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants