This repository has been archived by the owner on May 28, 2024. It is now read-only.

pypi.org modules for "ansible" and "ansible-core" are mislabeled #185

Closed
nkadel opened this issue Sep 19, 2021 · 9 comments

@nkadel

nkadel commented Sep 19, 2021

SUMMARY

The pypi.org published module for "ansible" is no longer ansible. It contains dozens of modules from the ansible-collections, bundled in a single release tarball. It lists a dependency on the "ansible-core" module, which is itself the original ansible software, but in a separately mislabeled tarball. What was in the old "ansible" module has been relabeled as an "ansible-core" module.

This causes and will continue to cause confusion for python users who run "pip install ansible" and get both ansible and the bulky and unnecessary "ansible-collections" bundle. The resulting ansible-collections is roughly 4 MByte compressed, and generates a 2.5 MByte RPM, all of which is not necessary and should not be a welcome part of the default ansible installation.

Segregating it to an "ansible-collections" python module would be reasonable, and workable. Reverting the python module names and numbers will require caution: the "ansible" module is now labeled as version 4.5.0, even though the real "ansible" source code's most recent version is 2.11.5. It's very confusing.

ISSUE TYPE
  • Bug Report
@felixfontein
Contributor

The pypi.org published module for "ansible" is no longer ansible. It contains dozens of modules from the ansible-collections, bundled in a single release tarball. It lists a dependency on the "ansible-core" module, which is itself the original ansible software, but in a separately mislabeled tarball. What was in the old "ansible" module has been relabeled as an "ansible-core" module.

That is not correct. The Pypi package "ansible" has always been Ansible "with batteries included". This did not change between 2.9 (when everything was still one package) and 2.10 (when the core was moved to the ansible-base package, which was renamed to ansible-core in 2.11). ansible-core is ansible-core, and ansible is what is also called the Ansible community distribution.

This causes and will continue to cause confusion for python users who run "pip install ansible" and get both ansible and the bulky and unnecessary "ansible-collections" bundle.

This is not "unnecessary", since that's basically the batteries that have always been included as part of the Ansible package. It's only that since 2.10, it is now possible to install the "core" part of Ansible, i.e. the part "without batteries".

(And yes, things are still confusing, and documentation can definitely be improved - PRs are always welcome! -, but what you are requesting is not the solution.)

The resulting ansible-collections is roughly 4 MByte compressed, and generates a 2.5 MByte RPM, all of which is not necessary and should not be a welcome part of the default ansible installation.

If you don't want the batteries, simply install ansible-core.

Segregating it to an "ansible-collections" python module would be reasonable, and workable. Reverting the python module names and numbers will require caution: the "ansible" module is now labeled as version 4.5.0, even though the real "ansible" source code's most recent version is 2.11.5. It's very confusing.

ansible-core has version 2.11.5. Ansible has version 4.5.0.

If you are wondering about ansible --version showing 2.11.5, please note that ansible --version shows the version of the ansible CLI tool, which is part of ansible-core. Please take note of the "core" which is (since 2.11) part of the version output:

ansible [core 2.11.5]
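
To see both version numbers side by side, something like the following sketch may help. It assumes Python 3.8 or newer (for `importlib.metadata`); the helper names `parse_core_version` and `installed_versions` are made up for this illustration and are not part of Ansible.

```python
import re
from importlib import metadata


def parse_core_version(first_line):
    """Extract '2.11.5' from a line such as 'ansible [core 2.11.5]'.

    Returns None when the line does not use the '[core ...]' format,
    which is the case for releases older than 2.11.
    """
    match = re.match(r"ansible \[core ([^\]]+)\]", first_line.strip())
    return match.group(1) if match else None


def installed_versions():
    """Map each PyPI distribution name to its installed version (or None)."""
    versions = {}
    for name in ("ansible", "ansible-core"):
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # distribution is not installed here
    return versions


if __name__ == "__main__":
    print(parse_core_version("ansible [core 2.11.5]"))
    print(installed_versions())
```

On a system with the community package installed, the two entries reported by `installed_versions()` are expected to differ, matching the 4.5.0 versus 2.11.5 distinction described above.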

@tadeboro

The pypi.org published module for "ansible" is no longer ansible. It contains dozens of modules from the ansible-collections, bundled in a single release tarball. It lists a dependency on the "ansible-core" module, which is itself the original ansible software, but in a separately mislabeled tarball. What was in the old "ansible" module has been relabeled as an "ansible-core" module.

While the ansible Python package might not be released and maintained by the same team as it was before version 2.10.0, it is still the same batteries-included package that people expect. The main reason why the name did not change is backward compatibility: playbooks and roles that worked with Ansible 2.9 should still work with the more recent Ansible versions if you make sure you follow the upgrade guides.

This causes and will continue to cause confusion for python users who run "pip install ansible" and get both ansible and the bulky and unnecessary "ansible-collections" bundle. The resulting ansible-collections is roughly 4 MByte compressed, and generates a 2.5 MByte RPM, all of which is not necessary and should not be a welcome part of the default ansible installation.

From my experience, most casual Ansible users do not care about the split, and this is the target population that developers and the community tried to protect when they designed the way forward during the core/content split. And as I explained in the previous paragraph, going any other way would mean that some playbooks would stop working because some of the content would be missing.

More experienced Ansible users have the freedom to optimize their deployment systems in a number of ways: from installing ansible-core and then adding only required collections to building execution environments.
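
As a rough sketch of the first option (ansible-core plus only the collections you need), the helper below merely builds the commands and prints them. `build_install_commands` is a hypothetical name, and the collection names in the example are placeholders for whatever your playbooks actually use.

```python
import shlex
import sys


def build_install_commands(collections):
    """Return the commands for a minimal install: core first, then collections.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    commands = [[sys.executable, "-m", "pip", "install", "ansible-core"]]
    for name in collections:
        # ansible-galaxy is installed as part of ansible-core
        commands.append(["ansible-galaxy", "collection", "install", name])
    return commands


if __name__ == "__main__":
    # Example collection names only; substitute the ones you depend on.
    for command in build_install_commands(["community.general", "amazon.aws"]):
        print(shlex.join(command))
```

Printing rather than running the commands keeps the sketch safe to try; passing each list to `subprocess.run(command, check=True)` would be one way to execute them.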

Segregating it to an "ansible-collections" python module would be reasonable, and workable. Reverting the python module names and numbers will require caution: the "ansible" module is now labeled as version 4.5.0, even though the real "ansible" source code's most recent version is 2.11.5. It's very confusing.

Again, renaming the package is not an option here because the ansible PyPI package was always a batteries-included bundle that just happened to live in one GitHub repo. But even in those times, the core team only maintained a small fraction of plugins and modules.

@nkadel
Author

nkadel commented Sep 19, 2021

I just checked. Up until at least the pypi.org published ansible 2.9.25, the "ansible" published in pypi.org differed only slightly from the git repo tags at https://github.com/ansible/ansible. Apparently, somewhere among the 2.10 builds, someone decided to replace the contents of the original "ansible" tarball with the "ansible_collection" tarball, and divorce it from the upstream github repo called "ansible", and rely on installing a new pypi module called "ansible-core" as a dependency.

This is now very confusing, and even dangerous. There is no pointer to the relevant upstream git repo or provenance for the published ansible_collections tarball being used, and the documentation incorrectly points to https://github.com/ansible/ansible, all of which is now instead in the "ansible-core" module. Dozens if not hundreds of vendor published "pip install ansible" steps, including those published for Ansible tower and AWX, need to be revised to point to "pip install ansible-core" to avoid a bulky and unwelcome bundle of potentially conflicting or incompatible modules which they don't need or want.

If I want a 20 pound accessory kit to go with my good 1 pound electric screwdriver, I'd like the box it comes in to be labeled "powered screwdriver accessories", not to be labeled "powered screwdriver" and find a sign in it saying "go get the other box labeled powered-screwdriver-core". It's confusing, and it multiplies both the size and the installation time for "pip install ansible" by a factor of about 25.

Maybe not that much increase in time, the ansible test scripts are pretty thorough. But it burns a lot of time very few people need to spend.

@tadeboro

I think you are missing a few pieces of information. Did you read through https://www.ansible.com/blog/ansible-3.0.0-qa and https://www.ansible.com/blog/announcing-the-community-ansible-3.0.0-package? Those two blog posts should give you a bit more information about what is happening. But just for the sake of completeness, let me briefly summarize the situation here.

When you run pip install "ansible<2.10", what you basically get is the contents of the https://github.com/ansible/ansible/tree/stable-2.9 that includes:

  1. the core execution engine,
  2. a few general-purpose modules (things like package, service, user, group, copy, template, etc.),
  3. a selection of OpenStack modules,
  4. a selection of AWS modules,
  5. a selection of modules for managing Grafana,
  6. ...

So whether you like it or not, you are installing the 20 pound accessory kit (ca. 4000 modules and plugins) when you install Ansible 2.9. But things changed with the release of Ansible 2.10.

When you run pip install "ansible>=2.10", you still get:

  1. the core execution engine,
  2. a few general-purpose modules (things like package, service, user, group, copy, template, etc.),
  3. a selection of OpenStack modules,
  4. a selection of AWS modules,
  5. a selection of modules for managing Grafana,
  6. ...

The difference is that the ansible/ansible repository now only contains the things listed under 1 and 2 (those are packaged as the ansible-core PyPI package), while the rest of the content is sourced from Ansible Galaxy and combined into the ansible PyPI package, which depends on ansible-core.
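
One way to check this dependency on an installed system is to read the package metadata. This is a minimal sketch, assuming Python 3.8 or newer; `dependency_names` and `installed_dependencies` are hypothetical helpers, and the requirement string in the example is illustrative rather than copied from a real release.

```python
import re
from importlib import metadata


def dependency_names(requirements):
    """Reduce requirement strings to lowercase project names.

    Accepts None, because importlib.metadata.requires() returns None
    for a distribution that declares no dependencies.
    """
    names = []
    for requirement in requirements or []:
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
        if match:
            names.append(match.group(1).lower())
    return names


def installed_dependencies(dist_name):
    """List the declared dependencies of an installed distribution."""
    try:
        return dependency_names(metadata.requires(dist_name))
    except metadata.PackageNotFoundError:
        return []  # the distribution is not installed in this environment


if __name__ == "__main__":
    # Illustrative requirement string, not taken from a real release.
    print(dependency_names(["ansible-core<2.12,>=2.11"]))
    print(installed_dependencies("ansible"))
```

If the community package is installed, the second call is expected to include `ansible-core` in its result.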

I just checked. Up until at least the pypi.org published ansible 2.9.25, the "ansible" published in pypi.org differed only slightly from the git repo tags at https://github.com/ansible/ansible.

This is correct. Up until Ansible 2.10, tags in ansible/ansible repo and releases of the ansible PyPI package were in sync.

Apparently, somewhere among the 2.10 builds, someone decided to replace the contents of the original "ansible" tarball with the "ansible_collection" tarball, and divorce it from the upstream github repo called "ansible", and rely on installing a new pypi module called "ansible-core" as a dependency.

This is also true, but you are missing one important fact here: during the 2.10 development cycle, the ansible/ansible repo lost almost all modules that were previously part of that repo. During that time, the number of plugins and modules that were part of ansible/ansible dropped from ca. 4000 to a few tens (to what is listed in https://docs.ansible.com/ansible/latest/collections/ansible/builtin/index.html).

This is now very confusing, and even dangerous. There is no pointer to the relevant upstream git repo or provenance for the published ansible_collections tarball being used,

There is no upstream repo for the ansible>=2.10 packages because, as I said before, this package is built from the content available in Ansible Galaxy. There is a repository that contains metadata about the ansible package at https://github.com/ansible-community/ansible-build-data, but there is no single repo that would contain all files that are part of the built Python package.

and the documentation incorrectly points to https://github.com/ansible/ansible, all of which is now instead in the "ansible-core" module.

As Felix said before, there are still issues with the docs, and contributions/clarifications are more than welcome.

Dozens if not hundreds of vendor published "pip install ansible" steps, including those published for Ansible tower and AWX, need to be revised to point to "pip install ansible-core" to avoid a bulky and unwelcome bundle of potentially conflicting modules which they're unlikely to need or even want.

This is not true. If they used Ansible 2.9 before, the only safe option for them is to keep using the ansible PyPI package because ansible-core is missing most of the content that is part of the ansible==2.9.* package.

If I want a 20 pound accessory kit to go with my good 1 pound electric screwdriver, I'd like the box it comes in to be labeled "powered screwdriver accessories", not to be labeled "powered screwdriver" and find a sign in it saying "go get the other box labeled powered-screwdriver-core". It's confusing, and it multiplies both the size and the installation time for "pip install ansible" by a factor of about 25.

I have a feeling that you are thinking about Ansible 2.9 as a "powered screwdriver" while it was always a "powered screwdriver and accessories and kitchen sinks" thing. Ansible Core is what one would call a "powered screwdriver", but this did not exist in the Ansible 2.9 world.

@nkadel
Author

nkadel commented Sep 19, 2021

The ansible-2.9 world consisted mainly of the "powered screwdriver", by line count and by number of files.

Whether or not folks have come to expect that "kitchen sink", they don't really need the kiddie pool, the Olympic diving board, David Hasselhoff as a life guard, and a golden retriever to collect ducks in hunting season. Most of the rest of the ansible_collection is unnecessary and would be unwelcome for most installations, especially since it now has 140 distinct license files embedded in it. Without getting into the wisdom and instability of creating such large bundles of disparate software, I'll suggest that it should be segregated. Move it aside to an "ansible_collections" module. If the pypi "ansible" module needs to remain linked to these other components, then make "ansible" just require "ansible-core" and "ansible_collections", to better manage the resources and make more clear that "ansible-core" has the vital components, the others are add-ons.

I'd be happy to help do this if I can. I mostly work with RPM packaging of python modules, which is why I noticed this, and have published roughly 500 such SRPMs.

@tadeboro

The ansible-2.9 world consisted mainly of the "powered screwdriver", by line count and by number of files.

The number of lines did grow substantially during the 2.10 development cycle. The main reasons for this growth are:

  1. The Ansible package now also contains test code that was not packaged before. We are working on this one and track the progress in "Smaller ansible packages (no tests/ dir?)" (ansible-community/community-topics#29).
  2. Symlinks in the collections are converted to file copies during the packaging process. The issue for this problem is "Ansible tarball duplicates symlinked modules" (ansible-community/antsibull-build#218), where you can also find some linked issues in the upstream project.
  3. Some new content was added during the 2.10 development cycle.
  4. Splitting modules and plugins into collections did introduce some additional metadata that needs to be part of the collection.

I did some measuring in order to quantify the effects the split had on the Ansible installation size and these are the results I got:

  1. Ansible 2.8.10 takes 96 MB on disk.
  2. Ansible 2.9.26 takes 121 MB on disk.
  3. Ansible 2.10.7 takes 11 MB for core and 363 MB for collections, totaling 374 MB.

If we remove the size increases due to reasons 1 and 2, the size of Ansible 2.10.7 decreases to 11 MB for core and 130 MB for collections, bringing its total size down to 141 MB. This means that Ansible grew less during the 2.10 development cycle compared to the 2.9 development cycle.
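
The arithmetic behind that comparison can be written out as a short sketch. All figures are the ones quoted in this comment (in MB); the variable and function names are made up for the illustration.

```python
# On-disk sizes in MB, as reported in the measurements above.
ANSIBLE_2_8 = 96
ANSIBLE_2_9 = 121
CORE_2_10 = 11
COLLECTIONS_2_10 = 363
# Collections size after discounting packaged tests and symlink copies.
COLLECTIONS_2_10_ADJUSTED = 130


def summarize():
    """Recompute the totals and per-cycle growth from the quoted figures."""
    total = CORE_2_10 + COLLECTIONS_2_10
    adjusted_total = CORE_2_10 + COLLECTIONS_2_10_ADJUSTED
    return {
        "total_2_10": total,
        "adjusted_total_2_10": adjusted_total,
        "packaging_overhead": COLLECTIONS_2_10 - COLLECTIONS_2_10_ADJUSTED,
        "growth_2_8_to_2_9": ANSIBLE_2_9 - ANSIBLE_2_8,
        "adjusted_growth_2_9_to_2_10": adjusted_total - ANSIBLE_2_9,
    }


if __name__ == "__main__":
    for label, megabytes in summarize().items():
        print(f"{label}: {megabytes} MB")
```

With these inputs the adjusted growth for the 2.10 cycle comes out at 20 MB, against 25 MB for the 2.9 cycle, which is the basis for the "grew less" statement.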

There is not much we can do about the tests being included until we hear back from legal, but preserving the symlinks is definitely something we can do. And if you would be willing to help us here, that would be great.

Whether or not folks have come to expect that "kitchen sink", they don't really need the kiddie pool, the Olympic diving board, David Hasselhoff as a life guard, and a golden retriever to collect ducks in hunting season. Most of the rest of the ansible_collection is unnecessary and would be unwelcome for most installations, especially since it now has 140 distinct license files embedded in it.

I am not sure what you are trying to say here. The Ansible package was growing fast even before the content was split into individual collections. In fact, this was one of the main reasons why it was split in the first place. So all those "unnecessary" things were always part of Ansible and are not something that we started adding after Ansible was split. I hope the size increase between Ansible 2.8 and 2.9 makes this clear.

As for the licenses, there are quite a few of them used in the package, but they are all GPLv3 compatible and the package itself is released under GPLv3.

Without getting into the wisdom and instability of creating such large bundles of disparate software, I'll suggest that it should be segregated. Move it aside to an "ansible_collections" module. If the pypi "ansible" module needs to remain linked to these other components, then make "ansible" just require "ansible-core" and "ansible_collections", to better manage the resources and make more clear that "ansible-core" has the vital components, the others are add-ons.

The Ansible PyPI package must remain in place for the backward compatibility reasons I mentioned a few times already. So this package is not going anywhere anytime soon.

If it were up to me to decide if the community package lives or not, Ansible 2.9 would be the last version of the batteries-included package. And in my ideal world, people would only use ansible-core plus those few collections that they really need. But the reality is quite different. Lots of people use the all-in-one package because it is convenient and because they were always doing things this way. And judging by the number of inclusion requests (https://github.com/ansible-collections/ansible-inclusion/discussions), collection authors also feel that being included in a common package is beneficial to the user.

Making the Ansible package a "virtual" package that exists just to pull in certain dependencies would introduce more administrative work for very little benefit. Since both packages would need to be updated at the same time and their versions kept in sync, it makes no sense to go in that direction.

@nkadel
Author

nkadel commented Sep 19, 2021

I thought the "David Hasselhoff as a lifeguard" metaphor was pretty clear. I won't take on trying to talk collection authors out of default inclusion, though I think it's a horrible, horrible idea. I've worked in environments that believed in the "monolithic, guaranteed to work" tarball. The question isn't whether it will break down, the question is when. I suspect it is when different sub "collections" are compatible only with distinct versions of ansible-core, or even different versions of python 3.

I'm not in a position to insist on changing this layout, but I do think that migrating ansible_collections away from the "ansible" module itself would help clarify the layout. Having the "ansible" python module install its python components to the "ansible_collections" folder is unwelcome confusion. There are a number of python modules with different directory names than their module names, and it just confuses people.

@tadeboro

There is not much we can do about the naming. The core team controls the engine part naming (when you install the ansible-core package, you get ansible and ansible_test folders in site-packages), so there is very little we (the community) can do about this one. And since that part is supported by Red Hat, those things are not just technical problems, they are also marketing and business problems.
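
To see which top-level folders each distribution actually places in site-packages, a sketch along these lines could be used. It assumes Python 3.8 or newer; `top_level_names` and `installed_top_level` are hypothetical helper names, and the output depends entirely on what is installed locally.

```python
from importlib import metadata
from pathlib import PurePosixPath


def top_level_names(paths):
    """Collect the first folder of each installed file, ignoring metadata."""
    names = set()
    for path in paths:
        parts = PurePosixPath(str(path)).parts
        if len(parts) < 2:
            continue  # a lone file at the top of site-packages, not a folder
        first = parts[0]
        if first == ".." or first.endswith((".dist-info", ".egg-info")):
            continue  # scripts outside site-packages, or packaging metadata
        names.add(first)
    return names


def installed_top_level(dist_name):
    """Return the top-level folders owned by an installed distribution."""
    try:
        files = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        return set()
    # files() may return None when the installer recorded no file list.
    return top_level_names(files or [])


if __name__ == "__main__":
    for dist in ("ansible", "ansible-core"):
        print(dist, "->", sorted(installed_top_level(dist)))
```

The file list comes from the installer's record, so distributions installed by a system package manager may report nothing here.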

And I hope I also explained why renaming the ansible PyPI package is not really an option (backward compatibility) and why maintaining two separate packages for no real benefit is a no-go for now.

So as things stand, there is not much we can actually do, or that would make sense to do, apart from educating users. I will be the first one to admit that the situation is not ideal, but it is what it is. And I have a feeling that renaming the packages will not resolve any issues, since people who do not like the all-in-one package can opt to start using ansible-core and call it a day.

@felixfontein
Contributor

Closing since the naming is intentional.
