(Review 1:) Add: package structure & build tools overview to packaging guide #23
Conversation
Potentially relevant: https://pypackaging-native.github.io/ (by @rgommers)
As a clarification for PR author/reviewers here: the audience of that site is mostly Python packaging experts. It's about documenting problems so we have a full view and can start working on solutions. If the audience of the pyOpenSci packaging guide is mostly beginning package authors and the main focus is pure Python, then best not to scare them with the hairy stuff, probably.
Hi @lwasser, again this is a giant step forward for us and you have done a huge amount of work integrating everything you've synthesized from everyone you've listened to. I think all the major parts are there and just need polishing up.
That said, the one thing I do feel is missing is a little bit more of a guided overview, maybe on the intro page? Similar to what @sneakers-the-rat suggested for docs, maybe something like @eriknw describes, that says "if you're this kind of developer, read this and then do this". Like, something with "personas" of who will be reading this part of the guide.
I have some strong feelings about the package structure page. I have done my best to edit myself so that I don't annoy you so much that you stop listening to me completely. But I still felt like I needed to at least explain a slightly different point of view on the review. I hope that's understandable.
Hi @lwasser , thanks for this very useful documentation!
I'm suggesting small nit comments inline and I have one more general comment/question about "pinning" dependencies.
This is an uncomfortable term and practice when you land in Python coming from other (older... :) ) languages and packaging ecosystems.
Perhaps its meaning is defined somewhere else? If not, I think it could be very useful to add an explanation on this page, as the packaging tools are also evaluated on their support for pinning dependencies.
Also, Python pinning best practices are cited (e.g. in the PDM description); I would be happy to see a linked reference to them in a small general paragraph about pinning.
In particular, I think it would be useful to make authors aware of https://github.com/scipy/oldest-supported-numpy, as numpy and scipy are very likely to be used by scientific Python packages, and pinning dependencies in inconsistent ways may lead to obscure installation errors.
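To make this concrete, here is a minimal sketch of the difference between build-time pinning and run-time constraints in a `pyproject.toml` (the package name and version numbers are made up; `oldest-supported-numpy` is the helper package linked above):

```bash
# Sketch only: write a minimal pyproject.toml for a hypothetical compiled package.
# "oldest-supported-numpy" pins the *build-time* numpy so wheels stay ABI-compatible
# with older numpy releases; the run-time dependency is a lower bound, not a hard pin.
cat > pyproject.toml <<'EOF'
[build-system]
requires = ["setuptools>=61", "wheel", "oldest-supported-numpy"]
build-backend = "setuptools.build_meta"

[project]
name = "my-sci-package"          # hypothetical package name
version = "0.1.0"
dependencies = ["numpy>=1.21"]   # run-time constraint: a floor, not an exact pin
EOF
```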
Hi everyone. The review for this section of the guide is now CLOSED while I work on updating things locally. Please refrain from adding new comments UNLESS I ping you (as I might for clarification as I work on each set of comments). Similar to our last round of reviews for documentation, once a new version of this PR is available I'll reopen it for another round of reviews. I suspect this might take more than two rounds given the complexity - but we shall see! Many thanks for all of the comments above.
package-structure-code/intro.md (excerpt):
> ...look like and what tools to use across the Python ecosystem.
>
> In this guide, we have made decisions around suggested standards and required
> standards, based upon the commonly used approaches in the scientific Python...
@eriknw I'm pulling out your comment below so I can respond and we can potentially chat about it a bit as I work locally to update:
> More directly: how much of this guide is based on what scientific Python packages currently do vs. what is recommended in the broader Python community? What would this guide look like if it were only based on looking at major and mid-major scientific Python packages?

I have been thinking about this a lot and chatting with folks across communities. I think I'm hoping to:
- Find a single workflow for those who are newer / getting started, BUT one using a tool that is extensible enough to support other backends (like Meson, for instance), such that someone who did need more complex build steps COULD still use the basic workflow. I say this KNOWING some will strongly disagree with parts of this, but we can't fully win here. The vision is some consistency local to the scientific community in the future!!
- Present the broader landscape of tools (maybe not including versioneer 😆) so people understand what each tool and option provides generally, while still directing folks to a particular workflow and approach.

And there is my dream. ✨ Can it happen? Please! :) I will need everyone's help getting there!
If the above could happen, then there is no need for two guides, EXCEPT for Ralf's guide, which is needed but also super technical. Maybe that is the guide for the larger and more complex packages?
Open to your thoughts...
I'd say there are two options. One is to show the standard tools that each do one job, but do it well. IMO, there's not much difference between:
flit build
poetry build
pdm build
And
pipx run build
Except the latter always works, even in CI, even in reusable workflows; you can put a version of it in nox/tox files, everywhere, even if you switch backends daily.
Same thing for all the alternatives and `pipx run twine upload dist/*`. Again, it doesn't matter what unified tool someone comes up with; this can be put into a GitHub Action and reused on all your projects.
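For example, a backend-agnostic build-and-publish flow along those lines might look like the sketch below (assuming a project with a valid `pyproject.toml` and PyPI credentials configured for twine):

```bash
# Build the sdist and wheel with the standard frontend, regardless of which
# backend pyproject.toml declares, then check and upload the artifacts with twine.
pipx run build
pipx run twine check dist/*
pipx run twine upload dist/*
```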
If you want to show just one frontend tool, then PDM seems to be the best option at the moment. You can use any backend, so it should work with scikit-build-core and meson-python. It has built-in commands for the various things you'd probably want to do. It's got lock files, too.
Hatch is another option, but not supporting arbitrary backends is a huge downside; no lock files yet is one too. The upside is that it basically can replace nox/tox, being a true "all-in-one" tool, while ideally you should still have a nox/tox file with PDM.
> Hatch is another option, but not supporting arbitrary backends is a huge downside

Hatch will be able to build projects that don't use Hatchling soon.

> PDM [...] got lock files

That's kind of a nebulous term, though: does it support source distributions, does the lock file work on all platforms, etc.? Just saying it has something doesn't mean much in this case without more details (the same goes for Poetry).

> PDM seems to be the best option at the moment [...] It has built in commands for the various things you'd probably want to do

How does that differ from what Hatch provides?
Poetry's lock files work well, though they make a mistake in metadata - they assume metadata can't change between wheels of the same version, which is incorrect, but very commonly good enough. PDM's files work great for the reason I use lock files - they allow me to control updates. I'm not sure how they work for security (which I think is the main issue people are worried about for SDists in a general lock file description). But being able to have a locked set of versions for an application you deploy (like a website) is crucial.

> How does that differ from what Hatch provides?

It doesn't. My point is that it provides a very useful core feature (the ability to control updates, similar to Gemfile.lock, package-lock.json, etc. - and really the only reason I use a frontend), plus all the other commands Hatch has as well (which, by the way, I never use - I always use `pipx run <build/twine>`).
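As a rough sketch of what "controlling updates" with a lock file looks like in PDM (the dependency name is just an example):

```bash
pdm lock              # resolve dependencies and pin exact versions into pdm.lock
pdm sync              # install exactly what the lock file records (reproducible installs)
pdm update requests   # deliberately update one dependency and re-lock it
```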
@henryiii, @ofek thank you for this... I wondered if you could clarify this, as I'm working on the Poetry part of this guide today (and am getting some great feedback from the maintainers as well).

> they (Poetry) assume metadata can't change between wheels of the same version, which is incorrect

Do you mean that a wheel for the same version of a package, but custom built for different operating systems or hardware (say, in an HPC environment), may have different metadata needs?
(excerpt from the guide):
> * Flit
> * Hatch
> * setuptools
> * PDM
Is this front-ends or back-ends? setuptools is a backend (along with flit-core, pdm-backend, and hatchling). There is no front-end for setuptools. It just uses standard tooling (build, pip, and twine).
Thank you @henryiii - @astrojuanlu also commented on this, about my table mixing back ends and front ends. TBH I'd used setuptools for years but didn't understand that it just isn't / doesn't have a front end. I'm still actually a bit confused, because build is what you use to build your package, but setuptools is the back end.
But what I plan to do is just mention that setuptools is a very mature packaging tool that supports backwards compatibility in packaging and is the most commonly used tool. Then in that table I will remove setuptools and add Poetry.
I'm going to suggest that users don't use setuptools and instead pick a modern front end that supports other back ends like Meson.
Does that sound reasonable?
How are we defining frontends? Do we mean "a CLI command that builds the project"? Do we mean "a project management tool providing various lifecycle helpers including, but not limited to: building, testing, installing, publishing to PyPI, environment management, editing project configuration, and more"?
Setuptools has a CLI command that builds the project -- it's just deprecated. You don't have to use `python -m build`; you can use `python setup.py bdist_wheel` if you don't mind getting a warning that someday they will remove this.
flit-core has `python -m flit_core.wheel`, which builds the project. Is this a frontend?
...
If the section intends to focus on the CLI user experience, then honestly I'd say just suggest the use of `python -m build`, "or you can use a full project management tool such as PDM or, in two minor releases, hatch". A guide to choosing options for getting started doesn't need to focus on the specifics of what tools have wired up their own private CLI. People who care about that will learn about it after they have chosen a tool and gotten comfortable with it.
If the section intends to focus on the configuration and support for build steps and other aspects of getting the software into a package, then avoid mentioning hatch or pdm -- mention hatchling and pdm-backend.
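To put the build commands mentioned above side by side (all three appear in this thread; which ones apply depends on the backend declared in `pyproject.toml`):

```bash
# Standard, backend-agnostic frontend: works with setuptools, flit-core, hatchling, pdm-backend, ...
python -m build --wheel
# Legacy, setuptools-only path: still works, but emits a deprecation warning
python setup.py bdist_wheel
# flit-core's own minimal builder (flit-core projects only)
python -m flit_core.wheel
```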
Perhaps this has been covered, but typical front-end uses would be ... ? I'd list:
1. build package
2. install package
3. install package as editable
4. bundle package into binary wheel
5. bundle source into sdist

That's what we typically need during a development workflow. Uploading I can always do with twine.
`python -m build` covers (4) and (5), with (1) implied?
There are some alternative workflows for (2) and (3), such as launching a Python interpreter with PYTHONPATH correctly set (like we do in devpy), but I think installs (+ editable installs) are preferable.
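A quick sketch of how standard tooling covers most of that list (nothing project-specific assumed beyond a working `pyproject.toml`):

```bash
# (1), (4), (5): build the package, producing the sdist and the wheel in dist/
python -m build --sdist --wheel
# (2): install the package into the current environment
python -m pip install .
# (3): editable install for development; source changes are picked up on import
python -m pip install -e .
```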
So I've included a few other things, @stefanv:
* [Build your packages (create the SDist and Wheel distributions)](python-package-distribution-files-sdist-wheel)
* Installing your package in a development mode (so it updates when you update your code)
* Publishing to PyPI
* Running tests
* Building documentation
* Managing an environment or multiple environments in which you need to run tests and develop your package

These front end tools have other bells and whistles. I think the major benefit here is that there is less cognitive load if you have similar commands. Sure, you can use twine to publish, but `flit build` / `flit publish` (sketched below) is going to be easier to remember, especially for a beginner.
Also, I'm learning that tools like Hatch have other bells and whistles, such as a nox-like tool, so you could skip the whole nox / Makefile part for pure Python builds. PDM and Hatch offer levels of environment management, with matrix testing offered by Hatch (I hope I'm getting these all right). So I think it's worth users understanding what tools provide.
Also, version bumping is slightly different across tools! Anyway, I'll push a new version of this all with a fancy table for all to look at and see if I've gotten closer to getting this all right!
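For instance, the Flit commands mentioned above look like the sketch below (assuming a Flit-configured `pyproject.toml` and PyPI credentials; Hatch and PDM have analogous build/publish commands):

```bash
flit build     # build the sdist and wheel
flit publish   # upload the built distributions to PyPI
```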
(excerpt from the guide):
> ## Use a pyproject.toml file for your package configuration & metadata
We have been discouraging linking to or even mentioning PEPs. Those are intended as proposals for implementation, not as long-standing documents. The canonical location is packaging.python.org's specs.
(excerpt from the guide):
> Moving away from the **setup.py** file provides two benefits:
> 1. Because setup.py has a mixture of code and metadata, it opens the user to a potential bad code injection on their computer when the package is installed.
Quick side note: setup.py (or pyproject.toml) is not used (or even included) in installing a wheel. If you are worried about security, you should ideally only use pre-built wheels. For SDist builds, you pretty much have to treat that as arbitrary code.
Static metadata is great and has other benefits, and setup.py has the terrible issue of being run twice, once as a dummy run, because it can't require anything it needs for itself (which made plugins for setuptools pretty much non-existent).
(excerpt from the guide):
> [PEP 518 describes the move away from setup.py to the pyproject.toml file.](https://peps.python.org/pep-0518/)
> Python package standards are moving away from including both package metadata and the Python code needed to set up a package installation in the same **setup.py** file. Instead, we are moving towards using a **pyproject.toml** file, sometimes combined with a **setup.cfg** file.
(excerpt from the guide):
> When you make a release on GitHub, it creates a SDist file (`.tar.gz`) that...
I would absolutely not call a GitHub release an SDist. An SDist is a Python Source Distribution. It contains a generated metadata directory. A GitHub tarball is a `git archive` output of the source repo, and will not contain that generated directory. It is not a valid SDist.
They are very similar files, but they don't have to be. If you are using setuptools-scm or another VCS versioning tool, they will add a version file to the SDist. (Nowadays, you can get a version burned into the git archive too.)

The PKG-INFO file is required, as per https://packaging.python.org/en/latest/specifications/source-distribution-format/#source-distribution-file-format, so the git archive is never an SDist. The egg-info directory is setuptools-specific; other systems won't add it, and that's fine. They are ultimately simply similar, but not the same file. For a beginner, I'd maybe mention that GitHub will bundle all git files as an archive, but that is not an SDist.
On the sdist vs. github autogenerated archives topic, maybe it helps to provide some wider context? The python concept of an sdist isn't unique to python -- the C/C++ communities, the autotools community in general, and to some extent the "multi-language projects" communities face the same challenges. By autotools convention -- and usually copied elsewhere -- there is the concept of a "dist tarball" which is not the same as a snapshot of the git revision control contents. A dist tarball is distributable source code, user-oriented rather than developer-oriented, and:
- You make this with `make dist`.
- Dist tarballs for autotools will include configure and Makefile.in, which are generated from templates (configure.ac, Makefile.am).
- Dist tarballs for any build system or project may contain git submodules, or possibly even generated sources.
- Dist tarballs can exclude stuff that isn't needed for distribution. This might include CI scripts and the like.

For Python, specifically, the generated metadata directory is Python's equivalent of the autotools configure script. It's really useful in many cases, and absolutely critical in some -- such as setuptools_scm's version finder, which requires either being run from a full git repo, or being run from an sdist that contains the generated metadata (which was generated in a full git repo). GitHub autogenerated tarballs contain effectively none of this functionality. And that actually results in build errors for projects that use setuptools_scm, if people attempt to use GitHub autogenerated tarballs for building.
(Or you can do that; I added support to git for this -- and I keep forgetting that setuptools_scm finally released a version that supports this.)

That only solves the one problem (version files), but it's great that git does support it now ❤️, and setuptools-scm 7+ supports it, though the user has to manually add a couple of files to make it work. The next release of setuptools-scm will no longer depend on setuptools, since it's also the tool behind hatch-vcs.
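One way to see the difference for yourself, as a sketch run from the root of a packaged project: build a real sdist and compare it with a `git archive` snapshot; only the sdist contains the generated PKG-INFO metadata.

```bash
pipx run build --sdist                    # build a proper source distribution into dist/
tar -tzf dist/*.tar.gz | grep PKG-INFO    # the sdist contains the generated metadata
git archive --format=tar.gz -o snapshot.tar.gz HEAD   # what GitHub's auto-generated tarballs are
tar -tzf snapshot.tar.gz | grep PKG-INFO || echo "no PKG-INFO: not an sdist"
```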
(excerpt from the guide):
> | Feature | Setuptools | Flit | Hatch | PDM |
Hi colleagues. I truly value respectful, open conversation surrounding open source topics. Please know that I appreciate y'all. However, it is also important that the discussions in our organization remain positive and supportive of each other's efforts. We can disagree, but there's no need to direct negative comments towards the work of others. Our code of conduct highlights this. In particular, we talk about the importance of avoiding micro-aggressions and combative words that place others in a position where they feel uncomfortable. Notably, @eli-schwartz, please consider this post a direct warning not to engage in such behavior going forward. I value your work and your thoughts and want your input, but it's counterproductive to our organization's shared goals to be adversarial. Thank y'all for understanding.
Just to avoid more off-topic discussion in python-poetry/poetry#2740, I'm answering your questions here.
I'm open to review. You can ping me here or, as mentioned by Secrus, ask your questions on our Discord server or in a new discussion.
@radoering ok, great - I'll ask questions in the discussion thread over in the Poetry repo and will ping you once I have a cleaned-up version that you can look at! Many, many thanks for the input and for being willing to answer questions! This is a work in progress and I'm trying to get as much input as possible to ensure it represents everyone's hard work on tooling in the community!
@all-contributors please add @astrojuanlu for code, design, review
I've put up a pull request to add @astrojuanlu! 🎉
@all-contributors please add @eli-schwartz for code, design, review
I've put up a pull request to add @eli-schwartz! 🎉
@all-contributors please add @rgommers for code, design, review
I've put up a pull request to add @rgommers! 🎉
I've put up a pull request to add @ofek! 🎉
Hey y'all - I am going to hide comments that I've addressed in the docs, because there are so many that I keep losing track of what I've addressed. Please do not take offense to your comment being hidden - this just means I've addressed it in the text. It's just a lot of information and I want to make sure that I have addressed EVERYONE's feedback. Many thanks!
Hi everyone! I've made significant updates to this chapter of our guide and as such welcome a second round of review here. I encourage constructive feedback from everyone that points out errors, issues, things we've missed, grammar, etc. in the text! I'd like to have feedback by March 12th, with the goal of working on it again (and merging it) the week of the 13th! Many thanks, y'all, for all of the contributions here!!
We have a CI build of the book if you want to have a look and just see what it looks like: https://app.circleci.com/pipelines/github/pyOpenSci/python-package-guide/131/workflows/1d0c1ca3-f7ff-4b61-a171-54d8409954b0/jobs/107/artifacts
Y'all - closing this PR - the most recent one, with ALL of the changes from this PR, was just merged!! #55
All - last ping from me here on this CLOSED PR. I've had this Zenodo file open for a while and just did a big update with everyone's names who has participated in the reviews and contributed to our work across both this PR and the other that was just merged yesterday! Please review and add to / edit the Zenodo file in this PR if you'd like your information displayed in the guidebook citation differently. If I missed you (I'm very sorry if that happened, but I hope it didn't!), please either comment on the PR with your info or add yourself via an inline comment. I don't want to miss anyone. I will push a new release of this guide with the updated Zenodo file next Tuesday, April 28th. You can still add your name or edit your details after that time, but only as a pull request, given the open one will be merged!! Many thanks!!
Hi y'all - one last reminder - I will merge this PR with the Zenodo DOI information tomorrow (Tuesday, Mar 28) and create a release of this repo at that time. Feel free to update / comment on the PR that is open if any of your information is missing. Many thanks, y'all, for the comments already there (I just committed them!)
I know this will likely need a good bit of work, but I look forward to feedback from everyone!!