Help Wanted: Guidance on Python Package Metadata #3
Comments
@makkus, you mentioned interest in Python support. I've done quite a bit of work in the past few weeks to prepare to support more package ecosystems, like Python. Maybe you could help me with a few high-level implementation questions? |
Collecting some reference: |
Sure, I'll have a look. Might not get to it before next week, but would be happy to help. |
One thing worth mentioning straight away is that there is some movement in the Python community to come up with a better way to manage package dependencies and project setup in general. Which might or might not mean that setup.py will go away in the future (not short term, but at some stage). One contender is Pipenv (https://docs.pipenv.org/), the other one Poetry (https://github.com/sdispater/poetry). So, not relying too much on setup.py might be a good idea. Not 100% sure yet, will have a think over the weekend... |
@makkus, I'd be very grateful! If you need support for the work, let me know. I'd also be happy to mention you on https://licensezero.com/thanks, if you're alright with that. |
I heard a bit about this from @anseljh at some point. Sounds like a freestanding, self-contained License Zero metadata file might be the way to go. |
Probably. And even if not, having that sort of generic option might be a good idea in any case, as that would provide a (even if just rudimentary) fallback for all kinds of as-of-yet unsupported languages/project types. |
@makkus That's a thought. I briefly considered I'd want to give some thought to how to link a freestanding |
Ok, I've thought about this a bit in the last few days, and there is definitely more thinking to do, but here are a few points I think are relevant:
The following really only applies to 'traditional' Python projects using setuptools and not Pipenv or Poetry (I've tried both but wasn't prepared to spend the time switching to one of them yet, for different reasons). Parts of it will most likely apply for any sort of project-setup, but I'm not sure. Ok, a bit of background:
Now, I think the most realistic way of analyzing a project is requiring it to be in a virtualenv, and work with the dependencies that come up when one does a

The main issue will be how to get to the

There are a few tools out there that can do Python dependency analysis for different purposes (license-check, dependency-graphs, etc.). Those are all written in Python though, and would have to be installed to be of use. I never really needed any of them, so can't really comment on whether there are any out there that would make licensezero's job easier. Might be worth thinking about publishing a minimal licensezero python package that could be installed into a development environment and provide a few helper functions. Not sure.

There are also some tools that do dependency analysis based on metadata published by the pypi index, but that brings up the problem of where to put licensezero metadata so it can get accessed by such a tool.

That's all I can think of for now, feel free to ask for clarification if necessary. |
Before I forget: sometime in the next couple of weeks I'm planning to set up a private python index for development and trying out some things. It won't be stable or anything, but could be used to test publishing python projects and retrieving dependencies. If you want, I could give you access for your testing. You could also use https://test.pypi.org for that though. Also, if you want, I can throw together a templated Python project that you can use to easily create multiple projects, to test your cli. And which maybe could be developed into a sort of documentation or example project for aspiring licensezero Python devs. |
Just thought of another command that could potentially be useful: It gives output like this:
This is the relevant portion of the corresponding
So, you could iterate through the output of |
If you wanted to get really fancy, every licensezero Python project could be required to have a sort of 'license' entry_point, which would point to a function that would return license information. That way you could write a short python app that would automatically list all such projects on the PYTHONPATH. The result would then be used by the licensezero cli. I'd probably still go with a pure file-based approach like the one you are trying at the moment, so this is just another option if that doesn't work out for some reason. |
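A rough sketch of the entry point idea, for illustration only. It assumes Python 3.8 or newer (for `importlib.metadata`) and a made-up entry point group name, `licensezero.license`; nothing in this thread fixes that name or the shape of the returned data.

```python
from importlib import metadata

# Hypothetical entry point group; not an established License Zero convention.
GROUP = "licensezero.license"


def license_entry_points(group=GROUP):
    """Return the entry points registered under `group` in this environment."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and newer
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8 and 3.9 return a dict


def collect_license_info(group=GROUP):
    """Call each registered function and map entry point name to its result."""
    info = {}
    for ep in license_entry_points(group):
        try:
            info[ep.name] = ep.load()()
        except Exception as exc:  # one broken package should not abort the scan
            info[ep.name] = {"error": repr(exc)}
    return info


if __name__ == "__main__":
    for name, data in sorted(collect_license_info().items()):
        print(name, data)
```

Loading an entry point imports the package that declares it, so this runs third-party code, which may be an argument for the file-based approach.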
License Zero licenses close the loophole for proprietary use of dev tools! The |
This is basically how we're handling RubyGems at the moment. That approach initially seemed unbearably slow. But I was able to speed it up by reading through |
@makkus, this is just incredible feedback. You obviously took significant time just to type out these responses, which I've read from top to bottom. Would you be alright with a public nod on https://licensezero.com/thanks? |
I agree with @makkus's super detailed writeup. Good job! I can speak to Pipenv a bit. Pipenv is not doing anything terribly fancy. It has essentially these elements:
A Also important to note that Pipenv is only for application dependencies; it is not meant for libraries. See https://docs.pipenv.org/advanced/#pipfile-vs-setuppy. |
Thanks, @anseljh! Your note on application dependencies was especially appreciated. Python dependency management sure doesn't sound fun. I have a lot of reading to do here. I'll try to bring as many of those notes together as I can. In the end, It's Complicated, but I want to do what I can to add meaningful support. |
Yes, the problem I thought could come up in Python projects is that a dependency might or might not be in the virtualenv, depending on the circumstance. I'm not quite sure what the expectation is for when exactly the licensezero cli app will be run. Ideally it'd be when all -- both dev tools and dependency libraries -- are already installed and available in a virtualenv. Some tools are only necessary in a CI pipeline, for example, or when doing integration testing, so they might not be present when checking for licensezero libraries. I usually have a few different virtualenvs for the same project, depending on what I try to do. Even in my development env I don't have all libraries installed that I use across the whole lifecycle of the project.

Now, I don't think that's too difficult to deal with for most cases, more a matter of how sure you want to be to catch everything in every circumstance. Maybe just a case of documentation and explaining those limitations to users.

I'm still torn, but the more I think about it, the more I like the idea of having a little license 'helper' library in the dependencies (should be lean enough, and not have its own dependencies of course) of a project. Haven't really thought about how to do it technically, but even just inheriting a parent Licensezero class might be enough for this library to find all sub-classes in the current environment. This could easily expose the metadata and maybe a few more convenience functions to the licensezero cli (which would just have to call something like

Developers have to do some work to 'enable' licensezero anyway, doing it in code (or alternatively by adding one dependency to setup.py and the licensezero.json file to the root of the project and in the Manifest) wouldn't be much worse (or even better, since it could also contain some code to auto-generate said licensezero.json file and somesuch).
And I think something like this would be more reliable than any combined pip/filesystem-trawling approach. It could still not find any dependencies that are not currently present in the virtualenv of course. It would mean to have a language/project-type specific approach though, which would be more work to maintain/develop than having a generic solution. So if that can be done, it's probably still the more sensible way to go. |
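To make the 'inherit a parent class' idea a bit more concrete, here is a minimal sketch. The class name `LicenseZeroProject`, its attributes, and the placeholder project ID are all hypothetical; no such helper library exists in this thread.

```python
class LicenseZeroProject:
    """Hypothetical base class that a small helper library might provide."""

    # Attribute names are assumptions made for this sketch.
    project_id = None
    metadata_file = "licensezero.json"

    _registry = []

    def __init_subclass__(cls, **kwargs):
        # Record every subclass at the moment its class body is executed.
        super().__init_subclass__(**kwargs)
        LicenseZeroProject._registry.append(cls)

    @classmethod
    def discovered(cls):
        """Return every subclass defined in modules imported so far."""
        return list(cls._registry)


# A package would opt in by declaring a subclass somewhere in its code.
class ExampleProject(LicenseZeroProject):
    project_id = "example-project-id"  # placeholder value


if __name__ == "__main__":
    for project in LicenseZeroProject.discovered():
        print(project.__name__, project.project_id)
```

One limitation worth noting: a subclass only registers once its module has been imported, so a real helper would still need some way to find and import each candidate package first.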
Sure, of course. |
Is it necessary to autodetect the packages? I would prefer just feeding the licensezero app a requirements.txt file, additionally, this would let me automate the detection of licensezero libraries across several projects at once. |
@chason, thanks for your input! Does |
Yes, usually it lives in the root of the project. But not always. I'd say the majority still uses requirements.txt files for their dependencies, but it's not an overwhelming majority by any means. My dependencies are in setup.cfg, for example. Then there are people who use setup.py directly, and some that use pipenv. And poetry is getting more and more traction too. Also, a requirements.txt file only holds the names (and sometimes version numbers) of other packages, no other metadata at all. So there would definitely be more work involved than just parsing that file. Also, what about dependencies of dependencies?

It's going to be really difficult if you want to cover a meaningful portion of everything that is out there, even more so going forward, since nobody knows yet how most Python projects will be packaged in the future. If I had to bet, I'd say poetry, and the more or less new standard pyproject.toml, will win, but currently the tooling is just not there yet.

To make all that even more difficult, since Licensezero often also applies to development tooling, not just the dependencies that will be bundled with a release need to be discovered, but also development dependencies. Those are typically declared in separate files in Python (actually, this is what best practices usually recommend using requirements.txt for -- production dependencies should live in setup.cfg or setup.py -- if you don't use any of the newer tools). Anyway, big messy mess I think.

@chason: how would you implement your suggested requirements.txt-parsing? How would the licensezero app be able to find all licensezero-licensed projects by just using one requirements file? Installing a dummy virtualenv? I honestly can't think of a non-overengineered and inexpensive way to implement this. I'd guess it'd probably be a good idea to utilize pip-tools in some way.
Personally, I think the most future-proof and user-friendly way that covers most cases would probably be if it'd be possible to run the licensezero app in a virtualenv that typically has development as well as 'normal' dependencies installed (which all developers have anyway, since it is their working environment), and autodetect all licensezero project files either by looking for certain filenames in a certain location, or by having a Python class that inherits from a licensezero 'base' class, and which could be discovered via Python-native ways. I'd be happy to go into more detail, or write a prototype. But could well be I'm overlooking something obvious and all this is much easier. |
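To illustrate the point about requirements.txt carrying only names and version specifiers, here is a deliberately simplified parser sketch. It is an assumption-laden toy: it skips option lines and URLs and does not follow `-r` includes, which pip itself handles.

```python
import re

# Matches the leading project name of a requirement line, e.g. "requests>=2.0".
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(text):
    """Return the project names listed in requirements.txt-style text.

    Simplified: ignores comments, blank lines, option lines such as
    "-r other.txt" or "-e .", and direct URLs.
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _NAME.match(line)
        if match:
            # Lowercase and unify underscores, a partial name normalization.
            names.append(match.group(1).lower().replace("_", "-"))
    return names


if __name__ == "__main__":
    sample = "requests>=2.0\n# a comment\n-r dev.txt\nFlask_Login==0.4  # pinned\n"
    print(requirement_names(sample))
```

All this yields is a list of names. Locating license metadata for each name, and for the dependencies of those dependencies, still requires an index lookup or an installed environment.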
Is there some tool in Python that abstracts over all the various dependency-management solutions, and outputs a report? |
Not that I know of. There are a few version and license checkers and such, but I have never found one that works well and reliably. And usually they only work in virtualenvs anyway, that is, when all dependencies are installed. And once that is the case, it's not that difficult to read a few environment variables and search for relevant metadata files with your go application (I think). But, as I said, I'm not 100% sure I'm not missing an obvious, easy solution.... |
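As a sketch of the 'search for relevant metadata files' step, written in Python rather than Go for consistency with the rest of the thread: the standard library's `sysconfig` reports where the running interpreter keeps installed packages, and `VIRTUAL_ENV` is the variable that virtualenv activation scripts set. Looking only one directory level deep is an assumption of this sketch.

```python
import os
import sysconfig
from pathlib import Path


def site_packages_dirs():
    """Return the site-packages directories of the running interpreter."""
    paths = sysconfig.get_paths()
    # purelib and platlib are often the same directory, so de-duplicate them.
    return sorted({Path(paths["purelib"]), Path(paths["platlib"])})


def find_metadata_files(filename="licensezero.json", roots=None):
    """Find `filename` directly inside package or dist-info directories."""
    found = []
    for root in roots if roots is not None else site_packages_dirs():
        root = Path(root)
        if root.is_dir():
            found.extend(sorted(root.glob("*/" + filename)))
    return found


if __name__ == "__main__":
    print("VIRTUAL_ENV =", os.environ.get("VIRTUAL_ENV"))
    for path in find_metadata_files():
        print(path)
```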
I think this can be kept to a fairly simple workflow.
Essentially the idea is that we should separate the concerns into a phase that is dependent on the specific python environment, and a phase that does not care about the specific python environment. Since only the python specific parts really need to care about which python environment they are running in, this feels like it should be more flexible and help fit within the still shifting python packaging landscape. P.S. You can request the creation of new trove classifiers that python packages can then use in their package metadata. There are a few currently open examples and plenty of closed examples to show the complete process here https://github.com/pypa/warehouse/labels/classifier%20request. I would suggest opening an issue requesting the addition of some classifiers similar to these rough examples based on the existing ones:
|
@techdragon that comment was gold! Thank you so much! I’m particularly grateful to read your recommendation on the license identifiers. As for invoking local Python, is there a short script you can think of to print the file system paths of dependencies of the Python project at the current working directory? Even a partial or toy attempt would be greatly appreciated. I think I would be fine hard coding a Python script of the kind into the L0 CLI and piping it to |
For introspecting a given python environment, there are a few possible options depending on how a package will 'signal' that it is using a License Zero based license. If you want to drive it entirely off the trove classifiers, then it could be as simple as this. I've used a full string match for

    import pkg_resources

    # Print every installed package whose metadata contains the classifier line.
    [print(package) for package in list(
        pkg for pkg in pkg_resources.working_set if (
            'Classifier: Programming Language :: Python :: 3' in list(
                pkg.get_metadata('METADATA' if pkg.has_metadata('METADATA') else 'PKG-INFO').splitlines()
            )
        )
    )]

It converts into a short enough command that can be run via bash/shell to inspect the current

    python -c "import pkg_resources; [print(package) for package in list(pkg for pkg in pkg_resources.working_set if ('Classifier: Programming Language :: Python :: 3' in list(pkg.get_metadata('METADATA' if pkg.has_metadata('METADATA') else 'PKG-INFO').splitlines())))]"

This approach would also extend to doing a partial string match for something like

There are standards about the availability of specific metadata about a package, including the ability to mark the license the package is using: https://packaging.python.org/specifications/core-metadata/#license. The majority of tools in the python packaging ecosystem support and respect package authors' choices with regards to the selection of any license related metadata they wish, such as the use of multiple license files documented here in the wheel package format documentation: https://wheel.readthedocs.io/en/stable/user_guide.html?highlight=dist-info#including-license-files-in-the-generated-wheel-file

Alternately, you can continue the use of the

This code isn't quite as self contained of an example as it will require either a tempfile, or shell heredoc type passing of the program in via stdin, due to issues producing code that can handle the range of errors that might be produced when importing some packages.

    import pkg_resources

    for package in pkg_resources.working_set:
        try:
            # resource_exists imports the package, which can fail in many ways.
            print(package.key, pkg_resources.resource_exists(package.key, 'licensezero.json'))
        except Exception:
            print(package.key, None)

Additionally there is the possibility of using the built in package metadata mechanisms as a way of storing the content of the

    import pkg_resources

    for package in pkg_resources.working_set:
        try:
            if package.has_metadata('LICENSE'):
                print(package.get_metadata('LICENSE'))
        except Exception:
            pass

Edit: I only tested these code fragments with the latest stable version of python (3.7.x). But there should be other ways to do it if these don't work, since these are some of the things that pip is built on top of. So this should definitely be possible on any useful version of python. |
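For readers on newer Pythons: `pkg_resources` has since been deprecated by setuptools in favour of `importlib.metadata`, which is in the standard library from Python 3.8. The sketch below shows roughly the same classifier check with that module. The classifier prefix is passed in as a parameter, since no License Zero classifier is defined in this thread.

```python
from importlib import metadata


def distributions_with_classifier(prefix):
    """Return names of installed distributions with a classifier starting with `prefix`."""
    names = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            classifiers = dist.metadata.get_all("Classifier") or []
        except Exception:  # skip installs with unreadable metadata
            continue
        if name and any(c.startswith(prefix) for c in classifiers):
            names.add(name)
    return sorted(names)


if __name__ == "__main__":
    # Same example classifier as in the snippets above.
    for name in distributions_with_classifier("Programming Language :: Python :: 3"):
        print(name)
```

Unlike the `resource_exists` variant, this reads only installed metadata and does not import the packages it inspects.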
Hey @kemitchell, not sure if that's helpful in any way, but I've set up a dummy Python project from my personal Python project cookiecutter template, for you to play around with: https://gitlab.com/makkus/license_test_project

It's using pyenv to set up a virtualenv, but I guess as @techdragon says, if you assume everything relevant is in the current virtualenv, then it doesn't matter how that virtualenv is set up. If you are familiar with a different way of setting up a virtualenv, you can just ignore the README.md and do it any other way. My folder structure is also a bit different from what one usually sees, but there is no 'real' convention in regards to that in the Python community, as far as I know. Anyway, for your purposes that shouldn't really matter that much.

I've added @techdragon's code to be executed when you invoke the 'dummy-command' cli command (in cli.py), I reckon that's useful for testing. Feel free to clone it, but I'd also be happy to work with you and add/adjust whatever is necessary (licensezero.json / trove classifier) and maybe create a second project that relies on this one, so you can test that scenario. Didn't do that yet because I'm not sure how you go about those things.

Also, feel free to move that over to Github if that is more convenient for you. The CI won't work anymore then without a bit of work to migrate that to travis or whatever else Github users use nowadays, but that might not be necessary anyway. Might be interesting to use that to test different versions of Python, not sure.... |
@techdragon and @makkus, thanks so much for your writing! I am very interested in the approach outlined. In the end, I think a built-in script to list out the paths of dependencies in the environment, then iterating them to look for |
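One possible shape for such a built-in script, as a sketch only: instead of listing directories and walking them, it asks each installed distribution for its recorded files and reports any whose file name matches. It assumes Python 3.8 or newer and that the metadata file is named licensezero.json, as elsewhere in this thread.

```python
from importlib import metadata


def metadata_file_paths(filename="licensezero.json"):
    """Map distribution name to absolute paths of installed files named `filename`."""
    results = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            files = dist.files or []  # None when no file record is available
        except Exception:  # skip installs with unreadable metadata
            continue
        for entry in files:
            if name and entry.name == filename:
                results.setdefault(name, []).append(str(dist.locate_file(entry)))
    return results


if __name__ == "__main__":
    # One "name<TAB>path" pair per line, easy to split in another program.
    for name, paths in sorted(metadata_file_paths().items()):
        for path in paths:
            print(name + "\t" + path)
```

Distributions installed without a file record report no files, so they would be missed by this approach.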
I've just come across the conversation about some future changes to the way packages will specify licensing metadata here https://discuss.python.org/t/improving-license-clarity-with-better-package-metadata/2154/51 and, after reading through the comments and the draft, it appears that the most reliable way for a package to store the data for distribution will be
This will allow you, for the packages using licensezero.com (which are the ones I believe the CLI is designed to work with), to walk all of the packages in the python environment, directly getting the data you need, roughly like this script. (I'd make this more complete, but I'm unsure what the best way to format a parsable string for Go would be, so the output from the below script is

    import pkg_resources

    for package in pkg_resources.working_set:
        try:
            if package.has_metadata('licensezero.json'):
                print(package.key)
                print(package.get_metadata('licensezero.json'))
        except Exception:
            pass

The reason I feel it's important to get the most out of the python tooling is that there are some more advanced/complex uses of the python packaging tools that will cause trouble with taking a list of package names generated using python, and then trying to walk package directories for the appropriate data. The two biggest issues I can think of are compressed packages that don't have a traditional source folder yet do still have the appropriate metadata, and namespace packages, where the installed packages do not 100% match up with the directory tree that will need to be walked in order to get data out of packages. |
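On the question of output that is easy to parse from Go: one option, offered as a sketch and not as the CLI's actual format, is one JSON object per line. Go's `encoding/json` decoder can read a stream of JSON values. The field names `name`, `raw`, and `parsed` are invented here, and the code uses `importlib.metadata` (Python 3.8 or newer) in place of `pkg_resources`.

```python
import json
from importlib import metadata


def licensezero_records(filename="licensezero.json"):
    """Collect `filename` from each distribution's metadata directory, if present."""
    records = []
    for dist in metadata.distributions():
        try:
            text = dist.read_text(filename)  # None when the file is absent
            name = dist.metadata["Name"]
        except Exception:  # skip installs with unreadable metadata
            continue
        if text is None:
            continue
        try:
            parsed = json.loads(text)
        except ValueError:
            parsed = None  # keep the raw text even if it is not valid JSON
        records.append({"name": name, "raw": text, "parsed": parsed})
    return records


if __name__ == "__main__":
    for record in licensezero_records():
        print(json.dumps(record, sort_keys=True))
```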
@techdragon thanks so much! Do I correctly understand that pkg_resources will allow us to put arbitrary data into package metadata, like a I was not aware of compressed packages and installed packages. You've put a lot of work on this. We should discuss how to get you recognized and compensated for that. |
@techdragon, are you currently offering licenses through L0? If so, it’s the least I can do to waive commission. That goes for all those who’ve contributed here. |
Ok, so my take on this, especially in light of PEP517. There are two mainstream tools for producing redistributable packages: setuptools and poetry. Managing environments themselves is a blend of various tools doing various things, including venv, pipenv, conda, poetry, tox, various venv managers, etc. The common thing is scanning As I understand it (I've never looked at it with this use in mind), messing with the metadata isn't really feasible? The schema isn't specified to be extensible, so you'd have to re-implement some things or teach a lot of tools about your extensions. IMHO, the lowest-friction way to go is to embed |
@astronouth7303 thanks so much! There's so much here. It's a lot to take in. But it sounds like saying something like this could button it up:
|
Something something package data something. |
@astronouth7303 I'm sorry. I didn't quite get the gist of your last comment here. Maybe I could ask you to put it in different words? |
The python term for what you want is "package data" as in:
|
Ok, per python-poetry/poetry#2015 (comment) Poetry requires no extra steps. |
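On the reading side, a file shipped as package data can be loaded through `importlib.resources`, which also copes with packages that are not laid out as plain directories. A minimal sketch, assuming Python 3.9 or newer and that the file sits at the top level of the package under the name licensezero.json; the function name is made up for illustration.

```python
import json
from importlib import resources


def read_package_json(package, filename="licensezero.json"):
    """Return the parsed JSON file shipped inside `package`, or None if unavailable."""
    try:
        resource = resources.files(package).joinpath(filename)
        if not resource.is_file():
            return None
        return json.loads(resource.read_text(encoding="utf-8"))
    except (ModuleNotFoundError, TypeError, ValueError):
        # Package missing, not a package at all, or file is not valid JSON.
        return None


if __name__ == "__main__":
    # The standard library's json package ships no such file, so this prints None.
    print(read_package_json("json"))
```

As with entry points, `resources.files` imports the named package, so this is best run inside the project's own virtualenv.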
@astronouth7303 thank you! |
I would greatly appreciate advice on how best to support Python dependencies. I've done a bit of Python programming, but I'm not up-to-date on package or project management.
Supporting a package type in License Zero boils down to two operations:
1. Given a project directory for a project using dependencies, find all Python dependencies in it, and read metadata about them. For npm packages, the CLI recurses node_modules and reads licensezero properties of package.json files.
2. Given a project directory for a License Zero project, write metadata in such a way that it will end up in depending projects' directories. For npm packages, the CLI writes metadata to licensezero.json files.

A few specific questions:

- Should the CLI write information to setup.py or its own metadata file, like licensezero.json?
- Any gotchas with a licensezero.json file? MANIFEST.in?
- Do Python developers have a CLI tool installed that will list all project dependencies and their paths?