
Help Wanted: Guidance on Python Package Metadata #3

Closed
kemitchell opened this issue Jun 28, 2018 · 40 comments
Labels
enhancement New feature or request help wanted Extra attention is needed question Further information is requested


@kemitchell
Member

kemitchell commented Jun 28, 2018

I would greatly appreciate advice on how best to support Python dependencies. I've done a bit of Python programming, but I'm not up-to-date on package or project management.

Supporting a package type in License Zero boils down to two operations:

  1. Given a project directory for a project using dependencies, find all Python dependencies in it, and read metadata about them. For npm packages, the CLI recurses node_modules and reads licensezero properties of package.json files.

  2. Given a project directory for a License Zero project, write metadata in such a way that it will end up in depending projects' directories. For npm packages, the CLI writes a licensezero property to package.json.

A few specific questions:

  1. Should the CLI write information to setup.py or its own metadata file, like licensezero.json?

  2. Any gotchas with a licensezero.json file? MANIFEST.in?

  3. Do Python developers have a CLI tool installed that will list all project dependencies and their paths?

@kemitchell
Member Author

@makkus, you mentioned interest in Python support. I've done quite a bit of work in the past few weeks to prepare to support more package ecosystems, like Python. Maybe you could help me with a few high-level implementation questions?

@kemitchell kemitchell self-assigned this Jun 28, 2018
@kemitchell kemitchell added enhancement New feature or request help wanted Extra attention is needed question Further information is requested labels Jun 28, 2018
@kemitchell
Member Author

Collecting some references:

PEP 566: Metadata for Python Software Packages

@makkus

makkus commented Jun 28, 2018

Sure, I'll have a look. Might not get to it before next week, but would be happy to help.

@makkus

makkus commented Jun 28, 2018

One thing worth mentioning straight away is that there is some movement in the Python community to come up with a better way to manage package dependencies and project setup in general, which might or might not mean that setup.py will go away in the future (not short term, but at some stage).

One contender is Pipenv (https://docs.pipenv.org/), the other is Poetry (https://github.com/sdispater/poetry). So not relying too much on setup.py might be a good idea. I'm not 100% sure yet; I'll have a think over the weekend...

@kemitchell
Member Author

@makkus, I'd be very grateful!

If you need support for the work, let me know. I'd also be happy to mention you on https://licensezero.com/thanks, if you're alright with that.

@kemitchell
Member Author

> One thing worth mentioning straight away is that there is some movement in the Python community to come up with a better way to manage package dependencies and project setup in general.

I heard a bit about this from @anseljh at some point.

Sounds like a freestanding, self-contained License Zero metadata file might be the way to go.

@makkus

makkus commented Jun 28, 2018

> Sounds like a freestanding, self-contained License Zero metadata file might be the way to go.

Probably. And even if not, having that sort of generic option might be a good idea in any case, as it would provide an (even if rudimentary) fallback for all kinds of as-yet unsupported languages/project types.

@kemitchell
Member Author

@makkus That's a thought. I briefly considered licensezero.json even for npm packages, but at least in that specific case, it seemed more trouble than it was worth. For packages in less settled formats (which sounds like it describes Python), it could make perfect sense.

I'd want to give some thought to how to link a freestanding licensezero.json file to the package. Perhaps by including (signed) data in the metadata file indicating the name of the package or directory in which it should appear.
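A minimal sketch of the name-linking part of this idea, assuming a hypothetical top-level `name` field in licensezero.json (not an established schema) and leaving signing out entirely. Both names are normalized the way PEP 503 normalizes project names, so `My_Package` and `my-package` compare equal:

```python
import json
import re


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 does: lowercase, with runs of
    "-", "_" and "." collapsed into a single "-"."""
    return re.sub(r"[-_.]+", "-", name).lower()


def file_matches_package(licensezero_json: str, distribution_name: str) -> bool:
    """Return True if licensezero.json names the distribution it was found in.

    The top-level "name" key is hypothetical, used only to illustrate the idea.
    """
    data = json.loads(licensezero_json)
    declared = data.get("name") if isinstance(data, dict) else None
    if not isinstance(declared, str):
        return False
    return normalize(declared) == normalize(distribution_name)


print(file_matches_package('{"name": "My_Package"}', "my-package"))  # True
print(file_matches_package('{"name": "other"}', "my-package"))  # False
```

A real version would also have to verify the signature before trusting the `name` field at all.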

@kemitchell
Member Author

@makkus, I've gone ahead and implemented the licensezero.json approach, at least on the read side, in #5.

@makkus

makkus commented Jul 1, 2018

Ok, I've thought about this a bit in the last few days, and there is definitely more thinking to do, but here are a few points I think are relevant:

> 1. Given a project directory for a project using dependencies, find all Python dependencies in it, and read metadata about them. For npm packages, the CLI recurses node_modules and reads licensezero properties of package.json files.

The following really only applies to 'traditional' Python projects using setuptools and not Pipenv or Poetry (I've tried both but wasn't prepared to spend the time switching to one of them yet, for different reasons). Parts of it will most likely apply for any sort of project-setup, but I'm not sure.

Ok, a bit of background:

  1. There is no single, commonly agreed way of setting up a Python development environment, and I'm not even talking about Pipenv and Poetry here. Usually setting up a Python project involves a virtualenv in some way, but sometimes people just put everything into the global Python path. This is terrible practice, but it happens nonetheless. If a virtualenv is used, there are different ways of creating it. You can use 'plain' virtualenv. There is virtualenvwrapper, which is pretty much like plain virtualenv, but with a few sensible default locations and wrapper scripts to make life easier. Then there is pyenv, which comes with its own virtualenv management plugin. And there is also Conda, which also lets you create a project environment, but one that works a bit differently from a normal environment, because it can also contain system libraries, sort of. For the purpose of licensezero trying to auto-parse dependencies, it shouldn't make much of a difference which one is used, but at least in the case of Conda it could. So it's mainly something to keep in mind, without worrying too much. The exception is when someone doesn't use virtualenvs at all: then all libraries end up in the same location, and licensezero would have no way of knowing which project a dependency belongs to.

  2. More importantly, Python development environments contain both development tools (REPLs, documentation generators, linters) and dependency libraries. I'm not big on JavaScript, but I can imagine it's the same there, so you might have already solved that issue. Either way, it is probably relevant in determining whether a license is needed for any particular dependency that is licensezeroed. Depending on the developer, those dependencies are specified in different places:

  • one or several requirements.txt files:
    these can contain either all dependencies, or only the ones needed to build the documentation pages, run the tests, set up development tools, etc.
  • one or several sections in the setup.py files:
    it's possible to have a section called extras_require which can contain extra dependencies for special use-cases (e.g. include an extra cli frontend, or do xml-parsing with a Python-native library), or stuff that others put in a requirements.txt file
  • in the case of Conda, it has its own environment.yml (or similar) file
  3. Not super-likely, but a Python project might need libraries when run under one version of Python that are not required under another (e.g. backports of Python 3 features). Probably irrelevant in this context, though.

Now, I think the most realistic way of analyzing a project is to require it to be in a virtualenv, and to work with the dependencies that come up when one runs pip list in that virtualenv. That result should always be the same regardless of which virtualenv helper is used. Conda might be a bit different, but unless there is a licensezeroed Conda package in play somehow (which could be dealt with as its own project type later on), pip list would still work.
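For reference, a rough in-process equivalent of `pip list`, using the standard library's `importlib.metadata` (Python 3.8+). It is a sketch, not what pip does internally, and it only sees the interpreter it runs under, which is why it has to be run inside the project's virtualenv:

```python
from importlib.metadata import distributions


def installed_packages() -> dict:
    """Map each installed distribution's name to its version.

    Covers whatever is visible on the running interpreter's sys.path, which
    inside an activated virtualenv is roughly what `pip list` shows.
    """
    result = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            result[name] = dist.version
    return result


for name, version in sorted(installed_packages().items(), key=lambda kv: kv[0].lower()):
    print(f"{name}=={version}")
```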

The main issue will be how to get to the licensezero.json file. In normal virtualenvs, the VIRTUAL_ENV environment variable gets you the path to the virtualenv (not sure if that also works for Conda), which you could then traverse and find all licensezero.json files, like you do in node_modules. Developers need to make sure to include the licensezero.json file in the manifest for that to work. Not sure whether the licensezero cli should use the output of pip list to filter what comes out of the folder search, although in most cases the dependency list would be the same.
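A sketch of that folder search in Python. It assumes the environment has been activated, so `VIRTUAL_ENV` is set; the fallback to `sys.prefix` (the running interpreter's root, which is the virtualenv when run from one) is an added assumption. It simply collects every `licensezero.json` below the root, much like recursing `node_modules`:

```python
import os
import sys
from pathlib import Path


def default_root() -> Path:
    # VIRTUAL_ENV is exported by the virtualenv/venv "activate" scripts; when
    # it is missing, fall back to the running interpreter's prefix (assumption).
    return Path(os.environ.get("VIRTUAL_ENV") or sys.prefix)


def find_licensezero_files(root: Path) -> list:
    """Return the sorted paths of every licensezero.json below root."""
    return sorted(Path(root).rglob("licensezero.json"))


# Usage, inside an activated virtualenv:
#   for path in find_licensezero_files(default_root()):
#       print(path)
```

Filtering the results against the `pip list` output would be a separate, optional step.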

There are a few tools out there that do Python dependency analysis for different purposes (license checks, dependency graphs, etc.). Those are all written in Python, though, and would have to be installed to be of use. I never really needed any of them, so I can't really comment on whether any of them would make licensezero's job easier. It might be worth thinking about publishing a minimal licensezero Python package that could be installed into a development environment and provide a few helper functions. Not sure.

There are also some tools that do dependency analysis based on metadata published by the pypi index, but that brings up the problem of where to put licensezero metadata so it can get accessed by such a tool.

That's all I can think of for now, feel free to ask for clarification if necessary.

@makkus

makkus commented Jul 1, 2018

Before I forget: sometime in the next couple of weeks I'm planning to set up a private Python index for development and for trying out some things. It won't be stable or anything, but it could be used to test publishing Python projects and retrieving dependencies. If you want, I could give you access for your testing. You could also use https://test.pypi.org for that, though.

Also, if you want, I can throw together a templated Python project that you can use to easily create multiple projects to test your CLI, and which could maybe be developed into a sort of documentation or example project for aspiring licensezero Python devs.

@makkus

makkus commented Jul 1, 2018

Just thought of another command that could potentially be useful: pip show <package_name>

It gives output like this:

```text
➜ pip show my_package
Name: my_package
Version: 0.4.0
Summary: My package does ...
Home-page: https://gitlab.com/makkus/my_package
Author: My Name
Author-email: my@email.com
License: Parity Public License 2.1.0
Location: /home/markus/projects/my_project
Requires: Click,
Required-by:
```
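Because `pip show` prints plain `Key: value` lines, a tool that shells out to it needs only a small parser. A sketch, tested only against the format shown above:

```python
def parse_pip_show(text: str) -> dict:
    """Parse single-package `pip show` output ("Key: value" lines) into a dict.

    Splits on the first colon only, so values that contain colons (such as
    the Home-page URL) stay intact. Lines without a colon are ignored.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


SAMPLE = """\
Name: my_package
Version: 0.4.0
Home-page: https://gitlab.com/makkus/my_package
License: Parity Public License 2.1.0
Requires: Click,
"""
info = parse_pip_show(SAMPLE)
print(info["License"])  # Parity Public License 2.1.0
```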

This is the relevant portion of the corresponding setup.py:

```python
...
...
...
test_requirements = ['pytest', ]

setup(
    author="My Name",
    author_email='my@email.com',
    classifiers=[
        'Development Status :: 2 - Pre-Alpha',
        'Intended Audience :: Developers',
        'License :: Other/Proprietary License',
        'Natural Language :: English',
        "Programming Language :: Python :: 2",
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
    ],
    description="My package does ...",
    entry_points={
        'console_scripts': [
            'my_pkg=my_package.cli:main',
        ]
    },
    install_requires=requirements,
    license="Parity Public License 2.1.0",
    long_description=readme + '\n\n' + history,
    include_package_data=True,
    keywords='my_package',
    name='my_package',
    packages=find_packages(include=['my_package']),
    setup_requires=setup_requirements,
    test_suite='tests',
    tests_require=test_requirements,
    url='https://gitlab.com/makkus/my_package',
    version='0.4.0',
    zip_safe=False,
)
...
...
...
```

So, you could iterate through the output of pip list and find all packages that have a Parity or Prosperity license. Even though you can't put a License Zero license directly in the setup.py classifiers list (basically tags), it might be worth investigating whether it's allowed to put a free-form string as the value of the license key. And/or maybe it's possible to add the License Zero licenses to the approved PyPI classifiers via some sort of formal process.
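A sketch of that iteration done in-process with `importlib.metadata` rather than by parsing `pip list`. The substring list is an assumption, and because the License field is free-form, any match is only a candidate: it says nothing about whether the package is actually offered through licensezero.com.

```python
from importlib.metadata import distributions

# Substrings that suggest a License Zero license (an assumption for illustration).
LICENSE_ZERO_NAMES = ("Parity Public License", "Prosperity Public License")


def looks_like_license_zero(license_field) -> bool:
    """True if a free-form License metadata value names Parity or Prosperity."""
    return bool(license_field) and any(n in license_field for n in LICENSE_ZERO_NAMES)


def candidate_packages() -> list:
    """Names of installed distributions whose License field matches."""
    found = []
    for dist in distributions():
        if looks_like_license_zero(dist.metadata.get("License")):
            found.append(dist.metadata.get("Name"))
    return found
```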

@makkus

makkus commented Jul 1, 2018

If you wanted to get really fancy, every licensezero Python project could be required to have a sort of 'license' entry_point, which would point to a function returning license information. That way you could write a short Python app that automatically lists all such projects on the Python path. The result would then be used by the licensezero CLI.

I'd probably still go with a pure file-based approach like the one you are trying at the moment, so this is just another option if that doesn't work out for some reason.
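A sketch of the entry-point variant. The group name `licensezero` and the `my_package.license:info` target are made up for illustration. The version check covers the two shapes `importlib.metadata.entry_points()` has returned across Python releases:

```python
from importlib.metadata import entry_points

# Hypothetical entry-point group; a License Zero package would register a
# function under it, e.g. in setup.py:
#     entry_points={"licensezero": ["info = my_package.license:info"]}
GROUP = "licensezero"


def licensezero_entry_points(group: str = GROUP) -> list:
    """Return the entry points registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9 return a dict


def collect_license_info(group: str = GROUP) -> list:
    """Load and call each registered function; each is assumed to return license data."""
    return [ep.load()() for ep in licensezero_entry_points(group)]


print(collect_license_info())  # [] unless an installed package registers the group
```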

@kemitchell
Member Author

> Python development environments contain both development tools (repls, documentation generators, linters) as well as dependency libraries. I'm not big on Javascript, but I can imagine that's the same there, so you might have already solved that issue. Either way, it is probably relevant in determining whether a license is needed for any particular dependency that is licensezeroed.

License Zero licenses close the loophole for proprietary use of dev tools!

The licensezero command supports licensezero quote --no-noncommercial and licensezero quote --no-reciprocal for noncommercial and open developers, so they can avoid buying licenses they don't need. However, both Parity (the reciprocal license) and Prosperity (the noncommercial license) apply regardless of whether the package is a dev tool or a library. Using, say, a Prosperity lint tool on a proprietary project is probably commercial use. Using, say, a Parity transpiler on closed, proprietary code breaks rule 3 of that license.

@kemitchell
Member Author

> So, you could iterate through the output of pip list and find all packages that have a Parity or Prosperity license.

This is basically how we're handling RubyGems at the moment. licensezero quote runs bundle show for a list of dependencies by name and version, then bundle show --paths for a list of their paths on disk. licensezero quote then checks each of those paths for a licensezero.json file.

That approach initially seemed unbearably slow. But I was able to speed it up by reading through bundle's manpages and finding ways to cut the number of times I had to run it down to two.

@kemitchell
Member Author

@makkus, this is just incredible feedback. You obviously took significant time just to type out these responses, which I've read from top to bottom.

Would you be alright with a public nod on https://licensezero.com/thanks?

@anseljh

anseljh commented Jul 2, 2018

I agree with @makkus's super detailed writeup. Good job!

I can speak to Pipenv a bit. Pipenv is not doing anything terribly fancy. It has essentially these elements:

  1. Wrapper hiding all the ugly bits of virtualenv
  2. The Pipfile and Pipfile.lock files for listing dependencies, instead of one or more requirements.txt files. Pipfile does have a separate section for development-only dependencies ([dev-packages]).

A Pipfile does not include any information about licenses, so you'll still have to get that another way—likely from each dependency's setup.py.

Also important to note that Pipenv is only for application dependencies; it is not meant for libraries. See https://docs.pipenv.org/advanced/#pipfile-vs-setuppy.

@kemitchell
Member Author

Thanks, @anseljh! Your note on application dependencies was especially appreciated.

Python dependency management sure doesn't sound fun. Pipfile.lock does sound promising, but I'm aware that's not universal practice.

I have a lot of reading to do here. I'll try to bring as many of those notes together as I can. In the end, It's Complicated, but I want to do what I can to add meaningful support.

@makkus

makkus commented Jul 2, 2018

> The licensezero command supports licensezero quote --no-noncommercial and licensezero quote --no-reciprocal for noncommercial and open developers, so they can avoid buying licenses they don't need. However, both Parity (the reciprocal license) and Prosperity (the noncommercial license) apply regardless of whether the package is a dev tool or a library.

Yes, the problem I thought could come up in Python projects is that a dependency might or might not be in the virtualenv, depending on the circumstances. I'm not quite sure what the expectation is for when exactly the licensezero CLI app will be run. Ideally it'd be when everything (both dev tools and dependency libraries) is already installed and available in a virtualenv. Some tools are only necessary in a CI pipeline, for example, or when doing integration testing, so they might not be present when checking for licensezero libraries. I usually have a few different virtualenvs for the same project, depending on what I'm trying to do. Even my development environment doesn't have all the libraries installed that I use across the whole lifecycle of the project.

Now, I don't think that's too difficult to deal with in most cases; it's more a matter of how sure you want to be to catch everything in every circumstance. Maybe it's just a case of documenting and explaining those limitations to users.

I'm still torn, but the more I think about it, the more I like the idea of having a little license 'helper' library in a project's dependencies (it should be lean enough, and of course not have its own dependencies). I haven't really thought about how to do it technically, but even just inheriting from a parent Licensezero class might be enough for this library to find all subclasses in the current environment. This could easily expose the metadata and maybe a few more convenience functions to the licensezero CLI (which would just have to call something like python -c 'import licensezero; print(licensezero.info())'). It could maybe also be used to automatically create parts of 'About' messages and display an auto-generated license text if the developer chooses to do so, or to provide some other common licensezero functionality.
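A sketch of the subclass idea, with everything in one file for illustration. `LicenseZero`, `info()` and the attribute names are hypothetical. One caveat: `__subclasses__()` only sees subclasses whose modules have already been imported, so the helper would still need some way to import each dependency (an entry point, for example) before calling `info()`.

```python
class LicenseZero:
    """Base class a hypothetical `licensezero` helper package could ship.

    Dependencies would subclass it and fill in the attributes below.
    """

    project_id = None    # hypothetical licensezero.com project ID
    license_name = None  # e.g. "Parity Public License 2.1.0"


def all_subclasses(cls) -> list:
    """Collect direct and indirect subclasses of `cls` defined so far."""
    found = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found


def info() -> list:
    """What a `licensezero.info()` call might return for the CLI to print."""
    return [
        {
            "class": f"{cls.__module__}.{cls.__qualname__}",
            "project_id": cls.project_id,
            "license": cls.license_name,
        }
        for cls in all_subclasses(LicenseZero)
    ]


# In a dependency's own code (the values are placeholders):
class MyPackageLicense(LicenseZero):
    project_id = "example-project-id"
    license_name = "Parity Public License 2.1.0"


print(info())
```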

Developers have to do some work to 'enable' licensezero anyway, so doing it in code (or alternatively by adding one dependency to setup.py and the licensezero.json file to the root of the project and to the manifest) wouldn't be much worse, or might even be better, since the helper could also contain some code to auto-generate said licensezero.json file and so on. And I think something like this would be more reliable than any combined pip/filesystem-trawling approach. It still couldn't find any dependencies that are not currently present in the virtualenv, of course.

It would mean having a language/project-type-specific approach, though, which would be more work to maintain/develop than a generic solution. So if a generic solution can be done, it's probably still the more sensible way to go.

@makkus

makkus commented Jul 2, 2018

> Would you be alright with a public nod on https://licensezero.com/thanks?

Sure, of course.

@chason

chason commented Jun 14, 2019

Is it necessary to autodetect the packages? I would prefer to just feed the licensezero app a requirements.txt file. Additionally, this would let me automate the detection of licensezero libraries across several projects at once.

@kemitchell
Member Author

@chason, thanks for your input! Does requirements.txt tend to live at the root of a project? Would it be enough to train licensezero quote to look for $PWD/requirements.txt, and parse if present?

@makkus

makkus commented Jun 14, 2019

Yes, usually it lives in the root of the project, but not always. I'd say the majority still use requirements.txt files for their dependencies, but it's not an overwhelming majority by any means. My dependencies are in setup.cfg, for example. Then there are people who use setup.py directly, and some who use Pipenv. And Poetry is gaining more and more traction too. Also, a requirements.txt file only holds the names (and sometimes version numbers) of other packages, no other metadata at all. So there would definitely be more work involved than just parsing that file. And what about dependencies of dependencies?
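To illustrate the point about names only, a deliberately simplified requirements.txt reader. It does not follow `-r` includes, handle editable installs, or resolve dependencies of dependencies, which is exactly the gap described above:

```python
import re

# PEP 508 project names start with a letter or digit and may contain ".", "-", "_".
NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(text: str) -> list:
    """Extract project names from the contents of a requirements.txt file.

    Simplified: comment lines, blank lines and pip options such as
    "-r other.txt" or "-e ." are skipped (not followed); bare URLs are ignored.
    """
    names = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "-")):
            continue
        line = re.sub(r"\s+#.*$", "", line)  # pip treats " #" as a trailing comment
        if "://" in line.split("@", 1)[0]:
            continue  # a bare URL rather than "name @ URL"
        match = NAME_RE.match(line)
        if match:
            names.append(match.group(0))
    return names


print(requirement_names("requests>=2.0\n-r dev.txt\nclick  # CLI\n"))  # ['requests', 'click']
```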

It's going to be really difficult if you want to cover a meaningful portion of everything that is out there, and even more so going forward: nobody knows yet how most Python projects will be packaged in the future. If I had to bet, I'd say Poetry and the more or less new standard pyproject.toml will win, but currently the tooling is just not there yet.

To make all that even more difficult, since License Zero often also applies to development tooling, not only the dependencies that will be bundled with a release need to be discovered, but development dependencies too. Those are typically declared in separate files in Python (actually, this is what best practices usually recommend using requirements.txt for; production dependencies should live in setup.cfg or setup.py if you don't use any of the newer tools). Anyway, a big messy mess, I think.

@chason: how would you implement your suggested requirements.txt parsing? How would the licensezero app be able to find all licensezero-licensed projects using just one requirements file? Installing a dummy virtualenv? I honestly can't think of a way to implement this that isn't over-engineered and expensive. I'd guess it would probably be a good idea to use pip-tools in some way.

Personally, I think the most future-proof and user-friendly way that covers most cases would be to run the licensezero app in a virtualenv that has both development and 'normal' dependencies installed (which all developers have anyway, since it is their working environment), and to autodetect all licensezero project files either by looking for certain filenames in a certain location, or by having a Python class that inherits from a licensezero 'base' class and that could be discovered via Python-native means. I'd be happy to go into more detail, or write a prototype. But it could well be that I'm overlooking something obvious and all this is much easier.

@kemitchell
Member Author

Is there some tool in Python that abstracts over all the various dependency-management solutions and outputs a report?

@makkus

makkus commented Jun 15, 2019

Not that I know of. There are a few version and license checkers and such, but I have never found one that works well and reliably. And usually they only work in virtualenvs anyway, that is, when all dependencies are installed. And once that is the case, it's not that difficult to read a few environment variables and search for relevant metadata files with your Go application (I think).

But, as I said, I'm not 100% sure I'm not missing an obvious, easy solution....

@techdragon

I think this can be kept to a fairly simple workflow.

  1. Run tool in python environment.
    • It shouldn't matter how the Python environment is set up or managed, just that it is 'active'. Whether you're checking the packages in a stock virtualenv, one managed by Poetry or Pipenv, or even during a tox-based CI run, there are relatively simple approaches to list all the packages and get their metadata. In my experience it only gets complicated and problematic when you try to do it from outside the Python environment.
  2. Somehow merge the results of one-or-many potential invocations of step one.
    • This is similar to how I currently manage coverage testing with tox. I specify that the test coverage from two different tox environments needs to be appended non-destructively before generating the final coverage results. I do this to get proper coverage reports for code paths that support certain features being disabled, depending on which database my library is working with. A similar mechanism would significantly simplify dealing with the dev/test/prod/etc. Python environment permutation issues.
  3. Generate final results.
    • We can build a final result from the "raw" data with reasonable simplicity, because it should be pretty close to the final data structures expected by the CLI tool. The final process should require little modification, effectively becoming "serialize", "dedupe", "pass to existing CLI output code".

Essentially, the idea is to separate the concerns into a phase that depends on the specific Python environment and a phase that does not care about it. Since only the Python-specific parts need to care about which environment they run in, this should be more flexible and fit better within the still-shifting Python packaging landscape.
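A sketch of the environment-independent second phase. It assumes each per-environment run emits a JSON list of objects with `name` and `version` keys (a made-up format); merging then reduces to deduplicating on a case-insensitive name plus version:

```python
import json


def merge_reports(*reports: str) -> list:
    """Merge JSON reports produced by scanning several Python environments.

    Entries are deduplicated on (lowercased name, version), so a package seen
    in both the dev and the test environment is reported once.
    """
    merged = {}
    for report in reports:
        for entry in json.loads(report):
            key = (entry["name"].lower(), entry["version"])
            merged.setdefault(key, entry)  # keep the first occurrence
    return [merged[key] for key in sorted(merged)]


dev = '[{"name": "pytest", "version": "5.1.2"}, {"name": "Click", "version": "7.0"}]'
prod = '[{"name": "click", "version": "7.0"}]'
print([e["name"] for e in merge_reports(dev, prod)])  # ['Click', 'pytest']
```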


P.S. You can request the creation of new trove classifiers that python packages can then use in their package metadata. There are a few currently open examples and plenty of closed examples to show the complete process here https://github.com/pypa/warehouse/labels/classifier%20request.

I would suggest opening an issue requesting the addition of some classifiers similar to these rough examples based on the existing ones:

  • License :: OSI Approved :: Reciprocal Public License (RPL-1.5)
  • License :: License Zero :: Prosperity Public License 2.0.0
  • License :: License Zero :: Parity Public License 6.0.0
  • License :: License Zero :: Charity Public License 2.0.0

@kemitchell
Member Author

@techdragon that comment was gold! Thank you so much! I’m particularly grateful to read your recommendation on the license identifiers.

As for invoking local Python, is there a short script you can think of to print the file system paths of dependencies of the Python project at the current working directory? Even a partial or toy attempt would be greatly appreciated.

I think I would be fine hard-coding a Python script of that kind into the L0 CLI and passing it to python -c, using whichever python is on $PATH.

@techdragon

techdragon commented Sep 3, 2019

For introspecting a given Python environment, there are a few possible options, depending on how a package will 'signal' that it is using a License Zero license.

If you want to drive it entirely off the trove classifiers, it could be as simple as this. I've used a full string match for Classifier: Programming Language :: Python :: 3 as an example, since that made it a bit easier to test and demonstrate the principle.

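```python
import pkg_resources

# Full-line match against a common classifier, used here only as a demo target.
TARGET = 'Classifier: Programming Language :: Python :: 3'

for pkg in pkg_resources.working_set:
    # Wheel installs expose METADATA; egg/sdist installs expose PKG-INFO instead.
    name = 'METADATA' if pkg.has_metadata('METADATA') else 'PKG-INFO'
    if pkg.has_metadata(name) and TARGET in pkg.get_metadata(name).splitlines():
        print(pkg)
```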

It converts into a command short enough to run via bash/shell to inspect the current Python environment:

```shell
python -c "import pkg_resources; [print(package) for package in list(pkg for pkg in pkg_resources.working_set if ('Classifier: Programming Language :: Python :: 3' in list(pkg.get_metadata('METADATA' if pkg.has_metadata('METADATA') else 'PKG-INFO').splitlines())))]"
```

This approach would also extend to doing a partial string match for something like License :: License Zero ::.

It could also be modified to work with a specific License Zero package metadata attribute that package authors would have to set up on their package. This would be similar to the "licensezero helper" described by @makkus. In fact, a helper could be written that does the job of applying this metadata correctly for the various packaging configurations: even with the broad array of packaging options for Python, they all tend to produce package metadata either via list or dict interfaces in Python, or from a YAML/TOML/JSON file that can be read and then updated by a fairly simple tool that is careful to only touch its own metadata.
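The partial-match variant, sketched with the standard library's `importlib.metadata` (Python 3.8+) instead of `pkg_resources`, so it reads parsed `Classifier` headers rather than raw metadata lines. The `License :: License Zero ::` prefix follows the classifiers proposed above, which would first have to be requested and accepted:

```python
from importlib.metadata import distributions

# Hypothetical classifier prefix, pending a classifier request.
PREFIX = "License :: License Zero ::"


def license_zero_classifiers(metadata) -> list:
    """Return the Classifier values that start with PREFIX."""
    return [c for c in (metadata.get_all("Classifier") or []) if c.startswith(PREFIX)]


def scan() -> dict:
    """Map distribution name -> matching classifiers in the current environment."""
    hits = {}
    for dist in distributions():
        matches = license_zero_classifiers(dist.metadata)
        if matches:
            hits[dist.metadata.get("Name")] = matches
    return hits
```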

There are standards for the availability of specific metadata about a package, including a field for the license the package uses: https://packaging.python.org/specifications/core-metadata/#license. The majority of tools in the Python packaging ecosystem support and respect package authors' choices with regard to any license-related metadata they wish to include, such as the use of multiple license files, documented here in the wheel package format documentation: https://wheel.readthedocs.io/en/stable/user_guide.html?highlight=dist-info#including-license-files-in-the-generated-wheel-file

Alternatively, you can keep using licensezero.json files, but use Python's built-in package tooling to inspect which packages shipped a licensezero.json file.

This code isn't quite as self-contained an example: it requires either a temp file or shell-heredoc-style passing of the program via stdin, because handling the range of errors that importing some packages can raise takes a try/except block, which doesn't fit into a python -c one-liner.
In this example I'm checking for a licensezero.json file in the package root, but the lookup can be pointed somewhere else, such as a licensezero folder.

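```python
import pkg_resources

for package in pkg_resources.working_set:
    try:
        # resource_exists() imports the named package; package.key is the
        # distribution name, which doesn't always match the import name.
        print(package.key, pkg_resources.resource_exists(package.key, 'licensezero.json'))
    except Exception:
        # Import errors, or a distribution name that isn't an importable module.
        print(package.key, None)
```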
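An alternative that avoids importing packages at all, sketched with `importlib.metadata`: each installed distribution carries a list of the files it installed (the RECORD file, for wheel installs), so the script can look for `licensezero.json` there and print absolute paths, roughly the path list a CLI would want. Distributions installed without a file list are skipped.

```python
from importlib.metadata import distributions


def licensezero_paths() -> dict:
    """Map each distribution name to the licensezero.json files it installed.

    Reads the installed-file list instead of importing packages, so import
    errors can't get in the way.
    """
    found = {}
    for dist in distributions():
        name = dist.metadata.get("Name") or "<unknown>"
        for file in dist.files or []:  # files is None when no list was recorded
            if file.name == "licensezero.json":
                found.setdefault(name, []).append(str(dist.locate_file(file)))
    return found


if __name__ == "__main__":
    for name, paths in sorted(licensezero_paths().items()):
        print(name, *paths)
```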

Additionally, there is the possibility of using the built-in package metadata mechanisms to store the content of the licensezero.json file in a way that makes it more readily accessible.
If a package was set up to specify licensezero.json as its license_file, or alternatively stored the content of licensezero.json in its LICENSE or LICENSE.txt file, then you could access the entire licensezero.json content of all packages using a short Python script like this.

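```python
import pkg_resources

for package in pkg_resources.working_set:
    try:
        # Only finds a file named LICENSE stored alongside the package metadata.
        if package.has_metadata('LICENSE'):
            print(package.get_metadata('LICENSE'))
    except Exception:
        pass  # skip distributions whose metadata can't be read
```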

Edit: I only tested these code fragments with the latest stable version of Python (3.7.x). But there should be other ways to do it if these don’t work, since these are some of the things pip is built on. So this should definitely be possible on any useful version of Python.

@makkus

makkus commented Sep 3, 2019

Hey @kemitchell, not sure if it's helpful in any way, but I've set up a dummy Python project from my personal Python project cookiecutter template for you to play around with: https://gitlab.com/makkus/license_test_project

It uses pyenv to set up a virtualenv, but I guess, as @techdragon says, if you assume everything relevant is in the current virtualenv, then it doesn't matter how that virtualenv is set up. If you are familiar with a different way of setting up a virtualenv, you can just ignore the README.md and do it any other way.

My folder structure is also a bit different from what one usually sees, but there is no 'real' convention in that regard in the Python community, as far as I know. Anyway, for your purposes that shouldn't matter much.

I've added @techdragon's code to be executed when you invoke the 'dummy-command' CLI command (in cli.py); I reckon that's useful for testing.

Feel free to clone it, but I'd also be happy to work with you and add/adjust whatever is necessary (licensezero.json / trove classifier), and maybe create a second project that relies on this one so you can test that scenario. I didn't do that yet because I'm not sure how you go about those things.

Also, feel free to move that over to GitHub if that is more convenient for you. The CI won't work anymore then without a bit of work to migrate it to Travis or whatever else GitHub users use nowadays, but that might not be necessary anyway. It might be interesting to use that to test different versions of Python, not sure...

@kemitchell
Member Author

@techdragon and @makkus, thanks so much for your writing! I am very interested in the approach outlined.

In the end, a built-in script that lists the paths of dependencies in the environment, followed by iterating over them to look for licensezero.json, strikes me as the simplest, dumbest, most reliable proposal so far. I'd like to register the L0 licenses for metadata, but there are also many packages under Parity and Prosperity that don't use licensezero.com. I want to make sure that the CLI only adds License Zero projects to the checkout page for licenses.
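One way to sketch the "list the dependencies, then look for licensezero.json" idea is to rely on the file lists that installers such as pip record for each distribution (RECORD in *.dist-info), read through the standard library's importlib.metadata (Python 3.8+). This assumes licensezero.json ships inside the distribution (for example as package data) and that the installer wrote a file list; find_licensezero_files is a hypothetical name, not part of any existing CLI.

```python
from importlib.metadata import distributions
from typing import Dict, List


def find_licensezero_files() -> Dict[str, List[str]]:
    """Map each installed distribution's name to the absolute paths of any
    licensezero.json files listed among that distribution's installed files."""
    result: Dict[str, List[str]] = {}
    for dist in distributions():
        files = dist.files  # None when no RECORD (or SOURCES.txt) is present
        if not files:
            continue
        # Each entry is a relative PackagePath; locate_file() resolves it
        # against the distribution's install location.
        hits = [
            str(dist.locate_file(f))
            for f in files
            if f.name == "licensezero.json"
        ]
        if hits:
            name = dist.metadata.get("Name", "<unknown>")
            result.setdefault(name, hits)
    return result


if __name__ == "__main__":
    for name, paths in sorted(find_licensezero_files().items()):
        for path in paths:
            print(name, path)
```

Because every hit is tied to the distribution that installed it, the CLI would not have to guess which package a given licensezero.json belongs to.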

@techdragon

techdragon commented Sep 4, 2019

I've just come across the conversation about some future changes to the way packages will specify licensing metadata here: https://discuss.python.org/t/improving-license-clarity-with-better-package-metadata/2154/51. After reading through the comments and the draft, it appears that the most reliable way for a package to store the data for distribution will be:

  • SPDX License specifier string in the existing license metadata field
  • Using the license_files metadata field to specify one or more licenses.
    • A human-readable copy of the appropriate license text (Prosperity, Parity, etc.).
    • If they are using licensezero.com, specifying the inclusion of the licensezero.json as an additional license file.
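As a rough illustration of those two fields in a setuptools-based setup.py: the package name, version, and module name below are hypothetical, and Parity-7.0.0 is assumed to be the relevant SPDX identifier (check the SPDX license list for the exact version a project uses). Where the listed files end up inside a built wheel's *.dist-info directory depends on the setuptools version.

```python
# setup.py for a hypothetical package "example-l0-package"
from setuptools import setup

setup(
    name="example-l0-package",
    version="1.0.0",
    packages=["example_l0_package"],
    # SPDX identifier in the existing "license" metadata field (assumed ID).
    license="Parity-7.0.0",
    # Human-readable license text plus the licensezero.json file; setuptools
    # includes license_files in built distributions, and in wheels they are
    # copied into the *.dist-info metadata directory.
    license_files=["LICENSE", "licensezero.json"],
)
```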

For the packages that use licensezero.com (which I believe are the ones the CLI is designed to work with), this will let you walk all of the packages in the Python environment and get the data you need directly, roughly like the script below. (I'd make this more complete, but I'm unsure what the best way to format a parsable string for Go would be, so the script prints the package name, a newline, the JSON package data, and a newline.)

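import pkg_resources

# Print each distribution's name followed by its licensezero.json metadata.
for package in pkg_resources.working_set:
    try:
        if package.has_metadata('licensezero.json'):
            print(package.key)
            print(package.get_metadata('licensezero.json'))
    except Exception:
        # Skip distributions whose metadata cannot be read.
        pass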
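On the open question of a parsable format for Go: one option, sketched here under assumptions, is to emit a single JSON object mapping each distribution name to its parsed licensezero.json, which a Go program can decode with encoding/json (for example into a map[string]json.RawMessage). The sketch uses importlib.metadata (Python 3.8+) rather than pkg_resources, and it assumes the file sits in the *.dist-info directory either at the top level or under licenses/; both candidate locations and the function name collect_licensezero_metadata are assumptions.

```python
import json
import sys
from importlib.metadata import distributions
from typing import Any, Dict

# Candidate locations inside a *.dist-info directory. Older wheel/setuptools
# builds copied license_files to the top level; PEP 639 tooling places them
# under licenses/. Checking both is an assumption, not a guarantee.
CANDIDATES = ("licensezero.json", "licenses/licensezero.json")


def collect_licensezero_metadata() -> Dict[str, Any]:
    """Map distribution name -> parsed licensezero.json from its metadata."""
    collected: Dict[str, Any] = {}
    for dist in distributions():
        for candidate in CANDIDATES:
            text = dist.read_text(candidate)  # None if the file is missing
            if text is None:
                continue
            try:
                data = json.loads(text)
            except ValueError:
                # Malformed JSON: skip this distribution, keep scanning others.
                break
            name = dist.metadata.get("Name", "<unknown>")
            collected.setdefault(name, data)
            break
    return collected


if __name__ == "__main__":
    # Exactly one JSON document on stdout, e.g. {"some-package": {...}}.
    json.dump(collect_licensezero_metadata(), sys.stdout)
    sys.stdout.write("\n")
```

A single JSON document avoids having to split the output on newlines, which breaks as soon as the embedded JSON is pretty-printed across several lines.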

The reason I feel it's important to make the most of the Python tooling is that some more advanced/complex uses of the Python packaging tools will cause trouble if you take a list of package names generated with Python and then try to walk package directories for the appropriate data. The two biggest issues I can think of are compressed packages, which don't have a traditional source folder yet still have the appropriate metadata, and namespace packages, where the installed packages do not 100% match up with the directory tree that would need to be walked to get data out of them.

@kemitchell
Member Author

@techdragon thanks so much! Do I correctly understand that pkg_resources will allow us to put arbitrary data into package metadata, like a licensezero.json dictionary entry?

I was not aware of compressed packages and installed packages.

You've put a lot of work into this. We should discuss how to get you recognized and compensated for that.

@kemitchell
Member Author

@techdragon, are you currently offering licenses through L0? If so, it’s the least I can do to waive commission. That goes for all those who’ve contributed here.

@AstraLuma

OK, so here's my take on this, especially in light of PEP 517.

There are two mainstream tools for producing redistributable packages: setuptools and poetry.

Managing environments themselves is a blend of various tools doing various things, including venv, pipenv, conda, poetry, tox, various venv managers, etc.

The common thread is scanning sys.path for either embedded data (e.g., licensezero.json inside the package, next to the .py files) or metadata (*.dist-info).

As I understand it (I've never looked at it with this use in mind), messing with the metadata isn't really feasible? The schema isn't specified to be extensible, so you'd have to re-implement some things or teach a lot of tools about your extensions.

IMHO, the lowest-friction way to go is to embed licensezero.json as a data file in the package; then you can recursively scan sys.path for licensezero.json. It requires developers to add some bits to setup.py (setuptools), setup.cfg (setuptools), or pyproject.toml (poetry), but they should be pretty minimal copy/paste.
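A minimal sketch of the recursive sys.path scan described above, using only os.walk. Assumptions: empty sys.path entries (the current directory, usually the project itself) are skipped, zip archives on sys.path are ignored because they are not directories, and scan_sys_path_for_licensezero is a hypothetical name.

```python
import os
import sys
from typing import Iterable, List, Optional


def scan_sys_path_for_licensezero(paths: Optional[Iterable[str]] = None) -> List[str]:
    """Return sorted absolute paths of every licensezero.json found by
    recursively walking each directory on sys.path (or the given paths)."""
    found = set()
    for entry in (sys.path if paths is None else paths):
        if not entry:
            continue  # "" means the current directory: skipped by assumption
        root = os.path.abspath(entry)
        if not os.path.isdir(root):
            continue  # missing paths and zip files such as python3X.zip
        # os.walk does not follow symlinked directories by default, which
        # avoids loops; nested sys.path entries are de-duplicated by the set.
        for dirpath, _dirnames, filenames in os.walk(root):
            if "licensezero.json" in filenames:
                found.add(os.path.join(dirpath, "licensezero.json"))
    return sorted(found)


if __name__ == "__main__":
    for path in scan_sys_path_for_licensezero():
        print(path)
```

Walking every sys.path directory also walks the standard library, so a real CLI might narrow the scan to site-packages directories; that is a design choice rather than something settled in this thread.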

@kemitchell
Member Author

@astronouth7303 thanks so much! There's so much here. It's a lot to take in. But it sounds like saying something like this could button it up:

Here's a licensezero.json file. Do what you gotta do, Python style, to make sure it will end up in sys.path for folks who install your package for their projects.

@AstraLuma

Here's a licensezero.json file. Do what you gotta do, Python style, to ship it inside your package.

Something something package data something.

@kemitchell
Copy link
Member Author

@astronouth7303 I'm sorry. I didn't quite get the gist of your last comment here. Maybe I could ask you to put it in different words?

@AstraLuma

The Python term for what you want is "package data", as in:

  • setuptools
  • ... you can presumably do it with poetry, but the docs are being unhelpful.
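For setuptools, the relevant option is package_data. A minimal setup.py sketch, assuming a hypothetical package directory mypackage/ (with an __init__.py) that contains licensezero.json next to its .py files:

```python
# setup.py; "mypackage" is a hypothetical package name
from setuptools import find_packages, setup

setup(
    name="mypackage",
    version="0.1.0",
    packages=find_packages(),
    # Install mypackage/licensezero.json alongside the package's .py files,
    # i.e. <site-packages>/mypackage/licensezero.json
    package_data={"mypackage": ["licensezero.json"]},
)
```

Once installed, the file sits inside the package directory under site-packages, where a recursive scan of sys.path can find it.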

@AstraLuma

OK, per python-poetry/poetry#2015 (comment), Poetry requires no extra steps.

@kemitchell
Copy link
Member Author

@astronouth7303 thank you!

6 participants