Add license files to each of the subpackages #17856

Open
wants to merge 1 commit into
base: main
from

Conversation

ericwb

@ericwb ericwb commented Feb 19, 2025

Description

This change adds the MIT license file to each of the subpackages so that poetry picks them up and includes them in the source distribution when published to PyPI.

The one subpackage not requiring any change is
llama-index-integrations/readers/llama-index-readers-pdf-marker which already includes a GPL license file.
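
For illustration only, the kind of change described above could be scripted roughly as follows. This is a minimal sketch, not the script used for this PR: the helper name `add_license_files` and the demo directory names are hypothetical, and it assumes a subpackage is any directory containing a `pyproject.toml` and that an existing `LICENSE*` file (such as the GPL one in pdf-marker) should be left alone.

```python
import shutil
import tempfile
from pathlib import Path


def add_license_files(repo_root: Path, license_src: Path) -> list[Path]:
    """Copy license_src into every package directory that has no license file yet."""
    added = []
    for pyproject in sorted(repo_root.rglob("pyproject.toml")):
        pkg_dir = pyproject.parent
        # Leave packages alone if they already ship a LICENSE* file (e.g. a GPL one).
        if any(p.is_file() for p in pkg_dir.glob("LICENSE*")):
            continue
        target = pkg_dir / "LICENSE"
        shutil.copyfile(license_src, target)
        added.append(target)
    return added


# Demo on a throwaway tree with two hypothetical packages.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "LICENSE").write_text("MIT License (placeholder text)\n")
    for name in ("pkg-a", "pkg-b"):
        (root / name).mkdir()
        (root / name / "pyproject.toml").write_text("[tool.poetry]\n")
    # pkg-b already carries its own license and must not be overwritten.
    (root / "pkg-b" / "LICENSE").write_text("GPL (placeholder text)\n")

    added = add_license_files(root, root / "LICENSE")
    added_names = [p.parent.name for p in added]
    pkg_b_text = (root / "pkg-b" / "LICENSE").read_text()
    print(added_names)
```

Skipping directories that already contain a license file keeps the sketch safe to re-run and avoids overwriting a non-MIT license.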

Fixes #10806

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • No

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I believe this change is already covered by existing unit tests

Suggested Checklist:

  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • I ran make format; make lint to appease the lint gods

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Feb 19, 2025
@logan-markewich
Collaborator

Can all of these actually be published as MIT though? I would bet there are some packages with dependencies other than MIT

@ericwb
Author

ericwb commented Feb 19, 2025

@logan-markewich Good point, quite possibly not. However, if you search the repo tree for the string license = "MIT", you'll find that the subpackages are already being published to PyPI as MIT licensed via the package metadata in pyproject.toml (with the one GPL exception noted above).

@ericwb
Author

ericwb commented Feb 19, 2025

After running a recursive scan with licensecheck, I do see GPL and AGPL dependencies in several packages, which affects the license those packages can carry. GPL and AGPL are copyleft, and importing a module is comparable to dynamic linking, so the affected packages should also be licensed under GPL or AGPL.

| Project Path | Dependency | License |
| --- | --- | --- |
| ./llama-index-packs/llama-index-packs-arize-phoenix-query-engine | html2text | GNU GENERAL PUBLIC LICENSE (GPL) |
| ./llama-index-integrations/callbacks/llama-index-callbacks-uptrain | PyMuPDF | DUAL LICENSED - GNU AFFERO GPL 3.0 OR ARTIFEX COMMERCIAL LICENSE |
| ./llama-index-integrations/readers/llama-index-readers-pdf-table | ghostscript | GNU GENERAL PUBLIC LICENSE V3 OR LATER (GPLV3+) |
| ./llama-index-integrations/readers/llama-index-readers-pdf-marker | marker-pdf | GNU GENERAL PUBLIC LICENSE V3 OR LATER (GPLV3+) |
| ./llama-index-integrations/readers/llama-index-readers-pdf-marker | surya-ocr | GNU GENERAL PUBLIC LICENSE V3 OR LATER (GPLV3+) |
| ./llama-index-integrations/readers/llama-index-readers-pdf-marker | texify | GNU GENERAL PUBLIC LICENSE V3 OR LATER (GPLV3+) |
| ./llama-index-integrations/readers/llama-index-readers-stripe-docs | html2text | GNU GENERAL PUBLIC LICENSE (GPL) |
| ./llama-index-integrations/readers/llama-index-readers-nougat-ocr | Levenshtein | GNU GENERAL PUBLIC LICENSE V2 OR LATER (GPLV2+) |
| ./llama-index-integrations/readers/llama-index-readers-confluence | html2text | GNU GENERAL PUBLIC LICENSE (GPL) |
| ./llama-index-integrations/readers/llama-index-readers-airbyte-cdk | Unidecode | GNU GENERAL PUBLIC LICENSE V2 OR LATER (GPLV2+) |
| ./llama-index-integrations/readers/llama-index-readers-web | html2text | GNU GENERAL PUBLIC LICENSE (GPL) |
| ./llama-index-integrations/readers/llama-index-readers-kaltura | KalturaApiClient | GNU AFFERO GENERAL PUBLIC LICENSE V3 OR LATER (AGPLV3+) |
| ./llama-index-integrations/readers/llama-index-readers-imdb-review | cinemagoer | GNU GENERAL PUBLIC LICENSE (GPL) |
| ./llama-index-integrations/readers/llama-index-readers-upstage | PyMuPDF | DUAL LICENSED - GNU AFFERO GPL 3.0 OR ARTIFEX COMMERCIAL LICENSE |
| ./llama-index-integrations/readers/llama-index-readers-boarddocs | html2text | GNU GENERAL PUBLIC LICENSE (GPL) |
| ./llama-index-integrations/llms/llama-index-llms-upstage | PyMuPDF | DUAL LICENSED - GNU AFFERO GPL 3.0 OR ARTIFEX COMMERCIAL LICENSE |

@ericwb
Author

ericwb commented Feb 20, 2025

Note: I opened a related issue on the uptrain project, since its license does not account for one of its dependencies: uptrain-ai/uptrain#745

@ericwb
Author

ericwb commented Feb 20, 2025

Also opened a bug on nougat-ocr, which was also improperly licensed.

facebookresearch/nougat#255

@ericwb
Author

ericwb commented Feb 20, 2025

Opened another issue on airbyte-python-cdk, which didn't have a proper license.
airbytehq/airbyte-python-cdk#362

@ericwb
Author

ericwb commented Feb 20, 2025

OK, I have re-licensed the subpackages that have direct or transitive dependencies under strong copyleft licenses (GPL and AGPL).

This change adds the MIT license file to each of the subpackages
so that poetry picks them up and includes them in the source
distribution when published to PyPI.

The one subpackage without any change is
llama-index-integrations/readers/llama-index-readers-pdf-marker
which already includes a GPL license file.

Fixes: run-llama#10806

Signed-off-by: Eric Brown <eric_wade_brown@yahoo.com>
2 participants