debut-0.9.4 not detecting GPLv2 from copyright texts #11

Closed
rnjudge opened this issue Mar 3, 2021 · 2 comments

rnjudge commented Mar 3, 2021

Tern uses the debut package to parse Debian copyright files and find package licenses. I understand that debut is now debian-inspector, but as far as I can tell the code is the same at the moment, so I am opening the issue in this repo. Debut does not find a license in the following copyright text (libgpm2copyright.txt) from the libgpm2 package. Here is how we collect the licenses, which yields no results:

>>> from debut import debcon
>>> from debut import copyright as debut_copyright

>>> with open('libgpm2copyright.txt') as file:
...     libgpm2copy = file.read()

>>> collected_paragraphs = []
>>> for paragraph in debcon.get_paragraphs_data(libgpm2copy):
...     if 'license' in paragraph:
...         cp = debut_copyright.CopyrightLicenseParagraph.from_dict(paragraph)
...         collected_paragraphs.append(cp)
>>> collected_paragraphs
[CopyrightLicenseParagraph(license=LicenseField(name='', text=None), comment=FormattedTextField(text=None), extra_data={})]


>>> deb_pkg_data = debut_copyright.DebianCopyright(collected_paragraphs).to_dict()
>>> deb_pkg_data
{'paragraphs': [{'license': '', 'comment': ''}]}

Is it possible for debian-inspector to parse licenses from this text?

Member

pombredanne commented Mar 3, 2021

@rnjudge Hi! 👋 and thanks for the report!
This is one of the many unstructured copyright files.
There are two things we can do:

  1. Try harder to infer some structure from this file; see "Recover parsing from almost machine-readable copyright files" (#6).
  2. Improve license detection in Debian copyright files; see "Improve quality and tracing of license detection in Debian copyright files" (scancode-toolkit#2390).
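
For illustration, here is a minimal sketch of how a caller might tell structured from unstructured copyright files before parsing. It is not debut or debian-inspector code: `looks_machine_readable` and `FORMAT_FIELDS` are hypothetical names, and the sketch assumes that a machine-readable file opens with a `Format:` field (or `Format-Specification:` in older drafts of the format).

```python
# Field names that may open a machine-readable copyright file.
# "Format" is the field used by the 1.0 copyright format; treating
# "Format-Specification" as the older-draft equivalent is an assumption.
FORMAT_FIELDS = ("format:", "format-specification:")


def looks_machine_readable(text):
    """Return True if the first non-blank line is a format header field."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            # Only the first non-blank line decides the outcome.
            return stripped.lower().startswith(FORMAT_FIELDS)
    # Empty or whitespace-only text has no header at all.
    return False
```

A file that fails this check would need one of the two approaches listed above rather than field-based parsing.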

Separately these are related:

pombredanne pushed a commit that referenced this issue May 12, 2021
@AyanSinhaMahapatra
Member

We now correctly process and report the license in unstructured copyright files; see https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/debian_copyright.py#L393
Since these files have no structure whatsoever, we recover from the failed parse and handle them differently.
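
As a rough illustration of the idea of handling free text separately, the sketch below scans unstructured text for a GPL mention followed by a version number. This is not how scancode-toolkit detects licenses; `guess_gpl_version` and both patterns are hypothetical, and keyword matching like this is far weaker than real license detection.

```python
import re

# Hypothetical patterns, for illustration only.
GPL_NAME = re.compile(r"GNU\s+General\s+Public\s+License", re.IGNORECASE)
GPL_VERSION = re.compile(r"version\s+(\d+)", re.IGNORECASE)


def guess_gpl_version(text):
    """Return e.g. 'GPL-2' if the text names the GPL and then a version, else None."""
    name = GPL_NAME.search(text)
    if not name:
        return None
    # Look for the version only after the license name was mentioned.
    version = GPL_VERSION.search(text, name.end())
    if not version:
        return None
    return f"GPL-{version.group(1)}"
```

Such a fallback would only run on files where field-based parsing finds no license.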

Some files, such as the one for pulseaudio, are semi-structured, and we have #6 open for this.
Here's the result of scanning the copyright file mentioned in the issue:
libgpm2_copyright.json

Other related issues are tracked elsewhere, so closing for now. Thanks!
