Track lines when parsing copyright files #13

Closed
pombredanne opened this issue Apr 19, 2021 · 2 comments

@pombredanne
Member

This would be helpful when reviewing results

@pombredanne
Member Author

I have started to work on this, as the lack of proper start and end line numbers in Debian copyright license detection makes reviewing test failures rather hard.

I searched for a library that tracks line numbers and could not find any, and none of the existing parsers seems really suitable for patching to add line numbers.

So my overall approach is going to be:

  • add a new module that can parse RFC822/deb822 formats and track lines for each element at the low level (essentially replacing our use of the standard email module)
  • create a new debcon2.py module modeled after debcon.py that will track line numbers at the paragraph and field levels
  • create a new copyright2.py module modeled after copyright.py that will track line numbers at the paragraph and field levels and is backed by debcon2.py
  • add tests
  • try copyright2.py in ScanCode for #2643
  • See if debcon2 and copyright2 can be dumbed down and also not track licenses when not needed and replace the older debcon and copyright. Otherwise, keep the debcon and copyright around with deprecation warnings and drop them in the future.
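
To make the first step of the plan concrete, here is a minimal sketch of a line-tracking parser for deb822-style text. It is only an illustration under simplifying assumptions, not the module that was eventually written: the names `Field` and `parse_deb822` are hypothetical, comment lines are skipped, and continuation lines are simply stripped and appended to the previous field.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Field:
    # Hypothetical container: a field plus the 1-based lines it spans.
    name: str
    value: str
    start_line: int
    end_line: int


def parse_deb822(text: str, start_line: int = 1) -> List[List[Field]]:
    """Return paragraphs as lists of Field objects carrying line numbers."""
    paragraphs: List[List[Field]] = []
    current: List[Field] = []
    for line_number, line in enumerate(text.splitlines(), start_line):
        if not line.strip():
            # A blank line closes the current paragraph, if any.
            if current:
                paragraphs.append(current)
                current = []
        elif line.startswith("#"):
            # Comment lines carry no field data.
            continue
        elif line[0] in " \t":
            # A continuation line extends the previous field and its end line.
            if current:
                last = current[-1]
                last.value += "\n" + line.strip()
                last.end_line = line_number
        else:
            # A new "Name: value" field starts and (so far) ends on this line.
            name, _, value = line.partition(":")
            current.append(
                Field(name.strip(), value.strip(), line_number, line_number)
            )
    if current:
        paragraphs.append(current)
    return paragraphs
```

Each field records where it starts and where its last continuation line ends, which is the information that is lost when going through the standard email parser.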

pombredanne added a commit that referenced this issue Sep 7, 2021
This adds support for tracking line numbers when processing copyright
files. The approach is to only support this for copyright files and
keep a mapping of start/end line numbers by field name at the paragraph
level. This way existing fields do not need modifications and the core
code update is when paragraphs are created which is limited to a single
place.

There is also a new deb822 module replacing the email parser to parse
copyright files and keep track of line numbers.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
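
The approach described in this commit message, a mapping of start/end line numbers keyed by field name and kept at the paragraph level, could look roughly like the sketch below. The `Paragraph` class, its attribute names, and the tuple input format are assumptions made for illustration and are not taken from the actual code.

```python
from typing import Dict, List, Optional, Tuple


class Paragraph:
    """Hypothetical paragraph: plain field values plus a side mapping of lines."""

    def __init__(self, fields: List[Tuple[str, str, int, int]]):
        # Field values stay plain strings, so field handling is unchanged.
        self.values: Dict[str, str] = {}
        # Line tracking lives in a separate mapping: name -> (start, end).
        self.line_numbers_by_field: Dict[str, Tuple[int, int]] = {}
        for name, value, start_line, end_line in fields:
            key = name.lower()
            self.values[key] = value
            self.line_numbers_by_field[key] = (start_line, end_line)

    def get_line_numbers(self, name: str) -> Optional[Tuple[int, int]]:
        """Return (start_line, end_line) for a field, or None if absent."""
        return self.line_numbers_by_field.get(name.lower())

    def span(self) -> Tuple[int, int]:
        """Return the first and last line covered by the whole paragraph."""
        starts = [start for start, _ in self.line_numbers_by_field.values()]
        ends = [end for _, end in self.line_numbers_by_field.values()]
        return min(starts), max(ends)
```

Because the line numbers sit beside the values rather than inside them, only the place where paragraphs are built needs to know about line tracking.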
pombredanne added a commit that referenced this issue Sep 14, 2021
Track line numbers in copyright files #13

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit to aboutcode-org/scancode-toolkit that referenced this issue Sep 14, 2021
This allows passing a string that we know starts at some line offset.
This is useful for Debian copyright parsing, for instance.

See: #2643
See: aboutcode-org/debian-inspector#13

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit to aboutcode-org/scancode-toolkit that referenced this issue Sep 14, 2021
Add new licensing.get_license_matches_from_query_string() function
that accepts a text and a start line (defaulting to 1). This is used
to build a Query which is then passed to the new Index.match_query()
method.

See: #2643
See: aboutcode-org/debian-inspector#13

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
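
The idea of a start line in the two commits above can be illustrated without any ScanCode code: when only a field value is passed for matching, line numbers found inside that string need to be shifted by the line where the string starts in the original file. The helper below is a hypothetical stand-in written for this illustration, not the ScanCode API.

```python
from typing import Optional, Tuple


def find_phrase_lines(
    text: str, phrase: str, start_line: int = 1
) -> Optional[Tuple[int, int]]:
    """Return absolute (start_line, end_line) of the first match, or None."""
    position = text.find(phrase)
    if position == -1:
        return None
    # Count the newlines before the match to get its line within the text,
    # then shift by the line at which the text starts in the original file.
    first = start_line + text.count("\n", 0, position)
    last = first + phrase.count("\n")
    return first, last
```

For example, a License field value that begins on line 6 of a copyright file would be searched with `start_line=6`, so that reported lines refer to the file rather than to the extracted string.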
@pombredanne
Member Author

This has been implemented and released
