Improve copyrights detection #3752

Merged
merged 29 commits into develop from misc-copyrights
Jun 26, 2024

Conversation

Member

@pombredanne pombredanne commented Apr 26, 2024

This PR improves copyright detection

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in a uniquely-named feature branch and have no merge conflicts 📁
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

Reported-by: Anton Augsburg @vw-anton
Reference: #3655
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Reported-by: Dimitris Iliou @dimitris-iliou
Reference: #3735
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Spotted in some common python libraries such as numpy and scipy

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Use an input file where each line is either:
- a URL to fetch
- a text to test

Then generate a pair of test data files accordingly

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
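
The input-file workflow described in the commit message above could be sketched as follows. This is a minimal illustration, not the script added in this PR: the names `is_url`, `fetch`, and `build_test_pairs`, the hash-based file naming, and the `.expected` suffix are all assumptions made for the example.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def is_url(line):
    """Return True if the line looks like an http(s) URL to fetch."""
    parsed = urlparse(line.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def fetch(url):
    """Fetch a URL and return its content as text (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def build_test_pairs(input_file, output_dir, fetcher=fetch):
    """
    For each non-empty line of input_file, write a test file holding either
    the fetched URL content or the line text itself, plus an empty sibling
    ".expected" placeholder to be filled in by running detection.
    Return a list of (test_file, expected_file) path pairs.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    pairs = []
    for line in Path(input_file).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        content = fetcher(line) if is_url(line) else line
        # Name files after a hash of the input line so names are stable.
        name = hashlib.sha1(line.encode("utf-8")).hexdigest()[:10]
        test_file = output_dir / f"{name}.txt"
        expected_file = output_dir / f"{name}.txt.expected"
        test_file.write_text(content, encoding="utf-8")
        expected_file.write_text("", encoding="utf-8")
        pairs.append((test_file, expected_file))
    return pairs
```

The `fetcher` parameter is injectable so the sketch can be exercised without network access; the expected file is left empty here because the format of the real expected results is not described in this conversation.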
- Start detecting "is held by"
- Do not include some trailing junk

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
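
To illustrate the two points in the commit message above, here is a simplified regex-based sketch of matching an "is held by" statement and trimming trailing junk from the holder. It is not the detection code changed in this PR; the pattern, the `TRAILING_JUNK` character set, and the `find_holder` function are assumptions chosen only for illustration.

```python
import re

# Hypothetical pattern: a "copyright" mention followed by "is held by <holder>".
HELD_BY = re.compile(
    r"copyright\b.*?\bis\s+held\s+by\s+(?P<holder>.+)",
    re.IGNORECASE,
)

# Characters treated as trailing junk in this sketch (comment markers, punctuation).
TRAILING_JUNK = " \t.,;:*/#-"


def find_holder(line):
    """Return the holder named in an "is held by" statement, or None."""
    match = HELD_BY.search(line)
    if not match:
        return None
    # Strip trailing junk; return None if nothing is left.
    return match.group("holder").rstrip(TRAILING_JUNK) or None


print(find_holder("The copyright in this file is held by Example Corp. */"))
```

A line without the phrase, such as `Copyright 2024 Foo`, yields `None` in this sketch.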
Reference: #3764
Reported-by: Anton Augsburg @vw-anton
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Make detection of copyright with a single lowercase name more specific

Reference: #3764
Reported-by: Anton Augsburg @vw-anton
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
This makes copyright detection more specific

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Also improve NOTICEs, and other misc. variants
Do not detect "The Initial Developer"

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Reference: #3797
Reported-by: Jörg Arndt @Joerki
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Handle corner cases with markup
Detect new copyright forms.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Better handle various parens, markup and quotes

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Member

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

@pombredanne we need to fix the test failures here. After regenerating, some of these look like potential regressions to me, so these failures need more review.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne pombredanne changed the title from "Apply small copyrights detection improvements" to "Improve copyrights detection" on Jun 22, 2024
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne
Member Author

@AyanSinhaMahapatra ready for your review, all green

Member

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

LGTM! I have a couple small questions and fixes here.

tests/textcode/test_markup.py (outdated, resolved)
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

Co-authored-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Member

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

LGTM! Thanks++ @pombredanne This improves copyright detection a lot!
Merging!

@AyanSinhaMahapatra AyanSinhaMahapatra merged commit 1242518 into develop Jun 26, 2024
34 checks passed
@AyanSinhaMahapatra AyanSinhaMahapatra deleted the misc-copyrights branch June 26, 2024 10:57
2 participants