Possible false positive copyright? #1669

Open
pombredanne opened this issue Aug 1, 2019 · 2 comments

Comments

@pombredanne
Member

pombredanne commented Aug 1, 2019

Description

This problem was found in ClearlyDefined by @bduranc.

Here is the gist of a chat conversation:

duranbc: @philippe: Not necessarily a bug, but perhaps an interesting observation.
The following component shows (C) Hewlett-Packard; however, the statement appears to be extracted from the metadata of a JPEG image.
CD Definition: https://clearlydefined.io/definitions/git/github/h2non/filetype/b2d66fbb5aed4a1fe9f0ca71a109b9161373866a
If you look inside the file "sample.tar", you will find a few JPEG images. Their metadata contains the HP copyright statement.
https://github.com/h2non/filetype/blob/b2d66fbb5aed4a1fe9f0ca71a109b9161373866a/fixtures/sample.tar
I did not know CD went this deep into binary files, and had not noticed this before.

Philippe:
Actually, this is ScanCode diving into these binaries quite happily (it is designed to do this).

So "Copyright (c) 1998 Hewlett-Packard Company" is a correct detection there. But in a test sample image... this is also noise, isn't it?
Actually, this very specific copyright "Copyright (c) 1998 Hewlett-Packard Company" in a JPEG file is possibly a problem: https://www.flickr.com/help/forum/en-us/7987/
It seems this is present in many JPEG files, but at the same time it is not in the EXIF metadata of the image.
Net-net, I think this should always be excluded when found verbatim in a JPEG file.
Or not... there seems to be more to it.
See nodejs/node#5749
So ScanCode may not be wrong there, and there is possibly a weird underlying issue. That said, these images (the ones you found) are test fixtures and therefore not core code, so this is minor in this case.

@pombredanne
Member Author

Here is the output of ImageMagick identify:

$ identify -verbose sample.jpg 
Image: sample.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Mime type: image/jpeg
  Class: DirectClass
  Geometry: 600x400+0+0
....
    icc:copyright: Copyright (c) 1998 Hewlett-Packard Company
    icc:description: sRGB IEC61966-2.1
    icc:manufacturer: IEC http://www.iec.ch
    icc:model: IEC 61966-2.1 Default RGB colour space - sRGB
....
  Version: ImageMagick 6.8.9-9 Q16 x86_64 2019-06-15 http://www.imagemagick.org
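To show where the `icc:copyright` string physically lives, here is a minimal Python sketch (stdlib only) that reassembles the ICC profile from a JPEG's APP2 `ICC_PROFILE` segments and reads the profile's `cprt` tag. It assumes a well-formed JPEG with no fill bytes between header markers, and a profile whose `cprt` tag uses the ICC v2 `text` type; v4 `mluc` tags are not handled. The function names `extract_icc_profile` and `icc_copyright` are hypothetical, not part of ScanCode or ImageMagick.

```python
import struct
from typing import Optional


def extract_icc_profile(jpeg: bytes) -> Optional[bytes]:
    """Reassemble an ICC profile from a JPEG's APP2 "ICC_PROFILE" segments."""
    if jpeg[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG stream")
    chunks = {}
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # SOS or EOI: no more header segments to read
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])  # includes its own 2 bytes
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE2 and payload.startswith(b"ICC_PROFILE\x00"):
            # 12-byte identifier, then 1-byte sequence number and 1-byte chunk count
            chunks[payload[12]] = payload[14:]
        pos += 2 + length
    if not chunks:
        return None
    return b"".join(chunks[seq] for seq in sorted(chunks))


def icc_copyright(profile: bytes) -> Optional[str]:
    """Return the text of the 'cprt' tag if it uses the ICC v2 'text' type."""
    if len(profile) < 132:
        return None
    # The tag count and tag table follow the fixed 128-byte profile header.
    (tag_count,) = struct.unpack(">I", profile[128:132])
    for i in range(tag_count):
        entry = 132 + 12 * i
        sig, offset, size = struct.unpack(">4sII", profile[entry:entry + 12])
        if sig == b"cprt":
            data = profile[offset:offset + size]
            if data[:4] != b"text":
                return None  # e.g. ICC v4 'mluc' type, not handled in this sketch
            # 'text' signature + 4 reserved bytes, then NUL-terminated ASCII
            return data[8:].split(b"\x00", 1)[0].decode("ascii", "replace")
    return None
```

Reading a real file would look like `icc_copyright(extract_icc_profile(open("sample.jpg", "rb").read()) or b"")`. The point of the sketch is that the statement sits inside an embedded color profile, not in the image's own EXIF fields.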

@pombredanne
Member Author

This copyright comes from the ICC profile (https://en.wikipedia.org/wiki/ICC_profile) embedded in the image and does not really apply to the image proper. Possible ways to handle this:

  1. add an option to ignore copyrights detected in images
  2. specifically ignore icc:copyright when found in an image
  3. provide a generic way to ignore certain copyrights in some contexts, based on an option and configuration
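Options 1 and 2 can both be expressed as instances of option 3. Below is a minimal Python sketch of such a configurable filter, applied to already-detected copyright statements. `IgnoreRule` and `filter_copyrights` are hypothetical names, not ScanCode's API; the sketch matches on file-name globs and the exact statement text only, so it cannot tell whether a statement actually came from an ICC profile (a field-aware version of option 2 would need that provenance from the scanner).

```python
import fnmatch
import os
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class IgnoreRule:
    """Hypothetical config entry: drop a detected copyright in matching files."""
    path_patterns: Tuple[str, ...]  # lowercase globs on the file name, e.g. "*.jpg"
    statement: str = ""  # exact statement to drop; empty means drop every statement


def filter_copyrights(path: str, detected: Iterable[str], rules: Iterable[IgnoreRule]) -> List[str]:
    """Return the detected statements that no rule asks to ignore for this path."""
    name = os.path.basename(path).lower()
    rules = tuple(rules)  # iterated once per statement

    def ignored(statement: str) -> bool:
        return any(
            any(fnmatch.fnmatchcase(name, pattern) for pattern in rule.path_patterns)
            and (not rule.statement or statement.strip() == rule.statement)
            for rule in rules
        )

    return [s for s in detected if not ignored(s)]


# Option 2 as a rule: the ICC profile statement, only inside JPEG files.
ICC_IN_JPEG = IgnoreRule(("*.jpg", "*.jpeg"), "Copyright (c) 1998 Hewlett-Packard Company")
# Option 1 as a rule: ignore every copyright found in common image files.
ANY_IN_IMAGES = IgnoreRule(("*.jpg", "*.jpeg", "*.png", "*.gif"))
```

Keeping the rules as data rather than hard-coding the HP string means the same mechanism could cover other well-known embedded-profile or fixture noise without code changes.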
