the --ignore behaviour needs refinement #726

Open
pombredanne opened this issue Aug 17, 2017 · 8 comments

Comments

@pombredanne
Member

The glob patterns are applied to each path segment, making it impossible to ignore a path with more than one segment. They should instead be applied to the whole path at once, which would be more useful and intuitive.
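
To make the difference concrete, here is a minimal Python sketch of the two matching strategies. It is a simplified model built on the standard `fnmatch` module, not ScanCode's actual code; the function names `ignored_per_segment` and `ignored_whole_path` are made up for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def ignored_per_segment(path, pattern):
    """Return True if any single segment of the path matches the pattern."""
    return any(fnmatchcase(segment, pattern) for segment in PurePosixPath(path).parts)


def ignored_whole_path(path, pattern):
    """Return True if the pattern matches the path as a whole.

    Patterns without a "/" keep the per-segment behaviour, so a plain
    "*.min.css" still matches a file at any depth.
    """
    if "/" in pattern:
        return fnmatchcase(path, pattern)
    return ignored_per_segment(path, pattern)


path = "app/node_modules/spdx-license-list/licenses/MIT.txt"
pattern = "app/node_modules/spdx-license-list/licenses/*"

# No single segment contains a "/", so a multi-segment pattern can never match.
print(ignored_per_segment(path, pattern))  # False
# Matching against the whole path gives the expected result.
print(ignored_whole_path(path, pattern))   # True
```

Note that `fnmatch` lets `*` match across `/`, which differs from .gitignore, where `*` stops at a directory separator.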

@yash-nisar
Contributor

@pombredanne Can you please elaborate with an example to make things clearer?

@pombredanne
Member Author

@yash-nisar I'm not sure exactly, in fact. Maybe you can come up with some tests to see whether this always works more or less like .gitignore?

@mtenberge

+1 on this one...

Use case: one of our projects (a Go + NPM combination) includes the npm package spdx-license-list, which includes a subdirectory with numerous existing license texts. Furthermore, the main directory of this package holds some json files containing the same license texts. Of course, these files give false positives. But I can't ignore the whole package subdirectory, because the package also has one LICENSE.txt file for the package itself...

Besides that, I'd like to exclude minified CSS files because they return the whole file instead of only the matched license text (because of the missing line breaks). The same license info is also in the non-minified files, so I'm not missing anything there by doing that...

And finally, the build and test process results in a few (intermediate/final) output files, which I'd like to ignore as well.

So ideally, my ignores would look something like:

scancode -clpi --json-pp $SCANRESULTSFILE --license-text -n 4 --consolidate \
  --summary --filter-clues --is-license-text \
  --ignore "controller/.wwhrd.yml" \
  --ignore "controller/.testCoverage.txt" \
  --ignore "controller/coverage.html" \
  --ignore "controller/coverage_by_function.txt" \
  --ignore "controller/.testOutput.txt" \
  --ignore "controller/.coverage.tmp" \
  --ignore "controller/controller" \
  --ignore "app/build/*" \
  --ignore "app/node_modules/spdx-license-list/licenses/*" \
  --ignore "app/node_modules/spdx-license-list/spdx*.json" \
  --ignore "app/node_modules/type-detect/package.json" \
  --ignore "*.min.css" \
  --ignore "*.js.map" \
  --ignore "*.mjs.map" \
  controller app

Needless to say, this currently doesn't work: the --ignore entries containing / are not considered, and those files are not excluded.

@pombredanne
Member Author

pombredanne commented Jun 10, 2021

@mtenberge thanks for the details! I reckon it has been surprisingly hard to get these ignores right.
I wish there were a good lib for this.
Your example makes me think we should also have a way to store these in a config file!

@mtenberge

Thanks for your response. The possibility of an ignore file, with comparable functionality to e.g. .gitignore, would be nice. But as you can see, providing these options from a bash script is very workable too.

As an alternative to a full-fledged implementation of includes, excludes and config files and the corresponding include/exclude logic, maybe it would be easier to accept a list of files to be scanned from a file? This would allow existing powerful utilities, such as find (including its prune action and all its testing possibilities), grep (with its regexp support) etc. to be used to prepare the list. For this to work correctly, no recursion/tree walking should be performed by scancode itself (or it would again include files that were just removed from the list).

Scancode can already accept a list of to-be-scanned files/directories on its command line, but this can't be used for a large number of files, because the length of the command line is limited.
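
As a rough illustration of the "prepare the list outside of scancode" idea, the following Python sketch walks a tree, prunes ignored directories much like the prune action of find, and writes one path per line to a file. Everything here is an assumption made for illustration: the helper names are hypothetical, and how scancode would read such a list is exactly the open proposal above, not an existing option.

```python
import os
from fnmatch import fnmatchcase


def is_ignored(path, patterns):
    """Return True if the POSIX-style relative path matches any glob pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def list_files_to_scan(root, ignore_patterns):
    """Yield POSIX-style paths, relative to root, that match no ignore pattern."""
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root).replace(os.sep, "/")
        prefix = "" if rel_dir == "." else rel_dir + "/"
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = sorted(
            name for name in dirnames if not is_ignored(prefix + name, ignore_patterns)
        )
        for name in sorted(filenames):
            path = prefix + name
            if not is_ignored(path, ignore_patterns):
                yield path


def write_file_list(root, ignore_patterns, list_path):
    """Write the files to scan to list_path, one path per line."""
    with open(list_path, "w", encoding="utf-8") as out:
        for path in list_files_to_scan(root, ignore_patterns):
            out.write(path + "\n")
```

For example, `write_file_list(".", ["app/build", "*.min.css"], "/tmp/files-to-scan.txt")` would skip the whole `app/build` directory and every minified CSS file. The list file should live outside the scanned tree, otherwise it ends up listing itself.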

@mtenberge

mtenberge commented Jun 11, 2021

see also #1392 and #2454

@Blackclaws

Wouldn't it make sense to allow specification of an ignore file with the same syntax as a .gitignore file (which incidentally is the same as dockerignore and some other ignore formats)? There are cases where you want to scan a checked-out git repo on which you have already run some builds, and which is therefore full of files that aren't really part of the package anyway.

@pombredanne
Member Author

@Blackclaws re:

Wouldn't it make sense to allow specification of an ignore file with the same syntax as a .gitignore file (which incidentally is the same as dockerignore and some other ignore formats)?

Yes, 100% agree. This is exactly what we want to have. There has been some work towards enabling this with aboutcode-org/commoncode#42
Next will be to find a decent library, drop the custom ignores, and also use a dot ignore file. Would a .scanignore or a .scancodeignore work as a name?
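
For discussion, a dot ignore file could be read along these lines. This is only a sketch under stated assumptions: the file format (one glob per line, `#` comments, blank lines skipped) and the function names are hypothetical, and it deliberately does not implement full .gitignore semantics such as `!` negation, `**`, leading-slash anchoring, or trailing-slash directory rules. A dedicated library would be needed for those.

```python
from fnmatch import fnmatchcase


def parse_ignore_lines(lines):
    """Return the patterns from ignore-file lines, skipping blanks and # comments."""
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(path, patterns):
    """Return True if the POSIX-style path, or its file name, matches a pattern."""
    name = path.rsplit("/", 1)[-1]
    return any(
        fnmatchcase(path, pattern) or fnmatchcase(name, pattern)
        for pattern in patterns
    )


# Hypothetical content of a .scancodeignore file.
ignore_file_text = """\
# build and test output
app/build/*
controller/coverage.html

# minified assets
*.min.css
"""

patterns = parse_ignore_lines(ignore_file_text.splitlines())
print(patterns)  # ['app/build/*', 'controller/coverage.html', '*.min.css']
print(is_ignored("app/build/main.js", patterns))  # True
print(is_ignored("app/src/main.js", patterns))    # False
```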
