-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor scancode license texts #4523
Conversation
@fviernau @sschuberth This is a proposal for how we could decouple the resolving of ScanCode license texts from the imported files, as a preparation for making the ScanCode version configurable. Please have a look and provide feedback, no need for code review until we agree on the approach. As an alternative we could also keep the imported texts and read license texts from the ScanCode installation only as a fallback, in case the installed version differs from the version for which we imported the files. |
cbd8f8d
to
6b3300c
Compare
I agree with the approach.
I like the fact that we can get rid of the license resources this way. After all, ScanCode is just one of the scanners that can be used, so I always felt a bit uncomfortable giving its license texts a special role in ORT. |
I also agree with the approach. I have some use cases and questions I'd like to discuss though:
|
I don't believe that's necessary. I'd rather make the logic more generic, e.g. by recursively searching for a fixed example license file below the installation directory, and then using that directory from then on to look up license files.
We still have that mechanism in place, but we should indeed discuss whether we keep the bootstrapping logic in general for command line tools, scanners and other tools.
That should be as simple as putting the desired ScanCode version in PATH / make it the first one found via PATH.
I don't think so, as also currently we do not have a mechanism to change the path for license files in tests. But also see my previous answer, we could tamper with PATH in tests after all. |
val scanCodeDir = getPathFromEnvironment("scancode")?.parentFile?.absoluteFile | ||
|
||
return scanCodeDir?.resolve("src/licensedcode/data/licenses")?.takeIf { it.isDirectory } | ||
?: scanCodeDir?.resolve("../lib/site-packages/licensedcode/data/licenses")?.takeIf { it.isDirectory } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pombredanne I'm looking for a simple heuristic to locate the ScanCode license directory based on the path of the scancode
executable. The two paths above seem to work for the PIP install and the binary distribution of ScanCode. Are there any other paths you would suggest adding here? It's good enough it works for the standard ways to install ScanCode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's sounds fine. We could also publish a tarball of the licenses alone if this helps? There is also https://scancode-licensedb.aboutcode.org/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the link, I think we could use this as a fallback in case the license directory can not be found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, and it will soon be also available at https://licensedb.org which has kindly been transferred and gifted over by the previous owner!
The thing is that the directory is not always below the path of ScanCode, and in most cases a simple heuristic might be good enough. We could add a way to configure the path to the licenses if required.
But in general the bootstrapping topic is separate to this change.
I don't think we need this for the ScanCode path, but maybe for the path to the licenses. |
Agreed. I was meaning to say we need to discuss this, but outside the scope of this PR.
Well, with your current code the license path is calculated relative to the |
6a8f3f1
to
bea7bd3
Compare
I have updated the PR to continue using the ScanCode version of the SPDX license texts. I still have to fix some tests before removing the draft status. |
60ef576
to
cdbf7ee
Compare
cdbf7ee
to
45a9d58
Compare
45a9d58
to
d67277b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I stopped the review for now at spdx-utils: Remove the task to generate license ref text resources
.
@@ -21,6 +21,7 @@ variables: | |||
GO_DEP_VERSION: 0.5.4 | |||
PYTHON_PIPENV_VERSION: 2018.11.26 | |||
RUST_VERSION: 1.35.0 | |||
SCANCODE_VERSION: 3.2.1rc2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be squashed to the previous commit, so really each commit in this PR would pass tests on its own?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is this change really necessary as ScanCode still would get bootstrapped, or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be squashed to the previous commit, so really each commit in this PR would pass tests on its own?
I have instead merged it with a new commit that removes the committed ScanCode license files which I forgot to do previously. FYI, I have also rewritten the commit that introduces the function to locate the ScanCode license dir to automatically do this in the license text functions. Otherwise all callers of those functions would have had to do this. This was actually the reason why this was put on hold until the utils refactoring was done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is this change really necessary as ScanCode still would get bootstrapped, or?
Only by the ScanCodeScannerFunTest
, but this would not help for the license text related tests which do not call the ScanCode
binary at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like detecting the ScanCode dir does not work on CI, I'll have to look into this next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit message nit: the "previous commit" reference does not seem right anymore.
1535874
to
d3146cf
Compare
33a2292
to
52d0dae
Compare
Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.io>
Add support for multiple license text directories instead of only a single custom directory, to prepare for an upcoming change. Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.io>
Do not require that the license text filename matches the license id exactly, but also allow files without the LicenseRef prefix. This is a preparation for using the ScanCode license text directory where the filenames do not have the LicenseRef prefix. Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.io>
Try to locate the directory which contains the ScanCode license texts and use it for resolving license texts. This is a preparation for removing the ScanCode license texts from the spdx-utils resources. The advantage of the new approach is that the license texts always match the installed ScanCode version. Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.io>
The license texts can instead be loaded from an installed ScanCode since the previous commit. This requires to install ScanCode for `LinuxTest` and `WindowsTest` on Azure pipelines, as otherwise tests related to ScanCode licenses would fail. Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.io>
The task is not required anymore as of the previous commit. Note that the ScanCode license texts are still downloaded because they are used for generating the SPDX license text resources. Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.io>
Add a Gradle property to configure which version of the SPDX license list is imported. This allows to import a specific release instead of always importing the latest master. The property must point to a valid tag [1] in the license-list-data repository. [1]: https://github.com/spdx/license-list-data/tags Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.io>
…ation" This reverts commit ded4f20. Since the ScanCode license texts are now only used for the SPDX licenses it is not required anymore to sync them with the ScanCode version. Instead, it is preferrable to use the latest version of those license texts to get the best quality. Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.io>
Done by running "./gradlew :utils:spdx:generateSpdxEnums". Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.io>
And upgrade the SPDX license text resources by running "./gradlew :utils:spdx:generateSpdxEnums". Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.io>
52d0dae
to
2ad1900
Compare
Rename the test to match the name of the file which contains the functions under test. This is a fixup for d564080 where `SpdxUtils.kt` was renamed to `Utils.kt`. Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.io>
2ad1900
to
a9aa52d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve as I tested that it works with our internal setup. I didn't review it in detail.
Merged, ignoring the unrelated failing test. |
ScanCode may return SPDX license containing underscores characters which is not allwed, see [1]. This results in ORT's scanner crashing due to an SpdxException when it tries to parse the SPDX license key. This issue has been first occured in 2020 and fixed by [2]. It got re-introduced recently by [3]. Fix the issue based on the idea of the orignal fix [2]. The touched function becomes less efficient. Delierately don't refactor for efficiency because this fix can be reverted as soon as [3] is fixed. [1] aboutcode-org/scancode-toolkit#2813 [2] fb0370f [3] #4523 Signed-off-by: Frank Viernau <frank.viernau@here.com>
ScanCode may return SPDX license keys containing underscores characters which is not allwed, see [1]. This results in ORT's scanner crashing due to an SpdxException when it tries to parse the SPDX license key. This issue has been first occured in 2020 and fixed by [2]. It got re-introduced recently by [3]. Fix the issue based on the idea of the orignal fix [2]. The touched function becomes less efficient. Delierately don't refactor for efficiency because this fix can be reverted as soon as [3] is fixed. [1] aboutcode-org/scancode-toolkit#2813 [2] fb0370f [3] #4523 Signed-off-by: Frank Viernau <frank.viernau@here.com>
ScanCode may return SPDX license keys containing underscores characters which is not allwed, see [1]. This results in ORT's scanner crashing due to an SpdxException when it tries to parse the SPDX license key. This issue has been first occured in 2020 and fixed by [2]. It got re-introduced recently by [3]. Fix the issue based on the idea of the orignal fix [2]. The touched function becomes less efficient. Delierately don't refactor for efficiency because this fix can be reverted as soon as [3] is fixed. [1] aboutcode-org/scancode-toolkit#2813 [2] fb0370f [3] #4523 Signed-off-by: Frank Viernau <frank.viernau@here.com>
ScanCode may return SPDX license keys containing underscores characters which is not allwed, see [1]. This results in ORT's scanner crashing due to an SpdxException when it tries to parse the SPDX license key. This issue has first occured in 2020 and been fixed by [2]. It got re-introduced recently by [3]. Fix the issue based on the idea of the orignal fix [2]. The touched function becomes less efficient. Delierately don't refactor for efficiency because this fix can be reverted as soon as [3] is fixed. [1] aboutcode-org/scancode-toolkit#2813 [2] fb0370f [3] #4523 Signed-off-by: Frank Viernau <frank.viernau@here.com>
ScanCode may return SPDX license keys containing underscores characters which is not allwed, see [1]. This results in ORT's scanner crashing due to an SpdxException when it tries to parse the SPDX license key. This issue has first occured in 2020 and been fixed by [2]. It got re-introduced recently by [3]. Fix the issue based on the idea of the orignal fix [2]. The touched function becomes a bit less efficient which is not an issue for the current callers / use cases. However, a consequtive look-up of many license texts would probably more efficient if the files in the directories were listed only once. [1] aboutcode-org/scancode-toolkit#2813 [2] fb0370f [3] #4523 Signed-off-by: Frank Viernau <frank.viernau@here.com>
ScanCode has one SPDX license key containing an underscore characters which is not allwed, see [1]. This results in ORT's scanner crashing due to an SpdxException when it tries to parse the SPDX license key. This issue has first occured in 2020 and been fixed by [2]. It got re-introduced recently by [3]. Deliberatly don't fix the general problem with underscores in `getLicenseTextFile()` in favor of a license ID specific work around, because this can be implemented efficiently without doing a refactoring first. [1] aboutcode-org/scancode-toolkit#2813 [2] fb0370f [3] #4523 Signed-off-by: Frank Viernau <frank.viernau@here.com>
ScanCode has one SPDX license key containing an underscore characters which is not allwed, see [1]. This results in ORT's scanner crashing due to an SpdxException when it tries to parse the SPDX license key. This issue has first occured in 2020 and been fixed by [2]. It got re-introduced recently by [3]. Deliberatly don't fix the general problem with underscores in `getLicenseTextFile()` in favor of a license ID specific work around, because this can be implemented efficiently without doing a refactoring first. [1] aboutcode-org/scancode-toolkit#2813 [2] fb0370f [3] #4523 Signed-off-by: Frank Viernau <frank.viernau@here.com>
ScanCode has one SPDX license key containing an underscore characters which is not allwed, see [1]. This results in ORT's scanner crashing due to an SpdxException when it tries to parse the SPDX license key. This issue has first occured in 2020 and been fixed by [2]. It got re-introduced recently by [3]. Deliberatly don't fix the general problem with underscores in `getLicenseTextFile()` in favor of a license ID specific work around, because this can be implemented efficiently without doing a refactoring first. [1] aboutcode-org/scancode-toolkit#2813 [2] fb0370f [3] #4523 Signed-off-by: Frank Viernau <frank.viernau@here.com>
Read the ScanCode license texts from the ScanCode installation instead of importing them to spdx-utils. Please see the commit messages for details.