Integrating the unknown license detection step (--unknown-licenses) more tightly with the current license detection post-processing, so that it is always used when we have no detections, or only sub-par detections made of multiple fragmented matches. This is complementary to step 1, which restricts false positive detections but also increases false negatives by discarding approximate matches. To cover those cases we need the unknown license detection, so that we don't lose one of the good things about scancode license detection: strong approximate matching.
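As a rough illustration of the "no detections or sub-par detections" condition above, here is a minimal sketch of such a fallback check. Everything in it is an assumption made for illustration: the `Match` class, the `needs_unknown_detection` function, and the `min_coverage` threshold are hypothetical and are not scancode's actual API or its actual criteria for a sub-par detection.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Match:
    """Hypothetical match summary (not a scancode class)."""
    rule_identifier: str
    coverage: float  # assumed: percentage of the rule text matched, 0-100


def needs_unknown_detection(matches: Sequence[Match], min_coverage: float = 90.0) -> bool:
    """Return True when the unknown license detection step should run.

    Assumed criteria, for illustration only:
    - there are no matches at all, or
    - there are several matches and every one of them is below min_coverage
      (a fragmented, sub-par detection).
    """
    if not matches:
        return True
    fragmented = len(matches) > 1
    all_weak = all(m.coverage < min_coverage for m in matches)
    return fragmented and all_weak
```

With these assumed criteria, a single full-coverage match does not trigger the fallback, while an empty result or a handful of low-coverage fragments does.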
There are two main steps/solution elements here, which are WIP/already implemented. We still need to put them together in a more effective way and test this better, both to make sure we're doing much better on false positives and to prove that we are not failing to detect any piece of license-related text due to the stricter restrictions put on the detection rules.
Further follow up is required to test/validate that false positives were actually reduced:
To test, we will run license detection on the existing scancode license rules selectively: we scan the group of rules that share a specific license_expression, with scancode configured to use a license index that leaves out the rules of that particular license_expression but contains all other rules.
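The hold-out procedure described above could be scripted roughly as follows. This is only a sketch under stated assumptions: `Rule`, `leave_one_expression_out`, and `collect_detections` are hypothetical names, and `detect` is a placeholder callable standing in for "run scancode license detection with an index built from the given rules"; none of this is scancode's real API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a license rule (not a scancode class)."""
    identifier: str
    text: str
    license_expression: str


def leave_one_expression_out(
    rules: Iterable[Rule],
) -> Iterator[Tuple[str, List[Rule], List[Rule]]]:
    """Yield (expression, held_out_rules, index_rules) for each license_expression.

    held_out_rules are the rules to scan; index_rules are all the other rules,
    i.e. the index with that license_expression left out.
    """
    rules = list(rules)
    groups: Dict[str, List[Rule]] = defaultdict(list)
    for rule in rules:
        groups[rule.license_expression].append(rule)
    for expression, held_out in groups.items():
        index_rules = [r for r in rules if r.license_expression != expression]
        yield expression, held_out, index_rules


def collect_detections(
    rules: Iterable[Rule],
    detect: Callable[[str, List[Rule]], List[str]],
) -> Dict[str, List[str]]:
    """Scan each held-out rule text against the reduced index.

    detect(text, index_rules) is assumed to return the detected license
    expressions. Only rules with at least one detection are reported, keyed
    by rule identifier, so that they can be reviewed.
    """
    report: Dict[str, List[str]] = {}
    for _expression, held_out, index_rules in leave_one_expression_out(rules):
        for rule in held_out:
            detected = detect(rule.text, index_rules)
            if detected:
                report[rule.identifier] = detected
    return report
```

Passing `detect` in as a callable keeps the partitioning logic independent of how the index is actually built and queried, so the same loop can be reused once the real detection call is plugged in.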
We also need to do a review pass on all the open license-detection-related issues and make sure they are either fixed and closed or summarized into further action items.
See also the long-running RFC issue on this topic: #2878, which has two more action items:
In short, this issue summarizes the remaining work on scancode license detection false positives.