diff --git a/CHANGELOG.rst b/CHANGELOG.rst index bb0a8b9f619..60aed28016a 100644 --- a/CHANGELOG.rst +++ b/CHANGELOG.rst @@ -5,16 +5,12 @@ Changelog 31.0.0 (next, roadmap) ----------------------- -Important API changes: -~~~~~~~~~~~~~~~~~~~~~~~~ +This is a major release with important bug and security fixes, new and improved +features and API changes. -- Adopted the new skeleton from https://github.com/nexB/skeleton - The key change is the location of the virtual environment. It used to be - created at the root of the scancode-toolkit directory. It is now created - under the ``venv`` subdirectory. -- The main package API function `get_package_infos` is deprecated, and - replaced by `get_package_data`. +Important API changes: +~~~~~~~~~~~~~~~~~~~~~~~~ - The data structure of the JSON output has changed for copyrights, authors and holders. We now use a proper name for attributes and not a generic "value". @@ -31,14 +27,14 @@ Important API changes: rather than "packages". This has all the data attributes of a "package_data" field plus others: "package_uuid", "package_data_files" and "files". -- There is a a new top-level "packages" attribute that contains package - instances that can be aggregating data from multiple manifests. + - There is a new top-level "packages" attribute that contains package + instances that can aggregate data from multiple manifests. -- There is a a new top-level "dependencies" attribute that contains each dependency - instance, these can be standalone or releated to a package. + - There is a new top-level "dependencies" attribute that contains each + dependency instance; these can be standalone or related to a package. -- There is a new resource-level attribute "for_packages" which refers to packages - through package_uuids (pURL + uuid string). + - There is a new resource-level attribute "for_packages" which refers to + packages through package_uuids (pURL + uuid string).
- The data structure for HTML output has been changed to include emails and urls under the "infos" object. The HTML template displays output for holders, @@ -48,12 +44,18 @@ Important API changes: column to "path". "copyright_holder" has been renamed to "holder" - The license clarity scoring plugin has been overhauled to show new license - clarity criteria. More details of the new criteria are provided below. + clarity criteria. More details of the new scoring criteria are provided below. -- The functionality of the summary plugin has been changed to provide declared - origin information for the codebase being scanned. The previous summary plugin - functionality has been preserved in the new ``tallies`` plugin. More details - are provided below. +- The functionality of the summary plugin has been improved to provide declared + origin and license information for the codebase being scanned. The previous + summary plugin functionality has been preserved in the new ``tallies`` plugin. + More details are provided below. + +- ScanCode has adopted the new code skeleton from https://github.com/nexB/skeleton + The key change is the location of the virtual environment. It used to be + created at the root of the scancode-toolkit directory. It is now created + under the ``venv`` subdirectory. You must be aware of this if you use ScanCode + from a git clone. Copyright detection: @@ -76,7 +78,7 @@ License detection: - XXXX new license detection rules have been added, and - XXXX existing license rules have been updated. - XXXX existing false positive license rules have been removed (see below). - - The SPDX license list has been updated to the latest v3.15 + - The SPDX license list has been updated to the latest v3.16 - The rule attribute "only_known_words" has been renamed to "is_continuous" and its meaning has been updated and expanded.
A rule tagged as "is_continuous" can only @@ -85,10 +87,10 @@ License detection: The processing for "is_continous" has been merged in "key phrases" processing below. -- Key phrases can now be defined in RULEs by surrounding one or more words with - `{{` and `}}`. When defined a RULE will only match when the key phrases match - exactly. When all the text of rule is a "key phrase", this is the same as being - "is_continuous". +- Key phrases can now be defined in a RULE text by surrounding one or more words + with double curly braces `{{` and `}}`. When defined, a RULE will only match + when the key phrases match exactly. When all the text of a rule is a "key phrase", + this is the same as being "is_continuous". - The "--unknown-licenses" option now also detects unknown licenses using a simple and effective ngrams-based matching in area that are not matched or @@ -135,6 +137,7 @@ License detection: tagged and they may not be detected unless you activate this new indexing feature. + Package detection: ~~~~~~~~~~~~~~~~~~ @@ -172,14 +175,14 @@ Package detection: License Clarity Scoring Update ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- We are moving away from the license clarity scoring defined by ClearlyDefined - in the license clarity score plugin. The previous license clarity scoring - logic produced a score that was misleading when it would return a low score - due to the stringent scoring criteria. We are now - using more general criteria to get a sense of what provenance information has - been provided and whether or not there is a conflict in licensing between - what licenses were declared at the top-level key files and what licenses have - been detected in the files under the top-level. +- We are moving away from the original license clarity scoring designed for + ClearlyDefined in the license clarity score plugin. The previous license + clarity scoring logic produced a score that was misleading when it would + return a low score due to the stringent scoring criteria.
We are now using + more general criteria to get a sense of what provenance information has been + provided and whether or not there is a conflict in licensing between what + licenses were declared at the top-level key files and what licenses have been + detected in the files under the top-level. - The license clarity score is a value from 0-100 calculated by combining the weighted values determined for each of the scoring elements: @@ -223,26 +226,33 @@ License Clarity Scoring Update - Conflicting license categories: - - When true, indicates that the declared license expression of the software is in - the permissive category, but that other potentially conflicting categories, - such as copyleft and proprietary, have been detected in lower level code. + - When true, indicates that the declared license expression of the software + is in the permissive category, but that other potentially conflicting + categories, such as copyleft and proprietary, have been detected in lower + level code. - Scoring Weight = -20 Summary Plugin Update ~~~~~~~~~~~~~~~~~~~~~ -The summary plugin's behavior has been changed. Previously, it provided a count -of the detected license expressions, copyrights, holders, authors, and -programming languages from a scan. We have preserved this functionality by -creating a new plugin called ``tallies``. All functionality of the previous -summary plugin have been preserved in the tallies plugin. -The plugin now attempts to determine a declared license expression, holder, and -primary programming language from a scan. The license clarity score provides -context on what origin information is provided from key files. It also returns -lists of tallies of the other detected license expressions, holders, and -programming languages. All information is provided in the codebase level -attribute named ``summary``. +- The summary plugin's behavior has been changed. 
Previously, it provided a + count of the detected license expressions, copyrights, holders, authors, and + programming languages from a scan. + + We have preserved this functionality by creating a new plugin called ``tallies``. + All functionality of the previous summary plugin has been preserved in the + tallies plugin. + +- The new summary plugin now attempts to determine a declared license expression, + declared holder, and the primary programming language from a scan. The + updated license clarity score provides context on the quality of the license + information provided in the codebase key files. + +- The new summary plugin also returns lists of tallies for the other "secondary" + detected license expressions, copyright holders, and programming languages. + +All summary information is provided in the codebase-level attribute named ``summary``. Outputs: @@ -258,7 +268,8 @@ Outputs: Output version -------------- -Scancode Data Output Version is now 1.0.0. +Scancode Data Output Version is now 2.0.0. + Changes: @@ -276,16 +287,22 @@ Documentation Update correct minor documentation issues. -Development environment changes: -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Development environment and Code API changes: +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- The main package API function `get_package_infos` is deprecated, and + replaced by `get_package_data`. + +- The Resource paths are always the same regardless of the strip-root or + full-root arguments. -- The license cache consistency is not checked anymore when you are using a Git +- The license cache consistency is not checked anymore when you are using a git checkout. The SCANCODE_DEV_MODE tag file has been removed entirely. Use instead the --reindex-licenses option to rebuild the license index. -- We can now regenerate updated test fixtures using the new SCANCODE_REGEN_TEST_FIXTURES - environment variable. There is no need to replace the regen=False with regen=True - in the code.
+- We can now regenerate test fixtures using the new SCANCODE_REGEN_TEST_FIXTURES + environment variable. There is no need to replace the regen=False with + regen=True in the code. 30.1.0 - 2021-09-25 diff --git a/README.rst b/README.rst index 7107bb02d7c..d9bc8ca2569 100644 --- a/README.rst +++ b/README.rst @@ -10,6 +10,12 @@ Read more about ScanCode here: https://scancode-toolkit.readthedocs.io/. Check out the code at https://github.com/nexB/scancode-toolkit +See also: + +- The ScanCode.io server project here: https://scancodeio.readthedocs.io +- Other companion SCA projects for code origin, license and security analysis + here: https://aboutcode.org + Build and tests status ====================== @@ -92,12 +98,15 @@ for upcoming features. Documentation ============= -The ScanCode documentation is hosted at `scancode-toolkit.readthedocs.io `_. +The ScanCode documentation is hosted at +`scancode-toolkit.readthedocs.io `_. -If you are new to Scancode, start `here `_. +If you are new to ScanCode, start with our +`newcomer `_ page. -If you want to compare output changes between different versions of Scancode, or want to look at reference scans -generated by Scancode, start `here `_. +If you want to compare output changes between different versions of ScanCode, +or want to look at scans generated by ScanCode, review our +`reference scans `_. Other Important Documentation Pages: