Upgrade ScanCode-toolkit to version v31 #411
* Upgrade scancode-toolkit to latest beta release #411

Signed-off-by: Thomas Druez <tdruez@nexb.com>

* Add a test class to regen test data #411

Signed-off-by: Thomas Druez <tdruez@nexb.com>

* Upgrade container_inspector to latest 31.0.0 version #411

Signed-off-by: Thomas Druez <tdruez@nexb.com>

* Handle new scan format in scancode pipes #411

Signed-off-by: Jono Yang <jyang@nexb.com>

* Handle package_uids for DiscoveredPackages #411

    * Remove create_discovered_packages2 and create_codebase_resources2

Signed-off-by: Jono Yang <jyang@nexb.com>

* Update deprecated code #411

    * Normalize package_uids before comparing results in tests
    * Update expected test results

Signed-off-by: Jono Yang <jyang@nexb.com>

* Regenerate asgiref 3.3.0 test data #411

    * Mark ProjectCodebase tests with expectedFailure
    * We will revisit ProjectCodebase and update it to fit our current models

Signed-off-by: Jono Yang <jyang@nexb.com>

* Add asgiref-3.3.0_scancode_scan.json #411

    * We are using scancode scan results for tests since asgiref-3.3.0_scan.json is not in exactly the same format as scancode's JSON output

Signed-off-by: Jono Yang <jyang@nexb.com>

* Add asgiref-3.3.0_walk_test_fixtures.json #411

    * Update regen_test_data.py to generate asgiref-3.3.0_walk_test_fixtures.json

Signed-off-by: Jono Yang <jyang@nexb.com>

* Signed-off-by: Jono Yang <jyang@nexb.com>

* Update make_results_summary() #411

    * No need to explicitly get license_clarity_score in make_results_summary()
    * Update expected test results

Signed-off-by: Jono Yang <jyang@nexb.com>

* Exclude system_environment from diff #411

    * Add .vscode to .gitignore

Signed-off-by: Jono Yang <jyang@nexb.com>

* Upgrade scancode-toolkit and extractcode to latest version #411

Signed-off-by: Thomas Druez <tdruez@nexb.com>

* Update package_getter #434 #438

    * Adapt code from a previous version of scancode-toolkit for use in the Debian pipeline

Signed-off-by: Jono Yang <jyang@nexb.com>

* Allow packages to be created without versions #438

    * Update DiscoveredPackage.create_from_data to create packages without a version

Signed-off-by: Jono Yang <jyang@nexb.com>

* Update expected test results

Signed-off-by: Jono Yang <jyang@nexb.com>

* Report DiscoveredPackage correctly in summary #411

    * Ensure that DiscoveredPackages are reported only once in the scan_package pipeline summary
    * Add a test to check the key_file_packages field in the summary output

Signed-off-by: Jono Yang <jyang@nexb.com>

* Add test for docker pipeline for alpine #411

Signed-off-by: Jono Yang <jyang@nexb.com>

* Add docker pipeline test for rpm images #411

Signed-off-by: Jono Yang <jyang@nexb.com>

* Track package_uids in make_results_summary #435

    * Avoid checking whether the package_data dictionary is already in the key_files_packages list
    * Keep track of package_uids instead, as sketched below
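A minimal sketch of the dedup pattern described above; the variable names are illustrative and the actual make_results_summary() code may differ:

        seen_package_uids = set()
        key_files_packages = []
        for package_data in packages_data:
            package_uid = package_data.get("package_uid")
            if package_uid in seen_package_uids:
                # A package already collected for another key file is skipped.
                continue
            seen_package_uids.add(package_uid)
            key_files_packages.append(package_data)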

Signed-off-by: Jono Yang <jyang@nexb.com>

* Add truncated ubuntu docker image for testing #435

Signed-off-by: Jono Yang <jyang@nexb.com>

* Bump scancode and commoncode versions #435

Signed-off-by: Jono Yang <jyang@nexb.com>

* Update docker pipeline #435

    * We now run scancode-toolkit on the docker image resources using the new --system-package option (see the sketch after this list)
    * This gives us the installed system packages in the returned scan
    * We use the scan to create the DiscoveredPackages and CodebaseResources
    * The rest of the pipeline is unchanged
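A minimal sketch of the new scan step, reusing the run_scancode() helper visible later in this diff; the function name and option list here are illustrative, not the pipeline's exact code:

        from scanpipe.pipes import scancode

        def scan_image_resources(project):
            # Illustrative output location; the pipeline may name this differently.
            output_file = str(project.get_output_file_path("scancode", "json").absolute())
            scancode.run_scancode(
                location=str(project.codebase_path),
                output_file=output_file,
                # --system-package is the new ScanCode-toolkit v31 option that also
                # reports installed system packages (dpkg, rpm, apk) in the scan.
                options=["--info", "--package", "--system-package"],
                raise_on_error=True,
            )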

Signed-off-by: Jono Yang <jyang@nexb.com>

* Fix code validity #411

Signed-off-by: Thomas Druez <tdruez@nexb.com>

* Simplify the filtering of key_files_packages using a QuerySet #411

Signed-off-by: Thomas Druez <tdruez@nexb.com>

* Remove copied code from docker.py #411 #435

    * Create the Docker pipeline by combining the rootfs pipeline and the scan_package pipeline

Signed-off-by: Jono Yang <jyang@nexb.com>

* Update alpine test image and results #411 #435

    * TODO: create smaller test images for ubuntu and redhat docker image tests

Signed-off-by: Jono Yang <jyang@nexb.com>

* Properly create multiple package instances #411

    * Do not attempt to combine multiple instances of the same package
    * Store package_uid in extra data by itself
    * Add test for multiple package instances

Signed-off-by: Jono Yang <jyang@nexb.com>

* Sort packages in JSON output by type and name #411

    * Normalize package_uid in extra_data fields

Signed-off-by: Jono Yang <jyang@nexb.com>

* Get file info and packages in initial scan #438

    * Remove step for scanning application packages

Signed-off-by: Jono Yang <jyang@nexb.com>

* Revert changes to docker pipes and pipeline #438

    * Check for existence of installed_file attribute before using it

Signed-off-by: Jono Yang <jyang@nexb.com>

* Use generic package_getter for all distros #438

    * Ensure both installed_file and codebase_resource have the same checksum field before comparing them; a sketch follows below
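A minimal sketch of the checksum guard described in the bullet above; the helper name and field list are illustrative and may differ from the actual scanpipe code:

        CHECKSUM_FIELDS = ("md5", "sha1", "sha256", "sha512")

        def share_a_checksum_field(installed_file, codebase_resource):
            # Only compare when both sides carry a value for the same checksum field.
            for field_name in CHECKSUM_FIELDS:
                installed_file_checksum = getattr(installed_file, field_name, None)
                resource_checksum = getattr(codebase_resource, field_name, None)
                if installed_file_checksum and resource_checksum:
                    return installed_file_checksum == resource_checksum
            return False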

Signed-off-by: Jono Yang <jyang@nexb.com>

* Use get_path() with strip_root to get paths #438

    * Update mappings_keys_by_fieldname
    * Look for package data in package_data field instead of packages in save_scan_package_results

Signed-off-by: Jono Yang <jyang@nexb.com>

* Remove distro specific pipes #438

    * Move get_installed_packages to rootfs.py
    * Use get_package_data instead of get_package_info
    * Rename all instances of packages to package_data when scanning for application packages
    * Update test docker images and test results
    * Add test for basic rootfs

Signed-off-by: Jono Yang <jyang@nexb.com>

* Use list comprehension for key_file_packages #438

Signed-off-by: Jono Yang <jyang@nexb.com>

* Add package_uid field to DiscoveredPackage #411

    * Update expected test results

Signed-off-by: Jono Yang <jyang@nexb.com>

* Add test docker image for Ubuntu #438

    * Update expected test results
    * Remove old ubuntu.tar

Signed-off-by: Jono Yang <jyang@nexb.com>

* Update formatting #411 #438

Signed-off-by: Jono Yang <jyang@nexb.com>

* Use smaller rpm docker image for testing #438

Signed-off-by: Jono Yang <jyang@nexb.com>

* Replace ubuntu docker test image #438

Signed-off-by: Jono Yang <jyang@nexb.com>

* Use purl data in update_or_create_packages #438

    * Add package_uid to test package data
    * Update expected test result

Signed-off-by: Jono Yang <jyang@nexb.com>

* Bump scancode version to v31.0.0rc1 #438 #411

Signed-off-by: Jono Yang <jyang@nexb.com>

* Consider all PURL fields when ordering Packages #411 #438

Signed-off-by: Jono Yang <jyang@nexb.com>

* Create Packages before Resources #411 #438

    * In the LoadInventory pipeline, create the DiscoveredPackages from a scan before creating the CodebaseResources

Signed-off-by: Jono Yang <jyang@nexb.com>

* Add test for load_inventory pipeline #411

Signed-off-by: Jono Yang <jyang@nexb.com>

* Code cleanups and formatting #411

Signed-off-by: Thomas Druez <tdruez@nexb.com>

* Upgrade dependencies #411

Signed-off-by: Thomas Druez <tdruez@nexb.com>

* Refactor create_inventory_from_scan to remove duplicated code #411

Signed-off-by: Thomas Druez <tdruez@nexb.com>

* Add changelog entry #411

Signed-off-by: Thomas Druez <tdruez@nexb.com>

Co-authored-by: Thomas Druez <tdruez@nexb.com>
JonoYang and tdruez authored Jun 14, 2022
1 parent 777167e commit ba4695a
Showing 48 changed files with 197,306 additions and 2,773 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -40,6 +40,7 @@ local
policies.yml
*.rdb
*.aof
.vscode

# This is only created when packaging for external redistribution
/thirdparty/
4 changes: 4 additions & 0 deletions CHANGELOG.rst
@@ -7,6 +7,10 @@ v31.0.0 (next)
- WARNING: Drop support for Python 3.6 and 3.7. Add support for Python 3.10.
Upgrade Django to version 4.x series.

- Upgrade ScanCode-toolkit to version v31.
See https://github.com/nexB/scancode-toolkit/blob/develop/CHANGELOG.rst for an
overview of the changes in v31 compared to v30.

- Implement run status auto-refresh using the htmx JavaScript library.
The statuses of queued and running pipeline are now automatically refreshed
in the project list and project details views every 10 seconds.
18 changes: 18 additions & 0 deletions scanpipe/migrations/0016_discoveredpackage_package_uid.py
@@ -0,0 +1,18 @@
# Generated by Django 4.0.4 on 2022-06-09 18:26

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('scanpipe', '0015_alter_codebaseresource_project_and_more'),
    ]

    operations = [
        migrations.AddField(
            model_name='discoveredpackage',
            name='package_uid',
            field=models.CharField(blank=True, help_text='Unique identifier for this package.', max_length=1024),
        ),
    ]
7 changes: 6 additions & 1 deletion scanpipe/models.py
@@ -1726,6 +1726,11 @@ class DiscoveredPackage(
blank=True,
help_text=_("A list of dependencies for this package."),
)
package_uid = models.CharField(
max_length=1024,
blank=True,
help_text=_("Unique identifier for this package."),
)

# `AbstractPackage` model overrides:
keywords = models.JSONField(default=list, blank=True)
@@ -1769,7 +1774,7 @@ def create_from_data(cls, project, package_data):
If one of the values of the required fields is not available, a "ProjectError"
is created instead of a new DiscoveredPackage instance.
"""
required_fields = ["type", "name", "version"]
required_fields = ["type", "name"]
missing_values = [
field_name
for field_name in required_fields
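The diff excerpt is cut here; a hedged sketch of how the missing-values check typically concludes, assuming a project.add_error() helper that records the "ProjectError" mentioned in the docstring (names may differ from the actual implementation):

        if missing_values:
            message = f"Missing values for required fields: {', '.join(missing_values)}"
            project.add_error(error=message, model=cls.__name__, details=package_data)
            return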
4 changes: 2 additions & 2 deletions scanpipe/pipelines/docker.py
@@ -20,12 +20,12 @@
# ScanCode.io is a free software code scanning tool from nexB Inc. and others.
# Visit https://github.com/nexB/scancode.io for support and download.

from scanpipe.pipelines import root_filesystems
from scanpipe.pipelines.root_filesystems import RootFS
from scanpipe.pipes import docker
from scanpipe.pipes import rootfs


class Docker(root_filesystems.RootFS):
class Docker(RootFS):
"""
A pipeline to analyze Docker images.
"""
9 changes: 4 additions & 5 deletions scanpipe/pipelines/load_inventory.py
@@ -42,15 +42,14 @@ def get_scan_json_input(self):
Locates a JSON scan input from a project's input/ directory.
"""
inputs = list(self.project.inputs(pattern="*.json"))

if len(inputs) != 1:
raise Exception("Only 1 JSON input file supported")

self.input_location = str(inputs[0].absolute())

def build_inventory_from_scan(self):
"""
Processes a given JSON scan input to populate codebase resources and packages.
Processes a JSON Scan results file to populate codebase resources and packages.
"""
project = self.project
scanned_codebase = scancode.get_virtual_codebase(project, self.input_location)
scancode.create_codebase_resources(project, scanned_codebase)
scancode.create_discovered_packages(project, scanned_codebase)
scancode.create_inventory_from_scan(self.project, self.input_location)
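The three removed calls above now live in a shared helper; a minimal sketch of what scancode.create_inventory_from_scan() might look like, built only from the functions already shown in this file (the actual implementation may differ):

        def create_inventory_from_scan(project, input_location):
            """Create packages then resources for `project` from a scan results file."""
            scanned_codebase = get_virtual_codebase(project, input_location)
            # Per the "Create Packages before Resources" commit, packages come first.
            create_discovered_packages(project, scanned_codebase)
            create_codebase_resources(project, scanned_codebase)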
1 change: 0 additions & 1 deletion scanpipe/pipelines/scan_codebase.py
@@ -22,7 +22,6 @@

from scanpipe import pipes
from scanpipe.pipelines import Pipeline
from scanpipe.pipes import output
from scanpipe.pipes import rootfs
from scanpipe.pipes import scancode
from scanpipe.pipes.input import copy_inputs
20 changes: 7 additions & 13 deletions scanpipe/pipelines/scan_package.py
@@ -56,13 +56,9 @@ def steps(cls):
"--license-text",
"--package",
"--url",
] + [
"--classify",
"--consolidate",
"--is-license-text",
"--license-clarity-score",
"--summary",
"--summary-key-files",
]

def get_package_archive_input(self):
@@ -102,33 +98,31 @@ def run_scancode(self):
"""
Scans extracted codebase/ content.
"""
self.scan_output = self.project.get_output_file_path("scancode", "json")
scan_output_path = self.project.get_output_file_path("scancode", "json")
self.scan_output_location = str(scan_output_path.absolute())

with self.save_errors(scancode.ScancodeError):
scancode.run_scancode(
location=str(self.project.codebase_path),
output_file=str(self.scan_output),
output_file=self.scan_output_location,
options=self.scancode_options,
raise_on_error=True,
)

if not self.scan_output.exists():
if not scan_output_path.exists():
raise FileNotFoundError("ScanCode output not available.")

def build_inventory_from_scan(self):
"""
Processes the JSON scan results to determine resources and packages.
Processes a JSON Scan results file to populate codebase resources and packages.
"""
project = self.project
scanned_codebase = scancode.get_virtual_codebase(project, str(self.scan_output))
scancode.create_codebase_resources(project, scanned_codebase)
scancode.create_discovered_packages(project, scanned_codebase)
scancode.create_inventory_from_scan(self.project, self.scan_output_location)

def make_summary_from_scan_results(self):
"""
Builds a summary in JSON format from the generated scan results.
"""
summary = scancode.make_results_summary(self.project, str(self.scan_output))
summary = scancode.make_results_summary(self.project, self.scan_output_location)
output_file = self.project.get_output_file_path("summary", "json")

with output_file.open("w") as summary_file:
13 changes: 9 additions & 4 deletions scanpipe/pipes/__init__.py
@@ -29,8 +29,6 @@

from django.db.models import Count

from packageurl import normalize_qualifiers

from scanpipe.models import CodebaseResource
from scanpipe.models import DiscoveredPackage
from scanpipe.pipes import scancode
@@ -73,12 +71,19 @@ def update_or_create_package(project, package_data, codebase_resource=None):
"""
Gets, updates or creates a DiscoveredPackage then returns it.
Uses the `project` and `package_data` mapping to lookup and creates the
DiscoveredPackage using its Package URL as a unique key.
DiscoveredPackage using its Package URL and package_uid as a unique key.
"""
purl_data = DiscoveredPackage.extract_purl_data(package_data)
package_uid = package_data.get("package_uid")
purl_data_and_package_uid = {
**purl_data,
"package_uid": package_uid,
}

try:
package = DiscoveredPackage.objects.get(project=project, **purl_data)
package = DiscoveredPackage.objects.get(
project=project, **purl_data_and_package_uid
)
except DiscoveredPackage.DoesNotExist:
package = None

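The excerpt ends at the lookup; a hedged sketch of how the result is typically handled, assuming the create_from_data() classmethod shown in the models.py diff above (the update branch and the relation name are assumptions, not the confirmed code):

        if package:
            # Assumed helper: refresh still-empty fields from the new package_data.
            package.update_from_data(package_data)
        else:
            package = DiscoveredPackage.create_from_data(project, package_data)

        if package and codebase_resource:
            codebase_resource.discovered_packages.add(package)

        return package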
32 changes: 0 additions & 32 deletions scanpipe/pipes/alpine.py

This file was deleted.

2 changes: 2 additions & 0 deletions scanpipe/pipes/codebase.py
@@ -53,6 +53,8 @@ def get_tree(resource, fields, codebase=None):
return resource_dict


# TODO: Walking the ProjectCodebase is broken as we do not have a consistent way
# to get the root of a codebase.
class ProjectCodebase:
"""
Represents the codebase of a project stored in the database.
35 changes: 0 additions & 35 deletions scanpipe/pipes/debian.py

This file was deleted.

22 changes: 10 additions & 12 deletions scanpipe/pipes/docker.py
@@ -22,7 +22,6 @@

import logging
import posixpath
from functools import partial
from pathlib import Path

from container_inspector.image import Image
@@ -152,23 +151,21 @@ def scan_image_for_system_packages(project, image, detect_licenses=True):
raise rootfs.DistroNotFound(f"Distro not found.")

distro_id = image.distro.identifier
if distro_id not in rootfs.PACKAGE_GETTER_BY_DISTRO:
if distro_id not in rootfs.SUPPORTED_DISTROS:
raise rootfs.DistroNotSupported(f'Distro "{distro_id}" is not supported.')

package_getter = partial(
rootfs.PACKAGE_GETTER_BY_DISTRO[distro_id],
distro=distro_id,
detect_licenses=detect_licenses,
)

installed_packages = image.get_installed_packages(package_getter)
installed_packages = image.get_installed_packages(rootfs.package_getter)

for i, (purl, package, layer) in enumerate(installed_packages):
logger.info(f"Creating package #{i}: {purl}")
created_package = pipes.update_or_create_package(project, package.to_dict())

installed_files = []
if hasattr(package, "resources"):
installed_files = package.resources

# We have no files for this installed package, we cannot go further.
if not package.installed_files:
if not installed_files:
logger.info(f" No installed_files for: {purl}")
continue

@@ -177,8 +174,9 @@ def scan_image_for_system_packages(project, image, detect_licenses=True):

codebase_resources = project.codebaseresources.all()

for install_file in package.installed_files:
install_file_path = pipes.normalize_path(install_file.path)
for install_file in installed_files:
install_file_path = install_file.get_path(strip_root=True)
install_file_path = pipes.normalize_path(install_file_path)
layer_rootfs_path = posixpath.join(
layer.layer_id,
install_file_path.strip("/"),
13 changes: 9 additions & 4 deletions scanpipe/pipes/output.py
@@ -168,7 +168,12 @@ def get_headers(self, project):
def get_packages(self, project):
from scanpipe.api.serializers import DiscoveredPackageSerializer

packages = project.discoveredpackages.all()
packages = project.discoveredpackages.all().order_by(
"type",
"namespace",
"name",
"version",
)

for obj in packages.iterator():
yield self.encode(DiscoveredPackageSerializer(obj).data)
@@ -280,9 +285,9 @@ def _add_xlsx_worksheet(workbook, worksheet_name, rows, fields):
# https://github.com/nexB/scancode-toolkit/pull/2381
# https://github.com/nexB/scancode-toolkit/issues/2350
mappings_key_by_fieldname = {
"copyrights": "value",
"holders": "value",
"authors": "value",
"copyrights": "copyright",
"holders": "holder",
"authors": "author",
"emails": "email",
"urls": "url",
}