Merge pull request #115 from nexB/matchcode-toolkit-release-prep
Prep matchcode-toolkit codebase for release #113
JonoYang authored Jun 6, 2023
2 parents 4549f09 + 9e0ca7f commit be57326
Showing 8 changed files with 67 additions and 680 deletions.
7 changes: 7 additions & 0 deletions matchcode-toolkit/CHANGELOG.rst
@@ -0,0 +1,7 @@
Changelog
=========

v1.0.0
------

*2023-06-05* -- Initial release.
75 changes: 58 additions & 17 deletions matchcode-toolkit/README.rst
@@ -1,31 +1,72 @@
matchcode-toolkit
MatchCode toolkit
=================
This contains a scancode-toolkit post-scan plugin that fingerprints the
directories of a scan and queries those fingerprints against the matchcode API
to find package matches.
MatchCode toolkit is a Python library that provides the directory fingerprinting
functionality for `ScanCode toolkit <https://github.com/nexB/scancode-toolkit>`_
and `ScanCode.io <https://github.com/nexB/scancode.io>`_ by implementing the
HaloHash algorithm and using it in ScanCode toolkit and ScanCode.io plugins and
pipelines.


Installation
------------

MatchCode toolkit must be installed in the same environment as ScanCode toolkit
or ScanCode.io.

From PyPI:
::

pip install matchcode-toolkit

A checkout of this repo can also be installed into an environment using pip's
``--editable`` option,
::

# Activate the virtual environment you want to install MatchCode toolkit into,
# then change to the ``matchcode-toolkit`` directory
pip install --editable .

or built into a wheel and then installed:
::

python setup.py bdist_wheel # The built wheel will be in the dist/ directory
pip install matchcode_toolkit-*-py3-none-any.whl


Usage
-----

Ensure that the PurlDB server is up. Set the following environment variables:
* ``MATCHCODE_DIRECTORY_CONTENT_MATCHING_ENDPOINT``
MatchCode toolkit provides the ``--fingerprint`` option for ScanCode toolkit.
This is a post-scan plugin that adds the fields
``directory_content_fingerprint`` and ``directory_structure_fingerprint`` to
Resources and computes those values for directories.
::

scancode --info --fingerprint <scan target location> --json-pp <output location>
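
As a rough illustration of how the resulting scan could be consumed, the sketch below collects the two fingerprint fields for every directory in a loaded scan. It assumes the usual ScanCode JSON layout (a top-level ``files`` list whose entries carry ``path`` and ``type``) and assumes the two fingerprint fields appear directly on each entry. The helper name ``directory_fingerprints`` and the sample values are hypothetical and are not part of MatchCode toolkit.

```python
import json


def directory_fingerprints(scan):
    """Map each directory path to its (content, structure) fingerprints.

    Assumption: fingerprint fields sit directly on each resource entry of
    the scan's "files" list. Entries without any fingerprint are skipped.
    """
    results = {}
    for resource in scan.get("files", []):
        if resource.get("type") != "directory":
            continue
        content = resource.get("directory_content_fingerprint")
        structure = resource.get("directory_structure_fingerprint")
        if content or structure:
            results[resource["path"]] = (content, structure)
    return results


# Hypothetical, hand-written sample standing in for real scan output;
# actual fingerprint values produced by the plugin will look different.
sample = json.loads("""
{
  "files": [
    {
      "path": "project",
      "type": "directory",
      "directory_content_fingerprint": "example-content-fingerprint",
      "directory_structure_fingerprint": "example-structure-fingerprint"
    },
    {
      "path": "project/setup.py",
      "type": "file",
      "directory_content_fingerprint": null,
      "directory_structure_fingerprint": null
    }
  ]
}
""")

for path, (content, structure) in directory_fingerprints(sample).items():
    print(path, content, structure)
```

For a real scan, the ``sample`` dictionary would be replaced by the result of ``json.load`` on the file written by ``--json-pp``.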

* ``export MATCHCODE_DIRECTORY_CONTENT_MATCHING_ENDPOINT="http://127.0.0.1:8001/api/approximate_directory_content_index/match/"``

* ``MATCHCODE_DIRECTORY_STRUCTURE_MATCHING_ENDPOINT``
MatchCode toolkit provides the ``scan_and_fingerprint_package`` pipeline for
ScanCode.io. This is the same as the ``scan_package`` pipeline, but has the
added step of computing fingerprints for directories.

* ``export MATCHCODE_DIRECTORY_STRUCTURE_MATCHING_ENDPOINT="http://127.0.0.1:8001/api/approximate_directory_structure_index/match/"``

Install the matchcode-toolkit plugin into scancode-toolkit:
* Open a shell and enable the virtual environment of the scancode-toolkit instance you want to use
* Navigate to the matchcode-toolkit directory and run ``pip install -e .``
License
-------

Run scancode with matching enabled:
* The ``--info`` option has to be enabled on the scan you are running:
SPDX-License-Identifier: Apache-2.0

* ``scancode --info --match <scan target directory> --json-pp -``
The ScanCode.io software is licensed under the Apache License version 2.0.
Data generated with ScanCode.io is provided as-is without warranties.
ScanCode is a trademark of nexB Inc.

or on the scan you are importing:
You may not use this software except in compliance with the License.
You may obtain a copy of the License at: http://apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

* ``scancode --from-scan <path to scan JSON with --info> --match --json-pp -``
Data generated with ScanCode.io is provided on an "AS IS" BASIS, WITHOUT WARRANTIES
OR CONDITIONS OF ANY KIND, either express or implied. No content created from
ScanCode.io should be considered or used as legal advice. Consult an attorney
for any legal advice.
5 changes: 1 addition & 4 deletions matchcode-toolkit/setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = matchcode-toolkit
version = 0.0.1
version = 1.0.0
license = Apache-2.0

# description must be on ONE line https://github.com/pypa/setuptools/issues/1390
@@ -65,9 +65,6 @@ docs =
[options.entry_points]
scancode_post_scan =
fingerprint = matchcode_toolkit.plugin_fingerprint:Fingerprint
match = matchcode_toolkit.plugin_match:Match

scancodeio_pipelines =
scan_and_fingerprint_codebase = matchcode_toolkit.pipelines.scan_and_fingerprint_codebase:ScanAndFingerprintCodebase
scan_and_fingerprint_package = matchcode_toolkit.pipelines.scan_and_fingerprint_package:ScanAndFingerprintPackage
matching = matchcode_toolkit.pipelines.matching:Matching
3 changes: 1 addition & 2 deletions matchcode-toolkit/src/matchcode_toolkit/halohash.py
@@ -13,8 +13,7 @@
from bitarray.util import count_xor

from commoncode import codec

from matchcode_toolkit import hash as commoncode_hash
from commoncode import hash as commoncode_hash

"""
Halo is a family of hash functions that have the un-common property that mostly
116 changes: 0 additions & 116 deletions matchcode-toolkit/src/matchcode_toolkit/hash.py

This file was deleted.
