Commit 4235041

refactor: accept provenance data in artifact pipeline check (#872)
This PR renames `mcn_infer_artifact_pipeline_1` to `mcn_find_artifact_pipeline_1`. This check can support all the package registries now. When a verifiable provenance is found for an artifact, we use it to obtain the pipeline trigger. Otherwise, we use heuristics to find the triggering pipeline.

Signed-off-by: behnazh-w <behnaz.hassanshahi@oracle.com>
1 parent b65f0db commit 4235041

88 files changed, +1780 -314 lines changed

docs/source/index.rst

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ the requirements that are currently supported by Macaron.
 * - ``mcn_build_as_code_1``
 - **Build as code** - If a trusted builder is not present, this requirement determines that the build definition and configuration executed by the build service is verifiably derived from text file definitions stored in a version control system.
 - Identify and validate the CI service(s) used to build and deploy/publish an artifact.
-* - ``mcn_infer_artifact_pipeline_1``
+* - ``mcn_find_artifact_pipeline_1``
 - **Infer artifact publish pipeline** - When a provenance is not available, checks whether a CI workflow run has automatically published the artifact.
 - Identify a workflow run that has triggered the deploy step determined by the ``Build as code`` check.
 * - ``mcn_provenance_level_three_1``

docs/source/pages/tutorials/detect_malicious_java_dep.rst

Lines changed: 24 additions & 24 deletions
@@ -1,11 +1,11 @@
 .. Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved.
 .. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

-.. _detect-malicious-java-dep:
+.. _detect-manual-upload-java-dep:

-------------------------------------------------------------------------
-Detecting a malicious Java dependency uploaded manually to Maven Central
-------------------------------------------------------------------------
+--------------------------------------------------------------
+Detecting Java dependencies manually uploaded to Maven Central
+--------------------------------------------------------------

 In this tutorial we show how Macaron can determine whether the dependencies of a Java project are built
 and published via transparent CI workflows or manually uploaded to Maven Central. You can also
@@ -24,12 +24,12 @@ dependencies:

 * - Artifact name
 - `Package URL (PURL) <https://github.com/package-url/purl-spec>`_
-* - `guava <https://central.sonatype.com/artifact/com.google.guava/guava>`_
-- ``pkg:maven/com.google.guava/guava@32.1.2-jre?type=jar``
+* - `log4j-core <https://central.sonatype.com/artifact/org.apache.logging.log4j/log4j-core>`_
+- ``pkg:maven/org.apache.logging.log4j/log4j-core@3.0.0-beta2?type=jar``
 * - `jackson-databind <https://central.sonatype.com/artifact/io.github.behnazh-w.demo/jackson-databind>`_
 - ``pkg:maven/io.github.behnazh-w.demo/jackson-databind@1.0?type=jar``

-While the ``guava`` dependency follows best practices to publish artifacts automatically with minimal human
+While the ``log4j-core`` dependency follows best practices to publish artifacts automatically with minimal human
 intervention, ``jackson-databind`` is a malicious dependency that pretends to provide data-binding functionalities
 like `the official jackson-databind <https://github.com/FasterXML/jackson-databind>`_ library (note that
 this artifact is created for demonstration purposes and is not actually malicious).
@@ -70,7 +70,7 @@ First, we need to run the ``analyze`` command of Macaron to run a number of :ref

 .. code-block:: shell

-./run_macaron.sh analyze -purl pkg:maven/io.github.behnazh-w.demo/example-maven-app@1.0?type=jar -rp https://github.com/behnazh-w/example-maven-app
+./run_macaron.sh analyze -purl pkg:maven/io.github.behnazh-w.demo/example-maven-app@2.0?type=jar -rp https://github.com/behnazh-w/example-maven-app --deps-depth=1

 .. note:: By default, Macaron clones the repositories and creates output files under the ``output`` directory. To understand the structure of this directory please see :ref:`Output Files Guide <output_files_guide>`.

@@ -96,7 +96,7 @@ As you can see, some of the checks are passing and some are failing. In summary,
 * is not producing any :term:`SLSA` or :term:`Witness` provenances (``mcn_provenance_available_1``)
 * is using GitHub Actions to build and test using ``mvnw`` (``mcn_build_service_1``)
 * but it is not deploying any artifacts automatically (``mcn_build_as_code_1``)
-* and no CI workflow runs are detected that automatically publish artifacts (``mcn_infer_artifact_pipeline_1``)
+* and no CI workflow runs are detected that automatically publish artifacts (``mcn_find_artifact_pipeline_1``)

 As you scroll down in the HTML report, you will see a section for the dependencies that were automatically identified:

@@ -110,25 +110,25 @@ As you scroll down in the HTML report, you will see a section for the dependenci
 | Macaron has found the two dependencies as expected:

 * ``io.github.behnazh-w.demo:jackson-databind:1.0``
-* ``com.google.guava:guava:32.1.2-jre``
+* ``org.apache.logging.log4j:log4j-core:3.0.0-beta2``

-When we open the reports for each dependency, we see that ``mcn_infer_artifact_pipeline_1`` is passed for ``com.google.guava:guava:32.1.2-jre``
-and a GitHub Actions workflow run is found for publishing version ``32.1.2-jre``. However, this check is failing for ``io.github.behnazh-w.demo:jackson-databind:1.0``.
+When we open the reports for each dependency, we see that ``mcn_find_artifact_pipeline_1`` is passed for ``org.apache.logging.log4j:log4j-core:3.0.0-beta2``
+and a GitHub Actions workflow run is found for publishing version ``3.0.0-beta2``. However, this check is failing for ``io.github.behnazh-w.demo:jackson-databind:1.0``.
 This means that ``io.github.behnazh-w.demo:jackson-databind:1.0`` could have been built and published manually to Maven Central
 and could potentially be malicious.

-.. _fig_infer_artifact_pipeline_guava:
+.. _fig_find_artifact_pipeline_log4j:

-.. figure:: ../../_static/images/tutorial_guava_infer_pipeline.png
-:alt: mcn_infer_artifact_pipeline_1 for com.google.guava:guava:32.1.2-jre
+.. figure:: ../../_static/images/tutorial_log4j_find_pipeline.png
+:alt: mcn_find_artifact_pipeline_1 for org.apache.logging.log4j:log4j-core:3.0.0-beta2
 :align: center

-``com.google.guava:guava:32.1.2-jre``
+``org.apache.logging.log4j:log4j-core:3.0.0-beta2``

 .. _fig_infer_artifact_pipeline_bh_jackson_databind:

 .. figure:: ../../_static/images/tutorial_bh_jackson_databind_infer_pipeline.png
-:alt: mcn_infer_artifact_pipeline_1 for io.github.behnazh-w.demo:jackson-databind:1.0
+:alt: mcn_find_artifact_pipeline_1 for io.github.behnazh-w.demo:jackson-databind:1.0
 :align: center

 ``io.github.behnazh-w.demo:jackson-databind:1.0``
@@ -154,7 +154,7 @@ The security requirement in this tutorial is to mandate dependencies of our proj
 transparent artifact publish CI workflows. To write a policy for this requirement, first we need to
 revisit the checks shown in the HTML report in the previous :ref:`step <fig_example-maven-app>`.
 The result of each of the checks can be queried by the check ID in the first column. For the policy in this tutorial,
-we are interested in the ``mcn_infer_artifact_pipeline_1`` and ``mcn_provenance_level_three_1`` checks:
+we are interested in the ``mcn_find_artifact_pipeline_1`` and ``mcn_provenance_level_three_1`` checks:

 .. code-block:: prolog

@@ -167,7 +167,7 @@ we are interested in the ``mcn_infer_artifact_pipeline_1`` and ``mcn_provenance_
 .decl violating_dependencies(parent: number)
 violating_dependencies(parent) :-
 transitive_dependency(parent, dependency),
-!check_passed(dependency, "mcn_infer_artifact_pipeline_1"),
+!check_passed(dependency, "mcn_find_artifact_pipeline_1"),
 !check_passed(dependency, "mcn_provenance_level_three_1").

 apply_policy_to("detect-malicious-upload", component_id) :-
@@ -176,8 +176,8 @@ we are interested in the ``mcn_infer_artifact_pipeline_1`` and ``mcn_provenance_

 This policy requires that all the dependencies
 of repository ``github.com/behnazh-w/example-maven-app`` either pass the ``mcn_provenance_level_three_1`` (have non-forgeable
-:term:`SLSA` provenances) or ``mcn_infer_artifact_pipeline_1`` check. Note that if an artifact already has a non-forgeable provenance, it means it is produced
-by a hosted build platform, such as GitHub Actions CI workflows. So, the ``mcn_infer_artifact_pipeline_1`` needs to pass
+:term:`SLSA` provenances) or ``mcn_find_artifact_pipeline_1`` check. Note that if an artifact already has a non-forgeable provenance, it means it is produced
+by a hosted build platform, such as GitHub Actions CI workflows. So, the ``mcn_find_artifact_pipeline_1`` needs to pass
 only if ``mcn_provenance_level_three_1`` fails.

 Let's take a closer look at this policy to understand what each line means.
@@ -219,12 +219,12 @@ This rule populates the ``Policy`` relation if ``component_id`` exists in the da
 .decl violating_dependencies(parent: number)
 violating_dependencies(parent) :-
 transitive_dependency(parent, dependency),
-!check_passed(dependency, "mcn_infer_artifact_pipeline_1"),
+!check_passed(dependency, "mcn_find_artifact_pipeline_1"),
 !check_passed(dependency, "mcn_provenance_level_three_1").

 This is the rule that the user needs to design to detect dependencies that violate a security requirement.
 Here we declare a relation called ``violating_dependencies`` and populate it if the dependencies in the
-``transitive_dependency`` relation do not pass any of the ``mcn_infer_artifact_pipeline_1`` and
+``transitive_dependency`` relation do not pass any of the ``mcn_find_artifact_pipeline_1`` and
 ``mcn_provenance_level_three_1`` checks.

 .. code-block:: prolog
@@ -253,7 +253,7 @@ printed to the console will look like the following:
 failed_policies
 ['detect-malicious-upload']
 component_violates_policy
-['1', 'pkg:github.com/behnazh-w/example-maven-app@34c06e8ae3811885c57f8bd42db61f37ac57eb6c', 'detect-malicious-upload']
+['1', 'pkg:maven/io.github.behnazh-w.demo/example-maven-app@2.0?type=jar', 'detect-malicious-upload']

 As you can see, the policy has failed because the ``io.github.behnazh-w.demo:jackson-databind:1.0``
 dependency is manually uploaded to Maven Central and does not meet the security requirement.
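
The `violating_dependencies` rule touched by the hunks above flags a dependency only when it fails both checks. As a rough illustration of that logic outside Datalog, the following Python sketch evaluates the same condition over a hand-written result table. The `check_results` dictionary and the helper name are hypothetical; the `mcn_find_artifact_pipeline_1` outcomes follow the tutorial text, while the `mcn_provenance_level_three_1` values are assumed to be failing for both dependencies.

```python
# Hypothetical check results for the two dependencies in the tutorial.
# The provenance values are assumptions made for this sketch.
check_results = {
    "org.apache.logging.log4j:log4j-core:3.0.0-beta2": {
        "mcn_find_artifact_pipeline_1": True,
        "mcn_provenance_level_three_1": False,
    },
    "io.github.behnazh-w.demo:jackson-databind:1.0": {
        "mcn_find_artifact_pipeline_1": False,
        "mcn_provenance_level_three_1": False,
    },
}

# A dependency satisfies the policy if it passes at least one of these checks.
ACCEPTED_CHECKS = ("mcn_find_artifact_pipeline_1", "mcn_provenance_level_three_1")


def violating_dependencies(results: dict[str, dict[str, bool]]) -> list[str]:
    """Return the dependencies that pass none of the accepted checks."""
    return sorted(
        dep
        for dep, checks in results.items()
        if not any(checks.get(check, False) for check in ACCEPTED_CHECKS)
    )


violations = violating_dependencies(check_results)
print(violations)
```

Under these assumed inputs only the demo `jackson-databind` artifact is reported, which matches the failed `detect-malicious-upload` policy in the console output.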

docs/source/pages/tutorials/exclude_include_checks.rst

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ This tutorial will show how you can configure Macaron to:
 Prerequisites
 -------------

-* You are expected to have gone through :ref:`this tutorial <detect-malicious-java-dep>`.
+* You are expected to have gone through :ref:`this tutorial <detect-manual-upload-java-dep>`.
 * This tutorial requires a high-level understanding of checks in Macaron and how they depend on each other. Please see this :ref:`page <macaron-developer-guide>` for more information.

 ------------------

src/macaron/config/defaults.ini

Lines changed: 15 additions & 3 deletions
@@ -146,7 +146,12 @@ wrapper_files =
     mvnw

 [builder.maven.ci.build]
-github_actions = actions/setup-java
+github_actions =
+    actions/setup-java
+    # Parent project used in Maven-based projects of the Apache Logging Services.
+    apache/logging-parent/.github/workflows/build-reusable.yaml
+    # This action can be used to deploy artifacts to a JFrog artifactory server.
+    spring-io/artifactory-deploy-action
 travis_ci = jdk
 circle_ci =
 gitlab_ci =
@@ -159,6 +164,8 @@ jenkins =

 [builder.maven.ci.deploy]
 github_actions =
+    # Parent project used in Maven-based projects of the Apache Logging Services.
+    apache/logging-parent/.github/workflows/deploy-release-reusable.yaml
 travis_ci =
     gpg:sign-and-deploy-file
     deploy:deploy
@@ -237,6 +244,8 @@ jenkins =

 [builder.gradle.ci.deploy]
 github_actions =
+    # This action can be used to deploy artifacts to a JFrog artifactory server.
+    spring-io/artifactory-deploy-action
 travis_ci =
     artifactoryPublish
     ./gradlew publish
@@ -495,7 +504,7 @@ artifact_extensions =
 # Package registries.
 [package_registry]
 # The allowed time range (in seconds) from a deploy workflow run start time to publish time.
-publish_time_range = 3600
+publish_time_range = 7200

 # [package_registry.jfrog.maven]
 # In this example, the Maven repo can be accessed at `https://internal.registry.org/repo-name`.
@@ -505,9 +514,12 @@ publish_time_range = 3600

 [package_registry.maven_central]
 # Maven Central host name.
-hostname = search.maven.org
+search_netloc = search.maven.org
+search_scheme = https
 # The search REST API. See https://central.sonatype.org/search/rest-api-guide/
 search_endpoint = solrsearch/select
+registry_url_netloc = repo1.maven.org/maven2
+registry_url_scheme = https
 request_timeout = 20

 [package_registry.npm]
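
To show how the reshaped settings might be consumed, here is a minimal sketch using Python's standard `configparser` on an excerpt of the new values. The helper names (`get_list`, `within_publish_window`), the way scheme and netloc are joined into URLs, and the lower bound of zero seconds in the time check are all assumptions made for illustration; they are not taken from Macaron's own code.

```python
import configparser
from datetime import datetime, timedelta

# Excerpt of the post-change defaults, indented as multi-line INI values.
SAMPLE = """
[builder.maven.ci.build]
github_actions =
    actions/setup-java
    # Parent project used in Maven-based projects of the Apache Logging Services.
    apache/logging-parent/.github/workflows/build-reusable.yaml
    # This action can be used to deploy artifacts to a JFrog artifactory server.
    spring-io/artifactory-deploy-action

[package_registry]
publish_time_range = 7200

[package_registry.maven_central]
search_netloc = search.maven.org
search_scheme = https
search_endpoint = solrsearch/select
registry_url_netloc = repo1.maven.org/maven2
registry_url_scheme = https
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)


def get_list(section: str, option: str) -> list[str]:
    """Split a multi-line option into its non-empty entries (hypothetical helper)."""
    return [line.strip() for line in parser.get(section, option).splitlines() if line.strip()]


def within_publish_window(run_start: datetime, publish_time: datetime, range_seconds: int) -> bool:
    """Assumed check: publish time falls within range_seconds after the run start."""
    return timedelta(0) <= publish_time - run_start <= timedelta(seconds=range_seconds)


# Full-line comments inside the multi-line value are skipped by configparser.
build_actions = get_list("builder.maven.ci.build", "github_actions")

central = parser["package_registry.maven_central"]
search_url = f"{central['search_scheme']}://{central['search_netloc']}/{central['search_endpoint']}"
registry_url = f"{central['registry_url_scheme']}://{central['registry_url_netloc']}"

publish_time_range = parser.getint("package_registry", "publish_time_range")

# A publish event 90 minutes after the run start fits the new range but not the old one.
run_start = datetime(2024, 1, 1, 12, 0, 0)
published = datetime(2024, 1, 1, 13, 30, 0)
fits_new_range = within_publish_window(run_start, published, publish_time_range)
fits_old_range = within_publish_window(run_start, published, 3600)
```

Splitting the host setting into separate scheme and netloc options lets the search endpoint and the registry base URL be configured independently, which is presumably why `hostname` was replaced.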

src/macaron/json_tools.py

Lines changed: 14 additions & 15 deletions
@@ -31,28 +31,27 @@ def json_extract(entry: dict | list, keys: Sequence[str | int], type_: type[T])
     T | None:
         The found value as the type of the type parameter.
     """
-    target: JsonType = entry
     for key in keys:
-        if isinstance(target, dict) and isinstance(key, str):
-            if key not in target:
-                logger.debug("JSON key '%s' not found in dict target.", key)
+        if isinstance(entry, dict) and isinstance(key, str):
+            if key not in entry:
+                logger.debug("JSON key '%s' not found in dict entry.", key)
                 return None
-        elif isinstance(target, list) and isinstance(key, int):
-            if key < 0 or key >= len(target):
-                logger.debug("JSON list index '%s' is outside of list bounds %s.", key, len(target))
+        elif isinstance(entry, list) and isinstance(key, int):
+            if key < 0 or key >= len(entry):
+                logger.debug("JSON list index '%s' is outside of list bounds %s.", key, len(entry))
                 return None
         else:
-            logger.debug("Cannot index '%s' (type: %s) in target (type: %s).", key, type(key), type(target))
+            logger.debug("Cannot index '%s' (type: %s) in entry (type: %s).", key, type(key), type(entry))
             return None

         # If statement required for mypy to not complain. The else case can never happen because of the above if block.
-        if isinstance(target, dict) and isinstance(key, str):
-            target = target[key]
-        elif isinstance(target, list) and isinstance(key, int):
-            target = target[key]
+        if isinstance(entry, dict) and isinstance(key, str):
+            entry = entry[key]
+        elif isinstance(entry, list) and isinstance(key, int):
+            entry = entry[key]

-    if isinstance(target, type_):
-        return target
+    if isinstance(entry, type_):
+        return entry

-    logger.debug("Found value of incorrect type: %s instead of %s.", type(target), type(type_))
+    logger.debug("Found value of incorrect type: %s instead of %s.", type(entry), type(type_))
     return None
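
To make the behaviour of the refactored loop concrete, here is a self-contained sketch that follows the post-change logic shown in the hunk above, with the logging calls left out and the two `isinstance` passes folded into one. The sample `payload` and its key names are made up for illustration and do not come from the commit.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import TypeVar

T = TypeVar("T")


def json_extract(entry: dict | list, keys: Sequence[str | int], type_: type[T]) -> T | None:
    """Walk `entry` along `keys`; return the value only if it is a `type_`."""
    for key in keys:
        if isinstance(entry, dict) and isinstance(key, str):
            if key not in entry:
                return None  # Missing dict key.
            entry = entry[key]
        elif isinstance(entry, list) and isinstance(key, int):
            if key < 0 or key >= len(entry):
                return None  # List index out of bounds.
            entry = entry[key]
        else:
            return None  # Key type does not match the container type.
    return entry if isinstance(entry, type_) else None


# Hypothetical payload used only to exercise the three outcomes.
payload = {"predicate": {"materials": [{"uri": "git+https://example.com/repo"}]}}

uri = json_extract(payload, ["predicate", "materials", 0, "uri"], str)
missing = json_extract(payload, ["predicate", "materials", 1, "uri"], str)
wrong_type = json_extract(payload, ["predicate", "materials"], str)
```

Reusing `entry` as the cursor, as the commit does, removes the extra `target` variable; the trade-off is that the parameter no longer refers to the original container after the loop.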
