-
Notifications
You must be signed in to change notification settings - Fork 28
feat: add GitHub attestation discovery #1020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
dc4736e
to
69099a8
Compare
ea49398
to
dce704b
Compare
dce704b
to
4be1f9d
Compare
src/macaron/slsa_analyzer/package_registry/maven_central_registry.py
Outdated
Show resolved
Hide resolved
with tempfile.TemporaryDirectory() as temp_dir: | ||
attestation_file = os.path.join(temp_dir, "attestation") | ||
with open(attestation_file, "w", encoding="UTF-8") as file: | ||
json.dump(git_attestation, file) | ||
|
||
try: | ||
payload = load_provenance_payload(attestation_file) | ||
provenance_payload = payload | ||
except LoadIntotoAttestationError as error: | ||
logger.debug("Failed to load provenance payload: %s", error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really necessary to dump it to a temp file then read it back in? Can't we just process the git_attestation
directly as a provenance in-memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not necessary but is in-line with pre-existing handling of attestation. Changing it is doable, it will just require some refactoring to ensure consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be addressed as part of a separate issue #1095
if git_attestation_dict: | ||
git_attestation_list = json_extract(git_attestation_dict, ["attestations"], list) | ||
if git_attestation_list: | ||
git_attestation = git_attestation_list[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we assuming that the the first element is the attestation we are looking for ? Or is there a schema for the attestation content from Github that supports this decision 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are assuming that the first attestation is the one we are looking for. The alternative is to decode the payload of each and compare against the target artifact name. I'm unsure how common it is to find multiple attestations for a given artifact hash.
For the schema, there is an example given in the API. I will add this to the doc comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Updating the doc comment sounds good to me.
src/macaron/slsa_analyzer/package_registry/maven_central_registry.py
Outdated
Show resolved
Hide resolved
@@ -247,3 +252,53 @@ def get_local_artifact_dirs( | |||
) | |||
|
|||
raise LocalArtifactFinderError(f"Unsupported PURL type {purl_type}") | |||
|
|||
|
|||
def get_local_artifact_hash(purl: PackageURL, artifact_dirs: list[str], hash_algorithm_name: str) -> str | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could have some unit tests for this function ?
src/macaron/slsa_analyzer/package_registry/maven_central_registry.py
Outdated
Show resolved
Hide resolved
src/macaron/slsa_analyzer/package_registry/maven_central_registry.py
Outdated
Show resolved
Hide resolved
return artifact_hash | ||
|
||
|
||
def find_or_create_pypi_asset( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, this function is trying to do 2 things:
- Find the first
PyPIPackageJsonAsset
instance withinpypi_registry_info.metadata
and returns it:asset_name
andasset_version
are not used, pypi_registry_info.metadata is not mutated. - If we cannot find the asset, we create a new PyPIPackageJsonAsset instance and append to
pypi_registry_info.metadata
.
I think this function can be refactored so that it only find and process any existing PyPIPackageJsonAsset in pypi_registry_info
. The creation of a new PyPIPackageJsonAsset could be done separately, because it's only a constructor call. This way, we can prevent the mutation of a passed in parameter. This is what I currently have in mind:
def find_pypi_asset_from_registry_info(registry_info: PackageRegistryInfo) -> PyPIPackageJsonAsset | None:
...
def create_pypi_asset_from_pypi_registry_info(name: str, version: str | None, pypi_registry_info: PyPIRegistry) -> PyPIPackageJsonAsset | None:
...
And then the adding of the new asset to pypi_registry_info.metadata
can happen outside of those 2 functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with this suggestion. It's true that the function can be split into smaller pieces, however, in the 2 places that it is used, all split pieces would still need to be called sequentially. E.g. If we are about to download an asset we must first check if it already has been downloaded, then if not, create the asset, add it to the registry, etc. Ultimately, this means identical blocks of code calling all 2 to 3 sub functions.
Perhaps we can improve the name and/or documentation instead?
(There is one more place this function should be called. I will add it there instead of the almost identical solution that exists there currently.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. Thanks for the clarification. I think it's good enough to update the doc string to indicate the mutation of the registry infor object when a new PyPIPackageJsonAsset instance is created.
tests/integration/cases/github_maven_attestation_local/test.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have finished my first round of review. Thank you.
return None | ||
|
||
artifact_target = None | ||
if purl.type == "maven": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we support PyPI
packages here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming packages are installed via pip
, the wheel files are not stored within the virtual environment. It might be possible to add support for pip
caches though.
274f20b
to
686a6e8
Compare
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
…it test Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
029a0dc
to
4d2ff9a
Compare
This PR adds support for GitHub attestation discovery. To access GitHub attestations, the SHA256 hash of the repositories artefacts must be generated and submitted via the API. Artefacts may be found in local caches, or downloaded from remote repositories.
Closes #915