Skip to content

feat: add GitHub attestation discovery #1020

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

benmss
Copy link
Member

@benmss benmss commented Mar 19, 2025

This PR adds support for GitHub attestation discovery. To access GitHub attestations, the SHA256 hash of the repositories artefacts must be generated and submitted via the API. Artefacts may be found in local caches, or downloaded from remote repositories.

  • Support for Maven artefacts
  • Support for PyPI artefacts

Closes #915

@benmss benmss self-assigned this Mar 19, 2025
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Mar 19, 2025
@benmss benmss force-pushed the 915-discover-github-attestations branch from dc4736e to 69099a8 Compare March 19, 2025 10:14
@benmss benmss changed the base branch from staging to main April 8, 2025 06:38
@benmss benmss force-pushed the 915-discover-github-attestations branch from ea49398 to dce704b Compare April 28, 2025 02:50
@benmss benmss marked this pull request as ready for review April 30, 2025 23:27
@benmss benmss requested review from behnazh-w and tromai as code owners April 30, 2025 23:27
@behnazh-w behnazh-w requested a review from nicallen May 1, 2025 11:33
@benmss benmss force-pushed the 915-discover-github-attestations branch from dce704b to 4be1f9d Compare May 19, 2025 05:31
Comment on lines 528 to 537
with tempfile.TemporaryDirectory() as temp_dir:
attestation_file = os.path.join(temp_dir, "attestation")
with open(attestation_file, "w", encoding="UTF-8") as file:
json.dump(git_attestation, file)

try:
payload = load_provenance_payload(attestation_file)
provenance_payload = payload
except LoadIntotoAttestationError as error:
logger.debug("Failed to load provenance payload: %s", error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to dump it to a temp file then read it back in? Can't we just process the git_attestation directly as a provenance in-memory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not necessary but is in-line with pre-existing handling of attestation. Changing it is doable, it will just require some refactoring to ensure consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be addressed as part of a separate issue #1095

if git_attestation_dict:
git_attestation_list = json_extract(git_attestation_dict, ["attestations"], list)
if git_attestation_list:
git_attestation = git_attestation_list[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we assuming that the the first element is the attestation we are looking for ? Or is there a schema for the attestation content from Github that supports this decision 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are assuming that the first attestation is the one we are looking for. The alternative is to decode the payload of each and compare against the target artifact name. I'm unsure how common it is to find multiple attestations for a given artifact hash.

For the schema, there is an example given in the API. I will add this to the doc comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Updating the doc comment sounds good to me.

@@ -247,3 +252,53 @@ def get_local_artifact_dirs(
)

raise LocalArtifactFinderError(f"Unsupported PURL type {purl_type}")


def get_local_artifact_hash(purl: PackageURL, artifact_dirs: list[str], hash_algorithm_name: str) -> str | None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could have some unit tests for this function ?

return artifact_hash


def find_or_create_pypi_asset(
Copy link
Member

@tromai tromai May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, this function is trying to do 2 things:

  • Find the first PyPIPackageJsonAsset instance within pypi_registry_info.metadata and returns it: asset_name and asset_version are not used, pypi_registry_info.metadata is not mutated.
  • If we cannot find the asset, we create a new PyPIPackageJsonAsset instance and append to pypi_registry_info.metadata.

I think this function can be refactored so that it only find and process any existing PyPIPackageJsonAsset in pypi_registry_info. The creation of a new PyPIPackageJsonAsset could be done separately, because it's only a constructor call. This way, we can prevent the mutation of a passed in parameter. This is what I currently have in mind:

def find_pypi_asset_from_registry_info(registry_info: PackageRegistryInfo) -> PyPIPackageJsonAsset | None:
    ...

def create_pypi_asset_from_pypi_registry_info(name: str, version: str | None, pypi_registry_info: PyPIRegistry) -> PyPIPackageJsonAsset | None:
    ...

And then the adding of the new asset to pypi_registry_info.metadata can happen outside of those 2 functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with this suggestion. It's true that the function can be split into smaller pieces, however, in the 2 places that it is used, all split pieces would still need to be called sequentially. E.g. If we are about to download an asset we must first check if it already has been downloaded, then if not, create the asset, add it to the registry, etc. Ultimately, this means identical blocks of code calling all 2 to 3 sub functions.

Perhaps we can improve the name and/or documentation instead?

(There is one more place this function should be called. I will add it there instead of the almost identical solution that exists there currently.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Thanks for the clarification. I think it's good enough to update the doc string to indicate the mutation of the registry infor object when a new PyPIPackageJsonAsset instance is created.

Copy link
Member

@tromai tromai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have finished my first round of review. Thank you.

return None

artifact_target = None
if purl.type == "maven":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we support PyPI packages here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming packages are installed via pip, the wheel files are not stored within the virtual environment. It might be possible to add support for pip caches though.

@benmss benmss force-pushed the 915-discover-github-attestations branch from 274f20b to 686a6e8 Compare June 2, 2025 01:55
benmss added 13 commits June 3, 2025 15:09
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
…it test

Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
@benmss benmss force-pushed the 915-discover-github-attestations branch from 029a0dc to 4d2ff9a Compare June 3, 2025 05:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Discover GitHub Attestations
4 participants