Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify Release URLs using publish attestations #2833

Closed
wants to merge 2 commits into from

Conversation

facutuesca
Copy link

This PR adds a new field to Release, trusted_publisher_url, which stores the publisher URL used to create that release.

More specifically, this field is only populated when the first file uploaded (the one that creates the release) is uploaded using Trusted Publishing and has a PEP-740 publish attestation.

This allows PyPI to verify a release's URLs (the ones returned by the Release.urls property), by comparing them with the trusted_publisher_url. This means we put those URLs in the Verified details section of the project page:
image

(compare with the mockup provided in this comment: pypi#8635 (comment))

This is an alternative implementation to these two PRs:
pypi#15862
pypi#15891

cc @woodruffw

Comment on lines 558 to 631
@pytest.mark.parametrize(
("url", "trusted_publisher_url", "expected"),
[
(
"https://github.com/owner/project",
"https://github.com/owner/project",
True,
),
(
"https://github.com/owner/project/",
"https://github.com/owner/project",
True,
),
(
"https://github.com/owner/project/issues",
"https://github.com/owner/project",
True,
),
("https://github.com/owner/", "https://github.com/owner/project", False),
(
"https://gitlab.com/owner/project",
"https://github.com/owner/project",
False,
),
],
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: might be good to have some ActiveState URL examples in here as well 🙂

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

def verify_url(self, url: str) -> bool:
if not self.trusted_publisher_url:
return False
return url.startswith(self.trusted_publisher_url)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: This doesn't do case normalization or anything else, which is probably fine but might be worth calling out in a comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new implementation, now we do normalization. I added a detailed docstring to document the behavior

@facutuesca facutuesca marked this pull request as draft June 24, 2024 13:25
@facutuesca
Copy link
Author

facutuesca commented Jun 24, 2024

@woodruffw While looking into URL normalization, I realized this implementation has a pretty bad security bug:

publisher_url = "https://github.com/org/project"
release_url = "https://github.com/org/project22"
verify_url(release_url) == True  # because release_url.startswith(publisher_url) is True

Which led me to look into urllib to see if we can parse and compare URL elements. But urllib doesn't do any validation on inputs, which means there's no guarantee a malicious input will not parse as a valid URL that matches the publisher URL.

Any ideas on how we can tackle URL comparison here?
edit: Would something like this be enough?:

    def verify_url(self, url: str) -> bool:
        if not self.trusted_publisher_url:
            return False
        return (url == self.trusted_publisher_url or 
                url.startswith(self.trusted_publisher_url + '/'))  # if checking with startswith, ensure the character after the publisher URL is a forward slash

@woodruffw
Copy link
Member

Which led me to look into urllib to see if we can parse and compare URL elements. But urllib doesn't do any validation on inputs, which means there's no guarantee a malicious input will not parse as a valid URL that matches the publisher URL.

Yeah, I think we should avoid urllib entirely for this, and use a more standards-conformant URL parser that offers strong validation. One good candidate is rfc3986 -- technically that'll be stricter than WHATWG requires for URL validation, but this shouldn't be an issue in practice (and when it is, it'll be a good validation check).

TL;DR: I think we should use rfc3986 + test the bejeezus out of it, including negative and backstop cases 🙂

@facutuesca
Copy link
Author

TL;DR: I think we should use rfc3986 + test the bejeezus out of it, including negative and backstop cases 🙂

@woodruffw Done! The Release.verify_url now does more exhaustive checks, and I added several more tests for the error cases I could think of.

@@ -22,6 +22,7 @@
from github_reserved_names import ALL as GITHUB_RESERVED_NAMES
from pyramid.authorization import Allow, Authenticated
from pyramid.threadlocal import get_current_request
from rfc3986 import api
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: api is a pretty generic import, so maybe we do import rfc3986.api instead and refer to things with that fully qualified path 🙂

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 682 to 684
is_subpath = publisher_uri.path == user_uri.path or user_uri.path.startswith(
publisher_uri.path + "/"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: Probably need to check what happens when either URL doesn't have a path component; I suspect path will be None in that case 🙂

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Added a check, and a test

Comment on lines +611 to +614
( # URL path component is empty
"https://github.com",
"https://github.com/owner/project",
False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A testcase for the inverse (TP has path, expected has no path) would also be good! And same for both having no path.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@facutuesca
Copy link
Author

Superseeded by pypi#16205

@facutuesca facutuesca closed this Jul 2, 2024
@facutuesca facutuesca deleted the verify-project-urls branch August 16, 2024 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants