Clarify spec for version #380

Open
4 tasks
pombredanne opened this issue Jan 22, 2025 · 4 comments
Comments

@pombredanne
Member

pombredanne commented Jan 22, 2025

We have some key issues to tackle wrt. version encoding:

And this PR:

@pombredanne pombredanne converted this from a draft issue Jan 22, 2025
@pombredanne
Member Author

@mprpic This is yours since you volunteered. Do you want to join the project?

@mprpic
Contributor

mprpic commented Jan 22, 2025

@pombredanne Yes please.

@davidB

davidB commented Jan 27, 2025

Yes, please clarify, because the doc says the version is percent-encoded, but in practice that applies only to some characters. That's a pain, because if I always percent-encode the version before building the URL, I get:

sha256:adf450ad2a44e7cf94a9ba15378ad66f3b906ebd88d541f980d3b3e2b33f2399 => sha256%3Aadf450ad2a44e7cf94a9ba15378ad66f3b906ebd88d541f980d3b3e2b33f2399 as expected

0.3.0 => 0%2E3%2E0 (OK for a tag in a query parameter, but it doesn't match the version in the samples, which use ...@0.3.0)

UPDATE: encoding/decoding of the version should be simple and bijective, so that purls can be built and parsed in many programming languages.
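The two results above can be reproduced with a short Python sketch. `encode_all_but_alnum` is a hypothetical helper that percent-encodes every byte except ASCII letters and digits, which is what the outputs suggest was done here. It is compared with the standard library's `urllib.parse.quote(..., safe="")`, which additionally leaves the RFC 3986 unreserved characters `_.-~` unencoded. Neither is claimed to be the canonical PURL encoder, and the digest is shortened for brevity.

```python
import string
from urllib.parse import quote

# Hypothetical "always encode" helper: keeps only ASCII letters and digits,
# and percent-encodes every other UTF-8 byte with uppercase hex digits.
ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))

def encode_all_but_alnum(value: str) -> str:
    return "".join(
        chr(b) if b in ALNUM else f"%{b:02X}"
        for b in value.encode("utf-8")
    )

print(encode_all_but_alnum("0.3.0"))            # 0%2E3%2E0
print(quote("0.3.0", safe=""))                  # 0.3.0  ('.' is unreserved)
print(encode_all_but_alnum("sha256:adf450ad"))  # sha256%3Aadf450ad
print(quote("sha256:adf450ad", safe=""))        # sha256%3Aadf450ad
```

Both helpers agree on `:` but disagree on `.`, which is exactly the mismatch with the `...@0.3.0` samples.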

@matt-phylum
Contributor

The encoding is only supposed to be applied to characters that require encoding to be unambiguous in the position where they are used (https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#character-encoding). It's also suggested that plus signs be encoded to avoid interoperability problems (#261).
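A small Python illustration of why an unencoded `+` causes interoperability problems: a decoder that follows `application/x-www-form-urlencoded` rules (here `urllib.parse.unquote_plus`) turns `+` into a space, while a plain percent-decoder (`urllib.parse.unquote`) keeps it. Encoding `+` as `%2B` makes both agree. The version string `1.0+build.5` is only an example.

```python
from urllib.parse import unquote, unquote_plus

raw = "1.0+build.5"        # '+' left unencoded
encoded = "1.0%2Bbuild.5"  # '+' percent-encoded as %2B

# Plain percent-decoding leaves '+' alone...
print(unquote(raw))        # 1.0+build.5
# ...but form-style decoding treats '+' as a space.
print(unquote_plus(raw))   # 1.0 build.5

# With %2B, both decoders recover the same version.
print(unquote(encoded), unquote_plus(encoded))  # 1.0+build.5 1.0+build.5
```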

Unfortunately, there is a lot of confusion between percent encoding, x-www-form-urlencoded, and RFC 3986 reserved characters, which are all distinct topics. Percent encoding is the encoding mechanism; it does not specify which characters are to be encoded, although you obviously have to encode % itself or its meaning will change during decoding. Based on the results you're seeing, you're probably encoding all characters except ASCII alphanumerics, which is valid and should be understood by any PURL implementation, but it's not the canonical form.
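As a sketch of "valid but not canonical": percent-decoding maps both spellings back to the same version, so a parser that decodes before comparing treats them as equal, while a raw string comparison does not. Only `urllib.parse.unquote` is a real API here; `same_version` is a hypothetical helper.

```python
from urllib.parse import unquote

canonical = "0.3.0"
over_encoded = "0%2E3%2E0"  # every non-alphanumeric character encoded

def same_version(a: str, b: str) -> bool:
    """Hypothetical helper: compare two encoded versions by decoded value."""
    return unquote(a) == unquote(b)

print(canonical == over_encoded)              # False: raw strings differ
print(same_version(canonical, over_encoded))  # True: both decode to 0.3.0
```

This is also why a fixed list of characters to encode matters for the bijective round-trip requested earlier in the thread: decoding is many-to-one over encoded spellings, and only a canonical encoder yields exactly one spelling per version.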

I think I suggested on another issue having explicit rules that clearly specify which characters are or are not expected to be encoded. The WHATWG URL standard has a nice format for this: it defines different percent-encode sets and then specifies which set to use for each part of the URL: https://url.spec.whatwg.org/#percent-encoded-bytes
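A minimal Python sketch of the encode-set approach: each component names the set of ASCII characters it must encode, non-ASCII text is always encoded as UTF-8 bytes, and everything else is emitted as-is. `VERSION_ENCODE_SET` and `percent_encode` are hypothetical; the characters in the set are an assumption for illustration, not the sets from the WHATWG standard, the linked Rust code, or the PURL spec.

```python
# Sketch of WHATWG-style encode sets. The set contents below are a
# HYPOTHETICAL example, not the PURL or WHATWG definition.

C0_CONTROL = {chr(c) for c in range(0x20)} | {"\x7f"}

# Hypothetical version encode set: controls, space, '%' itself, '+',
# and delimiters that could be ambiguous in a purl-like string.
VERSION_ENCODE_SET = C0_CONTROL | set(" \"#%+/<>?@[\\]^`{|}")

def percent_encode(value: str, encode_set: set[str]) -> str:
    out = []
    for ch in value:
        if ord(ch) < 0x80 and ch not in encode_set:
            out.append(ch)  # ASCII outside the set: keep literally
        else:
            # In the set or non-ASCII: encode each UTF-8 byte as %XX
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)

print(percent_encode("0.3.0", VERSION_ENCODE_SET))        # 0.3.0
print(percent_encode("1.0+build 5", VERSION_ENCODE_SET))  # 1.0%2Bbuild%205
```

The design point is that the encoder is fully determined by the set, so every implementation that uses the same set produces the same canonical string.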

As far as I know, these sets are correct for PURL: https://github.com/phylum-dev/purl/blob/151168733f75a9802556e4b07eb577b9d99f7cea/purl/src/format.rs#L9-L27

Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

4 participants