Should the purl scheme/type be prefixed with purl+? #9

Closed
pombredanne opened this issue Nov 27, 2017 · 31 comments

Comments

@pombredanne
Member

From #1 (comment) :

While reviewing this IANA list of registered URI schemes https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml two thoughts came to my crooked mind:

  1. we should reference it in the spec here https://github.com/package-url/purl-spec/pull/1/files#diff-88b99bb28683bd5b7e3a204826ead112R138
  2. to avoid any type/scheme naming conflict (there is already an official scheme for go://) we could consider stating that:
  • we cannot use any official or known schemes unless registered for purl usage
  • OR we could prefix all the type with purl+type as in purl+go:github.com/gorilla/context ....

I kinda like the purl+ type prefix as this makes it always clear we deal with a purl, BUT at the same time this makes the string a tad heavier with more characters.
We could also register purl and purl+* with IANA as official schemes.

Thoughts?

@pombredanne pombredanne changed the title Should the schme be prefixed with purl+? Should the purl scheme/type be prefixed with purl+? Nov 27, 2017
@ashcrow
Contributor

ashcrow commented Nov 27, 2017

Doing purl+ does make sense. It is a little ugly IMHO but it also does specify strongly what it is and lets us use names that we should probably not use because they are valid in different forms elsewhere.

@pombredanne
Member Author

@ashcrow agreed. And it may defuse some arguments from the URL/URI experts too

@pombredanne
Member Author

@R2wenD2 what do you think about this purl+ URL scheme/type prefix?

@jackfirth

jackfirth commented Nov 29, 2017

This has some problems and would be difficult to get registered. But in the interest of productive discussion I'll suggest an alternative: why not define a single purl scheme and specify that the first component of the authority must be some registered "package namespace identifier"? PURLs would then look like this:

  • purl://pypi/django@1.11.1
  • purl://docker/ubuntu@jessie

This doesn't solve the problem of specifying how each of those namespaces work, but it would let you divide up the spec like this:

  • The main PURL spec, which registers the purl scheme, defines structure and semantics common to all PURLs including what stuff is optional and required in a PURL, specifies how PURLs will maintain forward compatibility, defines what "package namespaces" are, and defines a package namespace IANA registry.
  • A spec for each package manager that defines its package namespace, registering it in the package namespace registry defined in the PURL spec.

This is what a lot of existing schemes do, including the urn scheme. The current existence of the "generic" package namespace seems like a wart that results from trying to use different schemes for each package system; I think there wouldn't be a need for something like that with this approach.

About URI scheme prefixes

A purl+ scheme pattern is something the URI folks have explicitly argued against in the past. There is no protocol whatsoever for defining common structure across a set of URI schemes by design. See the article Distributed Hungarian Notation Doesn't Work by Mark Nottingham[1] which explicitly condemns this pattern, using the (unstandardized) web+ URI prefix in HTML5 as an example of what not to do.

[1]: Mark Nottingham is the current chairman of the web/HTTP parts of the IETF and he's probably one of the people you'd have to convince when registering a scheme as generic as this one.

@pombredanne
Member Author

This makes some sense to me, though I still would rather eschew having an authority in a purl at all.

So it could come out instead as:

  • purl:pypi/django@1.11.1
  • purl:docker/ubuntu@jessie

... where the type is now a mandatory part of the Path component and the Path is made of:
type/ + optional names/pace/ + name

which is still something easy and never ambiguous to parse.
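
The layout described above (type, optional namespace, name, then @version) can be split with plain string operations. This is only a minimal sketch, assuming the single purl: scheme form from this comment; `parse_purl` and the dict keys it returns are hypothetical names, not part of the spec or of any library, and it ignores percent-encoding and any qualifiers:

```python
def parse_purl(purl):
    """Split 'purl:type/namespace/name@version' into parts (illustrative only)."""
    scheme, sep, rest = purl.partition(":")
    if sep != ":" or scheme != "purl":
        raise ValueError("not a purl: %r" % purl)
    # The version, if present, follows the last '@'.
    path, at, version = rest.rpartition("@")
    if not at:
        path, version = rest, None
    segments = path.split("/")
    # At least a type and a name are required, and no segment may be empty.
    if len(segments) < 2 or not all(segments):
        raise ValueError("expected type/[namespace/]name: %r" % purl)
    return {
        "type": segments[0],
        "namespace": "/".join(segments[1:-1]) or None,
        "name": segments[-1],
        "version": version,
    }
```

With this sketch, everything between the type and the final segment is treated as the namespace, so purl:golang/github.com/grafeas/grafeas/server-go@e61c8332 yields the namespace github.com/grafeas/grafeas and the name server-go.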

About Mark, I followed his work closely over the years and his insights value would be somewhere between gold, platinum, electrum, or diamond-like.

@mnot I'd be quite interested to get your take here:
The skinny: we have a fine crowd of software package aficionados working together here to define a way to reference software packages in a mostly universal way. The messy conversation started here: #1 and received feedback from a real fine crowd such as the dpkg maintainer, the npm cli maintainer, a key maven contributor, the spack maintainer, the nuget project manager, the victims DB maintainers, openshift analytics member, the Grafeas maintainer, the FSFE director, the libraries.io maintainer and several other incredibly great individuals that matter in this domain (some working for RedHat, Google, Microsoft, Intel and other fine businesses).

The current draft is here: https://github.com/package-url/purl-spec/blob/d31af66785896bd42dd9467b14bf9352f8f1abfe/README.rst

And the question in this ticket:

  • shall we use one URL scheme per package type (e.g. creating possibly 50 or 100+ new URL schemes) as currently suggested in the spec and as in pypi:redbnot@1.1
  • OR prefix every scheme with purl+ as in purl+pypi:redbnot@1.1 still creating 50 or 100+ new URL schemes
  • OR only use a single purl: scheme and use convention to stuff the package type as the first segment of the Path as in purl:pypi/redbnot@1.1

(NB: we DO NOT use an URL authority in a purl by design so far)
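
To make the difference between these options concrete, here is a small sketch of how a generic URL parser splits each candidate form. It uses Python's standard urllib.parse.urlsplit; the `describe` helper is a hypothetical name introduced only for this illustration, and the last entry is the authority-based form suggested earlier in this thread, included for contrast:

```python
from urllib.parse import urlsplit

candidates = [
    "pypi:redbnot@1.1",           # one scheme per package type
    "purl+pypi:redbnot@1.1",      # purl+ prefixed family of schemes
    "purl:pypi/redbnot@1.1",      # single scheme, type as first path segment
    "purl://pypi/django@1.11.1",  # single scheme, type in the authority
]

def describe(url):
    """Return the (scheme, authority, path) triple a generic parser sees."""
    parts = urlsplit(url)
    return (parts.scheme, parts.netloc, parts.path)

for url in candidates:
    print(url, "->", describe(url))
```

In the first two forms the package type ends up inside the scheme, so every type is a distinct scheme as far as a generic parser is concerned. In the third form the scheme is always purl and the authority stays empty.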

@jackfirth

Could you explain why you'd like to avoid including an authority component? I don't think I understand what the purpose of such a restriction is.

@pombredanne
Member Author

Could you explain why you'd like to avoid including an authority component? I don't think I understand what the purpose of such a restriction is.

  1. in the common case, the authority is implied by the purl type: Python pypi packages come from pypi.python.org, Maven maven packages come from repo1.maven.org so the main, default public repo does not need to be specified. In the same way, most package management CLI tools always use the default public repo UNLESS you specify something else.

  2. the place where you get the package from matters less than what this package is. From a hierarchy standpoint it should therefore come after. It is therefore specified as a qualifier that comes after the name and version. This has the nice benefit that when you deal with many, many purls they sort mostly right as plain strings.
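
The sorting point can be illustrated with a plain string sort. This is only a sketch: the purl: form is the one discussed in this thread, while the `repo` qualifier name and the pypi.example.org host are made up for the example and are not taken from the spec:

```python
purls = [
    "purl:pypi/django@1.11.1?repo=pypi.example.org",  # 'repo' qualifier is hypothetical
    "purl:golang/github.com/grafeas/grafeas/server-go@e61c8332",
    "purl:pypi/django@1.11.1",
    "purl:docker/ubuntu@jessie",
]

# A plain lexicographic sort groups by type first, then by name and version;
# the location qualifier only acts as a final tie-breaker.
ordered = sorted(purls)

for purl in ordered:
    print(purl)
```

Because the location comes last, the two django entries end up next to each other regardless of which repository they point to.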

FWIW this has been discussed also there:

And I also created a FAQ entry on this topic "Can I use the Authority (i.e. user:pass@host:port) of a URI/URI in a purl?" https://github.com/package-url/purl-spec/wiki/FAQ#can-i-use-the-authority-ie-userpasshostport-of-a-uriuri-in-a-purl

@pombredanne
Member Author

@annevk since you maintain https://github.com/whatwg/url your take would mean a great deal too!

@pombredanne
Member Author

@annevk @mnot if you ever care to chime in here, please be gentle with us! We have a crowd of fine software hackers chiming in here, but in the end I am a standards and URL purity noob and idiot.

@annevk

annevk commented Nov 29, 2017

So you basically never need relative URLs?

@pombredanne
Member Author

@annevk scripsit:

So you basically never need relative URLs?

Not ever ever IMHO. Never without a scheme aka. type in a purl

@annevk

annevk commented Nov 29, 2017

In that case #9 (comment) seems fine (if there's a scheme it's by definition not relative for non-special schemes).

@pombredanne
Member Author

@annevk just to be sure, you say that a single purl scheme would be best? as in: purl:pypi/django@1.11.1 or purl:docker/ubuntu@jessie

@annevk

annevk commented Nov 29, 2017

A single scheme seems a lot better than a family of schemes, yes. I don't have a strong opinion on whether to use authority or not; I think the main benefit of using the authority component is with various types of relative URLs.

@jackfirth

To expand on relative URLs, there are three forms to consider:

  • Path-relative, e.g. ./foo - whether a scheme allows authorities or not says nothing about whether this is allowed
  • Authority-relative, e.g. /foo - whether this refers to a package in the same ecosystem or registry depends on whether the package namespace and registry are in the authority or not
  • Scheme-relative, e.g. //foo/bar - means "use only the same scheme"; typically only relevant for weird cases like an "absolute" link that's either an http or https link depending on whether the context resource was accessed over http or https
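
For reference, the three relative forms resolve as follows against an ordinary https base. This sketch uses Python's standard urllib.parse.urljoin; the base URL is a made-up example, and an https base is used on purpose because urljoin only resolves relative references for schemes it already knows about:

```python
from urllib.parse import urljoin

base = "https://example.org/a/b"  # hypothetical context URL

# Path-relative: resolved against the directory of the base path.
path_relative = urljoin(base, "./foo")

# Authority-relative: keeps scheme and authority, replaces the whole path.
authority_relative = urljoin(base, "/foo")

# Scheme-relative: keeps only the scheme, takes a new authority and path.
scheme_relative = urljoin(base, "//foo/bar")

print(path_relative)
print(authority_relative)
print(scheme_relative)
```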

@pombredanne
Member Author

@jackfirth Thanks. No purl would ever be relative in any of these meanings. That's forbidden.

@R2wenD2

R2wenD2 commented Nov 29, 2017

I like purl+ URL scheme/type prefix - from a migration to purl perspective it will make it very easy to tell when we're working with purl vs not. I'm thinking of this largely from a Grafeas perspective, but I'd guess other adopters would need to do a similar migration if they already use some other scheme.

@pombredanne
Member Author

@R2wenD2 Thank you for the input
By the way do you care for joining as a co-org admin here?

@R2wenD2

R2wenD2 commented Nov 30, 2017

I'm happy to join, it would be helpful to have guidance on governance.
And I also agree that we need a cool logo :)

@pombredanne
Member Author

pombredanne commented Nov 30, 2017

@R2wenD2 Done. invite sent. We really really need a cool logo. See #19

@pombredanne
Member Author

@R2wenD2 on "governance" please see #21

@pombredanne
Member Author

@R2wenD2 you wrote:

I like purl+ URL scheme/type prefix

Just to clarify: do you prefer:

  1. purl:golang/github.com/grafeas/grafeas/server-go@e61c8332 (which is likely a better idea based on feedback of @annevk )
    OR
  2. purl+golang:github.com/grafeas/grafeas/server-go@e61c8332 (which is likely not a good idea based on writings of @mnot and feedback from @annevk , initially rightfully reported by @jackfirth )

In any case I totally agree with you on this:

I'm thinking of this largely from a Grafeas perspective, but I'd guess other adopters would need to do a similar migration if they already use some other scheme.

This is true for you and anyone as this makes the purl clearly unique and provides a smooth migration path from anything in use before.

@R2wenD2

R2wenD2 commented Nov 30, 2017

Number 1 is good, I'll defer to the experts here.

@mnot

mnot commented Nov 30, 2017

Hi!

Defining a convention like purl+ is something you can do, but it would require registering all of the individual schemes still. Personally, if I were doing it, I'd put the package type in the first segment (e.g., purl:pypi/foo; see below) and leave it at that.

However, I think there's a much bigger issue that needs to be resolved first.

I'm assuming that one of the design goals here is to identify both packages that are in the name spaces of the various package registries (e.g., pypi, npm) and ad hoc packages that live in places like Github.

If that's the case, the full URL of the ad hoc location needs to be included; otherwise, it's ambiguous. Also, if someone needs to locate the package, they won't have enough information to be able to do so (e.g., do I use HTTP? HTTPS? SSH+GIT? etc.). Relying on the client's knowledge of various repositories like Github isn't a great solution here.

There are a couple of ways to do this; e.g.,

purl:golang/https://github.com/grafeas/grafeas/server-go@e61c8332

Note that this example does not have an authority; it's using path-noscheme, where the package type is followed by a couple of structured path segments.

Or,

purl:npm:foopackage?loc=https://github.com/foo/foopackage

Here, it's using a structured path-rootless to separate the package type (npm) and the unique package name (foopackage), sticking location information in a query parameter.
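
As a rough illustration of how the two example forms above come apart, here is a sketch using Python's standard urllib.parse. The variable names are hypothetical, and the `loc` parameter is simply the one used in the example above, not an established convention:

```python
from urllib.parse import urlsplit, parse_qs

# Form 1: package type as first path segment, full location URL after it.
first = urlsplit("purl:golang/https://github.com/grafeas/grafeas/server-go@e61c8332")
first_type, _, first_location = first.path.partition("/")

# Form 2: 'type:name' in the path, location carried in a query parameter.
second = urlsplit("purl:npm:foopackage?loc=https://github.com/foo/foopackage")
second_type, _, second_name = second.path.partition(":")
second_location = parse_qs(second.query)["loc"][0]

print(first_type, first_location)
print(second_type, second_name, second_location)
```

In both cases the parser reports an empty authority, so the location information travels entirely in the path or the query.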

There are a lot of tradeoffs here, of course, and it depends on your use cases.

Just in case you haven't seen it, the RFC you need to be looking at is:
https://tools.ietf.org/html/rfc7595

I'm happy to help you start walking through that process, help get you started writing a draft, etc. One great way to start the draft is:
https://github.com/martinthomson/i-d-template

One other thing -- it'd be great if you'd consider a name other than PURL; first of all, putting "URL" into a URL scheme isn't that great, and second, it's going to confuse old-timers like me, because (to us) that term means this:
https://archive.org/services/purl/

@pombredanne
Member Author

@mnot Thanks. This means a lot as you and @annevk are true URL Authorities i.e. the registered, globally unique and fully qualified Authority Component in these URLs: url://mnot and url://annevk 😃

Let me incorporate all this in an updated draft PR. (except of course the url:// URL scheme above)

About an RFC, that's definitely on the radar once this has firmed up and the dust has settled a bit, and thank you for your hand-holding on this!

As for the purl name, this is not cast in stone. I kinda like the purr happy cat 🐈 sound it makes. Shame on the archive folks for not registering a scheme, and being old school too, that was the first thing it made me think of, i.e. a permanent URL. Now, it could have been much worse as @sschuberth pointed out in early discussions.

So could we come up with a more sexy name, borrowing a page from the Erlang the movie II rebranding efforts for OTP .... e.g. we could use a more dull pkg for package? And find a better name from this acronym like @gar1t did nicely for OTP?

@pombredanne
Member Author

On the topic of the pkg scheme, I presented purls at FOSDEM https://fosdem.org/2018/schedule/event/purl/ and polled the audience on this topic. Sounds like the balance is clearly in favor of a common pkg: prefix scheme for many good reasons including RFC/IANA registration of a single common scheme, possible use as a browser URL, and many more that I agree with in the end. You can see some of the discussion in the video linked above.

@zvr ping too since you brought up the idea too in #19

I will submit a PR for this today using a pkg: prefix throughout. IMHO with this we have enough to go 1.0

@stevespringett @ashcrow ping too since you respectively wrote a Java and Go implementation

@R2wenD2

R2wenD2 commented Feb 9, 2018

IIUC, we'd end up using something like:

  • pkg:pypi/django@1.11.1
  • pkg:docker/ubuntu@jessie

Is that the case?

@pombredanne
Member Author

@R2wenD2 yes, that would be it. I am working on a PR for review. I was much more in favor of the shorter form, but the arguments presented here and at FOSDEM are hard to ignore ;)
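
For illustration, the pkg: form confirmed above can be assembled from its components like this. It is only a sketch: `to_pkg` is a hypothetical helper, not the API of any existing implementation, and it does no encoding or validation:

```python
def to_pkg(type_, name, version=None, namespace=None):
    """Build 'pkg:type/[namespace/]name[@version]' (illustrative only)."""
    # Skip the namespace when it is not provided.
    path = "/".join(part for part in (type_, namespace, name) if part)
    return "pkg:" + path + ("@" + version if version else "")

print(to_pkg("pypi", "django", "1.11.1"))
print(to_pkg("docker", "ubuntu", "jessie"))
```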

pombredanne added a commit that referenced this issue Feb 9, 2018
 * also update spec license to MIT per #21
 * update test suite accordingly

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne
Member Author

PR #31 is available for everyone's review and adds the pkg: prefix including to the test suite.

@brianf
Contributor

brianf commented Apr 10, 2018

+1.

@iamwillbar
Member

Closing this since #31 has been merged.

8 participants