Initial purl draft spec #1

Merged
merged 26 commits into from
Nov 22, 2017

pombredanne
Member

@pombredanne pombredanne commented Nov 11, 2017

For reference this is the result of a discussion that started here
aboutcode-org/scancode-toolkit#805

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Link: aboutcode-org/scancode-toolkit#805

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
README.rst Outdated
- in Go:
- in JavaScript:
- in Perl:
- for the JVM:

Please consider including .NET as well.

For implementation, these places are potential candidates for discussion:

https://github.com/nuget/home
https://github.com/dotnet/home
https://github.com/dotnet/standard

Member Author

Very good point! I am adding it in the updated version that I am pushing shortly

Member Author

@kasper3 this has been addressed in the latest push.

 * Update spec to use "purl" and package URL, not "puurl"
   based on @sschuberth feedback
 * Re-organize the document in context/problem/solution chapters
 * Refine examples, parsing and construction rules
 * Rename path part to subpath for clarity
 * Document relationship with URL based on @sschuberth feedback
 * Add encoding section
 * Add known types and qualifiers section
 * Add list of candidate types to define
 * Add section for implementation tests
 * Add NuGet and .NET details based on @kasper3 feedback
 * Fix typos

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne pombredanne changed the title Initial puurl draft spec Initial purl draft spec Nov 13, 2017
@jayfk

jayfk commented Nov 13, 2017

How are package managers handled that have no unique package name?

PyPi treats - and _ as the same character and is case insensitive.

# resolves to the same package
django-allauth
django_allauth
Django-Allauth
dJAnGo_aLLauTH

 * Pypi package names are case insensitive and a - and _ are the same:
   the name must be normalized.
 * reported by @jayfk

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne
Member Author

@jayfk you wrote:

How are package managers handled that have no unique package name? PyPi treats - and _ as the same character and is case insensitive.

Thanks and this is an excellent point. I should know better! We should then specify this for each type. I added this for Pypi in 042f108
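
To illustrate the normalization discussed above, here is a minimal Python sketch. It assumes the rule stated in this thread (lowercase the name and treat `_` as `-`); the helper name `normalize_pypi_name` is hypothetical and not part of the spec or of any library.

```python
def normalize_pypi_name(name: str) -> str:
    """Return a canonical form of a PyPI package name (hypothetical helper)."""
    # Assumption: lowercase the name and treat "_" as "-", as described in this thread.
    return name.lower().replace("_", "-")


# The four spellings reported by @jayfk should collapse to a single name.
variants = ["django-allauth", "django_allauth", "Django-Allauth", "dJAnGo_aLLauTH"]
normalized = {normalize_pypi_name(v) for v in variants}
print(normalized)  # {'django-allauth'}
```

With such a rule applied before a purl is built, all spellings of the same PyPI package would compare equal as plain strings.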

@pombredanne
Member Author

@kasper3 the latest version covers NuGet and .NET

@pombredanne
Member Author

@R2wenD2 @sschuberth @mnonnenmacher @jpopelka @jdaguil @JonoYang @MaJuRG @mjherzog @chinyeungli @tdruez This draft is ready for your review, which is highly valued!

@pombredanne
Member Author

@andrew you wrote

Happy to implement on Libraries.io once the spec is finished 👌

This is ready for your review now, I guess!

@andrew
Contributor

andrew commented Nov 13, 2017

@pombredanne it's gonna be a busy week for me, don't block on me!

 * therefore the name and namespace for these package types must
   be normalized to lowercase
 * reported by @jayfk

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@ashcrow
Contributor

ashcrow commented Nov 13, 2017

It would be good to have an example with the subpath as well. The way I read it this would work:

 github:package-url/purl-spec@244fd47e07d1004f0aed9c#/everybody/loves/dogs

@ashcrow
Contributor

ashcrow commented Nov 13, 2017

/cc @jasinner

 * reported by @ashcrow

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@ashcrow
Contributor

ashcrow commented Nov 13, 2017

I'm happy to create a golang and/or python parser for this spec once it's finalized.

@pombredanne
Member Author

@ashcrow you wrote:

It would be good to have an example with the subpath as well. The way I read it this would work:

github:package-url/purl-spec@244fd47e07d1004f0aed9c#/everybody/loves/dogs

Yes! thanks. I added it in 20f84b3 with a slight modification: the leading slash is not significant in a subpath. Note also FWIW that subpaths may be common for Go.
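
As a rough illustration of how the example above could be split into its parts, here is a minimal Python sketch. It assumes the draft syntax used in this PR (`type:namespace/name@version?qualifiers#subpath`) and ignores percent-decoding and qualifiers; `DraftPurl` and `parse_draft_purl` are hypothetical names, not the reference implementation.

```python
from typing import NamedTuple, Optional


class DraftPurl(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    subpath: Optional[str]


def parse_draft_purl(purl: str) -> DraftPurl:
    """Split a draft-syntax purl into its parts (illustrative sketch only)."""
    # Everything after the first "#" is the subpath; its leading slash is not significant.
    remainder, _, subpath = purl.partition("#")
    subpath = subpath.lstrip("/") or None
    # Qualifiers follow "?"; this sketch simply drops them.
    remainder, _, _qualifiers = remainder.partition("?")
    # The type is everything before the first ":".
    type_, _, remainder = remainder.partition(":")
    # The version, if any, follows the last "@".
    path, sep, version = remainder.rpartition("@")
    if not sep:
        path, version = remainder, None
    # The last path segment is the name; anything before it is the namespace.
    namespace, _, name = path.rpartition("/")
    return DraftPurl(type_, namespace or None, name, version, subpath)


parsed = parse_draft_purl(
    "github:package-url/purl-spec@244fd47e07d1004f0aed9c#/everybody/loves/dogs"
)
print(parsed.subpath)  # everybody/loves/dogs
```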

@pombredanne
Member Author

@ashcrow you wrote:

I'm happy to create a golang and/or python parser for this spec once it's finalized.

Let me make you a co-owner of the org together with @andrew

I suggest that we use this convention for implementations repo names: purl-language as in purl-python, purl-go, purl-ruby, purl-js, etc.

@pombredanne
Member Author

@ashcrow @andrew org owner invite sent. @jayfk do you want it too?

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne
Member Author

pombredanne commented Nov 13, 2017

@ashcrow I have a rough, poorly tested first draft toy bit of python code in scancode here
https://github.com/nexB/scancode-toolkit/blob/275-streamline-package-manifests-models/src/packagedcode/purl.py

I will move this out to a bona-fide repo as a Python starter, I guess either public domain or MIT licensed with a "Copyright (c) the purl authors". public domain might be best?

And in terms of contributions a simple DCO https://developercertificate.org/ should be plenty enough IMHO. Do you agree?

@ashcrow
Contributor

ashcrow commented Nov 14, 2017

Cool! I'll take a look.

I'm fine with the MIT license and using the DCO.

@pombredanne pombredanne merged commit a21a9ec into master Nov 22, 2017
@pombredanne pombredanne deleted the initial-draft branch November 22, 2017 14:51
@pombredanne
Member Author

So as a next step, please review the main doc and submit tickets as needed; this will be less messy than a crowded PR. Thank you all for chiming in!

rpm:fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25
rpm:opensuse/curl@7.56.1-1.1.?arch=i386&distro=opensuse-tumbleweed

(NB: some checksums are truncated for brevity)
@tgamblin tgamblin Nov 22, 2017

I see a potential problem here in that this is going to result in a lot of new URL schemes, but URL schemes require a lot of review to get approved. See here. I suspect they're going to reject this spec on that basis alone. The obvious way around it would be to change these to look something like:

purl:rpm/fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25
purl:github/package-url/purl-spec@244fd47e07d1004f0aed9c
...
etc.

Although honestly if this is going to be a thing, and it's always written in a URL, the url part of purl seems redundant. Why not put the focus on the package? e.g.:

pkg:rpm/fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25
pkg:github/package-url/purl-spec@244fd47e07d1004f0aed9c
...
etc.

You'll note that pkg: isn't yet claimed 😄 .
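
The rewrite suggested here is mechanical. As a small Python sketch, assuming the draft `type:rest` syntax and the proposed `pkg:type/rest` form from this comment (the helper name `to_pkg_scheme` is hypothetical):

```python
def to_pkg_scheme(draft_purl: str) -> str:
    """Rewrite a draft 'type:rest' purl into the proposed 'pkg:type/rest' form."""
    # Assumption: the type is everything before the first ":" in the draft syntax.
    type_, sep, rest = draft_purl.partition(":")
    if not sep:
        raise ValueError(f"not a draft purl: {draft_purl!r}")
    return f"pkg:{type_}/{rest}"


print(to_pkg_scheme("rpm:fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25"))
# pkg:rpm/fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25
```

A single fixed scheme means only one scheme name would need to be registered, while the package type moves into the first path segment.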

Member Author

@tgamblin I moved this URL and scheme discussion to this ticket: #4

For clarity and simplicity a `purl` is always an ASCII string. To ensure that
there is no ambiguity when parsing a `purl`, separator characters and non-ASCII
characters must be UTF-encoded and then percent-encoded as defined at::

I'm not sure this is the way to go. URLs can have UTF-8 characters, and they've benefitted countries with different character sets, e.g. Japan and China. I think you should allow UTF-8 to be inclusive. What if there are packages whose users don't ever write their names in roman characters? ASCII seems very limiting there. See here

Member Author

I am with you there: the whole point is to avoid any weird thing and ensure that things are properly UTF-8 and then percent encoded. This means that eventually any character may be used, but using non-ASCII would require a bit more encoding and decoding.
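
To make the "UTF-8, then percent-encode" step concrete, here is a short Python sketch using the standard library's `urllib.parse`. The helper names and the Japanese sample string are illustrative assumptions; the exact set of characters a purl must encode is defined by the spec, not by this snippet.

```python
from urllib.parse import quote, unquote


def encode_component(value: str) -> str:
    """UTF-8 encode, then percent-encode a purl component (illustrative helper)."""
    # quote() encodes str input as UTF-8 by default; safe="" also escapes "/".
    return quote(value, safe="")


def decode_component(value: str) -> str:
    """Reverse encode_component(): percent-decode, then decode as UTF-8."""
    return unquote(value)


name = "パッケージ"  # a non-ASCII package name, chosen only as an example
encoded = encode_component(name)
print(encoded)                    # an ASCII-only string
print(decode_component(encoded))  # round-trips to the original name
print(encode_component("a/b@c"))  # a%2Fb%40c
```

The encoded form stays pure ASCII, yet any character can still be represented and recovered.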

@tgamblin

tgamblin commented Nov 22, 2017

@pombredanne: looks cool! Two questions and kind of one for @andrew:

  1. The original referenced issue says the goal is to have a "unique" identifier for each package, although the spec doesn't seem to dwell on that too much, which is probably good. Do you have ideas on how to reconcile the same package fetched from multiple sources? e.g., the same Python package might exist in pypi, conda, spack, and system package managers. @andrew: does libraries.io do anything to reconcile the different names?
  2. Is the idea that these URLs will one day be fetchable by curl? How do you see the translation being implemented? I guess libraries.io could provide that as a service?

=======

When tools, APIs and databases process or store multiple package types, it is
difficult to reference the same software package across tools in a uniform way.
@iarna iarna Nov 22, 2017

Ok… so is the goal to provide a URI that:

  1. That tells you which tool consumes this
    AND
  2. Provides enough information for that tool to consume it

?

Or is the goal only to provide a way to uniquely reference a specific package, not to be able to reconstruct how it's installed?

Who would consume these? Humans? Software? Which software?

Member Author

@iarna you wrote:

Ok… so is the goal to provide a URI that:

That tells you which tool consumes this
AND
Provides enough information for that tool to consume it ?

Not exactly which tool consumes it: for instance a maven purl does not tell you to use Maven, Gradle, SBT or Ivy; an npm purl does not tell you that you should use the npm tool over yarn or anything else; pypi does not tell you to use pip, easy_install or buildout. What npm or pypi mean here is the whole set of packaging formats, manifests, registries, protocols and APIs that a tool uses to deal with such a package type.

And provide just enough to consume it: well that's a possibility though it may be just a side goal.

In #5 I toyed with the hypothetical idea of a "meta" package manager... Though frankly this would be just a franken manager IMHO.

Or is the goal only to provide a way to uniquely reference a specific package, not to be able to reconstruct how it's installed?

Who would consume these? Humans? Software? Which software?

I think that the distinguished racketist @jackfirth provides a better summary in #6 than I could have ever written. I intend to steal his words and add them to the spec:

[...] here are the proposed use cases as I understand them:

  • Cross-system metadata indexing to search and monitor packages by metadata like available versions, dependencies, contributors, etc. across multiple package managers (libraries.io)
  • Vulnerability tracking to determine whether a package's set of possible transitive dependencies includes a known vulnerability and whether the version constraints of that dependency graph allow or prevent patching
  • Other kinds of package-content-agnostic analysis tools, especially tools that look at the dependency graphs of package ecosystems

So IMHO the primary beneficiaries are folks or DBs or APIs or tools that deal with several package formats and languages. Very selfishly, that might include me, @andrew @jayfk @jasinner @ashcrow @R2wenD2 @tgamblin @jpopelka and hopefully many more. This may be used in these contexts for UIs, DBs and APIs.

It may also, if this gets enough traction, be something that positively influences some standardization in the domain in the future as a side benefit. Say tomorrow you decide to include an npm package size as a key identifying package attribute: maybe you will think about whether this makes sense purl-wise, and either reconsider this or contribute to purl to adopt such a fine new standard?

- **type**: the package "type" or package "protocol" such as maven, npm, nuget,
gem, pypi, etc. Required.
- **namespace**: some name prefix such as a Maven groupid, a Docker image owner,
a GitHub user or organization. Optional and type-specific.

Just gonna add it here too: the npm version of this starts with @. It would be super nice to not have to escape that. 😆

Member Author

@iarna I am with you there, but FWIW there is some percent-encoding in NPM's registry API URLs:
For instance https://registry.yarnpkg.com/@invisionag%2feslint-config-ivx/ works but https://registry.yarnpkg.com/@invisionag/feslint-config-ivx/ comes out as {"error":"Not found"}, just so you know. The slash between the "namespace" and name must be percent-encoded. And yes, I know you might think that I have researched too many of these tiny details and quirks.

Here in a purl I guess this is not strictly needed: a name is always required. Therefore a leading @ in a scoped NPM namespace is never ambiguous. What would be ambiguous is if we allow unescaped @ anywhere in the name or namespace and a purl comes with no version.

For instance, say a package name is super@package for the hypothetical weirdo package type:

With the purl weirdo:super@package I would parse @package as a version even though no version was provided. weirdo:super@package@1.2.3 would not be ambiguous, of course, but my hypothetical weirdo packages rarely have a version attached...

So for simplicity I specified that the whole name and each namespace segment should be percent-encoded: we could relax this for the leading @ in namespace/name (to make scoped NPMs look beautiful) alright. I will update this and both can be acceptable in any case.
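
The ambiguity described above can be reproduced with a few lines of Python. This is only a sketch: `split_name_version` is a hypothetical helper that splits the part after `type:` on the last `@` and then percent-decodes each piece.

```python
from urllib.parse import quote, unquote


def split_name_version(rest: str):
    """Split the part after 'type:' into (name, version) on the last '@'."""
    name, sep, version = rest.rpartition("@")
    if not sep:
        # No "@" at all: the whole string is the name and there is no version.
        return unquote(rest), None
    return unquote(name), unquote(version)


# Unescaped "@" in a name without a version is misread as a version separator.
print(split_name_version("super@package"))          # ('super', 'package')
# Percent-encoding the name removes the ambiguity.
print(split_name_version(quote("super@package", safe="")))  # ('super@package', None)
# With an explicit version the last "@" still separates correctly.
print(split_name_version("super%40package@1.2.3"))  # ('super@package', '1.2.3')
```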


::

bitbucket:birkenfeld/pygments-main@244fd47e07d1014f0aed9c

bitbucket isn't really a consumer though, is it? Like, what kind of package does that specifier refer to?

This is sticky because npm currently consumes specifiers that look almost like this. Specifically, for npm that would be: bitbucket:birkenfeld/pygments-main#244fd47e07d1014f0aed9c.

The full npm specifier, assuming it was an npm package, would be: pygments-main@bitbucket:birkenfeld/pygments-main#244fd47e07d1014f0aed9c

I'm wondering how that would encode as a purl, particularly seeing as # is already used for a subpath.

Member Author

@pombredanne pombredanne Nov 27, 2017

@iarna bitbucket, github, gitlab are not strictly speaking providing packages but are de facto large reservoirs of packagish things. They provide more than just a VCS repo and have tickets, releases, some API-fetchable metadata, etc., which makes these "packagish" enough to join the fray IMHO.
In the case you mentioned you are encoding the whole VCS address as an NPM version and this works beautifully IMHO:
The purl would be:
npm:pygments-main@bitbucket:birkenfeld/pygments-main%23244fd47e07d1014f0aed9c

Of note is the percent encoding of the version # to avoid parsing an incorrect subpath that would otherwise come out as this ugly mess:
type='npm', name='pygments-main', version='bitbucket:birkenfeld/pygments-main', subpath='244fd47e07d1014f0aed9c'
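
The two outcomes described above can be checked with a small Python sketch. It assumes the draft syntax where the first `#` starts the subpath and the last `@` before it starts the version; `split_version_subpath` is a hypothetical helper, not the reference parser.

```python
from urllib.parse import unquote


def split_version_subpath(purl: str):
    """Return (version, subpath) from a draft purl (illustrative sketch only)."""
    # The first unescaped "#" starts the subpath.
    remainder, _, subpath = purl.partition("#")
    # The last "@" in what is left starts the version.
    _, sep, version = remainder.rpartition("@")
    version = unquote(version) if sep else None
    return version, (unquote(subpath) or None)


encoded = "npm:pygments-main@bitbucket:birkenfeld/pygments-main%23244fd47e07d1014f0aed9c"
raw = "npm:pygments-main@bitbucket:birkenfeld/pygments-main#244fd47e07d1014f0aed9c"

# With "%23" the whole VCS address, commit included, stays in the version.
print(split_version_subpath(encoded))
# ('bitbucket:birkenfeld/pygments-main#244fd47e07d1014f0aed9c', None)

# With a raw "#" the commit is wrongly parsed as a subpath.
print(split_version_subpath(raw))
# ('bitbucket:birkenfeld/pygments-main', '244fd47e07d1014f0aed9c')
```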

@pombredanne
Member Author

@tgamblin you wrote:

looks cool! Two questions and kind of one for @andrew:
Thanks!

  1. The original referenced issue says the goal is to have a "unique" identifier for each package, although the spec doesn't seem to dwell on that too much, which is probably good. Do you have ideas on how to reconcile the same package fetched from multiple sources? e.g., the same Python package might exist in pypi, conda, spack, and system package managers. @andrew: does libraries.io do anything to reconcile the different names?

I kinda like to think of these as "mostly" unique, at least unique if a package manager/type provides some unicity within its standard package manager and within a repo/registry of these. Most provide such a guarantee.

As for things being the same, I would think this is something that a DB of purls can help with. There is an amazing graph of relations among the packages: one upstream package may be repackaged in a Linux distro, have its source on GH and BB, be bundled or packaged on Conda, spack, as RubyGems, etc.

For me, I intend to maintain such relationships in https://github.com/nexB/vulnerablecode (e.g. relate a CPE and several purls together and relate this cluster to a vulnerability), and I capture some relationships in https://github.com/nexB/scancode-toolkit/blob/275-streamline-package-manifests-models/src/packagedcode/models.py#L237 (e.g. this srpm is the sources of this rpm).

Finding that two packages are the same is not a trivial matter though.
I know of two efforts in that domain, focused on Linux mostly:

In all cases, this is hard and @AMDmi3 does a rather superb job in this domain with his concept of "meta package"

  2. Is the idea that these URLs will one day be fetchable by curl? How do you see the translation being implemented? I guess libraries.io could provide that as a service?

I would not make this part of the goals, though there is a discussion in #5 started by @jackfirth where I toyed with such a hypothetical tool that could fetch a purl: #5 (comment)

@pombredanne
Member Author

So this PR is closed but the discussion can go on there alright. Tickets are best going forward though!

In particular an important one would be #9 : Should the purl scheme/type be prefixed with purl+?
Please chime in on this as this would be a reasonably important change.

@sschuberth
Member

@sschuberth do you want to be on the GH org too, btw?

Only if it gets a nice icon 😆

@pombredanne
Member Author

@sschuberth then you won the right to design a logo! Invitation sent!

But what's wrong with this fine logo? :D
https://avatars2.githubusercontent.com/u/33497028?s=60&v=4

@sschuberth
Member

@pombredanne Thanks!

But what's wrong with this fine logo?

Nothing... if you like Space Invaders 😉

stevespringett added a commit to CycloneDX/specification that referenced this pull request Dec 1, 2017
pombredanne added a commit to aboutcode-org/scancode-toolkit that referenced this pull request Feb 8, 2018
 * This is a first rough implmentation using
   https://github.com/package-url/packageurl-python
 * Based on package-url/purl-spec#1

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit to aboutcode-org/scancode-toolkit that referenced this pull request Feb 16, 2018
 * This is a first rough implmentation using
   https://github.com/package-url/packageurl-python
 * Based on package-url/purl-spec#1

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit to aboutcode-org/scancode-toolkit that referenced this pull request Feb 17, 2018
 * This is a first rough implmentation using
   https://github.com/package-url/packageurl-python
 * Based on package-url/purl-spec#1

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit to aboutcode-org/scancode-toolkit that referenced this pull request Feb 27, 2018
 * This is a first rough implmentation using
   https://github.com/package-url/packageurl-python
 * Based on package-url/purl-spec#1

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit to aboutcode-org/scancode-toolkit that referenced this pull request Mar 12, 2018
 * This is a first rough implmentation using
   https://github.com/package-url/packageurl-python
 * Based on package-url/purl-spec#1

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit to aboutcode-org/scancode-toolkit that referenced this pull request Mar 23, 2018
 * This is a first rough implmentation using
   https://github.com/package-url/packageurl-python
 * Based on package-url/purl-spec#1

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit to aboutcode-org/scancode-toolkit that referenced this pull request Apr 11, 2018
 * This is a first rough implmentation using
   https://github.com/package-url/packageurl-python
 * Based on package-url/purl-spec#1

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
colindean pushed a commit to colindean/purl-spec that referenced this pull request Dec 17, 2024