Initial purl draft spec #1
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Link: aboutcode-org/scancode-toolkit#805
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
README.rst
- in Go:
- in JavaScript:
- in Perl:
- for the JVM:
Please consider including .NET as well.
For implementation, these places are potential candidates for discussion:
https://github.com/nuget/home
https://github.com/dotnet/home
https://github.com/dotnet/standard
Very good point! I am adding it in the updated version that I am pushing shortly
@kasper3 this has been addressed in the latest push.
cce2ebc to 7f730e6
* Update spec to use "purl" and package URL, not "puurl" based on @sschuberth feedback
* Re-organize the document in context/problem/solution chapters
* Refine examples, parsing and construction rules
* Rename path part to subpath for clarity
* Document relationship with URL based on @sschuberth feedback
* Add encoding section
* Add known types and qualifiers section
* Add list of candidate types to define
* Add section for implementation tests
* Add NuGet and .NET details based on @kasper3 feedback
* Fix typos
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
7f730e6 to cdf28ff
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
How are package managers handled that have no unique package name? PyPi treats
* PyPI package names are case insensitive and a - and _ are the same: the name must be normalized.
* reported by @jayfk
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
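To make the normalization rule in this commit message concrete, here is a minimal sketch. The helper name `normalize_pypi_name` is hypothetical, and the choice to lowercase and to map `_` to `-` (rather than the other way around) is an assumption: the commit message only says the two characters are equivalent.

```python
def normalize_pypi_name(name: str) -> str:
    """Return one canonical spelling of a PyPI package name (illustrative helper)."""
    # Assumption: lowercase the name and map "_" to "-". The commit message
    # only states that case is ignored and that "-" and "_" are the same.
    return name.lower().replace("_", "-")


# Two spellings of the same package collapse to one canonical name.
print(normalize_pypi_name("Flask_SQLAlchemy"))
print(normalize_pypi_name("flask-sqlalchemy"))
```

With a rule like this, two purls that differ only in case or in `-` versus `_` compare equal after normalization.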
@kasper3 the latest version covers NuGet and .NET
@R2wenD2 @sschuberth @mnonnenmacher @jpopelka @jdaguil @JonoYang @MaJuRG @mjherzog @chinyeungli @tdruez This draft is ready for your review, which is highly valued!
@pombredanne it's gonna be a busy week for me, don't block on me!
* therefore the name and namespace for these package types must be normalized to lowercase
* reported by @jayfk
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
It would be good to have an example with the subpath as well. The way I read it this would work:
/cc @jasinner
* reported by @ashcrow
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
I'm happy to create a golang and/or python parser for this spec once it's finalized.
@ashcrow you wrote:
Yes! thanks. I added it in 20f84b3 with a slight modification: the leading slash is not significant in a subpath. Note also FWIW that subpaths may be common for Go.
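As a small illustration of "the leading slash is not significant in a subpath", a parser could drop it before comparing or storing the value. The helper name `normalize_subpath` is hypothetical and this is only a sketch of that one rule, not the spec's full subpath handling.

```python
def normalize_subpath(subpath: str) -> str:
    """Drop leading slashes so "/a/b" and "a/b" are treated as the same subpath."""
    # Only the leading-slash rule from the comment above is applied here;
    # any other normalization would be an extra assumption.
    return subpath.lstrip("/")


print(normalize_subpath("/src/main") == normalize_subpath("src/main"))
```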
@ashcrow you wrote:
Let me make you a co-owner of the org together with @andrew. I suggest that we use this convention for implementations repo names:
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@ashcrow I have a rough, poorly tested first-draft toy bit of Python code in scancode here. I will move this out to a bona fide repo as a Python starter, I guess either public domain or MIT licensed with a "Copyright (c) the purl authors". Public domain might be best? And in terms of contributions a simple DCO https://developercertificate.org/ should be plenty enough IMHO. Do you agree?
Cool! I'll take a look. I'm fine with the MIT license and using the DCO.
So next step: please review the main doc and submit tickets as needed; this will be less messy than a crowded PR. Thank you all for chiming in!
rpm:fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25
rpm:opensuse/curl@7.56.1-1.1.?arch=i386&distro=opensuse-tumbleweed

(NB: some checksums are truncated for brevity)
I see a potential problem here in that this is going to result in a lot of new URL schemes, but URL schemes require a lot of review to get approved. See here. I suspect they're going to reject this spec on that basis alone. The obvious way around it would be to change these to look something like:
purl:rpm/fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25
purl:github/package-url/purl-spec@244fd47e07d1004f0aed9c
...
etc.
Although honestly if this is going to be a thing, and it's always written in a URL, the `url` part of `purl` seems redundant. Why not put the focus on the package? e.g.:
pkg:rpm/fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25
pkg:github/package-url/purl-spec@244fd47e07d1004f0aed9c
...
etc.
You'll note that `pkg:` isn't yet claimed 😄.
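To show how the single-scheme form suggested above could be taken apart, here is a minimal sketch. It assumes the layout `pkg:type/namespace/name@version?qualifiers#subpath`, that a literal `@`, `?` or `#` inside a component is percent-encoded, and that the function name `parse_pkg` and the returned dictionary keys are purely illustrative. It is not the spec's parsing algorithm.

```python
from urllib.parse import parse_qsl, unquote


def parse_pkg(purl: str) -> dict:
    """Split a 'pkg:' string into its components (illustrative sketch only)."""
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme != "pkg":
        raise ValueError("expected a 'pkg:' prefix")
    # Peel off the optional parts from the right-hand side first.
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    rest, _, version = rest.partition("@")
    # The first path segment is the type; the last one is the name.
    type_, _, path = rest.partition("/")
    namespace, _, name = path.rpartition("/")
    return {
        "type": type_,
        "namespace": unquote(namespace),
        "name": unquote(name),
        "version": unquote(version),
        "qualifiers": dict(parse_qsl(query)),
        "subpath": unquote(subpath),
    }


parsed = parse_pkg("pkg:rpm/fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25")
print(parsed["type"], parsed["namespace"], parsed["name"], parsed["version"])
print(parsed["qualifiers"])
```

Because the type becomes the first path segment, a single scheme covers every package type and no per-type scheme registration is needed.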
For clarity and simplicity a `purl` is always an ASCII string. To ensure that
there is no ambiguity when parsing a `purl`, separator characters and non-ASCII
characters must be UTF-encoded and then percent-encoded as defined at::
I'm not sure this is the way to go. URLs can have UTF-8 characters, and they've benefitted countries with different character sets, e.g. Japan and China. I think you should allow UTF-8 to be inclusive. What if there are packages whose users don't ever write their names in roman characters? ASCII seems very limiting there. See here
I am with you there: the whole point is to avoid any weird thing and ensure that things are properly UTF-8 and then percent encoded. This means that eventually any character may be used, but using non-ASCII would require a bit more encoding and decoding.
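The "UTF-8 first, then percent-encode" rule can be sketched with Python's standard `urllib.parse`. The helper names `encode_component` and `decode_component` are hypothetical, and encoding every reserved character (`safe=""`) is an assumption made for the illustration.

```python
from urllib.parse import quote, unquote


def encode_component(value: str) -> str:
    """Encode text as UTF-8 bytes, then percent-encode them into plain ASCII."""
    # safe="" is an assumption: it also encodes separators such as "/" and "@".
    return quote(value, safe="", encoding="utf-8")


def decode_component(value: str) -> str:
    """Reverse the encoding: percent-decode, then interpret the bytes as UTF-8."""
    return unquote(value, encoding="utf-8")


encoded = encode_component("日本語")
print(encoded)                    # ASCII-only, percent-encoded form
print(decode_component(encoded))  # round-trips to the original text
```

Any character can therefore appear in a name; a non-ASCII name just costs an encoding step when writing the purl and a decoding step when reading it.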
@pombredanne: looks cool! Two questions and kind of one for @andrew:
=======

When tools, APIs and databases process or store multiple package types, it is
difficult to reference the same software package across tools in a uniform way.
Ok… so is the goal to provide a URI that:
- tells you which tool consumes this
AND
- provides enough information for that tool to consume it?
Or is the goal only to provide a way to uniquely reference a specific package, not to be able to reconstruct how it's installed?
Who would consume these? Humans? Software? Which software?
@iarna you wrote:
> Ok… so is the goal to provide a URI that:
> That tells you which tool consumes this
> AND
> Provides enough information for that tool to consume it ?

Not exactly which tool consumes it: for instance a `maven` `purl` does not tell you to use Maven, Gradle, SBT or Ivy; an `npm` `purl` does not tell you that you should use the `npm` tool over `yarn` or anything else; `pypi` does not tell you to use `pip`, `easy_install` or `buildout`. What `npm` or `pypi` mean here is the whole set of packaging format, manifest, registry, protocols and APIs that a tool uses to deal with such a package `type`.
And provide just enough to consume it: well that's a possibility though it may be just a side goal.
In #5 I toyed with the hypothetical idea of a "meta" package manager.... Though frankly this would be just a franken manager IMHO.

> Or is the goal only to provide a way to uniquely reference a specific package, not to be able to reconstruct how its installed?
> Who would consume these? Humans? Software? Which software?

I think that the distinguished racketist @jackfirth provides a better summary in #6 than I could have ever written. I intend to steal his words and add them to the spec:
[...] here are the proposed use cases as I understand them:
- Cross-system metadata indexing to search and monitor packages by metadata like available versions, dependencies, contributors, etc. across multiple package managers (libraries.io)
- Vulnerability tracking to determine whether a package's set of possible transitive dependencies includes a known vulnerability and whether the version constraints of that dependency graph allow or prevent patching
- Other kinds of package-content-agnostic analysis tools, especially tools that look at the dependency graphs of package ecosystems
So IMHO the primary beneficiaries are folks or DBs or APIs or tools that deal with several package formats and languages. Very selfishly, that might include me, @andrew @jayfk @jasinner @ashcrow @R2wenD2 @tgamblin @jpopelka and hopefully many more. This may be used in these contexts for UIs, DBs and APIs.
It may also --if this gets enough traction-- be something that positively influences some standardization in the domain in the future as a side benefit: say tomorrow you decide to include an `npm` package size as a key identifying package attribute: maybe you will think about whether this makes sense purl-wise, and either reconsider this or contribute to `purl` to adopt such a fine new standard?
- **type**: the package "type" or package "protocol" such as maven, npm, nuget,
  gem, pypi, etc. Required.
- **namespace**: some name prefix such as a Maven groupid, a Docker image owner,
  a GitHub user or organization. Optional and type-specific.
Just gonna add it here too: the npm version of this starts with `@`. It would be super nice to not have to escape that. 😆
@iarna I am with you there, but FWIW there is some percent-encoding in npm's registry API URLs:
For instance https://registry.yarnpkg.com/@invisionag%2feslint-config-ivx/ works but https://registry.yarnpkg.com/@invisionag/feslint-config-ivx/ comes out as {"error":"Not found"}
Just so you know: the slash between the "namespace" and name must be percent-encoded. And yes, I know you might think that I have researched too many of these tiny details and quirks.
Here in a `purl` I guess this is not strictly needed: a name is always required. Therefore a leading `@` in a scoped NPM `namespace` is never ambiguous. What would be ambiguous is if we allow an unescaped `@` anywhere in the name or namespace and a `purl` comes with no version.
For instance, say a package name is `super@package` for the hypothetical `weirdo` package type:
With the purl `weirdo:super@package` I would parse `@package` as a version even though no version was provided. `weirdo:super@package@1.2.3` would not be ambiguous of course, but my hypothetical weirdo packages rarely have a version attached...
So for simplicity I specified that the whole name and each namespace segment should be percent-encoded. We could relax this for the leading `@` in namespace/name (to make scoped NPMs look beautiful). I will update this and both can be acceptable in any case.
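The ambiguity described in this comment can be reproduced with a few lines. This is a sketch under the assumption that a parser splits name from version at the last `@`; the function name `split_version` is hypothetical and the `weirdo` type is the made-up one from the comment.

```python
from urllib.parse import quote, unquote


def split_version(name_and_version: str):
    """Split 'name@version' at the last '@' (an assumed, illustrative rule)."""
    if "@" in name_and_version:
        name, _, version = name_and_version.rpartition("@")
    else:
        name, version = name_and_version, ""
    return unquote(name), (unquote(version) or None)


# Unescaped: "package" is wrongly read as a version.
print(split_version("super@package"))
# With an explicit version, the last "@" is the real separator.
print(split_version("super@package@1.2.3"))
# Percent-encoding the name removes the ambiguity when there is no version.
print(split_version(quote("super@package", safe="")))
```

Percent-encoding the name is what lets the no-version case round-trip to the intended name.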
::

    bitbucket:birkenfeld/pygments-main@244fd47e07d1014f0aed9c
`bitbucket` isn't really a consumer though, is it? Like, what kind of package does that specifier refer to?
This is sticky because npm currently consumes specifiers that look almost like this. Specifically, for npm that would be: `bitbucket:birkenfeld/pygments-main#244fd47e07d1014f0aed9c`.
The full npm specifier, assuming it was an npm package, would be: `pygments-main@bitbucket:birkenfeld/pygments-main#244fd47e07d1014f0aed9c`
I'm wondering how that would encode as a purl, particularly seeing as `#` is already used for a subpath.
@iarna bitbucket, github, gitlab are not strictly speaking providing packages but are de facto large reservoirs of packagish-things. They provide more than just a VCS repo and have tickets, releases, some API-fetchable metadata, etc., which makes these "packagish" enough to join the fray IMHO.
In the case you mentioned you are encoding the whole VCS address as an NPM version and this works beautifully IMHO:
The purl would be:
npm:pygments-main@bitbucket:birkenfeld/pygments-main%23244fd47e07d1014f0aed9c
Of note is the percent encoding of the version `#` to avoid parsing an incorrect `subpath` that would otherwise come out as this ugly mess:
type='npm', name='pygments-main', version='bitbucket:birkenfeld/pygments-main', subpath='244fd47e07d1014f0aed9c'
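A short sketch of why the `#` has to be encoded: if a parser cuts the subpath at the first `#` (an assumption for this illustration), an unencoded `#` in the version is taken as the start of a subpath. The helper name `split_subpath` is hypothetical, and leaving `:` and `/` unencoded simply mirrors the example purl above.

```python
from urllib.parse import quote, unquote


def split_subpath(purl: str):
    """Cut a purl at the first '#' into (head, subpath) (assumed rule)."""
    head, _, subpath = purl.partition("#")
    return head, unquote(subpath)


version = "bitbucket:birkenfeld/pygments-main#244fd47e07d1014f0aed9c"

# Unencoded: the commit hash is wrongly split off as a subpath.
print(split_subpath("npm:pygments-main@" + version))

# Encoded: "#" becomes "%23" and the whole string stays in the version.
purl = "npm:pygments-main@" + quote(version, safe=":/")
print(purl)
print(split_subpath(purl))
```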
@tgamblin you wrote:
I kinda like to think of these as "mostly" unique, at least unique if a package manager/type provides some unicity within its standard package manager and within a repo/registry of these. Most provide such a guarantee. As for things being the same, I would think this is something that a DB of For me, I intend to maintain such relationships in https://github.com/nexB/vulnerablecode (e.g. relate a CPE and several Finding that two packages are the same is not a trivial matter though.
In all cases, this is hard and @AMDmi3 does a rather superb job in this domain with his concept of "meta package"
I would not make this part of the goals .... though there is a discussion in #5 started by @jackfirth where I toyed with such a hypothetical tool that could fetch a
So this PR is closed but the discussion can go on there alright. Tickets are best going forward though! In particular an important one would be #9:
Only if it gets a nice icon 😆
@sschuberth then you won the right to design a logo! Invitation sent! But what's wrong with this fine logo? :D
@pombredanne Thanks!
Nothing... if you like Space Invaders 😉
* This is a first rough implementation using https://github.com/package-url/packageurl-python
* Based on package-url/purl-spec#1
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
…estcases Adds Homebrew test cases
For reference this is the result of a discussion that started here
aboutcode-org/scancode-toolkit#805