Add functions for parsing wheel and sdist filenames #387

pfmoore · 2021-01-19T12:09:15Z

I finally got fed up of writing the same code to parse wheel and sdist filenames, so I thought it might be worth having a "standard" version.

As written, the code only handles sdist filenames in the form described in the packaging standards (.tar.gz files with the form {name}-{version}.tar.gz). It could be extended to handle more of the cases in use on PyPI (e.g., .zip format sdists) but I chose in the first instance to stick with the stricter definition that's the current de facto standard.
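
To make the strict form concrete, here is a minimal standard-library sketch of splitting a `{name}-{version}.tar.gz` filename. It is not the code from this PR: the helper name `split_sdist_filename` is hypothetical, it returns plain strings rather than a canonicalized name and a parsed version, and it assumes the version is a normalized PEP 440 version and so contains no `-`.

```python
def split_sdist_filename(filename):
    """Split '{name}-{version}.tar.gz' into (name, version) strings."""
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .tar.gz sdist: {filename!r}")
    stem = filename[: -len(suffix)]
    # Assumption: the version contains no "-", so the last dash
    # separates the name from the version.
    name, sep, version = stem.rpartition("-")
    if not sep or not name or not version:
        raise ValueError(f"expected {{name}}-{{version}}: {filename!r}")
    return name, version


print(split_sdist_filename("my-package-1.0.tar.gz"))  # ('my-package', '1.0')
```

Splitting on the last dash is what lets project names that themselves contain dashes pass through unharmed.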

pfmoore · 2021-01-19T12:10:38Z

The docs build is failing. It seems that the doctest step is checking against the released code, not against the in-development sources. I'm not sure what I'm doing wrong to have this happen. Advice would be appreciated 🙂

pradyunsg · 2021-01-19T13:15:04Z

Try adding a session.install("-e", ".") to the docs function in the noxfile?

pfmoore · 2021-01-19T13:23:29Z

Thanks, that fixed it. I spent ages trying to work out why there was an older version of packaging present. (I'm still not clear why that's happening, or how the other doctests didn't fail when they were added, but it's working now, so 🤷)

pradyunsg · 2021-01-19T13:50:03Z

nox is like tox, each run happens in its own isolated environment. However, unlike tox, nox doesn't install your package automatically - you need to do that explicitly.

We hadn't. :)
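
As a rough illustration of the fix, a docs session in a noxfile might look like the fragment below. Only the session.install("-e", ".") call comes from this thread; the session body, the sphinx dependency and the docs/ paths are assumptions, not the project's actual noxfile.

```python
import nox


@nox.session
def docs(session):
    # Install the in-development sources explicitly; nox does not do this
    # for you, so without it the doctests see whatever copy of the package
    # is otherwise present in the session's environment.
    session.install("-e", ".")
    session.install("sphinx")  # assumed docs dependency
    # Assumed layout: Sphinx sources in docs/, doctest output under docs/_build.
    session.run("sphinx-build", "-b", "doctest", "docs", "docs/_build/doctest")
```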

pradyunsg · 2021-01-19T13:50:49Z

> how the other doctests didn't fail when they were added

Pretty sure we weren't testing them earlier.

uranusjr · 2021-01-19T13:51:09Z

I built a package for this functionality a while ago: https://github.com/uranusjr/packaging-dists

The interface proposed here is slightly lower level, but the parsing part should be the same. I’ll try to find some time to compare implementations and see if either is missing edge cases.

pradyunsg · 2021-01-19T13:54:32Z

packaging/utils.py

+        )
+
+    parts = filename.split("-", dashes - 2)
+    name = canonicalize_name(parts[0])


This should instead check that name == canonical name?

The escaping rules for wheel filenames aren't the same in PEP 427 as in PEP 503. I'm more interested in parsing the data than validating, so I chose to return the canonicalized name rather than try to apply a check. I could be persuaded to not canonicalize at all, but I'm against trying to validate.

Validating the name component against PEP 427 would be "only alphanumeric and underscores, and underscores can't occur together". Which I don't think is that useful.

Would it be worth documenting what these functions don't check compared to the PEPs? That way anyone using them who wants to be strict knows what more they would need to do on top of what the functions return.

On reflection, doing the check and raising an error if the name part isn't valid isn't that hard, and I'm fine with raising an error if it's invalid. The check is so minimal I don't expect anything to fail it in any practical situation...

But if we don't validate, then yes, I'm fine with documenting what we don't do 🙂
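
For reference, the minimal name check described in this thread ("only alphanumeric and underscores, and underscores can't occur together") could be sketched as follows. This is an illustration of the rule as stated in the comments, not the validation that was merged; the function name is hypothetical, and restricting "alphanumeric" to ASCII is an assumption.

```python
import re

# Assumption: "alphanumeric" is taken to mean ASCII letters and digits.
_NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def is_valid_wheel_name(name):
    """Return True if `name` uses only alphanumerics and single underscores."""
    return "__" not in name and _NAME_RE.fullmatch(name) is not None


print(is_valid_wheel_name("my_package"))   # True
print(is_valid_wheel_name("my__package"))  # False
print(is_valid_wheel_name("my-package"))   # False
```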

pfmoore · 2021-01-19T14:24:08Z

> I’ll try to find some time to compare implementations and see if either is missing edge cases.

I had a quick look. The main differences I can see are that you allow more formats for sdists (technically only .tar.gz is allowed nowadays, thanks to PEP 517, but as I say, I'm happy to add more) and you don't assume PEP 440 versions for sdists, so you can't simply split on the last -. I'm not personally interested in legacy versions, and packaging is deprecating them, so I don't think this function should worry about them.

Let me know if I missed anything, though, or you disagree on any of the above.

brettcannon · 2021-01-19T17:33:56Z

Thanks for doing this, @pfmoore (and @uranusjr for your version)! I also came close to doing this myself for mousebender (which I do hope to upstream here after we drop Python 2 support), so I definitely think there's demand for this. 😉

sinoroc · 2021-01-19T18:49:00Z

> I definitely think there's demand for this

I can confirm, this is something I would find very useful.

pradyunsg · 2021-01-19T22:31:13Z

I'll hold off on clicking the merge button, until Paul decides whether he wants to add the validation for the name. Regardless, once he's happy with this, I think this is ready for merging. :)

pfmoore · 2021-01-19T22:56:20Z

Let me add the validation. I'll do that tomorrow then this can be good to go :-)

pfmoore · 2021-01-20T09:03:11Z

OK, validation added. I tried to add a test for the Unicode case, but Python 2 made me cry, so I just dropped it. If anyone can tell me how I'd do that cleanly, I'm happy to add it back, but it's a bit of a corner case, so I'm not too worried. (My first thought was to slap a u"..." around the string, but I don't know if the type annotations would approve...)

pfmoore · 2021-01-20T11:36:46Z

@pradyunsg This is ready for merge now, IMO.

pradyunsg · 2021-01-20T12:29:35Z

Thanks @pfmoore. I've made an issue to follow up on the unicode test, once we drop Python 2. :)

pfmoore · 2021-01-20T12:46:44Z

Excellent, thanks! I wasn't sure what the timescale was for dropping Python 2, but that sounds like the ideal solution.

brettcannon · 2021-01-20T20:00:05Z

@pfmoore The plan is to drop it as soon as we do one more release (see #385).

brettcannon · 2021-01-23T19:24:58Z

Quick question before we release this: according to PEP 427, the build tag should be sorted "as the empty string if unspecified". Should we then have parse_wheel_filename() return "" instead of None when there's no build tag?

BTW, I'm expecting that as part of upstreaming mousebender's simple index code there will be an improvement to group files together, which will mean some code to sort by build tag which has some interesting requirements:

> Sort as the empty string if unspecified, else sort the initial digits as a number, and the remainder lexicographically.

pfmoore · 2021-01-23T20:25:10Z

Good point, yes that makes sense.

brettcannon · 2021-01-24T02:05:22Z

Sorry about this, but I'm thinking through how to group wheel files from a Simple repository API perspective and it's causing me to realize things in stages. 😉

Maybe this function is where we should do the work to parse the build tag for proper sorting? Looking at the default used in pip it's actually the empty tuple. Then when there is a build tag it's a tuple of (int, str) to get the sorting that PEP 491 specifies. Maybe for typing purposes the default should be (0, '')? Or maybe we could use a type alias like we use for normalized package names so people don't need to care about the underlying tuple structure? I have no magical insight here, BTW, just ideas. 😄

Since the appropriate wheel can't be selected without sorting by build tag as shown by the sort key in pip (if I'm reading that sort key tuple appropriately), it seems this work can't be avoided and so it might as well be centralized here and done automatically.

And I will open an issue as appropriate for any of these ideas based on what we all agree on.
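
The tuple idea above can be sketched with the standard library alone. The function name `build_tag_key` and the regular expression are illustrative assumptions, not an agreed API: a missing build tag becomes the empty tuple, and anything else becomes an (int, str) pair so that the leading digits compare numerically and the remainder lexicographically.

```python
import re

# Leading digits, then an arbitrary remainder.
_BUILD_TAG_RE = re.compile(r"(\d+)(.*)")


def build_tag_key(build):
    """Map a raw build tag (or None) to a sortable tuple."""
    if not build:
        return ()  # sorts before every (int, str) pair
    match = _BUILD_TAG_RE.fullmatch(build)
    if match is None:
        raise ValueError(f"build tag must start with a digit: {build!r}")
    return int(match.group(1)), match.group(2)


raw_tags = ["10", None, "2b", "2", "1"]
print(sorted(raw_tags, key=build_tag_key))  # [None, '1', '2', '2b', '10']
```

Comparing the raw strings would place "10" before "2", which is why the numeric prefix is converted to an int.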

pfmoore · 2021-01-24T10:03:25Z

I'm fine with returning something more structured for the build tag. I've very rarely seen it used "in the wild", so I did the simplest thing that would work to start with, but I'm happy to expand on it.

I think a tuple is probably sufficient - I don't think there's any need to name the parts. I haven't done enough with type hints to know how I feel about using a type alias, so I'd go with what others feel is reasonable there.

BTW, I'm curious to know how you plan on using this in mousebender. I have a library, shadwell, that implements the package selection part of pip's finder functionality - that will need this change - and from what I recall when we last spoke you didn't feel like that was a good fit for mousebender. Have you since changed your mind, or are you planning to use this for something different?

brettcannon · 2021-01-26T18:32:30Z

@pfmoore the reason I bring all of this up in terms of mousebender is grouping results from a Simple repository API. Taking https://download.pytorch.org/whl/torch_stable.html as an example, I would assume the results from a Simple repository index aren't useful unless you group by:

Project
Version
Sort by build tag
(Frozen) set of wheel tags

Since the build tag seemingly has an innate sort order which is agnostic of what sort of wheel one is after, I figured having the build tag pulled together upfront made sense (i.e. build tag takes precedence over wheel tag matching).

Now if I have read the PEPs and pip source code wrong and the build tag doesn't play that much of an important part and is really more for breaking ties when matching by wheel tag, please let me know as that means we can skip the parsing upfront. That would save CPU cycles for the common case of having only one wheel tag match per version.

BTW, I'm going to open an issue to track all of this in a more appropriate location. 😄
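
A rough sketch of the grouping described in this comment, under the assumption it makes (that build-tag ordering is applied before any wheel-tag matching). The function name and the record layout are hypothetical: each record is an already-parsed (project, version, build_tag, tags) tuple, with build tags represented as () or (int, str) and versions left as plain strings.

```python
from collections import defaultdict


def group_wheels(records):
    """Group parsed wheel records by (project, version).

    Within each group, candidates are ordered by build tag, highest first.
    """
    groups = defaultdict(list)
    for project, version, build_tag, tags in records:
        groups[(project, version)].append((build_tag, tags))
    for candidates in groups.values():
        # () compares lower than any (int, str) pair, so wheels without
        # a build tag end up last.
        candidates.sort(key=lambda item: item[0], reverse=True)
    return dict(groups)


records = [
    ("torch", "1.7.1", (), frozenset({("cp39", "cp39", "linux_x86_64")})),
    ("torch", "1.7.1", (1, ""), frozenset({("cp39", "cp39", "linux_x86_64")})),
    ("torch", "1.7.0", (), frozenset({("cp38", "cp38", "linux_x86_64")})),
]
for key, candidates in group_wheels(records).items():
    print(key, [build_tag for build_tag, _ in candidates])
```

If the build tag turns out to be only a tie-breaker after tag matching, the sort step would move to the point where a single matching wheel is chosen.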

brettcannon · 2021-01-26T19:17:27Z

#389

henryiii · 2021-01-26T19:52:19Z

FYI, parse_wheel_filename('spam-0.1.0-cp39-cp39-macosx_10_9_universal2.macosx_11_0_universal2.whl') produces ('spam', <Version('0.1.0')>, None, frozenset([<cp39-cp39-macosx_10_9_universal2 @ 4329376176>, <cp39-cp39-macosx_11_0_universal2 @ 4329376256>])). Why does it not give access to the python/abi/platform tags, out of curiosity?

Edit: partially answering my question, they are grouped together.

pfmoore · 2021-01-26T20:45:06Z

> Why does it not give access to the python/abi/platform tags, out of curiosity?

It returns a frozenset of tag objects, which have interpreter, abi and platform attributes...
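
To show why the example filename yields two tags, here is a standard-library sketch of how such a filename decomposes. It is an illustration only: `split_wheel_filename` is a hypothetical name, plain (interpreter, abi, platform) tuples stand in for the tag objects, and no canonicalization or version parsing is attempted. The split mirrors the filename.split("-", dashes - 2) line quoted in the review above.

```python
from itertools import product


def split_wheel_filename(filename):
    """Return (name, version, build, tags) as plain strings and tuples."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    stem = filename[: -len(".whl")]
    dashes = stem.count("-")
    if dashes not in (4, 5):
        raise ValueError(f"wrong number of parts: {filename!r}")
    # Split off name, version and the optional build tag; the final field
    # keeps "python-abi-platform" together.
    parts = stem.split("-", dashes - 2)
    name, version = parts[0], parts[1]
    build = parts[2] if dashes == 5 else None
    interpreters, abis, platforms = parts[-1].split("-")
    # Each component may be a "."-separated compressed set; the tags are
    # the product of the three sets.
    tags = frozenset(
        product(interpreters.split("."), abis.split("."), platforms.split("."))
    )
    return name, version, build, tags


name, version, build, tags = split_wheel_filename(
    "spam-0.1.0-cp39-cp39-macosx_10_9_universal2.macosx_11_0_universal2.whl"
)
for interpreter, abi, platform in sorted(tags):
    print(interpreter, abi, platform)
```

The two platform entries in the last field expand into two separate tags, each carrying its own interpreter, abi and platform.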

henryiii · 2021-01-27T00:21:42Z

Ahh, excellent, that's useful, thanks!

Add functions for parsing wheel and sdist filenames

ad5b301

pfmoore force-pushed the parse_filenames branch from 6a8c53a to ad5b301 Compare January 19, 2021 12:12

Fix docs build

5667f8f

Correct the reference to where sdist filenames are documented

fd9977b

pradyunsg reviewed Jan 19, 2021

brettcannon approved these changes Jan 19, 2021

pradyunsg approved these changes Jan 19, 2021

Validate the project name field in wheel filenames

56b1dfd

pfmoore force-pushed the parse_filenames branch from 8c168e7 to 56b1dfd Compare January 20, 2021 08:48

Sigh. Python 2 and Unicode

d3388aa

pradyunsg mentioned this pull request Jan 20, 2021

Re-add unicode test for filename parsing #388

Open

pradyunsg approved these changes Jan 20, 2021

pradyunsg merged commit da68002 into pypa:master Jan 20, 2021

pfmoore deleted the parse_filenames branch January 20, 2021 12:46

henryiii mentioned this pull request Jan 24, 2021

New release with ability to load 10.9 Universal2 wheels on ARM #385

Closed

brettcannon mentioned this pull request Jan 26, 2021

(Potentially) tweak build tag return values for utils.parse_wheel_filename() #389

Closed

dependabot bot mentioned this pull request Mar 15, 2021

Bump packaging from 20.4 to 20.9 release-engineering/exodus-lambda#144

Merged

pelson mentioned this pull request Apr 19, 2021

Test: Add coverage for wheel filename normalization. #422

Merged

lkollar mentioned this pull request May 4, 2021

Add support for .zip sdists in parse_sdist_filename #429

Closed
