Predictable wheel URLs #1944

Closed
toddrme2178 opened this issue Apr 24, 2017 · 12 comments
Labels
feature request; requires triaging (maintainers need to do initial inspection of issue)

Comments

@toddrme2178

As shown in issue #1239, there is a predictable URL that downstream packagers can use to check to make sure downloaded source archives are valid, of the form http://files.pythonhosted.org/packages/source/{first letter of package name}/{package name}/{file name}. However, I have not been able to find a similar predictable URL for wheels. @minrk and @hobarrera expressed interest in having this feature as well.

Is there a predictable wheel URL like there is for source archives? If not, having one is extremely important for checking download integrity, especially with more and more packages shipping only wheels.
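For reference, the source-archive pattern described above can be written as a small helper. This is only a sketch of the pattern as quoted in this issue: `sdist_url` and its parameters are hypothetical names, and HTTPS is assumed for the host, as in the later examples in this thread.

```python
def sdist_url(name, filename):
    """Build the predictable source-archive URL quoted in this issue.

    Hypothetical helper. The pattern is:
    {host}/packages/source/{first letter of package name}/{package name}/{file name}
    """
    host = 'https://files.pythonhosted.org'  # assumption: HTTPS on the same host
    return f'{host}/packages/source/{name[0]}/{name}/{filename}'


print(sdist_url('requests', 'requests-2.18.4.tar.gz'))
```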

@ddevault

This should be prioritized, not ignored. It makes it very difficult to package up Python dependencies.

@brainwane brainwane added requires triaging maintainers need to do initial inspection of issue and removed requires triaging maintainers need to do initial inspection of issue labels Mar 1, 2018
@brainwane
Contributor

Hi, @toddrme2178, @SirCmpwn and any other readers. Thanks for letting us know about your thoughts, and sorry for the slow response!

The Warehouse developers have gotten funding to improve Warehouse, and have been progressing on our development roadmap -- the most urgent task is to redirect pypi.python.org to pypi.org so the site is more sustainable and reliable. Along the way we've substantially improved our Warehouse API reference guide which makes it easier for downstreams to programmatically download artifacts, including wheels.

I've spoken with other Warehouse maintainers and we advise that you use more robust methods of grabbing these artifacts, using our supported APIs (such as the JSON API), rather than doing string concatenation as you've described. We cannot make any guarantees that the URLs to the distributions won’t change some day, so unless you are getting URLs from our API, your download tool will always potentially be brittle.

I'm sorry to have to disappoint you. I hope we can address your needs in other ways. Please see other open API-related issues and tell us more about the problems you're trying to solve, so we can help make sure you get what you need via our supported APIs.

Thanks and sorry again for the wait and the disappointment.
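The advice above, resolving the file URL through the JSON API instead of concatenating strings, might look roughly like the following. It is a minimal sketch, not an official client: `fetch_release_json` and `find_file` are hypothetical names, and the code assumes the release endpoint `https://pypi.org/pypi/<name>/<version>/json` returns a `urls` list whose entries carry `filename`, `url`, and `digests`.

```python
import json
from urllib.request import urlopen


def fetch_release_json(name, version):
    """Fetch release metadata from the PyPI JSON API (needs network access)."""
    with urlopen(f'https://pypi.org/pypi/{name}/{version}/json') as response:
        return json.load(response)


def find_file(release_json, filename):
    """Return (url, sha256) for the named file, or None if it is not listed."""
    for entry in release_json.get('urls', []):
        if entry['filename'] == filename:
            return entry['url'], entry['digests']['sha256']
    return None
```

A caller would combine the two, for example `find_file(fetch_release_json('requests', '2.18.4'), 'requests-2.18.4-py2.py3-none-any.whl')`, and then verify the downloaded file against the returned digest.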

@ddevault

> using our supported APIs (such as the JSON API), rather than doing string concatenation as you've described. We cannot make any guarantees that the URLs to the distributions won’t change some day

This isn't going to work. You're never going to find a distribution whose packages make API calls to some custom JSON API to resolve a download link for Python packages. We already went through the headache of getting consistent download URLs once - don't take them away!

@ewdurbin
Member

You can make an attempt by working back from the expected file name as documented in PEP 491.

$ curl -I https://files.pythonhosted.org/packages/py2.py3/r/requests/requests-2.18.4-py2.py3-none-any.whl
HTTP/2 302 
date: Thu, 15 Mar 2018 10:00:00 GMT
cache-control: max-age=604800, public
content-type: application/octet-stream
location: https://files.pythonhosted.org/packages/49/df/50aa1999ab9bde74656c2919d9c0c085fd2b3775fd3eca826012bef76d8c/requests-2.18.4-py2.py3-none-any.whl
...

It's not pretty, but it does work:

$ curl -I https://files.pythonhosted.org/packages/cp36/p/psycopg2/psycopg2-2.7.4-cp36-cp36m-manylinux1_x86_64.whl
HTTP/2 302 
date: Thu, 15 Mar 2018 10:10:58 GMT
cache-control: max-age=604800, public
content-type: application/octet-stream
location: https://files.pythonhosted.org/packages/92/15/92b5c363243376ce9cb879bbec561bba196694eb663a6937b4cb967e230e/psycopg2-2.7.4-cp36-cp36m-manylinux1_x86_64.whl

@toddrme2178
Author

@brainwane There is simply no way to call JSON APIs at all. There is a program that extracts the URL from the build file using defined tags such as the version, downloads the file, and makes sure it is the same as the one already stored. This is done in a consistent way for hundreds of thousands of packages, including more than a thousand Python packages.

We thought this had at least been worked out for tar and zip-based files almost two years ago. Based on that, we went through and manually changed all of those thousand-plus Python packages to the new URL, many of them twice due to initial confusion over whether we should use the pypi.io or the files.pythonhosted.org URL. Having this URL disappear would be a major feature regression.

@di
Member

di commented Mar 15, 2018

@toddrme2178 I think it's not totally clear to us what you're asking for. If you want a function which takes the attributes of a wheel distribution and gives a URL corresponding to where that wheel would exist, it would be something like:

def wheel_url(name, version, build_tag, python_tag, abi_tag, platform_tag):
    """Build the redirecting files.pythonhosted.org URL for a wheel."""
    host = 'https://files.pythonhosted.org'
    # The build tag is optional in wheel filenames.
    optional_build_tag = f'-{build_tag}' if build_tag else ''
    filename = f'{name}-{version}{optional_build_tag}-{python_tag}-{abi_tag}-{platform_tag}.whl'
    return f'{host}/packages/{python_tag}/{name[0]}/{name}/{filename}'

e.g.:

>>> wheel_url('psycopg2', '2.7.4', None, 'cp36', 'cp36m', 'manylinux1_x86_64')
'https://files.pythonhosted.org/packages/cp36/p/psycopg2/psycopg2-2.7.4-cp36-cp36m-manylinux1_x86_64.whl'

If you're looking for a way to predict what wheels exist for a given (name, version), it's not possible without using the JSON or Simple APIs.
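Going the other way, a packager who already knows the filename can recover those attributes by splitting it. This is a minimal sketch, assuming the filename follows the wheel naming convention (five or six dash-separated fields, with the build tag optional); `parse_wheel_filename` is a hypothetical name, not part of any PyPI tooling.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components.

    Hypothetical helper; assumes the name and version fields contain no '-',
    as the wheel filename convention requires.
    """
    if not filename.endswith('.whl'):
        raise ValueError(f'not a wheel filename: {filename!r}')
    parts = filename[:-len('.whl')].split('-')
    if len(parts) == 5:
        # No build tag present.
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f'unexpected number of fields in {filename!r}')
    return name, version, build_tag, python_tag, abi_tag, platform_tag
```

The returned tuple is in the same order as the function's arguments above, so it can be unpacked straight into that call.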

@toddrme2178
Author

toddrme2178 commented Mar 15, 2018

@di To use the previous requests example, say we make a package for requests, we would have something like:

Version:     2.18.4
Url:         https://files.pythonhosted.org/packages/py2.py3/r/requests/requests-%{version}-py2.py3-none-any.whl

Then we find out there is an update numbered 2.19.6. We would download the package from pypi or warehouse somehow, upload it to our build system, then change the %{version} tag to 2.19.6. Whenever the build system rebuilds the package, it will replace the %{version} text with 2.19.6, download the file https://files.pythonhosted.org/packages/py2.py3/r/requests/requests-2.19.6-py2.py3-none-any.whl, and compare that to the file that was uploaded to make sure nothing has changed (there could be other tags besides version, but the build system handles putting the tags in the right place).

The important thing is that if upstream only updates the version number (for example requests-2.18.4-py2.py3-none-any.whl to requests-2.19.6-py2.py3-none-any.whl), then that is the only thing that should change in the download URL (of course you have no control over upstream changing the file name or archive format). That is currently the case with zip and tar.gz source archives, and I was asking if there was a similar URL for wheels (apparently there is, which is helpful).

From what @brainwane said, it sounded like we couldn't even count on those URLs being consistent for zip and tar.gz source archives, not to mention the wheels I was asking about. If those will be consistent then our use-case is satisfied (I can't speak for anyone else on the thread).
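The rebuild check described above (expand the tags in the URL, download, compare with the stored copy) can be sketched as follows. This is an illustration under assumptions, not the actual build-system code: `expand_url` and `same_content` are hypothetical names, and the download step is left out so the example stays offline.

```python
import hashlib


def expand_url(template, tags):
    """Replace each %{tag} placeholder in the template with its value."""
    url = template
    for tag, value in tags.items():
        url = url.replace('%{' + tag + '}', value)
    return url


def same_content(stored, downloaded):
    """Compare two byte strings by their SHA-256 digests."""
    return hashlib.sha256(stored).digest() == hashlib.sha256(downloaded).digest()


template = ('https://files.pythonhosted.org/packages/py2.py3/r/requests/'
            'requests-%{version}-py2.py3-none-any.whl')
print(expand_url(template, {'version': '2.19.6'}))
```

Only the `version` value changes between releases here, which is the property the comment above asks the URL scheme to preserve.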

@di
Member

di commented Mar 15, 2018

> From what @brainwane said, it sounded like we couldn't even count on those URLs being consistent for zip and tar.gz source archives, not to mention the wheels I was asking about. If those will be consistent then our use-case is satisfied (I can't speak for anyone else on the thread).

@brainwane is right in this sense, because (to my knowledge) the https://files.pythonhosted.org host is not really intended to be used this way, so while we have no plans to change the host, or remove the redirects, it's possible that we might have needed to change this at some point, as we have in the past.

However, we now know about your use case, so please be assured that we won't suddenly break this for you.

@toddrme2178
Author

@di
Member

di commented Jun 7, 2018

You've got some issues with the python_tag for these ones:

- https://files.pythonhosted.org/packages/py2.py3/c/coverage-config-reload-plugin/coverage_config_reload_plugin-0.2.0-py2.py3-none-any.whl
+ https://files.pythonhosted.org/packages/any/c/coverage-config-reload-plugin/coverage_config_reload_plugin-0.2.0-py2.py3-none-any.whl
- https://files.pythonhosted.org/packages/py2.py3/n/numericalunits/numericalunits-1.21-py2.py3-none-any.whl
+ https://files.pythonhosted.org/packages/3.6/n/numericalunits/numericalunits-1.21-py2.py3-none-any.whl

The version in the filename is wrong for this one:

- https://files.pythonhosted.org/packages/py2.py3/c/coverage-env-plugin/coverage_env_plugin-0.1.0-py2.py3-none-any.whl
+ https://files.pythonhosted.org/packages/any/c/coverage-env-plugin/coverage_env_plugin-0.1-py2.py3-none-any.whl
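The corrections above show that the directory segment is not always the python tag from the filename (`any` or `3.6` versus `py2.py3`), and that the project directory can be spelled differently from the filename prefix (`coverage-config-reload-plugin` versus `coverage_config_reload_plugin`). A sketch that keeps those pieces separate follows; `wheel_file_url`, `project`, and `path_tag` are hypothetical names, and the right `path_tag` has to be looked up because these examples show it cannot be derived from the filename alone.

```python
def wheel_file_url(project, path_tag, filename):
    """Build a wheel URL from independently supplied path components.

    Hypothetical helper: `project` is the directory name on the host,
    `path_tag` is the tag used in the URL path (which may differ from the
    python tag embedded in `filename`), and `filename` is the exact
    wheel filename.
    """
    host = 'https://files.pythonhosted.org'
    return f'{host}/packages/{path_tag}/{project[0]}/{project}/{filename}'


print(wheel_file_url('numericalunits', '3.6',
                     'numericalunits-1.21-py2.py3-none-any.whl'))
```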

@toddrme2178
Author

toddrme2178 commented Jun 7, 2018 via email

@toddrme2178
Author

I think this is okay now.

jvolkman added a commit to jvolkman/rules_pycross that referenced this issue May 3, 2022
Before we were guessing the URL at pythonhosted.org based on
pypi/warehouse#1944. This works in theory, but
there are over 100k wheels uploaded with the wrong python tags (i.e.,
the tag in index metadata doesn't match the tag in the .whl filename).

Instead, pypi_file takes a package name, version, filename, and sha256,
fetches package metadata from pypi (or a compatible index), and uses
that URL to download the package. It's still pure Bazel, thanks to the
PyPI JSON API and Bazel's JSON support.