Cannot download packages from private PyPi repository using HTTP basic auth with Poetry 1.1.0 + old v1.0.x PypiCloud w/ default settings #3041

Closed
3 tasks done
MasterNayru opened this issue Oct 2, 2020 · 25 comments
Labels
kind/bug Something isn't working as expected

Comments

@MasterNayru
Contributor

  • I am on the latest Poetry version.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).

Issue

We have been using Poetry to pull packages from a private PyPI repository, and everything worked fine until Poetry 1.1.0. We configure Poetry to talk to our private PyPI installation with HTTP basic auth, and that auth works perfectly well for resolving which versions of a package to install. The problem seems to be that the same auth is then also sent with the requests that download the wheels, which causes the following error:

$ poetry config http-basic.myprivaterepo <username> <password>
$ poetry update -vvv

<snip>

   2  ~/.pyenv/versions/3.8.5/lib/python3.8/site-packages/poetry/repositories/pypi_repository.py:454 in _download
       452│ 
       453│     def _download(self, url, dest):  # type: (str, str) -> None
     → 454│         return download_file(url, dest, session=self.session)
       455│ 
       456│     def _log(self, msg, level="info"):

   1  ~/.pyenv/versions/3.8.5/lib/python3.8/site-packages/poetry/utils/helpers.py:98 in download_file
        96│ 
        97│     with get(url, stream=True) as response:
     →  98│         response.raise_for_status()
        99│ 
       100│         with open(dest, "wb") as f:

  HTTPError

  400 Client Error: Bad Request for url: https://deckard-pip.s3.amazonaws.com/1234/my_broken_dependency/my_broken_dependency-0.1.3-py3-none-any.whl?AWSAccessKeyId=<key>&Signature=kz30gf304b%2F%2F93pQeUSPrto5MiE%3D&x-amz-security-token=<token>&Expires=1601690152

  at ~/.pyenv/versions/3.8.5/lib/python3.8/site-packages/requests/models.py:941 in raise_for_status
      937│         elif 500 <= self.status_code < 600:
      938│             http_error_msg = u'%s Server Error: %s for url: %s' % (self.status_code, reason, self.url)
      939│ 
      940│         if http_error_msg:
    → 941│             raise HTTPError(http_error_msg, response=self)
      942│ 
      943│     def close(self):
      944│         
      945│         called the underlying ``raw`` object must not be accessed again.

If I change the following lines in the poetry code:

   2  ~/.pyenv/versions/3.8.5/lib/python3.8/site-packages/poetry/repositories/pypi_repository.py:454 in _download
       453│     def _download(self, url, dest):  # type: (str, str) -> None
     → 454│         return download_file(url, dest, session=self.session)

changes to:

   2  ~/.pyenv/versions/3.8.5/lib/python3.8/site-packages/poetry/repositories/pypi_repository.py:454 in _download
       453│     def _download(self, url, dest):  # type: (str, str) -> None
     → 454│         return download_file(url, dest)

and re-run, everything works:

$ poetry update
Skipping virtualenv creation, as specified in config file.
Updating dependencies
Resolving dependencies... (41.8s)

No dependencies to install or update

It seems that the auth is needed to talk to the API for package version resolution, but causes problems when it is also used for package downloads. In case it makes a difference, I am using pypicloud as the backend for my private PyPI installation.
I have tried to keep the output as brief as possible without exposing any keys or similar secrets. Please let me know if you need more information, or if you have suggestions on what I should change in my configuration to get things working again.
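For context, the download URL in the traceback already authenticates through its own query string (`AWSAccessKeyId`, `Signature`, `x-amz-security-token`, `Expires`). The sketch below is not Poetry code; it flags such pre-signed S3 URLs using only the standard library. The helper name `is_presigned_s3_url` and the exact parameter set it checks (SigV2 `Signature`, SigV4 `X-Amz-Signature`/`X-Amz-Algorithm`) are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical helper: query parameters that indicate S3 query-string auth.
# SigV2 pre-signed URLs carry "Signature"; SigV4 ones carry "X-Amz-Signature"
# and "X-Amz-Algorithm". Matching is case-insensitive.
PRESIGN_PARAMS = {"signature", "x-amz-signature", "x-amz-algorithm"}


def is_presigned_s3_url(url: str) -> bool:
    """Return True if the URL already carries its own auth in the query string."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(name.lower() in PRESIGN_PARAMS for name in params)


if __name__ == "__main__":
    # Placeholder values stand in for the redacted key and token.
    wheel_url = (
        "https://deckard-pip.s3.amazonaws.com/1234/my_broken_dependency/"
        "my_broken_dependency-0.1.3-py3-none-any.whl"
        "?AWSAccessKeyId=EXAMPLEKEY&Signature=abc%3D&Expires=1601690152"
    )
    print(is_presigned_s3_url(wheel_url))  # True
    print(is_presigned_s3_url("https://pypi.example.com/simple/requests/"))  # False
```

A request to a URL like this must not also carry an `Authorization` header, which is exactly the combination S3 rejects.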

@MasterNayru MasterNayru added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Oct 2, 2020
@abn
Member

abn commented Oct 2, 2020

@MasterNayru interesting. We recently identified that in 1.0.10 we did not apply authentication correctly for sources specified in pyproject.toml. In typical circumstances we expect authentication to be used for both API queries and file downloads.

Am I correct in understanding that, in your case, authentication is used to retrieve direct links to the wheels (i.e. including tokens), but the expectation is that we do not send basic auth when downloading those wheels? If so, this is a bit tricky, as it would mean there are two use cases that are not necessarily compatible with each other.

@MasterNayru
Contributor Author

@abn Since pypicloud's API responses return download URLs that already contain the necessary auth parameters for S3, and S3 errors out when the username/password is provided as well, I expected those credentials not to be sent with the download requests. That seemed to match the behaviour of the older versions pretty well.

@MasterNayru
Contributor Author

@abn I am going to close this issue as it would appear that pypicloud have, since I last looked at it, defaulted to the behaviour which poetry is enforcing with regards to auth for package downloads. Rather than try to support the old behaviour which pip works with, I will update my pypicloud installation and make use of the new behaviour. Cheers for bearing with us.

@MasterNayru MasterNayru changed the title Cannot download packages from private PyPi repository using HTTP basic auth with Poetry 1.1.0 Cannot download packages from private PyPi repository using HTTP basic auth with Poetry 1.1.0 + old v1.0.x PypiCloud w/ default settings Oct 2, 2020
@pepastach

pepastach commented Oct 6, 2020

We hit exactly the same problem with poetry 1.1.1 and pypicloud 1.0.10.
The way I understand it, this can't be solved by upgrading pypicloud. Here's my reasoning:

  1. poetry makes a call to pypicloud
  2. pypicloud returns a pre-signed URL pointing to our S3 bucket (the URL already contains AWS access key and token)
  3. poetry makes a GET request (using the requests library), but since it passes the session, as @MasterNayru described in the issue, requests adds the Authorization header. This makes the request invalid; we verified it manually using curl.

Since it is poetry/requests that adds the Authorization header, I don't see how this can be fixed on the pypicloud side.
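Step 3 is where the conflict arises: HTTP basic auth (RFC 7617) is sent as an `Authorization: Basic <base64(username:password)>` header. A minimal sketch of that header, with placeholder credentials and a hypothetical helper name, shows the kind of value S3 echoes back as `Basic XxXx...` in its `InvalidArgument` error.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the header HTTP basic auth adds to every request it authenticates.

    Hypothetical helper; it encodes the credentials as UTF-8, which gives the
    same bytes requests sends for plain ASCII usernames and passwords.
    """
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


if __name__ == "__main__":
    print(basic_auth_header("user", "pass"))  # {'Authorization': 'Basic dXNlcjpwYXNz'}
```

Because this header is attached by the session rather than encoded in the URL, it travels to S3 alongside the signature that is already in the query string.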

Please correct me (or reopen the issue 😉 ).

@pepastach

pypicloud 1.0.11 introduced the ability to stream files through pypicloud. From a brief look at the diff, it seems we can configure pypicloud with pypi.stream_files = True, so that pypicloud returns the package file directly instead of redirecting to S3.

We'll try bumping pypicloud and report back.

@Katafalkas

Hi. Ran into the same issue.
pypicloud==1.1.5
poetry==1.1.3

Tried both with and without pypi.stream_files = True - same issue.

The error is caused by the headers being sent; the same URL downloads fine with curl.

b'<?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>InvalidArgument</Code><Message>Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified</Message><ArgumentName>Authorization</ArgumentName><ArgumentValue>Basic XxXx...</ArgumentValue><RequestId>58D6A864A3D27683</RequestId><HostId>FpXxx...</HostId></Error>'
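To see exactly which argument S3 objects to, the XML error body can be parsed with the standard library. A short sketch, with a hypothetical helper name and a trimmed copy of the body above as sample input:

```python
import xml.etree.ElementTree as ET

# Fields of interest in an S3 <Error> document.
FIELDS = ("Code", "Message", "ArgumentName")


def parse_s3_error(body: bytes) -> dict:
    """Return the Code, Message and ArgumentName fields of an S3 XML error body."""
    root = ET.fromstring(body)
    # findtext() returns the default when a child element is missing.
    return {field: root.findtext(field, default="") for field in FIELDS}


if __name__ == "__main__":
    body = (
        b'<?xml version="1.0" encoding="UTF-8"?>\n'
        b"<Error><Code>InvalidArgument</Code>"
        b"<Message>Only one auth mechanism allowed</Message>"
        b"<ArgumentName>Authorization</ArgumentName></Error>"
    )
    print(parse_s3_error(body))
```

`ArgumentName` being `Authorization` confirms that the basic-auth header, not the signed URL, is the part S3 rejects.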

@Katafalkas

@MasterNayru I would suggest that this issue be considered a poetry issue, since the same pypicloud server works just fine with pip but does not work with poetry.

We run a number of repositories and Python packages; only one of them uses poetry, and it currently does not work. We should either wait, help fix poetry, or migrate to pip.

@abn
Member

abn commented Nov 2, 2020

@Katafalkas one aspect to note here is that pip and poetry use these URLs differently. For one, poetry stores these URLs in the lock file. Since the tokens in these URLs are short-lived, they are not ideal for use within a lock file. As far as I can tell, the use of short-lived authorised URLs is not defined behaviour. The use case most likely works in pip because search and retrieval are treated separately, which in effect might allow this to work as expected.
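The short lifetime is visible in the URL itself: SigV2-style pre-signed S3 URLs, like the one in the original traceback, carry an `Expires` Unix timestamp. A minimal sketch with a hypothetical helper name checks whether such a link has lapsed; SigV4 URLs, which use `X-Amz-Date` plus `X-Amz-Expires`, are deliberately not handled.

```python
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def presigned_url_expired(url: str, now: Optional[float] = None) -> Optional[bool]:
    """Report whether a SigV2-style pre-signed URL has expired.

    Returns None when the URL has no "Expires" parameter, because this sketch
    cannot tell in that case (SigV4's X-Amz-Date/X-Amz-Expires is not handled).
    """
    expires = parse_qs(urlsplit(url).query).get("Expires")
    if not expires:
        return None
    current = time.time() if now is None else now
    return int(expires[0]) <= current


if __name__ == "__main__":
    # Placeholder host/path; the Expires value is the one from the traceback.
    url = "https://bucket.s3.amazonaws.com/pkg.whl?Signature=abc&Expires=1601690152"
    print(presigned_url_expired(url))  # True: that timestamp is in October 2020
```

A URL like this recorded in a lock file stops working once the timestamp passes, which is why such links sit poorly in a lock file.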

Considering that pypicloud is a common component these days, we might need to look at how we can better support it. On the other hand however, PEP 503 does not define a mechanism for independent authentication for the file URLs. Typically, the authentication for the host domain is re-used.

Out of curiosity, are the domains different for the index and the file?

@Qu4tro
Contributor

Qu4tro commented Nov 2, 2020

They are for us (also using pypicloud), as the files are hosted on amazonaws.com and the index is on our domain.
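With the index on one host and the files on amazonaws.com, a host comparison is enough to tell the two kinds of URL apart. The sketch below is a hypothetical policy, not Poetry's implementation (the later refactor referenced in this thread moved to per-netloc sessions inside `Authenticator`): it reuses index credentials only when the file URL has the same scheme, host and port as the index URL. Host names are placeholders.

```python
from urllib.parse import urlsplit


def should_send_credentials(index_url: str, file_url: str) -> bool:
    """Hypothetical policy: reuse index credentials only for the same origin.

    urlsplit() lowercases the hostname; port is None when not given explicitly.
    """
    index, target = urlsplit(index_url), urlsplit(file_url)
    return (index.scheme, index.hostname, index.port) == (
        target.scheme,
        target.hostname,
        target.port,
    )


if __name__ == "__main__":
    index = "https://pypi.example.com/simple/"
    print(should_send_credentials(index, "https://pypi.example.com/packages/foo-1.0.whl"))  # True
    print(should_send_credentials(index, "https://bucket.s3.amazonaws.com/foo-1.0.whl?Signature=x"))  # False
```

As a simplification, `https://host` and `https://host:443` count as different origins here.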

@Katafalkas

The URLs of the pypicloud server and of the files served from S3 are different by default, but there is an option to pass downloads through the server, which makes both URLs point at the same server.

@cereblanco
Contributor

cereblanco commented Dec 12, 2020

This is how I set up a private PyPI:

  1. Edit pyproject.toml:
    [[tool.poetry.source]]
    name = "myprivate_pypi"
    url = "https://pypi.myprivate_pypi.com/simple/"

  2. In a terminal, add the Poetry credentials for myprivate_pypi:
    poetry config http-basic.myprivate_pypi <username> <password>

  3. Update the lock file with --no-update:
    poetry lock --no-update

  4. Add your library that is hosted on the private PyPI (or run poetry install):
    poetry add <my-package-found-at-private>

@MeanderingCode let me know if this one works for you

@MeanderingCode

@cereblanco Thank you. I discovered this yesterday when looking at changes related to legacy repositories. Edited my comment on your PR.

@rizerzero

rizerzero commented Jan 21, 2021

@cereblanco

Hi, I tested your method and it did not work for me.
When I use a version above 1.0.10, I get the message below. It seems that poetry is trying to download the 'requests' package, which is a dev dependency, from my private repository 🤨 (is this expected behaviour?).

My server should not return a 500 error, but previous versions did not try to download other packages from my private repo.

500 Server Error: Internal Server Error for url: https://pypi.myprivaterepo.com/simple/requests/
 at /usr/local/lib/python3.6/site-packages/poetry/repositories/legacy_repository.py:393 in _get
      389│             if response.status_code == 404:
      390│                 return
      391│             response.raise_for_status()
      392│         except requests.HTTPError as e:
    → 393│             raise RepositoryError(e)
      394│
      395│         if response.status_code in (401, 403):
      396│             self._log(
      397│                 "Authorization error accessing {url}".format(url=url), level="warn"
The command '/bin/sh -c poetry lock --no-update' returned a non-zero code: 1 
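The `_get` excerpt above explains why the lock fails outright instead of skipping the private source: a 404 just means "not hosted here", but any other error status is turned into a `RepositoryError`. A simplified, hypothetical sketch of that decision follows; the exception class is a local stand-in, and the 401/403 handling that comes after it in the excerpt is omitted.

```python
class RepositoryError(Exception):
    """Local stand-in for Poetry's exception of the same name."""


def check_index_status(status_code: int) -> bool:
    """Return True if the source has the page, False if it does not (404).

    Any other 4xx/5xx status raises, mirroring how a 500 from the private
    index for a public package such as requests aborts `poetry lock`.
    """
    if status_code == 404:
        return False  # package not on this source; it contributes no candidates
    if status_code >= 400:
        raise RepositoryError(f"HTTP {status_code} from package index")
    return True
```

In other words, a private index that answers 404 for packages it does not host is harmless, while one that answers 500 stops resolution.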

Here is my pyproject.toml .

[tool.poetry]
name = "project"
version = "0.4.0"
description = "project"
authors = ["me <me@author.com>"]

[tool.poetry.dependencies]
python = "^3.6"
python-dotenv = "^0.10.3"
myprivatepackage = "^1.1.13"
fastapi = "^0.61.1"
pymongo = "^3.11.0"
uvicorn = "^0.12.2"
graphene-pydantic = "^0.2.0"

[tool.poetry.dev-dependencies]
pytest = "^3.0"
requests = "^2.25.1"

[[tool.poetry.source]]
name = "myprivaterepo"
url = "https://pypi.myprivaterepo.com/simple/"

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"

@jensgustafsson

I think we should consider reopening this issue, as the problem still exists when using a PyPI server that returns pre-signed URLs for fetching packages (for instance pypicloud).

Right now I'm forced to stick with poetry<1.1 due to this problem.

@rizerzero

rizerzero commented Jan 25, 2021

Same for me, forced to stick with poetry<1.1 due to this problem.

@FlippAre

We're also experiencing the same problem. We have tried to configure pypicloud according to the suggestions in this issue, but with no luck. It forces us to stay on <1.1, which unfortunately means leaving all the great speed improvements of >=1.1 on the table.

@jensgustafsson

I wouldn't mind helping out with fixing this issue, but I would need to know that the poetry community actually considers it a bug.

I would also like some context: what was changed in 1.1 with regard to the package download functionality, and why?

@MasterNayru
Contributor Author

The fix is to make pypicloud stop returning pre-signed URLs to pip. Python package management tools are enough of a mess without having to assume that the auth for package information retrieval is different from the auth needed for package downloads. This is the way pypi.org works and having to assume that auth maybe is or is not the same is the kind of thing that makes these package manager tools an absolute mess.

The whole reason why it returned pre-signed URLs by default was because easy_install wouldn't work without it. I wish I was kidding. https://pypicloud.readthedocs.io/en/latest/topics/redirect_urls.html#redirect-detail

In pypicloud v1.0.14 (in a patch release; I guess they aren't doing the whole semver thing), https://pypicloud.readthedocs.io/en/latest/changes.html#id9, they changed the default of the config value storage.redirect_urls to True. That makes the URLs returned to pip actual download URLs on the same server name as the package-info API, requiring the same auth. As a result, poetry's auth behaviour works fine and pypicloud behaves like pypi.org would.

I haven't needed to change this setting myself as I fixed the issue by just updating pypicloud to a version where it was set to True by default, but that is the setting you would need to change. Haven't had an issue with it since.

@jensgustafsson

This is very interesting news! We're running pypicloud 1.1.17 (the latest version), and poetry lock does not work with any poetry release after 1.0.10. Would you mind sharing your pypicloud config file?

Which version of pypicloud are you using, by the way?

Our config looks like this:

[app:main]
use = egg:pypicloud

redirect_urls = true

pypi.fallback = cache
pypi.always_show_upstream = True
pypi.stream_files = True
pypi.package_max_age = 604800
pypi.storage = s3
storage.bucket = S3_BUCKET
storage.region_name = eu-west-1
pypi.db = redis
db.url = REDIS_URL

@jensgustafsson

UPDATE: Things actually work! We were indeed running an old version of pypicloud; after updating to 1.1.17, things started to work! 🌟

@MasterNayru
Contributor Author

Great to hear that it is working for you. The setting you would have needed to set was storage.redirect_urls, not redirect_urls as you had it in the config you posted.
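A quick way to catch this kind of key mix-up is to read the INI file the way a parser does. A sketch using the standard library's `configparser` (the helper name is hypothetical; `interpolation=None` avoids tripping over PasteDeploy-style `%(here)s` values). It returns `None` when the key is absent, since the default depends on the pypicloud version.

```python
import configparser
from typing import Optional


def explicit_redirect_urls(ini_text: str) -> Optional[bool]:
    """Return the storage.redirect_urls value set in [app:main], or None if unset."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # has_option() returns False (rather than raising) if the section is missing.
    if not parser.has_option("app:main", "storage.redirect_urls"):
        return None
    return parser.getboolean("app:main", "storage.redirect_urls")


if __name__ == "__main__":
    posted = "[app:main]\nuse = egg:pypicloud\nredirect_urls = true\n"
    fixed = "[app:main]\nuse = egg:pypicloud\nstorage.redirect_urls = true\n"
    print(explicit_redirect_urls(posted))  # None: the bare key is not the setting
    print(explicit_redirect_urls(fixed))   # True
```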

@jensgustafsson

Thanks! Yes I actually figured that out after your previous reply 🙏 Thanks a lot!

@abn abn removed the status/triage This issue needs to be triaged label Mar 3, 2022
@voney

voney commented Apr 29, 2022

I'm getting the exact same issue, but with myget.org instead. The "solution" here is specific to pypicloud, but the underlying issue of a package download being redirected to a different URL with auth baked in remains. I imagine this issue will only grow as services increasingly use managed S3-style storage.

Can this be re-opened and fixed "properly"?

abn added a commit to abn/poetry that referenced this issue Apr 29, 2022
This change refactors HTTP repository source implementations. The
following changes have been made.

- CacheControl cache now lives within Authenticator.
- Authenticator manages unique sessions for individual netloc.
- CacheControl usage now respects disable cache parameter in repos.
- Certificate and authentication logic is now managed solely within
  Authenticator for source repositories taking advantage of recent
  enhancements.

These changes should allow for better handling of cases like those
described in python-poetry#3041. Additionally, this forms the foundation for
unifying HTTP specific logic within the code base and possibly allowing
for migration of requests etc. if/when required.
@abn
Member

abn commented Apr 29, 2022

@voney can you try the fix at #5518? That should "in theory" handle this better. If not, please create a new issue.

neersighted pushed a commit that referenced this issue May 7, 2022

github-actions bot commented Mar 2, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 2, 2024