Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing credentials when attempting to install dependency from private Gitlab project #5955

Open
3 tasks done
cbuffett opened this issue Jul 5, 2022 · 10 comments
Open
3 tasks done
Labels
kind/bug Something isn't working as expected status/triage This issue needs to be triaged

Comments

@cbuffett
Copy link

cbuffett commented Jul 5, 2022

  • I am on the latest Poetry version.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).
  • Ubuntu 20,04 WSL2:
  • 1.2.0b2:
[tool.poetry]
name = "<package_name>"
version = "0.0.0"
description = "test"
authors = ["Chris Buffett <email@domain.com>"]
license = "Proprietary"

[tool.poetry.dependencies]
python = "^3.7"
requests = "^2.25.1"
dask = "^2021.4.1"
fsspec = "^2021.4.0"
partd = "^1.2.0"
<private_package_name> = {git = https://<private_gitlab_repo>/<private_package>.git", rev = "master"}

[tool.poetry.extras]
dask = ["dataframe"]

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

Issue

I have a package I'm using poetry to build and version. This private package has a dependency on another private package in a different repo. This other private package is not published to a package registry, so in order to access it, I've set up basic HTTP credentials to the git project directly (as recommended in #5567)

poetry config repositories.gitlab https://<private_gitlab_repo>/<private_package>.git
poetry config http-basic.gitlab <gitlab_deploy_token> <password>

This works fine when building/installing the package locally using poetry build and poetry install. However, when attempting to install my package as a dependency in an Airflow Docker image, the install of the private package fails due to missing authentication credentials. I'm using pip to install my package inside the Docker image, so it doesn't appear to have any notion of the Poetry credentials that were configured locally when building the package.

#12 84.81   Collecting <private_package_name>@ git+https://<private_gitlab_repo>/<private_package>.git@master
#12 84.81     Cloning https://<private_gitlab_repo>/<private_package>.git (to revision master) to /tmp/pip-resolver-gg2sgq6r/<private_package_name>
#12 84.82     Running command git clone --filter=blob:none -q https://<private_gitlab_repo>/<private_package>.git /tmp/pip-resolver-gg2sgq6r/<private_package_name>
#12 85.32     fatal: could not read Username for 'https://<private_gitlab_repo>': No such device or address
#12 85.32 Traceback (most recent call last):
#12 85.32   File "/home/airflow/.local/bin/pip-compile", line 8, in <module>
#12 85.32     sys.exit(cli())
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
#12 85.32     return self.main(*args, **kwargs)
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/click/core.py", line 782, in main
#12 85.32     rv = self.invoke(ctx)
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
#12 85.32     return ctx.invoke(self.callback, **ctx.params)
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
#12 85.32     return callback(*args, **kwargs)
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
#12 85.32     return f(get_current_context(), *args, **kwargs)
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/piptools/scripts/compile.py", line 487, in cli
#12 85.32     results = resolver.resolve(max_rounds=max_rounds)
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/piptools/resolver.py", line 266, in resolve
#12 85.32     has_changed, best_matches = self._resolve_one_round()
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/piptools/resolver.py", line 356, in _resolve_one_round
#12 85.32     their_constraints.extend(self._iter_dependencies(best_match))
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/piptools/resolver.py", line 451, in _iter_dependencies
#12 85.32     dependencies = self.repository.get_dependencies(ireq)
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/piptools/repositories/pypi.py", line 251, in get_dependencies
#12 85.32     download_dir, ireq, wheel_cache
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/piptools/repositories/pypi.py", line 213, in resolve_reqs
#12 85.32     results = resolver._resolve_one(reqset, ireq)
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/pip/_internal/resolution/legacy/resolver.py", line 379, in _resolve_one
#12 85.32     dist = self._get_dist_for(req_to_install)
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/pip/_internal/resolution/legacy/resolver.py", line 332, in _get_dist_for
#12 85.32     dist = self.preparer.prepare_linked_requirement(req)
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 482, in prepare_linked_requirement
#12 85.32     return self._prepare_linked_requirement(req, parallel_builds)
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 528, in _prepare_linked_requirement
#12 85.32     link, req.source_dir, self._download, self.download_dir, hashes
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 190, in unpack_url
#12 85.32     unpack_vcs_link(link, location)
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 65, in unpack_vcs_link
#12 85.32     vcs_backend.unpack(location, url=hide_url(link.url))
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/pip/_internal/vcs/versioncontrol.py", line 598, in unpack
#12 85.32     self.obtain(location, url=url)
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/pip/_internal/vcs/versioncontrol.py", line 512, in obtain
#12 85.32     self.fetch_new(dest, url, rev_options)
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/pip/_internal/vcs/git.py", line 269, in fetch_new
#12 85.32     dest,
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/pip/_internal/vcs/versioncontrol.py", line 649, in run_command
#12 85.32     stdout_only=stdout_only,
#12 85.32   File "/home/airflow/.local/lib/python3.7/site-packages/pip/_internal/utils/subprocess.py", line 254, in call_subprocess
#12 85.32     raise InstallationSubprocessError(proc.returncode, command_desc)
#12 85.32 pip._internal.exceptions.InstallationSubprocessError: Command errored out with exit status 128: git clone --filter=blob:none -q https://<private_gitlab_repo>/<private_package>.git /tmp/pip-resolver-gg2sgq6r/<private_package_name> Check the logs for full command output.

I've tried setting environment variables (POETRY_HTTP_BASIC_GITLAB_USERNAME/POETRY_HTTP_BASIC_GITLAB_PASSWORD) for my private repo inside the Dockerfile, but this seems to be ignored by pip. I've tried hardcoding the private repo deploy_token/password in the git URL inside pyproject.toml, but this results in an error saying the git URL is invalid when attempting to build the package

poetry build
Building <package_name> (0.0.0)

Invalid git url "https://<deploy_token>:<password>@<private_gitlab_repo>/<private_package>.git"

This appears somewhat related to #2062. While the pre-release fix (#5567) seems to address the issue of supporting deploy tokens when adding HTTP basic authentication credentials for private gitlab projects, the absence of these credentials in the dependency URL itself prevents pip from accessing the package.

@cbuffett cbuffett added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Jul 5, 2022
@jaklan
Copy link

jaklan commented Jul 25, 2022

Bump this one, #5567 doesn't really solve the issue - as mentioned above, it works only when using poetry, which makes a package unusable if you want to specify it as a pip dependency.

@neersighted
Copy link
Member

My post on the linked PR (please don't start discussion there where it is hard to find and tangential):

Generally I would avoid bundling private Git deps, and if you have to SSH keys would be the preferred way to solve this. I'm really not sure that a 'controlled' option to leak credentials into the lockfile (and thus built artifacts) is a good idea as it's incredibly likely to bite more users than it helps.

Also, the 'correct' way to solve this would be uploading your packages to a private registry -- private Git deps is not a great option for this workflow. I do think that just because something is possible as a workflow, doesn't mean it's necessarily a good one. Maybe beefing up our docs to suggest against using Git deps for private packages would be helpful.

Regardless, let's take discussion back to #5955 as this is not a great forum/place for a feature request.

@jaklan
Copy link

jaklan commented Jul 25, 2022

@neersighted

  • SSH is of course a safer option, but if I don't miss any options - it would be pretty problematic to apply here, because in GitLab you can either:

    • add SSH keys of CI runners, servers to your personal account - not a real solution for a project setup
    • add them as deploy keys for a given project - you would need to add your keys in all dependencies' repos, which means you have to each time collaborate with Maintainers of all the dependencies, quite unmaintainable
  • that option is actually widely available anyway - it works with pip, it works with git, it even works with poetry when you add the entry manually. Ofc it's not the safest solution, there is even a 5-year issue in pip repo related to that:
    Please add support for credentials via environment variables pypa/pip#4789
    but for some reason it's still not resolved... I wouldn't say blocking people from specifying the dependency together with the token would help in any way - it just generates an additional problem without any obvious alternative, so we could end up with people adding the problematic dependencies with pip install after running poetry install - which makes the whole setup even worse.

  • I fully agree a private registry is the best solution here, but as I mentioned - usually you don't have a control of it for such dependencies, you can of course try to reach to proper people and ask for it, but it doesn't guarantee the success (especially within the expected timeline).

In general, imho that comment from @abn: #2062 (comment) is pretty reasonable here - so don't recommend that, maybe even provide some warning, but anyway - simply don't block people from doing that. It was also nicely summed up by @xinbinhuang here: #2062 (comment).

@neersighted
Copy link
Member

neersighted commented Jul 25, 2022

The direction we're going is pretty firmly against leaking credentials in the lockfile/distfiles: #3995 is likely going in to 1.3.

I'm unclear on what the workflow is, but it sounds like you may be part of a large organization where you can generate deploy tokens or use user credentials to access repos, but you cannot have them add SSH deploy keys or upload to a registry?

In that case I'd suggest use of gitcredentials, which pip should support during install.

I am still not sure of any use case where credentials in the lockfile/distfiles are the only solution, but please let me know if there's a situation where it's not usable. I firmly believe secrets should be kept in your CI server/deployment tooling and not in your repo/artifacts where they can be leaked easily.

@jaklan
Copy link

jaklan commented Jul 25, 2022

@neersighted

  • but even Add a warning if credentials are written in lock file #3995 is only about warnings, but it's not blocking - and I agree a warning in that scenario would be beneficial
  • about the workflow:
    • deploy keys - you would need to add them each time in each related repo when someone wants to enable installing such dependencies via SSH in their CI / server, it simply doesn't make sense (ofc you can just share a common SSH key among projects, but then it's nothing more than overcomplicated deploy token, which is even more pointless
      🙃)
    • deploy tokens - you can just generate them "once" (let's skip token rolling for simplification) with only read_repository scope enabled, so it can be utilised by all the people in the company then. Of course there's a potential risk of a leak - if your self-hosted instance is accessible from the Internet, someone outside the company would be able to pull the affected repository having the token. If it's hidden behind VPN - it could be acceptable though
    • packages - yes, we talk about the scenario when you cannot enforce on people to do proper packaging
  • about gitcredentials - that's actually interesting, wasn't aware of that one. Seems that sth like the below:
    https://stackoverflow.com/a/43022442/6534668
    could be a workaround, but from the other side - the most convenient solution would be to utilise a common deploy token anyway, because AFAIK project access tokens allow to pull packages from other internal registries, but doesn't allow to clone other internal repositories (which I find personally quite strange, but maybe I miss something). Of course we don't have the token in git anymore, which is definitely beneficial, but if we want to keep it shareable - it's pretty easy to leak it anyway, just in different ways.

My point is - even if we don't allow to provide tokens via poetry add, there are many other security gotchas, we wouldn't save the world anyway. Instead of that, we can actually complicate the life for people who need that functionality for whatever reason. And putting all the security deliberations aside - the whole issue is simply about not failing when running poetry add to follow the behaviour of pip and git. Only that, because as mentioned - it simply works when added manually to pyproject.toml.

Personally - I really believe the warning message, maybe even with some note like consider usage of gitcredentials instead, would be a sufficient compromise. And the end of the day - we're in the Python world, so it's easier to ask for forgiveness than permission + we are all consenting adults 😉

@cbuffett
Copy link
Author

@neersighted You're correct in that I work for a large organization where it can be difficult to get packages uploaded to a private registry (additional CI logic/maintenance). Using deploy keys may be a simpler option, but would require working with each dependent project to accept said key as alluded to by @jaklan. I will look into the gitcredentials route as an alternative in the meantime.

@neersighted
Copy link
Member

neersighted commented Jul 26, 2022

I do agree that the ask -- letting Poetry validate URLs with your credentials in them is simple. However, there is a reason we haven't fixed this long-standing bug and instead have added more alternatives -- you may be comfortable embedding secrets into your repository/history/build artifacts, but the fact that these files are sensitive (when with other projects/ecosystems they are not) violates the element of least surprise.

An explicit gitcredentials or other alternatives (env vars for poetry install, etc etc) require the programmer to explicitly make the secrets available to the CI/deployment environment, and separates the handling of secrets from the handling of code.

I think there is room to improve the docs here (e.g. adding a FAQ entry about how to handle private Git deps if you cannot use the preferred alternative of a registry), as well as food for thought for the PR warning on credentials in the lockfile (eventually, we want to make it an error) -- but I don't think making it easier to embed credentials (or encouraging its use) as it currently stands does anything to improve the situation.

It may make your specific use case easier, but I am very worried that wider adoption of such a workflow is a footgun for the whole ecosystem.

@jaklan
Copy link

jaklan commented Jul 26, 2022

@neersighted

but I am very worried that wider adoption of such a workflow is a footgun for the whole ecosystem

Do you really think simply non-failing poetry add command (without any additional mentions in docs about it) would have any real impact on the adoption of it? If we could achieve it with git for years, if we could achieve it with pip for years - and we still can - if there are dozens of SO posts and issues mentioning such commands... the source of adoption is simply in another place. People don't try to do it with poetry because they found out a bug and decided to make it useful, but because they already know that option from the "pre-poetry" world.

I think there is room to improve the docs here (e.g. adding a FAQ entry about how to handle private Git deps if you cannot use the preferred alternative of a registry)

Fully agree, I would say even more - it's just crucial here. If you really want to change people mindset, extensive docs would be much more beneficial than just blocking people of doing sth without any obvious alternative (how much people would think about gitcredentials by themselves instead of falling back to pip install?). What's also important - such docs cannot be only poetry-oriented - #5567 can be mentioned, but it needs a clear statement it wouldn't work if someone tries to install your package as a dependency with pip.

Having that, we would be able to utilise it in a warning / error message when trying to run poetry add with a token. Now, when it just fails with some generic error message, which makes you googling it and looking at multiple issues in GH - we just make it very confusing. So my point is - such docs should have a really high priority then.

How I personally would imagine the roadmap here:

  1. Fix the bug in poetry add (don't overthink it yet - under-the-hood, with a manual entry in pyproject.toml, it works anyway, so just treat it as a bug, not an accidental security mechanism)
  2. Create extensive docs about the recommended solutions
  3. Provide warning messages starting from e.g. 1.3, pointing to the above docs

and then, if you really decide to make it an error one day:

  1. Add to the warning information it will become an error in e.g. 1.4
  2. Give people some time to understand the change and get familiar with the alternatives
  3. Release 1.4 and make it an actual error

Imho such a workflow allows you to achieve the final goal, but it also takes into account users' perspective (and emotions) and make the "mindset changing" simply easier to introduce, without unnecessary irritation or even frustration.

@cbuffett
Copy link
Author

cbuffett commented Aug 4, 2022

I've confirmed that using gitcredentials (by running additional git commands in my Dockerfile to build the gitcredentials file) works and allows the downstream pip/poetry calls to access the private repo. So this is a workaround, albeit a not particularly graceful one. The deploy token is stored in the Dockerfile (this could be moved to a CI variable, though local non-CI builds would still need this provided from somewhere), though I suppose this is no different than having it stored in pyproject.toml and/or poetry.lock. At least with our use of multistage builds, the credentials do not persist into the final image.

@jaklan
Copy link

jaklan commented Aug 4, 2022

@cbuffett have you tried with envars?
https://stackoverflow.com/a/43022442/6534668

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working as expected status/triage This issue needs to be triaged
Projects
None yet
Development

No branches or pull requests

3 participants