Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Git dependencies' submodules with relative URLs handled incorrectly (regression from 1.1) #6499

Closed
3 tasks done
adamgreig opened this issue Sep 13, 2022 · 19 comments · Fixed by #7017 or #8020
Closed
3 tasks done
Labels
area/vcs Related to support for VCS dependencies (Git and Dulwich) kind/bug Something isn't working as expected status/confirmed Issue is reproduced and confirmed

Comments

@adamgreig
Copy link

  • I am on the latest Poetry version.

  • I have searched the issues of this repo and believe that this is not a duplicate.

  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).

  • OS version and name: Ubuntu 20.04

  • Poetry version: 1.2.0

Issue

I have a project with a git dependency. The dependency uses git submodules, and the dependency's .gitmodules file contains a submodule using a relative URL, e.g.:

[submodule "foo/bar"]
    path = foo/bar
    url = ../bar.git

When Poetry attempts to install the dependency, it clones the repository and then attempts to clone each submodule, but it passes the raw relative URL from the configuration to _clone() and thus _fetch_remote_refs(), which calls Dulwich's get_transport_and_path(), which decides the URL is for a local repository so it attempts to find ../bar.git on the filesystem, which fails. I think it should instead detect relative URLs and append them to the root repository's URL so that Dulwich will realise it's a remote repo and return the appropriate client.

If I clone the dependency using git on the command line, with --recurse-submodules, git correctly uses a relative URL to fetch the submodule. If I force Poetry to use the legacy system git instead of Dulwich, it doesn't seem to clone the submodules at all, so also doesn't run into trouble (in this case the dependency's submodule isn't needed to install the dependency, just for testing). Similarly this also worked OK on poetry 1.1 which I think used the system git to recursively clone submodules.

It looks like this is from #5428.

Here's a representative traceback:

Traceback
$ poetry lock -vvv
Using virtualenv: /home/x/.cache/pypoetry/virtualenvs/baz-Nz-qS6ut-py3.8
Updating dependencies
Resolving dependencies...
   1: fact: baz is 0.1.0
   1: derived: baz
Cloning ssh://git@github.com/x/foo.git at 'master' to /home/x/.cache/pypoetry/virtualenvs/baz-Nz-qS6ut-py3.8/src/foo
   1: Version solving took 1.496 seconds.
   1: Tried 1 solutions.

Stack trace:

28 ~/.local/lib/python3.8/site-packages/cleo/application.py:329 in run
327│
328│ try:
→ 329│ exit_code = self._run(io)
330│ except Exception as e:
331│ if not self._catch_exceptions:

27 ~/.local/lib/python3.8/site-packages/poetry/console/application.py:185 in _run
183│ self._load_plugins(io)
184│
→ 185│ exit_code: int = super()._run(io)
186│ return exit_code
187│

26 ~/.local/lib/python3.8/site-packages/poethepoet/plugin.py:249 in _run
247│ tokens.insert(task_name_index, "--")
248│
→ 249│ continue_run(self, io)
250│
251│ # Apply the patch

25 ~/.local/lib/python3.8/site-packages/cleo/application.py:423 in _run
421│ io.input.set_stream(stream)
422│
→ 423│ exit_code = self._run_command(command, io)
424│ self._running_command = None
425│

24 ~/.local/lib/python3.8/site-packages/cleo/application.py:465 in _run_command
463│
464│ if error is not None:
→ 465│ raise error
466│
467│ return event.exit_code

23 ~/.local/lib/python3.8/site-packages/cleo/application.py:449 in _run_command
447│
448│ if event.command_should_run():
→ 449│ exit_code = command.run(io)
450│ else:
451│ exit_code = ConsoleCommandEvent.RETURN_CODE_DISABLED

22 ~/.local/lib/python3.8/site-packages/cleo/commands/base_command.py:119 in run
117│ io.input.validate()
118│
→ 119│ status_code = self.execute(io)
120│
121│ if status_code is None:

21 ~/.local/lib/python3.8/site-packages/cleo/commands/command.py:83 in execute
81│
82│ try:
→ 83│ return self.handle()
84│ except KeyboardInterrupt:
85│ return 1

20 ~/.local/lib/python3.8/site-packages/poetry/console/commands/lock.py:54 in handle
52│ self.installer.lock(update=not self.option("no-update"))
53│
→ 54│ return self.installer.run()
55│

19 ~/.local/lib/python3.8/site-packages/poetry/installation/installer.py:111 in run
109│ self._execute_operations = False
110│
→ 111│ return self._do_install()
112│
113│ def dry_run(self, dry_run: bool = True) -> Installer:

18 ~/.local/lib/python3.8/site-packages/poetry/installation/installer.py:244 in _do_install
242│ source_root=self._env.path.joinpath("src")
243│ ):
→ 244│ ops = solver.solve(use_latest=self._whitelist).calculate_operations()
245│ else:
246│ self._io.write_line("Installing dependencies from lock file")

17 ~/.local/lib/python3.8/site-packages/poetry/puzzle/solver.py:73 in solve
71│ with self._provider.progress():
72│ start = time.time()
→ 73│ packages, depths = self._solve(use_latest=use_latest)
74│ end = time.time()
75│

16 ~/.local/lib/python3.8/site-packages/poetry/puzzle/solver.py:151 in _solve
149│
150│ try:
→ 151│ result = resolve_version(
152│ self._package, self._provider, locked=locked, use_latest=use_latest
153│ )

15 ~/.local/lib/python3.8/site-packages/poetry/mixology/init.py:24 in resolve_version
22│ solver = VersionSolver(root, provider, locked=locked, use_latest=use_latest)
23│
→ 24│ return solver.solve()
25│

14 ~/.local/lib/python3.8/site-packages/poetry/mixology/version_solver.py:127 in solve
125│ while next is not None:
126│ self._propagate(next)
→ 127│ next = self._choose_package_version()
128│
129│ return self._result()

13 ~/.local/lib/python3.8/site-packages/poetry/mixology/version_solver.py:446 in _choose_package_version
444│ package = locked
445│
→ 446│ package = self._provider.complete_package(package)
447│
448│ conflict = False

12 ~/.local/lib/python3.8/site-packages/poetry/puzzle/provider.py:556 in complete_package
554│ for r in requires:
555│ if r.is_direct_origin():
→ 556│ self.search_for_direct_origin_dependency(r)
557│
558│ optional_dependencies = []

11 ~/.local/lib/python3.8/site-packages/poetry/puzzle/provider.py:230 in search_for_direct_origin_dependency
228│ elif dependency.is_vcs():
229│ dependency = cast("VCSDependency", dependency)
→ 230│ package = self._search_for_vcs(dependency)
231│
232│ elif dependency.is_file():

10 ~/.local/lib/python3.8/site-packages/poetry/puzzle/provider.py:312 in _search_for_vcs
310│ and get the information we need by checking out the specified reference.
311│ """
→ 312│ package = self.get_package_from_vcs(
313│ dependency.vcs,
314│ dependency.source,

9 ~/.local/lib/python3.8/site-packages/poetry/puzzle/provider.py:342 in get_package_from_vcs
340│ raise ValueError(f"Unsupported VCS dependency {vcs}")
341│
→ 342│ return _get_package_from_git(
343│ url=url,
344│ branch=branch,

8 ~/.local/lib/python3.8/site-packages/poetry/puzzle/provider.py:96 in _get_package_from_git
94│ source_root: Path | None = None,
95│ ) -> Package:
→ 96│ source = Git.clone(
97│ url=url,
98│ source_root=source_root,

7 ~/.local/lib/python3.8/site-packages/poetry/vcs/git/backend.py:427 in clone
425│ if not cls.is_using_legacy_client():
426│ local = cls._clone(url=url, refspec=refspec, target=target)
→ 427│ cls._clone_submodules(repo=local)
428│ return local
429│ except HTTPUnauthorized:

6 ~/.local/lib/python3.8/site-packages/poetry/vcs/git/backend.py:356 in _clone_submodules
354│ continue
355│
→ 356│ cls.clone(
357│ url=url.decode("utf-8"),
358│ source_root=source_root,

5 ~/.local/lib/python3.8/site-packages/poetry/vcs/git/backend.py:426 in clone
424│ try:
425│ if not cls.is_using_legacy_client():
→ 426│ local = cls._clone(url=url, refspec=refspec, target=target)
427│ cls._clone_submodules(repo=local)
428│ return local

4 ~/.local/lib/python3.8/site-packages/poetry/vcs/git/backend.py:258 in _clone
256│ local = Repo(str(target))
257│
→ 258│ remote_refs = cls._fetch_remote_refs(url=url, local=local)
259│
260│ logger.debug(

3 ~/.local/lib/python3.8/site-packages/poetry/vcs/git/backend.py:201 in _fetch_remote_refs
199│
200│ with local:
→ 201│ result: FetchPackResult = client.fetch(
202│ path,
203│ local,

2 ~/.local/lib/python3.8/site-packages/dulwich/client.py:1501 in fetch
1499│
1500│ """
→ 1501│ with self._open_repo(path) as r:
1502│ refs = r.fetch(
1503│ target,

1 ~/.local/lib/python3.8/site-packages/dulwich/client.py:1423 in _open_repo
1421│ if not isinstance(path, str):
1422│ path = os.fsdecode(path)
→ 1423│ return closing(Repo(path))
1424│
1425│ def send_pack(self, path, update_refs, generate_pack_data, progress=None):

NotGitRepository

No git repository was found at ../bar.git

at ~/.local/lib/python3.8/site-packages/dulwich/repo.py:1090 in init
1086│ elif (os.path.isdir(os.path.join(root, OBJECTDIR))
1087│ and os.path.isdir(os.path.join(root, REFSDIR))):
1088│ bare = True
1089│ else:
→ 1090│ raise NotGitRepository(
1091│ "No git repository was found at %(path)s" % dict(path=root)
1092│ )
1093│
1094│ self.bare = bare

@adamgreig adamgreig added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Sep 13, 2022
@neersighted
Copy link
Member

Poetry has never recursively cloned submodules -- this isn't exactly a regression, but a new deficiency in the Dulwich Git implementation. I would encourage you to open a Dulwich issue for this as I do not believe there is anything to do from the Poetry end to make Dulwich understand these submodules as relative to the base remote.

@neersighted neersighted added status/external-issue Issue is caused by external project (platform, dep, etc) area/vcs Related to support for VCS dependencies (Git and Dulwich) and removed status/triage This issue needs to be triaged labels Sep 13, 2022
@adamgreig
Copy link
Author

adamgreig commented Sep 13, 2022

Thanks for looking into this! My understanding was that Poetry added code specifically to recursively clone submodules when using Dulwich, as it's not something Dulwich itself does:

if not cls.is_using_legacy_client():
local = cls._clone(url=url, refspec=refspec, target=target)
cls._clone_submodules(repo=local)
return local

Poetry then calls Dulwich's parse_submodules() method to get the submodule URLs, and then later passes those URLs to Git.clone() and thus to Dulwich, so perhaps Dulwich could modify the URLs so they can later be cloned, or otherwise provide some interface to help with this?

Edit: I guess jelmer/dulwich#506 tracks providing the interface Poetry would want to use to recursively clone a repository and its submodules, rather than Poetry having to figure this out by itself.

@neersighted
Copy link
Member

Ah, I see. We have a PR open to make the system Git client clone recursively, and I guess I got turned around on what the Dulwich implementation is doing.

It looks like we probably need to munge the submodule URLs ourselves then (and add recursive cloning to the system Git client), or remove recursive cloning, to match the behavior of the system Git client.

@neersighted neersighted added status/confirmed Issue is reproduced and confirmed and removed status/external-issue Issue is caused by external project (platform, dep, etc) labels Sep 13, 2022
@evanrittenhouse
Copy link
Contributor

@neersighted I can probably do this, though am new to Poetry so would need some guidance. What approach is preferable, munging the submodule URLs and adding recursive cloning, or simply removing recursive cloning?

@neersighted
Copy link
Member

neersighted commented Sep 15, 2022

@neersighted I can probably do this, though am new to Poetry so would need some guidance. What approach is preferable, munging the submodule URLs and adding recursive cloning, or simply removing recursive cloning?

Munging would be preferable in my opinion -- we don't want to regress new functionality unless there's no choice, and there's already a PR making the system git client execute a recursive clone merged.

@evanrittenhouse
Copy link
Contributor

@neersighted Do we still want to implement this, or would we rather just track the Dulwich PR mentioned above?

@neersighted
Copy link
Member

The linked Dulwich issue is an issue and not a PR, and barring someone from Dulwich commenting that they're interested in implementing it soon, I think we should handle this on the Poetry side.

@jelmer
Copy link
Contributor

jelmer commented Oct 6, 2022

The issue on the Dulwich side is for porcelain; poetry uses the plumbing from Dulwich directly. There's already a function in the plumbing for getting o list of submodules.

@evanrittenhouse
Copy link
Contributor

evanrittenhouse commented Oct 11, 2022

@neersighted can you please give some guidance on how to test this? The other submodule test simply checks to see if the submodule directory exists, which is simple enough. Given that the submodule specified in python-poetry/test-fixture-vcs-repository#5 is itself a branch of the main test-fixture-vcs-repository, the submodule does not exist in the /submodules folder. You can verify this by going checking here. This invalidates the approach taken by the aforementioned test.

The only difference between the submodule (standalone_branch) and the main branch is that the submodule's README.md differs. Trying to test the contents of the README is impossible when we clone the repo on main, since the README will obviously reflect main. As far as I can tell, there are no other testable behaviors. We want to ensure that the submodule can be instantiated properly, but I can't think of any other methods to determine if it has been.

Appreciate it in advance.

@neersighted
Copy link
Member

Hi @evanrittenhouse -- I'm not sure what you mean. Poetry should choke and die on the bad submodule (indeed, I forgot we couldn't merge this until we had the fix in, and broke CI for a bit there) -- you can just make your PR test against https://github.com/python-poetry/test-fixture-vcs-repository/tree/relative_submodule (I moved the commit there) instead of main, and we'll swap main back/fix up your PR right before merge.

Essentially, the existing tests capture this failure state -- you just need to make sure they pass/we don't choke on the URL.

@evanrittenhouse
Copy link
Contributor

Hey @neersighted, thanks for doing that - I see the submodule folder on the relative_submodule branch.

Sorry for all the questions + appreciate the patience. I haven't worked much with Git submodules, but if I understand them correctly we'd need to update .gitmodules to reflect the submodule's new location at relative_submodule rather than standalone_branch, right?

@neersighted
Copy link
Member

Unless @jelmer has a better idea, what I'd expect is this:

When cloning submodules, determine if any is a relative path instead of a URL. If so, resolve the relative path relative to the URL of the 'main' repository.

This would look like:

main: https://git.host/user/myrepo.git
submodule: ../relative-repo.git

result: https://git.host/user/relative-repo.git

Those URLs would need to be mutated after we use the Dulwich plumbing to list them, and before we pass the source URLs and destination paths back to Dulwich to clone.

@jelmer
Copy link
Contributor

jelmer commented Oct 17, 2022

Yep, exactly what @neersighted mentioned. I don't think you necessarily want the (current) porcelain, you probably want to open .submodules and call dulwich.config.parse_submodules() on the contents.

@evanrittenhouse
Copy link
Contributor

Apologies for the delay, work has been crazy. Commenting here so the issue doesn't get picked up by someone else as I'm basically code complete - will hopefully be able to submit a PR next weekend.

@gbooth27
Copy link

gbooth27 commented Feb 7, 2023

are there any updates on this?

@rmorshea
Copy link
Contributor

This has not been properly fixed by #7017. Unfortunately, urllib.parse only handles relative paths for URL protocols found in urllib.parse.uses_relative. It just so happens that ssh is not in uses_relative. As a result, the following unexpected behavior arises:

>>> from urllib.parse import urljoin
>>> urljoin("ssh://git@github.com/org/repo", "../other-repo")
'../other-repo'

Rather than the expected "ssh://git@github.com/org/other-repo".

The way to work around this would be to join the paths from a URL, and then recombine with the protocol:

>>> from urllib.parse import urlparse, urlunparse, urljoin
>>> repo_url = urlparse("ssh://git@github.com/org/repo")
>>> other_repo_url = repo_url._replace(path=urljoin(repo_url.path, "../other-repo"))
>>> urlunparse(other_repo_url)
'ssh://git@github.com/org/other-repo'

@rmorshea
Copy link
Contributor

rmorshea commented Jul 4, 2023

This issue should be re-opened.

@rmorshea
Copy link
Contributor

rmorshea commented Sep 13, 2023

@neersighted, as per the comment above, this should be re-opened. I had a fix ready some months back but it hasn't gotten any attention.

Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/vcs Related to support for VCS dependencies (Git and Dulwich) kind/bug Something isn't working as expected status/confirmed Issue is reproduced and confirmed
Projects
None yet
6 participants