add prefetching of index in PEP503 repositories #5442

tgolsson · 2022-04-12T16:11:19Z

Pull Request Check List

Resolves: #4885, partially

Added tests for changed code.
Updated documentation for changed code.

Description

This PR adds a new keyword, indexed, for defining a LegacyRepository (e.g. PEP503) source. The indexed keyword enables a prefetched and cached index, which limits the number of unnecessary requests.

Currently, if one configures a secondary repository for Poetry, it gets queried for all dependencies, no matter whether it's default, primary, or secondary. For projects with lots of dependencies (transitive or direct) this leads to many unnecessary calls to a host which can't serve the requested package. One such case is using GPU-based packages from https://download.pytorch.org/whl/, where most other packages should be served from PyPI. This confuses users with error messages and takes a lot of time.

While update time on a cold cache is dominated by downloading every possible GPU package, this PR changes no-op poetry update time from 30-40 seconds to <5 seconds. In total, this repository has 89 dependencies (reported by poetry show).

Also, it removes all errors from querying subpages that don't exist.

This depends on a PR to poetry-core, and thus a release there: python-poetry/poetry-core#323

Timing

Poetry==1.2.0b1

===> multitime results
1: poetry update
            Mean        Std.Dev.    Min         Median      Max
real        36.818      2.927       32.470      36.176      44.032
user        3.478       0.040       3.408       3.480       3.568
sys         0.126       0.030       0.075       0.123       0.181

This PR

===> multitime results
1: poetry update
            Mean        Std.Dev.    Min         Median      Max
real        4.617       0.211       4.410       4.588       5.165
user        3.362       0.162       3.235       3.301       3.813
sys         0.097       0.027       0.059       0.099       0.151
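
For a rough reading of the two tables, the arithmetic can be sketched as below. The numbers are the mean values copied from the multitime output; treating real minus user minus sys as time spent waiting (mostly on the network) is an assumption that only holds approximately for a largely single-threaded run.

```python
# Mean times in seconds, copied from the two multitime tables above.
before = {"real": 36.818, "user": 3.478, "sys": 0.126}
after = {"real": 4.617, "user": 3.362, "sys": 0.097}


def off_cpu(t):
    """Wall-clock time not accounted for by CPU time (rough proxy for waiting)."""
    return t["real"] - t["user"] - t["sys"]


speedup = before["real"] / after["real"]
print(f"speedup: {speedup:.2f}x")
print(f"off-CPU before: {off_cpu(before):.3f}s, after: {off_cpu(after):.3f}s")
```

CPU time is nearly unchanged between the two runs, so the roughly 8x improvement in wall-clock time comes almost entirely from the off-CPU portion.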

abn

Can we get similar results by applying a functools.lru_cache on _get_page instead?

tgolsson · 2022-04-12T18:55:05Z

@abn Nope! So the problem here is that if you ask pytorch.org where to get foobar, it's going to throw an error back at you. No manner of caching is going to change that -- except maybe cross-session, and we'd still have to get the error once. This asks pytorch.org "what do you have" and then we only ask for those things.
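
To make the difference concrete, here is a minimal sketch; it is not the code in this PR. FakeServer, lookup_cached and lookup_indexed are hypothetical names, and the repository is simulated in memory so the example runs offline. It counts how many requests each strategy sends when most of the wanted packages are not hosted there.

```python
from functools import lru_cache


class FakeServer:
    """In-memory stand-in for a PEP 503 repository (hypothetical)."""

    def __init__(self, packages):
        self.packages = set(packages)
        self.requests = 0

    def get_index(self):
        # One request for the root page listing every project.
        self.requests += 1
        return set(self.packages)

    def get_page(self, name):
        # One request per project page; unknown projects behave like a 404.
        self.requests += 1
        if name not in self.packages:
            raise LookupError(f"404: {name}")
        return f"<links for {name}>"


def lookup_cached(server, names):
    """Per-page caching: every miss still costs one request."""

    @lru_cache(maxsize=None)
    def page(name):
        return server.get_page(name)

    found = []
    for name in names:
        try:
            page(name)
            found.append(name)
        except LookupError:
            pass  # lru_cache does not store raised exceptions
    return found


def lookup_indexed(server, names):
    """Fetch the index once, then request only pages the server lists."""
    index = server.get_index()
    return [name for name in names if name in index and server.get_page(name)]


wanted = ["torch", "numpy", "requests", "rich"]

cached_server = FakeServer(["torch", "torchvision"])
found_cached = lookup_cached(cached_server, wanted)

indexed_server = FakeServer(["torch", "torchvision"])
found_indexed = lookup_indexed(indexed_server, wanted)

print(found_cached, cached_server.requests)    # ['torch'] 4
print(found_indexed, indexed_server.requests)  # ['torch'] 2
```

Both strategies find the same package, but the cached variant pays one failing request for each package the host does not serve, while the indexed variant pays one request for the index plus one per package that actually exists there.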

abn · 2022-04-12T18:58:12Z

@tgolsson I should really read the descriptions fully before looking at code 🤣

tgolsson · 2022-04-12T19:01:21Z

No worries! For some context for why I landed on this solution:

I did investigate alternative methods such as "explicit" sources, not querying secondary sources by default, etc. However, changing how sources are combined felt like a larger/potentially breaking or verbose change compared to relying on PEP503 behaviours. I do think it's weird that default/primary/secondary sources seemingly are mostly treated the same way, but again, anything changing that is a breaking change.

This is 100% backwards compatible, a pure speed-up, and purely opt-in. The only way to get "worse" results than today is to opt in to indexing for a repository that doesn't have an index.

lovesegfault · 2022-04-14T22:18:59Z

This is awesome!

tgolsson · 2022-05-03T16:25:26Z

@abn What's the release-process for poetry-core so the dependency constraint can be updated here?

abn · 2022-05-04T16:53:00Z

@tgolsson I am aiming for a new core release this week; so once that lands we can rebase this.

abn · 2022-05-24T11:47:53Z

This can now be rebased.

tgolsson · 2022-05-26T18:46:30Z

@abn Rebased! I'm going to need to check how the docs render since that had changed quite a bit.

github-actions · 2022-05-26T18:55:19Z

Deploy preview for website ready!

✅ Preview
https://website-ds65cso7d-python-poetry.vercel.app

Built with commit a0c1846.
This pull request is being automatically deployed with vercel-action

tgolsson · 2022-05-26T18:59:37Z

Feels like a natural flow:

tgolsson · 2022-05-26T19:04:23Z

If anyone watching this PR has a known use case for indexing, I'd love to know if it works for you! I've tested against torch and PyPI, plus of course unit tests, but I'm sure there are other cases out there that have... interesting configurations that I'm not dealing with correctly.

Jinior · 2022-06-03T14:36:03Z

I just tried out your branch and it is working great. I can finally install PyTorch using poetry. Thank you for the work.

I do however get an error when I try to add a repository to pyproject.toml using poetry source add if the pyproject.toml contains the indexed keyword.

I made a pull request into tgolsson:ts/prefetch-legacy-repository. As far as I can see that fixed the problem.

Error

Command run: poetry source add mmcv https://download.openmmlab.com/mmcv/dist/cu115/torch1.11.0/index.html

TypeError

  Source.__init__() got an unexpected keyword argument 'indexed'

  at src/poetry/poetry.py:78 in <listcomp>
       74│         return self
       75│
       76│     def get_sources(self) -> list[Source]:
       77│         return [
    →  78│             Source(**source)
       79│             for source in self.pyproject.poetry_config.get("source", [])
       80│         ]
       81│
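
The failure can be reproduced in isolation. The sketch below uses simplified stand-ins for the Source dataclass, not Poetry's real class; the field list is an assumption taken from the keys of the [[tool.poetry.source]] table in this comment, and SourceBefore and SourceAfter are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SourceBefore:
    """Simplified stand-in (assumption): no `indexed` field."""

    name: str
    url: str
    default: bool = False
    secondary: bool = False


@dataclass
class SourceAfter(SourceBefore):
    """Same stand-in with the new opt-in field added."""

    indexed: bool = False


# Mirrors one [[tool.poetry.source]] table from pyproject.toml.
config = {
    "name": "pytorch",
    "url": "https://download.pytorch.org/whl/cu115/",
    "default": False,
    "secondary": True,
    "indexed": True,
}

try:
    # Unpacking a dict with an unknown key fails at construction time.
    SourceBefore(**config)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)

source = SourceAfter(**config)
print(source.indexed)
```

Giving the new field a default of False keeps existing configurations without the key working unchanged.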

pyproject.toml

[tool.poetry]
name = "test"
version = "0.1.0"
description = ""
authors = ["test@test.test"]
readme = "README.md"

[tool.poetry.dependencies]
python = "~3.8"
numpy = "^1.22"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu115/"
default = false
secondary = true
indexed = true

tgolsson · 2022-06-03T21:19:44Z

Thanks @neersighted for the feedback, and @Jinior for testing and the PR!

tgolsson · 2022-06-07T16:14:18Z

@abn / @neersighted I'd missed the follow-up to the convo but went ahead and did some restructuring, which I do think made it clearer! I don't think it's exactly what you asked for with a "SimpleIndexedRepositoryPage", though, and I'm not sure I see how that would work.

tgolsson · 2022-06-29T18:41:02Z

Please let me know if there's anything I can do to get this merged...

vikigenius · 2022-07-16T04:09:41Z

@tgolsson what do you mean when you say that this only partially resolves #4885?

This should solve it, right? You say:

Also, it removes all errors from querying subpages that don't exist.

tgolsson · 2022-07-16T08:07:54Z

@vikigenius It does solve it if users opt in. So for some users it's a perfect fix but otherwise it has no effect.

strangemonad · 2022-07-19T00:48:57Z

@tgolsson is this branch pinning the correct version of poetry-core? I tried to run this against our private GitLab package registry and ran into the following:

TypeError

  Link.__init__() got multiple values for argument 'requires_python'

  at ~/.local/pipx/venvs/poetry@pr5442/lib/python3.10/site-packages/poetry/repositories/link_sources/html.py:37 in links
       33│                 href = anchor.get("href")
       34│                 url = self.clean_link(urllib.parse.urljoin(self._url, href))
       35│                 pyrequire = anchor.get("data-requires-python")
       36│                 pyrequire = unescape(pyrequire) if pyrequire else None
    →  37│                 link = Link(url, self, requires_python=pyrequire)
       38│
       39│                 if link.ext not in self.SUPPORTED_FORMATS:
       40│                     continue

See the second argument, self, to Link.__init__, which clashes with requires_python? pipx installed poetry-core 1.1.0b3 for me.
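
This is the error Python raises when a positional argument lands in a parameter that is also passed by keyword. A minimal sketch, assuming a hypothetical Link whose second parameter is requires_python (the real poetry-core signatures are not shown in this thread, and Page is only a placeholder):

```python
class Page:
    """Placeholder for the page object passed as `self` in the traceback."""


class Link:
    # Hypothetical signature: no second positional parameter for the page.
    def __init__(self, url, requires_python=None):
        self.url = url
        self.requires_python = requires_python


page = Page()

try:
    # Same call shape as html.py line 37: (url, page, requires_python=...).
    # `page` binds positionally to requires_python, then the keyword collides.
    Link("https://example.invalid/pkg-1.0.tar.gz", page, requires_python=">=3.8")
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

If that is what happens here, the installed poetry-core and this branch disagree about the signature of Link.__init__.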

When I try to run this branch without setting indexed = true, I get a stack trace when Poetry gets a 404 from PyPI for a private package.

tgolsson · 2023-02-13T09:26:57Z

I'm abandoning this PR as we've adopted PDM instead :-)

github-actions · 2024-02-29T01:27:19Z

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

tgolsson mentioned this pull request Apr 12, 2022

support indexed legacy repositories python-poetry/poetry-core#323

Merged

2 tasks

abn reviewed Apr 12, 2022

tgolsson mentioned this pull request Apr 21, 2022

repository: do not call secondary repositories if package is find in primary ones #5472

Closed

2 tasks

abn mentioned this pull request Apr 27, 2022

Adding a source even with secondary true take precedence #4704

Closed

3 tasks

tgolsson mentioned this pull request May 2, 2022

pytorch and poetry #4231

Closed

This was referenced May 10, 2022

1.2.0 Release #5586

Closed

Poetry doesn't try public pypi when private pypi included #3855

Closed

repository.secondary=true + dependency.source broken? #5122

Closed

neersighted added this to the 1.2 milestone May 18, 2022

abn mentioned this pull request May 24, 2022

schema: validate source objects in poetry #5678

Merged

add prefetching of index in legacy repositories

1437a69

tgolsson force-pushed the ts/prefetch-legacy-repository branch from 0118da1 to 1437a69 Compare May 26, 2022 18:45

abn added the area/docs Documentation issues/improvements label May 26, 2022

fix mypy post rebase

d57cc29

tgolsson and others added 2 commits May 26, 2022 21:07

pre-commit is lazy

66c2926

Add indexed keyword to Source dataclass

b404d7a

The new indexed keyword introduced for pre-fetching legacy repositories was not known to the Source class.

neersighted reviewed Jun 3, 2022

docs/repositories.md (outdated, resolved)

neersighted reviewed Jun 3, 2022

src/poetry/repositories/link_sources/html.py (outdated, resolved)

tgolsson and others added 3 commits June 3, 2022 23:16

Update src/poetry/repositories/link_sources/html.py

122b02e

Co-authored-by: Bjorn Neergaard <bjorn@neersighted.com>

Update docs/repositories.md

4f003a2

Co-authored-by: Bjorn Neergaard <bjorn@neersighted.com>

Merge pull request #1 from Jinior/ts/prefetch-legacy-repository

320aa9b

Add indexed keyword to Source class

update from feedback

e85e1c5

tgolsson added 3 commits June 7, 2022 18:15

must be there

463d184

canonicalize

174aca4

move import

a0c1846

tgolsson mentioned this pull request Jun 8, 2022

try fix installs on MacOS EmbarkStudios/emote#37

Closed

Secrus requested review from abn and neersighted June 21, 2022 07:12

neersighted mentioned this pull request Sep 5, 2022

Avoid the deprecated JSON API #6081

Merged

neersighted added area/solver Related to the dependency resolver impact/changelog Requires a changelog entry labels Sep 5, 2022

neersighted modified the milestones: 1.2, 1.3 Sep 5, 2022

neersighted mentioned this pull request Oct 25, 2022

6713 Introduce supplemental package source #6879

Merged

2 tasks

Secrus modified the milestones: 1.3, 1.4 Dec 12, 2022

tgolsson closed this Feb 13, 2023

tgolsson deleted the ts/prefetch-legacy-repository branch February 13, 2023 09:27

github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024
