Speed up build environment creation (re-usable, faster) #7294

Open
gaborbernat opened this issue Nov 4, 2019 · 15 comments
Labels
C: PEP 517 impact Affected by PEP 517 processing help wanted For requesting inputs from other members of the community state: needs discussion This needs some more discussion type: enhancement Improvements to functionality type: performance Commands take too long to run

Comments

@gaborbernat

gaborbernat commented Nov 4, 2019

Currently pip always creates build environments from scratch, which requires network access. This adds significant overhead to creating isolated build environments. Can we somehow speed up this operation?

Idea: create isolated build environments from cached wheels only? Or, rather than recreating isolated build environments from scratch, keep a master copy that we simply copy.

@triage-new-issues triage-new-issues bot added the S: needs triage Issues/PRs that need to be triaged label Nov 4, 2019
@chrahunt
Member

chrahunt commented Nov 4, 2019

Having a concrete target can help. Can you provide an example of a slow command, the time it took, and how long you're expecting it to take?

The network access issue can be overcome by invoking pip like pip install --no-index --find-links path/to/deps pkg after a previous invocation of pip wheel -w path/to/deps pkg. That should also remove build environment setup overhead since everything is built by the pip wheel step.
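
A minimal sketch of that two-step workflow, assembling the two pip invocations from Python. The helper name `wheelhouse_commands` is hypothetical; only the `pip wheel -w` and `pip install --no-index --find-links` options come from the comment above. The commands are built but not executed, so the network-dependent step stays under the caller's control.

```python
import shlex
import sys
from typing import List, Tuple


def wheelhouse_commands(requirement: str, wheel_dir: str) -> Tuple[List[str], List[str]]:
    """Return (build_cmd, install_cmd) for the two-step wheelhouse workflow.

    Hypothetical helper: it only assembles argument lists, it runs nothing.
    """
    pip = [sys.executable, "-m", "pip"]
    # Step 1 (may need the network): build wheels for the requirement and
    # its dependencies into wheel_dir.
    build_cmd = pip + ["wheel", "-w", wheel_dir, requirement]
    # Step 2 (offline): install using only the wheels collected in step 1.
    install_cmd = pip + ["install", "--no-index", "--find-links", wheel_dir, requirement]
    return build_cmd, install_cmd


if __name__ == "__main__":
    for cmd in wheelhouse_commands("pkg", "path/to/deps"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the wheel directory location is settled.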

@chrahunt chrahunt added S: awaiting response Waiting for a response/more information type: support User Support labels Nov 4, 2019
@triage-new-issues triage-new-issues bot removed the S: needs triage Issues/PRs that need to be triaged label Nov 4, 2019
@gaborbernat
Author

My issue raised here was exactly that pip wheel builds a build environment from scratch every time via the network, which is quite expensive and could be avoided.

python3.8 -m pip wheel -w . --no-deps .

Looking in indexes: http://localhost:3141/bernat/bb
Processing /Users/bgabor8/git/github/tox
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Building wheels for collected packages: tox
  Building wheel for tox (PEP 517) ... done
  Created wheel for tox: filename=tox-3.13.3.dev23+g6cb400e-py2.py3-none-any.whl size=80760 sha256=8cff1ac6b9a01ceac6a15d5d0c415a7d9414a847f866e414a6a224eb8ef9e31a
  Stored in directory: /Users/bgabor8/git/github/tox
Successfully built tox
env PIP_INDEX_URL=12321 python3.8 -m pip wheel -w . --no-deps . 

Looking in indexes: 12321
Processing /Users/bgabor8/git/github/tox
  Installing build dependencies ... error
  ERROR: Command errored out with exit status 1:
   command: /Users/bgabor8/.pyenv/versions/3.8.0/bin/python3.8 /Users/bgabor8/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pip install --ignore-installed --no-user --prefix /private/var/folders/kt/btg285ds5kx4l1k398lb7df80000gr/T/pip-build-env-cfk5t5ih/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i 12321 -- 'setuptools >= 40.0.4' 'setuptools_scm >= 2.0.0, <4' 'wheel >= 0.29.0'
       cwd: None
  Complete output (4 lines):
  Looking in indexes: 12321
  WARNING: Url '12321/setuptools/' is ignored. It is either a non-existing path or lacks a specific scheme.
  ERROR: Could not find a version that satisfies the requirement setuptools>=40.0.4 (from versions: none)
  ERROR: No matching distribution found for setuptools>=40.0.4
  ----------------------------------------

@no-response no-response bot removed the S: awaiting response Waiting for a response/more information label Nov 4, 2019
@chrahunt
Member

chrahunt commented Nov 4, 2019

Thanks for the example! Running this script on my machine, it takes about two seconds to build tox:

script.sh
#!/bin/sh
cd "$(mktemp -d)"
python -m venv venv
PYTHON="$PWD/venv/bin/python"
"$PYTHON" -V

git clone https://github.com/tox-dev/tox.git
cd tox
time "$PYTHON" -m pip wheel -w . --no-deps .
time "$PYTHON" -m pip wheel -w . --no-deps .
time "$PYTHON" -m pip wheel -w . --no-deps .
Output
$ ./repro.sh
Python 3.8.0
Collecting pip
  Using cached https://files.pythonhosted.org/packages/00/b6/9cfa56b4081ad13874b0c6f96af8ce16cfbc1cb06bedf8e9164ce5551ec1/pip-19.3.1-py2.py3-none-any.whl
Installing collected packages: pip
  Found existing installation: pip 19.2.3
    Uninstalling pip-19.2.3:
      Successfully uninstalled pip-19.2.3
Successfully installed pip-19.3.1
Cloning into 'tox'...
remote: Enumerating objects: 10807, done.
remote: Total 10807 (delta 0), reused 0 (delta 0), pack-reused 10807
Receiving objects: 100% (10807/10807), 10.30 MiB | 37.03 MiB/s, done.
Resolving deltas: 100% (3663/3663), done.
Processing /tmp/user/1000/tmp.hiF75TCOBr/tox
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Building wheels for collected packages: tox
  Building wheel for tox (PEP 517) ... done
  Created wheel for tox: filename=tox-3.14.1.dev6+g3980da7-py2.py3-none-any.whl size=80773 sha256=c0200254a4e10b86c1f1e17dbcb17f42a24202c08d0ff8a5227db6e29cb7ac40
  Stored in directory: /tmp/user/1000/tmp.hiF75TCOBr/tox
Successfully built tox
1.82user 0.19system 0:02.03elapsed 99%CPU (0avgtext+0avgdata 48488maxresident)k
352inputs+42280outputs (0major+112397minor)pagefaults 0swaps
Processing /tmp/user/1000/tmp.hiF75TCOBr/tox
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Building wheels for collected packages: tox
  Building wheel for tox (PEP 517) ... done
  Created wheel for tox: filename=tox-3.14.1.dev6+g3980da7-py2.py3-none-any.whl size=80773 sha256=d258f3485f842e82b8ba4ac05b69e2d057ec12477af0bf1969ec4b42c77a65d3
  Stored in directory: /tmp/user/1000/tmp.hiF75TCOBr/tox
Successfully built tox
1.74user 0.21system 0:01.95elapsed 100%CPU (0avgtext+0avgdata 48484maxresident)k
0inputs+42400outputs (0major+112203minor)pagefaults 0swaps
Processing /tmp/user/1000/tmp.hiF75TCOBr/tox
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Building wheels for collected packages: tox
  Building wheel for tox (PEP 517) ... done
  Created wheel for tox: filename=tox-3.14.1.dev6+g3980da7-py2.py3-none-any.whl size=80773 sha256=fb850196ad8cd3449fb16b6e2d812c4c54a7fd16a5036f766b6562710a2c4703
  Stored in directory: /tmp/user/1000/tmp.hiF75TCOBr/tox
Successfully built tox
1.75user 0.23system 0:01.98elapsed 100%CPU (0avgtext+0avgdata 48588maxresident)k
0inputs+42400outputs (0major+112101minor)pagefaults 0swaps

The important lines are:

  • 1.82user 0.19system 0:02.03elapsed 99%CPU (0avgtext+0avgdata 48488maxresident)k
    352inputs+42280outputs (0major+112397minor)pagefaults 0swaps
  • 1.74user 0.21system 0:01.95elapsed 100%CPU (0avgtext+0avgdata 48484maxresident)k
    0inputs+42400outputs (0major+112203minor)pagefaults 0swaps
  • 1.75user 0.23system 0:01.98elapsed 100%CPU (0avgtext+0avgdata 48588maxresident)k
    0inputs+42400outputs (0major+112101minor)pagefaults 0swaps

How long does it take for you? If you're seeing much longer times, it may be caused by something other than the build environment setup (e.g. #2195). If you run pip with --log pip.log and post it here (attached as a file if it is too long for a comment) then we can help identify what is taking the most time. That should give us a concrete idea about how much the build environment setup factors in to the overall build time.

Sorry if I'm asking something that seems obvious, I'm trying to cover all bases so we can say without a doubt that a change in this area would help and identify how much it would help. That helps with prioritization since there are several existing issues related to pip being slow to install (#4768, #4497 kind of, #825).

@chrahunt chrahunt added the S: awaiting response Waiting for a response/more information label Nov 4, 2019
@pradyunsg
Member

pradyunsg commented Nov 4, 2019

(just noting -- @gaborbernat is a virtualenv maintainer)

The build environments could be created from cached wheels -- optimizing things to minimize network requests is a good idea. However as #7132 and similar show, it is kinda tricky to do. Note that these are intended as optimizations, so we'd want to not have subtle+nuanced behavior changes.

If someone wants to implement the approach I propose there and a way for pip to aggressively use the cache, that'd be a good way to avoid network requests in this specific scenario.

@pradyunsg pradyunsg added state: needs discussion This needs some more discussion type: enhancement Improvements to functionality type: feature request Request for a new feature and removed type: support User Support labels Nov 4, 2019
@no-response

no-response bot commented Nov 19, 2019

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.

@no-response no-response bot closed this as completed Nov 19, 2019
@gaborbernat
Author

@chrahunt those timings are platform dependent; it's much slower on Windows. I personally find 2s a bit slow when it could be 0.1s 👍 I'd expect most projects to have the same build dependencies, so we could keep reusing the environments instead of recreating them. They could, for example, be cached by the build-dependency names and versions.
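
A rough sketch of what a cache key built from build-dependency names and versions could look like. This is not how pip works today: the function name `build_env_cache_key`, the `python_tag` parameter, and the example versions are assumptions made for illustration. Project names are normalized following the PEP 503 rule so that spelling variants map to the same entry.

```python
import hashlib
import re
from typing import Mapping


def build_env_cache_key(resolved: Mapping[str, str], python_tag: str) -> str:
    """Derive a stable key from resolved build dependencies (hypothetical).

    `resolved` maps project name -> installed version, e.g. the result of
    resolving a project's build requirements. `python_tag` distinguishes
    interpreters so environments are not shared across them.
    """
    pins = sorted(
        # PEP 503 name normalization: runs of "-", "_", "." become "-".
        f"{re.sub(r'[-_.]+', '-', name).lower()}=={version}"
        for name, version in resolved.items()
    )
    payload = "\n".join([python_tag, *pins])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Example versions are illustrative only.
    print(build_env_cache_key({"setuptools": "41.6.0", "wheel": "0.33.6"}, "cp38"))
```

Keying on resolved versions rather than on the raw specifiers means a new release of a build dependency produces a new key instead of silently reusing a stale environment.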

@no-response no-response bot removed the S: awaiting response Waiting for a response/more information label Nov 19, 2019
@no-response no-response bot reopened this Nov 19, 2019
@pfmoore
Member

pfmoore commented Nov 19, 2019

I will second @gaborbernat's comments here. Creating build environments is a non-trivial overhead in certain environments, Windows being very much an example of how bad it can get (particularly in corporate environments, where aggressive antivirus can result in all of the file copying being very slow).

On my work PC, a tox build took about 40 seconds elapsed time. From the build log, about half of that was setting up the build environment. I'd kill for a 2-second build 😉

"Make it go faster" without a clear target is always a difficult thing to ask, but conversely, the existing development was very much done on the basis of "get it working before worrying about performance", and I think we should now at a minimum review where the performance bottlenecks¹ are and look to improve them.

¹ Specifically the bottlenecks in pip's code - obviously a lot of the time in a from-source build is in the actual build, but that doesn't mean that we should ignore the overheads that pip adds to the process.

@chrahunt
Member

I don't disagree with this idea (and implemented similar environment sharing to speed up our tests in #7276, on top of the environment sharing that we already do), I'm just trying to make sure we take an effective approach. Ideas are important, but without any data or test cases I don't see how we can prioritize or properly weigh the implementation complexity or robustness trade-offs against the benefit.

Can we:

  1. Have some way to demonstrate the performance issue independent of anyone's machine, like a representative test on CI. There we could validate that pre-built wheels are getting used where applicable, guard against issues like #2195 ("pip install of a directory is super slow"), and so on. This would also give us a chance to figure out some basics like "how to enable antivirus" and "how to make SSDs act more like HDDs for performance tests" since e.g. Azure Pipelines is unlike a typical user machine in both respects.
  2. Measure the actual time the high-level parts of the operation are taking (from log files) several times, aggregating across runs to reduce noise
  3. Given the parts of the operation taking the longest time, add more logging to unpack them even more or reproduce them in isolation while profiling and identify fundamentally what's actually taking the time

We can extract some code from chrahunt#1 for getting the profiling details and I can write up a (probably extremely basic) analysis workflow if it would help.
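
As a very basic illustration of step 2, the sketch below aggregates wall-clock times across runs. It is an assumption-laden stand-in: instead of pip log files it parses the `elapsed` field of GNU `time` output as shown earlier in this thread, and the names `elapsed_seconds` and `SAMPLE` are hypothetical.

```python
import re
import statistics
from typing import List

# GNU time prints elapsed wall-clock time as [hours:]minutes:seconds.
_ELAPSED = re.compile(r"(?:(\d+):)?(\d+):(\d+(?:\.\d+)?)elapsed")


def elapsed_seconds(time_output: str) -> List[float]:
    """Extract every 'elapsed' value from GNU time output, in seconds."""
    results = []
    for hours, minutes, seconds in _ELAPSED.findall(time_output):
        results.append(int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds))
    return results


# The three timing lines posted earlier in the thread.
SAMPLE = """\
1.82user 0.19system 0:02.03elapsed 99%CPU (0avgtext+0avgdata 48488maxresident)k
1.74user 0.21system 0:01.95elapsed 100%CPU (0avgtext+0avgdata 48484maxresident)k
1.75user 0.23system 0:01.98elapsed 100%CPU (0avgtext+0avgdata 48588maxresident)k
"""

if __name__ == "__main__":
    runs = elapsed_seconds(SAMPLE)
    print(f"runs={len(runs)} median={statistics.median(runs):.2f}s "
          f"mean={statistics.mean(runs):.2f}s")
```

The median is usually the more robust summary here, since a single cold-cache run can skew the mean.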

@pfmoore
Member

pfmoore commented Nov 19, 2019

I agree with all of this (although it's not easy to do and I don't have time to offer much help on it) but I think it's also worth looking at where we could simply avoid work altogether. As @gaborbernat suggests, re-using build environments (or copying a well-known master) could be a useful avenue to explore.

Note that these are intended as optimizations, so we'd want to not have subtle+nuanced behavior changes.

While this is true in principle, I think there are some reasonable compromises we could make in practice. For example, we potentially build multiple environments in one run of pip. Currently I'd expect nearly all of them to be based on setuptools+wheel. At the moment, I assume we hit the network every time just in case a new release of setuptools occurs between environment builds. I don't think it would be unreasonable to not do that check, and copy one environment (or even just link to it with a .pth file, PYTHONPATH, or similar). Yes, it's a behaviour change in theory, but in practice I'd rather have the improved speed. And "pip assumes that the content of indexes does not change during the course of a single run" is hardly an unreasonable assumption to make.
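
To make the "one environment per run" idea concrete, here is a small sketch of an in-process pool that creates an environment only the first time a given set of build requirements is requested. `BuildEnvPool` and its `create_env` callback are hypothetical names, not part of pip; the callback stands in for whatever expensive step actually installs the requirements.

```python
from typing import Callable, Dict, FrozenSet, Iterable


class BuildEnvPool:
    """Hypothetical per-run pool: one environment per distinct requirement set."""

    def __init__(self, create_env: Callable[[FrozenSet[str]], str]) -> None:
        # create_env is the expensive step (e.g. installing into a prefix);
        # it returns a path or identifier for the resulting environment.
        self._create_env = create_env
        self._envs: Dict[FrozenSet[str], str] = {}

    def get(self, requirements: Iterable[str]) -> str:
        key = frozenset(requirements)
        if key not in self._envs:
            # First request for this set: pay the creation cost once.
            self._envs[key] = self._create_env(key)
        # Later requests in the same run reuse the result without
        # consulting the index again.
        return self._envs[key]


if __name__ == "__main__":
    created = []

    def fake_create(reqs: FrozenSet[str]) -> str:
        created.append(reqs)
        return f"build-env-{len(created)}"

    pool = BuildEnvPool(fake_create)
    print(pool.get(["setuptools", "wheel"]))
    print(pool.get(["wheel", "setuptools"]))  # same set, no second creation
    print(len(created))
```

Because the pool lives only for the duration of one process, it relies on exactly the assumption stated above: index contents do not change during a single run.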

Basically, at the moment I'm more interested in exploring design-level ideas than digging into the weeds of the detail.

IMO, the implementation of PEP 517 and 518 spent a lot of our budget of "acceptable performance waste" (creation of lots of isolated environments, lots of subprocess calls, etc). At some point we should look at recovering some of that overspend, as with any other form of technical debt.

@pradyunsg pradyunsg added this to the Improve User Experience milestone Nov 20, 2019
@pradyunsg
Member

I'm also very much in favor of speed ups here.

All of this would be fairly tricky though, so I concur with @pfmoore that we should think about what this looks like and the implications of various choices, before diving into implementation (though, I don't want that to block someone else from diving into them, just do that in a new PR or different issue).

@chrahunt
Member

Can anyone for whom pip is running unacceptably slowly post a corresponding pip log file?

@gaborbernat
Author

Unacceptably slow is relative. I don't think pip is there, but this thread is about pip doing a lot of wasteful operations here and there (such as always creating isolated build environments from scratch) that add up when pip is used multiple times (e.g. inside tox). Granted, we should measure; sadly, with the virtualenv and tox rewrites I'm engaged in, I can't dedicate time to this now. I wanted to start a discussion, though, and see if anyone else shares my goal of making things faster.

@pfmoore
Member

pfmoore commented Nov 20, 2019

Here you are @chrahunt. toxbuild.log

@chrahunt chrahunt mentioned this issue Dec 3, 2019
8 tasks
@pradyunsg pradyunsg added the help wanted For requesting inputs from other members of the community label Mar 6, 2020
@pradyunsg pradyunsg changed the title from "re-usable/faster build environment creation" to "Speed up build environment creation (re-usable? faster?)" Apr 21, 2020
@pradyunsg pradyunsg changed the title from "Speed up build environment creation (re-usable? faster?)" to "Speed up build environment creation (re-usable, faster)" Apr 21, 2020
@pradyunsg pradyunsg added PEP implementation Involves some PEP and removed PEP implementation Involves some PEP labels Jul 22, 2020
@nlhkabu nlhkabu added UX User experience related and removed UX User experience related labels Jul 28, 2020
@nlhkabu nlhkabu removed this from the Improve User Experience milestone Jul 29, 2020
@pradyunsg pradyunsg added the C: PEP 517 impact Affected by PEP 517 processing label Nov 29, 2020
@gmichaeljaison

pip is usually slow to reinstall an already installed package in editable mode (even with no external dependencies).
But with this additional step of "Installing build dependencies", it is even slower.
I use pip in a monorepo, and every branch change requires running pip install on all packages, which is extremely slow compared to tools for other languages such as yarn.

@uranusjr
Member

uranusjr commented Sep 7, 2022

Contributions are always welcome, especially since this issue is tagged help wanted :)

@ichard26 ichard26 added type: performance Commands take too long to run and removed type: feature request Request for a new feature labels Apr 23, 2024

8 participants