-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce the amount of calls to pip and the number of tempfiles in batch_install. #5301
Conversation
…files in batch_install.
…d remove progress bar.
""".strip() | ||
f.write(contents) | ||
c = p.pipenv("install") | ||
assert c.returncode == 0 | ||
assert "fake_package" in p.pipfile["packages"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fake package has been problematic in many different refactors, and it isn't relevant to this test case anyway. I've made an effort to replace those other problem usages with a different relevant package for testing as well.
f.write(contents) | ||
|
||
c = p.pipenv('install') | ||
c = p.pipenv('install depends-on-marked-package') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was problematic because this only exists in our local pypi, but that Pipfile doesn't have the sources -- not sure why this hadn't cropped up before now but I think it has to do with I ported the intended fix for the searching of all indexes to this PR: #5220 as well since it started as a copy of pip_install
which had this bug.
# Don't use the local fake pypi | ||
with temp_environ(), PipenvInstance_NoPyPI(chdir=True) as p: | ||
# Using pypi.python.org as pipenv-test-public-package is not | ||
# included in the local pypi mirror | ||
mirror_url = os.environ.pop('PIPENV_TEST_INDEX', "https://pypi.kennethreitz.org/simple") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mirror is dead in all actuality, and was giving problems in a couple recent refactors.
@@ -94,25 +94,6 @@ def test_sync_sequential_detect_errors(PipenvInstance): | |||
assert c.returncode != 0 | |||
|
|||
|
|||
@pytest.mark.sync | |||
@pytest.mark.lock | |||
def test_sync_sequential_verbose(PipenvInstance): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The verbose output is driven by the output of pip, no sense having a test depend on what it or what it did before since pip is subject to future upgrades.
@@ -1574,6 +1548,166 @@ def pip_install( | |||
return c | |||
|
|||
|
|||
def pip_install_deps( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This started as a copy of pip_install
and as modified to take a list of dependencies instead of a single package.
pipenv/core.py
Outdated
prefix="pipenv", suffix="requirements" | ||
) | ||
|
||
hashed_requirements = tempfile.NamedTemporaryFile( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the pip documentation: https://pip.pypa.io/en/stable/topics/secure-installs/#hash-checking-mode
Note that hash-checking is an all-or-nothing proposition. Specifying --hash against any requirement will activate this mode globally.
We have two types of requirements -- those normal once with hashes, and the editable, vcs, local types that do not and it is not possible to do this in a single call to pip, but we can do it in just two!
pipenv/core.py
Outdated
pypi_mirror=pypi_mirror, | ||
) | ||
source_names = {src.get("name") for src in sources} | ||
if not search_all_sources and requirement.index in source_names: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This includes the fix from that other PR I referenced around package index restrictions and whether or not to search all sources.
pipenv/core.py
Outdated
source_names = {src.get("name") for src in sources} | ||
if not search_all_sources and requirement.index in source_names: | ||
sources = list(filter(lambda d: d.get("name") == requirement.index, sources)) | ||
if sources and not vcs_or_editable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit me the first go round because vcs and editable dependencies need to not specify the index, because they won't get resolved.
pipenv/core.py
Outdated
cmds.append(c) | ||
while True: | ||
line = c.stdout.readline() | ||
if line == "": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was really problematic when the pip logger was overriden or verbose was passed to pip, we never recieved the EOF and no matter what the process.poll() or return_code are never available, as if the process doesn't get cleaned up until the cleanup routine. Though the cleanup routine uses p.communicate()
which doesn't get the output until the process returns at the end so there was no indicator of progress without this logic here. This gives real time progress logic when verbose is passed.
@dqkqd I would greatly appreciate your review of this as well if you have the time to do so. |
Yes, I would love to. |
There are some global options used by |
@dqkqd many of the global options though are per line such as the --hash or the --index -- for example "--require-hashes |
@dqkqd Hmm, you might be right about the indexes though -- I think we may not even need them in the requirements.txt the way it is now, because the install command also gets the index and extra-index arguments:
|
So every packages with the same index should go into a same batch. I think other global options also work that way as well. |
If we don't use index-url, |
There was a prior bug I discovered from my conversation with you and from testing with my pytorch + pypi example, I changed it to be how the regular
This latest commit solved the issue I was having installing with this 2-index example, plus it is inline with what
|
So packages with hashes and
If Dockerfile FROM archlinux
ENV LANG C.UTF-8
RUN pacman -Syu python3 python-pip --noconfirm
COPY requirements.txt .
RUN python -m pip install -r requirements.txt -v -v &> result.txt
RUN cat result.txt | grep "Found index url" Requirements file with index in the same line: # requirements.txt
pytest -i https://pypi.tuna.tsinghua.edu.cn/simple result
Requirements file with index in different line: # requirements.txt
-i https://pypi.tuna.tsinghua.edu.cn/simple
pytest result:
|
@dqkqd Thanks for the insights -- |
Thanks for explaining, I read doc and code again. You were right.
There are somethings I noticed during experiments, not sure they are helpful though:
# Pipfile
[package]
pytest = "*"
|
@dqkqd Thank you for your observations and comments--I have pushed some additional changes to make the distinction between the two file types more clear and fix the bug with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change the access to c.deps
?
I'm happy to see this made it into Pipenv. I pointed out the performance overhead of using a separate |
Oh thank you @Tenzer -- I am happy to hear others may be as excited about this change as I am. I am sorry that I didn't quite understand that issue back when I commented on it in July and locked discussions -- It was mainly because there are so many old issues with the original contributors that can be impossible to parse and make sense of sometimes, that I just prefer in some cases people open new issue reports but now that I re-read that one all the same conclusions I came to you had come to there as well. I had not even gotten to performance profiling pip, so its interesting you had found that most of the time was with importlib, I find that kind of surprising. Anyway, it became clear when I was trying out such an optimization that it used significantly less CPU time and less or similar wall clock time to do the same installs. In the case of the Python Package Manager benchmark, we cut our install time of the sentry requirements in half! https://lincolnloop.github.io/python-package-manager-shootout/ Fortunately I had the support of @oz123 to merge such a change--we both saw the performance improvement as way more important than the progress bar, especially when the |
…h_install. (pypa#5301) * Reduce the amount of calls to pip and the number of temp files in batch_install. * Add logic to read the progress of the install in realtime from pip and stop using progress bar. * refactor based on PR feedback.
This is a performance oriented PR -- the more packages you have to install, the better this change is. The one possible downside to this change is it becomes necessary to drop the progress bar indicator as the installs are performed by
pip
in up to two large batches (packages with hashes, and packages without hashes). I did some work to ensure that the verbose flag prints the pip output progress in real time such that progress can still be observed, though it is not the traditionalpipenv
progress bar, that gets dropped.When doing a regular
pipenv sync --dev
of the pipenv project when the current dependencies are already installed, the running time is about equal betweenmain
andexplore-batch-install
to re-install the four editable dependencies.0m34.686s for
explore-batch-install
and real 0m33.596s onmain
for exampleHowever it seems to really shine when installing a lot of dependencies for the first time.
main
pipenv --rm && time pipenv sync --dev
explore-batch-install
pipenv --rm && time pipenv sync --dev
For the sentry benchmark test: main branch:
explore-batch-install (this development branch)
Sentry Benchmark -- just the sync
Based on an estimate of this running at ~30% of the time of the main branch, I estimate our mean times for the pipenv install benchmark would wind up 45-55 seconds; it would be really cool to see!