Perf regression: pipenv lock creates many full-tree copies of the project #4403
Comments
Note that this change is recent; it was introduced in c4a165b, so the affected versions are:
@techalchemy, you made this change; the repeated directory copying is causing For 'regular' projects on developer machines, the
Here is a timing example with two separate pipenv versions, one before the change and one after, on a local project with cached files (note the size of the .git repository!). I had to run the 2020.6.2 attempt with
$ du -hs . .git
3.2M .
2.7M .git
$ ~/.local/bin/pipenv --version
pipenv, version 2018.11.26
$ pipenv --version
pipenv, version 2020.6.2
$ time ~/.local/bin/pipenv lock
Locking [dev-packages] dependencies…
✔ Success!
Locking [packages] dependencies…
✔ Success!
Updated Pipfile.lock (e0aade)!
real 0m49.705s
user 0m43.434s
sys 0m4.617s
$ export PIPENV_INSTALL_TIMEOUT=99999999
$ time pipenv lock
Locking [dev-packages] dependencies…
Building requirements...
Resolving dependencies...
✔ Success!
Locking [packages] dependencies…
Building requirements...
Resolving dependencies...
✔ Success!
Updated Pipfile.lock (e0aade)!
real 18m51.186s
user 17m42.817s
sys 3m3.365s
So instead of 50 seconds, I had to wait nearly 19 minutes before completion. A separate Python script tracked how many directories were created:
import glob, tempfile, time

maxcount = 0
# Temp directories created by the resolver for source tree copies
pattern = f"{tempfile.gettempdir()}/reqlib-src*"
while True:
    count = len(glob.glob(pattern))
    # Report only when a new high-water mark is reached
    if count > maxcount:
        maxcount = count
        print("Current maximum directory count:", maxcount)
    time.sleep(1)
which reached:
I didn't dive into why, but I have noticed that it takes an extremely long time to do an Specifically, it's bad enough that I've realized it's way faster to just blow away the entire .venv and re-create it from scratch than it is to do an
Our team is affected by this. We are either rolling back to pipenv 2018 or migrating to poetry. The minimal activity on this thread does not give me much confidence. Thanks, mjpieters, for the diagnosis.
Issue description
While trying to debug why pipenv update was taking a monumental amount of time, I noticed that the pipenv/resolver.py process was very busy copying data to /tmp/reqlib-src[randomvalue] directories. The project includes several GBs of log files, so this was eating up a lot of /tmp disk space and taking a lot of time.
I traced this to the following SetupInfo.from_ireq() lines:
pipenv/pipenv/vendor/requirementslib/models/setup_info.py
Lines 1877 to 1880 in b29a488
This creates multiple copies of the source tree (in apparent defiance of the LRU decorator), even though the project is listed just once, with no other entries under [packages]:
It appears to create a copy per dependency; we have 104:
and while the temp dirs get cleaned up once the process finishes, towards the end I counted nearly as many temp directories:
I'm assuming at this point that had I put in a breakpoint somewhere I'd have seen 104 directories before pipenv completes. (addendum: my later timing tests show that this isn't quite the case, but still close).
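One possible explanation for the cache appearing to be ignored, not confirmed against the requirementslib source, is that `functools.lru_cache` keys on the hash and equality of its arguments. Objects without custom `__hash__`/`__eq__` are hashed by identity, so two distinct objects describing the same project are separate cache keys. The sketch below illustrates only that general behaviour; `Requirement` and `copy_source_tree` are hypothetical names, not pipenv or requirementslib APIs.

```python
import functools


class Requirement:
    """Hypothetical stand-in for a requirement object (not a pipenv class)."""

    def __init__(self, path):
        self.path = path


copies_made = []


@functools.lru_cache(maxsize=None)
def copy_source_tree(requirement):
    """Pretend to copy a source tree; records every uncached call."""
    copies_made.append(requirement.path)
    return f"copy-{len(copies_made)}"


first = Requirement("/home/ubuntu/project-name")
second = Requirement("/home/ubuntu/project-name")  # same path, new object

copy_source_tree(first)
copy_source_tree(first)   # cache hit: same object, same hash
copy_source_tree(second)  # cache miss: hashed by identity, so a new key

print(len(copies_made))  # 2
```

If something like this is happening, a fresh requirement object per resolved dependency would trigger a fresh copy each time, which would be consistent with the roughly one-copy-per-dependency count above.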
While having log files in a project source directory is not ideal, it was not otherwise a problem as setuptools, .gitignore and MANIFEST.in files are configured to ignore the log files. We can't work around the issue with a symlink either as shutil.copytree() is called with symlinks=True.
Even then, nearly 100 copies of the full project is a huge waste of resources.
Expected result
If pipenv must create an isolated environment, it should either attempt to enumerate the source distribution files to copy, or warn or clearly document that a full copy is created of the whole tree. It should then create just one copy.
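As a rough illustration of what a more selective copy could look like, here is a minimal sketch using the standard library's `ignore` hook of `shutil.copytree()`. It is not what pipenv or requirementslib does; the function name `copy_project_tree`, the temp directory prefix, and the skip patterns are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def copy_project_tree(src, skip=("*.log", ".git")):
    """Copy src into a new temp directory, skipping names matching skip.

    Hypothetical helper: the patterns are illustrative, not pipenv defaults.
    """
    # copytree() needs a destination that does not exist yet
    dest = Path(tempfile.mkdtemp(prefix="copy-sketch-")) / "src"
    shutil.copytree(
        src,
        dest,
        symlinks=True,
        ignore=shutil.ignore_patterns(*skip),
    )
    return dest
```

A pattern list like this is only a stopgap: skipping `.git` could break builds that derive their version from repository metadata, so enumerating the files that belong to the source distribution would be the more robust option.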
By temporarily removing the log files, pipenv update completed (albeit still slowly) without completely trashing the filesystem.
Actual result
Either pipenv update times out (in pexpect), or you run out of disk space on your temp partition.
Note: I've cut sensitive information out of the --support output; the project dependencies are simply:
install_requires =
    apache-airflow[celery,postgres,redis] >= 1.10.11
    airflow_multi_dagrun
    airflow-prometheus-exporter
$ pipenv --support
Pipenv version: '2020.6.2'
Pipenv location: '/home/ubuntu/.local/lib/python3.6/site-packages/pipenv'
Python location: '/home/ubuntu/miniconda3/envs/project-name/bin/python'
Python installations found:
3.7.4: /home/ubuntu/miniconda3/bin/python3.7m
3.7.4: /home/ubuntu/miniconda3/bin/python3
3.7.4: /home/ubuntu/miniconda3/bin/python3.7
3.6.9: /usr/bin/python3
3.6.9: /usr/bin/python3.6m
3.6.9: /usr/bin/python3.6
2.7.17: /usr/bin/python2
2.7.17: /usr/bin/python2.7
PEP 508 Information:
System environment variables:
CONDA_SHLVL
LC_ALL
LS_COLORS
LD_LIBRARY_PATH
CONDA_EXE
SSH_CONNECTION
LESSCLOSE
LANG
CONDA_PREFIX
S_COLORS
_CE_M
XDG_SESSION_ID
USER
PWD
HOME
CONDA_PYTHON_EXE
LC_CTYPE
LC_TERMINAL
SSH_CLIENT
TMUX
LC_TERMINAL_VERSION
XDG_DATA_DIRS
COMBILEXDIR
_CE_CONDA
LADSPA_PATH
CONDA_PROMPT_MODIFIER
SSH_TTY
MAIL
TERM
SHELL
TMUX_PANE
SHLVL
LOGNAME
XDG_RUNTIME_DIR
PATH
CONDA_DEFAULT_ENV
LESSOPEN
_
OLDPWD
PIP_DISABLE_PIP_VERSION_CHECK
PYTHONDONTWRITEBYTECODE
PIP_SHIMS_BASE_MODULE
PIP_PYTHON_PATH
PYTHONFINDER_IGNORE_UNSUPPORTED
Pipenv–specific environment variables:
Debug–specific environment variables:
PATH: /home/ubuntu/.local/bin:/home/ubuntu/bin:/home/ubuntu/.local/bin:/home/ubuntu/bin:/home/ubuntu/miniconda3/bin:/home/ubuntu/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
SHELL: /bin/bash
LANG: C.UTF-8
PWD: /home/ubuntu/project-name
Contents of Pipfile ('/home/ubuntu/project-name/Pipfile'):