Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Re-install python libraries rather than upgrading #1939

Draft
wants to merge 3 commits into
base: dev
Choose a base branch
from

Conversation

bemoody
Copy link
Collaborator

@bemoody bemoody commented Mar 21, 2023

Change the git deployment process to install pip packages by re-installing into a new directory each time, rather than trying to upgrade in place.

See issue #1921. There are some potential problems with this (virtualenv is ugly, yo) and it needs thorough testing.

TODO:

  • install-pn-test-server won't set up the initial prefix the same way update does, so the first push will always force a reinstall. Should it?

  • push options don't actually work with update?! (also, push options are disabled by default)

@bemoody
Copy link
Collaborator Author

bemoody commented Mar 23, 2023

This writes a somewhat amazing amount of junk to /data/log/pn/update.log (about 10 MB each time.) Maybe better to log the pip stuff elsewhere.

Also, why does it recompile psycopg every time? I mean, yeah, that is kind of what I asked for, but I thought pip had a cache specifically to avoid that kind of thing. What is pip's cache for if not that?

Benjamin Moody added 3 commits April 13, 2023 11:46
In order to upgrade Python libraries atomically, we want to be able to
change /physionet/python-env/physionet to point to a new location,
while leaving the old tree in place.

When setting up the test environment, create 'physionet' as a link
pointing to the real directory 'initial', so that subsequent scripts
can overwrite the link atomically.
We want to avoid "upgrading" python libraries using pip, because this
may not give predictable results, and if installation fails it can
leave the system in a broken half-installed state.

Consequently, instead of installing packages into
/physionet/python-env/physionet, install them into a directory named
for the MD5 hash of requirements.txt.

Installation may produce a lot of output (e.g. when compiling
extension modules from source), so store the log in the installation
directory rather than further cluttering update.log.

After the pip command is successful, copy requirements.txt to a new
file installed.txt.  This serves as a flag to indicate that the
packages were installed successfully, and also provides documentation
of what was supposedly installed.

If requirements.txt matches installed.txt, then we know that we don't
need to reinstall anything.

After the packages have been successfully installed and we have tested
that they're working (at least well enough to invoke
getmigrationtargets), then update the symlink
/physionet/python-env/physionet to point to the new prefix.
We want to avoid "upgrading" python libraries using pip, because this
may not give predictable results, and if installation fails it can
leave the system in a broken half-installed state.  Instead, we want
to install packages from scratch whenever it is required.

To be consistent with the new behavior of the git update hook, do the
same thing when running upgrade tests: rather than "upgrading" the
virtualenv directory, delete the old one and create a new one.  This
makes caching simpler, too, since the directory contents should be
reproducible just from requirements.txt.
@tompollard
Copy link
Member

@bemoody is this still a WIP or ready for review?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants