As I really like uv, I've recently started testing it on the academic HPC clusters I regularly use, and I would like to share some issues I found that make working with uv harder.
I understand these might seem somewhat exotic, but I would still like to share them because I feel that handling them, at least to some extent, is necessary to use uv reliably in this setting.
Issue 1: venv location
In case you're not familiar with these environments, the peculiarity of HPC systems that caused problems is the following:
HPC clusters generally have two or more filesystems, $WORK and $SCRATCH. $WORK is subject to various quota restrictions on total file size and on the number of inodes (files and directories) that can be stored, while $SCRATCH is much faster, has no restrictions on the number of files created, but automatically deletes files untouched for 30 days.
In general, admins are happy to increase file-size quotas on $WORK, but not the inode quota.
A Python environment with jax already consumes roughly 10% of a user's inode quota on some HPC clusters.
Normally, we work inside the $WORK directory, so uv will naturally create a .venv directory in there, eating up my precious inode quota. Moreover, I normally define XDG_CACHE_HOME=$SCRATCH/.cache, so uv complains that it cannot symlink things into the venv because the cache and the environment are on different filesystems.
To work around this issue, I manually declare a per-project UV_PROJECT_ENVIRONMENT inside $SCRATCH, so that the environment is created there. This is great because it won't eat up my quota, and even if it's deleted after 30 days I don't care, because uv can naturally regenerate it when needed.
# Get the current path and remove $WORK
current_path=$(pwd)
relative_path=${current_path#$WORK}
relative_path=${relative_path#/}
# Replace all slashes with underscores to create a single dirname
normalized_path=$(echo "$relative_path" | tr '/' '_')
export UV_PROJECT_ENVIRONMENT="${SCRATCH}/uv-venvs/${normalized_path}"
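For example, assuming the snippet above is saved as a script (the name set-uv-env.sh and the project path are purely illustrative), it gets sourced from the project root before any uv command:

cd $WORK/my-project      # hypothetical project path under $WORK
source set-uv-env.sh     # the snippet above, saved as a script; name is illustrative
uv sync                  # the environment now lives under $SCRATCH/uv-venvs/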
However, I have to export this variable every time, from the correct path.
I would love it if it were possible to set a single global environment variable like UV_USE_VENV_DEPOT=$SCRATCH/.cache/uv-venvs/ and have uv automatically use logic like the one above to keep all .venv directories in there.
I understand that I can manually declare UV_PROJECT_ENVIRONMENT every time I change projects, but that is error-prone and goes against the idea that uv always makes sure I am running in the 'correct' virtual environment, automatically managed for me.
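In the meantime, the closest thing I can imagine is a shell wrapper (a sketch only; the function and depot path are my own, not uv features) that recomputes UV_PROJECT_ENVIRONMENT on every invocation:

# Sketch of a ~/.bashrc wrapper, assuming $WORK and $SCRATCH are set.
uv() {
    # Recompute the environment location from the current directory,
    # mirroring the snippet above, then delegate to the real uv binary.
    local relative_path="${PWD#$WORK}"
    relative_path="${relative_path#/}"
    local normalized_path="${relative_path//\//_}"
    UV_PROJECT_ENVIRONMENT="${SCRATCH}/uv-venvs/${normalized_path}" command uv "$@"
}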
Issue 2: multiple architectures
Another peculiarity of HPC systems is that users can load different modules. A common scenario is that a user wants to use two different versions of MPI in two different settings, which can be 'loaded' by running module load mpich/v1 or module load mpich/v2.
When installing some packages with binary dependencies, such as mpi4py, uv will aggressively cache the compiled wheel. However, the wheel differs depending on the version of MPI I have loaded, which uv does not know about.
The simplest thing that would make this easier to work with would be a way to specify in the pyproject.toml that the compiled wheels of a package should not be cached.
A nicer (albeit more complex, and I'm not sure it's worth it) option would be to provide an environment variable or a shell command whose output is checked when looking up the cache.
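To make the request concrete, a purely hypothetical pyproject.toml sketch (neither key exists in uv today; the names are made up) could look like:

[tool.uv]
# Hypothetical: never cache locally built wheels for these packages
no-cache-binary = ["mpi4py"]
# Hypothetical: include this command's output in the cache key when reusing built wheels
cache-key-command = { mpi4py = "mpicc --version" }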
Regarding the second point, you could set reinstall-package in your pyproject.toml and we'll refresh the cached version of that package. I wonder if we need a --no-cache-package <name> or --no-cache-binary flag.
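For reference, a minimal sketch of that setting, assuming it takes a list of package names like the corresponding CLI flag does:

[tool.uv]
# Refresh the cached version of mpi4py on each install/sync
reinstall-package = ["mpi4py"]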