Make torch dependency more flexible #355
I guess this was meant to be a fractal-tasks-core issue (unless its goal is to provide a way to install different packages on different clusters). Relevant refs on CUDA/pytorch versions and compatibility:
(also: ref #220)
My bad. Yes, it should be a tasks issue :) And the goal would be to allow an admin setting things up or a user installing the core tasks to get more control over which torch version is used. The effect would be that different torch versions are installed on different clusters. Not sure what the best way to make this happen will be, but it shouldn't be a server concern if at all possible :)
A possible way out would be to add package extras, so that one could install the package as
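As a rough illustration only (the extra name and version bound below are made up, and this shows just the basic mechanism of an optional dependency behind an extra, not the harder question of offering several torch versions):

```toml
# Hypothetical sketch: torch becomes an optional dependency that is only
# installed when the "torch" extra is requested; the bound is a placeholder.
[tool.poetry.dependencies]
torch = { version = ">=1.12", optional = true }

[tool.poetry.extras]
torch = ["torch"]
```

A user would then opt in with something like `pip install "fractal-tasks-core[torch]"`, while a plain install would leave torch out.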
Let's rediscuss it.
- Optional extras specify the pytorch version.
- If nothing is specified, `pip install cellpose` will install something (likely the newest pytorch version).
What is our plan regarding torch versions for the fractal-tasks extra? Not the biggest fan of multiple different extra editions tbh, but it would be great to allow the torch installation to work better (i.e. also work "out of the box" on more modern systems than the UZH GPUs)
Refs (to explore further):
See the Poetry docs on dependency specification. Maybe doable by combining

```toml
[tool.poetry.dependencies]
pathlib2 = { version = "^2.2", markers = "python_version <= '3.4' or sys_platform == 'win32'" }
```

with

```toml
[tool.poetry.dependencies]
foo = [
    { version = "<=1.9", python = ">=3.6,<3.8" },
    { version = "^2.0", python = ">=3.8" }
]
```
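For concreteness, a hedged sketch of how those two patterns could be combined for torch (the bounds and markers are placeholders, not tested values):

```toml
# Hypothetical combination: multiple torch constraints, each selected by
# mutually exclusive environment markers; versions are illustrative only.
[tool.poetry.dependencies]
torch = [
    { version = "~1.12", markers = "python_version < '3.10'" },
    { version = ">=2.0", markers = "python_version >= '3.10'" }
]
```

The limitation is that environment markers only see things like the Python version or platform, not the CUDA stack of the particular cluster where the tasks will run.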
We explored multiple options with @mfranzon, and we don't see any that makes sense to us via conditional dependencies or something similar. We then propose that:
Since this is very tedious, we also propose the following workaround for doing it automatically (to be included in fractal-server - we can then open an issue over there):

```json
{
  "package": "string",
  "package_version": "string",
  "package_extras": "string",
  "python_version": "string"
}
```

We could add an additional attribute, like:
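As a purely hypothetical sketch of what such an attribute could look like (the `pinned_package_versions` name and the values below are invented for illustration, not taken from this thread):

```jsonc
// Hypothetical payload; "pinned_package_versions" is an invented name.
{
  "package": "fractal-tasks-core",
  "package_extras": "fractal-tasks",
  "python_version": "3.10",
  "pinned_package_versions": { "torch": "1.12.1" }
}
```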
CAVEAT: this is messing with the package, and thus creates a not-so-clean log of the installation (although we would still also include the additional torch-installation logs). Such an operation is meant to be restricted to very specific cases where there is an important dependency on hardware or system libraries - things that a regular user should not be using.
IMPORTANT NOTE 1:
IMPORTANT NOTE 2:
MINOR NOTE:
Thanks for digging into this! Sounds good to me. I already tested it with torch 2.0.0 on the FMI side and that also works, so I don't see a strong reason for limiting the torch version at all for the time being. Having the
Server work is deferred to fractal-analytics-platform/fractal-server#740.
I have seen no reason for constraints so far, given that 2.0.0 still worked well. We just need torch for cellpose, right? Do we still add it as an explicit dependency for the extras (to make the
Basically, our torch constraint is:
Note that when torch 2.0 is used, this change also introduces additional dependencies (e.g. sympy and mpmath).
Anndata also uses it, but they are not very strict in the dependency version.

To do:
Note: the list below is a bunch of not-very-systematic tests. This is all preliminary, but it'd be nice to understand things clearly, since we are already at it. Here are some raw CI tests:
Finally found the issue (it's a torch 2.0.1 issue, which is exposed by anndata imports but unrelated to anndata).
Current fix: we have to include the torch dependency explicitly, and make it
For the record, the new size of the installed package is quite a bit larger - and I think this is due to the torch 2.0 requirement of nvidia libraries:
Currently, we hardcode torch version 1.12 in the fractal-tasks-core dependencies to make it work well on older UZH GPUs. The tasks themselves don't depend on that torch version though, and they run fine with other torch versions (e.g. 1.13 or even the new 2.0.0).
The 1.12 dependency caused some issues in @gusqgm's Windows Subsystem for Linux test. On the FMI cluster, it's fine on some GPU nodes, but actually runs into the error below on other GPU nodes. I tested with torch 2.0.0 now, and then everything works.
Thus, we should make the torch version more flexible. The correct torch version to install depends on the infrastructure, not the task package.
A workaround until we have it is to manually install a given torch version into the task venv:
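The exact command depends on where the task environment lives; as a hedged sketch (the path and the version are placeholders for a specific setup):

```bash
# Placeholder path and version: call the pip that belongs to the task package's
# venv and pin whichever torch release matches the cluster's GPUs and CUDA stack.
/path/to/FRACTAL_TASKS_DIR/fractal-tasks-core/venv/bin/pip install "torch==2.0.0"
```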
If someone is searching for it, I'm hitting this error message when the torch version doesn't match: