[feature request] Training with Metal Performance Shaders #358
Comments
There's now basic MPS support in Lightning, so acceleration on Mac should work with the development version. |
|
This isn't/shouldn't be Mac OS specific. The latest fix on |
shapely-2.0a1, Bd231_qt_1_10.jpg. Shapely-1.7.1 fails to install. With Shapely-1.8.4 and GPU I get errors and kraken does not terminate:
Shapely-1.8.4 and CPU works:
|
Yeah, I'm loath to touch shapely-related issues right now until they stabilise the code base. 1.7.1 should work fine and is pinned now on both the binary packages and environments. |
But in this case I was forced to use 1.8.4, see my last comment above. So the pinned 1.7.1 does not always work. I still have to find a way to get 1.7.1 installed on macOS. In addition, Metal Performance Shaders still don't work, as kraken segmentation does not terminate with |
Latest code with Shapely 1.8.4 no longer hangs, but fails:
With 2.0a1 it fails, too:
|
On the way to restoring MacOS support #358
Thanks. One is a shapely error (fixed now), the other something else. Unfortunately, I can't reproduce the non-shapely one but it is in the polygonizer and crashes because the RoI has a size of 0 which shouldn't happen. |
Shapely 2.0a1 (and same error now with 1.8.4, too):
|
I'll have to find an Apple system to figure out why it fails as I can't reproduce it on any of my Linux machines. |
The latest code still fails with Shapely-1.8.4 and Shapely-2.0b2, but now with a different error:
|
Running with CPU and Shapely-2.0b2 now fails, too (regression?):
The same test works with Shapely 1.8.5.post1. |
My latest attempt to train using MPS failed because PyTorch doesn't yet support CTC layers on MPS. The error message referred to a PyTorch issue where users are invited to request support for specific features. |
Update: latest pytorch still has the same problem:
With
|
Update: |
@mittagessen, I suggest re-opening this feature request because MPS support is still incomplete (not sufficient for training). |
Related pytorch issue: pytorch/pytorch#77764 (comment). |
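For the missing CTC support on MPS reported in the comments above, PyTorch offers the environment variable PYTORCH_ENABLE_MPS_FALLBACK, which routes operators without an MPS implementation to the CPU. The following is a minimal sketch, assuming the variable has to be set before torch is imported. The helper name mps_fallback_enabled is hypothetical, and whether the fallback is enough for kraken training is not confirmed in this thread.

```python
import os

# Assumption: the variable is read when torch is loaded, so set it before
# any "import torch" (or export it in the shell before starting ketos).
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"


def mps_fallback_enabled():
    """Hypothetical helper: report whether the CPU fallback was requested."""
    return os.environ.get("PYTORCH_ENABLE_MPS_FALLBACK") == "1"


print("MPS CPU fallback requested:", mps_fallback_enabled())
```

Operators that fall back run on the CPU, so any speed-up from MPS is likely reduced for those steps.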
PyTorch recently introduced support for Apple's Metal Performance Shaders (MPS), see https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/.
Using the new mps backend for PyTorch should accelerate training on Apple M1 machines a lot and is a desired feature for Kraken. In this issue I'd like to track the current status.
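Whether the mps backend is usable on a given machine can be checked from Python before starting a training run. The following is a minimal sketch, assuming PyTorch 1.12 or later for torch.backends.mps. The function name pick_device is hypothetical and not part of kraken.

```python
def pick_device(preferred="mps"):
    """Hypothetical helper: return `preferred` if usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate.
        return "cpu"
    if preferred == "mps":
        # torch.backends.mps is absent in PyTorch releases before 1.12.
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
        return "cpu"
    if preferred.startswith("cuda"):
        return preferred if torch.cuda.is_available() else "cpu"
    return "cpu"


print(pick_device("mps"))
```

The returned string could then be passed as the value of the -d option; this only confirms that the backend is present, not that every operation used in training is supported on it.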
kraken<4.0.0
Installation in a fresh virtual Python environment:
Training can be started with
ketos train -f page -t list.eval -d mps
but it aborts with a runtime error that does not occur when running with -d cpu:
kraken>=4.0.0
Latest kraken releases use PyTorch Lightning, which currently does not support the mps backend (see Lightning-AI/pytorch-lightning#13102), so training is not possible with -d mps.