
Support for Intel XPU / Intel ARC GPUs #1329

Open: wants to merge 2 commits into master

Conversation

Sikerdebaard

Hi everyone,

I am excited to announce that I have begun adding Intel XPU support to nnU-Net through IPEX, which will allow training and inference on Intel ARC GPUs. However, the code needs further testing and optimization before merging, so I am sharing it with the community in the hope that others can contribute to this project.

Currently, the code has only been tested on CPU and Intel XPU, so there may still be bugs that need to be addressed. I have also noticed that training on an AMD 7900X CPU is faster than training on an A770 Intel ARC GPU with this code. Additionally, the XPU backend only supports BFloat16 precision at this time.

If you are interested in helping with this project, please feel free to contribute or provide feedback.
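The backend abstraction described above can be sketched as a small device-selection helper. This is a hedged illustration, not the code from this PR: the `xpu` branch assumes `intel_extension_for_pytorch` (IPEX) is installed, which registers the `torch.xpu` namespace as a side effect of the import; everything degrades gracefully to CPU otherwise.

```python
def pick_device() -> str:
    """Pick the best available backend, preferring XPU, then CUDA, then CPU.

    Illustrative sketch: assumes IPEX exposes torch.xpu when installed.
    """
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch at all, nothing to accelerate

    try:
        # Importing IPEX registers the torch.xpu namespace (assumption).
        import intel_extension_for_pytorch  # noqa: F401
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return "xpu"
    except ImportError:
        pass

    if torch.cuda.is_available():
        return "cuda"
    return "cpu"
```

A trainer could then do `model.to(pick_device())` once, instead of scattering backend checks throughout the code.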

@FabianIsensee
Member

Hey, thanks for this amazing work! I like how you are abstracting the backends into separate classes. Today we released nnU-Net v2, which already extends the supported devices to cuda, cpu and mps.
For the next couple of weeks I will be quite busy with an upcoming evaluation, but after that I would like to discuss how we can use this principle in nnU-Net v2 to make integrating new devices less tedious. May I get back to you on that?

@FabianIsensee
Member

Hey Thomas, I think Fabric is the way to go for this in the future. I will work on adding Fabric to nnU-Net soon:
https://lightning.ai/pages/open-source/fabric/
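For context, a minimal sketch of what a Fabric-based setup could look like, assuming `lightning` is installed (the import is guarded so the sketch remains inspectable without it). The accelerator and precision strings are Fabric's documented options; the surrounding trainer code is hypothetical.

```python
# Hedged sketch: wrapping a training setup in Lightning Fabric.
try:
    from lightning.fabric import Fabric
except ImportError:
    Fabric = None  # lightning not installed; sketch degrades gracefully

def make_fabric(accelerator: str = "auto", precision: str = "bf16-mixed"):
    """Return a launched Fabric instance, or None if lightning is absent."""
    if Fabric is None:
        return None
    fabric = Fabric(accelerator=accelerator, devices=1, precision=precision)
    fabric.launch()
    return fabric

# Typical usage inside a trainer (model/optimizer are assumed to exist):
#   fabric = make_fabric()
#   model, optimizer = fabric.setup(model, optimizer)
#   loss = model(batch)
#   fabric.backward(loss)  # replaces loss.backward()
```

The appeal for the device question discussed here is that `accelerator="auto"` lets Fabric pick the backend, so new device types would not require changes to the training loop itself.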

@Sikerdebaard
Author

Sikerdebaard commented Mar 31, 2023

Hi Fabian,

If you are looking into frameworks as a solution, then ONNX might be worth considering as well; it is backed by Microsoft.
Neither Lightning nor ONNX supports Intel XPU out of the box for training yet, but for inference ONNX can already use the XPU through the oneDNN API. Furthermore, with ONNX it is possible to convert the model to TensorFlow and then to TensorFlow.js, which could be a useful addition.

@FabianIsensee
Member

Hey, I am quite confident that Fabric will support XPUs soon. I talked to one of their developers recently, and they seem highly motivated to include everything that is needed for broad adoption. I like how Fabric integrates seamlessly into existing PyTorch code, which is why I prefer this solution; it works for both training and inference.
If certain formats, like ONNX, are required for running inference in some circumstances, then it would be better to have dedicated ONNX export code that takes care of that.
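The export code mentioned above could look roughly like this. This is a hedged sketch using the standard `torch.onnx.export` API, not code from this PR; the function name, input shape and file path are illustrative.

```python
def export_to_onnx(model, example_input, path: str = "model.onnx") -> str:
    """Export a trained PyTorch model to ONNX for inference elsewhere.

    Illustrative sketch: model and example_input are supplied by the caller.
    """
    import torch

    model.eval()  # export in inference mode
    torch.onnx.export(
        model,
        example_input,          # traced with a representative input
        path,
        input_names=["input"],
        output_names=["output"],
        # allow variable batch size at inference time
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    )
    return path
```

The exported file could then be run with ONNX Runtime, which is where the oneDNN-based XPU inference path mentioned earlier would come in.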
